pyabsa.tasks.AspectSentimentTripletExtraction.dataset_utils.data_utils_for_training

Classes

ASTEDataset

Attributes

Functions

`generate_tags`(tokens, start, end, scheme)
`load_tokens`(data)

Module Contents

pyabsa.tasks.AspectSentimentTripletExtraction.dataset_utils.data_utils_for_training.generate_tags(tokens, start, end, scheme)
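generate_tags has no docstring. Its parameter names suggest that it turns the token span from start to end into a tag sequence under a given tagging scheme. The following is a minimal sketch under that assumption only. The function name generate_tags_sketch, the inclusive end index, the scheme names "BIO" and "IOBES", and the list-of-strings return value are all hypothetical and may not match pyabsa's implementation.

```python
from typing import List


def generate_tags_sketch(tokens: List[str], start: int, end: int, scheme: str = "BIO") -> List[str]:
    """Tag the inclusive span tokens[start..end]; every other token gets "O".

    Hypothetical illustration only: it borrows the parameter names of
    generate_tags but is not the pyabsa implementation.
    """
    if not 0 <= start <= end < len(tokens):
        raise ValueError("expected 0 <= start <= end < len(tokens)")
    tags = ["O"] * len(tokens)
    if scheme == "BIO":
        tags[start] = "B"  # span-initial token
        for i in range(start + 1, end + 1):
            tags[i] = "I"  # remaining tokens inside the span
    elif scheme == "IOBES":
        if start == end:
            tags[start] = "S"  # single-token span
        else:
            tags[start] = "B"
            for i in range(start + 1, end):
                tags[i] = "I"
            tags[end] = "E"  # span-final token
    else:
        raise ValueError(f"unsupported scheme: {scheme!r}")
    return tags


tokens = ["The", "battery", "life", "is", "great"]
print(generate_tags_sketch(tokens, 1, 2, "BIO"))    # ['O', 'B', 'I', 'O', 'O']
print(generate_tags_sketch(tokens, 1, 2, "IOBES"))  # ['O', 'B', 'E', 'O', 'O']
```

BIO marks only where a span begins. IOBES also marks where it ends and whether it is a single token, which gives a decoder more boundary information at the cost of a larger tag set.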

pyabsa.tasks.AspectSentimentTripletExtraction.dataset_utils.data_utils_for_training.load_tokens(data)

class pyabsa.tasks.AspectSentimentTripletExtraction.dataset_utils.data_utils_for_training.ASTEDataset(config, tokenizer, dataset_type='train')

Bases: pyabsa.framework.dataset_class.dataset_template.PyABSADataset

Attributes:

data – a list of the loaded and preprocessed data samples.

Methods:

__init__(self, config, tokenizer, dataset_type, **kwargs) – constructs a new PyABSADataset object by loading and preprocessing a dataset according to the given configuration and dataset type. config is a configuration object holding the loading and preprocessing settings, tokenizer is a pre-trained tokenizer used to tokenize the text data, and dataset_type is the type of the dataset to load (e.g., “train”, “dev”, “test”). Additional keyword arguments can be passed to customize loading and preprocessing.
covert_to_tensor(data) – a static method that converts the preprocessed data samples to PyTorch tensors.
load_data_from_dict(self, dataset_dict, dataset_type, **kwargs) – loads the dataset from a dictionary containing the preprocessed data. dataset_dict is the dictionary, dataset_type is the type of the dataset to load, and additional keyword arguments can be passed to customize loading.
load_data_from_file(self, dataset_file, dataset_type, **kwargs) – loads the dataset from a file containing the preprocessed data. dataset_file is the file path, dataset_type is the type of the dataset to load, and additional keyword arguments can be passed to customize loading.
get_labels(self) – returns a list of the labels for each data sample in the dataset.
__len__(self) – returns the number of data samples in the dataset.
__str__(self) – returns a string representation of the dataset.
__repr__(self) – returns a string representation of the dataset.

all_tokens = []

all_deprel = []

all_postag = []

all_postag_ca = []

all_max_len = []

labels = ['N', 'B-A', 'I-A', 'A', 'B-O', 'I-O', 'O', 'Negative', 'Neutral', 'Positive']
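A classifier works with integer ids rather than the label strings in labels. The sketch below builds the two lookup tables, assuming that each label's id is its position in the list. That assumption is not confirmed by the documentation above. The helper is_sentiment is hypothetical. It only separates the three polarity labels from the span-structure labels.

```python
# The label list documented for ASTEDataset.labels.
LABELS = ['N', 'B-A', 'I-A', 'A', 'B-O', 'I-O', 'O', 'Negative', 'Neutral', 'Positive']

# Assumption: a label's integer id is its index in LABELS.
label2id = {label: idx for idx, label in enumerate(LABELS)}
id2label = {idx: label for label, idx in label2id.items()}


def is_sentiment(label_id: int) -> bool:
    """Hypothetical helper: True if the id maps to one of the three polarity labels."""
    return id2label[label_id] in ("Negative", "Neutral", "Positive")


print(label2id["B-A"], label2id["Positive"])  # 1 9
print(is_sentiment(7), is_sentiment(3))       # True False
```
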

load_data_from_dict(data_dict, **kwargs)

Load the dataset from a dictionary.

Parameters:

data_dict – A dictionary containing the dataset.
dataset_type – The type of the dataset: “train”, “dev”, or “test”.
kwargs – Additional arguments for loading the dataset, such as “text_column”, “aspect_column”, “label_column”, “separator”, and “data_path”.

load_data_from_file(file_path, **kwargs)

Load data from a file.

Parameters:

file_path – The file to load data from.
dataset_type – The type of dataset to load, e.g. “train”, “test”, “dev”.
kwargs – Optional additional arguments for loading data.
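The file format is not documented here. A common layout for ASTE data, used by the ASTE-Data-V2 benchmark, puts one sentence per line, followed by "####" and a Python-literal list of (aspect indices, opinion indices, polarity) triplets. The sketch below parses one such line. It assumes, without confirmation, that the files read by load_data_from_file follow this layout. parse_aste_line is a hypothetical name, not part of pyabsa.

```python
import ast
from typing import List, Tuple

# (aspect token indices, opinion token indices, polarity tag)
Triplet = Tuple[List[int], List[int], str]


def parse_aste_line(line: str) -> Tuple[List[str], List[Triplet]]:
    """Split one 'sentence####triplets' line into tokens and triplets.

    Assumes the ASTE-Data-V2 layout; pyabsa's own loader may differ.
    """
    sentence, _, raw = line.strip().partition("####")
    if not raw:
        raise ValueError("missing '####' separator")
    # literal_eval only accepts Python literals, so no arbitrary code runs
    triplets = ast.literal_eval(raw)
    return sentence.split(), [(list(a), list(o), str(p)) for a, o, p in triplets]


tokens, triplets = parse_aste_line("The food was great .####[([1], [3], 'POS')]")
print(tokens)    # ['The', 'food', 'was', 'great', '.']
print(triplets)  # [([1], [3], 'POS')]
```
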

nlp

__getitem__(index)

Get the data sample at a specific index.

Parameters:

index – The index of the data sample to retrieve.

Returns:

A dictionary representing a data sample, with keys “text”, “aspect”, and “label”.

__len__()

Get the number of data samples in the dataset.

Returns:

The number of data samples in the dataset.
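__getitem__ and __len__ together form Python's map-style dataset contract. Any object that implements both can be indexed, measured with len(), iterated, and batched by torch.utils.data.DataLoader. The sketch below shows that contract in isolation. ToyDataset is a hypothetical stand-in that stores samples with the keys documented for __getitem__. It does none of ASTEDataset's tokenization or preprocessing.

```python
from typing import Dict, List


class ToyDataset:
    """Hypothetical stand-in mimicking only the documented __getitem__/__len__ contract."""

    def __init__(self, samples: List[Dict[str, str]]) -> None:
        self.data = samples  # plays the role of the documented `data` attribute

    def __getitem__(self, index: int) -> Dict[str, str]:
        return self.data[index]  # raises IndexError past the end, which ends iteration

    def __len__(self) -> int:
        return len(self.data)


ds = ToyDataset([{"text": "The battery life is great", "aspect": "battery life", "label": "Positive"}])
print(len(ds))         # 1
print(ds[0]["label"])  # Positive
```

Because __getitem__ raises IndexError past the last sample, `for sample in ds` works even without an explicit __iter__.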

convert_examples_to_features()

get_syntax_annotation(sentence, annotation)

generate_tags(tokens, start, end, scheme)

get_dependencies(tokens)

get_vocabs()