pyabsa.tasks.AspectSentimentTripletExtraction.dataset_utils.data_utils_for_training
Module Contents
- pyabsa.tasks.AspectSentimentTripletExtraction.dataset_utils.data_utils_for_training.generate_tags(tokens, start, end, scheme)
- pyabsa.tasks.AspectSentimentTripletExtraction.dataset_utils.data_utils_for_training.load_tokens(data)
- class pyabsa.tasks.AspectSentimentTripletExtraction.dataset_utils.data_utils_for_training.ASTEDataset(config, tokenizer, dataset_type='train')
Bases: pyabsa.framework.dataset_class.dataset_template.PyABSADataset
- Attributes
data: a list of the loaded and preprocessed data samples.
- Methods
__init__(self, config, tokenizer, dataset_type, **kwargs): constructs a new PyABSADataset object by loading and preprocessing a dataset based on the given configuration and dataset type. config is a configuration object containing the settings for loading and preprocessing the dataset, tokenizer is a pre-trained tokenizer object used to tokenize the text data, and dataset_type is the type of the dataset to load (e.g., “train”, “dev”, “test”). Additional keyword arguments can be passed to customize the loading and preprocessing behavior.
covert_to_tensor(data): a static method that converts the preprocessed data samples to PyTorch tensors.
load_data_from_dict(self, dataset_dict, dataset_type, **kwargs): loads the dataset from a dictionary object containing the preprocessed data. dataset_dict is the dictionary object, dataset_type is the type of the dataset to load, and additional keyword arguments can be passed to customize the loading behavior.
load_data_from_file(self, dataset_file, dataset_type, **kwargs): loads the dataset from a file containing the preprocessed data. dataset_file is the file path, dataset_type is the type of the dataset to load, and additional keyword arguments can be passed to customize the loading behavior.
get_labels(self): returns a list of the labels for each data sample in the dataset.
__len__(self): returns the number of data samples in the dataset.
__str__(self): returns a string representation of the dataset.
__repr__(self): returns a string representation of the dataset.
- all_tokens = []
- all_deprel = []
- all_postag = []
- all_postag_ca = []
- all_max_len = []
- labels = ['N', 'B-A', 'I-A', 'A', 'B-O', 'I-O', 'O', 'Negative', 'Neutral', 'Positive']
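As an illustration of how a label inventory like the one above is typically consumed, the following minimal sketch builds index lookups from the list. The names label2id and id2label are hypothetical and not part of the documented API; whether ASTEDataset builds such mappings internally is not stated here.

```python
# Label inventory copied from ASTEDataset.labels above.
labels = ['N', 'B-A', 'I-A', 'A', 'B-O', 'I-O', 'O', 'Negative', 'Neutral', 'Positive']

# Hypothetical helpers (assumption, not part of the documented API):
# map each label to its position in the list, and back again.
label2id = {label: idx for idx, label in enumerate(labels)}
id2label = {idx: label for label, idx in label2id.items()}

print(label2id['B-A'])  # position of the aspect-begin tag in the list
print(id2label[9])      # label stored at the last position
```

Deriving both mappings from the single list keeps them consistent if the label order ever changes.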
- load_data_from_dict(data_dict, **kwargs)
Load the dataset from a dictionary.
- Parameters:
data_dict – A dictionary containing the dataset.
dataset_type – The type of the dataset, which can be “train”, “dev”, or “test”. It is not a named parameter in the signature above, so it would be passed through kwargs.
kwargs – Additional arguments for loading the dataset, such as “text_column”, “aspect_column”, “label_column”, “separator”, and “data_path”.
- load_data_from_file(file_path, **kwargs)
Load data from a file.
- Parameters:
file_path – The file to load data from.
dataset_type – The type of dataset to load, e.g. “train”, “test”, “dev”. It is not a named parameter in the signature above, so it would be passed through kwargs.
kwargs – Optional additional arguments for loading data.
- nlp
- __getitem__(index)
Get a data sample from the dataset at a specific index.
- Parameters:
index – The index of the data sample to retrieve.
- Returns:
A dictionary representing a data sample, with keys “text”, “aspect”, and “label”.
- __len__()
Get the number of data samples in the dataset.
- Returns:
The number of data samples in the dataset.
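To make the __getitem__ and __len__ contract concrete, here is a minimal sketch of a class that follows the same protocol. ToyDataset and its sample contents are hypothetical stand-ins; the sketch does not load anything through PyABSA and only assumes samples are dictionaries with the keys named in the __getitem__ description.

```python
class ToyDataset:
    """Hypothetical stand-in exposing `data`, `__len__`, and `__getitem__`."""

    def __init__(self, samples):
        # `data` mirrors the documented attribute: a list of preprocessed samples.
        self.data = list(samples)

    def __len__(self):
        # Number of samples in the dataset.
        return len(self.data)

    def __getitem__(self, index):
        # Return the sample stored at `index`; raises IndexError when out of range.
        return self.data[index]


# Made-up samples using the keys named in the __getitem__ description.
toy = ToyDataset([
    {"text": "The pizza was great", "aspect": "pizza", "label": "Positive"},
    {"text": "Service was slow", "aspect": "Service", "label": "Negative"},
])

count = len(toy)  # calls ToyDataset.__len__
first = toy[0]    # calls ToyDataset.__getitem__
print(count, first["label"])
```

Any object with these two methods can be indexed and iterated like a sequence, which is also the map-style interface that PyTorch data loaders expect.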
- convert_examples_to_features()
- get_syntax_annotation(sentence, annotation)
- generate_tags(tokens, start, end, scheme)
- get_dependencies(tokens)
- get_vocabs()
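The generate_tags entry carries no description here, so its actual behavior is not documented. Given that the label list contains 'B-A', 'I-A', 'B-O' and 'I-O', one plausible reading is BIO-style span tagging. The sketch below shows that general technique only; bio_tags_for_span, its prefix argument, and the use of 'N' for untagged tokens are assumptions and may differ from what the library does.

```python
def bio_tags_for_span(tokens, start, end, prefix="A"):
    """Hypothetical helper: tag tokens[start..end] (inclusive) with B-/I- labels.

    Tokens outside the span receive 'N' (an assumption for this sketch).
    """
    if not 0 <= start <= end < len(tokens):
        raise ValueError("span must lie inside the token list")
    tags = ["N"] * len(tokens)
    tags[start] = f"B-{prefix}"          # first token of the span
    for i in range(start + 1, end + 1):  # remaining tokens of the span
        tags[i] = f"I-{prefix}"
    return tags


tokens = "The battery life is great".split()
aspect_tags = bio_tags_for_span(tokens, 1, 2, prefix="A")   # span "battery life"
opinion_tags = bio_tags_for_span(tokens, 4, 4, prefix="O")  # span "great"
print(list(zip(tokens, aspect_tags)))
print(list(zip(tokens, opinion_tags)))
```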