data_utils_for_training

Module Contents

Classes

GloVeRNACDataset

Attributes

class data_utils_for_training.GloVeRNACDataset(config, tokenizer, dataset_type='train', **kwargs)[source]

Bases: pyabsa.framework.dataset_class.dataset_template.PyABSADataset

Attributes

data: a list of the loaded and preprocessed data samples.

Methods

  • __init__(self, config, tokenizer, dataset_type, **kwargs): constructs a new PyABSADataset object by loading and preprocessing a dataset based on the given configuration and dataset type. config is a configuration object containing the settings for loading and preprocessing the dataset, tokenizer is a pre-trained tokenizer object used to tokenize the text data, and dataset_type is the type of the dataset to load (e.g., “train”, “dev”, “test”). Additional keyword arguments can be passed to customize the loading and preprocessing behavior.

  • covert_to_tensor(data): a static method that converts the preprocessed data samples to PyTorch tensors.

  • load_data_from_dict(self, dataset_dict, dataset_type, **kwargs): loads the dataset from a dictionary object containing the preprocessed data. dataset_dict is the dictionary object, dataset_type is the type of the dataset to load, and additional keyword arguments can be passed to customize the loading behavior.

  • load_data_from_file(self, dataset_file, dataset_type, **kwargs): loads the dataset from a file containing the preprocessed data. dataset_file is the file path, dataset_type is the type of the dataset to load, and additional keyword arguments can be passed to customize the loading behavior.

  • get_labels(self): returns a list of the labels for each data sample in the dataset.

  • __len__(self): returns the number of data samples in the dataset.

  • __str__(self): returns a string representation of the dataset.

  • __repr__(self): returns a string representation of the dataset.

load_data_from_dict(dataset_dict, **kwargs)[source]

Load the dataset from a dictionary.

Parameters:
  • dataset_dict – A dictionary containing the dataset.

  • dataset_type – The type of the dataset, which can be “train”, “dev”, or “test”.

  • kwargs – Additional arguments for loading the dataset, such as “text_column”, “aspect_column”, “label_column”, “separator”, and “data_path”.
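The column-name keyword arguments mentioned for load_data_from_dict suggest a mapping from user-chosen column names to the “text”, “aspect”, and “label” fields of a sample. The following is only an illustrative sketch of that idea in plain Python: the helper rows_from_columns, the column-oriented layout of the input dictionary, and the example column names are all assumptions made for this example and are not taken from the PyABSA source.

```python
from typing import Any, Dict, List


def rows_from_columns(
    dataset_dict: Dict[str, List[Any]],
    text_column: str = "text",
    aspect_column: str = "aspect",
    label_column: str = "label",
) -> List[Dict[str, Any]]:
    """Hypothetical helper: turn a column-oriented dict into per-sample dicts.

    The column-oriented layout is an assumption for illustration only.
    """
    texts = dataset_dict[text_column]
    aspects = dataset_dict[aspect_column]
    labels = dataset_dict[label_column]

    # Every column must describe the same number of samples.
    if not (len(texts) == len(aspects) == len(labels)):
        raise ValueError("all columns must have the same length")

    # Normalize each row to the keys "text", "aspect", and "label".
    return [
        {"text": text, "aspect": aspect, "label": label}
        for text, aspect, label in zip(texts, aspects, labels)
    ]


# Made-up example data with non-default column names.
raw = {
    "sentence": ["The battery life is great.", "The screen is too dim."],
    "target": ["battery life", "screen"],
    "polarity": ["Positive", "Negative"],
}

samples = rows_from_columns(
    raw,
    text_column="sentence",
    aspect_column="target",
    label_column="polarity",
)
```

Each entry of samples is then a dictionary with the keys “text”, “aspect”, and “label”, regardless of how the input columns were named.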

load_data_from_file(dataset_file, **kwargs)[source]

Load data from a file.

Parameters:
  • dataset_file – The file to load data from.

  • dataset_type – The type of dataset to load, e.g. “train”, “test”, “dev”.

  • kwargs – Optional additional arguments for loading data.

__getitem__(index)[source]

Get a data sample from the dataset at a specific index.

Parameters:
  • index – The index of the data sample to retrieve.

Returns:
  A dictionary representing a data sample, with keys “text”, “aspect”, and “label”.

__len__()[source]

Get the number of data samples in the dataset.

Returns:
  The number of data samples in the dataset.
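The indexing and length behavior described for __getitem__ and __len__, together with get_labels, can be sketched with a small stand-in class. This is a minimal sketch and not the GloVeRNACDataset implementation: ToyAspectDataset is a hypothetical name, it skips tokenization and tensor conversion entirely, and the sample data is made up.

```python
from typing import Any, Dict, List


class ToyAspectDataset:
    """Hypothetical stand-in that mimics only the documented container interface."""

    def __init__(self, data: List[Dict[str, Any]]) -> None:
        # Mirrors the documented `data` attribute: a list of samples.
        self.data = list(data)

    def __getitem__(self, index: int) -> Dict[str, Any]:
        # Return one sample, a dict with "text", "aspect", and "label".
        return self.data[index]

    def __len__(self) -> int:
        # Number of samples in the dataset.
        return len(self.data)

    def get_labels(self) -> List[Any]:
        # One label per sample, in dataset order.
        return [sample["label"] for sample in self.data]


toy = ToyAspectDataset(
    [
        {"text": "The battery life is great.", "aspect": "battery life", "label": "Positive"},
        {"text": "The screen is too dim.", "aspect": "screen", "label": "Negative"},
    ]
)

first = toy[0]             # indexing goes through __getitem__
count = len(toy)           # len() goes through __len__
labels = toy.get_labels()  # labels in the same order as the samples
```

Because the class defines both __getitem__ and __len__, it follows the map-style dataset protocol that PyTorch's DataLoader expects, which is presumably why the real class exposes the same two methods.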