pyabsa.utils.absa_utils.absa_utils
Module Contents
Functions
| Function | Description |
| --- | --- |
| generate_inference_set_for_apc(dataset_path) | Generates an inference set for an APC dataset. This function only works for APC datasets located in integrated_datasets. |
| is_similar(s1, s2) | Determines if two strings are similar based on the number of common tokens they share. |
| assemble_aspects(fname, use_tokenizer=False) | Preprocesses the input file, groups sentences with similar aspects, and generates samples with the corresponding aspect labels and polarities. |
| split_aspects(sentence) | Splits a sentence into multiple aspects, each with its own context and polarity. |
| convert_atepc(fname, use_tokenizer) | Converts the input file to the Aspect Term Extraction and Polarity Classification (ATEPC) format. |
| convert_apc_set_to_atepc_set(path, use_tokenizer=False) | Converts an APC dataset to an ATEPC dataset. |
| refactor_chinese_dataset(fname, train_fname, test_fname) | Refactors the Chinese dataset by splitting it into train and test sets and converting it into the ATEPC format. |
| | Detects errors in a given dataset by checking if the sentences with similar aspects have different lengths. |
- pyabsa.utils.absa_utils.absa_utils.generate_inference_set_for_apc(dataset_path)
Generates an inference set for an APC dataset. This function only works for APC datasets located in integrated_datasets.
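A minimal sketch of the kind of conversion this function performs. It assumes the common three-line APC layout (a sentence with a `$T$` placeholder, then the aspect term, then the polarity) and an inference layout that wraps the aspect in `[B-ASP]`/`[E-ASP]` markers and appends the label after `!sent!`. Both formats and the helper name `apc_to_inference_lines` are assumptions for illustration, and the sketch works on a list of lines rather than on a path under integrated_datasets.

```python
from typing import Iterable, List


def apc_to_inference_lines(apc_lines: Iterable[str]) -> List[str]:
    """Turn APC triples into single-line inference samples (hypothetical helper).

    Assumed input:  "The $T$ was great", "food", "Positive"
    Assumed output: "The [B-ASP]food[E-ASP] was great !sent! Positive"
    """
    lines = [line.strip() for line in apc_lines if line.strip()]
    if len(lines) % 3 != 0:
        raise ValueError("APC data must come in groups of three lines")
    samples = []
    for i in range(0, len(lines), 3):
        sentence, aspect, polarity = lines[i], lines[i + 1], lines[i + 2]
        # Mark the aspect in place of the $T$ placeholder (first occurrence only).
        marked = sentence.replace("$T$", f"[B-ASP]{aspect}[E-ASP]", 1)
        samples.append(f"{marked} !sent! {polarity}")
    return samples


print(apc_to_inference_lines(["The $T$ was great\n", "food\n", "Positive\n"]))
```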
- pyabsa.utils.absa_utils.absa_utils.is_similar(s1: str, s2: str) → bool
Determines if two strings are similar based on the number of common tokens they share.
- Parameters:
s1 (str) – String 1
s2 (str) – String 2
- Returns:
True if the strings are similar, False otherwise
- Return type:
bool
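A hedged sketch of token-overlap similarity. The whitespace tokenization, the use of token sets, the 0.8 threshold applied in both directions, and the name `token_overlap_similar` are assumptions; the documented function may count tokens differently.

```python
def token_overlap_similar(s1: str, s2: str, threshold: float = 0.8) -> bool:
    """Return True when shared tokens cover at least `threshold` of each
    string's distinct tokens (hypothetical criterion)."""
    tokens1 = set(s1.split())
    tokens2 = set(s2.split())
    if not tokens1 or not tokens2:
        return False  # an empty string is never considered similar
    common = len(tokens1 & tokens2)
    # Require high coverage on both sides so a short string is not judged
    # similar to a much longer one that merely contains it.
    return common / len(tokens1) >= threshold and common / len(tokens2) >= threshold


print(token_overlap_similar("the food was really great", "the food was great"))  # True
print(token_overlap_similar("the food was great", "service was slow"))  # False
```

Checking coverage on both strings, rather than only the shorter one, is a design choice of this sketch: it keeps "the food" from matching "the food was great".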
- pyabsa.utils.absa_utils.absa_utils.assemble_aspects(fname, use_tokenizer=False)
Preprocesses the input file, groups sentences with similar aspects, and generates samples with the corresponding aspect labels and polarities.
- Parameters:
fname (str) – The filename to be preprocessed
use_tokenizer (bool, optional) – Whether to use a tokenizer, defaults to False
- Returns:
A list of samples
- Return type:
list
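The grouping step can be pictured with a short sketch. It assumes the three-line APC layout, restores each sentence by substituting the aspect for `$T$`, and groups samples whose restored sentences are identical. The exact-match criterion (a similarity check such as `is_similar` could be substituted), the output shape, and the name `group_apc_samples` are illustrative assumptions, and no tokenizer is involved.

```python
from typing import Dict, List, Tuple


def group_apc_samples(lines: List[str]) -> List[Dict[str, object]]:
    """Group APC triples that describe the same sentence (hypothetical helper)."""
    rows = [line.strip() for line in lines if line.strip()]
    groups: Dict[str, List[Tuple[str, str]]] = {}
    # Walk the data three lines at a time: template, aspect, polarity.
    for i in range(0, len(rows) - 2, 3):
        template, aspect, polarity = rows[i], rows[i + 1], rows[i + 2]
        sentence = template.replace("$T$", aspect, 1)
        # dict preserves insertion order, so groups keep their first-seen order.
        groups.setdefault(sentence, []).append((aspect, polarity))
    return [{"sentence": s, "aspects": a} for s, a in groups.items()]


samples = group_apc_samples([
    "The $T$ was great but the service was slow", "food", "Positive",
    "The food was great but the $T$ was slow", "service", "Negative",
])
print(samples)
```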
- pyabsa.utils.absa_utils.absa_utils.split_aspects(sentence)
Splits a sentence into multiple aspects, each with its own context and polarity.
- Parameters:
sentence – Input sentence with multiple aspects
- Returns:
A list of tuples, each containing a single aspect with its context and polarity
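Splitting can be sketched as producing one copy of the sentence per aspect, with every token outside the current aspect reset to a neutral tag. The token layout `(word, IOB tag, polarity)`, the `B-ASP`/`I-ASP`/`O` tags, the `-100` placeholder polarity and the name `split_into_single_aspects` are assumptions, not the documented input format.

```python
from typing import List, Tuple

Token = Tuple[str, str, str]  # (word, IOB tag, polarity): assumed layout


def split_into_single_aspects(sentence: List[Token]) -> List[List[Token]]:
    """Return one copy of `sentence` per aspect span (hypothetical helper)."""
    # First pass: collect [start, end) spans of B-ASP followed by I-ASP tokens.
    spans = []
    start = None
    for i, (_, tag, _) in enumerate(sentence):
        if tag == "B-ASP":
            if start is not None:
                spans.append((start, i))
            start = i
        elif tag != "I-ASP" and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(sentence)))

    # Second pass: keep one span's labels, neutralize every other token.
    results = []
    for begin, end in spans:
        copy = [
            token if begin <= i < end else (token[0], "O", "-100")
            for i, token in enumerate(sentence)
        ]
        results.append(copy)
    return results


example = [("food", "B-ASP", "Positive"), ("was", "O", "-100"),
           ("good", "O", "-100"), ("but", "O", "-100"),
           ("service", "B-ASP", "Negative"), ("lagged", "O", "-100")]
for part in split_into_single_aspects(example):
    print(part)
```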
- pyabsa.utils.absa_utils.absa_utils.convert_atepc(fname, use_tokenizer)
Converts the input file to the Aspect Term Extraction and Polarity Classification (ATEPC) format.
- Parameters:
fname – Filename
use_tokenizer – Whether to use a tokenizer
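A sketch of the per-sample conversion, assuming ATEPC data is written one token per line as `word tag polarity`, with `B-ASP`/`I-ASP` tags on aspect tokens and `O -100` on all other tokens. Whitespace splitting stands in for the optional tokenizer (roughly the `use_tokenizer=False` case), and `apc_sample_to_atepc` is a hypothetical name.

```python
def apc_sample_to_atepc(template: str, aspect: str, polarity: str) -> str:
    """Convert one APC sample into assumed ATEPC token lines (hypothetical helper)."""
    before, _, after = template.partition("$T$")
    # Context tokens get the neutral tag and placeholder polarity.
    lines = [f"{word} O -100" for word in before.split()]
    # The first aspect token opens the span; the rest continue it.
    for j, word in enumerate(aspect.split()):
        tag = "B-ASP" if j == 0 else "I-ASP"
        lines.append(f"{word} {tag} {polarity}")
    lines.extend(f"{word} O -100" for word in after.split())
    return "\n".join(lines) + "\n"


print(apc_sample_to_atepc("The $T$ was great", "battery life", "Positive"))
```

In a full converter, each sample's block would presumably be followed by a blank line so that samples stay separable when the file is read back.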
- pyabsa.utils.absa_utils.absa_utils.convert_apc_set_to_atepc_set(path, use_tokenizer=False)
Converts an APC dataset to an ATEPC dataset.
- Parameters:
path – Path to the dataset
use_tokenizer – Whether to use a tokenizer, defaults to False
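Applying a converter across a dataset directory might look like the sketch below. The `.atepc` and `.inference` suffixes used to skip previously generated files, the callback-based design and the name `convert_dataset_dir` are assumptions for illustration.

```python
import os
import tempfile
from typing import Callable, List


def convert_dataset_dir(path: str, convert_file: Callable[[str], None]) -> List[str]:
    """Call `convert_file` on every source file under `path` (hypothetical helper)."""
    converted = []
    for root, _dirs, files in os.walk(path):
        for name in sorted(files):
            # Skip files that look like outputs of an earlier conversion.
            if name.endswith((".atepc", ".inference")):
                continue
            full_path = os.path.join(root, name)
            convert_file(full_path)
            converted.append(full_path)
    return converted


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for name in ("restaurant.train.txt", "restaurant.train.txt.atepc"):
            open(os.path.join(tmp, name), "w").close()
        # Only the source file reaches the converter; here it is just printed.
        convert_dataset_dir(tmp, print)
```

Passing the converter as a callback keeps the directory walk independent of the per-file format logic.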
- pyabsa.utils.absa_utils.absa_utils.refactor_chinese_dataset(fname, train_fname, test_fname)
Refactors the Chinese dataset by splitting it into train and test sets and converting it into the ATEPC format.
- Parameters:
fname – The name of the dataset file
train_fname – The name of the output train file
test_fname – The name of the output test file
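The train/test split part of this function can be sketched as a seeded shuffle followed by a cut. The 20% test ratio, the fixed seed, operating on an in-memory list of samples instead of reading `fname` and writing `train_fname`/`test_fname`, and the name `split_train_test` are assumptions; the ATEPC conversion step is not shown.

```python
import random
from typing import List, Sequence, Tuple


def split_train_test(
    samples: Sequence[str], test_ratio: float = 0.2, seed: int = 42
) -> Tuple[List[str], List[str]]:
    """Shuffle samples reproducibly and cut them into (train, test)."""
    if not 0.0 < test_ratio < 1.0:
        raise ValueError("test_ratio must be strictly between 0 and 1")
    shuffled = list(samples)  # copy so the caller's data is left untouched
    random.Random(seed).shuffle(shuffled)  # local RNG: global random state is untouched
    n_test = int(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]


train, test = split_train_test([f"sample {i}" for i in range(10)])
print(len(train), len(test))  # 8 2
```

Splitting whole samples (for example, blank-line-separated blocks) rather than individual lines keeps each sentence's tokens together in the same split.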