pyabsa.utils.absa_utils.absa_utils

Module Contents

Functions

generate_inference_set_for_apc(dataset_path)

Generate inference set for APC dataset. This function only works for APC datasets located in integrated_datasets.

is_similar(→ bool)

Determines if two strings are similar based on the number of common tokens they share.

assemble_aspects(fname[, use_tokenizer])

Preprocesses the input file, groups sentences with similar aspects, and generates samples with the corresponding aspect labels and polarities.

split_aspects(sentence)

Splits a sentence into multiple aspects, each with its own context and polarity.

convert_atepc(fname, use_tokenizer)

Converts the input file to the Aspect Term Extraction and Polarity Classification (ATEPC) format.

convert_apc_set_to_atepc_set(path[, use_tokenizer])

Converts an APC dataset to an ATEPC dataset.

refactor_chinese_dataset(fname, train_fname, test_fname)

Refactors the Chinese dataset by splitting it into train and test sets and converting it into the ATEPC format.

detect_error_in_dataset(dataset)

Detects errors in a given dataset by checking whether sentences with similar aspects have different lengths.

pyabsa.utils.absa_utils.absa_utils.generate_inference_set_for_apc(dataset_path)[source]

Generate inference set for APC dataset. This function only works for APC datasets located in integrated_datasets.

pyabsa.utils.absa_utils.absa_utils.is_similar(s1: str, s2: str) → bool[source]

Determines if two strings are similar based on the number of common tokens they share.

Parameters:
  • s1 (str) – the first string

  • s2 (str) – the second string

Returns:

True if the strings are similar, False otherwise

Return type:

bool
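The token-overlap idea can be sketched as follows. The whitespace tokenization and the 0.8 threshold are illustrative assumptions, not the library's exact choices:

```python
def is_similar_sketch(s1: str, s2: str, threshold: float = 0.8) -> bool:
    # Whitespace tokenization (assumption: pyabsa may tokenize differently).
    t1, t2 = set(s1.split()), set(s2.split())
    if not t1 or not t2:
        return False
    # Similar when the shared tokens cover most of the shorter string.
    return len(t1 & t2) / min(len(t1), len(t2)) >= threshold
```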

pyabsa.utils.absa_utils.absa_utils.assemble_aspects(fname, use_tokenizer=False)[source]

Preprocesses the input file, groups sentences with similar aspects, and generates samples with the corresponding aspect labels and polarities.

Parameters:
  • fname (str) – The filename to be preprocessed

  • use_tokenizer (bool, optional) – Whether to use a tokenizer, defaults to False

Returns:

A list of samples

Return type:

list
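As an illustration of the preprocessing step, a minimal reader for the common APC file layout (three lines per sample: a sentence containing the $T$ aspect placeholder, the aspect term, and the polarity label) might look like this; the layout is an assumption about the input files, not a description of pyabsa's internals:

```python
def read_apc_triples(lines):
    # Drop blank lines, then consume three lines per sample:
    # sentence with the $T$ placeholder, aspect term, polarity label.
    lines = [ln.strip() for ln in lines if ln.strip()]
    samples = []
    for i in range(0, len(lines), 3):
        sentence, aspect, polarity = lines[i], lines[i + 1], lines[i + 2]
        # Substitute the placeholder to recover the full sentence.
        samples.append((sentence.replace("$T$", aspect), aspect, polarity))
    return samples
```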

pyabsa.utils.absa_utils.absa_utils.split_aspects(sentence)[source]

Splits a sentence into multiple aspects, each with its own context and polarity.

Parameters:
  • sentence – input sentence containing multiple aspects

Returns:

A list of tuples, each containing a single aspect with its context and polarity
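The per-aspect expansion can be sketched as below. The real input markup is not shown in this reference, so the inline `[[aspect|polarity]]` annotation used here is purely hypothetical:

```python
import re

# Hypothetical inline annotation: aspects marked as [[aspect|polarity]].
ASPECT_PATTERN = re.compile(r"\[\[(.+?)\|(.+?)\]\]")

def split_aspects_sketch(sentence: str):
    results = []
    for match in ASPECT_PATTERN.finditer(sentence):
        aspect, polarity = match.group(1), match.group(2)
        # Replace the current aspect with the $T$ placeholder and
        # strip the markup from every other aspect annotation.
        context = ASPECT_PATTERN.sub(
            lambda m: "$T$" if m.start() == match.start() else m.group(1),
            sentence,
        )
        results.append((context, aspect, polarity))
    return results
```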

pyabsa.utils.absa_utils.absa_utils.convert_atepc(fname, use_tokenizer)[source]

Converts the input file to the Aspect Term Extraction and Polarity Classification (ATEPC) format.

Parameters:
  • fname – the filename to convert

  • use_tokenizer – whether to use a tokenizer
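ATEPC datasets pair each token with an IOB aspect tag and a polarity label. A minimal sketch of converting one APC-style sample (the `-999` polarity for non-aspect tokens and the whitespace tokenization are illustrative assumptions):

```python
def to_atepc_lines(sentence, aspect, polarity):
    # Split the sentence around the $T$ aspect placeholder.
    left, _, right = sentence.partition("$T$")
    # Non-aspect tokens get the O tag and a dummy polarity (assumed -999).
    lines = [f"{tok} O -999" for tok in left.split()]
    # Aspect tokens get B-ASP/I-ASP tags and the sample's polarity.
    for i, tok in enumerate(aspect.split()):
        tag = "B-ASP" if i == 0 else "I-ASP"
        lines.append(f"{tok} {tag} {polarity}")
    lines += [f"{tok} O -999" for tok in right.split()]
    return lines
```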

pyabsa.utils.absa_utils.absa_utils.convert_apc_set_to_atepc_set(path, use_tokenizer=False)[source]

Converts an APC dataset to an ATEPC dataset.

Parameters:
  • path – path to the dataset

  • use_tokenizer (bool, optional) – whether to use a tokenizer, defaults to False

pyabsa.utils.absa_utils.absa_utils.refactor_chinese_dataset(fname, train_fname, test_fname)[source]

Refactors the Chinese dataset by splitting it into train and test sets and converting it into the ATEPC format.

Parameters:
  • fname – the name of the dataset file

  • train_fname – the name of the output train file

  • test_fname – the name of the output test file
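The train/test split step can be sketched as a simple tail split; the 0.2 test ratio is an assumed illustration, not the library's actual value:

```python
def split_train_test(samples, test_ratio=0.2):
    # Deterministic split: the last test_ratio fraction becomes the test set.
    cut = len(samples) - int(len(samples) * test_ratio)
    return samples[:cut], samples[cut:]
```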

pyabsa.utils.absa_utils.absa_utils.detect_error_in_dataset(dataset)[source]

Detects errors in a given dataset by checking whether sentences with similar aspects have different lengths.

Parameters:
  • dataset – the dataset file name
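A loose sketch of such a length check, assuming the (sentence-with-$T$, aspect, polarity) sample shape used above; grouping by the reconstructed full sentence is an illustrative simplification of the library's similarity-based comparison:

```python
from collections import defaultdict

def find_length_mismatches(samples):
    # Map each reconstructed full sentence to the token lengths of the
    # templates ($T$-sentences) that produced it.
    groups = defaultdict(set)
    for sentence, aspect, _polarity in samples:
        full = sentence.replace("$T$", aspect)
        groups[full].add(len(sentence.split()))
    # If the same full sentence arises from templates of different
    # token lengths, one of its annotations is likely malformed.
    return [full for full, lengths in groups.items() if len(lengths) > 1]
```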