pyabsa

Package Contents

Classes

TaskNameOption

A dictionary subclass that maps task codes to task names.

TaskCodeOption

A class that defines task codes for various tasks.

LabelPaddingOption

A class that defines label padding options.

ModelSaveOption

A class that defines options for saving models.

ProxyAddressOption

A class that defines proxy address options.

DeviceTypeOption

A class that defines device type options.

DatasetItem

Built-in mutable sequence.

DatasetDict

dict() -> new empty dictionary

APCCheckpointManager

ATEPCCheckpointManager

This class manages the checkpoints for Aspect Term Extraction and Polarity Classification.

ASTECheckpointManager

This class manages the checkpoints for Aspect Sentiment Triplet Extraction.

TCCheckpointManager

This class manages the checkpoints for text classification.

TADCheckpointManager

This class manages the checkpoints for text adversarial defense.

RNACCheckpointManager

This class manages the checkpoints for RNA sequence classification.

RNARCheckpointManager

This class manages the checkpoints for RNA sequence regression.

APCDatasetList

The following datasets are for the aspect polarity classification task.

Functions

check_emergency_notification()

Check if there is any emergency notification from PyABSA

make_ABSA_dataset(dataset_name_or_path[, checkpoint])

Make APC and ATEPC datasets for PyABSA, using aspect extractor from PyABSA to automatically build datasets. This method WILL NOT give you the best performance but is quite fast and labor-free.

generate_inference_set_for_apc(dataset_path)

Generate inference set for APC dataset. This function only works for APC datasets located in integrated_datasets.

convert_apc_set_to_atepc_set(path[, use_tokenizer])

Converts APC dataset to ATEPC dataset.

download_all_available_datasets(**kwargs)

Download datasets from GitHub

download_dataset_by_name([task_code, dataset_name])

If downloading all datasets fails, try downloading a dataset by name from Hugging Face.

load_dataset_from_file(fname, config)

Loads a dataset from one or multiple files.

available_checkpoints(→ Union[Dict[str, Any], ...)

Retrieves the available checkpoints for a given task.

download_checkpoint(→ str)

Download a pretrained checkpoint for a given task and language.

meta_load(path, **kwargs)

Load data from a file, which can be a plain text file, JSON file, Excel file, pickle file, NumPy file, torch file, pandas file, etc.

meta_save(data, path, **kwargs)

Save data to a file, which can be a plain text file, JSON file, Excel file, pickle file, NumPy file, torch file, pandas file, etc.

clean()

validate_pyabsa_version()

query_release_notes(**kwargs)

check_pyabsa_update()

check_package_version(min_version[, max_version])

Attributes

__name__

__version__

PyABSAMaterialHostAddress

ABSADatasetList

pyabsa.__name__ = 'pyabsa'[source]
pyabsa.__version__ = '2.4.1.post1'[source]
pyabsa.check_emergency_notification()[source]

Check if there is any emergency notification from PyABSA

class pyabsa.TaskNameOption[source]

Bases: dict

A dictionary subclass that maps task codes to task names.

code2name
get(key)[source]

Get the task name from the task code.

Parameters:
  • key – The task code.

Returns:

The task name.
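As an illustration of the code-to-name lookup described above, here is a minimal stand-in that uses only the standard library. The class name `TaskNames` and the display names in the mapping are assumptions made for this sketch (they are derived from the constant names of TaskCodeOption on this page); PyABSA's real `code2name` mapping may differ.

```python
class TaskNames(dict):
    """Hypothetical stand-in for a dict subclass that maps task codes to names."""

    # Assumed display names; not copied from PyABSA's code2name.
    code2name = {
        "APC": "Aspect Polarity Classification",
        "ATEPC": "Aspect Term Extraction and Polarity Classification",
        "ASTE": "Aspect Sentiment Triplet Extraction",
        "TC": "Text Classification",
    }

    def __init__(self):
        # Populate the dict itself so normal indexing also works.
        super().__init__(self.code2name)

    def get(self, key):
        # Return the task name for a task code, or None for an unknown code.
        return self.code2name.get(key)


names = TaskNames()
print(names.get("APC"))
```

Keeping the mapping on the class and mirroring it into the dict lets both `names.get(code)` and `names[code]` resolve a code in this sketch.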

class pyabsa.TaskCodeOption[source]

A class that defines task codes for various tasks.

Aspect_Polarity_Classification = 'APC'
Aspect_Term_Extraction_and_Classification = 'ATEPC'
Aspect_Sentiment_Triplet_Extraction = 'ASTE'
Sentiment_Analysis = 'TC'
Text_Classification = 'TC'
Text_Adversarial_Defense = 'TAD'
RNASequenceClassification = 'RNAC'
RNASequenceRegression = 'RNAR'
ProteinSequenceRegression = 'PR'
CodeDefectDetection = 'CDD'
Aspect_Category_Opinion_Sentiment_Triplet_Extraction = 'ACOS'
Universal_Sentiment_Analysis = 'USA'
class pyabsa.LabelPaddingOption[source]

A class that defines label padding options.

SENTIMENT_PADDING
LABEL_PADDING
class pyabsa.ModelSaveOption[source]

A class that defines options for saving models.

DO_NOT_SAVE_MODEL = 0
SAVE_MODEL_STATE_DICT = 1
SAVE_FULL_MODEL = 2
SAVE_FINE_TUNED_PLM = 3
class pyabsa.ProxyAddressOption[source]

A class that defines proxy address options.

CN_GITHUB_MIRROR = 'https://gitee.com/'
class pyabsa.DeviceTypeOption[source]

A class that defines device type options.

AUTO = True
CPU = 'cpu'
CUDA = 'cuda'
ALL_CUDA = 'allcuda'
pyabsa.PyABSAMaterialHostAddress = 'https://huggingface.co/spaces/yangheng/PyABSA/'[source]
class pyabsa.DatasetItem(dataset_name, dataset_items=None)[source]

Bases: list

Built-in mutable sequence.

If no argument is given, the constructor creates a new empty list. The argument must be an iterable if specified.

pyabsa.make_ABSA_dataset(dataset_name_or_path, checkpoint='english')[source]

Make APC and ATEPC datasets for PyABSA, using the aspect extractor from PyABSA to build the datasets automatically. This method WILL NOT give you the best performance, but it is fast and labor-free. The names of the dataset files to be processed should end with ‘.raw.ignore’. The files are processed and saved to the same directory, and existing files are overwritten. The data in the dataset files should be plain text, row by row.

For the best performance, use the DPT tool in ABSADatasets to annotate the dataset files manually: https://github.com/yangheng95/ABSADatasets/tree/v1.2/DPT . The tool should be downloaded and run in a browser. Manual annotation is much more time-consuming.

Parameters:
  • dataset_name_or_path – The name or path of the dataset to be processed. If it is a file, only that file is processed. If it is a directory, all files in the directory are processed; findfile is used to locate them.

  • checkpoint – Which checkpoint to use. You can select from {‘multilingual’, ‘english’, ‘chinese’}. Default is ‘english’.
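A minimal sketch of preparing input for this function, assuming `pyabsa` is installed when `build_dataset` is called. It writes plain-text rows to a file whose name ends with ‘.raw.ignore’ and then passes the directory to `make_ABSA_dataset`. The helper names `write_raw_file` and `build_dataset` and the default file stem are hypothetical.

```python
from pathlib import Path


def write_raw_file(directory, rows, stem="reviews"):
    """Write one text per row to '<stem>.raw.ignore' inside `directory`."""
    out_dir = Path(directory)
    out_dir.mkdir(parents=True, exist_ok=True)
    out_file = out_dir / f"{stem}.raw.ignore"
    out_file.write_text("\n".join(rows) + "\n", encoding="utf-8")
    return out_file


def build_dataset(directory, checkpoint="english"):
    """Run PyABSA's automatic annotation on the files in `directory`."""
    # Imported lazily so the file-writing helper works without pyabsa installed.
    from pyabsa import make_ABSA_dataset

    make_ABSA_dataset(dataset_name_or_path=str(directory), checkpoint=checkpoint)
```

Because processed files are saved to the same directory and overwritten if they exist, it may be safer to keep a copy of the raw files elsewhere before calling `build_dataset`.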

pyabsa.generate_inference_set_for_apc(dataset_path)[source]

Generate inference set for APC dataset. This function only works for APC datasets located in integrated_datasets.

pyabsa.convert_apc_set_to_atepc_set(path, use_tokenizer=False)[source]

Converts an APC dataset to an ATEPC dataset.

Parameters:
  • path – Path to the dataset.

  • use_tokenizer – Whether to use a tokenizer.

pyabsa.download_all_available_datasets(**kwargs)[source]

Download datasets from GitHub.

Parameters:
  • kwargs – Other arguments.

pyabsa.download_dataset_by_name(task_code: pyabsa.framework.flag_class.TaskCodeOption | str = TaskCodeOption.Aspect_Polarity_Classification, dataset_name: pyabsa.utils.data_utils.dataset_item.DatasetItem | str = None, **kwargs)[source]

If downloading all datasets fails, try downloading a dataset by name from Hugging Face: https://huggingface.co/spaces/yangheng/PyABSA

Parameters:
  • task_code – The task code, e.g. TaskCodeOption.Aspect_Polarity_Classification.

  • dataset_name – The dataset name, e.g. pyabsa.tasks.AspectPolarityClassification.APCDatasetList.Laptop14.

pyabsa.load_dataset_from_file(fname, config)[source]

Loads a dataset from one or multiple files.

Parameters:
  • fname (str or List[str]) – The name of the file(s) containing the dataset.

  • config (dict) – The configuration dictionary containing the logger (optional) and the maximum number of examples to load (optional).

Returns:

A list of strings containing the loaded dataset.

Raises:

ValueError – If an empty line is found in the dataset.
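The contract above can be sketched with the standard library. This is a hypothetical stand-in, not PyABSA's implementation: the function name `load_lines` and the config key `max_lines` are assumptions introduced only for illustration.

```python
def load_lines(fname, config=None):
    """Load lines from one file or a list of files into a single list of strings."""
    config = config or {}
    max_lines = config.get("max_lines")  # assumed key name for the optional cap
    paths = [fname] if isinstance(fname, str) else list(fname)

    lines = []
    for path in paths:
        with open(path, encoding="utf-8") as f:
            for line_no, line in enumerate(f, start=1):
                # Stop once the optional cap has been reached.
                if max_lines is not None and len(lines) >= max_lines:
                    return lines
                text = line.rstrip("\r\n")
                if not text.strip():
                    raise ValueError(f"Empty line found in {path} at line {line_no}")
                lines.append(text)
    return lines
```

Accepting either a single name or a list mirrors the `str or List[str]` type given for `fname`; lines from several files are simply concatenated in the order given.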

pyabsa.available_checkpoints(task_code: pyabsa.framework.flag_class.TaskCodeOption = None, show_ckpts: bool = False) Dict[str, Any] | Dict[str, Dict[str, Any]][source]

Retrieves the available checkpoints for a given task.

Parameters:
  • task_code – The code of the task. It should be one of the constants in TaskCodeOption (available via from pyabsa import TaskCodeOption), for example:

    TaskCodeOption.Aspect_Polarity_Classification
    TaskCodeOption.Aspect_Term_Extraction_and_Classification
    TaskCodeOption.Sentiment_Analysis
    TaskCodeOption.Text_Classification
    TaskCodeOption.Text_Adversarial_Defense

  • show_ckpts – A flag indicating whether to show detailed information about the checkpoints.

Returns:

A dictionary with the available checkpoints for the specified task. If no task code is provided, a dictionary with all available checkpoints is returned.
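A short usage sketch, assuming `pyabsa` is installed and the checkpoint index is reachable when the function is called. The wrapper name `list_apc_checkpoints` is hypothetical; the keyword arguments follow the signature shown above.

```python
def list_apc_checkpoints(show_details=False):
    """Return the checkpoints available for aspect polarity classification."""
    # Imported lazily so this module can be loaded without pyabsa installed.
    from pyabsa import TaskCodeOption, available_checkpoints

    return available_checkpoints(
        task_code=TaskCodeOption.Aspect_Polarity_Classification,
        show_ckpts=show_details,
    )
```

Calling `available_checkpoints()` with no task code should instead return the dictionary covering all tasks, as described under Returns.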

pyabsa.download_checkpoint(task: str, language: str, checkpoint: dict) str[source]

Download a pretrained checkpoint for a given task and language. The function downloads the checkpoint from a given URL using the requests library and saves it to a temporary directory whose name corresponds to the task and language. If the checkpoint has already been downloaded and saved in that directory, the function simply returns the directory path. Otherwise, it unzips the downloaded checkpoint file, removes the zip file, and returns the directory path of the unzipped checkpoint. If the download is unsuccessful, a ConnectionError is raised.

Parameters:
  • task – A string representing the task to download the checkpoint for (e.g. “sentiment_analysis”).

  • language – A string representing the language to download the checkpoint for (e.g. “english”).

  • checkpoint – A dictionary containing the information about the checkpoint to download.

Returns:

A string representing the path to the downloaded checkpoint.
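The caching flow described for this function can be sketched as follows. This is a hypothetical illustration built on the standard library, not PyABSA's code: the function name `fetch_and_unpack`, the injected `fetch` callable (a stand-in for an HTTP GET), the `cache_root` argument, and the directory naming scheme are all assumptions.

```python
import os
import zipfile


def fetch_and_unpack(task, language, url, fetch, cache_root):
    """Download a zipped checkpoint once, unpack it, and return its directory.

    `fetch(url)` must return the zip archive as bytes or raise on failure.
    """
    target_dir = os.path.join(cache_root, f"{task}_{language}")
    # Reuse a checkpoint that was already downloaded and unpacked.
    if os.path.isdir(target_dir) and os.listdir(target_dir):
        return target_dir

    try:
        payload = fetch(url)
    except Exception as exc:
        raise ConnectionError(f"Failed to download checkpoint from {url}") from exc

    os.makedirs(target_dir, exist_ok=True)
    zip_path = os.path.join(target_dir, "checkpoint.zip")
    with open(zip_path, "wb") as f:
        f.write(payload)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target_dir)
    os.remove(zip_path)  # keep only the unpacked files
    return target_dir
```

With the requests library, `fetch` could be something like `lambda u: requests.get(u, timeout=60).content`; passing it in as a parameter keeps the sketch testable without network access.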

class pyabsa.DatasetDict(*args, **kwargs)[source]

Bases: dict

dict() -> new empty dictionary

dict(mapping) -> new dictionary initialized from a mapping object’s (key, value) pairs

dict(iterable) -> new dictionary initialized as if via:

    d = {}
    for k, v in iterable:
        d[k] = v

dict(**kwargs) -> new dictionary initialized with the name=value pairs in the keyword argument list. For example: dict(one=1, two=2)
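The constructor forms listed for DatasetDict are those of the built-in dict, which it subclasses. The sketch below uses plain dict only; whether DatasetDict adds behavior of its own is not covered here.

```python
# The four construction forms of the built-in dict.
empty = dict()
from_mapping = dict({"one": 1, "two": 2})
from_pairs = dict([("one", 1), ("two", 2)])
from_kwargs = dict(one=1, two=2)

# The last three forms build equal dictionaries.
print(from_mapping == from_pairs == from_kwargs)
```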

class pyabsa.APCCheckpointManager[source]

Bases: CheckpointManager

static get_sentiment_classifier(checkpoint: str | pathlib.Path = None, **kwargs) pyabsa.tasks.AspectPolarityClassification.SentimentClassifier[source]

Returns a pre-trained aspect sentiment classification model.

Parameters:
  • checkpoint (Union[str, Path], optional) – A string specifying the path to a checkpoint or the name of a checkpoint registered in PyABSA. If None, the default checkpoint is used.

  • **kwargs – Additional keyword arguments.

Returns:

A pre-trained aspect sentiment classification model.

Return type:

SentimentClassifier

Example

from pyabsa import APCCheckpointManager

sentiment_classifier = APCCheckpointManager.get_sentiment_classifier()

class pyabsa.ATEPCCheckpointManager[source]

Bases: CheckpointManager

This class manages the checkpoints for Aspect Term Extraction and Polarity Classification.

static get_aspect_extractor(checkpoint: str | pathlib.Path = None, **kwargs) pyabsa.tasks.AspectTermExtraction.AspectExtractor[source]

Get an AspectExtractor object initialized with the given checkpoint for Aspect Term Extraction and Polarity Classification.

Parameters:
  • checkpoint – A string or Path object indicating the path to the checkpoint or a zip file containing the checkpoint. If the checkpoint is not registered in PyABSA, it should be the name of the checkpoint queried from Google Drive.

  • kwargs – Additional keyword arguments to be passed to the function.

Returns:

An AspectExtractor object initialized with the given checkpoint.

class pyabsa.ASTECheckpointManager[source]

Bases: CheckpointManager

This class manages the checkpoints for Aspect Sentiment Triplet Extraction.

static get_aspect_sentiment_triplet_extractor(checkpoint: str | pathlib.Path = None, **kwargs) pyabsa.tasks.AspectSentimentTripletExtraction.AspectSentimentTripletExtractor[source]

Get an AspectSentimentTripletExtractor object initialized with the given checkpoint for Aspect Sentiment Triplet Extraction.

Parameters:
  • checkpoint – A string or Path object indicating the path to the checkpoint or a zip file containing the checkpoint. If the checkpoint is not registered in PyABSA, it should be the name of the checkpoint queried from Google Drive.

  • kwargs – Additional keyword arguments to be passed to the extractor's constructor.

Returns:

An AspectSentimentTripletExtractor object initialized with the given checkpoint.

class pyabsa.TCCheckpointManager[source]

Bases: CheckpointManager

This class manages the checkpoints for text classification.

static get_text_classifier(checkpoint: str | pathlib.Path = None, **kwargs) pyabsa.tasks.TextClassification.TextClassifier[source]

Returns a TextClassifier instance loaded with a pre-trained checkpoint for text classification.

Parameters:
  • checkpoint (Union[str, Path], optional) – The name of a zipped checkpoint file, a path to a checkpoint file, or the name of a checkpoint registered in PyABSA. If None, the latest version of the default checkpoint will be used. Defaults to None.

  • **kwargs – Additional keyword arguments. Not used in this method.

Returns:

A TextClassifier instance loaded with the specified checkpoint.

Return type:

TextClassifier

class pyabsa.TADCheckpointManager[source]

Bases: CheckpointManager

This class manages the checkpoints for text adversarial defense.

get_tad_text_classifier(**kwargs) pyabsa.tasks.TextAdversarialDefense.TADTextClassifier[source]

Return a TADTextClassifier object initialized with the specified checkpoint.

Parameters:

checkpoint (Union[str, Path], optional) – The path to the checkpoint, the name of the zipped checkpoint, or the name of the checkpoint queried from Google Drive. Defaults to None.

Returns:

A TADTextClassifier object initialized with the given checkpoint.

Return type:

TADTextClassifier

class pyabsa.RNACCheckpointManager[source]

Bases: CheckpointManager

This class manages the checkpoints for RNA sequence classification.

static get_rna_classifier(checkpoint: str | pathlib.Path = None, **kwargs) pyabsa.tasks.RNAClassification.RNAClassifier[source]

This method returns an instance of the RNAClassifier class with a parsed checkpoint for RNA sequence classification.

Parameters:
  • checkpoint (Union[str, Path], optional) – The name of the zipped checkpoint or the path to the checkpoint file. If not provided, the default checkpoint will be used. Defaults to None.

  • **kwargs – Additional keyword arguments.

Returns:

An instance of the RNAClassifier class with a parsed checkpoint for RNA sequence classification.

Return type:

RNAClassifier

Raises:

ValueError – If the provided checkpoint is not found.

class pyabsa.RNARCheckpointManager[source]

Bases: CheckpointManager

This class manages the checkpoints for RNA sequence regression.

static get_rna_regressor(checkpoint: str | pathlib.Path = None, **kwargs) pyabsa.tasks.RNARegression.RNARegressor[source]

Loads a pre-trained checkpoint for RNA sequence regression and returns an instance of the RNARegressor class that is ready to make predictions.

Parameters:

checkpoint (Union[str, Path]) – (Optional) The name of a zipped checkpoint file, the path to a checkpoint file, or the name of a checkpoint file that can be found in Google Drive. If checkpoint is not provided, the default checkpoint for RNA sequence regression will be loaded.

Returns:

An instance of the RNARegressor class that has been initialized with the specified checkpoint file.

Return type:

RNARegressor

class pyabsa.APCDatasetList[source]

Bases: list

The following datasets are for the aspect polarity classification task. The datasets are collected from different sources; you can use the id to locate a dataset.

Laptop14
Restaurant14
ARTS_Laptop14
ARTS_Restaurant14
Restaurant15
Restaurant16
ACL_Twitter
MAMS
Television
TShirt
Yelp
Phone
Car
Notebook
Camera
Shampoo
MOOC
MOOC_En
Kaggle
Chinese_Zhang
Chinese
Binary_Polarity_Chinese
Triple_Polarity_Chinese
SemEval2016Task5
Arabic_SemEval2016Task5
Dutch_SemEval2016Task5
Spanish_SemEval2016Task5
Turkish_SemEval2016Task5
Russian_SemEval2016Task5
French_SemEval2016Task5
English_SemEval2016Task5
English
SemEval
Restaurant
Multilingual
pyabsa.meta_load(path, **kwargs)[source]

Load data from a file, which can be a plain text file, JSON file, Excel file, pickle file, NumPy file, torch file, pandas file, etc. File types: txt, json, pickle, npy, pkl, pt, torch, csv, xlsx, xls

Parameters:
  • path (str) – The path to the file.

  • kwargs – Other arguments for the corresponding load function.

Returns:

The loaded data.

pyabsa.meta_save(data, path, **kwargs)[source]

Save data to a file, which can be a plain text file, JSON file, Excel file, pickle file, NumPy file, torch file, pandas file, etc. File types: txt, json, pickle, npy, pkl, pt, torch, csv, xlsx, xls

Parameters:
  • data – The data to be saved.

  • path (str) – The path to the file.

  • kwargs – Other arguments for the corresponding save function.
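Choosing a loader or saver from the file extension, as meta_load and meta_save describe, might look like the sketch below. It is a simplified stand-in that covers only txt, json, and pkl/pickle with the standard library; the names `save_any` and `load_any` are hypothetical, and PyABSA's actual handling of each file type may differ.

```python
import json
import pickle
from pathlib import Path


def save_any(data, path, **kwargs):
    """Save `data` in a format chosen from the file extension of `path`."""
    suffix = Path(path).suffix.lower()
    if suffix == ".txt":
        Path(path).write_text(str(data), encoding="utf-8")
    elif suffix == ".json":
        with open(path, "w", encoding="utf-8") as f:
            json.dump(data, f, **kwargs)
    elif suffix in (".pkl", ".pickle"):
        with open(path, "wb") as f:
            pickle.dump(data, f, **kwargs)
    else:
        raise ValueError(f"Unsupported file type: {suffix}")


def load_any(path, **kwargs):
    """Load data with the reader that matches the file extension of `path`."""
    suffix = Path(path).suffix.lower()
    if suffix == ".txt":
        return Path(path).read_text(encoding="utf-8")
    if suffix == ".json":
        with open(path, encoding="utf-8") as f:
            return json.load(f, **kwargs)
    if suffix in (".pkl", ".pickle"):
        with open(path, "rb") as f:
            return pickle.load(f, **kwargs)
    raise ValueError(f"Unsupported file type: {suffix}")
```

Extra keyword arguments are forwarded to the underlying reader or writer, which matches the role the parameter lists give to `kwargs`.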

pyabsa.clean()[source]
pyabsa.ABSADatasetList[source]
pyabsa.validate_pyabsa_version()[source]
pyabsa.query_release_notes(**kwargs)[source]
pyabsa.check_pyabsa_update()[source]
pyabsa.check_package_version(min_version, max_version=None)[source]