pyabsa
Subpackages
pyabsa.augmentation
pyabsa.framework
pyabsa.framework.checkpoint_class
pyabsa.framework.configuration_class
pyabsa.framework.dataset_class
pyabsa.framework.flag_class
pyabsa.framework.instructor_class
pyabsa.framework.prediction_class
pyabsa.framework.predictor_class
pyabsa.framework.sampler_class
pyabsa.framework.tokenizer_class
pyabsa.framework.trainer_class
pyabsa.networks
pyabsa.tasks
pyabsa.tasks.ABSAInstruction
pyabsa.tasks.AspectPolarityClassification
pyabsa.tasks.AspectPolarityClassification.configuration
pyabsa.tasks.AspectPolarityClassification.dataset_utils
pyabsa.tasks.AspectPolarityClassification.instructor
pyabsa.tasks.AspectPolarityClassification.models
pyabsa.tasks.AspectPolarityClassification.prediction
pyabsa.tasks.AspectPolarityClassification.trainer
pyabsa.tasks.AspectSentimentTripletExtraction
pyabsa.tasks.AspectSentimentTripletExtraction.configuration
pyabsa.tasks.AspectSentimentTripletExtraction.dataset_utils
pyabsa.tasks.AspectSentimentTripletExtraction.instructor
pyabsa.tasks.AspectSentimentTripletExtraction.models
pyabsa.tasks.AspectSentimentTripletExtraction.prediction
pyabsa.tasks.AspectSentimentTripletExtraction.trainer
pyabsa.tasks.AspectTermExtraction
pyabsa.tasks.CodeDefectDetection
pyabsa.tasks.RNAClassification
pyabsa.tasks.RNARegression
pyabsa.tasks.TextAdversarialDefense
pyabsa.tasks.TextClassification
pyabsa.tasks._Archive
pyabsa.tasks.__SubtaskTemplate__
pyabsa.utils
pyabsa.utils.absa_utils
pyabsa.utils.cache_utils
pyabsa.utils.check_utils
pyabsa.utils.data_utils
pyabsa.utils.ensemble_prediction
pyabsa.utils.exception_utils
pyabsa.utils.file_utils
pyabsa.utils.logger
pyabsa.utils.notification_utils
pyabsa.utils.proxy_utils
pyabsa.utils.text_utils
pyabsa.utils.wrappers
pyabsa.utils.pyabsa_utils
Package Contents
Classes
Name | Description
TaskNameOption | A dictionary subclass that maps task codes to task names.
TaskCodeOption | A class that defines task codes for various tasks.
LabelPaddingOption | A class that defines label padding options.
ModelSaveOption | A class that defines options for saving models.
ProxyAddressOption | A class that defines proxy address options.
DeviceTypeOption | A class that defines device type options.
DatasetItem | A list subclass that represents a named dataset.
DatasetDict | A subclass of the built-in dict.
APCCheckpointManager | This class manages the checkpoints for Aspect Polarity Classification.
ATEPCCheckpointManager | This class manages the checkpoints for Aspect Term Extraction and Polarity Classification.
ASTECheckpointManager | This class manages the checkpoints for Aspect Sentiment Triplet Extraction.
TCCheckpointManager | This class manages the checkpoints for text classification.
TADCheckpointManager | This class manages the checkpoints for text adversarial defense.
RNACCheckpointManager | This class manages the checkpoints for RNA sequence classification.
RNARCheckpointManager | This class manages the checkpoints for RNA sequence regression.
APCDatasetList | Datasets for the aspect polarity classification task.
Functions
Name | Description
check_emergency_notification() | Check if there is any emergency notification from PyABSA.
make_ABSA_dataset() | Make APC and ATEPC datasets for PyABSA, using an aspect extractor from PyABSA to build the datasets automatically.
generate_inference_set_for_apc() | Generate an inference set for an APC dataset. This function only works for APC datasets located in integrated_datasets.
convert_apc_set_to_atepc_set() | Converts an APC dataset to an ATEPC dataset.
download_all_available_datasets() | Download datasets from GitHub.
download_dataset_by_name() | Download a dataset by name from Huggingface; use it if downloading all datasets fails.
load_dataset_from_file() | Loads a dataset from one or multiple files.
available_checkpoints() | Retrieves the available checkpoints for a given task.
download_checkpoint() | Download a pretrained checkpoint for a given task and language.
meta_load() | Load data from a file, which can be a plain text, JSON, Excel, pickle, NumPy, torch, or pandas file.
meta_save() | Save data to a file, which can be a plain text, JSON, Excel, pickle, NumPy, torch, or pandas file.
- pyabsa.check_emergency_notification()[source]
Check if there is any emergency notification from PyABSA
- class pyabsa.TaskNameOption[source]
Bases:
dict
A dictionary subclass that maps task codes to task names.
- code2name
- class pyabsa.TaskCodeOption[source]
A class that defines task codes for various tasks.
- Aspect_Polarity_Classification = 'APC'
- Aspect_Term_Extraction_and_Classification = 'ATEPC'
- Aspect_Sentiment_Triplet_Extraction = 'ASTE'
- Sentiment_Analysis = 'TC'
- Text_Classification = 'TC'
- Text_Adversarial_Defense = 'TAD'
- RNASequenceClassification = 'RNAC'
- RNASequenceRegression = 'RNAR'
- ProteinSequenceRegression = 'PR'
- CodeDefectDetection = 'CDD'
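TaskCodeOption is a plain class of string constants, so Sentiment_Analysis and Text_Classification are aliases for the same code, 'TC'. The sketch below avoids importing pyabsa. It uses a local stand-in class that copies the constants listed above and groups the constant names by code. The helper names_by_code is hypothetical and is not part of PyABSA.

```python
from collections import defaultdict


# Local stand-in that copies the TaskCodeOption constants listed above.
class TaskCodeOption:
    Aspect_Polarity_Classification = "APC"
    Aspect_Term_Extraction_and_Classification = "ATEPC"
    Aspect_Sentiment_Triplet_Extraction = "ASTE"
    Sentiment_Analysis = "TC"
    Text_Classification = "TC"
    Text_Adversarial_Defense = "TAD"
    RNASequenceClassification = "RNAC"
    RNASequenceRegression = "RNAR"
    ProteinSequenceRegression = "PR"
    CodeDefectDetection = "CDD"


def names_by_code(option_cls):
    """Group the public constant names of option_cls by their code value (hypothetical helper)."""
    groups = defaultdict(list)
    for name, value in vars(option_cls).items():
        if not name.startswith("_"):  # skip dunder attributes such as __module__
            groups[value].append(name)
    return dict(groups)


codes = names_by_code(TaskCodeOption)
print(codes["TC"])  # ['Sentiment_Analysis', 'Text_Classification']
```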
- class pyabsa.LabelPaddingOption[source]
A class that defines label padding options.
- SENTIMENT_PADDING
- LABEL_PADDING
- class pyabsa.ModelSaveOption[source]
A class that defines options for saving models.
- DO_NOT_SAVE_MODEL = 0
- SAVE_MODEL_STATE_DICT = 1
- SAVE_FULL_MODEL = 2
- SAVE_FINE_TUNED_PLM = 3
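The save options are small integers, so a configuration usually stores only the number. The sketch below shows one way to validate such a value before training. It uses a hypothetical IntEnum mirror of the constants above (PyABSA defines them as plain class attributes), and parse_save_option is likewise hypothetical.

```python
from enum import IntEnum


# Hypothetical IntEnum mirror of the ModelSaveOption constants listed above.
class ModelSaveOption(IntEnum):
    DO_NOT_SAVE_MODEL = 0
    SAVE_MODEL_STATE_DICT = 1
    SAVE_FULL_MODEL = 2
    SAVE_FINE_TUNED_PLM = 3


def parse_save_option(value):
    """Validate a raw config value and return the matching option (hypothetical helper)."""
    try:
        return ModelSaveOption(value)
    except ValueError:
        valid = ", ".join(f"{o.value} ({o.name})" for o in ModelSaveOption)
        raise ValueError(f"invalid save option {value!r}; expected one of: {valid}") from None


option = parse_save_option(1)
print(option.name)  # SAVE_MODEL_STATE_DICT
print(option > ModelSaveOption.DO_NOT_SAVE_MODEL)  # True: some artifact will be saved
```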
- class pyabsa.ProxyAddressOption[source]
A class that defines proxy address options.
- CN_GITHUB_MIRROR = 'https://gitee.com/'
- class pyabsa.DeviceTypeOption[source]
A class that defines device type options.
- AUTO = True
- CPU = 'cpu'
- CUDA = 'cuda'
- ALL_CUDA = 'allcuda'
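AUTO is the boolean True rather than a device string, so it has to be resolved to a concrete device at some point. The sketch below is a hypothetical resolver. To keep it runnable without torch, it takes CUDA availability as an argument. It also assumes that 'allcuda' is passed through unchanged for a caller that interprets it as "use every visible GPU".

```python
# Stand-in constants copied from DeviceTypeOption above.
AUTO = True
CPU = "cpu"
CUDA = "cuda"
ALL_CUDA = "allcuda"


def resolve_device(option, cuda_available):
    """Map a DeviceTypeOption value to a concrete device string (hypothetical helper).

    cuda_available is passed in explicitly so the sketch runs without torch;
    in practice it would typically come from torch.cuda.is_available().
    """
    if option is AUTO:
        return CUDA if cuda_available else CPU
    if option in (CUDA, ALL_CUDA) and not cuda_available:
        raise RuntimeError(f"device option {option!r} requires CUDA")
    if option in (CPU, CUDA, ALL_CUDA):
        return option  # ALL_CUDA is left for the caller to interpret
    raise ValueError(f"unknown device option: {option!r}")


print(resolve_device(AUTO, cuda_available=False))  # cpu
```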
- class pyabsa.DatasetItem(dataset_name, dataset_items=None)[source]
Bases:
list
A list subclass that represents a dataset. dataset_name names the dataset, and dataset_items, if given, lists the entries the dataset is built from.
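Because DatasetItem subclasses list, one object can carry both a dataset name and the entries the dataset is built from. The stand-in below has the same constructor signature. It assumes, for illustration only, that an omitted dataset_items falls back to [dataset_name]; this page does not document that behavior.

```python
class DatasetItem(list):
    """Hypothetical stand-in with the same constructor signature as pyabsa.DatasetItem."""

    def __init__(self, dataset_name, dataset_items=None):
        super().__init__()
        self.dataset_name = dataset_name
        # Assumption: with no items given, the dataset name itself is the only item.
        if dataset_items is None:
            dataset_items = [dataset_name]
        elif isinstance(dataset_items, str):
            dataset_items = [dataset_items]
        self.extend(dataset_items)


single = DatasetItem("Laptop14")
combined = DatasetItem("Laptop14+Restaurant14", ["Laptop14", "Restaurant14"])
```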
- pyabsa.make_ABSA_dataset(dataset_name_or_path, checkpoint='english')[source]
Makes APC and ATEPC datasets for PyABSA, using an aspect extractor from PyABSA to build the datasets automatically. This method is fast and labor-free, but it will not give you the best performance. The names of the dataset files to be processed must end with ‘.raw.ignore’, and each file must contain plain text, one sample per line. The processed files are saved to the same directory, and existing files with the same names are overwritten.
For the best performance, annotate the dataset files manually with the DPT tool in ABSADatasets: https://github.com/yangheng95/ABSADatasets/tree/v2.0/DPT. The tool must be downloaded and run in a web browser, and manual annotation is much more time-consuming.
- Parameters:
dataset_name_or_path (str) – The name or path of the dataset to be processed. If it is a directory, all files in the directory are processed; if it is a file, only that file is processed. If it is a directory name rather than a full path, findfile is used to locate the files in that directory.
checkpoint (str, optional) – Which checkpoint to use. You can select from {‘multilingual’, ‘english’, ‘chinese’}. Default is ‘english’.
- Returns:
None
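The input rules above (a directory or a single file, with names ending in ‘.raw.ignore’) can be sketched with pathlib. find_raw_files is a hypothetical helper. It does not reproduce the findfile-based lookup by name, and it assumes that directories are searched recursively.

```python
import tempfile
from pathlib import Path


def find_raw_files(dataset_name_or_path):
    """Return the '.raw.ignore' files a make_ABSA_dataset-style tool would process.

    Hypothetical helper: a directory yields every matching file inside it
    (searched recursively, an assumption), and a single file is returned
    only if its name ends with '.raw.ignore'.
    """
    path = Path(dataset_name_or_path)
    if path.is_dir():
        return sorted(p for p in path.rglob("*.raw.ignore") if p.is_file())
    if path.is_file() and path.name.endswith(".raw.ignore"):
        return [path]
    return []


with tempfile.TemporaryDirectory() as tmp:
    # Plain text, one sample per line, as described above.
    Path(tmp, "reviews.raw.ignore").write_text(
        "The battery life is great.\nThe screen is dim.\n", encoding="utf-8"
    )
    Path(tmp, "notes.txt").write_text("not a dataset file\n", encoding="utf-8")
    found = [p.name for p in find_raw_files(tmp)]
```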
- pyabsa.generate_inference_set_for_apc(dataset_path)[source]
Generates an inference set from an APC dataset. This function only works for APC datasets located in integrated_datasets.
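The sketch below turns APC training samples into inference lines. The file formats it uses are assumptions that this page does not confirm. It assumes each training sample is three lines (a sentence with a "$T$" placeholder for the aspect, the aspect, and the polarity). It also assumes an inference line wraps the aspect in [B-ASP]...[E-ASP] and appends the label after "$LABEL$". Both helpers are hypothetical.

```python
def read_apc_triples(lines):
    """Group a flat list of APC lines into (sentence, aspect, polarity) triples."""
    lines = [line.strip() for line in lines if line.strip()]
    if len(lines) % 3:
        raise ValueError("APC data must contain a multiple of three non-empty lines")
    return [tuple(lines[i:i + 3]) for i in range(0, len(lines), 3)]


def apc_triple_to_inference_line(sentence, aspect, polarity):
    """Convert one APC sample into a single inference line (assumed format)."""
    if "$T$" not in sentence:
        raise ValueError("sentence has no $T$ placeholder")
    # Replace only the first placeholder with the marked aspect.
    marked = sentence.replace("$T$", f"[B-ASP]{aspect}[E-ASP]", 1)
    return f"{marked} $LABEL$ {polarity}"


raw = ["The $T$ is great .", "battery life", "Positive",
       "The $T$ is dim .", "screen", "Negative"]
inference = [apc_triple_to_inference_line(*t) for t in read_apc_triples(raw)]
```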
- pyabsa.convert_apc_set_to_atepc_set(path, use_tokenizer=False)[source]
Converts an APC dataset to an ATEPC dataset.
- Parameters:
path – The path to the dataset.
use_tokenizer – Whether to use a tokenizer.
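An APC sample labels one aspect per sentence. An ATEPC sample instead labels every token, so the aspect span can be extracted. The sketch below shows that idea with standard BIO tags and is not PyABSA's conversion code. It makes three assumptions: whitespace tokenization (roughly what use_tokenizer=False suggests), a "$T$" placeholder that is its own token, and None as the polarity of non-aspect tokens.

```python
def apc_to_bio(sentence, aspect, polarity):
    """Tag one APC sample with BIO aspect labels (hypothetical sketch).

    Returns (token, tag, polarity) triples: aspect tokens get B-ASP/I-ASP and
    the sample's polarity, while other tokens get "O" and None.
    """
    rows = []
    for token in sentence.split():
        if token == "$T$":
            for i, aspect_token in enumerate(aspect.split()):
                tag = "B-ASP" if i == 0 else "I-ASP"
                rows.append((aspect_token, tag, polarity))
        else:
            rows.append((token, "O", None))
    return rows


rows = apc_to_bio("The $T$ is great .", "battery life", "Positive")
```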
- pyabsa.download_all_available_datasets(**kwargs)[source]
Downloads all available datasets from GitHub.
- Parameters:
kwargs – Other arguments.
- pyabsa.download_dataset_by_name(task_code: Union[pyabsa.framework.flag_class.TaskCodeOption, str] = TaskCodeOption.Aspect_Polarity_Classification, dataset_name: Union[pyabsa.utils.data_utils.dataset_item.DatasetItem, str] = None, **kwargs)[source]
Downloads a dataset by name from Huggingface (https://huggingface.co/spaces/yangheng/PyABSA). Use this function if downloading all datasets fails.
- Parameters:
task_code – The task code, e.g. TaskCodeOption.Aspect_Polarity_Classification.
dataset_name – The dataset name, e.g. pyabsa.tasks.AspectPolarityClassification.APCDatasetList.Laptop14.
- pyabsa.load_dataset_from_file(fname, config)[source]
Loads a dataset from one or multiple files.
- Parameters:
fname (str or List[str]) – The name of the file(s) containing the dataset.
config (dict) – The configuration dictionary containing the logger (optional) and the maximum number of data to load (optional).
- Returns:
A list of strings containing the loaded dataset.
- Raises:
ValueError – If an empty line is found in the dataset.
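The documented contract has four parts: one or several files, an optional logger, an optional cap on the number of samples, and a ValueError on empty lines. The sketch below is a hypothetical re-implementation of that contract, not PyABSA's code. In particular, the config key names "logger" and "max_data" are assumptions.

```python
import tempfile
from pathlib import Path


def load_lines(fname, config):
    """Hypothetical re-implementation of the load_dataset_from_file() contract.

    fname may be one path or a list of paths. The config keys "logger" and
    "max_data" are assumed names for the optional logger and sample cap.
    """
    fnames = [fname] if isinstance(fname, (str, Path)) else list(fname)
    logger = config.get("logger")
    max_data = config.get("max_data")
    data = []
    for name in fnames:
        if logger is not None:
            logger.info("Loading %s", name)
        with open(name, encoding="utf-8") as f:
            for lineno, line in enumerate(f, start=1):
                if not line.strip():
                    raise ValueError(f"empty line at {name}:{lineno}")
                data.append(line.rstrip("\n"))
    return data if max_data is None else data[:max_data]


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp, "train.txt")
    path.write_text("first sample\nsecond sample\nthird sample\n", encoding="utf-8")
    loaded = load_lines(str(path), {"max_data": 2})
```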
- pyabsa.available_checkpoints(task_code: pyabsa.framework.flag_class.TaskCodeOption = None, show_ckpts: bool = False) Union[Dict[str, Any], Dict[str, Dict[str, Any]]] [source]
Retrieves the available checkpoints for a given task.
- Parameters:
task_code – The code of the task. It should be one of the constants in TaskCodeOption (from pyabsa import TaskCodeOption), e.g. TaskCodeOption.Aspect_Polarity_Classification, TaskCodeOption.Aspect_Term_Extraction_and_Classification, TaskCodeOption.Sentiment_Analysis, TaskCodeOption.Text_Classification, or TaskCodeOption.Text_Adversarial_Defense.
show_ckpts – Whether to show detailed information about all available checkpoints.
- Returns:
A dictionary with the available checkpoints for the specified task. If no task code is provided, a dictionary with all available checkpoints is returned.
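The return type is a union because the result's shape depends on task_code. With a task code, the function returns that task's checkpoints; without one, it returns checkpoints for every task. The sketch below mirrors that contract over a hypothetical in-memory registry. The registry layout and its entries are placeholders, not real PyABSA checkpoint data.

```python
from typing import Any, Dict, Optional

# Hypothetical registry layout: {task_code: {checkpoint_name: metadata}}.
REGISTRY: Dict[str, Dict[str, Any]] = {
    "APC": {"english": {"description": "placeholder entry"}},
    "ATEPC": {"multilingual": {"description": "placeholder entry"}},
}


def checkpoints_for(task_code: Optional[str] = None, show_ckpts: bool = False):
    """Mirror the documented return contract of available_checkpoints() (hypothetical)."""
    result = REGISTRY if task_code is None else REGISTRY.get(task_code, {})
    if show_ckpts:
        for name, meta in sorted(result.items()):
            print(name, meta)
    return result
```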
- pyabsa.download_checkpoint(task: str, language: str, checkpoint: dict) str [source]
Downloads a pretrained checkpoint for a given task and language. The checkpoint is fetched from its URL with the requests library and saved to a temporary directory whose name corresponds to the task and language. If the checkpoint has already been downloaded to that directory, the function returns the directory path immediately. Otherwise, it unzips the downloaded checkpoint file, removes the zip file, and returns the path of the unzipped checkpoint. If the download fails, a ConnectionError is raised.
- Parameters:
task – A string representing the task to download the checkpoint for (e.g. “sentiment_analysis”).
language – A string representing the language to download the checkpoint for (e.g. “english”).
checkpoint – A dictionary containing the information about the checkpoint to download.
- Returns:
A string representing the path to the downloaded checkpoint.
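The cache-then-unzip steps described above can be sketched with the standard library alone. The network download is left out, so the sketch starts from a zip file that already exists. unpack_checkpoint is a hypothetical helper, and the "{task}_{language}" directory naming is an assumption.

```python
import os
import tempfile
import zipfile


def unpack_checkpoint(zip_path, cache_root, task, language):
    """Cache-then-unzip step of a download_checkpoint()-style function (hypothetical).

    The target directory name is built from task and language (assumed naming).
    An existing directory is returned immediately; otherwise the zip is
    extracted into it and then deleted.
    """
    target = os.path.join(cache_root, f"{task}_{language}")
    if os.path.isdir(target):
        return target  # already downloaded and unpacked
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(target)
    os.remove(zip_path)
    return target


with tempfile.TemporaryDirectory() as root:
    zip_path = os.path.join(root, "ckpt.zip")
    with zipfile.ZipFile(zip_path, "w") as zf:
        zf.writestr("config.json", "{}")
    first = unpack_checkpoint(zip_path, root, "sentiment_analysis", "english")
    extracted = sorted(os.listdir(first))
    zip_removed = not os.path.exists(zip_path)
    # The second call hits the cache, so the missing zip file is never opened.
    second = unpack_checkpoint(zip_path, root, "sentiment_analysis", "english")
```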
- class pyabsa.DatasetDict(*args, **kwargs)[source]
Bases:
dict
dict() -> new empty dictionary
dict(mapping) -> new dictionary initialized from a mapping object’s (key, value) pairs
dict(iterable) -> new dictionary initialized as if via: d = {}; for k, v in iterable: d[k] = v
dict(**kwargs) -> new dictionary initialized with the name=value pairs in the keyword argument list. For example: dict(one=1, two=2)
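DatasetDict inherits the dict constructor, so each of the forms above should apply to it. The minimal stand-in below assumes the constructor is not overridden and shows that the mapping, iterable, and keyword forms produce equal objects. The "train"/"test" keys are illustrative only.

```python
class DatasetDict(dict):
    """Stand-in: like pyabsa.DatasetDict, a subclass of dict (assumed not to override __init__)."""


empty = DatasetDict()
from_mapping = DatasetDict({"train": ["a.txt"], "test": ["b.txt"]})
from_pairs = DatasetDict([("train", ["a.txt"]), ("test", ["b.txt"])])
from_kwargs = DatasetDict(train=["a.txt"], test=["b.txt"])
```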
- class pyabsa.APCCheckpointManager[source]
Bases:
CheckpointManager
This class manages the checkpoints for Aspect Polarity Classification.
- static get_sentiment_classifier(checkpoint: Union[str, pathlib.Path] = None, **kwargs) pyabsa.tasks.AspectPolarityClassification.SentimentClassifier [source]
Returns a pre-trained aspect sentiment classification model.
- Parameters:
checkpoint (Union[str, Path], optional) – A string specifying the path to a checkpoint or the name of a checkpoint registered in PyABSA. If None, the default checkpoint is used.
**kwargs – Additional keyword arguments.
- Returns:
A pre-trained aspect sentiment classification model.
- Return type:
SentimentClassifier
Example
from pyabsa import APCCheckpointManager
sentiment_classifier = APCCheckpointManager.get_sentiment_classifier()
- class pyabsa.ATEPCCheckpointManager[source]
Bases:
CheckpointManager
This class manages the checkpoints for Aspect Term Extraction and Polarity Classification.
- static get_aspect_extractor(checkpoint: Union[str, pathlib.Path] = None, **kwargs) pyabsa.tasks.AspectTermExtraction.AspectExtractor [source]
Get an AspectExtractor object initialized with the given checkpoint for Aspect Term Extraction and Polarity Classification.
- Parameters:
checkpoint – A string or Path object indicating the path to the checkpoint or a zip file containing the checkpoint. If the checkpoint is not registered in PyABSA, it should be the name of the checkpoint queried from Google Drive.
kwargs – Additional keyword arguments to be passed to the function.
- Returns:
An AspectExtractor object initialized with the given checkpoint.
- class pyabsa.ASTECheckpointManager[source]
Bases:
CheckpointManager
This class manages the checkpoints for Aspect Sentiment Triplet Extraction.
- static get_aspect_sentiment_triplet_extractor(checkpoint: Union[str, pathlib.Path] = None, **kwargs) pyabsa.tasks.AspectSentimentTripletExtraction.AspectSentimentTripletExtractor [source]
Get an AspectSentimentTripletExtractor object initialized with the given checkpoint for Aspect Sentiment Triplet Extraction.
- Parameters:
checkpoint – A string or Path object indicating the path to the checkpoint or a zip file containing the checkpoint. If the checkpoint is not registered in PyABSA, it should be the name of the checkpoint queried from Google Drive.
kwargs – Additional keyword arguments to be passed to the AspectSentimentTripletExtractor constructor.
- Returns:
An AspectSentimentTripletExtractor object initialized with the given checkpoint.
- class pyabsa.TCCheckpointManager[source]
Bases:
CheckpointManager
This class manages the checkpoints for text classification.
- static get_text_classifier(checkpoint: Union[str, pathlib.Path] = None, **kwargs) pyabsa.tasks.TextClassification.TextClassifier [source]
Returns a TextClassifier instance loaded with a pre-trained checkpoint for text classification.
- Parameters:
checkpoint (Union[str, Path], optional) – The name of a zipped checkpoint file, a path to a checkpoint file, or the name of a checkpoint registered in PyABSA. If None, the latest version of the default checkpoint will be used. Defaults to None.
**kwargs – Additional keyword arguments. Not used in this method.
- Returns:
A TextClassifier instance loaded with the specified checkpoint.
- Return type:
TextClassifier
- class pyabsa.TADCheckpointManager[source]
Bases:
CheckpointManager
This class manages the checkpoints for text adversarial defense.
- get_tad_text_classifier(**kwargs) pyabsa.tasks.TextAdversarialDefense.TADTextClassifier [source]
Return a TADTextClassifier object initialized with the specified checkpoint.
- Parameters:
checkpoint (Union[str, Path], optional) – The path to the checkpoint, the name of the zipped checkpoint, or the name of the checkpoint queried from Google Drive. Defaults to None.
- Returns:
A TADTextClassifier object initialized with the given checkpoint.
- Return type:
TADTextClassifier
- class pyabsa.RNACCheckpointManager[source]
Bases:
CheckpointManager
This class manages the checkpoints for RNA sequence classification.
- static get_rna_classifier(checkpoint: Union[str, pathlib.Path] = None, **kwargs) pyabsa.tasks.RNAClassification.RNAClassifier [source]
Returns an RNAClassifier instance loaded with the given checkpoint for RNA sequence classification.
- Parameters:
checkpoint (Union[str, Path], optional) – The name of the zipped checkpoint or the path to the checkpoint file. If not provided, the default checkpoint will be used. Defaults to None.
**kwargs – Additional keyword arguments.
- Returns:
An RNAClassifier instance loaded with the given checkpoint for RNA sequence classification.
- Return type:
RNAClassifier
- Raises:
ValueError – If the provided checkpoint is not found.
- class pyabsa.RNARCheckpointManager[source]
Bases:
CheckpointManager
This class manages the checkpoints for RNA sequence regression.
- static get_rna_regressor(checkpoint: Union[str, pathlib.Path] = None, **kwargs) pyabsa.tasks.RNARegression.RNARegressor [source]
Loads a pre-trained checkpoint for RNA sequence regression and returns an instance of the RNARegressor class that is ready to make predictions.
- Parameters:
checkpoint (Union[str, Path]) – (Optional) The name of a zipped checkpoint file, the path to a checkpoint file, or the name of a checkpoint file that can be found in Google Drive. If checkpoint is not provided, the default checkpoint for RNA sequence regression will be loaded.
- Returns:
An instance of the RNARegressor class that has been initialized with the specified checkpoint file.
- Return type:
RNARegressor
- class pyabsa.APCDatasetList[source]
Bases:
list
The following datasets are for the aspect polarity classification task. They are collected from different sources; use a dataset's id to locate it.
- Laptop14
- Restaurant14
- ARTS_Laptop14
- ARTS_Restaurant14
- Restaurant15
- Restaurant16
- ACL_Twitter
- MAMS
- Television
- TShirt
- Yelp
- Phone
- Car
- Notebook
- Camera
- Shampoo
- MOOC
- MOOC_En
- Kaggle
- Chinese_Zhang
- Chinese
- Binary_Polarity_Chinese
- Triple_Polarity_Chinese
- SemEval2016Task5
- Arabic_SemEval2016Task5
- Dutch_SemEval2016Task5
- Spanish_SemEval2016Task5
- Turkish_SemEval2016Task5
- Russian_SemEval2016Task5
- French_SemEval2016Task5
- English_SemEval2016Task5
- English
- SemEval
- Restaurant
- Multilingual
- pyabsa.meta_load(path, **kwargs)[source]
Loads data from a file, which can be a plain text, JSON, Excel, pickle, NumPy, torch, or pandas file. Supported file types: txt, json, pickle, npy, pkl, pt, torch, csv, xlsx, xls.
- Parameters:
path (str) – The path to the file.
kwargs – Other arguments for the corresponding load function.
- Returns:
The loaded data.
- pyabsa.meta_save(data, path, **kwargs)[source]
Saves data to a file, which can be a plain text, JSON, Excel, pickle, NumPy, torch, or pandas file. Supported file types: txt, json, pickle, npy, pkl, pt, torch, csv, xlsx, xls.
- Parameters:
data – The data to be saved.
path (str) – The path to the file.
kwargs – Other arguments for the corresponding save function.
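meta_load() and meta_save() choose a reader or writer from the file type. The sketch below shows that suffix-based dispatch for the types the standard library can handle: txt, json, pkl/pickle, and csv. It is a hypothetical illustration, not PyABSA's implementation. The NumPy, torch, and Excel formats are omitted so that the sketch has no third-party dependencies.

```python
import csv
import json
import pickle
import tempfile
from pathlib import Path


def save_by_suffix(data, path):
    """Hypothetical stdlib-only dispatch in the spirit of meta_save()."""
    suffix = Path(path).suffix.lower()
    if suffix == ".json":
        Path(path).write_text(json.dumps(data), encoding="utf-8")
    elif suffix in (".pkl", ".pickle"):
        Path(path).write_bytes(pickle.dumps(data))
    elif suffix == ".txt":
        Path(path).write_text("\n".join(map(str, data)) + "\n", encoding="utf-8")
    elif suffix == ".csv":
        with open(path, "w", newline="", encoding="utf-8") as f:
            csv.writer(f).writerows(data)
    else:
        raise ValueError(f"unsupported file type: {suffix}")


def load_by_suffix(path):
    """Counterpart in the spirit of meta_load(); numpy, torch and Excel formats are omitted."""
    suffix = Path(path).suffix.lower()
    if suffix == ".json":
        return json.loads(Path(path).read_text(encoding="utf-8"))
    if suffix in (".pkl", ".pickle"):
        return pickle.loads(Path(path).read_bytes())
    if suffix == ".txt":
        return Path(path).read_text(encoding="utf-8").splitlines()
    if suffix == ".csv":
        with open(path, newline="", encoding="utf-8") as f:
            return list(csv.reader(f))
    raise ValueError(f"unsupported file type: {suffix}")


with tempfile.TemporaryDirectory() as tmp:
    record = {"sentence": "The screen is dim .", "label": "Negative"}
    save_by_suffix(record, Path(tmp, "sample.json"))
    round_trip = load_by_suffix(Path(tmp, "sample.json"))
    save_by_suffix([["a", "1"], ["b", "2"]], Path(tmp, "rows.csv"))
    rows = load_by_suffix(Path(tmp, "rows.csv"))
```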