Configuration Guide
PyABSA provides a flexible configuration system that allows you to customize every aspect of model training, evaluation, and inference. This guide explains how to work with configurations effectively.
Understanding Configuration Objects
Configuration objects in PyABSA are Python namespace objects that contain all the parameters needed for a specific task. Each one comes with default values that are a sensible starting point, and any parameter can be overridden for your use case.
Checking Available Attributes
To see what attributes are available in a configuration object, you can use:
# Look up an attribute; returns None if it is not set
config.get('attribute_name', None)
# Print all attributes (for debugging)
print(vars(config))
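Because configurations behave like Python namespaces, the built-in `getattr` and `vars` work on them as well. The minimal sketch below uses a plain `argparse.Namespace` as a stand-in for a PyABSA configuration, with illustrative values. This is an assumption: PyABSA's own class may store its parameters differently internally, so `vars()` output on a real config can look different.

```python
from argparse import Namespace

# Stand-in for a PyABSA configuration object (illustrative values only)
config = Namespace(learning_rate=2e-5, max_seq_len=80, seed=42)

# getattr with a default returns None instead of raising AttributeError
missing = getattr(config, "attribute_name", None)
present = getattr(config, "max_seq_len", None)

# vars() exposes the underlying attribute dictionary; sort it for readable output
for name, value in sorted(vars(config).items()):
    print(f"{name} = {value}")
```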
Setting and Getting Values
Configuration attributes can be accessed and modified like regular Python object attributes:
# Set a value
config.learning_rate = 1e-5
# Get a value
print(config.learning_rate)
# Set any custom attribute
config.my_custom_parameter = "custom_value"
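Because any attribute can be set, a misspelled name such as `config.learning_rat = 1e-5` silently creates a new attribute and leaves `learning_rate` at its default. One way to guard against this is to apply overrides through a helper that rejects unknown names. The `apply_overrides` function below is hypothetical, not part of PyABSA, and is shown against an `argparse.Namespace` stand-in; on a real PyABSA configuration, confirm that `hasattr` reports existing parameters as expected before relying on it.

```python
from argparse import Namespace


def apply_overrides(config, overrides, allow_new=()):
    """Set each key in `overrides` on `config`.

    Raises ValueError for names the config does not already define,
    unless they are listed in `allow_new`. Hypothetical helper, not part of PyABSA.
    """
    unknown = sorted(k for k in overrides if not hasattr(config, k) and k not in allow_new)
    if unknown:
        raise ValueError(f"Unknown config parameter(s): {', '.join(unknown)}")
    for key, value in overrides.items():
        setattr(config, key, value)
    return config


config = Namespace(learning_rate=2e-5, num_epoch=10)  # stand-in, illustrative values
apply_overrides(config, {"learning_rate": 1e-5})
# Deliberately new attributes must be allowed explicitly
apply_overrides(config, {"my_custom_parameter": "custom_value"},
                allow_new={"my_custom_parameter"})

try:
    apply_overrides(config, {"learning_rat": 1e-5})  # misspelled name is rejected
except ValueError as err:
    print(err)  # Unknown config parameter(s): learning_rat
```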
Getting Configuration Objects
Each task in PyABSA provides a configuration manager with pre-defined templates.
Aspect Polarity Classification (APC)
from pyabsa import AspectPolarityClassification as APC
# Get default configurations for different languages
config_english = APC.APCConfigManager.get_apc_config_english()
config_chinese = APC.APCConfigManager.get_apc_config_chinese()
config_multilingual = APC.APCConfigManager.get_apc_config_multilingual()
Text Classification
from pyabsa import TextClassification as TC
# Get default text classification configuration
config = TC.TCConfigManager.get_tc_config_english()
Aspect Term Extraction (ATE)
from pyabsa import AspectTermExtraction as ATE
# Get default ATE configuration
config = ATE.ATEConfigManager.get_ate_config_english()
Common Configuration Parameters
Here are the most frequently used configuration parameters across different tasks:
Model and Architecture
# Choose the model architecture
config.model = APC.APCModelList.FAST_LSA_T_V2
# Set the pre-trained backbone model
config.pretrained_bert = 'microsoft/deberta-v3-base'
# Maximum sequence length
config.max_seq_len = 80
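Conceptually, `max_seq_len` fixes the number of tokens per example: longer inputs are cut off and shorter ones are padded, so a value that is too small discards context while a larger one costs memory and time. The hypothetical `pad_or_truncate` helper below illustrates the idea with made-up token ids; PyABSA's tokenization pipeline handles this internally and may truncate or pad differently.

```python
def pad_or_truncate(token_ids, max_seq_len, pad_id=0):
    """Return exactly max_seq_len ids: truncated on the right if too long,
    right-padded with pad_id if too short. Illustrative helper only."""
    clipped = list(token_ids[:max_seq_len])
    return clipped + [pad_id] * (max_seq_len - len(clipped))


print(pad_or_truncate([101, 2023, 2003, 102], max_seq_len=6))  # [101, 2023, 2003, 102, 0, 0]
print(pad_or_truncate(list(range(10)), max_seq_len=6))         # [0, 1, 2, 3, 4, 5]
```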
Training Parameters
# Training epochs
config.num_epoch = 10
# Epoch at which evaluation begins during training
config.evaluate_begin = 2
# Learning rate
config.learning_rate = 1e-5
# L2 regularization
config.l2reg = 1e-8
# Dropout rate
config.dropout = 0.5
# Batch size
config.train_batch_size = 16
config.eval_batch_size = 32
# Random seed for reproducibility
config.seed = 42
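Setting `config.seed` only helps if every random number generator in play is seeded. The sketch below shows what such seeding typically covers; `set_seed` is a hypothetical helper, not PyABSA's own routine, and it seeds NumPy and PyTorch only when they are installed. Even with fixed seeds, some GPU kernels are nondeterministic, so results on CUDA may still vary slightly between runs.

```python
import random


def set_seed(seed):
    """Seed Python's RNG, plus NumPy and PyTorch if they are importable.
    Hypothetical helper for illustration."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch
        torch.manual_seed(seed)  # seeds the RNG on all devices, including CUDA
    except ImportError:
        pass


set_seed(42)
first = [random.random() for _ in range(3)]
set_seed(42)
second = [random.random() for _ in range(3)]
print(first == second)  # True: same seed, same sequence
```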
Performance Optimization
# Use Automatic Mixed Precision for faster training
config.use_amp = True
# Cache dataset in memory for faster data loading
config.cache_dataset = True
# Number of workers for data loading
config.num_workers = 4
# Device selection (auto-detected by default)
config.device = 'cuda' # or 'cpu'
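Device auto-detection usually amounts to checking whether PyTorch can see a CUDA GPU. The hypothetical `pick_device` function below sketches that check and falls back to `'cpu'` when PyTorch is not installed; PyABSA's actual detection logic may differ.

```python
def pick_device():
    """Return 'cuda' when PyTorch reports a usable GPU, else 'cpu'.
    A sketch of auto-detection, not PyABSA's own implementation."""
    try:
        import torch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(device)
```

You could then set `config.device = pick_device()` when you want the choice to be explicit in your own code.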
Logging and Saving
# How often to print training logs
config.log_step = 100
# Model saving options
config.model_path_to_save = './checkpoints'
# Whether to save the model state dict only
config.save_mode = 1 # 0: full model, 1: state dict only
Advanced Configuration Examples
Fine-tuning a Pre-trained Model
from pyabsa import AspectPolarityClassification as APC
config = APC.APCConfigManager.get_apc_config_english()
# Use a more powerful backbone
config.pretrained_bert = 'microsoft/deberta-v3-large'
# Use a lower learning rate for the larger backbone
config.learning_rate = 5e-6
# Increase max sequence length for longer texts
config.max_seq_len = 128
# Accumulate gradients over 2 batches to simulate a larger effective batch size
config.gradient_accumulation_steps = 2
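With gradient accumulation, gradients from several mini-batches are summed before a single optimizer update, so the effective batch size is `train_batch_size` multiplied by `gradient_accumulation_steps` (16 * 2 = 32 with the batch size from the training parameters above). The sketch below works through the arithmetic for a hypothetical dataset of 1,000 training examples. The `accumulation_summary` helper is illustrative, and it assumes a final partial accumulation window still triggers an optimizer step, which varies between training loops.

```python
import math


def accumulation_summary(num_examples, train_batch_size, accumulation_steps):
    """Return (effective batch size, mini-batches per epoch, optimizer steps per epoch).
    Assumes a trailing partial accumulation window still triggers one optimizer step."""
    effective_batch = train_batch_size * accumulation_steps
    batches_per_epoch = math.ceil(num_examples / train_batch_size)
    optimizer_steps = math.ceil(batches_per_epoch / accumulation_steps)
    return effective_batch, batches_per_epoch, optimizer_steps


# Hypothetical dataset of 1,000 training examples
print(accumulation_summary(1000, train_batch_size=16, accumulation_steps=2))  # (32, 63, 32)
```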
Training for High Performance
# Longer training and lower dropout, aimed at maximizing accuracy
config.num_epoch = 20
config.evaluate_begin = 5
config.learning_rate = 1e-5
config.l2reg = 1e-8
config.dropout = 0.1
# Enable data augmentation
config.load_aug = True
# Use ensemble during evaluation
config.use_ensemble_inference = True
Fast Prototyping Configuration
# Configuration for quick experimentation
config.num_epoch = 3
config.evaluate_begin = 1
config.max_seq_len = 64
config.train_batch_size = 32
config.cache_dataset = True
config.use_amp = True
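When you switch between quick experiments and full training runs, it can help to keep the template untouched and derive variants from it. The sketch below stores the fast-prototyping values above as a dictionary and applies them to a deep copy. It uses an `argparse.Namespace` stand-in with illustrative defaults; `derive` and `FAST_PROTOTYPING` are hypothetical names, and whether `copy.deepcopy` works on a real PyABSA configuration is an assumption to verify.

```python
import copy
from argparse import Namespace

# Stand-in for a template returned by a config manager (illustrative values)
base = Namespace(num_epoch=10, evaluate_begin=2, max_seq_len=80,
                 train_batch_size=16, cache_dataset=False, use_amp=False)

# The fast-prototyping settings, kept as a reusable preset
FAST_PROTOTYPING = {
    "num_epoch": 3,
    "evaluate_begin": 1,
    "max_seq_len": 64,
    "train_batch_size": 32,
    "cache_dataset": True,
    "use_amp": True,
}


def derive(config, preset):
    """Return a deep copy of `config` with `preset` applied; `config` is unchanged."""
    variant = copy.deepcopy(config)
    for key, value in preset.items():
        setattr(variant, key, value)
    return variant


quick = derive(base, FAST_PROTOTYPING)
print(base.num_epoch, quick.num_epoch)  # 10 3
```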
Task-Specific Configurations
For Aspect Term Extraction
from pyabsa import AspectTermExtraction as ATE
config = ATE.ATEConfigManager.get_ate_config_english()
# ATE-specific parameters
config.window_size = 5 # Context window for aspect detection
config.use_syntax_based_SRD = True # Use syntax-based semantic-relative distance (SRD)
For Text Classification
from pyabsa import TextClassification as TC
config = TC.TCConfigManager.get_tc_config_english()
# TC-specific parameters
config.use_bert_spc = True # Use BERT for sentence pair classification
config.class_dim = 3 # Number of classes in your classification task
Saving and Loading Configurations
You can save your custom configurations for later use:
import pickle
# Save configuration
with open('my_config.pkl', 'wb') as f:
    pickle.dump(config, f)
# Load configuration (only unpickle files from sources you trust)
with open('my_config.pkl', 'rb') as f:
    loaded_config = pickle.load(f)
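A pickle file is not human-readable, and unpickling can execute arbitrary code, so a JSON snapshot kept alongside it makes a convenient, diff-friendly record. The sketch below assumes `vars(config)` returns the parameter dictionary, as it does for the `argparse.Namespace` stand-in used here. Values JSON cannot encode, such as a model class, are stored as strings via `default=str`, so the snapshot documents a run rather than fully restoring it. `PlaceholderModel` is a hypothetical stand-in for a model class.

```python
import json
from argparse import Namespace


class PlaceholderModel:  # hypothetical stand-in for a model class
    pass


config = Namespace(model=PlaceholderModel, learning_rate=1e-5, seed=42)

# Write a readable snapshot; default=str stores values JSON cannot encode
# (such as the model class) as their string representation
with open("my_config.json", "w", encoding="utf-8") as f:
    json.dump(vars(config), f, indent=2, sort_keys=True, default=str)

# Reading it back yields plain values; the model entry is now only a string
with open("my_config.json", encoding="utf-8") as f:
    restored = Namespace(**json.load(f))

print(restored.learning_rate, restored.seed)
```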
Best Practices
Start with defaults: Use the pre-defined configuration templates as starting points
Incremental changes: Modify one parameter at a time to understand its impact
Document changes: Keep track of which parameters you’ve modified
Reproducibility: Always set a fixed seed for reproducible results
Validation: Test your configuration on a small dataset first
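To document changes, you can compare a modified configuration against its template and record only the parameters that differ. The hypothetical `diff_config` helper below does this for two namespace objects using `vars()`, marking parameters present in only one of them as `<unset>`; the values shown are illustrative.

```python
from argparse import Namespace

UNSET = "<unset>"  # marker for a parameter missing from one side


def diff_config(base, modified):
    """Return {name: (base_value, modified_value)} for every parameter that
    differs between the two configs or exists in only one. Hypothetical helper."""
    old, new = vars(base), vars(modified)
    changes = {}
    for key in sorted(old.keys() | new.keys()):
        before, after = old.get(key, UNSET), new.get(key, UNSET)
        if before != after:
            changes[key] = (before, after)
    return changes


base = Namespace(learning_rate=2e-5, dropout=0.5, seed=42)  # illustrative values
tuned = Namespace(learning_rate=1e-5, dropout=0.5, seed=42, load_aug=True)

for name, (before, after) in diff_config(base, tuned).items():
    print(f"{name}: {before} -> {after}")
```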
Troubleshooting
Common Issues
Out of memory: Reduce train_batch_size or max_seq_len, or enable use_amp.
Slow training: Enable cache_dataset, increase num_workers, or enable use_amp.
Poor performance: Try a different learning_rate, increase num_epoch, or use a stronger pretrained_bert.
Getting Help
If you encounter issues with configurations, check:
The parameter spelling and type
Compatibility between different parameters
Hardware limitations (GPU memory, CPU cores)
The PyABSA GitHub issues for similar problems