pyabsa.utils.file_utils.file_utils

Module Contents

Functions

meta_load(path, **kwargs)

Load data from a file; supported types include plain text, JSON, Excel, pickle, NumPy, torch, and CSV.

meta_save(data, path, **kwargs)

Save data to a file; supported types include plain text, JSON, Excel, pickle, NumPy, torch, and CSV.

save_jsonl(data, file_path, **kwargs)

Save data to a jsonl file.

save_txt(data, file_path, **kwargs)

Save data to a plain text file.

save_json(data, file_path, **kwargs)

Save data to a json file.

save_excel(data, file_path, **kwargs)

Save data to an Excel file.

save_csv(data, file_path, **kwargs)

Save data to a csv file.

save_npy(data, file_path, **kwargs)

Save data to a numpy file.

save_torch(data, file_path, **kwargs)

Save data to a torch file.

save_pickle(data, file_path, **kwargs)

Save data to a pickle file.

load_excel(file_path, **kwargs)

Load an Excel file and return the data.

load_csv(file_path, **kwargs)

Load a csv file and return the data.

load_npy(file_path, **kwargs)

Load a numpy file and return the data.

load_torch(file_path, **kwargs)

Load a torch file and return the data.

load_pickle(file_path, **kwargs)

Load a pickle file and return the data.

load_txt(file_path)

Load a plain text file and return a list of strings.

remove_empty_line(files)

Remove empty lines from the input files.

save_json(dic, save_path)

Save a Python dictionary to a JSON file.

load_json(file_path, **kwargs)

Load a JSON file and return a Python dictionary.

load_jsonl(file_path, **kwargs)

Load a JSONL file and return a list of Python dictionaries.

load_dataset_from_file(fname, config)

Loads a dataset from one or multiple files.

prepare_glove840_embedding(glove_path, embedding_dim, ...)

Check if the provided GloVe embedding exists; if not, search for a similar file in the current directory, or download the 840B GloVe embedding.

unzip_checkpoint(zip_path)

Unzip a checkpoint file in zip format.

save_model(config, model, tokenizer, save_path, **kwargs)

Save a trained model, configuration, and tokenizer to the specified path.

pyabsa.utils.file_utils.file_utils.meta_load(path, **kwargs)[source]
Load data from a file, which can be a plain text file, JSON file, Excel file, pickle file, NumPy file, torch file, pandas file, etc.

Supported file types: txt, json, pickle, npy, pkl, pt, torch, csv, xlsx, xls

Parameters:
  • path (str) – The path to the file.

  • kwargs – Other arguments for the corresponding load function.

Returns:

The loaded data.

pyabsa.utils.file_utils.file_utils.meta_save(data, path, **kwargs)[source]
Save data to a file, which can be a plain text file, JSON file, Excel file, pickle file, NumPy file, torch file, pandas file, etc.

Supported file types: txt, json, pickle, npy, pkl, pt, torch, csv, xlsx, xls

Parameters:
  • data – The data to be saved.

  • path (str) – The path to the file.

  • kwargs – Other arguments for the corresponding save function.
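The pairing of meta_save and meta_load with "the corresponding save/load function" suggests a dispatch on file type. The following is a minimal sketch of that idea using only the standard library. It assumes the file type is taken from the file extension, and it covers only txt, json, and pickle. The names sketch_meta_save, sketch_meta_load, SAVERS, and LOADERS are hypothetical and are not part of pyabsa.

```python
import json
import pickle
import tempfile
from pathlib import Path


# Hypothetical per-format helpers; pyabsa's own functions may behave differently.
def _save_txt(data, path, **kwargs):
    Path(path).write_text("\n".join(data), encoding="utf-8")


def _load_txt(path, **kwargs):
    return Path(path).read_text(encoding="utf-8").splitlines()


def _save_json(data, path, **kwargs):
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f, **kwargs)


def _load_json(path, **kwargs):
    with open(path, encoding="utf-8") as f:
        return json.load(f, **kwargs)


def _save_pickle(data, path, **kwargs):
    with open(path, "wb") as f:
        pickle.dump(data, f, **kwargs)


def _load_pickle(path, **kwargs):
    with open(path, "rb") as f:
        return pickle.load(f, **kwargs)


# Assumption: the format is selected by file extension.
SAVERS = {".txt": _save_txt, ".json": _save_json,
          ".pkl": _save_pickle, ".pickle": _save_pickle}
LOADERS = {".txt": _load_txt, ".json": _load_json,
           ".pkl": _load_pickle, ".pickle": _load_pickle}


def sketch_meta_save(data, path, **kwargs):
    """Forward data and kwargs to the saver registered for the extension."""
    suffix = Path(path).suffix.lower()
    if suffix not in SAVERS:
        raise ValueError(f"Unsupported file type: {suffix!r}")
    SAVERS[suffix](data, path, **kwargs)


def sketch_meta_load(path, **kwargs):
    """Forward kwargs to the loader registered for the extension."""
    suffix = Path(path).suffix.lower()
    if suffix not in LOADERS:
        raise ValueError(f"Unsupported file type: {suffix!r}")
    return LOADERS[suffix](path, **kwargs)


# Round trip through a temporary JSON file; indent is passed on to json.dump.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "example.json"
    sketch_meta_save({"label": "positive"}, target, indent=2)
    restored = sketch_meta_load(target)
```

A lookup table keeps the supported formats in one place, so adding a format in this sketch means registering one more pair of helpers.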

pyabsa.utils.file_utils.file_utils.save_jsonl(data, file_path, **kwargs)[source]

Save data to a jsonl file.

pyabsa.utils.file_utils.file_utils.save_txt(data, file_path, **kwargs)[source]

Save data to a plain text file.

pyabsa.utils.file_utils.file_utils.save_json(data, file_path, **kwargs)[source]

Save data to a json file.

pyabsa.utils.file_utils.file_utils.save_excel(data, file_path, **kwargs)[source]

Save data to an Excel file.

pyabsa.utils.file_utils.file_utils.save_csv(data, file_path, **kwargs)[source]

Save data to a csv file.

pyabsa.utils.file_utils.file_utils.save_npy(data, file_path, **kwargs)[source]

Save data to a numpy file.

pyabsa.utils.file_utils.file_utils.save_torch(data, file_path, **kwargs)[source]

Save data to a torch file.

pyabsa.utils.file_utils.file_utils.save_pickle(data, file_path, **kwargs)[source]

Save data to a pickle file.

pyabsa.utils.file_utils.file_utils.load_excel(file_path, **kwargs)[source]

Load an Excel file and return the data.

pyabsa.utils.file_utils.file_utils.load_csv(file_path, **kwargs)[source]

Load a csv file and return the data.

pyabsa.utils.file_utils.file_utils.load_npy(file_path, **kwargs)[source]

Load a numpy file and return the data.

pyabsa.utils.file_utils.file_utils.load_torch(file_path, **kwargs)[source]

Load a torch file and return the data.

pyabsa.utils.file_utils.file_utils.load_pickle(file_path, **kwargs)[source]

Load a pickle file and return the data.

pyabsa.utils.file_utils.file_utils.load_txt(file_path)[source]

Load a plain text file and return a list of strings.

pyabsa.utils.file_utils.file_utils.remove_empty_line(files: str | List[str])[source]

Remove empty lines from the input files.
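As an illustration of the str | List[str] argument, here is a small standard-library sketch. It assumes that each file is rewritten in place and that whitespace-only lines count as empty; neither detail is confirmed by this page. The name sketch_remove_empty_line is hypothetical.

```python
import tempfile
from pathlib import Path


def sketch_remove_empty_line(files):
    """Drop blank lines from one file or a list of files (hypothetical helper)."""
    # Accept a single path as well as a list of paths.
    if isinstance(files, str):
        files = [files]
    for path in files:
        with open(path, encoding="utf-8") as f:
            kept = [line for line in f if line.strip()]
        # Assumption: the file is overwritten in place.
        with open(path, "w", encoding="utf-8") as f:
            f.writelines(kept)


with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.txt"
    sample.write_text("first\n\n   \nsecond\n", encoding="utf-8")
    sketch_remove_empty_line(str(sample))
    cleaned = sample.read_text(encoding="utf-8")
```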

pyabsa.utils.file_utils.file_utils.save_json(dic, save_path)[source]

Save a Python dictionary to a JSON file.

pyabsa.utils.file_utils.file_utils.load_json(file_path, **kwargs)[source]

Load a JSON file and return a Python dictionary.

pyabsa.utils.file_utils.file_utils.load_jsonl(file_path, **kwargs)[source]

Load a JSONL file and return a list of Python dictionaries.
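For readers unfamiliar with JSONL, the format stores one JSON object per line. The sketch below shows a save and load round trip with the standard json module. The function names are hypothetical, and skipping blank lines on load is an assumption rather than documented pyabsa behavior.

```python
import json
import tempfile
from pathlib import Path


def sketch_save_jsonl(records, file_path):
    """Write one JSON object per line (hypothetical helper)."""
    with open(file_path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


def sketch_load_jsonl(file_path):
    """Read a JSONL file into a list of dictionaries, skipping blank lines."""
    with open(file_path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


records = [
    {"text": "good food", "label": "positive"},
    {"text": "slow service", "label": "negative"},
]

with tempfile.TemporaryDirectory() as tmp:
    jsonl_path = Path(tmp) / "records.jsonl"
    sketch_save_jsonl(records, jsonl_path)
    line_count = len(jsonl_path.read_text(encoding="utf-8").splitlines())
    loaded = sketch_load_jsonl(jsonl_path)
```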

pyabsa.utils.file_utils.file_utils.load_dataset_from_file(fname, config)[source]

Loads a dataset from one or multiple files.

Parameters:
  • fname (str or List[str]) – The name of the file(s) containing the dataset.

  • config (dict) – The configuration dictionary containing the logger (optional) and the maximum number of data to load (optional).

Returns:

A list of strings containing the loaded dataset.

Raises:

ValueError – If an empty line is found in the dataset.
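The documented contract (one or several files in, a list of strings out, ValueError on an empty line) can be sketched as follows. This is a standard-library illustration, not pyabsa's implementation: the function name sketch_load_dataset and the config key "max_lines" are hypothetical stand-ins for the optional maximum number of data to load.

```python
import tempfile
from pathlib import Path


def sketch_load_dataset(fname, config=None):
    """Read lines from one or more files into a single list (hypothetical helper)."""
    config = config or {}
    files = [fname] if isinstance(fname, str) else list(fname)
    # Hypothetical key; the real configuration field may have another name.
    max_lines = config.get("max_lines")
    lines = []
    for path in files:
        with open(path, encoding="utf-8") as f:
            for line_no, raw in enumerate(f, start=1):
                # Stop once the optional cap is reached.
                if max_lines is not None and len(lines) >= max_lines:
                    return lines
                text = raw.rstrip("\n")
                if not text.strip():
                    raise ValueError(f"Empty line {line_no} found in {path}")
                lines.append(text)
    return lines


with tempfile.TemporaryDirectory() as tmp:
    first = Path(tmp) / "train_a.txt"
    second = Path(tmp) / "train_b.txt"
    first.write_text("one\ntwo\n", encoding="utf-8")
    second.write_text("three\n", encoding="utf-8")
    all_lines = sketch_load_dataset([str(first), str(second)])
    capped = sketch_load_dataset([str(first), str(second)], {"max_lines": 2})
```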

pyabsa.utils.file_utils.file_utils.prepare_glove840_embedding(glove_path, embedding_dim, config)[source]

Check if the provided GloVe embedding exists; if not, search for a similar file in the current directory, or download the 840B GloVe embedding. If none of the above exists, raise an error.

Parameters:
  • glove_path (str) – The path to the GloVe embedding.

  • embedding_dim (int) – The dimension of the embedding.

  • config (dict) – The configuration dictionary.

Returns:

The path to the GloVe embedding.

Return type:

str
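The lookup order described for this function (given path first, then a similar file nearby, otherwise an error) can be sketched with pathlib. The download step is deliberately left out. The name sketch_locate_embedding, the search_dir parameter, and the rule that a "similar" file is any file with "glove" in its name are all assumptions made for illustration.

```python
import tempfile
from pathlib import Path


def sketch_locate_embedding(glove_path, search_dir="."):
    """Return an existing embedding path or raise (hypothetical helper, no download)."""
    path = Path(glove_path)
    # Step 1: use the provided path when it exists.
    if path.is_file():
        return str(path)
    # Step 2: fall back to a similarly named file under search_dir.
    # Assumption: "similar" means the file name contains "glove".
    candidates = sorted(
        p for p in Path(search_dir).rglob("*")
        if p.is_file() and "glove" in p.name.lower()
    )
    if candidates:
        return str(candidates[0])
    # Step 3: nothing found; a real implementation might try a download first.
    raise FileNotFoundError(f"No GloVe embedding found for {glove_path}")


with tempfile.TemporaryDirectory() as tmp:
    fallback = Path(tmp) / "glove.840B.300d.txt"
    fallback.write_text("the 0.1 0.2\n", encoding="utf-8")
    found = sketch_locate_embedding(Path(tmp) / "missing.txt", search_dir=tmp)
    found_name = Path(found).name
```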

pyabsa.utils.file_utils.file_utils.unzip_checkpoint(zip_path)[source]

Unzip a checkpoint file in zip format.

Parameters:

zip_path (str) – path to the zip file.

Returns:

path to the unzipped checkpoint directory.

Return type:

str
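A minimal sketch of unzipping a checkpoint archive with the standard zipfile module is shown below. It assumes the archive is extracted into a directory next to the zip file, named after the archive without its extension; the actual target location used by pyabsa is not stated on this page. The name sketch_unzip_checkpoint is hypothetical.

```python
import tempfile
import zipfile
from pathlib import Path


def sketch_unzip_checkpoint(zip_path):
    """Extract a zip archive and return the target directory (hypothetical helper)."""
    zip_path = Path(zip_path)
    # Assumption: extract to a sibling directory named after the archive.
    target_dir = zip_path.with_suffix("")
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target_dir)
    return str(target_dir)


with tempfile.TemporaryDirectory() as tmp:
    # Build a tiny placeholder archive to extract.
    archive_path = Path(tmp) / "checkpoint.zip"
    with zipfile.ZipFile(archive_path, "w") as archive:
        archive.writestr("model.config", "placeholder")
    extracted_dir = sketch_unzip_checkpoint(archive_path)
    extracted_names = sorted(p.name for p in Path(extracted_dir).iterdir())
```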

pyabsa.utils.file_utils.file_utils.save_model(config, model, tokenizer, save_path, **kwargs)[source]

Save a trained model, configuration, and tokenizer to the specified path.

Parameters:
  • config (Config) – Configuration for the model.

  • model (nn.Module) – The trained model.

  • tokenizer – Tokenizer used by the model.

  • save_path (str) – The path where to save the model, config, and tokenizer.

  • **kwargs – Additional keyword arguments.