translate package

Submodules

translate.translate module

translate.translate.open_mpi_handler(worker_inputs: List[str], root_worker_inputs: Dict[str, Any] | None = None)

translate.translate.translate(data_path: str | List[str] | Path, output_directory: str, model_name: str | None = None, source_language: str | None = None, target_language: str | None = None, device: str | None = None, model_kwargs: dict | None = None, batch_size: int = 1, translation_kwargs: dict | None = None, verbose: bool = False) → Tuple[str, DataFrame, dict]

Translate text files using a transformer model from the Hugging Face Hub, selected according to the given source and target languages (or by a directly provided model name). The result is a directory of translated text files and a dataframe containing the following columns:

text_file - The text file path.
translation_file - The translation text file name in the output directory.
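The model selection described above (an explicit model name, or a name derived from the language pair) could work along the lines of the sketch below. The helper `resolve_model_name` is hypothetical, and the `Helsinki-NLP/opus-mt-{source}-{target}` pattern is an assumption based on a common naming scheme on the Hugging Face Hub; the module may construct the name differently.

```python
from typing import Optional


def resolve_model_name(
    model_name: Optional[str],
    source_language: Optional[str],
    target_language: Optional[str],
) -> str:
    """Hypothetical helper: pick the model to load for a translation run."""
    # An explicitly provided model name always takes precedence.
    if model_name is not None:
        return model_name
    if source_language is None or target_language is None:
        raise ValueError(
            "Provide either model_name or both source_language and target_language."
        )
    # Assumption: the OPUS-MT naming pattern; not confirmed by this documentation.
    return f"Helsinki-NLP/opus-mt-{source_language}-{target_language}"


derived = resolve_model_name(None, "en", "fr")
explicit = resolve_model_name("my-org/my-translation-model", None, None)
```

In this sketch the language codes are only consulted when no model name is given, which matches the precedence the description implies.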

Parameters:

data_path – A directory of text files, a single file, or a list of files to translate.
output_directory – Directory where the translated files will be saved.
model_name – The name of the model to load. If None, the model name is constructed from the source_language and target_language parameters.
source_language – The source language code (e.g., ‘en’ for English).
target_language – The target language code (e.g., ‘fr’ for French).
model_kwargs – Keyword arguments passed to Hugging Face’s pipeline function when loading the model.
device – The device index for transformers. By default, CUDA is preferred if available.
batch_size – The number of sentences per batch to use in translation. Files are translated one at a time, but the sentences within a file can be batched.
translation_kwargs – Additional keyword arguments to pass to a transformers.TranslationPipeline during translation inference. Note that the batch size is added automatically.
verbose – Whether to show a progress bar and log errors. Default: False.
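To illustrate what batch_size controls, here is a minimal sketch of grouping one file's sentences into batches. The helper `batched` is hypothetical and is not the module's implementation; in practice the transformers pipeline performs the batching itself once it receives the batch size.

```python
from typing import Iterator, List


def batched(sentences: List[str], batch_size: int = 1) -> Iterator[List[str]]:
    """Hypothetical helper: yield consecutive groups of at most batch_size sentences."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(sentences), batch_size):
        # The final batch may be shorter than batch_size.
        yield sentences[start:start + batch_size]


sentences = ["Hello.", "How are you?", "Goodbye."]
batches = list(batched(sentences, batch_size=2))
# batches == [["Hello.", "How are you?"], ["Goodbye."]]
```

With the default batch_size of 1, each sentence forms its own batch; larger values trade memory for throughput.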

Returns:

A tuple of:

Path to the output directory.
A dataframe of the translated file names.
A dictionary of errored files that were not translated.
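A call and a way to consume the three-part return value might look like the sketch below. The import path follows the module path documented here and is an assumption about your environment; the directory names, language codes, and the helpers `run_translation` and `summarize` are hypothetical.

```python
from typing import Any, Dict, Tuple


def run_translation(data_path: str, output_directory: str) -> Tuple[str, Any, Dict]:
    """Call translate() with documented parameters (nothing runs at import time)."""
    # Assumption: the module is importable under this path in your environment.
    from translate.translate import translate

    return translate(
        data_path=data_path,
        output_directory=output_directory,
        source_language="en",
        target_language="fr",
        batch_size=8,
        verbose=True,
    )


def summarize(result: Tuple[str, Any, Dict]) -> str:
    """Build a one-line summary from the (directory, dataframe, errors) tuple."""
    output_directory, dataframe, errors = result
    # len() on a pandas DataFrame returns its number of rows.
    return (
        f"{len(dataframe)} translated, {len(errors)} failed, "
        f"output in {output_directory}"
    )
```

For example, `print(summarize(run_translation("./texts", "./translated")))` would report how many files appear in the dataframe and how many ended up in the error dictionary.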
