translate package
Submodules
translate.translate module
- translate.translate.open_mpi_handler(worker_inputs: List[str], root_worker_inputs: Optional[Dict[str, Any]] = None)
- translate.translate.translate(data_path: Union[str, List[str], pathlib.Path], output_directory: str, model_name: Optional[str] = None, source_language: Optional[str] = None, target_language: Optional[str] = None, device: Optional[str] = None, model_kwargs: Optional[dict] = None, batch_size: int = 1, translation_kwargs: Optional[dict] = None, verbose: bool = False) → Tuple[str, pandas.core.frame.DataFrame, dict]
Translate text files using a transformer model from the Hugging Face Hub, selected either from the given source and target languages or from a directly provided model name. The result is a directory of translated text files and a dataframe containing the following columns:
text_file - The text file path.
translation_file - The translation text file name in the output directory.
- Parameters
data_path – A directory of text files or a single file or a list of files to translate.
output_directory – Directory where the translated files will be saved.
model_name – The name of a model to load. If None, the model name is constructed from the source and target language parameters.
source_language – The source language code (e.g., ‘en’ for English).
target_language – The target language code (e.g., ‘fr’ for French).
model_kwargs – Keyword arguments passed to Hugging Face’s pipeline function when loading the model.
device – The device index for transformers. By default, CUDA is preferred if available.
batch_size – The number of sentences to translate per batch. The files are translated one by one, but the sentences within a file can be batched.
translation_kwargs – Additional keyword arguments to pass to a transformers.TranslationPipeline during translation inference. Note that the batch size is added automatically.
verbose – Whether to show a progress bar and log errors. Default: False.
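The way model_name, source_language, target_language, and batch_size relate to one another can be sketched as follows. This is a minimal illustration, not the module's actual implementation: the helper names resolve_model_name and batch_sentences are hypothetical, and the `Helsinki-NLP/opus-mt-{source}-{target}` naming scheme is an assumption about how a default model name might be built from the language codes.

```python
from typing import Iterator, List, Optional


def resolve_model_name(
    model_name: Optional[str] = None,
    source_language: Optional[str] = None,
    target_language: Optional[str] = None,
) -> str:
    """Hypothetical helper: use the explicit name, else build one from the language codes."""
    if model_name is not None:
        # An explicitly provided model name always wins.
        return model_name
    if source_language is None or target_language is None:
        raise ValueError(
            "Provide either model_name or both source_language and target_language."
        )
    # Assumed naming scheme; the documentation above does not state the exact format.
    return f"Helsinki-NLP/opus-mt-{source_language}-{target_language}"


def batch_sentences(sentences: List[str], batch_size: int = 1) -> Iterator[List[str]]:
    """Hypothetical helper: yield consecutive groups of at most batch_size sentences."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1.")
    for start in range(0, len(sentences), batch_size):
        yield sentences[start:start + batch_size]
```

With batch_size=1 (the default in the signature) every sentence forms its own batch; larger values group more sentences of the same file into each inference call.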
- Returns
A tuple of:
Path to the output directory.
A dataframe of the translated file names.
A dictionary of the files that raised errors and were not translated.
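A call to the function and the handling of its three return values might look like the sketch below. The wrapper run_translation is hypothetical and takes the translate callable as an argument so the example stays self-contained; the language codes and batch size are arbitrary illustration values, and the assumption that the error dictionary is keyed by file name is not confirmed by the description above.

```python
from typing import Any, Callable, Dict, Tuple


def run_translation(
    translate_fn: Callable[..., Tuple[str, Any, dict]],
    data_path: str,
    output_directory: str,
) -> Dict[str, Any]:
    """Hypothetical wrapper: call translate and summarize its returned tuple."""
    # Unpack the documented return value: output path, dataframe, errors.
    output_dir, dataframe, errors = translate_fn(
        data_path=data_path,
        output_directory=output_directory,
        source_language="en",  # illustration value
        target_language="fr",  # illustration value
        batch_size=8,          # illustration value
        verbose=True,
    )
    return {
        "output_directory": output_dir,
        # len() gives the number of rows for a pandas DataFrame.
        "translated": len(dataframe),
        # Assumes the error dictionary is keyed by file name.
        "failed": sorted(errors),
    }
```

Assuming the package is importable, passing translate.translate.translate as translate_fn would run the real translation; any callable with the same keyword parameters works for a dry run.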