text_to_audio_generator package

Submodules

text_to_audio_generator.text_to_audio_generator module

class text_to_audio_generator.text_to_audio_generator.BarkEngine(use_gpu: bool = True, use_small_models: bool = False, offload_cpu: bool = False)
Bases: SpeechEngine

class text_to_audio_generator.text_to_audio_generator.OpenAIEngine(model: str = 'tts-1', file_format: str = 'wav', speed: float = 1.0)
Bases: SpeechEngine

class text_to_audio_generator.text_to_audio_generator.SpeechEngine
Bases: ABC

text_to_audio_generator.text_to_audio_generator.generate_multi_speakers_audio(data_path: str, speakers: List[str] | Dict[str, int], available_voices: List[str], engine: str = 'openai', output_directory: str | None = None, use_gpu: bool | None = None, use_small_models: bool | None = None, offload_cpu: bool | None = None, model: str | None = None, speed: float | None = None, sample_rate: int = 16000, file_format: str = 'wav', verbose: bool = True, bits_per_sample: int | None = None) → Tuple[str, DataFrame, dict]

Generate audio files from text files.

Parameters:

data_path – Path to the text file or directory containing the text files to generate audio from.
speakers – List or dictionary of speakers to generate audio for. If a list is given, the speakers are assigned to channels in the order given. If a dictionary is given, the keys are the speakers and the values are the channels.
available_voices – List of available voices to use for the generation. Available voices for the “bark” engine: https://suno-ai.notion.site/8b8e8749ed514b0cbf3f699013548683?v=bc67cff786b04b50b3ceb756fd05f68c. Available voices for the “openai” engine: https://beta.openai.com/docs/api-reference/speech.
engine – The engine to use for the generation. Select either “bark” or “openai”. Default is “openai”.
output_directory – Path to the directory to save the generated audio files to.
use_gpu – Whether to use the GPU for the generation. Supported only in “bark” engine.
use_small_models – Whether to use the small models for the generation. Supported only in “bark” engine.
offload_cpu – Whether to offload the models to the CPU after loading, which reduces the memory footprint. Supported only in “bark” engine.
model – Which model to use for the generation. Supported only in “openai” engine. Default is “tts-1”.
speed – The speed of the generated audio, a value from 0.25 to 4.0. Default is 1.0.
sample_rate – The sampling rate of the generated audio.
file_format – The format of the generated audio files.
verbose – Whether to print the progress of the generation.
bits_per_sample – The bit depth of the generated audio. Supported only in “wav” or “flac” formats.

Returns:

A tuple of: the output directory path, the generated audio files dataframe, and the errors dictionary.
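
The speakers parameter accepts either a list or a dictionary, and the following is a minimal sketch of how the two forms could resolve to the same speaker-to-channel mapping. The helper speakers_to_channels is hypothetical and is not part of this package. The zero-based channel numbering, the paths, and the speaker and voice names are assumptions made for illustration only.

```python
from typing import Dict, List, Union


def speakers_to_channels(speakers: Union[List[str], Dict[str, int]]) -> Dict[str, int]:
    """Hypothetical helper: map each speaker name to a channel index."""
    if isinstance(speakers, dict):
        # A dictionary already maps speaker -> channel; copy it unchanged.
        return dict(speakers)
    # A list assigns channels by position (zero-based here, as an assumption).
    return {name: channel for channel, name in enumerate(speakers)}


# Keyword arguments matching the documented signature.
# Paths, speaker names, and voice names are placeholders.
call_kwargs = {
    "data_path": "./transcripts",
    "speakers": ["Agent", "Client"],
    "available_voices": ["alloy", "echo"],
    "engine": "openai",
    "output_directory": "./audio",
    "model": "tts-1",
    "speed": 1.0,
    "sample_rate": 16000,
    "file_format": "wav",
}

# Assuming the package is importable in your environment:
# from text_to_audio_generator.text_to_audio_generator import (
#     generate_multi_speakers_audio,
# )
# output_directory, audio_files_df, errors = generate_multi_speakers_audio(**call_kwargs)
```

The call itself is left commented out because it needs the package installed and, for the “openai” engine, valid API credentials. The three names on the left-hand side follow the documented return tuple.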
