text_to_audio_generator package
Submodules
text_to_audio_generator.text_to_audio_generator module
- class text_to_audio_generator.text_to_audio_generator.BarkEngine(use_gpu: bool = True, use_small_models: bool = False, offload_cpu: bool = False)
- Bases: SpeechEngine
- class text_to_audio_generator.text_to_audio_generator.OpenAIEngine(model: str = 'tts-1', file_format: str = 'wav', speed: float = 1.0)
- Bases: SpeechEngine
- text_to_audio_generator.text_to_audio_generator.generate_multi_speakers_audio(data_path: str, speakers: List[str] | Dict[str, int], available_voices: List[str], engine: str = 'openai', output_directory: str | None = None, use_gpu: bool | None = None, use_small_models: bool | None = None, offload_cpu: bool | None = None, model: str | None = None, speed: float | None = None, sample_rate: int = 16000, file_format: str = 'wav', verbose: bool = True, bits_per_sample: int | None = None) -> Tuple[str, DataFrame, dict]
- Generate audio files from text files.
- Parameters:
- data_path – Path to the text file or directory containing the text files to generate audio from. 
- speakers – A list or dictionary of the speakers to generate audio for. If a list is given, the speakers are assigned to channels in the order given. If a dictionary is given, the keys are the speakers and the values are their channels. 
- available_voices – List of available voices to use for the generation. For the voices available in the “bark” engine, see: https://suno-ai.notion.site/8b8e8749ed514b0cbf3f699013548683?v=bc67cff786b04b50b3ceb756fd05f68c For the voices available in the “openai” engine, see: https://beta.openai.com/docs/api-reference/speech 
- engine – The engine to use for the generation. Select either “bark” or “openai”. Default is “openai”. 
- output_directory – Path to the directory to save the generated audio files to. 
- use_gpu – Whether to use the GPU for the generation. Supported only by the “bark” engine. 
- use_small_models – Whether to use the small models for the generation. Supported only by the “bark” engine. 
- offload_cpu – Whether to offload the models to the CPU after loading, which reduces the memory footprint. Supported only by the “bark” engine. 
- model – Which model to use for the generation. Supported only by the “openai” engine. Default is “tts-1”. 
- speed – The speed of the generated audio, from 0.25 to 4.0. Default is 1.0. Supported only by the “openai” engine. 
- sample_rate – The sampling rate of the generated audio. 
- file_format – The format of the generated audio files. 
- verbose – Whether to print the progress of the generation. 
- bits_per_sample – The bit depth of the generated audio. Supported only for the “wav” and “flac” formats. 
 
- Returns:
- A tuple of: the output directory path, a dataframe of the generated audio files, and a dictionary of errors.
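
The speakers and speed parameters described above can be illustrated with a minimal sketch. The helpers `normalize_speakers` and `check_arguments` are hypothetical and are not part of this module; they only mirror the documented behavior (a list assigns channels by position, a dictionary gives them explicitly, and speed lies between 0.25 and 4.0). Starting channel numbering at 0 is an assumption, as are the speaker names, voice names, and paths in the commented call at the end.

```python
from typing import Dict, List, Union


def normalize_speakers(speakers: Union[List[str], Dict[str, int]]) -> Dict[str, int]:
    """Hypothetical helper: map each speaker name to a channel index."""
    if isinstance(speakers, dict):
        # A dictionary already carries explicit speaker -> channel assignments.
        return dict(speakers)
    # A list assigns channels by position (assumed here to start at 0).
    return {name: channel for channel, name in enumerate(speakers)}


def check_arguments(engine: str, speed: float = 1.0) -> None:
    """Hypothetical pre-flight check based on the documented value ranges."""
    if engine not in ("bark", "openai"):
        raise ValueError(f"engine must be 'bark' or 'openai', got {engine!r}")
    if not 0.25 <= speed <= 4.0:
        raise ValueError(f"speed must be between 0.25 and 4.0, got {speed}")


print(normalize_speakers(["Agent", "Client"]))
print(normalize_speakers({"Agent": 1, "Client": 0}))
check_arguments(engine="openai", speed=1.25)

# A call to the documented function might then look like this
# (illustrative values only; not executed here):
#
# from text_to_audio_generator.text_to_audio_generator import (
#     generate_multi_speakers_audio,
# )
# output_directory, audio_files, errors = generate_multi_speakers_audio(
#     data_path="./conversations",
#     speakers=["Agent", "Client"],
#     available_voices=["alloy", "echo"],
#     engine="openai",
#     speed=1.25,
# )
```

Passing a dictionary is the way to pin a speaker to a specific channel regardless of the order in which speakers are listed.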