text_to_audio_generator package
Submodules
text_to_audio_generator.text_to_audio_generator module
- class text_to_audio_generator.text_to_audio_generator.BarkEngine(use_gpu: bool = True, use_small_models: bool = False, offload_cpu: bool = False)
Bases: text_to_audio_generator.text_to_audio_generator.SpeechEngine
- class text_to_audio_generator.text_to_audio_generator.OpenAIEngine(model: str = 'tts-1', file_format: str = 'wav', speed: float = 1.0)
Bases: text_to_audio_generator.text_to_audio_generator.SpeechEngine
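A minimal construction sketch of the two engine classes using the documented default values. Only the constructors are documented here, so no engine methods are shown; whether these classes are meant to be instantiated directly or only selected through the engine argument of generate_multi_speakers_audio (below) is not stated in this reference.

    from text_to_audio_generator.text_to_audio_generator import BarkEngine, OpenAIEngine

    # Bark engine: GPU enabled, full-size models, no CPU offloading (the documented defaults).
    bark_engine = BarkEngine(use_gpu=True, use_small_models=False, offload_cpu=False)

    # OpenAI engine: "tts-1" model, WAV output, normal speed (the documented defaults).
    openai_engine = OpenAIEngine(model="tts-1", file_format="wav", speed=1.0)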
- text_to_audio_generator.text_to_audio_generator.generate_multi_speakers_audio(data_path: str, speakers: Union[List[str], Dict[str, int]], available_voices: List[str], engine: str = 'openai', output_directory: Optional[str] = None, use_gpu: Optional[bool] = None, use_small_models: Optional[bool] = None, offload_cpu: Optional[bool] = None, model: Optional[str] = None, speed: Optional[float] = None, sample_rate: int = 16000, file_format: str = 'wav', verbose: bool = True, bits_per_sample: Optional[int] = None) → Tuple[str, pandas.core.frame.DataFrame, dict]
Generate audio files from text files.
- Parameters
data_path – Path to the text file or directory containing the text files to generate audio from.
speakers – List or dict of the speakers to generate audio for. If a list is given, the speakers are assigned to channels in the order given. If a dict is given, the keys are the speakers and the values are their channels.
available_voices – List of voices to use for the generation. For the voices available in the “bark” engine, see: https://suno-ai.notion.site/8b8e8749ed514b0cbf3f699013548683?v=bc67cff786b04b50b3ceb756fd05f68c For the voices available in the “openai” engine, see: https://beta.openai.com/docs/api-reference/speech
engine – The engine to use for the generation. Select either “bark” or “openai”. Default is “openai”.
output_directory – Path to the directory to save the generated audio files to.
use_gpu – Whether to use the GPU for the generation. Supported only by the “bark” engine.
use_small_models – Whether to use the small models for the generation. Supported only by the “bark” engine.
offload_cpu – Whether to offload the models to the CPU after loading to reduce the memory footprint. Supported only by the “bark” engine.
model – Which model to use for the generation. Supported only by the “openai” engine. Default is “tts-1”.
speed – The speed of the generated audio. Select a value from 0.25 to 4.0. 1.0 is the default.
sample_rate – The sampling rate of the generated audio.
file_format – The format of the generated audio files.
verbose – Whether to print the progress of the generation.
bits_per_sample – Overrides the bit depth of the generated audio. Supported only for the “wav” and “flac” formats.
- Returns
A tuple of:
- The output directory path.
- The generated audio files dataframe.
- The errors dictionary.
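A minimal usage sketch of generate_multi_speakers_audio. The input path, output path, and speaker names below are hypothetical, the voice names are examples taken from the OpenAI speech documentation linked above, and using the “openai” engine is assumed to require valid OpenAI API credentials in the environment.

    from text_to_audio_generator.text_to_audio_generator import generate_multi_speakers_audio

    # As a dict, "speakers" maps each speaker name to its audio channel;
    # the names and the "./conversations" path below are hypothetical.
    output_dir, audio_files_df, errors = generate_multi_speakers_audio(
        data_path="./conversations",
        speakers={"agent": 0, "client": 1},
        available_voices=["alloy", "onyx"],  # example voices for the "openai" engine
        engine="openai",
        model="tts-1",
        speed=1.0,
        sample_rate=16000,
        file_format="wav",
        output_directory="./generated_audio",
        verbose=True,
    )

    # The returned tuple holds the output directory path, a dataframe describing
    # the generated audio files, and a dictionary of any errors encountered.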