text_to_audio_generator package#

Submodules#

text_to_audio_generator.text_to_audio_generator module#

class text_to_audio_generator.text_to_audio_generator.BarkEngine(use_gpu: bool = True, use_small_models: bool = False, offload_cpu: bool = False)[source]#

Bases: text_to_audio_generator.text_to_audio_generator.SpeechEngine

class text_to_audio_generator.text_to_audio_generator.OpenAIEngine(model: str = 'tts-1', file_format: str = 'wav', speed: float = 1.0)[source]#

Bases: text_to_audio_generator.text_to_audio_generator.SpeechEngine

class text_to_audio_generator.text_to_audio_generator.SpeechEngine[source]#

Bases: abc.ABC
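
Both concrete engines derive from the SpeechEngine abstract base class and are constructed with the keyword arguments shown above. A minimal construction sketch, assuming the classes are imported from text_to_audio_generator.text_to_audio_generator as documented; the argument values are illustrative, not requirements:

    from text_to_audio_generator.text_to_audio_generator import (
        BarkEngine,
        OpenAIEngine,
        SpeechEngine,
    )

    # Bark engine: locally run models, optionally on the GPU and with the smaller checkpoints
    bark_engine = BarkEngine(use_gpu=True, use_small_models=True, offload_cpu=False)

    # OpenAI engine: hosted TTS model; output format and playback speed are set here
    openai_engine = OpenAIEngine(model="tts-1", file_format="wav", speed=1.0)

    # Both engines share the SpeechEngine interface, so they can be handled uniformly
    assert isinstance(bark_engine, SpeechEngine)
    assert isinstance(openai_engine, SpeechEngine)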

text_to_audio_generator.text_to_audio_generator.generate_multi_speakers_audio(data_path: str, speakers: Union[List[str], Dict[str, int]], available_voices: List[str], engine: str = 'openai', output_directory: Optional[str] = None, use_gpu: Optional[bool] = None, use_small_models: Optional[bool] = None, offload_cpu: Optional[bool] = None, model: Optional[str] = None, speed: Optional[float] = None, sample_rate: int = 16000, file_format: str = 'wav', verbose: bool = True, bits_per_sample: Optional[int] = None) → Tuple[str, pandas.core.frame.DataFrame, dict][source]#

Generate audio files from text files.

Parameters
  • data_path – Path to the text file or directory containing the text files to generate audio from.

  • speakers – List or dict of speakers to generate audio for. If a list is given, the speakers are assigned to channels in the order given. If a dictionary is given, the keys are the speakers and the values are the channels.

  • available_voices – List of voices to use for the generation. For the available voices of the “bark” engine, see https://suno-ai.notion.site/8b8e8749ed514b0cbf3f699013548683?v=bc67cff786b04b50b3ceb756fd05f68c ; for the available voices of the “openai” engine, see https://beta.openai.com/docs/api-reference/speech

  • engine – The engine to use for the generation. Select either “bark” or “openai”. Default is “openai”.

  • output_directory – Path to the directory to save the generated audio files to.

  • use_gpu – Whether to use the GPU for the generation. Supported only by the “bark” engine.

  • use_small_models – Whether to use the small models for the generation. Supported only by the “bark” engine.

  • offload_cpu – Whether to offload the models to the CPU after loading, to reduce the memory footprint. Supported only by the “bark” engine.

  • model – Which model to use for the generation. Supported only by the “openai” engine. Default is “tts-1”.

  • speed – The speed of the generated audio. Select a value from 0.25 to 4.0. 1.0 is the default.

  • sample_rate – The sampling rate of the generated audio.

  • file_format – The format of the generated audio files.

  • verbose – Whether to print the progress of the generation.

  • bits_per_sample – Changes the bit depth of the generated audio. Supported only for the “wav” and “flac” formats.

Returns

A tuple of:

  • The output directory path.

  • The generated audio files dataframe.

  • The errors dictionary.
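
A hedged usage sketch of generate_multi_speakers_audio, assuming a directory of transcript text files attributed to the two speakers named below; the paths, speaker names, and channel mapping are placeholders, and the voice identifiers must come from the chosen engine's voice list linked above:

    from text_to_audio_generator.text_to_audio_generator import generate_multi_speakers_audio

    # Map each speaker to an explicit audio channel (passing a list instead would
    # assign channels in the order the speakers are listed).
    speakers = {"agent": 0, "client": 1}

    # Voice identifiers from the selected engine's voice catalog (see the links above).
    available_voices = ["alloy", "echo"]

    output_dir, audio_files_df, errors = generate_multi_speakers_audio(
        data_path="./transcripts",           # placeholder: a text file or a directory of text files
        speakers=speakers,
        available_voices=available_voices,
        engine="openai",                     # or "bark" to generate locally
        output_directory="./generated_audio",
        model="tts-1",
        speed=1.0,
        sample_rate=16000,
        file_format="wav",
        verbose=True,
    )

    print(output_dir)             # path the audio files were written to
    print(audio_files_df.head())  # dataframe describing the generated audio files
    print(errors)                 # dictionary of errors encountered during generation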

Module contents#