pyannote_audio package#

Submodules#

pyannote_audio.pyannote_audio module#

pyannote_audio.pyannote_audio.diarize(data_path: Union[str, List[str]], model_name: str = 'pyannote/speaker-diarization-3.0', access_token: Optional[str] = None, device: Optional[str] = None, speakers_labels: Optional[List[str]] = None, speaker_prefix: str = 'speaker_', separate_by_channels: bool = False, minimum_speakers: Optional[int] = None, maximum_speakers: Optional[int] = None, verbose: bool = False) → Tuple[Dict[str, List[Tuple[float, float, str]]], Dict[str, str]][source]#

Perform speech diarization on the given audio files using pyannote-audio (https://github.com/pyannote/pyannote-audio). The end result is a dictionary with the file names as keys and their diarizations as values. A diarization is a list of tuples: (start, end, speaker_label).
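The returned structure can be illustrated with a small sketch (the file names, timestamps, and labels below are hypothetical, not actual output):

```python
# Hypothetical example of the structure returned by diarize():
# a tuple of (diarizations, errors).
diarizations = {
    "meeting.wav": [
        (0.0, 4.2, "speaker_0"),   # (start, end, speaker_label), times in seconds
        (4.2, 9.7, "speaker_1"),
        (9.7, 12.1, "speaker_0"),
    ],
}
errors = {"corrupt.wav": "Failed to load audio."}

# Iterating the diarization of a single file:
for start, end, label in diarizations["meeting.wav"]:
    print(f"{label}: {start:.1f}s - {end:.1f}s")
```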

To use the pyannote.audio models you must pass a Huggingface token and be granted access to the required models. The token can be passed in one of the following ways:

  • Use the parameter access_token.

  • Set an environment variable named “HUGGING_FACE_HUB_TOKEN”.

  • If using MLRun, you can pass it as a secret named “HUGGING_FACE_HUB_TOKEN”.

To get access to the models on Huggingface, visit their pages. For example, to use the default diarization model set in this function (“pyannote/speaker-diarization-3.0”), you need to be granted access to these two models: “pyannote/speaker-diarization-3.0” and “pyannote/segmentation-3.0”.

Note: To control the recognized speakers in the diarization output you can choose one of the following methods:

  • For a known number of speakers, you may set speaker labels via the speakers_labels parameter; they are used in order of speaking in the audio (the first person to speak gets the first label in the list). In addition, you can diarize per channel by setting the separate_by_channels parameter to True. Each label will then be assigned to a specific channel by order (first label to channel 0, second label to channel 1, and so on). Note that this will increase runtime.

  • For an unknown number of speakers, you can set the speaker_prefix parameter to add a prefix to each speaker number. You can also guide the diarization by bounding the expected number of speakers via the minimum_speakers and maximum_speakers parameters.
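The two methods above can be sketched as keyword arguments for diarize (the file path is hypothetical; actually running the calls requires the pyannote.audio models and a Huggingface token):

```python
# Sketch of the two ways to control speaker labels (path is hypothetical).

# Known number of speakers: fixed labels, optionally one speaker per channel.
known_speakers_kwargs = dict(
    data_path="call.wav",
    speakers_labels=["agent", "client"],  # assigned in order of first speaking
    separate_by_channels=True,            # "agent" -> channel 0, "client" -> channel 1
)

# Unknown number of speakers: prefixed numeric labels, bounded speaker count.
unknown_speakers_kwargs = dict(
    data_path="call.wav",
    speaker_prefix="participant_",        # labels become "participant_0", "participant_1", ...
    minimum_speakers=2,
    maximum_speakers=4,
)

# Either set would then be passed as, e.g.:
# diarization, errors = diarize(**known_speakers_kwargs)
```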

Parameters
  • data_path – A directory of audio files, a single file, or a list of files to diarize.

  • model_name – One of the official diarization model names (referred to as diarization pipelines) on pyannote.audio’s Huggingface page. Default: “pyannote/speaker-diarization-3.0”.

  • access_token – An access token to pass for using the pyannote.audio models. If not provided, it will look for the environment variable “HUGGING_FACE_HUB_TOKEN”. If MLRun is available, it will look for a secret named “HUGGING_FACE_HUB_TOKEN”.

  • device – Device to load the model. Can be one of {“cuda”, “cpu”}. Default will prefer “cuda” if available.

  • speakers_labels – Labels to use for the recognized speakers. Default: numeric labels prefixed by speaker_prefix (“speaker_0”, “speaker_1”, …).

  • separate_by_channels – If each speaker is speaking in a separate channel, you can diarize each channel and combine the result into a single diarization. Each label set in the speakers_labels parameter will be assigned to a specific channel by order.

  • speaker_prefix – A prefix to add to the speaker labels. This parameter is ignored if speakers_labels is not None. Default: “speaker_”.

  • minimum_speakers – The minimum expected number of speakers in the audio files. This parameter is ignored if speakers_labels is not None.

  • maximum_speakers – The maximum expected number of speakers in the audio files. This parameter is ignored if speakers_labels is not None.

  • verbose – Whether to present a progress bar and error logs. Default: False.

Returns

A tuple of:

  • Speech diarization dictionary.

  • A dictionary of errored files that could not be diarized.
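As a usage sketch, per-speaker speaking time can be computed from a returned diarization (the data below is hypothetical, not actual output):

```python
from collections import defaultdict

# Hypothetical diarization as returned for one file: (start, end, speaker_label).
diarization = [
    (0.0, 4.0, "speaker_0"),
    (4.0, 10.0, "speaker_1"),
    (10.0, 13.0, "speaker_0"),
]

# Sum up speaking time per speaker label.
speaking_time = defaultdict(float)
for start, end, label in diarization:
    speaking_time[label] += end - start

print(dict(speaking_time))  # {'speaker_0': 7.0, 'speaker_1': 6.0}
```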

pyannote_audio.pyannote_audio.open_mpi_handler(worker_inputs: List[str], root_worker_inputs: Optional[Dict[str, Any]] = None)[source]#

Module contents#