Speech diarization example notebook
In this notebook we use a speech diarization function to get per-speaker speech durations from a call recording.
This is useful for quantifying participation in calls, for example in customer service analysis.
We will demonstrate this by:
- Loading a sample call recording between multiple participants
- Using the diarize() function to automatically detect speakers and estimate per-speaker talk time
- Returning a dictionary of diarization results and a DataFrame of errors
import os
import mlrun
# To use the `pyannote.audio` models you must pass a Hugging Face token and get access to the required models.
# The token can be passed in one of the following ways:
#
#    * Use the parameter `access_token`.
#    * Set an environment variable named "HUGGING_FACE_HUB_TOKEN".
#    * If using MLRun, you can pass it as a secret named "HUGGING_FACE_HUB_TOKEN".
os.environ["HUGGING_FACE_HUB_TOKEN"] = "<add your token here>"
# Create an mlrun project
project = mlrun.get_or_create_project("diarization-test")
# Import the function from the hub
speech_diarization = project.set_function(func="hub://speech_diarization", name="speech_diarization")
> 2023-12-05 15:28:51,758 [info] Project loaded successfully: {'project_name': 'diarization-test'}
# Set the desired run params and files
audio_files = "test_data.wav"
device = "cpu"
speakers_labels = ["Agent", "Client"]
separate_by_channels = True
# Run the imported function with desired file/s and params
diarize_run = speech_diarization.run(
    handler="diarize",
    inputs={"data_path": audio_files},
    params={
        "device": device,
        "speakers_labels": speakers_labels,
        "separate_by_channels": separate_by_channels,
    },
    returns=["speech-diarization: file", "diarize-errors: file"],
    local=True,
)
> 2023-12-05 15:28:52,229 [info] Storing function: {'name': 'speech-diarization-diarize', 'uid': 'ec6cd014e4674966b30303ea14048acf', 'db': 'http://mlrun-api:8080'}
| project | uid | iter | start | state | name | labels | inputs | parameters | results | artifacts |
|---|---|---|---|---|---|---|---|---|---|---|
| diarization-test | | 0 | Dec 05 15:28:52 | completed | speech-diarization-diarize | v3io_user=zeevr kind=local owner=zeevr host=jupyter-zeev-gpu-5995df47dc-rtpvr | data_path | device=cpu speakers_labels=['Agent', 'Client'] separate_by_channels=True | | speech-diarization diarize-errors |
> to track results use the .show() or .logs() methods
> 2023-12-05 15:28:53,350 [info] Run execution finished: {'status': 'completed', 'name': 'speech-diarization-diarize'}
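Once the run has finished, the diarization can be turned into per-speaker talk time. The following is a minimal sketch, not part of the hub function. It assumes the `speech-diarization` result maps each audio file name to a list of `(start, end, speaker_label)` segments in seconds. The helper names `talk_time_per_speaker` and `participation_share` and the sample segments are hypothetical and are not output of the run above.

```python
from collections import defaultdict


def talk_time_per_speaker(segments):
    """Sum segment durations (end - start, in seconds) for each speaker label."""
    totals = defaultdict(float)
    for start, end, speaker in segments:
        totals[speaker] += end - start
    return dict(totals)


def participation_share(totals):
    """Convert talk time in seconds into each speaker's fraction of total speech."""
    total = sum(totals.values())
    if total == 0:
        return {speaker: 0.0 for speaker in totals}
    return {speaker: seconds / total for speaker, seconds in totals.items()}


# Hypothetical diarization for a single file (assumed structure, made-up values).
sample_diarization = {
    "test_data.wav": [
        (0.0, 4.0, "Agent"),
        (4.0, 10.0, "Client"),
        (10.0, 12.0, "Agent"),
    ]
}

for file_name, segments in sample_diarization.items():
    totals = talk_time_per_speaker(segments)
    shares = participation_share(totals)
    for speaker, seconds in totals.items():
        print(f"{file_name}: {speaker} spoke {seconds:.1f}s ({shares[speaker]:.0%})")
```

If the actual artifact uses a different layout, only the unpacking inside `talk_time_per_speaker` should need to change.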