Speech diarization example notebook

In this notebook we use a speech diarization function to measure how long each speaker talks in a call recording.
This is useful for quantifying participation in calls, for example in customer service analysis.

We will demonstrate this by:

  1. Loading in a sample call recording between multiple participants

  2. Using a diarize() function to automatically detect speakers and estimate per-speaker talk time

  3. Returning a dictionary of diarization results and a DataFrame of errors

import os
import mlrun
# To use the `pyannote.audio` models you must pass a Hugging Face token and get access to the required models. The
#    token can be passed in one of the following ways:
#
#    * Use the parameter `access_token`.
#    * Set an environment variable named "HUGGING_FACE_HUB_TOKEN".
#    * If using MLRun, you can pass it as a secret named "HUGGING_FACE_HUB_TOKEN".
os.environ["HUGGING_FACE_HUB_TOKEN"] = "<add your token here>"
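A missing or placeholder token may only surface later, when the models fail to download. The following is a minimal sketch of a check that can run before the function is invoked. The helper name `require_hf_token` is hypothetical and is not part of MLRun or `pyannote.audio`; only the environment variable name comes from the comment above.

```python
import os

# Environment variable name taken from the notebook comment above.
TOKEN_ENV_VAR = "HUGGING_FACE_HUB_TOKEN"


def require_hf_token(environ=os.environ):
    """Hypothetical helper: return the token, or raise a clear error.

    Treats an empty value or an unreplaced "<...>" placeholder as missing.
    """
    token = environ.get(TOKEN_ENV_VAR, "").strip()
    if not token or token.startswith("<"):
        raise RuntimeError(
            f"Set the {TOKEN_ENV_VAR} environment variable to a valid token."
        )
    return token
```

Calling `require_hf_token()` right after setting the variable fails fast with a readable message instead of a download error further down the notebook.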
# Create an mlrun project
project = mlrun.get_or_create_project("diarization-test")

# Import the function from the function hub
speech_diarization = project.set_function(func="hub://speech_diarization", name="speech_diarization")
> 2023-12-05 15:28:51,758 [info] Project loaded successfully: {'project_name': 'diarization-test'}
# Set the desired run params and files
audio_files = "test_data.wav"
device = "cpu"
speakers_labels = ["Agent", "Client"]
separate_by_channels = True
# Run the imported function with desired file/s and params
diarize_run = speech_diarization.run(
    handler="diarize",
    inputs={"data_path": audio_files},
    params={
        "device": device,
        "speakers_labels": speakers_labels,
        "separate_by_channels": separate_by_channels,
    },
    returns=["speech-diarization: file", "diarize-errors: file"],
    local=True,
)
> 2023-12-05 15:28:52,229 [info] Storing function: {'name': 'speech-diarization-diarize', 'uid': 'ec6cd014e4674966b30303ea14048acf', 'db': 'http://mlrun-api:8080'}
project:    diarization-test
iter:       0
start:      Dec 05 15:28:52
state:      completed
name:       speech-diarization-diarize
labels:     v3io_user=zeevr, kind=local, owner=zeevr, host=jupyter-zeev-gpu-5995df47dc-rtpvr
inputs:     data_path
parameters: device=cpu, speakers_labels=['Agent', 'Client'], separate_by_channels=True
artifacts:  speech-diarization, diarize-errors

> to track results use the .show() or .logs() methods
> 2023-12-05 15:28:53,350 [info] Run execution finished: {'status': 'completed', 'name': 'speech-diarization-diarize'}
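Once the run completes, per-speaker talk time can be derived from the diarization result. The sketch below assumes the `speech-diarization` result is a dictionary that maps each file name to a list of `(start, end, speaker)` tuples with times in seconds; that layout is an assumption and should be checked against the function's documentation. The segment values are made up for illustration and are not output from the run above.

```python
from collections import defaultdict


def talk_time_per_speaker(segments):
    """Sum segment durations (in seconds) for each speaker label."""
    totals = defaultdict(float)
    for start, end, speaker in segments:
        totals[speaker] += end - start
    return dict(totals)


def participation_rates(totals):
    """Convert per-speaker totals into fractions of the overall talk time."""
    overall = sum(totals.values())
    if overall == 0:
        return {speaker: 0.0 for speaker in totals}
    return {speaker: duration / overall for speaker, duration in totals.items()}


# Illustrative data only: assumed layout {file_name: [(start, end, speaker), ...]}.
diarization = {
    "test_data.wav": [
        (0.0, 4.0, "Agent"),
        (4.0, 6.0, "Client"),
        (6.0, 8.0, "Agent"),
    ]
}

for file_name, segments in diarization.items():
    totals = talk_time_per_speaker(segments)
    rates = participation_rates(totals)
    print(file_name, totals, rates)
```

With these sample segments the Agent speaks for 6 seconds and the Client for 2, giving participation rates of 0.75 and 0.25.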