Speech diarization example notebook

In this notebook we use a speech diarization function to get per-speaker speech durations from a call recording.
This is useful for quantifying participation in calls, for example in customer service analysis.

We will demonstrate this by:

Loading a sample call recording between multiple participants
Using the diarize() function to automatically detect speakers and estimate per-speaker talk time
Returning a dictionary of the diarization results and a DataFrame of errors
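
To make "per-speaker talk time" concrete, here is a minimal sketch of how talk time and participation share could be derived from diarization segments. It assumes each segment is a (start, end, speaker) tuple in seconds; the sample segments and the helper name talk_time_per_speaker are hypothetical and are not part of the hub function.

```python
from collections import defaultdict

# Hypothetical diarization segments: (start_seconds, end_seconds, speaker_label).
segments = [
    (0.0, 4.5, "Agent"),
    (4.5, 10.0, "Client"),
    (10.0, 12.5, "Agent"),
]


def talk_time_per_speaker(segments):
    """Sum the duration of every segment for each speaker label."""
    totals = defaultdict(float)
    for start, end, speaker in segments:
        totals[speaker] += end - start
    return dict(totals)


talk_time = talk_time_per_speaker(segments)

# Participation share: each speaker's talk time divided by the total talk time.
total = sum(talk_time.values())
participation = {speaker: seconds / total for speaker, seconds in talk_time.items()}

print(talk_time)
print(participation)
```

Overlapping speech is counted once per speaker in this sketch, so the shares describe talk time rather than wall-clock time.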

import os
import mlrun

# To use the `pyannote.audio` models you must pass a Hugging Face token and get access to the required models. The
#    token can be passed in one of the following ways:
#
#    * Use the parameter `access_token`.
#    * Set an environment variable named "HUGGING_FACE_HUB_TOKEN".
#    * If using MLRun, you can pass it as a secret named "HUGGING_FACE_HUB_TOKEN".
os.environ["HUGGING_FACE_HUB_TOKEN"] = "<add your token here>"

# Create an mlrun project
project = mlrun.get_or_create_project("diarization-test")

# Import the function from the hub
speech_diarization = project.set_function(func="hub://speech_diarization", name="speech_diarization")

> 2023-12-05 15:28:51,758 [info] Project loaded successfully: {'project_name': 'diarization-test'}
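
Because the token can come from more than one place, it can help to check that one is available before starting a run. The sketch below is one possible pre-flight check, assuming an explicit token should take precedence over the environment variable; resolve_hf_token is a hypothetical helper and does not reflect how MLRun or the hub function resolves the token internally.

```python
import os


def resolve_hf_token(access_token=None, env_var="HUGGING_FACE_HUB_TOKEN"):
    """Return the explicit token if given, otherwise the environment variable.

    Raises ValueError when neither source provides a token.
    """
    token = access_token or os.environ.get(env_var)
    if not token:
        raise ValueError(
            f"No Hugging Face token found; pass one explicitly or set {env_var}."
        )
    return token


# Usage with a placeholder value (not a real token).
token = resolve_hf_token(access_token="<add your token here>")
```

Failing early this way gives a clearer message than an authentication error raised part-way through model loading.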

# Set the desired run params and files
audio_files = os.path.join("test_data.wav")
device = "cpu"
speakers_labels = ["Agent", "Client"]
separate_by_channels = True

# Run the imported function with desired file/s and params
diarize_run = speech_diarization.run(
    handler="diarize",
    inputs={"data_path": audio_files},
    params={
        "device": device,
        "speakers_labels": speakers_labels,
        "separate_by_channels": separate_by_channels,
    },
    returns=["speech-diarization: file", "diarize-errors: file"],
    local=True,
)

> 2023-12-05 15:28:52,229 [info] Storing function: {'name': 'speech-diarization-diarize', 'uid': 'ec6cd014e4674966b30303ea14048acf', 'db': 'http://mlrun-api:8080'}

project	uid	iter	start	state	name	labels	inputs	parameters	results	artifacts
diarization-test	...14048acf	0	Dec 05 15:28:52	completed	speech-diarization-diarize	v3io_user=zeevr kind=local owner=zeevr host=jupyter-zeev-gpu-5995df47dc-rtpvr	data_path	device=cpu speakers_labels=['Agent', 'Client'] separate_by_channels=True		speech-diarization diarize-errors

> to track results use the .show() or .logs() methods

> 2023-12-05 15:28:53,350 [info] Run execution finished: {'status': 'completed', 'name': 'speech-diarization-diarize'}
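
Once the run completes, the logged artifacts can be post-processed. The sketch below counts speaker turns per file, assuming the speech-diarization artifact has been downloaded as a JSON file that maps each audio file name to a list of [start, end, speaker] segments. That layout, the file name speech-diarization.json, and the sample values are assumptions for illustration; the sample is written to a temporary file so the code runs without MLRun.

```python
import json
import os
import tempfile
from collections import Counter

# Hypothetical artifact content: file name -> list of [start, end, speaker] segments.
sample = {
    "test_data.wav": [
        [0.0, 4.5, "Agent"],
        [4.5, 10.0, "Client"],
        [10.0, 12.5, "Agent"],
    ]
}

with tempfile.TemporaryDirectory() as tmp_dir:
    # Stand-in for the downloaded artifact file.
    path = os.path.join(tmp_dir, "speech-diarization.json")
    with open(path, "w") as f:
        json.dump(sample, f)

    # Load the diarization results back from disk.
    with open(path) as f:
        diarization = json.load(f)

# Count how many segments (turns) each speaker has in every file.
turns_per_file = {
    name: dict(Counter(speaker for _, _, speaker in segments))
    for name, segments in diarization.items()
}

print(turns_per_file)
```

The same loop can feed the per-speaker durations into a report or dashboard once the actual artifact layout has been confirmed.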