
MLRun Demos

The mlrun/demos repository provides demos that implement full end-to-end ML use-case applications with MLRun and demonstrate different aspects of working with MLRun.

For more information about the MLRun Hackathon, refer to the hackathon getting-started section.


Overview

The MLRun demos are end-to-end use-case applications that leverage MLRun to implement complete machine-learning (ML) pipelines — including data collection and preparation, model training, and deployment automation.

The demos demonstrate how you can

The demo applications are tested on the Iguazio Data Science Platform (“the platform”) and use its shared data fabric, which is accessible via the v3io file-system mount; if you’re not already a platform user, request a free trial.

General ML Workflow

The provided demos implement some or all of the ML workflow steps illustrated in the following image:

ML workflow

Prerequisites

To run the MLRun demos, first do the following:

Getting-started Tutorial

The tutorial covers MLRun fundamentals such as creation of projects and data ingestion and preparation, and demonstrates how to create an end-to-end machine-learning (ML) pipeline. MLRun is integrated as a default (pre-deployed) shared service in the Iguazio Data Science Platform.

You’ll learn how to

You’ll also learn about the basic concepts, components, and APIs that allow you to perform these tasks, including

How-To: Converting Existing ML Code to an MLRun Project

The converting-to-mlrun how-to demo demonstrates how to convert existing ML code to an MLRun project. The demo implements an MLRun project for taxi ride-fare prediction based on a Kaggle notebook with an ML Python script that uses data from the New York City Taxi Fare Prediction competition.

The code includes the following components:

  1. Data ingestion
  2. Data cleaning and preparation
  3. Model training
  4. Model serving
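The four components above can be pictured as four small functions chained together. The following is a minimal plain-Python sketch of that split, not the demo's actual code and not the MLRun API: the function names, the synthetic rows, and the single-feature linear model are all assumptions chosen to keep the example short.

```python
from statistics import mean

# Synthetic rows for illustration only (not taken from the Kaggle data set).
RAW_ROWS = [
    {"distance_km": 1.0, "fare": 4.0},
    {"distance_km": 2.0, "fare": 5.5},
    {"distance_km": 4.0, "fare": 8.5},
    {"distance_km": 0.0, "fare": 3.0},   # zero-length trip, dropped in cleaning
    {"distance_km": 3.0, "fare": -1.0},  # negative fare, dropped in cleaning
]


def ingest():
    """Data ingestion: return raw rows (a real project would read a file or URL)."""
    return list(RAW_ROWS)


def clean(rows):
    """Data cleaning and preparation: keep rows with positive distance and fare."""
    return [r for r in rows if r["distance_km"] > 0 and r["fare"] > 0]


def train(rows):
    """Model training: fit fare = intercept + slope * distance by least squares."""
    xs = [r["distance_km"] for r in rows]
    ys = [r["fare"] for r in rows]
    x_bar, y_bar = mean(xs), mean(ys)
    covariance = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    variance = sum((x - x_bar) ** 2 for x in xs)
    slope = covariance / variance
    return {"intercept": y_bar - slope * x_bar, "slope": slope}


def serve(model, distance_km):
    """Model serving: predict a fare for a single request."""
    return model["intercept"] + model["slope"] * distance_km


# Chain the four stages, as a pipeline would.
model = train(clean(ingest()))
prediction = serve(model, 10.0)
```

Keeping each stage as a separate function with explicit inputs and outputs is what makes it possible to later register the stages individually as steps of a project pipeline.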

Pipeline Output

converting-to-mlrun pipeline output

Integrating with CI Pipelines

The CI Pipeline demo demonstrates how to build a full end-to-end automated-ML pipeline using scikit-learn and the UCI Iris data set.

You might want to run your ML pipelines from a CI framework such as GitHub Actions or GitLab CI/CD. MLRun supports simple, native integration with these CI systems. The following example combines local code (from the repository) with MLRun marketplace functions to build an automated ML pipeline which:

By default, the demo uses Slack notifications. To send Slack notifications, you need to create a Slack app and enable webhooks. This process is straightforward and should take a few minutes. For more information, see the Slack documentation.
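Once a webhook exists, posting a message to it is a single HTTP POST with a JSON body that contains a `text` field. The sketch below shows this with the Python standard library only. It is not how the demo itself sends notifications; the `SLACK_WEBHOOK_URL` environment variable name, the function names, and the placeholder URL are assumptions made for this example.

```python
import json
import os
import urllib.request


def build_slack_request(webhook_url, text):
    """Build a POST request whose JSON body carries a single "text" field."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def notify(text):
    """Send the message only if SLACK_WEBHOOK_URL (hypothetical name) is set.

    Returns True when the webhook answers with HTTP 200, False when no
    webhook URL is configured.
    """
    webhook_url = os.environ.get("SLACK_WEBHOOK_URL")
    if not webhook_url:
        return False
    request = build_slack_request(webhook_url, text)
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status == 200


# Building the request does not touch the network, so it is safe to inspect.
example = build_slack_request("https://example.invalid/webhook", "Pipeline finished")
```

Reading the webhook URL from the environment keeps it out of the repository, which matters when the pipeline runs from a CI system.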

scikit-learn tress image

Model Deployment Pipeline: Real-Time Operational Pipeline

This demo shows how to deploy a model with streaming information.

This demo comprises several steps:

Model deployment Pipeline Real-time operational Pipeline

Note: This demo uses the multi-model data layer (V3IO), primarily for real-time streaming. Contact Iguazio to get credentials to access a V3IO system. To test access to the V3IO API, see the v3io-api test notebook.

While this demo covers the use case of first-day churn, you can replace the data, the related features, and the training model, and reuse the same workflow for other business cases.

These steps are covered by the following pipeline:

Healthcare Demo with Feature Store

This demo shows how to use MLRun with the feature store. The demo showcases:

Healthcare facilities need to closely monitor their patients and identify early signs that medical intervention is necessary. Time is a key factor: the earlier the medical teams can attend to an issue, the better the outcome. An effective system that can alert staff to issues in real time can therefore save lives.

In this demo, we learn how to ingest different data sources into the feature store. Specifically, this patient data has been successfully used to treat hospitalized COVID-19 patients prior to their condition becoming severe or critical. To do this, we use a medical dataset that includes three types of data:

Note: This demo uses the multi-model data layer (V3IO), primarily for real-time streaming. Contact Iguazio to get credentials to access a V3IO system. To test access to the V3IO API, see the v3io-api test notebook.