MLflow tracker demo#

This demo shows how to integrate MLflow with MLRun and transfer MLflow logs to MLRun, so that experiments tracked with MLflow can be managed from a single platform.

You can combine MLflow and MLRun for a comprehensive solution for managing, tracking, and deploying machine learning models.

This notebook guides you through the process of:

  1. Setting up the integration between MLflow and MLRun.

  2. Extracting data, metrics, and artifacts from MLflow experiments.

  3. Creating MLRun artifacts and projects to organize and manage the transferred data.

  4. Leveraging MLRun’s capabilities for model deployment and data processing.

By the end of this demo, you will understand how to move data between MLflow and MLRun.

MLRun installation and configuration#

Before running this notebook, make sure the mlrun package is installed (pip install mlrun) and that you have configured access to the MLRun service.

# Install MLRun and scikit-learn if not already installed. Run this only once. Restart the notebook after the install!
# %pip install mlrun scikit-learn~=1.3.0

Then you can import the necessary packages.

import pandas as pd
import os
import mlrun
from mlrun.datastore.targets import ParquetTarget
import mlrun.feature_store as fstore

Create a project for this demo:

# Create a project for this demo:
project = mlrun.get_or_create_project(name="mlflow-tracking-example", context="./")
> 2024-03-27 15:34:40,940 [info] Project loaded successfully: {'project_name': 'mlflow-tracking-example-guy'}

Set all the necessary environment variables for the Databricks cluster:

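# Replace the placeholders with your Databricks workspace details
DATABRICKS_HOST = "add your host"
DATABRICKS_TOKEN = "add your token"
DATABRICKS_CLUSTER_ID = "add your cluster id"

# Expose the host and token to the local session
os.environ["DATABRICKS_HOST"] = DATABRICKS_HOST
os.environ["DATABRICKS_TOKEN"] = DATABRICKS_TOKEN

# Environment variables passed to the Databricks job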
job_env = {
    "DATABRICKS_HOST": DATABRICKS_HOST,
    "DATABRICKS_CLUSTER_ID": DATABRICKS_CLUSTER_ID
}
secrets = {"DATABRICKS_TOKEN": DATABRICKS_TOKEN}

# Set the secrets in the project
project.set_secrets(secrets)
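
The cell above hardcodes placeholder strings. As a minimal sketch, the same `job_env` and `secrets` dictionaries could instead be built from values that are already present in the environment. The helper `load_databricks_settings` and the example values below are hypothetical and are not part of MLRun.

```python
# Hypothetical helper (not part of MLRun): collect the Databricks settings from
# a mapping such as os.environ and fail early if any of them is missing.
REQUIRED_KEYS = ("DATABRICKS_HOST", "DATABRICKS_TOKEN", "DATABRICKS_CLUSTER_ID")


def load_databricks_settings(env):
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise ValueError(f"Missing Databricks settings: {', '.join(missing)}")
    # Non-sensitive values go into the job environment; the token is kept
    # apart so that it can be stored as a project secret.
    job_env = {
        "DATABRICKS_HOST": env["DATABRICKS_HOST"],
        "DATABRICKS_CLUSTER_ID": env["DATABRICKS_CLUSTER_ID"],
    }
    secrets = {"DATABRICKS_TOKEN": env["DATABRICKS_TOKEN"]}
    return job_env, secrets


# Placeholder values for illustration only; in practice pass os.environ.
example_env = {
    "DATABRICKS_HOST": "https://example.cloud.databricks.com",
    "DATABRICKS_TOKEN": "example-token",
    "DATABRICKS_CLUSTER_ID": "example-cluster-id",
}
job_env, secrets = load_databricks_settings(example_env)
print(job_env)
```

Keeping the token out of `job_env` mirrors the split used in this demo, where only the token is passed to `project.set_secrets`.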

Create a feature set and ingest data#

This is a short example of how to create a feature set about music preferences.

# create df
columns = ["id", "name", "age", "gender", "favorite_music_type"]
data = [
    (1, "Alice", 20, "f", "Pop"),
    (2, "Bob", 30, "m", "Rock"),
    (3, "Charlie", 25, "m", "Pop"),
    (4, "David", 40, "m", "Classical"),
    (5, "Eva", 18, "f", "Pop"),
    (6, "Frank", 32, "m", "Rock"),
    (7, "Grace", 28, "f", "Pop"),
    (8, "Henry", 45, "m", "Classical"),
    (9, "Ivy", 22, "f", "Pop"),
    (10, "Jack", 38, "m", "Classical"),
    (11, "Karen", 27, "f", "Pop"),
    (12, "Liam", 19, "m", "Pop"),
    (13, "Mia", 27, "f", "Rock"),
    (14, "Nora", 31, "f", "Rock"),
    (15, "Oliver", 29, "m", "Pop"),
    (16, "Ben", 38, "m", "Pop"),
    (17, "Alicia", 20, "f", "Pop"),
    (18, "Bobby", 30, "m", "Rock"),
    (19, "Charlien", 22, "f", "Pop"),
    (20, "Davide", 40, "m", "Classical"),
    (21, "Evans", 19, "m", "Pop"),
    (22, "Franklin", 34, "m", "Rock"),
    (23, "Grace", 22, "f", "Pop"),
    (24, "Henrik", 48, "m", "Classical"),
    (25, "eevee", 29, "f", "Pop"),
    (26, "Jack", 75, "m", "Classical"),
    (27, "Karen", 26, "f", "Pop"),
    (28, "Lian", 21, "f", "Pop"),
    (29, "kia", 27, "f", "Rock"),
    (30, "Novak", 30, "m", "Rock"),
    (31, "Olivia", 29, "f", "Pop"),
    (32, "Benjamin", 18, "m", "Pop")
]
df = pd.DataFrame(data, columns=columns)

Transfer the data to Databricks.

# Where to save the data in Databricks
target_path = "dbfs:///demos/mlrun_databricks_demo/music.parquet"
output_path = "dbfs:///demos/mlrun_databricks_demo/music_output_new.parquet"

targets = [ParquetTarget(path=target_path)]

# Create a feature set and ingest the data
fset = fstore.FeatureSet(name="music_fset", entities=[fstore.Entity("name")])
fstore.ingest(fset, df, targets=targets, overwrite=True)

# Get the target path and check it
dbfs_data_path = fset.get_target_path()
dbfs_data_path
'dbfs:///demos/mlrun_databricks_demo/1711553684480_33/music.parquet'

You can see how the data is logged in the Databricks cluster (only the top 20 rows are shown):

image.png
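
The paths in this section use the `dbfs:///` URI form, while code on a Databricks cluster that uses ordinary file APIs (such as pandas) works with the `/dbfs` mount path instead. Later in this demo the conversion is done with `str.replace`. The sketch below shows the same mapping with an explicit prefix check; `dbfs_uri_to_local_path` is a hypothetical helper name, not an MLRun or Databricks function.

```python
DBFS_URI_PREFIX = "dbfs://"
DBFS_MOUNT_PREFIX = "/dbfs"


def dbfs_uri_to_local_path(uri: str) -> str:
    """Hypothetical helper: map 'dbfs:///a/b' to '/dbfs/a/b'."""
    if not uri.startswith(DBFS_URI_PREFIX):
        raise ValueError(f"Not a dbfs:// URI: {uri}")
    # Strip the scheme and keep the leading slash of the path
    return DBFS_MOUNT_PREFIX + uri[len(DBFS_URI_PREFIX):]


local_path = dbfs_uri_to_local_path(
    "dbfs:///demos/mlrun_databricks_demo/music_output_new.parquet"
)
print(local_path)  # /dbfs/demos/mlrun_databricks_demo/music_output_new.parquet
```

Unlike a bare `replace`, the prefix check raises an error if a path that is not a DBFS URI is passed by mistake.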

Create a data processing function#

The following code demonstrates how to create a simple data processing function using MLRun. The function will process the data and show some statistics.

%%writefile process_data.py


# Example of Spark processing that runs on the Databricks cluster.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F


def process_data(data_path: str, data_output_path: str):
    spark = SparkSession.builder.appName("MusicDemo").getOrCreate()
    spark_df = spark.read.parquet(data_path)
    # Drop the identifying columns, which are not needed for the statistics
    spark_df = spark_df.drop("name", "id")

    # Age statistics per favorite music type
    music_stats = spark_df.groupBy("favorite_music_type").agg(
        F.avg("age").alias("avg_age"),
        F.min("age").alias("min_age"),
        F.max("age").alias("max_age"),
    )
    music_stats.show()

    # Write the processed data through pandas to the local /dbfs path
    pandas_df = spark_df.toPandas()
    pandas_df.to_parquet(data_output_path)

    # The returned dictionary (name -> path) is logged as an artifact
    return {"music_data": data_output_path}

# Register the script written above as an MLRun function that runs on Databricks
process_data_function = project.set_function(
    func="process_data.py",
    name="process-data",
    kind="databricks",
    image="mlrun/mlrun",
)

Set all parameters necessary for the function and run it.

# Pass the Databricks environment variables to the function
for name, val in job_env.items():
    process_data_function.spec.env.append({"name": name, "value": val})

params = {
    "task_parameters": {"timeout_minutes": 15},
    "data_path": dbfs_data_path,
    # pandas writes through the local /dbfs mount, so convert the dbfs:// URI
    "data_output_path": output_path.replace("dbfs://", "/dbfs"),
}
run = process_data_function.run(
    handler="process_data",
    params=params,
)
> 2024-03-27 15:34:45,422 [info] Storing function: {'name': 'process-data-process-data', 'uid': 'a9c770f8377046bda3061e61a5c015c2', 'db': 'http://mlrun-api:8080'}
> 2024-03-27 15:34:45,675 [info] Job is running in the background, pod: process-data-process-data-89bhh
> 2024-03-27 15:34:49,272 [info] Running with an existing cluster: {'cluster_id': '0327-134616-43m7kfxk'}
> 2024-03-27 15:34:49,492 [info] Starting to poll: 493449112310004
> 2024-03-27 15:34:49,539 [info] Workflow intermediate status: mlrun_task__15_34_48_703046: RunLifeCycleState.PENDING
> 2024-03-27 15:34:50,947 [info] Workflow intermediate status: mlrun_task__15_34_48_703046: RunLifeCycleState.PENDING
> 2024-03-27 15:34:53,063 [info] Workflow intermediate status: mlrun_task__15_34_48_703046: RunLifeCycleState.RUNNING
> 2024-03-27 15:34:56,737 [info] Workflow intermediate status: mlrun_task__15_34_48_703046: RunLifeCycleState.RUNNING
> 2024-03-27 15:35:00,947 [info] Artifacts found. Run name: mlrun_task__15_34_48_703046
> 2024-03-27 15:35:01,881 [info] Job finished: https://dbc-94c947ab-feb9.cloud.databricks.com/?o=4658245941722457#job/499259196347814/run/493449112310004
> 2024-03-27 15:35:01,881 [info] Logs:
+-------------------+------------------+-------+-------+
|favorite_music_type|           avg_age|min_age|max_age|
+-------------------+------------------+-------+-------+
|               Rock|            30.125|     27|     34|
|          Classical|47.666666666666664|     38|     75|
|                Pop|              24.0|     18|     38|
+-------------------+------------------+-------+-------+

2024-03-27 15:34:54,980 - mlrun_logger - INFO - successfully wrote artifact details to the artifact JSON file in DBFS - music_data : /dbfs/demos/mlrun_databricks_demo/music_output_new.parquet
> 2024-03-27 15:35:02,182 [info] To track results use the CLI: {'info_cmd': 'mlrun get run a9c770f8377046bda3061e61a5c015c2 -p mlflow-tracking-example-guy', 'logs_cmd': 'mlrun logs a9c770f8377046bda3061e61a5c015c2 -p mlflow-tracking-example-guy'}
> 2024-03-27 15:35:02,182 [info] Or click for UI: {'ui_url': 'https://dashboard.default-tenant.app.llm-dev.iguazio-cd1.com/mlprojects/mlflow-tracking-example-guy/jobs/monitor/a9c770f8377046bda3061e61a5c015c2/overview'}
> 2024-03-27 15:35:02,182 [info] Run execution finished: {'status': 'completed', 'name': 'process-data-process-data'}
project uid iter start state name labels inputs parameters results artifacts
mlflow-tracking-example-guy 0 Mar 27 15:34:48 completed process-data-process-data
v3io_user=zeevr
kind=databricks
owner=zeevr
mlrun/client_version=1.6.1
mlrun/client_python_version=3.9.16
host=process-data-process-data-89bhh
task_parameters={'timeout_minutes': 15, 'spark_app_code': 'IAoKaW1wb3J0IG9zCmltcG9ydCBsb2dnaW5nCm1scnVuX2xvZ2dlciA9IGxvZ2dpbmcuZ2V0TG9nZ2VyKCdtbHJ1bl9sb2dnZXInKQptbHJ1bl9sb2dnZXIuc2V0TGV2ZWwobG9nZ2luZy5ERUJVRykKCm1scnVuX2NvbnNvbGVfaGFuZGxlciA9IGxvZ2dpbmcuU3RyZWFtSGFuZGxlcigpCm1scnVuX2NvbnNvbGVfaGFuZGxlci5zZXRMZXZlbChsb2dnaW5nLkRFQlVHKQptbHJ1bl9mb3JtYXR0ZXIgPSBsb2dnaW5nLkZvcm1hdHRlcignJShhc2N0aW1lKXMgLSAlKG5hbWUpcyAtICUobGV2ZWxuYW1lKXMgLSAlKG1lc3NhZ2UpcycpCm1scnVuX2NvbnNvbGVfaGFuZGxlci5zZXRGb3JtYXR0ZXIobWxydW5fZm9ybWF0dGVyKQptbHJ1bl9sb2dnZXIuYWRkSGFuZGxlcihtbHJ1bl9jb25zb2xlX2hhbmRsZXIpCgptbHJ1bl9kZWZhdWx0X2FydGlmYWN0X3RlbXBsYXRlID0gJ21scnVuX3JldHVybl92YWx1ZV8nCm1scnVuX2FydGlmYWN0X2luZGV4ID0gMAoKCmRlZiBtbHJ1bl9sb2dfYXJ0aWZhY3QobmFtZT0nJywgcGF0aD0nJyk6CiAgICBnbG9iYWwgbWxydW5fYXJ0aWZhY3RfaW5kZXgKICAgIG1scnVuX2FydGlmYWN0X2luZGV4Kz0xICAjICBieSBob3cgbWFueSBhcnRpZmFjdHMgd2UgdHJpZWQgdG8gbG9nLCBub3QgaG93IG1hbnkgc3VjY2VlZC4KICAgIGlmIG5hbWUgaXMgTm9uZSBvciBuYW1lID09ICcnOgogICAgICAgIG5hbWUgPSBmJ3ttbHJ1bl9kZWZhdWx0X2FydGlmYWN0X3RlbXBsYXRlfXttbHJ1bl9hcnRpZmFjdF9pbmRleH0nCiAgICBpZiBub3QgcGF0aDoKICAgICAgICBtbHJ1bl9sb2dnZXIuZXJyb3IoZidwYXRoIHJlcXVpcmVkIGZvciBsb2dnaW5nIGFuIG1scnVuIGFydGlmYWN0IC0ge25hbWV9IDoge3BhdGh9JykKICAgICAgICByZXR1cm4KICAgIGlmIG5vdCBpc2luc3RhbmNlKG5hbWUsIHN0cikgb3Igbm90IGlzaW5zdGFuY2UocGF0aCwgc3RyKToKICAgICAgICBtbHJ1bl9sb2dnZXIuZXJyb3IoZiduYW1lIGFuZCBwYXRoIG11c3QgYmUgaW4gc3RyaW5nIHR5cGUgZm9yIGxvZ2dpbmcgYW4gbWxydW4gYXJ0aWZhY3QgLSB7bmFtZX0gOiB7cGF0aH0nKQogICAgICAgIHJldHVybgogICAgaWYgbm90IHBhdGguc3RhcnRzd2l0aCgnL2RiZnMnKSBhbmQgbm90IHBhdGguc3RhcnRzd2l0aCgnZGJmczovJyk6CiAgICAgICAgbWxydW5fbG9nZ2VyLmVycm9yKGYncGF0aCBmb3IgYW4gbWxydW4gYXJ0aWZhY3QgbXVzdCBzdGFydCB3aXRoIC9kYmZzIG9yIGRiZnM6LyAtIHtuYW1lfSA6IHtwYXRofScpCiAgICAgICAgcmV0dXJuCiAgICBtbHJ1bl9hcnRpZmFjdHNfcGF0aCA9ICcvZGJmcy9tbHJ1bl9kYXRhYnJpY2tzX3J1bnRpbWUvYXJ0aWZhY3RzX2RpY3Rpb25hcmllcy9tbHJ1bl9hcnRpZmFjdF9hOWM3NzBmODM3NzA0NmJkYTMwNjFlNjFhNWMwMTVjMi5qc29uJwogICAgdHJ5OgogICAgICAgIG5ld19kYXRhID0ge25hbWU6cGF0aH0KICAgICAgI
CBpZiBvcy5wYXRoLmV4aXN0cyhtbHJ1bl9hcnRpZmFjdHNfcGF0aCk6CiAgICAgICAgICAgIHdpdGggb3BlbihtbHJ1bl9hcnRpZmFjdHNfcGF0aCwgJ3IrJykgYXMganNvbl9maWxlOgogICAgICAgICAgICAgICAgZXhpc3RpbmdfZGF0YSA9IGpzb24ubG9hZChqc29uX2ZpbGUpCiAgICAgICAgICAgICAgICBleGlzdGluZ19kYXRhLnVwZGF0ZShuZXdfZGF0YSkKICAgICAgICAgICAgICAgIGpzb25fZmlsZS5zZWVrKDApCiAgICAgICAgICAgICAgICBqc29uLmR1bXAoZXhpc3RpbmdfZGF0YSwganNvbl9maWxlKQogICAgICAgIGVsc2U6CiAgICAgICAgICAgIHBhcmVudF9kaXIgPSBvcy5wYXRoLmRpcm5hbWUobWxydW5fYXJ0aWZhY3RzX3BhdGgpCiAgICAgICAgICAgIGlmIHBhcmVudF9kaXIgIT0gJy9kYmZzJzoKICAgICAgICAgICAgICAgIG9zLm1ha2VkaXJzKHBhcmVudF9kaXIsIGV4aXN0X29rPVRydWUpCiAgICAgICAgICAgIHdpdGggb3BlbihtbHJ1bl9hcnRpZmFjdHNfcGF0aCwgJ3cnKSBhcyBqc29uX2ZpbGU6CiAgICAgICAgICAgICAgICBqc29uLmR1bXAobmV3X2RhdGEsIGpzb25fZmlsZSkKICAgICAgICBzdWNjZXNzX2xvZyA9IGYnc3VjY2Vzc2Z1bGx5IHdyb3RlIGFydGlmYWN0IGRldGFpbHMgdG8gdGhlIGFydGlmYWN0IEpTT04gZmlsZSBpbiBEQkZTIC0ge25hbWV9IDoge3BhdGh9JwogICAgICAgIG1scnVuX2xvZ2dlci5pbmZvKHN1Y2Nlc3NfbG9nKQogICAgZXhjZXB0IEV4Y2VwdGlvbiBhcyB1bmtub3duX2V4Y2VwdGlvbjoKICAgICAgICBtbHJ1bl9sb2dnZXIuZXJyb3IoZidsb2cgbWxydW4gYXJ0aWZhY3QgZmFpbGVkIC0ge25hbWV9IDoge3BhdGh9LiBlcnJvcjoge3Vua25vd25fZXhjZXB0aW9ufScpCgoKCgppbXBvcnQgYXJncGFyc2UKaW1wb3J0IGpzb24KcGFyc2VyID0gYXJncGFyc2UuQXJndW1lbnRQYXJzZXIoKQpwYXJzZXIuYWRkX2FyZ3VtZW50KCdoYW5kbGVyX2FyZ3VtZW50cycpCmhhbmRsZXJfYXJndW1lbnRzID0gcGFyc2VyLnBhcnNlX2FyZ3MoKS5oYW5kbGVyX2FyZ3VtZW50cwpoYW5kbGVyX2FyZ3VtZW50cyA9IGpzb24ubG9hZHMoaGFuZGxlcl9hcmd1bWVudHMpCgoKZnJvbSBweXNwYXJrLnNxbCBpbXBvcnQgU3BhcmtTZXNzaW9uCmZyb20gcHlzcGFyay5zcWwuZnVuY3Rpb25zIGltcG9ydCBhdmcsIG1pbiwgbWF4CmltcG9ydCBwYW5kYXMgYXMgcGQKaW1wb3J0IGpzb24KaW1wb3J0IGZzc3BlYwoKZGVmIHByb2Nlc3NfZGF0YShkYXRhX3BhdGg6IHN0ciwgZGF0YV9vdXRwdXRfcGF0aDogc3RyKToKICAgIHNwYXJrID0gU3BhcmtTZXNzaW9uLmJ1aWxkZXIuYXBwTmFtZSgnTXVzaWNEZW1vJykuZ2V0T3JDcmVhdGUoKQogICAgc3BhcmtfZGYgPSBzcGFyay5yZWFkLnBhcnF1ZXQoZGF0YV9wYXRoLCBoZWFkZXI9VHJ1ZSkKICAgIHNwYXJrX2RmID0gc3BhcmtfZGYuZHJvcCgnbmFtZScsICdpZCcpCiAgICBtdXNpY19zdGF0cyA9IHNwYXJrX2RmLmdyb3VwQnkoJ2Zhdm9yaXRlX211c2ljX3R5c
GUnKS5hZ2coYXZnKCdhZ2UnKS5hbGlhcygnYXZnX2FnZScpLCBtaW4oJ2FnZScpLmFsaWFzKCdtaW5fYWdlJyksIG1heCgnYWdlJykuYWxpYXMoJ21heF9hZ2UnKSkKICAgIG11c2ljX3N0YXRzLnNob3coKQogICAgcGFuZGFzX2RmID0gc3BhcmtfZGYudG9QYW5kYXMoKQogICAgcGFuZGFzX2RmLnRvX3BhcnF1ZXQoZGF0YV9vdXRwdXRfcGF0aCkKICAgIHJldHVybiB7J211c2ljX2RhdGEnOiBkYXRhX291dHB1dF9wYXRofQpyZXN1bHQgPSBwcm9jZXNzX2RhdGEoKipoYW5kbGVyX2FyZ3VtZW50cykKCgppZiByZXN1bHQ6CiAgICBpZiBpc2luc3RhbmNlKHJlc3VsdCwgZGljdCk6CiAgICAgICAgZm9yIGtleSwgcGF0aCBpbiByZXN1bHQuaXRlbXMoKToKICAgICAgICAgICAgbWxydW5fbG9nX2FydGlmYWN0KG5hbWU9a2V5LCBwYXRoPXBhdGgpCiAgICBlbGlmIGlzaW5zdGFuY2UocmVzdWx0LCAobGlzdCwgdHVwbGUsIHNldCkpOgogICAgICAgIGZvciBhcnRpZmFjdF9wYXRoIGluIHJlc3VsdDoKICAgICAgICAgICAgbWxydW5fbG9nX2FydGlmYWN0KHBhdGg9YXJ0aWZhY3RfcGF0aCkKICAgIGVsaWYgaXNpbnN0YW5jZShyZXN1bHQsIHN0cik6CiAgICAgICAgbWxydW5fbG9nX2FydGlmYWN0KHBhdGg9cmVzdWx0KQogICAgZWxzZToKICAgICAgICBtbHJ1bl9sb2dnZXIud2FybmluZyhmJ2NhbiBub3QgbG9nIGFydGlmYWN0cyB3aXRoIHRoZSByZXN1bHQgb2YgaGFuZGxlciBmdW5jdGlvbiAtIHJlc3VsdCBpbiB1bnN1cHBvcnRlZCB0eXBlLiB7dHlwZShyZXN1bHQpfScpCg==', 'original_handler': 'process_data', 'artifact_json_path': '/mlrun_databricks_runtime/artifacts_dictionaries/mlrun_artifact_a9c770f8377046bda3061e61a5c015c2.json'}
data_path=dbfs:///demos/mlrun_databricks_demo/1711553684480_33/music.parquet
data_output_path=/dbfs/demos/mlrun_databricks_demo/music_output_new.parquet
music_data
databricks_run_metadata

> to track results use the .show() or .logs() methods or click here to open in UI
> 2024-03-27 15:35:07,910 [info] Run execution finished: {'status': 'completed', 'name': 'process-data-process-data'}
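
The per-genre statistics table printed in the job logs above can be cross-checked locally without a Spark cluster. The sketch below repeats the same aggregation (average, minimum, and maximum age per music type) in plain Python on the first six rows of the demo data only, so its numbers differ from the full-data table. The function name `age_stats_by_genre` is hypothetical.

```python
from collections import defaultdict

# (age, favorite_music_type) for the first six rows of the demo data
rows = [
    (20, "Pop"),
    (30, "Rock"),
    (25, "Pop"),
    (40, "Classical"),
    (18, "Pop"),
    (32, "Rock"),
]


def age_stats_by_genre(rows):
    # Group the ages by music type
    ages = defaultdict(list)
    for age, genre in rows:
        ages[genre].append(age)
    # Compute the same three aggregates as the Spark job
    return {
        genre: {
            "avg_age": sum(values) / len(values),
            "min_age": min(values),
            "max_age": max(values),
        }
        for genre, values in ages.items()
    }


stats = age_stats_by_genre(rows)
for genre, genre_stats in stats.items():
    print(genre, genre_stats)
```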

Create an MLflow XGBoost function#

The following code demonstrates how to create a simple XGBoost model using MLflow and log the results. MLflow logs the model, parameters, metrics, and artifacts, and MLRun tracks the run and collects the data.

%%writefile training.py

import mlflow
import mlflow.xgboost
import pandas as pd
import xgboost as xgb
from sklearn.metrics import accuracy_score, log_loss
from sklearn.model_selection import train_test_split


def example_xgb_run(df: str):
    # `df` is the path to a parquet file with the music-preference data
    df = pd.read_parquet(df)

    # Encode the categorical columns as integers
    df = df.replace(["f", "m"], [0, 1])
    df = df.replace(["Pop", "Rock", "Classical"], [0, 1, 2])

    # Split the data into features and labels
    y = df.pop("favorite_music_type")
    X = df

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42
    )

    # Enable auto logging
    mlflow.xgboost.autolog()

    dtrain = xgb.DMatrix(X_train, label=y_train)
    dtest = xgb.DMatrix(X_test, label=y_test)

    with mlflow.start_run():
        # Train model
        params = {
            "objective": "multi:softprob",
            "num_class": 3,
            "learning_rate": 0.3,
            "eval_metric": "mlogloss",
            "colsample_bytree": 1.0,
            "subsample": 1.0,
            "seed": 42,
        }
        model = xgb.train(params, dtrain, evals=[(dtrain, "train")])
        
        # Evaluate model
        y_proba = model.predict(dtest)
        y_pred = y_proba.argmax(axis=1)
        loss = log_loss(y_test, y_proba)
        acc = accuracy_score(y_test, y_pred)
        
        # Log metrics by hand
        mlflow.log_metrics({"log_loss": loss, "accuracy": acc})
Overwriting training.py
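
The evaluation step in `training.py` turns the `multi:softprob` probability rows into class predictions with `argmax` and then computes accuracy and log loss with scikit-learn. As a rough sketch of what those two metrics measure, the plain-Python version below computes them by hand on made-up probabilities. It is a simplification and not scikit-learn's implementation, and all function names are hypothetical.

```python
import math


def predicted_classes(y_proba):
    # Index of the highest probability in each row (argmax)
    return [max(range(len(row)), key=row.__getitem__) for row in y_proba]


def accuracy(y_true, y_proba):
    y_pred = predicted_classes(y_proba)
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def multiclass_log_loss(y_true, y_proba, eps=1e-15):
    # Mean negative log of the probability assigned to the true class
    total = 0.0
    for true_class, row in zip(y_true, y_proba):
        p = min(max(row[true_class], eps), 1 - eps)
        total -= math.log(p)
    return total / len(y_true)


# Made-up labels and probabilities for three classes
y_true = [0, 1, 2, 0]
y_proba = [
    [0.5, 0.25, 0.25],
    [0.25, 0.5, 0.25],
    [0.5, 0.25, 0.25],  # misclassified: the true class 2 only gets 0.25
    [0.5, 0.25, 0.25],
]

acc = accuracy(y_true, y_proba)
loss = multiclass_log_loss(y_true, y_proba)
print(acc, loss)
```

A high log loss next to a reasonable accuracy, as in the run shown later in this demo, indicates that some wrong predictions were made with high confidence.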

Log the data from MLflow in MLRun#

Change the MLRun configuration to use the tracker#

import mlrun

mlrun.mlconf.external_platform_tracking.enabled = True

There are three ways to run tracking:

  • Set mlrun.mlconf.external_platform_tracking.mlflow.match_experiment_to_runtime to True. This determines the run ID and is the safest method.

  • Set the experiment name using mlflow.environment_variables.MLFLOW_EXPERIMENT_NAME.set. This determines the experiment that MLRun tracks, and MLRun finds the run added to it.

  • Just run the function. MLRun searches across all experiments for the added run. This is not recommended.
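
For the second option in the list above, the experiment name has to be in place before the run starts. MLflow reads its default experiment name from the `MLFLOW_EXPERIMENT_NAME` environment variable, so a minimal sketch can set that variable directly. This assumes that the `.set` helper mentioned in the list writes the same environment variable. The experiment name used here is hypothetical.

```python
import os

# Hypothetical experiment name, used only for illustration
experiment_name = "music-preferences"

# Set the variable before the training function starts, so that the MLflow
# run is created under this experiment
os.environ["MLFLOW_EXPERIMENT_NAME"] = experiment_name

print(os.environ["MLFLOW_EXPERIMENT_NAME"])  # music-preferences
```

The warning in the run log later in this demo suggests that when match_experiment_to_runtime is also set to True, the MLRun runtime name takes precedence over this variable.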

Create the MLRun function#

# Use the first run option from above
mlrun.mlconf.external_platform_tracking.mlflow.match_experiment_to_runtime = True

# Create an MLRun function from the example training file (all the functions must be located in it):
training_func = project.set_function(
    func="training.py",
    name="example-xgb-run",
    kind="job",
    image="mlrun/mlrun",
)

Run the function#

Run the function using MLRun. This will log the data from MLflow in MLRun. After running the function, you can look at the UI and see that all metrics and parameters are logged in MLRun.

import mlrun.feature_store as fstore

# Read the ingested data back from the feature set
feature_set = fstore.get_feature_set("music_fset", project.name)
df = feature_set.to_dataframe()
df = df.drop(["id"], axis=1)

# Save the data locally so that it can be passed to the training function
df_path = "./music.parquet"
df.to_parquet(df_path)

# Run the example code using MLRun
train_run = training_func.run(
    local=True,
    handler="example_xgb_run",
    inputs={"df": df_path},
)
> 2024-03-27 15:37:22,829 [info] Storing function: {'name': 'example-xgb-run-example-xgb-run', 'uid': '6ff324dd21d64b6290d45a001957dda2', 'db': 'http://mlrun-api:8080'}
> 2024-03-27 15:37:22,912 [warning] `mlconf.external_platform_tracking.mlflow.match_experiment_to_runtime` is set to True but the MLFlow experiment name environment variable ('MLFLOW_EXPERIMENT_NAME') is set for using the name: 'example-xgb-run-example-xgb-run'. This name will be overriden with MLRun's runtime name as set in the MLRun configuration: 'example-xgb-run-example-xgb-run'.
[0]	train-mlogloss:0.82467
[1]	train-mlogloss:0.64706
[2]	train-mlogloss:0.52480
[3]	train-mlogloss:0.43768
[4]	train-mlogloss:0.37410
[5]	train-mlogloss:0.32686
[6]	train-mlogloss:0.29057
[7]	train-mlogloss:0.26192
[8]	train-mlogloss:0.23885
[9]	train-mlogloss:0.22004
2024/03/27 15:37:23 WARNING mlflow.utils.autologging_utils: MLflow autologging encountered a warning: "/User/.pythonlibs/mlrun-base/lib/python3.9/site-packages/mlflow/types/utils.py:393: UserWarning: Hint: Inferred schema contains integer column(s). Integer columns in Python cannot represent missing values. If your input data contains missing values at inference time, it will be encoded as floats and will cause a schema enforcement error. The best way to avoid this problem is to infer the model schema based on a realistic data sample (training dataset) that includes missing values. Alternatively, you can declare integer columns as doubles (float64) whenever these columns may have missing values. See `Handling Integers With Missing Values <https://www.mlflow.org/docs/latest/models.html#handling-integers-with-missing-values>`_ for more details."
2024/03/27 15:37:23 WARNING mlflow.utils.autologging_utils: MLflow autologging encountered a warning: "/User/.pythonlibs/mlrun-base/lib/python3.9/site-packages/xgboost/core.py:160: UserWarning: [15:37:23] WARNING: /workspace/src/c_api/c_api.cc:1240: Saving into deprecated binary model format, please consider using `json` or `ubj`. Model format will default to JSON in XGBoost 2.2 if not specified."
project: mlflow-tracking-example-guy
uid: 6ff324dd21d64b6290d45a001957dda2
iter: 0
start: Mar 27 15:37:22
state: completed
name: example-xgb-run-example-xgb-run
labels: v3io_user=zeevr, kind=local, owner=zeevr, host=jupyter-zeevr-9f4ffb7bb-8c4mf, mlflow-user=iguazio, mlflow-run-name=stately-cow-437, mlflow-run-id=f66d6149d54c4958a2485c941d86a538, mlflow-experiment-id=608717337209571124
inputs: df
parameters: colsample_bytree=1.0, custom_metric=None, early_stopping_rounds=None, eval_metric=mlogloss, learning_rate=0.3, maximize=None, num_boost_round=10, num_class=3, objective=multi:softprob, seed=42, subsample=1.0, verbose_eval=True
results: accuracy=0.7142857142857143, log_loss=0.9622776094122579, train-mlogloss=0.2200447738170624
artifacts: feature_importance_weight_json, feature_importance_weight_png, model

> to track results use the .show() or .logs() methods or click here to open in UI
> 2024-03-27 15:37:31,415 [info] Run execution finished: {'status': 'completed', 'name': 'example-xgb-run-example-xgb-run'}

Examine the results#

You can examine the results using the UI or by looking at the outputs of the run. The outputs include the model, the metrics, and the artifacts, and are completely independent of MLflow.

train_run.outputs
{'accuracy': 0.7142857142857143,
 'log_loss': 0.9622776094122579,
 'train-mlogloss': 0.2200447738170624,
 'feature_importance_weight_json': 'store://artifacts/mlflow-tracking-example-guy/example-xgb-run-example-xgb-run_feature_importance_weight_json@6ff324dd21d64b6290d45a001957dda2',
 'feature_importance_weight_png': 'store://artifacts/mlflow-tracking-example-guy/example-xgb-run-example-xgb-run_feature_importance_weight_png@6ff324dd21d64b6290d45a001957dda2',
 'model': 'store://artifacts/mlflow-tracking-example-guy/example-xgb-run-example-xgb-run_model@6ff324dd21d64b6290d45a001957dda2'}
train_run.status.results
{'accuracy': 0.7142857142857143,
 'log_loss': 0.9622776094122579,
 'train-mlogloss': 0.2200447738170624}
train_run.artifact("feature_importance_weight_png").show()
_images/3edebfb8ca78ff860681bc18084495dd1fefd44d69ead3b64c7b3e0b0170adbb.png
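
In the output shown above, `train_run.outputs` contains both the scalar results and the artifact URIs, while `train_run.status.results` contains only the scalars. The sketch below separates the two groups by checking for the `store://` prefix, using the values copied from the output above. `split_outputs` is a hypothetical helper, not an MLRun function.

```python
# Values copied from the train_run.outputs cell above
outputs = {
    "accuracy": 0.7142857142857143,
    "log_loss": 0.9622776094122579,
    "train-mlogloss": 0.2200447738170624,
    "feature_importance_weight_json": "store://artifacts/mlflow-tracking-example-guy/example-xgb-run-example-xgb-run_feature_importance_weight_json@6ff324dd21d64b6290d45a001957dda2",
    "feature_importance_weight_png": "store://artifacts/mlflow-tracking-example-guy/example-xgb-run-example-xgb-run_feature_importance_weight_png@6ff324dd21d64b6290d45a001957dda2",
    "model": "store://artifacts/mlflow-tracking-example-guy/example-xgb-run-example-xgb-run_model@6ff324dd21d64b6290d45a001957dda2",
}


def is_artifact_uri(value):
    return isinstance(value, str) and value.startswith("store://")


def split_outputs(outputs):
    # Scalars are results; store:// URIs point to logged artifacts
    results = {k: v for k, v in outputs.items() if not is_artifact_uri(v)}
    artifacts = {k: v for k, v in outputs.items() if is_artifact_uri(v)}
    return results, artifacts


results, artifacts = split_outputs(outputs)
print(sorted(results))
print(sorted(artifacts))
```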

You can also examine the results using the UI#

Look at collected artifacts:

image.png

And at results:

image.png

Use the function for model serving#

Create the server and serving function#

Create a serving function that uses the model from the previous run and serves it using MLRun. We will create a mock server to test the model in a local environment.

serving_func = project.set_function(
    func="function.yaml",
    name="example-xgb-server",
)
# Add the model
serving_func.add_model(
    "mlflow_xgb_model",
    class_name="MLFlowModelServer",
    model_path=train_run.outputs["model"],
)
<mlrun.serving.states.TaskStep at 0x7f77c3e4c9a0>
# Create a mock server
server = serving_func.to_mock_server()
> 2024-03-27 15:37:31,627 [info] model mlflow_xgb_model was loaded
> 2024-03-27 15:37:31,628 [info] Loaded ['mlflow_xgb_model']

Test the model#

# An arbitrary example: a 20-year-old female (gender 0 encodes "f")
result = server.test("/v2/models/mlflow_xgb_model/predict", {"inputs": [{"age": 20, "gender": 0}]})
# The result shows the probability that the example belongs to each of the
# music types in the dataset (0 = Pop, 1 = Rock, 2 = Classical)
result
{'id': '43a61d06f2694fa695bdd6561b487131',
 'model_name': 'mlflow_xgb_model',
 'outputs': [[0.9242361187934875, 0.0418272465467453, 0.033936627209186554]]}

The model predicts that a 20-year-old female most likely prefers Pop (about 92% probability).
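
The server returns one probability per class, in the order of the integer encoding used in `training.py` (Pop = 0, Rock = 1, Classical = 2). As a small sketch, the probability row from the response above can be mapped back to a label as follows. `decode_prediction` is a hypothetical helper.

```python
# Class order follows the encoding in training.py: Pop=0, Rock=1, Classical=2
CLASS_NAMES = ["Pop", "Rock", "Classical"]


def decode_prediction(probabilities):
    # Pick the class with the highest probability
    best = max(range(len(probabilities)), key=probabilities.__getitem__)
    return CLASS_NAMES[best], probabilities[best]


# Probability row copied from the server response above
outputs = [[0.9242361187934875, 0.0418272465467453, 0.033936627209186554]]
label, probability = decode_prediction(outputs[0])
print(label, round(probability, 3))  # Pop 0.924
```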