{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# MLRun CI Example\n", "\n", "Users may want to run their ML pipelines using CI frameworks such as GitHub Actions or GitLab CI/CD. MLRun supports simple, native integration with these CI systems. The following example combines local code (from the repository) with MLRun marketplace functions to build an automated ML pipeline that:\n", "\n", "- Runs data preparation\n", "- Trains a model\n", "- Tests the trained model\n", "- Deploys the model into a cluster\n", "- Tests the deployed model" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Collecting python-dotenv\n", " Downloading python_dotenv-0.17.1-py2.py3-none-any.whl (18 kB)\n", "Installing collected packages: python-dotenv\n", "Successfully installed python-dotenv-0.17.1\n", "\u001b[33mWARNING: You are using pip version 20.2.4; however, version 21.1.2 is available.\n", "You should consider upgrading via the '/opt/conda/bin/python -m pip install --upgrade pip' command.\u001b[0m\n", "Note: you may need to restart the kernel to use updated packages.\n" ] } ], "source": [ "%pip install python-dotenv" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This example shows how to run an entire CI pipeline with notifications.\n", "To run this example with Slack notifications, follow Slack's instructions to create an app, select the Incoming Webhooks feature, and switch on the Activate Incoming Webhooks toggle.\n", "Once you have a webhook URL, set the `SLACK_WEBHOOK` environment variable in the `env.txt` file." 
] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from dotenv import load_dotenv\n", "\n", "load_dotenv('env.txt')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The code below performs the following steps:\n", "\n", "- Ingest the iris data\n", "- Train and test the model\n", "- Deploy the model as a real-time serverless function" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Pipeline started in project ci, check progress in http://localhost:30060/projects/ci/jobs\n" ] }, { "data": { "text/html": [ "Pipeline started in project ci
click here to check progress
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "> 2021-05-24 01:05:22,539 [info] starting run prep_data uid=efc393357d8a4ae5bf8d7456b6b9cce0 DB=http://mlrun-api:8080\n", "> 2021-05-24 01:05:22,694 [info] Job is running in the background, pod: prep-data-2rn2s\n", "> 2021-05-24 01:05:31,730 [info] run executed, status=completed\n", "final state: completed\n" ] }, { "data": { "text/html": [ "\n", "
\n", "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
project | uid | iter | start | state | name | labels | inputs | parameters | results | artifacts
ci | | 0 | May 24 01:05:30 | completed | prep_data | kind=job, owner=jovyan, host=prep-data-2rn2s | source_url | | num_rows=150 | cleaned_data
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "to track results use .show() or .logs() or in CLI: \n", "!mlrun get run efc393357d8a4ae5bf8d7456b6b9cce0 --project ci , !mlrun logs efc393357d8a4ae5bf8d7456b6b9cce0 --project ci\n", "> 2021-05-24 01:05:32,976 [info] run executed, status=completed\n", "> 2021-05-24 01:05:33,595 [info] starting run train uid=01b7ae42329840cc9433f5a6eae47da8 DB=http://mlrun-api:8080\n", "> 2021-05-24 01:05:33,719 [info] Job is running in the background, pod: train-7trd4\n", "> 2021-05-24 01:05:43,457 [info] run executed, status=completed\n", "final state: completed\n" ] }, { "data": { "text/html": [ "\n", "
\n", "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
project | uid | iter | start | state | name | labels | inputs | parameters | results | artifacts
ci | | 0 | May 24 01:05:41 | completed | train | kind=job, owner=jovyan, host=train-7trd4, class=sklearn.linear_model.LogisticRegression | dataset | model_pkg_class=sklearn.linear_model.LogisticRegression, label_column=label | accuracy=0.9375, test-error=0.0625, auc-micro=0.9921875, auc-weighted=1.0, f1-score=0.9206349206349206, precision_score=0.9047619047619048, recall_score=0.9555555555555556 | test_set, confusion-matrix, precision-recall-multiclass, roc-multiclass, model
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "to track results use .show() or .logs() or in CLI: \n", "!mlrun get run 01b7ae42329840cc9433f5a6eae47da8 --project ci , !mlrun logs 01b7ae42329840cc9433f5a6eae47da8 --project ci\n", "> 2021-05-24 01:05:45,420 [info] run executed, status=completed\n", "> 2021-05-24 01:05:45,856 [info] starting run test uid=6118eedd515048c9a10adffe1f9d19eb DB=http://mlrun-api:8080\n", "> 2021-05-24 01:05:45,968 [info] Job is running in the background, pod: test-hn4xg\n", "> 2021-05-24 01:05:51,291 [info] run executed, status=completed\n", "final state: completed\n" ] }, { "data": { "text/html": [ "\n", "
\n", "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
project | uid | iter | start | state | name | labels | inputs | parameters | results | artifacts
ci | | 0 | May 24 01:05:50 | completed | test | kind=job, owner=jovyan, host=test-hn4xg | models_path, test_set | label_column=label | accuracy=0.9777777777777777, test-error=0.022222222222222223, auc-micro=0.9985185185185185, auc-weighted=0.9985392720306513, f1-score=0.9769016328156113, precision_score=0.9761904761904763, recall_score=0.9791666666666666 | confusion-matrix, precision-recall-multiclass, roc-multiclass, test_set_preds
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "to track results use .show() or .logs() or in CLI: \n", "!mlrun get run 6118eedd515048c9a10adffe1f9d19eb --project ci , !mlrun logs 6118eedd515048c9a10adffe1f9d19eb --project ci\n", "> 2021-05-24 01:05:53,282 [info] run executed, status=completed\n", "pipeline run finished\n", "status name uid results\n", "--------- --------- -------- -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------\n", "completed prep_data ..b9cce0 num_rows=150\n", "completed train ..e47da8 accuracy=0.9375,test-error=0.0625,auc-micro=0.9921875,auc-weighted=1.0,f1-score=0.9206349206349206,precision_score=0.9047619047619048,recall_score=0.9555555555555556\n", "completed test ..9d19eb accuracy=0.9777777777777777,test-error=0.022222222222222223,auc-micro=0.9985185185185185,auc-weighted=0.9985392720306513,f1-score=0.9769016328156113,precision_score=0.9761904761904763,recall_score=0.9791666666666666\n" ] }, { "data": { "text/html": [ "

Run Results

pipeline run finished
click the hyper links below to see detailed results
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
uid | start | state | name | results | artifacts
 | May 24 01:05:30 | completed | prep_data | num_rows=150 | cleaned_data
 | May 24 01:05:41 | completed | train | accuracy=0.9375, test-error=0.0625, auc-micro=0.9921875, auc-weighted=1.0, f1-score=0.9206349206349206, precision_score=0.9047619047619048, recall_score=0.9555555555555556 | test_set, confusion-matrix, precision-recall-multiclass, roc-multiclass, model
 | May 24 01:05:50 | completed | test | accuracy=0.9777777777777777, test-error=0.022222222222222223, auc-micro=0.9985185185185185, auc-weighted=0.9985392720306513, f1-score=0.9769016328156113, precision_score=0.9761904761904763, recall_score=0.9791666666666666 | confusion-matrix, precision-recall-multiclass, roc-multiclass, test_set_preds
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "> 2021-05-24 01:05:54,489 [info] Starting remote function deploy\n", "2021-05-24 01:05:54 (info) Deploying function\n", "2021-05-24 01:05:54 (info) Building\n", "2021-05-24 01:05:54 (info) Staging files and preparing base images\n", "2021-05-24 01:05:54 (info) Building processor image\n", "2021-05-24 01:06:34 (info) Build complete\n", "2021-05-24 01:06:44 (info) Function deploy complete\n", "> 2021-05-24 01:06:45,777 [info] function deployed, address=192.168.65.4:30843\n", "model iris is deployed at http://192.168.65.4:30843\n" ] }, { "data": { "text/html": [ "model iris is deployed at http://192.168.65.4:30843" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "model iris test passed Ok\n" ] }, { "data": { "text/html": [ "model iris test passed Ok" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "import json\n", "from mlrun.utils import RunNotifications\n", "import mlrun\n", "from mlrun.platforms import auto_mount\n", "\n", "project = \"ci\"\n", "mlrun.set_environment(project=project)\n", "\n", "# create notification object (console, Git, Slack as outputs) and push start message\n", "notifier = RunNotifications(with_slack=True).print()\n", "\n", "\n", "# Use the following line only when running inside Github actions or Gitlab CI.\n", "# The `GITHUB_TOKEN` environment variable be set automatically in Github Actions\n", "# When running from GitLab, set the `GIT_TOKEN` environment variable\n", "#notifier.git_comment()\n", "\n", "notifier.push_start_message(project)\n", "\n", "# define and run a local data prep function\n", "data_prep_func = mlrun.code_to_function(\"prep-data\", filename=\"./functions/prep_data.py\", kind=\"job\",\n", " image=\"mlrun/mlrun\", handler=\"prep_data\").apply(auto_mount())\n", "\n", "# 
Set the source-data URL\n", "source_url = 'https://s3.wasabisys.com/iguazio/data/iris/iris.data.raw.csv'\n", "prep_data_run = data_prep_func.run(name='prep_data', inputs={'source_url': source_url})\n", "\n", "# Train the model using a library (hub://) function and the generated data\n", "train = mlrun.import_function('hub://sklearn_classifier').apply(auto_mount())\n", "train_run = train.run(name='train',\n", " inputs={'dataset': prep_data_run.outputs['cleaned_data']},\n", " params={'model_pkg_class': 'sklearn.linear_model.LogisticRegression',\n", " 'label_column': 'label'})\n", "\n", "# Test the model using a library (hub://) function and the generated model\n", "test = mlrun.import_function('hub://test_classifier').apply(auto_mount())\n", "test_run = test.run(name=\"test\",\n", " params={\"label_column\": \"label\"},\n", " inputs={\"models_path\": train_run.outputs['model'],\n", " \"test_set\": train_run.outputs['test_set']})\n", "\n", "# Push the run results through the notifier (console, Git, Slack)\n", "notifier.push_run_results([prep_data_run, train_run, test_run])\n", "\n", "# Create a model serving function using the new model\n", "serve = mlrun.import_function('hub://v2_model_server').apply(auto_mount())\n", "model_name = 'iris'\n", "serve.add_model(model_name, model_path=train_run.outputs['model'])\n", "addr = serve.deploy()\n", "\n", "notifier.push(f\"model {model_name} is deployed at {addr}\")\n", "\n", "# Test the model serving function with two sample rows\n", "inputs = [[5.1, 3.5, 1.4, 0.2],\n", " [7.7, 3.8, 6.7, 2.2]]\n", "my_data = json.dumps({'inputs': inputs})\n", "serve.invoke(f'v2/models/{model_name}/infer', my_data)\n", "\n", "notifier.push(f\"model {model_name} test passed Ok\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Done\n", "With a few lines of code, we have successfully run an entire CI pipeline with notifications." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 
}, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.8" } }, "nbformat": 4, "nbformat_minor": 4 }