V2 Model Server (SKLearn)#

Test one or more classifier models against a held-out dataset.
Using held-out test features, it evaluates the performance of the estimated model.
It can be part of a Kubeflow pipeline as a test step, run after the EDA and training/validation cycles.
This function is part of the scikit-learn-pipeline demo.
To see how the model is trained or how the dataset is generated, check out the sklearn_classifier function from the function marketplace repository.
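
As a rough sketch of the Kubeflow pipeline usage mentioned above (the actual step wiring lives in the scikit-learn-pipeline demo, and the deploy_step call shown here is an assumption that may differ between MLRun versions), deploying this model server as a pipeline step could look like this:

import mlrun
from kfp import dsl

# Assumed model artifact location (same URL as the models_path parameter used later in this notebook)
models_path = 'https://s3.wasabisys.com/iguazio/models/function-marketplace-models/test_classifier/RandomForestClassifier.pkl'

@dsl.pipeline(name='sklearn-serving-pipeline')
def kfpipeline():
    # ... EDA / training / validation steps would come first ...

    # Import the serving function from the hub and register the trained model on it
    serving = mlrun.import_function('hub://v2_model_server')
    serving.add_model(key='RandomForestClassifier', model_path=models_path, class_name='ClassifierModel')

    # Deploy the model server as a pipeline step
    deploy = serving.deploy_step()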

Steps#

  1. Setup function parameters

  2. Importing the function

  3. Testing the function locally

  4. Testing the function remotely

import warnings
warnings.filterwarnings("ignore")

Setup function parameters#

data_path = 'https://s3.wasabisys.com/iguazio/data/function-marketplace-data/sklearn_classifier/iris_dataset.csv'
models_path = 'https://s3.wasabisys.com/iguazio/models/function-marketplace-models/test_classifier/RandomForestClassifier.pkl'
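
The model artifact referenced by models_path is a pickled scikit-learn estimator. Purely as an illustration of how such an artifact could be produced (the real training code lives in the sklearn_classifier marketplace function, so treat this as an assumption rather than the actual pipeline), a sketch might look like this:

import pickle

import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Illustrative only: fit a RandomForestClassifier on the iris CSV and pickle it,
# approximating the artifact stored at models_path
df = pd.read_csv(data_path)
clf = RandomForestClassifier()
clf.fit(df.drop(['label'], axis=1), df['label'])

with open('RandomForestClassifier.pkl', 'wb') as f:
    pickle.dump(clf, f)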

Importing the function#

import mlrun
mlrun.set_environment(project='function-marketplace')

# Importing the function from the hub
fn = mlrun.import_function("hub://v2_model_server")
fn.apply(mlrun.auto_mount())

# Adding the model
fn.add_model(key='RandomForestClassifier', model_path=models_path, class_name='ClassifierModel')
> 2021-10-17 14:04:23,167 [info] loaded project function-marketplace from MLRun DB
<mlrun.serving.states.TaskStep at 0x7f95f58e5f50>
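
The class_name='ClassifierModel' argument refers to a serving class defined in the function's source (v2_model_server.py). As a hedged sketch only (the class shipped with the hub function may differ in detail), such a class typically subclasses mlrun.serving.V2ModelServer and implements load and predict:

import numpy as np
from cloudpickle import load

import mlrun

class ClassifierModel(mlrun.serving.V2ModelServer):
    def load(self):
        """Download and deserialize the pickled sklearn model"""
        model_file, extra_data = self.get_model('.pkl')
        self.model = load(open(model_file, 'rb'))

    def predict(self, body: dict) -> list:
        """Run inference on the 'inputs' field of a V2 protocol request"""
        feats = np.asarray(body['inputs'])
        result = self.model.predict(feats)
        return result.tolist()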

Testing the function locally#

Test against the iris dataset

# When mocking, the serving class has to be present in the local namespace
from v2_model_server import *

# Create a mock server from the function (simulates the serving graph locally)
server = fn.to_mock_server()
> 2021-10-17 14:04:26,871 [info] model RandomForestClassifier was loaded
> 2021-10-17 14:04:26,872 [info] Initializing endpoint records
> 2021-10-17 14:04:26,899 [info] Loaded ['RandomForestClassifier']
# Getting the data
import pandas as pd

iris_dataset = pd.read_csv(data_path)
iris_dataset.head()
   sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)  label
0                5.1               3.5                1.4               0.2      0
1                4.9               3.0                1.4               0.2      0
2                4.7               3.2                1.3               0.2      0
3                4.6               3.1                1.5               0.2      0
4                5.0               3.6                1.4               0.2      0
# KFServing V2 protocol event
event_data = {"inputs": iris_dataset.drop(['label'], axis=1).values.tolist()}
response = server.test(path='/v2/models/RandomForestClassifier/predict', body=event_data)
print(f'When mocking to server, returned dict has the following fields : {", ".join([x for x in response.keys()])}')
print(f"model's accuracy { sum(1 for x,y in zip(iris_dataset['label'],response['outputs']) if x == y) / len(response['outputs'])}")
When mocking to server, returned dict has the following fields : id, model_name, outputs
model's accuracy 0.9733333333333334
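
The same accuracy check can be done more concisely with scikit-learn's accuracy_score (a small convenience addition, not part of the original notebook):

from sklearn.metrics import accuracy_score

# Compare the dataset labels against the mock server's predictions
print(accuracy_score(iris_dataset['label'], response['outputs']))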

Testing the function remotely#

address = fn.deploy()
> 2021-10-17 14:04:27,617 [info] Starting remote function deploy
2021-10-17 14:04:27  (info) Deploying function
2021-10-17 14:04:27  (info) Building
2021-10-17 14:04:27  (info) Staging files and preparing base images
2021-10-17 14:04:27  (info) Building processor image
2021-10-17 14:04:29  (info) Build complete
> 2021-10-17 14:04:39,180 [info] successfully deployed function: {'internal_invocation_urls': ['nuclio-function-marketplace-v2-model-server.default-tenant.svc.cluster.local:8080'], 'external_invocation_urls': ['default-tenant.app.dev39.lab.iguazeng.com:31003']}
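
Before sending predictions, you can optionally sanity-check the deployed endpoint. Assuming the server exposes the standard MLRun V2 serving routes, listing the served models might look like this:

import requests

# List the models served by the deployed function
print(requests.get(address + "/v2/models/").text)
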
import json
import requests

# Made-up data
my_data = '''{"inputs":[[5.1, 3.5, 1.4, 0.2],[7.7, 3.8, 6.7, 2.2]]}'''

# Using requests to get a prediction from the deployed model
response = requests.put(address + "/v2/models/RandomForestClassifier/predict", json=json.loads(my_data))
response.text
'{"id": "ac6be063-b05f-4276-972b-5e0acb96dfd9", "model_name": "RandomForestClassifier", "outputs": [0, 2]}'
