V2 Model Server (SKLearn)#

Test one or more classifier models against a held-out dataset.
Using held-out test features, it evaluates the performance of the estimated model.
It can be part of a Kubeflow pipeline as a test step, run after the EDA and training/validation cycles.
This function is part of the scikit-learn-pipeline demo.
To see how the model is trained or how the dataset is generated, check out the sklearn_classifier function from the function marketplace repository.
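
As a rough sketch of the Kubeflow pipeline usage mentioned above (the actual step wiring lives in the scikit-learn-pipeline demo, and the deploy_step call shown here is an assumption that may differ between MLRun versions), deploying this model server as a pipeline step could look like this:

import mlrun
from kfp import dsl

# Assumed model artifact location (same URL as the models_path parameter used later in this notebook)
models_path = 'https://s3.wasabisys.com/iguazio/models/function-marketplace-models/test_classifier/RandomForestClassifier.pkl'

@dsl.pipeline(name='sklearn-serving-pipeline')
def kfpipeline():
    # ... EDA / training / validation steps would come first ...

    # Import the serving function from the hub and register the trained model on it
    serving = mlrun.import_function('hub://v2_model_server')
    serving.add_model(key='RandomForestClassifier', model_path=models_path, class_name='ClassifierModel')

    # Deploy the model server as a pipeline step
    deploy = serving.deploy_step()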

Steps#

  1. Setup function parameters

  2. Importing the function

  3. Testing the function locally

  4. Testing the function remotely

import warnings
warnings.filterwarnings("ignore")

Setup function parameters#

data_path = 'https://s3.wasabisys.com/iguazio/data/function-marketplace-data/sklearn_classifier/iris_dataset.csv'
models_path = 'https://s3.wasabisys.com/iguazio/models/function-marketplace-models/test_classifier/RandomForestClassifier.pkl'
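
The model artifact referenced by models_path is a pickled scikit-learn estimator. Purely as an illustration of how such an artifact could be produced (the real training code lives in the sklearn_classifier marketplace function, so treat this as an assumption rather than the actual pipeline), a sketch might look like this:

import pickle

import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Illustrative only: fit a RandomForestClassifier on the iris CSV and pickle it,
# approximating the artifact stored at models_path
df = pd.read_csv(data_path)
clf = RandomForestClassifier()
clf.fit(df.drop(['label'], axis=1), df['label'])

with open('RandomForestClassifier.pkl', 'wb') as f:
    pickle.dump(clf, f)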

Importing the function#

import mlrun
mlrun.set_environment(project='function-marketplace')

# Importing the function from the hub
fn = mlrun.import_function("hub://v2_model_server")
fn.apply(mlrun.auto_mount())

# Adding the model
fn.add_model(key='RandomForestClassifier', model_path=models_path, class_name='ClassifierModel')
> 2021-10-17 14:04:23,167 [info] loaded project function-marketplace from MLRun DB
<mlrun.serving.states.TaskStep at 0x7f95f58e5f50>
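
The class_name='ClassifierModel' argument refers to a serving class defined in the function's source (v2_model_server.py). As a hedged sketch only (the class shipped with the hub function may differ in detail), such a class typically subclasses mlrun.serving.V2ModelServer and implements load and predict:

import numpy as np
from cloudpickle import load

import mlrun

class ClassifierModel(mlrun.serving.V2ModelServer):
    def load(self):
        """Download and deserialize the pickled sklearn model"""
        model_file, extra_data = self.get_model('.pkl')
        self.model = load(open(model_file, 'rb'))

    def predict(self, body: dict) -> list:
        """Run inference on the 'inputs' field of a V2 protocol request"""
        feats = np.asarray(body['inputs'])
        result = self.model.predict(feats)
        return result.tolist()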

Testing the function locally#

Test against the iris dataset

# When mocking, the serving class has to be present in the local namespace
from v2_model_server import *

# Create a mock server from the function (simulates the serving graph locally)
server = fn.to_mock_server()
> 2021-10-17 14:04:26,871 [info] model RandomForestClassifier was loaded
> 2021-10-17 14:04:26,872 [info] Initializing endpoint records
> 2021-10-17 14:04:26,899 [info] Loaded ['RandomForestClassifier']
# Getting the data
import pandas as pd

iris_dataset = pd.read_csv(data_path)
iris_dataset.head()
   sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)  label
0                5.1               3.5                1.4               0.2      0
1                4.9               3.0                1.4               0.2      0
2                4.7               3.2                1.3               0.2      0
3                4.6               3.1                1.5               0.2      0
4                5.0               3.6                1.4               0.2      0
# KFServing V2 protocol event
event_data = {"inputs": iris_dataset.drop(['label'], axis=1).values.tolist()}
response = server.test(path='/v2/models/RandomForestClassifier/predict', body=event_data)
print(f'When mocking to server, returned dict has the following fields : {", ".join([x for x in response.keys()])}')
print(f"model's accuracy { sum(1 for x,y in zip(iris_dataset['label'],response['outputs']) if x == y) / len(response['outputs'])}")
When mocking to server, returned dict has the following fields : id, model_name, outputs
model's accuracy 0.9733333333333334
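
The same accuracy check can be done more concisely with scikit-learn's accuracy_score (a small convenience addition, not part of the original notebook):

from sklearn.metrics import accuracy_score

# Compare the dataset labels against the mock server's predictions
print(accuracy_score(iris_dataset['label'], response['outputs']))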

Testing the function remotely#

address = fn.deploy()
> 2021-10-17 14:04:27,617 [info] Starting remote function deploy
2021-10-17 14:04:27  (info) Deploying function
2021-10-17 14:04:27  (info) Building
2021-10-17 14:04:27  (info) Staging files and preparing base images
2021-10-17 14:04:27  (info) Building processor image
2021-10-17 14:04:29  (info) Build complete
> 2021-10-17 14:04:39,180 [info] successfully deployed function: {'internal_invocation_urls': ['nuclio-function-marketplace-v2-model-server.default-tenant.svc.cluster.local:8080'], 'external_invocation_urls': ['default-tenant.app.dev39.lab.iguazeng.com:31003']}
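
Before sending predictions, you can optionally sanity-check the deployed endpoint. Assuming the server exposes the standard MLRun V2 serving routes, listing the served models might look like this:

import requests

# List the models served by the deployed function
print(requests.get(address + "/v2/models/").text)
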
import json
import requests

# Made-up data
my_data = '''{"inputs":[[5.1, 3.5, 1.4, 0.2],[7.7, 3.8, 6.7, 2.2]]}'''

# Using requests to get a prediction from the deployed model
response = requests.put(address + "/v2/models/RandomForestClassifier/predict", json=json.loads(my_data))
response.text
'{"id": "ac6be063-b05f-4276-972b-5e0acb96dfd9", "model_name": "RandomForestClassifier", "outputs": [0, 2]}'
