MLRun Auto-Trainer Tutorial#
This notebook shows how to use the handlers of MLRun’s Auto-Trainer. The available handlers are:
train
evaluate
predict
All you need is an ML model type and a dataset.
import mlrun
mlrun.get_or_create_project('auto-trainer', context="./", user_project=True)
Fetching a Dataset#
To generate the dataset, we used the “gen_class_data” function from the hub, which wraps scikit-learn’s make_classification.
See the scikit-learn documentation of make_classification for a description of all parameters.
DATASET_URL = 'https://s3.wasabisys.com/iguazio/data/function-marketplace-data/xgb_trainer/classifier-data.csv'
mlrun.get_dataitem(DATASET_URL).show()
Importing the auto_trainer function from the Marketplace#
auto_trainer = mlrun.import_function("hub://auto_trainer")
Training a model#
Choosing the train handler
Define task parameters#
Class parameters should contain the prefix CLASS_
Fit parameters should contain the prefix FIT_
Predict parameters should contain the prefix PREDICT_
model_class = "sklearn.ensemble.RandomForestClassifier"
additional_parameters = {
    "CLASS_max_depth": 8,
}
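To make the prefix convention concrete, here is a minimal sketch in plain Python of how prefixed keys could be grouped by destination. It is an illustration only and not the Auto-Trainer's actual implementation: `split_prefixed_params` is a hypothetical helper, and `FIT_sample_weight` is an example key that does not appear in this notebook.

```python
from typing import Any, Dict

# Hypothetical helper (not part of MLRun): groups parameters by the
# CLASS_ / FIT_ / PREDICT_ prefixes described above.
PREFIXES = ("CLASS_", "FIT_", "PREDICT_")


def split_prefixed_params(params: Dict[str, Any]) -> Dict[str, Dict[str, Any]]:
    """Return one dict per prefix, with the prefix stripped from each key."""
    groups: Dict[str, Dict[str, Any]] = {prefix: {} for prefix in PREFIXES}
    groups["other"] = {}  # keys that carry none of the known prefixes
    for key, value in params.items():
        for prefix in PREFIXES:
            if key.startswith(prefix):
                groups[prefix][key[len(prefix):]] = value
                break
        else:
            groups["other"][key] = value
    return groups


example = split_prefixed_params({
    "CLASS_max_depth": 8,       # would be passed to the model constructor
    "FIT_sample_weight": None,  # illustrative key, would be passed to fit()
    "model_name": "MyModel",    # unprefixed, stays a regular parameter
})
print(example["CLASS_"])  # {'max_depth': 8}
```

Under this reading, `"CLASS_max_depth": 8` corresponds to constructing the model with `max_depth=8`.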
Running the Training job with the “train” handler#
train_run = auto_trainer.run(
    inputs={"dataset": DATASET_URL},
    params={
        "model_class": model_class,
        "drop_columns": ["feat_0", "feat_2"],
        "train_test_split_size": 0.2,
        "random_state": 42,
        "label_columns": "labels",
        "model_name": 'MyModel',
        **additional_parameters
    },
    handler='train',
    local=True
)
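The following plain-Python sketch illustrates what the `drop_columns`, `train_test_split_size`, and `random_state` parameters plausibly mean for the data. It is an assumption-based illustration and not the Auto-Trainer's code: `drop_and_split` is a hypothetical helper, the rows are made-up toy data, and the real handler may shuffle and split differently.

```python
import random
from typing import Dict, List, Tuple

Row = Dict[str, float]


def drop_and_split(
    rows: List[Row],
    drop_columns: List[str],
    test_size: float,
    random_state: int,
) -> Tuple[List[Row], List[Row]]:
    """Hypothetical helper: drop columns, then hold out a test fraction."""
    # Remove the unwanted feature columns from every row.
    kept = [{k: v for k, v in row.items() if k not in drop_columns} for row in rows]
    # A seeded shuffle makes the split reproducible between runs.
    rng = random.Random(random_state)
    shuffled = kept[:]
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_size))
    return shuffled[n_test:], shuffled[:n_test]


# Toy data with the same column names as the tutorial dataset.
rows = [
    {"feat_0": i, "feat_1": i * 2, "feat_2": i * 3, "labels": i % 2}
    for i in range(10)
]
train, test = drop_and_split(rows, ["feat_0", "feat_2"], 0.2, 42)
print(len(train), len(test))  # 8 2
```

With a split size of 0.2, one fifth of the rows is held out; in the tutorial, that held-out portion is what the train run exposes as the `test_set` output.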
The result of the train run#
train_run.outputs
train_run.artifact('confusion-matrix').show()
Getting the model for evaluating and predicting#
model_path = train_run.outputs['model']
Evaluating a model#
Choosing the evaluate handler
evaluate_run = auto_trainer.run(
    inputs={"dataset": train_run.outputs['test_set']},
    params={
        "model": model_path,
        "drop_columns": ["feat_0", "feat_2"],  # Not actually necessary on the test set (already done in the previous step)
        "label_columns": "labels",
    },
    handler="evaluate",
    local=True,
)
The result of the evaluate run#
evaluate_run.outputs
Making a prediction#
Choosing the predict handler. To predict from a simple sample (a list of lists or a dict), pass the dataset as a param.
sample = mlrun.get_dataitem(DATASET_URL).as_df().head().drop("labels", axis=1)
sample = sample.values.tolist()
predict_run = auto_trainer.run(
    params={
        "dataset": sample,
        "model": model_path,
        "drop_columns": [0, 2],
        "label_columns": "labels",
    },
    handler="predict",
    local=True,
)
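Because the sample is a plain list of lists, it carries no column names, which is presumably why `drop_columns` is given here as the positions `[0, 2]` rather than as `"feat_0"` and `"feat_2"`. A minimal plain-Python sketch of positional dropping follows; `drop_by_position` is a hypothetical helper and the values are made up, so this is not how the handler is necessarily implemented.

```python
from typing import List


def drop_by_position(sample: List[List[float]], drop_indices: List[int]) -> List[List[float]]:
    """Hypothetical helper: remove the columns at the given positions from every row."""
    drop = set(drop_indices)
    return [[value for i, value in enumerate(row) if i not in drop] for row in sample]


# Two made-up rows with four feature columns each.
sample_rows = [
    [0.1, 0.2, 0.3, 0.4],
    [1.1, 1.2, 1.3, 1.4],
]
print(drop_by_position(sample_rows, [0, 2]))  # [[0.2, 0.4], [1.2, 1.4]]
```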
Showing the prediction results#
predict_run.outputs
predict_run.artifact('prediction').show()