MLRun Auto-Trainer Tutorial
This notebook shows how to use the handlers of MLRun’s Auto-trainer. The available handlers are:
- train
- evaluate
- predict
All you need is an ML model type and a dataset.
import mlrun
mlrun.get_or_create_project('auto-trainer', context="./", user_project=True)
Fetching a Dataset
To generate the dataset, we used the “gen_class_data” function from the hub,
which wraps scikit-learn’s make_classification.
See the make_classification documentation for a description of all parameters.
DATASET_URL = 'https://s3.wasabisys.com/iguazio/data/function-marketplace-data/xgb_trainer/classifier-data.csv'
mlrun.get_dataitem(DATASET_URL).show()
Importing the Auto-trainer function from the Marketplace
auto_trainer = mlrun.import_function("hub://auto_trainer")
Training a model
Choosing the train handler
Define task parameters
- Class parameters must start with the prefix CLASS_
- Fit parameters must start with the prefix FIT_
- Predict parameters must start with the prefix PREDICT_
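The prefix convention can be pictured as a simple routing step. The sketch below uses a hypothetical helper (`split_prefixed_params` is not part of MLRun) to show how prefixed keys could be separated into keyword arguments for the model constructor, `fit`, and `predict`. It only illustrates the naming rule and is not MLRun's actual implementation; `FIT_sample_weight` is an example key chosen for illustration.

```python
# Hypothetical helper for illustration only; not part of the MLRun API.
PREFIXES = ("CLASS_", "FIT_", "PREDICT_")


def split_prefixed_params(params):
    """Group prefixed parameters by prefix and strip the prefix from each key."""
    groups = {prefix: {} for prefix in PREFIXES}
    other = {}
    for key, value in params.items():
        for prefix in PREFIXES:
            if key.startswith(prefix):
                groups[prefix][key[len(prefix):]] = value
                break
        else:
            # Keys without a known prefix are kept separately.
            other[key] = value
    return groups["CLASS_"], groups["FIT_"], groups["PREDICT_"], other


class_kwargs, fit_kwargs, predict_kwargs, other = split_prefixed_params(
    {"CLASS_max_depth": 8, "FIT_sample_weight": None, "random_state": 42}
)
print(class_kwargs)  # {'max_depth': 8}
print(fit_kwargs)    # {'sample_weight': None}
print(other)         # {'random_state': 42}
```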
model_class = "sklearn.ensemble.RandomForestClassifier"
additional_parameters = {
    "CLASS_max_depth": 8,
}
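Note that `model_class` is a dotted import path rather than a class object. As a rough sketch of how such a string can be turned into a class and instantiated with keyword arguments, the hypothetical `resolve_class` helper below uses `importlib`. It is demonstrated on a standard-library class so that it runs without scikit-learn, and it is not necessarily how Auto-trainer resolves the model internally.

```python
import importlib


def resolve_class(dotted_path):
    """Import the module part of a dotted path and return the named attribute."""
    module_name, _, class_name = dotted_path.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


# Standard-library stand-in for "sklearn.ensemble.RandomForestClassifier".
fraction_cls = resolve_class("fractions.Fraction")
# Keyword arguments play the role of the unprefixed CLASS_ parameters.
value = fraction_cls(numerator=3, denominator=4)
print(value)  # 3/4
```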
Running the training job with the “train” handler
train_run = auto_trainer.run(
    inputs={"dataset": DATASET_URL},
    params={
        "model_class": model_class,
        "drop_columns": ["feat_0", "feat_2"],
        "train_test_split_size": 0.2,
        "random_state": 42,
        "label_columns": "labels",
        "model_name": 'MyModel',
        **additional_parameters
    }, 
    handler='train',
    local=True
)
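To make `train_test_split_size` and `random_state` concrete: with a split size of 0.2, roughly 20% of the rows are held out as a test set, and fixing the random state makes the shuffle reproducible. The standard-library sketch below mimics that behavior with a hypothetical `split_rows` helper on invented rows. The handler may well rely on scikit-learn's splitting utilities instead, so treat this only as an illustration of the idea.

```python
import random


def split_rows(rows, test_size, seed):
    """Shuffle row indices with a fixed seed and hold out a test fraction."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    n_test = round(len(rows) * test_size)
    test_idx = set(indices[:n_test])
    train = [row for i, row in enumerate(rows) if i not in test_idx]
    test = [row for i, row in enumerate(rows) if i in test_idx]
    return train, test


# Invented rows, for illustration only.
rows = [[i, i * 2] for i in range(10)]
train, test = split_rows(rows, test_size=0.2, seed=42)
print(len(train), len(test))  # 8 2
```

Calling `split_rows` again with the same seed returns the same partition, which is the property `random_state` is meant to provide.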
The result of the train run
train_run.outputs
train_run.artifact('confusion-matrix').show()
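For readers unfamiliar with this artifact, a confusion matrix counts how often each true label was predicted as each label. The following standard-library sketch computes one for made-up labels; the data is invented for illustration and is unrelated to the run above.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, labels):
    """Return a matrix where entry [i][j] counts true label i predicted as label j."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


# Invented labels, for illustration only.
y_true = [0, 0, 1, 1, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0]
matrix = confusion_matrix(y_true, y_pred, labels=[0, 1])
print(matrix)  # [[2, 1], [1, 2]]
```

The diagonal entries are the correct predictions, so accuracy in this toy case is (2 + 2) / 6.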
Getting the model for evaluating and predicting
model_path = train_run.outputs['model']
Evaluating a model
Choosing the evaluate handler
evaluate_run = auto_trainer.run(
    inputs={"dataset": train_run.outputs['test_set']},
    params={
        "model": model_path,
        "drop_columns": ["feat_0", "feat_2"], # Not actually necessary on the test set (already done in the previous step)
        "label_columns": "labels",
    },
    handler="evaluate",
    local=True,
)
The result of the evaluate run
evaluate_run.outputs
Making a prediction
Choosing the predict handler. To predict from a simple sample (a list of lists or a dict), pass the dataset as a param.
sample = mlrun.get_dataitem(DATASET_URL).as_df().head().drop("labels", axis=1)
sample = sample.values.tolist()
predict_run = auto_trainer.run(
    params={
        "dataset": sample,
        "model": model_path,
        "drop_columns": [0, 2],
        "label_columns": "labels",
    },
    handler="predict",
    local=True,
)
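Because the sample is a plain list of lists, it carries no column names, which is why `drop_columns` uses the positional indices `[0, 2]` here instead of `"feat_0"` and `"feat_2"`. Below is a minimal sketch of the equivalent operation with a hypothetical `drop_by_index` helper, assuming the columns appear in the order `feat_0`, `feat_1`, and so on (an assumption about the dataset layout).

```python
def drop_by_index(rows, drop):
    """Remove the columns at the given positions from every row."""
    drop = set(drop)
    return [[v for i, v in enumerate(row) if i not in drop] for row in rows]


# Invented values; each row stands for feat_0 .. feat_4.
sample_rows = [
    [0.1, 0.2, 0.3, 0.4, 0.5],
    [1.1, 1.2, 1.3, 1.4, 1.5],
]
print(drop_by_index(sample_rows, [0, 2]))
# [[0.2, 0.4, 0.5], [1.2, 1.4, 1.5]]
```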
Showing the prediction results
predict_run.outputs
predict_run.artifact('prediction').show()