MLRun Auto-Trainer Tutorial

This notebook shows how to use the handlers of MLRun's auto-trainer function. It covers the following handlers:

  • train

  • evaluate

  • predict

All you need is a model class and a dataset.

import mlrun
mlrun.get_or_create_project('auto-trainer', context="./", user_project=True)

Fetching a Dataset

The dataset was generated with the “gen_class_data” function from the hub, which wraps scikit-learn’s make_classification.
See the scikit-learn make_classification documentation for a description of all its parameters.

DATASET_URL = 'https://s3.wasabisys.com/iguazio/data/function-marketplace-data/xgb_trainer/classifier-data.csv'
mlrun.get_dataitem(DATASET_URL).show()

Importing the auto_trainer function from the hub

auto_trainer = mlrun.import_function("hub://auto_trainer")

Training a model

Choosing the train handler

Defining the task parameters

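  • Model class parameters (passed when the model is constructed) should start with the prefix CLASS_

  • Fit parameters (passed to the model's fit method) should start with the prefix FIT_

  • Predict parameters (passed to the model's predict method) should start with the prefix PREDICT_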
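The prefix convention lets one flat `params` dictionary carry arguments for three different calls. A minimal sketch of how such routing could work, using a hypothetical helper named `split_prefixed_params`; this illustrates the convention and is not the auto-trainer's actual source:

```python
from typing import Any, Dict

# Prefixes described above; each one targets a different call on the model.
PREFIXES = ("CLASS_", "FIT_", "PREDICT_")


def split_prefixed_params(params: Dict[str, Any]) -> Dict[str, Dict[str, Any]]:
    """Hypothetical helper: route prefixed keys into per-call keyword dicts.

    Keys without a known prefix are collected under "other".
    """
    routed: Dict[str, Dict[str, Any]] = {prefix: {} for prefix in PREFIXES}
    routed["other"] = {}
    for key, value in params.items():
        for prefix in PREFIXES:
            if key.startswith(prefix):
                # Strip the prefix so the remainder is the real keyword name.
                routed[prefix][key[len(prefix):]] = value
                break
        else:
            routed["other"][key] = value
    return routed


routed = split_prefixed_params({"CLASS_max_depth": 8, "model_name": "MyModel"})
print(routed["CLASS_"])  # {'max_depth': 8}
print(routed["other"])   # {'model_name': 'MyModel'}
```

Under this reading, `CLASS_max_depth` in the next cell ends up as `max_depth=8` on the model constructor.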

model_class = "sklearn.ensemble.RandomForestClassifier"
additional_parameters = {
    "CLASS_max_depth": 8,
}
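`model_class` is passed as a fully qualified dotted path string rather than a class object, so it has to be imported at run time. A minimal sketch of that resolution with `importlib`, using hypothetical helpers `resolve_class` and `build_model`; it is demonstrated on a standard-library class so it runs without scikit-learn installed:

```python
import importlib
from typing import Any, Dict


def resolve_class(dotted_path: str) -> type:
    """Hypothetical helper: turn 'package.module.ClassName' into the class object."""
    module_name, _, class_name = dotted_path.rpartition(".")
    if not module_name:
        raise ValueError(f"expected a dotted path, got {dotted_path!r}")
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


def build_model(dotted_path: str, class_kwargs: Dict[str, Any]) -> Any:
    """Instantiate the resolved class with the (prefix-stripped) keyword arguments."""
    return resolve_class(dotted_path)(**class_kwargs)


# Standard-library stand-in for "sklearn.ensemble.RandomForestClassifier".
counter = build_model("collections.Counter", {"a": 2, "b": 1})
print(counter.most_common(1))  # [('a', 2)]
```

With scikit-learn installed, `build_model(model_class, {"max_depth": 8})` would return a `RandomForestClassifier` constructed with `max_depth=8`.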

Running the training job with the “train” handler

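train_run = auto_trainer.run(
    inputs={"dataset": DATASET_URL},
    params={
        "model_class": model_class,
        "drop_columns": ["feat_0", "feat_2"],  # feature columns to exclude
        "train_test_split_size": 0.2,  # fraction of rows held out as the test set
        "random_state": 42,  # fixed seed for reproducibility
        "label_columns": "labels",  # target column
        "model_name": "MyModel",
        **additional_parameters,  # CLASS_/FIT_/PREDICT_ prefixed arguments
    },
    handler="train",
    local=True,  # run in the local process instead of on the cluster
)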
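Conceptually, the parameters above describe a preprocessing pipeline: drop `drop_columns`, separate `label_columns` from the features, then hold out `train_test_split_size` of the rows as a test set, seeded with `random_state`. The following standard-library sketch shows those steps under that reading; `prepare_split` is a hypothetical name, and the auto-trainer itself may delegate the split to scikit-learn rather than use this logic:

```python
import random
from typing import Dict, List, Tuple

Row = Dict[str, float]


def prepare_split(
    rows: List[Row],
    drop_columns: List[str],
    label_column: str,
    test_size: float,
    seed: int,
) -> Tuple[List[Row], List[float], List[Row], List[float]]:
    """Hypothetical sketch: drop columns, split off labels, hold out a test set."""
    # Remove the unwanted feature columns from every row.
    cleaned = [{k: v for k, v in row.items() if k not in drop_columns} for row in rows]
    # Shuffle the row indices deterministically with the given seed.
    indices = list(range(len(cleaned)))
    random.Random(seed).shuffle(indices)
    n_test = int(round(len(cleaned) * test_size))
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    def features_and_labels(idx: List[int]) -> Tuple[List[Row], List[float]]:
        feats = [{k: v for k, v in cleaned[i].items() if k != label_column} for i in idx]
        labels = [cleaned[i][label_column] for i in idx]
        return feats, labels

    x_train, y_train = features_and_labels(train_idx)
    x_test, y_test = features_and_labels(test_idx)
    return x_train, y_train, x_test, y_test


# Made-up rows shaped like the tutorial dataset.
rows = [{"feat_0": i, "feat_1": i * 2, "feat_2": -i, "labels": i % 2} for i in range(10)]
x_train, y_train, x_test, y_test = prepare_split(rows, ["feat_0", "feat_2"], "labels", 0.2, 42)
print(len(x_train), len(x_test))  # 8 2
print(sorted(x_train[0]))  # ['feat_1']
```

Seeding a private `random.Random` instance keeps the split reproducible without touching the global random state.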

Results of the training run

train_run.outputs
train_run.artifact('confusion-matrix').show()
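The `confusion-matrix` artifact tabulates how often each true class was predicted as each class. A minimal standard-library sketch of that computation, using made-up labels purely for illustration; the artifact itself is produced by MLRun, not by this code:

```python
from typing import Hashable, List, Sequence


def confusion_matrix(
    y_true: Sequence[Hashable], y_pred: Sequence[Hashable], classes: Sequence[Hashable]
) -> List[List[int]]:
    """Rows are true classes, columns are predicted classes (scikit-learn's convention)."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


# Illustrative labels only, not results from the run above.
y_true = [0, 0, 1, 1, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0]
for row in confusion_matrix(y_true, y_pred, classes=[0, 1]):
    print(row)
# [2, 1]
# [1, 2]
```

The diagonal holds correct predictions; off-diagonal cells count each kind of mistake.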

Getting the model for evaluation and prediction

model_path = train_run.outputs['model']

Evaluating a model

Choosing the evaluate handler

evaluate_run = auto_trainer.run(
    inputs={"dataset": train_run.outputs['test_set']},
    params={
        "model": model_path,
        "drop_columns": ["feat_0", "feat_2"], # Not actually necessary on the test set (already done in the previous step)
        "label_columns": "labels",
    },
    handler="evaluate",
    local=True,
)
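Evaluation amounts to predicting on the held-out `test_set` and comparing the predictions with the `labels` column. For a binary classifier, the usual summary metrics follow from the confusion-matrix counts. A standard-library sketch, assuming class `1` is the positive class; which metrics the `evaluate` handler actually logs may differ between MLRun versions:

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 for a binary classification task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    # Guard each ratio against a zero denominator.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": correct / len(pairs) if pairs else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Illustrative labels only, not results from the run above.
metrics = binary_metrics([0, 0, 1, 1, 1, 0], [0, 1, 1, 1, 0, 0])
print(metrics)
```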

Results of the evaluation run

evaluate_run.outputs

Making a prediction

Choosing the predict handler. To predict on a simple in-memory sample (a list of lists or a dict), pass the dataset as a parameter rather than as an input.

sample = mlrun.get_dataitem(DATASET_URL).as_df().head().drop("labels", axis=1)
sample = sample.values.tolist()
predict_run = auto_trainer.run(
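    params={
        "dataset": sample,  # in-memory sample, so it is passed as a param
        "model": model_path,
        "drop_columns": [0, 2],  # positional indices: a list of lists has no column names
        "label_columns": "labels",
    },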
    handler="predict",
    local=True,
)
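Because `sample` is a plain list of lists, it carries no column names, which is why `drop_columns` is given as positional indices here; `[0, 2]` corresponds to `feat_0` and `feat_2` assuming the CSV columns are ordered feat_0, feat_1, feat_2 and so on. A small sketch of dropping columns by position, using a hypothetical helper `drop_by_position` and made-up values:

```python
from typing import List, Sequence


def drop_by_position(rows: List[List[float]], positions: Sequence[int]) -> List[List[float]]:
    """Hypothetical helper: remove the given column positions from every row."""
    to_drop = set(positions)
    return [[value for i, value in enumerate(row) if i not in to_drop] for row in rows]


# Made-up feature rows standing in for `sample` (columns feat_0..feat_3).
demo_rows = [
    [0.1, 1.1, 2.1, 3.1],
    [0.2, 1.2, 2.2, 3.2],
]
print(drop_by_position(demo_rows, [0, 2]))  # [[1.1, 3.1], [1.2, 3.2]]
```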

Showing the prediction results

predict_run.outputs
predict_run.artifact('prediction').show()
