# MLRun Auto-Trainer Tutorial

This notebook shows how to use the handlers of MLRun's Auto-trainer function.
The available handlers are:
- `train`
- `evaluate`
- `predict`

All you need is an **ML model type** and a **dataset**.

In [None]:
import mlrun

In [None]:
mlrun.get_or_create_project('auto-trainer', context="./", user_project=True)

### **Fetching a Dataset**

To generate the dataset, we used the `gen_class_data` function from the hub,
which wraps scikit-learn's [make_classification](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_classification.html#sklearn-datasets-make-classification).<br>
See the link for a description of all parameters.
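For orientation, here is a rough sketch of how a comparable dataset could be produced directly with scikit-learn. The sample count, feature count, and random seed below are illustrative assumptions; the parameters actually used to create the hosted CSV are not stated in this notebook. The column names follow the `feat_N` / `labels` pattern that the later cells rely on.

```python
import pandas as pd
from sklearn.datasets import make_classification

# Illustrative sizes only: the real dataset's generation parameters are unknown.
X, y = make_classification(
    n_samples=200,
    n_features=5,
    n_informative=2,
    n_redundant=2,
    n_classes=2,
    random_state=42,
)

# Name the feature columns feat_0, feat_1, ... and append the target as "labels".
columns = [f"feat_{i}" for i in range(X.shape[1])]
generated = pd.DataFrame(X, columns=columns)
generated["labels"] = y

print(generated.head())
```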

In [None]:
DATASET_URL = 'https://s3.wasabisys.com/iguazio/data/function-marketplace-data/xgb_trainer/classifier-data.csv'

In [None]:
mlrun.get_dataitem(DATASET_URL).show()

### **Importing the Auto-trainer function from the Marketplace**

In [None]:
auto_trainer = mlrun.import_function("hub://auto_trainer")

### **Training a model**

Choosing the `train` handler

#### Define task parameters
* Class parameters should contain the prefix `CLASS_`
* Fit parameters should contain the prefix `FIT_`
* Predict parameters should contain the prefix `PREDICT_`
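To make the naming convention concrete, the sketch below groups a flat parameter dictionary by these prefixes. It is only an illustration of the convention, not MLRun's own implementation: `split_prefixed_params` is a hypothetical helper, and `FIT_sample_weight` is an example key chosen for the demonstration.

```python
PREFIXES = ("CLASS_", "FIT_", "PREDICT_")


def split_prefixed_params(params):
    """Group parameters by prefix and strip the prefix from each key."""
    groups = {prefix: {} for prefix in PREFIXES}
    other = {}
    for key, value in params.items():
        for prefix in PREFIXES:
            if key.startswith(prefix):
                groups[prefix][key[len(prefix):]] = value
                break
        else:
            # No known prefix: keep the parameter as a general run parameter.
            other[key] = value
    return groups, other


example_params = {"CLASS_max_depth": 8, "FIT_sample_weight": None, "random_state": 42}
grouped, remaining = split_prefixed_params(example_params)

print(grouped["CLASS_"])  # {'max_depth': 8}
print(remaining)          # {'random_state': 42}
```

Under this reading, `CLASS_max_depth` reaches the model class as `max_depth`, while unprefixed keys such as `random_state` stay ordinary run parameters.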

In [None]:
model_class = "sklearn.ensemble.RandomForestClassifier"
additional_parameters = {
    "CLASS_max_depth": 8,
}

#### Running the Training job with the "train" handler

In [None]:
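train_run = auto_trainer.run(
    inputs={"dataset": DATASET_URL},
    params={
        "model_class": model_class,
        "drop_columns": ["feat_0", "feat_2"],  # columns removed before training
        "train_test_split_size": 0.2,  # hold out 20% of the rows as the test set
        "random_state": 42,
        "label_columns": "labels",  # target column
        "model_name": 'MyModel',  # name under which the model is logged
        **additional_parameters
    },
    handler='train',
    local=True  # run in the local environment
)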

#### The result of the train run

In [None]:
train_run.outputs

In [None]:
train_run.artifact('confusion-matrix').show()

#### Getting the model for evaluating and predicting

In [None]:
model_path = train_run.outputs['model']

### **Evaluating a model**

Choosing the `evaluate` handler

In [None]:
evaluate_run = auto_trainer.run(
    inputs={"dataset": train_run.outputs['test_set']},
    params={
        "model": model_path,
        "drop_columns": ["feat_0", "feat_2"], # Not actually necessary on the test set (already done in the previous step)
        "label_columns": "labels",
    },
    handler="evaluate",
    local=True,
)

#### The result of the evaluate run

In [None]:
evaluate_run.outputs

### **Making a prediction**

Choosing the `predict` handler. To predict from a simple in-memory sample (a `list` of `list`s or a `dict`), pass the dataset as a `param` instead of an `input`.
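As a small illustration of the list-of-lists form, the sketch below builds a two-row sample and drops columns by position. The values are made up, and `drop_positions` is a hypothetical helper; it only shows what integer entries in `drop_columns` would plausibly mean when the columns have no names, and makes no claim about how the handler does it internally.

```python
# Each inner list is one row; columns are identified by position only.
rows_sample = [
    [0.1, 1.2, -0.3, 0.7, 2.0],
    [1.5, -0.4, 0.9, 0.0, -1.1],
]


def drop_positions(rows, positions):
    """Return a copy of rows without the columns at the given positions."""
    skip = set(positions)
    return [[value for i, value in enumerate(row) if i not in skip] for row in rows]


trimmed = drop_positions(rows_sample, [0, 2])
print(trimmed)  # [[1.2, 0.7, 2.0], [-0.4, 0.0, -1.1]]
```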

In [None]:
sample = mlrun.get_dataitem(DATASET_URL).as_df().head().drop("labels", axis=1)

In [None]:
sample = sample.values.tolist()

In [None]:
predict_run = auto_trainer.run(
    params={
        "dataset": sample,
        "model": model_path,
        "drop_columns": [0, 2],
        "label_columns": "labels",
    },
    handler="predict",
    local=True,
)

#### Showing the prediction results

In [None]:
predict_run.outputs

In [None]:
predict_run.artifact('prediction').show()

[Back to the top](#MLRun-Auto-Trainer-Tutorial)