auto_trainer package

Submodules

auto_trainer.auto_trainer module

class auto_trainer.auto_trainer.KWArgsPrefixes

Bases: object

FIT = 'FIT_'
MODEL_CLASS = 'CLASS_'
TRAIN = 'TRAIN_'
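
These constants are the prefixes that train recognizes in its keyword arguments. The following is a minimal sketch of how prefixed keyword arguments could be grouped and stripped of their prefix. It is not the package's actual parsing code: split_prefixed_kwargs is a hypothetical helper, the class is re-declared locally only so the snippet is self-contained, and silently ignoring unprefixed keys is an assumption of this sketch.

```python
from typing import Any, Dict


class KWArgsPrefixes:
    # Values copied from the class documented above.
    MODEL_CLASS = "CLASS_"
    FIT = "FIT_"
    TRAIN = "TRAIN_"


def split_prefixed_kwargs(kwargs: Dict[str, Any]) -> Dict[str, Dict[str, Any]]:
    """Hypothetical helper: group keyword arguments by prefix and strip the prefix."""
    prefixes = {
        "class": KWArgsPrefixes.MODEL_CLASS,
        "fit": KWArgsPrefixes.FIT,
        "train": KWArgsPrefixes.TRAIN,
    }
    grouped: Dict[str, Dict[str, Any]] = {group: {} for group in prefixes}
    for key, value in kwargs.items():
        for group, prefix in prefixes.items():
            if key.startswith(prefix):
                # Drop the prefix so the target function sees its own argument name.
                grouped[group][key[len(prefix):]] = value
                break
        # Assumption: keys without a known prefix are ignored in this sketch.
    return grouped


grouped = split_prefixed_kwargs(
    {"CLASS_solver": "liblinear", "FIT_sample_weight": None, "verbose": 1}
)
print(grouped)
# {'class': {'solver': 'liblinear'}, 'fit': {'sample_weight': None}, 'train': {}}
```

Under this scheme, "CLASS_solver" would reach the model class as solver, and "FIT_sample_weight" would reach the fit call as sample_weight.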
auto_trainer.auto_trainer.evaluate(context: mlrun.execution.MLClientCtx, model: str, dataset: mlrun.datastore.base.DataItem, drop_columns: Optional[List[str]] = None, label_columns: Optional[Union[str, List[str]]] = None, **kwargs)

Evaluate a model. Artifacts are generated by the MLHandler.

Parameters
  • context – MLRun context.

  • model – The model Store path.

  • dataset – The dataset to evaluate the model on. Can be either a URI or a FeatureVector.

  • drop_columns – A string or a list of strings naming the columns to drop.

  • label_columns – The target label column(s) in the dataset, for regression or classification tasks. Mandatory when the dataset is not a FeatureVector.

  • kwargs – Keyword arguments to pass to the predict function (the PREDICT_ prefix is not required).
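
As a rough illustration of the parameters above, an evaluate run might be set up as follows. This is a sketch under assumptions, not an official example: the model store path, dataset path and column names are placeholders, run_evaluate is a hypothetical wrapper, and it assumes the hub function has already been registered on an MLRun project under the name given in function_name.

```python
# Placeholder values: replace the paths and column names with your own.
evaluate_inputs = {"dataset": "./path/to/test_set.csv"}
evaluate_params = {
    "model": "store://models/my-project/my-model:latest",  # placeholder store path
    "label_columns": "label",
    "drop_columns": ["id"],
}


def run_evaluate(project, function_name: str = "train"):
    """Hypothetical wrapper: run the "evaluate" handler of a registered function.

    Assumes `project` is an MLRun project and `function_name` is the name the
    hub function was registered under with `project.set_function`.
    """
    return project.run_function(
        function_name,
        handler="evaluate",
        inputs=evaluate_inputs,  # the dataset is a DataItem, so it goes in inputs
        params=evaluate_params,
    )
```

The dataset is passed through inputs because its type is DataItem, while the model store path and the column settings are plain parameters.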

auto_trainer.auto_trainer.predict(context: mlrun.execution.MLClientCtx, model: str, dataset: mlrun.datastore.base.DataItem, drop_columns: Optional[Union[str, List[str], int, List[int]]] = None, label_columns: Optional[Union[str, List[str]]] = None, result_set: Optional[str] = None, **kwargs)

Predict on a dataset using a model.

Parameters
  • context – MLRun context.

  • model – The model Store path.

  • dataset – The dataset to run predictions on. Can be either a URI, a FeatureVector or a sample in the form of a list/dict. When passing a sample, pass the dataset as a field in params instead of inputs.

  • drop_columns – A string/int or a list of strings/ints giving the column names/indices to drop. When the dataset is a list/dict, this parameter must be given as integers.

  • label_columns – The target label column(s) in the dataset, for regression or classification tasks. Mandatory when the dataset is not a FeatureVector.

  • result_set – The DB key under which the prediction result is stored, also used as the filename. Defaults to ‘prediction’.

  • kwargs – Keyword arguments to pass to the predict function (the PREDICT_ prefix is not required).
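
The note about passing a sample through params can be illustrated with the sketch below. All values are made up for illustration: the model store path, the sample rows and the result key are placeholders, run_predict is a hypothetical wrapper, and the function is assumed to be registered on an MLRun project under the name given in function_name.

```python
# Placeholder sample: two rows of four numeric features each.
predict_params = {
    "model": "store://models/my-project/my-model:latest",  # placeholder store path
    "dataset": [[5.1, 3.5, 1.4, 0.2], [6.2, 2.9, 4.3, 1.3]],  # sample passed via params
    "drop_columns": [0],  # integer indices, because the dataset is a list
    "label_columns": "label",
    "result_set": "my_prediction",  # placeholder DB key / filename
}


def run_predict(project, function_name: str = "train"):
    """Hypothetical wrapper: run the "predict" handler with an in-memory sample.

    No `inputs` are given here because the sample travels in `params`.
    """
    return project.run_function(
        function_name,
        handler="predict",
        params=predict_params,
    )
```

When the dataset is a URI or a FeatureVector instead, it would go in inputs, and drop_columns could then use column names.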

auto_trainer.auto_trainer.train(context: mlrun.execution.MLClientCtx, dataset: mlrun.datastore.base.DataItem, model_class: str, label_columns: Optional[Union[str, List[str]]] = None, drop_columns: Optional[List[str]] = None, model_name: str = 'model', tag: str = '', sample_set: Optional[mlrun.datastore.base.DataItem] = None, test_set: Optional[mlrun.datastore.base.DataItem] = None, train_test_split_size: Optional[float] = None, random_state: Optional[int] = None, labels: Optional[dict] = None, **kwargs)

Train a model on the given dataset.

Example:

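import mlrun

project = mlrun.get_or_create_project("my-project")
# Register the hub function in the project under the name "train".
project.set_function("hub://auto_trainer", "train")
# Run the "train" handler of the registered function.
trainer_run = project.run_function(
    "train",
    handler="train",
    # DataItem arguments (dataset, sample_set, test_set) are passed as inputs.
    inputs={
        "dataset": "./path/to/dataset.csv",
        "sample_set": "./path/to/sample_set.csv",
        "test_set": "./path/to/test_set.csv",
    },
    params={
        "model_class": "sklearn.linear_model.LogisticRegression",
        "label_columns": "label",
        "drop_columns": "id",
        "model_name": "my-model",
        "tag": "v1.0.0",
        # CLASS_ prefix: passed on as a model class argument.
        "CLASS_solver": "liblinear",
    },
)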
Parameters
  • context – MLRun context.

  • dataset – The dataset to train the model on. Can be either a URI or a FeatureVector.

  • model_class – The class of the model, e.g. sklearn.linear_model.LogisticRegression.

  • label_columns – The target label column(s) in the dataset, for regression or classification tasks. Mandatory when the dataset is not a FeatureVector.

  • drop_columns – A string or a list of strings naming the columns to drop.

  • model_name – The name to use for storing the model artifact. Defaults to ‘model’.

  • tag – The tag to log the model with.

  • sample_set – A sample set of inputs for the model, used to log its statistics alongside the model for model monitoring. Can be either a URI or a FeatureVector.

  • test_set – The test set to use when training the model.

  • train_test_split_size – Ignored if test_set is provided. Must be between 0.0 and 1.0; it is the proportion of the dataset to include in the test split. The training set gets the complement of this value. Defaults to 0.2.

  • random_state – Relevant only when using train_test_split_size. The random seed used to shuffle the data. For more information, see: https://scikit-learn.org/stable/glossary.html#term-random_state Note that only integer values are accepted here.

  • labels – Labels to log with the model.

  • kwargs – Keyword arguments with prefixes, which are parsed and passed to the relevant function. The supported prefixes are:

      - CLASS_: for the model class arguments
      - FIT_: for the fit function arguments
      - TRAIN_: for the train function (in the xgb or lgbm train function; planned for the future)
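
To make the relationship between train_test_split_size and random_state concrete, here is a standard-library sketch of a seeded shuffle-and-split. It is not the package's implementation: split_rows is a hypothetical helper, and the rounding rule for the test-set size is an assumption of this sketch.

```python
import random
from typing import List, Optional, Sequence, Tuple


def split_rows(
    rows: Sequence,
    test_size: float = 0.2,
    random_state: Optional[int] = None,
) -> Tuple[List, List]:
    """Hypothetical helper: shuffle `rows` with a seed and split into (train, test)."""
    if not 0.0 < test_size < 1.0:
        raise ValueError("test_size must be between 0.0 and 1.0")
    shuffled = list(rows)
    # The same integer seed always produces the same shuffle, hence the same split.
    random.Random(random_state).shuffle(shuffled)
    # Assumption: the test-set size is rounded to the nearest whole row.
    n_test = round(len(shuffled) * test_size)
    # The training set is the complement of the test split.
    return shuffled[n_test:], shuffled[:n_test]


train_rows, test_rows = split_rows(list(range(10)), test_size=0.2, random_state=42)
print(len(train_rows), len(test_rows))  # 8 2
```

Calling split_rows again with the same random_state returns the same partition, which is why fixing the seed makes a training run reproducible.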

Module contents