auto_trainer package
Submodules
auto_trainer.auto_trainer module
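- class auto_trainer.auto_trainer.KWArgsPrefixes
Bases: object
Prefixes that route extra keyword arguments of train() to the relevant function: CLASS_ for the model class arguments, FIT_ for the fit function arguments, and TRAIN_ for a train function.
- FIT = 'FIT_'
- MODEL_CLASS = 'CLASS_'
- TRAIN = 'TRAIN_'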
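The prefix constants above determine where each extra keyword argument of train() ends up. The sketch below illustrates that grouping under stated assumptions: `split_kwargs_by_prefix` is a hypothetical helper written for illustration, and the class is a local stand-in that mirrors the documented constants rather than an import of the real module. It is not the package's own parsing code.

```python
from typing import Any, Dict


class KWArgsPrefixes:
    # Local stand-in mirroring the documented prefix constants.
    MODEL_CLASS = "CLASS_"
    FIT = "FIT_"
    TRAIN = "TRAIN_"


def split_kwargs_by_prefix(kwargs: Dict[str, Any]) -> Dict[str, Dict[str, Any]]:
    """Hypothetical helper: group prefixed kwargs and strip the prefix.

    Keys without a known prefix are collected under "unprefixed".
    """
    prefixes = {
        "class": KWArgsPrefixes.MODEL_CLASS,
        "fit": KWArgsPrefixes.FIT,
        "train": KWArgsPrefixes.TRAIN,
    }
    groups: Dict[str, Dict[str, Any]] = {name: {} for name in prefixes}
    groups["unprefixed"] = {}
    for key, value in kwargs.items():
        for name, prefix in prefixes.items():
            if key.startswith(prefix):
                # "CLASS_solver" -> groups["class"]["solver"]
                groups[name][key[len(prefix):]] = value
                break
        else:
            groups["unprefixed"][key] = value
    return groups


groups = split_kwargs_by_prefix(
    {"CLASS_solver": "liblinear", "CLASS_C": 0.5, "FIT_sample_weight": None, "verbose": True}
)
print(groups["class"])  # {'solver': 'liblinear', 'C': 0.5}
```

In this sketch, the "class" group would become the model constructor's keyword arguments, for example `LogisticRegression(solver="liblinear", C=0.5)`.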
- auto_trainer.auto_trainer.evaluate(context: mlrun.execution.MLClientCtx, model: str, dataset: mlrun.datastore.base.DataItem, drop_columns: Optional[List[str]] = None, label_columns: Optional[Union[str, List[str]]] = None, **kwargs)
Evaluate a model. The evaluation artifacts are generated by the MLHandler.
- Parameters
context – MLRun context.
model – The model's store path.
dataset – The dataset to evaluate the model on. Can be either a URI or a FeatureVector.
drop_columns – A string or a list of strings naming the columns to drop.
label_columns – The target label column(s) in the dataset, for regression or classification tasks. Mandatory when the dataset is not a FeatureVector.
kwargs – Keyword arguments to pass to the model's predict function (the PREDICT_ prefix is not required).
- auto_trainer.auto_trainer.predict(context: mlrun.execution.MLClientCtx, model: str, dataset: mlrun.datastore.base.DataItem, drop_columns: Optional[Union[str, List[str], int, List[int]]] = None, label_columns: Optional[Union[str, List[str]]] = None, result_set: Optional[str] = None, **kwargs)
Predict a dataset with a model.
- Parameters
context – MLRun context.
model – The model's store path.
dataset – The dataset to run predictions on. Can be either a URI, a FeatureVector, or a sample in the form of a list/dict. When passing a sample, pass the dataset as a field in params instead of inputs.
drop_columns – A string/int or a list of strings/ints representing the column names/indices to drop. When the dataset is a list/dict, this parameter must be given as integers.
label_columns – The target label column(s) in the dataset, for regression or classification tasks. Mandatory when the dataset is not a FeatureVector.
result_set – The DB key to use as the name of the prediction result and its filename. Defaults to 'prediction'.
kwargs – Keyword arguments to pass to the model's predict function (the PREDICT_ prefix is not required).
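When the dataset is a list sample, drop_columns refers to column positions rather than names. The following is a minimal sketch, assuming rows are plain lists; `drop_columns_by_index` is a hypothetical helper, not part of the package. Per the note on dataset, such a sample goes in params rather than inputs, and the `run_params` dict below only shows that assumed shape.

```python
from typing import Any, List, Sequence, Union


def drop_columns_by_index(
    sample: List[List[Any]], drop_columns: Union[int, Sequence[int]]
) -> List[List[Any]]:
    """Hypothetical helper: remove columns, by position, from a list-of-rows sample."""
    # A single int is treated like a one-element list of indices.
    indices = {drop_columns} if isinstance(drop_columns, int) else set(drop_columns)
    return [[value for i, value in enumerate(row) if i not in indices] for row in sample]


sample = [[7, 5.1, 3.5], [8, 4.9, 3.0]]  # column 0 holds row IDs
print(drop_columns_by_index(sample, 0))  # [[5.1, 3.5], [4.9, 3.0]]

# Assumed shape of the params for predict when sending a sample instead of a URI.
run_params = {"dataset": sample, "drop_columns": 0}
```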
- auto_trainer.auto_trainer.train(context: mlrun.execution.MLClientCtx, dataset: mlrun.datastore.base.DataItem, model_class: str, label_columns: Optional[Union[str, List[str]]] = None, drop_columns: Optional[List[str]] = None, model_name: str = 'model', tag: str = '', sample_set: Optional[mlrun.datastore.base.DataItem] = None, test_set: Optional[mlrun.datastore.base.DataItem] = None, train_test_split_size: Optional[float] = None, random_state: Optional[int] = None, labels: Optional[dict] = None, **kwargs)
Train a model on the given dataset.
Example:
import mlrun

project = mlrun.get_or_create_project("my-project")
project.set_function("hub://auto_trainer", "train")

trainer_run = project.run_function(
    "train",
    handler="train",
    inputs={
        "dataset": "./path/to/dataset.csv",
        "sample_set": "./path/to/sample_set.csv",
        "test_set": "./path/to/test_set.csv",
    },
    params={
        "model_class": "sklearn.linear_model.LogisticRegression",
        "label_columns": "label",
        "drop_columns": "id",
        "model_name": "my-model",
        "tag": "v1.0.0",
        "CLASS_solver": "liblinear",  # passed to the model class as solver="liblinear"
    },
)
- Parameters
context – MLRun context.
dataset – The dataset to train the model on. Can be either a URI or a FeatureVector.
model_class – The class of the model, e.g. sklearn.linear_model.LogisticRegression.
label_columns – The target label column(s) in the dataset, for regression or classification tasks. Mandatory when the dataset is not a FeatureVector.
drop_columns – A string or a list of strings naming the columns to drop.
model_name – The name under which to store the model artifact. Defaults to 'model'.
tag – The tag to log the model with.
sample_set – A sample set of inputs to log along with the model, so that its statistics are available for model monitoring. Can be either a URI or a FeatureVector.
test_set – The test set to evaluate the trained model on.
train_test_split_size – The proportion of the dataset to include in the test split, between 0.0 and 1.0; the training set gets the complement of this value. Ignored if test_set is provided. Defaults to 0.2.
random_state – A random seed used to shuffle the data. Relevant only when using train_test_split_size. Only integer values are accepted here. For more information, see: https://scikit-learn.org/stable/glossary.html#term-random_state
labels – Labels to log with the model.
kwargs – Keyword arguments with the following prefixes are parsed and passed to the relevant function:
- CLASS_ - for the model class arguments
- FIT_ - for the fit function arguments
- TRAIN_ - for the train function (reserved for the xgboost/lightgbm train functions in a future release)
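The interplay of train_test_split_size and random_state can be sketched with the standard library: shuffle with a seeded generator, then hold out a fraction of the rows for testing. `split_train_test` is a hypothetical helper; it assumes the test count is rounded up (which, as far as we know, matches scikit-learn's train_test_split for float sizes) and is not how the package itself splits the data.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_train_test(
    rows: Sequence[T],
    train_test_split_size: Optional[float] = None,
    random_state: Optional[int] = None,
) -> Tuple[List[T], List[T]]:
    """Hypothetical helper: shuffle, then hold out a fraction of rows for testing."""
    size = 0.2 if train_test_split_size is None else train_test_split_size  # documented default
    if not 0.0 < size < 1.0:
        raise ValueError("train_test_split_size must be between 0.0 and 1.0")
    shuffled = list(rows)
    random.Random(random_state).shuffle(shuffled)  # an int seed makes the split reproducible
    n_test = math.ceil(size * len(shuffled))  # assumed rounding: test size rounded up
    return shuffled[n_test:], shuffled[:n_test]  # (train, test); train is the complement


train, test = split_train_test(list(range(10)), train_test_split_size=0.25, random_state=42)
print(len(train), len(test))  # 7 3
```

Calling it again with the same random_state returns the same split, which is the point of passing an integer seed.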