load_dataset package

Submodules

load_dataset.load_dataset module

load_dataset.load_dataset.load_dataset(context: mlrun.execution.MLClientCtx, dataset: str, name: str = '', file_ext: str = 'parquet', params: dict = {}) → None

Loads a scikit-learn toy dataset for classification or regression.

The following datasets are available (‘name’ : description):

  • ‘boston’ : Boston house-prices dataset (regression; load_boston was removed in scikit-learn 1.2)
  • ‘iris’ : Iris dataset (classification)
  • ‘diabetes’ : Diabetes dataset (regression)
  • ‘digits’ : Digits dataset (classification)
  • ‘linnerud’ : Linnerud dataset (multivariate regression)
  • ‘wine’ : Wine dataset (classification)
  • ‘breast_cancer’ : Breast Cancer Wisconsin dataset (classification)
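One way to picture how the dataset argument selects a loader is a lookup table from the names above to scikit-learn's load_* functions. This is a minimal sketch under assumptions: TOY_LOADERS and get_loader are hypothetical names, not part of this module, and ‘boston’ is left out because current scikit-learn releases no longer ship load_boston.

```python
from sklearn import datasets

# Hypothetical lookup table (not this module's actual code) mapping the names
# accepted by `dataset` to scikit-learn loader functions. 'boston' is omitted
# because load_boston was removed in scikit-learn 1.2.
TOY_LOADERS = {
    "iris": datasets.load_iris,
    "diabetes": datasets.load_diabetes,
    "digits": datasets.load_digits,
    "linnerud": datasets.load_linnerud,
    "wine": datasets.load_wine,
    "breast_cancer": datasets.load_breast_cancer,
}


def get_loader(dataset: str):
    """Return the scikit-learn loader for `dataset`, or raise ValueError."""
    try:
        return TOY_LOADERS[dataset]
    except KeyError:
        raise ValueError(
            f"unknown dataset {dataset!r}; expected one of {sorted(TOY_LOADERS)}"
        ) from None


# Usage: look up the loader, then call it to get the Bunch.
wine = get_loader("wine")()
print(wine.data.shape)  # (178, 13)
```

An explicit table is used instead of building the function name with getattr, so that names outside the toy-dataset list (such as "files" for load_files) are rejected.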

The scikit-learn loader functions return a Bunch object that includes the following items:

  • data – the feature matrix
  • target – the ground-truth labels
  • DESCR – a description of the dataset
  • feature_names – the column names for data
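For example, loading the iris dataset directly with scikit-learn shows the Bunch fields listed above. Bunch is a dict subclass, so each field can be read either as an attribute or as a key.

```python
from sklearn.datasets import load_iris

# Load the iris toy dataset; the result is a sklearn.utils.Bunch.
bunch = load_iris()

print(bunch.data.shape)     # (150, 4): 150 samples x 4 features
print(bunch.target.shape)   # (150,): one class label per sample
print(bunch.feature_names)  # column names for the feature matrix
print(bunch.DESCR[:200])    # start of the free-text description

# Attribute access and key access return the same object.
assert bunch["data"] is bunch.data
```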

The features (with their names as column headers) and the target labels are stored together in a single DataFrame, which is saved as an artifact in the format given by file_ext.
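A minimal sketch of that combination step with pandas, assuming a label column called "labels" (a hypothetical name; the column name the module actually writes is not documented here). Multi-output targets such as linnerud's are split into one column per entry of target_names, which is likewise an assumption about how such datasets are handled.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris, load_linnerud


def bunch_to_frame(bunch, label_column: str = "labels") -> pd.DataFrame:
    """Hypothetical helper: put the features and targets of a Bunch into one DataFrame."""
    # Feature columns take their headers from feature_names.
    df = pd.DataFrame(bunch.data, columns=bunch.feature_names)
    target = np.asarray(bunch.target)
    if target.ndim == 1:
        # Single-output datasets: one label column.
        df[label_column] = target
    else:
        # Multi-output datasets (e.g. linnerud): one column per target name.
        for i, name in enumerate(bunch.target_names):
            df[name] = target[:, i]
    return df


iris_df = bunch_to_frame(load_iris())
linnerud_df = bunch_to_frame(load_linnerud())
print(iris_df.shape)      # (150, 5)
print(linnerud_df.shape)  # (20, 6)
```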

For further details, see https://scikit-learn.org/stable/datasets/index.html#toy-datasets

Parameters
  • context – function execution context

  • dataset – name of the dataset to load

  • name – artifact name (defaults to dataset)

  • file_ext – output file format: ‘parquet’ or ‘csv’

  • params – keyword arguments passed to the scikit-learn loader for the chosen dataset (for example, load_iris)
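The params and file_ext arguments can be illustrated without MLRun by a local sketch: params is unpacked into the scikit-learn loader, and the resulting DataFrame is written in the requested format. write_frame is a hypothetical helper, and the module itself stores its output through the MLRun context, which this sketch does not reproduce. n_class is a real argument of load_digits that keeps only the first n classes.

```python
import tempfile
from pathlib import Path

import pandas as pd
from sklearn.datasets import load_digits


def write_frame(df: pd.DataFrame, stem: str, file_ext: str = "parquet") -> Path:
    """Hypothetical helper: write df to '<stem>.<file_ext>' (parquet or csv only)."""
    if file_ext not in ("parquet", "csv"):
        raise ValueError(f"file_ext must be 'parquet' or 'csv', got {file_ext!r}")
    path = Path(f"{stem}.{file_ext}")
    if file_ext == "csv":
        df.to_csv(path, index=False)
    else:
        # Parquet output needs an engine such as pyarrow or fastparquet installed.
        df.to_parquet(path, index=False)
    return path


# Assumption: params is forwarded to the loader as keyword arguments.
params = {"n_class": 3}
bunch = load_digits(**params)
df = pd.DataFrame(bunch.data, columns=bunch.feature_names)
df["labels"] = bunch.target

# The file stem mirrors `name` defaulting to the dataset name.
out_dir = Path(tempfile.mkdtemp())
out = write_frame(df, str(out_dir / "digits"), file_ext="csv")
print(out.name)  # digits.csv
```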

Module contents