gen_class_data package

Submodules

gen_class_data.gen_class_data module

gen_class_data.gen_class_data.gen_class_data(context: MLClientCtx, n_samples: int, m_features: int, k_classes: int, header: List[str] | None, label_column: str | None = 'labels', weight: float = 0.5, random_state: int = 1, key: str = 'classifier-data', file_ext: str = 'parquet', sk_params={})

Create a binary classification sample dataset and save it. If no filename is given, it defaults to “simdata-{n_samples}X{m_features}.parquet”.
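As a minimal sketch of the default-name pattern quoted above: `default_filename` is a hypothetical helper, not part of this module, and substituting `file_ext` for the extension is an assumption made only for illustration.

```python
def default_filename(n_samples: int, m_features: int, file_ext: str = "parquet") -> str:
    """Hypothetical helper: fill in the documented default-name pattern."""
    # Assumption: the extension follows file_ext; the docs only show ".parquet".
    return f"simdata-{n_samples}X{m_features}.{file_ext}"


name = default_filename(1000, 5)
print(name)
```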

Additional scikit-learn parameters can be set using **sk_params; see https://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_classification.html for details.

Parameters:
  • context – function context

  • n_samples – number of rows/samples

  • m_features – number of cols/features

  • k_classes – number of classes

  • header – list of column names for the features array

  • label_column – column name of ground-truth series

  • weight – fraction of samples with the negative label (ground-truth=0)

  • random_state – rng seed (see https://scikit-learn.org/stable/glossary.html#term-random-state)

  • key – key of data in artifact store

  • file_ext – file extension of the saved parquet file, e.g. pqt (default: 'parquet')

  • sk_params – additional parameters for sklearn.datasets.make_classification
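
The parameters above can be read as a thin mapping onto `sklearn.datasets.make_classification`. The sketch below shows one plausible mapping for the binary case; it is not this module's source. The name `sketch_gen_class_data`, the fallback column names, and the omission of the MLRun context and artifact logging (`context`, `key`, `file_ext`) are all assumptions made for illustration.

```python
import pandas as pd
from sklearn.datasets import make_classification


def sketch_gen_class_data(n_samples, m_features, k_classes=2, header=None,
                          label_column="labels", weight=0.5, random_state=1,
                          sk_params=None):
    """Hypothetical stand-in: build the dataset as a DataFrame, without saving it."""
    # Copy so the caller's dict is never mutated.
    params = dict(sk_params or {})
    # Assumption: weight is the class-0 fraction, which only maps cleanly
    # onto make_classification's `weights` in the binary case.
    if k_classes == 2:
        params.setdefault("weights", [weight])
    features, labels = make_classification(
        n_samples=n_samples,
        n_features=m_features,
        n_classes=k_classes,
        random_state=random_state,
        **params,
    )
    # Assumption: fall back to generated column names when no header is given.
    columns = header if header is not None else [f"feat_{i}" for i in range(m_features)]
    frame = pd.DataFrame(features, columns=columns)
    frame[label_column] = labels
    return frame


df = sketch_gen_class_data(n_samples=1000, m_features=5, weight=0.7)
print(df.shape)
print(df["labels"].value_counts(normalize=True))
```

Because `make_classification` flips a small share of labels by default (`flip_y=0.01`), the observed class-0 fraction is close to `weight` but not exactly equal to it.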

Module contents