gen_class_data package
Submodules
gen_class_data.gen_class_data module
- gen_class_data.gen_class_data.gen_class_data(context: MLClientCtx, n_samples: int, m_features: int, k_classes: int, header: List[str] | None, label_column: str | None = 'labels', weight: float = 0.5, random_state: int = 1, key: str = 'classifier-data', file_ext: str = 'parquet', sk_params={})
Create a sample binary classification dataset and save it. If no filename is given, it defaults to “simdata-{n_samples}X{m_features}.parquet”.
Additional scikit-learn parameters can be set using **sk_params; see https://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_classification.html for details.
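The default name template quoted above can be illustrated with a small sketch. The helper `default_filename` is hypothetical and not part of this module, and substituting `file_ext` for the extension is an assumption; the documented template hard-codes `.parquet`.

```python
def default_filename(n_samples: int, m_features: int, file_ext: str = "parquet") -> str:
    """Hypothetical helper: fill in the documented default name template."""
    # Assumption: the extension follows file_ext; the docstring only
    # shows the ".parquet" form.
    return f"simdata-{n_samples}X{m_features}.{file_ext}"


print(default_filename(1000, 5))  # simdata-1000X5.parquet
```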
- Parameters:
context – function context
n_samples – number of rows/samples
m_features – number of cols/features
k_classes – number of classes
header – header for features array
label_column – column name of ground-truth series
weight – fraction of samples with the negative label (ground-truth=0)
random_state – RNG seed (see https://scikit-learn.org/stable/glossary.html#term-random-state)
key – key of data in artifact store
file_ext – file extension for the saved parquet file (default: 'parquet')
sk_params – additional parameters for sklearn.datasets.make_classification
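
The parameters above map fairly directly onto `sklearn.datasets.make_classification`. The following is a minimal sketch of that mapping, not the module's actual implementation: `sketch_class_data` is a hypothetical stand-in that returns a DataFrame instead of logging an artifact through `context`, the `feat_{i}` column names are an assumed fallback when no header is given, and `weight` is applied only in the binary case.

```python
from typing import List, Optional

import pandas as pd
from sklearn.datasets import make_classification


def sketch_class_data(
    n_samples: int,
    m_features: int,
    k_classes: int = 2,
    header: Optional[List[str]] = None,
    label_column: str = "labels",
    weight: float = 0.5,
    random_state: int = 1,
    sk_params: Optional[dict] = None,
) -> pd.DataFrame:
    """Hypothetical stand-in for gen_class_data that returns a DataFrame."""
    # Assumption: `weight` is the fraction of class 0 and is only applied
    # in the binary case; otherwise classes are left balanced.
    weights = [weight] if k_classes == 2 else None

    features, labels = make_classification(
        n_samples=n_samples,
        n_features=m_features,
        n_classes=k_classes,
        weights=weights,
        random_state=random_state,
        **(sk_params or {}),  # extra make_classification arguments
    )

    # Assumption: generated column names are used when no header is given.
    columns = header or [f"feat_{i}" for i in range(m_features)]
    data = pd.DataFrame(features, columns=columns)
    data[label_column] = labels
    return data


df = sketch_class_data(
    n_samples=1000,
    m_features=5,
    weight=0.7,
    sk_params={"n_informative": 3},
)
print(df.shape)  # (1000, 6)
print(df["labels"].value_counts(normalize=True))
```

Because `make_classification` flips a small share of labels by default (`flip_y=0.01`), the observed class fractions only approximate `weight`. Writing the frame out, for example with `DataFrame.to_parquet`, would correspond to the `key` and `file_ext` arguments, which this sketch leaves out.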