describe package

Submodules

describe.describe module

describe.describe.analyze(context: MLClientCtx, name: str = 'dataset', table: FeatureSet | DataItem | None = None, label_column: str | None = None, plots_dest: str = 'plots', random_state: int = 1, problem_type: str = 'classification', dask_key: str = 'dask_key', dask_function: str | None = None, dask_client=None) → None

If the data has more than 500,000 samples, a random sample of 500,000 is used. The function outputs the following artifacts per column of the data frame (depending on data types):

describe csv
histograms
scatter-2d
violin chart
correlation-matrix chart
correlation-matrix csv
imbalance pie chart
imbalance-weights-vec csv
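The 500,000-sample cap described above can be sketched as follows. This is a minimal pure-Python illustration, not the function's actual implementation: the helper name `sample_row_indices` and the constant `MAX_ROWS` are hypothetical, and the real function works on a data frame rather than on row indices.

```python
import random

# Threshold stated in the description above.
MAX_ROWS = 500_000


def sample_row_indices(n_rows, max_rows=MAX_ROWS, random_state=1):
    """Return the sorted indices of the rows to keep.

    Hypothetical helper: all rows are kept when the table fits under the cap,
    otherwise `max_rows` distinct indices are drawn with a seeded generator.
    """
    if n_rows <= max_rows:
        return list(range(n_rows))
    rng = random.Random(random_state)  # seeded, so repeated runs agree
    return sorted(rng.sample(range(n_rows), max_rows))


# Small-scale usage: cap a 10-row table at 4 rows.
kept = sample_row_indices(10, max_rows=4, random_state=1)
```

With a pandas data frame, the same idea is usually written as `df.sample(n=max_rows, random_state=random_state)`. Fixing the seed makes the generated artifacts reproducible across runs.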

Parameters:

context – The function context
name – Key under which the dataset is stored in the database (default: “dataset”)
table – MLRun input pointing to a pandas DataFrame (CSV/Parquet file path), or a FeatureSet passed as a parameter
label_column – Ground-truth label column
plots_dest – Destination folder for the summary plots, relative to artifact_path (default: “plots”)
random_state – Random seed used to draw 500,000 samples when the table has more than 500,000 samples
problem_type – The type of ML problem the data represents: regression, classification, or None (default: classification)
dask_key – Key of dataframe in dask client “datasets” attribute
dask_function – Dask function URL (db://..)
dask_client – Dask client object
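As a rough sketch of how these parameters might be passed, the helper below collects them with the defaults from the signature above. The names `build_analyze_params` and `run_describe` are hypothetical, the `problem_type` check is an illustration rather than documented behavior of `analyze`, and the `hub://describe` URL is an assumption about where the function is published.

```python
from typing import Any, Dict, Optional

# Problem types named in the parameter description above.
ALLOWED_PROBLEM_TYPES = ("classification", "regression", None)


def build_analyze_params(
    name: str = "dataset",
    label_column: Optional[str] = None,
    plots_dest: str = "plots",
    random_state: int = 1,
    problem_type: Optional[str] = "classification",
) -> Dict[str, Any]:
    """Collect parameters for analyze, using the documented defaults."""
    # Illustrative check only; analyze itself may handle this differently.
    if problem_type not in ALLOWED_PROBLEM_TYPES:
        raise ValueError(f"unsupported problem_type: {problem_type!r}")
    return {
        "name": name,
        "label_column": label_column,
        "plots_dest": plots_dest,
        "random_state": random_state,
        "problem_type": problem_type,
    }


def run_describe(table_url: str, **overrides: Any):
    """Hypothetical wrapper: run analyze locally through MLRun."""
    import mlrun  # imported lazily so the helper above works without MLRun

    # Assumption: the function is available from the hub under this URL.
    fn = mlrun.import_function("hub://describe")
    return fn.run(
        handler="analyze",
        inputs={"table": table_url},
        params=build_analyze_params(**overrides),
        local=True,
    )
```

A call such as `run_describe("data.csv", label_column="label")` would then pass the table as an MLRun input and everything else as run parameters; the file name and column name here are placeholders.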
