describe package
Submodules
describe.describe module
- describe.describe.analyze(context: mlrun.execution.MLClientCtx, name: str = 'dataset', table: Optional[Union[mlrun.feature_store.FeatureSet, mlrun.datastore.base.DataItem]] = None, label_column: Optional[str] = None, plots_dest: str = 'plots', random_state: int = 1, problem_type: str = 'classification', dask_key: str = 'dask_key', dask_function: Optional[str] = None, dask_client=None) → None
The function outputs the following artifacts per column of the data frame (based on data types). If the data has more than 500,000 samples, a random sample of 500,000 is used:
describe csv
histograms
scatter-2d
violin chart
correlation-matrix chart
correlation-matrix csv
imbalance pie chart
imbalance-weights-vec csv
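As a rough illustration of two of the tabular artifacts listed above, the sketch below builds a summary table and a correlation matrix with plain pandas. It is not the module's actual implementation; the toy data frame and all variable names are assumptions made for the example.

```python
import pandas as pd

# Toy data standing in for the input table (hypothetical values).
df = pd.DataFrame(
    {
        "age": [25, 32, 47, 51],
        "income": [40_000, 52_000, 88_000, 91_000],
        "label": [0, 0, 1, 1],
    }
)

# Per-column summary statistics, comparable in spirit to the "describe csv" artifact.
summary = df.describe()

# Pairwise Pearson correlations, comparable in spirit to the "correlation-matrix csv" artifact.
correlation = df.corr()

# Both results are ordinary data frames, so they can be serialized as CSV text.
summary_csv = summary.to_csv()
correlation_csv = correlation.to_csv()
```

Both results stay as data frames until the last step, so they can be inspected or plotted before being written out.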
- Parameters
context – The function context
name – Key under which the dataset is stored in the database (default: “dataset”)
table – MLRun input pointing to a pandas dataframe (csv/parquet file path), or a FeatureSet passed as a parameter
label_column – Ground truth column label
plots_dest – Destination folder of summary plots, relative to artifact_path (default: “plots”)
random_state – Random seed used for sampling; when the table has more than 500,000 samples, 500,000 of them are sampled at random
problem_type – The type of ML problem the data is intended for: regression, classification, or None (default: classification)
dask_key – Key of dataframe in dask client “datasets” attribute
dask_function – Dask function url (db://..)
dask_client – Dask client object
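The row cap described for random_state can be sketched with pandas as follows. This is a minimal sketch, not the module's code: the helper cap_rows and the constant MAX_ROWS are hypothetical names, and it assumes random_state is used as the seed of the pandas sampler.

```python
import pandas as pd

MAX_ROWS = 500_000  # threshold stated in the description above


def cap_rows(df: pd.DataFrame, random_state: int = 1, max_rows: int = MAX_ROWS) -> pd.DataFrame:
    """Return df unchanged if it is small enough, else a reproducible random sample."""
    if len(df) > max_rows:
        # A fixed seed makes the same rows come back on every run.
        return df.sample(n=max_rows, random_state=random_state)
    return df


# Small demonstration with a lowered threshold so it runs quickly.
demo = pd.DataFrame({"x": range(1_000)})
sample_a = cap_rows(demo, random_state=1, max_rows=100)
sample_b = cap_rows(demo, random_state=1, max_rows=100)
```

With the same seed, sample_a and sample_b contain the same rows, which is why a fixed random_state gives reproducible plots and statistics on large tables.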