describe package

Submodules

describe.describe module

describe.describe.analyze(context: mlrun.execution.MLClientCtx, name: str = 'dataset', table: Optional[Union[mlrun.feature_store.FeatureSet, mlrun.datastore.base.DataItem]] = None, label_column: Optional[str] = None, plots_dest: str = 'plots', random_state: int = 1, problem_type: str = 'classification', dask_key: str = 'dask_key', dask_function: Optional[str] = None, dask_client=None) → None

The function outputs the following artifacts per column of the data frame, depending on the column's data type. If the data has more than 500,000 samples, 500,000 samples are drawn at random:

  • describe csv

  • histograms

  • scatter-2d

  • violin chart

  • correlation-matrix chart

  • correlation-matrix csv

  • imbalance pie chart

  • imbalance-weights-vec csv
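The 500,000-sample cap described above can be sketched with the standard library alone. This is a minimal illustration, not the function's actual implementation: the helper name cap_rows and the constant MAX_SAMPLES are hypothetical, and the rows are a plain list rather than a data frame. With pandas, the analogous call would be DataFrame.sample(n=..., random_state=...).

```python
import random

# Threshold stated in the description above; the constant name is hypothetical.
MAX_SAMPLES = 500_000


def cap_rows(rows, max_samples=MAX_SAMPLES, random_state=1):
    """Return all rows if there are at most max_samples, else a seeded random subset."""
    if len(rows) <= max_samples:
        return list(rows)
    # A dedicated, seeded generator makes the subset reproducible across runs.
    rng = random.Random(random_state)
    return rng.sample(rows, max_samples)


# Small inputs pass through untouched; large inputs are cut down to the cap.
small = cap_rows(list(range(10)), max_samples=100)
first = cap_rows(list(range(1000)), max_samples=100, random_state=1)
second = cap_rows(list(range(1000)), max_samples=100, random_state=1)
```

Fixing the seed is what makes repeated runs on the same table produce the same subset, and therefore the same artifacts.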

Parameters
  • context – The function context

  • name – Key under which the dataset is logged to the database (defaults to “dataset”)

  • table – MLRun input pointing to a pandas dataframe (csv/parquet file path), or a FeatureSet passed as a parameter

  • label_column – Name of the ground-truth label column

  • plots_dest – Destination folder of the summary plots, relative to artifact_path (defaults to “plots”)

  • random_state – Random seed for sampling; when the table has more than 500,000 samples, 500,000 samples are drawn at random

  • problem_type – The type of ML problem the data is meant for: regression, classification or None (defaults to “classification”)

  • dask_key – Key of dataframe in dask client “datasets” attribute

  • dask_function – Dask function url (db://..)

  • dask_client – Dask client object
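To make the role of label_column concrete for a classification problem, the sketch below computes the share of each class in a label column, which is the kind of quantity an imbalance pie chart displays. The page does not specify how the function computes its imbalance artifacts, so treating imbalance as per-class share is an assumption, and the helper name class_shares is hypothetical.

```python
from collections import Counter


def class_shares(labels):
    """Map each class label to its fraction of all samples (assumed imbalance measure)."""
    counts = Counter(labels)
    total = sum(counts.values())
    # Sort by label so the output order is stable between runs.
    return {label: count / total for label, count in sorted(counts.items())}


# Three samples of class 0 and one of class 1: a 75% / 25% split.
shares = class_shares([0, 0, 0, 1])
```

A strongly uneven split in such a summary is the usual signal that a classifier may need reweighting or resampling.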

Module contents