histogram_data_drift package

Submodules

histogram_data_drift.histogram_data_drift module

class histogram_data_drift.histogram_data_drift.DataDriftClassifier(potential: float = 0.5, detected: float = 0.7)

Bases: object

Classify numeric data drift values into a categorical status.

detected: float = 0.7
potential: float = 0.5
value_to_status(value: float) → ResultStatusApp

Translate the numeric value into a status category.

Parameters:

value – The numeric value of the data drift metric, between 0 and 1.

Returns:

ResultStatusApp according to the classification.
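The threshold logic can be illustrated with a minimal, self-contained sketch. It is not the library's implementation: `DriftStatus` and `ThresholdClassifier` are hypothetical stand-ins for `ResultStatusApp` and `DataDriftClassifier`, and treating each threshold as inclusive (`>=`) is an assumption.

```python
from dataclasses import dataclass
from enum import Enum


class DriftStatus(Enum):
    """Hypothetical stand-in for ResultStatusApp."""

    NO_DETECTION = "no_detection"
    POTENTIAL_DETECTION = "potential_detection"
    DETECTED = "detected"


@dataclass
class ThresholdClassifier:
    """Hypothetical stand-in for DataDriftClassifier, with the same defaults."""

    potential: float = 0.5
    detected: float = 0.7

    def value_to_status(self, value: float) -> DriftStatus:
        # Assumption: a value equal to a threshold counts as reaching it.
        if value >= self.detected:
            return DriftStatus.DETECTED
        if value >= self.potential:
            return DriftStatus.POTENTIAL_DETECTION
        return DriftStatus.NO_DETECTION


classifier = ThresholdClassifier()
statuses = [classifier.value_to_status(v) for v in (0.2, 0.6, 0.9)]
```

With the default thresholds, a drift value of 0.6 falls between `potential` and `detected`, so the sketch reports a potential detection.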

class histogram_data_drift.histogram_data_drift.HistogramDataDriftApplication(value_classifier: ValueClassifier | None = None, produce_json_artifact: bool = False, produce_plotly_artifact: bool = False)

Bases: ModelMonitoringApplicationBase

MLRun’s default data drift application for model monitoring.

The application expects tabular numerical data and calculates three metrics over the histograms of the shared features. The metrics are calculated only for features that have reference data from the training dataset. When there is no reference data (feature_stats), the application logs a warning and does nothing. The three metrics are:

  • Hellinger distance.

  • Total variance distance (more commonly known as total variation distance).

  • Kullback-Leibler divergence.

Each metric is calculated for every feature individually, and the mean over the features is taken as the metric value. The general drift result is the average of the Hellinger and total variance distance values.
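The aggregation described above can be sketched in plain Python. This is an illustration under stated assumptions rather than the application's code: histograms are assumed to be already normalized to sum to 1, all function names are hypothetical, and the Kullback-Leibler handling of empty bins (a small `eps` floor) is a simplification that may differ from the library's.

```python
import math


def hellinger(p: list[float], q: list[float]) -> float:
    # H = sqrt(1 - sum(sqrt(p_i * q_i))) for normalized histograms.
    bc = sum(math.sqrt(a * b) for a, b in zip(p, q))
    return math.sqrt(max(0.0, 1.0 - bc))


def total_variation(p: list[float], q: list[float]) -> float:
    # Half of the L1 distance between the two histograms.
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def kl_divergence(p: list[float], q: list[float], eps: float = 1e-10) -> float:
    # One-directional KL(p || q); eps is an assumed floor for empty bins in q.
    return sum(a * math.log(a / max(b, eps)) for a, b in zip(p, q) if a > 0)


def general_drift(reference: dict, current: dict) -> float:
    # Only features present in both the reference and the current data count.
    features = sorted(reference.keys() & current.keys())
    if not features:
        raise ValueError("no shared features with reference histograms")
    mean_h = sum(hellinger(reference[f], current[f]) for f in features) / len(features)
    mean_tv = sum(total_variation(reference[f], current[f]) for f in features) / len(features)
    # The result averages the two bounded distances; KL is reported separately.
    return (mean_h + mean_tv) / 2


reference = {"age": [0.5, 0.5], "income": [1.0, 0.0]}
current = {"age": [0.5, 0.5], "income": [0.0, 1.0]}
drift = general_drift(reference, current)
```

In this toy input, `age` is unchanged (both distances are 0) and `income` has moved entirely to another bin (both distances are 1), so each per-metric mean is 0.5 and so is the general drift. Hellinger and total variance distance both lie in [0, 1], which keeps their average on the same scale as the classifier thresholds, whereas the Kullback-Leibler divergence is unbounded.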

The application can log two artifacts (disabled by default due to performance issues):

  • JSON with the general drift value per feature.

  • Plotly table with the various metrics and histograms per feature.

If you want to change the application defaults, such as the classifier or which artifacts to produce, you can either modify the downloaded source code file directly, or inherit from this class (in the same file), then deploy it as any other model monitoring application. Please make sure to keep the default application name. This ensures that the full functionality of the application, including the statistics view in the UI, is available.

NAME: Final[str] = 'histogram-data-drift'
do_tracking(monitoring_context: MonitoringApplicationContext) → list[ModelMonitoringApplicationResult | ModelMonitoringApplicationMetric | _ModelMonitoringApplicationStats]

Calculate and return the data drift metrics, averaged over the features.

metrics: list[type[mlrun.model_monitoring.metrics.histogram_distance.HistogramDistanceMetric]] = [mlrun.model_monitoring.metrics.histogram_distance.HellingerDistance, mlrun.model_monitoring.metrics.histogram_distance.KullbackLeiblerDivergence, mlrun.model_monitoring.metrics.histogram_distance.TotalVarianceDistance]

class histogram_data_drift.histogram_data_drift.HistogramDataDriftApplicationConstants

Bases: object

GENERAL_RESULT_NAME = 'general_drift'
NAME = 'histogram-data-drift'

exception histogram_data_drift.histogram_data_drift.InvalidMetricValueError

Bases: ValueError

exception histogram_data_drift.histogram_data_drift.InvalidThresholdValueError

Bases: ValueError

class histogram_data_drift.histogram_data_drift.ValueClassifier(*args, **kwargs)

Bases: Protocol

value_to_status(value: float) → ResultStatusApp
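Because ValueClassifier is a Protocol, any object with a compatible value_to_status method should be usable as a classifier without inheriting from it. The sketch below shows that structural-typing idea with hypothetical stand-ins only: `Status` replaces `ResultStatusApp`, `ValueClassifierLike` replaces `ValueClassifier`, and the `runtime_checkable` decorator is added purely so the example can verify conformance.

```python
from enum import Enum
from typing import Protocol, runtime_checkable


class Status(Enum):
    """Hypothetical stand-in for ResultStatusApp."""

    NO_DETECTION = 0
    DETECTED = 2


@runtime_checkable
class ValueClassifierLike(Protocol):
    """Hypothetical stand-in for the ValueClassifier protocol."""

    def value_to_status(self, value: float) -> Status: ...


class SingleThresholdClassifier:
    """Conforms to the protocol by shape alone; no base class is needed."""

    def __init__(self, threshold: float = 0.6) -> None:
        self.threshold = threshold

    def value_to_status(self, value: float) -> Status:
        return Status.DETECTED if value >= self.threshold else Status.NO_DETECTION


def classify(classifier: ValueClassifierLike, value: float) -> Status:
    # Accepts anything that implements value_to_status.
    return classifier.value_to_status(value)


custom = SingleThresholdClassifier(threshold=0.6)
conforms = isinstance(custom, ValueClassifierLike)
result = classify(custom, 0.75)
```

A classifier written this way could, in principle, be passed as the value_classifier argument of HistogramDataDriftApplication, provided it returns real ResultStatusApp members rather than the stand-in enum used here.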

Module contents