batch_inference_v2 package#

Submodules#

batch_inference_v2.batch_inference_v2 module#

batch_inference_v2.batch_inference_v2.infer(context: mlrun.execution.MLClientCtx, dataset: Union[mlrun.datastore.base.DataItem, list, dict, pandas.core.frame.DataFrame, pandas.core.series.Series, numpy.ndarray], model_path: Union[str, mlrun.datastore.base.DataItem], drop_columns: Optional[Union[str, List[str], int, List[int]]] = None, label_columns: Optional[Union[str, List[str]]] = None, feature_columns: Optional[Union[str, List[str]]] = None, log_result_set: bool = True, result_set_name: str = 'prediction', batch_id: Optional[str] = None, artifacts_tag: str = '', perform_drift_analysis: Optional[bool] = None, endpoint_id: str = '', model_endpoint_name: str = 'batch-infer', model_endpoint_sample_set: Optional[Union[mlrun.datastore.base.DataItem, list, dict, pandas.core.frame.DataFrame, pandas.core.series.Series, numpy.ndarray]] = None, trigger_monitoring_job: Optional[bool] = None, batch_image_job: Optional[str] = None, model_endpoint_drift_threshold: Optional[float] = None, model_endpoint_possible_drift_threshold: Optional[float] = None, **predict_kwargs: Dict[str, Any])[source]#

Perform a prediction on the provided dataset using the specified model. Ensure that the model has already been logged under the current project.

If you wish to apply monitoring tools (e.g., drift analysis), set the perform_drift_analysis parameter to True. This will create a new model endpoint record under the specified model_endpoint_name. Additionally, ensure that model monitoring is enabled at the project level by calling the project.enable_model_monitoring() function. You can also apply monitoring to an existing model by providing its endpoint id or name, and the monitoring tools will be applied to that endpoint.

This function is currently supported only for mlrun>=1.5.0.
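To illustrate what the logged result set contains (see log_result_set below), here is a minimal sketch in plain pandas. This is not the actual implementation: `predict` is a hypothetical stand-in for the logged model, which the real function loads from the model store URI.

```python
import pandas as pd

# Hypothetical stand-in for the logged model's prediction call;
# the real function loads the model from its store URI.
def predict(df: pd.DataFrame) -> pd.Series:
    return (df["x1"] + df["x2"]).rename("prediction")

dataset = pd.DataFrame({"x1": [1, 2, 3], "x2": [10, 20, 30]})

# The result set logged when log_result_set=True is the given
# inputs concatenated with the prediction column.
result_set = pd.concat([dataset, predict(dataset)], axis=1)
```

The column names `x1`, `x2`, and the `predict` helper are assumptions for illustration only; in practice the columns come from the dataset or from feature_columns.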

Parameters
  • context – MLRun context.

  • dataset – The dataset to run through the model. Provided as an input (DataItem) that represents a dataset artifact or feature vector URI. If using the MLRun SDK, dataset can also be provided as a list, dictionary, or numpy array.

  • model_path – Model store URI (should start with store://). Provided as an input (DataItem). If using the MLRun SDK, model_path can also be provided as a parameter (string). To generate a valid model store URI, log the model before running this function. If the endpoint_id of an existing model endpoint is provided, make sure that it has a matching model store path; otherwise the drift analysis won’t be triggered.

  • drop_columns – A string / integer or a list of strings / integers representing the column names / indices to drop. When the dataset is a list or a numpy array, this parameter must be given as integers.

  • label_columns – The name(s) of the target label column(s) in the dataset, for regression or classification tasks. If not provided, the label columns can be read from the model object or from the feature vector, if available.

  • feature_columns – List of feature columns used to build the dataframe when the dataset is a list or numpy array.

  • log_result_set – Whether to log the result set – a DataFrame of the given inputs concatenated with the predictions. Defaults to True.

  • result_set_name – The db key used to name the prediction result artifact and its file. Defaults to ‘prediction’.

  • batch_id – The ID of the given batch (inference dataset). If None, it will be generated. Will be logged as a result of the run.

  • artifacts_tag – Tag to use for prediction set result artifact.

  • perform_drift_analysis – Whether to perform drift analysis between the model object’s sample set and the given dataset. Defaults to None, meaning drift analysis is performed if the model already has feature stats that can serve as a reference sample set. Performing drift analysis on a new endpoint id will generate a new model endpoint record.

  • endpoint_id – Model endpoint unique ID. If perform_drift_analysis was set, the endpoint_id will be used either to perform the analysis on an existing model endpoint or to generate a new model endpoint record.

  • model_endpoint_name – If a new model endpoint is generated, the model will be presented under this endpoint name.

  • model_endpoint_sample_set – A sample dataset to compare the inputs against in the drift analysis. Can be provided as an input (DataItem) or as a parameter (e.g. string, list, DataFrame). By default, the sample set stored in the model artifact itself is used.

  • trigger_monitoring_job – Whether to trigger the batch drift analysis after the infer job.

  • batch_image_job – The image used to register the monitoring batch job if it does not already exist. By default, the image is mlrun/mlrun.

  • model_endpoint_drift_threshold – The threshold at or above which drift is marked. Defaults to 0.7.

  • model_endpoint_possible_drift_threshold – The threshold at or above which possible drift is marked. Defaults to 0.5.
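When the dataset arrives as a list or numpy array, feature_columns names the columns of the DataFrame that is built from it, and drop_columns must then be given as integer indices. A minimal sketch of that convention, using plain pandas/numpy with assumed column names for illustration:

```python
import numpy as np
import pandas as pd

data = np.array([[1, 2, 3], [4, 5, 6]])
feature_columns = ["f0", "f1", "f2"]  # assumed names, for illustration only

# The dataframe is built from the array using feature_columns.
df = pd.DataFrame(data, columns=feature_columns)

# With an array input, drop_columns is given as integer indices:
drop_columns = [1]
df = df.drop(columns=[feature_columns[i] for i in drop_columns])
```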

Raises MLRunInvalidArgumentError – if neither model_path nor endpoint_id is provided.
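The two thresholds above partition the computed drift metric into three statuses. A hedged sketch of that decision logic (the status names here are illustrative, not MLRun's exact identifiers; the metric itself is computed by the monitoring job):

```python
def drift_status(value: float,
                 possible_drift_threshold: float = 0.5,
                 drift_threshold: float = 0.7) -> str:
    """Classify a drift metric using the two thresholds (illustrative names)."""
    # Below the possible-drift threshold: no drift.
    if value < possible_drift_threshold:
        return "NO_DRIFT"
    # Between the two thresholds: possible drift.
    if value < drift_threshold:
        return "POSSIBLE_DRIFT"
    # At or above the drift threshold: drift detected.
    return "DRIFT_DETECTED"
```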

Module contents#