aggregate package
Submodules
aggregate.aggregate module
- aggregate.aggregate.aggregate(context, df_artifact: Union[mlrun.datastore.base.DataItem, pandas.core.frame.DataFrame], save_to: str = 'aggregated-df.pq', keys: Optional[list] = None, metrics: Optional[list] = None, labels: Optional[list] = None, metric_aggregations: list = ['mean'], label_aggregations: list = ['max'], suffix: str = '', window: int = 3, center: bool = False, inplace: bool = False, drop_na: bool = True, files_to_select: int = 1)
Time-series aggregation function
Performs a rolling aggregation on {df_artifact} over {window}, by the selected {keys}, applying {metric_aggregations} to {metrics} and {label_aggregations} to {labels}, and adding {suffix} to the feature names.
If {inplace} is False, returns the original {df_artifact} joined with the aggregated result. If {inplace} is True, returns only the aggregated result.
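The behaviour described above can be approximated in plain pandas. The following is a minimal sketch, not the module's actual implementation: the helper name `rolling_aggregate_sketch` and the sample column `cpu` are hypothetical, it ignores {keys} and {labels}, and it assumes feature names follow the `<Feature_Name>_<Agg_Function>_<Suffix>` pattern given for the suffix parameter.

```python
import pandas as pd


def rolling_aggregate_sketch(df, metrics, aggregations=("mean",), window=3,
                             suffix="", inplace=False):
    """Illustrative only: rolling aggregation with suffixed feature names."""
    parts = []
    for agg in aggregations:
        # Rolling.agg accepts an aggregation name such as "mean" or "std".
        rolled = df[metrics].rolling(window=window).agg(agg)
        # Name columns <feature>_<agg>[_<suffix>], skipping an empty suffix.
        rolled.columns = [
            "_".join(p for p in (col, agg, suffix) if p) for col in metrics
        ]
        parts.append(rolled)
    aggregated = pd.concat(parts, axis=1)
    # inplace=True returns only the aggregates; otherwise join to the source.
    return aggregated if inplace else df.join(aggregated)


# Hypothetical sample data with a single metric column.
df = pd.DataFrame({"cpu": [1.0, 2.0, 3.0, 4.0, 5.0]})
result = rolling_aggregate_sketch(df, ["cpu"], window=3, suffix="last_3")
```

With a window of 3, the first two rows of the aggregated column have no complete window and therefore hold NA values.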
- Parameters
context – MLRun job context object. It gives the code access to job metadata, parameters, inputs, and secrets, and to the API for logging and monitoring results, including log text, files, artifacts, and labels.
df_artifact – MLRun input pointing to a pandas dataframe (csv/parquet file path) or to a directory containing parquet files. When given a directory, the latest {files_to_select} files are selected.
save_to – Where to save the result dataframe. If the path is relative, it is added to the {artifact_path}.
keys – Subset of indexes from the source dataframe to aggregate by (default=all)
metrics – List of metrics to run the aggregations on. (default=None)
labels – List of labels to run the aggregations on. (default=None)
metric_aggregations – List of aggregation function names to run on {metrics}. (Ex: 'mean', 'std') (default=['mean'])
label_aggregations – List of aggregation function names to run on {labels}. (Ex: 'max', 'min') (default=['max'])
suffix – Suffix to add to the feature name, in the form <Feature_Name>_<Agg_Function>_<Suffix>. (Ex: 'last_60_minutes') (default='')
window – Window size to perform the rolling aggregate on. (default=3)
center – If True, the aggregated value is set on the central sample of the window. If False, it is set on the last sample. (default=False)
inplace – If True, will return only the aggregated results. If False, will join the aggregated results with the original dataframe. (default=False)
drop_na – If True, drops the rows containing NA values that the rolling window produces. (default=True)
files_to_select – Number of latest files to select (and concatenate) for aggregation when {df_artifact} is a directory. (default=1)
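To make the effect of center and drop_na concrete, here is a small sketch using pandas' own `Series.rolling`, which takes `window` and `center` arguments with the same meaning as described above. It only illustrates the windowing semantics on hypothetical sample values and is not taken from the module's source.

```python
import pandas as pd

# Hypothetical sample series.
s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

# center=False: each value is written to the last sample of its window,
# so the first window - 1 rows are NA.
trailing = s.rolling(window=3, center=False).mean()

# center=True: each value is written to the central sample of its window,
# so NA rows appear at both ends instead.
centered = s.rolling(window=3, center=True).mean()

# Dropping NA rows (what drop_na=True is described as doing) keeps only
# the rows that had a complete window.
trailing_rows = trailing.dropna().index.tolist()
centered_rows = centered.dropna().index.tolist()
```

Both variants keep the same number of rows after dropping NA values; they differ only in which rows survive.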