Feature Selection

import mlrun
import os
> 2025-03-06 10:55:11,680 [warning] Failed resolving version info. Ignoring and using defaults
> 2025-03-06 10:55:13,566 [warning] Server or client version is unstable. Assuming compatible: {"client_version":"0.0.0+unstable","server_version":"1.8.0"}
project = mlrun.get_or_create_project("feature-selection", "./")
> 2025-03-06 10:55:14,686 [info] Loading project from path: {"path":"./","project_name":"feature-selection","user_project":false}
> 2025-03-06 10:55:14,726 [warning] Project name mismatch, fhub-v2 != feature-selection, project is loaded from fhub-v2 project yaml. To prevent/allow this, you can take one of the following actions:
1. Set the `allow_cross_project=True` when loading the project.
2. Delete the existing project yaml, or ensure its name is equal to feature-selection.
3. Use different project context dir.
Project name='feature-selection' is different than specified on the context's project yaml. This behavior is deprecated and will not be supported from version 1.9.0.
> 2025-03-06 10:55:29,474 [info] Project loaded successfully: {"path":"./","project_name":"feature-selection","stored_in_db":true}

Local Test

# Import the function definition from its YAML spec.
feature_selection = mlrun.import_function("function.yaml")

# Run the handler locally on the sample metrics data.
fs = feature_selection.run(
    handler="feature_selection",
    params={"k": 2, "min_votes": 0.3, "label_column": "is_error"},
    inputs={"df_artifact": os.path.abspath("data/metrics.pq")},
    artifact_path=os.path.join(os.path.abspath("./"), "artifacts"),
    local=True,
)
> 2025-03-06 10:59:27,279 [info] Storing function: {"db":null,"name":"feature-selection-feature-selection","uid":"fdcbc4e3f5c44769be5e64425f10aed8"}
> 2025-03-06 10:59:30,808 [info] votes needed to be selected: 2
/User/.pythonlibs/mlrun-extended/lib/python3.9/site-packages/mlrun/artifacts/dataset.py:387: RuntimeWarning:

Converting input from bool to <class 'numpy.uint8'> for compatibility.
project:     feature-selection
uid:         fdcbc4e3f5c44769be5e64425f10aed8
iter:        0
start:       Mar 06 10:59:27
end:         NaT
state:       completed
kind:        run
name:        feature-selection-feature-selection
labels:      v3io_user=iguazio
             kind=local
             owner=iguazio
             host=jupyter-75c4d4bf58-gkz6c
inputs:      df_artifact
parameters:  k=2
             min_votes=0.3
             label_column=is_error
results:     (empty)
artifact_uris:
  f_classif=store://artifacts/feature-selection/feature-selection-feature-selection_f_classif#0@fdcbc4e3f5c44769be5e64425f10aed8^eaf3bb0877b7a8502365fc3a919a6ba295febdd9
  mutual_info_classif=store://artifacts/feature-selection/feature-selection-feature-selection_mutual_info_classif#0@fdcbc4e3f5c44769be5e64425f10aed8^8d572b32e4d65007b048aa0b51899c0ebcb45219
  chi2=store://artifacts/feature-selection/feature-selection-feature-selection_chi2#0@fdcbc4e3f5c44769be5e64425f10aed8^5775ae9bbecb726fbe4d229548b82d920a43526e
  f_regression=store://artifacts/feature-selection/feature-selection-feature-selection_f_regression#0@fdcbc4e3f5c44769be5e64425f10aed8^90cd856fcf72b5daac1822b537dbef1ea504f6ef
  LinearSVC=store://artifacts/feature-selection/feature-selection-feature-selection_LinearSVC#0@fdcbc4e3f5c44769be5e64425f10aed8^9723935a28b9ea141e7aaf6f242109ce9383eae0
  LogisticRegression=store://artifacts/feature-selection/feature-selection-feature-selection_LogisticRegression#0@fdcbc4e3f5c44769be5e64425f10aed8^852483e49d4372edd493a130f7fd4aa85666af36
  ExtraTreesClassifier=store://artifacts/feature-selection/feature-selection-feature-selection_ExtraTreesClassifier#0@fdcbc4e3f5c44769be5e64425f10aed8^0f32d3f338d4a7692859d06443c5e8499773a651
  feature_scores=store://datasets/feature-selection/feature-selection-feature-selection_feature_scores#0@fdcbc4e3f5c44769be5e64425f10aed8^9abe27f14a189c6ff34456a29403fabdb54113f2
  max_scaled_scores_feature_scores=store://datasets/feature-selection/feature-selection-feature-selection_max_scaled_scores_feature_scores#0@fdcbc4e3f5c44769be5e64425f10aed8^1910b21802b1d6a0aa412fca4f93cecbc8c755b4
  selected_features_count=store://datasets/feature-selection/feature-selection-feature-selection_selected_features_count#0@fdcbc4e3f5c44769be5e64425f10aed8^e06baafaa333c72865103b3f3645da60367f9b85
  selected_features=store://datasets/feature-selection/feature-selection-feature-selection_selected_features#0@fdcbc4e3f5c44769be5e64425f10aed8^7af89e5113ae0dce4c8b90f4a52cf57500d795b5

> to track results use the .show() or .logs() methods
> 2025-03-06 10:59:31,131 [info] Run execution finished: {"name":"feature-selection-feature-selection","status":"completed"}
mlrun.get_dataitem(fs.spec.inputs['df_artifact']).as_df()
                                                                      cpu_utilization    latency  packet_loss  throughput  is_error
timestamp                company      data_center      device
2021-04-27 14:46:46.780  Smith_Group  Denise_Crest     5124209057231        75.598891   0.000000     0.000000  252.445971     False
                                                       2891755865712        50.090373   3.280849     0.000000  229.889187     False
                                      Debra_Gateway    0388020295311        73.243063   9.372341     2.170138  260.883807     False
                                                       9633813691441        60.830420  12.241878     2.295717  244.238613     False
                         Ferrell_Ltd  Murphy_Meadow    1517129765931        72.647964   0.535463     0.000000  212.944943     False
...                      ...          ...              ...                        ...        ...          ...         ...       ...
2021-04-27 15:46:46.780  Smith_Group  Debra_Gateway    9633813691441        77.875954   3.250584     0.000000  245.150281     False
                         Ferrell_Ltd  Murphy_Meadow    1517129765931        77.831459   0.000000     0.000000  235.109321     False
                                                       6964486699383        55.978514   2.977447     0.533963  277.622402     False
                                      Nicholas_Estate  8002897098167        58.265446   4.090207     2.048268  272.717982     False
                                                       8499880735104        71.245041   0.000000     2.929407  235.659211     False

5768 rows × 5 columns

mlrun.get_dataitem(fs.outputs['feature_scores']).as_df()
                    f_classif  mutual_info_classif           chi2  f_regression  LinearSVC  LogisticRegression  ExtraTreesClassifier
cpu_utilization   2520.015809             0.180451    4457.429360   2520.015809  -0.037533           -0.178632              0.023102
latency          10152.151995             0.199853  272872.890194  10152.151995   0.023732            0.104103              0.023102
packet_loss      14120.490547             0.210081  157191.427524  14120.490547   0.047873            0.214212              0.023102
throughput       20421.721030             0.231438  109129.511665  20421.721030  -0.014421           -0.084308              0.023102
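
To make the role of k and min_votes more concrete, here is a minimal sketch of one way a top-k voting scheme could turn the feature_scores table into a list of selected features. It is not the hub function's actual implementation. The helper names (top_k, vote) are hypothetical, and three points are assumptions: each selector votes for its k highest raw scores, a fractional min_votes is read as a share of the number of selectors and truncated to an integer, and a feature is kept when its vote count is greater than or equal to that threshold.

```python
# Illustrative sketch only: NOT the hub function's implementation.
# Scores are copied from the feature_scores table, in FEATURES order.
FEATURES = ["cpu_utilization", "latency", "packet_loss", "throughput"]
SCORES = {
    "f_classif": [2520.015809, 10152.151995, 14120.490547, 20421.721030],
    "mutual_info_classif": [0.180451, 0.199853, 0.210081, 0.231438],
    "chi2": [4457.429360, 272872.890194, 157191.427524, 109129.511665],
    "f_regression": [2520.015809, 10152.151995, 14120.490547, 20421.721030],
    "LinearSVC": [-0.037533, 0.023732, 0.047873, -0.014421],
    "LogisticRegression": [-0.178632, 0.104103, 0.214212, -0.084308],
    "ExtraTreesClassifier": [0.023102, 0.023102, 0.023102, 0.023102],
}


def top_k(values, k):
    """Return the names of the k highest-scoring features.

    sorted() is stable, so tied scores keep their FEATURES order.
    """
    ranked = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    return [FEATURES[i] for i in ranked[:k]]


def vote(scores, k, min_votes):
    """Count one vote per selector for each of its top-k features."""
    # Assumption: min_votes < 1 is a fraction of the selectors, truncated.
    if min_votes < 1:
        votes_needed = int(min_votes * len(scores))
    else:
        votes_needed = int(min_votes)

    votes = {feature: 0 for feature in FEATURES}
    for values in scores.values():
        for feature in top_k(values, k):
            votes[feature] += 1

    # Assumption: a feature is kept when it reaches the threshold (>=).
    selected = [f for f in FEATURES if votes[f] >= votes_needed]
    return votes_needed, votes, selected


votes_needed, votes, selected = vote(SCORES, k=2, min_votes=0.3)
print("votes needed:", votes_needed)
print("votes:", votes)
print("selected:", selected)
```

With seven selectors and min_votes=0.3, this sketch computes a threshold of 2, which agrees with the "votes needed to be selected: 2" log line from the run. Under the stated assumptions it keeps latency, packet_loss and throughput. The function itself may rank scores differently (for example by absolute coefficient value), so the selected_features artifact from the run remains the authoritative result.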