Feature Selection
import mlrun
import os
> 2025-03-06 10:55:11,680 [warning] Failed resolving version info. Ignoring and using defaults
> 2025-03-06 10:55:13,566 [warning] Server or client version is unstable. Assuming compatible: {"client_version":"0.0.0+unstable","server_version":"1.8.0"}
project = mlrun.get_or_create_project("feature-selection", "./")
> 2025-03-06 10:55:14,686 [info] Loading project from path: {"path":"./","project_name":"feature-selection","user_project":false}
> 2025-03-06 10:55:14,726 [warning] Project name mismatch, fhub-v2 != feature-selection, project is loaded from fhub-v2 project yaml. To prevent/allow this, you can take one of the following actions:
1. Set the `allow_cross_project=True` when loading the project.
2. Delete the existing project yaml, or ensure its name is equal to feature-selection.
3. Use different project context dir.
Project name='feature-selection' is different than specified on the context's project yaml. This behavior is deprecated and will not be supported from version 1.9.0.
> 2025-03-06 10:55:29,474 [info] Project loaded successfully: {"path":"./","project_name":"feature-selection","stored_in_db":true}
Local Test
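# Load the function definition from the local function.yaml
feature_selection = mlrun.import_function("function.yaml")
# Run the handler in the local process on the sample metrics data.
# k and min_votes control the voting; label_column names the target column.
fs = feature_selection.run(
    handler="feature_selection",
    params={"k": 2, "min_votes": 0.3, "label_column": "is_error"},
    inputs={"df_artifact": os.path.abspath("data/metrics.pq")},
    artifact_path=os.path.join(os.path.abspath("./"), "artifacts"),
    local=True,
)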
> 2025-03-06 10:59:27,279 [info] Storing function: {"db":null,"name":"feature-selection-feature-selection","uid":"fdcbc4e3f5c44769be5e64425f10aed8"}
> 2025-03-06 10:59:30,808 [info] votes needed to be selected: 2
/User/.pythonlibs/mlrun-extended/lib/python3.9/site-packages/mlrun/artifacts/dataset.py:387: RuntimeWarning:
Converting input from bool to <class 'numpy.uint8'> for compatibility.
project | uid | iter | start | end | state | kind | name | labels | inputs | parameters | results | artifact_uris |
---|---|---|---|---|---|---|---|---|---|---|---|---|
feature-selection | fdcbc4e3f5c44769be5e64425f10aed8 | 0 | Mar 06 10:59:27 | NaT | completed | run | feature-selection-feature-selection | v3io_user=iguazio kind=local owner=iguazio host=jupyter-75c4d4bf58-gkz6c | df_artifact | k=2 min_votes=0.3 label_column=is_error | | f_classif=store://artifacts/feature-selection/feature-selection-feature-selection_f_classif#0@fdcbc4e3f5c44769be5e64425f10aed8^eaf3bb0877b7a8502365fc3a919a6ba295febdd9 mutual_info_classif=store://artifacts/feature-selection/feature-selection-feature-selection_mutual_info_classif#0@fdcbc4e3f5c44769be5e64425f10aed8^8d572b32e4d65007b048aa0b51899c0ebcb45219 chi2=store://artifacts/feature-selection/feature-selection-feature-selection_chi2#0@fdcbc4e3f5c44769be5e64425f10aed8^5775ae9bbecb726fbe4d229548b82d920a43526e f_regression=store://artifacts/feature-selection/feature-selection-feature-selection_f_regression#0@fdcbc4e3f5c44769be5e64425f10aed8^90cd856fcf72b5daac1822b537dbef1ea504f6ef LinearSVC=store://artifacts/feature-selection/feature-selection-feature-selection_LinearSVC#0@fdcbc4e3f5c44769be5e64425f10aed8^9723935a28b9ea141e7aaf6f242109ce9383eae0 LogisticRegression=store://artifacts/feature-selection/feature-selection-feature-selection_LogisticRegression#0@fdcbc4e3f5c44769be5e64425f10aed8^852483e49d4372edd493a130f7fd4aa85666af36 ExtraTreesClassifier=store://artifacts/feature-selection/feature-selection-feature-selection_ExtraTreesClassifier#0@fdcbc4e3f5c44769be5e64425f10aed8^0f32d3f338d4a7692859d06443c5e8499773a651 feature_scores=store://datasets/feature-selection/feature-selection-feature-selection_feature_scores#0@fdcbc4e3f5c44769be5e64425f10aed8^9abe27f14a189c6ff34456a29403fabdb54113f2 max_scaled_scores_feature_scores=store://datasets/feature-selection/feature-selection-feature-selection_max_scaled_scores_feature_scores#0@fdcbc4e3f5c44769be5e64425f10aed8^1910b21802b1d6a0aa412fca4f93cecbc8c755b4 selected_features_count=store://datasets/feature-selection/feature-selection-feature-selection_selected_features_count#0@fdcbc4e3f5c44769be5e64425f10aed8^e06baafaa333c72865103b3f3645da60367f9b85 selected_features=store://datasets/feature-selection/feature-selection-feature-selection_selected_features#0@fdcbc4e3f5c44769be5e64425f10aed8^7af89e5113ae0dce4c8b90f4a52cf57500d795b5 |
> to track results use the .show() or .logs() methods
> 2025-03-06 10:59:31,131 [info] Run execution finished: {"name":"feature-selection-feature-selection","status":"completed"}
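The log line `votes needed to be selected: 2` is consistent with a fractional `min_votes` being scaled by the number of scoring methods. The run logs seven score artifacts (four statistical filters and three models), and 0.3 × 7 = 2.1, which rounds down to 2. The snippet below is a minimal sketch of that arithmetic, not the hub function's code: the helper name `votes_needed`, the int-versus-float convention, and the floor rounding are assumptions chosen to reproduce the logged value.

```python
import math


def votes_needed(min_votes, n_methods):
    """Translate min_votes into an absolute vote count (hypothetical helper).

    Assumption: an int is taken as an absolute number of votes, while a
    float is a fraction of the methods, clamped to [0, 1] and floored.
    """
    if isinstance(min_votes, int):
        return min_votes
    fraction = max(0.0, min(float(min_votes), 1.0))
    return math.floor(n_methods * fraction)


# The seven scoring methods that appear as artifacts in the run output.
methods = [
    "f_classif",
    "mutual_info_classif",
    "chi2",
    "f_regression",
    "LinearSVC",
    "LogisticRegression",
    "ExtraTreesClassifier",
]

print(votes_needed(0.3, len(methods)))
```

With `min_votes=0.3` and seven methods this prints 2, matching the log; passing an integer such as `min_votes=3` would, under the same assumption, be used as the threshold directly.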
mlrun.get_dataitem(fs.spec.inputs['df_artifact']).as_df()
| timestamp | company | data_center | device | cpu_utilization | latency | packet_loss | throughput | is_error |
|---|---|---|---|---|---|---|---|---|
| 2021-04-27 14:46:46.780 | Smith_Group | Denise_Crest | 5124209057231 | 75.598891 | 0.000000 | 0.000000 | 252.445971 | False |
| | | | 2891755865712 | 50.090373 | 3.280849 | 0.000000 | 229.889187 | False |
| | | Debra_Gateway | 0388020295311 | 73.243063 | 9.372341 | 2.170138 | 260.883807 | False |
| | | | 9633813691441 | 60.830420 | 12.241878 | 2.295717 | 244.238613 | False |
| | Ferrell_Ltd | Murphy_Meadow | 1517129765931 | 72.647964 | 0.535463 | 0.000000 | 212.944943 | False |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2021-04-27 15:46:46.780 | Smith_Group | Debra_Gateway | 9633813691441 | 77.875954 | 3.250584 | 0.000000 | 245.150281 | False |
| | Ferrell_Ltd | Murphy_Meadow | 1517129765931 | 77.831459 | 0.000000 | 0.000000 | 235.109321 | False |
| | | | 6964486699383 | 55.978514 | 2.977447 | 0.533963 | 277.622402 | False |
| | | Nicholas_Estate | 8002897098167 | 58.265446 | 4.090207 | 2.048268 | 272.717982 | False |
| | | | 8499880735104 | 71.245041 | 0.000000 | 2.929407 | 235.659211 | False |
5768 rows × 5 columns
mlrun.get_dataitem(fs.outputs['feature_scores']).as_df()
| | f_classif | mutual_info_classif | chi2 | f_regression | LinearSVC | LogisticRegression | ExtraTreesClassifier |
|---|---|---|---|---|---|---|---|
| cpu_utilization | 2520.015809 | 0.180451 | 4457.429360 | 2520.015809 | -0.037533 | -0.178632 | 0.023102 |
| latency | 10152.151995 | 0.199853 | 272872.890194 | 10152.151995 | 0.023732 | 0.104103 | 0.023102 |
| packet_loss | 14120.490547 | 0.210081 | 157191.427524 | 14120.490547 | 0.047873 | 0.214212 | 0.023102 |
| throughput | 20421.721030 | 0.231438 | 109129.511665 | 20421.721030 | -0.014421 | -0.084308 | 0.023102 |
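To see how scores like these could turn into a feature list, here is a minimal pure-Python sketch of top-k voting using the values from the `feature_scores` output. It assumes each method votes for its `k` highest-scoring features (ranked on raw scores, not absolute values) and that a feature is kept once it collects at least the 2 votes reported in the run log. The names `top_k` and `count_votes` are illustrative, and the hub function may rank or break ties differently.

```python
from collections import Counter

FEATURES = ["cpu_utilization", "latency", "packet_loss", "throughput"]

# Scores copied from the feature_scores output, one list per method,
# ordered like FEATURES.
SCORES = {
    "f_classif": [2520.015809, 10152.151995, 14120.490547, 20421.721030],
    "mutual_info_classif": [0.180451, 0.199853, 0.210081, 0.231438],
    "chi2": [4457.429360, 272872.890194, 157191.427524, 109129.511665],
    "f_regression": [2520.015809, 10152.151995, 14120.490547, 20421.721030],
    "LinearSVC": [-0.037533, 0.023732, 0.047873, -0.014421],
    "LogisticRegression": [-0.178632, 0.104103, 0.214212, -0.084308],
    "ExtraTreesClassifier": [0.023102, 0.023102, 0.023102, 0.023102],
}


def top_k(scores, k):
    """Return the names of the k highest-scoring features (assumed ranking rule)."""
    ranked = sorted(zip(FEATURES, scores), key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in ranked[:k]]


def count_votes(score_table, k):
    """Count how many methods place each feature in their top k."""
    votes = Counter({name: 0 for name in FEATURES})
    for scores in score_table.values():
        votes.update(top_k(scores, k))
    return votes


votes = count_votes(SCORES, k=2)
# Keep features that reach the threshold reported in the run log (2 votes).
selected = [name for name in FEATURES if votes[name] >= 2]

print(dict(votes))
print(selected)
```

Under these assumptions `cpu_utilization` can collect at most one vote, from the tied `ExtraTreesClassifier` column, so it falls below the threshold, while `latency`, `packet_loss`, and `throughput` pass it. The run's actual `selected_features` artifact is not shown here, so this result has not been compared against it.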