Feature Selection

import mlrun
import os

> 2025-03-06 10:55:11,680 [warning] Failed resolving version info. Ignoring and using defaults
> 2025-03-06 10:55:13,566 [warning] Server or client version is unstable. Assuming compatible: {"client_version":"0.0.0+unstable","server_version":"1.8.0"}

project = mlrun.get_or_create_project("feature-selection", "./")

> 2025-03-06 10:55:14,686 [info] Loading project from path: {"path":"./","project_name":"feature-selection","user_project":false}
> 2025-03-06 10:55:14,726 [warning] Project name mismatch, fhub-v2 != feature-selection, project is loaded from fhub-v2 project yaml. To prevent/allow this, you can take one of the following actions:
1. Set the `allow_cross_project=True` when loading the project.
2. Delete the existing project yaml, or ensure its name is equal to feature-selection.
3. Use different project context dir.

Project name='feature-selection' is different than specified on the context's project yaml. This behavior is deprecated and will not be supported from version 1.9.0.

> 2025-03-06 10:55:29,474 [info] Project loaded successfully: {"path":"./","project_name":"feature-selection","stored_in_db":true}

Local Test

feature_selection = mlrun.import_function("function.yaml")

fs = feature_selection.run(
    handler="feature_selection",
    params={"k": 2, "min_votes": 0.3, "label_column": "is_error"},
    inputs={"df_artifact": os.path.abspath("data/metrics.pq")},
    artifact_path=os.path.join(os.path.abspath("./"), "artifacts"),
    local=True,
)

> 2025-03-06 10:59:27,279 [info] Storing function: {"db":null,"name":"feature-selection-feature-selection","uid":"fdcbc4e3f5c44769be5e64425f10aed8"}
> 2025-03-06 10:59:30,808 [info] votes needed to be selected: 2

/User/.pythonlibs/mlrun-extended/lib/python3.9/site-packages/mlrun/artifacts/dataset.py:387: RuntimeWarning:

Converting input from bool to <class 'numpy.uint8'> for compatibility.

project	uid	iter	start	end	state	kind	name	labels	inputs	parameters	results	artifact_uris
feature-selection	...5f10aed8	0	Mar 06 10:59:27	NaT	completed	run	feature-selection-feature-selection	v3io_user=iguazio kind=local owner=iguazio host=jupyter-75c4d4bf58-gkz6c	df_artifact	k=2 min_votes=0.3 label_column=is_error		f_classif=store://artifacts/feature-selection/feature-selection-feature-selection_f_classif#0@fdcbc4e3f5c44769be5e64425f10aed8^eaf3bb0877b7a8502365fc3a919a6ba295febdd9 mutual_info_classif=store://artifacts/feature-selection/feature-selection-feature-selection_mutual_info_classif#0@fdcbc4e3f5c44769be5e64425f10aed8^8d572b32e4d65007b048aa0b51899c0ebcb45219 chi2=store://artifacts/feature-selection/feature-selection-feature-selection_chi2#0@fdcbc4e3f5c44769be5e64425f10aed8^5775ae9bbecb726fbe4d229548b82d920a43526e f_regression=store://artifacts/feature-selection/feature-selection-feature-selection_f_regression#0@fdcbc4e3f5c44769be5e64425f10aed8^90cd856fcf72b5daac1822b537dbef1ea504f6ef LinearSVC=store://artifacts/feature-selection/feature-selection-feature-selection_LinearSVC#0@fdcbc4e3f5c44769be5e64425f10aed8^9723935a28b9ea141e7aaf6f242109ce9383eae0 LogisticRegression=store://artifacts/feature-selection/feature-selection-feature-selection_LogisticRegression#0@fdcbc4e3f5c44769be5e64425f10aed8^852483e49d4372edd493a130f7fd4aa85666af36 ExtraTreesClassifier=store://artifacts/feature-selection/feature-selection-feature-selection_ExtraTreesClassifier#0@fdcbc4e3f5c44769be5e64425f10aed8^0f32d3f338d4a7692859d06443c5e8499773a651 feature_scores=store://datasets/feature-selection/feature-selection-feature-selection_feature_scores#0@fdcbc4e3f5c44769be5e64425f10aed8^9abe27f14a189c6ff34456a29403fabdb54113f2 max_scaled_scores_feature_scores=store://datasets/feature-selection/feature-selection-feature-selection_max_scaled_scores_feature_scores#0@fdcbc4e3f5c44769be5e64425f10aed8^1910b21802b1d6a0aa412fca4f93cecbc8c755b4 
selected_features_count=store://datasets/feature-selection/feature-selection-feature-selection_selected_features_count#0@fdcbc4e3f5c44769be5e64425f10aed8^e06baafaa333c72865103b3f3645da60367f9b85 selected_features=store://datasets/feature-selection/feature-selection-feature-selection_selected_features#0@fdcbc4e3f5c44769be5e64425f10aed8^7af89e5113ae0dce4c8b90f4a52cf57500d795b5

> to track results use the .show() or .logs() methods, or open the run in the UI

> 2025-03-06 10:59:31,131 [info] Run execution finished: {"name":"feature-selection-feature-selection","status":"completed"}
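
The run log above reports `votes needed to be selected: 2` for `min_votes=0.3`. One reading that reproduces this number is that a fractional `min_votes` is treated as a share of the scoring methods and rounded down; the run produced scores for seven methods, and 0.3 × 7 = 2.1. The sketch below illustrates that arithmetic only. The helper name `votes_needed` and the handling of values of 1 or more are assumptions, not taken from the function's source.

```python
import math


def votes_needed(min_votes, n_methods):
    """Hypothetical helper: turn min_votes into an absolute vote count.

    Assumed rule: a value below 1 is a fraction of the scoring methods,
    rounded down; a value of 1 or more is an absolute count, capped at
    the number of methods.
    """
    if min_votes < 1:
        return math.floor(n_methods * max(min_votes, 0))
    return int(min(min_votes, n_methods))


# Seven scoring methods were run; min_votes=0.3 was passed in params.
threshold = votes_needed(0.3, 7)
print(threshold)
```

Under this reading, a feature must be picked by at least two of the seven methods to be kept.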

mlrun.get_dataitem(fs.spec.inputs['df_artifact']).as_df()

				cpu_utilization	latency	packet_loss	throughput	is_error
timestamp	company	data_center	device
2021-04-27 14:46:46.780	Smith_Group	Denise_Crest	5124209057231	75.598891	0.000000	0.000000	252.445971	False
		Denise_Crest	2891755865712	50.090373	3.280849	0.000000	229.889187	False
		Debra_Gateway	0388020295311	73.243063	9.372341	2.170138	260.883807	False
		Debra_Gateway	9633813691441	60.830420	12.241878	2.295717	244.238613	False
	Ferrell_Ltd	Murphy_Meadow	1517129765931	72.647964	0.535463	0.000000	212.944943	False
...	...	...	...	...	...	...	...	...
2021-04-27 15:46:46.780	Smith_Group	Debra_Gateway	9633813691441	77.875954	3.250584	0.000000	245.150281	False
	Ferrell_Ltd	Murphy_Meadow	1517129765931	77.831459	0.000000	0.000000	235.109321	False
		Murphy_Meadow	6964486699383	55.978514	2.977447	0.533963	277.622402	False
		Nicholas_Estate	8002897098167	58.265446	4.090207	2.048268	272.717982	False
		Nicholas_Estate	8499880735104	71.245041	0.000000	2.929407	235.659211	False

5768 rows × 5 columns

mlrun.get_dataitem(fs.outputs['feature_scores']).as_df()

	f_classif	mutual_info_classif	chi2	f_regression	LinearSVC	LogisticRegression	ExtraTreesClassifier
cpu_utilization	2520.015809	0.180451	4457.429360	2520.015809	-0.037533	-0.178632	0.023102
latency	10152.151995	0.199853	272872.890194	10152.151995	0.023732	0.104103	0.023102
packet_loss	14120.490547	0.210081	157191.427524	14120.490547	0.047873	0.214212	0.023102
throughput	20421.721030	0.231438	109129.511665	20421.721030	-0.014421	-0.084308	0.023102
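
To show how a score table like this can feed a vote, here is a minimal sketch that lets each method vote for its `k` highest-scoring features and keeps the features that reach a vote threshold. It is an illustration under stated assumptions, not the function's implementation. It uses only the four statistical filter columns, where a larger score means a stronger feature, and leaves out the model-based columns because their values are signed coefficients or, for `ExtraTreesClassifier` in this run, identical across features. The helper `top_k_votes` and the threshold of 2 (taken from the run log) are illustrative choices.

```python
# Scores copied from the feature_scores table above (filter methods only).
scores = {
    "f_classif": {
        "cpu_utilization": 2520.015809, "latency": 10152.151995,
        "packet_loss": 14120.490547, "throughput": 20421.721030,
    },
    "mutual_info_classif": {
        "cpu_utilization": 0.180451, "latency": 0.199853,
        "packet_loss": 0.210081, "throughput": 0.231438,
    },
    "chi2": {
        "cpu_utilization": 4457.429360, "latency": 272872.890194,
        "packet_loss": 157191.427524, "throughput": 109129.511665,
    },
    "f_regression": {
        "cpu_utilization": 2520.015809, "latency": 10152.151995,
        "packet_loss": 14120.490547, "throughput": 20421.721030,
    },
}


def top_k_votes(scores, k):
    """Hypothetical helper: count how many methods rank each feature in their top k."""
    votes = {}
    for per_feature in scores.values():
        for feature in per_feature:
            votes.setdefault(feature, 0)
        # Rank features by score, highest first, and vote for the first k.
        ranked = sorted(per_feature, key=per_feature.get, reverse=True)
        for feature in ranked[:k]:
            votes[feature] += 1
    return votes


votes = top_k_votes(scores, k=2)
# Keep features that reach the illustrative threshold of 2 votes.
selected = [feature for feature, count in votes.items() if count >= 2]
print(votes)
print(selected)
```

This result covers a subset of the methods and need not match the run's own selection; the `selected_features` artifact listed in the run output holds what the function actually chose.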
