While Recursive Feature Elimination (RFE) starts with all features and prunes the least important ones, Sequential Feature Selection (SFS) offers alternative greedy search strategies. Instead of relying on model coefficients or feature importances as RFE does, SFS directly evaluates model performance (using a chosen scoring metric) on different subsets of features. It iteratively builds (forward selection) or shrinks (backward selection) the feature set.

SFS methods belong to the wrapper category because they wrap the feature selection process around a specific machine learning model, using its performance as the objective function to guide the search. This makes them computationally more intensive than filter methods but potentially more attuned to the chosen model's needs.

## Forward Selection

Sequential Forward Selection (which the literature also abbreviates as SFS) starts with an empty set of features. In each iteration, it evaluates adding each feature not currently in the selected set. The feature whose addition yields the largest performance improvement (according to the chosen scoring metric, typically evaluated via cross-validation) is added to the set. This process continues until a predefined number of features is selected, or until adding any remaining feature does not yield a significant performance improvement.

**Algorithm Steps (Forward Selection):**

1. Start with an empty feature set $S = \emptyset$.
2. Specify the target number of features, $k$.
3. While $|S| < k$:
   - For each feature $f$ not in $S$, evaluate the performance of the chosen estimator using features $S \cup \{f\}$.
   - Select the feature $f^*$ that results in the best performance.
   - Update the selected set: $S = S \cup \{f^*\}$.
4. Return the final feature set $S$.

Imagine building a toolkit.
You start with nothing and add one tool at a time, always picking the tool that helps you perform the task best with the tools you already have.

## Backward Selection

Sequential Backward Selection (SBS), sometimes called Sequential Backward Elimination, operates in the opposite direction. It starts with the full set of available features. In each iteration, it evaluates removing each feature currently in the set. The feature whose removal causes the smallest decrease (or largest increase) in model performance is removed. This continues until the desired number of features is reached.

**Algorithm Steps (Backward Selection):**

1. Start with the full feature set $S = F$ (where $F$ is the set of all features).
2. Specify the target number of features, $k$.
3. While $|S| > k$:
   - For each feature $f$ currently in $S$, evaluate the performance of the chosen estimator using features $S \setminus \{f\}$.
   - Select the feature $f^*$ whose removal results in the best performance (or least performance degradation).
   - Update the selected set: $S = S \setminus \{f^*\}$.
4. Return the final feature set $S$.

This is like starting with a cluttered toolbox and removing one tool at a time, discarding the one you miss the least, until you have a streamlined, effective set.

## Implementation with Scikit-learn

Scikit-learn provides the `SequentialFeatureSelector` class for performing both forward and backward selection.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline

# Generate synthetic classification data
X, y = make_classification(n_samples=200, n_features=15, n_informative=5,
                           n_redundant=5, n_repeated=0, n_classes=2,
                           n_clusters_per_class=2, random_state=42)
X = pd.DataFrame(X, columns=[f'feature_{i+1}' for i in range(15)])

# Split data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# Create a pipeline with scaling and logistic regression.
# Many estimators work better with scaled data. Passing the whole pipeline
# to the selector means the scaler is refit inside each cross-validation
# fold, so the validation folds do not influence the scaling.
pipe = Pipeline([
    ('scaler', StandardScaler()),
    ('model', LogisticRegression(solver='liblinear', random_state=42))
])

# --- Forward Selection ---
print("Performing Forward Selection...")
sfs_forward = SequentialFeatureSelector(
    estimator=pipe,            # Scaling + model, evaluated together
    n_features_to_select=5,    # Target number of features
    direction='forward',       # Specify forward selection
    scoring='accuracy',        # Performance metric
    cv=5,                      # Cross-validation folds
    n_jobs=-1                  # Use all available CPU cores
)
sfs_forward.fit(X_train, y_train)

# Get the boolean mask of selected features and the matching column names
selected_features_mask_fwd = sfs_forward.get_support()
selected_feature_names_fwd = X.columns[selected_features_mask_fwd]
print(f"Selected features (Forward): {selected_feature_names_fwd.tolist()}")
print(f"Number of features selected: {sfs_forward.n_features_to_select_}")

# --- Backward Selection ---
print("\nPerforming Backward Selection...")
sfs_backward = SequentialFeatureSelector(
    estimator=pipe,
    n_features_to_select=5,    # Target number of features
    direction='backward',      # Specify backward selection
    scoring='accuracy',
    cv=5,
    n_jobs=-1
)
sfs_backward.fit(X_train, y_train)

# Get the boolean mask of selected features and the matching column names
selected_features_mask_bwd = sfs_backward.get_support()
selected_feature_names_bwd = X.columns[selected_features_mask_bwd]
print(f"Selected features (Backward): {selected_feature_names_bwd.tolist()}")
print(f"Number of features selected: {sfs_backward.n_features_to_select_}")

# You can transform the data to keep only the selected features
# X_train_sfs_fwd = sfs_forward.transform(X_train)
# X_test_sfs_fwd = sfs_forward.transform(X_test)
```

**Important parameters for `SequentialFeatureSelector`:**

- `estimator`: The machine learning model used to evaluate feature subsets.
- `n_features_to_select`: The target number of features. Can be an integer, a float between 0 and 1 (a fraction of the features), or `'auto'`. With `'auto'`, the behavior depends on `tol`: if `tol` is set, features are added or removed until the score change no longer exceeds `tol`; otherwise half of the features are selected. Using `'auto'` with `tol` can be computationally expensive because the number of iterations is not fixed in advance.
- `direction`: `'forward'` or `'backward'`.
- `scoring`: The metric used to evaluate performance (e.g., `'accuracy'`, `'roc_auc'`, `'r2'`, `'neg_mean_squared_error'`). Must be a valid Scikit-learn scoring string or a callable scorer.
- `cv`: Number of cross-validation folds or a CV splitter strategy. Essential for performance evaluation.
- `n_jobs`: Number of CPU cores to use for parallel execution during cross-validation. `-1` uses all available cores.

Note that forward and backward selection do not necessarily yield the same set of features, as they explore the feature space differently.

## Advantages and Disadvantages

**Advantages:**

- **Model-specific:** Considers the interaction between features through the lens of the chosen model, potentially finding better subsets for that specific model than filter methods.
- **Simple:** The greedy approach is relatively easy to understand.
- **Potentially better than RFE:** Unlike RFE, which often relies on coefficients or importances that might change as features are removed, SFS re-evaluates subsets based on actual model performance at each step.

**Disadvantages:**

- **Computationally expensive:** Each iteration trains the estimator once per candidate feature, multiplied by the number of CV folds. Backward selection can be particularly slow if starting with many features.
- **Greedy nature:** Makes locally optimal choices at each step, which doesn't guarantee finding the globally best feature subset.
  A feature added early (forward) can never be dropped later, and a feature removed early (backward) can never be re-added, so a better combination may be missed.
- **Sensitivity:** The results depend significantly on the chosen estimator, scoring metric, and CV strategy.

## When to Use SFS

SFS is a valuable wrapper method when:

- You suspect feature interactions are important for your chosen model.
- Computational cost is acceptable (e.g., a moderate number of features).
- You want to directly optimize feature selection based on a specific performance metric.
- You need an alternative perspective compared to RFE or embedded methods.

Like other wrapper methods, SFS requires careful consideration of the trade-off between the potential for improved model performance and the computational resources required. It's often a good idea to compare its results with those from filter and embedded methods.
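
To make the cost side of that trade-off concrete, the sketch below counts how many estimator fits a greedy search needs. It assumes every candidate subset is scored with `cv`-fold cross-validation and that the search runs until exactly `k` features remain (no early stopping via `tol`). The helper name `count_sfs_fits` is hypothetical and not part of Scikit-learn; the numbers reuse the 15-feature, 5-fold setup from the example in this section.

```python
def count_sfs_fits(n_features: int, k: int, cv: int,
                   direction: str = "forward") -> int:
    """Count estimator fits for a greedy sequential search (illustrative helper)."""
    if not 0 < k < n_features:
        raise ValueError("k must satisfy 0 < k < n_features")
    if direction == "forward":
        # Iteration i (0-based) scores each of the n_features - i unselected features.
        candidates = sum(n_features - i for i in range(k))
    elif direction == "backward":
        # Iteration i scores the removal of each of the n_features - i remaining features.
        candidates = sum(n_features - i for i in range(n_features - k))
    else:
        raise ValueError("direction must be 'forward' or 'backward'")
    # Every candidate subset is fit once per cross-validation fold.
    return candidates * cv


forward_fits = count_sfs_fits(15, 5, 5, "forward")
backward_fits = count_sfs_fits(15, 5, 5, "backward")
print(forward_fits, backward_fits)  # 325 525
```

Under these assumptions, backward selection needs more fits here because it must remove 10 features while forward selection only adds 5, and each backward fit also trains on more features. When `k` exceeds half of the features, the comparison reverses and backward selection needs fewer fits.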