The xplainfi package provides feature importance methods for machine learning models. It implements several approaches for measuring how much each feature contributes to model predictions, with a focus on model-agnostic methods that work with any learner.

Core Concepts

Feature importance methods in xplainfi answer different but related questions:

  • How much does each feature contribute to model performance? (Permutation Feature Importance)
  • What happens when we remove features and retrain? (Leave-One-Covariate-Out)
  • How much does each feature contribute individually? (Leave-One-Covariate-In)
  • How do features depend on each other? (Conditional and Relative methods)

All methods share a common interface built on mlr3, making them easy to use with any task, learner, measure, and resampling strategy.

Basic Example

Let’s use the Friedman1 task, which provides an ideal setup for demonstrating feature importance methods with known ground truth:

library(xplainfi)
library(mlr3)
library(mlr3learners) # provides regr.ranger

task <- tgen("friedman1")$generate(n = 300)
learner <- lrn("regr.ranger", num.trees = 100)
measure <- msr("regr.mse")
resampling <- rsmp("cv", folds = 3)

The task has 300 observations with 10 features. Features important1 through important5 truly affect the target, while unimportant1 through unimportant5 are pure noise. We’ll use a random forest learner with cross-validation for more stable estimates.

The target function is: \(y = 10 \sin(\pi x_1 x_2) + 20 (x_3 - 0.5)^2 + 10 x_4 + 5 x_5 + \epsilon\)
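
The same data-generating process can be sketched in base R. This assumes the standard Friedman1 setup (features drawn uniformly on [0, 1], standard Gaussian noise), which is what the mlbench-based task generator uses:

```r
# Sketch of the Friedman1 data-generating process in base R,
# assuming uniform features on [0, 1] and Gaussian noise with sd = 1.
set.seed(1)
n <- 300
x <- matrix(runif(n * 10), ncol = 10)
y <- 10 * sin(pi * x[, 1] * x[, 2]) +
  20 * (x[, 3] - 0.5)^2 +
  10 * x[, 4] +
  5 * x[, 5] +
  rnorm(n)
# Columns 6-10 never enter y, so any importance estimated for them is noise.
```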

Permutation Feature Importance (PFI)

PFI is the most straightforward method: for each feature, we permute (shuffle) its values and measure how much model performance deteriorates. More important features cause larger performance drops when shuffled.

pfi <- PFI$new(
  task = task,
  learner = learner,
  measure = measure,
  resampling = resampling
)

pfi_results <- pfi$compute()
pfi_results
#> Key: <feature>
#>          feature   importance         sd
#>           <char>        <num>      <num>
#>  1:   important1  4.858724892 0.68442453
#>  2:   important2  8.155693005 2.26484810
#>  3:   important3  1.109254345 0.69151561
#>  4:   important4 10.784727349 1.29361802
#>  5:   important5  2.395793708 0.87273890
#>  6: unimportant1  0.009618005 0.11138825
#>  7: unimportant2  0.080903445 0.08050202
#>  8: unimportant3  0.044057887 0.04528352
#>  9: unimportant4 -0.082032243 0.10855146
#> 10: unimportant5 -0.137666350 0.08268950

The importance column shows the performance difference when each feature is permuted. Higher values indicate more important features.
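The mechanics behind these numbers can be illustrated without xplainfi. A minimal base-R sketch on made-up data, using only lm() and a held-out MSE (the names and data here are purely illustrative):

```r
# Manual PFI sketch: permute one column, compare held-out MSE.
set.seed(42)
n <- 200
d <- data.frame(x1 = runif(n), x2 = runif(n))
d$y <- 5 * d$x1 + rnorm(n, sd = 0.1)  # only x1 affects y
train <- d[1:100, ]
test  <- d[101:200, ]
fit <- lm(y ~ x1 + x2, data = train)
mse <- function(data) mean((data$y - predict(fit, data))^2)
base_mse <- mse(test)

perm1 <- test
perm1$x1 <- sample(perm1$x1)       # shuffle the important feature
pfi_x1 <- mse(perm1) - base_mse    # large positive value

perm2 <- test
perm2$x2 <- sample(perm2$x2)       # shuffle the noise feature
pfi_x2 <- mse(perm2) - base_mse    # close to zero
```

This is the same difference-based score PFI reports, just computed by hand for a single train/test split instead of over a resampling.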

For more stable estimates, we can use multiple permutation iterations per resampling fold:

pfi_stable <- PFI$new(
  task = task,
  learner = learner,
  measure = measure,
  resampling = resampling,
  iters_perm = 5
)

pfi_stable$compute()
#> Key: <feature>
#>          feature   importance         sd
#>           <char>        <num>      <num>
#>  1:   important1  5.625322621 0.84130375
#>  2:   important2  9.609986341 1.77518863
#>  3:   important3  1.196388744 0.44992082
#>  4:   important4 12.648328883 2.92740759
#>  5:   important5  1.705056896 0.54745713
#>  6: unimportant1 -0.002597636 0.09029340
#>  7: unimportant2  0.108962283 0.17123736
#>  8: unimportant3  0.039131183 0.08291645
#>  9: unimportant4 -0.058408934 0.08647166
#> 10: unimportant5 -0.041202334 0.10787124

Instead of the difference, we can compute importance as the ratio of permuted to original performance:

pfi_stable$compute(relation = "ratio")
#> Key: <feature>
#>          feature importance         sd
#>           <char>      <num>      <num>
#>  1:   important1  1.9484469 0.29293110
#>  2:   important2  2.4425122 0.27894990
#>  3:   important3  1.2212920 0.07502941
#>  4:   important4  2.9619275 0.45741386
#>  5:   important5  1.3790643 0.13580065
#>  6: unimportant1  0.9862282 0.01836363
#>  7: unimportant2  1.0098376 0.01946505
#>  8: unimportant3  1.0227458 0.01668362
#>  9: unimportant4  1.0090616 0.01819052
#> 10: unimportant5  0.9909342 0.01275179
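
Both relations combine the same two quantities, shown here on made-up losses (not taken from the runs above):

```r
loss_orig <- 4.3                    # loss on intact data (illustrative value)
loss_perm <- 8.4                    # loss after permuting the feature
diff_imp  <- loss_perm - loss_orig  # "difference": on the scale of the measure
ratio_imp <- loss_perm / loss_orig  # "ratio": scale-free; 1 means no change
```

The ratio form is convenient when comparing importance across tasks or measures with different scales; note that unimportant features then hover around 1 rather than 0.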

Leave-One-Covariate-Out (LOCO)

LOCO measures importance by retraining the model without each feature and comparing performance to the full model. This shows the contribution of each feature when all other features are present.

loco <- LOCO$new(
  task = task,
  learner = learner,
  measure = measure,
  resampling = resampling
)

loco_results <- loco$compute()
loco_results
#> Key: <feature>
#>          feature importance        sd
#>           <char>      <num>     <num>
#>  1:   important1  3.5341950 0.4799813
#>  2:   important2  5.5076635 0.8946863
#>  3:   important3  0.8231575 0.4514476
#>  4:   important4  7.5628028 1.7412825
#>  5:   important5  0.7647955 0.7375444
#>  6: unimportant1 -0.3884817 0.4774518
#>  7: unimportant2 -0.3159022 0.1183964
#>  8: unimportant3 -0.1991742 0.4288578
#>  9: unimportant4 -0.3039987 0.3220437
#> 10: unimportant5 -0.3435275 0.5206174

LOCO is computationally expensive (requires retraining for each feature) but provides clear interpretation: higher values mean larger performance drop when the feature is removed. Important limitation: LOCO cannot distinguish between direct effects and indirect effects through correlated features.
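
The refit-and-compare logic can likewise be sketched in base R with a linear model (again on made-up data, not the xplainfi implementation):

```r
# Manual LOCO sketch: refit without a feature, compare held-out MSE.
set.seed(7)
n <- 200
d <- data.frame(x1 = runif(n), x2 = runif(n))
d$y <- 5 * d$x1 + rnorm(n, sd = 0.1)  # only x1 affects y
train <- d[1:100, ]
test  <- d[101:200, ]
mse <- function(fit) mean((test$y - predict(fit, test))^2)

full  <- lm(y ~ x1 + x2, data = train)
no_x1 <- lm(y ~ x2, data = train)   # refit without the feature
loco_x1 <- mse(no_x1) - mse(full)   # large: removing x1 hurts a lot
```

Note that each feature requires its own refit, which is where the computational cost comes from.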

Feature Samplers

For advanced methods that account for feature dependencies, xplainfi provides different sampling strategies. While PFI uses simple permutation (marginal sampling), conditional samplers can preserve feature relationships.

Let’s demonstrate conditional sampling using Adversarial Random Forests, which preserves relationships between features when sampling:

arf_sampler <- ARFSampler$new(task)

sample_data <- task$data(rows = 1:5)
sample_data[, .(y, important1, important2)]
#>           y important1  important2
#>       <num>      <num>       <num>
#> 1: 20.59935  0.2875775 0.784575267
#> 2: 10.48474  0.7883051 0.009429905
#> 3: 19.99049  0.4089769 0.779065883
#> 4: 19.70521  0.8830174 0.729390652
#> 5: 21.94251  0.9404673 0.630131853

Now we’ll conditionally sample the important1 feature given the values of important2 and important3:

sampled_conditional <- arf_sampler$sample(
  feature = "important1", 
  data = sample_data,
  conditioning_set = c("important2", "important3")
)

sample_data[, .(y, important1, important2, important3)]
#>           y important1  important2 important3
#>       <num>      <num>       <num>      <num>
#> 1: 20.59935  0.2875775 0.784575267  0.2372297
#> 2: 10.48474  0.7883051 0.009429905  0.6864904
#> 3: 19.99049  0.4089769 0.779065883  0.2258184
#> 4: 19.70521  0.8830174 0.729390652  0.3184946
#> 5: 21.94251  0.9404673 0.630131853  0.1739838
sampled_conditional[, .(y, important1, important2, important3)]
#>           y important1  important2 important3
#>       <num>      <num>       <num>      <num>
#> 1: 20.59935  0.2666614 0.784575267  0.2372297
#> 2: 10.48474  0.2551529 0.009429905  0.6864904
#> 3: 19.99049  0.4023150 0.779065883  0.2258184
#> 4: 19.70521  1.0645308 0.729390652  0.3184946
#> 5: 21.94251  0.6319506 0.630131853  0.1739838

This conditional sampling is essential for methods like CFI and RFI that need to preserve feature dependencies. See vignette("perturbation-importance") for detailed comparisons.
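
Why conditioning matters can be seen in a toy case where the conditional distribution is known in closed form: for standard bivariate Gaussians, \(x_2 \mid x_1 \sim N(\rho x_1, 1 - \rho^2)\). ARF learns such conditionals from data; the sketch below just hard-codes the Gaussian formula to contrast the two sampling strategies:

```r
# Marginal vs. conditional sampling on correlated Gaussian features.
set.seed(1)
n <- 5000
rho <- 0.9
x1 <- rnorm(n)
x2 <- rho * x1 + sqrt(1 - rho^2) * rnorm(n)  # cor(x1, x2) is about rho

marginal    <- sample(x2)                            # permutation: breaks dependence
conditional <- rho * x1 + sqrt(1 - rho^2) * rnorm(n) # resample from x2 | x1

cor(x1, marginal)     # near 0: the x1-x2 relationship is destroyed
cor(x1, conditional)  # near rho: the relationship is preserved
```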

Advanced Features

xplainfi supports many advanced features for robust importance estimation:

  • Multiple resampling strategies: Cross-validation, bootstrap, custom splits
  • Multiple permutation/refit iterations: For more stable estimates
  • Feature grouping: Compute importance for groups of related features
  • Different relation types: Difference vs. ratio scoring
  • Conditional sampling: Account for feature dependencies (see vignette("perturbation-importance"))
  • SAGE methods: Shapley-based approaches (see vignette("sage-methods"))
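
For feature grouping, the underlying idea is that all features in a group are perturbed jointly. xplainfi handles this through its own interface; the base-R sketch below only illustrates the principle, using one shared row permutation so that within-group dependence survives:

```r
# Grouped permutation sketch: shuffle columns a and b with the same
# row permutation, leaving column c untouched.
set.seed(3)
d <- data.frame(a = rnorm(10), b = rnorm(10), c = rnorm(10))
orig <- d
idx <- sample(nrow(d))                   # one shared row permutation
d[, c("a", "b")] <- d[idx, c("a", "b")]  # joint shuffle of group {a, b}
# Each row of (a, b) is still an intact pair, so only the link between
# the group and the remaining columns is broken.
```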

Detailed Scoring Information

All methods store detailed scoring information for further analysis. Let’s examine the structure of PFI’s detailed scores:

head(pfi$scores, 10) |>
  knitr::kable(digits = 4, caption = "Detailed PFI scores (first 10 rows)")
Detailed PFI scores (first 10 rows)

feature     iter_rsmp  iter_perm  regr.mse_orig  regr.mse_perm  importance
important1          1          1         4.3358         8.4459      4.1102
important1          2          1         7.7130        12.7265      5.0135
important1          3          1         6.1924        11.6449      5.4525
important2          1          1         4.3358        10.9357      6.6000
important2          2          1         7.7130        18.4671     10.7541
important2          3          1         6.1924        13.3055      7.1130
important3          1          1         4.3358         5.2284      0.8926
important3          2          1         7.7130         9.5961      1.8831
important3          3          1         6.1924         6.7444      0.5520
important4          1          1         4.3358        15.4558     11.1200

We can also summarize the scoring structure:

pfi$scores[, .(
  features = uniqueN(feature),
  resampling_folds = uniqueN(iter_rsmp), 
  permutation_iters = uniqueN(iter_perm),
  total_scores = .N
)]
#>    features resampling_folds permutation_iters total_scores
#>       <int>            <int>             <int>        <int>
#> 1:       10                3                 1           30
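
The aggregated importance and sd columns reported by $compute() are simply the mean and standard deviation of these per-iteration scores for each feature (e.g., averaging important1's three fold scores from the table above recovers its reported importance of about 4.86). A base-R sketch on a hand-copied subset of those scores:

```r
# Aggregating detailed per-fold scores into mean and sd per feature,
# using values copied from the detailed-scores table above.
scores <- data.frame(
  feature    = rep(c("important1", "important2"), each = 3),
  iter_rsmp  = rep(1:3, times = 2),
  importance = c(4.1102, 5.0135, 5.4525, 6.6000, 10.7541, 7.1130)
)
aggregate(importance ~ feature, data = scores,
          FUN = function(x) c(mean = mean(x), sd = sd(x)))
```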