The xplainfi package provides feature importance methods for machine learning models. It implements several approaches for measuring how much each feature contributes to model predictions, with a focus on model-agnostic methods that work with any learner.
Core Concepts
Feature importance methods in xplainfi answer different but related questions:
- How much does each feature contribute to model performance? (Permutation Feature Importance)
- What happens when we remove features and retrain? (Leave-One-Covariate-Out)
- How much does each feature contribute individually? (Leave-One-Covariate-In)
- How do features depend on each other? (Conditional and Relative methods)
All methods share a common interface built on mlr3, making them easy to use with any task, learner, measure, and resampling strategy.
The general pattern is to call $compute() to calculate importance (which always re-computes), then $importance() to retrieve the aggregated results, with intermediate results available via $scores().
Basic Example
Let’s use the Friedman1 task, which provides an ideal setup for demonstrating feature importance methods with known ground truth:
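library(xplainfi)
library(mlr3)
library(mlr3learners) # provides the "regr.ranger" learner
library(data.table) # for the .() and uniqueN() calls used further below
task <- tgen("friedman1")$generate(n = 300)
learner <- lrn("regr.ranger", num.trees = 100)
measure <- msr("regr.mse")
resampling <- rsmp("cv", folds = 3)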
The task has 300 observations with 10 features. Features important1 through important5 truly affect the target, while unimportant1 through unimportant5 are pure noise. We’ll use a random forest learner with cross-validation for more stable estimates.
The target function is: \(y = 10 \sin(\pi x_1 x_2) + 20 (x_3 - 0.5)^2 + 10 x_4 + 5 x_5 + \epsilon\)
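To make the target function concrete, here is a minimal base R sketch that simulates data of the same shape without going through tgen(). It assumes that all ten features are drawn independently from Uniform(0, 1) and that the noise term is standard normal; the names friedman1 and sim are only illustrative and not part of xplainfi or mlr3.

```r
# Sketch of a Friedman1-style data-generating process in base R.
# Assumptions: Uniform(0, 1) features and standard normal noise.
set.seed(1)
n <- 300
x <- matrix(runif(n * 10), nrow = n, ncol = 10)
colnames(x) <- c(paste0("important", 1:5), paste0("unimportant", 1:5))

# Only the first five columns enter the target function
friedman1 <- function(x) {
  10 * sin(pi * x[, 1] * x[, 2]) + 20 * (x[, 3] - 0.5)^2 + 10 * x[, 4] + 5 * x[, 5]
}
y <- friedman1(x) + rnorm(n)

sim <- data.frame(y = y, x)
dim(sim)
```

Because columns 6 to 10 never appear in friedman1(), any importance a method assigns to them reflects estimation noise rather than a true effect.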
Permutation Feature Importance (PFI)
PFI is the most straightforward method: for each feature, we permute (shuffle) its values and measure how much model performance deteriorates. More important features cause larger performance drops when shuffled.
pfi <- PFI$new(
task = task,
learner = learner,
measure = measure,
resampling = resampling
)
pfi$compute()
pfi$importance()
#> Key: <feature>
#> feature importance
#> <char> <num>
#> 1: important1 4.858724892
#> 2: important2 8.155693005
#> 3: important3 1.109254345
#> 4: important4 10.784727349
#> 5: important5 2.395793708
#> 6: unimportant1 0.009618005
#> 7: unimportant2 0.080903445
#> 8: unimportant3 0.044057887
#> 9: unimportant4 -0.082032243
#> 10: unimportant5 -0.137666350
The importance column shows the performance difference when each feature is permuted. Higher values indicate more important features.
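The idea behind PFI can be sketched by hand in base R. The following is not how xplainfi implements it; it is a simplified illustration that assumes a single train/test split, a linear model instead of a random forest, and a toy data set with hypothetical features x1, x2 and noise.

```r
# Hand-rolled permutation importance on a toy regression problem.
# All names and data here are hypothetical and for illustration only.
set.seed(42)
n <- 500
dat <- data.frame(x1 = runif(n), x2 = runif(n), noise = runif(n))
dat$y <- 10 * dat$x1 + 5 * dat$x2 + rnorm(n)

train <- dat[1:350, ]
test <- dat[351:500, ]
fit <- lm(y ~ x1 + x2 + noise, data = train)

mse <- function(truth, response) mean((truth - response)^2)
baseline <- mse(test$y, predict(fit, newdata = test))

# Shuffle one feature in the test data and measure the increase in MSE
permutation_importance <- function(feature) {
  permuted <- test
  permuted[[feature]] <- sample(permuted[[feature]])
  mse(test$y, predict(fit, newdata = permuted)) - baseline
}

pfi_manual <- sapply(c("x1", "x2", "noise"), permutation_importance)
pfi_manual
```

The model is trained once and only the test data is perturbed, which is why PFI is cheap compared to refit-based methods.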
For more stable estimates, we can use multiple permutation iterations per resampling fold:
pfi_stable <- PFI$new(
task = task,
learner = learner,
measure = measure,
resampling = resampling,
iters_perm = 5
)
pfi_stable$compute()
pfi_stable$importance()
#> Key: <feature>
#> feature importance
#> <char> <num>
#> 1: important1 5.33911229
#> 2: important2 7.05857192
#> 3: important3 1.05836425
#> 4: important4 13.82458524
#> 5: important5 1.87787034
#> 6: unimportant1 -0.03467317
#> 7: unimportant2 0.02760776
#> 8: unimportant3 0.07803544
#> 9: unimportant4 0.03031981
#> 10: unimportant5 -0.03469244
We can also use ratio instead of difference for the importance calculation, meaning that an unimportant feature is now expected to get an importance score of 1 rather than 0:
pfi_stable$importance(relation = "ratio")
#> Key: <feature>
#> feature importance
#> <char> <num>
#> 1: important1 1.8111892
#> 2: important2 2.0655617
#> 3: important3 1.1602020
#> 4: important4 3.0854616
#> 5: important5 1.2852320
#> 6: unimportant1 0.9943552
#> 7: unimportant2 1.0039539
#> 8: unimportant3 1.0101863
#> 9: unimportant4 1.0042183
#> 10: unimportant5 0.9945303
Leave-One-Covariate-Out (LOCO)
LOCO measures importance by retraining the model without each feature and comparing performance to the full model. This shows the contribution of each feature when all other features are present.
loco <- LOCO$new(
task = task,
learner = learner,
measure = measure,
resampling = resampling
)
loco$compute()
loco$importance()
#> Key: <feature>
#> feature importance
#> <char> <num>
#> 1: important1 3.4140568
#> 2: important2 5.7771305
#> 3: important3 0.8518190
#> 4: important4 7.4712326
#> 5: important5 0.6982960
#> 6: unimportant1 -0.3671644
#> 7: unimportant2 -0.2105410
#> 8: unimportant3 -0.2953192
#> 9: unimportant4 -0.4608305
#> 10: unimportant5 -0.3506712
LOCO is computationally expensive (it requires retraining for each feature) but has a clear interpretation: higher values mean a larger performance drop when the feature is removed. Important limitation: LOCO cannot distinguish between direct effects and indirect effects through correlated features.
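The refit-and-compare logic can be illustrated with a short base R sketch. It is a simplification under stated assumptions (one train/test split, a linear model, hypothetical toy features x1, x2 and noise) and is not xplainfi's implementation.

```r
# Hand-rolled leave-one-covariate-out importance on toy data.
# All names and data here are hypothetical and for illustration only.
set.seed(42)
n <- 500
dat <- data.frame(x1 = runif(n), x2 = runif(n), noise = runif(n))
dat$y <- 10 * dat$x1 + 5 * dat$x2 + rnorm(n)
train <- dat[1:350, ]
test <- dat[351:500, ]

mse <- function(truth, response) mean((truth - response)^2)
features <- c("x1", "x2", "noise")

# Fit a model on the given feature subset and return its test MSE
fit_and_score <- function(feats) {
  fit <- lm(reformulate(feats, response = "y"), data = train)
  mse(test$y, predict(fit, newdata = test))
}

full_mse <- fit_and_score(features)

# One refit per feature: drop it, retrain, compare to the full model
loco_manual <- sapply(features, function(f) {
  fit_and_score(setdiff(features, f)) - full_mse
})
loco_manual
```

Each feature costs one additional model fit, which is where the computational expense comes from. Dropping a pure noise feature can even improve test performance slightly, so small negative values are plausible.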
Feature Samplers
For advanced methods that account for feature dependencies, xplainfi provides different sampling strategies. While PFI uses simple permutation (marginal sampling), conditional samplers can preserve feature relationships.
Let’s demonstrate conditional sampling using Adversarial Random Forests, which preserve relationships between features when sampling:
arf_sampler <- ARFSampler$new(task)
sample_data <- task$data(rows = 1:5)
sample_data[, .(y, important1, important2)]
#> y important1 important2
#> <num> <num> <num>
#> 1: 20.59935 0.2875775 0.784575267
#> 2: 10.48474 0.7883051 0.009429905
#> 3: 19.99049 0.4089769 0.779065883
#> 4: 19.70521 0.8830174 0.729390652
#> 5: 21.94251 0.9404673 0.630131853
Now we’ll conditionally sample the important1 feature given the values of important2 and important3:
sampled_conditional <- arf_sampler$sample_newdata(
feature = "important1",
newdata = sample_data,
conditioning_set = c("important2", "important3")
)
sample_data[, .(y, important1, important2, important3)]
#> y important1 important2 important3
#> <num> <num> <num> <num>
#> 1: 20.59935 0.2875775 0.784575267 0.2372297
#> 2: 10.48474 0.7883051 0.009429905 0.6864904
#> 3: 19.99049 0.4089769 0.779065883 0.2258184
#> 4: 19.70521 0.8830174 0.729390652 0.3184946
#> 5: 21.94251 0.9404673 0.630131853 0.1739838
sampled_conditional[, .(y, important1, important2, important3)]
#> y important1 important2 important3
#> <num> <num> <num> <num>
#> 1: 20.59935 0.1717862 0.784575267 0.2372297
#> 2: 10.48474 0.1667886 0.009429905 0.6864904
#> 3: 19.99049 0.9076188 0.779065883 0.2258184
#> 4: 19.70521 -0.3738736 0.729390652 0.3184946
#> 5: 21.94251 0.4353523 0.630131853 0.1739838
This conditional sampling is essential for methods like CFI and RFI that need to preserve feature dependencies. See vignette("perturbation-importance") for detailed comparisons.
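The difference between marginal and conditional sampling is easiest to see with correlated features. The sketch below does not use ARFSampler; it assumes a hypothetical pair of standard normal features with known correlation rho, so the conditional distribution of x1 given x2 is available in closed form.

```r
# Marginal vs. conditional sampling for two correlated Gaussian features.
# Toy setup with a known conditional distribution, not an ARF.
set.seed(1)
n <- 2000
rho <- 0.8
x2 <- rnorm(n)
x1 <- rho * x2 + sqrt(1 - rho^2) * rnorm(n)

# Marginal sampling: permuting x1 ignores x2 entirely
x1_marginal <- sample(x1)

# Conditional sampling: draw from x1 | x2 ~ N(rho * x2, 1 - rho^2)
x1_conditional <- rnorm(n, mean = rho * x2, sd = sqrt(1 - rho^2))

round(c(
  original = cor(x1, x2),
  marginal = cor(x1_marginal, x2),
  conditional = cor(x1_conditional, x2)
), 2)
```

Permutation pushes the correlation with x2 towards zero, while the conditional draws keep it close to rho. A conditional sampler such as one based on ARF has to estimate this conditional distribution from the data instead of knowing it in advance.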
Advanced Features
xplainfi supports many advanced features for robust importance estimation:
- Multiple resampling strategies: Cross-validation, bootstrap, custom splits
- Multiple permutation/refit iterations: For more stable estimates
- Feature grouping: Compute importance for groups of related features
- Different relation types: Difference vs. ratio scoring
- Conditional sampling: Account for feature dependencies (see vignette("perturbation-importance"))
- SAGE methods: Shapley-based approaches (see vignette("sage-methods"))
Detailed Scoring Information
All methods store detailed scoring information from each resampling iteration for further analysis. Let’s examine the structure of PFI’s detailed scores:
pfi$scores() |>
head(10) |>
knitr::kable(digits = 4, caption = "Detailed PFI scores (first 10 rows)")
feature | iter_rsmp | iter_perm | regr.mse_baseline | regr.mse_post | importance |
---|---|---|---|---|---|
important1 | 1 | 1 | 4.3358 | 8.4459 | 4.1102 |
important2 | 1 | 1 | 4.3358 | 10.9357 | 6.6000 |
important3 | 1 | 1 | 4.3358 | 5.2284 | 0.8926 |
important4 | 1 | 1 | 4.3358 | 15.4558 | 11.1200 |
important5 | 1 | 1 | 4.3358 | 6.5032 | 2.1674 |
unimportant1 | 1 | 1 | 4.3358 | 4.3324 | -0.0033 |
unimportant2 | 1 | 1 | 4.3358 | 4.3681 | 0.0323 |
unimportant3 | 1 | 1 | 4.3358 | 4.4284 | 0.0927 |
unimportant4 | 1 | 1 | 4.3358 | 4.3111 | -0.0247 |
unimportant5 | 1 | 1 | 4.3358 | 4.1194 | -0.2163 |
We can also summarize the scoring structure:
pfi$scores()[, .(
features = uniqueN(feature),
resampling_folds = uniqueN(iter_rsmp),
permutation_iters = uniqueN(iter_perm),
total_scores = .N
)]
#> features resampling_folds permutation_iters total_scores
#> <int> <int> <int> <int>
#> 1: 10 3 1 30
So $importance() always gives us the importances aggregated across resampling and permutation/refit iterations, whereas $scores() gives us the individual scores as calculated by the supplied measure, along with the corresponding importance, which by default is the difference of these scores.
Analogously to $importance(), you can also use relation = "ratio" here:
pfi$scores(relation = "ratio") |>
head(10) |>
knitr::kable(digits = 4, caption = "PFI scores using the ratio (first 10 rows)")
feature | iter_rsmp | iter_perm | regr.mse_baseline | regr.mse_post | importance |
---|---|---|---|---|---|
important1 | 1 | 1 | 4.3358 | 8.4459 | 1.9480 |
important2 | 1 | 1 | 4.3358 | 10.9357 | 2.5222 |
important3 | 1 | 1 | 4.3358 | 5.2284 | 1.2059 |
important4 | 1 | 1 | 4.3358 | 15.4558 | 3.5647 |
important5 | 1 | 1 | 4.3358 | 6.5032 | 1.4999 |
unimportant1 | 1 | 1 | 4.3358 | 4.3324 | 0.9992 |
unimportant2 | 1 | 1 | 4.3358 | 4.3681 | 1.0075 |
unimportant3 | 1 | 1 | 4.3358 | 4.4284 | 1.0214 |
unimportant4 | 1 | 1 | 4.3358 | 4.3111 | 0.9943 |
unimportant5 | 1 | 1 | 4.3358 | 4.1194 | 0.9501 |
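As a quick check of the two relation types, the importance values for important1 in the first resampling iteration can be reproduced by hand from the baseline and post-permutation scores. The inputs below are the rounded values printed in the tables above, so the results agree with the tables only up to rounding.

```r
# Values copied from the important1 row of the score tables (rounded to 4 digits)
mse_baseline <- 4.3358
mse_post <- 8.4459

# relation = "difference": an unimportant feature is expected to score about 0
importance_difference <- mse_post - mse_baseline

# relation = "ratio": an unimportant feature is expected to score about 1
importance_ratio <- mse_post / mse_baseline

c(difference = importance_difference, ratio = importance_ratio)
```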
Observation-wise losses and importances
For methods where importances are calculated from observation-level comparisons, and with decomposable measures, we can also retrieve observation-level information with $obs_loss(), which works analogously to $scores() and $importance() but is even more detailed:
pfi$obs_loss()
#> feature iter_rsmp iter_perm row_ids loss_baseline loss_post
#> <char> <int> <int> <int> <num> <num>
#> 1: important1 1 1 1 3.3403244 0.26184209
#> 2: important1 1 1 9 0.4640003 0.00316609
#> 3: important1 1 1 11 1.0938319 10.11218211
#> 4: important1 1 1 12 2.0091331 2.28764800
#> 5: important1 1 1 15 11.4484770 38.11092543
#> ---
#> 2996: unimportant5 3 1 290 16.8041217 16.80412169
#> 2997: unimportant5 3 1 294 0.4212832 0.45933049
#> 2998: unimportant5 3 1 295 8.0016602 7.86721528
#> 2999: unimportant5 3 1 296 0.2308082 0.26544478
#> 3000: unimportant5 3 1 298 18.8129904 18.81299041
#> obs_importance
#> <num>
#> 1: -3.07848231
#> 2: -0.46083425
#> 3: 9.01835017
#> 4: 0.27851489
#> 5: 26.66244838
#> ---
#> 2996: 0.00000000
#> 2997: 0.03804724
#> 2998: -0.13444495
#> 2999: 0.03463658
#> 3000: 0.00000000
Since we computed PFI using the mean squared error (msr("regr.mse")), we can use the associated Measure$obs_loss(), the squared error.
In the resulting table we see:
- loss_baseline: The loss (squared error) of the baseline model for this observation, before permutation
- loss_post: The loss for this observation after permutation (or, in the case of LOCO, after refit)
- obs_importance: The difference (or ratio if relation = "ratio") of the two losses
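To illustrate what decomposable means here, the following base R sketch computes observation-wise squared errors for a handful of made-up predictions and shows that averaging them recovers the MSE-based importance. The vectors are hypothetical toy values, not output from the model above.

```r
# Hypothetical true values and predictions before/after perturbing a feature
truth <- c(3.1, 0.4, 2.7, 5.0)
pred_baseline <- c(2.9, 1.0, 2.5, 4.0)
pred_post <- c(1.5, 1.1, 4.0, 4.2)

# Observation-wise loss belonging to the mean squared error
squared_error <- function(truth, response) (truth - response)^2

loss_baseline <- squared_error(truth, pred_baseline)
loss_post <- squared_error(truth, pred_post)
obs_importance <- loss_post - loss_baseline

data.frame(loss_baseline, loss_post, obs_importance)

# The mean of the observation-wise importances equals the difference in MSE
all.equal(mean(obs_importance), mean(loss_post) - mean(loss_baseline))
```

Individual obs_importance values can be negative even for a relevant feature, because a perturbed prediction may land closer to the truth for a single observation by chance.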
Note that not all measures have a Measure$obs_loss(): Some measures like msr("classif.auc") are not decomposable, so observation-wise loss values are not available. In other cases, the corresponding obs_loss() is just not yet implemented in mlr3measures, but will likely be in the future.