Calculates Leave-One-Covariate-In (LOCI) scores. Despite the name, this implementation can leave in one or more features at a time.
Details
LOCI measures feature importance by training a model on each individual feature (or feature subset) alone and comparing its performance to a featureless baseline model (optimal constant prediction). The importance is calculated as (featureless_model_loss - single_feature_loss). Positive values indicate that the feature alone performs better than the baseline; negative values indicate worse performance.
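For a single feature and a single train/test split, the score reduces to the difference between the featureless baseline loss and the loss of a model trained on that feature alone. The following is a minimal sketch in plain mlr3, not the package's internal implementation; "important1" is one of the simulated friedman1 features used in the examples below.

library(mlr3)
library(mlr3learners)

task = tgen("friedman1")$generate(n = 200)
measure = msr("regr.mse")
split = partition(task)  # default 2/3 train, 1/3 test split

# Featureless baseline: optimal constant prediction
learner_baseline = lrn("regr.featureless")$train(task, row_ids = split$train)
loss_baseline = learner_baseline$predict(task, row_ids = split$test)$score(measure)

# Model trained on "important1" alone
task_single = task$clone()$select("important1")
learner_single = lrn("regr.ranger", num.trees = 50)$train(task_single, row_ids = split$train)
loss_single = learner_single$predict(task_single, row_ids = split$test)$score(measure)

# LOCI score: positive if the single feature beats the featureless baseline
loss_baseline - loss_single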
Super classes
xplainfi::FeatureImportanceMethod -> xplainfi::LeaveOutIn -> LOCI
Methods
Method new()
Creates a new instance of this R6 class.
Usage
LOCI$new(
task,
learner,
measure,
resampling = NULL,
features = NULL,
iters_refit = 1L,
obs_loss = FALSE
)
Arguments
task
(mlr3::Task) Task to compute importance for.
learner
(mlr3::Learner) Learner to use for prediction.
measure
(mlr3::Measure) Measure to use for scoring.
resampling
(mlr3::Resampling) Resampling strategy. Defaults to holdout.
features
(character()) Features to compute importance for. Defaults to all features.
iters_refit
(integer(1)) Number of refit iterations per resampling iteration.
obs_loss
(logical(1)) Whether to use observation-wise loss calculation (analogous to LOCO) when supported by the measure. If FALSE (default), uses aggregated scores. When TRUE, uses the measure's aggregation function or mean as fallback.
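For example, importance can be restricted to a feature subset, computed under cross-validation, and averaged over several refits per resampling iteration. A sketch combining the arguments above (feature names and settings are illustrative, using the friedman1 task from the examples below):

library(mlr3)
library(mlr3learners)

task = tgen("friedman1")$generate(n = 200)

loci_subset = LOCI$new(
  task = task,
  learner = lrn("regr.ranger", num.trees = 50),
  measure = msr("regr.mse"),
  resampling = rsmp("cv", folds = 3),
  features = c("important1", "important2"),
  iters_refit = 2L
)
loci_subset$compute()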
Examples
library(mlr3)
library(mlr3learners)
task = tgen("friedman1")$generate(n = 200)
# Standard LOCI with aggregated scores
loci = LOCI$new(
task = task,
learner = lrn("regr.ranger", num.trees = 50),
measure = msr("regr.mse")
)
#> ℹ No <Resampling> provided
#> Using `resampling = rsmp("holdout")` with default `ratio = 0.67`.
loci$compute()
# Using observation-wise losses with measure's aggregation function
loci_obsloss = LOCI$new(
task = task,
learner = lrn("regr.ranger", num.trees = 50),
measure = msr("regr.mae"), # uses MAE's aggregation function (mean) internally
obs_loss = TRUE
)
#> ℹ No <Resampling> provided
#> Using `resampling = rsmp("holdout")` with default `ratio = 0.67`.
loci_obsloss$compute()
loci_obsloss$obs_losses
#> row_ids feature iteration iter_refit truth response_ref
#> <int> <char> <int> <int> <num> <num>
#> 1: 4 important1 1 1 15.36222 14.04979
#> 2: 4 important2 1 1 15.36222 14.04979
#> 3: 4 important3 1 1 15.36222 14.04979
#> 4: 4 important4 1 1 15.36222 14.04979
#> 5: 4 important5 1 1 15.36222 14.04979
#> ---
#> 666: 190 unimportant1 1 1 11.60406 14.04979
#> 667: 190 unimportant2 1 1 11.60406 14.04979
#> 668: 190 unimportant3 1 1 11.60406 14.04979
#> 669: 190 unimportant4 1 1 11.60406 14.04979
#> 670: 190 unimportant5 1 1 11.60406 14.04979
#> response_feature loss_ref loss_feature obs_diff
#> <num> <num> <num> <num>
#> 1: 15.42979 1.312421 0.06757879 1.2448426
#> 2: 12.11727 1.312421 3.24494453 -1.9325232
#> 3: 20.14413 1.312421 4.78191717 -3.4694958
#> 4: 11.69276 1.312421 3.66945606 -2.3570347
#> 5: 13.48251 1.312421 1.87971032 -0.5672890
#> ---
#> 666: 13.96568 2.445737 2.36161915 0.0841178
#> 667: 18.78093 2.445737 7.17687679 -4.7311398
#> 668: 13.16344 2.445737 1.55938273 0.8863542
#> 669: 15.96397 2.445737 4.35991736 -1.9141804
#> 670: 12.86600 2.445737 1.26193892 1.1837980
# LOCI with median aggregation (analogous to original LOCO)
mae_median = msr("regr.mae")
mae_median$aggregator = median
loci_median = LOCI$new(
task = task,
learner = lrn("regr.ranger", num.trees = 50),
measure = mae_median,
obs_loss = TRUE
)
#> ℹ No <Resampling> provided
#> Using `resampling = rsmp("holdout")` with default `ratio = 0.67`.
loci_median$compute()
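# Per-observation losses are available via the same obs_losses field as above
loci_median$obs_losses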