Calculates Leave-One-Covariate-In (LOCI) scores. Despite the name, this implementation can leave in one or more features at a time.

Details

LOCI measures feature importance by training a model on each feature (or feature subset) alone and comparing its performance to a featureless baseline model (optimal constant prediction). The importance is calculated as featureless_model_loss - single_feature_loss. Positive values indicate that the feature alone performs better than the baseline; negative values indicate that it performs worse.
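
The calculation described above can be sketched in base R without mlr3. This is only an illustration of the idea, not the package's implementation: the simulated data, the fixed train/test split, the use of `lm()` as the learner, and the helper `mse()` are all assumptions made for the example.

```r
set.seed(1)

# Simulated data (assumption): only x1 carries signal, x2 is pure noise
n <- 200
dat <- data.frame(x1 = runif(n), x2 = runif(n))
dat$y <- 3 * dat$x1 + rnorm(n, sd = 0.5)

# Simple holdout split, roughly two thirds for training
train <- dat[1:134, ]
test <- dat[135:n, ]

# Hypothetical helper: mean squared error
mse <- function(truth, response) mean((truth - response)^2)

# Featureless baseline: under squared error the optimal constant
# prediction is the mean of the training targets
loss_featureless <- mse(test$y, mean(train$y))

# For each feature, fit a model on that feature alone and compute
# featureless_model_loss - single_feature_loss
loci_scores <- sapply(c("x1", "x2"), function(feature) {
  fit <- lm(reformulate(feature, response = "y"), data = train)
  loss_single <- mse(test$y, predict(fit, newdata = test))
  loss_featureless - loss_single
})

loci_scores
```

In this sketch the score for `x1` should be clearly positive, while the score for `x2` should be close to zero and may take either sign.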

Methods


Method new()

Creates a new instance of this R6 class.

Usage

LOCI$new(
  task,
  learner,
  measure,
  resampling = NULL,
  features = NULL,
  iters_refit = 1L,
  obs_loss = FALSE
)

Arguments

task

(mlr3::Task) Task to compute importance for.

learner

(mlr3::Learner) Learner to use for prediction.

measure

(mlr3::Measure) Measure to use for scoring.

resampling

(mlr3::Resampling) Resampling strategy. Defaults to holdout.

features

(character()) Features to compute importance for. Defaults to all features.

iters_refit

(integer(1)) Number of refit iterations per resampling iteration.

obs_loss

(logical(1)) Whether to compute losses observation-wise (analogous to LOCO) when the measure supports it. If FALSE (default), aggregated scores are used. If TRUE, the observation-wise losses are aggregated with the measure's aggregation function, with the mean as fallback.
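
To make the observation-wise variant concrete, here is a small base R sketch using absolute error as the loss. The numbers are made up, and it is an assumption of this sketch that the aggregation function is applied to the per-observation differences; the package may organize the aggregation differently. The variable names mirror the columns of `obs_losses` shown in the Examples.

```r
# Made-up test-set values (assumption, for illustration only)
truth <- c(15.4, 11.6, 9.8, 20.1, 13.0)
response_ref <- rep(14.0, 5)                        # constant featureless prediction
response_feature <- c(15.0, 12.5, 10.0, 14.5, 13.2) # single-feature model prediction

# Observation-wise absolute errors for both models
loss_ref <- abs(truth - response_ref)
loss_feature <- abs(truth - response_feature)

# Per-observation difference: positive where the single-feature model
# is closer to the truth than the featureless baseline
obs_diff <- loss_ref - loss_feature

# Aggregate with the mean (the fallback) or with the median
agg <- c(mean = mean(obs_diff), median = median(obs_diff))
agg
```

The median is less affected by a few observations with very large differences, so the two aggregations can rank features differently.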


Method clone()

The objects of this class are cloneable with this method.

Usage

LOCI$clone(deep = FALSE)

Arguments

deep

Whether to make a deep clone.

Examples

library(xplainfi) # provides LOCI
library(mlr3)
library(mlr3learners)
task = tgen("friedman1")$generate(n = 200)

# Standard LOCI with aggregated scores
loci = LOCI$new(
  task = task,
  learner = lrn("regr.ranger", num.trees = 50),
  measure = msr("regr.mse")
)
#>  No <Resampling> provided
#> Using `resampling = rsmp("holdout")` with default `ratio = 0.67`.
loci$compute()

# Using observation-wise losses with measure's aggregation function
loci_obsloss = LOCI$new(
  task = task,
  learner = lrn("regr.ranger", num.trees = 50),
  measure = msr("regr.mae"), # uses MAE's aggregation function (mean) internally
  obs_loss = TRUE
)
#>  No <Resampling> provided
#> Using `resampling = rsmp("holdout")` with default `ratio = 0.67`.
loci_obsloss$compute()
loci_obsloss$obs_losses
#>      row_ids      feature iteration iter_refit    truth response_ref
#>        <int>       <char>     <int>      <int>    <num>        <num>
#>   1:       4   important1         1          1 15.36222     14.04979
#>   2:       4   important2         1          1 15.36222     14.04979
#>   3:       4   important3         1          1 15.36222     14.04979
#>   4:       4   important4         1          1 15.36222     14.04979
#>   5:       4   important5         1          1 15.36222     14.04979
#>  ---                                                                
#> 666:     190 unimportant1         1          1 11.60406     14.04979
#> 667:     190 unimportant2         1          1 11.60406     14.04979
#> 668:     190 unimportant3         1          1 11.60406     14.04979
#> 669:     190 unimportant4         1          1 11.60406     14.04979
#> 670:     190 unimportant5         1          1 11.60406     14.04979
#>      response_feature loss_ref loss_feature   obs_diff
#>                 <num>    <num>        <num>      <num>
#>   1:         15.42979 1.312421   0.06757879  1.2448426
#>   2:         12.11727 1.312421   3.24494453 -1.9325232
#>   3:         20.14413 1.312421   4.78191717 -3.4694958
#>   4:         11.69276 1.312421   3.66945606 -2.3570347
#>   5:         13.48251 1.312421   1.87971032 -0.5672890
#>  ---                                                  
#> 666:         13.96568 2.445737   2.36161915  0.0841178
#> 667:         18.78093 2.445737   7.17687679 -4.7311398
#> 668:         13.16344 2.445737   1.55938273  0.8863542
#> 669:         15.96397 2.445737   4.35991736 -1.9141804
#> 670:         12.86600 2.445737   1.26193892  1.1837980

# LOCI with median aggregation (analogous to original LOCO)
mae_median = msr("regr.mae")
mae_median$aggregator = median
loci_median = LOCI$new(
  task = task,
  learner = lrn("regr.ranger", num.trees = 50),
  measure = mae_median,
  obs_loss = TRUE
)
#>  No <Resampling> provided
#> Using `resampling = rsmp("holdout")` with default `ratio = 0.67`.
loci_median$compute()