Calculates Leave-One-Covariate-Out (LOCO) scores. Despite the name, this implementation can leave out one or more features at a time.

Details

LOCO measures feature importance by comparing model performance with and without each feature. For each feature, the model is retrained without that feature and the performance difference (reduced_model_loss - full_model_loss) indicates the feature's importance. Higher values indicate more important features.
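
The refit-and-compare procedure can be sketched by hand in base R. This is a minimal illustration under stated assumptions, not the package's implementation: it uses lm() on the built-in mtcars data, a single holdout split, and MSE as the loss. The feature set and the helper names mse and fit_and_score are hypothetical and chosen only for this example.

```r
# Minimal LOCO sketch in base R (illustrative; not the package's internals).
set.seed(1)
features = c("wt", "hp", "qsec") # assumption: a small illustrative feature set
target = "mpg"

# Single holdout split: roughly 2/3 of the rows for training, the rest for testing
train_idx = sample(nrow(mtcars), size = round(2 / 3 * nrow(mtcars)))
train = mtcars[train_idx, ]
test = mtcars[-train_idx, ]

# Hypothetical helper: mean squared error
mse = function(truth, response) mean((truth - response)^2)

# Hypothetical helper: fit a linear model on `feats`, return its test-set MSE
fit_and_score = function(feats) {
  model = lm(reformulate(feats, response = target), data = train)
  mse(test[[target]], predict(model, newdata = test))
}

full_model_loss = fit_and_score(features)

# Refit once per left-out feature and compare against the full model
loco_scores = vapply(features, function(f) {
  reduced_model_loss = fit_and_score(setdiff(features, f))
  reduced_model_loss - full_model_loss
}, numeric(1))

loco_scores
```

A score can be negative on a single split, which means the reduced model happened to predict the test set better than the full model.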

References

Lei J, G'Sell M, Rinaldo A, Tibshirani RJ, Wasserman L (2018). “Distribution-Free Predictive Inference for Regression.” Journal of the American Statistical Association, 113(523), 1094–1111. ISSN 0162-1459, doi:10.1080/01621459.2017.1307116.

Methods

Method new()

Creates a new instance of this R6 class.

Usage

LOCO$new(
  task,
  learner,
  measure,
  resampling = NULL,
  features = NULL,
  iters_refit = 1L,
  obs_loss = FALSE
)

Arguments

task

(mlr3::Task) Task to compute importance for.

learner

(mlr3::Learner) Learner to use for prediction.

measure

(mlr3::Measure) Measure to use for scoring.

resampling

(mlr3::Resampling) Resampling strategy. Defaults to holdout with ratio = 0.67.

features

(character()) Features to compute importance for. Defaults to all features.

iters_refit

(integer(1): 1L) Number of refit iterations per resampling iteration.

obs_loss

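(logical(1): FALSE) Whether to compute losses observation-wise, as in the original LOCO formulation. If FALSE, importance is computed from aggregated scores. If TRUE, the observation-wise losses are aggregated with the measure's aggregation function, falling back to the mean if the measure does not define one.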
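
Why the choice of aggregation matters can be illustrated on a handful of per-observation losses. The sketch below is not the package's code: the numbers are made up, and the names loss_ref, loss_feature and obs_diff only mirror the columns of the obs_losses table shown in the Examples. It assumes absolute-error losses for a reference model and for a model refit without one feature.

```r
# Hypothetical per-observation absolute errors (made-up values)
loss_ref = c(2.0, 0.5, 1.2, 3.1, 0.8)     # full model
loss_feature = c(4.9, 0.4, 2.0, 3.0, 6.5) # model refit without one feature

# Observation-wise differences, analogous to the obs_diff column
obs_diff = loss_feature - loss_ref

# Option A: aggregate each model's losses first, then take the difference
diff_of_means = mean(loss_feature) - mean(loss_ref)
diff_of_medians = median(loss_feature) - median(loss_ref)

# Option B: aggregate the observation-wise differences
mean_of_diffs = mean(obs_diff)
median_of_diffs = median(obs_diff)

# The mean is linear, so both options give the same value
all.equal(diff_of_means, mean_of_diffs)

# The median is not linear, so the two options generally differ
c(diff_of_medians = diff_of_medians, median_of_diffs = median_of_diffs)
```

With mean aggregation the order of aggregating and differencing is irrelevant, so observation-wise computation changes the result only when a non-linear aggregator such as the median is used.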


Method clone()

The objects of this class are cloneable with this method.

Usage

LOCO$clone(deep = FALSE)

Arguments

deep

Whether to make a deep clone.

Examples

library(mlr3learners)
task = tgen("friedman1")$generate(n = 200)

# Standard LOCO with aggregated scores
loco = LOCO$new(
  task = task,
  learner = lrn("regr.ranger", num.trees = 50),
  measure = msr("regr.mse")
)
#>  No <Resampling> provided
#> Using `resampling = rsmp("holdout")` with default `ratio = 0.67`.
loco$compute()

# Using observation-wise losses with measure's aggregation function
loco_obsloss = LOCO$new(
  task = task,
  learner = lrn("regr.ranger", num.trees = 50),
  measure = msr("regr.mae"), # uses MAE's aggregation function (mean) internally
  obs_loss = TRUE
)
#>  No <Resampling> provided
#> Using `resampling = rsmp("holdout")` with default `ratio = 0.67`.
loco_obsloss$compute()
loco_obsloss$obs_losses
#>      row_ids      feature iteration iter_refit    truth response_ref
#>        <int>       <char>     <int>      <int>    <num>        <num>
#>   1:       3   important1         1          1  7.61903     9.616444
#>   2:       3   important2         1          1  7.61903     9.616444
#>   3:       3   important3         1          1  7.61903     9.616444
#>   4:       3   important4         1          1  7.61903     9.616444
#>   5:       3   important5         1          1  7.61903     9.616444
#>  ---                                                                
#> 666:     199 unimportant1         1          1 12.31176    15.091447
#> 667:     199 unimportant2         1          1 12.31176    15.091447
#> 668:     199 unimportant3         1          1 12.31176    15.091447
#> 669:     199 unimportant4         1          1 12.31176    15.091447
#> 670:     199 unimportant5         1          1 12.31176    15.091447
#>      response_feature loss_ref loss_feature   obs_diff
#>                 <num>    <num>        <num>      <num>
#>   1:        12.470067 1.997413    4.8510362  2.8536231
#>   2:        10.619345 1.997413    3.0003148  1.0029017
#>   3:         8.536649 1.997413    0.9176182 -1.0797949
#>   4:        10.492322 1.997413    2.8732914  0.8758782
#>   5:         8.535012 1.997413    0.9159814 -1.0814317
#>  ---                                                  
#> 666:        14.253773 2.779689    1.9420140 -0.8376747
#> 667:        13.614412 2.779689    1.3026532 -1.4770355
#> 668:        13.659737 2.779689    1.3479780 -1.4317108
#> 669:        14.403667 2.779689    2.0919080 -0.6877807
#> 670:        13.311589 2.779689    0.9998304 -1.7798583

# Original LOCO formulation using median aggregation
mae_median = msr("regr.mae")
mae_median$aggregator = median
loco_original = LOCO$new(
  task = task,
  learner = lrn("regr.ranger", num.trees = 50),
  measure = mae_median,
  obs_loss = TRUE
)
#>  No <Resampling> provided
#> Using `resampling = rsmp("holdout")` with default `ratio = 0.67`.
loco_original$compute()