Comparison with fippy (Python Implementation)
Cross-language validation of feature importance methods
Source: vignettes/articles/fippy-comparison.Rmd
library(data.table)
library(ggplot2)
library(jsonlite)
library(knitr)
library(glue)
library(here)
# Set seed for reproducibility
set.seed(123)
Overview
This article compares xplainfi’s feature importance implementations with those from fippy, a Python package implementing similar methods. This comparison serves as a regression test to ensure methodological consistency across language implementations.
The comparison includes:
- PFI (Permutation Feature Importance)
- CFI (Conditional Feature Importance)
- RFI (Relative Feature Importance)
- SAGE (Shapley Additive Global Importance) - both marginal and conditional variants
Methodology
Both implementations use:
- Dataset: Ewald et al. simulation with 5000 observations (sim_dgp_ewald())
- Evaluation: Same train/test data (70% train)
- Metrics: Mean Squared Error for importance calculations
The Ewald simulation provides an interpretable test case where the expected feature importance patterns are well understood, which is particularly useful for conditional methods that account for feature dependencies.
Due to the difference in the underlying conditional samplers (ARF for xplainfi vs. Gaussian samplers for fippy), we expect the conditional methods to show more variation than the marginal ones.
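For orientation, the shared setup in both scripts looks roughly like the following minimal sketch. sim_dgp_ewald() is the xplainfi function named above, but the n argument and the use of mlr3::partition() for the split are assumptions, not code copied from the actual scripts.

library(mlr3)
library(xplainfi)

set.seed(123)

# Simulate the Ewald et al. DGP; sim_dgp_ewald() is assumed here to
# return an mlr3 regression task and to accept an `n` argument.
task <- sim_dgp_ewald(n = 5000)

# Shared 70% train / 30% test split used by all importance methods.
split <- partition(task, ratio = 0.7)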
Setup and Execution
The comparison uses separate calculation scripts:
# 1. Calculate xplainfi results
cd vignettes/articles/fippy-comparison
Rscript calculate_xplainfi.R
# 2. Calculate fippy results (portable with uv - automatically installs dependencies)
./calculate_fippy.py
Both scripts generate JSON files with results that are loaded below for comparison.
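The exact serialization code lives in the scripts, but the layout both files share can be inferred from the fields accessed below. A hedged sketch of the R side, with placeholder values standing in for the computed results:

library(jsonlite)

# Placeholder values only; the real script fills these from the fitted model.
features <- paste0("x", 1:5)
results <- list(
  model_performance = list(r_squared = NA_real_),
  PFI = list(feature = features, importance = rep(NA_real_, 5)),
  CFI = list(feature = features, importance = rep(NA_real_, 5)),
  RFI = list(
    feature = features,
    importance = rep(NA_real_, 5),
    conditioning_set = c("x1", "x2")
  ),
  SAGE_Marginal = list(feature = features, importance = rep(NA_real_, 5)),
  SAGE_Conditional = list(feature = features, importance = rep(NA_real_, 5))
)
write_json(results, "xplainfi_results.json", auto_unbox = TRUE)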
Expected Feature Importance Patterns
The Ewald simulation (sim_dgp_ewald()) generates a regression task with 5 features (x1-x5) where:
- All features contribute to the target variable, but with different weights
- Some features are correlated, making conditional methods particularly interesting (see the correlation check after this list)
- The known data generating process allows us to validate whether the methods identify sensible patterns
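A quick way to see those dependencies is to inspect the pairwise feature correlations directly. A minimal sketch, assuming sim_dgp_ewald() returns an mlr3 task whose data is available via $data():

library(xplainfi)

set.seed(123)

# Pull the simulated data as a data.table ($data() assumes an mlr3 task).
dat <- sim_dgp_ewald(n = 5000)$data()

# Pairwise correlations among the five features.
round(cor(dat[, paste0("x", 1:5), with = FALSE]), 2)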
Load Results
# Check that both result files exist
# Look in the fippy-comparison subdirectory
base_dir <- here::here("vignettes", "articles", "fippy-comparison")
xplainfi_results_path <- file.path(base_dir, "xplainfi_results.json")
fippy_results_path <- file.path(base_dir, "fippy_results.json")
if (!file.exists(xplainfi_results_path)) {
stop("xplainfi_results.json not found. Please run calculate_xplainfi.R first.")
}
if (!file.exists(fippy_results_path)) {
stop("fippy_results.json not found. Please run calculate_fippy.py first.")
}
# Load results from both implementations
xplainfi_results <- fromJSON(xplainfi_results_path)
fippy_results <- fromJSON(fippy_results_path)
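Before comparing, a quick sanity check that both files expose the same top-level layout (field names as used throughout this article):

# Expected fields: model_performance, PFI, CFI, RFI, SAGE_Marginal, SAGE_Conditional
names(xplainfi_results)
names(fippy_results)
str(xplainfi_results$PFI, max.level = 1)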
Model Performance Comparison
performance_comparison <- data.table(
Implementation = c("xplainfi (R)", "fippy (Python)"),
R_squared = c(
round(xplainfi_results$model_performance$r_squared, 4),
round(fippy_results$model_performance$r_squared, 4)
)
)
kable(performance_comparison, caption = "Model Performance Comparison")
Implementation | R_squared |
---|---|
xplainfi (R) | 0.8416 |
fippy (Python) | 0.8376 |
Method Comparisons
compare_method <- function(method_name, xplainfi_result, fippy_result) {
# Align the importance scores from both implementations by feature
method_dt <- data.table(
feature = xplainfi_result$feature,
xplainfi = xplainfi_result$importance,
fippy = fippy_result$importance
)
# Return table and correlation for display
correlation <- cor(method_dt$xplainfi, method_dt$fippy)
correlation_spearman <- cor(method_dt$xplainfi, method_dt$fippy, method = "spearman")
list(
method = method_name,
table = kable(method_dt[order(-xplainfi)],
caption = glue("{method_name} Results Comparison"),
digits = 4),
correlation = correlation,
correlation_spearman = correlation_spearman
)
}
PFI (Permutation Feature Importance)
pfi_result <- compare_method("PFI", xplainfi_results$PFI, fippy_results$PFI)
pfi_result$table
feature | xplainfi | fippy |
---|---|---|
x4 | 0.6926 | 0.7892 |
x5 | 0.3699 | 0.4809 |
x3 | 0.0136 | 0.0030 |
x2 | 0.0000 | 0.0020 |
x1 | -0.0003 | 0.0006 |
CFI (Conditional Feature Importance)
cfi_result <- compare_method("CFI", xplainfi_results$CFI, fippy_results$CFI)
cfi_result$table
feature | xplainfi | fippy |
---|---|---|
x4 | 0.4394 | 0.4706 |
x5 | 0.3612 | 0.4491 |
x3 | 0.0016 | 0.0013 |
x2 | -0.0002 | -0.0007 |
x1 | -0.0004 | 0.0011 |
RFI (Relative Feature Importance)
rfi_result <- compare_method("RFI", xplainfi_results$RFI, fippy_results$RFI)
rfi_result$table
feature | xplainfi | fippy |
---|---|---|
x4 | 0.6841 | 0.6920 |
x5 | 0.3532 | 0.2546 |
x3 | 0.0118 | 0.0018 |
x1 | 0.0000 | 0.0000 |
x2 | 0.0000 | 0.0002 |
glue("xplainfi conditioning set: {paste(xplainfi_results$RFI$conditioning_set, collapse = ', ')}")
## xplainfi conditioning set: x1, x2
glue("fippy conditioning set: {paste(fippy_results$RFI$conditioning_set, collapse = ', ')}")
## fippy conditioning set: x1, x2
SAGE Methods
Marginal SAGE
sage_marginal_result <- compare_method("Marginal SAGE", xplainfi_results$SAGE_Marginal, fippy_results$SAGE_Marginal)
sage_marginal_result$table
feature | xplainfi | fippy |
---|---|---|
x4 | 0.3817 | 0.4695 |
x5 | 0.1706 | 0.3759 |
x3 | 0.0182 | 0.0147 |
x1 | -0.0002 | 0.0021 |
x2 | -0.0003 | -0.0131 |
Conditional SAGE
sage_conditional_result <- compare_method("Conditional SAGE", xplainfi_results$SAGE_Conditional, fippy_results$SAGE_Conditional)
sage_conditional_result$table
feature | xplainfi | fippy |
---|---|---|
x4 | 0.4976 | 0.3661 |
x5 | 0.1949 | 0.3567 |
x3 | 0.0631 | 0.0907 |
x1 | -0.0835 | 0.0002 |
x2 | -0.0895 | -0.0273 |
Correlation Summary
correlations <- rbindlist(list(
pfi_result[c("method", "correlation", "correlation_spearman")],
cfi_result[c("method", "correlation", "correlation_spearman")],
rfi_result[c("method", "correlation", "correlation_spearman")],
sage_marginal_result[c("method", "correlation", "correlation_spearman")],
sage_conditional_result[c("method", "correlation", "correlation_spearman")]
))
kable(
correlations,
digits = 4,
caption = "Pearson and Spearman Correlations between xplainfi and fippy",
col.names = c("Method", "Pearson Corr.", "Spearman Corr.")
)
Method | Pearson Corr. | Spearman Corr. |
---|---|---|
PFI | 0.9973 | 1.0000 |
CFI | 0.9955 | 0.9000 |
RFI | 0.9894 | 0.9747 |
Marginal SAGE | 0.9475 | 1.0000 |
Conditional SAGE | 0.9032 | 1.0000 |
melt(correlations, id.vars = "method") |>
ggplot(aes(x = reorder(method, value), y = value)) +
facet_wrap(vars(variable), ncol = 1, labeller = as_labeller(c(correlation = "Pearson's", correlation_spearman = "Spearman's"))) +
geom_col(fill = "steelblue", alpha = 0.7) +
geom_hline(yintercept = c(0.5, 1), linetype = "dashed", color = "red", alpha = 0.7) +
coord_flip() +
labs(
title = "Implementation Correlations",
subtitle = "xplainfi (R) vs fippy (Python)",
x = "Method",
y = "Correlation"
) +
theme_minimal(base_size = 14) +
theme(
plot.title.position = "plot"
)
Interpretation in Context of Ewald Simulation
The Ewald simulation provides interpretable feature importance patterns that help validate both implementations:
Method-Specific Insights
Marginal vs Conditional Methods:
- PFI and Marginal SAGE ignore feature correlations
- CFI, RFI, and Conditional SAGE account for feature dependencies
- Differences between marginal and conditional variants reveal the impact of feature correlations (quantified in the sketch after this list)
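To make the marginal-vs-conditional contrast concrete, the per-feature gap between PFI and CFI can be computed from the results already loaded above. A small sketch; like compare_method(), it assumes both methods report features in the same order:

# Features whose PFI clearly exceeds their CFI owe part of their marginal
# importance to correlated features that CFI conditions away.
marginal_vs_conditional <- data.table(
  feature = xplainfi_results$PFI$feature,
  pfi = xplainfi_results$PFI$importance,
  cfi = xplainfi_results$CFI$importance
)
marginal_vs_conditional[, gap := pfi - cfi]
marginal_vs_conditional[order(-gap)]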
RFI Conditioning: Both implementations use {x1, x2} as the conditioning set, as reported above.
Cross-Implementation Consistency: High correlations indicate that both xplainfi and fippy identify similar underlying feature importance patterns despite using different:
- Programming languages (R vs Python)
- Conditional sampling approaches (ARF vs Gaussian)
- Implementation details