In this article we present basic examples for the estimation of PAMs using pammtools and compare the results to estimates obtained using the coxph function from the survival package.
In the following two sections, we first describe the classical piece-wise exponential model (PEM) and after that the extension to piece-wise exponential additive models (PAM).
The strength of the PEM is that analysis of time-to-event data can be performed using algorithms designed to fit Generalized Linear Models. This approach yields coefficient estimates that are equivalent to the estimates obtained from Cox Proportional Hazards models if
there are no ties (i.e., simultaneous events) in the data and
all unique event (and censoring) times are used as interval cut points in order to transform the data into the format suitable for PEMs (see ?as_ped and the vignette on data transformation).
Estimates can differ between the two approaches due to a more crude handling of ties in the PEM approach (Whitehead 1980). In practice these differences are usually negligible (see examples below).
We first demonstrate the equivalence using a subset of the Veterans’ data (Kalbfleisch and Prentice 1980) provided in the survival package:
library(survival)
library(mgcv)
library(pammtools)
library(dplyr)
data("veteran", package = "survival") # load veteran data## Warning in data("veteran", package = "survival"): data set 'veteran' not found
# remove ties to illustrate equivalence with Cox approach
vetu <- filter(veteran, !duplicated(time))
ped_vetu <- vetu %>%
as_ped(Surv(time, status)~., cut = unique(vetu$time), id = "id")
pem_age <- glm(ped_status ~ interval - 1 + age, data = ped_vetu,
family = poisson(), offset = offset)
## cox model for comparison
cph_age <- coxph(Surv(time, status) ~ age, data = vetu)
## compare coefficients
cbind(
pem = coef(pem_age)["age"],
cox = coef(cph_age))## pem cox
## age 0.006413903 0.006413903
In this case both models yield equivalent estimates.
## Using the full data set (with ties) yields slightly different results
# when comparing PEM to Cox-PH
ped_vet <- veteran %>%
as_ped(Surv(time, status)~., cut = unique(veteran$time), id = "id")
pem2_age <- glm(ped_status ~ interval - 1 + age, data = ped_vet,
family = poisson(), offset = offset)
cph2_age <- coxph(Surv(time, status) ~ age, data = veteran)
## compare coefficient estimate to Cox-PH estimate
cbind(
pem = coef(pem2_age)["age"],
cox = coef(cph2_age))## pem cox
## age 0.007470507 0.007499866
PAMs have two main advantages over PEMs (Bender, Groll, and Scheipl 2018):
PAM estimation scales better when the number of intervals becomes large: PEMs need to estimate one parameter per interval for the baseline hazard, while the number of parameters in PAMs only depends on the number of basis functions used for the spline estimate of the baseline hazard.
In PEMs, baseline hazard estimates for each interval can vary a lot between neighboring intervals, especially when intervals only contain few events. For PAMs on the other hand estimates of the baseline hazard in neighboring intervals are similar due to penalization unless the data provides very strong evidence for large changes between neighboring intervals.
Note that coefficient estimates in PAMs are no longer equivalent to those from Cox PH models, since the estimation of the baseline hazard is performed semi-parametrically. In our experience the differences are negligible.
Using the example above (data without ties) we get:
## compare to PAM
pam_age <- gam(ped_status ~ s(tend) + age, data = ped_vetu,
family = "poisson", offset = offset)
cbind(
pam = coef(pam_age)["age"],
pem = coef(pem_age)["age"],
cox = coef(cph_age)["age"])## pam pem cox
## age 0.007543308 0.006413903 0.006413903
Bender, Andreas, Andreas Groll, and Fabian Scheipl. 2018. “A Generalized Additive Model Approach to Time-to-Event Analysis.” Statistical Modelling 18 (3-4): 299–321. https://doi.org/10.1177/1471082X17748083.
Kalbfleisch, J., and R. Prentice. 1980. The Statistical Analysis of Failure Time Data. New York: Wiley.
Whitehead, John. 1980. “Fitting Cox’s Regression Model to Survival Data Using GLIM.” Applied Statistics 29 (3): 268–75.