This function provides a flexible interface to create a data set that
can be passed as the newdata argument to a suitable predict
function (or similar).
The function is particularly useful in combination with one of the
add_* functions, e.g., add_term or
add_hazard.
make_newdata(x, ...)

# S3 method for default
make_newdata(x, ...)

# S3 method for ped
make_newdata(x, ...)

# S3 method for fped
make_newdata(x, ...)
| Argument | Description |
|---|---|
| x | A data frame (or object that inherits from |
| ... | Covariate specifications (expressions) that will be evaluated by looking for variables in |
Depending on the type of each variable in x, its mean or mode (modus)
will be used for variables not specified in the ellipsis (...)
(see also sample_info). If x is an object
that inherits from class ped, make_newdata additionally attempts to
complete the data set sensibly, depending on the variables specified in the
ellipsis. This is especially useful when creating a data set with different
time points, e.g., to calculate survival probabilities over time
(add_surv_prob) or to calculate time-varying covariate effects (add_term).
To do so, the time variable has to be specified in ..., e.g.,
tend = seq_range(tend, 20). The problem with this specification is that
not all values produced by seq_range(tend, 20) will be actual values
of tend used at the stage of estimation (and in general, it will
often be tedious to specify exact tend values). make_newdata
therefore finds the interval that contains each requested value and sets
tend to the endpoint of that interval. For example, if the intervals of the
PED object are \((0,1], (1,2]\), then tend = 1.5 will be set to 2 and the
remaining time-varying information (e.g., offset) is completed accordingly.
See examples below.
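The interval lookup described above can be sketched in base R. This is only an illustration under stated assumptions, not the pammtools implementation: `snap_to_interval_end` is a hypothetical helper, the break points `c(0, 1, 2)` mirror the \((0,1], (1,2]\) example, and `offset = log(intlen)` is an assumption based on the printed examples (e.g., an interval of length 500 is shown with offset 6.214608, which equals log(500)).

```r
# Hypothetical helper (not part of pammtools): map each requested time to the
# right endpoint of the left-open interval (a, b] that contains it.
snap_to_interval_end <- function(time, breaks) {
  breaks <- sort(unique(breaks))
  # Index i such that breaks[i] < time <= breaks[i + 1]
  idx <- findInterval(time, breaks, left.open = TRUE)
  if (any(idx < 1 | idx >= length(breaks))) {
    stop("all times must lie in (min(breaks), max(breaks)]")
  }
  intlen <- breaks[idx + 1] - breaks[idx]
  data.frame(
    tstart = breaks[idx],
    tend   = breaks[idx + 1],
    intlen = intlen,
    offset = log(intlen)  # assumption: offset is the log interval length
  )
}

# Intervals (0,1] and (1,2]; the requested time 1.5 is snapped to tend = 2.
snapped <- snap_to_interval_end(c(0.5, 1.5, 2), breaks = c(0, 1, 2))
print(snapped)
```

Using left-open, right-closed intervals means a requested value that already equals an interval endpoint is left unchanged, while any value strictly inside an interval is moved to that interval's right endpoint.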
# General functionality
tumor %>% make_newdata()
#> # A tibble: 1 x 9
#>    days status charlson_score   age sex   transfusion complications metastases
#>   <dbl>  <dbl>          <dbl> <dbl> <fct> <fct>       <fct>         <fct>
#> 1 1017.  0.483           2.78  62.0 male  no          no            yes
#> # … with 1 more variable: resection <fct>

#> # A tibble: 1 x 9
#>    days status charlson_score   age sex   transfusion complications metastases
#>   <dbl>  <dbl>          <dbl> <dbl> <fct> <fct>       <fct>         <fct>
#> 1 1017.  0.483           2.78    50 male  no          no            yes
#> # … with 1 more variable: resection <fct>

#> # A tibble: 6 x 9
#>    days status charlson_score   age sex   transfusion complications metastases
#>   <dbl>  <dbl>          <dbl> <dbl> <fct> <fct>       <fct>         <fct>
#> 1    1   0.483           2.78    50 male  no          no            yes
#> 2 1904.  0.483           2.78    50 male  no          no            yes
#> 3 3806   0.483           2.78    50 male  no          no            yes
#> 4    1   0.483           2.78    55 male  no          no            yes
#> 5 1904.  0.483           2.78    55 male  no          no            yes
#> 6 3806   0.483           2.78    55 male  no          no            yes
#> # … with 1 more variable: resection <fct>

#> # A tibble: 12 x 9
#>     days status charlson_score   age sex   transfusion complications metastases
#>    <dbl>  <int>          <dbl> <dbl> <fct> <fct>       <fct>         <fct>
#>  1    1       0           2.78    50 male  no          no            yes
#>  2 1904.      0           2.78    50 male  no          no            yes
#>  3 3806       0           2.78    50 male  no          no            yes
#>  4    1       1           2.78    50 male  no          no            yes
#>  5 1904.      1           2.78    50 male  no          no            yes
#>  6 3806       1           2.78    50 male  no          no            yes
#>  7    1       0           2.78    55 male  no          no            yes
#>  8 1904.      0           2.78    55 male  no          no            yes
#>  9 3806       0           2.78    55 male  no          no            yes
#> 10    1       1           2.78    55 male  no          no            yes
#> 11 1904.      1           2.78    55 male  no          no            yes
#> 12 3806       1           2.78    55 male  no          no            yes
#> # … with 1 more variable: resection <fct>

# mean/modus values of unspecified variables are calculated over whole data
tumor %>% make_newdata(sex=unique(sex))
#> # A tibble: 2 x 9
#>    days status charlson_score   age sex    transfusion complications metastases
#>   <dbl>  <dbl>          <dbl> <dbl> <fct>  <fct>       <fct>         <fct>
#> 1 1017.  0.483           2.78  62.0 female no          no            yes
#> 2 1017.  0.483           2.78  62.0 male   no          no            yes
#> # … with 1 more variable: resection <fct>

#> # A tibble: 2 x 9
#>    days status charlson_score   age sex    transfusion complications metastases
#>   <dbl>  <dbl>          <dbl> <dbl> <fct>  <fct>       <fct>         <fct>
#> 1 1060.  0.483           2.96  63.3 male   no          no            yes
#> 2  954.  0.484           2.52  60.1 female no          no            yes
#> # … with 1 more variable: resection <fct>

# You can also pass a part of the data sets as data frame to make_newdata
purrr::cross_df(list(days = c(0, 500, 1000), sex = c("male", "female"))) %>%
  make_newdata(x=tumor)
#> # A tibble: 6 x 9
#>    days status charlson_score   age sex    transfusion complications metastases
#>   <dbl>  <dbl>          <dbl> <dbl> <chr>  <fct>       <fct>         <fct>
#> 1     0  0.483           2.78  62.0 male   no          no            yes
#> 2   500  0.483           2.78  62.0 male   no          no            yes
#> 3  1000  0.483           2.78  62.0 male   no          no            yes
#> 4     0  0.483           2.78  62.0 female no          no            yes
#> 5   500  0.483           2.78  62.0 female no          no            yes
#> 6  1000  0.483           2.78  62.0 female no          no            yes
#> # … with 1 more variable: resection <fct>

# Examples for PED data
ped <- tumor %>% slice(1:3) %>% as_ped(Surv(days, status)~., cut = c(0, 500, 1000))
ped %>% make_newdata(age=c(50, 55))
#>   tstart tend intlen interval  id   offset ped_status charlson_score age    sex
#> 1      0  500    500  (0,500] 1.8 6.214608          0              2  50 female
#> 2      0  500    500  (0,500] 1.8 6.214608          0              2  55 female
#>   transfusion complications metastases resection
#> 1         yes            no        yes        no
#> 2         yes            no        yes        no

# if time information is specified, other time variables will be specified
# accordingly and offset calculated correctly
ped %>% make_newdata(tend = c(1000), age = c(50, 55))
#>   tstart tend intlen   interval  id   offset ped_status charlson_score age
#> 1    500 1000    500 (500,1000] 1.8 6.214608          0              2  50
#> 2    500 1000    500 (500,1000] 1.8 6.214608          0              2  55
#>      sex transfusion complications metastases resection
#> 1 female         yes            no        yes        no
#> 2 female         yes            no        yes        no

#>   tstart tend intlen   interval  id   offset ped_status charlson_score  age
#> 1      0  500    500    (0,500] 1.8 6.214608          0              2 58.8
#> 2    500 1000    500 (500,1000] 1.8 6.214608          0              2 58.8
#>      sex transfusion complications metastases resection
#> 1 female         yes            no        yes        no
#> 2 female         yes            no        yes        no

#> # A tibble: 4 x 14
#>   tstart  tend intlen interval      id offset ped_status charlson_score   age
#>    <dbl> <dbl>  <dbl> <fct>      <dbl>  <dbl>      <dbl>          <dbl> <dbl>
#> 1      0   500    500 (0,500]     2      6.21          0              2  52
#> 2      0   500    500 (0,500]     1.67   6.21          0              2  63.3
#> 3    500  1000    500 (500,1000]  2      6.21          0              2  52
#> 4    500  1000    500 (500,1000]  1.67   6.21          0              2  63.3
#> # … with 5 more variables: sex <fct>, transfusion <fct>, complications <fct>,
#> #   metastases <fct>, resection <fct>

# tend is set to the end point of respective interval:
ped <- tumor %>% as_ped(Surv(days, status)~.)
seq_range(ped$tend, 3)
#> [1]    1.0 1517.5 3034.0
#>
#>   tstart tend intlen    interval       id   offset ped_status charlson_score
#> 1      0    1      1       (0,1] 392.6801 0.000000          0        2.72929
#> 2   1502 1538     36 (1502,1538] 392.6801 3.583519          0        2.72929
#> 3   2808 3034    226 (2808,3034] 392.6801 5.420535          0        2.72929
#>        age  sex transfusion complications metastases resection
#> 1 61.31348 male          no            no        yes        no
#> 2 61.31348 male          no            no        yes        no
#> 3 61.31348 male          no            no        yes        no
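
The way unspecified variables are filled in (mean for numeric variables, modus for categorical ones, computed over the whole data or per group) can be illustrated with a small base R sketch. Everything here is an assumption made for illustration: `modus` and `summarise_typical` are hypothetical helpers, `toy` is made-up data, and the code is not how pammtools or sample_info is actually implemented.

```r
# Hypothetical helpers (base R only, not the pammtools implementation).
# Most frequent value of a vector; ties resolve to the first level/value.
modus <- function(v) {
  counts <- table(v)
  top <- names(counts)[which.max(counts)]
  if (is.factor(v)) factor(top, levels = levels(v)) else top
}

# One-row summary: mean for numeric columns, modus for everything else.
summarise_typical <- function(df) {
  as.data.frame(lapply(df, function(col) {
    if (is.numeric(col)) mean(col, na.rm = TRUE) else modus(col)
  }), stringsAsFactors = FALSE)
}

# Made-up toy data, used only to show the mechanics.
toy <- data.frame(
  age = c(50, 60, 70, 80),
  sex = factor(c("male", "female", "male", "male"))
)

# Summary over the whole data: a single "typical" row.
overall <- summarise_typical(toy)
print(overall)

# Summary within groups: one "typical" row per level of sex.
by_sex <- do.call(rbind, lapply(split(toy, toy$sex), summarise_typical))
print(by_sex)
```

The overall summary yields one row for the full data set, whereas the grouped summary yields one row per group with group-specific means, which is the same pattern as the two 2-row tibbles in the examples above (identical means in both rows versus different means per sex).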