7.1 Survival Analysis

Survival analysis examines data on whether a specific event of interest takes place and how long it takes until this event occurs. Ordinary regression analysis is not suitable for such data, for two reasons. Firstly, survival times are strictly positive and usually skewed, so a regression model that assumes normally distributed errors would require a transformation of the outcome to avoid biased estimates. Secondly, ordinary regression cannot handle censored observations appropriately. A (right-)censored observation is one for which the event of interest has not been observed by the end of the follow-up period, for example because the study ended or the subject left it. All we know is that the event, if it occurs, happens after the last recorded time. Survival analysis methods use the information from both censored and uncensored observations when estimating model parameters, rather than discarding censored observations or treating their censoring times as event times.
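To make the censoring problem concrete, the following sketch uses the survival package on six hypothetical subjects (the data are invented for illustration). It compares two naive estimates of the probability of surviving beyond day 11 with the Kaplan-Meier estimate. The Kaplan-Meier estimate keeps censored subjects in the risk set for as long as they were observed.

```r
library(survival)

# Hypothetical follow-up data for six subjects (time in days)
time   = c(5, 8, 12, 3, 10, 15)
status = c(1, 0, 1, 1, 0, 1)   # 1 = event observed, 0 = right-censored

y = Surv(time, status)
print(y)   # censored times are printed with a trailing "+"

# Two naive estimates of P(T > 11) that ignore the censoring mechanism
naive_as_events = mean(time > 11)               # censoring times treated as event times
naive_dropped   = mean(time[status == 1] > 11)  # censored subjects discarded

# Kaplan-Meier estimate, which uses censored subjects while they are at risk
fit = survfit(y ~ 1)
km_11 = summary(fit, times = 11)$surv

c(as_events = naive_as_events, dropped = naive_dropped, kaplan_meier = km_11)
```

The two naive estimates are 1/3 and 1/2, whereas the Kaplan-Meier estimate is 2/3. Subjects censored at days 8 and 10 are known to have survived at least that long, and only the Kaplan-Meier estimate uses that information.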

The package mlr3proba extends mlr3 with the following objects for survival analysis:

TaskSurv to define (right-censored) survival tasks
LearnerSurv as the base class for survival learners
PredictionSurv as a specialized class for Prediction objects
MeasureSurv as a specialized class for performance measures

In this example, we demonstrate the basic functionality of the package on the rats data from the survival package. This task ships as a pre-defined TaskSurv with mlr3proba.

library("mlr3proba")
task = tsk("rats")
print(task)

## <TaskSurv:rats> (300 x 5)
## * Target: time, status
## * Properties: -
## * Features (3):
##   - int (2): litter, rx
##   - fct (1): sex

# the target column is a survival object:
head(task$truth())

## [1] 101+  49  104+  91+ 104+ 102+
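The same kind of target can be built directly with the survival package, which is where the rats data come from. This is a minimal sketch that assumes the data set is available as `survival::rats` with columns `time` and `status`, as the task's target suggests:

```r
library(survival)

rats = survival::rats

# status == 1 marks an observed event, status == 0 a censored time
y = Surv(rats$time, rats$status)
head(y)                 # censored times carry a trailing "+"
table(rats$status)      # number of censored (0) and observed (1) times
mean(rats$status == 0)  # proportion of censored observations
```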

# Kaplan-Meier plot
library("mlr3viz")
autoplot(task)
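The curve in this plot is the Kaplan-Meier (product-limit) estimate S(t) = prod over event times t_i <= t of (1 - d_i / n_i), where d_i is the number of events and n_i the number of subjects still at risk at t_i. As a sketch, the estimate can be computed by hand and checked against `survival::survfit()`. The helper `km_by_hand()` is hypothetical and written only for this illustration.

```r
library(survival)

rats = survival::rats

# Hypothetical helper: product-limit estimate at each distinct event time
km_by_hand = function(time, status) {
  event_times = sort(unique(time[status == 1]))
  n_risk  = sapply(event_times, function(t) sum(time >= t))               # still under observation
  n_event = sapply(event_times, function(t) sum(time == t & status == 1)) # events at exactly t
  data.frame(time = event_times, surv = cumprod(1 - n_event / n_risk))
}

km  = km_by_hand(rats$time, rats$status)
fit = survfit(Surv(time, status) ~ 1, data = rats)

# survfit() reports every distinct time; keep only those with at least one event
surv_ref = fit$surv[fit$n.event > 0]
all.equal(km$surv, surv_ref)
```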

Now, we conduct a small benchmark study on the rats task using some of the integrated survival learners:

# some integrated learners
learners = lapply(c("surv.coxph", "surv.kaplan", "surv.ranger"), lrn)
print(learners)

## [[1]]
## <LearnerSurvCoxPH:surv.coxph>
## * Model: -
## * Parameters: list()
## * Packages: survival, distr6
## * Predict Type: distr
## * Feature types: logical, integer, numeric, factor
## * Properties: weights
## 
## [[2]]
## <LearnerSurvKaplan:surv.kaplan>
## * Model: -
## * Parameters: list()
## * Packages: survival, distr6
## * Predict Type: crank
## * Feature types: logical, integer, numeric, character, factor, ordered
## * Properties: missings
## 
## [[3]]
## <LearnerSurvRanger:surv.ranger>
## * Model: -
## * Parameters: list()
## * Packages: ranger, distr6
## * Predict Type: distr
## * Feature types: logical, integer, numeric, character, factor, ordered
## * Properties: importance, oob_error, weights
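The surv.coxph learner relies on the Cox proportional hazards model from the survival package, which is listed under Packages above. The sketch below fits the same kind of model directly. It assumes that the three features litter, rx and sex enter linearly. The learner's exact internal formula and prediction post-processing are not reproduced here.

```r
library(survival)

rats = survival::rats
fit = coxph(Surv(time, status) ~ litter + rx + sex, data = rats)

# Estimated log hazard ratios with standard errors and Wald tests
summary(fit)$coefficients

# Linear predictor: larger values mean higher hazard, i.e. shorter expected survival
lp = predict(fit, type = "lp")
head(lp)
```

A risk ranking like `lp` is the kind of prediction that a concordance measure such as Uno's C evaluates.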

# Uno's C-Index for survival
measure = msr("surv.unoC")
print(measure)

## <MeasureSurvUnoC:surv.unoC>
## * Packages: survAUC
## * Range: [0, 1]
## * Minimize: FALSE
## * Properties: na_score, requires_task, requires_train_set
## * Predict type: crank
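Uno's C is a concordance index. Consider comparable pairs, in which subject i has an observed event before subject j's recorded time. The index measures how often the model assigns i the higher risk, weighting each pair by the inverse squared Kaplan-Meier estimate of the censoring distribution, G(T_i)^-2. The function below is a hypothetical, unoptimised sketch of that definition. It is not the survAUC implementation that surv.unoC calls, and its tie handling and truncation time `tau` are assumptions. Ties in predicted risk earn no credit here, so a model that gives every subject the same risk scores 0. That would be consistent with the surv.kaplan result in the benchmark that follows.

```r
library(survival)

# Hypothetical sketch of Uno's C-index; `risk` holds predicted risks (higher = earlier event)
uno_c = function(time, status, risk, tau = max(time)) {
  # Kaplan-Meier estimate of the censoring survival G(t): censoring is treated as the "event"
  cens = survfit(Surv(time, 1 - status) ~ 1)
  # right = TRUE makes the step function left-continuous, so it returns G(t-)
  G = stepfun(cens$time, c(1, cens$surv), right = TRUE)
  g = G(time)
  num = 0
  den = 0
  for (i in seq_along(time)) {
    # only subjects with an observed event before tau anchor comparable pairs
    if (status[i] != 1 || time[i] >= tau) next
    w = 1 / g[i]^2
    comparable = time > time[i]
    den = den + w * sum(comparable)
    num = num + w * sum(comparable & risk[i] > risk)
  }
  num / den
}

# Hypothetical toy data
time   = c(5, 8, 12, 3, 10, 15)
status = c(1, 0, 1, 1, 0, 1)
uno_c(time, status, risk = -time)      # perfect ranking
uno_c(time, status, risk = time)       # reversed ranking
uno_c(time, status, risk = rep(1, 6))  # constant risk, as from a featureless model
```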

set.seed(1)
bmr = benchmark(benchmark_grid(task, learners, rsmp("cv", folds = 3)))
bmr$aggregate(measure)

## # A tibble: 3 x 7
##      nr resample_result task_id learner_id  resampling_id iters surv.unoC
##   <int> <list>          <chr>   <chr>       <chr>         <int>     <dbl>
## 1     1 <RsmplRsl>      rats    surv.coxph  cv                3     0.904
## 2     2 <RsmplRsl>      rats    surv.kaplan cv                3     0    
## 3     3 <RsmplRsl>      rats    surv.ranger cv                3     0.864
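To see roughly what `benchmark()` automates for a single learner, here is a base-R sketch of 3-fold cross-validation of a Cox model on the rats data. It is an approximation, not mlr3's internals. The fold assignment differs from `rsmp("cv", folds = 3)` and the feature formula is assumed. In addition, `survival::concordance()` computes Harrell's C rather than Uno's C, so the numbers will not match the table above.

```r
library(survival)

set.seed(1)
rats = survival::rats
# Randomly assign each row to one of three folds of (almost) equal size
folds = sample(rep(1:3, length.out = nrow(rats)))

cindex = sapply(1:3, function(k) {
  train = rats[folds != k, ]
  test  = rats[folds == k, ]
  fit = coxph(Surv(time, status) ~ litter + rx + sex, data = train)
  test$lp = predict(fit, newdata = test, type = "lp")
  # reverse = TRUE: a larger linear predictor predicts shorter survival
  concordance(Surv(time, status) ~ lp, data = test, reverse = TRUE)$concordance
})

cindex        # one C-index per test fold
mean(cindex)  # aggregated score, analogous to bmr$aggregate()
```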

autoplot(bmr, measure = measure)