7.1 Survival Analysis
Survival analysis examines data on whether a specific event of interest takes place and how long it takes until this event occurs. Ordinary regression analysis is not suitable for such data sets, for two reasons. Firstly, survival times are strictly positive and therefore need to be transformed (for example, log-transformed) to avoid biased estimates. Secondly, ordinary regression analysis cannot deal appropriately with censored observations. Censored observations are observations for which the event of interest has not occurred (yet) by the end of the observation period, for example because the study ended or the subject dropped out; all that is known is that the event time exceeds the censoring time. Survival analysis allows the user to handle such censored data, where the limited observation window does not always contain the event of interest. Note that survival analysis accounts for both censored and uncensored observations when estimating the model parameters.
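To make the censoring problem concrete, the following base-R sketch uses a small, made-up data set (the values are hypothetical). It contrasts two naive summaries with a hand-written Kaplan-Meier estimator, which keeps censored subjects in the risk set until they drop out. The helper `kaplan_meier()` is an illustrative name, not part of any package.

```r
# Hypothetical right-censored toy data:
# status = 1 means the event was observed, status = 0 means censored
time = c(2, 3, 3, 5, 6, 7, 9, 10, 12, 15)
status = c(1, 1, 0, 1, 0, 1, 0, 1, 0, 0)

# Naive summaries that ignore the nature of censoring
mean(time[status == 1])  # 5.4, drops censored rows entirely
mean(time)               # 7.2, treats censoring times as event times

# Kaplan-Meier estimator: product over distinct event times t of (1 - d_t / n_t),
# where d_t = number of events at t and n_t = number still at risk just before t
kaplan_meier = function(time, status) {
  event_times = sort(unique(time[status == 1]))
  n_risk = sapply(event_times, function(t) sum(time >= t))
  n_event = sapply(event_times, function(t) sum(time == t & status == 1))
  data.frame(
    time = event_times,
    n_risk = n_risk,
    n_event = n_event,
    surv = cumprod(1 - n_event / n_risk)
  )
}

km = kaplan_meier(time, status)
print(km)
```

For these toy values the estimated survival drops to 0.9 after the event at time 2 and to 0.8 after the event at time 3; `survival::survfit(Surv(time, status) ~ 1)` yields the same curve at the event times.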
The package mlr3proba extends mlr3 with the following objects for survival analysis:

- TaskSurv to define (right-censored) survival tasks
- LearnerSurv as the base class for survival learners
- PredictionSurv as a specialized class for Prediction objects
- MeasureSurv as a specialized class for performance measures
In this example, we demonstrate the basic functionality of the package on the rats data from the survival package.
This task ships as a pre-defined TaskSurv with mlr3proba:
library(mlr3)
library(mlr3proba)
task = tsk("rats")
print(task)
## <TaskSurv:rats> (300 x 5)
## * Target: time, status
## * Properties: -
## * Features (3):
## - int (2): litter, rx
## - fct (1): sex
# the target is a survival object; "+" marks censored observations
head(task$truth())
## [1] 101+ 49 104+ 91+ 104+ 102+
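The `+` notation comes from the `Surv` class of the survival package. Assuming the pre-defined task is built from `survival::rats`, the same kind of target can be created directly with the survival package alone:

```r
library(survival)

# The rats data shipped with the survival package
data(rats, package = "survival")

# Right-censored survival target: time plus event indicator (1 = event, 0 = censored)
y = Surv(rats$time, rats$status)

head(y)            # censored times are printed with a trailing "+"
mean(rats$status)  # share of rats for which the event was observed
```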
Now, we conduct a small benchmark study on the rats task using some of the integrated survival learners:
# some integrated learners
learners = lapply(c("surv.coxph", "surv.kaplan", "surv.ranger"), lrn)
print(learners)
## [[1]]
## <LearnerSurvCoxPH:surv.coxph>
## * Model: -
## * Parameters: list()
## * Packages: survival, distr6
## * Predict Type: distr
## * Feature types: logical, integer, numeric, factor
## * Properties: weights
##
## [[2]]
## <LearnerSurvKaplan:surv.kaplan>
## * Model: -
## * Parameters: list()
## * Packages: survival, distr6
## * Predict Type: crank
## * Feature types: logical, integer, numeric, character, factor, ordered
## * Properties: missings
##
## [[3]]
## <LearnerSurvRanger:surv.ranger>
## * Model: -
## * Parameters: list()
## * Packages: ranger, distr6
## * Predict Type: distr
## * Feature types: logical, integer, numeric, character, factor, ordered
## * Properties: importance, oob_error, weights
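Both surv.coxph and surv.kaplan list survival under Packages. As a rough point of reference, and assuming they wrap `survival::coxph()` and `survival::survfit()` respectively, the two underlying models can be fitted on the full rats data with the survival package directly. This sketch bypasses mlr3proba entirely and only shows what the learners estimate.

```r
library(survival)
data(rats, package = "survival")

# Cox proportional hazards model on the same three features as the task
cox_fit = coxph(Surv(time, status) ~ litter + rx + sex, data = rats)
summary(cox_fit)$coefficients

# Kaplan-Meier estimator: one survival curve for all rats, ignoring the features
km_fit = survfit(Surv(time, status) ~ 1, data = rats)
summary(km_fit, times = c(50, 75, 100))$surv
```

Because the Kaplan-Meier estimator uses no features, it predicts the same survival curve for every rat, which is why it serves as a baseline rather than a competitor.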
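As performance measure, we use Uno's concordance index (C-index):
measure = msr("surv.unoC")
print(measure)
## <MeasureSurvUnoC:surv.unoC>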
## * Packages: survAUC
## * Range: [0, 1]
## * Minimize: FALSE
## * Properties: na_score, requires_task, requires_train_set
## * Predict type: crank
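Uno's C is a censoring-weighted variant of the concordance index. To convey the underlying idea, the sketch below implements the simpler, unweighted Harrell's C in base R; it is not the surv.unoC implementation, which additionally weights comparable pairs by the inverse probability of censoring. A pair is comparable when the subject with the shorter time had an observed event, and concordant when that subject also received the higher risk score; ties in risk count as one half. The function name `harrell_c()` is hypothetical.

```r
# Harrell's concordance index for right-censored data.
# time:   observed times
# status: 1 = event observed, 0 = censored
# risk:   predicted risk scores (higher = shorter expected survival)
harrell_c = function(time, status, risk) {
  concordant = 0
  comparable = 0
  n = length(time)
  for (i in seq_len(n)) {
    # comparable pairs are anchored at an observed event
    if (status[i] != 1) next
    for (j in seq_len(n)) {
      # j is known to survive longer than i
      if (time[j] > time[i]) {
        comparable = comparable + 1
        if (risk[i] > risk[j]) {
          concordant = concordant + 1
        } else if (risk[i] == risk[j]) {
          concordant = concordant + 0.5
        }
      }
    }
  }
  concordant / comparable
}

harrell_c(time = c(5, 8, 12, 20), status = c(1, 0, 1, 1), risk = c(2.1, 1.5, 0.3, 0.9))
# 0.75: 4 comparable pairs, 3 of them concordant
```

With this tie convention, a model that assigns the same risk to every observation scores exactly 0.5.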
set.seed(1)
bmr = benchmark(benchmark_grid(task, learners, rsmp("cv", folds = 3)))
bmr$aggregate(measure)
## # A tibble: 3 x 7
## nr resample_result task_id learner_id resampling_id iters surv.unoC
## <int> <list> <chr> <chr> <chr> <int> <dbl>
## 1 1 <RsmplRsl> rats surv.coxph cv 3 0.904
## 2 2 <RsmplRsl> rats surv.kaplan cv 3 0
## 3 3 <RsmplRsl> rats surv.ranger cv 3 0.864
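For the Cox model, the benchmark can be approximated without mlr3, as a sketch that relies on the survival package only. It differs from the mlr3 run in two ways: the folds are drawn with `sample()`, so they will not match mlr3's split, and it reports Harrell's C via `survival::concordance()` instead of Uno's C, so the numbers are not expected to reproduce the table.

```r
library(survival)
data(rats, package = "survival")

set.seed(1)
# Randomly assign each row to one of 3 folds
folds = sample(rep(1:3, length.out = nrow(rats)))

cindex = sapply(1:3, function(k) {
  train = rats[folds != k, ]
  test = rats[folds == k, ]
  fit = coxph(Surv(time, status) ~ litter + rx + sex, data = train)
  # linear predictor = log relative risk; larger values mean higher risk
  test$risk = predict(fit, newdata = test, type = "lp")
  # reverse = TRUE: a larger risk score should correspond to a shorter survival time
  concordance(Surv(time, status) ~ risk, data = test, reverse = TRUE)$concordance
})

cindex        # one C-index per test fold
mean(cindex)  # aggregated over folds, analogous to bmr$aggregate()
```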