4.4 Modeling
The main purpose of a Graph is to build combined preprocessing and model fitting pipelines that can be used as an mlr3 Learner.
In the following, we chain two preprocessing operators:
- mutate (creation of a new feature)
- filter (filtering of the dataset)
and then chain a PipeOpLearner to train and predict on the modified dataset.
graph = mutate %>>%
  filter %>>%
  mlr_pipeops$get("learner",
    learner = mlr_learners$get("classif.rpart"))

Up to this point, we have defined the main pipeline, stored as a Graph.
Now we can train and predict the pipeline:
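As a sketch of the calls that produce the output below (assuming task is the iris classification Task used throughout this chapter):

```r
# Assumed setup: 'task' is the iris classification task.
task = mlr_tasks$get("iris")

# Train all PipeOps in the graph on the task; the terminal learner node
# of a trained Graph returns NULL as its output.
graph$train(task)

# Predict on the (here: identical) data; the terminal PipeOpLearner
# emits a PredictionClassif object.
graph$predict(task)
```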
## $classif.rpart.output
## NULL
## $classif.rpart.output
## <PredictionClassif> for 150 observations:
## row_id truth response
## 1 setosa setosa
## 2 setosa setosa
## 3 setosa setosa
## ---
## 148 virginica virginica
## 149 virginica virginica
## 150 virginica virginica
Rather than calling $train() and $predict() manually, we can put the pipeline Graph into a GraphLearner object.
A GraphLearner encapsulates the whole pipeline (including the preprocessing steps) and can be put into resample() or benchmark().
If you are familiar with the old mlr package, this is the equivalent of all the make*Wrapper() functions.
The pipeline being encapsulated (here Graph) must always produce a Prediction with its $predict() call, so it will probably contain at least one PipeOpLearner.
This learner can be used for model fitting, resampling, benchmarking, and tuning:
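A minimal sketch of wrapping the graph and resampling it (the object name glrn matches its later use in this section):

```r
# Wrap the Graph in a GraphLearner so it behaves like any mlr3 Learner.
glrn = GraphLearner$new(graph)

# 3-fold cross-validation, matching the 3-iteration ResampleResult below.
cv3 = rsmp("cv", folds = 3)
resample(task, glrn, cv3)
```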
## <ResampleResult> of 3 iterations
## * Task: iris
## * Learner: mutate.variance.classif.rpart
## * Warnings: 0 in 0 iterations
## * Errors: 0 in 0 iterations
4.4.1 Setting Hyperparameters
Individual PipeOps offer hyperparameters through their $param_set slot, whose values can be read and written via $param_set$values (provided by the paradox package).
The parameters get passed down to the Graph, and finally to the GraphLearner .
This makes it not only possible to easily change the behavior of a Graph / GraphLearner and try different settings manually, but also to perform tuning using the mlr3tuning package.
glrn$param_set$values$variance.filter.frac = 0.25
cv3 = rsmp("cv", folds = 3)
resample(task, glrn, cv3)

## <ResampleResult> of 3 iterations
## * Task: iris
## * Learner: mutate.variance.classif.rpart
## * Warnings: 0 in 0 iterations
## * Errors: 0 in 0 iterations
4.4.2 Tuning
If you are unfamiliar with tuning in mlr3, we recommend taking a look at the section about tuning first.
Here we define a ParamSet for the “rpart” learner and the “variance” filter which should be optimized during tuning.
library("paradox")
ps = ParamSet$new(list(
  ParamDbl$new("classif.rpart.cp", lower = 0, upper = 0.05),
  ParamDbl$new("variance.filter.frac", lower = 0.25, upper = 1)
))

Having defined the search space, we next create a tuning instance; a random search will then be run until the terminator stops it after 20 evaluations.
For the inner resampling, we simply use holdout (a single split into train/test) to keep the runtimes reasonable.
library("mlr3tuning")
instance = TuningInstance$new(
  task = task,
  learner = glrn,
  resampling = rsmp("holdout"),
  measures = msr("classif.ce"),
  param_set = ps,
  terminator = term("evals", n_evals = 20)
)

The tuning result can be found in the $result slot.
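As a sketch of actually running the optimization, assuming the random-search tuner from the mlr3tuning API generation used above (TuningInstance, term()):

```r
# Create a random-search tuner and run it on the tuning instance;
# the terminator defined above stops the search after 20 evaluations.
tuner = TunerRandomSearch$new()
tuner$tune(instance)

# Inspect the best hyperparameter configuration found.
instance$result
```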