4.4 Modeling
The main purpose of a Graph is to build combined preprocessing and model fitting pipelines that can be used as an mlr3 Learner.
In the following, we chain two preprocessing operators:
- mutate (creation of a new feature)
- filter (filtering of the dataset)
and then chain a PipeOpLearner to train and predict on the modified dataset.
graph = mutate %>>%
  filter %>>%
  mlr_pipeops$get("learner",
    learner = mlr_learners$get("classif.rpart"))

Up to this point, we have defined the main pipeline, stored as a Graph.
Now we can train and predict the pipeline:
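As a sketch of the calls that produce the output below (assuming task is the iris classification Task used throughout this chapter):

```r
# Assumed setup: 'task' is the iris classification task.
task = mlr_tasks$get("iris")

# Train all PipeOps in the graph on the task; the terminal learner node
# of a trained Graph returns NULL as its output.
graph$train(task)

# Predict on the (here: identical) data; the terminal PipeOpLearner
# emits a PredictionClassif object.
graph$predict(task)
```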
## $classif.rpart.output
## NULL
## $classif.rpart.output
## <PredictionClassif> for 150 observations:
## row_id truth response
## 1 setosa setosa
## 2 setosa setosa
## 3 setosa setosa
## ---
## 148 virginica virginica
## 149 virginica virginica
## 150 virginica virginica
Rather than calling $train() and $predict() manually, we can put the pipeline Graph into a GraphLearner object.
A GraphLearner encapsulates the whole pipeline (including the preprocessing steps) and can be put into resample() or benchmark().
If you are familiar with the old mlr package, this is the equivalent of all the make*Wrapper() functions.
The pipeline being encapsulated (here Graph) must always produce a Prediction with its $predict() call, so it will probably contain at least one PipeOpLearner.
This learner can be used for model fitting, resampling, benchmarking, and tuning:
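A minimal sketch of wrapping the graph and resampling it (the object name glrn matches its later use in this section):

```r
# Wrap the Graph in a GraphLearner so it behaves like any mlr3 Learner.
glrn = GraphLearner$new(graph)

# 3-fold cross-validation, matching the 3-iteration ResampleResult below.
cv3 = rsmp("cv", folds = 3)
resample(task, glrn, cv3)
```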
## <ResampleResult> of 3 iterations
## * Task: iris
## * Learner: mutate.variance.classif.rpart
## * Warnings: 0 in 0 iterations
## * Errors: 0 in 0 iterations
4.4.1 Setting Hyperparameters
Individual PipeOps offer hyperparameters through their $param_set slot, whose values can be read and written via $param_set$values (provided by the paradox package).
The parameters get passed down to the Graph, and finally to the GraphLearner .
This makes it not only possible to easily change the behavior of a Graph / GraphLearner and try different settings manually, but also to perform tuning using the mlr3tuning package.
glrn$param_set$values$variance.filter.frac = 0.25
cv3 = rsmp("cv", folds = 3)
resample(task, glrn, cv3)

## <ResampleResult> of 3 iterations
## * Task: iris
## * Learner: mutate.variance.classif.rpart
## * Warnings: 0 in 0 iterations
## * Errors: 0 in 0 iterations
4.4.2 Tuning
If you are unfamiliar with tuning in mlr3, we recommend taking a look at the section about tuning first.
Here we define a ParamSet for the “rpart” learner and the “variance” filter which should be optimized during tuning.
library("paradox")
ps = ParamSet$new(list(
  ParamDbl$new("classif.rpart.cp", lower = 0, upper = 0.05),
  ParamDbl$new("variance.filter.frac", lower = 0.25, upper = 1)
))

Having defined the search space, we next create a tuning instance; a random search will then be run until the terminator stops it after 20 evaluations.
For the inner resampling, we simply use holdout (a single split into train/test) to keep the runtimes reasonable.
library("mlr3tuning")
instance = TuningInstance$new(
  task = task,
  learner = glrn,
  resampling = rsmp("holdout"),
  measures = msr("classif.ce"),
  param_set = ps,
  terminator = term("evals", n_evals = 20)
)

The tuning result can be found in the $result slot.
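As a sketch of actually running the optimization, assuming the random-search tuner from the mlr3tuning API generation used above (TuningInstance, term()):

```r
# Create a random-search tuner and run it on the tuning instance;
# the terminator defined above stops the search after 20 evaluations.
tuner = TunerRandomSearch$new()
tuner$tune(instance)

# Inspect the best hyperparameter configuration found.
instance$result
```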