Last updated: 2019-05-28

Checks: 6 passed, 0 failed

Knit directory: 2018-model-comparison/

This reproducible R Markdown analysis was created with workflowr (version 1.3.0). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.

R Markdown file: up-to-date

Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.

Environment: empty

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproducibility it’s best to always run the code in an empty environment.

Seed: set.seed(20190523)

The command set.seed(20190523) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.
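As a minimal sketch of what the seed guarantees, the base R snippet below draws the same random subsample twice after resetting the seed. The variable names and the sampled range are illustrative only and are not part of the analysis.

```r
# Illustrative only: resetting the seed reproduces the same random draw.
set.seed(20190523)
first_draw <- sample(1:100, 5)

set.seed(20190523)
second_draw <- sample(1:100, 5)

# TRUE: both draws start from the same random number generator state
identical(first_draw, second_draw)
```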

Session information: recorded

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Cache: none

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

Repository version: 82bf724

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility. The version displayed above was the version of the Git repository at the time these results were generated.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .Rproj.user/
    Ignored:    .drake/
    Ignored:    code/07-paper/._files
    Ignored:    data/
    Ignored:    log/
    Ignored:    packrat/lib-R/
    Ignored:    packrat/lib-ext/
    Ignored:    packrat/lib/
    Ignored:    rosm.cache/
    Ignored:    tests/testthat/

Untracked files:
    Untracked:  packrat/src/drake/40af919c816b5ddd61b5280dda72d35dfa54cf73.tar.gz
    Untracked:  packrat/src/drake/drake_7.3.0.tar.gz

Unstaged changes:
    Modified:   .Rhistory
    Modified:   _drake.R
    Modified:   analysis/_site.yml
    Modified:   code/01-data/task.R
    Modified:   code/04-prediction/prediction.R
    Modified:   code/06-reports.R
    Modified:   code/07-paper/submission/3/latex-source-files/cv_boxplots_final_brier-1.pdf
    Modified:   packrat/packrat.lock

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.

These are the previous versions of the R Markdown and HTML files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view them.

File	Version	Author	Date	Message
html	2c3f42c	pat-s	2019-05-28	Build site.
html	188497c	pat-s	2019-05-25	Build site.
html	1e2f112	pat-s	2019-05-23	Build site.
Rmd	8394815	pat-s	2019-05-23	wflow_publish(c("analysis/about.Rmd", "analysis/index.Rmd", "analysis/license.Rmd"))
Rmd	eb20984	pat-s	2019-05-23	add workflowr structure
Rmd	769718b	pat-s	2019-05-23	Start workflowr project.

Hyperparameter tuning and performance assessment of statistical and machine-learning models using spatial data.

Authors

Patrick Schratz (patrick.schratz@gmail.com)
Jannes Muenchow
Eugenia Iturritxa
Jakob Richter
Alexander Brenning

This repository contains the research compendium of the above-mentioned paper.

In addition, it contains code and results for the LIFE Healthy Forest project. The following reports are available:

How to use

Read the code, access the data

See the code directory on GitHub for the source code that generated the figures and statistical results contained in the manuscript. The raw data is stored on Zenodo and will be downloaded when starting the analysis.

Install the R package

This repository is organized as an R package, providing functions and raw data to reproduce and extend the analysis reported in the publication. Note that this package has been written explicitly for this project and may not be suitable for general use.

This project is set up with a drake workflow, ensuring reproducibility. Intermediate targets/objects will be stored in a hidden .drake directory.

The R library of this project is managed by packrat. This makes sure that the exact same package versions are used when recreating the project. When calling packrat::restore(), all required packages will be installed with their specific version.

Please note that this project was built with R version 3.5.1 on a CentOS 7.5 operating system. The packrat packages from this project are not compatible with R versions prior to 3.5.0. (In general, it should be possible to reproduce the analysis on any other operating system.)
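Before restoring the packrat library, it may help to confirm that the running R version meets the minimum stated above. The check below is a small sketch using base R only; the names min_r_version and meets_min_version are assumptions made for illustration.

```r
# Minimum R version stated for the packrat library of this project
min_r_version <- "3.5.0"

# getRversion() returns the running R version as a comparable object
meets_min_version <- getRversion() >= min_r_version

if (meets_min_version) {
  message("R ", as.character(getRversion()), " satisfies the minimum version.")
} else {
  warning("R >= ", min_r_version, " is required for the packrat packages.")
}
```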

To clone the project, a working installation of git is required. Open a terminal in the directory of your choice and execute:

git clone git@github.com:pat-s/pathogen-modeling.git

Then start R in the cloned pathogen-modeling directory and run:

packrat::restore() # restores all R packages with their specific version
drake::r_make() # recreates the analysis

Structure of the analysis

In the drake philosophy, every R object is a “target” with dependencies. This repository contains more targets than actually needed to replicate the associated publication.

If you want to replicate the publication, you need to build the following targets:

benchmark_diplodia (Benchmark report for pathogen Diplodia sapinea)
vis_opt_paths (Optimization path figure)
vis_tuning_effects (Tuning effects figure)
vis_partitions (Partition figure)
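One way to restrict a drake run to these targets is the targets argument of drake::drake_config(), whose result r_make() reads from the _drake.R file. The sketch below is hedged: paper_targets is a name introduced here, and plan stands for the project's drake plan, which this page does not show. The drake call is guarded so the snippet runs even where neither is available.

```r
# Targets named above as sufficient to replicate the publication
paper_targets <- c(
  "benchmark_diplodia",
  "vis_opt_paths",
  "vis_tuning_effects",
  "vis_partitions"
)

# Assumption: `plan` is the drake plan object defined by the project.
if (requireNamespace("drake", quietly = TRUE) &&
    exists("plan", inherits = FALSE)) {
  # Only the listed targets and their upstream dependencies get built
  config <- drake::drake_config(plan, targets = paper_targets)
}
```

Placed as the last expression of _drake.R, such a configuration would limit r_make() to the four targets and whatever they depend on.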

Please note that the runtimes attached to these targets were obtained from parallel execution on an HPC with 16 cores (AMD Threadripper 2950X) and 126 GB RAM (DDR4) per node. Reproducing these targets sequentially on a local machine might take weeks.

Other practical notes

All “diplodia” targets (bm_sp_sp_diplodia, bm_sp_nsp_diplodia and bm_nsp_nsp_diplodia) are built with mlr::benchmark(keep.extract = TRUE) in benchmark_custom. The resulting extract slot is needed to analyze the tuning results. All other pathogens are built with mlr::benchmark(keep.extract = FALSE) to reduce the size of the resulting R objects on disk. A benchmark result (BMR) with tuning results takes about 3 GB; without tuning results it takes xx MB.
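The rule above can be expressed as a small helper that derives the keep.extract value from the target name. This is a sketch, not the project's benchmark_custom code: the function name keep_extract_for and the non-diplodia target name are hypothetical.

```r
# Hypothetical helper: TRUE for "diplodia" targets, FALSE otherwise
keep_extract_for <- function(target_name) {
  grepl("diplodia", target_name, fixed = TRUE)
}

keep_extract_for("bm_sp_sp_diplodia")       # TRUE
keep_extract_for("bm_sp_sp_otherpathogen")  # FALSE (made-up target name)

# The value would then be handed to the benchmark call, e.g.
# mlr::benchmark(..., keep.extract = keep_extract_for(target_name))
```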

Notes and resources

The issue tracker is the place to report problems or ask questions.
See the repository history for a fine-grained view of progress and changes.
The structure of this compendium is based on the work of Carl Boettiger, Ben Marwick and the workflowr package.

sessionInfo()

R version 3.5.1 (2018-07-02)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS: /opt/spack/opt/spack/linux-centos7-x86_64/gcc-7.3.0/r-3.5.1-b4xhm3pook4yl4olk6ttnovnyttdpkhe/rlib/R/lib/libRblas.so
LAPACK: /opt/spack/opt/spack/linux-centos7-x86_64/gcc-7.3.0/r-3.5.1-b4xhm3pook4yl4olk6ttnovnyttdpkhe/rlib/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8    
 [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8   
 [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
 [1] workflowr_1.3.0 Rcpp_1.0.0      digest_0.6.15   rprojroot_1.3-2
 [5] backports_1.1.2 git2r_0.23.0    magrittr_1.5    evaluate_0.13  
 [9] stringi_1.2.4   fs_1.2.6        whisker_0.3-2   rmarkdown_1.12 
[13] tools_3.5.1     stringr_1.3.1   glue_1.3.0      xfun_0.7       
[17] yaml_2.2.0      compiler_3.5.1  htmltools_0.3.6 knitr_1.23
