3  Parallel Processing

The furrr package provides parallel versions of the purrr map functions for iteration. The developers provide useful documentation and deep dives that are worth reading when you start using future_map() and its variants.

foreach provides an alternative for loop that can run sequentially or in parallel, as requested. foreach is also used under the hood for resampling by fit_resamples() and tune_grid() in tidymodels.

Michael Hallquist has provided a useful and detailed overview of parallel processing. It is a good first read to orient to terms and concepts. However, it does not describe either the future package or the furrr package. It does provide a brief introduction to foreach.

Other useful resources:

Info on the future ecosystem and more

Parallel processing and other optimizations in tidymodels
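
The examples below assume the following packages are attached (a minimal setup chunk; tidyverse and tidymodels supply the data-wrangling and modeling functions used later):

library(tidyverse)   # dplyr, purrr, stringr, tidyr, ...
library(tidymodels)  # recipes, rsample, parsnip, tune, yardstick
library(furrr)       # future_map() and variants
library(future)      # plan()
library(foreach)     # foreach(), %do%, %dopar%
library(doParallel)  # registerDoParallel()
library(tictoc)      # tic()/toc() timing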

Here is the number of physical (not logical) cores on this machine. You may have more or fewer.

parallel::detectCores(logical = FALSE)
[1] 8
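
A side note: on shared machines (e.g., an HPC cluster or a container), future::availableCores() is often recommended over parallel::detectCores() because it respects external limits on how many cores your job may use; a minimal sketch:

future::availableCores()  # honors HPC scheduler and container limits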

Let's use them by setting up a parallel backend.

cl <- parallel::makePSOCKcluster(parallel::detectCores(logical = FALSE))
doParallel::registerDoParallel(cl)

3.0.1 future_map()

Here is the use of map(), which processes sequentially:

tic()
x <- map(c(2, 2, 2), \(time) Sys.sleep(time))
toc()
6.011 sec elapsed

Using future_map() without a plan (don't do this!). Without a plan, futures default to sequential processing, so there is no speed-up:

tic()
x <- future_map(c(2, 2, 2), \(time) Sys.sleep(time))
toc()
6.101 sec elapsed

Using future_map() with a plan (Do this!)

plan(multisession, workers = parallel::detectCores(logical = FALSE))
tic()
x <- future_map(c(2, 2, 2), \(time) Sys.sleep(time))
toc()
3.592 sec elapsed
plan(sequential)
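
One caution for later use: if the mapped function generates random numbers, future will flag statistically unsound RNG unless you opt in to parallel-safe seeding. A minimal sketch using furrr's .options argument (furrr_options(seed = TRUE) requests reproducible parallel streams):

plan(multisession, workers = parallel::detectCores(logical = FALSE))
x <- future_map(c(2, 2, 2), 
                \(time) rnorm(1), 
                .options = furrr_options(seed = TRUE)) # parallel-safe RNG
plan(sequential)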

3.0.2 foreach()

foreach() in sequential mode using %do%

tic()
x <- foreach(time = c(2, 2, 2), .combine = "c") %do% {
  Sys.sleep(time)
  time
}
toc()
6.019 sec elapsed

foreach() in parallel mode using %dopar%, but without a plan. In contrast to future_map(), no plan is needed for foreach(); the doParallel backend we registered above supplies the workers. You should use it without a plan!

tic()
x <- foreach(time = c(2, 2, 2), .combine = "c") %dopar% {
  Sys.sleep(time)
  time
}
toc()
2.068 sec elapsed

But a plan doesn't break anything either. Still, don't use one here because it isn't needed.

plan(multisession, workers = parallel::detectCores(logical = FALSE))
tic()
x <- foreach(time = c(2, 2, 2), .combine = "c") %dopar% {
  Sys.sleep(time)
  time
}
toc()
2.056 sec elapsed
plan(sequential)

A remaining issue is how to handle random numbers. The chunk below produces no error or warning, but %dopar% does not guarantee reproducible (or statistically sound) parallel random number streams; the doRNG package's %dorng% operator is the usual recommendation when the loop body generates random numbers.

tic()
x <- foreach(time = c(2, 2, 2), .combine = "c") %dopar% {
  Sys.sleep(time)
  rnorm(1)
}
toc()
2.063 sec elapsed
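
A minimal sketch of the %dorng% alternative (assumes the doRNG package is installed; it reuses the doParallel backend registered above, and set.seed() makes the parallel streams reproducible):

library(doRNG)
set.seed(20240127) # hypothetical seed for reproducible parallel streams
tic()
x <- foreach(time = c(2, 2, 2), .combine = "c") %dorng% {
  Sys.sleep(time)
  rnorm(1)
}
toc()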

3.1 tune_grid() in tidymodels

Set up the data, resamples, recipe, and tuning grid. We will do 10-fold CV to tune an elastic net GLM (via glmnet) with a sample size of 1000 and 30 features.

# set up data: y is the sum of 30 standard-normal features plus irreducible error
n_obs <- 1000
n_x <- 30
irr_err <- 5  # SD of the irreducible error
d <- MASS::mvrnorm(n = n_obs, mu = rep(0, n_x), Sigma = diag(n_x)) %>% 
    magrittr::set_colnames(str_c("x", 1:n_x)) %>% 
    as_tibble() %>% 
    mutate(error = rnorm(n_obs, 0, irr_err),
           y = rowSums(across(everything()))) %>%  # sums the x's and the error
    select(-error)

# recipe
rec <- recipe(y ~ ., data = d)

# 10-fold CV
set.seed(19690127)
splits <- d %>% 
  vfold_cv(v = 10, strata = "y")

# tuning grid (named to avoid shadowing the tune_grid() function)
grid_vals <- expand_grid(penalty = exp(seq(0, 6, length.out = 200)),
                         mixture = seq(0, 1, length.out = 11))

First, let's benchmark without parallel processing. The default for tune_grid() (and fit_resamples()) is to allow parallel processing, so we have to turn it off explicitly using control_grid(allow_par = FALSE). You will not normally do this; it is here only to show the benefit of parallel processing.

tic()
linear_reg(penalty = tune(), mixture = tune()) %>% 
  set_engine("glmnet") %>% 
  tune_grid(preprocessor = rec, 
            resamples = splits, grid = grid_vals, 
            metrics = metric_set(rmse),
            control = control_grid(allow_par = FALSE)) # turn off parallel processing
# Tuning results
# 10-fold cross-validation using stratification 
# A tibble: 10 × 4
   splits            id     .metrics             .notes          
   <list>            <chr>  <list>               <list>          
 1 <split [900/100]> Fold01 <tibble [2,200 × 6]> <tibble [0 × 3]>
 2 <split [900/100]> Fold02 <tibble [2,200 × 6]> <tibble [0 × 3]>
 3 <split [900/100]> Fold03 <tibble [2,200 × 6]> <tibble [0 × 3]>
 4 <split [900/100]> Fold04 <tibble [2,200 × 6]> <tibble [0 × 3]>
 5 <split [900/100]> Fold05 <tibble [2,200 × 6]> <tibble [0 × 3]>
 6 <split [900/100]> Fold06 <tibble [2,200 × 6]> <tibble [0 × 3]>
 7 <split [900/100]> Fold07 <tibble [2,200 × 6]> <tibble [0 × 3]>
 8 <split [900/100]> Fold08 <tibble [2,200 × 6]> <tibble [0 × 3]>
 9 <split [900/100]> Fold09 <tibble [2,200 × 6]> <tibble [0 × 3]>
10 <split [900/100]> Fold10 <tibble [2,200 × 6]> <tibble [0 × 3]>
toc()
19.323 sec elapsed

Now allow parallel processing (the default). No plan is needed here, consistent with the findings for foreach() above. Yay!

tic()
linear_reg(penalty = tune(), mixture = tune()) %>% 
  set_engine("glmnet") %>% 
  tune_grid(preprocessor = rec, 
            resamples = splits, grid = grid_vals, 
            metrics = metric_set(rmse))
# Tuning results
# 10-fold cross-validation using stratification 
# A tibble: 10 × 4
   splits            id     .metrics             .notes          
   <list>            <chr>  <list>               <list>          
 1 <split [900/100]> Fold01 <tibble [2,200 × 6]> <tibble [0 × 3]>
 2 <split [900/100]> Fold02 <tibble [2,200 × 6]> <tibble [0 × 3]>
 3 <split [900/100]> Fold03 <tibble [2,200 × 6]> <tibble [0 × 3]>
 4 <split [900/100]> Fold04 <tibble [2,200 × 6]> <tibble [0 × 3]>
 5 <split [900/100]> Fold05 <tibble [2,200 × 6]> <tibble [0 × 3]>
 6 <split [900/100]> Fold06 <tibble [2,200 × 6]> <tibble [0 × 3]>
 7 <split [900/100]> Fold07 <tibble [2,200 × 6]> <tibble [0 × 3]>
 8 <split [900/100]> Fold08 <tibble [2,200 × 6]> <tibble [0 × 3]>
 9 <split [900/100]> Fold09 <tibble [2,200 × 6]> <tibble [0 × 3]>
10 <split [900/100]> Fold10 <tibble [2,200 × 6]> <tibble [0 × 3]>
toc()
9.308 sec elapsed

3.2 Final notes

The following is often suggested as an alternative setup for a parallel backend. It works for future_map() (when combined with a plan) and for foreach(), but not for the tidymodels implementations of resampling. It is not clear why, given that those use foreach() under the hood. Regardless, this setup should not be used if you plan to use tidymodels resampling.

library(doFuture)
registerDoFuture()

I tried this both directly and with various plan() configurations:

plan(multisession, workers = parallel::detectCores(logical = FALSE))

and with

cl <- parallel::makeCluster(parallel::detectCores(logical = FALSE))
plan(cluster, workers = cl)

3.3 Conclusions

To run future_map(), foreach(), or the tidymodels functions in parallel, set up the parallel backend with this code chunk:

cl <- parallel::makePSOCKcluster(parallel::detectCores(logical = FALSE))
doParallel::registerDoParallel(cl)

Nothing further is needed to use foreach() or tidymodels functions.

For future_map(), you additionally need to set up a multisession plan with this code chunk:

plan(multisession, workers = parallel::detectCores(logical = FALSE))
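
When you are finished (e.g., at the end of a script), it is good practice to release the workers. A minimal cleanup sketch:

plan(sequential)          # reset the future plan
foreach::registerDoSEQ()  # restore the sequential foreach backend
parallel::stopCluster(cl) # shut down the PSOCK workers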