8  Iteration

8.1 Introduction and curated resources

See this article for details on using dplyr for row-wise operations.

See this article for details on using dplyr for column-wise operations.

See this article for uses of across() in summarize() and mutate().

Use list_rbind() to bind a list of data frames into a single data frame. This is a common step after map(), which returns a list.

In the sections that follow, we provide common examples of approaches to iteration using the iris dataset.

iris |> glimpse()
Rows: 150
Columns: 5
$ Sepal.Length <dbl> 5.1, 4.9, 4.7, 4.6, 5.0, 5.4, 4.6, 5.0, 4.4, 4.9, 5.4, 4.…
$ Sepal.Width  <dbl> 3.5, 3.0, 3.2, 3.1, 3.6, 3.9, 3.4, 3.4, 2.9, 3.1, 3.7, 3.…
$ Petal.Length <dbl> 1.4, 1.4, 1.3, 1.5, 1.4, 1.7, 1.4, 1.5, 1.4, 1.5, 1.5, 1.…
$ Petal.Width  <dbl> 0.2, 0.2, 0.2, 0.2, 0.2, 0.4, 0.3, 0.2, 0.2, 0.1, 0.2, 0.…
$ Species      <fct> setosa, setosa, setosa, setosa, setosa, setosa, setosa, s…
iris |> head()
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa

8.2 Select a subset of columns with select()

There is a good tutorial providing more detail on selecting subsets of columns. Here we illustrate a few common applications in our lab.

It is easy to select a subset of columns based on their class. Common functions for selecting on column class are:

  • is.numeric
  • is.factor
  • is.ordered
  • is.character

These class functions are passed to the where() function. For example, here we select all numeric columns. Notice that is.numeric is written without parentheses inside where(): we pass the function itself, and where() calls it on each column.

iris |> select(where(is.numeric)) |> 
  glimpse()
Rows: 150
Columns: 4
$ Sepal.Length <dbl> 5.1, 4.9, 4.7, 4.6, 5.0, 5.4, 4.6, 5.0, 4.4, 4.9, 5.4, 4.…
$ Sepal.Width  <dbl> 3.5, 3.0, 3.2, 3.1, 3.6, 3.9, 3.4, 3.4, 2.9, 3.1, 3.7, 3.…
$ Petal.Length <dbl> 1.4, 1.4, 1.3, 1.5, 1.4, 1.7, 1.4, 1.5, 1.4, 1.5, 1.5, 1.…
$ Petal.Width  <dbl> 0.2, 0.2, 0.2, 0.2, 0.2, 0.4, 0.3, 0.2, 0.2, 0.1, 0.2, 0.…

…and now all factor columns

iris |> select(where(is.factor)) |> 
  glimpse()
Rows: 150
Columns: 1
$ Species <fct> setosa, setosa, setosa, setosa, setosa, setosa, setosa, setosa…
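where() is not limited to the class functions listed above; it accepts any function that returns a single TRUE or FALSE for a column. A minimal sketch, assuming R 4.1 or later for the \(x) lambda syntax, that keeps numeric columns whose mean is above 3 (the cutoff is arbitrary and only for illustration):

```r
library(dplyr)

# Keep only numeric columns whose mean exceeds 3 (arbitrary cutoff).
# && short-circuits, so mean() is never called on the factor column Species.
iris_big_means <- iris |>
  select(where(\(x) is.numeric(x) && mean(x) > 3))

glimpse(iris_big_means)
```

With iris this keeps Sepal.Length, Sepal.Width, and Petal.Length.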

You can also select columns based on their names. Common helper functions for this include:

  • starts_with(): Starts with a prefix.
  • ends_with(): Ends with a suffix.
  • contains(): Contains a literal string.
  • matches(): Matches a regular expression.
  • num_range(): Matches a numerical range like x01, x02, x03.

For example…

iris |> select(contains("Width")) |> 
  glimpse()
Rows: 150
Columns: 2
$ Sepal.Width <dbl> 3.5, 3.0, 3.2, 3.1, 3.6, 3.9, 3.4, 3.4, 2.9, 3.1, 3.7, 3.4…
$ Petal.Width <dbl> 0.2, 0.2, 0.2, 0.2, 0.2, 0.4, 0.3, 0.2, 0.2, 0.1, 0.2, 0.2…
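A few more of the name helpers applied to iris. This is a sketch; the regular expression is just one way to match the two sepal columns, and the last line shows that helpers can be combined with & (and likewise | and !).

```r
library(dplyr)

# Names that start with a prefix
petal_cols <- iris |> select(starts_with("Petal"))

# Names that end with a suffix
length_cols <- iris |> select(ends_with("Length"))

# Names matching a regular expression (the "." is escaped)
sepal_cols <- iris |> select(matches("^Sepal\\.(Length|Width)$"))

# Helpers combined with boolean operators
sepal_width <- iris |> select(starts_with("Sepal") & ends_with("Width"))

names(sepal_width)
```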

8.3 Get summary statistics for multiple columns

We use across() inside summarize() to compute summary statistics for a set of columns. Combine it with group_by() to compute the same statistics within groups of rows.

Mean for all numeric columns

iris |> 
  summarize(across(where(is.numeric), mean))
  Sepal.Length Sepal.Width Petal.Length Petal.Width
1     5.843333    3.057333        3.758    1.199333

Mean for width columns grouped by Species

iris |> 
  group_by(Species) |> 
  summarize(across(contains("Width"), mean))
# A tibble: 3 × 3
  Species    Sepal.Width Petal.Width
  <fct>            <dbl>       <dbl>
1 setosa            3.43       0.246
2 versicolor        2.77       1.33 
3 virginica         2.97       2.03 

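To apply several summary functions at once, pass a named list of functions. Here df, col1, and col2 are placeholders for your own data frame and columns. Put extra arguments such as na.rm = TRUE inside an anonymous function; passing them through the ... of across() is deprecated as of dplyr 1.1.0.

df |> 
  summarize(across(c(col1, col2),
                   list(mean = \(x) mean(x, na.rm = TRUE),
                        sd = \(x) sd(x, na.rm = TRUE))))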
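A runnable version of the multiple-functions pattern using iris, grouped by Species. iris has no missing values, so na.rm = TRUE is included only to show where such arguments go. When the functions are given as a named list, across() names the output columns "{.col}_{.fn}" by default.

```r
library(dplyr)

# Mean and SD of both sepal columns within each species
sepal_stats <- iris |>
  group_by(Species) |>
  summarize(across(
    starts_with("Sepal"),
    list(
      mean = \(x) mean(x, na.rm = TRUE),
      sd   = \(x) sd(x, na.rm = TRUE)
    )
  ))

sepal_stats
```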

8.4 Apply mutate() to multiple columns

We use across() combined with mutate() to apply the same transformation or other function to multiple columns.

For example, multiply the values in col1 and col2 by 2 (df, col1, and col2 are placeholders):

df |> mutate(across(c(col1, col2), \(x) x * 2))

Round the two sepal columns to the nearest whole number:

iris |> mutate(across(c(Sepal.Length, Sepal.Width), round))
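By default, mutate(across()) overwrites the selected columns. A minimal sketch of keeping the originals by naming the new columns with .names; the cm-to-mm conversion is only an illustrative transformation (iris measurements are in centimeters).

```r
library(dplyr)

# Add a millimeter version of every numeric column, keeping the cm columns.
# "{.col}" in .names is replaced by each original column name.
iris_mm <- iris |>
  mutate(across(where(is.numeric), \(x) x * 10, .names = "{.col}_mm"))

glimpse(iris_mm)
```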

8.5 map() and future_map()

A common pattern is to have map() return a list of data frames and then combine them into a single data frame with list_rbind().
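A minimal sketch of the map() then list_rbind() pattern, assuming purrr 1.0.0 or later (where list_rbind() was introduced). Splitting iris by Species is just a convenient way to get a named list of data frames; the summary columns are illustrative.

```r
library(purrr)

# split() returns a named list of data frames, one per species
by_species <- split(iris, iris$Species)

# map() returns a list of one-row data frames...
summaries <- by_species |>
  map(\(d) data.frame(
    n = nrow(d),
    mean_sepal_length = mean(d$Sepal.Length)
  ))

# ...which list_rbind() stacks into one data frame.
# names_to turns the list names (the species) into a column.
species_summary <- list_rbind(summaries, names_to = "Species")

species_summary
```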

If the first map() returns a list with multiple elements for each item, use pluck() inside a second map() to extract a single element.
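A sketch of the two-step pattern. The first map() returns, for each species, a list holding a fitted model and the sample size; the model formula is arbitrary and only for illustration. The second map() uses pluck() to pull out one element by name.

```r
library(purrr)

# First map(): each element is a list with two named elements
fits <- split(iris, iris$Species) |>
  map(\(d) list(
    model = lm(Petal.Length ~ Sepal.Length, data = d),
    n = nrow(d)
  ))

# Second map(): extract one named element from each list
models <- fits |> map(\(x) pluck(x, "model"))
sizes  <- fits |> map_int(\(x) pluck(x, "n"))

sizes
```

purrr also accepts a string as shorthand for plucking by name, so map(fits, "model") gives the same result as the first line of the second step.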

8.6 for loops

Does someone want to do this?

8.7 foreach loops

8.8 Nesting

Some useful tutorials

  • https://r4ds.had.co.nz/many-models.html
  • https://bookdown.org/Maxine/r4ds/nesting.html
  • https://tidyr.tidyverse.org/reference/nest.html
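A minimal sketch of nesting in the style of the tutorials above, assuming tidyr, dplyr, and purrr are installed. nest() collapses the non-Species columns into a list-column of data frames, and map() then fits one model per species; the model formula is only illustrative.

```r
library(dplyr)
library(purrr)
library(tidyr)

# One row per species; the other columns are stored in the list-column "data"
nested <- iris |>
  nest(data = -Species) |>
  mutate(
    # Fit one model per nested data frame...
    model = map(data, \(d) lm(Petal.Length ~ Sepal.Length, data = d)),
    # ...and extract the slope from each fit
    slope = map_dbl(model, \(m) coef(m)[["Sepal.Length"]])
  )

nested
```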