Introduction and curated resources
See this article for details on the use of dplyr for row-wise operations
See this article for details on the use of dplyr for column-wise operations
See this article for the uses of across() in summarize() and mutate().
Use list_rbind() with map() to bind together lists of data frames, which is a common need when iterating with map().
In the sections that follow, we provide common examples of approaches to iteration using the iris dataset.
iris |> glimpse()
Rows: 150
Columns: 5
$ Sepal.Length <dbl> 5.1, 4.9, 4.7, 4.6, 5.0, 5.4, 4.6, 5.0, 4.4, 4.9, 5.4, 4.…
$ Sepal.Width <dbl> 3.5, 3.0, 3.2, 3.1, 3.6, 3.9, 3.4, 3.4, 2.9, 3.1, 3.7, 3.…
$ Petal.Length <dbl> 1.4, 1.4, 1.3, 1.5, 1.4, 1.7, 1.4, 1.5, 1.4, 1.5, 1.5, 1.…
$ Petal.Width <dbl> 0.2, 0.2, 0.2, 0.2, 0.2, 0.4, 0.3, 0.2, 0.2, 0.1, 0.2, 0.…
$ Species <fct> setosa, setosa, setosa, setosa, setosa, setosa, setosa, s…
iris |> head()
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa
5 5.0 3.6 1.4 0.2 setosa
6 5.4 3.9 1.7 0.4 setosa
select() subset of columns
There is a good tutorial providing more detail on selecting subsets of columns. Here we illustrate a few common applications in our lab.
It is easy to select a subset of columns based on their class. Common functions for selecting on column class are:
* is.numeric
* is.factor
* is.ordered
* is.character
These class functions are used inside of the where() function. For example, here we select all numeric columns. Notice that the parentheses are left off of the is.numeric function when using where().
iris |> select(where(is.numeric)) |>
  glimpse()
Rows: 150
Columns: 4
$ Sepal.Length <dbl> 5.1, 4.9, 4.7, 4.6, 5.0, 5.4, 4.6, 5.0, 4.4, 4.9, 5.4, 4.…
$ Sepal.Width <dbl> 3.5, 3.0, 3.2, 3.1, 3.6, 3.9, 3.4, 3.4, 2.9, 3.1, 3.7, 3.…
$ Petal.Length <dbl> 1.4, 1.4, 1.3, 1.5, 1.4, 1.7, 1.4, 1.5, 1.4, 1.5, 1.5, 1.…
$ Petal.Width <dbl> 0.2, 0.2, 0.2, 0.2, 0.2, 0.4, 0.3, 0.2, 0.2, 0.1, 0.2, 0.…
…and now all factor columns
iris |> select(where(is.factor)) |>
  glimpse()
Rows: 150
Columns: 1
$ Species <fct> setosa, setosa, setosa, setosa, setosa, setosa, setosa, setosa…
You can also select columns based on column name. Common helper functions for this include:
* starts_with(): Starts with a prefix.
* ends_with(): Ends with a suffix.
* contains(): Contains a literal string.
* matches(): Matches a regular expression.
* num_range(): Matches a numerical range like x01, x02, x03.
For example…
iris |> select(contains("Width")) |>
  glimpse()
Rows: 150
Columns: 2
$ Sepal.Width <dbl> 3.5, 3.0, 3.2, 3.1, 3.6, 3.9, 3.4, 3.4, 2.9, 3.1, 3.7, 3.4…
$ Petal.Width <dbl> 0.2, 0.2, 0.2, 0.2, 0.2, 0.4, 0.3, 0.2, 0.2, 0.1, 0.2, 0.2…
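The other name-based helpers work the same way. The following is a minimal sketch on iris; the object names (petal_cols, length_cols, sepal_or_species) are made up for illustration.

```r
library(dplyr)

# Columns whose names start with the prefix "Petal"
petal_cols <- iris |> select(starts_with("Petal"))

# Columns whose names end with the suffix "Length"
length_cols <- iris |> select(ends_with("Length"))

# Columns whose names match a regular expression
sepal_or_species <- iris |> select(matches("^(Sepal|Species)"))

names(petal_cols)
names(length_cols)
names(sepal_or_species)
```

Note that matches() takes a regular expression, whereas contains() treats its argument as a literal string.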
Get summary statistics for multiple columns
We use across() combined with summarize() to get summary statistics across sets of columns. This can also be combined with group_by() to compute the statistics within groups of rows.
Mean for all numeric columns
iris |>
  summarize(across(where(is.numeric), mean))
Sepal.Length Sepal.Width Petal.Length Petal.Width
1 5.843333 3.057333 3.758 1.199333
Mean for width columns grouped by Species
iris |>
  group_by(Species) |>
  summarize(across(contains("Width"), mean))
# A tibble: 3 × 3
Species Sepal.Width Petal.Width
<fct> <dbl> <dbl>
1 setosa 3.43 0.246
2 versicolor 2.77 1.33
3 virginica 2.97 2.03
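Mean and standard deviation for selected columns, ignoring missing values (df, col1, and col2 are placeholders)
df %>% summarise(across(c(col1, col2), list(mean = \(x) mean(x, na.rm = TRUE), sd = \(x) sd(x, na.rm = TRUE))))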
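To make the multiple-function pattern concrete, here is a sketch on iris. The copy iris_na and its single missing value are introduced purely for illustration, and the .names pattern is one possible naming choice rather than a lab convention.

```r
library(dplyr)

# Copy of iris with one missing value added, for illustration only
iris_na <- iris
iris_na$Sepal.Width[1] <- NA

# Mean and SD of every "Width" column within each species.
# na.rm is set inside each anonymous function.
width_stats <- iris_na |>
  group_by(Species) |>
  summarize(across(
    contains("Width"),
    list(
      mean = \(x) mean(x, na.rm = TRUE),
      sd = \(x) sd(x, na.rm = TRUE)
    ),
    .names = "{.col}_{.fn}"
  ))

width_stats
```

The result has one row per species and one column for each combination of input column and summary function.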
Apply mutate() to multiple columns
We use across()
combined with mutate()
to apply the same transformation or other function to multiple columns.
Multiply values in col1 and col2 by 2
df %>% mutate(across(c(col1, col2), function(x) x * 2))
iris %>% mutate(across(c(Sepal.Length, Sepal.Width), round))
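By default, mutate() with across() overwrites the selected columns. If the originals should be kept, the .names argument of across() can write the results to new columns instead. A minimal sketch, assuming we want z-scored copies of the numeric columns (the "_z" suffix and the name iris_scaled are arbitrary choices):

```r
library(dplyr)

# Add a standardized (z-score) copy of each numeric column,
# leaving the original columns untouched
iris_scaled <- iris |>
  mutate(across(
    where(is.numeric),
    \(x) (x - mean(x)) / sd(x),
    .names = "{.col}_z"
  ))

names(iris_scaled)
```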
map() and future_map()
You can generally return a list from map() and then combine the results into a data frame afterwards using list_rbind().
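A minimal sketch of this pattern on iris, assuming purrr 1.0.0 or later (where list_rbind() is available). The names by_species and species_means are illustrative only.

```r
library(dplyr)
library(purrr)

# Split iris into a named list of data frames, one per species
by_species <- split(iris, iris$Species)

# map() returns a list of one-row tibbles;
# list_rbind() stacks them and stores the list names in a column
species_means <- by_species |>
  map(\(d) tibble(
    n = nrow(d),
    mean_sepal_length = mean(d$Sepal.Length)
  )) |>
  list_rbind(names_to = "Species")

species_means
```

future_map() from the furrr package follows the same calling pattern, so swapping it in after setting a future plan should parallelize the first step; that variant is not shown or tested here.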
You can use pluck() in a second map() if the first map() returned a list with multiple elements.
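For example, the sketch below has a first map() return a list with two components per species and then uses a second map() with pluck() to pull out one component at a time. The component names (n, model) and the regression formula are assumptions chosen for illustration.

```r
library(purrr)

by_species <- split(iris, iris$Species)

# First map: each element is a list with several components
fits <- by_species |>
  map(\(d) list(
    n = nrow(d),
    model = lm(Sepal.Length ~ Sepal.Width, data = d)
  ))

# Second map: pluck a single component out of every element
models <- fits |> map(\(f) pluck(f, "model"))
n_obs <- fits |> map_int(\(f) pluck(f, "n"))

n_obs
```

purrr also accepts a name in place of a function, so map(fits, "model") is an equivalent shorthand for the first extraction.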
for loops
Does someone want to do this?
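As a starting point, here is a minimal base R sketch of a for loop over the iris species. It is only one possible approach, not an established lab pattern, and the names results and petal_means are illustrative.

```r
species <- levels(iris$Species)

# Pre-allocate a named list with one slot per species
results <- vector("list", length(species))
names(results) <- species

for (sp in species) {
  d <- iris[iris$Species == sp, ]
  results[[sp]] <- data.frame(
    Species = sp,
    mean_petal_length = mean(d$Petal.Length)
  )
}

# Combine the per-species rows into one data frame
petal_means <- do.call(rbind, results)
petal_means
```

Pre-allocating the list and filling it by name avoids growing an object inside the loop.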
Nesting
Some useful tutorials
https://r4ds.had.co.nz/many-models.html
https://bookdown.org/Maxine/r4ds/nesting.html
https://tidyr.tidyverse.org/reference/nest.html
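A minimal sketch of the nesting workflow these tutorials describe, applied to iris: nest the rows for each species into a list-column, fit one model per nested data frame, and extract a coefficient. The model formula and the column names model and slope are assumptions for illustration.

```r
library(dplyr)
library(tidyr)
library(purrr)

nested <- iris |>
  # One row per species; all other columns go into the list-column "data"
  nest(data = !Species) |>
  mutate(
    # Fit a separate linear model to each nested data frame
    model = map(data, \(d) lm(Sepal.Length ~ Sepal.Width, data = d)),
    # Pull the Sepal.Width slope out of each model
    slope = map_dbl(model, \(m) coef(m)[["Sepal.Width"]])
  )

nested |> select(Species, slope)
```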