Econometrics and Free Software by Bruno Rodrigues.
RSS feed for blog post updates.
Follow me on Mastodon, twitter, or check out my Github.
Check out my package that adds logging to R functions, {chronicler}.
Or read my free ebooks, to learn some R and build reproducible analytical pipelines..
You can also watch my youtube channel or find the slides to the talks I've given here.
Buy me a coffee, my kids don't let me sleep.

Lesser known purrr tricks

R

purrr is a package that extends R’s functional programming capabilities. It brings a lot of new stuff to the table and in this post I show you some of the most useful (at least to me) functions included in purrr.

Getting rid of loops with map()

library(purrr)

numbers <- list(11, 12, 13, 14)

map_dbl(numbers, sqrt)
## [1] 3.316625 3.464102 3.605551 3.741657

You might wonder why this might be preferred to a for loop? It’s a lot less verbose, and you do not need to initialise any kind of structure to hold the result. If you google “create empty list in R” you will see that this is very common. However, with the map() family of functions, there is no need for an initial structure. map_dbl() returns an atomic list of real numbers, but if you use map() you will get a list back. Try them all out!

Map conditionally

map_if()

# Create a helper function that returns TRUE if a number is even
is_even <- function(x){
  !as.logical(x %% 2)
}

map_if(numbers, is_even, sqrt)
## [[1]]
## [1] 11
## 
## [[2]]
## [1] 3.464102
## 
## [[3]]
## [1] 13
## 
## [[4]]
## [1] 3.741657

map_at()

map_at(numbers, c(1,3), sqrt)
## [[1]]
## [1] 3.316625
## 
## [[2]]
## [1] 12
## 
## [[3]]
## [1] 3.605551
## 
## [[4]]
## [1] 14

map_if() and map_at() have a further argument than map(); in the case of map_if(), a predicate function ( a function that returns TRUE or FALSE) and a vector of positions for map_at(). This allows you to map your function only when certain conditions are met, which is also something that a lot of people google for.

Map a function with multiple arguments

numbers2 <- list(1, 2, 3, 4)

map2(numbers, numbers2, `+`)
## [[1]]
## [1] 12
## 
## [[2]]
## [1] 14
## 
## [[3]]
## [1] 16
## 
## [[4]]
## [1] 18

You can map two lists to a function which takes two arguments using map_2(). You can even map an arbitrary number of lists to any function using pmap().

By the way, try this in: `+`(1,3) and see what happens.

Don’t stop execution of your function if something goes wrong

possible_sqrt <- possibly(sqrt, otherwise = NA_real_)

numbers_with_error <- list(1, 2, 3, "spam", 4)

map(numbers_with_error, possible_sqrt)
## [[1]]
## [1] 1
## 
## [[2]]
## [1] 1.414214
## 
## [[3]]
## [1] 1.732051
## 
## [[4]]
## [1] NA
## 
## [[5]]
## [1] 2

Another very common issue is to keep running your loop even when something goes wrong. In most cases the loop simply stops at the error, but you would like it to continue and see where it failed. Try to google “skip error in a loop” or some variation of it and you’ll see that a lot of people really just want that. This is possible by combining map() and possibly(). Most solutions involve the use of tryCatch() which I personally do not find very easy to use.

Don’t stop execution of your function if something goes wrong and capture the error

safe_sqrt <- safely(sqrt, otherwise = NA_real_)

map(numbers_with_error, safe_sqrt)
## [[1]]
## [[1]]$result
## [1] 1
## 
## [[1]]$error
## NULL
## 
## 
## [[2]]
## [[2]]$result
## [1] 1.414214
## 
## [[2]]$error
## NULL
## 
## 
## [[3]]
## [[3]]$result
## [1] 1.732051
## 
## [[3]]$error
## NULL
## 
## 
## [[4]]
## [[4]]$result
## [1] NA
## 
## [[4]]$error
## <simpleError in sqrt(x = x): non-numeric argument to mathematical function>
## 
## 
## [[5]]
## [[5]]$result
## [1] 2
## 
## [[5]]$error
## NULL

safely() is very similar to possibly() but it returns a list of lists. An element is thus a list of the result and the accompagnying error message. If there is no error, the error component is NULL if there is an error, it returns the error message.

Transpose a list

safe_result_list <- map(numbers_with_error, safe_sqrt)

transpose(safe_result_list)
## $result
## $result[[1]]
## [1] 1
## 
## $result[[2]]
## [1] 1.414214
## 
## $result[[3]]
## [1] 1.732051
## 
## $result[[4]]
## [1] NA
## 
## $result[[5]]
## [1] 2
## 
## 
## $error
## $error[[1]]
## NULL
## 
## $error[[2]]
## NULL
## 
## $error[[3]]
## NULL
## 
## $error[[4]]
## <simpleError in sqrt(x = x): non-numeric argument to mathematical function>
## 
## $error[[5]]
## NULL

Here we transposed the above list. This means that we still have a list of lists, but where the first list holds all the results (which you can then access with safe_result_list$result) and the second list holds all the errors (which you can access with safe_result_list$error). This can be quite useful!

Apply a function to a lower depth of a list

transposed_list <- transpose(safe_result_list)

transposed_list %>%
    at_depth(2, is_null)
## Warning: at_depth() is deprecated, please use `modify_depth()` instead
## $result
## $result[[1]]
## [1] FALSE
## 
## $result[[2]]
## [1] FALSE
## 
## $result[[3]]
## [1] FALSE
## 
## $result[[4]]
## [1] FALSE
## 
## $result[[5]]
## [1] FALSE
## 
## 
## $error
## $error[[1]]
## [1] TRUE
## 
## $error[[2]]
## [1] TRUE
## 
## $error[[3]]
## [1] TRUE
## 
## $error[[4]]
## [1] FALSE
## 
## $error[[5]]
## [1] TRUE

Sometimes working with lists of lists can be tricky, especially when we want to apply a function to the sub-lists. This is easily done with at_depth()!

Set names of list elements

name_element <- c("sqrt()", "ok?")

set_names(transposed_list, name_element)
## $`sqrt()`
## $`sqrt()`[[1]]
## [1] 1
## 
## $`sqrt()`[[2]]
## [1] 1.414214
## 
## $`sqrt()`[[3]]
## [1] 1.732051
## 
## $`sqrt()`[[4]]
## [1] NA
## 
## $`sqrt()`[[5]]
## [1] 2
## 
## 
## $`ok?`
## $`ok?`[[1]]
## NULL
## 
## $`ok?`[[2]]
## NULL
## 
## $`ok?`[[3]]
## NULL
## 
## $`ok?`[[4]]
## <simpleError in sqrt(x = x): non-numeric argument to mathematical function>
## 
## $`ok?`[[5]]
## NULL

Reduce a list to a single value

reduce(numbers, `*`)
## [1] 24024

reduce() applies the function * iteratively to the list of numbers. There’s also accumulate():

accumulate(numbers, `*`)
## [1]    11   132  1716 24024

which keeps the intermediary results.

This function is very general, and you can reduce anything:

Matrices:

mat1 <- matrix(rnorm(10), nrow = 2)
mat2 <- matrix(rnorm(10), nrow = 2)
mat3 <- matrix(rnorm(10), nrow = 2)
list_mat <- list(mat1, mat2, mat3)

reduce(list_mat, `+`)
##             [,1]       [,2]       [,3]     [,4]      [,5]
## [1,] -2.48530177  1.0110049  0.4450388 1.280802 1.3413979
## [2,]  0.07596679 -0.6872268 -0.6579242 1.615237 0.8231933

even data frames:

df1 <- as.data.frame(mat1)
df2 <- as.data.frame(mat2)
df3 <- as.data.frame(mat3)

list_df <- list(df1, df2, df3)

reduce(list_df, dplyr::full_join)
## Joining, by = c("V1", "V2", "V3", "V4", "V5")
## Joining, by = c("V1", "V2", "V3", "V4", "V5")
##           V1         V2          V3          V4         V5
## 1 -0.6264538 -0.8356286  0.32950777  0.48742905  0.5757814
## 2  0.1836433  1.5952808 -0.82046838  0.73832471 -0.3053884
## 3 -0.8969145  1.5878453 -0.08025176  0.70795473  1.9844739
## 4  0.1848492 -1.1303757  0.13242028 -0.23969802 -0.1387870
## 5 -0.9619334  0.2587882  0.19578283  0.08541773 -1.2188574
## 6 -0.2925257 -1.1521319  0.03012394  1.11661021  1.2673687

Hope you enjoyed this list of useful functions! If you enjoy the content of my blog, you can follow me on twitter.