Mapping a list of functions to a list of datasets with a list of columns as arguments

This week I had the opportunity to teach R at my workplace, again. This course was the “advanced
R” course, and unlike the one I taught at the end of last year, I had one more day (so 3 days in total)
where I could show my colleagues the joys of the `tidyverse` and R.

To finish the section on programming with R, which was the very last section of the whole 3-day course,
I wanted to blow their minds. I had already shown them packages from the `tidyverse` in the previous
days, such as `dplyr`, `purrr` and `stringr`, among others. I taught them how to use `ggplot2`, `broom`
and `modelr`. They also liked `janitor` and `rio` very much. I noticed that it took them a bit more
time and effort to digest `purrr::map()` and `purrr::reduce()`, but they all seemed to see
how powerful these functions were. To finish on a very high note, I showed them the ultimate
`purrr::map()` use case.

Consider the following: imagine you are working on a list of datasets.
These datasets might be the same, but for different years, or for different countries, or they might
be completely different datasets entirely. If you used `rio::import_list()` to read them into R,
you will have them in a nice list. Let’s consider the following list as an example:

```
library(tidyverse)

data(mtcars)
data(iris)
data_list = list(mtcars, iris)
```
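
As a quick sanity check on such a list, here is a minimal sketch that names the list elements and counts rows and columns per dataset. The object `named_data_list` and its names are my own choice for illustration; they are not used elsewhere in this post.

```r
library(purrr)

# Hypothetical named version of the list; the names are illustrative only
named_data_list = list(mtcars = mtcars, iris = iris)

# map_int() applies a function to every dataset and returns an integer vector;
# the names of the list are carried over to the result
n_rows = map_int(named_data_list, nrow)
n_cols = map_int(named_data_list, ncol)

n_rows
n_cols
```

Naming the elements is optional, but it makes the output of later `map()` calls easier to read than `[[1]]` and `[[2]]`.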

I made the choice to have completely different datasets. Now, I would like to map some functions
to the columns of these datasets. If I only worked on one, for example on `mtcars`, I would do
something like:

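```
# Summarise the columns in `cols` of `dataset` with every function in `funcs`
my_summarise_f = function(dataset, cols, funcs){
  dataset %>%
    # !!! splices the quoted columns and functions into the call
    summarise_at(vars(!!!cols), funs(!!!funcs))
}
```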

And then I would use my function like so:

```
mtcars %>%
  my_summarise_f(quos(mpg, drat, hp), quos(mean, sd, max))
```

```
## mpg_mean drat_mean hp_mean mpg_sd drat_sd hp_sd mpg_max drat_max
## 1 20.09062 3.596563 146.6875 6.026948 0.5346787 68.56287 33.9 4.93
## hp_max
## 1 335
```
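
A side note for readers on a recent `dplyr`: `funs()` has been deprecated since dplyr 0.8.0, and `summarise_at()` was superseded by `across()` in dplyr 1.0.0. The sketch below is one possible equivalent, assuming dplyr 1.0.0 or later. The name `my_summarise_across` is hypothetical, and unlike the helper above it takes a character vector of column names and a named list of functions instead of quosures.

```r
library(dplyr)

# Assumed modern variant of the helper (needs dplyr >= 1.0.0).
# `cols`: character vector of column names
# `funcs`: named list of functions; the names become column suffixes
my_summarise_across = function(dataset, cols, funcs){
  dataset %>%
    summarise(across(all_of(cols), funcs))
}

result_across = my_summarise_across(
  mtcars,
  cols = c("mpg", "drat", "hp"),
  funcs = list(mean = mean, sd = sd, max = max)
)

result_across
```

The values should match the output above, but expect the columns in a different order, since `across()` groups the results by column rather than by function.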

`my_summarise_f()` takes a dataset, a list of columns and a list of functions as arguments and uses
tidy evaluation to apply `mean()`, `sd()`, and `max()` to the columns `mpg`, `drat` and `hp`
of `mtcars`. That’s pretty useful, but not useful enough! Now I want to apply this to the list of
datasets I defined above. For this, let’s define the list of columns I want to work on:

```
cols_mtcars = quos(mpg, drat, hp)
cols_iris = quos(Sepal.Length, Sepal.Width)
cols_list = list(cols_mtcars, cols_iris)
```

Now, let’s use some `purrr` magic to apply the functions I want to the columns I have defined in
`cols_list`:

```
map2(data_list,
     cols_list,
     my_summarise_f, funcs = quos(mean, sd, max))
```

```
## [[1]]
## mpg_mean drat_mean hp_mean mpg_sd drat_sd hp_sd mpg_max drat_max
## 1 20.09062 3.596563 146.6875 6.026948 0.5346787 68.56287 33.9 4.93
## hp_max
## 1 335
##
## [[2]]
## Sepal.Length_mean Sepal.Width_mean Sepal.Length_sd Sepal.Width_sd
## 1 5.843333 3.057333 0.8280661 0.4358663
## Sepal.Length_max Sepal.Width_max
## 1 7.9 4.4
```
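
If the mechanics of `map2()` are not obvious, this toy sketch may help. The function `add_three` is made up for illustration: the first two arguments vary across calls, while the extra named argument is reused in every call, just like `funcs` above.

```r
library(purrr)

# Toy function with two varying arguments and one fixed argument
add_three = function(x, y, z) x + y + z

# x comes from the first list, y from the second; z = 100 is passed to every call
map2_result = map2(list(1, 2), list(10, 20), add_three, z = 100)

map2_result
# a list holding 111 (1 + 10 + 100) and 122 (2 + 20 + 100)
```

In other words, the call on the datasets amounts to running `my_summarise_f()` once per pair of dataset and column list, with the same `funcs` each time.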

That’s pretty useful, but not useful enough! I also want to apply different functions to different datasets!

Well, let’s define a list of functions then:

```
funcs_mtcars = quos(mean, sd, max)
funcs_iris = quos(median, min)
funcs_list = list(funcs_mtcars, funcs_iris)
```

Because there is no `map3()`, we need to use `pmap()`:

```
pmap(
  list(
    dataset = data_list,
    cols = cols_list,
    funcs = funcs_list
  ),
  my_summarise_f)
```

```
## [[1]]
## mpg_mean drat_mean hp_mean mpg_sd drat_sd hp_sd mpg_max drat_max
## 1 20.09062 3.596563 146.6875 6.026948 0.5346787 68.56287 33.9 4.93
## hp_max
## 1 335
##
## [[2]]
## Sepal.Length_median Sepal.Width_median Sepal.Length_min Sepal.Width_min
## 1 5.8 3 4.3 2
```
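
One detail worth spelling out: when the list given to `pmap()` is named, the names are matched to the arguments of the function, so the order of the elements does not matter. Here is a small sketch with a made-up function `combine`:

```r
library(purrr)

# Toy function; the argument names are what pmap() matches against
combine = function(a, b, c) a - b + c

# The list elements are deliberately out of order
pmap_result = pmap(
  list(c = c(1, 2), a = c(10, 20), b = c(3, 4)),
  combine
)

pmap_result
# a list holding 8 (10 - 3 + 1) and 18 (20 - 4 + 2)
```

This is why the elements in the call above are named `dataset`, `cols` and `funcs`: they line up with the arguments of `my_summarise_f()`.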

Now I’m satisfied! Let me tell you, this blew their minds 😄!

To be able to use things like that, I told them to always solve a problem for a single example, and
from there, try to generalize their solution using functional programming tools found in `purrr`.
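
That workflow could be sketched in three steps as follows. The task (means of all numeric columns) and the name `numeric_means` are only an illustration, and the code assumes dplyr 1.0.0 or later for `across()`.

```r
library(dplyr)
library(purrr)

# Step 1: solve the problem for a single dataset
mtcars %>%
  summarise(across(where(is.numeric), mean))

# Step 2: wrap the solution in a function (hypothetical name)
numeric_means = function(dataset){
  dataset %>%
    summarise(across(where(is.numeric), mean))
}

# Step 3: generalise by mapping the function over a list of datasets
means_list = map(list(mtcars, iris), numeric_means)

means_list
```

Since step 2 only wraps code that already works on one dataset, any error in step 3 most likely comes from a dataset that differs from the one used in step 1.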
