Careful with tryCatch

`tryCatch` is one of the functions that allow users to handle errors in a simple way. With it, you can do things like: `if(error), then(do this)`.

Take the following example:

```
sqrt("a")
Error in sqrt("a") : non-numeric argument to mathematical function
```

Now maybe you’d want something to happen when such an error occurs. You can achieve that with `tryCatch`:

`tryCatch(sqrt("a"), error=function(e) print("You can't take the square root of a character, silly!"))`

`## [1] "You can't take the square root of a character, silly!"`
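As a minimal sketch of the same idea, the handler can also return a value instead of printing, and a separate handler can deal with warnings. The helper name `safe_sqrt()` is hypothetical and only serves as an illustration:

```r
# Hypothetical helper: returns the square root of x,
# or NA_real_ if sqrt() signals an error or a warning.
safe_sqrt <- function(x) {
  tryCatch(
    sqrt(x),
    error = function(e) {
      # conditionMessage() extracts the text of the condition
      message("Caught an error: ", conditionMessage(e))
      NA_real_
    },
    warning = function(w) {
      message("Caught a warning: ", conditionMessage(w))
      NA_real_
    }
  )
}

safe_sqrt(4)    # 2
safe_sqrt("a")  # NA, after the error message is shown
safe_sqrt(-1)   # NA, because sqrt(-1) signals a warning
```

Returning a typed missing value such as `NA_real_` keeps the result numeric, which is convenient when the output ends up in a data frame column.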

Why am I interested in `tryCatch`?

I am currently working with dates, specifically birthdays of people in my data sets. For a given mother, the birthday of her child is given in three distinct columns: one each for the child’s birth year, birth month and birth day. I wanted to put everything in a single column and convert the birthday to Unix time (I have a very good reason to do that, but I won’t bore you with the details).

Let’s create some data:

`mother <- as.data.frame(list(month=12, day=1, year=1988))`

In my data, there are many more columns of course, such as the mother’s wage, education level, etc., but for illustration purposes, this is all that’s needed.

Now, to create this birthday column:

```
mother$birth1 <- as.POSIXct(paste0(as.character(mother$year),
"-", as.character(mother$month),
"-", as.character(mother$day)),
origin="1970-01-01")
```

and to convert it to unix time:

```
mother$birth1 <- as.numeric(as.POSIXct(paste0(as.character(mother$year),
"-", as.character(mother$month),
"-", as.character(mother$day)),
origin="1970-01-01"))
print(mother)
```

```
## month day year birth1
## 1 12 1 1988 596934000
```
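The number above depends on the time zone of the machine running the code, because no `tz` is given; 596934000 corresponds to midnight in a UTC+1 zone. The sketch below is one possible variant, not the code used in my project: it fixes the time zone to UTC and spells out the format, so the result should be the same everywhere. The column name `birth_utc` is made up for this example.

```r
mother <- data.frame(month = 12, day = 1, year = 1988)

# Build "1988-12-1" and parse it with an explicit format and time zone
date_string <- paste(mother$year, mother$month, mother$day, sep = "-")
mother$birth_utc <- as.numeric(
  as.POSIXct(date_string, format = "%Y-%m-%d", tz = "UTC")
)

print(mother$birth_utc)
# 596937600, i.e. 1988-12-01 00:00:00 UTC
```

Note that `origin` is left out here: as far as I know, it is only used when converting numbers, not character strings.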

Now let’s see what happens in this other example here:

```
mother2 <- as.data.frame(list(month=2, day=30, year=1988))
mother2$birth1 <- as.POSIXct(paste0(as.character(mother2$year),
"-", as.character(mother2$month),
"-", as.character(mother2$day)),
origin="1970-01-01")
```

This is what happens:

```
Error in as.POSIXlt.character(x, tz, ...) :
character string is not in a standard unambiguous format
```

This error is to be expected; there is no 30th of February! It turns out that in some rare cases, weird dates like this exist in my data. Probably some encoding errors. Not a problem, I thought: I could use `tryCatch` and return `NA` in the case of an error.

```
mother2 <- as.data.frame(list(month=2, day=30, year=1988))
mother2$birth1 <- tryCatch(as.POSIXct(paste0(as.character(mother2$year),
"-", as.character(mother2$month),
"-", as.character(mother2$day)),
origin="1970-01-01"), error=function(e) NA)
print(mother2)
```

```
## month day year birth1
## 1 2 30 1988 NA
```

Pretty great, right? Well, no. Take a look at what happens in this case:

```
mother <- as.data.frame(list(month=c(12, 2), day=c(1, 30), year=c(1988, 1987)))
print(mother)
```

```
## month day year
## 1 12 1 1988
## 2 2 30 1987
```

We’d expect to have a correct date for the first mother and an `NA` for the second. However, this is what happens:

```
mother$birth1 <- tryCatch(as.POSIXct(paste0(as.character(mother$year),
"-", as.character(mother$month),
"-", as.character(mother$day)),
origin="1970-01-01"), error=function(e) NA)
print(mother)
```

```
## month day year birth1
## 1 12 1 1988 NA
## 2 2 30 1987 NA
```

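As you can see, we now have an `NA` for both mothers! That’s actually to be expected: `tryCatch` wraps the whole vectorised call, so a single bad element makes the entire call fail, and the single `NA` returned by the handler is recycled over the whole column. This little example illustrates it well: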

`sqrt(c(4, 9, "haha"))`

```
Error in sqrt(c(4, 9, "haha")) :
non-numeric argument to mathematical function
```

But you’d like to have this:

`[1] 2 3 NA`

So you could make the same mistake I did and use `tryCatch`:

`tryCatch(sqrt(c(4, 9, "haha")), error=function(e) NA)`

`## [1] NA`

But you only get `NA` in return. That’s actually completely normal, but it caught me off guard and I spent quite some time figuring out what was happening. Especially because I had written unit tests for my function `create_birthdays()`, which was doing the above computations, and all tests were passing! The problem was that in my tests, I only had a single individual, so for a wrong date, having `NA` for this individual was expected behaviour. But in a panel, only some individuals have a weird date like the 30th of February, and because of those, the whole column was filled with `NA`s! What I’m doing now is trying to either remove these weird birthdays (there are mothers whose children were born on 99-99-9999; documentation is lacking, but this probably means `missing value`), or trying to figure out how to get `NA`s only for the “weird” dates. I guess that the answer lies with `dplyr`’s `group_by()` and `mutate()`, to compute these birthdays for each individual separately.
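Until I settle on a `dplyr` solution, here is a rough base R sketch of two ways to get `NA` only for the offending rows. It is an illustration under my own assumptions, not the final `create_birthdays()`: the helper `to_unix_time()` and the columns `birth_fmt` and `birth_try` are hypothetical names, and the time zone is fixed to UTC to keep the numbers reproducible.

```r
# Element-wise tryCatch on a list: each element is handled on its own
inputs <- list(4, 9, "haha")
roots <- vapply(
  inputs,
  function(x) tryCatch(sqrt(x), error = function(e) NA_real_),
  numeric(1)
)
print(roots)  # 2 3 NA

mother <- data.frame(month = c(12, 2), day = c(1, 30), year = c(1988, 1987))
date_strings <- paste(mother$year, mother$month, mother$day, sep = "-")

# Option 1: with an explicit format, impossible dates are parsed
# to NA instead of raising an error, so no tryCatch is needed.
mother$birth_fmt <- as.numeric(
  as.POSIXct(date_strings, format = "%Y-%m-%d", tz = "UTC")
)

# Option 2: keep tryCatch, but call it once per date string.
to_unix_time <- function(date_string) {
  tryCatch(
    as.numeric(as.POSIXct(date_string, tz = "UTC")),
    error = function(e) NA_real_
  )
}
mother$birth_try <- vapply(
  date_strings, to_unix_time, numeric(1), USE.NAMES = FALSE
)

print(mother)
```

Both columns should hold 596937600 for the first mother and `NA` for the second. The same two-row data frame also makes a better unit test than a single individual, since it checks that a bad date does not contaminate a good one.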