Econometrics and Free Software by Bruno Rodrigues.

R makes it too easy to write papers

I’m currently working on a preprint on the spread of COVID-19 in Luxembourg. My hypothesis is that landlocked countries, especially ones like Luxembourg that have very close ties to their neighbours, have a very hard time controlling the pandemic. Island countries, by contrast, can completely close off their borders, impose very drastic quarantine measures on anyone who still has to come in, and successfully wipe out the disease by imposing strict lockdowns and contact tracing measures.

In actuality, this started more as a project in which I simply wanted to look at COVID-19 cases for Luxembourg and its neighbouring regions. As I started digging and writing code, this evolved into this package, which makes it easy to download open data on the daily COVID-19 cases from Luxembourg and its neighbours. I also blogged about it here. While creating and animating the map that you see in that blog post, I came up with the hypothesis I now want to test. Maybe it won’t work (preliminary results are encouraging, however), but I also took this opportunity to write a preprint using only R, Rmarkdown and packages that make writing something like that easy. This blog post is a shallow review of these tools.

By the way, you can take a look at the repo with the preprint here, and I’ll be writing about it soon as well.

Packages as by-products of papers

The first thing I did was download data from the various open data portals, make sense of it and then plot it. At first, I did so in a very big and messy script file. As time went on, I felt more and more disgusted with this script and wanted to make something cleaner out of it. This is how the package I already mentioned above came to be. It took some time to prepare, but it now makes updating my plots and machine learning models much faster. It also makes the paper more “interesting”: not everyone is interested in the paper itself, but some might be interested in the data, or in the process of making the package. I think that there are many examples of such packages as by-products of papers; papers that present and discuss new methods in particular are very often accompanied by a package that makes it easy for readers to use the new method. Package development is made easy with {usethis}.
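As a rough sketch of what starting such a package with {usethis} can look like (the package name mypaperdata, the file name get_data and the licence holder are hypothetical placeholders, and this is not necessarily how the package mentioned above was set up):

```r
library(usethis)

# Hypothetical package, created in a temporary directory for illustration
pkg_path <- file.path(tempdir(), "mypaperdata")

# Create the package skeleton (DESCRIPTION, NAMESPACE, R/) without opening it
create_package(pkg_path, open = FALSE)

# Make the new package the active {usethis} project for the calls below
proj_set(pkg_path)

# Add R/get_data.R, where a (hypothetical) download function would live
use_r("get_data")

# Add an MIT licence; the copyright holder is a placeholder
use_mit_license("Jane Doe")
```

From there, the functions from the messy script can be moved one by one into files under R/.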

Starting a draft with {rticles}

The second thing I did was start a draft with {rticles}. This package allows users to start an Rmarkdown draft with a single command. Users can choose among many different templates for many different journals; I chose the arXiv template, as I might publish the preprint there. To do so, I used the following command:

rmarkdown::draft("paper.Rmd", template = "arxiv", package = "rticles")
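If you are not sure which template names are available, a small sketch along these lines may help. It assumes {rticles} and {rmarkdown} are installed and writes the draft to a temporary directory (a location chosen only for this illustration):

```r
library(rticles)

# journals() returns the names of the templates shipped with the
# installed version of {rticles}
templates <- journals()
print(head(templates))

# Create the arXiv draft without opening it in an editor;
# draft() returns the path of the file it created
draft_file <- rmarkdown::draft(
  file.path(tempdir(), "paper.Rmd"),
  template = "arxiv",
  package = "rticles",
  edit = FALSE
)
print(draft_file)

# Compiling to pdf needs pandoc and a LaTeX installation:
# rmarkdown::render(draft_file)
```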

I can now edit this Rmd file and compile it to a nice-looking pdf very easily. But I don’t do so in the “traditional” way of knitting the Rmd file from RStudio (or rather, from Spacemacs, my editor of choice). No, no, for this I use the magnificent {targets} package.

Setting up a clean, automated and reproducible workflow with {targets}

{targets} is the latest package by William Landau, who is also the author of {drake}. I was very impressed by {drake} and even made a video about it, but now {targets} will replace {drake} as THE build automation tool for the R programming language. I started using it for this project, and just like {drake} it’s really an amazing package. It allows you to declare your project as a series of steps, each one of them being a call to a function. It’s very neat and clean. {targets} tracks the dependencies between the steps and the objects created at each step; should one of them get updated (for instance, because you changed the code of the underlying function), every object that depends on it will also get updated once you run the pipeline again.
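To make this concrete, here is a minimal sketch of such a pipeline. The functions get_cases() and sum_cases(), the target names and the numbers are all made up for illustration and have nothing to do with the actual preprint; the sketch runs in a temporary directory so that it does not touch a real project:

```r
library(targets)

# Work in a throwaway directory
project_dir <- file.path(tempdir(), "targets_sketch")
dir.create(project_dir, showWarnings = FALSE)
old_wd <- setwd(project_dir)

# tar_script() writes the code below to a _targets.R file
tar_script({
  # Hypothetical steps: each target is a call to a function
  get_cases <- function() data.frame(day = 1:5, cases = c(3, 5, 8, 13, 21))
  sum_cases <- function(data) sum(data$cases)

  list(
    tar_target(cases, get_cases()),
    tar_target(total_cases, sum_cases(cases)) # depends on `cases`
  )
}, ask = FALSE)

# Build every target that is missing or out of date
tar_make()

# Read the value of a built target back from the store
total_cases <- tar_read(total_cases)
print(total_cases)

setwd(old_wd)
```

Running tar_make() a second time skips both targets because nothing changed. tar_visnetwork() draws the dependency graph, provided the visNetwork package is installed.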

This can get complex very quickly, and here is the network of objects, functions and their dependencies for the preprint I’m writing:

Imagine keeping track of all this in your head. Now I won’t go much into how to use {targets}, because the user manual is very detailed. Also, you can inspect the repository of my preprint I linked above to figure out the basics of {targets}. What’s really neat though, is that the Rmd file of your paper is also a target that gets built automatically. If you check out my repository, you will see that it’s the last target that is built. And if you check the Rmd file itself, you will see the only R code I use is:

tar_load(something)

tar_load() is a {targets} function that loads an object built by the pipeline into the R session; in the example above, this object is called something. Once loaded, the object can be used in the paper. For instance, if something is a ggplot object, printing it in a chunk makes the plot appear at that spot in the paper. It’s really great, because the paper itself gets compiled very quickly once all the targets are built.
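For illustration, here is a small sketch of tar_load() next to its sibling tar_read(), using a one-target pipeline in a temporary directory. The target something and its value are placeholders:

```r
library(targets)

# Throwaway project with a single placeholder target
project_dir <- file.path(tempdir(), "tar_load_sketch")
dir.create(project_dir, showWarnings = FALSE)
old_wd <- setwd(project_dir)

tar_script({
  list(tar_target(something, c(a = 1, b = 2)))
}, ask = FALSE)
tar_make()

# tar_load() creates an object called `something` in the current environment
tar_load(something)

# tar_read() returns the value instead, so it can be assigned or printed directly
from_read <- tar_read(something)

print(identical(something, from_read))

setwd(old_wd)
```

One way to make the Rmd file itself a target is tar_render() from the companion package {tarchetypes}; whether a given project declares its report exactly this way is something to check in its _targets.R file.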

Machine learning, and everything else

Last year I wrote a blog post about {tidymodels}, which you can find here. Since then, the package has evolved, and in my opinion it is definitely one of the best machine learning packages out there. Just like the other tools I discussed in this blog post, it abstracts away many unimportant idiosyncrasies of other packages and ways of doing things, and lets you focus on what matters: getting results and presenting them neatly.
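As a small illustration of that common interface, here is a sketch using {parsnip}, the modelling package of {tidymodels}. It fits a linear regression on the built-in mtcars data, which is only a stand-in and not one of the models from the preprint:

```r
library(parsnip)

# Describe the model separately from the engine that fits it
spec <- set_engine(linear_reg(), "lm")

# Fit with a formula; mtcars is a stand-in dataset
fitted_model <- fit(spec, mpg ~ wt, data = mtcars)

# predict() returns a tibble with a .pred column
preds <- predict(fitted_model, new_data = mtcars[1:3, ])
print(preds)
```

Switching to another engine means changing the set_engine() call (and installing the corresponding package), while fit() and predict() are called in the same way.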

I think that this is what I really like about the R programming language, and the ecosystem of packages built on top of it. Combining functional programming, build automation tools, markdown, and all the helper packages like {usethis} makes it really easy to go very quickly from an idea to a paper, or to an interactive app built with {shiny}.

Hope you enjoyed! If you found this blog post useful, you might want to follow me on twitter for blog post updates, buy me an espresso (or use paypal.me), or buy my ebook on Leanpub. You can also watch my videos on youtube. So much content for you to consoom!