Econometrics and Free Software by Bruno Rodrigues.
RSS feed for blog post updates.
Follow me on Mastodon, twitter, or check out my Github.
Check out my package that adds logging to R functions, {chronicler}.
Or read my free ebook to learn some R, Modern R with the tidyverse,
and if you're interested in setting up reproducible analytical pipelines,
read my other ebook.
You can also watch my youtube channel.
Buy me a coffee, my kids don't let me sleep.

R will always be arcane to those who do not make a serious effort to learn it...

R will always be arcane to those who do not make a serious effort to learn it. It is not meant to be intuitive and easy for casual users to just plunge into. It is far too complex and powerful for that. But the rewards are great for serious data analysts who put in the effort.

— Berton Gunter R-help August 2007

I posted this quote on Twitter the other day and it sparked some discussion. Personally, I agree with it, and I’ll explain why.

Just like any tool aimed at professionals, R requires people to spend time to actually master it. There are no ifs or buts. Just like I don’t want a casual carpenter doing my carpentry, or a casual electrician doing the wiring in my house, I don’t think anyone should want to be a casual R user. Now of course, depending on your needs, you might not need to learn everything the language has to offer. I certainly don’t know everything R has to offer, far from it. But whatever task you need to fulfill, take the time to learn the required syntax and packages. As Berton Gunter said in 2007, the rewards are great if you put in the effort. You need to create top-notch plots? Master {ggplot2}. Need to create top-notch web apps? Master {shiny}, and so on and so forth… you get the idea. But as a shiny expert, you might not need to know, nor care, about R’s object-oriented capabilities, for example.

That’s fine.

Evelyn Hall: I would like to know how (if) I can extract some of the information from the summary of my nlme.

Simon Blomberg: This is R. There is no if. Only how.

— Evelyn Hall and Simon ’Yoda’ Blomberg, R-help April 2005

I remember being extremely frustrated when I started to learn R. This was not because the language was overly complex (even if it felt that way in the beginning, but honestly, that’s true for any language, even for supposedly piss-easy languages like Python), but because my professors kept saying “no need to learn the language in great detail, we’re economists after all, not programmers”. That didn’t seem right, and now that I’ve been working with R for years (and with economists for some time as well), I’m convinced that it is important, even for economists, to be quite fluent in at least one programming language like R. How fluent should you be? Well, enough that you can test new ideas, or explore new data, without much googling or friction. Your creativity and curiosity should not be limited by a lack of knowledge of the tools you need to use.

Some people posit that the {tidyverse} (and RStudio, the GUI) made R more accessible. I’d say yes and no. On one hand, the tidyverse has the following nice things going for it:

  • Consistent API across packages. That definitely makes R easier to learn!
  • Made the %>% operator famous, which improves readability.
  • Top notch documentation, and also many packages come with books that you can read online for free! That certainly makes R easier to learn.

(And RStudio was the first really good GUI for R.)
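To make the readability point about the pipe concrete, here is a minimal sketch comparing nested calls with a piped version. It assumes R 4.1 or later and uses base R's native `|>` pipe rather than `%>%`, so that it runs without loading any package; the computation itself (a rounded mean of square roots) is only an illustration.

```r
# Nested calls have to be read from the inside out
nested <- round(mean(sqrt(mtcars$hp)), 1)

# The same steps with the native pipe read left to right:
# take hp, then sqrt, then mean, then round to one decimal
piped <- mtcars$hp |> sqrt() |> mean() |> round(1)

# Both spellings compute exactly the same value
identical(nested, piped)
```

With {dplyr} or {magrittr} loaded, swapping `|>` for `%>%` in this snippet gives the same result.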

But while this is all true, on the other hand, the {tidyverse} also makes it possible to write code like this (I’ll be using the package::function() notation to make the origin of the functions clear):

library(dplyr)
library(purrr)
library(ggfortify) # Not part of the tidyverse, but needed to make ggplot2::autoplot work on lm
library(ggplot2)
library(broom) # Not part of the tidyverse, but adheres to the *tidy* principles

result <- mtcars %>%
  dplyr::group_nest(am) %>%
  dplyr::mutate(models = purrr::map(data, ~lm(hp ~ mpg + cyl, data = .))) %>%
  dplyr::mutate(diag_plots = purrr::map(models, ggplot2::autoplot)) %>%
  dplyr::mutate(model_summary = purrr::map(models, broom::tidy))

result is now a data frame with several columns:

result
## # A tibble: 2 × 5
##      am                data models diag_plots model_summary   
##   <dbl> <list<tibble[,10]>> <list> <list>     <list>          
## 1     0           [19 × 10] <lm>   <ggmltplt> <tibble [3 × 5]>
## 2     1           [13 × 10] <lm>   <ggmltplt> <tibble [3 × 5]>

am defines the groups, and then data, models, diag_plots and model_summary are list-columns containing complex objects (data frames, models, plots, and data frames of coefficients, respectively). And don’t get me wrong here, this is not code that I made look complicated on purpose. This type of workflow is canon in the tidyverse lore. This is how you can avoid for loops and keep every result together neatly in a single object.
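For readers who find list-columns opaque, it may help to see roughly the same idea spelled out in base R. This is only a sketch of the "one model per group, stored in a list" pattern, not a reproduction of the pipeline above: the names `groups` and `models` are mine, and the result is a plain named list rather than a tibble.

```r
# Split mtcars into one data frame per value of am (a named list: "0" and "1")
groups <- split(mtcars, mtcars$am)

# Fit the same regression on each group; the result is a list of lm objects,
# which is what each cell of a `models` list-column holds
models <- lapply(groups, function(d) lm(hp ~ mpg + cyl, data = d))

# Individual elements are ordinary lm objects and can be inspected as usual
coef(models[["0"]])
sapply(models, function(m) summary(m)$r.squared)
```

The tidyverse version keeps the group key, the data, and every derived object side by side in one data frame, whereas here each of them lives in its own list.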

Let’s look at another esoteric example: imagine I want to publish a paper and am only interested in the coefficients whose p-value is less than .05 (lol):

mtcars %>%
  dplyr::group_nest(am) %>%
  dplyr::mutate(models = purrr::map(data, ~lm(hp ~ mpg + cyl, data = .))) %>%
  dplyr::mutate(model_summary = purrr::map(models, broom::tidy)) %>%
  dplyr::mutate(model_summary = purrr::map(model_summary, \(x)(dplyr::filter(x, p.value < .05))))
## # A tibble: 2 × 4
##      am                data models model_summary   
##   <dbl> <list<tibble[,10]>> <list> <list>          
## 1     0           [19 × 10] <lm>   <tibble [2 × 5]>
## 2     1           [13 × 10] <lm>   <tibble [1 × 5]>

I’ve mapped an anonymous function over the model summaries to keep only the rows with a p-value below .05. Do you think this looks comprehensible to a beginner? I don’t think so. But I also don’t think that beginners must stay beginners, and this is what matters.
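If the `\(x)` syntax is what trips you up, here is a small sketch of what an anonymous function is, in base R only. The two data frames in `summaries` are made-up toy values standing in for model summaries, and the object names are hypothetical; `\(x)` requires R 4.1 or later and is simply shorthand for `function(x)`.

```r
# Toy stand-ins for two model summaries (made-up p-values, for illustration only)
summaries <- list(
  a = data.frame(term = c("x1", "x2"), p.value = c(0.01, 0.20)),
  b = data.frame(term = c("x1", "x2"), p.value = c(0.30, 0.04))
)

# An anonymous function is a function defined on the spot, without a name.
# Long form:
kept_long <- lapply(summaries, function(x) x[x$p.value < .05, ])

# Shorthand form, equivalent to the line above:
kept_short <- lapply(summaries, \(x) x[x$p.value < .05, ])

# Both forms keep the same rows in each data frame
identical(kept_long, kept_short)
```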

Actually, I see it as part of my job to inflict R on people who are perfectly happy to have never heard of it. Happiness doesn’t equal proficient and efficient. In some cases the proficiency of a person serves a greater good than their momentary happiness.

— Patrick Burns, R-help April 2005

I’d argue that R, as arcane as it is (or not), is very likely one of the easiest languages to learn, and this is because there are a lot, and I mean a lot, of resources online:

  • Free books (just take a look at the big book of R to find everything you need)
  • Youtube channels dedicated to R (I’m shamelessly plugging mine)
  • Packages with great documentation (take a look at the easystats suite for an example, or modelsummary and marginaleffects, both by Vincent Arel-Bundock, and I’m not citing many, many others here)
  • Slack channels where you can get help
  • The community of R users on Twitter (check out the #RStats hashtag)
  • The RStudio Community forums
  • And of course, the good old R-help mailing list

And that’s only the free stuff. If you can afford it, there are plenty of courses available as well. But no amount of free or paid content will be enough if you don’t invest enough time to learn the language, and this is true of anything. There are no secret recipes.

P.S.: I got all these quotes from the {fortunes} package.
