Econometrics and Free Software by Bruno Rodrigues.

Some learnings from functional programming you can use to write safer programs

R

Learning number 1: make functions fail early

When writing your own functions, avoid conversion of types without warning. For example, this function only works on characters:

my_nchar <- function(x, result = 0){

  if(x == ""){
    result
  } else {
    result <- result + 1
    split_x <- strsplit(x, split = "")[[1]]
    my_nchar(paste0(split_x[-1],
                    collapse = ""), result)
  }

}
my_nchar("100000000")
## [1] 9
my_nchar(100000000)
Error in strsplit(x, split = "") : non-character argument

It may be tempting to write functions that accept a lot of different types of inputs, because it seems convenient and you’re a lazy ding-dong:

my_nchar2 <- function(x, result = 0){

  # What could go wrong?
  x <- as.character(x)

  if(x == ""){
    result
  } else {
    result <- result + 1
    split_x <- strsplit(x, split = "")[[1]]
    my_nchar2(paste0(split_x[-1],
                    collapse = ""), result)
  }

}

You should avoid doing this, because it can have unforeseen consequences:

my_nchar2(10000000)
## [1] 5

If you think that this example is far-fetched, you’d be surprised to learn that this is exactly what nchar(), the built-in function to count characters, does. Compare this:

nchar("10000000")
## [1] 8

to this:

nchar(10000000)
## [1] 5

(thanks to @cararthompson for pointing this out on twitter)
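The 5 comes from the number being converted to a string first, and that string uses scientific notation. Here is a minimal sketch of what happens, plus one possible way to get the digits counted instead (reaching for format() is my suggestion here, not something nchar() does for you):

```r
# The number becomes a string in scientific notation...
as.character(10000000)
## [1] "1e+07"

# ...so that string is what gets counted. Converting explicitly,
# without scientific notation, counts the digits instead.
nchar(format(10000000, scientific = FALSE))
## [1] 8
```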

You can also add guards to be extra safe:

my_nchar2 <- function(x, result = 0){

  if(!isTRUE(is.character(x))){
    stop(paste0("x should be of type 'character', but is of type '",
                typeof(x), "' instead."))
  } else if(x == ""){
    result
  } else {
    result <- result + 1
    split_x <- strsplit(x, split = "")[[1]]
    my_nchar2(paste0(split_x[-1],
                     collapse = ""), result)
  }
}
my_nchar2("10000000")
## [1] 8

compare to this:

my_nchar2(10000000)
Error in my_nchar2(1e+07) :
x should be of type 'character', but is of type 'double' instead.

Now this doesn’t really help here, because our function is already safe (it only handles characters, since strsplit() only handles characters), but in other situations this could be helpful (and at least we customized the error message). Since it can be quite tedious to write all these if...else... statements, you might want to take a look at purrr::safely() (and purrr::possibly()), the {maybe} package, or the {typed} package, or even my package for that matter.
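If you’d rather stay in base R, stopifnot() is another way to write such guards without the if...else... ceremony. This is only a sketch: count_chars() is a hypothetical, non-recursive stand-in for my_nchar2(), and the named conditions assume R >= 4.0.0.

```r
# count_chars() is a hypothetical, non-recursive variant of my_nchar2()
count_chars <- function(x){

  # The name of each condition is used as the error message when it fails
  stopifnot(
    "x should be of type 'character'" = is.character(x),
    "x should be a single string" = length(x) == 1
  )

  length(strsplit(x, split = "")[[1]])
}

count_chars("10000000")
## [1] 8
```

Calling count_chars(10000000) now stops immediately with the first message, before any conversion can happen.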

Learning number 2: Make your functions referentially transparent (and as pure as possible)

Any variable used by a function should be one of its parameters. Don’t do this:

f <- function(x){
  x + y
}

This function has only one parameter, x, and so depends on a y that is defined outside of its scope. This function is unpredictable, because the result it provides depends on whatever the value of y happens to be when it is called.

See what happens:

f(10)
## [1] 20
f(10)
## [1] 10

I called f twice with 10 and got two different results (because I changed the value of y without showing you). In very long scripts, having functions like this depending on values in the global environment is a recipe for disaster. It’s better to make this function referentially transparent; some very complicated words to describe a very simple concept:

f <- function(x, y){
  x + y
}

Just give f a second parameter, and you’re good to go.
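A quick check of what that buys you. The y defined in the global environment below is only there for illustration; the point is that f() no longer looks at it:

```r
f <- function(x, y){
  x + y
}

# A global y still exists, but f() only uses what it is given
y <- 1000

f(10, y = 10)
## [1] 20
f(10, y = 10)
## [1] 20
```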

Something else your functions shouldn’t do is change stuff outside of their scope:

f <- function(x, y){
  result <<- x + y
}

Let’s take a look at the variables in the global environment before calling f:

ls()
## [1] "f"         "my_nchar"  "my_nchar2" "view"      "view_xl"   "y"

Now let’s call it:

f(1, 2)

And let’s have a good look at the global environment again:

ls()
## [1] "f"         "my_nchar"  "my_nchar2" "result"    "view"      "view_xl"  
## [7] "y"

We now see that result has been defined in the global environment:

result
## [1] 3

Just like before, if your functions change stuff outside their scope, this is a recipe for disaster. You have to be very careful and know exactly what you’re doing if you want to use <<-.

So it’s better to write your function like this, and call it like this:

f <- function(x, y){
  x + y
}

result <- f(1, 2)

Learning number 3: make your functions do one thing

Try to write small functions that do just one thing. This makes them easier to document, test and simply wrap your head around. You can then pipe your functions one after another to get stuff done:

a |>
  f() |>
  g() |>
  h()
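To make this less abstract, here is a small sketch with three made-up single-purpose functions (drop_missing(), center() and sum_of_squares() are hypothetical names, not from any package):

```r
# Each function does exactly one thing
drop_missing   <- function(x) x[!is.na(x)]
center         <- function(x) x - mean(x)
sum_of_squares <- function(x) sum(x^2)

c(1, 2, NA, 3) |>
  drop_missing() |>
  center() |>
  sum_of_squares()
## [1] 2
```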

Of course, you have to make sure that the output of f() is of the correct type, so that g() knows how to handle it. In some cases, you really need a function to do several things to get the output you want. In that case, still write small functions to handle each part of the algorithm, and then write a function that calls each of them. And if needed, you can even provide functions as arguments to other functions:

h <- function(x, y, f, g){
  f(x) + g(y)
}

This makes h() a higher-order function.
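A short usage sketch: sqrt() and abs() are arbitrary picks for f and g, and the \(x) shorthand for anonymous functions assumes R >= 4.1.

```r
h <- function(x, y, f, g){
  f(x) + g(y)
}

# Pass existing functions by name
h(16, -3, f = sqrt, g = abs)
## [1] 7

# Or pass anonymous functions
h(2, 3, f = \(x) x^2, g = \(y) y * 10)
## [1] 34
```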

Learning number 4: use higher-order functions to abstract loops away

Loops are hard to write. Higher-order functions are really cool though:

Reduce(`+`, 1:100)
## [1] 5050

Reduce() is a higher-order function that takes a function (here +) and a list of inputs compatible with the function. It combines the first two elements, then combines that result with the third element, and so on until the list is used up. So the call above gives the same result as each of these:

Reduce(`+`, 1:100)

Reduce(`+`, c(1 + 2, 3:100))
Reduce(`+`, c(1 + 2 + 3, 4:100))
Reduce(`+`, c(1 + 2 + 3 + 4, 5:100))

This avoids having to write a loop, which can go wrong for many reasons (typos, checking input types, depending on variables in the global environment… basically anything I mentioned already).

There’s also purrr::reduce() if you prefer the tidyverse ecosystem. Higher-order functions are super flexible; all that matters is that the function you give to reduce() knows what to do with the elements in the list.
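Two quick sketches of that flexibility, using base Reduce(). The data frames are made up for the example, and the \(left, right) shorthand assumes R >= 4.1:

```r
# accumulate = TRUE keeps every intermediate result
Reduce(`+`, 1:5, accumulate = TRUE)
## [1]  1  3  6 10 15

# Any two-argument function works, here merging a list of data frames
dfs <- list(
  data.frame(id = 1:3, a = c(10, 20, 30)),
  data.frame(id = 1:3, b = c("x", "y", "z")),
  data.frame(id = 1:3, c = c(TRUE, FALSE, TRUE))
)

merged <- Reduce(\(left, right) merge(left, right, by = "id"), dfs)
names(merged)
## [1] "id" "a"  "b"  "c"
```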

Another higher-order function you should know about is purrr::map() (or lapply() if you prefer base functions):

purrr::map(list(mtcars, iris), nrow)
## [[1]]
## [1] 32
## 
## [[2]]
## [1] 150

This loops a function (here nrow()) over a list of whatevers (here data frames). Super flexible once again.
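For reference, a sketch of the base R counterparts. lapply() returns the same kind of list as purrr::map(), while vapply() additionally makes you declare what each result should look like, which fits nicely with failing early:

```r
# Base R equivalent of the purrr::map() call above
row_counts <- lapply(list(mtcars, iris), nrow)

# vapply() errors if a result is not a single integer
vapply(list(mtcars = mtcars, iris = iris), nrow, FUN.VALUE = integer(1))
## mtcars   iris 
##     32    150 
```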

(Optional) Learning number 5: use recursion to avoid loops further

The following function calls itself and reverses a string:

rev_char <- function(x){

  # Fail early on anything that is not a character
  if(!is.character(x)){
    stop(paste0("x should be of type 'character', but is of type '",
                typeof(x), "' instead."))
  }

  if(x == ""){
    ""
  } else {
    split_x <- strsplit(x, split = "")[[1]]

    len_x <- length(split_x)

    # Last character first, then recurse on everything but the last character
    paste0(split_x[len_x],
           rev_char(paste0(split_x[-len_x],
                           collapse = "")))
  }

}

rev_char("abc")
## [1] "cba"
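For comparison, here is a sketch of the same thing written with a plain loop. rev_char_loop() is a hypothetical name, and I keep the same kind of guard as in learning number 1:

```r
# rev_char_loop() is a hypothetical loop-based counterpart to rev_char()
rev_char_loop <- function(x){

  if(!is.character(x)){
    stop(paste0("x should be of type 'character', but is of type '",
                typeof(x), "' instead."))
  }

  split_x <- strsplit(x, split = "")[[1]]

  # Prepend each character to the result, so the last one ends up first
  result <- ""
  for(char in split_x){
    result <- paste0(char, result)
  }

  result
}

rev_char_loop("abc")
## [1] "cba"
```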

I say that this is optional because, while it might sometimes be easier to use recursion to define a function, this is not always the case, and (in the case of R) recursion runs slower than using a loop. If you’re interested in learning more about map() and reduce(), I wrote several blog posts on them here, here and here, and made some youtube videos as well.

Hope you enjoyed! If you found this blog post useful, you might want to follow me on twitter for blog post updates and buy me an espresso or paypal.me, or buy my ebook on Leanpub. You can also watch my videos on youtube. So much content for you to consoom!
