Econometrics and Free Software by Bruno Rodrigues.
RSS feed for blog post updates.
Follow me on Mastodon, twitter, or check out my Github.
Check out my package that adds logging to R functions, {chronicler}.
Or read my free ebooks, to learn some R and build reproducible analytical pipelines..
You can also watch my youtube channel or find the slides to the talks I've given here.
Buy me a coffee, my kids don't let me sleep.

Add logging to your functions using my newest package `{loud}`

R

UPDATE: {loud} has been superseded by {chronicle}, read about it here

This is a short blog post to announce the early alpha, hyper unstable, use at your own peril, package I’ve been working on for the past 6 hours or so (actually longer if I add all the research/study time). This package provides the function loudly() which allows you to do cool stuff like:

# First two lines install the package
# install.packages("devtools")
# devtools::install_github("b-rodrigues/loud")
library(loud)
## Loading required package: rlang
loud_sqrt <- loudly(sqrt)

loud_sqrt(1:10)
## $result
##  [1] 1.000000 1.414214 1.732051 2.000000 2.236068 2.449490 2.645751 2.828427
##  [9] 3.000000 3.162278
## 
## $log
## [1] "Log start..."                                                                
## [2] "✔ sqrt(1:10) started at 2022-04-01 21:20:00 and ended at 2022-04-01 21:20:00"

As you can see, I start by applying loudly() to a function, and then I can use this function as usual. Not only do I get the result, but also a logging message telling me which function and which arguments got used, and when the computation started and ended.

It is also possible to chain operations:

loud_sqrt <- loudly(sqrt)
loud_exp <- loudly(exp)
loud_mean <- loudly(mean)

1:10 |>
  loud_sqrt() |>
  bind_loudly(loud_exp) |>
  bind_loudly(loud_mean)
## $result
## [1] 11.55345
## 
## $log
## [1] "Log start..."                                                                     
## [2] "✔ sqrt(1:10) started at 2022-04-01 21:20:00 and ended at 2022-04-01 21:20:00"     
## [3] "✔ exp(.l$result) started at 2022-04-01 21:20:00 and ended at 2022-04-01 21:20:00" 
## [4] "✔ mean(.l$result) started at 2022-04-01 21:20:00 and ended at 2022-04-01 21:20:00"

You’ll notice that here I have to use another function called bind_loudly(). The reason is because loud functions return a list. The first element of that list is the result of the function applied to the inputs, and the second element is the log message. So bind_loudly() passes the first element of the output of loud_sqrt() to the actual function exp() and also passes the second element, this time the log message, to the part of the function that concatenates the log messages.

This works with any function:

library(dplyr)
loud_group_by <- loudly(group_by)
loud_select <- loudly(select)
loud_summarise <- loudly(summarise)
loud_filter <- loudly(filter)

starwars %>%
  loud_select(height, mass, species, sex) %>%
  bind_loudly(loud_group_by, species, sex) %>%
  bind_loudly(loud_filter, sex != "male") %>%
  bind_loudly(loud_summarise,
              mass = mean(mass, na.rm = TRUE)
              )
## $result
## # A tibble: 9 × 3
## # Groups:   species [9]
##   species    sex              mass
##   <chr>      <chr>           <dbl>
## 1 Clawdite   female           55  
## 2 Droid      none             69.8
## 3 Human      female           56.3
## 4 Hutt       hermaphroditic 1358  
## 5 Kaminoan   female          NaN  
## 6 Mirialan   female           53.1
## 7 Tholothian female           50  
## 8 Togruta    female           57  
## 9 Twi'lek    female           55  
## 
## $log
## [1] "Log start..."                                                                                                   
## [2] "✔ select(.,height,mass,species,sex) started at 2022-04-01 21:20:00 and ended at 2022-04-01 21:20:00"            
## [3] "✔ group_by(.l$result,species,sex) started at 2022-04-01 21:20:00 and ended at 2022-04-01 21:20:00"              
## [4] "✔ filter(.l$result,sex != \"male\") started at 2022-04-01 21:20:00 and ended at 2022-04-01 21:20:00"            
## [5] "✔ summarise(.l$result,mean(mass, na.rm = TRUE)) started at 2022-04-01 21:20:00 and ended at 2022-04-01 21:20:00"

This is not perfect however. You’ll notice that the last log message states:

summarise(.l$result,mean(mass, na.rm = TRUE)) ....

ideally I would like for it to say:

summarise(.l$result,mass = mean(mass, na.rm = TRUE)) ....

Also, I’ve added a pipe operator so you don’t need to use bind_loudly() if you don’t want to:

1:10 |>
  loud_sqrt() %>=%
  loud_exp() %>=%
  loud_mean()
## $result
## [1] 11.55345
## 
## $log
## [1] "Log start..."                                                                     
## [2] "✔ sqrt(1:10) started at 2022-04-01 21:20:00 and ended at 2022-04-01 21:20:00"     
## [3] "✔ exp(.l$result) started at 2022-04-01 21:20:00 and ended at 2022-04-01 21:20:00" 
## [4] "✔ mean(.l$result) started at 2022-04-01 21:20:00 and ended at 2022-04-01 21:20:00"

However, this operator does not work well with {dplyr} functions. See here:

starwars %>%
  loud_select(height, mass, species, sex) %>=%
  loud_group_by(species, sex) %>=%
  loud_filter(sex != "male") %>=%
  loud_summarise(mass = mean(mass, na.rm = TRUE))
## $result
## # A tibble: 9 × 3
## # Groups:   species [9]
##   species    sex              mass
##   <chr>      <chr>           <dbl>
## 1 Clawdite   female           55  
## 2 Droid      none             69.8
## 3 Human      female           56.3
## 4 Hutt       hermaphroditic 1358  
## 5 Kaminoan   female          NaN  
## 6 Mirialan   female           53.1
## 7 Tholothian female           50  
## 8 Togruta    female           57  
## 9 Twi'lek    female           55  
## 
## $log
## [1] "Log start..."                                                                                                   
## [2] "✔ select(.,height,mass,species,sex) started at 2022-04-01 21:20:00 and ended at 2022-04-01 21:20:00"            
## [3] "✔ group_by(.l$result,species,sex) started at 2022-04-01 21:20:00 and ended at 2022-04-01 21:20:00"              
## [4] "✔ filter(.l$result,sex != \"male\") started at 2022-04-01 21:20:00 and ended at 2022-04-01 21:20:00"            
## [5] "✔ summarise(.l$result,mean(mass, na.rm = TRUE)) started at 2022-04-01 21:20:00 and ended at 2022-04-01 21:20:00"

If you look at the result, you’ll see that it is not equal to the obtained with bind_loudly(), and if you look at the last logging message you’ll see why. Instead of

summarise(.l$result,mean(mass, na.rm = TRUE)) ....

the message states:

summarise(.l$result,mass,TRUE) started at

I know where the problem is (it’s due to some regex fuckery) so I think that I should be able to correct this in the coming days. Ideally, in the future, I would also like for the users to provide their own log messages.

The package has a website with a vignette that shows another interesting example here. Source code can be found here.

It is almost certain that function names will change, maybe even the package name itself. Contributions, bug reports, suggestions, etc, welcome of course.

A final word: this is the result of me exploring more advanced functional programming concepts and discussing with really nice people like Andrew R Mcneil, Laszlo Kupcsik. Andrew wrote a cool package called maybe and Laszlo a super cool blog post explaining what monads are here.

I’ll be writing a blog post on monads, in particular the maybe monad soonish.

Hope you enjoyed! If you found this blog post useful, you might want to follow me on twitter for blog post updates and buy me an espresso or paypal.me, or buy my ebook on Leanpub. You can also watch my videos on youtube. So much content for you to consoom!

Buy me an EspressoBuy me an Espresso