Econometrics and Free Software by Bruno Rodrigues.
RSS feed for blog post updates.
Follow me on Mastodon, twitter, or check out my Github.
Check out my package that adds logging to R functions, {chronicler}.
Or read my free ebooks, to learn some R and build reproducible analytical pipelines..
You can also watch my youtube channel or find the slides to the talks I've given here.
Buy me a coffee, my kids don't let me sleep.

The {chronicler} package, an implementation of the logger monad in R

R

Back in February I discussed a package I was working on which allowed users to add logging to function calls. I named the package {loudly} but decided to rename it to {chronicler}.

I have been working on it for the past few weeks, and I think that a CRAN release could happen soon.

Introduction

So what does {chronicler} do? {chronicler} allows you do decorate functions, so that they provide enhanced output:

library(chronicler)
## Loading required package: rlang
r_sqrt <- record(sqrt)

a <- r_sqrt(1:5)

Object a is now an object of class chronicle. Let’s print a to the terminal:

a
## ✔ Value computed successfully:
## ---------------
## [1] 1.000000 1.414214 1.732051 2.000000 2.236068
## 
## ---------------
## This is an object of type `chronicle`.
## Retrieve the value of this object with pick(.c, "value").
## To read the log of this object, call read_log().

as the output says, a is an object of type chronicle. Let’s use read_log() as suggested:

read_log(a)
## [1] "Complete log:"                                      
## [2] "✔ sqrt(1:5) ran successfully at 2022-04-01 21:14:28"
## [3] "Total running time: 0.000240325927734375 secs"

The log tells us how a was built, and it’s especially useful for objects that are the result of many function calls:

r_sqrt <- record(sqrt)
r_exp <- record(exp)
r_mean <- record(mean)

b <- 1:10 |>
  r_sqrt() |>
  bind_record(r_exp) |>
  bind_record(r_mean)

The log gives all the details:

read_log(b)
## [1] "Complete log:"                                           
## [2] "✔ sqrt(1:10) ran successfully at 2022-04-01 21:14:28"    
## [3] "✔ exp(.c$value) ran successfully at 2022-04-01 21:14:28" 
## [4] "✔ mean(.c$value) ran successfully at 2022-04-01 21:14:28"
## [5] "Total running time: 0.00820255279541016 secs"

The end result, or what is called value can be obtained using pick() (you could also use a$value):

pick(a, "value")
## [1] 1.000000 1.414214 1.732051 2.000000 2.236068
pick(b, "value")
## [1] 11.55345

Composing decorated functions

bind_record() is used to pass the output from one decorated function to the next:

suppressPackageStartupMessages(
  library(dplyr)
)

r_group_by <- record(group_by)
r_select <- record(select)
r_summarise <- record(summarise)
r_filter <- record(filter)

output <- starwars %>%
  r_select(height, mass, species, sex) %>%
  bind_record(r_group_by, species, sex) %>%
  bind_record(r_filter, sex != "male") %>%
  bind_record(r_summarise,
              mass = mean(mass, na.rm = TRUE)
              )
read_log(output)
## [1] "Complete log:"                                                                         
## [2] "✔ select(.,height,mass,species,sex) ran successfully at 2022-04-01 21:14:28"           
## [3] "✔ group_by(.c$value,species,sex) ran successfully at 2022-04-01 21:14:28"              
## [4] "✔ filter(.c$value,sex != \"male\") ran successfully at 2022-04-01 21:14:28"            
## [5] "✔ summarise(.c$value,mean(mass, na.rm = TRUE)) ran successfully at 2022-04-01 21:14:28"
## [6] "Total running time: 0.11384654045105 secs"

The value can then be saved in a new variable:

(my_df <- pick(output, "value"))
## # A tibble: 9 × 3
## # Groups:   species [9]
##   species    sex              mass
##   <chr>      <chr>           <dbl>
## 1 Clawdite   female           55  
## 2 Droid      none             69.8
## 3 Human      female           56.3
## 4 Hutt       hermaphroditic 1358  
## 5 Kaminoan   female          NaN  
## 6 Mirialan   female           53.1
## 7 Tholothian female           50  
## 8 Togruta    female           57  
## 9 Twi'lek    female           55

You can save the output object with saveRDS() and share it; your colleague can then read the log to learn how the object was created.

This package also ships with a dedicated pipe, %>=% which you can use instead of bind_record():

output_pipe <- starwars %>%
  r_select(height, mass, species, sex) %>=%
  r_group_by(species, sex) %>=%
  r_filter(sex != "male") %>=%
  r_summarise(mass = mean(mass, na.rm = TRUE))
pick(output_pipe, "value")
## # A tibble: 9 × 3
## # Groups:   species [9]
##   species    sex              mass
##   <chr>      <chr>           <dbl>
## 1 Clawdite   female           55  
## 2 Droid      none             69.8
## 3 Human      female           56.3
## 4 Hutt       hermaphroditic 1358  
## 5 Kaminoan   female          NaN  
## 6 Mirialan   female           53.1
## 7 Tholothian female           50  
## 8 Togruta    female           57  
## 9 Twi'lek    female           55

Condition handling

By default, errors and warnings get caught and composed in the log:

errord_output <- starwars %>%
  r_select(height, mass, species, sex) %>=%
  r_group_by(species, sx) %>=% # typo, "sx" instead of "sex"
  r_filter(sex != "male") %>=%
  r_summarise(mass = mean(mass, na.rm = TRUE))
errord_output
## ✖ Value computed unsuccessfully:
## ---------------
## [1] NA
## 
## ---------------
## This is an object of type `chronicle`.
## Retrieve the value of this object with pick(.c, "value").
## To read the log of this object, call read_log().

Reading the log tells you which function failed, and with which error message:

read_log(errord_output)
## [1] "Complete log:"                                                                                                                                                                                    
## [2] "✔ select(.,height,mass,species,sex) ran successfully at 2022-04-01 21:14:28"                                                                                                                      
## [3] "✖ group_by(.c$value,species,sx) ran unsuccessfully with following exception: Must group by variables found in `.data`.\n✖ Column `sx` is not found. at 2022-04-01 21:14:28"                       
## [4] "✖ filter(.c$value,sex != \"male\") ran unsuccessfully with following exception: no applicable method for 'filter' applied to an object of class \"logical\" at 2022-04-01 21:14:28"               
## [5] "✖ summarise(.c$value,mean(mass, na.rm = TRUE)) ran unsuccessfully with following exception: no applicable method for 'summarise' applied to an object of class \"logical\" at 2022-04-01 21:14:28"
## [6] "Total running time: 0.163575887680054 secs"

It is also possible to only capture errors, or catpure errors, warnings and messages using the strict parameter of record()

# Only errors:

r_sqrt <- record(sqrt, strict = 1)

# Nothing will be captured here, since sqrt(-10) returns a NA and a warning
r_sqrt(-10) |>
  read_log()
## Warning in .f(...): NaNs produced
## [1] "Complete log:"                                                                     
## [2] "✖ sqrt(-10) ran unsuccessfully with following exception: NA at 2022-04-01 21:14:28"
## [3] "Total running time: 0.000255584716796875 secs"
# Errors and warnings:

r_sqrt <- record(sqrt, strict = 2)

# The warning gets captured
r_sqrt(-10) |>
  read_log()
## [1] "Complete log:"                                                                                
## [2] "✖ sqrt(-10) ran unsuccessfully with following exception: NaNs produced at 2022-04-01 21:14:28"
## [3] "Total running time: 0.00019383430480957 secs"
# Errors, warnings and messages

my_f <- function(x){
  message("this is a message")
  10
}

record(my_f, strict = 3)(10) |>
                         read_log()
## [1] "Complete log:"                                                                                     
## [2] "✖ my_f(10) ran unsuccessfully with following exception: this is a message\n at 2022-04-01 21:14:28"
## [3] "Total running time: 0.000336408615112305 secs"

Advanced logging

You can provide a function to record(), which will be evaluated on the output. This makes it possible to, for example, monitor the size of a data frame throughout the pipeline. In the example below I provide dim(), which will return the dimensions of the data frame, as an argument to record():

r_group_by <- record(group_by)
r_select <- record(select, .g = dim)
r_summarise <- record(summarise, .g = dim)
r_filter <- record(filter, .g = dim)

output_pipe <- starwars %>%
  r_select(height, mass, species, sex) %>=%
  r_group_by(species, sex) %>=%
  r_filter(sex != "male") %>=%
  r_summarise(mass = mean(mass, na.rm = TRUE))

The $log_df element of a chronicle object contains detailed information. In most cases you don’t need to worry about it:

pick(output_pipe, "log_df")
## # A tibble: 4 × 8
##   outcome   `function` arguments message start_time          end_time           
##   <chr>     <chr>      <chr>     <chr>   <dttm>              <dttm>             
## 1 ✔ Success select     ".,heigh… NA      2022-04-01 21:14:28 2022-04-01 21:14:28
## 2 ✔ Success group_by   ".c$valu… NA      2022-04-01 21:14:28 2022-04-01 21:14:28
## 3 ✔ Success filter     ".c$valu… NA      2022-04-01 21:14:28 2022-04-01 21:14:29
## 4 ✔ Success summarise  ".c$valu… NA      2022-04-01 21:14:28 2022-04-01 21:14:29
## # … with 2 more variables: run_time <drtn>, g <list>

but if you want to look at the output of .g, then you have to grab it and see:

# I coerce it to a data.frame just for the output here on my blog, to make the column `g` readable
as.data.frame(output_pipe$log_df[, c("function", "g")])
##    function     g
## 1    select 87, 4
## 2  group_by    NA
## 3    filter 23, 4
## 4 summarise  9, 3

We can see that the dimension of the dataframe was (87, 4) after the call to select(), (23, 4) after the call to filter() and finally (9, 3) after the call to summarise().

Monads

This package implements a logger monad. I might talk about monads in the future, but probably in a video; if you don’t know what monads are, don’t worry, no one really knows. Legend has it that to truly understand what monads are you have to have a PhD in computer science and have been born in the former Soviet Union. But to make things simple, you can think of a monad as a way to:

  • embelish functions to provide additional output without having to touch the function’s core behaviour
  • a way to compose these functions and work with the embelished outputs (also called monadic values)
  • monadic values are basically containers that contain the actual value of the function evaluated on its inputs and something else (here, a log)

Monads are quite useful in some programming languanges, like Haskell. Not so much in R, but I think that the logger monad I propose here can be quite useful. So let me know if you find it useful or if you have suggestions!

You can install {chronicler} from its github repo.

Hope you enjoyed! If you found this blog post useful, you might want to follow me on twitter for blog post updates and buy me an espresso or paypal.me, or buy my ebook on Leanpub. You can also watch my videos on youtube. So much content for you to consoom!

Buy me an EspressoBuy me an Espresso