Econometrics and Free Software by Bruno Rodrigues.
RSS feed for blog post updates.
Follow me on Mastodon, twitter, or check out my Github.
Check out my package that adds logging to R functions, {chronicler}.
Or read my free ebooks, to learn some R and build reproducible analytical pipelines..
You can also watch my youtube channel or find the slides to the talks I've given here.
Buy me a coffee, my kids don't let me sleep.

Get basic summary statistics for all the variables in a data frame

R

I have added a new function to my {brotools} package, called describe(), which takes a data frame as an argument, and returns another data frame with descriptive statistics. It is very much inspired by the {skmir} package but also by assist::describe() (click on the packages to be redirected to the respective Github repos) but I wanted to write my own for two reasons: first, as an exercice, and second I really only needed the function skim_to_wide() from {skimr}. So instead of installing a whole package for a single function, I decided to write my own (since I use {brotools} daily).

Below you can see it in action:

library(dplyr)
data(starwars)
brotools::describe(starwars)
## # A tibble: 10 x 13
##    variable  type   nobs  mean    sd mode     min   max   q25 median   q75
##    <chr>     <chr> <int> <dbl> <dbl> <chr>  <dbl> <dbl> <dbl>  <dbl> <dbl>
##  1 birth_ye… Nume…    87  87.6 155.  19         8   896  35       52  72  
##  2 height    Nume…    87 174.   34.8 172       66   264 167      180 191  
##  3 mass      Nume…    87  97.3 169.  77        15  1358  55.6     79  84.5
##  4 eye_color Char…    87  NA    NA   blue      NA    NA  NA       NA  NA  
##  5 gender    Char…    87  NA    NA   male      NA    NA  NA       NA  NA  
##  6 hair_col… Char…    87  NA    NA   blond     NA    NA  NA       NA  NA  
##  7 homeworld Char…    87  NA    NA   Tatoo…    NA    NA  NA       NA  NA  
##  8 name      Char…    87  NA    NA   Luke …    NA    NA  NA       NA  NA  
##  9 skin_col… Char…    87  NA    NA   fair      NA    NA  NA       NA  NA  
## 10 species   Char…    87  NA    NA   Human     NA    NA  NA       NA  NA  
## # ... with 2 more variables: n_missing <int>, n_unique <int>

As you can see, the object that is returned by describe() is a tibble.

For now, this function does not handle dates, but it’s in the pipeline.

You can also only describe certain columns:

brotools::describe(starwars, height, mass, name)
## # A tibble: 3 x 13
##   variable type    nobs  mean    sd mode      min   max   q25 median   q75
##   <chr>    <chr>  <int> <dbl> <dbl> <chr>   <dbl> <dbl> <dbl>  <dbl> <dbl>
## 1 height   Numer…    87 174.   34.8 172        66   264 167      180 191  
## 2 mass     Numer…    87  97.3 169.  77         15  1358  55.6     79  84.5
## 3 name     Chara…    87  NA    NA   Luke S…    NA    NA  NA       NA  NA  
## # ... with 2 more variables: n_missing <int>, n_unique <int>

If you want to try it out, you can install {brotools} from Github:

devtools::install_github("b-rodrigues/brotools")

If you found this blog post useful, you might want to follow me on twitter for blog post updates.