Get basic summary statistics for all the variables in a data frame

2018/04/10 R

I have added a new function to my {brotools} package, called describe(), which takes a data frame as an argument, and returns another data frame with descriptive statistics. It is very much inspired by the {skmir} package but also by assist::describe() (click on the packages to be redirected to the respective Github repos) but I wanted to write my own for two reasons: first, as an exercice, and second I really only needed the function skim_to_wide() from {skimr}. So instead of installing a whole package for a single function, I decided to write my own (since I use {brotools} daily).

Below you can see it in action:

library(dplyr)
data(starwars)

brotools::describe(starwars)

## # A tibble: 10 x 13
##    variable  type   nobs  mean    sd mode     min   max   q25 median   q75
##    <chr>     <chr> <int> <dbl> <dbl> <chr>  <dbl> <dbl> <dbl>  <dbl> <dbl>
##  1 birth_ye… Nume…    87  87.6 155.  19         8   896  35       52  72  
##  2 height    Nume…    87 174.   34.8 172       66   264 167      180 191  
##  3 mass      Nume…    87  97.3 169.  77        15  1358  55.6     79  84.5
##  4 eye_color Char…    87  NA    NA   blue      NA    NA  NA       NA  NA  
##  5 gender    Char…    87  NA    NA   male      NA    NA  NA       NA  NA  
##  6 hair_col… Char…    87  NA    NA   blond     NA    NA  NA       NA  NA  
##  7 homeworld Char…    87  NA    NA   Tatoo…    NA    NA  NA       NA  NA  
##  8 name      Char…    87  NA    NA   Luke …    NA    NA  NA       NA  NA  
##  9 skin_col… Char…    87  NA    NA   fair      NA    NA  NA       NA  NA  
## 10 species   Char…    87  NA    NA   Human     NA    NA  NA       NA  NA  
## # ... with 2 more variables: n_missing <int>, n_unique <int>

As you can see, the object that is returned by describe() is a tibble.

For now, this function does not handle dates, but it’s in the pipeline.

You can also only describe certain columns:

brotools::describe(starwars, height, mass, name)

## # A tibble: 3 x 13
##   variable type    nobs  mean    sd mode      min   max   q25 median   q75
##   <chr>    <chr>  <int> <dbl> <dbl> <chr>   <dbl> <dbl> <dbl>  <dbl> <dbl>
## 1 height   Numer…    87 174.   34.8 172        66   264 167      180 191  
## 2 mass     Numer…    87  97.3 169.  77         15  1358  55.6     79  84.5
## 3 name     Chara…    87  NA    NA   Luke S…    NA    NA  NA       NA  NA  
## # ... with 2 more variables: n_missing <int>, n_unique <int>

If you want to try it out, you can install {brotools} from Github:

devtools::install_github("b-rodrigues/brotools")

If you found this blog post useful, you might want to follow me on twitter for blog post updates.