Get basic summary statistics for all the variables in a data frame

I have added a new function to my `{brotools}` package, called `describe()`, which takes a data frame as an argument and returns another data frame with descriptive statistics. It is very much inspired by the `{skimr}` package, but also by `assist::describe()` (click on the packages to be redirected to their respective Github repos). I wanted to write my own for two reasons: first, as an exercise, and second, I really only needed the function `skim_to_wide()` from `{skimr}`. So instead of installing a whole package for a single function, I decided to write my own (since I use `{brotools}` daily).

Below you can see it in action:

```
library(dplyr)
data(starwars)
```

`brotools::describe(starwars)`

```
## # A tibble: 10 x 13
## variable type nobs mean sd mode min max q25 median q75
## <chr> <chr> <int> <dbl> <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 birth_ye… Nume… 87 87.6 155. 19 8 896 35 52 72
## 2 height Nume… 87 174. 34.8 172 66 264 167 180 191
## 3 mass Nume… 87 97.3 169. 77 15 1358 55.6 79 84.5
## 4 eye_color Char… 87 NA NA blue NA NA NA NA NA
## 5 gender Char… 87 NA NA male NA NA NA NA NA
## 6 hair_col… Char… 87 NA NA blond NA NA NA NA NA
## 7 homeworld Char… 87 NA NA Tatoo… NA NA NA NA NA
## 8 name Char… 87 NA NA Luke … NA NA NA NA NA
## 9 skin_col… Char… 87 NA NA fair NA NA NA NA NA
## 10 species Char… 87 NA NA Human NA NA NA NA NA
## # ... with 2 more variables: n_missing <int>, n_unique <int>
```

As you can see, the object returned by `describe()` is a `tibble`.

For now, this function does not handle dates, but it’s in the pipeline.
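To give an idea of what such a function involves, here is a minimal base R sketch that computes the same kind of per-column summary. It is not the code of `brotools::describe()`: the names `describe_sketch()` and `mode_value()` are hypothetical, the sketch returns a plain `data.frame` rather than a `tibble` so that it has no dependencies, and dates are simply treated like any other non-numeric column.

```r
# Hypothetical helper: most frequent non-missing value, returned as a string.
# Ties are broken by table()'s sorted order (the first maximum wins).
mode_value <- function(x) {
  counts <- table(x, useNA = "no")
  if (length(counts) == 0) return(NA_character_)
  names(counts)[which.max(counts)]
}

# Hypothetical sketch of a describe()-like summary: one row per column.
# Numeric columns get moments and quantiles; all other columns get NA there.
describe_sketch <- function(df) {
  rows <- lapply(names(df), function(col) {
    x <- df[[col]]
    is_num <- is.numeric(x)
    # Quartiles (R's default type 7) computed on non-missing values only
    q <- if (is_num) {
      stats::quantile(x, c(0.25, 0.5, 0.75), na.rm = TRUE, names = FALSE)
    } else {
      rep(NA_real_, 3)
    }
    data.frame(
      variable  = col,
      type      = class(x)[1],
      nobs      = length(x),                 # all rows, missing included
      mean      = if (is_num) mean(x, na.rm = TRUE) else NA_real_,
      sd        = if (is_num) stats::sd(x, na.rm = TRUE) else NA_real_,
      mode      = mode_value(x),
      min       = if (is_num) as.numeric(min(x, na.rm = TRUE)) else NA_real_,
      max       = if (is_num) as.numeric(max(x, na.rm = TRUE)) else NA_real_,
      q25       = q[1],
      median    = q[2],
      q75       = q[3],
      n_missing = sum(is.na(x)),
      n_unique  = length(unique(x[!is.na(x)])),  # distinct non-missing values
      stringsAsFactors = FALSE
    )
  })
  # Stack the one-row data frames into a single summary table
  do.call(rbind, rows)
}

# Usage with a built-in dataset (four numeric columns and one factor)
describe_sketch(iris)
```

Building one small data frame per column and binding them at the end keeps numeric and non-numeric columns in the same table, with `NA` filling the statistics that do not apply, which is the same layout as the `describe()` output above.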

You can also describe only certain columns by listing them after the data frame:

`brotools::describe(starwars, height, mass, name)`

```
## # A tibble: 3 x 13
## variable type nobs mean sd mode min max q25 median q75
## <chr> <chr> <int> <dbl> <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 height Numer… 87 174. 34.8 172 66 264 167 180 191
## 2 mass Numer… 87 97.3 169. 77 15 1358 55.6 79 84.5
## 3 name Chara… 87 NA NA Luke S… NA NA NA NA NA
## # ... with 2 more variables: n_missing <int>, n_unique <int>
```
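The column selection above uses bare column names, without quotes. I won't claim this is how `{brotools}` does it (tidy evaluation with `{rlang}` is another common route), but as a sketch, here is one base R way to capture unquoted names with `substitute()` and use them to subset a data frame. `select_bare()` is a hypothetical name used only for illustration.

```r
# Hypothetical helper: keep only the columns named (unquoted) in `...`
select_bare <- function(df, ...) {
  # substitute(list(...)) captures the unevaluated arguments as a call;
  # dropping the first element (the `list` symbol) leaves the expressions
  exprs <- as.list(substitute(list(...)))[-1]
  if (length(exprs) == 0) return(df)  # no names given: keep every column
  # Turn each captured symbol into its name as a string
  cols <- vapply(exprs, deparse, character(1))
  unknown <- setdiff(cols, names(df))
  if (length(unknown) > 0) {
    stop("Unknown column(s): ", paste(unknown, collapse = ", "))
  }
  df[, cols, drop = FALSE]
}

# Usage with a built-in dataset
select_bare(mtcars, mpg, cyl)
```

Returning the whole data frame when no names are given mirrors the behaviour shown above, where calling `describe()` with only the data frame summarises every column.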

If you want to try it out, you can install `{brotools}` from Github:

`devtools::install_github("b-rodrigues/brotools")`

If you found this blog post useful, you might want to follow me on twitter for blog post updates.