
Read a lot of datasets at once with R

I often have to read a lot of datasets at once with R, so I wrote the following function to solve this issue:

read_list <- function(list_of_datasets, read_func){

        # read a single dataset with the supplied reader function
        read_and_assign <- function(dataset, read_func){
                read_func(dataset)
        }

        # invisible() is used to suppress the unneeded output
        output <- invisible(
                sapply(list_of_datasets,
                       read_and_assign, read_func = read_func,
                       simplify = FALSE, USE.NAMES = TRUE))

        # remove the extension at the end of the dataset names
        names_of_datasets <- c(unlist(strsplit(list_of_datasets, "[.]"))[c(TRUE, FALSE)])
        names(output) <- names_of_datasets
        return(output)
}
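
The line that cleans the names works by splitting each file name on the dot and keeping every other element of the result. Here is what it does, assuming simple file names with a single dot:

files <- c("data_1.csv", "data_2.csv", "data_3.csv")

# split on the dot and keep the odd-numbered pieces, i.e. the names without the extension
unlist(strsplit(files, "[.]"))[c(TRUE, FALSE)]
## [1] "data_1" "data_2" "data_3"

If your file names contain more than one dot, tools::file_path_sans_ext() is a more robust way to strip the extension.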

You need to supply a list of datasets as well as the function to read them to read_list(). So for example, to read in .csv files, you could use read.csv() (or read_csv() from the readr package, which I prefer to use), read_dta() from the haven package for Stata files, and so on.
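
For instance, to read Stata files you would pass read_dta() to read_list(); here is a small sketch, assuming some .dta files sit in your working directory:

library("haven")

# hypothetical Stata files in the working directory
stata_files <- list.files(pattern = "\\.dta$")

# read_dta() is passed as the reader, just like read_csv() below
list_of_stata_data <- read_list(stata_files, read_dta)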

Now imagine you have some data in your working directory. First start by saving the names of the datasets in a variable:

# the pattern is a regular expression, so escape the dot and anchor it at the end
data_files <- list.files(pattern = "\\.csv$")

print(data_files)
## [1] "data_1.csv" "data_2.csv" "data_3.csv"

Now you can read all the datasets and save them in a list with read_list():

library("readr")
library("tibble")

list_of_data_sets <- read_list(data_files, read_csv)


glimpse(list_of_data_sets)
## List of 3
##  $ data_1:Classes 'tbl_df', 'tbl' and 'data.frame':  19 obs. of  3 variables:
##   ..$ col1: chr [1:19] "0,018930679" "0,8748013128" "0,1025635934" "0,6246140983" ...
##   ..$ col2: chr [1:19] "0,0377725807" "0,5959457638" "0,4429121533" "0,558387159" ...
##   ..$ col3: chr [1:19] "0,6241767189" "0,031324594" "0,2238059868" "0,2773350732" ...
##  $ data_2:Classes 'tbl_df', 'tbl' and 'data.frame':  19 obs. of  3 variables:
##   ..$ col1: chr [1:19] "0,9098418493" "0,1127788509" "0,5818891392" "0,1011773532" ...
##   ..$ col2: chr [1:19] "0,7455905887" "0,4015039612" "0,6625796605" "0,029955339" ...
##   ..$ col3: chr [1:19] "0,327232932" "0,2784035673" "0,8092386735" "0,1216045306" ...
##  $ data_3:Classes 'tbl_df', 'tbl' and 'data.frame':  19 obs. of  3 variables:
##   ..$ col1: chr [1:19] "0,9236124896" "0,6303271761" "0,6413583054" "0,5573887416" ...
##   ..$ col2: chr [1:19] "0,2114708388" "0,6984538266" "0,0469865249" "0,9271510226" ...
##   ..$ col3: chr [1:19] "0,4941919971" "0,7391538511" "0,3876723797" "0,2815014394" ...
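
Each dataset can then be pulled out of the list by its name, and since everything lives in a single list, it is also easy to operate on all the datasets at once:

# access a single dataset by name
list_of_data_sets$data_1

# or apply a function to every dataset, for example count the rows of each
sapply(list_of_data_sets, nrow)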

If you prefer not to have the datasets in a list, but rather import them into the global environment, you can change the above function like so:

read_list <- function(list_of_datasets, read_func){

        # assign() puts each dataset directly into the global environment,
        # named after its file name
        read_and_assign <- function(dataset, read_func){
                assign(dataset, read_func(dataset), envir = .GlobalEnv)
        }

        # invisible() is used to suppress the unneeded output
        invisible(
                sapply(list_of_datasets,
                       read_and_assign, read_func = read_func,
                       simplify = FALSE, USE.NAMES = TRUE))
}
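
The call looks the same as before, but now the objects land directly in the global environment, named after the full file names, extension included, so you need backticks to refer to them:

read_list(data_files, read_csv)

# the objects are named after the files, extension included
head(`data_1.csv`)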

I personally don't like this second option, as it clutters the global environment, but I put it here for completeness.