# Merge a list of datasets together

Last week I showed how to read many datasets at once with R. This week I’ll continue from there and show a very simple function that takes that list of datasets and merges them all together.

First we’ll use `read_list()` to read all the datasets at once (for more details read last week’s post):

```r
library("readr")
library("tibble")

# list.files() expects a regular expression: escape the dot and anchor the end
data_files <- list.files(pattern = "\\.csv$")

print(data_files)
```
``## [1] "data_1.csv" "data_2.csv" "data_3.csv"``
```r
# read_list() comes from last week's post
list_of_data_sets <- read_list(data_files, read_csv)

glimpse(list_of_data_sets)
```
```
## List of 3
##  $ data_1:Classes 'tbl_df', 'tbl' and 'data.frame':  19 obs. of  3 variables:
##   ..$ col1: chr [1:19] "0,018930679" "0,8748013128" "0,1025635934" "0,6246140983" ...
##   ..$ col2: chr [1:19] "0,0377725807" "0,5959457638" "0,4429121533" "0,558387159" ...
##   ..$ col3: chr [1:19] "0,6241767189" "0,031324594" "0,2238059868" "0,2773350732" ...
##  $ data_2:Classes 'tbl_df', 'tbl' and 'data.frame':  19 obs. of  3 variables:
##   ..$ col1: chr [1:19] "0,9098418493" "0,1127788509" "0,5818891392" "0,1011773532" ...
##   ..$ col2: chr [1:19] "0,7455905887" "0,4015039612" "0,6625796605" "0,029955339" ...
##   ..$ col3: chr [1:19] "0,327232932" "0,2784035673" "0,8092386735" "0,1216045306" ...
##  $ data_3:Classes 'tbl_df', 'tbl' and 'data.frame':  19 obs. of  3 variables:
##   ..$ col1: chr [1:19] "0,9236124896" "0,6303271761" "0,6413583054" "0,5573887416" ...
##   ..$ col2: chr [1:19] "0,2114708388" "0,6984538266" "0,0469865249" "0,9271510226" ...
##   ..$ col3: chr [1:19] "0,4941919971" "0,7391538511" "0,3876723797" "0,2815014394" ...
```

You see that all these datasets have the same column names. We can now merge them using this simple function:

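```r
multi_join <- function(list_of_loaded_data, join_func, ...) {
  # Attach dplyr so that its join functions (full_join(), left_join(), ...) are available
  require("dplyr")

  # Fold the list pairwise: join the first two datasets, then join the result
  # with the third, and so on; anything in ... is passed on to join_func
  output <- Reduce(function(x, y) join_func(x, y, ...), list_of_loaded_data)

  return(output)
}
```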

This function uses `Reduce()`, R’s version of the fold operation found in most functional programming languages. What does `Reduce()` do? Let’s take a look at the following example:

``Reduce(`+`, c(1, 2, 3, 4, 5))``
``## [1] 15``

`Reduce()` has several arguments, but you need to specify at least two: a function, here `+`, and a vector or list, here `c(1, 2, 3, 4, 5)`. The next block shows schematically what `Reduce()` does, step by step:

```
0 + c(1, 2, 3, 4, 5)
0 + 1 + c(2, 3, 4, 5)
0 + 1 + 2 + c(3, 4, 5)
0 + 1 + 2 + 3 + c(4, 5)
0 + 1 + 2 + 3 + 4 + c(5)
0 + 1 + 2 + 3 + 4 + 5
```

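The `0` plays the role of the initial value, “init”. For addition it changes nothing, and when you don’t supply one, `Reduce()` simply starts from the first element. You can also pass this “init” to `Reduce()` yourself: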

``Reduce(`+`, c(1, 2, 3, 4, 5), init = 20)``
``## [1] 35``
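To make the intermediate steps visible, here is a minimal sketch using the `accumulate` argument of `Reduce()`, next to the same fold written as an explicit loop. The variable names `running_totals` and `total` are only illustrative.

```r
# accumulate = TRUE keeps every intermediate result of the fold
running_totals <- Reduce(`+`, c(1, 2, 3, 4, 5), accumulate = TRUE)
print(running_totals)
# [1]  1  3  6 10 15

# The same fold as an explicit loop: start from the first element,
# then combine the running result with each remaining element in turn
total <- 1
for (value in c(2, 3, 4, 5)) {
  total <- total + value
}
print(total)
# [1] 15
```

The last element of `running_totals` is what a plain `Reduce()` call returns.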

So `multi_join()` performs the same operation as in the example above, except that the function is a user-supplied join or merge function and the list is the list of datasets read with `read_list()`.
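As a rough illustration of that equivalence, the sketch below compares a `Reduce()` call with the nested joins it stands for. The data frames `df_1`, `df_2` and `df_3` are small made-up examples, not the datasets from this post, and `by` is spelled out only to keep dplyr from printing its "Joining, by" message.

```r
library("dplyr")

# Three tiny, hypothetical data frames with identical column names
df_1 <- data.frame(col1 = c("a", "b"), col2 = c("x", "y"), stringsAsFactors = FALSE)
df_2 <- data.frame(col1 = c("c", "d"), col2 = c("z", "w"), stringsAsFactors = FALSE)
df_3 <- data.frame(col1 = "e", col2 = "v", stringsAsFactors = FALSE)

join_cols <- c("col1", "col2")

# Written out by hand: join the first two, then join the result with the third
nested <- full_join(full_join(df_1, df_2, by = join_cols), df_3, by = join_cols)

# The same thing expressed as a fold over the list
reduced <- Reduce(function(x, y) full_join(x, y, by = join_cols),
                  list(df_1, df_2, df_3))

identical(nested, reduced)  # both approaches give the same data frame
nrow(reduced)               # 2 + 2 + 1 = 5 rows, since no rows match
```

With more datasets the nested version grows quickly, while the `Reduce()` call stays the same.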

Let’s see what happens when we use `multi_join()` on our list:

``merged_data <- multi_join(list_of_data_sets, full_join)``
``class(merged_data)``
``## [1] "tbl_df"     "tbl"        "data.frame"``
``glimpse(merged_data)``
```
## Observations: 57
## Variables: 3
## $ col1 <chr> "0,018930679", "0,8748013128", "0,1025635934", "0,6246140...
## $ col2 <chr> "0,0377725807", "0,5959457638", "0,4429121533", "0,558387...
## $ col3 <chr> "0,6241767189", "0,031324594", "0,2238059868", "0,2773350...
```

You should make sure that all the data frames have the same column names, but you can also join data frames with different column names if you pass the `by` argument to the join function. Without `by`, dplyr’s join functions match on all the columns the data frames have in common. Passing `by` is possible thanks to `...`, which lets you hand further arguments to `join_func()`.
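Here is a small sketch of that case, assuming three made-up data frames (`prices`, `stocks` and `labels`) that share only a key column called `id`. `multi_join()` is repeated in compact form so the block runs on its own.

```r
library("dplyr")

# Compact version of multi_join() from above
multi_join <- function(list_of_loaded_data, join_func, ...) {
  Reduce(function(x, y) join_func(x, y, ...), list_of_loaded_data)
}

# Hypothetical datasets: different columns, one shared key column "id"
prices <- data.frame(id = c(1, 2, 3), price = c(10, 20, 30))
stocks <- data.frame(id = c(1, 2), stock = c(5, 0))
labels <- data.frame(id = c(2, 3), label = c("b", "c"), stringsAsFactors = FALSE)

# by = "id" travels through ... to every full_join() call
merged <- multi_join(list(prices, stocks, labels), full_join, by = "id")
print(merged)
```

The result has one row per `id` and one column per variable, with `NA` wherever a dataset has no entry for that `id`.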

This function was inspired by the one found on the blog Coffee and Econometrics in the Morning.