Econometrics and Free Software by Bruno Rodrigues.
RSS feed for blog post updates.
Follow me on Mastodon, twitter, or check out my Github.
Check out my package that adds logging to R functions, {chronicler}.
Or read my free ebooks, to learn some R and build reproducible analytical pipelines..
You can also watch my youtube channel or find the slides to the talks I've given here.
Buy me a coffee, my kids don't let me sleep.

Statistical matching, or when one single data source is not enough


I was recently asked how to go about matching several datasets where different samples of individuals were interviewed. This sounds like a big problem; say that you have dataset A and B, and that A contain one sample of individuals, and B another sample of individuals, then how could you possibly match the datasets? Matching datasets requires a common identifier, for instance, suppose that A contains socio-demographic information on a sample of individuals I, while B, contains information on wages and hours worked on the same sample of individuals I, then yes, it will be possible to match/merge/join both datasets.

But that was not what I was asked about; I was asked about a situation where the same population gets sampled twice, and each sample answers to a different survey. For example the first survey is about labour market information and survey B is about family structure. Would it be possible to combine the information from both datasets?

To me, this sounded a bit like missing data imputation problem, but where all the information about the variables of interest was missing! I started digging a bit, and found that not only there was already quite some literature on it, there is even a package for this, called {StatMatch} with a very detailed vignette. The vignette is so detailed, that I will not write any code, I just wanted to share this package!

Hope you enjoyed! If you found this blog post useful, you might want to follow me on twitter for blog post updates and buy me an espresso or

Buy me an EspressoBuy me an Espresso