Econometrics and Free Software by Bruno Rodrigues.
RSS feed for blog post updates.
Follow me on Mastodon, twitter, or check out my Github.
Check out my package that adds logging to R functions, {chronicler}.
Or read my free ebooks, to learn some R and build reproducible analytical pipelines..
You can also watch my youtube channel or find the slides to the talks I've given here.
Buy me a coffee, my kids don't let me sleep.

Simulated Maximum Likelihood with R


This document details section 12.4.5. Unobserved Heterogeneity Example from Cameron and Trivedi's book - MICROECONOMETRICS: Methods and Applications. The original source code giving the results from table 12.2 are available from the authors' site here and written for Stata. This is an attempt to translate the code to R. I'd like to thank Reddit user anonemouse2010 for his advice which helped me write the function.

Consult the original source code if you want to read the authors' comments. If you want the R source code without all the commentaries, grab it here. This is not guaranteed to work, nor to be correct. It could set your pet on fire and/or eat your first born. Use at your own risk. I may, or may not, expand this example. Corrections, constructive criticism are welcome.

The model is \( y=\theta+u+\varepsilon \) where \( \theta \) is a scalar parameter equal to 1. \( u \) is extreme value type 1 (Gumbel distribution), \( \varepsilon \leadsto \mathbb{N}(0,1) \). For more details, consult the book.

Import the data

You can consult the original source code to see how the authors simulated the data. To get the same results, and verify that I didn't make mistakes I prefer importing their data directly from their website.

data <- read.table("")
u <- data[, 1]
e <- data[, 2]
y <- data[, 3]
numobs <- length(u)
simreps <- 10000


In the code below, the following likelihood function: $$\log{\hat{L}_N(\theta)} = \dfrac{1}{N} \sum_{i=1}^N\log{\big( \dfrac{1}{S}\sum_{s=1}^S \dfrac{1}{\sqrt{2\pi}} \exp \{ -(-y_i-\theta-u_i^s)^2/2 \}\big)}$$ which can be found on page 397 is programmed using the function sapply.

denssim <- function(theta) {
    loglik <- mean(sapply(y, function(y) log(mean((1/sqrt(2 * pi)) * exp(-(y - theta + log(-log(runif(simreps))))^2/2)))))

This likelihood is then maximized:

system.time(res <- optim(0.1, denssim, method = "BFGS", control = list(maxit = simreps)))
##    user  system elapsed 
##   21.98    0.08   22.09

Convergence is achieved pretty rapidly, to

## [1] 1.101

which is close to the true value of the parameter 1 (which was used to generate the data).

Let's try again with another parameter value, for example \( \theta=2.5 \). We have to generate y again:

y2 <- 2.5 + u + e

and slightly modify the likelihood:

denssim2 <- function(theta) {
    loglik <- mean(sapply(y2, function(y2) log(mean((1/sqrt(2 * pi)) * exp(-(y2 - 
        theta + log(-log(runif(simreps))))^2/2)))))

which can then be maximized:

system.time(res2 <- optim(0.1, denssim2, method = "BFGS", control = list(maxit = simreps)))
##    user  system elapsed 
##   12.56    0.00   12.57

The value that maximizes the likelihood is:

## [1] 2.713

which is close to the true value of the parameter 2.5 (which was used to generate the data).