How does lmer (from the R package lme4) compute log likelihood?

The links in the comments contained the answer. Below I've put what the formulae simplify to in this simple example, since the results are somewhat intuitive.

In this example lmer fits a random-intercept model of the form $Y_{ij} = \beta + B_i + \epsilon_{ij}$, where $i$ indexes groups, $j$ indexes observations within a group, and $\epsilon_{ij}$ and $B_i$ are independent zero-mean normals with variances $\sigma^2$ and $\tau^2$ respectively. The joint density of the $Y_{ij}$ and $B_i$ is therefore

$\left(\prod_{i,j}f_{\sigma^2}(y_{ij}-\beta-b_i)\right)\left(\prod_i f_{\tau^2}(b_i)\right)$

where $f_{\sigma^2}$ denotes the density of a zero-mean normal with variance $\sigma^2$:

$f_{\sigma^2}(x)=\frac{1}{\sqrt{2\pi\sigma^2}}e^{-\frac{x^2}{2\sigma^2}}$.

The likelihood is obtained by integrating this with respect to $b_i$ (which isn't observed) to give

$\left(\prod_{i,j}f_{\sigma^2}(y_{ij}-\bar y_i)\right)\left(\prod_i f_{\sigma^2/n_i+\tau^2}(\bar y_i-\beta)\sqrt{2\pi\sigma^2/n_i}\right)$

where $n_i$ is the number of observations from group $i$, and $\bar y_i$ is the mean of observations from group $i$. This is somewhat intuitive since the first term captures spread within each group, which should have variance $\sigma^2$, and the second captures the spread between groups. Note that $\sigma^2/n_i+\tau^2$ is the variance of $\bar y_i$.
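
The closed form can be checked numerically. The following Python sketch is not part of lme4: the data and parameter values are made up, and the function names are my own. It integrates the joint density for a single group over $b_i$ with the trapezoidal rule and compares the result with the per-group factor of the formula above.

```python
import math

def normal_pdf(x, var):
    """Density f_var(x) of a zero-mean normal with variance var."""
    return math.exp(-x * x / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def group_likelihood_closed_form(y, beta, sigma2, tau2):
    """One group's factor of the likelihood, using the closed form."""
    n_i = len(y)
    ybar = sum(y) / n_i
    within = math.prod(normal_pdf(v - ybar, sigma2) for v in y)
    between = normal_pdf(ybar - beta, sigma2 / n_i + tau2)
    return within * between * math.sqrt(2.0 * math.pi * sigma2 / n_i)

def group_likelihood_numeric(y, beta, sigma2, tau2, half_width=10.0, steps=20000):
    """Integrate the joint density over b_i with the trapezoidal rule."""
    sd = math.sqrt(tau2)
    lo, hi = -half_width * sd, half_width * sd
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps + 1):
        b = lo + k * h
        joint = math.prod(normal_pdf(v - beta - b, sigma2) for v in y)
        joint *= normal_pdf(b, tau2)
        weight = 0.5 if k in (0, steps) else 1.0  # trapezoid end points
        total += weight * joint
    return total * h

# Made-up observations for a single group and made-up parameter values.
y = [4.1, 5.3, 4.7]
beta, sigma2, tau2 = 4.0, 0.5, 1.2

closed = group_likelihood_closed_form(y, beta, sigma2, tau2)
numeric = group_likelihood_numeric(y, beta, sigma2, tau2)
print("closed form:", closed)
print("numerical  :", numeric)
```

The two printed values should agree to many digits. The full likelihood is the product of such factors over all groups.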

However, by default (REML=T) lmer maximises not the likelihood but the "REML criterion", obtained by additionally integrating this with respect to $\beta$ to give

$\left(\prod_{i,j}f_{\sigma^2}(y_{ij}-\bar y_i)\right)\left(\prod_i f_{\sigma^2/n_i+\tau^2}(\bar y_i-\hat\beta)\sqrt{2\pi\sigma^2/n_i}\right)\sqrt{\frac{2\pi\sigma^2}{\sum_i\frac{n_i}{1+n_i\theta^2}}}$

where $\theta=\tau/\sigma$ and $\hat\beta$ is given below.
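
The extra factor comes from completing the square in $\beta$. As a sketch, write $w_i=\frac{n_i}{\sigma^2(1+n_i\theta^2)}$ for the reciprocal of the variance $\sigma^2/n_i+\tau^2$ of $\bar y_i$, with $\theta=\tau/\sigma$ (the symbol $w_i$ is my own shorthand, used only for this step). Then

$\sum_i w_i(\bar y_i-\beta)^2=\sum_i w_i(\bar y_i-\hat\beta)^2+\left(\sum_i w_i\right)(\beta-\hat\beta)^2$ with $\hat\beta=\frac{\sum_i w_i\bar y_i}{\sum_i w_i}$

so the integral over $\beta$ is Gaussian:

$\int\prod_i f_{\sigma^2/n_i+\tau^2}(\bar y_i-\beta)\,d\beta=\left(\prod_i f_{\sigma^2/n_i+\tau^2}(\bar y_i-\hat\beta)\right)\sqrt{\frac{2\pi}{\sum_i w_i}}$

and $\frac{2\pi}{\sum_i w_i}=\frac{2\pi\sigma^2}{\sum_i\frac{n_i}{1+n_i\theta^2}}$, which is the final factor of the REML criterion.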

Maximising likelihood (REML=F)

If $\theta=\tau/\sigma$ is fixed, we can explicitly find the $\beta$ and $\sigma$ which maximise the likelihood. Writing $n=\sum_i n_i$ for the total number of observations, they turn out to be

$\hat\beta=\frac{\sum_{i,j}y_{ij}/(1+n_i\theta^2)}{\sum_i n_i/(1+n_i\theta^2)}$

$\hat\sigma^2=\frac{1}{n}\left(\sum_{i,j}(y_{ij}-\bar y_i)^2+\sum_i\frac{n_i}{1+n_i\theta^2}(\bar y_i-\hat\beta)^2\right)$

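Note $\hat\sigma^2$ has two terms, for variation within and between groups, and $\hat\beta$ lies somewhere between the mean of the $y_{ij}$ and the mean of the $\bar y_i$, depending on the value of $\theta$: at $\theta=0$ it is the overall mean of the $y_{ij}$, and as $\theta\to\infty$ it tends to the unweighted mean of the group means $\bar y_i$.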

Substituting these into the likelihood, we can express the log likelihood $l$ in terms of $\theta$ only:

$-2l=\sum_i\log(1+n_i\theta^2)+n(1+\log(2\pi\hat\sigma^2))$

lmer iterates to find the value of $\theta$ which minimises this. In the output, $-2l$ and $l$ are shown in the fields "deviance" and "logLik" (if REML=F) respectively.
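
As a rough illustration of this profiling step, here is a Python sketch using only the standard library. It is not lmer's code: the data are made up, the function names are my own, and a plain golden-section search stands in for whatever optimiser lmer actually uses (it also assumes the profiled deviance has a single minimum on the search interval). It evaluates $-2l$ as a function of $\theta$, minimises it, and cross-checks the result against the likelihood formula with $b_i$ integrated out.

```python
import math

def log_normal_pdf(x, var):
    """Log density of a zero-mean normal with variance var, evaluated at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + x * x / var)

def minus_two_loglik(groups, beta, sigma2, tau2):
    """-2 * log-likelihood from the marginal formula (b_i integrated out)."""
    total = 0.0
    for g in groups:
        n_i = len(g)
        ybar = sum(g) / n_i
        total += sum(log_normal_pdf(v - ybar, sigma2) for v in g)
        total += log_normal_pdf(ybar - beta, sigma2 / n_i + tau2)
        total += 0.5 * math.log(2.0 * math.pi * sigma2 / n_i)
    return -2.0 * total

def profiled_deviance(groups, theta):
    """Return (-2l, beta_hat, sigma2_hat) for fixed theta = tau / sigma."""
    n = sum(len(g) for g in groups)
    means = [sum(g) / len(g) for g in groups]
    weights = [len(g) / (1.0 + len(g) * theta ** 2) for g in groups]
    beta_hat = sum(w * m for w, m in zip(weights, means)) / sum(weights)
    within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    between = sum(w * (m - beta_hat) ** 2 for w, m in zip(weights, means))
    sigma2_hat = (within + between) / n
    dev = sum(math.log(1.0 + len(g) * theta ** 2) for g in groups)
    dev += n * (1.0 + math.log(2.0 * math.pi * sigma2_hat))
    return dev, beta_hat, sigma2_hat

def golden_section_min(f, lo, hi, tol=1e-8):
    """Golden-section search; assumes f has a single minimum on [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc < fd:
            # Minimum lies in [a, d]; the old c becomes the new d.
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:
            # Minimum lies in [c, b]; the old d becomes the new c.
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return (a + b) / 2.0

# Made-up, unbalanced data: one inner list per group.
groups = [[4.1, 5.3, 4.7], [6.2, 6.8], [3.9, 4.4, 4.0, 4.6], [5.5, 5.1, 6.0]]

theta_hat = golden_section_min(lambda t: profiled_deviance(groups, t)[0], 0.0, 10.0)
deviance, beta_hat, sigma2_hat = profiled_deviance(groups, theta_hat)
direct = minus_two_loglik(groups, beta_hat, sigma2_hat, theta_hat ** 2 * sigma2_hat)

print("theta:", theta_hat, "beta:", beta_hat, "sigma^2:", sigma2_hat)
print("deviance:", deviance, "direct check:", direct, "logLik:", -deviance / 2.0)
```

The profiled value and the direct evaluation should match up to rounding, which confirms that the one-parameter expression is the full likelihood evaluated at $\hat\beta$ and $\hat\sigma^2$.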

Maximising restricted likelihood (REML=T)

Since the REML criterion doesn't depend on $\beta$, we use the same estimate for $\beta$ as above. For fixed $\theta$, we estimate $\sigma$ to maximise the REML criterion, which replaces the divisor $n$ by $n-1$:

$\hat\beta=\frac{\sum_{i,j}y_{ij}/(1+n_i\theta^2)}{\sum_i n_i/(1+n_i\theta^2)}$

$\hat\sigma^2=\frac{1}{n-1}\left(\sum_{i,j}(y_{ij}-\bar y_i)^2+\sum_i\frac{n_i}{1+n_i\theta^2}(\bar y_i-\hat\beta)^2\right)$

The restricted log likelihood $l_R$ is given by

$-2l_R=\sum_i\log(1+n_i\theta^2)+(n-1)(1+\log(2\pi\hat\sigma^2))+\log\left(\sum_i\frac{n_i}{1+n_i\theta^2}\right)$

In the output of lmer, $-2l_R$ and $l_R$ are shown in the fields "REMLdev" and "logLik" (if REML=T) respectively.
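
The REML formulas can be checked the same way. The Python sketch below is again not lmer's code: the data are made up, the function names are my own, and $\theta$ is simply fixed at an arbitrary value rather than optimised. It evaluates $-2l_R$ from the closed form, compares it with the REML criterion evaluated factor by factor, and compares the REML and ML estimates of $\sigma^2$ at the same $\theta$.

```python
import math

def log_normal_pdf(x, var):
    """Log density of a zero-mean normal with variance var, evaluated at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + x * x / var)

def pieces(groups, theta):
    """Quantities shared by the ML and REML formulas for fixed theta."""
    means = [sum(g) / len(g) for g in groups]
    weights = [len(g) / (1.0 + len(g) * theta ** 2) for g in groups]
    beta_hat = sum(w * m for w, m in zip(weights, means)) / sum(weights)
    within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    between = sum(w * (m - beta_hat) ** 2 for w, m in zip(weights, means))
    log_terms = sum(math.log(1.0 + len(g) * theta ** 2) for g in groups)
    return beta_hat, within + between, sum(weights), log_terms

def reml_deviance(groups, theta):
    """Return (-2 * l_R, beta_hat, sigma2_hat) for fixed theta = tau / sigma."""
    n = sum(len(g) for g in groups)
    beta_hat, rss, weight_sum, log_terms = pieces(groups, theta)
    sigma2_hat = rss / (n - 1)
    dev = log_terms + (n - 1) * (1.0 + math.log(2.0 * math.pi * sigma2_hat))
    dev += math.log(weight_sum)
    return dev, beta_hat, sigma2_hat

def direct_reml_deviance(groups, beta_hat, sigma2, theta):
    """-2 * log of the REML criterion, evaluated factor by factor."""
    tau2 = theta ** 2 * sigma2
    total = 0.0
    weight_sum = 0.0
    for g in groups:
        n_i = len(g)
        ybar = sum(g) / n_i
        total += sum(log_normal_pdf(v - ybar, sigma2) for v in g)
        total += log_normal_pdf(ybar - beta_hat, sigma2 / n_i + tau2)
        total += 0.5 * math.log(2.0 * math.pi * sigma2 / n_i)
        weight_sum += n_i / (1.0 + n_i * theta ** 2)
    # Extra factor from integrating over beta.
    total += 0.5 * math.log(2.0 * math.pi * sigma2 / weight_sum)
    return -2.0 * total

# Made-up, unbalanced data and an arbitrary fixed theta (not an optimum).
groups = [[4.1, 5.3, 4.7], [6.2, 6.8], [3.9, 4.4, 4.0, 4.6], [5.5, 5.1, 6.0]]
theta = 1.5
n = sum(len(g) for g in groups)

reml_dev, beta_hat, sigma2_reml = reml_deviance(groups, theta)
direct = direct_reml_deviance(groups, beta_hat, sigma2_reml, theta)
sigma2_ml = pieces(groups, theta)[1] / n

print("-2 l_R (closed form):", reml_dev)
print("-2 l_R (direct)     :", direct)
print("sigma^2 REML / ML   :", sigma2_reml / sigma2_ml, "vs n/(n-1) =", n / (n - 1))
```

At any fixed $\theta$ the two estimates of $\sigma^2$ differ exactly by the factor $n/(n-1)$. The fitted values reported by lmer can differ by more than that, because the $\theta$ minimising $-2l_R$ need not equal the one minimising $-2l$.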