What is wrong with this marginalize-out trick
Posted by Yuling Yao on Aug 23, 2022. Consider a normal-normal model with vector data $y$ and scalar parameters $\mu$ and $\sigma$, written in the following Stan code1:
data {
  int<lower=0> N;
  vector[N] y;
  real<lower=0> tau;
}
parameters {
  real mu;
  real<lower=0> sigma;
}
model {
  mu ~ normal(0, tau);
  y ~ normal(mu, sigma);
}
Here $\tau$ is a fixed hyperparameter, say 10. We can make inference on $\mu$ and $\sigma$. That is easy.
But now I decide that I want to apply the marginalize-out trick to get rid of $\mu$. That looks easy because it is a normal-normal model, so the marginalized model would seem to be
data {
  int<lower=0> N;
  vector[N] y;
  real<lower=0> tau;
}
parameters {
  real<lower=0> sigma;
}
model {
  y ~ normal(0, hypot(sigma, tau));
}
The problem is that these two models are not the same. Mathematically, the full joint model reads
\[y\vert \mu , \sigma \sim N(\mu, \sigma^2),~~ \mu\sim N(0, \tau^2).\]
It looks so tempting to marginalize out $\mu$ and write
\[y\vert \sigma \sim N(0, \sigma^2+ \tau^2).\]
But they just cannot be the same: the MAP estimate of $\sigma^2$ in model 1 is roughly $\mathrm{Var}(y)$, while in model 2 it is $\sum_{i=1}^n y_i^2 / n - \tau^2$.
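To see the discrepancy numerically, here is a quick simulation (my own illustration, not from the post; all parameter values are made up). With one draw of $\mu$ far from zero, the two estimates of $\sigma^2$ disagree wildly:

```python
import numpy as np

# Hypothetical values for illustration: one dataset from the full model.
rng = np.random.default_rng(0)
tau, sigma = 10.0, 2.0
mu = 15.0                # one particular draw of mu ~ N(0, tau^2)
N = 100_000
y = rng.normal(mu, sigma, size=N)

# Model 1: sigma^2 estimated by the variance of y around its mean.
sigma2_model1 = y.var()

# Model 2: sigma^2 + tau^2 matched to the raw second moment of y.
sigma2_model2 = np.mean(y**2) - tau**2

print(sigma2_model1)     # close to sigma^2 = 4
print(sigma2_model2)     # close to mu^2 + sigma^2 - tau^2 = 129
```

The second estimate is off by roughly $\mu^2 - \tau^2$ for any single dataset, even though it is unbiased on average over draws of $\mu$.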
The problem is that $y$ is a vector. The $y_i$ are conditionally independent given $\mu$ and $\sigma$, but not when conditioning on $\sigma$ alone. It is true that the marginal distribution of each $y_i$ is
\[y_i\sim N(0, \sigma^2+ \tau^2).\]
However, the joint marginal no longer factorizes: sharing the common mean $\mu$ induces $\mathrm{Cov}(y_i, y_j)= \tau^2$ for $i \neq j$. So the correct marginalized model for $y \vert \sigma$ is a multivariate normal with mean zero and a covariance matrix whose diagonal entries are $\sigma^2+ \tau^2$ and off-diagonal entries are $\tau^2$.
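For completeness, here is one way the correct marginalized model could be written in Stan (my sketch, not from the original post). Note that the dense multi_normal solve costs $O(N^3)$, which can wipe out whatever efficiency the marginalization was meant to buy:

```stan
data {
  int<lower=0> N;
  vector[N] y;
  real<lower=0> tau;
}
parameters {
  real<lower=0> sigma;
}
model {
  // covariance: sigma^2 + tau^2 on the diagonal, tau^2 off the diagonal
  matrix[N, N] Sigma = add_diag(rep_matrix(square(tau), N, N), square(sigma));
  y ~ multi_normal(rep_vector(0, N), Sigma);
}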
The bottom line: marginalization is a great trick to boost computational efficiency. But it is your obligation to validate the conditional independence structure after the marginalization.
-
Bob Carpenter wrote the code. Bob, Charles, and I wasted one hour discussing this toy example. Please do not let our employer know what we are doing. ↩