What is wrong with this marginalize-out trick
Posted by Yuling Yao on Aug 23, 2022. Consider a normal-normal model with vector data $y$ and scalar parameters $\mu$ and $\sigma$, written in the following Stan code1:
data {
  int<lower=0> N;
  vector[N] y;
  real<lower=0> tau;
}
parameters {
  real mu;
  real<lower=0> sigma;
}
model {
  mu ~ normal(0, tau);
  y ~ normal(mu, sigma);
}
Here $\tau$ is a fixed hyperparameter, say 10. We can make inference on $\mu$ and $\sigma$. That is easy.
But now I decide that I want to apply the marginalize-out trick to get rid of $\mu$. That looks easy because it is a normal-normal model, so the marginalized model would seem to be
data {
  int<lower=0> N;
  vector[N] y;
  real<lower=0> tau;
}
parameters {
  real<lower=0> sigma;
}
model {
  y ~ normal(0, hypot(sigma, tau));
}
The problem is that these two models are not the same. Mathematically, the full joint model reads
\[y\vert \mu , \sigma \sim N(\mu, \sigma^2),~~ \mu\sim N(0, \tau^2).\]
It looks so tempting to marginalize out $\mu$ and write
\[y\vert \sigma \sim N(0, \sigma^2+ \tau^2).\]
But they just cannot be the same: the MAP estimate of $\sigma^2$ in model 1 is roughly $\mathrm{Var}(y)$, while in model 2 it is $\sum_{i=1}^n y_i^2 / n - \tau^2$.
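To see the discrepancy numerically, here is a quick simulation (my own illustration, not from the post; all parameter values are made up). With one draw of $\mu$ far from zero, the two estimates of $\sigma^2$ disagree wildly:

```python
import numpy as np

# Hypothetical values for illustration: one dataset from the full model.
rng = np.random.default_rng(0)
tau, sigma = 10.0, 2.0
mu = 15.0                # one particular draw of mu ~ N(0, tau^2)
N = 100_000
y = rng.normal(mu, sigma, size=N)

# Model 1: sigma^2 estimated by the variance of y around its mean.
sigma2_model1 = y.var()

# Model 2: sigma^2 + tau^2 matched to the raw second moment of y.
sigma2_model2 = np.mean(y**2) - tau**2

print(sigma2_model1)     # close to sigma^2 = 4
print(sigma2_model2)     # close to mu^2 + sigma^2 - tau^2 = 129
```

The second estimate is off by roughly $\mu^2 - \tau^2$ for any single dataset, even though it is unbiased on average over draws of $\mu$.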
The problem is that $y$ is a vector. The $y_i$ are conditionally independent given $\mu$ and $\sigma$, but not when conditioning on $\sigma$ alone. It is true that the marginal distribution of each $y_i$ is
\[y_i\sim N(0, \sigma^2+ \tau^2).\]
However, the joint marginal no longer factorizes: sharing the common mean $\mu$ induces $\mathrm{Cov}(y_i, y_j)= \tau^2$ for $i \neq j$. So the correct marginalized model for $y \vert \sigma$ is a multivariate normal with mean zero and a covariance matrix whose diagonal entries are $\sigma^2+ \tau^2$ and off-diagonal entries are $\tau^2$.
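For completeness, here is one way the correct marginalized model could be written in Stan (my sketch, not from the original post). Note that the dense multi_normal solve costs $O(N^3)$, which can wipe out whatever efficiency the marginalization was meant to buy:

```stan
data {
  int<lower=0> N;
  vector[N] y;
  real<lower=0> tau;
}
parameters {
  real<lower=0> sigma;
}
model {
  // covariance: sigma^2 + tau^2 on the diagonal, tau^2 off the diagonal
  matrix[N, N] Sigma = add_diag(rep_matrix(square(tau), N, N), square(sigma));
  y ~ multi_normal(rep_vector(0, N), Sigma);
}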
The bottom line: marginalization is a great trick to boost computational efficiency. But it is your obligation to validate the conditional independence structure after the marginalization.
-
Bob Carpenter wrote the code. Bob, Charles, and I wasted one hour discussing this toy example. Please do not let our employer know what we are doing. ↩