Alternatives to two-stage modeling
Posted by Yuling Yao on Aug 22, 2022. Sometimes a model can be decomposed into modules and we may run inference separately. This task comes up a lot in cut feedback, SMC, causal inference (two-stage regression), multiple imputation, and PK/PD modeling.
For the easiest example, consider a Stan model with data y
and parameters mu
and sigma:
y ~ normal(mu, sigma);
For some reason, we have already fitted mu
in a different module or from a different dataset, and have obtained draws $\mu_1, \dots, \mu_S$. The goal is to make inference on $p(\sigma \vert y, \mu_1, \dots, \mu_S)$.
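To have a concrete playground for the approaches below, here is a hypothetical numpy stand-in for the first stage. All the numbers (sample size, true parameters) are made up for illustration; in a real workflow the draws $\mu_1, \dots, \mu_S$ would come from fitting the first-stage model, not from this shortcut:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: n observations from normal(mu_true, sigma_true).
n, mu_true, sigma_true = 50, 2.0, 1.5
y = rng.normal(mu_true, sigma_true, size=n)

# Stand-in for the first-stage module: S posterior draws of mu,
# sampled here from the (known-sigma) conjugate posterior as a shortcut.
S = 1000
mu_draws = rng.normal(y.mean(), sigma_true / np.sqrt(n), size=S)

print(mu_draws.mean())  # centered near the sample mean of y
```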
To be clear, by now we have already lost full Bayesianity, since we do not fit a joint model. But hey, we are inclusive of non-Bayesian methods.
There are three seemingly reasonable approaches for the second-stage model:

Multiple imputation. We run the model
y ~ normal(mu[i], sigma);
separately for each $i$ and collect draws from $p(\sigma \vert y, \mu_i)$; we then mix all these draws together. This is the method used in multiple imputation and cut. 
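With the toy simulated data above, this mixing step can be sketched in numpy without calling Stan at all, because under the (assumed) reference prior $p(\sigma^2) \propto 1/\sigma^2$ the conditional posterior of $\sigma^2$ given $\mu_i$ is inverse-gamma and can be sampled exactly:

```python
import numpy as np

rng = np.random.default_rng(1)

# Same hypothetical simulated data and first-stage draws as before.
n, S = 50, 1000
y = rng.normal(2.0, 1.5, size=n)
mu_draws = rng.normal(y.mean(), 1.5 / np.sqrt(n), size=S)

# Residual sum of squares for each first-stage draw mu_i.
rss = ((y[None, :] - mu_draws[:, None]) ** 2).sum(axis=1)  # shape (S,)

# One draw of sigma^2 from p(sigma^2 | y, mu_i) = Inv-Gamma(n/2, rss_i/2)
# per mu_i, via the gamma representation; mixing is just concatenation.
sigma2_mi = rss / (2.0 * rng.gamma(n / 2.0, size=S))
sigma_mi = np.sqrt(sigma2_mi)  # the mixed multiple-imputation draws

print(sigma_mi.mean())
```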
Plugin estimate. When we do a two-stage least-squares fit, we simply plug in the first-stage point estimate, say the posterior mean. This amounts to a new model $y \sim \mathrm{normal}(\bar{\mu}, \sigma)$, where $\bar{\mu} = \frac{1}{S} \sum_{i=1}^S \mu_i$.
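On the same toy simulation, the plug-in approach is one line: for a normal likelihood with known mean, the maximum-likelihood estimate of $\sigma$ is closed-form, so no sampling is needed (again, a sketch on hypothetical data, not the post's actual fit):

```python
import numpy as np

rng = np.random.default_rng(1)

# Same hypothetical simulated data and first-stage draws as before.
n, S = 50, 1000
y = rng.normal(2.0, 1.5, size=n)
mu_draws = rng.normal(y.mean(), 1.5 / np.sqrt(n), size=S)

# Plug in the first-stage posterior mean mu_bar ...
mu_bar = mu_draws.mean()
# ... and estimate sigma in the model y ~ normal(mu_bar, sigma) by
# maximum likelihood: the root mean squared residual.
sigma_plugin = np.sqrt(((y - mu_bar) ** 2).mean())

print(sigma_plugin)
```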

Mixed log likelihood. At least seemingly doable, we may also mix the log densities across these draws, which in Stan reads
for (i in 1:S)
  target += normal_lpdf(y | mu[i], sigma) / S;
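On the toy simulation, this averaged log density can be written down directly: averaging normal log densities over the $\mu_i$ draws only involves the mean residual sum of squares, so maximizing it over a grid of $\sigma$ values (a crude stand-in for MCMC) takes a few lines:

```python
import numpy as np

rng = np.random.default_rng(1)

# Same hypothetical simulated data and first-stage draws as before.
n, S = 50, 1000
y = rng.normal(2.0, 1.5, size=n)
mu_draws = rng.normal(y.mean(), 1.5 / np.sqrt(n), size=S)

# Per-draw residual sum of squares.
rss = ((y[None, :] - mu_draws[:, None]) ** 2).sum(axis=1)

# Averaging the normal log density over the mu draws gives, up to a
# constant in sigma, -n*log(sigma) - mean(rss) / (2*sigma^2).
sigma_grid = np.linspace(0.5, 3.0, 500)
avg_loglik = -n * np.log(sigma_grid) - rss.mean() / (2 * sigma_grid**2)
sigma_mixed = sigma_grid[np.argmax(avg_loglik)]

print(sigma_mixed)
```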
In this model example, the mixed-log-likelihood approach is identical to the plugin estimate, although in general all three methods will differ. Using the conditional variance formula, we can see that multiple imputation delivers the largest estimate of $\sigma$.
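The conditional variance formula invoked here is the law of total variance: mixed draws satisfy Var(total) = E[within-group Var] + Var[group means], so the multiple-imputation mixture is always at least as dispersed as a typical single conditional posterior. A quick numpy check on synthetic grouped draws (the group means and spread are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)

# Three groups of draws with different means, mimicking the conditional
# posteriors p(sigma | y, mu_i) being mixed; equal group sizes.
groups = [rng.normal(m, 0.3, size=400) for m in (1.2, 1.5, 1.9)]

within = np.mean([g.var() for g in groups])    # E[Var within group]
between = np.var([g.mean() for g in groups])   # Var[group means]
total = np.concatenate(groups).var()           # variance of the mixture

# With equal group sizes this identity holds exactly, not just on average.
print(total, within + between)
```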
OK, I know that in most cases approach 1 is the only acceptable answer. The justification is straight from Bayes' rule:
\[p(\sigma \vert y) = \int p(\sigma \vert y, \mu) p(\mu \vert y ) d\mu.\]My controversial objection is that Bayes' rule is only relevant if we are running a joint model and inferring $\mu$ and $\sigma$ together. That is SMC. But in a situation like cut, we are placing doubt on the model in the first place, and still keeping the obsession over Bayes' rule seems a little bit stubborn to me.
Approaches 1 and 3 differ in how they mix the conditional sampling model $p(y \vert \sigma, \mu)$. Approach 1 uses a mixture (coherent with the joint model)
\[p(y \vert \sigma) := \int p(y \vert \sigma, \mu) p(\mu \vert \sigma) d\mu,\]while approach 3 uses log-linear pooling (this does not correspond to any joint model):
\[\log p(y \vert \sigma) := \int \log p(y \vert \sigma, \mu) p(\mu \vert y) d\mu + \mathrm{constant}.\]I wonder whether this approach 3 has any actual application. I do not know.