Alternatives to two-stage modeling
Posted by Yuling Yao on Aug 22, 2022. Sometimes a model can be decomposed into modules and we may run inference separately. This task comes up a lot in cut feedback, SMC, causal inference (two-stage regression), multiple imputation, and PK/PD modeling.
For the easiest example, consider a Stan model with data y
and parameters mu
and sigma:
y ~ normal(mu, sigma);
For some reason, we have already fitted mu
in a different module or from a different dataset, and have obtained draws $\mu_1, \dots, \mu_S$. The goal is to make inference on $p(\sigma \vert y, \mu_1, \dots, \mu_S)$.
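To have a concrete playground for the approaches below, here is a hypothetical numpy stand-in for the first stage. All the numbers (sample size, true parameters) are made up for illustration; in a real workflow the draws $\mu_1, \dots, \mu_S$ would come from fitting the first-stage model, not from this shortcut:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: n observations from normal(mu_true, sigma_true).
n, mu_true, sigma_true = 50, 2.0, 1.5
y = rng.normal(mu_true, sigma_true, size=n)

# Stand-in for the first-stage module: S posterior draws of mu,
# sampled here from the (known-sigma) conjugate posterior as a shortcut.
S = 1000
mu_draws = rng.normal(y.mean(), sigma_true / np.sqrt(n), size=S)

print(mu_draws.mean())  # centered near the sample mean of y
```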
To be clear, by now we have already lost full Bayesianity, since we do not fit a joint model. But hey, we are inclusive of non-Bayesian methods.
There are three seemingly reasonable approaches for the second-stage model:

Multiple imputation. We run the model
y ~ normal(mu[i], sigma);
separately for each $i$ and collect draws from $p(\sigma \vert y, \mu_i)$; we then mix all these draws together. This is the method used in multiple imputation and cut. 
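With the toy simulated data above, this mixing step can be sketched in numpy without calling Stan at all, because under the (assumed) reference prior $p(\sigma^2) \propto 1/\sigma^2$ the conditional posterior of $\sigma^2$ given $\mu_i$ is inverse-gamma and can be sampled exactly:

```python
import numpy as np

rng = np.random.default_rng(1)

# Same hypothetical simulated data and first-stage draws as before.
n, S = 50, 1000
y = rng.normal(2.0, 1.5, size=n)
mu_draws = rng.normal(y.mean(), 1.5 / np.sqrt(n), size=S)

# Residual sum of squares for each first-stage draw mu_i.
rss = ((y[None, :] - mu_draws[:, None]) ** 2).sum(axis=1)  # shape (S,)

# One draw of sigma^2 from p(sigma^2 | y, mu_i) = Inv-Gamma(n/2, rss_i/2)
# per mu_i, via the gamma representation; mixing is just concatenation.
sigma2_mi = rss / (2.0 * rng.gamma(n / 2.0, size=S))
sigma_mi = np.sqrt(sigma2_mi)  # the mixed multiple-imputation draws

print(sigma_mi.mean())
```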
Plugin estimate. When we do a two-stage least-squares fit, we simply plug in the first-stage point estimate, say the posterior mean. This amounts to a new model $y \sim \mathrm{normal}(\bar{\mu}, \sigma)$, where $\bar{\mu} = \frac{1}{S} \sum_{i=1}^S \mu_i$.
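On the same toy simulation, the plug-in approach is one line: for a normal likelihood with known mean, the maximum-likelihood estimate of $\sigma$ is closed-form, so no sampling is needed (again, a sketch on hypothetical data, not the post's actual fit):

```python
import numpy as np

rng = np.random.default_rng(1)

# Same hypothetical simulated data and first-stage draws as before.
n, S = 50, 1000
y = rng.normal(2.0, 1.5, size=n)
mu_draws = rng.normal(y.mean(), 1.5 / np.sqrt(n), size=S)

# Plug in the first-stage posterior mean mu_bar ...
mu_bar = mu_draws.mean()
# ... and estimate sigma in the model y ~ normal(mu_bar, sigma) by
# maximum likelihood: the root mean squared residual.
sigma_plugin = np.sqrt(((y - mu_bar) ** 2).mean())

print(sigma_plugin)
```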

Mixed log likelihood. At least seemingly doable, we may also mix the log densities across these draws, which in Stan reads
for (i in 1:S)
  target += normal_lpdf(y | mu[i], sigma) / S;
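On the toy simulation, this averaged log density can be written down directly: averaging normal log densities over the $\mu_i$ draws only involves the mean residual sum of squares, so maximizing it over a grid of $\sigma$ values (a crude stand-in for MCMC) takes a few lines:

```python
import numpy as np

rng = np.random.default_rng(1)

# Same hypothetical simulated data and first-stage draws as before.
n, S = 50, 1000
y = rng.normal(2.0, 1.5, size=n)
mu_draws = rng.normal(y.mean(), 1.5 / np.sqrt(n), size=S)

# Per-draw residual sum of squares.
rss = ((y[None, :] - mu_draws[:, None]) ** 2).sum(axis=1)

# Averaging the normal log density over the mu draws gives, up to a
# constant in sigma, -n*log(sigma) - mean(rss) / (2*sigma^2).
sigma_grid = np.linspace(0.5, 3.0, 500)
avg_loglik = -n * np.log(sigma_grid) - rss.mean() / (2 * sigma_grid**2)
sigma_mixed = sigma_grid[np.argmax(avg_loglik)]

print(sigma_mixed)
```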
In this model example, the mixed-log-likelihood approach is identical to the plugin estimate, although in general all three methods will differ. Using the conditional variance formula, we can see that multiple imputation delivers the largest estimate of $\sigma$.
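The conditional variance formula invoked here is the law of total variance: mixed draws satisfy Var(total) = E[within-group Var] + Var[group means], so the multiple-imputation mixture is always at least as dispersed as a typical single conditional posterior. A quick numpy check on synthetic grouped draws (the group means and spread are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)

# Three groups of draws with different means, mimicking the conditional
# posteriors p(sigma | y, mu_i) being mixed; equal group sizes.
groups = [rng.normal(m, 0.3, size=400) for m in (1.2, 1.5, 1.9)]

within = np.mean([g.var() for g in groups])    # E[Var within group]
between = np.var([g.mean() for g in groups])   # Var[group means]
total = np.concatenate(groups).var()           # variance of the mixture

# With equal group sizes this identity holds exactly, not just on average.
print(total, within + between)
```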
OK, I know that in most cases approach 1 is the only acceptable answer. The justification is straight from Bayes' rule:
\[p(\sigma \vert y) = \int p(\sigma \vert y, \mu) p(\mu \vert y ) d\mu.\]My controversial objection is that Bayes' rule is only relevant if we are running a joint model and inferring $\mu$ and $\sigma$ together. That is SMC. But in a situation like cut, we are placing doubt on the model in the first place, and still keeping the obsession over Bayes' rule seems a little bit stubborn to me.
Approaches 1 and 3 differ in how they mix the conditional sampling model $p(y \vert \sigma, \mu)$. Approach 1 uses a mixture (coherent with the joint model)
\[p(y \vert \sigma) := \int p(y \vert \sigma, \mu) p(\mu \vert \sigma) d\mu,\]while approach 3 uses log-linear pooling (this does not correspond to any joint model):
\[\log p(y \vert \sigma) := \int \log p(y \vert \sigma, \mu) p(\mu \vert y) d\mu + \mathrm{constant}.\]I wonder whether this approach 3 has any actual application. I do not know.