Does MAP estimate always overfit?
Posted by Yuling Yao on Oct 21, 2020.
No, the claim that MAP always overfits is wrong. Indeed, the opposite can happen.
Well, just to be clear, here I am talking about a specific situation: we have a model and a dataset, such that we can write down the joint posterior density. To estimate the parameters, we can use either of the following:
- posterior simulation: draws from the joint posterior density, for which we run MCMC.
- MAP estimate: the joint mode of the density, for which we run an optimizer such as SGD.
Consider any hierarchical model of the form \(y_{ik}\sim \mathrm{N}(\mu_k, \sigma); ~~ \mu_k \sim \mathrm{N}(\mu_0, \tau); ~~ \tau \sim 1\), where \(\tau \sim 1\) denotes a flat prior on \(\tau\).
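To see where the mode sits, it may help to write out the log joint density. This is only a sketch under assumptions the post does not state: \(\sigma\) is treated as known, \(\mu_0\) gets a flat prior, and \(K\) (the number of groups) and \(n_k\) (the number of observations in group \(k\)) are notation introduced here.

\[
\log p(\mu_1, \dots, \mu_K, \mu_0, \tau \mid y)
= \mathrm{const}
- \frac{1}{2\sigma^2} \sum_{k=1}^{K} \sum_{i=1}^{n_k} (y_{ik} - \mu_k)^2
- K \log \tau
- \frac{1}{2\tau^2} \sum_{k=1}^{K} (\mu_k - \mu_0)^2 .
\]

When \(\mu_k = \mu_0\) for all \(k\), the last sum is zero and the only dependence on \(\tau\) is through \(-K \log \tau\).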
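The joint mode is achieved at the complete pooling subspace: $\tau=0, \mu_k = \mu_0$. Strictly speaking, the joint density is unbounded there, since it grows without limit as $\tau \to 0$ with every $\mu_k$ equal to $\mu_0$. This complete pooling model is always simpler than the hierarchical model, so its gap between training and test error is smaller. If anything, the MAP underfits.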
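A quick numerical check of this claim, as a minimal sketch rather than anything from the original analysis: the data are simulated, \(\sigma\) is fixed at 1, \(\mu_0\) is given a flat prior, and `log_joint` is a hypothetical helper name. It evaluates the unnormalized log joint density on the complete pooling subspace for a shrinking sequence of \(\tau\) values.

```python
import numpy as np


def log_joint(mu, mu0, tau, y, sigma=1.0):
    """Unnormalized log joint posterior density of the hierarchical model.

    Assumes sigma is known and mu0, tau have flat priors (illustrative choices).
    y has shape (K, n): K groups with n observations each.
    """
    K = y.shape[0]
    log_lik = -0.5 * np.sum((y - mu[:, None]) ** 2) / sigma**2
    log_prior = -K * np.log(tau) - 0.5 * np.sum((mu - mu0) ** 2) / tau**2
    return log_lik + log_prior


# Simulated data: 8 groups, 5 observations each (made up for illustration).
rng = np.random.default_rng(0)
K, n = 8, 5
true_mu = rng.normal(0.0, 2.0, size=K)
y = true_mu[:, None] + rng.normal(0.0, 1.0, size=(K, n))

# Complete pooling: every group mean equals mu0, here set to the grand mean.
grand_mean = y.mean()
pooled_mu = np.full(K, grand_mean)

taus = [1.0, 1e-2, 1e-4, 1e-8]
pooled_values = [log_joint(pooled_mu, grand_mean, t, y) for t in taus]

for t, v in zip(taus, pooled_values):
    print(f"tau = {t:g}: log joint density = {v:.2f}")
```

Each hundredfold shrink of \(\tau\) adds \(K \log 100\) to the log density, so an optimizer chasing the joint mode is pulled toward \(\tau = 0\) no matter how much the group means actually differ.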
Of course, I believe there are other examples in which MAP does overfit. The relation is just not definite. Is there a general characterization? I don’t know.