Does the MAP estimate always overfit?

Posted by Yuling Yao on Oct 21, 2020.       Tag: modeling  

The claim that it always does is wrong. Indeed, the opposite can happen.

Well, just to be clear, here I am talking about a specific situation: we have a model and a dataset such that we can write down the joint posterior density. To estimate the parameters, we can use

  1. posterior simulation: draws from the joint posterior density, for which we run MCMC.
  2. MAP estimate: the joint mode of the posterior density, for which we run an optimizer such as SGD.

Consider any hierarchical model of the form \(y_{ik}\sim \mathrm{N}(\mu_k, \sigma); ~~ \mu_k \sim \mathrm{N}(\mu_0, \tau); ~~ \tau \sim 1\), where \(\tau \sim 1\) denotes a flat prior on \(\tau\).

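The joint mode is attained on the complete-pooling subspace, $\tau=0, \mu_k = \mu_0$, because the prior density of each $\mu_k$ grows without bound as $\tau \to 0$ there. This complete-pooling model is always simpler than the hierarchical model, in the sense that its gap between training and test error is smaller. If anything, the MAP estimate underfits.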
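A quick numerical check of this, as a rough sketch rather than a definitive implementation. Up to an additive constant, the joint log density is $-\sum_{i,k}(y_{ik}-\mu_k)^2/(2\sigma^2) - K\log\tau - \sum_k(\mu_k-\mu_0)^2/(2\tau^2)$, where $K$ is the number of groups. The code evaluates it along the complete-pooling subspace for shrinking $\tau$ and compares it with a no-pooling point. The simulated data, the choice $\sigma = 1$ (treated as known), fixing $\mu_0$ at the grand mean, and the function name `joint_log_density` are all my assumptions for illustration.

```python
import numpy as np

def joint_log_density(y, mu, mu0, tau, sigma=1.0):
    """Unnormalized joint log density of (mu, tau) given y, flat prior on tau.

    y has shape (K, n): K groups with n observations each.
    Assumes sigma is known and mu0 is fixed (both are illustrative choices).
    """
    log_lik = -0.5 * np.sum((y - mu[:, None]) ** 2) / sigma**2
    log_prior = -len(mu) * np.log(tau) - 0.5 * np.sum((mu - mu0) ** 2) / tau**2
    return log_lik + log_prior

# Simulated data (hypothetical): 8 groups, 5 observations per group.
rng = np.random.default_rng(0)
K, n = 8, 5
true_mu = rng.normal(0.0, 2.0, size=K)
y = true_mu[:, None] + rng.normal(size=(K, n))

mu0 = y.mean()                    # grand mean
pooled = np.full(K, mu0)          # complete pooling: mu_k = mu0 for every k
group_means = y.mean(axis=1)      # no pooling: mu_k = mean of group k

# Walk toward tau = 0 along the complete-pooling subspace.
taus = [1.0, 1e-3, 1e-10, 1e-30]
pooled_values = [joint_log_density(y, pooled, mu0, t) for t in taus]

# A no-pooling point, with tau set to the spread of the group means.
no_pool_value = joint_log_density(y, group_means, mu0, group_means.std())

for t, v in zip(taus, pooled_values):
    print(f"tau = {t:g}: log density under complete pooling = {v:.1f}")
print(f"no pooling: log density = {no_pool_value:.1f}")
```

On the complete-pooling subspace the quadratic prior term vanishes and only $-K\log\tau$ depends on $\tau$, so the printed values keep increasing as $\tau$ shrinks. Strictly speaking the density is unbounded there, so an optimizer is pushed toward $\tau = 0$ rather than converging to a finite maximum.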

Of course, I believe there are other examples in which the MAP estimate does overfit. The relation is just not definite. Is there a general characterization? I don’t know.