The second coming of Cauchy

Posted by Yuling Yao on Nov 29, 2019.       Tag: modeling  

A good Cauchy

I am preparing a paper in which I use an old example (one that I thought I was the first to come up with, a few years ago, in an email exchange with Andrew). I have posted it before. It starts with a Cauchy likelihood

\[y\sim Cauchy(\theta,1)\]

and assumes that half of the true data are generated from Cauchy(-10,1) and the remaining half from Cauchy(10,1); in other words, the true data-generating process (DG) has

\[\theta \sim \frac{1}{2} \left(\delta(-10)+ \delta(10)\right).\]

The Bayesian posterior will then be bimodal, although only one of the modes will predominate. But that is fine: if the DG is bimodal, how can the posterior not reflect the bimodality?
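To make the bimodality concrete, here is a minimal NumPy sketch that evaluates the posterior of $\theta$ on a grid. The flat prior, the sample size, the seed, and the grid are all my assumptions for illustration; none of them is specified above.

```python
import numpy as np

def cauchy_log_posterior(y, theta_grid):
    """Log posterior of theta on a grid, up to an additive constant.

    Assumes a Cauchy(theta, 1) likelihood and a flat prior on theta
    (the flat prior is an assumption of this sketch).
    """
    resid = y[:, None] - theta_grid[None, :]
    return -np.log1p(resid ** 2).sum(axis=0)

# Simulate the DG: half of the data from Cauchy(-10, 1), half from Cauchy(10, 1).
rng = np.random.default_rng(2019)   # arbitrary seed
n = 200                             # arbitrary sample size
centers = np.repeat([-10.0, 10.0], n // 2)
y = centers + rng.standard_cauchy(n)

theta_grid = np.linspace(-20.0, 20.0, 4001)
step = theta_grid[1] - theta_grid[0]
log_post = cauchy_log_posterior(y, theta_grid)

# Normalise on the grid so that the posterior integrates to one.
post = np.exp(log_post - log_post.max())
post /= post.sum() * step

# Highest point on each side of zero, and the posterior mass on each side.
neg, pos = theta_grid < 0, theta_grid > 0
mode_neg = theta_grid[neg][np.argmax(log_post[neg])]
mode_pos = theta_grid[pos][np.argmax(log_post[pos])]
mass_neg = post[neg].sum() * step
mass_pos = post[pos].sum() * step

print("mode on the negative side:", mode_neg, "posterior mass:", mass_neg)
print("mode on the positive side:", mode_pos, "posterior mass:", mass_pos)
```

The log posterior has a peak near each of $\pm 10$, but the gap in log density between the two peaks grows with the sample size, so after normalisation nearly all of the mass typically sits on one side.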

We also have a way to adjust it so as to “recover the true DG from the wrong model and wrong inference”, but that is irrelevant to today’s post.

A bad Cauchy

What I found today is that an essentially similar toy example was introduced by Persi Diaconis and David Freedman in their famous 1986 paper as a counterexample. Suppose we have a location parameter $\theta$ and we observe

\[X= \theta+ \epsilon,\]

with a prior $\theta \sim N(0,1)$ and an error distribution $\epsilon\sim DP(MC)$, where DP is a Dirichlet process, M is a scalar constant, and C is the base measure, a standard Cauchy.

In this example, suppose the true DG of $\epsilon$ puts equal mass on two points: \(\epsilon \sim \frac{1}{2} (\delta(-a)+ \delta(a)).\)

Then the Bayesian posterior of $\theta$ is asymptotically supported only on $\theta_0\pm \sqrt{a^2-1}$, which differs from the true value $\theta_0$.
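Here is a heuristic way to see where the square root of $a^2-1$ comes from. This is only a sketch under my own simplifying assumption, not the argument in the Diaconis and Freedman paper: suppose the dependence on $\theta$ enters through the product of standard Cauchy densities evaluated at the two residuals $\theta_0+a-\theta$ and $\theta_0-a-\theta$. Writing $t=\theta-\theta_0$ and $u=t^2$, maximizing that product is the same as minimizing

\[\left(1+(t-a)^2\right)\left(1+(t+a)^2\right) = (u-a^2)^2+2u+2a^2+1.\]

Setting the derivative in $u$ to zero gives $2(u-a^2)+2=0$, so for $a>1$

\[u=a^2-1, \qquad \theta=\theta_0\pm\sqrt{a^2-1}.\]

For $a\le 1$ the minimum over $u\ge 0$ is at $u=0$, that is, at $\theta=\theta_0$.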

Alternatively, if the base measure of the Dirichlet process is normal rather than Cauchy, the posterior of $\theta$ does converge to the true value $\theta_0$.

Example or counterexample?

The funny thing is that we are using this example in opposite ways: Diaconis and Freedman call the Cauchy one a counterexample because the inference fails to converge to the true value. When I constructed this model, it was the normal behavior that I regarded as the pitfall: effectively we are using a one-component normal to approximate a two-component normal mixture, and what could be more wrong than a posterior density concentrated at the midpoint?
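The pitfall of the normal model can be illustrated with a small NumPy sketch. It assumes a flat prior on $\theta$, a $N(\theta,1)$ likelihood, and data from an equal mixture of $N(-a,1)$ and $N(a,1)$ with $a=10$; the sample size, the seed, and the helper names are hypothetical choices of mine.

```python
import numpy as np

def normal_pdf(x, mean=0.0, sd=1.0):
    """Density of N(mean, sd^2) at x."""
    z = (x - mean) / sd
    return np.exp(-0.5 * z ** 2) / (sd * np.sqrt(2.0 * np.pi))

def mixture_pdf(x, a):
    """Density of the true DG: an equal mixture of N(-a, 1) and N(a, 1)."""
    return 0.5 * normal_pdf(x, -a) + 0.5 * normal_pdf(x, a)

rng = np.random.default_rng(2019)   # arbitrary seed
n, a = 1000, 10.0                   # arbitrary sample size and separation
y = np.repeat([-a, a], n // 2) + rng.standard_normal(n)

# With a flat prior and a N(theta, 1) likelihood, the posterior of theta
# is N(mean(y), 1/n), so it piles up around the sample mean.
post_mean = y.mean()
post_sd = 1.0 / np.sqrt(n)

print("posterior mean and sd:", post_mean, post_sd)
print("true density at the posterior mean:", mixture_pdf(post_mean, a))
print("true density at a component center:", mixture_pdf(a, a))
```

The posterior concentrates tightly around the midpoint, which is exactly where the true density is essentially zero.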

This is the inevitable pluralist’s dilemma.