Book review "Discrete Distribution"

Posted by Yuling Yao on Nov 21, 2020. Tag: book

Today I was reading the book “Discrete Distribution” by Johnson and Kotz. I did not realize it has a newer edition until I started this blog post; the edition I read was published in 1969 by Wiley.

Nevertheless, it feels almost exotic to notice how much the focus of study has shifted in half a century. As a publication from the pre-computer era, this book devotes a lot of space to various approximation tricks. For example, for a binomial variable $X\sim Bin( \theta, N)$, a transformation

\[\mathrm{arcsin}\sqrt{\frac{X+3/8}{N+3/4}}\]

leads to the approximation normal$(\sin^{-1}\sqrt{\theta},\ 1/(2\sqrt{N}))$, where the second argument is the standard deviation. But that is not the end of the story, because the authors spend the next page showing that the transformation

\[\frac{1}{2}\left(\mathrm{arcsin}\sqrt{\frac{X}{N+1}}+\mathrm{arcsin}\sqrt{\frac{X+1}{N+1}}\right)\]

gives an even better normal approximation, with an improvement in some decimal places.
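To get a feel for how close these approximations are, here is a minimal sketch that computes the exact mean and variance of each transformed variable by summing over the binomial probability mass function. The function names and the choice $N=50$, $\theta=0.3$ are my own assumptions for illustration, not something from the book, and I read the second argument of the normal approximation as a standard deviation, so the target variance is $1/(4N)$.

```python
import math


def binomial_pmf(x, n, theta):
    """Probability that a Binomial(n, theta) variable equals x."""
    return math.comb(n, x) * theta**x * (1 - theta) ** (n - x)


def arcsine_shifted(x, n):
    """arcsin sqrt((x + 3/8) / (n + 3/4)), the first transformation."""
    return math.asin(math.sqrt((x + 3 / 8) / (n + 3 / 4)))


def arcsine_averaged(x, n):
    """Average of arcsin sqrt(x/(n+1)) and arcsin sqrt((x+1)/(n+1))."""
    first = math.asin(math.sqrt(x / (n + 1)))
    second = math.asin(math.sqrt((x + 1) / (n + 1)))
    return 0.5 * (first + second)


def exact_moments(transform, n, theta):
    """Exact mean and variance of transform(X, n) for X ~ Binomial(n, theta)."""
    probs = [binomial_pmf(x, n, theta) for x in range(n + 1)]
    values = [transform(x, n) for x in range(n + 1)]
    mean = sum(p * v for p, v in zip(probs, values))
    var = sum(p * (v - mean) ** 2 for p, v in zip(probs, values))
    return mean, var


if __name__ == "__main__":
    n, theta = 50, 0.3  # illustrative values, chosen arbitrarily
    print("target mean:", math.asin(math.sqrt(theta)))
    print("target variance:", 1 / (4 * n))
    for name, f in [("shifted", arcsine_shifted), ("averaged", arcsine_averaged)]:
        mean, var = exact_moments(f, n, theta)
        print(name, "mean:", mean, "variance:", var)
```

Because the moments are computed by exact summation rather than simulation, the comparison involves no Monte Carlo noise, which matters when the differences between the two transformations sit in the later decimal places.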

There are many tricks like this in the book, toward which I have conflicted feelings. On one hand, they all look cute and stand for an extensive intellectual effort. On the other hand, they are really useless nowadays as an estimation technique. We no longer need this normal approximation, because we can easily generate any posterior simulation almost for free.
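As a rough illustration of how cheap simulation has become, the sketch below draws directly from the posterior of $\theta$ in the binomial example, with no transformation involved. It assumes a uniform Beta(1, 1) prior, which is my choice for the example and not something the book prescribes; the function names and the data $x=15$, $N=50$ are likewise made up.

```python
import random
import statistics


def posterior_draws(x, n, num_draws=20000, seed=0):
    """Draw theta from its posterior given x successes in n trials.

    Assumes a Beta(1, 1) prior, so the posterior is Beta(x + 1, n - x + 1).
    """
    rng = random.Random(seed)
    return [rng.betavariate(x + 1, n - x + 1) for _ in range(num_draws)]


def central_interval(draws, lower=0.025, upper=0.975):
    """Empirical central interval read off the sorted draws."""
    ordered = sorted(draws)
    lo = ordered[int(lower * len(ordered))]
    hi = ordered[int(upper * len(ordered)) - 1]
    return lo, hi


if __name__ == "__main__":
    draws = posterior_draws(x=15, n=50)  # hypothetical data
    print("posterior mean:", statistics.fmean(draws))
    print("95% interval:", central_interval(draws))
```

Any summary of interest (a mean, an interval, a tail probability) can then be read off the draws, which is the job the normal approximation used to do.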

Another exotic impression this book left on me is its dedication to a comprehensive review of compound distributions, and again they all have cute names: Polya-Eggenberger (binomial + beta), Neyman Type-A (Poisson + Poisson), etc. Again, these cute names are not really that useful in modern modeling. For example, when we worry about heterogeneity in the model, we can still use a restricted likelihood (e.g., $y_i\sim$ Poisson$(\theta_i)$) and hierarchically model the subject-varying parameter $\theta_i\sim$ foo() without having to worry about what the compound likelihood is. Of course, in some cases the compound likelihood has a closed-form density (e.g., the negative binomial), and using this closed-form density eliminates the local parameter $\theta_i$ and hence improves computational efficiency.
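To make the negative binomial case concrete: if $y\sim$ Poisson$(\theta)$ and $\theta\sim$ Gamma(shape $a$, rate $b$), integrating out $\theta$ gives a negative binomial with success probability $b/(b+1)$. The sketch below checks this numerically by comparing a brute-force integral against the closed form. The function names, the midpoint rule, and the values $a=2$, $b=0.5$ are assumptions for illustration only, and the crude integrator presumes $a\ge 1$ so the integrand stays bounded near zero.

```python
import math


def poisson_pmf(y, theta):
    """Poisson probability of count y given rate theta > 0."""
    return math.exp(y * math.log(theta) - theta - math.lgamma(y + 1))


def gamma_pdf(theta, shape, rate):
    """Gamma density in the shape/rate parameterization."""
    log_pdf = (shape * math.log(rate) + (shape - 1) * math.log(theta)
               - rate * theta - math.lgamma(shape))
    return math.exp(log_pdf)


def compound_pmf_numeric(y, shape, rate, upper=80.0, steps=40000):
    """Integrate Poisson(y | theta) * Gamma(theta) over theta by the midpoint rule."""
    h = upper / steps
    total = 0.0
    for i in range(steps):
        theta = (i + 0.5) * h  # midpoints avoid evaluating at theta = 0
        total += poisson_pmf(y, theta) * gamma_pdf(theta, shape, rate)
    return total * h


def negative_binomial_pmf(y, shape, rate):
    """Closed-form marginal of y after integrating out theta."""
    p = rate / (rate + 1)
    log_pmf = (math.lgamma(y + shape) - math.lgamma(shape) - math.lgamma(y + 1)
               + shape * math.log(p) + y * math.log(1 - p))
    return math.exp(log_pmf)


if __name__ == "__main__":
    shape, rate = 2.0, 0.5  # illustrative hyperparameters
    for y in range(5):
        print(y, compound_pmf_numeric(y, shape, rate),
              negative_binomial_pmf(y, shape, rate))
```

The closed form costs a handful of `lgamma` calls per observation, while the numeric version needs a full pass over the grid for each one; that gap is the efficiency gain from eliminating the local parameter $\theta_i$.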