Statistics intuitions for integrals

Posted by Yuling Yao on Mar 29, 2022.       Tag: computing  

I have not done any math for a long while. Today I happened to need to compute an integral

\[S(k, \sigma) =\int_{0}^\infty \frac{x\log x}{\sigma} \left(1+\frac{kx}{\sigma}\right)^{-1/k-1} dx.\]

It is the expectation of $x\log x$ under the generalized Pareto distribution. Surely it will be finite as long as $k <1$.

I tried for a while and then became very sure I could not solve it. So I opened a symbolic integration tool, and the result turned out to be easy:

\[S(k, \sigma)= \frac{\sigma \left( 1-\mathrm{HarmonicNumber}[-2+\frac{1}{k}] - \log(\frac{k}{\sigma}) \right) }{1-k}.\]

Except that I did not understand what HarmonicNumber is. I thought it was some special function, so I looked it up. Wikipedia told me that

In mathematics, the n-th harmonic number is the sum of the reciprocals of the first n natural numbers:

\[H_{n}=1+{\frac {1}{2}}+{\frac {1}{3}}+\cdots +{\frac {1}{n}}=\sum _{k=1}^{n}{\frac {1}{k}}.\]

Except it is not helpful to me because apparently I have a non-integer $n= -2+\frac{1}{k}$ here. I studied complex analysis in college but I have never used it since. But that is OK, I trust my symbolic integration tool.
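For the special values $k = 1/(n+2)$ with integer $n \geq 1$, the harmonic number is the ordinary finite sum above, so the closed form can be sanity-checked without any special functions. The sketch below is only an illustration: the function names are made up for this post, and the substitution $t = 1/(1+kx/\sigma)$ with a midpoint rule is one convenient choice, not what the symbolic tool does.

```python
import math


def harmonic_number(n):
    """H_n as the finite sum 1 + 1/2 + ... + 1/n, for integer n >= 0."""
    return sum(1.0 / j for j in range(1, n + 1))


def closed_form(n, sigma):
    """Closed form for S(k, sigma) at k = 1 / (n + 2), where H_{1/k - 2} = H_n."""
    k = 1.0 / (n + 2)
    return sigma * (1.0 - harmonic_number(n) - math.log(k / sigma)) / (1.0 - k)


def numeric_integral(k, sigma, n_points=200_000):
    """Midpoint-rule estimate of E[x log x] under the generalized Pareto density.

    Assumption for this sketch: substitute t = 1 / (1 + k x / sigma), which maps
    x in (0, inf) to t in (0, 1) and turns the density into t**(1/k - 1) / k.
    """
    h = 1.0 / n_points
    total = 0.0
    for i in range(n_points):
        t = (i + 0.5) * h
        x = sigma / k * (1.0 / t - 1.0)
        weight = t ** (1.0 / k - 1.0) / k
        total += x * math.log(x) * weight
    return total * h


if __name__ == "__main__":
    for n in (1, 2, 3):
        k = 1.0 / (n + 2)
        print(f"k = {k:.4f}  numeric = {numeric_integral(k, 1.0):.6f}  "
              f"closed form = {closed_form(n, 1.0):.6f}")
```

The two columns should agree to several digits for these values of $k$; for non-integer $1/k - 2$ the finite sum no longer applies and the digamma form is needed.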

Indeed I only want to evaluate this integral for $k$ near 1. Because the mean and variance of the generalized Pareto distribution are of the order $O((1-k)^{-1})$ and $O((1-k)^{-2}(1-2k)^{-1})$ respectively, my best conjecture is that this $S$ should be $O((1-k)^{-m})$, $1\leq m \leq 2$, as $k$ gets close to 1.

So I searched for one more minute and found that $H_{x}= \frac{\Gamma^\prime(x+1)}{\Gamma(x+1)}+\gamma$, in which $\gamma$ is the Euler constant and $\Gamma$ is the Gamma function.

The appearance of the Euler constant and the Gamma function in applied statistics is like a six-pleat shirring on a shirt: fancy to the wearer but seldom useful to the audience.

It appeared that I needed the derivative of the Gamma function near 0, since the argument $1/k-1$ goes to 0 as $k$ goes to 1. But I found that $\frac{\Gamma^\prime(x)}{\Gamma(x)}$ is itself called the digamma function $\psi(x)$. OK, I am not proud of being ignorant here, but it is still fun to learn. So I looked up Wikipedia again and found $\psi(x)\approx \log x - 1/(2x)$. I plugged this approximation into my expression, and it was not the same as my conjecture. Ohh, of course, the $\psi(x)\approx \log x - 1/(2x)$ approximation is only applicable if $x$ is large. For small $x\approx 0$, I found that $\psi(x) \approx -1/x - \gamma$. I plugged this in and then $\gamma$ cancelled out. So the final answer is that $S(k, \sigma) = \sigma k / (1-k)^2$ + smaller-order terms as $k$ goes to 1. Done.
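To see how quickly the leading term takes over, here is a minimal sketch that evaluates the closed form through the digamma function and compares it with $\sigma k/(1-k)^2$. The `digamma` helper is a hand-rolled stand-in (upward recurrence plus the standard large-$x$ series) so that the example needs only the standard library; it is an assumption of this sketch, and a library routine would normally be used instead. All function names here are hypothetical.

```python
import math

EULER_GAMMA = 0.5772156649015329


def digamma(x):
    """Approximate psi(x) for x > 0.

    Shifts x upward with psi(x) = psi(x + 1) - 1/x until x >= 6, then applies
    the asymptotic series log x - 1/(2x) - 1/(12x^2) + 1/(120x^4) - 1/(252x^6).
    """
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    result += math.log(x) - 0.5 / x
    result -= inv2 * (1.0 / 12.0 - inv2 * (1.0 / 120.0 - inv2 / 252.0))
    return result


def s_exact(k, sigma=1.0):
    """Closed form for S(k, sigma), using H_x = psi(x + 1) + gamma, for 0 < k < 1."""
    harmonic = digamma(1.0 / k - 1.0) + EULER_GAMMA  # H_{1/k - 2}
    return sigma * (1.0 - harmonic - math.log(k / sigma)) / (1.0 - k)


def s_leading(k, sigma=1.0):
    """Leading-order term sigma * k / (1 - k)^2 as k goes to 1."""
    return sigma * k / (1.0 - k) ** 2


if __name__ == "__main__":
    for k in (0.5, 0.9, 0.99, 0.999):
        ratio = s_exact(k) / s_leading(k)
        print(f"k = {k}: exact / leading = {ratio:.4f}")
```

The printed ratio should drift toward 1 as $k$ approaches 1, which is all the leading-order claim says; at moderate $k$ the $O((1-k)^{-1})$ terms are still visible.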

But this is not why I wrote this post. The point is that sometimes statistics intuitions can help to do tedious math. To be clear, this math problem is only tedious to me because I am ignorant of the digamma function or the gamma function. I am sure the previous problem is trivial to Euler. That said, I have already used stats intuition once: I knew the order must be between -1 and -2 because $x\log x$ is bounded between the first and second moments.

Indeed, a more statistically intuitive solution here is that I can simply replace the generalized Pareto distribution by a Pareto distribution. This time I can do it by hand:

\[S(k,1)\approx \int_{1}^{\infty} r \log r \frac{1}{k} r^{-1/k-1} dr= k(1-k)^{-2}.\]
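One way to carry out this hand computation, offered as a sketch rather than the only route: for $r$ with density $\frac{1}{k} r^{-1/k-1}$ on $r \geq 1$, compute the $s$-th moment for $s < 1/k$ and differentiate with respect to $s$ at $s=1$, using $\frac{d}{ds} r^s = r^s \log r$:

\[\mathrm{E}[r^s] = \int_{1}^{\infty} \frac{1}{k} r^{s-1/k-1} dr = \frac{1}{1-ks}, \qquad \mathrm{E}[r\log r] = \frac{d}{ds} \frac{1}{1-ks} \Big|_{s=1} = \frac{k}{(1-k)^{2}}.\]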

This expression is different but has the same order I obtained using the digamma function when $k$ is close to 1, which can be used for many crude approximations. Again, it would be nice if I had known more about the gamma function, but solving a tedious math problem by some simple statistics approximation is equally fun.