StanCon 2023

Posted by Yuling Yao on Jun 22, 2023. Tag: conference

Last month I was on the WashU campus for the Bayesian-for-nuclear-physics workshop. Today I am at WashU again, this time for StanCon 2023. Here are some random takeaways from the conference:

John Kruschke discussed BARG, the Bayesian analysis reporting guidelines. It is a decision-theory-oriented workflow framework, with guidance for data analysts, reviewers, and method developers. For example, instead of looking only at the confidence interval, John proposes looking at the posterior probability of the region of practical equivalence (ROPE), an approach similar in spirit to the traditional power calculation. Personally, I am not the biggest fan of placing central importance on nested models in data analysis, but I sympathize with the ultimate fantasy of a fully automated data analysis process.

Ben Goodrich presents his ongoing work on numerical approximation of the log likelihood. The goal is to evaluate the log density fewer times during sampling. Ben proposes a simple change of variables: instead of sampling theta, we sample x, the prior cdf evaluated at theta (so theta is the prior quantile of x), such that x has a uniform [0,1] prior. Now, instead of evaluating the log likelihood as a function of $x$ directly, Ben would like to replace it with a numerical approximation. To this end, Ben uses the Kolmogorov representation theorem. It seems there are still many gaps in practice, such as the high-dimensional quantile transformation and a smooth implementation of the Kolmogorov representation theorem. I guess an active learning approach that iteratively approximates the log density and samples from the approximation would be interesting. I asked Ben why we need the Kolmogorov representation theorem given the seemingly more common neural network and normalizing flow approaches, and Ben told me that it seems a nicer approach for noise-free function approximation.
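
To make the change of variables concrete, here is a minimal sketch. It is not Ben's implementation and it says nothing about the Kolmogorov approximation step; the normal prior, the single observation `Y_OBS`, and all function names are toy assumptions chosen so the answer can be checked by hand. The point is only that on the x scale the prior density and the Jacobian cancel, so the unnormalized log posterior is just the log likelihood.

```python
import math
from statistics import NormalDist

# Assumed toy model (not from the talk):
# theta ~ Normal(0, 2), y | theta ~ Normal(theta, 1), one observation.
prior = NormalDist(mu=0.0, sigma=2.0)
Y_OBS, NOISE_SD = 1.3, 1.0


def log_lik(theta):
    """Log density of the single observation Y_OBS given theta."""
    z = (Y_OBS - theta) / NOISE_SD
    return -0.5 * z * z - math.log(NOISE_SD) - 0.5 * math.log(2.0 * math.pi)


def log_post_x(x):
    """Unnormalized log posterior on the x scale, where x = prior.cdf(theta).

    The prior density of theta and the Jacobian of the quantile transform
    cancel, so x has a uniform prior and only the likelihood is left.
    """
    return log_lik(prior.inv_cdf(x))


def posterior_mean_via_x(n_grid=20000):
    """Posterior mean of theta from a midpoint grid over x in (0, 1)."""
    xs = [(i + 0.5) / n_grid for i in range(n_grid)]
    thetas = [prior.inv_cdf(x) for x in xs]
    weights = [math.exp(log_post_x(x)) for x in xs]
    return sum(t * w for t, w in zip(thetas, weights)) / sum(weights)
```

In this conjugate toy case the exact posterior mean is 1.3 × 4 / (4 + 1) = 1.04, and `posterior_mean_via_x()` should land very close to it. A one-dimensional grid is of course only feasible here because x lives on the unit interval; the high-dimensional case is where the open problems are.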

William Gillespie from Metrum presents their R package bbr.bayes. It facilitates Bayesian analysis for pharmacometrics using Stan and NONMEM, the famous PK analysis software. The overall workflow seems to resemble the usual Bayesian workflow, and the talk is quite high level, so I am curious what their secret sauce is in actual PK modeling (for example, they use loo, but I can imagine that PK modeling easily involves non-iid or nested outcomes, which would require non-iid loo).

Edward Roualdes presents BridgeStan, an R and Python wrapper that computes the log density and its gradient; think of it as the next-generation rstan. I have been following and using some parts of BridgeStan since last summer. In (my shallow) retrospect, the lack of an easy way to return gradients of an arbitrary log density or even an arbitrary function, and the lack of exposure of the Stan sampler in high-level functions, made Stan miss the chance to monopolize the field of automatically differentiable programming languages, while at the same time paving the way for counterparts such as JAX and TFP. BridgeStan is certainly overdue for the Stan community and will be most welcome to methodology developers.

Jeff Soules introduces MCMC-Monitor: browser-based monitoring of Stan. The cool part is that it can run a Stan model on a server and let users monitor the samples from a different machine on the fly.

Arya Pourzanjani presents a clever way to model summary statistics. The observations are tumor progression data: the number of patients with 20% tumor size growth, taken from published results. Arya uses a clever Stan modeling technique to infer the fine-grained tumor growth curve by converting the “20% tumor size growth” into linearly constrained parameter vectors. I asked Arya how his method relates to ABC, and Arya told us ABC would be more general and automated, while such smart hacking requires human engineering but is arguably more efficient. It reminds me of the difficulty of likelihood-free or simulation-based inference: likelihood-free is like the GPL license. Even if the majority of the model is likelihood-exact, as long as a tiny part of the observations is likelihood-free, you need to run likelihood-free inference on all the data and the whole model. How to incorporate likelihood-free and likelihood-exact inference jointly? I do not know.

Siddhartha Chib gives a talk on conditional moment models. The talk is based on his 2018 JASA and 2021 JRSS-B papers. The basic idea is simple: how to do Bayesian linear regression $y=x\beta_1 +z\beta_2+ \epsilon$ in the presence of endogeneity, i.e., $\mathrm{E}(\epsilon x) \neq 0$, where $z$ is the instrumental variable such that $\mathrm{E}(\epsilon z) = 0$. In light of the two-cultures battle, Chib’s approach is certainly on the reduced-form-for-robustness side. The most reduced form in linear regression is probably to set only the first-moment condition: $\mathrm{E}[\epsilon (x, z, 1)] = 0$ means linear regression without endogeneity, while $\mathrm{E}[\epsilon (x, z, 1)] = (b, 0, 0)$ is a relaxed model allowing endogeneity. To make Bayesian inference on $\beta \mid y$ without distributional assumptions, Chib further uses empirical likelihood. The idea is that given any $\beta$, you obtain its profile likelihood from an optimization procedure: $\max_w \prod_i w_i$ subject to $w_i \geq 0$, $\sum_i w_i = 1$, and $\sum_i w_i \epsilon_i (x_i, z_i, 1) = 0$. Multiply this profile likelihood, $\max_w \prod_i w_i \mid \beta$, by the prior of $\beta$, and you obtain the posterior density evaluated at $\beta$. Comparing the reduced model and the encompassing model by marginal likelihood then gives a valid endogeneity test. Setting aside whether I would use it in practice for linear regression, I am impressed by the mathematical beauty of the empirical likelihood method.
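
The profile step can be sketched in a few lines for the simplest possible case. This is a simplification rather than the method from the talk or the papers: it uses a single scalar moment condition $\mathrm{E}[x(y - x\beta)] = 0$ (regression through the origin, no instrument), made-up data `X` and `Y`, and a hypothetical Normal(0, 10) prior. With one moment $g_i$, the optimal weights have the standard form $w_i = 1/\{n(1+\lambda g_i)\}$, where $\lambda$ solves $\sum_i g_i/(1+\lambda g_i) = 0$; the left-hand side is decreasing in $\lambda$, so bisection is enough.

```python
import math

# Made-up data for illustration only.
X = [0.5, 1.0, 1.5, 2.0, 2.5]
Y = [1.1, 1.9, 3.2, 3.9, 5.1]


def moments(beta):
    """Scalar moment function g_i(beta) = x_i * (y_i - x_i * beta)."""
    return [x * (y - x * beta) for x, y in zip(X, Y)]


def log_empirical_likelihood(g):
    """max sum_i log(w_i) s.t. w_i >= 0, sum_i w_i = 1, sum_i w_i * g_i = 0.

    Returns -inf when 0 is not strictly inside the range of g, in which
    case no valid weights exist.
    """
    n = len(g)
    g_min, g_max = min(g), max(g)
    if not (g_min < 0.0 < g_max):
        return -math.inf
    # lambda must keep every 1 + lambda * g_i strictly positive.
    lo, hi = -1.0 / g_max, -1.0 / g_min
    eps = 1e-10 * (hi - lo)
    lo, hi = lo + eps, hi - eps
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        score = sum(gi / (1.0 + mid * gi) for gi in g)
        if score > 0.0:  # score is decreasing in lambda: root lies to the right
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return -sum(math.log(n * (1.0 + lam * gi)) for gi in g)


def log_posterior(beta, prior_sd=10.0):
    """Unnormalized log posterior: profile log EL plus a Normal(0, prior_sd) log prior."""
    return log_empirical_likelihood(moments(beta)) - 0.5 * (beta / prior_sd) ** 2
```

Evaluating `log_posterior` on a grid of `beta` values (or feeding it to any sampler that tolerates `-inf`) gives the empirical-likelihood posterior for this toy problem. The empirical likelihood is largest, at $-n\log n$, when the sample moment is exactly zero, which here is the least-squares estimate.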

Will Landau talks about file management in statistical modeling. He designed a cool pipeline tool in R, targets, that arranges the code and data files into a graph reflecting the dependencies in the code. The graph representation enables automatic parallelization, file storage, and version control.
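
To see why a dependency graph buys you parallelization almost for free, here is a small sketch in Python. It is not the targets API (targets is an R package); the pipeline and its target names are hypothetical, and the only library used is Python's standard `graphlib`.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each target maps to the set of targets it depends on.
PIPELINE = {
    "raw_data": set(),
    "clean_data": {"raw_data"},
    "model_a_fit": {"clean_data"},
    "model_b_fit": {"clean_data"},
    "report": {"model_a_fit", "model_b_fit"},
}


def parallel_batches(graph):
    """Group targets into batches that respect the dependency order.

    Targets in the same batch do not depend on each other, so they could
    be built in parallel.
    """
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)
    return batches
```

Here the two model fits end up in the same batch, since both depend only on `clean_data`. The same graph also tells you what to rerun when something changes: only the changed target and whatever sits downstream of it.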

Wade Brorsen discusses optimal experimental design in agriculture. Experimental design is one of my favorite topics. The goal is to determine the optimal nitrogen amount in soil for crop yield. Arya reminds me that the model is similar to how pharmacometricians decide the optimal drug dosage in PKPD modeling.

Cameron Pfiffer introduces a model that describes how likely a reader is to subscribe to a news site after hitting the paywall. To this end he models the probability that a user reads a news article, and the chance of a subscription after the paywall. It

Collin Cademartori presents “Two Challenges for Bayesian Model Expansion”. I think one of his main claims is that model expansion reduces identifiability, but it appears that his definition of identifiability is the mutual information between data and parameter. I do not know. I probably won’t be worried if the correlation between data and parameter drops from 0.9 to 0.8, and I probably won’t call that poor identifiability.
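
For a sense of scale, correlation and mutual information can be linked under one toy assumption that is not from the talk: if data and parameter are jointly Gaussian scalars with correlation $\rho$, their mutual information is $-\frac{1}{2}\log(1-\rho^2)$. A short sketch (the function name is made up):

```python
import math


def gaussian_mutual_information(rho):
    """Mutual information (in nats) of two jointly Gaussian scalars with correlation rho."""
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must be strictly between -1 and 1")
    return -0.5 * math.log(1.0 - rho ** 2)


mi_high = gaussian_mutual_information(0.9)
mi_low = gaussian_mutual_information(0.8)
```

Under this assumption, the drop from 0.9 to 0.8 takes the mutual information from roughly 0.83 to roughly 0.51 nats: a visible change on the information scale, while the correlation itself remains high.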

Nathaniel Haines discusses loss rate modeling in the insurance industry, which they use at an investment firm to help investors trade investment-grade securities. To incorporate various risk models, they have been using hierarchical stacking in their pipeline, cool!

Finally, Bob wrapped up the conference by pointing out a few promising ongoing directions in Stan implementation, including generalized HMC, massive parallelization, and normalizing flow VI.