## About

I am a Flatiron Research Fellow at the Center for Computational Mathematics, Flatiron Institute. My research interests lie in Bayesian computation, Bayesian modeling, machine learning, and causal inference.

Before Flatiron, I earned my Ph.D. in Statistics from Columbia University in 2021 under the supervision of Andrew Gelman. Before that, I completed my undergraduate studies in Mathematics and Economics at Tsinghua University.

## About my research

• My ultimate goal is to develop a scalable Bayesian workflow for open-ended real data problems. Recent applications include lead fallout in Paris, arsenic diffusion in groundwater, and Covid-19 mortality in Bangladesh.

• But better applied statistics needs better methodology. To that end, I investigate statistical and machine learning methods, with a focus on model evaluation and aggregation, meta-learning, and causal inference. Ongoing work addresses cross-validation, stacking and hierarchical stacking, and covariate imbalance.

• Complex methods, in turn, need scalable and diagnosable computing. Hence, I develop algorithms and theory for fully Bayesian and approximate computation. Recently at Flatiron, I have been interested in combining Monte Carlo methods with sophisticated numerical tricks or quadrature, with applications to importance sampling, simulated tempering and annealing, dropout, and multimodal MCMC sampling.

yyao@flatironinstitute.org

646-908-6510

Twitter

Blog

Blog 2

Résumé

Google scholar

Github

Flatiron Institute, CCM

162 5th Ave

New York, NY 10010

## Publications

### Bayesian methodology

**Make cross-validation Bayes again**. *NeurIPS workshop*.

__Yuling Yao__, Aki Vehtari. (2021).

[Poster]

**Toward a scalable Bayesian workflow**. *PhD Thesis*.

__Yuling Yao__. (2021).

A scalable Bayesian workflow needs the combination of fast but reliable computing, efficient but targeted model evaluation, and extensive but directed model building and expansion.

**Bayesian hierarchical stacking: Some models are (somewhere) useful**. *Bayesian Analysis*.

__Yuling Yao__, Gregor Pirš, Aki Vehtari, Andrew Gelman. (2021).

[Code] [Blog]
[Talk]

With the input-varying yet partially-pooled model weights, hierarchical stacking improves average and conditional predictions. Our Bayesian formulation includes constant-weight (complete-pooling) stacking as a special case.

**Adaptive path sampling in metastable posterior distributions**. *under review*.

__Yuling Yao__, Collin Cademartori, Aki Vehtari, Andrew Gelman. (2020).

[Package] [Blog]

From importance sampling to adaptive importance sampling to path sampling to adaptive path sampling, and from Rao–Blackwell to Wang–Landau to Jarzynski–Crooks: all about free energy and simulated tempering.

**Stacking for non-mixing Bayesian computations: The curse and blessing of multimodal posteriors**. *under review*.

__Yuling Yao__, Aki Vehtari, Andrew Gelman. (2020).

[Code] [Blog]
[Talk]

The result from multi-chain stacking is not necessarily equivalent, even asymptotically, to fully Bayesian inference, but it serves many of the same goals. Under misspecified models, stacking can give better predictive performance than full Bayesian inference, hence the multimodality can be considered a blessing rather than a curse.

**Holes in Bayesian statistics**. *Journal of Physics G.*

Andrew Gelman, __Yuling Yao__. (2020).

This does not mean that we think Bayesian inference is a bad idea, but it does mean that there is a tension between Bayesian logic and Bayesian workflow which we believe can only be resolved by considering Bayesian logic as a tool, a way of revealing inevitable misfits and incoherences in our model assumptions, rather than as an end in itself.

**Bayesian aggregation**. *Wiley StatsRef: Statistics Reference Online*.

__Yuling Yao__. (2020).

**Pareto smoothed importance sampling**. *under review*

Aki Vehtari, Daniel Simpson, Andrew Gelman, __Yuling Yao__, Jonah Gabry. (2019+).

How to run importance sampling with efficiency and reassurance.

**Limitations of "Limitations of Bayesian leave-one-out cross-validation for model selection"**. *Computational Brain & Behavior*.

Aki Vehtari, Daniel Simpson, __Yuling Yao__, Andrew Gelman. (2018).

**Yes, but did it work?: Evaluating variational inference**. *International Conference on Machine Learning*.

__Yuling Yao__, Aki Vehtari, Daniel Simpson, Andrew Gelman. (2018).

[Blog]
[Code]
[Talk]

**Using stacking to average Bayesian predictive distributions (with discussions and rejoinder)**. *Bayesian Analysis*.

__Yuling Yao__, Aki Vehtari, Daniel Simpson, Andrew Gelman. (2018).

[Code] [R package]

### Applied statistics

**Assessment of excess mortality and household income in rural Bangladesh during the Covid-19 pandemic in 2020**. *JAMA Network Open*.

Prabhat Barnwal*, __Yuling Yao*__ (equal contribution), Yiqian Wang, Nishat Akter Juy, Shabib Raihan, Mohammad Ashraful Haque, Alexander van Geen. (2021).

[Media coverage]

**Economic growth and happiness in China: A Bayesian multilevel age-period-cohort analysis based on the CGSS data 2005-2015**. *International Review of Economics and Finance*.

Yu-Sung Su, Donald Lien, __Yuling Yao__. (2021).

**Making the most of imprecise measurements: Changing patterns of arsenic concentrations in shallow wells of Bangladesh from laboratory and field data**. *preprint*.

__Yuling Yao__, Rajib Mozumder, Benjamin Bostick, Brian Mailloux, Charles Harvey, Andrew Gelman, Alexander van Geen. (2021).

Imprecise but widely accessible field kit tests, combined with flexible statistical modeling that facilitates this open-ended data gathering, can provide a balance between total cost and accuracy in many areas of geoscience research and policy.

**Bayesian workflow**. *preprint*.

Andrew Gelman, Aki Vehtari, Daniel Simpson, Charles Margossian, Bob Carpenter, __Yuling Yao__, Paul-Christian Bürkner, Lauren Kennedy, Jonah Gabry, Martin Modrák. (2020).

Theoretical statistics indeed is the theory of applied statistics.

**Fallout of lead over Paris from the 2019 Notre-Dame Cathedral fire**. *Geohealth*.

Alexander van Geen, __Yuling Yao__, Tyler Ellis, Andrew Gelman. (2020).

[Code]
[Media coverage (Le Monde)]
[Media coverage 2]

How much lead was there after the fire?

**Ensemble model patching: A parameter-efficient variational Bayesian neural network**.

Oscar Chang, __Yuling Yao__, David Williams-King, Hod Lipson. (2019).

Running BNNs on ImageNet: more expressive than MC-Dropout, more affordable than mean-field VI.

**A Bayesian bird's eye view of ‘Replications of important results in social psychology’**. *Royal Society Open Science*.

Maarten Marsman, Felix D Schönbrodt, Richard D Morey, __Yuling Yao__, Andrew Gelman, Eric-Jan Wagenmakers. (2016).

## Software

I am on the development teams of the following software:

### LOO

An R package for efficient approximate leave-one-out cross-validation (LOO) using Pareto smoothed importance sampling (PSIS), a new procedure for regularizing importance weights.

### Stan

Stan is a state-of-the-art platform for statistical modeling and high-performance statistical computation. Thousands of users rely on Stan for statistical modeling, data analysis, and prediction in the social, biological, and physical sciences, engineering, and business.

© Yuling Yao