
## December 6, 2019, 8:30 am - 5:00 pm EST

**Schedule:**

8:30am – 9am: Welcome and sign-in

9am – 10:30am: **Tamara Broderick (Slides)**

Coffee break

11am – 12:30pm: **Max Raginsky (Slides)**

Lunch break

2:30pm – 4pm: **Dave Blei (Slides)**

Coffee break

4:30pm – 5pm: **Panel discussion:** Tamara Broderick, Dave Blei, Max Raginsky, John Paisley (moderator)

**Speaker: Tamara Broderick (9am – 10:30am) (Slides)**

**Title: Variational Bayes and beyond: Foundations of Scalable Bayesian Inference**

Abstract: Bayesian methods exhibit a number of desirable properties for modern data analysis—including (1) coherent quantification of uncertainty, (2) a modular modeling framework able to capture complex phenomena, (3) the ability to incorporate prior information from an expert source, and (4) interpretability. In practice, though, Bayesian inference necessitates approximation of a high-dimensional integral, and some traditional algorithms for this purpose can be slow—notably at data scales of current interest. The tutorial will cover the foundations of some modern tools for fast, approximate Bayesian inference at scale. One increasingly popular framework is provided by “variational Bayes” (VB), which formulates Bayesian inference as an optimization problem. We will examine key benefits and pitfalls of using VB in practice, with a focus on the widespread “mean-field variational Bayes” (MFVB) subtype. We will highlight properties that anyone working with VB, from the data analyst to the theoretician, should be aware of. And we will discuss a number of open challenges.
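To make the "inference as optimization" view concrete, here is a minimal sketch of mean-field VB via coordinate ascent (CAVI). The model is a toy chosen here for illustration and is not taken from the tutorial: data x_i ~ N(μ, 1/τ) with a Normal-Gamma prior μ | τ ~ N(μ0, 1/(λ0 τ)), τ ~ Gamma(a0, b0), approximated by the factorization q(μ, τ) = q(μ) q(τ). The updates are the standard closed-form ones for this conjugate model. The function name `cavi_normal_gamma` and its default hyperparameters are hypothetical.

```python
import math
import random

def cavi_normal_gamma(x, mu0=0.0, lam0=1.0, a0=1.0, b0=1.0, iters=50):
    """Mean-field VB via coordinate ascent (CAVI) for the toy model
    x_i ~ N(mu, 1/tau), mu | tau ~ N(mu0, 1/(lam0 * tau)), tau ~ Gamma(a0, rate=b0),
    with q(mu, tau) = q(mu) q(tau), q(mu) = N(mu_n, 1/lam_n), q(tau) = Gamma(a_n, b_n)."""
    n = len(x)
    xbar = sum(x) / n
    # The mean of q(mu) and the shape of q(tau) do not depend on the other factor.
    mu_n = (lam0 * mu0 + n * xbar) / (lam0 + n)
    a_n = a0 + (n + 1) / 2.0
    e_tau = a0 / b0  # initial guess for E_q[tau]
    for _ in range(iters):
        # Update q(mu) given the current E_q[tau].
        lam_n = (lam0 + n) * e_tau
        # Update q(tau) given q(mu): expected squared errors under q(mu).
        e_sq_data = sum((xi - mu_n) ** 2 for xi in x) + n / lam_n
        e_sq_prior = lam0 * ((mu_n - mu0) ** 2 + 1.0 / lam_n)
        b_n = b0 + 0.5 * (e_sq_data + e_sq_prior)
        e_tau = a_n / b_n
    lam_n = (lam0 + n) * e_tau  # final q(mu) update using the last q(tau)
    return mu_n, lam_n, a_n, b_n

rng = random.Random(0)
data = [rng.gauss(2.0, 0.5) for _ in range(500)]
mu_n, lam_n, a_n, b_n = cavi_normal_gamma(data)
print("E[mu] =", mu_n, "sd(mu) =", 1 / math.sqrt(lam_n), "E[tau] =", a_n / b_n)
```

Because q(μ) and q(τ) are forced to be independent, this approximation cannot represent any posterior dependence between μ and τ. That is the kind of structural restriction of the mean-field family that users of MFVB need to keep in mind.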

**Speaker: Max Raginsky (11am – 12:30pm) (Slides)**

**Title: Stochastic Calculus in Machine Learning: Optimization, Sampling, Simulation**

Abstract: A great deal of recent research activity has focused on using continuous-time processes to analyze discrete-time algorithms and models. In particular, diffusion processes have been examined as a way towards a better understanding of first-order optimization methods, as they afford an analysis of behavior over non-convex landscapes using a rich array of techniques from the statistical physics literature. Gradient flows and diffusions have also found a role in the analysis of deep neural networks, where they are interpreted as describing the limiting case of infinitely many layers, each in effect infinitesimally thin.
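As a toy illustration of the continuous-to-discrete link (our example, not the speaker's): gradient descent with step size η is the forward-Euler discretization of the gradient flow dx/dt = -∇f(x). For the hypothetical quadratic f(x) = a x²/2, the flow has the exact solution x(t) = x0 e^(-a t). The sketch measures the gap between k gradient-descent iterates and the flow at time t = kη as η shrinks.

```python
import math

def gradient_descent(x0, a, eta, steps):
    """Gradient descent on f(x) = a * x**2 / 2, i.e. forward-Euler steps
    of size eta for the gradient flow dx/dt = -a * x."""
    x = x0
    for _ in range(steps):
        x = x - eta * a * x  # x_{k+1} = x_k - eta * f'(x_k)
    return x

def gradient_flow(x0, a, t):
    """Exact solution of dx/dt = -a * x at time t."""
    return x0 * math.exp(-a * t)

x0, a, t = 1.0, 2.0, 1.0
errors = []
for eta in (0.1, 0.01, 0.001):
    steps = round(t / eta)  # keep k * eta = t fixed
    gap = abs(gradient_descent(x0, a, eta, steps) - gradient_flow(x0, a, t))
    errors.append(gap)
    print(f"eta={eta}: |x_k - x(t)| = {gap:.2e}")
```

The gap shrinks roughly in proportion to η, as expected for a first-order scheme. This is the sense in which the continuous-time process describes the small-step limit of the discrete algorithm.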

In this tutorial, I will give an informal treatment of some recent applications of K. Itô's stochastic calculus to problems at the intersection of optimization and machine learning. Specifically, I will cover the following topics:

I) Optimization — I will discuss non-convex learning using continuous-time Stochastic Gradient Langevin Dynamics (SGLD). I will first show that, under reasonable regularity assumptions on the objective function, SGLD finds an approximate global minimizer of the population risk in finite time (which can, in general, be exponential in the problem dimension), and then discuss the metastability phenomenon of the Langevin dynamics at "intermediate" time scales. Here, by metastability I mean that, with high probability, the trajectory of the Langevin diffusion will either spend an arbitrarily long time in a small neighborhood of some local minimum or will quickly escape that neighborhood within a short recurrence time.
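A minimal sketch of the metastability picture, under simplifying assumptions of our own: a one-dimensional double-well objective f(x) = (x² - 1)²/4 (hypothetical), the exact gradient in place of a stochastic one (so this is plain Langevin dynamics rather than SGLD), and an Euler-Maruyama discretization of dX = -f'(X) dt + sqrt(2/β) dW, where β is the inverse temperature. Particles are started at the left minimum; at low temperature one would expect nearly all of them to remain in that well over the simulated horizon, while at high temperature many cross the barrier.

```python
import math
import random

def grad_f(x):
    # Gradient of the double-well objective f(x) = (x**2 - 1)**2 / 4,
    # which has minima at x = -1 and x = +1 and a barrier of height 1/4 at x = 0.
    return x * (x * x - 1.0)

def langevin_em(x0, beta, h, n_steps, rng):
    """Euler-Maruyama discretization of dX = -f'(X) dt + sqrt(2 / beta) dW."""
    x = x0
    noise_scale = math.sqrt(2.0 * h / beta)
    for _ in range(n_steps):
        x = x - h * grad_f(x) + noise_scale * rng.gauss(0.0, 1.0)
    return x

rng = random.Random(0)
h, n_steps = 0.01, 500  # step size and number of steps (time horizon T = 5)

def frac_in_left_well(beta, n_particles=1000):
    """Fraction of particles started at x = -1 that are still at x < 0 at time T."""
    finals = [langevin_em(-1.0, beta, h, n_steps, rng) for _ in range(n_particles)]
    return sum(1 for x in finals if x < 0) / n_particles

frac_cold = frac_in_left_well(beta=20.0)  # low temperature: expect nearly all to stay
frac_hot = frac_in_left_well(beta=1.0)    # high temperature: many cross the barrier
print(frac_cold, frac_hot)
```

For overdamped Langevin dynamics the mean escape time from a well grows roughly like exp(β ΔF), where ΔF is the barrier height (Kramers-type scaling). This is why lowering the temperature can make the time a trajectory spends near a local minimum arbitrarily long.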

II) Sampling and simulation — I will show that diffusion processes with drift given by a sufficiently deep feedforward neural net provide a flexible and expressive class of probabilistic generative models. I will first show that sampling in such generative models can be phrased as a stochastic control problem (revisiting the classic results of Föllmer and Dai Pra) and then build on this formulation to quantify the expressive power of these models. Specifically, I will prove that one can efficiently sample from a wide class of terminal target distributions by choosing the drift of the latent diffusion from the class of multilayer feedforward neural nets, with the accuracy of sampling measured by the Kullback-Leibler divergence to the target distribution.
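The Föllmer formulation can be illustrated in a case where no neural net is needed. Starting from X_0 = 0, the Föllmer process solves dX_t = b(X_t, t) dt + dW_t on [0, 1] with drift b(x, t) = ∂/∂x log Q_{1-t} f(x), where f is the density of the target relative to N(0, 1) and Q is the heat semigroup; X_1 then has the target distribution. In the tutorial's setting b is a feedforward net. In this sketch we instead assume a one-dimensional Gaussian target N(m, s²), for which (by our own calculation) the drift has the closed form b(x, t) = (m + (s² - 1) x) / (1 - t + t s²). The function names are hypothetical.

```python
import math
import random

def follmer_drift(x, t, m, s):
    """Closed-form Föllmer drift for a 1-D Gaussian target N(m, s**2)."""
    return (m + (s * s - 1.0) * x) / (1.0 - t + t * s * s)

def sample_follmer(m, s, n_steps, rng):
    """Euler-Maruyama on dX = b(X, t) dt + dW with X_0 = 0, for t in [0, 1]."""
    h = 1.0 / n_steps
    x = 0.0
    for k in range(n_steps):
        t = k * h
        x = x + h * follmer_drift(x, t, m, s) + math.sqrt(h) * rng.gauss(0.0, 1.0)
    return x

rng = random.Random(1)
m, s = 2.0, 0.5
samples = [sample_follmer(m, s, 100, rng) for _ in range(10000)]
mean = sum(samples) / len(samples)
std = math.sqrt(sum((z - mean) ** 2 for z in samples) / (len(samples) - 1))
print(mean, std)  # should be close to the target's m and s
```

A quick sanity check on the formula: with s = 1 the drift reduces to the constant m, so X_1 = m + W_1 ~ N(m, 1), exactly the target.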

**Speaker: Dave Blei (2:30pm – 4pm) (Slides)**

**Title: Scaling and Generalizing Approximate Bayesian Inference**

Abstract: A core problem in statistics and machine learning is to approximate difficult-to-compute probability distributions. This problem is especially important in Bayesian statistics, which frames all inference about unknown quantities as a calculation about a conditional distribution. In this talk I review and discuss innovations in variational inference (VI), a method that approximates probability distributions through optimization. VI has been used in myriad applications in machine learning and Bayesian statistics. It tends to be faster than more traditional methods, such as Markov chain Monte Carlo sampling.
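The "approximation through optimization" view rests on the identity log p(x) = ELBO(q) + KL(q || p(z | x)). Since log p(x) does not depend on q, maximizing the evidence lower bound (ELBO) over q is the same as minimizing the KL divergence to the posterior. The sketch checks this numerically for a hypothetical conjugate model z ~ N(0, 1), x_i | z ~ N(z, 1), in which every term has a closed form.

```python
import math

LOG_2PI = math.log(2.0 * math.pi)

def log_evidence(x):
    """Exact log p(x) for z ~ N(0, 1), x_i | z ~ N(z, 1): marginally x ~ N(0, I + 11^T)."""
    n, sx, sxx = len(x), sum(x), sum(v * v for v in x)
    quad = sxx - sx * sx / (n + 1)  # x^T (I + 11^T)^{-1} x via Sherman-Morrison
    return -0.5 * n * LOG_2PI - 0.5 * math.log(n + 1) - 0.5 * quad

def elbo(x, m, v):
    """ELBO = E_q[log p(z)] + sum_i E_q[log p(x_i | z)] + entropy of q = N(m, v)."""
    prior = -0.5 * LOG_2PI - 0.5 * (m * m + v)
    lik = sum(-0.5 * LOG_2PI - 0.5 * ((xi - m) ** 2 + v) for xi in x)
    entropy = 0.5 * math.log(2.0 * math.pi * math.e * v)
    return prior + lik + entropy

def kl_to_posterior(x, m, v):
    """KL(N(m, v) || p(z | x)), where the exact posterior is N(sum(x)/(n+1), 1/(n+1))."""
    n = len(x)
    vp = 1.0 / (n + 1)
    mp = sum(x) * vp
    return 0.5 * (math.log(vp / v) + (v + (m - mp) ** 2) / vp - 1.0)

x = [0.3, -1.2, 2.0, 0.7]
post_mean, post_var = sum(x) / (len(x) + 1), 1.0 / (len(x) + 1)
for m, v in [(0.0, 1.0), (0.5, 0.1), (post_mean, post_var)]:
    gap = log_evidence(x) - elbo(x, m, v)
    print(f"q = N({m:.2f}, {v:.2f}): log p(x) - ELBO = {gap:.6f}, KL = {kl_to_posterior(x, m, v):.6f}")
```

For every q the printed gap equals the KL term, and both vanish when q is the exact posterior.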

After quickly reviewing the basics, I will discuss our recent research on VI. I first describe stochastic variational inference, an approximate inference algorithm for handling massive data sets, and demonstrate its application to probabilistic topic models of millions of articles. Then I discuss black box variational inference, a generic algorithm for approximating the posterior. Black box inference easily applies to many models and requires minimal mathematical work to implement. I will demonstrate black box inference on deep exponential families—a method for Bayesian deep learning—and describe how it enables powerful tools for probabilistic programming.
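A minimal sketch of the black-box idea, with assumptions of our own: the score-function estimator ∇_λ ELBO = E_q[∇_λ log q(z; λ) (log p(x, z) - log q(z; λ))], a sample-mean baseline as a simple control variate, and a fixed step size (the published algorithm adds further variance reduction and adaptive step sizes). The optimizer only evaluates log p(x, z), never its gradient, which is what makes it "black box". The toy model z ~ N(0, 1), x_i | z ~ N(z, 1) and the function names are hypothetical; the model is chosen so that the exact posterior N(Σx/(n+1), 1/(n+1)) is available for comparison.

```python
import math
import random

def log_joint(z, n, sx, sxx):
    # log p(x, z) up to an additive constant for z ~ N(0, 1), x_i | z ~ N(z, 1),
    # using sum_i (x_i - z)**2 = sxx - 2 * z * sx + n * z**2.
    return -0.5 * z * z - 0.5 * (sxx - 2.0 * z * sx + n * z * z)

def log_q(z, m, omega):
    # Log density of q(z) = N(m, sigma**2) with sigma = exp(omega).
    sigma = math.exp(omega)
    return -0.5 * math.log(2.0 * math.pi) - omega - 0.5 * ((z - m) / sigma) ** 2

def bbvi(x, n_iters=1000, n_samples=100, lr=0.01, seed=0):
    rng = random.Random(seed)
    n, sx, sxx = len(x), sum(x), sum(v * v for v in x)
    m, omega = 0.0, 0.0  # variational parameters of q(z) = N(m, exp(omega)**2)
    for _ in range(n_iters):
        sigma = math.exp(omega)
        zs = [rng.gauss(m, sigma) for _ in range(n_samples)]
        fs = [log_joint(z, n, sx, sxx) - log_q(z, m, omega) for z in zs]
        baseline = sum(fs) / n_samples  # simple control variate
        g_m = g_omega = 0.0
        for z, f in zip(zs, fs):
            u = (z - m) / sigma
            g_m += (u / sigma) * (f - baseline)        # d/dm log q(z) = (z - m) / sigma**2
            g_omega += (u * u - 1.0) * (f - baseline)  # d/domega log q(z) = u**2 - 1
        m += lr * g_m / n_samples
        omega += lr * g_omega / n_samples
    return m, math.exp(omega)

data_rng = random.Random(42)
x = [data_rng.gauss(1.5, 1.0) for _ in range(20)]
m_hat, s_hat = bbvi(x)
exact_mean, exact_std = sum(x) / (len(x) + 1), 1.0 / math.sqrt(len(x) + 1)
print(m_hat, exact_mean, s_hat, exact_std)
```

Here the variational family contains the exact posterior, so the estimates should land close to it. With a baseline, the gradient noise also shrinks as q approaches the posterior, because log p(x, z) - log q(z) then becomes constant in z.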