2 May, 10:00 am - 4:00 pm EDT
The upcoming tutorial on Reinforcement Learning will start with a gentle introduction to the topic and lead up to the state of the art in both practical considerations and theoretical understanding. The tutorial will be held online and is free and open to everyone, but requires (free) registration. (Update: Registrations are now closed.)
Speakers and Schedule:
- Shipra Agrawal (Columbia University), 10am-12pm
- Title: Thompson Sampling based Methods for Reinforcement Learning
- Slides: ColumbiaTutorialMay2.pdf
- Video link
Thompson Sampling is a surprisingly simple and flexible Bayesian heuristic for handling the exploration-exploitation tradeoff in sequential decision-making problems. While this basic algorithmic technique can be traced back to 1933, the last five years have seen unprecedented growth in both the theoretical understanding of and commercial interest in this method. This tutorial will provide a deep dive into the techniques involved in the design and analysis of Thompson Sampling-based algorithms. In the first half of the tutorial, I will illustrate the main ideas and proof techniques through the special case of the multi-armed bandit problem. In the second half, these techniques will be extended to prove worst-case regret bounds for the general (tabular) reinforcement learning problem.
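For readers new to the method, the multi-armed bandit case can be sketched in a few lines. The following is a minimal illustration, assuming Bernoulli rewards and an independent Beta(1, 1) prior on each arm; this Beta-Bernoulli setting is a common textbook choice and not necessarily the one used in the tutorial. The arm means and the function name `thompson_sampling` are hypothetical and serve only to simulate rewards.

```python
import random


def thompson_sampling(true_means, horizon, seed=0):
    """Beta-Bernoulli Thompson Sampling on a simulated K-armed bandit.

    `true_means` are hypothetical success probabilities used only to
    simulate rewards; the algorithm never reads them when choosing arms.
    Returns the number of pulls per arm and the total reward collected.
    """
    rng = random.Random(seed)
    k = len(true_means)
    successes = [0] * k  # observed rewards of 1 per arm
    failures = [0] * k   # observed rewards of 0 per arm
    pulls = [0] * k
    total_reward = 0
    for _ in range(horizon):
        # Draw one plausible mean per arm from its Beta(1 + s, 1 + f) posterior.
        samples = [rng.betavariate(1 + successes[i], 1 + failures[i]) for i in range(k)]
        # Act greedily with respect to the sampled means.
        arm = max(range(k), key=lambda i: samples[i])
        # Simulate a Bernoulli reward and update that arm's posterior counts.
        reward = 1 if rng.random() < true_means[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1
        total_reward += reward
    return pulls, total_reward


if __name__ == "__main__":
    means = [0.2, 0.5, 0.8]  # hypothetical arm means
    pulls, total = thompson_sampling(means, horizon=2000)
    # Pseudo-regret: expected reward lost relative to always pulling the best arm.
    regret = sum(n * (max(means) - m) for n, m in zip(pulls, means))
    print("pulls per arm:", pulls, "pseudo-regret:", round(regret, 1))
```

Exploration here comes entirely from posterior sampling: an arm with few observations has a wide posterior and is occasionally sampled high, while arms that are confidently worse are rarely chosen.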
- Sham Kakade (University of Washington and Microsoft Research NYC), 1:30pm-3:30pm
- Title: The Mathematical Foundations of Policy Gradient Methods
- Slides: pg_tutorial.pdf
- Annotated slides: pg_tutorial_annotated-1.pdf, pg_tutorial_annotated-2.pdf
- Video links: Video 1, Video 2
Reinforcement learning is now the dominant paradigm for how an agent learns to interact with the world in order to achieve long-term objectives. Within this paradigm, policy gradient methods are among the most effective approaches to challenging reinforcement learning problems, because they are applicable to any differentiable policy parameterization, admit easy extensions to function approximation, easily incorporate structured state and action spaces, and are easy to implement in a simulation-based, model-free manner. This tutorial will cover both the basic algorithms and derivations of the method, along with more advanced analysis related to convergence proofs and approximation power.
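The simulation-based, model-free property can be illustrated with the score-function (REINFORCE-style) gradient estimator on the simplest possible problem: a one-step decision with a softmax policy. This is a minimal sketch under stated assumptions. The three action means, the Gaussian reward noise, the step size, and the helper names (`softmax`, `reinforce_bandit`, `noisy_reward`) are all hypothetical, and the learner only ever sees sampled rewards, never the means.

```python
import math
import random

# Hypothetical expected rewards per action; the learner never reads these directly.
MEANS = [0.2, 0.5, 0.9]


def noisy_reward(action, rng):
    """Simulator: returns a noisy sample of the chosen action's reward."""
    return MEANS[action] + rng.gauss(0.0, 0.1)


def softmax(theta):
    m = max(theta)  # subtract the max for numerical stability
    exps = [math.exp(t - m) for t in theta]
    z = sum(exps)
    return [e / z for e in exps]


def reinforce_bandit(reward_fn, num_actions, steps=5000, lr=0.1, seed=0):
    """Stochastic gradient ascent on J(theta) = E_{a ~ pi_theta}[r(a)]."""
    rng = random.Random(seed)
    theta = [0.0] * num_actions
    for _ in range(steps):
        probs = softmax(theta)
        a = rng.choices(range(num_actions), weights=probs)[0]
        r = reward_fn(a, rng)  # only a sampled reward is used: no model of r
        # For a softmax policy, d/d theta_b log pi(a) = 1[b == a] - pi(b),
        # so r * grad log pi(a) is an unbiased estimate of grad J(theta).
        for b in range(num_actions):
            grad_log = (1.0 if b == a else 0.0) - probs[b]
            theta[b] += lr * r * grad_log
    return softmax(theta)


if __name__ == "__main__":
    print([round(p, 3) for p in reinforce_bandit(noisy_reward, num_actions=3)])
```

Swapping the softmax for another differentiable parameterization only changes how the gradient of log pi is computed; the sampling loop stays the same.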
Part I: Basic Concepts and Relations. We will cover the basic derivation of the method, along with: simulation-based implementations; the natural gradient method and friends (Trust Region Policy Optimization (TRPO), Proximal Policy Optimization (PPO)); variance reduction with baselines; actor-critic methods and compatible function approximation. Familiarity with the basics of Markov decision processes is helpful but not mandatory.
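Why a baseline helps can be checked exactly on a small example. Because E_{a ~ pi}[grad log pi(a)] = 0, subtracting any constant baseline b from the reward leaves the expected gradient unchanged, while it can shrink the variance. The sketch below enumerates every action of a hypothetical three-action softmax policy (parameters, rewards, and function names are illustrative assumptions) and computes the exact mean and total variance of the estimator (r(a) - b) * grad log pi(a), once with b = 0 and once with b = E_pi[r]. The latter is a common choice but is not in general the variance-minimizing baseline.

```python
import math


def softmax(theta):
    m = max(theta)  # subtract the max for numerical stability
    exps = [math.exp(t - m) for t in theta]
    z = sum(exps)
    return [e / z for e in exps]


def score(probs, a):
    """Gradient of log softmax(theta)[a] with respect to theta."""
    return [(1.0 if b == a else 0.0) - p for b, p in enumerate(probs)]


def estimator_stats(theta, rewards, baseline):
    """Exact mean vector and total variance (trace of the covariance) of
    g(a) = (r(a) - baseline) * grad log pi(a), with a ~ softmax(theta)."""
    probs = softmax(theta)
    n = len(theta)
    grads = [[(rewards[a] - baseline) * g for g in score(probs, a)] for a in range(n)]
    mean = [sum(probs[a] * grads[a][i] for a in range(n)) for i in range(n)]
    var = sum(
        probs[a] * sum((grads[a][i] - mean[i]) ** 2 for i in range(n))
        for a in range(n)
    )
    return mean, var


if __name__ == "__main__":
    theta = [0.0, 0.5, -0.5]      # hypothetical policy parameters
    rewards = [10.0, 11.0, 12.0]  # hypothetical rewards sharing a large offset
    probs = softmax(theta)
    b = sum(p * r for p, r in zip(probs, rewards))  # b = E_pi[r]
    for label, base in (("no baseline", 0.0), ("b = E[r]", b)):
        mean, var = estimator_stats(theta, rewards, base)
        print(label, [round(x, 4) for x in mean], round(var, 4))
```

Both settings report the same mean gradient; only the variance changes. Actor-critic methods push the same idea further by replacing the constant baseline with a learned, state-dependent value estimate.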
Part II: Masterclass. We will cover the convergence properties of these methods, showing that even though the underlying problem is nonconvex, these methods not only converge globally but do so quickly. We will also cover the approximation power of policy gradient methods when working with restricted parametric policy classes. Novel relations to transfer learning and distribution shift will be discussed, time permitting. Familiarity with the analysis of gradient descent for convex functions (and, ideally, mirror descent), along with having seen the performance difference lemma, will be helpful.
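The performance difference lemma relates the values of two policies through the advantage function of one of them. In one standard form, for a discounted MDP with start state s0: V^{pi'}(s0) - V^{pi}(s0) = 1/(1 - gamma) * E_{s ~ d^{pi'}_{s0}} E_{a ~ pi'(.|s)}[A^{pi}(s, a)], where d^{pi'}_{s0}(s) = (1 - gamma) * sum_t gamma^t Pr^{pi'}(s_t = s | s0) is the discounted state-visitation distribution of pi'. The sketch below checks this identity numerically on a hypothetical two-state, two-action MDP; the transition probabilities, rewards, policies, and function names are all made up for illustration. It uses iterative policy evaluation and a truncated sum for the visitation distribution.

```python
# Hypothetical 2-state, 2-action MDP: P[s][a][t] = Pr(next state t | s, a).
P = [[[0.9, 0.1], [0.2, 0.8]],
     [[0.7, 0.3], [0.05, 0.95]]]
R = [[0.0, 1.0], [0.5, 2.0]]       # R[s][a]: expected immediate reward
PI_OLD = [[0.5, 0.5], [0.5, 0.5]]  # pi(a | s)
PI_NEW = [[0.1, 0.9], [0.3, 0.7]]  # pi'(a | s)


def q_values(V, gamma):
    """Q(s, a) = R(s, a) + gamma * sum_t P(t | s, a) V(t)."""
    n = len(R)
    return [[R[s][a] + gamma * sum(P[s][a][t] * V[t] for t in range(n))
             for a in range(len(R[s]))] for s in range(n)]


def evaluate(pi, gamma, iters=2000):
    """Iterative policy evaluation; the error shrinks by a factor gamma per sweep."""
    V = [0.0] * len(R)
    for _ in range(iters):
        Q = q_values(V, gamma)
        V = [sum(pi[s][a] * Q[s][a] for a in range(len(Q[s]))) for s in range(len(R))]
    return V


def visitation(pi, s0, gamma, horizon=2000):
    """d(s) = (1 - gamma) * sum_t gamma^t Pr(s_t = s | s0), truncated at `horizon`."""
    n = len(R)
    mu = [1.0 if s == s0 else 0.0 for s in range(n)]  # distribution of s_t
    d = [0.0] * n
    weight = 1.0 - gamma
    for _ in range(horizon):
        d = [d[s] + weight * mu[s] for s in range(n)]
        mu = [sum(mu[s] * pi[s][a] * P[s][a][t] for s in range(n) for a in range(len(P[s])))
              for t in range(n)]
        weight *= gamma
    return d


def performance_difference(pi_new, pi_old, s0, gamma):
    """Return both sides of the lemma: (V^new(s0) - V^old(s0), advantage form)."""
    v_old, v_new = evaluate(pi_old, gamma), evaluate(pi_new, gamma)
    q_old = q_values(v_old, gamma)
    d_new = visitation(pi_new, s0, gamma)
    expected_adv = sum(
        d_new[s] * sum(pi_new[s][a] * (q_old[s][a] - v_old[s]) for a in range(len(R[s])))
        for s in range(len(R))
    )
    return v_new[s0] - v_old[s0], expected_adv / (1.0 - gamma)


if __name__ == "__main__":
    lhs, rhs = performance_difference(PI_NEW, PI_OLD, s0=0, gamma=0.9)
    print(f"V difference: {lhs:.10f}  advantage form: {rhs:.10f}")
```

The lemma is useful because it expresses a change in value entirely through the old policy's advantages, weighted by the states the new policy actually visits.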