BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Events on Statistical Machine Learning - ECPv4.9.13//NONSGML v1.0//EN
CALSCALE:GREGORIAN
METHOD:PUBLISH
X-WR-CALNAME:Events on Statistical Machine Learning
X-ORIGINAL-URL:https://statisticalml.stat.columbia.edu
X-WR-CALDESC:Events for Events on Statistical Machine Learning
BEGIN:VTIMEZONE
TZID:America/New_York
BEGIN:DAYLIGHT
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
TZNAME:EDT
DTSTART:20200308T070000
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
TZNAME:EST
DTSTART:20201101T060000
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTART;TZID=America/New_York:20200502T100000
DTEND;TZID=America/New_York:20200502T160000
DTSTAMP:20241102T080252Z
CREATED:20191126T035140Z
LAST-MODIFIED:20230306T004722Z
UID:175-1588413600-1588435200@statisticalml.stat.columbia.edu
SUMMARY:Tutorials on Reinforcement Learning
DESCRIPTION:The upcoming tutorial on Reinforcement Learning will start with a gentle introduction to the topic\, leading up to the state of the art in both practical considerations and theoretical understanding. The tutorial will be held online and is free and open to everyone\, but requires a free registration. (Update: Registrations are now closed.)\n\nSpeakers and Schedule:\n\nShipra Agrawal (Columbia University)\, 10am-12pm\n\nTitle: Thompson Sampling based Methods for Reinforcement Learning\nSlides: ColumbiaTutorialMay2.pdf\nVideo link\nAbstract: Thompson Sampling is a surprisingly simple and flexible Bayesian heuristic for handling the exploration-exploitation tradeoff in sequential decision-making problems. While this basic algorithmic technique can be traced back to 1933\, the last five years have seen unprecedented growth in the theoretical understanding of\, as well as commercial interest in\, this method. This tutorial will provide a deep dive into the techniques involved in the design and analysis of Thompson sampling based algorithms. In the first half of the tutorial\, I will illustrate the main ideas and proof techniques through the special case of the multi-armed bandit problem. In the second half\, these techniques will be extended to prove worst-case regret bounds for the general (tabular) reinforcement learning problem.\n\n\n\nSham Kakade (University of Washington and Microsoft Research NYC)\, 1:30pm-3:30pm\n\nTitle: The Mathematical Foundations of Policy Gradient Methods\nSlides: pg_tutorial.pdf\nAnnotated slides: pg_tutorial_annotated-1.pdf\, pg_tutorial_annotated-2.pdf\nVideo links: Video 1\, Video 2\nAbstract: Reinforcement learning is now the dominant paradigm for how an agent learns to interact with the world in order to achieve long-term objectives. Here\, policy gradient methods are among the most effective methods for challenging reinforcement learning problems\, because they: are applicable to any differentiable policy parameterization; admit easy extensions to function approximation; easily incorporate structured state and action spaces; and are easy to implement in a simulation-based\, model-free manner. This tutorial will cover both the basic algorithms and derivations of the method\, along with more advanced analysis related to convergence proofs and approximation power.\nPart I: Basic Concepts and Relations. We will cover the basic derivation of the method\, along with: simulation-based implementations; the natural gradient method and friends (Trust Region Policy Optimization (TRPO)\, Proximal Policy Optimization (PPO)); variance reduction with baselines; and actor-critic methods with compatible function approximation. Familiarity with the basics of Markov decision processes is helpful but not mandatory.\nPart II: Masterclass. We will cover the convergence properties of these methods\, showing that even though the underlying problem is nonconvex\, these methods not only converge globally but do so quickly. We will also cover policy gradient methods' approximation power when working with restricted parametric policy classes. Novel relations to transfer learning and distribution shift will be discussed\, time permitting. Familiarity with the analysis of gradient descent for convex functions (and\, ideally\, mirror descent)\, along with having seen the performance difference lemma\, will be helpful.
URL:https://statisticalml.stat.columbia.edu/event/tutorials-on-reinforcement-learning/
LOCATION:Zoom Webinar
END:VEVENT
END:VCALENDAR