October 4, 2019, 2:10 pm - 3:00 pm EDT
Title: Falsifiability, calibration and a bit of Reinforcement Learning
Abstract: Multi-armed bandits in the real world likely don’t follow any model theory considers. So how can we tell whether a system is actually performing well when we watch it play a bandit problem? I’ll introduce an idea I’ll call “falsifiable bandits”. Such a bandit comes with a certificate of performance: showing that the certificate is wrong is enough to conclude the system is not solving the bandit problem well. Conversely, any system that actually plays an optimal strategy should easily be able to construct such a certificate.
To get to this result, we’ll start with traditional definitions of on-line regret. I’ll then talk about betting as a way of showing that a system is behaving poorly. Using a magical calibration variable, these two ideas can be forced to agree. This allows a system to use an algorithm that provides some minimal protection from being falsified.
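As background for the talk, the "traditional definition of on-line regret" can be illustrated with a small sketch: cumulative reward compared against always playing the best arm in hindsight. The epsilon-greedy strategy, the Bernoulli arms, and all parameter values below are illustrative assumptions, not the speaker's method.

```python
import random

def bandit_regret(means, pulls=2000, eps=0.1, seed=0):
    """Play epsilon-greedy on a Bernoulli bandit and return cumulative
    regret versus the best fixed arm in hindsight (standard on-line regret).
    `means` gives the (hidden) success probability of each arm."""
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k      # times each arm was pulled
    totals = [0.0] * k    # total reward from each arm
    reward_sum = 0.0
    for _ in range(pulls):
        if rng.random() < eps:
            arm = rng.randrange(k)  # explore
        else:
            # exploit: highest empirical mean; unpulled arms are tried first
            arm = max(range(k),
                      key=lambda a: totals[a] / counts[a]
                      if counts[a] else float("inf"))
        r = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        totals[arm] += r
        reward_sum += r
    # regret relative to always playing the best arm
    return max(means) * pulls - reward_sum
```

A well-behaved learner's regret grows much more slowly than the number of pulls, so for a long run the regret should be a small fraction of `pulls`; the falsification idea in the talk asks what evidence could certify (or refute) that kind of performance claim.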
Biography: Dean received 3 of his 4 degrees from the University of Maryland in the 1980s. He had been in academia all his life until a few years ago, when he left the ivory tower to join Amazon in NYC. His current research interests are mostly around machine learning and optimization. In this talk, he’ll touch on some of the work he did on individual sequences (in particular on-line least squares) and calibration.