A reduction of imitation learning and structured prediction to no-regret online learning
MLRG, Nov 2019
Sequential transfer in multi-armed bandit with finite set of models
MLRG, Jul 2019
Exploiting easy data in online optimization
MLRG, Mar 2019
Regret bound for the stochastic multi-armed bandit problem
MLRG, Nov 2018
Others
A Farewell to Arms: Budgeted bandits with a giving-up option
13 Feb 2020, ECS 468, UVic
Coordinate descent with bandit sampling
Adversarial attacks on stochastic bandits
NeurIPS'18 debriefing, Dec 2018
Improving Stack Exchange for Overall Usability
ECS 106, Dec 2019
Recent activities:
A learning agent that can wait for or terminate a process to maximize reward:
- with theoretical regret bound guarantees!
- with empirical performance comparison