My research focuses on sequential decision-making in feedback loops (i.e., the multi-armed bandit problem).
I also work on online optimization and machine learning applied to psychology.

E-mail: (kwangsung.jun å† gmail)

Multi-Armed Bandit

The multi-armed bandit is a stateless version of reinforcement learning (RL).
However, bandits usually enjoy stronger theoretical guarantees and have abundant real-world applications.
Informally speaking, bandits learn to make better decisions over time in a feedback loop.
Because each decision determines which feedback is observed, the data collected so far is no longer i.i.d., so most traditional learning guarantees do not apply.
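
As a rough illustration of this feedback loop, here is a minimal Python sketch of the classic UCB1 strategy on simulated Bernoulli arms. It is not tied to any particular paper or deployed system; the function name `ucb1`, the arm means, the horizon, and the seed are all assumptions chosen for the example.

```python
import math
import random


def ucb1(arm_means, horizon, seed=0):
    """Run UCB1 on simulated Bernoulli arms; return the pull count per arm."""
    rng = random.Random(seed)
    n_arms = len(arm_means)
    pulls = [0] * n_arms      # number of times each arm was chosen
    totals = [0.0] * n_arms   # sum of rewards observed for each arm

    for t in range(1, horizon + 1):
        if t <= n_arms:
            # Pull every arm once so each empirical mean is defined.
            arm = t - 1
        else:
            # Empirical mean plus an exploration bonus that shrinks
            # as an arm accumulates pulls.
            arm = max(
                range(n_arms),
                key=lambda a: totals[a] / pulls[a]
                + math.sqrt(2.0 * math.log(t) / pulls[a]),
            )
        # Feedback is observed only for the chosen arm, so the data
        # collected depends on past decisions (hence it is not i.i.d.).
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        pulls[arm] += 1
        totals[arm] += reward
    return pulls


# Hypothetical problem instance: three arms with made-up success rates.
pulls = ucb1(arm_means=[0.2, 0.5, 0.8], horizon=5000)
print(pulls)
```

In this simulation the pull counts should concentrate on the arm with the highest mean, while the other arms are still sampled occasionally because their exploration bonuses grow slowly with time.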

Bandits are actively studied in both theory and applications, including deployable web services.
For example, The New Yorker's cartoon caption contest currently uses a multi-armed bandit algorithm to efficiently crowdsource caption evaluations (read this article)!