Abstract: A survey of machine learning problems involving the exploration-exploitation trade-off is presented. Theoretical and practical properties of existing algorithms for online learning tasks, including the K-armed bandit problem, apple tasting, and reinforcement learning, are discussed. Several open problems in this area are described and their importance is emphasized.
Key words: Machine learning, K-armed bandit problem, reinforcement learning, online learning.