Vol. 14, No. 1, April 2001, 67--90

EXPLORATION-EXPLOITATION TRADE-OFF IN MACHINE LEARNING

Dragoljub Pokrajac, Aleksandar Lazarević, and Zoran Obradović

Abstract: A survey of machine learning problems involving the exploration-exploitation trade-off is presented. Theoretical and practical properties of existing algorithms for online learning tasks, including the K-armed bandit problem, apple tasting, and reinforcement learning, are discussed. Several open problems in this area are described and their importance is emphasized.

Key words: Machine learning, K-armed bandit problem, reinforcement learning, online learning.