Advanced Topics in Machine Learning
Part III: Multi-armed Bandit Problems
A. LAZARIC (INRIA-Lille)
DEI, Politecnico di Milano
SequeL, INRIA Lille
April 2-15, 2012

A Motivating Example

Statistical learning
- Collect training samples
- (introduce an implicit stochastic assumption on the model generating the data)
- Solve an optimization problem (e.g., ERM, least-squares, SVM, etc.)
- Deploy the solution (i.e., classifier, regressor)

A Motivating Example

- Google Maps
- Bing
- Via Michelin
- Yahoo!
- MapQuest

A Motivating Example

Online learning
- Define a set of experts
- Learn from a stream of data
- Solve an optimization problem (e.g., find the optimal expert)
- Predict as you learn

A Motivating Example

Question: which route should we take?
Problem: each day we obtain limited feedback: the traveling time of the chosen route.
Results: if we do not repeatedly try different options we cannot learn.
Solution: trade off between optimization and learning.
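The trade-off between optimization and learning in the route example can be illustrated with a small simulation. The sketch below uses an epsilon-greedy rule, which is one common way to balance the two and is not taken from the slides. The function names (`choose_route`, `simulate`), the mean travel times, the Gaussian noise model, and the choice to try every route once before exploiting are all assumptions made for illustration.

```python
import random


def choose_route(avg_times, counts, epsilon, rng):
    """Return the index of the route to take today (hypothetical helper)."""
    # Assumption: try every route once before trusting any average.
    untried = [i for i, c in enumerate(counts) if c == 0]
    if untried:
        return untried[0]
    # Learning: with probability epsilon, pick a route uniformly at random.
    if rng.random() < epsilon:
        return rng.randrange(len(counts))
    # Optimization: otherwise take the route with the lowest average time so far.
    return min(range(len(counts)), key=lambda i: avg_times[i])


def simulate(mean_times, days, epsilon, seed=0):
    """Simulate `days` trips; only the chosen route's time is observed each day."""
    rng = random.Random(seed)
    n = len(mean_times)
    counts = [0] * n        # how many times each route was taken
    avg_times = [0.0] * n   # running average of observed travel times
    total = 0.0
    for _ in range(days):
        i = choose_route(avg_times, counts, epsilon, rng)
        # Assumed noise model: Gaussian around the true mean, clipped at zero.
        t = max(0.0, rng.gauss(mean_times[i], 5.0))
        counts[i] += 1
        avg_times[i] += (t - avg_times[i]) / counts[i]  # incremental mean
        total += t
    return counts, avg_times, total


if __name__ == "__main__":
    # Hypothetical mean travel times (minutes) for three routes.
    counts, avg_times, total = simulate([30.0, 25.0, 40.0], days=200, epsilon=0.1)
    print("trips per route:", counts)
    print("estimated times:", [round(a, 1) for a in avg_times])
    print("total travel time:", round(total, 1))
```

Setting `epsilon=0.0` gives a purely greedy traveler who, after the initial pass, only ever updates the estimate of the route that currently looks best. A positive `epsilon` keeps refreshing the estimates of the other routes, at the price of some slower trips.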