
Multi–armed Bandit Problems PDF

205 Pages·2012·1.96 MB·English

Preview Multi–armed Bandit Problems

Advanced Topics in Machine Learning
Part III: Multi–armed Bandit Problems
A. Lazaric (INRIA-Lille)
DEI, Politecnico di Milano
SequeL, INRIA Lille
April 2-15, 2012

A Motivating Example

A. Lazaric, Multi–armed Bandit Problems, April 2-15, 2012 - 2/104

Statistical learning
- Collect training samples
- (introduce an implicit stochastic assumption on the model generating the data)
- Solve an optimization problem (e.g., ERM, least-squares, SVM, etc.)
- Deploy the solution (i.e., classifier, regressor)

A Motivating Example
- Google Maps
- Bing
- Via Michelin
- Yahoo!
- MapQuest

Online learning
- Define a set of experts
- Learn from a stream of data
- Solve an optimization problem (e.g., find the optimal expert)
- Predict as you learn

A Motivating Example
Question: which route should we take?
Problem: each day we obtain only limited feedback: the travel time of the chosen route.
Result: if we do not repeatedly try different options, we cannot learn.
Solution: trade off between optimization and learning.
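
The route question above can be illustrated with a small simulation. The sketch below is not taken from the slides: it uses an epsilon-greedy rule, one simple and common way to trade off optimization and learning, and every name in it (`epsilon_greedy_routes`, `mean_times`, `epsilon`, `noise`) as well as the Gaussian travel-time model is an assumption made only for illustration.

```python
import random


def epsilon_greedy_routes(mean_times, days=2000, epsilon=0.1, noise=2.0, seed=0):
    """Pick one route per day and observe only that route's travel time.

    mean_times: hypothetical true mean travel time of each route (unknown
    to the learner; used here only to simulate the feedback).
    """
    rng = random.Random(seed)
    n_routes = len(mean_times)
    counts = [0] * n_routes      # number of days each route was taken
    avg_time = [0.0] * n_routes  # running mean of observed travel times

    for _ in range(days):
        untried = [r for r in range(n_routes) if counts[r] == 0]
        if untried:
            route = untried[0]                   # take every route once first
        elif rng.random() < epsilon:
            route = rng.randrange(n_routes)      # learning: try a random route
        else:
            # optimization: take the route with the best estimate so far
            route = min(range(n_routes), key=lambda r: avg_time[r])

        # Limited feedback: a noisy travel time for the chosen route only
        # (assumed Gaussian for this sketch).
        observed = rng.gauss(mean_times[route], noise)
        counts[route] += 1
        avg_time[route] += (observed - avg_time[route]) / counts[route]

    return counts, avg_time


if __name__ == "__main__":
    counts, avg_time = epsilon_greedy_routes([30.0, 20.0, 40.0])
    print("days per route:", counts)
    print("estimated travel times:", [round(t, 1) for t in avg_time])
```

With `epsilon=0` and no initial round of trials, the learner would commit to whichever route it happened to take first and never measure the others, which is the failure described in the "Result" line. A small positive `epsilon` keeps measuring every route while spending most days on the one that currently looks fastest.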

Description:
April 2-15, 2012. Advanced Topics in Machine Learning. Part III: Multi–armed Bandit Problems. A. LAZARIC (INRIA-Lille). DEI, Politecnico di Milano.
