“One Weird Trick” for Advertising Outcomes: An Exploration of the Multi-Armed Bandit for Performance-Driven Marketing

by

Giuseppe Antonio Burtini
B.A., University of British Columbia, 2013

A thesis submitted in partial fulfillment of the requirements for the degree of Master of Science in the College of Graduate Studies (Interdisciplinary Studies)

THE UNIVERSITY OF BRITISH COLUMBIA (Okanagan)

October 2015

© Giuseppe Antonio Burtini, 2015

Abstract

In this work, we explore an online reinforcement learning problem called the multi-armed bandit for application to improving outcomes in a web marketing context. Specifically, we aim to produce tools for the efficient experiment design of variations of a website with the goal of increasing some desired behavior, such as purchases. We provide a detailed reference, with a statistical lens, of the existing research on the problem's variants and their associated policies, then produce a set of theoretical and empirical analyses of specific application-area questions. Concretely, we provide a number of contributions. First, we present a new standardized simulation platform integrating knowledge and techniques from the existing literature for the evaluation of bandit algorithms in a set of pre-defined worlds. To the best of our knowledge, this is the first comprehensive simulation platform for multi-armed bandits supporting arbitrary arms, parameterizations, algorithms and repeatable experimentation. Second, we integrate Thompson sampling into linear model techniques and explore a number of implementation questions, finding that both replication of Thompson sampling and adjusting for estimative uncertainty are plausible mechanisms for improving the results. Third, we explore novel techniques for dealing with certain types of structural non-stationarity, such as drift, and find that weighted least squares is a strong tool for handling both known and unknown drift. Empirically, in the unspecified case, an exponentially decaying weight provides good performance in a large variety of cases; in the specified case, an experimenter can select a weighting strategy reflecting the known drift, achieving state-of-the-art results. Fourth, we present the first known oracle-free measure of regret, called statistical regret, which utilizes intuitions from the confidence interval to produce a type of interval metric by replaying late-experiment knowledge over prior actions to determine how performant an experimenter can believe their results to be. Fifth, we present preliminary results on a specification-robust and computationally efficient sampling technique called the Simple Optimistic Sampler, which shows promising outcomes while requiring no modelling assumptions to implement.
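Several of the contributions above build on Thompson sampling, and the thesis itself gives pseudocode for the Bernoulli case (Figure 2.3, Thompson Sampling for Bernoulli Bandits). As orientation only, the following is a minimal, textbook-style Python sketch of Beta-Bernoulli Thompson sampling applied to the conversion-rate setting the abstract describes; it illustrates the standard technique, not the thesis's own implementation, and the function name, arm rates and horizon are hypothetical.

```python
import random


def thompson_bernoulli(true_rates, horizon, seed=0):
    """Beta-Bernoulli Thompson sampling over `horizon` rounds.

    `true_rates` holds the (unknown-to-the-policy) conversion probability
    of each arm; rewards are simulated as Bernoulli draws from them.
    Returns the total simulated reward.
    """
    rng = random.Random(seed)
    k = len(true_rates)
    successes = [0] * k  # observed conversions per arm
    failures = [0] * k   # observed non-conversions per arm
    total_reward = 0
    for _ in range(horizon):
        # Sample one plausible mean per arm from its Beta posterior
        # (uniform Beta(1, 1) prior), then play the arm whose sample is best.
        samples = [rng.betavariate(successes[i] + 1, failures[i] + 1)
                   for i in range(k)]
        arm = max(range(k), key=samples.__getitem__)
        reward = 1 if rng.random() < true_rates[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        total_reward += reward
    return total_reward


# Hypothetical example: three ad variants converting at 3%, 5% and 8%.
if __name__ == "__main__":
    print(thompson_bernoulli([0.03, 0.05, 0.08], horizon=10_000))
```

Each round the policy samples a plausible mean for every arm from its posterior and plays the arm with the best sample, so exploration tapers off naturally as evidence accumulates.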
Preface

This thesis is the original and independent work of the author, Giuseppe A. Burtini. The research was identified, designed, performed and analyzed by the author. Sections 3.2 (Linear Model Thompson Sampling: LinTS) and 3.4 (Non-Stationary Time Series Techniques) draw heavily from the published work Burtini et al. [36] (2015a), where Drs. Jason Loeppky and Ramon Lawrence provided an advisory role. A variant of the work which appears in Chapter 2, in which Drs. Jason Loeppky and Ramon Lawrence provided an advisory and editorial role, has been submitted to Statistics Surveys (Burtini et al. [37]) and published on the preprint archive arXiv.org. The work which appears in Sections 3.3, 3.5 and 3.6 is intended to be submitted for external publication, in whole or in part, at a future date. All other work is unpublished as of this date.

The title “One Weird Trick” for Advertising Outcomes refers to a style of advertising popularized in 2013 after the acknowledgment of some influential experimental results in consumer psychology, highlighting just how fundamental, and even formulaic, the scientific approach of advertising has become. The language of “One Weird Trick” has itself become memetic in online advertising and even in a minority of academic work [97]. This work discusses an approach to performance-driven experimentation appropriate for scientific advertising.

Table of Contents

Abstract
Preface
Table of Contents
List of Tables
List of Figures
List of Symbols
Acknowledgments
Chapter 1: Introduction
  1.1 Problem Definition
  1.2 Motivation
    1.2.1 Specific Interest Area
    1.2.2 Other Applications
      Clinical Trials
      Adaptive Routing
      Portfolio Design
      Natural Resource Exploration
      Research and Development Investment
      Employee Resource Allocation
      Crowdsourcing
      General Real World Explore/Exploit Tradeoffs
  1.3 Research Contribution
    1.3.1 A Simulation Platform for Multi-Armed Bandits
    1.3.2 LinTS: The Regression Thompson Sampler
    1.3.3 Experiments in Thompson Sampling
    1.3.4 Time-Series Techniques for Non-Stationary Bandits
    1.3.5 Statistical Regret for Applied Bandit Models
    1.3.6 Simple Efficient Sampling for Optimistic Surrogate Models
  1.4 Outline
Chapter 2: Background
  2.1 The Stochastic Multi-Armed Bandit Model
    2.1.1 A Stylized Example
    2.1.2 Considerations
      Measures of Regret
      Variance and Bounds of Regret
      Higher Moments and Risk Measures of Regret
      Feedback Delay
      Problem Difficulty
      Stationarity of the Problem
        Change-Point Detection
        Kalman Filters
      Ethical and Practical Constraints
      Practical Significance
    2.1.3 Formalization
  2.2 Studied Problem Variants
    2.2.1 Traditional K-Armed Stochastic Bandit
      ε-greedy
        Constant ε
        Adaptive and ε-Decreasing
      ε-first
        Multiple Epoch
      UCB1
      UCB2
      UCB-Tuned
      MOSS
      Bayes-UCB
      KL-UCB
      POKER and price of knowledge
    2.2.2 K-Armed vs. Infinite-Armed Bandits
      Bandit Algorithm for Smooth Trees (BAST)
      Hierarchical Optimistic Optimization (HOO)
    2.2.3 Adversarial Bandits
      Hedge and Exp3
      Exp4
        Exp4.P
      Stochastic and Adversarial Optimal (SAO)
    2.2.4 Contextual Bandits
      LinUCB
      CoFineUCB
      Banditron and NeuralBandit
    2.2.5 Non-Stationary Bandits
      Discounted UCB(-T)
      Sliding-Window UCB(-T)
      Adapt-EvE
      Kalman Filtered Bandit
    2.2.6 Probability Matching and Thompson Sampling
      Optimism in Probability Matching
      The Bernoulli Approach to Nonparametric Thompson Sampling
      Bootstrap Thompson Sampling
      Change-Point Thompson Sampling (CTS)
  2.3 Application Area
Chapter 3: Towards the Use of Multi-Armed Bandits in Advertisement Testing
  3.1 An Extensible Platform for Simulating Bandit Problems
    3.1.1 Implementation
      Simulator
      Arms
      Policies
      Measurements
    3.1.2 Problems
  3.2 Linear Model Thompson Sampling: LinTS
    3.2.1 Optimistic Thompson Sampling in LinTS
  3.3 Experiments in Thompson Sampling
    3.3.1 Measure of Centrality
    3.3.2 Estimative Uncertainty
    3.3.3 Sampling Uncertainty
  3.4 Non-Stationary Time Series Techniques
    3.4.1 A Short Review of Stochastic Drift
      Generalized Linear Bandits
    3.4.2 Overview of the Approach
      Autoregression and Detrending
      Penalized Weighted Least Squares
    3.4.3 Simulation Environment
    3.4.4 Experimental Results
  3.5 Statistical Regret for Applications
    3.5.1 Traditional Parametric Statistical Regret
  3.6 Simple Efficient Sampling
    3.6.1 Simple Efficient Symmetric Sampling (SESS)
    3.6.2 Efficient Non-Symmetric Sampling (ENSS)
    3.6.3 A Short Background in Nonparametric Sampling
      Bootstrap Thompson Sampling
      The Eckles and Kaptein (2014) Model
    3.6.4 A Simple Efficient Nonparametric Sampler
      Simple Sampler
      Introducing Optimism to the Simple Sampler
      Experiments in Replication Strategies with SOS
    3.6.5 Using Categorical Contextual Variables in SOS
  3.7 Summary
Chapter 4: Conclusions
  4.1 Summary
  4.2 Future Work
    4.2.1 Theoretical Bounds in Low Sample Size Scenarios
    4.2.2 Prior Elicitation
      ...from Experts
      ...from Prior Experiments
    4.2.3 Risk-Aware Analysis
    4.2.4 Feedback Delay
    4.2.5 Contextual Variables
      Costs of Misspecification
      Clustering and PCA
    4.2.6 Speed and Computational Complexity
Bibliography

List of Tables

Table 2.1  A hierarchy of bandit problems, categorized by the adversarial bandits generalization in Audibert and Bubeck [11].
Table 3.1  Results from eliminating estimative uncertainty in the unbiased sampling case.
Table 3.2  Results from eliminating estimative uncertainty in the optimistic sampling case.
Table 3.3  Results of a selection of replication strategies.
Table 3.4  The robustness and performance of the distribution-free sampler compared to the traditional parametric sampler.
Table 3.5  Replication strategy experiments for SOS and Simple Sampler.

List of Figures

Figure 2.1  An example of playing an expected-suboptimal arm but achieving a high reward due to random variation.
Figure 2.2  An example showing how expected-expected regret, expected-payoff regret and suboptimal plays differ.
Figure 2.3  Thompson Sampling for Bernoulli Bandits.
Figure 3.1  Pseudocode of combined algorithm.
Figure 3.2  Adjusted average cumulative regret of selected algorithms over 1,000 replicates of all worlds and true drift forms.
Figure 3.3  The Simple Nonparametric Sampler.
Figure 3.4  The Fully Parameterized Simple Optimistic Sampler (SOS).
Figure 3.5  The Categorical-Contextual Simple Optimistic Bootstrap Sampler (SOS).
List of Symbols

This listing provides a reference for some of the common symbols used within this work.

x_{i,t}  The payoff received after selecting arm i of a multi-armed bandit process at time t.
x_i      The payoff received after selecting arm i of a multi-armed bandit process explicitly assumed to be stationary in time.
E_θ      Expectation taken over the distribution of an arm (equivalently, over the prior parameter θ for the arm distribution).
E        Expectation taken over both the random selection of an a priori fixed matrix of rewards and the actions of the player.
µ_i      The mean payoff from arm i. An alternative expression of E_θ(x_i).
µ∗       The highest mean payoff of an arm. Alternatively, max_i E_θ(x_i).
K        The number of arms available in a multi-armed bandit. Usually a constant positive integer.
H        The horizon, or number of time periods to be played in a multi-armed bandit. A positive integer or infinity, often unknown.
n        The number of time periods consumed thus far, used when considering measures of regret taken prior to completion of the process.
n_j      The number of time periods thus far in which arm j has been selected. Equivalently, the number of observations of arm j.
S_t      The sequence of arm selections made by the player.
R        One of the many measures of regret. See Section 2.1.2.
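As a worked example of how these symbols fit together, one standard formulation of expected cumulative regret over a horizon H (only one of the "many measures of regret" distinguished in Section 2.1.2) is

    R_H = H µ∗ − E[ Σ_{t=1}^{H} x_{S_t, t} ] = Σ_{j=1}^{K} E[n_j] (µ∗ − µ_j),

where the second equality assumes stationary arms: the shortfall relative to always playing the best arm equals the expected number of plays of each arm weighted by its gap from µ∗.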
