Dynamic Task Allocation for Crowdsourcing Settings

Angela Zhou [email protected]
Princeton University, Princeton, NJ 08544
Irineo Cabreros [email protected]
Princeton University, Princeton, NJ 08544
Karan Singh [email protected]
Princeton University, Princeton, NJ 08544

arXiv:1701.08795v2 [cs.LG] 25 Feb 2017

International Conference on Machine Learning, New York, NY, USA, 2016. Appearing in the workshop Data-Efficient Machine Learning. Copyright 2016 by the author(s).

Abstract

We consider the problem of optimal budget allocation for crowdsourcing problems, allocating users to tasks to maximize our final confidence in the crowdsourced answers. Such an optimized worker assignment method allows us to "boost" the efficacy of any popular crowdsourcing estimation algorithm. We consider a mutual information interpretation of the crowdsourcing problem, which leads to a stochastic subset selection problem with a submodular objective function. We present experimental simulation results which demonstrate the effectiveness of our dynamic task allocation method for achieving higher accuracy, possibly requiring fewer labels, as well as improving upon a previous method which is sensitive to the proportion of users to questions.

1. Introduction

The rationale of crowdsourcing is to leverage the "wisdom of the crowd" in soliciting some kind of response or suggestion. Crowdsourcing allows for the annotation of large research corpora for training data-intensive models such as deep neural networks. Platforms such as Mechanical Turk allow for the systematic and programmatic implementation and assignment of crowdsourcing tasks. The crowdsourcing estimation problem is especially difficult because both the reliabilities of users and the true answers to the questions are unknown.

1.1. Previous Work

Dawid and Skene studied this estimation problem in 1979 in the context of responses to surveys [4]. In the Dawid-Skene stochastic model, each user has an associated reliability score dictating the probability with which the user answers a binary question correctly. The joint estimation of user reliabilities and correct answers in this model is very well studied in statistics, even for the case of k-ary questions. Popular approaches use Expectation Maximization to find the true labels maximizing the likelihood of estimated answer labels and reliabilities. Recent work uses spectral methods (based on eigenvalues of the assignment graph) to initialize the iterative expectation maximization algorithm, proving optimal convergence rates for such a scheme [7]. Another approach achieving state-of-the-art performance finds the labeling via a minimax conditional entropy approach [8].

Most analyses of these estimation algorithms assume random sampling. However, intermediate estimates of the reliabilities of workers can conceivably inform better worker-task assignments, which lead to higher quality final estimates. A 2011 paper by Karger, Oh, and Shah on budget-optimal crowdsourcing tracks multiple instances of distinct assignment graphs but assumes a highly specific "spammer-hammer" model where all users either answer randomly or correctly. Another approach models the assignment problem as a Markov Decision Process and derives the Optimistic Knowledge Gradient, which computes the conditional expectation of choosing certain workers, assuming a Beta-Bernoulli prior on the reliability of each worker [2]. However, that assignment scheme requires extensive recomputation and updating between each individual question-worker assignment, an unrealistic frequency of model updating.

Classical Dawid-Skene Model. In the classical Dawid-Skene model, abbreviated "D-S", there are $n$ users and $m$ questions with correct answers in $\{-1,+1\}$. (Without loss of generality we may map the answers to $\{0,1\}$.) Let $G \in \{0,1\}^{n \times m}$ denote the adjacency matrix of the bipartite assignment graph, indicating whether the $i$th user answered the $j$th question. This is typically called the assignment matrix. Additionally, let $a \in \{-1,+1\}^m$ be a vector which captures the correct answer for each of the $m$ questions. Let $p \in [0,1]^n$ be a vector denoting each user's reliability, for $n$ users. Let $A \in \{-1,0,1\}^{n \times m}$ be a stochastic answer matrix such that

\[
A_{ij} =
\begin{cases}
0 & \text{if } G_{ij} = 0 \\
a_j & \text{with probability } p_i, \text{ if } G_{ij} = 1 \\
-a_j & \text{with probability } 1 - p_i, \text{ if } G_{ij} = 1
\end{cases}
\]

2. Method for Quasi-Online Task Allocation

We derive an improved task allocation scheme and model the crowdsourcing problem as an optimization problem with an information-theoretic objective, since we want to assign workers to tasks to gain the most information about the true label of the question.

Previous work (unpublished) by the same authors (Cabreros, 2015) proposes a two-step estimation method that uses a budget parameter, $s \in [0,1]$, in two stages. Running a crowdsourcing estimation algorithm when half of the budget has been allocated yields estimates of the true answers, which may be used with a mixture model on the topics of questions to estimate the reliabilities of each user, $\tilde{F}$. We then sample users who are more likely to provide correct answers for specific questions during the remaining portion of the budget to arrive at a final $A$. The final answer matrix $A$ is then used as input for the same black-box estimation algorithm to arrive at the final estimates, as depicted in Figure 1.

Figure 1. Schematic diagram of the dynamic task allocation scheme.

How do we decide which questions require more budget allocated? We consider information-theoretic metrics. Mutual information sums over all possible outcomes of the random variables $X$, $Y$, and measures the mutual dependence between two variables by quantifying the "amount of information" obtained about one random variable by observing another (Cover, 2006). A variant of the mutual information between realized outcomes of the random variables, $X = x$, $Y = y$, is the pointwise mutual information, which has found applications in statistical natural language processing. We define another variant of mutual information, partial mutual information, between a random variable on the source channel, $X$, and observed outcomes $Y = y$, where we only integrate over the randomness of the unknown source channel. Partial Mutual Information, pMI(X; Y), is defined between the random variable $X$ and the outcome $Y = y$ of the random variable $Y$:

\[
\mathrm{pMI}(X; Y) = \sum_{x \in \mathcal{X}} p(x, y) \log\left( \frac{p(x, y)}{p(x)\, p(y)} \right)
\]

In our notation, where $F_{im}$ is the estimated reliability of user $i$ on question $m$:

\[
P[X_m = 1, \{Y_m\}] = \prod_{i \mid y_i = 1} F_{im} \prod_{j \mid y_j = 0} (1 - F_{jm})
\]

However, solving the cardinality-constrained integer program to assign users to questions is most likely NP-hard, as (Krause, 2012) reduce a similar formulation of an entropy minimization problem to independent set. In a "one-shot" setting, if our remaining budget $s$ is less than the number of questions $m$, the best thing to do is to choose the $s$ best relative improvements in the partial mutual information, which we denote as $\Delta\mathrm{pMI}_{\mathrm{rel}}(v; X_m; Y_m)$ for the improvement with respect to querying user $v$. Estimating $F_{ij}$ has already computed a "nested optimization" and found which user is the best to assign to a certain question.

3. Results

We evaluate the error rates of three methods: (1) running Dawid-Skene estimation on a randomly sampled budget $s$; (2) running Dawid-Skene estimation on a randomly sampled budget $s/2$ and assigning the remaining budget via the one-shot allocation; and (3) method (2), but assigning the remaining budget via the dynamic task allocation. We evaluate the error rates from 20 random samples of questions and user reliabilities for 1000 users and 100 questions, assuming a mixture of 2 topics, over 10 different budgets ranging from assigning 0.5% of users to each question to 2% coverage, in Figure 2. We also consider how the dimensions of the estimation problem impact the performance of these sampling policies in Figure 3: when users outnumber questions, the one-shot allocation does poorly while the new dynamic user allocation is still robust.

Figure 2. Comparison of error vs. budget percentages for the dynamic allocation method (blue) against fully random sampling and Dawid-Skene (dashed red) estimator. Note that for small budgets the dynamic allocation scheme significantly improves estimation error. For large budgets both methods are highly accurate. Error bars are 1.96 standard errors over 25 trials (confidence level of 95%).

Figure 3. Comparing sampling policy performance when varying the number of questions, 10 sample trials each. Error bars are one standard deviation. Dynamic task allocation (red solid) is competitive with full sampling (blue dashed), while one-shot learning (magenta dashed) performs poorly as the number of questions increases.

However, examining the intermediate accuracies yielded by the dynamic task allocation method indicates that the strengths of this method lie in the ability to achieve better estimation accuracy with fewer samples, and to terminate the estimation process early. Can we use the estimates of mutual information in an optimal stopping framework to develop more data-efficient crowdsourcing estimation algorithms? "Efficiency" will be measured with regard to random sampling, where we want to show that our method is $(1+\epsilon)$ efficient using a $(1-\delta)$ fraction of the samples needed by random sampling, for any chosen $\delta$. We will consider this question in the full paper and conjecture that one point of connection is between the channel capacity and the Fisher information. Analysis would proceed by considering the minimax convergence rates of the D-S estimator, proved in (Gao, 2013), to analyze the disadvantage of using fewer samples.

4. Conclusions

We model task assignment in the crowdsourcing problem and develop a probabilistic model for a "partial mutual information" criterion which yields a one-step lookahead policy of which questions to ask next. We implement a batch estimation policy which exhausts the budget by querying an additional label for each question, re-estimating $\tilde{F}_{\mathrm{est}}$, and using these updated estimates to recompute the label estimates. We are able to show significant improvement in estimation accuracy for small budgets. Our dynamic task allocation scheme is also robust for estimation schemes with higher ratios of questions to users, where the one-shot learning policy suffers higher error than random sampling. The advantage of such a scheme is that the mutual information heuristic we develop may be extended to evaluate early termination of the estimation process.

References

[1] Cabreros, Irineo; Singh, Karan; Zhou, Angela. Mixture model for crowdsourcing. 2015.

[2] Chen, Xi; Lin, Qihang; Zhou, Dengyong. Optimistic knowledge gradient policy for optimal allocation in crowdsourcing. Journal of Machine Learning Research, 2013.

[3] Cover, Thomas; Thomas, Joy. Elements of Information Theory. Wiley, 2006.

[4] Dawid, A. P.; Skene, A. M. Maximum likelihood estimation of observer error-rates using the EM algorithm. Journal of the Royal Statistical Society, 1979.

[5] Gao, Chao; Zhou, Dengyong. Minimax optimal convergence rates for estimating ground truth from crowdsourced labels. MSR Technical Report, 2013.

[6] Krause, Andreas; Golovin, Daniel. Submodular function maximization. 2012.

[7] Zhang, Yuchen; Chen, Xi; Zhou, Dengyong; Jordan, Michael. Spectral methods meet EM: A provably optimal algorithm for crowdsourcing. NIPS, 2014.

[8] Zhou, D.; Liu, Q.; Platt, J. C.; Meek, C. Aggregating ordinal labels from crowds by minimax conditional entropy. Proceedings of ICML, 2014.
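Illustrative sketch. The Dawid-Skene answer model of Section 1.1 and the partial mutual information of Section 2 can be made concrete with a few lines of Python. The following is a minimal sketch rather than the authors' implementation: the function names `sample_answers` and `partial_mutual_information` are hypothetical, and the `prior` argument (a prior P[X = 1], set to 1/2 by default) is an assumption of this sketch that the text does not specify. Labels passed to the second function use the {0,1} encoding.

```python
import math
import random


def sample_answers(G, a, p, rng):
    """Draw an answer matrix A under the Dawid-Skene model.

    G[i][j] is 1 if user i is assigned question j, a[j] in {-1, +1} is the
    true answer, and p[i] is the reliability of user i.
    """
    A = []
    for i, row in enumerate(G):
        answers = []
        for j, assigned in enumerate(row):
            if not assigned:
                answers.append(0)        # unassigned pairs stay 0
            elif rng.random() < p[i]:
                answers.append(a[j])     # correct with probability p[i]
            else:
                answers.append(-a[j])    # flipped with probability 1 - p[i]
        A.append(answers)
    return A


def partial_mutual_information(labels, reliabilities, prior=0.5):
    """pMI(X; Y = y) for a single binary question.

    labels[k] in {0, 1} is the k-th observed answer and reliabilities[k] is
    the reliability F of the user who gave it. `prior` = P[X = 1] is an
    assumption of this sketch, not part of the original text.
    """
    joint_1, joint_0 = prior, 1.0 - prior
    for y, f in zip(labels, reliabilities):
        joint_1 *= f if y == 1 else (1.0 - f)   # P[X = 1, y]
        joint_0 *= (1.0 - f) if y == 1 else f   # P[X = 0, y]
    p_y = joint_1 + joint_0
    total = 0.0
    for joint, p_x in ((joint_1, prior), (joint_0, 1.0 - prior)):
        if joint > 0.0:                         # treat 0 * log 0 as 0
            total += joint * math.log(joint / (p_x * p_y))
    return total


if __name__ == "__main__":
    rng = random.Random(0)
    G = [[1, 0], [1, 1]]
    print(sample_answers(G, a=[+1, -1], p=[0.9, 0.6], rng=rng))
    print(partial_mutual_information([1, 1, 0], [0.9, 0.8, 0.6]))
```

Under these assumptions the quantity is zero when no labels have been observed or when the only label comes from a user with reliability 1/2, and it grows as agreeing labels from reliable users accumulate, which is the behavior one would want from a criterion for deciding where to spend remaining budget.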