Dynamic Task Allocation for Crowdsourcing Settings

Angela Zhou  ANGELAZ@PRINCETON.EDU
Princeton University, Princeton, NJ 08544

Irineo Cabreros  CABREROS@PRINCETON.EDU
Princeton University, Princeton, NJ 08544

Karan Singh  KARANS@PRINCETON.EDU
Princeton University, Princeton, NJ 08544

International Conference on Machine Learning, New York, NY, USA, 2016. Appearing in the workshop Data-Efficient Machine Learning. Copyright 2016 by the author(s).
Abstract

We consider the problem of optimal budget allocation for crowdsourcing, allocating users to tasks so as to maximize our final confidence in the crowdsourced answers. Such an optimized worker assignment method allows us to "boost" the efficacy of any popular crowdsourcing estimation algorithm. We consider a mutual information interpretation of the crowdsourcing problem, which leads to a stochastic subset selection problem with a submodular objective function. We present experimental simulation results which demonstrate the effectiveness of our dynamic task allocation method in achieving higher accuracy, possibly with fewer labels, as well as improving upon a previous method which is sensitive to the proportion of users to questions.

1. Introduction

The rationale of crowdsourcing is to leverage the "wisdom of the crowd" in soliciting some kind of response or suggestion. Crowdsourcing allows for the annotation of large research corpora for training data-intensive models such as deep neural networks. Platforms such as Mechanical Turk allow for the systematic and programmatic implementation and assignment of crowdsourcing tasks. The crowdsourcing estimation problem is especially difficult because both the reliabilities of the users and the true answers to the questions are unknown.

1.1. Previous Work

Dawid and Skene studied this estimation problem in 1979 in the context of responses to surveys [4]. In the Dawid-Skene stochastic model, each user has an associated reliability score dictating the probability with which the user answers a binary question correctly. The joint estimation of user reliabilities and correct answers in this model is very well studied in statistics, even for the case of k-ary questions. Popular approaches use Expectation Maximization to find the true labels maximizing the likelihood of the estimated answer labels and reliabilities. Recent work uses spectral methods (based on eigenvalues of the assignment graph) to initialize the iterative expectation maximization algorithm, proving optimal convergence rates for such a scheme [7]. Another approach achieving state-of-the-art performance finds the labeling via a minimax conditional entropy approach [8].

Most analyses of these estimation algorithms assume random sampling. Conceivably, however, intermediate estimates of the workers' reliabilities can inform better worker-task assignments, which in turn lead to higher-quality final estimates. A 2011 paper by Karger, Oh, and Shah on budget-optimal crowdsourcing tracks multiple instances of distinct assignment graphs, but assumes a highly specific "spammer-hammer" model in which every user answers either randomly or correctly. Another approach models the assignment problem as a Markov Decision Process and derives the Optimistic Knowledge Gradient, which computes the conditional expectation of choosing certain workers, assuming a Beta-Bernoulli prior on the reliability of each worker [2]. However, this assignment scheme requires extensive recomputation and updating between each individual question-worker assignment, an unrealistic frequency of model updating.
Classical Dawid-Skene Model.  In the classical Dawid-Skene model, abbreviated "D-S", there are n users and m questions with correct answers in {−1,+1}. (Without loss of generality we may map the answers to {0,1}.) Let G ∈ {0,1}^{n×m} denote a bipartite matrix indicating whether the ith user answered the jth question; this is typically called the assignment matrix. Additionally, let a ∈ {−1,+1}^m be a vector which captures the correct answer to each of the m questions, and let p ∈ [0,1]^n be a vector denoting the reliability of each of the n users. Let A ∈ {−1,0,+1}^{n×m} be a stochastic answer matrix such that

    A_{ij} = \begin{cases} 0 & \text{if } G_{ij} = 0 \\ a_j & \text{with probability } p_i, \text{ if } G_{ij} = 1 \\ -a_j & \text{with probability } 1 - p_i, \text{ if } G_{ij} = 1 \end{cases}
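To make the generative model concrete, the following is a minimal simulation sketch in Python (not from the paper); the Bernoulli assignment of users to questions and the Uniform(0.5, 1) reliability distribution are illustrative assumptions rather than the paper's experimental setup.

import numpy as np

def simulate_dawid_skene(n_users, m_questions, coverage, seed=0):
    """Sample (G, a, p, A) from the classical D-S model described above.

    The Bernoulli(coverage) assignment scheme and the Uniform(0.5, 1)
    reliability distribution are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    a = rng.choice([-1, 1], size=m_questions)            # true answers a_j
    p = rng.uniform(0.5, 1.0, size=n_users)              # user reliabilities p_i
    G = rng.random((n_users, m_questions)) < coverage    # assignment matrix G_ij
    # Each assigned answer equals a_j with probability p_i and -a_j otherwise.
    correct = rng.random((n_users, m_questions)) < p[:, None]
    A = np.where(G, np.where(correct, a, -a), 0)
    return G.astype(int), a, p, A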
2. Method for Quasi-Online Task Allocation

We derive an improved task allocation scheme by modeling the crowdsourcing problem as an optimization problem with an information-theoretic objective, since we want to assign workers to tasks so as to gain the most information about the true label of each question.

Previous work (unpublished) by the same authors (Cabreros, 2015) proposes a two-step estimation method that uses a budget parameter, s ∈ [0,1], in two stages. Running a crowdsourcing estimation algorithm when half of the budget has been allocated yields estimates of the true answers, which may be used with a mixture model on the topics of the questions to estimate the reliability of each user, F̃. Users more likely to provide correct answers to specific questions are then sampled during the remaining portion of the budget to arrive at a final answer matrix A. The final answer matrix A is then used as input to the same black-box estimation algorithm to arrive at the final estimates, as depicted in Figure 1.

Figure 1. Schematic diagram of the dynamic task allocation scheme.

How do we decide which questions require more of the budget? We consider information-theoretic metrics. Mutual information sums over all possible outcomes of the random variables X, Y, and measures the mutual dependence between two variables by quantifying the "amount of information" obtained about one random variable by observing another (Cover, 2006). A variant of the mutual information between realized outcomes of the random variables X = x, Y = y is the pointwise mutual information, which has found applications in statistical natural language processing. We define another variant of mutual information, partial mutual information, between a random variable on the source channel, X, and observed outcomes Y = y, where we integrate only over the randomness of the unknown source channel. Partial mutual information, pMI(X;Y), is defined between the random variable X and outcomes of the random variable Y = y:

    \mathrm{pMI}(X;Y) = \sum_{x \in \mathcal{X}} p(x,y) \log\left( \frac{p(x,y)}{p(x)\,p(y)} \right)

In our notation:

    P[X_m = 1, \{Y_m\}] = \prod_{i \mid y_i = 1} F_{im} \prod_{j \mid y_j = 0} (1 - F_{jm})
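As an illustration of how these quantities can be computed from estimated reliabilities, the sketch below evaluates the product formula and the resulting pMI for a single question, assuming a uniform prior over the two labels and a {0, 1} label encoding; the function names and the uniform prior are my own choices, not the paper's implementation.

import numpy as np

def label_posterior(responses, F_q):
    """P[X_q = 1 | observed responses to question q], via the product formula above.

    responses: dict mapping user index i -> observed label y_i in {0, 1}
    F_q:       F_q[i] = estimated probability that user i answers question q correctly
    A uniform prior over {0, 1} is assumed (my assumption, not stated in the paper).
    """
    joint = {1: 0.5, 0: 0.5}
    for i, y in responses.items():
        joint[1] *= F_q[i] if y == 1 else 1.0 - F_q[i]
        joint[0] *= F_q[i] if y == 0 else 1.0 - F_q[i]
    return joint[1] / (joint[0] + joint[1])

def partial_mutual_information(responses, F_q):
    """pMI(X_q; Y = y): sum over x in {0, 1} of p(x, y) log( p(x, y) / (p(x) p(y)) ),
    integrating only over the randomness of the unknown label X_q."""
    joint = {1: 0.5, 0: 0.5}
    for i, y in responses.items():
        joint[1] *= F_q[i] if y == 1 else 1.0 - F_q[i]
        joint[0] *= F_q[i] if y == 0 else 1.0 - F_q[i]
    p_y = joint[0] + joint[1]
    return sum(p_xy * np.log(p_xy / (0.5 * p_y)) for p_xy in joint.values() if p_xy > 0)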
However, solving the cardinality-constrained integer program to assign users to questions is most likely NP-hard, as Krause and Golovin (2012) reduce a similar formulation of an entropy minimization problem to the independent set problem. The best we can do in a 'one-shot' setting, if our remaining budget s is less than the number of questions m, is to choose the s best relative improvements in the partial mutual information, which we denote by ΔpMI_rel(v; X_m; Y_m) for the improvement with respect to querying user v. Estimating F_{ij} has already performed a "nested optimization" and found which user is best to assign to a certain question.
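To illustrate the one-shot rule, the sketch below scores each candidate (user, question) pair by its expected one-step gain in pMI and selects the s highest-scoring queries. The paper's ΔpMI_rel may be defined differently (e.g., as a relative rather than an absolute improvement); this greedy absolute-gain variant is only my illustrative reading, and it reuses the hypothetical label_posterior and partial_mutual_information helpers from the previous sketch.

def expected_pmi_gain(u, q, responses, F):
    """Expected one-step improvement in pMI(X_q; Y) from querying user u on
    question q, averaging over u's two possible answers under the current
    posterior. An absolute-gain stand-in for the paper's Delta pMI_rel."""
    base = partial_mutual_information(responses[q], F[:, q])
    p1 = label_posterior(responses[q], F[:, q])
    # Predictive probability that user u reports label 1.
    p_y1 = p1 * F[u, q] + (1.0 - p1) * (1.0 - F[u, q])
    gain = 0.0
    for y, p_y in ((1, p_y1), (0, 1.0 - p_y1)):
        gain += p_y * partial_mutual_information({**responses[q], u: y}, F[:, q])
    return gain - base

def one_shot_allocation(s, candidates, responses, F):
    """Spend a remaining budget of s queries on the s best-scoring
    (user, question) pairs, as in the one-shot setting described above."""
    ranked = sorted(candidates,
                    key=lambda uq: expected_pmi_gain(uq[0], uq[1], responses, F),
                    reverse=True)
    return ranked[:s]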
3. Results

We evaluate the error rates of three methods: (1) running Dawid-Skene estimation on a budget s allocated by random sampling; (2) running Dawid-Skene estimation on a budget s/2 allocated by random sampling and assigning the remaining budget via the one-shot allocation; and (3) method (2), but assigning the remaining budget via the dynamic task allocation. We evaluate the error rates over 20 random samples of questions and user reliabilities, with 1000 users and 100 questions, assuming a mixture of 2 topics, over 10 different budgets ranging from assigning 0.5% of users to each question up to 2% coverage, in Figure 2.

We also consider how the dimensions of the estimation problem impact the performance of these sampling policies in Figure 3: when users outnumber questions, the one-shot allocation does poorly while the new dynamic user allocation is still robust.
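For concreteness, the dynamic policy of method (3) can be read as the batch loop described in the conclusions: exhaust the budget by repeatedly querying one additional label per question, re-estimating the reliabilities F̃, and recomputing the label estimates. The sketch below is only my paraphrase of that loop; run_dawid_skene, estimate_reliabilities, and query_label are hypothetical placeholders for the black-box estimator, the mixture-model reliability step, and the crowd query, and expected_pmi_gain comes from the earlier sketch.

def dynamic_task_allocation(responses, F_init, budget, questions, users):
    """Batch dynamic allocation sketch: while budget remains, re-estimate labels
    and reliabilities, then query one extra label per question, choosing for
    each question the unqueried user with the largest expected pMI gain."""
    F = F_init
    while budget > 0:
        labels = run_dawid_skene(responses)             # hypothetical black-box estimator
        F = estimate_reliabilities(responses, labels)   # hypothetical mixture-model step
        for q in questions:
            if budget == 0:
                break
            unqueried = [u for u in users if u not in responses[q]]
            if not unqueried:
                continue
            best_u = max(unqueried, key=lambda u: expected_pmi_gain(u, q, responses, F))
            responses[q][best_u] = query_label(best_u, q)  # hypothetical crowd query
            budget -= 1
    return run_dawid_skene(responses)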
Figure 2. Comparison of error vs. budget percentage for the dynamic allocation method (blue) against the fully randomly sampled Dawid-Skene estimator (dashed red). Note that for small budgets the dynamic allocation scheme significantly improves estimation error; for large budgets both methods are highly accurate. Error bars are 1.96 standard errors over 25 trials (95% confidence level).

Figure 3. Comparing sampling policy performance when varying the number of questions, 10 sample trials each. Error bars are one standard deviation. Dynamic task allocation (solid red) is competitive with full sampling (dashed blue), while one-shot learning (dashed magenta) performs poorly as the number of questions increases.
However, examining the intermediate accuracies yielded by the dynamic task allocation method indicates that the strength of this method lies in its ability to achieve better estimation accuracy with fewer samples, and to terminate the estimation process early. Can we use the estimates of mutual information in an optimal stopping framework to develop more data-efficient crowdsourcing estimation algorithms? "Efficiency" will be measured with regard to random sampling, where we want to show our method is (1+ε)-efficient using (1−δ) of the samples needed by random sampling, for any chosen δ. We will consider this question in the full paper, and conjecture that one point of connection is between the channel capacity and the Fisher information. The analysis would proceed by considering the minimax convergence rates of the D-S estimator, proved in (Gao, 2013), to analyze the disadvantage of using fewer samples.

4. Conclusions

We model task assignment in the crowdsourcing problem and develop a probabilistic model for a "partial mutual information" criterion which yields a one-step lookahead policy for which questions to ask next. We implement a batch estimation policy which exhausts the budget by querying an additional label for each question, re-estimating F̃_est, and using these updated estimates to recompute the label estimates. We are able to show significant improvement in estimation accuracy for small budgets. Our dynamic task allocation scheme is also robust for estimation problems with higher ratios of questions to users, where the one-shot learning policy suffers higher error than random sampling. A further advantage of such a scheme is that the mutual information heuristic we develop may be extended to evaluate early termination of the estimation process.

References

Cabreros, Irineo; Singh, Karan; Zhou, Angela. Mixture model for crowdsourcing. 2015.

Chen, Xi; Lin, Qihang; Zhou, Dengyong. Optimistic knowledge gradient policy for optimal allocation in crowdsourcing. Journal of Machine Learning Research, 2013.

Cover, Thomas; Thomas, Joy. Elements of Information Theory. Wiley, 2006.

Dawid, A. P.; Skene, A. M. Maximum likelihood estimation of observer error-rates using the EM algorithm. Journal of the Royal Statistical Society, 1979.

Gao, Chao; Zhou, Dengyong. Minimax optimal convergence rates for estimating ground truth from crowdsourced labels. MSR Technical Report, 2013.

Krause, Andreas; Golovin, Daniel. Submodular function maximization. 2012.
Zhang, Yuchen; Chen, Xi; Zhou, Dengyong; Jordan, Michael. Spectral methods meet EM: A provably optimal algorithm for crowdsourcing. NIPS, 2014.

Zhou, D.; Liu, Q.; Platt, J. C.; Meek, C. Aggregating ordinal labels from crowds by minimax conditional entropy. Proceedings of ICML, 2014.