Brigham Young University
BYU ScholarsArchive
Theses and Dissertations
2013-08-12

Practical Cost-Conscious Active Learning for Data Annotation in Annotator-Initiated Environments

Robbie A. Haertel
Brigham Young University - Provo

Follow this and additional works at: https://scholarsarchive.byu.edu/etd
Part of the Computer Sciences Commons

BYU ScholarsArchive Citation
Haertel, Robbie A., "Practical Cost-Conscious Active Learning for Data Annotation in Annotator-Initiated Environments" (2013). Theses and Dissertations. 4242. https://scholarsarchive.byu.edu/etd/4242

This Dissertation is brought to you for free and open access by BYU ScholarsArchive. It has been accepted for inclusion in Theses and Dissertations by an authorized administrator of BYU ScholarsArchive.

Practical Cost-Conscious Active Learning for Data Annotation in Annotator-Initiated Environments

Robbie A. Haertel

A dissertation submitted to the faculty of Brigham Young University in partial fulfillment of the requirements for the degree of Doctor of Philosophy

Eric Karl Ringger, Chair
Kevin Darrell Seppi
Christophe Gerard Giraud-Carrier
Michael D. Jones
Kent Eldon Seamons

Department of Computer Science
Brigham Young University
August 2013

Copyright © 2013 Robbie A. Haertel
All Rights Reserved

ABSTRACT

Practical Cost-Conscious Active Learning for Data Annotation in Annotator-Initiated Environments

Robbie A. Haertel
Department of Computer Science, BYU
Doctor of Philosophy

Many projects exist whose purpose is to augment raw data with annotations that increase the usefulness of the data. The number of these projects is rapidly growing and in the age of “big data” the amount of data to be annotated is likewise growing within each project. One common use of such data is in supervised machine learning, which requires labeled data to train a predictive model.
Annotation is often a very expensive proposition, particularly for structured data. The purpose of this dissertation is to explore methods of reducing the cost of creating such data sets, including annotated text corpora.

We focus on active learning to address the annotation problem. Active learning employs models trained using machine learning to identify instances in the data that are most informative and least costly. We introduce novel techniques for adapting vanilla active learning to situations wherein data instances are of varying benefit and cost, annotators request work “on-demand,” and there are multiple, fallible annotators of differing levels of accuracy and cost. In order to account for data instances of varying cost, we build a model of cost from real annotation data based on a user study. We also introduce a novel cost-conscious active learning algorithm, which we call return-on-investment (ROI), that selects instances for annotation that contain the most benefit per unit cost. To address the issue of annotators that request instances “on-demand,” we develop a parallel, “no-wait” framework that performs computation while the annotator is annotating. As a result, annotators need not wait for the computer to determine the best instance for them to annotate, a common problem with existing approaches. Finally, we introduce a Bayesian model designed to simultaneously infer ground truth annotations from noisy annotations, infer each individual annotator's accuracy, and predict its own accuracy on unseen data, without the use of a held-out set. We extend ROI-based active learning and our annotation framework to handle multiple annotators using this model. As a whole, our work shows that the techniques introduced in this dissertation reduce the cost of annotation in scenarios that are more true-to-life than those considered in previous research.
Keywords: active learning, cost-sensitive learning, machine learning, return-on-investment, Bayesian models, parallel active learning, natural language processing, part-of-speech tagging

ACKNOWLEDGMENTS

Nanos gigantum humeris insidentes, “Dwarves standing on the shoulders of giants,”¹ is an old metaphor typically used to refer to the fact that new research is always built upon a much larger body of existing research. While this is certainly the case, another interpretation exists. Namely, a scientist is unable to perform his research without the enormous assistance, aid, and support of many others. The following is a sampling of some of the giants that carried me while I was working towards my degree; my apologies in advance for any of those that I have not mentioned by name; know that I am appreciative of all those who have assisted in any way.

First and foremost, I would like to thank Dr. Eric Ringger, my advisor. I am ever grateful that, in his first year at Brigham Young University, he took a chance on a student of Linguistics interested in natural language processing by inviting me to do a doctorate under his tutelage. He has procured funding for me and has taught me more than I could have imagined through his classes, our research, and other interactions. More importantly, he has taught me how to perform research so that I may continue to learn and discover new things. He has always been very supportive, inspiring, positive, uplifting, and, most of all, patient, even when I am sure I did not make these things easy. He has also deftly handled the administrative aspects of my degree.

I am also very grateful for Dr. Kevin Seppi, my second committee member. He has gone well beyond the call of duty of a second committee member and in my mind he is really a co-advisor. I have learned volumes from his classes and our interactions. Unlike the stereotypical professor, Dr. Seppi was always in the trenches: like Dr. Ringger, he was always either present physically or available by phone and email late into the evenings of paper deadlines. I am thankful for his support and encouragement.

Likewise, my full Ph.D. committee has been very supportive.
They supported me in my decisions to take time off for internships and also helped ensure I finished my dissertation after leaving my studies to work full time.

¹ Translation taken from Wikipedia [97], which contains an interesting discussion about the history and use of the phrase.

My other co-authors have been incredibly helpful and insightful, including, but not limited to: Dr. James Carroll, Paul Felt, George Busby, Peter McClanahan, Marc Carmen and Dr. Deryle Lonsdale. Although we have not (yet) been co-authors, I am also very grateful for meaningful and stimulating discussions with Dr. Daniel Walker IV.

I very much enjoyed discussions I had with my colleagues at conferences and workshops that undoubtedly shaped my views of active learning. In particular, I would like to thank Dr. Burr Settles for his kind feedback on a draft of Chapter 10 and for sharing his thoughts and ideas about all things active learning. I also had several fruitful conversations with Dr. Katrin Tomanek on the subject of cost-conscious active learning. In addition, I am thankful for her assistance as co-organizer of the Active Learning Workshop for NLP, 2010. Others who influenced me through our conversations and interactions include Dr. Michael Bloodgood and Dr. Kevin Small.

Completion of my degree would not have been possible without the aid of financial support and other resources. The Computer Science department has been very generous in this regard, providing funding for my research for all but one year. For that year, I am grateful to Microsoft, who kindly provided me with a Mentor Grant. Brigham Young University also provides computing resources free of charge via the Fulton Supercomputing Lab, without which none of this research would have been completed. Of course, I am also extremely grateful to Google for the experience gained on both of my paid internships; Robert Gardner, Max Lin, and Gideon Mann were fabulous hosts who helped me reach my potential and complete successful internships.
During this last year, while working on my dissertation as a full-time employee of Google, management has been very accommodating in allowing me to finish; my finishing truly was a priority for those I work with and those above me. While here at Google, I have completed some parts of the dissertation using company-provided equipment.

Last, but certainly not least, I would like to thank my family. I am most grateful for the support and sacrifice proffered me by the love of my life, my beautiful and dear wife of nearly eleven years, Meri. She has made incredible sacrifices and has willingly taken upon herself extra burdens at home to allow me time to finish my dissertation. This dissertation simply would not have been possible without her, for which I will be eternally grateful. This degree is as much hers as it is mine and I am so blessed to be married to her and to walk the journey of life by her side and with her help. While I was a graduate student, all four of my children, Jared, Alex, Nathan, and my little princess Caroline, were born. They, too, have been very patient with me through this process. Finally, I would like to thank my parents. They have provided continuous love and support for their “eternal student.”

Table of Contents

1 Introduction
2 A Survey of Practical, Cost-Conscious Active Learning
  2.1 Supervised Machine Learning
  2.2 Formal Definitions
    2.2.1 Supervised Machine Learning
    2.2.2 Active Learning
  2.3 Transductive vs. Inductive Active Learning
  2.4 Evaluation
    2.4.1 Benefit
    2.4.2 Cost
    2.4.3 Cost in Simulation
  2.5 Scoring Functions
    2.5.1 EVSI Scoring Functions
    2.5.2 Return-on-Investment Scoring Function
    2.5.3 Other Cost-Sensitive Scoring Functions
  2.6 Cost and Benefit Functions
    2.6.1 Decision Theory Benefit Functions
    2.6.2 Heuristic Benefit Functions
    2.6.3 Cost
  2.7 Characteristics of Real Annotators
  2.8 Learner-Initiated vs. Annotator-Initiated Active Learning
3 Roadmap
4 Active Learning for Part-of-Speech Tagging: Accelerating Corpus Annotation
  4.1 Introduction
  4.2 Part of Speech Tagging
  4.3 Active Learning
    4.3.1 Active Learning in the Language Context
    4.3.2 Query by Committee
    4.3.3 Query by Uncertainty
    4.3.4 Adaptations of QBU
  4.4 Experimental Results
    4.4.1 Setup
    4.4.2 Data Sets
    4.4.3 General Results
    4.4.4 QBC Results
    4.4.5 QBU Results
    4.4.6 Results on the BNC
    4.4.7 Another Perspective
  4.5 Conclusions
  4.6 Errata
5 Assessing the Costs of Machine-Assisted Corpus Annotation Through a User Study
  5.1 Introduction
  5.2 Experimental Design
    5.2.1 Conditions
    5.2.2 Control Variables
    5.2.3 Session Size
    5.2.4 Data Selection
    5.2.5 User Interface
    5.2.6 Subjects
  5.3 Descriptive Statistics
  5.4 Hourly Cost Models
  5.5 Future Work
  5.6 Addendum
6 Assessing the Costs of Sampling Methods in Active Learning for Annotation
  6.1 Introduction
  6.2 Benefit and Cost in Active Learning
  6.3 Evaluation Methodology and Results
  6.4 Normalized Methods
  6.5 Conclusions
7 Return on Investment for Active Learning
  7.1 Introduction
  7.2 The Role of Cost and Benefit
  7.3 Background and Decision Theoretic Framework for Active Learning
  7.4 Return on Investment
  7.5 Methodology
    7.5.1 Utility Estimation
    7.5.2 Cost Estimation
    7.5.3 Experimental Setup
  7.6 Results
    7.6.1 Cost Estimators
    7.6.2 Utility Estimators
  7.7 Conclusions and Future Work
  7.8 Errata
8 An Analytic and Empirical Evaluation of Return-on-Investment-Based Active Learning
  8.1 Introduction
  8.2 Related Work
  8.3 Theoretical Analysis of ROI
  8.4 Methods
    8.4.1 Cost Simulation
    8.4.2 Cost Estimation
    8.4.3 Benefit Estimation
    8.4.4 Practical Active Learning
  8.5 From Theory to Practice: To What Degree Are the Conditions Met?
  8.6 Active Learning Results and Discussion
  8.7 Conclusions and Future Work
9 Parallel Active Learning: Eliminating Wait Time with Minimal Staleness
  9.1 Introduction
  9.2 From Zero Staleness to Zero Wait
    9.2.1 Zero Staleness
    9.2.2 Traditional Batch
    9.2.3 Allowing Old Scores
    9.2.4 Eliminating Wait Time
  9.3 Experimental Design
  9.4 Results
  9.5 Conclusions and Future Work
  9.6 Addendum: Strengthening the Case for the Parallel Framework
    9.6.1 Analysis of Effects of Relative Time Spent Annotating on Performance
    9.6.2 Empirical Comparison
    9.6.3 Conclusions