University of Pennsylvania ScholarlyCommons

Publicly Accessible Penn Dissertations

2013

Enabling More Accurate and Efficient Structured Prediction

David Joseph Weiss
University of Pennsylvania, [email protected]

Recommended Citation
Weiss, David Joseph, "Enabling More Accurate and Efficient Structured Prediction" (2013). Publicly Accessible Penn Dissertations. 942. https://repository.upenn.edu/edissertations/942

Enabling More Accurate and Efficient Structured Prediction

Abstract
Machine learning practitioners often face a fundamental trade-off between expressiveness and computation time: on average, more accurate, expressive models tend to be more computationally intensive at both training and test time. While this trade-off always applies, it is acutely present in the setting of structured prediction, where the joint prediction of multiple output variables creates two primary, interrelated bottlenecks: inference time and feature computation time. In this thesis, we address this trade-off at test time by presenting frameworks that enable more accurate and efficient structured prediction by targeting each bottleneck specifically. First, we develop a framework based on a cascade of models, whose goal is to control test-time complexity even as features are added that increase inference time (even exponentially). We call this framework Structured Prediction Cascades (SPC); we develop SPC in the context of exact inference and then extend the framework to handle the approximate case.
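The cascade idea just described can be illustrated for a linear-chain model: compute a max-marginal for each (position, state) pair and discard states whose max-marginal falls below a threshold that interpolates between the best sequence score and the mean max-marginal. The following is a minimal NumPy sketch under assumed dense score arrays, not the thesis's implementation; the function names and the `alpha` parameter are illustrative.

```python
import numpy as np

def chain_max_marginals(node_scores, edge_scores):
    """Max-marginals for each (position, state) of a linear chain.

    node_scores: (T, S) array of per-position state scores.
    edge_scores: (S, S) array of transition scores, shared across positions.
    Returns an (T, S) array m where m[t, s] is the score of the best full
    sequence constrained to pass through state s at position t.
    """
    T, S = node_scores.shape
    fwd = np.zeros((T, S))  # best prefix score ending in state s at t
    bwd = np.zeros((T, S))  # best suffix score starting after state s at t
    fwd[0] = node_scores[0]
    for t in range(1, T):
        fwd[t] = node_scores[t] + np.max(fwd[t - 1][:, None] + edge_scores, axis=0)
    for t in range(T - 2, -1, -1):
        bwd[t] = np.max(edge_scores + (node_scores[t + 1] + bwd[t + 1])[None, :], axis=1)
    return fwd + bwd

def cascade_filter(node_scores, edge_scores, alpha=0.5):
    """Keep states whose max-marginal clears a convex combination of the
    best sequence score and the mean max-marginal; any state on the best
    path always survives, since its max-marginal equals the best score."""
    m = chain_max_marginals(node_scores, edge_scores)
    threshold = alpha * m.max() + (1 - alpha) * m.mean()
    return m >= threshold  # boolean mask of surviving (position, state) pairs
```

Because the max over each row of the max-marginal matrix equals the Viterbi score, the threshold never eliminates the highest-scoring sequence for any `alpha` in [0, 1]; larger `alpha` filters more aggressively.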
Next, we develop a framework for the setting where feature computation is explicitly the bottleneck, in which we learn to selectively evaluate features within an instance of the model. This second framework is referred to as Dynamic Structured Model Selection (DMS), and is once again developed for a simpler, restricted model before being extended to handle a much more complex setting. For both cases, we evaluate our methods on several benchmark datasets, and we find that it is possible to dramatically improve the efficiency and accuracy of structured prediction.

Degree Type
Dissertation

Degree Name
Doctor of Philosophy (PhD)

Graduate Group
Computer and Information Science

First Advisor
Ben Taskar

Keywords
computer vision, machine learning, natural language processing, structured prediction

Subject Categories
Computer Sciences

This dissertation is available at ScholarlyCommons: https://repository.upenn.edu/edissertations/942

ENABLING MORE ACCURATE AND EFFICIENT STRUCTURED PREDICTION

David Weiss

A DISSERTATION
in
Computer and Information Science

Presented to the Faculties of the University of Pennsylvania in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

2013

Supervisor of Dissertation
Ben Taskar, Associate Professor, Computer and Information Science

Graduate Group Chairperson
Val Tannen, Professor, Computer and Information Science

Dissertation Committee:
Kostas Daniilidis, Professor, Computer and Information Science
Daniel Lee, Professor, Computer and Information Science
Camillo J. Taylor, Associate Professor, Computer and Information Science
Pedro Felzenszwalb, Associate Professor, Computer Science, Brown University

ENABLING MORE ACCURATE AND EFFICIENT STRUCTURED PREDICTION
COPYRIGHT
2013
David Weiss

For Ben: My advisor, mentor, and friend; I miss you.

Acknowledgements

I would like to first thank and acknowledge my advisor, Ben Taskar. This thesis would simply not exist without his unwavering expert guidance, encouragement, and support.
His boundless energy and enthusiasm for our work pushed me to achieve what I thought was impossible. I could not have asked for a better advisor.

I am grateful to Kostas Daniilidis, the chair of my committee, for his consistent advice and confidence in me. I would also like to thank the rest of my committee, Dan Lee, CJ Taylor, and Pedro Felzenszwalb, for helping me forge my years of effort into a single coherent document. I am also indebted to my original co-advisor, Michael Kearns, who shaped the start of my time in graduate school.

During my time at Penn, I was fortunate to be surrounded by an amazing community of colleagues and friends. I count myself lucky to have had the opportunity to work so closely with Benjamin Sapp. Without Andrew King, I would have missed out on an amazing paper title and acronym. Many thanks to Alex, Alex, Cody, Chris, Christine, Emily, Joao, Jenny, Katie, Katerina, Kuzman, Matthieu, Ryan, Steve, and Umar. I would not have made it through this process without you.

Finally, I am only the person I am today because of my family: my parents, Robert and Margaret, and my brothers, Michael and Jonny. Their love kept me going through the difficult times when I needed it most.

ABSTRACT

ENABLING MORE ACCURATE AND EFFICIENT STRUCTURED PREDICTION

David Weiss
Ben Taskar

Machine learning practitioners often face a fundamental trade-off between expressiveness and computation time: on average, more accurate, expressive models tend to be more computationally intensive at both training and test time. While this trade-off always applies, it is acutely present in the setting of structured prediction, where the joint prediction of multiple output variables creates two primary, interrelated bottlenecks: inference time and feature computation time. In this thesis, we address this trade-off at test time by presenting frameworks that enable more accurate and efficient structured prediction by targeting each bottleneck specifically.
First, we develop a framework based on a cascade of models, whose goal is to control test-time complexity even as features are added that increase inference time (even exponentially). We call this framework Structured Prediction Cascades (SPC); we develop SPC in the context of exact inference and then extend the framework to handle the approximate case. Next, we develop a framework for the setting where feature computation is explicitly the bottleneck, in which we learn to selectively evaluate features within an instance of the model. This second framework is referred to as Dynamic Structured Model Selection (DMS), and is once again developed for a simpler, restricted model before being extended to handle a much more complex setting. For both cases, we evaluate our methods on several benchmark datasets, and we find that it is possible to dramatically improve the efficiency and accuracy of structured prediction.

Contents

Acknowledgements iv

1 Introduction 1
  1.1 Thesis Overview 4

2 Related Work 6
  2.1 Controlling computation of multi-class classification 8
    2.1.1 Computational trade-offs during model selection 8
    2.1.2 Predicting with fixed test-time budgets 9
    2.1.3 Resource allocation for batch testing 10
    2.1.4 Classifier cascades 11
    2.1.5 More general multi-stage decision systems 11
    2.1.6 Feature extraction policies 12
  2.2 Related approaches for structured prediction 14
    2.2.1 Coarse-to-fine reasoning 14
    2.2.2 Early stopping cascade 15
    2.2.3 Prioritized inference 16
    2.2.4 Meta-learners / predicting model accuracy 16
    2.2.5 Applications of preliminary works 16
  2.3 Summary 17

3 Structured Prediction: An Overview 18
  3.1 Supervised learning 18
  3.2 Structured prediction 19
    3.2.1 Representing structure with factor graphs 21
    3.2.2 Complexity of inference 21
    3.2.3 Max-margin parameter learning 24
  3.3 Trading off computation and expressiveness 25
  3.4 Summary 26

4 Structured Prediction Cascades (SPC) 28
  4.1 Enabling complexity via filtering Y 29
  4.2 Cascaded inference with max-marginals 30
  4.3 Learning structured prediction cascades 36
  4.4 Generalization analysis 40
  4.5 Experiments 41
    4.5.1 Speed: Part-of-speech (POS) tagging 42
    4.5.2 Accuracy: Handwriting recognition 45
  4.6 Summary 46

5 Ensemble-SPC: SPC for Loopy Graphs 48
  5.1 Decomposition without agreement constraints 49
  5.2 Safe filtering 51
  5.3 Learning with ensembles 52
  5.4 Generalization analysis 53
  5.5 Experiments 53
    5.5.1 Synthetic loopy graphs with Ensemble-SPC 54
    5.5.2 Articulated pose tracking cascade 57
  5.6 Summary 59

6 Dynamic Structured Model Selection (DMS) 61
  6.1 Meta-learning with a value-based selector 63
  6.2 Learning the models and selector 64
  6.3 Application to sequential prediction 67
    6.3.1 Handwriting recognition 67
    6.3.2 Human pose estimation in video 72
    6.3.3 Evaluation of MODEC+S 75
    6.3.4 Evaluation of DMS for MODEC+S 76
  6.4 Summary 79

7 DMS-π: Policy-based Model Selection 80
  7.1 Q-Learning a feature extraction policy 82
  7.2 Design of the information-adaptive predictor h 88
  7.3 Batch mode inference 92
  7.4 Experiments 93
    7.4.1 Tracking of human pose in video 95
    7.4.2 Handwriting recognition 96
  7.5 Summary 98

8 Future Work 99
  8.1 Combining SPC and DMS-π 99
    8.1.1 SPC-π on linear chains 102
    8.1.2 Extending to loopy graphs 102
    8.1.3 Summary 103

9 Conclusion 104

A Theorem Proofs 105
  A.1 Proofs of Theorems 1 and 2 105
    A.1.1 Proof of Theorem 1 107
    A.1.2 Proof of Theorem 2 111