Mathematical Theories of Interaction with Oracles

Liu Yang

October 2013
CMU-ML-13-111

School of Computer Science
Machine Learning Department
Carnegie Mellon University
Pittsburgh, PA

Thesis Committee:
Avrim Blum, Chair
Jaime Carbonell, Chair
Manuel Blum
Sanjoy Dasgupta
Yishay Mansour
Joel Spencer

Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy.

Copyright © 2013 Liu Yang

This research was sponsored by the National Science Foundation under grant numbers DBI0640543, IIS0713379, IIS1065251; the Defense Intelligence Agency under grant number FA872105C0003; and a grant from Google Inc. The views and conclusions contained in this document are those of the author and should not be interpreted as representing the official policies, either expressed or implied, of any sponsoring institution, the U.S. government or any other entity.

Keywords: Property Testing, Active Learning, Computational Learning Theory, Learning DNF, Statistical Learning Theory, Transfer Learning, Prior Estimation, Bayesian Theory, Surrogate Losses, Preference Elicitation, Concept Drift, Algorithmic Mechanism Design, Economies of Scale

This thesis is dedicated to all Mathematicians.

Acknowledgments

I would like to thank my advisor Avrim Blum for so many stimulating discussions (research problems and other fun math problems), for the inspiration I experienced during our discussions, for his amazingly accurate-with-high-probability sense of the directions that are worth trying, and for the many valuable bits of feedback and advice he has provided me. I also thank my other advisor Jaime Carbonell for always being supportive and encouraging me to push on with one problem after another.
I am grateful to Manuel Blum for so many ingenious discussions all through these years when I was at CMU, which have broadened my mind, and given me a great taste of research problems and a faith in the ability of Mathematics to uncover interesting and mysterious truths, such as the nature of consciousness. I appreciate the exhilarating experience of working with Yishay Mansour on an algorithmic economics problem; through these interactions, I have learned many insights about axiomatic approaches to algorithmic economics.

One of my great experiences has been interacting with many wonderful mathematicians. I thank Ryan O’Donnell for input on my research on learning DNF, and insights on the analysis of boolean functions. I appreciate discussions with Steven Rudich on interactive proof systems, and for his counseling on Fourier techniques; he has also helped sharpen my skills of giving good talks and lectures. I thank Venkatesan Guruswami for discussions on information theory and coding theory related to my work in Bayesian active learning; I also highly enjoyed his complexity theory class. I want to thank Tuomas Sandholm for sharing his knowledge of Bayesian auction design. I thank Anupam Gupta for discussions on approximation algorithms. I would also like to thank all the other faculty that I’ve interacted with in my time at CMU. Thanks especially to my co-author Silvio Micali for extending my philosophical and implementational insights on auction design. I thank Shafi Goldwasser for encouragement on my work in property testing and computational learning theory. I thank Leslie Valiant for input on my project on learning DNF with representation-specific queries.

There are also several mathematicians who, though our interactions have been only brief, have made a lasting impact on my mathematical perspective. I am grateful for the wonderful and stimulating discussion I had with Alan Frieze on combinatorics. I appreciate the one sentence of advice from John Nash when I happened to be at Princeton for a summer workshop.
I am grateful to Scott Aaronson and Avi Wigderson for a few email conversations on interactive proof systems with restricted provers, which is a project I am actively pursuing. I also thank all the theorists I met in conferences, and the many friends and peers that made my time as a graduate student quite enjoyable, including Eric Blais and Paul Valiant. Finally, I want to cite Fan Chung Graham’s advice for grad students “Coauthorship is a closer relationship than friendship.” Yes, indeed, the co-authorship with all my collaborators is to be cherished year after year.

Contents

1 Summary
    1.1 Bayesian Active Learning
        1.1.1 Arbitrary Binary-Valued Queries
        1.1.2 Self-Verifying Active Learning
    1.2 Active Testing
    1.3 Theory of Transfer Learning
    1.4 Active Learning with Drifting Distributions and Targets
    1.5 Efficiently Learning DNF with Representation-Specific Queries
    1.6 Online Allocation with Economies of Scale

2 Active Testing
    2.1 Introduction
        2.1.1 The Active Property Testing Model
        2.1.2 Our Results
    2.2 Testing Unions of Intervals
    2.3 Testing Linear Threshold Functions
    2.4 Testing Disjoint Unions of Testable Properties
    2.5 General Testing Dimensions
        2.5.1 Application: Dictator functions
        2.5.2 Application: LTFs
    2.6 Proof of a Property Testing Lemma
    2.7 Proofs for Testing Unions of Intervals
    2.8 Proofs for Testing LTFs
    2.9 Proofs for Testing Disjoint Unions
    2.10 Proofs for Testing Dimensions
        2.10.1 Passive Testing Dimension (proof of Theorem 2.15)
        2.10.2 Coarse Active Testing Dimension (proof of Theorem 2.17)
        2.10.3 Active Testing Dimension (proof of Theorem 2.19)
        2.10.4 Lower Bounds for Testing LTFs (proof of Theorem 2.20)
    2.11 Testing Semi-Supervised Learning Assumptions

3 Testing Piecewise Real-Valued Functions
    3.1 Piecewise Constant

4 Learnability of DNF with Representation-Specific Queries
    4.1 Introduction
        4.1.1 Our Results
    4.2 Learning DNF with General Queries: Hardness Results
    4.3 Learning DNF with General Queries: Positive
        4.3.1 Methods
        4.3.2 Positive Results
    4.4 Learning DNF under the Uniform Distribution
    4.5 More Powerful Queries
    4.6 Learning DNF with General Queries: Open Questions
    4.7 Generalizations
        4.7.1 Learning Unions of Halfspaces
        4.7.2 Learning Voronoi with General Queries

5 Bayesian Active Learning with Arbitrary Binary Valued Queries
    5.1 Introduction
    5.2 Definitions
        5.2.1 Definition of Packing Entropy
    5.3 Main Result
    5.4 Proof of Theorem 5.6
    5.5 Application to Bayesian Active Learning
    5.6 Open Problems

6 The Sample Complexity of Self-Verifying Bayesian Active Learning
    6.1 Introduction and Background
    6.2 Definitions and Preliminaries
    6.3 Prior-Independent Learning Algorithms
    6.4 Prior-Dependent Learning: An Example
    6.5 A General Result for Self-Verifying Bayesian Active Learning
    6.6 Dependence on D in the Learning Algorithm
    6.7 Inherent Dependence on π in the Sample Complexity

7 Prior Estimation for Transfer Learning
    7.1 Introduction
        7.1.1 Outline of the paper
    7.2 Definitions and Related Work
        7.2.1 Relation to Existing Theoretical Work on Transfer Learning
    7.3 Estimating the Prior
        7.3.1 Identifiability from d Points
    7.4 Transfer Learning
        7.4.1 Proof of Theorem 7.8
    7.5 Conclusions

8 Prior Estimation
    8.1 Introduction
    8.2 The Setting
    8.3 An Upper Bound
    8.4 A Minimax Lower Bound
    8.5 Future Directions

9 Estimation of Priors with Applications to Preference Elicitation
    9.1 Introduction
    9.2 Notation
    9.3 Maximizing Customer Satisfaction in Combinatorial Auctions

10 Active Learning with a Drifting Distribution
    10.1 Introduction
    10.2 Definition and Notations
        10.2.1 Assumptions
    10.3 Related Work
    10.4 Active Learning in the Realizable Case
        10.4.1 Learning with a Fixed Distribution
        10.4.2 Learning with a Drifting Distribution
    10.5 Learning with Noise
        10.5.1 Noise Conditions
        10.5.2 Agnostic CAL
        10.5.3 Learning with a Fixed Distribution
        10.5.4 Learning with a Drifting Distribution
    10.6 Querying before Predicting
    10.7 Discussion
    10.8 Proof of Theorem 10.4
    10.9 Proof of Theorem 10.15
    10.10 Proof of Theorem 10.17

11 Active Learning with a Drifting Target Concept
    11.1 Introduction
    11.2 Definitions and Notations
    11.3 General Analysis under Constant Drift Rate: Inefficient Passive Learning
    11.4 General Analysis under Constant Drift Rate: Sometimes-Efficient Passive Learning
        11.4.1 Lower Bounds
        11.4.2 Random Drifts
    11.5 Linear Separators under the Uniform Distribution
    11.6 General Analysis of Sublinear Mistake Bounds: Passive Learning
    11.7 General Analysis under Varying Drift Rate: Inefficient Passive Learning

12 Surrogate Losses in Passive and Active Learning
    12.1 Introduction
        12.1.1 Related Work
    12.2 Definitions
        12.2.1 Surrogate Loss Functions for Classification
        12.2.2 A Few Examples of Loss Functions
        12.2.3 Empirical ℓ-Risk Minimization
        12.2.4 Localized Sample Complexities
    12.3 Methods Based on Optimizing the Surrogate Risk
        12.3.1 Passive Learning: Empirical Risk Minimization
        12.3.2 Negative Results for Active Learning
    12.4 Alternative Use of the Surrogate Loss
    12.5 Applications
        12.5.1 Diameter Conditions
        12.5.2 The Disagreement Coefficient
        12.5.3 Specification of $\mathring{\phi}_\ell$
        12.5.4 VC Subgraph Classes
        12.5.5 Entropy Conditions
        12.5.6 Remarks on VC Major and VC Hull Classes
    12.6 Proofs
    12.7 Results for Efficiently Computable Updates
        12.7.1 Proof of Theorem 12.16 under (12.34)

13 Online Allocation and Pricing with Economies of Scale
    13.1 Introduction
        13.1.1 Our Results and Techniques
        13.1.2 Related Work
    13.2 Model, Definitions, and Notation
        13.2.1 Utility Functions
        13.2.2 Production cost
        13.2.3 Allocation problems
    13.3 Structural Results and Allocation Policies
        13.3.1 Permutation and pricing policies
        13.3.2 Structural results
    13.4 Uniform Unit Demand and the Allocate-All problem
        13.4.1 Generalization Result
        13.4.2 Generalized Performance Guarantees
        13.4.3 Generalization for β-nice costs
    13.5 General Unit Demand Utilities
        13.5.1 Generalization
    13.6 Properties of β-nice cost

Bibliography