
Mathematical Theories of Interaction with Oracles PDF

335 Pages·2013·2.16 MB·English
by Liu Yang

Preview Mathematical Theories of Interaction with Oracles

Mathematical Theories of Interaction with Oracles
Liu Yang
October 2013
CMU-ML-13-111

School of Computer Science
Machine Learning Department
Carnegie Mellon University
Pittsburgh, PA

Thesis Committee:
Avrim Blum, Chair
Jaime Carbonell, Chair
Manuel Blum
Sanjoy Dasgupta
Yishay Mansour
Joel Spencer

Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy.

Copyright © 2013 Liu Yang

This research was sponsored by the National Science Foundation under grant numbers DBI0640543, IIS0713379, IIS1065251; the Defense Intelligence Agency under grant number FA872105C0003; and a grant from Google Inc. The views and conclusions contained in this document are those of the author and should not be interpreted as representing the official policies, either expressed or implied, of any sponsoring institution, the U.S. government or any other entity.

Keywords: Property Testing, Active Learning, Computational Learning Theory, Learning DNF, Statistical Learning Theory, Transfer Learning, Prior Estimation, Bayesian Theory, Surrogate Losses, Preference Elicitation, Concept Drift, Algorithmic Mechanism Design, Economies of Scale

This thesis is dedicated to all Mathematicians.

Acknowledgments

I would like to thank my advisor Avrim Blum for so many stimulating discussions (research problems and other fun math problems), for the inspiration I experienced during our discussions, for his amazingly accurate-with-high-probability sense of the directions that are worth trying, and for the many valuable bits of feedback and advice he has provided me. I also thank my other advisor Jaime Carbonell for always being supportive and encouraging me to push on with one problem after another.
I am grateful to Manuel Blum for so many ingenious discussions all through these years when I am at CMU, which have broadened my mind, and given me a great taste of research problems and a faith in the ability of Mathematics to uncover interesting and mysterious truths, such as the nature of consciousness. I appreciate the exhilarating experience of working with Yishay Mansour on an algorithmic economics problem; through these interactions, I have learned many insights about axiomatic approaches to algorithmic economics.

One of my great experiences has been interacting with many wonderful mathematicians. I thank Ryan O’Donnell for input on my research on learning DNF, and insights on the analysis of boolean functions. I appreciate discussions with Steven Rudich on interactive proof systems, and for his counseling on Fourier techniques; he has also helped sharpen my skills of giving good talks and lectures. I thank Venkatesan Guruswami for discussions on information theory and coding theory related to my work in Bayesian active learning; I also highly enjoyed his complexity theory class. I want to thank Tuomas Sandholm for sharing his knowledge of Bayesian auction design. I thank Anupam Gupta for discussions on approximation algorithms. I would also like to thank all the other faculty that I’ve interacted with in my time at CMU. Thanks especially to my co-author Silvio Micali for extending my philosophical and implementational insights on auction design. I thank Shafi Goldwasser for encouragement on my work in property testing and computational learning theory. I thank Leslie Valiant for input on my project on learning DNF with representation-specific queries.

There are also several mathematicians who, though our interactions have been only brief, have made a lasting impact on my mathematical perspective. I am grateful for the wonderful and stimulating discussion I had with Alan Frieze on combinatorics. I appreciate the one sentence of advice from John Nash when I happened to be at Princeton for a summer workshop.
I am grateful to Scott Aaronson and Avi Wigderson for a few email conversations on interactive proof systems with restricted provers, which is a project I am actively pursuing. I also thank all the theorists I met in conferences, and the many friends and peers that made my time as a graduate student quite enjoyable, including Eric Blais and Paul Valiant. Finally, I want to cite Fan Chung Graham’s advice for grad students “Coauthorship is a closer relationship than friendship.” Yes, indeed, the co-authorship with all my collaborators is to be cherished year after year.

Contents

1 Summary
1.1 Bayesian Active Learning
1.1.1 Arbitrary Binary-Valued Queries
1.1.2 Self-Verifying Active Learning
1.2 Active Testing
1.3 Theory of Transfer Learning
1.4 Active Learning with Drifting Distributions and Targets
1.5 Efficiently Learning DNF with Representation-Specific Queries
1.6 Online Allocation with Economies of Scale

2 Active Testing
2.1 Introduction
2.1.1 The Active Property Testing Model
2.1.2 Our Results
2.2 Testing Unions of Intervals
2.3 Testing Linear Threshold Functions
2.4 Testing Disjoint Unions of Testable Properties
2.5 General Testing Dimensions
2.5.1 Application: Dictator functions
2.5.2 Application: LTFs
2.6 Proof of a Property Testing Lemma
2.7 Proofs for Testing Unions of Intervals
2.8 Proofs for Testing LTFs
2.9 Proofs for Testing Disjoint Unions
2.10 Proofs for Testing Dimensions
2.10.1 Passive Testing Dimension (proof of Theorem 2.15)
2.10.2 Coarse Active Testing Dimension (proof of Theorem 2.17)
2.10.3 Active Testing Dimension (proof of Theorem 2.19)
2.10.4 Lower Bounds for Testing LTFs (proof of Theorem 2.20)
2.11 Testing Semi-Supervised Learning Assumptions

3 Testing Piecewise Real-Valued Functions
3.1 Piecewise Constant

4 Learnability of DNF with Representation-Specific Queries
4.1 Introduction
4.1.1 Our Results
4.2 Learning DNF with General Queries: Hardness Results
4.3 Learning DNF with General Queries: Positive
4.3.1 Methods
4.3.2 Positive Results
4.4 Learning DNF under the Uniform Distribution
4.5 More Powerful Queries
4.6 Learning DNF with General Queries: Open Questions
4.7 Generalizations
4.7.1 Learning Unions of Halfspaces
4.7.2 Learning Voronoi with General Queries

5 Bayesian Active Learning with Arbitrary Binary Valued Queries
5.1 Introduction
5.2 Definitions
5.2.1 Definition of Packing Entropy
5.3 Main Result
5.4 Proof of Theorem 5.6
5.5 Application to Bayesian Active Learning
5.6 Open Problems

6 The Sample Complexity of Self-Verifying Bayesian Active Learning
6.1 Introduction and Background
6.2 Definitions and Preliminaries
6.3 Prior-Independent Learning Algorithms
6.4 Prior-Dependent Learning: An Example
6.5 A General Result for Self-Verifying Bayesian Active Learning
6.6 Dependence on D in the Learning Algorithm
6.7 Inherent Dependence on π in the Sample Complexity

7 Prior Estimation for Transfer Learning
7.1 Introduction
7.1.1 Outline of the paper
7.2 Definitions and Related Work
7.2.1 Relation to Existing Theoretical Work on Transfer Learning
7.3 Estimating the Prior
7.3.1 Identifiability from d Points
7.4 Transfer Learning
7.4.1 Proof of Theorem 7.8
7.5 Conclusions

8 Prior Estimation
8.1 Introduction
8.2 The Setting
8.3 An Upper Bound
8.4 A Minimax Lower Bound
8.5 Future Directions

9 Estimation of Priors with Applications to Preference Elicitation
9.1 Introduction
9.2 Notation
9.3 Maximizing Customer Satisfaction in Combinatorial Auctions

10 Active Learning with a Drifting Distribution
10.1 Introduction
10.2 Definition and Notations
10.2.1 Assumptions
10.3 Related Work
10.4 Active Learning in the Realizable Case
10.4.1 Learning with a Fixed Distribution
10.4.2 Learning with a Drifting Distribution
10.5 Learning with Noise
10.5.1 Noise Conditions
10.5.2 Agnostic CAL
10.5.3 Learning with a Fixed Distribution
10.5.4 Learning with a Drifting Distribution
10.6 Querying before Predicting
10.7 Discussion
10.8 Proof of Theorem 10.4
10.9 Proof of Theorem 10.15
10.10 Proof of Theorem 10.17

11 Active Learning with a Drifting Target Concept
11.1 Introduction
11.2 Definitions and Notations
11.3 General Analysis under Constant Drift Rate: Inefficient Passive Learning
11.4 General Analysis under Constant Drift Rate: Sometimes-Efficient Passive Learning
11.4.1 Lower Bounds
11.4.2 Random Drifts
11.5 Linear Separators under the Uniform Distribution
11.6 General Analysis of Sublinear Mistake Bounds: Passive Learning
11.7 General Analysis under Varying Drift Rate: Inefficient Passive Learning

12 Surrogate Losses in Passive and Active Learning
12.1 Introduction
12.1.1 Related Work
12.2 Definitions
12.2.1 Surrogate Loss Functions for Classification
12.2.2 A Few Examples of Loss Functions
12.2.3 Empirical ℓ-Risk Minimization
12.2.4 Localized Sample Complexities
12.3 Methods Based on Optimizing the Surrogate Risk
12.3.1 Passive Learning: Empirical Risk Minimization
12.3.2 Negative Results for Active Learning
12.4 Alternative Use of the Surrogate Loss
12.5 Applications
12.5.1 Diameter Conditions
12.5.2 The Disagreement Coefficient
12.5.3 Specification of φ˚_ℓ
12.5.4 VC Subgraph Classes
12.5.5 Entropy Conditions
12.5.6 Remarks on VC Major and VC Hull Classes
12.6 Proofs
12.7 Results for Efficiently Computable Updates
12.7.1 Proof of Theorem 12.16 under (12.34)

13 Online Allocation and Pricing with Economies of Scale
13.1 Introduction
13.1.1 Our Results and Techniques
13.1.2 Related Work
13.2 Model, Definitions, and Notation
13.2.1 Utility Functions
13.2.2 Production cost
13.2.3 Allocation problems
13.3 Structural Results and Allocation Policies
13.3.1 Permutation and pricing policies
13.3.2 Structural results
13.4 Uniform Unit Demand and the Allocate-All problem
13.4.1 Generalization Result
13.4.2 Generalized Performance Guarantees
13.4.3 Generalization for β-nice costs
13.5 General Unit Demand Utilities
13.5.1 Generalization
13.6 Properties of β-nice cost

Bibliography

