Algorithmic Learning in a Random World PDF

490 Pages·2022·12.254 MB·English

Preview Algorithmic Learning in a Random World

Vladimir Vovk, Alexander Gammerman, Glenn Shafer
Algorithmic Learning in a Random World
Second Edition

Vladimir Vovk, Royal Holloway, University of London, London, UK
Alexander Gammerman, Royal Holloway, University of London, London, UK
Glenn Shafer, Rutgers University, Newark, NJ, USA

ISBN 978-3-031-06648-1
ISBN 978-3-031-06649-8 (eBook)
https://doi.org/10.1007/978-3-031-06649-8

1st edition: © Springer Verlag New York, Inc. 2005
2nd edition: © Springer Nature Switzerland AG 2022

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

Contents

1 Introduction
  1.1 Machine Learning
    1.1.1 Learning Under Randomness
    1.1.2 Learning Under Unconstrained Randomness
  1.2 A Shortcoming of Statistical Learning Theory
    1.2.1 The Hold-Out Estimate of Confidence
    1.2.2 The Contribution of This Book
  1.3 The Online Framework
    1.3.1 Online Learning
    1.3.2 Online/Offline Compromises
    1.3.3 One-Off and Offline Learning
    1.3.4 Induction, Transduction, and the Online Framework
  1.4 Conformal Prediction
    1.4.1 Nested Prediction Sets
    1.4.2 Validity
    1.4.3 Efficiency
    1.4.4 Conditionality
    1.4.5 Flexibility of Conformal Predictors
  1.5 Probabilistic Prediction Under Unconstrained Randomness
    1.5.1 Universally Consistent Probabilistic Predictor
    1.5.2 Probabilistic Prediction Using a Finite Dataset
    1.5.3 Venn Prediction
    1.5.4 Conformal Predictive Distributions
  1.6 Beyond Randomness
    1.6.1 Testing Randomness
    1.6.2 Online Compression Models
  1.7 Context

Part I Set Prediction

2 Conformal Prediction: General Case and Regression
  2.1 Confidence Predictors
    2.1.1 Assumptions
    2.1.2 Simple Predictors and Confidence Predictors
    2.1.3 Validity
    2.1.4 Randomized Confidence Predictors
    2.1.5 Confidence Predictors Over a Finite Horizon
    2.1.6 One-Off and Offline Confidence Predictors
  2.2 Conformal Predictors
    2.2.1 Bags
    2.2.2 Nonconformity and Conformity
    2.2.3 p-Values
    2.2.4 Definition of Conformal Predictors
    2.2.5 Validity
    2.2.6 Smoothed Conformal Predictors
    2.2.7 Finite-Horizon Conformal Prediction
    2.2.8 One-Off and Offline Conformal Predictors
    2.2.9 General Schemes for Defining Nonconformity
    2.2.10 Deleted Conformity Measures
  2.3 Conformalized Ridge Regression
    2.3.1 Least Squares and Ridge Regression
    2.3.2 Basic CRR
    2.3.3 Two Modifications
    2.3.4 Dual Form Ridge Regression
  2.4 Conformalized Nearest Neighbours Regression
  2.5 Efficiency of Conformalized Ridge Regression
    2.5.1 Hard and Soft Models
    2.5.2 Bayesian Ridge Regression
    2.5.3 Efficiency of CRR
  2.6 Are There Other Ways to Achieve Validity?
  2.7 Conformal Transducers
    2.7.1 Definitions and Properties of Validity
    2.7.2 Normalized Confidence Predictors and Confidence Transducers
  2.8 Proofs
    2.8.1 Proof of Theorem 2.2
    2.8.2 Proof of Theorem 2.7
    2.8.3 Proof of Theorem 2.10
  2.9 Context
    2.9.1 Exchangeability vs Randomness
    2.9.2 Conformal Prediction
    2.9.3 Two Equivalent Definitions of Nonconformity Measures
    2.9.4 The Two Meanings of Conformity in Conformal Prediction
    2.9.5 Examples of Nonconformity Measures
    2.9.6 Kernel Methods
    2.9.7 Burnaev–Wasserman Programme
    2.9.8 Completeness Results

3 Conformal Prediction: Classification and General Case
  3.1 Criteria of Efficiency for Conformal Prediction
    3.1.1 Basic Criteria
    3.1.2 Other Prior Criteria
    3.1.3 Observed Criteria
    3.1.4 Idealised Setting
    3.1.5 Conditionally Proper Criteria of Efficiency
    3.1.6 Criteria of Efficiency that Are not Conditionally Proper
    3.1.7 Discussion
  3.2 More Ways of Computing Nonconformity Scores
    3.2.1 Nonconformity Scores from Nearest Neighbours
    3.2.2 Nonconformity Scores from Support Vector Machines
    3.2.3 Reducing Classification Problems to the Binary Case
  3.3 Weak Teachers
    3.3.1 Imperfectly Taught Predictors
    3.3.2 Weak Validity
    3.3.3 Strong Validity
    3.3.4 Iterated Logarithm Validity
    3.3.5 Efficiency
  3.4 Proofs
    3.4.1 Proofs for Sect. 3.1
    3.4.2 Proofs for Sect. 3.3
  3.5 Context
    3.5.1 Criteria of Efficiency
    3.5.2 Examples of Nonconformity Measures
    3.5.3 Universal Predictors
    3.5.4 Weak Teachers

4 Modifications of Conformal Predictors
  4.1 The Topics of This Chapter
  4.2 Inductive Conformal Predictors
    4.2.1 Inductive Conformal Predictors in the Online Mode
    4.2.2 Inductive Conformal Predictors in the Offline and Semi-Online Modes
    4.2.3 The General Scheme for Defining Nonconformity
    4.2.4 Normalization and Hyper-Parameter Selection
  4.3 Further Ways of Computing Nonconformity Scores
    4.3.1 Nonconformity Measures Considered Earlier
    4.3.2 De-Bayesing
    4.3.3 Neural Networks and Other Multiclass Scoring Classifiers
    4.3.4 Decision Trees and Random Forests
    4.3.5 Binary Scoring Classifiers
    4.3.6 Logistic Regression
    4.3.7 Regression and Bootstrap
    4.3.8 Training Inductive Conformal Predictors
  4.4 Cross-Conformal Prediction
    4.4.1 Definition of Cross-Conformal Predictors
    4.4.2 Computational Efficiency
    4.4.3 Validity and Lack Thereof for Cross-Conformal Predictors
  4.5 Transductive Conformal Predictors
    4.5.1 Definition
    4.5.2 Validity
  4.6 Conditional Conformal Predictors
    4.6.1 One-Off Conditional Conformal Predictors
    4.6.2 Mondrian Conformal Predictors and Transducers
    4.6.3 Using Mondrian Conformal Transducers for Prediction
    4.6.4 Generality of Mondrian Taxonomies
    4.6.5 Conformal Prediction
    4.6.6 Inductive Conformal Prediction
    4.6.7 Label-Conditional Conformal Prediction
    4.6.8 Object-Conditional Conformal Prediction
  4.7 Training-Conditional Validity
    4.7.1 Conditional Validity
    4.7.2 Training-Conditional Validity of Inductive Conformal Predictors
  4.8 Context
    4.8.1 Computationally Efficient Hedged Prediction
    4.8.2 Specific Learning Algorithms and Nonconformity Measures
    4.8.3 Training Conformal Predictors
    4.8.4 Cross-Conformal Predictors and Alternative Approaches
    4.8.5 Transductive Conformal Predictors
    4.8.6 Conditional Conformal Predictors

Part II Probabilistic Prediction

5 Impossibility Results
  5.1 Introduction
  5.2 Diverse Datasets
  5.3 Impossibility of Estimation of Probabilities
    5.3.1 Binary Case
    5.3.2 Multiclass Case
  5.4 Proof of Theorem 5.2
    5.4.1 Probability Estimators and Statistical Tests
    5.4.2 Complete Statistical Tests
    5.4.3 Restatement of the Theorem in Terms of Statistical Tests
    5.4.4 The Proof of the Theorem
  5.5 Context
    5.5.1 More Advanced Results
    5.5.2 Density Estimation, Regression Estimation, and Regression with Deterministic Objects
    5.5.3 Universal Probabilistic Predictors
    5.5.4 Algorithmic Randomness Perspective

6 Probabilistic Classification: Venn Predictors
  6.1 Introduction
  6.2 Venn Predictors
    6.2.1 Validity of One-Off Venn Predictors
    6.2.2 Are There Other Ways to Achieve Perfect Calibration?
    6.2.3 Venn Prediction with Binary Labels and No Objects
  6.3 A Universal Venn Predictor
  6.4 Venn–Abers Predictors
    6.4.1 Full Venn–Abers Predictors
    6.4.2 Inductive Venn–Abers Predictors
    6.4.3 Probabilistic Predictors Derived from Venn Predictors
    6.4.4 Cross Venn–Abers Predictors
    6.4.5 Merging Multiprobability Predictions into a Probabilistic Prediction
  6.5 Proofs
    6.5.1 Proof of Theorem 6.4
    6.5.2 PAVA and the Proof of Lemma 6.6
    6.5.3 Proof of Proposition 6.7
  6.6 Context
    6.6.1 Risk and Uncertainty
    6.6.2 John Venn, Frequentist Probability, and the Problem of the Reference Class
    6.6.3 Online Venn Predictors Are Calibrated
    6.6.4 Isotonic Regression

7 Probabilistic Regression: Conformal Predictive Systems
  7.1 Introduction
  7.2 Conformal Predictive Systems
    7.2.1 Basic Definitions
    7.2.2 Properties of Validity
    7.2.3 Simplest Example: Monotonic Conformity Measures
    7.2.4 Criterion of Being a CPS
  7.3 Least Squares Prediction Machine
    7.3.1 Three Kinds of LSPM
    7.3.2 The Studentized LSPM in an Explicit Form
    7.3.3 The Offline Version of the Studentized LSPM
    7.3.4 The Ordinary LSPM
    7.3.5 Asymptotic Efficiency of the LSPM
    7.3.6 Illustrations
  7.4 Kernel Ridge Regression Prediction Machine
    7.4.1 Explicit Forms of the KRRPM
    7.4.2 Limitation of the KRRPM
  7.5 Nearest Neighbours Prediction Machine
  7.6 Universal Conformal Predictive Systems
    7.6.1 Definitions
    7.6.2 Universal Conformal Predictive Systems
    7.6.3 Universal Deterministic Predictive Systems
  7.7 Applications to Decision Making
    7.7.1 A Standard Problem of Decision Making
    7.7.2 Examples
    7.7.3 Asymptotically Efficient Decision Making
    7.7.4 Dangers of Overfitting
  7.8 Computationally Efficient Versions
    7.8.1 Inductive Conformal Predictive Systems
    7.8.2 Cross-Conformal Predictive Distributions
    7.8.3 Practical Aspects
    7.8.4 Beyond Randomness
  7.9 Proofs and Calculations
    7.9.1 Proofs for Sect. 7.2
    7.9.2 Proofs for Sect. 7.3
    7.9.3 Proof of Theorem 7.16
    7.9.4 Proofs for Sect. 7.8
  7.10 Context
    7.10.1 Conformal Predictive Distributions
    7.10.2 Conformal Predictive Distributions with Kernels
    7.10.3 Venn Prediction for Probabilistic Regression
    7.10.4 Universal Consistency and Universality
    7.10.5 Various Notions of Convergence in Law
    7.10.6 Decision Theory
    7.10.7 Postprocessing of Predictive Distributions

Part III Testing Randomness

8 Testing Exchangeability
  8.1 Testing Exchangeability
    8.1.1 Exchangeability Supermartingales
    8.1.2 Conformal Test Martingales
  8.2 Testing for Concept and Label Shift in Anticausal Classification
