
Machine Learning: A Bayesian and Optimization Perspective PDF

1072 Pages · 2015 · 33.65 MB · English
Most books are stored in the elastic cloud, where traffic is expensive. For this reason, we enforce a daily download limit.

Preview Machine Learning: A Bayesian and Optimization Perspective

Machine Learning: A Bayesian and Optimization Perspective
Sergios Theodoridis

AMSTERDAM • BOSTON • HEIDELBERG • LONDON • NEW YORK • OXFORD • PARIS • SAN DIEGO • SAN FRANCISCO • SINGAPORE • SYDNEY • TOKYO

Academic Press is an imprint of Elsevier
125 London Wall, London, EC2Y 5AS, UK
525 B Street, Suite 1800, San Diego, CA 92101-4495, USA
225 Wyman Street, Waltham, MA 02451, USA
The Boulevard, Langford Lane, Kidlington, Oxford OX5 1GB, UK

Copyright © 2015 Elsevier Ltd. All rights reserved.

No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system, without permission in writing from the publisher. Details on how to seek permission, further information about the Publisher's permissions policies and our arrangements with organizations such as the Copyright Clearance Center and the Copyright Licensing Agency, can be found at our website: www.elsevier.com/permissions.

This book and the individual contributions contained in it are protected under copyright by the Publisher (other than as may be noted herein).

Notices
Knowledge and best practice in this field are constantly changing. As new research and experience broaden our understanding, changes in research methods, professional practices, or medical treatment may become necessary.

Practitioners and researchers must always rely on their own experience and knowledge in evaluating and using any information, methods, compounds, or experiments described herein. In using such information or methods they should be mindful of their own safety and the safety of others, including parties for whom they have a professional responsibility.

To the fullest extent of the law, neither the Publisher nor the authors, contributors, or editors, assume any liability for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions, or ideas contained in the material herein.

British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library

Library of Congress Cataloging-in-Publication Data
A catalog record for this book is available from the Library of Congress

ISBN: 978-0-12-801522-3

For information on all Academic Press publications visit our website at http://store.elsevier.com/

Publisher: Jonathan Simpson
Acquisition Editor: Tim Pitts
Editorial Project Manager: Charlie Kent
Production Project Manager: Susan Li
Designer: Greg Harris

Typeset by SPi Global, India
Printed and bound in The United States

Contents

Preface xvii
Acknowledgments xix
Notation xxi

CHAPTER 1 Introduction 1
1.1 What Machine Learning is About 1
1.1.1 Classification 2
1.1.2 Regression 3
1.2 Structure and a Road Map of the Book 5
References 8

CHAPTER 2 Probability and Stochastic Processes 9
2.1 Introduction 10
2.2 Probability and Random Variables 10
2.2.1 Probability 11
2.2.2 Discrete Random Variables 12
2.2.3 Continuous Random Variables 14
2.2.4 Mean and Variance 15
2.2.5 Transformation of Random Variables 17
2.3 Examples of Distributions 18
2.3.1 Discrete Variables 18
2.3.2 Continuous Variables 20
2.4 Stochastic Processes 29
2.4.1 First and Second Order Statistics 30
2.4.2 Stationarity and Ergodicity 30
2.4.3 Power Spectral Density 33
2.4.4 Autoregressive Models 38
2.5 Information Theory 41
2.5.1 Discrete Random Variables 42
2.5.2 Continuous Random Variables 45
2.6 Stochastic Convergence 48
Problems 49
References 51

CHAPTER 3 Learning in Parametric Modeling: Basic Concepts and Directions 53
3.1 Introduction 53
3.2 Parameter Estimation: The Deterministic Point of View 54
3.3 Linear Regression 57
3.4 Classification 60
3.5 Biased Versus Unbiased Estimation 64
3.5.1 Biased or Unbiased Estimation? 65
3.6 The Cramér-Rao Lower Bound 67
3.7 Sufficient Statistic 70
3.8 Regularization 72
3.9 The Bias-Variance Dilemma 77
3.9.1 Mean-Square Error Estimation 77
3.9.2 Bias-Variance Tradeoff 78
3.10 Maximum Likelihood Method 82
3.10.1 Linear Regression: The Nonwhite Gaussian Noise Case 84
3.11 Bayesian Inference 84
3.11.1 The Maximum a Posteriori Probability Estimation Method 88
3.12 Curse of Dimensionality 89
3.13 Validation 91
3.14 Expected and Empirical Loss Functions 93
3.15 Nonparametric Modeling and Estimation 95
Problems 97
References 102

CHAPTER 4 Mean-Square Error Linear Estimation 105
4.1 Introduction 105
4.2 Mean-Square Error Linear Estimation: The Normal Equations 106
4.2.1 The Cost Function Surface 107
4.3 A Geometric Viewpoint: Orthogonality Condition 109
4.4 Extension to Complex-Valued Variables 111
4.4.1 Widely Linear Complex-Valued Estimation 113
4.4.2 Optimizing with Respect to Complex-Valued Variables: Wirtinger Calculus 116
4.5 Linear Filtering 118
4.6 MSE Linear Filtering: A Frequency Domain Point of View 120
4.7 Some Typical Applications 124
4.7.1 Interference Cancellation 124
4.7.2 System Identification 125
4.7.3 Deconvolution: Channel Equalization 126
4.8 Algorithmic Aspects: The Levinson and the Lattice-Ladder Algorithms 132
4.8.1 The Lattice-Ladder Scheme 137
4.9 Mean-Square Error Estimation of Linear Models 140
4.9.1 The Gauss-Markov Theorem 143
4.9.2 Constrained Linear Estimation: The Beamforming Case 145
4.10 Time-Varying Statistics: Kalman Filtering 148
Problems 154
References 158

CHAPTER 5 Stochastic Gradient Descent: The LMS Algorithm and its Family 161
5.1 Introduction 162
5.2 The Steepest Descent Method 163
5.3 Application to the Mean-Square Error Cost Function 167
5.3.1 The Complex-Valued Case 175
5.4 Stochastic Approximation 177
5.5 The Least-Mean-Squares Adaptive Algorithm 179
5.5.1 Convergence and Steady-State Performance of the LMS in Stationary Environments 181
5.5.2 Cumulative Loss Bounds 186
5.6 The Affine Projection Algorithm 188
5.6.1 The Normalized LMS 193
5.7 The Complex-Valued Case 194
5.8 Relatives of the LMS 196
5.9 Simulation Examples 199
5.10 Adaptive Decision Feedback Equalization 202
5.11 The Linearly Constrained LMS 204
5.12 Tracking Performance of the LMS in Nonstationary Environments 206
5.13 Distributed Learning: The Distributed LMS 208
5.13.1 Cooperation Strategies 209
5.13.2 The Diffusion LMS 211
5.13.3 Convergence and Steady-State Performance: Some Highlights 218
5.13.4 Consensus-Based Distributed Schemes 220
5.14 A Case Study: Target Localization 222
5.15 Some Concluding Remarks: Consensus Matrix 223
Problems 224
References 227

CHAPTER 6 The Least-Squares Family 233
6.1 Introduction 234
6.2 Least-Squares Linear Regression: A Geometric Perspective 234
6.3 Statistical Properties of the LS Estimator 236
6.4 Orthogonalizing the Column Space of X: The SVD Method 239
6.5 Ridge Regression 243
6.6 The Recursive Least-Squares Algorithm 245
6.7 Newton's Iterative Minimization Method 248
6.7.1 RLS and Newton's Method 251
6.8 Steady-State Performance of the RLS 252
6.9 Complex-Valued Data: The Widely Linear RLS 254
6.10 Computational Aspects of the LS Solution 255
6.11 The Coordinate and Cyclic Coordinate Descent Methods 258
6.12 Simulation Examples 259
6.13 Total-Least-Squares 261
Problems 268
References 272

CHAPTER 7 Classification: A Tour of the Classics 275
7.1 Introduction 275
7.2 Bayesian Classification 276
7.2.1 Average Risk 278
7.3 Decision (Hyper)Surfaces 280
7.3.1 The Gaussian Distribution Case 282
7.4 The Naive Bayes Classifier 287
7.5 The Nearest Neighbor Rule 288
7.6 Logistic Regression 290
7.7 Fisher's Linear Discriminant 294
7.8 Classification Trees 300
7.9 Combining Classifiers 304
7.10 The Boosting Approach 307
7.11 Boosting Trees 313
7.12 A Case Study: Protein Folding Prediction 314
Problems 318
References 323

CHAPTER 8 Parameter Learning: A Convex Analytic Path 327
8.1 Introduction 328
8.2 Convex Sets and Functions 329
8.2.1 Convex Sets 329
8.2.2 Convex Functions 330
8.3 Projections onto Convex Sets 333
8.3.1 Properties of Projections 337
8.4 Fundamental Theorem of Projections onto Convex Sets 341
8.5 A Parallel Version of POCS 344
8.6 From Convex Sets to Parameter Estimation and Machine Learning 345
8.6.1 Regression 345
8.6.2 Classification 347
8.7 Infinite Many Closed Convex Sets: The Online Learning Case 349
8.7.1 Convergence of APSM 351
8.8 Constrained Learning 356
8.9 The Distributed APSM 357
8.10 Optimizing Nonsmooth Convex Cost Functions 358
8.10.1 Subgradients and Subdifferentials 359
8.10.2 Minimizing Nonsmooth Continuous Convex Loss Functions: The Batch Learning Case 362
8.10.3 Online Learning for Convex Optimization 367
8.11 Regret Analysis 370
8.12 Online Learning and Big Data Applications: A Discussion 374
8.13 Proximal Operators 379
8.13.1 Properties of the Proximal Operator 382
8.13.2 Proximal Minimization 383
8.14 Proximal Splitting Methods for Optimization 385
Problems 389
8.15 Appendix to Chapter 8 393
References 398

CHAPTER 9 Sparsity-Aware Learning: Concepts and Theoretical Foundations 403
9.1 Introduction 403
9.2 Searching for a Norm 404
9.3 The Least Absolute Shrinkage and Selection Operator (LASSO) 407
9.4 Sparse Signal Representation 411
9.5 In Search of the Sparsest Solution 415
9.6 Uniqueness of the ℓ0 Minimizer 422
9.6.1 Mutual Coherence 424
9.7 Equivalence of ℓ0 and ℓ1 Minimizers: Sufficiency Conditions 426
9.7.1 Condition Implied by the Mutual Coherence Number 426
9.7.2 The Restricted Isometry Property (RIP) 427
9.8 Robust Sparse Signal Recovery from Noisy Measurements 429
9.9 Compressed Sensing: The Glory of Randomness 430
9.9.1 Dimensionality Reduction and Stable Embeddings 433
9.9.2 Sub-Nyquist Sampling: Analog-to-Information Conversion 434
9.10 A Case Study: Image De-Noising 438
Problems 440
References 444

CHAPTER 10 Sparsity-Aware Learning: Algorithms and Applications 449
10.1 Introduction 450
10.2 Sparsity-Promoting Algorithms 450

Description:
This tutorial text gives a unifying perspective on machine learning by covering both probabilistic and deterministic approaches (which are based on optimization techniques) together with the Bayesian inference approach, whose essence lies in the use of a hierarchy of probabilistic models. The b

