Practical Smoothing

This is a practical guide to P-splines, a simple, flexible, and powerful tool for smoothing. P-splines combine regression on B-splines with simple, discrete, roughness penalties. They were introduced by the authors in 1996 and have been used in many diverse applications. The regression basis makes it straightforward to handle non-normal data, like in generalized linear models. The authors demonstrate optimal smoothing, using mixed model technology and Bayesian estimation, in addition to classical tools like cross-validation and AIC, covering theory and applications with code in R. Going far beyond simple smoothing, they also show how to use P-splines for regression on signals, varying-coefficient models, quantile and expectile smoothing, and composite links for grouped data. Penalties are the crucial elements of P-splines; with proper modifications they can handle periodic and circular data as well as shape constraints. Combining penalties with tensor products of B-splines extends these attractive properties to multiple dimensions. The appendices offer a systematic comparison to other smoothers.

Paul H. C. Eilers is Professor Emeritus of Genetical Statistics at the Erasmus University Medical Center, Rotterdam, The Netherlands. He received his PhD in biostatistics. His research interests include high-throughput genomic data analysis, chemometrics, smoothing, longitudinal data analysis, survival analysis, and statistical computing. He has published extensively on these subjects.

Brian D. Marx is Professor in the Department of Experimental Statistics at Louisiana State University. He received his PhD in statistics. His main research interests include smoothing, ill-conditioned regression problems, and high-dimensional chemometric applications, and he has numerous publications on these topics. He is currently serving as coordinating editor for the journal Statistical Modelling. He is coauthor of two books and is a Fellow of the American Statistical Association.

Practical Smoothing
The Joys of P-splines

Paul H. C. Eilers
Erasmus University Medical Center

Brian D. Marx
Louisiana State University

University Printing House, Cambridge CB2 8BS, United Kingdom
One Liberty Plaza, 20th Floor, New York, NY 10006, USA
477 Williamstown Road, Port Melbourne, VIC 3207, Australia
314–321, 3rd Floor, Plot 3, Splendor Forum, Jasola District Centre, New Delhi – 110025, India
79 Anson Road, #06–04/06, Singapore 079906

Cambridge University Press is part of the University of Cambridge.

It furthers the University’s mission by disseminating knowledge in the pursuit of education, learning, and research at the highest international levels of excellence.

www.cambridge.org
Information on this title: www.cambridge.org/9781108482950
DOI: 10.1017/9781108610247

© Paul H. C. Eilers and Brian D. Marx 2021

This publication is in copyright. Subject to statutory exception and to the provisions of relevant collective licensing agreements, no reproduction of any part may take place without the written permission of Cambridge University Press.

First published 2021

Printed in the United Kingdom by TJ Books Ltd., Padstow Cornwall

A catalogue record for this publication is available from the British Library.

Library of Congress Cataloging-in-Publication Data
Names: Eilers, Paul H. C., 1948– author. | Marx, Brian D., 1960– author.
Title: Practical smoothing : the joys of P-splines / Paul H. C. Eilers, Erasmus University Medical Center, Brian D. Marx, Louisiana State University.
Description: Cambridge, UK ; New York, NY : Cambridge University Press, 2021. | Includes bibliographical references and index.
Identifiers: LCCN 2020016638 (print) | LCCN 2020016639 (ebook) | ISBN 9781108482950 (hardback) | ISBN 9781108610247 (epub)
Subjects: LCSH: Smoothing (Statistics) | Spline theory.
Classification: LCC QA278 .E397 2021 (print) | LCC QA278 (ebook) | DDC 511/.4223–dc23
LC record available at https://lccn.loc.gov/2020016638
LC ebook record available at https://lccn.loc.gov/2020016639

ISBN 978-1-108-48295-0 Hardback

Cambridge University Press has no responsibility for the persistence or accuracy of URLs for external or third-party internet websites referred to in this publication and does not guarantee that any content on such websites is, or will remain, accurate or appropriate.
To my family (PE)
For Arnold; to Leopold (BDM)

Contents

Preface

1 Introduction

2 Bases, Penalties, and Likelihoods
    2.1 Linear and Polynomial Regression
    2.2 B-splines
    2.3 Penalized Least Squares
    2.4 Interpolation and Extrapolation
    2.5 Derivatives
    2.6 The Effective Dimension
    2.7 Standard Errors
    2.8 Heavy Smoothing and Polynomial Limits
    2.9 P-splines as a Parametric Model
    2.10 Whittaker: P-splines without B-splines
    2.11 Equivalent Kernels
    2.12 Smoothing of a Non-normal Response
        2.12.1 Poisson Smoothing
        2.12.2 Binomial Smoothing
        2.12.3 GLM Effective Dimension and Standard Errors
    2.13 Notes and Details

3 Optimal Smoothing in Action
    3.1 Cross-Validation
    3.2 Akaike’s Information Criterion
    3.3 Density Estimation
    3.4 Mixed Models
    3.5 Bayesian P-splines
    3.6 Dangers of Automatic Smoothing
    3.7 L- and V-curves
    3.8 Transformation of the Independent Variable
    3.9 Notes and Details

4 Multidimensional Smoothing
    4.1 Generalized Additive Models
    4.2 Varying Coefficient Models
    4.3 Tensor Product Models
    4.4 Tensor Product Bases
    4.5 Two-Dimensional Penalties
    4.6 Interpolation and Extrapolation
    4.7 Smoothing on Large Grids
    4.8 Generalized Two-Dimensional Smoothing
    4.9 Optimal Two-Dimensional Smoothing
    4.10 Issues with Isotropic Smoothing
    4.11 Higher Dimensions
    4.12 Nested Bases and PS-ANOVA
    4.13 Notes and Details

5 Smoothing of Scale and Shape
    5.1 Quantile Smoothing
    5.2 Expectile Smoothing
    5.3 Models for Shape and Scale Parameters
    5.4 Baseline Estimation
    5.5 Notes and Details

6 Complex Counts and Composite Links
    6.1 Histograms with Wide Bins
    6.2 Histograms and Scale Transformation
    6.3 Individual Censoring
    6.4 Latent Mixtures
    6.5 Notes and Details

7 Signal Regression
    7.1 A Chemical Calibration Problem
    7.2 Extensions to the Generalized Linear Model
    7.3 Multidimensional Signal Regression
    7.4 Further Extensions
    7.5 Notes and Details

8 Special Subjects
    8.1 The Proper B-spline Basis
    8.2 Harmonic Smoothing
    8.3 Circular Smoothing
    8.4 Signal Separation with Penalties
    8.5 Double Penalties
    8.6 Piecewise Constant Smoothing
    8.7 Shape Constraints
    8.8 Variable and Adaptive Penalties
    8.9 Survival Analysis and Mortality Modeling
    8.10 Notes and Details

Appendix A P-splines for the Impatient
Appendix B P-splines and Competitors
Appendix C Computational Details
Appendix D Array Algorithms
Appendix E Mixed Model Equations
Appendix F Standard Errors in Detail
Appendix G The Website

References

Index