ENVIRONMENTAL
DATA ANALYSIS WITH
MATLAB® OR PYTHON
Principles, Applications, and
Prospects
Third Edition
WILLIAM MENKE
Professor of Earth and Environmental Sciences, Lamont-Doherty Earth
Observatory of Columbia University, Palisades, NY, USA
Academic Press is an imprint of Elsevier
125 London Wall, London EC2Y 5AS, United Kingdom
525 B Street, Suite 1650, San Diego, CA 92101, United States
50 Hampshire Street, 5th Floor, Cambridge, MA 02139, United States
The Boulevard, Langford Lane, Kidlington, Oxford OX5 1GB, United Kingdom
Copyright © 2022 Elsevier Inc. All rights reserved.
No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical,
including photocopying, recording, or any information storage and retrieval system, without permission in writing
from the publisher. Details on how to seek permission, further information about the Publisher’s permissions policies
and our arrangements with organizations such as the Copyright Clearance Center and the Copyright Licensing Agency,
can be found at our website: www.elsevier.com/permissions.
This book and the individual contributions contained in it are protected under copyright by the Publisher (other than as
may be noted herein).
Notices
Knowledge and best practice in this field are constantly changing. As new research and experience broaden our
understanding, changes in research methods, professional practices, or medical treatment may become necessary.
Practitioners and researchers must always rely on their own experience and knowledge in evaluating and using any
information, methods, compounds, or experiments described herein. In using such information or methods they
should be mindful of their own safety and the safety of others, including parties for whom they have a professional
responsibility.
To the fullest extent of the law, neither the Publisher nor the authors, contributors, or editors, assume any liability
for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from
any use or operation of any methods, products, instructions, or ideas contained in the material herein.
MATLAB® is a trademark of The MathWorks Inc. and is used with permission. The MathWorks does not warrant
the accuracy of the text or exercises in this book. This book’s use or discussion of MATLAB® software or related
products does not constitute endorsement or sponsorship by The MathWorks of a particular pedagogical approach
or particular use of the MATLAB® software.
ISBN: 978-0-323-95576-8
For information on all Academic Press publications
visit our website at https://www.elsevier.com/books-and-journals
Publisher: Candice Janco
Acquisitions Editor: Jessica Mack
Editorial Project Manager: Rupinder Heron
Production Project Manager: Kumar Anbazhagan
Cover Designer: Mark Rogers
Typeset by STRAIVE, India
Contents
Preface
Acknowledgment
Advice on scripting for beginners
1. Data analysis with MATLAB® or Python
Part A: MATLAB®
1A.1 Getting started
1A.2 Navigating folders
1A.3 Simple arithmetic and algebra
1A.4 Vectors and matrices
1A.5 Addition, subtraction and multiplication of column-vectors and matrices
1A.6 Element access
1A.7 Representing functions
1A.8 To loop or not to loop
1A.9 The matrix inverse
1A.10 Loading data from a file
1A.11 Plotting data
1A.12 Saving data to a file
1A.13 Some advice on writing scripts
Part B: Python
1B.1 Installation
1B.2 The first cell in the edapy_01 Jupyter Notebook
1B.3 A very simple Python script
1B.4 Navigating folders
1B.5 Simple arithmetic and algebra
1B.6 List-like data
1B.7 Creating column-vectors and matrices
1B.8 Addition, subtraction and multiplication of column-vectors and matrices
1B.9 Element access
1B.10 Representing functions
1B.11 To loop or not to loop
1B.12 The matrix inverse
1B.13 Loading data from a file
1B.14 Plotting data
1B.15 Saving data to a file
1B.16 Some advice on writing scripts
Problems
2. Systematic explorations of a new dataset
2.1 Case study: Black rock forest temperature data
2.2 More on graphics
2.3 Case study: Neuse River hydrograph used to explore rate data
2.4 Case study, Atlantic Rock dataset used to explore scatter plots
2.5 More on character strings
Problems
3. Modeling observational noise with random variables
3.1 Random variables
3.2 Mean, median, and mode
3.3 Variance
3.4 Two important probability density functions
3.5 Functions of a random variable
3.6 Joint probabilities
3.7 Bayesian inference
3.8 Joint probability density functions
3.9 Covariance
3.10 The multivariate normal p.d.f.
3.11 Linear functions of multivariate data
Problems
Reference
4. Linear models as the foundation of data analysis
4.1 Quantitative models, data, and model parameters
4.2 The simplest of quantitative models
4.3 Curve fitting
4.4 Mixtures
4.5 Weighted averages
4.6 Examining error
4.7 Least squares
4.8 Examples
4.9 Covariance and the behavior of error
Problems
5. Least squares with prior information
5.1 When least square fails
5.2 Prior information
5.3 Bayesian inference
5.4 The product of Normal probability density functions
5.5 Generalized least squares
5.6 The role of the covariance of the data
5.7 Smoothness as prior information
5.8 Sparse matrices
5.9 Reorganizing grids of model parameters
Problems
References
6. Detecting periodicities with Fourier analysis
6.1 Describing sinusoidal oscillations
6.2 Models composed only of sinusoidal functions
6.3 Going complex
6.4 Lessons learned from the integral transform
6.5 Normal curve
6.6 Spikes
6.7 Area under a function
6.8 Time-delayed function
6.9 Derivative of a function
6.10 Integral of a function
6.11 Convolution
6.12 Nontransient signals
Problems
Further reading
7. Modeling time-dependent behavior with filters
7.1 Behavior sensitive to past conditions
7.2 Filtering as convolution
7.3 Solving problems with filters
7.4 Case study: Empirically-derived filter for Hudson River discharge
7.5 Predicting the future
7.6 A parallel between filters and polynomials
7.7 Filter cascades and inverse filters
7.8 Making use of what you know
Problems
Reference
8. Undirected data analysis using factors, empirical orthogonal functions, and clusters
8.1 Samples as mixtures
8.2 Determining the minimum number of factors
8.3 Application to the Atlantic Rocks dataset
8.4 Spiky factors
8.5 Weighting of elements
8.6 Q-mode factor analysis
8.7 Factors and factor loadings with natural orderings
8.8 Case study: EOF’s of the equatorial Pacific Ocean Sea surface temperature dataset
8.9 Clusters of data
8.10 K-mean cluster analysis
8.11 Clustering the Atlantic Rocks dataset
Problems
References
9. Detecting and understanding correlations among data
9.1 Correlation is covariance
9.2 Computing autocorrelation by hand
9.3 Relationship to convolution and power spectral density
9.4 Cross-correlation
9.5 Using the cross-correlation to align time series
9.6 Least squares estimation of filters
9.7 The effect of smoothing on time series
9.8 Band-pass filters
9.9 Case study: Coherence of the Reynolds Channel water quality data
9.10 Windowing before computing Fourier transforms
9.11 Optimal window functions
Problems
References
10. Interpolation, Gaussian process regression, and kriging
10.1 Interpolation requires prior information
10.2 Linear interpolation
10.3 Cubic interpolation
10.4 Gaussian process regression and kriging
10.5 Example of Gaussian process regression
10.6 Tuning of parameters in Gaussian process regression
10.7 Example of tuning
10.8 Interpolation in two-dimensions
10.9 Fourier transforms in two dimensions
10.10 Using Fourier transforms to fill in missing data
Problems
References
11. Approximate methods, including linearization and artificial neural networks
11.1 The value of simplicity
11.2 Polynomial approximations and Taylor series
11.3 Small number approximations
11.4 Small number approximation applied to distance on a sphere
11.5 Small number approximation applied to variance
11.6 Taylor series in multiple dimensions
11.7 Small number approximation applied to covariance
11.8 Solving nonlinear problems with iterative least squares
11.9 Fitting a sinusoid of unknown frequency
11.10 The gradient descent method
11.11 Precomputation of a function and table lookups
11.12 Artificial neural networks
11.13 Information flow in a neural net
11.14 Training a neural net
11.15 Neural net for a nonlinear filter
Problems
References
12. Assessing the significance of results
12.1 Rejecting the Null Hypothesis
12.2 The distribution of the total error
12.3 Four important probability density functions
12.4 Case study with common hypothesis testing scenarios
12.5 Chi-squared test for generalized least squares
12.6 Testing improvement in fit
12.7 Testing the significance of a spectral peak
12.8 Significance of a spectral peak for a correlated time series
12.9 Bootstrap confidence intervals
Problems
13. Notes
13.1 Note 1.1 On the persistence of variables
13.2 Note 2.1 On time
13.3 Note 2.2 On reading complicated text files
13.4 Note 3.1 On the rule for error propagation
13.5 Note 3.2 On the eda_draw() function
13.6 Note 4.1 On complex least squares
13.7 Note 5.1 On the derivation of generalized least squares