Data Analysis with Open Source Tools
Philipp K. Janert

Beijing · Cambridge · Farnham · Köln · Sebastopol · Tokyo

Data Analysis with Open Source Tools
by Philipp K. Janert

Copyright © 2011 Philipp K. Janert. All rights reserved. Printed in the United States of America.

Published by O’Reilly Media, Inc. 1005 Gravenstein Highway North, Sebastopol, CA 95472.

O’Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://my.safaribooksonline.com). For more information, contact our corporate/institutional sales department: (800) [email protected].

Editor: Mike Loukides
Production Editor: Sumita Mukherji
Copyeditor: Matt Darnell
Production Services: MPS Limited, a Macmillan Company, and Newgen North America, Inc.
Indexer: Fred Brown
Cover Designer: Karen Montgomery
Interior Designer: Edie Freedman and Ron Bilodeau
Illustrator: Philipp K. Janert

Printing History:
    November 2010: First Edition.

The O’Reilly logo is a registered trademark of O’Reilly Media, Inc. Data Analysis with Open Source Tools, the image of a common kite, and related trade dress are trademarks of O’Reilly Media, Inc.

Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and O’Reilly Media, Inc. was aware of a trademark claim, the designations have been printed in caps or initial caps.

While every precaution has been taken in the preparation of this book, the publisher and author assume no responsibility for errors or omissions, or for damages resulting from the use of the information contained herein.

ISBN: 978-0-596-80235-6
[M] [2011-05-27]

Furious activity is no substitute for understanding.
    -- H. H. Williams

CONTENTS

PREFACE

1 INTRODUCTION
    Data Analysis
    What’s in This Book
    What’s with the Workshops?
    What’s with the Math?
    What You’ll Need
    What’s Missing

PART I  Graphics: Looking at Data

2 A SINGLE VARIABLE: SHAPE AND DISTRIBUTION
    Dot and Jitter Plots
    Histograms and Kernel Density Estimates
    The Cumulative Distribution Function
    Rank-Order Plots and Lift Charts
    Only When Appropriate: Summary Statistics and Box Plots
    Workshop: NumPy
    Further Reading

3 TWO VARIABLES: ESTABLISHING RELATIONSHIPS
    Scatter Plots
    Conquering Noise: Smoothing
    Logarithmic Plots
    Banking
    Linear Regression and All That
    Showing What’s Important
    Graphical Analysis and Presentation Graphics
    Workshop: matplotlib
    Further Reading

4 TIME AS A VARIABLE: TIME-SERIES ANALYSIS
    Examples
    The Task
    Smoothing
    Don’t Overlook the Obvious!
    The Correlation Function
Description: