Financial Analytics with R. Building a Laptop Laboratory for Data Science PDF

386 Pages·2016·15.011 MB·english

Preview Financial Analytics with R. Building a Laptop Laboratory for Data Science

Financial Analytics with R
Building a Laptop Laboratory for Data Science

MARK J. BENNETT
University of Chicago

DIRK L. HUGEN
University of Iowa

University Printing House, Cambridge CB2 8BS, United Kingdom
Cambridge University Press is part of the University of Cambridge.
www.cambridge.org
Information on this title: www.cambridge.org/9781107150751
© Mark J. Bennett and Dirk L. Hugen 2016
First published 2016
Printed in the United Kingdom by Clays, St Ives plc
A catalogue record for this publication is available from the British Library.

Library of Congress Cataloging-in-Publication Data
Names: Bennett, Mark J. (Mark Joseph), 1959– author. | Hugen, Dirk L., author.
Title: Financial analytics with R : building a laptop laboratory for data science / Mark J. Bennett, University of Chicago, Dirk L. Hugen, University of Iowa.
Description: Cambridge, UK : Cambridge University Press, 2016.
Identifiers: LCCN 2016026635 | ISBN 9781107150751
Subjects: LCSH: Finance – Mathematical models – Data processing. | Finance – Databases. | R (Computer program language)
Classification: LCC HG104 .B46 2016 | DDC 332.0285/513--dc23
LC record available at https://lccn.loc.gov/2016026635
ISBN 978-1-107-15075-1 Hardback

Contents

Preface
Acknowledgments

1 Analytical Thinking
  1.1 What Is Financial Analytics?
  1.2 What Is the Laptop Laboratory for Data Science?
  1.3 What Is R and How Can It Be Used in the Professional Analytics World?
  1.4 Exercises
2 The R Language for Statistical Computing
  2.1 Getting Started with R
  2.2 Language Features: Functions, Assignment, Arguments, and Types
  2.3 Language Features: Binding and Arrays
  2.4 Error Handling
  2.5 Numeric, Statistical, and Character Functions
  2.6 Data Frames and Input–Output
  2.7 Lists
  2.8 Exercises
3 Financial Statistics
  3.1 Probability
  3.2 Combinatorics
  3.3 Mathematical Expectation
  3.4 Sample Mean, Standard Deviation, and Variance
  3.5 Sample Skewness and Kurtosis
  3.6 Sample Covariance and Correlation
  3.7 Financial Returns
  3.8 Capital Asset Pricing Model
  3.9 Exercises
4 Financial Securities
  4.1 Bond Investments
  4.2 Stock Investments
  4.3 The Housing Crisis
  4.4 The Euro Crisis
  4.5 Securities Datasets and Visualization
  4.6 Adjusting for Stock Splits
  4.7 Adjusting for Mergers
  4.8 Plotting Multiple Series
  4.9 Securities Data Importing
  4.10 Securities Data Cleansing
  4.11 Securities Quoting
  4.12 Exercises
5 Dataset Analytics and Risk Measurement
  5.1 Generating Prices from Log Returns
  5.2 Normal Mixture Models of Price Movements
  5.3 Sudden Currency Price Movement in 2015
  5.4 Exercises
6 Time Series Analysis
  6.1 Examining Time Series
  6.2 Stationary Time Series
  6.3 Auto-Regressive Moving Average Processes
  6.4 Power Transformations
  6.5 The TSA Package
  6.6 Auto-Regressive Integrated Moving Average Processes
  6.7 Case Study: Earnings of Johnson & Johnson
  6.8 Case Study: Monthly Airline Passengers
  6.9 Case Study: Electricity Production
  6.10 Generalized Auto-Regressive Conditional Heteroskedasticity
  6.11 Case Study: Volatility of Google Stock Returns
  6.12 Exercises
7 The Sharpe Ratio
  7.1 Sharpe Ratio Formula
  7.2 Time Periods and Annualizing
  7.3 Ranking Investment Candidates
  7.4 The Quantmod Package
  7.5 Measuring Income Statement Growth
  7.6 Sharpe Ratios for Income Statement Growth
  7.7 Exercises
8 Markowitz Mean-Variance Optimization
  8.1 Optimal Portfolio of Two Risky Assets
  8.2 Quadratic Programming
  8.3 Data Mining with Portfolio Optimization
  8.4 Constraints, Penalization, and the Lasso
  8.5 Extending to High Dimensions
  8.6 Case Study: Surviving Stocks of the S&P 500 Index from 2003 to 2008
  8.7 Case Study: Thousands of Candidate Stocks from 2008 to 2014
  8.8 Case Study: Exchange-Traded Funds
  8.9 Exercises
9 Cluster Analysis
  9.1 K-Means Clustering
  9.2 Dissecting the K-Means Algorithm
  9.3 Sparsity and Connectedness of Undirected Graphs
  9.4 Covariance and Precision Matrices
  9.5 Visualizing Covariance
  9.6 The Wishart Distribution
  9.7 Glasso: Penalization for Undirected Graphs
  9.8 Running the Glasso Algorithm
  9.9 Tracking a Value Stock through the Years
  9.10 Regression on Yearly Sparsity
  9.11 Regression on Quarterly Sparsity
  9.12 Regression on Monthly Sparsity
  9.13 Architecture and Extension
  9.14 Exercises
10 Gauging the Market Sentiment
  10.1 Markov Regime Switching Model
  10.2 Reading the Market Data
  10.3 Bayesian Reasoning
  10.4 The Beta Distribution
  10.5 Prior and Posterior Distributions
  10.6 Examining Log Returns for Correlation
  10.7 Momentum Graphs
  10.8 Exercises
11 Simulating Trading Strategies
  11.1 Foreign Exchange Markets
  11.2 Chart Analytics
  11.3 Initialization and Finalization
  11.4 Momentum Indicators
  11.5 Bayesian Reasoning within Positions
  11.6 Entries
  11.7 Exits
  11.8 Profitability
  11.9 Short-Term Volatility
  11.10 The State Machine
  11.11 Simulation Summary
  11.12 Exercises
12 Data Exploration Using Fundamentals
  12.1 The RSQLite Package
  12.2 Finding Market-to-Book Ratios
  12.3 The Reshape2 Package
  12.4 Case Study: Google
  12.5 Case Study: Walmart
  12.6 Value Investing
  12.7 Lab: Trying to Beat the Market
  12.8 Lab: Financial Strength
  12.9 Exercises
13 Prediction Using Fundamentals
  13.1 Best Income Statement Portfolio
  13.2 Reformatting Income Statement Growth Figures
  13.3 Obtaining Price Statistics
  13.4 Combining the Income Statement with Price Statistics
  13.5 Prediction Using Classification Trees and Recursive Partitioning
  13.6 Comparing Prediction Rates among Classifiers
  13.7 Exercises
14 Binomial Model for Options
  14.1 Applying Computational Finance
  14.2 Risk-Neutral Pricing and No Arbitrage
  14.3 High Risk-Free Rate Environment
  14.4 Convergence of Binomial Model for Option Data
  14.5 Put–Call Parity
  14.6 From Binomial to Log-Normal
  14.7 Exercises
15 Black–Scholes Model and Option-Implied Volatility
  15.1 Geometric Brownian Motion
  15.2 Monte Carlo Simulation of Geometric Brownian Motion
  15.3 Black–Scholes Derivation
  15.4 Algorithm for Implied Volatility
  15.5 Implementation of Implied Volatility
  15.6 The Rcpp Package
  15.7 Exercises
Appendix Probability Distributions and Statistical Analysis
  A.1 Distributions
  A.2 Bernoulli Distribution
  A.3 Binomial Distribution
  A.4 Geometric Distribution
  A.5 Poisson Distribution
  A.6 Functions for Continuous Distributions
  A.7 The Uniform Distribution
  A.8 Exponential Distribution
  A.9 Normal Distribution
  A.10 Log-Normal Distribution
  A.11 The tν Distribution
  A.12 Multivariate Normal Distribution
  A.13 Gamma Distribution
  A.14 Estimation via Maximum Likelihood
  A.15 Central Limit Theorem
  A.16 Confidence Intervals
  A.17 Hypothesis Testing
  A.18 Regression
  A.19 Model Selection Criteria
  A.20 Required Packages
References
Index

Preface

In 1994 the Channel Tunnel opened between England and France, allowing high-speed Eurostar trains to whisk passengers from the continent to the United Kingdom and back on a grand scale. What an amazing engineering feat it was for the time (beyond many people’s earlier imaginations), yet we take it for granted today. In 1994, Grumman Aerospace Corporation, the chief contractor on the Apollo Lunar Module, was acquired by Northrop Corporation to form the new aerospace giant, Northrop Grumman. It was the prime contractor of the newly deployed advanced technology B-2 Stealth Bomber.
On a much more mundane and personal scale, also in 1994, in a townhouse just outside the City of Chicago, I was performing a tedious daily exercise: looking up daily closing prices each evening in a stack of Investor’s Business Daily newspapers for the two stock investments that were about to be purchased. This was not only to find out their running rate of return but also to find out their historical volatility relative to other stocks before entering into the positions. Doing this manual calculation was slow and tedious. The World Wide Web was introduced in the form of the Mosaic browser the next year. It was not long before Yahoo! was posting stock quotes and historical price charts, as well as technical indicators on the charts, available on demand for free in just a few seconds via the new web browsers.

The advent of spreadsheet software took analysts to a new level of analytical thinking. No longer were live, human-operated calculations limited to a single dimension. Each row or column could present a time dimension, a production category, a business scenario. And the automated dependency feature made revisions quite easy. Now spreadsheets can be used for a prototype for a more sophisticated and permanent analytical product: the large-scale, analytical computer program.

With modern programming languages like R and Python, a skilled analyst can now design their analytic logic with significantly less effort than before, using resources such as Yahoo! or other free services for historical quotes. It has been said that Python’s terse syntax allows for programs with the same functionality as their Java equivalents, yet four times smaller, and we suspect that R is similar. A small financial laboratory can be built on a laptop costing less than $200 in a matter of weeks, simulating multiple market variables as required. Or, by obtaining a higher-end laptop model with more drive space, the entire market can be loaded with 10 to 20 years’ worth of historical data on a scale never before possible.

Once that laboratory is built, one can start to gain insights. Knowledge discovery was once a term for a human process. Now we’re talking about computer automation.
Knowledge discovery seems like a bold term, a little too ambitious for anything a computer program could create. For example, the computer science professional society, the Association for Computing Machinery (ACM), has a special interest group called Knowledge Discovery and Data Mining (KDD). Hardly anyone would challenge the “data mining” part of that. After all, statisticians and computer scientists having at it with data is what they do. But discovering knowledge with a machine? Really? Automatically? Now that seems a little too exaggerated to be true. Then again, experiencing firsthand the algorithms that will be described in this book, we soon realized that the programs, using data science techniques, can not only automate very tedious calculations but then very positively yield insights into the human thinking level: insights that would otherwise not be found.

Perhaps one can view the experience with a sports analogy. In many sports we have defense to protect our current position and prevent the opponents from scoring further points. Offense is the ability to piece together athletic feats sequentially to put more points on the scoreboard. The data mining portion of KDD can be thought of as defense: the more disciplined, regimented side of the sport. Single achievements can be effective: thrusting up one’s hand to block a pass, throwing a curve ball to prevent a hitter from connecting on the pitch. On the other hand, knowledge discovery is the offensive skill set, going beyond the required and expected data analysis and into the creative side. On offense, an entire series of events needs to be successful to yield progress: a full-field soccer scoring drive or a series of three successive baseball base hits before three outs to score a run. The likelihood of a success on offense is less.

So in the KDD model and sports analogy, data mining is the defense and knowledge discovery is the offense.
Achieving knowledge discovery is a rare event with amazing impact. The discovery can be as powerful as human-made ideas, and can certainly enhance them. For example, we may discover that there is a publicly traded stock with uniquely desirable properties. The KDD domain touches the limits of what these machines can do with all the advancements in computer science.

In 1968, a Hollywood movie and novel by author Arthur C. Clarke, 2001: A Space Odyssey, predicted automated reasoning, natural language speech recognition, video calls, and face recognition. The HAL 9000 computer controls the flight to Jupiter while conversing and playing chess with astronaut Dr. Frank Poole and monitoring life conditions of over 300 astronauts. Since then, computer science, specifically simulation, has greatly impacted the research and discovery process in many fields and effectively achieved many of these science fiction goals now. Among others, there are fields of computational biology, computational cosmology, and computational linguistics. Images from these fields are shown in Figure 1.

Figure 1 Sample images from computational biology, computational cosmology, and computational linguistics.

Throughout this book we are concerned with computer simulation. Computer simulation has become so successful that it is now widely accepted that, after theory and physical experimentation, it is a third scientific method. As the subtitle says, this book can be used to build a simulation laboratory for finance. The book was developed as study material for the graduate Financial Analytics course in the Graham School at the University of Chicago Master of Science in Analytics program and from the undergraduate Investments course in the Department of Finance in the Tippie College of Business at the University of Iowa.
It is recommended as a graduate textbook when used at a college or university. With the proper mathematical and computer science background, it could also be used at the advanced undergraduate level.

It is best to have taken a course in statistical analysis, probability and statistics, or, ideally, mathematical statistics for the material in the book, but much of the required material is introduced within the main text and the Appendix. It is best to have an undergraduate-level background in calculus, linear algebra, and enough computer science to be familiar with array manipulation with one or more scientifically based programming languages such as C, C++, Java, C#, Python, or Matlab. A finance background is not necessary. Any experience with R is, of course, useful.

Financial computer simulation in the R language can be more intricate and challenging than building a spreadsheet. A quantitative optimizer can be better controlled and tailored when its logic is immediately apparent from the surrounding program code. More computer science knowledge is required by our reader to build more robust and sophisticated platforms, and more goes into the compiler and run-time system behind the scenes. But as the pieces are completed, the builder, or operator, or student of financial analytics begins to realize the benefits of simulation performed in a language designed
