
Simulating Data with SAS PDF

362 Pages·2013·7.851 MB·English

Preview Simulating Data with SAS

The correct bibliographic citation for this manual is as follows: Wicklin, Rick. 2013. Simulating Data with SAS®. Cary, NC: SAS Institute Inc.

Simulating Data with SAS®
Copyright © 2013, SAS Institute Inc., Cary, NC, USA
ISBN 978-1-61290-622-5 (electronic book)
ISBN 978-1-61290-332-3
All rights reserved. Produced in the United States of America.

For a hard-copy book: No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, or otherwise, without the prior written permission of the publisher, SAS Institute Inc.

For a web download or e-book: Your use of this publication shall be governed by the terms established by the vendor at the time you acquire this publication.

The scanning, uploading, and distribution of this book via the Internet or any other means without the permission of the publisher is illegal and punishable by law. Please purchase only authorized electronic editions and do not participate in or encourage electronic piracy of copyrighted materials. Your support of others’ rights is appreciated.

U.S. Government Restricted Rights Notice: Use, duplication, or disclosure of this software and related documentation by the U.S. government is subject to the Agreement with SAS Institute and the restrictions set forth in FAR 52.227-19, Commercial Computer Software-Restricted Rights (June 1987).

SAS Institute Inc., SAS Campus Drive, Cary, North Carolina 27513-2414

1st printing, April 2013
2nd printing, June 2013

SAS provides a complete selection of books and electronic products to help customers use SAS® software to its fullest potential. For more information about our e-books, e-learning products, CDs, and hard-copy books, visit support.sas.com/bookstore or call 1-800-727-3228.

SAS® and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration.
Other brand and product names are registered trademarks or trademarks of their respective companies.

Contents

Acknowledgments

I Essentials of Simulating Data
Chapter 1. Introduction to Simulation
Chapter 2. Simulating Data from Common Univariate Distributions
Chapter 3. Preliminary and Background Information

II Basic Simulation Techniques
Chapter 4. Simulating Data to Estimate Sampling Distributions
Chapter 5. Using Simulation to Evaluate Statistical Techniques
Chapter 6. Strategies for Efficient and Effective Simulation

III Advanced Simulation Techniques
Chapter 7. Advanced Simulation of Univariate Data
Chapter 8. Simulating Data from Basic Multivariate Distributions
Chapter 9. Advanced Simulation of Multivariate Data
Chapter 10. Building Correlation and Covariance Matrices

IV Applications of Simulation in Statistical Modeling
Chapter 11. Simulating Data for Basic Regression Models
Chapter 12. Simulating Data for Advanced Regression Models
Chapter 13. Simulating Data from Times Series Models
Chapter 14. Simulating Data from Spatial Models
Chapter 15. Resampling and Bootstrap Methods
Chapter 16. Moment Matching and the Moment-Ratio Diagram

V Appendix
Appendix A. A SAS/IML Primer

Index

Acknowledgments

I would like to thank Robert Rodriguez and Phil Gibbs for pointing out the need for a book about simulating data in SAS. “Simulation” is a vast topic, and early discussions with them helped me to whittle down the possible topics. Bob and Maura Stokes provided many opportunities for me to develop this material by inviting me to present papers and workshops at conferences. My supervisors at SAS fully supported me as I prepared for and participated in these conferences.

I thank the many SAS users who encouraged me to write a book that emphasizes the practical side of simulation. Discussions with SAS users helped me to determine what topics are of practical importance to statisticians and analysts in business and industry.

I thank my colleagues at SAS from whom I have learned many statistical and programming techniques. Special thanks to Randy Tobias, who always provides sound advice and statistical wisdom for my naive questions. Thanks also to Tim Arnold and Warren Kuhfeld for their ‘saslatex’ documentation system that automatically produced all tables and graphs in this book from the programs that appear in the text.

I thank my editor, John West, and the other employees at SAS Press for their work producing and promoting the book. I thank two reviewers, Clement Stone and Bob Pearson, who provided insightful comments about the book’s content and organization.

Thanks to several colleagues and friends who read and commented on early drafts of this book.
This includes the following individuals: Rob Agnelli, Jason Brinkley, Tonya Chapman, Steve Denham, Bruce Elsheimer, Betsy Enstrom, Phil Gibbs, Emily Lada, Pushpal Mukhopadhyay, Bill Raynor, Robert Rodriguez, Jim Seabolt, Udo Sglavo, Ying So, Jill Tao, Randy Tobias, Ian Wakeling, Donna Watts, and Min Zhu.

Finally, I would like to thank my wife, Nancy, for her constant support, and my parents for instilling in me a love of learning.

Part I
Essentials of Simulating Data

Chapter 1
Introduction to Simulation

Contents
1.1 Overview of Simulation of Data
1.2 The Goal of This Book
1.3 Who Should Read This Book?
1.4 The SAS/IML Language
1.5 Comparing the DATA Step and SAS/IML Language
1.6 Overview of This Book
1.7 Obtaining the Programs Used in This Book
1.8 Specialized Simulation Tools in SAS Software
1.9 References

1.1 Overview of Simulation of Data

There are many kinds of simulation. Climate scientists use simulation to model the interactions between the earth’s atmosphere, oceans, and land. Astrophysicists use simulation to model the evolution of galaxies. Biologists use simulation to model the spread of epidemics and the effects of vaccination programs. Engineers use simulation to study the safety and fuel efficiency of automobile and airplane designs. In these simulations of physical systems, scientists model reality and use a computer to study the model under various conditions.

Statisticians also build models. For example, a simple model of human height might assume that height is normally distributed in the population. This is a useful model, but it turns out that human heights are not actually normally distributed (Schilling, Watkins, and Watkins 2002).
Even if you restrict the data to a single gender, there are more very tall and very short people than would be expected from a normal distribution of heights.

If a set of data is only approximately normal, what does that mean for statistical tests that assume normality? If you compute a t test to compare the means of two groups (a test that assumes that the two underlying populations are normally distributed), how sensitive is your conclusion to the actual population distribution? If the populations are slightly nonnormal, does that invalidate the t test? Or are the results fairly robust to deviations from normality?

One way to answer these questions is to simulate data from nonnormal populations. If you construct a distributional model, then you can generate random samples from the model and examine how the t test performs on the simulated data. Simulation gives you complete control over the characteristics of the population model from which the (simulated) data are drawn.

Simulating data is also useful for comparing two different statistical techniques. Perhaps Technique A performs better on skewed data than Technique B. Perhaps Technique B is more robust to the presence of outliers. To a practicing statistician, this kind of information is quite valuable. As Gentle (2009, p. xi) says, “Learning to simulate data with given characteristics means that one understands those characteristics. Applying statistical methods to simulated data ... helps us better to understand those methods and the principles underlying them.”

This book is about simulating data in SAS software. This book demonstrates how to generate observations from populations that have specified statistical characteristics. In this book, the phrases “simulating data,” “generating a random sample,” and “sampling from a distribution” are used interchangeably.

A large portion of this book is about learning how to construct statistical models (distributions) that have certain statistical properties. Skewed distributions, fat-tailed distributions, bimodal distributions: these are a few examples of models that you can construct by using the techniques in this book.
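As a small illustration of constructing a distribution with a chosen shape, a bimodal population can be sketched as a mixture of two normal components. The sketch below is written in Python rather than SAS and is not a program from the book; the function name `sample_normal_mixture`, the seed, and the weights, means, and standard deviations are all illustrative assumptions.

```python
import random
import statistics

def sample_normal_mixture(n, weights, means, sds, seed=12345):
    """Draw n observations from a finite mixture of normal distributions.

    Hypothetical helper for illustration only; not taken from the book.
    """
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    sample = []
    for _ in range(n):
        # Step 1: choose a component index according to the mixing weights.
        k = rng.choices(range(len(weights)), weights=weights)[0]
        # Step 2: draw one value from the chosen normal component.
        sample.append(rng.gauss(means[k], sds[k]))
    return sample

# Assumed parameters: 60% N(-2, 1) and 40% N(3, 1), which gives two well-separated modes.
weights, means, sds = [0.6, 0.4], [-2.0, 3.0], [1.0, 1.0]
x = sample_normal_mixture(10_000, weights, means, sds)

# The population mean of a mixture is the weighted average of the component means.
theoretical_mean = sum(w * m for w, m in zip(weights, means))
print(f"sample mean = {statistics.mean(x):.3f}, theoretical mean = {theoretical_mean:.3f}")
```

Because the population is specified exactly, the sample mean can be compared with the known mixture mean, which is the kind of check that simulated data makes possible.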
This book also presents techniques for generating data from correlated multivariate distributions. Each technique is accompanied by a SAS program.

Although this book uses statistics, its audience is not limited to statisticians. It is a how-to book for statistical programmers who use SAS software and who want to simulate data efficiently.

In short, this book describes how to write SAS programs that simulate data with a wide range of characteristics, and describes how to use that data to understand the performance and applicability of statistical techniques.

1.2 The Goal of This Book

The goal of this book is to provide tips, techniques, and examples for efficiently simulating data in SAS software.

Data simulation is a fundamental technique in statistical programming and research. To evaluate statistical methods, you often need to create data with known properties, both random and nonrandom.

This book contains more than one hundred annotated programs that simulate data with specified characteristics. You can use simulated data to estimate the probability of an event, to estimate the sampling distribution of a statistic, to estimate the coverage probabilities of confidence intervals, and to evaluate the robustness of a statistical test.

Some programs are presented in two forms, first by using the DATA step and then again by using the SAS/IML language. By presenting the same algorithm in two different ways, the novice SAS/IML programmer can learn how to write simulations in the SAS/IML language. Later chapters that discuss multivariate simulation use the SAS/IML language heavily. If you are serious about simulation, you should invest the time to learn how to use the SAS/IML language efficiently.

Although this book covers many standard examples of data distributions, there are many other examples that are not covered. However, many techniques that are described in this book are generally applicable. For example, Section 7.5 describes how to generate random samples from a mixture of normal distributions. The same technique can be used to simulate data from a mixture of other distributions.

The book also includes more than 100 exercises. Many exercises extend the results of a section to other distributions or to related problems.
The exercises provide practical programming problems that encourage you to master the material before moving on to the next section. Most exercises will
