Applied Survival Analysis Using R PDF

234 Pages·2016·3.405 MB·English

Preview Applied Survival Analysis Using R

Use R!

Dirk F. Moore
Applied Survival Analysis Using R

Series Editors: Robert Gentleman, Kurt Hornik, Giovanni Parmigiani

More information about this series at http://www.springer.com/series/6991

Wickham: ggplot2 (2nd ed. 2016)
Luke: A User’s Guide to Network Analysis in R
Monogan: Political Analysis Using R
Cano/M. Moguerza/Prieto Corcoba: Quality Control with R
Schwarzer/Carpenter/Rücker: Meta-Analysis with R
Gondro: Primer to Analysis of Genomic Data Using R
Chapman/Feit: R for Marketing Research and Analytics
Willekens: Multistate Analysis of Life Histories with R
Cortez: Modern Optimization with R
Kolaczyk/Csàrdi: Statistical Analysis of Network Data with R
Swenson/Nathan: Functional and Phylogenetic Ecology in R
Nolan/Temple Lang: XML and Web Technologies for Data Sciences with R
Nagarajan/Scutari/Lèbre: Bayesian Networks in R
van den Boogaart/Tolosana-Delgado: Analyzing Compositional Data with R
Bivand/Pebesma/Gòmez-Rubio: Applied Spatial Data Analysis with R (2nd ed. 2013)
Eddelbuettel: Seamless R and C++ Integration with Rcpp
Knoblauch/Maloney: Modeling Psychophysical Data in R
Lin/Shkedy/Yekutieli/Amaratunga/Bijnens: Modeling Dose-Response Microarray Data in Early Drug Development Experiments Using R
Cano/M. Moguerza/Redchuk: Six Sigma with R
Soetaert/Cash/Mazzia: Solving Differential Equations in R

Dirk F. Moore
Department of Biostatistics
Rutgers School of Public Health
Piscataway, NJ, USA

ISSN 2197-5736
ISSN 2197-5744 (electronic)
ISBN 978-3-319-31243-9
ISBN 978-3-319-31245-3 (eBook)
DOI 10.1007/978-3-319-31245-3
Library of Congress Control Number: 2016940055

© Springer International Publishing Switzerland 2016

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made.

Printed on acid-free paper

This Springer imprint is published by Springer Nature
The registered company is Springer International Publishing AG Switzerland

To Lynne, Molly, and Emily

Preface

This book serves as an introductory guide for students and analysts who need to work with survival time data. The minimum prerequisites are basic applied courses in linear regression and categorical data analysis. Students who also have taken a master’s level course in statistical theory will be well prepared to work through this book, since frequent reference is made to maximum likelihood theory. Students lacking this training may still be able to understand most of the material, provided they have an understanding of the basic concepts of differential and integral calculus. Specifically, students should understand the concept of the limit, and they should know what derivatives and integrals are and be able to evaluate them in some basic cases.

The material for this book has come from two sources. The first source is an introductory class in survival analysis for graduate students in epidemiology and biostatistics at the Rutgers School of Public Health.
Biostatistics students, as one would expect, have a much firmer grasp of more mathematical aspects of statistics than do epidemiology students. Still, I have found that those epidemiology students with strong quantitative backgrounds have been able to understand some mathematical statistical procedures such as score and likelihood ratio tests, provided that they are not expected to symbolically differentiate or integrate complex formulas. In this book I have, when possible, used the numerical capabilities of the R system to substitute for symbolic manipulation. The second source of material is derived from collaborations with physicians and epidemiologists at the Rutgers Cancer Institute of New Jersey and at the Rutgers Robert Wood Johnson Medical School. A number of the data sets in this text are derived from these collaborations. Also, the experience of training statistical analysts to work on these data sets provided additional inspiration for the book.

The first chapter introduces the concepts of survival times and how right censoring occurs and describes several of the data sets that will be used throughout the book. Chapter 2 presents fundamentals of survival theory. This includes hazard, probability density, survival functions, and how they are related. The hazard function is illustrated using both life table data and using some common parametric distributions. The chapter ends with a brief introduction to properties of maximum likelihood estimates using the exponential distribution as an illustration. Chapter 3 discusses the Kaplan-Meier estimate of the survival function and several related concepts such as the median survival and its confidence interval. Also discussed in this chapter are smoothing of the hazard function and how to accommodate left truncation into the Kaplan-Meier estimate.

Chapter 4 discusses the log-rank test for comparing survival distributions and also some modified linear rank tests.
Stratified tests are also discussed, along with an example where stratification can reverse the apparent direction of a treatment effect in a survival example of Simpson’s paradox. In Chapter 5, we present the Cox proportional hazards model and partial likelihood function in the context of comparing two groups of survival data. There we illustrate the Wald, score, and likelihood ratio tests in this basic context. Left-truncated survival data and the partial likelihood are also discussed.

Chapter 6 presents methods for model selection and extends and illustrates the proportional hazards model in situations where there are multiple possible predictor covariates. Chapter 7 presents diagnostic residual plots that are useful for assessing model assumptions. Chapter 8 discusses how to adapt the survival models discussed earlier to allow for time-dependent covariates.

The next few chapters discuss some important special situations. Chapter 9 discusses multiple outcomes, which can occur as clustered survival times or in a competing risks framework, where only the first of multiple outcomes can be observed. Chapter 10 discusses parametric survival models, and Chapter 11 covers the critically important design question of how to determine the power and sample size of a proposed study that has a survival outcome. Finally, Chapter 12 presents some additional topics, including the piecewise exponential distribution, methods for handling interval censoring, and the lasso method for handling survival data with large numbers of predictors. Many of the data sets discussed in the text are available in the accompanying R package “asaur” (for “Applied Survival Analysis Using R”), while others are in other packages. All are freely available for download from the Comprehensive R Archive Network at cran.r-project.org. The R code discussed in the book is available for download at http://www.springer.com/us/book/9783319312439

A key feature of this book is the integration of the R statistical system with the survival analysis material.
Not only do we show the reader how to use R functions to fit survival models and how to interpret the results, but we also use R to illustrate how survival quantities are computed. Typically we use small examples to illustrate in detail how one constructs survival tests, partial likelihood models, and diagnostics and then proceed to more complicated examples. Most of the survival functions will require that the “survival” library be attached using the “library(survival)” statement. The “survival” package is included by default; other packages referred to in the text must be explicitly downloaded and installed. The appendix includes both some basics of the R language and special features relevant to the survival calculations used elsewhere in the book. Users not already familiar with the R system should refer to one of the many online resources for more detailed information.

I would like to thank Rebecca Moss for permission to use the “pancreatic” data and Michael Steinberg for permission to use the “pharmacoSmoking” data. Both of these data sets are used repeatedly throughout the text. I would also like to thank Grace Lu-Yao, Weichung Joe Shih, and Yong Lin for years-long collaborations on using the SEER-Medicare data for studying the survival trajectories of prostate cancer patients. These collaborations led to the development of the “prostateSurvival” data set discussed in this text in Chapter 9. I thank the Division of Cancer Epidemiology and Genetics of the US National Cancer Institute for providing the “ashkenazi” data. I also thank Wan Yee Lau for making the “hepatoCellular” data publicly available in the online Dryad data repository and for allowing me to include it in the “asaur” R package.

Piscataway, NJ, USA
Dirk F. Moore
October 2015
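
As a small illustration of the preface's note on attaching the "survival" library, the sketch below fits a Kaplan-Meier estimate of the kind described for Chapter 3. It is not taken from the book: the `lung` data set is an assumption chosen only because it ships with the "survival" package and needs no extra download, and it is not among the data sets named in the preface. The variable name `km` is likewise hypothetical.

```r
# Attach the survival package, as the preface describes.
library(survival)

# 'lung' is an example data set bundled with the survival package
# (an illustrative choice, not one of the data sets named in the preface).
# Surv() pairs each follow-up time with its event/censoring indicator;
# "~ 1" requests a single overall survival curve with no grouping.
km <- survfit(Surv(time, status) ~ 1, data = lung)

# Printing the fit reports the number of subjects, the number of events,
# and the median survival time with its confidence interval.
print(km)

# Packages outside the default installation, such as the companion
# package "asaur", must be installed once before they can be attached:
# install.packages("asaur")
# library(asaur)
```

The installation lines are left commented out so that the sketch runs without network access; uncommenting them would fetch "asaur" from CRAN.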
