
Advanced Statistical Methods in Data Science PDF

229 Pages·2016·5.184 MB·English

Preview Advanced Statistical Methods in Data Science

ICSA Book Series in Statistics
Series Editors: Jiahua Chen · Ding-Geng (Din) Chen

Ding-Geng (Din) Chen · Jiahua Chen · Xuewen Lu · Grace Y. Yi · Hao Yu
Editors

Advanced Statistical Methods in Data Science

ICSA Book Series in Statistics

Series editors
Jiahua Chen, Department of Statistics, University of British Columbia, Vancouver, Canada
Ding-Geng (Din) Chen, University of North Carolina, Chapel Hill, NC, USA

More information about this series at http://www.springer.com/series/13402

Editors
Ding-Geng (Din) Chen, School of Social Work, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA; Department of Biostatistics, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
Jiahua Chen, Department of Statistics, University of British Columbia, Vancouver, BC, Canada
Xuewen Lu, Department of Mathematics and Statistics, University of Calgary, Calgary, AB, Canada
Grace Y. Yi, Department of Statistics and Actuarial Science, University of Waterloo, Waterloo, ON, Canada
Hao Yu, Department of Statistics and Actuarial Science, Western University, London, ON, Canada

ISSN 2199-0980
ISSN 2199-0999 (electronic)
ICSA Book Series in Statistics
ISBN 978-981-10-2593-8
ISBN 978-981-10-2594-5 (eBook)
DOI 10.1007/978-981-10-2594-5
Library of Congress Control Number: 2016959593

© Springer Science+Business Media Singapore 2016
This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made.

Printed on acid-free paper

This Springer imprint is published by Springer Nature
The registered company is Springer Nature Singapore Pte Ltd.
The registered company address is: 152 Beach Road, #22-06/08 Gateway East, Singapore 189721, Singapore

To my parents and parents-in-law, who value higher education and hard work; to my wife Ke, for her love, support, and patience; and to my son John D. Chen and my daughter Jenny K. Chen for their love and support.
Ding-Geng (Din) Chen, PhD

To my wife, my daughter Amy, and my son Andy, whose admiring conversations transformed into lasting enthusiasm for my research activities.
Jiahua Chen, PhD

To my wife Xiaobo, my daughter Sophia, and my son Samuel, for their support and understanding.
Xuewen Lu, PhD

To my family, Wenqing He, Morgan He, and Joy He, for being my inspiration and offering everlasting support.
Grace Y. Yi, PhD

Preface

This book is a compilation of invited presentations and lectures that were presented at the Second Symposium of the International Chinese Statistical Association–Canada Chapter (ICSA–CANADA) held at the University of Calgary, Canada, August 4–6, 2015 (http://www.ucalgary.ca/icsa-canadachapter2015). The Symposium was organized around the theme “Embracing Challenges and Opportunities of Statistics and Data Science in the Modern World” with a threefold goal: to promote advanced statistical methods in big data sciences, to create an opportunity for the exchange of ideas among researchers in statistics and data science, and to embrace the opportunities inherent in the challenges of using statistics and data science in the modern world.

The Symposium encompassed diverse topics in advanced statistical analysis in big data sciences, including methods for administrative data analysis, survival data analysis, missing data analysis, high-dimensional and genetic data analysis, and longitudinal and functional data analysis; design and analysis of studies with response-dependent and multiphase designs; time series and robust statistics; and statistical inference based on likelihood, empirical likelihood, and estimating functions. This book compiles 12 research articles generated from Symposium presentations.

Our aim in creating this book was to provide a venue for timely dissemination of the research presented during the Symposium to promote further research and collaborative work in advanced statistics.
In the era of big data, this collection of innovative research not only has high potential to have a substantial impact on the development of advanced statistical models across a wide spectrum of big data sciences but also has great promise for fostering more research and collaborations addressing the ever-changing challenges and opportunities of statistics and data science. The authors have made their data and computer programs publicly available so that readers can replicate the model development and data analysis presented in each chapter, enabling them to readily apply these new methods in their own research.

The 12 chapters are organized into three sections. Part I includes four chapters that present and discuss data analyses based on latent variable models in data sciences. Part II comprises four chapters that share a common focus on lifetime data analyses. Part III is composed of four chapters that address applied data analyses in big data sciences.

Part I Data Analysis Based on Latent or Dependent Variable Models (Chaps. 1, 2, 3, and 4)

Chapter 1 presents a weighted multiple testing procedure commonly used and known in clinical trials. Given this wide use, many researchers have proposed methods for making multiple testing adjustments to control family-wise error rates while accounting for the logical relations among the null hypotheses. However, most of those methods not only disregard the correlation among the endpoints within the same family but also assume the hypotheses associated with each family are equally weighted. Authors Enas Ghulam, Kesheng Wang, and Changchun Xie report on their work in which they proposed and tested a gatekeeping procedure based on Xie’s weighted multiple testing correction for correlated tests. The proposed method is illustrated with an example to clearly demonstrate how it can be used in complex clinical trials.

In Chap. 2, Abbas Khalili, Jiahua Chen, and David A. Stephens consider the regime-switching Gaussian autoregressive model as an effective platform for analyzing financial and economic time series.
The authors first explain the heterogeneous behavior in volatility over time and multimodality of the conditional or marginal distributions and then propose a computationally more efficient regularization method for simultaneous autoregressive-order and parameter estimation when the number of autoregressive regimes is predetermined. The authors provide a helpful demonstration by applying this method to analysis of the growth of the US gross domestic product and US unemployment rate data.

Chapter 3 deals with a practical problem of healthcare use for understanding the risk factors associated with the length of hospital stay. In this chapter, Cindy Xin Feng and Longhai Li develop hurdle and zero-inflated models to accommodate both the excess zeros and skewness of data with various configurations of spatial random effects. In addition, these models allow for the analysis of the nonlinear effect of seasonality and other fixed effect covariates. This research draws attention to considerable drawbacks regarding model misspecifications. The modeling and inference presented by Feng and Li use the fully Bayesian approach via Markov Chain Monte Carlo (MCMC) simulation techniques.

Chapter 4 discusses emerging issues in the era of precision medicine and the development of multi-agent combination therapy or polytherapy. Prior research has established that, as compared with conventional single-agent therapy (monotherapy), polytherapy often leads to a high-dimensional dose searching space, especially when a treatment combines three or more drugs. To overcome the burden of calibration of multiple design parameters, Ruitao Lin and Guosheng Yin propose a robust optimal interval (ROI) design to locate the maximum tolerated dose (MTD) in Phase I clinical trials. The optimal interval is determined by minimizing the probability of incorrect decisions under the Bayesian paradigm. To tackle high-dimensional drug combinations, the authors develop a random-walk ROI design to identify the MTD combination in the multi-agent dose space.
The authors of this chapter designed extensive simulation studies to demonstrate the finite-sample performance of the proposed methods.

Part II Lifetime Data Analysis (Chaps. 5, 6, 7, and 8)

In Chap. 5, Longlong Huang, Karen Kopciuk, and Xuewen Lu present a new method for group selection in an accelerated failure time (AFT) model with a group bridge penalty. This method is capable of simultaneously carrying out feature selection at the group and within-group individual variable levels. The authors conducted a series of simulation studies to demonstrate the capacity of this group bridge approach to identify the correct group and correct individual variable even with high censoring rates. Real data analysis illustrates the application of the proposed method to scientific problems.

Chapter 6 considers issues around Case I interval censored data, also known as current status data, commonly encountered in areas such as demography, economics, epidemiology, and medical science. In this chapter, Pooneh Pordeli and Xuewen Lu first introduce a partially linear single-index proportional odds model to analyze these types of data and then propose a method for simultaneous sieve maximum likelihood estimation. The resultant estimator of regression parameter vector is asymptotically normal, and, under some regularity conditions, this estimator can achieve the semiparametric information bound.

Chapter 7 presents a framework for general empirical likelihood inference of Type I censored multiple samples. Authors Song Cai and Jiahua Chen develop an effective empirical likelihood ratio test and efficient methods for distribution function and quantile estimation for Type I censored samples. This newly developed approach can achieve high efficiency without requiring risky model assumptions. The maximum empirical likelihood estimator is asymptotically normal. Simulation studies show that, as compared to some semiparametric competitors, the proposed empirical likelihood ratio test has superior power under a wide range of population distribution settings.
Chapter 8 provides readers with an overview of recent developments in the joint modeling of longitudinal quality of life (QoL) measurements and survival time for cancer patients that promise more efficient estimation. Authors Hui Song, Yingwei Peng, and Dongsheng Tu then propose semiparametric estimation methods to estimate the parameters in these joint models and illustrate the applications of these joint modeling procedures to analyze longitudinal QoL measurements and recurrence times using data from a clinical trial sample of women with early breast cancer.

Part III Applied Data Analysis (Chaps. 9, 10, 11, and 12)

Chapter 9 presents an interesting discussion of a confidence weighting model applied to multiple-choice tests commonly used in undergraduate mathematics and statistics courses. Michael Cavers and Joseph Ling discuss an approach to multiple-choice testing called the student-weighted model and report on findings based on the implementation of this method in two sections of a first-year calculus course at the University of Calgary (2014 and 2015).

Chapter 10 discusses parametric imputation in missing data analysis. Author Peisong Han proposes to estimate and subtract the asymptotic bias to obtain consistent estimators. Han demonstrates that the resulting estimator is consistent if any of the missingness mechanism models or the imputation model is correctly specified.

Chapter 11 considers one of the basic and important problems in statistics: the estimation of the center of a symmetric distribution. In this chapter, authors Pengfei Li and Zhaoyang Tian propose a new estimator by maximizing the smoothed likelihood. Li and Tian’s simulation studies show that, as compared with the existing methods, their proposed estimator has much smaller mean square errors under uniform distribution, t-distribution with one degree of freedom, and mixtures of normal distributions on the mean parameter. Additionally, the proposed estimator is comparable to the existing methods under other symmetric distributions.
Chapter 12 presents the work of Jingjia Chu, Reg Kulperger, and Hao Yu in which they propose a new class of multivariate time series models. Specifically, the authors propose a multivariate time series model with an additive GARCH-type structure to capture the common risk among equities. The dynamic conditional covariance between series is aggregated by a common risk term, which is key to characterizing the conditional correlation.

As a general note, the references for each chapter are included immediately following the chapter text. We have organized the chapters as self-contained units so readers can more easily and readily refer to the cited sources for each chapter.

The editors are deeply grateful to many organizations and individuals for their support of the research and efforts that have gone into the creation of this collection of impressive, innovative work. First, we would like to thank the authors of each chapter for the contribution of their knowledge, time, and expertise to this book as well as to the Second Symposium of the ICSA–CANADA. Second, our sincere gratitude goes to the sponsors of the Symposium for their financial support: the Canadian Statistical Sciences Institute (CANSSI), the Pacific Institute for the Mathematical Sciences (PIMS), and the Department of Mathematics and Statistics, University of Calgary; without their support, this book would not have become a reality. We also owe big thanks to the volunteers and the staff of the University of Calgary for their assistance at the Symposium. We express our sincere thanks to the Symposium organizers: Gemai Chen, PhD, University of Calgary; Jiahua Chen, PhD, University of British Columbia; X. Joan Hu, PhD, Simon Fraser University; Wendy Lou, PhD, University of Toronto; Xuewen Lu, PhD, University of Calgary; Chao Qiu, PhD, University of Calgary; Bingrui (Cindy) Sun, PhD, University of Calgary; Jingjing Wu, PhD, University of Calgary; Grace Y. Yi, PhD, University of Waterloo; and Ying Zhang, PhD, Acadia University. The editors wish to acknowledge the professional support of Hannah Qiu (Springer/ICSA Book Series coordinator) and Wei Zhao (associate editor) from Springer Beijing that made publishing this book with Springer a reality.
