Use R!

Frans Willekens

Multistate Analysis of Life Histories with R

Use R!
Series Editors: Robert Gentleman, Kurt Hornik, Giovanni Parmigiani

More information about this series at http://www.springer.com/series/6991

Violations are liable to prosecution under the respective Copyright Law.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

While the advice and information in this book are believed to be true and accurate at the date of publication, neither the authors nor the editors nor the publisher can accept any legal responsibility for any errors or omissions that may be made. The publisher makes no warranty, express or implied, with respect to the material contained herein.

Printed on acid-free paper

Springer is part of Springer Science+Business Media (www.springer.com)

Preface

Suppose you are asked to describe an individual. You probably list age, sex, marital status, presence of children and number of children, main occupation, education level, ethnicity, place of residence, place of work, main source of income, religious denomination and some lifestyle features. You probably add the years of major transitions: when the person graduated from school, got married, entered the current job and moved to the current address. If the person has children, you may add the name, age and sex of each child. When you are asked to describe a population, you may mention size, age structure, distribution by level of education, employment status, marital status and health status. Such a description refers to the population at a point in time. If asked to describe population change, you may mention changes in size and distribution. Population change is an outcome of changes in people's lifestyle and life course. An ageing population is a result of people having fewer children and living longer.
A declining married proportion is an outcome of fewer people marrying, postponement of marriage and marriages being less stable. Fewer marriages may be linked to changes in the meaning of the institution of marriage. An increase in the proportion unemployed is an outcome of more people losing their job and/or a decreased likelihood of finding a job when unemployed, resulting in longer unemployment spells. The description of population change in terms of changing lives is referred to as the biographical method. The method emphasizes personal attributes, life events and life histories.

An individual may be characterized by a set of attributes such as marital status, employment status, health status, place of residence and income level. If attributes are represented by discrete variables with finite numbers of categories, a combination of categories defines a state of existence, and an individual with given values of attributes is said to occupy a state. Individuals with the same values of attributes occupy the same state. The state space is the set of possible states. In practice, one or a few attributes are selected to define the state space. Which attributes are selected depends on the research question. Other attributes that are relevant but not of primary importance are treated as covariates.

As life unfolds, an individual moves between states. The sequences of states and transitions between states describe life histories or careers. Employment histories, marital histories and residential histories are examples of careers. In studies of life histories, two approaches are distinguished (Abbott 2001). The first views a life path as a whole and tries to find typical patterns. This approach is generally known as sequence analysis. The second views a life history as a realization of a stochastic process and aims at the description, explanation and prediction of life histories.
Probability models are used to represent stochastic processes and to model the life histories that they generate. This book is about the second approach. Life histories are viewed as realizations of continuous-time Markov processes that depend on rates of transition between states. The rates are estimated from longitudinal data.

The multistate methods that are presented in this book are included in Biograph, an R package that implements the biographical method. The package can be downloaded from the Comprehensive R Archive Network (CRAN) (http://cran.r-project.org/). Biograph retrieves useful information from life history data. It estimates transition rates and computes life history indicators. A particularly useful feature of Biograph is the set of utilities that connect the package to R packages for multistate modeling, including mstate, msm, mvna, etm and Epi, and to the package TraMineR for sequence analysis. Biograph produces input data in the format these packages require, as well as the basic R objects they use.

The motivation to write the book was to stimulate the use of multistate modeling among social science students and researchers with a basic knowledge of survival analysis and event history analysis. The methods presented in the book are illustrated using two data sets. The first is a subsample of the German Life History Survey. Blossfeld and Rohwer (2002) and Blossfeld et al. (2007) used the data to illustrate the statistical modeling of time-to-event data. By using the same data set, the book presents the multistate analysis of life histories as a logical extension of the analysis of time-to-event data. At the end of the book, another data set is considered: the Netherlands Fertility and Family Survey of 1998. Both data sets are included in the Biograph package.

The book should appeal to anyone interested in how populations change and how the change is related to the lifestyle and life course of individuals.
The changes relate to today's major societal challenges: ageing, population decline, migration and integration, population diversity, population health, labour market dynamics and the role of education and skills in the modern knowledge society. The book should be of particular interest to demographers, epidemiologists and students of population health, sociologists, criminologists, economists and historians. The book is suitable as a textbook for graduate courses on event history analysis. It may also be used as a self-study book, provided the reader has a basic knowledge of survival analysis and multistate modeling. The R code used in the book is available online.

The preparation of the book has been a long but exciting journey. Most of the work was done while I was with the Netherlands Interdisciplinary Demographic Institute (NIDI) in The Hague. The book was completed at the Max Planck Institute for Demographic Research in Rostock, Germany. I would like to thank Hans-Peter Blossfeld for allowing me to use the subsample of the German Life History Survey that he used in his book with Götz Rohwer, Techniques of Event History Modeling (Blossfeld and Rohwer 2002). James Raymer, Jutta Gampe, Sabine Zinn and Arthur Allignol provided useful comments on the manuscript. I am grateful for their help.

Rostock, Germany
May 2014
Frans Willekens

Contents

1 Introduction
2 Life Histories: Real and Synthetic
  2.1 Introduction
  2.2 Transition Rates
  2.3 Transition Probabilities and State Occupation Probabilities
  2.4 Expected Waiting Times and State Occupation Times
  2.5 Synthetic Life Histories
  2.6 Conclusion
3 The Biograph Object
  3.1 Introduction
  3.2 Description of a Biograph Object
  3.3 How to Create a Biograph Object?
  3.4 Data Restructuring
  3.5 Other Data Formats
  3.6 A Note on Dates
  3.7 Conclusion
4 Exploratory Data Analysis
  4.1 Introduction
  4.2 The Multistate System and Its Measurement
  4.3 Episodes and Transitions
  4.4 State and Event Sequences: Individual and Aggregate
  4.5 State Occupancies, Transitions and State Occupation Times
  4.6 Covariates
  4.7 Conclusion
5 Visualisation of Life Histories
  5.1 Introduction
  5.2 Points of Departure
  5.3 Basic Graphics with ggplot2

