P1:TIX/XYZ P2:ABC JWST070-FM JWST070-Bartholomew May9,2011 14:30 PrinterName:YettoCome

WILEY SERIES IN PROBABILITY AND STATISTICS

Established by WALTER A. SHEWHART and SAMUEL S. WILKS

Editors
David J. Balding, Noel A. C. Cressie, Garrett M. Fitzmaurice, Harvey Goldstein, Geert Molenberghs, David W. Scott, Adrian F. M. Smith, Ruey S. Tsay, Sanford Weisberg

Editors Emeriti
Vic Barnett, Ralph A. Bradley, J. Stuart Hunter, J. B. Kadane, David G. Kendall, Jozef L. Teugels

A complete list of the titles in this series can be found on http://www.wiley.com/WileyCDA/Section/id-300611.html.

Latent Variable Models and Factor Analysis
A Unified Approach
3rd Edition

David Bartholomew · Martin Knott · Irini Moustaki
London School of Economics and Political Science, UK

A John Wiley & Sons, Ltd., Publication

This edition first published 2011
© 2011 John Wiley & Sons, Ltd

Registered office
John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, United Kingdom

For details of our global editorial offices, for customer services and for information about how to apply for permission to reuse the copyright material in this book please see our website at www.wiley.com.

The right of the author to be identified as the author of this work has been asserted in accordance with the Copyright, Designs and Patents Act 1988.

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted by the UK Copyright, Designs and Patents Act 1988, without the prior permission of the publisher.

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic books.
Designations used by companies to distinguish their products are often claimed as trademarks. All brand names and product names used in this book are trade names, service marks, trademarks or registered trademarks of their respective owners. The publisher is not associated with any product or vendor mentioned in this book. This publication is designed to provide accurate and authoritative information in regard to the subject matter covered. It is sold on the understanding that the publisher is not engaged in rendering professional services. If professional advice or other expert assistance is required, the services of a competent professional should be sought.

Library of Congress Cataloging-in-Publication Data

Bartholomew, David J.
Latent variable models and factor analysis : a unified approach. – 3rd ed. / David Bartholomew, Martin Knott, Irini Moustaki.
p. cm.
Includes bibliographical references and index.
ISBN 978-0-470-97192-5 (cloth)
1. Latent variables. 2. Latent structure analysis. 3. Factor analysis. I. Knott, M. (Martin) II. Moustaki, Irini. III. Title.
QA278.6.B37 2011
519.5'35–dc22
2011007711

A catalogue record for this book is available from the British Library.

Print ISBN: 978-0-470-97192-5
ePDF ISBN: 978-1-119-97059-0
oBook ISBN: 978-1-119-97058-3
ePub ISBN: 978-1-119-97370-6
Mobi ISBN: 978-1-119-97371-3

Set in 10/12pt Times by Aptara Inc., New Delhi, India.
Contents

Preface
Acknowledgements

1 Basic ideas and examples
  1.1 The statistical problem
  1.2 The basic idea
  1.3 Two examples
    1.3.1 Binary manifest variables and a single binary latent variable
    1.3.2 A model based on normal distributions
  1.4 A broader theoretical view
  1.5 Illustration of an alternative approach
  1.6 An overview of special cases
  1.7 Principal components
  1.8 The historical context
  1.9 Closely related fields in statistics

2 The general linear latent variable model
  2.1 Introduction
  2.2 The model
  2.3 Some properties of the model
  2.4 A special case
  2.5 The sufficiency principle
  2.6 Principal special cases
  2.7 Latent variable models with non-linear terms
  2.8 Fitting the models
  2.9 Fitting by maximum likelihood
  2.10 Fitting by Bayesian methods
  2.11 Rotation
  2.12 Interpretation
  2.13 Sampling error of parameter estimates
  2.14 The prior distribution
  2.15 Posterior analysis
  2.16 A further note on the prior
  2.17 Psychometric inference

3 The normal linear factor model
  3.1 The model
  3.2 Some distributional properties
  3.3 Constraints on the model
  3.4 Maximum likelihood estimation
  3.5 Maximum likelihood estimation by the E-M algorithm
  3.6 Sampling variation of estimators
  3.7 Goodness of fit and choice of q
    3.7.1 Model selection criteria
  3.8 Fitting without normality assumptions: least squares methods
  3.9 Other methods of fitting
  3.10 Approximate methods for estimating Ψ
  3.11 Goodness of fit and choice of q for least squares methods
  3.12 Further estimation issues
    3.12.1 Consistency
    3.12.2 Scale-invariant estimation
    3.12.3 Heywood cases
  3.13 Rotation and related matters
    3.13.1 Orthogonal rotation
    3.13.2 Oblique rotation
    3.13.3 Related matters
  3.14 Posterior analysis: the normal case
  3.15 Posterior analysis: least squares
  3.16 Posterior analysis: a reliability approach
  3.17 Examples

4 Binary data: latent trait models
  4.1 Preliminaries
  4.2 The logit/normal model
  4.3 The probit/normal model
  4.4 The equivalence of the response function and underlying variable approaches
  4.5 Fitting the logit/normal model: the E-M algorithm
    4.5.1 Fitting the probit/normal model
    4.5.2 Other methods for approximating the integral
  4.6 Sampling properties of the maximum likelihood estimators
  4.7 Approximate maximum likelihood estimators
  4.8 Generalised least squares methods
  4.9 Goodness of fit
  4.10 Posterior analysis
  4.11 Fitting the logit/normal and probit/normal models: Markov chain Monte Carlo
    4.11.1 Gibbs sampling
    4.11.2 Metropolis–Hastings
    4.11.3 Choosing prior distributions
    4.11.4 Convergence diagnostics in MCMC
  4.12 Divergence of the estimation algorithm
  4.13 Examples

5 Polytomous data: latent trait models
  5.1 Introduction
  5.2 A response function model based on the sufficiency principle
  5.3 Parameter interpretation
  5.4 Rotation
  5.5 Maximum likelihood estimation of the polytomous logit model
  5.6 An approximation to the likelihood
    5.6.1 One factor
    5.6.2 More than one factor
  5.7 Binary data as a special case
  5.8 Ordering of categories
    5.8.1 A response function model for ordinal variables
    5.8.2 Maximum likelihood estimation of the model with ordinal variables
    5.8.3 The partial credit model
    5.8.4 An underlying variable model
  5.9 An alternative underlying variable model
  5.10 Posterior analysis
  5.11 Further observations
  5.12 Examples of the analysis of polytomous data using the logit model

6 Latent class models
  6.1 Introduction
  6.2 The latent class model with binary manifest variables
  6.3 The latent class model for binary data as a latent trait model
  6.4 K latent classes within the GLLVM
  6.5 Maximum likelihood estimation
  6.6 Standard errors
  6.7 Posterior analysis of the latent class model with binary manifest variables
  6.8 Goodness of fit
  6.9 Examples for binary data
  6.10 Latent class models with unordered polytomous manifest variables
  6.11 Latent class models with ordered polytomous manifest variables
  6.12 Maximum likelihood estimation
    6.12.1 Allocation of individuals to latent classes
  6.13 Examples for unordered polytomous data
  6.14 Identifiability
  6.15 Starting values
  6.16 Latent class models with metrical manifest variables
    6.16.1 Maximum likelihood estimation
    6.16.2 Other methods
    6.16.3 Allocation to categories
  6.17 Models with ordered latent classes
  6.18 Hybrid models
    6.18.1 Hybrid model with binary manifest variables
    6.18.2 Maximum likelihood estimation

7 Models and methods for manifest variables of mixed type
  7.1 Introduction
  7.2 Principal results
  7.3 Other members of the exponential family
    7.3.1 The binomial distribution
    7.3.2 The Poisson distribution
    7.3.3 The gamma distribution
  7.4 Maximum likelihood estimation
    7.4.1 Bernoulli manifest variables
    7.4.2 Normal manifest variables
    7.4.3 A general E-M approach to solving the likelihood equations
    7.4.4 Interpretation of latent variables
  7.5 Sampling properties and goodness of fit
  7.6 Mixed latent class models
  7.7 Posterior analysis
  7.8 Examples
  7.9 Ordered categorical variables and other generalisations

8 Relationships between latent variables
  8.1 Scope
  8.2 Correlated latent variables
  8.3 Procrustes methods
  8.4 Sources of prior knowledge
  8.5 Linear structural relations models
  8.6 The LISREL model
    8.6.1 The structural model
    8.6.2 The measurement model
    8.6.3 The model as a whole
  8.7 Adequacy of a structural equation model
  8.8 Structural relationships in a general setting
  8.9 Generalisations of the LISREL model
  8.10 Examples of models which are indistinguishable
  8.11 Implications for analysis

9 Related techniques for investigating dependency
  9.1 Introduction
  9.2 Principal components analysis
    9.2.1 A distributional treatment
    9.2.2 A sample-based treatment
    9.2.3 Unordered categorical data
    9.2.4 Ordered categorical data
  9.3 An alternative to the normal factor model
  9.4 Replacing latent variables by linear functions of the manifest variables
  9.5 Estimation of correlations and regressions between latent variables
  9.6 Q-Methodology
  9.7 Concluding reflections of the role of latent variables in statistical modelling

Software appendix
References
Author index
Subject index

Preface

It is more than 20 years since the first edition of this book appeared in 1987, and its subject, like statistics as a whole, has changed radically in that period. By far the greatest impact has been made by advances in computing. In 1987 adequate implementation of most latent variable methods, even the well-established factor analysis, was guided more by computational feasibility than by theoretical optimality. What was true of factor analysis was even more true of the assortment of other latent variable techniques, which were then seen as unconnected and very specific to different applications. The development of new models was seriously inhibited by the insuperable computational problems which they would have posed. This new edition aims to take full account of these changes.

The Griffin series of monographs, then edited by Alan Stuart, was designed to consolidate the literature of promising new developments into short books. Knowing that one of us (DJB) was attempting to develop and unify latent variable modelling from a statistical point of view, he proposed what appeared in 1987 as Volume 40 in the Griffin series. Ten years later the series had been absorbed into the Kendall Library of Statistics monographs designed to complement the evergreen volumes of Kendall and Stuart’s Advanced Theory of Statistics.
Latent Variable Models and Factor Analysis took its place as Volume 7 in that series in 1999. This second edition was somewhat different in character from its predecessor, and a second author (MK) brought his particular expertise into the project. After a further decade that book was in urgent need of revision, and this could only be done adequately by recruiting a third author (IM) who is actively involved at the frontiers of contemporary research. Throughout its long history the principal aim has remained unchanged and it is worth quoting at some length from the Preface of the second edition:

    the prime object of the book remains the same – that is, to provide a unified and coherent treatment of the field from a statistical perspective. This is achieved by setting up a sufficiently general framework to enable us to derive the commonly used models, and many more as special cases. The starting point is that all variables, manifest and latent, continuous or categorical, are treated as random variables. The subsequent analysis is then done wholly within the realm of the probability calculus and the theory of statistical inference.

The subtitle, added in this edition, merely serves to emphasise, rather than modify its original purpose.