
ERIC ED610012: Robust Bayesian Approaches in Growth Curve Modeling: Using Student's "t" Distributions versus a Semiparametric Method PDF

2020·0.47 MB·English
by  ERIC


Running head: ROBUST BAYESIAN APPROACHES IN GCM

Robust Bayesian Approaches in Growth Curve Modeling: Using Student’s t Distributions versus a Semiparametric Method

Xin Tong
University of Virginia

Zhiyong Zhang
University of Notre Dame

Tong, X., & Zhang, Z. (2020). Robust Bayesian approaches in growth curve modeling: Using Student's t distributions versus semiparametric methods. Structural Equation Modeling, 27(4), 544-560. https://doi.org/10.1080/10705511.2019.1683014

The research is supported through the grant program on Statistical and Research Methodology in Education from the Institute of Education Sciences of the U.S. Department of Education (R305D140037).

Author Note

Correspondence concerning this article should be addressed to Xin Tong, Department of Psychology, University of Virginia, Charlottesville, VA 22903. Email: [email protected].

Abstract

Despite broad applications of growth curve models, few studies have dealt with a practical issue: nonnormality of data. Previous studies have used Student’s t distributions to remedy the nonnormal problems. In this study, robust distributional growth curve models are proposed from a semiparametric Bayesian perspective, in which intraindividual measurement errors follow unknown random distributions with Dirichlet process mixture priors. Based on Monte Carlo simulations, we evaluate the performance of the robust semiparametric Bayesian method and compare it to the robust method using Student’s t distributions as well as the traditional normal-based method. We conclude that the semiparametric Bayesian method is more robust against nonnormal data. An example about the development of mathematical abilities is provided to illustrate the application of robust growth curve modeling, using school children’s Peabody Individual Achievement Test mathematical test scores from the National Longitudinal Survey of Youth 1997 Cohort.

Keywords: Semiparametric Bayesian, Robust method, Dirichlet process mixture, Growth curve modeling.
Robust Bayesian Approaches in Growth Curve Modeling: Using Student’s t Distributions versus a Semiparametric Method

Growth curve modeling is one of the most frequently used analytical techniques for longitudinal data analysis (e.g., McArdle and Nesselroade, 2014; Meredith and Tisak, 1990). In growth curve modeling, repeated measures of dependent variables are represented as a function of time and possible covariates, and the function mean is the mean growth. Individual variations around the mean growth curve are due to random effects and intraindividual measurement errors. Traditional growth curve analysis typically assumes that the random effects and intraindividual measurement errors are normally distributed. Although the normality assumption makes growth curve models easy to estimate, data in social and behavioral sciences are commonly collected using surveys or questionnaires and thus often are nonnormal (Cain et al., 2017; Micceri, 1989) because of nonnormal population distributions or data contamination. Ignoring the nonnormality of data may lead to inefficient or even biased parameter estimates, and statistical inferences based on common test statistics and fit indices could be misleading (e.g., Maronna, Martin, and Yohai, 2006; Yuan and Bentler, 2001). In this article, we propose a semiparametric Bayesian method to handle the nonnormality issue in growth curve modeling and compare the proposed method to existing robust Bayesian approaches using Student’s t distributions.

Researchers have become more and more keenly aware of the large influence that nonnormality has upon model estimation (e.g., Hampel, Ronchetti, Rousseeuw, and Stahel, 1986; Huber, 1981; Yuan, Bentler, and Chan, 2004) and have developed strategies aiming to provide reliable parameter estimates and inferences when the normality assumption is violated. A straightforward and feasible strategy is to either transform the data so that they are close to being normally distributed, or directly delete potential outliers before data analysis. However, data transformation often makes the interpretation of the model estimation results complicated.
Simply deleting outliers may reduce efficiency, as the resulting inferences may fail to reflect uncertainty in the exclusion process (e.g., Lange, Little, and Taylor, 1989). Moreover, diagnosing multivariate outliers is a challenging task (e.g., Filzmoser, 2005; Peña and Prieto, 2001). Tong and Zhang (2017) proposed six methods to detect outlying observations in growth curve modeling and concluded that the greatest chance of success comes from the use of multiple methods, comparing their results and making a decision based on research purposes. As an alternative, many researchers (e.g., Savalei and Falk, 2014; Yuan and Zhang, 2012) have recommended the application of robust methods and statistics to protect results from being distorted by the presence of outliers or nonnormality. These methods either downweight the potential outliers as a transformation technique (e.g., Yuan and Bentler, 1998) or assume that the data come from certain nonnormal distributions such as a t distribution or a mixture of normal distributions (e.g., Asparouhov and Muthén, 2016; Muthén and Shedden, 1999; Tong and Zhang, 2012).

Recently, robust methods from Bayesian perspectives have drawn growing interest because Bayesian methods have many advantages. First, estimating models with complex structures often involves high dimensional integration and thus is computationally intensive. Sampling methods such as Markov Chain Monte Carlo (MCMC) under the Bayesian framework can handle this problem relatively easily. Second, Bayesian estimation can conveniently infer parameters that do not have symmetric distributions (e.g., variance parameters), whereas it is difficult or computationally intensive to capture the asymmetric nature of such parameters using frequentist methods. Third, with Bayesian methods, prior information can be incorporated via informative priors to make parameter estimates more efficient. Furthermore, Bayesian methods naturally accommodate missing data without requiring new techniques for inference, and missing data can be taken into account at the same time as parameter estimation. Because of these strengths, more and more Bayesian estimation methods are employed in robust analysis.
From the Bayesian perspective, one robust approach to account for nonnormality is to replace normal distributions by Student’s t distributions in the model, as the degrees of freedom of the t distribution can control the robustness (Lange, Little, and Taylor, 1989). For example, Pinheiro, Liu, and Wu (2001) proposed a robust version of the linear mixed-effect model to remedy the distributional deviation from the normality assumption, in which normal distributions for the random effects and measurement errors were both replaced by multivariate t distributions. A few studies have directly discussed this approach in growth curve analysis. Tong and Zhang (2012) and Zhang, Lai, Lu, and Tong (2013) suggested modeling heavy-tailed data and outliers in growth curve modeling using t distributions and provided online software to conduct the robust analysis. The two articles demonstrated that robust growth curve modeling based on t distributions is easy to understand and implement, and thus potentially would greatly promote the adoption of robust growth curve analyses. However, although there are many advantages in using t distributions for robust data analysis (e.g., Tong and Zhang, 2012), Student’s t distribution has a parametric form and thus still imposes a restriction on the distribution of data. For example, methods using t distributions may be sensitive to skewed data or mixture data, or even break down under some circumstances (e.g., Azzalini and Genton, 2008; Zhang, 2013). Note that researchers have proposed robust methods based on skew-normal distributions (e.g., Asparouhov and Muthén, 2016; Zhang, 2013) to overcome the problem of skewed data. However, again, skew-normal distributions have parametric forms and are limited to describing certain shapes of distributions. Growth mixture models, first introduced by Muthén and Shedden (1999), provide another useful approach to reduce the influence of the nonnormality problem. These models assume that individuals can be grouped into a finite number of classes having distinct growth trajectories. Although growth mixture models are flexible, some difficult issues, including the choice of the number of latent classes and the selection of growth curve models within each class, have to be tackled.
The typical strategy fixes the number of latent classes in advance at a small value (e.g., 2-4), models the growth trajectories parametrically with a polynomial function, and assesses model fit using criteria such as AIC, BIC, and likelihood-based tests (Nylund, Asparouhov, and Muthén, 2007). Semiparametric Bayesian methods, sometimes referred to as nonparametric Bayesian methods (e.g., Gershman and Blei, 2012; Müller and Mitra, 2004), provide a different approach to this problem. Rather than comparing models that vary in complexity, semiparametric Bayesian methods fit a single model that can adapt its complexity to the data and allow the complexity to grow as more data are observed.

While parametric models can only capture a bounded amount of information from data, semiparametric Bayesian models allow for a richer and larger class of models. Müller and Mitra (2004) pointed out that restriction to a parametric family can mislead investigators into an inappropriate illusion of posterior certainty. In contrast, semiparametric Bayesian methods are adaptive and have proven to be a valuable tool for discovering complicated patterns in data due to their great flexibility. There are two typical building blocks for semiparametric Bayesian models: the Gaussian process and the Dirichlet process. The Gaussian process is a distribution over functions that can be used for modeling functions and classification, and the Dirichlet process is a distribution over probability measures that can be used for density estimation and clustering. For robust analysis against nonnormality, semiparametric Bayesian methods with Dirichlet process priors are desirable, by viewing latent variables or measurement errors as coming from unknown random distributions. Fueled by the MCMC ideas and the development of Bayesian software (e.g., Lunn, Jackson, Best, Thomas, and Spiegelhalter, 2013), many researchers have discussed the advantages and flexibility of using the semiparametric Bayesian methods (e.g., Fahrmeir and Raach, 2007; Ghosal et al., 1999; Hjort, 2003; Hjort et al., 2010; Müller and Mitra, 2004; MacEachern, 1999) and have applied these methods to models with complex structures.
For example, Bush and MacEachern (1996), Kleinman and Ibrahim (1998), and Brown and Ibrahim (2003) used Dirichlet process mixtures for random effects distributions. Ansari and Iyengar (2006) used Dirichlet components to define a semiparametric dynamic choice model. Burr and Doss (2005) used a conditional Dirichlet process for the random effects distribution within a meta-analysis application. Dunson (2006) used dynamic mixtures of Dirichlet processes to allow a latent variable distribution to change nonparametrically across groups. For categorical data analysis, Dirichlet process mixtures of multinomial distributions have been studied and applied to missing data through multiple imputation techniques to capture complex dependencies, especially in high dimensions (Si and Reiter, 2013; Si et al., 2015). Semiparametric Bayesian approaches have also been developed for structural equation models to relax the assumption that the distribution of the latent variables is normal (e.g., Lee, Lu, and Song, 2008; Yang and Dunson, 2010). As far as we are aware, measurement errors or residuals in these models are still normally distributed, although Müller and Mitra (2004) pointed out that the nonparametric model can be extended in the direction of nonparametric residual distributions.

Despite the popularity of growth curve modeling, the prevalence of nonnormal data, and the flexibility of semiparametric Bayesian methods, no study has been directly conducted on semiparametric Bayesian growth curve modeling. Therefore, the main contributions of this article are to (1) propose a semiparametric Bayesian approach for growth curve modeling to relax the normality assumption imposed in traditional normal-based analysis (especially on measurement errors); and (2) evaluate the performance of the semiparametric Bayesian method and compare it with the robust method using Student’s t distributions, which is a parametric analysis. The robust method using Student’s t distributions is selected for comparison because it is relatively more broadly used in practice, as statistical software has been developed to facilitate its implementation.
In the next section, we briefly review traditional Bayesian growth curve modeling and the robust approach using Student’s t distributions. After that, robust semiparametric Bayesian growth curve modeling is proposed. In addition, the comparison between semiparametric Bayesian models and finite growth mixture models is discussed. Then, in the subsequent section, simulation studies are carried out to show the effectiveness of semiparametric Bayesian methods in comparison to traditional growth curve modeling as well as robust growth curve modeling using Student’s t distributions. Finally, we illustrate the application of the semiparametric Bayesian methods through an example with the Peabody Individual Achievement Test math data from the National Longitudinal Survey of Youth 1997 Cohort (Bureau of Labor Statistics, U.S. Department of Labor, 2005). We end the article by summarizing our findings with recommendations.

Robust Bayesian Growth Curve Modeling

Bayesian growth curve modeling: a brief review

Growth curve models are used to analyze longitudinal data in which the same subjects are observed repeatedly over time on the same tests. Let $y_i = (y_{i1}, \ldots, y_{iT})'$ be a $T \times 1$ random vector and $y_{ij}$ be an observation for individual $i$ at time $j$ ($i = 1, \ldots, N$; $j = 1, \ldots, T$). Here $N$ is the sample size and $T$ is the total number of measurement occasions. A typical form of unconditional growth curve models can be expressed as

$$
\begin{aligned}
y_i &= \Lambda b_i + e_i, \\
b_i &= \beta + u_i,
\end{aligned}
$$

where $\Lambda$ is a $T \times q$ factor loading matrix determining the growth trajectories, $b_i$ is a $q \times 1$ vector of random effects, and $e_i$ is a vector of intraindividual measurement errors. The vector of random effects $b_i$ varies for each individual, and its mean, $\beta$, represents the fixed effects. The residual vector $u_i$ represents the random component of $b_i$. Because $\beta$ is constant across individuals, $b_i$ and $u_i$ share the same type of distribution with different means.

Traditional growth curve models typically assume that both $e_i$ and $u_i$ follow multivariate normal distributions such that $e_i \sim MN_T(0, \Phi)$ and $u_i \sim MN_q(0, \Psi)$, where $MN$ denotes a multivariate normal distribution and the subscript denotes its dimension.
The $T \times T$ matrix $\Phi$ and the $q \times q$ matrix $\Psi$ represent the covariance matrices of $e_i$ and $u_i$, respectively. The intraindividual measurement error structure is usually simplified to $\Phi = \sigma_e^2 I$, where $\sigma_e^2$ is a scale parameter. By this simplification, we assume that the error variance is homogeneous across time and that measurement errors at different time points are uncorrelated. Given the current specification of $u_i$, $b_i \sim MN_q(\beta, \Psi)$.

Special forms of growth curve models can be derived from the preceding form. For example, if

$$
\Lambda = \begin{pmatrix} 1 & 0 \\ 1 & 1 \\ \vdots & \vdots \\ 1 & T-1 \end{pmatrix}, \quad
b_i = \begin{pmatrix} L_i \\ S_i \end{pmatrix}, \quad
\beta = \begin{pmatrix} \beta_L \\ \beta_S \end{pmatrix}, \quad
\Psi = \begin{pmatrix} \sigma_L^2 & \sigma_{LS} \\ \sigma_{LS} & \sigma_S^2 \end{pmatrix},
$$

the model represents a linear growth curve model with random intercept (initial level) $L_i$ and random slope (rate of change) $S_i$. The average intercept and slope across all individuals are $\beta_L$ and $\beta_S$, respectively. In $\Psi$, $\sigma_L^2$ and $\sigma_S^2$ represent the variability (or interindividual differences) around the mean intercept and the mean slope, respectively, and $\sigma_{LS}$ represents the covariance between the latent intercept and slope.

To estimate growth curve models, Bayesian methodology can be applied. Bayesian methods for complex data analysis have become popular in the past few decades because of their advantages as described previously (e.g., Lee and Shi, 2000; Lee and Song, 2004; Lee and Xia, 2008; Serang et al., 2015; Zhang et al., 2007b; Tong and Zhang, 2012). The basic idea of Bayesian methods is to obtain the posterior distributions of model parameters by combining the likelihood function and the priors. When priors are uninformative or weakly informative, the likelihood dominates and thus results from Bayesian estimation are similar to those from maximum likelihood estimation. For a typical unconditional growth curve model, we define the joint prior distribution of model parameters by $p(\beta, \Phi, \Psi)$ and denote the likelihood function as $L$. The joint posterior distribution of model parameters is

$$
p(\beta, \Phi, \Psi \mid y_i) \propto \int p(\beta, \Phi, \Psi) \times L \, db,
$$

where $b = (b_1', \ldots, b_N')'$. This integral is difficult to solve in practice. Instead, MCMC methods (e.g., Gibbs sampling; Robert and Casella, 2004) are often applied to obtain parameter estimates and statistical inferences.
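As a rough illustration of the linear growth curve model specified above, the following Python sketch simulates data from $y_i = \Lambda b_i + e_i$ under the normality assumption and compares the sample moments with the implied marginal moments $\Lambda\beta$ and $\Lambda \Psi \Lambda' + \sigma_e^2 I$. All numerical values (N, T, beta, Psi, sigma2_e) and the random seed are assumptions chosen for illustration only; none of them come from the article.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative values only (assumptions, not taken from the article).
N, T = 5000, 4
Lambda = np.column_stack([np.ones(T), np.arange(T)])  # rows (1, 0), ..., (1, T-1)
beta = np.array([5.0, 2.0])                           # mean intercept and slope
Psi = np.array([[1.0, 0.1],
                [0.1, 0.5]])                          # covariance of (L_i, S_i)
sigma2_e = 0.8                                        # Phi = sigma2_e * I

# b_i ~ MN_q(beta, Psi) and e_i ~ MN_T(0, sigma2_e * I)
b = rng.multivariate_normal(beta, Psi, size=N)        # N x 2 random effects
e = rng.normal(0.0, np.sqrt(sigma2_e), size=(N, T))   # N x T measurement errors
y = b @ Lambda.T + e                                  # row i is y_i'

# Marginal moments implied by the model after integrating out b_i.
implied_mean = Lambda @ beta
implied_cov = Lambda @ Psi @ Lambda.T + sigma2_e * np.eye(T)

sample_mean = y.mean(axis=0)
sample_cov = np.cov(y, rowvar=False)
print(np.round(sample_mean - implied_mean, 3))
print(np.round(sample_cov - implied_cov, 3))
```

With a sample this large, both printed differences should be close to zero. Replacing the normal draws with heavy-tailed or contaminated draws is one simple way to see how the sample moments drift away from the normal-based implied moments.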
We first obtain conditional posterior distributions for the parameters. Then, by iteratively drawing samples from the conditional posterior distributions, we obtain empirical marginal distributions of the model parameters and make statistical inferences based on the empirical marginal distributions (Geman and Geman, 1984). Detailed algorithms can be found in, for example, Song and Lee (2012) and Zhang et al. (2013).

Robust Bayesian growth curve modeling using Student’s t distributions

The traditional growth curve analysis discussed above is based upon the normality assumption of random effects and intraindividual measurement errors. However, practical data in social and behavioral sciences are rarely normal due to data contamination or nonnormal population distributions. Without taking the nonnormality problem into consideration, we may obtain inefficient or even incorrect parameter estimates in model estimation (e.g., Yuan and Bentler, 2001; Yuan and Zhang, 2012). Studies dealing with the adverse effects of nonnormality on parameter estimates, standard errors, and test statistics have been carried out in growth curve analysis. In a growth curve model, the nonnormality may occur in the random effects, in the intraindividual measurement errors, or both (Pinheiro et al., 2001). Motivated by a real dataset from an orthodontic study, Pinheiro et al. proposed a robust hierarchical linear mixed-effects model in which the random effects and the intraindividual errors follow multivariate t distributions, with known or unknown degrees of freedom.

By using multivariate t distributions, extreme values in a dataset can be downweighted. Suppose a $k$-dimensional data vector $x$ follows a multivariate t distribution with $k \times 1$ location vector $\mu$, $k \times k$ shape matrix $\Sigma$, and degrees of freedom $\nu$, denoted by $MT(\mu, \Sigma, \nu)$. The probability density function of $x$ is

$$
p(x \mid \mu, \Sigma, \nu) = \frac{\Gamma[(\nu + k)/2]}{\Gamma(\nu/2)\, \nu^{k/2} \pi^{k/2} |\Sigma|^{1/2}} \left[ 1 + \frac{1}{\nu} (x - \mu)^T \Sigma^{-1} (x - \mu) \right]^{-(\nu + k)/2}.
$$

The maximum likelihood estimates of model parameters $\hat{\theta}$ satisfy

$$
\sum_{i=1}^{N} w_i A_i \Sigma_i(\hat{\theta})^{-1} \left( x_i - \mu_i(\hat{\theta}) \right) = 0
$$

(Lange et al., 1989), where $N$ is the sample size, $A_i$ is the matrix of partial derivatives of $\mu_i(\theta)$ with respect to $\theta$, and

$$
w_i = \frac{\nu + \tau_i}{\nu + \delta_i^2}
$$

is the weight assigned to case $i$ ($\tau_i$ is the dimension of $\theta$ for case $i$, and $\delta_i^2$ is the squared Mahalanobis distance $\delta_i^2 = (x_i - \mu_i)^T \Sigma_i^{-1} (x_i - \mu_i)$ for case $i$). Thus, potential outliers can be downweighted in the model estimation process because lower weights will be assigned to cases with large squared Mahalanobis distances, given fixed degrees of freedom $\nu$ and dimensions $\tau_i$.

From a Bayesian perspective, Tong and Zhang (2012) proposed four types of distributional growth curve models where the random effects $u_i$ and intraindividual measurement errors $e_i$ may follow either multivariate normal or t distributions. They concluded that the four types of distributional growth curve models imply very different patterns in growth trajectories, and thus, given an empirical dataset, it is very important to specify the correct type of growth curve model. Lu and Zhang (2014) expanded the study to further conduct a robust growth mixture analysis with nonnormal missing data.

Although there are many advantages in using t distributions for robust data analysis (e.g.,
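
The downweighting implied by the weight $w_i = (\nu + \tau_i)/(\nu + \delta_i^2)$ given earlier in this section can be sketched numerically. The Python snippet below is a minimal illustration, not the authors' software: the helper name `t_weights`, the toy data, and the choice of $\nu$ are all hypothetical, and $\tau_i$ is set to the length of each observation, which is an assumption of this sketch.

```python
import numpy as np

def t_weights(X, mu, Sigma, nu):
    """Hypothetical helper: case weights (nu + tau) / (nu + delta^2).

    tau is taken to be the number of columns of X (an assumption of this
    sketch); delta^2 is the squared Mahalanobis distance of each row from mu.
    """
    X = np.atleast_2d(np.asarray(X, dtype=float))
    tau = X.shape[1]
    diff = X - mu
    # delta2[i] = diff[i]' Sigma^{-1} diff[i]
    delta2 = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(Sigma), diff)
    return (nu + tau) / (nu + delta2), delta2

# Toy data: a case at the center, a typical case, and an extreme case.
mu = np.zeros(2)
Sigma = np.eye(2)
X = np.array([[0.0, 0.0],
              [1.0, 1.0],
              [6.0, -6.0]])

w, d2 = t_weights(X, mu, Sigma, nu=5.0)
print(d2)  # squared Mahalanobis distances: 0, 2, 72
print(w)   # weights shrink as the distance grows

# With very large degrees of freedom the weights approach 1 for every case,
# which mirrors the normal-based analysis that treats all cases equally.
w_large_nu, _ = t_weights(X, mu, Sigma, nu=1e6)
print(w_large_nu)
```

In this toy example the extreme case receives a weight of roughly 0.09, compared with 1.0 for the typical case, so its contribution to the estimating equation is reduced by about an order of magnitude.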
