Running head: ROBUST BAYESIAN APPROACHES IN GCM
Robust Bayesian Approaches in Growth Curve Modeling: Using Student’s t Distributions versus a Semiparametric Method
Xin Tong
University of Virginia
Zhiyong Zhang
University of Notre Dame
Tong, X., & Zhang, Z. (2020). Robust Bayesian approaches in growth curve modeling: Using Student's t distributions versus semiparametric methods. Structural Equation Modeling, 27(4), 544-560. https://doi.org/10.1080/10705511.2019.1683014
This research is supported through the grant program on Statistical and Research Methodology in Education from the Institute of Education Sciences of the U.S. Department of Education (R305D140037).
Author Note
Correspondence concerning this article should be addressed to Xin Tong, Department of Psychology, University of Virginia, Charlottesville, VA 22903. Email: xt8b@virginia.edu.
Abstract
Despite broad applications of growth curve models, few studies have dealt with a practical issue: nonnormality of data. Previous studies have used Student’s t distributions to remedy the nonnormality problem. In this study, robust distributional growth curve models are proposed from a semiparametric Bayesian perspective, in which intraindividual measurement errors follow unknown random distributions with Dirichlet process mixture priors. Based on Monte Carlo simulations, we evaluate the performance of the robust semiparametric Bayesian method and compare it to the robust method using Student’s t distributions as well as the traditional normal-based method. We conclude that the semiparametric Bayesian method is more robust against nonnormal data. An example about the development of mathematical abilities is provided to illustrate the application of robust growth curve modeling, using school children’s Peabody Individual Achievement Test mathematical test scores from the National Longitudinal Survey of Youth 1997 Cohort.
Keywords: Semiparametric Bayesian, Robust method, Dirichlet process mixture, Growth curve modeling.
Robust Bayesian Approaches in Growth Curve Modeling: Using Student’s t Distributions versus a Semiparametric Method
Growth curve modeling is one of the most frequently used analytical techniques for longitudinal data analysis (e.g., McArdle and Nesselroade, 2014; Meredith and Tisak, 1990). In growth curve modeling, repeated measures of dependent variables are represented as a function of time and possible covariates, and the function mean is the mean growth. Individual variations around the mean growth curve are due to random effects and intraindividual measurement errors. Traditional growth curve analysis typically assumes that the random effects and intraindividual measurement errors are normally distributed. Although the normality assumption makes growth curve models easy to estimate, data in social and behavioral sciences are commonly collected using surveys or questionnaires and thus often are nonnormal (Cain et al., 2017; Micceri, 1989) because of nonnormal population distributions or data contamination. Ignoring the nonnormality of data may lead to inefficient or even biased parameter estimates, and statistical inferences based on common test statistics and fit indices could be misleading (e.g., Maronna, Martin, and Yohai, 2006; Yuan and Bentler, 2001). In this article, we propose a semiparametric Bayesian method to handle the nonnormality issue in growth curve modeling and compare the proposed method to existing robust Bayesian approaches using Student’s t distributions.
Researchers have become more and more keenly aware of the large influence that nonnormality has upon model estimation (e.g., Hampel, Ronchetti, Rousseeuw, and Stahel, 1986; Huber, 1981; Yuan, Bentler, and Chan, 2004) and have developed strategies aiming to provide reliable parameter estimates and inferences when the normality assumption is violated. A straightforward and feasible strategy is to either transform the data so that they are close to being normally distributed, or directly delete potential outliers before data analysis. However, data transformation often makes the interpretation of the model estimation results complicated. Simply deleting outliers may reduce efficiency, as the resulting inferences may fail to reflect uncertainty in the exclusion process (e.g., Lange, Little, and Taylor, 1989). Moreover, diagnosing multivariate outliers is a challenging task (e.g., Filzmoser, 2005; Peña and Prieto, 2001). Tong and Zhang (2017) proposed six methods to detect outlying observations in growth curve modeling and concluded that the greatest chance of success comes from the use of multiple methods, comparing their results and making a decision based on research purposes. As an alternative, many researchers (e.g., Savalei and Falk, 2014; Yuan and Zhang, 2012) have recommended the application of robust methods and statistics to keep results from being distorted by the presence of outliers or nonnormality. These methods either downweight the potential outliers as a transformation technique (e.g., Yuan and Bentler, 1998) or assume that the data come from certain nonnormal distributions such as a t distribution or a mixture of normal distributions (e.g., Asparouhov and Muthén, 2016; Muthén and Shedden, 1999; Tong and Zhang, 2012).
Recently, robust methods from Bayesian perspectives have drawn growing interest because Bayesian methods have many advantages. First, estimating models with complex structures often involves high-dimensional integration and thus is computationally intensive. Sampling methods such as Markov Chain Monte Carlo (MCMC) under the Bayesian framework can handle this problem relatively easily. Second, Bayesian estimation can conveniently infer parameters that do not have symmetric distributions (e.g., variance parameters), whereas it is difficult or computationally intensive to capture the asymmetric nature of such parameters using frequentist methods. Third, with Bayesian methods, prior information can be incorporated via informative priors to make parameter estimates more efficient. Furthermore, Bayesian methods naturally accommodate missing data without requiring new techniques for inference, and missing data can be taken into account at the same time as parameter estimation. Because of these strengths, more and more Bayesian estimation methods are employed in robust analysis.
From the Bayesian perspective, one robust approach to account for nonnormality is to replace normal distributions with Student’s t distributions in the model, as the degrees of freedom of the t distribution can control the robustness (Lange, Little, and Taylor, 1989). For example, Pinheiro, Liu, and Wu (2001) proposed a robust version of the linear mixed-effect model to remedy the distributional deviation from the normality assumption, in which normal distributions for the random effects and measurement errors were both replaced by multivariate t distributions. A few studies have directly discussed this approach in growth curve analysis. Tong and Zhang (2012) and Zhang, Lai, Lu, and Tong (2013) suggested modeling heavy-tailed data and outliers in growth curve modeling using t distributions and provided online software to conduct the robust analysis. The two articles demonstrated that robust growth curve modeling based on t distributions is easy to understand and implement, and thus potentially would greatly promote the adoption of robust growth curve analyses. However, although there are many advantages in using t distributions for robust data analysis (e.g., Tong and Zhang, 2012), Student’s t distribution has a parametric form and thus still has a restriction on the distribution of data. For example, methods using t distributions may be sensitive to skewed data or mixture data, or even break down under some circumstances (e.g., Azzalini and Genton, 2008; Zhang, 2013). Note that researchers have proposed robust methods based on skew-normal distributions (e.g., Asparouhov and Muthén, 2016; Zhang, 2013) to overcome the problem of skewed data. However, again, skew-normal distributions have parametric forms and are limited to describing certain shapes of distributions. Growth mixture models, first introduced by Muthén and Shedden (1999), provide another useful approach to reduce the influence of the nonnormality problem. These models assume that individuals can be grouped into a finite number of classes having distinct growth trajectories. Although growth mixture models are flexible, some difficult issues, including choice of the number of latent classes and selection of growth curve models within each class, have to be tackled. The typical strategy fixes the number of latent classes in advance at a small value (e.g., 2-4), models the growth trajectories parametrically with a polynomial function, and assesses model fits using criteria such as AIC, BIC, and likelihood-based tests (Nylund, Asparouhov, and Muthén, 2007). Semiparametric Bayesian methods, sometimes referred to as nonparametric Bayesian methods (e.g., Gershman and Blei, 2012; Müller and Mitra, 2004), provide a different approach to this problem. Rather than comparing models that vary in complexity, semiparametric Bayesian methods fit a single model that can adapt its complexity to the data and allow the complexity to grow as more data are observed.
While parametric models can only capture a bounded amount of information from data, semiparametric Bayesian models allow for a richer and larger class of models. Müller and Mitra (2004) pointed out that restriction to a parametric family can mislead investigators into an inappropriate illusion of posterior certainty. In contrast, semiparametric Bayesian methods are adaptive and have proven to be a valuable tool for discovering complicated patterns in data due to their great flexibility. There are two typical building blocks for semiparametric Bayesian models: the Gaussian process and the Dirichlet process. The Gaussian process is a distribution over functions that can be used for modeling functions and classification, and the Dirichlet process is a distribution over probability measures that can be used for density estimation and clustering. For robust analysis against nonnormality, semiparametric Bayesian methods with Dirichlet process priors are desirable, by viewing latent variables or measurement errors as from unknown random distributions. Fueled by the MCMC ideas and the development of Bayesian software (e.g., Lunn, Jackson, Best, Thomas, and Spiegelhalter, 2013), many researchers have discussed the advantages and flexibility of using the semiparametric Bayesian methods (e.g., Fahrmeir and Raach, 2007; Ghosal et al., 1999; Hjort, 2003; Hjort et al., 2010; Müller and Mitra, 2004; MacEachern, 1999) and have applied these methods to models with complex structures. For example, Bush and MacEachern (1996), Kleinman and Ibrahim (1998), and Brown and Ibrahim (2003) used Dirichlet process mixtures for random effects distributions. Ansari and Iyengar (2006) used Dirichlet components to define a semiparametric dynamic choice model. Burr and Doss (2005) used a conditional Dirichlet process for the random effects distribution within a meta-analysis application. Dunson (2006) used dynamic mixtures of Dirichlet processes to allow a latent variable distribution to change nonparametrically across groups. For categorical data analysis, Dirichlet process mixtures of multinomial distributions have been studied and applied to missing data through multiple imputation techniques to capture complex dependencies, especially in high dimensions (Si and Reiter, 2013; Si et al., 2015). Semiparametric Bayesian approaches have also been developed for structural equation models to relax the assumption that the distribution of the latent variables is normal (e.g., Lee, Lu, and Song, 2008; Yang and Dunson, 2010). As far as we are aware, measurement errors or residuals in these models are still normally distributed, although Müller and Mitra (2004) pointed out that the nonparametric model extension can go in the direction of nonparametric residual distributions.
Despite the popularity of growth curve modeling, the prevalence of nonnormal data, and the flexibility of semiparametric Bayesian methods, no study has been directly conducted on semiparametric Bayesian growth curve modeling. Therefore, the main contributions of this article are to (1) propose a semiparametric Bayesian approach for growth curve modeling to relax the normality assumption imposed in traditional normal-based analysis (especially on measurement errors); and (2) evaluate the performance of the semiparametric Bayesian method and compare it with the robust method using Student’s t distributions, which is a parametric analysis. The robust method using Student’s t distributions is selected for comparison because it is more broadly used in practice, as statistical software has been developed to facilitate its implementation. In the next section, we briefly review traditional Bayesian growth curve modeling and the robust approach using Student’s t distributions. After that, robust semiparametric Bayesian growth curve modeling is proposed. In addition, the comparison between semiparametric Bayesian models and finite growth mixture models is discussed. Then, in the subsequent section, simulation studies are carried out to show the effectiveness of semiparametric Bayesian methods in comparison to traditional growth curve modeling as well as robust growth curve modeling using Student’s t distributions. Finally, we illustrate the application of the semiparametric Bayesian methods through an example with the Peabody Individual Achievement Test math data from the National Longitudinal Survey of Youth 1997 Cohort (Bureau of Labor Statistics, U.S. Department of Labor, 2005). We end the article by summarizing our findings with recommendations.
Robust Bayesian Growth Curve Modeling
Bayesian growth curve modeling: a brief review
Growth curve models are used to analyze longitudinal data in which the same subjects are observed repeatedly over time on the same tests. Let $y_i = (y_{i1}, \ldots, y_{iT})'$ be a $T \times 1$ random vector and $y_{ij}$ be an observation for individual $i$ at time $j$ ($i = 1, \ldots, N$; $j = 1, \ldots, T$). Here $N$ is the sample size and $T$ is the total number of measurement occasions. A typical form of unconditional growth curve models can be expressed as
$$y_i = \Lambda b_i + e_i,$$
$$b_i = \beta + u_i,$$
where $\Lambda$ is a $T \times q$ factor loading matrix determining the growth trajectories, $b_i$ is a $q \times 1$ vector of random effects, and $e_i$ is a vector of intraindividual measurement errors. The vector of random effects $b_i$ varies for each individual, and its mean, $\beta$, represents the fixed effects. The residual vector $u_i$ represents the random component of $b_i$. Because $\beta$ is constant across individuals, $b_i$ and $u_i$ share the same type of distribution with different means.
Traditional growth curve models typically assume that both $e_i$ and $u_i$ follow multivariate normal distributions such that $e_i \sim MN_T(0, \Phi)$ and $u_i \sim MN_q(0, \Psi)$, where $MN$ denotes a multivariate normal distribution and the subscript denotes its dimension. The $T \times T$ matrix $\Phi$ and the $q \times q$ matrix $\Psi$ represent the covariance matrices of $e_i$ and $u_i$, respectively. The intraindividual measurement error structure is usually simplified to $\Phi = \sigma_e^2 I$, where $\sigma_e^2$ is a scale parameter. With this simplification, we assume that the error variance is homogeneous across time and that measurement errors are uncorrelated across time points. Given the current specification of $u_i$, $b_i \sim MN_q(\beta, \Psi)$.
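Special forms of growth curve models can be derived from the preceding form. For example, if
$$\Lambda = \begin{pmatrix} 1 & 0 \\ 1 & 1 \\ \vdots & \vdots \\ 1 & T-1 \end{pmatrix}, \quad b_i = \begin{pmatrix} L_i \\ S_i \end{pmatrix}, \quad \beta = \begin{pmatrix} \beta_L \\ \beta_S \end{pmatrix}, \quad \Psi = \begin{pmatrix} \sigma_L^2 & \sigma_{LS} \\ \sigma_{LS} & \sigma_S^2 \end{pmatrix},$$
the model represents a linear growth curve model with random intercept (initial level) $L_i$ and random slope (rate of change) $S_i$. The average intercept and slope across all individuals are $\beta_L$ and $\beta_S$, respectively. In $\Psi$, $\sigma_L^2$ and $\sigma_S^2$ represent the variability (or interindividual differences) around the mean intercept and the mean slope, respectively, and $\sigma_{LS}$ represents the covariance between the latent intercept and slope.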
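
As an illustration of the linear growth curve model, the following minimal sketch simulates data under the normality assumption with $\Phi = \sigma_e^2 I$. It also computes the marginal moments that follow from the model by integrating out the random effects, namely mean $\Lambda\beta$ and covariance $\Lambda\Psi\Lambda' + \sigma_e^2 I$. The function names and the parameter values in the usage note are hypothetical and chosen only for illustration.

```python
import numpy as np

def simulate_linear_gcm(N, T, beta, Psi, sigma2_e, seed=0):
    """Simulate y_i = Lambda b_i + e_i with b_i ~ MN(beta, Psi), e_i ~ MN(0, sigma2_e * I)."""
    rng = np.random.default_rng(seed)
    # Loading matrix of the linear model: a column of ones and times 0, 1, ..., T-1
    Lam = np.column_stack([np.ones(T), np.arange(T)])
    b = rng.multivariate_normal(beta, Psi, size=N)        # N x 2 random effects (L_i, S_i)
    e = rng.normal(0.0, np.sqrt(sigma2_e), size=(N, T))   # N x T measurement errors
    y = b @ Lam.T + e                                     # N x T repeated measures
    return y, b, Lam

def implied_moments(Lam, beta, Psi, sigma2_e):
    """Marginal mean and covariance of y_i implied by the normal-based model."""
    mean = Lam @ np.asarray(beta, dtype=float)
    cov = Lam @ np.asarray(Psi, dtype=float) @ Lam.T + sigma2_e * np.eye(Lam.shape[0])
    return mean, cov
```

For example, with `beta = [5, 2]`, `Psi = [[1, 0.2], [0.2, 0.5]]`, and `sigma2_e = 1`, the sample mean and covariance of a large simulated data set should be close to the output of `implied_moments`.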
To estimate growth curve models, Bayesian methodology can be applied. Bayesian methods for complex data analysis have become popular in the past few decades because of their advantages as described previously (e.g., Lee and Shi, 2000; Lee and Song, 2004; Lee and Xia, 2008; Serang et al., 2015; Zhang et al., 2007b; Tong and Zhang, 2012). The basic idea of Bayesian methods is to obtain the posterior distributions of model parameters by combining the likelihood function and the priors. When priors are uninformative or weakly informative, the likelihood dominates and thus results from Bayesian estimation are similar to those from maximum likelihood estimation. For a typical unconditional growth curve model, we define the joint prior distribution of model parameters by $p(\beta, \Phi, \Psi)$ and denote the likelihood function as $L$. The joint posterior distribution of model parameters is
$$p(\beta, \Phi, \Psi \mid y_i) \propto \int p(\beta, \Phi, \Psi) \times L \, db,$$
where $b = (b_1', \ldots, b_N')'$. This integral is difficult to solve in practice. Instead, MCMC methods (e.g., Gibbs sampling; Robert and Casella, 2004) are often applied to obtain parameter estimates and statistical inferences. We first obtain conditional posterior distributions for the parameters; then, by iteratively drawing samples from the conditional posterior distributions, we obtain empirical marginal distributions of the model parameters and make statistical inferences based on the empirical marginal distributions (Geman and Geman, 1984). Detailed algorithms can be found in, for example, Song and Lee (2012) and Zhang et al. (2013).
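
To make the Gibbs sampling idea concrete, the following is a minimal sketch for the linear growth curve model with $\Phi = \sigma_e^2 I$. It is not the algorithm of the cited references. The priors (a diffuse normal prior for $\beta$, an inverse gamma prior for $\sigma_e^2$, and an inverse Wishart prior for $\Psi$), their hyperparameter values, and the function name `gibbs_linear_gcm` are assumptions made only for this illustration. The random effects are treated as augmented data, so each step draws from a standard conditional posterior distribution.

```python
import numpy as np

def gibbs_linear_gcm(y, n_iter=600, burn_in=200, seed=0):
    """Gibbs sampler sketch for y_i = Lam b_i + e_i, b_i ~ MN(beta, Psi), e_i ~ MN(0, sigma2 * I)."""
    rng = np.random.default_rng(seed)
    N, T = y.shape
    Lam = np.column_stack([np.ones(T), np.arange(T)])
    q = Lam.shape[1]
    # Hypothetical weakly informative priors (assumptions of this sketch)
    prior_var_beta = 1e4          # beta ~ MN(0, prior_var_beta * I)
    a0, b0 = 0.001, 0.001         # sigma2 ~ inverse gamma(a0, b0)
    m0, V0 = q + 1, np.eye(q)     # Psi ~ inverse Wishart(m0, V0)
    beta, sigma2, Psi = np.zeros(q), 1.0, np.eye(q)
    LtL, yL = Lam.T @ Lam, y @ Lam            # yL has rows (Lam' y_i)'
    draws = []
    for it in range(n_iter):
        Psi_inv = np.linalg.inv(Psi)
        # 1. Random effects: b_i | rest ~ MN(C (Lam' y_i / sigma2 + Psi^{-1} beta), C)
        C = np.linalg.inv(LtL / sigma2 + Psi_inv)
        means = (yL / sigma2 + Psi_inv @ beta) @ C
        b = means + rng.standard_normal((N, q)) @ np.linalg.cholesky(C).T
        # 2. Fixed effects: beta | rest is multivariate normal
        C_beta = np.linalg.inv(N * Psi_inv + np.eye(q) / prior_var_beta)
        m_beta = C_beta @ (Psi_inv @ b.sum(axis=0))
        beta = m_beta + np.linalg.cholesky(C_beta) @ rng.standard_normal(q)
        # 3. Error variance: sigma2 | rest is inverse gamma
        resid = y - b @ Lam.T
        rate = b0 + 0.5 * np.sum(resid ** 2)
        sigma2 = 1.0 / rng.gamma(a0 + 0.5 * N * T, 1.0 / rate)
        # 4. Random-effects covariance: Psi | rest is inverse Wishart(m0 + N, V0 + S).
        #    Draw a Wishart matrix as a sum of outer products, then invert it.
        dev = b - beta
        scale_inv = np.linalg.inv(V0 + dev.T @ dev)
        z = rng.standard_normal((m0 + N, q)) @ np.linalg.cholesky(scale_inv).T
        Psi = np.linalg.inv(z.T @ z)
        if it >= burn_in:
            draws.append(np.concatenate([beta, [sigma2], Psi[np.triu_indices(q)]]))
    return np.array(draws)
```

Each row of the returned array holds one retained draw of $(\beta_L, \beta_S, \sigma_e^2, \sigma_L^2, \sigma_{LS}, \sigma_S^2)$. Column means and quantiles of these draws serve as the empirical marginal posterior summaries described above. A real analysis would also need convergence diagnostics, which this sketch omits.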
Robust Bayesian growth curve modeling using Student’s t distributions
The traditional growth curve analysis discussed above is based upon the normality assumption of random effects and intraindividual measurement errors. However, practical data in social and behavioral sciences are rarely normal due to data contamination or nonnormal population distributions. Without taking the nonnormality problem into consideration, we may obtain inefficient or even incorrect parameter estimates in model estimation (e.g., Yuan and Bentler, 2001; Yuan and Zhang, 2012). Studies to deal with the adverse effects of nonnormality on parameter estimates, standard errors, and test statistics have been carried out in growth curve analysis. In a growth curve model, the nonnormality may occur in the random effects, in the intraindividual measurement errors, or both (Pinheiro et al., 2001). Motivated by a real data set from an orthodontic study, Pinheiro et al. proposed a robust hierarchical linear mixed-effects model in which the random effects and the intraindividual errors follow multivariate t distributions, with known or unknown degrees of freedom.
By using multivariate t distributions, extreme values in a data set can be downweighted. Suppose $k$-dimensional data $x$ follow a multivariate t distribution with $k \times 1$ location vector $\mu$, $k \times k$ shape matrix $\Sigma$, and degrees of freedom $\nu$, denoted by $MT(\mu, \Sigma, \nu)$. The probability density function of $x$ is
$$p(x \mid \mu, \Sigma, \nu) = \frac{\Gamma[(\nu + k)/2]}{\Gamma(\nu/2)\, \nu^{k/2} \pi^{k/2} |\Sigma|^{1/2}} \left[ 1 + \frac{1}{\nu} (x - \mu)^T \Sigma^{-1} (x - \mu) \right]^{-(\nu + k)/2}.$$
The maximum likelihood estimates of model parameters $\hat{\theta}$ satisfy
$$\sum_{i=1}^{N} w_i A_i \Sigma_i(\hat{\theta})^{-1} \left( x_i - \mu_i(\hat{\theta}) \right) = 0$$
(Lange et al., 1989), where $N$ is the sample size, $A_i$ is the matrix of partial derivatives of $\mu_i(\theta)$ with respect to $\theta$, and $w_i = (\nu + \tau_i)/(\nu + \delta_i^2)$ is the weight assigned to case $i$ ($\tau_i$ is the dimension of $\theta$ for case $i$, and $\delta_i^2$ is the squared Mahalanobis distance $\delta_i^2 = (x_i - \mu_i)^T \Sigma_i^{-1} (x_i - \mu_i)$ for case $i$). Thus, potential outliers can be downweighted in the model estimation process because lower weights will be assigned to cases with large squared Mahalanobis distances, given fixed degrees of freedom $\nu$ and dimensions $\tau_i$.
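
The density and the weights above can be evaluated directly. The following minimal sketch computes the log density of $MT(\mu, \Sigma, \nu)$ and the weights $w_i = (\nu + \tau)/(\nu + \delta_i^2)$ for a set of cases that share a common $\mu$ and $\Sigma$. The function names are hypothetical, and the dimension term `tau` is passed in explicitly. Setting it to the number of variables per case in the usage note is an assumption of this sketch.

```python
import numpy as np
from math import lgamma, log, pi

def mvt_logpdf(x, mu, Sigma, nu):
    """Log density of the multivariate t distribution MT(mu, Sigma, nu) at a single point x."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    mu = np.atleast_1d(np.asarray(mu, dtype=float))
    Sigma = np.atleast_2d(np.asarray(Sigma, dtype=float))
    k = x.size
    diff = x - mu
    delta2 = diff @ np.linalg.solve(Sigma, diff)    # squared Mahalanobis distance
    _, logdet = np.linalg.slogdet(Sigma)
    return (lgamma((nu + k) / 2) - lgamma(nu / 2)
            - 0.5 * k * log(nu) - 0.5 * k * log(pi) - 0.5 * logdet
            - 0.5 * (nu + k) * np.log1p(delta2 / nu))

def t_weights(X, mu, Sigma, nu, tau):
    """Weights (nu + tau) / (nu + delta_i^2) for the rows of X, with common mu and Sigma."""
    diff = np.asarray(X, dtype=float) - np.asarray(mu, dtype=float)
    Sigma = np.atleast_2d(np.asarray(Sigma, dtype=float))
    delta2 = np.einsum("ij,ij->i", diff, np.linalg.solve(Sigma, diff.T).T)
    return (nu + tau) / (nu + delta2), delta2
```

For instance, with $\nu = 5$, `tau = 2`, $\mu = 0$, and $\Sigma = I$, a case at the center receives weight $7/5$, whereas a case at $(5, 5)$, with $\delta^2 = 50$, receives weight $7/55$. The distant case therefore contributes far less to the estimating equation.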
From a Bayesian perspective, Tong and Zhang (2012) proposed four types of distributional growth curve models where the random effects $u_i$ and intraindividual measurement errors $e_i$ may follow either multivariate normal or t distributions. They concluded that the four types of distributional growth curve models imply very different patterns in growth trajectories, and thus, given an empirical data set, it is very important to specify the correct type of growth curve model. Lu and Zhang (2014) expanded the study to further conduct a robust growth mixture analysis with nonnormal missing data.
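
The four distributional types can be illustrated by simulation. The following minimal sketch generates data from a linear growth curve model in which $u_i$ and $e_i$ are each drawn from either a multivariate normal or a multivariate t distribution. The t draws use the standard representation of a multivariate t vector as a normal vector divided by the square root of an independent chi-square variable over its degrees of freedom. The function names, argument names, and default $\nu = 5$ are hypothetical. Note that for a t distribution, $\Sigma$ is a shape matrix: the covariance is $\nu/(\nu - 2)\,\Sigma$ when $\nu > 2$.

```python
import numpy as np

def rmvt(rng, n, mu, Sigma, nu):
    """Draw n vectors from MT(mu, Sigma, nu) as a scale mixture of normals."""
    mu = np.asarray(mu, dtype=float)
    z = rng.multivariate_normal(np.zeros(mu.size), Sigma, size=n)
    g = rng.chisquare(nu, size=n) / nu           # one mixing variable per case
    return mu + z / np.sqrt(g)[:, None]

def simulate_distributional_gcm(N, T, beta, Psi, sigma2_e,
                                error_dist="normal", random_dist="normal",
                                nu=5, seed=0):
    """Simulate a linear growth curve model with normal or t random effects and errors."""
    rng = np.random.default_rng(seed)
    Lam = np.column_stack([np.ones(T), np.arange(T)])
    Phi = sigma2_e * np.eye(T)
    q = Lam.shape[1]
    if random_dist == "t":
        u = rmvt(rng, N, np.zeros(q), Psi, nu)
    else:
        u = rng.multivariate_normal(np.zeros(q), Psi, size=N)
    if error_dist == "t":
        e = rmvt(rng, N, np.zeros(T), Phi, nu)
    else:
        e = rng.multivariate_normal(np.zeros(T), Phi, size=N)
    y = (np.asarray(beta, dtype=float) + u) @ Lam.T + e
    return y, u, e
```

Calling the function with each combination of `error_dist` and `random_dist` yields the four model types. Plotting the simulated trajectories for each combination is one way to see how the types differ.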
Although there are many advantages in using t distributions for robust data analysis (e.g.,