Psychological Assessment, 2016, Vol. 28, No. 3, 319–330
© 2015 American Psychological Association 1040-3590/16/$12.00 http://dx.doi.org/10.1037/pas0000152

Measuring Executive Function in Early Childhood: A Case for Formative Measurement

Michael T. Willoughby, University of North Carolina at Chapel Hill
Clancy B. Blair, New York University
The Family Life Project Investigators, University of North Carolina at Chapel Hill, The Pennsylvania State University, Duke University, and The Arizona State University

This study tested whether individual executive function (EF) tasks were better characterized as formative or reflective indicators of the latent construct of EF. EF data were collected as part of the Family Life Project (FLP), a prospective longitudinal study of families who were recruited at the birth of a new child (N = 1,292), when children were 3, 4, and 5 years old. Vanishing tetrad tests were used to test the relative fit of models in which EF tasks were used as either formative or reflective indicators of the latent construct of EF in the prediction of intellectual ability (at Age 3), attention-deficit hyperactivity disorder symptoms (at Ages 3 to 5 years), and academic achievement (at kindergarten). Results consistently indicated that EF tasks were better represented as formative indicators of the latent construct of EF. Next, individual tasks were combined to form an overall measure of EF ability in ways generally consistent with formative (i.e., creating a composite mean score) and reflective (i.e., creating an EF factor score) measurement. The test–retest reliability and developmental trajectories of EF differed substantially, depending on which overall measure of EF ability was used. In general, the across-time stability of EF was markedly higher when represented as a factor score versus composite score. Results are discussed with respect to the ways in which the statistical representation of EF tasks can exert a large impact on inferences regarding the developmental causes, course, and consequences of EF.

This document is copyrighted by the American Psychological Association or one of its allied publishers. This article is intended solely for the personal use of the individual user and is not to be disseminated broadly.
Keywords: executive function, early childhood, formative measurement

This article was published Online First June 29, 2015.
Michael T. Willoughby, FPG Child Development Institute, University of North Carolina at Chapel Hill; Clancy B. Blair, Department of Applied Psychology, New York University; and The Family Life Project Investigators, University of North Carolina at Chapel Hill, The Pennsylvania State University, Duke University, and The Arizona State University.
Michael T. Willoughby is now at RTI International, Research Triangle Park, North Carolina.
The National Institute of Child Health and Human Development Grants R01 HD51502 and P01 HD39667, with cofunding from the National Institute on Drug Abuse, supported data collection. The Institute of Educational Sciences Grant R324A120033 supported data analysis and writing. The views expressed in this article are those of the authors and they do not necessarily represent the opinions and positions of the Institute of Educational Sciences, the Department of Education, or the National Institute of Child Health and Human Development. The Family Life Project Phase I Key Investigators include Lynne Vernon-Feagans, University of North Carolina at Chapel Hill; Martha Cox, University of North Carolina at Chapel Hill; Clancy B. Blair, The Pennsylvania State University; Peg Burchinal, University of North Carolina; Linda Burton, Duke University; Keith Crnic, The Arizona State University; Ann Crouter, The Pennsylvania State University; Patricia Garrett-Peters, University of North Carolina at Chapel Hill; Mark Greenberg, The Pennsylvania State University; Stephanie Lanza, The Pennsylvania State University; Roger Mills-Koonce, University of North Carolina at Greensboro; Debra Skinner, University of North Carolina at Chapel Hill; Emily Werner, The Pennsylvania State University; and Michael T. Willoughby, RTI International, Research Triangle Park, North Carolina.
Correspondence concerning this article should be addressed to Michael T. Willoughby, RTI International, Hobbs #349, 3040 Cornwallis Road, Research Triangle Park, NC 27709. E-mail: mwilloughby@rti.org

Executive functions (EFs) refer to a set of cognitive abilities that are important for organizing information, for planning and problem solving, and for orchestrating thought and action in support of goal-directed behavior (Blair & Ursache, 2011). [...] For example, a search of the Web of Science (which accesses the Science Citation Index Expanded, Social Sciences Citation Index, and the Arts & Humanities Citation Index databases) identified 18 studies from 1985 to 1990 that used “executive function” in the title or keywords, compared with 7,445 studies that did so from 2006 to 2010.

Current Conceptualizations of the Construct of EF

Despite the surge of multidisciplinary interest in EF, numerous questions about how to best measure the construct of EF remain unanswered. For example, despite the potential ease of use, parent ratings of children’s EF behaviors correlate very poorly with children’s performance on EF assessments (median correlation of r = .19 across 20 studies; see Toplak, West, & Stanovich, 2013). More troubling is evidence that performance-based indicators of EF are typically poorly to modestly correlated, despite being administered at the same time, using the same method, in the same setting, by the same person.¹ As we recently reported, the weak to modest correlations among performance-based indicators of EF (mean r = .30 for associations between tasks intended to measure EF or one of its subdomains, e.g., inhibitory control) were evident in studies that varied substantially with respect to participant age (3 to 70+ years of age) and the specific tasks used (Willoughby, Holochwost, Blanton, & Blair, 2014). These results suggested that weak to modest correlations among performance-based indicators may be a characteristic of the construct of EF and were not indicative of measurement deficiencies for a particular set of tasks or for a particular age group (e.g., young children). Hence, disagreements between rated and performance-based indicators notwithstanding, even the agreement among multiple performance-based indicators of EF is troublesome.

In the absence of a narrowly defined consensus definition, EFs have been described using a variety of metaphors. For example, EFs were recently likened to the air traffic control system (Center on the Developing Child at Harvard University, 2011) and to the conductor of an orchestra (Espy et al., in press). Although heuristically useful, these metaphors risk perpetuating the idea that the brain has a dedicated system (e.g., an EF module) that is regionally bound to the prefrontal cortex. This conceptual framing is consistent with the characterization of EF as a latent variable that “gives rise to” (accounts for) the covariation of individual performance across a set of performance-based EF tasks. Moreover, this perspective closely conforms to the assumptions of factor analytic techniques, which are routinely used to represent individual differences in EF, on the basis of individual performance across a battery of tasks.

An alternative characterization of EF is that it represents a range of specific cognitive abilities that depend on multiple distributed networks and brain-wide connectivity “hubs” (Cole et al., 2013; Petersen & Posner, 2012). From this perspective, the prefrontal cortex is important because of the dense interconnections it shares with other parts of the brain. For example, in the case of inhibitory control, Munakata et al. (2011) emphasized that different prefrontal regions played different roles for distinct types of inhibition on the basis of their unique patterns of connectivity with other regions of the brain. Similarly, Chrysikou, Weber, and Thompson-Schill (2014) emphasized that the prefrontal cortex exerted top-down influences on other aspects of cognition and served as a filtering mechanism to bias bottom-up sensory information in ways that facilitate optimal behavioral responses that were sensitive to context. The important point is that there is no EF system or module. Rather, EF may be better characterized as an emergent property of individuals. This conceptual framing is consistent with the characterization of EF as a latent variable that is defined by (rather than giving rise to) individual performance across a set of performance-based tasks. This perspective does not correspond well with the use of factor analytic techniques as a statistical approach for representing individual differences across a set of performance-based EF tasks.

The overarching objective of this study is to explicate these contrasting perspectives on the way in which EF is conceptualized, specifically as it informs the statistical modeling of the latent construct of EF. To date, virtually all studies have implicitly treated children’s performance on individual EF tasks as reflective indicators of the construct of EF through their use of exploratory and confirmatory factor analysis. Here, we introduce an alternative conceptualization of the latent construct of EF, which characterizes individual EF tasks as formative (not reflective) indicators of the latent construct of EF. We use a combination of statistical and pragmatic evidence in order to demonstrate the potential utility of conceptualizing EF tasks as formative indicators of the latent construct of EF.

Reflective Versus Formative Indicators of Latent Variables

Latent variables that are exclusively defined by reflective indicators are characterized by paths that emanate from the latent construct into manifest indicators (see the top panels of Figures 1, 2, and 3). In contrast, latent variables that are exclusively defined by formative indicators are characterized by paths that emanate from the manifest indicators to the latent construct (see the bottom panels of Figures 1 through 3). Although the distinction between reflective and formative measurement is not new (Blalock, 1974; Fornell & Bookstein, 1982; Heise, 1972), the merits and pitfalls of these contrasting perspectives continue to be actively debated among psychometricians (Bollen & Bauldry, 2011; Diamantopoulos, Riefler, & Roth, 2008; Edwards, 2011; Howell, Breivik, & Wilcox, 2007b).

Three linked sets of ideas help to provide an intuitive understanding of the differences between latent constructs that are composed of reflective or formative indicators. First, latent variables that are represented using exclusively reflective indicators are characterized by the variation that is shared among those indicators. In contrast, latent variables that are represented using exclusively formative indicators are characterized by the total variation across those indicators. Second, whereas reflective constructs assume that indicators are positively correlated (preferably of moderate to large magnitude), formative constructs make no assumptions about either the direction or magnitude of correlations between indicators. By extension, whereas traditional indices of reliability are relevant for reflective constructs, they are irrelevant for formative constructs (Bollen, 1984; Bollen & Lennox, 1991). Third, reflective indicators of a latent construct are considered interchangeable; hence, the addition or removal of any indicator does not change the substantive meaning of the construct. In contrast, formative indicators are intended to represent multiple facets of the construct; hence, the addition or removal of any indicator has the potential to change the substantive meaning of the construct.

¹ Given our focus on the early childhood period, in which the preponderance of the current evidence indicates that EF is an undifferentiated (unidimensional) construct, we use the generic referent EF throughout. However, all of our arguments equally apply to the study of more narrowly defined subdimensions of EF (including inhibitory control [IC], working memory [WM], or attention shifting [AS]) that are more typically studied in older children and adults.

Figure 1. Reflective (top) and formative (bottom) indicators of executive function predicting Wechsler Preschool and Primary Scales of Intelligence subtests.

Figure 3. Reflective (top) and formative (bottom) indicators of executive function predicting academic achievement.
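The second of the three ideas above (reflective indicators are expected to correlate, and reliability indices apply to them) can be made concrete with a small calculation. The following is a minimal sketch, not the authors' analysis: the loading value of .55 and the function names are assumptions chosen only so that the implied correlation lands near the mean r of about .30 reported for EF tasks. Under a one-factor reflective model with standardized loadings, the implied correlation between two indicators is the product of their loadings, and the standardized coefficient alpha follows from the average correlation and the number of indicators.

```python
from itertools import combinations


def implied_correlations(loadings):
    """Correlations implied by a one-factor reflective model.

    With standardized loadings, r_jk = lambda_j * lambda_k for j != k.
    """
    return {
        (j, k): loadings[j] * loadings[k]
        for j, k in combinations(range(len(loadings)), 2)
    }


def standardized_alpha(mean_r, k):
    """Standardized coefficient alpha for k indicators with mean correlation mean_r."""
    return k * mean_r / (1 + (k - 1) * mean_r)


# Hypothetical values: five tasks, each with a standardized loading of .55.
loadings = [0.55] * 5
corrs = implied_correlations(loadings)
mean_r = sum(corrs.values()) / len(corrs)
alpha = standardized_alpha(mean_r, len(loadings))

print(f"implied mean r = {mean_r:.2f}, standardized alpha = {alpha:.2f}")
```

Under these assumed values, tasks that correlate only about .30 with one another yield a modest internal consistency for a five-task battery. That statistic is meaningful only if the tasks are treated as reflective indicators; for a formative or composite construct it carries no information about measurement quality.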
Differences between latent constructs that consist of (entirely) formative or reflective indicators can also be considered through their equations. Following the notation of Bollen and Bauldry (2011), the equations for a latent construct with three reflective (i.e., “effect”) indicators are

\[ y_{1i} = \alpha_1 + \lambda_{11}\eta_{1i} + \varepsilon_{1i} \tag{1} \]
\[ y_{2i} = \alpha_2 + \lambda_{21}\eta_{1i} + \varepsilon_{2i} \tag{2} \]
\[ y_{3i} = \alpha_3 + \lambda_{31}\eta_{1i} + \varepsilon_{3i} \tag{3} \]

where $y_{pi}$ is the pth indicator that depends on the latent construct, $\eta_{1i}$. The factor loadings, $\lambda_{p1}$, are structural coefficients that describe the magnitude of the association between each latent construct and its indicators. The residual variances, $\varepsilon_{pi}$, reflect that part of the manifest indicator y that is not accounted for by the latent construct. Latent variables that are composed entirely of reflective indicators have as many equations as indicators. Moreover, reflective indicators are chosen to represent the theoretical definition of the latent construct of interest (i.e., they have conceptual unity; see Bollen & Bauldry, 2011). For comparison purposes, the equation for a latent construct with three formative (i.e., “causal”) indicators is

\[ \eta_{1i} = \alpha_{\eta} + \gamma_{11}x_{1i} + \gamma_{21}x_{2i} + \gamma_{31}x_{3i} + \zeta_{1i} \tag{4} \]

where $x_{pi}$ is the pth indicator of the latent construct $\eta_{1i}$. The single residual variance, $\zeta_{1i}$, represents all of the influences of the latent construct, $\eta_{1i}$, that are not captured by the formative indicators. Latent variables that are composed of entirely formative indicators have a single equation with as many predictors as indicators. Like reflective indicators, formative indicators are expected to have conceptual unity. Bollen and Bauldry (2011) drew a further distinction between formative (causal) and so-called “composite” indicators. The equation for a three indicator composite construct is

\[ C_{1i} = w_{10} + w_{11}x_{1i} + w_{12}x_{2i} + w_{13}x_{3i} \tag{5} \]

where $x_{pi}$ is the pth indicator of the composite construct $C_{1i}$. The primary difference between composite variables (Equation 5) and latent variables that are defined entirely by formative indicators (Equation 4) is that composites do not include a disturbance term. That is, composites are exact linear combinations of their indicators. Moreover, there is no assumption that composite indicators necessarily have conceptual unity.

Figure 2. Reflective (top) and formative (bottom) indicators of executive function predicting attention-deficit hyperactivity disorder behaviors.

A third way to understand the differences between latent variables that consist of (entirely) formative (including causal and composite) and reflective (effect) indicators is with reference to their implied statistical representation. A latent construct that consists of entirely reflective indicators is represented using exploratory and confirmatory factor analytic models. A latent construct that consists of entirely formative indicators is represented using multiple indicator, multiple outcome models. A corollary point is that latent constructs that entirely consist of formative indicators are statistically underidentified and can only be estimated if two or more outcome measures are available (MacCallum & Browne, 1993). This has generated debate regarding the inherent meaning of such latent constructs, which is beyond the scope of this article (see Bollen, 2007; Howell, Breivik, & Wilcox, 2007a; Howell et al., 2007b). Composite constructs are best represented using principal components analysis or using simple aggregation (e.g., mean) of scores, which is analogous to a principal components analysis approach to scoring that applies unit weights.

In addition to practical and statistical differences, latent constructs that consist entirely of reflective and formative indicators may be understood to reflect differences in philosophies of science. Following Borsboom, Mellenbergh, and van Heerden (2003), latent constructs that are composed of reflective indicators imply a realist philosophical view in which latent variables are presumed to exist apart from and precede the measurement of indicator variables. In contrast, latent constructs that are composed of formative indicators may imply a constructivist philosophical view in which latent variables do not exist apart from observed measures, but instead reflect a summary of such measures.

Strategies for Differentiating Formative From Reflective Indicators

Three general approaches can be used to help determine whether EF is best construed as a formative or reflective latent variable. The first approach relies on the application of a series of decision rules (see, e.g., Coltman, Devinney, Midgley, & Venaik, 2008; MacKenzie, Podsakoff, & Jarvis, 2005). Theoretically, the essential questions ask (a) whether the latent construct is assumed to exist independent of the measures used or is solely a combination of indicators, (b) the direction of causality between indicators and the latent construct, and (c) whether a set of indicators “share a theme,” are interchangeable, and whether the conceptual domain of the construct changes based on the addition or omission of items. Empirically, the essential questions ask (a) about the magnitude of correlations among indicators, (b) the extent to which indicators share the same antecedents and consequences as the construct, and (c) what the best representation of indicators as formative or reflective indicators is. We have considered these questions elsewhere (Willoughby et al., 2014). Ultimately, the reliance on this narrative approach does not facilitate unambiguous inferences regarding whether a set of performance-based tasks are better characterized as formative or reflective indicators of the latent construct of EF.

Fortunately, there exists a statistical approach that can be used to formally test whether a latent construct is best characterized as exclusively formative, exclusively reflective, or some combination of indicators. The so-called vanishing tetrad test (VTT) has been developed by Bollen and colleagues (Bollen & Ting, 1993, 1998, 2000; Hipp, Bauer, & Bollen, 2005). Although a full description of this approach is beyond the scope of this article, the key idea is that although models that differ with respect to their type of indicator (formative, reflective) are not nested in the conventional sense (i.e., there is no set of parameter constraints that result in a latent variable that is defined by formative indicators to be subsumed by a latent variable that is defined by reflective indicators, or vice versa), they are often nested with respect to their vanishing tetrads. The VTT statistic can be used to evaluate the global fit for any structural equation model (SEM; Hipp et al., 2005; Hipp & Bollen, 2003), as well as to test the relative fit of competing models that are nested with respect to their tetrads, which is how it was used here (see Bollen, Lennox, & Dahly, 2009, for an extended example). The first objective of the proposed study was to reestimate variations of models that we have previously published in this journal (Willoughby, Blair, Wirth, & Greenberg, 2010, 2012) and to use nested VTTs to determine whether children’s performance-based tasks were better characterized as formative or reflective indicators of the latent construct of EF.

In addition to statistical model comparisons, we also considered pragmatic evidence to help inform questions about the optimal way to represent children’s performance across a battery of performance-based EF tasks. For example, if the nested VTTs indicated that EF tasks were better represented as formative versus reflective indicators of the construct of EF, a related question would be whether and how this would impact our practical understanding of EF. Once again, this was addressed through a reanalysis of results regarding the test–retest reliability and patterns of developmental change in our battery of EF tasks, which had previously assumed that individual EF tasks were reflective indicators of the latent construct of EF (Willoughby & Blair, 2011; Willoughby, Wirth, Blair, & Family Life Project Investigators, 2012). In our previous retest study, we reported modest retest correlations for individual tasks (rs ≈ .60), but an exceptionally high retest correlation for the latent variable estimate of ability (φ = .95), across the 2-week interval. In our longitudinal study, we reported exceptionally high correlations for the latent variable estimate of EF across 1- to 2-year intervals (φs = .86 to .91), which substantially exceeded the 1- to 2-year stabilities for individual tasks. Although we attributed those results to the merits of latent variable estimation, we have subsequently begun to question the meaning of 2-week and 2-year stabilities of this magnitude, including whether these results were an artifact of factoring tasks that were modestly correlated. The second goal of the current study was to examine whether and how the 2-week retest reliability and 2-year stability would change had EF been conceptualized as a formative latent construct.

In sum, the overarching objective of this study was to consider two competing ways of representing the latent construct of EF. A combination of statistical and pragmatic evidence was marshalled in order to help inform this decision. The pragmatic evidence, in particular, was intended to help inform questions about whether and how practical conclusions about the stability and change in EF abilities in early childhood may differ as a function of the ways in which individual EF task scores were combined.

Method

Participants

The Family Life Project (FLP) was designed to study young children and their families who lived in two (Eastern North Carolina, Central Pennsylvania) of the four major geographical areas of the United States with high poverty rates (Dill, 2001). The FLP adopted a developmental epidemiological design in which sampling procedures were employed to recruit a representative sample of 1,292 children whose families resided in one of the six counties at the time of the child’s birth. Low-income families in both states and African American families in North Carolina were oversampled (African American families were not oversampled in Pennsylvania because the target communities were at least 95% non-African American). Full details of the sampling procedure appear elsewhere (Vernon-Feagans, Cox, & Family Life Key Investigators, 2013).

Of those families interested and eligible and selected to participate in the study, 1,292 families completed a home visit at 2 months of child age, at which point they were formally enrolled in the study. In total, 1,121 (87% of the total sample) children completed an EF assessment at the Age 3, 4, and/or 5 year assessments. This includes those children for whom an in-home visit was completed (i.e., families who had moved more than 200 miles from the study area completed measures by phone, which precluded direct assessments of children) and those children who were able to complete at least one EF task during at least one of the three (i.e., Age 3, 4, and 5 year) home visits. Children who did not participate in any of the 3-, 4-, or 5-year EF assessments (n = 171) did not differ from those who did (n = 1,121) with respect to child race (37% vs. 43% African American; p = .15), child gender (57% vs. 50% male; p = .19), state of residence (36% vs. 41% residing in Pennsylvania, respectively; p = .26), or being recruited in the low-income stratum (77% vs. 78% poor; p = .75).

Procedures

Data for this study were drawn from home visits that occurred when study children were 3 (two visits), 4 (one visit), and 5 (one visit) years old, as well as a school visit during the kindergarten year. Home visits consisted of a variety of parent and child tasks (e.g., cognitive testing, interviews, questionnaires, and interactions). School visits consisted of a variety of direct child assessments and classroom observations. In this study, we make use of children’s achievement testing that was collected in the kindergarten (spring) assessment.

Measures

Executive function task descriptions. The EF battery consisted of seven tasks. Because we have already described these tasks in multiple articles in this journal, we provide only abbreviated descriptions here.

Working memory span (WMS). This span-like task required children to perform the operation of naming and holding in mind two pieces of information simultaneously (i.e., the name of colors and animals in pictures of “houses”) and to activate one of them (i.e., animal name) while overcoming interference occurring from the other (i.e., color name). Items were more difficult as the number of houses (each of which included a picture of a color and animal) increased.

Pick-the-picture (PTP) game. This is a self-ordered pointing task presented to children with a series of two, three, four, and six pictures in a set. Children were instructed to continue picking pictures within each set until each picture had “received a turn.” This task requires working memory because children have to remember which pictures in each item set they have already touched (the spatial location of pictures changes across trials and was uninformative). The PTP was too difficult for many 3-year-olds and was only administered at the 4- and 5-year assessments.

Silly sounds stroop. This task presented children with pictures of cats and dogs and asked children to make the sound opposite of that which was associated with each picture (e.g., meow when shown a picture of a dog). This task requires inhibitory control, as children have to inhibit the tendency to associate bark and meow sounds with dogs and cats, respectively.

Spatial conflict (SC). This task presented children with a response card that had a picture of a car and boat. Initially, all test stimuli (pictures of cars or boats identical to that on the response card) were presented in locations that were spatially compatible with their placement on the response card (e.g., pictures of cars always appeared above the car on the response card). Subsequently, test items required a contralateral response (e.g., children were to touch their picture of the car despite the fact that it appeared above the boat). This task required inhibitory control as children have to override the spatial location of test stimuli with reference to their response card. The SC was administered at the 3-year assessment.

Spatial conflict arrows (SCA). This task was identical in format to the SC task, with the exception that the response card consisted of two black dots (“buttons”) and the test stimuli were arrows that pointed to the left or right. Children were instructed to touch the button to which the arrow pointed. Initially, all left (right) pointing arrows pointed to the left (right), but subsequently they pointed in the opposite direction. The SCA was administered at the 4- and 5-year assessments.

Animal go/no-go. This is a standard go/no-go task in which children were instructed to click a button (which made an audible sound) every time they saw an animal (i.e., go trials), except when it was a pig (i.e., no-go trials). Varying numbers of go trials appeared prior to each no-go trial, including, in standard order, 1-go, 3-go, 3-go, 5-go, 1-go, 1-go, and 3-go trials. No-go trials required inhibitory control.

Something’s-the-same game. This task presented children with a pair of pictures for which a single dimension of similarity was noted (e.g., both pictures were the same color). Subsequently, a third picture was presented and children were asked to identify which of the first two pictures was similar to the new picture. This
taskrequiredthechildtoshifthisorherattentionfromtheinitial eachpairofmodels,whichwereutilizedinconjunctionwithaSAS labeledtoanewdimensionofsimilarity(e.g.,fromcolortosize). macro that was made available by Hipp and colleagues (2005) in Executive function task scoring. As previously discussed order to conduct nested VTTs. These results provided an empirical (Willoughby,Wirth,etal.,2012),EFtaskscoringwasfacilitatedby testoftherelativefitofmodelsthatdifferedwithrespecttowhether drawingacalibrationsampleofchildren—allofwhoweredeemedto EFwasareflectiveorformativelatentconstruct. have high-quality data (e.g., data collectors did not report interrup- Thesecondsetofresultsinvolvedthecreationofathreepairsof tions,childrencompletedmultipletasks)—fromacrossthe3-,4-,and summaryscores,onepairperassessmentperiod,whichrepresenteda 5-year assessments (no child contributed data from more than one child’s overall ability level on the battery of EF tasks. The first assessment). Graded response models were used to score the two summary score was a factor score estimate of a child’s ability and tasks with polytomous item response formats (i.e., PTP, WMS), representedEFasareflectiveconstruct.Thesecondsummaryscore whereastwo-parameterlogisticmodelswereusedtoscoretheremain- wasameanscoreestimateofachild’sabilityandrepresentedEFas ingtasks(allofwhichinvolveddichotomousitemsresponseformats) aformative(i.e.,composite)construct.Bothfactorandmeanscores inthecalibrationsample.Thesetofitemparametersthatwasobtained utilizedasmanyEFtasksaswereavailableforagivenchildatagiven y. fromcalibrationsamplewasappliedtoallchildren’sEFdataacross assessment, and children’s performance on each individual EF task shers.broadl aexllpaescsteesdsmaenptoss,tererisourlitin[EgAinPa])sestcoorfeistemfo-rreesapcohnstea-stkhetohrayt-bwasaesdo(ni.e.a, wmaesntinedrircoart.edWbeycthoenisridEeArePdsdciofrfee,rewnhceicshiwnatshecorrerteecsttedrefloiarbmilietaysuarned- publinated commondevelopmentalscale. 
Intellectual aptitude and academic achievement task descriptions.

Wechsler preschool and primary scales of intelligence (WPPSI-III; Wechsler, 2002). Children completed the Vocabulary and Block Design subscales of the WPPSI-III in order to provide an estimate of intellectual functioning at Age 36 months (Sattler, 2001).

Woodcock-Johnson III tests of achievement (WJ III; Woodcock, McGrew, & Mather, 2001). The WJ III is a co-normed set of tests for measuring general scholastic aptitude, oral language, and academic achievement. The Letter Word Identification and Picture Vocabulary subtests were used as indicators of early reading achievement, and the Applied Problems subtest was used as an indicator of early math achievement. The validity and reliability of the WJ III tests of achievement have been established elsewhere (Woodcock et al., 2001).

Early childhood longitudinal program kindergarten (ECLS-K) math assessment.² The ECLS-K direct math assessment was designed to measure conceptual knowledge, procedural knowledge, and problem solving within specific content strands using items drawn from commercial assessments with copyright permission, and other National Center for Education Statistics (NCES) studies (e.g., National Assessment of Educational Progress). The math assessment involves a two-stage adaptive design; all children are asked a common set of "routing" items, and their performance on these items informs the difficulty level of the item set that is administered following the completion of routing items. This approach minimizes the potential for floor and ceiling effects. Item-response-theory methods were used to create math scores, using item parameters that were published in an NCES working paper that reported the psychometric properties of the ECLS-K assessments (Rock & Pollack, 2002).

² See http://nces.ed.gov/ecls/kinderassessments.asp.

Analytic Strategy

The first research question was addressed by estimating three pairs of structural equation models. Each pair of models regressed two or more outcomes on the latent construct of EF; the models differed in whether individual EF tasks (i.e., EAP scores) were represented as formative or reflective indicators of the latent construct of EF. Each pair of models was nested with respect to their model implied vanishing tetrads. We output the model implied covariance matrices for

developmental course of factor and mean scores using descriptive statistics (e.g., Pearson correlations) and latent curve models (Bollen & Curran, 2006). These results provided a pragmatic basis for understanding whether and how differences in the method of combining EF task scores influenced substantive conclusions about stability and change in the latent construct of EF over time.

All descriptive statistics were computed using SAS version 9.3, and all structural equation (including latent curve) models were estimated using Mplus version 7.1 (Muthén & Muthén, 1998–2013). Structural equation models used robust full information maximum likelihood estimation and took the complex sampling design (oversampling by income and race; stratification) into account. The SAS macro made available by Hipp et al. (2005) was used to conduct nested VTTs.

Results

VTTs

The first research question involved direct comparisons of models in which individual EF tasks were used as either causal (formative) or effect (reflective) indicators of the latent construct of EF that predicted multiple indicators of child functioning.

Age 3 EF tasks predicting Age 3 IQ subtests. The first pair of models regressed children's performance on two indicators of intellectual ability (i.e., Block Design and Receptive Vocabulary subtests of the WPPSI) from the Age 3 assessment on the latent construct of EF at Age 3 (cf. Willoughby et al., 2010). As summarized in Figure 1, both models fit the data well and both indicated that the latent construct of EF was significantly predictive of the WPPSI. Whereas all five EF tasks contributed, albeit weakly, to the definition of the latent construct of EF in the reflective (i.e., effect indicator) model, only three of the five individual EF tasks uniquely contributed to the definition of the latent construct of EF in the formative (i.e., causal indicator) model (see top and bottom panels of Figure 1, respectively). In both models, the latent construct of EF explained 42% and 54% of the observed variation in WPPSI Block Design and Receptive Vocabulary scores, respectively. The nested VTT was statistically significant, χ²(10) = 19.9, p = .03 (see Table 1); this indicated that the data were better explained by the formative model (i.e., the model with fewer vanishing tetrads).

Table 1
Vanishing Tetrad Test Comparisons of Formative Versus Reflective Indicator Models of EF

                                                  Reflective         Formative          Comparison
Model  Description                        n       χ²(df)     prob    χ²(df)     prob    χ²(df)     prob
1      EF @ Age 3 → WPPSI @ Age 3         1,079   23.0 (14)  .06     3.1 (4)    .54     19.9 (10)  .03
2      EF @ Age 3 → ADHD @ Age 3, 4, 5    1,157   60.9 (20)  <.001   29.2 (10)  .002    31.7 (10)  .002
3      EF @ Age 5 → Achievement @ Age 5   1,086   81.1 (35)  <.001   56.3 (20)  <.001   24.8 (15)  .10

Note. All values are aggregated across 500 replications; the vanishing tetrad chi-square test statistics and associated probability values in the Reflective and Formative columns represent tests of the null hypothesis that all of the model implied vanishing tetrads are zero. The test statistic and associated probability value in the Comparison column represent a nested model comparison of reflective versus formative models; statistically significant chi-square tests provide empirical support for the model with fewer vanishing tetrads (i.e., the formative model). EF = executive function; prob = probability; df = degrees of freedom; WPPSI = Wechsler Preschool and Primary Scales of Intelligence; ADHD = attention-deficit hyperactivity disorder.
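The arithmetic behind the Comparison column of Table 1 can be sketched as follows. This is an illustration only, not the SAS macro of Hipp et al. (2005) that was actually used: it assumes the nested comparison can be treated as a chi-square difference test, which is how the aggregated values in Table 1 line up (e.g., 23.0 − 3.1 = 19.9 on 14 − 4 = 10 df). The names `chi2_sf`, `nested_vtt`, and `TABLE_1` are hypothetical. Because the tabled statistics and probabilities are each aggregated across 500 replications, a probability recomputed from an aggregated chi-square need not reproduce the tabled probability.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variate with integer df.

    Uses the closed-form finite series for integer degrees of freedom
    (Abramowitz & Stegun 26.4.4 for odd df, 26.4.5 for even df).
    """
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + 2*phi(sqrt(x)) * sum_r chi^(2r-1) / (1*3*...*(2r-1))
    term, total = math.sqrt(x), 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    phi = math.exp(-half) / math.sqrt(2.0 * math.pi)  # standard normal density at sqrt(x)
    return math.erfc(math.sqrt(half)) + 2.0 * phi * total


# (chi-square, df) pairs copied from Table 1: reflective model first, formative second.
TABLE_1 = {
    "EF@3 -> WPPSI@3": ((23.0, 14), (3.1, 4)),
    "EF@3 -> ADHD@3,4,5": ((60.9, 20), (29.2, 10)),
    "EF@5 -> Achievement@5": ((81.1, 35), (56.3, 20)),
}


def nested_vtt(reflective, formative):
    """Chi-square difference, df difference, and upper-tail probability."""
    chi_diff = round(reflective[0] - formative[0], 1)
    df_diff = reflective[1] - formative[1]
    return chi_diff, df_diff, chi2_sf(chi_diff, df_diff)


results = {name: nested_vtt(*pair) for name, pair in TABLE_1.items()}
for name, (chi_diff, df_diff, p) in results.items():
    print(f"{name}: chi2({df_diff}) = {chi_diff}, p = {p:.3f}")
```

The differences reproduce the Comparison column exactly (19.9 on 10 df, 31.7 on 10 df, 24.8 on 15 df). The recomputed probabilities agree with the table for the first comparison and differ for the other two, as expected when probabilities are averaged over replications rather than derived from the averaged statistic.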
That is, the nested VTT indicated that the formative indicator specification (bottom panel of Figure 1) fit the data better than the reflective indicator specification (top panel of Figure 1).

Age 3 EF tasks predicting parent-rated ADHD at Ages 3, 4, and 5. The second pair of models regressed parent-rated attention-deficit hyperactivity disorder (ADHD) at Ages 3 to 5 on the latent construct of EF at Age 3 (cf. Willoughby et al., 2010). As summarized in Figure 2, both models fit the data reasonably well and both indicated that the latent construct of EF was significantly predictive of ADHD. Whereas all five EF tasks contributed, albeit weakly, to the definition of the latent construct of EF in the reflective model, only two of the five individual EF tasks uniquely contributed to the definition of the latent construct of EF in the formative model (see top and bottom panels of Figure 2, respectively). The latent construct of EF explained 49%, 73%, and 60% of the observed variation in parent-reported ADHD scores at Ages 3, 4, and 5, respectively. The nested VTT was statistically significant, χ²(10) = 31.7, p = .002, which indicated that individual EF tasks were better characterized as formative than reflective indicators of the latent construct of EF.

Age 5 EF tasks predicting academic achievement indicators in kindergarten. The third pair of models regressed performance on four academic achievement tests during kindergarten on the latent construct of EF at Age 5 (cf. Willoughby, Blair, et al., 2012). As summarized in Figure 3, both models fit the data reasonably well and both indicated that the latent construct of EF was significantly predictive of academic achievement in kindergarten. Whereas all six EF tasks contributed, albeit weakly, to the definition of the latent construct of EF in the reflective model, five of the six individual EF tasks uniquely contributed to the definition of the latent construct of EF in the formative model (see top and bottom panels of Figure 3, respectively). The latent construct of EF explained 41%, 46%, 75%, and 47% of the observed variation in children's performance on the WJ III Letter-Word, WJ III Picture Vocabulary, WJ III Applied Problems, and ECLS-K Math achievement tests, respectively. The nested VTT was not statistically significant, χ²(15) = 24.8, p = .10. Although this implied that individual EF tasks were equally well characterized as either formative or reflective indicators of the latent construct of EF, we noted that the median (vs. mean) p value for the nested VTT test across the 500 replications was .06. This result is more similar to the previous two outcomes than different.

Pragmatic Results: Descriptive Statistics

Next, we considered the descriptive statistics for two summary variables of overall EF performance (that is, factor score estimates and mean scores) at each age. The within- and across-time correlations between the alternative scoring methods appear in Table 2. Two points were noteworthy. First, although both factor and mean scores appeared to exhibit linear change from Age 3 to 5 years, the across-time correlations for factor score estimates of EF ability (rs = .96 to .99) were substantially larger than those for mean score estimates of EF ability (rs = .32 to .59). The two scoring approaches provide divergent information regarding the across-time stability of the construct of EF. Second, despite pronounced differences in the across-time stability of factor and mean scores, the within-time correlations between factor and mean scores were relatively large, particularly at Ages 4 and 5 (rs = .67, .89, and .88 at Ages 3, 4, and 5 years, respectively). Within any assessment period, the two scoring approaches provide convergent information regarding individual differences in EF ability levels.

Table 2
Descriptive Statistics for EF Battery Factor and Mean Scores at Ages 3, 4, and 5 Years

            1       2       3       4       5       6
1. FS(3)    —
2. FS(4)    .99     —
3. FS(5)    .96     .98     —
4. MN(3)    .67     .56     .51     —
5. MN(4)    .85     .89     .83     .37     —
6. MN(5)    .75     .79     .88     .32     .59     —
n           973     1,009   1,036   973     1,009   1,036
Mean        −1.32   0.01    1.15    −0.54   −0.13   0.29
SD          0.26    0.85    0.82    0.54    0.51    0.48

Note. ns = 898 to 1,036; all ps < .001. EF = executive function; FS = factor score estimate of EF ability using all available tasks at a given assessment; MN = mean score estimate of EF ability using all available tasks at a given assessment; 3, 4, 5 = Age 3, 4, and 5 year assessments; SD = standard deviation.

Pragmatic Results: Growth Curve Models

The most notable finding from Table 2 was the appreciably different across-time correlations for factor versus mean score estimates of EF ability. In order to better characterize the apparent differences in the stability and change of EF ability from Age 3 to 5 years, we estimated latent growth curve (LGC) models separately for factor and mean scores of EF. A linear LGC fit the mean scores extremely well, χ²(1) = 1.2, p = .27, root mean square error of approximation (RMSEA) = .01, 90% confidence interval (CI) [.00, .08], comparative fit index (CFI) = 1.0. The mean and variance of the intercept (mean = −.05, p < .001; variance = .12, p < .001), which corresponded to the Age 4 assessment, and the linear slope (mean = .41, p < .001; variance = .04, p < .001) were statistically significant. That is, there was significant variability in average ability at Age 4 and in the rate of linear change from Age 3 to 5 years. Individual differences in intercepts and slopes were also positively, albeit modestly, correlated, r = .27, p = .002; children with higher levels of EF ability (as indicated by mean scores across tasks) at Age 4 tended to have faster rates of linear growth in ability from Age 3 to 5 years. The residual variances for the mean scores were statistically significant at Ages 3 (ε = .59, p < .001) and 4 (ε = .53, p < .001), but not Age 5 (ε = .07, p = .32); the corresponding R²s for mean scores were .42, .47, and .93 at Ages 3, 4, and 5, respectively.

When the identical parameterization was applied to the factor score estimates of overall EF ability, the LGC model fit poorly, χ²(1) = 235.4, p < .001, RMSEA = .45, 90% CI [.41, .51], CFI = .95, and the residual covariance matrix was nonpositive definite because of negative variance estimates for factor score indicators at Ages 3 (ε = −.20, p < .001) and 5 (ε = −.58, p < .001). The model was reestimated constraining these negative variance estimates to 0; however, model fit was still very poor, χ²(3) = 1011.3, p < .001, RMSEA = .79, 90% CI [.76, .82], CFI = .55. Given poor model fit, none of the parameter estimates were trustworthy; however, we noted that the latent correlation between intercepts and slopes approached unity, r = .98, p < .001, which was consistent with the large correlations reported in Table 2. In a final effort to obtain a model with acceptable fit, we reparameterized the LGC model by fixing the factor loadings to 0 and 1 at the Age 3 and 5 assessments and freely estimating the factor loading at the Age 4 year assessment. This parameterization permitted nonlinear change in means across time (Bollen & Curran, 2006), which we determined was optimal in our previous work that involved a second-order LGC (Willoughby, Wirth, et al., 2012). Although model fit was improved, it was still extremely poor, χ²(2) = 1495.8, p < .001, RMSEA = .82, 90% CI [.78, .85], CFI = .68. Once again, given the poor model fit, none of the parameter estimates were trustworthy, though we again observed a latent correlation between intercepts and slopes that approached unity, r = .92, p < .001.

Pragmatic Results: Retest Reliability

We previously reported the results of a 2-week test–retest study of the EF battery involving 140 study participants at the Age 4 year assessment. In that study, we noted that whereas the 2-week retest reliability of individual tasks was modest (rs ≈ .60), the correlation between latent variables representing ability across a 2-week retest period approached unity, r = .95, p < .001 (Willoughby & Blair, 2011). Here, we report the 2-week retest correlation of the factor and mean score estimates of EF ability as rs of .99 and .76, respectively (both ps < .001). Following the method of Raghunathan, Rosenthal, and Rubin (1996), the retest correlation was stronger for factor than mean score estimates, z = 39.2, p < .001. Nonetheless, in both approaches, the aggregation of performance across the battery of tasks (as factor or mean scores) resulted in an improvement in retest reliability relative to when individual scores were considered alone. It is noteworthy that when EF task performance was summarized as factor scores, the 2-week stability at the Age 4 year assessment was nearly identical to the 2-year stability from Age 3 to 5 years (rs = .99 and .96, respectively). In contrast, when EF task performance was summarized using mean scores, the corresponding 2-week and 2-year stability estimates were both smaller and differed in magnitude (rs = .76 and .32, respectively).

Discussion

Although the benefits of modeling EF as a latent variable are well established, virtually all previous advice has advocated for the use of confirmatory factor analytic methods in which EF tasks are used as reflective indicators (Ettenhofer, Hambrick, & Abeles, 2006; Miyake et al., 2000; Wiebe, Espy, & Charak, 2008). The primary objective of this study was to investigate whether performance-based tasks may be better represented as formative indicators. Comparisons between three pairs of structural equation models, which considered children's intellectual function, academic achievement, and parent-rated ADHD behaviors as outcomes, consistently indicated that EF tasks were best represented as formative indicators. Descriptive results demonstrated how substantive conclusions regarding the retest reliability and the patterns of developmental change in EF in early childhood differed substantially depending on whether EF tasks are combined as mean (consistent with formative indicator) versus factor (consistent with reflective indicator) scores.

The initial motivation for considering the distinction between formative and reflective measurement of the latent construct of EF resulted from our observations of low to modest intercorrelations among children's performance on individual EF tasks in both our own and others' work (Willoughby et al., 2014). Previously, we observed that modest correlations between individual EF task scores were associated with modest levels of maximal reliability of the latent variable of EF (Willoughby, Pek, & Blair, 2013). Modest levels of maximal reliability indicate that the use of three to five EF tasks as indicators of a latent variable does a relatively poor job of representing (or "communicating") individual differences in the latent construct (Hancock & Mueller, 2001). By implication, modest levels of maximal reliability necessitate the administration of substantially more tasks (indicators) to measure a construct than has typically been the case or the development of new performance-based indicators that exhibit stronger intercorrelations. However, consideration of the magnitude of EF task intercorrelations, the focus on maximal reliability, and the suggestion that researchers should administer substantially more (or better) EF tasks in order to improve the maximal reliability of the latent construct of EF are all predicated on an implicit assumption of reflective measurement. To the extent that performance-based tasks are better construed as formative indicators of the latent construct of EF, all of these ideas are irrelevant. From the perspective of formative measurement, the magnitude of task intercorrelations is uninformative, maximal reliability is not a relevant metric for evaluating how well tasks represent individual differences in true ability level, and the administration of more tasks does not necessarily improve the quality of measurement.

Despite the substantial differences between formative and reflective perspectives of measurement, no methods exist that unequivocally delineate which perspective is correct; moreover, it is entirely conceivable that some constructs may be optimally represented using a combination of formative and reflective indicators. In the absence of a definitive strategy for distinguishing whether EF tasks are best conceptualized as formative versus reflective indicators, we considered conceptual, pragmatic, and statistical evidence. As noted at the outset, researchers have proposed a series of conceptual questions that may help inform whether a set of measures are better construed as causal or effect indicators of a particular construct. Conceptually, EF refers to a broad set of interdependent cognitive abilities that serve organizing and integrative functions. However, when performance-based tasks are modeled as reflective indicators, it is not clear that the resulting latent variable represents its intended conceptual function. Rather than accurately characterizing EF as the combination (summation) of a constituent set of skills, reflective indicator models represent EF more narrowly as that variation that is shared across a set of tasks. It is the mismatch between the conceptual definition of EF and the statistical representation of EF using reflective indicators that is the overarching concern of this study. We conjecture that formative indicator models provide a statistical representation of EF that is more compatible with the intended conceptual definition.

Empirical support for conceptualizing tasks as formative indicators of the construct of EF was evident from VTT of competing models. To be clear, although the VTTs provide an indication of whether a model that consists entirely of reflective indicators is consistent with the data (as evidenced by a nonsignificant VTT chi-square test statistic), a statistically significant VTT does not necessarily imply that (all of) the indicators are formative, although it is consistent with this possibility. A closer inspection of the results of VTTs that were used to compare models that represented EF tasks as formative versus reflective indicators revealed a number of important points. First, both formative and reflective indicator models exhibited an acceptable fit to the observed data; hence, global model fit is not a criterion that can be used to determine which specification is preferred. Second, the regression coefficients linking the latent construct of EF to the outcomes (e.g., IQ subtests, ADHD, achievement tests) were identical irrespective of whether EF tasks were represented as formative or reflective indicators; hence, this is also not a criterion that can be used to determine which specification is preferred. Third, the formative and reflective indicator models differed in the model-implied covariance structure among the EF tasks. In the formative (causal indicator) specification, no constraints were made regarding the covariance structure of the individual EF tasks; all possible pairwise covariances were freely estimated. In the reflective (effect indicator) specification, the covariance structure among EF indicators is implied entirely through their shared association with a latent variable. If all possible pairwise covariances were introduced between the residual variances, the formative and reflective models would be chi-square equivalent models (rendering VTTs useless). Fourth, for each of the three sets of outcomes that were considered, when EF tasks were specified as reflective indicators of the latent construct of EF, all of the tasks contributed to the definition of the construct (i.e., all of the factor loadings were statistically significant, albeit of modest magnitude). In contrast, when EF tasks were specified as formative indicators of the latent construct of EF, only a subset of the tasks contributed to the definition of the construct. The determination of which causal indicators are significant indicators of the latent construct of EF will depend on the outcomes being considered. Although this is a frequently noted limitation of formative models (Edwards, 2011; Howell et al., 2007b), it is not a perspective that is shared by everyone (Bollen, 2007; Bollen & Bauldry, 2011).

In light of evidence from the nested VTTs, we were interested in whether and how our previous substantive conclusions regarding the retest reliability and developmental change in EF would change from the perspective of formative and reflective measurement. To approximate this, we compared results using either mean or factor scores across all available tasks at each assessment. A clear and divergent pattern of results was evident for these two scoring approaches. The factor score approach, which approximated reflective measurement, implied that the 2-week stability of EF was nearly perfect and that the 1 to 2 year stabilities of EF were approximately .90. Moreover, none of the estimated growth curve models provided an adequate fit to factor score estimates of EF ability across time, which constrains the types of future questions that can be asked of these data (e.g., predictors of individual differences in the level and rate of change in EF). These results imply that, although EF develops (improves) between 3 and 5 years of age, individual differences in EF ability were (nearly) completely determined by Age 3 and were (nearly) completely preserved across repeated assessments that span intervals as short as 2 weeks and as long as 2 years. We conjecture that the extraordinarily high stability of EF factor scores across time was an artifact of factoring tasks that were weakly correlated. In contrast, the mean score approach, which approximated formative measurement, implied that the 2-week and 2-year stabilities (rs = .76 and .32, respectively) differed appreciably in magnitude, in a manner consistent with expectation (i.e., the longer the span of intervening time, the less correlated a construct should be, particularly if measured during a period of developmental change). Moreover, growth curve models fit the data well, with evidence for significant interindividual differences in both level and rates of change in EF across time.

Although we fully acknowledge that simple comparisons of these results do not provide a scientifically convincing approach for determining which scoring approach is most appropriate, we find the differences in results to be remarkable. Clearly, in our data (and perhaps other data), the decision about whether to use factor or mean scoring approaches for characterizing children's ability across a battery of EF tasks will fundamentally affect the inferences drawn about the nature, development, and malleability of EF in early childhood. Practically speaking, there is strong interest in identifying and developing strategies that enhance EF in children for the betterment of society (Diamond, 2012). The ability to detect effective strategies will be impacted by the ways in which EF is conceptualized, measured, and modeled. Pragmatically, we favor the mean scoring (formative perspective) approach because the results conform to expectations about the stability and change in EF that are consistent with the broader literature. Moreover, this approach facilitates our ability to ask questions about both the antecedents and consequences of trajectories of EF across time.

Study Limitations

This study was characterized by two limitations. First, we have presented the distinction between formative and reflective latent constructs as a dichotomy; all EF tasks were conceptualized as either exclusively causal or effect indicators. However, it is entirely reasonable to represent latent variables as a mix of causal and effect indicators. We did not consider this possibility because we did not have a conceptually defensible rationale for considering some of our tasks as causal and others as effect indicators. Second, we contrasted inferences that resulted when EF tasks were represented as mean versus factor scores. In this case, mean and factor scores were intended to approximate formative and reflective measurement, respectively. However, as noted at the outset, the mean scoring approach is more accurately represented as a composite variable. Bollen and Bauldry (2011) make a clear distinction between composites and causal indicator latent constructs that we muddled here.

Challenges Associated With Formative Indicator Models

In the business (management, marketing) research literature, the full gamut of opinions on formative measurement is evident (Diamantopoulos, 2008; Diamantopoulos et al., 2008; Edwards, 2011). Because most readers will likely not be familiar with that literature, we briefly summarize four of the more vexing challenges of adopting a formative measurement perspective for combining individual EF tasks into an overall score. First, latent constructs that are composed entirely of formative (causal) indicators are not statistically identified; that is, irrespective of whether one assumes that EF tasks are best characterized as "causing" versus "being caused by" the latent construct of EF, latent variables are inestimable unless they have two effect indicators or, equivalently, two outcomes (MacCallum & Browne, 1993). This presents a practical problem, as the meaning of the latent construct of EF is not constant; it is always defined in part by the reflective indicators (or equivalently outcomes) being used to identify it. This problem can be circumvented by aggregating performance across individual EF tasks using mean scores (or equivalently principal component analysis), as we did here, but doing so comes at the cost of making simplifying assumptions and leaving the latent variable framework (Bollen & Bauldry, 2011).

Second, formative constructs are sometimes criticized as "not measurement" (Edwards, 2011; Howell et al., 2007a, 2007b; Wilcox, Howell, & Breivik, 2008). Traditional metrics of internal consistency and maximal reliability are not applicable. Similarly, our recent reliance on maximal reliability estimates in order to create short forms of our EF task battery was predicated on the assumption that tasks were effect indicators of EF (Willoughby et al., 2013). To the extent that EF tasks are better construed as formative indicators of the construct of EF, the observed pattern of task correlations is uninformative for the creation of short forms of the battery (this is replaced by appealing to conceptual arguments about which facets of the construct are prioritized).

Third, in a related vein, formative constructs have been criticized because they often assume that task indicators are measured without error. This criticism can be made against the majority of applied research in the social and behavioral sciences that is based on sum or mean scores (e.g., any scoring approach that does not explicitly attend to measurement error). This was not a problem in our study, as our EF tasks had already been purged of measurement error prior to their use here (Willoughby, Wirth, et al., 2012). More generally, by failing to attend to the measurement error of formative indicators, one risks creating formative (or composite) constructs that conflate true score variation with measurement error.

Fourth, in the context of reflective measurement, the establishment of longitudinal measurement invariance is a necessary precondition for modeling change across time (Widaman, Ferrer, & Conger, 2010); indeed, this was a focus of our earlier efforts that were published in this journal (Willoughby, Wirth, et al., 2012). To the extent that the measurement properties of a latent construct change across time, mean level changes are ambiguous. The extension of longitudinal measurement invariance to formative constructs is less clear. Hypothetically, one could test for the possibility of imposing across-time constraints on the set of coefficients that relate formative indicators to the latent construct. However, these models are not estimable because of the underidentification problem that was described earlier. The only known work-around for this problem is to incorporate two or more reflective indicators into the formative construct and to test for longitudinal invariance of these reflective indicators prior to testing constraints regarding the contribution of formative indicators across time (Diamantopoulos & Papadopoulos, 2010). To be clear, although this approach was proposed for the situation involving cross-group comparisons, we are suggesting that it may generalize to longitudinal settings.

Conclusions

The recent proliferation of transdisciplinary research involving EF underscores the importance that has been attributed to this construct as an indicator of health and well-being. Nonetheless, a closer reading of this literature suggests that this is an area in which conceptual ideas have outpaced measurement. Conceptual definitions of EF characterize it as a construct that subsumes a broad array of cognitive abilities that, collectively, facilitate engagement in novel problem solving efforts and enhance self-management. The primary objective of this study was to highlight an apparent lack of conformability between the conceptual definition of EF and the use of psychometric approaches for combining EF task scores that assume reflective measurement. The combination of conceptual, pragmatic, and statistical evidence that was presented here suggests that performance-based measures may be better characterized as formative indicators of the latent construct of EF. Decisions about how to combine EF task scores will directly impact the types of inferences that will be made regarding the developmental origins, developmental course, and developmental outcomes of EF. Although we are unable to offer definitive conclusions, the intent of this study was to encourage other research groups that utilize performance-based indicators of EF to consider the distinction between formative and reflective measurement in their own work. More generally, our results point to the possibility that the construct of EF may not be well-suited to conventional measurement wisdom. Although this is neither an indictment of the construct of