Intelligence 48 (2015) 85–95

Contents lists available at ScienceDirect

Intelligence

The magical numbers 7 and 4 are resistant to the Flynn effect: No evidence for increases in forward or backward recall across 85 years of data

Gilles E. Gignac
School of Psychology, University of Western Australia, 35 Stirling Highway, Crawley, Western Australia, 6009, Australia

Article info

Article history:
Received 5 July 2014
Received in revised form 23 September 2014
Accepted 3 November 2014
Available online xxxx

Keywords:
Flynn effect
Short-term memory
Working memory
Miller’s law

Abstract

A substantial amount of empirical research suggests that cognitive ability test scores are increasing by approximately three IQ points per decade. The effect, referred to as the Flynn effect, has been found to be more substantial on measures of fluid intelligence, a construct known to be substantially correlated with memory span. Miller (1956) suggested that the typical short-term memory capacity (STMC) of an adult is seven, plus or minus two objects. Cowan (2005) suggested that the typical working memory capacity (WMC) of an adult is four, plus or minus one object. However, the possibility that both STMC and WMC test scores may be increasing across time, in line with the Flynn effect, does not appear to have been tested comprehensively yet. Based on Digit Span Forward (DSF) and Digit Span Backward (DSB) adult test scores across 85 years of data (respective Ns of 7,077 and 6,841), the mean adult verbal STMC was estimated at 6.56 (±2.39), and the mean adult verbal WMC was estimated at 4.88 (±2.58). No increasing trend in the STMC or WMC test scores was observed from 1923 to 2008, suggesting that these two cognitive processes are unaffected by the Flynn effect. Consequently, if the Flynn effect is occurring, it would appear to be a phenomenon that is completely independent of STMC and WMC, which may be surprising, given the close correspondence between WMC and fluid intelligence.

© 2014 Elsevier Inc. All rights reserved.
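The ± terms quoted in the abstract are 95% deviation terms rather than standard deviations: the Results section multiplies each N-weighted standard deviation (1.22 for DSF, 1.32 for DSB) by the standard normal deviate, 1.96. The following is a minimal sketch of that arithmetic in Python, assuming approximately normal score distributions; the function and variable names are illustrative and do not come from the article.

```python
# Illustrative sketch only: names are hypothetical, not from the article.
Z_95 = 1.96  # standard normal deviate covering the central 95%


def interval_95(mean, sd, z=Z_95):
    """Return (deviation, lower, upper) for the range mean ± z * sd."""
    deviation = z * sd
    return deviation, mean - deviation, mean + deviation


# N-weighted means and standard deviations as reported in the article.
dsf = interval_95(6.56, 1.22)  # Digit Span Forward (verbal STMC)
dsb = interval_95(4.88, 1.32)  # Digit Span Backward (verbal WMC)

for label, (dev, lower, upper) in (("DSF", dsf), ("DSB", dsb)):
    print(f"{label}: ±{dev:.2f}, 95% range {lower:.2f} to {upper:.2f}")
```

For DSF this gives the reported ±2.39 and the range of 4.17 to 8.95. For DSB, the rounded standard deviation of 1.32 gives a product of about 2.59, a hundredth above the reported 2.58; the article may have worked from unrounded values.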
1. Introduction

One of the most sensational scientific observations in the area of contemporary intelligence research is that intelligence test scores have increased since about 1930 (Flynn, 2012; Lynn, 1982). The reported effect is not small, as it corresponds to approximately three IQ points per decade (Flynn, 2007; Neisser, 1998). Furthermore, the consequences are not negligible, as Flynn (1987) contended that the “…gains suggest that IQ tests do not measure intelligence but rather a weak causal link to intelligence” (p. 190). The precise nature and causes of the “Flynn effect” remain enigmatic (Williams, 2013). Furthermore, a number of limitations associated with studies supportive of the Flynn effect have been articulated, including invalid test score comparisons due to changes in test items and administration across editions (Kaufman, 2010), changes in the rate of human cognitive development in both the young and the elderly (Parker, 1986), changes in standard deviations (Rodgers, 1998), as well as the absence of factorial invariance associated with intelligence battery test scores (Must, te Nijenhuis, Must, & van Vianen, 2009; Wicherts et al., 2004). Consequently, the purpose of this investigation was to examine the Flynn effect on several normative samples at the observed score level on possibly the only subtest of intellectual functioning that has essentially not changed for over a century: Digit Span. As Digit Span incorporates both forward and backward recall items, an additional purpose of this investigation was to estimate precisely the typical verbal short-term memory capacity (STMC) and working memory capacity (WMC) of adults, so as to verify the proposed values reported by Miller (1956; 7 ± 2) and Cowan (2005; 4 ± 1).

E-mail address: [email protected].
http://dx.doi.org/10.1016/j.intell.2014.11.001
0160-2896/© 2014 Elsevier Inc. All rights reserved.
1.1. Overview of the Flynn Effect

The accumulated research suggests that the Flynn effect is more pronounced on fluid intelligence tests, in comparison to tests likely to be affected by education, such as vocabulary and knowledge of worldly facts (Flynn, 2007; Rönnlund, Carlstedt, Blomstedt, Nilsson, & Weinehall, 2013). In a relatively recent investigation, Flynn (2009a) reported ongoing IQ gains (1943 to 2008) in British children (5.5 to 11 years old) as measured by the Raven’s Progressive Matrices (Raven, Court, & Raven, 1986; Raven, Rust, & Squire, 2008). Additionally, Flynn (2009b) reported continued (1995–2006) IQ increases equal to three IQ points per decade in adults based on the Wechsler scales. Based on an examination of the Seattle Longitudinal Study (SLS) database, Schaie, Willis, and Pennak (2005) reported a Flynn effect equal to approximately ½ of a standard deviation in cognitive ability test scores between birth cohort 1931 and birth cohort 1952. As the results were most pronounced for inductive reasoning, Schaie et al. (2005) recommended that it would be insightful to evaluate possible test score changes across time in fluid type capacities more basic than inductive reasoning.

Arguably, one such relatively elementary cognitive ability construct is memory span. Individual differences in memory span (WMC in particular) are known to be correlated substantially with fluid intelligence. Based on a meta-analysis, Kane, Hambrick, and Conway (2005) estimated that approximately 50% of the true score variance between WMC and fluid intelligence is shared. Based on the WAIS-IV normative sample, Gignac (2014) suggested that the shared variance may be closer to 60%. The substantial empirical association between WMC and fluid intelligence is considered an important phenomenon, as it has been theorised that WMC is a critical determinant, or rate limiting factor, in the performance of fluid intelligence tasks (Carpenter, Just, & Shell, 1990; Fry & Hale, 1996). Oberauer, Süß, Wilhelm, and Sander (2007) proposed that the association between WMC and fluid intelligence is embedded by the central nervous system in such a way that only a limited number of bindings can be created to facilitate the development of novel relational representations. Consequently, given the close correspondence between WMC and fluid intelligence on both empirical and theoretical grounds, the reported increases in fluid intelligence test scores (Flynn, 2007) would arguably be expected to be associated with concomitant increases in memory span, particularly WMC.

1.2. The case for Digit Span

One of the most commonly used tests of memory span is Digit Span (Blankenship, 1938; Dempster, 1981). According to Bronner, Healy, Lowe, and Shimberg (1927), Digit Span was in use as early as 1887. Digit Span’s popularity was established by virtue of the fact that it was included in both of the intelligence batteries that emerged as the most popular in the early 20th century: the Stanford-Binet (Terman, 1917); and the Wechsler-Bellevue scale (W-B; Wechsler, 1939). Although there are several slight variations of the Digit Span subtest, typically, the test consists of administering several series of single digits to be recalled in a particular order. In most cases, the number of digits within a series ranges from 3 to 9. There are two common forms of the Digit Span test: Digit Span Forward (DSF), where the digits need to be recalled in the order with which they were presented, and Digit Span Backward (DSB), where the digits need to be recalled in the reverse order with which they were presented.

Although Digit Span was initially considered a relatively poor measure of intellectual functioning (Matarazzo, 1972; Wechsler, 1939), such a position appears to be based more on presumption and clinical experience, rather than rigorous statistical evidence (Bachelder & Denny, 1977; Verive & McDaniel, 1996). For example, Wechsler (1939) presumed that there was not a sufficient amount of variability in Digit Span scores to be a high quality discriminator of intelligence, as approximately 90% of the adult population appeared to recall somewhere between five and eight digits. Additionally, Wechsler (1939) claimed that both DSF and DSB correlated poorly with other intelligence subtests and contained little of g. However, Wechsler’s (1939) own reported results do not support such a position. First, based on the Wechsler-Bellevue (Wechsler, 1939) normative sample (ages: 20–34, N = 355), Digit Span was associated with a mean inter-subtest correlation of .38, which is comparable to the mean inter-subtest correlation of .44 for the whole battery. Additionally, based on the same portion of the normative sample, Wechsler (1939) reported the corrected subtest-FSIQ correlation (a reasonable proxy of a g component loading) associated with Digit Span at .51, which, arguably, was not substantially smaller than the average corrected subtest-FSIQ correlation of .61. More recently, based on the Wechsler Adult Intelligence Scale–IV (WAIS-IV; Wechsler, 2008) normative sample (N = 2,200) and a bifactor model, Gignac (2014) found that DSF and DSB were associated with g loadings of .46 and .58, respectively, which would suggest that both subtests are moderate indicators of g. Disattenuated for imperfect reliability in subtest scores, the corrected g loadings corresponded to .51 and .64, respectively. Jensen and Figueroa (1975) also found that DSB correlated more significantly with g than DSF. Thus, although Digit Span is certainly not an excellent indicator of g, it is arguably a fair to good indicator of intellectual functioning, particularly DSB.

Digit Span has also been observed to share variance with a number of socially important variables. For example, Frank (1983) reviewed four studies (seven independent samples) which examined the association between the Wechsler subscales and grade point average. Digit Span was associated with a mean validity coefficient of .35, which was very comparable to the mean validity coefficient of .37 across all 11 subtests. Digit Span has also been found to correlate with years of education completed (r = .44, Paul et al., 2005; r = .43, Birren & Morrison, 1961), reading comprehension (r = .30; Daneman & Merikle, 1996; Norman, Kemper, & Kynette, 1992), and brain volume (r = .41; Wickett, Vernon, & Lee, 2000). Additionally, amongst a battery of cognitive ability tests, Digit Span was found to be the best predictor of academic achievement amongst learning-problem children (Serwer, Shapiro, & Shapiro, 1972). Digit Span has also been found to be a respectable predictor of job performance (medium cognitive demands: r = .51; Verive & McDaniel, 1996). Finally, Miller and Vernon (1992) found that the association between reaction time and g was mediated by individual differences in short-term memory span. Thus, in light of the above, it is likely tenable to suggest that Digit Span is somewhere between a moderate to good indicator of intellectual functioning on both empirical and theoretical grounds (Bachelder & Denny, 1977).

Although there is some evidence to suggest otherwise (e.g., Colom, Flores-Mendoza, Quiroga, & Privado, 2005), it is relatively widely recognised that DSB is a better measure of working memory capacity (WMC) than DSF (Hedden & Gabrieli, 2004; Oberauer, Süß, Schulze, Wilhelm, & Wittmann, 2000). Working memory (WM) is considered different from short-term memory (STM) in that WM requires the maintenance and the manipulation (or transformation) of information temporarily during cognitive activity (Baddeley & Hitch, 1974; Baddeley, 2002; Oberauer et al., 2000). STM, by contrast, is considered to require only the maintenance of objects in memory. Theoretically, DSB is considered a measure of WMC, as the requirement to mentally re-order the digits is considered a form of cognitive manipulation (Oberauer et al., 2000). Empirically, DSB has also been found to correlate more substantially with other measures of WMC, in comparison to DSF (Redick & Lindsey, 2013). Consequently, it was hypothesized that STMC and WMC, as measured by DSF and DSB, respectively, would evidence substantial increases across time, in line with the Flynn effect observed for fluid intelligence measures such as Raven’s (Flynn, 2007). Furthermore, as DSB is a better indicator of WMC, it was hypothesised that the Flynn effect would be more substantial for DSB than DSF.

1.3. What is the Average Memory Span?

In a classic paper, Miller (1956) contended that the mean maximal number of serially processed objects a healthy adult can store in STM is seven, plus or minus two. Arguably, Miller’s (1956) assertion was based on a relatively small amount of empirical research and a liberal amount of speculation, rather than a comprehensive quantitative review of the STMC research. Despite this, Miller’s (1956) magical number seven (plus or minus two) continues to be widely recognised (Goldstein, 2010). Of course, there are some critics of Miller’s law, with authors arguing that the values seven, plus or minus two, are too high or too low (Dehn, 2008). Although a substantial amount of STMC empirical research has accumulated since Miller (1956), much of this research has been based on relatively small (N < 25), non-representative samples (i.e., university students), and somewhat different tasks and scoring protocols. Consequently, a meta-analysis does not seem feasible in order to estimate, precisely, the mean STMC of healthy adults. However, as described above, one important exception is the Digit Span test, which has been used in the area of intellectual assessment for over a century (Bolton, 1892; Wechsler, 2008).

In contrast to STMC, WMC tests are widely considered to be more difficult, as they require the maintenance of information in STM, as well as the simultaneous manipulation (or transformation) of that information (Oberauer et al., 2000). Along the lines of Miller’s (1956) magical number seven, Cowan (2005, 2010) proposed that the mean maximal WMC for a healthy adult is four objects, plus or minus one. Based on a series of experiments with novices and experts at chess, Gobet and Clarkson (2004) argued that Cowan’s (2010) magical number four was an overestimate by one object, as the natural WMC of healthy adults appeared to be closer to three objects. Although insightful, Gobet and Clarkson’s (2004) study was based on a sample of twelve individuals, which would arguably not be considered sufficiently large to publish firm statements about the mean level of WMC in the broader adult population. In fact, much of the WMC research suffers from the same limitation as that identified for STMC: small, unrepresentative samples. Fortunately, however, Digit Span is often administered with the inclusion of both forward (i.e., DSF) and backward (i.e., DSB) span items. Consequently, as Digit Span has been administered within a relatively large number of high quality normative samples (e.g., Wechsler scales and others), there was the opportunity to estimate very precisely the typical (i.e., mean) verbal STMC and verbal WMC of healthy adults, which was a secondary purpose of this investigation.

In addition to the typical STM and WM capacities of adults, it was considered useful to estimate the amount of variability in STMC and WMC within the adult population. Miller’s law (i.e., 7 ± 2) suggests that approximately 95% of individuals’ STMC lie somewhere between five and nine objects, which implies that STMC is associated with a standard deviation of approximately 1 (i.e., 1 * ±1.96).¹ Cowan’s law (4 ± 1) suggests that approximately 95% of individuals’ WMC lie somewhere between three and five objects, which implies that WMC is associated with a standard deviation of approximately .50 (.50 * ±1.96). From a coefficient of variation perspective (SD/M), Miller’s law and Cowan’s law imply that STMC and WMC are associated with standardized variability estimates of .14 (1/7) and .13 (.50/4), respectively. Arguably, based on these values, the amount of STMC and WMC variability may be considered rather low, in comparison to other cognitive capacities. For example, based on the WAIS-IV normative sample means and standard deviations reported in Beaujean and Sheng (2014), the mean coefficient of variation associated with nine of the WAIS-IV subtests (45–54 year olds; N = 200) was calculated by me to be .27 (range: .21 to .36). Additionally, based on the normative sample means and standard deviations (18–30 year olds) associated with the BIRT Memory and Information Processing Battery (BMIPB; Oddy, Coughlan, & Crawford, 2007) reported in Baxendale (2010), list recall and design recall were calculated by me to be associated with coefficients of variation of .22 and .24, respectively. Consequently, Miller’s law and Cowan’s law imply substantially less variability in STMC and WMC than other cognitive capacities.

¹ The value of 1.96 corresponds to 95% of the standard normal distribution. Multiplying the standard deviation by the 95% standard normal deviate (i.e., 1.96) is presumed to correspond to the ±2 value associated with Miller’s law. In this case, a standard deviation of one corresponds to a ± value of very nearly 2.

It will be noted, however, that memory span has long been suggested to be associated with relatively little variability in human capacity (Sattler, 1982; Wechsler, 1939). Furthermore, the lack of variability has been articulated to be a reason to consider measures such as Digit Span to be relatively weak indicators of intellectual functioning (Matarazzo, 1972). However, the contention that STMC and WMC are associated with relatively low levels of variability does not appear to yet have been tested specifically across a number of normative samples and a standardized representation of variability (i.e., coefficient of variation). Thus, a secondary purpose of this investigation was to estimate precisely the verbal STMC and verbal WMC means, standard deviations, and coefficients of variation in the healthy adult population, so as to verify the values proposed by Miller (1956; 7 ± 2) and Cowan (2005; 4 ± 1).

1.4. Flynn Effect and Memory Span: previous research

Investigations which have examined the possibility of cognitive ability test score increases across time tend to have done so at the aggregate level. For example, Parker (1986) examined FSIQ differences across the W-B, the WAIS, and the WAIS-R, but did not report results at the subscale level. Similarly, Flynn’s (1987) comprehensive investigation was based principally upon total scale scores (FSIQ, VIQ, PIQ). In order to help understand more fully the nature of the Flynn effect, recent research has focussed upon the examination of test score changes at the subscale level (Flynn, 2007). For example, Beaujean and Sheng (2014) examined mean level test score differences across the WAIS, WAIS-R, WAIS-III, and WAIS-IV at the subscale level. As the raw data were not available, Beaujean et al. identified the subscale raw score means that corresponded to a scaled score of 10 for each subtest to determine whether scores increased across editions/time. Beaujean et al. reported substantial Digit Span subtest mean increases across time. On the surface, the procedure used by Beaujean et al. may seem valid. However, such a methodology would only be valid if the number of items, as well as scoring procedure, remained constant across editions. In fact, there are a large number of changes in the number of items within subtests and scoring procedures across Wechsler editions, many of which compromise the interpretation of a substantial amount of published Flynn effect research (Kaufman, 2010; but see also Flynn, 2010).

In comparison to other Wechsler subtests, there have been relatively few changes to the Digit Span subtest over the years. There are two significant changes that are important to consider, however. First, the WAIS-R (and later editions) awarded up to a maximum of two points for recalling correctly both trials associated with a Digit Span item. By contrast, within the W-B and the WAIS, the DSF and DSB scores simply reflect the largest digit series recalled correctly. Thus, it would naturally be expected that the WAIS and the W-B Digit Span raw scores would be lower than those observed in later Wechsler editions. In fact, the Digit Span Total (DST) raw score mean that corresponds to a scaled score of 10 within the WAIS is 11 and increased to 15–16 in the WAIS-R (ages 20–24 years). The difference of four to five points may simply reflect the change in scoring, not necessarily a change in memory span ability. Secondly, a new Digit Span subtest was added to the WAIS-IV (Digit Span Sequencing, DSS), and the scores associated with DST were based on the sum of DSF, DSB, and DSS. Thus, DST within the WAIS-IV is based on the sum of three subtests, rather than two subtests. Naturally, the DST raw score mean that corresponds to a scaled score of 10 within the WAIS-III to the WAIS-IV increased from 17–18 to 28–29 (ages 20–24 years). Again, such an increase would not necessarily reflect an increase in ability across time, but, instead, the change in the scoring. Fortunately, the WAIS-R, WAIS-III, and WAIS-IV reported additional tables in their technical manuals that include the ‘longest digit span forward’ and ‘longest digit span backward’ means and standard deviations across all age groups. These values facilitate valid comparisons across Wechsler editions, as well as other publications which used a comparable Digit Span scale, as will be described further below.

In addition to Beaujean and Sheng (2014), Daley, Whaley, Sigman, Espinosa, and Neumann (2003) reported a Digit Span Forward test score increase equal to a Cohen’s d = −.19, based on two samples of Kenyan children tested between 1984 (N = 118) and 1998 (N = 537). Much larger differences were observed for Raven’s and a vocabulary test. It will be noted that Daley et al. also reported a substantial reduction in test score standard deviations across time (25%–30% smaller), which suggests that the changes in test scores may have been due principally to improvements at the lower end of the distribution. Unfortunately, although the 1984 sample was somewhat normative in nature, it was rather small in size. Furthermore, the second sample was essentially a convenience sample, which makes valid interpretations of the comparisons difficult. Finally, as the samples were based on children, the test score changes could have arisen due to changes in the rate of maturational development in children across time.

In addition to the small number of Flynn effect investigations relevant specifically to Digit Span, a small number of Flynn effect studies have included other measures of memory span. For example, based on the normative samples associated with the Adult Memory and Information Processing Battery (AMIPB; Coughlan & Hollowes, 1985; Oddy et al., 2007), Baxendale (2010) found virtually no mean differences across groups on the list recall task. However, a mean increase across the two normative samples of approximately half of one standard deviation was observed for the design recall task. Baxendale (2010) offered little in the way of explanation for why the effect was observed for spatial but not verbal recall, except to suggest that the two processes are not perfectly correlated. It is probably important to note that a large number of the items within the AMIPB changed from the 1985 to 2007 editions (Oddy et al., 2007).

In another investigation, Rönnlund and Nilsson (2008) found that an episodic memory latent variable mean increased by .60 of a z-score from 1988–1990 to 2003–2005, which would be suggestive of a Flynn effect. However, an arguably distinct limitation associated with Rönnlund and Nilsson (2008) is that the individuals selected to participate in the investigation were drawn exclusively from a single regional town in Sweden (Umeå, population: 110,000). Also, the mean age of the samples included in Rönnlund and Nilsson was relatively old, as no participants under the age of 35 years were included. Thus, the results observed in Rönnlund and Nilsson may be due to the overall health improvements reported in the elderly over the years (Jeune & Brønnum-Hansen, 2008).

Consequently, in light of the above, the purpose of this investigation was twofold: (1) to estimate across a combination of normative samples the typical verbal STM and WM capacities of healthy adults; and (2) to test the hypothesis that the verbal STM and the verbal WM capacities of adults have increased across time in line with the Flynn effect.

2. Method

2.1. Samples and measure

In order to address the two principal questions posed in this investigation, the results associated with several publications (journal articles, books, and technical manuals) were compiled. Across all selected publications, a largely identical Digit Span test was administered. Specifically, the nature of the Digit Span test considered for inclusion in this investigation consisted of a series of digits read to the participant orally at a rate of one digit every second. The participant had to repeat the digits orally. In the case of DSF, the digits had to be repeated in the order with which they were read. In the case of DSB, the digits had to be repeated in the reverse order with which they were read. DSF and DSB are typically recognised as measures of verbal STMC and verbal WMC, respectively (Oberauer et al., 2000). In all cases, the means included in this investigation corresponded to the largest series of digits recalled correctly. In almost all cases, the number of digits within a series ranged from three to nine for DSF and two to eight for DSB. In most cases, the means and standard deviations associated with DSF, DSB, and DST were available. However, in some cases, only the results for DSF or DST were available.

Table 1
Means and standard deviations associated with Longest Digit Span Forward (LDSF) and Longest Digit Span Backward (LDSB) across time.

Source               Year   N       Ages     LDSF          LDSB          DST
Wells and Martin     1923   50      Adults   6.3 (NA)      5.1 (NA)      11.40
Wechsler             1933   236     Adults   6.60 (1.13)   NA            NA
Weisenburg et al.    1936   70      18–59    6.69 (1.02)   4.87 (1.16)   11.56
W-B                  1939   1,081   17–70    NA            NA            12.00
WMS                  1945   96      20–49    6.53 (1.17)   4.80 (1.12)   11.23
WAIS                 1955   1,785   16–75    NA            NA            11.00
WAIS-R               1981   1,880   16–74    6.45 (1.33)   4.87 (1.43)   11.32
MAS                  1991   845     18–90    6.63 (1.22)   4.83 (1.30)   11.46
WAIS-III             1997   2,000   16–74    6.59 (1.35)   4.85 (1.49)   11.44
WAIS-IV              2008   1,900   16–74    6.72 (1.31)   4.84 (1.39)   11.56
N-weighted M                                 6.56 (1.22)   4.88 (1.32)   11.44

Note. Values are means with standard deviations in parentheses. Wells and Martin (1923) created a normative sample group for the purposes of studying psychopathology; Wechsler (1933) published normative Digit Span Forward data to compare the variability associated with a large number of human characteristics; Weisenburg et al. (1936) created a normative sample group for the purposes of studying aphasia; DST = Digit Span Total; W-B = Wechsler-Bellevue; WMS = Wechsler Memory Scale; WAIS = Wechsler Adult Intelligence Scale; MAS = Memory Assessment Scales; NA = not available.

The sources/samples included in this investigation are listed in Table 1. It can be observed that the Digit Span normative sample results associated with the Wechsler-Bellevue (W-B; Wechsler, 1939)², the Wechsler Adult Intelligence Scale (WAIS; Wechsler, 1955), the Wechsler Adult Intelligence Scale – Revised (WAIS-R; Wechsler, 1981), the Wechsler Adult Intelligence Scale – III (WAIS-III; Wechsler, 1997) and the WAIS-IV (Wechsler, 2008) were included in the analysis. With respect to the W-B and the WAIS, the means and standard deviations associated with DSF and DSB were not reported. However, the raw score DST values (DSF + DSB) which corresponded to a scaled score of 10 (i.e., the scaled mean) were published and included in this investigation. In the cases of the W-B and the WAIS, the raw score that corresponded to a scaled score of 10 was considered appropriate for inclusion in this investigation, as the DSF score and the DSB score corresponded to the number of digits in the longest series recalled accurately (Wechsler, 1939, 1955). Furthermore, the DSB and the DSF scores were added together to form the DST score. With respect to the WAIS-R, WAIS-III, and the WAIS-IV, the ‘longest digit span forward’ (LDSF) and the ‘longest digit span backward’ (LDSB) means and standard deviations were reported in supplemental tables within their respective technical manuals.³ Thus, LDSF was considered a comparable estimate of DSF and LDSB was considered a comparable estimate of DSB. Furthermore, LDSF added to LDSB was considered an estimate of DST. In order to increase the comparability of the WAIS-R, WAIS-III, and WAIS-IV normative sample scores with the other sources included in this investigation (all of which did not include very old participants), I calculated the N-weighted means based on the LDSF and LDSB values associated with the 16 to 74 year old age groups, instead of simply using the total sample (16 to 90 years) normative sample LDSF and LDSB means and standard deviations.

² The Wechsler-Bellevue (Wechsler, 1939) was normed on a total sample of 1750 subjects, however, 670 of those were children as young as 7 years. The adult portion of the normative sample amounted to 1071 participants (Wechsler, 1958, p. 87).

³ The LDSF and LDSB means and standard deviations associated with the WAIS-R normative sample were reported in the WAIS-R NI technical manual (Kaplan, Fein, Morris, & Delis, 1991).

The DSF and DSB results associated with the Wechsler Memory Scale (WMS; Wechsler, 1945) normative sample were included in this investigation⁴, as the Digit Span subtest was essentially identical to that included in the WAIS (Wechsler, 1955). However, the normative sample results associated with the WMS-R (Wechsler, 1997) were excluded, because the WMS-R normative sample was not widely age representative. Specifically, the norms associated with the WMS-R used interpolated values for several age groups from 18 to 45 years of age (Elwood, 1991). The WMS-III (Wechsler, 1987) Digit Span norms were also excluded, as they were identical to those associated with the WAIS-III (Wechsler, 1997). Finally, Digit Span was not included in the WMS-IV (Wechsler, 2008). In light of the above, with respect to the Wechsler Memory Scales, only the results associated with the WMS (Wechsler, 1945) were included in this investigation.

⁴ The Wechsler Memory Scale (Wechsler, 1945) was normed on a sample of 200 healthy adults (ages 25 to 50), however, the DSF and DSB raw score means and standard deviations were reported for only 96 of the adults.

With respect to non-Wechsler scales, the Digit Span normative sample results associated with the Memory Assessment Scales (MAS; Williams, 1991) were included, as the MAS Digit Span test is essentially identical to the Digit Span test included in the WAIS-R (Wechsler, 1981). Although the MAS
technical manual does not include the raw score means and standard deviations for DSF and DSB, they were supplied to me via email (M. Williams, personal communication, June 10, 2014). Less well-known are the three oldest sources included in this investigation. Weisenburg, Roe, and McBride (1936) created a normative sample group for the purposes of studying and diagnosing aphasia. To this effect, a control group of 70 adults were selected from three hospitals in Pennsylvania. Although the participants included in the normative sample were admitted to hospital, individuals who were suffering from any psychological condition were excluded. A battery of intelligence tests was administered to the participants, including a Digit Span test. The items ranged from five to eight digits for DSF and three to seven digits for DSB. Weisenburg et al. reported the means and standard deviations separately for DSF and DSB. Next, Wechsler (1933) published a normative sample mean and standard deviation associated with DSF. Although it is impossible to be certain, it would seem reasonable to presume that the version of Digit Span used by Wechsler (1933) was the same as that which made its way into the well-known Wechsler scales. Finally, Wells and Martin (1923) created a normative sample group for the purposes of studying psychopathology. Several tests were administered to the normative sample group, including Digit Span. The DSF portion of the test consisted of series of digits ranging up to nine digits, and the DSB portion of the test consisted of series of digits ranging up to eight digits.

A number of ostensibly useful sources of data were excluded, as they were judged not to have administered a sufficiently similar Digit Span test, or the results were not reported in a comparable manner. Also, some sources were not based on a sample sufficiently representative to be considered reasonably normative. Notable exclusions were Russell’s (1975, 1988) revision of the WMS, the Stanford-Binet (S-B) intelligence batteries (Terman, 1917; Terman & Childs, 1912), as well as Starr (1924), Brener (1940) and Elwood (2001). Thus, based on the information included in Table 1, it can be seen that there were 10 normative sample sources included in this investigation across 85 years (1923 to 2008). The DSF, DSB, and DST sample sizes corresponded to 7077, 6841, and 9770, respectively.

3. Results

As can be seen in Table 1, the N-weighted DSF, DSB, and DST means (and SDs) corresponded to 6.56 (1.22), 4.88 (1.32), and 11.44 (NA), respectively. In order to estimate the 95% lower and upper bounds associated with the DSF and DSB distributions, the DSF and DSB standard deviations were multiplied by the standard normal deviate (i.e., 1.96). In the case of DSF, the deviation term corresponded to 2.39 (i.e., 1.22 * 1.96). Thus, it may be suggested that 95% of the adult population has a verbal STMC equal to somewhere between 4.17 and 8.95 objects. In the case of DSB, the deviation term corresponded to 2.58 (i.e., 1.32 * 1.96). Thus, it may be suggested that 95% of the adult population has a verbal WMC equal to somewhere between 2.30 and 7.46 objects.

It can also be observed in Table 1 that there was very little variability in the means across time. The DSF, DSB, and DST ranges corresponded to 6.30–6.72, 4.80–5.10, and 11.00–12.00, respectively. To test the hypothesis that memory span scores increased across time, three Pearson correlations were estimated between year and the three memory span scores (p values estimated via permutation tests). None of the correlations were statistically significant: DSF r = .45, p = .270; DSB r = −.57, p = .124; DST r = −.06, p = .880. Thus, as the estimated correlations were non-significant and differentially directed, the hypothesis that memory span scores would evidence mean level increases across time was unsupported (see Fig. 1).

Fig. 1. Scatter plot of Digit Span Total, Digit Span Forward, and Digit Span Backward means across time (1923–2008).

Finally, the possibility of ceiling effects associated with the Digit Span subscale scores was also examined. As the DSF mean of 6.56 was approximately two standard deviations less than the maximum possible score of 9, and the DSB mean of 4.88 was approximately 2.5 standard deviations less than the maximum possible score of 8, it was considered unlikely that the DSF and DSB subtest scores suffered from substantial ceiling effects. In fact, with respect to the highest performing normative group across the WAIS-R, WAIS-III, and WAIS-IV (20–24 years of age), the percentage of participants who scored the maximum DSF score (i.e., 9 digits) was equal to 9.5%, 7.0%, and 11.0%, respectively. With respect to DSB, 8.5%, 7.0%, and 3.5% of the participants within the WAIS-R, WAIS-III, and WAIS-IV normative samples achieved the highest score (8 digits), respectively. Thus, although there was a small ceiling effect in the data, it was neither substantial, nor was there an increasing trend across time, supporting further the absence of a Flynn effect.

4. Discussion

This investigation had two purposes: (1) to estimate precisely the typical verbal STMC and verbal WMC of adults, and (2) to determine whether these capacities have increased across time, in line with the Flynn effect. The results of this investigation suggest that the typical adult has a verbal STMC of 6.56 objects (plus or minus 2.39), and a verbal WMC of 4.88 objects (plus or minus 2.58). Secondly, in contrast to fluid intelligence test scores, STMC and WMC test scores do not appear to have increased across time.

Based on the results of this investigation, Miller’s (1956) proposal that the typical STMC of an adult is approximately seven objects was largely supported in this investigation, as the mean DSF score was estimated at 6.56: thus, somewhere between six and seven objects. However, if Miller’s law (7 ± 2) implies that approximately 95% of individuals’ STMC fall somewhere between five and nine, it would imply that STMC was associated with a standard deviation of approximately 1 (i.e., 1 * 1.96). The results of this investigation suggest that the standard deviation is only somewhat larger at 1.22, which implies that 95% of the population’s STMC lies somewhere between 2.39 (i.e., 1.22 * 1.96) above and below the mean of 6.56 objects. Thus, in rounded terms, Miller’s law may be considered largely accurate, at least in the context of verbal STMC.

From a coefficient of variation perspective (SD/M), STMC was associated with a value of .19, which, although on the lower side, is roughly comparable to other cognitive capacities. For example, based on the WAIS-IV normative sample means and standard deviations reported in Beaujean and Sheng (2014), I calculated the mean coefficient of variation associated with nine of the WAIS-IV subtests (45–54 year olds; N = 200) to be .27 (range: .21 to .36). Additionally, based on the normative sample means and standard deviations (18–30 year olds) associated with the BMIPB (Oddy et al., 2007) reported in Baxendale (2010), list recall and design recall were calculated by me to be associated with coefficients of variation of .22 and .24, respectively. Strictly speaking, Miller’s law implies a coefficient of variation of .14 (1/7), which may be suggested to be rather low, in comparison to the results reported in this investigation and other cognitive capacities. Thus, the somewhat larger estimate of variability in STMC reported in this investigation, in comparison to that implied by Miller (1956), helps bring STMC closer in line with other cognitive capacities.

test than DSF. Based on the WAIS-IV results reported in Table C.4 of the technical manual (Wechsler, 2008), the mean item difficulties associated with DSF and DSB were calculated by me to be p = .70 and p = .53, respectively. Thus, DSB does appear to be somewhat more difficult from a pure psychometric perspective. Theoretically, WMC involves the application of two principal cognitive processes, encoding and transformation, rather than simply encoding (Oberauer, Lewandowsky, Farrell, Jarrold, & Greaves, 2012). Consequently, it may be suggested that the greater amount of variability associated
withDSBimpliesthatindividualdifferencesinthecapacityto Cowan’s(2005)proposalofatypicalWMCoffourobjects performbothprocesses(encodingandtransformation)arenot appearstobeanunderestimatebyapproximatelyoneobject,as correlated perfectly. Further support for such a position is thisinvestigationestimatedaDSBmeanof4.88,orfiverounded. reflected in the fact that DSF and DSB are only moderately As per STMC, it would be useful to replicate the estimate of correlatedatr=.55,basedontheWAIS-IVnormativesample fiveobjectsonlarge,representativesamplesandadiversityof (Wechsler, 2008). Even after disattenuation for imperfect measures(spatial,non-numeric,etc.).EvenmoresothanMiller reliability (DSF α = .81; DSB α =.82), the disattenuated (1956), Cowan (2005) appears to have underestimated the correlation(r=.67)isfarfromunity.Thus,arguably,thekey amountofvariabilityinWMCintheadultpopulation,asDSB distinctionbetweenDSBandDSFisnotsimplythatDSBismore wasassociatedwithastandarddeviationof1.32(95%normal difficult;instead,thereappearstobeaqualitativedistinction, deviation term =2.58, or three rounded), rather than the aswell(Hurlstone,Hitch,&Baddeley,2013). standarddeviationof.50impliedbyCowan’s(2005)proposalof Itwashypothesizedthatmemoryspanwouldbeaffectedby plusorminusoneobject(i.e.,.50*1.96).Thus,inlightofthe theFlynneffect,asmemoryspan(WMCinparticular)isvery resultsofthisinvestigation,Cowan’slawofWMCmaybemore closelyrelatedtofluidintelligence(Gignac,2014;Kaneetal., accurately restated as 5 ± 3. Arguably, this is a relatively 2005). The results of this investigation failed to support the substantialre-statement;again,onewhichshouldbeverifiedon hypothesisthatmemoryspanwouldbeaffectedbytheFlynn adiversityofWMCmeasures. 
effect.Overall,therewerenomeaningfulchangesinmemory Fromacoefficientofvariationperspective(i.e.,SD/M),it spanfrom1923to2008,asmeasuredbyDSF,DSB,andDSTtest wouldappearthatWMCisacognitiveprocessassociatedwith scores.Incontrasttomemoryspan,substantialincreaseshave substantially more variability than STMC. In fact, DSB was beenreportedforfluidintelligence,particularlyasmeasuredby associatedwitha42%largercoefficientofvariationthanDSF Raven’sProgressiveMatrices(Flynn,2012).ThelackofFlynn (.19 vs. .27). Superficially, it may be suggested that greater effect associated with STMC and WMC may be considered variabilitymaybeexpectedforDSB,asDSBisamoredifficult surprising,consideringmemoryspanissointimatelyrelated 92 G.E.Gignac/Intelligence48(2015)85–95 withfluidintelligence(Chuderski,2013;Colom,Abad,Quiroga, theWAIS(Wechsler,1955),perhapstheonlytwoeditionsthat Shih,&Flores-Mendoza,2008;Colom,Abad,Rebollo,&Shih, allowforvalidSimilaritiessubtestcomparisonsinadults.5In 2005; Kane et al., 2005). Thus, if the Flynn effect is not additiontothisinvestigation,thereareothersthathaveeither operating predominantly on g (te Nijenhuis & van der Flier, failedtoobserveaFlynneffectorhaveobservedareversalof 2013),anditisnotoperatingonSTMCorWMC,thecontention theFlynneffect(e.g.,Shayer&Ginsburg,2009;Sundet,Barlaug, that fluid intelligence test scores are increasingsubstantially &Torjussen,2004;Teasdale&Owen,2008).Ultimately,how across time is arguably difficult to reconcile. Based on the the results associated with this investigation should be WAIS-IV normative sample, Gignac and Watkins (2013) integratedwithintheFEliteraturemaybedebatable,adebate estimated that the amount of unique internal consistency whichwillnotberesolveddefinitivelyhere.Furtheranalysis reliability associated with the Perceptual Reasoning index andsynthesisisofcourseencouraged. scores(similartofluidintelligence)wasapproximately.18. 
From a methodological perspective, it will be noted that Thus,onceg,WMC,andSTMCvarianceisremovedfromfluid manyofthepublishedstudiessupportiveoftheFlynneffectused intelligenceliketestscores,thereisonlyasmallamountof indirectandpossiblyunsubstantiatedquantitativemethods.For reliablevarianceuponwhichtoeffectsubstantial, system- example,Parker(1986)madeuseoftheslopeassociatedwith aticchangesofanysort. time(inyears)andIQfortheStanford-Binet(Terman&Merrill, Noteworthy, however, is the item-level research which 1973)andappliedittotheestimationofdifferencescoresfrom suggeststhattheFlynneffectassociatedwithRaven’sscores individualswhocompleteddifferentWechslerscales(i.e.,WAIS maybedueprincipallytocohortdifferencesinthecapacityfor andWAIS-R).Inanothercase,BeaujeanandSheng(2014)did abstraction(Fox&Mitchum,2013).Basedontheresultsofthis nothaveaccesstotherawdata,consequently,theyestimated investigation, it would appear that any possible increases in thestandarddeviationsassociatedwiththeWechslersubscales abstraction capacity across time have occurred completely byidentifyingtherawscoreequivalentsassociatedwithscaled devoid ofany increases in WMC. Of course, statistically,it is scoresof7and13.Finally,inthecontextofevaluatingRaven’s possiblethattwosubtestsmaybeobservedtobeassociated IQscoregainsin theDutchfrom1952to 1982,Flynn(1987) withasubstantialinter-correlationacrosstwocohorts,butonly reportedthepercentageofmenwhoanswered24itemsormore one subtest evidence increases across time (Flynn, 2007). correctly and applied a method with several assumptions to However,giventhatthelargeassociationbetweenWMCand estimate the changes in terms of IQ scores. Arguably, these fluid intelligence is theorised to be, at least partly, causal in methods are not ideal and/or particularly straightforward. 
By nature(e.g.,Halford,Cowan,&Andrews,2007),theobservation contrast,astrengthofthisinvestigationisthatthemeansand of a Flynn effect foronlyfluid intelligencemay be suggested standard deviations were obtained directly, and the methods tobeimprobable.Nonetheless,itshouldbeacknowledgedthat usedtoanalysethedataandreporttheresultsweresimpleand thesubstantial,butcurrently non-experimentally established, straightforward. association between WMC and fluid intelligence does not There are, naturally, limitations associated with this necessitateaFEacrossbothconstructs(Flynn,2007). investigation.Inparticular,thestandardnormaldeviateterms Infact,itispossiblethattheresultsofthisinvestigationmay estimatedforSTMC(±2.39)andWMC(±2.58)assumethatthe beconsideredinlinewiththecontentionthattheFlynneffectis DSF and DSB scores were normally distributed. It is highly operating primarily at the level of abstraction ability (Flynn, unlikelythattheywereperfectlyso,asmostcognitiveabilitytest 2012;Fox&Mitchum,2013),ratherthanonatestsuchasDigit scoresareskewedtosomedegree(Micceri,1989).Consequent- Span,as Digit Span is based on stimuli to which individuals ly,theestimatesreportedinthisinvestigationareaccuratethe 85yearsagoandtodaywouldhaveaboutanequalamountof extentthatthedistributionswerenotverysubstantiallyskewed. exposure,i.e.,digitsfromonetonine.Suchacontentionmay Accesstorawdatawouldallowforevenmorepreciseestimates beconsideredostensiblyplausible,however,whenexamined thanthosereportedinthisinvestigation. thoroughly,onewoulddrawtheconclusionthathumanshave Therewerealsoslightceilingeffectsinthedata.Specifically, beenexposedtodigitsatasubstantiallyincreasingrateacross approximately5–10%oftheparticipantsrecalledcorrectlythe time. First, consider that, by 1930, only 12% of residents of largestseriesofdigitsassociatedwiththeDSF(i.e.,9)andthe New York had access to a telephone at home. By 1960, the DSB(i.e.,8)subtests.Thus,themeanSTMCandWMCvalues percentage had increased to 76% (U.S. 
Bureau of Labor reported in this investigation are likely underestimates to a Statistics,2006).In2011,89%ofUShouseholdshadacellular small degree. For the same reason, the variability in STMC phone and 71% a landline (U.S. Census Bureau, 2013a). and WMC scores reported in this investigation may also be Furthermore, in 1997, 18% of US residents had access to the expected to be underestimates to a small degree. Given the internetathomeand,by2007,thenumberincreased to 62% (U.S. Census Bureau, 2013). Thus, phone numbers, login numbers,personalidentificationnumbers,digitalclocks,digital 5 TheSimilaritiessubtestwithintheW-B(Wechsler,1939)andtheWAIS odometers,cablenetworkswith100sofchannels,onlinestock (Wechsler,1955)consistedof12and13items,respectively.Twooftheitems broking accounts, etc., the typical person today is very likely within the WAIS were completely revised and one additional item was included.AccordingtoWechsler(1945,p.188),arawSimilaritiesscoreof12 usingdigitsatarateastonishinglygreatertothatofthetypical correspondedtoascaledscoreof10intheW-B(ages17to70).Basedonthe person in the 1920s, the oldest data point used in this age-grouped(ages16to69)rawscoreandscaledscoreequivalentspublished investigation. inWechsler(1955),Icalculatedthatascaledscoreof10correspondedtoanN- Finally, it will be noted that there were essentially no weightedmeanrawscoreof12.99.Thus,themeanSimilaritiesrawscore appearstohaveincreasedbyonepointfromtheW-BtotheWAIS.However, changes in adult abstraction ability based on theSimilarities giventheextraitemaddedtotheWAIS,itwouldbeplausibletosuggestthat subtest (a measure of verbal abstraction; Weiss, Saklofske, therewasnomeaningfulchangeinverbalabstractionabilityinadultsfrom Coalson,&Raiford,2010)fromtheW-B(Wechsler,1939)to 1939to1955. G.E.Gignac/Intelligence48(2015)85–95 93 relatively small amount of time it takes to administer Digit of intelligence. 
However, given that verbal STMC and verbal Span,itwouldbearguablybeneficialfortheWechslerscalesto WMCtestscoresdonotappeartohaveincreasedinthelast includea10digitseriesandaninedigitserieswithintheDSF 85years, crystallised intelligence test scores only minimally andDSBsubtests,respectively. or inconsistently (Flynn, 2007; Lynn, 1990, 2009), and that Much of the validity of the results reported in this changes in subtest items/scoring/administration across edi- investigation rests upon the contention that Digit Span is at tionsmayexplainalargepercentageofseveralsubtestscore leastadecentindicatorofintellectualfunctioning.Somewould meanchangesacrosstime(Kaufman,2010),itmaybeprudent questionsuchacontention(e.g.,Matarazzo,1972).Although to acknowledge that the magnitude, pervasiveness, and true certainly not the best indicator of intellectual functioning, I natureoftheFlynneffectremainsasubstantiallyopenquestion. believe the empirical evidence reviewed in the introduction above suggests that Digit Span, and Digit Span Backward in Acknowledgements particular,isagoodindicatorofgandastrongcorrelateoffluid intelligence(Gignac,2014).TheDigitSpansubtestwaschosen Special thanks to Mike Williams for supplying to me via because memory span has been relatively neglected in the personalcommunicationtheDSFandDSBmeanandstandard Flynneffectliterature,aswellasbecauseitaffordedthebest deviationvaluesassociatedwiththeMemoryAssessmentScales opportunitytoevaluatetestscorechangesacrosstimefroma (Williams, 1991). Thanks also to Mark Hurlstone for some subtestthathaschangedlittleovertheyears. helpfulconversationsduringtheproductionofthismanuscript. 
It should also be acknowledged that the WMC and fluid intelligence research has been conducted primarily at the References latentvariablelevel,however,thisinvestigationwasconducted at the observed score level, which is compromised, to some Bachelder,B.L.,&Denny,M.R.(1977).Atheoryofintelligence:I.Spanandthe degree,bymeasurementerror.Ideally,thehypothesisofthe complexityofstimuluscontrol.Intelligence,1(2),127–150. Flynneffectwouldbeexaminedwithinthecontextoflatent Baddeley, A. D. (2002). Is working memory still working?. European variable modelling, as measurement error would be held Psychologist,7(2),85–97. Baddeley,A.D.,&Hitch,G.(1974).Workingmemory.PsychologyofLearningand constant across all comparisons (i.e., 0). However, the use Motivation,8,47–89. of latent variable modelling in this context rests upon the Baxendale,S.(2010).TheFlynneffectandmemoryfunction.JournalofClinical assumptionoffactorialinvariance.Thepublishedresearchto- andExperimentalNeuropsychology,32(7),699–703. Beaujean,A.,&Sheng,Y.(2014).AssessingtheFlynneffectintheWechsler date suggests that this is an implausible assumption (Must scales.JournalofIndividualDifferences,35(2),63–78. etal.,2009;Wichertsetal.,2004).Itremainsapossibilitythat Birren,J.E.,&Morrison,D.F.(1961).AnalysisofWAISsubtestsinrelationtoage an invariant latent variable could be created from several andeducation.JournalofGerontology,16,363–369. Blankenship,A.B.(1938).Memoryspan:areviewoftheliterature.Psychological memoryspantasks,ratherthanawholeintelligencebattery.As Bulletin,35(1),1–25. theWAIS-IVincludesthreememoryspantasks(DSF,DSB,and Bolton,T.L.(1892).Thegrowthofmemoryinschoolchildren.AmericanJournal DSS),oncetheWAIS-Vispublished,itmaybeapossibilityto ofPsychology,4,362–380. Brener,R.(1940).Anexperimentalinvestigationofmemoryspan.Journalof test the hypothesis tested in this investigation at the latent ExperimentalPsychology,26(5),467–482. 
variablelevel.However,itwouldbedonesowithinarelatively Bronner,A.F.,Healy,W.,Lowe,G.M.,&Shimberg,M.E.(1927).Amanualof shortspanofyears. individualmentaltestsandtesting.Boston,MA:Little,Brown&Co. Finally, the samples included in this investigation were Carpenter, P.A., Just, M.A., & Shell, P. (1990). What one intelligence test measures:atheoreticalaccountoftheprocessingintheRavenProgressive drawnexclusivelyfromtheUSA,asitprovedtobethecountry MatricesTest.PsychologicalReview,97(3),404–431. withthelargestnumberofgoodqualitysamplesavailableforthe CensusBureau,U.S.(2013).ComputerandinternetuseintheUnitedStates. purposesofexaminingthequestionsraisedinthisinvestigation. (2013,May)RetrievedSeptember23,2013,fromhttp://www.census.gov/ prod/2013pubs/p20-569.pdf ItispossiblethattheFlynneffectmaybeobservedformemory Chuderski, A. (2013). When are fluid intelligence and working memory spanscoresinothernationalities.Researchersareencouraged isomorphicandwhenaretheynot?Intelligence,41(4),244–262. to explore this possibility, providing sufficiently good quality Colom, R., Abad, F.J., Quiroga, M., Shih, P.C., & Flores-Mendoza, C. (2008). Workingmemoryandintelligencearehighlyrelatedconstructs,butwhy? sourcesofdatacanbeidentified.Similarly,anextensionofthis Intelligence,36(6),584–606. investigation on samples of data from children may prove Colom,R.,Abad,F.J.,Rebollo,I.,&Shih,P.(2005).Memoryspanandgeneral enlightening. However, it would appear that there would be intelligence:Alatent-variableapproach.Intelligence,33(6),623–642. Colom,R.,Flores-Mendoza, C.,Quiroga, M.Á.,&Privado,J.(2005). Working fewer good quality samples available for inclusion in such an memoryandgeneralintelligence:Theroleofshort-termstorage.Personality investigation.Forexample,the‘longestdigitspanforward’and andIndividualDifferences,39(5),1005–1014. ‘longestdigitspanbackward’meansandstandarddeviations Coughlan,A.,&Hollowes,C.(1985).Adultmemory&informationprocessing battery.Leeds,UK:LeedsUniversityHospital. 
associated with the WISC-R (Wechsler, 1974) were not Cowan,N.(2005).Themagicalnumber4inshort-termmemory:Areconsider- published,tomyknowledge.BasedontheWISC-III(Wechsler, ationofmentalstoragecapacity.BehaviouralandBrainSciences,24,87–185. 1991)andWISC-IV(Wechsler,2003)normativesamples,there Cowan, N. (2010). The magical mystery four how is working memory werevirtuallynochangesinmeanLDSFandLDSBvalues. capacitylimited,andwhy?CurrentDirectionsinPsychologicalScience, 19(1),51–57. Inconclusion,itiscommonlystatedthattheaccumulated Daley,T.C.,Whaley,S.E.,Sigman,M.D.,Espinosa,M.P.,&Neumann,C.(2003).IQ empirical results suggest that intelligence test scores have ontherisetheFlynneffectinruralKenyanchildren.PsychologicalScience, increasedbyapproximatelythreeIQpointsperdecade(Neisser 14(3),215–219. Daneman, M., & Merikle, P.M. (1996). Working memory and language etal.,1996;Nisbettetal.,2012).Suchevidenceisoccasionally comprehension: A meta-analysis. Psychonomic Bulletin & Review, 3(4), usedintheacademicpress(e.g.,Flynn,2007;Stanovich,2011) 422–433. andinthepopularpress(e.g.,Gladwell,2007;Holloway,1999; Dehn,M.J.(2008).Workingmemoryandacademiclearning:Assessmentand intervention.Hoboken,NJ:JohnWiley&Sons. Murdoch, 2007) to support the position that conventional Dempster,F.N.(1981).Memoryspan:Sourcesofindividualanddevelopmental intelligencetestscoresareofquestionablevalidityasindicators differences.PsychologicalBulletin,89(1),63–100. 94 G.E.Gignac/Intelligence48(2015)85–95 Elwood,R.W.(1991).TheWechslerMemoryScale-Revised:Psychometric Must,O.,teNijenhuis,J.,Must,A.,&vanVianen,A.E.M.(2009).Comparabilityof characteristicsandclinicalapplication.NeuropsychologyReview,2(2), IQscoresovertime.Intelligence,37,25–33. 179–201. Neisser,U.(Ed.).(1998).Therisingcurve:Long-termgainsinIQandrelated Elwood, R.W. (2001). MicroCog: assessment of cognitive functioning. measures.Washington,DC:AmericanPsychologicalAssociation. NeuropsychologyReview,11(2),89–100. 
Neisser,U.,Boodoo,G.,Bouchard,T.J.,Jr.,Boykin,A.W.,Brody,N.,Ceci,S.J.,& Flynn,J.R.(1987).MassiveIQgainsin14nations:WhatIQtestsreallymeasure. Urbina,S.(1996).Intelligence:knownsandunknowns.AmericanPsychol- PsychologicalBulletin,101,171–191. ogist,51(2),77–101. Flynn,J.R.(2007).Whatisintelligence?NewYork,NY:CambridgeUniversity Nisbett, R.E., Aronson, J., Blair, C., Dickens, W., Flynn, J., Halpern, D.F., & Press. Turkheimer,E.(2012).Intelligence:newfindingsandtheoreticaldevelop- Flynn,J.R.(2009a).RequiemfornutritionasthecauseofIQgains:Raven’sgains ments.AmericanPsychologist,67(2),130–159. inBritain1938–2008.EconomicsandHumanBiology,7,18–27. Norman,S.,Kemper,S.,&Kynette,D.(1992).Adults'readingcomprehension: Flynn,J.R.(2009b).TheWAIS-IIIandWAIS-IV:Daubert motionsfavorthe Effectsofsyntacticcomplexityandworkingmemory.JournalofGerontology, certainlyfalseovertheapproximatelytrue.AppliedNeuropsychology,16, 47(4),258–265. 1–7. Oberauer,K.,Lewandowsky,S.,Farrell,S.,Jarrold,C.,&Greaves,M.(2012). Flynn,J.R.(2010).ProblemswithIQgains:Thehugevocabularygap.Journalof Modeling working memory: an interference model of complex span. PsychoeducationalAssessment,28(5),412–433. PsychonomicBulletin&Review,19(5),779–819. Flynn,J.R.(2012).Arewegettingsmarter?:risingIQinthetwenty-firstcentury. Oberauer,K.,Su,H.-M.,Wilhelm,O.,&Sander,N.(2007).Individualdifferences CambridgeUniversityPress. inworkingmemorycapacityandreasoningability.InA.R.A.Conway,C. Fox,M.C.,&Mitchum,A.L.(2013).Aknowledge-basedtheoryofrisingscoreson Jarrold,M.J.Kane,A.Miyake,&J.N.Towse(Eds.),Variationinworking “culture-free”tests.JournalofExperimentalPsychology:General,142(3), memory(pp.21–48).NewYork:OxfordUniversityPress. 979–1000. Oberauer,K.,Süß,H.M.,Schulze,R.,Wilhelm,O.,&Wittmann,W.W.(2000). Frank,G.(1983).TheWechslerenterprise:Anassessmentofthedevelopment, Working memory capacity—facets of a cognitive ability construct. 
structure,anduseoftheWechslertestsofintelligence.Oxford:Pergamon PersonalityandIndividualDifferences,29(6),1017–1045. Press. Oddy,M.,Coughlan,A.,&Crawford,H.(2007).BIRTmemoryandinformation Fry,A.F., & Hale,S. (1996). Processing speed, working memory, and fluid processingbattery.Horsham,UK:BrainInjuryResearchTrust. intelligence:Evidenceforadevelopmentalcascade.PsychologicalScience, Parker,K.C.H.(1986).Changeswithage,year-of-birthcohort,agebyyear- 7(4),237–241. of-birthcohortinteraction,andstandardizationoftheWechsleradult Gignac,G.E.(2014).Fluidintelligencesharescloserto60%ofitsvariancewith intelligencetests.HumanDevelopment,29(4),209–222. workingmemorycapacityandisabetterindicatorofgeneralintelligence. Paul,R.H.,Lawrence,J.,Williams,L.M.,Richard,C.C.,Cooper,N.,&Gordon,E. Intelligence,47,122–133. (2005).Preliminaryvalidityof“integneuroTM”:Anewcomputerized Gignac,G.E.,&Watkins,M.W.(2013).Bifactormodelingandtheestimationof batteryofneurocognitivetests.InternationalJournalofNeuroscience, model-basedreliabilityintheWAIS-IV.MultivariateBehavioralResearch, 115(11),1549–1567. 48(5),639–662. Raven,J.C.,Court,J.H.,&Raven,J.(1986).ManualforRaven’sProgressiveMatrices Gladwell,M.(2007).Noneoftheabove.NewYorker,83(40),92–96(2007, andVocabularyScales:Section2—ColouredProgressiveMatrices.London:H. December). K.Lewis. Gobet,F.,&Clarkson,G.(2004).Chunksinexpertmemory:Evidenceforthe Raven,J.,Rust,J.,&Squire,A.(2008).Manual:ColouredProgressiveMatricesand magicalnumberfour…orisittwo?Memory,12(6),732–747. CrichtonVocabularyScale.London:NCSPearson. Goldstein,E.(2010).Cognitivepsychology:Connectingmind,researchand Redick,T.S.,&Lindsey,D.R.(2013).Complexspanandn-backmeasuresof everydayexperience.Belmont,CA:CengageLearning. workingmemory:Ameta-analysis.PsychonomicBulletin&Review,20(6), Halford,G.S.,Cowan,N.,&Andrews,G.(2007).Separatingcognitivecapacity 1102–1113. fromknowledge:Anewhypothesis.TrendsinCognitiveSciences,11(6), Rodgers, J.L. (1998). 
A critique of the Flynn Effect: Massive IQ gains, 236–242. methodologicalartifacts,orboth?Intelligence,26(4),337–356. Hedden,T.,&Gabrieli,J.D.(2004).Insightsintotheageingmind:aviewfrom Rönnlund,M.,Carlstedt,B.,Blomstedt,Y.,Nilsson,L.G.,&Weinehall,L.(2013). cognitiveneuroscience.NatureReviewsNeuroscience,5(2),87–96. Secular trends in cognitive test performance: Swedish conscript data Holloway,M.(1999).Flynn’seffect.ScientificAmerican,280(1),37–38. 1970–1993.Intelligence,41(1),19–24. Hurlstone,M.J.,Hitch,G.J.,&Baddeley,A.D.(2013).Memoryforserialorder Rönnlund, M., & Nilsson, L.G. (2008). The magnitude, generality, and acrossdomains:Anoverviewoftheliteratureanddirectionsforfuture determinants of Flynn effects on forms of declarative memory and research.PsychologicalBulletin,140(2),339–373. visuospatialability:Time-sequentialanalysesofdatafromaSwedish Jensen, A.R., & Figueroa, R.A. (1975). Forward and backward digit span cohortstudy.Intelligence,36(3),192–209. interactionwithraceandIQ:PredictionsfromJensen'stheory.Journalof Russell,E.W.(1975).Amultiplescoringmethodfortheassessmentofcomplex EducationalPsychology,67(6),882–893. memory functions. Journal of Consulting and ClinicalPsychology, 43(6), Jeune,B.,&Brønnum-Hansen,H.(2008).Trendsinhealthexpectancyatage65 800–809. forvarioushealthindicators,1987–2005,Denmark.EuropeanJournalof Russell,E.W.(1988).RenormingRussell'sversionoftheWechslermemory Ageing,5(4),279–285. scale. Journal of Clinical and Experimental Neuropsychology, 10(2), Kane, M.J., Hambrick, D.Z., & Conway, A.R. (2005). Working memory 235–249. capacityandfluidintelligencearestronglyrelatedconstructs:commenton Sattler,J.M.(1982).Assessmentofchildren’sintelligenceandspecialabilities. Ackerman,Beier,andBoyle(2005).PsychologicalBulletin,131(1),65–71. Boston:Allyn&Bacon. 
Kaplan,E.,Fein,D.,Morris,R.,&Delis,D.C.(1991).WechslerAdultScale–Revised Schaie,K.W.,Willis,S.L.,&Pennak,S.(2005).Anhistoricalframeworkforcohort –NeuropsychologicalInstrument-Manual.SanAntonio,TX:Psychological differencesinintelligence.ResearchinHumanDevelopment,2(1–2),43–67. Corporation. Serwer,B.J.,Shapiro,B.J.,&Shapiro,P.P.(1972).Achievementpredictionof Kaufman,A.S.(2010).“InWhatWayAreApplesandOrangesAlike?”ACritique 'high-risk'children.PerceptualandMotorSkills,35(2),347–354. ofFlynn’sInterpretationoftheFlynnEffect.JournalofPsychoeducational Shayer,M.,&Ginsburg,D.(2009).Thirtyyearson–alargeanti‐Flynneffect/ Assessment,28(5),382–398. (II):13‐and14‐year‐olds.Piagetiantestsofformaloperationsnorms Lynn,R.(1982).IQinJapanandtheUnitedStatesshowsagrowingdisparity. 1976–2006/7.BritishJournalofEducationalPsychology,79(3),409–418. Nature,297,222–223. Stanovich,K.(2011).Rationalityandthereflectivemind.OxfordUniversityPress. Lynn,R.(1990).Differentialratesofsecularincreaseoffivemajorprimary Starr,A.S.(1924).TheDiagnosticValueoftheAudito-VocalDigitMemorySpan. abilities.BiodemographyandSocialBiology,37(1–2),137–141. PsychologicalClinic,15,61–84. Lynn,R.(2009).FluidintelligencebutnotvocabularyhasincreasedinBritain, Sundet,J.M.,Barlaug,D.G.,&Torjussen,T.M.(2004).TheendoftheFlynn 1979–2008.Intelligence,37,249–255. effect?:Astudyofseculartrendsinmeanintelligencetestscoresof Matarazzo, J.D. (1972). Wechsler’s measurement and appraisal of adult Norwegian conscripts during half a century. Intelligence, 32(4), intelligence (5thed.).NewYork:OxfordUniversityPress. 349–362. Micceri, T. (1989). The unicorn, the normal curve, and other improbable teNijenhuis,J.,&vanderFlier,H.(2013).IstheFlynneffectong?:Ameta- creatures.PsychologicalBulletin,105(1),156–166. analysis.Intelligence,41(6),802–807. 
Miller,G.A.(1956).Themagicalnumberseven,plusorminustwo:somelimits Teasdale,T.W.,&Owen,D.R.(2008).Seculardeclinesincognitivetestscores:A onourcapacityforprocessinginformation.PsychologicalReview,63(2), reversaloftheFlynnEffect.Intelligence,36(2),121–126. 81–97. Terman,L.M.(1917).TheStanfordrevisionandextensionoftheBinet-Simonscale Miller,L.T.,&Vernon,P.A.(1992).Thegeneralfactorinshort-termmemory, formeasuringintelligence.Vol.18,Baltimore,MD:Warwick&York. intelligence,andreactiontime.Intelligence,16(1),5–29. Terman,L.M.,&Childs,H.G.(1912). Atentativerevision andextensionof Murdoch,S.(2007).IQ:Asmarthistoryofafailedidea.Hoboken,NJ:JohnWiley theBinet-Simon measuring scaleof Intelligence. JournalofEducational &Sons. Psychology,3(2),61–74.