Neuropsychology
Copyright 2007 by the American Psychological Association
2007, Vol. 21, No. 4, 419–430
0894-4105/07/$12.00 DOI: 10.1037/0894-4105.21.4.419

Estimating the Percentage of the Population With Abnormally Low Scores (or Abnormally Large Score Differences) on Standardized Neuropsychological Test Batteries: A Generic Method With Applications

John R. Crawford, University of Aberdeen
Paul H. Garthwaite, The Open University
Catherine B. Gault, University of Aberdeen

Information on the rarity or abnormality of an individual's test scores (or test score differences) is fundamental in interpreting the results of a neuropsychological assessment. If a standardized battery of tests is administered, the question arises as to what percentage of the healthy population would be expected to exhibit one or more abnormally low test scores (and, in general, j or more abnormally low scores). Similar issues arise when the concern is with the number of abnormal pairwise differences between an individual's scores on the battery, or when an individual's scores on each component of the battery are compared with the individual's mean score. A generic Monte Carlo simulation method for tackling such problems is described (it requires only that the matrix of correlations between tests be available) and is contrasted with the use of binomial probabilities. The method is then applied to Index scores for the Wechsler Adult Intelligence Scale—Third Edition (WAIS–III; D. Wechsler, 1997) and Wechsler Intelligence Scale for Children—Fourth Edition (WISC–IV; D. Wechsler, 2003). Three computer programs that implement the methods are made available.

Keywords: neuropsychological assessment, multiple tests, single-case inference, Monte Carlo methods, prevalence of deficits

Information on the rarity or abnormality of test scores (or test score differences) is fundamental in interpreting the results of a neuropsychological assessment (Crawford, 2004; Strauss, Sherman, & Spreen, 2006). When attention is limited to a single test, this information is immediately available. If an abnormally low score is defined as one that falls below the 5th percentile, then, by definition, 5% of the population is expected to obtain a score that is lower (for example, in the case of Wechsler IQs or Indexes, scores of 75 or lower are below the 5th percentile). However, multiple tests are used in neuropsychological assessment.

Therefore, the important question arises as to what percentage of the healthy population would be expected to exhibit at least one abnormally low test score. This percentage will be higher than for any single test. Knowing it helps guard against overinference, that is, concluding that impairment is present on the basis of one "abnormally" low score when such a result is not at all uncommon in the general, healthy population. It is also important to know what percentage of the population would be expected to obtain two or more, or three or more, abnormal scores. In general, it is important to know what percentage of the population would be expected to exhibit j or more abnormally low scores.

One approach to this issue would be to tabulate the percentages of a test battery's standardization sample exhibiting j or more abnormal scores; that is, the question could be tackled empirically. However, to our knowledge, such base rate data have not been provided for test batteries commonly used in neuropsychology. Thus, it would be very useful if the required percentages could be estimated statistically. This could easily be achieved if the tests in the battery were either independent (i.e., uncorrelated) or perfectly correlated.

When tests are independent, the required percentages can be obtained using the binomial distribution (Ingraham & Aiken, 1996). For example, suppose a battery of six tests is administered and abnormality is defined as a score falling below the 5th percentile. Then the percentage of the population expected to show one or more abnormal scores is 26.49%. This result is obtained by finding the binomial probability for r or more "successes" (i.e., r = 1 if the interest is in one or more abnormal scores) in n = 6 trials (i.e., tests), with p, the probability of "success" (i.e., an abnormal score) on each individual trial, set at .05. For r = 1 this is simply $1 - (1 - .05)^6 = .2649$. Multiplying the resultant probability by 100 provides the estimate of the required percentage.

Note that the percentage obtained is less than the sum of the percentages for each test (which are all 5%, giving 30%). This is because a small number of individuals exhibiting an abnormal score on one of the tests will, by chance, also exhibit an abnormal score on one or more of the other tests. Such individuals contribute only once to the former percentage.

John R. Crawford and Catherine B. Gault, School of Psychology, College of Life Sciences and Medicine, King's College, University of Aberdeen, Aberdeen, United Kingdom; Paul H. Garthwaite, Department of Statistics, The Open University, Milton Keynes, United Kingdom.

Correspondence concerning this article should be addressed to John R. Crawford, School of Psychology, College of Life Sciences and Medicine, King's College, University of Aberdeen, Aberdeen AB24 3HN, United Kingdom.
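The binomial calculation described above can be checked with a few lines of Python. This is a minimal sketch that is valid only under the independence assumption. The function name `binomial_at_least` is hypothetical (ours, not the authors'), and `math.comb` requires Python 3.8 or later.

```python
from math import comb


def binomial_at_least(j: int, n: int, p: float = 0.05) -> float:
    """Probability of j or more 'successes' (abnormal scores) in n independent trials (tests)."""
    return sum(comb(n, r) * p**r * (1 - p) ** (n - r) for r in range(j, n + 1))


if __name__ == "__main__":
    # Six independent tests; an abnormal score is one below the 5th percentile (p = .05).
    for j in (1, 2, 3):
        print(f"j >= {j}: {100 * binomial_at_least(j, 6):.2f}%")
    # For contrast: naively summing the six single-test rates gives 6 x 5% = 30%.
    print(f"sum of single-test rates: {6 * 5}%")
```

Run as a script, it prints 26.49%, 3.28%, and 0.22% for j = 1, 2, and 3. Compare these with the 30% obtained by simply summing the six single-test rates.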
At the other extreme, if the tests were all perfectly correlated, the percentage of the population exhibiting one or more abnormal scores would be identical to the percentage for any individual test, that is, 5% (if any one test falls below the 5th percentile, then so will all others). The problem with both of these scenarios is that neither is at all realistic for tests commonly used in neuropsychological assessment. For example, the average correlations between the four Index scores of the Wechsler Adult Intelligence Scale—Third Edition (WAIS–III; Wechsler, 1997) and of the Wechsler Intelligence Scale for Children—Fourth Edition (WISC–IV; Wechsler, 2003) are 0.61 and 0.51, respectively. These figures were obtained from the correlations between Indexes presented in the relevant test manuals.

Ingraham and Aiken (1996) suggested that the use of binomial probabilities is justified even when the tests are not independent. Their proviso was that potential users be aware that the estimates provided are upper limits on the percentage of the population exhibiting abnormal scores. That is, they acknowledge that the binomial approach can overestimate the required percentages.

There are two potentially serious problems with Ingraham and Aiken's (1996) suggestion of using binomial probabilities.

First, if the correlations between tests are substantial (as they often are for neuropsychological measures), the binomial estimates of the percentage exhibiting at least one abnormally low score will be very inflated. Thus, a patient who exhibits one abnormally low score will look much less unusual than is truly the case.

Second, unfortunately, Ingraham and Aiken are wrong in asserting that the binomial estimates invariably provide upper limits on the percentage exhibiting abnormal scores. This assertion holds when j = 1 (i.e., when it is required to estimate the percentage exhibiting one or more abnormal scores). However, the binomial estimates will very commonly underestimate the percentages when j is greater than one, that is, when the interest is in the percentage exhibiting two or more, or three or more, abnormal tests. Thus, using binomial probabilities, a patient with (say) three abnormally low scores may be estimated to be very unusual when, in fact, an appreciable percentage of the general population will exhibit this number of abnormally low scores.

In view of these problems with the use of binomial probabilities, it is worth exploring alternative approaches. The focus of the present article is on obtaining base rate information for standardized test batteries. It is therefore assumed that the tests concerned follow a multivariate normal distribution (standardized batteries are designed to possess this very property). It might be thought that this assumption would allow us to calculate the required probabilities directly when the tests are not independent. However, obtaining a solution to this type of problem by direct analytic means is remarkably difficult (Genz, 1992; Ingraham & Aiken, 1996).

The difficulty arises because conditional distributions derived from a multivariate normal distribution do not, in general, have a closed form when the condition is an inequality (e.g., the score on Test X is abnormally low) rather than an equality (e.g., the score on Test X is 36). Numerical multiple integration of tail areas from these awkward conditional distributions would be necessary if the required probabilities were to be obtained through calculation.

Given these difficulties, an approach based on Monte Carlo simulation methods has obvious appeal. It is relatively easy to implement and, if a large number of Monte Carlo trials are run, it will achieve a high level of accuracy in estimating the quantities required. In the present article, we outline a generic Monte Carlo approach to answering questions of this type. The same approach also answers related questions that are relevant to neuropsychological assessment. Before setting out the method, we turn first to outlining these related questions.

Number of Abnormal Pairwise Differences Between Components of a Battery

Comparison of an individual's test scores against normative data is a basic part of the assessment process. However, in neuropsychology, such normative comparison standards should be supplemented with individual comparison standards when attempting to detect and quantify the extent of any acquired impairments (Crawford, 2004; Lezak, Howieson, Loring, Hannay, & Fischer, 2004). For example, a patient of high premorbid ability may score at or close to the mean of a normative sample, but this may still represent a serious decline for the individual concerned. Conversely, a patient may score well below the normative mean, but this may be entirely consistent with the individual's premorbid ability.

Because of this, emphasis is placed on the use of individual comparison standards. Most tests used in neuropsychology are at least moderately correlated in the general population. Large discrepancies in a patient's test profile therefore suggest an acquired impairment on those tasks that are performed relatively poorly (Crawford, 1992).

Base rate data are available for a number of tests used in neuropsychology to help clinicians quantify the abnormality of any discrepancies exhibited by their patients. Although these data provide invaluable information, an obvious issue arises: If a patient's profile of strengths and weaknesses is examined, then, by definition, multiple comparisons are involved. For many test batteries used in neuropsychology (e.g., for Index scores on the WAIS–III and WISC–IV), the percentage of the population expected to exhibit a given size of discrepancy between each possible test pair is readily available. It would also be useful to know what percentage would exhibit j or more abnormal pairwise differences overall.

To our knowledge, base rate data of this latter form are not available for batteries currently in use in neuropsychology. Fortunately, however, estimating the required percentages using Monte Carlo simulation methods is only a little more complicated than estimating the percentage expected to exhibit a given number of abnormally low scores.

Number of Abnormal Differences Between Components of a Battery and an Individual's Mean Score

An alternative to pairwise comparisons of an individual's scores is to obtain the individual's mean score on the components of the battery and compare each component with this mean. This approach, developed by Silverstein (1984), has a number of advantages over the former approach (Crawford & Allan, 1996). These advantages are perhaps most evident with large batteries, in which Silverstein's approach reduces the number of comparisons involved to manageable proportions. For example, if there are 10 tests in a battery, there are 45 possible pairwise comparisons but only 10 comparisons against the mean. Even with a smaller number of components, Silverstein's (1984) method has advantages, because it provides a common individualized comparison standard for each test.
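The sketch below makes the two problems with the binomial approach concrete. It compares binomial estimates with a small Monte Carlo simulation for six tests that all correlate .7. It is an illustration under stated assumptions, not the authors' program:

- It uses NumPy and Python's `statistics.NormalDist`.
- It draws 200,000 cases with an arbitrary seed.
- The function names are hypothetical.
- It generates equicorrelated scores with a one-factor construction. This shortcut works only when every correlation equals the same non-negative value. It is used here in place of the general Choleski approach described in the Method section.

```python
from math import comb
from statistics import NormalDist

import numpy as np


def binomial_at_least(j, n, p=0.05):
    """Binomial P(j or more of n independent tests are abnormal)."""
    return sum(comb(n, r) * p**r * (1 - p) ** (n - r) for r in range(j, n + 1))


def simulated_at_least(j, k, rho, pct=0.05, n_cases=200_000, seed=1996):
    """Monte Carlo P(j or more of k equicorrelated normal tests fall below the pct quantile).

    One-factor construction: X_i = sqrt(rho) * F + sqrt(1 - rho) * E_i gives every
    pair of tests correlation rho (valid only for 0 <= rho <= 1, all pairs equal).
    """
    rng = np.random.default_rng(seed)
    cutoff = NormalDist().inv_cdf(pct)             # about -1.645 for the 5th percentile
    common = rng.standard_normal((n_cases, 1))     # shared factor F, one per case
    unique = rng.standard_normal((n_cases, k))     # test-specific parts E_i
    scores = np.sqrt(rho) * common + np.sqrt(1 - rho) * unique
    n_low = (scores < cutoff).sum(axis=1)          # abnormal scores per simulated case
    return float(np.mean(n_low >= j))


if __name__ == "__main__":
    for j in (1, 2):
        print(f"j >= {j}: binomial {100 * binomial_at_least(j, 6):.2f}%, "
              f"simulated (rho = .7) {100 * simulated_at_least(j, 6, 0.7):.2f}%")
```

With these settings, the simulated values should land close to the 14.85% and 7.26% that the article reports for this condition. Both relations described in the text should appear: the simulated value is below the binomial 26.49% for j = 1 and above the binomial 3.28% for j = 2.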
In arriving at a formulation, neuropsychologists have to integrate the information from a profile analysis of a given battery with other test data and with information from a host of other sources (e.g., the medical history, the clinical interview, behavioral observations). Anything that eases this burden is to be encouraged. For example, we agree with Longman's (2004) suggestion that Silverstein's approach should also be preferred over the pairwise approach when analyzing WAIS–III Index scores. Longman prepared a table that allows neuropsychologists to estimate the abnormality of the deviations of Index scores from patients' mean Index scores. An equivalent table for the WISC–IV has been provided by Flanagan and Kaufman (2004). These tables provide very useful information to aid test interpretation. As part of the present study, we supplement them by examining the percentage of the population expected to exhibit j or more abnormal deviations (see next section).

Estimating Base Rates for the Number of Abnormal Scores and Score Differences for WAIS–III and WISC–IV Indexes

Analysis of the WAIS–III can be conducted at the level of the subtests, Index scores, or IQs. A strong case can be made for basing the primary analysis of these scales on the Index scores. Index scores are more reliable than the individual subtests (and only marginally less reliable than the IQs). In addition, because they reflect the factor structure of the instrument, they have superior construct validity to the IQs. Moreover, empirical evidence indicates that such factor-based composites are better able to differentiate between healthy and impaired performance than either the IQs or measures of subtest scatter (Crawford, Johnson, Mychalkiw, & Moore, 1997).

In view of the foregoing points, in the present study we focus on WAIS–III Index scores rather than subtests. We quantify three percentages:
(a) the percentage of the population expected to exhibit j or more abnormally low Index scores;
(b) the percentage expected to exhibit j or more abnormal pairwise differences between Index scores;
(c) the percentage expected to exhibit j or more abnormal deviations from an individual's mean Index score.
The same data are generated for the WISC–IV.

Method

The Generic Simulation Method

To conduct the simulations, neuropsychologists need to have access to R, the k × k matrix of correlations between the k components (i.e., subtests or Indexes, etc.) of the test battery. Fortunately, this is the only information that is required. For most standardized batteries of tests, this correlation matrix will be available in the user manual or an accompanying technical manual.

The starting point for the simulation (Step 1) is to obtain the Choleski decomposition of R. The Choleski decomposition can be seen as the square root of R. It takes the form of a lower triangular matrix, L, such that $R = LL^{\top}$. Macros for widely used spreadsheet or statistical packages are available to perform this decomposition, as are algorithms for most computer languages. This step is only performed once for a given simulation.

The next step (Step 2) is to generate a random vector, $\mathbf{z}$, of k independent standard normal variates, where k is the number of tests in the battery of interest. All standard statistical and spreadsheet packages can generate random variates of this form, either directly or with the use of widely available macros. This vector is then multiplied by the lower triangular Choleski matrix (Step 3), $\mathbf{x} = L\mathbf{z}$. The result, $\mathbf{x}$, is an observation from the desired multivariate normal distribution with mean vector 0 and covariance matrix R.

Steps 2 and 3 are repeated a large number of times. In the examples used in the present article, we draw one million vectors (i.e., representing the scores of one million cases) for each problem studied. An example of generating an observation using this method is provided in the Appendix.

Note that the observations obtained by this process have means and standard deviations of 0 and 1, respectively. It would be easy to obtain observations with the same means and standard deviations as the tests in the battery of interest. One would simply multiply the observations by the desired standard deviation (say 15, as in the Wechsler scales) and add the desired mean (say 100). However, there is absolutely no need to do this: The results obtained would be identical because the transformation is linear.

The above steps are employed regardless of the question being posed. We turn now to the procedures adopted to address the three problems identified earlier.

Estimating the Percentage of the Population Exhibiting j or More Abnormally Low Scores

When the aim is to estimate the percentage of the population that would exhibit j or more abnormally low scores, the procedure is very straightforward. The first step is to decide the criterion used to define an abnormally low score on a subtest or Index of the battery and to translate this into a standard normal deviate. In the present examples, we define an abnormally low score as a score that falls below the 5th percentile, so the corresponding standard normal deviate is −1.645. A researcher or clinician may prefer to define an abnormally low score as one that falls below the 10th percentile, in which case the standard normal deviate required is −1.282. Alternatively, if an abnormally low score is defined as a score more than one standard deviation below the mean, then the value required is simply −1.0.

The number of abnormal scores obtained on each Monte Carlo trial (i.e., for each simulated member of the population) is recorded. A tally is kept of the number of cases exhibiting j abnormal scores across the course of the simulation, and from this the percentage exhibiting j or more abnormal scores is obtained. For example, say that only 1,000 trials were run, that 200 cases exhibited one abnormal score each, and that a further 100 cases exhibited two abnormal scores. The estimate from this simulation would then be that 30% of the population will exhibit one or more abnormal scores ((200 + 100)/1,000). Likewise, 10% will exhibit two or more abnormal scores (100/1,000).

Estimating the Percentage of the Population Exhibiting j or More Abnormal Pairwise Differences Between Scores

To estimate the percentage of the population exhibiting j or more abnormal differences between pairs of scores, it is necessary to calculate the standard deviation of the difference between each possible pair of tests. If there are k tests, then there are k(k − 1)/2 possible pairs (e.g., if there are 6 tests, then there are 15 pairs). The variance of the difference between two unit-variance scores is $1 + 1 - 2r_{XY}$. Hence the formula for the standard deviation of the difference ($s_{X-Y}$) between standard normal scores is

$$s_{X-Y} = \sqrt{2 - 2r_{XY}}, \qquad (1)$$

where $r_{XY}$ is the correlation between any given pair of tests in the battery.

The standard deviation of the difference for each pair of tests is then multiplied by the standard normal deviate corresponding to the criterion adopted for an abnormal difference. For example, an abnormal difference might be defined as one that would occur in less than 5% of the population regardless of sign (i.e., regardless of the direction of the difference). The value required is then 1.960. (Note that, unlike the first scenario, it is the absolute value of the difference that is evaluated in the Monte Carlo phase of the analysis.)

In the Monte Carlo phase of the analysis, the number of abnormal differences is recorded on each trial (i.e., for each case). These counts are accumulated across all trials (cases), and the number of cases exhibiting j or more abnormal differences is then expressed as a percentage.

Estimating the Percentage of the Population Exhibiting j or More Abnormal Scores Relative to Individuals' Mean Scores on the Battery

For this analysis, it is necessary to calculate the standard deviation of the difference between individuals' mean scores on the battery and each of the subtests or Indexes contributing to the mean (Crawford, Allan, McGeorge, & Kelly, 1997; Silverstein, 1984). When, as in the present case, the scores are standard normal scores, the formula is

$$s_{M-X} = \sqrt{1 + \bar{R} - 2\bar{m}_X}, \qquad (2)$$

where $\bar{R}$ is the mean of the elements in the full correlation matrix (including the unities in the diagonal). $\bar{m}_X$ is the mean of the column (or, equivalently, the row) of the matrix that contains Test X in the leading diagonal (i.e., the correlation of Test X with itself). Formula 2 follows because, for standard normal scores, the variance of the mean M is $\bar{R}$, the covariance of M with X is $\bar{m}_X$, and the variance of X is 1.

Formula 2 is applied k times to calculate the standard deviation of the difference between each of the k tests and the mean test score. As for pairwise differences, each of these standard deviations is then multiplied by the standard normal deviate corresponding to the criterion used to define abnormality. In the Monte Carlo phase of the analysis, the number of abnormal deviations from a case's mean score is recorded on each trial. These counts are accumulated across all trials (cases), and the numbers of cases exhibiting j or more abnormal deviations are then expressed as percentages.

Estimating the Percentage of the Population Exhibiting j or More Abnormal Scores and Score Differences on the WAIS–III and WISC–IV

The methods described above were applied to the four Index scores of the WAIS–III and WISC–IV. For each battery, the averaged correlation matrix for the Indexes (i.e., averaged across all age bands) was extracted from the relevant manual and used as the input for the simulations. For both batteries, we tabulated three percentages of the normal population: those expected to exhibit j or more abnormally low scores, j or more abnormally large pairwise differences, and j or more abnormally large deviations from individuals' mean Index scores.

Results

Illustration of the Role of the Number of Tests in a Battery and Their Intercorrelations in Determining the Percentage of the Population Exhibiting j or More Abnormally Low Scores

Before tabulating results for the WAIS–III and WISC–IV, we first use the Monte Carlo method for a more didactic purpose. Specifically, we illustrate how the number of tests in a battery, and the intercorrelations between them, determine the percentage of the population exhibiting j or more abnormally low scores. The number of tests in the battery was set at 4, 6, or 10, and the average correlation between the tests was set at 0, 0.3, 0.5, or 0.7. For simplicity, this was achieved by setting all intercorrelations to the same value. As in all simulation results that follow, one million Monte Carlo trials were run (i.e., one million cases were drawn) for each combination of these two factors. The results of this procedure are presented in Table 1.

Table 1
Percentage of Population Expected to Exhibit j or More Abnormally Low Scores (<5th Percentile) as a Function of the Number of Tests (k) in the Battery and the Average Correlation (r̄) Between Tests

                 Percentage exhibiting j or more abnormally low scores
r̄     k      1      2      3      4      5      6      7      8      9     10
0     4  18.53   1.40   0.05   0.00
0     6  26.49   3.29   0.22   0.01   0.00   0.00
0    10  40.10   8.64   1.15   0.10   0.01   0.00   0.00   0.00   0.00   0.00
.3    4  16.38   3.09   0.52   0.06
.3    6  22.07   5.89   1.64   0.41   0.09   0.01
.3   10  30.74  11.54   4.67   1.94   0.79   0.31   0.11   0.04   0.01   0.00
.5    4  14.42   4.14   1.22   0.26
.5    6  18.64   6.89   2.88   1.19   0.42   0.10
.5   10  24.67  11.55   6.21   3.52   2.02   1.13   0.61   0.29   0.12   0.03
.7    4  12.07   4.99   2.22   0.80
.7    6  14.85   7.26   4.09   2.31   1.21   0.49
.7   10  18.55  10.54   6.94   4.83   3.40   2.38   1.62   1.06   0.60   0.26

It can be seen from Table 1 that, as expected, the percentage of the population expected to exhibit j or more abnormal scores increases markedly with the number of tests in the battery. For example, when the average correlation between tests is 0.3 and there are four tests in the battery, 16.38% of the population are expected to exhibit one or more abnormally low test scores. This rises to 30.74% when the battery consists of 10 tests.

Table 1 also illustrates that the correlations between tests in the battery strongly influence the percentage expected to exhibit abnormally low tests. However, the direction of this effect differs as a function of j. For the percentage exhibiting one or more abnormal tests (i.e., when j = 1), the percentages fall as the average correlation rises. For example, if the battery consists of six tests, the percentage expected to exhibit an abnormally low test is 26.49% when the tests are uncorrelated (i.e., average correlation = 0). It falls to 14.85% when the average correlation is 0.7. In contrast, when j > 1, the effect is reversed. Using the same example of a battery of six tests, 3.29% are expected to exhibit two or more abnormal tests when the average correlation is zero, but this rises to 7.26% for an average correlation of 0.7.

When the average correlation between tests is set at 0 (i.e., the tests are independent), the results from the Monte Carlo simulations are the same as those obtained by calculating binomial probabilities, except for trivial differences stemming from Monte Carlo variation. For example, for a battery of six tests, the Monte Carlo estimate of 26.49% for j = 1 is identical to the binomial estimate to two decimal places. Therefore, comparing the results for different values of r̄ in Table 1 shows that binomial probabilities give poor approximations to the percentage of the population expected to exhibit j or more abnormal test scores if the tests are moderately to highly correlated.

The reason for this can be illustrated with a simple thought experiment. Take a battery consisting of a moderate number (e.g., 6) of highly correlated tests, and suppose a case's score on one of these tests (Test X) is well within the normal range. Then, for each of the other tests, the chance of an abnormal score will be well below the 5% probability assigned under the assumption of independence. As a result, the binomial approach overestimates the percentage of such cases that will exhibit one or more abnormal scores.

Now consider the converse situation, that is, when a case's score on Test X is in the abnormal range. Here the probability that the other tests will also be abnormal will be well above the 5% probability assigned when independence is assumed. This will not affect the percentage of cases exhibiting one or more abnormal test scores: If a case has already contributed to this percentage by exhibiting an abnormal score on Test X, it is irrelevant how many of the other tests are also abnormal. However, this latter feature does explain a flip in the binomial estimates when, as in the present example, there is a small or moderate number of tests. They move from overestimating the percentage exhibiting one or more abnormal scores to underestimating the percentage exhibiting two or more (and three or more, etc.).

If there are a large number of moderately to highly correlated tests, the binomial estimates of the percentage exhibiting j or more abnormal tests will tend to be overestimates even for j = 2. For example, if there are 20 tests in a battery in which the average correlation is 0.7, the binomial estimate of the percentage exhibiting two or more abnormal tests is 26.41%, whereas the Monte Carlo estimate is 15.66%. However, the binomial estimates will still flip to providing underestimates for larger values of j. To continue with the example, the binomial estimate of the percentage exhibiting three or more abnormal tests is 7.54%, whereas the Monte Carlo estimate is 11.65%.

Estimated Percentages of the Population Exhibiting at Least j Abnormally Low Index Scores on the WAIS–III and WISC–IV

The results of estimating the percentage of the population exhibiting at least j abnormally low Index scores on the WAIS–III are presented in Table 2. The equivalent results for the WISC–IV are presented in Table 3. In both cases, a range of increasingly stringent definitions of abnormality was applied. These ranged from a score more than one standard deviation below the mean (i.e., below the 15.9th percentile) to a score below the 1st percentile. Our own preference is to define abnormality as a score falling below the 5th percentile, and for this reason we have presented these results in bold. These different definitions of abnormality should cover most requirements.

Table 2
Percentage of the Normal Population Expected to Exhibit at Least j Abnormally Low Index Scores on the Wechsler Adult Intelligence Scale—Third Edition (WAIS–III)

               Percentage exhibiting j or more
               abnormally low WAIS–III Index scores
Criterion for
abnormality          1       2       3       4
<15.9th          34.43   17.40    8.57    3.20
<10th            23.74   10.34    4.53    1.49
<5th             13.21    4.64    1.74    0.48
<2nd              5.88    1.58    0.49    0.11
<1st              3.12    0.69    0.19    0.04

Note. Increasingly stringent definitions of abnormality are used, ranging from <15.9th percentile (i.e., more than 1 SD below the mean) to below the 1st percentile. We define abnormality as a score falling below the 5th percentile, and for this reason we have presented these results in bold.

Table 3
Percentage of the Normal Population Expected to Exhibit at Least j Abnormally Low Index Scores on the Wechsler Intelligence Scale for Children—Fourth Edition (WISC–IV)

               Percentage exhibiting j or more
               abnormally low WISC–IV Index scores
Criterion for
abnormality          1       2       3       4
<15.9th          37.07   17.01    7.31    2.20
<10th            25.67    9.81    3.66    0.93
<5th             14.29    4.21    1.28    0.26
<2nd              6.33    1.35    0.33    0.05
<1st              3.34    0.56    0.11    0.01

Note. Increasingly stringent definitions of abnormality are used, ranging from <15.9th percentile (i.e., more than 1 SD below the mean) to below the 1st percentile. We define abnormality as a score falling below the 5th percentile, and for this reason we have presented these results in bold.

It can be seen from Table 2 that, by definition, 5% of the population would be expected to exhibit an abnormally low score on any single WAIS–III Index. Nevertheless, a sizeable percentage of the population (13.21%) would be expected to exhibit one or more abnormally low Index scores out of the possible four. However, the percentages fall off fairly steeply when moving to two or more abnormal scores.

Turning to the results for the WISC–IV, it will again not be very unusual for a member of the normal population to exhibit at least one abnormal Index score (defined for present purposes as below the 5th percentile). Again, the percentages fall off fairly steeply when moving to two or more abnormal scores and beyond. The results for the WISC–IV are similar to those for the WAIS–III. This is to be expected, because both the pattern of correlations between Indexes and the absolute magnitude of the correlations are broadly similar for both batteries.

Having said that, the intercorrelations between the WISC–IV Indexes are somewhat lower than those for the WAIS–III. As a result, the percentages exhibiting j or more abnormally low scores on the WISC–IV are somewhat higher than those for the WAIS–III when j = 1 (14.29% vs. 13.21%). They are somewhat lower for j ≥ 2 (e.g., the figures for j = 2 are 4.21% vs. 4.64%).

Estimated Percentages of the Population Exhibiting j or More Abnormally Large Pairwise Differences Between Index Scores on the WAIS–III and WISC–IV

The results of estimating the percentage of the population exhibiting j or more abnormally large pairwise differences between WAIS–III Index scores are presented in Table 4. The equivalent results for the WISC–IV appear in Table 5. For both batteries, a sizeable percentage of the population is expected to exhibit one or more abnormal pairwise differences (20.20% for the WAIS–III, 20.16% for the WISC–IV).

Table 4
Percentage of the Normal Population Expected to Exhibit j or More Abnormal Pairwise Differences, Regardless of Sign, Between Index Scores on the Wechsler Adult Intelligence Scale—Third Edition (WAIS–III)

               Percentage exhibiting j or more abnormal pairwise
               differences (regardless of sign) between WAIS–III Indexes
Criterion for
abnormality          1       2       3       4       5       6
<25%             65.65   47.68   28.10    7.52    1.02    0.01
<15%             47.28   28.03   12.42    2.14    0.14    0.00
<10%             35.15   17.68    6.34    0.80    0.03    0.00
<5%              20.20    7.69    1.97    0.16    0.00    0.00
<2%               9.15    2.41    0.43    0.02    0.00    0.00
<1%               4.90    0.98    0.14    0.00    0.00    0.00

Note. Increasingly stringent definitions of abnormality are used, ranging from a difference exhibited by less than 25% of the population to a difference exhibited by less than 1%. Our preference is to define an abnormal difference as one exhibited by less than 5% of the population, and for this reason we have presented these results in bold.

Table 5
Percentage of the Normal Population Expected to Exhibit j or More Abnormal Pairwise Differences, Regardless of Sign, Between Index Scores on the Wechsler Intelligence Scale for Children—Fourth Edition (WISC–IV)

               Percentage exhibiting j or more abnormal pairwise
               differences (regardless of sign) between WISC–IV Indexes
Criterion for
abnormality          1       2       3       4       5       6
<25%             65.56   47.61   28.03    7.61    1.06    0.01
<15%             47.21   28.01   12.45    2.19    0.15    0.00
<10%             35.01   17.71    6.35    0.83    0.03    0.00
<5%              20.16    7.71    2.00    0.17    0.00    0.00
<2%               9.12    2.43    0.44    0.02    0.00    0.00
<1%               4.88    0.99    0.14    0.00    0.00    0.00

Note. Increasingly stringent definitions of abnormality are used, ranging from a difference exhibited by less than 25% of the population to a difference exhibited by less than 1%. Our preference is to define an abnormal difference as one exhibited by less than 5% of the population, and for this reason we have presented these results in bold.

Ingraham and Aiken (1996) only considered the use of binomial probabilities when estimating the number of abnormally low scores on a battery. Even so, it is worth considering how well binomial probabilities can model the percentages of the population expected to exhibit j or more abnormal score differences. Suppose an abnormal pairwise difference is defined as one exhibited by less than 5% of the healthy population. The required binomial probabilities for the six pairwise comparisons (n = 6, p = .05), multiplied by 100 to express them as percentages, are 26.49%, 3.28%, and 0.22% for j = 1 to 3. The percentages for j > 3 are very small and so are not reported.

Comparing these estimates with those obtained by Monte Carlo simulation reveals that, for both the WAIS–III and WISC–IV, the binomial approach has fairly poor accuracy. As with abnormally low scores, the binomial approach overestimates the percentages exhibiting one or more abnormal pairwise differences when j = 1 and underestimates the percentages when j > 1. For example, for the WAIS–III, the binomial estimate for j = 1 is 26.49% compared with the Monte Carlo estimate of 20.20%. For j = 2, the binomial estimate is 3.28% compared with 7.69%.
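The generic method described in the Method section can be sketched compactly with NumPy. Its ingredients are the Choleski factor of R, vectors of standard normal variates, and counts of abnormal results: low scores, pairwise differences via Formula 1, and deviations from the case's own mean via Formula 2. This is a hedged illustration rather than the authors' programs:

- The function name `simulate_base_rates` is hypothetical.
- It draws 200,000 cases rather than one million and uses an arbitrary seed.
- As in the text, it applies a one-tailed criterion to low scores and two-tailed criteria to differences and deviations.
- The example battery is a hypothetical equicorrelated matrix (k = 4, all r = .3), not a Wechsler correlation matrix.

```python
from statistics import NormalDist

import numpy as np


def simulate_base_rates(R, n_cases=200_000, pct=0.05, seed=2007):
    """Monte Carlo base rates for a battery whose correlation matrix is R.

    Returns three arrays of percentages, indexed from j = 1 upward:
    (a) j or more abnormally low scores (one-tailed, below the pct quantile),
    (b) j or more abnormal pairwise differences (two-tailed, Formula 1),
    (c) j or more abnormal deviations from the case's own mean (two-tailed, Formula 2).
    """
    R = np.asarray(R, dtype=float)
    k = R.shape[0]
    rng = np.random.default_rng(seed)

    # Step 1 (once per simulation): lower-triangular Choleski factor, R = L @ L.T.
    L = np.linalg.cholesky(R)
    # Step 2: independent standard normal variates, one row per simulated case.
    Z = rng.standard_normal((n_cases, k))
    # Step 3: each row equals (L z)^T, a draw from N(0, R).
    X = Z @ L.T

    z_low = NormalDist().inv_cdf(pct)          # about -1.645 for pct = .05
    z_abs = NormalDist().inv_cdf(1 - pct / 2)  # about  1.960 for pct = .05

    # (a) Abnormally low scores.
    n_low = (X < z_low).sum(axis=1)

    # (b) Pairwise differences: sd of X - Y is sqrt(2 - 2 r_XY), Formula 1.
    a, b = np.triu_indices(k, 1)               # the k(k - 1)/2 pairs
    sd_pair = np.sqrt(2 - 2 * R[a, b])
    n_pair = (np.abs(X[:, a] - X[:, b]) > z_abs * sd_pair).sum(axis=1)

    # (c) Deviations from the mean: sd of M - X is sqrt(1 + Rbar - 2 mbar_X), Formula 2.
    r_bar = R.mean()                            # mean of all elements, diagonal included
    m_bar = R.mean(axis=0)                      # mean of each test's column
    sd_dev = np.sqrt(1 + r_bar - 2 * m_bar)
    deviations = X.mean(axis=1, keepdims=True) - X
    n_dev = (np.abs(deviations) > z_abs * sd_dev).sum(axis=1)

    def at_least(counts, m):
        # Percentage of cases with j or more abnormal results, for j = 1..m.
        return np.array([100 * np.mean(counts >= j) for j in range(1, m + 1)])

    return at_least(n_low, k), at_least(n_pair, len(a)), at_least(n_dev, k)


if __name__ == "__main__":
    # Hypothetical equicorrelated battery matching one Table 1 condition (k = 4, r = .3).
    k, r = 4, 0.3
    R = np.full((k, k), r)
    np.fill_diagonal(R, 1.0)
    low, pair, dev = simulate_base_rates(R)
    print("low  :", np.round(low, 2))   # Table 1 reports 16.38, 3.09, 0.52, 0.06
    print("pairs:", np.round(pair, 2))
    print("dev  :", np.round(dev, 2))
```

All three counts come from the same simulated cases, which gives two checks. First, the low-score percentages should approximate the corresponding row of Table 1. Second, in each array the percentages summed over j estimate 100 times the expected number of abnormal results per case: 20, 30, and 20 here. Matching these sums confirms that Formulas 1 and 2 give each single comparison a 5% rate.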
Estimated Percentages of the Population Exhibiting j or More Abnormally Large Deviation Scores Relative to Their Mean Index Score on the WAIS–III and WISC–IV

The estimated percentages of the population exhibiting j or more abnormally large deviation scores relative to their mean Index score on the WAIS–III are presented in Table 6; the equivalent results for the WISC–IV appear in Table 7. As was the case for pairwise differences, when an abnormal deviation score for each individual Index is defined as one exhibited by less than 5% of the population, a sizeable percentage of the healthy population (16.74% for the WAIS–III, 16.71% for the WISC–IV) is expected to exhibit one or more abnormal deviations. Again, however, the percentages fall off rapidly as j increases. For example, for the WAIS–III, only 3.21% of the population is expected to exhibit two or more abnormal deviations (the corresponding figure for the WISC–IV is 3.22%).

Table 7
Percentage of the Normal Population Expected to Exhibit j or More Abnormal Wechsler Intelligence Scale for Children—Fourth Edition (WISC–IV) Index Scores Relative to Individuals’ Mean Index Scores (Regardless of Sign)

Criterion for    Percentage exhibiting j or more abnormal deviation scores
abnormality      (regardless of sign) on the WISC–IV
                 j = 1    j = 2    j = 3    j = 4
< 25%            61.16    32.83     5.01     0.94
< 15%            42.22    16.20     1.33     0.20
< 10%            30.49     8.97     0.45     0.06
< 5%             16.71     3.22     0.07     0.01
< 2%              7.21     0.82     0.01     0.00
< 1%              3.72     0.29     0.00     0.00

Note. Increasingly stringent definitions of abnormality are used, ranging from a difference exhibited by less than 25% of the population to a difference exhibited by less than 1%. Our preference is to define an abnormal deviation as one exhibited by less than 5% of the population, and for this reason we have presented these results (the < 5% row) in bold.

Discussion

An Example of the Application of the Present Approach
Using the Results for WAIS–III Index Scores

To illustrate the potential applications of the present approach, we take the hypothetical case of a patient who had suffered a severe traumatic brain injury and had been administered the WAIS–III as part of a more comprehensive neuropsychological assessment. The patient’s scores on the WAIS–III Indexes were as follows: Verbal Comprehension (VC) = 118; Perceptual Organization (PO) = 106; Working Memory (WM) = 78; and Processing Speed (PS) = 68. Suppose we define an abnormally low score as one that falls below the 5th percentile (Index scores of 75 or lower are below the 5th percentile). Then one of the patient’s scores (PS) would be classified as abnormally low. Referring to Table 2, it can be seen that 13.21% of the population is expected to exhibit one or more abnormally low scores. Thus, although the patient’s PS score is unusually low, it is by no means highly unusual for a member of the general population to exhibit an abnormally low score on one of the four Indexes.

Turning to pairwise comparisons of the patient’s Index scores, suppose we define an abnormal pairwise difference as one that is exhibited by less than 5% of the population. By consulting Table B.2 of the WAIS–III manual, it can be concluded that four of the patient’s six pairwise comparisons are abnormal (VC vs. WM, VC vs. PS, PO vs. WM, and PO vs. PS). For example, a discrepancy of 26 or more points between VC and WM is required to meet our chosen criterion for abnormality. That is, from Table B.2 of the manual it can be seen that 5.4% of the standardization sample exhibited a difference of 25 or more points between VC and WM, whereas only 4.3% exhibited a difference of 26 or more points. The discrepancy between the patient’s scores on VC and WM is 40 points and therefore easily meets the criterion for abnormality. Referring to Table 4, we can see that only 0.16% of the population is expected to exhibit four or more abnormal pairwise differences.

Turning to examination of the deviations from the patient’s mean Index score, suppose that, as previously, we use as our criterion a deviation that is expected to be exhibited by less than 5% of the population. The mean Index score is 92.50; therefore, the deviations from this mean are 25.50 for VC, 13.50 for PO, –14.50 for WM, and –24.50 for PS. Referring to Longman’s (2004) estimated base rate data for deviations, two of the Index scores qualify as abnormally large according to our chosen criterion (VC is abnormally high, PS abnormally low). For example, for the PS Index, the absolute deviation must exceed 17.73 to meet the chosen criterion for abnormality. Referring to Table 6, it can be seen that only 3.21% of the population is expected to exhibit two or more abnormal deviations from their mean Index scores.

Table 6
Percentage of the Normal Population Expected to Exhibit j or More Abnormal Wechsler Adult Intelligence Scale—Third Edition (WAIS–III) Index Scores Relative to Individuals’ Mean Index Scores (Regardless of Sign)

Criterion for    Percentage exhibiting j or more abnormal deviation scores
abnormality      (regardless of sign) on the WAIS–III
                 j = 1    j = 2    j = 3    j = 4
< 25%            61.19    32.86     4.91     0.95
< 15%            42.27    16.21     1.28     0.20
< 10%            30.51     8.96     0.43     0.06
< 5%             16.74     3.21     0.07     0.01
< 2%              7.21     0.81     0.01     0.00
< 1%              3.73     0.28     0.00     0.00

Note. Increasingly stringent definitions of abnormality are used, ranging from a difference exhibited by less than 25% of the population to a difference exhibited by less than 1%. Our preference is to define an abnormal deviation as one exhibited by less than 5% of the population, and for this reason we have presented these results (the < 5% row) in bold.

In this example, the indications of abnormally large pairwise discrepancies and deviation scores survived the further scrutiny applied; that is, it is estimated that few healthy individuals would exhibit this number of abnormal score differences. Note also that the client’s pattern of strengths and weaknesses is consistent with a head injury. This combination of profile and degree of abnormality gives a high degree of confidence in a conclusion that the client has suffered significant acquired impairment.

It will be appreciated, however, that this need not be the case. For example, suppose an individual exhibited only one abnormally large pairwise difference or one abnormally large deviation. Such results will not be uncommon in the healthy population: 20.2% are expected to exhibit one or more abnormally large pairwise differences (see Table 4), and 16.74% are expected to exhibit one or more abnormally large deviation scores (see Table 6). Even when the weakness in such a case is in line with clinical expectations (e.g., PS or WM in a client with head injury), much more in the way of converging evidence from other sources is required before one could be confident in inferring the presence of acquired impairment. In cases when there is little in the way of theory or prior empirical evidence to specify a likely pattern of impairment, knowledge of the base rates for the number of abnormal scores or score differences assumes particular importance.

Before concluding this example, it should be noted that making use of the data provided in the present article adds little to the time taken to analyze a patient’s profile. That is, the time-consuming aspects are those that precede use of the tables presented here, and these will already form a part of many neuropsychologists’ practice (i.e., most neuropsychologists are aware that they should be concerned with the degree of abnormality of any scores or score differences present in a patient’s profile). Moreover, we are not suggesting that neuropsychologists use both the pairwise and deviation approaches to examining discrepancies. Our own preference is for the approach of Longman (2004; see also Flanagan & Kaufman, 2004), but we recognize that many may prefer, or at least be more familiar with, the pairwise method adopted in the WAIS–III and WISC–IV manuals.

To further ease use of these data, we have written two computer programs, one for the WAIS–III (WAISIII_Percent_Abnorm.exe) and one for the WISC–IV (WISCIV_Percent_Abnorm.exe). The programs require that users enter an individual’s scores on the four Indexes and select a criterion for abnormality. For example, they might choose to define an abnormally low score as a score falling below the 5th percentile. The programs automate the process of determining whether each of an individual’s Index scores is abnormally low and whether the individual exhibits abnormally large pairwise differences or abnormally large deviations from the mean Index score. The programs report the percentage of the population expected to exhibit lower scores for each Index and either the percentage of the population expected to obtain larger pairwise differences or the percentage expected to obtain larger deviations (the user selects between these two options). Each program then reports the number of the individual’s scores that meet the chosen criterion for abnormality and the percentage of the population expected to exhibit this number, or more, of abnormally low scores. The reporting of abnormally large pairwise differences or deviations follows a similar format. That is, if the user has chosen to analyze pairwise differences, the program records the number of the individual’s pairwise differences that meet the criterion for abnormality and reports the percentage of the population expected to exhibit this number, or more, of abnormal differences. In summary, these programs perform all the clerical work required to examine the abnormality of Index scores as well as Index score differences and deviations. See a later section for details of where to download these programs.

It should be noted that, to estimate the abnormality of each individual pairwise difference, the programs use Formula 1; that is, abnormality is estimated statistically rather than by using empirical base rates as found in the relevant test manuals (e.g., Table B.2 of the WAIS–III manual). The statistical approach is in keeping with the use of statistical, rather than empirical, methods to estimate the abnormality of deviations from individuals’ mean Index scores in the present article (Formula 2) and in the tables provided by Longman (2004) and Flanagan and Kaufman (2004).

The result is that, on occasion, the number of pairwise differences or deviations estimated to be abnormal will differ depending on whether the program or the empirical base rate data in the relevant manual are used. For example, take the earlier example in which a traumatic brain injury case exhibited four abnormal pairwise differences between the case’s Index scores. Suppose, however, that the case scored 105 rather than 106 on the PO Index. Using the statistical method (as implemented in the WAIS and WISC programs), the case is still classified as exhibiting four abnormal pairwise differences: the difference of 27 points between the PO and WM Indexes remains abnormal. It is estimated that 3.64% of the healthy population will exhibit a difference of this size or larger. This is less extreme than in the original example (3.00%), but it still meets the selected criterion for abnormality. However, using the empirical base rates, this difference is not classified as abnormal (from Table B.2, 5.2% of the normative sample exhibited a difference of this magnitude or larger, compared with 4.6% for the difference of 28 points in the original example).

A number of factors contribute to differences between empirical and statistical base rates; for example, statistical base rates are unaffected by the inevitable small “bumps and wiggles” that will occur in empirical distributions even when normative samples are large. However, the most important factor is that a number of people in the normative sample will obtain the same difference score as that obtained by the individual of interest. When empirical base rates are employed, the percentage recorded and read off by the user is typically the percentage equaling or exceeding this difference. In contrast, because the statistical approach treats the data as continuous, it will, in essence, credit half of those obtaining the same difference score as having a larger difference and the other half as having a smaller difference; thus, the percentages will normally be a little lower than those obtained from empirical rates. This is akin to the procedure used for forming standard percentiles; moreover, the same holds when estimating the abnormality of deviation scores using existing tables (as these too are derived statistically).

The Distinction Between the Abnormality of Differences and the Reliability of Differences

When examining differences in an individual’s score profile, a crucial distinction is that between the abnormality of the differences and the reliability of the differences (Crawford, 2004). If a difference is reliable, it is unlikely to have arisen as a result of measurement error in the tests; that is, the observed difference is liable to reflect a genuine difference. However, reliable differences can be very common in the cognitively intact population, particularly if the tests have high reliability. Thus reliable differences, on their own, cannot serve as a basis for inferring impairment on the test that is performed more poorly. The present article is solely focused on the abnormality of differences.

Use of These Methods in Group Research

The focus of the foregoing discussion has been on the use of the generic Monte Carlo method as an aid to interpretation of an individual patient’s profile. However, the method also has a number of applications in group-based research, either as a means of evaluating the results of an existing study or to inform decision making at the design stage. For example, in studies of the prevalence of neuropsychological deficits in various patient populations (e.g., multiple sclerosis, HIV, diabetes, mild cognitive impairment, patients who have undergone coronary artery bypass grafts), it is common practice to administer multiple neuropsychological tests to assess performance across a range of cognitive domains. The prevalence of deficits in these populations is then estimated by calculating the percentage of cases meeting a predefined criterion for impairment. The criteria used in such studies are usually of the form that a patient should exhibit j or more abnormally low scores on the battery, where both j and the definition of abnormality vary from study to study. For example, the criterion may be that a patient’s scores on at least three tests must be one standard deviation below the mean.

A number of authors have stressed the need to be concerned with the base rate of false positives in such studies (Grunseit, Perdices, Dunbar, & Cooper, 1994; Heaton, Miller, Taylor, & Grant, 2004; Iverson, White, & Brooks, 2006; Lewis, Maruff, & Silbert, 2004), and there are a number of vivid empirical demonstrations of the dangers when such base rates are ignored (de Rotrou et al., 2005; Lewis, Maruff, Silbert, Evered, & Scott, 2006a, 2006b; Palmer, Boone, Lesser, & Wohl, 1998).

As noted by most of these studies, one clear solution to this problem is to include a matched healthy control sample to obtain empirical base rates for the criterion used. However, practical constraints mean that many prevalence studies do not include control samples. In such circumstances, it is important to estimate the base rate at which members of the healthy (i.e., cognitively intact) population would be identified as cases, as a safeguard against overestimating the prevalence of cognitive deficits.

Grunseit et al. (1994), in an article that raised a number of important methodological issues concerning research in HIV patients, recommended that, when a control sample is unavailable, studies should test whether the percentage of patients classified as impaired exceeds chance. Although we agree with the need to take account of base rates, the approach they recommend to achieve this aim is inappropriate. Grunseit et al. suggest examining whether the percentage of cases identified as impaired exceeds the base rate for impairment as calculated using the binomial distribution. There are three problems with this.

First, it is unlikely that binomial probabilities will yield accurate estimates of the base rate, for the reasons outlined earlier. Second, if binomial probabilities are used, then it is crucial that the probability calculated and reported is the sum of the probabilities for j or more successes, not the probability of exactly j successes. So, for example, if there are six items in a battery and the criterion for impairment is that a patient exhibits four or more abnormal scores, then the binomial probabilities for four, five, and six successes need to be summed to estimate the base rate. The binomial probability for exactly j successes may grossly underestimate the base rate. Grunseit et al. (1994) used the example of a battery of 16 tests and a criterion of scores one standard deviation below the mean on two tests. They reported the probability of meeting this criterion by chance as 0.268 (this is the binomial probability for exactly 2 successes out of 16), whereas the correct figure is 0.747 (the probability of 2 or more successes).

The third problem is that, even if it were appropriate to use binomial probabilities to estimate the base rate (i.e., the tests were independent), and even if the correct probabilities were calculated, this procedure would still not test whether the percentage of cases in a sample meeting the criterion significantly exceeds chance. Grunseit et al. (1994) have suggested that it does; that is, they have referred to the procedure as testing whether the number of patients identified is “significantly different from chance” (p. 906). However, the clinical sample does not feature in any of the above calculations, and therefore the method is not an inferential method.

We suggest that problems of this kind should be addressed using a two-stage process. The first stage should be to determine the base rate, that is, to estimate the percentage of individuals who will be classified as impaired by chance. If, as will rarely be the case, the tests in the battery are independent (i.e., uncorrelated), then the binomial distribution can be used to estimate this percentage. In most circumstances, however, the Monte Carlo method outlined here should be used. Note that if estimates of the population correlations between the components of the battery are not available (e.g., if the battery consists of measures derived from diverse sources), then the Monte Carlo method can be used in a more exploratory fashion to examine the base rates under different assumptions for the population correlations.

Regardless of the method used to estimate the base rate, a second analysis should then be performed to determine whether the observed number of cases in the clinical sample that were classified as impaired significantly exceeds the base rate. This second stage should be performed using the binomial distribution, with the estimated base rate used to specify p, the probability of success on each “trial.” Unlike the first stage of the analysis, the use of binomial probabilities is entirely appropriate for this second stage because the observations (trials, in binomial parlance) are independent; that is, they are individual cases rather than tests. (Note that here we assume that the correlations between tests used to estimate the base rate were obtained from a large standardization sample, such that we can treat the base rate as fixed. When the correlations are obtained from a more modest sample, it would be appropriate to treat the standardization sample as a sample and test for a difference between two independent proportions rather than, as here, compare a sample proportion against a fixed proportion.)
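The clerical steps in the WAIS–III example discussed earlier can be reproduced with a short sketch. It assumes only the conventional Index metric (mean 100, SD 15) and the patient's four scores from the text. It identifies abnormally low scores (below the 5th percentile), the deviations from the patient's mean Index score and the six pairwise differences. Deciding whether each difference or deviation is abnormal still requires the manual's Table B.2, Longman's (2004) tables or a statistical criterion. For that reason only the 26-point VC–WM criterion quoted in the text is checked here. The variable names are our own.

```python
from itertools import combinations
from statistics import NormalDist, mean

scores = {"VC": 118, "PO": 106, "WM": 78, "PS": 68}   # patient's WAIS-III Indexes
index_metric = NormalDist(mu=100, sigma=15)            # Index scores: mean 100, SD 15

# Abnormally low = below the 5th percentile (Index scores of 75 or lower)
low = [name for name, s in scores.items() if index_metric.cdf(s) < 0.05]

# Deviations from the patient's own mean Index score
m = mean(scores.values())
deviations = {name: s - m for name, s in scores.items()}

# All six pairwise differences (first Index minus second)
pairs = {f"{a}-{b}": scores[a] - scores[b] for a, b in combinations(scores, 2)}

print(low)          # ['PS']
print(m)            # 92.5
print(deviations)   # {'VC': 25.5, 'PO': 13.5, 'WM': -14.5, 'PS': -24.5}
print(pairs)
print(pairs["VC-WM"] >= 26)   # True: meets the Table B.2 criterion quoted in the text
```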
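The point about tied difference scores can be made concrete with a toy example. The sketch uses a small, hypothetical set of absolute difference scores, invented for illustration and not taken from Table B.2. It contrasts two conventions. The empirical convention reports the percentage equaling or exceeding the observed difference. The tie-splitting convention credits half of the ties as larger, which is how a continuous, statistical estimate effectively treats them. Both function names are our own.

```python
def pct_equal_or_exceed(diffs, d):
    """Empirical convention: % of the sample whose absolute difference is >= d."""
    return 100 * sum(abs(x) >= d for x in diffs) / len(diffs)


def pct_tie_split(diffs, d):
    """% strictly exceeding d plus half the % exactly equal to d, as when a
    percentile is formed by crediting half of the tied cases."""
    above = sum(abs(x) > d for x in diffs)
    ties = sum(abs(x) == d for x in diffs)
    return 100 * (above + 0.5 * ties) / len(diffs)


# Hypothetical normative sample of 20 absolute difference scores (Index points)
sample = [0, 3, 5, 8, 10, 12, 12, 15, 18, 20, 20, 20, 25, 27, 27, 30, 31, 35, 40, 44]
print(pct_equal_or_exceed(sample, 27))   # 35.0
print(pct_tie_split(sample, 27))         # 30.0
```

The tie-split figure can never exceed the empirical one. The gap grows with the number of people tied at the observed difference.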
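The difference between "exactly j" and "j or more" in the Grunseit et al. (1994) example is easy to check. Like the binomial approach, the sketch below assumes 16 independent tests. It takes the chance of scoring more than one standard deviation below the mean on any one test as Φ(−1) ≈ 0.1587. With that value, the exact-2 probability comes out at about 0.269, close to the 0.268 quoted above, against 0.747 for 2 or more.

```python
from math import comb
from statistics import NormalDist


def binom_pmf(i, n, p):
    """Probability of exactly i successes in n independent trials."""
    return comb(n, i) * p**i * (1 - p) ** (n - i)


def binom_at_least(j, n, p):
    """Probability of j or more successes: the quantity a base rate needs."""
    return sum(binom_pmf(i, n, p) for i in range(j, n + 1))


p_low = NormalDist().cdf(-1.0)     # chance of a score more than 1 SD below the mean
exactly_two = binom_pmf(2, 16, p_low)
two_or_more = binom_at_least(2, 16, p_low)
print(f"exactly 2: {exactly_two:.3f}; 2 or more: {two_or_more:.3f}")
# prints: exactly 2: 0.269; 2 or more: 0.747
```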
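The two-stage procedure can be sketched using the numbers from the hypothetical diabetes example in the text: 16 tests, all intercorrelations 0.5, and impairment defined as scoring more than 1 SD below the mean on three or more tests, with 48 of 120 patients classified as impaired. Stage 1 uses a shortcut that is valid only for equal, non-negative correlations. Each z score is built as √r·F + √(1 − r)·e from one shared factor F and an independent term e. An arbitrary correlation matrix would need a general multivariate normal simulation instead. Stage 2 is the one-tailed binomial probability of 48 or more cases out of 120. The function names and the replication count are our own choices.

```python
import math
import random
from math import comb


def base_rate_equicorrelated(n_tests, r, z_cut, j_min, reps=40_000, seed=1):
    """Stage 1 (sketch): Monte Carlo estimate of the proportion of healthy
    people scoring below z_cut on j_min or more of n_tests tests, assuming
    every pair of tests correlates r (0 <= r < 1)."""
    rng = random.Random(seed)
    a, b = math.sqrt(r), math.sqrt(1 - r)
    hits = 0
    for _ in range(reps):
        f = rng.gauss(0.0, 1.0)                       # factor shared by all tests
        n_low = sum(a * f + b * rng.gauss(0.0, 1.0) < z_cut for _ in range(n_tests))
        hits += n_low >= j_min
    return hits / reps


def binom_upper_tail(x, n, p):
    """Stage 2: one-tailed P(X >= x) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(x, n + 1))


base_rate = base_rate_equicorrelated(n_tests=16, r=0.5, z_cut=-1.0, j_min=3)
print(f"Stage 1 base rate: {base_rate:.3f}")   # close to 0.3448, up to Monte Carlo error
p_stage2 = binom_upper_tail(48, 120, 0.3448)     # the text's base rate, treated as fixed
print(f"Stage 2 P(48 or more of 120): {p_stage2:.3f}")   # about 0.120, matching the text
```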
To illustrate the suggested procedure, suppose a study of the prevalence of impairment in newly diagnosed diabetics has administered a battery of 16 tests to a sample of 120 patients and that, for convenience, the correlations between tests in the battery’s standardization sample are all 0.5. Suppose also that the criterion used to identify a case as cognitively impaired was performance one standard deviation below the mean on three or more tests and that 48 patients (40% of the sample) met this criterion. Application of the Monte Carlo method reveals that 34.48% of the general (i.e., cognitively intact) population are also expected to meet this criterion. This constitutes Stage 1 (i.e., calculation of the base rate, which we assume is derived from a very large normative sample). Calculation of the binomial probability (Stage 2) for 48 successes (i.e., cases identified) in 120 trials (i.e., the sample n), with a probability of success of 0.3448 on each trial (i.e., for each individual), yields a binomial probability of 0.120. Thus, the number of cases identified as impaired does not significantly exceed the base rate. In this hypothetical example, we would conclude that we cannot reject the null hypothesis that the rate is the same as that expected in the general population; that is, there is no evidence for cognitive deficits in the population of newly diagnosed diabetic patients.

In this example, it is clear from the base rate that the criterion for identifying caseness was very lax. To help avoid such problems, the Monte Carlo method developed here can also be used prospectively as a means of selecting criteria for prevalence studies. That is, various candidate criteria can be evaluated to estimate the rate at which they will yield false positives, in order to find a criterion acceptable to the investigator. As referred to earlier, when estimates of the population correlations are unavailable, the method can be used in a more exploratory fashion to model the base rates under different assumptions for the magnitude of these correlations. For example, as Shallice (1988) and others (Crawford & Garthwaite, 2005) have noted, population correlations among neuropsychological tests typically average around 0.5; this rough-and-ready rule of thumb could serve as one model for the correlations.

Note that the population correlations referred to here are the correlations in the general population; it would not be appropriate to use correlations obtained from clinical samples to estimate base rates, as these will often differ substantially from the correlations in the general population. The direction of the difference is hard to predict: Correlations are strongly influenced by the degree of dispersion of scores, and so correlations will commonly be higher in impaired samples. On the other hand, if dissociations between particular tasks are common in a given clinical population, then correlations will tend to be lower. Before leaving this topic, note that there will be occasions in which the interest is in examining whether a subgroup of patients exceeds a particular base rate in a clinical population; in such circumstances the prescription against using correlations from clinical samples obviously does not hold.

Computer Program for Estimating the Percentage Exhibiting j or More Abnormal Scores and Score Differences on Standardized Test Batteries

A generic computer program for PCs (PercentAbnormK.exe) has been written to accompany this article. It can be downloaded from the following web page: http://www.abdn.ac.uk/~psy086/dept/PercentAbnormKtests.htm.

The program implements the generic methods developed in the present article and can be used to calculate the percentage exhibiting j or more abnormally low scores, j or more abnormally large pairwise differences between tests, and j or more abnormally large deviations from individuals’ mean scores. It requires the user to specify the number of tests in the battery (up to a maximum of 20 tests), specify a criterion for abnormality (e.g., below the 5th percentile in the case of abnormally low scores), and enter the correlations between tests in the form of a lower triangular matrix. The outputs consist of the estimated percentage of the population expected to exhibit j or more abnormally low scores (e.g., if there are three tests in the battery, it records the percentage of the population expected to exhibit one or more, two or more, and three abnormally low scores). The program also records the percentage of the population expected to exhibit j or more abnormally large deviations from individuals’ mean scores on the battery, and the percentage of the population expected to exhibit j or more abnormally large pairwise differences between tests in the battery.

The program allows the user to select one of 10 criteria for abnormality. In the case of abnormally low scores, these range from a score estimated to be exhibited by less than 25% of the population through to a score exhibited by less than 1% (the intermediate criteria include scores that are 1, 1.5, or 2 standard deviations below the mean; i.e., scores below the 15.8th, 6.6th, or 2.28th percentiles). For simplicity, the criterion selected by the user to define an abnormally low score is also used to define abnormally large pairwise differences and abnormally large deviations from individuals’ mean scores. For example, if an abnormally low score is defined as a score falling below the 5th percentile, then an abnormally large pairwise difference (or deviation) is defined as a difference (or deviation) that is exceeded by less than 5% of the normal population, regardless of sign.

Although the program is simple to use and the output resembles that presented in the tables of the present article, users may wish to verify their understanding by running an example drawn from the present article through the program before using it to generate equivalent base rate data for other batteries.

For practical reasons, the program is limited to dealing with batteries consisting of 20 or fewer components. This should not be a serious limitation, as few batteries routinely used in neuropsychology are larger than this. In the case of batteries with more than 20 components, it is likely that components will be combined into composite indexes of some form. Therefore, it should still be possible to generate base rate data for such batteries, albeit only for the composite scores.

As referred to earlier, we have also written programs designed specifically to assist in the analysis of an individual’s performance on the WAIS–III and WISC–IV Indexes; these programs can be obtained from the same locations as the generic program.

Some Caveats on the Use of These Methods

The generic simulation method outlined in the present article treats the means, standard deviations, and correlations of the components of a battery as population parameters. The assumption is that the standardization sample was sufficiently large so that the