education sciences
Article

How the Mastery Rubric for Statistical Literacy Can Generate Actionable Evidence about Statistical and Quantitative Learning Outcomes

Rochelle E. Tractenberg

Collaborative for Research on Outcomes and Metrics; Departments of Neurology; Biostatistics, Bioinformatics & Biomathematics and Rehabilitation Medicine, Georgetown University Medical Center, Suite 207 Building D, 4000 Reservoir Road NW, Washington, DC 20057, USA; [email protected]

Academic Editor: James Albright
Received: 7 July 2016; Accepted: 12 December 2016; Published: 24 December 2016

Abstract: Statistical literacy is essential to an informed citizenry, and two emerging trends highlight a growing need for training that achieves this literacy. The first trend is towards “big” data: while automated analyses can exploit massive amounts of data, the interpretation (and, possibly more importantly, the replication) of results is challenging without adequate statistical literacy. The second trend is that science and scientific publishing are struggling with insufficient/inappropriate statistical reasoning in writing, reviewing, and editing. This paper describes a model for statistical literacy (SL) and its development that can support modern scientific practice. An established curriculum development and evaluation tool, the Mastery Rubric, is integrated with a new, developmental model of statistical literacy that reflects the complexity of reasoning and habits of mind that scientists need to cultivate in order to recognize, choose, and interpret statistical methods. This developmental model provides actionable evidence, and explicit opportunities for consequential assessment that serves students, instructors, developers/reviewers/accreditors of a curriculum, and institutions. By supporting the enrichment, rather than increasing the amount, of statistical training in the basic and life sciences, this approach supports curriculum development, evaluation, and delivery to promote statistical literacy for students and a collective quantitative proficiency more broadly.
Keywords: statistical literacy; mastery rubric; collective quantitative proficiency; basic sciences; life sciences; scientific practice; curriculum development; curriculum evaluation; actionable evidence

Educ. Sci. 2017, 7, 3; doi:10.3390/educsci7010003 www.mdpi.com/journal/education

1. Introduction

Statistical literacy (SL) is widely described as important for full social participation (see [1]; elementary curricula, e.g., [2,3]; higher education and beyond, e.g., [4–6]). Although this is true for all students, there is a special relationship between statistics and scientific research that amplifies the importance of developing appropriate statistical literacy in undergraduate or graduate/post-graduate students in the sciences.

Empirical research relies on statistical methods, and statistics is a wide, dynamic field perpetually propelled by new and improved methods. This pace of innovation far outstrips the capacities of other fields to fully adapt to these innovations, much less to incorporate all “relevant” methods in their own PhD curricula. Recently, Weissgerber et al. (2016) [7] correctly articulated that basic scientists need training in statistics, along with the myriad empirical arguments why (see also [8–16]; see also [17]). In fact, science PhD programs face a nearly Sisyphean task: to adapt to some or any new methods, or even to prepare their students to adapt, so that their non-statistical discipline may exploit the power of new, or justify selecting established, statistical methods. Learning all statistical methods is clearly not feasible; even a focus on “just” those that are currently relevant for the discipline may impede adoption of newer, more efficient methods in the future. However, initiating the development of statistical literacy and orienting science students to value quantitative methods, which empowers them to seek additional training when needed, might be an achievable goal.

Exemplifying the special importance of statistical literacy for scientists as they are trained is the Carnegie model of the doctorate wherein PhD programs prepare graduates to be/become “stewards” of their scientific disciplines (see [18] (pp. 9–14)).
The definition of a steward of a discipline is “someone who will creatively generate new knowledge, critically conserve valuable and useful ideas, and responsibly transform those understandings through writing, teaching, and application” [18] (p. 5).

Consistent with the disciplinary stewardship model, Henson et al. (2010) [12] propose a “collective quantitative proficiency” (CQP) model explicitly linking the valuation of quantitative methods within the culture of a scientific discipline to the training in these methods that is provided to the future researchers in (stewards of) that discipline. The CQP was described originally for education researchers, but the argument and model are appropriate to all sciences. In fact, Weissgerber et al. (2016) [7] review only the most recent literature representing the damage that weak or incomplete (or incorrect) knowledge of statistics and statistical methods is currently doing to the rigor, interpretability and reproducibility of scientific work across the basic and life sciences. Established scientific practitioners must become more statistically literate to effectively model this competency for their mentees and students, to teach effectively, and to promote competence in writing, reviewing, and editing across the sciences. As Shulman noted, “(b)oth scholarship and teaching in any field reflect the character of inquiry, the nature of community, and the ways in which research and teaching are conducted in that particular discipline or disciplinary intersection” [19] (p. xii). Students at all levels need to know (and observe) that their scientific mentors also value, and contribute to, the collective quantitative proficiency (CQP) that disciplinary stewardship requires.

Despite the importance of statistical literacy and competency for the practicing scientist and the steward, doctoral programs may struggle with recommendations to add statistical training (see [20–22]). Many science PhD programs include no formal statistical training, or just a single course (see [23]; see also [7]). Two emerging trends in the basic and life sciences are highlighting a growing need for the addition, and integration, of statistical training in these disciplines.
The first trend is towards “big” data across basic and life sciences, where the potential to automate (and thereby remove from active consideration) statistical inferences across datasets could ultimately exclude formal training and reasoning in statistics and experimental design. While some PhD programs contemplate adding statistical training to their programs, there is also movement to integrate “big data” into training future or current stewards of the biomedical sciences without attention to reproducibility, experimental design, inferential statistics, or statistical literacy (e.g., [24]). While automated analyses can exploit massive amounts of data, without statistical literacy the interpretation (and, possibly more importantly, the replication) of results is challenged. However, “statistical literacy” is not included as a key competency in most fields (e.g., bioinformatics [25]; biology [26]) and where it is discussed, it relates to undergraduate single-course educational requirements (compliance) or to something less concretely defined (e.g., [27–29]). These arguments focus on undergraduate and PhD level programs because at the Master’s level, the course load is usually rigidly fixed; however, those seeking or completing Master’s level preparation are also challenged when it comes to statistical literacy.

It seems impossible to achieve the goal of a “collective quantitative proficiency” [12] among disciplinary stewards given the resistance to (or lack of time for, or lack of opportunities/interest in) coursework beyond introductory statistical training (e.g., [22]). However, adding or retaining one course in “introductory statistics” is also unlikely to achieve sufficient statistical literacy for modern scientific practice, as either a producer or a consumer of argument that relies on quantitation and data. A one-course approach to statistical literacy for PhD programs implies that:

(A) the single course is sufficient to teach the critical (and complex) set of skills that encompasses “the ways in which research ...
(is) conducted in that particular discipline” [19]; and
(B) the single course will support the level of consumption and production of statistical arguments representing competent stewardship of a discipline that uses these methods.

The one-course-done model of statistical training exemplifies the comment by Henson et al. (2010) [12] (p. 235) that “(f)aculty and students often perceive quantitative methods as a static field to be mastered”. Moreover, the current conceptualizations of statistical literacy are grounded in the satisfaction of an undergraduate requirement (e.g., [30,31]; for example, the Guidelines for Assessment and Instruction in Statistics Education (GAISE) [31]; see also, e.g., [6]). The scientist, professional, and/or instructor must be considered to have statistical literacy needs that differ qualitatively and quantitatively from those of undergraduates whose use for, or application of, statistical reasoning and methods is not yet known. For professional scientists, statistical literacy must support the responsible stewardship of their disciplines, producing and consuming statistical arguments (see e.g., [12]; [32] (p. xiii); see also [23]). This is a complex set of skills required for literature review, documenting the background and contextual (apart from the statistical) significance of one’s work, and for writing and reviewing manuscripts. Instead of reinforcing the perception that quantitative methods are “static”, an explicitly developmental model of statistical literacy directs the attention of PhD scientists (students and mentors alike) towards their own awareness of the importance of, and variety in, quantitative method options for their research and discipline. Because the model is developmental, it can be augmented to accommodate learners earlier (than the PhD; see [33]) in their training. The model, described in the next section, is intended to:

A. promote metacognitive awareness of what statistical literacy encompasses for disciplinary stewards;
B. exemplify the link between this statistical literacy and the “collective quantitative proficiency” of Henson et al. (2010) [12]; and
C. represent statistical literacy training that could be integrated into (or at least initialized within) any PhD science program (and possibly earlier).

This conceptualization of statistical literacy as developmental could fulfill the objectives of increasing statistical sophistication for scientists, reviewers, and faculty/mentors who are training future scientists, reviewers and faculty/mentors. Moreover, although other models have separated statistical “literacy”, “reasoning”, and “thinking” [34], these are actually three stages in a developmental trajectory that describes a deepening of sophistication with respect to data and principles of statistics. A more explicit statement of this development is intended to promote the “cultural” shift towards CQP in PhD training in basic and life sciences like biology, physiology, biochemistry, and genetics: a shift towards a more holistic, reflective, and adaptive view of statistical literacy (SL). A curriculum development and evaluation tool, the Mastery Rubric (described in the next section), can be used to create, evaluate, or revise curricula that can generate actionable evidence (see [35]) of performance by students, instructors, and institutions. In this manuscript, a new Mastery Rubric for Statistical Literacy (MR-SL) is presented, and its potential to generate actionable evidence of growth and development in understanding of fundamental statistical concepts, and reasoning with them, is explored.

The Mastery Rubric

A traditional rubric is assignment-specific and lists the skills the grader requires in the work product, along with performance levels from poorest to best [36] (Chapter 1). The Mastery Rubric is similar, but outlines the knowledge, skills and abilities (KSAs) to be developed within the curriculum (or over time), together with performance levels that characterize the learner moving from novice to expert [37,38].

Related to the Mastery Rubric is the concept of a “learning progression” (e.g., [39] (p.
1)) which describes shifts from naïve to “more expert understanding” and is based on how children learn the concepts of interest (but see [40] for an example with law students). Whereas a learning progression represents a curricular segment (e.g., Schwarz et al. 2009 [41]), the Mastery Rubric [37,42,43] represents the entire (predominantly) post-baccalaureate curriculum. Like the Mastery Rubric for Statistical Literacy (MR-SL, Table 2), two others were designed to capture and encourage development throughout the career [42,43]. Additionally, unlike a learning progression, the Mastery Rubric is public: explication of curricular objectives, and of what work products look like when these are met, facilitates the identification by faculty, mentors, or evaluators of strengths and weaknesses in the curriculum itself. This also formalizes the evidence that any individual may elicit (instructor/institution) or present (student) to support their claim of achieving a target performance level throughout the curriculum. This can also support faculty in other courses to create opportunities to generate this evidence, and instruction supporting the same objectives from diverse contexts and perspectives. Explicit and public description of the necessary evidence can, in turn, prompt learners to self-monitor, and spur the individual (student, instructor, or institution) to seek (or create) opportunities to generate such evidence [42].

The Mastery Rubric represents the perspective of Messick (1994) [44]: articulating what KSAs students should possess at the end of the curriculum; what behaviours by the students will reveal these KSAs; and what tasks will elicit these specific behaviours. Toohey (1999) [45] refers to this outcomes-based approach as “systems-” or “performance-based”, and every Mastery Rubric follows this approach. Thus, by design, any Mastery Rubric supports assessable curriculum development, evaluation, and delivery because learning objectives are articulated and public so that each can be explicitly aligned to individuals’ progress and development along the articulated continuum from novice to expert.
Then, conversations about curricular objectives, and actionable evidence of whether or not they are being met, are possible for all stakeholders.

In the next sections, the MR-SL is presented and described, and its alignment with principles of learning outcomes documentation [46] is analyzed.

2. Materials and Methods

Every Mastery Rubric is constructed with two dimensions: performance levels that represent a developmental trajectory (columns) and knowledge, skills, and abilities that represent the targets of the teaching and/or learning (rows; [38]). The methods by which each dimension of the MR-SL was constructed are articulated below. A degrees of freedom analysis [47–49] was used to create a matrix to permit examination of alignment of features of the MR-SL with the Principles for Learning Outcomes articulated by the National Institute for Learning Outcomes Assessment (NILOA [46]).

2.1. The Mastery Rubric for Statistical Literacy (MR-SL): Establishing a Developmental Trajectory

As noted, one of the two essential elements in the creation of a Mastery Rubric (MR) is the articulation of a developmental trajectory. Much of the research in statistical literacy has focused on understanding how students or experts think about data (e.g., [50–52]), which means that the two ends of the “developmental trajectory” in this discussion to date are “completing the undergraduate course” and “being an expert”.

The Mastery Rubric for Statistical Literacy (MR-SL) was designed from the opposite perspective, namely, to articulate what is common across middle stages of engagement with data (consumption and production), with desired entry and exit criteria for each stage, along an explicit continuum from more naïve to more expert. This is achieved by explicit reference to Bloom’s Taxonomy of Educational Objectives [53]; see also [54]. Moreover, the MR-SL was constructed by synthesizing a developmental view of Bloom’s taxonomy with a long-standing model of the development of general literacy [55], focusing on the knowledge, skills, and abilities specific to statistical literacy arising by consensus from the literature (e.g., [50–52,56–58]). Table 1 presents the Bloom’s Taxonomic context of the MR-SL.
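The two-dimensional construction described in Section 2 (performance levels as columns, KSAs as rows) can be sketched as a small data structure. This is only an illustrative sketch, not part of the MR-SL itself: the class name `MasteryRubric`, the methods `describe` and `gaps`, and the learner record are hypothetical, and the rule of flagging KSAs that fall below a chosen target level is an assumption made for the example rather than a scoring method proposed in the paper. Level names follow Table 2, the Bloom's levels follow Table 1, and only three of the KSA rows are included.

```python
from dataclasses import dataclass, field

# Columns: performance levels, ordered from more naive to more expert.
LEVELS = [
    "Beginning Literacy",
    "Functional Literacy",
    "Skilled (Apprentice) Literacy",
    "Independent (Journeyman) Literacy",
    "Expert (Master)",
]

# Bloom's levels named for each performance level in Table 1.
BLOOM = {
    "Beginning Literacy": (1,),
    "Functional Literacy": (2, 3),
    "Skilled (Apprentice) Literacy": (3, 4, 5),
    "Independent (Journeyman) Literacy": (5, 6),
    "Expert (Master)": (6,),
}


@dataclass
class MasteryRubric:
    """Hypothetical container: KSAs (rows) by performance levels (columns)."""
    ksas: list
    levels: list = field(default_factory=lambda: list(LEVELS))
    cells: dict = field(default_factory=dict)  # (ksa, level) -> descriptor text

    def describe(self, ksa, level, text):
        """Store the descriptor for one cell of the rubric."""
        if ksa not in self.ksas or level not in self.levels:
            raise KeyError((ksa, level))
        self.cells[(ksa, level)] = text

    def gaps(self, evidence, target):
        """Return KSAs whose documented level is below the target level.

        `evidence` maps KSA -> level name; a KSA with no evidence is a gap.
        This rule is an assumption for illustration only.
        """
        goal = self.levels.index(target)
        return [
            k for k in self.ksas
            if k not in evidence or self.levels.index(evidence[k]) < goal
        ]


rubric = MasteryRubric(ksas=[
    "Design the collection of data",
    "Interpretation of results",
    "Draw and contextualize conclusions",
])
# Abridged descriptor taken from the Beginning Literacy cell of Table 2.
rubric.describe(
    "Interpretation of results",
    "Beginning Literacy",
    'Believes that the p-value is "true".',
)

# Hypothetical learner record, for illustration only.
learner = {
    "Design the collection of data": "Functional Literacy",
    "Interpretation of results": "Skilled (Apprentice) Literacy",
}
needs_work = rubric.gaps(learner, target="Skilled (Apprentice) Literacy")
print(needs_work)
```

Keeping the levels in an ordered list makes the developmental trajectory explicit, so comparing a learner's documented level against a target is a simple index comparison; any real use would need the full set of KSA rows and descriptors from Table 2.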
Table 1. Performance levels in the developmental trajectory of “statistical literacy”: Given a research question, proposal, manuscript, report, or grant, this reader will/is.

Pre-Literate
Description: Read or skip stats/methods sections, with no critique or evaluation. Assume writer (and/or publisher) must know what they’re doing. Accept results without question. Unengaged with statistical reasoning, lacking quantitative habits of mind or an awareness of their role in science.
Bloom’s: Not yet on the Bloom’s trajectory.

Beginning Literacy
Description: Read, generally understand, notice gross errors, e.g., if categorical method applied to continuous variable or vice versa. Developing meta-cognitive awareness that if a question arises in their mind, the method may not be correct or clearly articulated. Initial engagement with statistical reasoning, developing awareness of this skill and how to grow/use it.
Bloom’s: Bloom’s 1 remembering, understanding.

Functionally Literate
Description: Consolidating reading and understanding, beginning to learn how to analyze (with software). Awareness of rules of thumb (e.g., sample size vs. representativeness; parametric vs. nonparametric options; “correlation is not causation”). Actively developing knowledge, skills and abilities required for statistical literacy.
Bloom’s: Bloom’s 2, 3, understand and apply but only apply what you’re told to apply.

Skilled (Fluent)
Description: Read and understand; reliably identify misspecification of methods chosen or employed. Choose and execute correct analysis, not necessarily able to choose the several methods that could be equally viable depending on investigator’s objectives. Qualified as a fluent, but not as an independent, statistical reasoner.
Bloom’s: Bloom’s 3–5 Choose and apply techniques. Analyze & interpret. Identify limitations, but not sophisticated enough to independently review literature, proposals, grants.

Independent (Journeyman)
Description: Understand scientific question to align statistical (or graphical) methods options to desired objectives. Expert review of technical features of proposal/paper, not necessarily of the science/statistics alignment. Qualified as independent experts in statistical reasoning.
Bloom’s: Bloom’s 5–6 evaluate (review) and synthesize for new methods but not for evaluation of others.

Expert (Master)
Description: Understand scientific question and clarify/encourage writer to clarify objectives so as to align statistical (or graphical) methods options to desired objectives. Expert review and evaluation, and diagnosis and remediation. Qualified to take individuals from pre-literate through to Master level statistical reasoning.
Bloom’s: Bloom’s 6 synthesize for new methods, and evaluation of others.

Spanning the levels, from Pre-Literate to Expert (Master):
Not a careful consumer.
Becoming a careful consumer.
A careful consumer. Becoming a careful producer.
Expert consumer, expert producer.

Spanning the levels, from Pre-Literate to Expert (Master):
No or limited capacity to critique. Requires external “validation” to believe what is presented (e.g., “it was published in JAMA!” “Cochrane Reviews are correct”).
Developing: capacity to evaluate; sense of what is/is not appropriate; ability to critique; opinions on debates (e.g., application of multi-model inference; Bayesian vs. frequentist; when to use multiple-comparisons corrections).
Expert reviewer, capable of stewardship of the not-statistics discipline.
Expert review, diagnosis and recommender of remediation; capable steward of a statistical discipline.

Table 1 includes an important column that does not actually appear in the Mastery Rubric but is included here because it is such a common stage across the biomedical and life sciences (and across some social and educational sciences as well): the pre-literate non-reasoner. This individual is described consistently in critiques of the quality of journal and grant reviewing (see also e.g., [27,29]), and is identified specifically by the lack of skills in the recent review by Weissgerber et al. (2016) [7]. The difference between a scientist who functions at this level and one who functions even at the Beginning Literacy level is profound, and their effects undermining the rigor and reproducibility of scientific research are increasingly less tolerable (e.g., [8–16]; see also [17]).
Recognition that some reviews provided for journal editorial decisions, as well as grant funding, represent functioning at this level should be highlighted in these important decision-making contexts (i.e., even this structure represents actionable evidence).

The MR-SL can promote remediation of these identified weaknesses by individuals seeking to generate evidence they are “on the right track” or at least at the Beginning Literacy stage, and by institutions seeking to provide opportunities to achieve learning outcomes consistent with performance at this stage (or beyond). PhD students and scientists are often not operating at even the lowest Bloom’s taxonomy [53] level (knowledge, the main level targeted by most statistical training; see e.g., [21,27]) while professionally, they must function at the highest level (e.g., [10–12]; see also [32] (p. xiii)). Evidence of this (perhaps surprisingly low) level of functioning with respect to statistical and quantitative argumentation comes from a variety of sources (e.g., [7,8,14–16]); the pre-literate non-reasoner is common and problematic. If evidence is found that an institution is training people to this level (and not beyond), action must be taken to remediate the situation or to reconfigure curriculum or learning objectives that purposefully aim at this level of performance. The MR-SL treats statistical literacy in a similar manner to general literacy [54]: as comprising a set of learnable, improvable skills. In order to promote development of a CQP by initiating the learning and improving of this skill set, the MR-SL could be used to promote curricular or institutional remediation.

2.2. The Mastery Rubric for Statistical Literacy (MR-SL): KSAs for SL

The second dimension of a Mastery Rubric is the articulation of knowledge, skills, and abilities (KSAs) that are to be targeted and grown throughout the developmental trajectory. For the MR-SL, the list of KSAs representing statistical literacy was derived by synthesizing several models of statistical literacy with the more active “empirical enquiry” model of Wild and Pfannkuch ([57]; see also [58]).
Because the developmental trajectory for these KSAs describes change from more naïve to more expert performance, the qualification of how these KSAs are executed is captured (and described) in the row that outlines growth and development in each KSA over time/training. The SL KSAs were synthesized from “A four-dimensional framework for statistical thinking in empirical enquiry” [52] (p. 19) and the “Statistical Thinking” facility described in [58] (p. 218) into the new MR-SL shown in Table 2.

Table 2. Mastery Rubric for Statistical Literacy (MR-SL).

Performance levels: Beginning Literacy; Functional Literacy; Skilled (Apprentice) Literacy; Independent (Journeyman) Literacy; Expert (Master).

General description of statistical literacy
Beginning Literacy: Read, generally understand, notice gross errors, e.g., if categorical method applied to continuous variable or vice versa. Developing meta-cognitive awareness that if a question arises in their mind, the method may not be correct or clearly articulated. Engaging with statistical reasoning, developing awareness of this skill and how to grow/use it.
Functional Literacy: Consolidating reading and understanding, beginning to learn how to analyze (with software). Awareness of rules of thumb (e.g., sample size vs. representativeness; parametric vs. nonparametric options; “correlation is not causation”). Actively developing knowledge, skills and abilities required for statistical literacy.
Skilled (Apprentice) Literacy: Read & understand; reliably identify misspecification of methods chosen or employed. Choose and execute correct analysis, not necessarily able to choose the several methods that could be equally viable depending on research objectives. Qualified as a fluent, but not as an independent, statistical reasoner.
Independent (Journeyman) Literacy: Understand scientific question to align statistical (or graphical) methods options to desired objectives. Expert review of technical features of proposal/paper, not necessarily of the science/statistics alignment. Qualified as independent expert in statistical reasoning.
Expert (Master): Understand scientific question and clarify/encourage writer to clarify objectives so as to align statistical (or graphical) methods options to desired objectives. Expert review and evaluation, and diagnosis and remediation. Qualified to take individuals from pre-literate through to Master level statistical reasoning.

Considerations for evidence of performance at this level
Beginning Literacy: Bloom’s 1 remembering, understanding.
Functional Literacy: Bloom’s 2, 3, understand and apply but only apply what you’re told to apply.
Skilled (Apprentice) Literacy: Bloom’s 3–5 Choose and apply techniques. Analyze and interpret. Identify limitations, but not sophisticated enough to independently review literature, proposals, grants.
Independent (Journeyman) Literacy: Bloom’s 5–6 evaluate (review) and synthesize for new methods but not for evaluation of others.
Expert (Master): Bloom’s 6 synthesize for new methods, and evaluation of others.

Define a problem based on critical literature review
Beginning Literacy: Can identify the problem that is articulated within literature that is reviewed, but not derive or synthesize one across multiple sources. Does not question design features or evidence base supporting problems articulated in what was reviewed. Might argue that the impact factor of a journal is evidence that an article published there is “good” or “correct”.
Functional Literacy: Can identify the problem that is articulated within literature that is reviewed, and can recognize when incomplete review is provided. Does not derive or synthesize new issues from single or multiple sources. Acknowledges that design features and evidence base are essential for understanding the validity of claims or research problems articulated by others.
Skilled (Apprentice) Literacy: Can identify gaps and articulate problem (research questions) that arise from critical literature reviews, can recognize when incomplete review is provided and also recognizes the need to consider wider scope of literature for alternative solutions to a problem common across contexts or domains.
Independent (Journeyman) Literacy: Can synthesize and define a theoretical or methodological problem based on a critical review of the literature in one or across scientific domains. Recognizes when and how solutions to problems from diverse contexts are or are not appropriate or adaptable for new applications.
Expert (Master): Can diagnose and remediate individual synthesis and definitions of theoretical and/or methodological problems based on a critical review of the literature as well as critical evaluation of less expert synthesis across contexts, i.e., in terms of classroom work as well as grant proposals and manuscripts.
Identify or choose (and justify) the measurement properties of variables
Beginning Literacy: Cannot identify the measurement system for variables within manuscripts unless they are articulated explicitly. If they are articulated, this information would not be useful/used.
Functional Literacy: Understands that there are different measurement systems but does not know how or why ratio-level data might be transformed into interval or ordinal data. Treats nominal data with numeric labels as if they are ratio-level.
Skilled (Apprentice) Literacy: Chooses measurement that optimizes power rather than what specifically addresses hypothesis of interest. Limited consideration of interaction and mediation/moderation effects. Understands that nominal and ordinal data do not behave as ratio-level (or even integral) variables do.
Independent (Journeyman) Literacy: Chooses measurement that optimizes generalizability and interpretability of results, and acknowledges that power may suffer, justifiably. Can justify (and recommend as appropriate) the transformation of data from one type to another if appropriate. Careful consideration of interaction and mediation/moderation effects.
Expert (Master): Can identify and critique (as appropriate) the measurement system used in any given study/analysis. Can choose and justify nominal-, interval-, or ratio-level analytic methods. Understands the limitations of different types in terms of analysis assumption requirements, and can articulate the tradeoff in scientific explanatory power associated with measurement and data type choices. Expert consideration of interaction and mediation/moderation effects. Diagnosis and remediation of each of these across contexts.
Design the collection of data
Beginning Literacy: Can identify data collection features in text if they match basic design elements from introductory materials (e.g., t-test, chi square) but cannot derive them if they are not present. Cannot design data collection initiatives. Cannot conceptualize covariates or their roles in analysis or interpretation.
Functional Literacy: Can identify data collection features if they are present in a manuscript/proposal, including more complex and advanced methods, but cannot derive them if they are not present. Recognizes covariates if mentioned, but does not require formal consideration (or justification) or evaluation of covariates.
Skilled (Apprentice) Literacy: Can match the correct data collection design to the instruments and outcomes of interest, but needs assistance in conceptualizing covariates and their potential roles in the planned analyses. May include covariates “because that is what is done” without being able to justify the roles of any in the hypotheses to be tested.
Independent (Journeyman) Literacy: Can design appropriate data collection and identify instruments and outcomes (and covariates) that support the testing of specific hypotheses. Collaborates with expert as needed on appropriate use of advanced methods, including accommodating measurement and sampling error, attrition (if needed), and modeling requirements.
Expert (Master): Expertly designs collection of data, including power calculations, modeling requirements, measurement/sampling error and data missingness. Designs and can critique sensitivity analyses as appropriate, and fluently diagnoses and remediates each of these across contexts.

Piloting, analysis and interpretation
Beginning Literacy: Does not differentiate pilot studies and full studies; might not plan a pilot to ensure study features are feasible. Might call a study with a small N “pilot” just based on sample size. Cannot evaluate or interpret (their own or) others’ pilot work.
Functional Literacy: Differentiates pilot and full-scale studies, but does not consider the ‘failures’ uncovered by pilot work to be informative, and might stop if pilot study uncovers problems. Might consider larger scale study unnecessary if pilot results are as expected.
Skilled (Apprentice) Literacy: Recognizes need for pilot studies and asks for appropriate assistance in the design and analysis. Pilot results are seen to be useful in addressing scalability issues. May seek assistance with scalability based on pilot results. Does not recognize when design or review demands are beyond their skillset.
Independent (Journeyman) Literacy: Independently conceptualizes pilot studies that address relevant design issues. May seek expert advice for design, power, and analysis planning for their own work, and consistently recognizes when reviewing demands are beyond their skillset.
Expert (Master): Expertly designs and analyzes pilot studies, utilizing the data for full study design, analysis planning and power, within their own and others’ work. Diagnoses and remediates each of these across contexts.

Discerning “exploratory”, “planned”, and “unplanned” data analysis
Beginning Literacy: Does not perceive differences between “planned” and “unplanned” data analysis in their own or others’ work. Does not recognize that exploratory analyses can be planned or unplanned and that these should be described as such.
Functional Literacy: In their own work, can differentiate between exploratory analysis and hypothesis testing, but not “planned” and “unplanned” analyses. May incorrectly characterize “exploratory” analysis as hypothesis testing (planned or unplanned).
Skilled (Apprentice) Literacy: Perceives differences between “planned” and “unplanned” data analysis in their own work, but not in others’ work unless it is identified. May not recognize that exploratory analyses can be planned or unplanned, does not know why it might matter to communicate which they are doing/reporting.
Independent (Journeyman) Literacy: Recognizes differences between “planned” and “unplanned” data analysis in their own and others’ work, even when others do not recognize it in their own work. Knows that exploratory analyses can be planned or unplanned, and can identify which is included in their own and others’ work.
Expert (Master): Clearly and consistently differentiates planned and unplanned analyses in their own work and that of others. Utilizes all types of analysis appropriately in support of coherent contributions to science. Consistently requires others to do the same, and can diagnose and remediate each of these across contexts in order to support scientific integrity and competence.
Hypothesis generation based on planned and unplanned analyses
Beginning Literacy: Uses the default settings of software to guide analysis planning (and execution in the unplanned analysis case). Like software, does not differentiate planned or unplanned, nested or non-nested hypothesis tests. Does not generate hypotheses.
Functional Literacy: Uses the default settings of software to guide analysis planning (and execution in the unplanned analysis case). Attention is focused on planned analyses and hypothesis generation in that context; unlikely to generate testable hypotheses. May not recognize that hypotheses may be generated and tested in or by unplanned analyses or within the intermediate steps software executes to complete the desired analysis.
Skilled (Apprentice) Literacy: When software generates and tests hypotheses, treats that as “what was supposed to happen” and does not differentiate these results from those anticipated and resulting from planned analyses. Can generate new hypotheses, but is likely to base these on data without appeal to theory, plausibility, or context.
Independent (Journeyman) Literacy: Can seamlessly integrate hypothesis generation into the consideration of literature or data analysis. In their own and others’ work, recognizes that, and articulates how, hypothesis generation from planned and unplanned analyses differ in their evidentiary weight and their need for independent replication. Depends on knowledge, context, and skills with synthesis (and not software) to generate testable hypotheses.
Expert (Master): Expertly distinguishes hypothesis testing and hypothesis generation. Reliably recognizes and communicates the differences between these in all written and oral work. Consistently seeks to integrate plausibility and scientific contextualization into hypothesis generation. Diagnoses and remediates each of these across contexts.
Performance Level: Beginning Literacy | Functional Literacy | Skilled (Apprentice) Literacy | Independent (Journeyman) Literacy | Expert (Master)

Interpretation of results
Beginning Literacy: Believes that the p-value is “true” and represents the evidence for the hypothesis or theory being tested. Never corrects for multiple comparisons in their own work; does not suggest or question the need for it in reviewing. Resists multiple comparisons corrections suggested by reviewers or collaborators if it causes “significant” results to disappear. Does not seek coherence in the analysis plan or the alignment of methods, results, and interpretation.
Functional Literacy: Understands that the p-value does not represent the “truth” of the hypothesis being tested, but cannot articulate why it is useful/used. Interprets p-values that are “very close” to the nominal alpha level (e.g., 0.049–0.10) as statistically meaningful evidence of trends; interprets very small p-values as “highly significant” results.
Skilled (Apprentice) Literacy: Understands that the p-value represents evidence supporting the null hypothesis, not the study hypothesis. Recognizes that very small p-values are not “highly significant results”, but does not consistently correct this language when reviewing. Can apply multiple comparisons corrections, but does so only when reminded. Does not insist on these corrections in work that they review (grants, manuscripts, coursework).
Independent (Journeyman) Literacy: Understands that the null hypotheses that statistical tests test are never the actual purpose of the analysis. Resists reification and is committed to good-faith efforts to falsify hypotheses, not simply test the null. Applies multiple comparisons corrections to promote reproducible results. In their own and others’ work, seeks competing, plausible, alternative models or explanations.
Expert (Master): Communicates consistently that the null hypotheses that statistical tests test are never the actual purpose of the analysis. Resists reification and is committed to good-faith efforts to falsify hypotheses, not simply test the null. Seeks competing, plausible, alternative models or explanations. Encourages collaborators to do all of these, and diagnoses and remediates each of these across contexts.
Draw and contextualize conclusions
Beginning Literacy: p-value driven conclusions without consideration of limitations. No contextualization of the results with prior literature or with the foregoing portions of the document. Conclusions may not actually represent results; overinterpretation and failure to identify or acknowledge limitations.
Functional Literacy: p-value-driven conclusions that may include consideration of limitations including multiple comparisons. Conclusions are typically superficial, i.e., not very deeply contextualized with the literature. Conclusions are typically aligned with results, but may not be well-contextualized with the rest of the document (paper, grant).
Skilled (Apprentice) Literacy: In their own work, draws conclusions that are contextualized with the entire manuscript/grant. In reviewing, does not require that conclusions are aligned with the whole document, and does not require full contextualization. Incomplete consideration of limitations in their own work and inconsistent requirement that limitations are acknowledged in others’ work.
Independent (Journeyman) Literacy: Contextualizes results with respect to the entirety of the manuscript/grant, and so can detect cases where conclusions are not aligned with the introduction/background, methods, and/or results. Careful consideration of limitations deriving from the method and its application in the specific study. Requires full contextualization of conclusions in others’ work and strives to fully contextualize conclusions in their own work.
Expert (Master): Expertly differentiates effect sizes, clinical significance and statistical significance. Can articulate either a multi-trait/multi-method (MTMM) or other triangulation approach, including mixed methods analysis, to understand and contextualize results. Consistently requires full contextualization of conclusions in others’ work and fully contextualizes conclusions in their own work. Diagnoses and remediates each of these across contexts.
Communication
Beginning Literacy: Does not communicate statistical information clearly or consistently, skips the methods section of papers or grants. Does not differentiate appropriate and inappropriate communication with statistics or other quantitative material. Does not generate or evaluate communication of statistical or quantitative material.
Functional Literacy: Reads the statistics and methods sections superficially. Does not recognize inconsistencies (e.g., author describes data as categorical and plans t-test). May state that “only the p-value is needed” when reviewing how results are communicated. Does not generate communication of statistical or quantitative material and should not review these.
Skilled (Apprentice) Literacy: Consistent proficient use of statistical and quantitative language to correctly describe what was done, why, and how. Sufficient consideration given to limitations with explicit contextualization of results consistently included in the interpretation of results. Errors of comprehension of this text, if they arise, arise on the side of the reader.
Independent (Journeyman) Literacy: Reads the statistics and methods sections and identifies what they are and are not able to review competently. Can formulate queries for either the author or for an expert to help them complete a review. Seeks to collaborate with statistical expert to ensure that team-based reporting is coherent, consistent, and accurate.
Expert (Master): Expert communicator and reviewer of scientific communication relating to or including statistical and quantitative materials. Consistent sensitivity to audience and appropriate interpretation and contextualization of results. In reviewing proposals, can anticipate (diagnose) challenges for dissemination and communication, and differentiate errors in reasoning from failures to disclose or articulate. Diagnoses and remediates each of these across contexts.

The model of statistical thinking articulated by Wild and Pfannkuch ([57], discussed in [52] (pp. 18–20)) captures the features of literacy, reasoning, and thinking that are relevant for graduate science curricula (as noted by [58]; see also [50]) and beyond. Thus, this model embodies “...
value on the integration of quantitative methods as part of the substantive enterprise of doctoral education” [12] (p. 236). The KSAs (rows) in the MR-SL are:
• Define a problem based on critical literature review;
• Identify or choose (and justify) the measurement system;
• Design the collection of data;
• Piloting, analysis and interpretation;
• Discerning “exploratory”, “planned”, and “unplanned” data analysis;
• Hypothesis generation based on planned and unplanned analyses;
• Interpretation of results;
• Draw and contextualize conclusions;
• Communication.
These KSAs generally define the scientific method, and also require content knowledge. The initiation and development of these KSAs could therefore be integrated across multiple content course areas, and could also be supported for practicing scientists, whether or not they completed PhD-level training. The MR-SL serves to link instruction in statistical methods with the application of, and reasoning with, those methods and results. Thus, it can support the initiation of the development of this set of KSAs and their continued promotion within, and beyond the ending of, formal education. There are six mutually exclusive performance level descriptors for each of these KSAs in the MR-SL (Table 2); the integration of Bloom’s-level functioning at different stages with the features of statistical literacy is explicit.

3. Results

The MR-SL in Table 2 co-articulates cognitive perspectives on development (columns), derived from extant literature, with context-appropriate and explicit, but flexible, descriptions of a complex set of knowledge, skills, and abilities (KSAs; rows) that represent statistical literacy as a learnable, improvable skill set. Table 3 provides a rough alignment of the KSAs in the MR-SL and definitions of statistical “literacy”, “thinking”, and “reasoning” in prior models.

Table 3. Alignment of models of statistical reasoning, thinking, and literacy with the MR-SL KSAs.
Columns: Tractenberg MR-SL KSAs (defining statistical literacy like Chall [55] defined general literacy: as a learnable and improvable skill set) | Bishop and Talbot 2001 [58] (statistical thinking) | Wild and Pfannkuch 1999 [57] (statistical thinking) | Garfield, delMas, Chance 2003 [34] (definitions of statistical literacy, thinking, reasoning)

MR-SL KSAs: Define a problem based on critical literature review.
Bishop and Talbot [58]: Identify the problem.
Wild and Pfannkuch [57]: Constructing and reasoning from models.
Garfield, delMas, Chance [34]: Statistical thinking.

MR-SL KSAs: Identify or choose (and justify) the measurement properties of variables. Design the collection of data.
Bishop and Talbot [58]: Plan the experiment/survey/observational study.
Wild and Pfannkuch [57]: Taking account of variation; constructing and reasoning from models.
Garfield, delMas, Chance [34]: Statistical thinking.

MR-SL KSAs: Piloting, analysis and interpretation.
Bishop and Talbot [58]: Pilot and adjust (analyze and interpret the data).

MR-SL KSAs: Discerning “exploratory”, “planned”, and “unplanned” data analysis. Hypothesis generation based on planned and unplanned analyses. Interpretation of results. Draw and contextualize conclusions. Communicate.
Bishop and Talbot [58]: Do final study; collect and present the data; analyze and interpret the data. “To think statistically means that one can: 1. Read data, critically and with comprehension; 2. Produce data that provide clear answers to important questions; 3. Draw trustworthy conclusions based on data” [58] (p. 220).
Wild and Pfannkuch [57]: Constructing and reasoning from models; transnumeration (transforming data for understanding); synthesis of problem context and statistical understanding.
Garfield, delMas, Chance [34]: Statistical reasoning.