Jordan Harshman and Ellen Yezierski

Assessment Data-driven Inquiry: A Review of How to Use Assessment Results to Inform Chemistry Teaching

Abstract

With abundant access to assessments of all kinds, many high school chemistry teachers have the opportunity to gather data from their students on a daily basis. These data can serve multiple purposes, such as informing teachers of students' content difficulties and guiding instruction in a process of data-driven inquiry. In this paper, 83 resources were reviewed to provide a complete description of this process, which has not been previously done. In reviewing the literature, we found that: 1) there is very little research detailing the data-driven inquiry process in a way that can be readily implemented by teachers; 2) the research largely neglects the incorporation of disciplinary content in the data-driven inquiry process; 3) the suggestions for teachers' actions provided by the research are general, limiting their impact; and 4) the practical considerations and fidelity of implementation of data-driven inquiry have not been examined. Implications for chemistry teachers are presented along with a call for future research on key areas, thus benefiting researchers of assessment processes. Finally, general data-driven inquiry research is described in the context of chemistry-specific examples that provide useful, practical suggestions for high school chemistry teachers.

Keywords: formative assessment, summative assessment, informing instruction, guiding practice, instructional sensitivity, instructional validity, data use, data-driven inquiry, decision-making, assessment as inquiry, reflection, interpretation of assessments, analysis of assessments, learning objectives in assessment
Introduction

Between homework, quizzes, classroom activities, high-stakes summative exams, informal classroom observations, and other inputs, high school chemistry teachers around the globe have access to a wide variety of student data. Through the analysis and interpretation of these data, teachers can uncover great amounts of information, including, but not limited to, their students' conceptions about content and the educational impact of their own instruction. With this information, teachers can tailor instruction to their classroom and even to individual students. The impact of teachers effectively using the results of their many informal and formal, summative and formative assessments on the learning of their students cannot be overstated (U.S. Department of Education, 2008; 2011; Institute of Education Sciences, 2009). There is no better source of information for teachers to use to make instructional decisions than data from their own students.

Every assignment, homework, quiz, test, activity, lab, in-class question, and discussion yields valuable instructional feedback to high school chemistry teachers. This is free, continuous, customized-to-your-own-students professional development available to teachers every single day. The conglomeration of literature presented here will not detail the most effective implementation of this process, but it will portray what helpful advice is already available as well as what areas need to be understood better for the use of data to inform teaching.

The United States (U.S. Department of Education, 2011), the Caribbean (Ogunkola & Archer-Bradshaw, 2013; Seecharan, 2001), Hong Kong and Singapore (Towndrow, Tan, Yung, & Cohen, 2010), and Britain (Simon, 1992), just to name a few, have shown an increased focus on research into teacher assessment practices. Initially, we sought to investigate how high school chemistry teachers use the results of assessment to make data-driven decisions about their teaching. We searched for research on the use of data resulting from chemistry-specific assessments (e.g., an investigation of how teachers interpret the results from a specific item covering percent yield problems in stoichiometry). After finding no sources that were chemistry-specific and very little that was science-specific, we scoured the general education assessment literature only to find that these resources did not provide a satisfactory level of data-use guidelines for day-to-day instruction. As a result, an in-depth examination of what is available and what is missing in the data-driven inquiry literature is warranted and is, therefore, the goal of this review. It is important to apply the general process of effectively using assessment results to the context of high school chemistry, because learning goals and modes of assessment tend to vary by discipline and educational setting. Thus, while reviewing the literature, we also illustrate some of the ideas as they apply to the specific context of high school chemistry teaching.

It is important to note that this review derives from bodies of literature that range from informal, formative assessments to high-stakes, national summative assessments. Although the contexts of these assessments vary drastically, all types of assessment generate data that can serve many purposes, one of which is being a guide for instruction. In this light, the process of inquiry that describes how teachers are to inform their instructional practice differs because of differences in design, goals (objectives), and format of items and results, but it is the same general process for formative, summative, diagnostic, proximal, or distal assessments. Stated otherwise, we believe that all assessments produce data that, when analyzed considering their contexts, have the potential to inform instruction. Additionally, we use the term "assessment" in a colloquial manner. The term "assessment" as we use it implies two processes: collecting data from students and subjecting these data to criteria that imply evaluation.

As will be reviewed, several sources have described the process that teachers should use in order to guide their practice with the results of assessments. However, there is currently no extensive review of the literature that describes how to carry out this process, nor is there any critique of possible limitations of such a process. Since the processes are described as general principles, the task of translating the principles into practice is left entirely up to the instructors, which may create difficulty when translating research into practice (Black & Wiliam, 1998). This review uniquely synthesizes three separate bodies of literature to present an integrated description of the use of data from assessments to guide instruction:
1. Generic descriptions of the process of data use to inform instruction at the classroom level from analysis of high-stakes standardized tests.
2. General suggestions for how this process is carried out in the classroom by teachers using formative assessments.
3. A set of criteria regarding the instructional sensitivity of assessments used for making instructional decisions.

Our hope is that this article will: 1) inspire researchers to investigate the vastly understudied topic of data-driven inquiry; 2) encourage practitioners to consider the potential the information presented here has to positively impact their instruction; and 3) encourage professional developers to build programs that help teachers enact the mechanistic, day-to-day details of how to use the data in their classrooms to inform instruction.

Research Questions

The following research questions guided this review:
1. According to relevant literature, what is the process that teachers should undergo to guide their instruction by assessment results, and how can that process be exemplified in a high school chemistry setting?
2. What significant limitations and/or gaps exist in the description of how teachers should guide their instruction by assessment results?

Materials and Methods

This review was conducted via the integrative literature review method described by Torraco (2005). In this approach, common features of a process or concept are integrated towards a comprehensive understanding of that process. The resources for this study were gathered via electronic searches of Web of Knowledge/Science, Google Scholar, and the local libraries at Miami University (in conjunction with OhioLinks). Keywords in electronic searches included various combinations of: formative assessment, summative assessment, informing instruction, guiding practice, instructional sensitivity, instructional validity, data use, data-driven inquiry, decision-making, assessment as inquiry, reflection, interpretation of assessments, analysis of assessments, and learning objectives in assessment. For library searches, the keywords "formative assessment" and "summative assessment" led to the main section about assessment, and there were approximately 400-500 titles in this and neighboring sections. To filter through these books as well as the electronic resources, allusions to one or more of the keywords had to be present in the chapters (books) or in the headings, subheadings, or abstracts (articles). A large number of articles were also identified from references used in other resources. In total, 83 books and articles were selected for this literature review based on the aforementioned criteria (the full list of resources reviewed is available in the Online Resource). Data collection occurred primarily from 2011 to 2013, but a few more recent articles have been added to round out the literature review.

In describing the scope of this review, it is important to note that not all steps of the assessment process are covered. In order to answer our research questions in depth, information regarding the purpose of assessing, the design of assessments, goal/learning objective setting, and the sharing of results with stakeholders is not included in this review.

To efficiently present what is documented in the research regarding the process of how teachers are to use the results of their assessments to guide their instruction, we include: 1) an overarching definition and nomenclature for the process; 2) examples of the process both from the literature (in various contexts) and from our application to chemistry; and 3) detailed descriptions of what each individual step in the process of informing instruction via assessment results entails, as well as major findings of research for each step.
Results

Research Question 1: Data-Driven Inquiry

In response to the first research question, the process by which teachers are to guide their instructional practice is defined by assessment data that drive an inquiry about teaching and learning, or data-driven inquiry. This process goes by many other names: data-informed educational decision-making (U.S. Department of Education, 2008), data-driven decision-making (U.S. Department of Education, 2010; 2011; Brunner, 2005; Ackoff, 1989; Drucker, 1989; Mandinach, 2005), assessment as inquiry (Calfee & Masuda, 1997), cycle of instructional improvement (Datnow, Park, & Wohlstetter, 2007), formative feedback system (Halverson, Prichett, & Watson, 2007), and action research (Babkie & Provost, 2004); response to intervention traces to Deno and Mirkin (1977), who are credited with the central idea although they did not call it this; and a review of similar processes from the Institute of Education Sciences (IES), in association with the U.S. Department of Education, simply calls it the data use cycle (Institute of Education Sciences, 2009). For a graphical example, Figure 1 shows the data-driven decision-making process from the U.S. Department of Education (2010; 2011).

[Figure 1. Data-Driven Decision-Making (U.S. Department of Education, 2010). The figure shows a representative data use process, although it does not use the same nomenclature as we do in this review. The cycle includes defining a problem (plan), collecting data (implement and assess), analyzing and interpreting the data (analyze), and making decisions (reflect), similar to scientific inquiry.]

Data-driven inquiry frameworks resemble scientific inquiry in process, namely, defining a problem, collecting data, analyzing and interpreting the data, and then making and assessing a decision. Although the ideas behind the various processes are similar, the names are not, which explains the discrepancy between the labels in Figure 1 and those we use throughout this paper. We describe the process for using data to inform teaching with the terms "data use process" and "data-driven inquiry" throughout this review.

As an anecdotal example of how these terms could be used in an educational context, we begin the cycle depicted in Figure 1 with planning. As a note, several authors comment that a teacher can start anywhere on the cycle (Brunner, 2005; U.S. Department of Education, 2010; Institute of Education Sciences, 2009), but this example is presented in the chronological order typically seen in teaching. First, a teacher plans a pedagogical strategy based on his/her learning objectives or goals (Plan). As a note, we favor the term "goals" over "objectives," as the latter can hold a connotation (particularly among teachers) of learning objectives only, whereas data-driven inquiry additionally calls for instructional objectives; "goals" entails both. Then, s/he implements the teaching method (Implement), designs an assessment related to the learning objective, and collects/organizes the assessment results (Assess). The analysis and interpretation (Analyze) of the data can get complicated, as s/he can analyze and interpret both in terms of the learning goals and/or in terms of problems or questions different from those learning goals (i.e., how effective the teaching strategy was, other factors compounding performance, the impact of the educational setting, etc.). Finally, a pedagogical action is hypothesized through reflection on the results and other relevant contextual information (Reflect), and then the assessment process begins anew with the new pedagogical strategy being used.
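To make the Analyze step concrete, the short sketch below shows one way a teacher (or a simple tool supporting one) might organize item-level results by learning goal so that weak goals surface for the Reflect step. This is a minimal, hypothetical illustration in Python; the goals, items, and numbers are invented, and nothing in the reviewed literature prescribes this particular implementation.

```python
# A minimal, hypothetical sketch of the Analyze step: item-level results are
# grouped by the learning goal each item targets so low-performing goals
# stand out. All names and numbers are illustrative, not from the review.
from collections import defaultdict

# Each tuple: (item id, learning goal the item was written to assess,
# fraction of the class answering correctly).
item_results = [
    ("q1", "mole-to-mole ratios", 0.55),
    ("q2", "mole-to-mole ratios", 0.48),
    ("q3", "balancing equations", 0.82),
    ("q4", "dimensional analysis", 0.61),
]

by_goal = defaultdict(list)
for item_id, goal, p_correct in item_results:
    by_goal[goal].append(p_correct)

# Average performance per goal; goals below the threshold are flagged for
# the Reflect step (e.g., reteach, re-assess, or probe with new items).
THRESHOLD = 0.60
for goal, fractions in sorted(by_goal.items()):
    mean = sum(fractions) / len(fractions)
    flag = "  <- revisit" if mean < THRESHOLD else ""
    print(f"{goal}: {mean:.2f}{flag}")
```

A tally like this does not make any decision for the teacher; as the steps described below emphasize, it only localizes where reflection and contextual judgment should focus.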
The Process of Data-Driven Inquiry as Exemplified in Chemistry

To aid understanding of data-driven inquiry processes, three examples from the reviewed literature were chosen and are presented in Tables 1-3. These examples include scenarios in which the data-driven inquiry cycle led the instructors to modify the assessment (Table 1), inform instructional decisions (Table 2), and identify content difficulties in order to refine assessment goals (Table 3). In each table, the first column identifies which step of the data-driven inquiry process is being exemplified, and the second column contains an example from the literature. The last column illustrates how these examples might exist in a high school chemistry context, thus addressing the first research question. Table 1 highlights how an original goal is modified after the teacher gets assessment results and also illustrates the importance of considering the alignment of the assessment task and the learning objectives. In the Constitution example (Table 2), the use of assessments to directly inform instruction is depicted; here, the results of the assessment were analyzed with consideration of the original teaching strategy employed. In the final example (Table 3), data-driven inquiry is used to help identify the specific content area in which students are struggling the most, highlighting how the process can be used to isolate the detailed, specific learning objective not met by the students. Information in these tables is referred to throughout the description of the individual steps.

Table 1. Modifying assessment example. Literature example: 4th grade vocabulary (Calfee and Masuda, 1997); chemistry example: categorizing reaction types.

Defining a problem.
  Vocabulary: One hypothetical 4th grade boy (Sam) may have a poor vocabulary.
  Chemistry: In high school chemistry, a student (an older Sam) may not understand chemical reactions.

Designing/Collecting assessment.
  Vocabulary: An assessment is designed and implemented to determine his grade level in vocabulary.
  Chemistry: An assessment requires Sam to identify reactions as synthesis, decomposition, etc.

Interpretation and analysis.
  Vocabulary: The assessment reveals his vocabulary grade level to be 2.4 (between 2nd and 3rd grade), so his teacher deems him not very smart.
  Chemistry: On the assessment, Sam cannot adequately identify the types of chemical reactions.

Making instructional decisions.
  Vocabulary: The teacher places him in a low-ability group in order to give him a slower pace.
  Chemistry: Figuring Sam doesn't understand chemical reactions, the teacher goes over the definitions and how to identify them multiple times.

Defining an alternative problem/hypothesis to describe results.
  Vocabulary: The teacher thinks that Sam possesses an adequate vocabulary but does not perform well on the skill of vocabulary recall, which is only one aspect of understanding vocabulary.
  Chemistry: Being able to identify types of chemical reactions is not the only aspect of understanding them, and Sam may understand other aspects of reactions.

Designing/Collecting assessment.
  Vocabulary: The teacher tasks Sam to define words such as "petroleum," use them in an original sentence, and match each word to a definition in different assessments.
  Chemistry: The teacher develops an alternative assessment that asks Sam to predict the products, including states of matter, and to balance various types of reactions.

Interpretation and analysis.
  Vocabulary: If Sam performs differentially on these tasks and others like them, then he probably understands the word, but the type of assessment affects his performance; thus, the pure recall skill required by that assessment type may be the only thing Sam struggles with.
  Chemistry: If this assessment yields different results, then Sam probably understands one but not all aspects of chemical reactions.

Making instructional decisions.
  Vocabulary: No instructional decision was provided with this example; however, the authors note that the teacher needs to consider the consequences that short-term assessment(s) have on long-term decisions, such as the decision to place Sam in a low-ability group for what could simply be due to the assessment design or context (such as time).
  Chemistry: With additional information, the teacher can either target the identification aspect specifically or make a curricular change regarding chemical reactions if that teacher decides identification is not as highly valued as other aspects of chemical reactions.

Table 2. Using assessment to inform instruction. Literature example: U.S. Constitution (Calfee and Masuda, 1997); chemistry example: atomic structure.

Defining a problem.
  Constitution: A teacher wants students to have a fundamental understanding of the U.S. Constitution.
  Chemistry: A teacher wants students to have a fundamental understanding of atomic structure.

Designing/Collecting assessment.
  Constitution: In order to gain information on what students already know about the Constitution, the teacher asks his students to tell him something about the Constitution. After total silence, he follows up with open questions about the federal government in Washington, D.C. After another long silence, he writes keywords on the board such as "President," "Congress," and "Supreme Court," hoping to elicit dialogue, but still no response.
  Chemistry: The teacher asks the students to share anything they know about the structure of an atom. Some might volunteer something about protons, neutrons, and electrons but provide little follow-up. The teacher could then ask about plum pudding versus solar system models, periodicity, or subatomic particles, but gets little response.

Interpretation and analysis.
  Constitution: One possible conclusion is that the students actually know nothing about these topics, informing the teacher that he will have to start from scratch.
  Chemistry: Considering the lull in discussion, the teacher assumes they know very, very little about atomic structure.

Making instructional decisions.
  Constitution: The teacher begins at the absolute basics, tailoring his instruction to include all aspects of the government and the Constitution.
  Chemistry: As a result, the teacher begins talking about the charges of subatomic particles and the basic construction of atoms.

Defining an alternative problem/hypothesis to describe results.
  Constitution: A hypothesis is that the students do not normally participate in such open discussions and, for lack of knowing how to react, remain silent.
  Chemistry: The same hypothesis applies: the students do not normally participate in such open discussions and, for lack of knowing how to react, remain silent.

Designing/Collecting assessment.
  Constitution: To test this right on the spot, he shifts gears and asks his students to tell him something about weather, because he knows they understand weather.
  Chemistry: To test this, the teacher asks more convergent questions, such as: what charges do the proton, neutron, and electron have?

Interpretation and analysis.
  Constitution: If the response is no longer silence, this may be an indication that it is the pedagogical style (or other contextual aspects) yielding the silence, not necessarily a lack of content knowledge regarding the U.S. Constitution.
  Chemistry: If the students can answer these questions, then it could have been the open-ended discussion, not a lack of content knowledge, that caused students to remain silent.

Making instructional decisions.
  Constitution: No instructional decision was provided with this example.
  Chemistry: Further, if the teacher's interpretation is that students don't do well with open-ended discussions, that teacher may revise his prompts to elicit more from the students.

Table 3. Identifying content difficulties and refining assessment goals. Literature example: perimeter of polygons (Institute of Education Sciences, 2009); chemistry example: stoichiometry.

Defining a problem.
  Polygons: Teachers at an elementary school examine 4th and 5th graders' proficiency rates in language arts and mathematics.
  Chemistry: A teacher wants to assess students' knowledge of stoichiometry.

Designing/Collecting assessment.
  Polygons: Standardized tests were administered to all students.
  Chemistry: The teacher gives an assessment with items akin to: If 5.00 g of sodium phosphate react with excess calcium chloride, how much (g) calcium phosphate will precipitate, assuming 100% yield?

Interpretation and analysis.
  Polygons: Proficiency rates were higher in language arts than in mathematics. In particular, arithmetic was satisfactory, but geometric shapes and measurement skills yielded results that indicated inadequate proficiency for the class. Even more specifically, a teacher noticed that most students struggled with measuring perimeters of polygons, which was surprising as that only required satisfactory performance in arithmetic.
  Chemistry: Over half of the students could not answer this question correctly. In looking over the work, she realized that students struggled with many things, such as writing formulae and equations, balancing equations, and dimensional analysis.

Making instructional decisions and designing/collecting assessment.
  Polygons: The proposed action in this case was to design and collect more assessment information. The teachers took questions from workbooks regarding perimeters of polygons and gathered more data.
  Chemistry: Since the teacher couldn't determine which specific piece needed further instruction, she designed assessment items to assess only mole-to-mole ratios.

Interpretation and analysis.
  Polygons: They began to notice that students performed well on problems where polygons were drawn for them but did not perform well on real-life application word problems.
  Chemistry: She began to notice that about a quarter of the students either didn't include stoichiometric coefficients or reversed them.

Designing/Collecting assessment.
  Polygons: As a result, they developed lesson plans focused on the application of perimeter calculations of polygons to word problems, tested again, and found a significant improvement in performance on these items.
  Chemistry: Because of this, she created a few problems that specifically addressed the concept of mole-to-mole ratios and why and how the coefficients are used.

Making instructional decisions.
  Polygons: In the future, the teachers used the same strategies emphasizing application problems to address the problem.
  Chemistry: In future years, she will begin with problems that assess only mole-to-mole ratios and move on to problems that assess other components of stoichiometry.
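For readers who want the arithmetic behind the stoichiometry item in Table 3, one worked solution follows. The balanced equation and molar masses are standard general-chemistry content supplied by us, not part of the original example:

\[
2\,\mathrm{Na_3PO_4(aq)} + 3\,\mathrm{CaCl_2(aq)} \longrightarrow \mathrm{Ca_3(PO_4)_2(s)} + 6\,\mathrm{NaCl(aq)}
\]

\[
5.00\ \mathrm{g\ Na_3PO_4} \times \frac{1\ \mathrm{mol\ Na_3PO_4}}{163.94\ \mathrm{g}} \times \frac{1\ \mathrm{mol\ Ca_3(PO_4)_2}}{2\ \mathrm{mol\ Na_3PO_4}} \times \frac{310.18\ \mathrm{g}}{1\ \mathrm{mol\ Ca_3(PO_4)_2}} \approx 4.73\ \mathrm{g}
\]

Each factor in the chain corresponds to one of the sub-skills the teacher in Table 3 identified (writing formulae, balancing the equation, dimensional analysis, and the mole-to-mole ratio), which is exactly why a single wrong final answer cannot localize the difficulty.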
Steps of Data-Driven Inquiry

Defining a problem – Step 1. There is a semantic difference that identifies the unit for which analysis takes place. Generally, when the word "goal" or "problem" is used, it refers to a student outcome, a learning objective, or a problem with students' understandings, and it is set prior to data collection in order to guide the design of assessments. The original goal in Table 2 was to assess students' understandings of atomic structure. Alternatively, when "hypothesizing" or "question posing" appears, it refers to the attempt to explain or address results of the designed assessments and therefore occurs after data collection. These are hypotheses about the effects of factors such as educational contexts, individual and class-level history, and even the wording of items (as just a few examples) on student performance on assessments. In Table 2, the teacher hypothesized that the teaching strategy used may have been having a large impact on the outcome. Analysis of both types of questions is important in instructional improvement because, in order to know how to adjust instruction, teachers need to know where problems exist in students' understandings but also need to understand how confounding factors impact the results from which one draws conclusions (Institute of Education Sciences, 2009; U.S. Department of Education, 2011).

In the data-driven decision-making model, Knapp (2006), Cuban (1998), and Copland (2003) detail the importance of the ability to reframe potential interpretations of data from multiple perspectives. These multiple interpretations, formed as questions or hypotheses (Calfee & Masuda, 1997), give teachers the opportunity to access a wide variety of information about their students and their own teaching from one item, one assessment, or a group of assessments. In the U.S. Department of Education's large-scale study of elementary, middle, and high school teachers (2008), only 38% of teachers reported having professional development that focused on how to formulate questions to answer with data from assessments. To address this, we refer teachers to the IES's guidelines for a good hypothesis: it 1) identifies a promising intervention, 2) ensures that outcomes can be measured accurately, and 3) lends itself to a comparison study (pre-post or treatment-control designs) (Institute of Education Sciences, 2009). Additionally, Suskie (2004) warns against having too many learning goals or inappropriate (too complex or too simple) learning goals, as this negatively affects the analysis and interpretation of the resulting data. These suggestions are best illustrated in Table 3, where the teacher modifies a complex goal (understanding of stoichiometry) to a simpler goal (understanding of molar ratios), thereby assessing that goal in a more valid manner.

Designing assessments and collecting data – Step 2. Teachers frequently design, collect, and analyze students' data using formative assessments such as quizzes, homework, in-class activities, and tests. Many of these contain items that are designed (or at least chosen) by the teachers themselves. A constant concern for these items is the extent to which the results can be used to determine instructional effectiveness. This has been referred to as consequential (Linn & Dunbar, 1991; Messick, 1989), instructional (Yoon & Resnick, 1998), or pedagogical validity (Moran & Malott, 2004), but has recently been called instructional sensitivity (Ruiz-Primo, 2012; Popham, 2007; Polikoff, 2010). In its simplest definition, instructional sensitivity is the extent to which students' performance reflects the quality of instruction received (Kosecoff & Klein, 1974).

To demonstrate instructional sensitivity, imagine a chemistry teacher wants to evaluate the effectiveness of a simulation of particles responding to increasing temperature and therefore administers some form of assessment. If a) the content assessed aligns with the learning objectives, b) the items are not being misinterpreted by the students, and c) the format of the response allows the teacher to validly determine students' thought processes, the results can be interpreted to draw conclusions about the effect of the simulation on student learning. Factors a-c are aspects of instructional sensitivity, and without them being considered in some way, any conclusion about the student responses would be suspect. For example, an assessment item that simply asks students to "predict if the volume of gas will increase or decrease at higher temperature" has limited sensitivity to instruction in this case for three reasons: 1) it assesses prediction, whereas the simulation focuses on explanation (related to factor a); 2) "volume of gas" can be misinterpreted by students as "volume of a gas particle" (factor b); and 3) the item is susceptible to guessing given its current format (factor c). A more instructionally sensitive item might be "using drawings of gas particles, explain why an increase in temperature will cause an increase in volume," because it minimizes factors a-c, meaning that the results can more readily be used to inform instruction.

It is commonly stated that assessment items must align with the learning objectives of that particular unit (Irons, 2008; Anderson, 2003; Taylor, 2003). In regard to a data-driven inquiry process, this alignment is crucial. Lack of alignment between the goals of the assessment and what is assessed by the items leads to misinterpretations and missed opportunities to gather valuable information about teaching and learning. Project 2061 of the American Association for the Advancement of Science provides alignment criteria of this nature that may prove helpful for chemistry teachers (Roseman, Kesidou, & Stern, 1996; Stern & Ahlgren, 2002). Chemistry examples of this alignment (or the lack thereof) can be found in Table 1 and the implications section.

Instructional sensitivity is an important consideration in the data use process because if teachers use items that are sensitive to their instruction, they can use data from those items to adjust instruction (Ruiz-Primo, 2012). Judging both quantitatively and qualitatively the degree to which items are instructionally sensitive has been examined (Popham, 2007; Burstein, 1989; Polikoff, 2010), but no clear standards for the evaluation of instructional sensitivity have been published. Outside of "typical" sources of assessment data (i.e., homework, quizzes, etc.), Calfee and Masuda (1997) would argue that, in light of classroom assessment as applied social science research (Cronbach, 1988), teachers should be open to data collection complete with observations and interviews.

Interpretation and analysis of data – Step 3. Even when an assessment of any kind has been designed so that learning objectives and assessed concepts are aligned, the task of interpreting results in a way that is meaningful to instructors is daunting. Textbooks on classroom assessment (McMillan, 2011; Popham, 2002; Witte, 2012) frequently discuss the importance of:
• understanding validity, reliability, and descriptive and (a minimal amount of) inferential statistics;
• the ethics of assessment;
• an absence of bias when evaluating students so as to inform an instructor; and
• means of summarizing and representing results for oneself as an instructor, other teachers, parents, and other decision-makers.
However, even an understanding of these psychometric aspects does not describe how they apply to a teacher's particular content, context, and pedagogical style and ability. The previously mentioned interpretation strategies generally align with the criterion-referenced era of assessment, meaning the results are interpreted based on predetermined criteria defined by the discipline (Linn, Baker, & Dunbar, 1991). In science classrooms, Bell and Cowie (2002) note that teachers frequently rely on criteria (where the criterion is a scientific understanding of phenomena) because teachers generally want to see that students have learned what they intended the students to learn. This method, often associated with a performance-oriented learning approach, is one of two general approaches. The other interpretation strategy is a growth approach, or student-referenced assessment (Harlen & James, 1996), meaning that teachers reference students' previous assessments in interpreting the current assessments. It is generally accepted that these two methods should be performed in conjunction with each other when interpreting and delivering the results of assessments (Bell & Cowie, 2002). As shown in Table 1, Sam is assessed by a criterion (understanding of chemical reactions), but a growth model could easily be incorporated by repeated measurements of understanding of chemical reactions.

Some years following the implementation of the No Child Left Behind Act (NCLB, 2002), the U.S. Department of Education launched a series of large-scale studies to assess teachers' abilities to interpret quantitative assessment data (U.S. Department of Education, 2008; 2010; 2011). First, the U.S. Department of Education found that teachers wanted more opportunities for professional development specific to the interpretation and incorporation of data from standardized tests, and that they were unlikely to engage in this work if they were not confident in their ability to do so (U.S. Department of Education, 2008). A few years later, a different sample revealed that teachers preferred to interact with colleagues around common assessments to interpret data rather than participate in formal professional development (U.S. Department of Education, 2010). With the most recent sample, teachers seemed to express difficulties with fundamental data interpretation skills, such as differentiating bar graphs from histograms, interpreting cross-sectional versus longitudinal results, comparing subgroups in more complex data tables, recognizing the effect of outliers on the calculation of the mean or considering the distribution when given a mean, and having a firm understanding of validity, reliability, and measurement error (U.S. Department of Education, 2011).
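As a small, hypothetical illustration of the outlier and distribution issues named above, consider the quiz scores 82, 85, 88, 90, and 15:

\[
\bar{x} = \frac{82 + 85 + 88 + 90 + 15}{5} = 72, \qquad \text{median} = 85
\]

The single score of 15 pulls the mean 13 points below the median, so a teacher reading only the class mean would conclude that a class in which four of five students scored in the 80s is struggling.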
Over the course of these three studies, with participants from the 2004-2008 school years, teachers showed a limited ability to properly interpret quantitative data from assessments, yet increasingly relied on the support of similar-ability colleagues to assist in this process. An insubstantial number of teachers were shown to possess adequate data use skills in other studies as well (Brunner, 2005; Cizek, 2001; Herman & Gribbons, 2001; Datnow, Park, & Wohlstetter, 2007; Popham, 1999; Mandinach, 2005). Some authors have tried to give teachers tips on how to present their data visually or organize them into tables for ease of interpretation (Burke & Depka, 2011; Anderson, 2003), which adds to a growing list of tools that aid interpretation as opposed to providing a detailed framework for how to use these results.

Quantitative assessment data (from multiple choice, true/false, matching, etc.) are not the only data type available to teachers. Qualitative assessment data (from free response, essay, fill-in-the-blank, etc.) are also widely accessible. For this type of assessment data, it is more important to interpret results with an absence of bias and in alignment with a predetermined rubric (McMillan, 2011). Without a specific content area and educational context, it is very difficult to describe what is entailed in qualitative data analysis, which relies so heavily on interpretive frameworks.

Making and assessing instructional decisions – Step 4. Beyond comprehending what the data signify, it is also suggested that teachers use the results from assessments to appropriately guide their instruction. Referencing a review of the formative assessment literature by Bell and Cowie (2002), actions in response to assessment data can take place at the classroom level, the small-group level, or the individual student level. Similar to the analysis of qualitative assessment data, it is difficult to comment on how to make instructional decisions in the absence of a specified learning goal, question, and actual student results. Even when this information is present, it is difficult to guide instruction because awareness of a content deficiency alone does not directly inform or drive pedagogical decisions (Knapp, 2006). However, this awareness does serve two purposes. Firstly, the U.S. Department of Education (2011) and others (Burke & Depka, 2011; Irons, 2008; Institute of Education Sciences, 2009) suggest that teachers use the results of assessment to determine whether they should move forward or recover, reteach, review, or in general allocate more time to the content found to challenge students. Empirical studies suggest that teachers have been doing so for decades (Bennett, 1984; Gipps, 1994; U.S. Department of Education, 2011). Secondly, the Institute of Education Sciences (2009) recommends increasing students' awareness of their own content deficiencies in order to encourage self-directed assessment.

As Knapp (2006) asserted, both of these recommendations, reteaching and giving students feedback for self-improvement, suggest what to do, but the assessment results do not elaborate on how to carry it out. In general, this was fairly common throughout the resources reviewed, as phrases such as "consider what needs to be taught differently" (Irons, 2008), "[attempt] new ways of teaching difficult or complex concepts" (Halverson et al., 2009), "a lesson… can be appropriately modified based on the collected findings" (Witte, 2012), and "[use] results to revise the unit" (Taylor, 2003) served as the main suggestions for teachers. Suskie (2004) elaborates further by claiming that it is not that results simply cannot dictate how to adjust instruction, but that they should not dictate how to adjust instruction, as only professional judgment in light of results should be used to make such decisions (Bernhardt, 2004). It has also been reported that an intentional plan by teachers for students to self-assess (Yorke, 2003; Institute of Education Sciences, 2009; Irons, 2008) or peer-assess (Falchikov, 1995) can be informed by assessment results, but few details are available. Table 2 presents how a chemistry teacher might teach a concept using a different teaching method, while Table 3 shows a teacher who did not "go on" with the curriculum when she realized that her students were not understanding the assessed content.

Research Question 2: Conclusions about Gaps and Limitations

In response to the ideas presented by the literature above, we have discovered several limitations and gaps within the existing literature. A "limitation" indicates that a significant amount of research has been conducted, but that literature does not discuss the depth that is required for teachers to adapt research into practice. A "gap" indicates that there is very little to no existing research. Upon reviewing the literature pertaining to data use, we discuss the following conclusions:

1. Gap: Data-driven inquiry is only discussed in a general sense and does not address the mechanistic details required to guide day-to-day instruction. While the literature describing assessment as inquiry is valuable, it largely excludes suggestions, instructions, or guidelines that describe precisely what teachers should do. That is, the literature would tell a chemistry teacher to conduct the general process of designing, implementing, analyzing, interpreting, and acting on (the results of) an assessment. However, unless a teacher's assessment and results closely mimic the context of a provided example, the process delineated in the research acts as a compass as opposed to a set of directions; it can point you in the right direction, but only with more detailed guidance will you get to your destination. This lack of specificity can be detrimental to the translation of research into practice. In Black and Wiliam's "black box" paper (1998), the authors state: "Teachers will not take up ideas that sound attractive, no matter how extensive the research base, if the ideas are presented as general principles that leave the task of translating them into everyday practice entirely up to the teachers (pg 10)."

2. Gap: The process of guiding instruction by analyzing assessment results is described without reference to educational context or disciplinary content. In accordance with the previous conclusion, we postulate that the data-driven inquiry process does not detail a day-to-day view because, to a large extent, it generalizes across discipline areas and all educational levels of instruction. In most studies reviewed, the process of data-driven inquiry is seemingly identical for elementary, middle school, and secondary level teachers and the students they teach. This is not necessarily incorrect, as the general social science methods inherent in data-driven inquiry apply to the gamut of student and teacher populations. However, if a researcher were to investigate the very specific, mechanistic details of the process, s/he would need to recognize that the learning goals, assessment types and content, classroom discourse, and all other aspects of assessment evidence are unique at each educational level.

Similarly, the majority of the research does not focus on one particular discipline or another. It can be expected that assessment goals, format, analysis, and interpretation, along with their appropriate pedagogical actions, in language arts would differ greatly from those in the physical sciences, for example. Even within an academic discipline, the data-driven inquiry process in chemistry can look entirely different from that in biology, could differ from stoichiometry to gas laws, from conceptual gas law problems to mathematical gas law problems, or even from one conceptual Charles's Law problem to another asked in a different format on the same assessment. This content consideration aligns with the spirit of pedagogical content knowledge (PCK; Shulman, 1987), although few articles mention the role of PCK in the interpretation of assessments (Novak, 1993; Park & Oliver, 2008). Coffey et al. (2011) also claimed that formative assessment research widely neglects disciplinary content, which supports the need to consider disciplinary content in assessment interpretation.

3. Limitation: Although the idea that teachers should enact data-driven inquiry (similarly to a social science researcher) to effectively use the results of their assessments is uncontested, the pragmatics and fidelity of implementation of the process have not been studied. In universal agreement, the resources reviewed point to teachers using an assessment process that includes goal-setting, data collection, interpretation, and analysis in order to inform their instruction. Although the agreement amongst so many authors provides a strong argument for the effectiveness of the process, few short- or long-term studies have examined how well particular teachers implement the entire process. Some notable exceptions are recent works in science education (Haug & Ødegaard, 2015; Izci, 2013; Tomanek, Talanquer, & Novodvorsky, 2008; Ruiz-Primo & Furtak, 2007) and the three Department of Education studies cited earlier. Without this investigation, a characterization of data use by teachers of a specific discipline is not available. This makes it impossible to determine what, if any, data use training should be developed for current and pre-service teachers. Additionally, since the research lacks a consistent context that weaves the pedagogy together with consideration of the content, there is little discussion of the pragmatics involved in implementing data-driven inquiry with fidelity: Do teachers value the use of data to inform teaching? What skills do teachers need in order to properly and effectively use data to adjust instruction? How much time will teachers have to dedicate to conducting proper data analysis and interpretation? With other responsibilities, and given the potential for instructional improvement, is it realistic for teachers to allocate the required time? To what extent does current pre-service teacher training address the skills required for effective data use? These, along with many other questions pertaining to fidelity of implementation, remain uninvestigated.

4. Limitation: Both the "what content to reteach" and "teach it differently" paradigms exist in most of the literature, which limits the true potential of the data to inform practice. As discussed in the literature, the primary reason that teachers analyze and interpret assessment results is to identify the content area(s) on which students perform poorly. Although this is necessary in data-driven inquiry, the prescribed action is usually to reteach, recover, revisit, or emphasize the suspect content. We again point to the lack of context for this finding because, without context, one cannot possibly suggest an appropriate action, as more information about what was done previously is required. The instructional strategies and materials used originally help inform how these should be changed in light of assessment results, because a teacher then has evidence to suggest the teaching may have been less effective than desired. This, along with the format of the assessment questions, the content being assessed, the wording of the items, and a great many other contextual pieces of information, all factor into the interpretation of the results in order to determine the best pedagogical response. As a note, we agree with Knapp (2006) and Suskie (2004), who claim that assessment results by themselves cannot inform instruction when considered in isolation. However, we do assert that assessment results, along with contextual information, should guide teachers in their pedagogical decisions.

Implications for High School Chemistry Teachers

The answer to our first research question also addresses the implications of this review for secondary chemistry instruction. Since the literature was not based in the context of chemistry, we are only able to offer recommendations for how teachers should enact data-driven inquiry. However, there are a couple of implications that can be pulled from the general suggestions. First, in defining the goals for assessments (and therefore the focus of the analysis to be conducted on the resulting data), teachers should ensure that their results can inform a possible intervention. Consider two hypothetical inquiries: 1) Did my students understand the movement of gas particles as postulated by kinetic molecular theory? 2) Did my didactic style of instruction best help students understand the movement of gas particles as postulated by kinetic molecular theory? The second question can be used to help answer a question about the teacher's performance, whereas the first only implies that if the students understand it, then the teacher must have taught it well, or vice versa.

As a second implication, emphasis was put on the alignment of learning goals to assessment items. If a teacher wishes to assess students' understandings of molecular polarity, that teacher must realize that asking "Is ammonia polar?" assesses nomenclature (ammonia = NH3), Lewis structures (and the concept of valence electrons), the effect of atomic electronegativity on bond polarity, electron and molecular geometry, and, finally, molecular polarity as a consideration of individual bond polarities and three-dimensional geometries. As a result, this teacher needs to ask the question in a way that will yield results that allow these various factors to be investigated and/or controlled for.
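To make that chain of prerequisites explicit, a compact version of the reasoning the item silently demands is sketched below; this is standard general-chemistry content supplied by us, not drawn from the reviewed literature:

\[
\mathrm{NH_3}:\ 5 + 3(1) = 8\ \text{valence electrons} \;\Rightarrow\; 3\ \text{bonding pairs and 1 lone pair on N}
\]

\[
4\ \text{electron domains} \;\Rightarrow\; \text{trigonal pyramidal molecular geometry} \;\Rightarrow\; \text{N--H bond dipoles do not cancel} \;\Rightarrow\; \mathrm{NH_3}\ \text{is polar}
\]

A student can break any single link (writing the formula, counting electrons, placing the lone pair, naming the geometry, or summing the bond dipoles) and still arrive at the same incorrect answer of "nonpolar."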
Lastly, chemistry teachers will benefit from understanding the limitations of what assessment results can tell them. Interpretation and analysis can identify specific content areas where students struggle, but this needs to be combined with the contextual information only accessible to the teacher of the class. Using the molecular geometry example, if a teacher identifies that 36% of the class labeled ammonia as trigonal planar as opposed to trigonal pyramidal on account of missing/neglecting the lone pair of electrons on nitrogen (leading to a "nonpolar" response), that teacher should seek to obtain more information: Who are these 36%? Did they struggle with Lewis structures or with molecular geometry? How did I teach this? Have they shown any decreased performance with that instructional strategy previously? Also, instead of "36% of the class labeled ammonia as trigonal planar," what if the results were presented as "36% of the class responded that ammonia was nonpolar"? There are multiple reasons why a student would respond this way. Failure to recognize that the 36% who responded this way specifically struggled with geometries, as opposed to any other factor, could lead to a misdiagnosis of student difficulties.
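One minimal sketch of the disaggregation recommended here: if each incorrect response is coded by the step at which the work first went wrong, the single "36% answered nonpolar" figure separates into distinct instructional targets. The Python below is purely illustrative; the response codes and counts are invented (chosen so that incorrect responses total 36% of a hypothetical 50-student class).

```python
# Hypothetical disaggregation of the "36% answered nonpolar" result by the
# error that produced it, so one percentage is not misread as one
# misconception. Codes and counts are invented for illustration.
from collections import Counter

# Each student's response is coded by the step where the work first went wrong.
coded_responses = (
    ["geometry: omitted lone pair"] * 9            # drew NH3 as trigonal planar
    + ["Lewis structure: wrong electron count"] * 4
    + ["polarity rule: bond vs. molecular polarity"] * 5
    + ["correct"] * 32
)

counts = Counter(coded_responses)
total = len(coded_responses)
for code, n in counts.most_common():
    print(f"{code}: {n}/{total} ({100 * n / total:.0f}%)")
```

In this invented class, only 18% of students (9 of 50) actually struggled with geometry; treating all 36% as a geometry problem would misdirect the reteaching.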
Future Directions for Research

Considering the context for which this review was conducted, high school chemistry teachers are the main subjects for whom the following research ideas are suggested. If research is to inform practice in a significant way, further research must be completed on how the general process of data-driven inquiry is implemented in an everyday context for chemistry teachers. We have already begun data analysis on a study with this goal, but one study cannot possibly capture the variability in the enactment of this process. Studies focused on data-driven inquiry need to incorporate chemistry pedagogical content knowledge, as an appropriate investigation will need to search for what steps of the process are not present just as much as (if not more than) what steps are present. The latter will describe the current processes in place and inform the state of data-driven inquiry, whereas the former is crucial to identifying areas where chemistry teachers can improve. After an initial, context- and content-oriented data use process is better defined, several inquiries will remain, including:
1. What are the characteristics of high school chemistry teachers' data use processes?
2. What are the best practices for incorporating data-driven inquiry based on PCK specific to assessment in secondary level chemistry?
3. In what areas can high school chemistry teachers improve their data-driven inquiry skills?
4. What limitations in regard to pragmatics and fidelity of implementation exist in proposed interventions targeted at improving high school chemistry teachers' data-driven inquiry?
5. How can professional development of data use skills in either (or both) continuing chemistry teacher training or pre-service training improve high school chemistry teachers' ability to carry out data-driven inquiry?

Research pertaining to the synthesis of best practices is not just a call for the chemistry-specific context, but also a general call for continued research in assessment to incorporate ideas deriving from the pedagogical content knowledge literature. The assessment process cannot be fully articulated speaking only in generalities; it must also be described in consideration of the nature of the content being assessed. Similarly, the content needs to take a significant role in guiding instruction. The effectiveness of general instructional modifications like "reteach" or "change your teaching approach" can only be evaluated fully when the context and nature of the content are given. This is not to say that these suggestions are ineffective, but rather that the use of data to guide instruction is not a general situation, and specific actions depend on the context in which the results are generated.

References

Ackoff, R. L. (1989). From data to wisdom. Journal of Applied Systems Analysis, 16, 3-9.

Anderson, L. W. (2003). Classroom assessment: Enhancing the quality of teacher decision making. Mahwah, NJ: L. Erlbaum Associates.

Babkie, A. M., & Provost, M. C. (2004). Teachers as researchers. Intervention in School and Clinic, 39(5), 260-268.

Bell, B., & Cowie, B. (2002). Formative assessment and science education. Kluwer Academic Publishers.

Bennett, N., Desforges, C., Cockburn, A., & Wilkinson, B. (1984). The quality of pupil learning experiences. London: Lawrence Erlbaum Associates.

Bernhardt, V. L. (2004). Data analysis for continuous school improvement. Larchmont, NY: Eye on Education.

Black, P., & Wiliam, D. (1998). Inside the black box. Phi Delta Kappan, 80(2), 139.

Brunner, C., Fasca, C., Heinze, J., Honey, M., Light, D., Mandinach, E., et al. (2005). Linking data and learning: The Grow Network study. Journal of Education for Students Placed At Risk, 10(3), 241-267.

Burke, K., & Depka, E. (2011). Using formative assessment in the RTI framework. Bloomington, IN: Solution Tree Press.

Burstein, L. (1989). Conceptual considerations in instructionally sensitive assessment (Technical Report 333). Los Angeles: Center for the Study of Evaluation, National Center for Research on Evaluation, Standards, and Student Testing, Graduate School of Education & Information Studies, University of California, Los Angeles.

Calfee, R. C., & Masuda, W. V. (1997). Classroom assessment as inquiry. In G. D. Phye (Ed.), Handbook of classroom assessment: Learning, adjustment, and achievement. San Diego: Academic Press.

Cizek, G. J. (2001). Conjectures on the rise and fall of standards setting: An introduction to context and practice. In G. J. Cizek (Ed.), Setting performance standards: Concepts, methods, and perspectives (pp. 3-18). Mahwah, NJ: Lawrence Erlbaum & Associates.

Coffey, J. E., Hammer, D., Levin, D. M., & Grant, T. (2011). The missing disciplinary substance of formative assessment. Journal of Research in Science Teaching, 48(10), 1109-1136.

Copland, M. A. (2003). The Bay Area School Collaborative: Building the capacity to lead. In J. Murphy & A. Datnow (Eds.), Leadership lessons from comprehensive school reform (pp. 159-184). Thousand Oaks, CA: Corwin Press.

Cronbach, L. J. (1988). Five perspectives on validity argument. In Test validity (pp. 3-17).

Cuban, L. (1998). How schools change reforms: Redefining reform success and failure. Teachers College Record, 99, 453-477.

Datnow, A., Park, V., & Wohlstetter, P. (2007). Achieving with data: How high-performing school systems use data to improve instruction for elementary students. Los Angeles, CA: University of Southern California, Center on Educational Governance.

Deno, S. L., & Mirkin, P. K. (1977). Data-based program modification: A manual.

Drucker, P. F. (1989). The new realities: In government and politics, in economics and business, in society and world view. New York: Harper & Row.

Falchikov, N. (1995). Improving feedback to and from students. In P. Knight (Ed.), Assessment for learning in higher education. Birmingham: Kogan Page.

Gallagher, L., Means, B., & Padilla, C. (2008). Teachers' use of student data systems to improve instruction, 2005 to 2007. U.S. Department of Education, Office of Planning, Evaluation and Policy Development, Policy and Program Studies Service.

Gipps, C. (1994). Beyond testing: Towards a theory of educational assessment. London: The Falmer Press.

Halverson, R., Prichett, R. B., & Watson, J. G. (2007). Formative feedback systems and the new instructional leadership. Madison, WI: University of Wisconsin.

Hamilton, L., Halverson, R., Jackson, S., Mandinach, E., Supovitz, J., & Wayman, J. (2009). Using student achievement data to support instructional decision making (NCEE 2009-4067). Washington, DC: National Center for Education Evaluation and Regional Assistance, Institute of Education Sciences, U.S. Department of Education.

Harlen, W., & James, M. (1996). Creating a positive impact of assessment on learning. Paper presented at the American Educational Research Association Annual Conference, New York.

Haug, B. S., & Ødegaard, M. (2015). Formative assessment and teachers' sensitivity to student responses. International Journal of Science Education, 37(4), 629-654.

Herman, J., & Gribbons, B. (2001). Lessons learned in using data to support school inquiry and continuous improvement: Final report to the Stuart Foundation (CSE Technical Report 535). Los Angeles: UCLA Center for the Study of Evaluation.

Irons, A. (2007). Enhancing learning through formative assessment and feedback. Routledge.

Izci, K. (2013). Investigating high school chemistry teachers' perceptions, knowledge and practices of classroom assessment (Doctoral dissertation). University of Missouri, Columbia.

Knapp, M. S., Swinnerton, J. A., Copland, M. A., & Monpas-Huber, J. (2006). Data-informed leadership in education. Center for the Study of Teaching and Policy.

Kosecoff, J. B., & Klein, S. P. (1974, April). Instructional sensitivity statistics appropriate for objectives-based test items. Paper presented at the Annual Conference of the National Council on Measurement in Education, Chicago, IL.

Linn, R. L., Baker, E. L., & Dunbar, S. B. (1991). Complex, performance-based assessment: Expectations and validation criteria. Educational Researcher, 20(8), 15-21.

Mandinach, E. B., Honey, M., Light, D., Heinze, C., & Rivas, L. (2005, June). Creating an evaluation framework for data-driven decision-making. Paper presented at the National Educational Computing Conference, Philadelphia, PA.

McMillan, J. H. (2011). Classroom assessment: Principles and practice for effective standards-based instruction (5th ed.). Pearson.

Means, B., Chen, E., DeBarger, A., & Padilla, C. (2011). Teachers' ability to use data to inform instruction: Challenges and supports. Office of Planning, Evaluation and Policy Development, U.S. Department of Education.

Means, B., Padilla, C., & Gallagher, L. (2010). Use of education data at the local level: From accountability to instructional improvement. U.S. Department of Education.

Messick, S. (1989). Meaning and values in test validation: The science and ethics of assessment. Educational Researcher, 18(2), 5-11.

Moran, D. J., & Malott, R. W. (Eds.). (2004). Evidence-based educational methods. Elsevier.

No Child Left Behind (NCLB) Act of 2001, Pub. L. No. 107-110, § 115, Stat. 1425 (2002).

Novak, J. D. (1993). How do we learn our lessons? The Science Teacher, 60(3), 50-55.

Ogunkola, B. J., & Archer-Bradshaw, R. E. (2013). Teacher quality indicators as predictors of instructional assessment practices in science classrooms in secondary schools in Barbados. Research in Science Education, 43, 3-31.

Park, S., & Oliver, J. S. (2008). Revisiting the conceptualisation of pedagogical content knowledge (PCK): PCK as a conceptual tool to understand teachers as professionals. Research in Science Education, 38(3), 261-284.