Value Added?
UCLA IDEA

What makes an effective teacher?

A: How might we systematically assess teacher “effectiveness”? Is it accurate or fair to evaluate teachers based on student test scores? These questions lie at the heart of recent debates surrounding teacher quality and evaluation. Most centrally, these debates have focused on the appropriate use of “value added measures” (hereafter VAM) in judging teacher effectiveness and making decisions about teacher compensation, promotion, and dismissal.

VAM uses changes in student test scores to determine how much “value” an individual teacher has “added” to student growth during the school year. Some policymakers, school districts, and educational advocates have applauded VAM as a straightforward measure of teacher effectiveness: the better a teacher, the better students will perform on standardized tests. However, many prominent researchers and educators have expressed concern and urged caution.

Below, we present a series of questions and answers (as well as resources for more in-depth analysis) aimed at disentangling this complex issue. In particular, we are concerned that a narrow use of “value added” as the single measure of teacher effectiveness will have a detrimental effect on student learning, teacher retention, and educational equity. In other words, without more careful implementation and use, VAM could exacerbate the very problems it is alleged to help address.

A: VAM is a new statistical tool for quantifying teacher effectiveness on the basis of student gains on standardized tests. VAM compares students’ test scores at the beginning of the year with their results on a comparable test at the end of the year, thus isolating the “value added” by a particular teacher.[2] In theory, a teacher’s “value added” is the unique contribution she makes to her students’ academic progress.[3]

VAM marks an improvement over methods that evaluate teacher effectiveness based on average (“raw achievement”) scores. For example, comparing two teachers’ average student test scores to one another does not take into account where each group of students began. Teacher A may have a higher class average than Teacher B, but Teacher B’s students may have begun the year with much lower scores. Thus, Teacher B’s students may have actually made greater gains.

Proponents of VAM (including some equity-minded educational advocates) therefore point to its improved accuracy and fairness. Unlike previous approaches, VAM attempts to account for 1) where each group of students began and 2) the influence of external factors on student growth (greater family resources, instruction in previous grades, out-of-school support, etc.).[4] The teacher ends up with a score that is supposed to reflect her individual impact on student achievement.[5]

This is the potential appeal of VAM: evaluate teachers on the basis of how much academic growth their students experience over the course of the school year.[6] Use these evaluations to identify and reward “effective” teachers, and to dismiss those deemed “ineffective” or target them for professional development. VAM has also gained popularity for its relative statistical sophistication.[7]

But this is not just an academic exercise. Policy makers and educational leaders are increasingly talking about using VAM to make high-stakes decisions: decisions that will shape the quality of education students receive. To gain a clearer sense of these measures, including the potential unintended consequences of evaluating teachers based on student test scores, we offer a closer look at the methodology and practical implementation of VAM.
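The Teacher A and Teacher B comparison above can be made concrete with a small sketch. Everything in it is an assumption for illustration: the scores are invented, and the function names (`raw_average`, `mean_gain`, `simple_value_added`) are hypothetical. The "value added" figure here is only each teacher's average gain relative to the typical gain, a deliberate simplification; operational VAM relies on statistical models that also try to adjust for external factors.

```python
from statistics import mean

# Hypothetical (fall, spring) scale scores for each student.
# These numbers are invented for illustration, not taken from any study.
classes = {
    "Teacher A": [(70, 74), (75, 78), (80, 85)],
    "Teacher B": [(40, 50), (45, 57), (50, 58)],
}

def raw_average(scores):
    """Mean end-of-year score: the 'raw achievement' comparison."""
    return mean(spring for _, spring in scores)

def mean_gain(scores):
    """Mean fall-to-spring change for one class."""
    return mean(spring - fall for fall, spring in scores)

def simple_value_added(classes):
    """Each teacher's mean gain minus the mean gain across all students.

    A simplified stand-in for a value added score: it ignores every
    external factor that a real model would try to account for.
    """
    all_gains = [spring - fall
                 for scores in classes.values()
                 for fall, spring in scores]
    typical_gain = mean(all_gains)
    return {teacher: mean_gain(scores) - typical_gain
            for teacher, scores in classes.items()}

if __name__ == "__main__":
    for teacher, scores in classes.items():
        print(teacher, raw_average(scores), mean_gain(scores))
    print(simple_value_added(classes))
```

With these made-up numbers, Teacher A has the higher raw average (79 versus 55), yet Teacher B's students gain more (10 points versus 4), so the gain-based score favors Teacher B (+3) over Teacher A (-3).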
Does VAM provide a reliable and valid measure of teacher effectiveness?

A: Many researchers and statisticians argue that VAM does not provide a sufficiently reliable and valid measure of teacher effectiveness, particularly when used to make high-stakes personnel decisions.[8] Methodological problems with VAM include:

• The non-random sorting of teachers and students: VAM assumes that what teachers do in the classroom has a causal effect on student test scores. Under this assumption, increasing scores are a result of greater teacher effectiveness and decreasing scores are a result of teacher ineffectiveness. However, such causal interpretation requires random sorting.[9] VAM is most credible when students are randomly sorted into classes, and teachers are randomly assigned to those classes. Without random sorting, it is impossible to know whether rising or falling test scores can actually be attributed to the individual teacher.[10] Importantly, non-random sorting is often a deliberate practice (on the part of schools and parents) used to ensure that students are assigned to the classroom most likely to meet their learning needs.

• The instability of teachers’ scores: VAM scores are relatively unstable over time. In one study, a large percentage of the teachers who were identified as “most effective” one year were then identified as “least effective” the next year.[11] This is partially because the impact of a teacher simply cannot be separated from other influences (both inside and outside the school).[12] If test scores were an accurate measure of teacher effectiveness, one would expect much greater stability in teachers’ scores from year to year.[13]

• The difficulty of isolating teacher effects: Fundamentally, the impact of teachers cannot (and perhaps should not) be separated from external influences on student growth. There are many reasons why students score well on standardized tests. Certainly one reason is that their teacher effectively taught the material. But students also score well because they have access to learning opportunities outside their classroom. Even within the same classroom, students may not be getting the same educational experiences and supports:

    o Students are exposed to more adults than just the teacher at school, including other teachers, classroom aides, tutors, etc.[14]

    o Students attend after-school, summer, and weekend educational programs.

    o Students go home to families that provide different kinds of learning opportunities.[15]

There is, consequently, growing consensus that VAM is simply too unreliable to be used widely or to form the single basis for teacher evaluation. However, even if some of the methodological issues outlined above were to be addressed, there are additional reasons to be concerned about the consequences of VAM for student learning, teacher retention, and educational equity.
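The sorting and instability concerns above can be illustrated with a toy calculation. It is a sketch under loudly stated assumptions, not a model from any of the cited studies: both hypothetical teachers are assumed to contribute exactly the same growth (`TEACHER_EFFECT`), each student's remaining growth comes from invented out-of-school support values, and `naive_scores` is a made-up helper that compares each class's average gain with the typical class.

```python
from statistics import mean

# Assumption: both teachers add the same 5 points of growth per student.
TEACHER_EFFECT = 5

# Hypothetical points of growth each student gets from supports outside
# the classroom (tutoring, summer programs, family resources).
# Year 1: students with more outside support are sorted into Teacher C's class.
year1 = {
    "Teacher C": [4, 4, 3, 5],
    "Teacher D": [1, 0, 1, 2],
}

# Year 2: the same two teachers, but the class assignments are swapped.
year2 = {
    "Teacher C": year1["Teacher D"],
    "Teacher D": year1["Teacher C"],
}

def naive_scores(assignments):
    """Average class gain minus the typical class gain, per teacher.

    The observed gain mixes the teacher's effect with outside support,
    so this score cannot tell the two apart.
    """
    class_gain = {
        teacher: mean([TEACHER_EFFECT + support for support in supports])
        for teacher, supports in assignments.items()
    }
    typical = mean(list(class_gain.values()))
    return {teacher: gain - typical for teacher, gain in class_gain.items()}

if __name__ == "__main__":
    print("Year 1:", naive_scores(year1))
    print("Year 2:", naive_scores(year2))
```

Although the two teachers are identical by construction, Teacher C scores +1.5 and Teacher D scores -1.5 in the first year, and the scores reverse in the second year when the classes are swapped. The entire difference comes from which students were assigned to whom.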
What are the potential unintended consequences of VAM for educational equity?

A: Evaluating teachers based solely on student test scores prioritizes test preparation at the expense of more enriching and challenging curriculum. VAM assumes that gains in student test scores are synonymous with meaningful forms of learning. However, the tests used to determine teacher effectiveness often focus on “testable skills” rather than deep and broad conceptual understanding.

For example, whereas mathematical knowledge may be easier to assess on short answer or multiple choice tests, subjects such as history, civics, English literature, writing, and critical thinking require distinct forms of assessment.[16] Evaluating teachers based on student test scores creates incentives to diminish instruction in these areas.[17] A focus on “testable skills” also narrows the curriculum within the subjects most emphasized by recent policies: math and reading. In the domain of literacy, high-stakes tests often accompany scripted curricula that emphasize fluency and speed over reading comprehension.

What, then, does VAM value? While we do not dismiss the idea that information from tests can sometimes be useful, we are concerned that VAM directs curriculum and instruction towards lower-level skills. This is reflected in the widespread (and often encouraged) practice of “teaching to the test.” In addition to drilling students on test-type questions, teachers who gain familiarity with the test may focus on ‘likely-to-be-tested’ topics and organize learning in the format of common test questions.[18] Ultimately, the skills being tested offer a very limited representation of the kinds of thinking, knowledge, and practices we aim to cultivate in classrooms.[19]

Using student test scores as the single indicator of teacher effectiveness may exacerbate educational inequity. Under No Child Left Behind (NCLB), schools enrolling large numbers of low-income students and students of color have often focused on a narrow set of “testable skills” to avoid sanctions. As educational researcher Mike Rose writes, “You can prep kids for a standardized test, get a bump in scores, yet not be providing a very good education. The end result is the replication of a troubling pattern in American schooling: poor kids get an education of skills and routine, a lower-tier education, while students in more affluent districts get a robust course of study.”[20] Equity-oriented, high-quality teaching and learning must be defined as more than doing well on a narrow set of measures.

Further, linking teacher evaluation with test scores provides a disincentive for working with the most vulnerable populations of students. According to the Economic Policy Institute, “teachers have been found to receive lower ‘effectiveness’ scores when working with English language learners, special education students and low-income students than when they teach more affluent and educationally advantaged students.”[21] Thus, teachers may be further discouraged from working in the most high-need schools. Within schools and classrooms, students with greater or special educational needs may be perceived as ‘pulling down’ teachers’ VAM scores.[22] High-stakes accountability has already led some schools to pressure their most struggling students to transfer or drop out.[23]

Finally, a narrow use of VAM may have a detrimental effect on teacher collaboration and morale. As stated, VAM aims to isolate the contributions of individual teachers to student outcomes. If increasing test scores are linked to monetary rewards, teachers may be less likely to collaborate or coordinate efforts to support students across classrooms.[24] This potential trend stands in stark contrast to research that links high levels of teacher collaboration and peer learning with high levels of student achievement.[25]

Ultimately, basing professional evaluation on VAM is likely to result in the demoralization and attrition of teachers and possibly students. Teachers will face what is legitimately perceived as arbitrary and unfair forms of evaluation, without adequate attention to the conditions within which they work.[26] Rather than creating opportunities for teachers to hone their craft, VAM demands an even greater emphasis on raising student test scores. Thus, the narrowing of curriculum and instruction leads to the deskilling and devaluing of teachers.[27] This shift will hinder teachers’ ability to create intellectually rich contexts where all students have an opportunity to learn: the kind of education many joined the teaching force to help cultivate, and the kind of education students deserve.
Conclusion

For all these reasons, we believe equity-minded educational advocates ought to challenge the use of VAM as the single measure of teacher effectiveness, particularly in the context of high-stakes personnel decisions. As reflected in the Los Angeles Times’ (2010) recent publication of teachers’ scores, singling out individual teachers as “effective” or “ineffective” based on unreliable information is not a fair or useful strategy for improving teacher quality. Information is a good thing as long as we know exactly what that information is telling us, and how we can use it to better the educational experiences of all students.

VAM might be useful as one piece of a much larger plan for improving teacher quality and student learning. For example, rather than focusing on individual teachers, VAM could be a useful tool for school- or district-level assessment.[28] Focusing on school-level change and formative evaluation would help circumvent some of the threats to collaboration and equity mentioned above.

For VAM to help individual teachers reflect on and improve their practice, it must be part of a more comprehensive approach to evaluation. This approach ought to include:

• Well-analyzed test data: Overall value added scores do not tell us where to focus improvement efforts. Instead, we need to provide teachers with specific data about how particular groups of students perform on particular tasks. If a teacher can see that all third graders made mistakes in long division or that English Learners had particular difficulty with word problems, that teacher can take action to provide targeted assistance in these areas. Schools and districts can also take action to provide specific supports.

• Classroom observations: Provide teachers with quality feedback about their classroom practice. This includes offering specific suggestions about what to improve on and how to improve on it. This should take place in a low-stakes environment where teachers receive professional support to continue developing their practice.

• Professional development: Create high-quality professional development experiences where teachers can build their repertoire of skills, particularly in those areas that test and observation data have identified as needing improvement. This includes creating opportunities for teacher collaboration and peer learning.

• Comprehensive assessment of students: Using student portfolios and other formative assessments would address concerns that a narrow focus on standardized outcome measures can lead to “teaching to the test” or a narrowing of the curriculum as mentioned above.

Endnotes

[1] According to McCaffrey et al. (2004), the teacher’s contribution to student outcomes is defined as the difference between a student’s achievement in the teacher’s class and his/her predicted achievement with a teacher of “average” effectiveness. Also, see Daniel Willingham’s short video for a succinct explanation of VAM and Merit Pay: http://www.youtube.com/watch?v=uONqxysWEk8

[2] Corcoran, 2010, p. 4.

[3] As Corcoran explains, “If we assume that many of the external factors influencing a student’s fourth grade achievement are the same as those influencing her third grade achievement, then the change in the student’s score will cancel out these effects and reveal only the impact of changes since the third grade test, with the year of fourth grade instruction being the most obvious” (2010, p. 4).

[4] According to Braun (2005, p. 7), “that number, expressed in scale score points, may take on both positive and negative values. It describes how different that teacher’s performance is from the performance of the typical teacher, with respect to the average growth realized by the students in their classes.”

[5] Braun, 2005, p. 2.

[6] According to the Economic Policy Institute (EPI), “Value added approaches are a clear improvement over status test-score comparison (that simply compare the average student scores of one teacher to the average student scores of another); over change measures (that simply compare the average student scores of a teacher in one year to her average student scores in the previous year); and over growth measures (that simply compare the average student scores of a teacher in one year to the same students’ scores when they were in an earlier grade the previous year)…Although value added approaches improve over these other methods, the claim that they can ‘level the playing field’ and provide reliable, valid, and fair comparisons of individual teachers is overstated” (2010, p. 9).

[7] Economic Policy Institute (EPI) (2010), p. 2.

[8] Random sorting is similar to experiments that designate a “control” group and a “variable” group, with the goal of identifying the unique effects of a particular variable (in this case, the individual teacher).

[9] Braun (2005). As economist Jesse Rothstein (2009) argues, in order for Value Added Measures to be of use, “they must reflect teachers’ causal effects on the student outcomes of interest, not preexisting differences among students for which the teacher cannot be given credit or blame.”

[10] Berry (2010) and Sass (2008).

[11] Amrein-Beardsley (2008) and McCaffrey et al., 2004 (RAND). This is also due to the problem of missing data.

[12] As educational researcher Wayne Au (2011) writes, “The year-to-year instability that Sass [2008] highlights shows that test scores have very little to do with the effectiveness of a single teacher and have more to do with the change of students from year to year (unless, of course, one believes that one-third of the highest ranked teachers in the first year of the study simply decided to teach poorly in the second).”

[13] Sometimes termed the “spill-over effect,” this is an especially important factor to consider in middle and high school, where students’ learning and growth in distinct subjects and classrooms may be (and ought to be) mutually influential. For example, learning how to develop an argument in the context of history or social studies may positively influence a student’s development in English. Or, practice with problem solving in one content area might fruitfully support students’ learning in another.

[14] EPI, 2010, p. 9.

[15] As Sean Corcoran of the Annenberg Institute for School Reform argues, “it makes little educational sense to force such skills to conform to such a structure purely for value added assessment” (2010, p. 14).

[16] EPI, 2010, p. 16.

[17] EPI, 2010, p. 17; Corcoran, 2010; McCaffrey et al., 2004 (RAND). For example, teachers who do try to teach the full curriculum (or who might be focused on preparing their students for the type of work they will encounter in future grades) may find their students not gaining as much as others, whose teachers resort to some form of teaching to the test (Braun, 2005, p. 16).

[18] An increasingly narrow focus on testing may also contribute to student disengagement and teacher demoralization. As one teacher states, “Children have not stopped doing what children do but teachers don’t have time to deal with it. They don’t have time to talk to their class, and help the children figure out how to resolve things without violence. Teachable moments to help the schools and children function are gone” (EPI, 2010, p. 19).

[19] Rose (in press, Dissent).

[20] EPI (2010), p. 3. “Other human service sectors, public and private, have also experimented with rewarding professional employees by simple measures of performance, with comparably unfortunate results. In both the United States and Great Britain, governments have attempted to rank cardiac surgeons by their patients’ survival rates, only to find that they had created incentives for surgeons to turn away the sickest patients” (p. 7).

[21] EPI (2010), p. 16.

[22] Hinchey (2010), p. 1.

[23] EPI (2010), p. 18.

[24] MetLife Foundation (2009); Jackson, C.K. & Bruegmann, E. (2009). As Barnett Berry of the Center for Teaching Quality reports, “Over 90 percent of the nation’s teachers report that their colleagues contribute to their teaching effectiveness. New teachers, in particular, were more likely to strongly agree that their success in the classroom hinged on the effectiveness of others” (2010, p. 5).

[25] As educational researcher Mary Kennedy writes, “We measure and track their value added test scores but we do not measure their teaching loads, planning time, student absences, proportion of difficult-to-teach or resistant students, frequency of outside interruptions, access to textbooks or equipment of good quality, or whether their instructional materials arrived before the school year began” (2010, p. 596).

[26] Describing the experience of one veteran teacher, Rose writes, “The school’s test scores were not adequate last year, so the principal, under immense pressure, mandated a ‘scripted’ curriculum, that is, a regimented curriculum focused on basic math and literacy skills followed by all teachers. The principal also directed the teachers not to change or augment this curriculum. So Priscilla cannot draw upon her cabinets full of materials collected over the years to enliven, extend, or individualize instruction. (Though like any experienced teacher, she figures out ways to use what she can when she can.) The teachers have also been directed by the principal to increase the time spent on the literacy and math curriculum and trim back science and social studies. Art and music have been cut entirely. ‘There is no joy here,’ she told me, ‘only admonition.’” (Rose, in press, 2011)

[27] Garcia (2010), http://educationadvocacy.wordpress.com/2010/09/09/all-eyes-in-education-on-los-angeles-monica-garcia/

[28] This touches on one of the central criticisms of VAM: “When teachers receive data based on once-a-year standardized tests, they rarely are informed of why they are or are not effective in teaching their students. They simply have raw scores, absent any deeper analytics that can help them improve their classroom teaching practices” (Berry, 2010, p. 4).