DOCUMENT RESUME

ED 452 219    TM 032 504

AUTHOR          Walker, Sherry Freeland, Ed.
TITLE           High-Stakes Testing: Too Much? Too Soon?
INSTITUTION     Education Commission of the States, Denver, CO.
ISSN            ISSN-0736-7511
PUB DATE        2000-00-00
NOTE            22p.; Theme Issue. Published three times a year.
AVAILABLE FROM  ECS Distribution Center, Education Commission of the States, 707 17th Street, Suite 2700, Denver, CO 80202-3427 (Annual subscription, $20). Tel: 303-299-3600; For full text: http://www.ecs.org.
PUB TYPE        Collected Works - Serials (022)
JOURNAL CIT     State Education Leader; v18 n1 Win 2000
EDRS PRICE      MF01/PC01 Plus Postage.
DESCRIPTORS     *Accountability; *Educational Testing; Elementary Secondary Education; *High Stakes Tests; Scores; *State Programs; Test Results; *Test Use; Testing Programs

ABSTRACT
This theme issue focuses on the use and consequences of high stakes tests. The lead article, "High-Stakes Testing: Too Much? Too Soon?" by Sherry Freeland Walker, introduces the topic and related issues, outlining the pros and cons of high stakes testing by the states. The problem, some experts say, is that states have tried to do too much too soon without the proper preparation and support for everyone involved. "The History of Testing," by Sherry Freeland Walker, traces the growth of high stakes testing through the last half century and in the present context of the standards movement. "High-Stakes Assessments Bring Out the Critics," by Jennifer Dounay, discusses a number of criticisms of high stakes testing and some responses from the public. "Why Is 'Teaching the Test' a Bad Thing?" by Lorrie Shepard, explores the issues of test score inflation, curriculum distortion, and safeguards against political pressures in testing. "How States Are Responding to Low-Performing Schools," by Katy Anthes, Susie Saavedra, Judie Mathers, and Jane Armstrong, describes the interventions states with high stakes accountability systems are using with low performing schools.
The effects of high stakes tests on teacher education are outlined in "High-Stakes Testing Pressures Teacher Education" by Michael Allen. Other articles in this issue are: (1) "Maryland Moves toward Intervention" (Mary Fulton); (2) "Texas Test Withstanding Court Scrutiny" (Jill Weitz); (3) "Why Do We Need High-Stakes Assessments?" (Michael Sentance); (4) "Poor Test Results Lead to Math Consortium"; and (5) "Performance Management, Not Just Accountability" (Peter Robertson). (SLD)

Reproductions supplied by EDRS are the best that can be made from the original document.

State Education Leader
Volume 18, Number 1, Winter 2000
High-Stakes Testing: Too Much? Too Soon?
Education Commission of the States

U.S. DEPARTMENT OF EDUCATION
Office of Educational Research and Improvement
EDUCATIONAL RESOURCES INFORMATION CENTER (ERIC)
This document has been reproduced as received from the person or organization originating it.
Minor changes have been made to improve reproduction quality.
Points of view or opinions stated in this document do not necessarily represent official OERI position or policy.

PERMISSION TO REPRODUCE AND DISSEMINATE THIS MATERIAL HAS BEEN GRANTED BY [signature illegible]
TO THE EDUCATIONAL RESOURCES INFORMATION CENTER (ERIC)

The mission of the Education Commission of the States (ECS) is to help state leaders identify, develop and implement public policy for education that addresses current and future needs of a learning society.

In this issue
- High-stakes assessments bring out the critics.
- Is "teaching the test" a bad thing?
- How states are responding to low-performing schools.
- High-stakes testing pressures teacher education.

HIGH-STAKES TESTING: TOO MUCH? TOO SOON?
by Sherry Freeland Walker

Pick up the newspaper and you're likely to see an article about state assessments that carry big consequences. Parents and students in one state protest the test, while policymakers in another laud or bemoan the outcomes of their state's latest assessment. Across the nation, state and district leaders are putting more emphasis on testing and using test results to make more decisions about students and schools. Will this student be promoted to the next grade? Will that one graduate from high school? Should this school be reconstituted?

Using assessment tests for such "high-stakes" purposes is gaining public support as a way to determine how good a job public schools are doing. Policymakers see them as a way to raise standards and achievement and hold students and educators accountable. But as support grows on one hand, so does opposition on the other. Are high-stakes tests worthwhile? Or is the controversy around them likely to derail the standards movement?

Lagging skills

No one disputes that too many American students are not gaining the knowledge and skills they need to succeed in college and the workforce. Only about one-third are proficient in reading and fewer still in math, according to National Assessment of Educational Progress scores. Even the most advanced U.S. students lag behind their peers in other countries on the Third International Mathematics and Science Study. And public opinion shows Americans increasingly critical of public schools overall.

Almost every state has set content standards for what students should know, and measuring whether students are meeting those standards is a natural outgrowth. To date, 17 states, the District of Columbia and Puerto Rico have policies that base promotion or retention on a student's score on a state and/or district assessment (see page 11). Twenty-seven states have high school exit exams (though not all are tied to graduation or test beyond 9th-grade skills). Polls consistently show public support for standardized testing.

Pros and cons

Proponents of high-stakes testing argue that it leads to achievement and other gains:
- Students know what is expected and that the test really counts, so they work harder.
- Schools identify and can address student weaknesses early.
- Similarly, schools discover areas of overall weakness, prompting them to refocus resources where they are most needed.
- Education across the state is more consistent, eliminating situations where schools in some districts are superior to others.
- The public sees gains from year to year and regains confidence in public schools.

Critics say the tests sometimes are too hard, lead teachers to teach to the test, take time away from instruction, and are expensive. Teachers say they're unprepared to teach to the standards, and students claim they're being tested unfairly, on content they haven't yet had. Some parents and students are calling for an end to high-stakes testing, and some policymakers are reexamining plans to tie tests to key decisions such as graduation or to make high-stakes tests the central part of an accountability system (see pages 4-6).

Too much, too soon?

The problem, some experts say, is that states have tried to do too much too soon without the proper preparation and support for everyone involved.

"Teachers and principals simply do not know how to do what they are expected to do with the new standards," said Richard F. Elmore, Harvard School of Education professor, at a recent Washington, D.C., conference.

While some policymakers are rethinking assessments, others say the low scores are just an indication of the work that needs to be done. "When we fired this missile," Todd Bankofier of the Arizona Board of Education said, "we knew we had to guide it. It's going to take some left turns and some right turns, but it would be wrong to turn it completely back."

"Doing away with the tests or the consequences is the easy way out," Robert Schwartz and Matthew Gandal wrote in the January 19, 2000, issue of Education Week. "It allows us to avoid the hard work of improving instruction and restructuring the use of time and resources so that all students are given the time and support needed to meet standards."

Confronting the dilemma

Jay P. Heubert and Robert M. Hauser of the National Research Council's Committee on Appropriate Test Use recommend in High-Stakes Testing for Tracking, Promotion and Graduation that policymakers keep the following principles of appropriate test use in mind:
- Use the right test. Tests are valid only when used for the specific purpose for which they were designed.
- Remember tests are not perfect. Questions are but a sample of possible questions that could be asked in a given area.
- Don't use a test as the sole determinant of a major decision. Promotion and graduation decisions should be based on many factors.
- Don't justify bad decisions with a test score or any other kind of information. Tests will not lead to better outcomes if districts and schools lack the services to help students who don't come up to standard.

The answer to who's right, the critics or the supporters, seems to be both. If the right test is used in the right way, in conjunction with other measurements, it can be an effective way to assess student learning. Without attention to factors such as discrimination, curriculum and accuracy, however, it can be detrimental to both students and schools alike.

This issue of State Education Leader looks at the controversy around high-stakes testing.

Freeland Walker is ECS publications director.

Education Commission of the States, STATE EDUCATION LEADER, VOL. 18, NO. 1, WINTER 2000

THE HISTORY OF TESTING
by Sherry Freeland Walker

Testing is big news these days, and the stakes are getting higher and higher. As business and the public put more pressure on public schools and students to achieve at higher levels, the use of testing is expanding rapidly.

Throughout the last century, the uses of standardized testing, and the reasons for using it, have grown considerably. As Gary Natriello of Columbia University's Teachers College and Aaron M. Pallas of Michigan State University say, "formal testing has become the kudzu of modern American society, a healthy vigorous grower penetrating all available space."

Half a century

Standardized testing has been a feature of public schools for half a century, initially serving largely to compare schools and students against a standard set by testing companies. Another use was to "sort" students, such as identifying those considered fit for higher education versus those who would be better suited to vocational school.

The 1970s saw an eruption of interest in "minimum competency testing." Then, as now, say Robert Linn and Joan Herman of the National Center for Research on Evaluation, Standards and Student Testing, "reformers sought to improve education by holding educators and students accountable for achieving standards of performance, using tests for high school graduation and/or grade-to-grade promotion."

By the early 1980s, nearly three-fourths of the states had minimum competency testing requirements. Most took the form of multiple-choice items that students either passed or failed and primarily pinpointed gains at the low end of the spectrum. The tests did little if anything to measure how much students were learning or how advanced their skills were.

Standards movement

Growing criticism of public schools led policymakers and educators to turn toward testing to measure higher skills and to gain support for raising standards. The late 1980s saw the rise of assessment tied to accountability for student and school performance, although states were relying heavily on nationally published standardized tests, rather than assessments geared to individual state standards.

The early days of test results tied to accountability, however, were criticized as showing an inflated pattern of scores. Because the tests suddenly had high stakes, teachers were teaching to the test, critics said. They based their reasoning largely on the fact that gains on the National Assessment of Educational Progress tests were not as high as scores on other assessments.

While the current wave of education reform continues to emphasize accountability, it is more tied to the setting and implementing of state standards, both content (what students should know) and performance (how well they are able to do it). States are aligning assessments to their standards and demanding much more from students than they have previously.

Freeland Walker is ECS publications director.

HIGH-STAKES ASSESSMENTS BRING OUT THE CRITICS
by Jennifer Dounay

For many people, both in and outside the education policymaking field, the concept of assessing students on their knowledge and skills seems a perfectly innocuous proposition. After all, why shouldn't pupils be held accountable for learning what they have been taught during a given school year or by a certain milestone in their school careers?

This proposition, however, is not as simple as it may appear. As the assessment stakes have increased for both students and schools, various "stress points" in the system are causing some students, parents and others to question the validity of assessment and accountability systems.

Too much pressure

Parents in some states are asserting that some high-stakes tests place undue pressure on young children. Stories of increasing numbers of children suffering from sleep disorders and other stress-related maladies have appeared in the press in the past few years.

Districts across the nation have offered Saturday and summer tutorial classes to give children extra time to work on skills that may be tested. The Hartford, Connecticut, schools offered classes during the 1999 spring break to help 3rd, 5th and 7th graders prepare for the Connecticut Mastery Test scheduled for the fall. (To the district's credit, scores did improve significantly.)

Kaplan, known for its SAT and ACT preparation books, has released books to help students and parents of young children prepare for standardized tests in Florida, New York, Texas and Massachusetts.

"Dumbing down" of the curriculum

Another criticism is that the curriculum may be "dumbed down" as a result of state-mandated testing. Some people fear rote memorization may be stressed rather than problem-solving skills and that teachers will focus on subject areas or facts most likely to appear on assessments, rather than more complex skills, such as critical thinking.

There also is widespread concern that subjects not tested (for instance, fine arts or physical education) will be accorded less class time or set aside altogether, as some elementary schools have done with recess, to spend more time on academics.

Critics also argue that too much time is taken away from instruction when students are coached on testing techniques and then spend hours taking the tests.

Score discrepancies

Parents, as well as the general public, also doubt the integrity of a state assessment when scores do not match their children's grades or achievement measured by other tests. Numerous media articles have profiled students with "A" or "B" averages who attain low scores on state assessments or fail to pass high school exit examinations. Parents and students wonder whether grades are inflated or if the bar on the state assessments has been set unreasonably high. Parents in affluent areas of New York such as Rye, Great Neck and Mamaroneck were shocked, for example, when, according to a November 1999 New York Times article, one in five of their children failed the state's new 8th-grade math assessments.

State issues

Discrepancies between indicators of student achievement have shown up at the state and district levels as well. For example, Virginia began in 1998 to assess 3rd, 5th and 8th graders as well as high schoolers on the state's Standards of Learning (SOLs) in English, history/social sciences, mathematics and science. Starting in the 2006-07 academic year, only schools whose pass rates meet or exceed 70% in the four subject areas will be eligible for accreditation, with the exception of 3rd-grade science and history, whose minimum pass rate for accreditation will be 50%.

Results of the spring 1999 tests reveal much work to be done: only 6.5% of Virginia schools met the pass-rate standard in all four of the subjects. In Fairfax County, where students posted an average SAT score of 1095 in 1998 (versus a national average of 1005) and where 91% of students continue to postsecondary education, only 54% passed the SOLs in 1998.

Because of these discrepancies, Virginia has taken measures to evaluate the fairness of the SOLs assessments. In February 1999, testing experts from three universities declared the SOLs valid and reliable. And a new SOLs Test Technical Advisory Committee has been commissioned to report annually on the assessments' validity and reliability and propose suggestions and recommendations for future changes.

Massachusetts' assessment results likewise have raised eyebrows in that state. The Massachusetts Comprehensive Assessment System (MCAS) tests 4th, 8th and 10th graders in English language arts, math and science and technology. In September 1999, the State Board of Education voted to rate schools in two-year cycles based on their students' performance. Schools that do poorly must submit improvement goals to the state which, if unmet in two years, will open the schools to state takeover.

Individual students likewise will feel the effect of the MCAS. The class of 2003 will be the first whose high school graduation will depend upon students' scoring at the proficient or advanced level on all of the 10th-grade tests. As in Virginia, scores so far have been low. In 1999, only 34% of students reached those levels in English language arts, 24% in mathematics, and 24% in science and technology.

Minority discrimination

Some test critics point out that students from predominantly white and middle- to upper-class districts score the highest on high-stakes and other assessments. An analysis of the 1998 MCAS tests, conducted by the Gaston Institute for Latino Community Development at the University of Massachusetts-Boston, found that cities with the highest proportions of Hispanic test takers fared worst on the 10th-grade math assessments, with failure rates nearly as high for African-American students. While the statewide average failure rate for students of all races on this assessment was 52%, it was 83% for Hispanic students and 80% for African-American students.

Testing programs in other states have turned up similar gaps in minority achievement, although Texas' system, the Texas Assessment of Academic Skills (TAAS), recently survived a legal challenge that claimed the high school exit exam discriminates against Hispanics and blacks (see page 10 for more). While recognizing the differences in passage rates among blacks (60%), Hispanics (64%) and whites (86%) in the spring 1999 administration, U.S. District Judge Ed Prado wrote:

"The evidence suggests that the State of Texas was aware of probable disparities and that it designed the TAAS accountability system to reflect an insistence on standards and educational policies that are uniform from school to school."

Mistakes and cheating

High-visibility examples of security breaches, teacher and administrator cheating, and mistakes made by testing companies also have shaken the public's confidence in assessment systems.

Essay questions for Ohio's 4th- and 8th-grade writing assessments had to be rewritten after a paper quoted students discussing the essay questions before some schools in the state had administered them. Rhode Island postponed administering mathematics and English assessments for 4th, 8th and 10th graders last year after widespread security breaches were discovered.

Test-tampering cases in Houston, Austin and eight other Texas districts may have been the impetus for the September 1999 creation of the state's Public Education Integrity Task Force. In New York City, 52 teachers and administrators were named in a December 1999 report for helping students improve their test scores by a variety of means.

Mistakes in scoring also have occurred. Writing assessments for 4th, 7th and 10th graders in Washington State were rescored by hand and subsequently released two months behind schedule after scoring mistakes were discovered in summer 1999. In September 1999, testing company CTB/McGraw Hill informed officials in Indiana, North Carolina, South Carolina, Wisconsin and New York City that their tests may have been scored incorrectly. Ramifications of the blunder were especially strong in New York City, where more than 8,600 students were erroneously placed in summer school as a result of "low" test scores.

Backlash

Such cases of confusion, potential unfairness and frustration have led to public outcry against tests in some locales and responses from decisionmakers. The results of the math portion of Arizona's new assessment instrument, which members of the class of 2002 must pass to graduate, revealed that 0% of the 44,245 students who took the test exceeded the standard in math and only 11% met the standard.

In response to cries from parents, students and educators across the state that the test is too difficult, the state board agreed to reexamine the scoring levels. Likewise, the Virginia state board has indicated it is open to discussion of changing the history portion of the SOLs, on which significantly fewer students attain the proficient level than in other subjects that the state tests.

Isolated instances of civil disobedience as well as organized resistance to high-stakes assessments have appeared in several states. Students in some Massachusetts cities sat out the spring 1999 administration of the MCAS. A teacher in Harwich refused to give his students the 8th-grade history test after noticing that some questions dealt with the Civil War, which students had not yet studied. Groups such as the Coalition for Authentic Reform in Education and Cambridge Parents Against the MCAS have been established. Parents in several cities, including Boston, "have encouraged their children to boycott the test or have taken them out of the public schools," according to an October 31, 1999, Boston Globe article.

Likewise, the Christian Science Monitor reported that "in certain Detroit suburbs, particularly Birmingham, Troy and Farmington, protesting parents have refused to allow their children to take the state test. In some towns, fewer than 15% of students participated in state testing, a number so small as to render any results meaningless." The same article notes that students intentionally have failed tests or refused to take them in California, Wisconsin and Illinois as well.

What's next?

What's a policymaker to do? After all, testing experts themselves caution that when higher standards and new assessments are implemented, scores will reflect the greater challenges placed upon students and the teachers who must prepare them.

There are no simple solutions. Policymakers, however, must be cautious to avoid alienating their constituencies or dismissing parents' concerns. Above all, they must remember that, while scores may reflect improvements in schools or the tests themselves, the final goal of states' standards and assessment systems is not necessarily the race for ever-higher scores but the race for students' solid preparation for the workplace or postsecondary education.

Dounay is an ECS research associate.

WHY IS "TEACHING THE TEST" A BAD THING?
by Lorrie Shepard

According to a recent survey reported by Education Week, testing is the number one accountability tool, adopted in 48 of 50 states. Test results are intended to focus attention on raising student achievement. Yet, critics complain that the emphasis on testing leads to problems of "teaching the test." What is meant by that, and why is it a bad thing?

Typically, teaching the test means devoting extended time to subject areas that are tested, such as reading and math, to the exclusion of other subjects. Test format becomes a template for how tested subjects are taught. Worksheets and practice assessments mirror the anticipated accountability tests as much as possible. A recent study in Texas, for example, found that teachers in urban schools were required to use test-prep materials from September through March, when the Texas Assessment of Academic Skills test was given.

Test-score inflation

When tests are developed initially, they are designed to reflect curriculum frameworks or content standards. Particular test questions are intended only to be samples of the full curriculum. How students do on the test is supposed to show how well they have mastered that curriculum. But if students practice only questions that imitate the test, test performance may no longer "generalize" to the intended curriculum content. In fact, controlled studies have shown that students may not be able to answer the same questions if asked even in slightly different ways.

In one classic experimental study, all students were taught to translate from Roman to Arabic numerals. The group tested in the same order did well, but when the other group was asked to translate in reverse, from Arabic to Roman numerals, the drop-off in performance was startling. Students lost from 35 to 50 percentile points, showing they never understood how the number system really works.

Curriculum distortion

The negative effects of teaching the test on student learning are the flip side of test-score inflation. In a nationwide survey for the National Science Foundation, the majority of teachers acknowledged shifting instructional emphasis from nontested to tested topics and, at the same time, reported negative impacts of mandated testing on curriculum and learning. Although critics originally feared that testing would take instructional time away from "frills," such as art and citizenship, research shows that untested subjects such as social studies and science have been relegated to Friday afternoons or even eliminated.

Even in tested subjects, instruction is focused only on skills covered by the test. In a study by Mary Lee Smith, elementary teachers had given up reading real books, writing and long-term projects and were focusing on word recognition, recognition of spelling errors, language usage, punctuation and arithmetic operations.

Unfortunately, a test-driven curriculum encourages teaching of skills in isolation, which may deny students the very activities that might have made the problems understandable and useful. Practicing only test-like formats also elicits different cognitive processes than working with more extended and challenging curricular materials. For example, students are asked to read artificially short passages and search for answers to formulaic questions. They practice finding mistakes rather than doing significant writing on their own, and they learn to guess by eliminating wrong answers.

Safeguards

Developing new forms of the test each year is one limited safeguard that prevents practicing on specific test items. In addition, the movement toward performance assessments is aimed at correcting the distorting effects of multiple-choice test formats. The more that extended tasks on tests reflect the actual kinds of written expression, problem solving and applications of knowledge that are intended in the curriculum, the less likely it is that teaching to the test will distort either learning or test-score gains.

The content of a test alone, however, cannot be sufficient safeguard against political pressures. Ultimately, the best remedies are (1) to put less weight on a single indicator when judging the quality of schools and (2) to acknowledge accurately that the responsibility for student achievement is shared among students, parents, teachers, school administrators, community leaders and policymakers.

Shepard is professor of education, University of Colorado at Boulder.

STATE EDUCATION LEADER
Published three times a year. Annual subscriptions are $20.