Education Policy Brief

The Limits and Possibilities of International Large-Scale Assessments

David J. Rutkowski and Ellen L. Prusinski

VOLUME 9, NUMBER 2, SPRING 2011

CONTENTS

Introduction
Who designs and administers these assessments?
What do these assessments measure?
Who participates in the assessments and how are they organized?
What do I need to be aware of when I look at assessment results?
Why should the U.S. participate in these assessments?
Conclusion and Additional Resources
References

INTRODUCTION

International large-scale assessments have a long history of influencing educational policy around the world. In the United States, these assessments are often cited by policymakers and media commentators as indicators of how the American educational system is falling behind its international peers. Following the release of the 2009 Programme for International Student Assessment (PISA) results, for example, Secretary of Education Arne Duncan argued, “PISA results, to be brutally honest, show that a host of developed nations are out-educating us” (Duncan, 2010). Although international assessments are certainly intended to help countries understand how their education systems compare with those of their peers, they are also intended to provide far more information than a ranking.

The staff of the Center for Evaluation & Education Policy (CEEP) at Indiana University is often asked how international large-scale assessments influence U.S. educational policy. This policy brief is designed to answer some of the questions CEEP researchers encounter most frequently concerning the three most popular international large-scale education assessments: the Programme for International Student Assessment (PISA); the Trends in International Mathematics and Science Study (TIMSS); and the Progress in International Reading Literacy Study (PIRLS). These questions include:

• Who designs and administers these assessments?
• What do these assessments measure?
• Who participates in the assessments and how are they organized?
• What do I need to be aware of when I look at assessment results?
• Why should the U.S. participate in these assessments?

This brief will focus, in particular, on how these three international assessments function and how they are relevant for American education. Finally, for those interested in learning more about international large-scale assessments, this brief also offers suggestions for further reading.

WHO DESIGNS AND ADMINISTERS THESE ASSESSMENTS?

PISA is coordinated by the Organization for Economic Cooperation and Development (OECD), while TIMSS and PIRLS are both administered by the International Association for the Evaluation of Educational Achievement (IEA). Participating countries work together with these organizations to design the assessments. Each country is then responsible for administering the assessment, and results are sent to the coordinating organization for data verification, processing, and scoring.

OECD

The mission of the OECD is to promote policies that will improve the economic and social well-being of people around the world. Established in 1961, the OECD is headquartered in Paris, France, and has 34 member countries that represent the world’s most industrialized nations. The OECD analyzes and compares data, sets international standards, and recommends policies to governments around the globe.

IEA

The IEA is an independent, international cooperative of national research institutions and government research agencies that aims to provide high-quality data capable of increasing policymakers’ understanding of key factors influencing teaching and learning. Since its founding in 1958, the IEA has conducted more than 23 research studies on cross-national achievement. The organization is headquartered in the Netherlands with study centers in Germany and the United States. The IEA is grounded in the belief that the diversity of educational philosophies, models, and approaches that characterize the world’s education systems constitutes a natural laboratory in which countries can learn from one another.

WHAT DO THESE ASSESSMENTS MEASURE?

TIMSS, PISA, and PIRLS are similar in many ways; however, they each collect unique information from different populations (see Table 1). One important distinction between the IEA studies (TIMSS and PIRLS) and the OECD study (PISA) is that the IEA studies are grade based, whereas PISA samples 15-year-olds (age based), regardless of how many years of schooling students have received. Each approach has its own advantages. For example, a grade-based sample ensures that all students who take the assessment have had a similar amount of schooling, while an age-based sample can better focus on all skills attained through the first 15 years of life. As a result of sampling and other differences, the assessments should not be viewed as interchangeable, and the data provided by each assessment ought to be examined separately.

TABLE 1. Overview of PISA, TIMSS, and PIRLS

Question | PISA | TIMSS | PIRLS
How often are tests conducted? | Every 3 years | Every 4 years | Every 5 years
Who is tested? | 15-year-old students | 4th and 8th grade students | 4th grade students
What is tested? | Ability to apply skills and competencies to “real-world” contexts | Math and science curricula | Reading curriculum
Who sponsors the test? | OECD | IEA | IEA
How many systems participated in the last cycle? | 65 | 60 | 55

TIMSS

TIMSS provides data about trends in mathematics and science achievement of students in the fourth and eighth grades. The content assessed in TIMSS is based on an internationally agreed upon common curriculum in math and science. TIMSS collects detailed information not only about student achievement in math and science, but also about teacher preparation, resource availability, and the use of technology.

PISA

PISA is an assessment of 15-year-old students that tests content knowledge but is not limited to school-based curricula. Instead, PISA assesses applied knowledge and literacy and emphasizes assessment of the functional skills students acquired during their schooling. The guiding question asked by PISA is “How well can students nearing the end of compulsory schooling apply their knowledge to real-life situations?” The three subject areas tested on PISA are reading literacy, mathematics literacy, and science literacy, but PISA also includes measures of general competencies such as learning strategies.

PIRLS

PIRLS collects data to provide information on trends in reading literacy achievement of fourth-grade students. PIRLS includes an array of questions that investigate the experiences young children have at home and in school in learning to read. The assessment is offered to fourth-grade students because fourth grade represents an important transition point in children’s development as readers. In many countries, children are expected to have learned to read by fourth grade and are beginning to transition from learning to read to reading to learn. Because new countries participate in PIRLS each cycle, PIRLS also provides baseline data for new participating countries. In addition, PIRLS collects an array of information about the reading curriculum and instruction in each participating country.

WHO PARTICIPATES IN THE ASSESSMENTS AND HOW ARE THEY ORGANIZED?

There is great diversity among the countries and economies that participate in the assessments, including diversity in economic development, geographical location, language, and size. In fact, it is important to recognize that it is not only countries, but also jurisdictions within countries that participate in these assessments. Thus, the U.S. is compared not only with other countries, but also with provinces, cities, and linguistic communities. For example, in 2009, Shanghai, China, participated for the first time in PISA. However, to date, China has not fully participated in any of these three assessments. Therefore, care should be taken when comparing one city in China to an entire national educational system like that of the United States.

PIRLS, PISA, and TIMSS are administered to a sample of students so that the results can be generalized to the larger population. At the same time, it is important to recognize that each assessment defines the population to which it is generalizing (and thus from which the sample is drawn) differently.

• TIMSS: Grades 4 and 8 students (Grade-based sample)
• PIRLS: Grade 4 students (Grade-based sample)
• PISA: 15-year-old students (Age-based sample)

To ensure that the samples selected by each country are representative of the population as a whole, samples for each country are verified by an international sampling referee.

Although the specific development process for each assessment is different, each assessment is developed with concern for cross-national comparability and validity. Questions for each assessment are developed through a collaborative, international process that involves international subject area experts, national country representatives, and specialized translators. Before the assessment is administered, a field test is conducted to ensure that the assessment has low bias. The following is a brief explanation of how each assessment is administered.

TIMSS: Administered every four years, TIMSS began in 1995. The next round of TIMSS is currently taking place (in 2011), with over 60 participating systems. The data are scheduled to be released at the end of 2012.

PISA: PISA was first administered in 2000 and is offered every three years. In 2009, the most recent round of testing, 65 countries and other education systems (including all 34 OECD countries, 26 non-OECD countries, and 5 non-national education systems) participated in PISA. The data for the next round of PISA will be made available in 2012.

PIRLS: PIRLS was first administered in 2001 and is offered every five years. The next round of PIRLS is currently taking place (in 2011), with over 55 participating systems. The data are scheduled to be released at the end of 2012.

WHAT DO I NEED TO BE AWARE OF WHEN I LOOK AT ASSESSMENT RESULTS?

TIMSS, PISA, and PIRLS all provide data that can be used to compare countries to one another. Although each country has an aggregate achievement score on the assessment, the ranking of countries does not always indicate significant differences in achievement. For example, on the PISA 2009 assessment, the U.S. average score of 500 was not significantly different from the OECD average score of 493. Although the U.S. was ranked 14th overall on reading literacy on PISA 2009, only six OECD countries, including Korea, Finland, and Canada, had significantly higher average scores (see Table 2). After accounting for standard errors, 14 OECD countries, including Germany, Norway, and Poland, had results that were not measurably different from those of the U.S. Thus, although assessment results are comparable cross-nationally, it is difficult to assign countries a precise rank. Rather, scores provide countries with information on where they generally rank compared to others.

TABLE 2. PISA Reading Scores (Combined Reading Literacy Scale)

Country or average | Score
OECD Average | 493
OECD Countries |
Korea, Republic of | 539
Finland | 536
Canada | 524
New Zealand | 521
Japan | 520
Australia | 515
Netherlands | 508
Belgium | 506
Norway | 503
Estonia | 501
Switzerland | 501
Poland | 500
Iceland | 500
United States | 500
Sweden | 497
Germany | 497
Ireland | 496
France | 496
Denmark | 495
United Kingdom | 494
Hungary | 494
Portugal | 489
Italy | 486
Slovenia | 483
Greece | 483
Spain | 481
Czech Republic | 478
Slovak Republic | 477
Israel | 474
Luxembourg | 472
Austria | 470
Turkey | 454
Chile | 449
Mexico | 425

Note: In the original table, shading indicates whether each average is higher than the U.S. average, not measurably different from the U.S. average, or lower than the U.S. average.

Additionally, because of the complexity of the study design, scores cannot be compared at the individual level. In other words, it is never appropriate to compare one student to another student using these assessments. One reason is the volume of information covered by each assessment. Because each assessment aims to cover a large amount of information, no student takes the entire test. This does not mean that results are not valid or reliable at aggregated levels. Rather, advanced and proven statistical techniques are used to infer five possible scores for each participant, which are then used to create scores for populations. Analysis of the resulting data is possible but requires sophisticated analysis techniques.

Finally, when looking at international assessment results, it is vital to remember that populations change. Although the assessments allow for consideration of changes in a country’s results over time, changing demographic patterns and country contexts mean that changes in scores must be interpreted with caution. When looking at scores over time, the country context, education system, and demographic composition must be kept in mind.

WHY SHOULD THE U.S. PARTICIPATE IN THESE ASSESSMENTS?

As described above, there are limits to what the assessments are able to show us about the American educational system. However, participation in the assessments also offers valuable information capable of pointing out trends in education and student achievement. For example, participation in TIMSS, PISA, and PIRLS can help countries:

• determine their global educational standing in subjects essential for further learning, including reading, mathematics, and science,
• profile relative strengths and weaknesses in reading, mathematics, and science achievement in an international context,
• measure educational progress over time both within and between countries,
• inform national and local policy about schools’ curricula and instruction,
• collect in-depth information about school environments, resources, and instruction, and
• examine concerns about equity in learning opportunities.

Another common question related to U.S. participation in international assessment is why the U.S. should participate when American students also regularly participate in the National Assessment of Educational Progress (NAEP). Although NAEP is similar to TIMSS, PISA, and PIRLS in that it measures students’ performance in reading, mathematics, and science, it generally cannot be used to benchmark U.S. performance against that of other countries because NAEP is designed specifically to meet national and state information needs.

More broadly, participation in international large-scale assessments can help the U.S. think comparatively about its education system. Although the American education system is certainly unique, it may not be as different from those of other countries as is often assumed. For example, in the TIMSS 2007 fourth-grade sample, Algeria, Australia, Hong Kong, and New Zealand, among other countries, all had percentages of immigrants that were greater than that of the U.S.

Additionally, Williams (2003) points out that thinking comparatively about education can help us learn from others, help us understand and work with others, and help us understand ourselves. As Williams describes, “International comparison provides surprising insights” into the status of the American education system and helps us better understand ourselves. Among the most striking examples of such insights is the breakdown of the 2009 PISA results by racial/ethnic groups in the U.S. When broken out by race/ethnicity, Asian-American students ranked among the highest achieving students in the world. Additionally, White American students had scores similar to those of the top-performing countries, New Zealand and Finland. However, Hispanic American and Black American participants fell far below the international average. In fact, these groups of students are grouped with the lowest-scoring OECD members: Turkey, Chile, and Mexico. Such results raise troubling questions about the persistent inequality in American education and suggest that more needs to be done to ensure that all American students, regardless of race/ethnicity or socio-economic class, receive a high-quality education.

CONCLUSION AND ADDITIONAL RESOURCES

International assessments such as PISA, TIMSS, and PIRLS provide valuable information on the American education system. Because each international assessment is unique in its goals, format, and content, the results of each assessment must be understood as providing distinctive information that, when taken together, contributes to a fuller understanding of how the American education system compares with education systems in other countries. Because no one educational system dominates the design of these assessments, these three international assessments differ in many ways from national assessments such as the National Assessment of Educational Progress (NAEP). To allow specific states to compare themselves to other national educational systems, the National Center for Education Statistics is developing a new study to link national and international student assessments so that states can measure their performance against international benchmarks. It should be noted that linking two assessments such as NAEP and TIMSS is a complicated process, and interpretations resulting from the study should be taken with caution.

Additionally, because the cross-country comparisons based on international assessment data found in the popular media often misuse or oversimplify the results of these assessments, it is critical to understand how the tests are developed, administered, and analyzed. In order to help readers better understand the reports they encounter in the media and the ways policymakers use international assessment results in their policy arguments, this brief has offered an overview of the tests and provided answers to some of the most commonly asked questions regarding international assessment.

If you are interested in learning more about the organizations that administer these assessments or how TIMSS, PISA, and PIRLS are developed, administered, and analyzed, the following resources offer additional information about the assessments:

OECD: www.oecd.org
IEA: www.iea.nl
TIMSS: timss.bc.edu/timss2011/index.html
PISA: www.pisa.oecd.org
PIRLS: pirls.bc.edu/pirls2011/index.html
NCES International Activities: nces.ed.gov/surveys/international/
NAEP-TIMSS: http://nces.ed.gov/timss/naeplink.asp
NAEP: http://nces.ed.gov/nationsreportcard/

REFERENCES

Duncan, A. (2010, December). Secretary Arne Duncan’s remarks at OECD’s release of the Program for International Student Assessment (PISA) 2009 Results. Available at: http://www.ed.gov/news/speeches/secretary-arne-duncans-remarks-oecds-release-program-international-student-assessment-

Williams, J.H. (2003). Why compare? Why all educators should think internationally. International Educator, 18-25.

Education Policy Briefs are executive edited by Jonathan A. Plucker, Ph.D. and published by the

Center for Evaluation & Education Policy
Indiana University
1900 East Tenth Street
Bloomington, IN 47406-7512
812-855-4438

More about the Center for Evaluation & Education Policy and our publications can be found at our Web site: http://ceep.indiana.edu
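Supplementary sketch. The section on reading assessment results makes two statistical points: differences between averages are judged "after accounting for standard errors," and five possible scores are inferred for each participant and then combined into population scores. The Python sketch below illustrates one common way such calculations are done; it is not the procedure documented by the OECD or the IEA. The function names `combine_plausible_values` and `measurably_different` are hypothetical, the combination step uses Rubin's rules for multiply imputed data as an assumed stand-in, the comparison treats the two averages as independent (a simplification), and every standard error and sampling variance shown is an illustrative placeholder, since the brief reports averages only.

```python
import math
from statistics import mean, variance


def combine_plausible_values(estimates, sampling_variances):
    """Combine one estimate per plausible value using Rubin's rules.

    estimates: the population mean computed separately from each
        plausible value (five per participant, per the brief).
    sampling_variances: the sampling variance of each of those means.
    Returns (combined mean, combined standard error).
    """
    m = len(estimates)
    point = mean(estimates)
    within = mean(sampling_variances)   # average sampling variance
    between = variance(estimates)       # sample variance across plausible values
    total = within + (1 + 1 / m) * between
    return point, math.sqrt(total)


def measurably_different(mean_a, se_a, mean_b, se_b, critical=1.96):
    """Two-sided z-test on the difference of two averages.

    Assumes the two averages are independent, which is a simplification.
    Returns (is the difference significant at about the 5% level?, z).
    """
    z = (mean_a - mean_b) / math.sqrt(se_a ** 2 + se_b ** 2)
    return abs(z) > critical, z


if __name__ == "__main__":
    # Hypothetical per-plausible-value results for one country.
    pv_means = [498.0, 501.0, 500.0, 502.0, 499.0]
    pv_variances = [13.0] * 5
    print(combine_plausible_values(pv_means, pv_variances))

    # Averages of 500 and 493 come from the brief; both standard
    # errors (3.7 and 0.5) are placeholders chosen for illustration.
    print(measurably_different(500.0, 3.7, 493.0, 0.5))
```

With these placeholder standard errors, a seven-point gap between 500 and 493 does not clear the conventional 1.96 threshold, which matches the brief's point that a difference in rank need not be a measurable difference in achievement. A real analysis would also need the sampling weights and variance estimation methods described in each assessment's technical documentation.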

