PISA 2003 Data Analysis Manual
SAS® USERS

The OECD Programme for International Student Assessment’s (PISA) 2003 survey collected data on students’ performances in reading, mathematics, science and problem solving, as well as contextual information on students’ background, home characteristics and school factors which could influence performance. The first results were released in December 2004 and presented in two volumes: Learning for Tomorrow’s World – First Results from PISA 2003 (OECD, 2004) and Problem Solving for Tomorrow’s World – First Measures of Cross-Curricular Competencies from PISA 2003 (OECD, 2004).

This publication is an essential tool for researchers, as it provides all the information required to understand the PISA 2003 database and perform analyses in accordance with the complex methodologies used to collect and process the data. It includes detailed information on how to analyse the PISA 2003 data, enabling researchers to both reproduce the initial results and to undertake further analyses. The publication includes:

• Introductory chapters explaining the statistical theories and concepts required to analyse the PISA data, including full chapters on how to apply replicate weights and undertake analyses using plausible values;
• Worked examples providing full syntax in SAS®; and
• Comprehensive description of the OECD PISA 2003 international database.

The PISA 2003 database comprises micro-level data on student performance for 41 countries collected in 2003, together with students’ responses to the PISA 2003 questionnaires and the test questions. It can be downloaded from www.pisa.oecd.org.

THE OECD PROGRAMME FOR INTERNATIONAL STUDENT ASSESSMENT (PISA)

PISA is a collaborative process among the 30 member countries of the OECD and nearly 30 partner countries.
It brings together scientific expertise from the participating countries and is steered by their governments on the basis of shared, policy-driven interests. PISA is an unprecedented attempt to measure student achievement, as is evident from some of its features:

– The literacy approach: PISA aims to define each assessment area (mathematics, science, reading and problem solving) not mainly in terms of mastery of the school curriculum, but in terms of the knowledge and skills needed for full participation in society.
– A long-term commitment: It will enable countries to monitor regularly and predictably their progress in meeting key learning objectives.
– The age-group covered: By assessing 15-year-olds, i.e. young people near the end of their compulsory education, PISA provides a significant indication of the overall performance of school systems.
– The relevance to lifelong learning: PISA does not limit itself to assessing students’ knowledge and skills but also asks them to report on their own motivation to learn, their beliefs about themselves and their learning strategies.

OECD’s books, periodicals and statistical databases are now available via www.SourceOECD.org, our online library.
This book is available to subscribers to the following SourceOECD themes:
Education and Skills
Emerging Economies
Transition Economies

Ask your librarian for more details on how to access OECD books on line, or write to us at: [email protected]

www.oecd.org
www.pisa.oecd.org
ISBN 92-64-01063-7
98 2005 02 1 P

Programme for International Student Assessment
PISA 2003 Data Analysis Manual
SAS® Users
OECD
ORGANISATION FOR ECONOMIC CO-OPERATION AND DEVELOPMENT

Foreword

The OECD’s Programme for International Student Assessment (PISA) surveys, which take place every three years, have been designed to collect information about 15-year-old students in participating countries. PISA examines how well students are prepared to meet the challenges of the future, rather than how well they master particular curricula.

The data collected during each PISA cycle are an extremely valuable source of information for researchers, policy makers, educators, parents and students. It is now recognised that the future economic and social well-being of countries is closely linked to the knowledge and skills of their populations. The internationally comparable information provided by PISA allows countries to assess how well their 15-year-old students are prepared for life in a larger context and to compare their relative strengths and weaknesses.

The PISA 2003 database, on which this manual is focused, contains information on over a quarter of a million students from 41 countries.
It includes not only information on their performance in the four main areas of assessment – reading, mathematics, science and problem solving – but also their responses to the Student Questionnaire that they complete as part of the assessment. Data from the school principals are also included.

The PISA 2003 Data Analysis Manual has evolved from the analytical workshops held in Sydney, Vienna, Paris and Bratislava, which exposed participants to the various techniques needed to correctly analyse the complex databases. It allows analysts to confidently replicate procedures used for the production of the PISA 2003 initial reports, Learning for Tomorrow’s World – First Results from PISA 2003 (OECD, 2004a) and Problem Solving for Tomorrow’s World – First Measures of Cross-Curricular Competencies from PISA 2003 (OECD, 2004b), and to accurately undertake new analyses in areas of special interest. In addition to the necessary techniques, the manual includes a detailed account of the variables constructed from the student and school questionnaires. This information was previously published in the Manual for the PISA 2000 Database (OECD, 2002a).

The PISA 2003 Data Analysis Manual is in four parts – the first two sections give a detailed theoretical background and instructions for analysing the data; the third section lists the program codes (syntaxes and the macros), which are needed to carry out the analyses; and the fourth section contains a detailed description of the database.

PISA is a collaborative effort by the participating countries, and is guided by their governments on the basis of shared policy-driven interests. Representatives of each country form the PISA Governing Board, which decides on the assessment and reporting of results in PISA.

There are two versions of this manual – one for SPSS® users and one for SAS® users.
The OECD recognises the creative work of Christian Monseur in preparing the text for both versions of the manual in collaboration with Sheila Krawchuk and Keith Rust, as well as his preparation of the program coding for the SAS® users’ manual. The coding for the SPSS® users’ manual was prepared by Wolfram Schulz and Eveline Gebhardt. The main editorial work was completed at the OECD Secretariat by Miyako Ikeda, Sophie Vayssettes, John Cresswell, Claire Shewbridge and Kate Lancaster. The PISA assessments and the data underlying the manuals were prepared by the PISA Consortium under the direction of Raymond Adams.

© OECD 2005 PISA 2003 Data Analysis Manual: SAS® Users

Table of Contents

Users’ Guide

CHAPTER 1 The OECD’s Programme for International Student Assessment
  An overview of PISA
  What makes PISA unique?
  How the assessment takes place
  About this manual

CHAPTER 2 Sample Weights
  Introduction
  Weights for simple random samples
  Sampling designs for education surveys
  Why do the PISA weights vary?
  Conclusions

CHAPTER 3 Replicate Weights
  Introduction
  Sampling variance for simple random sampling
  Sampling variance for two-stage sampling
  Replication methods for simple random samples
  Resampling methods for two-stage samples
  The Jackknife for unstratified two-stage sample designs
  The Jackknife for stratified two-stage sample designs
  The Balanced Repeated Replication method
  Other procedures for accounting for clustered samples
  Conclusions

CHAPTER 4 The Rasch Model
  Introduction
  How can the information be summarised?
  The Rasch model for dichotomous items
  Other Item Response Theory models
  Conclusions

CHAPTER 5 Plausible Values
  Individual estimates versus population estimates
  The meaning of plausible values
  Comparison of the efficiency of Warm Likelihood Estimates, Expected A Posteriori estimates and plausible values for the estimation of some population statistics
  How to perform analyses with plausible values
  Conclusions

CHAPTER 6 Computation of Standard Errors
  Introduction
  The standard error on univariate statistics for numerical variables
  The SAS® macro for computing the standard error on a mean
  The standard error on percentages
  The standard error on regression coefficients
  The standard error on correlation coefficients
  Conclusions

CHAPTER 7 Analyses with Plausible Values
  Introduction
  Univariate statistics on plausible values
  The standard error on percentages with plausible values
  The standard error on regression coefficients with plausible values
  The standard error on correlation coefficients with plausible values
  Correlation between two sets of plausible values
  A fatal error shortcut
  An unbiased shortcut
  Conclusions

CHAPTER 8 Use of Proficiency Levels
  Introduction
  Generation of the proficiency levels
  Other analyses with proficiency levels
  Conclusions

CHAPTER 9 Analyses with School Level Variables
  Introduction
  Limits of the PISA school samples
  Merging the school and student data files
  Analyses of the school variables
  Conclusions

CHAPTER 10 Standard Error on a Difference
  Introduction
  The standard error of a difference without plausible values
  The standard error of a difference with plausible values
  Multiple comparisons
  Conclusions

CHAPTER 11 OECD Average and OECD Total
  Introduction
  Recoding of the database for the estimation of the OECD total and OECD average
  Duplication of the data for avoiding three runs of the procedure
  Comparisons between OECD average or OECD total estimates and a country estimate
  Conclusions

CHAPTER 12 Trends
  Introduction
  The computation of the standard error for trend indicators on variables other than performance
  The computation of the standard error for trend indicators on performance variables
  Conclusions

CHAPTER 13 Multilevel Analyses
  Introduction
  Simple linear regression
  Simple linear versus multilevel regression analyses
  Fixed effect versus random effect
  Some examples with SAS®
  Limitations of the multilevel model in the PISA context
  Conclusions

CHAPTER 14 Other Statistical Issues
  Introduction
  Analyses by quarters
  The concepts of relative risk and attributable risk
  Instability of the relative and attributable risks
  Computation of the relative risk and attributable risk
  Conclusions

CHAPTER 15 SAS® Macros
  Introduction
  Structure of the SAS® macros

Appendix 1: PISA 2003 International Database
Appendix 2: Student Questionnaire
Appendix 3: Educational Career Questionnaire
Appendix 4: Information Communication Technology (ICT) Questionnaire
Appendix 5: School Questionnaire
Appendix 6: Student Questionnaire Data File Codebook
Appendix 7: School Questionnaire Data File Codebook
Appendix 8: Student Cognitive Test Data File Codebook
Appendix 9: Student and School Questionnaire Indices
Appendix 10: Scores Allocated to the Items
References

USERS’ GUIDE

Preparation of data files

All data files (in text format) and the SAS® control files are available on the PISA Web site (www.pisa.oecd.org).

SAS® users

Running the SAS® control files creates the PISA 2003 SAS® student data file and the PISA 2003 SAS® school data file. Keep both files in the same folder and, before starting any analysis, run the commands that assign this folder as a SAS® library.
For example, if the student and school SAS® data files are saved in the folder “c:\pisa2003\data\”, the following commands need to be run to create a SAS® library:

    libname PISA2003 "c:\pisa2003\data\";
    run;

The ten SAS® macros presented in Chapter 15 need to be saved under “c:\pisa2003\prg”.

SAS® syntax and macros

All syntaxes and macros used in this manual can be copied from the PISA Web site (www.pisa.oecd.org). Each chapter of the manual contains a complete set of syntaxes, which must be run sequentially within the chapter for all of them to work correctly.

Rounding of figures

In the tables and formulas, figures were rounded to a convenient number of decimal places, although calculations were always made with the full number of decimal places.

Country abbreviations used in this manual

AUS Australia       FRA France           KOR Korea          PRT Portugal
AUT Austria         GBR United Kingdom   LIE Liechtenstein  RUS Russian Federation
BEL Belgium         GRC Greece           LUX Luxembourg     SVK Slovakia
BRA Brazil          HKG Hong Kong-China  LVA Latvia         SWE Sweden
CAN Canada          HUN Hungary          MAC Macao-China    THA Thailand
CHE Switzerland     IDN Indonesia        MEX Mexico         TUN Tunisia
CZE Czech Republic  IRL Ireland          NLD Netherlands    TUR Turkey
DEU Germany         ISL Iceland          NOR Norway         URY Uruguay
DNK Denmark         ITA Italy            NZL New Zealand    USA United States
ESP Spain           JPN Japan            POL Poland         YUG Serbia
FIN Finland

Socio-economic status

The highest occupational status of parents (HISEI) is referred to as the socio-economic status of the students throughout this manual. It should be noted that occupational status is only one aspect of socio-economic status, which can also include education and wealth. The PISA 2003 database also includes a broader socio-economic measure called the index of Economic, Social and Cultural Status (ESCS), which is derived from the highest occupational status of parents, the highest educational level and an estimate related to household possessions.
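
The set-up described under “Preparation of data files” and “SAS® syntax and macros” in this guide can be sketched as a short start-up program. This is only an illustration under stated assumptions: the folder names are those used in this guide, while the macro file name my_macro.sas is a hypothetical placeholder, since the actual file names are not listed here.

```sas
/* Assign the folder holding the student and school data files as a library. */
/* Use straight double quotes: curly quotes copied from text will not work.  */
libname PISA2003 "c:\pisa2003\data\";

/* LIBREF returns 0 when the libref has been assigned successfully. */
%put NOTE: libref check (0 means assigned): %sysfunc(libref(PISA2003));

/* List the members of the library to confirm that both data files are there. */
proc contents data=PISA2003._all_ nods;
run;

/* Make one macro from the macro folder available to the session.    */
/* "my_macro.sas" is a hypothetical file name used for illustration. */
%include "c:\pisa2003\prg\my_macro.sas";
```

Running such a check once at the start of a session makes a mistyped folder path visible in the log before any analysis is attempted.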