
A Handbook of Statistical Analyses using SAS PDF

165 Pages·1996·4.032 MB·English

Preview A Handbook of Statistical Analyses using SAS

A Handbook of Statistical Analyses using SAS

JOIN US ON THE INTERNET VIA WWW, GOPHER, FTP OR EMAIL:
WWW: http://www.thomson.com
GOPHER: gopher.thomson.com
FTP: ftp.thomson.com
EMAIL: [email protected]

Brian S. Everitt
Professor of Statistics in Behavioural Science
Institute of Psychiatry
London, UK

and

Geoff Der
Statistician
MRC Medical Sociology Unit
University of Glasgow
Glasgow, UK

Springer-Science+Business Media, B.V.

First edition 1996
© 1996 Brian S. Everitt and Geoff Der
Originally published by Chapman & Hall in 1996
Typeset in 10/12 pt Times by Thomson Press (India) Ltd, New Delhi, India
ISBN 978-0-412-71050-6
ISBN 978-1-4899-4547-1 (eBook)
DOI 10.1007/978-1-4899-4547-1

Apart from any fair dealing for the purposes of research or private study, or criticism or review, as permitted under the UK Copyright Designs and Patents Act, 1988, this publication may not be reproduced, stored, or transmitted, in any form or by any means, without the prior permission in writing of the publishers, or in the case of reprographic reproduction only in accordance with the terms of the licences issued by the Copyright Licensing Agency in the UK, or in accordance with the terms of licences issued by the appropriate Reproduction Rights Organization outside the UK. Enquiries concerning reproduction outside the terms stated here should be sent to the publishers at the London address printed on this page.

The publisher makes no representation, express or implied, with regard to the accuracy of the information contained in this book and cannot accept any legal responsibility or liability for any errors or omissions that may be made.

A catalogue record for this book is available from the British Library.
Library of Congress Catalog Card Number: 96-84542

Printed on permanent acid-free text paper, manufactured in accordance with ANSI/NISO Z39.48-1992 and ANSI/NISO Z39.48-1984 (Permanence of Paper).
Contents

Preface
A brief introduction to SAS
    Introduction
    The SAS language
    SAS graphics
    Some tips for preventing and correcting errors
1 Data description and simple inference: mortality and water hardness in the UK
    1.1 Description of data
    1.2 Methods of analysis
    1.3 Analysis using SAS
    Exercises
2 Multiple regression: predicting crime rates in states of the USA
    2.1 Description of data
    2.2 The multiple regression model
    2.3 Analysis using SAS
    Exercises
3 Analysis of variance 1: survival times of animals
    3.1 Description of data
    3.2 Analysis of variance model
    3.3 Analysis using SAS
    Exercises
4 Analysis of variance 2: effectiveness of slimming clinics
    4.1 Description of data
    4.2 Analysis of variance model
    4.3 Analysis using SAS
    Exercises
5 Analysis of repeated measures: salsolinol excretion rates
    5.1 Description of data
    5.2 Analysing repeated measures data
    5.3 Analysis using SAS
    Exercises
6 Logistic regression: relationship between the incidence of byssinosis and the dustiness of the workplace
    6.1 Description of data
    6.2 Logistic regression
    6.3 Analysis using SAS
    Exercises
7 Analysis of survival times: motion sickness and the survival of black ducks
    7.1 Description of data
    7.2 Describing survival times and Cox's regression model
    7.3 Analysis using SAS
    Exercises
8 Principal components and factor analysis: statements about pain
    8.1 Description of data
    8.2 Principal components and factor analysis
    8.3 Analysis using SAS
    Exercises
9 Cluster analysis: classification of occupations
    9.1 Description of data
    9.2 Cluster analysis
    9.3 Analysis using SAS
    Exercises
10 Discriminant analysis: identifying types of Tibetan skulls
    10.1 Description of data
    10.2 Discriminant analysis
    10.3 Analysis using SAS
    Exercises
11 Correspondence analysis: car-changing patterns
    11.1 Description of data
    11.2 Displaying contingency tables graphically - correspondence analysis
    11.3 Analysis using SAS
    Exercises
Appendix: answers to selected exercises
References
Index

Preface

SAS, standing for Statistical Analysis System, is a powerful software package for the manipulation and statistical analysis of data. The system is described in detail in several manuals totalling almost 10000 pages. Much of the material in these manuals is excellent but their very bulk can be disturbing for potential users, in particular, for users new to SAS. In this text an attempt is made to describe and demonstrate, in a relatively brief and straightforward manner, how a variety of statistical analyses can be applied using SAS. The examples in each chapter use, primarily, the most basic SAS procedures and options; for many users these will prove adequate for similar analyses of their own data. Additionally, the material in this text should serve as a useful introduction to the greater detail available in the SAS manuals themselves.

All the data sets used in this text are taken from A Handbook of Small Data Sets (referred to herein as SDS), by Hand et al., also published by Chapman & Hall.

B.S. Everitt and G. Der

A brief introduction to SAS

INTRODUCTION

SAS is an integrated system for manipulating, analysing and presenting data. It is a modular system with a large range of modules that may be added to the basic system, known as BASE SAS. Here we concentrate on the STAT and GRAPH modules in addition to the main features of the base SAS system.

The SAS language

At the heart of SAS is a programming language made up of statements that specify how data are to be processed and analysed. The statements correspond to operations to be performed on the data or instructions about the analysis. A SAS program consists of a sequence of SAS statements grouped together into blocks, referred to as 'steps'. There are two types of steps: data steps and proc (procedure) steps.
A data step is used to prepare data for analysis. It creates a SAS data set and may organize the data and modify it in the process. A proc step is used to analyse the data in a SAS data set. A typical program might consist of a data step to read in some raw data followed by a series of proc steps analysing that data. If, in the course of the analysis, the data need to be modified, another data step will be needed in order to do this.

Learning to use the SAS language is largely a question of learning the statements that are needed to do the analysis required and of knowing how to structure them into steps. There are a few general principles that are useful to know.

Most SAS statements begin with a keyword that identifies the type of statement. All SAS statements must end with a semicolon. The most common mistake for new users is to omit the semicolon and the effect is to combine two statements into one. Usually the result will not be an interpretable statement and an error message will be given. Occasionally, though, the result will be a valid statement, but will clearly be one that is likely to have unintended results.

Statements may extend over more than one line and there may be more than one statement per line. However, keeping to one statement per line, as far as possible, helps to avoid errors and to identify those that do occur.

SAS statements fall into four broad categories according to where in a program they can be used. These are:

• data step statements;
• proc step statements;
• statements that can be used in both data and proc steps;
• global statements, which apply to all subsequent steps.

Since the function of the data and proc steps is so different it is perhaps not surprising that many statements are only applicable to one type of step.
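The structure described above can be sketched as a short program. This is only an illustrative sketch, not an example from the book: the data set name example, the variable names and the data values are all hypothetical. It shows one global statement, one data step with the data included, and two proc steps, with every statement ending in a semicolon and one statement per line.

```sas
/* Global statement: the title applies to all subsequent steps. */
title 'Hypothetical example data';

/* Data step: reads the raw data lines and creates the data set example. */
/* All names and values here are invented for illustration.              */
data example;
    input name $ sex $ age weight;
    datalines;
Ann F 34 61.5
Bob M 41 80.2
Cal M 29 72.0
;

/* Proc step: prints the contents of the data set. */
proc print data=example;
run;

/* Proc step: summary statistics for two numeric variables. */
proc means data=example;
    var age weight;
run;
```

The data step here ends after the data lines, whereas each proc step is closed explicitly with a run statement. Older SAS programs often use cards in place of datalines for the same purpose.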
A simple example of a global statement is the title statement, which defines a title to be printed on procedure output and graphs. The title is then used until changed or reset.

Data and proc steps begin with a data or proc statement, respectively, and end at the next data or proc statement, or the next run statement. When a data step has the data included, the step ends after the data. Understanding where steps begin and end is important because SAS programs are not executed statement by statement, but step by step. While learning to use SAS, it may be useful to mark the end of each step explicitly by inserting a run statement.

Data step statements must be within the relevant data step, i.e. after the data statement and before the end of the step. Likewise, proc step statements must be within the proc step.

Another important rule concerns the names given to variables and data sets. These may contain letters, numbers and underscore characters, but cannot be more than eight characters long and cannot begin with a number. It is permissible, although inadvisable, to use a name that is already used by SAS, e.g. the name of a function.

When a list of variable names is needed in a SAS program an abbreviated form can often be used. A variable list of the form sex--weight (two hyphens) refers to the variables sex and weight and all the variables positioned between them in the data set. Where a set of 10 variables, for example, have names of the form score1, score2, ... , score10, that is, they have a root in common (score in this case) but end in a consecutive set of numbers, they can be referred to by a variable list of the form score1-score10 (a single hyphen), and they do not need to be contiguous in the data set.

The Windows user interface

There are a number of ways in which a SAS program may be run and the results accessed. Here we focus on using the pull down menus of the Microsoft Windows version of SAS. Most of the features will be the same under other
