Applied Linear Models with SAS
This textbook for a second course in basic statistics for undergraduates or first-year graduate students introduces linear regression models and describes other linear models including Poisson regression, logistic regression, proportional hazards regression, and nonparametric regression. Numerous examples drawn from the news and current events with an emphasis on health issues illustrate these concepts.
Assuming only a pre-calculus background, the author keeps equations to a minimum and demonstrates all computations using SAS. Most of the programs and output are displayed in a self-contained way, with an emphasis on the interpretation of the output in terms of how it relates to the motivating example. Plenty of exercises conclude every chapter. All of the data sets and SAS programs are available from the book's Web site, along with other ancillary material.
Dr. Daniel Zelterman is Professor of Epidemiology and Public Health in the Division of Biostatistics at Yale University. His application areas include work in genetics, HIV, and cancer. Before moving to Yale in 1995, he was on the faculty of the University of Minnesota and at the State University of New York at Albany. He is an elected Fellow of the American Statistical Association. He serves as associate editor of Biometrics and other statistical journals. He is the author of Models for Discrete Data (1999), Advanced Log-Linear Models Using SAS (2002), Discrete Distributions: Application in the Health Sciences (2004), and Models for Discrete Data: 2nd Edition (2006). In his spare time he plays the bassoon in orchestral groups and has backpacked hundreds of miles of the Appalachian Trail.
Applied Linear Models
with SAS
Daniel Zelterman
Yale University
CAMBRIDGE UNIVERSITY PRESS
Cambridge, New York, Melbourne, Madrid, Cape Town, Singapore,
São Paulo, Delhi, Dubai, Tokyo
Cambridge University Press
The Edinburgh Building, Cambridge CB2 8RU, UK
Published in the United States of America by Cambridge University Press, New York
www.cambridge.org
Information on this title: www.cambridge.org/9780521761598
© Daniel Zelterman 2010
This publication is in copyright. Subject to statutory exception and to the
provision of relevant collective licensing agreements, no reproduction of any part
may take place without the written permission of Cambridge University Press.
First published in print format 2010
ISBN-13 978-0-511-77476-8 eBook (EBL)
ISBN-13 978-0-521-76159-8 Hardback
Cambridge University Press has no responsibility for the persistence or accuracy
of urls for external or third-party internet websites referred to in this publication,
and does not guarantee that any content on such websites is, or will remain,
accurate or appropriate.
Contents
Preface ix
Acknowledgments xiii
1 Introduction 1
1.1 What Is Statistics? 1
1.2 Statistics in the News: The Weather Map 4
1.3 Mathematical Background 6
1.4 Calculus 7
1.5 Calculus in the News: New Home Sales 9
1.6 Statistics in the News: IMF Loans and Tuberculosis 11
1.7 Exercises 13
2 Principles of Statistics 21
2.1 Binomial Distribution 21
2.2 Confidence Intervals and the Hubble Constant 25
2.3 Normal Distribution 26
2.4 Hypothesis Tests 30
2.5 The Student t-Test 34
2.6 The Chi-Squared Test and 2 × 2 Tables 42
2.7 What Are Degrees of Freedom? 47
2.8 SAS, in a Nutshell 49
2.9 Survey of the Rest of the Book 51
2.10 Exercises 52
3 Introduction to Linear Regression 58
3.1 Low-Birth-Weight Infants 58
3.2 The Least Squares Regression Line 59
3.3 Regression in SAS 63
3.4 Statistics in the News: Future Health Care Costs 65
3.5 Exercises 66
4 Assessing the Regression 75
4.1 Correlation 75
4.2 Statistics in the News: Correlations of the Global Economy 77
4.3 Analysis of Variance 78
4.4 Model Assumptions and Residual Plots 81
4.5 Exercises 84
5 Multiple Linear Regression 90
5.1 Introductory Example: Maximum January Temperatures 90
5.2 Graphical Displays of Multivariate Data 94
5.3 Leverage and the Hat Matrix Diagonal 96
5.4 Jackknife Diagnostics 99
5.5 Partial Regression Plots and Correlations 102
5.6 Model-Building Strategies 105
5.7 Exercises 110
6 Indicators, Interactions, and Transformations 120
6.1 Indicator Variables 120
6.2 Synergy in the News: Airline Mergers 127
6.3 Interactions of Explanatory Variables 128
6.4 Transformations 132
6.5 Additional Topics: Longitudinal Data 137
6.6 Exercises 138
7 Nonparametric Statistics 150
7.1 A Test for Medians 150
7.2 Statistics in the News: Math Achievement Scores 153
7.3 Rank Sum Test 155
7.4 Nonparametric Methods in SAS 156
7.5 Ranking and the Healthiest State 157
7.6 Nonparametric Regression: LOESS 160
7.7 Exercises 163
8 Logistic Regression 169
8.1 Example 169
8.2 The Logit Transformation 170
8.3 Logistic Regression in SAS 173
8.4 Statistics in the News: The New York Mets 177
8.5 Key Points 178
8.6 Exercises 179
9 Diagnostics for Logistic Regression 187
9.1 Some Syntax for proc logistic 188
9.2 Residuals for Logistic Regression 190
9.3 Influence in Logistic Regression 193
9.4 Exercises 197
10 Poisson Regression 204
10.1 Statistics in the News: Lottery Winners 204
10.2 Poisson Distribution Basics 204
10.3 Regression Models for Poisson Data 206
10.4 Statistics in the News: Attacks in Iraq 208
10.5 Poisson Regression in SAS 209
10.6 Exercises 215
11 Survival Analysis 225
11.1 Censoring 225
11.2 The Survival Curve and Its Estimate 227
11.3 The Log-Rank Test and SAS Program 232
11.4 Exercises 235
12 Proportional Hazards Regression 237
12.1 The Hazard Function 237
12.2 The Model of Proportional Hazards Regression 239
12.3 Proportional Hazards Regression in SAS 241
12.4 Exercises 243
13 Review of Methods 247
13.1 The Appropriate Method 247
13.2 Other Review Questions 249
Appendix: Statistical Tables 255
A.1 Normal Distribution 255
A.2 Chi-squared Tables 257
References 259
Selected Solutions and Hints 263
Index 269