SAS/STAT® 14.2 User’s Guide
The GENMOD Procedure

This document is an individual chapter from SAS/STAT® 14.2 User’s Guide.

The correct bibliographic citation for this manual is as follows: SAS Institute Inc. 2016. SAS/STAT® 14.2 User’s Guide. Cary, NC: SAS Institute Inc.

Copyright © 2016, SAS Institute Inc., Cary, NC, USA

All Rights Reserved. Produced in the United States of America.

For a hard-copy book: No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, or otherwise, without the prior written permission of the publisher, SAS Institute Inc.

For a web download or e-book: Your use of this publication shall be governed by the terms established by the vendor at the time you acquire this publication.

The scanning, uploading, and distribution of this book via the Internet or any other means without the permission of the publisher is illegal and punishable by law. Please purchase only authorized electronic editions and do not participate in or encourage electronic piracy of copyrighted materials. Your support of others’ rights is appreciated.

U.S. Government License Rights; Restricted Rights: The Software and its documentation is commercial computer software developed at private expense and is provided with RESTRICTED RIGHTS to the United States Government. Use, duplication, or disclosure of the Software by the United States Government is subject to the license terms of this Agreement pursuant to, as applicable, FAR 12.212, DFAR 227.7202-1(a), DFAR 227.7202-3(a), and DFAR 227.7202-4, and, to the extent required under U.S. federal law, the minimum restricted rights as set out in FAR 52.227-19 (DEC 2007). If FAR 52.227-19 is applicable, this provision serves as notice under clause (c) thereof and no other notice is required to be affixed to the Software or documentation. The Government’s rights in Software and documentation shall be only those set forth in this Agreement.

SAS Institute Inc., SAS Campus Drive, Cary, NC 27513-2414

November 2016

SAS® and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration.
Other brand and product names are trademarks of their respective companies.

SAS software may be provided with certain third-party software, including but not limited to open-source software, which is licensed under its applicable third-party software license agreement. For license information about third-party software distributed with SAS software, refer to http://support.sas.com/thirdpartylicenses.

Chapter 45
The GENMOD Procedure

Contents

Overview: GENMOD Procedure  3063
    What Is a Generalized Linear Model?  3064
    Examples of Generalized Linear Models  3065
    The GENMOD Procedure  3066
Getting Started: GENMOD Procedure  3068
    Poisson Regression  3068
    Bayesian Analysis of a Linear Regression Model  3073
    Generalized Estimating Equations  3085
Syntax: GENMOD Procedure  3088
    PROC GENMOD Statement  3089
    ASSESS Statement  3094
    BAYES Statement  3095
    BY Statement  3105
    CLASS Statement  3105
    CODE Statement  3108
    CONTRAST Statement  3109
    DEVIANCE Statement  3112
    EFFECTPLOT Statement  3112
    ESTIMATE Statement  3113
    EXACT Statement  3115
    EXACTOPTIONS Statement  3117
    FREQ Statement  3120
    FWDLINK Statement  3120
    INVLINK Statement  3120
    LSMEANS Statement  3121
    LSMESTIMATE Statement  3122
    MODEL Statement  3123
    OUTPUT Statement  3134
    Programming Statements  3137
    REPEATED Statement  3139
    SLICE Statement  3144
    STORE Statement  3144
    STRATA Statement  3144
    VARIANCE Statement  3146
    WEIGHT Statement  3146
    ZEROMODEL Statement  3146
Details: GENMOD Procedure  3147
    Generalized Linear Models Theory  3147
    Specification of Effects  3158
    Parameterization Used in PROC GENMOD  3159
    Type 1 Analysis  3159
    Type 3 Analysis  3160
    Confidence Intervals for Parameters  3162
    F Statistics  3163
    Lagrange Multiplier Statistics  3163
    Predicted Values of the Mean  3164
    Residuals  3164
    Multinomial Models  3166
    Zero-Inflated Models  3166
    Tweedie Distribution For Generalized Linear Models  3168
    Generalized Estimating Equations  3170
    Assessment of Models Based on Aggregates of Residuals  3178
    Case Deletion Diagnostic Statistics  3182
    Bayesian Analysis  3185
    Exact Logistic and Exact Poisson Regression  3190
    Response Level Ordering  3193
    Missing Values  3194
    Displayed Output for Classical Analysis  3194
    Displayed Output for Bayesian Analysis  3202
    Displayed Output for Exact Analysis  3204
    ODS Table Names  3204
    ODS Graphics  3208
Examples: GENMOD Procedure  3210
    Example 45.1: Logistic Regression  3210
    Example 45.2: Normal Regression, Log Link  3212
    Example 45.3: Gamma Distribution Applied to Life Data  3215
    Example 45.4: Ordinal Model for Multinomial Data  3218
    Example 45.5: GEE for Binary Data with Logit Link Function  3221
    Example 45.6: Log Odds Ratios and the ALR Algorithm  3224
    Example 45.7: Log-Linear Model for Count Data  3226
    Example 45.8: Model Assessment of Multiple Regression Using Aggregates of Residuals  3231
    Example 45.9: Assessment of a Marginal Model for Dependent Data  3238
    Example 45.10: Bayesian Analysis of a Poisson Regression Model  3240
    Example 45.11: Exact Poisson Regression  3253
    Example 45.12: Tweedie Regression  3256
References  3259

Overview: GENMOD Procedure

The GENMOD procedure fits generalized linear models, as defined by Nelder and Wedderburn (1972). The class of generalized linear models is an extension of traditional linear models that allows the mean of a population to depend on a linear predictor through a nonlinear link function and allows the response probability distribution to be any member of an exponential family of distributions. Many widely used statistical models are generalized linear models. These include classical linear models with normal errors, logistic and probit models for binary data, and log-linear models for multinomial data.
Many other useful statistical models can be formulated as generalized linear models by the selection of an appropriate link function and response probability distribution.

See McCullagh and Nelder (1989) for a discussion of statistical modeling using generalized linear models. The books by Aitkin et al. (1989) and Dobson (1990) are also excellent references with many examples of applications of generalized linear models. Firth (1991) provides an overview of generalized linear models. Myers, Montgomery, and Vining (2002) provide applications of generalized linear models in the engineering and physical sciences. Collett (2003) and Hilbe (2009) provide comprehensive accounts of generalized linear models when the responses are binary.

The analysis of correlated data arising from repeated measurements when the measurements are assumed to be multivariate normal has been studied extensively. However, the normality assumption might not always be reasonable; for example, different methodology must be used in the data analysis when the responses are discrete and correlated. Generalized estimating equations (GEEs) provide a practical method with reasonable statistical efficiency to analyze such data.

Liang and Zeger (1986) introduced GEEs as a method of dealing with correlated data when, except for the correlation among responses, the data can be modeled as a generalized linear model. For example, correlated binary and count data in many cases can be modeled in this way.

The GENMOD procedure can fit models to correlated responses by the GEE method. You can use PROC GENMOD to fit models with most of the correlation structures from Liang and Zeger (1986) by using GEEs. For more details on GEEs, see Hardin and Hilbe (2003); Diggle, Liang, and Zeger (1994); Lipsitz et al. (1994).

Bayesian analysis of generalized linear models can be requested by using the BAYES statement in the GENMOD procedure. In Bayesian analysis, the model parameters are treated as random variables, and inference about parameters is based on the posterior distribution of the parameters, given the data.
The posterior distribution is obtained using Bayes’ theorem as the likelihood function of the data weighted with a prior distribution. The prior distribution enables you to incorporate knowledge or experience of the likely range of values of the parameters of interest into the analysis. If you have no prior knowledge of the parameter values, you can use a noninformative prior distribution, and the results of the Bayesian analysis will be very similar to a classical analysis based on maximum likelihood. A closed form of the posterior distribution is often not feasible, and a Markov chain Monte Carlo method by Gibbs sampling is used to simulate samples from the posterior distribution. See Chapter 7, “Introduction to Bayesian Analysis Procedures,” for an introduction to the basic concepts of Bayesian statistics. Also see the section “Bayesian Analysis: Advantages and Disadvantages” on page 128 in Chapter 7, “Introduction to Bayesian Analysis Procedures,” for a discussion of the advantages and disadvantages of Bayesian analysis. See Ibrahim, Chen, and Sinha (2001) for a detailed description of Bayesian analysis.

In a Bayesian analysis, a Gibbs chain of samples from the posterior distribution is generated for the model parameters. Summary statistics (mean, standard deviation, quartiles, HPD and credible intervals, correlation matrix) and convergence diagnostics (autocorrelations; Gelman-Rubin, Geweke, Raftery-Lewis, and Heidelberger and Welch tests; the effective sample size; and Monte Carlo standard errors) are computed for each parameter, as well as the correlation matrix and the covariance matrix of the posterior sample. Trace plots, posterior density plots, and autocorrelation function plots that are created using ODS Graphics are also provided for each parameter.

The GENMOD procedure enables you to perform exact logistic regression, also called exact conditional binary logistic regression, and exact Poisson regression, also called exact conditional Poisson regression, by specifying one or more EXACT statements.
You can test individual parameters or conduct a joint test for several parameters. The procedure computes two exact tests: the exact conditional score test and the exact conditional probability test. You can request exact estimation of specific parameters and corresponding odds ratios where appropriate. Point estimates, standard errors, and confidence intervals are provided.

The GENMOD procedure uses ODS Graphics to create graphs as part of its output. For general information about ODS Graphics, see Chapter 21, “Statistical Graphics Using ODS.”

What Is a Generalized Linear Model?

A traditional linear model is of the form

    $$y_i = x_i'\beta + \varepsilon_i$$

where $y_i$ is the response variable for the $i$th observation. The quantity $x_i$ is a column vector of covariates, or explanatory variables, for observation $i$ that is known from the experimental setting and is considered to be fixed, or nonrandom. The vector of unknown coefficients $\beta$ is estimated by a least squares fit to the data $y$. The $\varepsilon_i$ are assumed to be independent, normal random variables with zero mean and constant variance. The expected value of $y_i$, denoted by $\mu_i$, is

    $$\mu_i = x_i'\beta$$

While traditional linear models are used extensively in statistical data analysis, there are types of problems such as the following for which they are not appropriate.

• It might not be reasonable to assume that data are normally distributed. For example, the normal distribution (which is continuous) might not be adequate for modeling counts or measured proportions that are considered to be discrete.

• If the mean of the data is naturally restricted to a range of values, the traditional linear model might not be appropriate, since the linear predictor $x_i'\beta$ can take on any value. For example, the mean of a measured proportion is between 0 and 1, but the linear predictor of the mean in a traditional linear model is not restricted to this range.

• It might not be realistic to assume that the variance of the data is constant for all observations. For example, it is not unusual to observe data where the variance increases with the mean of the data.

A generalized linear model extends the traditional linear model and is therefore applicable to a wider range of data analysis problems.
A generalized linear model consists of the following components:

• The linear component is defined just as it is for traditional linear models:

    $$\eta_i = x_i'\beta$$

• A monotonic differentiable link function $g$ describes how the expected value of $y_i$ is related to the linear predictor $\eta_i$:

    $$g(\mu_i) = x_i'\beta$$

• The response variables $y_i$ are independent for $i = 1, 2, \ldots$ and have a probability distribution from an exponential family. This implies that the variance of the response depends on the mean $\mu$ through a variance function $V$:

    $$\mathrm{Var}(y_i) = \frac{\phi V(\mu_i)}{w_i}$$

  where $\phi$ is a constant and $w_i$ is a known weight for each observation. The dispersion parameter $\phi$ is either known (for example, for the binomial or Poisson distribution, $\phi = 1$) or must be estimated.

See the section “Response Probability Distributions” on page 3147 for the form of a probability distribution from the exponential family of distributions.

As in the case of traditional linear models, fitted generalized linear models can be summarized through statistics such as parameter estimates, their standard errors, and goodness-of-fit statistics. You can also make statistical inference about the parameters by using confidence intervals and hypothesis tests. However, specific inference procedures are usually based on asymptotic considerations, since exact distribution theory is not available or is not practical for all generalized linear models.

Examples of Generalized Linear Models

You construct a generalized linear model by deciding on response and explanatory variables for your data and choosing an appropriate link function and response probability distribution. Some examples of generalized linear models follow. Explanatory variables can be any combination of continuous variables, classification variables, and interactions.
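As a worked illustration of the three components defined in the previous section, consider a single observation in a logistic model. The covariate vector, coefficient values, and unit weight below are hypothetical, chosen only to keep the arithmetic simple; the binomial variance function $V(\mu) = \mu(1-\mu)$ and $\phi = 1$ are the standard values for that distribution.

```latex
\begin{align*}
x_i &= (1,\; 2)', \qquad \beta = (-1,\; 0.5)', \qquad w_i = 1, \qquad \phi = 1 \\
\eta_i &= x_i'\beta = (1)(-1) + (2)(0.5) = 0 \\
g(\mu_i) &= \log\!\left(\frac{\mu_i}{1-\mu_i}\right) = \eta_i
  \;\Longrightarrow\; \mu_i = \frac{1}{1+e^{-\eta_i}} = 0.5 \\
\mathrm{Var}(y_i) &= \frac{\phi\, V(\mu_i)}{w_i}
  = \frac{(1)(0.5)(1-0.5)}{1} = 0.25
\end{align*}
```

The linear predictor is unrestricted, while the inverse link maps it into the interval (0, 1), and the variance follows from the mean rather than being a separate constant.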
Traditional Linear Model

• response variable: a continuous variable
• distribution: normal
• link function: identity, $g(\mu) = \mu$

Logistic Regression

• response variable: a proportion
• distribution: binomial
• link function: logit, $g(\mu) = \log\left(\dfrac{\mu}{1-\mu}\right)$

Poisson Regression in Log-Linear Model

• response variable: a count
• distribution: Poisson
• link function: log, $g(\mu) = \log(\mu)$

Gamma Model with Log Link

• response variable: a positive, continuous variable
• distribution: gamma
• link function: log, $g(\mu) = \log(\mu)$

The GENMOD Procedure

The GENMOD procedure fits a generalized linear model to the data by maximum likelihood estimation of the parameter vector $\beta$. There is, in general, no closed form solution for the maximum likelihood estimates of the parameters. The GENMOD procedure estimates the parameters of the model numerically through an iterative fitting process. The dispersion parameter $\phi$ is also estimated by maximum likelihood or, optionally, by the residual deviance or by Pearson’s chi-square divided by the degrees of freedom. Covariances, standard errors, and p-values are computed for the estimated parameters based on the asymptotic normality of maximum likelihood estimators. A number of popular link functions and probability distributions are available in the GENMOD procedure.
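The iterative fitting process described above can be sketched for one simple case: a Poisson model with log link, an intercept, and one covariate. The sketch below uses Newton-Raphson, which is an assumption made for illustration, since the text here does not state which algorithm the procedure uses. The function names and the data are hypothetical, and the standard errors come from the inverse of the information matrix, in line with the asymptotic normality mentioned above.

```python
import math

def score_and_information(b0, b1, x, y):
    """Score vector and information matrix for log(mu_i) = b0 + b1 * x_i."""
    mu = [math.exp(b0 + b1 * xi) for xi in x]
    # Score: derivative of the Poisson log likelihood with respect to (b0, b1).
    u0 = sum(yi - mi for yi, mi in zip(y, mu))
    u1 = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
    # Information: sum of mu_i * x_i x_i', stored as its three distinct entries.
    i00 = sum(mu)
    i01 = sum(xi * mi for xi, mi in zip(x, mu))
    i11 = sum(xi * xi * mi for xi, mi in zip(x, mu))
    return (u0, u1), (i00, i01, i11)

def fit_poisson_log(x, y, max_iter=50, tol=1e-10):
    """Hypothetical helper: maximum likelihood by Newton-Raphson iteration."""
    # Start from the intercept-only fit.
    b0, b1 = math.log(sum(y) / len(y)), 0.0
    for _ in range(max_iter):
        (u0, u1), (i00, i01, i11) = score_and_information(b0, b1, x, y)
        det = i00 * i11 - i01 * i01
        # Newton step: solve the 2x2 system (information) * step = score.
        d0 = (i11 * u0 - i01 * u1) / det
        d1 = (i00 * u1 - i01 * u0) / det
        b0, b1 = b0 + d0, b1 + d1
        if max(abs(d0), abs(d1)) < tol:
            break
    # Asymptotic standard errors: square roots of the diagonal of the
    # inverse information matrix, evaluated at the final estimates.
    _, (i00, i01, i11) = score_and_information(b0, b1, x, y)
    det = i00 * i11 - i01 * i01
    return (b0, b1), (math.sqrt(i11 / det), math.sqrt(i00 / det))

# Hypothetical data, for illustration only.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1, 3, 4, 9, 15]
beta, se = fit_poisson_log(x, y)
print(beta, se)
```

At convergence the score vector is numerically zero, which is the defining condition of the maximum likelihood estimate. A production implementation would also guard against a singular information matrix and non-convergence, which this sketch omits.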
The built-in link functions are as follows:

• identity: $g(\mu) = \mu$
• logit: $g(\mu) = \log(\mu/(1-\mu))$
• probit: $g(\mu) = \Phi^{-1}(\mu)$, where $\Phi$ is the standard normal cumulative distribution function
• power: $g(\mu) = \begin{cases} \mu^{\lambda} & \text{if } \lambda \neq 0 \\ \log(\mu) & \text{if } \lambda = 0 \end{cases}$
• log: $g(\mu) = \log(\mu)$
• complementary log-log: $g(\mu) = \log(-\log(1-\mu))$

The available distributions and associated variance functions are as follows:

• normal: $V(\mu) = 1$
• binomial (proportion): $V(\mu) = \mu(1-\mu)$
• Poisson: $V(\mu) = \mu$
• gamma: $V(\mu) = \mu^2$
• inverse Gaussian: $V(\mu) = \mu^3$
• negative binomial: $V(\mu) = \mu + k\mu^2$
• geometric: $V(\mu) = \mu + \mu^2$
• multinomial
• zero-inflated Poisson
• zero-inflated negative binomial

The negative binomial and zero-inflated negative binomial are distributions with an additional parameter $k$ in the variance function. PROC GENMOD estimates $k$ by maximum likelihood, or you can optionally set it to a constant value. For discussions of the negative binomial distribution, see McCullagh and Nelder (1989); Hilbe (1994, 2007); Long (1997); Cameron and Trivedi (1998); Lawless (1987).

The multinomial distribution is sometimes used to model a response that can take values from a number of categories. The binomial is a special case of the multinomial with two categories. See the section “Multinomial Models” on page 3166 and McCullagh and Nelder (1989, Chapter 5) for a description of the multinomial distribution.

The zero-inflated Poisson and zero-inflated negative binomial are included in PROC GENMOD even though they are not generalized linear models. They are useful extensions of generalized linear models. See the section “Zero-Inflated Models” on page 3166 for information about the zero-inflated distributions.
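The link and variance functions listed above can be written out directly, which makes it easy to check that each link is invertible on its domain. The following sketch uses only the Python standard library; the dictionary keys and function names are hypothetical labels for this illustration and are not PROC GENMOD option keywords.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, used for the probit link

# Each entry maps a label to the pair (g, g_inverse).
LINKS = {
    "identity": (lambda mu: mu, lambda eta: eta),
    "logit": (lambda mu: math.log(mu / (1 - mu)),
              lambda eta: 1 / (1 + math.exp(-eta))),
    "probit": (_STD_NORMAL.inv_cdf, _STD_NORMAL.cdf),
    "log": (math.log, math.exp),
    "cloglog": (lambda mu: math.log(-math.log(1 - mu)),
                lambda eta: 1 - math.exp(-math.exp(eta))),
}

def power_link(lam):
    """Power link: mu**lam when lam != 0, log(mu) when lam == 0."""
    if lam == 0:
        return math.log, math.exp
    # The inverse below assumes a positive linear predictor.
    return (lambda mu: mu ** lam), (lambda eta: eta ** (1 / lam))

# Variance functions V(mu) for the distributions that have one listed.
VARIANCE = {
    "normal": lambda mu: 1.0,
    "binomial": lambda mu: mu * (1 - mu),
    "poisson": lambda mu: mu,
    "gamma": lambda mu: mu ** 2,
    "inverse_gaussian": lambda mu: mu ** 3,
    "geometric": lambda mu: mu + mu ** 2,
}

def negbin_variance(mu, k):
    """Negative binomial variance function with its extra parameter k."""
    return mu + k * mu ** 2

# Round trip: applying a link and then its inverse returns the original mean.
mu = 0.3
round_trips = {name: inv(g(mu)) for name, (g, inv) in LINKS.items()}
print(round_trips)
```

Note that the geometric variance function is the negative binomial one with $k = 1$, which the sketch makes explicit.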
Models for data with correlated responses fit by the GEE method are not available for zero-inflated distributions.

In addition, you can easily define your own link functions or distributions through DATA step programming statements used within the procedure.

An important aspect of generalized linear modeling is the selection of explanatory variables in the model. Changes in goodness-of-fit statistics are often used to evaluate the contribution of subsets of explanatory variables to a particular model. The deviance, defined to be twice the difference between the maximum attainable log likelihood and the log likelihood of the model under consideration, is often used as a measure of goodness of fit. The maximum attainable log likelihood is achieved with a model that has a parameter for every observation. See the section “Goodness of Fit” on page 3154 for formulas for the deviance.

One strategy for variable selection is to fit a sequence of models, beginning with a simple model with only an intercept term, and then to include one additional explanatory variable in each successive model. You can measure the importance of the additional explanatory variable by the difference in deviances or fitted log likelihoods between successive models. Asymptotic tests computed by the GENMOD procedure enable you to assess the statistical significance of the additional term.

The GENMOD procedure enables you to fit a sequence of models, up through a maximum number of terms specified in a MODEL statement. A table summarizes twice the difference in log likelihoods between each successive pair of models. This is called a Type 1 analysis in the GENMOD procedure, because it is analogous to Type I (sequential) sums of squares in the GLM procedure. As with the PROC GLM Type I sums of squares, the results from this process depend on the order in which the model terms are fit.

The GENMOD procedure also generates a Type 3 analysis analogous to Type III sums of squares in the GLM procedure. A Type 3 analysis does not depend on the order in which the terms for the model are specified. A GENMOD procedure Type 3 analysis consists of specifying a model and computing likelihood ratio statistics for Type III contrasts for each term in the model.
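The deviance and the sequential (Type 1) comparison described above can be sketched for a Poisson model. The sketch assumes a log link with a single two-level classification variable, because in that case the maximum likelihood fitted means are simply the group means and no iterative fit is needed. The data and all names are hypothetical.

```python
import math

def poisson_loglik(y, mu):
    """Poisson log likelihood: sum of y*log(mu) - mu - log(y!)."""
    return sum(yi * math.log(mi) - mi - math.lgamma(yi + 1)
               for yi, mi in zip(y, mu))

def poisson_deviance(y, mu):
    """Twice the gap between the maximum attainable and the model log likelihood."""
    # The maximum attainable log likelihood sets mu_i = y_i for every
    # observation; the y*log(y) term is taken as 0 when y = 0.
    ll_max = sum((yi * math.log(yi) if yi > 0 else 0.0) - yi - math.lgamma(yi + 1)
                 for yi in y)
    return 2.0 * (ll_max - poisson_loglik(y, mu))

# Hypothetical counts for a two-level classification variable.
group = ["a", "a", "a", "a", "b", "b", "b", "b"]
y = [2, 3, 1, 4, 7, 9, 6, 10]

# Model 1: intercept only, so every fitted mean is the overall mean.
overall = sum(y) / len(y)
mu_intercept = [overall] * len(y)

# Model 2: intercept plus the group effect, so each fitted mean is its group mean.
means = {g: sum(yi for yi, gi in zip(y, group) if gi == g) / group.count(g)
         for g in set(group)}
mu_group = [means[g] for g in group]

dev_intercept = poisson_deviance(y, mu_intercept)
dev_group = poisson_deviance(y, mu_group)

# Sequential statistic: the difference in deviances between successive
# models, which equals twice the difference in log likelihoods.
lr_stat = dev_intercept - dev_group

# Asymptotic chi-square p-value with 1 degree of freedom (one added
# parameter): P(chi2_1 > s) = erfc(sqrt(s / 2)).
p_value = math.erfc(math.sqrt(lr_stat / 2.0))
print(lr_stat, p_value)
```

Because the maximum attainable log likelihood cancels in the difference, the statistic depends only on the two fitted models. With more than one added term, the result would depend on the order in which the terms enter, as the text notes for a Type 1 analysis.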
The Type III contrasts are defined in the same way as they are in the GLM procedure. The GENMOD procedure optionally computes Wald statistics for Type III contrasts. This is computationally less expensive than likelihood ratio statistics, but it is thought to be less accurate because the specified significance level of hypothesis tests based on the Wald statistic might not be as close to the actual significance level as it is for likelihood ratio tests.

A Type 3 analysis generalizes the use of Type III estimable functions in linear models. Briefly, a Type III estimable function (contrast) for an effect is a linear function of the model parameters that involves the parameters of the effect and any interactions with that effect. A test of the hypothesis that the Type III contrast for a main effect is equal to 0 is intended to test the significance of the main effect in the presence of interactions. See Chapter 47, “The GLM Procedure,” and Chapter 15, “The Four Types of Estimable Functions,” for more information about Type III estimable functions. Also see Littell, Freund, and Spector (1991).

Additional features of the GENMOD procedure include the following:

• likelihood ratio statistics for user-defined contrasts (that is, linear functions of the parameters) and p-values based on their asymptotic chi-square distributions
• estimated values, standard errors, and confidence limits for user-defined contrasts and least squares means
• ability to create a SAS data set corresponding to most tables displayed by the procedure (see Table 45.12 and Table 45.13)
• confidence intervals for model parameters based on either the profile likelihood function or asymptotic normality
• syntax similar to that of PROC GLM for the specification of the response and model effects, including interaction terms and automatic coding of classification variables
• ability to fit GEE models for clustered response data
• ability to perform Bayesian analysis by Gibbs sampling

Getting Started: GENMOD Procedure

Poisson Regression

You can use the GENMOD procedure to fit a variety of statistical models.
A typical use of PROC GENMOD is to perform Poisson regression.

You can use the Poisson distribution to model the distribution of cell counts in a multiway contingency table. Aitkin et al. (1989) have used this method to model insurance claims data. Suppose the following hypothetical insurance claims data are classified by two factors: age group (with two levels) and car type (with three levels).