COPYRIGHT Abraham, B. and Ledolter, J. Introduction to Regression Modeling. Belmont, CA: Duxbury Press, 2006

Abraham Abraham˙C04 November 8, 2004 1:29

4 Multiple Linear Regression Model

4.1 INTRODUCTION

In this chapter we consider the general linear model introduced in Eq. (1.10),

$$y = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p + \epsilon \tag{4.1}$$

which links a response variable $y$ to several independent (also called explanatory or predictor) variables $x_1, x_2, \ldots, x_p$. We discuss how to estimate the model parameters $\boldsymbol{\beta} = (\beta_0, \beta_1, \ldots, \beta_p)'$ and how to test various hypotheses about them. You may find the subsequent discussion interesting from a theoretical standpoint because it uses linear algebra to establish general results. It also maps out an elegant geometric approach to least squares regression. Be prepared for subspaces, basis vectors, and orthogonal projections.

4.1.1 TWO EXAMPLES

In order to motivate the general notation, we start our discussion with two examples: the urea formaldehyde foam insulation (UFFI) example and the gas consumption data of Chapter 1.

UFFI Example

In Example 1.2.5 of Chapter 1, we considered 12 homes without UFFI ($x_1 = 0$) and 12 homes with insulation ($x_1 = 1$). For each home we obtained an air-tightness measure ($x_2$) and a reading of its ambient formaldehyde concentration ($y$). The model in Eq. (1.6) relates the ambient formaldehyde concentration ($y$) of the $i$th home to its air tightness ($x_2$) and the presence of UFFI ($x_1$):

$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \epsilon_i, \quad i = 1, 2, \ldots, 24 \tag{4.2}$$

Table 1.2 lists the information on the 12 houses without UFFI ($x_1 = 0$) first; the remaining 12 homes with UFFI ($x_1 = 1$) are listed second. Note that Chapter 1 uses $z$ and $x$ for the predictors $x_1$ and $x_2$. The 24 equations resulting from model (4.2),

$$\begin{aligned}
31.33 &= \beta_0 + \beta_1 \cdot 0 + \beta_2 \cdot 0 + \epsilon_1 \\
28.57 &= \beta_0 + \beta_1 \cdot 0 + \beta_2 \cdot 1 + \epsilon_2 \\
&\;\;\vdots \\
56.67 &= \beta_0 + \beta_1 \cdot 0 + \beta_2 \cdot 9 + \epsilon_{12} \\
43.58 &= \beta_0 + \beta_1 \cdot 1 + \beta_2 \cdot 1 + \epsilon_{13} \\
&\;\;\vdots \\
70.34 &= \beta_0 + \beta_1 \cdot 1 + \beta_2 \cdot 10 + \epsilon_{24}
\end{aligned}$$

can be written in vector form,

$$\begin{bmatrix} 31.33 \\ 28.57 \\ \vdots \\ 56.67 \\ 43.58 \\ \vdots \\ 70.34 \end{bmatrix}
= \begin{bmatrix} 1 & 0 & 0 \\ 1 & 0 & 1 \\ \vdots & \vdots & \vdots \\ 1 & 0 & 9 \\ 1 & 1 & 1 \\ \vdots & \vdots & \vdots \\ 1 & 1 & 10 \end{bmatrix}
\begin{bmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \end{bmatrix}
+ \begin{bmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_{12} \\ \epsilon_{13} \\ \vdots \\ \epsilon_{24} \end{bmatrix}$$

In short,

$$\mathbf{y} = X\boldsymbol{\beta} + \boldsymbol{\epsilon} \tag{4.3}$$

where

$$\mathbf{y} = \begin{bmatrix} 31.33 \\ 28.57 \\ \vdots \\ 70.34 \end{bmatrix}; \quad
X = \begin{bmatrix} 1 & 0 & 0 \\ 1 & 0 & 1 \\ \vdots & \vdots & \vdots \\ 1 & 0 & 9 \\ 1 & 1 & 1 \\ \vdots & \vdots & \vdots \\ 1 & 1 & 10 \end{bmatrix}; \quad
\boldsymbol{\beta} = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \end{bmatrix}; \quad \text{and} \quad
\boldsymbol{\epsilon} = \begin{bmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_{24} \end{bmatrix}$$

Gas Consumption Data

In Example 1.2.7 of Chapter 1 we relate the fuel efficiency of each of 38 cars to its weight, engine displacement, and number of cylinders. Consider the model

$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i3} + \epsilon_i, \quad i = 1, 2, \ldots, 38 \tag{4.4}$$

where

$y_i$ = gas consumption (miles per gallon) for the $i$th car
$x_{i1}$ = weight of the $i$th car
$x_{i2}$ = engine displacement for the $i$th car
$x_{i3}$ = number of cylinders for the $i$th car

The resulting 38 equations

$$\begin{aligned}
16.9 &= \beta_0 + \beta_1 \cdot 4.360 + \beta_2 \cdot 350 + \beta_3 \cdot 8 + \epsilon_1 \\
15.5 &= \beta_0 + \beta_1 \cdot 4.054 + \beta_2 \cdot 351 + \beta_3 \cdot 8 + \epsilon_2 \\
&\;\;\vdots \\
31.9 &= \beta_0 + \beta_1 \cdot 1.925 + \beta_2 \cdot 89 + \beta_3 \cdot 4 + \epsilon_{38}
\end{aligned}$$

can be written in vector form as

$$\begin{bmatrix} 16.9 \\ 15.5 \\ \vdots \\ 31.9 \end{bmatrix}
= \begin{bmatrix} 1 & 4.360 & 350 & 8 \\ 1 & 4.054 & 351 & 8 \\ \vdots & \vdots & \vdots & \vdots \\ 1 & 1.925 & 89 & 4 \end{bmatrix}
\begin{bmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \\ \beta_3 \end{bmatrix}
+ \begin{bmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_{38} \end{bmatrix}$$

In short, $\mathbf{y} = X\boldsymbol{\beta} + \boldsymbol{\epsilon}$, where

$$\mathbf{y} = \begin{bmatrix} 16.9 \\ 15.5 \\ \vdots \\ 31.9 \end{bmatrix}; \quad
X = \begin{bmatrix} 1 & 4.360 & 350 & 8 \\ 1 & 4.054 & 351 & 8 \\ \vdots & \vdots & \vdots & \vdots \\ 1 & 1.925 & 89 & 4 \end{bmatrix}; \quad
\boldsymbol{\beta} = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \\ \beta_3 \end{bmatrix}; \quad \text{and} \quad
\boldsymbol{\epsilon} = \begin{bmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_{38} \end{bmatrix} \tag{4.5}$$

4.1.2 THE GENERAL LINEAR MODEL

These two examples show us how we can write the general linear model (4.1) in vector form. Suppose that we have information on $n$ cases, or subjects $i = 1, 2, \ldots, n$. Let $y_i$ be the observed value on the response variable and let $x_{i1}, x_{i2}, \ldots, x_{ip}$ be the values on the independent or predictor variables of the $i$th case. The values of the $p$ predictor variables are treated as fixed constants; however, the responses are subject to variability. The model for the response of case $i$ is written as

$$y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip} + \epsilon_i = \mu_i + \epsilon_i \tag{4.6}$$

where $\mu_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip}$ is a deterministic component that is affected by the regressor variables and $\epsilon_i$ is a term that captures the effect of all other variables that are not included in the model.

We assume that $\epsilon_i$ is a random variable with mean $E(\epsilon_i) = 0$ and variance $V(\epsilon_i) = \sigma^2$, and we suppose that the $\epsilon_i$ are normally distributed. Furthermore, we assume that the errors from different cases, $\epsilon_1, \ldots, \epsilon_n$, are independent random variables. These assumptions imply that the responses $y_1, \ldots, y_n$ are independent normal random variables with mean $E(y_i) = \mu_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip}$ and variance $V(y_i) = \sigma^2$.

We assume that the variance $V(y_i)$ is the same for each case. Note that this is an assumption that needs to be checked, just as one needs to check all other model assumptions, such as the form of the deterministic relationship and the normal distribution of the errors.

The $n$ equations in (4.6) can be rewritten in vector form,

$$\begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix}
= \begin{bmatrix} \beta_0 + \beta_1 x_{11} + \cdots + \beta_p x_{1p} \\ \vdots \\ \beta_0 + \beta_1 x_{n1} + \cdots + \beta_p x_{np} \end{bmatrix}
+ \begin{bmatrix} \epsilon_1 \\ \vdots \\ \epsilon_n \end{bmatrix}$$

In short,

$$\mathbf{y} = X\boldsymbol{\beta} + \boldsymbol{\epsilon} \tag{4.7}$$

where

$$\mathbf{y} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}; \quad
X = \begin{bmatrix} 1 & x_{11} & x_{12} & \cdots & x_{1p} \\ 1 & x_{21} & x_{22} & \cdots & x_{2p} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_{n1} & x_{n2} & \cdots & x_{np} \end{bmatrix}; \quad
\boldsymbol{\beta} = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_p \end{bmatrix}; \quad \text{and} \quad
\boldsymbol{\epsilon} = \begin{bmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_n \end{bmatrix}$$

You should convince yourself that this representation is correct by multiplying out the first few elements of $X\boldsymbol{\beta}$.

The assumptions on the errors in this model can also be written in vector form. We write $\boldsymbol{\epsilon} \sim N(\mathbf{0}, \sigma^2 I)$, a multivariate normal distribution with mean vector $E(\boldsymbol{\epsilon}) = \mathbf{0}$ and covariance matrix $V(\boldsymbol{\epsilon}) = \sigma^2 I$.
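Similarly, we write $\mathbf{y} \sim N(X\boldsymbol{\beta}, \sigma^2 I)$, a multivariate normal distribution with mean vector $E(\mathbf{y}) = X\boldsymbol{\beta}$ and covariance matrix $V(\mathbf{y}) = \sigma^2 I$.

4.2 ESTIMATION OF THE MODEL

We now consider the estimation of the unknown parameters: the $(p+1)$ regression parameters $\boldsymbol{\beta}$, and the variance of the errors $\sigma^2$. Since $y_i \sim N(\mu_i, \sigma^2)$ with $\mu_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip}$ are independent, it is straightforward to write down the joint probability density $p(y_1, \ldots, y_n \mid \boldsymbol{\beta}, \sigma^2)$. Treating this, for given data $\mathbf{y}$, as a function of the parameters leads to the likelihood function

$$L(\boldsymbol{\beta}, \sigma^2 \mid y_1, \ldots, y_n) = \left( \frac{1}{\sqrt{2\pi}\,\sigma} \right)^{n} \exp\left\{ -\sum_{i=1}^{n} (y_i - \mu_i)^2 / 2\sigma^2 \right\} \tag{4.8}$$

Maximizing the likelihood function $L$ with respect to $\boldsymbol{\beta}$ is equivalent to minimizing $S(\boldsymbol{\beta}) = \sum_{i=1}^{n} (y_i - \mu_i)^2$ with respect to $\boldsymbol{\beta}$. This is because the exponent in Eq. (4.8) is the only term containing $\boldsymbol{\beta}$. The sum of squares $S(\boldsymbol{\beta})$ can be written in vector notation,

$$S(\boldsymbol{\beta}) = (\mathbf{y} - \boldsymbol{\mu})'(\mathbf{y} - \boldsymbol{\mu}) = (\mathbf{y} - X\boldsymbol{\beta})'(\mathbf{y} - X\boldsymbol{\beta}), \quad \text{since } \boldsymbol{\mu} = X\boldsymbol{\beta} \tag{4.9}$$

The minimization of $S(\boldsymbol{\beta})$ with respect to $\boldsymbol{\beta}$ is known as least squares estimation, and for normal errors it is equivalent to maximum likelihood estimation. We determine the least squares estimates by obtaining the first derivatives of $S(\boldsymbol{\beta})$ with respect to the parameters $\beta_0, \beta_1, \ldots, \beta_p$, and by setting these $(p+1)$ derivatives equal to zero.

The appendix shows that this leads to the $(p+1)$ equations

$$X'X\hat{\boldsymbol{\beta}} = X'\mathbf{y} \tag{4.10}$$

These equations are referred to as the normal equations. The matrix $X$ is assumed to have full column rank $p+1$. Hence, the $(p+1) \times (p+1)$ matrix $X'X$ is nonsingular and the solution of Eq. (4.10) is given by

$$\hat{\boldsymbol{\beta}} = (X'X)^{-1}X'\mathbf{y} \tag{4.11}$$

The estimate $\hat{\boldsymbol{\beta}}$ in Eq. (4.11) minimizes $S(\boldsymbol{\beta})$, and is known as the least squares estimate (LSE) of $\boldsymbol{\beta}$.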
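The normal equations can be tried out numerically. The sketch below is only an illustration under stated assumptions: it uses the five UFFI cases whose values are quoted in the text (cases 1, 2, 12, 13, and 24) rather than the full data of Table 1.2, so the resulting estimates are not those of the complete 24-home analysis. The variable names and the helper `sum_of_squares` are hypothetical, and NumPy is assumed to be available.

```python
import numpy as np

# Five of the 24 UFFI cases quoted in the text (cases 1, 2, 12, 13, 24).
# This subset is for illustration only; it is NOT the full data of Table 1.2.
y = np.array([31.33, 28.57, 56.67, 43.58, 70.34])
uffi = np.array([0.0, 0.0, 0.0, 1.0, 1.0])        # x1: presence of UFFI
airtight = np.array([0.0, 1.0, 9.0, 1.0, 10.0])   # x2: air tightness

# Design matrix X = [1, x1, x2]; the first column carries the intercept.
X = np.column_stack([np.ones_like(y), uffi, airtight])


def sum_of_squares(beta):
    """S(beta) = (y - X beta)'(y - X beta), as in Eq. (4.9)."""
    r = y - X @ beta
    return float(r @ r)


# Solve the normal equations X'X beta_hat = X'y of Eq. (4.10).
# solve() avoids forming the explicit inverse that appears in Eq. (4.11).
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Cross-check against NumPy's general least squares routine.
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print("beta_hat:", beta_hat)
print("agrees with lstsq:", np.allclose(beta_hat, beta_lstsq))
print("S(beta_hat):", sum_of_squares(beta_hat))
```

Solving the linear system directly rather than inverting $X'X$ is a common numerical choice; both give the same $\hat{\boldsymbol{\beta}}$ when $X$ has full column rank.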
4.2.1 A GEOMETRIC INTERPRETATION OF LEAST SQUARES

The model in Eq. (4.7) can be written as

$$\mathbf{y} = \beta_0 \mathbf{1} + \beta_1 \mathbf{x}_1 + \cdots + \beta_p \mathbf{x}_p + \boldsymbol{\epsilon} = \boldsymbol{\mu} + \boldsymbol{\epsilon} \tag{4.12}$$

where the $(n \times 1)$ vectors $\mathbf{y}$ and $\boldsymbol{\epsilon}$ are as defined before, and the $(n \times 1)$ vectors $\mathbf{1} = (1, 1, \ldots, 1)'$ and $\mathbf{x}_j = (x_{1j}, x_{2j}, \ldots, x_{nj})'$, for $j = 1, 2, \ldots, p$, represent the columns of the matrix $X$. Thus, $X = (\mathbf{1}, \mathbf{x}_1, \ldots, \mathbf{x}_p)$ and $\boldsymbol{\mu} = X\boldsymbol{\beta} = \beta_0 \mathbf{1} + \beta_1 \mathbf{x}_1 + \cdots + \beta_p \mathbf{x}_p$.

The representation in Eq. (4.12) shows that the deterministic component $\boldsymbol{\mu}$ is a linear combination of the vectors $\mathbf{1}, \mathbf{x}_1, \ldots, \mathbf{x}_p$. Let $L(\mathbf{1}, \mathbf{x}_1, \ldots, \mathbf{x}_p)$ be the set of all linear combinations of these vectors. If we assume that these vectors are not linearly dependent, $L(X) = L(\mathbf{1}, \mathbf{x}_1, \ldots, \mathbf{x}_p)$ is a subspace of $R^n$ of dimension $p+1$. Note that the assumption that $\mathbf{1}, \mathbf{x}_1, \ldots, \mathbf{x}_p$ are not linearly dependent is the same as saying that $X$ has rank $p+1$.

We want to explain these concepts slowly because they are essential for understanding the geometric interpretation that follows. First, note that the dimension of the regressor vectors $\mathbf{1}, \mathbf{x}_1, \ldots, \mathbf{x}_p$ is $n$, the number of cases. When we display the $(p+1)$ regressor vectors, we do that in $n$-dimensional Euclidean space $R^n$. The coordinates on each regressor vector correspond to the regressor's values on the $n$ cases. For example, the regressor vector $\mathbf{x}$ may represent the air tightness of a home, and the dimension of this vector is 24, if measurements on 24 homes are taken. Note that for models with an intercept, one of the regressor columns is always the vector of ones, $\mathbf{1}$.

FIGURE 4.1 Two Vectors in Three-Dimensional Space, and the Two-Dimensional Space Spanned by These Two Vectors [axes: Dimension 1, Dimension 2, Dimension 3; labeled points: (0, 0, 0), (1, 1, 1), (−0.3, 0.5, 0.7)]

Obviously, it is impossible to graph vectors in 24-dimensional space, but you can get a good idea of this by considering lower dimensional situations. Consider the case in which $n = 3$, and use two regressor columns: the unit vector $\mathbf{1} = (1, 1, 1)'$ and $\mathbf{x} = (-0.3, 0.5, 0.7)'$. These two vectors are graphed in three-dimensional space in Figure 4.1.
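Any linear combination of these two vectors results in a vector that lies in the two-dimensional space that is spanned by the vectors $\mathbf{1}$ and $\mathbf{x}$. We highlight this by shading the plane that contains all linear combinations. We see that $L(\mathbf{1}, \mathbf{x})$ is a subspace of $R^3$, and its dimension is 2.

Observe that we have selected two vectors $\mathbf{1}$ and $\mathbf{x}$ that are not linearly dependent. This means that one of the two vectors cannot be written as a multiple of the other. This is the case in our example. Note that the matrix

$$X = [\mathbf{1}, \mathbf{x}] = \begin{bmatrix} 1 & -0.3 \\ 1 & 0.5 \\ 1 & 0.7 \end{bmatrix}$$

has full column rank, 2.

What would happen if two regressor vectors were linearly dependent; for example, if $\mathbf{1} = (1, 1, 1)'$ and $\mathbf{x} = (0.5, 0.5, 0.5)'$? Here, every linear combination of $\mathbf{1}$ and $\mathbf{x}$, $\alpha_1 \mathbf{1} + \alpha_2 \mathbf{x} = \alpha_1 \mathbf{1} + \alpha_2 (0.5) \mathbf{1} = (\alpha_1 + 0.5\alpha_2)\mathbf{1}$, is a multiple of $\mathbf{1}$. Hence, the set of all linear combinations consists of the points along the unit vector, and $L(\mathbf{1}, \mathbf{x})$ defines a subspace of dimension 1. You can also see this from the rank of the matrix $X$: the rank of

$$X = [\mathbf{1}, \mathbf{x}] = \begin{bmatrix} 1 & 0.5 \\ 1 & 0.5 \\ 1 & 0.5 \end{bmatrix}$$

is one; $X$ does not have full column rank.

If we contemplate a model with two regressor columns, $\mathbf{1}$ and $\mathbf{x}$, then we suppose that $\mathbf{1}$ and $\mathbf{x}$ are not linearly dependent. If they were linearly dependent, we would encounter difficulties because an infinite number of linear combinations could be used to represent each point in the subspace spanned by $\mathbf{1}$ and $\mathbf{x}$. You can see this from our example. There is an infinite number of values for $\alpha_1$ and $\alpha_2$ that result in a given value $\alpha_1 + 0.5\alpha_2 = c$.

FIGURE 4.2 Geometric Representation of the Response Vector $\mathbf{y}$ and the Subspace $L(X)$ [labels: $\mathbf{y}$, $L(X)$, $\boldsymbol{\mu}$]

Now we are ready to go to the more general case with a large number of cases, $n$. Suppose that there are two regressors ($p = 2$) and three regressor columns $\mathbf{1}$, $\mathbf{x}_1$, and $\mathbf{x}_2$. We assume that these three columns are not linearly dependent and that the matrix $X = [\mathbf{1}, \mathbf{x}_1, \mathbf{x}_2]$ has full column rank, rank 3. The regressor vectors are elements in $R^n$, and the set of all linear combinations of $\mathbf{1}, \mathbf{x}_1, \mathbf{x}_2$, $L(\mathbf{1}, \mathbf{x}_1, \mathbf{x}_2)$, defines a three-dimensional subspace of $R^n$. If $\mathbf{1}, \mathbf{x}_1, \mathbf{x}_2$ were linearly dependent, then the subspace would be of lower dimension (either 2 or 1).

Now we consider the case with $p$ regressors shown in Figure 4.2. The oval represents the subspace $L(X)$. The vector $\boldsymbol{\mu} = \beta_0 \mathbf{1} + \beta_1 \mathbf{x}_1 + \cdots + \beta_p \mathbf{x}_p$ is a linear combination of $\mathbf{1}, \mathbf{x}_1, \ldots, \mathbf{x}_p$, and is part of the subspace $L(X)$. This picture is simplified as it tries to illustrate a higher dimensional space. You need to use your imagination.

Until now, we have talked about the subspace of $R^n$ that is spanned by the $p+1$ regressor vectors $\mathbf{1}, \mathbf{x}_1, \ldots, \mathbf{x}_p$. Next, let us add the $(n \times 1)$ response vector $\mathbf{y}$ to the picture (see Figure 4.2). The response vector $\mathbf{y}$ is generally not part of the subspace $L(X)$. For a given value of $\boldsymbol{\beta}$, $X\boldsymbol{\beta}$ is a vector in the subspace; $\mathbf{y} - X\boldsymbol{\beta}$ is the difference between the response vector $\mathbf{y}$ and the vector in the subspace, and $S(\boldsymbol{\beta}) = (\mathbf{y} - X\boldsymbol{\beta})'(\mathbf{y} - X\boldsymbol{\beta})$ represents the squared length of this difference. Minimizing $S(\boldsymbol{\beta})$ with respect to $\boldsymbol{\beta}$ corresponds to finding $\hat{\boldsymbol{\beta}}$ so that $\mathbf{y} - X\hat{\boldsymbol{\beta}}$ has minimum length.

In other words, we must find a vector $X\hat{\boldsymbol{\beta}}$ in the subspace $L(X)$ that is "closest" to $\mathbf{y}$. The vector in the subspace $L(X)$ that is closest to $\mathbf{y}$ is obtained by making the difference $\mathbf{y} - X\hat{\boldsymbol{\beta}}$ perpendicular to the subspace $L(X)$; see Figure 4.3. Since $\mathbf{1}, \mathbf{x}_1, \ldots, \mathbf{x}_p$ are in the subspace, we require that $\mathbf{y} - X\hat{\boldsymbol{\beta}}$ is perpendicular to $\mathbf{1}$, $\mathbf{x}_1, \ldots$, and $\mathbf{x}_p$.

This implies the equations

$$\begin{aligned}
\mathbf{1}'(\mathbf{y} - X\hat{\boldsymbol{\beta}}) &= 0 \\
\mathbf{x}_1'(\mathbf{y} - X\hat{\boldsymbol{\beta}}) &= 0 \\
&\;\;\vdots \\
\mathbf{x}_p'(\mathbf{y} - X\hat{\boldsymbol{\beta}}) &= 0
\end{aligned}$$

Combining these $p+1$ equations leads to

$$X'(\mathbf{y} - X\hat{\boldsymbol{\beta}}) = \mathbf{0}$$

and

$$X'X\hat{\boldsymbol{\beta}} = X'\mathbf{y}$$

the normal equations in Eq. (4.10) that we previously derived algebraically.

FIGURE 4.3 A Geometric View of Least Squares [labels: $\mathbf{y}$, $\mathbf{y} - X\hat{\boldsymbol{\beta}}$, $L(X)$, $X\hat{\boldsymbol{\beta}}$]

We assume that $X$ has full column rank, $p+1$. Hence, $X'X$ has rank $(p+1)$, the inverse $(X'X)^{-1}$ exists, and the least squares estimate is given by $\hat{\boldsymbol{\beta}} = (X'X)^{-1}X'\mathbf{y}$. Notice that we have obtained the LSE solely through a geometric argument; no algebraic derivation was involved.

The vector of fitted values is given by $\hat{\boldsymbol{\mu}} = X\hat{\boldsymbol{\beta}}$, and the vector of residuals is $\mathbf{e} = \mathbf{y} - \hat{\boldsymbol{\mu}} = \mathbf{y} - X\hat{\boldsymbol{\beta}}$. The geometric interpretation of least squares is quite simple. Least squares estimation amounts to finding the vector $\hat{\boldsymbol{\mu}} = X\hat{\boldsymbol{\beta}}$ in the subspace $L(X)$ that is closest to the observation vector $\mathbf{y}$. This requires that the difference (i.e., the residual vector) is perpendicular (or orthogonal) to the subspace $L(X)$. Hence, the vector of fitted values $\hat{\boldsymbol{\mu}} = X\hat{\boldsymbol{\beta}}$ is the orthogonal projection of $\mathbf{y}$ onto the subspace $L(X)$. In algebraic terms,

$$\hat{\boldsymbol{\mu}} = X\hat{\boldsymbol{\beta}} = X(X'X)^{-1}X'\mathbf{y} = H\mathbf{y}$$

where $H = X(X'X)^{-1}X'$ is an $n \times n$ symmetric and idempotent matrix. It is easy to confirm that $H$ is idempotent as

$$HH = X(X'X)^{-1}X'X(X'X)^{-1}X' = H$$

The matrix $H$ is an important matrix because it represents the orthogonal projection of $\mathbf{y}$ onto $L(X)$. It is referred to as the "hat" matrix.

The vector of residuals $\mathbf{e} = \mathbf{y} - \hat{\boldsymbol{\mu}} = \mathbf{y} - X(X'X)^{-1}X'\mathbf{y} = (I - H)\mathbf{y}$ is also a projection of $\mathbf{y}$, this time on the subspace of $R^n$ that is perpendicular to $L(X)$.

The vector of fitted values $\hat{\boldsymbol{\mu}} = X\hat{\boldsymbol{\beta}}$ and the vector of residuals $\mathbf{e}$ are orthogonal, which means algebraically that

$$X'\mathbf{e} = X'(\mathbf{y} - X\hat{\boldsymbol{\beta}}) = \mathbf{0}$$

See the normal equations in Eq. (4.10). Hence, least squares decomposes the response vector

$$\mathbf{y} = \hat{\boldsymbol{\mu}} + \mathbf{e} = X\hat{\boldsymbol{\beta}} + (\mathbf{y} - X\hat{\boldsymbol{\beta}})$$

into two orthogonal pieces. The vector of fitted values $X\hat{\boldsymbol{\beta}}$ is in $L(X)$, whereas the vector of residuals $\mathbf{y} - X\hat{\boldsymbol{\beta}}$ is in the space orthogonal to $L(X)$.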
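The projection properties described above can be checked numerically. The following sketch uses the two regressor columns of the three-case example, $\mathbf{1} = (1, 1, 1)'$ and $\mathbf{x} = (-0.3, 0.5, 0.7)'$; the response vector `y` is a hypothetical choice made only for illustration, and NumPy is assumed to be available.

```python
import numpy as np

# Regressor columns from the three-case example in the text (Figure 4.1).
ones = np.ones(3)
x = np.array([-0.3, 0.5, 0.7])
X = np.column_stack([ones, x])

# Hypothetical response vector, chosen only for illustration.
y = np.array([1.0, 2.0, 4.0])

# Hat matrix H = X (X'X)^{-1} X'; solve() stands in for the explicit inverse.
H = X @ np.linalg.solve(X.T @ X, X.T)

fitted = H @ y                  # mu_hat: projection of y onto L(X)
resid = (np.eye(3) - H) @ y     # e: projection onto the space orthogonal to L(X)

print("rank of X:", np.linalg.matrix_rank(X))
print("H symmetric:", np.allclose(H, H.T))
print("H idempotent:", np.allclose(H @ H, H))
print("X'e:", X.T @ resid)
print("fitted'e:", fitted @ resid)

# Linearly dependent columns from the text: x = 0.5 * 1 gives rank 1,
# so X'X is singular and (X'X)^{-1} does not exist.
X_dep = np.column_stack([ones, 0.5 * ones])
print("rank of dependent X:", np.linalg.matrix_rank(X_dep))
```

Up to floating-point rounding, `X.T @ resid` and `fitted @ resid` should be zero, which is the orthogonality expressed by the normal equations.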
It may help you to look at this in the very simplest special case in which we have $n = 2$ cases and just a single regressor column, $\mathbf{1} = (1, 1)'$. This represents the "mean" regression model, $y_i = \beta_0 + \epsilon_i$, with $i = 1, 2$. How does this look geometrically? Since the number of cases is 2, we are looking at the two-dimensional Euclidean space. Draw in the unit vector $\mathbf{1} = (1, 1)'$ and the response vector $\mathbf{y} = (y_1, y_2)'$. For illustration, take $\mathbf{y} = (0, 1)'$. We project $\mathbf{y} = (y_1, y_2)' = (0, 1)'$ onto the subspace $L(\mathbf{1})$, which is the 45-degree line in the two-dimensional Euclidean space. The projection leads to the vector of fitted values $\hat{\boldsymbol{\mu}} = 0.5 \cdot \mathbf{1} = (0.5, 0.5)'$ and the LSE $\hat{\beta}_0 = 0.5$. The estimate is the average of the two observations, 0 and 1. The residual vector $\mathbf{e} = \mathbf{y} - \hat{\boldsymbol{\mu}} = (0 - 0.5, 1 - 0.5)' = (-0.5, 0.5)'$ and the vector of fitted values $\hat{\boldsymbol{\mu}} = (0.5, 0.5)'$ are orthogonal; that is, $\mathbf{e}'\hat{\boldsymbol{\mu}} = -(0.5)^2 + (0.5)^2 = 0$.

4.2.2 USEFUL PROPERTIES OF ESTIMATES AND OTHER RELATED VECTORS

Recall our model

$$\mathbf{y} = X\boldsymbol{\beta} + \boldsymbol{\epsilon}$$

where $X$ is a fixed (nonrandom) matrix with full rank, and the random error $\boldsymbol{\epsilon}$ follows a distribution with mean $E(\boldsymbol{\epsilon}) = \mathbf{0}$ and covariance matrix $V(\boldsymbol{\epsilon}) = \sigma^2 I$. Usually, we also assume a normal distribution. The model implies that $E(\mathbf{y}) = X\boldsymbol{\beta}$ and $V(\mathbf{y}) = \sigma^2 I$. The LSE of the parameter vector $\boldsymbol{\beta}$ is $\hat{\boldsymbol{\beta}} = (X'X)^{-1}X'\mathbf{y}$. The vector of fitted values is $\hat{\boldsymbol{\mu}} = X\hat{\boldsymbol{\beta}} = H\mathbf{y}$ and the residual vector is $\mathbf{e} = \mathbf{y} - \hat{\boldsymbol{\mu}} = (I - H)\mathbf{y}$. We now study properties of these vectors and other related quantities, always assuming that the model is true.

i. Estimate $\hat{\boldsymbol{\beta}}$:

$$E(\hat{\boldsymbol{\beta}}) = E[(X'X)^{-1}X'\mathbf{y}] = (X'X)^{-1}X'E(\mathbf{y}) = (X'X)^{-1}X'X\boldsymbol{\beta} = \boldsymbol{\beta} \tag{4.13}$$

showing that $\hat{\boldsymbol{\beta}}$ is an unbiased estimator of $\boldsymbol{\beta}$.
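$$\begin{aligned}
V(\hat{\boldsymbol{\beta}}) &= V[(X'X)^{-1}X'\mathbf{y}] \\
&= (X'X)^{-1}X'V(\mathbf{y})X(X'X)^{-1} \\
&= (X'X)^{-1}X'(\sigma^2 I)X(X'X)^{-1} \\
&= (X'X)^{-1}X'X(X'X)^{-1}\sigma^2 = (X'X)^{-1}\sigma^2
\end{aligned} \tag{4.14}$$

The matrix in Eq. (4.14) contains the variances of the estimates in the diagonal and the covariances in the off-diagonal elements. Let $v_{ij}$ denote the elements of the matrix $(X'X)^{-1}$. Then $V(\hat{\beta}_i) = \sigma^2 v_{ii}$, $\mathrm{Cov}(\hat{\beta}_i, \hat{\beta}_j) = \sigma^2 v_{ij}$, and

$$\mathrm{Corr}(\hat{\beta}_i, \hat{\beta}_j) = \frac{v_{ij}}{(v_{ii} v_{jj})^{1/2}}$$

ii. Linear combination of estimates, $\mathbf{a}'\hat{\boldsymbol{\beta}}$:

The linear combination $\mathbf{a}'\boldsymbol{\beta}$, where $\mathbf{a}$ is a vector of constants of appropriate dimension, can be estimated by $\mathbf{a}'\hat{\boldsymbol{\beta}}$. We find

$$E(\mathbf{a}'\hat{\boldsymbol{\beta}}) = \mathbf{a}'E(\hat{\boldsymbol{\beta}}) = \mathbf{a}'\boldsymbol{\beta} \quad \text{and} \quad V(\mathbf{a}'\hat{\boldsymbol{\beta}}) = \mathbf{a}'V(\hat{\boldsymbol{\beta}})\mathbf{a} = \mathbf{a}'(X'X)^{-1}\mathbf{a}\,\sigma^2 \tag{4.15}$$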
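Properties (4.13) through (4.15) can be checked by simulation. The sketch below is an illustration under stated assumptions, not part of the original analysis: the design matrix is built from the five UFFI cases quoted in Section 4.1.1, while the "true" values `beta` and `sigma`, the vector `a`, the seed, and the number of replicates are all hypothetical. NumPy is assumed to be available.

```python
import numpy as np

# Design matrix built from the five UFFI cases quoted in the text
# (an illustrative subset, not the full Table 1.2).
X = np.column_stack([
    np.ones(5),
    [0.0, 0.0, 0.0, 1.0, 1.0],    # x1: presence of UFFI
    [0.0, 1.0, 9.0, 1.0, 10.0],   # x2: air tightness
])

# Hypothetical "true" parameter values, used only to simulate responses.
beta = np.array([30.0, 10.0, 3.0])
sigma = 2.0

# Theory: V(beta_hat) = sigma^2 (X'X)^{-1}, Eq. (4.14).
XtX_inv = np.linalg.inv(X.T @ X)
V_theory = sigma**2 * XtX_inv
sd = np.sqrt(np.diag(V_theory))
corr_theory = V_theory / np.outer(sd, sd)   # v_ij / sqrt(v_ii v_jj)

# Theory: V(a'beta_hat) = sigma^2 a'(X'X)^{-1} a, Eq. (4.15).
a = np.array([1.0, 1.0, 5.0])   # hypothetical linear combination
var_a_theory = float(a @ V_theory @ a)

# Simulation: redraw normal errors many times and re-estimate beta each time.
rng = np.random.default_rng(0)
n_rep = 20_000
Y = X @ beta + rng.normal(0.0, sigma, size=(n_rep, X.shape[0]))
beta_hats = np.linalg.solve(X.T @ X, X.T @ Y.T).T   # one row per replicate

print("mean of beta_hat:", beta_hats.mean(axis=0))
print("theoretical V(beta_hat):\n", V_theory)
print("simulated V(beta_hat):\n", np.cov(beta_hats, rowvar=False))
print("V(a'beta_hat), theory:", var_a_theory)
print("V(a'beta_hat), simulated:", (beta_hats @ a).var(ddof=1))
```

With enough replicates, the simulated mean of $\hat{\boldsymbol{\beta}}$ should lie close to `beta` and the simulated covariance matrix close to $\sigma^2 (X'X)^{-1}$; the agreement is approximate because of sampling variability.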

