Smoothing Spline ANOVA for Multivariate Bernoulli
Observations, With Application to
Ophthalmology Data
Fangyu Gao, Grace Wahba, Ronald Klein, and Barbara Klein

Fangyu Gao is Senior Economist, Freddie Mac, McLean, VA 22102 (E-mail: fangyu_gao@freddiemac.com). Grace Wahba is Bascom Professor, Department of Statistics, University of Wisconsin, Madison, WI 53706 (E-mail: wahba@stat.wisc.edu). Ronald Klein, MD, is Professor, and Barbara Klein, MD, is Professor, Department of Ophthalmology, University of Wisconsin, Madison, WI 53705 (E-mail: kleinr@epi.ophth.wisc.edu; kleinb@epi.ophth.wisc.edu).
We combine a smoothing spline analysis of variance (SS-ANOVA) model and a log-linear model to build a partly flexible model for multivariate Bernoulli data. The joint distribution conditioning on the predictor variables is estimated. The log odds ratio is used to measure the association between outcome variables. A numerical scheme based on the block one-step successive over relaxation SOR–Newton-Raphson algorithm is proposed to obtain an approximate solution for the variational problem. We extend the generalized approximate cross validation (GACV) and the randomized GACV for choosing smoothing parameters to the case of multivariate Bernoulli responses. The randomized version is fast and stable to compute and is used to adaptively select smoothing parameters in each block one-step SOR iteration. Approximate Bayesian confidence intervals are obtained for the flexible estimates of the conditional logit functions. Simulation studies are conducted to check the performance of the proposed method, using the comparative Kullback–Leibler distance as a yardstick. Finally, the model is applied to two-eye observational data from the Beaver Dam Eye Study, to examine the association of pigmentary abnormalities and various covariates.

KEYWORDS: Log-linear models; Multivariate responses; Odds ratio; Penalized likelihood; Repeated measurements; Representers; Reproducing kernel Hilbert space; Risk factor estimation; Semiparametric regression; Smoothing spline analysis of variance.
1. INTRODUCTION

Correlated Bernoulli outcomes may come from many applications. One motivation of this study is to develop a flexible model for analyzing typical data from ophthalmological studies. Usually, paired observations are available for both eyes of the same person. Both person-specific and eye-specific covariates may be available as predictor variables. The outcomes for the same person are expected to be correlated even after adjustment for the available predictor variables. This association reflects the consequence of unmeasured or unmeasurable genetic, behavioral, or other risk factors. Other examples involving correlated outcomes include two-period cross-over designs (Jones and Kenward 1989), twin studies (Cessie and Houwelingen 1994), and typical longitudinal studies (Diggle, Liang, and Zeger 1994). Sometimes it is also of interest to model several closely related endpoints simultaneously. For example, Liang, Zeger, and Qaqish (1992) considered two endpoints from the Indonesian Children's Study, respiratory and diarrheal infections, in the same model.

We are interested in finding the relation between the outcome variables and the predictor variables, including conditional correlations between the outcome variables. Due to the complexity of biological processes, linear parametric assumptions on some scale, or even quadratic or cubic assumptions, might not be adequate. When such an assumption is far away from the truth, the results obtained under it may even be misleading. Hence we are interested in building a flexible statistical model. A nonparametric model of the type considered in this article can also serve as an automated diagnostic tool for parametric fitting. The model should also have readily interpretable results for multivariate function estimates and a reasonable assessment of accuracy after the model has been fitted. This property is especially important for medical researchers, as the investigators are usually interested in understanding the cause of certain outcomes.

It is not enough to simply estimate the marginal distribution separately for the individual outcome variables. The dependence structure can be useful for the efficient estimation of the mean values, or it can be of direct scientific interest. This is a very active research topic, with numerous schemes proposed to study it. For example, Cox (1972) expressed the likelihood function in terms of the multivariate exponential family distribution, and Qu, Williams, Beck, and Goormastic (1987) considered conditional logistic models. Lipsitz, Laird, and Harrington (1991), Williamson, Kim, and Lipsitz (1995), and Heagerty and Zeger (1996) considered marginal models and used the (global) odds ratio as a measure of association. Liang et al. (1992) discussed the difference between log-linear and marginal models. Molenberghs and Ritter (1996) proposed a likelihood-based marginal model and established the connection with the second-order generalized estimating equations (GEEs). McCullagh and Nelder (1989) and Glonek and McCullagh (1995) proposed a multivariate marginal logistic regression model. Other related work has been done by Zhao, Prentice, and Self (1992), Fitzmaurice and Laird (1993), and Carey, Zeger, and Diggle (1993). Katz, Zeger, and Liang (1994) specifically discussed approaches to account for the association between fellow eyes.

Researchers have already realized the merit of a nonparametric approach to model correlated data. We note that additive smoothing splines with fixed smoothing parameters have been used by Wild and Yee (1996) and Yee and Wild (1996). These authors gave a nonparametric extension to both GEE and likelihood approaches. Heagerty and Zeger (1998) used the log odds ratio as a measurement of dependence and smoothing splines with fixed degrees of freedom to estimate it. Their model was fitted using GEEs. Lin and Zhang (1999) proposed a generalized additive mixed effects model and used smoothing splines to estimate the additive fixed effects term. Berhane and Tibshirani (1998) also provided a generalized additive model for the GEE method.

In this article we explore how to combine smoothing spline analysis of variance (SS-ANOVA) and log-linear models to build a partly flexible model for multivariate Bernoulli observations. We also propose a method for choosing the smoothing parameters adaptively in this multivariate outcome case. Classical log-linear models have been widely used to estimate joint conditional probabilities (see Bishop, Fienberg, and Holland 1975). SS-ANOVA provides a general framework for multivariate nonparametric function estimation that allows both main effects and interaction terms. The popular additive spline models discussed by Hastie and Tibshirani (1990) and implemented in S (Chambers and Hastie 1992) are special cases of SS-ANOVA models restricted to main effects. Hastie and Tibshirani (1990) also commented on interaction spaces.

SS-ANOVA models have been studied extensively for Gaussian data. Recently, Y. Lin (2000) obtained some general convergence results for the tensor product space ANOVA model and showed that the SS-ANOVA model achieves the optimal convergence rate. Wahba, Wang, Gu, Klein, and Klein (1995) gave a general setting for applying SS-ANOVA to data from exponential families. They successfully applied their method to analyze demographic medical data with univariate Bernoulli outcomes. X. Lin (1998) proposed using SS-ANOVA to analyze data with polychotomous responses. Wang (1998a) developed a mixed effects smoothing spline model for correlated Gaussian data. Brumback and Rice (1998) proposed smoothing spline models for correlated curves (see also Wang 1998b; Wang and Brown 1996). Interesting connections between SS-ANOVA models and graphical models, as discussed by Whittaker (1990), Jordan (1998), and others, also may be observed.

It is of particular interest to us to explore the nonlinearity of the conditional logit functions. We use the log odds ratio to model the association among multivariate Bernoulli outcomes. We restrict the log odds ratios to simple parametric forms and estimate them using maximum likelihood estimation. However, the methods that we propose here can be easily generalized to estimate log odds ratios nonparametrically, and to other cases where the log-likelihood can be fully specified. An extension of the generalized approximate cross-validation (GACV) proposed by Xiang and Wahba (1996) to multivariate responses is derived for choosing the smoothing parameters. It is derived starting with an approximate leaving-out-one-subject argument, in contrast to the leaving-out-one-observation argument used by Xiang and Wahba (1996) in the original derivation of the GACV. Then the randomized trace technique for computing the GACV obtained by Wahba, Lin, Gao, Xiang, Klein, and Klein (1999) and Lin, Wahba, Xiang, Gao, Klein, and Klein (2000) is generalized to the extension of the GACV just derived. An efficient numerical approximation scheme and iterative algorithm involving a tailored block one-step successive over relaxation SOR–Newton-Raphson algorithm is proposed to compute the conditional logit functions and log odds ratios along with the smoothing parameter estimates. Simulation studies are presented to demonstrate the efficacy of the methods, and finally, the results are applied to two-eye observational data from the Beaver Dam Eye Study to examine the association of pigmentary abnormalities and various covariates.

2. LOG-LINEAR MODEL FOR MULTIVARIATE BERNOULLI OBSERVATIONS

We first present the log-linear model for multivariate Bernoulli data. Assuming that there are $J$ different endpoints and $K_j$ repeated measurements for the $j$th endpoint, let $Y_{jk}$ denote the $k$th measurement of the $j$th endpoint. For example, in ophthalmological studies, we have two repeated measurements for each disease: left eye and right eye. In a typical longitudinal study, we have repeated measurements over time. $Y = (Y_{jk},\, j=1,\ldots,J,\, k=1,\ldots,K_j)$ is a multivariate Bernoulli outcome variable. Let $X_{jk} = (X_{jk1}, X_{jk2},\ldots,X_{jkD})$ be a vector of predictor variables ranging over the subset $\mathcal{X}$ of $\mathbb{R}^D$, where $X_{jkd}$ denotes the $d$th predictor variable for the $k$th measurement of the $j$th endpoint. Some predictor variables may take different values for different measurements, whereas others may be the same for all $Y_{jk}$'s. For example, in ophthalmology studies both person-specific predictors and eye-specific predictors may be present. The person-specific predictors are the same for each person. For the eye-specific predictors, the set of predictor variables is the same, but the variables may take different values for the left and right eyes. We can treat observations from both eyes as correlated repeated measurements in our model. Let $X = (X_{jk},\, j=1,\ldots,J,\, k=1,\ldots,K_j)$. Then $(X,Y)$ is a pair of random vectors. For a response vector $y = (y_{jk},\, j=1,\ldots,J,\, k=1,\ldots,K_j)$, its joint probability distribution conditioning on the predictor variables $X$ can be written as

$P(Y=y \mid X) = \exp\Big\{ \sum_{j=1}^{J}\sum_{k=1}^{K_j} f_{jk} y_{jk} + \sum_{j=1}^{J}\sum_{k_1<k_2} \theta_{jk_1,jk_2}\, y_{jk_1} y_{jk_2} + \sum_{j_1<j_2}\sum_{k_1,k_2} \theta_{j_1k_1,j_2k_2}\, y_{j_1k_1} y_{j_2k_2} + \cdots + \theta_{11,12,\ldots,JK_J}\, y_{11} y_{12} \cdots y_{JK_J} - b(f,\theta) \Big\},$   (1)

where

$b(f,\theta) = \log\Big( 1 + \sum_{j,k} e^{f_{jk}} + \sum e^{f_{j_1k_1}+f_{j_2k_2}+\theta_{j_1k_1,j_2k_2}} + \cdots + e^{(\sum_{\mathrm{all}} f + \sum_{\mathrm{all}} \theta)} \Big),$   (2)

with one term for each of the possible outcome vectors.
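To make the parameterization concrete, here is a minimal numerical sketch (ours, with arbitrary illustrative values of $f$ and $\theta$, not values from the article) of the saturated model (1)-(2) for the smallest interesting case $M=2$; it also checks the interpretations given in (3) and (4) below.

```python
import numpy as np
from itertools import product

# Illustrative parameters for M = 2 correlated Bernoulli outcomes
# (one endpoint, two eyes): conditional logits f_11, f_12 and the
# pairwise log odds ratio theta_{11,12}.
f = np.array([0.3, -0.5])
theta = 0.8

def log_weight(y):
    # exponent in (1) before subtracting the normalizer b(f, theta)
    return f @ y + theta * y[0] * y[1]

ys = [np.array(y) for y in product([0, 1], repeat=2)]
b = np.log(sum(np.exp(log_weight(y)) for y in ys))       # normalizer (2)
probs = {tuple(y): np.exp(log_weight(y) - b) for y in ys}

assert abs(sum(probs.values()) - 1.0) < 1e-12
# f_jk is the logit of P(Y_jk = 1 | all other outcomes = 0); cf. (3)
logit_1 = np.log(probs[(1, 0)] / probs[(0, 0)])
# theta is the conditional log odds ratio; cf. (4) with (6)
log_or = np.log(probs[(1, 1)] * probs[(0, 0)]
                / (probs[(1, 0)] * probs[(0, 1)]))
print(logit_1, log_or)   # recovers f[0] = 0.3 and theta = 0.8
```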
Let $M = \sum_{j=1}^{J} K_j$ be the length of the vector $Y$. There are a total of $2^M - 1$ parameters: $(f,\theta) = (f_{11}, f_{12},\ldots,f_{JK_J}, \theta_{11,12},\ldots,\theta_{11,12,\ldots,JK_J})$, which may depend on $X$. The parameter space is unconstrained. The parameters have straightforward interpretations in terms of conditional probabilities; for example,

$f_{jk} = \mathrm{logit}\{P(Y_{jk}=1 \mid Y_{(-jk)}=0, X)\}$   (3)

is the conditional logit function;

$\theta_{j_1k_1,j_2k_2} = \log \mathrm{OR}(Y_{j_1k_1}, Y_{j_2k_2} \mid Y_{(-j_1k_1,-j_2k_2)}=0, X)$   (4)

is the conditional log odds ratio, which is a meaningful way to measure pairwise association; and

$\theta_{j_1k_1,j_2k_2,j_3k_3} = \log \mathrm{OR}(Y_{j_1k_1}, Y_{j_2k_2} \mid Y_{j_3k_3}=1,\, Y_{(-j_1k_1,-j_2k_2,-j_3k_3)}=0, X) - \log \mathrm{OR}(Y_{j_1k_1}, Y_{j_2k_2} \mid Y_{j_3k_3}=0,\, Y_{(-j_1k_1,-j_2k_2,-j_3k_3)}=0, X)$   (5)

measures three-way association. Here $Y_{(-jk)}$ denotes the subset of the vector $Y$ except $Y_{jk}$, and

$\mathrm{logit}(p) = \log\frac{p}{1-p}, \qquad \mathrm{OR}(v,w) = \frac{P(v=1,w=1)\,P(v=0,w=0)}{P(v=1,w=0)\,P(v=0,w=1)}.$   (6)
Now assume that we have $n$ independent observations $(x_i, y_i)$, $i=1,\ldots,n$, where $y_i = (y_{i11}, y_{i12},\ldots,y_{iJK_J})$ and $x_i = (x_{i11}, x_{i12},\ldots,x_{iJK_J})$. Here $y_{ijk}$ and $x_{ijk} = (x_{ijk1}, x_{ijk2},\ldots,x_{ijkD})$ are the outcome variable and predictor vector for the $k$th measurement of the $j$th endpoint of the $i$th subject. From now on we use $f_i$ and $\theta_i$ to denote the parameters for the $i$th subject, while $y = (y_1,\ldots,y_n)$, $f = (f_1,\ldots,f_n)$, and $\theta = (\theta_1,\ldots,\theta_n)$. We can write the log-likelihood function based on the observed data as

$\mathcal{L}(y,f,\theta) = \sum_{i=1}^{n}\Big[ \sum_{j=1}^{J}\sum_{k=1}^{K_j} f_{ijk} y_{ijk} + \sum_{j=1}^{J}\sum_{k_1<k_2} \theta_{ijk_1,ijk_2}\, y_{ijk_1} y_{ijk_2} + \sum_{j_1<j_2}\sum_{k_1,k_2} \theta_{ij_1k_1,ij_2k_2}\, y_{ij_1k_1} y_{ij_2k_2} + \cdots + \theta_{i11,i12,\ldots,iJK_J}\, y_{i11} y_{i12}\cdots y_{iJK_J} - b(f_i,\theta_i) \Big].$   (7)

We call (7) the log-linear model for multivariate logistic regression. Here $f_{ijk}$ is the conditional logit function for the $k$th measurement of the $j$th endpoint of the $i$th subject. Scientifically, except for the possibility that the $f_{ijk}$ may take different predictor values from measurement to measurement, there is little reason to believe that they will take different functional forms for the same endpoint. Hence we can assume that $f_{ijk} = f_j(x_{ijk})$. The same reasoning applies to the association terms; for example, we can assume that $\theta_{ij_1k_1,ij_2k_2} = \theta_{j_1j_2}(x_{ij_1k_1}, x_{ij_2k_2})$.

In practice, the number of parameters to be estimated can be reduced in many ways. For example, in many situations scientific interest is focused primarily on the conditional logit function $f_{ijk}$ and the log odds ratio $\theta_{ij_1k_1,ij_2k_2}$, which measures pairwise association. The existence of three-way associations $\theta_{ij_1k_1,ij_2k_2,ij_3k_3}$ and higher-order associations is usually difficult to verify in practical situations and may attract less scientific interest. Hence it is possible to set all higher-order associations to 0 and to fit only a parsimonious model instead of the saturated one described in (7). The reduced model is a member of the quadratic exponential model of Zhao and Prentice (1990).

3. PENALIZED MULTIVARIATE LOGISTIC REGRESSION USING SMOOTHING SPLINE ANALYSIS OF VARIANCE

To simplify notation, we consider a parsimonious model here, by setting all associations of higher order than pairwise to 0 in (7). Thus the negative log-likelihood function simplifies to

$\mathcal{L}(y,f,\theta) = -\sum_{i=1}^{n} l\big(f(x_i),\theta(x_i)\big) = -\sum_{i=1}^{n}\Big[ \sum_{j=1}^{J}\sum_{k=1}^{K_j} f_j(x_{ijk})\, y_{ijk} + \sum_{j=1}^{J}\sum_{k_1<k_2} \theta_{jj}(x_{ijk_1},x_{ijk_2})\, y_{ijk_1} y_{ijk_2} + \sum_{j_1<j_2}\sum_{k_1,k_2} \theta_{j_1j_2}(x_{ij_1k_1},x_{ij_2k_2})\, y_{ij_1k_1} y_{ij_2k_2} - b(f_i,\theta_i) \Big],$   (8)

where

$b(f_i,\theta_i) = \log\Big[ 1 + \sum_{j,k} \exp\{f_j(x_{ijk})\} + \sum \exp\{f_{j_1}(x_{ij_1k_1}) + f_{j_2}(x_{ij_2k_2}) + \theta_{j_1j_2}(x_{ij_1k_1},x_{ij_2k_2})\} + \cdots + \exp\Big\{ \sum_{j,k} f_j(x_{ijk}) + \sum \theta_{j_1j_2}(x_{ij_1k_1},x_{ij_2k_2}) \Big\} \Big].$   (9)

We are interested in relaxing the parametric assumptions to build a flexible log-linear model. We are particularly interested in exploring the nonlinearity of the conditional logit functions $f$. We also could model the $\theta$ nonparametrically, but because a larger number of observations would be needed to estimate and tune many multivariate smooth functions nonparametrically, we assume in this article that the $\theta$ are of simple parametric form possibly depending on some set of parameters $\beta$, and leave more general $\theta$ for future study. We use the penalized likelihood method to model the $f_j$ nonparametrically.
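For the paired-eye case ($J=1$, $K_1=2$), the normalizer in (9) has just four terms, and the negative log-likelihood (8) can be written in a few lines. The following sketch is ours (vectorized over subjects, with the log odds ratio held constant), not code from the article:

```python
import numpy as np

# Simplified negative log-likelihood (8)-(9) for one endpoint with two
# correlated measurements; f1, f2 hold f(x_i1), f(x_i2) for all subjects.
def neg_log_lik(f1, f2, theta, y1, y2):
    # b(f_i, theta) in (9): log-sum over the four possible outcomes
    b = np.logaddexp.reduce(
        np.stack([np.zeros_like(f1), f1, f2, f1 + f2 + theta]), axis=0)
    return -np.sum(y1 * f1 + y2 * f2 + theta * y1 * y2 - b)
```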
To relax the parametric assumptions, the penalized likelihood method (O'Sullivan 1983) assumes that the function to be estimated is smooth in some sense. This is done by assuming that the function to be estimated is in a given reproducing kernel Hilbert space of "smooth" functions, where "roughness" is measured by the size of some quadratic functional, typically a square norm or seminorm, and a roughness penalty is imposed in obtaining the estimate. Reproducing kernel Hilbert space theory (see Aronszajn 1950; Kimeldorf and Wahba 1971) can then be used to characterize the solutions of very general penalized likelihood problems. We assume then that $f_j \in \mathcal{H}^j$, where $\mathcal{H}^j$ is a given reproducing kernel Hilbert space of functions on $\mathcal{X}$. Hence $f = (f_1, f_2,\ldots,f_J) \in \mathcal{H}^1 \oplus \cdots \oplus \mathcal{H}^J$. The penalized multivariate logistic regression estimate of $f$ and $\theta = (\theta_{11}, \theta_{12},\ldots,\theta_{JJ})$ is the minimizer of the variational problem

$\mathcal{L}_\lambda(y,f,\theta) = -\sum_{i=1}^{n} l\big(f(x_i),\theta(x_i)\big) + \frac{n}{2} J_\lambda(f_1,\ldots,f_J),$   (10)

where the first part is the negative log-likelihood, the second part is the roughness penalty, and $\lambda = (\lambda_1,\ldots,\lambda_J)$ are the smoothing parameters that control the smoothness of the estimated $f_j$'s. We assume an additive form of the penalty functional for simplicity and easy interpretation,

$J_\lambda(f_1,\ldots,f_J) = \sum_{j=1}^{J} J^j_{\lambda_j}(f_j).$   (11)

We consider the orthogonal decomposition $\mathcal{H}^j = \mathcal{H}^j_0 \oplus \mathcal{H}^j_1$. Here $\mathcal{H}^j_0$ is finite dimensional (the "parametric" part, usually polynomials), and $\mathcal{H}^j_1$ (the "smooth" part) is the orthocomplement of $\mathcal{H}^j_0$ in $\mathcal{H}^j$. The penalty functional will be related only to the smooth part of the function, $J^j_{\lambda_j}(f_j) = \|P^j_1 f_j\|^2_{\lambda_j}$, where $P^j_1$ is the orthogonal projection operator in $\mathcal{H}^j$ onto $\mathcal{H}^j_1$. The penalized likelihood has the expression

$\mathcal{L}_\lambda(y,f,\theta) = -\sum_{i=1}^{n} l\big(f(x_i),\theta(x_i)\big) + \frac{n}{2}\sum_{j=1}^{J} \|P^j_1 f_j\|^2_{\lambda_j}.$   (12)

The following theorem shows the existence and uniqueness of the solution to the variational problem (10). Letting $\mathcal{H}_0 = \mathcal{H}^1_0 \oplus \cdots \oplus \mathcal{H}^J_0$ denote the null space of $\mathcal{H}^1 \oplus \cdots \oplus \mathcal{H}^J$ with respect to the penalty function $J_\lambda$, we have:

Theorem 1. If the minimizer of (12) exists in $\mathcal{H}_0$, then it uniquely exists in $\mathcal{H}^1 \oplus \cdots \oplus \mathcal{H}^J$.

Proof. See Appendix A.
If $\phi_\nu$, $\nu = 1,\ldots,M$, span $\mathcal{H}_0$, then this corresponds to the identifiability of the ordinary parametric model, which is a linear combination of the $\phi_\nu$. In the later examples, the $\phi_\nu$ are low-degree polynomials.

3.1 Smoothing Spline Analysis of Variance Models for $f_j$

Here we briefly review some of the well-known properties of SS-ANOVA models. For any $j$ ($1 \le j \le J$), let $g = f_j$. Here $g$ is assumed to be a function of $D$ variables, $g = g(x_1,\ldots,x_D)$ with $x_d \in \mathcal{X}^{(d)}$, $d = 1,\ldots,D$; thus $x = (x_1,\ldots,x_D) \in \mathcal{X} = \mathcal{X}^{(1)} \times \cdots \times \mathcal{X}^{(D)}$. Given probability measures $d\mu_d$ on $\mathcal{X}^{(d)}$, define the averaging operators $\mathcal{E}_d$ on $\mathcal{X}$ as

$(\mathcal{E}_d g)(x) = \int_{\mathcal{X}^{(d)}} g(x_1, x_2,\ldots,x_D)\, d\mu_d(x_d).$   (13)

These averaging operators define a unique (functional ANOVA) decomposition of $g$ as

$g(x_1,\ldots,x_D) = \mu + \sum_{d=1}^{D} g_d(x_d) + \sum_{d_1<d_2} g_{d_1,d_2}(x_{d_1},x_{d_2}) + \cdots + g_{1,\ldots,D}(x_1,\ldots,x_D),$   (14)

where $\mu = (\prod_d \mathcal{E}_d)g$, $g_d = ((I-\mathcal{E}_d)\prod_{d_1 \ne d}\mathcal{E}_{d_1})g$, $g_{d_1d_2} = ((I-\mathcal{E}_{d_1})(I-\mathcal{E}_{d_2})\prod_{d_3 \ne d_1,d_2}\mathcal{E}_{d_3})g$, and so forth. The averaging operators and norms on $\mathcal{H}^j$ can be chosen so that the components of this decomposition are projections of $g$ onto orthogonal subspaces of $\mathcal{H}^j$ (see Gu and Wahba 1993; Wahba 1990; Wahba et al. 1995). In practice, the ANOVA decomposition in (14) will be truncated at some point. Assuming that we have already decided which subspaces (equivalently, components of $g$) will be included in our model $\mathcal{M} \subseteq \mathcal{H}^j$, and suppressing the superscript $j$ on the subspaces for the rest of this section, we can regroup and write the model space as

$\mathcal{M} = \mathcal{H}_0 \oplus (\mathcal{H}_{1,1} \oplus \cdots \oplus \mathcal{H}_{1,q}) = \mathcal{H}_0 \oplus \mathcal{H}_1.$   (15)

Usually we will let $\mathcal{H}_0$ be a finite-dimensional space containing functions that will not be penalized. The norms on the composite $\mathcal{H}_{1,l}$, $1 \le l \le q$, are the tensor product norms induced by the norms on the component subspaces, and $\|g\|^2_\lambda = \|P_0 g\|^2 + \sum_{l=1}^{q} \lambda_l \|P_l g\|^2$, where $P_l$ is the orthogonal projector in $\mathcal{M}$ onto $\mathcal{H}_{1,l}$. Now we can use reproducing kernel Hilbert space methods to explicitly impose roughness penalties. The penalty functional in (11) can be further rewritten as

$J^j_{\lambda_j}(f_j) = \sum_{l=1}^{q} \lambda_{jl} \|P^j_l f_j\|^2.$   (16)

It is well known (see, e.g., Kimeldorf and Wahba 1971; Wahba 1990) that the minimizer of the variational problem (12) is in a particular finite-dimensional subspace spanned by a set of basis functions obtained from the reproducing kernel associated with the model space. It is known that the minimizers $f_j$ of (12) have the form

$f_j(\cdot) = \phi^j(\cdot)^T d^j + \xi^j(\cdot)^T c^j,$   (17)

where $c^j$ and $d^j$ are vectors of coefficients to be found. Here $\{\phi^j_\nu\}_{\nu=1}^{p_j}$ is a set of $p_j$ basis functions spanning the null space $\mathcal{H}^j_0$, $\phi^j(\cdot)^T = (\phi^j_1(\cdot),\ldots,\phi^j_{p_j}(\cdot))$, and $\xi^j(\cdot)^T = (\xi_{ijk}(\cdot),\, 1 \le i \le n,\, 1 \le k \le K_j)$, where $\xi_{ijk}(\cdot)$ is the representer of the evaluation functional at $x_{ijk}$ in $\mathcal{H}^j_1$. It is not necessary to be familiar with the concept of the representer of an evaluation functional in a reproducing kernel Hilbert space to follow the rest of the article, if one is willing to accept the result that our estimates will be of the form (17) and that the penalty functionals are quadratic forms in the $c^j$, as given following (19) below. (Interested readers can find definitions in Wahba 1990, pp. 1-2.) Descriptions of the particular $\phi^j_\nu$ and $\xi_{ijk}$ used in this article are given in Section 5.
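Assuming a reproducing kernel `kernel` for the smooth subspace (the cubic-spline choice used in this article is spelled out in Section 5), an estimate of the form (17) can be evaluated as in the following sketch; the function names and the one-dimensional setting on $[0,1]$ are illustrative assumptions, not the article's code:

```python
import numpy as np

def eval_f(x, basis_pts, c, d, kernel):
    # phi(x)^T d: unpenalized null-space part, here spanned by 1 and u - 1/2
    phi = np.column_stack([np.ones_like(x), x - 0.5])
    # xi(x)^T c: representers of evaluation at the chosen basis points
    xi = np.array([[kernel(u, v) for v in basis_pts] for u in x])
    return phi @ d + xi @ c

# The roughness penalty is then the quadratic form c^T Q c with
# Q[v, w] = kernel(basis_pts[v], basis_pts[w]); see (18)-(20) below.
```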
The smoothing parameters $\lambda_j = (\lambda_{j1},\ldots,\lambda_{jq})$ are incorporated into $\xi^j(\cdot)^T$. Details have been given by Gu and Wahba (1993), Wahba (1990, chap. 10), and Wahba et al. (1995). The penalty functionals are known quadratic forms in the $c^j$. We give examples later. Given the $\{\lambda_{jl}\}$ and the result (17), the numerical problem is to obtain the minimizer of (10).

3.2 The Fitting Algorithm

For computational reasons, only an approximate minimizer of (12) will be obtained, using only a subset of the basis functions $\{\xi_{ijk}(\cdot),\, 1 \le i \le n,\, 1 \le k \le K_j\}$. Usually, when the design points are close, their representers are also very close. Hence when the dataset is large, it is very likely that many of the basis functions are nearly linearly dependent. On the other hand, if by some "prior" knowledge the structure of the true $f_j$ is thought to be not very complicated, then it may be well approximated by a small number of basis functions. As a result, if we select a subset of the design points having maximum separation, then their corresponding representers are expected to have less correlation. We choose a fixed $V$ and use the SAS procedure FASTCLUS to cluster the $nK_j$ values of the design points $\{x_{ijk},\, 1 \le i \le n,\, 1 \le k \le K_j\}$ into $V$ clusters. Within the $v$th cluster, $v = 1,\ldots,V$, we randomly select one design point and call it $x_{j,v}$. The representer of evaluation in $\mathcal{H}^j_1$ at $x_{j,v}$, called $\xi_{j,v}$, is used to form the approximating subspace. An approximate solution is of the form $f_j(\cdot) = \phi^j(\cdot)^T d^j + \xi^j_V(\cdot)^T c^j_V$, where $\phi^j(\cdot)^T$ is as before and $\xi^j_V(\cdot)^T = (\xi_{j,v}(\cdot),\, v = 1,\ldots,V)$, and is computed from (12) by a block one-step SOR–Newton-Raphson algorithm (see Ortega and Rheinboldt 1970). When the number of basis functions $V$ increases, the approximate solution converges to the exact solution. We have found that for moderately large $V$, it is not important which point in each cluster is chosen.
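The clustering step can be sketched as follows; the article uses SAS FASTCLUS, and the k-means routine below is a stand-in for it, not the original implementation:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

def select_basis_points(design_pts, V):
    # design_pts: (n*K_j, D) array of covariate vectors x_ijk
    labels = KMeans(n_clusters=V, n_init=10,
                    random_state=0).fit(design_pts).labels_
    # randomly pick one design point from each of the V clusters
    idx = [rng.choice(np.flatnonzero(labels == v)) for v in range(V)]
    return design_pts[np.array(idx)]
```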
Recall that the $\theta$ could depend on a set of parameters $\beta$. The iterative algorithm for the approximate solution is given in Table 1.

Table 1. Iterative Algorithm for Approximate Spline

  V <- initial value
  do
    Cluster the data points into V groups
    Randomly select one data point from each group
    Generate the corresponding basis functions
    f_j <- initial values, j = 1, 2, ..., J
    beta <- initial values
    do
      do j = 1 to J
        f_j <- updated values in the approximating subspace
      end
      beta <- Newton-Raphson update for beta
    until (convergence)
    V <- 2V
  until ||f_new - f_old|| / ||f_old|| < prec1 and ||beta_new - beta_old|| / ||beta_old|| < prec2

Here prec1 and prec2 are prespecified thresholds, which were set to $10^{-7}$ and $10^{-8}$ in the examples to follow. We suggest that the initial value for $V$ be at least 25. Combined with the iterative method for choosing smoothing parameters proposed in the next section, the foregoing algorithm usually converges very rapidly. From our experience, for medical data, 50-100 basis functions usually yield a very good approximation.

Next, we discuss in detail how to use the block one-step SOR algorithm to obtain an approximate solution for fixed $V$. Numerically, we need to solve a large nonlinear system. To speed up the computation, we break down the nonlinear system into several "blocks." When updating one "block" in the nonlinear system, we fix all other "blocks" at the most recently updated values. The natural breakdown in this application is to update the $f_j$'s and $\beta$ iteratively. To update $\beta$, we use the Newton-Raphson algorithm and iterate until convergence. However, the updating step for $f_j$ is much more computationally intensive. The block nonlinear SOR algorithm requires that the Newton-Raphson algorithm be run until convergence. However, we use only a one-step updating formula, sacrificing the overall convergence rate a bit to reduce the computational complexity in each iteration. Let $S_j$ be the $nK_j \times p_j$ matrix with entries $\phi^j_\nu(x_{ijk})$. Returning to our fixed $V$ and the selected design points $x_{j,v}$, $v = 1,\ldots,V$, and the associated representers $\xi_{j,v}(\cdot)$, we define

$Q_{j,V} = \begin{pmatrix} \xi_{j,1}(x_{1j1}) & \xi_{j,2}(x_{1j1}) & \cdots & \xi_{j,V}(x_{1j1}) \\ \xi_{j,1}(x_{1j2}) & \xi_{j,2}(x_{1j2}) & \cdots & \xi_{j,V}(x_{1j2}) \\ \vdots & \vdots & \ddots & \vdots \\ \xi_{j,1}(x_{njK_j}) & \xi_{j,2}(x_{njK_j}) & \cdots & \xi_{j,V}(x_{njK_j}) \end{pmatrix}_{nK_j \times V}$

and

$\tilde{Q}_{j,V} = \begin{pmatrix} \xi_{j,1}(x_{j,1}) & \xi_{j,2}(x_{j,1}) & \cdots & \xi_{j,V}(x_{j,1}) \\ \vdots & \vdots & \ddots & \vdots \\ \xi_{j,1}(x_{j,V}) & \xi_{j,2}(x_{j,V}) & \cdots & \xi_{j,V}(x_{j,V}) \end{pmatrix}_{V \times V}.$   (18)

Let $c^j_V = (c_{j,1}, c_{j,2},\ldots,c_{j,V})^T$. Our goal is to compute the approximate solution with the representation

$f_j(\cdot) = \phi^j(\cdot)^T d^j + \xi^j_V(\cdot)^T c^j_V,$   (19)

which involves the use of $S_j$, $Q_{j,V}$, and $\tilde{Q}_{j,V}$. It is easy to verify that the penalty for $f_j$ has the quadratic form $\|P^j_1 f_j\|^2_{\lambda_j} = c^{jT}_V \tilde{Q}_{j,V} c^j_V$. Therefore, to update $f_j$, provided all other estimated values are fixed at the solutions from the previous iteration, and recalling that $f_{ijk} = f_j(x_{ijk})$, the variational problem is to minimize

$I_{j,V} = -\sum_{i=1}^{n}\Big[ \sum_{k=1}^{K_j} f_{ijk} y_{ijk} - b(f_i,\theta_i) \Big] + \frac{n}{2}\, c^{jT}_V \tilde{Q}_{j,V} c^j_V.$   (20)
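A sketch of assembling the two matrices in (18) for a generic kernel; the names are ours:

```python
import numpy as np

def make_gram(design_pts, basis_pts, kernel):
    # Q evaluates the V chosen representers at all n*K_j design points;
    # Q_tilde evaluates them at the selected points themselves.
    Q = np.array([[kernel(x, b) for b in basis_pts] for x in design_pts])
    Q_tilde = np.array([[kernel(a, b) for b in basis_pts] for a in basis_pts])
    return Q, Q_tilde   # penalty in (20) is (n/2) * c^T Q_tilde c
```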
Denote $b_i = b(f_i,\theta_i)$. To update $f_j$, we need the following, where $l_j$ is defined in Lemma C.1:

$\mu_{ijk} = \frac{\partial b_i}{\partial f_{ijk}} = E Y_{ijk} = \Big( e^{f_{ijk}} + \sum_{k_3 \ne k} e^{f_{ijk}+f_{ijk_3}+\theta_{ijk,ijk_3}} + \cdots + e^{\sum_{j_3,k_3} f_{ij_3k_3} + \sum_{j_3,k_3}\sum_{j_4,k_4} \theta_{ij_3k_3,ij_4k_4}} \Big) \Big( 1 + \sum_{j_3,k_3} e^{f_{ij_3k_3}} + \cdots + e^{\sum_{j_3,k_3} f_{ij_3k_3} + \sum_{j_3,k_3}\sum_{j_4,k_4} \theta_{ij_3k_3,ij_4k_4}} \Big)^{-1},$

$w_{ijk,ijk} = \frac{\partial^2 b_i}{\partial f_{ijk}^2} = \mathrm{var}\, Y_{ijk} = \mu_{ijk}(1-\mu_{ijk}),$

$w_{ijk_1,ijk_2} = \frac{\partial^2 b_i}{\partial f_{ijk_1}\partial f_{ijk_2}} = \mathrm{cov}(Y_{ijk_1}, Y_{ijk_2}) = E(Y_{ijk_1} Y_{ijk_2}) - EY_{ijk_1} EY_{ijk_2} = \frac{\partial b_i}{\partial \theta_{ijk_1,ijk_2}} - \mu_{ijk_1}\mu_{ijk_2} = \Big( e^{f_{ijk_1}+f_{ijk_2}+\theta_{ijk_1,ijk_2}} + \cdots + e^{\sum_{j_3,k_3} f_{ij_3k_3} + \sum_{j_3,k_3}\sum_{j_4,k_4} \theta_{ij_3k_3,ij_4k_4}} \Big) \Big( 1 + \sum_{j_3,k_3} e^{f_{ij_3k_3}} + \cdots \Big)^{-1} - \mu_{ijk_1}\mu_{ijk_2},$

$u_{ijk} = -\frac{\partial l_j(y_{ij}, f_{ij})}{\partial f_{ijk}} = -y_{ijk} + \mu_{ijk},$

$u_j = (u_{1j1}, u_{1j2},\ldots,u_{1jK_j}, u_{2j1},\ldots,u_{njK_j})^T,$

$W_{ij} = \begin{pmatrix} w_{ij1,ij1} & w_{ij1,ij2} & \cdots & w_{ij1,ijK_j} \\ w_{ij2,ij1} & w_{ij2,ij2} & \cdots & w_{ij2,ijK_j} \\ \vdots & \vdots & \ddots & \vdots \\ w_{ijK_j,ij1} & w_{ijK_j,ij2} & \cdots & w_{ijK_j,ijK_j} \end{pmatrix}_{K_j \times K_j},$

and

$W_j = \mathrm{diag}(W_{1j}, W_{2j},\ldots,W_{nj}).$
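For the paired-eye case ($J=1$, $K=2$), these moments have closed forms, sketched below under the same illustrative setup as before (ours, not the article's code):

```python
import numpy as np

# Mean and covariance of (Y_ij1, Y_ij2) when
# b_i = log(1 + e^{f1} + e^{f2} + e^{f1+f2+theta}).
def moments(f1, f2, theta):
    e1, e2, e12 = np.exp(f1), np.exp(f2), np.exp(f1 + f2 + theta)
    denom = 1.0 + e1 + e2 + e12
    mu1, mu2 = (e1 + e12) / denom, (e2 + e12) / denom   # mu_ijk = E Y_ijk
    w11 = mu1 * (1.0 - mu1)                             # var Y_ij1
    w22 = mu2 * (1.0 - mu2)                             # var Y_ij2
    w12 = e12 / denom - mu1 * mu2                       # cov(Y_ij1, Y_ij2)
    W = np.array([[w11, w12], [w12, w22]])              # W_ij
    return np.array([mu1, mu2]), W
```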
Therefore, the one-step updating formula is to solve

$\begin{pmatrix} Q^T_{j,V} W_{j-} Q_{j,V} + n\tilde{Q}_{j,V} & Q^T_{j,V} W_{j-} S_j \\ S^T_j W_{j-} Q_{j,V} & S^T_j W_{j-} S_j \end{pmatrix} \begin{pmatrix} c^j_V - c^j_{V-} \\ d^j - d^j_- \end{pmatrix} = \begin{pmatrix} -Q^T_{j,V} u_{j-} - n\tilde{Q}_{j,V} c^j_{V-} \\ -S^T_j u_{j-} \end{pmatrix},$   (21)

where the subscript minus indicates the quantities evaluated at the latest update.

With some abuse of notation, we use $u_{ij}$ to denote $(u_{ij1},\ldots,u_{ijK_j})^T$, $f_{ij}$ to denote $(f_{ij1},\ldots,f_{ijK_j})^T$, and so on. Here $\tilde{y}_{ij} = f_{ij-} - W^{-1}_{ij-} u_{ij-}$ are called the pseudo-data. It is easy to see that the solution of (21) gives the minimizer of

$\frac{1}{n}\sum_{i=1}^{n} (\tilde{y}_{ij} - f_{ij})^T W_{ij-} (\tilde{y}_{ij} - f_{ij}) + c^{jT}_V \tilde{Q}_{j,V} c^j_V.$   (22)

The block one-step SOR–Newton-Raphson procedure iteratively reformulates the problem to estimate $f_j$ from the pseudo-data by weighted penalized least squares. The pseudo-data are equivalent to the adjusted dependent vector of Yee and Wild (1996). The pseudo-data can be shown to have the usual (mean and covariance) data structure implicit in (22) if the $f_{j-}$ are not far away from $f_j$. A theorem is given in Appendix B. We use this result later to construct approximate Bayesian confidence intervals for the $f_j$.

The preceding discussion assumes no special structure in the design points. The algorithm is specifically designed to handle the unstructured case. However, when special structure is available, the foregoing algorithm can be simplified. One common case is the presence of person-specific covariates only. Hence $x_{ijk} = x_{ij}$ for all $k = 1,\ldots,K_j$. Similarly, $f_{ijk} = f_j(x_{ijk}) = f_j(x_{ij}) = f_{ij}$. To update $f_j$, the part of the penalized likelihood that needs to be minimized has the simplified form

$I_{j,V} = -\sum_{i=1}^{n}\Big[ \Big(\sum_{k=1}^{K_j} y_{ijk}\Big) f_{ij} - b_i \Big] + \frac{n}{2}\, c^{jT} \tilde{Q}_{j,V} c^j.$   (23)

Now redefine

$Q_{j,V} = \begin{pmatrix} \xi_{j,1}(x_{1j}) & \xi_{j,2}(x_{1j}) & \cdots & \xi_{j,V}(x_{1j}) \\ \xi_{j,1}(x_{2j}) & \xi_{j,2}(x_{2j}) & \cdots & \xi_{j,V}(x_{2j}) \\ \vdots & \vdots & \ddots & \vdots \\ \xi_{j,1}(x_{nj}) & \xi_{j,2}(x_{nj}) & \cdots & \xi_{j,V}(x_{nj}) \end{pmatrix}_{n \times V}.$   (24)

Denote $\bar{y}_{ij} = \sum_{k=1}^{K_j} y_{ijk}$, $\mu_{ij} = E(\sum_{k=1}^{K_j} Y_{ijk})$, $W_{ij} = \mathrm{var}(\sum_{k=1}^{K_j} Y_{ijk})$, $u_{ij} = -\sum_{k=1}^{K_j} y_{ijk} + \mu_{ij}$, and $u_j = (u_{1j}, u_{2j},\ldots,u_{nj})^T$.
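A minimal sketch of one block update (21), assuming the matrices have been assembled as above; `np.linalg.solve` stands in for whatever decomposition a production implementation would reuse across iterations:

```python
import numpy as np

def one_step_update(c, d, Q, Q_tilde, S, W, u, n):
    # All inputs are evaluated at the latest iterate (the "minus" quantities):
    # Q (nK_j x V), Q_tilde (V x V), S (nK_j x p), W block-diagonal weights,
    # u = -y + mu stacked over subjects and measurements.
    WQ, WS = W @ Q, W @ S
    A = np.block([[Q.T @ WQ + n * Q_tilde, Q.T @ WS],
                  [S.T @ WQ,               S.T @ WS]])
    rhs = np.concatenate([-Q.T @ u - n * (Q_tilde @ c), -S.T @ u])
    step = np.linalg.solve(A, rhs)          # (c - c_minus, d - d_minus)
    V = Q_tilde.shape[0]
    return c + step[:V], d + step[V:]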
Except for the aforementioned changes, all of the previous formulas and discussions remain true.

Finally, by the SOR–Newton theorem of Ortega and Rheinboldt (1970) and corollary 3.1 of X. Lin (1998), the local convergence property holds if we use the block one-step SOR–Newton-Raphson method to find the minimizer of a twice-differentiable convex function. Thus there exists an open ball around the minimizer, and when the starting point is in this open ball, the algorithm will converge to the minimizer for fixed smoothing parameters.

In response to a referee's query, we note that the backfitting algorithm popularized by Hastie and Tibshirani (1990) is a form of block SOR algorithm. A block SOR algorithm was also used by Wahba et al. (1995), who gave a detailed discussion of the relation between the block SOR they used and the backfitting algorithm of Hastie and Tibshirani. However, the way in which we have assigned blocks in this article is different from either of these forms of block SOR.

4. ADAPTIVE CHOICE OF THE SMOOTHING PARAMETERS

So far, all smoothing parameters are considered fixed. Now we consider an automated data-driven method to choose the smoothing parameters. The risk function or "target" to be minimized is the comparative Kullback–Leibler (CKL) distance of the estimate from the "truth." Because the truth is not observable, except in simulation studies, the CKL also is not observable. Thus a computable proxy for the CKL whose minimizer is a good estimate of the minimizer of the CKL is desired. The GACV was proposed by Xiang and Wahba (1996) as a method for choosing a smoothing parameter for penalized likelihood estimates when the data are from an exponential family with no nuisance parameters. Those authors showed via Monte Carlo methods with Bernoulli data that its minimizer provides a good estimate of the minimizer of the CKL in relatively small samples. The derivation of the GACV score begins with a leave-out-one argument. Direct calculation of the GACV score becomes unstable in the large-sample case due to the existence of the inverse of a generally ill-conditioned matrix in the formula (see Xiang and Wahba 1996). Later, Lin et al. (2000) and Wahba et al. (1999) demonstrated that the GACV score can be computed and minimized over multiple smoothing parameters with the use of a randomized trace technique (ranGACV), which provides a stable calculation. In this article, we extend the GACV, and then the ranGACV, to the case of correlated Bernoulli observations. Instead of using a leave-out-one-observation argument to start the derivation, we leave out one person. Following the derivation, in the next section we demonstrate via a small simulation study the properties of the ranGACV for providing an estimate of the minimizer of the CKL in the correlated Bernoulli case.

To choose the smoothing parameters for $f_j$, we let all of the other conditional logit functions $f_\ell$ ($\ell \ne j$) and the $\theta$ be fixed at the estimated values from the previous iteration. Let $f^{[-i]}_{ijk}$ denote the estimated conditional logit function $f_j$ evaluated at $x_{ijk}$, with the $i$th subject left out for the estimation procedure. The ordinary leave-out-one-subject cross validation function for choosing $\lambda_j$, $CV(\lambda_j)$, is defined as

$CV(\lambda_j) = \frac{1}{n}\sum_{i=1}^{n}\Big[ -\sum_{k=1}^{K_j} y_{ijk} f^{[-i]}_{ijk} + b(f_{ij}) \Big] = \frac{1}{n}\sum_{i=1}^{n}\Big[ -\sum_{k=1}^{K_j} y_{ijk} f_{ijk} + b(f_{ij}) \Big] + \frac{1}{n}\sum_{i=1}^{n}\sum_{k=1}^{K_j} y_{ijk}\big(f_{ijk} - f^{[-i]}_{ijk}\big).$   (25)

Because $y_{ijk}$ and $f^{[-i]}_{ijk}$ are independent, and because for large sample sizes $f^{[-i]}_{ijk}$ is expected to be close to $f_{ijk}$, CV is expected to provide an approximately unbiased estimate of the CKL distance, given $\theta$ and $f^{(-j)}$ fixed. First, let us assume that the exact solution of the variational problem can be computed. Let $y_j = (y_{ijk},\, 1 \le i \le n,\, 1 \le k \le K_j)^T$. In each updating step for $f_j$, the minimizer of

$I_{\lambda_j}(f_j, y_j) = -\sum_{i=1}^{n}\Big[ \sum_{k=1}^{K_j} y_{ijk} f_{ijk} - b(f_{ij}) \Big] + \frac{n}{2} J^j_{\lambda_j}(f_j)$   (26)

is obtained. By using several first-order Taylor expansions, we derive an approximate cross validation (ACV) score. The GACV is a generalization of the ACV score.

First, we define the generalized average of submatrices as a generalization of the trace of a matrix. For any matrix $A$ with submatrices $A_{ii}$, $1 \le i \le n$, on the diagonal, $A_{ii} = [a_{i,k_1k_2}]_{K \times K}$, $1 \le k_1, k_2 \le K$: if $K = 1$, the generalized average of the $A_{ii}$ is simply the trace of $A$ divided by $n$, $S_A = \mathrm{tr}(A)/n$. When $K \ge 2$, let

$\delta = \frac{1}{nK}\sum_{i=1}^{n}\sum_{k=1}^{K} a_{i,kk} = \frac{\mathrm{tr}(A)}{nK}, \qquad \gamma = \frac{1}{nK(K-1)}\sum_{i=1}^{n}\sum_{k_1 \ne k_2} a_{i,k_1k_2}.$   (27)

The generalized average of the $A_{ii}$'s is defined as

$S_A = (\delta - \gamma) I_{K \times K} + \gamma\, ee^T = \begin{pmatrix} \delta & \gamma & \cdots & \gamma \\ \gamma & \delta & \cdots & \gamma \\ \vdots & \vdots & \ddots & \vdots \\ \gamma & \gamma & \cdots & \delta \end{pmatrix},$   (28)

where $e = (1,1,\ldots,1)^T$ is the unit vector. When $K \ge 3$, this form is suitable whenever it is reasonable to assume that associations between different pairs of repeated observations are similar. Other generalization forms are possible. The properties of the $K \ge 3$ case are a matter for future research.
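A direct transcription of (27)-(28) as a sketch (ours):

```python
import numpy as np

def generalized_average(A, n, K):
    # Generalized average of the diagonal K x K blocks of an nK x nK matrix.
    blocks = [A[i*K:(i+1)*K, i*K:(i+1)*K] for i in range(n)]
    delta = sum(np.trace(B) for B in blocks) / (n * K)
    if K == 1:
        return np.array([[delta]])
    off = sum(B.sum() - np.trace(B) for B in blocks)
    gamma = off / (n * K * (K - 1))
    return (delta - gamma) * np.eye(K) + gamma * np.ones((K, K))
```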
Let $H^j$ denote the inverse Hessian of (26) with respect to $f_j$. $H^j$ is an $nK_j \times nK_j$ matrix with $H^j_{ii}$, $i = 1,\ldots,n$, as $K_j \times K_j$ submatrices on the diagonal. Let $G^j_{ii} = I - W_{ij} H^j_{ii}$, and let $H^j_S$ and $G^j_S$ denote the generalized averages of the $H^j_{ii}$ and $G^j_{ii}$.
We now define the GACV as

$GACV(\lambda_j) = \frac{1}{n}\sum_{i=1}^{n}\Big[ -\sum_{k=1}^{K_j} y_{ijk} f_{ijk} + b(f_{ij}) \Big] + \frac{1}{n}\sum_{i=1}^{n} (y_{ij1},\ldots,y_{ijK_j})\, H^j_S (G^j_S)^{-1} \begin{pmatrix} y_{ij1} - \mu_{ij1} \\ \vdots \\ y_{ijK_j} - \mu_{ijK_j} \end{pmatrix}.$   (29)

Details of the derivation of the GACV are given in Appendix C. Direct computation of the GACV for large sample sizes is slow and tends to be unstable (see Xiang and Wahba 1996, sec. 3.1). This explicit calculation can be avoided by using a technique in the spirit of the randomized trace method, provided that a solution (either exact or approximate) of the variational problem can be obtained at a lower cost. We propose a one-step randomized estimate of the GACV that is fast and cheap to calculate.

Given a square matrix $A$ with $A_{ii}$ ($1 \le i \le n$) being the $K \times K$ submatrices on the diagonal, we discuss how to obtain a randomized estimate of $S_A$. First, $n$ iid vectors of $K$ iid random variables distributed as N(0,1) are generated. Let $\epsilon_i = (\epsilon_{i1},\ldots,\epsilon_{iK})^T$ be the $i$th such vector and $\epsilon = (\epsilon_1^T,\ldots,\epsilon_n^T)^T$. Then $\delta = \mathrm{tr}(A)/(nK)$ can be estimated by $(\epsilon^T A \epsilon)/(nK)$. On the other hand, if $K > 1$, then $\gamma = (\sum_i \sum_{k_1 \ne k_2} a_{i,k_1k_2})/(nK(K-1))$. To estimate $\sum_i \sum_{k_1 \ne k_2} a_{i,k_1k_2}$, let $\bar{\epsilon}_i = (1/\sqrt{K})\sum_{k=1}^{K} \epsilon_{ik}$ and $\bar{\epsilon} = (\bar{\epsilon}_1,\ldots,\bar{\epsilon}_1, \bar{\epsilon}_2,\ldots,\bar{\epsilon}_n)^T$. Here $\bar{\epsilon}$ is a column vector with $K$ replicates of $\bar{\epsilon}_i$ for each $1 \le i \le n$. Note that $E\bar{\epsilon}^T A \bar{\epsilon} = \sum_i \sum_{k_1,k_2} a_{i,k_1k_2}$. Thus we can estimate $\gamma$ by $(\bar{\epsilon}^T A \bar{\epsilon} - \epsilon^T A \epsilon)/(nK(K-1))$ and obtain a randomized estimate of $S_A$.
S
In practice,therandomizedestimateofGACV iscalculated ation. To speed up the algorithm, however, only a one-step
by solvingthe nonlinearsystem on the perturbed data Y + … update will be calculated.We remark that all favorable prop-
j
and Y + …. Let fYj denote the solution of (26) using the erties of GACV and ranGACV are preserved for the block
originajl daNta andåljet fYj+… denote the solution using the one-stepSORalgorithmandapproximatesplineestimate.It is
åj veryeasytocarryoutthecomputation,asnoadditionalmatrix
perturbeddata.If we takefYj astheinitialvaluetoaNewton- decompositionis required.We iterativelyminimizeranGACV
åj
Ralphsoncalculationof fYj+… and run the iteration only once ineachstep oftheblockone-stepSOR iteration.Thisis done
by using all matrix decoåmj positions that have already been repeatedlyuntilsomeprespeci(cid:142)edconvergencecriteriaismet,
performed for calculatingfYj in the last step, then we obtain or the number of iterationsexceedsa prespeci(cid:142)ed limit.
the one-step solution fYj+…1å1.j Because 4¡I =¡f 54fYj1Y 5=0 We remark that it is a straightforward generalization to
åj åj j åj j extend the the algorithm and the ranGACV to observations
and 4¡2I =¡fT¡f 54fYj1Y 5=4¡2I =¡fT¡f 54fYj1Y + …5, we that come from other exponential families with no nuisance
åj j j åj j åj j j åj j
observethesimplerelation parameter. In fact, the original derivation of the GACV did
¡2I 1¡I not use the fact that the data were Bernoulli. However, the
fYj+…11=fYj åj fYj1Y +… ƒ åj fYj1Y +… GACV hasnotyetbeen testedondatafromotherexponential
åj åjƒ ¡fT¡f åj j ¡f åj j
j j j families.
¡2I 1 ¡I
=fYj åj fYj1Y ƒ …+ åj fYj1Y 5. MONTE CARLO SIMULATIONS
åjƒ ¡fT¡f åj j ƒ ¡f åj j
j j j In this section we demonstrate results from some Monte
=fYj+ Hj…0 (30) Carlosimulationstoevaluatetheperformanceoftheproposed
åj method. The comparative Kullback–Leibler distance (CKL)
Hence we have is used to measure the performance of the estimated values.
In our experience,the CKL distance provides a measure that
fYj+…11 fYj =Hj…0 (31)
åj ƒ åj agreesverywellwithhumanintuitionwheneyeballingthe(cid:142)t-
ted surfaces.
Thus …T4fåYjj+…11 ƒfåYjj5 = …THj… and …NT4fåYjj+…N11 ƒfåYjj5 = In all models presented here and in next section, we used
…SNTimHijla…Nr,lya,n…dTGwje…=can…To…bta…inTWa r4afnYdj+o…m11izefdYje5s,tiamndat…eToGfj…HS=j. rbelporcokdsufcoirngthkeertnenelssoraspsroocdiautcetdswpaicthecruepbriocdsupcliinnegskaesrnbeulisldtihnagt
ƒ j åj ƒ åj N N
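Combining (31) with the relations for $G^j$, the randomized quantities entering (32) require only extra calls to the one-step solver. A sketch (ours), where `one_step` is a placeholder for the "black box" described above:

```python
import numpy as np

def rangacv_traces(y, f_hat, one_step, W, n, K, rng):
    # f_hat: unperturbed one-step fit; W: block-diagonal weight matrix W_j
    eps = rng.standard_normal(n * K)
    diff = one_step(y + eps) - f_hat      # equals H^j eps by (31)
    tHt = eps @ diff                      # randomized eps^T H^j eps
    tGt = eps @ eps - eps @ (W @ diff)    # eps^T G^j eps, since G = I - W H
    return tHt, tGt
```

The same two lines applied with the replicated vector $\bar{\epsilon}$ give the $\bar{\epsilon}$ versions, from which the randomized generalized averages $\tilde{H}^j_S$ and $\tilde{G}^j_S$ in (32) are assembled.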
5. MONTE CARLO SIMULATIONS

In this section we demonstrate results from some Monte Carlo simulations to evaluate the performance of the proposed method. The comparative Kullback–Leibler (CKL) distance is used to measure the performance of the estimated values. In our experience, the CKL distance provides a measure that agrees very well with human intuition when eyeballing the fitted surfaces.

In all models presented here and in the next section, we used reproducing kernels associated with cubic splines as building blocks for the tensor product spaces that generate the $\xi$'s. (This construction was given in Wahba 1990, sec. 10.2, for general SS-ANOVA models.) We briefly note some details. Let $u \in [0,1]$; the null space of the cubic spline penalty functional ($\int (f'')^2$) is spanned by the linear functions $\phi_1(u) = 1$ and $\phi_2(u) = u - 1/2$. Define the reproducing kernel $r_1(u,v) = k_2(u)k_2(v) - k_4([u-v])$, where $u, v \in [0,1]$, $[u-v]$ is the fractional part of $u-v$, and $k_\ell(u) = B_\ell(u)/\ell!$, with $B_\ell(\cdot)$ the $\ell$th Bernoulli polynomial. In the case where $f_j$ is a function on $[0,1]$, the $\phi$'s are the linear functions above and $\xi_{j,\nu}(u) = r_1(x_{j,\nu}, u)$. When $f_j$ is a function on the unit square or a higher dimensional unit cube, the $\phi$'s are built up from tensor sums and products of $\phi_1$ and $\phi_2$, and the reproducing kernels that generate the $\xi_{j,\nu}$ are built up from tensor sums and products of $r_1$ and $r_0(u,v) = \phi_2(u)\phi_2(v)$, as given by Wahba (1990). Particular details for the model in (37) below were also given by Lin et al. (2000). In the experiments and data analysis that follow, the ranGACV is used to choose the smoothing parameters.
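A sketch of this kernel in code, with $B_2$ and $B_4$ written out explicitly:

```python
import numpy as np

def k2(u):
    return (u**2 - u + 1.0/6.0) / 2.0                  # B_2(u)/2!

def k4(u):
    return (u**4 - 2*u**3 + u**2 - 1.0/30.0) / 24.0    # B_4(u)/4!

def r1(u, v):
    # cubic-spline reproducing kernel on [0, 1]
    return k2(u) * k2(v) - k4(np.mod(u - v, 1.0))

# Null space of the cubic-spline penalty: phi_1(u) = 1, phi_2(u) = u - 1/2.
```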
5.1 Repeated Measurements for the Same Endpoint

The first example concerns the single smoothing parameter situation. We try to mimic the characteristics of possible ophthalmology data. Each subject has one endpoint of interest and paired observations. There is one observation-specific covariate, $X_{ik}$ ($k = 1, 2$). The $X_{i1}$'s are assumed to be uniformly distributed on the interval $(0.05, 0.95)$, $X_{i2} = X_{i1} + \epsilon_i$, and the $\epsilon_i$'s are uniformly distributed on $(-0.05, 0.05)$. The true conditional logit function is assumed to be $\mathrm{logit}\{P(Y_{ik}=1 \mid Y_{i(-k)}=0, x_i)\} = 2[\exp\{-30(x_{ik}-.25)^2\} + \sin(\pi x_{ik}^2)] - 2$, and the conditional log odds ratio is $\theta = \log \mathrm{OR}(Y_{i1}, Y_{i2} \mid x_i) = .8$. Three different sample sizes are used in this simulation: $n = 125, 250, 500$. For each sample size, 100 independent datasets are randomly generated according to the true joint distribution.

Figure 1 shows histogram plots of the estimated $\hat\theta$ for the three different sample sizes. The dotted lines represent the true value of .8. The fitted values appear to converge to the truth as the sample size increases. From the histograms, the estimator of $\theta$ appears to be approximately unbiased and normally distributed.

[Figure 1. Histograms of $\hat\theta$ for Three Different Sample Sizes. The dotted lines represent the true value of $\theta = .8$. The means of the estimated values are .7926, .7937, and .7686.]

Figure 2 plots the true conditional probability function and the estimated curves for each sample size, $P(Y_{ik}=1 \mid Y_{i(-k)}=0, x_{ik}) = e^{f(x_{ik})}/(1+e^{f(x_{ik})})$. For each sample size, the 100 fits are ranked according to the CKL distances between the estimated joint distributions and the truth. The 5th, 25th, 50th, 75th, and 95th best fits are plotted for each sample size. The true conditional logit function is a bimodal function. The trend is clear that when the sample size increases, the estimated curves become increasingly accurate. However, for a parametric model, there might be no prior knowledge about the bimodal nature of the truth. Hence a linear or even quadratic form will miss the true curve no matter how large the sample size.

[Figure 2. True (Dashed Line) and Estimated (Solid Line) Conditional Probability Functions.]
In the next experiment, we compare the proposed new "multivariate" method to the "univariate" fit. In ophthalmology studies, one question of interest is to estimate the probability of at least one eye developing a certain disease given the values of the predictor variables for a person. Assume that there is no eye-specific covariate. The $X_i$'s are uniformly distributed on $[0,1]$, and for each subject there are paired observations $(Y_{i1}, Y_{i2})$. We want to estimate the probability

$P(Y_{i1}=1 \text{ or } Y_{i2}=1 \mid x_i) = \frac{2e^{f_i} + e^{2f_i+\theta}}{1 + 2e^{f_i} + e^{2f_i+\theta}}$

from the observed data.

For this experiment, we assume that $p(x_i) = P(Y_{i1}=1 \text{ or } Y_{i2}=1 \mid x_i) = .8\sin(2.7 x_i^2) + .1$. Figure 3 plots the true $p(x)$. Four different values are used for $\theta$: 0, .4, .8, and 1.2; $\theta = 0$ corresponds to the case where $Y_{i1}$ and $Y_{i2}$ are independent. However, we pretend that this fact is unknown, and so estimate $\theta$ by the proposed algorithm. Straightforward calculation yields the following formula to compute $f_i$ for given $\theta$ and $P(Y_{i1}=1 \text{ or } Y_{i2}=1 \mid x_i)$:

$f_i = \log\frac{(p(x_i)-1) + \sqrt{(1-p(x_i))^2 + e^\theta p(x_i)(1-p(x_i))}}{e^\theta (1-p(x_i))}.$   (34)

[Figure 3. True $p(x) = P(Y_{i1}=1$ or $Y_{i2}=1 \mid x_i)$ Used for the Simulation Study.]
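A one-line transcription of (34) as a sketch; it solves the quadratic $e^{\theta}(1-p)u^2 + 2(1-p)u - p = 0$ for $u = e^{f_i}$:

```python
import numpy as np

def f_from_p(p, theta):
    # recover f_i from theta and p = P(Y_i1 = 1 or Y_i2 = 1 | x_i); cf. (34)
    q = 1.0 - p
    u = ((p - 1.0) + np.sqrt(q**2 + np.exp(theta) * p * q)) / (np.exp(theta) * q)
    return np.log(u)
```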
The experiment is conducted as follows. First, for the "univariate" fit, the only information needed is $Y_{Li}$, which is defined to be 0 when $Y_{i1} = Y_{i2} = 0$ and 1 otherwise. Here $P(Y_{Li}=1 \mid x_i) = p(x_i)$. We generate 100 datasets according to the true distribution and fit the data using the "univariate" penalized logistic regression based only on the derived data $Y_{Li}$'s. For the bivariate fit, we first calculate the true joint distribution of $(Y_{i1}, Y_{i2})$ according to the previous formula. For each value of $\theta$, 100 datasets are randomly generated and the joint distribution is estimated by the proposed multivariate method. Afterward, the probability $P(Y_{i1}=1 \text{ or } Y_{i2}=1 \mid x_i)$ can be derived from the estimated joint distribution. For every run, the CKL distance between the estimated $\hat{p}(x_i)$ and $p(x_i)$ is calculated.

The foregoing procedure is performed for three different sample sizes: $n = 100, 200, 400$. Figure 4 shows histograms

[Figure 4. Histograms of Estimated $\hat\theta$'s for $n=100$, $n=200$, and $n=400$. The dotted lines represent the true values of $\theta$. From left to right, the means of the estimated values are .0214, .3743, .7596, and 1.1624 for $n=400$; .1028, .3161, .7058, and 1.1025 for $n=200$; and .0619, .3719, .7746, and 1.1775 for $n=100$.]