Bayesian linear regression with skew-symmetric error distributions with applications to survival analysis FranciscoJ.Rubioa andMarcG.Gentonb ∗ aUniversityofWarwick,DepartmentofStatistics,Coventry,CV47AL,UK. bCEMSEDivision,KingAbdullahUniversityofScienceandTechnology,Thuwal23955-6900,SaudiArabia. E-mail:[email protected] ∗ 6 1 0 Abstract 2 WestudyBayesianlinearregressionmodelswithskew-symmetricscalemixturesofnormalerror n distributions.Thesekindsofmodelscanbeusedtocapturedeparturesfromtheusualassumptionof a J normalityoftheerrorsintermsofheavytailsandasymmetry.Weproposeageneralnon-informative 0 prior structurefor these regressionmodels and show that the correspondingposteriordistribution 1 is proper under mild conditions. We extend these propriety results to cases where the response variablesare censored. The latter scenariois ofinterest in the contextof acceleratedfailure time ] models, whichare relevantin survivalanalysis. We present a simulationstudy that demonstrates P goodfrequentistpropertiesoftheposteriorcredibleintervalsassociatedtotheproposedpriors.This A studyalsoshedssomelightonthetrade-offbetweenincreasedmodelflexibilityandtheriskofover- t. fitting. Weillustratetheperformanceoftheproposedmodelswithrealdata. Althoughwefocuson a modelswithunivariateresponsevariables,wealsopresentsomeextensionstothemultivariatecase t s intheSupportingWebMaterial. [ KeyWords: accelerated failuretimemodel; flexible errors; modelselection; multivariate; noninforma- 1 v tiveprior;skewness. 4 2 2 1 Introduction 2 0 Theclassical assumptionofnormalityoftheresidual errors inlinearregressionmodels(LRMs)canbe . 1 restrictive in practice. In many cases, the use of more flexible parametric distributional assumptions 0 is necessary to capture departures from normality. These departures often arise when the sample con- 6 1 tains outliers or when the residual errors are asymmetric. Several approaches have been proposed for : parametrically modelling these departures from normality such as the use of scale mixtures of normal v i distributions[18],andskew-ellipticaldistributions[3,34,5],amongothers(seealso[4]and[22]). Inthe X absenceofstrongpriorknowledgeaboutthemodelparameters,awayforconductingBayesianinference r a consists ofusingnoninformative priors(oftenreferredtoasobjectiveBayesianinference). Inageneral sense, these kinds of priors are functions of the parameters that produce posterior distributions with good frequentist properties. The use of objective Bayesian inference in LRMs with symmetric errors hasbeenwidelystudied. Inparticular, [18]studiedtheproprietyoftheposteriordistribution associated with Bayesian LRMs with an improper prior structure and residual errors distributed according to the familyofscalemixtures ofnormals(SMN).Theuseofmoregeneral priorstructures hasbeenrecently studiedin[13]and[24]forparticularmembersoftheSMNfamily. Inarelatedvein,[36]studiedtheuse ofJeffreys-type priorsinthecontextofaccelerated failuretime(AFT)modelswithSMNerrors(which areLRMsforthelogarithmofthesurvivaltimes). However,therearefewerstudiesrelatedtotheuseof noninformative priorswithflexibleerrordistributions thatallowforcapturingskewness. [19]proposed aclass of multivariate two-piece distributions that allowfor capturing skewness. Theyemployed these distributions for modelling the errors inmultivariate LRMsandalsoproposed animproper prior struc- ture, which is similar to that in [16]. Another recent reference is [30], who studied AFT models with 1 errors distributed according to the generalised extreme value distribution. They proposed an improper priorforthisLRM,buttheirconditions forthepropriety oftheposterior involve truncating thesupport of the prior distribution on the shape parameter, which in turns reduces the flexibility of the model, as wellasaconditiononthepriorforthescaleparameter. Here, we study the use of the class of skew-symmetric scale mixtures of normal (SSSMN) dis- tributions for modelling residual errors in LRMs. This class contains, for instance, the skew-normal distribution [2], the skew-t distribution [3], and the skew-logistic distribution [37], as well as the cor- responding symmetric models as particular cases. We propose a general improper prior structure that preserves thepropriety oftheposterior distribution inthesensethattheposterior existsunder thesame conditions asinthemodel withSMNerrors. Thisresult allowsustoappeal toexistent results inorder to show the propriety of the posterior associated to LRMs with SSSMNerrors and the proposed prior structure. WediscusstheuseoftheproposedBayesianmodelsinsurvivalanalysisaswellastheimpact ofusingflexibledistributions inthepredictionoftheremaininglifeofpatientsthatsurvivedbeyondthe endofthestudy. Thepaperisorganisedasfollows. InSection2wedescribe theLRMsofinterestandintroduce the proposednoninformativepriorstructure. Weprovideeasytochecksufficientconditionsforthepropriety ofthe corresponding posterior distribution. These results represent anextension toprevious results for LRMs with symmetric errors [16, 18, 13, 24]. In Section 3, we extend our results to cases where the samplecontainscensoredresponsevariablesforapplicationoftheproposedmodelstosurvivalanalysis. In Section 4, we present a simulation study where we illustrate the good frequentist properties of the 95% credible intervals obtained withthe proposed priors anddiscuss some issues associated tocertain skew-symmetric models. In Section 5 we present two applications with publicly available data sets in thecontextofbiometricmeasuresandsurvivalanalysis. AllproofsarepresentedintheSupportingWeb Material. 2 Linear regression with skew-symmetric errors 2.1 Bayesianmodel RecallfirstthatarealrandomvariableZ issaidtobedistributed accordingtoaskew-symmetricdistri- butionifitsprobabilitydensityfunction(PDF)canbewrittenas[39] s(z)=2f(z)ϕ(z), z R, (1) ∈ wheref isasymmetricPDFandϕ:R [0,1]isafunctionthatsatisfiesϕ(z)=1 ϕ( z). Wefocus → − − onthefamilyofparametricskew-symmetricdistributions ofthetype 2 z ξ z ξ s(z ξ,ω,λ,δ) = f − δ G λ − , (2) | ω ω (cid:12) ! ω (cid:12) (cid:18) (cid:19) (cid:12) where G is a cumulative distribution function (CDF)(cid:12)with continuous symmetric PDF g with support (cid:12) on R, ξ R is a location parameter, ω > 0 is a scale parameter, λ R, and δ ∆ R is a shape ∈ ∈ ∈ ⊂ parameter. Wewillrefertof asthe“baselinedensity”. Notethatthisstructureallowsforthecasewhere f = g. We do not consider the case where g contains an unknown shape parameter and g = f since 6 this would produce a model with 5 parameters that may play redundant roles. This structure covers many cases of practical interest such as the skew-normal distribution, the skew-t distribution [3] with δ degrees offreedom, theskew-logistic distribution [37],andtheskew-slash distribution [38]. Density (2)isasymmetricforλ = 0,ands = f forλ = 0. Ifarandomvariable Z hasdistribution (2),wewill 6 denoteitasZ SS(ξ,ω,λ,δ;f,g). ∼ Considernowthelinearregressionmodel: y =x β+ε , j =1,...,n, (3) j ⊤j j 2 i.i.d. where yj R, β is a p-dimensional vector of regression parameters, εj SS(0,ω,λ,δ;f,g), and ∈ ∼ X = (x ,...,x ) is aknown n pdesign matrix of full column rank. Insome cases, wemight be 1 n ⊤ × interestedoncentringtheregressionmodelonthemeanorsomequantile(suchasthemedian),inwhich case wehave toproperly centre theerror distribution. Thiscentring strategy istypically done after the estimation of the parameters has beenperformed. This point has beendiscussed by[34]inthecontext ofregressionmodelswithskew-elliptical distributions, aswellasin[5]. Weadoptthegeneralpriorstructure: p(λ,δ) π(β,ω,λ,δ) = , (4) ωa where p(λ,δ) is a proper prior. This prior structure covers the structure of several priors obtained by formalrulessuchastheindependenceJeffreysprior,theJeffreys-ruleprior,andthereferenceprior[32]. Giventhatthispriorisimproper,weneedtoinvestigateconditionsfortheexistenceofthecorresponding posteriordistribution. In order to provide a general propriety result, we restrict our study to the case when the baseline densityf in(2)belongstothefamilyofSMNdistributions. RecallthatasymmetricPDFf issaidtobe aSMNifitcanbewrittenas: f(z δ)= τ1/2φ(τ1/2z)dH(τ δ), z R, | | ∈ ZR+ where H isamixingdistribution withpositive support andφrepresents thestandard normal PDF.The family of scale mixtures of normals contains important models such as the normal distribution, the logistic distribution, the Student-t distribution, the Laplace distribution, the symmetric generalised hy- perbolic distribution, and the slash distribution. Under this setting, we have the following results (see theSupplementaryWebMaterialforaformalproof). (i) Considerthemodel(3)–(4). Assumethatf in(2)isascalemixtureofnormalsandthatrank(X)= p. Ifa=1,asufficientconditionfortheproprietyoftheposteriorof(β,ω,δ,λ)isn>p. (ii) Ifa>1,asufficientconditionfortheproprietyoftheposteriorisn>p+1 aand − a 1 τ− −2 p˜(δ)dH(τ δ)dδ < , (5) | ∞ Z wherep˜(δ)representsthemarginalprioronδ. Theseresults indicate thattheintroduction ofskewness inthedistribution oftheresidual errors, by meansoftheskew-symmetricconstruction(2),doesnotaffecttheexistenceoftheposteriordistribution, aslongasweuseaproperpriorontheskewnessparameterλ. Fora=1theproprietyoftheposterioris guaranteedbyhavingmoreobservationsthancovariatesfortheclassofLRMswithSSSMNerrors. The resultspresentedherearesatisfiedwithprobabilityone(foralmostallsetsofobservations). Wereferthe readerto[16,17],[18],and[36]foradiscussiononzero-probability eventsthatinduceimproperposte- riors. Thisincludes,forinstance,sampleswithacertainnumberofresponsesthatcanberepresentedas anexact combination oftheir covariates which, inprinciple, haveprobability zeroofoccurrence under a continuous model but that may appear in practice due to rounding of the measurements. For a > 1 the conditions for the existence of the posterior distribution become more restrictive and condition (5) has to be checked case by case. For the cases when the residual error distribution is a skew-normal oraskew-logistic (normal orlogistic baseline f),the condition (5)isautomatically satisfied[36], and, therefore,theconditionn>p+1 aissufficientfortheproprietyoftheposterior. Similarly,forcases − whenthebaselinePDFf isageneralisedhyperbolicdistribution(witheitherfixedtailparameterδ >0 orwithacompactlysupportedmarginalpriorp˜(δ))oraLaplacedistribution,theconditionn>p+1 a − isalsosufficientfortheexistenceoftheposterior[13,24]. 3 2.1.1 Theroleofλinskew-symmetricmodels An aspect that has been little discussed in the literature is the overall influence of the parameter λ on the shape of a skew-symmetric PDF (1). We already mentioned that the PDF (1) is left/right skewed forλ ≶ 0. However,the parameter λcontrols other features aswellsuchasspreadandlocation ofthe mode. Moreover,forsomecombinationsoff andG,theparameterλhaslittleinfluenceoftheshapeof the PDFfor a certain range of values around λ = 0. For instance, inthe skew-normal PDF,the shape parameter λ has little influence on the asymmetry when λ 1. Inthis region, λ mainly controls the | | ≤ location andspread of thePDF.Figure 1shows that askew-normal PDFwithparameter λ = 1canbe reasonably well approximated with a normal PDF.This phenomenon is also observed in other models where f andGhave lighter tails thannormal, whilemodels withheavier tails seemtobe exemptfrom this problem. Intuitively, this suggests that it is hard to estimate the parameters (ξ,ω,λ) (both from a Classical andaBayesianperspective) insomelight-tailed skew-symmetric models whenthetruevalue lies in a certain interval centred at λ = 0 and the sample size is small or moderate. Consequently, if one uses a skew-symmetric distribution, with lighter or equal tails than normal, to model the errors in a LRM, it is expected to observe high correlation between the estimator of the intercept regression parameter,thescaleparameter,andtheskewnessparameter(SeeSection4). Giventhattheseissuesonly appearwhentheerrorsarenearlysymmetric,asimplesolutionconsists oftestingforλ = 0inorderto identify thebest andmostparsimonious model for the data. Moreover, inthe context of our paper this issue haslittlerelevance since weareonlyconsidering thecases whenf isaSMN,leaving thenormal astheonlyproblematiccase. 0.5 0.4 0.3 0.2 0.1 0.0 -3 -2 -1 0 1 2 3 4 Figure 1: Normal PDF with parameters (0.505,0.817) (dashed line) vs. skew-normal PDF with parameters (0,1,1)(continuousline). 2.2 Discussiononthechoiceofthepriorsfortheshapeparameters Thechoiceofthepriorsfortheshapeparameters(λ,δ)deserves furtherdiscussion. Wepresent asum- maryof some informative an non-informative priors used for the shape parameters inskew-symmetric models. 2.2.1 Priorsforλ (i) Jeffreys priors. For the skew-normal regression model, we can employ the reference prior of the skewness parameter λ. This prior was obtained by [26], who showed that this is proper and well defined. [8] showed that this prior can be reasonably well approximated using a Student-t distribution with 1/2 degrees of freedom and scale parameter σ = π/2. In [9], a similar result 0 wasalso established for the skew-t model withdegrees of freedom δ > 2. [32]characterised the marginalreferenceprioroftheparameterλinalargeclassofskew-symmetricmodelsandshowed thatthisissymmetric,withtailsoforderO(λ 3/2),andproper. Thesekindsofpriorshavebeen − | | shown to induce posteriors with good frequentist properties [26, 9, 32]. However, the reference 4 priors ofλare welldefinedat λ = 0only ifthe second moment off is finite. Asshownin[32], theJeffreys/reference priorofλcanbecalculatedasfollows g(λx)2 p(λ) ∞x2f(x) dx. (6) ∝sZ0 G(λx)[1−G(λx)] (ii) Matching priors. [10] calculated the matching prior for the skewness parameter λ in the skew- normalmodel. Theyshowedthatthispriorisproper,data-independent,bimodal,withtailsoforder O(λ 3/2),andthatthisprior induces aposterior withgoodfrequentist properties. However,the − | | bimodality of this prior might complicate sampling from the posterior for small and moderate samples. Tothe best ofourknowledge, matching priors have not beencalculated forother skew- symmetricmodels. (iii) Heuristic priors. An alternative non-informative prior can be constructed by using the popu- lar alternative parameterisation of the skew-normal distribution in terms of the parameter ρ = λ/√1+λ2 ( 1,1). Ifweassignauniformpriorontheparameterρ,thisinducesthefollowing ∈ − properprioronλ: 1 p(λ) . (7) ∝ (1+λ2)3/2 Thispriorhasbeenusedrecentlyin[11]. (iv) Informative priors. In cases where there is reliable prior information, one can appeal to any proper prior thatcancapture thefeatures ofour prior beliefs. Inparticular, [11]proposed theuse of the skew-normal distribution as a prior distribution, with hyperparameters (ξ ,ω ,λ ), for λ. 0 0 0 This prior allows the user to control the mass allocated on values of λ ≶ 0, which are related to negativeandpositiveskewness. 2.2.2 Priorsforδ Regarding the tail parameter δ, [33] proposed a general method for constructing weakly informative priors forkurtosis parameters. Theidea consists of assigning auniform prior toabounded measure of kurtosis appliedtothesymmetricbaseline densityf( δ). Inorderforthismethodtobeapplicable, the ·| chosen measure of kurtosis must be a one-to-one function of the parameter δ. This strategy induces a proper prior on the parameter δ which can be interpreted as a weakly informative prior, in the sense that itassigns aflatprior onafunction that represents the influence of the parameter δ onthe shape of the density. In the case of the degrees of freedom of the Student-t distribution, [33] showed that this strategy produces a prior withabehaviour similar tothat ofthe approximation tothe Jeffreys prior for thisparameter[21]proposedin[23]. Inaddition,thesepriorsonδcanbecoupledwiththeJeffreyspriorsinordertoproduceajointprior on(λ,δ)byusingthedecompositionp(λ,δ) =p(λδ)p(δ),where | g(λx)2 p(λδ) ∞x2f(xδ) dx. | ∝sZ0 | G(λx)[1−G(λx)] Thisprior alsohas tails oforder O(λ 3/2), foreachvalue of δ. Iftheprior p(λδ) isnottoovariable − | | | for different values of δ, we may opt for an independent structure p(λ,δ) = p(λ)p(δ) for practical purposes. Forinstance,intheskew-tdistribution,thepriorp(λδ)canbereasonablywellapproximated | witha Student-t distribution with1/2degrees of freedom and scale parameter σ . Areasonable value 0 ofσ depends onthe degrees offreedom δ. Figure 2shows the values of σ that produce areasonable 0 0 approximation (obtained by matching the mode) for δ = 2.1,2.2,...,50. We observe that σ varies 0 between 0.5 and π/2. In this case, for the sake of simplicity, we adopt the prior p(λ,δ) = p(λ)p(δ), 5 where p(λ) is a Student-t distribution with 1/2 degrees of freedom and scale parameter σ = π/2, 0 which corresponds to the reference prior of λ. In Section 4 we show, through a simulation study, that thispriorstructureinducesaposteriorwithgoodfrequentistproperties. 4 1. 2 1. 0 01. s 8 0. 6 0. 4 0. 10 20 30 40 50 d Figure 2: Valueofthescaleparameterω usedintheStudent-tapproximationtotheJeffreyspriorofλinthe 0 skew-tmodelwithδdegreesoffreedom. 3 Accelerated failure time models A closely related family of LRMs that are of great interest in survival analysis are AFT models. The basicideabehindthiskindofLRMconsists ofmodellingasetofsurvivaltimesT = (T ,...,T ) in 1 n ⊤ termsofasetofcovariatesβthroughthemodelequation: logT =x β+ε , j =1,...,n. (8) j ⊤j j Inthiscontext,thedistributionoftheerrorsε istypicallyassumedtobenormalorthatexp(ε )follows j j aWeibull distribution. Giventhatthe normality assumption canberestrictive inpractice, several alter- native models have been proposed. For instance, from a classical inferential framework, [7] proposed theuseoftheBirnbaum-Saundersdistribution, whichcorresponds tomodellingthesurvivaltimesT as j alog-Birnbaum-Saundersdistribution;and[31]employedtheclassoflog-two-piecedistributions. From aBayesianperspective, [36]employedtheclassoflogscalemixture ofnormalsformodellingsurvival times;meanwhile[30]employedtheloggeneralisedextremevaluedistribution. Weconsidertheclassoflogskew-symmetric(LSS)distributions [28]: 2 logt ξ logt ξ s(tξ,ω,λ,δ) = f − δ G λ − , t>0, (9) l | ωt ω ω (cid:18) (cid:12) (cid:19) (cid:18) (cid:19) (cid:12) for modelling the survival times Tj in (8), where (cid:12)f and g are as described in Section 2. It includes, for example, the log-skew-normal and log-skew-t distributions [6, 25]. This class of distributions can also be seen as an extension of the class of log-symmetric distributions (which is obtained with λ = 0). If a positive random variable T has distribution (9) we denote it as T LSS(ξ,ω,λ,δ;f,g). ∼ More specifically, we assume that T x ,β,ω,λ,δ LSS(x β,ω,λ,δ;f,g), that f in (9) is a scale j| j ∼ ⊤j mixture ofnormals, andthat rank(X) = p. Ifweadopt the prior structure (4)for this model, then the corresponding posterior is proper under the conditions in Section 2. However, in the context of AFT models, the presence of (right–, left–, and interval–) censored responses is quite common. Suppose that T x ,β,ω,λ,δ LSS(x β,ω,λ,δ;f,g), that f in (9) is a scale mixture of normals, and that j| j ∼ ⊤j rank(X) = p. Consider the prior structure (4) for this model. Suppose also that n nobservations c ≤ arecensoredandn =n n areobserved. Then, o c − 6 (i) Ifa=1,asufficientconditionfortheproprietyofthecorresponding posteriorisn >p. o (ii) Ifa > 1, asufficient condition forthe propriety of theposterior isn > p+1 atogether with o − (5). IntheSupplementaryWebMaterialweprovideastudyoftheextremecaseinwhichalloftheobserva- tions are censored (Corollary 2). This scenario requires a more careful analysis given that, intuitively, samplescontainingonlycensoredobservationscontainlittleinformationabouttheparameters,thus,one hastobecarefulaboutusingnon-informative priorsinsuchextremescenarios. 4 Simulation study In this section we present a simulation study that illustrates the performance of the proposed models. Westudytheregressionmodel: y =β +β x +β x +ε , j =1,...,n, j 1 2 1j 3 2j j wherewesimulatethevariables x andx fromastandardnormaldistribution andconsider different 1j 2j combinations ofthedistributionoftheresidualerrorsandthesamplesizen. In the first scenario, we simulate the residual errors from a skew-normal distribution with scale parameter ω = 0.5 and skewness parameter λ = 0,1,2,3,4,5, (β ,β ,β ) = (1,2,3) and n = 1 2 3 100,250,500. Negative values of λ produce similar results and are, therefore, omitted. We adopt the product prior structure (4) with a = 1 and p(λ) given by the reference prior of this parameter [26]. We use the Student-t approximation from [8] to facilitate its implementation. For each of these sce- narios, we obtain N = 1,000 samples of size 2,000 from the posterior distribution using the R [29] t-walkMarkovChainMonteCarlo(MCMC)sampler[14]afteraburn-inperiodof5,000iterationsand thinned to every 25th iteration (55,000 iterations in total). Then, we calculate the proportion of 95% credible intervals that include the true value of theparameters, the 5%, 50%, and95%quantiles of the maximum likelihood estimators (MLEs), the maximum a posteriori (MAP) estimators, and posterior median estimators, as well as the median Bayes factor associated to the hypothesis H : λ = 0. The 0 Bayes factor is approximated using the Savage-Dickey density ratio [15]. In the second scenario, we simulatetheresidualerrorfromaskew-logisticdistributionwithscaleparameterω =0.5andskewness parameterλ=0,1,2,3,4,5. Weadopttheproductpriorstructure(4)witha=1andp(λ)givenbythe referencepriorofthisparameter[32]. WeusetheStudent-tapproximation(1/2degreesoffreedomand scale4/3)from[32]tofacilitateitsimplementation. Inthethirdscenario,wesimulatetheresidualerrors from askew-t distribution withdegrees offreedom δ = 3andscale parameter ω = 0.5. Weadopt the prior onthe skewness parameter described inthe firstsimulation scenario. Forthe degrees of freedom δ,weusetheapproximationtotheJeffreyspriorproposedin[23]: 2dδ π(δ) = . (10) (δ+d)3 Wefixthehyperparameter d=2,whichinducesapriorwithmodeatδ =1. Results are reported inTables 1–3 below and Section 3, Tables 1S–6S, of the Supplementary Web Material. For the first scenario (skew-normal errors, Tables 1–3), we can observe that the coverage of the credible intervals of the parameter ω is greatly affected when the true value of λ is zero or one. Moreover, the Bayes factors associated to H : λ = 0 favours the normal model in both cases, for 0 all sample sizes, when λ = 0,1, which emphasises the difficulty in identifying models with λ 1. | | ≤ ThesescenarioscorrespondtothecasesdescribedinSection2.1.1wheretheskew-normaldistributionis nearlyorexactlysymmetric, whichinturnscomplicates theidentification oftheparameters (β ,ω,λ), 1 andinducesastrongcorrelationbetweentheseparameters. Figures3–4showthescatterplotsassociated toa posterior sample of (β ,β ,β ,ω,λ) for n = 250 and λ = 1,5, respectively. These figures show 1 2 3 thestrongcorrelation betweentheparameters(β ,ω,λ)forλ = 1,whilethecorrelation betweenthese 1 7 parameters seems to be lower for λ = 5. We can also observe from these tables that the coverage improves asλ 2andthat the coverage proportions converge tothenominal value as thesample size ≥ increases. Thisstudyalsoindicatesthatweneedatleast100observationsinordertogetagoodcoverage. Forthesecondandthirdscenarios (Tables1S–6S),weobserveagoodcoverageforallthesamplesizes aswellasagoodbehaviouroftheBayesianestimatorscomparedtothatofthecorresponding MLEs. IntheSupplementaryWebMaterial,Tables7S–9S,wepresentanadditionalsimulationstudyusing the configuration in the first scenario withthe prior (7) for the parameter λ. This study shows that the prior (7), that has been considered as a noninformative prior in the literature, induces a posterior with poorfrequentist properties. 1.90 2.00 0.3 0.4 0.5 0.6 5.4 b 5.2 1 5.0 4.8 2.00 b 2 1.90 3.05 b 3 3.00 2.95 0.6 0.5 w 0.4 0.3 2.5 l 1.5 0.5 −0.5 4.8 5.0 5.2 5.4 2.95 3.00 3.05 −0.5 0.5 1.5 2.5 Figure3: Scatterplotforasingleposteriorsimulationof(β ,β ,β ,ω,λ)withn=250andλ=1. 1 2 3 8 1.90 2.00 2.10 0.4 0.5 0.6 0.7 5.10 b 1 5.00 4.90 2.10 2.00 b 2 1.90 3.05 b 3 3.00 2.95 2.90 0.7 0.6 w 0.5 0.4 12 10 l 8 6 4 2 4.90 5.00 5.10 2.90 2.95 3.00 3.05 2 4 6 8 10 Figure4: Scatterplotforasingleposteriorsimulationof(β ,β ,β ,ω,λ)withn=250andλ=5. 1 2 3 5 Applications Inthis section, weillustrate the performance of the proposed Bayesian models withtworeal data sets. Inthefirstexample,werevisitthepopular“AustralianAthletes”dataset. Wecomparetheperformance of 4 types of distributional assumptions on the residual errors in a LRM studied in [1]. In the second example we present an application in the context of survival analysis of cancer patients. Simulations fromthecorresponding posteriordistributions areobtainedusingthet–walk[14]. Modelcomparisonis conductedviaBayesfactorswhicharecalculatedusinganimportancesampling(andcorroboratedusing Laplace’s method), the Bayesian information criterion (BIC), and log-predictive marginal likelihood (LPML).LPMLisameasure that ranks themodels ofinterest intermsoftheir predictive performance (see [35] for anoverview that includes the case with interval censored observations). Larger values of LPML indicate a better predictive performance. The use of the Bayes factors with improper priors is justifiedinthiscontextsinceweusethesamepriorstructure(withimproperpriorsonlyonthecommon parameters) forthedifferent modelsinordertoavoidthepresence ofarbitraryconstants (see[36]fora discussiononthispoint). Rcodesareavailableuponrequest. 5.1 Australianathletesdata Ourfirstapplicationconcernsthestudyoftheregressionmodel[1]: Lbm =β Ht +β Wt +ε , j =1,...,102, (11) j 1 j 2 j j where Lbm , Ht , and Wt denote the lean body mass, the height, and the weight of 102 Australian j j j maleathletesfromtheAustralianInstituteofSport(AIS)dataset. Wecomparethemodelsobtainedwith four distributional assumptions onthe residual errors ε : askew-normal distribution SN(0,ω,λ) [2],a j skew Student-t distribution SSt(0,ω,λ,δ) with unknown degrees of freedom, and the corresponding symmetric submodels (normal and Student-t). We adopt the prior structure (4) with a = 1. For the 9 Method MLE MAP Median Coverage BF 5% 50% 95% 5% 50% 95% 5% 50% 95% λ=0 β1 4.456 4.917 5.521 4.463 4.996 5.519 4.559 4.999 5.380 0.987 – β2 1.913 2.002 2.089 1.911 2.001 2.091 1.916 2.003 2.085 0.956 – β3 2.908 3.001 3.087 2.905 2.999 3.086 2.906 3.000 3.087 0.942 – ω 0.492 0.613 0.761 0.475 0.542 0.714 0.504 0.581 0.699 0.849 – λ -1.993 0.285 2.224 -1.452 -0.003 1.613 -1.196 0.009 1.520 0.989 2.141 λ=1 β1 4.788 5.009 5.317 4.796 5.057 5.649 4.856 5.193 5.507 0.947 – β2 1.931 2.001 2.069 1.927 2.001 2.069 1.928 2.002 2.067 0.950 – β3 2.930 3.001 3.066 2.929 3.001 3.071 2.931 3.002 3.066 0.963 – ω 0.373 0.485 0.635 0.388 0.450 0.613 0.417 0.483 0.603 0.994 – λ 0.007 0.992 2.550 -0.988 0.190 2.063 -0.713 0.248 1.974 0.964 2.029 λ=2 β1 4.886 5.003 5.252 4.885 5.009 5.435 4.900 5.059 5.359 0.911 – β2 1.941 2.001 2.058 1.938 2.003 2.060 1.942 2.001 2.058 0.955 – β3 2.944 3.000 3.060 2.941 3.001 3.061 2.943 3.001 3.059 0.952 – ω 0.354 0.496 0.600 0.336 0.447 0.593 0.365 0.462 0.589 0.947 – λ 0.451 2.083 4.942 -0.315 1.585 3.389 0.035 1.441 3.973 0.901 1.093 λ=3 β1 4.909 5.002 5.115 4.904 5.003 5.147 4.915 5.020 5.257 0.912 – β2 1.946 1.998 2.047 1.944 1.998 2.050 1.947 1.999 2.047 0.955 – β3 2.949 3.000 3.054 2.948 3.000 3.054 2.950 3.001 3.053 0.950 – ω 0.395 0.491 0.590 0.319 0.484 0.585 0.359 0.482 0.587 0.929 – λ 1.488 3.225 10.269 -0.154 2.463 5.112 0.577 2.742 6.939 0.904 0.223 λ=4 β1 4.928 5.002 5.084 4.922 5.000 5.091 4.929 5.008 5.137 0.946 – β2 1.954 2.000 2.046 1.953 2.001 2.047 1.954 2.001 2.045 0.964 – β3 2.954 3.002 3.047 2.953 3.003 3.048 2.955 3.002 3.048 0.964 – ω 0.407 0.495 0.576 0.369 0.488 0.574 0.380 0.490 0.574 0.950 – λ 2.298 4.505 108.781 -1.076 3.227 8.417 1.502 3.807 12.553 0.941 5×10−2 λ=5 β1 4.933 4.998 5.068 4.930 4.997 5.074 4.938 5.000 5.096 0.954 – β2 1.954 1.998 2.043 1.953 1.999 2.043 1.956 1.999 2.044 0.955 – β3 2.960 3.003 3.046 2.959 3.003 3.047 2.962 3.003 3.045 0.969 – ω 0.418 0.499 0.575 0.397 0.494 0.574 0.402 0.498 0.578 0.958 – λ 2.997 5.883 138.560 -17.640 3.930 12.810 2.192 4.929 21.107 0.943 2×10−2 Table1: Pointestimators,coverageproportionsandBayesfactors: Linearregressionmodelwithresidualerrors simulatedfromaskew-normaldistribution,n=100. 10