Bayesian Hypothesis Assessment in Two-arm Trials Using Relative Belief Ratios

Saman Muthukumarana and Michael Evans∗

arXiv:1401.4215v1 [stat.AP] 17 Jan 2014

Abstract

This paper develops a Bayesian approach for assessing equivalence and non-inferiority hypotheses in two-arm trials using relative belief ratios. A relative belief ratio is a measure of statistical evidence and can indicate evidence either for or against a hypothesis. In addition to the relative belief ratio, we also compute a measure of the strength of this evidence as a calibration of the relative belief ratio. Furthermore, we make use of the relative belief ratio, as a measure of evidence, to assess whether a given prior induces bias either for or against a hypothesis. Prior elicitation, model checking and checking for prior-data conflict procedures are developed to ensure that the choices of model and prior made are relevant to the specific application. We highlight the applicability of the approach and illustrate the proposed method by applying it to a data set obtained from a two-arm clinical trial.

Key words and phrases: equivalence, noninferiority, relative belief ratios, statistical evidence, bias induced from a prior, model checking and checking for prior-data conflict.

∗Saman Muthukumarana is Assistant Professor, Department of Statistics, University of Manitoba, Winnipeg, Manitoba R3T 2N2, Canada. Michael Evans is Professor, Department of Statistics, University of Toronto, Toronto, Ontario, M5S 3G3, Canada. The authors were partially supported by research grants from the Natural Sciences and Engineering Research Council of Canada.

1 Introduction

Recently, hypothesis testing has been an active research topic in various types of two-arm clinical trials. As an example, a clinician may want to demonstrate that a new treatment is not worse than a reference treatment (also known as active control or standard treatment) by more than a specified margin [1].
This helps in assessing whether a less toxic, easier to administer, or less expensive treatment is medically noninferior to a reference treatment. This kind of clinical trial, where the intention is to demonstrate that the new treatment is not inferior to the standard treatment by more than a small predefined margin $\delta$, is known as a noninferiority trial. Here $\delta > 0$ is known as the prespecified clinically irrelevant or non-inferiority margin. Two-arm noninferiority trials of a new treatment and a well established reference treatment are an attractive option as in certain settings they avoid exposing patients to placebo situations. There has been a series of articles on this topic; see, for example, special issues of Statistics in Medicine (Volume 22, Issue 2, 2003) and Journal of Biopharmaceutical Statistics (Volume 14, Number 2, 2004).

Sometimes in clinical trials, the goal is not to show that the new treatment is better, but rather that it is equivalent to the reference treatment. This kind of clinical trial is known as an equivalence trial [2]. In an equivalence trial, the aim is to show that the new treatment and the reference treatment have equal efficacy. For practical purposes, one can select a margin $\delta$ such that two treatments can be considered not to differ when their true difference lies in the interval of clinical equivalence $(-\delta, \delta)$. Note that this is different from testing a difference between two treatments, which is a two-sided test known as a superiority test in the clinical literature. In this case $\delta > 0$ is called the clinical equivalence margin. Note that in either case, $\delta$ must be defined a priori.

There is a considerable literature on the related problems of hypothesis tests in clinical trials. As a simple frequentist method, one can use the standard t test for testing these hypotheses. At a higher level a generalized p-value approach may be applied using a generalized test function [3].
One can also perform tests using the ratio of the averages instead of the difference between the averages [4, 5]. Bayesian non-inferiority tests for proportions in two-arm trials with a binary primary endpoint [6] and normal means [7] are also considered in recent literature.

This paper considers a novel Bayesian approach for assessing a hypothesis in two-arm trials using relative belief ratios. A relative belief ratio for a hypothesized value of a parameter of interest is interpreted as the evidence that the hypothesized value is correct. We may obtain evidence either for or against the hypothesized value. Associated with a relative belief ratio is a measure of the strength of this evidence and this may be weak, strong or inconclusive. General inferences based on relative belief ratios are developed in [8, 9, 10]. We discuss relative belief ratios and associated theory in Section 2 and illustrate this by application to the problem of interest. In Section 3 we consider the elicitation of the prior and how we can measure the suitability of the prior both with respect to its agreement with what the data say and with respect to measuring the bias a prior puts into the analysis. In Section 4, the approach is applied to a data set obtained from a two-arm clinical trial. We conclude with a short discussion in Section 5.

2 Inferences Based on Relative Belief Ratios

Suppose we have a statistical model, as given by a collection of densities $\{f_\theta : \theta \in \Theta\}$, and a prior $\pi$ on $\Theta$. After observing data $x$, the posterior distribution of $\theta$ is given by the density

$$\pi(\theta \mid x) = \frac{\pi(\theta) f_\theta(x)}{m(x)}$$

where $m(x) = \int_\Theta \pi(\theta) f_\theta(x)\, d\theta$. For an arbitrary parameter of interest $\psi = \Psi(\theta)$ we denote the prior and posterior densities of $\psi$ by $\pi_\Psi$ and $\pi_\Psi(\cdot \mid x)$, respectively. The relative belief ratio for a hypothesized value $\psi_0$ of $\psi$ is defined by

$$RB_\Psi(\psi_0) = \frac{\pi_\Psi(\psi_0 \mid x)}{\pi_\Psi(\psi_0)}, \tag{1}$$

the ratio of the posterior to the prior at $\psi_0$.
As such, $RB_\Psi(\psi_0)$ measures how beliefs that $\psi_0$ is the true value have changed from a priori to a posteriori. Considering the case when the prior for $\psi$ is discrete, $RB_\Psi(\psi_0) > 1$ means that the data have led to an increase in the probability that $\psi_0$ is correct, and so we have evidence in favor of $\psi_0$, while $RB_\Psi(\psi_0) < 1$ means that the data have led to a decrease in the probability that $\psi_0$ is correct, and so we have evidence against $\psi_0$. As discussed in [10], this interpretation is also appropriate in the continuous case via a consideration of limits.

Clearly relative belief ratios are similar to Bayes factors, as they are both measuring change in belief, but the Bayes factor does this by comparing posterior to prior odds while the relative belief ratio compares posterior to prior probabilities and so is somewhat simpler. In fact, in certain circumstances the relative belief ratio and Bayes factor can be considered as equivalent but this is not always true. The full relationship between these quantities is discussed in [10].

One problem that both the relative belief ratio and the Bayes factor share as measures of evidence is that it is not clear how they should be calibrated. Certainly the bigger $RB_\Psi(\psi_0)$ is than 1, the more evidence we have in favor of $\psi_0$, while the smaller $RB_\Psi(\psi_0)$ is than 1, the more evidence we have against $\psi_0$. But what exactly does a value of $RB_\Psi(\psi_0) = 20$ mean? It would appear to be strong evidence in favor of $\psi_0$ because beliefs have increased by a factor of 20 after seeing the data. But what if other values of $\psi$ had even larger increases? While calibrations of Bayes factors have been suggested [11, 12, 13], the proposed scales seem arbitrary and it is not at all clear that there is a universal scale on which Bayes factors or relative belief ratios can be calibrated.
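To make the comparison between (1) and the Bayes factor concrete, the following minimal Python sketch computes both for a parameter with a discrete prior. The three-point prior and the likelihood values are hypothetical numbers chosen only for illustration, and the function names are ours rather than part of any package.

```python
def relative_belief_ratios(prior, likelihood):
    """Posterior probabilities and relative belief ratios for a discrete psi."""
    # Prior predictive m(x): the likelihood averaged over the prior.
    m = sum(prior[v] * likelihood[v] for v in prior)
    posterior = {v: prior[v] * likelihood[v] / m for v in prior}
    rb = {v: posterior[v] / prior[v] for v in prior}
    return posterior, rb


def bayes_factor(prior_prob, posterior_prob):
    """Ratio of posterior odds to prior odds for a single value."""
    prior_odds = prior_prob / (1 - prior_prob)
    posterior_odds = posterior_prob / (1 - posterior_prob)
    return posterior_odds / prior_odds


# Hypothetical prior on three values of psi and likelihood at the observed x.
prior = {0: 0.5, 1: 0.3, 2: 0.2}
likelihood = {0: 0.10, 1: 0.40, 2: 0.20}

posterior, rb = relative_belief_ratios(prior, likelihood)
for v in prior:
    print(v, round(rb[v], 3), round(bayes_factor(prior[v], posterior[v]), 3))
```

In this toy setting the two measures agree on the direction of the evidence for each value but differ in magnitude, which is one way to see why neither comes with an obvious universal scale.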
A more useful calibration of (1) is given by

$$\Pi_\Psi\big(RB_\Psi(\psi) \le RB_\Psi(\psi_0) \mid x\big) \tag{2}$$

which is the posterior probability that the true value of $\psi$ has a relative belief ratio no greater than that of the hypothesized value $\psi_0$. If we interpret $RB_\Psi(\psi_0)$ as the measure of the evidence that $\psi_0$ is the true value, we see that (2) is the posterior probability that the true value has evidence no greater than that for $\psi_0$.

While (2) may look like a p-value, we see that it has a very different interpretation. For when $RB_\Psi(\psi_0) < 1$, so we have evidence against $\psi_0$, then a small value for (2) indicates a large posterior probability that the true value has a relative belief ratio greater than $RB_\Psi(\psi_0)$ and so we have strong evidence against $\psi_0$. If $RB_\Psi(\psi_0) > 1$, so we have evidence in favor of $\psi_0$, then a large value for (2) indicates a small posterior probability that the true value has a relative belief ratio greater than $RB_\Psi(\psi_0)$ and so we have strong evidence in favor of $\psi_0$. Notice that in the set $\{\psi : RB_\Psi(\psi) \le RB_\Psi(\psi_0)\}$, the 'best' estimate of the true value is given by $\psi_0$ simply because the evidence for this value is the largest in this set.

Various results have been established in [10] supporting both (1), as the measure of the evidence, and (2), as the strength of that evidence.

As a measure of the strength of the evidence, (2) seems to work best when the posterior probabilities for all the possible values of $\psi$ are all small or even 0, as in the continuous case. When some of these values have large posterior probabilities we can augment (2) as follows. If the prior $\pi_\Psi$ corresponds to a discrete distribution with $\pi_\Psi(\psi_0) > 0$, we have that

$$\pi_\Psi(\psi_0 \mid x) \le \Pi_\Psi\big(RB_\Psi(\psi) \le RB_\Psi(\psi_0) \mid x\big) \le RB_\Psi(\psi_0). \tag{3}$$

The right-hand inequality holds generally, see [10], while the left-hand inequality requires discreteness. Suppose $\pi_\Psi(\psi_0 \mid x)$ and (2) are both small and notice that this happens whenever $RB_\Psi(\psi_0)$ is small.
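For a discrete prior, the strength (2) is a finite sum of posterior probabilities, so the bounds in (3) can be checked directly. The sketch below does this in Python for a hypothetical three-point prior and posterior; the numbers and the helper name `strength` are illustrative assumptions only.

```python
def strength(prior, posterior, psi0):
    """Posterior probability of {psi : RB(psi) <= RB(psi0)}, as in (2)."""
    rb = {v: posterior[v] / prior[v] for v in prior}
    return sum(posterior[v] for v in prior if rb[v] <= rb[psi0])


# Hypothetical discrete prior and posterior on three values of psi.
prior = {0: 0.5, 1: 0.3, 2: 0.2}
posterior = {0: 0.2, 1: 0.6, 2: 0.2}

results = {}
for psi0 in prior:
    rb0 = posterior[psi0] / prior[psi0]
    results[psi0] = (posterior[psi0], strength(prior, posterior, psi0), rb0)
    # Each triple reads: lower bound in (3), strength (2), upper bound in (3).
    print(psi0, results[psi0])
```

Reporting the three quantities side by side, together with the prior probability, matches the kind of summary that is useful in the discrete case.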
In this case, with $\pi_\Psi(\psi_0 \mid x)$ and (2) both small, we clearly have strong evidence against $\psi_0$ when $RB_\Psi(\psi_0) < 1$ and weak evidence for $\psi_0$ when $RB_\Psi(\psi_0) > 1$. Also, when $\pi_\Psi(\psi_0 \mid x)$ and (2) are both big, then we have only weak evidence against $\psi_0$ when $RB_\Psi(\psi_0) < 1$ and strong evidence for $\psi_0$ when $RB_\Psi(\psi_0) > 1$.

The other possibility is that the posterior probability $\pi_\Psi(\psi_0 \mid x)$ is small and (2) is big. If the prior probability $\pi_\Psi(\psi_0)$ is big and $RB_\Psi(\psi_0) < 1$, then this suggests that we have indeed obtained strong evidence against $\psi_0$ because (2) is big only because there are many other values of $\psi$ for which there is evidence against $\psi$ at least as strong as the evidence against $\psi_0$. If, however, $\pi_\Psi(\psi_0)$ is small and $RB_\Psi(\psi_0) < 1$, then we have weak evidence against $\psi_0$ because $\pi_\Psi(\psi_0 \mid x)$ is small due to $\pi_\Psi(\psi_0)$ being small. When $\pi_\Psi(\psi_0)$ is big and $RB_\Psi(\psi_0) > 1$, then $\pi_\Psi(\psi_0 \mid x)$ must be big as well, so no ambiguity arises, while when $\pi_\Psi(\psi_0)$ is small, then again $\pi_\Psi(\psi_0 \mid x)$ is small due to $\pi_\Psi(\psi_0)$ being small, and so in both situations we have strong evidence in favor of $\psi_0$ via (2). So the only context where (2) might not suffice as a measure of the strength of the evidence given by $RB_\Psi(\psi_0)$ is when $\psi$ has a discrete prior distribution with $\pi_\Psi(\psi_0)$ of non-negligible size, $\pi_\Psi(\psi_0 \mid x)$ small and (2) big. In general there is no harm in the discrete case in quoting (2), $\pi_\Psi(\psi_0 \mid x)$ and $\pi_\Psi(\psi_0)$ as part of the analysis of the strength of the evidence given by $RB_\Psi(\psi_0)$, and we recommend this.

There is another issue associated with using $RB_\Psi(\psi_0)$ to assess the evidence that $\psi_0$ is the true value. One of the key concerns with Bayesian inference methods is that the choice of the prior can bias the analysis in various ways. For example, in many problems Bayes factors and relative belief ratios can be made arbitrarily large by choosing the prior to be increasingly diffuse.
This phenomenon is known as the Jeffreys-Lindley paradox because a diffuse prior is supposed to represent less information. An approach to dealing with this paradox is discussed in [10]. Given that we accept that $RB_\Psi(\psi_0)$ is the evidence that $\psi_0$ is true, the solution is to measure a priori whether or not the chosen prior induces bias either for or against $\psi_0$. To see how to do this we note first the Savage-Dickey result, see [14] and [10], which says that

$$RB_\Psi(\psi_0) = \frac{m(x \mid \psi_0)}{m(x)} \tag{4}$$

where

$$m(x \mid \psi_0) = \int_{\{\theta : \Psi(\theta) = \psi_0\}} \pi(\theta \mid \psi_0) f_\theta(x)\, d\theta$$

is the prior-predictive density of the data $x$ given that $\Psi(\theta) = \psi_0$. Actually, it is easy to see that, if $T(x)$ is a minimal sufficient statistic for the full model, then $m(x \mid \psi_0)/m(x) = m_T(T(x) \mid \psi_0)/m_T(T(x))$ where $m_T$ is the prior predictive density of $T$ and $m_T(\cdot \mid \psi_0)$ is the prior predictive density of $T$ given that $\Psi(\theta) = \psi_0$.

From (4) we can measure the bias in the evidence against $\psi_0$ by computing

$$M_T\left( \left. \frac{m_T(t \mid \psi_0)}{m_T(t)} < 1 \,\right|\, \psi_0 \right) \tag{5}$$

as this is the prior probability that we will obtain evidence against $\psi_0$ when $\psi_0$ is true. So when (5) is large we have bias against $\psi_0$. To measure the bias in favor of $\psi_0$ we choose values $\psi_0' \ne \psi_0$ such that the difference between $\psi_0$ and $\psi_0'$ represents the smallest difference of practical importance. We then compute

$$M_T\left( \left. \frac{m_T(t \mid \psi_0)}{m_T(t)} > 1 \,\right|\, \psi_0' \right) \tag{6}$$

as this is the prior probability that we will obtain evidence in favor of $\psi_0$ when $\psi_0$ is false. Again, when (6) is large we have bias in favor of $\psi_0$. Note that both (5) and (6) decrease with sample size and so, in design situations, they can be used to set sample size and so control bias.

When we are not able to control sample size, then (5) and (6) can be computed and used to qualify any conclusions we reach about whether $\psi_0$ is true or not.
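As an illustration of (5) and (6) outside the two-arm setting, the sketch below evaluates both exactly in a toy binomial model. Everything specific here is an assumption made for the example: the parameter is a success probability with a hypothetical uniform prior on {0.3, 0.5, 0.7}, the hypothesized value is 0.5, and 0.7 plays the role of $\psi_0'$. Since $\psi = \theta$ in this toy model, the conditional prior predictive of the number of successes $t$ is simply binomial.

```python
from math import comb


def binom_pmf(t, n, p):
    """Binomial probability of t successes in n trials."""
    return comb(n, t) * p**t * (1 - p) ** (n - t)


def bias(n, prior, psi0, psi_true, against=True):
    """Prior probability, under psi_true, that RB(psi0) < 1 (or > 1)."""
    total = 0.0
    for t in range(n + 1):
        m_t = sum(w * binom_pmf(t, n, p) for p, w in prior.items())
        rb = binom_pmf(t, n, psi0) / m_t  # Savage-Dickey form (4)
        if (rb < 1) if against else (rb > 1):
            total += binom_pmf(t, n, psi_true)
    return total


# Hypothetical uniform prior on three candidate success probabilities.
prior = {0.3: 1 / 3, 0.5: 1 / 3, 0.7: 1 / 3}
for n in (10, 100):
    against = bias(n, prior, psi0=0.5, psi_true=0.5, against=True)   # (5)
    in_favor = bias(n, prior, psi0=0.5, psi_true=0.7, against=False)  # (6)
    print(n, against, in_favor)
```

In this toy model both biases are markedly smaller at n = 100 than at n = 10, in line with the remark that sample size can be used to control bias.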
For example, if we have evidence against $\psi_0$ and (5) is large, this has to be taken with a 'grain of salt' as our choices have biased things this way. We draw a similar conclusion if we have evidence in favor of $\psi_0$ and (6) is large. Of course, these negative conclusions could also lead us to redo the analysis using a prior that does not induce such biases when this is possible, see Section 3.

A variety of other inferences can be derived from interpreting $RB_\Psi(\psi)$ as the evidence that $\psi$ is the true value. For example, the best estimate of $\psi$ is clearly the value for which the evidence is greatest, namely,

$$\psi_{LRSE}(x) = \arg\sup_\psi RB_\Psi(\psi),$$

called the least relative surprise estimator in [8, 9, 10]. Associated with this is a $\gamma$-credible region

$$C_\gamma(x) = \{\psi : RB_\Psi(\psi) \ge c_\gamma(x)\} \tag{7}$$

where $c_\gamma(x) = \inf\{k : \Pi_\Psi(RB_\Psi(\psi) \ge k \mid x) \le \gamma\}$. Notice that $\psi_{LRSE}(x) \in C_\gamma(x)$ for every $\gamma \in [0,1]$ and so, for selected $\gamma$, we can take the size of $C_\gamma(x)$ as a measure of the accuracy of the estimate $\psi_{LRSE}(x)$. Given the interpretation of $RB_\Psi(\psi)$ as the evidence for $\psi$, we are forced to use the sets $C_\gamma(x)$ for our credible regions. For if $\psi_1$ is in such a region and $RB_\Psi(\psi_2) \ge RB_\Psi(\psi_1)$, then we must put $\psi_2$ into the region as well, as we have at least as much evidence for $\psi_2$ as for $\psi_1$.

In [8, 9, 10] various optimality properties are established for $\psi_{LRSE}(x)$ and the regions $C_\gamma(x)$ in the class of all Bayesian inferences. One notable property is that inferences based on the relative belief ratio are invariant under reparameterizations. This is not the case for Bayesian inferences based on losses, such as the posterior mean or mode, or for highest posterior density regions.

We now consider the application of relative belief inferences to two-arm trials.

Example Two-arm Trials.

Let $x_E = (x_{E,1}, \ldots, x_{E,n_E})$ denote the sample from the experimental treatment and $x_R = (x_{R,1}, \ldots, x_{R,n_R})$ denote the sample from the reference treatment.
We assume that these responses are mutually independent with $x_{E,i} \sim N(\mu_E, \sigma^2)$ and $x_{R,i} \sim N(\mu_R, \sigma^2)$ where $\mu_E, \mu_R \in R^1$ and $\sigma^2 > 0$ are all unknown. The information in the data is summarized by the minimal sufficient statistic $T(x_E, x_R) = (\bar{x}_E, \bar{x}_R, s^2)$ where $s^2 = [(n_E - 1)s_E^2 + (n_R - 1)s_R^2]/(n_E + n_R - 2)$ and the likelihood equals

$$\sigma^{-n_E - n_R} \exp\left\{ -\left[ n_E(\bar{x}_E - \mu_E)^2 + n_R(\bar{x}_R - \mu_R)^2 + (n_E + n_R - 2)s^2 \right] / 2\sigma^2 \right\}.$$

We will use the prior for $(\mu_E, \mu_R, \sigma^2)$ given by

$$\begin{aligned}
\mu_E \mid \sigma^2 &\sim N(\mu_0, \tau_0^2\sigma^2),\\
\mu_R \mid \sigma^2 &\sim N(\mu_0, \tau_0^2\sigma^2),\\
1/\sigma^2 &\sim \text{Gamma}(\alpha_0, \beta_0).
\end{aligned} \tag{8}$$

We will discuss elicitation of the hyperparameters $\mu_0, \tau_0^2, \alpha_0$ and $\beta_0$ in Section 3. The posterior distribution of $(\mu_E, \mu_R, \sigma^2)$ is then easily obtained and is given by

$$\begin{aligned}
\mu_E \mid x_E, x_R, \sigma^2 &\sim N\left( \frac{n_E\bar{x}_E + \mu_0/\tau_0^2}{n_E + 1/\tau_0^2},\ \frac{\sigma^2}{n_E + 1/\tau_0^2} \right),\\
\mu_R \mid x_E, x_R, \sigma^2 &\sim N\left( \frac{n_R\bar{x}_R + \mu_0/\tau_0^2}{n_R + 1/\tau_0^2},\ \frac{\sigma^2}{n_R + 1/\tau_0^2} \right),\\
1/\sigma^2 \mid x_E, x_R &\sim \text{Gamma}\left( \frac{n_E + n_R + 2\alpha_0}{2},\ \frac{2\beta_0 + (n_E + n_R - 2)s^2}{2} \right).
\end{aligned} \tag{9}$$

Note that it is simple to generate values from (8) and (9).

Now suppose we want to assess the hypothesis that the true value of $\mu_E - \mu_R$ satisfies $|\mu_E - \mu_R| < \delta$. So $\delta$ represents a practically meaningful difference between the means. If the difference is less than this quantity, then we do not distinguish between $\mu_E$ and $\mu_R$, but otherwise we do. It makes sense then that, if $\mu_E$ and $\mu_R$ do differ, we would want to know how many units of $\delta$ these means differ by. So for the parameter of interest $\psi$ in this problem we will consider $\psi \in Z$ where $\psi = i$ indicates that $\mu_E - \mu_R \in ((2i-1)\delta, (2i+1)\delta]$. So the hypothesis of interest corresponds to $H_0 : \psi = 0$.

To calculate the relative belief ratios for values of $\psi$ we need the prior and posterior distributions of this parameter. These quantities are obtained by discretizing the prior and posterior distributions of $\mu_E - \mu_R$.
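Generating from (8) and (9) and discretizing $\mu_E - \mu_R$ into values of $\psi$ can be sketched in a few lines of Python. The sketch assumes that Gamma$(\alpha_0, \beta_0)$ is parameterized by shape and rate, which is how (9) reads; the hyperparameter values, summary statistics and function names are hypothetical.

```python
import math
import random


def draw_prior(mu0, tau0_sq, alpha0, beta0, rng):
    """One draw of (mu_E, mu_R, sigma^2) from the prior (8)."""
    # Assumption: shape-rate Gamma; random.gammavariate takes shape and scale.
    sigma_sq = 1.0 / rng.gammavariate(alpha0, 1.0 / beta0)
    sd = math.sqrt(tau0_sq * sigma_sq)
    return rng.gauss(mu0, sd), rng.gauss(mu0, sd), sigma_sq


def draw_posterior(xbar_e, xbar_r, s_sq, n_e, n_r, mu0, tau0_sq, alpha0, beta0, rng):
    """One draw of (mu_E, mu_R, sigma^2) from the posterior (9)."""
    shape = (n_e + n_r + 2 * alpha0) / 2
    rate = (2 * beta0 + (n_e + n_r - 2) * s_sq) / 2
    sigma_sq = 1.0 / rng.gammavariate(shape, 1.0 / rate)

    def draw_mean(n, xbar):
        precision = n + 1.0 / tau0_sq
        mean = (n * xbar + mu0 / tau0_sq) / precision
        return rng.gauss(mean, math.sqrt(sigma_sq / precision))

    return draw_mean(n_e, xbar_e), draw_mean(n_r, xbar_r), sigma_sq


def psi_index(diff, delta):
    """Integer i with (2i - 1) * delta < diff <= (2i + 1) * delta."""
    return math.ceil((diff + delta) / (2 * delta)) - 1


rng = random.Random(2014)
hyper = dict(mu0=10.0, tau0_sq=4.0, alpha0=2.0, beta0=2.0)  # hypothetical values
draws = [draw_posterior(10.4, 10.0, 4.0, 20, 20, rng=rng, **hyper) for _ in range(5000)]
# Monte Carlo estimate of the posterior probability of H0: psi = 0 with delta = 1.
post_h0 = sum(psi_index(mu_e - mu_r, 1.0) == 0 for mu_e, mu_r, _ in draws) / len(draws)
print(post_h0)
```

The same discretization applied to draws from `draw_prior` gives a Monte Carlo estimate of the prior probability of each value of $\psi$.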
The marginal prior distribution of $\mu_E - \mu_R$ is given by

$$\frac{\mu_E - \mu_R}{\tau_0\sqrt{\beta_0/\alpha_0}} \sim t_{2\alpha_0} \tag{10}$$

and the marginal posterior distribution of $\mu_E - \mu_R$ is given by

$$\left. \frac{(\mu_E - \mu_R) - (\bar{x}_E - \bar{x}_R)}{s_p\sqrt{1/n_E + 1/n_R}} \,\right|\, x_E, x_R \sim t_\nu \tag{11}$$

where $\nu = n_E + n_R + 2\alpha_0 - 4$ and

$$s_p^2 = \frac{2\beta_0 + (n_E + n_R - 2)s^2}{n_E + n_R + 2\alpha_0 - 4}.$$

When $\nu$ is large, the posterior distribution is approximately normal, while it has heavy tails when $\nu$ is small. So to assess $H_0$ the evidence is given by

$$RB_\Psi(0) = \frac{\Pi((-\delta, \delta] \mid x_E, x_R)}{\Pi((-\delta, \delta])} = \frac{\Pi((-\delta, \delta] \mid \bar{x}_E, \bar{x}_R, s^2)}{\Pi((-\delta, \delta])} \tag{12}$$

and the strength of the evidence is given by

$$\begin{aligned}
\Pi_\Psi\big(RB_\Psi(\psi) \le RB_\Psi(\psi_0) \mid x_E, x_R\big)
&= \Pi\Big( \bigcup_{i : RB_\Psi(i) \le RB_\Psi(0)} ((2i-1)\delta, (2i+1)\delta] \,\Big|\, x_E, x_R \Big)\\
&= \sum_{i : RB_\Psi(i) \le RB_\Psi(0)} \Pi\big( ((2i-1)\delta, (2i+1)\delta] \mid x_E, x_R \big).
\end{aligned} \tag{13}$$

Both (12) and (13) are easily evaluated using the exact distribution theory given for the prior and posterior distribution of $\mu_E - \mu_R$. For example, we can use the t distribution function routine in the R software package.

Suppose that we obtain $RB_\Psi(0) < 1$ and that (13) indicates that this is reasonably strong evidence against $H_0$. From the tabulation of $RB_\Psi(i)$ that we computed as part of calculating (13), we easily obtain the optimal estimate of $\psi$, namely,

$$\psi_{LRSE}(x_E, x_R) = \arg\sup_i RB_\Psi(i).$$

If $\psi_{LRSE}(x_E, x_R)$ is greater than 0, then we have a clear indication that the experimental treatment is better than the reference treatment. The accuracy of the estimate is assessed by computing the 0.95-relative belief region

$$C_{0.95}(x_E, x_R) = \{i : RB_\Psi(i) \ge c_{0.95}(x_E, x_R)\}$$

and seeing how large it is. We can convert this into a region for $\mu_E - \mu_R$ via

$$C^*_{0.95}(x_E, x_R) = \bigcup_{i \in C_{0.95}(x_E, x_R)} ((2i-1)\delta, (2i+1)\delta].$$

To assess the bias in the prior we have to compute (5) and (6). From (10) and (11) we can evaluate

$$RB_\Psi(0) = \frac{\Pi((-\delta, \delta] \mid x_E, x_R)}{\Pi((-\delta, \delta])} = \frac{m_T(\bar{x}_E, \bar{x}_R, s^2 \mid -\delta < \mu_E - \mu_R \le \delta)}{m_T(\bar{x}_E, \bar{x}_R, s^2)}$$

and note that $RB_\Psi(0)$ depends on the data only through $(\bar{x}_E - \bar{x}_R, s^2)$.
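The tabulation of $RB_\Psi(i)$, the strength (13) and the estimate $\psi_{LRSE}$ can be sketched as follows. This is a stdlib-only Python illustration rather than the R routine mentioned above: interval probabilities are obtained by Simpson's rule on the t density, the scales and degrees of freedom are taken from (10) and (11) as written, the range of $i$ is truncated to $|i| \le 5$ as a working assumption, and all numerical inputs are hypothetical.

```python
import math


def t_interval_prob(lo, hi, loc, scale, df, steps=400):
    """P(lo < loc + scale * T <= hi) for T ~ t_df, by Simpson's rule on the density."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)

    def pdf(u):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(u * u / df))

    a, b = (lo - loc) / scale, (hi - loc) / scale
    h = (b - a) / steps  # steps must be even for Simpson's rule
    total = pdf(a) + pdf(b)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * pdf(a + k * h)
    return total * h / 3


def rb_table(diff, s_sq, n_e, n_r, tau0, alpha0, beta0, delta, max_i=5):
    """Map i -> (prior prob, posterior prob, RB(i)) for |i| <= max_i."""
    nu = n_e + n_r + 2 * alpha0 - 4
    s_p = math.sqrt((2 * beta0 + (n_e + n_r - 2) * s_sq) / nu)
    post_scale = s_p * math.sqrt(1 / n_e + 1 / n_r)  # scale in (11)
    prior_scale = tau0 * math.sqrt(beta0 / alpha0)   # scale as written in (10)
    table = {}
    for i in range(-max_i, max_i + 1):
        lo, hi = (2 * i - 1) * delta, (2 * i + 1) * delta
        prior = t_interval_prob(lo, hi, 0.0, prior_scale, 2 * alpha0)
        post = t_interval_prob(lo, hi, diff, post_scale, nu)
        table[i] = (prior, post, post / prior)
    return table


# Hypothetical summary statistics (difference of means, pooled variance) and hyperparameters.
table = rb_table(diff=0.5, s_sq=4.0, n_e=20, n_r=20,
                 tau0=2.0, alpha0=2.0, beta0=2.0, delta=1.0)
rb0 = table[0][2]                                                    # (12)
strength = sum(post for _, post, rb in table.values() if rb <= rb0)  # (13)
lrse = max(table, key=lambda i: table[i][2])
print(rb0, strength, lrse)
```

Only the difference of the sample means and the pooled variance enter the computation, which reflects the dependence of $RB_\Psi(0)$ on the data through $(\bar{x}_E - \bar{x}_R, s^2)$.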
We Ψ E R thenneedonlysimulatefromtheconditionalpriorpredictiveof(x¯ −x¯ ,s2) E R given that ψ is the true value. Note that, given (µ −µ ,σ2) then E R x¯ −x¯ ∼ N(µ −µ ,(1/n +1/n )σ2), E R E R E R (n +n −2)s2/σ2 ∼ Chi-squared(n +n −2) (14) E R E R and these quantities are independent. We can compute (5) by the following simulation process: 1. set a counter C = 0, 2. generate σ2 using (8), 3. generate µ −µ froma N(0,2τ2σ2) distribution conditioned to E R 0 −δ < µ −µ ≤ δ, E R 4. generate (x¯ −x¯ ,s2) using (14), E R 5. compute RB (0) and add 1 to C if it is less than 1, Ψ 9 6. repeat 2-4 N times and record C/N as the estimate of (5). Essentially the same simulation can be carried out to evaluate (6) with step 2changing,asweconditionon(2i−1)δ < µ −µ ≤ (2i+1)δ forsayi = 1 E R or i= −1, and in step 4 we check if RB (0) is greater than 1. Ψ The only slightly difficult part in this simulation is step 3 and for that we can use an inversion algorithm. For denoting the cdf and inverse cdf of a N(0,1) distribution by Φ and Φ−1, respectively, we generate µ −µ in E R step3,whenconditioningon(2i−1)δ < µ −µ ≤(2i+1)δ,bygenerating E R u ∼ U(0,1) and putting µ −µ = Φ−1(Φ((2i−1)δ)+[Φ((2i+1)δ)−Φ((2i−1)δ)]u). (15) E R We can use routines in R for Φ and Φ−1 to evaluate (15). 3 Choosing and Checking the Ingredients In any statistical analysis a statistician chooses a model that supposedlyde- scribes the generation of the data and, in a Bayesian analysis, also chooses a prior. As the analysis is typically highly dependent on these subjective choices, it is important that they be checked against what is typically ob- jective, at least if it is collected correctly, namely, the data. 3.1 Checking the Model For the model this entails asking if the observed data is surprising for every distribution in the model. If this is the case, then we conclude that there is a problem with the model and need to somehow modify this. 
While there are often many model checking procedures available, for the problem under study we will use the Shapiro-Wilk test based on the residuals from the model. We note that this check is, as it should be, completely independent of the choice of prior, as we do not want to confound our considerations of the adequacy of the model and the prior.

3.2 Eliciting the Prior

Before discussing how we check the prior, we first consider the choice of the prior. For this we need only consider eliciting the prior for $\mu$ and $\sigma^2$ in a $N(\mu, \sigma^2)$ distribution. So we need to specify the hyperparameters $\mu_0, \tau_0^2, \alpha_0$ and $\beta_0$. This is based on knowledge of the measurement process that leads to
