ebook img

Statistical aspects of the TNK-S2B trial of tenecteplase versus alteplase in acute ischemic stroke: an efficient, dose-adaptive, seamless phase II/III design. PDF

2011·0.33 MB·English
by  LevinBruce
Save to my drive
Quick download
Download
Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.

Preview Statistical aspects of the TNK-S2B trial of tenecteplase versus alteplase in acute ischemic stroke: an efficient, dose-adaptive, seamless phase II/III design.

DESIGN Clinical Trials 2011; 8: 398–407 Statistical aspects of the TNK-S2B trial of tenecteplase versus alteplase in acute ischemic stroke: an efficient, dose-adaptive, seamless phase II/III design Bruce Levina, John LP Thompsona, Bibhas Chakrabortya, Gilberto Levya, Robert MacArthurb and E Clarke Haleyc Background TNK-S2B, an innovative, randomized, seamless phase II/III trial of tenecteplaseversusrt-PAforacuteischemicstroke,terminatedforslowenrollment before regulatory approval ofuse of phaseII patients in phaseIII. Purpose (1) To review the trial design and comprehensive type I error rate simulations and (2) to discuss issues raised during regulatory review, to facilitate future approval ofsimilar designs. Methods Inphase II,anearly (24-h) outcomeand adaptive sequential procedure selected one of three tenecteplase doses for phase III comparison with rt-PA. Decision rules comparing this dose to rt-PA would cause stopping for futility at phase II end, or continuation to phase III. Phase III incorporated two co-primary hypotheses, allowing for a treatment effect at either end of the trichotomized Rankin scale. Assuming no early termination, four interim analyses and one final analysis of 1908 patients provided anexperiment-wise type Ierror rateof <0.05. Results Over1,000distribution scenarios,eachinvolving 40,000replications,the maximumtypeIerrorinphaseIIIwas0.038.Inflationfromthedoseselectionwas morethanoffsetbytheone-halfcontinuitycorrectionintheteststatistics.Inflation from repeated interim analyses was more than offset by the reduction from the clinical stopping rules for futility atthefirstinterim analysis. Limitations Design complexity and evolving regulatory requirements lengthened thereview process. Conclusions (1)Thedesignwasinnovativeandefficient.Perprotocol,typeIerror was well controlled for the co-primary phase III hypothesis tests, and experiment- wise. (2a) Time must be allowed for communications with regulatory reviewers fromfirstdesignstages.(2b)AdequatetypeIerrorcontrolmustbedemonstrated. (2c) Greater clarity is needed on (i) whether this includes demonstration of type I error control if the protocol is violated and (ii) whether simulations of type I error control are acceptable. (2d) Regulatory agency concerns that protocols for futility stoppingmaynotbefollowedmaybeallayedbysubmittinginterimanalysisresults to them as these analyses occur. Clinical Trials 2011; 8: 398–407. http:// ctj.sagepub.com aDepartment of Biostatistics, Mailman School of Public Health, Columbia University, New York, NY, USA, bResearch Pharmacy, Columbia University, New York, NY, USA, cDepartment of Neurology, University of Virginia, Charlottesville, VA, USA Author for correspondence: John LP Thompson, Department of Biostatistics, Mailman School of Public Health, Columbia University, 722 West 168th St Room 610,New York, NY 10025, USA E-mail: [email protected] ! The Author(s), 2011. Reprints and permissions: http://www.sagepub.co.uk/journalsPermissions.nav 10.1177/1740774511410582 TNK-S2B trial of tenecteplase versus alteplase 399 Introduction an early stage of phase II; (2) at the end of phase II, a preliminary comparison of the clinical efficacy The TNK-S2B trial [1] was an innovative, multi- and safety endpoints from patients treated with the center, double blind, randomized, seamless phase selected dose of tenecteplase to those from II/IIIstudyofintravenoustenecteplase(TNK)versus patients treated with rt-PA, to determine promise standard-dose intravenous alteplase (rt-PA at 0.9 or futility for continuing into phase III; and (3) mg/kg) for treatment of patients with acute ische- if the results were promising, continuation of mic stroke within 3 h of onset. A key motivating the study, with additional clinical sites, to com- factor for development of tenecteplase was to pare the selected dose of tenecteplase with produce a molecular variant of rt-PA that would rt-PA in phase III, treating the decisions at the reduce the risk of symptomatic intracranial hem- end of phase II as the first interim analysis of orrhage (ICH) while retaining clinical efficacy. The phase III. We provide details about each of these phaseIIcomponentemployedanadaptivesequen- features. tial dose selection procedure to choose a preferred dose of tenecteplase, using an early (24 h) assess- ment of major neurological improvement (MNI) Sequential selection procedure for the best balanced against occurrence of symptomatic ICH. dose of tenecteplase Once a preferred tenecteplase dose was estab- lished, it was moved forward in the phase III The selection procedure was a truncated Levin- component to compare with standard-dose rt-PA. Robbins sequential elimination procedure [3,4] Decision rules comparing the selected tenecte- based on a rapid-response, ordered trichotomous plase dose and rt-PA on safety and efficacy outcome determined within 24 h of randomiza- outcomes were devised to yield a clear recom- tion. The best category was MNI, defined as a (cid:2)8- mendation to either stop the trial for futility at point improvement from baseline on the NIH the end of phase II, or continue into phase III. Stroke Scale [5] or a score of 0 at 24 h after stroke The trial was prematurely terminated for slow onset. The worst category was symptomatic ICH enrollment after 112 patients had been random- within 24 h of stroke onset. The intermediate ized at 8 clinical centers between 2006 and 2008. category was neither MNI nor ICH (NEI). Patients At that point, the proposal to include patients were randomized in permuted blocks of size 4 enrolled in the phase II portion of the trial in the matched by clinical site (quadruplets) to rt-PA or phase III analysis had not received regulatory one of the three doses of tenecteplase. Although approval because of concerns regarding control of patients were also randomized concurrently to the type I error rate. rt-PA to preserve the blinding and to eliminate Although the trial results were insufficient to other possible temporal biases, only the tenecte- establish promise or futility [1], its novel adaptive plase arms were involved in the selection proce- design,withsubstantialprovisionforthecontrolof dure. Once a rapid response was obtained from type I error, is nevertheless of interest, given each of the three patients assigned to the tenecte- plase doses in a matched quadruplet, their data current discussion of draft FDA guidelines on were entered into an ongoing sequential cumulative adaptive designs [2]. This article has two goals. sum scoring system. This added the value 2 to the The first is to review the key design features of the cumulative sum for any dose arm for a rapid- trial, and the design and results of its comprehen- response outcome of MNI, the value 1 for a rapid- sivetypeIerrorratesimulationstudies.Thesecond response outcome of NEI, and the value 0 for a is to discuss issues raised during the regulatory rapid-response outcome of ICH. The first time the review of the design, which focused on the type I cumulativesumforanyarm(s)withthelargestsum error simulations, and their implications for gain- was(were)6pointsaheadofthecumulativesumfor ing approval for use of this type of adaptive design any arm(s) with the smallest sum, the arm(s) with in the future. the smallest sum was (were) to be eliminated from thestudy,meaningthatthosearmswerenolonger to be considered candidates for selection and The TNK-S2B Design patients were no longer to be randomized to those arms. If one dose arm was eliminated at this The TNK-S2B trial incorporated three main design time, patients were to be randomized to the features: (1) a sequential selection procedure for remaining arms (and to rt-PA), and the procedure choosing one tenecteplase dose from among three was to proceed with their corresponding cumula- candidate doses (0.10, 0.25, and 0.40 mg/kg), by tive sums continued at their current tallies, until a the sequential elimination of inferior dose arms, second and final arm was eliminated. Once the potentially leading to a dose selection decision at second arm had been eliminated, the procedure http://ctj.sagepub.com Clinical Trials 2011; 8: 398–407 400 B Levin et al. Table1 Operatingcharacteristicsoftheselectionprocedurebasedon100,000simulationsforeachschemeundertheleastfavorable configuration Scheme(%MNI–%ICHforthreedoses) 36%–6% 36%–6% 31%–6% 31%–6% 26%–6% 16%–6% 16%–2% 21%–6% 21%–2% 26%–6% 16%–6% 16%–2% 21%–6% 21%–2% 26%–6% P[cs]a 0.976 0.958 0.802 0.646 0.297 E[min(N(1),m)] 22.5 27.6 30.7 38.5 35.4 E[min(N,m)] 35.9 43.7 59.3 73.6 74.2 Median[N] 31 37 50 65 65 Mode[N] 21 24 29 40 35 E[T] 94.4 115.0 149.3 185.7 183.7 P[nowinner] 0.0026 0.0093 0.041 0.107 0.110 aWhenthedoseshaveequalprobabilityofMNIandsymptomaticICH(asintheschemepresentedinthelastcolumn),selectionofany of the three doses is ‘correct’ with respect to the probability of MNI net of symptomatic ICH. In this case, the first row gives the probabilityofselectingthefirstlisteddoses.Exactlythesamefigureappliestotheothertwodoses. was to terminate and the remaining arm was to be maximum recruitment, yet sufficiently large to selected as the preferred dose of tenecteplase. If preserve the probability of correct selection. initially two arms were eliminated simultaneously, The operating characteristics of the selection theprocedurewastoterminatethenandselectthe procedure were estimated by simulation based on remaining arm as the preferred dose of tenecte- 100,000 replications for each scheme, and are plase. If dose selection occurred after complete presented in Table 1. P[cs] refers to the probability observationoftherapidresponsesfromfewerthan ofcorrectselection,i.e.reachingafinalelimination 100tenecteplase-matchedsets,randomizationcon- by the stated criterion with a correct selection of tinuedtoeithertheselecteddoseoftenecteplaseor the best dose at or before the 150th matched set. rt-PAuntilatotalof100patientsoneachtreatment The value for P[cs] does not include any correct had been randomized, at which time the clinical selections that might also occur at truncation time assessment for promise or futility was to be con- byclinicaldecisioncriteriashouldafinalwinnernot ducted. If the dose selection occurred after com- be declared by 150 matched sets. E[min(N(1),m)] plete observation of the rapid responses from refers to the expected time of first elimination or between 100 and 150 tenecteplase-matched sets, 150 triplets, whichever occurs first. This gives the randomization was to stop and the clinical assess- averagenumberofmatchedtripletsuntiltheearlier mentforpromiseorfutilitywastotakeplacethen. ofthefirstdoseeliminationortheendofthephase Ifnodoseselectionhadyetoccurredaftercomplete II trial; multiplying by 3 gives the number of observation of the rapid responses from 150 patientsrandomizedtothispointintheprocedure. tenecteplase-matched sets, the procedure was to E[min(N,m)] refers to the expected time of final be truncated, i.e., randomization would stop, and elimination or150matchedsets,whicheveroccurs thedoseselectionwastobecompletedinconjunc- first.Thisgivestheaveragenumberofmatchedsets tion with the clinical assessment for promise or until a dose is selected or 150 matched sets have futility as described below. The selection criterion been randomized, whichever occurs first. One of a lead of 6 was chosen to achieve a probability of cannot multiply this number by 3 to get the total correctselectionof(cid:2)0.8underthedesignalternative number of patients in the selection phase because probabilities of 0.31, 0.21, and 0.21 for the rapid- ofthepossibilityofearlyeliminationofadose.We responseoutcomeofMNIforthethreetenecteplase therefore provide another operating characteristic, doses, assuming a common probability of ICH of E[T], the expected total number of patients ran- 0.06 for each dose. A phase I pilot study of domized in the selection phase. Also, provided are tenecteplase [6] had shown substantially larger the median and modal numbers of matched sets differences in the proportion of subjects with MNI (approximate to the nearest integer). Finally, between tenecteplase doses: 36% of subjects given P[no winner] refers to the probability that trunca- 0.1mg/kghadMNIcomparedwith16%ofsubjects tion time will arrive without achievement of given 0.4 mg/kg. The literature also showed that the stated criterion for a final dose selection. ICH rates of approximately 6% are typical with In symbols, P[no winner]¼P[N>150]. In all rt-PA administration [7]. The truncation point was cases, the probability of selecting an incorrect selected small enough to achieve a feasible dose is given by 1(cid:3)P[cs](cid:3)P[no winner]. Table 1 Clinical Trials 2011; 8: 398–407 http://ctj.sagepub.com TNK-S2B trial of tenecteplase versus alteplase 401 shows the operating characteristics of different Theserulesfordeclaringpromiseorfutilitywere schemes in which one dose is superior and the clinical decision rules and were not based on twoinferiordosesareequalinprobabilitiesofMNI statistical significance criteria (except where the and symptomatic ICH; this is often called a ‘least interim monitoring plan would suggest that the favorable configuration.’ The ‘best’ dose in Table 1 DSMB consider early stopping of the study). While appears in the first row of each scheme. See somewhat arbitrary, they reflected the clear wishes Appendix (supplementary material) for the operat- oftheclinicalinvestigator,andprovidedunambig- ing characteristics under five other schemes of uous grounds for the required clinical decision interest. making at the end of phase II. They can also be viewedasanotherstatisticalselectionprocedure:at this point in the trial, we would need to select Preliminary comparisons of the clinical efficacy between two alternative courses – ‘to go’ or‘not to and safety endpoints go’ on to phase III – with a procedure that would provide a high probability of correct selection in The preliminary assessment for promise or futility the event there was a truly superior choice. at the end of phase II was based on the modified It is possible that the selection procedure might Rankin scale [8] observed 3 months after random- observe 150 matched sets without arriving at a ization, trichotomized into three ordered catego- selection decision. In that case, the following ries.ThebestcategorywasRankinscore0or1(good clinical criteria would be used to select a dose outcome);theworstcategorywasRankinscore4,5, from the remaining two or three competing arms or 6 (poor outcome, including death); and the and determine promise or futility. intermediate category was Rankin score 2 or 3 Criterion 1: Select the dose that allows continu- (neither, i.e., neither poor nor good outcome). There ation into phase III based on the futility criteria were three sets of clinical decision rules for declar- specified above. If no dose would allow continua- ing promise or futility, depending on the relative tion of the study based on the futility criteria, safety profile of (the selected dose of) tenecteplase further study with any of the three doses would be compared to rt-PA. declared unpromising. Scenario 1: Tenecteplase showed a lower symptom- Criterion 2: If under Criterion 1 more than one aticICHratethanrt-PA,definedastenecteplasehaving dose would allow continuation of the trial, select at least 2 fewer symptomatic ICHs than rt-PA on the the tenecteplase dose with the lowest ICH rate. rapid-response outcome. In this situation, declare Criterion 3: If under Criterion 2 more than tenecteplase promising if the observed proportion one dose would allow continuation of the trial, ofpatientswithpooroutcomeislessthanorequal select the dose with the lowest proportion of poor to that of rt-PA. If the proportion with poor outcomes. outcome on tenecteplase is greater than on rt-PA, Criterion 4: If under Criterion 3 more than one declarefurtherstudyunpromising.Inaddition,and dose would allow continuation of the trial, select consistent with the interim monitoring plan for thedosewiththehighestproportionofgoodoutcomes. safety and efficacy, declare further study unprom- See Appendix for further discussion. ising if the proportion of good outcomes for tenecteplase is significantly less than for rt-PA at the nominal two-tailed 0.001 level. Continuation of the study in phase III Scenario2:TherateofsymptomaticICHwithin24h for tenecteplase was effectively the same as for rt-PA, Iftheselecteddoseoftenecteplaseshowedpromise, definedastenecteplasehavingthesamenumberofICHs a total of at most 1908 patients were to be asrt-PAordifferingatmostbyplusorminusoneICH. randomized in phase III to the selected dose of In this situation, declare tenecteplase promising if tenecteplase or rt-PA (954 per group, including the the proportion of patients on tenecteplase with patientsstudiedinthesetwoarmsduringphaseII). poorRankinoutcomeisatleast8percentagepoints The primary endpoint for phase III was the lowerthanonrt-PA(i.e.,anarithmeticdifferenceof trichotomized Rankin score. Two co-primary null 0.08).Ifnot,declarefurtherstudyunpromising.As hypotheses were to be tested in phase III: (1) The in Scenario 1, also declare further study unpromis- proportion of poor outcomes with tenecteplase at ing if the proportion of good outcomes for theselecteddosedidnotdifferfromtheproportion tenecteplase is significantly less than for rt-PA at ofpooroutcomeswithrt-PA.(2)Theproportionof the nominal two-tailed 0.001 level. good outcomes with tenecteplase at the selected Scenario 3: Tenecteplase showed a higher symptom- dose did not differ from the proportion of good aticICHratethanrt-PA,definedastenecteplasehaving outcomes with rt-PA. Each hypothesis was to be two or more ICHs than rt-PA. In this situation, tested using the Mantel–Haenszel score test, strat- declare further study unpromising. ified by site, with 1/2-continuity correction. http://ctj.sagepub.com Clinical Trials 2011; 8: 398–407 402 B Levin et al. Thetwohypothesesweretobetestedattheoverall Multiple testing two-tailed(cid:2)¼0.05levelusingtheHolmstep-down procedure [9], which controls the probability of No correction for multiple testing is needed making one or two type I errors at no more than because the phase III component of the trial does 0.05. The planned sample size of 1,908 (954 per not compare each competing dose of tenecteplase group) would provide 90% power to detect a (cid:2)8% with rt-PA; it performs only one hypothesis test of reduction in poor outcome without a reduction in the selected dose of tenecteplase versus rt-PA (for good outcome, or 89% power to detect an 8% eachofthetwoco-primaryendpoints).Theinferior increase in good outcome without an increase in tenecteplase doses are eliminated in phase II by a poor outcome. selection procedure, not a hypothesis test against We formulated the two co-primary hypotheses rt-PA; moreover, the selection procedure uses an outcome different from (although correlated with) to allow for a treatment effect at either end of that in phase III. Thus, although the hypothesis the Rankin scale. This also facilitated identifying tested in phase III is determined by the selection the ‘win/lose’ situations for tenecteplase, as procedure, there is only one hypothesis test for required by regulatory reviewers. For example, if each co-primary endpoint. Since no more than a tenecteplasewerenobetterthanaplacebo,itwould single type I error can be committed, the need for reduce the proportion of poor outcomes due to adjustment for multiple testing does not arise. ICH, although it would also have poor efficacy compared to rt-PA. This would not be a ‘win’ situation. ThephaseIIItrialwastoincorporatefourformal Selection bias interim analyses and one final analysis. The first interim analysis was to take place at the end of The selection procedure does slightly inflate the phase II after enrollment of between 200 and 300 type I error rate. However, we show in a fixed patientsinthetwophaseIIIarms(between100and sample size simulation study (see below, and the 150 patients per arm, and not counting the other Appendix) that without stopping for futility or armsusedintheselectionstage).Thesecond,third, efficacy in interim monitoring, this inflation is more than compensated for by the 1/2-continuity andfourthinterimanalysesweretotakeplaceafter correction’s reduction in the type I error rate. We follow-up was completed for a total of 500, 1,000, also show below, in a group sequential study, that and 1,500 patients, respectively. The terminal the clinical criteria for futility stopping even with analysis was to take place after a total of 1908 interim monitoring reduce the type I error rate patient observations were complete, assuming no further below nominal levels. Given this, no cor- early termination. rection for selection bias resulting from the selec- Figure 1 contains a graphical representation tion procedure is necessary. of the win-lose-type situations for tenecteplase in the interim analyses and the terminal analysis, using barycentric coordinates. Any point in the triangle represents a triplet consisting of the Design of the type I error rate probabilities of poor, neither, and good outcome; simulation studies the perpendicular distance of the point from any one of the three sides represents the probability Given the complex design of TNK-S2B, simulation of the outcome denoted by the vertex opposite studies were required to demonstrate that the that side. statistical analysis plan specified above does in At each interim analysis, formal tests of the two fact control the type I error rate for the phase III primary hypotheses were to be conducted, each at trialatorbelownominallevels.Weconductedtwo the two-tailed (cid:2)¼0.001 level. Assuming both null suchstudies.One,theGroupSequential(GS)study, hypotheses to be true, rejection of either at any is presented here. A second, the Fixed Sample Size interimanalysiswouldconstituteatleastonetypeI (FSS) study, is included as an Appendix. As argued error. At the terminal analysis, assuming no earlier above, there was only one hypothesis test contem- stopping,eachprimaryhypothesiswastobetested platedfromthebeginningofphaseIItotheendof at the nominal (cid:2)¼0.025 level with 1/2-continuity phase III, that of the selected tenecteplase dose correction. versusrt-PA,andhencethetypeIerrorratewerefer TypeIerrorcontrolinseamlessphaseII/IIItrials to here is indeed an unconditional, experiment- posesunresolvedissues,particularlywhen,ashere, wiseerrorrateandnotconditionedontheidentity phase II involves a selection procedure rather than of the dose selected. ahypothesistest.Twoareparticularlyimportantin The FSS study considers a simplified version of the current case. the TNK-S2B trial design in which dose selection is Clinical Trials 2011; 8: 398–407 http://ctj.sagepub.com TNK-S2B trial of tenecteplase versus alteplase 403 Neither Neither FIRST INTERIM ANALYSIS = possible rt-PA FIRST INTERIM ANALYSIS = possible rt-PA N=200 outcome proportions N=200 outcome proportions Scenario I (poor, neither, good) Scenario II (poor, neither, good) # ICHs for rt-Pa –TNK > 1 = (0.40, 0.21, 0.39) Diff in # ICHs equals = (0.40, 0.21, 0.39) either +1, 0, or –1 NW Zones NW Zones W = winner for TNK W = winner for TNK NW = not a winner NW = not a winner NS = not significant NS = not significant L = loser for TNK L = loser for TNK C C L W L W Poor Good Poor Good SECOND INTERIM ANALYSIS Neither = possible rt-PA THIRD INTERIM ANALYSIS Neither = possible rt-PA N=500 outcome proportions N=1000 outcome proportions (poor, neither, good) (poor, neither, good) = (0.40, 0.21, 0.39) = (0.40, 0.21, 0.39) NW Zones NW Zones W = winner for TNK W = winner for TNK NW = not a winner NW = not a winner NS = not significant NS = not significant L = loser for TNK L = loser for TNK C C L W L W Poor Good Poor Good Neither Neither FOURTH INTERIM ANALYSIS = possible rt-PA TERMINAL ANALYSIS = possible rt-PA N=1500 outcome proportions N=1908 outcome proportions (poor, neither, good) (poor, neither, good) = (0.40, 0.21, 0.39) = (0.40, 0.21, 0.39) NW Zones NW Zones W = winner for TNK W = winner for TNK NW = not a winner NW = not a winner NS = not significant NS = not significant L = loser for TNK L = loser for TNK C L W L W NS Poor NW Good Poor NW Good Figure 1 Graphical representation of the win-lose-type situations for tenecteplase in the interim analyses as well as the terminal analysis, using barycentric coordinates http://ctj.sagepub.com Clinical Trials 2011; 8: 398–407 404 B Levin et al. possible; there is no examination of clinical out- assignment under the null hypothesis. Therefore, come data for promise or futility; and the trial forfeasibilityofthepresenttypeIerrorratestudies, proceeds to a single final analysis with a total of weusedonlythepooledz-scoretestfortheequality 1,908patients,954intheselectedtenecteplasearm of two proportions with 1/2-continuity correction and 954 in the rt-PA arm, with no stopping for for each primary hypothesis. futilityorforstrongshowingofinterimefficacy.Its purpose is to highlight the simultaneous effects of the dose selection from stage 1 and the 1/2- Specification of distribution schemes continuity correction, which would otherwise be dominated by the larger effect of the examination The simulation of both rapid-response outcomes for promise or futility and the effect of interim and 3-month Rankin outcomes requires specifica- monitoring. See Appendix for details. tion of the joint distribution of two trichotomous random variables, i.e., a 3(cid:4)3 table of joint prob- abilities,foreachtreatmentarm.Wecallsuchaset The GS simulation study of four 3(cid:4)3 tables a distribution scheme; 1000 different distribution schemes were used in the The purpose of the GS study is to simulate the full simulationstudies.Generationofthesedistribution TNK-S2B trial, including the dose selection, exam- schemes is described in detail in the Appendix. In inationforpromiseorfutility,andphaseIIIinterim the simulation studies, our goal was to produce a monitoring features, to estimate their combined set of distributions that would cover a portion of impact on the type I error rates. All replications of the parameter space that was of direct clinical theGSstudyareusedintheestimationoftheerror interest to the TNK-S2B trial, and that the portion rates. Note that according to the design, if the coveredwassufficientlybroadastorepresenttypeI actualtrialstoppedattheendofphaseIIforfutility error rates accurately over the entire theoretical because a selected dose lacked promise, or if there parameter space. Let the three-category rapid wasafailuretoselectatenecteplasedosebecauseall response be denoted by X, taking values of 0 for competingdoseslackedpromiseattruncationtime, ICH,1forneitherMNInorICH,and2forMNI.Let there would be no phase III trial and hence no the trichotomized 3-month Rankin scale for the declaration of significance and no type I errors clinical outcome be denoted by Y, taking values of under the null hypothesis (unless a dose had a 0 for poor outcome, 1 for neither poor nor good significantly different probability of poor or good outcome,and2forgoodoutcome.Letthedosesof outcome compared to rt-PA at the 0.001 level). tenecteplase be labeled A, B, and C, corresponding Thus, in the GS study, if a particular simulation to 0.1, 0.25, and 0.40 mg/kg, respectively, and let under a given null hypothesis scheme stops for doseDrefertort-PA.LetTdenoteanyofthesefour futility because a selected dose lacks promise treatmentarms.Itismostconvenienttodetermine (without attaining statistical significance at the the joint distribution of (X, Y) by first specifying two-tailed 0.001 level for either endpoint), we P[X|T], and then specifying P[Y|X, T]. Once these count the simulation as contributing no type I three-component vectors of probabilities are deter- error.Ifthesimulationresultsinafailuretoselecta mined,randomrealizationsofthepair(X,Y)canbe tenecteplase dose because all competing doses lack produced by generating a trinomial response X promise at truncation time, we choose one dose at following P[X|T], and then by generating another randomtotestforstatisticalsignificanceatthetwo- trinomial response Y following P[Y|X, T]. In the tailed0.001levelforbothpoorandgoodoutcomes, simulation studies, 10 different marginal condi- after which the simulated trial stops. On the other tional distributions for X given T were selected at hand,ifatanyinterimanalysisthetwo-tailed0.001 randominamannerdescribedintheAppendix;see significancelevelisattainedforeitherpoororgood Figure 2(a) for visualizing these distributions. For clinical outcome, or at the terminal analysis the each one of these, 10 different marginal distribu- two-tailed 0.025 significance level is attained for tions for Y given T (identical for each T under the eitherpoororgoodclinicaloutcome,wecountthe null hypotheses) were generated randomly simulationascontributingatleastonetypeIerror. (Figure 2(b)); and for each of the 10(cid:4)10¼100 Each simulation study used 40,000 replications pairs of marginal distributions for X and Y, 10 under each of 1000 different null hypothesis different conditional distributions for Y given X schemes described below. Note that the Mantel– and T were generated randomly, subject to the Haenszel score test would have type I error rates marginalizationconstraintthattheweightedaverage almost identical to that of a simple comparison of of P[Y|X, T] using weights P[X|T] agree with the pooled proportions, because in this trial the site- given distribution P[Y|T], together with a clinical stratified randomization procedure rendered the monotonicity constraint that P[Y¼poor|X, T] is the stratification factor orthogonal to the treatment greatest when X¼ICH and least when X¼MNI, Clinical Trials 2011; 8: 398–407 http://ctj.sagepub.com TNK-S2B trial of tenecteplase versus alteplase 405 Figure 2 Graphical representation of the different distribution schemes in terms of barycentric coordinates: (a) 10 marginal conditional distributions of X|T, for T ¼ A, B, C, D, along with the region of clinical interest; (b) 10 marginal conditional distributionsofY|T(sameforallTunderthenullhypothesis),alongwiththeregionofclinicalinterest;(c)1,000distributionsofY|X, Tforeachof12(X,T)combinations,wheredifferentcolorsrepresentdifferentvaluesofT;(d)1,000distributionsofY|X,Tforeach of 12 (X, T) combinations, where different colors represent different values of X, and which is consistent with the clinical monotonicity constraint and similarly, P[Y¼good|X, T] is the least when lower portion of the triangle. Figure 2(d) shows a X¼ICH and the greatest when X¼MNI. Thus, visual effect of the clinical monotonicity con- 10(cid:4)10(cid:4)10¼1000 different distribution schemes straint. As stated above, we simulated 40,000 trial were employed for the simulations. Figure 2(c) replications for each distribution scheme. Each containsagraphicalrepresentationofthecomplete replication generated random samples of up to list of distributions, using barycentric coordinates, 150 pairs of outcomes (X, Y) for each member of a displaying a uniform distribution of T over the quadruplet or matched set for the selection stage, http://ctj.sagepub.com Clinical Trials 2011; 8: 398–407 406 B Levin et al. Table2 DescriptivestatisticsofthedistributionoftypeIerrorsintheGSstudyacross1,000schemes Mean Median Maximum Minimum Lowerquartile Upperquartile Range StdDev. Analysisvariable:typeIerrorforpooroutcome 0.0091 0.0097 0.0194 0.0012 0.0053 0.0123 0.0182 0.0045 Analysisvariable:typeIerrorforgoodoutcome 0.0088 0.0096 0.0192 0.0010 0.0046 0.0120 0.0182 0.0046 Analysisvariable:typeIerrorforeitherpoororgoodoutcome(overallerror) 0.0179 0.0194 0.0380 0.0024 0.0098 0.0243 0.0356 0.0091 and additional (X, Y) pairs to make up the total thesumoftheerrorratesinthetwotailsarewithin samplesizeof1908clinicaloutcomesforthephase nominal levels. It is entirely acceptable for a two- III trial. tailedtesttoemployanasymmetricalallocationof totaltypeIerrorinthetwotails,solongasthetotal type I error rate is within nominal levels [10]. Similar results were obtained in the FSS study, and Results are described in detail in the Appendix. IntheGSstudy,thetypeIerrorswereallwellbelow their nominal levels, by about one-third. Table 2 shows that the average two-tailed type I error rate Discussion (averagedacrossall1,000distributionschemes)was approximately 0.009 for both poor and good out- The TNK-S2B trial employed an innovative, ran- comes. Among the 2,000 hypothesis tests con- domized, seamless phase IIB/III design to test ducted among the 1,000 distribution schemes, the tenecteplaseversusstandard-dosert-PAinthetreat- maximumestimatedtypeIerrorwasapproximately mentofpatientswithacuteischemicstrokewithin 0.019. The third panel of Table 2 shows that the 3 h of onset. For the phase II part of the trial, we average of the overall type I error rate for the two chose an adaptive sequential dose selection proce- primary outcomes (i.e., the probability of at least durethatemployedarapidassessmentofMNIat24 one type I error) was approximately 0.018, with hbalancedagainstoccurrenceofsymptomaticICH maximum value approximately 0.038. The type I to choose among three different doses of error rate is evidently under good control. tenecteplase. TheseresultsshowthattheinflationinthetypeI Regulatoryreviewersquestionedthreeaspectsof error rates caused by the dose selection and the control of the type I error rate. The first repeated looks at the data in interim analyses is concerned whether or not multiple comparisons morethanoffsetbythereductioninthetypeIerror techniqueswererequiredgiventhatthetrialbegan rate caused by implementing the clinical stopping with three doses of tenecteplase, and whether or rules for futility at the first interim analysis. In the notanadjustmentforselectioneffectswasneeded. GS study, selection in the phase II component We have addressed those concerns above. somewhat elevates the probability of declaring the ReviewerswerefurtherconcernedthatthetypeI selected dose of tenecteplase significantly better error rate would not be controlled under schemes than rt-PA with respect to poor outcome, while it which would not occur under the protocol, e.g., if simultaneouslydecreasessomewhattheprobability the trial were continued into phase III absent of declaring the selected dose of tenecteplase clinical criteria for promising efficacy. We under- significantly worse than rt-PA. The average type I stand the concern, in that noncompliance with errorrateintheformertailwas0.0069whilethatin prespecified protocols is a major problem when it the latter tail was 0.0022. Analogous results were occurs, as it often has. However, this would have obtained for good outcome. The average type I been in violation of the clearly specified trial error rate for declaring the selected dose of protocol,whichstatedthefutilitystoppingrequire- tenecteplase significantly better than rt-PA with ments as rules rather than guidelines. It has been respect to good outcome was 0.0061 while the acceptedintheongoingdialogbetweenresearchers average type I error rate for declaring the selected and regulators that adaptive designs must be dose of tenecteplase significantly worse than rt-PA prespecified rather than ad hoc. This presupposes was0.0027.Notehowtheasymmetricalallocations and requires respect for the prespecified protocol, oftypeIerrorcounterbalanceeachother,suchthat under which the type I error rate is computed. Clinical Trials 2011; 8: 398–407 http://ctj.sagepub.com TNK-S2B trial of tenecteplase versus alteplase 407 Ultimately, one cannot assure type I error control Funding under arbitrary protocol violations. This applies with equal force to traditional as well as adaptive This research was supported by grants (R01- trial designs. NS37666 and R01-NS45170) from the National Finally, a reviewer was doubtful whether any Institute of Neurological Disorders and Stroke – amountofsimulationscoulddemonstrateadequate National Institutes of Health. Genentech, Inc. control of the type I error rate. However, simula- supplied study drug (both tenecteplase and alte- tions are recognized under the FDA draft guidance plase)forthisclinicaltrial,butnootherfinancialor document [2], and given the coverage of the other support. parameter space in ours, it seemed to us unreason- able to sustain such doubt. Clarification is needed whether control of the error rate must be demon- strated by theorems or can be demonstrated by References simulation. Inaddition, asensitive reviewerofthis article suggested that the regulatory agency’s con- 1. Haley EC, Thompson JLP, Grotta JC, et al. for the cern might be allayed by a commitment from the TenecteplaseinStrokeInvestigators.PhaseIIB/IIITrial of Tenecteplase in Acute Ischemic Stroke: Results of a trialinvestigatorstosubmitinterimanalysisresults Prematurely Terminated Randomized Clinical Trial. to it as these analyses occur. This proposal merits Stroke2010;41:707–711. consideration. 2. U.S. Department of Health and Human Services Food In summary, the TNK-S2B trial design, while and Drug Administration, CDER/CBER, February, 2010. complex,wasinnovativeandefficient.Itsstatistical Guidance for Industry: Adaptive Design Clinical Trials for Drugs and Biologics. Available at http://www.fda.gov/ analysis plan had great integrity. Under the proto- downloads/Drugs/GuidanceComplianceRegulatoryInform col, it would have controlled the unconditional ation/Guidances/UCM201790.pdf (accessed 15 type I error rate below the nominal 0.05 level for September2010). the two primary hypothesis tests in phase III, and 3. LevinB,RobbinsH.Selectingthehighestprobabilityin experiment-wise. The trial provides several lessons binomial or multinomial trials. Proc Nat Acad Sci USA 1981;78:4663–66. for adaptive designs. (a) Time must be allowed for 4. LeuCS,LevinB.Ontheprobabilityofcorrectselection iterativecommunicationswithregulatoryreviewers in the Levin-Robbins sequential elimination procedure. from the first stages of the design and planning StatSin1998;9:879–91. process,astherecentFDAdraftguidancedocument 5. Lyden P, Raman R, Liu L, et al. NIHSS training and certification using a new digital video disk is reliable. stresses [2]. (b) Type I error control must be clearly Stroke2005;36:2446–49. demonstrated. (c) Greater clarity is needed on 6. HaleyEC,LydenPD,JohnstonKC,HemmenTM.The whether this includes demonstration that type I TNK in Stroke Investigators. A pilot dose-escalation error control will be maintained if the protocol is safety study of tenecteplase in acute ischemic stroke. violated,andonwhetherornotsimulationsarean Stroke2005;36:607–12. 7. Graham G. Tissue plasminogen activator for acute acceptable form of demonstration. (d) Regulatory ischemic stroke in clinical practice: a meta-analysis of agency concerns that the protocol for futility safetydata.Stroke2003;34:2847–50. stopping may not be followed may be allayed by a 8. van Swieten JC, Koudstaal PJ, Visser MC, et al. commitment to submit all interim analysis results Interobserveragreementfortheassessmentofhandicap to the regulatory agency as these analyses occur. instrokepatients.Stroke1988;19:604–7. 9. Holm S. A simple sequentially rejective multiple test Future trials with similar potential to TNK-S2B will procedure.ScandJStat1979;6:65–70. have a greater probability of success if these issues 10. FleissJL,LevinB,PaikMC.StatisticalMethodsforRates can be successfully addressed. andProportions(3rdedn).Wiley,NewYork,2003. http://ctj.sagepub.com Clinical Trials 2011; 8: 398–407

See more

The list of books you might like

Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.