Who Should Be Afraid of the Jeffreys-Lindley Paradox? Aris Spanos*y ThearticlerevisitsthelargenðsamplesizeÞproblemasitrelatestotheJeffreys-Lindley paradoxtocomparethefrequentist,Bayesian,andlikelihoodistapproachestoinference andevidence.Itisarguedthatwhatisfallaciousistointerpretarejectionof H aspro- 0 vidingthesameevidenceforaparticularalternativeH ,irrespectiveofn;thisisanex- 1 ampleofthefallacyofrejection.Moreover,theBayesianandlikelihoodistapproaches areshowntobesusceptibletothefallacyofacceptance.Thekeydifferenceisthatinfre- quentisttestingtheseverityevaluationcircumventsbothfallaciesbutnosuchprincipled remedyexistsfortheotherapproaches. 1. Introduction. The large n problem was initially raised by Lindley ð1957ÞinthecontextofthesimpleNormalmodel, X ∼NIIDðv;j2Þ; k51;2;:::;n;:::; ð1Þ k where NIIDðv;j2ÞstandsforNormal,Independent,andIdenticallyDistrib- uted with mean v∈ð2`;`Þ and variance j2>0 ðassumed knownÞ, by pointing out ðaÞThelargenproblem.Frequentisttestingissusceptibletothefallacious resultthatthereisalwaysalargeenoughsamplesizenforwhichanysim- pleðpointÞnullhypothesis,sayH :v5v ,willberejectedbyafrequentist 0 0 a-significanceleveltest. ReceivedApril2012;revisedAugust2012. *Tocontacttheauthor,pleasewriteto:DepartmentofEconomics,VirginiaTech,Blacksburg, VA24061;e-mail:[email protected]. yThanksareduetoDeborahMayofornumerousdiscussionsonthesetopicsandtotwoanon- ymousrefereesformanyusefulcommentsandsuggestions. PhilosophyofScience,80(January2013)pp.73–93.0031-8248/2013/8001-0004$10.00 Copyright2013bythePhilosophyofScienceAssociation.Allrightsreserved. 73 This content downloaded from 128.173.127.127 on Fri, 24 Jan 2014 13:52:47 PM All use subject to JSTOR Terms and Conditions 74 ARISSPANOS Lindley went on to claim that this result is paradoxical because, when viewedfromtheBayesianperspective,onecanshow ðbÞtheJeffreys-Lindleyparadox.Forcertainchoicesof theprior,the posteriorprobabilityof H ;givenafrequentista-significancelevelrejec- 0 tion,willapproachoneasn→`. This result was later called the Jeffreys-Lindley paradox because the broaderissueofconflictingevidencebetweenthefrequentistandBayesian approacheswasfirstraisedbyJeffreysð1939/1961,359–60Þ.Claimsaandb contrastthebehaviorofafrequentisttestðp-valueÞandtheposteriorprob- abilityof H asn→`,whichbringsupapotentialforconflictbetweenthe 0 frequentistandBayesianaccountsofevidence: ðcÞBayesiancharge1.“TheJeffreys-Lindleyparadoxshowsthatforin- ferenceaboutv;p-valuesandBayesfactorsmayprovidecontradictoryev- idence and hence can lead to opposite decisions” ðGhosh, Delampady, and Samanta 2006, 177Þ. ThispotentialconflictisgivenamoredistinctBayesianslantby ðdÞBayesiancharge2.AhypothesisthatiswellsupportedbytheBayes factorcanbeðmisleadinglyÞrejectedbyafrequentisttestwhennislarge ðseeBergerandSellke1987,112–13;Howson2002,45–49Þ. Theproblemofconflictingevidencepertainstothebroaderphilosophical issueofgroundingstatisticalpracticeonsoundprinciplesofinferenceand evidence.Whathasnotbeenadequatelyexplainedinthisliteratureiswhy, giventherejectionofH byafrequentisttest,itsposteriorprobabilitygoing 0 tooneasn→`ðirrespectiveofthetruthorfalsityof H Þisconducivetoa 0 moresoundaccountofevidence. Theprimaryobjectiveofthisarticleistoconsiderthisissuebycompar- ing the frequentist, Bayesian, and likelihoodist accounts of inference and evidence.Thediscussioncanbeseenaspartofawiderendeavortousethe errorstatisticalperspectiveðMayo1996Þinanattempttohaveacloserlook atseveralBayesianallegationsthathaveunderminedthecredibilityoffre- quentiststatisticsinphilosophicalcirclesoverthepasthalfcentury. Thebriefcommentsinsection2provideapreludetothediscussionthat follows by clarifying certain key issues at the outset. Section 3 introduces the large n problem in frequentist testing using a numerical example dis- cussed by Stone ð1997Þ. This example is based on a very large sample ðn5527,135Þthatisusedtobringoutthefallaciousclaimsassociatedwith the Jeffreys-Lindley paradox without the technical difficulties of invoking This content downloaded from 128.173.127.127 on Fri, 24 Jan 2014 13:52:47 PM All use subject to JSTOR Terms and Conditions AFRAIDOFTHEJEFFREYS-LINDLEYPARADOX? 75 limitingarguments asn→`. In sections 4 and 5, the Bayesian and likeli- hoodistapproachesareappliedtothisexamplewithaviewtodemonstrate thatbothapproachesarefarfromimmunetofallaciousresults,contraryto thecurrentview amongproponentsoftheBayesian andlikelihoodistper- spectivesðseeBerger1985;Royall1997;Robert2007;interaliaÞ.Section6 illustrates how the postdata severityevaluationofthe p-valueandaccept/ rejectresultsaddressesnotonlythelargenproblembutalsothebroaderfal- laciesofacceptance/rejectionandcallsintoquestionthechargesstemming fromtheJeffreys-Lindleyparadox.Theseverityperspectiveisthenusedin section7toshedlightonwhytheBayesianandlikelihoodistaccountsofev- idencegiverisetohighlyfallaciousresults. 2. ClarifyingWhatIsFallaciousorParadoxical. Beforediscussingthe variousclaimsthatrelatetotheJeffreys-Lindleyparadox,itisimportantto bring out certain key issues that have not been adequately illuminated by theliterature. First,infrequentisttesting,whichincludesbothFisher’ssignificanceand theNeyman-Pearsontesting,thelargenproblemarisesnaturallybecausethe powerofany“good”ðconsistentÞtestincreaseswith n.Ana-significance levelNeyman-Pearsontestissaidtobeconsistentwhenitspowertodetect any discrepancy g≠0 from H approaches one as n→`: In this sense, 0 there is nothing fallacious or paradoxical about a small p-value or a rejec- tion of the null, for a given significance level a;when n is large enough, since a highly sensitive test is likely to pick up on tiny ðin a substantive senseÞ discrepancies from H . 0 Second,Bayesiancharge2ðdÞoverlooksthefactthatthecornerstoneof Neyman-Pearsontestingisthetrade-off betweenthetypeIðrejectH when 0 trueÞand type II ðaccept H when falseÞerror probabilities. In this sense, 0 these charges ignore the decrease in the type II error probability as n in- creases,since,foragivendiscrepancyg,thepowerisoneminusthetypeII errorprobability.Indeed,variousattemptshavebeenmadetoalleviatethe largenproblem,includingdecreasingaasnincreasesinanattempttocoun- terbalancetheincreaseinpowerassociatedwithnðseeLehmann1986Þ.The difficulty,however,isthatonlycruderulesofthumbforadjustingacanbe devised because the power of a test depends on other factors besides n. Third,thelargenproblemðaÞconstitutesanexampleofabroaderprob- lemknownasthefallacyofrejection:ðmisÞinterpretingrejectH ðevidence 0 againstH ÞasevidenceforaparticularH ;thiscaneasilyarisewhenatest 0 1 hasveryhighpower.Duetothetrade-off betweentypeIandIIerrorproba- bilities,anyattempttoamelioratetheproblembyselectingasmallersignif- icance level when n is large might render the result susceptible to the re- versefallacyknownasthefallacyofacceptance:ðmisÞinterpretingacceptH 0 ðnoevidenceagainstH ÞasevidenceforH ;thiscaneasilyarisewhenatest 0 0 This content downloaded from 128.173.127.127 on Fri, 24 Jan 2014 13:52:47 PM All use subject to JSTOR Terms and Conditions 76 ARISSPANOS hasverylowpowerðe.g.,nisverysmallÞ.Asarguedbelow,thelargenprob- lem ðaÞ, when a rejection of H isinterpretedasevidenceforH ,andthe 0 1 Jeffreys-LindleyparadoxðbÞconstituteexamplesof thefallaciesof rejec- tionandacceptance,respectively. The main argument of this article can be stated succinctly as follows: ðiÞAlthoughthereisnothingfallaciousaboutasmallp-value,orarejection of H ;whennislargeðitisafeatureofagoodfrequentisttestÞ,ðiiÞthereisa 0 problemwhensuchresults aredetachedfromthe testitself andaretreated asprovidingthesameevidenceforaparticularalternativeH regardlessof 1 thegenericcapacityðthepowerÞofthetestinquestion.Thelargenproblem isdirectlyrelatedtopointsðiÞandðiiÞbecausethepowerdependscruciallyon n:Thatinturnrendersarejectionof H withalargenðhighpowerÞverydif- 0 ferentinevidentialtermsfromarejectionof H withasmallnðlowpowerÞ. 0 Thatis, therealproblemdoes notlie withthe p-valueortheaccept/reject rules as suchbutwithhow such resultsare fashioned into evidence foror against a particular hypothesis or an inferential claim relating to H ðH Þ. 0 1 ðiiiÞItisarguedthatthelargenproblemcanbecircumventedbyusingthe postdataseverityassessmenttoprovideasoundevidentialaccountforfre- quentistinference.Whetherdatax provideevidencefororagainstapartic- 0 ular hypothesisðH orH Þdepends cruciallyonthecapacityofthetestin 0 1 questiontodetectdiscrepanciesfromthenull.Thisstemsfromtheintuition thatasmallp-valueorarejectionof H basedonatestwithlowpowerðe.g., 0 a small nÞ for detecting a particular discrepancy g provides stronger evi- denceforthepresenceofgthanusingatestwithmuchhigherpowerðe.g., alargenÞ.AsfirstpointedoutbyMayoð1996Þ,thisintuitioniscompletelyat oddswith the Bayesian and likelihoodist intuitions articulated by Berger andWolpertð1988Þ,HowsonandUrbachð2006Þ,andSoberð2008Þ.Indeed, Mayowentontoproposeafrequentistevidentialaccountbasedonharness- ingthisperceptiveintuitionintheformofapostdataseverityevaluationof theaccept/rejectresults.Thisisbasedoncustomtailoringthegenericcapac- ityofthetesttoestablishthediscrepancygwarrantedbydatax .Thiseviden- 0 tialaccountcanbeusedtocircumventtheabovefallacies,aswellasother ðmisplacedÞ charges against frequentist testing ðsee Spanos 2011Þ. ðivÞ In contrasttofrequentisttesting,nosuchreasonedremedyexistsfortheBayes- ian andlikelihoodist approaches,whoseevidential accounts are shown to beequallyvulnerabletothesefallacies.Astrongcasecanbemade,orsoit is argued,thatthefallaciousnatureofthe Jeffreys-Lindleyparadoxstems primarilyfromthefactthattheBayesianandlikelihoodistapproachesdis- misstherelevanceofthegenericcapacityoftheparticulartestintheirevi- dentialaccounts.Indeed,theseaccountscastasidetherelevanceof thesam- ple space beyond x and “controlling” the relevant error probabilities for 0 beingthekeyweaknessesofthefrequentistapproach. This content downloaded from 128.173.127.127 on Fri, 24 Jan 2014 13:52:47 PM All use subject to JSTOR Terms and Conditions AFRAIDOFTHEJEFFREYS-LINDLEYPARADOX? 77 3. The Large n Problem in Frequentist Testing. The approach to fre- quentiststatisticsfollowedinthisarticleisknownaserrorstatisticsðMayo 1996Þ.It can be viewed as a refinement/extension of the Fisher-Neyman- Pearson approach that offers a unifying inductive reasoning for frequentist inference. It refines the Fisher-Neyman-Pearson approach by bringing out the importanceofspecifyingexplicitlyaswellasvalidatingvis-à-visdatax 0 theinductivepremisesofinference.ItextendstheFisher-Neyman-Pearson approachby supplementing it with a postdata severity assessment with a viewtoaddressanumberoffoundationalproblemsrelatingtoasoundev- idential account ðsee Mayo and Spanos 2004, 2006, 2011Þ. Considerthe followingnumerical example discussedbyStoneð1997Þ: Aparticle-physicscomplexplanstorecordtheoutcomesofalargenumber ofindependentparticlecollisionsofaparticulartype,wheretheoutcomes areeithertypeAortypeB....Theresultsaretobeusedtotestatheoreti- calpredictionthattheproportionoftypeAoutcomes,h,isprecisely1/5, againstthevaguealternativethathcouldtakeanyothervalue.Theresults arrive:106,298typeAcollisionsoutof527,135.ð263Þ How can one test this substantive hypothesis of interest? Thefirststepistoembedtheabovematerialexperimentintoastatistical model and frame the substantive hypothesis in terms of statistical param- eters. With that in mind, let us assume that each of these n trials can be viewed as a realization of a sample ðX ; X ;:::; X Þ of IID random vari- 1 2 n ables defined by ( 1 if typeAcollisionoccurs; X 5 0 if typeBcollisionoccurs; whichtransformstheobservedsequenceofparticlesintodatax :5ð1,0,0, 0 1, 1, . . . , 0Þ. These conditions render the simple Bernoulli ðBerÞmodel, X ∼BerIID½v;vð12vÞ(cid:2); v∈½0;1(cid:2); k51;2;:::;n;:::; ð2Þ k appropriateasastatisticalmodelinthecontextofwhichthematerialexper- iment canbe embedded.Note that inpracticeone should validate the IID assumptionstosecurethereliabilityofinference.Thesubstantivehypothesis ofinterest,h5.2,canbeframedintermsoftheNeyman-Pearsonstatistical hypotheses: H∶v5v versus H∶v≠v ; for v 5:2; ð3Þ 0 0 1 0 0 This content downloaded from 128.173.127.127 on Fri, 24 Jan 2014 13:52:47 PM All use subject to JSTOR Terms and Conditions 78 ARISSPANOS specifiedsolelyintermsoftheunknownparameterof thestatisticalmodel in ð2Þ.Aproperframingof theNeyman-Pearsonhypothesesrequiresapar- titioningoftheparameterspace,irrespectiveofwhetheroneissubstantively interestedinoneormorespecificvaluesof v.Fromthestatisticalperspec- tive,allvaluesof vin½0;1(cid:2)arerelevantfordefiningtheoptimalityof thetest ðseeSpanos2010,569Þ. ThetestT :5fdðXÞ;C ðaÞg,definedby a 1 pffiffiffi nðX 2v Þ dðXÞ5 pffiffiffiffiffiffiffinffiffiffiffiffiffiffiffiffiffi0ffiffiffi ∼H0 Binð0;1;nÞ; v0ð12v0Þ ð4Þ C ðaÞ5fx∶jdðxÞj ≥ c g; 1 ða=2Þ whereX 5ð1=nÞon X and C ðaÞ; c denote the rejection region and n i51 i 1 ða=2Þ value,respectively,isauniformlymostpowerfulunbiasedNeyman-Pearson testðseeLehmann1986Þ.Usingð4ÞonecandefinethetypeIerrorprobabil- ityðsignificancelevelÞ: PðjdðXÞj>c ;H Þ5a: ð5Þ ða=2Þ 0 Thesamplingdistributioninð4ÞisbasedonthefactthatforaBernoulliIID sampleX:5ðX,X,...,XÞ,therandomvariable Y 5nX 5on X;where 1 2 n n i51 i Y denotes the number of 1’s in n trials, is binomially ðBinÞ distributed: (cid:3) (cid:4) fðy;v;nÞ5 n vyð12vÞn2y; y50;1;2;:::;n: y In light of the large sample size ðn 5 527,135Þ, it is often judicious to chooseasmalltypeIerrorðLehmann1986Þ,saya5.003,whichyieldsa rejectionvalueofc 52:968;notethattheNormalapproximationtothe ða=2Þ binomialdistributionisquiteaccurateinthiscase. ThepoweroftestT atv 5v 1g definedby a 1 0 1 Pðv Þ5PðjdðXÞj>c ; v 5v 1g Þ; 1 ða=2Þ 1 0 1 forv ∈V ; 1 1 as well as the type II error probability bðv Þ512Pðv Þ; for v ∈V , are 1 1 1 1 evaluatedusingthesamplingdistributionofdðXÞunderH ,whichtakesthe 1 form: dðXÞv5∼v1 Binðdðpv1ffiÞffiffi;Vðv1Þ;nÞ; forv1∈V1:5V2f:2g; nðv 2v Þ v ð12v Þ ð6Þ wheredðv Þ5 pffiffiffiffiffiffi1ffiffiffiffiffiffiffiffiffiffi0ffiffiffiffi and Vðv Þ5 1 1 : 1 v0ð12v0Þ 1 v0ð12v0Þ This content downloaded from 128.173.127.127 on Fri, 24 Jan 2014 13:52:47 PM All use subject to JSTOR Terms and Conditions AFRAIDOFTHEJEFFREYS-LINDLEYPARADOX? 79 This indicates that test Ta is consistent becausepitffisffiffi power increases with dðv Þ—a monotonically increasing function of n—approaching one as 1 n→`ðseeLehmann1986Þ.Hence,otherthingsbeingequal,increasingn increases the power of this test, confirming that there is nothing paradox- ical aboutalarger nrenderingaðconsistentÞtestmoresensitive. ApplyingtheNeyman-PearsontestT totheabovedata, a 106;298 x 5 50:20165233; n 527;135 pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi ð7Þ 527;135½ð106;298=527;135Þ2:2(cid:2) dðx Þ5 pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi 52:999; 0 :2ð12:2Þ leads to a rejection of H .The p-value, defined as the probability of ob- 0 serving an outcome x∈f0;1gn thataccordslesswellwithH thanx does 0 0 when H istrue,confirmstherejection of H : 0 0 PðjdðXÞj>jdðx Þj;H Þ5pðx Þ5:0027: ð8Þ 0 0 0 Thisdefinitionispreferredtothetraditionalone,“theprobabilityof observ- ingaresultmoreextremethanx underH ,”becauseitblendsinbetterwith 0 0 thepostdataseverityevaluationdiscussedinsection6.Theresultpðx Þ5 0 :0027suggeststhatdatax indicate“some”discrepancybetweenv andthe 0 0 “true” v ðthat gave rise to x Þ,butitprovidesnoinformationaboutitsmag- 0 nitude. As mentioned above, what is problematic is the move from the accept/ rejectresults,andthep-value,toclaimingthatdatax provideevidencefora 0 particular hypothesis, becausesuch a move is highlyvulnerable to the fal- lacies of acceptance and rejection. However, in the context offrequentist testing,thisvulnerabilitycanbecircumventedusingapostdataseverityeval- uation. How does the Jeffreys-Lindley paradox arise in this context? Using a spikedpriordistributionðLindley1957Þoftheform pðv5v Þ5p and pðv≠v Þ512p ; ð9Þ 0 0 0 0 theformalclaimassociatedwiththisparadoxis (cid:3) pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi(cid:4) p H jX 5v 1c v ð12v Þ=n →1 as n→ `; 0 n 0 ða=2Þ 0 0 thatis,theposteriorprobabilityof H ;conditionalondðx Þ5c ,goesto 0 0 ða=2Þ one as n approaches infinity. What is not so obvious is why this result is consideredanindicatorofthesoundnessoftheBayesianaccountofevi- dence. This content downloaded from 128.173.127.127 on Fri, 24 Jan 2014 13:52:47 PM All use subject to JSTOR Terms and Conditions 80 ARISSPANOS In relationto the natureofthe nullhypothesisðsimpleorcompositeÞ, it is important to elucidate certain issues raised in this literature. First, in the case of two composite hypotheses H∶v≤v versus H∶v>v ; where v 5:2; ð10Þ 0 0 1 0 0 the test T*:5fdðXÞ;C*ðaÞg;whereC*ðaÞ5fx:dðxÞ≥c g;isbothconsis- a 1 1 a tent and uniformly most powerful ðsee Lehmann 1986Þ. In this sense, the largenproblemhasnothingtodowiththerestrictiontoapointnullhypoth- esis;it is simply a feature ofa consistent Neyman-Pearson test.Thenature ofthenull ðsimpleorcompositeÞ only matters for thebehaviorof the like- lihoodist ratio, the Bayes factor, and the posterior probability of H . Sec- 0 ond,thefactthatthediscrepancybetweenthep-valueandtheBayesfactor is smaller for a composite null is not particularly interesting because both measuresaremisleadingindifferentways. Third, the comparison of the posterior pðvjx Þ;asvvariesover½0;1(cid:2) 0 ðwhichrepresentone’srevisedbeliefsaboutvinlightofx Þ,witherrorprob- 0 abilitiesðwhichmeasurehowoftenafrequentist procedureerrsasxvaries overf0;1gnÞ,isdubious.Indeed,asarguedbyCasellaandBergerð1987,110Þ, the comparison is made possible by using a blatant misinterpretation of the p-value:“Thephrase‘theprobabilitythatH istrue’hasnomeaningwithin 0 frequency theory, but it has been argued that practitioners sometimes at- tachsuchameaningtothe pvalue.Sincethepvalue,in thecasesconsid- ered,isanupperboundontheinfimum ofPrðH jxÞit lies within or at the 0 boundary of a range of Bayesian measures of evidence demonstrating the extenttowhichtheBayesianterminologycanbeattached.” 4. TheBayesianApproach. ConsiderapplyingtheBayesfactorprocedure tothehypothesesð3Þusingauniformprior: v∼Uð0;1Þ; thatis;pðvÞ51 for allv∈½0;1(cid:2): ð11Þ ThisgivesrisetotheBayesfactor: BFðx ;v Þ5 ELðv0;x0Þ 0 0 1 Lðv;x Þdv 0 0(cid:5) (cid:6) 527;135 ð:2Þ106;298ð12:2Þ527;1352106;298 5 E (cid:5)1(cid:5)06;298 (cid:6) (cid:6) ð12Þ 1 527;135 v106;298ð12vÞ527;1352106;298 dv 106;298 0 :000015394 5 58:115: :000001897 This content downloaded from 128.173.127.127 on Fri, 24 Jan 2014 13:52:47 PM All use subject to JSTOR Terms and Conditions AFRAIDOFTHEJEFFREYS-LINDLEYPARADOX? 81 ItisinterestingtonotethatthesameBayesfactorð12Þarisesinthecaseof thespikedpriorð9Þwith p 5:5;wherev5v isgivenpriorprobabilityof 0 0 :5andtheotherhalfisdistributedequallyamongtheremainingvaluesof v. This is because for p 5:5,theratioðp =ð12p ÞÞ51andwillcancelout 0 0 0 fromBFðx ;vÞ. 0 ThenextstepinBayesianinferenceistouseð12Þasthebasisforfashion- inganevidentialaccount.ABayesfactorresultBFðx ;v Þ>k;for k≥3:2; 0 0 indicates that data x favorthenullwiththe“strengthofevidence”increas- 0 ing with k. In particular, for 3:2 ≤ k <10,theevidenceissubstantial,for 10 ≤ k<100theevidenceisstrong,andfork≥100isdecisiveðseeRobert 2007Þ. Comparingtheresultinð12Þwiththep-valueinð8Þ,Stoneð1997,263Þ pointed out: “The theoretician is pleased when the ½likelihood–Bayes- minded(cid:2) statistician reports a Bayesfactor of 8 to1 in favour of his brain- child,butthepleasureisalloyedwhenheuseshisown P-valuecookbook to reveal the 3.00 standard deviation excess of type A outcomes.” Acloserscrutinyof theevidentialinterpretationof theresultinð12Þsug- geststhatitisnotasclear-cutasitappears.Thisisbecause,onthebasisof thesamedatax ,theBayesfactorBFðx ;v Þ“favors”notonlyv 5:2but 0 0 0 0 eachindividualvaluev insideacertainintervalaroundv 5:2: 1 0 V :5½:199648; :203662(cid:2)⊂V :5V2f:2g; ð13Þ BF 1 wherethesquarebracket indicatesinclusionoftheendpoint,in thesense that,foreachv ∈V ;BFðx ;v Þ>1,thatis, 1 BF 0 1 E 1 Lðv ;x Þ> Lðv;x Þdv; for allv ∈V : ð14Þ 1 0 0 1 BF 0 Worse,certainvaluesvzinV arefavoredbyBFðx ;vzÞmorestronglythan BF 0 v 5:2: 0 vz∈V :5ð:2;:20331(cid:2)⊂V ; ð15Þ LR BF where the left parenthesis indicates exclusion of the end point. It is im- portant toemphasizethatthesubsetsV ⊂V ⊂Vexistforeverydatax ; LR BF 0 andonecan locate them by trial and error. However, there is a much more efficient way to do that. As shown in section 5, V canbedefinedasa subset of V around the maximum likelihood estimaLtRe ðMLEÞbv ðx Þ5 MLE 0 :20165233:Thisisnotcoincidentalbecause,asMayoð1996,200Þpointed out,v♦5 ^v ðx Þisalwaysthemaximallylikelyalternative,irrespectiveof MLE 0 This content downloaded from 128.173.127.127 on Fri, 24 Jan 2014 13:52:47 PM All use subject to JSTOR Terms and Conditions 82 ARISSPANOS thenullorothersubstantivevaluesofinterest.Inthisexample,theBayesfactor forH∶v5v♦ versusH∶v ≠ v♦ yields 0 1 (cid:5) (cid:6) 527;135 ð:20165233Þ106;298ð12:20165233Þ527;1352106;298 106;298 BFðx ;v♦Þ5 E (cid:5)(cid:5) (cid:6) (cid:6) 0 1 527;135 v106;298ð12vÞ527;1352106;298 dv 106;298 0 :0013694656 5 5721:911; ð16Þ :000001897 indicating not only decisive evidence for v5v♦ butalsothat v♦ isfavoredbyBFðx ;v♦Þmorethan89 0 721:911 ≃ timesstronger thanv 5:2! 8:115 0 This result is an instance of the fallacy of acceptance in the sense that the Bayes factor BFðx ;v Þ>8is misinterpreted as providing evidenceforH: 0 0 0 v 5:2 against any value of v in V :5V2f:2g; when in fact BFðx ;vzÞ 0 1 0 providesstrongerevidenceforcertainvaluesofvz inV ⊂V . LR 1 Whatisthesourceoftheseconflictingevidence?Stoneð1997,263Þcon- jecturedthefollowingexplanation:“Asubhypothesisstronglyrejectedbya significancetestmaybestronglysupportedinposteriorprobabilityifthe prior puts insufficient weight on the hypotheses of non-negligible likeli- hood.” Letusfleshoutthisconjecture.First,chooseV ⊂V asarangeofvalues BF 1 of v thatBFðx ;vyÞfavorsinthesensegiveninð14Þ.Tothisrangeof values 0 the prior attributes insufficient weight since E :203662 dv5:004: :199648 Second,onecanrelateV tobothanequal-tailBayesianð12aÞ5:9997 BF credibleintervalaswellasthefrequentistð12aÞconfidenceinterval: ½^v ðXÞ2ð3:6267ÞSDð^v ðXÞÞ; ^vðXÞ 1ð3:6267ÞSDð^v ðXÞÞ(cid:2); MLE MLE MLE MLE whereSDð^v ðXÞÞdenotesthestandarddeviationof ^v ðXÞ:Inlightof MLE MLE this,onecanconsiderV to represent a range of values of v with nonneg- BF This content downloaded from 128.173.127.127 on Fri, 24 Jan 2014 13:52:47 PM All use subject to JSTOR Terms and Conditions
Description: