Knowledge, Probability, and Adversaries

Joseph Y. Halpern
IBM Almaden Research Center
San Jose, CA 95120
halpern@ibm.com

Mark R. Tuttle
Digital Equipment Corporation, Cambridge Research Lab
Cambridge, MA 02139
tuttle@crl.dec.com

July 1995

Abstract: What should it mean for an agent to know or believe an assertion is true with probability 0.99? Different papers [FH88, FZ88a, HMT88] give different answers, choosing to use quite different probability spaces when computing the probability that an agent assigns to an event. We show that each choice can be understood in terms of a betting game. This betting game itself can be understood in terms of three types of adversaries influencing three different aspects of the game. The first selects the outcome of all nondeterministic choices in the system; the second represents the knowledge of the agent's opponent in the betting game (this is the key place the papers mentioned above differ); the third is needed in asynchronous systems to choose the time the bet is placed. We illustrate the need for considering all three types of adversaries with a number of examples. Given a class of adversaries, we show how to assign probability spaces to agents in a way most appropriate for that class, where "most appropriate" is made precise in terms of this betting game. We conclude by showing how different assignments of probability spaces (corresponding to different opponents) yield different levels of guarantees in probabilistic coordinated attack. This paper appeared in Journal of the ACM,
40(4):917–962, September 1993.

1 Introduction

In nearly every field of research concerned with systems of interacting agents—be it distributed computing, artificial intelligence, or economics—people have found it useful to think about these systems in terms of knowledge. In game theory, for example, a player's strategy typically takes into account the knowledge the player acquires about the other players' strategies. In all of these fields, an important subclass of interactions involves probability. For example, a player in a game might toss a coin in order to determine its next move. In such contexts, it is natural to find oneself reasoning—at least informally—about knowledge and probability and their interaction. This sort of reasoning is quite common in computer science, such as when reasoning about probabilistic primality-testing algorithms. Such an algorithm might guarantee that if the input n is a composite number, then with high probability the algorithm will find a "witness" that can be used to verify that n is composite. Loosely speaking, we reason, if an agent runs this algorithm on input n and the algorithm fails to find such a witness, then the agent knows that n is almost certainly prime, since the agent is guaranteed that the algorithm would almost certainly have found a witness had n been composite. A number of recent papers have tried to formalize this sort of reasoning about knowledge and probability. Fagin and Halpern [FH88] present an abstract model for knowledge and probability in which they assign to each agent-state pair a probability space to be used when computing the
probability, according to that agent at that state, that a formula φ is true.¹ In their framework, the problem of modeling knowledge and probability reduces to choosing this assignment of probability spaces. Although they show that more than one choice may be reasonable, they do not tell us how to make this choice. One particular (and quite natural) choice is made in [FZ88a],² and some arguments are presented for its appropriateness; another is made in [HMT88] and used to analyze interactive proof systems. It is not initially clear, however, which choice is most appropriate. In this paper, we clarify the issues involved in choosing the right assignment of probability spaces. We argue that no single assignment is appropriate in all contexts: the right way to think about these assignments is in terms of strategies for a betting game, and different assignments can be viewed as most appropriate in the contexts of betting against different opponents in this game. Thinking in terms of probability, the truth of a statement such as "event E will occur with probability α" depends on the assignment of probability spaces. Thinking in terms of games, if the opponent can influence the occurrence of an event E in any way, then the truth of a statement such as "event E will occur with probability α" depends on the extent to which the opponent can influence E, and the success of any strategy depending on the occurrence of E depends on the power of the opponent. We establish a correspondence between assignments of probability spaces and powers of opponents, and hence establish a correspondence between assignments and winning strategies in this betting game against these opponents. We find, however, there is more
to the betting game than just the opponent you are betting against. Roughly speaking, the setting in which the game is played is also of great importance. We identify three aspects of the game and its environment that capture all that is relevant, and

---
¹ Related results about their model appear in [FH91, FHM90].
² In [FZ87, FZ88b], other definitions of knowledge involving probability are proposed, in order to analyze an interactive proof of quadratic residuosity, but these definitions are primarily concerned with accounting for the limited computational power of agents in the system.
---

model them in terms of three types of adversaries, each playing a fundamentally different role. We briefly describe these adversaries and their roles here, and explore them in greater depth in the rest of the paper. When we analyze probabilistic protocols, we do so in terms of probability distributions on the runs or executions of the protocol. When we say that a protocol is correct with probability 0.99, we are trying to say that the protocol will do the right thing in 99 percent of the runs. In fact, a statement like "99 percent of the runs" does not usually make sense, since we usually have probability distributions on subsets of the runs, but not on the entire set of runs. For example, consider any probabilistic primality-testing algorithm [Rab80, SS77]. For each fixed input (the number to be tested), the coins tossed during the algorithm induce a probability space on the set of runs of the algorithm with that input, but we do not have a distribution on the set of all
runs because we are not willing to assume a distribution on the inputs. When we say that the algorithm works with probability 0.99, we really mean that for every choice of input the algorithm is correct in 0.99 of the runs with that input. The choice of input is a nonprobabilistic choice, and the coin tosses are probabilistic choices. The role of the first type of adversary in our framework is to distinguish between these two types of choices: this adversary factors out all nonprobabilistic choices in the system, so that for each adversary the remaining probabilistic choices induce a natural probability distribution on the set of runs with that adversary. The probability on the runs can be viewed as giving us an a priori probability of an event, before the protocol is run. However, the probability an agent places on runs will in general change over time, as a function of information received by the agent in the course of the execution of the protocol. New subtleties arise in analyzing this probability. Consider a situation with three agents p1, p2, and p3. Agent p3 tosses a fair coin at time 0 and observes the outcome at time 1, but agents p1 and p2 never learn the outcome. What is the probability according to p1 that the coin lands heads? Clearly at time 0, before the coin is tossed, it should be 1/2. What about at time 1? There is one argument that says the answer should be 1/2. After all, agent p1 does not learn any more about the coin as a result of its having been tossed, so why should its probability change? Another argument says that after the coin has been tossed, it does not make sense to say that the probability of heads is
1/2. The coin has either landed heads or it hasn't, so the probability of the coin landing heads is either 0 or 1 (although agent p1 does not know which). This point of view appears in a number of papers in the philosophical literature (for example, [Fra80, Lew80]). Interestingly, the same issue arises in quantum mechanics, in Schrödinger's famous cat-in-the-box thought experiment (see [Pag82] for a discussion). We claim that these two choices of probability are best explained in terms of betting games (assuming honest players). At time 0, agent p1 should certainly be willing to accept an offer from either p2 or p3 to bet $1 for a payoff of $2 if the coin lands heads (assuming p1 is risk neutral³). Half the time the coin will land heads and p1 will be $1 ahead, and half the time the coin will land tails and p1 will lose $1, but on average p1 will come out even. On the other hand, p1 is clearly not willing to accept such an offer from p3 at time 1 (since p3 can tell at time 1 whether it is going to win the bet, and since p3 is presumably willing to offer the bet only when it will win), although p1 is still willing to accept this bet from p2. The point here is that if you do not want to lose money in a betting game, not only is your knowledge important, but also the knowledge of the opponent offering the bet.

---
³ Informally, an agent is said to be risk neutral if it is willing to accept all bets where its expected winnings are nonnegative.
---
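The role the opponent's knowledge plays in this bet can be illustrated with a small simulation. This sketch is ours, not the paper's: the function play and its encoding of the two opponents are hypothetical, with p2 modeled as always offering the bet and p3 as offering it only after seeing that it will win.

```python
import random

def play(rounds, opponent_sees_coin, seed=0):
    """Average winnings of risk-neutral agent p1, who accepts a $1 bet
    paying $2 on heads.  If opponent_sees_coin is False, the opponent
    (p2) knows no more than p1 and always offers the bet; if True, the
    opponent (p3) has seen the coin and offers the bet only when p1
    will lose."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(rounds):
        heads = rng.random() < 0.5      # fair coin toss
        if opponent_sees_coin and heads:
            continue                    # p3 withholds a bet it would lose
        total += (2 if heads else 0) - 1
    return total / rounds
```

Against p2 the average winnings tend to 0, matching the argument that p1 breaks even; against p3 every accepted bet is a loser, so p1 loses $1 on roughly half the rounds.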
Betting games are not played in isolation! Thus, the role played by the second type of adversary in our framework is to model the knowledge of the opponent offering a bet to an agent at a given point in the run. We sometimes identify the opponent with an adversary of this type. One obvious choice of opponent is to assume you are playing against someone whose knowledge is identical to your own. This is what decision theorists implicitly do when talking about an agent's posterior probabilities [BG54]; it is also how we can understand the choice of probability space made in [FZ88a]. By way of contrast, the choice in [HMT88] corresponds to playing someone who has complete knowledge about the past and knows the outcome of the coin toss; this corresponds to the viewpoint that says that when the coin has landed, the probability of heads is either 0 or 1 (although you may not know which). A further complication arises when analyzing asynchronous systems. In this case, there is a precise sense in which an agent does not even know exactly when the event to which it would like to assign a probability is being tested (for example, when a bet is being placed). Thus we need to consider a third type of adversary in asynchronous systems, whose role is to choose this time. To illustrate the need for this third type of adversary, we give an example of an asynchronous system where there are a number of plausible answers to the question "What is the probability the most recent coin toss landed heads?" It turns out that the different answers correspond to different adversaries choosing the times to perform the test in different ways. We remark that the case of asynchronous systems is also considered in
[FZ88a]. We can understand the assignment of "confidence" made there as corresponding to playing against a certain class of adversaries of this third type. As the preceding examples suggest, each of the earlier definitions of probabilistic knowledge that appear in the literature can be understood as the "best" definition for a particular choice of adversaries. On the other hand, we show that every choice of adversaries gives rise to a definition of probabilistic knowledge that is "best" for that choice. To make this precise, we formalize our intuition that the probability an agent assigns to an event is related to the payoff the agent is willing to accept in a betting game with an opponent. We define a particular betting game, and show that once we fix the choice of adversaries, there is a definition of probabilistic knowledge that is best in terms of doing as well as possible against an opponent whose knowledge is modeled by the second adversary. We show how this definition corresponds to a strategy that enables the agent to break even (at least) in the game, and how any other definition with this property corresponds to an overly conservative strategy that assumes the opponent is more powerful than it really is. These results form the technical core of our paper. The rest of the paper is organized as follows. In the next section, Section 2, we provide a formal model of a system of agents such as a distributed system. In Section 3 we consider the problem of putting a probability on the runs of a system; this is where we need the first type of adversary, to factor out the nondeterministic choices. In Section 4 we start to consider the issue of how probability should change
over time. In Section 5 we consider the choices that must be made in a general definition of probabilistic knowledge. In Section 6 we consider particular choices of probability assignments that seem reasonable in synchronous systems. Here we consider the second type of adversary, representing the knowledge of the opponent in the betting game. In Section 7, we consider asynchronous systems, where we also have to consider the third type of adversary. In Section 8 we apply our ideas to analyzing the coordinated attack problem, showing how different notions of probability correspond to different levels of guarantees in coordinated attack. The paper ends with two appendices. In Appendix A we give the proofs of the results claimed in the paper, and in Appendix B we discuss some interesting secondary observations related to the rest of the paper.

2 Modeling systems

As the examples in the introduction show, the analysis of a probabilistic system typically depends on the choice of a probability distribution on the runs or executions of the system. In this section, we fix a model of computation that defines these runs, and in later sections we will vary the probability distributions associated with these runs. Our model is actually the model given in [HF89], which is itself a simplification of the model given in [HM90]. Both models are heavily influenced by models for distributed computation. Consider an arbitrary system of n interacting agents p1, ..., pn. Intuitively, a run of a system is a complete description of one of the possible interactions of the agents. Such an interaction is uniquely determined by the sequence of global states through which the system passes as a result of the
interaction. A global state is modeled as a tuple consisting of each process' local state, together with the state of the environment, where, loosely speaking, the environment is intended to capture everything relevant to the state of the system that cannot be deduced from the agents' local states. Formally, a global state is an (n+1)-tuple (se, s1, ..., sn) of local states, where si is the local state of agent pi, and se is the state of the environment. A run of the system is a mapping r from times to global states. We assume for the sake of convenience that times are natural numbers. A system is a set R of runs, intuitively the set of all possible interactions of the system agents. We denote the global state at time k in run r by r(k), the local state of pi in r(k) by ri(k), and the state of the environment by re(k). We refer to the ordered pair (r,k) consisting of a run r and a time k as a point. Later in the paper, we will assume that the entire history of a run r up to time k is encoded in the environment's state in r(k). This allows us to think of the runs of a system in terms of a computation tree: nodes of the tree are global states (and correspond to a set of points), and the paths in the tree are the runs of the system. We say that a run r′ extends a point (r,k) if r and r′ pass through the same global states up to time k; that is, r(k′) = r′(k′) for 0 ≤ k′ ≤ k. A fact is considered to be true or false of a point. We identify a fact φ with the set of points at which φ is true,⁴ and write
(r,k) |= φ iff φ is true at (r,k). In a system R, a fact φ is said to be a fact about the run if, given two points of the same run, φ is either true at both points or false at both points. Similarly, a fact φ is said to be a fact about the global state if, given two points with the same global state, φ is true at both points or false at both points. We now define what it means for an agent to know a fact φ at a point (r,k) of a system R. Intuitively, ri(k) captures all of agent pi's information at (r,k). We say pi considers a point (r′,k′) possible at (r,k), and write (r′,k′) ∼i (r,k), if pi has the same local state at both points;

---
⁴ In Section 5 we define a logical language for describing such facts. Formally, a fact is the interpretation of a formula in such a language. See [HM90] for a complete formal treatment of the syntax and semantics of such a language.
---

that is, if ri(k) = r′i(k′). We use Ki(r,k) to denote {(r′,k′) | (r′,k′) ∼i (r,k)}, the set of points agent pi considers possible at (r,k). Following [HM90] (and many other papers since then), we say pi knows φ at (r,k) if φ is true at all points pi considers possible at (r,k). This means pi knows φ at (r,k) if φ is guaranteed to hold given the information
recorded in pi's local state at (r,k). More formally, we denote the fact that pi knows φ at (r,k) by (r,k) |= Kiφ, and define (r,k) |= Kiφ iff (r′,k′) |= φ for all (r′,k′) ∈ Ki(r,k). While this definition of knowledge depends heavily on the system R (it restricts the set of points an agent considers possible at a given point), the system will always be clear from context and we omit explicit reference to R in our notation.

3 Probability on runs

In order to discuss the probability of events in a distributed system, we must specify a probability space. In this section we show that in order to place a reasonable probability distribution on the runs of a system, it is necessary to postulate the existence of the first type of adversary sketched in the introduction. Consider the simple system consisting of a single agent that tosses a fair coin once and halts. This system consists of two runs, one in which the coin comes up heads and one in which the coin comes up tails. The coin toss induces a very natural distribution on the two runs: each is assigned probability 1/2. Now consider the system (suggested by Moshe Vardi; a variant appears in [FZ88a]) consisting of two agents, p1 and p2, where p1 has an input bit and two coins, one fair coin landing heads with probability 1/2 and one biased coin landing heads with probability 2/3. If the input bit is 0, p1 tosses the fair coin once and halts. If the input bit is 1, p1 tosses the biased coin and halts. This system
consists of four runs of the form ⟨b,c⟩, where b is the value of the input bit and c is the outcome of the coin toss. What is the appropriate probability distribution on the runs of this system? For example, what is the probability of heads? Clearly the conditional probability of heads given that the input bit is 0 should be 1/2, while the conditional probability of heads given the input bit is 1 should be 2/3. But what is the unconditional probability of heads? If we are given a distribution on the inputs, then it is easy to answer this question. If we assume, for example, that 0 and 1 are equally likely as input values, then we can compute that the probability of heads is 1/2 · 1/2 + 1/2 · 2/3 = 7/12. If we are not given a distribution on the inputs, then the question has no obvious answer. It is tempting to assume, therefore, that such a distribution exists. Often, however, assuming a particular fixed distribution on inputs leads to results about a system that are simply too weak to be of any use. Knowing an algorithm produces the correct answer in 0.99 of its runs when all inputs are equally likely is of no use when the algorithm is used in the context of a different distribution on the inputs. To overcome this problem, one might be willing to assume the existence of some fixed but unknown distribution on the inputs. Proving that an algorithm produces the correct answer in 0.99 of the runs in the context of an unknown distribution, however, is no easier than proving that for each fixed input the algorithm is correct in 0.99 of the runs, since it is always possible for the unknown
distribution to place all the probability on the input for which the algorithm performs particularly poorly. Here the advantage of viewing the system as a single probability space is lost, since this is precisely the proof technique one would use when no distribution is assumed in the first place. Moreover, assuming the existence of some unknown distribution on the inputs simply moves all problems arising from nondeterminism up one level. Although we have a distribution on the space of input values, we have no distribution on the space of probability distributions. This discussion leads us to conclude that some choices in a distributed system must be viewed as inherently nondeterministic (or, perhaps better, nonprobabilistic), and that it is inappropriate, both philosophically and pragmatically, to model probabilistically what is inherently nondeterministic. But then how can we reason probabilistically about a system involving both nondeterministic and probabilistic choices? Our solution—which is essentially a formalization of the standard approach taken in the literature—is to factor out initial nondeterministic events, and view the system as a collection of subsystems, each with its natural probability distribution. In the coin tossing example above, we would consider two probability spaces, one corresponding to the input bit being 0 and the other corresponding to the input bit being 1.⁵ The probability of heads is 1/2 in the first space and 2/3 in the second. We want to stress that although this example may seem artificial, analogous examples frequently arise in the literature. In a probabilistic primality-testing algorithm [Rab80, SS77], for
example, we do not want to assume a probability distribution on the inputs. We want to know that for each choice of input, the algorithm gives the right answer with high probability. Rabin's primality-testing algorithm [Rab80] is based on the existence of a polynomial-time computable predicate Pn(a) with the following properties: (1) if n is composite, then at least 3/4 of the a ∈ {1, ..., n−1} cause Pn(a) to be true, and (2) if n is prime, then no such a causes Pn(a) to be true. Rabin's algorithm generates a polynomial number of a's at random. If Pn(a) is true for any of the a's generated, then the algorithm outputs "composite"; otherwise it outputs "prime". Property (2) guarantees that if the algorithm outputs "composite", then n is definitely composite. If the algorithm outputs "prime", then there is a chance that n is not prime, but property (1) guarantees that this is very rarely the case: if n is indeed composite, then with high probability the algorithm outputs "composite". If the algorithm outputs "prime", therefore, it might seem natural to say that n is prime with high probability; but, of course, this is not

---
⁵ Often, even in the presence of nondeterminism, we can impose a meaningful distribution on the runs of a system without factoring the system into subsystems, but the resulting distribution still may not capture all of our intuition. The problem in the preceding example is that probabilistic events (the coin toss) depend on nonprobabilistic events
(the input bit). Suppose, however, the agent tosses a fair coin regardless of the input bit's value. Now it is natural to assign probability 1/2 to each of the events {(1,h),(0,h)} and {(1,t),(0,t)}, that the coin lands heads and tails, respectively. Consider, however, the situation (discussed in [FH88, HMT88]) where an agent performs a given action a iff the input bit is 1 and the coin landed heads, or the input bit is 0 and the coin landed tails. It is natural to argue that the probability the agent performs the action a is also 1/2: if the input bit is 1 then with probability 1/2 the coin will land heads and a will be performed; and if the input bit is 0 then with probability 1/2 the coin will land tails and a will be performed. Unfortunately, our "natural" distribution on the runs of the system does not support this line of reasoning, since this distribution does not assign a probability to the set {(1,h),(0,t)} corresponding to the performance of a. In fact, if we could assign a probability to this set, then we would have to consider it a measurable set. Using this information, we could then prove that the sets {(1,h),(1,t)} and {(0,h),(0,t)} are also measurable sets. This means that we would have to assign a probability to having the input bit set to 0 or 1, but the setting of the input bit was assumed to be nondeterministic! Again,
however, if we factor out this initial nondeterminism, we can view the system as two subsystems with obvious associated probability distributions, and within each subsystem the action a is performed with probability 1/2. This is precisely what the reasoning underlying our intuition is implicitly doing.
---

Figure 1: A (labeled) computation tree.

quite right. The input n is either prime or it is not; it does not make sense to say that it is prime with high probability. On the other hand, it does make sense to say that the algorithm gives the correct answer with high probability. The natural way to make this statement precise is to partition the runs of the algorithm into a collection of subsystems, one for each possible input, and prove that the algorithm gives the right answer with high probability in each of these subsystems, where the probability on the runs in each subsystem is generated by the random choices for a. While for a fixed composite input n there may be a few runs where the algorithm incorrectly outputs "prime", in almost all runs it will give the correct output. In many contexts of interest, the choice of input is not the only source of nondeterminism in the system. Later nondeterministic choices may also be made throughout a run. In asynchronous distributed systems, for example, it is common to view the choice of the next processor to take a step or the next message to be
delivered as a nondeterministic choice. Similar arguments to those made above can be used to show that we need to factor out these nondeterministic choices in order to use the probabilistic choices (coin tosses) to place a well-defined probability on the set of runs. A common technique for factoring out these nondeterministic choices is to assume the existence of a scheduler deterministically choosing (as a function of the history of the system up to that point) the next processor to take a step (cf. [Rab82, Var85]). It is standard practice to fix some class of schedulers, perhaps the class of "fair" schedulers or "polynomial-time" schedulers, and argue that for every scheduler in this class the system satisfies some condition.

As we now show, if we view all nondeterministic choices as under the control of some adversary taken from some class of adversaries, then there is a straightforward way to view the set of runs of a system as a collection of probability spaces, one for each adversary. By fixing an adversary we factor out the nondeterministic choices and are left with a purely probabilistic system, with the obvious distribution on the runs determined by the probabilistic choices made during the runs. This is essentially the approach taken in [FZ88a]. Once we fix an adversary A, we can view the runs of the system with this adversary as a (labeled) computation tree TA (see Figure 1). As in Section 2, nodes of the tree are global states and paths in the tree are runs. Now, however, edges of the tree are labeled with positive real numbers such that for every node the values labeling the node's outgoing edges
sum to 1. Intuitively, the value labeling an outgoing edge of a node s represents the probability that the system makes the corresponding transition from node s. Given a finite path in the tree, the probability of the set of runs extending this finite path is simply the product of the probabilities labeling the edges in this finite path.

It is natural to view this computation tree TA as a probability space, a tuple (RA, XA, μA) where RA is the set of runs in TA, XA consists of the subsets of RA that are measurable (that is, the ones to which a probability can be assigned; these are generated by starting with sets of runs with a common finite prefix and closing under countable union and complementation), and μA is a probability function defined on the sets in XA so that the probability of the set of all runs with a common prefix is the product of the probabilities labeling the edges of the prefix. If we restrict attention to finite runs (as is done in [FZ88a]), then it is easy to see that each individual run is measurable, so that XA consists of all possible subsets of RA. Moreover, in the case of finite runs, the probability of a run is just the product of the transition probabilities along the edges of the run. It is occasionally useful to view this computation tree TA as consisting of two components: the tree structure (that is, the unlabeled graph itself) and the assignment of transition probabilities to the edges of the tree. Given an unlabeled tree TA, we define a transition probability assignment for TA to be a mapping π assigning transition probabilities to the edges of TA.
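As a small sketch of this construction (the tree below is a toy example of my own, modeling two fair coin tosses): represent the labeled tree for a fixed adversary as a map from edges to transition probabilities, and compute the probability of the set of runs extending a finite path as the product of the labels along it.

```python
from fractions import Fraction

# Toy labeled computation tree (my example): nodes are named by the history
# of outcomes; each edge carries a transition probability, and the labels on
# every node's outgoing edges sum to 1.
tree = {
    ('', 'h'): Fraction(1, 2), ('', 't'): Fraction(1, 2),      # first toss
    ('h', 'hh'): Fraction(1, 2), ('h', 'ht'): Fraction(1, 2),  # second toss
    ('t', 'th'): Fraction(1, 2), ('t', 'tt'): Fraction(1, 2),
}

def prefix_probability(path):
    """Probability of the set of runs extending the finite path:
    the product of the transition probabilities labeling its edges."""
    p = Fraction(1)
    for edge in zip(path, path[1:]):
        p *= tree[edge]
    return p

assert prefix_probability(['', 'h']) == Fraction(1, 2)
assert prefix_probability(['', 't', 'tt']) == Fraction(1, 4)
```

For finite runs a maximal path is a run, and its probability is the product of all its edge labels, as noted above.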
We will use the notation TA at times to refer to the unlabeled tree, to the labeled tree, and to the induced probability space; which is meant should be clear from context.

We define a probabilistic system to consist of a collection of labeled computation trees (which we view as separate probability spaces), one for each adversary A in some set A. We assume that the environment component in each global state in TA encodes the adversary A and the entire past history of the run. This technical assumption ensures that different nodes in the same computation tree have different global states, and that we cannot have the same global state in two different computation trees. Given a point c, we denote the computation tree containing c by T(c). Our technical assumption guarantees that T(c) is well-defined.

The choice of the appropriate set A of adversaries against which the system runs is typically made by the system designer when specifying correctness conditions for the system. An adversary might be limited to choosing the initial input of the agents (in which case the set of possible adversaries would correspond to the set of possible inputs), as is the case in the context of primality-testing algorithms in which an agent receives a single number (the number to be tested) as input. On the other hand, an adversary may also determine the order in which agents are allowed to take steps, the order in which messages arrive, or the order in which processors fail. One might also wish to restrict the computational power of the adversary to polynomial time. What is the most appropriate set of powers to assign to the adversary? It depends on the application.

5 Probability at a point

In the preceding section, we showed how to use the notion of an adversary to impose a meaningful probability distribution on the runs of a
system. In order to define an agent's probabilistic knowledge at a given point, however, we seem to require a probability distribution on the points of the system, and not the runs. An agent's probability distribution on points must certainly be related to the distribution on runs if it is to be at all meaningful. Nevertheless, the two distributions may be quite different. For example, just as an agent's knowledge varies over time, we would expect the probability an agent assigns to an event to vary over time. In contrast, the distribution over runs can be viewed as a static distribution. Furthermore, depending on which of the two distributions we use, we can be led to quite different analyses of a protocol. The purpose of this section is to make clear the distinction between distributions over runs and distributions over points.

To begin, consider the Coordinated Attack problem [Gra78]. Two generals A and B must decide whether to attack a common enemy, but we require that any attack be a coordinated attack; that is, A attacks iff B attacks. Unfortunately, they can communicate only by messengers who may be captured by the enemy. It is known that it is impossible for the generals to coordinate an attack under such conditions [Gra78, HM90]. Suppose we relax this condition, however, and require only that the generals coordinate their attack with high probability [FH88, FZ88a]. To eliminate all nondeterminism, let us assume general A tosses a fair coin to determine whether to attack, and let us assume the probability a messenger is lost to the enemy is 1/2. Let us assume further that the system is completely
synchronous. Our new correctness condition is that the condition "A attacks iff B attacks" holds with probability 0.99.

Consider the following two-step solution CA1 to the problem. At round 0, A tosses a coin and sends 10 messengers to B iff the coin landed heads. At round 1, B sends a messenger to tell A whether it has learned the outcome of the coin toss. At round 2, A attacks iff the coin landed heads (regardless of what it hears from B), and B attacks iff at round 1 it learned that the coin landed heads. It is not hard to see that if we put the natural probability space on the set of runs, then with probability at least 0.99 (taken over the runs) A attacks iff B attacks: if the coin lands tails, then neither attacks; and if the coin lands heads, then with probability at least 0.99 at least one of the ten messengers sent from A to B at round 0 avoids capture and both generals attack. This is very different, however, from saying that at all times both generals know that with probability at least 0.99 the attack will be coordinated. To see this, consider the state just before attacking in which A has decided to attack but has received a message from B saying that B has not learned the outcome of the coin toss. At this point, A is certain the attack will not be coordinated. Although we have not yet given a formal definition of how to compute an agent's probability at a given point, it seems unreasonable for an agent to believe with high probability that an event will occur when information available to the agent guarantees it will not occur.
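The run-distribution claim for CA1 can be checked by direct arithmetic (a sketch of the calculation in the text; the variable names are mine):

```python
from fractions import Fraction

half = Fraction(1, 2)

# The attack is uncoordinated only when A's coin lands heads (so A attacks)
# and all 10 messengers sent at round 0 are captured (so B does not attack).
p_uncoordinated = half * half**10   # = (1/2)^11
p_coordinated = 1 - p_uncoordinated

assert p_coordinated == Fraction(2047, 2048)
assert p_coordinated >= Fraction(99, 100)  # the 0.99 requirement is met
```

The guarantee is a statement about the distribution on runs; as the discussion above shows, it need not hold at every point of every run.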
On the other hand, consider the solution CA2, differing from the preceding one only in that B does not try to send a messenger to A at round 1 informing A about whether B has learned the outcome of the coin toss. An easy argument shows that in this protocol, at all times both generals have confidence (in some sense of the word) at least 0.99 that the attack will be coordinated. Consider B, for example, after having failed to receive a message from A. B reasons that either A's coin landed tails and neither general will attack, which would happen with probability 1/2, or A's coin landed heads and all messengers were lost, which would happen with probability (1/2)^11; and hence the conditional probability that the attack will be coordinated given that B received no messages from A is at least 0.99.

As the preceding discussion shows, in a protocol which has a certain property P with high probability taken over the runs, an agent may still find itself in a state where it knows perfectly well that P does not (and will not) hold. While correctness conditions P for problems arising in computer science have typically been stated in terms of a probability distribution on the runs, it might be of interest to consider protocols where an agent knows P with high probability at