THE BENEFITS OF DUALITY IN VERIFYING CONCURRENT PROGRAMS UNDER TSO∗ PAROSH AZIZ ABDULLAa, MOHAMED FAOUZI ATIGb, AHMED BOUAJJANIc, AND TUAN PHONG NGOd 7 a,b,d Uppsala University, Sweden 1 e-mail address: parosh,mohamed faouzi.atig,[email protected] 0 2 n c IRIF Universit´e Paris Diderot - Paris 7, France a e-mail address: [email protected] J 1 3 Abstract. We address the problem of verifying safety properties of concurrent programs running over the Total Store Order (TSO) memory model. Known decision procedures ] L for this model are based on complex encodings of store buffers as lossy channels. These F procedures assume that the number of processes is fixed. However, it is important in . general to prove correctness of a system/algorithm in a parametric way with an arbitrarily s c large number of processes. [ Inthispaper,weintroduceanalternative(yetequivalent)semanticstotheclassicalone for the TSO semantics that is more amenable for efficient algorithmic verification and for 2 extension to parametric verification. For that, we adopt a dual view where load buffers v are used instead of store buffers. The flow of information is now from the memory to 2 load buffers. We show that this new semantics allows (1) to simplify drastically the safety 8 6 analysis under TSO, (2) to obtain a spectacular gain in efficiency and scalability compared 8 to existing procedures, and (3) to extend easily the decision procedure to the parametric 0 case, which allows to obtain a new decidability result, and more importantly, a verification . algorithm that is more general and more efficient in practice than the one for bounded 1 instances. 0 7 1 : v 2012 ACM CCS: [Software and its engineering]: Software organization and properties—Software i functional properties—Formal methods—Software verification. X Key words and phrases: Total Store Order, Weak Memory Models, Reachability Problem, Parameterized r a Systems, Well-quasi Order Framework. ∗ A preliminary version of this paper appeared as [AABN16] at CONCUR’16. This work was supported in part by the Swedish Research Council and carried out within the Linnaeus centre of excellence UPMARC, Uppsala Programming for Multicore Architectures Research Center. LOGICALMETHODS (cid:13)c P.A.Abdulla,M.F.Atig,A.Bouajjani,andT.P.Ngo INCOMPUTERSCIENCE DOI:10.2168/LMCS-??? CreativeCommons 1 2 P.A.ABDULLA,M.F.ATIG,A.BOUAJJANI,ANDT.P.NGO 1. Introduction Most modern processor architectures execute instructions in an out-of-order manner to gain efficiency. In the context of sequential programming, this out-of-order execution is transparent to the programmer since one can still work under the Sequential Consistency (SC) model [Lam79]. However, this is not true when we consider concurrent processes that share the memory. In fact, it turns out that concurrent algorithms such as mutual exclusion and producer-consumer protocols may not behave correctly any more. Therefore, program verification is a relevant (and difficult) task in order to prove correctness under the new semantics. The inadequacy of the interleaving semantics has led to the invention of new program semantics, so called Weak (or relaxed) Memory Models (WMM), by allowing permutationsbetweencertaintypesofmemoryoperations[AG96,DSB86,AH90]. TotalStore Ordering (TSO) is one of the the most common models, and it corresponds to the relaxation adopted by Sun’s SPARC multiprocessors [WG94] and formalizations of the x86-tso memory model [OSS09, SSO+10]. These models put an unbounded perfect (non-lossy) store buffer between each process and the main memory where a store buffer carries the pending store operationsoftheprocess. Whenaprocessperformsastoreoperation,itappendsittotheend of its buffer. These operations are propagated to the shared memory non-deterministically in a FIFO manner. When a process reads a variable, it searches its buffer for a pending store operation on that variable. If no such a store operation exists, it fetches the value of the variable from the main memory. Verifying programs running on the TSO memory model poses a difficult challenge since the unboundedness of the buffers implies that the state space of the system is infinite even in the case where the input program is finite-state. Decidability of safety properties has been obtained by constructing equivalent models that replace the perfect store buffer by lossy channels [ABBM10, ABBM12, AAC+12a]. However, these constructions are complicated and involve several ingredients that lead to inefficient verification procedures. For instance, they require each message inside a lossy channel to carry (instead of a single store operation) a full snapshot of the memory representing a local view of the memory contents by the process. Furthermore, the reductions involve non-deterministicguessingthelossychannelcontents. Theguessingisthenresolvedeitherby consistency checking [ABBM10] or by using explicit pointer variables (each corresponding to one process) inside the buffers [AAC+12a], causing a serious state space explosion problem. In this paper, we introduce a novel semantics which we call the Dual TSO semantics. Our aim is to provide an alternative (and equivalent) semantics that is more amenable for efficient algorithmic verification. The main idea is to have load buffers that contain pending load operations (more precisely, values that will potentially be taken by forthcoming load operations) rather than store buffers (that contain store operations). The flow of information will now be in the reverse direction, i.e., store operations are performed by the processes atomically on the main memory, while values of variables are propagated non-deterministically from the memory to the load buffers of the processes. When a process performs a load operation it can fetch the value of the variable from the head of its load buffer. We show that the dual semantics is equivalent to the original one in the sense that any given set of processes will reach the same set of local states under both semantics. The dual semantics allows us to understand the TSO model in a totally different way compared to the classical semantics. Furthermore, the dual semantics offers several important advantages fromthepointofviewofformalreasoningandprogramverification. First, thedualsemantics allows transforming the load buffers to lossy channels without adding the costly overhead THE BENEFITS OF DUALITY IN VERIFYING CONCURRENT PROGRAMS UNDER TSO 3 that was necessary in the case of store buffers. This means that we can apply the theory of well-structured systems [Abd10, ACJT96, FS01] in a straightforward manner leading to a much simpler proof of decidability of safety properties. Second, the absence of extra overhead means that we obtain more efficient algorithms and better scalability (as shown by our experimental results). Finally, the dual semantics allows extending the framework to perform parameterized verification which is an important paradigm in concurrent program verification. Here, we consider systems, e.g., mutual exclusion protocols, that consist of an arbitrary number of processes. The aim of parameterized verification is to prove correctness of the system regardless of the number of processes. It is not obvious how to perform parameterized verification under the classical semantics. For instance, extending the framework of [AAC+12a], would involve an unbounded number of pointer variables, thus leading to channel systems with unbounded message alphabets. In contrast, as we show in this paper, the simple nature of the dual semantics allows a straightforward extension of our verification algorithm to the case of parameterized verification. This is the first time a decidability result is established for the parametrized verification of programs running over WMM. Notice that this result is taking into account two sources of infinity: the number of processes and the size of the buffers. Based on our framework, we have implemented a tool and applied it to a large set of benchmarks. The experiments demonstrate the efficiency of the dual semantics compared to the classical one (by two order of magnitude in average), and the feasibility of parametrized verification in the former case. In fact, besides its theoretical generality, parametrized verification is practically crucial in this setting: as our experiments show, it is much more efficient than verification of bounded-size instances (starting from a number of components of 3 or 4), especially concerning memory consumption (which is the critical resource). Related Work. There have been a lot of works related to the analysis of programs running underWMM(e.g.,[LNP+12,KVY10,KVY11,DMVY13,AAC+12a,BM08,BSS11,BDM13, BAM07, YGLS04, AALN15, AAC+12b, AAJL16, DMVY17, TW16, LV16, LV15, Vaf15, HVQF16]). Some of these works propose precise analysis techniques for checking safety propertiesorstabilityoffinite-stateprogramsunderWMM(e.g.,[AAC+12a,BDM13,DM14, AAP15, AALN15]). Others propose context-bounded analyzing techniques (e.g., [ABP11, TLI+16, TLF+16]) or stateless model-checking techniques (e.g., [AAA+15, ZKW15, DL15, HH16]) for programs under TSO and PSO. Different other techniques based on monitoring and testing have also been developed during these last years (e.g., [BM08, BSS11, LNP+12]). There are also a number of efforts to design bounded model checking techniques for programs under WMM (e.g., [AKNT13, AKT13, YGLS04, BAM07]) which encode the verification problem in SAT/SMT. The closest works to ours are those presented in [AAC+12a, ABBM10, AAC+13, ABBM12] which provide precise and sound techniques for checking safety properties for finite-state programs running under TSO. However, as stated in the introduction, these techniques are complicated and can not be extended, in a straightforward manner, to the verification of parameterized systems (as it is the case of the developed techniques for the Dual TSO semantics). In Section 7, we experimentally compare our techniques with Memorax [AAC+12a, AAC+13]whichistheonlypreciseandsoundtoolforcheckingsafetypropertiesforconcurrent programs under TSO. 4 P.A.ABDULLA,M.F.ATIG,A.BOUAJJANI,ANDT.P.NGO 2. Preliminaries Let Σ be a finite alphabet. We use Σ∗ (resp. Σ+) to denote the set of all words (resp. non-empty words) over Σ. Let (cid:15) be the empty word. The length of a word w ∈ Σ∗ is denoted by |w| (and in particular |(cid:15)| = 0). For every i : 1 ≤ i ≤ |w|, let w(i) be the symbol at position i in w. For a ∈ Σ, we write a ∈ w if a appears in w, i.e., a = w(i) for some i : 1 ≤ i ≤ |w|. Given two words u and v over Σ, we use u (cid:22) v to denote that u is a (not necessarily contiguous) subword of v, i.e., if there is an injection h : {1,...,|u|} (cid:55)→ {1,...,|v|} such that: (1) h(i) < h(j) for all i < j and (2) for every i ∈ {1,...,|u|}, we have u(i) = v(h(i)). Given a subset Σ(cid:48) ⊆ Σ and a word w ∈ Σ∗, we use w| to denote the projection of w Σ(cid:48) over Σ(cid:48), i.e., the word obtained from w by erasing all the symbols that are not in Σ(cid:48). Let A and B be two sets and let f : A (cid:55)→ B be a total function from A to B. We use f[a ←(cid:45) b] to denote the function g such that g(a) = b and g(x) = f(x) for all x (cid:54)= a. (cid:16) (cid:17) a A transition system T is a tuple C,Init,Act,∪ −→ where C is a (potentially a∈Act infinite) set of configurations; Init ⊆ C is a set of initial configurations; Act is a set of actions; and for every a ∈ Act, −→a ⊆ C×C is a transition relation. We use c −→a c(cid:48) to denote that (c,c(cid:48)) ∈−→a . Let →−= ∪ −→a . a∈Act Arunπ ofT isoftheformc −a→1 c −a→2 ··· −a→n c wherec −a−i−+→1 c foralli : 0 ≤ i < n. 0 1 n i i+1 π Then, we write c −→ c . We use target(π) to denote the configuration c . The run π 0 n n is said to be a computation if c ∈ Init. Two runs π = c −a→1 c −a→2 ··· −a−m→ c and 0 1 0 1 m π = c −a−m−+→2 c −a−m−+→3 ··· −a→n c are compatible if c = c . Then, we write π •π 2 m+1 m+2 n m m+1 1 2 to denote the run π = c −a→1 c −a→2 ··· −a−m→ c −a−m−+→2 c −a−m−+→3 ··· −a→n c 0 1 m m+2 n For two configurations c and c(cid:48), we use c →−∗ c(cid:48) to denote that c −→π c(cid:48) for some run π. A ∗ configuration c is said to be reachable in T if c →− c for some c ∈ Init, and a set C of 0 0 configurations is said to be reachable in T if some c ∈ C is reachable in T. 3. Concurrent Systems In this section, we define the syntax we use for concurrent programs, a model for representing communicating concurrent processes. Communication between processes is performed through a shared memory that consists of a finite number of shared variables (over finite domains) to which all processes can read and write. Then we recall the classical TSO semantics including the transition system it induces and its reachability problem. Next, we introduce the Dual TSO semantics and its induced transition system. Finally, we state the equivalence between the two semantics; i.e., for a given concurrent program, we can reduce its reachability problem under the classical TSO to its reachability problem under Dual TSO and vice-versa. 3.1. Syntax. Let V be a finite data domain and X be a finite set of variables. We assume w.l.o.g. that V contains the value 0. Let Ω(X,V) be the smallest set of memory operations that contains with x ∈ X, and v,v(cid:48) ∈ V: (1) “no” operation nop, (2) read operation r(x,v), (3) write operation w(x,v), THE BENEFITS OF DUALITY IN VERIFYING CONCURRENT PROGRAMS UNDER TSO 5 (4) fence operation fence, and (5) atomic read-write operation arw(x,v,v(cid:48)). A concurrent system is a tuple P= (A ,A ,...,A ) where for every p : 1 ≤ p ≤ n, A 1 2 n p is a finite-state automaton describing the behavior of the process p. The automaton A is p defined as a triple (cid:0)Q ,qinit,∆ (cid:1) where Q is a finite set of local states, qinit ∈ Q is the p p p p p p initial local state, and ∆ ⊆ Q ×Ω(X,V)×Q is a finite set of transitions. We define p p p P := {1,...,n} to be the set of process IDs, Q := ∪ Q to be the set of all local states p∈P p and ∆ := ∪ ∆ to be the set of all transitions. p∈P p 3.2. Classical TSO Semantics. In the following, we recall the semantics of concurrent systems under the classical TSO model as formalized in [OSS09, SSO+10]. To do that, we define the set of configurations and the induced transition relation. Let P= (A ,A ,...,A ) 1 2 n be a concurrent system. TSO-configurations. A TSO-configuration c is a triple (q,b,mem) where: (1) q : P (cid:55)→ Q is the global state of P mapping each process p ∈ P to a local state in Q p (i.e., q(p) ∈ Q ). p (2) b : P (cid:55)→ (X ×V)∗ gives the content of the store buffer of each process. (3) mem : X (cid:55)→ V defines the value of each shared variable. Observe that the store buffer of each process contains a sequence of write operations, where each write operation is defined by a pair, namely a variable x and a value v that is assigned to x. The initial TSO-configuration c is defined by the tuple (q ,b ,mem ) where, init init init init for all p ∈ P and x ∈ X, we have that q (p) = qinit, b (p) = (cid:15) and mem (x) = 0. In init p init init other words, each process is in its initial local state, all the buffers are empty, and all the variables in the shared memory are initialized to 0. We use C to denote the set of TSO-configurations. TSO TSO-transition Relation. The transition relation →− between TSO-configurations is TSO given by a set of rules, described in Figure 1. Here, we informally explain these rules. A nop transition (q,nop,q(cid:48)) ∈ ∆ changes only the local state of the process p from q to q(cid:48). p A write transition (q,w(x,v),q(cid:48)) ∈ ∆ adds a new message (x,v) to the tail (i.e., the left p most) of the store buffer of the process p. A memory update transition update can be p performed at any time by removing the (oldest) message at the head (i.e., the right most) of the store buffer of the process p and updating the memory accordingly. For a read transition (q,r(x,v),q(cid:48)) ∈ ∆ , if the store buffer of the process p contains some write operations to p x, then the read value v must correspond to the value of the the most recent (i.e., the left most) such a write operation. Otherwise the value v of x is fetched from the memory. A fence transition (q,fence,q(cid:48)) ∈ ∆ may be performed by the process p only if its store buffer p is empty. Finally, an atomic read-write transition (q,arw(x,v,v(cid:48)),q(cid:48)) ∈ ∆ can be performed p by the process p only if its store buffer is empty. This transition checks then whether the value of x is v and then changes it to v(cid:48). Let ∆(cid:48) = (cid:8)update | p ∈ P(cid:9), i.e. ∆(cid:48) contains all memory update transitions. We use p c →− c(cid:48) to denote that c →−t c(cid:48) for some t ∈ ∆∪∆(cid:48). The transition system induced by TSO TSO P under the classical TSO semantics is then given by T = (C ,{c },∆∪∆(cid:48),→− ). TSO TSO init TSO 6 P.A.ABDULLA,M.F.ATIG,A.BOUAJJANI,ANDT.P.NGO t = (q,nop,q(cid:48)) q(p) = q Nop (q,b,mem) →−t (q[p ←(cid:45) q(cid:48)],b,mem) TSO t = (q,w(x,v),q(cid:48)) q(p) = q Write (q,b,mem) →−t (q[p ←(cid:45) q(cid:48)],b[p ←(cid:45) (x,v)·b(p)],mem) TSO t = update p Update t (q,b[p ←(cid:45) b(p)·(x,v)],mem) →− (q,b,mem[x ←(cid:45) v]) TSO t = (q,r(x,v),q(cid:48)) q(p) = q b(p)| = (x,v)·w {x}×V Read-Own-Write (q,b,mem) →−t (q[p ←(cid:45) q(cid:48)],b,mem) TSO t = (q,r(x,v),q(cid:48)) q(p) = q b(p)| = (cid:15) mem(x) = v {x}×V Read from Memory (q,b,mem) →−t (q[p ←(cid:45) q(cid:48)],b,mem) TSO t = (q,arw(x,v,v(cid:48)),q(cid:48)) q(p) = q b(p) = (cid:15) mem(x) = v ARW (q,b,mem) →−t (q[p ←(cid:45) q(cid:48)],b,mem[x ←(cid:45) v(cid:48)]) TSO t = (q,fence,q(cid:48)) q(p) = q b(p) = (cid:15) Fence (q,b,mem) →−t (q[p ←(cid:45) q(cid:48)],b,mem) TSO Figure 1. The transition relation →− under TSO. Here process p ∈ P TSO (cid:8) (cid:9) and transition t ∈ ∆ ∪ update where update is a transition that updates p p p the memory using the oldest message in the buffer of the process p. The TSO Reachability Problem. A global state q is said to be reachable in target T if and only if there is a TSO-configuration c of the form (q ,b,mem), with TSO target b(p) = (cid:15) for all p ∈ P, such that c is reachable in T . TSO The reachability problem for the concurrent system P under the TSO semantics asks, for a given global state q , whether q is reachable in T . Observe that, in the target target TSO definition of the reachability problem, we require that the buffers of the configuration c must be empty instead of being arbitrary. This is only for the sake of simplicity and does not constitute a restriction. Indeed, we can easily show that the “arbitrary buffer” reachability problem is reducible to the “empty buffer” reachability problem. 3.3. Dual TSO Semantics. In this section, we define the Dual TSO semantics. The model has a perfect FIFO load buffer between the main memory and each process. This load buffer is used to store potential read operations that will be performed by the process. Each message in the load buffer of a process p is either a pair of the form (x,v) or a triple of the form (x,v,own) where x ∈ X and v ∈ V. A message of the form (x,v) corresponds to the fact that x has had the value v in the shared memory. While a message (x,v,own) corresponds to the fact that the process p has written the value v to x. A write operation w(x,v) of the process p immediately updates the shared memory and then appends a new message of the form (x,v,own) to the tail (i.e., the left most) of the load buffer of p. Read propagation is then performed by non-deterministically choosing THE BENEFITS OF DUALITY IN VERIFYING CONCURRENT PROGRAMS UNDER TSO 7 t = (q,nop,q(cid:48)) q(p) = q Nop (q,b,mem) →−t (q[p ←(cid:45) q(cid:48)],b,mem) DTSO t = (q,w(x,v),q(cid:48)) q(p) = q Write (q,b,mem) →−t (q[p ←(cid:45) q(cid:48)],b[p ←(cid:45) (x,v,own)·b(p)],mem[x ←(cid:45) v]) DTSO t = propagatex mem(x) = v p Propagate t (q,b,mem) →− (q,b[p ←(cid:45) (x,v)·b(p)],mem) DTSO t = delete |m| = 1 p Delete t (q,b[p ←(cid:45) b(p)·m],mem) →− (q,b,mem) DTSO t = (q,r(x,v),q(cid:48)) q(p) = q b(p)| = (x,v,own)·w {x}×V×{own} Read-Own-Write (q,b,mem) →−t (q[p ←(cid:45) q(cid:48)],b,mem) DTSO t = (q,r(x,v),q(cid:48)) q(p) = q b(p)| = (cid:15) b(p) = w·(x,v) {x}×V×{own} Read from Buffer (q,b,mem) →−t (q[p ←(cid:45) q(cid:48)],b,mem) DTSO t = (q,arw(x,v,v(cid:48)),q(cid:48)) q(p) = q b(p) = (cid:15) mem(x) = v ARW (q,b,mem) →−t (q[p ←(cid:45) q(cid:48)],b,mem[x ←(cid:45) v(cid:48)]) DTSO t = (q,fence,q(cid:48)) q(p) = q b(p) = (cid:15) Fence (q,b,mem) →−t (q[p ←(cid:45) q(cid:48)],b,mem) DTSO Figure 2. The induced transition relation →− under the Dual TSO DTSO semantics. Here process p ∈ P and transition t ∈ ∆ ∪ ∆(cid:48) where ∆(cid:48) = p p p (cid:8)propagatex,delete | x ∈ X(cid:9). p p a variable (let’s say x and its value is v in the shared memory) and appending the new message (x,v) to the tail of the load buffer of p. This propagation operation speculates on a read operation of p on x that will be performed later on. Moreover, delete operation of the process p can be performed at any time by removing the (oldest) message at the head (i.e., the right most) of the load buffer of p. A read operation r(x,v) of the process p can be executed if the message at the head of the load buffer of p is of the form (x,v) and there is no pending message of the form (x,v(cid:48),own). In the case that the load buffer contains some messages belonging to p (i.e., of the form (x,v(cid:48),own)), the read value must correspond to the value of the most recent (i.e., the left most) message belonging to p. Implicitly, this allows to simulate the Read-Own-Write transitions in the TSO semantics. A fence operation means that the load buffer of p must be empty before p can continue. Finally, an atomic read-write operation arw(x,v,v(cid:48)) means that the load buffer of p must be empty and the value of the variable x in the memory is v before p can continue. DTSO-configurations. A DTSO-configuration c is a triple (q,b,mem) where: (1) q : P (cid:55)→ Q is the global state of P. (2) b : P (cid:55)→ ((X ×V)∪(X ×V ×{own}))∗ is the content of the load buffer of each process. 8 P.A.ABDULLA,M.F.ATIG,A.BOUAJJANI,ANDT.P.NGO (3) mem : X (cid:55)→ V gives the value of each shared variable. The initial DTSO-configuration cD is defined by (q ,b ,mem ) where, for all init init init init p ∈ P and x ∈ X, we have that q (p) = qinit, b (p) = (cid:15) and mem (x) = 0. init p init init We use C to denote the set of DTSO-configurations. DTSO DTSO-transition Relation. Thetransitionrelation→− betweenDTSO-configurations DTSO is given by a set of rules, described in Figure 2. This relation is induced by members of ∆∪∆ where ∆ := (cid:8)propagatex,delete | p ∈ P, x ∈ X(cid:9). aux aux p p We informally explain the transition relation rules. The propagate transition propagatex p speculates on a read operation of p over x that will be executed later. This is done by appending a new message (x,v) to the tail (i.e., the left most) of the load buffer of p where v is the current value of x in the shared memory. The delete transition delete removes the p (oldest) message at the head (i.e., the right most) of the load buffer of the process p. A write transition (q,w(x,v),q(cid:48)) ∈ ∆ updates the memory and appends a new message (x,v,own) p to the tail of the load buffer. A read transition (q,r(x,v),q(cid:48)) ∈ ∆ checks first if the load p buffer of p contains a message of the form (x,v(cid:48),own). In that case, the read value v should correspond to the value of the most recent (i.e., the left most) message of that form. If there is no such message on the variable x in the load buffer of p, then the value v of x is fetched from the message at the head of the load buffer of the process p. We use c →− c(cid:48) to denote that c →−t c(cid:48) for some t ∈ ∆ ∪ ∆ . The tran- DTSO DTSO aux sition system induced by P under the Dual TSO semantics is then given by T = DTSO (cid:0) (cid:1) C ,{cD },∆∪∆ ,→− . DTSO init aux DTSO The DTSO Reachability Problem. The DTSO reachability problem for P under the Dual TSO semantics is defined in a similar manner to the case of TSO. A global state q target is said to be reachable in T if and only if there is a DTSO-configuration c of the form DTSO (q ,b,mem), with b(p) = (cid:15) for all p ∈ P, such that c is reachable in T . Then, the target DTSO reachability problem consists in checking whether q is reachable in T . target DTSO 3.4. Relation between TSO and DTSO Reachability Problems. The following the- orem states the equivalence of the reachability problems under the TSO and Dual TSO semantics. Theorem 3.1 (TSO-DTSO reachability equivalence). A global state q is reachable in target T iff q is reachable in T . TSO target DTSO Proof. The proof of the theorem is given in Section 6.1. 4. The DTSO Reachability Problem In this section, we show the decidability of the DTSO reachability problem by making use of the framework of Well-Structured Transition Systems (Wsts) [ACJT96, FS01]. First, we briefly recall the framework of Wsts. Then, we instantiate it to show the decidability of the DTSO reachability problem. THE BENEFITS OF DUALITY IN VERIFYING CONCURRENT PROGRAMS UNDER TSO 9 (cid:16) (cid:17) a 4.1. Well-Structured Transition Systems. LetT = C,Init,Act,∪ −→ beatran- a∈Act sition system. Let (cid:118) be a well-quasi ordering on C. Recall that a well-quasi ordering on C is a binary relation over C that is reflexive and transitive; and for every infinite sequence (c ) of elements in C there exist i,j ∈ N such that i < j and c (cid:118) c . i i≥0 i j A set U ⊆ C is called upward closed if for every c ∈ U and c(cid:48) ∈ C with c (cid:118) c(cid:48), we have c(cid:48) ∈ U. It is known that every upward closed set U can be characterised by a finite minor set M ⊆ U such that: (i) for every c ∈ U, there is c(cid:48) ∈ M such that c(cid:48) (cid:118) c; and (ii) if c,c(cid:48) ∈ M and c (cid:118) c(cid:48), then c = c(cid:48). We use min to denote the function which for a given upward closed set U returns its minor set. Let D ⊆ C. The upward closure of D is defined as D ↑:= {c(cid:48) ∈ C| ∃c ∈ D with c (cid:118) c(cid:48)}. We (cid:110) (cid:111) a also define the set of predecessors of D as Pre (D) := c| ∃c ∈ D,a ∈ Act,c −→ c . For a T 1 1 finite set of configurations M ⊆ C, we use minpre(M) to denote min(Pre (M ↑)∪M ↑). T The transition relation →− is said to be monotonic wrt. the order (cid:118) if, given c ,c ,c ∈ C 1 2 3 where c →− c and c (cid:118) c , we can compute a configuration c ∈ C and a run π such that 1 2 1 3 4 π c −→ c and c (cid:118) c . The pair (T,(cid:118)) is called a monotonic transition system if →− is 3 4 2 4 monotonic wrt. (cid:118). Given a finite set of configurations M ⊆ C, the coverability problem of M in the monotonic transition system (T,(cid:118)) asks whether the set M ↑ is reachable in T; i.e. there exist two configurations c and c such that c ∈ M, c (cid:118) c , and c is reachable in T. 1 2 1 1 2 2 For the decidability of this problem, the following three conditions are sufficient: (1) For every two configurations c and c , it is decidable whether c (cid:118) c . 1 2 1 2 (2) For every c ∈ C, we can check whether {c} ↑ ∩Init (cid:54)= ∅. (3) For every c ∈ C, the set minpre({c}) is finite and computable. The solution for the coverability problem as suggested in [ACJT96, FS01] is based on a backward analysis approach. It is shown that starting from a finite set M ⊆ C, the sequence 0 (M ) with M := minpre(M ), for i ≥ 0, reaches a fixpoint and it is computable. i i≥0 i+1 i 4.2. DTSO Transition System is a Wsts. In this section, we instantiate the framework of Wsts to show the following result: Theorem 4.1 (DecidabilityofDTSOreachabilityproblem). The DTSO reachability problem is decidable. Proof. The rest of this section is devoted to the proof of the above theorem. Let P = (A ,A ,...,A ) be a concurrent system (as defined in Section 3). Moreover, let T = 1 2 n DTSO (cid:0) (cid:1) C ,{cD },∆∪∆ ,→− be the transition system inducedbyPunder the Dual TSO DTSO init aux DTSO semantics (as defined in Section 3.3). In the following, we will show that the DTSO-transition system T is monotonicity DTSO wrt. an ordering (cid:118). Then, we will show the three sufficient conditions for the decidability of the coverability problem for (T ,(cid:118)) (as stated in Section 4.1). DTSO (1) We first define the ordering (cid:118) on the set of DTSO-configurations. (2) Then, we show that the transition system induced under the Dual TSO semantics is monotonic wrt. the order (cid:118) (see Lemma 4.2). (3) For the first sufficient condition, we show that (cid:118) is a well-quasi-ordering; and that for every two configurations c and c , it is decidable whether c (cid:118) c (see Lemma 4.3). 1 2 1 2 10 P.A.ABDULLA,M.F.ATIG,A.BOUAJJANI,ANDT.P.NGO (4) The second sufficient condition (i.e., checking whether the upward closed set {c} ↑, with c is a DTSO-configuration, contains an initial configuration) is trivial. This check boils down to verify whether c is an initial configuration. (5) For the third sufficient condition, we show that we can calculate the set of minimal DTSO-configurationsforthesetofpredecessorsofanyupwardclosedset(seeLemma4.4). (6) Finally, we will show also that the DTSO reachability problem for P can be reduced to the coverability problem in the monotonic transition system (T ,(cid:118)) (see Lemma DTSO 4.5). Observe that this reduction is needed since we require that the load buffers are empty when defining the DTSO reachability problem. This concludes the proof of Theorem 4.1. 4.2.1. Ordering (cid:118). In the following, we define an ordering (cid:118) on C . Let us first DTSO introduce some notations and definitions. Consider a word w ∈ ((X ×V)∪(X ×V ×{own}))∗ representing the content of a load buffer. We define an operation that divides w into a number of fragments according to the most-recent own-messages concerning each variable. We define [w] := (w ,(x ,v ,own),w ,...,w ,(x ,v ,own),w ) own 1 1 1 2 m m m m+1 where the following conditions are satisfied: (1) x (cid:54)= x if i (cid:54)= j. i j (2) If (x,v,own) ∈ w , then x = x for some j < i (i.e., the most recent own-message on x i j j occurs at position j). (3) w = w ·(x ,v ,own)·w ···w ·(x ,v ,own)·w (i.e., the fragments correspond 1 1 1 2 m m m m+1 to the given word w). Let w,w(cid:48) ∈ ((X ×V)∪(X ×V ×{own}))∗ be two words. Let us assume that: [w] = (w ,(x ,v ,own),w ,...,w ,(x ,v ,own),w ) own 1 1 1 2 r r r r+1 [w(cid:48)] = (w(cid:48),(x(cid:48),v(cid:48),own),w(cid:48),...,w(cid:48) ,(x(cid:48) ,v(cid:48) ,own),w(cid:48) ). own 1 1 1 2 m m m m+1 We write w (cid:118) w(cid:48) to denote that the following conditions are satisfied: (i) r = m, (ii) x(cid:48) = x i i and v(cid:48) = v for all i : 1 ≤ i ≤ m, and (iii) w (cid:22) w(cid:48) for all i : 1 ≤ i ≤ m+1. i i i i Consider two DTSO-configurations c = (q,b,mem) and c(cid:48) = (q(cid:48),b(cid:48),mem(cid:48)), we extend the ordering (cid:118) to configurations as follows: c (cid:118) c(cid:48) if and only if the following conditions are satisfied: (1) q = q(cid:48), (2) b(p) (cid:118) b(cid:48)(p) for all process p ∈ P, and (3) mem(cid:48) = mem. 4.2.2. Monotonicity. Letc = (q ,b ,mem ),c = (q ,b ,mem ),c = (q ,b ,mem ) ∈ 1 1 1 1 2 2 2 2 3 3 3 3 C such that c →−t c for some t ∈ ∆ ∪(cid:8)propagatex,delete | x ∈ X(cid:9) with p ∈ P, DTSO 1 DTSO 2 p p p and c (cid:118) c . We will show that it is possible to compute a configuration c ∈ C and a 1 3 4 DTSO π run π such that c −→ c and c (cid:118) c . 3 DTSO 4 2 4 To that aim, we first show that it is possible from c to reach to a configuration c(cid:48), by 3 3 performing a certain number of delete transitions, such that the process p will have the p same last message in its load buffer in the configurations c and c(cid:48) while c (cid:118) c(cid:48). Then, 1 3 1 3 from the configuration c(cid:48), the process p can perform the same transition t as c did in order 3 1