International Journal on Software Tools for Technology Transfer manuscript No. (will be inserted by the editor)

arXiv:1601.06945v2 [cs.SE] 19 Nov 2016

Exact Finite-State Machine Identification from Scenarios and Temporal Properties

Vladimir Ulyantsev1 · Igor Buzhinsky1,2 · Anatoly Shalyto1

Abstract Finite-state models, such as finite-state machines (FSMs), aid software engineering in many ways. They are often used in formal verification and can also serve as visual software models. The latter application is associated with the problems of software synthesis and automatic derivation of software models from specification. Smaller synthesized models are more general and are easier to comprehend, yet the problem of minimum FSM identification has received little attention in previous research.

This paper presents four exact methods to tackle the problem of minimum FSM identification from a set of test scenarios and a temporal specification represented in linear temporal logic. The methods are implemented as an open-source tool. Three of them are based on translations of the FSM identification problem to SAT or QSAT problem instances. Accounting for temporal properties is done via counterexample prohibition. Counterexamples are either obtained from previously identified FSMs, or based on bounded model checking. The fourth method uses backtracking. The proposed methods are evaluated on several case studies and on a larger number of randomly generated instances of increasing complexity. The results show that the Iterative SAT-based method is the leader among the proposed methods. The methods are also compared with existing inexact approaches, i.e. the ones which do not necessarily identify the minimum FSM, and these comparisons show encouraging results.

Keywords Finite-state machine identification · linear temporal logic · model checking · SAT · QSAT

The final publication is available at Springer via http://dx.doi.org/10.1007/s10009-016-0442-1

1 Introduction

Finite-state models, such as finite-state machines, or FSMs, and deterministic finite automata (DFA), are commonly used for solving various problems arising in software engineering, such as software verification and reverse engineering. Recently, there has been growing interest in automated FSM construction based on given specifications, which are often represented as execution traces and logs [23,38,39,31]. Other types of data employed in model construction are temporal properties [38,9,31] and invariants [3]. This research direction is appealing, since inferred finite-state models can help comprehend software, reveal faults in it, facilitate model-driven development, or even serve as software.

Existing techniques, such as state merging [27,38] and metaheuristic approaches [34,9], demonstrate acceptable performance. However, they are hardly concerned with the size of generated models: it is not always possible to obtain the FSM with the minimum number of states, and even when it is, existing methods do not provide the proof that the found automaton is indeed the smallest possible one. Smaller FSMs are preferred since they are easier to comprehend, to maintain, and, according to Occam's principle, are more general, which is useful in the cases of incomplete specifications. In particular, if FSMs are further used for test case generation [11,6], the smaller number of states leads to more concise test suites.

This work was financially supported by the Government of Russian Federation, Grant 074-U01, and also partially supported by the Russian Foundation for Basic Research (RFBR), research project No. 14-07-31337 mol_a. We also thank Maxim Buzdalov, Daniil Chivilikhin and anonymous reviewers for useful comments.

Vladimir Ulyantsev
[email protected]

Igor Buzhinsky (corresponding author)
igor [email protected]
Anatoly Shalyto
[email protected]

1 Computer Technologies Laboratory, ITMO University, St. Petersburg, Russia
2 Department of Electrical Engineering and Automation, Aalto University, Espoo, Finland

In order to address this problem of constructing the smallest possible FSM, or the problem of exact FSM identification, this paper presents four exact methods of FSM identification from test scenarios and temporal properties represented in linear temporal logic (LTL) [32]. The results of Gold [21] and Rosner [33] on computational complexity of other finite-state model identification problems make us believe that the considered problem is NP-hard, although no proof is provided in this paper. The proposed methods are hence based on heuristics. Three of them translate the problem to either the Boolean satisfiability problem (SAT) or its quantified version (QSAT). The SAT problem has been previously used in related research: in [22] the authors learn DFA, and in [35] extended finite-state machines (EFSMs) are identified. Conversely, the translation to QSAT has not been applied in solving such problems. In this paper it is used for FSM construction in combination with bounded model checking [4], which is a form of model checking [12], an approach in formal software verification. The remaining method is based on backtracking. All the methods are incorporated into an open-source tool written in Java.

Another issue, which might be important in reactive software model identification, is completeness: the property of having a transition for each event in each state of the identified FSM. For example, complete FSMs are essential in sequential circuit synthesis [10], finite-state protocol synthesis [1], and in the IEC 61499 international industrial standard [37]. The majority of existing methods neglect this requirement, but this is not the case for the proposed techniques.

The proposed methods are evaluated on case studies and randomly generated instances. They are further compared with existing inexact approaches. First, two of them are shown to outperform the approach from [9]. Compared to state merging [38], the proposed approaches need more time, but are applicable under fewer premises (such as the absence of actions on transitions). Finally, the comparison with symbolic bounded LTL synthesis [16,17] suggests that the proposed methods generate notably smaller models.

The rest of the paper is organized as follows. In Section 2, we examine related research. In Section 3, we review several key concepts from the fields of model checking and bounded model checking. The considered problem is formally stated in Section 4. Next, in Section 5, we describe the contribution of the paper: the proposed FSM identification techniques. In Section 6, we evaluate them on case study systems and random instances, and then compare them with other techniques. Section 7 concludes the paper.

2 Related work

Many previously proposed finite-state model identification methods are heuristic. The EDSM state merging algorithm [27] for constructing DFA from a number of words labeled with acceptance/rejection information was among the first ones. State merging starts from an augmented prefix tree acceptor (APTA), a tree-shaped automaton, and iteratively merges its states until no valid merge exists. This algorithm serves as the basis for the method of FSM identification from execution traces and LTL safety formulae proposed in [38]. The authors perform a number of state merging executions (the practically efficient Blue Fringe approach is chosen) with the increasing number of negative execution traces, which are obtained as contradictions between the current FSM and LTL properties. The validity of each merge is additionally checked against the temporal properties. This reduces the size of the search space and thus makes the state merging procedure more efficient.

While in [38] LTL properties are either known in advance (in the "passive" approach) or messaged to the FSM identification tool by its user (in the "active" approach), in [28] and [3] they are mined from software traces or logs using predefined templates. In [28], the mined temporal properties are employed to guide state merging so that they are not violated. In [3], the initially constructed compact model is iteratively refined to fulfill the temporal properties and then is additionally coerced to cancel the refinements which are redundant due to an imperfect heuristic refinement procedure. The approach [3] is improved in the work [31], which focuses on learning models whose transitions are annotated with numbers indicating resource (i.e. time or memory) consumption. Inferring models richer than simple discrete transition systems has also been attempted in [39], but the idea in this work is different and does not employ temporal properties: instead, finite-state machines are enriched with numeric data classifiers learnt from traces with data values.

Another group of methods is based on metaheuristics, such as genetic algorithms [30] and ant colony optimization [13]. The genetic algorithm has been applied for EFSM construction in [34], but the work [8] shows that the evolutionary algorithm based on ant colony optimization solves this problem faster.

One of the ways of finding an exact solution (i.e. the FSM with the minimum number of states conforming with the specification), apart from the naive brute-force solution enumeration, is the translation of the problem to another NP-hard problem such as SAT or the constraint satisfaction problem (CSP) and feeding the obtained set of constraints to an exact solver (a third-party tool based on heuristics). To the best of our knowledge, all existing translation-based methods currently do not support temporal specifications. A translation-to-SAT DFA learning method, which employs labeled examples as input data, has been proposed in [22]. This method finds a proper coloring of the so-called consistency graph, which determines unmergeable pairs of APTA vertices. The paper [36] improves this approach by adding breadth-first search (BFS) symmetry breaking predicates to narrow the search space. Another work [35], which is based on [22], introduces a SAT-based method of FSM synthesis from user-prepared behavior examples, or test scenarios. Since one of the methods proposed in our paper is based on the method from [35], we will examine it in more detail in Section 5.1.1.

Finally, the problem of identifying an FSM from both scenarios and temporal properties represented in the LTL language is solved in [9]. The solution called CSP+MuACOsm combines the use of a CSP solver and a metaheuristic search with an ant colony optimization algorithm. More precisely, the CSP solver finds the initial solution based on scenarios only, and then it is adjusted metaheuristically to account for temporal formulae. Thus, this approach is inexact.

There has also been a large volume of research concerning the LTL synthesis problem, wherein a reactive system compliant with given LTL properties must be derived [5]. This problem is known to be 2EXPTIME-complete [33] in terms of specification length. While the majority of techniques mentioned above aim to construct a finite-state model which explains the behavior of a software system, the LTL synthesis problem requires a software system to be constructed. In this case LTL properties are often easier to obtain than traces, since there is no software which can generate them. Recent works, which attempted to solve this problem in a practically feasible way, include the approach to bound the parameters of the target system [17,20], a paper describing a tool which implements this idea [16], and an approach based on incremental model refinement [19].

3 Model checking and bounded model checking

Concepts related to model checking [12] will be extensively used in the rest of the paper. Model checking is a formal verification technique for finite-state models which suggests describing the specification for the software in the form of temporal properties. One of the ways to define temporal properties is linear temporal logic (LTL): each property to check is expressed as a formula defined over the set of infinite paths in the Kripke structure, a special model of software execution. A Kripke structure [12] $M$ is a quadruple $(S_K, I, T, L)$ where $S_K$ is the set of states, $I \subset S_K$ is the set of initial states, $T \subset S_K \times S_K$ is the transition relation, which must be left-total (that is, from each state there is a transition to at least one state), and $L: S_K \to 2^P$ is the labeling function, where $P$ is the set of atomic propositions, which characterize states. An example of a Kripke structure with an infinite path is shown in Fig. 1.

[Figure 1: a Kripke structure with labels L(1) = {p, q}, L(2) = {q}, L(3) = {}, L(4) = {r}, L(5) = {p, r}, and the infinite path 1, 2, 4, 5, 3, 4, 5, ... through it.]

Fig. 1 An example of a Kripke structure (top) and an infinite path in it (bottom). The structure has 5 states, and its labeling function annotates them with three atomic propositions p, q, and r. Two initial states 1 and 2 are marked with incoming arrows from the left. Other arrows describe the transition relation.

LTL formulae consist of Boolean operators ($\wedge$, $\vee$, $\neg$, $\to$), temporal operators and atomic propositions. If $f$ is simply a Boolean formula, then it asserts that the first state of the path is marked with some atomic propositions and is not marked with some other ones. If $f$ is an LTL formula, then saying that $f$ holds for a state within an infinite path means that it holds for the infinite suffix of the path starting from this state. The following temporal operators can be used:

– The neXt operator: $\mathbf{X} f$ indicates that formula $f$ holds for the next state of the path.
– The Global operator: $\mathbf{G} f$ indicates that $f$ holds for the current state and all future states of the path.
– The Future operator: $\mathbf{F} f$ indicates that there exists a future state for which $f$ holds, or it already holds for the first state of the path.
– The Until operator: $f \,\mathbf{U}\, g$ indicates that $f$ holds for a finite number of states, and then $g$ holds for the next state.
– The Release operator: $f \,\mathbf{R}\, g$ indicates that either $g$ holds until both $f$ and $g$ are true in some state, or $g$ holds forever if $f$ never becomes true.

If $f$ is an LTL formula, then $M \models \mathbf{A} f$ means that $f$ is satisfied for all infinite paths in $M$ which start in $I$. Alternatively, $M \models \mathbf{E} f$ means that there exists a path starting in $I$ for which $f$ is satisfied.

Bounded model checking [4], or BMC, is a technique to approximately verify LTL formulae by reducing the problem to a SAT instance. The idea is to search for a counterexample for the formula being verified among finite execution paths and infinite paths with a simple structure, which enter an infinite loop at some point. Each path, either finite or looping, is represented as a number of Boolean vectors $s_0, \ldots, s_k$, where $k$ determines the strength of the verification procedure. This integer can be iteratively increased until a counterexample is found, or a theoretical boundary [4] is reached which proves that BMC of the Kripke structure with the current $k$ is equivalent to its usual model checking, or the employed SAT solver fails to solve the current SAT instance. Each $s_j$ ($0 \le j \le k$) determines the $j$-th state of the path.

4 Problem statement

In this work, a finite state machine (FSM) is a sextuple $(S, s_{init}, E, Z, \delta, \lambda)$ where $S$ is a finite set of states, $s_{init} \in S$ is the initial state, $E$ is a finite set of input events, $Z$ is a finite set of output actions, $\delta: S \times E \to S$ is the transition function, and $\lambda: S \times E \to Z^*$, where $Z^*$ is the set of strings over $Z$, is the output function.
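The sextuple defined above can be mirrored directly in code. The following is a minimal Python sketch, not taken from the authors' Java tool: the class name `FSM`, its field names, the `run` helper, and the two-state machine `toy` are all illustrative assumptions. Partial dictionaries for `delta` and `lam` would model an FSM with missing transitions.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Tuple

State = int
Event = str
Action = str


@dataclass
class FSM:
    """Hypothetical container for the sextuple (S, s_init, E, Z, delta, lambda)."""
    states: FrozenSet[State]                            # S
    init: State                                         # s_init
    events: FrozenSet[Event]                            # E
    actions: FrozenSet[Action]                          # Z
    delta: Dict[Tuple[State, Event], State]             # transition function
    lam: Dict[Tuple[State, Event], Tuple[Action, ...]]  # output function, values in Z*

    def run(self, inputs: List[Event]):
        """Process one event at a time: emit lam(state, event), then move along delta."""
        state = self.init
        outputs = []
        for event in inputs:
            outputs.append(self.lam[(state, event)])
            state = self.delta[(state, event)]
        return outputs, state


# Illustrative two-state machine (an assumption made for this example only).
toy = FSM(
    states=frozenset({1, 2}),
    init=1,
    events=frozenset({"e1", "e2"}),
    actions=frozenset({"z1", "z2"}),
    delta={(1, "e1"): 1, (1, "e2"): 2, (2, "e1"): 1, (2, "e2"): 1},
    lam={(1, "e1"): ("z1",), (1, "e2"): ("z1",),
         (2, "e1"): ("z1",), (2, "e2"): ("z2",)},
)

outputs, final_state = toy.run(["e1", "e2", "e2", "e1"])
```

Storing each value of `lam` as a tuple keeps the output a string over $Z$ rather than a single action, which matches the codomain $Z^*$ of the output function.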
If $\delta$ and $\lambda$ are partial functions defined over the same subset of $S \times E$, then the FSM is called incomplete: some of its transitions are missing. Otherwise, if $\delta$ and $\lambda$ are total functions, we call such an FSM complete. An FSM execution is a sequence of cycles: on each cycle the FSM receives an input event, generates an output sequence according to $\lambda$ and changes its active state according to $\delta$.

The considered problem involves identifying an FSM with a fixed number of states $|S|$ which satisfies two types of specification: scenarios and LTL properties. If such an FSM does not exist, this must also be eventually detected. Note that to find an FSM with the smallest number of states, one might try increasing $|S|$ until a solution is found. The first type of specification is a set of test scenarios. A test scenario is a sequence of pairs $(e_1, A_1), \ldots, (e_n, A_n)$, where each $e_i \in E$ and $A_i \in Z^*$ ($1 \le i \le n$). These pairs are called scenario elements.

The second type of specification is a set of LTL formulae. An FSM complies with an LTL formula, if the formula holds for each possible execution of the FSM. We assume the following correspondence between FSMs and their Kripke structures: the Kripke structure's states $S_K$ are FSM's transitions (thus, $S_K \subset S \times E \times Z^* \times S$), and a state of the Kripke structure is initial if and only if it corresponds to an FSM transition from its initial state $s_{init}$:

$$I = \{(s_{init}, e, \lambda(s_{init}, e), \delta(s_{init}, e)) \mid e \in E\}.$$

Consequently, the pair composed of two states $(s_1, e, z, s_2)$ and $(s'_1, e', z', s'_2)$ belongs to the transition relation $T$, if and only if $s_2 = s'_1$. An example of the described transformation is shown in Fig. 2. Finally, to define the labeling function $L$, we consider the following set of atomic propositions $P$:

– $wasEvent(e)$, $e \in E$: whether the corresponding transition of the FSM is triggered by event $e$;
– $wasAction(z)$, $z \in Z$: whether the corresponding transition of the FSM includes at least one action $z$ in its output sequence.

[Figure 2: (a) an FSM with states 1 and 2 and transitions labeled $e_1/z_1$, $e_2/z_1$, $e_1/z_1$, $e_2/z_2$; (b) its Kripke structure with states $(1, e_1, z_1, 1)$, $(2, e_1, z_1, 1)$, $(1, e_2, z_1, 2)$, $(2, e_2, z_2, 1)$.]

Fig. 2 An example of an FSM (a) and its Kripke structure (b).

The second type of atomic propositions is not sufficient to express constraints which involve the positions of actions in output sequences. Nevertheless, since these propositions were considered in one of the previous works [9] with which we compare our work, we will also use them. Besides, generalizing $wasAction$ to handle positions would not essentially influence the proposed methods. Below are some examples of LTL formulae with the defined atomic propositions, which are satisfied for the FSM shown in Fig. 2:

– $wasAction(z_1) \wedge \mathbf{X}(wasAction(z_1) \vee wasAction(z_2))$: the first state of the path is marked with atomic proposition $wasAction(z_1)$ (and possibly with some other atomic propositions), and the second state is marked with either $wasAction(z_1)$ or $wasAction(z_2)$. In terms of the corresponding FSM, this means that $z_1$ is emitted on the first cycle of FSM execution, and either $z_1$ or $z_2$ is emitted on the second cycle.
– $\mathbf{G}(wasEvent(e_1) \to \mathbf{F}(wasAction(z_1)))$: each event $e_1$ received by the FSM will cause action $z_1$ in the future.

It is also possible to optionally require the identified FSM to be complete. While describing the FSM identification techniques, we will mention the cases of both presence and absence of the completeness requirement.

The final remark in this section concerns the correspondence of our definitions of the FSM and its identification problem with the ones from previous works. The model of EFSMs considered in [35] and [9] additionally employs guard conditions on transitions. Such conditions depend on Boolean variables, an extra type of input data for an FSM. Nevertheless, any instance of the FSM learning problem with both events and guard conditions can be transformed to an instance with events only. Each event of the transformed instance is a pair of an event from the initial instance and a combination of variable values. Thus, this transformation would increase the number of events by a factor of $2^{|V|}$, where $|V|$ is the number of variables. For large $|V|$ such a transformation is expensive, but since in this work we deal with $|V| \le 2$, smarter handling of guard conditions is not considered.

Another FSM definition is the one from [38]. Its main difference from ours is the absence of actions, but the problem stated in [38] assumes the optional presence of negative scenarios: the ones with which the identified FSM must not comply.

5 FSM identification methods

Four exact FSM identification methods are presented in this paper. Our first method, the Iterative SAT-based approach, is largely based on a known method of identifying FSMs from test scenarios only [35] and the idea of iterative counterexample prohibition [38]. The second one, the QSAT-based method, uses the translation of the considered problem to QSAT and involves executing a QSAT solver. Instead, the third approach, named the Exponential SAT-based one, executes a SAT solver on the expanded version of the quantified Boolean formula. Eventually, the fourth and the simplest Backtracking method is based neither on SAT nor on QSAT and performs a heuristic search with backtracking. We implemented the last method to make it the baseline in its comparison with the others. The implementations of all the methods in Java are available online as a cross-platform software tool (footnote 1) with a command-line interface.

Footnote 1: https://github.com/ulyantsev/EFSM-tools/

5.1 Iterative SAT-based solution

The idea of the Iterative SAT-based solution is as follows. We iteratively execute the method of identifying FSMs from test scenarios only, presented in [35], with several adjustments. After each iteration, we verify the obtained FSM with model checking against the LTL specification. We employ the model checker written by the authors of [34] and further modified to make it output minimum counterexamples to falsified formulae. If the FSM's Kripke structure does not comply with the specification, we prohibit the counterexamples found by the model checker using additional Boolean constraints and thus force the SAT solver to find a different solution after it is restarted.

An important optimization is to use the capabilities of incremental solvers [15] instead of restarts. On each iteration, only new constraints are fed to the running instance of the solver. This saves computation time, since the number of iterations can be large.

An approach similar to the proposed one, but based on state merging instead of the SAT problem, was introduced in [38]. Another work [19] which applies related ideas is devoted to LTL synthesis.

5.1.1 Method of identifying FSMs from scenarios only

We now briefly describe the method from [35]. In this method, test scenarios are merged into the scenario tree. An example of such a tree is shown in Fig. 3. Denote the set of tree nodes as $V_{sc}$. Two variable types are introduced in [35] (we slightly alter the notation from this work):

– $x_{v,i}$: whether node $v \in V_{sc}$ of the scenario tree corresponds to state $i$ ($1 \le i \le |S|$) of the FSM ($v$ is "colored" into color $i$);
– $y_{i_1,i_2,e}$: whether the transition from state $i_1$ ($1 \le i_1 \le |S|$) triggered by event $e \in E$ leads to state $i_2$ ($1 \le i_2 \le |S|$), i.e. whether $\delta(i_1, e) = i_2$.

Fig. 3 An example of a scenario tree for four test scenarios: $(e_1,z_1),(e_1,z_1),(e_1,z_1)$; $(e_1,z_1),(e_1,z_1),(e_2,z_1)$; $(e_1,z_1),(e_2,z_1)$; and $(e_2,z_1),(e_2,z_2),(e_1,z_1)$.

A number of constraints enforce the proper coloring of the scenario tree and the compliance of the FSM with this tree. Briefly, these constraints ensure that the first state of the FSM is the initial one (i.e. $x_{1,1} = 1$), that exactly one color (FSM state) is assigned to each node of the tree, that there is no pair of inconsistent nodes [23] with identical colors, that there is at most one transition $y_{i_1,i_2,e}$ in the FSM for each source state $i_1$ and event $e$, and that $y$ variables are consistent with the coloring of the tree. Denote the logical conjunction of all these constraints as $S$.

5.1.2 Action constraints

In [35], output actions were not included into the SAT model, but were restored based on scenarios. In our case, actions must be considered explicitly to facilitate counterexample prohibition (Section 5.1.3). First we need to introduce an additional variable type for output actions:

– $z_{i,a,e}$: whether the transition from state $i$ triggered by event $e$ produces output action $a$, i.e. whether $a \in \lambda(i, e)$.

The constraint $Z$ ensures the compliance of $z$ variables with scenarios by stating that they do not contradict any edge of the scenario tree. Let $out(v)$ be the set of outgoing edges from node $v$, then:

$$Z = \bigwedge_{v \in V_{sc}} \bigwedge_{i=1}^{|S|} \left( x_{v,i} \to \bigwedge_{(e,A,v') \in out(v)} M_{i,e,A} \right),$$

$$\text{where } M_{i,e,A} = \bigwedge_{a \in Z} \begin{cases} z_{i,a,e}, & \text{if } a \in A \\ \neg z_{i,a,e}, & \text{if } a \notin A. \end{cases}$$

5.1.3 Negative scenario tree

We introduce the concept of the negative scenario tree, which is used to represent counterexamples prohibited after each iteration of the method. To do this, we need one more type of variables which will represent the colors of negative scenario tree nodes, the set of which we denote as $\overline{V}_{sc}$:

– $\overline{x}_{v,i}$: whether node $v \in \overline{V}_{sc}$ of the negative scenario tree corresponds to state $i$ ($1 \le i \le |S|$) of the FSM.

As in the case of the ordinary, positive scenario tree, the structure of the FSM, encoded in its Boolean model, determines the mapping between tree nodes and FSM states (this will be asserted below with Boolean constraints). However, there are several differences between these types of trees:

– It is possible for negative scenario nodes to not correspond to any of the FSM states. This is intuitive for terminal counterexample nodes, for which the opposite situation would mean that the counterexample belongs to the set of possible FSM behaviors. Some nodes of the tree, nevertheless, correspond to FSM states: when a counterexample is added into the tree, some of its prefixes are still correct.
– For each node of the positive scenario tree and each event, there cannot be more than one outgoing edge. Otherwise, the tree would require the FSM to be nondeterministic. This restriction does not apply to the negative tree: such a situation just means that more than one combination of actions is prohibited in a particular node for a particular event.
– Generally, each counterexample to an LTL formula is an infinite path, and without loss of generality we may assume that it is composed of a finite prefix followed by a cycle [12]. Moreover, for some formulae there are finite prefixes such that all possible infinite continuations of them are counterexamples, so we will regard such prefixes as counterexamples themselves. A finite counterexample simply corresponds to a path from the root of the tree to the end node of the counterexample. To represent a looping counterexample, after adding the finite prefix and a single occurrence of the cycle, a back edge is inserted to link the end of the cycle with its beginning.

An example of a negative scenario tree is shown in Fig. 4. It consists of three counterexamples: two looping ones (back edges are shown in dashed lines) and a finite one (indicated with a cross inside its end node).

Fig. 4 An example of a negative scenario tree. Two looping counterexamples are $(e_1,z_1),[(e_1,z_1),(e_1,z_2)]$ and $(e_2,z_1),(e_2,z_2),[(e_1,z_2)]$ (cycles are denoted with square brackets), and the single finite counterexample is $(e_1,z_1),(e_2,z_2),(e_2,z_2)$.

Boolean constraints which specify the negative scenario tree are totally different from the ones of the positive tree. First, proper coloring of the negative scenario tree must be ensured. Its root (node 1) corresponds to the initial state of the FSM:

$$\overline{S}_1 = \overline{x}_{1,1}.$$

Then, negative node colors are propagated along the edges of the tree (excluding back edges) according to the Boolean model of the FSM:

$$\overline{S}_2 = \bigwedge_{\substack{v \in \overline{V}_{sc} \\ (e,A,v') \in out(v) \\ 1 \le i_1, i_2 \le |S|}} \left( \overline{x}_{v,i_1} \wedge y_{i_1,i_2,e} \wedge M_{i_1,e,A} \to \overline{x}_{v',i_2} \right).$$

Similarly to the positive scenario tree, it is possible to ensure that exactly one color is assigned to each negative node, but these constraints can be shown to be redundant.

Next, each added counterexample is associated with its own constraint. The simplest case is adding finite counterexamples: their end nodes (denoted as "terminal" ones) are asserted to not correspond to any states:

$$\overline{S}_3 = \bigwedge_{v \in \overline{V}_{sc}:\, terminal(v)} \; \bigwedge_{i=1}^{|S|} \neg \overline{x}_{v,i}.$$

Note that in general such terminal nodes may still have outgoing edges: this corresponds to situations when a shorter, more restrictive counterexample is added after a longer one.

Next, recall that a looping counterexample is added to the tree as a path consisting of the before-the-cycle prefix, a single occurrence of the cycle, and a back edge which links the end of the cycle to its beginning (see Fig. 4). In general, such an end node may still have outgoing edges due to previously added counterexamples. The respective constraints state that cycles are forbidden: their start and end nodes (the ones linked by back edges) cannot have identical colors, i.e. they do not correspond to the same state of the FSM:

$$\overline{S}_4 = \bigwedge_{v \in \overline{V}_{sc}} \; \bigwedge_{u:\, backEdge(v,u)} \; \bigwedge_{i=1}^{|S|} \neg(\overline{x}_{v,i} \wedge \overline{x}_{u,i}).$$

Finally, the overall constraint on the negative scenario tree is denoted as $\overline{S}$:

$$\overline{S} = \overline{S}_1 \wedge \overline{S}_2 \wedge \overline{S}_3 \wedge \overline{S}_4.$$

5.1.4 FSM completeness

To simply resolve the FSM completeness issue, we add the completeness constraint which ensures that for every state $i_1$ and every event $e$ there exists a transition to some state $i_2$:

$$C = C_{\forall} = \bigwedge_{i_1=1}^{|S|} \bigwedge_{e \in E} \bigvee_{i_2=1}^{|S|} y_{i_1,i_2,e}.$$

However, even when completeness is not required, we must ensure that there is at least one transition from each state of the FSM to prevent vague interpretations of LTL formulae, which are defined over infinite paths. This can be done with a weaker constraint:

$$C = C_{\exists} = \bigwedge_{i_1=1}^{|S|} \bigvee_{e \in E} \bigvee_{i_2=1}^{|S|} y_{i_1,i_2,e}.$$

5.1.5 Symmetry breaking

In addition, symmetry-breaking constraints [36] are used to speed up solver execution on unsatisfiable problem instances. They ensure that the states of the FSM are traversed in the BFS order. We denote them as $B$. The use of $B$ requires additional variables described in [36], but we omit them for simplicity.

5.1.6 Assembled formula

Finally, we assemble all the mentioned constraints into the formula which is fed to the SAT solver:

$$\exists \{x_{v,i},\, y_{i_1,i_2,e},\, z_{i,a,e},\, \overline{x}_{v,i}\}: \; S \wedge Z \wedge \overline{S} \wedge C \wedge B. \qquad (1)$$

The Iterative SAT-based solution is summarized in Algorithm 1. The function ModelCheck runs the model checker on the FSM and the LTL specification and returns minimum counterexamples to falsified formulae, and SatSolve runs a SAT solver (in the Iterative SAT-based solution, SAT solving is implemented incrementally, thus on each step only changes in the Boolean formula are fed to the solver). SatSolve might fail and return null.

Data: set of scenarios SC, temporal specification LTL
f ← generate formula (1)
run a SAT solver in the incremental mode
while true do
    FSM ← SatSolve(f)
    if FSM = null then return 'UNSATISFIABLE'
    counterexamples ← ModelCheck(FSM, LTL)
    if counterexamples ≠ ∅ then update $\overline{S}$ within f
    else return FSM
end
Algorithm 1: Iterative SAT-based solution.

5.2 QSAT-based solution

The QSAT-based solution employs BMC. Assume $k$ is the BMC boundary, that is, paths with the length of $k + 1$ are checked for counterexamples. If we find a way of identifying an FSM which satisfies scenarios and LTL properties with the boundary $k$, then we can iteratively increase $k$ until the FSM satisfies the properties in the unbounded sense (this can be checked with model checking). Such $k$ always exists according to Theorem 1 from [4], and the reasons why this theorem is applicable here will become evident soon.

5.2.1 Idea of the method

In usual BMC, the Kripke structure to be model-checked is assumed to be known in advance. BMC checks whether there are no paths with length bounded by $k$ in this structure for which the negation of the LTL specification holds. Such a path would be a counterexample for the specification, which must hold for every path in the Kripke structure. But instead of querying the SAT solver whether there exists such a path, with the help of a QSAT solver we can solve a quantified Boolean formula which states that each path in the model is not a counterexample. Furthermore, we can now assume that the Kripke structure is not known in advance and add the existential part of the formula, which defines the structure to be identified, before the universal one, which specifies the absence of counterexamples. The proof of the correctness of the outlined idea and its formal description are provided below.

Recall the notations $M \models \mathbf{A} f$ and $M \models \mathbf{E} f$, which state that $f$ is satisfied either for all paths or for some path in $M$ (see Section 3). Assume $M$ is the Kripke structure which models an FSM complying with scenarios (denote the set of such models as $\mathcal{M}_{sc}$), and for which $M \models \mathbf{A} g$, where $g$ is the LTL property we require from the FSM. If there are several such properties, assume that $g$ is their logical conjunction. $M \models \mathbf{A} g$ is equivalent to $\neg(M \models \mathbf{E} \neg g)$. Next, we need to utilize two theorems from [4]:

– Theorem 1. $M \models \mathbf{E} f \Leftrightarrow \exists k \ge 0: M \models_k \mathbf{E} f$, where the symbol "$\models_k$" denotes property satisfiability in the $k$-bounded sense.
– Theorem 2. There is a Boolean formula $\llbracket M, f \rrbracket_k$ (defined in [4]), which is satisfiable if and only if $M \models_k \mathbf{E} f$.

Theorem 1 implies: $M \models \mathbf{E} \neg g \Leftrightarrow \exists k = k_0 \ge 0: M \models_k \mathbf{E} \neg g$. Thus, if we try $k = k_0$, we need to find $M$ such that $\neg(M \models_k \mathbf{E} \neg g)$. Then, according to Theorem 2, $M \models_k \mathbf{E} \neg g$ can be expressed as a Boolean formula $\exists p\, \llbracket M, f \rrbracket_k$, where $p$ is a variable assignment which determines a path in $M$ (possibly an invalid one, see clarifications below), and $f = \neg g$. Hence, we search for $M$ such that $\exists p\, \llbracket M, f \rrbracket_k$ is false. To find $M$, we start from $k = 0$ and iteratively increase it by one. On each iteration, we solve the following quantified Boolean formula:

$$\exists M \in \mathcal{M}_{sc} \; \forall p \; \neg \llbracket M, f \rrbracket_k. \qquad (2)$$

If the formula is unsatisfiable, then it is also unsatisfiable for greater $k$ (this can be inferred from the definition of $\llbracket M, f \rrbracket_k$ [4]) and thus the desired Kripke structure $M$ does not exist. Otherwise, we verify $M \models \mathbf{A} g$ with model checking. If the desired Kripke structure $M$ exists, it will be found together with the corresponding FSM when $k$ reaches $k_0$.

5.2.2 Kripke structure representation and correctness

We have not yet discussed the way $M$ can be represented with Boolean variables and how the constraint (2) can be expressed as a QSAT instance. The translation of the stated problem to QSAT is again based on the method from [35]. If we assume that state 1 is the initial state of the FSM, then $y$ and $z$ variables are sufficient to define the Kripke structure. Thus, we will search both the scenario coloring determined by $x$ variables and the information sufficient to construct the Kripke structure of the FSM. We constrain all three types of variables with $S$, $Z$, $B$ and $C$ (see Section 5.1).

5.2.3 Path representation and correctness

We have just identified how to express $M \in \mathcal{M}_{sc}$ as a Boolean formula. We now move to the way of defining a path in $M$ (recall that its length is $k + 1$). We introduce the following variables for each position $j$ ($0 \le j \le k$) of the path:

– $\sigma_{i,j}$: the $j$-th position of the path is a transition from state $i$ of the FSM;
– $\varepsilon_{e,j}$: the $j$-th position of the path is a transition triggered by event $e$;
– $\zeta_{a,j}$: the $j$-th position of the path is a transition with action $a$.

Thus, each Boolean vector $s_j$, introduced at the end of Section 3, is composed of $\sigma_{i,j}$ ($1 \le i \le |S|$), $\varepsilon_{e,j}$ ($e \in E$), and $\zeta_{a,j}$ ($a \in Z$). In fact, $\sigma$ and $\varepsilon$ variables are sufficient to determine a path in the Kripke structure, since the action sequence of the transition in the FSM can be uniquely determined from the source state of the transition and its triggering event. Thus, $\zeta$ variables are supplemental, but later they will become helpful to express atomic propositions. In Fig. 5, we show an example of a path in a Kripke structure with the corresponding variable assignment.

Which constraints would ensure that an assignment of the introduced variables produces a correct path? We denote this constraint as $\llbracket M \rrbracket_k$ (from now on, some notations from [4] are employed):

$$\llbracket M \rrbracket_k = \sigma_{1,0} \wedge P_\sigma \wedge P_\varepsilon \wedge P_y \wedge P_y^k \wedge P_z, \text{ where}$$

$$P_\sigma = \bigwedge_{j=0}^{k} \left( \left( \bigvee_{i=1}^{|S|} \sigma_{i,j} \right) \wedge \bigwedge_{i_1=1}^{|S|} \bigwedge_{i_2=i_1+1}^{|S|} \neg(\sigma_{i_1,j} \wedge \sigma_{i_2,j}) \right),$$

[Figure 5: the Kripke structure of Fig. 2, with states $(1, e_1, z_1, 1)$, $(2, e_1, z_1, 1)$, $(1, e_2, z_1, 2)$, $(2, e_2, z_2, 1)$ and path positions $j = 0$, $j = 1$, $j = 2$ marked.]

Fig. 5 An example of a path in a Kripke structure shown in Fig. 2 for $k = 2$. The following path variables are true for this

condition of a witness of $f$, the negation of $g$. More precisely, $W$ states that there exists a path in the Kripke structure on which the negation $f$ of the required LTL formula $g$ holds. While defining $W$, we will largely use the derivations from [4].

By ${}_\ell L_k$ ($0 \le \ell \le k$) we denote a Boolean formula which requires a path to be a $(k, \ell)$-loop. In such a path, there exists a transition in the Kripke structure from the last position $k$ of the path to some position $\ell$. ${}_\ell L_k$ has the following form:

$${}_\ell L_k = \bigvee_{\substack{1 \le i_1, i_2 \le |S| \\ e \in E}} \sigma_{i_1,k} \wedge \varepsilon_{e,k} \wedge \sigma_{i_2,\ell} \wedge y_{i_1,i_2,e}.$$
path: σ ,ε ,ζ ,σ ,ε ,ζ ,σ ,ε ,ζ , and (cid:96) k i1,k e,k i2,(cid:96) i1,i2,e theothe1r,0onees1,0arezf1a,l0se.1,1 e2,1 z1,1 2,2 e2,2 z2,2 pi1,i2,eq Note that the looping edge is not included in the ¨˜ ¸ ˛ Boolean description of the path, and thus y is ľk ł ľ i1,i2,e Pε “ ˝ εe,j ^ (cid:32)pεe1,j ^εe2,jq‚, obligatory in the definition of (cid:96)Lk. An example of a loopingpathwithp2,0qandp2,1q-loopsistheoneshown j“0 ePE te1‰e2u in Fig. 5: its last state has transitions to the first two kľ´1 ľ ones. Next, L will denote theexistence of a pk,(cid:96)q-loop P “ pσ ^ε ^σ Ñy q, k y i1,j e,j i2,j`1 i1,i2,e for at least one (cid:96): j“0pi˜1,i2,eq ¸ ľ ł łk Pk “ σ ^ε Ñ y , L “ L . y i1,k e,k i1,i2,e k (cid:96) k pi1,eq i2 (cid:96)“0 ľk ľ Finally, the witness condition is expressed in the P “ pσ ^ε Ñpζ Øz qq. z i,j e,j a,j i,a,e following way: j“0pi,a,eq ` ˘ łk ` ˘ ThepathmuststartintheinitialstateoftheFSM, W “ (cid:32)L ^ f 0 _ L ^ f 0 , k k (cid:96) k (cid:96) k thereforeweneedσ1,0.TheconstraintsPσ andPεcheck (cid:74) (cid:75) (cid:96)“0 (cid:74) (cid:75) that each transition in the path starts in exactly one where f 0 and f 0 are formula “translations” – con- stateandistriggeredbyexactlyoneevent,respectively. k (cid:96) k straint(cid:74)s p(cid:75)roduced(cid:74)fr(cid:75)om the structure of f. The transla- Among several existing encodings of the at-most-one tions can be performed according to the rules defined constraint [24] in P and P , we have chosen the sim- σ ε in [25], but before this f must be transformed to the plest binomial encoding, since these constraints do not negation-normal form [25]: all negations must be prop- formasignificantportionofthefinalformula.Thecon- agated towards atomic propositions. Atomic proposi- straints P and Pk (the special case of P for j “ k) y y y tions encountered at position j of the path are trans- assert that the transitions in the path correspond to lated in the following simple way: y variables. 
Note that Pk is not required if the com- y pleteness constraint C is included. Finally, P defines – wasEventpeq“ε ; z e,j ζ variables, enforcing correct (corresponding to z vari- – wasActionpaq“ζ . a,j ables) actions in each state of the path. 5.2.5 Assembled formula 5.2.4 Absence of a witness of the formula’s negation We are now ready to assemble the complete quantified Bynow,theonlyremainingthingistoexpress M,f formula (2), which is further fed to a QSAT solver: k as a Boolean formula. The idea of M,f is t(cid:74)o chec(cid:75)k whether there exists a finite or loop(cid:74)ing p(cid:75)akth in M for Dtxv,i, yi1,i2,e, zi,a,eu: @tσi,j, εe,j, ζa,ju: (3) which f holds – its witness. Such path is also a coun- S^Z ^B^C^p(cid:32) M k_(cid:32)Wq. (cid:74) (cid:75) terexample for the original formula g. According to [4], Variablesx ,y ,andz definetheFSMbe- v,i i1,i2,e i,a,e ing identified and its Kripke structure. Constraints S, M,f “ M ^W, k k (cid:74) (cid:75) (cid:74) (cid:75) Z and B ensure that the assignment of these variables where the path correctness condition M has been isvalidanddefinesanFSMwithBFSsymmetrybreak- k discussed previously, and W expresses(cid:74) th(cid:75)e existence ingpredicates,andCguaranteesthecompletenessofthe 10 V.Ulyantsev,I. Buzhinsky,A. Shalyto synthesizedFSM(or,ifcompletenessisnotrequired,it Table 1 Several subformulae of the QSAT translation for just forbids states with no outgoing transitions). Next, |S| “ |E| “ |Z| “ 2, the scenario tree from Fig. 3 and eachassignmentofpathvariableseitherdoesnotdefine theLTLformulaGpwasActionpz2qÑXwasActionpz1qq.The FSM from Fig. 2 satisfies these data. For this particular ex- acorrectpath((cid:32) M )orisnotawitnessforf ((cid:32)W). k ample k “ 0 is sufficient, but k “ 1 is used instead to make Thepseudocod(cid:74)eo(cid:75)ftheQSAT-basedsolutionisshown theexamplenontrivial. in Algorithm 2. 
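As an illustration of the binomial encoding chosen for $P_\sigma$ and $P_\varepsilon$ in Section 5.2.3, the following Python sketch generates the clauses of $P_\sigma$ in CNF. It is a minimal example under stated assumptions, not the implementation of the tool described in this paper: the integer numbering of $\sigma_{i,j}$ (the hypothetical helper `sigma_var`) and the signed-integer literal convention are choices made only for this sketch. Each conjunct $\neg(a \wedge b)$ is written as the equivalent clause $(\neg a \vee \neg b)$.

```python
from itertools import combinations


def exactly_one(literals):
    """Binomial (pairwise) exactly-one encoding as a list of CNF clauses.

    Literals are nonzero integers; a negative integer denotes negation.
    """
    at_least_one = [list(literals)]
    at_most_one = [[-a, -b] for a, b in combinations(literals, 2)]
    return at_least_one + at_most_one


def sigma_var(i, j, num_states):
    """Hypothetical numbering (an assumption of this sketch):
    maps sigma_{i,j} with 1 <= i <= |S| and j >= 0 to a positive integer."""
    return j * num_states + i


def p_sigma(num_states, k):
    """CNF clauses of P_sigma: exactly one source state per path position 0..k."""
    clauses = []
    for j in range(k + 1):
        position_vars = [sigma_var(i, j, num_states)
                         for i in range(1, num_states + 1)]
        clauses.extend(exactly_one(position_vars))
    return clauses


if __name__ == "__main__":
    # |S| = 2, k = 1: variables 1, 2 stand for sigma_{1,0}, sigma_{2,0}
    # and 3, 4 for sigma_{1,1}, sigma_{2,1}.
    for clause in p_sigma(2, 1):
        print(clause)
```

With $|S| = 2$ and $k = 1$ the sketch produces four clauses, which correspond to the four conjuncts of the $P_\sigma$ row of Table 1. The constraint $P_\varepsilon$ can be obtained in the same way by applying `exactly_one` to the event variables of each position. The number of clauses grows quadratically in $|S|$, which is acceptable here because, as noted in Section 5.2.3, these constraints form only a small part of the final formula.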
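Name                        Subformula
Variables                   $\exists x_{1..9,1..2},\ y_{1..2,1..2,1..2},\ z_{1..2,1..2,1..2}$
                            $\forall \varepsilon_{1..2,0..1},\ \sigma_{1..2,0..1},\ \zeta_{1..2,0..1}$
$P_\sigma$                  $(\sigma_{1,0} \vee \sigma_{2,0}) \wedge \neg(\sigma_{1,0} \wedge \sigma_{2,0}) \wedge$
                            $(\sigma_{1,1} \vee \sigma_{2,1}) \wedge \neg(\sigma_{1,1} \wedge \sigma_{2,1})$
$P_\varepsilon$             $(\varepsilon_{1,0} \vee \varepsilon_{2,0}) \wedge \neg(\varepsilon_{1,0} \wedge \varepsilon_{2,0}) \wedge$
                            $(\varepsilon_{1,1} \vee \varepsilon_{2,1}) \wedge \neg(\varepsilon_{1,1} \wedge \varepsilon_{2,1})$
$P_y$                       $(\sigma_{1,0} \wedge \varepsilon_{1,0} \wedge \sigma_{1,1} \to y_{1,1,1}) \wedge$
                            $(\sigma_{1,0} \wedge \varepsilon_{2,0} \wedge \sigma_{1,1} \to y_{1,1,2}) \wedge$
                            $(\sigma_{1,0} \wedge \varepsilon_{1,0} \wedge \sigma_{2,1} \to y_{1,2,1}) \wedge$
                            $(\sigma_{1,0} \wedge \varepsilon_{2,0} \wedge \sigma_{2,1} \to y_{1,2,2}) \wedge$
                            $(\sigma_{2,0} \wedge \varepsilon_{1,0} \wedge \sigma_{1,1} \to y_{2,1,1}) \wedge$
                            $(\sigma_{2,0} \wedge \varepsilon_{2,0} \wedge \sigma_{1,1} \to y_{2,1,2}) \wedge$
                            $(\sigma_{2,0} \wedge \varepsilon_{1,0} \wedge \sigma_{2,1} \to y_{2,2,1}) \wedge \ldots$
$P_z$                       $(\sigma_{1,0} \wedge \varepsilon_{1,0} \to (\zeta_{1,0} \leftrightarrow z_{1,1,1})) \wedge$
                            $(\sigma_{1,0} \wedge \varepsilon_{2,0} \to (\zeta_{1,0} \leftrightarrow z_{1,1,2})) \wedge$
                            $(\sigma_{1,0} \wedge \varepsilon_{1,0} \to (\zeta_{2,0} \leftrightarrow z_{1,2,1})) \wedge$
                            $(\sigma_{1,0} \wedge \varepsilon_{2,0} \to (\zeta_{2,0} \leftrightarrow z_{1,2,2})) \wedge$
                            $(\sigma_{2,0} \wedge \varepsilon_{1,0} \to (\zeta_{1,0} \leftrightarrow z_{2,1,1})) \wedge$
                            $(\sigma_{2,0} \wedge \varepsilon_{2,0} \to (\zeta_{1,0} \leftrightarrow z_{2,1,2})) \wedge \ldots$
${}_{0}L_1$                 $(\sigma_{1,1} \wedge \varepsilon_{1,1} \wedge \sigma_{1,0} \wedge y_{1,1,1}) \vee$
                            $(\sigma_{1,1} \wedge \varepsilon_{2,1} \wedge \sigma_{1,0} \wedge y_{1,1,2}) \vee$
                            $(\sigma_{1,1} \wedge \varepsilon_{1,1} \wedge \sigma_{2,0} \wedge y_{1,2,1}) \vee$
                            $(\sigma_{1,1} \wedge \varepsilon_{2,1} \wedge \sigma_{2,0} \wedge y_{1,2,2}) \vee$
                            $(\sigma_{2,1} \wedge \varepsilon_{1,1} \wedge \sigma_{1,0} \wedge y_{2,1,1}) \vee$
                            $(\sigma_{2,1} \wedge \varepsilon_{2,1} \wedge \sigma_{1,0} \wedge y_{2,1,2}) \vee$
                            $(\sigma_{2,1} \wedge \varepsilon_{1,1} \wedge \sigma_{2,0} \wedge y_{2,2,1}) \vee$
                            $(\sigma_{2,1} \wedge \varepsilon_{2,1} \wedge \sigma_{2,0} \wedge y_{2,2,2})$
$[\![ f ]\!]_1^0$           $(\zeta_{2,0} \wedge \neg\zeta_{1,1}) \vee (\zeta_{2,1} \wedge \mathrm{false})$
${}_{0}[\![ f ]\!]_1^0$     $(\zeta_{2,0} \wedge \neg\zeta_{1,1}) \vee (\zeta_{2,1} \wedge \neg\zeta_{1,0})$
${}_{1}[\![ f ]\!]_1^0$     $(\zeta_{2,0} \wedge \neg\zeta_{1,1}) \vee (\zeta_{2,1} \wedge \neg\zeta_{1,1})$

The function QSatSolve runs a QSAT solver to find a proper FSM, and it returns null in case of the unsatisfiability of the formula. The differences between the QSAT-based and the Iterative SAT-based solutions are also stressed in Fig. 6. In addition, a partial example of a QSAT translation is given in Table 1.

  Data: set of scenarios SC, temporal specification LTL
  k ← 0
  while true do
    f ← generate formula (3), FSM ← QSatSolve(f)
    if FSM = null then return 'UNSATISFIABLE'
    else if ModelCheck(FSM, LTL) = ∅ then
      return FSM
    end
    else k ← k + 1
  end
Algorithm 2: QSAT-based solution.

5.3 Exponential SAT-based solution

Any QSAT instance can be transformed to a SAT instance by eliminating every universal quantifier: each subformula $\forall x\, f$ is converted to $f|_{x=0} \wedge f|_{x=1}$, where expressions after vertical lines denote variable assignments. If the formula contains $q$ universally quantified variables, then this procedure can bloat its size up to $2^q$ times. We take this approach in an optimized form and feed the following constraint to the SAT solver:

\[
\exists \{x_{v,i},\, y_{i_1,i_2,e},\, z_{i,a,e}\} : \ S \wedge Z \wedge B \wedge C \wedge \bigwedge_{t \in T} \left. \left( \neg P_y \vee \neg P_y^k \vee \neg W \right) \right|_{t},
\]
\[
\text{where } T = \{ \{\sigma_{i,j},\, \varepsilon_{e,j},\, \zeta_{a,j}\} \mid \sigma_{1,0} \wedge P_\sigma \wedge P_\varepsilon \wedge P_z \}.
\]

First, constraints $S$, $Z$, $B$ and $C$ do not depend on path variables and thus are included into the formula only once. Then we iterate over the set $T$ of all valid path variable assignments, i.e. the ones for which $\sigma_{1,0}$, $P_\sigma$, $P_\varepsilon$, and $P_z$ hold. It is important to mention that while $\sigma$ and $\varepsilon$ variables are assigned to constants (improper assignments are filtered out by $\sigma_{1,0} \wedge P_\sigma \wedge P_\varepsilon$), each $\zeta_{a,j}$ is assigned to the corresponding $z_{i,a,e}$ variable, which is uniquely determined from the $P_z$ constraint based on $\sigma$ and $\varepsilon$ values. For each path variable assignment, we include the remaining part of the constraint with substituted concrete values of $\sigma_{i,j}$, $\varepsilon_{e,j}$, and $\zeta_{a,j}$ into the final Boolean formula.

5.4 Backtracking solution

The solution based on backtracking is the baseline one and does not involve SAT or QSAT solver execution. A recursive procedure iterates over various (possibly incomplete) FSMs, starting from the FSM with no transitions. It maintains the frontier: the current set of edges of the scenario tree which cannot yet be passed by the FSM due to the absence of transitions (see Fig. 7 for an example).

If the frontier is not empty, then the procedure tries augmenting the FSM with one of its edges. Each new FSM $A$ is checked for compliance with the scenario tree, and if it complies with it, then the new frontier is found. Moreover, $A$ is verified and thus again can be rejected. The rationale behind verifying intermediate FSMs is as follows. If $A$ is incomplete, then the set of paths in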