Equisatisfiable SAT Encodings of Arithmetical Operations

Master's thesis submitted for the academic degree of Master of Science in the degree program Medieninformatik (Media Informatics) of the Faculty of Computer Science, Mathematics and Natural Sciences at the Hochschule für Technik, Wirtschaft und Kultur (HTWK) Leipzig

Submitted by Martin Finke
Leipzig, 18 November 2015

First examiner: Prof. Dr. rer. nat. Johannes Waldmann
Second examiner: Alexander Bau

Statutory Declaration

I hereby declare that I have written this master's thesis independently, without help from third parties and without using any sources or aids other than those indicated. All passages taken verbatim or in substance from the sources used are individually marked as such. This thesis has not been submitted to any other examination authority and has not been published. I am aware that a false declaration will have legal consequences.

Leipzig, 18 November 2015
Martin Finke

Abstract

In constraint programming, problems containing both Boolean and arithmetical constraints are common in areas such as scheduling and software verification. To solve them using a SAT solver, the arithmetical constraints must be encoded as propositional formulas in conjunctive normal form (CNF). One way is to Tseitin-transform logic circuits, adding an extra variable for each gate. This thesis presents two CNF encoding methods that add only a small number of extra variables. One replaces sub-formulas by extra variables (similar to the Tseitin transform); the other is syntax-independent and does not constrain the semantics of the extra variables. The methods are used to encode the operations +, ·, > and ≥ on natural numbers with up to 7 bits. The resulting CNFs are up to 86% smaller than a Tseitin encoding and up to 84% smaller than a minimal CNF without extra variables. However, using them to encode an example problem results in an increased SAT solver time.
Table of Contents

Abstract
1 Introduction
2 Background
  2.1 Boolean Satisfiability (SAT)
    2.1.1 Formula Syntax
    2.1.2 Formula Semantics
  2.2 Satisfiability Modulo Theories (SMT)
  2.3 Bit Blasting
    2.3.1 Addition and Multiplication Circuits
    2.3.2 Handling Integer Overflow
    2.3.3 Binary Relations
3 Requirements
  3.1 Functionality
    3.1.1 Encoding Arithmetical Operations
    3.1.2 Minimizing the Size of Formulas
    3.1.3 Integrating Results for Evaluation
  3.2 Runtime Performance
  3.3 Quality Measures
4 Existing CNF Encoding Methods
  4.1 Conversion to the Equivalent Canonical CNF
    4.1.1 Specification
    4.1.2 Implementation
  4.2 Conversion to an Equisatisfiable CNF
    4.2.1 Specification
    4.2.2 Implementation: The Tseitin Transform
    4.2.3 Improvements on the Tseitin Transform
  4.3 CNF Minimization
    4.3.1 Specification
    4.3.2 Implementation
      4.3.2.1 Finding the Set of Primes
      4.3.2.2 Finding a Minimum Subset of Primes
    4.3.3 Heuristic Methods
  4.4 Summary
5 Syntactic Method
  5.1 Specification
  5.2 Implementation
    5.2.1 Selecting Sub-Formulas
    5.2.2 Replacing Sub-Formulas
    5.2.3 Conversion to a CNF and Minimization
  5.3 Discussion
6 Semantic Method
  6.1 Specification
  6.2 Implementation
  6.3 Removing Redundant Clauses
    6.3.1 Illegal Clauses
    6.3.2 Non-minimal Clauses
    6.3.3 Results
  6.4 Complexity
  6.5 Heuristic Methods
7 Results
  7.1 Formula Size
  7.2 SAT Solver Time
8 Conclusions and Future Work
Software Overview
  List of Modules
  Installation
  Tests
  Usage Examples
Glossary
References

1 Introduction

Constraint programming is a type of declarative programming where a problem is expressed as a constraint program, which is a formula in predicate or propositional logic. The Boolean Satisfiability Problem (SAT) is the problem of finding an assignment to the variables in a Boolean formula such that the formula is satisfied. A SAT solver is a program that solves this problem for a given formula. Many modern SAT solvers require the formula to be in conjunctive normal form (CNF) [footnote 1].

In the last two decades, SAT solver performance has increased significantly [3, 4]. To use a SAT solver on a problem with arithmetical constraints, the constraints have to be encoded in propositional logic. This thesis examines different methods to translate logic circuits for +, ·, > and ≥ to a CNF. Since SAT solver time depends on formula size, the aim is to find CNFs with a minimum number of literals. To this end, extra variables may be introduced, as long as the CNF is equisatisfiable [footnote 2] to the circuit.

Converting a formula to a CNF without adding extra variables incurs exponential growth in the worst case.

Footnote 1: This includes all solvers based on the Davis–Putnam–Logemann–Loveland (DPLL) algorithm [1, 2].
Footnote 2: Equisatisfiability is formally introduced in Def. 4.9.

Example 1.1 (CNF Without Extra Variables):
Consider the formula F = v_0 ⊕ v_1 ⊕ v_2 ⊕ v_3. The ⊕ operator is the exclusive or (XOR). A minimal equivalent CNF is:
(¬v_0 ∨ ¬v_1 ∨ ¬v_2 ∨ ¬v_3) ∧ (¬v_0 ∨ ¬v_1 ∨ v_2 ∨ v_3) ∧
(¬v_0 ∨ v_1 ∨ v_2 ∨ ¬v_3) ∧ (v_0 ∨ ¬v_1 ∨ ¬v_2 ∨ v_3) ∧
(v_0 ∨ v_1 ∨ ¬v_2 ∨ ¬v_3) ∧ (¬v_0 ∨ v_1 ∨ ¬v_2 ∨ v_3) ∧
(v_0 ∨ ¬v_1 ∨ v_2 ∨ ¬v_3) ∧ (v_0 ∨ v_1 ∨ v_2 ∨ v_3).
It has 8 clauses and 32 literals [footnote 3].

The Tseitin transform (described in Subsec. 4.2.2) introduces an extra variable for every binary operator, so the size of the result grows only linearly.

Example 1.2 (CNF from the Tseitin Transform):
Converting the formula F from Ex.
1.1 to an equisatisfiable CNF using the Tseitin transform introduces 3 extra variables (v_4, v_5, v_6) and results in the CNF:
v_6 ∧
(v_2 ∨ v_3 ∨ ¬v_4) ∧ (v_2 ∨ ¬v_3 ∨ v_4) ∧ (¬v_2 ∨ v_3 ∨ v_4) ∧ (¬v_2 ∨ ¬v_3 ∨ ¬v_4) ∧
(v_1 ∨ v_4 ∨ ¬v_5) ∧ (v_1 ∨ ¬v_4 ∨ v_5) ∧ (¬v_1 ∨ v_4 ∨ v_5) ∧ (¬v_1 ∨ ¬v_4 ∨ ¬v_5) ∧
(v_0 ∨ v_5 ∨ ¬v_6) ∧ (v_0 ∨ ¬v_5 ∨ v_6) ∧ (¬v_0 ∨ v_5 ∨ v_6) ∧ (¬v_0 ∨ ¬v_5 ∨ ¬v_6).
It has 13 clauses and 37 literals.

Because F is small, the Tseitin CNF (linear growth) is larger here than the CNF without extra variables (exponential growth). Adding a small number of extra variables can result in a smaller CNF than either adding no extra variables or performing the Tseitin transform. The following example shows this.

Example 1.3 (CNF with One Extra Variable):
Introducing one extra variable t with the semantics t ⇔ (v_0 ⊕ v_3) results in the CNF:
(¬v_1 ∨ ¬v_2 ∨ t) ∧ (¬v_1 ∨ v_2 ∨ ¬t) ∧ (¬v_0 ∨ ¬v_3 ∨ ¬t) ∧ (¬v_0 ∨ v_3 ∨ t) ∧
(v_0 ∨ ¬v_3 ∨ t) ∧ (v_0 ∨ v_3 ∨ ¬t) ∧ (v_1 ∨ ¬v_2 ∨ ¬t) ∧ (v_1 ∨ v_2 ∨ t).
It has 8 clauses and 24 literals.

Footnote 3 (to Ex. 1.1): More generally, for n > 0, a minimal equivalent CNF encoding of an XOR over n variables has 2^(n−1) clauses, and each clause contains n literals. The intuition is the checkerboard pattern on the Karnaugh map: each zero needs its own clause, which covers that zero and nothing else.

The research questions are: Can the size of CNF encodings for arithmetical operations be reduced by introducing extra variables, compared to using no extra variables or using the Tseitin transform? And if the encodings are smaller, does SAT solver time decrease when they are used?

This thesis presents two algorithms to find small CNF encodings with a given number of extra variables. One replaces selected sub-formulas with extra variables; the other is based on the Boolean function realized by the formula. If the smaller CNFs indeed decrease SAT solver times, they can be hardcoded and reused in constraint solvers such as Satchmo [5].

Ch. 2 formally introduces propositional logic and the Boolean Satisfiability Problem (SAT).
Furthermore, it describes how arithmetical operations can be encoded as propositional formulas to be used in a SAT encoding.

Ch. 3 specifies the requirements for the developed software and how the quality of the resulting formulas is evaluated.

Ch. 4 shows two existing methods for converting propositional formulas to conjunctive normal form. One (Sec. 4.1) is based on the truth table realized by a formula (similar to Ex. 1.1); the other (Sec. 4.2) is the Tseitin transform (as in Ex. 1.2). Furthermore, CNF minimization is covered.

The following two chapters show two methods to find a small CNF for an input formula F and a number k of allowed extra variables. The syntactic method (Ch. 5) replaces k sub-formulas of F by extra variables, choosing the set of sub-formulas that results in the smallest CNF. The semantic method (Ch. 6) expresses the problem as a series of constraint satisfaction problems and uses a SAT solver to find a solution. Its output is a minimal CNF independent of the syntax of F.

In Ch. 7, the results are evaluated with regard to the requirements from Ch. 3. The formula sizes are compared, and the SAT solver time for a termination problem is measured under different encodings.

Ch. 8 concludes the text by pointing out possible improvements and areas for future work.

2 Background

2.1 Boolean Satisfiability (SAT)

This section introduces the syntax and semantics of propositional logic used in this thesis, and the Boolean Satisfiability Problem (SAT).

2.1.1 Formula Syntax

The fundamental components in propositional logic are Booleans and variables.

Definition 2.1 (Set of Booleans):
The set of Booleans is B = {True, False}.

Definition 2.2 (Set of Variables):
The set of variables is denoted as V = {v_0, v_1, v_2, ...}.

These components can be combined using logical operators.

Definition 2.3 (Logical Operators):
The set of operators is {¬, ∧, ∨, ⊕, →, ⇔}.
They have the following meanings:
• ¬ is the logical complement (negation, not)
• ∧ is the logical conjunction (and)
• ∨ is the logical disjunction (or)
• ⊕ is the exclusive or (xor)
• → is the logical implication
• ⇔ is the logical equivalence

The complement (¬) is unary. The logical implication (→) is binary: the left side is the premise, the right side is the conclusion. The other operators (∧, ∨, ⊕ and ⇔) are applied to a finite set of operands (which may be empty).

In this text, a formula is a formula in propositional logic. Formulas are created by combining Booleans, variables and operators as follows.

Definition 2.4 (Formula):
A formula is an expression that has one of the following forms:
• v, where v ∈ V
• b, where b ∈ B
• ¬F, where F is a formula
• F_prem → F_conc, where F_prem and F_conc are formulas
• ∧Fs or ∨Fs or ⊕Fs or ⇔Fs, where Fs is a finite set of formulas.
For n ≥ 2, a formula ∧{F_1, ..., F_n} can be written as F_1 ∧ ... ∧ F_n. The same applies analogously to the operators ∨, ⊕ and ⇔. The set of formulas is denoted as F.

Example 2.5 (Formula):
The expression v_1 ∨ ¬¬(¬False ⊕ v_4) is a formula. (3+5) ∨ v_0 is not a formula. Neither is (∀v ∈ V : ¬v).

Formulas can be parenthesized to avoid ambiguity. To ease reading, the formulas ∧{F}, ∨{F} and ⊕{F} can be simplified to F.

For a given formula, the set of all variables in it can be determined as follows.

Definition 2.6 (Variable Set of a Formula):
For a formula F, the variable set V_F is defined as:
V_F = ∅,                        if F ∈ B
V_F = {F},                      if F ∈ V
V_F = V_F′,                     if F has the form ¬F′
V_F = V_F_prem ∪ V_F_conc,      if F has the form F_prem → F_conc
V_F = ∪{V_F′ | F′ ∈ Fs},        if F has the form ∧Fs, ∨Fs, ⊕Fs, or ⇔Fs
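As an illustration of Def. 2.4 and Def. 2.6, the following Python sketch models formulas as an immutable syntax tree and computes the variable set by structural recursion. The class names (Var, Const, Not, Implies, NAry) and the function variable_set are hypothetical names chosen for this sketch; they are not taken from the software developed in this thesis.

```python
from dataclasses import dataclass
from typing import FrozenSet, Union


@dataclass(frozen=True)
class Var:
    index: int  # Var(i) stands for the variable v_i


@dataclass(frozen=True)
class Const:
    value: bool  # True or False


@dataclass(frozen=True)
class Not:
    arg: "Formula"


@dataclass(frozen=True)
class Implies:
    prem: "Formula"  # premise (left side)
    conc: "Formula"  # conclusion (right side)


@dataclass(frozen=True)
class NAry:
    op: str  # assumed encoding of the operator: "and", "or", "xor" or "iff"
    args: FrozenSet["Formula"]  # finite set of operands, may be empty


Formula = Union[Var, Const, Not, Implies, NAry]


def variable_set(f: Formula) -> FrozenSet[Var]:
    """Variable set V_F, one branch per case of Def. 2.6."""
    if isinstance(f, Const):
        return frozenset()
    if isinstance(f, Var):
        return frozenset({f})
    if isinstance(f, Not):
        return variable_set(f.arg)
    if isinstance(f, Implies):
        return variable_set(f.prem) | variable_set(f.conc)
    if isinstance(f, NAry):
        # Union over all operands; yields the empty set for an empty operand set.
        return frozenset().union(*(variable_set(g) for g in f.args))
    raise TypeError(f"not a formula: {f!r}")


# The formula from Ex. 2.5: v_1 ∨ ¬¬(¬False ⊕ v_4)
example = NAry("or", frozenset({
    Var(1),
    Not(Not(NAry("xor", frozenset({Not(Const(False)), Var(4)})))),
}))

print(sorted(v.index for v in variable_set(example)))  # -> [1, 4]
```

Frozen dataclasses are used here so that formulas are hashable and can be stored in a frozenset, which mirrors the set-valued operands of Def. 2.4. Expressions such as (3+5) ∨ v_0 from Ex. 2.5 cannot be built from these constructors in a meaningful way, and variable_set rejects anything that is not one of the five forms.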