Tree Automata Techniques and Applications

Hubert Comon, Max Dauchet, Rémi Gilleron, Florent Jacquemard, Denis Lugiez, Christof Löding, Sophie Tison, Marc Tommasi

TATA, November 18, 2008

Contents

Errata  9
Introduction  11
Preliminaries  15

1 Recognizable Tree Languages and Finite Tree Automata  19
  1.1 Finite Tree Automata  20
  1.2 The Pumping Lemma for Recognizable Tree Languages  28
  1.3 Closure Properties of Recognizable Tree Languages  29
  1.4 Tree Homomorphisms  31
  1.5 Minimizing Tree Automata  35
  1.6 Top Down Tree Automata  38
  1.7 Decision Problems and their Complexity  39
  1.8 Exercises  43
  1.9 Bibliographic Notes  47

2 Regular Grammars and Regular Expressions  51
  2.1 Tree Grammar  51
    2.1.1 Definitions  51
    2.1.2 Regularity and Recognizability  54
  2.2 Regular Expressions. Kleene's Theorem for Tree Languages  54
    2.2.1 Substitution and Iteration  55
    2.2.2 Regular Expressions and Regular Tree Languages  57
  2.3 Regular Equations  61
  2.4 Context-free Word Languages and Regular Tree Languages  63
  2.5 Beyond Regular Tree Languages: Context-free Tree Languages  66
    2.5.1 Context-free Tree Languages  66
    2.5.2 IO and OI Tree Grammars  67
  2.6 Exercises  68
  2.7 Bibliographic notes  71

3 Logic, Automata and Relations  73
  3.1 Introduction  73
  3.2 Automata on Tuples of Finite Trees  75
    3.2.1 Three Notions of Recognizability  75
    3.2.2 Examples of The Three Notions of Recognizability  77
    3.2.3 Comparisons Between the Three Classes  78
    3.2.4 Closure Properties for Rec_× and Rec; Cylindrification and Projection  80
    3.2.5 Closure of GTT by Composition and Iteration  82
  3.3 The Logic WSkS  87
    3.3.1 Syntax  87
    3.3.2 Semantics  88
    3.3.3 Examples  88
    3.3.4 Restricting the Syntax  89
    3.3.5 Definable Sets are Recognizable Sets  90
    3.3.6 Recognizable Sets are Definable  94
    3.3.7 Complexity Issues  95
    3.3.8 Extensions  95
  3.4 Examples of Applications  96
    3.4.1 Terms and Sorts  96
    3.4.2 The Encompassment Theory for Linear Terms  97
    3.4.3 The First-order Theory of a Reduction Relation: the Case Where no Variables are Shared  99
    3.4.4 Reduction Strategies  100
    3.4.5 Application to Rigid E-unification  102
    3.4.6 Application to Higher-order Matching  103
  3.5 Exercises  106
  3.6 Bibliographic Notes  109
    3.6.1 GTT  109
    3.6.2 Automata and Logic  109
    3.6.3 Surveys  109
    3.6.4 Applications of tree automata to constraint solving  109
    3.6.5 Application of tree automata to semantic unification  110
    3.6.6 Application of tree automata to decision problems in term rewriting  110
    3.6.7 Other applications  111

4 Automata with Constraints  113
  4.1 Introduction  113
  4.2 Automata with Equality and Disequality Constraints  114
    4.2.1 The Most General Class  114
    4.2.2 Reducing Non-determinism and Closure Properties  117
    4.2.3 Decision Problems  120
  4.3 Automata with Constraints Between Brothers  122
    4.3.1 Definition  122
    4.3.2 Closure Properties  122
    4.3.3 Emptiness Decision  122
    4.3.4 Applications  125
  4.4 Reduction Automata  125
    4.4.1 Definition  126
    4.4.2 Closure Properties  126
    4.4.3 Emptiness Decision  127
    4.4.4 Finiteness Decision  132
    4.4.5 Term Rewriting Systems  132
    4.4.6 Application to the Reducibility Theory  133
  4.5 Other Decidable Subclasses  133
  4.6 Exercises  133
  4.7 Bibliographic notes  135

5 Tree Set Automata  137
  5.1 Introduction  137
  5.2 Definitions and Examples  142
    5.2.1 Generalized Tree Sets  142
    5.2.2 Tree Set Automata  142
    5.2.3 Hierarchy of GTSA-recognizable Languages  145
    5.2.4 Regular Generalized Tree Sets, Regular Runs  146
  5.3 Closure and Decision Properties  149
    5.3.1 Closure properties  149
    5.3.2 Emptiness Property  152
    5.3.3 Other Decision Results  154
  5.4 Applications to Set Constraints  155
    5.4.1 Definitions  155
    5.4.2 Set Constraints and Automata  155
    5.4.3 Decidability Results for Set Constraints  156
  5.5 Bibliographical Notes  158

6 Tree Transducers  161
  6.1 Introduction  161
  6.2 The Word Case  162
    6.2.1 Introduction to Rational Transducers  162
    6.2.2 The Homomorphic Approach  166
  6.3 Introduction to Tree Transducers  167
  6.4 Properties of Tree Transducers  171
    6.4.1 Bottom-up Tree Transducers  171
    6.4.2 Top-down Tree Transducers  174
    6.4.3 Structural Properties  176
    6.4.4 Complexity Properties  177
  6.5 Homomorphisms and Tree Transducers  177
  6.6 Exercises  178
  6.7 Bibliographic notes  181

7 Alternating Tree Automata  183
  7.1 Introduction  183
  7.2 Definitions and Examples  183
    7.2.1 Alternating Word Automata  183
    7.2.2 Alternating Tree Automata  185
    7.2.3 Tree Automata versus Alternating Word Automata  187
  7.3 Closure Properties  187
  7.4 From Alternating to Deterministic Automata  188
  7.5 Decision Problems and Complexity Issues  189
  7.6 Horn Logic, Set Constraints and Alternating Automata  189
    7.6.1 The Clausal Formalism  189
    7.6.2 The Set Constraints Formalism  190
    7.6.3 Two Way Alternating Tree Automata  191
    7.6.4 Two Way Automata and Definite Set Constraints  193
    7.6.5 Two Way Automata and Pushdown Automata  195
  7.7 An (other) example of application  195
  7.8 Exercises  195
  7.9 Bibliographic Notes  196

8 Automata for Unranked Trees  199
  8.1 Introduction  199
  8.2 Definitions and Examples  200
    8.2.1 Unranked Trees and Hedges  200
    8.2.2 Hedge Automata  201
    8.2.3 Deterministic Automata  205
  8.3 Encodings and Closure Properties  206
    8.3.1 First-Child-Next-Sibling Encoding  207
    8.3.2 Extension Operator  210
    8.3.3 Closure Properties  211
  8.4 Weak Monadic Second Order Logic  212
  8.5 Decision Problems and Complexity  214
    8.5.1 Representations of Horizontal Languages  214
    8.5.2 Determinism and Completeness  216
    8.5.3 Membership  216
    8.5.4 Emptiness  219
    8.5.5 Inclusion  220
  8.6 Minimization  221
    8.6.1 Minimizing the Number of States  221
    8.6.2 Problems for Minimizing the Whole Representation  223
    8.6.3 Stepwise automata  224
  8.7 XML Schema Languages  229
    8.7.1 Document Type Definition (DTD)  232
    8.7.2 XML Schema  236
    8.7.3 Relax NG  239
  8.8 Exercises  239
  8.9 Bibliographic Notes  242

Bibliography  245
Index  259

Acknowledgments

Many people gave substantial suggestions to improve the contents of this book. These are, in alphabetic order, Witold Charatonik, Zoltan Fülöp, Werner Kuich, Markus Lohrey, Jun Matsuda, Aart Middeldorp, Hitoshi Ohsaki, P. K. Manivannan, Masahiko Sakai, Helmut Seidl, Stephan Tobies, Ralf Treinen, Thomas Uribe, Sandor Vágvölgyi, Kumar Neeraj Verma, Toshiyuki Yamada.

Errata

This is a list of some errors in the current release of TATA. For the next release these errors will be corrected.
The list can also be found separately on http://tata.gforge.inria.fr/.

Chapter 8

1. Page 218: In the proof of Theorem 8.5.9(1) it is stated that DFHA(DFA) can easily be completed. This is not true. To complete a DFHA(DFA) one has to construct the complement of the union of several horizontal languages given by DFAs: Assume that there are several transitions $a(R_1) \to q_1, \ldots, a(R_n) \to q_n$ for letter $a$. To complete the automaton one has to add a transition $a(R) \to q_\bot$ with $R$ the complement of $R_1 \cup \cdots \cup R_n$ and $q_\bot$ a rejecting sink state. Building an automaton for $R$ can be expensive.

   The proof of (2) is not affected because the construction of a DFHA(DFA) from an NFHA(NFA) yields a complete DFHA(DFA), and for complete DFHA(DFA) Theorem 8.5.9(1) holds.

   The statement in Theorem 8.5.9(1) remains true if the DFHA(DFA) is normalized, i.e., for each $a$ and $q$ there is at most one transition $a(R) \to q$. In this case the inclusion problem can be reduced to the inclusion problem for unambiguous ranked tree automata; see Theorem 5 of W. Martens and J. Niehren. On the Minimization of XML Schemas and Tree Automata for Unranked Trees. Full version of DBPL 2005 paper. Journal of Computer and System Sciences (JCSS), 73(4), pp. 550-583, 2007.

2. Page 223, Figure 8.8: A transition with label $q_{bb}$ is missing from state $C_0$ to state $C_2$.

3. Page 225, Figure 8.10, left-hand side: A transition with label BB is missing from state $C_0$ to state BB.
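The completion step described in erratum 1 can be illustrated with a small Python sketch. It is not the book's construction: the `DFA` class, the function `complement_of_union`, and the two example horizontal languages are assumptions introduced here for illustration. The sketch builds a DFA for $R$, the complement of $R_1 \cup \cdots \cup R_n$, by exploring the reachable part of the product of the component DFAs, which shows why the result can grow with the product of the component sizes.

```python
from dataclasses import dataclass


@dataclass
class DFA:
    """A possibly partial DFA; missing transitions lead to an implicit dead state (None)."""
    alphabet: frozenset
    start: object
    accepting: frozenset
    delta: dict  # maps (state, symbol) to the next state

    def step(self, state, symbol):
        # None stands for the implicit dead state and is never left again.
        if state is None:
            return None
        return self.delta.get((state, symbol))

    def accepts(self, word):
        state = self.start
        for symbol in word:
            state = self.step(state, symbol)
        return state in self.accepting


def complement_of_union(dfas):
    """Return a complete DFA for the complement of the union of the given DFA languages.

    Product states are tuples with one component per input DFA. A tuple is
    accepting exactly when no component is accepting.
    """
    alphabet = dfas[0].alphabet
    start = tuple(d.start for d in dfas)
    delta, accepting = {}, set()
    seen, todo = {start}, [start]
    while todo:
        current = todo.pop()
        if not any(s in d.accepting for s, d in zip(current, dfas)):
            accepting.add(current)
        for symbol in alphabet:
            nxt = tuple(d.step(s, symbol) for s, d in zip(current, dfas))
            delta[(current, symbol)] = nxt
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return DFA(alphabet, start, frozenset(accepting), delta)


# Hypothetical horizontal languages over the state alphabet {q1, q2}:
# R1 = q1* (used by a(R1) -> q1) and R2 = q2 q2* (used by a(R2) -> q2).
Q = frozenset({"q1", "q2"})
r1 = DFA(Q, 0, frozenset({0}), {(0, "q1"): 0})
r2 = DFA(Q, 0, frozenset({1}), {(0, "q2"): 1, (1, "q2"): 1})

# R is the language for the added transition a(R) -> q_bot.
r_comp = complement_of_union([r1, r2])
print(r_comp.accepts(["q1", "q2"]))          # a word in neither R1 nor R2
print(len({s for (s, _) in r_comp.delta}))   # number of reachable product states
```

In this toy case the product stays small, but with $n$ component DFAs the number of reachable tuples is bounded only by the product of the component sizes (each extended by the dead state), which is the cost the erratum warns about.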