Construction of rational expression from tree automata using a generalization of Arden's Lemma

Younes Guellouma (1,3), Ludovic Mignot (2), Hadda Cherroun (1,3), and Djelloul Ziadi (2,3)

(1) Laboratoire LIM, Université Amar Telidji, Laghouat, Algérie, {y.guellouma,hadda_cherroun}@mail.lagh-univ.dz
(2) LITIS, Université de Rouen, 76801 Saint-Étienne du Rouvray Cedex, France, {ludovic.mignot,djelloul.ziadi}@univ-rouen.fr
(3) Supported by the MESRS - Algeria under Project 8/U03/7015.

Abstract. Arden's Lemma is a classical result in language theory allowing the computation of a rational expression denoting the language recognized by a finite string automaton. In this paper we generalize this important lemma to the rational tree languages. Moreover, we also propose a construction of a rational tree expression which denotes the accepted tree language of a finite tree automaton.

Keywords: Tree automata theory, Arden's lemma, Rational expression.

1 Introduction

Trees are natural structures used in many fields of computer science such as XML [15], indexing, natural language processing, code generation for compilers, term rewriting [6], cryptography [7], etc. This wide use of the structure leads us to consider the theoretical foundations of this notion.

In fact, in many cases, the blow-up of trees causes difficulties of storage and representation of this large amount of data. To overcome this problem, many solutions exist. Among them is the use of tree automata and rational tree expressions as compact and finite structures that recognize and represent infinite tree sets.

As a part of formal language theory, trees are considered as a generalization of strings. Indeed, in the late 1960s [3,10], many researchers generalized strings to trees, and many notions appeared, such as tree languages, tree automata, rational tree expressions, tree grammars, etc.

Since tree automata are useful from an acceptance point of view and rational expressions from a
descriptive one, an equivalence between the two representations must be established. Fortunately, Kleene's result [14] states this equivalence between the language accepted by tree automata and the language denoted by rational expressions.

Kleene's theorem proves that the set of languages denoted by all rational expressions over the ranked alphabet $\Sigma$, noted $\mathrm{Rat}(\Sigma)$, and the set of all recognized languages over $\Sigma$, noted $\mathrm{Rec}(\Sigma)$, are equivalent. This can be checked by verifying the two inclusions $\mathrm{Rat}(\Sigma) \subseteq \mathrm{Rec}(\Sigma)$ and $\mathrm{Rec}(\Sigma) \subseteq \mathrm{Rat}(\Sigma')$ where $\Sigma \subseteq \Sigma'$. In other words, any tree language is recognized by some automaton if and only if it is denoted by some rational expression. Thus two constructions can be considered.

From a rational expression to tree automata, several techniques exist. First, Kuske and Meinecke [8] generalize the notion of language partial derivation [1] from strings to trees and propose a tree equation automaton which is constructed from a derivation of a linearized version of rational expressions. They use the ZPC structure [4] to reach the best complexity. After that, Mignot et al. [11] propose an efficient algorithm to compute this generalized tree equation automaton. Next, Laugerotte et al. [9] generalize position automata to trees. Finally, the morphic links between these constructions have been defined in [12].

In this paper, we propose a construction for the second direction of Kleene's theorem, the passage from a tree automaton to its rational tree expression. For this purpose we propose a generalization of Arden's Lemma from strings to trees. The complexity of such a construction is exponential.

Section 2 recalls some preliminaries and basic properties. We generalize the notion of equation system in Section 3. Next, the generalization of Arden's lemma to trees and its proof are given in Section 4, leading to the computation of some solutions for particular recursive systems.
Finally, we show how to compute a rational expression denoting the language recognized by a tree automaton in Section 5.

2 Preliminaries and Basic Properties

Let $\Sigma = \bigcup_{n\geq 0} \Sigma_n$ be a graded alphabet. A tree $t$ over $\Sigma$ is inductively defined by $t = f(t_1,\ldots,t_n)$ with $f \in \Sigma_n$ and $t_1,\ldots,t_n$ any $n$ trees over $\Sigma$. We denote by $T(\Sigma)$ the set of trees over $\Sigma$. A tree language is a subset of $T(\Sigma)$. The subtrees set $\mathrm{St}(t)$ of a tree $t = f(t_1,\ldots,t_n)$ is defined by $\mathrm{St}(t) = \{t\} \cup \bigcup_{k=1}^{n} \mathrm{St}(t_k)$. This set is extended to tree languages, and the subtrees set $\mathrm{St}(L)$ of a tree language $L \subset T(\Sigma)$ is $\mathrm{St}(L) = \bigcup_{t\in L} \mathrm{St}(t)$. The height of a tree $t$ in $T(\Sigma)$ is defined inductively by $\mathrm{Height}(f(t_1,\ldots,t_n)) = 1 + \max\{\mathrm{Height}(t_i) \mid 1 \leq i \leq n\}$ where $f$ is a symbol in $\Sigma_n$ and $t_1,\ldots,t_n$ are any $n$ trees over $\Sigma$.

A finite tree automaton (FTA) over $\Sigma$ is a 4-tuple $A = (\Sigma, Q, Q_f, \Delta)$ where $Q$ is a finite set of states, $Q_f \subset Q$ is the set of final states and $\Delta \subset \bigcup_{n\geq 0} \Sigma_n \times Q^{n+1}$ is a finite set of transitions. The output of $A$, noted $\delta$, is a function from $T(\Sigma)$ to $2^Q$ inductively defined for any tree $t = f(t_1,\ldots,t_n)$ by
\[
\delta(t) = \{q \in Q \mid \exists (f,q_1,\ldots,q_n,q) \in \Delta,\ (\forall 1 \leq i \leq n,\ q_i \in \delta(t_i))\}.
\]
The accepted language of $A$ is $L(A) = \{t \in T(\Sigma) \mid \delta(t) \cap Q_f \neq \emptyset\}$. The state language $L(q)$ (also known as down language [5]) of a state $q \in Q$ is defined by $L(q) = \{t \in T(\Sigma) \mid q \in \delta(t)\}$. Obviously,
\[
L(A) = \bigcup_{q \in Q_f} L(q) \tag{1}
\]
In the following of this paper, we consider accessible FTAs, that are FTAs any state $q$ of which satisfies $L(q) \neq \emptyset$. Obviously, any FTA admits an equivalent accessible FTA obtained by removing the states the down language of which is empty.
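As an illustration of the definition of the output function, the bottom-up computation of $\delta$ can be sketched in Python. This is only a sketch under stated assumptions: the encoding of trees as nested tuples, the names `delta`, `accepts`, `transitions` and `final_states`, and the sample automaton (which accepts the trees $h^{2n}(a)$) are hypothetical and do not come from the paper.

```python
# A tree is encoded (assumption of this sketch) as a nested tuple
# (symbol, child_1, ..., child_n); a leaf is the 1-tuple (symbol,).

def delta(tree, transitions):
    """Return the set of states reached on `tree`, computed bottom-up.

    `transitions` is a set of tuples (f, q_1, ..., q_n, q), as in the text.
    """
    symbol, children = tree[0], tree[1:]
    child_states = [delta(child, transitions) for child in children]
    reached = set()
    for rule in transitions:
        f, sources, target = rule[0], rule[1:-1], rule[-1]
        # The rule applies if the symbol and arity match and every source
        # state q_i is reachable on the i-th subtree.
        if f == symbol and len(sources) == len(children) and all(
            q in states for q, states in zip(sources, child_states)
        ):
            reached.add(target)
    return reached


def accepts(tree, transitions, final_states):
    """A tree is accepted if its output meets the set of final states."""
    return bool(delta(tree, transitions) & final_states)


# Hypothetical automaton: q0 is reached after an even number of h, q1 after
# an odd number.
transitions = {("a", "q0"), ("h", "q0", "q1"), ("h", "q1", "q0")}
final_states = {"q0"}

print(delta(("h", ("a",)), transitions))
print(accepts(("h", ("h", ("a",))), transitions, final_states))
```

The recursion mirrors the inductive definition: the state language $L(q)$ is exactly the set of trees `t` for which `q in delta(t, transitions)`.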
Given a symbol $c$ in $\Sigma_0$, the $c$-product is the operation $\cdot_c$ defined for any tree $t$ in $T(\Sigma)$ and for any tree language $L$ by
\[
t \cdot_c L =
\begin{cases}
L & \text{if } t = c,\\
\{d\} & \text{if } t = d \in \Sigma_0 \setminus \{c\},\\
f(t_1 \cdot_c L, \ldots, t_n \cdot_c L) & \text{otherwise, if } t = f(t_1,\ldots,t_n)
\end{cases}
\tag{2}
\]
This $c$-product is extended for any two tree languages $L$ and $L'$ by $L \cdot_c L' = \bigcup_{t \in L} t \cdot_c L'$. In the following of this paper, we use some equivalences over expressions using some properties of the $c$-product. Let us state these properties. As is the case for the catenation product in the string case, it distributes over the union:

Lemma 1. Let $L_1$, $L_2$ and $L_3$ be three tree languages over $\Sigma$. Let $c$ be a symbol in $\Sigma_0$. Then:
\[
(L_1 \cup L_2) \cdot_c L_3 = (L_1 \cdot_c L_3) \cup (L_2 \cdot_c L_3)
\]
Proof. Let $t$ be a tree in $T(\Sigma)$. Then:
\[
\begin{aligned}
t \in (L_1 \cup L_2) \cdot_c L_3
&\Leftrightarrow \exists u \in L_1 \cup L_2,\ \exists v \in L_3,\ t = u \cdot_c v\\
&\Leftrightarrow (\exists u \in L_1,\ \exists v \in L_3,\ t = u \cdot_c v) \lor (\exists u \in L_2,\ \exists v \in L_3,\ t = u \cdot_c v)\\
&\Leftrightarrow t \in (L_1 \cdot_c L_3) \cup (L_2 \cdot_c L_3)
\end{aligned}
\]
□

Another common property with the catenation product is that any operator $\cdot_c$ is associative:

Lemma 2. Let $t$ and $t'$ be any two trees in $T(\Sigma)$, let $L$ be a tree language over $\Sigma$ and let $c$ be a symbol in $\Sigma_0$. Then:
\[
t \cdot_c (t' \cdot_c L) = (t \cdot_c t') \cdot_c L
\]
Proof. By induction over the structure of $t$.
1. Consider that $t = c$. Then $t \cdot_c (t' \cdot_c L) = t' \cdot_c L = (t \cdot_c t') \cdot_c L$.
2. Consider that $t \in \Sigma_0 \setminus \{c\}$. Then $t \cdot_c (t' \cdot_c L) = \{t\} = (t \cdot_c t') \cdot_c L$.
3. Let us suppose that $t = f(t_1,\ldots,t_n)$ with $n > 0$. Then, following Equation (2):
\[
\begin{aligned}
f(t_1,\ldots,t_n) \cdot_c (t' \cdot_c L)
&= f(t_1 \cdot_c (t' \cdot_c L), \ldots, t_n \cdot_c (t' \cdot_c L))\\
&= f((t_1 \cdot_c t') \cdot_c L, \ldots, (t_n \cdot_c t') \cdot_c L) && \text{(Induction hypothesis)}\\
&= f(t_1 \cdot_c t', \ldots, t_n \cdot_c t') \cdot_c L\\
&= (f(t_1,\ldots,t_n) \cdot_c t') \cdot_c L
\end{aligned}
\]
□

Corollary 1. Let $L$, $L'$ and $L''$ be any three tree languages over a graded alphabet $\Sigma$ and let $c$ be a symbol in $\Sigma_0$. Then:
\[
L \cdot_c (L' \cdot_c L'') = (L \cdot_c L') \cdot_c L''
\]
However, associativity is not necessarily satisfied if the substitution symbols are different; as an example, $(f(a,b) \cdot_a b) \cdot_b c \neq f(a,b) \cdot_a (b \cdot_b c)$: the left-hand side is $f(c,c)$ whereas the right-hand side is $f(c,b)$.
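The $c$-product on finite tree languages can be experimented with through a small Python sketch. The tuple encoding of trees and the names `c_product_tree` and `c_product` are assumptions of this illustration, not notation from the paper; the sketch only handles finite languages. It replays the non-associativity example above and checks one instance of Lemma 1.

```python
from itertools import product

# Trees are nested tuples (symbol, child_1, ..., child_n); a leaf is (symbol,).

def c_product_tree(tree, sym, language):
    """Compute t ._sym L for a single tree t, following Equation (2)."""
    symbol, children = tree[0], tree[1:]
    if not children:
        # Leaf: the substituted symbol is replaced by the whole language,
        # any other leaf is kept unchanged.
        return set(language) if symbol == sym else {tree}
    # Each child yields a set of trees; occurrences of `sym` in different
    # children may be replaced independently.
    options = [c_product_tree(child, sym, language) for child in children]
    return {(symbol, *choice) for choice in product(*options)}


def c_product(left, sym, right):
    """Extension to languages: union of t ._sym right over all t in left."""
    result = set()
    for tree in left:
        result |= c_product_tree(tree, sym, right)
    return result


leaf_a, leaf_b, leaf_c = ("a",), ("b",), ("c",)
f_ab = ("f", leaf_a, leaf_b)

# (f(a,b) ._a b) ._b c versus f(a,b) ._a (b ._b c)
lhs = c_product(c_product({f_ab}, "a", {leaf_b}), "b", {leaf_c})
rhs = c_product({f_ab}, "a", c_product({leaf_b}, "b", {leaf_c}))
print(lhs, rhs)

# One instance of Lemma 1 (distributivity over union).
L1, L2, L3 = {f_ab}, {leaf_a}, {leaf_b, leaf_c}
dist_left = c_product(L1 | L2, "a", L3)
dist_right = c_product(L1, "a", L3) | c_product(L2, "a", L3)
print(dist_left == dist_right)
```

Representing $t \cdot_c L$ as a set makes the case $f(t_1 \cdot_c L, \ldots, t_n \cdot_c L)$ a Cartesian product of the children's results, which is why an empty right operand yields the empty language as soon as $c$ occurs in the tree.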
Finally, the last common property is that the operation $\cdot_c$ is compatible with the inclusion:

Lemma 3. Let $t$ be a tree over $\Sigma$, and let $L \subset L'$ be two tree languages over $\Sigma$. Then:
\[
t \cdot_c L \subset t \cdot_c L'
\]
Proof. By induction over the structure of $t$.
1. Consider that $t = c$. Then $c \cdot_c L = L \subset L' = c \cdot_c L'$.
2. Consider that $t \in \Sigma_0 \setminus \{c\}$. Then $t \cdot_c L = \{t\} = t \cdot_c L'$.
3. Let us suppose that $t = f(t_1,\ldots,t_n)$. Then $f(t_1,\ldots,t_n) \cdot_c L = f(t_1 \cdot_c L, \ldots, t_n \cdot_c L)$. By induction hypothesis, $\forall 1 \leq j \leq n$, $t_j \cdot_c L \subset t_j \cdot_c L'$. Therefore, $f(t_1 \cdot_c L, \ldots, t_n \cdot_c L) \subset f(t_1 \cdot_c L', \ldots, t_n \cdot_c L') = t \cdot_c L'$.
□

Corollary 2. Let $L$, $L' \subset L''$ be any three tree languages over $\Sigma$ and let $c$ be a symbol in $\Sigma_0$. Then:
\[
L \cdot_c L' \subset L \cdot_c L''
\]
The first property not shared with the classical catenation product is that the $c$-product may distribute over other products:

Lemma 4. Let $t_1$, $t_2$ and $t_3$ be any three trees in $T(\Sigma)$. Let $a$ and $b$ be two distinct symbols in $\Sigma_0$ such that $a$ does not appear in $t_3$. Then:
\[
(t_1 \cdot_a t_2) \cdot_b t_3 = (t_1 \cdot_b t_3) \cdot_a (t_2 \cdot_b t_3)
\]
Proof. By induction over $t_1$.
1. If $t_1 = a$, then $(t_1 \cdot_a t_2) \cdot_b t_3 = t_2 \cdot_b t_3 = (t_1 \cdot_b t_3) \cdot_a (t_2 \cdot_b t_3)$.
2. If $t_1 = b$, then $(t_1 \cdot_a t_2) \cdot_b t_3 = t_3 = (t_1 \cdot_b t_3) \cdot_a (t_2 \cdot_b t_3)$.
3. If $t_1 = c \in \Sigma_0 \setminus \{a,b\}$, then $(t_1 \cdot_a t_2) \cdot_b t_3 = t_1 = (t_1 \cdot_b t_3) \cdot_a (t_2 \cdot_b t_3)$.
4. If $t_1 = f(u_1,\ldots,u_n)$ with $n > 0$, then, following Equation (2):
\[
\begin{aligned}
(t_1 \cdot_a t_2) \cdot_b t_3
&= (f(u_1 \cdot_a t_2, \ldots, u_n \cdot_a t_2)) \cdot_b t_3\\
&= f((u_1 \cdot_a t_2) \cdot_b t_3, \ldots, (u_n \cdot_a t_2) \cdot_b t_3)\\
&= f((u_1 \cdot_b t_3) \cdot_a (t_2 \cdot_b t_3), \ldots, (u_n \cdot_b t_3) \cdot_a (t_2 \cdot_b t_3)) && \text{(Induction hypothesis)}\\
&= f(u_1 \cdot_b t_3, \ldots, u_n \cdot_b t_3) \cdot_a (t_2 \cdot_b t_3)\\
&= (f(u_1,\ldots,u_n) \cdot_b t_3) \cdot_a (t_2 \cdot_b t_3)
\end{aligned}
\]
□

Corollary 3. Let $L_1$, $L_2$ and $L_3$ be any three tree languages over $\Sigma$. Let $a$ and $b$ be two distinct symbols in $\Sigma_0$ such that $L_3 \subset T(\Sigma \setminus \{a\})$.
Then:
\[
(L_1 \cdot_a L_2) \cdot_b L_3 = (L_1 \cdot_b L_3) \cdot_a (L_2 \cdot_b L_3)
\]
In some particular cases, two products commute:

Lemma 5. Let $t_1$, $t_2$ and $t_3$ be any three trees in $T(\Sigma)$. Let $a$ and $b$ be two distinct symbols in $\Sigma_0$ such that $a$ does not appear in $t_3$ and such that $b$ does not appear in $t_2$. Then:
\[
(t_1 \cdot_a t_2) \cdot_b t_3 = (t_1 \cdot_b t_3) \cdot_a t_2
\]
Proof. By induction over $t_1$.
1. If $t_1 = a$, then
\[
(t_1 \cdot_a t_2) \cdot_b t_3 = (a \cdot_a t_2) \cdot_b t_3 = t_2 \cdot_b t_3 = t_2 = a \cdot_a t_2 = (a \cdot_b t_3) \cdot_a t_2 = (t_1 \cdot_b t_3) \cdot_a t_2
\]
2. If $t_1 = b$, then
\[
(t_1 \cdot_a t_2) \cdot_b t_3 = (b \cdot_a t_2) \cdot_b t_3 = b \cdot_b t_3 = t_3 = t_3 \cdot_a t_2 = (b \cdot_b t_3) \cdot_a t_2 = (t_1 \cdot_b t_3) \cdot_a t_2
\]
3. If $t_1 = c \in \Sigma_0 \setminus \{a,b\}$, then
\[
(t_1 \cdot_a t_2) \cdot_b t_3 = (c \cdot_a t_2) \cdot_b t_3 = c \cdot_b t_3 = c = c \cdot_a t_2 = (c \cdot_b t_3) \cdot_a t_2
\]
4. If $t_1 = f(u_1,\ldots,u_n)$ then, following Equation (2):
\[
\begin{aligned}
(t_1 \cdot_a t_2) \cdot_b t_3
&= (f(u_1 \cdot_a t_2, \ldots, u_n \cdot_a t_2)) \cdot_b t_3\\
&= f((u_1 \cdot_a t_2) \cdot_b t_3, \ldots, (u_n \cdot_a t_2) \cdot_b t_3)\\
&= f((u_1 \cdot_b t_3) \cdot_a t_2, \ldots, (u_n \cdot_b t_3) \cdot_a t_2) && \text{(Induction hypothesis)}\\
&= f(u_1 \cdot_b t_3, \ldots, u_n \cdot_b t_3) \cdot_a t_2\\
&= (f(u_1,\ldots,u_n) \cdot_b t_3) \cdot_a t_2
\end{aligned}
\]
□

The iterated $c$-product is the operation ${}^{n,c}$ recursively defined for any integer $n$ by:
\[
L^{0,c} = \{c\} \tag{3}
\]
\[
L^{n+1,c} = L^{n,c} \cup L \cdot_c L^{n,c} \tag{4}
\]
The $c$-closure is the operation ${}^{*_c}$ defined by $L^{*_c} = \bigcup_{n \geq 0} L^{n,c}$. Notice that, unlike the string case, the products may commute with the closure in some cases:

Lemma 6. Let $L_1$ and $L_2$ be any two tree languages over $\Sigma$. Let $a$ and $b$ be two distinct symbols in $\Sigma_0$ such that $L_2 \subset T(\Sigma \setminus \{a\})$. Then:
\[
L_1^{*_a} \cdot_b L_2 = (L_1 \cdot_b L_2)^{*_a}
\]
Proof. Let us show by recurrence over the integer $n$ that $L_1^{n,a} \cdot_b L_2 = (L_1 \cdot_b L_2)^{n,a}$.
1. If $n = 0$, then, according to Equation (3): $L_1^{0,a} \cdot_b L_2 = \{a\} = (L_1 \cdot_b L_2)^{0,a}$.
2.
Suppose that the property holds for some integer $n \geq 0$. Then, following Equation (4):
\[
\begin{aligned}
L_1^{n+1,a} \cdot_b L_2
&= (L_1 \cdot_a L_1^{n,a} \cup L_1^{n,a}) \cdot_b L_2\\
&= (L_1 \cdot_a L_1^{n,a}) \cdot_b L_2 \cup L_1^{n,a} \cdot_b L_2 && \text{(Lemma 1)}\\
&= ((L_1 \cdot_b L_2) \cdot_a (L_1^{n,a} \cdot_b L_2)) \cup L_1^{n,a} \cdot_b L_2 && \text{(Corollary 3)}\\
&= ((L_1 \cdot_b L_2) \cdot_a (L_1 \cdot_b L_2)^{n,a}) \cup (L_1 \cdot_b L_2)^{n,a} && \text{(Induction hypothesis)}\\
&= (L_1 \cdot_b L_2)^{n+1,a}
\end{aligned}
\]
As a direct consequence, $L_1^{*_a} \cdot_b L_2 = (L_1 \cdot_b L_2)^{*_a}$. □

A rational expression $E$ over $\Sigma$ is inductively defined by:
\[
E = 0, \quad E = f(E_1,\ldots,E_n), \quad E = E_1 + E_2, \quad E = E_1 \cdot_c E_2, \quad E = E_1^{*_c}
\]
where $f$ is any symbol in $\Sigma_n$, $c$ is any symbol in $\Sigma_0$ and $E_1,\ldots,E_n$ are any $n$ rational expressions. The language denoted by $E$ is the tree language $L(E)$ inductively defined by:
\[
\begin{aligned}
L(0) &= \emptyset, & L(f(E_1,\ldots,E_n)) &= f(L(E_1),\ldots,L(E_n)),\\
L(E_1 + E_2) &= L(E_1) \cup L(E_2), & L(E_1 \cdot_c E_2) &= L(E_1) \cdot_c L(E_2),\\
L(E_1^{*_c}) &= (L(E_1))^{*_c}
\end{aligned}
\]
where $f$ is any symbol in $\Sigma_n$, $c$ is any symbol in $\Sigma_0$ and $E_1,\ldots,E_n$ are any $n$ rational expressions.

In the following of this paper, we consider that rational expressions include some variables. Let $X = \{x_1,\ldots,x_k\}$ be a set of $k$ variables. A rational expression $E$ over $(\Sigma,X)$ is inductively defined by:
\[
E = 0, \quad E = x_j, \quad E = f(E_1,\ldots,E_n), \quad E = E_1 + E_2, \quad E = E_1 \cdot_c E_2, \quad E = E_1^{*_c}
\]
where $f$ is any symbol in $\Sigma_n$, $c$ is any symbol in $\Sigma_0$, $1 \leq j \leq k$ is any integer and $E_1,\ldots,E_n$ are any $n$ rational expressions over $(\Sigma,X)$. The language denoted by an expression with variables needs a context to be computed: indeed, any variable has to be evaluated according to a tree language. Let $\mathcal{L} = (L_1,\ldots,L_k)$ be a $k$-tuple of tree languages over $\Sigma$. The $\mathcal{L}$-language denoted by $E$ is the tree
language $L_{\mathcal{L}}(E)$ inductively defined by:
\[
\begin{aligned}
L_{\mathcal{L}}(0) &= \emptyset, & L_{\mathcal{L}}(x_j) &= L_j,\\
L_{\mathcal{L}}(f(E_1,\ldots,E_n)) &= f(L_{\mathcal{L}}(E_1),\ldots,L_{\mathcal{L}}(E_n)), & L_{\mathcal{L}}(E_1 + E_2) &= L_{\mathcal{L}}(E_1) \cup L_{\mathcal{L}}(E_2),\\
L_{\mathcal{L}}(E_1 \cdot_c E_2) &= L_{\mathcal{L}}(E_1) \cdot_c L_{\mathcal{L}}(E_2), & L_{\mathcal{L}}(E_1^{*_c}) &= (L_{\mathcal{L}}(E_1))^{*_c}
\end{aligned}
\]
where $f$ is any symbol in $\Sigma_n$, $c$ is any symbol in $\Sigma_0$, $1 \leq j \leq k$ is any integer and $E_1,\ldots,E_n$ are any $n$ rational expressions over $(\Sigma,X)$. Two rational expressions $E$ and $F$ with variables are equivalent, denoted by $E \sim F$, if for any tuple $\mathcal{L}$ of languages over $\Sigma$, $L_{\mathcal{L}}(E) = L_{\mathcal{L}}(F)$. Let $\Gamma \subset \Sigma$. Two rational expressions $E$ and $F$ with variables are $\Gamma$-equivalent, denoted by $E \sim_\Gamma F$, if for any tuple $\mathcal{L}$ of languages over $\Gamma$, $L_{\mathcal{L}}(E) = L_{\mathcal{L}}(F)$. By definition,
\[
E \sim F \Rightarrow E \sim_\Gamma F \tag{5}
\]
Notice that any expression over $(\Sigma,X)$ is also an expression over $\Sigma \cup X$. However, two equivalent rational expressions over $\Sigma \cup X$ are not necessarily equivalent as rational expressions over $(\Sigma,X)$. As an example, $x \cdot_a b$ is equivalent to $x$ as expressions over $\{a,b,x\}$, but not as expressions over $(\{a,b\},\{x\})$:
\[
L(x \cdot_a b) = \{x\} = L(x), \qquad L_{\{a\}}(x \cdot_a b) = \{b\} \neq L_{\{a\}}(x) = \{a\}
\]
In the following, we denote by $E_{x \leftarrow E'}$ the expression obtained by substituting any occurrence of the symbol $x$ by the expression $E'$ in the expression $E$. This transformation is inductively defined as follows:
\[
\begin{aligned}
a_{x \leftarrow E'} &= a, & 0_{x \leftarrow E'} &= 0,\\
y_{x \leftarrow E'} &= y, & x_{x \leftarrow E'} &= E',\\
(f(E_1,\ldots,E_n))_{x \leftarrow E'} &= f((E_1)_{x \leftarrow E'},\ldots,(E_n)_{x \leftarrow E'}), & (E_1 + E_2)_{x \leftarrow E'} &= (E_1)_{x \leftarrow E'} + (E_2)_{x \leftarrow E'},\\
(E_1 \cdot_c E_2)_{x \leftarrow E'} &= (E_1)_{x \leftarrow E'} \cdot_c (E_2)_{x \leftarrow E'}, & (E_1^{*_c})_{x \leftarrow E'} &= ((E_1)_{x \leftarrow E'})^{*_c}
\end{aligned}
\]
where $a$ is any symbol in $\Sigma_0$, $x \neq y$ are two variables in $X$, $f$ is any symbol in $\Sigma_n$, $c$ is any symbol in $\Sigma_0$ and $E_1,\ldots,E_n$ are any $n$ rational expressions over $(\Sigma,X)$. This transformation preserves the language in the following case:

Lemma 7. Let $E$ be an expression over an alphabet $\Sigma$ and over a set $X = \{x_1,\ldots,x_n\}$ of variables. Let $F$ be a rational expression over $(\Sigma,X)$. Let $x_j$ be a variable in $X$. Let $\mathcal{L} = (L_1,\ldots,L_n)$ be an $n$-tuple of tree languages such that $L_j = L_{\mathcal{L}}(F)$.
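Then:
\[
L_{\mathcal{L}}((E)_{x_j \leftarrow F}) = L_{\mathcal{L}}(E)
\]
Proof. By induction over the structure of $E$.
1. If $E \in \{a,y,0\}$ with $a \in \Sigma_0$ and $y \neq x_j$, then $(E)_{x_j \leftarrow F} = E$.
2. If $E = x_j$, then $(E)_{x_j \leftarrow F} = F$. Therefore $L_{\mathcal{L}}((E)_{x_j \leftarrow F}) = L_{\mathcal{L}}(F) = L_j = L_{\mathcal{L}}(x_j) = L_{\mathcal{L}}(E)$.
3. If $E = f(E_1,\ldots,E_n)$, with $f \in \Sigma_n$, $n > 0$, then:
\[
\begin{aligned}
L_{\mathcal{L}}((E)_{x_j \leftarrow F})
&= L_{\mathcal{L}}(f((E_1)_{x_j \leftarrow F},\ldots,(E_n)_{x_j \leftarrow F}))\\
&= f(L_{\mathcal{L}}((E_1)_{x_j \leftarrow F}),\ldots,L_{\mathcal{L}}((E_n)_{x_j \leftarrow F}))\\
&= f(L_{\mathcal{L}}(E_1),\ldots,L_{\mathcal{L}}(E_n)) && \text{(Induction hypothesis)}\\
&= L_{\mathcal{L}}(f(E_1,\ldots,E_n))
\end{aligned}
\]
4. If $E = E_1 + E_2$, then
\[
\begin{aligned}
L_{\mathcal{L}}((E_1 + E_2)_{x_j \leftarrow F})
&= L_{\mathcal{L}}((E_1)_{x_j \leftarrow F} + (E_2)_{x_j \leftarrow F})\\
&= L_{\mathcal{L}}((E_1)_{x_j \leftarrow F}) \cup L_{\mathcal{L}}((E_2)_{x_j \leftarrow F})\\
&= L_{\mathcal{L}}(E_1) \cup L_{\mathcal{L}}(E_2) && \text{(Induction hypothesis)}\\
&= L_{\mathcal{L}}(E_1 + E_2)
\end{aligned}
\]
5. If $E = E_1 \cdot_c E_2$, then
\[
\begin{aligned}
L_{\mathcal{L}}((E_1 \cdot_c E_2)_{x_j \leftarrow F})
&= L_{\mathcal{L}}((E_1)_{x_j \leftarrow F} \cdot_c (E_2)_{x_j \leftarrow F})\\
&= L_{\mathcal{L}}((E_1)_{x_j \leftarrow F}) \cdot_c L_{\mathcal{L}}((E_2)_{x_j \leftarrow F})\\
&= L_{\mathcal{L}}(E_1) \cdot_c L_{\mathcal{L}}(E_2) && \text{(Induction hypothesis)}\\
&= L_{\mathcal{L}}(E_1 \cdot_c E_2)
\end{aligned}
\]
6. If $E = E_1^{*_c}$, then
\[
\begin{aligned}
L_{\mathcal{L}}((E_1^{*_c})_{x_j \leftarrow F})
&= (L_{\mathcal{L}}((E_1)_{x_j \leftarrow F}))^{*_c}\\
&= (L_{\mathcal{L}}(E_1))^{*_c} && \text{(Induction hypothesis)}\\
&= L_{\mathcal{L}}(E_1^{*_c})
\end{aligned}
\]
□

In the following, we denote by $\mathrm{op}(E)$ the set of the operators that appear in a rational expression $E$. The previous substitution can be used in order to factorize an expression w.r.t. a variable. However, this operation does not preserve the equivalence; e.g.
\[
L_{\{b\}}(x \cdot_b c) = \{c\} \neq L_{\{b\}}((a \cdot_b c) \cdot_a x) = \{b\}
\]
Nevertheless, this operation preserves the language if it is based on a restricted alphabet:

Proposition 1. Let $E$ be a rational expression over a graded alphabet $\Sigma$ and over a set $X$ of variables. Let $x$ be a variable in $X$. Let $\Gamma \subset \Sigma$ be the subset defined by $\Gamma = \{b \in \Sigma_0 \mid \{\cdot_b, {}^{*_b}\} \cap \mathrm{op}(E) \neq \emptyset\}$. Let $a$ be a symbol not in $\Sigma$. Then:
\[
E \sim_{\Sigma \setminus \Gamma} (E)_{x \leftarrow a} \cdot_a x
\]
Proof. By induction over the structure of $E$.
1. If $E = x$, then since $x \sim_{\Sigma \cup \{a\}} a \cdot_a x$, it holds from Equation (5) that $E \sim_{\Sigma \setminus \Gamma} (E)_{x \leftarrow a} \cdot_a x$.
2. If $E \in \{0\} \cup \Sigma_0 \cup X \setminus \{x\}$, since $x$ does not appear in $E$, it holds $E = E_{x \leftarrow a}$; since $a$ does not appear in $E$ either, $(E)_{x \leftarrow a} \cdot_a x \sim E$.
3.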
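If $E = f(E_1,\ldots,E_n)$, then
\[
\begin{aligned}
(f(E_1,\ldots,E_n))_{x \leftarrow a} \cdot_a x
&= f((E_1)_{x \leftarrow a},\ldots,(E_n)_{x \leftarrow a}) \cdot_a x\\
&\sim f((E_1)_{x \leftarrow a} \cdot_a x,\ldots,(E_n)_{x \leftarrow a} \cdot_a x) && \text{(Equation (2))}\\
&\sim_{\Sigma \setminus \Gamma} f(E_1,\ldots,E_n) && \text{(Induction hypothesis)}
\end{aligned}
\]
4. If $E = E_1 + E_2$, then
\[
\begin{aligned}
(E_1 + E_2)_{x \leftarrow a} \cdot_a x
&= ((E_1)_{x \leftarrow a} + (E_2)_{x \leftarrow a}) \cdot_a x\\
&\sim ((E_1)_{x \leftarrow a}) \cdot_a x + ((E_2)_{x \leftarrow a}) \cdot_a x && \text{(Lemma 1)}\\
&\sim_{\Sigma \setminus \Gamma} E_1 + E_2 && \text{(Induction hypothesis)}
\end{aligned}
\]
5. If $E = E_1 \cdot_c E_2$, then
\[
\begin{aligned}
(E_1 \cdot_c E_2)_{x \leftarrow a} \cdot_a x
&= ((E_1)_{x \leftarrow a} \cdot_c (E_2)_{x \leftarrow a}) \cdot_a x\\
&\sim_{\Sigma} (((E_1)_{x \leftarrow a}) \cdot_a x) \cdot_c (((E_2)_{x \leftarrow a}) \cdot_a x) && \text{(Corollary 3)}\\
&\sim_{\Sigma \setminus \Gamma} E_1 \cdot_c E_2 && \text{(Induction hypothesis)}
\end{aligned}
\]
6. If $E = E_1^{*_c}$, then
\[
\begin{aligned}
(E_1^{*_c})_{x \leftarrow a} \cdot_a x
&= ((E_1)_{x \leftarrow a})^{*_c} \cdot_a x\\
&\sim_{\Sigma} (((E_1)_{x \leftarrow a}) \cdot_a x)^{*_c} && \text{(Lemma 6)}\\
&\sim_{\Sigma \setminus \Gamma} E_1^{*_c} && \text{(Induction hypothesis)}
\end{aligned}
\]
□

3 Equation Systems for Tree Languages

Let $\Sigma$ be an alphabet and $\mathbb{E} = \{E_1,\ldots,E_n\}$ be a set of $n$ variables. An equation over $(\Sigma,\mathbb{E})$ is an expression $E_j = F_j$, where $1 \leq j \leq n$ is any integer and $F_j$ is a rational expression over $(\Sigma,\mathbb{E})$. An equation system over $(\Sigma,\mathbb{E})$ is a set $X = \{E_j = F_j \mid 1 \leq j \leq n\}$ of $n$ equations. Let $\mathcal{L} = (L_1,\ldots,L_n)$ be an $n$-tuple of tree languages. The tuple $\mathcal{L}$ is a solution for an equation $(E_j = F_j)$ if $L_j = L_{\mathcal{L}}(F_j)$. The tuple $\mathcal{L}$ is a solution for $X$ if for any equation $(E_j = F_j)$ in $X$, $\mathcal{L}$ is a solution of $(E_j = F_j)$.

Example 1. Let us define the equation system $X$ as follows:
\[
X = \begin{cases}
E_1 = f(E_1,E_1) + f(E_2,E_4)\\
E_2 = b + f(E_2,E_4)\\
E_3 = a + h(E_4)\\
E_4 = a + h(E_3)
\end{cases}
\]
The tuple $(\emptyset,\emptyset,\emptyset,\emptyset)$ is a solution for the equation $E_1 = F_1$, but not of the system $X$.

Two systems over the same variables are equivalent if they admit the same solutions. Notice that a system does not necessarily admit a unique solution. As an example, any language is a solution of the system $E_1 = E_1$. Obviously,

Proposition 2. If $X$ only contains equations $E_k = F_k$ with $F_k$ a rational expression without variables, then $(L(F_1),\ldots,L(F_n))$ is the unique solution of $X$.

Let us now define the operation of substitution, computing an equivalent system.

Definition 1.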
Let $X = \{(E_j = F_j) \mid 1 \leq j \leq n\}$ be an equation system. The substitution of $(E_k = F_k)$ in $X$ is the system $X^k = \{E_k = F_k\} \cup \{E_j = (F_j)_{E_k \leftarrow F_k} \mid j \neq k \land 1 \leq j \leq n\}$.

As a direct consequence of Lemma 7,

Proposition 3. Let $X = \{(E_j = F_j) \mid 1 \leq j \leq n\}$ be an equation system. Let $E_k = F_k$ be an equation in $X$. Let $\mathcal{L}$ be a solution of $X$. Then for any integer $1 \leq j,k \leq n$ with $j \neq k$, $\mathcal{L}$ is a solution of $E_j = (F_j)_{E_k \leftarrow F_k}$.

And following Proposition 3,

Proposition 4. Let $X$ be an equation system over $n$ variables. Let $k \leq n$ be an integer. Then $X$ and $X^k$ are equivalent.

Example 2. Let us consider the system $X$ of Example 1. Then:
\[
X^4 = \begin{cases}
E_1 = f(E_1,E_1) + f(E_2,a+h(E_3))\\
E_2 = b + f(E_2,a+h(E_3))\\
E_3 = a + h(a+h(E_3))\\
E_4 = a + h(E_3)
\end{cases}
\]
Let us determine a particular case that can be solved by successive substitutions. Let $X = \{(E_j = F_j) \mid 1 \leq j \leq n\}$ be an equation system. The relation $<_X$ is defined for any two variables $E_j$ and $E_k$ by
\[
E_j <_X E_k \Leftrightarrow E_j \text{ appears in } F_k
\]
The relation $\preceq_X$ is defined as the transitive closure of $<_X$. In the case where $E_k <_X E_k$, the equation $E_k = F_k$ is said to be recursive. Let us say that a system is recursive if there exist two symbols $E_j$ and $E_k$ such that $E_j \preceq_X E_k$ and $E_k \preceq_X E_j$. If a system is not recursive, it can be solved by successive substitutions. If $E_k$ is a variable that does not appear in any right side of an equation of $X$, we denote by $X \setminus (E_k = F_k)$ the system obtained by removing $E_k = F_k$ from $X$, and by reindexing any symbol $E_j$ with $j > k$ into $E_{j-1}$.

Lemma 8. Let $X = \{(E_j = F_j) \mid 1 \leq j \leq n\}$ be an equation system over a graded alphabet $\Sigma$ and over $n$ variables $\{E_1,\ldots,E_n\}$. Let $E_k = F_k$ be an equation in $X$ such that $E_k = F_k$ is not recursive. Then for any $(n-1)$-tuple $\mathcal{Z} = (L_1,\ldots,L_{k-1},L_{k+1},\ldots,L_n)$, the two following conditions are equivalent:
1. $(L_1,\ldots,L_{k-1},L_{\mathcal{Z}}(F_k),L_{k+1},\ldots,L_n)$ is a solution of $X$
2. $(L_1,\ldots,L_{k-1},L_{k+1},\ldots,L_n)$ is a solution of $X^k \setminus \{E_k = F_k\}$

Proof.
Let $\mathcal{L} = (L_1,\ldots,L_{k-1},L_{\mathcal{Z}}(F_k),L_{k+1},\ldots,L_n)$ and $\mathcal{L}' = (L_1,\ldots,L_{k-1},L_{k+1},\ldots,L_n)$. Obviously, $\mathcal{L}$ is a solution for the (non recursive) equation $E_k = F_k$. From Proposition 4,
\[
\mathcal{L} \text{ is a solution of } X \Leftrightarrow \mathcal{L} \text{ is a solution of } X^k
\]
Consequently, for any integer $j \neq k$,
\[
\mathcal{L} \text{ is a solution of } E_j = F_j \Leftrightarrow \mathcal{L} \text{ is a solution of } E_j = (F_j)_{E_k \leftarrow F_k}
\]
Moreover, by definition of $\mathcal{L}'$, for any integer $j \neq k$,
\[
\begin{aligned}
\mathcal{L} \text{ is a solution of } E_j = (F_j)_{E_k \leftarrow F_k}
&\Leftrightarrow \mathcal{L}' \text{ is a solution of } E_j = (F_j)_{E_k \leftarrow F_k}\\
&\Leftrightarrow \mathcal{L}' \text{ is a solution of } X^k \setminus \{E_k = F_k\}
\end{aligned}
\]
□

As a direct consequence of the previous lemma, a non-recursive system can be solved by solving a smaller system, obtained by substitution:

Corollary 4. Let $X = \{(E_j = F_j) \mid 1 \leq j \leq n\}$ be an equation system over a graded alphabet $\Sigma$ and over $n$ variables $\{E_1,\ldots,E_n\}$. Let $E_k = F_k$ be an equation in $X$ such that $F_k$ is a rational expression over $\Sigma$ (with no variable). Then for any $(n-1)$-tuple $(L_1,\ldots,L_{k-1},L_{k+1},\ldots,L_n)$, the two following conditions are equivalent:
1. $(L_1,\ldots,L_{k-1},L(F_k),L_{k+1},\ldots,L_n)$ is a solution of $X$
2. $(L_1,\ldots,L_{k-1},L_{k+1},\ldots,L_n)$ is a solution of $X^k \setminus \{E_k = F_k\}$

Moreover, such a system admits a unique solution:

Proposition 5. Let $X = \{(E_j = F_j) \mid 1 \leq j \leq n\}$ be an equation system that is not recursive over a graded alphabet $\Sigma$ and over variables $\{E_1,\ldots,E_n\}$. Then $X$ admits a unique solution.

Proof. By recurrence over the cardinal of $X$.
1. If $X = \{E_1 = F_1\}$, then $F_1$ is a rational expression over $\Sigma$ (with no variable) and therefore $L(F_1)$ is the unique solution of $X$.
2. Otherwise, since $X$ is not recursive, there exists an equation $E_k = F_k$ with $F_k$ a rational expression over $\Sigma$ (with no variable). Therefore, according to Corollary 4, a tuple $(L_1,\ldots,L_{k-1},L(F_k),L_{k+1},\ldots,L_n)$ is a solution of $X$ if and only if $(L_1,\ldots,L_{k-1},L_{k+1},\ldots,L_n)$ is a solution of $X^k \setminus \{E_k = F_k\}$. By recurrence hypothesis, since $X^k \setminus \{E_k = F_k\}$ is not recursive, it admits a unique solution $(L_1,\ldots,L_{k-1},L_{k+1},\ldots,L_n)$. Thus $(L_1,\ldots,L_{k-1},L(F_k),L_{k+1},\ldots,L_n)$ is a solution of $X$.
Finally, since for any $L_k \neq L(F_k)$, the tuple $(L_1,\ldots,L_{k-1},L_k,L_{k+1},\ldots,L_n)$ is not a solution for $E_k = F_k$, $(L_1,\ldots,L_{k-1},L(F_k),L_{k+1},\ldots,L_n)$ is the unique solution of $X$.
□

Example 3. Let us define the equation system $Y$ as follows:
\[
Y = \begin{cases}
E_1 = f(E_2,E_3) + f(E_2,E_3)\\
E_2 = b + f(E_4,E_4)\\
E_3 = a + h(E_4)\\
E_4 = a + (f(a,b))^{*_b} \cdot_b a
\end{cases}
\]
Then
\[
Y^4 = \begin{cases}
E_1 = f(E_2,E_3) + f(E_2,E_3)\\
E_2 = b + f(a+(f(a,b))^{*_b} \cdot_b a,\ a+(f(a,b))^{*_b} \cdot_b a)\\
E_3 = a + h(a+(f(a,b))^{*_b} \cdot_b a)\\
E_4 = a + (f(a,b))^{*_b} \cdot_b a
\end{cases}
\]
\[
(Y^4)^3 = \begin{cases}
E_1 = f(E_2,\ a+h(a+(f(a,b))^{*_b} \cdot_b a)) + f(E_2,\ a+h(a+(f(a,b))^{*_b} \cdot_b a))\\
E_2 = b + f(a+(f(a,b))^{*_b} \cdot_b a,\ a+(f(a,b))^{*_b} \cdot_b a)\\
E_3 = a + h(a+(f(a,b))^{*_b} \cdot_b a)\\
E_4 = a + (f(a,b))^{*_b} \cdot_b a
\end{cases}
\]
\[
((Y^4)^3)^2 = \begin{cases}
E_1 = f(b+f(a+(f(a,b))^{*_b} \cdot_b a,\ a+(f(a,b))^{*_b} \cdot_b a),\ a+h(a+(f(a,b))^{*_b} \cdot_b a))\\
\qquad + f(b+f(a+(f(a,b))^{*_b} \cdot_b a,\ a+(f(a,b))^{*_b} \cdot_b a),\ a+h(a+(f(a,b))^{*_b} \cdot_b a))\\
E_2 = b + f(a+(f(a,b))^{*_b} \cdot_b a,\ a+(f(a,b))^{*_b} \cdot_b a)\\
E_3 = a + h(a+(f(a,b))^{*_b} \cdot_b a)\\
E_4 = a + (f(a,b))^{*_b} \cdot_b a
\end{cases}
\]

4 Arden's Lemma for Trees and Recursive Systems

Arden's Lemma [2] is a fundamental result in automata theory. It gives a solution of the recursive language equation $X = A \cdot X \cup B$ where $X$ is an unknown language. It can be applied to compute a rational expression from an automaton and therefore prove the second direction of Kleene's theorem for strings. Following the same steps as in the string case, we generalize this lemma to trees.

Proposition 6. Let $A$ and $B$ be two tree languages over a graded alphabet $\Sigma$. Then $A^{*_c} \cdot_c B$ is the smallest language in the family $\mathcal{F}$ of languages $L$ over $\Sigma$ satisfying $L = A \cdot_c L \cup B$. Furthermore, if $c \notin A$, then $\mathcal{F} = \{A^{*_c} \cdot_c B\}$.

Proof. Let us set $Z = A^{*_c} \cdot_c B$.
1. Obviously, $Z$ belongs to $\mathcal{F}$:
\[
\begin{aligned}
A \cdot_c (A^{*_c} \cdot_c B) \cup B
&= (A \cdot_c A^{*_c}) \cdot_c B \cup B && \text{(Corollary 1)}\\
&= (A \cdot_c A^{*_c}) \cdot_c B \cup \{c\} \cdot_c B\\
&= ((A \cdot_c A^{*_c}) \cup \{c\}) \cdot_c B\\
&= A^{*_c} \cdot_c B
\end{aligned}
\]
2. Let us now show that if $C$ belongs to $\mathcal{F}$, then $Z \subset C$. To do so, let us show that for any integer $n \geq 0$, $A^{n,c} \cdot_c B \subset C$. Since $C$ belongs to $\mathcal{F}$, then $C = A \cdot_c C \cup B$. Therefore $A^{0,c} \cdot_c B = B \subset C$ and $A \cdot_c C \subset C$. Suppose that $A^{n,c} \cdot_c B \subset C$ for some integer $n \geq 0$.
Therefore, from Corollary 2, $A \cdot_c (A^{n,c} \cdot_c B) \subset A \cdot_c C$ and, from Lemma 1 and Corollary 1, $A^{n+1,c} \cdot_c B = (A^{n,c} \cdot_c B) \cup A \cdot_c (A^{n,c} \cdot_c B) \subset C \cup A \cdot_c C \subset C$. Consequently, since for any integer $n$, $A^{n,c} \cdot_c B \subset C$, it holds that $Z = A^{*_c} \cdot_c B \subset C$.
3. Finally, let us show that if $c \notin A$, then any language $Y$ in $\mathcal{F}$ satisfies $Y \subset Z$, implying that $\mathcal{F} = \{Z\}$. Let $Y \neq Z$ satisfying $Y = A \cdot_c Y \cup B$. Suppose that $Y \not\subset Z$. Let $t$ be a tree in $Y \setminus Z$ such that $\mathrm{Height}(t)$ is minimal. Obviously, since $B \subset Z$, $t$ is not in $B$. Consequently, $t$ belongs to $A \cdot_c Y$ and therefore $t = t_1 \cdot_c t_2$ with $t_1 \in A$ and $t_2 \in Y$. Since $c \notin A$, $t_1 \neq c$. Furthermore, if $c$ does not appear in $t_1$, then $t = t_1 \in A$ and consequently, $t \in A^{*_c} \cdot_c B = Z$, contradicting the fact that $t \notin Z$. Therefore $c$ appears in $t_1$ and then $\mathrm{Height}(t_2) < \mathrm{Height}(t)$, so that $t_2$ belongs to $Z$ by minimality of the height of $t$; hence $t \in A \cdot_c Z \subset Z$, a contradiction. As a direct consequence, any language $Y$ in $\mathcal{F}$ satisfies $Y \subset Z$. Following the previous point, since $Z \subset Y$, it holds that $Y = Z$.
□
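The statement of Proposition 6 can be explored on finite approximations with a short Python sketch. Everything here is an assumption of the illustration: the tuple encoding of trees, the helper names `c_product_tree`, `c_product` and `iterated`, and the sample languages $A = \{f(c,c)\}$ and $B = \{a\}$. The sketch iterates $X_{n+1} = A \cdot_c X_n \cup B$ from the empty language and compares each step with $A^{n,c} \cdot_c B$; it checks a few finite stages only and is not a proof.

```python
from itertools import product

# Trees are nested tuples (symbol, child_1, ..., child_n); a leaf is (symbol,).

def c_product_tree(tree, sym, language):
    """t ._sym L for a single tree t, following Equation (2)."""
    symbol, children = tree[0], tree[1:]
    if not children:
        return set(language) if symbol == sym else {tree}
    options = [c_product_tree(child, sym, language) for child in children]
    return {(symbol, *choice) for choice in product(*options)}


def c_product(left, sym, right):
    """Extension of the c-product to (finite) languages."""
    result = set()
    for tree in left:
        result |= c_product_tree(tree, sym, right)
    return result


def iterated(language, sym, n):
    """Iterated product L^{n,sym}, following Equations (3) and (4)."""
    power = {(sym,)}
    for _ in range(n):
        power = power | c_product(language, sym, power)
    return power


# Hypothetical instance of the equation L = A ._c L  U  B, with c not in A.
A = {("f", ("c",), ("c",))}
B = {("a",)}

approx = set()
stages_agree = True
for n in range(4):
    approx = c_product(A, "c", approx) | B        # X_{n+1} = A ._c X_n  U  B
    closed_form = c_product(iterated(A, "c", n), "c", B)
    stages_agree = stages_agree and approx == closed_form

print(stages_agree, len(approx))
```

On this instance each fixed-point iterate coincides with $A^{n,c} \cdot_c B$, which is consistent with $A^{*_c} \cdot_c B$ being the smallest solution; the languages here are finite truncations of the binary trees over $f$ and $a$.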