COMPLEXITY OF REGULAR BIFIX-FREE LANGUAGES

ROBERT FERENS AND MAREK SZYKUŁA

Institute of Computer Science, University of Wrocław, Wrocław, Poland

arXiv:1701.03768v1 [cs.FL] 13 Jan 2017

Abstract. We study descriptive complexity properties of the class of regular bifix-free languages, which is the intersection of prefix-free and suffix-free regular languages. We show that there exists a single ternary universal (stream of) bifix-free languages that meets all the bounds for the state complexity of basic operations (Boolean operations, product, star, and reversal). This is in contrast with suffix-free languages, where it is known that no such stream exists. Then we present a stream of bifix-free languages that is most complex in terms of all basic operations, syntactic complexity, and the number of atoms and their complexities; this stream requires a superexponential alphabet.

We also complete the previous results by characterizing the state complexity of product, star, and reversal, and by establishing tight upper bounds for the atom complexities of bifix-free languages. We show that at least 3 letters are required to meet the bound for reversal, and that n+1 letters are sufficient and necessary to meet the bounds for atom complexities. For product, star, and reversal we show that there are no gaps (magic numbers) in the interval of possible state complexities of the languages resulting from the operation; in particular, the state complexity of the product $L_m L_n$ is always $m+n-2$, while that of the star is either $n-1$ or $n-2$.

Keywords: atom complexity, bifix-free, Boolean operations, magic number, most complex, prefix-free, product, quotient complexity, regular language, reversal, state complexity, suffix-free, syntactic complexity, transition semigroup

1. Introduction

A language is prefix-free or suffix-free if no word in the language is a proper prefix or suffix, respectively, of another word from the language. If a language is both prefix-free and suffix-free, then it is bifix-free. Languages with these properties have been studied extensively. They form important classes of codes, whose applications can be found in fields such as cryptography, data compression, information transmission, and error correction. In particular, prefix and suffix codes are prefix-free and suffix-free languages, respectively, while bifix-free languages can serve as both kinds of codes. For a survey about codes see [1, 19]. Moreover, they are special cases of convex languages (see e.g. [12] for the related algorithmic problems). Here we are interested in how the descriptive complexity properties of prefix-free and suffix-free languages are shared by their common subclass.

There are three natural measures of complexity of a regular language that are related to the Myhill and Myhill-Nerode congruences on words. The usual state complexity, or quotient complexity, is the number of states in a minimal DFA recognizing the language. Thus, state complexity measures how much memory we need to store the language in the form of a DFA, or how much time we need to perform an operation that depends on the size of the DFA. Accordingly, we are interested in upper bounds on the complexities of the languages obtained by applying an operation (e.g. union, intersection, product, or reversal). Syntactic complexity measures the number of transformations in the transition semigroup or, equivalently, the number of classes of words that act differently on the states [11, 22]. This provides a natural bound on the time and space complexity of algorithms working on the transition semigroup. For example, a simple algorithm checking whether a language is star-free just enumerates all transformations and verifies that none of them contains a non-trivial cycle [20].
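The star-freeness check mentioned above can be sketched in a few lines. This is a minimal Python illustration, not code from the paper or from [20]. It assumes that the letters of a DFA are given as tuples `t`, where `t[q]` is the image of state `q`. It enumerates the transition semigroup by closing the letter transformations under composition; for a minimal DFA, the size of this semigroup is the syntactic complexity. It then reports whether no element has a cycle of length at least 2. For a minimal DFA, this aperiodicity test is equivalent to star-freeness by a classical theorem of Schützenberger. All function names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Transformation = Tuple[int, ...]  # t[q] is the image of state q


def compose(t1: Transformation, t2: Transformation) -> Transformation:
    """Composition t1 t2: apply t1 first, then t2."""
    return tuple(t2[q] for q in t1)


def transition_semigroup(letters: Dict[str, Transformation]) -> Set[Transformation]:
    """All transformations induced by non-empty words, found by closure."""
    generators = list(letters.values())
    semigroup: Set[Transformation] = set(generators)
    frontier: List[Transformation] = list(semigroup)
    while frontier:
        next_frontier: List[Transformation] = []
        for t in frontier:
            for g in generators:
                u = compose(t, g)  # append one letter on the right
                if u not in semigroup:
                    semigroup.add(u)
                    next_frontier.append(u)
        frontier = next_frontier
    return semigroup


def has_nontrivial_cycle(t: Transformation) -> bool:
    """True if t has a cycle of length at least 2 (fixed points do not count)."""
    n = len(t)
    for start in range(n):
        q = start
        for _ in range(n):  # after n steps we are surely on a cycle
            q = t[q]
        if t[q] != q:  # the cycle through q is not a fixed point
            return True
    return False


def is_aperiodic(letters: Dict[str, Transformation]) -> bool:
    """No element of the transition semigroup contains a non-trivial cycle."""
    return not any(has_nontrivial_cycle(t) for t in transition_semigroup(letters))


unary = {"a": (1, 2, 3, 4, 4)}  # minimal DFA of {aaa}: state 3 final, 4 empty
print(len(transition_semigroup(unary)), is_aperiodic(unary))  # 4 True
```

Consider two examples. The minimal DFA of $\{a^3\}$ has a 4-element, aperiodic transition semigroup. The two-state DFA of $(aa)^*$ induces a permutation, so it is not star-free. Enumerating the whole semigroup can take exponential time in the worst case, which is why syntactic complexity bounds the running time of such algorithms.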
The third measure is called the complexity of atoms [10]. It consists of the number of atoms and their state complexities, where atoms are the classes of words that belong to exactly the same subset of quotients.

Most complex languages and universal witnesses were proposed by Brzozowski in [3]. The point here is that it is more suitable to have a single witness that is most complex in the given subclass of regular languages, instead of separate witnesses meeting the upper bound for each particular measure and operation. Besides theoretical aspects, this concept also has a practical motivation. To test the efficiency of various algorithms or systems operating on automata (e.g. the computational package GAP [15]), it is natural to use worst-case examples, that is, languages with maximal complexities. Therefore, it is preferable to have just one universal most complex example rather than a set of separate examples for every particular case. Of course, it is also better to use the smallest possible alphabet.

It may be surprising that such a single witness exists for most of the natural subclasses of regular languages: the class of all regular languages [3], right-, left-, and two-sided ideals [4], and prefix-convex languages [7]. However, there does not exist a single witness for the class of suffix-free languages [9], where two different witnesses must be used.

In this paper we continue the previous studies concerning the class of bifix-free languages [5, 24]. In [5], tight bounds on the state complexity of basic operations on bifix-free languages were established; however, the witnesses were different for particular cases. The syntactic complexity of bifix-free languages was first studied in [6], where a lower bound was established. The formula was then shown to be an upper bound in [24].

Our main contributions are as follows:
(1) We show a single ternary witness of bifix-free languages that meets the upper bounds for all basic operations. This is in contrast with the class of suffix-free languages, where such most complex languages do not exist.
(2) We show that there exist most complex languages in terms of the state complexity of all basic operations, syntactic complexity, and the number of atoms and their complexities. This stream uses a superexponential alphabet, which cannot be reduced.
(3) We prove a tight upper bound on the number of atoms and the quotient complexities of atoms of bifix-free languages.
(4) We provide a complete characterization of the state complexity of product and star, and show the exact ranges of the possible state complexities of product, star, and reversal of bifix-free languages.
(5) We prove that at least a ternary alphabet must be used to meet the bound for reversal, and that an (n+1)-ary alphabet must be used to meet the bounds for atom complexities.

2. Preliminaries

2.1. Regular languages and complexities. Let $\Sigma$ be a non-empty finite alphabet. In this paper we deal with regular languages $L \subseteq \Sigma^*$. For a word $w \in \Sigma^*$, the (left) quotient of $L$ by $w$ is the set $\{u \mid wu \in L\}$, which is also denoted by $L.w$. Left quotients are related to the Myhill-Nerode congruence on words, where two words $u, v \in \Sigma^*$ are equivalent if for every $x \in \Sigma^*$ we have $ux \in L$ if and only if $vx \in L$. Thus the number of quotients is the number of equivalence classes of this relation. The number of quotients of $L$ is the quotient complexity $\kappa(L)$ of this language [2]. A language is regular if and only if it has a finite number of quotients.

Let $L, K \subseteq \Sigma^*$ be regular languages over the same alphabet $\Sigma$. By Boolean operations on these languages we mean union $L \cup K$, intersection $L \cap K$, difference $L \setminus K$, and symmetric difference $L \oplus K$. The reverse language $L^R$ of $L$ is the language $\{a_k \cdots a_1 \mid a_1 \cdots a_k \in L,\ a_1, \dots, a_k \in \Sigma\}$. By the basic operations on regular languages we mean the Boolean operations, the product (concatenation), the star, and the reversal operation. By the complexity of an operation we mean the maximum possible quotient complexity of the resulting language, given as a function of the quotient complexities of the operands.
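For a finite language, quotient complexity can be computed directly from the definition of $L.w$. Only prefixes of words of $L$ give non-empty quotients, and every other word gives $\emptyset$. The sketch below illustrates this under the assumption that finite languages are given as Python sets of strings over a non-empty alphabet. The function names are hypothetical.

```python
from typing import FrozenSet, Set


def left_quotient(language: FrozenSet[str], w: str) -> FrozenSet[str]:
    """L.w = {u | wu in L} for a finite language L given as a set of strings."""
    return frozenset(word[len(w):] for word in language if word.startswith(w))


def quotient_complexity(language: FrozenSet[str]) -> int:
    """Number of distinct left quotients of a finite language (kappa(L))."""
    # Quotients by prefixes of words of L are the only non-empty ones.
    prefixes = {word[:i] for word in language for i in range(len(word) + 1)}
    quotients: Set[FrozenSet[str]] = {left_quotient(language, p) for p in prefixes}
    # Any word longer than all words of L yields the empty quotient.
    quotients.add(frozenset())
    return len(quotients)


print(quotient_complexity(frozenset({"aaa"})))             # 5
print(quotient_complexity(frozenset({"a", "ab", "abc"})))  # 5
```

For example, $\{aaa\}$ has the five quotients $\{aaa\}$, $\{aa\}$, $\{a\}$, $\{\varepsilon\}$, and $\emptyset$. The language $\{a, ab, abc\}$ also has five quotients: $L$, $\{\varepsilon, b, bc\}$, $\{\varepsilon, c\}$, $\{\varepsilon\}$, and $\emptyset$.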
The syntactic complexity $\sigma(L)$ of $L$ is the number of equivalence classes of the Myhill equivalence relation on $\Sigma^+$, where two words $u, v \in \Sigma^+$ are equivalent if for any words $x, y \in \Sigma^*$ we have $xuy \in L$ if and only if $xvy \in L$.

The third measure of complexity of a regular language $L$ is the number and quotient complexities of atoms [10]. Atoms arise from the left congruence of words, which is refined by the Myhill equivalence relation: two words $u, v \in \Sigma^*$ are equivalent if for any word $x \in \Sigma^*$ we have $xu \in L$ if and only if $xv \in L$ [16]. Thus $u$ and $v$ are equivalent if they belong to exactly the same left quotients of $L$. An equivalence class of this relation is an atom [10] of $L$. It is known (see [10]) that an atom is a non-empty intersection of quotients and their complements, and that the quotients of a language are unions of its atoms. Therefore, we can write $A_S$ for an atom, where $S$ is a set of quotients of $L$. Then $A_S$ is the intersection of the quotients of $L$ from $S$ together with the complements of the quotients of $L$ outside $S$.

2.2. Finite automata and transformations. A deterministic finite automaton (DFA) is a tuple $D = (Q, \Sigma, \delta, q_0, F)$, where:
- $Q$ is a finite non-empty set of states;
- $\Sigma$ is a finite non-empty alphabet;
- $\delta\colon Q \times \Sigma \to Q$ is the transition function;
- $q_0 \in Q$ is the initial state;
- $F \subseteq Q$ is the set of final states.

We extend $\delta$ to a function $\delta\colon Q \times \Sigma^* \to Q$ as usual: for $q \in Q$, $w \in \Sigma^*$, and $a \in \Sigma$, we have $\delta(q, \varepsilon) = q$ and $\delta(q, wa) = \delta(\delta(q, w), a)$, where $\varepsilon$ denotes the empty word.

A state $q \in Q$ is reachable if there exists a word $w \in \Sigma^*$ such that $\delta(q_0, w) = q$. Two states $p, q \in Q$ are distinguishable if there exists a word $w \in \Sigma^*$ such that either $\delta(p, w) \in F$ and $\delta(q, w) \notin F$, or $\delta(p, w) \notin F$ and $\delta(q, w) \in F$. A DFA is minimal if there is no DFA with a smaller number of states that recognizes the same language. It is well known that this is equivalent to every state being reachable and every pair of distinct states being distinguishable. Given a regular language $L$, all its minimal DFAs are isomorphic, and their number of states is equal to the number of left quotients $\kappa(L)$ (see e.g. [2]).
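Because $\kappa(L)$ equals the number of states of a minimal DFA, it can be computed from any complete DFA. We discard the unreachable states and then merge the indistinguishable ones. The sketch below does this with Moore's partition refinement, a standard textbook method chosen here for brevity; it is not a procedure taken from the paper. Encoding each letter as a list of successor states is our own assumption.

```python
from typing import Dict, List, Set


def state_complexity(letters: Dict[str, List[int]], initial: int, final: Set[int]) -> int:
    """Number of states of the minimal DFA equivalent to a complete DFA.

    letters[a][q] is the state reached from q by the letter a.
    """
    transitions = list(letters.values())
    # 1. Keep only reachable states.
    reachable, stack = {initial}, [initial]
    while stack:
        q = stack.pop()
        for t in transitions:
            if t[q] not in reachable:
                reachable.add(t[q])
                stack.append(t[q])
    # 2. Moore's refinement: start from {final, non-final} and split classes
    #    by the classes of the successors until the number of classes is stable.
    block = {q: int(q in final) for q in reachable}
    while True:
        signature = {q: (block[q],) + tuple(block[t[q]] for t in transitions)
                     for q in reachable}
        ids = {sig: i for i, sig in enumerate(sorted(set(signature.values())))}
        new_block = {q: ids[signature[q]] for q in reachable}
        if len(set(new_block.values())) == len(set(block.values())):
            return len(set(new_block.values()))
        block = new_block


# DFA of {aaa} with a duplicated empty state (4 and 5) and an unreachable state 6.
print(state_complexity({"a": [1, 2, 3, 4, 5, 4, 6]}, 0, {3}))  # 5
```

The example DFA has seven states, but only five quotients. The two empty states are merged, and the unreachable state is dropped.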
Consequently, the quotient complexity $\kappa(L)$, which equals the number of states of a minimal DFA of $L$, is also called the state complexity of $L$. If a DFA is minimal, then every state $q$ corresponds to a quotient of the language, namely the set of words $w$ such that $\delta(q, w) \in F$. We denote this quotient by $K_q$. For a subset $S$ of states, we also write
$$A_S = \bigcap_{q \in S} K_q \cap \bigcap_{q \in Q \setminus S} \overline{K_q},$$
which is an atom if $A_S$ is non-empty. A state $q$ is empty if $K_q = \emptyset$.

Throughout the paper, $D_n$ denotes a DFA with $n$ states. Without loss of generality, we always assume that its set of states is $Q = \{0, \dots, n-1\}$ and that its initial state is $0$.

In any DFA $D_n$, every letter $a \in \Sigma$ induces a transformation $\delta_a$ on the set $Q$ of $n$ states. By $\mathcal{T}_n$ we denote the set of all $n^n$ transformations of $Q$; then $\mathcal{T}_n$ is a monoid under composition. For two transformations $t_1, t_2$ of $Q$, we denote their composition by $t_1 t_2$. It applies $t_1$ first, so that $q(t_1 t_2) = (qt_1)t_2$. The transformation induced by a word $w \in \Sigma^*$ is denoted by $\delta_w$. The image of $q \in Q$ under a transformation $\delta_w$ is denoted by $q\delta_w$, and the image of a subset $S \subseteq Q$ is $S\delta_w = \{q\delta_w \mid q \in S\}$. The preimage of a subset $S \subseteq Q$ under a transformation $\delta_w$ is $S\delta_w^{-1} = \{q \in Q \mid q\delta_w \in S\}$. Note that if $w = a_1 \cdots a_k$, then $\delta_w^{-1} = \delta_{a_k}^{-1} \cdots \delta_{a_1}^{-1}$. The identity transformation is denoted by $\mathbf{1}$; it equals $\delta_\varepsilon$, and $q\mathbf{1} = q$ for all $q \in Q$.

The transition semigroup $T(n)$ of $D_n$ is the semigroup of all transformations generated by the transformations induced by $\Sigma$. The transition semigroup of a minimal DFA of a language $L$ is isomorphic to the syntactic semigroup of $L$ [22]. Hence, the syntactic complexity $\sigma(L)$ is equal to the cardinality $|T(n)|$ of the transition semigroup $T(n)$.

A transformation $t$ of $Q$ can be viewed as a directed graph in which every vertex has out-degree 1, possibly with loops. We therefore transfer well-known graph terminology to transformations. The in-degree of a state $q \in Q$ is the cardinality $|\{p \in Q \mid pt = q\}|$.
A cycle in $t$ is a sequence of states $q_1, \dots, q_k$ with $k \ge 2$ such that $q_i t = q_{i+1}$ for $i = 1, \dots, k-1$, and $q_k t = q_1$. A fixed point of $t$ is a state $q$ such that $qt = q$; in particular, fixed points are not cycles. A transformation that maps a subset $S$ to a state $q$ and fixes all the other states is denoted by $(S \to q)$. If $S$ is a singleton $\{p\}$, we write $(p \to q)$ for short. A transformation that has a cycle $q_1, \dots, q_k$ and fixes all the other states is denoted by $(q_1, \dots, q_k)$.

A nondeterministic finite automaton (NFA) is a tuple $N = (Q, \Sigma, \delta, I, F)$. Here $Q$, $\Sigma$, and $F$ are defined as in a DFA, $I$ is the set of initial states, and $\delta\colon Q \times (\Sigma \cup \{\varepsilon\}) \to 2^Q$ is the transition function.

2.3. Most complex languages. A stream is a sequence $(L_k, L_{k+1}, \dots)$ of regular languages in some class, where $n$ is the state complexity of $L_n$. A dialect $L'_n$ of a language $L_n$ is a language that differs only slightly from $L_n$. There are various types of dialects, depending on what changes are allowed. A permutational dialect (or permutationally equivalent dialect) is a language in which letters may be permuted or deleted.

Let $\pi\colon \Sigma \to \Sigma$ be a partial permutation. If $L_n(a_1, \dots, a_k)$ is a language over the alphabet $\Sigma = \{a_1, \dots, a_k\}$, then we write $L_n(\pi(a_1), \dots, \pi(a_k))$ for the language in which every letter $a_i$ is replaced by $\pi(a_i)$. If a letter $a_i$ is removed, that is, $\pi(a_i)$ is undefined, we leave the corresponding position empty, and the words containing $a_i$ are removed. For example, if $L = \{a, ab, abc\}$, then $L(b, a, \ ) = \{b, ba\}$.

A stream is most complex in its class if all its languages and all pairs of languages, together with their dialects, meet all the bounds for:
- the state complexities of basic operations;
- the syntactic complexity;
- the number and the complexities of atoms.

Note that binary operations were defined for languages over the same alphabet. Therefore, if the alphabet is not constant in the stream, the bounds for binary Boolean operations must be met using, for every pair of languages, dialects that restrict both languages to the same alphabet.
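The example $L(b, a, \ ) = \{b, ba\}$ can be reproduced mechanically for finite languages. In this small sketch, the partial permutation $\pi$ is a Python dictionary in which `None` marks a deleted letter. This encoding and the function name are our own, chosen only for illustration.

```python
from typing import Dict, Iterable, Optional, Set


def dialect(language: Iterable[str], pi: Dict[str, Optional[str]]) -> Set[str]:
    """Permutational dialect of a finite language under a partial permutation pi.

    pi[x] is the letter replacing x, or None if x is deleted; a word that
    contains a deleted letter does not occur in the dialect.
    """
    result: Set[str] = set()
    for word in language:
        images = [pi[x] for x in word]
        if all(y is not None for y in images):
            result.add("".join(y for y in images if y is not None))
    return result


print(sorted(dialect({"a", "ab", "abc"}, {"a": "b", "b": "a", "c": None})))  # ['b', 'ba']
```

Here $a$ and $b$ are swapped and $c$ is deleted. As a result, $abc$ disappears, while $a$ and $ab$ become $b$ and $ba$.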
Sometimes we restrict attention to only some of these measures. In some cases, this allows us to provide an essentially simpler stream over a smaller alphabet. In particular, suppose that syntactic complexity requires a large alphabet, while a constant number of letters suffices for the basic operations. Then it is desirable to provide a separate stream that is most complex just for the basic operations.

Dialects are necessary for most complex streams of languages, since otherwise the streams would not be able to meet the upper bounds in most classes. In particular, since $L_n \cup L_n = L_n$, the state complexity of union would be at most $n$ in this case. Other kinds of dialects are possible (e.g. [7]), though permutational dialects are the most restricted.

2.4. Bifix-free languages. A language $L$ is prefix-free if there are no words $u, v \in \Sigma^+$ such that $uv \in L$ and $u \in L$. A language $L$ is suffix-free if there are no words $u, v \in \Sigma^+$ such that $uv \in L$ and $v \in L$. A language is bifix-free if it is both prefix-free and suffix-free.

The following properties of minimal DFAs recognizing prefix-free, suffix-free, and bifix-free languages, adapted to our terminology, are well known (see e.g. [5, 6, 13, 24]).

Lemma 1. Let $D_n = (Q, \Sigma, \delta, 0, F)$ be a minimal DFA recognizing a non-empty language $L_n$. Then $L_n$ is bifix-free if and only if:
(1) There is an empty state, which is $n-1$ by convention, that is, $n-1$ is not final and $(n-1)\delta_a = n-1$ for all $a \in \Sigma$.
(2) There exists exactly one final state, which is $n-2$ by convention, and its quotient is $\{\varepsilon\}$; thus $(n-2)\delta_a = n-1$ for all $a \in \Sigma$.
(3) For $u \in \Sigma^+$ and $q \in Q \setminus \{0\}$, if $q\delta_u \ne n-1$, then $0\delta_u \ne q\delta_u$.

Conditions (1) and (2) are sufficient and necessary for a language to be prefix-free. Conditions (1) and (3) are sufficient and necessary for a language to be suffix-free. It follows that a minimal DFA recognizing a non-empty bifix-free language must have $n \ge 3$ states.
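Lemma 1 turns bifix-freeness of a minimal DFA into conditions that can be checked mechanically. Conditions (1) and (2) are local. Condition (3) can be decided by exploring all pairs $(0\delta_u, q\delta_u)$ with $u \in \Sigma^+$ and $q \ne 0$, since once the two components coincide they stay equal. The sketch below assumes that the input DFA is already minimal and uses the state numbering of Lemma 1, with states $0, \dots, n-1$ and initial state $0$. The function name is hypothetical.

```python
from typing import Dict, List, Set, Tuple


def is_bifix_free_dfa(n: int, letters: Dict[str, List[int]], final: Set[int]) -> bool:
    """Check conditions (1)-(3) of Lemma 1 for a minimal DFA with states 0..n-1."""
    empty, accept = n - 1, n - 2
    ts = list(letters.values())
    # (1) n-1 is a non-final state fixed by every letter.
    if empty in final or any(t[empty] != empty for t in ts):
        return False
    # (2) n-2 is the unique final state and every letter sends it to n-1.
    if final != {accept} or any(t[accept] != empty for t in ts):
        return False
    # (3) explore the pairs (0 u, q u) for non-empty words u and all q != 0.
    seen: Set[Tuple[int, int]] = {(t[0], t[q]) for t in ts for q in range(1, n)}
    stack = list(seen)
    while stack:
        p, r = stack.pop()
        if p == r and p != empty:
            return False  # 0 and q are merged outside the empty state
        for t in ts:
            nxt = (t[p], t[r])
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return True


print(is_bifix_free_dfa(5, {"a": [1, 2, 3, 4, 4]}, {3}))                      # True
print(is_bifix_free_dfa(4, {"a": [2, 2, 3, 3], "b": [1, 3, 3, 3]}, {2}))      # False
print(is_bifix_free_dfa(4, {"a": [1, 3, 3, 3], "b": [3, 2, 3, 3]}, {1, 2}))  # False
```

The three calls test three languages:
- The unary language $\{a^3\}$ passes all three conditions.
- The language $\{a, ba\}$ violates Condition (3), because $a$ is a suffix of $ba$.
- The language $\{a, ab\}$ needs two final states, so it violates Condition (2).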
States $0$, $n-2$, and $n-1$ are special in DFAs of bifix-free languages, so we denote the remaining "middle" states by $Q_M = \{1, \dots, n-3\}$. Condition (3) implies that suffix-free, and hence bifix-free, languages are non-returning (see [14]). That is, there is no non-empty word $w \in \Sigma^+$ such that $L.w = L$.

Note that for unary languages, there is exactly one bifix-free language of each state complexity $n \ge 3$, namely $\{a^{n-2}\}$. The classes of unary prefix-free, unary suffix-free, and unary bifix-free languages coincide, and we refer to them as unary free languages.

The state complexity of basic operations on bifix-free languages was studied in [5], where different witness languages were shown for particular operations. The syntactic complexity of bifix-free languages was shown to be $(n-1)^{n-3} + (n-2)^{n-3} + (n-3)2^{n-3}$ for $n \ge 6$ [24]. Moreover, the transition semigroup of a minimal DFA $D_n$ of a witness language meeting this bound must be $W^{\ge 6}_{\mathrm{bf}}(n)$. This transition semigroup contains three types of transformations and can be defined as follows.

Definition 2 (The largest bifix-free semigroup).
$$W^{\ge 6}_{\mathrm{bf}}(n) = \{\, t \in \mathcal{T}_n \mid t \text{ is of type 1, 2, or 3} \,\},$$
where
(type 1) $\{0, n-2, n-1\}t = \{n-1\}$ and $Q_M t \subseteq Q_M \cup \{n-2, n-1\}$;
(type 2) $0t = n-2$, $\{n-2, n-1\}t = \{n-1\}$, and $Q_M t \subseteq Q_M \cup \{n-1\}$;
(type 3) $0t \in Q_M$, $\{n-2, n-1\}t = \{n-1\}$, and $Q_M t \subseteq \{n-2, n-1\}$.

Following [24], we say that an unordered pair $\{p, q\}$ of distinct states from $Q_M$ is colliding in $T(n)$ if there is a transformation $t \in T(n)$ such that $0t = p$ and $rt = q$ for some $r \in Q_M$. A pair of states is focused by a transformation $u \in T(n)$ if $u$ maps both states of the pair to a single state $r \in Q_M \cup \{n-2\}$. It is known [24] that in the semigroup $W^{\ge 6}_{\mathrm{bf}}(n)$ there are no colliding pairs, and every possible pair of states is focused by some transformation. Moreover, $W^{\ge 6}_{\mathrm{bf}}(n)$ is the unique maximal transition semigroup of a minimal DFA of a bifix-free language with this property.

3. Complexity of bifix-free languages

In this section we summarize and complete known results concerning the state complexity of bifix-free regular languages.
We start from the obvious upper bound on the complexity of quotients.

Proposition 3. Let $L$ be a bifix-free language with state complexity $n$. Each (left) quotient of $L$ has state complexity at most $n-1$, except for $L$, $\{\varepsilon\}$, and $\emptyset$, which always have state complexities $n$, $2$, and $1$, respectively.

Proof. Bifix-free languages are non-returning, so state $0$ is not reachable from $0\delta_w$ for any $w \in \Sigma^+$. Hence every non-initial quotient $L.w$ is recognized by the minimal DFA of $L$ with initial state $0\delta_w$ and without state $0$. So it has state complexity at most $n-1$. ∎

3.1. Boolean operations. In [5] it was shown that $mn - (m+n)$ (for $m, n \ge 4$) is a tight upper bound for the state complexity of union and symmetric difference of bifix-free languages, and that a ternary alphabet is required to meet this bound. It was also shown there that $mn - 3(m+n-4)$ and $mn - (2m+3n-9)$ (for $m, n \ge 4$) are tight upper bounds for intersection and difference, respectively, and that a binary alphabet is sufficient to meet these bounds. Since the tight bounds are smaller for unary free languages, the size of the alphabet cannot be reduced.

It may be interesting to observe that the alphabet must be essentially larger to meet the bounds when $m = 3$.

Remark 4. For $n \ge 3$, to meet the bound $mn - (m+n)$ for union or symmetric difference with minimal DFAs $D'_3$ and $D_n$, at least $n-2$ letters are required.

Proof. For each $q \in \{1, \dots, n-2\}$, the state $(1', q)$ must be reachable in the product automaton. In $D'_3$, state $0'$ is the only state that can be mapped to $1'$ by the transformation of some letter $a$. Since $0'$ is reached only by the empty word, $(1', q)$ can be reached only from the initial state $(0', 0)$ by a single letter $a$ with $0\delta_a = q$. This means that at least $n-2$ different letters are required. ∎

3.2. Product. The tight bound for the product is $m+n-2$, which is met by unary free languages. We show that there is no other possibility for the product of bifix-free languages, that is, $L'_m L_n$ always has state complexity $m+n-2$.

Theorem 5. For $m \ge 3$, $n \ge 3$, and all bifix-free languages $L'_m$ and $L_n$, the product $L'_m L_n$ meets the bound $m+n-2$.

Proof. Let $D'_m = (Q', \Sigma, \delta', 0', \{(m-2)'\})$ and $D_n = (Q, \Sigma, \delta, 0, \{n-2\})$ be minimal DFAs for $L'_m$ and $L_n$, respectively.
We use the well-known construction of an NFA $N$ recognizing the product of two regular languages. Its set of states is $Q' \cup Q$, $0'$ is the initial state, and $n-2$ is the final state. The transition function $\delta_N$ satisfies $\delta_N(p, a) = \{q\}$ whenever $\delta'(p, a) = q$ for $p, q \in Q'$, or $\delta(p, a) = q$ for $p, q \in Q$. Also, we have the $\varepsilon$-transition $\delta_N((m-2)', \varepsilon) = \{0\}$. We determinize $N$ to $D_P$. Since every reachable subset contains exactly one state from $Q'$, we can assume that the set of states is $Q' \times 2^Q$, so $D_P = (Q' \times 2^Q, \Sigma, \delta_P, (0', \emptyset), Q' \times \{\{n-2\}\})$.

Since $m+n-2$ is an upper bound for the product, it is enough to show that at least $m+n-2$ states are reachable and pairwise distinguishable in $D_P$. We show that the following states are reachable and pairwise distinguishable:
- $((m-2)', \{0\})$;
- $(p, \emptyset)$ for each $p \in \{0', \dots, (m-3)'\}$;
- $((m-1)', \{q\})$ for each $q \in \{1, \dots, n-1\}$.

Let $R$ be the set of these states. From now on, in the context of reachability and distinguishability, we consider only the states from $R$.

Since $L'_m$ is prefix-free and $D'_m$ is minimal, every state $(p, \emptyset) \in R$ (where $p \in \{0', \dots, (m-3)'\}$) is reached from $(0', \emptyset)$ by a word reaching state $p$ in $D'_m$. Furthermore, state $((m-2)', \{0\})$ is reached by some word $w'$ from the non-empty language $L'_m$. Consider a state $((m-1)', \{q\}) \in R$, where $q \in \{1, \dots, n-1\}$. It can be reached from state $((m-2)', \{0\})$ by a word $w$ such that $0\delta_w = q$, and hence by $w'w$ from the initial state of $D_P$.

It remains to show distinguishability. Consider two distinct states $(r_1, \{q_1\})$ and $(r_2, \{q_2\})$ from $R$; then $r_1, r_2 \in \{(m-2)', (m-1)'\}$ and $q_1 \ne q_2$. These states are distinguishable by a word distinguishing $q_1$ and $q_2$ in $D_n$. Indeed, after any non-empty word the first component is $(m-1)'$, so state $0$ is never added again.

Consider two states $(p, \emptyset)$ and $(r, \{q\})$ from $R$. There exists a non-empty word $w$ such that $(p, \emptyset)\delta^P_w = ((m-2)', \{0\})$. We have $(r, \{q\})\delta^P_w = (r\delta'_w, \{q\delta_w\}) = ((m-1)', \{q\delta_w\})$. Because $L_n$ is suffix-free, state $0$ is reachable only by the empty word in $D_n$, so $q\delta_w \ne 0$. Then $((m-2)', \{0\})$ and $((m-1)', \{q\delta_w\})$ are distinguishable by the previous case.

Finally, consider two states $(p_1, \emptyset)$ and $(p_2, \emptyset)$ from $R$.
There exists a word $w$ that distinguishes $p_1$ and $p_2$ in $D'_m$; let $w$ be a shortest such word. Without loss of generality, $p_1\delta'_w = (m-2)'$ and $p_2\delta'_w \ne (m-2)'$. Since $L'_m$ is prefix-free, $p_1\delta'_v \ne (m-2)'$ for every proper prefix $v$ of $w$. Since $w$ is shortest, we also have $p_2\delta'_v \ne (m-2)'$. Then $(p_1, \emptyset)\delta^P_w = ((m-2)', \{0\})$ and $(p_2, \emptyset)\delta^P_w = (p_2\delta'_w, \emptyset)$. If $p_2\delta'_w \in \{1', \dots, (m-3)'\}$, then $((m-2)', \{0\})$ and $(p_2\delta'_w, \emptyset)$ are distinguishable by the previous case. Otherwise $p_2\delta'_w$ must be $(m-1)'$. Since $((m-1)', \emptyset)$ is equivalent to $((m-1)', \{n-1\})$, it is also distinguishable from $((m-2)', \{0\})$. ∎

3.3. Star. The tight bound for star is $n-1$, which is met by binary bifix-free languages [5]. Here we provide a complete characterization of the state complexity of $L_n^*$. We show that there are exactly two possibilities for its state complexity: $n-1$ and $n-2$. This may be compared with prefix-free languages, where there are exactly three possibilities for the state complexity of $L_n^*$: $n$, $n-1$, and $n-2$ [18].

Theorem 6. Let $n \ge 3$ and let $D_n = (Q, \Sigma, \delta, 0, \{n-2\})$ be a minimal DFA of a bifix-free language $L_n$. If the transformation of some $a \in \Sigma$ maps some state from $\{0, \dots, n-3\}$ to $n-1$, then $L_n^*$ has state complexity $n-1$. Otherwise it has state complexity $n-2$.

Proof. Let $N = (Q, \Sigma, \delta_N, \{0\}, \{0\})$ be the NFA obtained from $D_n$ by the standard construction for $L_n^*$. That is, $\delta_N(p, a) = \{q\}$ whenever $\delta(p, a) = q$, and there is the $\varepsilon$-transition $\delta_N(n-2, \varepsilon) = \{0\}$. Let $D_S = (2^Q, \Sigma, \delta_S, \{0\}, \{S \subseteq Q \mid 0 \in S\})$ be the DFA obtained from $N$ by the powerset construction. In $D_n$, only $n-2$ is final, and every letter maps it to the empty state $n-1$. Hence the only subsets reachable in $D_S$ are those of the forms $\{q\}$, $\{q, n-1\}$, $\{0, n-2\}$, and $\{0, n-2, n-1\}$, where $q \in Q$. Since $n-1$ is empty, the subsets $\{q\}$ and $\{q, n-1\}$ with $q \in \{0, \dots, n-2\}$ are equivalent. The subsets $\{0\}$, $\{0, n-2\}$, and $\{0, n-2, n-1\}$ are also equivalent: all of them are final, and every letter sends $n-2$ to the empty state.

First observe that every subset $\{q\}$ with $q \in \{0, \dots, n-3\}$ is reachable in $D_S$ by a word reaching $q$ in $D_n$.
Also, suppose that $n-1$ is reachable from some state $q \in \{0, \dots, n-3\}$ in $D_n$ by the transformation of some letter. Then $\{n-1\}$ can be reached in $D_S$ from $\{q\}$ by the transformation of this letter. Otherwise, consider any word $w$ and any subset $S$ of $D_S$ containing a state from $\{0, \dots, n-3\}$. Then $S\delta^S_w$ also contains a state from $\{0, \dots, n-3\}$, so the subset $\{n-1\}$ cannot be reached.

Let $p, q \in \{0, \dots, n-3, n-1\}$ be two distinct states; we will show that $\{p\}$ and $\{q\}$ are distinguishable in $D_S$. They are distinguishable in $D_n$, so there exists a word $w \ne \varepsilon$ such that exactly one of the states $p\delta_w$ and $q\delta_w$ is the final state $n-2$. Let $w$ be a shortest such word, and assume without loss of generality that $p\delta_w = n-2$. Let $v$ be any non-empty proper prefix of $w$. Then:
- $p\delta_v \ne n-2$, because $L_n$ is prefix-free;
- $q\delta_v \ne 0$, because $L_n$ is suffix-free;
- $q\delta_v \ne n-2$, because $w$ is shortest.

Hence, in $D_S$, $\{p\}\delta^S_w = \{0, n-2\}$ and $\{q\}\delta^S_w = \{r\}$ with $r \notin \{0, n-2\}$. Thus $w$ distinguishes both subsets. ∎

3.4. Reversal. For the state complexity of the reverse of a bifix-free language, it was shown in [5, Theorem 6] that the tight upper bound is $2^{n-3}+2$ for $n \ge 3$, and that a ternary alphabet is sufficient. We show that the alphabet size cannot be reduced, and we characterize the transition semigroup of the DFAs of witness languages.

Theorem 7. For $n \ge 6$, to meet the bound $2^{n-3}+2$ for reversal, a witness language must have at least three letters. Moreover, for $n \ge 5$, the transition semigroup $T(n)$ of a minimal DFA $D_n = (Q, \Sigma, \delta, 0, \{n-2\})$ accepting a witness language must be a subsemigroup of $W^{\ge 6}_{\mathrm{bf}}(n)$.

Proof. We use the standard reversal construction. Let $N$ be the NFA obtained from the DFA $D_n$ by reversing all edges, making $n-2$ the initial state, and making $0$ the final state. Then $N = (Q, \Sigma, \delta_N, \{n-2\}, \{0\})$ and $\delta_N(q, a) = \{q\}\delta_a^{-1}$. We use the powerset construction to determinize $N$ to $D_R$. Recall that $Q_M = \{1, \dots, n-3\}$. Let $R(n)$ be the transition semigroup of $D_R$.
By the reversal construction, $R(n)$ consists of all transformations $t^{-1}$ for $t \in T(n)$. As shown in the proof of [5, Theorem 6], achieving the upper bound requires, in particular, that each subset of $Q_M$ be reachable in $D_R$.

First we show that the transition semigroup $T(n)$ of $D_n$ is a subsemigroup of $W^{\ge 6}_{\mathrm{bf}}(n)$. Every pair $\{p, q\} \subseteq Q_M$ with $p \ne q$ must be reachable in $D_R$. This means that there exists a transformation $t^{-1} \in R(n)$ such that $\{n-2\}t^{-1} = \{p, q\}$. We have $pt = n-2$ and $qt = n-2$, so $p$ and $q$ are focused by $t \in T(n)$. Hence every pair of states in $Q_M$ is focused. Since $W^{\ge 6}_{\mathrm{bf}}(n)$ is the unique maximal transition semigroup with this property [24], we have $T(n) \subseteq W^{\ge 6}_{\mathrm{bf}}(n)$.

Now we show that a binary alphabet, say $\Sigma = \{a, b\}$, is not enough to reach the upper bound. Let $a \in \Sigma$ be a letter such that $0\delta_a = q \in Q_M$. Since $T(n) \subseteq W^{\ge 6}_{\mathrm{bf}}(n)$, we have $Q_M\delta_a \subseteq \{n-2, n-1\}$. Moreover, $\{p\}\delta_a^{-1} \subseteq \{0\}$ for all $p \in Q_M$.

Also, the set $\{n-2\}\delta_b^{-1}$ contains no state of $Q_M$. Suppose it contained some state $p \in Q_M$. Since $p\delta_b = n-2$ and $p\delta_a \in \{n-2, n-1\}$, at most two of the $2^{n-4}$ subsets of $Q_M$ containing $p$ could be reachable. Then, since $n \ge 6$, not all subsets of $Q_M$ would be reachable.

Because $\{n-2, n-1\}\delta_a = \{n-2, n-1\}\delta_b = \{n-1\}$, any subset containing $n-2$ is reachable in $D_R$ only by the empty word, and any subset containing $n-1$ is unreachable. Hence, a non-empty subset $S$ of $Q_M$ must be reachable in $D_R$ by a word of the form $ab^i$, that is, $S = \{n-2\}\delta_{b^i a}^{-1}$.

Let $C$ be the set of states from $Q_M$ that are fixed points of $\delta_b$ or belong to a cycle of $\delta_b$. Observe that $\delta_b^{-1}$ preserves the size of the intersection with $C$, that is, $|T \cap C| = |T\delta_b^{-1} \cap C|$ for any subset $T \subseteq Q_M$. Thus, if $C$ is non-empty, all reachable subsets have intersections with $C$ of the same cardinality. If $C$ is empty, then $S\delta_{b^{n-3}}^{-1} \cap Q_M = \emptyset$, so at most $n-2$ subsets of $Q_M$ are reachable. In either case, since $n \ge 6$, not all subsets of $Q_M$ are reachable.
∎

For the class of all regular languages, it is known that the reverse of a language can have any state complexity in the integer range $[\log_2 n, 2^n]$ [17, 23]. Thus there are no gaps (magic numbers) in the interval of possible state complexities. The next theorem states that the situation is similar for bifix-free languages.

Theorem 8. If $L_n$ is a bifix-free language with state complexity $n \ge 3$, then the state complexity of $L_n^R$ is in $[3 + \log_2(n-2), 2 + 2^{n-3}]$. Moreover, every value in this range is the state complexity of $L_n^R$ for some bifix-free language $L_n$. This $L_n$ can be chosen so that the transition semigroup of its minimal DFA is a subsemigroup of $W^{\ge 6}_{\mathrm{bf}}(n)$.

Proof. Note that if $L_n$ is a bifix-free language, then so is $L_n^R$. Also, $(L_n^R)^R = L_n$.

Suppose for a contradiction that there is some $L_n$ whose reverse $L_n^R$ has state complexity $\alpha < 3 + \log_2(n-2)$. Then $L_n^R$ is a bifix-free language with state complexity $\alpha$ whose reverse $(L_n^R)^R = L_n$ has state complexity $n$. However, $n > 2^{\alpha-3} + 2$, so $(L_n^R)^R$ would exceed the upper bound for reversal.

Now it is enough to show that every value from $[n, 2 + 2^{n-3}]$ is attainable. Indeed, let $\alpha \in [3 + \log_2(n-2), n-1]$. Take a bifix-free language $L_\alpha$ with state complexity $\alpha$ whose reverse has state complexity $n \in [\alpha + 1, 2 + 2^{\alpha-3}]$, and use $L_n = L_\alpha^R$. Then $L_n^R = L_\alpha$ has state complexity $\alpha$.

Let $\alpha \in [n, 2 + 2^{n-3}]$. We construct the DFA $D_n = (Q, \Sigma, \delta, 0, \{n-2\})$ as follows. For each $q \in \{1, \dots, n-3\}$, we add the letter $a_q\colon (0 \to q)(\{1, \dots, n-2\} \to n-1)$. Next, we choose $\alpha - 2$ subsets of $\{1, \dots, n-3\}$. The choice always includes $\emptyset$ and all singletons $\{q\}$, while the other $\alpha - 3 - (n-3)$ subsets are chosen arbitrarily. Such a choice is always possible, since $\alpha \le 2 + 2^{n-3}$, $n \ge 3$, and there are $2^{n-3}$ subsets. For each chosen subset $S$, we add the letter $b_S\colon (S \to n-2)(Q \setminus S \to n-1)$.

Observe that $D_n$ is minimal. Every state $q \in \{1, \dots, n-3\}$ is reachable by $a_q$, and $n-2$ and $n-1$ are reachable using some $b_S$. State $0$ is distinguished from the other states, since $a_q$ maps all of them to the empty state $n-1$.
Two distinct states $p, q \in \{1, \dots, n-3\}$ are distinguished by $b_{\{p\}}$, since it maps $p$ to $n-2$ and $q$ to $n-1$. Also, the transformations of $a_q$ and $b_S$ are of type 3 and type 1 from Definition 2, respectively. Hence the transition semigroup of $D_n$ is a subsemigroup of $W^{\ge 6}_{\mathrm{bf}}(n)$.

Let $D_R = (2^Q, \Sigma, \delta^{-1}, \{n-2\}, \{S \subseteq Q \mid 0 \in S\})$ be the automaton recognizing $L_n^R$, obtained from $D_n$ by reversing the edges and determinizing. We show that exactly $\alpha$ subsets are reachable in $D_R$.

The transformation $\delta_{b_S}^{-1}$ maps $\{n-2\}$ to $S$. For any non-empty chosen $S$, the transformation $\delta_{a_q}^{-1}$ with $q \in S$ then maps $S$ to $\{0\}$. Thus $\{0\}$, $\{n-2\}$, and the $\alpha - 2$ chosen subsets $S$ are reachable.

No other subset is reachable:
- No subset containing $n-1$ can be reached, and $\{n-2\}$ is the only reachable subset containing $n-2$.
- $\{0\}$ is reachable only by the transformations $\delta_{a_q}^{-1}$, and these map every reachable subset either to $\{0\}$ or to $\emptyset$. Hence no other subset containing $0$ can be reached.
- The transformations $\delta_{a_q}^{-1}$ never produce a subset containing a state from $\{1, \dots, n-3\}$. So to reach a subset $S \subseteq \{1, \dots, n-3\}$ we must use the transformation of some letter $b_S$. This is possible only from $\{n-2\}$, and only for the $\alpha - 2$ chosen subsets.

Finally, any two distinct subsets $S, S' \subseteq \{1, \dots, n-3\}$ are distinguished by the letter $a_q$ for any $q \in S \oplus S'$. The subset $\{0\}$ is the only final subset. The subset $\{n-2\}$ is distinguished as the only subset that the letter $b_{\{q\}}$ does not map to $\emptyset$. ∎

3.5. Atom complexities. Here we prove a tight upper bound on the number and the state complexities of atoms of a bifix-free language. Recall that for $S \subseteq Q$, an atom is a non-empty set $A_S = \bigcap_{q \in S} K_q \cap \bigcap_{q \in Q \setminus S} \overline{K_q}$. Then for any $w \in \Sigma^*$ we have
$$A_S.w = \{u \mid wu \in A_S\} = \bigcap_{q \in S} K_q.w \cap \bigcap_{q \in Q \setminus S} \overline{K_q.w}.$$
A quotient of a quotient of $L$ is also a quotient of $L$. Therefore $A_S.w$ has the form
$$A_S.w = \bigcap_{q \in X} K_q \cap \bigcap_{q \in Y} \overline{K_q},$$
where $|X| \le |S|$, $|Y| \le n - |S|$, and $X, Y \subseteq Q$ are disjoint.

Using the approach of Iván [16], we define the DFA $D_S = (Q_S, \Sigma, \delta, (S, Q \setminus S), F_S)$ as follows:
• $Q_S = \{(X, Y) \mid X, Y \subseteq Q,\ X \cap Y = \emptyset\} \cup \{\bot\}$.
• For all $a \in \Sigma$, $(X, Y)a = (X\delta_a, Y\delta_a)$ if $X\delta_a \cap Y\delta_a = \emptyset$, and $(X, Y)a = \bot$ otherwise; also $\bot a = \bot$.
• $F_S = \{(X, Y) \mid X \subseteq \{n-2\},\ Y \subseteq Q \setminus \{n-2\}\}$.

Then the DFA $D_S$ recognizes $A_S$, so if $D_S$ recognizes a non-empty language, then $A_S$ is an atom. Every quotient of an atom is represented by a pair $(X, Y)$.

Theorem 9. Suppose that $L_n$ is a bifix-free language recognized by a minimal DFA $D_n = (Q, \Sigma, \delta, 0, \{n-2\})$. Then $L_n$ has at most $2^{n-3}+2$ atoms, and the quotient complexity $\kappa(A_S)$ of each atom $A_S$ satisfies
$$\kappa(A_S) \begin{cases} \le 2^{n-2}+1 & \text{if } S = \emptyset,\\ = n & \text{if } S = \{0\},\\ = 2 & \text{if } S = \{n-2\},\\ \le 3 + \sum_{x=1}^{|S|} \sum_{y=0}^{n-3-|S|} \binom{n-3}{x}\binom{n-3-x}{y} & \text{if } \emptyset \ne S \subseteq \{1, \dots, n-3\}. \end{cases}$$

Proof. We proceed similarly to the proof in [9] for the class of suffix-free languages. If $n-1 \in S$, then $A_S$ would be empty, because the quotient of $n-1$ is the empty language. So $A_S$ would not be an atom.

Suppose that $0 \in S$. Since $L_n$ is suffix-free, we have $K_0 \cap K_q = \emptyset$ for $q \ne 0$, so if $\{0, q\} \subseteq S$, then $A_S$ would be empty. Thus $S = \{0\}$. Since $K_0$ is disjoint from all other quotients, $A_{\{0\}} = K_0$. This quotient has complexity $n$, since $D_n$ is minimal.

Suppose that $n-2 \in S$. We have $K_{n-2} = \{\varepsilon\}$, and this is the only quotient containing the empty word. Hence $K_{n-2} \cap K_q = \emptyset$ for $q \ne n-2$, so if $\{n-2, q\} \subseteq S$, then $A_S$ would be empty. Thus $S = \{n-2\}$, and $A_{\{n-2\}} = \{\varepsilon\}$ has complexity $2$.

It follows that there are at most $2^{n-3}+2$ atoms: $A_{\{0\}}$, $A_{\{n-2\}}$, and $A_S$ for $S \subseteq \{1, \dots, n-3\}$.

Suppose that $S = \emptyset$. Then $A_\emptyset = \bigcap_{q \in Q} \overline{K_q}$. For any $w \in \Sigma^+$, we have $A_\emptyset.w = \bigcap_{q \in Y} \overline{K_q}$ for some $Y \subseteq Q \setminus \{0\}$. We can assume that $n-1 \in Y$, since $\overline{K_{n-1}} = \Sigma^*$. Thus there are at most $2^{n-2}$ choices of $Y$. Together with the initial quotient $A_\emptyset$, this yields the bound $2^{n-2}+1$ on the quotient complexity.

Suppose that $\emptyset \ne S \subseteq \{1, \dots, n-3\}$. Consider a non-empty quotient $A_S.w$ for some $w \in \Sigma^+$, represented as $\bigcap_{q \in X} K_q \cap \bigcap_{q \in Y} \overline{K_q}$. Then $X$ contains at least one and at most $|S|$ states from $Q \setminus \{0\}$. Also, $Y \subseteq Q \setminus \{0\}$, and $Y$ always contains $n-1$. If $n-2 \in X$ and this quotient is non-empty, then it must be $\{\varepsilon\}$. If $n-2 \in Y$, then the quotient may be represented by $(X, Y \setminus \{n-2\})$.
If $0\delta_w \in \{n-2, n-1\}$, then $Y \setminus \{n-2, n-1\}$ contains between $0$ and $n-3-|S|$ states from $Q \setminus (\{0, n-2, n-1\} \cup X)$. Suppose instead that $0\delta_w = q \in Q_M$. Since the language is suffix-free, the path in $\delta_w$ starting at $0$ must end in $n-1$. Otherwise $0\delta_w^i = p\delta_w^i \ne n-1$ for some $i$ and some state $p \ne 0$ on this path, which contradicts Lemma 1 (Condition 3). But then there exists a state $p \in Q_M$ such that $p\delta_w \in \{n-2, n-1\}$. If $p \in S$, then $n-1 \in X$, and so $(X, Y)$ represents the empty quotient. If $p \in Q \setminus S$, then again $Y \setminus \{n-2, n-1\}$ contains at most $n-3-|S|$ states.

So for every choice of $X$, the set $Y \setminus \{n-2, n-1\}$ contains between $0$ and $n-3-|S|$ states from $Q \setminus (\{0, n-2, n-1\} \cup X)$. Adding the initial quotient, the quotient $\{\varepsilon\}$, and the empty quotient yields the formula in the theorem. ∎

Theorem 10. For $n \ge 6$, let $L_n$ be the language recognized by the DFA $D = (Q, \Sigma, \delta, 0, \{n-2\})$, where $\Sigma = \{a, b, c, d, e_1, \dots, e_{n-3}\}$ and $\delta$ is defined as follows:
$\delta_a\colon (0 \to 1)((Q \setminus \{0\}) \to n-1)$,
$\delta_b\colon (\{0, n-2\} \to n-1)(1, 2)$,
$\delta_c\colon (\{0, n-2\} \to n-1)(1, \dots, n-3)$,
$\delta_d\colon (\{0, n-2\} \to n-1)(2 \to 1)$,
$\delta_{e_q}\colon (\{0, n-2\} \to n-1)(q \to n-2)$ for $q \in Q_M$.
Then $D$ is minimal, $L_n$ is bifix-free, and $L_n$ meets the upper bounds from Theorem 9 for the number and complexities of atoms.

Proof. It is easy to observe that $D$ is minimal, that it recognizes a bifix-free language $L_n$, and that its transition semigroup is a subsemigroup of $W^{\ge 6}_{\mathrm{bf}}(n)$. We show that the atom complexities $\kappa(A_S)$ meet the bounds, which also implies that there are $2^{n-3}+2$ atoms.