A Coloring Problem for Sturmian and Episturmian Words

Aldo de Luca$^1$, Elena V. Pribavkina$^2$, and Luca Q. Zamboni$^3$

$^1$ Dipartimento di Matematica, Università di Napoli Federico II, Italy
[email protected]
$^2$ Ural Federal University, Ekaterinburg, Russia
[email protected]
$^3$ Université Claude Bernard Lyon 1, France and University of Turku, Finland
[email protected]

arXiv:1301.5263v1 [math.CO] 22 Jan 2013

Abstract. We consider the following open question in the spirit of Ramsey theory: Given an aperiodic infinite word $w$, does there exist a finite coloring of its factors such that no factorization of $w$ is monochromatic? We show that such a coloring always exists whenever $w$ is a Sturmian word or a standard episturmian word.

1 Introduction

Ramsey theory (including Van der Waerden's theorem) (see [5]) is a topic of great interest in combinatorics with connections to various fields of mathematics. A remarkable consequence of Ramsey's Infinitary Theorem applied to combinatorics on words yields the following unavoidable regularity of infinite words$^1$:

Theorem 1. Let $A$ be a non-empty alphabet, $w$ be an infinite word over $A$, $C$ a finite non-empty set (the set of colors), and $c : \mathrm{Fact}^+ w \to C$ any coloring of the set $\mathrm{Fact}^+ w$ of all non-empty factors of $w$. Then there exists a factorization of $w$ of the form $w = V U_1 U_2 \cdots U_n \cdots$ such that for all positive integers $i$ and $j$, $c(U_i) = c(U_j)$.

One can ask whether, given an infinite word, there exists a suitable coloring map able to avoid the monochromaticity of all factors in all factorizations of the word. More precisely, the following variant of Theorem 1 was posed as a question by T.C. Brown [3] and, independently, by the third author [11]:

Question 1. Let $w$ be an aperiodic infinite word over a finite alphabet $A$. Does there exist a finite coloring $c : \mathrm{Fact}^+ w \to C$ with the property that for any factoring $w = U_1 U_2 \cdots U_n \cdots$, there exist positive integers $i, j$ for which $c(U_i) \neq c(U_j)$?

$^1$ Actually, the proof of Theorem 1 given by Schützenberger in [10] does not use Ramsey's theorem.

Let us observe that for periodic words the answer to the preceding question is trivially negative. Indeed, let $w = U^{\omega}$, and $c : \mathrm{Fact}^+ w \to C$ be any finite coloring. By factoring $w$ as $w = U_1 U_2 \cdots U_n \cdots$, where for all $i \ge 1$, $U_i = U$, one has $c(U_i) = c(U_j)$ for all positive integers $i$ and $j$. It is easy to see that there exist non-recurrent infinite words $w$ and finite colorings such that for any factoring $w = U_1 U_2 \cdots U_n \cdots$ there exist $i \neq j$ for which $c(U_i) \neq c(U_j)$. For instance, consider the infinite word $w = ab^{\omega}$ and define the coloring map as follows: for any non-empty factor $U$ of $w$, $c(U) = 1$ if it contains $a$ and $c(U) = 0$ otherwise. Then for any factoring $w = U_1 U_2 \cdots U_n \cdots$, $c(U_1) = 1$ and $c(U_i) = 0$ for all $i > 1$.

It is not very difficult to prove that there exist infinite recurrent words for which Question 1 has a positive answer, for instance square-free, overlap-free words, and standard Sturmian words [11].

In this paper we show that Question 1 has a positive answer for every Sturmian word where the number of colors is equal to 3. This solves a problem raised in [3] and in [11]. The proof requires some noteworthy new combinatorial properties of Sturmian words. Moreover, we prove that the same result holds true for aperiodic standard episturmian words by using a number of colors equal to the number of distinct letters occurring in the word plus one.

For all definitions and notation not explicitly given in the paper, the reader is referred to the books of M. Lothaire [7,8]; for Sturmian words see [8, Chap. 2] and for episturmian words see [4,6] and the survey of J. Berstel [1].

2 Sturmian words

There exist several equivalent definitions of Sturmian words.
In particular, we recall (see, for instance, Theorem 2.1.5 of [8]) that an infinite word $s \in \{a,b\}^{\omega}$ is Sturmian if and only if it is aperiodic and balanced, i.e., for all factors $u$ and $v$ of $s$ such that $|u| = |v|$ one has:

\[ \bigl|\, |u|_x - |v|_x \,\bigr| \le 1, \quad x \in \{a,b\}, \]

where $|u|_x$ denotes the number of occurrences of the letter $x$ in $u$. Since a Sturmian word $s$ is aperiodic, it must have at least one of the two factors $aa$ and $bb$. However, from the balance property, it follows that a Sturmian word cannot have both the factors $aa$ and $bb$.

Definition 1. We say that a Sturmian word is of type $a$ (resp. $b$) if it does not contain the factor $bb$ (resp. $aa$).

We recall that a factor $u$ of a finite or infinite word $w$ over the alphabet $A$ is called right special (resp. left special) if there exist two different letters $x, y \in A$ such that $ux, uy$ (resp. $xu, yu$) are factors of $w$.

A different equivalent definition of a Sturmian word is the following: A binary infinite word $s$ is Sturmian if for every integer $n \ge 0$, $s$ has a unique left (or equivalently right) special factor of length $n$. It follows from this that $s$ is closed under reversal, i.e., if $u$ is a factor of $s$ so is its reversal $u^{\sim}$.

A Sturmian word $s$ is called standard (or characteristic) if all its prefixes are left special factors of $s$. As is well known, for any Sturmian word $s$ there exists a standard Sturmian word $t$ such that $\mathrm{Fact}\, s = \mathrm{Fact}\, t$, where for any finite or infinite word $w$, $\mathrm{Fact}\, w$ denotes the set of all its factors including the empty word.

Definition 2. Let $s \in \{a,b\}^{\omega}$ be a Sturmian word. A non-empty factor $w$ of $s$ is rich in the letter $z \in \{a,b\}$ if there exists a factor $v$ of $s$ such that $|v| = |w|$ and $|w|_z > |v|_z$.

From the aperiodicity and the balance property of a Sturmian word one easily derives that any non-empty factor $w$ of a Sturmian word $s$ is rich either in the letter $a$ or in the letter $b$ but not in both letters. Thus one can introduce for any given Sturmian word $s$ a map

\[ r_s : \mathrm{Fact}^+ s \to \{a,b\} \]

defined as: for any non-empty factor $w$ of $s$, $r_s(w) = z \in \{a,b\}$ if $w$ is rich in the letter $z$.
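As an illustration of Definition 2, the following Python sketch computes the richness of the short factors of the Fibonacci word, a standard example of a Sturmian word. The helper names (`fibonacci_prefix`, `factors`, `richness`) are hypothetical and not taken from the paper, and the sketch assumes that a prefix of length 2000 already contains every factor of length at most 8.

```python
def fibonacci_prefix(length):
    """Prefix of the Fibonacci word, the fixed point of a -> ab, b -> a."""
    word = "a"
    while len(word) < length:
        word = "".join("ab" if ch == "a" else "a" for ch in word)
    return word[:length]


def factors(word, n):
    """All distinct factors of length n occurring in the finite word."""
    return {word[i:i + n] for i in range(len(word) - n + 1)}


def richness(w, same_length_factors):
    """Letter z such that w has more z's than some factor of equal length."""
    rich = {z for z in "ab"
            if any(w.count(z) > v.count(z) for v in same_length_factors)}
    # For a factor of a Sturmian word exactly one letter qualifies.
    assert len(rich) == 1, "sample is too short or the word is not Sturmian"
    return rich.pop()


# Assumption: 2000 letters are enough to see all factors of length <= 8.
prefix = fibonacci_prefix(2000)
richness_table = {
    n: {w: richness(w, factors(prefix, n)) for w in factors(prefix, n)}
    for n in range(1, 9)
}
print(sorted(richness_table[3].items()))
# [('aab', 'a'), ('aba', 'a'), ('baa', 'a'), ('bab', 'b')]
```

For length 3 the three factors containing two $a$'s are rich in $a$, while $bab$ is rich in $b$. Since the richness of a factor depends only on its letter counts, the table also takes the same value on a factor and on its reversal.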
Clearly, $r_s(w) = r_s(w^{\sim})$ for any $w \in \mathrm{Fact}^+ s$.

For any letter $z \in \{a,b\}$ we shall denote by $\bar{z}$ the complementary letter of $z$, i.e., $\bar{a} = b$ and $\bar{b} = a$.

Lemma 1. Let $w$ be a non-empty right special (resp. left special) factor of a Sturmian word $s$. Then $r_s(w)$ is equal to the first letter of $w$ (resp. $r_s(w)$ is equal to the last letter of $w$).

Proof. Write $w = zw'$ with $z \in \{a,b\}$ and $w' \in \{a,b\}^*$. Since $w$ is a right special factor of $s$ one has that $v = w'\bar{z}$ is a factor of $s$. Thus $|w| = |v|$ and $|w|_z > |v|_z$, whence $r_s(w) = z$. Similarly, if $w$ is left special one deduces that $r_s(w)$ is equal to the last letter of $w$. □

3 Preliminary Lemmas

Lemma 2. Let $s$ be a Sturmian word such that

\[ s = \prod_{i \ge 1} U_i, \]

where the $U_i$'s are non-empty factors of $s$. If for every $i$ and $j$, $r_s(U_i) = r_s(U_j)$, then for any $M > 0$ there exists an integer $i$ such that $|U_i| > M$.

Proof. Suppose to the contrary that for some positive integer $M$ we have that $|U_i| \le M$ for each $i \ge 1$. This implies that the number of distinct $U_i$'s in the sequence $(U_i)_{i \ge 1}$ is finite, say $t$. Let $r_s(U_i) = x \in \{a,b\}$ for all $i \ge 1$ and set for each $i \ge 1$:

\[ f_i = \frac{|U_i|_x}{|U_i|}. \]

Thus $\{f_i \mid i \ge 1\}$ is a finite set of at most $t$ rational numbers. We set $r = \min\{f_i \mid i \ge 1\}$.

Let $f_x(s)$ be the frequency of the letter $x$ in $s$ defined as

\[ f_x(s) = \lim_{n \to \infty} \frac{|s_{[n]}|_x}{n}, \]

where for every $n \ge 1$, $s_{[n]}$ denotes the prefix of $s$ of length $n$. As is well known (see Prop. 2.1.11 of [8]), $f_x(s)$ exists and is an irrational number.

Let us now prove that $r > f_x(s)$. From Proposition 2.1.10 in [8] one derives that for all $V \in \mathrm{Fact}\, s$

\[ |V| f_x(s) - 1 < |V|_x < |V| f_x(s) + 1. \]

Now for any $i \ge 1$, $U_i$ is rich in the letter $x$, so that there exists $V_i \in \mathrm{Fact}\, s$ such that $|U_i| = |V_i|$ and $|U_i|_x > |V_i|_x$. From the preceding inequality one has:

\[ |U_i|_x = |V_i|_x + 1 > |V_i| f_x(s) = |U_i| f_x(s), \]

so that for all $i \ge 1$, $f_i > f_x(s)$, hence $r > f_x(s)$.

For any $n > 0$, we can write the prefix $s_{[n]}$ of length $n$ as:

\[ s_{[n]} = U_1 \cdots U_k U'_{k+1}, \]

for a suitable $k \ge 0$ and $U'_{k+1}$ a prefix of $U_{k+1}$. Thus

\[ |s_{[n]}|_x = \sum_{i=1}^{k} |U_i|_x + |U'_{k+1}|_x. \]

Since $|U_i|_x = f_i |U_i| \ge r |U_i|$ and $|U'_{k+1}| \le M$, one has

\[ |s_{[n]}|_x \ge r \sum_{i=1}^{k} |U_i| = r\,(n - |U'_{k+1}|) \ge rn - rM. \]

Thus

\[ \frac{|s_{[n]}|_x}{n} \ge r - r\,\frac{M}{n}, \]

and

\[ f_x(s) = \lim_{n \to \infty} \frac{|s_{[n]}|_x}{n} \ge r, \]

a contradiction. □

In the following we shall consider the Sturmian morphism $R_a$, that we simply denote $R$, defined as:

\[ R(a) = a \quad \text{and} \quad R(b) = ba. \tag{1} \]

For any finite or infinite word $w$, $\mathrm{Pref}\, w$ will denote the set of all its prefixes. The following holds:

Lemma 3. Let $s$ be a Sturmian word and $t \in \{a,b\}^{\omega}$ such that $R(t) = s$. If either

1) the first letter of $t$ (or, equivalently, of $s$) is $b$ or
2) the Sturmian word $s$ admits a factorization $s = U_1 \cdots U_n \cdots$, where each $U_i$, $i \ge 1$, is a non-empty prefix of $s$ terminating in the letter $a$ and $r_s(U_i) = r_s(U_j)$ for all $i, j \ge 1$,

then $t$ is also Sturmian.

Proof. Let us prove that in both cases $t$ is balanced. Suppose to the contrary that $t$ is unbalanced. Then (see Prop. 2.1.3 of [8]) there would exist $v$ such that

\[ ava,\ bvb \in \mathrm{Fact}\, t. \]

Thus

\[ aR(v)a,\ baR(v)ba \in \mathrm{Fact}\, s. \]

If $ava \notin \mathrm{Pref}\, t$, then $t = \lambda\, ava\, \mu$, with $\lambda \in \{a,b\}^+$ and $\mu \in \{a,b\}^{\omega}$. Therefore $R(t) = R(\lambda) R(ava) R(\mu)$. Since the last letter of $R(\lambda)$ is $a$, it follows that $aaR(v)a \in \mathrm{Fact}\, s$. As $baR(v)b \in \mathrm{Fact}\, s$ we reach a contradiction with the balance property of $s$.

In case 1), $t$ begins in the letter $b$, so that $ava \notin \mathrm{Pref}\, t$ and then $t$ is balanced. In case 2) suppose that $ava \in \mathrm{Pref}\, t$. This implies that $aR(v)a \in \mathrm{Pref}\, s$. From the preceding lemma, in the factorization of $s$ in prefixes there exists an integer $i > 1$ such that $|U_i| > |aR(v)a|$. Since $U_{i-1}$ terminates in $a$ and $U_{i-1} U_i \in \mathrm{Fact}\, s$, it follows that $aaR(v)a \in \mathrm{Fact}\, s$ and one contradicts again the balance property of $s$. Hence, $t$ is balanced.

Trivially, in both cases $t$ is aperiodic, so that $t$ is Sturmian. □

Let us remark that, in general, without any additional hypothesis, if $s = R(t)$, then $t$ need not be Sturmian.
For instance, if $f$ is the Fibonacci word $f = abaababaaba\cdots$, then $s = af$ is also a Sturmian word. However, it is readily verified that in this case the word $t$ such that $R(t) = s$ is not balanced (it begins with $aababb$, so it contains both $aa$ and $bb$), so that $t$ is not Sturmian.

For any finite or infinite word $w$ over the alphabet $A$, $\mathrm{alph}\, w$ denotes the set of all distinct letters of $A$ occurring in $w$. We will make use of the following lemma:

Lemma 4. Let $s$ be a Sturmian word having a factorization

\[ s = U_1 \cdots U_n \cdots, \]

where for $i \ge 1$, $U_i$ are non-empty prefixes of $s$. Then for any $p \ge 1$, $U_1 \neq c^p$ where $c$ is the first letter of $s$.

Proof. Suppose that $U_1 = c^p$. Since $s$ is aperiodic there exists a minimal integer $j$ such that $\mathrm{card}(\mathrm{alph}\, U_j) = 2$. Since $U_j$ is a prefix of $s$, one has then $U_1 \cdots U_{j-1} U_j = U_j \xi$, with $\xi \in \{a,b\}^*$. As $U_1 \cdots U_{j-1} = c^q$ for a suitable $q \ge p$, it follows that $\xi = c^q$ and $U_j \in cc^*$, a contradiction. □

4 Main results

Proposition 1. Let $s$ be a Sturmian word of type $a$ having a factorization

\[ s = U_1 \cdots U_n \cdots, \]

where for $i \ge 1$, $U_i$ are non-empty prefixes of $s$ such that $r_s(U_i) = r_s(U_j)$ for all $i, j \ge 1$. Then one of the following two properties holds:

i) All $U_i$, $i \ge 1$, terminate in the letter $a$.
ii) For all $i \ge 1$, $U_i a \in \mathrm{Pref}\, s$.

Proof. Let us first suppose that $s$ begins in the letter $b$. All prefixes $U_i$, $i \ge 1$, of $s$ begin in the letter $b$ and, as $s$ is of type $a$, have to terminate in the letter $a$. Thus in this case Property i) is satisfied. Let us then suppose that $s$ begins in the letter $a$. Now either all prefixes $U_i$, $i \ge 1$, terminate in the letter $a$, or all prefixes $U_i$, $i \ge 1$, terminate in the letter $b$, or some of the prefixes terminate in the letter $a$ and some in the letter $b$. We have then to consider the following cases:

Case 1. All prefixes $U_i$, $i \ge 1$, terminate in the letter $b$.

Since $s$ is of type $a$, none of the prefixes $U_i$, $i \ge 1$, can be a right special factor. This implies that $U_i a \in \mathrm{Pref}\, s$ and Property ii) is satisfied.

Case 2. Some of the prefixes $U_i$, $i \ge 1$, terminate in the letter $a$ and some in the letter $b$.
We have to consider two subcases:

a) $r_s(U_i) = b$, for all $i \ge 1$.

As all $U_i$, $i \ge 1$, begin in $a$, if any $U_i$ were right special, then by Lemma 1, $r_s(U_i) = a$, a contradiction. It follows that for all $i \ge 1$, $U_i a \in \mathrm{Pref}\, s$.

b) $r_s(U_i) = a$, for all $i \ge 1$.

Some of the prefixes $U_j$, $j \ge 1$, terminate in $a$ (since otherwise we are in Case 1). Let $U_k$ be a prefix terminating in $a$ for a suitable $k \ge 1$. If a prefix $U_i$ terminates in $b$, then $aU_i$ is not a factor of $s$. Indeed, otherwise, the word $aU_i b^{-1}$ is such that $|aU_i b^{-1}| = |U_i|$ and $|aU_i b^{-1}|_b < |U_i|_b$, so that $r_s(U_i) = b$, a contradiction. Thus one derives that all $U_l$ with $l \ge k$ terminate in $a$. Moreover, if some $U_i$ terminate in $b$, by Lemma 2 there exists $j > k$ such that $U_j$ has the prefix $U_i$, so that $U_{j-1} U_i \in \mathrm{Fact}\, s$. Since $U_{j-1}$ terminates in $a$, one has that $aU_i$ is a factor of $s$, a contradiction. Thus all $U_i$, $i \ge 1$, terminate in $a$. □

Proposition 2. Let $s$ be a Sturmian word having a factorization

\[ s = U_1 \cdots U_n \cdots, \]

where for $i \ge 1$, $U_i$ are non-empty prefixes of $s$ such that $r_s(U_i) = r_s(U_j)$ for all $i, j \ge 1$. Then there exists a Sturmian word $t$ such that

\[ t = V_1 \cdots V_n \cdots, \]

where for all $i \ge 1$, $V_i$ are non-empty prefixes of $t$, $r_t(V_i) = r_t(V_j)$ for all $i, j \ge 1$, and $|V_1| < |U_1|$.

Proof. We can suppose without loss of generality that $s$ is a Sturmian word of type $a$. From Proposition 1 either all $U_i$, $i \ge 1$, terminate in the letter $a$ or for all $i \ge 1$, $U_i a \in \mathrm{Pref}\, s$. We consider two cases:

Case 1. For all $i \ge 1$, $U_i a \in \mathrm{Pref}\, s$.

We can suppose that $s$ begins in the letter $a$. Indeed, otherwise, if the first letter of $s$ is $b$, then all $U_i$, $i \ge 1$, begin in the letter $b$ and, as $s$ is of type $a$, they have to terminate in the letter $a$. Thus the case that the first letter of $s$ is $b$ will be considered when we analyze Case 2.

We consider the injective endomorphism of $\{a,b\}^*$, $L_a$, or simply $L$, defined by

\[ L(a) = a \quad \text{and} \quad L(b) = ab. \]

Since $s$ is of type $a$, the first letter of $s$ is $a$, and $X = \{a, ab\}$ is a code having a finite deciphering delay (cf.
[2]), the word $s$ can be uniquely factorized by the elements of $X$. Thus there exists a unique word $t \in \{a,b\}^{\omega}$ such that $s = L(t)$. The following holds:

1. The word $t$ is a Sturmian word.
2. For any $i \ge 1$ there exists a non-empty prefix $V_i$ of $t$ such that $L(V_i) = U_i$.
3. The word $t$ can be factorized as $t = V_1 \cdots V_n \cdots$.
4. $|V_1| < |U_1|$.
5. For all $i, j \ge 1$, $r_t(V_i) = r_t(V_j)$.

Point 1. This is a consequence of the fact that $L$ is a standard Sturmian morphism (see Corollary 2.3.3 in Chap. 2 of [8]).

Point 2. For any $i \ge 1$, since $U_i a \in \mathrm{Pref}\, s$ and any pair $(c, a)$ with $c \in \{a,b\}$ is synchronizing for $X^{\infty} = X^* \cup X^{\omega}$ (cf. [2]), one has that $U_i \in X^*$, so that there exists $V_i \in \mathrm{Pref}\, t$ such that $L(V_i) = U_i$.

Point 3. One has $L(V_1 \cdots V_n \cdots) = U_1 \cdots U_n \cdots = s = L(t)$. Thus $t = V_1 \cdots V_n \cdots$.

Point 4. By Lemma 4, $U_1$ is not a power of $a$ so that in $U_1$ there must be at least one occurrence of the letter $b$. This implies that $|V_1| < |U_1|$.

Point 5. We shall prove that for all $i \ge 1$, $r_t(V_i) = r_s(U_i)$. From this one has that for all $i, j \ge 1$, $r_t(V_i) = r_t(V_j)$.

Since $t$ is a Sturmian word, there exists $V'_i \in \mathrm{Fact}\, t$ such that

\[ |V_i| = |V'_i| \quad \text{and either} \quad |V_i|_a > |V'_i|_a \quad \text{or} \quad |V_i|_a < |V'_i|_a. \]

In the first case $r_t(V_i) = a$ and in the second case $r_t(V_i) = b$. Let us set

\[ F_i = L(V'_i). \]

Since $U_i = L(V_i)$, from the definition of the morphism $L$ one has:

\[ |F_i|_a = |V'_i|_a + |V'_i|_b = |V'_i|, \quad |F_i|_b = |V'_i|_b. \tag{2} \]

\[ |U_i|_a = |V_i|_a + |V_i|_b = |V_i|, \quad |U_i|_b = |V_i|_b. \tag{3} \]

Let us first consider the case $r_t(V_i) = a$, i.e., $|V_i|_a = |V'_i|_a + 1$ and $|V_i|_b = |V'_i|_b - 1$. From the preceding equations one has:

\[ |F_i| = |U_i| + 1. \]

Moreover, from the definition of $L$ one has that $F_i$ begins in the letter $a$. Hence, $|a^{-1} F_i| = |U_i|$ and $|a^{-1} F_i|_a = |F_i|_a - 1 = |U_i|_a - 1$. Thus $|U_i|_a > |a^{-1} F_i|_a$. Since $a^{-1} F_i \in \mathrm{Fact}\, s$, one has

\[ r_s(U_i) = r_t(V_i) = a. \]

Let us now consider the case $r_t(V_i) = b$, i.e., $|V_i|_a = |V'_i|_a - 1$ and $|V_i|_b = |V'_i|_b + 1$. From (2) and (3) one derives:

\[ |U_i| = |F_i| + 1, \]

and $|U_i|_b > |F_i|_b$. Now $F_i a$ is a factor of $s$. Indeed, $F_i = L(V'_i)$ and for any $c \in \{a,b\}$ such that $V'_i c \in \mathrm{Fact}\, t$ one has $L(V'_i c) = F_i L(c)$. Since for any letter $c$, $L(c)$ begins in the letter $a$, it follows that $F_i a \in \mathrm{Fact}\, s$. Since $|F_i a| = |U_i|$ and $|U_i|_b > |F_i|_b = |F_i a|_b$, one has that $U_i$ is rich in $b$. Hence, $r_s(U_i) = r_t(V_i) = b$.

Case 2. All $U_i$, $i \ge 1$, terminate in the letter $a$.

We consider the injective endomorphism of $\{a,b\}^*$, $R_a$, or simply $R$, defined in (1). Since $s$ is of type $a$ and $X = \{a, ba\}$ is a prefix code, the word $s$ can be uniquely factorized by the elements of $X$. Thus there exists a unique word $t \in \{a,b\}^{\omega}$ such that $s = R(t)$. The following holds:

1. The word $t$ is a Sturmian word.
2. For any $i \ge 1$ there exists a non-empty prefix $V_i$ of $t$ such that $R(V_i) = U_i$.
3. The word $t$ can be factorized as $t = V_1 \cdots V_n \cdots$.
4. $|V_1| < |U_1|$.
5. For all $i, j \ge 1$, $r_t(V_i) = r_t(V_j)$.

Point 1. From Lemma 3, since $R(t) = s$ it follows that $t$ is Sturmian.

Point 2. For any $i \ge 1$, since $U_i$ terminates in the letter $a$ and any pair $(a, c)$ with $c \in \{a,b\}$ is synchronizing for $X^{\infty}$, one has that $U_i \in X^*$, so that there exists $V_i \in \mathrm{Pref}\, t$ such that $R(V_i) = U_i$.

Point 3. One has $R(V_1 \cdots V_n \cdots) = U_1 \cdots U_n \cdots = s = R(t)$. Thus $t = V_1 \cdots V_n \cdots$.

Point 4. By Lemma 4, $U_1$ is not a power of the first letter $c$ of $s$, so that in $U_1$ there must be at least one occurrence of the letter $\bar{c}$. This implies that $|V_1| < |U_1|$.

Point 5. We shall prove that for all $i \ge 1$, $r_t(V_i) = r_s(U_i)$. From this one has that for all $i, j \ge 1$, $r_t(V_i) = r_t(V_j)$.

Since $t$ is a Sturmian word, there exists $V'_i \in \mathrm{Fact}\, t$ such that

\[ |V_i| = |V'_i| \quad \text{and either} \quad |V_i|_a > |V'_i|_a \quad \text{or} \quad |V_i|_a < |V'_i|_a. \]

In the first case $r_t(V_i) = a$ and in the second case $r_t(V_i) = b$. Let us set

\[ F_i = R(V'_i). \]

Since $U_i = R(V_i)$, from the definition of the morphism $R$ one has that equations (2) and (3) are satisfied.

Let us first consider the case $r_t(V_i) = a$, i.e., $|V_i|_a = |V'_i|_a + 1$ and $|V_i|_b = |V'_i|_b - 1$. From the preceding equations one has:

\[ |F_i| = |U_i| + 1. \]

From the definition of the morphism $R$ one has that $F_i = R(V'_i)$ terminates in the letter $a$. Hence, $|F_i a^{-1}| = |U_i|$ and $|F_i a^{-1}|_a = |F_i|_a - 1 = |U_i|_a - 1$. Thus $|U_i|_a = |F_i a^{-1}|_a + 1$, so that $U_i$ is rich in $a$ and $r_s(U_i) = r_t(V_i) = a$.

Let us now suppose that $r_t(V_i) = b$, i.e., $|V_i|_a = |V'_i|_a - 1$ and $|V_i|_b = |V'_i|_b + 1$. From (2) and (3) one derives:

\[ |U_i| = |F_i| + 1, \]

and $|U_i|_b > |F_i|_b$. We prove that $aF_i \in \mathrm{Fact}\, s$. Indeed, $F_i = R(V'_i)$ and for any $c \in \{a,b\}$ such that $cV'_i \in \mathrm{Fact}\, t$ one has $R(c) R(V'_i) = R(c) F_i$. Note that such a letter $c$ always exists as $t$ is recurrent. Since for any letter $c$, $R(c)$ terminates in the letter $a$, it follows that $aF_i \in \mathrm{Fact}\, s$. Since $|aF_i| = |U_i|$ and $|U_i|_b > |aF_i|_b = |F_i|_b$, one has that $U_i$ is rich in $b$. Hence, $r_s(U_i) = r_t(V_i) = b$. □

Theorem 2. Let $s$ be a Sturmian word having a factorization

\[ s = U_1 \cdots U_n \cdots, \]

where each $U_i$, $i \ge 1$, is a non-empty prefix of $s$. Then there exist integers $i, j \ge 1$ such that $r_s(U_i) \neq r_s(U_j)$.

Proof. Let $s$ be a Sturmian word and suppose that $s$ admits a factorization

\[ s = U_1 \cdots U_n \cdots, \]

where for $i \ge 1$, $U_i$ are non-empty prefixes such that for all $i, j \ge 1$, $r_s(U_i) = r_s(U_j)$. Among all Sturmian words having this property we can always consider a Sturmian word $s$ such that $|U_1|$ is minimal. Without loss of generality we can suppose that $s$ is of type $a$. By Proposition 2 there exists a Sturmian word $t$ such that

\[ t = V_1 \cdots V_n \cdots, \]

where for all $i \ge 1$, $V_i$ are non-empty prefixes, $r_t(V_i) = r_t(V_j)$ for all $i, j \ge 1$, and $|V_1| < |U_1|$, which contradicts the minimality of the length of $U_1$. □

Theorem 3. Let $s$ be a Sturmian word. There exists a coloring $c$ of the non-empty factors of $s$, $c : \mathrm{Fact}^+ s \to \{0, 1, 2\}$, such that for any factorization

\[ s = V_1 \cdots V_n \cdots \]

in non-empty factors $V_i$, $i \ge 1$, there exist integers $i, j$ such that $c(V_i) \neq c(V_j)$.

Proof.
Let us define the coloring $c$ as: for any $V \in \mathrm{Fact}^+ s$

\[ c(V) = \begin{cases} 0 & \text{if } V \text{ is not a prefix of } s, \\ 1 & \text{if } V \text{ is a prefix of } s \text{ and } r_s(V) = a, \\ 2 & \text{if } V \text{ is a prefix of } s \text{ and } r_s(V) = b. \end{cases} \]

Let us suppose to the contrary that for all $i, j$, $c(V_i) = c(V_j) = x \in \{0, 1, 2\}$. If $x = 0$ we reach a contradiction as $V_1$ is a prefix of $s$ so that $c(V_1) \in \{1, 2\}$. If $x = 1$ or $x = 2$, then all $V_i$ have to be prefixes of $s$ having the same richness, but this contradicts Theorem 2. □

5 The case of standard episturmian words

An infinite word $s$ over the alphabet $A$ is called standard episturmian if it is closed under reversal and every left special factor of $s$ is a prefix of $s$. A word $s \in A^{\omega}$ is called episturmian if there exists a standard episturmian $t \in A^{\omega}$ such that $\mathrm{Fact}\, s = \mathrm{Fact}\, t$. We recall the following facts about episturmian words [4,6]:

Fact 1. Every prefix of an aperiodic standard episturmian word $s$ is a left special factor of $s$. In particular an aperiodic standard episturmian word on a two-letter alphabet is a standard Sturmian word.

Fact 2. If $s$ is a standard episturmian word with first letter $a$, then $a$ is separating, i.e., for any $x, y \in A$ if $xy \in \mathrm{Fact}\, s$, then $a \in \{x, y\}$.

For each $x \in A$, let $L_x$ denote the standard episturmian morphism [6] defined for any $y \in A$ by $L_x(y) = x$ if $y = x$ and $L_x(y) = xy$ for $x \neq y$.

Fact 3. The infinite word $s \in A^{\omega}$ is standard episturmian if and only if there exist a standard episturmian word $t$ and $a \in A$ such that $s = L_a(t)$. Moreover, $t$ is unique and the first letter of $s$ is $a$.

The following was proved in [9]:

Fact 4. A recurrent word $w$ over the alphabet $A$ is episturmian if and only if for each factor $u$ of $w$, a letter $b$ exists (depending on $u$) such that $AuA \cap \mathrm{Fact}\, w \subseteq buA \cup Aub$.

Definition 3. We say that a standard episturmian word $s$ is of type $a$, $a \in A$, if the first letter of $s$ is $a$.

Theorem 4. Let $s$ be an aperiodic standard episturmian word over the alphabet $A$ and let $s = U_1 U_2 \cdots$ be any factoring of $s$ with each $U_i$, $i \ge 1$, a non-empty prefix of $s$. Then there exist indices $i \neq j$ for which $U_i$ and $U_j$ terminate in a different letter.

Proof.
Suppose to the contrary that there exists an aperiodic standard episturmian word $s$ over the alphabet $A$ admitting a factorization $s = U_1 U_2 \cdots$ in which all $U_i$ are non-empty prefixes of $s$ ending in the same letter. Amongst all