Combinatorics of words

Juhani Karhumäki

The central notion of this course is a word, i.e. a finite or infinite sequence of symbols taken from a finite set. It follows immediately that the mathematical research of words emphasizes two features, namely
- discreteness, and
- noncommutativity.

In addition, an algorithmic point of view is often natural. It is probably mainly the noncommutativity which makes the field very challenging: many easily formulated problems are difficult to solve. This is connected to the general fact that the mathematical tools available for noncommutative structures are much weaker than those for commutative ones.

The first important papers on words were written by A. Thue at the beginning of the 20th century, cf. [Th1,Th2]; however, they were noticed only much later and became "classical" as late as the 1970s. A systematic study of words was initiated by M.P. Schützenberger in the 1960s. Two influential papers were [LySch] and [LeSch]. The first, and still most comprehensive, book on words appeared in 1983, cf. [Lo]. A recent survey is [CK].

Combinatorics of words is connected to many modern, as well as classical, fields of mathematics. Connections to combinatorics - actually being part of it - are obvious, but the connections to algebra are also deep. Indeed, a natural environment of a word is a free semigroup. More generally, the above connections can be illustrated as follows:

[Figure]

This course considers basic properties of words and (finite) sets of words, mainly from a combinatorial, but also from an algebraic, point of view. In more detail, the topics covered will be:
- periodicity properties,
- equations on words,
- dimension properties such as
  - freeness and
  - defect effect,
- unavoidable regularities such as
  - repetition-free words and
  - words with repetitions.

No particular prerequisites are required.
Books on the topic:

M. Lothaire, Combinatorics on words, Addison-Wesley, 1983.
C. Choffrut and J. Karhumäki, Combinatorics of words, in: G. Rozenberg and A. Salomaa (eds), Handbook of Formal Languages, Springer, 1997.
H.J. Shyr, Free monoids and languages, Hon Min Book Company, Taiwan, 1991.
G. Lallement, Semigroups and combinatorial applications, John Wiley and Sons, 1979.
J. Berstel and D. Perrin, Theory of codes, Academic Press, 1985.
D. Lind and B. Marcus, An introduction to symbolic dynamics and coding, Cambridge University Press, 1996.
M. Lothaire, Algebraic combinatorics on words, Cambridge University Press, 2002, also in http://www-igm.univ-mlv.fr/~berstel/Lothaire/index.html
J. Berstel and J. Karhumäki, Combinatorics on Words - A Tutorial, Bull. EATCS 79, 178-228, 2003, also in http://www.tucs.fi/Publications/insight.php?id=tBeKa03a&table=techreport

Historically important articles:

A. Thue, Über unendliche Zeichenreihen, Kra. Vidensk. Selsk. Skrifter. I Mat.-Nat. Kl., Christiania, Nr. 7, 1906.
A. Thue, Über die gegenseitige Lage gleicher Teile gewisser Zeichenreihen, as above, Nr. 12, 1912.
R.C. Lyndon and M.P. Schützenberger, The equation $a^m = b^n c^p$ in a free group, Michigan Math. J. 9, 289-298, 1962.
A. Lentin and M.P. Schützenberger, A combinatorial problem in the theory of free monoids, in: R.C. Bose and T.E. Dowling (eds), Combinatorial Mathematics, North Carolina Press, Chapel Hill, 112-144, 1967.

Contents

1 Notations and basic properties
2 Basics on equations
3 Repetition-free words
4 Applications of repetition-free words
5 Free monoids and semigroups
6 Dimension properties

1 Notations and basic properties

We first fix some terminology of words.

Alphabet $A$ : A nonempty finite set of symbols, like $A = \{a, b\}$.

Word $w$ : A sequence of symbols from $A$, like $(a, b, a) = aba$. A word can be finite or infinite (to the right); the latter are called $\omega$-words.
Empty word $1$ : The sequence of zero symbols.

$A^*$, $A^+$ and $A^\omega$ : Sets of all finite, finite nonempty and infinite words over $A$, respectively.

Catenation or product of words : Operation defined as
$$a_1 \cdots a_n \cdot b_1 \cdots b_m = a_1 \cdots a_n b_1 \cdots b_m.$$
Clearly, this operation is associative and the empty word is the unit element with respect to this operation. Consequently, $A^* = (A^*, \cdot)$ and $A^+ = (A^+, \cdot)$ are a monoid and a semigroup, respectively. Moreover, they are free in the following sense, and are called the free monoid and the free semigroup generated by $A$.

Free semigroup or monoid : A semigroup (or monoid) $S$ is called free if it has a subset $B$ such that each element of $S$ can be uniquely expressed as a product of elements of $B$. Such a $B$ is referred to as a free generating set of $S$, or a base of $S$.

Language $L$ over $A$ : Any subset of $A^*$.

Let $w, u \in A^*$, $a \in A$ and $L, K \subseteq A^*$.

Length of $w$, $|w|$ : The total number of letters in $w$; $|1| = 0$.

$|w|_a$ : The number of $a$'s in $w$.

Alphabet of $w$ : $\mathrm{Alph}(w) = \{a \mid |w|_a \geq 1\}$.

Factors : A word $u$ is a factor of $w$ (resp. a left factor or a prefix, a right factor or a suffix) if there exist words $x$ and $y$ such that $w = xuy$ (resp. $w = uy$, $w = xu$). All these are proper if they are different from $w$. We write $u \leq w$ (resp. $u < w$) to denote that $u$ is a prefix (resp. a proper prefix) of $w$. The set of all prefixes of $w$ is denoted by $\mathrm{pref}(w)$, while $\mathrm{pref}_k(w)$ means the prefix of length $k$ of $w$ (or $w$ itself if $|w| < k$). Similarly, by $\mathrm{suf}(w)$, for instance, we mean the set of suffixes of $w$.

Quotients : If $u \leq w$ there exists a unique $y$ such that $w = uy$. Such a $y$ is called the left quotient of $w$ by $u$, and is denoted by $u^{-1}w$. If $u$ is not a prefix of $w$, then $u^{-1}w$ is undefined, so that $(u, w) \mapsto u^{-1}w$ is a well defined partial function. Similarly we define right quotients $wu^{-1}$.
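The prefix and quotient definitions above can be sketched in a few lines of Python, with words modelled as ordinary strings and the empty word $1$ as the empty string. The function names (`pref`, `pref_k`, `left_quotient`, `right_quotient`) are illustrative choices, not notation from these notes, and returning `None` for an undefined quotient is just one possible way to model a partial function.

```python
from typing import Optional, Set


def pref(w: str) -> Set[str]:
    """Set of all prefixes of w, including the empty word and w itself."""
    return {w[:k] for k in range(len(w) + 1)}


def pref_k(w: str, k: int) -> str:
    """Prefix of length k of w, or w itself if |w| < k."""
    return w[:k]


def left_quotient(u: str, w: str) -> Optional[str]:
    """u^{-1} w: the unique y with w = u y, or None if u is not a prefix of w."""
    return w[len(u):] if w.startswith(u) else None


def right_quotient(w: str, u: str) -> Optional[str]:
    """w u^{-1}: the unique x with w = x u, or None if u is not a suffix of w."""
    return w[:len(w) - len(u)] if w.endswith(u) else None


print(left_quotient("ab", "abba"))   # ba
print(left_quotient("ba", "abba"))   # None (undefined)
print(right_quotient("abba", "ba"))  # ab
```

Slicing with `len(w) - len(u)` rather than `-len(u)` keeps the right quotient correct when `u` is the empty word.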
Reverse of $w$ : If $w = a_1 \cdots a_n$ with $a_i \in A$, then $w^R = a_n \cdots a_1$.

Factorizations : A factorization of a word $w$ is any sequence $u_1, \ldots, u_n$ of words such that
$$w = u_1 \cdots u_n. \qquad (1)$$
(1) is an $L$-factorization if all $u_i$'s are from $L$. It is natural to write
$$L^* = \{u_1 \cdots u_n \mid n \geq 0 \text{ and } u_i \in L\},$$
$$L^+ = \{u_1 \cdots u_n \mid n \geq 1 \text{ and } u_i \in L\}.$$
The sets $L^*$ and $L^+$ are a submonoid and a subsemigroup of $A^*$, respectively, the so-called submonoid and subsemigroup generated by $L$. Note that each $w$ in $L^*$ has at least one $L$-factorization, and if this is always unique, then $L^*$ is free, and $L$ is its base. Such an $L$ is called a code.

Factorizations are illustrated by pictures:

[Figure]

or

[Figure]

the latter meaning that $w = xz = zy = ztz$.

Operations for languages : For languages we have
- Boolean operations:
  - Union $L \cup K$,
  - Intersection $L \cap K$,
  - Complementation $L^c = A^* \setminus L$.
- Operations connected to the product of words:
  - Product $LK$,
  - Quotients $L^{-1}K$ and $KL^{-1}$,
  - Iterations $L^*$ and $L^+$.

Here the product and quotients are defined componentwise, i.e. for example $L^{-1}K = \{l^{-1}k \mid l \in L,\ k \in K\}$. Further, $L^+$ is usually called the 1-free iteration of $L$.

Morphism $h : A^* \to B^*$ (or $A^+ \to B^+$) : A mapping $h : A^* \to B^*$ which satisfies
$$h(ww') = h(w)h(w') \quad \text{for all } w, w' \in A^*.$$
In particular, it follows that
- $h(1) = 1$ and
- $h$ is completely specified by the words $h(a)$ with $a \in A$.

We call a morphism $h$
- 1-free if $h(a) \neq 1$ for all $a \in A$,
- periodic if $\exists z$ such that $h(a) \in z^*$ for all $a \in A$,
- uniform if $|h(a)| = |h(b)|$ for all $a, b \in A$,
- prefix (resp. suffix) if none of the words in $h(A)$ is a prefix (resp. suffix) of another,
- code if $h$ is injective.

The notion of a morphism is very important in combinatorics of words!

Finally, we define a few more special notions of words.
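Before turning to those notions, here is a small Python sketch of the morphism definitions above. It assumes a morphism is stored as a dictionary from letters to their images, which is possible because $h$ is completely specified by the words $h(a)$. The names `apply`, `is_one_free`, `is_uniform` and `is_prefix` are illustrative, and the sample morphisms `h` and `g` are chosen only for the example.

```python
from itertools import permutations
from typing import Dict

Morphism = Dict[str, str]  # letter -> image word; assumed representation


def apply(h: Morphism, w: str) -> str:
    """Extend h from letters to words: h(a_1...a_n) = h(a_1)...h(a_n)."""
    return "".join(h[a] for a in w)


def is_one_free(h: Morphism) -> bool:
    """1-free: no letter is mapped to the empty word."""
    return all(image != "" for image in h.values())


def is_uniform(h: Morphism) -> bool:
    """Uniform: all letter images have the same length."""
    return len({len(image) for image in h.values()}) <= 1


def is_prefix(h: Morphism) -> bool:
    """Prefix: for distinct letters a, b the image h(a) is not a prefix of h(b).

    Comparing images of distinct letters (rather than distinct words of h(A))
    is an interpretation chosen for this sketch.
    """
    return all(not h[b].startswith(h[a]) for a, b in permutations(h, 2))


h = {"a": "ab", "b": "ba"}   # sample morphism, chosen for illustration
g = {"a": "a", "b": "ab"}    # h(a) is a prefix of h(b), so not prefix

print(apply(h, "ab"))                                        # abba
print(apply(h, "ab" + "ba") == apply(h, "ab") + apply(h, "ba"))  # True
print(is_prefix(h), is_prefix(g))                            # True False
```

The second printed line checks the defining identity $h(ww') = h(w)h(w')$ on one pair of words; it holds by construction, since `apply` concatenates letter images.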
Conjugates : Two words $x$ and $y$ are conjugates if there exist words $u$ and $v$ such that $x = uv$ and $y = vu$; or equivalently, if they are obtained from each other by a cyclic permutation $c : A^* \to A^*$ defined as
$$c(1) = 1, \qquad c(w) = \mathrm{pref}_1(w)^{-1}\, w\, \mathrm{pref}_1(w) \quad \text{for } w \in A^+,$$
i.e. $x = c^k(y)$ for some $k$. Note that in the second factorization picture above, where $w = xz = zy$, the words $x$ and $y$ are conjugates. We denote the relation "being conjugates" by $\sim$; clearly this is an equivalence relation.

Periods of $w$ : Let $w = a_1 \cdots a_n$ with $a_i \in A$. We say that a number $p$ is a period of $w$ if
$$a_i = a_{i+p} \quad \text{for } i = 1, \ldots, n - p.$$
This can be illustrated as

[Figure]

where $u = \mathrm{pref}_p(w)$. The smallest period of $w$ is called the period of $w$, denoted by $p(w)$. The elements in the conjugacy class of $\mathrm{pref}_{p(w)}(w)$ are called cyclic roots of $w$.

Example 1. A word can have several periods. For example, the words $abababa$ and $aabaabbaabaa$ have periods $2, 4, 6$ and $7, 10, 11$, respectively. Moreover, any number $\geq |w|$ is always a period of $w$.

Primitive words : We say that a word $w \neq 1$ is primitive if it is not a proper integer power of any of its cyclic roots.

Theorem 1. A word $w \in A^+$ is primitive iff it satisfies
$$\forall z \in A^* : [\, w = z^n \Rightarrow n = 1 \ (\text{and hence } w = z) \,]. \qquad (2)$$

Proof. Clearly, (2) implies primitiveness. To prove the converse, let $w$ be primitive and suppose $w = z^n$ with $n \geq 2$. Let $r = \mathrm{pref}_{p(w)}(w)$. We can illustrate the situation as follows:

[Figure]

Since $|r|$ is the period of $w$, we have $|z| \geq |r|$. Moreover, by the primitiveness of $w$ we have $z \notin r^*$, since otherwise $w$ would be a proper power of its cyclic root $r$. Consequently, comparing the prefixes of length $|r|$ of the first two occurrences of $z$, we can write
$$r = ps = sp \quad \text{with } p, s \neq 1.$$
But now by Theorem 3 (one should check that this does not make the argument circular!) $p$ and $s$ are powers of a common nonempty word, a contradiction since $|r|$ was the period of $w$.

Note that the primitiveness of a word is often defined using condition (2).
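The notions of period, cyclic root and primitiveness can be checked by brute force on small words. The following Python sketch does this under the assumption that words are plain strings; the function names are illustrative, and `is_primitive` tests condition (2) directly rather than the definition via cyclic roots (the two agree by Theorem 1).

```python
from typing import List, Set


def periods(w: str) -> List[int]:
    """All periods p of w with 1 <= p <= |w|, i.e. w[i] == w[i+p] wherever defined."""
    n = len(w)
    return [p for p in range(1, n + 1)
            if all(w[i] == w[i + p] for i in range(n - p))]


def period(w: str) -> int:
    """The smallest period p(w) of a nonempty word w."""
    return periods(w)[0]


def conjugates(w: str) -> Set[str]:
    """Conjugacy class of w: all words vu with w = uv."""
    return {w[k:] + w[:k] for k in range(max(len(w), 1))}


def cyclic_roots(w: str) -> Set[str]:
    """Conjugacy class of the prefix of w of length p(w), for nonempty w."""
    return conjugates(w[:period(w)])


def is_primitive(w: str) -> bool:
    """Condition (2): w = z^n forces n = 1. The empty word is not primitive."""
    n = len(w)
    return n > 0 and all(w != w[:d] * (n // d)
                         for d in range(1, n) if n % d == 0)


print(periods("abababa"))       # [2, 4, 6, 7]
print(periods("aabaabbaabaa"))  # [7, 10, 11, 12]
print(cyclic_roots("abababa") == {"ab", "ba"})           # True
print(is_primitive("abababa"), is_primitive("abab"))  # True False
```

The two printed lists reproduce the periods of Example 1, together with the trivial period $|w|$.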
There exist two particularly important classes of primitive words, namely unbordered words and Lyndon words.

Unbordered words : A word $w$ is unbordered if its smallest period is $|w|$. In other words, $w$ does not contain any nonempty word both as a proper prefix and as a suffix. Of course, a word is bordered if it is not unbordered. Note that occurrences of a bordered word can overlap as factors of another word:

[Figure]

For unbordered words this is not possible.

Example 2. We give a simple construction showing how an arbitrary word over an alphabet of at least two letters can be extended to an unbordered word. Consider a word $w$. Let
$$u = w a b^{|w|}, \quad \text{where } a = \mathrm{pref}_1(w) \text{ and } b \neq a.$$
The illustration is now as follows:

[Figure]

Now, by the choice of $a$ and $b$,
- no nonempty suffix of $u$ of length $\leq |w|$ is a prefix of $u$;
- no proper prefix of $u$ of length $\geq |w|$ is a suffix of $u$.

Consequently, $u$ is unbordered.
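The construction of Example 2 is easy to try out. The Python sketch below assumes a nonempty input word $w$ given as a string and an alphabet with at least two letters; `borders`, `is_unbordered` and `extend_to_unbordered` are illustrative names, and picking the first letter different from $a$ as $b$ is an arbitrary choice.

```python
from typing import List


def borders(w: str) -> List[str]:
    """Nonempty words that are both a proper prefix and a suffix of w."""
    return [w[:k] for k in range(1, len(w)) if w[:k] == w[-k:]]


def is_unbordered(w: str) -> bool:
    """True if w has no border, i.e. its smallest period is |w|."""
    return not borders(w)


def extend_to_unbordered(w: str, alphabet: str) -> str:
    """Return u = w a b^{|w|} with a the first letter of w and b != a.

    Assumes w is nonempty and the alphabet has at least two letters.
    """
    a = w[0]
    b = next(c for c in alphabet if c != a)  # any letter other than a
    return w + a + b * len(w)


print(borders("abaab"))                          # ['ab']
u = extend_to_unbordered("aba", "ab")
print(u, is_unbordered(u))                       # abaabbb True
```

Here the bordered word $aba$ is extended to $abaabbb$, which has no border, in line with the two observations at the end of Example 2.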
