ebook img

Counting Finite Languages by Total Word Length PDF

0.13 MB·English
Save to my drive
Quick download
Download
Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.

Preview Counting Finite Languages by Total Word Length

COUNTING FINITE LANGUAGES BY TOTAL WORD LENGTH STEFANGERHOLD Abstract. We investigate the number of sets of words that can be formed from a finite alphabet, counted by the total length of the words in the set. Anexplicit expression forthe counting sequence isderived from the generat- 0 ingfunction,andasymptotics forlargealphabetrespectivelylargetotal word 1 lengtharediscussed. Moreover,wederiveaGaussianlimitlawforthenumber 0 ofwordsinarandomfinitelanguage. 2 n a J 1. Introduction and Basic Properties 5 2 Let f = f (m) denote the number of languages (i.e., sets of words) with total n n wordlengthnoveranalphabetwithm 2symbols[1,I.37]. Forinstance,f2(2)=5 ] ≥ O and f3(2)=16, as seen from the listings C a,b , aa , ab , ba , bb { } { } { } { } { } . h respectively t a a,aa , a,ab , a,ba , a,bb , b,aa , b,ab , b,ba , b,bb , aaa , m { } { } { } { } { } { } { } { } { } aab , aba , abb , baa , bab , bba , bbb . [ { } { } { } { } { } { } { } Another value is f (3)=12, illustrated by 2 1 v aa , ab , ac , ba , bb , bc , ca , cb , cc , a,b , a,c , b,c . { } { } { } { } { } { } { } { } { } { } { } { } 2 The sequencef (2)is number A102866ofSloane’sOn-LineEncyclopediaofInte- 9 n gerSequences.1 Inthe presentnote,we willderiveanexplicitexpressionforf (m) 3 n 4 (Theorem1below),establishasymptotics(Sections2–4),andderivealimitlawfor . the number of words in a random finite language (Section 5). 1 0 The ordinary generating function (ogf) [1, I.37] 0 1 (1) F(z):= ∞ f zn =exp ∞ (−1)k−1 mzk : n k 1 mzk v n=0 k=1 − ! X X i X can be obtained by a standard procedure (the “power set construction” [1, I.2]; finite languagesare sets of sequences built from alphabet elements). Its first terms r a are (2) F(z)=1+mz+ 1m(3m 1)z2+m(13m2 1m+ 1)+O(z4). 2 − 6 − 2 3 Note that mz F(z)=exp φ(z), 1 mz (cid:18) − (cid:19) where ∞ ( 1)k−1 mzk φ(z;m)=φ(z):=exp − k 1 mzk ! k=2 − X Date:January25,2010. 2000 Mathematics Subject Classification. Primary: 05A15;Secondary: 05A16. Keywords and phrases. Finitelanguages,saddlepointmethod. 1http://www.research.att.com/˜njas/sequences/ 1 2 STEFANGERHOLD isanalyticfor z <1/√m. (Indeed,for0<ε<1/√m, z 1/√m ε,andk 2, | | | |≤ − ≥ we have mz k mz 2 m(m−1/2 ε)2 =1 ε√m(2 ε√m)=:1 ε′, | | ≤ | | ≤ − − − − whence mzk mz k | | .) 1 mzk ≤ ε (cid:12) − (cid:12) ′ The dominating singularity of(cid:12)F(z) is t(cid:12)hus located at z = 1/m, leading to the (cid:12) (cid:12) rough approximation f (m) (cid:12)mn. Clea(cid:12)rly (consider languages consisting only of n ≈ one word), we have f (m) > mn for m,n 2. We will see in Theorem 3 below n ≥ that the ratio f (m)/mn is e2√n+O(logn). n Our first result is an explicit expression for f (m), which can be obtained n from (1). To state it, we write i n, if the vector i = (i ,...,i ) Zn rep- resents a partition of n, in the sense⊢that i +2i + +ni 1=n. n ∈ ≥0 1 2 n ··· Theorem 1. For m 2 and n 1, we have ≥ ≥ A1(m)i1...An(m)in (3) f (m)= , n i !...i ! i n 1 n X⊢ where A (m):= ( 1)d 1mj/d/d, j 1. j − − ≥ Xd|j Proof. We expand the Lambert series [4] in the exponent of F(z), using the geo- metric series formula: F(z)=exp ∞ (−1)k−1 ∞ mjzkj  k  k=1 j=1 X X   ∞ =exp A (m)zn n ! n=1 X m m 2 1 1 (4) =1+ A (m)zn + A (m)zn +... n n 1! 2! ! ! n=1 n=1 X X The k-th term here can be expanded as m k k A (m)zn = (A z)i1(A z2)i2...(A zm)im n 1 2 m i ,...,i nX=1 ! i1+··X·+im=k(cid:18) 1 m(cid:19) k = i ,...,i Ai11...Aimmzi1+2i2+···+mim +O(zm+1) i1+2i1i2++··X···+·+imm=imk≤m(cid:18) 1 m(cid:19) m k (5) = Ai1...Ain zn+O(zm+1).  i ,...,i 1 n  n=1 i n (cid:18) 1 n(cid:19) Now (3) followXs fromi1+(·X4··⊢+)iann=dk(5).  (cid:3) 2. Asymptotics for Large Alphabet Size Next we derive the asymptotics of f (m) as m, the cardinality of the alphabet, n tends to infinity. Define κ and µ =µ (m) by n n n ∞ z ∞ κ zn =exp and µ zn =φ(z). n n 1 z n=0 (cid:18) − (cid:19) n=0 X X COUNTING FINITE LANGUAGES BY TOTAL WORD LENGTH 3 Notethatn!κ isSloane’sA000262(severalcombinatorialinterpretationsaregiven n on that web page), and that κ has the representation n 1 (6) κ = , n 1. n i !...i ! ≥ i n 1 n X⊢ Then we can write z f /mn =[zn]exp φ(z/m) n 1 z (cid:18) − (cid:19) =κ +κ µ /m+ +κ µ /mn. n n 1 1 0 n − ··· If the dependence of µ on m is not too strong, the first term on the right-hand n side should dominate when m . This is indeed the case: →∞ Theorem 2. If n 1 is fixed and m , we have ≥ →∞ (7) f (m) κ mn. n n ∼ Proof. Since, as m , →∞ A (m)=mj +O(mj/2), j 1, j ≥ we have Aj(m)ij =mjij(1+O(m−j/2)), whence, for i n, ⊢ A1(m)i1...An(m)in =mn(1+O(m−1/2)). The result thus follows from (3) and (6). (cid:3) Note that κ =1, κ = 3, and κ = 13, in line with (2). 1 2 2 3 6 3. Asymptotics for Large Total Word Length Theorem 3. For large total word length n, the sequence f = f (m) has the n n asymptotics φ(1/m) mne2√n (8) f , n . n ∼ 2√eπ × n3/4 →∞ More precisely, there is a full asymptotic expansion of the form φ(1/m) mne2√n (9) fn ∼ 2√eπ × n3/4 1+ cjn−j/4, n→∞. j 1 X≥   Proof. The proof of Theorem 3 is similar to the saddle point analysis [1, Exam- ple VIII.7] of exp(z/(1 z)), the ogf of κ , slightly perturbed by the presence of n − the factor φ(z). The ogf F(z) is actually Hayman-admissible [1, 2], but carrying out the saddle point method explicitly gives access to a full asymptotic expansion, and will be useful for the refined expansions required in Sections 4 and 5. Let us shift the dominating singularity from z = 1/m to z = 1. Then the integrand in Cauchy’s formula mn F(z/m) (10) f =f (m)= dz n n 2iπ zn+1 I has an approximate saddle point at z = zˆ:= 1 1/√n. We write z = zˆeiθ, where − θ =arg(z) is constrained by (11) |θ|<n−α, 23 <α< 34, 4 STEFANGERHOLD so that z lies in a small arc around the saddle point. In this range we have the uniform expansion 1 z−n−1 =exp (n+1)log 1 inθ+O(n−α) − − √n − (cid:18) (cid:18) (cid:19) (cid:19) (12) =exp √n+ 12 −inθ+O(n−1/2) , n→∞. Furthermore (cid:0) (cid:1) 1 z =n−1/2(1 iθ√n+O(n−α)), − − hence 1 (13) =√n+iθn n3/2θ2+O(n1/2 α). − 1 z − − From (12) and (13) we get z z n 1exp =exp 1 +2√n n3/2θ2 1+O(n1/2 α) . − − 1 z −2 − × − (cid:18) − (cid:19) (cid:0) (cid:1) (cid:0) (cid:1) Since φ(z/m) is analytic at z = 1, the local expansion of the integrand in (10) at the saddle point zˆis F(z/m) (14) =φ(1/m)exp 1 +2√n n3/2θ2 1+O(n1/2 α) , zn+1 −2 − × − valid as n , uniformly w.r.t(cid:0). θ in the range (11).(cid:1)No(cid:0)te that (cid:1) →∞ n−α e n3/2θ2dθ √πn 3/4, − − Z−n−α ∼ so that integrating (14) from n α to n α yields the right-hand side of (8). To − − − prove (8), it remains to show that the integral from n α to π grows slower (the − other half of the tail is handled by symmetry). There is a C >0 such that F(z/m) 1 (15) C z −nexp , z <1. zn+1 ≤ | | ℜ 1 z | | (cid:12) (cid:12) (cid:18) − (cid:19) (cid:12) (cid:12) If z =zˆeiθ lies on(cid:12)the integ(cid:12)ration contour, then the factor z n in (15) is O(e√n). (cid:12) (cid:12) | |− The remaining factor exp (1/(1 z)) decreases if θ = arg(z) increases, hence ℜ − | | | | π 1 1 exp dθ π exp Zn−α ℜ(cid:18)1−zˆeiθ(cid:19) ≤ ℜ(cid:18)1−zˆeiθ(cid:19)(cid:12)θ=n−α =exp √n n3/2−2α+(cid:12)(cid:12) O(1) . − (cid:12) (The last line is obtained by recapitulating th(cid:0)e derivation of (14), w(cid:1)ith θ = n α.) − Hence F(z/m) (16) dz exp 2√n n3/2−2α+O(1) . (cid:12)(cid:12)(cid:12)In−α<|θ|<π zn+1 (cid:12)(cid:12)(cid:12)≤ (cid:0) − (cid:1) This grows ind(cid:12)eed slower than e2√n/(cid:12)n3/4, so that the proof of (8) is complete. (cid:12) (cid:12) It remains to justify the full expansion (9). First note that it suffices to check nα that the central part dθ of the Cauchy integral has such an expansion, as n−α the tail estimate (16) li−es asymptotically below (9). The full expansion of z n 1 R − − proceeds by powers of n 1/2: − 2(j+1) z n 1 =exp √n+ 1 inθ iθ+ n j/2 . − −  2 − − j(j+2) −  j 1 X≥   COUNTING FINITE LANGUAGES BY TOTAL WORD LENGTH 5 Taking more terms in (13) and in the expansion of the analytic function φ, we readily see that the full local expansion of the integrand in (10) is of the form F(z/m) =φ(1/m)exp 1 +2√n n3/2θ2 1+ c n k/2θj , zn+1 −2 − × kj −  k,j (cid:0) (cid:1) X   where each term in the sum is o(1). The resulting Gaussian integrals are of the kind n−α θMe n3/2θ2dθ =n 3/4 n3/4−α y Me y2dy − − − Z−n−α Z−n3/4−α(cid:16)n3/4(cid:17) const n−3(M+1)/4, M even. ∼ × (Those with odd M vanish.) This finishes the proof of (9). (cid:3) 4. Joint Asymptotics Note that the limits m and n commute in the following sense: Since →∞ →∞ we have κ 1/(2√eπ)e2√n/n3/4 [1, Prop. 8.4], the right-hand side of (7) has, n ∼ as n , the same asymptotics as the right-hand side of (8) for m . We → ∞ → ∞ will now show that letting m andn tend to infinity simultaneously yields the same result, regardless of their respective speeds. Theorem 4. If both the word length and the alphabet size tend to infinity, we have 1 mne2√n f (m) , m,n . n ∼ 2√eπ × n3/4 →∞ Proof. TheresultcanbeobtainedbyanadaptionoftheproofofTheorem3. Again we use Cauchy’s formula, with the same saddle point contour as before: mn π fn(m)= zˆ−n F(zˆeiθ/m)e−i(n+1)θdθ 2π Z−π mn π zˆeiθ zˆeiθ (17) = zˆ n exp φ ;m e i(n+1)θdθ. − − 2π 1 zˆeiθ m Z−π (cid:18) − (cid:19) (cid:16) (cid:17) We will show that zˆeiθ (18) φ ;m 1, m,n , uniformly w.r.t. θ [ π,π]. m → →∞ ∈ − (cid:16) (cid:17) Assuming this we are done. Indeed, assertion (18) shows at the same time the validityofthelocalexpansion(14),withφ(1/m)replacedby1,andthepersistence of the tail estimate (16). To prove (18), notice that zˆeiθ ∞ ( 1)k−1 m1−kzˆkekiθ (19) φ ;m =exp − . m k 1 m1 kzˆkekiθ (cid:16) (cid:17) Xk=2 − − ! We have m1 kzˆkekiθ < 1 for m 2, hence | − | 2 ≥ ∞ (−1)k−1 m1−kzˆkekiθ ∞ m1−kzˆk (cid:12)(cid:12)Xk=2 k 1−m1−kzˆkekiθ(cid:12)(cid:12)≤Xk=2 (cid:12) (cid:12) k (cid:12) (cid:12) ∞ 1 (cid:12) (cid:12)= m1 k 1 − − √n k=2 (cid:18) (cid:19) X (1 1/√n)2 = − . m(1 1/m+1/(m√n)) − Thus the exponent in (19) is uniformly o(1), which establishes (18). (cid:3) 6 STEFANGERHOLD 5. The Distribution of the Number of Words A natural parameter to consider is the number W of words in a random finite n languageoftotalwordlength n. (The alphabet size m 2 is fixedthroughoutthis ≥ section.) The appropriate bivariate ogf, with z marking total word length and u marking number of words, is given by ∞ ( 1)k−1 mzkuk F(z,u):=exp − . k 1 mzk ! k=1 − X The expected number of words is then (20) E[Wn]=fn−1[zn]∂uF(z,u)|u=1. Notice that ∞ ( 1)k−1mzk (21) ∂ F(z,u) =F(z) − , u |u=1 1 mzk k=1 − X sothatthe asymptotic analysisof[zn]∂ F(z,u) isaneasyextensionofthe one u u=1 | of f = [zn]F(z) in Section 3: Close to the saddle point, the new factor resulting n from the right-hand side of (21) is 1 =√n(1+o(1)). 1 z − Hence [zn]∂ F(z,u) √nf , so that, by (20), the expectation of W satisfies u u=1 n n | ∼ E[W ] √n, n . n ∼ →∞ Similarly, one can obtain the asymptotics σ(W ) n1/4/√2 for the standard de- n ∼ viation. Theorem5. ThenumberofwordsW inarandomfinitelanguageadmitsaGauss- n ian limit law: W a n n − (0,1), n , b →N →∞ n in distribution, where the scaling constants satisfy a √n and b n1/4/√2. n n ∼ ∼ Proof. As is well known, combinatorial limit laws can often be obtained by an asymptotic analysis of the probability generating function (22) E[uWn]=fn−1[zn]F(z,u). Again,weadapttheproofofTheorem3. Ifurangesinafixedsmallneighbourhood of u=1, the expansion (14) generalizes to the uniform local expansion F(z/m,u) zn+1 =φ(1/m,u;m)exp −12u+2√un−u−1/2n3/2θ2 × 1+O(n1/2−α) , where (cid:0) (cid:1) (cid:0) (cid:1) ∞ ( 1)k−1 mzkuk φ(z,u;m):=exp − . k 1 mzk ! k=2 − X Integratingfrom θ = n α to n α, andtaking into account(8), we infer that (22) − − − has the uniform asymptotics E[uWn] exp(h (u)), n , n ∼ →∞ with φ(1/m,u;m) h (u):=2(√u 1)√n+ 1logu+log . n − 4 φ(1/m;m) COUNTING FINITE LANGUAGES BY TOTAL WORD LENGTH 7 Note that, for n , →∞ h (1)=√n+O(1), ′n h (1)= 1√n+O(1), ′n′ −2 h′n′′(1)= 43√n+O(1), so that the function h (u) satisfies the conditions of [1, Theorem9.13],itself taken n from [3]. We conclude that W h (1) n− ′n (h (1)+h (1))1/2 ′n ′n′ converges in distribution to a standard normal random variable. (cid:3) References [1] P.FlajoletandR.Sedgewick,AnalyticCombinatorics,CambridgeUniversityPress,2009. [2] W. K. Hayman, A generalisation of Stirling’s formula, J. Reine Angew. Math., 196 (1956), pp.67–95. [3] V.N.Sacˇkov,Veroyatnostnye metody v kombinatornom analize,“Nauka”, Moscow,1978. [4] H.S.Wilf,generatingfunctionology,AKPetersLtd.,Wellesley,MA,thirded.,2006. Vienna University of Technology, Wiedner Hauptstraße 8–10,A-1040 Vienna, Aus- tria E-mail address: sgerhold at fam.tuwien.ac.at

See more

The list of books you might like

Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.