Entropy of finite random binary sequences with weak long-range correlations

S. S. Melnik and O. V. Usatenko
A. Ya. Usikov Institute for Radiophysics and Electronics, Ukrainian Academy of Science, 12 Proskura Street, 61805 Kharkov, Ukraine

arXiv:1502.07363v1 [cond-mat.stat-mech] 19 Jan 2015

We study the N-step binary stationary ergodic Markov chain and analyze its differential entropy. Supposing that the correlations are weak, we express the conditional probability function of the chain through the pair correlation function and represent the entropy as a functional of the pair correlator. Since the model uses the two-point correlators instead of the block probability, it makes it possible to calculate the entropy of strings at much longer distances than standard methods allow. A fluctuation contribution to the entropy due to finiteness of random chains is examined. This contribution can be of the same order as its regular part even at relatively short lengths of subsequences. A self-similar structure of entropy with respect to the decimation transformations is revealed for some specific forms of the pair correlation function. Application of the theory to the DNA sequence of the R3 chromosome of Drosophila melanogaster is presented.

PACS numbers: 05.40.-a, 02.50.Ga, 87.10+e

I. INTRODUCTION

At present there is a commonly accepted viewpoint that our world is complex and correlated. The most peculiar manifestations of this concept are the records of brain activity and heart beats, human and animal communication, written texts of natural languages, DNA and protein sequences, data flows in computer networks, stock indexes, sun activity, weather (the chaotic nature of the atmosphere), etc. For this reason systems with long-range interactions (and/or with long-range memory) and natural sequences with non-trivial information content have been the focus of a large number of studies in different fields of science over the past several decades.

Random sequences with a finite number of states exist as natural sequences (DNA or natural texts) or arise as a result of coarse-grained mapping of the evolution of a chaotic dynamic system into a string of symbols [1, 2]. Such sequences are very closely connected to, and are the subject of study of, the algorithmic (Kolmogorov-Solomonoff-Chaitin) complexity, artificial intelligence, information theory, compressibility of digital data, the statistical inference problem, and computability [3]. They also have many application aspects as a creative tool for designing devices and appliances with random components in their structure [4] (different wave-filters, diffraction gratings, artificial materials, antennas, converters, delay lines, etc.).

There are many methods for describing complex dynamical systems and random sequences connected with them: correlation function, fractal dimensions, multi-point probability distribution functions, and many others. One of the most convenient characteristics serving the purpose of studying complex dynamics is entropy [3, 5]. Being a measure of the information content and redundancy in a sequence of data, it is a powerful and popular tool in examination of complexity phenomena. It has been used for the analysis of a number of different dynamical systems.

A standard method of understanding and describing statistical properties of real physical systems or random sequences of data can be represented as follows. First of all, we have to analyze the sequence to find the correlation functions or the probabilities of words occurring, with the length $L$ exceeding the correlation length $R_c$ but being shorter than the length $M$ of the sequence,

$$R_c < L \ll M. \tag{1}$$

At the same time, the number $d^L$ of different words of the length $L$ composed in the alphabet containing $d$ letters has to be much less than the number $M-L$ of words in the sequence,

$$d^L \ll M - L \approx M. \tag{2}$$

The next step is to express the correlation properties of the sequence in terms of the conditional probability function (CPF) of the Markov chain, see Eq. (5) below. Note that the Markov chain should be of order $N$, which is supposed to be longer than the correlation length,

$$R_c < N. \tag{3}$$

This is the critical requirement, because the correlation length of a natural sequence of interest (e.g., written or DNA texts) is usually of the same order as the length of the sequence. In that case none of inequalities (1)-(3) can be fulfilled. Indeed, the lengths of words that could correctly represent the probability of words occurring are 4-5 letters for a real natural text of the length $10^6$ (written on an alphabet containing 27-30 letters and symbols) or of the order of 20 symbols for a coarse-grained text represented by means of a binary sequence.

Here we develop an approach that is complementary to the one exposed above. In particular, we represent the conditional probability function of the Markov chain by means of the pair correlator, which makes it possible to calculate analytically the entropy of the sequence. It should be stressed that the standard method for calculating the entropy can only take into account the short-range part of statistics. We present a theory that expresses long-range correlation properties through the correlation functions.

The scope of the paper is as follows. First, we discuss briefly the properties of the N-step additive Markov chain model and, supposing that the correlations between symbols in the sequence are weak, we express the conditional probability function by means of the pair correlation function. In the next section we represent the differential entropy in terms of the conditional probability function of the Markov chain and express the entropy as the sum of squares of the pair correlators. Then we discuss some properties of the results obtained, in particular, a property of self-similarity of entropy with respect to decimation for some particular classes of the Markov chains. Next, a fluctuation contribution to the entropy due to finiteness of random chains is examined. Some remarks on literary texts are followed by a discussion of directions in which the research can be progressed.
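The word-length limits quoted in the Introduction can be checked with a few lines of arithmetic. The sketch below is only an illustration: the helper name `max_block_length` is hypothetical, and it uses the optimistic bound $d^L \le M$ rather than the strict requirement $d^L \ll M$ of inequality (2), so usable lengths in practice are shorter still.

```python
def max_block_length(M: int, d: int) -> int:
    """Largest word length L for which d**L <= M.

    Uses integer arithmetic only, so there are no rounding issues.
    This is an optimistic bound: inequality (2) asks for d**L << M.
    """
    L = 0
    words = d  # number of distinct words of length L + 1
    while words <= M:
        L += 1
        words *= d
    return L


if __name__ == "__main__":
    M = 10**6
    for d in (27, 2):
        print(f"alphabet size {d}: L <= {max_block_length(M, d)}")
```

For $M = 10^6$ this gives 4 for a 27-letter alphabet and 19 for a binary alphabet, in line with the "4-5 letters" and "order of 20 symbols" estimates above.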
II. ADDITIVE MARKOV CHAINS

This section includes mainly introductory material. Some of the authors' results presented by Eqs. (9)-(14) were exposed earlier in Refs. [7-11].

Consider a semi-infinite sequence $A = a_0^\infty = a_0, a_1, a_2, \ldots$ of real random variables $a_i$ taken from the finite alphabet $\mathcal{A} = \{1, 2, \ldots, d\}$, $a_i \in \mathcal{A}$. The sequence $A$ is an N-step Markov chain if it possesses the following property: the probability of symbol $a_i$ to have a certain value $a$, under the condition that the values of all other symbols are given, depends only on the values of the $N$ previous symbols,

$$P(a_i = a \mid \ldots, a_{i-2}, a_{i-1}) = P(a_i = a \mid a_{i-N}, \ldots, a_{i-2}, a_{i-1}). \tag{4}$$

Note that definition (4) is valid for $i \ge N$; for $i < N$ we have to use the well known conditions of compatibility for the conditional probability functions of the lower order, Ref. [6]. Sometimes the number $N$ is also referred to as the order or the memory length of the Markov chain. The conditional probability function (CPF) $P(a_i = a \mid a_{i-N}, \ldots, a_{i-2}, a_{i-1})$ determines completely all statistical properties of the Markov chain and the method of its iterative numerical construction. If the sequence whose statistical properties we would like to analyze is given, the conditional probability function of the N-th order can be found by a standard method,

$$P(a_{N+1} = a \mid a_1, \ldots, a_N) = \frac{P(a_1, \ldots, a_N, a)}{P(a_1, \ldots, a_N)}, \tag{5}$$

where $P(a_1, \ldots, a_N)$ is the probability of the N-word $a_1, \ldots, a_N$ occurring.

The Markov chain determined by Eq. (4) is a homogeneous sequence because its conditional probability does not depend explicitly on $i$, i.e., is independent of the position of symbols $a_{i-N}, \ldots, a_{i-1}, a_i$ in the chain. It depends only on the values of symbols $a_{i-N}, \ldots, a_{i-1}, a_i$. The homogeneous sequences are stationary: the average value of any function $f(a_{r_1}, a_{r_1+r_2}, \ldots, a_{r_1+\ldots+r_s})$ of $s$ arguments,

$$\overline{f(a_{r_1}, \ldots, a_{r_1+\ldots+r_s})} = \lim_{M\to\infty} \frac{1}{M} \sum_{i=0}^{M-1} f(a_{i+r_1}, \ldots, a_{i+r_1+\ldots+r_s}), \tag{6}$$

depends on $s-1$ differences between the indexes. In other words, all statistically averaged functions of random variables are shift-invariant.

We suppose that the chain is ergodic. According to the Markov theorem (see, e.g., Ref. [6]), this property is valid for the homogeneous Markov chains if the strict inequalities

$$0 < P(a_{i+N} = \alpha \mid a_i^{i+N-1}) < 1, \qquad i \in \mathbb{N}_+ = \{0, 1, \ldots\}, \tag{7}$$

are fulfilled for all possible values of the arguments in function (4). Hereafter we use the shorter notation $a_{i-N}^{i-1}$ for the N-word $a_{i-N}, \ldots, a_{i-1}$. It follows from ergodicity that the correlations between any blocks of symbols in the chain go to zero when the distance between them goes to infinity. The other consequence of ergodicity is the possibility to use one random sequence as an equitable representative of the ensemble of chains and to do averaging over the sequence, Eq. (6), instead of an ensemble averaging.

Below we will consider an important class of binary random sequences with symbols $a_i$ taking on two values, say 0 and 1, $a_i \in \{0, 1\}$. The conditional probability to find the i-th element $a_i = 1$ in the binary N-step Markov sequence depending on the $N$ preceding elements $a_{i-N}^{i-1}$ is a set of $2^N$ numbers:

$$P(1 \mid a_{i-N}^{i-1}) = P(a_i = 1 \mid a_{i-N}^{i-1}), \qquad P(0 \mid a_{i-N}^{i-1}) = 1 - P(1 \mid a_{i-N}^{i-1}). \tag{8}$$

Conditional probability (8) of the binary sequence of random variables $a_i \in \{0, 1\}$ can be represented exactly as a finite polynomial series:

$$P(1 \mid a_{i-N}^{i-1}) = \bar a + \sum_{r_1=1}^{N} F_1(r_1)\,(a_{i-r_1} - \bar a) + \sum_{r_1, r_2=1}^{N} F_2(r_1, r_2)\,\big(a_{i-r_1} a_{i-r_2} - \overline{a_{i-r_1} a_{i-r_2}}\big) + \ldots + \sum_{r_1, \ldots, r_N=1}^{N} F_N(r_1, \ldots, r_N)\,\big(a_{i-r_1} \cdots a_{i-r_N} - \overline{a_{i-r_1} \cdots a_{i-r_N}}\big), \tag{9}$$

where the statistical averages $\overline{a_{i-r_1} \cdots a_{i-r_N}}$ are taken over the sequence, Eq. (6), $F_s$ is the family of memory functions, and $\bar a$ is the relative average number of unities in the sequence. The representation of Eq. (8) in the form of Eq. (9) follows from the simple identities $a^2 = a$ and $f(a) = a f(1) + (1-a) f(0)$, valid for an arbitrary function $f(a)$ determined on the set $a \in \{0, 1\}$. The first term in Eq. (9) is responsible for generation of uncorrelated white-noise sequences. Taking into account the second term, proportional to $F_1(r)$, we can reproduce correctly correlation properties of the chain up to the second order. Higher-order correlators and all correlation properties of higher orders are not independent anymore. We cannot control them and reproduce them correctly by means of the memory function $F(r)$, because the latter is completely determined by the pair correlation function, see Eq. (11) below. Studying the properties of these higher-order correlators is beyond the scope of this paper. In what follows we will only use the first two terms, which determine the so-called additive Markov chain [7, 8].

A particular form of the conditional probability function of the additive Markov chain is the chain with step-wise memory function,

$$P(1 \mid k) = \frac{1}{2} + \mu \left( \frac{2k}{N} - 1 \right). \tag{10}$$

The probability $P(1 \mid k)$ of having the symbol $a_i = 1$ after the N-word $a_{i-N}^{i-1}$ containing $k$ unities, $k = \sum_{l=1}^{N} a_{i-l}$, is a linear function of $k$ and is independent of the arrangement of symbols in the word $a_{i-N}^{i-1}$. The parameter $\mu$ characterizes the strength of correlations in the system.

There is a rather simple relation between the memory function $F(r)$ (hereafter we will omit the subscript 1 of $F_1(r)$) and the pair correlation function of the binary additive Markov chain. Two methods were suggested for finding the $F(r)$ of a sequence with a known pair correlation function. The first one [7] is based on the minimization of a "distance" between the Markov chain generated by means of the sought-for memory function and the initial given sequence of symbols with a known correlation function. The minimization equation yields the relationship between the correlation and memory functions,

$$K(r) = \sum_{r'=1}^{N} F(r')\, K(r - r'), \qquad r \ge 1, \tag{11}$$

where the normalized correlation function $K(r)$ is given by

$$K(r) = \frac{C(r)}{C(0)}, \qquad C(r) = \overline{(a_i - \bar a)(a_{i+r} - \bar a)}. \tag{12}$$

The second method for deriving Eq. (11) is the completely probabilistic straightforward calculation [9].

Equation (11), despite its simplicity, can be analytically solved only in some particular cases: for one- or two-step chains, the Markov chain with step-wise memory function, and so on. To avoid the difficulties in solving Eq. (11) we suppose that correlations in the sequence are weak. This means that all components of the normalized correlation function are small, $|K(r)| \ll 1$, $|r| \neq 0$, with the exception of $K(0) = 1$. So, taking into account that in the sum of Eq. (11) the leading term is $K(0) = 1$ and all the others are small, we can obtain an approximate solution for the memory function in the form of the series

$$F(r) = K(r) - \sum_{r' \neq r}^{N} K(r - r')\, K(r') + \sum_{r' \neq r}^{N} \sum_{r'' \neq r'}^{N} K(r - r')\, K(r' - r'')\, K(r'') + \ldots \tag{13}$$

The equation for the conditional probability function in the first approximation with respect to the small functions $|K(r)| \ll 1$, $|r| \neq 0$, takes the form

$$P(1 \mid a_{i-N}^{i-1}) \simeq \bar a + \sum_{r=1}^{N} F(r)\,(a_{i-r} - \bar a) \simeq \bar a + \sum_{r=1}^{N} K(r)\,(a_{i-r} - \bar a). \tag{14}$$

This formula provides a very important tool for constructing a sequence with a given pair correlation function. Note that i-independence of the function $P(1 \mid a_{i-N}^{i-1})$ guarantees homogeneity and stationarity of the sequence under consideration, and finiteness of $N$ provides its ergodicity. Evidently, we can only consider sequences with correlation functions, determined by $P(1 \mid a_{i-N}^{i-1})$, which satisfy Eq. (7).

The correlation functions are typically employed as the input characteristics for describing random sequences. However, the correlation function describes not only the direct interconnection of the elements $a_i$ and $a_{i+r}$, but also takes into account their indirect interaction via all other intermediate elements. Our approach operates with the "origin" characteristics of the system, specifically, with the memory function. The correlation and memory functions are mutually complementary characteristics of a random sequence in the following sense. The numerical analysis of a given random sequence enables one to determine directly the correlation function rather than the memory function. On the other hand, it is possible to construct a random sequence using the memory function, but not the correlation one, in the general case. Therefore, the memory function permits one to get a deeper insight into the intrinsic properties of the correlated systems. Equation (14) shows that in the limit of weak correlations both functions play the same role.

The concept of the additive Markov chain was extensively used earlier for studying random sequences with long-range correlations. The examples and references can be found in Ref. [8].
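As a rough illustration of how Eq. (14) can serve as a construction tool, the following sketch generates a binary additive Markov chain symbol by symbol. It is a minimal sketch under stated assumptions, not the authors' code: the function names are hypothetical, the memory function is taken equal to the correlation function, $F(r) \approx K(r)$, as in the weak-correlation limit, and the first $N$ symbols are simply drawn as uncorrelated white noise.

```python
import random


def power_law_K(r: int, amp: float = 0.01, exponent: float = 1.1) -> float:
    """Power-law correlation function K(r) = amp / r**exponent (r >= 1)."""
    return amp / r ** exponent


def generate_additive_chain(M, N, memory, a_bar=0.5, seed=0):
    """Generate M binary symbols using the conditional probability of Eq. (14).

    memory(r) plays the role of F(r) for r = 1..N (assumed ~ K(r) here).
    The first N symbols are uncorrelated, which is an assumption of this sketch.
    """
    rng = random.Random(seed)
    F = [memory(r) for r in range(1, N + 1)]
    chain = [1 if rng.random() < a_bar else 0 for _ in range(N)]
    for i in range(N, M):
        # P(1 | previous N symbols) = a_bar + sum_r F(r) * (a_{i-r} - a_bar)
        p = a_bar + sum(F[r - 1] * (chain[i - r] - a_bar) for r in range(1, N + 1))
        if not 0.0 < p < 1.0:
            # Eq. (7): the probability must stay strictly inside (0, 1)
            raise ValueError("memory function too strong: Eq. (7) is violated")
        chain.append(1 if rng.random() < p else 0)
    return chain


if __name__ == "__main__":
    chain = generate_additive_chain(M=5000, N=20, memory=power_law_K)
    print(sum(chain) / len(chain))  # close to a_bar for weak correlations
```

The cost is of order $M \cdot N$ operations, so the sequence lengths and cut-off values quoted for the figures would need a faster implementation than this plain-Python loop.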
III. DIFFERENTIAL ENTROPY

In order to estimate the entropy of an infinite stationary sequence $A$ of symbols $a_i$ one could use the block entropy,

$$H_L = -\sum_{a_1, \ldots, a_L} P(a_1^L) \log_2 P(a_1^L). \tag{15}$$

Here $P(a_1^L) = P(a_1, \ldots, a_L)$ is the probability to find the L-word $a_1^L$ in the sequence. The differential entropy, or entropy per symbol, is given by

$$h_L = H_{L+1} - H_L, \tag{16}$$

and specifies the degree of uncertainty of observing the (L+1)-th symbol if the preceding $L$ symbols are known. The source entropy (or Shannon entropy) is the differential entropy at the asymptotic limit, $h = \lim_{L\to\infty} h_L$. This quantity measures the average information per symbol if all correlations, in the statistical sense, are taken into account.

The differential entropy $h_L$ can be presented in terms of the conditional probability function. To show this we have to rewrite Eq. (15) for the block of length $L+1$ and express $P(a_1^{L+1})$ via the conditional probability; after a bit of algebra we obtain

$$h_L = \sum_{a_1, \ldots, a_L = 0,1} P(a_1^L)\, h(a_{L+1} \mid a_1^L) = \overline{h(a_{L+1} \mid a_1^L)}. \tag{17}$$

Here $h(a_{L+1} \mid a_1^L)$ is the amount of information contained in the (L+1)-th symbol of the sequence conditioned on the $L$ previous symbols,

$$h(a_{L+1} \mid a_1^L) = -\sum_{a_{L+1} = 0,1} P(a_{L+1} \mid a_1^L) \log_2 P(a_{L+1} \mid a_1^L). \tag{18}$$

So, the differential entropy $h_L$ of the random sequence is presented as a special case of the standard conditional entropy $H(B \mid C) = -\sum_C P(C) \sum_B P(B \mid C) \log_2 P(B \mid C)$.

The conditional probability $P(1 \mid a_{i-L}^{i-1})$ at $L < N$,

$$P(1 \mid a_{i-L}^{i-1}) \simeq \bar a + \delta; \qquad \delta = \sum_{r=1}^{L} F(r)\,(a_{i-r} - \bar a), \tag{19}$$

is obtained in the first approximation in the parameter $\delta$ from Eq. (14) by means of the probabilistic reasoning presented in the Appendix.

Taking into account the weakness of correlations, $|\delta| \ll \min[\bar a, (1 - \bar a)]$, one can expand the right-hand side of Eq. (18) in a Taylor series up to the second order in $\delta$, $h(a_{L+1} \mid a_1^L) = h_0 + (\partial h/\partial \bar a)|_{\delta=0}\, \delta + (1/2)(\partial^2 h/\partial \bar a^2)|_{\delta=0}\, \delta^2$, where the derivatives are taken at the "equilibrium point" $P(1 \mid a_{i-L}^{i-1}) = \bar a$ and $h_0$ is the entropy of the uncorrelated sequence,

$$h_0 = -\bar a \log_2(\bar a) - (1 - \bar a) \log_2(1 - \bar a). \tag{20}$$

The second derivative at this point is $\partial^2 h/\partial \bar a^2 = -1/[\bar a (1 - \bar a) \ln 2]$, while to leading order $\overline{\delta^2} \simeq \bar a (1 - \bar a) \sum_{r=1}^{L} F^2(r)$, so the factor $\bar a (1 - \bar a)$ cancels. Upon using Eq. (17) for averaging $h(a_{L+1} \mid a_1^L)$ and in view of $\overline{\delta} = 0$, the differential entropy of the sequence becomes

$$h_L = \begin{cases} h_{L \le N} = h_0 - \dfrac{1}{2 \ln 2} \displaystyle\sum_{r=1}^{L} F^2(r), \\[2mm] h_{L > N} = h_{L = N}. \end{cases} \tag{21}$$

If the length of the block exceeds the memory length, $L > N$, the conditional probability $P(1 \mid a_{i-L}^{i-1})$ depends only on the $N$ previous symbols, see Eq. (4). Then it is easy to show from Eq. (17) that the differential entropy remains constant at $L \ge N$. The second line of Eq. (21) is consistent with the first one because in the first approximation in $\delta$ the correlation function vanishes at $L > N$ together with the memory function. The final expression, the main result of the paper, for the differential entropy of the stationary ergodic binary weakly correlated random sequence is

$$h_L = h_0 - \frac{1}{2 \ln 2} \sum_{r=1}^{L} K^2(r). \tag{22}$$
IV. DISCUSSION

It follows from Eq. (22) that the additional correction to the entropy $h_0$ of the uncorrelated sequence is a negative and monotonically decreasing function of $L$. This is the anticipated result: the correlations decrease the entropy. The conclusion is not sensitive to the sign of correlations: persistent correlations, $K > 0$, describing an "attraction" of the symbols of the same kind, and anti-persistent correlations, $K < 0$, corresponding to an attraction between "0" and "1", provide corrections of the same negative sign. If the correlation function is constant at $1 \le r \le N$, the entropy is a linearly decreasing function of the argument $L$ up to the point $N$; the result is coincident with that obtained in Ref. [14] (in the limit of weak correlations) for the Markov chain model with step-wise memory function (10).

As an illustration of result (22), in Fig. 1 we present the plot of the differential entropy versus the length $L$. Both numerical and analytical results (the dotted and solid curves) are presented for the power-law correlation function $K(r) = 0.01/r^{1.1}$. The cut-off parameter $r_c$ of the power-law function for numerical generation of the sequence, coinciding with the memory length of the chain, is $10^4$. The good agreement between the curves is a manifestation of the adequacy of the additive Markov chain approach for studying entropy properties of random chains. The abrupt deviation of the dashed line from the upper analytical and numerical curves at $L \sim 10$ is the result of violation of inequality (2) and a manifestation of quickly growing errors in the entropy estimation by using the probability $P(a_1, \ldots, a_L)$ of the L-blocks occurring. Note that violation of Eq. (2) does not depend on the choice of the model parameter. It only depends on the length $M$ of the random sequence.

FIG. 1: The differential entropy vs the length of words. The solid line is the analytical result, Eq. (22), for the correlation function $K(r) = 0.01/r^{1.1}$ and $\bar a = 1/2$, whereas the dots correspond to the direct evaluations of the same Eq. (22) for the numerically constructed sequence (of the length $M = 10^8$ and the cut-off parameter $r_c = 10^4$) by means of conditional probability function (14) and the numerically evaluated correlation function $K(r)$ of the constructed sequence. The dashed line is the differential entropy, Eqs. (15) and (16), plotted by using the numerical estimation of probability $P(a_1, \ldots, a_L)$ of the L-blocks occurring in the same sequence. The inset demonstrates the linear dependence of differential entropy at large $L$ governed by fluctuations of the correlation function.

In the main panel of Fig. 1 the deviation of the numerical curve from the analytical one is nearly absent. Nevertheless, in the large scale, presented in the inset, a systematic linear deviation of the numerical result from the analytical one is clearly seen. An explanation of this phenomenon is given in the next section while discussing finite random sequences.

Our next illustration of applicability of the developed theory deals with the DNA sequence of the R3 chromosome of Drosophila melanogaster. In Fig. 2 the plot of the differential entropy versus the length $L$ is presented. We see that coincidence of the two approaches only holds for $L \lesssim 5$-$6$ units. It is difficult to conclude unambiguously which factor, finiteness of the chain and violation of Eq. (2) or strength of correlations, is more important for the discrepancy between the two theories. Nevertheless, even the observed coincidence between the two curves seems rather astonishing.

FIG. 2: Differential entropy $h$ vs length $L$ for the R3 chromosome of Drosophila melanogaster DNA of length $M \simeq 2.7 \times 10^7$. The solid line is obtained by using Eq. (22) with the numerically evaluated correlation function, Eq. (12). The dashed line is the differential entropy, Eqs. (15) and (16), plotted by using the numerical estimation of probability $P(a_1, \ldots, a_L)$ of the L-blocks occurring in the same sequence.

Markov chains with step-wise memory functions and a larger class of permutable chains are invariant under the decimation procedure [11]. Chains whose conditional probability functions are independent of the order of symbols in the N-word preceding a generated symbol are referred to as permutable. The decimation is a reduction of a random sequence by regular or random removal of some part of symbols from the whole chain. It was shown in Refs. [10, 11] that after decimation the correlation function of the indicated classes of sequences is invariant up to the new reduced memory length $N^* = \lambda N$, where $\lambda$ is the relative nonremoved part of symbols in the chain. Hence, after decimation Eq. (21) does not change its form, but instead of $N$ we have only to put the new memory length $N^*$.
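For comparison, the "standard" estimate shown by the dashed lines in Figs. 1 and 2 counts L-blocks directly, following Eqs. (15) and (16). The following is a minimal sketch of such an estimator; the function names are hypothetical and the probabilities are plain relative frequencies of overlapping words, which is an assumption about the counting scheme rather than something stated in the text.

```python
import math
from collections import Counter


def block_entropy(seq, L):
    """Block entropy H_L of Eq. (15), estimated from overlapping L-words."""
    if L == 0:
        return 0.0
    n_words = len(seq) - L + 1
    counts = Counter(tuple(seq[i:i + L]) for i in range(n_words))
    return -sum(c / n_words * math.log2(c / n_words) for c in counts.values())


def differential_entropy_blocks(seq, L):
    """Differential entropy h_L = H_{L+1} - H_L of Eq. (16)."""
    return block_entropy(seq, L + 1) - block_entropy(seq, L)


if __name__ == "__main__":
    periodic = [0, 1] * 500
    print(differential_entropy_blocks(periodic, 0))  # one bit for the first symbol
    print(differential_entropy_blocks(periodic, 1))  # close to zero: fully predictable
```

The number of distinct words grows as $2^L$, so once $2^L$ approaches the sequence length the counts become too sparse and the estimate drops artificially, which is the violation of inequality (2) discussed above.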
V. FINITE RANDOM SEQUENCES

The relative average number of unities $\bar a$, correlation functions and other statistical characteristics of random sequences are deterministic quantities only in the limit of their infinite lengths. This is a direct consequence of the law of large numbers. If the length $M$ of the sequence is finite, the set of numbers $a_1^M$ cannot be considered anymore as an ergodic sequence. In order to restore its status we have to introduce an ensemble of finite sequences $\{a_1^M\}_p$, $p \in \mathbb{N} = 0, 1, 2, \ldots$. However, we would like to retain the right to examine finite sequences, even if approximately, by using a single finite chain. So, for a finite chain we have to replace definition (12) of the correlation function by the following one,

$$C_M(r) = \frac{1}{M - r} \sum_{i=0}^{M-r-1} (a_i - \bar a)(a_{i+r} - \bar a), \qquad \bar a = \frac{1}{M} \sum_{i=0}^{M-1} a_i. \tag{23}$$

Now the correlation functions and $\bar a$ are random quantities depending on a particular realization of the sequence $a_1^M$. Their fluctuations can contribute to the entropy of finite random chains even if the correlations in the random sequence are absent. It is well known that the order of relative fluctuations of an additive random quantity (as, e.g., the correlation function, Eq. (23)) is $1/\sqrt{M}$.

Below we give a more rigorous justification of this explanation and show its applicability to our case. Let us present the correlation function $C_M(r)$ as the sum of two components,

$$C_M(r) = C(r) + C_f(r), \tag{24}$$

where the first summand $C(r) = \lim_{M\to\infty} C_M(r)$ is the correlation function determined by Eqs. (12) and (23), obtained by averaging over the sequence with respect to the index $i$ numbering the elements $a_i$ of the sequence $A$, and the second one, $C_f(r)$, is a fluctuation-dependent contribution. The function $C(r)$ can also be presented as the ensemble average $C(r) = \langle C_M(r) \rangle$ due to ergodicity of the sequence.

Now we can find a connection between the variances of $C_M(r)$ and $C_f(r)$. Taking into account that the correlations are weak and neglecting their contribution to $C_f(r)$ we have

$$\langle C_M^2(r) \rangle = C^2(r) + \langle C_f^2(r) \rangle. \tag{25}$$

In order to obtain the last equation we used Eq. (24) and the property $\langle C_f(r) \rangle = 0$ at $r \neq 0$. The mean fluctuation of the squared correlation function $C_f^2(r)$ is

$$\langle C_f^2(r) \rangle = \frac{1}{(M - r)^2} \sum_{n, m = 0}^{M-r-1} \big\langle (a_n - \bar a)(a_{n+r} - \bar a)(a_m - \bar a)(a_{m+r} - \bar a) \big\rangle. \tag{26}$$

Neglecting correlations between the elements $a_n$ and taking into account that only the terms with $n = m$ give a nonzero contribution to the result, we easily obtain

$$\langle K_f^2(r) \rangle = \frac{\langle C_f^2(r) \rangle}{C^2(0)}, \qquad \langle K_f^2(r) \rangle = \frac{1}{M - r} \simeq \frac{1}{M}. \tag{27}$$

Note that Eq. (27) is obtained by means of averaging over the ensemble of chains. This is the shortest way to obtain the desired result. At the same time, for numerical simulations we used only averaging over the chain, as is seen from Eq. (23), where summation over the sites $i$ of the chain plays the role of averaging.

Note also that different symbols $a_i$ in Eq. (26) are correlated. It is possible to show that the contribution of their correlations to $\langle K_f^2(r) \rangle$ is of order $R_c/M^2 \ll 1/M$.

The fluctuating part of entropy, proportional to $\sum_{r=1}^{L} K_f^2(r)$, should be subtracted from Eq. (22), which is only valid for the infinite chain. Thus, Eqs. (25) and (27) yield the differential entropy of the finite binary weakly correlated random sequences

$$h_L = h_0 - \frac{1}{2 \ln 2} \left[ \sum_{r=1}^{L} K_M^2(r) - \log_2 \frac{M}{M - L} \right]. \tag{28}$$

It is clear that in the limit $M \to \infty$ this function transforms into Eq. (22). When $L \ll M$ the last term in Eq. (28) takes the form $L/M$ and describes the linearly decreasing entropy in the inset of Fig. 1.

The squared correlation function $K_M^2(r)$ is usually a decreasing function of $r$, whereas the function $K_f^2(r)$ is an increasing one. Hence, the terms $\sum_{r=1}^{L} K_M^2(r)$ and $\log_2[M/(M - L)]$, being concave and convex functions, describe competitive contributions to the entropy. It is not possible to analyze all particular cases of their relationship. Therefore we indicate here the most interesting ones, keeping in mind a monotonically decreasing correlation function. An example of such a function, $K(r) = a/r^b$, $a > 0$, $b > 1$, was considered above.

If the correlations are extremely small and comparable with the inverse length $M$ of the sequence, $K_M^2(1) \sim 1/M$, the fluctuating part of entropy exceeds the correlation one for nearly all values of $L > 1$.

FIG. 3: The differential entropy vs the length of words. The solid line is the analytical result for the correlation function $K(r) = 0.01/r^{1.1}$, whereas the dots correspond to the direct numerical evaluations of Eq. (22) for the numerically constructed sequence of the length $M = 10^8$ and the cut-off parameter $r_c = 20$. The dashed line is the differential entropy with the fluctuation correction described by Eq. (28).

With increasing $M$ (or correlations), when the inequality $K_M^2(1) > 1/M$ is fulfilled, there is at least one point where the contributions of the fluctuation and correlation parts of entropy are equal. For a monotonically decreasing function $K(r)$ there is only one such point. Comparing the functions in square brackets in Eq. (28) we find that they are equal at some $L = R_s$, which hereafter will be referred to as the stationarity length. If $L \ll R_s$, the fluctuations of the correlation function are negligibly small with respect to its magnitude, hence the finite sequence may be considered as quasi-stationary. At $L \sim R_s$ the fluctuations are of the same order as the genuine correlation function $K^2(r)$. Here we have to take into account the fluctuation correction due to finiteness of the random chain. At $L > R_s$ the fluctuating contribution exceeds the correlation one.

The other important parameter of the random sequence is the memory length $N$. If the length $N$ is less than $R_s$, we have no difficulties in calculating the entropy of the finite sequence, which can be considered as quasi-stationary. This case is illustrated in Fig. 3. If the memory length exceeds the stationarity length, $R_s \lesssim N$, we have to take into account the fluctuation correction to the entropy.

FIG. 4: The differential entropy vs the length of words in L-axis log scale. The solid and dotted curves are the same as in the main panel of Fig. 1. The dashed line corresponds to the direct evaluations of Eq. (22) for the sequence numerically constructed by means of Eq. (14) with fluctuation correction (28) and the cut-off parameter $r_c = 10^4$. The inset demonstrates the large $L$ region for the sequence of the length $M = 10^6$.

In Fig. 4 the plot of the differential entropy versus the length of words is shown as an illustration of the importance of this correction. Both numerical and analytical results are presented for the same power-law correlation function as in Figs. 1 and 3. Comparing sums of the squared correlation function $K(r) = 0.01/r^{1.1}$ with contribution (26), proportional to $\log_2[M/(M - L)]$, we find that they are equal at $R_s \approx 10^4$. A graphical confirmation of this result is shown in the inset of Fig. 4. We can conclude that the dashed lines approximate the theoretical solid curves better than the dotted lines (up to $L \approx 10^4$). The nonmonotonic decrease of $h(L)$ is due to the fluctuation of a random quantity, the entropy of the finite sequence.
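The finite-chain quantities of this section can be sketched numerically as follows. This is an illustration under stated assumptions, not the authors' procedure: the function names are hypothetical, and the fluctuation term is evaluated as the direct sum $\sum_{r=1}^{L} 1/(M-r)$ taken from Eq. (27) instead of the closed-form logarithm written in Eq. (28). The stationarity length is located as the first $L$ at which the accumulated fluctuation sum reaches the accumulated correlation sum.

```python
import math


def correlation_estimate(seq, r):
    """Finite-chain correlation function C_M(r) of Eq. (23)."""
    M = len(seq)
    a_bar = sum(seq) / M
    total = sum((seq[i] - a_bar) * (seq[i + r] - a_bar) for i in range(M - r))
    return total / (M - r)


def finite_chain_entropy(L, K_M, M, h0=1.0):
    """Entropy with fluctuation correction, in the spirit of Eq. (28).

    K_M(r) is the measured normalized correlation function; the fluctuation
    part is the direct sum of <K_f^2(r)> = 1/(M - r) from Eq. (27).
    h0 = 1 corresponds to a_bar = 1/2.
    """
    corr = sum(K_M(r) ** 2 for r in range(1, L + 1))
    fluct = sum(1.0 / (M - r) for r in range(1, L + 1))
    return h0 - (corr - fluct) / (2 * math.log(2))


def stationarity_length(K, M, L_max):
    """First L <= L_max where the fluctuation sum reaches the correlation sum."""
    corr = 0.0
    fluct = 0.0
    for L in range(1, L_max + 1):
        corr += K(L) ** 2
        fluct += 1.0 / (M - L)
        if fluct >= corr:
            return L
    return None  # no crossing found below L_max


def power_law_K(r):
    """Correlation function assumed for illustration: K(r) = 0.01 / r**1.1."""
    return 0.01 / r ** 1.1


if __name__ == "__main__":
    print(stationarity_length(power_law_K, M=10**8, L_max=50_000))
    print(stationarity_length(power_law_K, M=10**6, L_max=50_000))
```

Under these assumptions the crossing for $M = 10^8$ falls at roughly $1.5 \times 10^4$, the same order as the value $R_s \approx 10^4$ quoted above, while for $M = 10^6$ it moves down to the order of $10^2$.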
VI. APPLICATION TO WRITTEN TEXTS

A theory of additive Markov chains with long-range memory was used for a description of correlation properties of literary texts [9]. The coarse-grained naturally written texts were shown to be strongly correlated sequences that possess anti-persistent properties at small distances (in the region $L \le 300$ of grammatical rules action). At long distances (in the region $L \ge 300$ of semantic rules action) they manifest weak persistent power-law correlations. It is clear that our model of the additive Markov chain can only claim to describe the weak power-law part of entropy, proportional to $L^{-\gamma}$.

Ebeling and Nicolis [12] and Schürmann and Grassberger [13] suggested the empirical form of entropy for written texts

$$h_L = h + c\, \frac{\log_2 L}{L^{\gamma}}, \qquad \gamma > 0. \tag{29}$$

There emerges a natural question about the origin of this dependence. The partial answer to the question is as follows. The entropy of the Markov chain with step-wise memory function (10) in the limit of strong correlations, $N \ln N\, (1 - 2\mu) \ll 4\mu$, was obtained in Ref. [14],

$$h_L = h + c\, \frac{\log_2 L}{L}. \tag{30}$$

After comparing the results of Eqs. (21) and (30) with that of Eq. (29), it becomes clear that the term $\log_2 L$ describes strong short-range correlations and the power-law term $L^{-\gamma}$ is responsible for weak long-range correlations. So, we need a combined model that could unify the two approaches: the additive Markov chain exposed above and the Markov chain with a step-wise memory function.

The question of which part of the correlation or memory function is responsible for the decimation invariance remains unresolved.
VII. CONCLUSION AND PERSPECTIVES

(i) The main result of the paper, the differential entropy of the stationary ergodic binary weakly correlated random sequence $A$, is given by Eq. (22). The other important point of the work is the calculation of the fluctuation contribution to the entropy due to finiteness of random chains, the last term in Eq. (28).

(ii) In order to obtain Eq. (22) we used the assumption that the random sequence of symbols is a Markov chain. Nevertheless, the final result contains only the correlation function and does not contain the conditional probability function of the Markov chain. This allows us to suppose that result (22) and the region of its applicability are wider than the assumptions under which it is obtained [16].

(iii) To obtain Eq. (22) we have supposed that correlations in the random chain are weak. This is not a very severe restriction. Many examples of such systems, described by means of the pair correlator, are given in Ref. [4]. The randomly chosen example of DNA sequences supports this conclusion. The strongly correlated systems, which are opposed to weakly correlated chains, are nearly deterministic. For their description we need a completely different approach. Their study is beyond the scope of this paper.

(iv) The developed theory opens a way for constructing a more consistent and sophisticated approach describing the systems with long-range correlations. Namely, Eq. (22) can be considered as an expansion of the entropy in a series with respect to the small parameter $\delta$, where the entropy $h_0$ of the non-correlated sequence is the zero approximation. Alternatively, for the zero approximation we can use the exactly solvable model of the N-step Markov chain with the conditional probability function of words occurring taken in the form of the step-wise function, Eq. (10). Another way to choose the zero approximation can be based on the CPF obtained from the probability of the block occurring, Eq. (15).

(v) In this paper we have considered random sequences with the binary space of states, but almost all results can be generalized to non-binary sequences and can be applied for describing natural written and DNA texts.

(vi) Our consideration can be generalized to the Markov chain with the infinite memory length $N$. In this case we have to impose a condition on the decreasing rate of the correlation function and the conditional probability function at $N \to \infty$.

Acknowledgments

We are grateful for the very helpful and fruitful discussions with A. A. Krokhin, G. M. Pritula, S. V. Denisov, S. S. Apostolov, and Z. A. Mayzelis.
| i−N+1 P(b,aii−−1N+1) (v) In this paper we have considered the random se- after some algebraic manipulations, we get quences with the binary space of states, but almost all N−1 results can be generalized to non-binary sequences and P(1ai−1 )=a¯+ F(r)(a a¯) (A4) can be applied for describing natural written and DNA | i−N+1 i−r − r=1 texts. X F(N) (vi) Our consideration can be generalized to the + (1 a)P(1,ai−1 ) aP(0,ai−1 ) . Markov chain with the infinite memory length N. In P(ai−1 ) − i−N+1 − i−N+1 i−N+1 this case we have to impose a condition on the decreas- (cid:2) (cid:3) ing rate of the correlation function and the conditional From the compatibility condition for the Chapman- probability function at N . Kolmogorovequation (see, for example, Ref. [15]), →∞ P(ai )= P(ai−1 ),P (a ai−1 ), (A5) Acknowledgments i−N+1 i−N N i| i−N ai−XN=0,1 Wearegratefulfortheveryhelpfulandfruitfuldiscus- it follows that its solution is given by sionswithA.A. Krokhin,G.M. Pritula,S.V.Denisov, S. S. Apostolov, and Z. A. Mayzelis. P(k)=ak(1 a)N−k+O(δ). (A6) − Here P(k)is the probability to havek units and(N k) − Appendix A: zeros at fixed sites of the N-word. Therefore, the last term in the square brackets of Eq. (A4) van- ishes in the main approximation, so that the difference Here we prove Eq. (19) using Eq. (14) as a starting (1 a)P(1,ai−1 ) aP(0,ai−1 ) is of order of δ. point. It follows from definition (5) of the conditional − i−N+1 − i−N+1 Hence, we have to neglect the third term in the right- probability function (cid:2) (cid:3) hand side of Eq. (A4) because it is of the second order P(ai−1 ,1) in δ. So, Eq. (19) is provenfor L=N 1. By induction P(1ai−1 )= i−N+1 . (A1) the equation can be written for arbitra−ry L. | i−N+1 P(ai−1 ) i−N+1 [1] P. Ehrenfest, T. Ehrenfest, Encyklop¨adie der Mathema- [8] O. V. Usatenko, S. S. Apostolov, Z. A. Mayzelis, and tischen Wissenschaften (Springer, Berlin, 1011), p. 742, S. S. Melnik, Random finite-valued dynamical systems: Bd. II. 
additive Markov chain approach (Cambridge Scientific [2] D.LindandB.Marcus.AnIntroduction toSymbolicDy- Publisher, Cambridge, 2010). namics and Coding (Cambridge University Press, Cam- [9] S. S. Melnyk, O. V. Usatenko, V. A. Yampol’skii, bridge, 1995). V. A.Golick, Phys. Rev.E 72, 026140 (2005). [3] T. M. Cover, J. A. Thomas, Elements of Information [10] S. S. Apostolov, Z. A. Mayzelis, O. V. Usatenko, Theory (Wiley, New York,1991). V.A.Yampol’skii,Int.J.Mod.Phys.B22,3841(2008). [4] F. M. Izrailev, A. A. Krokhin, N. M. Makarov, Phys. [11] O.V.Usatenko,V.A.Yampol’skii, Phys.Rev.Lett.90, Rep.512, 125 (2012). 110601 (2003). [5] C.E.ShannonandW.Weaver,TheMathematicalTheory [12] W. Ebeling and G. Nicolis, Europhys. Lett. 14, 191 of Communication (University of Illinois Press, Urbana, (1991). Illinois, 1949). [13] T. Schu¨rmann and P. Grassberger, Chaos 6, 414 (1996). [6] A.N. Shiryaev,Probability (Springer, New York,1996). [14] S. Denisov, S. S. Melnik, A. A. Borisenko, O. V. Us- [7] S.S.Melnyk,O.V.Usatenko,V.A.Yampol’skii, Phys- atenko, and V.A. Yampolsky (unpublished). ica A 361, 405 (2006). [15] C. W. Gardiner, Handbook of Stochastic Methods for 9 Physics, Chemistry, and the Natural Sciences, Springer [16] S.A. Apostiolov (private communication). Series in Synergetics, Vol. 13 (Springer-Verlag, Berlin, 1985).
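As a numerical illustration of the additive conditional probability that enters Appendix A (the first two terms of Eq. (A4)), the following minimal sketch generates a binary chain whose probability of a unit is the mean plus a memory-weighted sum over the previous N symbols. The function names `additive_cpf` and `generate_chain`, the seeding of the first N symbols by uncorrelated draws, and the power-law memory values used in the usage lines are illustrative assumptions, not taken from the paper.

```python
import random


def additive_cpf(history, memory, a_bar):
    """Return P(a_i = 1 | history) = a_bar + sum_r F(r) * (a_{i-r} - a_bar).

    history[-r] is the symbol r steps back; memory[r - 1] plays the role of F(r).
    """
    p = a_bar
    for r, f_r in enumerate(memory, start=1):
        p += f_r * (history[-r] - a_bar)
    return p


def generate_chain(length, memory, a_bar, seed=0):
    """Generate a binary sequence from the additive conditional probability.

    Assumption: the first N = len(memory) symbols are uncorrelated draws
    with mean a_bar; every later symbol uses additive_cpf.
    """
    rng = random.Random(seed)
    n = len(memory)
    chain = [1 if rng.random() < a_bar else 0 for _ in range(n)]
    while len(chain) < length:
        p_one = additive_cpf(chain, memory, a_bar)
        chain.append(1 if rng.random() < p_one else 0)
    return chain[:length]


if __name__ == "__main__":
    # Hypothetical weak memory function, chosen small so that the
    # conditional probability stays inside [0, 1].
    memory = [0.01 / r ** 1.1 for r in range(1, 21)]
    chain = generate_chain(100_000, memory, a_bar=0.5, seed=1)
    print("empirical mean:", sum(chain) / len(chain))
```

Keeping the memory values small mirrors the weak-correlation assumption: the conditional probability then deviates only slightly from the mean, so no clipping to the interval [0, 1] is needed in this sketch.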
