APS/123-QED A Comparison of natural (english) and artificial (esperanto) languages. A Multifractal method based analysis J. Gillet∗ and M. Ausloos† GRAPES, B5a Sart-Tilman, B-4000 Li`ege, Belgium (Dated: February 3, 2008) Wepresentacomparisonoftwoenglishtexts,writtenbyLewisCarroll,one(Aliceinwonderland) and the other (Through a looking glass), the former translated into esperanto, in order to observe whether natural and artificial languages significantly differ from each other. We construct one dimensional time series like signals using either word lengths or word frequencies. We use the multifractalideasforsortingoutcorrelationsinthewritings. Inordertochecktherobustnessofthe methodswealsowrite(!) (consider?) thecorrespondingshuffledtexts. Wecomparecharacteristic functions and e.g. observe marked differences in the (far from parabolic) f(α) curves, differences whichweattributetoTsallisnonextensivestatisticalfeaturesinthefrequencytimeseriesandlength time series. The esperanto text has more extreme vallues. A very rough approximation consists in 8 modeling the texts as a random Cantor set if resulting from a binomial cascade of long and short 0 words(orwordsandblanks). Thisleadstoparameterscharacterizingthetextstyle,andmostlikely 0 in fine the author writings. 2 n PACSnumbers: a J As soon as modern fractals appeared in order to de- itcanbedecomposedthroughlevelthresholdswhichcan 6 1 scribe physical objects, it was evident that some gen- be thought to be a set of characters taken from an al- eralization was in order: multifractals spurred up, e.g., phabet. Here below we take real texts in fact as the ] since obvioulsy a fractal dimension D is not enough to source of experimental observations and follow the mul- L describe an object [1, 2]. The more so in non equilib- tifractal ideas to make an analysis of such texts. The C rium systems, characterized by some unusual dynamics. main question concerns whether multifractals are indeed s. Through a generator and from an initiator one can pro- found in real texts; a question raised in [20]; another is c duce a fractal object with a given dimension. How to whether the technique of analysis can give some insight [ produce realistic and meaningful multifractal models is on a logical construction [21], from which stems the pos- 1 a challenge. Do they really exist [3]? Do multifractal sibleconnectionofsuchideaswithcoding, transcription, v model exist nowadays [4]? These questions come in par- machinetranslation, socialdistances, networkproperties 0 allel with the measurement of the fractal dimension, ... of languages, ... [21] [22] [23] and more classically in 1 and its distribution. One question of interest is whether physicsaboutidentifyingcoherentstructuresinspatially 5 the apparently multifractal nature of an object is due to extended systems [24]. We examine two classical english 2 itsfinitesizeortoacomplexdynamicalfeatureorsome- texts but also one translation, into an unusual language, . 1 thing else! Some attempt in this direction results from and the corresponding shuffled texts. We focus on how 0 observation of multifractal features in meteorology and local/structural properties develop into global ones. 8 climatestudies[5, 6, 7], butalsoinmanyotherfields[8], 0 like mathematical finance [4, 9, 10, 11, 12, 13] Since Shannon [25], writings and codings are of inter- : v estinstatisticalphysics. Writingsaresystemspractically Let us recall that one has basically to obtain a D(q) i composed of a large number of internal components (the X functionorthef(α)spectrum,whereqrepresentsthede- words, signs, and blanks in printed texts). In terms of r greeofsomemomentdistributionofsomevariable,andα a issomesortofcriticalexponentatphasetransitions,also complexity investigations, writings which are a form of recordedlanguages,likelivingsystems,belongtothetop called the Holder exponent; f(α) being its distribution. levelclassinvolvinghighlyoptimizedtolerancedesign,er- There is a need for experimental work leading to re- ror thresholds in optimal coding, and financial markets liable D(q) and f(α) data, before modeling. Interest- [21]. Relevant questions pertain to the life time, concen- ing pioneering data should be here recalled : see work tration, distribution, .. complexity of these. One should on DLA [14, 15], DNA [16, 17], SOI [18], NAO [19], .... distinguish two main frameworks. On one hand, lan- It appears that most of the time some “signal is either guage developments seem to be understandable through directly a time series or is transformed into a time se- competitions, like in Ising models, and in self-organized ries; more generally, the signal is called a text, because systems. Theirdiffusionseemssimilartopercolationand nucleation-growthproblemstakingintoaccounttheexis- tenceofdifferenttimescales, forinter-andintra-effects; thisistherealmofanthropology. Thesecondframeorig- ∗Electronic address: [email protected]; Present address : Institut inates from more classical linguistics studies; it pertains de Physique Nucl´eaire, Atomique et de Spectroscopie, Universit´e to the content and meanings of words and texts. Con- deLi`ege,B-4000Li`ege,Belgium †Electronicaddress: [email protected] cerningtheinternalstructureofatext, supposedlychar- 2 acterized by the language in which it is written, it is tion or to the specificity of this author work. Previous well known that a text can be mapped into a signal, of work on the english AWL version (AWLeng) should be course through the alphabet characters. However it can mentioned [40], but pertains to a mere Zipf analysis. be also reduced to less abundant symbols through some In Sect. I, we present some elementary facts on these threshold, like a time series, which can be a list of +1 texts and briefly expose the methodology, i.e. we em- and -1, or sometimes 0. In fact, laws of text content and phasize that we distinguish between “frequency time se- structureshavebeensearchedforalongtimeagobyZipf ries (FTS) and “length time series (LTS). We recall the and others, see many refs. in [26], through the least ef- multifractal technique for this specific application. In fort (so called ranking) method. The technique is now order to check the robustness of the method we also in- currently applied in statistical physics as a first step to vent (write !) (or consider) the corresponding shuffled obtain, when they exist, the primary scaling law. Yet texts, to which we apply the same technique of analysis. long range order correlations (LROC) between words in Therefore, in Sect. II, we present the somewhat unusual text are searched for. In [27] it was claimed that LROC results, and discuss them in Sect. III. For the simplicity express an author’s ideas, and in fine consist in some of the discussion we very roughly approximate the texts author’s signature. as resulting from a binomial cascade of short and long Interestingly writings can be thought as social net- words. We obtain parameters in fine characterizing the works [23]. Social networks have fractal properties [28]; text style and the author’s writings. We observe that most usually they should be multifractals; one can thus such multifractals have a deep connection to Tsallis non imagine/consider that a text which is a form of partially extensive statistics as pointed out in ref. [41] in another self-organized social network (for words) due to gram- framework. matical and style constraints present multifractal fea- tures. The properties of such texts taken as signals have already been examined, e.g., a multifractal analysis of I. DATA AND METHODOLOGY Moby Dick letter distribution can be found in [20]. Even though we recognize such a pioneering paper, For our empirical considerations we have selected the we stress that sentences made of words, not letters, are two texts here above mentioned downloading them from translated. Thus we present below an original consider- a freely available site [37], resulting obviously into three ation in this respect, i.e. the analysis and results about files, called. Next, we have removed the chapter heads. a translation between one of the most commonly used All our analyses are carried over this reorganized file for language, i.e. english, and a relatively recent language, eachtext. Thereafter,wehaveshuffledthesetexts,inthe i.e. esperanto. Esperanto is an artificially constructed files, without taking into account the punctuation [42]. language [29], which was intended to be an easy-to-learn There are two ways to construct a time series from lingua franca. Statistical analyses seem to indicate that such documents esperanto’s statistical proportions are similar to those of 1. Take a document of N words. Select all the differ- other languages [30]. It was found that esperanto’s sta- entwords. Countthefrequencyf ofappearanceof tisticalproportionsresemblemostlythoseofGermanand each word in the document. The time of “appear- Spanish,andsomewhatsurprisinglyleastthoseofFrench ance is played by the rank position of the word in and Italian. English seems to be the intermediary case. the file. We map the word frequency to a time se- Comparison of different languages (writings) arising ries f(t). Such a time series is called the frequency from apparently different origins or containing different time series (FTS). signs, e.g. greek [31], turkish [32], chinese [33], ... even 2. Take a document of N words. Consider the length somewhatartificiallanguages,likethoseusedforsimula- l (number of letters) of each word. Record where tioncodesoncomputers[34]hasalsobeenmade. Toour each word of length l is located in the text; the knowledge few comparisons have been presented about time is played by the position of the word in the written texts translated from one to another language document, i.e. the first word is considered to be [35,36]andinparticularfromthepointofviewofLROC emittedattimet=1,thesecondattimet=2,etc. in words. A time series l(t) is so constructed. We refer to The text used here was chosen for its wide diffusion, such a time series as the length time series (LTS). freelyavailablefromtheweb[37]andasarepresentative one of a famous scientist, Lewis Caroll, i.e. Alice in Obviously there is a large number of ways to map a wonderland (AWL) [38]. Moreover knowing the special text to a time series, but in the present study we only (mathematical)qualityofthisauthormind,andsome,as consider the above two since some physical meaning can we thought a priori, some possibly special way of writ- be thought to arise in the mapping. As indicated in [43] ing, another text has been chosen for comparison, i.e. the length of the word is associated with speaker effort, Through a looking glass (TLG) [39]; - to our knowledge meaning that the longer the word the higher the effort only available in english (on the web). This will allow required to pronounce it. The frequency of the word is us to discuss whether the differences, if any, between es- also associated with the hearer effort as frequently used peranto and english, are apparently due to the transla- words require less effort to be understood by the hearer. 3 1.4 AWLeng 1 AWLeng AWLesp AWLesp Generalized fractal dimension D(q)1.2 TLGeng Generalized critical exponent f()α–01 TLGeng 1.0 1.0 1.2 1.4 1.6 –20 –10 0 10 20 α q 1.4 1 AWLeng ) neralized fractal dimension D(q)111...123 ATLWGLenesgp Generalized critical exponent f(α–01 AWLeng Ge1.0 AWLesp TLGeng 1.0 1.2 1.4 α –20 –10 0 10 20 FIG. 3: f(α) for (a) FTS (b) LTS of the indicated texts q FIG. 1: D(q) for (a) FTS (b) LTS of the original texts Shuffled AWLeng 1.10 Shuffled AWLesp 1.0 Shuffled AWLeng Shuffled TLGeng Shuffled AWLesp neralized fractal dimension D(q)11..0005 Generalized critical exponent f()α 00..05 Shuffled TLGeng Ge 0.95 –0.5 –20 –10 0 10 20 0.9 1.0 1.1 1.2 q α 1.0 Shuffled AWLeng 1.2 Shuffled AWLesp D(q) Shuffled TLGeng nt f()α 0.5 Generalized fractal dimension 11..01 Generalized critical expone–00..05 SShhuufffflleedd AAWWLLeenspg Shuffled TLGeng –1.0 0.9 1.0 1.1 1.2 1.3 α –20 –10 q0 10 20 FIG. 4: f(α) for (a) FTS (b) LTS of the shuffled indicated FIG. 2: D(q) for (a) FTS (b) LTS of the shuffled indicated texts texts 4 These time series are thereby analyzed along the mul- II. RESULTS tifractal ideas, for which we briefly recall the formulae of interest in order to set the notations. The results of the FTS and LTS multifractal analysis forthethreemaintextsandtheirshuffledcorresponding ones are shown in Figs. 1-4 (a-b). A. Multifractal Analysis Amereperusingofthegraphsindicatethatthemulti- fractalapproachisingoodorder,e.g. sinceD(q)isnota Letthe(LTSouFTS)timeserieshavingN datapoints single point, and should allow one to observe interesting (words, here), i.e. yi (1≤i≤N). LRO correlations and local ones. Transformtheseriesasfollows: ifthelengthofaword mot in LTS (or its frequency in FTS) is smaller than the nextone,theformerwordgetsavalue=2;ifitisgreater, A. D(q) plots: Figs. 1-2 it gets the value = 1; and 0 if both are equal. The new series is called M (1 ≤ i ≤ N −1). Each i InFTS,thegeneralizedfractaldimensionhasasimilar M is cut into N subseries of size s , where N is the i s s setofvaluesforbothenglishtexts,decayingfromca. 1.2 smallest integer in N/s. The ordering starts from the to 1.0 for q increasing but negative; D(q) decays slowly beginningofthetext(contrarytoanalysesinwhichsome forqpositive,barelyreachingavalue0.95forq=80(Fig. forecastingisexpected,andforwhichtheend“pointsare 1). ThevalueofD(q)ismuchgreateralongthenegative more relevant). q axis, for AWLesp but is identical to the other two for Next one calculates the probability q ≥ 0. In LTS, even though the form of D(q) is that to Σs M be expected, it has to be stressed that the AWLeng and P(s,ν)= ΣNsi=Σ1s (Mν−1)s+i (1) AWLesp are very quantitatively similar, but markedly ν=1 i=1 (ν−1)s+i differfromTLGeng. Thisalreadyindicatesthatonecan for every ν and s. Thereafter one calculates observe the high creativity of the author through these two books. Moreover the translation effect on style is χ(s,q)=ΣNν=s1P(s,ν)q (2) much better seen on FTS than LTS. The shuffled texts (Fig.2) remarkably have the same for each s value. A power law behaviour is expected D(q) values; their range and variations being similar to χ(s,q)∼sτ(q); (3) those of the real texts. Slight quantitative differences occur, more markedlyforthe shuffledAWLesp FTS,but where τ(q) plays the role of the partition function [1]. along a Baeysian reasoning these can be attributed to The generalized fractal dimension D(q) [1, 2] is defined the finite size of the sample. from By the way, (cid:12) τ(q) dτ(q)(cid:12) D(q)= q−1 (4) C1 = dq (cid:12)(cid:12) (8) q=1 and the generalized Hurst exponent, h(q), from isameasureoftheintermittencylyinginthesignaly(n); 1+τ(q) it can be numerically estimated by measuring τq around h(q)= . (5) q =1. InallcasesthevalueofC isclosetounity. Some q 1 conjectureontherole/meaningofC isfoundinRef.[45]. 1 Let From some financial and political data analysis it seems dτ(q) that H1 is a measure of the information entropy of the α= , (6) system. The same can be thought of here. dq fromwhichoneobtainsthegeneralizedcriticalexponent, f(α) curve [44] from B. f(α) plots: Figs. 3-4 The f(α) spectra are shown in Figs. 3-4. They are f(α)=qα−τ(q). (7) markedly non symmetric, as was found for DLA [14, 15], In the present work, we have calculated χ(s,q) for s withveryhighpositiveskewness, i.e. forq ≤0. Interest- between 2 and 200. The τ(q) values were calculated by ingly, the esperanto text curve behaves differently from a linear best fit on a log-log plot of χ(s,q) and s, for the english texts, in FTS, though TLGeng is different in q values ranging from -25 till 25. As one may expect theLTScase;theshuffledtextsf(α)spectrabehavingin the q and q − 1 values at the denominator in Eq.(4)- averysimilarqualitativeandquantitativeway. However Eq.(5)wereleadingtonumericalsingularities. Asmooth the shuffling does not fully symmetrize the spectra. interpolationcanbevisuallymadewithoutdifficulty. For It is interesting to observe that the f(α) curve is very conciseness we don’t show χ(s,q). sharp: it originates from negative values for α less than 5 1.0; reaches a maximum (=1.0) at 1.0, at the maximum Series D(q) Ds(q) f(α) fs(α) so called box dimension, and decays rapidly for α posi- AWLeng AWLeng AWLeng AWLeng FTS (cid:39) (cid:39) (cid:39) (cid:39) tive;f(α)=0atα=1.2and1.3respectivelyforAWLeng TLGeng TLGeng TLGeng TLGeng and TLGeng; the maximum is also reached at (1.0, 1.0) AWLeng AWLeng AWLeng AWLeng and the spectrum spans the (narrow) interval 0.90: 1.25 LTS (cid:39) (cid:39) (cid:39) (cid:39) for AWLesp on the f(α) =0 line. It is worth noticing AWLesp TLGeng AWLesp TLGeng thatthevaluesarereasonableinviewoftheircorrespon- dence to the fractal dimensions. On the other hand the TABLEI:Comparingoriginaltextsquasiidenticalbehaviors sharpness indicates a high lack of uniformity of the texts through functions D(q) and f(α), and their counterpart for LROC. shuffled (s) cases, i.e. Ds(q) and fs(α) Originaltexts Q α− α+ w1 w2 r1 r2 III. DISCUSSION WITH CONCLUSION AWLeng FTS 5.71 0.95 1.19 0.96 0.04 0.97 0.03 AWLesp FTS 4.39 0.94 1.30 0.99 0.01 0.99 0.01 TLGeng FTS 5.71 0.95 1.19 0.99 0.01 0.99 0.01 In summary, one can observe similarities between the AWLeng LTS 4.65 0.92 1.23 0.89 0.11 0.91 0.09 original andshuffledtexts and their translations; see Ta- AWLesp LTS 4.83 0.92 1.21 0.87 0.13 0.89 0.11 ble 1 for summarizing the similarities seen through D(q) TLGeng LTS 3.94 0.92 1.34 0.96 0.04 0.97 0.03 and f(α). The english texts look more similar with each Shuffledtexts Q α− α+ w1 w2 r1 r2 otherthanwithrespecttotheesperantotranslation. On AWLeng FTS 6.94 0.95 1.13 0.89 0.11 0.90 0.10 the other hand, one physical conclusion arises from the AWLesp FTS 6.57 0.96 1.16 0.97 0.03 0.97 0.03 TLGeng FTS 6.59 0.94 1.13 0.82 0.18 0.84 0.16 above : the existence of a multifractal spectrum found AWLeng LTS 4.35 0.91 1.25 0.88 0.12 0.90 0.10 for the examined texts indicates a multiplicative process AWLesp LTS 4.56 0.92 1.24 0.90 0.10 0.92 0.08 intheusualstatisticalsenseforthedistributionofwords TLGeng LTS 4.35 0.91 1.25 0.88 0.12 0.9 0.10 length and frequency in the text considered as a time se- ries. Thus linguistic signals may be considered indeed TABLE II: Tsallis Q-parameter, derived from α and α as the manifestation of a complex system of high di- − + values,readonFigs. 3-4,fromEq. (11),fortheoriginaltexts mensionality, different from random signals or systems and for shuffled or translated corresponding texts, according of low dimensionality such as the financial and geophys- to the type of series; the weights and ratios of the binomial ical (climate) signals. Finally, the f(α) curve represents cascade approximation (see text) are given the measurable aspects of the word networks, be they consideredthroughLTSorFTS.Ourworkconfirmsthat texts could be seen as networks indeed [23]. tervals the generalized fractal dimension (or rather τ(q)) Before suggesting a physical model describing the is obtained from construction of a writing, let us consider implications from the f(α) spectrum in some detail. The not fully n (cid:88) parabolic, to say the least, f(α) curve indicates non uni- wiqri−τ =1. (9) formity and strong LROC between long words and small i=1 words, - evidently arising from strong short range corre- The formula is easily generalized for random contraction lations between these. In some sense this is expected for ratios and weights. However it simplifies for the case of classicalwritings. Itisusuallyknownthattheleft(right) a simple binomial cascade, i.e. hand side of the f(α) curve corresponds to fluctuations oftheq ≥0(q ≤0)-correlationfunction. Inotherwords, wqr−τ +wqr−τ =1. (10) theycorrespondtofluctuationsinsmall(large)worddis- 1 1 2 2 tributions. Therefore the distribution of small and long Whence the extremal α values read α = − wordsshouldbeexaminedinfurtherworkinordertoob- log(w )/log(r ) and α = log(w )/log(r ), from 2 2 + 1 1 serve these local correlations, e.g. through a detrended which the weights and ratios can be estimated by fluctuation analysis. inversion (table I), thereby suggesting the author’s Moreover, in order to characterize the writings, texts somewhat systemic way used in his/her writings. and/or authors, we propose a very rough approxima- The physics connection can be obtained if one relates tion/model, i.e. let us consider (assume !) that the writ- the f(α) curve extremal points through their physical ings are made of only two types of words : small and meanings [48], i.e. large [46, 47], appearing through some recursive process. In so doing one can consider the behavior of the atypi- cal f(α) curve as originating form a binomial cascade of 1 1 1 = − (11) short and long words, on a support [0,1], with an arbi- 1−Q α α + − trarycontractionratior andaweightw forthewordin i i each successive subinterval, as for a multifractal Cantor whereα andα aretheextremesoftherangeofsup- − + setconstruction[1]. Foranarbitrarynumbernofsubin- port for the (positive) multifractal spectrum f(α) and Q 6 (instead of the usual q) is used to represent the parame- of these correlations, thus a new measure of an author’s ter arising in the non extensive description of statistical style. This suggests a (binomial, at first) cascade model physics [49]; see values in table I. By extension it is a containing parameters characterizing (or reflecting, at measure of the attractor dimension or the number of so least) the text style, and most likely in fine the author called degrees of freedom. It is obvious from the table writings. It remains to be seen whether the f(α) curve that Q varies between 4 and 7, with interesting differ- andthe(tobegeneralized)binomialcascademodel,with ences between the LTS and FTS cases, LTS’s Q being the weight and ratio parameters hold through in other systematically smaller, in the original or shuffled texts. cases, and can characterize authors and texts, - and in Notice that the value of Q is more extreme though with general time series. Moreover the multifractal method thesameorderofmagnitudeinthecaseoftheesperanto should additionally be able to distinguish a natural lan- text for both types of series. guagesignalfromacomputercodesignal[43]andhelpin Finally we re-emphasize the remarkable difference for improving translations by suggesting perfection criteria theesperantotext(Fig. 3a)withtheenglishtextsinthe and indicators of text qualitative values. FTS analysis. Linguistics input should be searched at this level and is left for further discussion. The origin of AcknowledgementsTheauthorswouldliketothank differences between TLG and AWL needs more work at D.Staufferforasusualfruitfuldiscussions.... Thiswork the linguistic level. However we have indicated the in- has been supported by European Commission Project terest of the multifractal scheme in providing a measure CREEN FP6-2003-NEST-Path-012864. [1] T.C. Halsey, M.H. Jensen, L.P.Kadanoff, I. Procaccia, Math. Gen. 33 (2000) 3637-3651. andB.I.Shraiman,Fractalmeasuresandtheirsingulari- [13] K. Ivanova and M. Ausloos, Low q-moment multifractal ties: The characterization of strange sets , Phys. Rev. A analysisofGoldprice,DowJonesIndustrialAverageand 33 (1986) 1141-1151 BGL-USD exchange rate, Eur. Phys. J. B 8 (1999) 665- [2] B.J. West and W. Deering, The lure of modern science : 669; Err. 12, 613 (1999). fractal thinking, (World Sci., River Edge, NJ, 1995) [14] S. Schwarzer, J. Lee, A. Bunde, S. Havlin, H. E. Ro- [3] Th.LuxandM.Ausloos,MarketFluctuationsI:Scaling, man and H. E. Stanley, Minimum growth probability of Multi-scaling and their Possible Origins, in The Science diffusion-limited aggregates , Phys. Rev. Lett. 65, 603- of Disaster : Scaling Laws Governing Weather, Body, 606 (1990) Stock-Market Dynamics, A.Bunde, J. Kropp, and H.- [15] J. Lee, S. Havlin, H.E. Stanley and J. E. Kiefer, Hier- J.Schellnhuber, Eds. (Springer Verlag, Berlin, 2002) pp. archicalmodelforthemultifractalityofdiffusion-limited 377-413 aggregation , Phys. Rev. A 42 (1990) 4832-4837 [4] Th. Lux , The Markov-Switching Multifractal Model of [16] C.-K.Peng,S.V.Buldyrev,A.L.Goldberger,S.Havlin, Asset Returns: Estimation via GMM Estimation and R.N.Mantegna,M.SimonsandH.E.Stanley,Statistical Linear Forecasting of Volatility, Journal of Business and propertiesofDNAsequences,PhysicaA221(1995)180- Economic Statistics (in press) 192. [5] K. Ivanova, N. Gospodinova, H.N. Shirer, T.P. Ack- [17] A. Rosas, E. Nogueira, and J. F. Fontanari, Multifrac- erman, M.A. Michalev, M. Ausloos, Multifractality of tal analysis of DNA walks and trails, Phys. Rev. E 66, Cloud Base Height Profiles, cond-mat/0108395 061906 (2002) [6 pages] [6] K.Ivanova,H.N.Shirer,E.E.Clothiaux,N.Kitova,M.A. [18] M.AusloosandF.Petroni,Tsallisnon-extensivestatisti- Mikhalev, T.P.Ackerman, and M. Ausloos, A case study calmechanicsofElNinosouthernoscillationindex,Phys- of stratus cloud base height multifractal fluctuations, ica A 373 (2007) 721 - 736 Physica A 308 (2002) 518-532 [19] C. Collette and M. Ausloos, Scaling Analysis and Evo- [7] R.G.KavasseriandR.Nagarajan,Amultifractaldescrip- lution Equation of the North Atlantic Oscillation Index tionofwindspeedrecords,Chaos,SolitonsandFractals, Fluctuations,Int.J.Mod.Phys.C15,1353-1366(2005) 24 (2005) 165-173 [20] A.N. Pavlov, W. Ebeling, L. Molgedey, A. R. Ziganshin [8] J.W. Kantelhardt, S.A. Zschiegner, E. Koscielny-Bunde, and V. S. Anishenko, Scaling features of texts, images A. Bunde, S. Havlin and H.E. Stanley, Multifractal de- and time series, Physica A 300 (2001) 310-324 trended fluctuation analysis of nonstationary time series [21] DavidB.Saakian,Errorthresholdinoptimalcoding,nu- , Physica A 316, 87 - 114 (2002) mericalcriteria,andclassesofuniversalitiesforcomplex- [9] M. Ausloos and K. Ivanova, Multifractal nature of stock ity, Phys. Rev. E 71, 016126 (2005) exchangeprices,Comp.Phys.Commun.147(2002)582- [22] V Zlatic, M Bozicevic, H Stefancic, M Domazet, 585 Wikipedias: Collaborative web-based encyclopedias as [10] J. Alvarez-Ramirez, M. Cisneros, C. Ibarra-Valdez and complex networks, Phys. Rev. E 74, 016115 (2006) A. Soriano, Multifractal Hurst analysis of crude oil [23] A. P. Masucci and G. J. Rodgers, Network properties of prices,Physica A 313 (2002) 651-670. writtenhumanlanguage,Phys.Rev.E74,026102(2006) [11] POswiecimka,JKwapien,SDrozdz,Waveletversusde- (8 pages) trended fluctuation analysis of multifractal structures, [24] Cosma Rohilla Shalizi, Robert Haslinger, Jean-Baptiste Phys. Rev. E 74, 016103 (2006) (17 pages) Rouquier,KristinaLisaKlinkner,andCristopherMoore, [12] E. Canessa, Multifractality in time series, J. Phys. A: Automatic filters for the detection of coherent struc- 7 tureinspatiotemporalsystems,Phys.Rev.E73,036104 laws, in New Methods in language Processing and Com- (2006) (16 pages) putational natural language learning (ACL, . XXX..., [25] C.Shannon,Amathematicaltheoryofcommunications, 1998)pp. 151-160. Bell. Syst. Tech. J. 27, pp. 379-423, 623-656 (1948); see [41] A. Erzan and J.P. Eckmann, q-analysis of Fractal Sets, alsoPredictionandentropyofprintedEnglish,BellSyst. Phys. Rev. Lett. 78 (1997) 3245-3248. Tech. J. 30 (1951) 50-64. [42] The shuffle algorithm is found on wikipedia. The first [26] J. Gillet and M. Ausloos, A Comparison of natural (en- “data point is exchanged with a following one, its loca- glish) and artificial (esperanto) languages. I. Zipf and tion chosen from a generated random number. The sec- Grasseberg Procaccia method based analysis, submitted onddatapointisexchangedwithafollowingone,chosen [27] M.Amit,Y.Shmerler,E.Eisenberg,M.AbrahamandN. from another random number, etc. The random number Shnerb,LanguageandCodificationDependenceofLong- generatorwaschecktoleadtoaratheruniformdistribu- Range Correlations in Texts, Fractals 2 (1994) 7 - 13 tion, for a number between 0 and 1. The algorithm was [28] Marin Boguna, Romualdo Pastor-Satorras, Albert Daz- applied ten times on the texts to get the final shuffled Guilera, and Alex Arenas, Models of social networks texts we used. based on social distance attachment, Phys. Rev. E 70, [43] K. Kosmidis, A. Kalampokis, P. Argyrakis, Language 056122 (2004) time series, Physica A 370 (2006) 808-816 [29] M. Boulton, Zamenhof, Creator of Esperanto. London: [44] A. Chhabra, R.V. Jensen, Direct determination of the Routledge, Kegan & Paul. (1960). f(α)singularityspectrum,PhysRevLett62(1989)1327 [30] BillManaris,LucaPellicoro,GeorgePothering,Harland - 1330. Hodges,InvestigatingEsperanto’sstatisticalproportions [45] N. Vandewalle and M. Ausloos, Coherent and random relative to other languages using neural networks and sequences in financial fluctuations, Physica A 246, 454- Zipf’s law, Proceedings of the 24th IASTED (Interna- (1997) tional Association Of Science And Technology For De- [46] Weapologizetoallauthorsforthissimplificationoftheir velopment)internationalconferenceonArtificialintelli- work.Yet,insomesensethisisequivalenttoconsidering gence and applications, held in Innsbruck, Austria, pp. that the (+/-1) Ising model describes all magnetic fea- 102 - 108 (ACTA Press, Anaheim, CA, USA, 2006) tures. Notice that this assumption on writings has been [31] N. Hatzigeorgiu, G. Mikros, G. Carayannis, Word successfullymadeinthestudyofblogs,[47]wherewords Length,WordFrequenciesandZipf’slawinthegreeklan- occurringwithanequivalentfrequencyareconsideredto guage,J.Quant.Ling.8(2001)175-185;GeorgeMikros, beidentical forstudyingtheblogsstatisticalproperties. Nick Hatzigeorgiu, George Carayannis, Basic Quantita- [47] R. Lambiotte, M. Ausloos, M. Thelwall, Word statistics tive Characteristics of the Modern Greek Language Us- in Blogs and RSS feeds: Towards empirical evidence, J. ingtheHellenicNationalCorpus,JournalofQuantitative Informetrics 1 (2007) 277 - 286. Linguistics, 12 (2005) 167 - 184 [48] T. Arimitsu and N. Arimitsu, Analysis of turbulence by [32] G. Dalkilic and Y. Cebi, Zipf’s Law and Mandelbrot’s statistics based on generalized entropies, Physica A 295 Constants for Turkish Language Using Turkish Corpus, (2001) 177. Lect. Notes Comput. Sci. 3621 (2004) 273-282. [49] C. Tsallis, Nonextensive statistics: theoretical, exper- [33] R Rousseau, Qiaoqiao Zhang , Zipf’s data on the fre- imental and computational evidences and connections, quency of Chinese words revisited, Scientometrics, 24 Braz. J. Phys. 29 (1999) 1. (1992) 201-220. [34] MZemankova,CMEastman-,Comparativelexicalanal- ysis of FORTRAN code, code comments and English text, ACM Southeast Regional Conference Proceedings of the 18th annual Southeast regional conference, Talla- hassee, Florida, 193 - 197 (1980) [35] When finalizing the writing of this paper we became aware of related work of interest [36]. [36] D.R. Amancio, L. Antiqueira. T.A.S. Pardo, L. da F. Costa, O.N. Oliveira Jr., M. G. V. Nunes, Complex networks analysis of manual and machine translations. Translations between Engflish, Spanush and Portuguese are used to find structural differences between human and computerized translations. Int. J. Mod. Phys. C 19 (2008) [37] Project Gutenberg (National Clearinghouse for Machine Readable Texts) (one mirror site is at UIUC ). Project Gutenberg (2005), accessed Sep. 25, 2005. http://www. gutenberg.org. [38] L.W. Carroll, Alice’s Adventures in Wonderland, Macmillan (1865); http : //www.gutenberg.org/etext/11 [39] L.W. Carroll, Through the Looking Glass and What Alice Found There, Macmillan (1871) ; http : //www.gutenberg.org/etext/12 [40] D.M.W.Powers,ApplicationsandexplanationsofZipf’s