
arXiv:cond-mat/0106015v3 [cond-mat.stat-mech] 22 Jan 2002

Extremum Statistics: a Framework for Data Analysis

S. C. Chapman, G. Rowlands¹ and N. W. Watkins²

¹ Physics Dept., Univ. of Warwick, Coventry CV4 7AL, UK
² British Antarctic Survey, High Cross, Madingley Rd., Cambridge CB3 0ET, UK

Manuscript submitted to Nonlinear Processes in Geophysics

February 1, 2008

Abstract

Recent work has suggested that in highly correlated systems, such as sandpiles, turbulent fluids, ignited trees in forest fires and magnetization in a ferromagnet close to a critical point, the probability distribution of a global quantity (i.e. total energy dissipation, magnetization and so forth) that has been normalized to the first two moments follows a specific non-Gaussian curve. This curve follows a form suggested by extremum statistics, which is specified by a single parameter $a$ ($a = 1$ corresponds to the Fisher-Tippett Type I (“Gumbel”) distribution).

Here, we present a framework for testing for extremal statistics in a global observable. In any given system, we wish to obtain $a$ in order to distinguish between the different Fisher-Tippett asymptotes, and to compare with the above work. The normalizations of the extremal curves are obtained as a function of $a$. We find that for realistic ranges of data, the various extremal distributions, when normalized to the first two moments, are difficult to distinguish. In addition, the convergence to the limiting extremal distributions for finite datasets is both slow and varies with the asymptote. However, when the third moment is expressed as a function of $a$, this is found to be a more sensitive method.

1 Introduction

The study of systems exhibiting non-Gaussian statistics is of considerable current interest (see e.g. Sornette (2000) and references therein). These statistics are often observed to arise in finite size many body systems exhibiting correlation over a broad range of scales, leading to emergent phenomenology such as self-similarity and in some cases fractional dimension (Bohr et al., 1998).
The apparently ubiquitous nature of this behavior has led to interest in self-organized criticality (Bak, 1997; Jensen, 1998) as a paradigm; other highly correlated systems include those exhibiting fully developed turbulence. In solar terrestrial physics in particular, problems of interest include MHD turbulence in the solar wind and in the earth's magnetotail. Irregular or bursty transport and energy release in the latter has recently led to complex system approaches such as SOC (see the review by Chapman and Watkins, Space Sci. Rev., 2001). These complex systems are often characterized by a lack of scale, and in particular by the exponents of the power law probability distributions (PDF) of patches of activity in the system. Examples of these patches of activity are energy dissipated by avalanches in sandpiles, vortices in turbulent fluids, ignited trees in forest fires and magnetization in a ferromagnet close to the critical point. In the earth's magnetotail, patches of activity in the aurora as seen by POLAR UVI have been used as a proxy for the energy released in bursty magnetotail transport in order to infer its scaling properties (Lui et al., 2000; Uritsky et al., 2001). The challenge is to distinguish the system from an uncorrelated Gaussian process by demonstrating self-similarity, and to determine the power law exponents. To do this directly is nontrivial, requiring measurements of the individual patches or activity events over many decades. Here we consider what may be a more readily accessible measure: the statistics of a global average quantity such as the total energy dissipation, magnetization and so forth. An important hypothesis that is the subject of this paper is that the data arise from an extremum process, i.e. that some unknown selection process operates such that the observed global quantity is dominated by the largest events selected from ensembles of individual 'patches' of activity. This is a real possibility for two reasons.
First, measurements of physical systems, and in particular observations of natural systems, inevitably incorporate instrumental thresholds, and this may affect the statistics of a global quantity comprising activity summed over patches. Second, there has recently been considerable interest in a series of intriguing results from turbulence experiments (Labbe et al., 1996; Pinton et al., 1999; Bramwell et al., 1998) and numerical models exhibiting correlations (Bramwell et al. (2000), but see also Aji and Goldenfeld (2001); Zheng and Trimper (2001); Bramwell et al. (2001)). These studies reveal statistics of a global quantity (i.e. $E$) that follow curves of the form of one of the limiting extremal distributions (Gumbel, 1958; Fisher and Tippett, 1928):

$$P(E) = K\left(e^{y - e^{y}}\right)^{a}, \qquad y = b(E - s) \qquad (1)$$

where $K$, $b$ and $s$ are obtained by normalizing to the first two moments ($M_0 = 1$, $M_1 = 0$, $M_2 = 1$), and the single parameter $a$ appears to be close to the value $\pi/2$.

For an infinitely large ensemble, there are two limiting distributions that we consider here. The Fisher-Tippett type I (or 'Gumbel') extremal distribution is of the form (1) but with $a = 1$, and arises from selecting the largest events from ensembles with distributions that fall off exponentially or faster. Since we wish to construct a framework that could encompass all highly correlated systems, we also treat the case where the distribution of 'patches' is a power law. An example is the Potts model (Cardy, 1996) for magnetization, where connected bonds form clusters whose size is power law distributed at the critical point. In this case the relevant extremal distribution is Fisher-Tippett type II (or 'Frechet').

Here we provide a framework for comparing data with Fisher-Tippett type I and II extremal curves. This essentially requires obtaining the normalizations of these curves in terms of the moments of the data, and ultimately as functions of the single parameter $a$.
We find that the curves of form (1) obtained by normalizing to the first two moments are difficult to distinguish from each other if $a$ is in the range $[1, 2]$, or from Frechet curves, given a realistic range of data. Furthermore, we demonstrate that slow convergence, with respect to the size of the dataset, to the limiting $a = 1$ extremal distribution has the consequence that, for a large but finite ensemble, the extremal distribution of an uncorrelated Gaussian process is indistinguishable from the $a = \pi/2$ curve. To overcome these limitations we suggest two much more sensitive methods for determining whether or not the curve is of the form (1) and, if so, the corresponding value of $a$. These methods are based on the third moment and on the peak of the distribution, both of which we obtain here as a function of $a$.

2 Extremum statistics: general results

To facilitate the work here we first develop some results from extremum statistics (for further background reading see Sornette (2000); Gumbel (1958); Bouchaud and Potters (2000)). If the maximum $Q^*$ drawn from an ensemble of $M$ patches of activity $Q$ with distribution $N(Q)$ is $Q^* = \max\{Q_1, \ldots, Q_M\}$, then the probability distribution (PDF) for $Q^*$ is given by

$$P_m(Q^*) = M N(Q^*)\left(1 - N_>(Q^*)\right)^{M-1} \qquad (2)$$

where $M$ is the number of patches in the ensemble and

$$N_>(Q^*) = \int_{Q^*}^{\infty} N(Q)\,dQ \qquad (3)$$

We now obtain $P_m$ for large $M$ and $Q$. For a general PDF $N(Q)$ we can write (for an appropriate choice of the function $g(Q^*)$)

$$\left(1 - N_>\right)^{M} = e^{-M g(Q^*)} \qquad (4)$$

and for small $N_>(Q^*)$ we have

$$g(Q^*) = -\ln\left(1 - N_>(Q^*)\right) \sim N_> + \frac{N_>^2}{2} + \cdots \qquad (5)$$

We now consider a characteristic value of $Q^*$, namely $\tilde Q^*$, such that by definition

$$M g(\tilde Q^*) = q \qquad (6)$$

so that

$$q = M g(\tilde Q^*) \approx M N_>(\tilde Q^*) + M\,\frac{N_>^2(\tilde Q^*)}{2} + \cdots \qquad (7)$$

We now expand $g(Q^*)$ about $\tilde Q^*$ to obtain

$$g(Q^*) = g(\tilde Q^*) + g'(\tilde Q^*)\,\Delta Q^* + \frac{g''(\tilde Q^*)}{2}\,(\Delta Q^*)^2 + \cdots \qquad (8)$$

and from (5) we have

$$g'(Q^*) = -N(Q^*) - N(Q^*)\,N_> + \cdots \qquad (9)$$

$$g''(Q^*) = -N'(Q^*) - N'(Q^*)\,N_> + N^2(Q^*) + \cdots \qquad (10)$$

where $g'$, $g''$ denote differentiation with respect to $Q^*$, $\Delta Q^* = Q^* - \tilde Q^*$, and we have used $N_>' = dN_>/dQ = -N$.
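The definitions (2)-(6) can be checked numerically. The following is a minimal Python sketch that assumes a unit-rate exponential patch PDF $N(Q) = e^{-Q}$ on $Q \ge 0$, so that $N_>(Q) = e^{-Q}$; the function names are illustrative, not from the paper. It integrates $P_m(Q^*)$ from (2) to confirm it is normalized, compares its mean with the harmonic number $H_M = \sum_{j=1}^{M} 1/j$ (the known mean of the maximum of $M$ unit exponentials), and solves (5)-(6) exactly for $\tilde Q^*$.

```python
import math


def n_pdf(q):
    # Assumed patch PDF (illustrative): unit-rate exponential, N(Q) = exp(-Q) for Q >= 0
    return math.exp(-q)


def n_greater(q):
    # Eq. (3) for this N(Q): N_>(Q) = exp(-Q)
    return math.exp(-q)


def p_max(q, m):
    # Eq. (2): P_m(Q*) = M N(Q*) (1 - N_>(Q*))^(M-1); -expm1(-q) = 1 - exp(-q) without cancellation
    return m * n_pdf(q) * (-math.expm1(-q)) ** (m - 1)


def q_tilde(q_char, m):
    # Eqs. (5)-(6): M g(Q~*) = q with g = -ln(1 - N_>), so N_>(Q~*) = 1 - exp(-q/M);
    # inverting N_>(Q) = exp(-Q) gives Q~* = -ln(1 - exp(-q/M))
    return -math.log(-math.expm1(-q_char / m))


def trapezoid(f, lo, hi, steps):
    # Plain trapezoidal rule on a uniform grid
    h = (hi - lo) / steps
    total = 0.5 * (f(lo) + f(hi))
    for i in range(1, steps):
        total += f(lo + i * h)
    return total * h


if __name__ == "__main__":
    m = 100
    norm = trapezoid(lambda x: p_max(x, m), 0.0, 40.0, 40000)
    mean = trapezoid(lambda x: x * p_max(x, m), 0.0, 40.0, 40000)
    harmonic = sum(1.0 / j for j in range(1, m + 1))
    print(f"integral of P_m = {norm:.6f}")
    print(f"mean of Q* = {mean:.6f}, H_M = {harmonic:.6f}")
    print(f"Q~* for q = 1: {q_tilde(1.0, m):.4f} (ln M = {math.log(m):.4f})")
```

For large $M$ and $q = 1$, $\tilde Q^*$ sits just above $\ln M$, which is where the expansion (8) is centred.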
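Inverting expansion (7) gives

$$M N_>(\tilde Q^*) = q\left[1 - \frac{q}{2M} + \cdots\right] \approx M\left(1 - e^{-q/M}\right) \qquad (11)$$

We obtain from (5) and its derivatives with respect to $Q^*$:

$$g(\tilde Q^*) = \frac{q}{M}\left(1 - \frac{q}{2M}\right) + \frac{1}{2}\left(\frac{q}{M}\right)^2 + \cdots = \frac{q}{M} + O\!\left(\left(\frac{q}{M}\right)^3\right) \qquad (12)$$

which to relevant order is consistent with (6), and

$$g'(\tilde Q^*) = -N(\tilde Q^*)\left[1 + \frac{q}{M} + \cdots\right] \qquad (13)$$

For $q$ finite as $M \to \infty$ this gives $g'(\tilde Q^*) = -N(\tilde Q^*)$ and $M N_>(\tilde Q^*) = q$.

We can now consider the extremal statistics of specific PDFs $N(Q)$ and, importantly, show that $P_m(Q^*)$ can be written in the universal form (1).

2.1 Gaussian and Exponential N(Q)

If $N(Q)$ falls off sufficiently fast in $Q$, i.e. is Gaussian or exponential, it is sufficient to consider only the lowest order in (5), giving $g(Q^*) \sim N_>$ (Gumbel, 1958; Bouchaud and Mezard, 1997) and $q = M N_>(\tilde Q^*)$. Expanding (3) in $Q^*$ near $\tilde Q^*$ gives, to this order,

$$M N_>(Q^*) = M\int_{\tilde Q^*}^{\infty} N(Q)\,dQ - M N(\tilde Q^*)\,\Delta Q^* = q\left[1 - \frac{M N(\tilde Q^*)}{q}\,\Delta Q^* + \cdots\right] \approx q\,e^{-\frac{M N(\tilde Q^*)}{q}\Delta Q^*} \qquad (14)$$

Expanding $N(Q)$ about $\tilde Q^*$ yields

$$N(Q^*) = N(\tilde Q^*)\left[1 + \frac{N'(\tilde Q^*)}{N(\tilde Q^*)}\,\Delta Q^* + \cdots\right] \approx N(\tilde Q^*)\,e^{\frac{N'(\tilde Q^*)}{N(\tilde Q^*)}\Delta Q^*} \qquad (15)$$

As, to this order, $(1 - N_>)^{M-1} \approx e^{-M N_>}$, we then have from (2)

$$P_m(Q^*) = M N(Q^*)\left(1 - N_>(Q^*)\right)^{M-1} \approx M N(Q^*)\,e^{-M N_>} \sim \left(e^{u - e^{u}}\right)^{a} \qquad (16)$$

with

$$a = -\frac{N'(\tilde Q^*)\,N_>(\tilde Q^*)}{N^2(\tilde Q^*)} \qquad (17)$$

and

$$u = \ln\left(\frac{M N_>(\tilde Q^*)}{a}\right) - \frac{N(\tilde Q^*)}{N_>(\tilde Q^*)}\,\Delta Q^* \qquad (18)$$

Since throughout we are considering $\tilde Q^*$ large ($M \to \infty$, $q$ finite), the effective value of $a$ is that given by (17) in the limit $\tilde Q^* \to \infty$. For $N(Q)$ exponential the above gives $a = 1$. In the particular case of the exponential, all the summations that we have truncated above can be resummed exactly and give $a \equiv 1$, recovering the result of Bouchaud and Mezard (1997).

For $N(Q)$ Gaussian we cannot obtain $a$ exactly in this way, but as we shall see it is instructive to make an estimate. Given $N(Q) = N_0\exp(-\lambda Q^2)$ and expanding equations (14), (15) and (16) to next order, we obtain

$$P_m = \bar P_m\,e^{R(u)}, \qquad R = -\frac{\ln^2(q)}{4\lambda\tilde Q^{*2}} + \bar u\left(1 + \frac{2\ln(q)}{4\lambda\tilde Q^{*2}}\right) - \frac{\bar u^2}{4\lambda\tilde Q^{*2}} - e^{\bar u} \qquad (19)$$

where we have used $u = -2\lambda\tilde Q^*\Delta Q^*$ and $\bar u = u + \ln(q)$. To lowest order in $\Delta Q^*/\tilde Q^*$ (i.e. $\tilde Q^* \to \infty$) we have a universal PDF with $a = 1$, but to next order, that is, neglecting only the term in $\bar u^2$ in (19), we have a universal distribution of the form (1, 16) with

$$a \equiv 1 + \frac{2\ln(q)}{4\lambda\tilde Q^{*2}} \neq 1 \qquad (20)$$

2.2 Power law N(Q)

The PDF of patches $N(Q)$ may, however, be a power law, in which case it falls off sufficiently slowly with $Q$ that we need to go to next order as in (7). If we consider a normalizable source PDF

$$N(Q) = \frac{N_0}{(1 + Q^2)^k} \qquad (21)$$

then for large $Q$ ($Q \gg 1$) we have $N(Q) \sim N_0/Q^{2k}$, and then using (3) and (7)

$$\tilde Q^* N(\tilde Q^*) = (2k - 1)\,N_>(\tilde Q^*) = (2k - 1)\,\frac{q}{M}\left(1 - \frac{q}{2M}\right) \qquad (22)$$

which, with the above general expressions for $g(\tilde Q^*)$ and its derivatives substituted into (8), gives an expression for $g(Q^*)$:

$$g(Q^*) = \frac{q}{M}\left[1 - (2k - 1)\frac{\Delta Q^*}{\tilde Q^*} + k(2k - 1)\left(\frac{\Delta Q^*}{\tilde Q^*}\right)^2 + \cdots\right] \qquad (23)$$

We also require an expression for $N(Q^*)$; again expanding about $\tilde Q^*$, and obtaining the derivatives of $N(\tilde Q^*)$ from those of $g(\tilde Q^*)$ and via (11), gives

$$N(Q^*) = N(\tilde Q^*)\left[1 - 2k\frac{\Delta Q^*}{\tilde Q^*} + k(2k + 1)\left(\frac{\Delta Q^*}{\tilde Q^*}\right)^2\right] \qquad (24)$$

which can be rearranged as

$$N(Q^*) = N(\tilde Q^*)\exp\left[-2k\frac{\Delta Q^*}{\tilde Q^*} + k\left(\frac{\Delta Q^*}{\tilde Q^*}\right)^2\right] \qquad (25)$$

After some algebra (23) can be rearranged to give

$$M g(Q^*) = q\exp\left[-(2k - 1)\frac{\Delta Q^*}{\tilde Q^*} + \frac{2k - 1}{2}\left(\frac{\Delta Q^*}{\tilde Q^*}\right)^2\right] \qquad (26)$$

These two expressions combine to finally give

$$P_m(Q) \equiv P_m(Q^*) \sim \left(e^{\bar u - e^{\bar u}}\right)^{a} \qquad (27)$$

with

$$\bar u = \ln(q) - \ln(a) - (2k - 1)\frac{\Delta Q^*}{\tilde Q^*}\left(1 - \frac{\Delta Q^*}{2\tilde Q^*}\right) \qquad (28)$$

and

$$a = \frac{2k}{2k - 1} \qquad (29)$$

To lowest order, neglecting the $(\Delta Q^*/\tilde Q^*)^2$ term, (28) reduces to (18).

Hence a power law PDF has maximal statistics $P_m(Q)$ which, when evaluated to next order, can be written in the form of a universal curve (i.e. of form (1, 16)) with a correction that is non-negligible at the asymptotes. This can be seen (Jenkinson, 1955; Bouchaud and Potters, 2000) to be consistent with the well known result due to Frechet where (following the notation of Bouchaud and Potters (2000)), if we have the PDF

$$N(x) \sim \frac{1}{|x|^{1+\mu}} \qquad (30)$$

then

$$N_> \sim \frac{1}{x^{\mu}}, \qquad P_m(x^*) = \frac{\mu}{x^{*\,1+\mu}}\,e^{-(1/x^*)^{\mu}} \qquad (31)$$

which we can write in the form

$$P_m(x^*) = \mu\,e^{\frac{\mu+1}{\mu}\ln\left(\frac{\mu+1}{\mu}\right)}\left(e^{u - e^{u}}\right)^{a} \qquad (32)$$

$$u = -\mu\ln(x^*) - \ln\left(\frac{\mu+1}{\mu}\right) \qquad (33)$$

which is of universal form (1, 16) in $u$.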
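Equation (17) can be evaluated directly at the characteristic value $\tilde Q^*$ fixed, to lowest order, by $M N_>(\tilde Q^*) = q$. The sketch below does this for three assumed patch PDFs on $Q \ge 0$: a unit exponential, a Gaussian with $\lambda = 1$, and the power law (21) with $k = 1$, each normalized to unit area (the constant $N_0$ cancels in (17)). It takes $q = 1$, and the function names are illustrative. This direct evaluation of (17) is only the lowest-order estimate and does not include the next-order terms behind (20). For the exponential it returns $a = 1$ exactly and for $k = 1$ it approaches $2k/(2k - 1) = 2$, as in (29); for the Gaussian it approaches 1 only slowly, since $\tilde Q^{*2}$ grows roughly like $\ln M$.

```python
import math


def a_from_eq17(n, dn, n_gt):
    # Eq. (17): a = -N'(Q~*) N_>(Q~*) / N(Q~*)^2
    return -dn * n_gt / (n * n)


def a_exponential(m, q=1.0):
    # N(Q) = exp(-Q): M N_>(Q~*) = q gives Q~* = ln(M/q)
    qt = math.log(m / q)
    n = math.exp(-qt)
    return a_from_eq17(n, -n, n)


def a_gaussian(m, q=1.0):
    # Assumed N(Q) = (2/sqrt(pi)) exp(-Q^2) on Q >= 0, so N_>(Q) = erfc(Q);
    # bisect for erfc(Q~*) = q/M (erfc is decreasing)
    lo, hi = 0.0, 30.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if math.erfc(mid) > q / m:
            lo = mid
        else:
            hi = mid
    qt = 0.5 * (lo + hi)
    n = 2.0 / math.sqrt(math.pi) * math.exp(-qt * qt)
    return a_from_eq17(n, -2.0 * qt * n, math.erfc(qt))


def a_power_law(m, q=1.0):
    # Eq. (21) with k = 1, normalized: N(Q) = (2/pi)/(1 + Q^2), N_>(Q) = (2/pi) atan(1/Q)
    qt = 1.0 / math.tan(math.pi * q / (2.0 * m))
    n = (2.0 / math.pi) / (1.0 + qt * qt)
    dn = -2.0 * qt * n / (1.0 + qt * qt)
    n_gt = (2.0 / math.pi) * math.atan(1.0 / qt)
    return a_from_eq17(n, dn, n_gt)


if __name__ == "__main__":
    for m in (1e3, 1e6, 1e9):
        print(f"M = {m:.0e}: exponential a = {a_exponential(m):.6f}, "
              f"Gaussian a = {a_gaussian(m):.6f}, power law (k = 1) a = {a_power_law(m):.6f}")
```

Using $\operatorname{atan}(1/Q)$ rather than $\pi/2 - \operatorname{atan}(Q)$ avoids cancellation at the very large $\tilde Q^*$ reached by the power law.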
Noting that here $\mu = 2k - 1$ and $a = (\mu + 1)/\mu$, and that to second order

$$\frac{\Delta Q^*}{\tilde Q^*}\left(1 - \frac{\Delta Q^*}{2\tilde Q^*}\right) = \ln\left(1 + \frac{\Delta Q^*}{\tilde Q^*}\right) \qquad (34)$$

we simply identify $1 + \Delta Q^*/\tilde Q^*$ with $\tilde x^*$ to obtain (28). To next order in $\Delta Q^*/\tilde Q^*$, the analogue of (28) still yields the right hand side of (34).

2.3 Convergence to the limiting distributions

The above results should be contrasted with the derivation of Fisher and Tippett (Fisher and Tippett, 1928). Central to (Fisher and Tippett, 1928) and later derivations is that a single ensemble of $NM$ patches has the same statistics as the $N$ ensembles (of $M$ patches) of which it is comprised. The fixed point of the resulting functional equation (Bhavsar and Barrow, 1985) for arbitrarily large $N$ and $M$ is $a = 1$ for the exponential and Gaussian PDF, and the Frechet result for a power law PDF. Here we consider a finite sized system, so that although the number of realizable ensembles of the system can be taken arbitrarily large, the number of patches $M$ per ensemble is always large but finite. Importantly, the rate of convergence with $M$ depends on the PDF $N(Q)$. For an exponential or power law PDF we are able to resum the above expansion exactly to obtain $a$, and convergence will then just depend on terms $O(1/M)$ and above. This procedure is not possible for $N(Q)$ Gaussian; instead we consider the characteristic $Q^*$, that is $\tilde Q^*$, which for $M$ arbitrarily large should also be large. Rearranging (7) to lowest order for $N(Q) = N_0\exp(-\lambda Q^2)$ yields $\sqrt{\lambda}\,\tilde Q^* \sim \sqrt{\ln(M)}$, implying significantly slower convergence. This is further discussed in Sornette (2000) (pp. 19-21).

The extremal distributions are thus essentially a family of curves that are approximately of universal form (1, 16) and are asymmetric, with a handedness that just depends on the sign of $Q$; we have assumed $Q$ positive, whereas one could choose $Q$ negative, in which case $N(Q) \to N(|Q|)$. This would correspond to, say, power absorbed by, rather than emitted from, a system. The single parameter $a$ that distinguishes the extremal PDF then just depends on the PDF of the individual events.
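The slow convergence for a Gaussian $N(Q)$ can be illustrated numerically. The sketch below assumes standard normal patches on the whole real line ($\lambda = 1/2$, normalized $N_0$), evaluates $P_m$ from (2) on a grid, and computes the skewness (normalized third moment) of the maximum for several $M$. It compares these with the skewness $12\sqrt{6}\,\zeta(3)/\pi^3 \approx 1.1395$ of the Fisher-Tippett type I limit for maxima. The grid bounds, step and function names are assumptions for illustration; the skewness should rise with $M$ but approach the limit only slowly.

```python
import math

ZETA3 = 1.2020569031595942  # Apery's constant, zeta(3)
GUMBEL_SKEWNESS = 12.0 * math.sqrt(6.0) * ZETA3 / math.pi ** 3  # type I (maxima) limit, about 1.1395


def gaussian_max_pdf(x, m):
    # Eq. (2) with standard normal patches: P_m(x) = M N(x) (1 - N_>(x))^(M-1)
    n = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
    n_gt = 0.5 * math.erfc(x / math.sqrt(2.0))
    if n_gt >= 1.0:
        return 0.0  # far left tail: (1 - N_>)^(M-1) is zero in floating point
    # log1p keeps (1 - N_>)^(M-1) accurate when N_> is tiny and M is huge
    return m * n * math.exp((m - 1) * math.log1p(-n_gt))


def skewness_of_max(m, lo=-10.0, hi=10.0, steps=20000):
    # Trapezoidal moments of P_m on a uniform grid (bounds and step are assumptions)
    h = (hi - lo) / steps
    xs = [lo + i * h for i in range(steps + 1)]
    ws = [gaussian_max_pdf(x, m) * h for x in xs]
    ws[0] *= 0.5
    ws[-1] *= 0.5
    m0 = sum(ws)
    mean = sum(w * x for w, x in zip(ws, xs)) / m0
    var = sum(w * (x - mean) ** 2 for w, x in zip(ws, xs)) / m0
    third = sum(w * (x - mean) ** 3 for w, x in zip(ws, xs)) / m0
    return third / var ** 1.5


if __name__ == "__main__":
    for m in (10, 1000, 100000, 1000000):
        print(f"M = {m:>7}: skewness of max = {skewness_of_max(m):.4f}")
    print(f"type I limit: {GUMBEL_SKEWNESS:.4f}")
```

The skewness here is positive because maxima of positive $Q$ are taken; the opposite handedness, as discussed above, flips its sign.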
For $N(Q)$ exponential we then recover exactly the well known result (Gumbel, 1958; Bouchaud and Mezard, 1997) $a = 1$. For a power law PDF, $a$ is determined by $k$ via (29). We have also demonstrated that for a Gaussian PDF with finite but large $M$ and $N$, $a \neq 1$, and we will explore the significance of this in Section 3.1.

3 Normalization to the first two moments

To compare these curves with data we need $P(\bar Q) \equiv P_m(Q^*)$ in normalized form. This has moments

$$M_n = \int_{-\infty}^{\infty} y^n\,\bar P(y)\,dy \qquad (35)$$

which we will obtain as a function of $a$, and then insist that $M_0 = 1$, $M_1 = 0$ and $M_2 = 1$.

Setting $M_1 = 0$ (and $M_0 = 1$, $M_2 = 1$) in our analysis of extremal distributions does not require any assumptions about the form of the PDF except that the moments exist. It will allow us to write the analytically obtained extremal distributions as functions of the single parameter $a$.

3.1 Extremal distributions arising from Gaussian and exponential N(Q)

For Gaussian and exponential PDFs we have

$$\bar P(y) = K\left(e^{u - e^{u}}\right)^{a} \qquad (36)$$

$$u = b(y - s) \qquad (37)$$

This has moments that converge for all $n$. From Appendix A we have that the $n$th moment is

$$M_n = \frac{1}{b}\int_{-\infty}^{\infty} \bar P(y)\,\frac{[\ln(a) + bs - \eta]^n}{b^n}\,d\eta = \frac{K e^{-a\ln(a)}}{b^{n+1}}\sum_{j=0}^{n}\binom{n}{j}\left[bs - \ln(a)\right]^{n-j}\frac{d^j\Gamma(a)}{da^j} \qquad (38)$$

where $\eta = \ln(a) - u$.

To normalize we insist that $M_0 = 1$, $M_1 = 0$ and $M_2 = 1$. The necessary integrals can be expressed in terms of derivatives of the Gamma function $\Gamma(a)$ (Gradshteyn and Ryzhik (1980)), and we obtain in Appendix A:

$$b^2 = \Psi'(a), \qquad K = \frac{b}{\Gamma(a)}\,e^{a\ln(a)}, \qquad s = -\frac{\Psi(a) - \ln(a)}{b} \qquad (39)$$

where

$$\Psi(a) = \frac{1}{\Gamma(a)}\frac{d\Gamma(a)}{da}, \qquad \Psi'(a) = \frac{d\Psi}{da}$$

The ambiguity in the sign of $b$ (and hence $s$) corresponds to the two solutions for $P(\bar Q)$ for positive and negative $Q$.

We can now plot the curves, that is, normalized to the first two moments, and these are shown in Figure 1. Experimental measurements of a global PDF $P(E)$ normalized to $M_0$ would be plotted as $M_2 P$ versus $(E - M_1)/M_2$. In the main plot we show normalized distributions of the form (1, 16) for $a = 1$, $\pi/2$ and $2$. It is immediately apparent that the curves are difficult to distinguish over several decades in $\bar P(y)$, and thus to obtain a good estimate for $a$, the numerical or real experiments
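Equations (35)-(39) can be checked numerically. The Python sketch below is an illustration under stated assumptions: it takes the positive root for $b$, implements $\Psi$ and $\Psi'$ with the standard recurrence plus asymptotic series (the standard library provides `math.gamma` but no digamma), and integrates with the trapezoidal rule over an assumed range $y \in [-40, 15]$; the function names are illustrative. It confirms $M_0 = 1$, $M_1 = 0$, $M_2 = 1$ and returns the third moment $M_3$, which with zero mean and unit variance is the skewness. For this family $M_3 = \Psi''(a)/\Psi'(a)^{3/2}$ (the third cumulant of $\ln t$ for a Gamma($a$) variable $t = a e^{u}$), giving about $-1.1395$ at $a = 1$ and about $-0.780$ at $a = 2$, negative with this choice of sign of $b$. The mode of (36) sits at $u = 0$, i.e. $y = s$, so $s(a)$ from (39) also tracks the peak.

```python
import math


def digamma(x):
    # Psi(x): shift x up with Psi(x) = Psi(x + 1) - 1/x, then use the asymptotic series
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 * (1.0 / 252 - inv2 * (1.0 / 240 - inv2 / 132))))
    return result + math.log(x) - 0.5 / x - series


def trigamma(x):
    # Psi'(x): shift x up with Psi'(x) = Psi'(x + 1) + 1/x^2, then use the asymptotic series
    result = 0.0
    while x < 6.0:
        result += 1.0 / (x * x)
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    series = inv * inv2 * (1.0 / 6 - inv2 * (1.0 / 30 - inv2 * (1.0 / 42 - inv2 / 30)))
    return result + inv + 0.5 * inv2 + series


def normalization(a):
    # Eq. (39), taking the positive root b = +sqrt(Psi'(a))
    b = math.sqrt(trigamma(a))
    k = b * math.exp(a * math.log(a)) / math.gamma(a)
    s = -(digamma(a) - math.log(a)) / b
    return k, b, s


def moments(a, lo=-40.0, hi=15.0, steps=20000):
    # Trapezoidal estimates of M_0..M_3 from eq. (35) for eqs. (36)-(37)
    k, b, s = normalization(a)
    h = (hi - lo) / steps
    totals = [0.0, 0.0, 0.0, 0.0]
    for i in range(steps + 1):
        y = lo + i * h
        u = b * (y - s)
        weight = 0.5 if i in (0, steps) else 1.0
        # (e^(u - e^u))^a written as a single exponential
        p = k * math.exp(a * (u - math.exp(u))) * weight * h
        for n in range(4):
            totals[n] += p * y ** n
    return totals


if __name__ == "__main__":
    for a in (1.0, math.pi / 2, 2.0):
        k, b, s = normalization(a)
        m0, m1, m2, m3 = moments(a)
        print(f"a = {a:.4f}: K = {k:.4f}, b = {b:.4f}, s (peak) = {s:.4f}, "
              f"M0 = {m0:.6f}, M1 = {m1:.1e}, M2 = {m2:.6f}, M3 = {m3:.4f}")
```

Because $M_0$, $M_1$ and $M_2$ are fixed by construction, the printed $M_3$ and $s$ are the quantities that actually vary with $a$.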
