MEPSA: a flexible peak search algorithm designed for uniformly spaced time series CristianoGuidorzia aUniversityofFerrara,DepartmentofPhysicsandEarthSciences,viaSaragat1,I–44122Ferrara,Italy Abstract 5WepresentanovelalgorithmaimedatidentifyingpeakswithinauniformlysampledtimeseriesaffectedbyuncorrelatedGaussian 1noise. The algorithm, called “MEPSA” (multiple excess peak search algorithm), essentially scans the time series at different 0 timescalesbycomparingagivenpeakcandidatewithavariablenumberofadjacentbins. Whilethishasoriginallybeenconceived 2 for the analysis of gamma–ray burst light (GRB) curves, its usage can be readily extended to other astrophysical transient n phenomena,whoseactivityisrecordedthroughdifferentsurveys. Wetestedandvalidateditthroughsimulatedfeaturelessprofiles a aswellassimulatedGRBtimeprofiles.Weshowcasethealgorithm’spotentialbycomparingwiththepopularalgorithmbyLiand J Fenimore,thatisfrequentlyadoptedintheliterature.Thankstoitshighflexibility,themaskofexcesspatternsusedbyMEPSAcan 6 betailoredandoptimisedtothekindofdatatobeanalysedwithoutmodifyingthecode.TheCcodeismadepubliclyavailable. ] MKeywords: gammarays:bursts–methods:statistical–Designandanalysisofalgorithms:Patternmatching I . h 1. Introduction To tackle this, one must develop an effective technique p - to detect as many peaks as possible in the observed light o Transient phenomena are manifestation of various classes curves to a reliable degree, properly accounting for the sta- trofastrophysicalsources.Theirstudycontributestocharacterise tistical uncertainties affecting the time series. Several au- sthe dynamics of such objects and gain clues on the physical thors in the GRB literature faced this problem building on a [processes responsible for their behaviour and evolution with different techniques: LiandFenimore 1996 (hereafter, LF) time. In the current time domain era, most if not all tran- set up a simple but effective algorithm based on the Pois- 1 sientsourcesarediscoveredandsignaltheirtransientcharacter son statistics affecting photon counting detectors. Other au- v through the emission of one or multiple peaks in their flux 7 thors made extensive use of LF algorithm (Bhatetal., 2012; 1time profile as a manifestation of enhanced activity. In high– DragoandPagliara,2007;NakarandPiran,2002)tostudythe 1energyastrophysics,forinstance,thisisthecaseforblack–hole peak intensity as well as the distribution of waiting times, 1candidates in binary systems (e.g., RemillardandMcClintock defined as the time intervals between adjacent pulses. In an 0 2006), that of outbursting magnetars (e.g., Mereghetti 2008), alternativeapproach,Quilliganetal.(2002)setupadedicated . 1orthatofsuperfastX–raytransients(e.g.,Romanoetal.2014; filterbasedonwaveletstosuppressthestatisticalnoiseinGRB 0Sgueraetal.2005). timeprofilesandstudythebasicstatisticalpropertiesofpulses 5 Moving to most energetic and disruptive events on stellar 1 as a consequence. Morerecently, Charisietal. (2014) studied scales, gamma–ray bursts (GRBs) are no exception. Their : the statistics of waiting times between emission episodes and vgamma–ray time profiles, lasting from a fraction of a second precursoractivityinGRBsobservedbyseveralpastandpresent i Xall the way up to thousands seconds (Horva´thetal., 2010; experimentsbyadaptinganalgorithmoriginallydevelopedfor Kouveliotouetal., 1993; Levanetal., 2014), are characterised r gravitationaldataanalysis. Thisisbasedonacombinedtime– aby a variable number of pulses with no firm evidence for frequencydecompositionofthetimeseriesvariance. periodicity. Such pulses often cluster within some emission Motivatedby thepeakdetectionproblemin GRB time se- episodes,separatedbyso–calledquiescenttimes,duringwhich ries,inthispaperweproposeanewalgorithm,called“multiple the gamma–ray activity drops to the detector’s background excesspeaksearchalgorithm”(hereafter,MEPSA),thatisde- level. The so–called GRB “prompt”, i.e. the gamma–ray scribed in Sect. 2. As it will be shown in Sect. 3, compared emission itself, is still the least understoodaspectof the GRB with the previousLF algorithm and with a more conservative phenomenon: e.g., what is (or are) the gamma–ray emission versionofthesamedevisedbyus,MEPSAischaracterisedby mechanism(s), at what distance from the collapsing object, a lowerrate of false positiveand a higherrate of true positive which kind of environment(see KumarandZhang 2014 for a events,particularlyforlowsignal–to–noiseratio(SNR)peaks. recentreview). Oneoftheopenissuesconcernsthestochastic Moreover,thanks to its high flexibility, MEPSA can easily be (ordeterministic?) processwhichrulesthetimeprofileandits adapted and tailored to different time series that are routinely complicatedmulti–pulsestructure(Grecoetal.,2011). obtained in many fields other than GRBs, without having to modifythecodeitself. AppendixAdescribestheoutputinfor- mationprovidedbythecode,whichismadepubliclyavailable Emailaddress:[email protected](CristianoGuidorzi) PreprintsubmittedtoAstronomyandComputing January7,2015 1. the one with the (statistically most significant) highest value. Thistimescaleisthereforedefinedas“detectiontimescale”and hereafterisdenotedwith∆t . Thisautomaticallyidentifiesthe 2. Algorithm’sdescription det timescaleatwhichthepeakisdetectedatbest;assuch,thiscan MEPSAsearchestheinputlightcurvesforpeaksbyapply- betakenasareasonableproxyforthepeaktimescaleitself. ing simultaneously a multi–pattern set of excesses, i.e. a set Thesetofvalueschosenforthepatternthresholdshadbeen of N = 39 patterns. The inputtime series must have uniform determinedafter analysinghundredsof GRB profilesdetected samplingandmustbeaffectedbystatisticaluncorrelatedGaus- with CGRO/BATSE(Paciesasetal.,1999), BeppoSAX/GRBM sian noise. The series must be either background–subtracted, (Fronteraetal., 2009), andSwift/BAT (Sakamotoetal., 2011). orremovedofpossibletrends,sothatchangesintheexpected Assuch,theyweretailoredtotheGRBfeaturesthemselvesso valueshouldonlybeduetosignalandnottobackground. For as to ensure an acceptable trade-off between the rate of true each bin of the light curve, let r the rates in the i-th bin. A negative and that of false positive detections. In this respect, i given pattern P (k = 1,...,N) consists of a fixed number of thealgorithminitscurrentversion(October2014)hasproved k adjacent bins around the given i-th bin: around i there are a tobeconservative,asisshowninSect.3. givenn leftwardbins(whichtemporallyprecedethei-thbin) k,l 2.1. Li–Fenimorealgorithm andn rightwardbins(whichtemporallyfollowthe i-thbin). k,r The pattern assigns each of its bins, except for the i-th bin, Forcomparisonpurposes,weconsideredthepeaksearchal- a threshold v (j = 1,...,n + n ) in terms of number of gorithmproposedbyLiandFenimore(1996)(hereafterLFA), k,j k,l k,r σ’s(whereσisthestatisticalnoisecorrespondingtothatbin). that has become popular in the GRB literature (Bhatetal., PatternP isthensaidtobefulfilledbybiniwhenthefollowing 2012; DragoandPagliara, 2007; NakarandPiran, 2002) as k conditionsaresimultaneouslyfulfilled: wellasinotherfields(Fengetal.,1999;LiuandLi,2004). r r v σ (j=i n ,...,i 1) (classical)LFA. AccordingtoLFAprescription,alocalmaxi- ( rii−−rjj ≥≥vkk,,((jj−−ii++nnkk,,ll+)1σ)′ij′ij (j=i+−1,k,.l..,i+−nk,r) (1) mvaullmeyastai–ttjh–bthinainsdpkro–mthobtiendstoarpeefaokucnadn,dsiodtahteatw(hrentrwo)neanrbσy i j,k i − ≥ whereσ = (σ2 +σ2)1/2. Eachpatternhasdifferentnumbers (n = 5), with no other higher peak lying between the same ′ij i j of leftward and rightward bins as well as different threshold two valleys. Hereafter, LFA in its original formulation will valuesforeachofthem. Foreachbinithesearchisperformed bereferredtoas“classicalLFA”orsimplyLFA todistinguish by applyingsimultaneouslya set of 39 differentpatterns(k = it from a slightly modified version of the same algorithm we 1,...,39)andthei-binispromotedtopeakcandidateifatleast conceived. onepatternisfulfilled.Thecompletesetofthresholdvaluesv k,j Conservative LFA (cLFA). The so–called conservative LFA currentlyadoptedisreportedinTable1. Operatively,threshold (hereafter, cLFA) works in the same way as LFA, except for valuesarenothard–coded,butarestoredwithinanexternalfile: the conditionis slightly butimportantlydifferent: (r r ) thisallowsuserstodisableexistingpatterns/enablenewonesin i j,k n(σ2 + σ2 )1/2 (n = 5). In principle, this accounts−for th≥e averyflexibleway. i j,k variance of excesses in a statistically more correct way. For When the entire light curve has been screened, the whole manyrealistic casesofastrophysicalinterest, time profilesare procedure is repeated to rebinned versions of the same curve, nearly homoscedastic (i.e., all σ’s are comparable with each each time increasingthe rebinningfactorby oneupto a max- other),socLFAessentiallyimposesthresholdsashighas √2 imumvalue F establishedbytheuser. Moreover,forare- reb,m ∼ times those of LFA. Its expected robustness (greater than that binningfactorofF (integer)oftheoriginallightcurve,there reb of LFA against the misclassification of statistical fluctuations are F possibleoffsets: in fact, onecanchoose F different reb reb asfalsepositives),explainsthe“conservative”labelling. starting bins and end up with as many different rebinnedpro- files. ForforagivenF rebinningfactor,allthecorresponding reb rebinnedcurvesaresearchedthroughthesamealgorithm. The 3. Testsandvalidation goal behind this is to identify peaks with very different SNR and/orverydifferenttimescales. Thecomputationaltimescales WetestedMEPSAbymeansofsimulatedtimeprofilesthat asF3 ,sounreasonablyhighvaluesshouldbeavoided. wereusedtoevaluatethefollowingcharacterisingfeatures: reb,m Clearly, most of the peaks are expected to be detected in 1. falsepositive(FP)rate; multiplesearches. So afinalcrosscheckisperformedtomake 2. true positive (TP) or, equivalently, true negative (TN) sure the same peak candidate is not going to be classified re- rate. peatedlyasanumberofdistinctpeakcandidates. Thisisdone bycomparingthepeaktimestogetherwiththetimescalesasso- Followingstandardnamingconventions,FPsdenotestatistical ciated with eachpeaktime. Finally, whenthe same peakcan- fluctuationswhich do notcorrespondto anyreal peakand are didate is detectedatdifferenttimescales, the algorithmselects misclassified by MEPSA as genuine peak candidates. The highertheFPrate,thelesspurethesampleofpeakcandidates. TPs are instead real peaks which are correctly identified as 1http://www.fe.infn.it/u/guidorzi/new_guidorzi_files/code.htmlsuch; theycomplementthe numberof TNs, which are instead 2 Table1:Matrixofexcessthresholds.EachlinereferstoasinglepatternPk,identifiedbyk.nk,landnk,rarethenumbersofleftwardandrightwardbins,respectively, asinEq.(1).Thresholdvaluesvk,j(j=1,...,nk,l+nk,r)aregiveninthenumberedcolumns. k n n v v v v v v v v v v k,l k,r k,1 k,2 k,3 k,4 k,5 k,6 k,7 k,8 k,9 k,10 1 1 1 5.0 5.0 - - - - - - - - 2 1 2 5.0 1.0 5.0 - - - - - - - 3 1 3 5.0 4.8 2.0 3.5 - - - - - - 4 1 3 5.0 2.0 2.2 5.0 - - - - - - 5 2 1 5.0 1.0 5.0 - - - - - - - 6 2 2 5.0 2.0 2.0 5.0 - - - - - - 7 2 3 4.5 3.0 2.0 3.5 5.0 - - - - - 8 2 3 5.0 3.0 4.5 3.0 5.0 - - - - - 9 3 1 5.0 2.0 2.2 5.0 - - - - - - 10 3 1 5.0 4.5 1.5 5.0 - - - - - - 11 3 2 5.0 3.0 4.5 3.0 5.0 - - - - - 12 3 3 3.0 2.8 1.7 2.0 4.0 5.0 - - - - 13 3 3 5.0 4.5 2.0 2.0 4.0 2.5 - - - - 14 3 4 5.0 4.5 -2.0 0.2 2.0 2.2 2.0 - - - 15 4 1 2.0 3.3 2.6 2.4 5.0 - - - - - 16 4 2 5.0 1.7 2.2 2.5 2.4 5.0 - - - - 17 4 3 5.0 4.6 0.8 0.4 1.4 1.3 5.0 - - - 18 4 3 5.0 3.5 2.0 3.0 3.0 3.5 4.5 - - - 19 4 3 3.4 1.8 1.2 3.4 1.2 3.8 5.0 - - - 20 4 3 5.0 2.1 2.2 3.0 1.8 3.4 5.0 - - - 21 4 5 4.9 3.5 2.0 1.8 1.8 2.9 3.3 3.2 3.2 - 22 5 2 3.4 2.8 2.0 3.4 3.5 3.4 5.0 - - - 23 5 3 3.0 2.8 3.5 0.2 1.0 1.9 4.3 5.0 - - 24 5 3 1.7 2.4 1.4 2.0 1.0 2.2 4.0 5.0 - - 25 5 4 3.4 3.8 4.0 3.0 1.5 0.3 1.2 2.7 4.0 - 26 5 4 2.2 3.9 2.2 3.4 0.7 3.1 2.2 1.6 1.7 - 27 5 4 1.5 2.6 2.4 2.5 1.0 1.5 2.5 2.5 4.8 - 28 5 4 4.5 1.4 4.0 1.9 1.1 1.9 2.8 3.8 3.0 - 29 5 5 0.5 -1.8 -0.1 2.7 3.8 4.0 2.5 1.5 3.8 4.0 30 5 5 3.5 4.0 2.5 2.0 1.0 0.7 2.0 2.1 3.1 2.8 31 5 5 5.0 5.0 5.0 4.0 2.3 0.5 1.4 3.0 3.0 2.7 32 5 5 2.3 3.6 2.6 0.9 1.8 2.1 2.9 4.1 3.6 2.7 33 5 5 3.0 4.0 2.5 2.8 0.7 2.4 3.3 4.0 4.5 3.0 34 5 5 3.1 2.8 3.4 1.2 1.4 2.8 2.0 3.6 3.2 3.2 35 5 5 3.4 3.6 3.0 1.6 0.6 2.3 2.0 0.8 3.2 3.2 36 2 2 3.0 4.0 4.0 3.0 - - - - - - 37 3 3 2.5 3.5 3.5 3.5 3.5 2.5 - - - - 38 3 3 3.0 4.0 0.0 0.0 4.0 3.0 - - - - 39 4 4 3.0 4.0 0.0 0.0 0.0 0.0 4.0 3.0 - - 3 0.25 Table 2: Number of FPs detected by each algorithm out of two groups of MEPSA simulatedcurveswithoutpeaks.Thecorrespondingfractions(outof1.5 106 0.2 LFA × cLFA scannedbins)areamongbrackets. 0.15 2G1 3260((M21..E43P×S11A00−−55)) 116027((17c..L11F×A1100−−45)) 75028653((43L..F75A×1100−−33)) -1-1s det] 0 .00.51 × × × c 0 e [ 0.2 at-0.05 R real peaks being missed. The higher the TP rate, the more 0.1 -0.1 completethepeakcandidatesample. The ideal algorithm has a null FP rate and 100% TP rate. -0.15 0 148 150 152 In practice, the two criteria compete with each other, so that -0.2 only a compromise is feasible. The best trade–off is to be -200 0 200 400 600 tailoredtothegoalofagivenexperiment,dependingonwhich, Time [s] betweenpurityandcompletenessofthesample,ismorecrucial. Figure 1: Example of simulated featureless profile affected by uncorrelated Concerningthe problemofpeakidentification, unlessonehas Gaussiannoise.MEPSA,LFA,cLFAfound1,65,3FPspeaks,respectively.A specific requirements, in most cases purity is more important. close-inoftheonlyMEPSAFPisshownintheinset.Thistimeprofilemimics On the other hand the capability to identify dim peakscan be thebackgroundofatypicalmask-weightedcurvebySwift/BAT. worth pursuing e.g., when one aims at studying waiting time distributions, providedthatit costs a relativelylow numberof FPs. 14 TP FP 3.1. FalsePositiverate 12 Wesimulatedanumberofconstanttimeprofilesaffectedby 10 Gaussian uncorrelatednoise and applied the three algorithms. y c n We assumed a fixed bin time of 64 ms, which is the typical e 8 u q resolutionof GRB experimentssuch as BATSE, and is signif- re 6 F icantly shorter than 0.6–1 s, that is the characteristic variabil- 4 ity timescale range observed in GRB profiles (Marguttietal., 2011). Wegeneratedtwodifferentgroupsofcurves: 2 group1 N = 300timeprofileswithN = 5000binseach. We 0 b 5 10 15 20 25 30 35 generated Poisson distributed counts with an expected # Pattern rate r = 1000 counts/bin. Such high-counting regime e ensures that rates are normally distributed according to Figure2:MEPSAFP(dashed)andTP(solid)spectra,i.e.,numberofeventsas a N(re, √re). These values are representative of back- afunctionofthetriggeringpattern.TheTPspectrumhasbeendownsizedsoas groundcountsfor scintillatorssuch as those which flew tohavethesameareaastheFPone. aboardCGROorBeppoSAXoperatingfromafewdozens uptoseveralhundredskeV. worseFPrate,about4–5timesashigh,correspondingto3.9– group2 N = 100 time profiles with N = 15000 bins each. 4.0σ(Gaussian),whereasLFAisbyfartheworst,witharateof b 3–5 10 3FP/bin,correspondingto2.8–2.9σ(Gaussian).The We generated Gaussian distributed rates according to − × N(0,σ), where σ were taken from a typical mask- ratiobetweenLFAandcLFAsignificanceGaussianthresholds i i weighted light curve of Swift/BAT in the 15–150 keV isindeedcompatiblewith √2,asexpected(Sect.2.1). band.2 In the case of MEPSA, we also studied the FP rate as a functionof the triggeringpatternthroughwhatcan be consid- Eachgrouptotalsto1.5 106binspurelyaffectedbystatistical ered as a “FP spectrum”. The dashed histogram in Figure 2 × noise with no real structure. Table 2 reports the number of shows how the 36 FPs detected in group 2 profiles distribute FPsforeachgroupandforeachalgorithm. Asanillustrative amongthe 39 patterns. Clearly, pattern 26 has the highestFP exampleFigure3.1showsagroup–2lightcurvetogetherwith rateandthusappearstobe theweakestpatterninthisrespect. the FPs identified by each algorithm. MEPSA has the lowest The FP spectrum is compared with the TP one obtained from FPrate, 1–2 10−5FP/bin,whichisequivalenttoa4.2–4.4σ thesimulatedpeaksdescribedinSect.3.2. ∼ × (Gaussian) significance threshold. cLFA has a significantly 3.2. TruePositiverate StartingfromthesametemplateofSwift/BATmask–weighted 2We took the time profile of GRB100814A (Krimmetal., 2010), which background time profile used to build group 2, we simulated consistsofacomplexsuperpositionofanumberofpulseswithdifferentdura- tions. 150curvespopulatedwithfast–riseexponentialdecay(FRED) 4 1 0.1 0.9 0.9 1 0.4 0.3 0.7 00..6 8 0.8 0.8 0.7 0.9 1 1.0 0.3 0.9 1 M) 0.3 0.9 Dlog(t/FWHmin −3−2−10 0.2 0.1 0.2 0 0..1 0.2 3 0.3 M 0.1 EP 0.3 S 0 0..14 0.A 0.1 1 0.3 0 .40 .2 0.2 0.1 0.2 0.3 0.2 0 .02.4 0.1 0.3 0.4 0.2 0.2 0.4 0.2 0 .20. 0.1 2 0.4 0 .0.43 0.2 0.6 0 0 .0..83 50 .6 0.2 0 0.00..379. 4 0.2 0.1 0.2 L 0.1 0.3 0F.1 1.0 0.1 0A.2 0.1 0.2 0.02. 1 0.3 0.2 0.4 0.2 0.1 0 .3 0.2 0.4 0.3 0.3 0.1 0.3 0.2 0 .4 0.2 0 .02 0..3 7 0.7 0.6 0.1 00 00....4432 0.5 0. 0.9 5 0.5 0 .01. 10 00 ...0.4382 cL 0.1 FA 0.2 0.1 0.1 0.3 0.1 0.2 0.3 0.1 0.4 0.2 0.3 0.2 0.1 0.4 0 .2 0.3 0.4 0.1 0.3 00.5 . 0 0 ..042.0 53 . 6 0.8 0.9 0.6 0 .7 0 .0 8.0 7. 4 0.5 1.0 1.5 2.0 0.5 1.0 1.5 2.0 0.5 1.0 1.5 2.0 log(SNR) Figure3: Peakdetection efficiency intheSNR–separability planeforMEPSA(left), LFA(mid), andcLFA(right). DifferentContourlevels (fromcoldtohot colours)correspondtotendifferent, equallyspacedefficiencylevelsfrom0%to100%withincreasingorder. TheMEPSA90%contourlevelisapproximately describedwithtwoconnectedpower-laws(dashedline)andisshowninallpanelsforcomparison.Dotsandbackgroundcoloursrepresentthebivariatedistribution ofthedetectedpeaks. pulsesassumingthemodelbyNorrisetal.(1996).Weassumed statistical uncertainty,and the ratio betweenthe time intervals a fixed value for the peakedness given by the average value mentionedabove. foundin real GRBs, ν = 1.5(Norrisetal., 1996). Thiscorre- Differentpulseswithinthesamesimulatedcurveweregen- spondstoapulseshapewhichliesbetweenapureexponential erated assuming an exponential distribution for the waiting andaGaussianprofile. Wealsoassumedfixedriseanddecays times,i.e. thecaseofamemorylessprocesswithaconstantex- times, σ = 1 and σ = 3 s, respectively, in agreement with pectedPoissonrateofpulsesperunittime.Weassumedarange r d what is typically observedin GRB time profiles (Norrisetal., fortheexpectedratesofpulsesfrom1/40upto1/20pulsess 1. − 1996). In this mathematical formulation, the corresponding Peak intensities were assumed so as to cover the SNR range full width at half maximum (hereafter, FWHM) amounts to from 2 all the way up to 100. The nature of the waiting time (ln2)1/ν(σ +σ ). distribution (an exponentialin this case) is not relevantto our r d As will be shown in the following, the relevant parameter goal;rather,theaimiscoveringasmuchaspossibletheSNR– for the peak detection of a given pulse is the ratio between separabilityplaneandmonitoringthealgorithms’efficiencyas its lowest adjacent waiting time ∆t and its FWHM, rather afunctionofbothvariates. min than the absolute value of the FWHM itself. For a given i–th Overall,89540pulsesweregeneratedspanningtheranges pulsepeakingatt ,thelowestadjacentwaitingtimeisdefined 0.5.log(SNR).2.0and 3.logs.2. p,i − as ∆t = min(t t ,t t ). This explains our min,i p,i p,i 1 p,i+1 p,i choice of adopting a−fixed−FWHM−for the simulated pulses, 3.2.1. Efficiency since we varied the waiting time distribution. The higher the WesplittheSNR–splanein30differentboxesandforeach ratio,themoreeasilythepulsecanberecognisedasaseparate ofthemwe calculatedthefractionofidentifiedpeaksoverthe entity from its surrounding siblings. We therefore define the totalnumberofpulses. Left–handpanelinFigure3showsthe separabilitys ofagivenpulsei,as MEPSA efficiency. The90%contourlevelcan approximately i bedescribedwithadoublepiece-wisepower–law(dashedline ∆t s = min,i . (2) inFig.3),whoseequationis i FWHM i 8.28 log(SNR)+8.42 (log(SNR)<0.95) The more two adjacent pulses overlap, the lower their indi- logs0.9(SNR)=( −0.63 log(SNR)+1.15 (log(SNR) 0.95) . − ≥ vidual separabilities, and correspondingly harder for a given (3) algorithm is to identify them as two separate entities. The Whenever the condition s(SNR) s (SNR) is fulfilled, 0.9 ≥ peakedness, which determines the shape of the pulse profile, MEPSA efficiency is 90% at least. Worth noting are the fol- isexpectedtohaveaminorweightasfarasthepeakdetection lowingproperties: isconcerned,providedthatextremeandnonphysicalvaluesare efficiency is very high in the top right region of well notconsidered. Thisjustifiesourchoiceofassumingatypical, • separated,highcontrastpulses,asoneexpects; fixedvalueforit,andourchoiceofexploringmoreindetailthe effectsofothermorecrucialparameters,suchastheSNRofthe whenlogs < 0.4,i.e. s< 0.4,pulsescanhardly(<10– • − pulse, i.e. the ratio between the total area (or fluence) and its 20%) be identified as separate pulses, regardless of the 5 SNR 3 10 100 1 Estimated Peak vs. True Peak 1] 100 -t e d -1s 2 c [ s e al t u ak Ra 0.1 SNRResid 0 Pe m. d 10 or e N at 2 m − ti s E 3 4 0.01 − 0.01 0.1 1 SNR<10 10<SNR<20 20<SNR<50 SNR>50 -1 -1 True Peak Rate [c s det ] Figure4:Leftpanel: MEPSAestimatedvs. truepeakratesforweaklyseparated(1<s<10,grey)andwellseparatedpulses(s>10,red). Thesolidlineshows equality.Rightpanel:correspondingviolinplotofnormalisedresidualsofestimatedvs.truevaluesfordifferentclassesofSNRofwellseparatedpulses. SNR; grey), and the well separated ones (s > 10, red). The mildly separated peak rates tend to scatter more significantly above at a given separability, efficiency slightly improves for equalitythanwellseparatedonesdo: thereasonbehindthisis • higherSNR; that more overlapping pulses are more likely to be identified as a single peak, whose estimated rate is therefore the sum when SNR drops below 4–5, efficiency drops as well • of different pulses’ contributions. While the scatter around almostregardlessofseparability; equality seems to enhance in the low–SNR (or low–rate) end at fixed SNR>4–5, efficiency drops from 90% to 50% ofthe distribution,the correspondinguncertaintiesincreaseas • when separability decreases from s (SNR) by a factor well. TobetterunderstandwhethertheaccuracyofaMEPSA– 0.9 of 10 0.2 0.6. estimated peak rate depends on SNR or, equivalently, on the − ∼ ≈ value of the rate itself, we show in the right–hand panel of Likewise, we studied of efficiencyin the SNR–s plane for Fig.4theviolinplotofthenormalisedscatter,i.e.thedifference theothertwoalgorithms.Midandright–handpanelsinFigure3 betweentrueandestimatedvaluesnormalisedbytheestimated show the results for LFA and cLFA, respectively: the dashed uncertainty, as a function of peak rate. We considered the line in both plots is the same as in right–hand panel and de- sample of well separated pulses (s > 10) only. There is no scribedbyEq.(3)forcomparison.Atgivenseparabilityvalues, significant trend with peak rate: the variance of distribution LFA has comparable efficiency values as long as SNR& 15. remains essentially unchanged throughout the spanned range, However, at SNR < 15 LFA efficiency becomes remarkably and so does the mean value. In all cases the null value is worsethanMEPSA(e.g.,60%vs. 90%,atSNR 8ands>3). within1σofthedistribution. Allmeanvaluesareabovezero, ∼ ThisprovesthatMEPSAhasahigherTP(lowerTN)ratethan suggestiveofa smallbiasthattendstooverestimatesthepeak LFAinlow-intermediateSNRrange,i.e. 4–5<SNR<15. rate, even though it is still within uncertainties. A possible Compared with MEPSA and LFA, the TP rate of cLFA is explanationforthatis thatMEPSA searchesthe peakthrough thelowest(highestTNrate),sincethe>90%efficiencyregion all the possible binnings and shifts, with the result of a slight shrinksalongbothSNRands,asshownintheright–handpanel biastowardspositivestatisticalfluctuations. of Fig. 3. This is no wonder, because it is the downside of a Figure5showstheanalogousviolinplotsforthenormalised relativelylowFPrate,typicalofamoreselectivealgorithm. residualsforLFAandcFLA,respectively,fortheidentifiedwell separated pulses. In both cases, the accuracy is remarkably 3.2.2. Accuracyofpeakrateestimates worse at low SNR values, where peak rate estimates become We investigatedthe accuracyofdifferentalgorithmsin es- crucially biased by statistical fluctuationsthat overestimate as timating peak intensities or rates, something that can be done much as up to 3σ. It is worth noting that in this respect only for the TP peaks. Left–hand panel of Figure 4 shows ∼ cLFAisevenworseonaverage. MEPSA–estimatedvs. truepeakratesfortwodifferentclasses of separability s: the mildly separated pulses (1 < s < 10, 6 all 00 log(SNR)<1 0 5 1.0<log(SNR)<1.5 log(SNR)>1.5 0 0 0 4 y nc00 e0 u3 q e r F0 0 0 2 Estimated Peak vs. True Peak 0 0 0 1 3 duals2 0 −1.5 −1.0 −0.5 0.0 0.5 Resi log(D tdet/FWHM) m. 1 or N Figure6: DistributionoftheratiobetweenMEPSAdetectiontimescale∆tdet, 0 used as a proxy for the pulse FWHM, and the FWHMitself. This is done separatelyfordifferentSNRclasses. 1 − SNR<10 10<SNR<20 20<SNR<50 SNR>50 3.2.3. MEPSAproxyforthepeakFWHM BothMEPSAandLFAalgorithmsdonotassumeanyspe- Estimated Peak vs. True Peak cific shape for the peaks. As non–parametric methods, they havethe benefitofbeingapplicableto a broadvarietyof time 5 series characterised by peaks. The downside is that peak 3. FWHM must be estimated from the data themselves and not 0 3. throughfittingparametersthatarelinkedtospecificpeakmod- uals2.5 els. LFA algorithmsprovidenodirectinformationaboutpeak d esi0 width,theonlybareproxybeingthetimesofthetwoadjacent R2. m. valleyssurroundingagivenpeak. Nor1.5 Analogously, a detection timescale ∆tdet is associated to 0 eachMEPSApeak(Sect.2). AsinthecaseofLFAvalleys,the 1. detection timescale is affected by the peak SNR: at low SNR 5 0. valuesittendstobinupthelightcurveasmuchaspossibleto reachtherequiredstatisticalexcesstotriggerMEPSA.Onthe SNR<10 10<SNR<20 20<SNR<50 SNR>50 otherside,athighSNRthepeakalreadytriggerssomeMEPSA patternsatlowtimescalesandatlongertimescale,althoughthe SNRincreases,theaveragepeakrateestimatelikelydecreases. Figure 5: Violin plots of the normalised residuals for both versions of LF algorithms: LFA(top)andcLFA(bottom), analogouslytoFig.4. BothLFA All this turns into favouring the short timescales at high SNR and cLFA provide significantly morebiased estimates ofthe true peak rates and the other way around at low SNR. This is indeed what is thanouralgorithmdoes. observedwhenonestudiesthedistributionoftheratiobetween detectiontimescaleandFWHMforthreedifferentSNRranges, asshownbyFig.6. Lookingatthemeanvaluesandscattersof eachdistribution,weconcludethat,onaverage, atlog(SNR)>1.5,itisFWHM 10∆ , det • ≈ at1.0<log(SNR)<1.5,itisFWHM 5 6∆ , det • ≈ − atlog(SNR)<1.0,itisFWHM 2.5∆ , det • ≈ withafactorof 2–3uncertainty. ≈ 7 wellasahighertruepositiveone,especiallyatlowSNR( 4– 35 ∼ 5). WeshowedthatMEPSAefficiencymostcruciallydepends SNR = a SNR + b p,est 30 onthecombinationofseparability,definedastheratiobetween a = 0.287 b = 3.88 the lowest temporal separation from adjacent peaks and the 25 FWHM of a given peak, and SNR. At SNR<4–5, efficiency est 20 significantly drops. This is also the case whenadjacentpeaks p, R overlap non–negligibly,i.e. when separability drops below 1, N 15 S with little but significant dependence on SNR, that we mod- 10 elledwith a doublepower–lawin theseparability–SNRplane. MEPSA also yields some proxies to characterise FWHM and 5 SNRofpulsesandwedescribedaquickwaytodothat. 0 Although the motivation originally sprang up from GRB 0 10 20 30 40 50 60 70 80 90 timeprofiles,itsapplicabilityextendstoothersimilarfieldsof SNR high–energy astrophysics as well as solar X–ray flares, and, more in general, whenever the applicability requirements on Figure 7: Relation between estimated and true SNR for a sample of well the input time series are fulfilled. The search algorithm is separated(s>10)peaksthathavebeenidentifiedbyMEPSA. decoupledfromthemaskofmultipleexcessesbeingsearched. This property makes it particularlyflexible, so that users pos- 3.2.4. MEPSAproxyforthepeakSNR siblyinterestedineventsotherthanGRBscanquicklymodify SimilarlytopeakFWHMdiscussedinSect.3.2.3,MEPSA and optimise the mask of multiple excesses, disable existing doesnotprovideadirectestimateofthepeakSNR,tocalculate patterns and/or enable new ones, once these have been tested which one should integrate over the entire pulse profile or, through simulations or independent data sets. The highly– equivalently,overitsFWHMandcorrectitbyagivenfactor.As portableC codeis madepubliclyavailable so as to encourage itwasshowninSect.3.2.3,MEPSAcanonlyroughlyestimate abroadoptimisationthroughotherkindsofastrophysicaltime the FWHM, andso doesitforthe SNR, too. A SNRproxyis seriesofinterest. yielded by the estimated SNR, SNR , which is calculated by est MEPSAoverthedetectiontime∆t . det Acknowledgements We studied the relation between the estimated and true SNR’s for the sample of well separated simulated peaks and The author acknowledgessupportby PRIN MIUR project obtained that, regardlessof the scatter, the relation can be ap- on “Gamma Ray Bursts: from progenitors to physics of the proximatelylinearlydescribedas, promptemissionprocess”,P.I.F.Frontera(Prot.2009ERC3HT). The author is also grateful to Adriano Baldeschi for useful SNR 0.29SNR+3.9, (4) est ≈ discussionsabouttheMEPSAcharacterisation. asdisplayedbyFigure7. Practically,startingfromthevaluesobtainedbyMEPSAfor AppendixA. InformationprovidedbyMEPSA the SNR and ∆t of a givenpeak, onecould use Eq. (4) to est det estimatethetrueSNR,anduseittoroughlyestimateitsFWHM ThissectionpresentstheinformationprovidedbyMEPSA using the approximate relations of Sect. 3.2.3. This, in turn, for each peak candidate. Table A.3 shows an example. Field allowsoneto placethe peakin theseparability–FWHMplane namesandheaderlinearethesameaswhatisprintedtostan- showninleft–handpanelofFig.3andestimatetheefficiency– dardoutputbyMEPSA. correctedrateofsuchpeaksinthetimeseries. 1. Peak:ordinalnumberofthepeakcandidate; 2. RebF:rebinningfactorchosenbyMEPSAoftheoriginal 4. DiscussionandConclusions timeseriesforthepeakcandidate; 3. BinPhase: binningphase(from0toRebF 1)chosenby We presented MEPSA, multiple excess peak search algo- − MEPSA; rithm, whichsearchesforpeaksbyapplyinga maskofmulti– 4. PeakT:peaktime; excesspatternstoanevenlyspaced,background–subtracted(or 5. BinT: detection timescale, ∆t , which is given by the detrended)time series, which is thoughtto be affected by sta- det originaltime resolution of the time series multiplied by tisticaluncorrelatedGaussiannoise. Thisisoftenthecasealso RebF; for photon counting detectors operating in the high counting 6. PeakR:peakrateestimate(sameunitsastheoriginaltime regime. We compareditsperformanceagainstthepopularand series); widelyadoptedLFAaswellaswitha slightlymoreconserva- 7. EPeakR:erroronPeakR; tiveversionofthesame,undertwocomplementaryaspects:the falsepositiveandthetruepositiverates. IneithercaseMEPSA 8. SNR:SNRest; ismorereliable,showingalowerFPrate(.2 10 5bin 1)as 9. Criterium: triggeredpattern(Sect.2); − − × 8 TableA.3:InformationprovidedbyMEPSA.Eachlinereferstoeachpeakcandidate. Peak RebF BinPhase PeakT BinT PeakR EPeakR SNR Criterium Nadiac 1 35 34 -12.408 2.240 0.04405 0.00612 7.20 25 9 2 11 6 1.800 0.704 0.07064 0.01129 6.25 30 10 10. Nadiac:numberofadjacentbinsinvolvedinthetriggered C., Koshut, T. M., Lestrade, J. P., McCollough, M. L., Brainerd, J. J., patternidentifiedbyCriterium. Hakkila, J., Henze, W., Preece, R. D., Connaughton, V., Kippen, R. M., Mallozzi,R.S.,Fishman,G.J.,Richardson,G.A.,Sahi,M.,Jun.1999.The Bhat,P.N.,Briggs,M.S.,Connaughton, V.,Kouveliotou, C.,vanderHorst, FourthBATSEGamma-RayBurstCatalog(Revised).ApJS122,465–495. A.J.,Paciesas,W.,Meegan,C.A.,Bissaldi,E.,Burgess,M.,Chaplin,V., Quilligan, F.,McBreen, B.,Hanlon, L.,McBreen, S.,Hurley, K.J.,Watson, Diehl,R.,Fishman,G.,Fitzpatrick,G.,Foley,S.,Gibby,M.,Giles,M.M., D.,Apr.2002.Temporalpropertiesofgammarayburstsassignaturesofjets Goldstein,A.,Greiner,J.,Gruber,D.,Guiriec,S.,vonKienlin,A.,Kippen, fromthecentralengine.A&A385,377–398. M.,McBreen,S.,Preece,R.,Rau,A.,Tierney,D.,Wilson-Hodge,C.,Jan. Remillard,R.A.,McClintock,J.E.,Sep.2006.X-RayPropertiesofBlack-Hole 2012.TemporalDeconvolutionStudyofLongandShortGamma-RayBurst Binaries.ARAA44,49–92. LightCurves.ApJ744,141. Romano,P.,Krimm,H.A.,Palmer,D.M.,Ducci,L.,Esposito,P.,Vercellone, Charisi, M., Ma´rka, S., Bartos, I., Sep. 2014. Catalog of Isolated Emission S.,Evans,P.A.,Guidorzi,C.,Mangano,V.,Kennea,J.A.,Barthelmy,S.D., Episodes inGamma-ray Bursts From Fermi, Swiftand BATSE.in press, Burrows, D. N., Gehrels, N., Feb. 2014. The100-month Swift catalogue arXiv1409.2491. ofsupergiantfastX-raytransients.I.BATon-boardandtransientmonitor Drago,A.,Pagliara, G.,Aug.2007.QuiescentTimesinGamma-RayBursts: flares.A&A562,A2. HintsofaDormantInnerEngine.ApJ665,1227–1234. Sakamoto,T.,Barthelmy,S.D.,Baumgartner,W.H.,Cummings,J.R.,Feni- Feng,Y.X.,Li,T.P.,Chen,L.,Mar.1999.X-RayShotsofCygnusX-1.ApJ more,E.E.,Gehrels,N.,Krimm,H.A.,Markwardt,C.B.,Palmer,D.M., 514,373–382. Parsons,A.M.,Sato,G.,Stamatikos,M.,Tueller,J.,Ukwatta,T.N.,Zhang, Frontera, F., Guidorzi, C., Montanari, E., Rossi, F., Costa, E., Feroci, M., B.,Jul.2011.TheSecondSwiftBurstAlertTelescopeGamma-RayBurst Calura, F., Rapisarda, M., Amati, L., Carturan, D., Cinti, M. R., Fiume, Catalog.ApJS195,2. D.D.,Nicastro,L.,Orlandini,M.,Jan.2009.TheGamma-RayBurstCata- Sguera, V., Barlow, E.J.,Bird, A.J., Clark, D.J., Dean, A.J.,Hill, A.B., logObtainedwiththeGamma-RayBurstMonitorAboardBeppoSAX.ApJS Moran, L.,Shaw,S.E.,Willis, D.R.,Bazzano, A.,Ubertini, P.,Malizia, 180,192–223. A., Dec.2005.INTEGRALobservations ofrecurrent fastX-raytransient Greco,G.,Rosa,R.,Beskin,G.,Karpov,S.,Romano,L.,Guarnieri,A.,Bar- sources.A&A444,221–231. tolini,C.,Bedogni,R.,Sep.2011.EvidenceofDeterministicComponents intheApparentRandomnessofGRBs:CluesofaChaoticDynamic.Nature ScientificReports1. Horva´th, I., Bagoly, Z., Bala´zs, L. G., de Ugarte Postigo, A., Veres, P., Me´sza´ros, A., Apr. 2010. Detailed Classification of Swift ’s Gamma-ray Bursts.ApJ713,552–557. Kouveliotou, C., Meegan, C.A., Fishman, G.J.,Bhat, N.P.,Briggs, M.S., Koshut,T.M.,Paciesas,W.S.,Pendleton,G.N.,Aug.1993.Identification oftwoclassesofgamma-raybursts.ApJ413,L101–L104. Krimm, H. A., Barthelmy, S. D., Baumgartner, W. H., Beardmore, A. P., Cummings,J.R.,Fenimore,E.E.,Gehrels,N.,Markwardt,C.B.,Palmer, D. M., Parsons, A.M., Sakamoto, T., Sato, G., Stamatikos, M., Tueller, J.,Ukwatta,T.N.,2010.GRB100814A,Swift-BATrefinedanalysis.GRB CoordinatesNetwork11094,1. Kumar, P., Zhang, B., Oct. 2014. The Physics of Gamma-Ray Bursts and RelativisticJets.inpresstoPhys.Rep.,arXiv1410.0679. Levan,A.J.,Tanvir,N.R.,Starling,R.L.C.,Wiersema,K.,Page,K.L.,Perley, D.A., Schulze, S.,Wynn, G.A.,Chornock, R.,Hjorth, J.,Cenko, S.B., Fruchter,A.S.,O’Brien,P.T.,Brown,G.C.,Tunnicliffe,R.L.,Malesani, D.,Jakobsson,P.,Watson,D.,Berger,E.,Bersier,D.,Cobb,B.E.,Covino, S.,Cucchiara,A.,deUgartePostigo,A.,Fox,D.B.,Gal-Yam,A.,Goldoni, P.,Gorosabel,J.,Kaper,L.,Kru¨hler,T.,Karjalainen,R.,Osborne,J.P.,Pian, E.,Sa´nchez-Ram´ırez,R.,Schmidt,B.,Skillen,I.,Tagliaferri,G.,Tho¨ne,C., Vaduvescu, O., Wijers, R. A. M. J., Zauderer, B. A., Jan. 2014. A New PopulationofUltra-longDurationGamma-RayBursts.ApJ781,13. Li,H.,Fenimore,E.E.,Oct.1996.Log-normalDistributionsinGamma-Ray BurstTimeHistories.ApJ469,L115. Liu,C.Z.,Li,T.P.,Aug.2004.X-RaySpectralVariabilityinCygnusX-1.ApJ 611,1084–1090. Margutti,R.,Guidorzi,C.,Chincarini,G.,2011.VariabilityPropertiesofSwift- Bat Gamma-Ray Bursts. International Journal of Modern Physics D 20, 1969–1973. Mereghetti, S., Jul. 2008. The strongest cosmic magnets: soft gamma-ray repeatersandanomalousX-raypulsars.A&ARv15,225–287. Nakar, E.,Piran,T.,Mar.2002.Time-scalesinlonggamma-raybursts.MN- RAS331,40–44. Norris, J.P.,Nemiroff, R.J., Bonnell, J.T.,Scargle, J.D.,Kouveliotou, C., Paciesas, W.S.,Meegan, C.A., Fishman, G.J.,Mar. 1996.Attributes of PulsesinLongBrightGamma-RayBursts.ApJ459,393. Paciesas,W.S.,Meegan,C.A.,Pendleton,G.N.,Briggs,M.S.,Kouveliotou, 9