MNRAS000,1–4(0000) Preprint7January2016 CompiledusingMNRASLATEXstylefilev3.0 Detection of Periodicity Based on Independence Tests – II. Improved Serial Independence Measure Shay Zucker⋆ Dept.ofGeosciences,RaymondandBeverlySacklerFacultyofExactSciences,Tel-AvivUniversity,TelAviv6997801,Israel 6 7January2016 1 0 2 ABSTRACT n a We introducean improvementto a periodicitymetric we have introducedin a previous J paper.WeimproveontheHoeffding-testperiodicitymetric,usingtheBlum-Kiefer-Rosenblatt 6 (BKR) test. Besides a consistent improvement over the Hoeffding-test approach, the BKR approachturnsoutto performsuperblywhenapplied to veryshorttime series of sawtooth- ] likeshapes.TheexpectedastronomicalimplicationsaremuchmoredetectionsofRR-Lyrae M starsandCepheidsinsparsephotometricdatabases,andofeccentricKeplerianradial-velocity I (RV)curves,suchasthoseofexoplanetsinRVsurveys. . h Key words: methods:data analysis – methods:statistical – binaries:spectroscopic– stars: p individual:HIP101453–stars:variables:RRLyrae–stars:variables:Cepheids - o r t s a [ 1 INTRODUCTION 2 BLUM-KIEFER-ROSENBLATTTEST 1 Inarecentpaper(Zucker2015,;hereafterPaperI)wehaveintro- ThebestperformingmethodinPaperIwasbasedontheHoeffd- v duced anew non-parametric approach to thedetection of period- ing test. Wassily Hoeffding first proposed it in 1948 (Hoeffding 5 icities in sparse datasets. The new approach follows the logic of 1948)asatestofindependencebetweentworandomvariables.Es- 2 string-lengthtechniques(e.g.Lafler&Kinman1965;Clarke2002) sentially it estimates a measure of deviation of the joint empiri- 2 and quantifies the dependence between consecutive phase-folded caldistributionfunctionfromadistributionthatassumesindepen- 1 samples,foreverytrialperiod.InPaperIwehaveshownthatusu- dence. Themeasureofdeviation that Hoeffdingusedwastheso- 0 ally the classical string-length techniques effectively test solely called Cramér–von Mises criterion for distance between distribu- . 1 forlineardependence betweenconsecutivesamples.Ontheother tions(Cramér1928;vonMises1928). 0 hand, the Hoeffding-test approach we have presented there, tests LetusdenotebyG1andG2thecumulativedistributionfunc- 6 1 for general dependencies, not necessarily linear. PaperI showed tionsofthetworandomvariables,andbyG12 theirjointcumula- thatfortwokindsofperiodicsignals(sawtoothsignalandeccen- tivedistributionfunction.Then,independenceofthetwovariables : v tricspectroscopicbinary(SB)radial-velocity(RV)curve),ourpro- would mean G12=G1G2. Applying Cramér–von Mises criterion i posednewapproach performedbetterthantheconventional tech- fordistancebetweendistributions,Hoeffdingdefinedhisnamesake X niques. statisticDby: r a weemInbsaprikreeddobnyathweidseurccsetusdsfyultoscimomuleatuiopnwsipthrenseewnteadndinimPparpoevreId, D=Z (G12−G1G2)2dG12. (1) periodicitymetrics,similarlybasedondependencemeasures.The InestimatingDusingtheempriricaldata,weusetheempiri- currentLetterrepresentsafirststepinthatdirection,astepwehave caldistributionfunctions,determinedonlybytheobservedvalues. alreadyalludedtointheDiscussionofPaperI. ThisissomewhatreminiscentoftheKolmogorov–Smirnovphilos- Section2describesthedetailsofthemodificationwepropose ophy,whichispopularamongastronomers(e.g.Babu&Feigelson totheHoeffding test,Section3presents thesimulationswehave 2006).Theabovedefinitioneventuallyresultsintheformulaepre- performed in order to test its performance, and in Section 4 we sentedinPaperI. show a test of our new metric on real-life data. In Section 5 we Blum,Kiefer&Rosenblatt (1961) introduced a new version discussourfindings. oftheHoeffdingtest.Theyshowedthatthetwotestswereequiva- lentinlargesamples,butthenewtestwaseasiertocompute,and also more naturally amenable to generalization to more than two variables.Thefundamentaldefinitionoftheirstatisticis: B=Z (G12−G1G2)2dG1dG2. (2) ⋆ E-mail:[email protected] Whilethedifferenceseemstobeminuteandmaybeeveninsignifi- (cid:13)c 0000TheAuthors 2 ShayZucker cant,thechangeintheresultingcomputingformulaisnotnegligi- Sinusoid Almost sinusoidal 1 1 fxboilel(d.ii=nFgo.1ll,Io.n.w.,ioNnrgd)ewPrhatpeoreermtIah,keleeintsduuersxedioerunerofltceeacltctshutelhaetpioohnradsseer-afraoefltdneerodtthadefafpteahcatbesdye detection fraction0.5 detection fraction0.5 0 0 bythearbitraryzero-phasechoice,wealsodefinexN+1≡x1.Let 10 20 30 40 50 10 20 30 40 50 N N us further denote by Ri the phase-folded rank values, such that Sawtooth Pulse ’Rb(bxoiiNtv=h,axrx1Niaj+tm≤e1e)rax=aninsak(nxt’hdNac,xtix,jx1+ai)1sias≤cthctheoxeui+nns1utms.maNfbloloerertsetthotevfhacaplytuacetilhr.isecLw(eexxtrijaus,ptxseajna+rocl1sue)onodfdfoeartfinhnwdeehrpetiahcniher- detection fraction0.15 detection fraction0.15 ders the whole procedure independent of the arbitrary choice of 010 20 30 40 50 010 20 30 40 50 phase. N N Eclipsing binary Spectroscopic binary B=NN−o4wi(cid:229)=Nw1e(Ncacni−deRfiiRnei+o1u)r2d,ependencemeasurebytheformul(a3:) detection fraction0.15 detection fraction0.15 0 0 which can easily be derived from eq. (5.2) in Blumetal. (1961). 10 20 30 40 50 10 20 30 40 50 N N Theaboveexpressionisclearlymuchsimplerthantheparallelex- pressionsineqs.9–12inPaperI. Figure1.Detectionfractionasafunctionofsamplesizefortimeserieswith SNRof3.Thedetectionfractionisestimatedbasedon100simulatedlight curves. Legend: empty circles, dashed line –Generalized Lomb–Scargle 3 PERFORMANCETESTING periodogram; empty squares, dashed line – von-Neumann Ratio; empty upward-pointingtriangles,dashedline–Hoeffdingtest;filledcircles,solid Followingthesamepath asinPaperI, weperformed simulations line–Blum-Kiefer-Rosenblatttest. inwhichwerandomly drew asparsesetofsamplingtimes,from a total baseline spanning 1000 timeunits (’days’). Then we used thosetimestosamplesomeperiodicfunctionwithaperiodoftwo Sinusoid Almost sinusoidal days,andaddedwhiteGaussiannoisewithaprescribedsignal-to- 1 1 wnwoaeivsteeeU,srteanectldilioikipen(sSitPnhNagepR-abe)pri.npIWa:rroseyaintcleuihsgsthwoetidedctafuholr,elvlaesolawmamneodedstseinsecitcnPeouanfsptosreiiirxcdIaSp,le,BwrsieaRowhdVaitcovcoeufturhdvn,eeccp.tiiudolensdes detection fraction0.05 detection fraction0.05 10 20 30 40 50 10 20 30 40 50 this time to use a much simpler and intuitive performance mea- N N Sawtooth Pulse sure:ineachtestedconfigurationofsignalshape,N andSNR,we 1 1 swtahimafersep’dqalyteutteaceniocnctueyinodtngeerdxfidratachtcthetliyaontnuisn’mp.atbWhneenreecodcofartlsrhceiemucltruaaktlneandtgoieowtnh1nse0pi−pne4erw–riio1ohdddi,caichtyhi−ttuyh1se,mowbbeeitttsrahtiicnsssacteiofnporgesr detection fraction0.05 detection fraction0.05 10 20 30 40 50 10 20 30 40 50 of 10−4day−1.Thus, counting thefractionof caseswiththecor- N N Eclipsing binary Spectroscopic binary rectperiodactualymeantafrequencyerrorwhichwassmallerthan 1 1 1Rm0oe−stre4inWcdbaelwya−tet1c-tho.eamsdtpa(inrBetKrdoRdu)thcpeeedripionedrifPcoiartpmyearmnIc,eetarincdoftaolsttohhiestoHBtohleeufmftdw-iKnogie-’fttereasr--t detection fraction0.05 detection fraction0.05 10 20 30 40 50 10 20 30 40 50 ditional’ techniques of Generalized Lomb-Scargle (Lomb 1976; N N Scargle 1982; Zechmeister&Kürster 2009) and von-Neumann ratio (vonNeumannetal. 1941) as a representative of the Figure2.Detection fraction as afunction ofsamplesize fortimeseries string-lengthtechniques(Clarke2002;Lafler&Kinman1965). withSNRof100.Thedetectionfractionisestimatedbasedon100simu- Fig.1examinesthedependenceofthedetectionperformance latedlightcurves.Legend:emptycircles,dashedline–GeneralizedLomb– on the number of samples in the time series, for low-SNR. We Scargle periodogram; emptysquares, dashed line –von-Neumann Ratio; held the SNR fixed at 3 while we varied the sample size N. The emptyupward-pointing triangles, dashedline–Hoeffding test;filledcir- main feature apparent from examining the Figure isthat the new cles,solidline–Blum-Kiefer-Rosenblatttest. BKRapproachconstitutesanimprovementovertheHoeffding-test approach weintroduced inPaperI.Incaseswherethetraditional techniques performed better (e.g., pulse-wave signal shape), they and eccentric SB RV curve). However, specifically in those two usuallyperformedalsobetterthanBKR,andviceversa. cases,anothertrendemerged:theBKRmethodseemedtohavean Fig.2presentsthesametestforthecaseofahighSNR.We exceedinglybetterperformancefordatasetswithveryfewsamples fixedtheSNRatavalueof100andrepeatedthesameexercise.In (N=10). thissituationtheBKRwasagainimprovingovertheHoeffding-test Inordertomakesurethisresultwasnotspuriousorevener- approach, whichalsomeant itperformed significantlybetter than roneous,wehavedecidedtoexaminethisregimemoreclosely.We the traditional approaches in the cases of sawtooth-shaped signal repeated the simulations with SNR=100 for a finer grid of N, MNRAS000,1–4(0000) ImprovedNon-ParametricPeriodicityDetection 3 Sinusoid Almost sinusoidal Original Time Series Phase−Folded Time Series 1 1 4 4 detection fraction0.5 detection fraction0.5 123 123 0 10 15 20 0 10 15 20 00 200 400 600 800 1000 00 0.5 1 1.5 2 N N Sawtooth Pulse Generalized Lomb−Scargle Periodogram von−Neumann Ratio 1 0.8 1 30 detection fraction0.5 detection fraction000...246 0.5 1200 0 0 10 15 20 10 15 20 0 0 N N 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 Eclipsing binary Spectroscopic binary Hoeffding Test Blum−Kiefer−Rosenblatt detection fraction0000....2468 detection fraction0.15 000...000123 00..0001..5512 0 0 10 15 20 10 15 20 0 0 N N 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 Figure3.Detection fraction asafunction ofsample sizefortimeseries Figure4.Anexampleoftheresultsofapplyingalltheexaminedperiodic- withSNRof100,focusingonsmallN.Thedetectionfractionisestimated itydetectionmethodstoasawtoothsimulatedtime-series,with9samples, basedon100simulatedlightcurves.Legend:emptycircles,dashedline– andaSNRof100.Theuppertwopanelsshowthetimeseriesanditsphase- GeneralizedLomb–Scargleperiodogram;emptysquares,dashedline–von- foldedversion, usingthecorrect period. Theotherpanels showtheperi- NeumannRatio;emptyupward-pointingtriangles,dashedline–Hoeffding odicitymetricscalculatedforthistimeseries,withselfexplanatorytitles. test;filledcircles,solidline–Blum-Kiefer-Rosenblatttest. Notethepoorperformanceofthefirstthreeperiodicitymetrics,compared tothedetectionbytheBKRtechnique. namely,forallNfrom7to20.Fig.3,whichfocusesonthisrange, showstheveryconvincingresultofagradualincreaseoftheper- foldingusingtheresultingperiod,theBKRfunctionandtheGLS formance inmost cases. Specificallyinthecases of thesawtooth periodogramwecalculatedforcomparison.Theprominenceofthe signalandtheeccentricSBRVcurve,Thepaceofthatincreasefor BKR peak at a frequency of 2.5696d−1 is evident, and indeed theBKRmethodismuchfasterthanthatoftheothertechniques, thecorrespondingSDEvalueis17.5.Thephase-foldedlightcurve includingtheHoeffding-testapproach.Itseemedthatalreadywhen demonstratesanobviousperiodicity.Ontheotherhand,TheGLS therewere9(!)samples,theBKRapproachhadverygoodchances periodogramshowsnohintofastatisticallysignificantperiodicity todetecttheperiodicity. detection. Figs.4,5and6showselectedconcreteexamplesofcaseswith To complete the picture, this star isa known RR-Lyrae star, asawtoothsignal shapeandonly 9samples, inwhichtheperfor- withaperiodof0.38918702d(Samusetal.2013),whichisconsis- mance of BKR was definitely superior over the three other tech- tentwithourresult–0.3891656d.TheHipparcoscataloguequotes niqueswetested.Thosecaseswereindeedthemajorityofthesim- the known period as taken from literature, and adds a comment ulatedcases. aboutthescarcityofthedata,whichcastsdoubtaboutthenatureof theperiodicity. Therefore, themaincataloguedoesnot quoteany period.WeshowherethatbyusingournewBKRapproach,wecan assignahighdegreeofcredibilitytotheperiodicity,evenwiththe 4 REAL-LIFETEST veryscarceHipparcosdata. We set out to test our new technique in a real-life situation that couldpotentiallyemphasizeitsadvantagesandallowthemtoma- terialize. To this end we chose to apply it to short Hipparcos 5 DISCUSSION lightcurves.OursampleconsistedofHipparcostargetswithatmost 30samplesintheirHipparcosEpochPhotometryAnnexentry.We This Letter presents an improvement of the serial Hoeffding-test consideredonlysampleswhichwerefullyacceptedbyatleastone periodicitymetricwehavepresentedinPaperI,basedontheBlum- of Hipparcos’ FAST and NDAC consortia (ESA 1997). In total Kiefer-RosenblattmodificationoftheoriginalHoeffdingtest.This therewere51lightcurvesmeetingthosecriteria. new periodicity metric consistently outperforms the Hoeffding- Wescannedtheselectedlightcurvesforperiodicityusingthe test metric. On top of it, it seems to perform superbly better in BKRtest,onafrequencygridrangingfrom10−4to12d−1(follow- sawtooth-like signal shapes, when thenumber of samples isvery ingKoen&Eyer(2002)),withstepsof10−4d−1.Wesearchedfor smallandtheSNRishigh.Thisresultisinlinewiththestatement targetswhose peak intheBKRfunction wassignificantlypromi- ofBlumetal.(1961),thattheirtestisasymptoticallyequivalentto nent. We therefore used the SDE (Signal Detection Efficiency) theHoeffdingtestforlargesamples. statistic originally used by Alcocketal. (2000) and Kovácsetal. This advantage might prove very important for radial- (2002). We chose to single out targets whose SDE statistic was velocity surveys searching for exoplanets. The detection of high- higher than15. Only one targetpassed thishurdle –HIP101453 eccentricity Keplerian RV curves is notoriously difficult when (alsoknownasCHAql). basedonasmallnumberofsamples(Cumming2004).Anothersit- Fig.7showsthelightcurveofHIP101453,aswellasitsphase uationinwhichthisfeaturewillprovevaluableisthedetectionof MNRAS000,1–4(0000) 4 ShayZucker Original Time Series Phase−Folded Time Series Original Time Series Phase−Folded Time Series 4 4 4 4 3 3 3 3 2 2 2 2 1 1 1 1 0 0 0 0 0 200 400 600 800 1000 0 0.5 1 1.5 2 0 200 400 600 800 1000 0 0.5 1 1.5 2 Generalized Lomb−Scargle Periodogram von−Neumann Ratio Generalized Lomb−Scargle Periodogram von−Neumann Ratio 1 30 1 40 30 20 0.5 0.5 20 10 10 0 0 0 0 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 Hoeffding Test Blum−Kiefer−Rosenblatt Hoeffding Test Blum−Kiefer−Rosenblatt 0.03 0.2 0.04 0.2 0.15 0.03 0.15 0.02 0.1 0.02 0.1 0.01 0.05 0.01 0.05 0 0 0 0 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 Figure5.Anotherexampleoftheresultsofapplyingalltheexaminedperi- Figure6.Anotherexampleoftheresultsofapplyingalltheexaminedperi- odicitydetectionmethodstoasawtoothsimulatedtime-series,with9sam- odicitydetectionmethodstoasawtoothsimulatedtime-series,with9sam- ples,andaSNRof100.Theuppertwopanels showthetimeseries and ples,andaSNRof100.Theuppertwopanels showthetimeseries and itsphase-folded version,usingthecorrectperiod. Theotherpanels show itsphase-folded version,usingthecorrectperiod. Theotherpanels show theperiodicitymetricscalculatedforthistimeseries,withselfexplanatory theperiodicitymetricscalculatedforthistimeseries,withselfexplanatory titles.Notethepoorperformanceofthefirstthreeperiodicitymetrics,com- titles.Notethepoorperformanceofthefirstthreeperiodicitymetrics,com- paredtothedetectionbytheBKRtechnique. paredtothedetectionbytheBKRtechnique. RR-LyraestarsandCepheids(whosesignalshapesarealsoessen- LombN.R.1976,Ap&SS,39,447 tiallysawtooth-likeshapes)insparsedatasetssuchasthoseofHip- SamusN.N.,DurlevichO.V.,Goranskij,V.P.,Kazarovets,E.V.,Kireeva, parcosandGaia(Eyeretal.2012).WehaveprovidedinSection4 N.N.,Pastukhova,E.N.,Zharova,A.V.2013,GeneralCatalogofVari- ademonstrationofthispotentialusingHipparcosdata,specifically ableStars,VizieronlinedatacatalogB/gcvs ScargleJ.D.1982,ApJ,263,835 forthecaseofHIP101453. von Mises R. 1928, Wahrscheinlichkeit, Statistik und Wahrheit, Julius Wecontinueourinvestigationofharnessingthepowerofnon- Springer,Vienna parametricindependencemeasuresforthepurposeofdetectingpe- vonNeumannJ.,KentR.H.,BellinsonH.R.,HartB.I.1941,Ann.Math. riodicityinextremecircumstancesofeitherpoorSNR,smallsam- Stat.,12,153 plesizes,ornon-sinusoidalsignalshapes.Inthemeanwhilewealso Zechmeister,M.,&Kürster,M.2009,A&A,496,577 applyournewlydevelopedapproachestoexistingdatasets,bothfor ZuckerS.2015,MNRAS,449,2723 thesakeoftesting,butalsofordetectingpreviouslymissedperiodic variables.Topromotefurtherresearchandtestingofthisperiodic- ThispaperhasbeentypesetfromaTEX/LATEXfilepreparedbytheauthor. ity metric by the community, we make it available online, in the formofaMATLABfunction1. REFERENCES AlcockC.etal.2000,ApJ,542,257 BabuG.J.,Feigelson,E.D.2006inGabrielC.,Arviset,C.,PonzD.,En- rique S.,eds, ASP Conf. Ser. Vol. 351, Astronomical Data Analysis SoftwareandSystemsXV.Astron.Soc.Pac.,SanFrancisco,p.127 BlumJ.R.,KieferJ.,RosenblattM.1961,Ann.Math.Stat.,32,485 ClarkeD.2002,A&A,386,763 CramérH.1928,Scand.Actuar.J.,11,13 Cumming,A.2004,MNRAS,354,1165 ESA1997,TheHipparcosandTychoCatalogues,ESASP-1200 Eyer,L.etal.2012,Ap&SS,341,207 HoeffdingW.1948,Ann.Math.Stat.,19,293 Koen,C.,&Eyer,L.2002,MNRAS,331,45 KovácsG.,ZuckerS.,MazehT.2002,A&A,391,369 LaflerJ.,&Kinman,T.D.1965,ApJS,11,216 1 TheURLfordownloadingaMATLABcodetocalculatetheBKRperi- odicitymetricishttp://www.tau.ac.il/˜shayz/BKR.m MNRAS000,1–4(0000) ImprovedNon-ParametricPeriodicityDetection 5 12.5 12.5 13 13 ag]13.5 13.5 m p [ 14 14 H 14.5 14.5 15 15 8000 8200 8400 8600 8800 −0.4 −0.2 0 0.2 0.4 0.6 0.8 BJD−2 440 000 phase−folded time [days] 0.5 0.8 0.4 0.6 0.3 B p 0.4 0.2 0.1 0.2 0 0 0 2 4 6 8 10 12 0 2 4 6 8 10 12 Frequency [d−1] Frequency [d−1] Figure 7. Detecting the periodicity in the Hipparcos lightcurve of HIP101453.Upperleft:theoriginallightcurve.Upperright:Thelightcurve phase-folded using aperiod of0.3891656d.Theemptycircles represent copiesoftheoriginaldatasetshiftedbackwardsandforwardsbyoneperiod inordertobettervisualizetheperiodicity.Lowerleft:TheBKRperiodicity metric.Notethesharppeakattheknownfrequency.Lowerright:General- izedLomb-Scargleperiodogram.Notetheabsenceofanysignificantpeak. MNRAS000,1–4(0000)