Table Of ContentMNRAS000,1–4(0000) Preprint7January2016 CompiledusingMNRASLATEXstylefilev3.0
Detection of Periodicity Based on Independence Tests – II. Improved
Serial Independence Measure
Shay Zucker⋆
Dept.ofGeosciences,RaymondandBeverlySacklerFacultyofExactSciences,Tel-AvivUniversity,TelAviv6997801,Israel
6 7January2016
1
0
2
ABSTRACT
n
a We introducean improvementto a periodicitymetric we have introducedin a previous
J paper.WeimproveontheHoeffding-testperiodicitymetric,usingtheBlum-Kiefer-Rosenblatt
6 (BKR) test. Besides a consistent improvement over the Hoeffding-test approach, the BKR
approachturnsoutto performsuperblywhenapplied to veryshorttime series of sawtooth-
] likeshapes.TheexpectedastronomicalimplicationsaremuchmoredetectionsofRR-Lyrae
M
starsandCepheidsinsparsephotometricdatabases,andofeccentricKeplerianradial-velocity
I (RV)curves,suchasthoseofexoplanetsinRVsurveys.
.
h
Key words: methods:data analysis – methods:statistical – binaries:spectroscopic– stars:
p
individual:HIP101453–stars:variables:RRLyrae–stars:variables:Cepheids
-
o
r
t
s
a
[ 1 INTRODUCTION 2 BLUM-KIEFER-ROSENBLATTTEST
1 Inarecentpaper(Zucker2015,;hereafterPaperI)wehaveintro- ThebestperformingmethodinPaperIwasbasedontheHoeffd-
v duced anew non-parametric approach to thedetection of period- ing test. Wassily Hoeffding first proposed it in 1948 (Hoeffding
5 icities in sparse datasets. The new approach follows the logic of 1948)asatestofindependencebetweentworandomvariables.Es-
2
string-lengthtechniques(e.g.Lafler&Kinman1965;Clarke2002) sentially it estimates a measure of deviation of the joint empiri-
2
and quantifies the dependence between consecutive phase-folded caldistributionfunctionfromadistributionthatassumesindepen-
1
samples,foreverytrialperiod.InPaperIwehaveshownthatusu- dence. Themeasureofdeviation that Hoeffdingusedwastheso-
0
ally the classical string-length techniques effectively test solely called Cramér–von Mises criterion for distance between distribu-
.
1 forlineardependence betweenconsecutivesamples.Ontheother tions(Cramér1928;vonMises1928).
0 hand, the Hoeffding-test approach we have presented there, tests LetusdenotebyG1andG2thecumulativedistributionfunc-
6
1 for general dependencies, not necessarily linear. PaperI showed tionsofthetworandomvariables,andbyG12 theirjointcumula-
thatfortwokindsofperiodicsignals(sawtoothsignalandeccen- tivedistributionfunction.Then,independenceofthetwovariables
:
v tricspectroscopicbinary(SB)radial-velocity(RV)curve),ourpro- would mean G12=G1G2. Applying Cramér–von Mises criterion
i posednewapproach performedbetterthantheconventional tech- fordistancebetweendistributions,Hoeffdingdefinedhisnamesake
X
niques. statisticDby:
r
a weemInbsaprikreeddobnyathweidseurccsetusdsfyultoscimomuleatuiopnwsipthrenseewnteadndinimPparpoevreId, D=Z (G12−G1G2)2dG12. (1)
periodicitymetrics,similarlybasedondependencemeasures.The
InestimatingDusingtheempriricaldata,weusetheempiri-
currentLetterrepresentsafirststepinthatdirection,astepwehave
caldistributionfunctions,determinedonlybytheobservedvalues.
alreadyalludedtointheDiscussionofPaperI.
ThisissomewhatreminiscentoftheKolmogorov–Smirnovphilos-
Section2describesthedetailsofthemodificationwepropose
ophy,whichispopularamongastronomers(e.g.Babu&Feigelson
totheHoeffding test,Section3presents thesimulationswehave
2006).Theabovedefinitioneventuallyresultsintheformulaepre-
performed in order to test its performance, and in Section 4 we
sentedinPaperI.
show a test of our new metric on real-life data. In Section 5 we
Blum,Kiefer&Rosenblatt (1961) introduced a new version
discussourfindings.
oftheHoeffdingtest.Theyshowedthatthetwotestswereequiva-
lentinlargesamples,butthenewtestwaseasiertocompute,and
also more naturally amenable to generalization to more than two
variables.Thefundamentaldefinitionoftheirstatisticis:
B=Z (G12−G1G2)2dG1dG2. (2)
⋆ E-mail:shayz@post.tau.ac.il Whilethedifferenceseemstobeminuteandmaybeeveninsignifi-
(cid:13)c 0000TheAuthors
2 ShayZucker
cant,thechangeintheresultingcomputingformulaisnotnegligi- Sinusoid Almost sinusoidal
1 1
fxboilel(d.ii=nFgo.1ll,Io.n.w.,ioNnrgd)ewPrhatpeoreermtIah,keleeintsduuersxedioerunerofltceeacltctshutelhaetpioohnradsseer-afraoefltdneerodtthadefafpteahcatbesdye detection fraction0.5 detection fraction0.5
0 0
bythearbitraryzero-phasechoice,wealsodefinexN+1≡x1.Let 10 20 30 40 50 10 20 30 40 50
N N
us further denote by Ri the phase-folded rank values, such that Sawtooth Pulse
’Rb(bxoiiNtv=h,axrx1Niaj+tm≤e1e)rax=aninsak(nxt’hdNac,xtix,jx1+ai)1sias≤cthctheoxeui+nns1utms.maNfbloloerertsetthotevfhacaplytuacetilhr.isecLw(eexxtrijaus,ptxseajna+rocl1sue)onodfdfoeartfinhnwdeehrpetiahcniher- detection fraction0.15 detection fraction0.15
ders the whole procedure independent of the arbitrary choice of 010 20 30 40 50 010 20 30 40 50
phase. N N
Eclipsing binary Spectroscopic binary
B=NN−o4wi(cid:229)=Nw1e(Ncacni−deRfiiRnei+o1u)r2d,ependencemeasurebytheformul(a3:) detection fraction0.15 detection fraction0.15
0 0
which can easily be derived from eq. (5.2) in Blumetal. (1961). 10 20 30 40 50 10 20 30 40 50
N N
Theaboveexpressionisclearlymuchsimplerthantheparallelex-
pressionsineqs.9–12inPaperI.
Figure1.Detectionfractionasafunctionofsamplesizefortimeserieswith
SNRof3.Thedetectionfractionisestimatedbasedon100simulatedlight
curves. Legend: empty circles, dashed line –Generalized Lomb–Scargle
3 PERFORMANCETESTING
periodogram; empty squares, dashed line – von-Neumann Ratio; empty
upward-pointingtriangles,dashedline–Hoeffdingtest;filledcircles,solid
Followingthesamepath asinPaperI, weperformed simulations
line–Blum-Kiefer-Rosenblatttest.
inwhichwerandomly drew asparsesetofsamplingtimes,from
a total baseline spanning 1000 timeunits (’days’). Then we used
thosetimestosamplesomeperiodicfunctionwithaperiodoftwo
Sinusoid Almost sinusoidal
days,andaddedwhiteGaussiannoisewithaprescribedsignal-to- 1 1
wnwoaeivsteeeU,srteanectldilioikipen(sSitPnhNagepR-abe)pri.npIWa:rroseyaintcleuihsgsthwoetidedctafuholr,elvlaesolawmamneodedstseinsecitcnPeouanfsptosreiiirxcdIaSp,le,BwrsieaRowhdVaitcovcoeufturhdvn,eeccp.tiiudolensdes detection fraction0.05 detection fraction0.05
10 20 30 40 50 10 20 30 40 50
this time to use a much simpler and intuitive performance mea- N N
Sawtooth Pulse
sure:ineachtestedconfigurationofsignalshape,N andSNR,we 1 1
swtahimafersep’dqalyteutteaceniocnctueyinodtngeerdxfidratachtcthetliyaontnuisn’mp.atbWhneenreecodcofartlsrhceiemucltruaaktlneandtgoieowtnh1nse0pi−pne4erw–riio1ohdddi,caichtyhi−ttuyh1se,mowbbeeitttsrahtiicnsssacteiofnporgesr detection fraction0.05 detection fraction0.05
10 20 30 40 50 10 20 30 40 50
of 10−4day−1.Thus, counting thefractionof caseswiththecor- N N
Eclipsing binary Spectroscopic binary
rectperiodactualymeantafrequencyerrorwhichwassmallerthan 1 1
1Rm0oe−stre4inWcdbaelwya−tet1c-tho.eamsdtpa(inrBetKrdoRdu)thcpeeedripionedrifPcoiartpmyearmnIc,eetarincdoftaolsttohhiestoHBtohleeufmftdw-iKnogie-’fttereasr--t detection fraction0.05 detection fraction0.05
10 20 30 40 50 10 20 30 40 50
ditional’ techniques of Generalized Lomb-Scargle (Lomb 1976; N N
Scargle 1982; Zechmeister&Kürster 2009) and von-Neumann
ratio (vonNeumannetal. 1941) as a representative of the
Figure2.Detection fraction as afunction ofsamplesize fortimeseries
string-lengthtechniques(Clarke2002;Lafler&Kinman1965).
withSNRof100.Thedetectionfractionisestimatedbasedon100simu-
Fig.1examinesthedependenceofthedetectionperformance
latedlightcurves.Legend:emptycircles,dashedline–GeneralizedLomb–
on the number of samples in the time series, for low-SNR. We Scargle periodogram; emptysquares, dashed line –von-Neumann Ratio;
held the SNR fixed at 3 while we varied the sample size N. The emptyupward-pointing triangles, dashedline–Hoeffding test;filledcir-
main feature apparent from examining the Figure isthat the new cles,solidline–Blum-Kiefer-Rosenblatttest.
BKRapproachconstitutesanimprovementovertheHoeffding-test
approach weintroduced inPaperI.Incaseswherethetraditional
techniques performed better (e.g., pulse-wave signal shape), they and eccentric SB RV curve). However, specifically in those two
usuallyperformedalsobetterthanBKR,andviceversa. cases,anothertrendemerged:theBKRmethodseemedtohavean
Fig.2presentsthesametestforthecaseofahighSNR.We exceedinglybetterperformancefordatasetswithveryfewsamples
fixedtheSNRatavalueof100andrepeatedthesameexercise.In (N=10).
thissituationtheBKRwasagainimprovingovertheHoeffding-test Inordertomakesurethisresultwasnotspuriousorevener-
approach, whichalsomeant itperformed significantlybetter than roneous,wehavedecidedtoexaminethisregimemoreclosely.We
the traditional approaches in the cases of sawtooth-shaped signal repeated the simulations with SNR=100 for a finer grid of N,
MNRAS000,1–4(0000)
ImprovedNon-ParametricPeriodicityDetection 3
Sinusoid Almost sinusoidal Original Time Series Phase−Folded Time Series
1 1 4 4
detection fraction0.5 detection fraction0.5 123 123
0 10 15 20 0 10 15 20 00 200 400 600 800 1000 00 0.5 1 1.5 2
N N
Sawtooth Pulse Generalized Lomb−Scargle Periodogram von−Neumann Ratio
1 0.8 1 30
detection fraction0.5 detection fraction000...246 0.5 1200
0 0
10 15 20 10 15 20 0 0
N N 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1
Eclipsing binary Spectroscopic binary Hoeffding Test Blum−Kiefer−Rosenblatt
detection fraction0000....2468 detection fraction0.15 000...000123 00..0001..5512
0 0
10 15 20 10 15 20 0 0
N N 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1
Figure3.Detection fraction asafunction ofsample sizefortimeseries Figure4.Anexampleoftheresultsofapplyingalltheexaminedperiodic-
withSNRof100,focusingonsmallN.Thedetectionfractionisestimated itydetectionmethodstoasawtoothsimulatedtime-series,with9samples,
basedon100simulatedlightcurves.Legend:emptycircles,dashedline– andaSNRof100.Theuppertwopanelsshowthetimeseriesanditsphase-
GeneralizedLomb–Scargleperiodogram;emptysquares,dashedline–von- foldedversion, usingthecorrect period. Theotherpanels showtheperi-
NeumannRatio;emptyupward-pointingtriangles,dashedline–Hoeffding odicitymetricscalculatedforthistimeseries,withselfexplanatorytitles.
test;filledcircles,solidline–Blum-Kiefer-Rosenblatttest. Notethepoorperformanceofthefirstthreeperiodicitymetrics,compared
tothedetectionbytheBKRtechnique.
namely,forallNfrom7to20.Fig.3,whichfocusesonthisrange,
showstheveryconvincingresultofagradualincreaseoftheper- foldingusingtheresultingperiod,theBKRfunctionandtheGLS
formance inmost cases. Specificallyinthecases of thesawtooth periodogramwecalculatedforcomparison.Theprominenceofthe
signalandtheeccentricSBRVcurve,Thepaceofthatincreasefor BKR peak at a frequency of 2.5696d−1 is evident, and indeed
theBKRmethodismuchfasterthanthatoftheothertechniques, thecorrespondingSDEvalueis17.5.Thephase-foldedlightcurve
includingtheHoeffding-testapproach.Itseemedthatalreadywhen demonstratesanobviousperiodicity.Ontheotherhand,TheGLS
therewere9(!)samples,theBKRapproachhadverygoodchances periodogramshowsnohintofastatisticallysignificantperiodicity
todetecttheperiodicity. detection.
Figs.4,5and6showselectedconcreteexamplesofcaseswith To complete the picture, this star isa known RR-Lyrae star,
asawtoothsignal shapeandonly 9samples, inwhichtheperfor- withaperiodof0.38918702d(Samusetal.2013),whichisconsis-
mance of BKR was definitely superior over the three other tech- tentwithourresult–0.3891656d.TheHipparcoscataloguequotes
niqueswetested.Thosecaseswereindeedthemajorityofthesim- the known period as taken from literature, and adds a comment
ulatedcases. aboutthescarcityofthedata,whichcastsdoubtaboutthenatureof
theperiodicity. Therefore, themaincataloguedoesnot quoteany
period.WeshowherethatbyusingournewBKRapproach,wecan
assignahighdegreeofcredibilitytotheperiodicity,evenwiththe
4 REAL-LIFETEST
veryscarceHipparcosdata.
We set out to test our new technique in a real-life situation that
couldpotentiallyemphasizeitsadvantagesandallowthemtoma-
terialize. To this end we chose to apply it to short Hipparcos
5 DISCUSSION
lightcurves.OursampleconsistedofHipparcostargetswithatmost
30samplesintheirHipparcosEpochPhotometryAnnexentry.We This Letter presents an improvement of the serial Hoeffding-test
consideredonlysampleswhichwerefullyacceptedbyatleastone periodicitymetricwehavepresentedinPaperI,basedontheBlum-
of Hipparcos’ FAST and NDAC consortia (ESA 1997). In total Kiefer-RosenblattmodificationoftheoriginalHoeffdingtest.This
therewere51lightcurvesmeetingthosecriteria. new periodicity metric consistently outperforms the Hoeffding-
Wescannedtheselectedlightcurvesforperiodicityusingthe test metric. On top of it, it seems to perform superbly better in
BKRtest,onafrequencygridrangingfrom10−4to12d−1(follow- sawtooth-like signal shapes, when thenumber of samples isvery
ingKoen&Eyer(2002)),withstepsof10−4d−1.Wesearchedfor smallandtheSNRishigh.Thisresultisinlinewiththestatement
targetswhose peak intheBKRfunction wassignificantlypromi- ofBlumetal.(1961),thattheirtestisasymptoticallyequivalentto
nent. We therefore used the SDE (Signal Detection Efficiency) theHoeffdingtestforlargesamples.
statistic originally used by Alcocketal. (2000) and Kovácsetal. This advantage might prove very important for radial-
(2002). We chose to single out targets whose SDE statistic was velocity surveys searching for exoplanets. The detection of high-
higher than15. Only one targetpassed thishurdle –HIP101453 eccentricity Keplerian RV curves is notoriously difficult when
(alsoknownasCHAql). basedonasmallnumberofsamples(Cumming2004).Anothersit-
Fig.7showsthelightcurveofHIP101453,aswellasitsphase uationinwhichthisfeaturewillprovevaluableisthedetectionof
MNRAS000,1–4(0000)
4 ShayZucker
Original Time Series Phase−Folded Time Series Original Time Series Phase−Folded Time Series
4 4 4 4
3 3 3 3
2 2 2 2
1 1 1 1
0 0 0 0
0 200 400 600 800 1000 0 0.5 1 1.5 2 0 200 400 600 800 1000 0 0.5 1 1.5 2
Generalized Lomb−Scargle Periodogram von−Neumann Ratio Generalized Lomb−Scargle Periodogram von−Neumann Ratio
1 30 1 40
30
20
0.5 0.5 20
10
10
0 0 0 0
0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1
Hoeffding Test Blum−Kiefer−Rosenblatt Hoeffding Test Blum−Kiefer−Rosenblatt
0.03 0.2 0.04 0.2
0.15 0.03 0.15
0.02
0.1 0.02 0.1
0.01
0.05 0.01 0.05
0 0 0 0
0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1
Figure5.Anotherexampleoftheresultsofapplyingalltheexaminedperi- Figure6.Anotherexampleoftheresultsofapplyingalltheexaminedperi-
odicitydetectionmethodstoasawtoothsimulatedtime-series,with9sam- odicitydetectionmethodstoasawtoothsimulatedtime-series,with9sam-
ples,andaSNRof100.Theuppertwopanels showthetimeseries and ples,andaSNRof100.Theuppertwopanels showthetimeseries and
itsphase-folded version,usingthecorrectperiod. Theotherpanels show itsphase-folded version,usingthecorrectperiod. Theotherpanels show
theperiodicitymetricscalculatedforthistimeseries,withselfexplanatory theperiodicitymetricscalculatedforthistimeseries,withselfexplanatory
titles.Notethepoorperformanceofthefirstthreeperiodicitymetrics,com- titles.Notethepoorperformanceofthefirstthreeperiodicitymetrics,com-
paredtothedetectionbytheBKRtechnique. paredtothedetectionbytheBKRtechnique.
RR-LyraestarsandCepheids(whosesignalshapesarealsoessen- LombN.R.1976,Ap&SS,39,447
tiallysawtooth-likeshapes)insparsedatasetssuchasthoseofHip- SamusN.N.,DurlevichO.V.,Goranskij,V.P.,Kazarovets,E.V.,Kireeva,
parcosandGaia(Eyeretal.2012).WehaveprovidedinSection4 N.N.,Pastukhova,E.N.,Zharova,A.V.2013,GeneralCatalogofVari-
ademonstrationofthispotentialusingHipparcosdata,specifically ableStars,VizieronlinedatacatalogB/gcvs
ScargleJ.D.1982,ApJ,263,835
forthecaseofHIP101453.
von Mises R. 1928, Wahrscheinlichkeit, Statistik und Wahrheit, Julius
Wecontinueourinvestigationofharnessingthepowerofnon-
Springer,Vienna
parametricindependencemeasuresforthepurposeofdetectingpe-
vonNeumannJ.,KentR.H.,BellinsonH.R.,HartB.I.1941,Ann.Math.
riodicityinextremecircumstancesofeitherpoorSNR,smallsam-
Stat.,12,153
plesizes,ornon-sinusoidalsignalshapes.Inthemeanwhilewealso Zechmeister,M.,&Kürster,M.2009,A&A,496,577
applyournewlydevelopedapproachestoexistingdatasets,bothfor ZuckerS.2015,MNRAS,449,2723
thesakeoftesting,butalsofordetectingpreviouslymissedperiodic
variables.Topromotefurtherresearchandtestingofthisperiodic-
ThispaperhasbeentypesetfromaTEX/LATEXfilepreparedbytheauthor.
ity metric by the community, we make it available online, in the
formofaMATLABfunction1.
REFERENCES
AlcockC.etal.2000,ApJ,542,257
BabuG.J.,Feigelson,E.D.2006inGabrielC.,Arviset,C.,PonzD.,En-
rique S.,eds, ASP Conf. Ser. Vol. 351, Astronomical Data Analysis
SoftwareandSystemsXV.Astron.Soc.Pac.,SanFrancisco,p.127
BlumJ.R.,KieferJ.,RosenblattM.1961,Ann.Math.Stat.,32,485
ClarkeD.2002,A&A,386,763
CramérH.1928,Scand.Actuar.J.,11,13
Cumming,A.2004,MNRAS,354,1165
ESA1997,TheHipparcosandTychoCatalogues,ESASP-1200
Eyer,L.etal.2012,Ap&SS,341,207
HoeffdingW.1948,Ann.Math.Stat.,19,293
Koen,C.,&Eyer,L.2002,MNRAS,331,45
KovácsG.,ZuckerS.,MazehT.2002,A&A,391,369
LaflerJ.,&Kinman,T.D.1965,ApJS,11,216
1 TheURLfordownloadingaMATLABcodetocalculatetheBKRperi-
odicitymetricishttp://www.tau.ac.il/˜shayz/BKR.m
MNRAS000,1–4(0000)
ImprovedNon-ParametricPeriodicityDetection 5
12.5 12.5
13 13
ag]13.5 13.5
m
p [ 14 14
H
14.5 14.5
15 15
8000 8200 8400 8600 8800 −0.4 −0.2 0 0.2 0.4 0.6 0.8
BJD−2 440 000 phase−folded time [days]
0.5
0.8
0.4
0.6
0.3
B p
0.4
0.2
0.1 0.2
0 0
0 2 4 6 8 10 12 0 2 4 6 8 10 12
Frequency [d−1] Frequency [d−1]
Figure 7. Detecting the periodicity in the Hipparcos lightcurve of
HIP101453.Upperleft:theoriginallightcurve.Upperright:Thelightcurve
phase-folded using aperiod of0.3891656d.Theemptycircles represent
copiesoftheoriginaldatasetshiftedbackwardsandforwardsbyoneperiod
inordertobettervisualizetheperiodicity.Lowerleft:TheBKRperiodicity
metric.Notethesharppeakattheknownfrequency.Lowerright:General-
izedLomb-Scargleperiodogram.Notetheabsenceofanysignificantpeak.
MNRAS000,1–4(0000)