Table Of ContentMon.Not.R.Astron.Soc.000,1–17(2222) Printed13January2010 (MNLATEXstylefilev2.2)
On the use of structure functions to study blazar
variability: caveats and problems
0 D. Emmanoulopoulos1⋆, I. M. McHardy1 and P. Uttley1
1
0 1 School of Physicsand Astronomy, University of Southampton, Southampton SO17 1BJ
2
n
a Accepted 2009Month#.Received2009Month#;inoriginalform2009Month#
J
3
1 ABSTRACT
Theextensiveuseofthestructurefunction(SF)inthefieldofblazarvariabilityacross
] the electromagnetic spectrum suggests that characteristics time-scales are embedded
O
in the light curves of these objects. We argue that for blazar variability studies, the
C SFresults aresometimeserroneouslyinterpretedleadingto misconceptionsaboutthe
. actual source properties. Based on extensive simulations we caution that spurious
h
breakswillappearinthe SFsofalmostalllightcurves,eventhoughtheselightcurves
p
- maycontainnointrinsiccharacteristictime-scales.i.e.havingafeaturelessunderlying
o power-spectral-density(PSD).Weshowthatthetime-scalesofthespuriousSF-breaks
r depend mainly onthe lengthof the artificialdata setand also onthe characterof the
t
s variability i.e. the shape of the PSD.
a
The SF is often invoked in the framework of shot-noise models to determine the
[
temporalpropertiesofindividualshots.WecautionthatalthoughtheSFmaybefitted
1 toinfertheshotparameters,theresultantshot-noisemodelisusuallyinconsistentwith
v theobservedPSD.Asanymodelshouldfitthedatainboththetimeandthefrequency
5 domain the shot-noise model, in these particular cases, can not be valid.
4
Moreover,we show that the lack of statistical independence between adjacent SF
0
points,inthestandardSFformulation,meansthatitisnotpossibletoperformrobust
2
statistical model fitting following the commonly used least-squares fitting methodol-
.
1 ogy. The latter yields uncertainties in the fitting parameters (i.e. slopes, breaks) that
0 are far too small with respect to their true statistical scatter. Finally, it is also com-
0
monly thought that SFs are immune to the sampling problems, such as data gaps,
1
whichaffectstheestimatorsofthePSDs.HoweverweshowthatSFsarealsotroubled
:
v by gaps which can induce artefacts.
i
X Key words: methods: statistical – methods: numerical – methods: data analysis –
r galaxies: individual: Mrk501 – galaxies: active – BL Lacertae objects: general
a
1 INTRODUCTION ods havebeen developed to studythe variability properties
of astronomical sources (see for a review SubbaRao et al.
The flux variation of Active Galactic Nuclei (AGN) is a
1997). Linear and nonlinear analysis methods in the time
phenomenon that can provide us with significant informa-
and frequency domains, adjusted to the needsof astronom-
tion about the physical properties of these sources. The
ical data sets i.e. taking into account measurement errors
most variable AGN are the blazars which exhibit dramatic
and data-gaps, can describe adequately the time behaviour
flux variations across the electromagnetic spectrum. The
of AGN. Nevertheless, several authors, as we are going
fastestvariationsareobservedintheX-rayandγ-raybands
to discuss further, are convinced that more conventional,
on time-scales of hours or even minutes (Catanese et al.
intuitive-tools, based on “running variance” computations
1997; Maraschi et al. 1999; Aharonian et al. 2003, 2005a,b)
are able to give robust results with respect to the underly-
whereas in the optical (Tosti et al. 1998; Stalin et al. 2005)
ing variability properties of the observed source. Although
andradiowavebands(Romero et al.1997;Aller et al.1999;
thesemethodshavevalueinsomecases,theycanoftengive
Ter¨asranta et al. 2005) the temporal variations are seen on
misleading results as we shall show in this paper.
time-scales of daysto weeks, up to years.
In thelast fiftyyearsseveral time-series analysis meth- One of the most extensively used tools in the field of
blazar variability is the structure function (SF) (see e.g.
Hugheset al. 1992) which measures the mean value of the
⋆ E-mail:D.Emmanoulopoulos@soton.ac.uk flux-variancefor measurements, x(t),that are separated by
2 Emmanoulopoulos et al.
agiventimeinterval,τ,whereSF(τ)= [x(t) x(t+τ)]2 . RossiX-rayTimingExplorer(RXTE),inordertostudythe
−
The SF is commonly characterized in terms of its slope β, effects of the data length on the position of the SF-break.
(cid:10) (cid:11)
where SF(τ) τβ. Then, in Section 4 we study the timing properties in the
∝
HistoricallytheSFhasbeenusedinthestudyofturbu- Fourier domain of the shot-noise model that TAN01 have
lent plasmas (Kolmogorov 1941a,b; Yu et al. 2003). It was used in order to study the SF properties. In Section 5 we
introduced to astrophysics by radio astronomers studying test the statistical robustness of the most commonly used
slow scintillations in the interstellar medium (Rickett et al. fittingprocedureswhichareemployedinordertoderiveas-
1984)followingthemethodologiesofProkhorov et al.(1975) trophysically interesting quantities from the SF. Finally, in
and Coles & Frehlich (1982) on the subjects of laser prop- Section 6 we study the sensitivity of SF to the presence of
agation in turbulent media and atmospheric scintillation data-gaps.
respectively. The first systematic description of the SF
methodology, adjusted to the needs of astronomical data
sets, was made by Simonetti et al. (1985) using as a refer-
2 LIGHT CURVE SIMULATIONS AND
ence the work of Rutman (1978) from the field of electrical
POWER SPECTRAL DENSITY
and electronic engineering. Simonetti et al. used the SF to
demonstratethatthetime-seriesofflat-andsteep-spectrum We simulate stationary light curves based on the proce-
radio sources differ qualitatively. During the same period, dure used by Timmer & Koenig (1995) which uses as an
Cordes & Downs (1985) estimate theSFs of 21 pulsars and input the Power Spectral Density (PSD) function of the
Hjellming & Narayan (1986) derivedthefirst quantitiveSF observed light curve and gives an ensemble of stochastic
resultsforthecompactgalacticradiosource1741-038.Then, time-series produced from the same underlying PSD. This
Fiedler et al. (1987); Heeschen et al. (1987) also used the is the most robust method of mimicking the properties of
SF to study the variability properties of large extragalac- a light curve since it takes into account correctly the in-
tic radio samples. Subsequently,the SF has been employed trinsic scatter in the power at a given frequency, described
for the study of the timing properties of objects in higher by a χ2, chi-squared distribution with two degrees of free-
2
energy bands. Bregman et al. (1988); Neugebauer et al. dom (e.g. Priestley 1981), by randomizing both phases and
(1989); Quirrenbachet al. (1992); Hufnagel & Bregman amplitudes.
(1992); Smith et al. (1993); Lainela & Valtaoja (1993); Throughout this paper we use the standard nomen-
Heidt & Wagner (1996) whilst Lewis & Irwin (1996) and clature, i.e. we refer to the periodogram as the modulus
Blandford & Kundic(1997)appliedtheSF-analysismethod squared of the discrete Fourier transform of the partic-
to derive masses for microlenses. In the X-ray band, ular data set under consideration whilst the PSD repre-
Brinkmann et al. (2000) applied the SF methodology to sents the mean of the true underlying distribution of vari-
the ROSAT observations of PKS2155-304 and they at- ability power as a function of frequency (Priestley 1981;
tributed the derived shot-noise variability to either the ac- Vaughan et al. 2003).Sincetheperiodogram is highly scat-
celerating inner jet model or to accelerating particles at tered around the PSD having a standard deviation of 100
shocks travelling down therelativistic jet. Thereafter many per cent (Press et al. 1992), we use the binned logarithmic
X-ray variability many X-ray observations have employed periodogram (Papadakis & Lawrence 1993;Vaughan2005).
SFanalysis(e.g.Gliozzi et al.2001;Brinkmann et al.2001; If there are more than 20 samples in each geometric-mean
Iyomoto & Makishima 2001; Zhang et al. 2002) in order to frequencybinthentheirdistributionwill beGaussian, thus
derivesome short of characteristic time-scale. producinga statistically sensible standard deviation.
SFshavebeenparticularlycommonlyappliedtoobser- Concerning the length of the simulated light curves
vations of blazars. The ASCA “long-look” observations of some special caution should be taken with respect to the
Mrk501,Mrk421,andPKS2155-304(Takahashi et al.2000; effectofred-noise leaki.e.thetransferofpowerfrom lowto
Tanihata et al.2001,2003)are,evennow,someofthemost high frequencies by thelobes of the window function which
uniformly sampled light curves in the X-ray regime. The can produce features such as slowly increasing or falling
mainoutcomeoftheaboveSFanalysisisanapparentchar- trendsacrossthegeneratedlightcurve(e.g.Priestley1981).
acteristictime-scaleof 1daywhichappearstobeinaccor- Totakethiseffect intoaccount,weextendthePSDtovery
∼
dance with the duration of the shots existing in the inter- low frequencies (i.e. long time-scales), in order to generate
nal shock particle acceleration scenario (Sikora et al. 2001; artificiallightcurves50timeslongerthantheobservedone.
Spadaet al. 2001). We then truncate the simulated data trains to the desired
Inthisworkweshowthroughextensivesimulationsthat length by selecting a random segment having equal length
muchoftheexistingliteratureonblazarSFstendstomisin- to thereal light curveunderstudy.
terpretobservedSFcharacteristicssuchasbreaksandslopes InthecaseofblazarvariabilitytheunderlyingPSDcan
asbeingrealorphysicallymeaningful,whenoftentheseare bevery-welldescribedbysimpleorbroken-power-lawswith
either artefacts intrinsic to SFs, or subject to much greater indices 1 < α < 2.5 flattening at long time-scales depict-
statistical variation than inferred from the commonly-used ing the physical fact that the variability amplitude can not
fittingprocedures. InSection 2wepresent themethod that increase forever (stationarity).For thesekindsof physically
we are going to use in order to create artificial light curves stationaryprocessestheperiodogram cancorrectlydescribe
lacking any sort of characteristic time-scales and in Section the underlying PSD even for steeper slopes of -4 (or even
3 we study the behaviour of the SF derived from the thor- -5).
oughly studied ASCA data set of Mrk501 (Tanihata et al. Apart from red-noise leak, spectral representations are
2001, hereafter TAN01). For the same source we employ alsoaffectedbyaliasingeffects. Frequencycomponentsout-
thelong-looklightcurveofAll-SkyMonitor(ASM),onboard side the frequency range covered by the data set, are
Structure function: caveats and problems 3
aliased into that range by the very act of discrete sam- of the break. Even if a clear plateau is not formed after
pling (Press et al. 1992). Finally, irregular sampling pro- τ >τ , several authors consider the first “hump” as a sig-
br
duces a window function which smears the spectral esti- nature of a genuine temporal property of the source (e.g.
mates (Scargle 1989), causing deviations from the true un- Takahashi et al. 2000; Gliozzi et al. 2001; Kataoka et al.
derlying PSD. Over the years a number of methods have 2001; Agudo et al. 2006; Fuhrmann et al. 2008).
been derived to overcome these problems. In particular, Finally we note that after thefirst break point, theSF
we can estimate the underlying PSD by model fitting (e.g. usuallyexhibits“wiggly-patterns”formingaplateau.These
Doneet al. 1992; Uttley et al. 2002). This process is now fluctuations indicate that the pairs {f(i + τ),f(i)}τ>τbr
commonly applied to observations of Seyfert galaxy X-ray which are averaged within each time bin τ > τ , are lin-
br
variability (e.g. Markowitz et al. 2003; Uttley & McHardy early independent having zero linear correlation (Appendix
2005;McHardy et al. 2004, 2005, 2006, 2007) and produces A) . In order to study the frequency of occurrence of these
reliable results. Similar simulation-based treatments of the SF structures, we register the abscissas of all the local ex-
SFresults are missing from theblazar literature. trema occurring for τ >τ .
br
In the left panel of Fig. 2 we present with the filled
greyareathedistributionoftheSF-breakscomingfromour
simulated light curves that have the same length and the
3 CAVEATS REGARDING THE STRUCTURE
same featureless PSD as the original ASCA light curve of
FUNCTION BREAK-TIME-SCALES
Mrk501. From an ensembleof2000 simulated datasets, we
3.1 The ASCA data set of Mrk501 get breaks from 1903 of them. To specify the position of
themaximum in ourhistogram we fitthehistogram entries
To study the behaviour of the SF, we examine the ASCA
with a Gaussian distribution. The results of the fit yield
data set of Mrk501 (TAN01) in 2–10 keV, with a sample
an amplitude of 279.55 samples, a mean value of 0.91 days
intervalof5678.3 sec.Thisisoneofthethreedatasets(the
and a standard deviation of 0.09 days1. With the solid and
othertwoareMrk421andPKS2155-304) inwhichTAN01,
dottedlineswepresentthedistributionofthelocalextrema
based on the SF analysis method (Fig. 1, left panel), find
(maxima with solid line and minima with the dotted line)
characteristic time-scales and suggest that these are a sig-
occurring after the first break. The occurrence times of the
natureoftheminimumtime-scalesofindividualshots.This
“wiggling”patternsarepurelyrandomlydistributedforτ >
isoneofthelongestandmostcontinuousdatasetseverob-
τ following a uniform distribution.
tainedforablazarintheX-rays,coveringatimespanof10 br
These simulations show us clearly that stochastic data
days.
setshavingthesamelengthastheASCAdatasetofMrk501
Initially, we estimate the binned logarithmic peri-
and the same featureless PSD, exhibit breaks in the SF
odogram of the data set which can be very well fitted by
around a day. Thus, the apparent break seen in the SF for
asimple featureless power-law with indexα= 1.80 0.09
(χ2 = 3.80 for 6 degrees of freedom with a nu−ll hypo±thesis Mrk501 (Fig. 1, left panel) should not be associated with
any sort of physically meaningful time-scale.
probability of 0.70). Thereis noevidencefor a break in the
PSD.
Wenext produce2000 artificial light curveshavingthe
3.2 Longer time-series: The ASM data set of
same power-law PSD of index α = 1.80 and the same
− Mrk501
lengthasthestudiedASCAdataset.Foreachartificiallight
curve we estimate the SF by employing exactly the same Following Section 3.1, we extend our simulations to even
functional form of the SFas TAN01 longer time-scales to check howSF-break time-scales might
1 be related to the length of a data set. Assuming that for
SF(τ)= w(i)w(i+τ)[f(i+τ) f(i)]2 (1)
N(τ) − long time-scales the variability properties of MRK501 are
X well-representedbythesamePSDdescribingtheshort-term
whereN(τ)= w(i)w(i+τ),τ istheseparationtimeofthe
variability (i.e. same slope and normalisation), we produce
observations, and w(i) is the weighting factor which is pro-
P 2000 artificial light curves, each being 2000 days long. For
portional to the data point f(i) and inversely proportional
every light curve we calculate the SF and we specify the
to its error σ (i). By an eye inspection, we note apparent
f position of theSF-breaks.In total 1893 light curvesexhibit
breaks in almost all of thesimulated SFs.
SF-breaksand thedistribution of theirposition can bewell
In our simulations we take into account the effect of
representedbyaGaussiandistributionhavingamean399.35
the measurement errors in the SF-estimates by represent-
days and a standard deviation of 74.73 days (Fig. 2, right
ing each measurement as a deviate drawn from a Gaussian
panel).
distributionwithmeanandstandarddeviationequaltothe
In order to compare our predicted break with the data
measurement’s value and error respectively (200 times for we use the long-term RXTE-ASM light curve of Mrk5012
each artificial light curve). We have to note that for this
spanning from MJD 50100 to MJD 52100 and we estimate
particular ASCA data set the measurement errors do not
playamajor roleintheSF-estimates.Ifweignoretheseer-
rors the values of the SF-breaks and SF-slopes change only
1 By fitting a log-normal distribution the results remain practi-
by 1 percent and 0.5 percent respectively.
∼ ∼ callythesamegivinganamplitudeof254.33samples,µ=−0.09,
Next, we determine the position of the SF-break (Fig.
andσ=0.10,yieldingameanvalueof0.92daysandastandard
1, right panel). To localize the break we produce an in-
deviationof0.09days.
terpolated version of the SF and we find the abscissa τbr 2 WehaveobtainedthelightcurvefromtheASMlightcurvesite
of the first local maximum, which indicates the position ofMIT:http://xte.mit.edu/ASM_lc.html
4 Emmanoulopoulos et al.
0.05
0.05 0.90days
0.88days
0.02
0.02
0.01
L L
Τ 0.01 Τ
H H
F F
S S
0.005
0.005
0.002
0.002
REAL SIMULATED
0.001 0.001
0.1 0.2 0.5 1 2 5 0.1 0.2 0.5 1 2 5
SeparationtimeΤHdaysL SeparationtimeΤHdaysL
Figure1.[Leftpanel]TheSF(inlogarithmicscale)fortheASCAdatasetofMrk501asshowninpanel(a)offigure4inTAN01.The
arrowindicatesthebreaktime-scalewhichisaroundaday.
[Rightpanel]ArandomlychosenSF(inlogarithmicscale)fromtheensembleofthe2000artificiallightcurveswhichareproducedfrom
afeatureless power-law PSD ofindex -1.80withnocharacteristic time-scale,having the samelength as the ASCAdata set. Thereisa
clearbreakaroundoneday,similartotheonederivedfromtheASCASF.
DistributionofΤbr
250 Distributionofthelocal 120
maximaatΤ>Τbr
Distributionofthelocal
200 minimaatΤ>Τbr 100
s s
e e
pl pl
m m 80
a a
s150 s
of of
er er 60
b b
m m
u100 u
N N
40
50
20
0 0
0 2 4 6 8 300 400 500 600 700
ΤHdaysL Τ HdaysL
br
Figure 2.ThedistributionoftheSF’sbreaksandwigglingfeatures.
[Left panel] The grey area depicts the distribution of the SF-breaks coming from an ensemble of 2000 artificial light curves which are
produced from afeatureless power-law PSD of index -1.80with nocharacteristic time-scale, forthe case of the ASCA data of Mrk501
(thehistogrambinshavealengthof0.04days). BasedonaGaussianfitthemeanvalueandthestandarddeviationofthedistribution
are 0.92 and 0.09 days respectively. The solid and dashed lines represent the distribution of the local maxima and local minima for
τ >τbr mappingthepositionsofthewigglingfeatures.
[Right panel] The distributionof the SF-breaks for the case inwhich the simulated light curves, produced by the same PSD as above,
extendto2000days(thehistogrambinshavealengthof10days).Ithasameanvalueof399.35daysandastandarddeviationof74.73
days.
Structure function: caveats and problems 5
its SF. The SF-break occurs around 402 days, something distanceinourcaseisthemeanvariancewhichismeasured
which is absolutely in accordance with the results from our from the SF, hence for t′ > t the maximum variance and
simulated light curves of the same length which did not in- hencetheposition of thebreak is expected toshift towards
cludeof any sort of characteristic time-scale. larger time-scales.
Theaforementionedexamplereflectsinaclearwaythat Theimportantpointhereisthatthemeasuredvariance
the SF deals only with the properties of the observed light ofadataset(andhencethemaximumvariancedepictedby
curveignoringthepropertiesofthetrueunderlyingvariabil- the break) does not reflect the true underlying variability
ity process. During astronomical observations we observe a properties of the source. As the data set increases, larger
source for a given time and from the observed data series variationsproducedfromthesamevariabilityprocesshaving
we haveto extract in a statistically robust way the proper- thesamePSDenterthedatasetsandthisismappedinthe
ties of the underlying variability process. If the result that SFasabreakeveninthecaseoffeatureless PSDwhich are
we obtain is a strong function of the length of a given data lacking characteristic time-scales.
set then this can lead to a serious misunderstanding of the In Fig. 5 we show the position of the SF-breaks, τ ,
br
variability properties of thesource. induced by the different length of the data sets for various
Similar SF effects to those derived for the ASCA and PSD-slopes.Thisdiagramcanbeusedonlyforcontinuously
ASMlightcurvesofMrk501shouldbeexpectedforallsim- sampledlightcurvesforwhichtheunderlyingPSDisapure
ilar light curves.By producingan artificial light curve6000 power-law with the given indices. The existence of data-
time units (t.u.) long, from a featureless power-law PSD of gapsinthedatasetcanaltersignificantlythispicturesince
slope -2, we estimate the normalized SF (NSF) (equation sampling patters play a major role in the form of the SF
(A10)) (i.e. the SF divided by two times the value of the and hencetheposition of thebreak (Section 6).Inthecase
variance of the data set, see Appendix A) for the overall that our data set has a PSD of a broken-power-law form,
data train and for a 500 t.u. subset of it (Fig. 4). In this similar simulations should be performed in order to specify
caseweusetheNSF,insteadoftheclassicalSF,foraneasy the position of the fake SF-break. Note that the physically
anddirectcomparisonofthepositionofthebreak,sincethe interesting PSD-breaks will be mapped in the SF but in
formed plateaus in both SFs are distributed around 2. The statistically unsoundwaywithrespecttotheirposition and
toprightpanelofFig.4showsthebehaviouroftheNSFfor theiruncertainties (Section 5.3).
these two data sets, coming from thesame underlyingvari- Consider the case of the blazar PKS2155-304 as pre-
abilityprocess. ThebreakintheNSFfortheshortdataset sented by Zhang et al. (2002). The 1997 ( 100 points) and
∼
occurs at around 40 days and for the overall light curve at 1999 ( 200 points) data sets are the most continuous ones
∼
1600 days respectively. In this case, the fact that the break (figure 2 and figure 3 in Zhanget al. 2002) and they have
occurs at two different time points for two different dura- featurelessPSDwithslopesα= 2andα= 3respectively
− −
tions of thedata set which is fully described from thesame (Table3inZhang et al.2002).TheSFofthecorresponding
and completely featureless PSD, indicates that the break light curves (figure 6, middle right panel and bottom right
does not reflect any physically interesting time property of panel respectively, in Zhanget al. 2002) exhibit breaks of
theprocess itself. theorderofτ 12.8ksecandτ 61.7ksecrespectively
br br
≈ ≈
For comparison purposes we also estimate the binned (or 13 t.u. and 61.7 t.u respectively for SF-bins of 1 ksec).
logarithmic periodograms forthetwolight curvesdiscussed A SF-break for the 1997 data set is predicted (from Fig. 5)
above. Aswecan see from thebottom left panel of (Fig. 4) tolieat around14.8 1.4 t.u.and forthe1998 dataset,at
±
both can very well reproducetheinput PSD slope of -2. around60 t.u.with a simplelinear interpolated estimation.
Thisoutcomereadilytellsusthatthesebreakscanoriginate
from a variability process with no characteristic time-scale
3.3 SF-breaks and data set length simplydescribed byafeatureless PSD ofapower-law form.
To quantify the relationship between the spurious SF-
breaks, τ , and both the length and the PSD-slope of the
br
data set, we perform a set of simulations. For each feature-
lesspower-lawPSD(withindexα= 1,α= 1.5,α= 2, 4 THE STRUCTURE FUNCTION AND THE
− − −
α= 2.5andα= 3)andforvariousdata-setlengths(100, BLAZAR SHOT-NOISE MODEL
− −
400, 700, 1000, 1300, 1600, 1900 and 2200 t.u.) we produce
2000 artificial light curves. In total we perform 40 simu- Particle acceleration in shocks (Kirk et al. 1998) is com-
lations, each one consisting of 2000 artificial light curves. monly invoked to explain the observed temporal and spec-
Then,foreachPSD-indexandlightcurvelengthwelocalise tralvariabilityofblazars.Theremaybeavarietyofintrinsic
thepositionoftheτ ,asdescribedinSection3.1,andafter timescales, i.e. acceleration, escape and light crossing time-
br
forming its distribution we estimate its mean and its stan- scales,associatedwithasingleemittingregion.TAN01sim-
dard deviation3 (Fig. 5). ulatelightcurvesontheassumptionofcertainshotparame-
For all the PSD-slopes we can readily see that as the tersand,fromthesimilarity betweenthesimulatedandob-
lengthofthedatasetincreasesfromtt.u.tot′ t.u.(t′>t), servedSFs,estimatesometemporalshotproperties.Herewe
the SF-break gradually shifts to larger values. This can be cautionagainstthisapproachas,ingeneral,thelightcurves
understoodfor thecaseofarandom walkprocess (α= 2) derived from such shots do not exhibit the same PSD as
where the mean distance after T steps is √T. The m−ean thoseobservedfromtheblazars.Aphysicallycorrectmodel
should reproduce both the SFand the PSD.
In TAN01 a number of simulations were carried out in
3 ThedistributionsareveryclosetoGaussian(seee.g.Fig.2). order to associate the SF-breaks with the properties of the
6 Emmanoulopoulos et al.
3.5 402days
1
3
0.7
e
t
a
r2.5
t
n 0.5
u
o L
c Τ
d 2 HF
e S
z
li 0.3
a
m1.5
r
o
N
0.2
1
0.15
0.5
20 50 100 200 500 1000
500 1000 1500 2000
MJD-50000 SeparationtimeΤHdaysL
Figure 3.[Leftpanel] Thelong-term2–10 keV ASMlightcurveofMrk501, between 50100 MJD–52100 MJD,inbins of15-days. The
dottedlineislinearinterpolationintendedtoguidetheeye.
[Rightpanel]TheSFoftheASMlightcurveofMrk501,inbinsof15-days(inlogarithmicscale).TheSF-breakoccursaround402days.
1
80 HLHLxtfluxunits1111122202468024 HLDΤNSF -01í íííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííííí
0
200 300 400 500 600 700 g1-2 æ Longdataset
L t+2000Ht.u.L lo í Shortdataset
nits60 -3
u 0.0 0.5 1.0 1.5 2.0 2.5 3.0 3.5
HLHxtflux40 -21(cid:144)HLD<t.u 45 æ log10@Τ(cid:144)Ht.u.LìDæ SLhoonrgtddaattaasseett
Ls æ æ
nit 3 ææææ
u 2
x ì
20 flu 1 ì
@H 0 ìì
(cid:144)
HLPf -1 ìììììììì
0 1000 2000 3000 4000 5000 6000 80 -3.0-2.5-2.0-1.5-1.0-0.5
1
g
tHt.u.L lo log10@f(cid:144)Ht.u.L-1D
Figure4.[Leftpanel]Theoverallsimulatedlightcurve6000t.u.longproducedfromanunderlyingPSDhavingapower-lawformwith
index-2.Theinlayshowsashortsegmentofthelightcurve500t.u.long.
[Toprightpanel]TheNSFofthedataintheinlay(emptydiamonds)andtheoveralldataset(blackpoints).ThebreakoftheSFoccurs
at40and1600t.u.respectively.
[Bottom right panel] The binned logarithmic periodograms of the short (filleddiamonds) and the overall data sets (grey points). Both
ofthem matchtheunderlyingPSD,beingafeaturelesspower-lawofindex-2.
observed flares. The light curve is regarded as a superposi- and t respectively, occurring randomly at t following a
d p
tion of triangular shots4 with rise and decay time-scale t
r
4 Thelast40yearsthecommonlyusedshot-noisemodelsconsist fileoronlyanexponentialdecayprofile(e.g.Lochneretal.1991;
ofpulseswitheitherasymmetricexponentialriseanddecaypro- Burderietal.1997).
Structure function: caveats and problems 7
ual triangular shots, used for the construction of each light
2000 curve,itisusefultolookatthegeneralanalyticalformofthe
æ
æ à PSDforasingletriangularshotasderiveddirectlyfromthe
1000 æà à ì Fouriertransformofthecontinuousfunctions(AppendixC).
æ ì We should note here that since the simulated light curves
500 à ì ò arediscretised,apart from theirnoisy form,duetotheran-
æ ô
à ì ò òô domness of the occurrence times and of the intensities, we
ô expect the PSD to flatten towards high frequencies due to
200 ì ò
Lu. æ ô aliasing effects (e.g. Papadakis & Lawrence 1993).
Ht.br 100 æ ìà òô InitiallyTAN01considerthatthelightcurveconsistsof
Τ 50 æ ìà òô æà ΑΑ==--23.5 tid.ue.ntTichaelPsySmDmoeftsruicchtraiapnrgoucleasrssbhroetaskis.ea.,tττ−=1 atrnd=btedco=m1e0s
very steep at higher frequencies with power-law of index
20 à òô ì Α=-2 α 4 superimposed on a wave with peaks separated by
ì ò Α=-1.5 τ−≈1 (−Fig. 6, panel a). Qualitatively we can justify this be-
10 òô ô Α=-1 hguavlairousrhobtywcohnicshidiesripnrgopthoretPioSnDalotfoafs−in4gslien4s(yπmτmf)et(reiqcutartiaionn-
(C3)). We expect that in the case of randomly occurring
0 500 1000 1500 2000
symmetric triangular shots, having different intensity but
Lengthof thedatasetHt.u.L thesameduration,theoverallshapeoftheirPSDshouldbe
similar to thesum of thePSDsoftheindividualsymmetric
triangular shots.
Figure 5. The position of the SF-break, τbr (in logarithmic
scale), as a function of the length of the data set assuming an
underlyingPSDofapower-lawformofindexα.Eachdatapoint The second case considered by TAN01 is that of non-
representsthemeanvalueandthestandarddeviationofthedis- identical symmetric shots (τ = tr = td) whose time-scale τ
tributions of the SF-breaks yielding from an ensemble of 2000 variesrandomly betweenτ =10t.u.andτ =100 t.u.
min max
simulations having a power-law PSD of index -1 (filled circles), In this case the PSD is flat until τ−1 and then it breaks
max
-1.5(filled squares), -2 (filled diamonds), -2.5 (filled up-pointing following a simple steep power-law of index α 4 (Fig.
triangles)and-3(filleddown-pointingtriangles).Thedashedlines ≈ −
6, panel b). The oscillatory behaviour, caused by the sine
arelinearinterpolationsintendedtoguidetheeye.
term of equation (C3), is smoothed out due to the random
distribution of τscoming from every symmetric shot.
Poisson distribution, with an intensity I at t
0 p ThethirdandthelastcaseconsideredbyTAN01isthat
It0r [t−(tp−tr)] tp−tr <t6tp to.fua.sTymhimseistraicctiudaenlltyicaalgsehnoetrsawlizitahtiτorn=of10thte.ufi.rasntdcτadse=. T10h0e
(t)= (2)
Itr.sh. −tId0 [t−(tp−td)] tp <t<tp+td iPnSdDexborfeaαksatτ2d−a1n,dthaefnteirtτb−ec1obmeeastsstseeeppawraittehdabpyowτ−er1-laarwe
≈ − r r
Then, the SF properties were studied with respect to the superimposedontheunderlyingpower-law(Fig.6,panelc).
aforementioned time-scales and the main conclusion was Qualitatively,basedonthePSDofasingleasymmetricshot
that only the shortest time-scale between t and t deter- (Fig. C1) we can readily distinguish very similar spectral
r d
mines the position of the break. Moreover, due to the fact features such as the position of the first break, the slope
that theASCA dataset of Mrk501 consists mainly of sym- between τ−1 and τ−1, and the separations in frequency of
d r
metric flares, they connect the shortest time-scale derived thebeating frequencies.
from the SF to the light crossing time of the emission re-
gion. Each oftheaforementioned simulated PSDsdifferssig-
TheapparentsimilaritiesbetweentheobservedSFsand nificantly from the PSD derived from the ASCA observa-
those coming from the shot-noise simulations, presented by tions i.e. a simple power-law with index α = 1.80 0.09.
− ±
TAN01, give an erroneous impression that such correspon- Neithersteep slopes nor sinusoidal features in oscillating or
dences hold. Apart from thefact that breaks in the SFcan beating forms appear in the featureless PSD of Mrk501.
occurindatasetswithnocharacteristictime-scales(Section ThefactthatthePSDfunctionsdiffersignificantlybetween
3),wepositthattheshot-noisemodelisnotaphysicallyre- simulations and observations clarifies in an unambiguous
alistic approach that can be used in order to associate the way that the variability properties of Mrk501 can not be
observedbreaksoftheSFtothesmallesttime-scalesembed- described in a physically correct way from the shot-noise
dedin thedata sets. Inorder for theshot-noise simulations model. The latter can very well reproduce the various SF-
to be realistic representations of the observed ASCA data featuresi.e.slopes andbreaksobservedin theactualSFsof
setofMrk501,theymustbeabletoreproducetheobserved theobservedinblazars,butitcannotreproducethefeature-
PSD as well as the SF. lessPSD.Thismeansthattheshot-noisesimulationsarenot
We repeat the simulations of TAN01 by creating shot- representative of the true underlying variability properties
noise light curves 2000 t.u. long and we calculate the cor- of Mrk501 and hence should not be used in order to as-
responding PSD. Since the general shape of the latter can sociate the observed SF-breaks with characteristic physical
beunderstoodqualitativelybythePSDformoftheindivid- time-scales.
8 Emmanoulopoulos et al.
0 5 FITTING MODELS TO THE STRUCTURE
FUNCTION: PARAMETER AND ERROR
LD ESTIMATION
-1Lu. -2 Τ-1Τ-1 5.1 Lack of Gaussianity and statistical
Ht. independence
(cid:144)
2
L
units -4 Othnaet tohfethvaerimouasjoerstpimroabtleesm,sSFth(aτti),affareectnsotthienduespeenofdeSnFt oisf
ux eachother.Thisproblemseverelyaffectsthefittingroutines
fl e.g. least-squares, maximum likelihood, that are commonly
H
H
L(cid:144) -6 usedin thepublishedblazar-SF-literaturetoderivetheSF-
f
HP breaks and the -slopes. These routines all require that the
@
0 data points should be statistically independent so that the
1
g
o probabilityofobtainingthefullensemble,inourcaseSF(τi)
l -8 HaL Τ-1 for all τi, is equal to the product of the probabilities of ob-
taining the individual SF(τi) (e.g. Bevington & Robinson
1992).
That means that if we want to fit a broken-power-law
1LD 0 of theform (e.g. Zhanget al. 2002)
-
L
(cid:144)Ht.u. SF (τ;C,τ ,β ,β )= C τmτax β1 τ 6τmax (3)
2Lnits -2 M max 1 2 C(cid:16)τmτax(cid:17)β2 τ >τmax
u (cid:16) (cid:17)
x to a dataset of theform
u
HHfl -4 τ1,SF(τ1) , τ2,SF(τ2) ,..., τN,SF(τN)
(cid:144)
Lf n(cid:16) (cid:17) (cid:16) (cid:17) (cid:16) (cid:17)o
H we have to maximize the probability of obtaining the en-
P
@og10-6 Τm-1ax asesmthbelejooifnetsptirmobaatbeislit{ySF(τ1),SF(τ2),...,SF(τN)}, known
l
HbL N
-8 P(C,τmax,β1,β2) = Pi(SFM(τi;C,τmax,β1,β2)) (4)
i=1
1 Y
1LD where Pi SFM(τi;C,τmax,β1,β2) is the probability of ob-
-
Lu. taining th(cid:16)e estimate SF(τi). Equ(cid:17)ation (4) is valid (i.e. the
Ht. 0 joint probability equals to the product of individual proba-
2L(cid:144)s bilities) iff theestimates {SF(τ1),SF(τ2),...,SF(τN)} are
nit independentof each other.
u -1 Unfortunately the adjacent SF-estimates are far from
Hflux Τr-1Τr-1 i(nAdCeFpe)nodfetnhte.IfinrsFtig2.57lowgearsihthowmitchSeFa-uetsotciomraretleasti(oonutfuonfc1t3io7n)
H
(cid:144)
L of Mrk501 (Fig. 1) after detrending them by subtracting
f -2
H
P a fifth degree polynomial. Note here that this detrending
@
10 is crucial since we want to check the linear relations be-
g
o tween the SF-estimates (e.g. short-term dependence), and
l -3 HcL Τ-1 Τ-1 notlong-termlinear-dependencewhichisduetothegeneral
d r
shape of the SF. In Fig. 7 we quantify the degree of linear
-3.0 -2.5 -2.0 -1.5 -1.0 -0.5 dependenceoftheSF-estimatesfortheslope,thebreakand
log @f(cid:144)Ht.u.L-1D the plateau regions. It is impossible to have linearly cor-
10
related values which are independent5. We can readily see
Figure 6.ExamplePSDsforthe shot-noisemodel consisting of that up to τ =0.88 days (the range which is used in order
triangularshots forthreecasesofshots. to derive the break time-scale in Section 3.1) the degree of
[Top panel (a)] Identical symmetric shots with rise and decay linear correlation is greater than 0.5.
time,τr andτd respectively, equalto10t.u. The assumption of Gaussianity is another issue with
[Middlepanel(b)]Symmetricshotswithriseanddecaytimeequal
toτ (τr =τd=τ),andτ distributedbetweenτmin=10t.u.and
τmax=100t.u. 5 Weshouldnotethatsometimestheabsenceofalinearcorrela-
[Bottompanel(c)]Asymmetricidenticalshotswithriseanddecay
tionisconfusedwithstatisticalindependencebutthisisthecase
timeτr=10t.u.andτd=100t.u.,respectively. only when we are dealing with Gaussian distributions. Genuine
statistical independence requires not only linearly uncorrelated
values but also the absence of any functional relation between
thevariables.
Structure function: caveats and problems 9
remains non-Gaussian for all the time bins τ. That means
1.0
that any fitting relation of the form of equation (3), deal-
Slope Plateau ingdirectlywiththeSF-estimates,doesnotprovidereliable
results since Gaussianity i.e. equation (5), is not valid.
D Interestingly, at least for those time bins which are be-
L
Τ 0.5 lowthefakeSF-break,occurringatτ 50t.u.(Fig.5),the
H br
F ≈
S logarithms of the SF-estimates follow a distribution which
@
0 is closer to Gaussian (Fig. 8, bottom panels a and b). For
1
og τ > τbr even the logarithmic-estimates of the SF deviate
l
significantly from Gaussianity (Fig. 8, bottom panel c) and
f
o 0.0
of course for all thetimebinsτ themeasurements continue
F
C to bestatistically dependenton each other.
A
Kataoka et al. (2001) define the sum of square differ-
break encesχ2sim = k{log10[hSF(τk)i]−log10[SF(τk)]}2 ,where
- the angle brackets indicate the arithmetic mean, in order
-0.5 SF to derive a stPatistical significance of the goodness of their
SFfit,accountinginthiswayforthenon-Gaussianity.They
0.0 0.2 0.4 0.6 0.8 1.0 arguethatχ2 isdifferentfromthetraditionalχ2(i.e.equa-
sim
ΤHdaysL tion (7)) but the statistical meaning is the same. Since the
SFmeasurementsarenotindependent,aχ2 functionof the
Figure 7. The ACF of the first 25 detrended logarithmic SF- form (S E)2,whereS comesfromsimulations(i.e.from
−
estimates of Mrk501 (from Fig. 1) showing the degree of linear a given assumed underlying model) and E are the actual
P
dependence among them. Thevertical dotted lineshows thepo- estimates, makes sense only if for every SF bin the distri-
sition of the SF-break at 0.88 days, and the arrows indicate the butionoftheentriesisGaussian (yieldingthe(S E)from
−
SF-slopeandplateauregions.Thedashedlinesamongthepoints theexponentoftheGaussian,equation(5))andifalltheSF
oftheACFarelinearinterpolationsintended toguidetheeye. bins are independent of each other (yielding the sum from
thejoint probability, equation (6)).
Concerning the Gaussianity it would be more appro-
the SF fitting procedures. Only if the distribution of the
SF-estimateswithineachτi isGaussian (withamean value priate for Kataoka et al. (2001) to use the mean logarithm
SF(τi) and standard deviation σi), is it valid to consider of the SF, hlog10[SF(τk)]i rather than the logarithm of the
mean SF, log [ SF(τ ) ], in their pseudo chi-square esti-
that 10 h k i
mate: χ2 = log [SF(τ )] log [SF(τ )] 2
Pi SFM(τi;C,τmax,β1,β2) = wherebositmh,GqauuasnstitiePshlko{gh10[S10F(τk)]ikanid−log101[0SF(τk)k]a}re
(cid:16)1 −1 SF(τi)−SFM(τi;C,(cid:17)τmax,β1,β2) 2 distributed Gaussian within a tk. However, the main prob-
e 2h σi i (5) lem of non-independence is not avoided and therefore the
σi√2π derived significances are not statistically meaningful.
and estimate the product in the joint probability (equation
(4)) (which implies independence)as
5.2 Error underestimation
P(C,τ ,β ,β )=
max 1 2
In most of the cases, when we deal with statistically inde-
N 1 e−21PNi=1hSF(τi)−SFM(τσii;C,τmax,β1,β2)i2(6) pendentvariables undertheassumption of Gaussianity, the
i=1(cid:18)σi√2π(cid:19) mostprobablevalue(i.e.theonethatminimizestheχ2)and
Y itsstandarddeviationareenoughtogiveafullpictureofthe
According to the maximum likelihood method, in order to
distribution of thefit parameters.
findthemostprobablesetofparameters(C,τ ,β ,β )we
max 1 2 Thereareseveralmethods,as wearegoingtosee,that
must maximize P(C,τ ,β ,β ) (equation (6)) or equiva-
max 1 2 are used in the literature claiming robustness and statisti-
lently wemust minimize thesum-argument χ2 in theexpo-
cally meaningful errors for the SF fit parameters i.e. slopes
nential of equation (6)
and breaks. In this section we show through simulations
χ2 = N SF(τi)−SFM(τi;C,τmax,β1,β2) 2 (7) ythiealtdsthveerlaycksmoaflsltaesttisimticaatlesinodfepuenncderetnacinetiynitnhethSeFfi-etsttiinmgapteas-
σ
Xi=1(cid:20) i (cid:21) rameters, which donot reflect thetrueunderlyingdistribu-
Thusthequantityχ2automaticallydefinesthegoodness-of- tions of the fitting parameters. The essence of any correct
fit which is actually a measure of the probability of obtain- fitting procedure is to yield a statistically meaningful de-
ingtheobservedSF-estimatesfromagivensetofparameters scription of these distributions.
(C,τ ,β ,β )butonlywhenwearedealingwithindepen-
max 1 2
dent and Gaussian SF-estimates.
5.2.1 Fitting the SF-slopes and -breaks
However, in reality we do not deal with Gaussian SF-
estimates.Byproducing2000randomdatasets500t.u.long We use the previously generated simulations (2000 light
from aPSDofasimplepower-lawform havingindex-2,we curves, 500 t.u. long from a power-law PSD having an in-
formthedistributionoftheSFsforthefirst,thesecond,and dex-2), and for each light curvewe calculate thelogarithm
thethree-hundredthtimebinsrespectively(Fig.8,toppan- of theSF-estimates. We attribute to every estimate a stan-
els). The histograms clearly have a non-Gaussian form and dard error based on the error of the sample mean for each
10 Emmanoulopoulos et al.
140
ples 150 Τ=H1aLt.u. ples 150 Τ=H2bLt.u. ples 120 Τ=3Hc0L0t.u.
m m m 100
sa 100 sa 100 sa 80
of of of 60
ber 50 ber 50 ber 40
m m m
u u u 20
N N N
0 0 0
0 2 4 6 8 10 12 14 0 5 10 15 20 25 30 35 0 1000 2000 3000 4000
SFH1L SFH2L SFH300L
80
es80 HaL es HbL es100 HcL
pl Τ=1t.u. pl60 Τ=2t.u. pl Τ=300t.u.
m m m 80
a60 a a
ofs40 ofs40 ofs 60
er er er 40
b b b
um20 um20 um 20
N N N
0 0 0
0.0 0.5 1.0 0.0 0.5 1.0 1.5 1.5 2.0 2.5 3.0 3.5
log @SFH1LD log @SFH2LD log @SFH300LD
10 10 10
Figure 8.Thedistributionsoftheensembleof2000SF-estimates,comingfrom2000randomdatasets500t.u.longhavingaPSDofa
power-lawformofindex-2,forthefirst,secondandthree-hundredthtimebinrespectively.
[Toppanels]ThedistributionsofthenormalSF-estimates(thehistogrambinshavealengthof1,1and100respectively)isnotGaussian
foralltheτ.
[Bottompanels]ThedistributionofthelogarithmicSF-estimates(thehistogrambinshavealengthof0.025,0.025and0.05respectively)
approximates adequatelytheGaussiandistributiononlyforτ >50t.u.
time bin, which is one of the most usual methods (e.g. ingfromafittingprocedure,iscorrectonlywhenitspredic-
Collier & Peterson 2001; Zhang et al. 2002; Czerny et al. tive character (i.e. in our case for the PSD: 68 per cent of
2003). We then fit, using a least-squares fitting procedure, themeasurementsshouldbewithintherange 1.98 0.18)
− ±
a simple linear model of theform coincides with the actual fluctuational outcome of the pro-
cess (i.e. 1.98 0.17). These simulations show us clearly
log [SF(τ;C,β)]=C+βlog (τ) (8) − ±
10 10 that this is not the case for the SF errors, which are much
smaller than the true spread of the SF-estimates for the
whereβ representstheslopeoftheSF.Hence,foreachlight
same variability process.
curveweestimatethevalueoftheSF-slopeβ togetherwith
an error δβ. The top panels of Fig. 9 show the distribution Occasionally authors attempt to take account some of
of β and δβ. The errors derived from the fit, δβ, are very theSFproblems.InAgudoet al.(2006),forthecaseof the
smallincomparisontotheactualscatterofthemeasuredβ, intra-day variable blazar S50716+71 observed at 86 GHz,
depicting clearly the effect of non-independency. From the the authors use the interpolated version of the SF which
simulations we expect 68 per cent of the derived slopes to takes into account only the errors caused by the interpola-
be within the interval 1.02 0.13 but erroneously the δβ tion (Quirrenbach et al. 2000). By repeating thisprocedure
distributionhasameanof0.±006,dictatingarangefortheβ foroursimulationswegetameanerrorintheslopeof0.011
within1.020 0.006.Thereasonfortheverysmalluncertain- whichisverysmallcomparedtotheexpected0.13.Another
ties in the fi±tted slopes, δβ, is that the errors on the actual weaknessofthismethodisthatseveralpointsintheSFhave
SF-estimates are very small themselves due to their non- zero uncertainty (i.e. the intersection point of the forward
independentnature.EverySF-estimatefollowssmoothlythe and backward interpolations, see figure 4 in Agudo et al.
increasingtrend,definedfromtheadjacentpoints,andthus 2006andfigures24–40inQuirrenbach et al.2000)therefore
theerrorsderiveddirectlyfromtheSF-estimatesinthisway they donot really haveany statistical meaning.
are always too small. In Fuhrmann et al. (2008), again for the case of
On the contrary the binned logarithmic periodogram- S50716+71 but observed at radio frequencies, the authors
estimates together with their corresponding errors do pro- makeuse of theterm “variability timescale” which theyre-
vide us with statistically correct information about the be- late directly to the SF-break. To estimate its uncertainty
haviour of the slope. For each one of the previous simu- they use the saturation level p and the two fitted parame-
0
latedlightcurveswefitthebinnedlogarithmicperiodogram- ters of the SF function α, β, SF(τ) = aτβ, (see equations
estimates with asimplelinear modelandonceagain wede- (B.2) and (B.3) in the appendix of Fuhrmann et al. 2008).
rivetheslope αandtheerrorδα.Wecansee from thebot- We use the simulations of Section 3.1 in order to see how
tompanelsofFig.9,that68percentoftheα-estimatesare wellwecanspecifytheposition ofthesespuriousSF-breaks
expectedtofallwithin 1.98 0.18, inaccordancewiththe byapplyingthesamemethodology,despitethefactthatthe
− ±
distribution of δα (having a mean of 0.17) coming directly simulated light curves do not contain any real characteris-
from thefitstothebinnedlogarithmic periodograms of the tic time-scale. The distribution of the spurious break time-
simulatedlightcurves.Ameaningfulerroranalysisoriginat- scalesisshownintheleftpanelofFig.2(greyarea),having