Table Of ContentReceived 2009 November 14; accepted 2009 December 21
PreprinttypesetusingLATEXstyleemulateapjv.11/10/09
OVERVIEW OF THE KEPLER SCIENCE PROCESSING PIPELINE
Jon M. Jenkins1, Douglas A. Caldwell1, Hema Chandrasekaran1, Joseph D. Twicken1, Stephen T. Bryson2,
Elisa V. Quintana1, Bruce D. Clarke1, Jie Li1, Christopher Allen3, Peter Tenenbaum1, Hayley Wu1, Todd C.
Klaus3, Christopher K. Middour3, Miles T. Cote3, Sean McCauliff3, Forrest R. Girouard3, Jay P. Gunter3,
Bill Wohler3, Jeneen Sommers3, Jennifer R. Hall3, AKM K. Uddin3, Michael S. Wu5, Paresh A. Bhavsar2,
Jeffrey Van Cleve1, David L. Pletcher2, Jessie A. Dotson2, Michael R. Haas2, Ronald L. Gilliland4, David G.
Koch2, and William J. Borucki2
1SETIInstitute/NASAAmesResearchCenter,M/S244-30,MoffettField,CA94035,USA
2NASAAmesResearchCenter,M/S244-30,MoffettField,CA94035,USA
3OrbitalSciencesCorporation/NASAAmesResearchCenter,M/S244-30,MoffettField,CA94035,USA
4SpaceTelescopeScienceInstitute,Baltimore,MD21218,USAand
0
5BastionTechnologies/NASAAmesResearchCenter,M/S244-30,MoffettField,CA94035,USA
1
Received 2009 November 14; accepted 2009 December 21
0
2 ABSTRACT
n The Kepler Mission Science Operations Center (SOC) performs several critical functions including
a managing the ∼156,000 target stars, associated target tables, science data compression tables and
J
parameters, as well as processing the raw photometric data downlinked from the spacecraft each
1 month. The raw data are first calibrated at the pixel level to correct for bias, smear induced by a
shutterlessreadout,andotherdetectorandelectroniceffects. Abackgroundskyfluxisestimatedfrom
] ∼4500 pixels on each of the 84 CCD readout channels, and simple aperture photometry is performed
P
onanoptimalapertureforeachstar. Ancillaryengineeringdataanddiagnosticinformationextracted
E
from the science data are used to remove systematic errors in the flux time series that are correlated
.
h with these data prior to searching for signatures of transiting planets with a wavelet-based, adaptive
p matched filter. Stars with signatures exceeding 7.1σ are subjected to a suite of statistical tests
- includinganexaminationofeachstar’scentroidmotiontorejectfalsepositivescausedbybackground
o
eclipsingbinaries. Physicalparametersforeachplanetarycandidatearefittedtothetransitsignature,
r
t and signatures of additional transiting planets are sought in the residual light curve. The pipeline is
s operational, finding planetary signatures and providing robust eliminations of false positives.
a
[ Subject headings: techniques: photometric — methods: data analysis
1
v 1. INTRODUCTION 3. Report on the Kepler photometer’s health and
8 status semiweekly after each X-band contact and
The Kepler Mission seeks to detect Earth-like planets
5 monthlyaftereachKa-bandsciencedatadownlink.
transiting solar-like stars by performing photometric ob-
2
servations of ∼156,000 carefully selected target stars in
0 4. Monitor the pointing error and compute pointing
Kepler’s 115 deg2 field of view (FOV), as reviewed in
. tweaks when necessary to adjust the spacecraft
1 Borucki et al. (2010) and Koch et al. (2010). These
pointing to ensure the validity of the uplinked sci-
0 Long Cadence (LC) targets are sampled every 29.4 min-
ence target tables,
0 utes and include all the planetary targets for which we
1 seek signatures of transiting planets. In addition, a to- 5. Process the science data each month to obtain
:
v tal of 512 Short Cadence (SC) targets are sampled at calibrated pixels for all LC and SC targets, raw
i 58.85 s intervals permitting further characterization of fluxtimeseries,andsystematicerror-correctedflux
X
the planet-star systems for the brighter (Kp <12) stars time series,.
r via asteroseismology, and more precise transit timing.
a
The Kepler Mission Science Operations Center (SOC) 6. Archive calibrated pixels, raw and corrected flux
at NASA Ames Research Center performs nine major timeseriesandcentroidmeasurementstotheData
functions: Management Center (DMC).
1. Manage target aperture and definition tables spec- 7. Search each flux time series for signatures of tran-
ifying which 5.4×106 of the 95×106 pixels in the siting planets,.
CCD array are processed and stored on the Solid
State Recorder for later downlink1. 8. Fit physical parameters and calculate error esti-
mates for planetary signatures, and .
2. Manage the science data compression tables and
parameters, including the length-limited Huffman 9. Performstatisticalteststorejectfalsepositivesand
coding table, and the requantization table. establishaccuratestatisticalconfidenceineachde-
tection. .
Jon.Jenkins@nasa.gov
1Kepler’s pointingstabilityrequirementis0”.009,3σ,allowing
Kepler’s observations are organized into three month
us to preselect the pixels of interest for each star (Haas et al.
2010). intervals called quarters defined by the roll maneuvers
2 Jenkins et al.
the spacecraft executes about its boresight to keep the The Pipeline module CAL corrects the raw Kepler
solar arrays pointed toward the Sun (Haas et al. 2010). photometricdataatthepixellevelpriortotheextraction
Once each month, the accumulated science data are of photometry and astrometry. Several of the processing
transmitted via the Deep Space Network2 (DSN) to the stepsgiveninFigure2arefamiliartoground-basedpho-
MissionOperationsCenter3,whichforwardsthemtothe tometrists. However, a few are peculiar to Kepler due
DMC4. The DMC packages them into FITS files and to the lack of a shutter and unique features in its analog
pushes them to the SOC. A selected set of ancillary en- electronics chains. Details of these instrument charac-
gineering data are also delivered with the science data, teristics and how they were determined and updated in
containinganyparameterslikelytohaveabearingonthe flight are discussed in Caldwell et al. (2010) and are
qualityofthesciencedata,suchastemperaturemeasure- comprehensively documented in the Kepler Instrument
ments of the focal plane and readout electronics. Handbook (?).
The Science Pipeline is divided into several compo- ThesequenceofprocessingstepsinCALthatproduce
nents in order to allow for efficient management and calibrated pixels and associated uncertainties is as fol-
parallel processing of the data, as shown in Figure 1. lows. (1) The two-dimensional black level (CCD bias
Raw pixel data downlinked from the Kepler photometer voltage) structure (fixed pattern noise) is removed, fol-
are calibrated by the Calibration module (CAL) to pro- lowedbyfittingandremovingadynamicestimateofthe
duce calibrated target and background pixels and their blacklevel. (2)Gainandnonlinearitycorrectionsareap-
associated uncertainties. The calibrated pixels are pro- plied. (3) The analog electronics chain exhibits memory,
cessed by Photometric Analysis (PA) to fit and remove necessitating the application of a digital filter to remove
skybackgroundandextractsimpleaperturephotometry this effect, called Local Detector Electronics (LDE) un-
from the background-corrected, calibrated target pixels. dershoot. (4) Cosmic ray events in the black and smear
PA also measures the centroid locations of each star on measurements are removed prior to subsequent correc-
each frame. The final step to produce light curves hap- tions. (5) The smear signal caused by operating in the
pens in Pre-search Data Conditioning (PDC) where sig- absence of a shutter and the dark current for each CCD
natures in the light curves correlated with systematic readout channel are estimated from the masked and vir-
error sources such as pointing drift, focus changes, and tual smear collateral data measurements. (6) A flat field
thermal transients are removed. Output data products correction is applied.
include raw and calibrated pixels, raw and systematic
3. PHOTOMETRICANALYSIS
error-corrected flux time series, centroids and associated
uncertainties for each target star, which are archived to Before photometry and astrometry can be extracted
the DMC and eventually made available to the public fromthecalibratedpixeltimeseries,thePipelinedetects
through the Multimission Archive at STScI.5 so-called ”Argabrightening” events in the background
In Transiting Planet Search (TPS) a wavelet-based, pixel data. These mysterious transient increases in the
adaptive matched filter is applied to identify transit-like background flux were identified early in Commissioning.
featureswithdurationsintherangeof1–16hours. Light The current hypothesis is that these transient events are
curveswithtransit-likefeatureswhosecombined(folded) duetosmalldustparticlesfromKepler achievingescape
transit detection statistic exceeds 7.1σ for some trial pe- velocity after micrometeorite hits and reflecting sunlight
riod and epoch are designated as Threshold Crossing into the barrel of the telescope as they drift across the
Events(TCEs)andsubjectedtofurtherscrutinybyData FOV.Argabrighteningsthataffect10ormoreCCDread-
Validation (DV). This threshold ensures that no more out channels occur ∼15 times per month, but the rate is
than one false positive will occur due to random fluc- dropping over time. The most egregious of these events
tuations over the course of the mission, assuming non- cannotbeperfectlycorrectedbythecurrentbackground
white, non-stationary Gaussian observation noise (Jenk- correction. Wegapthedatawhentheexcessbackground
ins, Caldwell & Borucki 2002; Jenkins 2002). DV per- flux exceeds the 100 Median Absolute Deviation (MAD)
formsasuiteofstatisticalteststoevaluatetheconfidence level.
in the detection, to reject false positives by background PA then robustly fits a two-dimensional surface to
eclipsing binaries, and to extract physical parameters of ∼4500 background pixels on each channel to estimate
eachsystem(alongwithassociateduncertaintiesandco- the sky background, which is evaluated at each target
variance matrices) for each planet candidate. After the star pixel location and subtracted from the calibrated
planetary signature has been fitted, it is removed from pixel values. Each target pixel time series is scanned for
the light curve and the residual is subjected to a search cosmic rays by first detrending the time series with a
for additional transiting planets. This process repeats moving median filter with a width of five cadences (time
until no further TCEs are identified. The DV results steps)andexaminingtheresidualsforoutlierscompared
and diagnostics are furnished to the Science Team to fa- totheMADoftheresidualsforeachpixel. Careistaken
cilitate disposition by the Follow-up Observing Program not to remove clusters of outliers that might be due to
(FOP; Gautier et al. 2010). astrophysicalsignaturessuchasflaresortransitsthatare
intrinsic to the target star.
2. PIXELLEVELCALIBRATIONS The photocenters of the 200 brightest, unsaturated
stars on each channel are fitted using the pixel response
2 The DSN is operated by the Jet Propulsion Laboratory for functions (PRF; Bryson et al. 2010) and then used to
NASA. define the ensemble motion of the stars over the ob-
3 TheMOCislocatedatLASPinBoulder,CO,USA.
servations. The aggregate star motion is used along
4 The DMC is located at the Space Telescope Science Institute
(STScI)inBaltimore,MD,USA. with the PRFs reconstructed from Commissioning data
5 http://stdatu.stsci.edu/kepler/ to define the optimal aperture as the collection of pix-
Kepler Science Pipeline 3
els that maximizes the mean signal-to-noise ratio of the and is sometimes followed by an exponential recovery
flux measurement for each star (Bryson et al. 2010). over a few hours, but usually not to the same flux level
The background-corrected, cosmic ray-corrected pixels as before. The typical drop in sensitivity is 1%, which
are then summed over the optimal aperture to define a is unmistakable in the flux time series. Such step dis-
flux estimate for each cadence frame. continuities are identified separately from those due to
operational activities, such as safe modes and pointing
4. SYSTEMATICERRORCORRECTIONS tweaks, and are mended by raising the light curve after
PDC’s task is to remove systematic errors from the thediscontinuityfortheremainderofthequarter. These
rawfluxtimeseries. Theseincludepointingerrors,focus eventsdonotmimictransitssincetheydonotrecoverto
changes, and thermal effects on instrument properties. thesamepre-eventfluxlevel,andfewtransits,ifany,are
PDCco-trendseachfluxtimeseriesagainstancillaryen- affectedbythiscorrection. Thesecondissueisthatmany
gineering data such as temperatures of the focal plane starsexhibitcoherentorslowlyevolvingoscillationsthat
and electronics, reconstructed pointing and focus varia- interfere with systematic error removal. The approach
tions to remove signatures correlated with these proxy taken is to identify and remove strong coherent compo-
systematic error measurements. A Singular Value De- nents in the frequency domain prior to co-trending, and
composition (SVD) is applied to the design matrix con- then to restore these components to the residuals after
tainingtheancillarydatatoidentifythemostsignificant, co-trending.
independent components and to stabilize the matrix in- Figure 3 shows theresults of runningtwo fluxtime se-
versioninherentinthefittothedata. Additionally,PDC riesobtainedduringQuarter2throughPDConschedule
identifiesresidualisolatedoutliers,andfillsintra-quarter for release early in 2010, demonstrating PDC’s effective-
gapssothatthedataforeachquarterlysegmentarecon- ness. We expect that learning to deal with the various
tiguouswhenpresentedtoTPS.Finally,PDCadjuststhe systematicerrorswillconsumeagreatdealofeffortover
light curves to account for the excess flux in the optimal thelifetimeofthemissionaswepushthedetectionlimit
apertures due to starfield crowding in order to make ap- to smaller and smaller planets.
parent transit depths uniform from quarter to quarter
as the stars move from detector to detector with each 5. TRANSITINGPLANETSEARCH
roll maneuver. This is achieved by estimating the mean TPS searches for transiting planets by ”stitching” the
excess flux in each photometric aperture from sources quarterly segments of data together to remove gaps and
other than the target star itself from knowledge of the edge effects and then applies the wavelet-based, adap-
PRFandbackgroundstarpositionsandmagnitudesand tive matched filter of Jenkins (2002). This approach is
subtracting this value from each point in the time series atime-adaptiveapproachthatestimatesthepowerspec-
(see Bryson et al. 2010). trum of the observational noise as a function of time.
Significant effort has been applied to PDC in order to This approach was developed specifically for solar-like
achieve good results with flight data. There are a num- starswithcolored,broadbandpowerspectra. Somemod-
ber of phenomena that were significantly different than ifications to the original approach have been developed
expected, including focus variations, and the amount of toaccommodatetargetstarsthatexhibitcoherentstruc-
pointing drift observed during the first two quarters of ture in the frequency domain. Similar to the approach
operation. The systematic errors observed in flight ex- adopted in PDC, we fit and remove strong harmonics
hibitarangeofdifferenttimescales,fromafewhoursto that are inconsistent with transit signatures prior to ap-
several days to many days and weeks. Such phenomena plying the wavelet-based filter. This significantly in-
include the intermittent modulation of the focus by ∼1 creases thesensitivity ofthetransit search for such stars
µmevery3.2hrbyaheaterononeofthereactionwheel andalsoprovidesphotometricprecisionestimates(asby-
assemblies. One of the Fine Guidance Sensors’ guide products of the search) that are more realistic for such
stars through the first quarter (Q1) of observations was targets. Ifthetransit-likesignatureofagiventargetstar
aneclipsingbinarywhose30%eclipsesinduceda1mpix exceeds7.1σ thenaTCEisrecordedandsenttoDVfor
pointing excursion lasting ∼8 hr every 1.7 days (Haas et additional scrutiny.
al. 2010). By far the strongest systematic effects in the
data so far have occurred after each of two safe mode 6. DATAVALIDATION
events (Haas et al. 2010) during which the photometer DV performs a suite of tests to establish or break con-
was shut off, the telescope cooled and the focus changed fidence in each TCE flagged by TPS, as well as to fit
by ∼2.2 µm per ◦C. One of these occurred at the end of physical parameters to each transit-like signature. DV
Q1 and the second ∼2 weeks into Q2. Thermal effects is currently under development and we anticipate its re-
can be observed in the science data for ∼5 days after lease in early 2010 to support the next FOP observing
eachsafemoderecovery. Thefactthatmostsystematics season.
such as these affect all the science data simultaneously, The statistical confidence in the TCE is examined by
and that there is a rich amount of ancillary engineering performing a bootstrap test (Jenkins 2002; Jenkins,
dataandsciencediagnosticsavailableprovidessignificant Caldwell & Borucki 2002) to take into account non-
leverage in dealing with these effects. Gaussianstatisticsoftheindividuallightcurves. Atran-
Some systematic phenomena are specific to individual siting planet model is fitted to the transit signature as a
starsandcannotbecorrectedbyco-trendingagainstan- joint noise characterization/parameter estimation prob-
cillarydata. Thefirstissueistheoccasional,abruptdrop lem. That is, the observation noise is not assumed to
inpixelsensitivitythatintroducesastepdiscontinuityin be white and its characteristics are estimated using the
an affected star’s light curve (and associated centroids). wavelet-based approach employed in TPS, but as an es-
Thisisoftenprecededimmediatelybyacosmicrayevent, timator, rather than as a detector. This process yields a
4 Jenkins et al.
set of physical parameters and an associated covariance to pixel level calibrations.
matrix. Beforelaunch,weaddedpixelstothetargettablethat
To eliminate false positives due to eclipsing binaries, allow us to sample and trend the artifacts simultane-
the planet model fit is performed again only to the even ously with the science data. The Kepler Science Office
transits,andthenonlytotheoddtransits,andtheresult- andSOCaredevelopingandprototypingalgorithmsthat
ing odd/even depths and epochs are compared in order use these image artifact pixels, together with other sci-
to see if the results indicate the presence of secondary ence data, to reconstruct the underlying temperature-
eclipses. After the multi-transiting planet search is com- dependent two-dimensional bias structure as a function
plete, the periods are compared to detect eclipsing bina- of time over each quarter. The resulting dynamic model
ries with significant eccentricity causing TPS to detect will allow the temperature-dependent bias signals to be
two transit pulse trains at essentially the same period, removed directly from the data. Moreover, the ther-
but at a phase other than 0.5. mal environment of the Kepler photometer is very sta-
To guard against background eclipsing binaries, a cen- ble and changes slowly during nominal operations. Thus
troid motion test is performed to determine whether the any residualthermaltwo-dimensionalbias effectswill be
centroids shifted during the transit sequence. If so, the small after the corrections are in place and can be co-
source right ascension and declination can be estimated trended out of the data by PDC like other thermally
by the measured in- versus out-of-transit centroid shift driven, instrumental effects.
normalized by the fractional change in brightness of the Given that the Moir´e pattern noise exhibits both high
system (i.e., the transit ”depth”; Batalha et al. 2010b; spatial frequencies and high temporal frequencies, the
Monet et al. 2010). prospect of reconstructing a high fidelity model of the
Additional tests include checking whether the transit effects at the pixel level with an accuracy sufficient to
signature is consistent in the target pixels, whether the correct the affected data appears unlikely. We are de-
transit signature is correlated with any ancillary engi- veloping algorithms that identify when these Moir´e pat-
neeringdataoranycollateraldata, andwhetherthedis- terns are present and mark the affected CCD regions as
tribution of cosmic ray events during transit is signifi- suspect on each affected LC. These suspect data flags
cantly different than that out-of-transit. will then be used to inform downstream modules so that
the affected data can be appropriately weighted, and so
7. FUTUREDEVELOPMENT that, for example, TPS can selectively ignore time inter-
Future development for the SOC includes implement- vals that are potentially contaminated with electronics-
ing difference image analysis photometry, completing inducedtransientswhensearchingfortransitsignatures.
DV, and developing and implementing mitigations for DV will produce a contamination report for each TCE
the instrument artifacts described in Caldwell et al. indicating the fraction and severity of the Moir´e pattern
(2010). These artifacts affect a small portion of the Ke- effect. The Pipeline will track and trend diagnostic met-
pler FOV at any one time. The development schedule rics reflecting the prevalence and severity of the Moir´e
calls for delivery of these features to allow us to discover pattern as a diagnostic of this aspect of the photometer
longperiodEarthsbylate2010andrecoveratleast90% performance.
oftheFOVin2011intimetocharacterizethefrequency In spite of the presence of these image artifacts, Ke-
of Earth-size transiting planets in the habitable zones of pler is already achieving photometric precision sufficient
solar-like stars. todetectEarth-sizeplanetstransitingsolar-likestarsfor
The instrumental artifacts consist of two cate- the majority of the FOV at any given time (Jenkins et
gories of phenomena: (1) temperature-dependent al. 2010). Weareconfidentthattheseeffortswillenable
two-dimensional bias image structure, and other ustominimizetheimpactoftheseartifactsonexoplanet
temperature-dependentelectronicseffects,and(2)Moir´e detection, andproducehigh-qualityphotometricandas-
patterns caused by an unstable circuit with an oper- trometrictimeseriesforotherscientificinvestigationsby
ational amplifier oscillating at ∼1.5 GHz. Normally the greater astronomical community.
this latter feature appears as a high-frequency oscilla-
tion on each readout row whose frequency changes with 8. CONCLUSIONS
row number as the readout electronics heat up during WehavepresentedanoverviewoftheKepler SOCsci-
readout. When this signal aliases to the sample rate of ence pipeline processing. The output products include
the CCD readout, a transient band appears in an af- raw and calibrated pixel time series, raw and systematic
fected channel and slowly rolls across the frame as the error-corrected flux time series, centroid time series for
temperature changes. The Moir´e pattern can interact each star, and associated uncertainties. These products
with bright, saturated star signals and generate scene- willpermitthedetectionandcharacterizationoftransit-
dependent effects. The typical amplitude of these image ing planets in the Kepler FOV as well as enabling astro-
artifacts is <1 ADU per pixel per read, comparable to physical investigations and serendipitous discoveries not
or smaller than the typical readout noise. It is impor- contemplated in Kepler’s driving design requirements.
tant to note that not all of these op amps are oscillating
and that the perturbations to the images are very small.
Funding for this Discovery Mission is provided by
Our mitigation plan consists of two approaches, one for
NASA’s Science Mission Directorate. We thank thou-
thetemperature-dependenteffects,andonefortheMoir´e
sands of people whose efforts made Kepler’s grand voy-
pattern effects, and most of the effort takes place prior
REFERENCES
Batalha,N.B.,etal.2010,ApJ,thisissue Borucki,W.J.,etal.2010,Science,inpress
Kepler Science Pipeline 5
Bryson,S.T.,etal.2010,ApJ,thisissue Koch,D.G.,etal.2010,ApJ,thisissue
Caldwell,D.A.,etal.2010,ApJ,thisissue Monet,D,etal.2010,ApJ,thisissue
Gautier,N.T.,etal.2010,ApJ,thisissue VanCleve,J.,D.A.Caldwell,2009,KeplerInstrument
Jenkins,J.M.,D.A.Caldwell,&W.J.Borucki2002,ApJ,564, Handbook,KSCI19033-001,(MoffettField,CA:NASAAmes
495 ResearchCenter)
Jenkins,J.M.2002,ApJ,575,493
Jenkins,J.M.,etal.2010,ApJ,thisissue
Haas,M.,etal.2010,ApJ,thisissue
6 Jenkins et al.
CAL PA
Raw Calibrated
Pixel Level Photometric
Data Pixels
Calibrations Analysis
TPS PDC
Corrected Raw Light
Transiting Presearch
Light Curves and
Planet Data
Curves Centroids
Search Conditioning
Threshold
DV
Crossing Diagnostics
Data Validation
Events
Fig. 1.—DataflowdiagramfortheSOCSciencePipeline.
Kepler Science Pipeline 7
Raw 2-D Black Dynamic 1-D
Correction Black
Pixels
(static) Correction
Identify LDE Gain &
Cosmic Rays Undershoot Nonlinearity
in Smear Correction Correction
Calibrated
Smear + Dark
Flat Field Pixels +
Current
Correction Uncertainties
Correction
Fig. 2.—DataflowdiagramfortheCalibrationPipelinemodule.
8 Jenkins et al.
1.14
A
Pointing
1.12 Tweak
1.1
Pixel Sensitivity Drop
1.08 Pointing
Tweak
1.06
Safe Mode
1.04
y 1.02
sit
n 1
e
nt
e I B
v
ati 1.06
el
R 1.05
Pixel Sensitivity Drop
1.04
1.03
1.02
1.01
1
0.99
5000 5010 5020 5030 5040 5050 5060 5070
Time, JD − 2450000
Fig. 3.— Raw and systematic error-corrected light curves for two different stars. The raw light curves exhibit step discontinuities due
to pointing offsets, thermal transients due to safe modes and pixel sensitivity changes, as well as pointing errors and focus changes, as
indicated in the figure. Panel A shows the raw flux time series (top curve, offset) and the PDC flux time series (bottom curve) of a
Kp = 15.6 dwarf star. Panel B shows the raw and PDC flux time series for a Kp = 14.9 dwarf star that displays less sensitivity to the
thermal transients. Both stars’ corrected flux time series are significantly improved by the detrending afforded by PDC while retaining
intrinsicstellarvariabilityontimescalesofseveralweeks.