Received 2009 November 14; accepted 2009 December 21 PreprinttypesetusingLATEXstyleemulateapjv.11/10/09 OVERVIEW OF THE KEPLER SCIENCE PROCESSING PIPELINE Jon M. Jenkins1, Douglas A. Caldwell1, Hema Chandrasekaran1, Joseph D. Twicken1, Stephen T. Bryson2, Elisa V. Quintana1, Bruce D. Clarke1, Jie Li1, Christopher Allen3, Peter Tenenbaum1, Hayley Wu1, Todd C. Klaus3, Christopher K. Middour3, Miles T. Cote3, Sean McCauliff3, Forrest R. Girouard3, Jay P. Gunter3, Bill Wohler3, Jeneen Sommers3, Jennifer R. Hall3, AKM K. Uddin3, Michael S. Wu5, Paresh A. Bhavsar2, Jeffrey Van Cleve1, David L. Pletcher2, Jessie A. Dotson2, Michael R. Haas2, Ronald L. Gilliland4, David G. Koch2, and William J. Borucki2 1SETIInstitute/NASAAmesResearchCenter,M/S244-30,MoffettField,CA94035,USA 2NASAAmesResearchCenter,M/S244-30,MoffettField,CA94035,USA 3OrbitalSciencesCorporation/NASAAmesResearchCenter,M/S244-30,MoffettField,CA94035,USA 4SpaceTelescopeScienceInstitute,Baltimore,MD21218,USAand 0 5BastionTechnologies/NASAAmesResearchCenter,M/S244-30,MoffettField,CA94035,USA 1 Received 2009 November 14; accepted 2009 December 21 0 2 ABSTRACT n The Kepler Mission Science Operations Center (SOC) performs several critical functions including a managing the ∼156,000 target stars, associated target tables, science data compression tables and J parameters, as well as processing the raw photometric data downlinked from the spacecraft each 1 month. The raw data are first calibrated at the pixel level to correct for bias, smear induced by a shutterlessreadout,andotherdetectorandelectroniceffects. Abackgroundskyfluxisestimatedfrom ] ∼4500 pixels on each of the 84 CCD readout channels, and simple aperture photometry is performed P onanoptimalapertureforeachstar. Ancillaryengineeringdataanddiagnosticinformationextracted E from the science data are used to remove systematic errors in the flux time series that are correlated . h with these data prior to searching for signatures of transiting planets with a wavelet-based, adaptive p matched filter. Stars with signatures exceeding 7.1σ are subjected to a suite of statistical tests - includinganexaminationofeachstar’scentroidmotiontorejectfalsepositivescausedbybackground o eclipsingbinaries. Physicalparametersforeachplanetarycandidatearefittedtothetransitsignature, r t and signatures of additional transiting planets are sought in the residual light curve. The pipeline is s operational, finding planetary signatures and providing robust eliminations of false positives. a [ Subject headings: techniques: photometric — methods: data analysis 1 v 1. INTRODUCTION 3. Report on the Kepler photometer’s health and 8 status semiweekly after each X-band contact and The Kepler Mission seeks to detect Earth-like planets 5 monthlyaftereachKa-bandsciencedatadownlink. transiting solar-like stars by performing photometric ob- 2 servations of ∼156,000 carefully selected target stars in 0 4. Monitor the pointing error and compute pointing Kepler’s 115 deg2 field of view (FOV), as reviewed in . tweaks when necessary to adjust the spacecraft 1 Borucki et al. (2010) and Koch et al. (2010). These pointing to ensure the validity of the uplinked sci- 0 Long Cadence (LC) targets are sampled every 29.4 min- ence target tables, 0 utes and include all the planetary targets for which we 1 seek signatures of transiting planets. In addition, a to- 5. Process the science data each month to obtain : v tal of 512 Short Cadence (SC) targets are sampled at calibrated pixels for all LC and SC targets, raw i 58.85 s intervals permitting further characterization of fluxtimeseries,andsystematicerror-correctedflux X the planet-star systems for the brighter (Kp <12) stars time series,. r via asteroseismology, and more precise transit timing. a The Kepler Mission Science Operations Center (SOC) 6. Archive calibrated pixels, raw and corrected flux at NASA Ames Research Center performs nine major timeseriesandcentroidmeasurementstotheData functions: Management Center (DMC). 1. Manage target aperture and definition tables spec- 7. Search each flux time series for signatures of tran- ifying which 5.4×106 of the 95×106 pixels in the siting planets,. CCD array are processed and stored on the Solid State Recorder for later downlink1. 8. Fit physical parameters and calculate error esti- mates for planetary signatures, and . 2. Manage the science data compression tables and parameters, including the length-limited Huffman 9. Performstatisticalteststorejectfalsepositivesand coding table, and the requantization table. establishaccuratestatisticalconfidenceineachde- tection. . [email protected] 1Kepler’s pointingstabilityrequirementis0”.009,3σ,allowing Kepler’s observations are organized into three month us to preselect the pixels of interest for each star (Haas et al. 2010). intervals called quarters defined by the roll maneuvers 2 Jenkins et al. the spacecraft executes about its boresight to keep the The Pipeline module CAL corrects the raw Kepler solar arrays pointed toward the Sun (Haas et al. 2010). photometricdataatthepixellevelpriortotheextraction Once each month, the accumulated science data are of photometry and astrometry. Several of the processing transmitted via the Deep Space Network2 (DSN) to the stepsgiveninFigure2arefamiliartoground-basedpho- MissionOperationsCenter3,whichforwardsthemtothe tometrists. However, a few are peculiar to Kepler due DMC4. The DMC packages them into FITS files and to the lack of a shutter and unique features in its analog pushes them to the SOC. A selected set of ancillary en- electronics chains. Details of these instrument charac- gineering data are also delivered with the science data, teristics and how they were determined and updated in containinganyparameterslikelytohaveabearingonthe flight are discussed in Caldwell et al. (2010) and are qualityofthesciencedata,suchastemperaturemeasure- comprehensively documented in the Kepler Instrument ments of the focal plane and readout electronics. Handbook (?). The Science Pipeline is divided into several compo- ThesequenceofprocessingstepsinCALthatproduce nents in order to allow for efficient management and calibrated pixels and associated uncertainties is as fol- parallel processing of the data, as shown in Figure 1. lows. (1) The two-dimensional black level (CCD bias Raw pixel data downlinked from the Kepler photometer voltage) structure (fixed pattern noise) is removed, fol- are calibrated by the Calibration module (CAL) to pro- lowedbyfittingandremovingadynamicestimateofthe duce calibrated target and background pixels and their blacklevel. (2)Gainandnonlinearitycorrectionsareap- associated uncertainties. The calibrated pixels are pro- plied. (3) The analog electronics chain exhibits memory, cessed by Photometric Analysis (PA) to fit and remove necessitating the application of a digital filter to remove skybackgroundandextractsimpleaperturephotometry this effect, called Local Detector Electronics (LDE) un- from the background-corrected, calibrated target pixels. dershoot. (4) Cosmic ray events in the black and smear PA also measures the centroid locations of each star on measurements are removed prior to subsequent correc- each frame. The final step to produce light curves hap- tions. (5) The smear signal caused by operating in the pens in Pre-search Data Conditioning (PDC) where sig- absence of a shutter and the dark current for each CCD natures in the light curves correlated with systematic readout channel are estimated from the masked and vir- error sources such as pointing drift, focus changes, and tual smear collateral data measurements. (6) A flat field thermal transients are removed. Output data products correction is applied. include raw and calibrated pixels, raw and systematic 3. PHOTOMETRICANALYSIS error-corrected flux time series, centroids and associated uncertainties for each target star, which are archived to Before photometry and astrometry can be extracted the DMC and eventually made available to the public fromthecalibratedpixeltimeseries,thePipelinedetects through the Multimission Archive at STScI.5 so-called ”Argabrightening” events in the background In Transiting Planet Search (TPS) a wavelet-based, pixel data. These mysterious transient increases in the adaptive matched filter is applied to identify transit-like background flux were identified early in Commissioning. featureswithdurationsintherangeof1–16hours. Light The current hypothesis is that these transient events are curveswithtransit-likefeatureswhosecombined(folded) duetosmalldustparticlesfromKepler achievingescape transit detection statistic exceeds 7.1σ for some trial pe- velocity after micrometeorite hits and reflecting sunlight riod and epoch are designated as Threshold Crossing into the barrel of the telescope as they drift across the Events(TCEs)andsubjectedtofurtherscrutinybyData FOV.Argabrighteningsthataffect10ormoreCCDread- Validation (DV). This threshold ensures that no more out channels occur ∼15 times per month, but the rate is than one false positive will occur due to random fluc- dropping over time. The most egregious of these events tuations over the course of the mission, assuming non- cannotbeperfectlycorrectedbythecurrentbackground white, non-stationary Gaussian observation noise (Jenk- correction. Wegapthedatawhentheexcessbackground ins, Caldwell & Borucki 2002; Jenkins 2002). DV per- flux exceeds the 100 Median Absolute Deviation (MAD) formsasuiteofstatisticalteststoevaluatetheconfidence level. in the detection, to reject false positives by background PA then robustly fits a two-dimensional surface to eclipsing binaries, and to extract physical parameters of ∼4500 background pixels on each channel to estimate eachsystem(alongwithassociateduncertaintiesandco- the sky background, which is evaluated at each target variance matrices) for each planet candidate. After the star pixel location and subtracted from the calibrated planetary signature has been fitted, it is removed from pixel values. Each target pixel time series is scanned for the light curve and the residual is subjected to a search cosmic rays by first detrending the time series with a for additional transiting planets. This process repeats moving median filter with a width of five cadences (time until no further TCEs are identified. The DV results steps)andexaminingtheresidualsforoutlierscompared and diagnostics are furnished to the Science Team to fa- totheMADoftheresidualsforeachpixel. Careistaken cilitate disposition by the Follow-up Observing Program not to remove clusters of outliers that might be due to (FOP; Gautier et al. 2010). astrophysicalsignaturessuchasflaresortransitsthatare intrinsic to the target star. 2. PIXELLEVELCALIBRATIONS The photocenters of the 200 brightest, unsaturated stars on each channel are fitted using the pixel response 2 The DSN is operated by the Jet Propulsion Laboratory for functions (PRF; Bryson et al. 2010) and then used to NASA. define the ensemble motion of the stars over the ob- 3 TheMOCislocatedatLASPinBoulder,CO,USA. servations. The aggregate star motion is used along 4 The DMC is located at the Space Telescope Science Institute (STScI)inBaltimore,MD,USA. with the PRFs reconstructed from Commissioning data 5 http://stdatu.stsci.edu/kepler/ to define the optimal aperture as the collection of pix- Kepler Science Pipeline 3 els that maximizes the mean signal-to-noise ratio of the and is sometimes followed by an exponential recovery flux measurement for each star (Bryson et al. 2010). over a few hours, but usually not to the same flux level The background-corrected, cosmic ray-corrected pixels as before. The typical drop in sensitivity is 1%, which are then summed over the optimal aperture to define a is unmistakable in the flux time series. Such step dis- flux estimate for each cadence frame. continuities are identified separately from those due to operational activities, such as safe modes and pointing 4. SYSTEMATICERRORCORRECTIONS tweaks, and are mended by raising the light curve after PDC’s task is to remove systematic errors from the thediscontinuityfortheremainderofthequarter. These rawfluxtimeseries. Theseincludepointingerrors,focus eventsdonotmimictransitssincetheydonotrecoverto changes, and thermal effects on instrument properties. thesamepre-eventfluxlevel,andfewtransits,ifany,are PDCco-trendseachfluxtimeseriesagainstancillaryen- affectedbythiscorrection. Thesecondissueisthatmany gineering data such as temperatures of the focal plane starsexhibitcoherentorslowlyevolvingoscillationsthat and electronics, reconstructed pointing and focus varia- interfere with systematic error removal. The approach tions to remove signatures correlated with these proxy taken is to identify and remove strong coherent compo- systematic error measurements. A Singular Value De- nents in the frequency domain prior to co-trending, and composition (SVD) is applied to the design matrix con- then to restore these components to the residuals after tainingtheancillarydatatoidentifythemostsignificant, co-trending. independent components and to stabilize the matrix in- Figure 3 shows theresults of runningtwo fluxtime se- versioninherentinthefittothedata. Additionally,PDC riesobtainedduringQuarter2throughPDConschedule identifiesresidualisolatedoutliers,andfillsintra-quarter for release early in 2010, demonstrating PDC’s effective- gapssothatthedataforeachquarterlysegmentarecon- ness. We expect that learning to deal with the various tiguouswhenpresentedtoTPS.Finally,PDCadjuststhe systematicerrorswillconsumeagreatdealofeffortover light curves to account for the excess flux in the optimal thelifetimeofthemissionaswepushthedetectionlimit apertures due to starfield crowding in order to make ap- to smaller and smaller planets. parent transit depths uniform from quarter to quarter as the stars move from detector to detector with each 5. TRANSITINGPLANETSEARCH roll maneuver. This is achieved by estimating the mean TPS searches for transiting planets by ”stitching” the excess flux in each photometric aperture from sources quarterly segments of data together to remove gaps and other than the target star itself from knowledge of the edge effects and then applies the wavelet-based, adap- PRFandbackgroundstarpositionsandmagnitudesand tive matched filter of Jenkins (2002). This approach is subtracting this value from each point in the time series atime-adaptiveapproachthatestimatesthepowerspec- (see Bryson et al. 2010). trum of the observational noise as a function of time. Significant effort has been applied to PDC in order to This approach was developed specifically for solar-like achieve good results with flight data. There are a num- starswithcolored,broadbandpowerspectra. Somemod- ber of phenomena that were significantly different than ifications to the original approach have been developed expected, including focus variations, and the amount of toaccommodatetargetstarsthatexhibitcoherentstruc- pointing drift observed during the first two quarters of ture in the frequency domain. Similar to the approach operation. The systematic errors observed in flight ex- adopted in PDC, we fit and remove strong harmonics hibitarangeofdifferenttimescales,fromafewhoursto that are inconsistent with transit signatures prior to ap- several days to many days and weeks. Such phenomena plying the wavelet-based filter. This significantly in- include the intermittent modulation of the focus by ∼1 creases thesensitivity ofthetransit search for such stars µmevery3.2hrbyaheaterononeofthereactionwheel andalsoprovidesphotometricprecisionestimates(asby- assemblies. One of the Fine Guidance Sensors’ guide products of the search) that are more realistic for such stars through the first quarter (Q1) of observations was targets. Ifthetransit-likesignatureofagiventargetstar aneclipsingbinarywhose30%eclipsesinduceda1mpix exceeds7.1σ thenaTCEisrecordedandsenttoDVfor pointing excursion lasting ∼8 hr every 1.7 days (Haas et additional scrutiny. al. 2010). By far the strongest systematic effects in the data so far have occurred after each of two safe mode 6. DATAVALIDATION events (Haas et al. 2010) during which the photometer DV performs a suite of tests to establish or break con- was shut off, the telescope cooled and the focus changed fidence in each TCE flagged by TPS, as well as to fit by ∼2.2 µm per ◦C. One of these occurred at the end of physical parameters to each transit-like signature. DV Q1 and the second ∼2 weeks into Q2. Thermal effects is currently under development and we anticipate its re- can be observed in the science data for ∼5 days after lease in early 2010 to support the next FOP observing eachsafemoderecovery. Thefactthatmostsystematics season. such as these affect all the science data simultaneously, The statistical confidence in the TCE is examined by and that there is a rich amount of ancillary engineering performing a bootstrap test (Jenkins 2002; Jenkins, dataandsciencediagnosticsavailableprovidessignificant Caldwell & Borucki 2002) to take into account non- leverage in dealing with these effects. Gaussianstatisticsoftheindividuallightcurves. Atran- Some systematic phenomena are specific to individual siting planet model is fitted to the transit signature as a starsandcannotbecorrectedbyco-trendingagainstan- joint noise characterization/parameter estimation prob- cillarydata. Thefirstissueistheoccasional,abruptdrop lem. That is, the observation noise is not assumed to inpixelsensitivitythatintroducesastepdiscontinuityin be white and its characteristics are estimated using the an affected star’s light curve (and associated centroids). wavelet-based approach employed in TPS, but as an es- Thisisoftenprecededimmediatelybyacosmicrayevent, timator, rather than as a detector. This process yields a 4 Jenkins et al. set of physical parameters and an associated covariance to pixel level calibrations. matrix. Beforelaunch,weaddedpixelstothetargettablethat To eliminate false positives due to eclipsing binaries, allow us to sample and trend the artifacts simultane- the planet model fit is performed again only to the even ously with the science data. The Kepler Science Office transits,andthenonlytotheoddtransits,andtheresult- andSOCaredevelopingandprototypingalgorithmsthat ing odd/even depths and epochs are compared in order use these image artifact pixels, together with other sci- to see if the results indicate the presence of secondary ence data, to reconstruct the underlying temperature- eclipses. After the multi-transiting planet search is com- dependent two-dimensional bias structure as a function plete, the periods are compared to detect eclipsing bina- of time over each quarter. The resulting dynamic model ries with significant eccentricity causing TPS to detect will allow the temperature-dependent bias signals to be two transit pulse trains at essentially the same period, removed directly from the data. Moreover, the ther- but at a phase other than 0.5. mal environment of the Kepler photometer is very sta- To guard against background eclipsing binaries, a cen- ble and changes slowly during nominal operations. Thus troid motion test is performed to determine whether the any residualthermaltwo-dimensionalbias effectswill be centroids shifted during the transit sequence. If so, the small after the corrections are in place and can be co- source right ascension and declination can be estimated trended out of the data by PDC like other thermally by the measured in- versus out-of-transit centroid shift driven, instrumental effects. normalized by the fractional change in brightness of the Given that the Moir´e pattern noise exhibits both high system (i.e., the transit ”depth”; Batalha et al. 2010b; spatial frequencies and high temporal frequencies, the Monet et al. 2010). prospect of reconstructing a high fidelity model of the Additional tests include checking whether the transit effects at the pixel level with an accuracy sufficient to signature is consistent in the target pixels, whether the correct the affected data appears unlikely. We are de- transit signature is correlated with any ancillary engi- veloping algorithms that identify when these Moir´e pat- neeringdataoranycollateraldata, andwhetherthedis- terns are present and mark the affected CCD regions as tribution of cosmic ray events during transit is signifi- suspect on each affected LC. These suspect data flags cantly different than that out-of-transit. will then be used to inform downstream modules so that the affected data can be appropriately weighted, and so 7. FUTUREDEVELOPMENT that, for example, TPS can selectively ignore time inter- Future development for the SOC includes implement- vals that are potentially contaminated with electronics- ing difference image analysis photometry, completing inducedtransientswhensearchingfortransitsignatures. DV, and developing and implementing mitigations for DV will produce a contamination report for each TCE the instrument artifacts described in Caldwell et al. indicating the fraction and severity of the Moir´e pattern (2010). These artifacts affect a small portion of the Ke- effect. The Pipeline will track and trend diagnostic met- pler FOV at any one time. The development schedule rics reflecting the prevalence and severity of the Moir´e calls for delivery of these features to allow us to discover pattern as a diagnostic of this aspect of the photometer longperiodEarthsbylate2010andrecoveratleast90% performance. oftheFOVin2011intimetocharacterizethefrequency In spite of the presence of these image artifacts, Ke- of Earth-size transiting planets in the habitable zones of pler is already achieving photometric precision sufficient solar-like stars. todetectEarth-sizeplanetstransitingsolar-likestarsfor The instrumental artifacts consist of two cate- the majority of the FOV at any given time (Jenkins et gories of phenomena: (1) temperature-dependent al. 2010). Weareconfidentthattheseeffortswillenable two-dimensional bias image structure, and other ustominimizetheimpactoftheseartifactsonexoplanet temperature-dependentelectronicseffects,and(2)Moir´e detection, andproducehigh-qualityphotometricandas- patterns caused by an unstable circuit with an oper- trometrictimeseriesforotherscientificinvestigationsby ational amplifier oscillating at ∼1.5 GHz. Normally the greater astronomical community. this latter feature appears as a high-frequency oscilla- tion on each readout row whose frequency changes with 8. CONCLUSIONS row number as the readout electronics heat up during WehavepresentedanoverviewoftheKepler SOCsci- readout. When this signal aliases to the sample rate of ence pipeline processing. The output products include the CCD readout, a transient band appears in an af- raw and calibrated pixel time series, raw and systematic fected channel and slowly rolls across the frame as the error-corrected flux time series, centroid time series for temperature changes. The Moir´e pattern can interact each star, and associated uncertainties. These products with bright, saturated star signals and generate scene- willpermitthedetectionandcharacterizationoftransit- dependent effects. The typical amplitude of these image ing planets in the Kepler FOV as well as enabling astro- artifacts is <1 ADU per pixel per read, comparable to physical investigations and serendipitous discoveries not or smaller than the typical readout noise. It is impor- contemplated in Kepler’s driving design requirements. tant to note that not all of these op amps are oscillating and that the perturbations to the images are very small. Funding for this Discovery Mission is provided by Our mitigation plan consists of two approaches, one for NASA’s Science Mission Directorate. We thank thou- thetemperature-dependenteffects,andonefortheMoir´e sands of people whose efforts made Kepler’s grand voy- pattern effects, and most of the effort takes place prior REFERENCES Batalha,N.B.,etal.2010,ApJ,thisissue Borucki,W.J.,etal.2010,Science,inpress Kepler Science Pipeline 5 Bryson,S.T.,etal.2010,ApJ,thisissue Koch,D.G.,etal.2010,ApJ,thisissue Caldwell,D.A.,etal.2010,ApJ,thisissue Monet,D,etal.2010,ApJ,thisissue Gautier,N.T.,etal.2010,ApJ,thisissue VanCleve,J.,D.A.Caldwell,2009,KeplerInstrument Jenkins,J.M.,D.A.Caldwell,&W.J.Borucki2002,ApJ,564, Handbook,KSCI19033-001,(MoffettField,CA:NASAAmes 495 ResearchCenter) Jenkins,J.M.2002,ApJ,575,493 Jenkins,J.M.,etal.2010,ApJ,thisissue Haas,M.,etal.2010,ApJ,thisissue 6 Jenkins et al. CAL PA Raw Calibrated Pixel Level Photometric Data Pixels Calibrations Analysis TPS PDC Corrected Raw Light Transiting Presearch Light Curves and Planet Data Curves Centroids Search Conditioning Threshold DV Crossing Diagnostics Data Validation Events Fig. 1.—DataflowdiagramfortheSOCSciencePipeline. Kepler Science Pipeline 7 Raw 2-D Black Dynamic 1-D Correction Black Pixels (static) Correction Identify LDE Gain & Cosmic Rays Undershoot Nonlinearity in Smear Correction Correction Calibrated Smear + Dark Flat Field Pixels + Current Correction Uncertainties Correction Fig. 2.—DataflowdiagramfortheCalibrationPipelinemodule. 8 Jenkins et al. 1.14 A Pointing 1.12 Tweak 1.1 Pixel Sensitivity Drop 1.08 Pointing Tweak 1.06 Safe Mode 1.04 y 1.02 sit n 1 e nt e I B v ati 1.06 el R 1.05 Pixel Sensitivity Drop 1.04 1.03 1.02 1.01 1 0.99 5000 5010 5020 5030 5040 5050 5060 5070 Time, JD − 2450000 Fig. 3.— Raw and systematic error-corrected light curves for two different stars. The raw light curves exhibit step discontinuities due to pointing offsets, thermal transients due to safe modes and pixel sensitivity changes, as well as pointing errors and focus changes, as indicated in the figure. Panel A shows the raw flux time series (top curve, offset) and the PDC flux time series (bottom curve) of a Kp = 15.6 dwarf star. Panel B shows the raw and PDC flux time series for a Kp = 14.9 dwarf star that displays less sensitivity to the thermal transients. Both stars’ corrected flux time series are significantly improved by the detrending afforded by PDC while retaining intrinsicstellarvariabilityontimescalesofseveralweeks.