Table Of ContentDraftversion January3,2017
PreprinttypesetusingLATEXstyleAASTeX6v.1.0
DEEP-HITS: ROTATION INVARIANT CONVOLUTIONAL NEURAL NETWORK FOR TRANSIENT
DETECTION
Guillermo Cabrera-Vives1,2,3,5, Ignacio Reyes4,1,5, Francisco Fo¨rster2,1, Pablo A. Est´evez4,1 and Juan-Carlos
Maureira2
Email: gcabrera@dim.uchile.cl
7 1
MillenniumInstitute ofAstrophysics,Chile
1
2
0 CenterforMathematical Modeling,UniversidaddeChile,Chile
2 3AURAObservatoryinChile
4
n DepartmentofElectricalEngineering,UniversidaddeChile,Chile
a 5Theseauthorscontributed equallytothiswork
J
2 ABSTRACT
We introduce Deep-HiTS, a rotation invariant convolutional neural network (CNN) model for clas-
]
M sifying images of transients candidates into artifacts or real sources for the High cadence Transient
I Survey (HiTS). CNNs have the advantage of learning the features automatically from the data while
.
h achieving high performance. We compare our CNN model against a feature engineering approach
p using random forests (RF). We show that our CNN significantly outperforms the RF model reducing
- the error by almost half. Furthermore, for a fixed number of approximately 2,000 allowed false tran-
o
r sient candidates per night we are able to reduce the miss-classified real transients by approximately
t
s 1/5. To the best of our knowledge, this is the first time CNNs have been used to detect astronomi-
a
cal transient events. Our approach will be very useful when processing images from next generation
[
instruments such as the Large Synoptic Survey Telescope (LSST). We have made all our code and
1
data available to the community for the sake of allowing further developments and comparisons at
v
https://github.com/guille-c/Deep-HiTSa.
8
5 Keywords: methods: data analysis — methods: statistical — techniques: image processing — super-
4
novae: general — surveys
0
0
.
1 1. INTRODUCTION butalsothetime domainopeningnewopportunitiesfor
0 understanding our universe.
As in many other fields, large scale survey tele-
7
The High cadence Transient Survey (HiTS,
1 scopes are already generating more data than hu-
F¨orster et al. 2016), is aimed at detecting and
: mans are able to process, posing the need for new
v following up optical transients with characteristic
data-processing tools. Some examples are the Sloan
i
X timescales from hours to days. The primary goal of
Digital Sky Survey (SDSS, York et al. 2000), the
r Panoramic Survey Telescope and Rapid Response Sys- HiTSistodetectsupernovae(SNe)duringtheirearliest
a
hours ofexplosion. HiTSuses the Dark EnergyCamera
tem (Pan-STARRS, Hodapp et al. 2004), the All-
(DECam, Flauguer et al. 2015) mounted at the prime
Sky Automated Survey for Supernovae (ASAS-SN,
focus of the Victor M. Blanco 4 m telescope on Cerro
Shappee et al. 2014), and the Dark Energy Survey
TololonearLaSerena,Chile. TheHiTSpipelinedetects
(DES,Dark Energy Survey Collaboration 2016)among
transients using difference images: a template image is
others. Furthermore, in the near future we expect
subtracted from the science image taken at the time of
to have the Large Synoptic Survey Telescope (LSST,
observation and point sources are located within this
LSST Science Collaboration 2009) operative, which
difference image. A custom made pipeline does the
will scan the whole southern sky every couple of days
astrometry, PSF matching, candidate extraction, and
producing terabytes of data per night. These instru-
candidate classificationinto realor faketransients. Our
ments allow us to explore not only the space domain,
previous approach uses random forests (RF, Breiman
2001) over a set of handmade features for candidate
a Deep-HiTS is licensed under the terms of the GNU General classification. Similar approaches have been used in
PublicLicensev3.0. the past by different groups including Romano et al.
2
(2006), Bloom et al. (2012), Brink et al. (2013), tuations of the background, inaccurate astrometry, and
and Goldstein et al. (2015). The feature engineering bad CCD columns. This produced 802,087 candidates
approachusuallyrequiresanimportantamountofwork which we labeled as negatives1. Notice these negatives
done by the scientist in order to create representative were produced by running the pipeline over real data,
features for the problem at hand. hence no simulation process is used for obtaining them.
An alternative approach is to learn the features from Positives were generated by selecting stamps of real
the data itself. Convolutional neural networks (CNNs, PSF-like sources and placing them at a different loca-
Fukushima 1980) are a type of artificial neural net- tion at the same epoch andin the same CCD they were
work (ANN, McCulloch & Pitts 1943) that have re- observed. By doing this we simulated positives using
cently gained interest in the machine learning commu- real point like sources, hence reproducing how tran-
nity. CNNs have achieved remarkable results in im- sients would look like. Point sources are obtained by
age processing challenges, outperforming previous ap- first selecting sextractor (Bertin & Arnouts 1996) iso-
proaches (e.g. Krizhevsky et al. 2012; Razavian et al. lated sources which are present in both frames with a
2014; Szegedy et al. 2014). Though CNNs have been consistent position and flux, and whose: size is within
applied to a variety of image processing problems, they 2 median absolute deviations from the median size of
have only recently caught the attention of the Astron- the sourcesofthe image,totalflux is positive, andmin-
omycommunitymainlythankstotheGalaxyZooChal- imum pixel value including sky emission is greater than
lenge. Dieleman et al. (2015) won the contest using a zero. We also filtered out those sources that show a
rotation invariant CNN approach. This approach was sextractor flux inconsistent with its optimal photome-
latter extended to data from the Hubble Space Tele- try flux (Naylor 1998), which we found to be a good
scope in Huertas-Company et al. (2015). discriminator between stars and galaxies. Positives are
In this paper we introduce Deep-HiTS, a framework then added scaledto mimic the SNR distributionof the
for transient detection based on rotational invariant candidates found in the image (fitted power law distri-
deep convolutional neural networks, and present the bution). Imageswiththesesimulatedtransientsarerun
advantages and improved results obtained using this throughthefirststepsoftheHiTSpipeline(astrometry,
framework. We start by explaining the HiTS differ- PSFmatchingandcandidateextraction)inordertoget
ence image data in Section 2. In Section 3 we explain stamps of positives. As explained above, non-simulated
the basics of ANN and CNN, as well as describing our detected candidates are taken as negatives. In order to
model architecture. We then describe our experiments keep a balanced data-set, we used the same number of
and show how the proposed CNN model significantly positivesasnegatives. Figure1showsexamplesofnega-
outperforms the feature engineering + RF approach in tiveandpositivecandidatesandtheirrespectivestamps.
Section4. Wefinallysummarisethisworkandconclude
in Section 5.
3. CONVOLUTIONAL NEURAL NETWORK
2. DATA MODEL
TheHiTSpipelinefindscandidatesbyusingthediffer- 3.1. Artificial neural networks
ence of a science and a template image and it produces
CNNsareatypeofartificialneuralnetwork(ANN,see
image stamps of 21×21 pixels for each candidate. In
Zhang 2000, and references therein). The basic unit of
this paper we use data from the 2013 run, in which we
anANNis calledneuron. Eachneuronreceivesavector
observed 40 fields in the u band every approximately 2
as input andperforms a dot product operationbetween
hoursduring 4 consecutivenights. Forthe classification
that vector and a set of parameters called weights. The
model we use four stamps: template image, science im-
resulting value plus a bias goes into a non-linear activa-
age, difference image, and signal-to-noise ratio (SNR)
tion function, usually a sigmoid. Neurons are grouped
difference image (the difference image divided by the
in layers,eachone withits ownweights. Inamultilayer
estimated local noise).
perceptron, layers are stacked one after the other. The
For the purpose of training the classification model, output of layer n, x , depends on the output of layer
n
we created a data-set of real non-transients (negatives n−1, xn−1, and is given by
hereafter)andsimulatedtransients(positiveshereafter).
Negatives were produced by running the first steps of
theHiTSpipelineincludingtheastrometry,PSFmatch- xn =f(Wn·xn−1+bn), (1)
ing,andcandidateextractionoverobservedimages. We
evaluate the proposed model on the next step, which is
1Weareawarethisdata-setcontainssomepointliketransients
candidate classification into non-transient or transient.
notpresentinthereferenceimage,butweconservativelyestimate
Negativesareartifacts causedmostlyby statisticalfluc- theywillbelessthana0.2%.
3
NEGATIVES POSITIVES
template science difference SNR image template science difference SNR image
5 2
5 9
5. 5.
= =
R R
N N
S S
template science difference SNR image template science difference SNR image
SNR = 9.27 SNR = 12.73
template science difference SNR image template science difference SNR image
2 5
0 0
3. 0.
4 5
= =
R R
N N
S S
Figure 1. Examples of artifacts (negatives) and simulated transients (positives) at different signal-to-noise ratios. From top to
bottom artifacts are caused by statistical fluctuationsof thebackground,inaccurate astrometry, and bad CCD column.
where Wn is the weight matrix, bn is the bias, and f stack of K arraysxk,n−1 (k =1,...,K), the outputs as
is the activation function. These layers are called fully- L arrays x (l = 1,...,L) and the filters as matrices
l,n
connected layers. W . The output of layer n is then obtained as
k,l,n
ANNs are usually trained using an algorithm called
K
error backpropagation, which adjust the weights of the xn,l =f Wk,l,n∗xk,n−1+bl,n , (3)
network in an iterative way proportional to the gradi- !
k=1
X
ent descent of the error with respect to the weights. In
where ∗ is the convolution operator, and b are the
l,n
this work neural networks are trained using stochastic
biases of layer n. In order to add some flexibility on
gradient descent (SGD, LeCun et al. 1998), where the
the output size of a layer,zeros can be appended in the
gradient of the error with respect to the weights is esti-
input layer’s borders. This is called zero-padding.
matedateachiterationusingasmallpartofthedata-set
Pooling layers return a subsampled version of the in-
called mini-batch. That means that for every iteration
put data. In the pooling layer the data is divided into
the gradient is estimated as the average instantaneous
small windows and a single representative value is cho-
gradients for a small groupof samples (usually between
sen for each window, e.g. the maximum (max-pooling)
10and500),whichprovidesagoodtrade-offintermsof
or the average (mean-pooling). A common design for
gradient estimation stability and computing time.
CNNs consist on a mix of convolutional and pooling
The learning rule for SGD is given by
layers at a first stage followed by dense fully-connected
θ =θ −η∇ L, (2) layers.
t+1 t θ
Though sigmoids are usually chosen as activation
where θ represents a parameter of the ANN (such as
functions, they are not the only choice. Rectified lin-
weights or biases), L is the loss function (e.g. mean
ear units (ReLU, Nair & Hinton 2010) are activation
squared error) and η is the learning rate. The learn-
functions that have gained popularity in the last years
ing rate controls the size of the steps that SGD makes
because they can achieve fast training and good perfor-
in each iteration. Reducing the learning rate during
mance. TheoutputofaReLUisthemaximumbetween
training allows having a good compromise between ex-
the input and zero
ploration at the beginning and exploitation (local fine
tuning) in the final part of the process. f(x)=max(0,x). (4)
LeakyReLUsarevariantswherethenegativeinputsare
3.2. Convolutional neural networks
notsettozerobutinsteadtheyaremultipliedbyasmall
Convolutional neural networks (CNNs) usually con- number (e.g. 0.01) to preserve gradient propagation,
tain two types of layers: convolutional and pooling lay-
ers. Convolutional layers perform a convolution opera- x if x>0,
f(x)= (5)
tion between the input of the layer and a set of weights
0.01x otherwise.
called kernel or filter. We extend Equation 1 by con-
sidering the input of the n convolutional layer to be a In order to reduce overfitting, a usual technique is
4
dropout (Srivastava et al. 2014),whichconsistsinturn- We used the candidates data-set described in Sec-
ingoffrandomneuronsduringtrainingtimewithaprob- tion 2 to assess the performance of the proposed CNN
abilityp,usuallychosenas0.5. Thegoalofdropoutisto model and compare it againstour previous random for-
prevent coadaptation of the outputs, hence neurons do est model. We split the data into 1,220,000 candi-
not depend on other neurons being present in the net- datesfortraining,100,000candidatesforvalidation,and
work. In order to preserve the scale of the total input, 100,000 for testing. The CNN model is trained using
the remaining neurons need to be rescaledby 1/(1−p). SGD with mini-batches of 50 examples from the train-
Dropout is activated only during training time. When ing set and 0.5 dropout. Learning rate was reduced
evaluating models, all neurons become active. to half every 100,000 iterations, starting with an ini-
tial value of 0.04. The validation data-set was used
3.3. Deep-HiTS architecture
for measuring the performance of the model at train-
Preliminaryresultswerepresentedinaconferencepa- ing time anddecidewhento stoptraining. We assumed
per by Cabrera et al. (2016) where a very basic CNN the model converged when after feeding 100,000 can-
model was proposed achieving a performance compara- didates the zero-one loss (fraction of misclassifications)
ble to the RF model. Here we extend these results by doesnotgolowerthana99%ofthe previousloss. Once
using a much deeper architecture and introducing rota- the model is trained, we use the test set (data unseen
tional invariance as well as leaky ReLUs and dropout, by the model) to calculate performance metrics. Figure
significantly outperforming the RF model. Following 3 shows the zero-one loss of the model as a function of
Dieleman et al. (2015)weintroducerotationinvariance the number of iterations for the training and validation
by applying the convolutional filters to various rotated sets, and for the test set after training. Due to insuffi-
versions of the image, thus exploiting rotational sym- cient memory to store the whole training set, the train-
metry present in transients images. Figure 2 shows the ingsetcurveiscalculatedoverthelast10,000candidates
proposed rotation invariant convolutional neural net- used for training at each iteration, while the validation
work architecture. The template, science, difference, curve shows the loss for the whole 100,000 candidates
andSNRdifferencestampsarepresentedtothenetwork in the validation set. We programmed our code using
asastackedarrayofsize 21×21×4. The convolutional Theano(Theano Development Team 2016)andtrained
architecture composed by the convolutional and pool- ourmodelonateslaK20graphicsprocessorunit(GPU).
ing layers yields 2,304 features. Rotation invariance is The model converged after 455,000 iterations and took
incorporatedby presenting every stamp to the convolu- roughly 37 hours to train achieving an error of 0.525%
tional architecture rotated four times, hence producing overthevalidationdata-setand0.531%overthetestset
2,304×4features. These9,216featuresarefedtothree for this particular experiment.
fully connected layers that return the probabilities of We compare the performance of the proposed CNN
being an artefact and a real transient. model against our previous RF model (implemented
After trying more than 50 different architectures and using scikit-learn Pedregosa et al. 2011). As stated
training strategies2 the best performance was obtained above, we consider transients to be positives (P) and
by the architecture shown in Figure 2. Our convolu- non-transients to be negatives (N). We use the follow-
tionalarchitectureiscomposedoftwoconvolutionallay- ing metrics to assess the performance of our models:
erswithfiltersofsize4×4and3×3respectively,a2×2
TP +TN
max-poolinglayer,three3×3convolutionallayersanda accuracy= , (6)
TP +FN +TN +FP
2×2max-poolinglayer. Zero-paddingwasusedinorder
TP
tosetthesizesofeachlayertotheonesshowninFigure precision= , (7)
TP +FP
2(24×24forthe firsttwoand12×12forthe lastfour).
TP
All convolutional layers use ReLUs. The concatenated recall= , (8)
TP +FN
outputofallrotationsarefedtothreestackedfullycon-
precision·recall 2TP
nected layers: two ReLU layers of 64 units each, and a f1-score=2 = ,
precision+recall 2TP +FP +FN
final logistic regressionlayer that outputs two probabil-
(9)
ities: fake and real transient.
where TP stands for true positives (number of posi-
4. EXPERIMENTS
tives correctly classified), TN stands for true negatives
(numberofnegativescorrectlyclassified),FP standsfor
false positives (number of negatives incorrectly classi-
2 Mostimportantimprovementswereachievedbyaddingmore fied aspositives by the model), andFN stands forfalse
layers, number of parameters per layer, different activation func-
negatives (number of positives incorrectly classified as
tions, different training strategies (learning rates, dropout), and
addingrotationinvariance. negatives by the model). In order to make a fair com-
5
Figure 2. Architecture of the proposed rotational invariant CNN model. Blue boxes represent convolutional layers and red
boxes represent max-pooling layers. The stacked images of each candidate are presented to the network rotated four times in
ordertoexploitrotationalsymmetry. Thecalculatedfeaturemapsofeachrotationareflattenedandstackedcreatinga2,304×4
vectorthat is fed to a fully connected network whose outputsare the probabilities of being an artefact and a real transient.
parison, we trained and tested the RF model with the
0.05
same number of candidates (1,220,000 for training and
Training set
Validation set 100,000 for testing). For both CNN and RF models,
0.04 Test set
six train-validation-testsplits were doneusing stratified
shuffle split (i.e. maintaining positives and negatives
0.03
balancedforallsets). Table3showsthemeanaccuracy,
or
err precision,recall,andf1-scoreobtainedwith the RF and
0.02
CNN models and their respective standard deviations.
All metrics show that the CNN model outperforms the
0.01
RF model. To assess the statisticalsignificance of these
results, weperformeda Welch’s hypothesistest (Welch
0.00 0 100000 200000 300000 400000 1947) obtaining a probability of less than a 10−7 that
iteration
these results were obtained by chance.
Figure 3. Learning curve (evolution of the error during the
training process). The training set error is calculated over
the last 10,000 candidates that were used for training. The
testset erroriscalculated at theendoftraining. Validation
error does not raise at the end of training, indicating that
themodel is not overfitted.
6
Table 1. Comparison of RF and CNN models. CNN shows consistent better results than RF in
terms of accuracy, precision, recall and f1-score. Results are statistically significant based on the
calculated Welch’s T-test p-values, being all of them lower than 10−7. Accuracy results show that
theerror of CNN model is almost half the error obtained with the RFmodel.
model accuracy precision recall f1-score
RF 98.96±0.03% 98.74±0.06% 99.18±0.02% 98.96±0.03%
CNN 99.45±0.03% 99.30±0.07% 99.59±0.07% 99.45±0.03%
Welch’s T-test p-value 4.7×10−11 2.8×10−08 9.7×10−08 4.5×10−11
Figure 4 shows the detection error tradeoff (DET) using a CNN model trained over 3×104.
curves of the RF and CNN models. The DET curve In order to test our framework over real data, we ap-
plots the false negativerate (FNR) versusthe false pos- plied the RF and CNN models over SNe found in the
itive rate (FPR) as we vary the probability threshold 2014 and 2015 campaigns. For our 2013 campaign we
used to determine positives and negatives, where observed in the u band filter, while in the 2014 and
2015 campaigns we observed in the g band filter. We
FP FN used stamps from SNe at all epochs with a SNR over5.
FPR= , FNR= . (10)
N P DuringobservationtimeHiTSfound47SNeinthe2014
campaign and 110 in the 2015 campaign, giving a total
Figure 4a shows the overallDET curvesfor the RF and
of 1166 candidates (each SN has candidates at different
CNN models, while Figure 4b shows the DET curves
epochs). Using the user-defined FPR of 10−2 described
for different SNR ranges. A better model would obtain
above, the RF model correctly classified 866 of these
smaller FPR and FNR, hence moving the curve to the
positive candidates (FNR = 0.257), while Deep-HiTS
bottom left of the plot. It can be seen again that the
correctly classified 921 of them (FNR = 0.210). Notice
proposedCNN model achievesbetter performance than
that though we trained our models on u band images,
ourpreviousRFmodel. TheDETcurveisalsousefulfor
we are still able to classify candidates in the g band fil-
evaluatingthetrade-offbetweenfalsepositivesandfalse
ter. Figure 6 shows the number of positive candidates
negatives. Inpractice,whenobservinginrealtime,most
incorrectlyclassifiedfortheSNepresentinthe2014and
candidates will be negatives. All negatives on our data-
2015 campaigns for the RF and CNN models in terms
set are real candidates taken from the HiTS pipeline:
of the SNR. It can be seen that for a SNR below 8 the
802,087 obtained during 4 consecutive nights, i.e. ap-
CNN model outperforms the RF model. These are the
proximately 200,000 negatives per night. The number
mostimportantcandidateswhendetectingtransientsin
ofestimatedrealtransientsis considerablysmallerthan
real time, as we hope to detect them at an early stage.
that. As an example, consider as user-defined opera-
ForaSNRover8the RFmodelbetterrecoversthetrue
tion point getting 2,000 false positives per night (FPR
positives, which suggests the use of a hybrid model be-
∼ 10−2). Figure 4 tells us that by using the proposed
tween RF and CNN. We will expand on this idea in
CNN we will get a FNR ∼2×10−3 which is much bet-
future work.
ter in comparisonto a FNR ∼10−2 when using the RF
As we tested our model on real data from a different
model. This means 5 times less real transients would
HiTS campaign, and also using a band filter different
be missed by using the CNN model. On Figure 4b we
from the one used for training, we did not achieve the
canseethatthisimprovementismostlyachievedbythe
low FNR shown in Figure 4. However our CNN model
proposed CNN being able to correctly classify low SNR
isabletorecovermoretruepositivesfromimagesinthe
sources.
g-band filter than the RF-based model given the user-
InFigure5weexploretheperformanceoftheRFand
defined operation point described above. We currently
CNNmodelsintermsofthesizeofthetrainingset. For
do not have a robust way of measuring the false pos-
this purpose we saved 100,000candidates for validation
itives automatically in the 2014 and 2015 data-sets as
and trained each model with training sets of different
the new observationalsetup yields a significantly larger
sizes. It can be seen that the CNN model outperforms
population of unknown asteroids which would need to
the RF model independently of the size of the training
be removed from the set of negatives. Further research
set used. Furthermore, by using a RF model trained
is needed in order to assess the best way of transferring
over 106 candidates we obtain similar performance as
7
100 100
10-1 10-1
10-2 10-2
R R
N N
F F
10-3 10-3
RF, SNR < 7, N = 16917
CNN, SNR < 7, N = 16917
RF, 7 SNR <15, N = 55772
10-4 10-4 ≤
CNN, 7 SNR <15, N = 55772
RF ≤
RF, SNR 15, N = 27311
CNN CNN, SNR≥ 15, N = 27311
10-5 10-5 ≥
10-5 10-4 10-3 10-2 10-1 10-5 10-4 10-3 10-2 10-1
FPR FPR
(a) (b)
Figure 4. Detectionerrortradeoff(DET)curvecomparingaRandomForestmodelagainsttheproposedCNN.DETcurveplots
FalseNegativeRate(FNR)versusFalsePositiveRate(FPR).Acurveclosertothelowerbottomindicatesbetterperformance.
(a)OverallDETcurves. (b)DETcurvesforsubsetsof candidatesbinnedaccording totheirsignal-to-noise ratio (SNR),where
N indicates the number of candidates per bin. The proposed CNN model achieves a better performance than the RF model
for a FPR between 10−4 and 10−1, where the HiTS pipeline operates. The biggest improvement occurs for the lowest SNR
candidates.
0.07 SNe in the 2014 and 2015 campaigns
180
Random Forest
0.06 CNN 160 CNN false negatives
RF false negatives
0.05 140
all positives
120
0.04
Error0.03 100
80
0.02 60
0.01 40
20
0.00
102 103 104 105 106
0
Training set size 0 10 20 30 40 50
SNR
Figure 5. Error overa validation data-set of 100,000 candi- Figure 6. ComparisonofRFandCNNmodelsoverrealSNe
datesintermsofthesizeofthetrainingsetfortheRandom from the2014 and 2015 campaigns.
Forest and Convolutional Neural Network models.
advantage of learning suitable features for the problem
our CNN model between different data-sets in terms of automatically from the data. The proposed classifica-
recovering positives and negatives. tion model was able to improve the overall accuracy of
the HiTSpipeline from98.96±0.03%to99.45±0.03%.
5. CONCLUSIONS Inpractice,byusingDeep-HiTSwecanreducethenum-
ber of missed transients to approximately 1/5 when ac-
We introduced Deep-HiTS, a rotation invariant deep
cepting around 2,000 transients per night. The use
convolutionalneuralnetwork (CNN) for detecting tran-
of deep learning models over next generation surveys,
sients in survey images. Deep-HiTS discriminates be-
such as the LSST may have a great impact on de-
tween real and fake transients in images from the High
tecting and classifying the unknown unknowns of our
cadenceTransientSurvey(HiTS),asurveythataimsto
universe. Deep-HiTS is open source and available at
findtransientswithcharacteristictimescalesfromhours
GitHub (https://github.com/guille-c/Deep-HiTS),
to days using the Dark Energy Camera (DECam). We
andtheversionusedinthispaperisarchivedonZenodo
compared the proposed CNN model against our previ-
(https://doi.org/10.5281/zenodo.190760).
ous approach based on a random forest (RF) classifier
trained over features designed by hand. Deep-HiTS not
only outperforms the RF approach, but it also has the We gratefully acknowledge financial support from
8
CONICYT-ChilethroughitsFONDECYTpostdoctoral Mitchell Institute for Fundamental PhysicsandAstron-
grant number 3160747; CONICYT-PCHA through its omyatTexasA&MUniversity,FinanciadoradeEstudos
national M.Sc. scholarship 2016 number 22162464; eProjetos,FundaoCarlosChagasFilhodeAmparo,Fi-
CONICYT through the Fondecyt Initiation into Re- nanciadora de Estudos e Projetos, Fundao Carlos Cha-
search project No. 11130228;CONICYT-Chile through gas Filho de Amparo Pesquisa do Estado do Rio de
grant Fondecyt 1140816; CONICYT-Chile and NSF Janeiro, Conselho Nacional de Desenvolvimento Cient-
through the Programme of International Cooperation ficoeTecnolgicoandtheMinistriodaCincia,Tecnologia
project DPI201400090; CONICYT through the infras- e Inovao,the Deutsche Forschungsgemeinschaftand the
tructure Quimal project No. 140003; Basal Project Collaborating Institutions in the Dark Energy Survey.
PFB–03; the Ministry of Economy, Development, and The Collaborating Institutions are Argonne National
Tourism’s Millennium Science Initiative through grant Laboratory, the University of California at Santa Cruz,
IC120009, awarded to The Millennium Institute of As- the University of Cambridge, Centro de Investigaciones
trophysics(MAS). Powered@NLHPC:this researchwas Enrgeticas,MedioambientalesyTecnolgicasMadrid,the
partially supported by the supercomputing infrastruc- University of Chicago, University College London, the
ture ofthe NLHPC (ECM-02). We acknowledgethe us- DES-Brazil Consortium, the University of Edinburgh,
ageoftheBelkaGPUcluster(FondequipEQM140101). the Eidgenssische Technische Hochschule (ETH) Zrich,
This project used data obtained with the Dark En- Fermi National Accelerator Laboratory, the University
ergy Camera (DECam), which was constructed by the ofIllinoisatUrbana-Champaign,theInstitutdeCincies
Dark Energy Survey (DES) collaboration. Funding for de l’Espai (IEEC/CSIC), the Institut de Fsica d’Altes
the DES Projects has been provided by the U.S. De- Energies, Lawrence Berkeley National Laboratory, the
partment of Energy, the U.S. National Science Founda- Ludwig-Maximilians Universitt Mnchen and the asso-
tion, the Ministry of Science and Education of Spain, ciated Excellence Cluster Universe, the University of
the Science and Technology Facilities Council of the Michigan, the National Optical Astronomy Observa-
United Kingdom, the Higher Education Funding Coun- tory,the UniversityofNottingham,theOhioStateUni-
cil for England, the National Center for Supercomput- versity, the University of Pennsylvania, the University
ing Applications at the University ofIllinois atUrbana- of Portsmouth,SLAC NationalAcceleratorLaboratory,
Champaign,theKavliInstituteofCosmologicalPhysics StanfordUniversity,theUniversityofSussex,andTexas
atthe UniversityofChicago,Center forCosmologyand A&M University. All plots were created using mat-
Astro-ParticlePhysicsatthe OhioState University,the plotlib (Hunter 2007).
REFERENCES
Bertin,E.,&Arnouts,S.1996,A&AS,117,393 LeCun,Y.,Bottou, L.,Orr,G.,Mller,K.1998, inNeural
Bloom,J.S.,Richards,J.W.,Nugent,P.E.etal.2012,PASP, Networks:TricksoftheTrade,ed:Montavon, G.,Orr,G.,,
124,921,1175 Mller,K.(Springer),9
Breiman,L.2001,MachineLearning,45,1,532 LSSTScienceCollaboration,Abell,P.A.,Allison,J.,etal.2009,
Brink,H.,Richards,J.W.,Poznanski,D.2013,MNRAS,432,2, arXiv:0912.0201
1047 Krizhevsky,A.,Sutskever,I.&Hinton,G.E.2012,in
Cabrera-Vives,G.,Reyes,I.,Fo¨rster,F.etal.2016, in Proceedings oftheNeuralInformationProcessingSystems
Proceedings ofthe2016International JointConferenceon Conference,Imagenet classificationwithdeepconvolutional
NeuralNetworks(IJCNN2016), SupernovaeDetection by neuralnetworks(LakeTahoe,NV,USA),1097
UsingConvolutional NeuralNetworks(Vancouver, Canada), McCulloch,W.&Pitts,W.1943,TheBulletinofMathematical
251 Biophysics,5,4,115
DarkEnergySurveyCollaboration,Abbott, T.,Abdalla,F.B., NairV.,HintonG.E.,2010,inProceedingsofthe27th
etal.2016,MNRAS,460,1270 International ConferenceonMachineLearning(ICML-10),
Dieleman,S.,Willett,K.W.,&Dambre,J.2015,MNRAS,450, Rectifiedlinearunitsimproverestrictedboltzmannmachines
1441 (Haifa,Israel),807
Flaugher,B.,Diehl,H.T.,Honscheid,K.,etal.2015,AJ,150, Naylor,T.1998,MNRAS,296,339
150 Pedregosa, F.,Varoquaux,G.,Gramfort,etal.2011,Journalof
Fo¨rster,F.,Maureira,J.C.,SanMart´ın,J.etal.2016,ApJ,832, MachineLearningResearch,12,2825
155 Razavian, A.S.,Azizpour,H.,Sullivan,J.etal.2014,in
Fukushima,K.1980, BiologicalCybernetics,36,4,193 Proceedings oftheIEEEConferenceonComputerVisionand
Goldstein,D.A.,DA´ndrea,C.B.,Fischer,J.A.etal.2015, AJ, PatternRecognitionWorkshops(CVPRW), CNNfeatures
150,82 off-the-shelf:anastoundingbaselineforrecognition
Hodapp,K.W.,Kaiser,N.,Aussel,H.,etal.2004, (Columbus,OH,USA),512
AstronomischeNachrichten,325,636 R.Romano,C.R.Aragon,andC.Ding,2006,inProceedings of
Huertas-Company,M.,Gravet, R.,Cabrera-Vives,G.etal.2015, the5thInternational ConferenceonMachineLearningand
ApJS,221,1,8 Applications,Supernovarecognitionusingsupportvector
Hunter,J.D.2007,ComputingInScience&Engineering,9,3,90 machines(Orlando,FL,USA),77
9
ShappeeB.J.etal.,2014,ApJ,788,48 Welch, B.L.1947,Biometrika,34,1/2,28
SrivastavaN.,HintonG.,KrizhevskyA.,Sutskever I., D.G.YorkandSDSSCollaboration,2000, AJ,120,15791587
Salakhutdinov R.,2014,JournalofMachineLearning Zhang, G.P.2000, Systems,Man,andCybernetics,PartC:
Research,15,1929 ApplicationsandReviews,IEEETransactionson,30,4,
Szegedy, C.Liu,W.,Jia,Y.etal.2014,arXiv::1409.4842 451462
TheanoDevelopmentTeam,2016,arXiv:1605.02688