Memory Matters: Convolutional Recurrent Neural Network for Scene Text Recognition

Qiang Guo, Dan Tu, Guohui Li and Jun Lei
Department of Information System and Management, National University of Defense Technology
Email: {guoqiang05, tudan, guohuili, junlei}@nudt.edu.cn

Abstract—Text recognition in natural scenes is a challenging problem due to the many factors affecting text appearance. In this paper, we present a method that directly transcribes scene text images to text without needing sophisticated character segmentation. We leverage recent advances in deep neural networks to model the appearance of scene text images with temporal dynamics. Specifically, we integrate a convolutional neural network (CNN) and a recurrent neural network (RNN), motivated by the complementary modeling capabilities of the two models. The main contribution of this work is investigating how temporal memory helps in a segmentation-free fashion for this specific problem. By using long short-term memory (LSTM) blocks as hidden units, our model can retain long-term memory, in contrast with HMMs, which only maintain short-term state dependences. We conduct experiments on the Street View House Numbers dataset, which contains highly variable number images. The results demonstrate the superiority of the proposed method over traditional HMM based methods.

I. INTRODUCTION

Text recognition in natural scenes is an important problem in computer vision. However, due to the enormous appearance variations in natural images, e.g. different fonts, scales, rotations and illumination conditions, it is still quite challenging.

Identifying the position of a character and recognizing it are two interdependent problems. Straightforward methods treat the task as separate character segmentation and recognition [1], [2]. This paradigm is fragile in unconstrained natural images because it is difficult to deal with low resolution, low contrast, blurring, the large diversity of text fonts and highly complicated background clutter.

Due to the shortcomings of these methods, algorithms combining segmentation and recognition were proposed. GMM-HMMs are the most widely used models, especially in the speech and handwriting communities [3], [4], [5], [6]. In this paradigm, scene text images are transformed into frame sequences by a sliding window. GMMs are used to model frame appearance and HMMs are used to infer the target labels of the whole sequence [7]. The merit of this approach is that it avoids the need for fragile character segmentation. However, HMMs have several obvious shortcomings, e.g. the lack of long-range context and improper independence assumptions.

[Fig. 1. The whole architecture of CRNN.]

As deep neural networks (DNNs) have flourished, convolutional neural networks (CNNs) have been used to form the hybrid CNN-HMM model [7], replacing GMMs as the observation model. This model generally performs better than the GMM-HMM model thanks to the strong representation capacity of CNN, but it still does not eliminate the issues with HMM.

In this work, we address the issues of HMMs while keeping the algorithm free of segmentation. The novelty of our method is using a recurrent neural network (RNN), which has the ability to adaptively retain long-term dynamic memory, as the sequence model. We combine CNN with RNN to exploit their representation abilities on different aspects.

RNN is a powerful connectionist model for sequences. Compared with static feed-forward networks, it introduces recurrent connections that enable the network to maintain an internal state. It makes no hypothesis on the independence of inputs, so each hidden unit can take into account more input information. Specifically, LSTM memory blocks are used, enabling the RNN to retain longer-range interdependences among input frames. Another virtue of using RNN as the sequence model is the ease of building an end-to-end trainable system trained directly on the whole image without explicit segmentation. The main weakness of RNN is its feature extraction capability.

To alleviate the shortcomings of both HMM and RNN, we propose a novel end-to-end sequence recognition model named Convolutional Recurrent Neural Network (CRNN). The model is composed of hierarchical convolutional feature extraction layers and recurrent sequence modeling layers. CNN is good at appearance modeling and RNNs have strong capacity for modeling sequences. The model is trained with the Connectionist Temporal Classification (CTC) [8] objective, which enables it to be learned directly from images without segmentation information.

Our idea is motivated by observing the complementary modeling capacities of CNN and RNN, and inspired by recent successful applications of LSTM architectures to various sequential problems, such as handwriting recognition [9], speech recognition [10] and image description [11], [12]. The whole architecture of our model is illustrated in Figure 1.
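The frame-sequence view used above can be sketched with a short NumPy snippet. This is our own illustration, not the authors' code: a height-normalized image is cut into overlapping fixed-width frames by a sliding window. The 32x20 window matches the setting reported later in Section V-B; the unit step size is an assumption.

```python
import numpy as np

def image_to_frames(image, frame_width=20, step=1):
    """Slide a fixed-width window over a height-normalized image.

    image: 2-D array of shape (height, width), e.g. height 32.
    Returns an array of shape (num_frames, height, frame_width).
    step=1 is an assumption; the paper does not state the stride.
    """
    height, width = image.shape
    frames = []
    for left in range(0, width - frame_width + 1, step):
        frames.append(image[:, left:left + frame_width])
    return np.stack(frames)

# Example: a 32x100 house-number image yields a sequence of 32x20 frames.
dummy = np.random.rand(32, 100)
frames = image_to_frames(dummy)
print(frames.shape)  # (81, 32, 20)
```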
II. RELATED WORK

In this section, we briefly survey methods that focus on sequence modeling without segmentation.

The paradigms of scene text recognition algorithms are similar to those of handwriting recognition. Straightforward methods [1], [2] are composed of two separate parts: a segmentation algorithm followed by a classifier that determines the category of each segment. Often, the classification results are post-processed to form the final results.

To eliminate the need for explicit character segmentation, many researchers use GMM-HMM for text recognition [13], [14]. GMM-HMM is a classical model widely used by the speech community. HMMs make it possible to model sequences without the necessity of segmentation. However, GMM-HMM has many shortcomings that keep it from being widely used in scene text recognition. Firstly, GMM is a weak model for characters in natural scenes. Secondly, HMM has the limitations discussed in Section I.

To strengthen the representation capability of the HMM based model, CNN was then used to replace GMM as the observation model, forming the hybrid CNN-HMM model [7], [15]. While this improves performance in comparison with GMM-HMM, it still does not eliminate the shortcomings of HMM.

Our idea is motivated by the recent success of RNN models applied to handwriting recognition [9], speech recognition [10] and image description [11], [12]. The main inspiration is the complementary modeling capacity of CNN and RNN: CNN can automatically learn hierarchical image features, but only as a static model; RNN is good at sequence modeling, but lacks the ability to extract features. We integrate the two models to form an end-to-end scene text recognition system.

Different from recent works [16] that use similar ideas, we investigate deep RNNs built by stacking multiple recurrent hidden layers on top of each other. Our experiments show the improvements gained by this endeavor.

III. PROBLEM FORMULATION

We formulate scene text recognition as a sequence labeling problem by treating scene text as frame sequences. The label sequence is drawn from a fixed alphabet L. The length of the label sequence is not restricted to be equal to that of the frame sequence.

Each scene text image is treated as a sequence of frames denoted by x = (x_1, x_2, ..., x_T). The target sequence is z = (z_1, z_2, ..., z_U). We use bold typeface to denote sequences, and constrain |z| = U <= |x| = T. The input space X = (R^M)^* is the set of all sequences of M-dimensional real valued vectors. The target space Z = L^* is the set of all sequences over the alphabet L of labels. We refer to z in L^* as a labeling.

Let S be a set of training samples drawn independently from a fixed distribution D_{X x Z} composed of sequence pairs (x, z). The task is to use S to train a sequence labeling algorithm f : X -> Z to label the sequences in a test set S' drawn from D_{X x Z} as accurately as possible under the label error rate criterion E^{lab}:

E^{lab}(h, S') = \frac{1}{Z} \sum_{(x,z) \in S'} \mathrm{ED}(h(x), z)    (1)

where ED(p, q) is the edit distance between the two sequences p and q.
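As a concrete reading of Eq. (1), the sketch below (our illustration, not part of the paper) computes the edit distance ED by dynamic programming and the label error rate over a toy test set. Taking Z as the total number of target labels in S' is the usual convention and is assumed here.

```python
def edit_distance(p, q):
    """Levenshtein distance between two label sequences."""
    dp = [[0] * (len(q) + 1) for _ in range(len(p) + 1)]
    for i in range(len(p) + 1):
        dp[i][0] = i
    for j in range(len(q) + 1):
        dp[0][j] = j
    for i in range(1, len(p) + 1):
        for j in range(1, len(q) + 1):
            cost = 0 if p[i - 1] == q[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution
    return dp[len(p)][len(q)]

def label_error_rate(model, test_set):
    """E^lab from Eq. (1): summed edit distance, normalized by the
    total number of target labels (our assumed choice of Z)."""
    total_ed = sum(edit_distance(model(x), z) for x, z in test_set)
    total_labels = sum(len(z) for _, z in test_set)
    return total_ed / total_labels

# Toy example: perfect prediction on one sample, one substitution on the other.
toy_test = [([1, 2, 3], [1, 2, 3]), ([4, 9], [4, 5])]
print(label_error_rate(lambda x: x, toy_test))  # 1 error / 5 labels = 0.2
```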
IV. METHOD

A. The proposed model

The network architecture of our CRNN model is shown in Figure 1. The model is mainly composed of two parts: a deep convolutional network for feature extraction and a bidirectional recurrent network for sequence modeling.

A scene text image is transformed into a sequence of frames that are fed into the CNN model to extract feature vectors. The CNN maps each frame to its output vector in isolation. The sequence of feature vectors is then used as the input of the RNN, which takes the whole sequence history into consideration. The recurrent connections allow the network to retain previous inputs as memory in its internal states and to discover temporal correlations among time-steps even far from each other.

Given an input sequence x, a standard RNN computes the hidden vector sequence h = (h_1, h_2, ..., h_T) and the output vector sequence y = (y_1, y_2, ..., y_T) as follows:

h_t = \mathcal{H}(W_{ih} x_t + W_{hh} h_{t-1} + b_h)    (2)
y_t = W_{ho} h_t + b_o    (3)

where the W terms denote weight matrices (e.g. W_{ih} is the input-hidden weight matrix), the b terms denote bias vectors (e.g. b_h is the hidden bias vector), and \mathcal{H} is the hidden layer activation function.

We stack a CTC layer on top of the RNN. With the CTC layer, we can train the RNN model directly on the sequences' labelings without knowing frame-wise labels.

B. Feature extraction

CNNs [17], [18] have shown exceptionally powerful representation capability for images and have achieved state-of-the-art results in various vision problems. In this work, we build a CNN for feature extraction.

We use the CNN as a transforming function f(o) that takes an input image o and outputs a fixed dimensional vector x as the feature. The convolution and pooling operations in deep CNNs are specially designed to extract visual features hierarchically, from local low-level features to robust high-level ones. The hierarchically extracted features are robust to the variable factors that characters face in natural scenes.
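The two stages described in Sections IV-A and IV-B can be sketched together. In the NumPy snippet below (a minimal illustration, not the authors' implementation), a random linear projection stands in for the CNN transforming function f(o), and the recurrence of Eqs. (2)-(3) runs over the resulting feature sequence with H taken to be tanh. The 128-D features and 11 output units follow the experimental setup later in the paper; the 64 hidden units are a placeholder.

```python
import numpy as np

rng = np.random.default_rng(0)
M, H, K = 128, 64, 11      # feature, hidden and output sizes (H and K placeholders)

# Stand-in for the CNN transforming function f(o): 32x20 frame -> 128-D feature.
proj = 0.01 * rng.standard_normal((M, 32 * 20))
def f(frame):
    return np.maximum(proj @ frame.reshape(-1), 0.0)  # ReLU-like feature, illustration only

# Eqs. (2)-(3): h_t = tanh(W_ih x_t + W_hh h_{t-1} + b_h), y_t = W_ho h_t + b_o.
W_ih = 0.01 * rng.standard_normal((H, M))
W_hh = 0.01 * rng.standard_normal((H, H))
W_ho = 0.01 * rng.standard_normal((K, H))
b_h, b_o = np.zeros(H), np.zeros(K)

def rnn_forward(features):
    h_prev, ys = np.zeros(H), []
    for x_t in features:
        h_prev = np.tanh(W_ih @ x_t + W_hh @ h_prev + b_h)  # Eq. (2)
        ys.append(W_ho @ h_prev + b_o)                       # Eq. (3)
    return np.stack(ys)

frames = rng.random((30, 32, 20))       # 30 sliding-window frames of one image
x = np.stack([f(o) for o in frames])    # feature sequence, shape (30, 128)
y = rnn_forward(x)
print(y.shape)                          # (30, 11)
```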
C. Dynamic modeling with Bidirectional RNN and LSTM

One shortcoming of conventional RNNs is that they can only make use of previous context. However, it is reasonable to exploit both previous and future context. For scene text recognition, the left and right contexts are both useful for determining the category of a specific frame image.

In our model, we use a Bidirectional RNN (BRNN) [19] to process sequential data from both directions with two separate hidden layers. The BRNN computes the forward hidden sequence \overrightarrow{h} and the backward hidden sequence \overleftarrow{h}. Each time-step y_t of the output sequence is computed by integrating both directional hidden states:

y_t = W_{\overrightarrow{h} y} \overrightarrow{h}_t + W_{\overleftarrow{h} y} \overleftarrow{h}_t + b_o    (4)

The BRNN provides the output layer with complete past and future context for every time-step in the input sequence. During the forward pass, the input sequence is fed to both directional hidden layers, and the output layer is not updated until both hidden layers have processed the entire input sequence. The backward pass of BPTT for a BRNN is similar to that of a unidirectional RNN, except that the output layer error delta is fed back to both directional hidden layers.
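The bidirectional combination in Eq. (4) can be sketched as follows: one hidden layer reads the sequence forward, the other reads it reversed, and each output mixes the two time-aligned hidden states. This is a schematic of our own, using plain tanh recurrences as stand-ins for the LSTM hidden layers actually used, and all sizes are placeholders.

```python
import numpy as np

def brnn_outputs(x, fwd, bwd, W_fy, W_by, b_o):
    """Eq. (4): y_t = W_{h->y} fwd_h_t + W_{h<-y} bwd_h_t + b_o.
    The backward states come from running on the reversed sequence
    and re-aligning the result."""
    h_fwd = fwd(x)                 # shape (T, H)
    h_bwd = bwd(x[::-1])[::-1]     # process reversed input, then re-align in time
    return h_fwd @ W_fy.T + h_bwd @ W_by.T + b_o

def simple_hidden(W_ih, W_hh, b_h):
    """A tanh recurrence standing in for one directional hidden layer."""
    def run(x):
        h_prev, hs = np.zeros(W_hh.shape[0]), []
        for x_t in x:
            h_prev = np.tanh(W_ih @ x_t + W_hh @ h_prev + b_h)
            hs.append(h_prev)
        return np.stack(hs)
    return run

rng = np.random.default_rng(0)
M, H, K, T = 128, 64, 11, 30
x = rng.standard_normal((T, M))
fwd = simple_hidden(0.01 * rng.standard_normal((H, M)),
                    0.01 * rng.standard_normal((H, H)), np.zeros(H))
bwd = simple_hidden(0.01 * rng.standard_normal((H, M)),
                    0.01 * rng.standard_normal((H, H)), np.zeros(H))
y = brnn_outputs(x, fwd, bwd,
                 W_fy=0.01 * rng.standard_normal((K, H)),
                 W_by=0.01 * rng.standard_normal((K, H)),
                 b_o=np.zeros(K))
print(y.shape)  # (30, 11)
```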
While in principle the RNN is a simple and powerful model, in practice it is unfortunately hard to train properly. An RNN can be seen as a deep neural network unfolded in time. A critical problem when training deep networks is the vanishing and exploding gradient problem [20]. When the error signal is transmitted along the network's recurrent connections, it decays or blows up exponentially. Because of this, the range of context that can be accessed is quite limited.

The Long Short-Term Memory (LSTM) block [21] is designed to control the input, output and transition of signals so as to retain useful information and discard useless information. LSTM blocks are used as the memory blocks of the RNN and can alleviate the vanishing and exploding gradient issues. They use a special structure to form memory cells. The LSTM updates for time-step t given inputs x_t, h_{t-1} and c_{t-1} are:

i_t = \sigma(W_{xi} x_t + W_{hi} h_{t-1} + W_{ci} c_{t-1} + b_i)    (5)
f_t = \sigma(W_{xf} x_t + W_{hf} h_{t-1} + W_{cf} c_{t-1} + b_f)    (6)
c_t = f_t c_{t-1} + i_t \tanh(W_{xc} x_t + W_{hc} h_{t-1} + b_c)    (7)
o_t = \sigma(W_{xo} x_t + W_{ho} h_{t-1} + W_{co} c_t + b_o)    (8)

where sigma is the logistic sigmoid function and i, f, o, c are respectively the input gate, forget gate, output gate and cell activation vectors, all of which are the same size as the hidden vector h. The core part of the LSTM block is the memory cell c, which encodes the information of the inputs observed up to that step. The gates determine whether the LSTM keeps or discards the incoming values: the input gate controls whether the LSTM considers its current input, the forget gate allows it to forget its previous memory, and the output gate decides how much of the memory to transfer to the hidden state. These features enable the LSTM architecture to learn complex long-term dependences.

D. Training

Both the CNN and the RNN are deep models, and when they are stacked together it is difficult to train them jointly. Starting from a random initialization, the supervision information propagated from the RNN to the CNN is quite ambiguous, so we train the two parts separately.

The CNN model is trained with stochastic gradient descent. The samples for training the CNN are obtained by performing forced alignment on the scene text images with a GMM-HMM model [7].

For training the RNN model, we need a loss function that can directly compute the probability of the target labeling from the frame-wise outputs of the RNN given the observations. Connectionist Temporal Classification (CTC) [8] is an objective function designed for sequence labeling problems in which the segmentation of the data is unknown. It requires neither pre-segmented training data nor post-processing to transform the network outputs into labelings. It trains the network to map directly from input sequences to the conditional probabilities of the possible labelings.

A CTC output layer contains one more unit than there are elements in the alphabet L, denoted as L' = L ∪ {nul}. The elements of L'^* are referred to as paths. For an input sequence x, the conditional probability of a path pi in L'^T is given by

p(\pi | x) = \prod_{t=1}^{T} y^t_{\pi_t}    (9)

where y^t_k is the activation of output unit k at time t. An operator B is defined to merge repeated labels and remove blanks; for example, B(-,3,3,-,-,3,2,2) yields the labeling (3,3,2). The conditional probability of a given labeling l is the sum of the probabilities of all paths corresponding to it:

p(l | x) = \sum_{\pi \in B^{-1}(l)} p(\pi | x).    (10)

We use CTC [8] as the objective function. A forward-backward algorithm for CTC [8], similar to the forward-backward algorithm of HMMs, is used to evaluate this probability efficiently. The CTC objective is the negative log probability of the correct labelings over the entire training set:

O^{CTC} = -\sum_{(x,z) \in S} \ln p(z | x).    (11)

Given the partial derivatives of some differentiable loss function L with respect to the network outputs, we use the back propagation through time (BPTT) algorithm to determine the derivatives with respect to the weights. Like standard back propagation, BPTT follows the chain rule to calculate derivatives. The subtle difference is that the loss function depends on the activation of the hidden layer not only through its influence on the output layer, but also through the hidden layer at the next time-step. The error back propagation formula is therefore

\delta^t_h = \theta'(a^t_h) \left( \sum_{k=1}^{K} \delta^t_k w_{hk} + \sum_{h'=1}^{H} \delta^{t+1}_{h'} w_{hh'} \right)    (12)

where

\delta^t_j \equiv \frac{\partial L}{\partial a^t_j}.    (13)

The same weights are reused at every time-step, so we sum over the whole sequence to obtain the derivatives with respect to the network weights:

\frac{\partial L}{\partial w_{ij}} = \sum_{t=1}^{T} \frac{\partial L}{\partial a^t_j} \frac{\partial a^t_j}{\partial w_{ij}} = \sum_{t=1}^{T} \delta^t_j b^t_i.    (14)

E. Decoding

The decoding task is to find the most probable labeling l^* given an input sequence x:

l^* = \arg\max_{l} p(l | x).    (15)

We use a simple and effective approximation: choose the most probable path, then take the labeling \tilde{l}^* corresponding to that path:

\tilde{l}^* = B(\arg\max_{\pi} p(\pi | x)).    (16)
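The operator B and the best-path approximation of Eqs. (15)-(16) are straightforward to implement. The sketch below is illustrative only: the blank index 0 and the random frame-wise posteriors are assumptions, and it does not implement the forward-backward evaluation of Eq. (10) or the CTC objective of Eq. (11), which in practice come from a library implementation.

```python
import numpy as np

def collapse_b(path, blank=0):
    """The operator B: merge consecutive repeats, then drop blanks.
    Example: B(-,3,3,-,-,3,2,2) -> (3,3,2), with '-' encoded as 0."""
    merged = [p for i, p in enumerate(path) if i == 0 or p != path[i - 1]]
    return [p for p in merged if p != blank]

def best_path_decode(y, blank=0):
    """Eq. (16): take the argmax label at every time-step (the most
    probable single path), then map it through B."""
    path = np.argmax(y, axis=1)
    return collapse_b(path.tolist(), blank=blank)

# Toy frame-wise posteriors over L' = {nul, 10 digits} for 8 frames.
rng = np.random.default_rng(0)
y = rng.random((8, 11))
y = y / y.sum(axis=1, keepdims=True)
print(best_path_decode(y))

# Sanity check of B on the example from the text (0 plays the role of '-').
print(collapse_b([0, 3, 3, 0, 0, 3, 2, 2]))  # [3, 3, 2]
```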
V. EXPERIMENTS

This section presents the details of our experiments. We compare the CRNN model with three methods: (1) a baseline that directly uses the CNN to predict the character at each time-step, with a post-processing procedure that merges repeated outputs; (2) the GMM-HMM model; (3) the hybrid CNN-HMM model.

A. Dataset

We explore the use of the CRNN model on a challenging scene text dataset, Street View House Numbers (SVHN) [2]. The dataset contains two versions of sample images. One contains only isolated 32x32 digits, with 73257 images for training and 26032 for testing. The other contains full house number images with variable numbers of unsegmented digits; this full number version is composed of 33402 training images and 13068 testing images. House numbers in the dataset show quite large appearance variability, blurring and unnormalized layouts.

We use the isolated version of the dataset for training the CNN, which is then used to extract features for each sequence frame. The full house number version is used for the HMM based methods and CRNN. 10% of the training samples are randomly split out as a validation set, which is used only for tuning the hyper-parameters of the different models. All the models use the same training, validation and testing sets, which makes the comparison fair.

B. Implementation details

The full number images are normalized to the same height of 32 pixels while keeping the aspect ratio, then transformed into frame sequences by a 32x20 sliding window. Each frame o_t is fed into the CNN to produce the feature x_t = f(o_t). We standardize the features by subtracting the global mean and dividing by the standard deviation.

Our CNN model contains 3 convolution and pooling layer pairs and 2 fully connected layers. The activation units are all rectified linear units (ReLU) [22]. The output sizes of the layers are 32, 32, 64, 128 and 10. The CNN is trained with stochastic gradient descent (SGD) under the cross entropy loss, with a learning rate of 10^-3 and a momentum of 0.9. We choose the output of the CNN's first fully connected layer as the frame features. The features are 128-dimensional and are used for both the HMM based models and CRNN.

We use 11 HMMs, coinciding with the extended alphabet L'. All the HMMs have a 3-state left-to-right topology, except for the nul category, which has a single self-looped state. The GMM-HMM model is trained with the Baum-Welch algorithm.

For the proposed CRNN model, we use a deep bidirectional RNN. We stack 2 RNN hidden layers, both of which are bidirectional, and all hidden units are LSTM blocks. The CRNN model is trained with the BPTT algorithm using SGD, with a learning rate of 10^-4 and a momentum of 0.9.
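To make the implementation details above concrete, the following PyTorch sketch is our own reconstruction under the stated hyper-parameters, not the authors' code (which, as described, trains the CNN separately on isolated digits): a small CNN with 32/32/64 convolutional channels maps each 32x20 frame to a 128-D feature, a two-layer bidirectional LSTM with 128 and 32 cells models the sequence, and an 11-way output layer is scored with a CTC objective. Kernel sizes, padding and the end-to-end wiring are assumptions made for illustration.

```python
import torch
import torch.nn as nn

class FrameCNN(nn.Module):
    """3 conv+pool pairs and a fully connected layer mapping a 32x20
    frame to a 128-D feature; kernel sizes and padding are assumptions."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),   # -> 16x10
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # -> 8x5
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # -> 4x2
        )
        self.fc = nn.Linear(64 * 4 * 2, 128)

    def forward(self, frames):                    # (T, 1, 32, 20)
        z = self.features(frames)
        return torch.relu(self.fc(z.flatten(1)))  # (T, 128) frame features

class CRNN(nn.Module):
    def __init__(self, num_classes=11):           # 10 digits + nul/blank
        super().__init__()
        self.cnn = FrameCNN()
        self.rnn1 = nn.LSTM(128, 128, bidirectional=True)
        self.rnn2 = nn.LSTM(2 * 128, 32, bidirectional=True)
        self.out = nn.Linear(2 * 32, num_classes)

    def forward(self, frames):                    # one image's frames: (T, 1, 32, 20)
        feats = self.cnn(frames).unsqueeze(1)     # (T, batch=1, 128)
        h, _ = self.rnn1(feats)
        h, _ = self.rnn2(h)
        return self.out(h)                        # (T, 1, num_classes)

# CTC training step on one toy sequence (blank index 0 is an assumption).
model, ctc = CRNN(), nn.CTCLoss(blank=0)
frames = torch.randn(30, 1, 32, 20)               # 30 sliding-window frames
log_probs = model(frames).log_softmax(2)          # (T, N=1, C)
target = torch.tensor([[2, 4, 6]])                # hypothetical 3-digit label (class indices 1..10)
loss = ctc(log_probs, target,
           input_lengths=torch.tensor([30]), target_lengths=torch.tensor([3]))
loss.backward()
print(float(loss))
```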
C. Results

[Fig. 2. Training curve during the training of CRNN-2-layers, showing the train and validation curves and the best models under the CTC error and label error criteria.]

a) CNN based sequence labeling: We train the CNN on the isolated version of SVHN, then use it to obtain the character predictions for each frame, choosing the most probable character as the result. After that, we merge consecutively repeated characters to get the final sequence labels. The recognition accuracy on the full number test set is only 0.23, although the CNN model achieves an accuracy of 0.92 on the isolated test set. The house numbers correctly recognized by the CNN mostly contain only 1 or 2 digits. This simple experiment gives an intuition of how hard the problem is and how important the sequential information is.

b) HMM based models: We compare our method with two kinds of HMM based models: the GMM-HMM model and the hybrid CNN-HMM model [7]. The number of mixture components is an important factor for GMM-HMM, so we evaluate different numbers of Gaussian mixture components. The sequence accuracy stops improving at 800 components, and the model tends to overfit when the number of components is increased further. Hybrid CNN-HMM improves on GMM-HMM by using a CNN as the observation model. Its training process is an iterative procedure in which network retraining is alternated with HMM re-alignment to generate more accurate state assignments.

c) CRNN: Recent developments in deep learning show that depth is an important factor for feed-forward models to gain stronger representation capability [23]. We evaluated two architectures of CRNN: one with a single hidden layer, denoted CRNN-1-layer, and one with two hidden layers, denoted CRNN-2-layers. CRNN-1-layer has 128 LSTM memory cells; CRNN-2-layers has 128 cells in the first hidden layer and 32 in the second. Figure 1 shows the architecture of CRNN-2-layers.

Experiment results are presented in Table I. The epoch column lists the epoch at which the best model is reached with respect to each error criterion, and the accuracy column shows the sequence accuracy of the best model on the test set.

TABLE I. COMPARISON OF DIFFERENT CRNN ARCHITECTURES.

                   CRNN-1-layer          CRNN-2-layers
                   epoch    accuracy     epoch    accuracy
CTC error          12       0.84         9        0.90
label error        15       0.86         10       0.91

As can be seen, the deeper architecture performs not only better but also needs fewer training epochs. This is a surprising finding, as intuitively the deeper model has more parameters, which should make it harder to train.

TABLE II. PERFORMANCE COMPARISON OF DIFFERENT MODELS.

Model               Accuracy
CNN                 0.23
GMM-HMM             0.56
Hybrid CNN-HMM      0.81
CRNN                0.91

The performance comparison of CRNN with the other models is presented in Table II. As shown by the experiments, CRNN outperforms the CNN baseline and both HMM based models.

VI. CONCLUSION

We have presented the Convolutional Recurrent Neural Network (CRNN) model for scene text recognition. It uses a CNN to extract robust high-level features and an RNN to learn sequence dependences. The model eliminates the need for character segmentation when doing scene text recognition. We apply our method to street view images and achieve promising results; CRNN performs much better than the HMM based methods. However, the CNN is still trained separately: while the recognition process is segmentation-free, we still need cropped character samples for training the CNN. To eliminate the need for cropped samples in training, we plan to investigate using forced alignment with a GMM-HMM to bootstrap the CRNN. A better method would be to perform joint training of the CNN and RNN directly from scratch. Another promising direction is to investigate the potential of stacking more hidden LSTM layers in the RNN.

ACKNOWLEDGMENT

This work was supported by the Open Project Program of the State Key Laboratory of Mathematical Engineering and Advanced Computing (No. 2015A04).
REFERENCES

[1] A. Bissacco, M. Cummins, Y. Netzer, and H. Neven, "PhotoOCR: Reading text in uncontrolled conditions," in ICCV, 2013, pp. 785-792.
[2] Y. Netzer, T. Wang, A. Coates, A. Bissacco, B. Wu, and A. Y. Ng, "Reading digits in natural images with unsupervised feature learning," in NIPS Workshop on Deep Learning and Unsupervised Feature Learning, vol. 2011, no. 2, p. 5, 2011.
[3] H. A. Bourlard and N. Morgan, Connectionist Speech Recognition: A Hybrid Approach. Springer Science & Business Media, 2012, vol. 247.
[4] G. E. Dahl, D. Yu, L. Deng, and A. Acero, "Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition," IEEE Transactions on Audio, Speech, and Language Processing, vol. 20, no. 1, pp. 30-42, 2012.
[5] U.-V. Marti and H. Bunke, "Using a statistical language model to improve the performance of an HMM-based cursive handwriting recognition system," International Journal of Pattern Recognition and Artificial Intelligence, vol. 15, no. 01, pp. 65-90, 2001.
[6] S. Espana-Boquera, M. J. Castro-Bleda, J. Gorbe-Moya, and F. Zamora-Martinez, "Improving offline handwritten text recognition with hybrid HMM/ANN models," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 33, no. 4, pp. 767-779, 2011.
[7] Q. Guo, D. Tu, J. Lei, and G. Li, "Hybrid CNN-HMM model for street view house number recognition," in ACCV 2014 Workshops, ser. Lecture Notes in Computer Science, C. Jawahar and S. Shan, Eds., 2015, vol. 9008, pp. 303-315.
[8] A. Graves, S. Fernández, F. J. Gomez, and J. Schmidhuber, "Connectionist temporal classification: Labelling unsegmented sequence data with recurrent neural networks," in ICML, 2006, pp. 369-376.
[9] A. Graves and J. Schmidhuber, "Offline handwriting recognition with multidimensional recurrent neural networks," in NIPS, 2008, pp. 545-552.
[10] A. Graves and N. Jaitly, "Towards end-to-end speech recognition with recurrent neural networks," in ICML, 2014, pp. 1764-1772.
[11] J. Donahue, L. A. Hendricks, S. Guadarrama, M. Rohrbach, S. Venugopalan, K. Saenko, and T. Darrell, "Long-term recurrent convolutional networks for visual recognition and description," 2014.
[12] A. Karpathy and F.-F. Li, "Deep visual-semantic alignments for generating image descriptions," 2014.
[13] A. Vinciarelli, S. Bengio, and H. Bunke, "Offline recognition of unconstrained handwritten texts using HMMs and statistical language models," 2004, pp. 709-720.
[14] M. Kozielski, P. Doetsch, and H. Ney, "Improvements in RWTH's system for off-line handwriting recognition," in ICDAR, 2013, pp. 935-939.
[15] T. Bluche, H. Ney, and C. Kermorvant, "Feature extraction with convolutional neural networks for handwritten word recognition," in ICDAR, 2013, pp. 285-289.
[16] K. Elagouni, C. Garcia, F. Mamalet, and P. Sébillot, "Text recognition in videos using a recurrent connectionist approach," in ICANN (2), 2012, pp. 172-179.
[17] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proceedings of the IEEE, vol. 86, no. 11, pp. 2278-2324, Nov. 1998.
[18] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," in Advances in Neural Information Processing Systems 25, F. Pereira, C. Burges, L. Bottou, and K. Weinberger, Eds., 2012, pp. 1097-1105.
[19] M. Schuster and K. K. Paliwal, "Bidirectional recurrent neural networks," IEEE Transactions on Signal Processing, vol. 45, no. 11, pp. 2673-2681, Nov. 1997.
[20] R. Pascanu, T. Mikolov, and Y. Bengio, "On the difficulty of training recurrent neural networks," in ICML (3), 2013, pp. 1310-1318.
[21] S. Hochreiter and J. Schmidhuber, "Long short-term memory," 1997, pp. 1735-1780.
[22] V. Nair and G. E. Hinton, "Rectified linear units improve restricted Boltzmann machines," in ICML, 2010, pp. 807-814.
[23] R. Pascanu, G. Montufar, and Y. Bengio, "On the number of response regions of deep feed forward networks with piece-wise linear activations," in International Conference on Learning Representations (ICLR 2014), Banff, Alberta, Canada, 2013.
