Long Timescale Credit Assignment in Neural Networks with External Memory

Steven S. Hansen
Department of Psychology
Stanford University
Stanford, CA 94303
sshansen@stanford.edu

30th Conference on Neural Information Processing Systems (NIPS 2016), Barcelona, Spain.
1 Introduction

Neural networks with external memories (NNEMs), such as the Neural Turing Machine [1] and Memory Networks [2], have often been compared to the human hippocampus in their ability to maintain episodic information over long timescales [3]. But so far, these networks have only been trained on tasks requiring memory storage comparable to a few minutes, whereas the hippocampus stores information on the order of years. It is well known that this sort of long-term episodic storage is vital for overcoming the problem of catastrophic interference [4], which is central to the challenge of continual learning.
While some of this discrepancy can be attributed to the short-term nature of current benchmark tasks, a major bottleneck comes from the reliance on back-propagation-through-time to learn the embedding function that maps the high-dimensional observations onto lower-dimensional representations suitable for storage in the external memory. This is problematic because memories might be used an extremely long time after the embedding function was called, and back-propagation-through-time would require us to hold onto the forward activations of this function for the entire duration if we want to back-propagate through it.
Considering that in a continual learning setup, an external memory store on the order of hundreds of thousands, or even millions, of separate embeddings could be needed, storing all of the intermediate embedding activations is woefully intractable. A naive solution would be to simply store all of the memories in observation space in addition to embedding space and recompute the embeddings as needed for credit assignment, but a significant computational burden would result from these redundant forward passes. More importantly, the high dimensionality of the observation space would make storing these additional memories take an unreasonable amount of memory. Indeed, Blundell et al. [5] noted that even in the relatively modest Atari domain, storing a few million observations in pixel space would require upwards of 300 gigabytes.¹

¹ While their work utilizes a long timescale external memory, it does not address the problems raised here, as the embedding function was not optimized for usage in an external memory. Rather, it was either a random projection or pre-trained on a separate objective. Indeed, the low performance of the latter suggests that end-to-end optimization might be necessary for adaptive embedding functions to be worthwhile in this context.
2 Methods
2.1 Long Timescale Credit Assignment in NNEMs
Credit assignment in traditional recurrent neural networks usually involves back-propagating through a long chain of tied weight matrices. The length of this chain scales linearly with the number of time-steps, as the same network is run at each time-step. This creates many problems, such as vanishing gradients, that have been well studied [7]. In contrast, a NNEM's recurrent activity doesn't involve a long chain of computation (though some architectures, such as the NTM, do utilize a traditional recurrent architecture as a controller). Rather, the externally stored embedding vectors are used at each time-step, but no messages are passed from previous time-steps. This means that vanishing gradients aren't a problem, as all of the necessary gradient paths are short. However, these paths are extremely numerous (one per embedding vector in memory) and each is reused for a very long time (until its embedding leaves the memory). Thus, the forward-pass information of each memory must be stored for the entire duration of the memory. This is problematic because this additional storage far surpasses that of the actual memories, to the extent that large memories are infeasible to back-propagate through in high-dimensional settings.
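As a concrete illustration of this storage cost (an editorial sketch, not part of the original paper), the following shows the naive approach in which every stored embedding keeps its forward graph alive. It assumes a PyTorch-style autograd setup; the layer sizes, the 64-dimensional embedding, and names such as `embed_net` are illustrative placeholders rather than the paper's exact configuration.

```python
import torch
import torch.nn as nn

# Illustrative embedding network; the paper's architecture differs in detail.
embed_net = nn.Sequential(nn.Linear(784, 256), nn.Tanh(), nn.Linear(256, 64))

memory = []  # external store of embedding vectors

# Naive credit assignment: keep each embedding attached to its autograd graph,
# so the forward activations inside embed_net for every stored observation are
# retained until that embedding is evicted from memory.
for _ in range(5000):
    obs = torch.randn(1, 784)      # stand-in for an observation
    memory.append(embed_net(obs))  # graph (and activations) kept alive

# When a gradient w.r.t. some stored embedding finally becomes available,
# back-propagation reuses the retained activations -- but holding 5000 such
# graphs costs far more memory than the 64-dimensional embeddings themselves.
memory[0].backward(torch.randn_like(memory[0]))
```

The mechanisms below are alternatives that store only detached embedding vectors and deliver credit to the embedding network by other means.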
Figure 1: The computational graphs of a NNEM with differing credit assignment mechanisms. Forward-pass information is shown in black, true gradients in red, and approximate gradients in blue. A) The standard method for back-propagating through an external memory. Note that the left half of this graph must be kept frozen until the memory leaves. B) The synthetic gradient model, M, assigns credit when an embedding first enters the memory, and is itself trained from the older memories' gradients. C) Exact gradient reinstatement. The dashed memories are online recalculations. D) Approximate gradient reinstatement. The memories used for inference and credit assignment are different.
2.2 Synthetic Gradients

Synthetic gradients are a recent technique created to deal with situations where a part of a network is 'blocked' from further usage [6], such as in the case previously discussed, whereby information must be stored until all gradient calculations are complete. The way around this problem is to treat the gradient signal as just another function to be approximated: namely, a function mapping from an activation vector, and any contextual information, onto the error gradient w.r.t. that activation vector. This auxiliary network can be used to calculate an estimate of the gradient before the true gradient is available. The synthetic-gradient-producing network can itself be trained in a purely supervised manner by comparing its estimates to the true gradient.

As shown in Figure 1B, we can apply this approach to NNEMs by updating the embedding network with a synthetic gradient immediately after the embedding vector is calculated. True gradient signals are available for all of the previously embedded memories, and these can be used to train the synthetic gradient network.
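As a sketch only (assuming PyTorch-style autograd; the synthetic gradient network's architecture and its 10-dimensional context input, e.g. a one-hot label, are hypothetical choices not specified by the text), the scheme splits into two steps: an immediate update of the embedding network using the estimated gradient, and a later regression of the estimator onto the true gradient once it arrives.

```python
import torch
import torch.nn as nn

embed_net = nn.Sequential(nn.Linear(784, 256), nn.Tanh(), nn.Linear(256, 64))
synth_grad_net = nn.Sequential(nn.Linear(64 + 10, 256), nn.Tanh(), nn.Linear(256, 64))
opt_embed = torch.optim.Adam(embed_net.parameters(), lr=1e-4)
opt_synth = torch.optim.Adam(synth_grad_net.parameters(), lr=1e-4)

def embed_with_synthetic_credit(obs, context):
    """Compute an embedding and immediately credit embed_net with M's estimate."""
    key = embed_net(obs)
    g_hat = synth_grad_net(torch.cat([key.detach(), context], dim=-1))
    opt_embed.zero_grad()
    key.backward(g_hat.detach())   # inject the estimated gradient right away
    opt_embed.step()
    return key.detach()            # only the detached vector enters the memory

def train_synthetic_gradient(stored_key, context, true_grad):
    """Once the true gradient for an old memory is available, regress M onto it."""
    g_hat = synth_grad_net(torch.cat([stored_key, context], dim=-1))
    loss = ((g_hat - true_grad) ** 2).mean()
    opt_synth.zero_grad()
    loss.backward()
    opt_synth.step()
```

Here `true_grad` stands for whatever gradient the readout loss eventually delivers to that stored key.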
2.3 Reinstatement for Credit Assignment

One way to get around the need to hold onto forward-pass information is to recalculate the forward pass whenever gradient information becomes available. However, as previously mentioned, even the observations are too large to store in our domain of interest, which prevents a direct reinstatement of the forward pass. Instead, we rely on a learned autoencoder to reinstate the observation, and then use the embedding network to recalculate the forward pass. Since the recalculated embedding vector is unlikely to perfectly match the one stored in memory, it is non-obvious how to utilize the error gradient w.r.t. the vector in memory.

There are two distinct possibilities: ignore the mismatch and directly use the memory's gradient, or use the fresh embedding instead of the memory during inference. These two methods are illustrated in Figure 1C and D, and while they aren't mutually exclusive, they are treated as such for the purposes of this paper, to aid in interpreting the initial results. The latter option yields a true gradient signal whereas the former is only an approximation, but the latter relies on the reconstruction being good enough not to harm the inference process.
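The two options can be sketched as follows (assuming PyTorch-style autograd; `encoder` is the embedding function, `decoder` the autoencoder's decoder, and `opt` an optimizer over the encoder's parameters; the mapping onto the panels of Figure 1 is schematic, not a claim about the paper's exact implementation):

```python
import torch

def approximate_reinstatement(stored_key, grad_wrt_stored_key, encoder, decoder, opt):
    """Option 1: the stored key was used for inference, but its gradient is
    pushed through a freshly recomputed forward pass, ignoring the mismatch
    between the stored and recomputed embeddings."""
    obs_recon = decoder(stored_key).detach()   # reinstate the observation
    fresh_key = encoder(obs_recon)             # recompute the forward pass
    opt.zero_grad()
    fresh_key.backward(grad_wrt_stored_key)    # reuse the memory's gradient as-is
    opt.step()

def exact_reinstatement(stored_key, encoder, decoder):
    """Option 2: the recomputed embedding itself replaces the stored key during
    inference, so the gradient that later arrives from the readout loss is
    exact and flows through this freshly built graph."""
    obs_recon = decoder(stored_key).detach()
    return encoder(obs_recon)                  # keep the graph; use it for inference
```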
3 Results

To test our model, we used the standard MNIST benchmark, the objective of which is to correctly classify images of numerals by the integer they represent. While this task is trivial for modern networks with the appropriate architecture, we use it here as it represents one of the simplest tasks that could, under certain conditions, still encounter the credit assignment problem previously described.

Since our aim is to tractably train embedding functions within a network with an external memory, we constructed a simple NNEM consisting of an embedding network and an external store that holds the embeddings of the 5000 most recently encountered images along with their correct labels. Classifications are made by taking the cosine similarity between the embedding of the current image and all of the embeddings in memory, and using the normalized result to weight the class labels associated with the memorized embeddings. As our focus is on weight updates derived from the memorized embeddings, we stop the gradient going into the embedding function from the current image, as the memory-derived updates are sufficient for good performance in this simplified setting. In addition, we found that completely unsupervised learning of an autoencoder was sufficient to yield decent performance. Since we are primarily interested in getting the supervised signal to travel through the stored embeddings, we only allow the reconstruction error to modify the decoder's weights, as the decoder isn't used for inference the way the encoder/embedding function is.
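A minimal sketch of this readout (assuming PyTorch; the softmax normalization and one-hot labels are assumptions, since the text only says the similarities are normalized):

```python
import torch
import torch.nn.functional as F

def classify_from_memory(image, embed_net, mem_keys, mem_labels):
    """Cosine-similarity readout over the external store.

    mem_keys:   (N, d) embeddings of the N most recently seen images
    mem_labels: (N, 10) one-hot labels for those images
    """
    query = embed_net(image).detach()   # stop-gradient on the current image
    sims = F.cosine_similarity(query.unsqueeze(0), mem_keys, dim=-1)  # (N,)
    weights = F.softmax(sims, dim=0)    # normalized similarity weights
    return weights @ mem_labels         # weighted vote over the class labels
```

The resulting class distribution can be compared to the true label (e.g. with a cross-entropy loss), and the gradient of that loss w.r.t. each entry of `mem_keys` is the signal that the credit-assignment mechanisms of Section 2 must deliver back to the embedding network.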
The results are shown in Figure 2. All conditions utilize the same network architecture² and hyperparameters, differing only in their mechanism for credit assignment. The reinstatement method with exact gradients performed the best, but requires the most additional computation. Synthetic gradient performance is perhaps lower than expected, but it is possible that this is due to the network architecture: the surrounding memories affect the gradient of any individual memory, but these can't be an input to the synthetic gradient network, as future memories aren't observable. The approximate gradient version of the reinstatement method was the most prone to divergence, which makes sense given the approximation involved.

² A single hidden layer for each piece of the network, with dimension 256. ADAM was used for optimization with a learning rate of 1e-4 elsewhere.

Figure 2: Performance results averaged across 10 runs. Exact gradient reinstatement is shown in blue, approximate gradient reinstatement is shown in green, and synthetic gradients are shown in red. All accuracy results are from a separate validation set.
4 Future Directions

The absolute performance on this MNIST task isn't useful in itself, as the performance was purposely sabotaged to isolate the novel mechanisms under discussion. However, while the relative performance changes demonstrate the promise of reinstatement-based credit assignment, the contrived domain necessitates further work to confirm these results on large-scale problems. One possibility would be to tackle reinforcement learning problems like Atari, using a variant of the model-free episodic control network [5] modified such that the embedding function is learned online. While the exact modifications needed to make that setting compatible with this approach to credit assignment are outside the scope of this paper, it is clear that this domain is one of the few present in the current literature that calls for a long-term episodic store.
In a more complex environment, additional nuances emerge which might improve credit assignment performance. Given the reliance on accurate decoding for proper reinstatement, pre-training the autoencoder offline might be crucial. Vanilla autoencoders also tend to overfit, which might slow performance when used to assign credit to new examples. Variational autoencoders have been shown to be more robust [8] and also provide uncertainty estimates that could be useful when deciding which embeddings to discard. A hybrid approach utilizing synthetic gradients and both forms of AE-based credit assignment might also be worthwhile, as these approaches appear to have complementary weaknesses.
While the focus has been on credit assignment, adaptive long-term episodic memories raise other concerns, such as the fact that stored embeddings go 'stale' rapidly as the embedding function's parameters change. If the rate of change is slow enough, then operating the memories as a queue (first in, first out) should limit the impact this has on performance. But in many setups, embeddings are discarded as a function of their usage [5], such that frequently needed embeddings are kept around long enough for staleness to become problematic. While only speculative, an autoencoder could also be used to 'freshen' these embeddings by re-encoding them using the current parameters.
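As a purely speculative sketch of that freshening step (reusing the encoder/decoder pair from Section 3; whether to freshen all memories or only heavily reused ones is left open by the text):

```python
import torch

def freshen(stale_keys, encoder, decoder):
    """Re-encode stale memories: decode each stored embedding back into
    observation space and re-embed it with the current encoder parameters."""
    with torch.no_grad():   # a maintenance step, not a credit-assignment step
        return encoder(decoder(stale_keys))
```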
References

[1] Graves, A., Wayne, G., & Danihelka, I. (2014). Neural Turing machines. arXiv preprint arXiv:1410.5401.
[2] Weston, J., Chopra, S., & Bordes, A. (2014). Memory networks. arXiv preprint arXiv:1410.3916.
[3] Graves, A., Wayne, G., Reynolds, M., Harley, T., Danihelka, I., Grabska-Barwińska, A., ... & Badia, A. P. (2016). Hybrid computing using a neural network with dynamic external memory. Nature.
[4] McClelland, J. L., McNaughton, B. L., & O'Reilly, R. C. (1995). Why there are complementary learning systems in the hippocampus and neocortex: insights from the successes and failures of connectionist models of learning and memory. Psychological Review, 102(3), 419.
[5] Blundell, C., et al. (2016). Model-free episodic control. arXiv preprint arXiv:1606.04460.
[6] Jaderberg, M., Czarnecki, W. M., Osindero, S., Vinyals, O., Graves, A., & Kavukcuoglu, K. (2016). Decoupled neural interfaces using synthetic gradients. arXiv preprint arXiv:1608.05343.
[7] Hochreiter, S. (1998). The vanishing gradient problem during learning recurrent neural nets and problem solutions. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 6(02), 107-116.
[8] Kingma, D. P., & Welling, M. (2013). Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114.