Learning what to look in chest X-rays with a recurrent visual attention model

Petros-Pavlos Ypsilantis and Giovanni Montana
Department of Biomedical Engineering, King's College London, London, SE1 7EH
[email protected], [email protected]

Abstract

X-rays are commonly performed imaging tests that use small amounts of radiation to produce pictures of the organs, tissues, and bones of the body. X-rays of the chest are used to detect abnormalities or diseases of the airways, blood vessels, bones, heart, and lungs. In this work we present a stochastic attention-based model that is capable of learning which regions within a chest X-ray scan should be visually explored in order to conclude that the scan contains a specific radiological abnormality. The proposed model is a recurrent neural network (RNN) that learns to sequentially sample the entire X-ray and focus only on informative areas that are likely to contain the relevant information. We report on experiments carried out with more than 100,000 X-rays containing enlarged hearts or medical devices. The model has been trained using reinforcement learning methods to learn task-specific policies.

1 Introduction

Chest X-rays (CXR) are the most commonly used diagnostic exams for chest-related diseases. They use a very small dose of ionizing radiation to produce pictures of the inside of the chest. CXR scans help radiologists to diagnose or monitor treatment for conditions such as pneumonia, heart failure, emphysema, lung cancer, positioning of medical devices, as well as fluid and air collection around the lungs. An expert radiologist is typically able to detect radiological abnormalities by looking in the 'right places' and making quick comparisons to normal standards. For example, for the detection of an enlarged heart, or cardiomegaly, the size of the heart is assessed in relation to the total thoracic width.
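The heart-to-thorax comparison mentioned above is commonly quantified as the cardiothoracic ratio (CTR): the maximal width of the cardiac silhouette divided by the maximal internal thoracic width, with a ratio above roughly 0.5 on a posteroanterior film conventionally taken to suggest cardiomegaly. A hypothetical illustration (the function names and the fixed 0.5 threshold are ours, not part of the paper):

```python
def cardiothoracic_ratio(heart_width_px: float, thoracic_width_px: float) -> float:
    """Cardiothoracic ratio: maximal cardiac silhouette width over maximal
    internal thoracic width, both measured in pixels on the same film."""
    if thoracic_width_px <= 0:
        raise ValueError("thoracic width must be positive")
    return heart_width_px / thoracic_width_px

def suggests_cardiomegaly(ctr: float, threshold: float = 0.5) -> bool:
    """A CTR above ~0.5 on a PA chest film is conventionally read as an
    enlarged cardiac silhouette."""
    return ctr > threshold
```

The RAM model described later never computes this ratio explicitly; the point of the paper is that a suitable exploration policy for the cardiac area can be learned end-to-end instead.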
Given that chest X-rays are routinely used to detect several abnormalities or diseases, a careful interpretation of a scan requires expertise and time resources that are not always available, especially since large numbers of CXR exams need to be reported daily. This leads to diagnostic errors that, for some pathologies, have been estimated to be in the range of 23% [10].

Our ultimate objective is to develop a fully-automated system that learns to identify radiological abnormalities using only large volumes of labelled historical exams. We are motivated by recent work on attention-based models, which have been used for digit classification [6], sequential prediction of street view house numbers [1] and a fine-grained categorization task [7]. However, we are not aware of applications of such attention models to the challenging task of chest X-ray interpretation. Here we report on the initial performance of a recurrent attention model (RAM), similar to the model originally presented in [6], and trained end-to-end on a very large number of historical X-ray exams.

30th Conference on Neural Information Processing Systems (NIPS 2016), Barcelona, Spain.

2 Dataset

For this study we collected and prepared a dataset consisting of 745,480 X-ray plain films of the chest along with their corresponding radiological reports. All the historical exams were extracted from the historical archives of Guy's and St Thomas' Hospital in London (UK), and covered more than a decade, from 2005 to 2016. Each scan was labelled according to the clinical findings that were originally reported by the consultant radiologist and recorded in an electronic clinical report. The labelling task was automated using a natural language processing (NLP) system that implements a combination of machine learning and rule-based algorithms for clinical entity recognition, negation detection and entity classification. An early version of the system used a bidirectional long short-term memory (LSTM) model for modelling the radiological language and detecting clinical findings and their negations [2].

For the purpose of this study, we only used scans labelled as normal (i.e. those with no reported abnormalities), and those reported as having an enlarged heart (i.e. a large cardiac silhouette) and a medical device (e.g.
a pacemaker). The number of scans within these three categories was 57,595, 31,528 and 22,804, respectively. We were interested in the detection of enlarged hearts and medical devices, and a separate model was trained for each task and tested on 6,000 and 4,000 randomly selected exams, respectively. All the remaining images were used for both training and validation. In all our experiments we scaled the size of the images down to 256×256 pixels.

3 Recurrent attention model (RAM)

The RAM model implemented here is similar to the one originally proposed in [6]. Mimicking the human visual attention mechanism, this model learns to focus on and process only a certain region of an image that is relevant to the classification task. In this section we provide a brief overview of the model and describe how our implementation differs from the original architecture. We refer the reader to [6] for further details on the training algorithm.

Glimpse layer: At each time t, the model does not have full access to the input image but instead receives a partial observation, or "glimpse", denoted by x_t. The glimpse consists of two image patches of different size centred at the same location l_t, each one capturing a different context around l_t. Both patches are matched in size and passed as input to an encoder, as illustrated in Figure 1.

Encoder: The encoder implemented here differs from the one used in [6]. In our application we have a complex visual environment featuring high variability in both luminance and object complexity. This is due to the large variability in patients' anatomy as well as in image acquisition, as the X-ray scans were acquired using more than 20 different X-ray devices. The goal of the encoder is to compress the information of the glimpse by extracting a robust representation. To achieve this, each image of the glimpse is passed through a stack of two convolutional autoencoders with max-pooling [5]. Each convolutional autoencoder in the stack is pre-trained separately from the RAM model. During training, at each time t the glimpse representation is concatenated with the location representation and passed as input to a fully connected (FC) layer. The output of the FC layer is denoted as b_t and is passed as input to the core RAM model, as seen in Figure 1.
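The two-scale glimpse extraction described above can be sketched in NumPy as follows. The paper does not specify the patch sizes, the border-handling strategy or the down-scaling method, so the values below (32 and 64 pixel patches, zero padding, block-average down-scaling) are our assumptions:

```python
import numpy as np

def extract_glimpse(image, center, sizes=(32, 64)):
    """Extract square patches of different sizes centred at the same
    location l_t, then down-scale the larger patches to the smallest
    size so they can be stacked and fed to the encoder."""
    patches = []
    smallest = min(sizes)
    for s in sizes:
        half = s // 2
        # Zero-pad so patches near the border stay full-sized (assumption).
        padded = np.pad(image, half, mode="constant")
        y, x = center[0] + half, center[1] + half
        patch = padded[y - half:y + half, x - half:x + half]
        # Down-scale by block averaging to the smallest patch size (assumption).
        factor = s // smallest
        patch = patch.reshape(smallest, factor, smallest, factor).mean(axis=(1, 3))
        patches.append(patch)
    return np.stack(patches)  # shape: (num_scales, smallest, smallest)
```

For a 256×256 scan, `extract_glimpse(img, (128, 128))` returns a `(2, 32, 32)` array: the fine patch plus a coarser patch covering twice the context at the same centre, matching the "red frames of different size" of Figure 1.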
Core RAM: In each time step t, the output vector b_t and the previous hidden representation h_{t-1} are passed as input to the LSTM layer. The locator receives the hidden representation h_t from the LSTM unit and passes it on to a FC layer, resulting in a vector o_t ∈ R^2 (see Figure 1). The locator then decides the position of the next glimpse by sampling l_{t+1} ∼ N(o_t, Σ), i.e. from a normal distribution with mean o_t and diagonal covariance matrix Σ. The location l_{t+1} represents the x-y coordinates of the glimpse at time step t+1. At the very first step, we initiate the algorithm at the center of the image, and always use a fixed variance σ².

4 Results

Table 1 summarises the classification performance of the RAM model alongside the performance of state-of-the-art convolutional neural networks trained and tested on the same dataset. RAM, using 5 million parameters, reaches 90.6% and 91.0% accuracy for the detection of medical devices and enlarged hearts, respectively. For the same tasks, Inception-v3 [9] achieves the highest accuracy with 91.3% and 91.4%, but uses 4 times more parameters compared to the RAM model.

Figure 1: RAM. At each time step t the Core RAM samples a location l_{t+1} of where to attend next. The location l_{t+1} is used to extract the glimpse (red frames of different size). The image patches are down-scaled and passed through the encoder. The representation of the encoder and the previous hidden state h_{t-1} of the Core RAM are passed as inputs to the LSTM of step t. The locator receives as input the hidden state h_t of the current LSTM and then samples the location coordinates for the glimpse in the next step t+1. This process continues recursively until step n, where the output of the LSTM h_n is used to classify the input image.

Table 1: Accuracy (%) on the classification between images with normal radiological appearance and images with enlarged hearts and medical devices.
Model               Heart Enlarged   Medical Devices   Number of Parameters
VGG [8]             89.1             86.2              ~68 million
ResNet-18 [3]       89.5             88.4              ~11 million
Inception-v3 [9]    91.4             91.3              ~21 million
AlexNet [4]         89.7             87.4              ~70 million
RAM                 91.0             90.6              ~5 million
baseline ResNet-18  90.0             87.2              ~5 million
baseline VGG        88.9             86.3              ~5 million
baseline AlexNet    90.0             86.3              ~5 million

In Figure 2 we illustrate the performance on the validation set, and highlight the locations attended by the model when trying to detect medical devices. Here it can be noted how initially the model explores randomly selected portions of the image, and its classification performance remains low. After a specific number of epochs, the model discovers that the most informative parts of a chest X-ray are those containing the lungs and spine, and selectively prefers those regions in subsequent paths. This is a reasonable policy since most of the medical devices to be found in chest X-rays, such as pacemakers and tubes, are located in those areas.

Figure 3 (A) shows the locations mostly attended by the RAM model when looking for medical devices. From this figure it is obvious that the learnt policy explores only the relevant areas where these devices can generally be found. Two examples of paths followed by the algorithm after learning the policy are illustrated in Figures 3 (B), (C). In these examples, starting from the center of the image, the algorithm moves closer to a region that is likely to contain a pacemaker, which is then correctly identified. The circle and triangle points (in red) indicate the coordinates of the first and last glimpse in the learnt policy, respectively.

Analogously, Figure 4 (A) highlights frequently explored locations when trying to discriminate between normal and enlarged hearts. Here it can be observed how the model learns to focus on the cardiac area. Two samples of the learned policy are illustrated in Figure 4 (B), (C). The trajectories followed here demonstrate how the policy has learned that exploring the extremities of the heart is required in order to conclude whether the heart is enlarged or not.

Figure 2: Top: Accuracy (%) on the validation set during the training of the model. Bottom: Image locations that the model attends during the validation.
The grid corresponds to image areas of size 25×25 pixels. We split the total number of epochs into chunks of 500 epochs. For each chunk the model is validated 100 times and the locations from each validation are summarized in the corresponding image. Image regions with low transparency correspond to locations that the model visits with high frequency.

Figure 3: (A) Image locations attended by the RAM model for the detection of medical devices. (B) and (C) are two different samples of the learnt policy on test images.

Figure 4: (A) Image locations attended by the RAM model for the detection of enlarged hearts. (B) and (C) are two different samples of the learnt policy on test images.

5 Conclusion and Perspectives

In this work we have investigated whether a visual attention mechanism, the RAM model, is capable of learning how to interpret chest X-ray scans. Our experiments show that the model not only has the potential to achieve classification performance comparable to state-of-the-art convolutional architectures using far fewer parameters, but also learns to identify specific portions of the images that are likely to contain the anatomical information required to reach correct conclusions. The relevant areas are explored according to policies that seem appropriate for each task. Current work is being directed towards enabling the model to learn each policy as quickly and precisely as possible using full-scale images and for a much larger number of clinically important radiological classes.

References

[1] Ba, J., Mnih, V., Kavukcuoglu, K.: Multiple object recognition with visual attention. In: ICLR (2015)
[2] Cornegruta, S., Bakewell, R., Withey, S., Montana, G.: Modelling radiological language with bidirectional long short-term memory networks. In: EMNLP (2016)
[3] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. arXiv:1512.03385v1 (2015)
[4] Krizhevsky, A., Sutskever, I., Hinton, G.: ImageNet classification with deep convolutional neural networks. In: NIPS, pp. 1106–1114 (2012)
[5] Masci, J., Meier, U., Ciresan, D., Schmidhuber, J.: Stacked convolutional auto-encoders for hierarchical feature extraction.
In: ICANN (2011)
[6] Mnih, V., et al.: Recurrent models of visual attention. In: NIPS (2014)
[7] Sermanet, P., Frome, A., Real, E.: Attention for fine-grained categorization. arXiv:1412.7054v3 (2015)
[8] Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: ICLR (2015)
[9] Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. arXiv:1512.00567v3 (2015)
[10] Tudor, G., Finlay, D., Taub, N.: An assessment of inter-observer agreement and accuracy when reporting plain radiographs. Clinical Radiology 52(3), 235–238 (1997)
