
Video text detection PDF

272 Pages·2014·6.161 MB·English

Preview Video text detection

Advances in Computer Vision and Pattern Recognition

Tong Lu, Shivakumara Palaiahnakote, Chew Lim Tan, Wenyin Liu

Video Text Detection

Founding editor: Sameer Singh, Rail Vision Europe Ltd., Castle Donington, Leicestershire, UK
Series editor: Sing Bing Kang, Interactive Visual Media Group, Microsoft Research, Redmond, WA, USA
More information about this series at http://www.springer.com/series/4205

Tong Lu, Department of Computer Science and Technology, Nanjing University, Nanjing, China
Shivakumara Palaiahnakote, Faculty of CSIT, University of Malaya, Kuala Lumpur, Malaysia
Wenyin Liu, Multimedia Software Engineering Research Center, City University of Hong Kong, Kowloon Tong, Hong Kong SAR
Chew Lim Tan, National University of Singapore, Singapore, Singapore

ISSN 2191-6586, ISSN 2191-6594 (electronic)
ISBN 978-1-4471-6514-9, ISBN 978-1-4471-6515-6 (eBook)
DOI 10.1007/978-1-4471-6515-6
Springer London Heidelberg New York Dordrecht
Library of Congress Control Number: 2014945203
© Springer-Verlag London 2014

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. Exempted from this legal reservation are brief excerpts in connection with reviews or scholarly analysis or material supplied specifically for the purpose of being entered and executed on a computer system, for exclusive use by the purchaser of the work. Duplication of this publication or parts thereof is permitted only under the provisions of the Copyright Law of the Publisher’s location, in its current version, and permission for use must always be obtained from Springer.
Permissions for use may be obtained through RightsLink at the Copyright Clearance Center. Violations are liable to prosecution under the respective Copyright Law.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

While the advice and information in this book are believed to be true and accurate at the date of publication, neither the authors nor the editors nor the publisher can accept any legal responsibility for any errors or omissions that may be made. The publisher makes no warranty, express or implied, with respect to the material contained herein.

Printed on acid-free paper
Springer is part of Springer Science+Business Media (www.springer.com)

Preface

With the increasing availability of low cost portable digital video recorders, we are witnessing a rapid growth of video data archives today. The need for efficient indexing and retrieval has drawn the attention of researchers towards handling the video databases. However, efficiently handling video content is still a difficult task in the pattern recognition and computer vision community, especially when the size of the database increases dramatically. A lot of ideas have been proposed, but like other frontiers in this community, there is no reliable approach that has theoretical grounding in video content analysis.

Fortunately, studies have shown that humans often pay their first attention to text over other objects in video. It is probably due to the ability of humans to simultaneously process multiple channels of scene context and then focus the attention on texts in video scenes. This fact makes video text detection a feasible and probably the most efficient way for indexing, classifying, retrieving and understanding the visual contents in videos.
This can be further used in transportation surveillance, electronic payment, traffic safety detection, sport video retrieval, and even commercial online advertisements. One example is video-based license plate recognition systems, which are accordingly necessary to help improve the convenience of checking vehicle status at roadside and designated inspection points efficiently. Another example is online video advertising. Driven by the advent of broadband Internet access, today’s online video users face a daunting volume of video content from video sharing websites, personal blogs, or from IPTV and mobile TV. Accordingly, how to develop advertising systems especially considering contextual video contents through efficient video text detection techniques has become an urgent need.

Actually, video text detection has not been systematically explored even though people have developed a lot of optical character recognition (OCR) techniques, which are considered as one of the most successful applications in the past decades. For example, to explain a typical Google street video scene view, popular visual understanding methods detect and identify objects such as car, person, tree, road and sky from the scene successfully. However, regions containing text tend to be ignored. It is probably due to the fact that text from video is sometimes difficult to detect and recognize. The performance of OCR thereby drastically drops when applied to video texts which are either artificially added (graphic text) or naturally existing on video scene objects (scene text). There are several reasons for this fact. First, the variety of color, font, size and orientation of video text brings difficulties to OCR techniques. Second, video scenes exhibit a wide range of unknown imaging conditions which in general add sensitivity to noises, shadows, occlusion, lights, motion blur and resolution. Finally, the inputs of most of the OCR engines are well segmented texts which have been distinguished from background pixels. Unfortunately, the segmentation of video text is much harder.
This book tries to systematically introduce readers to the recent developments of video text detection for the first time. It covers what we feel a reader who is interested in video text detection ought to know. In our view, video text detection consists of a general introduction to the background of this exploration (Chap. 1), pre-processing techniques (Chap. 2), detection of graphic text from video (Chap. 3), detection of scene text from video (Chap. 4), post-processing techniques such as text line binarization and character reconstruction (Chap. 5), character segmentation and recognition (Chap. 6), video text applications and systems in real-life (Chap. 7), video script identification (Chap. 8), multi-modal techniques which have been proved to be useful for video text detection and video content analysis (Chap. 9), and performance evaluation of the video text detection algorithms and systems (Chap. 10). A reader who goes from cover to cover will hopefully be well informed. However, we also tried to reduce the interdependence between these chapters so that the reader interested in particular topics can avoid wading through the whole book. We present theoretical material in a succinct manner. The reader can easily access a more detailed up-to-date set of references of the methods discussed for further reading. Thus we are able to maintain a focus on introducing the most important solutions of video text detection in this book.

We are indebted to a number of individuals both from academic circles and industry who have contributed to the preparation of the book. We thank Hao Wang, Zehuan Yuan, Yirui Wu, Run Xin and Trung Quy Phan for collecting materials and their experimental evaluations on particular methods introduced in this book.

Special thanks go to Simon Rees for providing us this chance, and for the commitment to excellence in all aspects of the production of the book. We truly appreciate his creativity, assistance, and patience.
Tong Lu, Nanjing, China
Shivakumara Palaiahnakote, Kuala Lumpur, Malaysia
Chew Lim Tan, Singapore, Singapore
Wenyin Liu, Kowloon Tong, Hong Kong SAR
April 04, 2014

Contents

1 Introduction to Video Text Detection
  1.1 Introduction to the Research of Video Text Detection
  1.2 Characteristics and Difficulties of Video Text Detection
  1.3 Relationship Between Video Text Detection and Other Fields
  1.4 A Brief History of Video Text Detection
  1.5 Potential Applications
  References
2 Video Preprocessing
  2.1 Preprocessing Operators
    2.1.1 Image Cropping and Local Operators
    2.1.2 Neighborhood Operators
    2.1.3 Morphology Operators
  2.2 Color-Based Preprocessing
  2.3 Texture Analysis
  2.4 Image Segmentation
  2.5 Motion Analysis
  2.6 Summary
  References
3 Video Caption Detection
  3.1 Introduction to Video Caption Detection
  3.2 Feature-Based Methods
    3.2.1 Edge-Based Methods
    3.2.2 Texture-Based Methods
    3.2.3 Connected Component-Based Methods
    3.2.4 Frequency Domain Methods
  3.3 Machine Learning-Based Methods
    3.3.1 Support Vector Machine-Based Methods
    3.3.2 Neural Network Model-Based Methods
    3.3.3 Bayes Classification-Based Methods
  3.4 Summary
  References
4 Text Detection from Video Scenes
  4.1 Visual Saliency of Scene Texts
  4.2 Natural Scene Text Detection Methods
    4.2.1 Bottom-Up Approach
    4.2.2 Top-Down Approach
    4.2.3 Statistical and Machine Learning Approach
    4.2.4 Temporal Analysis Approach
    4.2.5 Hybrid Approach
  4.3 Scene Character/Text Recognition
  4.4 Scene Text Datasets
  4.5 Summary
  References
5 Post-processing of Video Text Detection
  5.1 Text Line Binarization
    5.1.1 Wavelet-Gradient-Fusion Method (WGF)
    5.1.2 Text Candidates
    5.1.3 Smoothing
    5.1.4 Foreground and Background Separation
    5.1.5 Summary
  5.2 Character Reconstruction
    5.2.1 Ring Radius Transform
    5.2.2 Horizontal and Vertical Medial Axes
    5.2.3 Horizontal and Vertical Gap Filling
    5.2.4 Large Gap Filling
    5.2.5 Border Gap Filling
    5.2.6 Small Gap Filling
    5.2.7 Summary
  5.3 Summary
  References
6 Character Segmentation and Recognition
  6.1 Introduction to OCR and Its Usage in Video Text Recognition
  6.2 Word and Character Segmentation
    6.2.1 Fourier Transform-Based Method for Word and Character Segmentation
    6.2.2 Bresenham’s Line Algorithm
    6.2.3 Fourier-Moments Features
    6.2.4 Word Extraction
    6.2.5 Character Extraction
    6.2.6 Summary
  6.3 Character Segmentation Without Word Segmentation
    6.3.1 GVF for Character Segmentation
    6.3.2 Cut Candidate Identification
    6.3.3 Minimum-Cost Pathfinding
    6.3.4 False-Positive Elimination
    6.3.5 Summary
  6.4 Video Text Recognition
    6.4.1 Character Recognition
    6.4.2 Hierarchical Classification Based on Voting Method
    6.4.3 Structural Features for Recognition
    6.4.4 Summary
  6.5 Summary
  References
7 Video Text Detection Systems
  7.1 License Plate Recognition Systems
    7.1.1 Preprocessing of LPR Systems
    7.1.2 License Plate Detection
    7.1.3 Skew Correction
    7.1.4 Character Segmentation
    7.1.5 Character Recognition
  7.2 Navigation Assistant Systems
  7.3 Sport Video Analysis Systems
  7.4 Video Advertising Systems
  7.5 Summary
  References
8 Script Identification
  8.1 Language-Dependent Text Detection
    8.1.1 Method for Bangla and Devanagari (Indian Scripts) Text Detection
    8.1.2 Headline-Based Method for Text Detection
    8.1.3 Sample Experimental Results
    8.1.4 Summary
  8.2 Methods for Language-Independent Text Detection
    8.2.1 Run Lengths for Multi-oriented Text Detection
    8.2.2 Selecting Potential Run Lengths
    8.2.3 Boundary Growing Method for Traversing
    8.2.4 Zero Crossing for Separating Text Lines from Touching
    8.2.5 Sample Experiments
    8.2.6 Summary
  8.3 Script Identification
    8.3.1 Spatial-Gradient-Features for Video Script Identification
    8.3.2 Text Components Based on Gradient Histogram Method
