
Lampung - a New Handwritten Character Benchmark: Database, Labeling and Recognition



Akmal Junaidi, Szilárd Vajda, Gernot A. Fink
Computer Science Department, TU Dortmund, Germany
{akmal.junaidi,szilard.vajda,gernot.fink}@udo.edu
September 17, 2011
Multilingual OCR 2011, Beijing, China

Overview of the talk:
- Introduction
  - Motivation
  - Script
- Labeling
- Features
- Experiments
- Conclusion

Motivation

New script:
- lack of publications
- no representative dataset

Cultural heritage:
- originated from Brahmi script
- preserving important heritage
- proof of script existence

Lampung alphabet

[Figures: Lampung letters, diacritics, punctuation marks, handwriting sample]

Characteristics:
- not cursive
- curve(s)
- 20 letters
- the name: Kaganga

Semi-Automatic Labeling: An overview [1]

[Figure: overview of the semi-automatic labeling process]

[1] Vajda et al., Semi-Supervised Ensemble Learning Approach for Character Labeling with Minimal Human Effort, ICDAR, 2011

Features

Water reservoir:
- top and bottom
- gravity center
- size (volume)
- height and width

Structural and statistical:
- branch points
- end points
- pixel density

Experiments

Dataset:
- fairy tales transcription
- 82 docs. written by students
- 35,193 character images
- clustered to 11 classes

Classification: Neural network

Composition:
- 21,122 for training set (60%)
- 10,547 for test set (30%)
- 3,524 for validation set (10%)

Recognition result

Features                                         #Training   #Test    Rec (%)
Branch points, end points, pixel density (BED)   21,122      10,547   93.2 ± 0.48
Water reservoirs (WR)                            21,122      10,547   91.3 ± 0.54
BED and WR                                       21,122      10,547   94.3 ± 0.44

Misclassification

- Variability in writing style
- Different location of water reservoir
- Unfiltered punctuation marks
- Artifacts:
  - touching characters
  - character connected to diacritic(s)
  - character connected to punctuation mark(s)

Conclusion

- The Lampung:
  - scientific research challenge for handwritten recognition
  - preserving efforts of the Lampung as a cultural heritage
- Semi-automatic labeling strategy: new approach
  - efficient labeling task for large dataset, minimize human involvement
  - only 20% samples need to be relabeled
- Water reservoir can effectively distinguish the Lampung characters:
  - 91.3% recognition only based on water reservoir features
  - 94.3% recognition combining with branch points, end points, pixel density
- Lampung character dataset:
  - publicly available soon
  - preferably on TC11 website

References

[1] U. Bhattacharya and B. B. Chaudhuri. Databases for Research on Recognition of Handwritten Characters of Indian Scripts. In International Conference on Document Analysis and Recognition, volume 2, pages 789–793, 2005.
[2] B. B. Chaudhuri and S. Ghosh. Orientation Detection of Major Indian Scripts. In Proceedings of the International Workshop on Multilingual OCR, MOCR '09, pages 8:1–8:7, New York, NY, USA, 2009. ACM.
[3] P. T. Daniels. The World's Writing Systems. Oxford University Press, 1996.
[4] D. Ghosh, T. Dube, and A. Shivaprasad. Script Recognition: A Review. IEEE Trans. Pattern Anal. Mach. Intell., 32:2142–2161, December 2010.
[5] G. E. Hinton and R. R. Salakhutdinov. Reducing the Dimensionality of Data with Neural Networks. Science, 313(5786):504–507, July 2006.
[6] M. S. Khorsheed. Recognising Handwritten Arabic Manuscripts Using a Single Hidden Markov Model. Pattern Recogn. Lett., 24:2235–2242, October 2003.
[7] L. I. Kuncheva. Combining Pattern Classifiers: Methods and Algorithms. Wiley-Interscience, 2004.
[8] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-Based Learning Applied to Document Recognition. In Intelligent Signal Processing, pages 306–351. IEEE Press, 2001.
[9] C.-L. Liu and C. Y. Suen. A New Benchmark on the Recognition of Handwritten Bangla and Farsi Numeral Characters. Pattern Recognition, 42:3287–3295, December 2009.
[10] L. M. Lorigo and V. Govindaraju. Offline Arabic Handwriting Recognition: A Survey. IEEE Trans. Pattern Anal. Mach. Intell., 28:712–724, May 2006.
[11] T. Mondal, U. Bhattacharya, S. K. Parui, K. Das, and V. Roy. Database Generation and Recognition of Online Handwritten Bangla Characters. In Proceedings of the International Workshop on Multilingual OCR, MOCR '09, pages 9:1–9:6, New York, NY, USA, 2009. ACM.
[12] S. Mozaffari, H. E. Abed, V. Märgner, K. Faez, and A. Amirshahi. IfN/Farsi-Database: a Database of Farsi Handwritten City Names. In International Conference on Frontiers in Handwriting Recognition, 2008.
[13] S. Mozaffari, K. Faez, F. Faradji, M. Ziaratban, and S. M. Golzan. A Comprehensive Isolated Farsi/Arabic Character Database for Handwritten OCR Research. In Tenth International Workshop on Frontiers in Handwriting Recognition, La Baule (France), 2006.
[14] W. Niblack. An Introduction to Digital Image Processing. Strandberg Publishing Company, Birkeroed, Denmark, 1985.
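
The Features slide lists branch points, end points and pixel density as the structural and statistical features, but the slides do not say how they are computed. The following is a minimal sketch under stated assumptions: the character is already binarized and thinned to a one-pixel-wide skeleton, and end points and branch points are detected with the common crossing-number test (one 0-to-1 transition around a pixel marks an end point, three or more mark a branch point). The names `crossing_number`, `bed_features` and the `PLUS` test image are hypothetical and not taken from the talk.

```python
# Assumption: img is a binary skeleton given as a list of rows of 0/1 values.
# 8-neighbour offsets in clockwise order, starting from north.
RING = [(-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1)]


def pixel(img, r, c):
    """Return the pixel value, treating everything outside the image as background."""
    if 0 <= r < len(img) and 0 <= c < len(img[0]):
        return img[r][c]
    return 0


def crossing_number(img, r, c):
    """Count 0-to-1 transitions while walking once around the 8 neighbours."""
    ring = [pixel(img, r + dr, c + dc) for dr, dc in RING]
    return sum(1 for i in range(8) if ring[i] == 0 and ring[(i + 1) % 8] == 1)


def bed_features(img):
    """Return (branch_points, end_points, pixel_density) for a binary skeleton."""
    branch_points = end_points = foreground = 0
    for r, row in enumerate(img):
        for c, value in enumerate(row):
            if not value:
                continue
            foreground += 1
            transitions = crossing_number(img, r, c)
            if transitions == 1:
                end_points += 1
            elif transitions >= 3:
                branch_points += 1
    density = foreground / (len(img) * len(img[0]))
    return branch_points, end_points, density


# Hypothetical test shape: a plus sign with one junction and four stroke ends.
PLUS = [
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
]

print(bed_features(PLUS))
```

The crossing number is used here instead of a plain neighbour count because pixels next to a junction also have several foreground neighbours and would otherwise be counted as extra branch points.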
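
The slides do not state how the ± margins in the recognition table were obtained. As a hedged check, the sketch below assumes they are 95% confidence half-widths from the normal approximation to a binomial proportion, evaluated on the 10,547 test samples; under that assumption the computed values agree with the reported ones to two decimals. It also verifies that the three subsets add up to the 35,193 character images. The function name `ci_half_width` is hypothetical.

```python
import math


def ci_half_width(accuracy_pct, n_samples, z=1.96):
    """Half-width, in percentage points, of a normal-approximation binomial interval."""
    p = accuracy_pct / 100.0
    return 100.0 * z * math.sqrt(p * (1.0 - p) / n_samples)


# Figures taken from the slides.
N_TRAIN, N_TEST, N_VALID, N_TOTAL = 21122, 10547, 3524, 35193
REPORTED = {
    "BED": (93.2, 0.48),
    "WR": (91.3, 0.54),
    "BED and WR": (94.3, 0.44),
}

# The three subsets should partition the full dataset.
print("subsets sum to total:", N_TRAIN + N_TEST + N_VALID == N_TOTAL)

for name, (accuracy, margin) in REPORTED.items():
    computed = ci_half_width(accuracy, N_TEST)
    print(f"{name}: reported ±{margin:.2f}, computed ±{computed:.2f}")
```

If the margins were in fact derived differently (for example from repeated runs), the agreement would be a coincidence, so this should be read only as a plausibility check.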
