Meta-Algorithmics: Patterns for Robust, Low-Cost, High-Quality Systems

384 Pages·2013·5.59 MB·English

META-ALGORITHMICS
PATTERNS FOR ROBUST, LOW-COST, HIGH-QUALITY SYSTEMS

Steven J. Simske
HP Labs, Colorado, USA

© 2013 John Wiley & Sons, Ltd

Registered office
John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, United Kingdom

For details of our global editorial offices, for customer services and for information about how to apply for permission to reuse the copyright material in this book please see our website at www.wiley.com.

The right of the author to be identified as the author of this work has been asserted in accordance with the Copyright, Designs and Patents Act 1988.

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted by the UK Copyright, Designs and Patents Act 1988, without the prior permission of the publisher.

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic books.

Designations used by companies to distinguish their products are often claimed as trademarks. All brand names and product names used in this book are trade names, service marks, trademarks or registered trademarks of their respective owners. The publisher is not associated with any product or vendor mentioned in this book.

Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. It is sold on the understanding that the publisher is not engaged in rendering professional services and neither the publisher nor the author shall be liable for damages arising herefrom. If professional advice or other expert assistance is required, the services of a competent professional should be sought.

Library of Congress Cataloging-in-Publication Data
Simske, Steven J.
Meta-algorithmics : patterns for robust, low-cost, high-quality systems / Dr. Steven J. Simske, Hewlett-Packard Labs.
pages cm
ISBN 978-1-118-34336-4 (hardback)
1. Computer algorithms. 2. Parallel algorithms. 3. Heuristic programming. 4. Computer systems–Costs. 5. Computer systems–Quality control. I. Title.
QA76.9.A43S543 2013
005.1–dc23
2013004488

A catalogue record for this book is available from the British Library.

ISBN: 9781118343364

Typeset in 10/12pt Times by Aptara Inc., New Delhi, India

Contents

Acknowledgments
1 Introduction and Overview
1.1 Introduction
1.2 Why Is This Book Important?
1.3 Organization of the Book
1.4 Informatics
1.5 Ensemble Learning
1.6 Machine Learning/Intelligence
1.6.1 Regression and Entropy
1.6.2 SVMs and Kernels
1.6.3 Probability
1.6.4 Unsupervised Learning
1.6.5 Dimensionality Reduction
1.6.6 Optimization and Search
1.7 Artificial Intelligence
1.7.1 Neural Networks
1.7.2 Genetic Algorithms
1.7.3 Markov Models
1.8 Data Mining/Knowledge Discovery
1.9 Classification
1.10 Recognition
1.11 System-Based Analysis
1.12 Summary
References
2 Parallel Forms of Parallelism
2.1 Introduction
2.2 Parallelism by Task
2.2.1 Definition
2.2.2 Application to Algorithms and Architectures
2.2.3 Application to Scheduling
2.3 Parallelism by Component
2.3.1 Definition and Extension to Parallel-Conditional Processing
2.3.2 Application to Data Mining, Search, and Other Algorithms
2.3.3 Application to Software Development
2.4 Parallelism by Meta-algorithm
2.4.1 Meta-algorithmics and Algorithms
2.4.2 Meta-algorithmics and Systems
2.4.3 Meta-algorithmics and Parallel Processing
2.4.4 Meta-algorithmics and Data Collection
2.4.5 Meta-algorithmics and Software Development
2.5 Summary
References
3 Domain Areas: Where Are These Relevant?
3.1 Introduction
3.2 Overview of the Domains
3.3 Primary Domains
3.3.1 Document Understanding
3.3.2 Image Understanding
3.3.3 Biometrics
3.3.4 Security Printing
3.4 Secondary Domains
3.4.1 Image Segmentation
3.4.2 Speech Recognition
3.4.3 Medical Signal Processing
3.4.4 Medical Imaging
3.4.5 Natural Language Processing
3.4.6 Surveillance
3.4.7 Optical Character Recognition
3.4.8 Security Analytics
3.5 Summary
References
4 Applications of Parallelism by Task
4.1 Introduction
4.2 Primary Domains
4.2.1 Document Understanding
4.2.2 Image Understanding
4.2.3 Biometrics
4.2.4 Security Printing
4.3 Summary
References
5 Application of Parallelism by Component
5.1 Introduction
5.2 Primary Domains
5.2.1 Document Understanding
5.2.2 Image Understanding
5.2.3 Biometrics
5.2.4 Security Printing
5.3 Summary
References
6 Introduction to Meta-algorithmics
6.1 Introduction
6.2 First-Order Meta-algorithmics
6.2.1 Sequential Try
6.2.2 Constrained Substitute
6.2.3 Voting and Weighted Voting
6.2.4 Predictive Selection
6.2.5 Tessellation and Recombination
6.3 Second-Order Meta-algorithmics
6.3.1 Confusion Matrix and Weighted Confusion Matrix
6.3.2 Confusion Matrix with Output Space Transformation (Probability Space Transformation)
6.3.3 Tessellation and Recombination with Expert Decisioner
6.3.4 Predictive Selection with Secondary Engines
6.3.5 Single Engine with Required Precision
6.3.6 Majority Voting or Weighted Confusion Matrix
6.3.7 Majority Voting or Best Engine
6.3.8 Best Engine with Differential Confidence or Second Best Engine
6.3.9 Best Engine with Absolute Confidence or Weighted Confusion Matrix
6.4 Third-Order Meta-algorithmics
6.4.1 Feedback
6.4.2 Proof by Task Completion
6.4.3 Confusion Matrix for Feedback
6.4.4 Expert Feedback
6.4.5 Sensitivity Analysis
6.4.6 Regional Optimization (Extended Predictive Selection)
6.4.7 Generalized Hybridization
6.5 Summary
References
7 First-Order Meta-algorithmics and Their Applications
7.1 Introduction
7.2 First-Order Meta-algorithmics and the "Black Box"
7.3 Primary Domains
7.3.1 Document Understanding
7.3.2 Image Understanding
7.3.3 Biometrics
7.3.4 Security Printing
7.4 Secondary Domains
7.4.1 Medical Signal Processing
7.4.2 Medical Imaging
7.4.3 Natural Language Processing
7.5 Summary
References
8 Second-Order Meta-algorithmics and Their Applications
8.1 Introduction
8.2 Second-Order Meta-algorithmics and Targeting the "Fringes"
8.3 Primary Domains
8.3.1 Document Understanding
8.3.2 Image Understanding
8.3.3 Biometrics
8.3.4 Security Printing
8.4 Secondary Domains
8.4.1 Image Segmentation
8.4.2 Speech Recognition
8.5 Summary
References
9 Third-Order Meta-algorithmics and Their Applications
9.1 Introduction
9.2 Third-Order Meta-algorithmic Patterns
9.2.1 Examples Covered
9.2.2 Training-Gap-Targeted Feedback
9.3 Primary Domains
9.3.1 Document Understanding
9.3.2 Image Understanding
9.3.3 Biometrics
9.3.4 Security Printing
9.4 Secondary Domains
9.4.1 Surveillance
9.4.2 Optical Character Recognition
9.4.3 Security Analytics
9.5 Summary
References
10 Building More Robust Systems
10.1 Introduction
10.2 Summarization
10.2.1 Ground Truthing for Meta-algorithmics
10.2.2 Meta-algorithmics for Keyword Generation
10.3 Cloud Systems
10.4 Mobile Systems
10.5 Scheduling
10.6 Classification
10.7 Summary
Reference
11 The Future
11.1 Recapitulation
11.2 The Pattern of All Patience
11.3 Beyond the Pale
11.4 Coming Soon
11.5 Summary
References
Index

Acknowledgments

The goals of this book were ambitious, perhaps too ambitious, both in breadth (domains addressed) and in depth (number and variety of parallel processing and meta-algorithmic patterns).
The book represents, or at least builds on, the work of many previous engineers, scientists and knowledge workers. Undoubtedly most, if not all, of the approaches in this book have been elaborated elsewhere, either overtly or in some disguise. One contribution of this book is to bring these disparate approaches together in one place, systematizing the design of intelligent parallel systems. In progressing from the design of parallel systems using traditional by-component and by-task approaches to meta-algorithmic parallelism, the book shows the advantages of hybridization for system accuracy, robustness and cost.

I have a lot of people to thank for making this book a reality. First and foremost, I'd like to thank my wonderful family, Tess, Kieran and Dallen, for putting up with a year's worth of weekends and late nights spent designing and running the many "throwaway" experiments necessary to illustrate the application of meta-algorithmics (not to mention that little thing of actually writing). Their patience and support made all the difference! Thanks; your presence in my life makes writing this book worthwhile.

I also thank Hewlett Packard (HP), my employer, for the go-ahead to write this book. While my "day job" workload was in no way lightened during the writing of the book (for one example, as I write this I have 150% more direct reports than I did when the contract to write this book was signed nearly a year and a half ago!), HP did streamline the contract and in no way micromanaged the process. Sometimes the biggest help is just getting out of the way, and I appreciate it. Special thanks to Yan Liu, Qin Zhu Lippert, Keith Moore and Eric Hanson for their support along the (HP) way.

The editorial staff at John Wiley & Sons has been tremendous. In particular Alex King, my main editor and the person who convinced me to write this book in the first place, has been a delight. Baljinder Kaur has been tremendous in finding typographical and logical errors during the editorial process. Her sharp eye and wit have both made the process a snap. I'd also like to thank Genna Manaog, Richard Davies, Liz Wingett, and Claire Bailey for their help.
Most of the photos and all of the tables and diagrams in the book were my creation, with one notable exception. There is an excellent scan of a 1996 brochure for the Cheyenne Mountain Zoo which I have used extensively to illustrate mixed-region segmentation. Thanks to Erica Meyer for providing copyright permission, not to mention for helping run one of the country's most unique attractions.

Have you found a mistake or two in reading the book? If not, you, like me, have Jason Aronoff to thank. Jason read each chapter after I finished the first draft, and his excellent and very timely feedback allowed me to send a second draft to Wiley (before deadline! I've heard that "never happens" from my friend and fellow author Bob Ulichney), where otherwise a very sloppy, choppy first draft would have had to suffice. Thanks to Marie Vans for her extra pair of eyes on the proofs.

On the science of meta-algorithmics, big thanks go to Sherif Yacoub, who framed out several of the patterns with me more than a decade ago. His analytical and design expertise greatly affected Chapter 7 in particular. I'd also like to thank Xiaofan Lin for excellent collaboration on various meta-algorithmic experiments (part-of-speech tagging and OCR, for example), not to mention his great leadership on voting patterns. My friend and colleague Igor Boyko worked with me on early meta-algorithmic search approaches. Yan Xiong also worked on several of the original experiments, and in particular discovered hybrid ways to perform journal splitting. John Burns led the team comprising all these übersmart researchers, and was tremendously supportive of early work.

I would be remiss at best to fail to mention Doug Heins, my friend and confidant, who has the most meta-algorithmic mind of anyone I know. That's right, it has improved accuracy, robustness and cost (yes, cost: I owe a lot to him, but to date he has not charged me!). My deep thanks also to Dave Wright, who has extended meta-algorithmics to fantasy football and other areas. In addition to his great insights during the kernel meta-algorithmic text classification work, Dave continues to be a source of wisdom and perspective for me.
I can only begin to thank all my wonderful collaborators in the various domains, from imaging to security to biometrics to speech analysis, covered in part in this book. Particular mention, however, goes to Guy Adams, Stephen Pollard, Reed Ayers, Henry Sang, Dave Isaacson, Marv Luttges, David Auter, Dalong Li and Matt Gaubatz. I wish to separately thank Margaret Sturgill, with whom I have collaborated for 18 years in various hybrid system architecture, classification and imaging projects.

Finally, a huge thanks to my many supportive friends, including Dave Barry (the man of positive energy), Jay Veazey (the wise mentor and font of insight), Joost van Der Water, Dave Klaus, Mick Keyes, Helen Balinsky, Gary Dispoto and Ellis Gayles, who have encouraged me throughout the book creation process. I hope this does not disappoint!

If you performed a Tessellation and Recombination pattern on the above paragraphs, the output would be quite obvious. I am a lucky man indeed. Thanks so much!

Steve Simske
17 April 2013
