
Thinking Machines: Machine Learning and Its Hardware Implementation PDF

324 pages · 2021 · 16.423 MB · English

Preview Thinking Machines: Machine Learning and Its Hardware Implementation

Thinking Machines: Machine Learning and Its Hardware Implementation

Shigeyuki Takano
Faculty of Computer Science and Engineering, Keio University, Kanagawa, Japan

Academic Press is an imprint of Elsevier
125 London Wall, London EC2Y 5AS, United Kingdom
525 B Street, Suite 1650, San Diego, CA 92101, United States
50 Hampshire Street, 5th Floor, Cambridge, MA 02139, United States
The Boulevard, Langford Lane, Kidlington, Oxford OX5 1GB, United Kingdom

First published in Japan 2017 by Impress R&D, © 2017 Shigeyuki Takano
English language revision published by Elsevier Inc., © 2021 Shigeyuki Takano

No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system, without permission in writing from the publisher. Details on how to seek permission, further information about the Publisher's permissions policies, and our arrangements with organizations such as the Copyright Clearance Center and the Copyright Licensing Agency can be found at our website: www.elsevier.com/permissions.

This book and the individual contributions contained in it are protected under copyright by the Publisher (other than as may be noted herein).

Notices
Knowledge and best practice in this field are constantly changing. As new research and experience broaden our understanding, changes in research methods, professional practices, or medical treatment may become necessary.
Practitioners and researchers must always rely on their own experience and knowledge in evaluating and using any information, methods, compounds, or experiments described herein. In using such information or methods they should be mindful of their own safety and the safety of others, including parties for whom they have a professional responsibility.
To the fullest extent of the law, neither the Publisher nor the authors, contributors, or editors assume any liability for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions, or ideas contained in the material herein.

Library of Congress Cataloging-in-Publication Data
A catalog record for this book is available from the Library of Congress.

British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library.

ISBN: 978-0-12-818279-6

For information on all Academic Press publications visit our website at https://www.elsevier.com/books-and-journals

Publisher: Mara Conner
Editorial Project Manager: Emily Thomson
Production Project Manager: Niranjan Bhaskaran
Designer: Miles Hitchen
Typeset by VTeX

Contents

List of figures
List of tables
Biography
Preface
Acknowledgments
Outline

1. Introduction
1.1 Dawn of machine learning
1.1.1 IBM Watson challenge on Jeopardy!
1.1.2 ImageNet challenge
1.1.3 Google AlphaGo challenge of a professional Go player
1.2 Machine learning and applications
1.2.1 Definition
1.2.2 Applications
1.3 Learning and its performance metrics
1.3.1 Preparation before learning
1.3.2 Learning methods
1.3.3 Performance metrics and verification
1.4 Examples
1.4.1 Industry 4.0
1.4.2 Transaction (blockchain)
1.5 Summary of machine learning
1.5.1 Difference from artificial intelligence
1.5.2 Hype cycle

2. Traditional microarchitectures
2.1 Microprocessors
2.1.1 Core microarchitecture
2.1.2 Programming model of a microprocessor
2.1.3 Microprocessor meets its complexity
2.1.4 Pros and cons of superscalar microprocessors
2.1.5 Scaling of register file
2.1.6 Branch prediction and its penalty
2.2 Many-core processors
2.2.1 Concept of many-core
2.2.2 Programming model
2.3 Digital signal processors (DSPs)
2.3.1 Concept of DSP
2.3.2 DSP microarchitecture
2.4 Graphics processing units (GPUs)
2.4.1 Concept of GPU
2.4.2 GPU microarchitecture
2.4.3 Programming model on graphics processing units
2.4.4 Applying GPUs to a computing system
2.5 Field-programmable gate arrays (FPGAs)
2.5.1 Concept of FPGA
2.5.2 FPGA microarchitecture
2.5.3 FPGA design flow
2.5.4 Applying FPGAs to a computing system
2.6 Dawn of domain-specific architectures
2.6.1 Past computer industry
2.6.2 History of machine learning hardware
2.6.3 Revisiting machine learning hardware
2.7 Metrics of execution performance
2.7.1 Latency and throughput
2.7.2 Number of operations per second
2.7.3 Energy and power consumption
2.7.4 Energy efficiency
2.7.5 Utilization
2.7.6 Data reuse
2.7.7 Area
2.7.8 Cost

3. Machine learning and its implementation
3.1 Neurons and their network
3.2 Neuromorphic computing
3.2.1 Spike-timing-dependent plasticity and learning
3.2.2 Neuromorphic computing hardware
3.2.3 Address-event representation
3.3 Neural network
3.3.1 Neural network models
3.3.2 Previous and current neural networks
3.3.3 Neural network hardware
3.4 Memory cell for analog implementation

4. Applications, ASICs, and domain-specific architectures
4.1 Applications
4.1.1 Concept of applications
4.2 Application characteristics
4.2.1 Locality
4.2.2 Deadlock
4.2.3 Dependency
4.2.4 Temporal and spatial operations
4.3 Application-specific integrated circuits
4.3.1 Design constraints
4.3.2 Modular structure and mass production
4.3.3 Makimoto's wave
4.3.4 Design flow
4.4 Domain-specific architecture
4.4.1 Introduction to domain-specific architecture
4.4.2 Domain-specific languages
4.5 Machine learning hardware
4.6 Analysis of inference and training on deep learning
4.6.1 Analysis of inference on deep learning
4.6.2 Analysis of training on deep learning

5. Machine learning model development
5.1 Development process
5.1.1 Development cycle
5.1.2 Cross-validation
5.1.3 Software stacks
5.2 Compilers
5.2.1 ONNX
5.2.2 NNVM
5.2.3 TensorFlow XLA
5.3 Code optimization
5.3.1 Extracting data-level parallelism
5.3.2 Memory access optimization
5.4 Python script language and virtual machine
5.4.1 Python and optimizations
5.4.2 Virtual machine
5.5 Compute unified device architecture

6. Performance improvement methods
6.1 Model compression
6.1.1 Pruning
6.1.2 Dropout
6.1.3 DropConnect
6.1.4 Distillation
6.1.5 Principal component analysis
6.1.6 Weight-sharing
6.2 Numerical compression
6.2.1 Quantization and numerical precision
6.2.2 Impact on memory footprint and inference accuracy
6.2.3 Edge-cutting and clipping
6.3 Encoding
6.3.1 Run-length coding
6.3.2 Huffman coding
6.3.3 Effect of compression
6.4 Zero-skipping
6.4.1 Concept of zero-skipping
6.4.2 CSR and CSC sparsity representations
6.4.3 Use case of zero-skipping
6.5 Approximation
6.5.1 Concept of approximation
6.5.2 Activation function approximation
6.5.3 Multiplier approximation
6.6 Optimization
6.6.1 Model optimization
6.6.2 Data-flow optimization
6.7 Summary of performance improvement methods

7. Case study of hardware implementation
7.1 Neuromorphic computing
7.1.1 Analog logic circuit
7.1.2 Digital logic circuit
7.2 Deep neural network
7.2.1 Analog logic circuit
7.2.2 DSPs
7.2.3 FPGAs
7.2.4 ASICs
7.3 Quantum computing
7.4 Summary of case studies
7.4.1 Case study for neuromorphic computing
7.4.2 Case study for deep neural network
7.4.3 Comparison between neuromorphic computing and deep neural network hardware

8. Keys to hardware implementation
8.1 Market growth predictions
8.1.1 IoT market
8.1.2 Robotics market
8.1.3 Big data and machine learning markets
8.1.4 Artificial intelligence market in drug discovery
8.1.5 FPGA market
8.1.6 Deep learning chip market
8.2 Tradeoff between design and cost
8.3 Hardware implementation strategies
8.3.1 Requirements of strategy planning
8.3.2 Basic strategies
8.3.3 Alternative factors
8.4 Summary of hardware design requirements

9. Conclusion

A. Basics of deep learning
A.1 Equation model
A.1.1 Feedforward neural network model
A.1.2 Activation functions
A.1.3 Output layer
A.1.4 Learning and backpropagation
A.1.5 Parameter initialization
A.2 Matrix operation for deep learning
A.2.1 Matrix representation and its layout
A.2.2 Matrix operation sequence for learning
A.2.3 Learning optimization
A.2.4 Bias-variance problem

B. Modeling of deep learning hardware
B.1 Concept of deep learning hardware
B.1.1 Relationship between parameter space and propagation
B.1.2 Basic deep learning hardware
B.2 Data-flow on deep learning hardware
B.3 Machine learning hardware architecture

C. Advanced network models
C.1 CNN variants
C.1.1 Convolution architecture
C.1.2 Backpropagation for convolution
C.1.3 Convolution variants
C.1.4 Deep convolutional generative adversarial networks
C.2 RNN variants
C.2.1 RNN architecture
C.2.2 LSTM and GRU cells
C.2.3 Highway networks
C.3 Autoencoder variants
C.3.1 Stacked denoising autoencoders
C.3.2 Ladder networks
C.3.3 Variational autoencoders
C.4 Residual networks
C.4.1 Concept of residual networks
C.4.2 Effect of residual network
C.5 Graph neural networks
C.5.1 Concept of graph neural networks
