Compact and Fast Machine Learning Accelerator for IoT Devices PDF

157 Pages·2019·8.94 MB·English

Preview Compact and Fast Machine Learning Accelerator for IoT Devices

Computer Architecture and Design Methodologies

Hantao Huang, Hao Yu

Compact and Fast Machine Learning Accelerator for IoT Devices

Series editors
Anupam Chattopadhyay, Noida, India
Soumitra Kumar Nandy, Bangalore, India
Jürgen Teich, Erlangen, Germany
Debdeep Mukhopadhyay, Kharagpur, India

The twilight zone of Moore’s law is affecting computer architecture design like never before. The strongest impact on computer architecture is perhaps the move from unicore to multicore architectures, represented by commodity architectures like general purpose graphics processing units (GPGPUs). Besides that, the deep impact of application-specific constraints from emerging embedded applications is presenting designers with new, energy-efficient architectures like heterogeneous multi-core, accelerator-rich System-on-Chip (SoC). These effects, together with the security, reliability, thermal and manufacturability challenges of nanoscale technologies, are forcing computing platforms to move towards innovative solutions. Finally, the emergence of technologies beyond conventional charge-based computing has led to a series of radical new architectures and design methodologies. The aim of this book series is to capture these diverse, emerging architectural innovations as well as the corresponding design methodologies. The scope will cover the following:
Heterogeneous multi-core SoC and their design methodology
Domain-specific architectures and their design methodology
Novel technology constraints, such as security and fault-tolerance, and their impact on architecture design
Novel technologies, such as resistive memory, and their impact on architecture design
Extremely parallel architectures

More information about this series at http://www.springer.com/series/15213

Hantao Huang
School of Electrical and Electronic Engineering
Nanyang Technological University
Singapore, Singapore

Hao Yu
Department of Electrical and Electronic Engineering
Southern University of Science and Technology
Shenzhen, Guangdong, China

ISSN 2367-3478
ISSN 2367-3486 (electronic)
Computer Architecture and Design Methodologies
ISBN 978-981-13-3322-4
ISBN 978-981-13-3323-1 (eBook)
https://doi.org/10.1007/978-981-13-3323-1
Library of Congress Control Number: 2018963040
© Springer Nature Singapore Pte Ltd. 2019

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication.
Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd.
The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721, Singapore

Preface

The Internet of Things (IoT) is the networked interconnection of every object to provide intelligent and high-quality service. The potential of IoT and its ubiquitous computation reality are staggering, but limited by many technical challenges. One challenge is to respond in real time to dynamic ambient change. A machine learning accelerator on IoT edge devices is one potential solution, since a centralized system suffers from long processing latency in the back end. However, IoT edge devices are resource-constrained and machine learning algorithms are computationally intensive. Therefore, optimized machine learning algorithms, such as compact machine learning for less memory usage on IoT devices, are greatly needed. In this book, we explore the development of fast and compact machine learning accelerators by developing a least-squares solver, a tensor-solver, and a distributed-solver. Moreover, applications such as an energy management system using such machine learning solvers on IoT devices are also investigated.

From the fast machine learning perspective, the target is to perform fast learning on the neural network. This book proposes a least-squares-solver for a single hidden layer neural network. Furthermore, this book explores the CMOS FPGA-based hardware accelerator and the RRAM-based hardware accelerator. A 3D multilayer CMOS-RRAM accelerator architecture for incremental machine learning is proposed.
By utilizing an incremental least-squares solver, the whole training process can be mapped on the 3D multilayer CMOS-RRAM accelerator with significant speed-up and energy-efficiency improvement. In addition, a CMOS-based FPGA realization of a neural network with square-root-free Cholesky factorization is also investigated for training and inference.

From the compact machine learning perspective, this book proposes a tensor-solver for deep neural network compression with consideration of the accuracy. A layer-wise training of tensorized neural network (TNN) has been proposed to formulate the multilayer neural network such that the weight matrix can be significantly compressed during training. By reshaping the multilayer neural network weight matrix into a high-dimensional tensor with a low-rank approximation, significant network compression can be achieved with maintained accuracy. In addition, a highly parallel yet energy-efficient machine learning accelerator has been proposed for such a tensorized neural network.

From the large-scale IoT network perspective, this book proposes a distributed-solver on IoT devices. Furthermore, this book proposes a distributed neural network and sequential learning on the smart gateways for indoor positioning, energy management, and IoT network security. For the indoor positioning system, experimental results show that the proposed algorithm can achieve 50× and 38× timing speedup during inference and training respectively, with comparable positioning accuracy, when compared to the traditional support vector machine (SVM) method. Similar improvement is also observed for the energy management system and the network intrusion detection system.

This book provides a state-of-the-art summary of the latest literature on machine learning accelerators for IoT systems and covers the whole design flow from machine learning algorithm optimization to hardware implementation. As such, while Chap. 1 discusses the emerging challenges, Chaps.
2–5 discuss the details of algorithm optimization and the mapping onto hardware. More specifically, Chap. 2 presents an overview of IoT systems and machine learning algorithms. Here, we first discuss edge computing on IoT devices and present a typical IoT system for smart buildings. Then, machine learning is discussed in more detail, covering machine learning basics, machine learning accelerators, distributed machine learning, and machine learning model optimization. Chapter 3 introduces a fast machine learning accelerator with the target to perform fast learning on a neural network. A least-squares-solver for a single hidden layer neural network is proposed accordingly. Chapter 4 presents a tensor-solver for deep neural networks with neural network compression. Representing each weight matrix as a high-dimensional tensor and then performing tensor-train decomposition can effectively reduce the size of the weight matrix (number of parameters). Chapter 5 discusses a distributed neural network with online sequential learning. The application of such a distributed neural network is investigated in the smart building environment. With such a common machine learning engine, energy management, indoor positioning, and network security can be performed.

Finally, the authors would like to thank all the colleagues from the CMOS Emerging Technology Group at Nanyang Technological University: Leibin Ni, Hang Xu, Zichuan Liu, Xiwei Huang and Wenye Liu. Their support was invaluable to us during the writing of this book. The author Hantao Huang is also grateful for the kind support from Singapore Joint Industry Program (JIP) with Mediatek Singapore.

Singapore, Singapore
September 2018
Hantao Huang
Hao Yu

Contents

1 Introduction
  1.1 Internet of Things (IoT)
  1.2 Machine Learning Accelerator
  1.3 Organization of This Book
  References
2 Fundamentals and Literature Review
  2.1 Edge Computing on IoT Devices
  2.2 IoT Based Smart Buildings
    2.2.1 IoT Based Indoor Positioning System
    2.2.2 IoT Based Energy Management System
    2.2.3 IoT Based Network Intrusion Detection System
  2.3 Machine Learning
    2.3.1 Machine Learning Basics
    2.3.2 Distributed Machine Learning
    2.3.3 Machine Learning Accelerator
    2.3.4 Machine Learning Model Optimization
  2.4 Summary
  References
3 Least-Squares-Solver for Shallow Neural Network
  3.1 Introduction
  3.2 Algorithm Optimization
    3.2.1 Preliminary
    3.2.2 Incremental Least-Squares Solver
  3.3 Hardware Implementation
    3.3.1 CMOS Based Accelerator
    3.3.2 RRAM-Crossbar Based Accelerator
  3.4 Experiment Results
    3.4.1 CMOS Based Results
    3.4.2 RRAM Based Results
  3.5 Conclusion
  References
4 Tensor-Solver for Deep Neural Network
  4.1 Introduction
  4.2 Algorithm Optimization
    4.2.1 Preliminary
    4.2.2 Shallow Tensorized Neural Network
    4.2.3 Deep Tensorized Neural Network
    4.2.4 Layer-wise Training of TNN
    4.2.5 Fine-tuning of TNN
    4.2.6 Quantization of TNN
    4.2.7 Network Interpretation of TNN
  4.3 Hardware Implementation
    4.3.1 3D Multi-layer CMOS-RRAM Architecture
    4.3.2 TNN Accelerator Design on 3D CMOS-RRAM Architecture
  4.4 Experiment Results
    4.4.1 TNN Performance Evaluation and Analysis
    4.4.2 TNN Benchmarked Result
    4.4.3 TNN Hardware Accelerator Result
  4.5 Conclusion
  References
5 Distributed-Solver for Networked Neural Network
  5.1 Introduction
    5.1.1 Indoor Positioning System
    5.1.2 Energy Management System
    5.1.3 Network Intrusion Detection System
  5.2 Algorithm Optimization
    5.2.1 Distributed Neural Network
    5.2.2 Online Sequential Model Update
    5.2.3 Ensemble Learning
  5.3 IoT Based Indoor Positioning System
    5.3.1 Problem Formulation
    5.3.2 Indoor Positioning System
    5.3.3 Experiment Results
  5.4 IoT Based Energy Management System
    5.4.1 Problem Formulation
    5.4.2 Energy Management System
    5.4.3 Experiment Results
  5.5 IoT Based Network Security System
    5.5.1 Problem Formulation
    5.5.2 Network Intrusion Detection System
    5.5.3 Experiment Results
  5.6 Conclusion and Future Works
  References
6 Conclusion and Future Works
  6.1 Conclusion
  6.2 Recommendations for Future Works
  References
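
The preface states that reshaping a weight matrix into a high-dimensional tensor and applying tensor-train decomposition reduces the number of parameters. As a rough illustration only, the Python sketch below counts the parameters of a dense weight matrix and of the same matrix stored in tensor-train format. The factor sizes, the ranks, and the function names dense_params and tt_params are assumptions chosen for this example, not values or code taken from the book.

```python
from math import prod


def dense_params(row_factors, col_factors):
    """Number of weights in the original dense matrix.

    The matrix has prod(row_factors) rows and prod(col_factors) columns.
    """
    return prod(row_factors) * prod(col_factors)


def tt_params(row_factors, col_factors, ranks):
    """Number of parameters in a tensor-train (TT) matrix format.

    Core k is a 4-way array of shape
    (ranks[k], row_factors[k], col_factors[k], ranks[k + 1]),
    and the boundary ranks must both be 1.
    """
    if len(row_factors) != len(col_factors):
        raise ValueError("row and column factor lists must have equal length")
    if len(ranks) != len(row_factors) + 1:
        raise ValueError("need one more rank than there are cores")
    if ranks[0] != 1 or ranks[-1] != 1:
        raise ValueError("boundary ranks must be 1")
    return sum(
        ranks[k] * m * n * ranks[k + 1]
        for k, (m, n) in enumerate(zip(row_factors, col_factors))
    )


# Assumed example: a 1024 x 1024 layer, with 1024 factored as 4 * 8 * 8 * 4
# on both sides and all internal TT-ranks set to 4 (illustrative values).
rows = [4, 8, 8, 4]
cols = [4, 8, 8, 4]
ranks = [1, 4, 4, 4, 1]

dense = dense_params(rows, cols)
compressed = tt_params(rows, cols, ranks)

print("dense parameters:", dense)
print("tensor-train parameters:", compressed)
print("compression ratio:", round(dense / compressed, 1))
```

With these assumed shapes, the 1,048,576 dense weights are stored in 2,176 tensor-train parameters. The count says nothing about accuracy, which depends on how well the trained weights tolerate the low-rank approximation.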

Description:
This book presents the latest techniques for machine learning based data analytics on IoT edge devices. A comprehensive literature review on neural network compression and machine learning accelerator is presented from both algorithm level optimization and hardware architecture optimization. Coverag