Ke-Lin Du · M. N. S. Swamy

Neural Networks and Statistical Learning

Ke-Lin Du
Enjoyor Labs, Enjoyor Inc., Hangzhou, China
and
Department of Electrical and Computer Engineering, Concordia University, Montreal, QC, Canada

M. N. S. Swamy
Department of Electrical and Computer Engineering, Concordia University, Montreal, QC, Canada

Additional material to this book can be downloaded from http://extras.springer.com/

ISBN 978-1-4471-5570-6    ISBN 978-1-4471-5571-3 (eBook)
DOI 10.1007/978-1-4471-5571-3
Springer London Heidelberg New York Dordrecht
Library of Congress Control Number: 2013948860

© Springer-Verlag London 2014
This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. Exempted from this legal reservation are brief excerpts in connection with reviews or scholarly analysis or material supplied specifically for the purpose of being entered and executed on a computer system, for exclusive use by the purchaser of the work. Duplication of this publication or parts thereof is permitted only under the provisions of the Copyright Law of the Publisher's location, in its current version, and permission for use must always be obtained from Springer. Permissions for use may be obtained through RightsLink at the Copyright Clearance Center. Violations are liable to prosecution under the respective Copyright Law.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.
While the advice and information in this book are believed to be true and accurate at the date of publication, neither the authors nor the editors nor the publisher can accept any legal responsibility for any errors or omissions that may be made. The publisher makes no warranty, express or implied, with respect to the material contained herein.
Printed on acid-free paper
Springer is part of Springer Science+Business Media (www.springer.com)

In memory of my grandparents
K.-L. Du

To my family
M. N. S. Swamy

To all the researchers with original contributions to neural networks and machine learning
K.-L. Du and M. N. S. Swamy

Preface

The human brain, consisting of nearly 10^11 neurons, is the center of human intelligence. Human intelligence has been simulated in various ways. Artificial intelligence (AI) pursues exact logical reasoning based on symbol manipulation. Fuzzy logic models the highly uncertain behavior of decision making. Neural networks model the highly nonlinear infrastructure of brain networks. Evolutionary computation models the evolution of intelligence. Chaos theory models the highly nonlinear and chaotic behaviors of human intelligence.

Soft computing is an evolving collection of methodologies for the representation of ambiguity in human thinking; it exploits the tolerance for imprecision and uncertainty, approximate reasoning, and partial truth in order to achieve tractability, robustness, and low-cost solutions. The major methodologies of soft computing are fuzzy logic, neural networks, and evolutionary computation.
Conventional model-based data-processing methods require experts' knowledge for the modeling of a system. Neural network methods provide a model-free, adaptive, fault-tolerant, parallel and distributed processing solution. A neural network is a black box that directly learns the internal relations of an unknown system, without guessing functions for describing cause-and-effect relationships. The neural network approach is a basic methodology of information processing. Neural network models may be used for function approximation, classification, nonlinear mapping, associative memory, vector quantization, optimization, feature extraction, clustering, and approximate inference. Neural networks have wide applications in almost all areas of science and engineering.

Fuzzy logic provides a means for treating uncertainty and computing with words. This mimics human recognition, which skillfully copes with uncertainty. Fuzzy systems are conventionally created from explicit knowledge expressed in the form of fuzzy rules, which are designed based on experts' experience. A fuzzy system can explain its action by fuzzy rules. Neurofuzzy systems, as a synergy of fuzzy logic and neural networks, possess both learning and knowledge representation capabilities.

This book is our attempt to bring together the major advances in neural networks and machine learning, and to explain them in a statistical framework. While some mathematical details are needed, we emphasize the practical aspects of the models and methods rather than the theoretical details. To us, neural networks are merely some statistical methods that can be represented by graphs and networks. They can iteratively adjust the network parameters. As a statistical model, a neural network can learn the probability density function from the given samples, and then predict, by generalization according to the learnt statistics, outputs for new samples that are not included in the learning sample set. The neural network approach is a general statistical computational paradigm.

Neural network research solves two problems: the direct problem and the inverse problem. The direct problem employs computer and engineering techniques to model biological neural systems of the human brain. This problem is investigated by cognitive scientists and can be useful in neuropsychiatry and neurophysiology. The inverse problem simulates biological neural systems for their problem-solving capabilities for application in scientific or engineering fields. Engineering and computer scientists have conducted extensive investigation in this area. This book concentrates mainly on the inverse problem, although the two areas often shed light on each other. The biological and psychological plausibility of the neural network models has not been seriously treated in this book, though some background material is discussed.

This book is intended to be used as a textbook for advanced undergraduate and graduate students in engineering, science, computer science, business, arts, and medicine. It is also a good reference book for scientists, researchers, and practitioners in a wide variety of fields, and assumes no previous knowledge of neural network or machine learning concepts.

This book is divided into 25 chapters and two appendices. It contains almost all the major neural network models and statistical learning approaches. We also give an introduction to fuzzy sets and logic, and neurofuzzy models. Hardware implementations of the models are discussed. Two chapters are dedicated to the applications of neural network and statistical learning approaches to biometrics/bioinformatics and data mining.
Finally, in the appendices, some mathematical preliminaries are given, and benchmarks for validating all kinds of neural network methods and some web resources are provided.

First and foremost we would like to thank the supporting staff from Springer London, especially Anthony Doyle and Grace Quinn, for their enthusiastic and professional support throughout the period of manuscript preparation. K.-L. Du also wishes to thank Jiabin Lu (Guangdong University of Technology, China), Jie Zeng (Richcon MC, Inc., China), Biaobiao Zhang and Hui Wang (Enjoyor, Inc., China), and many of his graduate students including Na Shou, Shengfeng Yu, Lusha Han, Xiaolan Shen, Yuanyuan Chen, and Xiaoling Wang (Zhejiang University of Technology, China) for their consistent assistance. In addition, we should mention at least the following names for their help: Omer Morgul (Bilkent University, Turkey), Yanwu Zhang (Monterey Bay Aquarium Research Institute, USA), Chi Sing Leung (City University of Hong Kong, Hong Kong), M. Omair Ahmad and Jianfeng Gu (Concordia University, Canada), Li Yu, Limin Meng, Jingyu Hua, Zhijiang Xu, and Luping Fang (Zhejiang University of Technology, China), Yuxing Dai (Wenzhou University, China), and Renwang Li (Zhejiang Sci-Tech University, China). Last, but not least, we would like to thank our families for their support and understanding during the course of writing this book.

A book of this length is certain to have some errors and omissions. Feedback is welcome via email at [email protected] or [email protected]. MATLAB code for the worked examples is downloadable from the website of this book.

Hangzhou, China                                        K.-L. Du
Montreal, Canada                                       M. N. S. Swamy

Contents

1  Introduction
   1.1  Major Events in Neural Networks Research
   1.2  Neurons
        1.2.1  The McCulloch–Pitts Neuron Model
        1.2.2  Spiking Neuron Models
   1.3  Neural Networks
   1.4  Scope of the Book
   References

2  Fundamentals of Machine Learning
   2.1  Learning Methods
   2.2  Learning and Generalization
        2.2.1  Generalization Error
        2.2.2  Generalization by Stopping Criterion
        2.2.3  Generalization by Regularization
        2.2.4  Fault Tolerance and Generalization
        2.2.5  Sparsity Versus Stability
   2.3  Model Selection
        2.3.1  Crossvalidation
        2.3.2  Complexity Criteria
   2.4  Bias and Variance
   2.5  Robust Learning
   2.6  Neural Network Processors
   2.7  Criterion Functions
   2.8  Computational Learning Theory
        2.8.1  Vapnik–Chervonenkis Dimension
        2.8.2  Empirical Risk-Minimization Principle
        2.8.3  Probably Approximately Correct Learning
   2.9  No-Free-Lunch Theorem
   2.10 Neural Networks as Universal Machines
        2.10.1  Boolean Function Approximation
        2.10.2  Linear Separability and Nonlinear Separability
        2.10.3  Continuous Function Approximation
        2.10.4  Winner-Takes-All
   2.11 Compressed Sensing and Sparse Approximation
        2.11.1  Compressed Sensing
        2.11.2  Sparse Approximation
        2.11.3  LASSO and Greedy Pursuit
   2.12 Bibliographical Notes
   References

3  Perceptrons
   3.1  One-Neuron Perceptron
   3.2  Single-Layer Perceptron
   3.3  Perceptron Learning Algorithm
   3.4  Least-Mean Squares (LMS) Algorithm
   3.5  P-Delta Rule
   3.6  Other Learning Algorithms
   References

4  Multilayer Perceptrons: Architecture and Error Backpropagation
   4.1  Introduction
   4.2  Universal Approximation
   4.3  Backpropagation Learning Algorithm
   4.4  Incremental Learning Versus Batch Learning
   4.5  Activation Functions for the Output Layer
   4.6  Optimizing Network Structure
        4.6.1  Network Pruning Using Sensitivity Analysis
        4.6.2  Network Pruning Using Regularization
        4.6.3  Network Growing
   4.7  Speeding Up Learning Process
        4.7.1  Eliminating Premature Saturation
        4.7.2  Adapting Learning Parameters
        4.7.3  Initializing Weights
        4.7.4  Adapting Activation Function
   4.8  Some Improved BP Algorithms
        4.8.1  BP with Global Descent
        4.8.2  Robust BP Algorithms
   4.9  Resilient Propagation (RProp)
   References

5  Multilayer Perceptrons: Other Learning Techniques
   5.1  Introduction to Second-Order Learning Methods
   5.2  Newton's Methods
        5.2.1  Gauss–Newton Method
        5.2.2  Levenberg–Marquardt Method