Elements of Artificial Neural Networks
Kishan Mehrotra, Chilukuri K. Mohan and Sanjay Ranka
October 1996
ISBN 0-262-13328-8
344 pp., 144 illus.
The MIT Press (A Bradford Book; Complex Adaptive Systems series)

Table of Contents

Preface

1 Introduction
1.1 History of Neural Networks
1.2 Structure and Function of a Single Neuron
1.2.1 Biological neurons
1.2.2 Artificial neuron models
1.3 Neural Net Architectures
1.3.1 Fully connected networks
1.3.2 Layered networks
1.3.3 Acyclic networks
1.3.4 Feedforward networks
1.3.5 Modular neural networks
1.4 Neural Learning
1.4.1 Correlation learning
1.4.2 Competitive learning
1.4.3 Feedback-based weight adaptation
1.5 What Can Neural Networks Be Used for?
1.5.1 Classification
1.5.2 Clustering
1.5.3 Vector quantization
1.5.4 Pattern association
1.5.5 Function approximation
1.5.6 Forecasting
1.5.7 Control applications
1.5.8 Optimization
1.5.9 Search
1.6 Evaluation of Networks
1.6.1 Quality of results
1.6.2 Generalizability
1.6.3 Computational resources
1.7 Implementation
1.8 Conclusion
1.9 Exercises

2 Supervised Learning: Single-Layer Networks
2.1 Perceptrons
2.4 Guarantee of Success
2.5 Modifications
2.5.1 Pocket algorithm
2.5.2 Adalines
2.5.3 Multiclass discrimination
2.6 Conclusion
2.7 Exercises

3 Supervised Learning: Multilayer Networks I
3.1 Multilevel Discrimination
3.2 Preliminaries
3.2.1 Architecture
3.2.2 Objectives
3.3 Backpropagation Algorithm
3.4 Setting the Parameter Values
3.4.1 Initialization of weights
3.4.2 Frequency of weight updates
3.4.3 Choice of learning rate
3.4.4 Momentum
3.4.5 Generalizability
3.4.6 Number of hidden layers and nodes
3.4.7 Number of samples
3.5 Theoretical Results*
3.5.1 Cover's theorem
3.5.2 Representations of functions
3.5.3 Approximations of functions
3.6 Accelerating the Learning Process
3.6.1 Quickprop algorithm
3.6.2 Conjugate gradient
3.7 Applications
3.7.1 Weaning from mechanically assisted ventilation
3.7.2 Classification of myoelectric signals
3.7.3 Forecasting commodity prices
3.7.4 Controlling a gantry crane
3.8 Conclusion
3.9 Exercises

4 Supervised Learning: Multilayer Networks II
4.1 Madalines
4.2 Adaptive Multilayer Networks
4.2.6 Tiling algorithm
4.3 Prediction Networks
4.3.1 Recurrent networks
4.3.2 Feedforward networks for forecasting
4.4 Radial Basis Functions
4.5 Polynomial Networks
4.6 Regularization
4.7 Conclusion
4.8 Exercises

5 Unsupervised Learning
5.1 Winner-Take-All Networks
5.1.1 Hamming networks
5.1.2 Maxnet
5.1.3 Simple competitive learning
5.2 Learning Vector Quantizers
5.3 Counterpropagation Networks
5.4 Adaptive Resonance Theory
5.5 Topologically Organized Networks
5.5.1 Self-organizing maps
5.5.2 Convergence*
5.5.3 Extensions
5.6 Distance-Based Learning
5.6.1 Maximum entropy
5.6.2 Neural gas
5.7 Neocognitron
5.8 Principal Component Analysis Networks
5.9 Conclusion
5.10 Exercises

6 Associative Models
6.1 Non-iterative Procedures for Association
6.2 Hopfield Networks
6.2.1 Discrete Hopfield networks
6.2.2 Storage capacity of Hopfield networks*
6.2.3 Continuous Hopfield networks
6.3 Brain-State-in-a-Box Network
6.4 Boltzmann Machines
6.4.1 Mean field annealing
6.5 Hetero-associators

7.1.2 Solving simultaneous linear equations
7.1.3 Allocating documents to multiprocessors
    Discrete Hopfield network
    Continuous Hopfield network
    Performance
7.2 Iterated Gradient Descent
7.3 Simulated Annealing
7.4 Random Search
7.5 Evolutionary Computation
7.5.1 Evolutionary algorithms
7.5.2 Initialization
7.5.3 Termination criterion
7.5.4 Reproduction
7.5.5 Operators
    Mutation
    Crossover
7.5.6 Replacement
7.5.7 Schema Theorem*
7.6 Conclusion
7.7 Exercises

Appendix A: A Little Math
A.1 Calculus
A.2 Linear Algebra
A.3 Statistics

Appendix B: Data
B.1 Iris Data
B.2 Classification of Myoelectric Signals
B.3 Gold Prices
B.4 Clustering Animal Features
B.5 3-D Corners, Grid and Approximation
B.6 Eleven-City Traveling Salesperson Problem (Distances)
B.7 Daily Stock Prices of Three Companies, over the Same Period
B.8 Spiral Data

Bibliography
Index

Preface

This book is intended as an introduction to the subject of artificial neural networks for readers at the senior undergraduate or beginning graduate levels, as well as professional engineers and scientists. The background presumed is roughly a year of college-level mathematics, and some amount of exposure to the task of developing algorithms and computer programs. For completeness, some of the chapters contain theoretical sections that discuss issues such as the capabilities of algorithms presented. These sections, identified by an asterisk in the section name, require greater mathematical sophistication and may be skipped by readers who are willing to assume the existence of theoretical results about neural network algorithms.

Many off-the-shelf neural network toolkits are available, including some on the Internet, and some that make source code available for experimentation. Toolkits with user-friendly interfaces are useful in attacking large applications; for a deeper understanding, we recommend that the reader be willing to modify computer programs, rather than remain a user of code written elsewhere.

The authors of this book have used the material in teaching courses at Syracuse University, covering various chapters in the same sequence as in the book. The book is organized so that the most frequently used neural network algorithms (such as error backpropagation) are introduced very early, so that these can form the basis for initiating course projects. Chapters 2, 3, and 4 have a linear dependency and, thus, should be covered in the same sequence. However, chapters 5 and 6 are essentially independent of each other and earlier chapters, so these may be covered in any relative order. If the emphasis in a course is to be on associative networks, for instance, then chapter 6 may be covered before chapters 2, 3, and 4.
Chapter 6 should be discussed before chapter 7. If the "non-neural" parts of chapter 7 (sections 7.2 to 7.5) are not covered in a short course, then discussion of section 7.1 may immediately follow chapter 6. The inter-chapter dependency rules are roughly as follows.

1 -> 2 -> 3 -> 4
1 -> 5
1 -> 6
3 -> 5.3
6.2 -> 7.1

Within each chapter, it is best to cover most sections in the same sequence as the text; this is not logically necessary for parts of chapters 4, 5, and 7, but minimizes student confusion. Material for transparencies may be obtained from the authors. We welcome suggestions for improvements and corrections. Instructors who plan to use the book in a course should send electronic mail to one of the authors, so that we can indicate any last-minute corrections needed (if errors are found after book production). New theoretical and practical developments continue to be reported in the neural network literature, and some of these are relevant even for newcomers to the field; we hope to communicate some such results to instructors who contact us.

The authors of this book have arrived at neural networks through different paths (statistics, artificial intelligence, and parallel computing) and have developed the material through teaching courses in Computer and Information Science. Some of our biases may show through the text, while perspectives found in other books may be missing; for instance, we do not discount the importance of neurobiological issues, although these consume little ink in the book. It is hoped that this book will help newcomers understand the rationale, advantages, and limitations of various neural network models. For details regarding some of the more mathematical and technical material, the reader is referred to more advanced texts such as those by Hertz, Krogh, and Palmer (1990) and Haykin (1994).
We express our gratitude to all the researchers who have worked on and written about neural networks, and whose work has made this book possible. We thank Syracuse University and the University of Florida, Gainesville, for supporting us during the process of writing this book. We thank Li-Min Fu, Joydeep Ghosh, and Lockwood Morris for many useful suggestions that have helped improve the presentation. We thank all the students who have suffered through earlier drafts of this book, and whose comments have improved this book, especially S. K. Bolazar, M. Gunwani, A. R. Menon, and Z. Zeng. We thank Elaine Weinman, who has contributed much to the development of the text. Harry Stanton of the MIT Press has been an excellent editor to work with. Suggestions on an early draft of the book, by various reviewers, have helped correct many errors. Finally, our families have been the source of much needed support during the many months of work this book has entailed.

We expect that some errors remain in the text, and welcome comments and corrections from readers. The authors may be reached by electronic mail at [email protected], [email protected], and [email protected]. In particular, there has been so much recent research in neural networks that we may have mistakenly failed to mention the names of researchers who have developed some of the ideas discussed in this book. Errata, computer programs, and data files will be made accessible by Internet.

1 Introduction

If we could first know where we are, and whither we are tending, we could better judge what to do, and how to do it.
—Abraham Lincoln

Many tasks involving intelligence or pattern recognition are extremely difficult to automate, but appear to be performed very easily by animals. For instance, animals recognize various objects and make sense out of the large amount of visual information in their surroundings, apparently requiring very little effort.
It stands to reason that computing systems that attempt similar tasks will profit enormously from understanding how animals perform these tasks, and simulating these processes to the extent allowed by physical limitations. This necessitates the study and simulation of neural networks.

The neural network of an animal is part of its nervous system, containing a large number of interconnected neurons (nerve cells). "Neural" is an adjective for neuron, and "network" denotes a graph-like structure. Artificial neural networks refer to computing systems whose central theme is borrowed from the analogy of biological neural networks. Bowing to common practice, we omit the prefix "artificial." There is potential for confusing the (artificial) poor imitation for the (biological) real thing; in this text, non-biological words and names are used as far as possible.

Artificial neural networks are also referred to as "neural nets," "artificial neural systems," "parallel distributed processing systems," and "connectionist systems." For a computing system to be called by these pretty names, it is necessary for the system to have a labeled directed graph structure where nodes perform some simple computations. From elementary graph theory we recall that a "directed graph" consists of a set of "nodes" (vertices) and a set of "connections" (edges/links/arcs) connecting pairs of nodes. A graph is a "labeled graph" if each connection is associated with a label to identify some property of the connection. In a neural network, each node performs some simple computations, and each connection conveys a signal from one node to another, labeled by a number called the "connection strength" or "weight" indicating the extent to which a signal is amplified or diminished by a connection. Not every such graph can be called a neural network, as illustrated in example 1.1 using a simple labeled directed graph that conducts an elementary computation.
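The labeled directed graph just described is easy to capture in code. The sketch below is our own illustration (not from the book): a network is stored as a map from directed connections to their weights, and a signal sent along a connection is simply scaled by that weight. The node names and weight values are arbitrary choices for the example.

```python
# A labeled directed graph: each directed connection (source, target)
# carries a numeric label, the "weight", by which a signal is
# amplified or diminished in transit. Values here are arbitrary.
weights = {
    ("x1", "out"): 0.5,
    ("x2", "out"): -2.0,
}

def transmit(signal, source, target):
    """Carry a signal along a connection, scaling it by the weight."""
    return signal * weights[(source, target)]

# A unit signal from x1 arrives at "out" halved; one from x2 arrives
# negated and doubled:
print(transmit(1.0, "x1", "out"))   # 0.5
print(transmit(1.0, "x2", "out"))   # -2.0
```

A full network would also attach a simple computation to each node; the examples that follow use multiplication as that node function.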
EXAMPLE 1.1 The "AND" of two binary inputs is an elementary logical operation, implemented in hardware using an "AND gate." If the inputs to the AND gate are x₁ ∈ {0, 1} and x₂ ∈ {0, 1}, the desired output is 1 if x₁ = x₂ = 1, and 0 otherwise. A graph representing this computation is shown in figure 1.1, with one node at which computation (multiplication) is carried out, two nodes that hold the inputs (x₁, x₂), and one node that holds the output. However, this graph cannot be considered a neural network since the connections between the nodes are fixed and appear to play no other role than carrying the inputs to the node that computes their conjunction.

[Figure 1.1: AND gate graph. Inputs x₁ ∈ {0, 1} and x₂ ∈ {0, 1} feed a multiplier node whose output is o = x₁ AND x₂.]

[Figure 1.2: AND gate network. The same graph with weighted connections; the node computes o = (w₁x₁)(w₂x₂).]

We may modify the graph in figure 1.1 to obtain a network containing weights (connection strengths), as shown in figure 1.2. Different choices for the weights result in different functions being evaluated by the network. Given a network whose weights are initially random, and given that we know the task to be accomplished by the network, a "learning algorithm" must be used to determine the values of the weights that will achieve the desired task. The graph structure, with connection weights modifiable using a learning algorithm, qualifies the computing system to be called an artificial neural network.

EXAMPLE 1.2 For the network shown in figure 1.2, the following is an example of a learning algorithm that will allow learning the AND function, starting from arbitrary values of w₁ and w₂. The trainer uses the following four examples to modify the weights: {(x₁ = 1, x₂ = 1, d = 1), (x₁ = 0, x₂ = 0, d = 0), (x₁ = 1, x₂ = 0, d = 0), (x₁ = 0, x₂ = 1, d = 0)}. An (x₁, x₂) pair is presented to the network, and the result o computed by the network is observed. If the value of o coincides with the desired result, d, the weights are not changed.
If the value of o is smaller than the desired result, w₁ is increased by 0.1; and if the value of o is larger than the desired result, w₁ is decreased by 0.1. For instance, if w₁ = 0.7 and w₂ = 0.2, then the presentation of (x₁ = 1, x₂ = 1) results in an output of o = 0.14, which is smaller than the desired value of 1, hence the learning algorithm increases w₁ to 0.8, so that the new output for (x₁ = 1, x₂ = 1) would be o = 0.16, which is closer to the desired value than the previous value (o = 0.14), although still unsatisfactory. This process of modifying w₁ or w₂ may be repeated until the final result is satisfactory, with weights w₁ = 5.0, w₂ = 0.2.

Can the weights of such a net be modified so that the system performs a different task? For instance, is there a set of values for w₁ and w₂ such that a net otherwise identical to that shown in figure 1.2 can compute the OR of its inputs? Unfortunately, there is no possible choice of weights w₁ and w₂ such that (w₁ · x₁) · (w₂ · x₂) will compute the OR of x₁ and x₂. For instance, whenever x₁ = 0, the output value (w₁ · x₁) · (w₂ · x₂) = 0, irrespective of whether x₂ = 1. The node function was predetermined to multiply weighted inputs, imposing a fundamental limitation on the capabilities of the network shown in figure 1.2, although it was adequate for the task of computing the AND function and for functions described by the mathematical expression o = w₁w₂x₁x₂. A different node function is needed if there is to be some chance of learning the OR function. An example of such a node function is (x₁ + x₂ − x₁ · x₂), which evaluates to 1 if x₁ = 1 or x₂ = 1, and to 0 if x₁ = 0 and x₂ = 0 (assuming that each input can take only a 0 or 1 value). But this network cannot be used to compute the AND function.
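The weight-update rule of example 1.2 can be simulated in a few lines. The sketch below is our own, not code from the book; in particular, the stopping tolerance of 0.01 is an assumption, since the text only asks that the final result be "satisfactory."

```python
# Simulation of the learning rule in example 1.2 for the network of
# figure 1.2, whose node computes o = (w1*x1) * (w2*x2).
# Only w1 is adjusted, in steps of 0.1, exactly as in the text.
# The tolerance is our assumption ("satisfactory" is not quantified).
patterns = [(1, 1, 1), (0, 0, 0), (1, 0, 0), (0, 1, 0)]  # (x1, x2, d)
w1, w2 = 0.7, 0.2    # initial weights from the worked example
tol = 0.01

changed = True
while changed:
    changed = False
    for x1, x2, d in patterns:
        o = (w1 * x1) * (w2 * x2)
        if o < d - tol:        # output too small: increase w1
            w1 += 0.1
            changed = True
        elif o > d + tol:      # output too large: decrease w1
            w1 -= 0.1
            changed = True

print(round(w1, 1))   # 5.0, the value quoted in the text
```

Note that the loop terminates only because a satisfactory w₁ happens to be reachable from 0.7 in steps of 0.1; as the text goes on to observe, from some starting weights no such value can be reached.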
Sometimes, a network may be capable of computing a function, but the learning algorithm may not be powerful enough to find a satisfactory set of weight values, and the final result may be constrained due to the initial (random) choice of weights. For instance, the AND function cannot be learnt accurately using the learning algorithm described above if we started from initial weight values w₁ = w₂ = 0.3, since the solution w₁ = 1/0.3 cannot be reached by repeatedly incrementing (or decrementing) the initial choice of w₁ by 0.1.

We seem to be stuck with one node function for AND and another for OR. What if we did not know beforehand whether the desired function was AND or OR? Is there some node function such that we can simulate AND as well as OR by using different weight values? Is there a different network that is powerful enough to learn every conceivable function of its inputs? Fortunately, the answer is yes; networks can be built with sufficiently general node functions so that a large number of different problems can be solved, using a different set of weight values for each task.

The AND gate example has served as a takeoff point for several important questions: what are neural networks, what can they accomplish, how can they be modified, and what are their limitations? In the rest of this chapter, we review the history of research in neural networks, and address four important questions regarding neural network systems.
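One node function general enough to represent both AND and OR is a weighted sum followed by a threshold, a form developed properly in later chapters; the quick sketch below is our own illustration, and the particular weights and thresholds are our own choices. With w₁ = w₂ = 1, a threshold of 1.5 yields AND, while a threshold of 0.5 yields OR.

```python
# A threshold node: output 1 when the weighted input sum reaches the
# threshold theta, and 0 otherwise. The parameter values used below
# are our own illustrative choices, not taken from the book.
def threshold_node(x1, x2, w1, w2, theta):
    return 1 if w1 * x1 + w2 * x2 >= theta else 0

for x1 in (0, 1):
    for x2 in (0, 1):
        and_out = threshold_node(x1, x2, 1, 1, theta=1.5)  # acts as AND
        or_out = threshold_node(x1, x2, 1, 1, theta=0.5)   # acts as OR
        print(x1, x2, and_out, or_out)
```

The same node thus simulates AND or OR depending only on the parameter values, which is exactly the flexibility the multiplicative node of figure 1.2 lacked.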