To Our Parents
Jun-Hwa Huang & Wen-Chuan Wang,
Danica & Mane Kecman,
Štefanija & Antun Kopriva,
and to Our Teachers

Preface

This is a book about (machine) learning from (experimental) data. Many books devoted to this broad field have been published recently. One even feels tempted to begin the previous sentence with the word ‘extremely’. Thus, there is an urgent need to introduce both the motives for and the content of the present volume in order to highlight its distinguishing features.

Before doing that, a few words about the very broad meaning of data are in order. Today, we are surrounded by an ocean of all kinds of experimental data (i.e., examples, samples, measurements, records, patterns, pictures, tunes, observations, etc.) produced by various sensors, cameras, microphones, pieces of software and/or other human-made devices. The amount of data produced is enormous and ever increasing. The first obvious consequence of this fact is that humans cannot handle such massive quantities of data, which usually appear in numeric form as huge (rectangular or square) matrices. Typically, the number of their rows (n) gives the number of data pairs collected, and the number of columns (m) represents the dimensionality of the data. Thus, faced with giga- and terabyte-sized data files, one has to develop new approaches, algorithms and procedures. A few techniques for coping with huge data size problems are presented here. This, possibly, explains the appearance of the wording ‘huge data sets’ in the title of the book.

Another direct consequence is that (instead of attempting to dive into the sea of hundreds of thousands or millions of high-dimensional data pairs) we are developing other ‘machines’ or ‘devices’ for analyzing, recognizing and/or learning from such huge data sets.
The so-called ‘learning machine’ is predominantly a piece of software that implements both the learning algorithm and the function (network, model) whose parameters have to be determined by the learning part of the software. Today, it turns out that some models used for solving machine learning tasks are either originally based on using kernels (e.g., support vector machines), or their newest extensions are obtained by introducing kernel functions into existing standard techniques. Many classic data mining algorithms have been extended to applications in the high-dimensional feature space. The list is long and fast growing, and only the most recent extensions are mentioned here. They are: kernel principal component analysis, kernel independent component analysis, kernel least squares, kernel discriminant analysis, kernel k-means clustering, kernel self-organizing feature map, kernel Mahalanobis distance, kernel subspace classification methods and kernel-function-based dimensionality reduction. What kernels are, as well as why and how they became so popular in learning-from-data tasks, will be shown shortly. For now, their wide use, as well as their efficiency in the numeric part of the algorithms (achieved by avoiding the calculation of scalar products between extremely high-dimensional feature vectors), explains their appearance in the title of the book.

Next, it is worth clarifying the fact that many authors tend to label similar (or even the same) models, approaches and algorithms by different names. One is simply destined to cope with the concepts of data mining, knowledge discovery, neural networks, Bayesian networks, machine learning, pattern recognition, classification, regression, statistical learning, decision trees, decision making, etc. All of them usually have a lot in common, and they often use the same set of techniques for adjusting, tuning, training or learning the parameters defining the models. The common object for all of them is a training data set.
All the various approaches mentioned start with a set of data pairs (x_i, y_i), where x_i represent the input variables (causes, observations, records) and y_i denote the measured outputs (responses, labels, meanings). However, even at the very starting point in machine learning (namely, with the training data set collected), real life has been tossing the coin in providing us with either

• a set of genuine training data pairs (x_i, y_i), where for each input x_i there is a corresponding output y_i, or with
• partially labeled data containing both the pairs (x_i, y_i) and the sole inputs x_i without associated known outputs y_i, or, in the worst case scenario, with
• a set of sole inputs (observations or records) x_i without any information about the possible desired output values (labels, meanings) y_i.

It is a genuine challenge indeed to try to solve such differently posed machine learning problems by a single approach and methodology. In fact, this is exactly what did not happen in real life, because the development in the field followed a natural path of inventing different tools for unlike tasks. The answer to the challenge was a more or less independent (although with some overlapping and mutual impact) development of three large and distinct sub-areas in machine learning: supervised, semi-supervised and unsupervised learning. This is where both the subtitle and the structure of the book originate. Here, all three approaches are introduced and presented in detail, which should enable readers not only to acquire various techniques but also to equip themselves with all the basic knowledge and requisites for further development in all three fields on their own.

The presentation in the book follows the order mentioned above. It starts with what is seemingly the most powerful supervised learning approach at the moment for solving classification (pattern recognition) problems and regression (function approximation) tasks, namely support vector machines (SVMs).
Then, it continues with the two most popular and promising semi-supervised approaches, both of them graph-based semi-supervised learning algorithms: the Gaussian random fields model (GRFM) and the consistency method (CM). Both the original setting of the methods and their improved versions will be introduced. This makes the volume the first book on semi-supervised learning. The book’s final part focuses on the two most appealing and widely used unsupervised methods: principal component analysis (PCA) and independent component analysis (ICA). These two algorithms are the workhorses of unsupervised learning today, and their presentation, as well as the pointing out of their major characteristics, capacities and differences, is given the highest care here.

The models and algorithms for all three parts of machine learning mentioned are given in a way that equips the reader for their straightforward implementation. This is achieved not only by presenting them but also by applying the models and algorithms to some low-dimensional (and thus easy to understand, visualize and follow) examples. The equations and models provided will be able to handle much bigger problems (ones having much more data of much higher dimensionality) in the same way as they handle the ones we can follow and ‘see’ in the examples provided. In the authors’ experience and opinion, the approach adopted here is the most accessible, pleasant and useful way to master material containing many new (and potentially difficult) concepts.

The structure of the book is shown in Fig. 0.1. The basic motivations and presentation of three different approaches to solving three unlike learning-from-data tasks are given in Chap. 1. It serves as both the background and the stage for the book to evolve.

Chapter 2 introduces the constructive part of SVMs without going into all the theoretical foundations of statistical learning theory, which can be found in many other books.
This may be particularly appreciated by, and useful for, application-oriented readers who do not need to know all the theory back to its roots and motives. The basic quadratic programming (QP) based learning algorithms for both classification and regression problems are presented here. The ideas are introduced in a gentle way, starting with the learning algorithm for classifying linearly separable data sets, through classification tasks having overlapped classes but still a linear separation boundary, beyond the linearity assumptions to the nonlinear separation boundary, and finally to linear and nonlinear regression problems. Appropriate examples follow each model derived, enabling an easier grasp of the concepts introduced. The material provided here will be used and further developed in two specific directions in Chaps. 3 and 4.
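
The step from linear to nonlinear separation boundaries relies on the property of kernels noted earlier in this preface: a kernel returns a scalar product in a high-dimensional feature space without ever forming the feature vectors. The following is a minimal sketch of that idea and is not taken from the book. The homogeneous quadratic kernel and the function names `poly2_kernel`, `explicit_feature_map` and `feature_space_dot` are assumptions chosen purely for illustration.

```python
def poly2_kernel(x, z):
    """Quadratic kernel k(x, z) = (x . z)^2, evaluated in the input space.

    Cost grows linearly with the input dimension d.
    """
    dot = sum(a * b for a, b in zip(x, z))
    return dot ** 2


def explicit_feature_map(x):
    """Map x to all pairwise products x_i * x_j (d**2 features for d inputs)."""
    return [a * b for a in x for b in x]


def feature_space_dot(x, z):
    """Scalar product computed the expensive way, in the d**2-dimensional space."""
    fx = explicit_feature_map(x)
    fz = explicit_feature_map(z)
    return sum(a * b for a, b in zip(fx, fz))


if __name__ == "__main__":
    x = [1, 2, 3]
    z = [4, 5, 6]
    # Both routes give the same number; only the first avoids the feature vectors.
    print(poly2_kernel(x, z), feature_space_dot(x, z))
```

For a three-dimensional input the explicit map already has nine components. With the higher-degree or Gaussian kernels typically used in practice, the feature space becomes far larger (or infinite-dimensional), which is why evaluating the kernel directly in the input space matters.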