ebook img

Machine Learning Models and Algorithms for Big Data Classification PDF

364 Pages·2015·9.31 MB·English
by  
Save to my drive
Quick download
Download
Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.

Preview Machine Learning Models and Algorithms for Big Data Classification

Integrated Series in Information Systems 36 Series Editors: Ramesh Sharda · Stefan Voß Shan Suthaharan Machine Learning Models and Algorithms for Big Data Classifi cation Thinking with Examples for Eff ective Learning Integrated Series in Information Systems Volume 36 SeriesEditors RameshSharda OklahomaStateUniversity,Stillwater,OK,USA StefanVoß UniversityofHamburg,Hamburg,Germany Moreinformationaboutthisseriesathttp://www.springer.com/series/6157 Shan Suthaharan Machine Learning Models and Algorithms for Big Data Classification Thinking with Examples for Effective Learning 123 ShanSuthaharan DepartmentofComputerScience UNCGreensboro Greensboro,NC,USA ISSN1571-0270 ISSN2197-7968 (electronic) IntegratedSeriesinInformationSystems ISBN978-1-4899-7640-6 ISBN978-1-4899-7641-3 (eBook) DOI10.1007/978-1-4899-7641-3 LibraryofCongressControlNumber:2015950063 SpringerNewYorkHeidelbergDordrechtLondon ©SpringerScience+BusinessMediaNewYork2016 Thisworkissubjecttocopyright.AllrightsarereservedbythePublisher,whetherthewholeorpartof thematerialisconcerned,specificallytherightsoftranslation,reprinting,reuseofillustrations,recitation, broadcasting,reproductiononmicrofilmsorinanyotherphysicalway,andtransmissionorinformation storageandretrieval,electronicadaptation,computersoftware,orbysimilarordissimilarmethodology nowknownorhereafterdeveloped. Theuseofgeneraldescriptivenames,registerednames,trademarks,servicemarks,etc.inthispublication doesnotimply,evenintheabsenceofaspecificstatement,thatsuchnamesareexemptfromtherelevant protectivelawsandregulationsandthereforefreeforgeneraluse. Thepublisher,theauthorsandtheeditorsaresafetoassumethattheadviceandinformationinthisbook arebelievedtobetrueandaccurateatthedateofpublication.Neitherthepublishernortheauthorsor theeditorsgiveawarranty,expressorimplied,withrespecttothematerialcontainedhereinorforany errorsoromissionsthatmayhavebeenmade. Printedonacid-freepaper SpringerInternationalPublishingAGSwitzerland ispartofSpringerScience+Business Media(www. springer.com) Itis thequalityof ourworkwhichwill please Godand notthequantity– MahatmaGandhi If you can’t explainitsimply,you don’t understanditwell enough– AlbertEinstein Preface The interest in writing this book began at the IEEE International Conference on IntelligenceandSecurityInformaticsheldinWashington,DC(June11–14,2012), where Mr. Matthew Amboy, the editor of Business and Economics: OR and MS, publishedbySpringerScience+BusinessMedia,expressedtheneedfora bookon thistopic,mainlyfocusingona topicin datascience field.Theinterestwenteven deeperwhenIattendedtheworkshopconductedbyProfessorBinYu(Department ofStatistics,UniversityofCalifornia,Berkeley)andProfessorDavidMadigan(De- partmentofStatistics,ColumbiaUniversity)attheInstituteforMathematicsandits Applications,UniversityofMinnesotaonJune16–29,2013. Data scienceisoneoftheemergingfieldsinthetwenty-firstcentury.Thisfield has been created to address the big data problems encountered in the day-to-day operationsofmanyindustries,includingfinancialsectors,academicinstitutions,in- formationtechnologydivisions,health care companies,and governmentorganiza- tions. One of the important big data problems that needs immediate attention is in big data classifications. The network intrusion detection, public space intruder detection, fraud detection, spam filtering, and forensic linguistics are some of the practicalexamplesofbigdataclassificationproblemsthatrequireimmediateatten- tion. We need significant collaboration between the experts in many disciplines, in- cludingmathematics,statistics,computerscience,engineering,biology,andchem- istrytofindsolutionstothischallengingproblem.Educationalresources,likebooks andsoftware,arealsoneededtotrainstudentstobethenextgenerationofresearch leadersinthisemergingresearchfield.Oneofthecurrentfieldsthatbringsthein- terdisciplinaryexperts,educationalresources,andmoderntechnologiesunderone roofismachinelearning,whichisasubfieldofartificialintelligence. Many models and algorithmsfor standard classification problemsare available inthemachinelearningliterature.However,afewofthemaresuitableforbigdata classification.Bigdataclassificationisdependentnotonlyonthemathematicaland softwaretechniquesbutalsoonthecomputertechnologiesthathelpstore,retrieve, andprocessthedatawith efficientscalability,accessibility,andcomputabilityfea- tures.Onesuchrecenttechnologyisthedistributedfilesystem.Aparticularsystem vii viii Preface thathasbecomepopularandprovidesthese featuresis theHadoopdistributedfile system,whichusesthemoderntechniquescalledMapReduceprogrammingmodel (oraframework)withMapperandReducerfunctionsthatadopttheconceptcalled the (key, value) pairs. The machine learning techniques such as the decision tree (a hierarchicalapproach),randomforest(an ensemblehierarchicalapproach),and deeplearning(alayeredapproach)arehighlysuitableforthesystemthataddresses bigdataclassificationproblems.Therefore,thegoalofthisbookistopresentsome ofthemachinelearningmodelsandalgorithms,anddiscussthemwithexamples. Thegeneralobjectiveofthisbookistohelpreaders,especiallystudentsand newcomerstothefieldofbigdataandmachinelearning,togainaquickunder- standingofthetechniquesandtechnologies;therefore,thetheory,examples, and programs (Matlab and R) presented in this book have been simplified, hardcoded, repeated, or spaced for improvements.They provide vehicles to testandunderstandthecomplicatedconceptsofvarioustopicsinthefield.It isexpectedthatthereadersadopttheseprogramstoexperimentwiththeex- amples,andthenmodifyorwritetheirownprogramstowardadvancingtheir knowledgeforsolvingmorecomplexandchallengingproblems. The presentationformatof thisbookfocusesonsimplicity,readability,andde- pendability so that both undergraduate and graduate students as well as new re- searchers, developers, and practitioners in this field can easily trust and grasp the concepts,andlearn themeffectively.The goalofthe writingstyle is to reducethe mathematical complexity and help the vast majority of readers to understand the topicsandgetinterestedinthefield.Thisbookconsistsoffourparts,withatotalof 14chapters.PartImainlyfocusesonthetopicsthatareneededtohelpanalyzeand understandbigdata.PartIIcoversthetopicsthatcanexplainthesystemsrequired forprocessingbigdata.PartIIIpresentsthetopicsrequiredtounderstandandselect machine learning techniques to classify big data. Finally, Part IV concentrates on the topics that explain the scaling-up machine learning, an important solution for modernbigdataproblems. Greensboro,NC,USA ShanSuthaharan Acknowledgements The journey of writing this book would not have been possible without the sup- portofmanypeople,includingmycollaborators,colleagues,students,andfamily. Iwouldliketothankallofthemfortheirsupportandcontributionstowardthesuc- cessfuldevelopmentofthisbook.First,IwouldliketothankMr.MatthewAmboy (Editor,BusinessandEconomics:ORandMS,SpringerScience+BusinessMedia) forgivingmeanopportunitytowritethisbook.IwouldalsoliketothankbothMs. ChristineCrigler(AssistantEditor)andMr.Amboyforhelpingmethroughoutthe publicationprocess. I am grateful to Professors Ratnasingham Shivaji (Head of the Department of MathematicsandStatisticsattheUniversityofNorthCarolinaatGreensboro)and FadilSantosa(DirectoroftheInstituteforMathematicsanditsApplicationsatUni- versity of Minnesota) for the opportunitiesthat they gave me to attend a machine learning workshop at the institute. Professors Bin Yu (Department of Statistics, University of California, Berkeley) and David Madigan(Departmentof Statistics, Columbia University)deliveredan excellentshortcourse on appliedstatistics and machine learning at the institute, and the topics covered in this course motivated meandequippedmewithtechniquesandtoolstowritevarioustopicsinthisbook. My sincere thanks go to them. I would also like to thank Jinzhu Jia, Adams Blo- niaz,andAntonyJoseph,themembersofProfessorBinYu’sresearchgroupatthe DepartmentofStatistics, UniversityofCalifornia,Berkeley,fortheirvaluabledis- cussionsinmanymachinelearningtopics. MyappreciationgoesouttoUniversityofCalifornia,Berkeley,andUniversityof NorthCarolinaatGreensborofortheirfinancialsupportandtheresearchassignment awardin 2013to attendUniversityofCalifornia,Berkeleyasa Visitingscholar— this visit helped me better understand the deep learning techniques. I would also liketoshowmyappreciationtoMr.BrentLadd(DirectorofEducation,Centerfor the Science ofInformation,PurdueUniversity)andMr. RobertBrown(Managing Director, Center for the Science of Information,Purdue University) for their sup- porttodevelopacourseonbigdataanalyticsandmachinelearningatUniversityof North Carolina at Greensborothrougha sub-awardapprovedby the NationalSci- ence Foundation. I am also thankful to Professor Richard Smith, Director of the ix

Description:
has been created to address the big data problems encountered in the day-to-day One of the popular tools available in the market for this 100. Feature 1. Feature 54. Feature 59. 50. 00. 100. 200. 300. 0. 50. 100. 150. 200. 250. Fig. 3.9 A 3D scatter plot of hardwood (blue) and carpet (red) floors
See more

The list of books you might like

Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.