Journeys to Data Mining
Experiences from 15 Renowned Researchers

Editor
Mohamed Medhat Gaber
School of Computing
University of Portsmouth
Portsmouth
United Kingdom

ISBN 978-3-642-28046-7
ISBN 978-3-642-28047-4 (eBook)
DOI 10.1007/978-3-642-28047-4
Springer Heidelberg New York Dordrecht London
Library of Congress Control Number: 2012942594
ACM Computing Classification (1998): H.3, I.2, I.7, G.3, K.7

© Springer-Verlag Berlin Heidelberg 2012

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. Exempted from this legal reservation are brief excerpts in connection with reviews or scholarly analysis or material supplied specifically for the purpose of being entered and executed on a computer system, for exclusive use by the purchaser of the work. Duplication of this publication or parts thereof is permitted only under the provisions of the Copyright Law of the Publisher’s location, in its current version, and permission for use must always be obtained from Springer. Permissions for use may be obtained through RightsLink at the Copyright Clearance Center. Violations are liable to prosecution under the respective Copyright Law.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

While the advice and information in this book are believed to be true and accurate at the date of publication, neither the authors nor the editors nor the publisher can accept any legal responsibility for any errors or omissions that may be made. The publisher makes no warranty, express or implied, with respect to the material contained herein.
Printed on acid-free paper

Springer is part of Springer Science+Business Media (www.springer.com)

Contents

Introduction
  Mohamed Medhat Gaber

Data Mining: A Lifetime Passion
  Dean Abbott

From Combinatorial Optimization to Data Mining
  Charu C. Aggarwal

From Patterns to Discoveries
  Michael R. Berthold

Discovering Privacy
  Chris Clifton

Driving Full Speed, Eyes on the Rear-View Mirror
  John F. Elder IV

Voyages of Discovery
  David J. Hand

A Field by Any Other Name
  Cheryl G. Howard

An Unusual Journey to Exciting Data Mining Applications
  J. Dustin Hux

Making Data Analysis Ubiquitous: My Journey Through Academia and Industry
  Hillol Kargupta

Operational Security Analytics: My Path of Discovery
  Colleen McLaughlin McCue

An Enduring Interest in Classification: Supervised and Unsupervised
  G.J. McLachlan

The Journey of Knowledge Discovery
  Gregory Piatetsky-Shapiro

Data Mining: From Medical Decision Support to Hospital Management
  Shusaku Tsumoto

Rattle and Other Data Mining Tales
  Graham J. Williams

A Journey in Pattern Mining
  Mohammed J. Zaki

List of Contributors

Dean W. Abbott
  Abbott Analytics, Inc., San Diego, CA, USA, [email protected]

Charu C. Aggarwal
  IBM T.J. Watson Research Center, Hawthorne, NY, USA, [email protected]

Michael R. Berthold
  Department of Computer and Information Science, University of Konstanz, Konstanz, Germany, [email protected]

Christopher W. Clifton
  Department of Computer Sciences, Purdue University, West Lafayette, IN, USA, [email protected]

John F. Elder IV
  Elder Research, Inc., Charlottesville, VA, USA, [email protected]

David J. Hand
  Department of Mathematics, Imperial College, London, UK, [email protected]

Cheryl G. Howard
  IBM Corporation, Washington, DC, USA, [email protected]

J. Dustin Hux
  VP Analytics, Elder Research, Inc., Charlottesville, VA, USA, [email protected]

Hillol Kargupta
  Computer Science and Electrical Engineering Department, University of Maryland, Baltimore County, MD, USA; Agnik, LLC, Columbia, MD, USA, [email protected]; [email protected]

Colleen McLaughlin McCue
  GeoEye, Herndon, VA, USA, [email protected]

Geoff McLachlan
  Department of Mathematics, University of Queensland, St. Lucia, Brisbane, QLD, Australia, [email protected]

Gregory Piatetsky-Shapiro
  KDnuggets, Brookline, MA, USA, [email protected]

Shusaku Tsumoto
  Department of Medical Informatics, Faculty of Medicine, Shimane University, Shimane, Japan, [email protected]

Graham J. Williams
  Togaware Pty Ltd., Canberra, ACT, Australia, [email protected]

Mohammed J. Zaki
  Rensselaer Polytechnic Institute, Troy, NY, USA, [email protected]

Introduction

Mohamed Medhat Gaber

M.M. Gaber
School of Computing, University of Portsmouth, Buckingham Building, BK1.41, Lion Terrace, Portsmouth, Hampshire PO1 3HE, UK
e-mail: [email protected]

M.M. Gaber (ed.), Journeys to Data Mining, DOI 10.1007/978-3-642-28047-4_1, © Springer-Verlag Berlin Heidelberg 2012

“If I have seen further it is only by standing on the shoulders of giants”
by Sir Isaac Newton (1643–1727)

1 Preamble

It has been a great honour to have been given the opportunity to edit this book and a great pleasure to work with such a respected group of data mining scientists and professionals. It is our belief that the knowledge provided by studying the journeys these respected and recognised individuals took through the area of data mining is as important as simply gaining the required knowledge in the field. The contributors to this volume are successful scientists and professionals within the field of data analytics. All the authors in this volume have helped to shape the field of data analytics through their many valuable contributions.

It all began with a workshop co-organised by one of the contributors to this book, namely, Dr. Gregory Piatetsky-Shapiro in conjunction with the International Joint Conference on Artificial Intelligence (IJCAI) in 1989. Today, the number of publication venues and dedicated data analytics companies reflects the fast growing interest in the data mining field.

My own journey while editing this book has been quite remarkable. Invitations were sent to a number of renowned researchers and practitioners in the field. The feedback received from the invitees was very positive. However, other commitments made it difficult for some of these great researchers to contribute. Despite not being able to contribute, many of them were very supportive of the