Wil van der Aalst
Process Mining
Data Science in Action
Second Edition

Wil van der Aalst
Department of Mathematics and Computer Science
Eindhoven University of Technology
Eindhoven, The Netherlands
url: http://www.vdaalst.com

ISBN 978-3-662-49850-7
ISBN 978-3-662-49851-4 (eBook)
DOI 10.1007/978-3-662-49851-4
Library of Congress Control Number: 2016938641
Springer Heidelberg New York Dordrecht London
© Springer-Verlag Berlin Heidelberg 2011, 2016

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made.

Printed on acid-free paper
Springer is part of Springer Science+Business Media (www.springer.com)

Thanks to Karin for understanding that science is more rewarding than running errands

Thanks to all people that contributed to ProM; the fruits of their efforts demonstrate that sharing a common goal is more meaningful than “cashing in the next publon”¹

In remembrance of Gerry Straatman-Beelen (1932–2010)

¹ Publon = smallest publishable unit.

Preface

The interest in data science is rapidly growing. Many consider data science as the profession of the future.
Just like computer science emerged as a discipline in the 1970s, we now witness the rapid creation of research centers and bachelor/master programs in data science. The hype related to Big Data and predictive analytics illustrates this. Data (“Big” or “small”) are essential for people and organizations and their importance will only increase. However, it is not sufficient to focus on data storage and data analysis. A data scientist also needs to relate data to operational processes and be able to ask the right questions. This requires an understanding of end-to-end processes. Process mining bridges the gap between traditional model-based process analysis (e.g., simulation and other business process management techniques) and data-centric analysis techniques such as machine learning and data mining. Process mining provides a new means to improve processes in a variety of application domains. The omnipresence of event data combined with process mining allows organizations to diagnose problems based on facts rather than fiction.

Although traditional Business Process Management (BPM) and Business Intelligence (BI) technologies received lots of attention, they did not live up to the expectations raised by academics, consultants, and software vendors. Probably, the same will happen to most of the Big Data technologies vigorously promoted today. The goal should be to improve the operational processes themselves rather than the artifacts (models, data, and systems) they use. As will be demonstrated in this book, there are novel ways to put “data science in action” and improve processes based on the data they generate.

Process mining is an emerging discipline providing comprehensive sets of tools to provide fact-based insights and to support process improvements. This new discipline builds on process model-driven approaches and data mining. However, process mining is much more than an amalgamation of existing approaches.
For example, existing data mining techniques are too data-centric to provide a comprehensive understanding of the end-to-end processes in an organization. BI tools focus on simple dashboards and reporting rather than clear-cut business process insights. BPM suites heavily rely on experts modeling idealized to-be processes and do not help the stakeholders to understand the as-is processes.

This book presents a range of process mining techniques that help organizations to uncover their actual business processes. Process mining is not limited to process discovery. By tightly coupling event data and process models, it is possible to check conformance, detect deviations, predict delays, support decision making, and recommend process redesigns. Process mining breathes life into otherwise static process models and puts today’s massive data volumes in a process context. Hence, management trends related to process improvement (e.g., Six Sigma, TQM, CPI, and CPM) and compliance (SOX, Basel II, etc.) can benefit from process mining.

Process mining, as described in this book, emerged in the last decade [156, 160]. However, the roots date back about half a century. For example, Anil Nerode presented an approach to synthesize finite-state machines from example traces in 1958 [108], Carl Adam Petri introduced the first modeling language adequately capturing concurrency in 1962 [111], and Mark Gold was the first to systematically explore different notions of learnability in 1967 [61]. When data mining started to flourish in the 1990s, little attention was given to processes. Moreover, only recently have event logs become omnipresent, thus enabling end-to-end process discovery. Since the first survey on process mining in 2003 [156], progress has been spectacular. Process mining techniques have matured and are supported by various tools. Moreover, whereas initially the primary focus was on process discovery, the process mining spectrum has broadened markedly.
For instance, conformance checking, multi-perspective process mining, and operational support have become integral parts of ProM, one of the leading process mining tools.

The book provides a comprehensive overview of the state of the art in process mining. It is intended as an introduction to the topic for practitioners, students, and academics. On the one hand, the book is accessible for people that are new to the topic. On the other hand, the book does not avoid explaining important concepts in a rigorous manner. The book aims to be self-contained while covering the entire process mining spectrum from process discovery to operational support. Therefore, it also serves as a reference handbook for people dealing with BPM or BI on a day-to-day basis.

The first edition of this book appeared in 2011 under the title “Process Mining: Discovery, Conformance and Enhancement of Business Processes” [140]. Given the rapid developments in process mining, there was a clear need for an updated version. The original book has been extended in several ways. First of all, process mining has been put into the broader context of data science (see the new Chap. 1). This explains the new subtitle “Data Science in Action”. There is an urgent need for data scientists able to help organizations improve their operational processes. Therefore, the new edition of the book positions process mining in this broader context and relates it to statistics, data mining, Big Data, etc. Second, there has been significant progress in process discovery in recent years. This is exemplified by the family of inductive mining techniques that can handle large incomplete event logs with infrequent behavior, but still provide formal guarantees. The basic elements of inductive mining (Sect. 7.5) and the notion of process trees (Sect. 3.2.8) have been added to this book. Third, the notion of alignments has become a key concept to relate observed behavior and modeled behavior. The chapter on conformance checking has been extended to carefully introduce alignments (Sect. 8.3). Moreover, next to fitness, quality dimensions like precision are now also defined.
Fourth, a chapter on “process mining in the large” (Chap. 12) has been added to illustrate that process mining can exploit modern infrastructures and that process discovery and conformance checking can be decomposed and distributed. Since the first edition of the book, many new process mining products have emerged (often inspired by the open-source platform ProM and the previous edition of this book). The chapter on tools (Chap. 11) has been completely rewritten and discusses commercial tools like Celonis Process Mining, Disco, Enterprise Discovery Suite, Interstage Business Process Manager Analytics, Minit, myInvenio, Perceptive Process Mining, QPR ProcessAnalyzer, Rialto Process, SNP Business Process Analysis, and webMethods Process Performance Manager (next to open-source initiatives like ProM and RapidProM). Finally, pointers to recent literature and a new section on data quality (Sect. 5.4) have been added. These changes justify a revised edition of the book.

The reader can immediately put process mining into practice due to the applicability of the techniques, the availability of (open-source) process mining software, and the abundance of event data in today’s information systems. I sincerely hope that you enjoy reading this book and start using some of the amazing process mining techniques available today.

Eindhoven, The Netherlands
Wil van der Aalst
January 2016

Acknowledgements

Many individuals and organizations contributed to the techniques and tools described in this book. Therefore, it is a pleasure to acknowledge their support, efforts, and contributions.

All of this started in 1999 with a research project named “Process Design by Discovery: Harvesting Workflow Knowledge from Ad-hoc Executions” initiated by Ton Weijters and myself. At that time, I was still working as a visiting professor at the University of Colorado in Boulder. However, the research school BETA had encouraged me to start collaborating with existing staff in my new research group at TU/e (Eindhoven University of Technology).
After talking to Ton it was clear that we could benefit from combining his knowledge of machine learning with my knowledge of workflow management and Petri nets. Process mining (at that time we called it workflow mining) was the obvious topic for which we could combine our expertise. This was the start of a very successful collaboration. Thanks Ton!

Since the turn of the century, many PhD students have been working on the topic: Arya Adriansyah, Ana Karla Alves de Medeiros, Alfredo Bolt Iriondo, R.P. Jagadeesh Chandra (JC) Bose, Carmen Bratosin, Joos Buijs, Alok Dixit, Boudewijn van Dongen, Maikel van Eck, Robert Engel, Eduardo González Lopéz de Murillas, Christian Günther, Bart Hompes, Anna Kalenkova, Marie Koorneef, Maikel Leemans, Sander Leemans, Guangming Li, Cong Liu, Xixi Lu, Felix Mannhardt, Ronny Mans, Laura Maruster, Alexey Mitsyuk, Richard Müller, Jorge Munoz-Gama, Joyce Nakatumba, Maja Pesic, Elham Ramezani, Anne Rozinat, Alifah Syamsiyah, Helen Schonenberg, Dennis Schunselaar, Minseok Song, Niek Tax, and Bas van Zelst. I am extremely grateful for their efforts.

Ana Karla Alves de Medeiros was the first PhD student to work on the topic under my supervision (genetic process mining). She did a wonderful job; her thesis on genetic process mining was awarded the prestigious ASML 2007 Promotion Prize and was selected as the best thesis by the KNAW research school BETA. Boudewijn van Dongen has also been involved in the development of ProM right from the start. As a Master student he already developed the process mining tool EMiT, i.e., the predecessor of ProM. He turned out to be a brilliant PhD student and developed a variety of process mining techniques. Eric Verbeek did a PhD on workflow verification, but over time he got more and more involved in process mining research and the development of ProM. Many people underestimate the importance of a scientific programmer like Eric. Tool development and continuity are essential for scientific progress! Boudewijn and Eric have been the driving force behind ProM and their contributions have been crucial for process mining research at TU/e.
Moreover, they are always willing to help others. Thanks guys!

Christian Günther and Anne Rozinat joined the team in 2005. Their contributions have been of crucial importance for extending the scope of process mining and lifting the ambition level. Christian managed to make ProM look beautiful while significantly improving its performance. Moreover, his Fuzzy miner facilitated dealing with Spaghetti processes. Anne managed to widen the process mining spectrum by adding conformance checking and multi-perspective process mining to ProM. It is great that they succeeded in founding a process mining company (Fluxicon). Anne and Christian are great process mining ambassadors and build software that people can and also want to use. Another person crucial for the development of ProM is Peter van den Brand. He set up the initial framework and played an important role in the development of the architecture of ProM 6. Based on his experience with ProM, he set up a process mining company (Futura Process Intelligence) that joined forces with Pallas Athena which, in turn, was taken over by Lexmark’s Perceptive Software. It is great to work with people like Peter, Christian, and Anne; they are essential for turning research results into commercial products (although I am still waiting for the sports cars they promised...).

Next to Boudewijn, Eric, and the PhDs mentioned, the current “process mining team” at TU/e consists of Joos Buijs, Dirk Fahland, Massimiliano de Leoni, Hajo Reijers, Natalia Sidorova, Patrick Mukala, Nour Assy, Farideh Heidari, and, of course, Ine van der Ligt, our secretary.

Academics from various universities contributed to ProM and supported our process mining research.
We are grateful to the Technical University of Lisbon, Katholieke Universiteit Leuven, Universitat Politècnica de Catalunya, Universität Paderborn, University of Rostock, Humboldt-Universität zu Berlin, University of Calabria, Queensland University of Technology, Tsinghua University, Universität Innsbruck, Ulsan National Institute of Science and Technology, Università di Bologna, Zhejiang University, Vienna University of Technology, Universität Ulm, Open University, Jilin University, National Research University Higher School of Economics, Free University of Bozen-Bolzano, University of Tartu, Pontificia Universidad Católica de Chile, University of Vienna, Pontificia Universidade Católica do Paraná, Technion, VU University Amsterdam, Hasso-Plattner-Institut, University of Freiburg, Vienna University of Economics and Business, University of Haifa, University of Naples Federico II, University of Padua, and University of Nancy for their help. I would also like to thank the members of the IEEE Task Force on Process Mining for promoting the topic. We are grateful to all other organizations that supported process mining research at TU/e: NWO, STW, EU, IOP, LOIS, BETA, SIKS, Stichting EIT Informatica Onderwijs, Pallas Athena, IBM, LaQuSo, Philips Healthcare, Philips Research, Vanderlande, BrandLoyalty, ESI, Jacquard, Nuffic, BPM Usergroup, and WWTF. Special thanks go to Pallas Athena and Fluxicon