ebook img

Redescription Mining PDF

88 Pages·2017·3.023 MB·English
Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.

Preview Redescription Mining

SPRINGER BRIEFS IN COMPUTER SCIENCE Esther Galbrun Pauli Miettinen Redescription Mining 123 SpringerBriefs in Computer Science Serieseditors StanZdonik,BrownUniversity,Providence,RhodeIsland,USA ShashiShekhar,UniversityofMinnesota,Minneapolis,Minnesota,USA XindongWu,UniversityofVermont,Burlington,Vermont,USA LakhmiC.Jain,UniversityofSouthAustralia,Adelaide,SouthAustralia,Australia DavidPadua,UniversityofIllinoisUrbana-Champaign,Urbana,Illinois,USA Xuemin(Sherman)Shen,UniversityofWaterloo,Waterloo,Ontario,Canada BorkoFurht,FloridaAtlanticUniversity,BocaRaton,Florida,USA V.S.Subrahmanian,UniversityofMaryland,CollegePark,Maryland,USA MartialHebert,CarnegieMellonUniversity,Pittsburgh,Pennsylvania,USA KatsushiIkeuchi,UniversityofTokyo,Tokyo,Japan BrunoSiciliano,UniversitàdiNapoliFedericoII,Napoli,Italy SushilJajodia,GeorgeMasonUniversity,Fairfax,Virginia,USA NewtonLee,NewtonLeeLaboratories,LLC,Tujunga,California,USA Moreinformationaboutthisseriesathttp://www.springer.com/series/10028 Esther Galbrun (cid:129) Pauli Miettinen Redescription Mining 123 EstherGalbrun PauliMiettinen InriaNancy–GrandEst Max-Planck-InstituteforInformatics Villers-lès-Nancy,France Saarbrücken,Germany ISSN2191-5768 ISSN2191-5776 (electronic) SpringerBriefsinComputerScience ISBN978-3-319-72888-9 ISBN978-3-319-72889-6 (eBook) https://doi.org/10.1007/978-3-319-72889-6 LibraryofCongressControlNumber:2017963350 ©TheAuthor(s)2017 Thisworkissubjecttocopyright.AllrightsarereservedbythePublisher,whetherthewholeorpartof thematerialisconcerned,specificallytherightsoftranslation,reprinting,reuseofillustrations,recitation, broadcasting,reproductiononmicrofilmsorinanyotherphysicalway,andtransmissionorinformation storageandretrieval,electronicadaptation,computersoftware,orbysimilarordissimilarmethodology nowknownorhereafterdeveloped. Theuseofgeneraldescriptivenames,registerednames,trademarks,servicemarks,etc.inthispublication doesnotimply,evenintheabsenceofaspecificstatement,thatsuchnamesareexemptfromtherelevant protectivelawsandregulationsandthereforefreeforgeneraluse. Thepublisher,theauthorsandtheeditorsaresafetoassumethattheadviceandinformationinthisbook arebelievedtobetrueandaccurateatthedateofpublication.Neitherthepublishernortheauthorsor theeditorsgiveawarranty,expressorimplied,withrespecttothematerialcontainedhereinorforany errorsoromissionsthatmayhavebeenmade.Thepublisherremainsneutralwithregardtojurisdictional claimsinpublishedmapsandinstitutionalaffiliations. Printedonacid-freepaper ThisSpringerimprintispublishedbySpringerNature TheregisteredcompanyisSpringerInternationalPublishingAG Theregisteredcompanyaddressis:Gewerbestrasse11,6330Cham,Switzerland Preface ‘Whatisredescriptionmining?’isaquestionourcolleaguesandstudentsoftenask whenwementionitasatopicweareworkingon.Inshort,redescriptionminingis the art of saying ‘that is’. That is, redescription mining tries to describe the same phenomenonintwodifferentways.Weusuallyaugmentthisminimalisticdefinition withvariousdetails,leadingtofurtherquestionssuchas‘WhydoyouuseJaccard similarity?’,‘Canyouapplyyourmethodtomydata?’,or‘Whatifyouhavemore thantwodatasets?’ Wantingtoprovideanswersforallofthesequestions,wepreparedandpresented tutorials on redescription mining. But not everybody could attend those tutorials, and although the slides are available online, they are not entirely self-explanatory. Moreover,thereisathirdgroupofpersonswhoaskusthosedetailedquestionsand who are not satisfied if we say that we discussed the answers in our tutorial—the reviewersofourpapers. Thisshortbookisintendedasanintroductiontoredescriptionmining,accessible to both practitioners and researchers alike. It develops a uniform framework in whichthevariousformulationsofredescriptionminingcanbedefined,explainsthe mainalgorithmicapproachesusedtomineredescriptions,andpresentsapplications andvariantsofredescriptionmining,inadditiontofutureresearchdirections. We hope this book will help those new to the field of redescription mining become familiar with the topic and let them consider how to apply redescription miningintheirfieldsofinterest.Thosealreadyfamiliarwithredescriptionmining willhopefullyfindthebookusefulasashortreferencework,andperhapsasaguide for further research directions. Reviewers for papers on redescription mining will hopefullyfindanswerstosomeoftheirfavouritequestionsinthisbook. We are grateful to Krista Ames for providing language feedback for this book. Anymistakesthatremainare—naturally—ourown. Saarbrücken,Germany EstherGalbrun October2017 PauliMiettinen v Contents 1 WhatIsRedescriptionMining .............................................. 1 1.1 FirstExamplesofRedescriptions ....................................... 1 1.2 FormalDefinitions....................................................... 5 1.2.1 TheData.......................................................... 5 1.2.2 TheDescriptions................................................. 6 1.2.3 TheRedescriptions .............................................. 8 1.2.4 OtherConstraints ................................................ 11 1.2.5 DistanceFunctions:WhyJaccard? ............................. 13 1.2.6 SetsofRedescriptions........................................... 16 1.3 RelatedDataMiningProblems.......................................... 18 1.4 AShortHistory .......................................................... 20 References...................................................................... 21 2 AlgorithmsforRedescriptionMining ...................................... 25 2.1 FindingQueriesUsingItemsetMining ................................. 26 2.1.1 TheMIDAlgorithm.............................................. 28 2.1.2 MiningRedescriptionswiththeCHARM-LAlgorithm ........ 29 2.2 QueriesBasedonDecisionTreesandForests.......................... 30 2.2.1 TheCARTwheelsAlgorithm.................................. 32 2.2.2 TheSplitT andLayeredT Algorithms.................... 35 2.2.3 TheCLUS-RM Algorithm ...................................... 38 2.3 GrowingtheQueriesGreedily........................................... 40 2.3.1 TheReReMiAlgorithm......................................... 40 2.4 AComparativeDiscussion .............................................. 44 2.5 HandlingMissingValues................................................ 46 References...................................................................... 48 3 Applications,Variants,andExtensionsofRedescriptionMining ....... 51 3.1 ApplicationsofRedescriptionMining.................................. 51 3.1.1 InBiology........................................................ 52 3.1.2 InEcology........................................................ 55 vii viii Contents 3.1.3 InSocialandPoliticalSciencesandinEconomics ............ 56 3.1.4 InEngineering ................................................... 59 3.2 RelationalRedescriptionMining........................................ 61 3.2.1 AnExampleofRelationalRedescriptions...................... 61 3.2.2 FormalDefinition................................................ 63 3.3 Storytelling............................................................... 66 3.3.1 DefinitionandAlgorithms....................................... 67 3.3.2 Applications...................................................... 69 3.4 FutureWork:RicherQueryLanguages ................................ 73 3.4.1 Time-SeriesRedescriptions ..................................... 73 3.4.2 SubgraphRedescriptions........................................ 75 3.4.3 Multi-QueryandMultimodalRedescriptions .................. 76 References...................................................................... 79 List of Figures Fig.1.1 Visualization:mapofabioclimaticniche.............................. 2 Fig.1.2 Visualization:parallelcoordinatesplotofPremier Leaguematches ......................................................... 3 Fig.1.3 Exampleofmammalsandclimateastwo-tabledata.................. 6 Fig.1.4 Venndiagramofthesupportsubsets................................... 11 Fig.2.1 Classificationofredescriptionminingalgorithms..................... 26 Fig.2.2 Simpleclassificationtree ............................................... 30 Fig.2.3 Visualization:treediagramofaredescription......................... 32 Fig.2.4 DepictionoftheCARTwheels,SplitT,andLayeredT algorithmsasasequenceofsteps ...................................... 37 Fig.2.5 Exampleofrepartitionoftheentitiesforanumericalattribute....... 43 Fig.3.1 Visualization:2Dembeddingofthepoliticaldata..................... 58 Fig.3.2 Exampleofasequentialcircuit......................................... 60 Fig.3.3 GenealogicgraphfromtheAlyawarradataset........................ 62 Fig.3.4 KinshipgraphfromtheAlyawarradataset............................ 62 Fig.3.5 Exampleofgraphqueries............................................... 63 Fig.3.6 Exampleofpredicatesintherelationalredescription miningvariant........................................................... 64 Fig.3.7 Storytellingexamplewithtreediagrams............................... 69 Fig.3.8 Exampleofdataforuncoveringtheplot............................... 72 Fig.3.9 Exampleofanuncoveredplot.......................................... 73 ix List of Symbols E Setofallentitiesinthedata A Setofattributesoftheentities V Setofviewstheattributescanbedividedinto D Entities-by-attributestablecorrespondingtooneview D Thedata P SetofallpredicatesoverE (cid:2)A L Setofallliterals,i.e.allpredicatesandtheirnegations Q Querylanguage,i.e.asetofBooleanfunctionsoverL supp.q/ Support(set)ofaqueryq att.q/ Attributesofaqueryq views.q/ Viewsofaqueryq d.p;q/ Distancebetweenthesupportsofqueriespandq p(cid:3)q Setsupp.p/issimilartosetsupp.q/ p(cid:4)q Setssupp.p/andsupp.q/arethesame J.p;q/ Jaccardsimilaritybetweensupp.p/andsupp.q/ xi

See more

The list of books you might like

book image

The Mountain Is You

Brianna Wiest
·2020
·0.34 MB

book image

The 48 Laws of Power

Robert Greene
·1998
·2.84 MB

book image

Haunting Adeline

H. D. Carlton
·2021
·3.65 MB

book image

Believe Me

Tahereh Mafi
·177 Pages
·2021
·2.19 MB

book image

Dynamics of Globalization and Development

Satya Dev Gupta, Nanda K. Choudhry (auth.), Satya Dev Gupta, Nanda K. Choudhry (eds.)
·348 Pages
·1997
·14.434 MB

book image

Greek Government Gazette: Part 7, 2006 no. 892

The Government of the Hellenic Republic
·2006
·0.16 MB

book image

Latino Politics in Massachusetts: Struggles, Strategies and Prospects

Carol Hardy-Fanta; Jeffrey Gerson
·2014
·3.242 MB

book image

Recovering Lost Landscapes

Vujadin Ivanišević, Tatjana Veljanovski, David Cowley, Grzegorz Kiarszys & Ivan Bugarski
·176 Pages
·2015
·12.706 MB

book image

Sneakers: Fashion, Gender, and Subculture

Yuniya Kawamura
·2016
·7.894 MB

book image

Werwolves by Elliott O'Donnell

O'Donnell, Elliott, 1872-1965
·2008
·15.11 MB

book image

indx

8 Pages
·2021
·0.02 MB

book image

IIA6

2 Pages
·2021
·0.08 MB

book image

More than good ideas

60 Pages
·2005
·3.56 MB