Lecture Notes in Computer Science 6600
Commenced Publication in 1973
Founding and Former Series Editors: Gerhard Goos, Juris Hartmanis, and Jan van Leeuwen

Editorial Board
David Hutchison, Lancaster University, UK
Takeo Kanade, Carnegie Mellon University, Pittsburgh, PA, USA
Josef Kittler, University of Surrey, Guildford, UK
Jon M. Kleinberg, Cornell University, Ithaca, NY, USA
Friedemann Mattern, ETH Zurich, Switzerland
John C. Mitchell, Stanford University, CA, USA
Moni Naor, Weizmann Institute of Science, Rehovot, Israel
Oscar Nierstrasz, University of Bern, Switzerland
C. Pandu Rangan, Indian Institute of Technology, Madras, India
Bernhard Steffen, TU Dortmund University, Germany
Madhu Sudan, Microsoft Research, Cambridge, MA, USA
Demetri Terzopoulos, University of California, Los Angeles, CA, USA
Doug Tygar, University of California, Berkeley, CA, USA
Moshe Y. Vardi, Rice University, Houston, TX, USA
Gerhard Weikum, Max Planck Institute for Informatics, Saarbruecken, Germany

James F. Peters, Andrzej Skowron, Hiroshi Sakai, Mihir Kumar Chakraborty, Dominik Ślęzak, Aboul Ella Hassanien, William Zhu (Eds.)
Transactions on Rough Sets XIV

Editors-in-Chief
James F. Peters, University of Manitoba, Winnipeg, MB, Canada
Andrzej Skowron, University of Warsaw, Poland

Guest Editors
Hiroshi Sakai, Kyushu Institute of Technology, Tobata, Kitakyushu, Japan
Mihir Kumar Chakraborty, Jadavpur University and Indian Statistical Institute, Calcutta, India
Dominik Ślęzak, University of Warsaw, Poland
Aboul Ella Hassanien, Cairo University, Orman, Giza, Egypt
William Zhu, Zhangzhou Normal University, Zhangzhou, Fujian, China

ISSN 0302-9743 (LNCS), e-ISSN 1611-3349 (LNCS)
ISSN 0302-9743 (TRS), e-ISSN 1611-3349 (TRS)
ISBN 978-3-642-21562-9, e-ISBN 978-3-642-21563-6
DOI 10.1007/978-3-642-21563-6
Springer Heidelberg Dordrecht London New York
Library of Congress Control Number: 2011929861
CR Subject Classification (1998): F.4.1, F.1.1, H.2.8, I.5, I.4, I.2

© Springer-Verlag Berlin Heidelberg 2011
This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable to prosecution under the German Copyright Law.
The use of general descriptive names, registered names, trademarks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.
Typesetting: Camera-ready by author, data conversion by Scientific Publishing Services, Chennai, India
Printed on acid-free paper
Springer is part of Springer Science+Business Media (www.springer.com)

Preface

Volume XIV of the Transactions on Rough Sets includes extensions of papers presented at the 12th International Conference on Rough Sets, Fuzzy Sets, Data Mining and Granular Computing (RSFDGrC 2009) held in the Indian Institute of Technology, Delhi, India, and published in the Lecture Notes in Artificial Intelligence volume 5908. The authors of 14 papers were invited to prepare extended manuscripts. Each submission was peer-reviewed by three reviewers. After two rounds of reviews, 10 articles were accepted for publication.

The contents of this special issue refer to both theory and practice. The topics include various rough set generalizations in combination with formal concept analysis, lattice theory, fuzzy sets and belief functions, rough and fuzzy clustering techniques, as well as applications to gene selection, Web page recommendation systems, facial recognition, and temporal pattern detection. In addition, this publication contains a regular article on rough multiset and its multiset topology.

The editors of this special issue would like to express gratitude to the authors of all submitted papers. Special thanks are due to the following reviewers: Hidenao Abe, Suruchi Chawla, Zied Elouedi, Homa Fashandi, Daryl Hepting, Andrzej Janusz, Na Jiao, Michiro Kondo, Pawan Lingras, Duoqian Miao, E.K.R. Nagarajan, Bidyut Kr. Patra, Sheela Ramanna, and Pulak Samanta. We would also like to acknowledge Sheela Ramanna for technical management of the TRS online system. The editors and authors of this volume extend their gratitude to Alfred Hofmann, Anna Kramer, Ursula Barth, Christine Reiss, and the LNCS staff at Springer for their support in making this volume of the TRS possible.
The Editors-in-Chief were supported by the Ministry of Science and Higher Education of the Republic of Poland research grants N N516 368334, N N516 077837, and the Natural Sciences and Engineering Research Council of Canada (NSERC) research grant 185986, Canadian Network of Excellence (NCE), and a Canadian Arthritis Network (CAN) grant SRI-BIO-05.

February 2011
Hiroshi Sakai
Mihir K. Chakraborty
Dominik Ślęzak
Aboul E. Hassanien
William Zhu
James F. Peters
Andrzej Skowron

LNCS Transactions on Rough Sets

The Transactions on Rough Sets series has as its principal aim the fostering of professional exchanges between scientists and practitioners who are interested in the foundations and applications of rough sets. Topics include foundations and applications of rough sets as well as foundations and applications of hybrid methods combining rough sets with other approaches important for the development of intelligent systems.

The journal includes high-quality research articles accepted for publication on the basis of thorough peer reviews. Dissertations and monographs up to 250 pages that include new research results can also be considered as regular papers. Extended and revised versions of selected papers from conferences can also be included in regular or special issues of the journal.

Editors-in-Chief: James F. Peters, Andrzej Skowron
Managing Editor: Sheela Ramanna
Technical Editor: Marcin Szczuka

Editorial Board
Mohua Banerjee, Jan Bazan, Gianpiero Cattaneo, Mihir K. Chakraborty, Davide Ciucci, Chris Cornelis, Ivo Düntsch, Anna Gomolińska, Salvatore Greco, Jerzy W. Grzymała-Busse, Masahiro Inuiguchi, Jouni Järvinen, Richard Jensen, Bożena Kostek, Churn-Jung Liau, Pawan Lingras, Victor Marek, Mikhail Moshkov, Hung Son Nguyen, Ewa Orłowska, Sankar K. Pal, Lech Polkowski, Henri Prade, Sheela Ramanna, Roman Słowiński, Jerzy Stefanowski, Jarosław Stepaniuk, Zbigniew Suraj, Marcin Szczuka, Dominik Ślęzak, Roman Świniarski, Shusaku Tsumoto, Guoyin Wang, Marcin Wolski, Wei-Zhi Wu, Yiyu Yao, Ning Zhong, Wojciech Ziarko

Table of Contents

Evaluating a Temporal Pattern Detection Method for Finding Research Keys in Bibliographical Data
Hidenao Abe and Shusaku Tsumoto

High Scent Web Page Recommendations Using Fuzzy Rough Set Attribute Reduction
Punam Bedi and Suruchi Chawla

A Formal Concept Analysis Approach to Rough Data Tables
Bernhard Ganter and Christian Meschke

Rough Multiset and Its Multiset Topology
K.P. Girish and Sunil Jacob John

A Rough Set Exploration of Facial Similarity Judgements
Daryl H. Hepting, Richard Spring, and Dominik Ślęzak

Evolutionary Tolerance-Based Gene Selection in Gene Expression Data
Na Jiao

New Approach in Defining Rough Approximations
E.K.R. Nagarajan and D. Umadevi

Tolerance Rough Set Theory Based Data Summarization for Clustering Large Datasets
Bidyut Kr. Patra and Sukumar Nandi

Projected Gustafson-Kessel Clustering Algorithm and Its Convergence
Charu Puri and Naveen Kumar

Generalized Rough Sets and Implication Lattices
Pulak Samanta and Mihir Kumar Chakraborty

Classification with Dynamic Reducts and Belief Functions
Salsabil Trabelsi, Zied Elouedi, and Pawan Lingras

Author Index
Evaluating a Temporal Pattern Detection Method for Finding Research Keys in Bibliographical Data

Hidenao Abe and Shusaku Tsumoto
Department of Medical Informatics, Shimane University, School of Medicine, 89-1 Enya-cho, Izumo, Shimane 693-8501, Japan

Abstract. With the accumulation of electronically stored documents, acquiring valuable knowledge about remarkable trends in technical terms has drawn attention as a topic in text mining. To support the discovery of key topics that appear as key terms in such temporal textual datasets, we propose a method based on temporal patterns in several data-driven indices for text mining. The method consists of an automatic term extraction method for given documents, three importance indices, temporal pattern extraction using temporal clustering, and trend detection based on the linear trends of the cluster centroids. In an empirical study, the three importance indices are applied to the titles of two academic conferences in the field of artificial intelligence, which serve as the sets of documents. After extracting the temporal patterns of the automatically extracted terms, we discuss the trends of the terms, including recent burst words, among the titles of the conferences.

Keywords: Text Mining, Trend Detection, TF-IDF, Jaccard's Matching Coefficient, Temporal Clustering, Linear Regression.

1 Introduction

In recent years, the development of information systems in fields such as business, academia, and medicine has made it possible to store huge amounts of data of various types. Document data are no exception: their accumulation is advancing in many fields. Document data provide valuable findings not only to domain experts but also to novice users. For such electronic documents, it is becoming more important to sense emergent targets of interest to researchers, decision makers, and marketers by using temporal text mining techniques [1]. To realize such detection, emergent term detection (ETD) methods have been developed [2,3].
However, because earlier methods relied on word frequency, detection was difficult unless the target word itself appeared. Moreover, emergent or new concepts usually appear as new combinations of multiple words, coinages created by an author, or different spellings of existing words. Most conventional methods did not consider these characteristics of terms separately from the importance indices. This causes difficulties in text mining applications, such as limited extensibility along the time direction, time-consuming post-processing, and limited generality. For this reason, conventional ETD methods have been developed to detect a particular state of the words and/or phrases¹ that appear, without considering any effect of, or similarity between, the keywords.

J.F. Peters et al. (Eds.): Transactions on Rough Sets XIV, LNCS 6600, pp. 1–17, 2011. © Springer-Verlag Berlin Heidelberg 2011

After considering these problems, we focus on the temporal behaviors of the importance indices of terms and on their temporal patterns. We pay attention to the temporal behaviors of the importance indices of extracted phrases so that a specialist can recognize emergent terms and/or fields. To detect various temporal patterns in the behaviors of terms in sets of documents, we have proposed a method that identifies remarkable terms through continuous changes in multiple metrics of the terms [4]. Furthermore, we improved this method by clustering the temporal behaviors of the index values of each term so that similar terms are extracted at the same time.

In this paper, we describe in Section 3 an integrated method for detecting trends of technical terms by combining automatic term extraction methods, importance indices of the terms, and temporal clustering. After implementing this framework with the three importance indices, we performed a case study, reported in Section 4, to extract remarkable temporal patterns of technical terms from the titles of AAAI and IJCAI.
Based on these results, we evaluate the clusters with respect to known emergent terms whose appearance frequencies have burst in recent years. Finally, in Section 6, we summarize this paper.

2 Related Work

Text mining is a generic framework for finding valuable information in textual data by using statistical methods. Finding keywords in a set of documents is one of the useful approaches in text mining. The process of applying clustering methods to a corpus in order to find keywords together with similar terms is called term clustering [5]. For this approach, rough set theory can also be applied as vocabulary mining [6]. However, such conventional term clustering methods have not treated temporal information.

There are some conventional studies on detecting emergent trends of terms, topics, and themes in textual data. As a first step, Lent et al. [2] proposed a method for finding temporal trends of words. Then, by applying various metrics such as frequency [7], n-grams [8], and tf-idf [9], researchers developed several emergent term detection (ETD) methods [3]. The methods developed in [10,11], which are among the advanced ETD methods, find emergent theme patterns on the basis of a finite state machine by using a Hidden Markov Model (HMM). Topic modeling [12] is a related method from the viewpoint of temporal text analysis. In these methods, researchers consider the changes in each particular index of the terms, and they treat the emergent trend of a term as a discrete status based on the appearance of the words.

¹ In this article, we call the important words and phrases in a corpus 'keywords'. In addition, words and phrases in the corpus, including the keywords, are called 'terms'.
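As an illustration of one of the metrics named above, the following minimal sketch computes a textbook tf-idf score for each term in each document. It assumes the common definition (raw term count multiplied by the logarithm of the inverse document frequency); the function name `tf_idf` and the toy documents are hypothetical, and the variant used in [9] or by the authors may differ.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: score} dict per document.

    Assumption: textbook tf-idf, i.e. raw term count times
    log(N / document frequency). This is an illustrative choice,
    not necessarily the exact index used in the paper.
    """
    n_docs = len(documents)

    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in documents:
        term_freq = Counter(tokens)
        scores.append({
            term: count * math.log(n_docs / doc_freq[term])
            for term, count in term_freq.items()
        })
    return scores


if __name__ == "__main__":
    # Hypothetical toy corpus: each document is a list of tokens.
    docs = [["rough", "set", "rough"], ["fuzzy", "set"]]
    for doc_scores in tf_idf(docs):
        print(doc_scores)
```

A term that occurs in every document receives a score of zero under this definition, which is why such an index favors terms concentrated in a few documents.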
In conventional studies on the detection of emergent words and/or phrases in documents such as Web pages and particular electronic message boards, researchers did not explicitly treat the trends of the calculated indices of words and/or phrases. In contrast, based on two different techniques, we consider a framework for detecting temporal trends of phrases that consist of two to nine words. We have focused on short phrases because a considerably long phrase may be a pattern including grammatical structure and anonymous words, as shown in [11].

As for examining the linear trends of the importance indices in temporal sets of documents, we detected two kinds of trends in our previous studies [4,13]. In those studies, we were able to find both emergent and subsiding technical phrases based on the trends of technical phrases. We used the degree obtained by linear regression and the y-axis intercept to rank the two kinds of trends; however, some emergent phrases behave differently on the values of the individual importance indices than on the indices composed by using PCA (Principal Component Analysis) [14]. In addition, with the ranked lists of emergent and subsiding terms, it remains difficult to understand the similarity among the listed terms. To overcome this difficulty, we introduced a temporal pattern extraction phase into the previous work. This provides an abstraction layer for understanding the meaning of groups of similar terms on the basis of the temporal behaviors of the terms, which are calculated as the values of an importance index.

3 An Integrated Method for Detecting Remarkable Trends of Technical Terms as Temporal Patterns of Importance Indices

In this section, we describe the difference between conventional ETD methods and our proposal: detecting continuous temporal patterns of terms in temporal sets of documents.
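As a rough illustration of treating an importance index as a continuous temporal pattern rather than a discrete status, the following sketch fits an ordinary least-squares line to the yearly index values of one term and reports the two quantities used for ranking in the previous studies [4,13]. It assumes that the "degree" corresponds to the slope of the fitted line; the function names, the `eps` threshold, and the sample values are hypothetical and only serve the example.

```python
def linear_trend(values):
    """Fit y = slope * t + intercept by ordinary least squares.

    `values` holds one importance-index value per time period,
    with periods numbered t = 0, 1, 2, ... (an assumption of this sketch).
    """
    n = len(values)
    if n < 2:
        raise ValueError("at least two time points are required")

    mean_t = (n - 1) / 2
    mean_y = sum(values) / n
    s_tt = sum((t - mean_t) ** 2 for t in range(n))
    s_ty = sum((t - mean_t) * (y - mean_y) for t, y in enumerate(values))

    slope = s_ty / s_tt
    intercept = mean_y - slope * mean_t
    return slope, intercept


def classify_trend(values, eps=1e-9):
    """Label a series by the sign of its slope.

    The labels follow the paper's wording ("emergent", "subsiding");
    the thresholding rule itself is an assumption of this sketch.
    """
    slope, _ = linear_trend(values)
    if slope > eps:
        return "emergent"
    if slope < -eps:
        return "subsiding"
    return "flat"


if __name__ == "__main__":
    # Hypothetical yearly index values for two terms.
    print(linear_trend([1.0, 3.0, 5.0, 7.0]))
    print(classify_trend([5.0, 4.0, 2.0, 1.0]))
```

The same fit can be applied to a cluster centroid instead of a single term, which matches the abstract's description of trend detection based on the linear trends of centroids.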
As illustrated in Fig. 1, in order to find remarkable temporal trends of terms, we developed a framework for detecting various temporal trends of technical terms by using multiple importance indices. The framework consists of the following four components:

1. Technical term extraction in a corpus
2. Importance indices calculation
3. Temporal pattern extraction
4. Trend detection

There are some conventional methods for detecting temporal trends of keywords in a corpus on the basis of each particular importance index. Although these methods calculate each index in order to detect important keywords, information