Data Mining, Rough Sets and Granular Computing

Studies in Fuzziness and Soft Computing
Editor-in-chief
Prof. Janusz Kacprzyk
Systems Research Institute
Polish Academy of Sciences
ul. Newelska 6
01-447 Warsaw, Poland
http://www.springer.de/cgi-bin/search_book.pl?series=2941

Further volumes of this series can be found at our homepage.

Vol. 74. H.-N. Teodorescu, L. C. Jain and A. Kandel (Eds.), Hardware Implementation of Intelligent Systems, 2001, ISBN 3-7908-1399-0
Vol. 75. V. Loia and S. Sessa (Eds.), Soft Computing Agents, 2001, ISBN 3-7908-1404-0
Vol. 76. D. Ruan, J. Kacprzyk and M. Fedrizzi (Eds.), Soft Computing for Risk Evaluation and Management, 2001, ISBN 3-7908-1406-7
Vol. 77. W. Liu, Propositional, Probabilistic and Evidential Reasoning, 2001, ISBN 3-7908-1414-8
Vol. 78. U. Seiffert and L. C. Jain (Eds.), Self-Organizing Neural Networks, 2002, ISBN 3-7908-1417-2
Vol. 79. A. Osyczka, Evolutionary Algorithms for Single and Multicriteria Design Optimization, 2002, ISBN 3-7908-1418-0
Vol. 80. P. Wong, F. Aminzadeh and M. Nikravesh (Eds.), Soft Computing for Reservoir Characterization and Modeling, 2002, ISBN 3-7908-1421-0
Vol. 81. V. Dimitrov and V. Korotkich (Eds.), Fuzzy Logic, 2002, ISBN 3-7908-1425-3
Vol. 82. Ch. Carlsson and R. Fuller, Fuzzy Reasoning in Decision Making and Optimization, 2002, ISBN 3-7908-1428-8
Vol. 83. S. Barro and R. Marin (Eds.), Fuzzy Logic in Medicine, 2002, ISBN 3-7908-1429-6
Vol. 84. L. C. Jain and J. Kacprzyk (Eds.), New Learning Paradigms in Soft Computing, 2002, ISBN 3-7908-1436-9
Vol. 85. D. Rutkowska, Neuro-Fuzzy Architectures and Hybrid Learning, 2002, ISBN 3-7908-1438-5
Vol. 86. M. B. Gorzalczany, Computational Intelligence Systems and Applications, 2002, ISBN 3-7908-1439-3
Vol. 87. C. Bertoluzza, M. A. Gil and D. A. Ralescu (Eds.), Statistical Modeling, Analysis and Management of Fuzzy Data, 2002, ISBN 3-7908-1440-7
Vol. 88. R. P. Srivastava and T. J. Mock (Eds.), Belief Functions in Business Decisions, 2002, ISBN 3-7908-1451-2
Vol. 89. B. Bouchon-Meunier, J. Gutierrez-Rios, L. Magdalena and R. R. Yager (Eds.), Technologies for Constructing Intelligent Systems 1, 2002, ISBN 3-7908-1454-7
Vol. 90. B. Bouchon-Meunier, J. Gutierrez-Rios, L. Magdalena and R. R. Yager (Eds.), Technologies for Constructing Intelligent Systems 2, 2002, ISBN 3-7908-1455-5
Vol. 91. J. J. Buckley, E. Eslami and T. Feuring, Fuzzy Mathematics in Economics and Engineering, 2002, ISBN 3-7908-1456-3
Vol. 92. P. P. Angelov, Evolving Rule-Based Models, 2002, ISBN 3-7908-1457-1
Vol. 93. V. V. Cross and T. A. Sudkamp, Similarity and Compatibility in Fuzzy Set Theory, 2002, ISBN 3-7908-1458-X
Vol. 94. M. MacCrimmon and P. Tillers (Eds.), The Dynamics of Judicial Proof, 2002, ISBN 3-7908-1459-8

Tsau Young Lin · Yiyu Y. Yao · Lotfi A. Zadeh
Editors

Data Mining, Rough Sets and Granular Computing

With 104 Figures and 56 Tables

Springer-Verlag Berlin Heidelberg GmbH

Professor Tsau Young Lin
San Jose State University
The Metropolitan University of Silicon Valley
Department of Mathematics and Computer Science
One Washington Square
San Jose, CA 95192-0103
USA

Professor Yiyu Y. Yao
University of Regina
Department of Computer Science
Regina, Saskatchewan, S4S 0A2
Canada

Professor Lotfi A. Zadeh
University of California
Berkeley Initiative in Soft Computing (BISC)
Computer Science Division and Electronics Research Laboratory
Department of Electrical and Electronics Engineering and Computer Science
Berkeley, CA 94720-1776
USA

ISSN 1434-9922
ISBN 978-3-7908-2508-4
ISBN 978-3-7908-1791-1 (eBook)
DOI 10.1007/978-3-7908-1791-1

Cataloging-in-Publication Data applied for
Die Deutsche Bibliothek - CIP catalog record
Data mining, rough sets, and granular computing: with 56 tables / Tsau Young Lin ... ed. - Heidelberg; New York: Physica-Verl., 2002
(Studies in fuzziness and soft computing; Vol. 95)

This work is subject to copyright.
All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilm or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Physica-Verlag. Violations are liable for prosecution under the German Copyright Law.

© Springer-Verlag Berlin Heidelberg 2002
Originally published by Physica-Verlag Heidelberg in 2002
Softcover reprint of the hardcover 1st edition 2002

The use of general descriptive names, registered names, trademarks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

Hardcover Design: Erich Kirchner, Heidelberg

Preface

During the past few years, data mining has grown rapidly in visibility and importance within information processing and decision analysis. This is particularly true in the realm of e-commerce, where data mining is moving from a "nice-to-have" to a "must-have" status.

In a different though related context, a new computing methodology called granular computing is emerging as a powerful tool for the conception, analysis and design of information/intelligent systems. In essence, data mining deals with summarization of information which is resident in large data sets, while granular computing plays a key role in the summarization process by drawing together points (objects) which are related through similarity, proximity or functionality. In this perspective, granular computing has a position of centrality in data mining.

Another methodology which has high relevance to data mining and plays a central role in this volume is that of rough set theory.
Basically, rough set theory may be viewed as a branch of granular computing. However, its applications to data mining predate those of granular computing.

This volume is the result of a two-year project aimed at coalescing the concepts and techniques of granular computing on one side, and rough set theory on the other. It consists of a collection of up-to-date and authoritative expositions of the basic theories underlying data mining, granular computing and rough set theory, and stresses their wide-ranging applications. A principal aim of our work is to stimulate an exploration of ways in which progress in data mining can be enhanced through integration with granular computing and rough set theory.

T.Y. Lin, Y.Y. Yao, L.A. Zadeh

Contents

Preface v
T.Y. Lin, Y.Y. Yao and L.A. Zadeh

PART 1: GRANULAR COMPUTING - A NEW PARADIGM

Some Reflections on Information Granulation and its Centrality in Granular Computing, Computing with Words, the Computational Theory of Perceptions and Precisiated Natural Language 3
L.A. Zadeh

PART 2: GRANULAR COMPUTING IN DATA MINING

Data Mining Using Granular Computing: Fast Algorithms for Finding Association Rules 23
T.Y. Lin and E. Louie

Knowledge Discovery with Words Using Cartesian Granule Features: An Analysis for Classification Problems 46
J.G. Shanahan

Validation of Concept Representation with Rule Induction and Linguistic Variables 91
S. Tsumoto

Granular Computing Using Information Tables 102
Y.Y. Yao and N. Zhong

A Query-Driven Interesting Rule Discovery Using Association and Spanning Operations 125
J.P. Yoon and L. Kerschberg

PART 3: DATA MINING

An Interactive Visualization System for Mining Association Rules 145
J. Han, N. Cercone and X. Hu

Algorithms for Mining System Audit Data 166
W. Lee, S.J. Stolfo and K.W. Mok

Scoring and Ranking the Data Using Association Rules 190
B. Liu, Y. Ma and C.K. Wong

Finding Unexpected Patterns in Data 216
B. Padmanabhan and A. Tuzhilin

Discovery of Approximate Knowledge in Medical Databases Based on Rough Set Model 232
S. Tsumoto

PART 4: GRANULAR COMPUTING

Observability and the Case of Probability 249
C. Alsina, J. Jacas and E. Trillas

Granulation and Granularity via Conceptual Structures: A Perspective From the Point of View of Fuzzy Concept Lattices 265
R. Belohlavek

Granular Computing with Closeness and Negligibility Relations 290
D. Dubois, A. Hadj-Ali and H. Prade

Application of Granularity Computing to Confirm Compliance with Non-Proliferation Treaty 308
A. Fattah, V. Pouchkarev, A. Belenki, A. Ryjov and L.A. Zadeh

Basic Issues of Computing with Granular Probabilities 339
G.J. Klir

Multi-dimensional Aggregation of Fuzzy Numbers Through the Extension Principle 350
G. Mayor, A.R. de Soto, J. Suñer and E. Trillas

On Optimal Fuzzy Information Granulation 364
A. Ryjov

Ordinal Decision Making with a Notion of Acceptable: Denoted Ordinal Scales 398
R.R. Yager

A Framework for Building Intelligent Information-Processing Systems Based on Granular Factor Space 414
F. Yu and C. Huang

PART 5: ROUGH SETS AND GRANULAR COMPUTING

GRS: A Generalized Rough Sets Model 447
X. Hu, N. Cercone, J. Han and W. Ziarko

Structure of Upper and Lower Approximation Spaces of Infinite Sets 461
D.S. Malik and J.N. Mordeson

Indexed Rough Approximations, A Polymodal System, and Generalized Possibility Measures 474
S. Miyamoto

Granularity, Multi-valued Logic, Bayes' Theorem and Rough Sets 487
Z. Pawlak

The Generic Rough Set Inductive Logic Programming (gRS-ILP) Model 499
A. Siromoney and K. Inoue

Possibilistic Data Analysis and Its Similarity to Rough Sets 518
H. Tanaka and P. Guo

Part 1
Granular Computing - A New Paradigm

Some Reflections on Information Granulation and its Centrality in Granular Computing, Computing with Words, the Computational Theory of Perceptions and Precisiated Natural Language

Lotfi A. Zadeh
Berkeley Initiative in Soft Computing (BISC), Computer Science Division and the Electronics Research Laboratory, Department of EECS, University of California, Berkeley, CA 94720-1776

To Professors Z. Pawlak and J. Kacprzyk

Research supported in part by ONR Contract N00014-99-C-0298, NASA Contract NCC2-1006, NASA Grant NAC2-1177, ONR Grant N00014-96-1-0556, ONR Grant FDN0014991035, ARO Grant DAAH 04-961-0341 and the BISC Program of UC Berkeley.

The past few years have witnessed what in retrospect may be seen as a turning point in the evolution of fuzzy logic. What I have in mind is the debut of four linked methodologies: granular computing, computing with words, the computational theory of perceptions and precisiated natural language. What follows is a view of the links between the underlying structures of these methodologies - a view which is presented from a personal perspective.

In our quest for machines which are capable of performing complex tasks, we are developing a better understanding of the centrality of information granulation in human cognition, reasoning and decision-making [2]. In many contexts, information granulation is a reflection of the bounded ability of sensory organs and, ultimately, the brain, to resolve detail and store information. In other contexts, granulation is employed to solve a complex problem by partitioning it into simpler subproblems. This is the essence of the strategy of divide and conquer.

In a general setting, a granule is a clump of real or mental objects (points) drawn together by indistinguishability, similarity, proximity or functionality. Modes of information granulation (IG) in which granules are crisp play important roles in a wide variety of methods, approaches and techniques. Among them are: interval analysis, quantization, chunking, rough set theory, diakoptics, divide and conquer, Dempster-Shafer theory, machine learning from examples, qualitative process theory, decision trees, semantic networks, analog-to-digital conversion, constraint programming, cluster analysis and many others. In this context, particularly worthy of note is Professor Pawlak's theory of rough sets [1].
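To make crisp granulation by indistinguishability a little more concrete, here is a minimal Python sketch in the spirit of rough set theory, under stated assumptions: objects are described by a small hypothetical information table, a granule is the set of objects that share the same values on a chosen attribute subset, and a target set is approximated from below by the granules it fully contains and from above by the granules it touches. The table, the attribute names and the helper functions `granulate` and `approximations` are illustrative inventions, not taken from this volume.

```python
from collections import defaultdict

# Hypothetical information table (illustrative data, not from this volume):
# each object is described by its values on two attributes.
TABLE = {
    "o1": {"color": "red", "size": "small"},
    "o2": {"color": "red", "size": "small"},
    "o3": {"color": "blue", "size": "large"},
    "o4": {"color": "blue", "size": "small"},
    "o5": {"color": "blue", "size": "large"},
}


def granulate(table, attributes):
    """Group objects into crisp granules: two objects land in the same granule
    exactly when they are indistinguishable on every attribute in `attributes`."""
    granules = defaultdict(set)
    for obj, row in table.items():
        key = tuple(row[a] for a in attributes)
        granules[key].add(obj)
    return list(granules.values())


def approximations(granules, target):
    """Return the lower and upper approximations of `target`: the union of
    granules contained in it, and the union of granules that overlap it."""
    lower, upper = set(), set()
    for g in granules:
        if g <= target:  # granule lies entirely inside the target
            lower |= g
        if g & target:   # granule shares at least one object with the target
            upper |= g
    return lower, upper


target = {"o1", "o2", "o3"}

# Coarse granulation: objects are told apart by color only.
lower, upper = approximations(granulate(TABLE, ["color"]), target)
print(sorted(lower), sorted(upper))  # ['o1', 'o2'] ['o1', 'o2', 'o3', 'o4', 'o5']

# Finer granulation: color and size together split the blue granule.
fine_lower, fine_upper = approximations(granulate(TABLE, ["color", "size"]), target)
print(sorted(fine_lower), sorted(fine_upper))  # ['o1', 'o2'] ['o1', 'o2', 'o3', 'o5']
```

In this toy table, refining the granulation leaves the lower approximation unchanged and shrinks the upper one, which matches the general behavior of rough approximations: finer granules can only narrow the boundary between what is certainly and what is possibly in the target set.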