
Decomposition Methodology for Knowledge Discovery and Data Mining: Theory and Applications (PDF)

345 Pages·4.48 MB·english

Preview Decomposition Methodology for Knowledge Discovery and Data Mining. Theory and Application

DECOMPOSITION METHODOLOGY FOR KNOWLEDGE DISCOVERY AND DATA MINING
THEORY AND APPLICATIONS

SERIES IN MACHINE PERCEPTION AND ARTIFICIAL INTELLIGENCE*

Editors: H. Bunke (Univ. Bern, Switzerland)
P. S. P. Wang (Northeastern Univ., USA)

Vol. 46: Syntactic Pattern Recognition for Seismic Oil Exploration (K. Y. Huang)
Vol. 47: Hybrid Methods in Pattern Recognition (Eds. H. Bunke and A. Kandel)
Vol. 48: Multimodal Interface for Human-Machine Communications (Eds. P. C. Yuen, Y. Y. Tang and P. S. P. Wang)
Vol. 49: Neural Networks and Systolic Array Design (Eds. D. Zhang and S. K. Pal)
Vol. 50: Empirical Evaluation Methods in Computer Vision (Eds. H. I. Christensen and P. J. Phillips)
Vol. 51: Automatic Diatom Identification (Eds. H. du Buf and M. M. Bayer)
Vol. 52: Advances in Image Processing and Understanding: A Festschrift for Thomas S. Huang (Eds. A. C. Bovik, C. W. Chen and D. Goldgof)
Vol. 53: Soft Computing Approach to Pattern Recognition and Image Processing (Eds. A. Ghosh and S. K. Pal)
Vol. 54: Fundamentals of Robotics - Linking Perception to Action (M. Xie)
Vol. 55: Web Document Analysis: Challenges and Opportunities (Eds. A. Antonacopoulos and J. Hu)
Vol. 56: Artificial Intelligence Methods in Software Testing (Eds. M. Last, A. Kandel and H. Bunke)
Vol. 57: Data Mining in Time Series Databases (Eds. M. Last, A. Kandel and H. Bunke)
Vol. 58: Computational Web Intelligence: Intelligent Technology for Web Applications (Eds. Y. Zhang, A. Kandel, T. Y. Lin and Y. Yao)
Vol. 59: Fuzzy Neural Network Theory and Application (P. Liu and H. Li)
Vol. 60: Robust Range Image Registration Using Genetic Algorithms and the Surface Interpenetration Measure (L. Silva, O. R. P. Bellon and K. L. Boyer)
Vol. 61: Decomposition Methodology for Knowledge Discovery and Data Mining: Theory and Applications (O. Maimon and L. Rokach)

*For the complete list of titles in this series, please write to the Publisher.
DECOMPOSITION METHODOLOGY FOR KNOWLEDGE DISCOVERY AND DATA MINING
THEORY AND APPLICATIONS

Oded Maimon
Lior Rokach
Tel-Aviv University, Israel

World Scientific
NEW JERSEY * LONDON * SINGAPORE * BEIJING * SHANGHAI * HONG KONG * TAIPEI * CHENNAI

Published by World Scientific Publishing Co. Pte. Ltd.
5 Toh Tuck Link, Singapore 596224
USA office: 27 Warren Street, Suite 401-402, Hackensack, NJ 07601
UK office: 57 Shelton Street, Covent Garden, London WC2H 9HE

British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library.

DECOMPOSITION METHODOLOGY FOR KNOWLEDGE DISCOVERY AND DATA MINING: THEORY AND APPLICATIONS
Series in Machine Perception and Artificial Intelligence, Vol. 61

Copyright © 2005 by World Scientific Publishing Co. Pte. Ltd.

All rights reserved. This book, or parts thereof, may not be reproduced in any form or by any means, electronic or mechanical, including photocopying, recording or any information storage and retrieval system now known or to be invented, without written permission from the Publisher.

For photocopying of material in this volume, please pay a copying fee through the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, USA. In this case permission to photocopy is not required from the publisher.

ISBN 981-256-079-3

Printed in Singapore by World Scientific Printers (S) Pte Ltd

To our families
Oded Maimon and Lior Rokach

Preface

Data mining is the science and technology of exploring data in order to discover previously unknown patterns. It is a part of the overall process of knowledge discovery in databases (KDD). The accessibility and abundance of information today makes data mining a matter of considerable importance and necessity.

One of the most practical approaches in data mining is to use induction algorithms for constructing a model by generalizing from given data. The induced model describes and explains phenomena which are hidden in the data.
Given the recent growth of the field as well as its long history, it is not surprising that several mature approaches to induction are now available to the practitioner. However, according to the "no free lunch" theorem, there is no single approach that outperforms all others in all possible domains. Evidently, in the presence of a vast repertoire of techniques and the complexity and diversity of the explored domains, the main challenge today in data mining is to know how to utilize this repertoire in order to achieve maximum reliability, comprehensibility and complexity.

Multiple classifiers methodology is considered an effective way of overcoming this challenge. The basic idea is to build a model by integrating multiple models. Researchers distinguish between two multiple classifier methodologies: ensemble methodology and decomposition methodology. Ensemble methodology combines a set of models, each of which solves the same original task. Decomposition methodology breaks down the classification task into several manageable classification tasks, enabling each inducer to solve a different task.

This book focuses on decomposition in general data mining tasks and in classification tasks in particular. The book presents a complete methodology for decomposing classification problems into smaller and more manageable sub-problems that are solvable by using existing tools. The various elements are then joined together to solve the initial problem.

The benefits of decomposition methodology in data mining include: increased performance (classification accuracy); conceptual simplification of the problem; enhanced feasibility for huge databases; clearer and more comprehensible results; reduced runtime by solving smaller problems and by using parallel/distributed computation; and the opportunity of using different solution techniques for individual sub-problems. These features are discussed in the book.
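As a rough illustration of the general idea of decomposing a classification task and then recombining the parts, the following Python sketch may help. It is not the authors' algorithm: the toy majority-class inducer, the `region` attribute, the data, and all function names are hypothetical and chosen only for this example. The task is split by the value of one attribute, one simple model is trained per part, and each query is routed to the model for its part.

```python
from collections import Counter, defaultdict


def train_majority(rows):
    """Toy inducer (assumption): always predicts the most common training label."""
    majority = Counter(label for _, label in rows).most_common(1)[0][0]
    return lambda features: majority


def train_decomposed(rows, key):
    """Split rows by the value of attribute `key` and train one model per part."""
    parts = defaultdict(list)
    for features, label in rows:
        parts[features[key]].append((features, label))
    models = {value: train_majority(part) for value, part in parts.items()}
    fallback = train_majority(rows)  # used for attribute values unseen in training
    # Recombination step: route each query to the model of its own part.
    return lambda features: models.get(features[key], fallback)(features)


def accuracy(model, rows):
    """Fraction of rows whose label the model predicts correctly."""
    return sum(model(features) == label for features, label in rows) / len(rows)


# Hypothetical toy data: the label depends entirely on the region.
DATA = [
    ({"region": "north"}, "buy"),
    ({"region": "north"}, "buy"),
    ({"region": "north"}, "buy"),
    ({"region": "south"}, "skip"),
    ({"region": "south"}, "skip"),
]

single = train_majority(DATA)                  # one model for the whole task
decomposed = train_decomposed(DATA, "region")  # one model per sub-problem

print(accuracy(single, DATA), accuracy(decomposed, DATA))
```

On this toy data, measured on the training rows themselves, the single model can only predict the overall majority label, while the decomposed model fits each part separately. Real inducers and a held-out evaluation would of course be needed to draw any conclusion about accuracy.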
Obviously, the most essential question that decomposition methodology should be able to answer is whether a given classification problem should be decomposed and in what manner. The main theory presented in this book is that the decomposition can be achieved by recursively performing a sequence of single, elementary decompositions. The book introduces several fundamental and elementary decomposition methods, namely: Attribute Decomposition, Space Decomposition, Sample Decomposition, Function Decomposition, and Concept Decomposition. We propose a unifying framework for using these methods in real applications.

The book shows that the decomposition methods developed here extend the envelope of problems that data mining can efficiently solve. These methods also enhance the comprehensibility of the results that emerge and suggest more efficient implementation of knowledge discovery conclusions.

In this comprehensive study of decomposition methodology, we try to answer several vital questions:

- What types of elementary decomposition methods exist in concept learning?
- Which elementary decomposition type performs best for which problem? What factors should one take into account when choosing the appropriate decomposition type?
- Given an elementary type, how should we infer the best decomposition structure automatically?
- How should the sub-problems be re-composed to represent the original concept learning?

The decomposition idea shares properties with other fields, mainly ensemble methods, structured induction and distributed data mining. Numerous studies have been performed in these areas, and the methodology described in this book exploits the fruits of these insightful studies. However, the book introduces a broader methodology, which results from a rather different motivation: the desire to decompose data mining tasks and gain the benefits mentioned above.
This book was written to provide investigators in the fields of information systems, engineering, computer science, statistics and management with a comprehensive source for decomposition techniques. In addition, those engaged in the social sciences, psychology, medicine, genetics, and other data-rich fields can very much benefit from this book. Much of the material in this book has been developed and taught in undergraduate and graduate courses at Tel Aviv University. In particular, we would like to acknowledge four distinguished graduate students who contributed to this book: Omri Arad, Lital Keshet, Inbal Lavi and Anat Okon. Therefore, the book can also serve as a text or reference book for graduate/advanced undergraduate level courses in data mining and machine learning. Practitioners among the readers may be particularly interested in the descriptions of real-world data mining projects performed with decomposition methodology.

Oded Maimon
Lior Rokach
