Lecture Notes in Artificial Intelligence 3275
Edited by J. G. Carbonell and J. Siekmann
Subseries of Lecture Notes in Computer Science

Petra Perner (Ed.)

Advances in Data Mining
Applications in Image Mining, Medicine and Biotechnology, Management and Environmental Control, and Telecommunications

4th Industrial Conference on Data Mining, ICDM 2004
Leipzig, Germany, July 4-7, 2004
Revised Selected Papers

Series Editors
Jaime G. Carbonell, Carnegie Mellon University, Pittsburgh, PA, USA
Jörg Siekmann, University of Saarland, Saarbrücken, Germany

Volume Editor
Petra Perner
Institute of Computer Vision and Applied Computer Sciences
Körnerstr. 10, 04107 Leipzig, Germany
E-mail: [email protected]

Library of Congress Control Number: 2004116339
CR Subject Classification (1998): I.2.6, I.2, H.2.8, K.4.4, J.3, I.4, J.6, J.1
ISSN 0302-9743
ISBN 3-540-24054-3 Springer Berlin Heidelberg New York

This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable to prosecution under the German Copyright Law.

Springer is a part of Springer Science+Business Media
springeronline.com
© Springer-Verlag Berlin Heidelberg 2004
Printed in Germany
Typesetting: Camera-ready by author, data conversion by Scientific Publishing Services, Chennai, India
Printed on acid-free paper
SPIN: 11365860 06/3142 543210

Preface

The Industrial Conference on Data Mining ICDM-Leipzig was the fourth meeting in a series of annual events which started in 2000, organized by the Institute of Computer Vision and Applied Computer Sciences (IBaI) in Leipzig.
The mission of the conference is to bring together researchers and people from industry to discuss new trends and applications in data mining. This year a broad spectrum of work on different applications was presented, ranging from image mining, medicine and biotechnology, and management and environmental control to telecommunications. In addition, an industrial exhibition showed the successful application of data mining methods by industry in areas such as medical devices, mass data management systems, and data mining tools. The discussions inspired many projects and led to new joint work. The fruitful discussions, the exchange of ideas, and the spirit of the conference made it a remarkable event for both sides, industry and research.

We would like to express our appreciation to the reviewers for their precise and highly professional work. We appreciate the help and understanding of the editorial staff at Springer, and in particular Alfred Hofmann, who supported the publication of these proceedings in the LNAI series. Last, but not least, we wish to thank all speakers, participants, and industrial exhibitors who contributed to the success of the conference.

We are looking forward to welcoming you to ICDM 2005 (www.data-mining-forum.de) and to the new work you will present there.

July 2004
Petra Perner

Table of Contents

Case-Based Reasoning

Neuro-symbolic System for Business Internal Control
Juan M. Corchado, M. Lourdes Borrajo, María A. Pellicer, J. Carlos Yáñez ... 1

Applying Case Based Reasoning Approach in Analyzing Organization Change Management Data
Orit Raphaeli, Jacob Zahavi, Ron Kenett ... 11

Improving the K-NN Classification with the Euclidean Distance Through Linear Data Transformations
Leon Bobrowski, Magdalena Topczewska ... 23

An IBR System to Quantify the Ocean’s Carbon Dioxide Budget
Juan M. Corchado, Emilio S. Corchado, Jim Aiken ... 33

A Beta-Cooperative CBR System for Constructing a Business Management Model
Emilio S. Corchado, Juan M. Corchado, Lourdes Sáiz, Ana Lara ... 42

Image Mining

Braving the Semantic Gap: Mapping Visual Concepts from Images and Videos
Da Deng ... 50

Mining Images to Find General Forms of Biological Objects
Petra Perner, Horst Perner, Angela Bühring, Silke Jänichen ... 60

Applications in Process Control and Insurance

The Main Steps to Data Quality
Joachim Schmid ... 69

Cost-Sensitive Design of Claim Fraud Screens
Stijn Viaene, Dirk Van Gheel, Mercedes Ayuso, Montserrat Guillén ... 78

An Early Warning System for Vehicle Related Quality Data
Matthias Grabert, Markus Prechtel, Tomas Hrycej, Winfried Günther ... 88

Clustering and Association Rules

Shape-Invariant Cluster Validity Indices
Greet Frederix, Eric J. Pauwels ... 96

Mining Indirect Association Rules
Shinichi Hamano, Masako Sato ... 106

An Association Mining Method for Time Series and Its Application in the Stock Prices of TFT-LCD Industry
Chiung-Fen Huang, Yen-Chu Chen, An-Pin Chen ... 117

Clustering of Web Sessions Using Levenstein Metric
Andrei Scherbina, Sergey Kuznetsov ... 127

Telecommunication

A Data Mining Approach for Call Admission Control and Resource Reservation in Wireless Mobile Networks
Sherif Rashad, Mehmed Kantardzic, Anup Kumar ... 134

Mining of an Alarm Log to Improve the Discovery of Frequent Patterns
Françoise Fessant, Fabrice Clérot, Christophe Dousson ... 144

Medicine and Biotechnology

Feature Selection and Classification Model Construction on Type 2 Diabetic Patient's Data
Yue Huang, Paul McCullagh, Norman Black, Roy Harper ... 153

Knowledge Based Phylogenetic Classification Mining
Isabelle Bichindaritz, Stephen Potter, Société Française de Systématique ... 163

Author Index ... 173

Neuro-symbolic System for Business Internal Control

Juan M. Corchado¹, M. Lourdes Borrajo², María A. Pellicer¹, and J. Carlos Yáñez³

¹ Departamento de Informática y Automática, University of Salamanca, Plaza de la Merced s/n, 37008 Salamanca, Spain
[email protected]
² Department of Computer Science, University of Vigo, Campus As Lagoas, s/n, 32004 Ourense, Spain
³ Department of Financial Accounting, University of Vigo, Campus As Lagoas, s/n, 32004 Ourense, Spain
{lborrajo, jcyanez}@uvigo.es

Abstract. The complexity of current organization systems, and the growing importance of internal controls in firms, make it necessary to construct models that automate and facilitate the work of auditors. An intelligent system has been developed to automate the internal control process. This system is composed of two case-based reasoning systems. The objective of the system is to facilitate the process of internal auditing in small and medium firms from the textile sector. The system analyses the data that characterise each of the activities carried out by the firm, then determines the state of each activity, calculates the associated risk, detects the erroneous processes, and generates recommendations to improve these processes. As such, the system is a useful tool that helps the internal auditor make decisions based on the risk generated. Each of the case-based reasoning systems that make up the system uses a different problem-solving method in each step of the reasoning cycle: fuzzy clustering during the retrieval phase, a radial basis function network and a multi-criteria discrete method during the reuse phase, and a rule-based system for recommendation generation. The system has been tested successfully in several small and medium companies in the textile sector, located in the northwest of Spain.
The accuracy of the technologies employed in the system has been demonstrated by the results obtained over the last two years.

1 Introduction

Nowadays, the organization systems employed in enterprises are increasing in complexity. Moreover, in recent years the number of regulatory norms has increased considerably. As a consequence, the need has arisen for periodic internal audits. However, evaluating and predicting the evolution of these types of systems, which are characterized by their great dynamism, is in general complicated. It is necessary to construct models that facilitate analysis work carried out in changing environments, such as finance.

P. Perner (Ed.): ICDM 2004, LNAI 3275, pp. 1–10, 2004. © Springer-Verlag Berlin Heidelberg 2004

The processes carried out inside a firm can be grouped into functional areas [19]. Each of these areas is called a “Function”. A Function is a group of coordinated and related activities that are necessary to reach the objectives of the firm and are carried out in a systematic and repeated way [11]. Functions are divided into activities, which are associated with well-defined objectives. The functions usually carried out within a firm are: Purchases, Cash Management, Sales, Information Technology, Fixed Assets Management, Compliance to Legal Norms, and Human Resources. In turn, each of these functions is broken down into a series of activities. For example, the function Information Technology is divided into the following activities: Computer Plan Development, Study of Systems, Installation of Systems, Treatment of Information Flows, and Security Management. Each activity is composed of a number of tasks, for example: register, authorise, approve, harmonise, separate obligations, operate, etc. Control procedures are established in the tasks to ensure that the established objectives are achieved.
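The Function, activity, and task hierarchy described above can be pictured as a small nested data model. The following is only an illustrative sketch: the class names, the `control_in_place` flag, and the assignment of tasks to the Security Management activity are assumptions made for the example and are not taken from the system described in this paper.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Task:
    name: str
    # Hypothetical flag: whether a control procedure is established for the task.
    control_in_place: bool = False


@dataclass
class Activity:
    name: str
    tasks: List[Task] = field(default_factory=list)

    def uncontrolled_tasks(self) -> List[str]:
        """Names of tasks that have no control procedure established."""
        return [t.name for t in self.tasks if not t.control_in_place]


@dataclass
class Function:
    name: str
    activities: List[Activity] = field(default_factory=list)


# The Information Technology function with the five activities named in the text.
it = Function(
    "Information Technology",
    [
        Activity(n)
        for n in (
            "Computer Plan Development",
            "Study of Systems",
            "Installation of Systems",
            "Treatment of Information Flows",
            "Security Management",
        )
    ],
)

# Illustrative only: which tasks belong to which activity is assumed here.
security = it.activities[-1]
security.tasks = [
    Task("register", control_in_place=True),
    Task("authorise", control_in_place=True),
    Task("approve", control_in_place=False),
]

print(security.uncontrolled_tasks())
```

In such a model an auditor-facing tool could walk each Function, list its activities, and flag the tasks for which no control procedure has been recorded.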
Rule-based systems (RBS) have traditionally been used to delimit audit decision-making tasks [6]. However, Messier and Hansen [13] found many situations in which auditors resolved problems by referring to previous situations. This contrasts with the very nature of RBS, which have very little capacity for extracting information from past experience and present problems in order to adapt to changes in the environment. In contrast, case-based reasoning (CBR) systems are able to relate past experiences, or cases, to current observations, solving new problems through the memorization and adaptation of previously tested solutions. This is an effective way of learning, similar to the general structure of human thought. CBR systems are especially suitable when the rules that define a knowledge system are difficult to obtain, or when the number and complexity of the rules are too large to create an expert system. Moreover, CBR systems have the capacity to update their memory dynamically, based on new information (new cases), as well as to improve their resolution of problems [14].

However, in problems like those presented in this study, standard techniques of monitoring and prediction cannot be applied, due to the complexity of the problem, the existence of certain preliminary knowledge, the great dynamism of the system, etc. In these types of systems it is necessary to use models that combine the advantages of several problem-solving mechanisms capable of resolving specific parts of the general problem and attending to other parts. In this sense, an adaptive system has been developed. The system possesses the flexibility to behave in different ways and to evolve, depending on the environment in which it operates.

The developed system is composed of two fundamental subsystems:

- Subsystem IEA (Identification of the State of the Activity), whose objectives are:
  1. to identify the state or situation of each of the activities of the company;
  2. to calculate the risk associated with this state.
- Subsystem GR (Generation of Recommendations), whose goal is:
  1. to generate recommendations from the detection of inconsistent processes. These recommendations will allow the positive evolution of the internal processes of the company.
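To make the case-based reasoning cycle and the division of work between the two subsystems more concrete, the following minimal sketch shows one pass through retrieve, reuse, recommend, and retain. It is a simplification under stated assumptions, not the method used by the system described here: plain nearest-neighbour retrieval stands in for fuzzy clustering, averaging the neighbours' risk stands in for the radial basis function network, and a single threshold rule stands in for the rule-based recommendation step. All names, feature values, and the 0.5 threshold are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Case:
    features: Tuple[float, ...]  # hypothetical numeric description of an activity
    risk: float                  # risk recorded for that past case, in [0, 1]


def retrieve(case_base: List[Case], query: Tuple[float, ...], k: int = 2) -> List[Case]:
    """Retrieve the k stored cases closest to the query (Euclidean distance)."""
    return sorted(case_base, key=lambda c: math.dist(c.features, query))[:k]


def reuse(neighbours: List[Case]) -> float:
    """Estimate the risk of the new problem as the mean risk of the neighbours."""
    return sum(c.risk for c in neighbours) / len(neighbours)


def recommend(risk: float, threshold: float = 0.5) -> str:
    """Toy stand-in for a rule base: one rule on the estimated risk."""
    return "review control procedures" if risk > threshold else "no action needed"


def retain(case_base: List[Case], query: Tuple[float, ...], risk: float) -> None:
    """Store the solved problem so that later queries can reuse it."""
    case_base.append(Case(query, risk))


# Made-up case base: three past observations of an activity.
case_base = [
    Case((0.9, 0.8), 0.125),
    Case((0.2, 0.1), 1.0),
    Case((0.3, 0.2), 0.5),
]
query = (0.25, 0.15)

neighbours = retrieve(case_base, query)   # role played by subsystem IEA (state)
risk = reuse(neighbours)                  # role played by subsystem IEA (risk)
advice = recommend(risk)                  # role played by subsystem GR
retain(case_base, query, risk)            # memory is updated with the new case

print(risk, advice, len(case_base))
```

The retain step is what distinguishes this loop from a fixed rule base: each solved problem becomes a stored case, so the estimates can shift as the company's activities change.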