Transactions on Rough Sets XX

James F. Peters · Andrzej Skowron
Editors-in-Chief

Lecture Notes in Computer Science 10020

Commenced Publication in 1973
Founding and Former Series Editors: Gerhard Goos, Juris Hartmanis, and Jan van Leeuwen

Editorial Board
David Hutchison, Lancaster University, Lancaster, UK
Takeo Kanade, Carnegie Mellon University, Pittsburgh, PA, USA
Josef Kittler, University of Surrey, Guildford, UK
Jon M. Kleinberg, Cornell University, Ithaca, NY, USA
Friedemann Mattern, ETH Zurich, Zurich, Switzerland
John C. Mitchell, Stanford University, Stanford, CA, USA
Moni Naor, Weizmann Institute of Science, Rehovot, Israel
C. Pandu Rangan, Indian Institute of Technology, Madras, India
Bernhard Steffen, TU Dortmund University, Dortmund, Germany
Demetri Terzopoulos, University of California, Los Angeles, CA, USA
Doug Tygar, University of California, Berkeley, CA, USA
Gerhard Weikum, Max Planck Institute for Informatics, Saarbrücken, Germany

More information about this series at http://www.springer.com/series/7151

James F. Peters, Andrzej Skowron (Eds.)
Transactions on Rough Sets XX

Editors-in-Chief
James F. Peters, University of Manitoba, Winnipeg, MB, Canada
Andrzej Skowron, University of Warsaw, Warsaw, Poland

ISSN 0302-9743  ISSN 1611-3349 (electronic)
Lecture Notes in Computer Science
ISSN 1861-2059  ISSN 1861-2067 (electronic)
Transactions on Rough Sets
ISBN 978-3-662-53610-0  ISBN 978-3-662-53611-7 (eBook)
DOI 10.1007/978-3-662-53611-7
Library of Congress Control Number: 2016954935

© Springer-Verlag GmbH Germany 2016
This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made.

Printed on acid-free paper
This Springer imprint is published by Springer Nature
The registered company is Springer-Verlag GmbH Germany
The registered company address is: Heidelberger Platz 3, 14197 Berlin, Germany

Preface

Volume XX of the Transactions on Rough Sets (TRS) is a continuation of a number of research streams that have grown out of the seminal work of Zdzisław Pawlak¹ during the first decade of the twenty-first century.
The paper co-authored by Javad Rahimipour Anaraki, Saeed Samet, Wolfgang Banzhaf, and Mahdi Eftekhari introduces a new hybrid merit based on a conjunction of the correlation feature selection and fuzzy-rough feature selection methods. The new merit selects fewer redundant features and finds the most relevant features, resulting in reasonable classification accuracy. The paper co-authored by Mohammad Azad, Mikhail Moshkov, and Beata Zielosko presents a study of a greedy algorithm for the construction of approximate decision rules. This algorithm has polynomial time complexity for binary decision tables with many-valued decisions. The proposed greedy algorithm constructs relatively short α-decision rules. The paper by Mani presents an algebraic semantics of proto-transitive rough sets. Proto-transitivity, according to the author, can be considered a possible generalization of transitivity that arises often in the context of applications. The paper by Piero Pagliani presents a uniform approach to previously introduced covering-based approximation operators from the point of view of pointless topology. The monograph authored by Mohammad Aquil Khan is devoted to the study of multiple-source approximation systems, evolving information systems, and corresponding logics based on rough sets.

The editors would like to express their gratitude to the authors of all submitted papers. Special thanks are due to the following reviewers: Jan Bazan, Chris Cornelis, Davide Ciucci, Ivo Düntsch, Soma Dutta, Jouni Järvinen, Richard Jensen, Pradipta Maji, Sheela Ramanna, Zbigniew Suraj, and Marcin Wolski.

The editors and authors of this volume extend their gratitude to Alfred Hofmann, Christine Reiss, and the LNCS staff at Springer for their support in making this volume of TRS possible.
The Editors-in-Chief were supported by the Polish National Science Centre (NCN) grants DEC-2012/05/B/ST6/06981 and DEC-2013/09/B/ST6/01568, the Polish National Centre for Research and Development (NCBiR) grant DZP/RID-I-44/8/NCBR/2016, as well as the Natural Sciences and Engineering Research Council of Canada (NSERC) discovery grant 185986.

August 2016

James F. Peters
Andrzej Skowron

¹ See, e.g., Pawlak, Z.: A Treatise on Rough Sets. Transactions on Rough Sets IV (2006) 1–17. See also: Pawlak, Z., Skowron, A.: Rudiments of rough sets. Information Sciences 177 (2007) 3–27; Pawlak, Z., Skowron, A.: Rough sets: Some extensions. Information Sciences 177 (2007) 28–40; Pawlak, Z., Skowron, A.: Rough sets and Boolean reasoning. Information Sciences 177 (2007) 41–73.

LNCS Transactions on Rough Sets

The Transactions on Rough Sets series has as its principal aim the fostering of professional exchanges between scientists and practitioners who are interested in the foundations and applications of rough sets. Topics include foundations and applications of rough sets as well as foundations and applications of hybrid methods combining rough sets with other approaches important for the development of intelligent systems. The journal includes high-quality research articles accepted for publication on the basis of thorough peer reviews. Dissertations and monographs of up to 250 pages that include new research results can also be considered as regular papers. Extended and revised versions of selected papers from conferences can also be included in regular or special issues of the journal.

Editors-in-Chief: James F. Peters, Andrzej Skowron
Managing Editor: Sheela Ramanna
Technical Editor: Marcin Szczuka

Editorial Board: Mohua Banerjee, Jan Bazan, Gianpiero Cattaneo, Mihir K. Chakraborty, Davide Ciucci, Chris Cornelis, Ivo Düntsch, Anna Gomolińska, Salvatore Greco, Jerzy W. Grzymała-Busse, Masahiro Inuiguchi, Jouni Järvinen, Richard Jensen, Bożena Kostek, Churn-Jung Liau, Pawan Lingras, Victor Marek, Mikhail Moshkov, Hung Son Nguyen, Ewa Orłowska, Sankar K. Pal, Lech Polkowski, Henri Prade, Sheela Ramanna, Roman Słowiński, Jerzy Stefanowski, Jarosław Stepaniuk, Zbigniew Suraj, Marcin Szczuka, Dominik Ślȩzak, Roman Świniarski, Shusaku Tsumoto, Guoyin Wang, Marcin Wolski, Wei-Zhi Wu, Yiyu Yao, Ning Zhong, Wojciech Ziarko

Contents

A New Fuzzy-Rough Hybrid Merit to Feature Selection . . . 1
  Javad Rahimipour Anaraki, Saeed Samet, Wolfgang Banzhaf, and Mahdi Eftekhari
Greedy Algorithm for the Construction of Approximate Decision Rules for Decision Tables with Many-Valued Decisions . . . 24
  Mohammad Azad, Mikhail Moshkov, and Beata Zielosko
Algebraic Semantics of Proto-Transitive Rough Sets . . . 51
  A. Mani
Covering Rough Sets and Formal Topology – A Uniform Approach Through Intensional and Extensional Constructors . . . 109
  Piero Pagliani
Multiple-Source Approximation Systems, Evolving Information Systems and Corresponding Logics: A Study in Rough Set Theory . . . 146
  Md. Aquil Khan
Author Index . . . 321

A New Fuzzy-Rough Hybrid Merit to Feature Selection

Javad Rahimipour Anaraki¹, Saeed Samet², Wolfgang Banzhaf³, and Mahdi Eftekhari⁴

¹ Department of Computer Science, Memorial University of Newfoundland, St. John's, NL A1B 3X5, Canada
[email protected]
² Faculty of Medicine, Memorial University of Newfoundland, St. John's, NL A1B 3V6, Canada
[email protected]
³ Department of Computer Science, Memorial University of Newfoundland, St. John's, NL A1B 3X5, Canada
[email protected]
⁴ Department of Computer Engineering, Shahid Bahonar University of Kerman, Kerman 7616914111, Iran
[email protected]

Abstract. Feature selection is considered one of the most important pre-processing methods in machine learning, data mining, and bioinformatics.
By applying pre-processing techniques, we can defy the curse of dimensionality by reducing computational and storage costs, facilitate data understanding and visualization, and diminish training and testing times, leading to overall performance improvement, especially when dealing with large datasets. The correlation feature selection method uses a conventional merit to evaluate different feature subsets. In this paper, we propose a new merit by adapting and employing correlation feature selection in conjunction with fuzzy-rough feature selection, to improve the effectiveness and quality of the conventional methods. It also outperforms the newly introduced gradient boosted feature selection by selecting more relevant and less redundant features. The two-step experimental results show the applicability and efficiency of our proposed method over some well-known and widely used datasets, as well as newly introduced ones, especially from the UCI collection, with various sizes from small to large numbers of features and samples.

Keywords: Feature selection · Fuzzy-rough dependency degree · Correlation merit

1 Introduction

Each year the amount of generated data increases dramatically. This expansion needs to be handled to minimize the time and space complexities as well as the comprehensibility challenges inherent in big datasets. Machine learning methods tend to sacrifice some accuracy to decrease running time and to increase the clarity of the results [1]. Datasets may contain hundreds of thousands of samples with thousands of features, which makes further processing of the data a tedious job. Reduction can be done on either features or samples.

© Springer-Verlag GmbH Germany 2016
J.F. Peters and A. Skowron (Eds.): TRS XX, LNCS 10020, pp. 1–23, 2016.
DOI: 10.1007/978-3-662-53611-7_1
However, due to the high cost of sample gathering and their undoubted utility, such as in bioinformatics and health systems, data owners usually prefer to keep only the useful and informative features and remove the rest by applying Feature Selection (FS) techniques, which are usually considered a preprocessing step for further processing (such as classification). These methods lead to fewer classification errors, or at least to a minimal loss of performance [2].

In terms of data usability, each dataset contains three types of features: 1- informative, 2- redundant, and 3- irrelevant. Informative features are those that contain enough information on the classification outcome. In other words, they are non-redundant, relevant features. Redundant features contain information identical to that of other features, whereas irrelevant features carry no information about the outcome. The ideal goal of FS methods is to remove the last two types of features [1].

FS methods can generally be divided into two main categories [3]. One approach is wrapper based, in which a learning algorithm estimates the accuracy of a subset of features. This approach is computationally intensive and slow due to the large number of executions over selected subsets of features, which makes it impractical for large datasets. The second approach is filter based, in which features are selected based on their quality regardless of the results of the learning algorithm. As a result, it is fast but less accurate. In addition, a combination of both methods, called embedded, has been proposed to handle big datasets accurately [4]. In methods based on this approach, feature subset selection is done while the classifier structure is being built.

One of the very first feature selection methods for binary classification datasets is Relief [5]. This method constructs and updates a weight vector over the features, based on the nearest feature vectors of the same and different classes, using Euclidean distance.
After a predefined number of iterations l, the relevance vector is calculated by dividing the weight vector by l, and the features with relevance higher than a specific threshold are selected. Hall [1] has proposed a merit based on the average intra-correlation of features and the inter-correlation of features and the outcome. Those features with higher correlation to the outcome and lower correlation to other features are selected. Jensen et al. [6] have introduced a novel feature selection method based on the lower approximation of the fuzzy-rough set, in which feature and outcome dependencies are calculated using a merit called the Dependency Degree (DD). In [7], two modifications of the fuzzy-rough feature selection have been introduced to improve the performance of the conventional method: 1- Encompassing the selection process in equal situations, where more than one feature results in an identical fitness value, by using the correlation merit [1], and 2- Combining the first
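The Relief procedure just described can be sketched as follows. This is an illustrative reading of the method for binary classes (one nearest hit and one nearest miss per random draw), not the authors' implementation; the function name, signature, and defaults are our own.

```python
import numpy as np

def relief(X, y, n_iter=100, threshold=0.0, rng=None):
    """Sketch of Relief for binary classification.

    Repeatedly draws a random instance, finds its nearest neighbour of the
    same class (hit) and of the other class (miss) by Euclidean distance,
    and updates a per-feature weight vector. After n_iter draws, the weights
    are divided by n_iter to give the relevance vector, and features whose
    relevance exceeds `threshold` are returned.
    """
    rng = np.random.default_rng(rng)
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    for _ in range(n_iter):
        i = rng.integers(n_samples)
        r, cls = X[i], y[i]
        d = np.linalg.norm(X - r, axis=1)  # Euclidean distances to r
        d[i] = np.inf                      # exclude r itself
        same, diff = (y == cls), (y != cls)
        hit = X[np.where(same)[0][np.argmin(d[same])]]   # nearest hit
        miss = X[np.where(diff)[0][np.argmin(d[diff])]]  # nearest miss
        w -= (r - hit) ** 2   # penalise features that differ within a class
        w += (r - miss) ** 2  # reward features that differ across classes
    w /= n_iter  # relevance vector
    return np.where(w > threshold)[0]
```

On a toy dataset where only the first feature separates the classes, the sketch keeps that feature and discards the noisy one, which matches the intuition that Relief rewards features varying across classes but not within them.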