Understanding and Using Rough Set Based Feature Selection: Concepts, Techniques and Applications

200 Pages·2017·6.781 MB·English

Muhammad Summair Raza · Usman Qamar

Muhammad Summair Raza
Department of Computer Engineering, College of Electrical & Mechanical Engineering
National University of Sciences and Technology (NUST)
Rawalpindi, Pakistan

Usman Qamar
Department of Computer Engineering, College of Electrical & Mechanical Engineering
National University of Sciences and Technology (NUST)
Rawalpindi, Pakistan

ISBN 978-981-10-4964-4
ISBN 978-981-10-4965-1 (eBook)
DOI 10.1007/978-981-10-4965-1
Library of Congress Control Number: 2017942777

© The Editor(s) (if applicable) and The Author(s) 2017

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Printed on acid-free paper

This Springer imprint is published by Springer Nature.
The registered company is Springer Nature Singapore Pte Ltd.
The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721, Singapore

Preface

Rough set theory (RST) has become a prominent tool for data science in various domains due to its analysis-friendly nature. From scientific discovery to business intelligence, both practitioners and scientists are using RST in various domains. The feature selection (FS) community is one example. Various algorithms have been proposed in the literature using RST, and a lot of research is still in progress.

For practitioners and the research community alike, this book provides a strong foundation in the concepts of RST and FS. It starts with an introduction to feature selection and rough set theory (along with working examples) and proceeds to the advanced concepts. Sufficient explanation is provided for each concept so that the reader does not need any other source. A complete library of RST-based APIs implementing full RST functionality is also provided, along with a detailed explanation of each API.

The primary audience of this book is the research community using rough set theory (RST) to perform feature selection (FS) on large-scale datasets in various domains. However, any community interested in feature selection, such as medical, banking, or finance, can also benefit from the book.

Rawalpindi, Pakistan    Muhammad Summair Raza
Rawalpindi, Pakistan    Usman Qamar

Contents

1 Introduction to Feature Selection
  1.1 Feature
    1.1.1 Numerical
    1.1.2 Categorical Attributes
  1.2 Feature Selection
    1.2.1 Supervised Feature Selection
    1.2.2 Unsupervised Feature Selection
  1.3 Feature Selection Methods
    1.3.1 Filter Methods
    1.3.2 Wrapper Methods
    1.3.3 Embedded Methods
  1.4 Objective of Feature Selection
  1.5 Feature Selection Criteria
    1.5.1 Information Gain
    1.5.2 Distance
    1.5.3 Dependency
    1.5.4 Consistency
    1.5.5 Classification Accuracy
  1.6 Feature Generation Schemes
    1.6.1 Forward Feature Generation
    1.6.2 Backward Feature Generation
    1.6.3 Random Feature Generation
  1.7 Related Concepts
    1.7.1 Search Organization
    1.7.2 Generation of a Feature Selection Algorithm
    1.7.3 Feature Relevance
    1.7.4 Feature Redundancy
    1.7.5 Applications of Feature Selection
    1.7.6 Feature Selection: Issues
  1.8 Summary
  Bibliography

2 Background
  2.1 Curse of Dimensionality
  2.2 Transformation-Based Reduction
    2.2.1 Linear Methods
    2.2.2 Nonlinear Methods
  2.3 Selection-Based Reduction
    2.3.1 Feature Selection in Supervised Learning
    2.3.2 Filter Techniques
    2.3.3 Wrapper Techniques
    2.3.4 Feature Selection in Unsupervised Learning
  2.4 Correlation-Based Feature Selection
    2.4.1 Correlation-Based Measures
    2.4.2 Correlation-Based Filter Approach (FCBF)
    2.4.3 Efficient Feature Selection Based on Correlation Measure (ECMBF)
  2.5 Mutual Information-Based Feature Selection
    2.5.1 A Mutual Information-Based Feature Selection Method (MIFS-ND)
    2.5.2 Multi-objective Artificial Bee Colony (MOABC) Approach
  2.6 Summary
  Bibliography

3 Rough Set Theory
  3.1 Classical Set Theory
    3.1.1 Sets
    3.1.2 Subsets
    3.1.3 Power Sets
    3.1.4 Operators
    3.1.5 Mathematical Symbols for Set Theory
  3.2 Knowledge Representation and Vagueness
  3.3 Rough Set Theory (RST)
    3.3.1 Information Systems
    3.3.2 Decision Systems
    3.3.3 Indiscernibility
    3.3.4 Approximations
    3.3.5 Positive Region
    3.3.6 Discernibility Matrix
    3.3.7 Discernibility Function
    3.3.8 Decision-Relative Discernibility Matrix
    3.3.9 Dependency
    3.3.10 Reducts and Core
  3.4 Discretization Process
  3.5 Miscellaneous Concepts
  3.6 Applications of RST
  3.7 Summary
  Bibliography

4 Advance Concepts in RST
  4.1 Fuzzy Set Theory
    4.1.1 Fuzzy Set
    4.1.2 Fuzzy Sets and Partial Truth
    4.1.3 Membership Function
    4.1.4 Fuzzy Operators
    4.1.5 Fuzzy Set Representation
    4.1.6 Fuzzy Rules
  4.2 Fuzzy-Rough Set Hybridization
    4.2.1 Supervised Learning and Information Retrieval
    4.2.2 Feature Selection
    4.2.3 Rough Fuzzy Set
    4.2.4 Fuzzy-Rough Set
  4.3 Dependency Classes
    4.3.1 Incremental Dependency Classes (IDC)
    4.3.2 Direct Dependency Classes (DDC)
  4.4 Redefined Approximations
    4.4.1 Redefined Lower Approximation
    4.4.2 Redefined Upper Approximation
  4.5 Summary
  Bibliography

5 Rough Set-Based Feature Selection Techniques
  5.1 Quick Reduct
  5.2 Hybrid Feature Selection Algorithm Based on Particle Swarm Optimization (PSO)
  5.3 Genetic Algorithm
  5.4 Incremental Feature Selection Algorithm (IFSA)
  5.5 Feature Selection Method Using Fish Swarm Algorithm (FSA)
    5.5.1 Representation of Position
    5.5.2 Distance and Centre of Fish
    5.5.3 Position Update Strategies
    5.5.4 Fitness Function
    5.5.5 Halting Condition
  5.6 Feature Selection Method Based on Quick Reduct and Improved Harmony Search Algorithm (RS-IHS-QR)
  5.7 A Hybrid Feature Selection Approach Based on Heuristic and Exhaustive Algorithms Using Rough Set Theory (FSHEA)
    5.7.1 Feature Selection Preprocessor
    5.7.2 Using Relative Dependency Algorithm to Optimize the Selected Features
  5.8 A Rough Set-Based Feature Selection Approach Using Random Feature Vectors
  5.9 Summary
  Bibliography

6 Unsupervised Feature Selection Using RST
  6.1 Unsupervised Quick Reduct Algorithm (USQR)
  6.2 Unsupervised Relative Reduct Algorithm
  6.3 Unsupervised Fuzzy-Rough Feature Selection
  6.4 Unsupervised PSO-Based Relative Reduct (US-PSO-RR)
  6.5 Unsupervised PSO-Based Quick Reduct (US-PSO-QR)
  6.6 Summary
  Bibliography

7 Critical Analysis of Feature Selection Algorithms
  7.1 Pros and Cons of Feature Selection Techniques
    7.1.1 Filter Methods
    7.1.2 Wrapper Methods
    7.1.3 Embedded Methods
  7.2 Comparison Framework
    7.2.1 Percentage Decrease in Execution Time
    7.2.2 Memory Usage
  7.3 Critical Analysis of Various Feature Selection Algorithms
    7.3.1 Quick Reduct
    7.3.2 Rough Set-Based Genetic Algorithm
    7.3.3 PSO-QR
    7.3.4 Incremental Feature Selection Algorithm (IFSA)
    7.3.5 AFSA
    7.3.6 Feature Selection Using Exhaustive and Heuristic Approach
    7.3.7 Feature Selection Using Random Feature Vectors
  7.4 Summary
  Bibliography

8 RST Source Code
  8.1 A Simple Tutorial
    8.1.1 Variable Declaration
    8.1.2 Array Declaration
    8.1.3 Comments
    8.1.4 If-Else Statement
    8.1.5 Loops
    8.1.6 Functions
    8.1.7 LBound and UBound Functions
  8.2 How to Import the Source Code
  8.3 Calculating Dependency Using Positive Region
    8.3.1 Main Function
    8.3.2 CalculateDRR Function
    8.3.3 SetDClasses Method
    8.3.4 FindIndex Function
    8.3.5 ClrTCC Function
    8.3.6 AlreadyExists Method
    8.3.7 InsertObject Method
    8.3.8 MatchCClasses Function
    8.3.9 PosReg Function
  8.4 Calculating Dependency Using Incremental Dependency Classes
    8.4.1 Main Function
    8.4.2 CalculateDID Function
    8.4.3 Insert Method
    8.4.4 MatchChrom Method
    8.4.5 MatchDClass Method
  8.5 Lower Approximation Using Conventional Method
    8.5.1 Main Method
    8.5.2 CalculateLAObjects Method
    8.5.3 FindLAO Method
    8.5.4 SetDConcept Method
  8.6 Lower Approximation Using Redefined Preliminaries
  8.7 Upper Approximation Using Conventional Method
  8.8 Upper Approximation Using Redefined Preliminaries
  8.9 Quick Reduct Algorithm
    8.9.1 Miscellaneous Methods
    8.9.2 Restore Method
    8.9.3 C_R Method
  8.10 Summary
