
privacy-preserving data mining: models and - Charu Aggarwal PDF

12 Pages·2014·0.05 MB·English

Preview privacy-preserving data mining: models and - Charu Aggarwal

PRIVACY-PRESERVING DATA MINING: MODELS AND ALGORITHMS

Edited by
CHARU C. AGGARWAL, IBM T. J. Watson Research Center, Hawthorne, NY 10532
PHILIP S. YU, University of Illinois at Chicago, Chicago, IL 60607

Kluwer Academic Publishers, Boston/Dordrecht/London

Contents

List of Figures
List of Tables
Preface

1 An Introduction to Privacy-Preserving Data Mining
Charu C. Aggarwal, Philip S. Yu
  1. Introduction
  2. Privacy-Preserving Data Mining Algorithms
  3. Conclusions and Summary
  References

2 A General Survey of Privacy-Preserving Data Mining Models and Algorithms
Charu C. Aggarwal, Philip S. Yu
  1. Introduction
  2. The Randomization Method
    2.1 Privacy Quantification
    2.2 Adversarial Attacks on Randomization
    2.3 Randomization Methods for Data Streams
    2.4 Multiplicative Perturbations
    2.5 Data Swapping
  3. Group Based Anonymization
    3.1 The k-Anonymity Framework
    3.2 Personalized Privacy-Preservation
    3.3 Utility Based Privacy Preservation
    3.4 Sequential Releases
    3.5 The l-diversity Method
    3.6 The t-closeness Model
    3.7 Models for Text, Binary and String Data
  4. Distributed Privacy-Preserving Data Mining
    4.1 Distributed Algorithms over Horizontally Partitioned Data Sets
    4.2 Distributed Algorithms over Vertically Partitioned Data
    4.3 Distributed Algorithms for k-Anonymity
  5. Privacy-Preservation of Application Results
    5.1 Association Rule Hiding
    5.2 Downgrading Classifier Effectiveness
    5.3 Query Auditing and Inference Control
  6. Limitations of Privacy: The Curse of Dimensionality
  7. Applications of Privacy-Preserving Data Mining
    7.1 Medical Databases: The Scrub and Datafly Systems
    7.2 Bioterrorism Applications
    7.3 Homeland Security Applications
    7.4 Genomic Privacy
  8. Summary
  References

3 A Survey of Inference Control Methods for Privacy-Preserving Data Mining
Josep Domingo-Ferrer
  1. A classification of microdata protection methods
  2. Perturbative masking methods
    2.1 Additive noise
    2.2 Microaggregation
    2.3 Data swapping and rank swapping
    2.4 Rounding
    2.5 Resampling
    2.6 PRAM
    2.7 MASSC
  3. Non-perturbative masking methods
    3.1 Sampling
    3.2 Global recoding
    3.3 Top and bottom coding
    3.4 Local suppression
  4. Synthetic microdata generation
    4.1 Synthetic data by multiple imputation
    4.2 Synthetic data by bootstrap
    4.3 Synthetic data by Latin Hypercube Sampling
    4.4 Partially synthetic data by Cholesky decomposition
    4.5 Other partially synthetic and hybrid microdata approaches
    4.6 Pros and cons of synthetic microdata
  5. Trading off information loss and disclosure risk
    5.1 Score construction
    5.2 R-U maps
    5.3 k-anonymity
  6. Conclusions and research directions
  References

4 Measures of Anonymity
Suresh Venkatasubramanian
  1. Introduction
    1.1 What is privacy?
    1.2 Data Anonymization Methods
    1.3 A Classification Of Methods
  2. Statistical Measures of Anonymity
    2.1 Query Restriction
    2.2 Anonymity via Variance
    2.3 Anonymity via Multiplicity
  3. Probabilistic Measures of Anonymity
    3.1 Measures Based on Random Perturbation
    3.2 Measures Based on Generalization
    3.3 Utility vs Privacy
  4. Computational Measures Of Anonymity
    4.1 Anonymity via Isolation
  5. Conclusions And New Directions
    5.1 New Directions
  References

5 k-Anonymous Data Mining: A Survey
V. Ciriani, S. De Capitani di Vimercati, S. Foresti, and P. Samarati
  1. Introduction
  2. k-Anonymity
  3. Algorithms for Enforcing k-Anonymity
  4. k-Anonymity Threats from Data Mining
    4.1 Association Rules
    4.2 Classification Mining
  5. k-Anonymity in Data Mining
  6. Anonymize-and-Mine
  7. Mine-and-Anonymize
    7.1 Enforcing k-Anonymity on Association Rules
    7.2 Enforcing k-Anonymity on Decision Trees
  8. Conclusions
  Acknowledgments
  References

6 A Survey of Randomization Methods for Privacy-Preserving Data Mining
Charu C. Aggarwal, Philip S. Yu
  1. Introduction
  2. Reconstruction Methods for Randomization
    2.1 The Bayes Reconstruction Method
    2.2 The EM Reconstruction Method
    2.3 Utility and Optimality of Randomization Models
  3. Applications of Randomization
    3.1 Privacy-Preserving Classification with Randomization
    3.2 Privacy-Preserving OLAP
    3.3 Collaborative Filtering
  4. The Privacy-Information Loss Tradeoff
  5. Vulnerabilities of the Randomization Method
  6. Randomization of Time Series Data Streams
  7. Multiplicative Noise for Randomization
    7.1 Vulnerabilities of Multiplicative Randomization
    7.2 Sketch Based Randomization
  8. Conclusions and Summary
  References

7 A Survey of Multiplicative Perturbation for Privacy-Preserving Data Mining
Keke Chen and Ling Liu
  1. Introduction
    1.1 Data Privacy vs. Data Utility
    1.2 Outline
  2. Definition of Multiplicative Perturbation
    2.1 Notations
    2.2 Rotation Perturbation
    2.3 Projection Perturbation
    2.4 Sketch-based Approach
    2.5 Geometric Perturbation
  3. Transformation Invariant Data Mining Models
    3.1 Definition of Transformation Invariant Models
    3.2 Transformation-Invariant Classification Models
    3.3 Transformation-Invariant Clustering Models
  4. Privacy Evaluation for Multiplicative Perturbation
    4.1 A Conceptual Multidimensional Privacy Evaluation Model
    4.2 Variance of Difference as Column Privacy Metric
    4.3 Incorporating Attack Evaluation
    4.4 Other Metrics
  5. Attack Resilient Multiplicative Perturbations
    5.1 Naive Estimation to Rotation Perturbation
    5.2 ICA-Based Attacks
    5.3 Distance-Inference Attacks
    5.4 Attacks with More Prior Knowledge
    5.5 Finding Attack-Resilient Perturbations
  6. Conclusion
  References

8 A Survey of Quantification of Privacy Preserving Data Mining Algorithms
Elisa Bertino, Dan Lin, and Wei Jiang
  1. Metrics for Quantifying Privacy Level
    1.1 Data Privacy
    1.2 Result Privacy
  2. Metrics for Quantifying Hiding Failure
  3. Metrics for Quantifying Data Quality
    3.1 Quality of the Data Resulting from the PPDM Process
    3.2 Quality of the Data Mining Results
  4. Complexity Metrics
  5. How to Select a Proper Metric
  6. Conclusion and Research Directions
  References

9 A Survey of Utility-based Privacy-Preserving Data Transformation Methods
Ming Hua and Jian Pei
  1. Introduction
    1.1 What is Utility-based Privacy Preservation?
  2. Types of Utility-based Privacy Preservation Methods
    2.1 Privacy Models
    2.2 Utility Measures
    2.3 Summary of the Utility-Based Privacy Preserving Methods
  3. Utility-Based Anonymization Using Local Recoding
    3.1 Global Recoding and Local Recoding
    3.2 Utility Measure
    3.3 Anonymization Methods
    3.4 Summary and Discussion
  4. The Utility-based Privacy Preserving Methods in Classification Problems
    4.1 The Top-Down Specialization Method
    4.2 The Progressive Disclosure Algorithm
    4.3 Summary and Discussion
  5. Anonymized Marginal: Injecting Utility into Anonymized Data Sets
    5.1 Anonymized Marginal
    5.2 Utility Measure
    5.3 Injecting Utility Using Anonymized Marginals
    5.4 Summary and Discussion
  6. Summary
  References

10 Mining Association Rules under Privacy Constraints
Jayant R. Haritsa
  1. Problem Framework
  2. Evolution of the Literature
  3. The FRAPP Framework
  4. Sample Results
  5. Closing Remarks
  References

11 A Survey of Association Rule Hiding Methods for Privacy
Vassilios S. Verykios and Aris Gkoulalas-Divanis
  1. Introduction
  2. Terminology and Preliminaries
  3. Taxonomy of Association Rule Hiding Algorithms
  4. Classes of Association Rule Algorithms
    4.1 Heuristic Approaches
    4.2 Border-based Approaches
    4.3 Exact Approaches
  5. Other Hiding Approaches
  6. Metrics and Performance Analysis
  7. Discussion and Future Trends
  8. Conclusions
  References

12 A Survey of Statistical Approaches to Preserving Confidentiality of Contingency Table Entries
Stephen E. Fienberg and Aleksandra B. Slavkovic
  1. Introduction
  2. The Statistical Approach Privacy Protection
  3. Datamining Algorithms, Association Rules, and Disclosure Limitation
  4. Estimation and Disclosure Limitation for Multi-way Contingency Tables
  5. Two Illustrative Examples
    5.1 Example 1: Data from a Randomized Clinical Trial
    5.2 Example 2: Data from the 1993 U.S. Current Population Survey
  6. Conclusions
  References

13 A Survey of Privacy-Preserving Methods Across Horizontally Partitioned Data
Murat Kantarcioglu
  1. Introduction
  2. Basic Cryptographic Techniques for Privacy-Preserving Distributed Data Mining
  3. Common Secure Sub-protocols Used in Privacy-Preserving Distributed Data Mining
  4. Privacy-preserving Distributed Data Mining on Horizontally Partitioned Data
  5. Comparison to Vertically Partitioned Data Model
  6. Extension to Malicious Parties
  7. Limitations of the Cryptographic Techniques Used in Privacy-Preserving Distributed Data Mining
  8. Privacy Issues Related to Data Mining Results
  9. Conclusion
  References

14 A Survey of Privacy-Preserving Methods across Vertically Partitioned Data
Jaideep Vaidya
  1. Classification
    1.1 Naïve Bayes Classification
    1.2 Bayesian Network Structure Learning
    1.3 Decision Tree Classification
  2. Clustering
  3. Association Rule Mining
  4. Outlier detection
    4.1 Algorithm
    4.2 Security Analysis
    4.3 Computation and Communication Analysis
  5. Challenges and Research Directions
  References

15 A Survey of Attack Techniques on Privacy-Preserving Data Perturbation Methods
Kun Liu, Chris Giannella, and Hillol Kargupta
  1. Introduction
  2. Definitions and Notation
  3. Attacking Additive Data Perturbation
    3.1 Eigen-Analysis and PCA Preliminaries
    3.2 Spectral Filtering
    3.3 SVD Filtering
    3.4 PCA Filtering
    3.5 MAP Estimation Attack
    3.6 Distribution Analysis Attack
    3.7 Summary
  4. Attacking Matrix Multiplicative Data Perturbation
    4.1 Known I/O Attacks
    4.2 Known Sample Attack
    4.3 Other Attacks Based on ICA
    4.4 Summary
  5. Attacking k-Anonymization
  6. Conclusion
  Acknowledgments
  References

16 Private Data Analysis via Output Perturbation
Kobbi Nissim
  1. The Abstract Model – Statistical Databases, Queries, and Sanitizers
  2. Privacy
    2.1 Interpreting the Privacy Definition
  3. The Basic Technique: Calibrating Noise to Sensitivity
    3.1 Applications: Functions with Low Global Sensitivity
  4. Constructing Sanitizers for Complex Functionalities
    4.1 k-Means Clustering

Description:
Privacy-Preserving Data Mining Algorithms; Conclusions and Summary; References; A General Survey of Privacy-Preserving Data Mining Models