
Enabling Accurate Analysis of Private Network Data

201 Pages·2014·1.9 MB·English


University of Massachusetts Amherst
ScholarWorks@UMass Amherst
Open Access Dissertations
9-2010

Enabling Accurate Analysis of Private Network Data
Michael Hay, University of Massachusetts Amherst, [email protected]

Follow this and additional works at: https://scholarworks.umass.edu/open_access_dissertations
Part of the Computer Sciences Commons

Recommended Citation
Hay, Michael, "Enabling Accurate Analysis of Private Network Data" (2010). Open Access Dissertations. 319.
https://scholarworks.umass.edu/open_access_dissertations/319

This Open Access Dissertation is brought to you for free and open access by ScholarWorks@UMass Amherst. It has been accepted for inclusion in Open Access Dissertations by an authorized administrator of ScholarWorks@UMass Amherst. For more information, please contact [email protected].

ENABLING ACCURATE ANALYSIS OF PRIVATE NETWORK DATA

A Dissertation Presented by MICHAEL G. HAY

Submitted to the Graduate School of the University of Massachusetts Amherst in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY

September 2010

Computer Science

© Copyright by Michael G. Hay 2010
All Rights Reserved

ENABLING ACCURATE ANALYSIS OF PRIVATE NETWORK DATA

A Dissertation Presented by MICHAEL G. HAY

Approved as to style and content by:
Gerome Miklau, Co-chair
David Jensen, Co-chair
Don Towsley, Member
Andrew Papachristos, Member
Andrew G. Barto, Department Chair, Computer Science

ACKNOWLEDGMENTS

This work would not have been possible without the support and mentorship of David Jensen and Gerome Miklau. David has been a generous and accommodating advisor. He gave me the latitude to explore the world of research freely and he taught me the methods that made the exploration fruitful. I have found his research methods invaluable in practice, but more importantly, they were foundational to my success, because they made the mysterious craft of research accessible to me.
In addition to his teachings, I am grateful for his patience and encouragement. David also oriented and prepared me for a career in research, giving me valuable perspective on the challenges and rewards of such a career.

Gerome introduced me to the topic of data privacy and deserves much credit for guiding the effort that went into this dissertation. By working closely with me, he taught me how to formalize a problem and how to transform insights into substantive contributions. He helped me avoid doubtful thinking and embrace the intrinsic uncertainty of research with optimism. He also taught me the art of scholarly presentation: how to write and speak with precision and clarity, how to structure an argument, and how to embrace the mindset of the intended audience. I have been the grateful recipient of his advice: he is gentle, honest, and brings to every problem a clarity of mind that is enviable.

In addition to Gerome and David, I am grateful for the valuable input of the other members of my committee: Don Towsley contributed theoretical results to our early work and gave insightful comments on the rest; Andrew Papachristos guided me to relevant sociology literature and supplied a valuable perspective as a potential end user.

The work in this dissertation is a product of the hard work and valuable insights of many people. I am grateful to my coauthors for their contributions: Gerome Miklau, David Jensen, Dan Suciu, Vibhor Rastogi, Chao Li, Don Towsley, Andrew McGregor, Philipp Weis, and Siddharth Srivastava. In addition, I am thankful to Vibhor for many thoughtful discussions and also for his friendship.

I have benefited greatly from interacting with the members, past and present, of the Knowledge Discovery Laboratory and the Database Group. Jen Neville and Andy Fast gave me hope and encouragement that a Ph.D. is within my reach. Brian Gallagher provided valuable friendship.
Lisa Friedland entertained countless impromptu requests to act as a sounding board for my ideas, as well as my gripes. Chao Li has been a great collaborator and a memorable travel partner.

The Computer Science Department at UMass Amherst is a special place and I have learned so much as a student here. I am especially grateful to Micah Adler, Yanlei Diao, Neil Immerman, and Andrew McCallum for their instruction and to James Allan, Deb Bergeron, Rachel Lavery, Leeanne LeClerc, and Sharon Mallory for their support.

Finally, I would not have completed this journey without the love and support of family and friends. Al and Buffy have been incredibly supportive, inviting us to enjoy several wonderful beach vacations, welcoming us into their home for a summer, and providing a safe haven for Carey and the boys when paper deadlines neared. My mom has helped keep a roof over our heads and yummy, organic food in our fridge, and she has been a sweet Nana to our boys and a lifeline to us. My Dad gave me encouragement to stick with it when I needed it most and has been incredibly helpful, especially with our transition to Ithaca. Nancy, Chris, and Clara have provided some memorable excursions, filled with tasty treats and big screen fun. Declan, Jesse, Tanner, Dylan, Talia, Rosalie and their respective parents have been great friends and make us sad to leave Amherst. My boys motivated me to work hard and brought me joy at home. And Carey, you have been there from the beginning and by my side for every twist of the road. It has been a pleasant journey indeed and you made it so. Thank you.

ABSTRACT

ENABLING ACCURATE ANALYSIS OF PRIVATE NETWORK DATA

SEPTEMBER 2010

MICHAEL G. HAY
A.B., DARTMOUTH COLLEGE
M.S., UNIVERSITY OF MASSACHUSETTS AMHERST
Ph.D., UNIVERSITY OF MASSACHUSETTS AMHERST

Directed by: Professor Gerome Miklau and Professor David Jensen

This dissertation addresses the challenge of enabling accurate analysis of network data while ensuring the protection of network participants’ privacy. This is an important problem: massive amounts of data are being collected (Facebook activity, email correspondence, cell phone records), there is huge interest in analyzing the data, but the data is not being shared due to concerns about privacy. Despite much research in privacy-preserving data analysis, existing technologies fail to provide a solution because they were designed for tables, not networks, and cannot be easily adapted to handle the complexities of network data.

We develop several technologies that advance us toward our goal. First, we develop a framework for assessing the risk of publishing a network that has been “anonymized.” Using this framework, we show that only a small amount of background knowledge about local network structure is needed to re-identify an “anonymous” individual. This motivates our second contribution: an algorithm that transforms the structure of the network to provably lower re-identification risk. In comparison with other algorithms, we show that our approach more accurately preserves important features of the network topology. Finally, we consider an alternative paradigm, in which the analyst can analyze private data through a carefully controlled query interface. We show that the degree sequence of a network can be accurately estimated under strong guarantees of privacy.

CONTENTS

ACKNOWLEDGMENTS
ABSTRACT
LIST OF TABLES
LIST OF FIGURES

CHAPTER

1. INTRODUCTION
   1.1 Problem setting of prior work: privacy in tabular data
   1.2 Our problem setting: privacy in network data
   1.3 Overview of contributions
       1.3.1 Assessing the risk of network data publication
       1.3.2 Mitigating risk through network transformation
       1.3.3 Estimating network statistics under strong privacy
2. BACKGROUND
   2.1 K-anonymity
   2.2 Differential privacy
   2.3 Differentially private query answering
   2.4 Differential privacy for graphs
   2.5 Network analyses and statistics
3. ASSESSING RE-IDENTIFICATION RISK
   3.1 Modeling the adversary
       3.1.1 Naive anonymization
       3.1.2 Threats
       3.1.3 Anonymity through structural similarity

