Anonymity and Trust in Large-Scale
Distributed Storage Systems
Thesis presented to the Faculty of Science
Institut d’Informatique
Université de Neuchâtel
for the degree of Doctor of Science by
JOSÉ VALERIO
Accepted on the recommendation of the jury:
Prof. Pascal FELBER, Université de Neuchâtel, thesis advisor
Prof. Peter KROPF, Université de Neuchâtel
Dr. Etienne RIVIÈRE, Université de Neuchâtel
Prof. Vivien QUÉMA, INP, Grenoble, France
Dr. Martin RAJMAN, EPF Lausanne
Defended on 3 July 2013
Université de Neuchâtel
2013
Abstract
Large-Scale Distributed Storage Systems (LS-DSSs) are at the core of
several Cloud services. These externalized services may run atop multiple
administrative domains. While a client may trust the organization that provides
a given Web service, a single server may belong to another organization that
the client does not trust. The design of a Distributed Storage System is itself
a challenging task, in particular when scalability, availability and consistency
are required.
This thesis explores three important aspects within this context: anonymity
and trust in LS-DSSs, auditing for Large-Scale Distributed Aggregation
Systems (LS-DASs), and the evaluation of Distributed File Storage Service
(DFSS) prototypes. The three systems proposed are evaluated using
prototype implementations.
We present SPADS, a system that provides publisher anonymization and
rate limitation to any generic LS-DSS over a Distributed Hash Table (DHT),
running on unreliable and untrustworthy peers.
Then, we propose CADA, a system that sits on top of a generic LS-DAS,
and detects when one or several servers attempt to bias an aggregation. The
system performs probabilistic auditing, and raises suspicions based on the
statistical deviation from an expected behavior.
Finally, we study DFSSs. Performing a fair comparison of the File System
Consistency Levels (FSCLs) as supported by existing DFSSs is difficult
because these systems feature different base performance and optimization
levels. We make an empirical comparison of these FSCLs by instantiating
them into a novel DFSS testbed named FlexiFS.
Contents
1 Introduction
1.1 Context
1.2 Motivation
1.3 Contributions
1.4 Organization of the thesis
2 Anonymity and Trust in Large-Scale Distributed Storage Systems
2.1 Introduction
2.2 Background
2.3 Problem definition: Anonymity vs. Trust
2.4 SPADS: Publisher Anonymization for DHT Storage
2.5 Algorithms
2.6 Evaluation
2.7 Related work
2.8 Summary
3 Auditing for Large-Scale Distributed Aggregation Systems
3.1 Introduction
3.2 System model and Problem definition
3.3 Collaborative Auditing for Distributed Aggregation
3.4 Auditing mechanisms for insertion bias
3.5 Auditing mechanisms for aggregation bias
3.6 Evaluation
3.7 Related work
3.8 Summary
4 Framework for the evaluation of Distributed File Storage Service prototypes
4.1 Introduction
4.2 Background
4.3 FlexiFS design and architecture
4.4 File System Consistency Levels
4.5 Algorithms
4.6 Implementation and Deployment
4.7 Evaluation
4.8 Related work
4.9 Summary
5 Conclusions
5.1 Contributions
5.2 Integration of the three systems
5.3 Perspectives
List of Figures
1.1 Principle of an aggregation of <value, counter> pairs at server S for key k.
1.2 Buzzaar: general principle and flow of information.
1.3 Multidomain data storage system.
2.1 DHT ring example.
2.2 Chord fingers.
2.3 CAN two-dimensional example.
2.4 Example of onion routing with 3 layers and return address.
2.5 SPADS: general principle.
2.6 SPADS: one-time legitimation and authentication.
2.7 Computational cost for a_put().
2.8 Computational cost of the encoding phase at the client side.
2.9 CDF of the total time spent in the anonymizing path.
3.1 Ways of introducing bias in the aggregation middleware.
3.2 General description of the oracles.
3.3 Confidence level vs. variance-to-mean ratio.
3.4 False positives vs. Confidence threshold.
3.5 Performance of the insertion bias oracle (without bias).
3.6 Performance of the insertion bias oracle (with bias).
3.7 Performance of the aggregation bias oracle (without bias).
3.8 Performance of the aggregation bias oracle (with bias).
4.1 Failure-free execution of the Basic Paxos protocol.
4.2 General architecture of FlexiFS.
4.3 Example of a file system structure stored in FlexiFS.
4.4 File System Consistency Levels.
4.5 Close-to-Open (CTO) semantics vs. atomic operations.
4.6 Evaluating consistency at the key/value store level.
4.7 Evaluating FSCLs and the Replication Factor by writing a file.
5.1 Buzzaar general architecture with SPADS and CADA.
5.2 Architecture of a DFSS with SPADS and FlexiFS.
5.3 Integration of the three systems.
List of Tables
2.1 Server roles in SPADS.
3.1 Notations for the base aggregation system without auditing support.
3.2 Additions to the Aggregation Point (AP) state for the insertion bias oracle.
3.3 Additions to the AP state for the aggregation bias oracle.
4.1 Important File System in User Space (FUSE) operations.