Crowd-ML: A Privacy-Preserving Learning Framework for a Crowd of Smart Devices

Jihun Hamm, Adam Champion, Guoxing Chen, Mikhail Belkin, Dong Xuan
Department of Computer Science and Engineering, The Ohio State University, Columbus, OH 43210

Abstract—Smart devices with built-in sensors, computational capabilities, and network connectivity have become increasingly pervasive. Crowds of smart devices offer opportunities to collectively sense and perform computing tasks at an unprecedented scale. This paper presents Crowd-ML, a privacy-preserving machine learning framework for a crowd of smart devices, which can solve a wide range of learning problems for crowdsensing data with differential privacy guarantees. Crowd-ML endows a crowdsensing system with the ability to learn classifiers or predictors online from crowdsensing data privately, with minimal computational overhead on devices and servers, making it suitable for practical and large-scale deployment. We analyze the performance and scalability of Crowd-ML, and we implement the system with off-the-shelf smartphones as a proof of concept. We demonstrate the advantages of Crowd-ML with real and simulated experiments under various conditions.

I. INTRODUCTION

A. Crowdsensing

Smart devices are increasingly pervasive in daily life. These devices are characterized by their built-in sensors (e.g., accelerometers, cameras, and microphones), programmable computation, and Internet connectivity via wireless or cellular networks. They include stationary devices such as smart thermostats and mobile devices such as smartphones or in-vehicle systems. More and more devices are also being interconnected, a trend often referred to as the "Internet of Things." Interconnectivity offers opportunities for crowds of smart devices to collectively sense and compute at an unprecedented scale.

Various applications of crowdsensing have been proposed, including personal health/fitness monitoring, environmental sensing, and monitoring road/traffic conditions (see Section II-A), and the list is still expanding. Crowdsensing is used primarily for collecting and analyzing aggregate data from a population of participants. However, more complex and useful tasks can be performed beyond the calculation of aggregate statistics by applying machine learning algorithms to crowdsensing data. Examples of such tasks include: learning optimal room-temperature settings for smart thermostats; predicting user activity for context-aware services and physical monitoring; suggesting the best driving routes; and recognizing audio events from microphone sensors. The specific algorithms and data types for these tasks differ, but they can all be trained in standard unsupervised or supervised learning settings: given sensory features (time, location, motion, environmental measures, etc.), train an algorithm or model that can accurately predict a variable of interest (temperature setting, current user activity, amount of traffic, audio events, etc.). Conventionally, crowdsensing and machine learning are performed as two separate processes: devices passively collect and send data to a central location, and analyses or learning procedures are performed at that remote location. However, current generations of smart devices have computing capabilities in addition to sensing. In this paper, we propose to utilize the computing capabilities of smart devices and to integrate the sensing and learning processes together in a crowdsensing system. As we will show, this integration allows us to design a system with better privacy and scalability.

B. Privacy

Privacy is an important issue for crowdsensing applications. By assuring participants' privacy, a crowdsensing system can appeal to a larger population of potential participants, which increases the utility of the system. However, many crowdsensing systems in the literature do not employ any privacy-preserving mechanism (see Section II-B), and existing mechanisms used in crowdsensing (see [1]) are often difficult to compare qualitatively across different systems or data types. In the last decade, differential privacy has gained popularity as a formal and quantifiable measure of privacy risk in data publishing [2]–[4]. Briefly, differential privacy measures how much the outcome of a procedure changes probabilistically with the presence or absence of any single subject in the original data.
The measure provides an upper bound on privacy loss regardless of the content of the data or any prior knowledge an adversary might have. While differential privacy has been applied in data publishing and machine learning (see Section II-B), it has not been broadly adopted in crowdsensing systems. In this paper, we integrate differentially private mechanisms into the crowdsensing system as well, which provides strong protection against various types of possible attacks (see Section III-C).

C. Proposed work

This paper presents Crowd-ML, a privacy-preserving machine learning framework for a crowdsensing system that consists of a server and smart devices (see Fig. 1). Crowd-ML is a distributed learning framework that integrates sensing, learning, and privacy mechanisms together, and it can build classifiers or predictors of interest from crowdsensing data using the computing capability of the devices, with formal privacy guarantees.

[Fig. 1: Crowd-ML consists of a server and a number of smart devices. The system integrates sensing, learning, and privacy mechanisms together, to learn a classifier or predictor from device-generated data in an online and distributed way, with formal privacy guarantees.]

Algorithmically, Crowd-ML learns a classifier or predictor by distributed incremental optimization. The optimal parameters of a classifier or predictor are found by minimizing the risk function associated with a given task [5] (see Section III-A for details). Specifically, the framework finds optimal parameters by incrementally minimizing the risk function using a variant of stochastic (sub)gradient descent (SGD) [6]. Unlike batch learning, SGD requires only gradient information to be communicated between the devices and a server, which has two important consequences: 1) the computation load can be distributed among the devices, enhancing the scalability of the system; and 2) the private data of the devices need not be communicated directly, enhancing privacy. By exploiting these two properties, Crowd-ML efficiently learns a classifier or predictor from a crowd of devices, with a guarantee of $\epsilon$-differential privacy. The differential privacy mechanism is applied locally on each device, using Laplace noise for the gradients and exponential mechanisms for the other information (see Section III-C).

We show the advantages of Crowd-ML by analyzing its scalability and privacy-performance trade-offs (Section IV), and by testing the framework with demonstrative tasks implemented on Android smartphones and in simulated environments under various conditions (see Section V).

In summary, we make the following contributions:
• We present Crowd-ML, a general framework for machine learning with smart devices from crowdsensing data, with many potential applications.
• We show differential privacy guarantees of Crowd-ML that provide a strong privacy mechanism against various types of attacks in crowdsensing. To the best of our knowledge, Crowd-ML is the first general framework that integrates sensing, learning, and differentially private mechanisms for crowdsensing.
• We analyze the framework to show that the cost of privacy preservation can be minimized and that the computational and communication overheads on devices are only moderate, allowing a large-scale deployment of the framework.
• We implement a prototype and evaluate the framework with a demonstrative task in a real environment as well as large-scale experiments in a simulated environment.

The remainder of this paper is organized as follows. We first review related work in Section II. Section III describes the Crowd-ML framework. Section IV analyzes Crowd-ML in terms of the privacy-performance trade-off and the computation and communication loads. Section V presents an implementation of Crowd-ML and experimental evaluations. We discuss remaining issues and conclude in Section VI.

II. RELATED WORK

Crowd-ML integrates distributed learning algorithms and differential privacy mechanisms into a crowdsensing system. In this section, we review related work on crowdsensing and learning systems, and on privacy-preserving mechanisms.
A. Crowdsensing and learning

There is a vast amount of work on crowdsensing, and we focus on the system aspects of previous work with representative papers (we refer the reader to the survey papers [7] and [1]). Crowdsensing systems aim to achieve mass collection and mining of environmental and human-centric data such as social interactions, political issues of interest, exercise patterns, and people's impact on the environment [8]. Examples of such systems include Micro-Blog [9], PoolView [10], BikeNet [11], and PEIR [12]. Data collected by crowdsensing can also be used to mine high-level patterns or to predict variables of interest using machine learning. Applications of learning to crowdsensing include learning bus waiting times [13] and recognizing user activities (see [14] for a review). Jigsaw [15] and Lifestreams [16] also use pattern recognition on data sensed by mobile devices. From the system perspective, these works use devices to passively sense and send data to a central server on which the analyses take place, which we will refer to as the centralized approach. In contrast, sensing and learning can be performed purely inside each device without a server, which we call the decentralized approach. For example, SoundSense [17] learns a classifier on a smartphone to recognize various audio events without communicating with the back end. Mixed centralized and decentralized approaches are also used in [18], [19], where a portion of the computation is performed off-line on a server. CQue [18] provides a query interface for privacy-aware probabilistic learning of users' contexts, and ACE [19] uses static association rules to learn users' contexts. System-wise, our work differs from those centralized or decentralized approaches in that we use a distributed approach in which the devices and the server perform learning together, which improves the privacy and scalability of the system.
We are not aware of any other crowdsensing system that takes a similar approach. Also, the cited papers are oriented towards novel applications, whereas our work focuses on a general framework for learning with a wide range of algorithms and applications.

Crowd-ML also builds on recent advances in incremental distributed learning [20], [21], which show that a near-optimal convergence rate is achievable despite communication delays. A privacy-preserving stochastic gradient descent method is presented briefly in [22]. Unlike the latter, we present a complete framework for privacy-preserving multi-device learning, with performance analysis and demonstrations in real environments.

B. Privacy-preserving mechanisms

Privacy is an important issue in data collection and analysis. In particular, preserving the privacy of users' locations has been studied by many researchers (see [23] for a survey). To formally preserve the privacy of general data types, several mechanisms such as k-anonymity [24] and secure multiparty computation [25] have been proposed, for data publishing [26] and also for participatory sensing [1]. Recently, differential privacy [2]–[4] has addressed several weaknesses of k-anonymity [27] and has gained popularity as a quantifiable measure of privacy risk. Differential privacy has been used for a privacy-preserving data analysis platform [28], for sanitization of model parameters learned from data [29], and for privacy-preserving data mining from distributed time-series data [17]. So far, formal and general privacy mechanisms have not been adopted broadly in crowdsensing. Among the crowdsensing systems cited in the previous section ([9]–[13], [15]–[19], [30], [31]), only [10], [12], [18] provide privacy mechanisms, of which only [10] addresses privacy more formally. To the best of our knowledge, Crowd-ML is the first framework to provide formal privacy guarantees for general crowd-based learning with smart devices.

III. CROWD-ML

In this section, we describe Crowd-ML in detail: its system, algorithms, and privacy mechanisms.

A. System and workflow

The Crowd-ML system consists of a server and multiple smart devices that are capable of sensory data collection, numerical computation, and communication with the server over a public network (see Fig. 1). The goal of Crowd-ML is to learn a classifier or predictor of interest from crowdsensing data collected by multiple devices. A wide range of classifiers and predictors can be learned by minimizing an empirical risk associated with a given task, a common method in statistical learning [5]. Formally, let $x \in \mathbb{R}^D$ be a feature vector obtained by preprocessing sensory input such as audio, video, or accelerometer readings, and let $y$ be a target variable we aim to predict from $x$, such as user activity. For regression, $y$ can be a real number; for classification, $y$ is a discrete label $y \in \{1,\dots,C\}$ with $C$ classes. We define the data as $N$ pairs of (feature vector, target variable) generated i.i.d. from an unknown distribution by all participating devices up to the present:

$$D = \{(x_1, y_1), \dots, (x_N, y_N)\}. \qquad (1)$$

Suppose we use a classifier/predictor $h(x; w)$ with a tunable parameter vector $w$, and a loss function $l(y, h(x; w))$ to measure the performance of the classifier/predictor with respect to the true target $y$. A wide range of learning algorithms can be represented by $h$ and $l$, e.g., linear regression, logistic regression, and the Support Vector Machine (see [32] for more examples). If there are $M$ smart devices, we find the optimal parameters $w$ of the classifier/predictor by minimizing the empirical risk over all $M$ devices:

$$R(w) = \sum_{m=1}^{M} \frac{1}{|D_m|} \sum_{(x,y) \in D_m} l(h(x;w), y) + \frac{\lambda}{2}\|w\|^2, \qquad (2)$$

where $D_m$ is the set of samples generated by device $m$ only, and $\frac{\lambda}{2}\|w\|^2$ is a regularization term. The risk function (2) can be minimized by many optimization methods. In this work we use stochastic (sub)gradient descent (SGD) [33], which is one of the simplest optimization methods and is also suitable for large-scale learning [32], [34]. SGD minimizes the risk by updating $w$ sequentially:

$$w(t+1) \leftarrow \Pi_W\!\left[w(t) - \eta(t)\, g(t)\right], \qquad (3)$$

where $\eta(t)$ is the learning rate and $g(t)$ is the gradient of the loss function,

$$g = \nabla_w\, l(h(x;w), y), \qquad (4)$$

evaluated at the sample $(x,y)$ and the current parameter $w(t)$. We assume the parameter domain $W$ is a $d$-dimensional ball of some large radius $R$, with the projection $\Pi_W(w) = \min(1, R/\|w\|)\, w$. By default, we use the learning rate

$$\eta(t) = \frac{c}{\sqrt{t}}, \qquad (5)$$

where $c$ is a constant hyperparameter.
When computing gradients, we use a "minibatch" of $b$ samples to compute the averaged gradient

$$\tilde{g} = \frac{1}{b} \sum_i \nabla_w\, l(h(x_i; w), y_i), \qquad (6)$$

which plays an important role in the performance-privacy trade-off and the scalability (Section IV). In Crowd-ML, risk minimization by SGD is performed by distributing the main workload (the computation of the averaged gradients) to the $M$ devices. Note that each device generates data and computes gradients using its own data only. The workflow is described in Fig. 2.
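To make the update concrete, here is a minimal sketch of the projected minibatch SGD step of Eqs. (3)–(6) in Python/NumPy. The function names and the squared-loss example are our own illustration under stated assumptions, not the Crowd-ML code itself.

```python
import numpy as np

def project(w, radius):
    """Projection onto the ball W: min(1, R/||w||) * w, as in Eq. (3)."""
    norm = np.linalg.norm(w)
    return w if norm <= radius else (radius / norm) * w

def sgd_step(w, X_batch, y_batch, grad_loss, t, c=0.01, radius=1e3):
    """One projected SGD step using the averaged minibatch gradient of Eq. (6)."""
    g = np.mean([grad_loss(w, x, y) for x, y in zip(X_batch, y_batch)], axis=0)
    eta = c / np.sqrt(t)                     # learning rate of Eq. (5)
    return project(w - eta * g, radius)

# Example with squared loss l = 0.5*(w.x - y)^2, whose gradient is (w.x - y)*x.
grad_sq = lambda w, x, y: (w @ x - y) * x
rng = np.random.default_rng(0)
w = np.zeros(5)
for t in range(1, 101):
    X = rng.normal(size=(10, 5))
    y = X @ np.ones(5)
    w = sgd_step(w, X, y, grad_sq, t)
```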
B. Algorithms

The Crowd-ML algorithms are presented in Algorithms 1 and 2. Device Routine 1 collects samples. When the number of samples reaches the minibatch size $b$, the routine checks out the current model parameters $w$ from the server and calls Device Routine 2. Device Routine 2 computes the averaged gradient from the stored samples and the $w$ received from the server, sanitizes the information with Device Routine 3, and sends the sanitized information to the server. Device Routine 3 uses the Laplace noise and exponential mechanisms (described in the next section) to sanitize the averaged gradient $\hat{g}$, the number of misclassified samples $\hat{n}_e$, and the label counts $\hat{n}_y^k$. Device Routines 1–3 are performed independently and asynchronously by multiple devices.

[Fig. 2: Crowd-ML workflow. 1. A device preprocesses sensory data and generates samples. 2. When the number of samples $\{(x,y)\}$ exceeds a certain number, the device requests the current model parameters $w$ from the server. 3. The server authenticates the device and sends $w$. 4. Using $w$ and $\{(x,y)\}$, the device computes the gradient $g$ and sends it to the server using the privacy mechanisms. 5. The server receives the gradient $g$ and updates $w$. While one device performs routines 1–5, other devices may perform the same routines asynchronously. Devices can join or leave the task at any time.]

Algorithm 1 (Device side)
Input: privacy levels $\epsilon_g, \epsilon_e, \epsilon_{y^k}$; minibatch size $b$; max buffer size $B$; classifier model ($C$, $h$, $l$, $\lambda$ from Eq. (2))
Init: set $n_s = 0$, $n_e = 0$, $n_y^k = 0$ for $k = 1,\dots,C$
Communication to server: $\hat{g}$, $n_s$, $\hat{n}_e$, $\hat{n}_y^k$
Communication from server: $w$

Device Routine 1:
  if $n_s \geq B$ then
    stop collection to prevent resource outage
  else
    receive a sample $(x,y)$ (at a regular interval or triggered by events) and add it to the secure local buffer; set $n_s = n_s + 1$
  end if
  if $n_s \geq b$ then
    check out $w$ from the server via HTTPS and call Device Routine 2
  end if

Device Routine 2:
  Using $w$ from the server and $\{(x_i, y_i)\}$ from the local buffer:
  for $i = 1,\dots,n_s$ do
    make a prediction $y_i^{pred} = h(x_i; w)$
    $n_y^{(y_i)} = n_y^{(y_i)} + 1$
    $n_e = n_e + I[y_i^{pred} \neq y_i]$
    incur a loss $l(y_i^{pred}, y_i)$
    compute a subgradient $g_i = \nabla_w\, l(h(x_i; w), y_i)$
  end for
  Compute the average $\tilde{g} = \frac{1}{n_s}\sum_i g_i + \lambda w$
  Sanitize the data with Device Routine 3
  Check in $\hat{g}$, $n_s$, $\hat{n}_e$, $\hat{n}_y^k$ ($k = 1,\dots,C$) with the server via HTTPS
  Reset $n_s = 0$, $n_e = 0$, $n_y^k = 0$, $k = 1,\dots,C$

Device Routine 3:
  Sample $\hat{g} = \tilde{g} + z$ from Eq. (10)
  Sample $\hat{n}_e = n_e + z$ from Eq. (11)
  Sample $\hat{n}_y^k = n_y^k + z$, $k = 1,\dots,C$, from Eq. (12)

Server Routine 1 sends out the current parameters $w$ when requested, and Server Routine 2 receives check-ins ($\hat{g}$, $n_s$, $\hat{n}_e$, $\hat{n}_y^k$) from the devices. The whole procedure ends when the total number of iterations exceeds a maximum value $T_{max}$, or when the overall error falls below a threshold $\rho$.

Remark 1: In Device Routine 1, if the check-out fails, the device keeps collecting samples and retries the check-out later. A prolonged network outage can leave a device's parameters outdated, but this does not critically affect the overall learning. Similarly, a failure to check in information with the server in Device Routine 2 is non-critical.

Remark 2: In Device Routine 2, we can randomly set aside a small portion of samples as test data. In this case, the misclassification error is computed only from these held-out samples, and their gradients are not used in the average $\hat{g}$.

Remark 3: In Server Routine 2, more recent update methods [35], [36] can be used in place of the simple update rule (3) without affecting differential privacy or changing the device routines. Similarly, adaptive learning rates [37], [38] can be used in place of (5), which can provide robustness to large gradients from outlying or malignant devices.
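The device side of this workflow can be sketched as follows in Python. This is a schematic illustration of Device Routines 1–2 only: the check-out/check-in HTTP endpoints, the placeholder URL, and the linear-model prediction are our own assumptions, since the prototype's actual API is not specified in the text.

```python
import numpy as np
import requests  # the endpoint paths below are hypothetical

SERVER = "https://example.org/crowd-ml"   # placeholder URL

def device_loop(sense, grad_loss, sanitize, b=20, B=1000):
    """Schematic Device Routines 1-2: buffer (x, y) samples until the
    minibatch size b is reached, check out w, check in sanitized stats."""
    buf = []
    while len(buf) < B:                    # B caps the buffer (Routine 1)
        buf.append(sense())                # sense() returns one (x, y) pair
        if len(buf) < b:
            continue
        w = np.asarray(requests.get(f"{SERVER}/checkout").json()["w"])
        preds = [np.argmax(w @ x) for x, _ in buf]   # assumes a linear model
        n_e = sum(p != y for p, (_, y) in zip(preds, buf))
        g = np.mean([grad_loss(w, x, y) for x, y in buf], axis=0)
        g_hat, n_e_hat, n_y_hat = sanitize(g, n_e, buf)   # Device Routine 3
        requests.post(f"{SERVER}/checkin",
                      json={"g": g_hat.tolist(), "n_s": len(buf),
                            "n_e": int(n_e_hat),
                            "n_y": [int(v) for v in n_y_hat]})
        buf.clear()
```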
C. Privacy mechanism

In crowdsensing systems, users' private data can be leaked in many ways. System administrators or analysts can violate privacy intentionally, or they may leak private information unintentionally when publishing data analytics. There are also more hostile types of attacks: malignant devices posing as legitimate devices, or hackers poaching data stored on the server or eavesdropping on communication between the devices and the server. Instead of preserving privacy separately for each attack type, we can preserve privacy against all of these attacks with a local privacy-preserving mechanism that is implemented on each device and sanitizes any information before it leaves the device. A local mechanism assumes that an adversary can potentially access all communication between the devices and the server, which subsumes the other attacks: the forms of data that are 1) visible to a malignant device, 2) stored on the server, or 3) released in public are all derived from what is communicated between the devices and the server. We adopt local $\epsilon$-differential privacy as the quantifiable measure of privacy in Crowd-ML. Formally, a (randomized) algorithm that takes data $D$ as input and outputs $f$ is called $\epsilon$-differentially private if

$$\frac{P(f(D) \in S)}{P(f(D') \in S)} \leq e^{\epsilon} \qquad (7)$$

for all measurable $S \subset T$ of the output range, and for all data sets $D$ and $D'$ differing in a single item. That is, even if an adversary has the whole data set $D$ except for a single item, it cannot infer much more about that item from the output of the algorithm $f$. A smaller $\epsilon$ makes such inference more difficult and therefore makes the algorithm more privacy-preserving. When the algorithm outputs a real-valued vector $f \in \mathbb{R}^D$, its global sensitivity can be defined as

$$S(f) = \max_{D,D'} \|f(D) - f(D')\|_1, \qquad (8)$$

where $\|\cdot\|_1$ is the $L_1$ norm. A basic result from the definition of differential privacy is that a vector-valued function $f$ with sensitivity $S(f)$ can be made $\epsilon$-differentially private [3] by adding an independent Laplace noise vector $z$:¹

$$P(z) \propto e^{-\frac{\epsilon}{S(f)}\|z\|_1}. \qquad (9)$$

¹ As a variant, $(\epsilon,\delta)$-differential privacy can be achieved by adding Gaussian noise.

In Crowd-ML, we consider the $\epsilon$-differential privacy of any single (feature, label) sample, as revealed by the communications from all devices to the server: the gradients $\tilde{g}$, the numbers of samples $n_s$, the numbers of misclassified samples $n_e$, and the label counts $n_y^k$.² The amount of noise required depends on the choice of loss function. We compute this value for multiclass logistic regression (Table I), but it can be computed similarly for other loss functions as well.

² The communication from the server to the devices, $\{w(t)\}$, can be reconstructed via (3) from $\{g(t)\}$ and is therefore redundant to consider.

TABLE I: Multiclass logistic regression
Risk: $R(w) = \frac{1}{N}\sum_i \left[-w_{y_i}' x_i + \log \sum_l e^{w_l' x_i}\right] + \frac{\lambda}{2}\sum_k \|w_k\|^2$
Prediction: $\arg\max_k\, w_k' x$
Gradient: $\nabla_{w_k} R = \frac{1}{N}\sum_i x_i\left[-I[y_i = k] + P(y = k \mid x_i)\right] + \lambda w_k$

By adding element-wise independent Laplace noise $z$ to the averaged gradient $\tilde{g}$,

$$\hat{g} = \frac{1}{b}\sum_i g_i + z, \qquad P(z) \propto e^{-\frac{\epsilon_g b}{4}|z|}, \qquad (10)$$

we have the following privacy guarantee:

Theorem 1 (Averaged gradient perturbation). The transmission of $\tilde{g}$ by Eq. (10) is $\epsilon_g$-differentially private.

See Appendix A for the proof.

To sanitize $n_e$ and $n_y^k$, we add "discrete" Laplace noise [39] as follows:

$$\hat{n}_e = n_e + z, \qquad P(z) \propto e^{-\frac{\epsilon_e}{2}|z|}, \qquad (11)$$
$$\hat{n}_y^k = n_y^k + z, \qquad P(z) \propto e^{-\frac{\epsilon_{y^k}}{2}|z|}, \qquad (12)$$

where $z = 0, \pm 1, \pm 2, \dots$. These mechanisms have the following privacy guarantee:

Theorem 2 (Error and label counts). The transmission of $n_e$ and $n_y^k$ by Eqs. (11) and (12) is $\epsilon_e$- and $\epsilon_{y^k}$-differentially private, respectively.

See Appendix B for the proof.

Practically, a system administrator chooses $\epsilon$ depending on the desired level of privacy for the data collected. A small $\epsilon$ ($\to 0$) may be used for data that users deem highly private, such as current location, and a large $\epsilon$ ($\to \infty$) may be used for less private data, such as ambient temperature.

Algorithm 2 (Server side)
Input: number of devices $M$; learning rate schedule $\eta(t)$, $t = 1, 2, \dots, T_{max}$; desired error $\rho$; classifier model ($C$, $h$, $l$, $\lambda$ from Eq. (2))
Init: $t = 0$; randomized $w$; $N_s^m = 0$, $N_e^m = 0$, $N_y^{k,m} = 0$ for $m = 1,\dots,M$, $k = 1,\dots,C$
Stopping criteria: $t \geq T_{max}$ or $\sum_m N_e^m / \sum_m N_s^m \leq \rho$

Server Routine 1:
  while the stopping criteria are not met do
    listen for and accept check-out requests
    authenticate the device
    send the current parameters $w$ to the device
  end while

Server Routine 2:
  while the stopping criteria are not met do
    listen for and accept check-in requests
    authenticate the device (suppose it is device $m$)
    receive $\hat{g}$, $n_s$, $\hat{n}_e$, $\hat{n}_y^k$, $k = 1,\dots,C$
    $N_s^m = N_s^m + n_s$;  $N_e^m = N_e^m + \hat{n}_e$;  $N_y^{k,m} = N_y^{k,m} + \hat{n}_y^k$
    $w = w - \eta(t)\,\hat{g}$;  $t = t + 1$
  end while
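The following is a minimal sketch of Device Routine 3 for multiclass logistic regression, combining the gradient of Table I with the Laplace mechanism of Eq. (10) and the discrete Laplace noise of Eqs. (11)–(12). The naming is our own, and the discrete noise is sampled as the difference of two geometric variables, a standard construction we assume here rather than one the paper specifies.

```python
import numpy as np

rng = np.random.default_rng()

def softmax(u):
    e = np.exp(u - u.max())
    return e / e.sum()

def logreg_grad(W, x, y, lam):
    """Per-sample gradient of the multiclass logistic loss (Table I).
    W is C x D, x is a length-D vector, y is in {0,...,C-1}."""
    p = softmax(W @ x)
    p[y] -= 1.0                         # -I[y=k] + P(y=k|x)
    return np.outer(p, x) + lam * W

def discrete_laplace(eps, size=None):
    """P(z) proportional to exp(-(eps/2)|z|): difference of two i.i.d.
    geometric variables with success probability 1 - exp(-eps/2)."""
    theta = 1.0 - np.exp(-eps / 2.0)
    return rng.geometric(theta, size) - rng.geometric(theta, size)

def sanitize(g_avg, n_e, n_y, b, eps_g, eps_e, eps_y):
    """Device Routine 3: Laplace noise on the averaged gradient (Eq. 10),
    discrete Laplace noise on the counts (Eqs. 11-12). n_y is an int array."""
    g_hat = g_avg + rng.laplace(0.0, 4.0 / (eps_g * b), g_avg.shape)
    n_e_hat = n_e + int(discrete_laplace(eps_e))
    n_y_hat = n_y + discrete_laplace(eps_y, n_y.shape)
    return g_hat, n_e_hat, n_y_hat
```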
IV. ANALYSIS

In this section, we analyze the privacy-performance trade-off and the scalability of Crowd-ML. As discussed in the Related Work, most existing crowdsensing systems use purely centralized or purely decentralized approaches, while Crowd-ML uses a distributed approach. By design, Crowd-ML achieves differential privacy with little loss of performance ($O(1/b)$), only moderate computation load due to its simple optimization method, and reduced communication load and delay ($O(1/b)$), where $b$ is the minibatch size.

A. Privacy vs. Performance

Privacy costs performance: the more private we make the system, the less accurate the outcome of the analysis or learning. From Theorem 1, Crowd-ML is $\epsilon$-differentially private by perturbing the averaged gradients. The centralized approach can also be made $\epsilon$-differentially private, by feature and label perturbation (Appendix C). Below we compare the impact of privacy on performance between the centralized approach and Crowd-ML. The performance of SGD-based learning can be represented by its rate of convergence to the optimal value/parameters, $E[l(w(t)) - l(w^*)]$, at iteration $t$, which in turn depends on the properties of the loss $l(\cdot)$ (such as Lipschitz continuity and strong convexity) and the step size $\eta(t)$, with the best known rate being $O(1/t)$ [40]. When the other conditions are the same, the convergence rate $E[l(w(t)) - l(w^*)] \propto G^2$ is roughly proportional to the amount of noise in the estimated gradient, $G^2 = \sup_t E[\|\hat{g}(t)\|^2]$ [41]. For Crowd-ML, we have from (10)

$$E[\|\hat{g}\|^2] = E[\|\tilde{g}\|^2] + E[\|z\|^2] = \frac{1}{b}\, E[\|g\|^2] + \frac{32 D}{(b\,\epsilon_g)^2}, \qquad (13)$$

where the first term is the amount of noise due to sampling, and the second is due to the Laplace noise mechanism with $D$-dimensional features. By choosing a large enough batch size $b$, the impact of both the sampling noise and the Laplace noise can be made arbitrarily small.³ In contrast, the centralized approach has to add Laplace noise of constant variance $8/\epsilon^2$ to each feature and perturb the labels with a constant probability (Appendix C). Regardless of which optimization method is used (SGD or not), the centralized approach has no means of mitigating the negative impact of this constant noise on the accuracy of the learned model, which is especially problematic for small $\epsilon$.

³ Although a larger batch size means fewer updates given the same number of samples $N$, and too large a batch size can negatively affect the convergence rate (see [42] for a discussion).
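As a quick numeric illustration of Eq. (13) (the values below are our own example choices, not from the paper), the Laplace term $32D/(b\epsilon_g)^2$ shrinks quadratically with the minibatch size:

```python
# Laplace-noise term of Eq. (13): 32*D / (b*eps_g)^2.
D, eps_g = 50, 1.0
for b in (1, 10, 20, 100):
    laplace_var = 32 * D / (b * eps_g) ** 2
    print(f"b={b:4d}  Laplace noise term = {laplace_var:.2f}")
# b=1 -> 1600.00, b=10 -> 16.00, b=20 -> 4.00, b=100 -> 0.16
```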
In the decentralized approach, a device need not interact with a server and is almost free of privacy concerns. However, the increased privacy comes at the cost of performance. In Crowd-ML and the centralized approach, samples pooled from all devices are used in the learning process, whereas in the decentralized approach each device can use only a fraction (about $1/M$) of the samples. This undermines the accuracy of a model learned by the decentralized approach. For example, it is known from VC theory for binary classification problems that the upper bound on the estimation error with a $1/M$-times smaller sample size is $\sqrt{M/\log M}$-times larger [43].

B. Scalability

Scalability is determined by the computation and communication loads and latencies on both the device and server sides. We compare these factors between the centralized, crowd, and decentralized learning approaches.

1) Computation load: For all three approaches, we assume the same preprocessing is performed on each device to compute features from raw sensory input or metadata. On the device side, the centralized learning approach requires the generation of Laplace noise per sample. The crowd and decentralized approaches perform partial and full learning on the device, respectively, and require more processing. Specifically, Crowd-ML requires the computation of one gradient per sample, one vector summation (for averaging) per sample, and the generation of Laplace random noise per minibatch. A low-end smart device capable of floating-point computation can perform these operations. The decentralized learning approach can use any learning algorithm, including SGD similar to Crowd-ML. However, if the decentralized approach is to make up for its smaller sample size ($1/M$) compared to Crowd-ML, it may require more complex optimization methods, which results in a higher computation load. For all three approaches, the number of devices $M$ does not affect the per-device computation load. The computational load on the server also differs across the approaches. The centralized approach puts the highest load on the server, as all computations take place there. In contrast, Crowd-ML puts a minimal load on the server, namely the SGD update (3), since the main computation is performed in a distributed fashion by the devices.

2) Communication load: To process the incoming streams of data from the devices in time, the network and the server must have enough throughput. The centralized learning approach requires all $N$ samples to be sent over the network to the server. For Crowd-ML with a minibatch size of $b$, the devices altogether send $N/b$ gradients and receive the same number of current parameters, both of the same dimension as a feature vector. Therefore, the data transmission is reduced by a factor of $b/2$ compared to the centralized approach.

3) Communication latency: When using a public (and mobile) network, latency is non-negligible. In the centralized approach, latency may not be an issue, since the server is not required to send any real-time feedback to the devices. In Crowd-ML, latency is an issue that can affect performance. Three possible delays add up to the overall latency of communication:

• Request delay ($\tau_{req}$): the time from a device's check-out request until the receipt of the request at the server.
• Check-out delay ($\tau_{co}$): the time from the receipt of a request at the server until the receipt of the parameters at the device.
• Check-in delay ($\tau_{ci}$): the time from the receipt of the parameters at the device until the receipt of the check-in at the server.

Due to these delays, if a device checks out the parameters $w$ at time $t_0$ and the server receives its check-in $\hat{g}$ at time $t_0 + \tau_{co} + \tau_{ci}$, the server may already have updated the parameters $w$ multiple times using gradients from other devices received during this period. The number of such updates is roughly $(\tau_{co} + \tau_{ci}) \times M F_s / b$, where $M$ is the number of devices, $F_s$ is the data sampling rate per device, and $1/b$ is the reduction factor due to the minibatch. Again, choosing a large batch size $b$ relative to $M F_s$ can reduce the effect of latency. While an exact analysis of the impact of latency is difficult, several related results that do not consider privacy are known in the literature. Nedić et al. proved that delayed asynchronous incremental updates converge with probability 1 to an optimal value, assuming a finite maximum latency. Recent work on distributed incremental updates [20], [21] also shows that a near-optimal convergence rate is achievable despite delays. In particular, Dekel et al. [21] show that delayed incremental updates are scalable with $M$ by adapting the minibatch size.
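To make the staleness formula concrete, here is a small arithmetic example with illustrative numbers of our own choosing:

```python
# Stale server updates during one round trip: (tau_co + tau_ci) * M * F_s / b.
M, F_s = 1000, 1.0 / 30.0        # 1000 devices, one sample every 30 s each
tau_co, tau_ci = 1.0, 1.0        # one-second delays each way
for b in (1, 20):
    stale = (tau_co + tau_ci) * M * F_s / b
    print(f"b={b:3d}  ~{stale:.1f} server updates during one round trip")
# b=1 -> ~66.7 stale updates; b=20 -> ~3.3
```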
V. EVALUATION

In this section, we describe a prototype of Crowd-ML implemented on off-the-shelf Android phones, together with activity recognition experiments on those smartphones. We also perform digit and object recognition experiments under varying conditions in simulated environments, demonstrating the advantages of Crowd-ML analyzed in Section IV.

A. Implementation

We implement a Crowd-ML prototype with three components: a Web portal, commercial off-the-shelf smart devices, and a central server. On the device side, we implement Algorithm 1 on commercial off-the-shelf smartphones as an app using Android OS 4.3+. Our prototype uses smartphones, but it can easily be ported to other smart device platforms. On the server side, we implement Algorithm 2 on a Lenovo ThinkCentre M82 machine with a quad-core 3.2 GHz Intel Core i5-3470 CPU and 4 GB of RAM running Ubuntu Linux 14.04. The server runs the Apache Web server (version 2.4) and a MySQL database (version 5.5). Also on the server side, our Crowd-ML prototype provides a Web portal over HTTPS where users can browse ongoing crowd-learning tasks and join them by downloading the app to their smart devices. To enhance transparency, the details of each task (objective, sensory data collected, labels collected, and learning algorithms used) and our privacy mechanisms are explained. The portal also displays timely statistics about crowd-learning applications, such as error rates and activity label distributions, which are themselves differentially private. We implement the portal in Python using the Django⁴ Web application framework and Matplotlib⁵ for statistical visualization.

⁴ http://www.djangoproject.com
⁵ http://matplotlib.org

B. Activity Recognition in Real Environments

In this experiment, we perform activity recognition on smart devices. The purpose of this demonstration is to show Crowd-ML working in a real environment, so we choose the simple task of recognizing three types of user activities ("Still", "On Foot", and "In Vehicle"). We install a prototype Crowd-ML application on 7 smartphones (Galaxy Nexus, Nexus S, and Galaxy S3) running Android 4.3 or 4.4. The seven smartphones are carried by college students and faculty over a period of a few days.
The devices' triaxial accelerometers are sampled at 20 Hz. In this demonstration, we avoid manual annotation of activity labels to facilitate data acquisition, and instead use Google's activity recognition service to obtain ground-truth labels. Acceleration magnitudes $|a| = \sqrt{a_x^2 + a_y^2 + a_z^2}$ are computed continuously over 3.2 s sliding windows. Feature extraction is performed by computing the 64-bin FFT of the acceleration magnitudes. We set the sampling rate $F_s = 1/30$ Hz; that is, a feature vector $x$ and its label $y$ are generated every 30 s. However, to avoid collecting highly correlated samples and to increase the diversity of features, we collect a sample only when its label has changed from its previous value. For example, samples acquired during sleeping are discarded automatically, as they all have "Still" labels. This lowers the actual sampling rate to about $F_s = 1/352$ Hz (roughly one sample every six minutes). At this low rate, no battery problems were observed.

We use 3-class logistic regression (Table I) with $\lambda = 0$, $b = 1$, $\epsilon^{-1} = 0$, and a range of $\eta$ values. Repeated experiments with different parameters are time-consuming, and we leave the full investigation to the second experiment in a simulated environment. Fig. 3 shows the collective error curves for the first 300 samples from the 7 devices.

[Fig. 3: Time-averaged error across all devices for the activity recognition task, for learning rate constants $c = 10^{-6}, 10^{-4}, 10^{-2}, 10^{0}$.]

The error is the time-averaged misclassification error as the learning progresses: $\mathrm{Err}(t) = \frac{1}{t}\sum_{i=1}^{t} I[y_i \neq y_i^{pred}(w_i)]$. The error curves for the different learning rates (5) are very similar, and they virtually converge after only 50 samples (7 samples per device). This experiment is a proof of concept that Crowd-ML can learn a common classifier quickly, from only a small number of samples per user.
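A minimal sketch of the described feature extraction (3.2 s windows of acceleration magnitude at 20 Hz, hence 64 samples per window, and a 64-bin FFT). The non-overlapping windowing is our own assumption; the text does not specify the window stride for feature vectors.

```python
import numpy as np

def activity_features(ax, ay, az, rate_hz=20, window_s=3.2):
    """64-bin FFT features over windows of acceleration magnitude.
    ax, ay, az: equal-length 1-D arrays sampled at rate_hz."""
    mag = np.sqrt(ax**2 + ay**2 + az**2)         # |a| = sqrt(ax^2+ay^2+az^2)
    n = int(rate_hz * window_s)                  # 64 samples per 3.2 s window
    feats = []
    for start in range(0, len(mag) - n + 1, n):  # non-overlapping (assumed)
        window = mag[start:start + n]
        feats.append(np.abs(np.fft.fft(window, n=64)))  # 64-bin FFT magnitude
    return np.array(feats)
```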
C. Digit/Object Recognition in Simulated Environments

To evaluate Crowd-ML under various conditions, we perform a series of experiments on handwritten digit recognition and visual object recognition. Since the two sets of results are quite similar, we describe only the digit recognition results (the object recognition results are in Appendix D). The MNIST dataset⁶ consists of 60000 training and 10000 test images of handwritten digits (0 to 9), and is a standard benchmark dataset for learning algorithms. The task is to classify a test image as one of the 10 digit classes. The images from the MNIST data are preprocessed with PCA to a reduced dimension of 50 and are $L_1$-normalized. In this experiment, we compare the performance of the centralized, Crowd-ML, and decentralized learning approaches using the same data and classifier (multiclass logistic regression), under different conditions such as the privacy level $\epsilon$, the minibatch size $b$, and delays. To test the algorithms with full control of the parameters, we run the algorithms in a simulated environment instead of on a real network. We can therefore choose the number of devices and the maximum delays arbitrarily. For simplicity, we set $\tau_{req} = \tau_{co} = \tau_{ci} = \tau$ (Section IV-B3). Here $\tau$ is the maximum delay, and the actual delays are sampled randomly and uniformly from $[0, \tau]$ for each communication instance.⁷

⁶ http://yann.lecun.com/exdb/mnist/
⁷ We can test with any distribution other than the uniform distribution as well.

All results in this section are averaged test errors over 10 trials. For each trial, the assignment of samples, the order of devices, the perturbation noise, and the amounts of delay are randomized. Test errors are computed as functions of the iteration (the number of samples used), up to five passes through the data. The hyperparameters $\lambda$ (Table I) and $c$ (5) are selected based on the averaged test error over 10 trials. We set the number of devices $M = 1000$; consequently, each device has 60 training and 10 test samples on average.

Fig. 4 compares the performance of the centralized, crowd, and decentralized learning approaches without privacy or delay ($\epsilon^{-1} = 0$, $b = 1$, $\tau = 0$).

[Fig. 4: Comparison of test error for the centralized, crowd, and decentralized learning approaches, without delay or privacy considerations. The curves show how the error decreases as the number of iterations (= number of samples used) increases over time. The batch algorithm is not incremental and therefore appears as a constant.]

The error of centralized batch training is the smallest (0.1), in a tie with Crowd-ML. The error curve of Crowd-ML converges to the same low value as the centralized approach. This shows that the incremental update by SGD in Crowd-ML is as accurate as batch learning when privacy and delay are not considered. In contrast, the error curve of the decentralized approach converges at a slower rate and to a high error (about 0.5), despite using the same overall number of samples as the other algorithms, due to the lack of data sharing.

We perform tests with varying levels of privacy $\epsilon$. The privacy level impacts the centralized approach via (15) and (16)⁸ and Crowd-ML via (10). At low privacy ($\epsilon^{-1} \to 0$), the performance of both the centralized and crowd approaches is almost the same as in Fig. 4, and we omit the result. At high privacy ($\epsilon \to 0$), the performance of both approaches degrades to an unusable level. Here we show their performance at $\epsilon^{-1} = 0.1$ in Fig. 5, where the performance is in a transition state between the high- and low-privacy regimes.

⁸ The features and labels for the test data are not perturbed.

[Fig. 5: Comparison of test error for the centralized and crowd learning approaches with privacy ($\epsilon^{-1} = 0.1$), varying minibatch sizes ($b$), and no delay.]

First, the centralized and crowd approaches both perform worse than they did in Fig. 4, which is the price of privacy preservation. Among these results, Crowd-ML with a minibatch size $b = 20$ has the smallest asymptotic error, much below the centralized (batch) approach. Crowd-ML with $b = 1$ and $10$ still achieves similar or better asymptotic error compared to Central (batch). As predicted in Section IV, increasing the minibatch size improves the performance of Crowd-ML. When SGD is used for the centralized approach (Central SGD) with perturbed features and labels, its performance is very poor (about 0.9) regardless of the minibatch size, due to the larger noise required to provide the same level $\epsilon$ of privacy as Crowd-ML.

Lastly, we look at the impact of delays on Crowd-ML with privacy $\epsilon^{-1} = 0.1$. We test with different delays in units of $\Delta$, where a delay of $k\Delta$ means that $k$ samples are generated by all devices during the delay of size $\tau$ (i.e., $\Delta = 1/(M F_s)$). In Fig. 6, we show the results with two minibatch sizes ($b = 1, 20$) and varying delays ($1\Delta, 10\Delta, 100\Delta, 1000\Delta$). A delay of $1000\Delta$ means that a maximum of $3 \times 1000$ samples are generated among the devices between the time a single device requests a check-out from the server and the time the server receives the check-in from that device, which is quite large.

[Fig. 6: Impact of delays on Crowd-ML with privacy ($\epsilon^{-1} = 0.1$), varying minibatch sizes, and varying delays.]

Fig. 6 shows that the increase in delay somewhat slows down convergence with a minibatch size of 1, and the converged error is similar to or worse than Central (batch). However, it also shows that with a minibatch size of 20, delay has little effect on convergence, and the error is much lower than Central (batch). Note that with a minibatch size of 20, there is a small plateau at the beginning of the error curves, reflecting the fact that the devices initially wait for their minibatches to fill before computation begins. After this initial waiting time, the error starts to decrease at a fast rate.
Therefore, (cid:15) and e indistributedandincrementallearning,andimplementsstrong (cid:15) can be set to be very small without affecting the learning yk differentially private mechanisms. We analyzed Crowd-ML performance, so that (cid:15)=(cid:15) +(cid:15) +C(cid:15) ≈(cid:15) . g e yk g and showed that Crowd-ML can outperform centralized ap- Remark 2: nˆ and nˆk can be negative with a small proba- e y proaches while providing better privacy and scalability, and bility, but have a limited effect on the estimates of the error can also take advantages of larger shared data which decen- rateandthepriorattheserver.AfterreceivingT minibatches, tralized approaches cannot. We implemented a prototype of the error rate and the prior estimates are Crowd-MLandevaluatedtheframeworkwithasimpleactivity recognition task in a real environment as well as larger-scale (cid:80)T nˆ (i) (cid:80)T nˆk(i) Errest = i e and Pest(y =k)= i y . (14) experiments in simulated environments which demonstrate (cid:80)T n (i) (cid:80)T n (i) the advantages of the design of Crowd-ML. Crowd-ML is a i s i s generalframeworkforarangeofdifferentlearningalgorithms Sincenˆ (i)−n (i)isindependentfori=1,2,...andhaszero- e e withcrowdsensingdata,andisopentofurtherrefinementsfor mean and constant variance 2e−(cid:15)e/2 [39], the estimate of specific applications. (1−e−(cid:15)e/2)2 error rate converge almost surely to the true error rate with APPENDIX vanishing variances as T increases. The same can be said of the estimate of prior P(y). A. Proof of Theorem 1 In our algorithms, a device receives w from the server and C. Differential Privacy in Centralized Approach sends averaged gradients gˆ along with other information. We assume(cid:107)x(cid:107) ≤1whichcanbeeasilyachievedbynormalizing For completeness of the paper, we also describe the (cid:15)- 1 the data. The sensitivity of an averaged gradient for logistic differential privacy mechanisms for the centralized approach. regression is 4/b as shown below. There are C parameter Inthecentralizedapproach,dataaredirectlysenttotheserver. vectors w ,...,w for multiclass logistic regression. Let the Without a privacy mechanism, an adversary can potentially 1 C matrix of gradient vectors corresponding to C parameter observe all data. To prevent this, (cid:15)-differential privacy can be vectors be enforced by perturbing the features g = [g1 g2 ··· gC]=x[P1 ··· Py−1 ··· PC]+λ[w1 ··· wC] f(x)=x+z, , P(z)∝e−(cid:15)2x|z|, (15) = xM +λ[w ··· w ], 1 C and also perturbing the labels. To perturb labels, we use whereP =P(y =j|x;w)istheposteriorprobability,andM j exponential mechanism to sample a noisy label yˆgiven a true is a row vector of P ’s. Without loss of generality, consider j label y from two minibatches D and D(cid:48) that differ in only the first sample x1. The difference of averaged gradients g˜(D) and g˜(cid:48)(D(cid:48)) is P(yˆ|y)∝e(cid:15)2yd(y,yˆ), y,yˆ∈{1,...,C} (16) 1 4 (cid:107)g˜−g˜(cid:48)(cid:107) ≤ ((cid:107)x M (cid:107) +(cid:107)x(cid:48)M(cid:48)(cid:107) )≤ , 1 b 1 1 1 1 1 1 b where we use the score function d(y,yˆ)=I[y =yˆ]. Tosee(cid:107)M1(cid:107)1 ≤2,notethattheabsolutesumoftheentriesof Theorem3(Featureandlabelperturbation). Thetransmission M1 is2(1−Py1)≤2.Thesensitivityofmultipleminibatches of x and y by feature perturbation (15) and exponential gˆ(1),...,gˆ(T) is the same as the sensitivity of a single gˆ(t), mechanism (16) is (cid:15) - and (cid:15) -differentially private. x y and the (cid:15)-differential privacy follows from Proposition 1 of [3]. 
B. Proof of Theorem 2

In addition to the averaged gradients, a device sends to the server the number of samples $n_s$, the number of misclassified samples $n_e$, and the label counts $n_y^k$. Perturbation by adding discrete Laplace noise is equivalent to random sampling by the exponential mechanism [44] with $P(\hat{n}_e \mid n_e) \propto e^{-\frac{\epsilon_e}{2}|\hat{n}_e - n_e|}$, $\hat{n}_e \in \mathbb{Z}$. If two datasets $D$ and $D'$ differ in only one item, then the score function $d = -|\hat{n}_e - n_e|$ changes at most by 1; that is, $\max_{D,D'} |d(\hat{n}_e, n_e(D)) - d(\hat{n}_e, n_e(D'))| = 1$. As with multiple gradients, the sensitivity of multiple sets of $(\hat{n}_e, \hat{n}_y^k)$ is the same as the sensitivity of a single set, and the $\epsilon_e$-differential privacy follows from Theorem 6 of [44]. The proof of the $\epsilon_{y^k}$-differential privacy of $n_y^k$ is similar.

Remark 1: Unlike the gradient $\hat{g}$, the information $(n_s, \hat{n}_e, \hat{n}_y^k)$ is not required for the learning itself, but only for monitoring the progress of each device on the server side. Therefore, $\epsilon_e$ and $\epsilon_{y^k}$ can be set very small without affecting the learning performance, so that $\epsilon = \epsilon_g + \epsilon_e + C\epsilon_{y^k} \approx \epsilon_g$.

Remark 2: $\hat{n}_e$ and $\hat{n}_y^k$ can be negative with a small probability, but this has a limited effect on the server's estimates of the error rate and the prior. After receiving $T$ minibatches, the error rate and prior estimates are

$$\mathrm{Err}^{est} = \frac{\sum_i^T \hat{n}_e(i)}{\sum_i^T n_s(i)} \quad \text{and} \quad P^{est}(y = k) = \frac{\sum_i^T \hat{n}_y^k(i)}{\sum_i^T n_s(i)}. \qquad (14)$$

Since $\hat{n}_e(i) - n_e(i)$ is independent for $i = 1, 2, \dots$ and has zero mean and constant variance $\frac{2 e^{-\epsilon_e/2}}{(1 - e^{-\epsilon_e/2})^2}$ [39], the estimate of the error rate converges almost surely to the true error rate with vanishing variance as $T$ increases. The same can be said of the estimate of the prior $P(y)$.

C. Differential Privacy in the Centralized Approach

For completeness, we also describe the $\epsilon$-differential privacy mechanisms for the centralized approach. In the centralized approach, data are sent directly to the server. Without a privacy mechanism, an adversary can potentially observe all the data. To prevent this, $\epsilon$-differential privacy can be enforced by perturbing the features,

$$f(x) = x + z, \qquad P(z) \propto e^{-\frac{\epsilon_x}{2}|z|}, \qquad (15)$$

and also by perturbing the labels. To perturb labels, we use the exponential mechanism to sample a noisy label $\hat{y}$ given a true label $y$ from

$$P(\hat{y} \mid y) \propto e^{\frac{\epsilon_y}{2} d(y, \hat{y})}, \qquad \hat{y}, y \in \{1, \dots, C\}, \qquad (16)$$

where we use the score function $d(y, \hat{y}) = I[y = \hat{y}]$.

Theorem 3 (Feature and label perturbation). The transmission of $x$ and $y$ by feature perturbation (15) and the exponential mechanism (16) is $\epsilon_x$- and $\epsilon_y$-differentially private, respectively.

Proof: Assume $\|x\|_1 \leq 1$. Feature transmission is an identity operation and therefore has sensitivity 2. For label transmission, the score function $d(\hat{y}, y) = I[\hat{y} = y]$ changes at most by 1 when $y$ changes. From Proposition 1 of [3] and Theorem 6 of [44], respectively, we achieve $\epsilon_x$- and $\epsilon_y$-differential privacy of the data.

Note that the sensitivity is independent of the number of features and labels sent, so we have to add the same level of independent noise to the features and apply the same amount of label perturbation. An overall $\epsilon$-differential privacy is achieved with $\epsilon = \epsilon_x + \epsilon_y$. The required privacy levels $\epsilon_x$ and $\epsilon_y$ can be chosen differently; we use $\epsilon_x = \epsilon_y = \epsilon/2$ in the experiments.
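A minimal sketch of the centralized perturbation of Eqs. (15)–(16), as our own illustration: Laplace noise on the features and exponential-mechanism sampling of a noisy label.

```python
import numpy as np

rng = np.random.default_rng()

def perturb_sample(x, y, C, eps_x, eps_y):
    """Feature perturbation (Eq. 15) and label perturbation (Eq. 16)."""
    x_hat = x + rng.laplace(0.0, 2.0 / eps_x, x.shape)  # P(z) ~ exp(-(eps_x/2)|z|)
    scores = np.zeros(C)
    scores[y] = 1.0                                     # d(y, y_hat) = I[y = y_hat]
    p = np.exp(eps_y / 2.0 * scores)
    p /= p.sum()
    y_hat = int(rng.choice(C, p=p))
    return x_hat, y_hat
```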
D. Experiments with the Visual Object Recognition Task

We repeat the experiments of Section V-C for an object recognition task using the CIFAR-10 dataset, which consists of images of 10 types of objects (airplane, automobile, bird, cat, deer, dog, frog, horse, ship, truck) collected by [45]. We use 50,000 training and 10,000 test images from CIFAR-10. To compute features, we use a convolutional neural network⁹ trained on the ImageNet ILSVRC2010 dataset¹⁰, which consists of 1.2 million images from 1000 categories. We apply the CIFAR-10 images to the network and use the 4096-dimensional output of the last hidden layer of the network as features. These features are preprocessed with PCA to a reduced dimension of 100 and are $L_1$-normalized. We use the same settings as in Section V-C to test Crowd-ML on this object recognition task. The results are given in Figs. 7, 8, and 9.

⁹ https://github.com/jetpacapp/DeepBeliefSDK
¹⁰ http://www.image-net.org/challenges/LSVRC

[Fig. 7: Comparison of test error for the centralized, crowd, and decentralized learning approaches, without delay or privacy considerations.]
[Fig. 8: Comparison of test error for the centralized and crowd learning approaches with privacy ($\epsilon^{-1} = 0.1$), varying minibatch sizes ($b$), and no delay.]
[Fig. 9: Impact of delays on Crowd-ML with privacy ($\epsilon^{-1} = 0.1$), varying minibatch sizes, and varying delays.]

The figures are very similar to those for the handwritten digit recognition task (Figs. 4, 5, 6), except that the error is larger (e.g., 0.3 in Fig. 7 versus 0.1 in Fig. 4 for digit recognition). This is because the CIFAR dataset is more challenging than MNIST due to variations in the color, pose, viewpoint, and background of the object images.

REFERENCES

[1] D. Christin, A. Reinhardt, S. S. Kanhere, and M. Hollick, "A survey on privacy in mobile participatory sensing applications," J. Syst. Softw., vol. 84, pp. 1928–1946, 2011.
[2] C. Dwork and K. Nissim, "Privacy-preserving data mining on vertically partitioned databases," in Proc. CRYPTO. Springer, 2004.
[3] C. Dwork, F. McSherry, K. Nissim, and A. Smith, "Calibrating noise to sensitivity in private data analysis," in Theory of Cryptography. Springer, 2006, pp. 265–284.
[4] C. Dwork, "Differential privacy," in Automata, Languages and Programming. Springer, 2006, pp. 1–12.
[5] V. Vapnik, The Nature of Statistical Learning Theory. Springer, 2000.
[6] H. Robbins and S. Monro, "A stochastic approximation method," The Annals of Mathematical Statistics, pp. 400–407, 1951.
[7] N. D. Lane, E. Miluzzo, H. Lu, D. Peebles, T. Choudhury, and A. T. Campbell, "A survey of mobile phone sensing," IEEE Commun. Mag., vol. 48, pp. 140–150, September 2010.
[8] M. Srivastava, T. Abdelzaher, and B. Szymanski, "Human-centric sensing," Phil. Trans. R. Soc. A, vol. 370, pp. 176–197, 2012.
[9] S. Gaonkar, J. Li, R. R. Choudhury, L. Cox, and A. Schmidt, "Micro-Blog: Sharing and querying content through mobile phones and social participation," in Proc. ACM MobiSys, 2008.
[10] R. K. Ganti, N. Pham, Y.-E. Tsai, and T. F. Abdelzaher, "PoolView: Stream privacy for grassroots participatory sensing," in Proc. ACM SenSys, 2008.
[11] S. B. Eisenman, E. Miluzzo, N. D. Lane, R. A. Peterson, G.-S. Ahn, and A. T. Campbell, "BikeNet: A mobile sensing system for cyclist experience mapping," ACM Trans. Sensor Networks, vol. 6, no. 1, 2009.
[12] M. Mun, S. Reddy, K. Shilton, N. Yau, J. Burke, D. Estrin, M. Hansen, E. Howard, R. West, and P. Boda, "PEIR, the Personal Environmental Impact Report, as a platform for participatory sensing systems research," in Proc. ACM MobiSys, 2009.
[13] P. Zhou, Y. Zheng, and M. Li, "How long to wait? Predicting bus arrival time with mobile phone based participatory sensing," in Proc. ACM MobiSys, 2012.
[14] O. D. Lara and M. A. Labrador, "A survey on human activity recognition using wearable sensors," IEEE Communications Surveys & Tutorials, vol. 15, no. 3, pp. 1192–1209, 2013.
[15] H. Lu, J. Yang, Z. Liu, N. D. Lane, T. Choudhury, and A. T. Campbell, "The Jigsaw continuous sensing engine for mobile phone applications," in Proc. ACM SenSys, 2010.
[16] C.-K. Hsieh, H. Tangmunarunkit, F. Alquaddoomi, J. Jenkins, J. Kang, C. Ketcham, B. Longstaff, J. Selsky, B. Dawson, D. Swendeman, D. Estrin, and N. Ramanathan, "Lifestreams: A modular sense-making toolset for identifying important patterns from everyday life," in Proc. ACM SenSys, 2013.
[17] H. Lu, W. Pan, N. D. Lane, T. Choudhury, and A. T. Campbell, "SoundSense: Scalable sound sensing for people-centric applications on mobile phones," in Proc. ACM MobiSys, 2009, pp. 165–178.
[18] A. Parate, M.-C. Chiu, D. Ganesan, and B. M. Marlin, "Leveraging graphical models to improve accuracy and reduce privacy risks of mobile sensing," in Proc. ACM MobiSys, 2013.
[19] S. Nath, "ACE: Exploiting correlation for energy-efficient and continuous context sensing," in Proc. ACM MobiSys, 2012.
[20] A. Agarwal and J. C. Duchi, "Distributed delayed stochastic optimization," in Proc. NIPS, 2011, pp. 873–881.
[21] O. Dekel, R. Gilad-Bachrach, O. Shamir, and L. Xiao, "Optimal distributed online prediction," in Proc. ICML, 2011.
[22] S. Song, K. Chaudhuri, and A. D. Sarwate, "Stochastic gradient descent with differentially private updates," in Proc. IEEE GlobalSIP, 2013.
