1 Privacy Protection for Mobile Cloud Data: A Network Coding Approach Yu-Jia Chen, Student Member, IEEE and Li-Chun Wang, Fellow, IEEE, National Chiao Tung University, Taiwan Email: [email protected] and [email protected] Abstract—Taking into account of both the huge computing distributionnatureofnetworknodesisintegratedintonetwork power of intruders and the untrustedness of cloud servers, we coding, it is impossible to decipher the complete plaintext develop an enhanced secure pseudonym scheme to protect the unless all the nodes are attacked. Secondly, for the insider privacyof mobileclouddata.Tofacethehugecomputingpower securitythreatfromtheattackersstayinginthesameODB,we challenge, we develop an unconditionally secure light-weight 7 network coding pseudonym scheme. For the privacy issue of develop a two-tier network coding technique for decoupling 1 untrusted cloud server, we further design a two-tier network data ownership.As a result, the insiderattackerscannotknow 0 2 codingtodecouplethestoredmobileclouddatafromtheowner’s the relationship of the stored data and their owners at all. pseudonyms. Therefore, our proposed network coding based To demonstrate the advantages of our proposed network n pseudonym scheme can simultaneously defend against attackers codingprivacyprotectionscheme,weimplementitforagroup a from both outside and inside. We implement our proposed two- J tier light-weight network coding mechanism in a group location locationbasedservice(LBS)withODB[3].AgroupLBScan 1 based service (LBS) using untrusted cloud database. Compared help a group of users share their locations. Fig. 1 shows the 1 tocomputationallysecureHash-basedpseudonym,ourproposed proposed security system model for group LBS. The privacy scheme is not only unconditionally secure, but also can reduce issues of providing such a group LBS can be classified into ] more than 90% of processing time as well as 10% of energy R consumption. the following three aspects: C • Authenticating users’ data while preventing an untrusted IndexTerms—Privacy protection;bigdataandcloudcomput- . ing; network coding; location based services. ODB provider from knowing the relation of a user and s c his/her stored data [4]; [ • Sharing data with multiple members in the group of I. INTRODUCTION interest while maintaining the same protection level as 1 Data collected from the sensors embedded in smartphones v the two user case [5]; offer great commercial potential for mobile cloud services, 5 • Storing data in a multi-tenant cloud computing environ- 7 but also pose new challenges on privacy protection. With mentwiththethreatofattackersstayinginthesameODB 0 the properties of continuous changing and updated over time, [6]. 7 mobile data are particularly valuable for big data analytics 0 An unconditionally secure framework for mobile cloud ap- to understand and predict each individual behaviors. Because . plications has been rarely seen in the literature. Our proposed 1 mobile data provide highly personal information, the privacy unconditionally secure network coding based pseudonym 0 issues of mobile cloud data raise a lot of concerns [1]. 7 scheme can overcome the aforementioned three challenges, However, using current security techniques to protect mobile 1 and has the following two major contributions: cloud data privacy will face a number of new challenges in : v protecting mobile privacy [2]. First, we can no longer rely on • A data ownership decoupling mechanism is presented, Xi aprotectionmechanismwithcomputationalsecurityassuming which can achieve unconditional security level and re- r thatsophisticated ciphercannotbe brokeneasily byattackers. solve the security concerns for untrusted ODB. a This is because the attackers become more powerful in the • Alight-weightnetworkcodingschemeisproposedwhich era of cloud computing. Second, mobile devices with limited can reduce the processing time by 90% compared to computing power cannot conduct complex encryption and the Hash-based pseudonym. A user only needs to en- decryption algorithms. Third, private mobile cloud data are code his/her identity one time when logging in to the stored in public cloud servers, which increases the opportuni- trusted certifying server. The remaining operations for ties to be attacked by other cloud tenants. pseudonym changing and uncloaking are performed by To defendagainstmaliciousattackerswith hugecomputing thetrustedcertifyingserver.Noneofdecodingoperations power in outsourced database (ODB), we propose an uncon- are needed in mobile terminals. ditionally secure network coding based pseudonym scheme The rest of the paper is organized as follows. Section II to protect mobile privacy against the following two security discusses the related work for location privacy protection. threats. First, for the outsider security threat, the ownership In Section III, we explain the system security model for information of mobile data is protected to the highest un- group LBS using ODB. In Sections IV and V, we present conditional security level, in which the ciphertext will not be the proposed network coding based pseudonym and analyze brokenevenwithinfinitecomputationtime.Sincetheinherent its security performance.In Section VI, we discuss the issues 2 Certifying Server User/Device Data Pseudonym Processor Authentication Processor User Cloud ODB 2. Activate ServiceEngine 11..UUsseerr//DDeevviiccee IIddentification Login Processor Service Data 5. Data Query 3. Service Request Service Processor 4. Location Data 6. Privacy Level Result 9. Simplified Location Record Location Translator Record Data 8. Service Result 7. Pseudonymous Result Fig.1. TheproposedsecuritymodelforaLBSsystemwithoutsourced database (ODB). of data ownership,user authentication,and service continuity. Another type of pseudonym techniques focus on designing Section VII compares the performance of the proposed net- the unlinkable and irreversible pseudonym based on crypto- work coding based pseudonym with Hash-based pseudonym. graphicmethod.TheHashfunctioniswidelyusedtogenerate Finally, concluding remarks are given in Section VIII. a unique and secure pseudonym [13]. However, it has been shown that attackers with computational power provided by II. RELATEDWORKS cloudcomputingcanbreakHash-basedcryptography[16].The brute force method can deduce the original message from a In the literature, location privacy is generally protected by Hash value[17]. Furthermore,existingpseudonymtechniques either pseudonyms or anonymization. cannotcompletelyprotectdata ownershipif an ODB provider can link a particular pseudonym with its stored data in ODB A. Pseudonym-based Location Privacy Protection during an authentication process. Pseudonym techniques can protect location privacy by dis- connectinga user’slocationdata with his/hergenuineidentity B. Anonymization-based Location Privacy Protection [7]–[13]. Existing pseudonym schemes can be classified into two types. The first type of pseudonym techniques aim to Anonymization techniques can protect location privacy by changethepseudonymsintelligentlysothattheattackercannot generalizationandsuppressiontechniques,i.e.,withanexpres- distinguish the spatial difference from other members. The sionoflowergranularity.In[18]thek-anonymityconceptwas authors of [7] proposed to frequently change pseudonyms introduced so that a user’s location cannot be distinguished based on the mixed-zone concept, where users’ pseudonyms from other k − 1 users’ locations. Instead of reporting the are mixed together. In [10], disposable interface identifiers exact location, a user sends a region containing the locations werefrequentlyassignedtousers.Whenanodeisnotallowed of other k − 1 people. In [19], the CliqueCloak algorithm to disclose addresses, a silence period (defined as a transition was proposed to provide different k-anonymousrequirements period between changing pseudonyms) was introduced [12]. for each user. A privacy-preservingarchitecture for LBS with When users’ pseudonyms in some area are changed after a different anonymization techniques was suggested in [20]. A silence period, a hacker could not find the target user. In node density-based location privacy scheme was proposed [11], the pseudonym mechanism was improved by using a to improve anonymity without degrading quality of service Bayesianapproachtoresolvethefrequencychangingissue.In (QoS) [21]. The authors of [22] proposed a cluster based [14],agame-theoreticmodelforthenon-cooperativebehavior anonymization scheme to replace the real node identities analysis of mobile nodes in a mixed zone was proposed. A with random identities generated by the cluster heads. In location proof updating system for vehicular networks was [23], a clustering anonymization for vehicular networks was designed in [10]. In [15], the authors proposed a dynamic developed to hide the road and traffic information. However, mixed zone of which the size is determined by the vehicle’s alltheaboveanonymizationtechniquescannotprotectlocation predicted location, traffic statistics, and privacy requirement. data shared by a group of users since an individual’s location 3 is concealed. C. Objective of This Paper The objective of this paper is to improve location privacy protectiontounconditionalsecuritylevelandpreventanODB provider from linking a particular pseudonym with its stored data. To our knowledge, privacy protection for group LBS in thecontextofODBhasnotbeenfullyinvestigated.Wesuggest toadoptnetworkcodingforpseudonymgeneration,whichhas received little attention in the literature so far. Network coding is a generalized routing method in which messages are computed by intermediate nodes with algebraic encoding.Previousworkshaveshownthatnetworkcodingcan provide robustness [24]–[26] and improve throughput [25], [27] as well as confidentiality [28]. More recently, network coding is employed to improve the security and reliability of distributed storage systems such as recovering lost data in Fig.2. Anillustrative example ofagroupLBSforscheduling ameetup. multiple storage nodes [29], preventing eavesdropping over untrusted networks [30] and checking integrity of outsourced data [31]. the cloud database for future commercialpromotions,such as Although it was proved that network coding possesses the group coupon distribution. properties of irreversibility [32] and verifiable data integrity In this group LBS application scenario, it is worthwhile [33], a stronger security property of network coding that can investigating the following security issues. First, customers achieveunlinkabilitybetweencodeddatahasbeenrarelystud- shallreleasetheiridentitiestotheserviceproviderandtoother ied yet. In our previous work [34], the network coding based group members. During the sharing process, it is likely that pseudonymschemeandthecorrespondingsecuritymechanism a hacker will eavesdrop the location data and their identities. for group LBS using ODB was introduced, but the security Moreover, due to the use of ODB, an adversary can possibly analysis of the proposed mechanism is not yet investigated. pretend to be an authorized user and then modify the cus- To this end, we develop a two-tier coding method to mix tomers’ private information. Clearly, a more trustworthy LBS user identity with watchword/seed and then generate two provider needs to protect customers’ identities and location keys for certification and anonymization. In this paper, on data in a more sophisticated security mechanism. top of the work in [34], we further analyze the security performance of the proposed pseudonym scheme. We will provethatnetworkcodingcanprovideunlinkablepseudonyms B. Security Models and Assumptions to unconditional security level for group LBS by analysis. Figure 1 shows the proposed security model for a mobile It will be demonstrated that the proposed two-tier network cloudsystemwithODB,consistingofusersdevices,aservice coding method can preserve the privacy of data ownership engine, cloud databases, and a trusted certifying server. The evenifanuntrustedODBproviderhashugecomputingpower. service engine contains three major processors: the login processor, service data processor (e.g., scheduling a meetup), III. SYSTEMMODEL and service result translator (e.g., anonymization). The cloud A. Location Based Service (LBS) Applications Scenario database stores the service information and location data. Rather than in cloud databases, customers’ data are stored at As mentioned before, the group LBS is considered as an the certifying servers, consisting of authentication processor example to investigate the security performance issues for and pseudonym processor. Referring to Fig. 1, the general mobile cloud applications. With users’ interests as well as information flows of privacy preserved LBS are described as their locations, LBS system establishes a “group” of users follows. with the same interests to help customers’ meetup activities. We implemented a location-based group scheduling service 1) Step 1: A customer sends his/her identification as well on the ODB model, called JOIN [35], which can easily asthedeviceidentificationtothecertifyingserver.Then arrange a meetup by sharing their current available time the authenticationprocessor within the certifying server and locations for polling, voting, and broadcasting. Fig. 2 verifies customers’ identities. illustrates a service scenario of JOIN, in which users A, B 2) Steps 2 and 3: The login processor is activated to and C join the “Let’s drink coffee” group. While user A is respond the service requests of customers. approaching to his preferred coffee shop, users B and C are 3) Steps 4 and 5: The customer sends the location data nearby coincidentally.JOIN server will provide the proximity to the service engine. Then the service processor can information of its group members users B and C to customer access other service-related information from the cloud A, and the advertisements for this coffee shop. The locations database, such as the addresses of the nearby coffee and activities of customers in this meetup will be stored in shops. 4 4) Steps 6 and 7: After receiving the privacy level re- Key Generation (a) First tier: IMSI + Watchword KeyA quested by the customer, the result translator sends the Function pseudonym to the certificate server. 5) Step 8: The pseudonym processor “uncloaks” the Key Generation pseudonymousresult. (b) Second tier: KeyA + Random Seed KeyB Function 6) Step 9: The simplified location records are stored in the cloud database with various privacy levels. A low privacy level could show group members at the scale Fig.3. Proposedtwo-tiercodingschemewithencryptedinternationalmobile of street locations, whereasthe high privacylevel could subscriberidentity(IMSI),whereKeyAandKeyBareusedforauthentication andpseudonym generation, respectively. show your friends only at the scale of a city. Tomaketheprivacyanalysismoretractable,theconsidered JOIN groupLBSisimplementedinanexperimentalplatform, • Network coding: In efficient routing and secure stor- invoking the following assumptions and conditions: age schemes [32], network coding technique has been • A group of users trust each other so as to being willing adopted. Vandermonde transform matrix is the key ele- to share their locations and identities. ment in network coding. • The service providerand the clouddatabase providerare Vandermondematrixisusedinourproposedtwo-tiernetwork semi-honest[36]. Hence,these providersshallfollowthe coding scheme. Denote A as an n×n Vandermonde matrix, protocol properly, while allowing them to keep all the where [A , ] = (ai −1) and the coefficients a are distinct i j j i records of involved computations and data exchanges. nonzero elements in a finite field F , q = 2u > n. Let b = q Noteworthily, under the semi-honest requirement, there still (b ,...,b )T be the mix of IMSI and watchwords. Next, we 1 n exists a possibility of deriving private data of other parties compute c = (c ,...,c )T = Ab and randomly select the 1 n from the intermediate computation records. segment of c as the key value in the two-tier network coding scheme. IV. IMSI-BASED GROUP SECURITY (IGS) ALGORITHM The aforementioned key generation functions can be A. Algorithm Design adopted in our pseudonym generation scheme. Fig. 3 shows the proposed two-tier coding because a pseudonym in our In this subsection, we present network coding based schemeisgeneratedfromtwosequentialprocesses.Inthefirst pseudonymintegratedwiththeinternationalmobilesubscriber tier, IMSI and a watchword are mixed by the key generation identity (IMSI). Stored in a subscriber identity module (SIM) function to yield a cipher pseudonym. This can protect the card,IMSI isa uniquenumberassociated withcurrentmobile dataownershipprivacyduringanauthenticationprocess.Inthe phonestoidentifythesubscribers.CurrentAndroid-basedmo- secondtier,thecipherpseudonymofthefirsttierismixedwith bile platform provides the application programming interface random seeds to result in the second cipher pseudonym. We (API) to access the IMSI number of mobile phones. Clearly, use the second cipher pseudonym to protect the data privacy location data themselves are not quite useful as long as a when sharing data with others and storing data in an ODB. hacker does not know the ownership of these data. Hence, Now we detail the proposed IGS algorithm. Initially, a we propose to encrypt IMSI to conceal a customer’s privacy. customer sends his/her genuine identity and KeyA to the Fig. 3 (a) and (b) illustrate the basic idea of the proposed certifying server. If KeyA is certificated, the certifying server IMSI-based group security (IGS) algorithm, in which a two- generates an initial pseudonym (KeyB) by the input KeyA, tier codingschemeis developedto generateKeyAforauthen- and then activates the login processor. Algorithm 1 describes ticating a legal customer’s identity, and KeyB (pseudonym) the detail procedures of the proposed pseudonym generation for protecting customer’s private data (e.g., location). First, algorithm. Note that the tolerance distance and the silence KeyAisgeneratedbymixingcustomers’IMSIandcustomers’ period reflect the customers’ preferred privacy level and QoS watchwords, which are selected by customers as their secrets requirements. The former indicate the maximum acceptable to protect their IMSI information. Watchwords shall not be location bias of a customer, and the later specifies the period revealed to any party. Second, mixing each customer’s KeyA of changing pseudonyms [37]. The pseudonym processor and a uniformly distributed random seed results in KeyB. exchanges KeyBs under the condition that the pseudonym Note that the length of watchwords and random seed should timer reaches the silence period and any other group member be longer than or equal to the length of the IMSI. The key iswithintherangeofthetolerancedistance.Otherwise,anew generation functions in the proposed two-tier coding scheme KeyB will be regenerated. Since the pseudonyms are mixed shall satisfy the following two requirements even if the key with nearby members’ pseudonyms, the proposed IGS algo- generation function is revealed: (1) difficult to obtain any rithm will impose ambiguity on a customer’s exact location information from KeyA or KeyB; (2) difficult to obtain the and thus protect his/her location privacy. Finally, a customer correspondence between KeyA and KeyB. We consider two uses the generated KeyB to update the current location. kinds of key generation functions in this paper. • Hash function: It has been used extensively in the appli- B. Proposed IMSI-based Group Secure (IGS) Algorithm cationsofmessageintegritycheckingandauthentication, suchasmessagedigest(MD5)andsecureHashalgorithm In this subsection we detail the IGS algorithm and the (SHA). service procedures when applying the proposed pseudonym 5 Algorithm 1 Pseudonym (KeyB) generation algorithm V. PRIVACY ANALYSIS 1: if a customer sends the KeyA which is authorized by the A. Security Problem Formulation certifying server then 2: activate the login processor In this subsection, we will provethat the proposednetwork 3: generate and send KeyB to the customer coding based pseudonym scheme can guarantee the unlinka- 4: start a timer bility of locationsandcustomersto the unconditionalsecurity 5: else level. Although the irreversibility feature of network coding 6: send a reject information to the customer canpreventahackerfromdeducingvaluableinformation[32], 7: end if a hackercanstill linkthe relationbetweenthepseudonymsof 8: if timer ≥ silence period then the samecustomer.Thus,the unlinkabilityfeatureisdesirable 9: if tolerance distance ≥ distance(customer, a randomly for a pseudonym scheme. First, the hardness assumptions for selected friend F within the same group) then Hash functions are given. Then we present the information- 10: exchangeKeyBof the customerand F andalso reset theoretic security proof for the proposed network coding the timer pseudonym scheme. The security of the group LBS security 11: else modelis evaluated based on the probabilityof a hacker being 12: sendacommandofKeyB’schangingtothecustomer able to break into a security system. The basic idea of our and also reset the timer security system is to use a customer’s pseudonym to hide 13: end if his/her identity. The pseudonymsare generated by encrypting 14: end if the IMSI of the customer’s device through either network coding or a Hash function. Consider a scenario where adversary A can eavesdrop all thelinksontheInternet,andknowthedetailedinformationof generation algorithm in JOIN. Fig. 4(a) shows the register key generation functions (i.e., Hash function or Vandermode process. Each customer should register a unique account. matrixA).Thegoalofthisadversaryistofindthecorrespon- Initially, a customer generatesKeyA with his/her watchwords dence between the customer’s ID and his/her locations. Since and transmits KeyA to the server with password and genuine a customer’s (ID, KeyA) and (KeyB, location) are stored in identity. Fig. 4(b) shows the login process of the JOIN two separate storagenodes,adversaryA can breakthe system services. A process thread is initiated and retained after the once the relations of (ID, KeyA) and (KeyB, location) are verification of the identification and the password. Since known.A securityproblemforthiskindofgroupLBS canbe the thread is retained, the certifying server can identify the formulated as follows. customer in the communication afterwards. Furthermore, the certifying server generates an initial KeyB from KeyA and a IMSI Security Problem Given k(Ai) and k(Bj), determine random seed. Then the KeyB is delivered to the customer. whether IMSIi =IMSIj. Here k(Ai) and k(Bj) represent the i-th customer’s keyA and the j-th customer’s keyB gener- Fig. 5(a) shows the activity initiation process of the JOIN ated from the same key generation function k(), respectively. services, where the dotted line and the solid line in this We denote the mixing of IMSI and watchwords as A and the figure is used to denote data stored in the memory and on i mixing of KeyA and random seeds as B . the hard disk, respectively. A customer can initiate a new i activity (i.e., invites friends to go somewhere) and receive Next, we derive the following theorem. informationofnearbyfriendsbysendingKeyBandlocationto Theorem 1 AdversaryAcanfindthecorrespondencebetween theserver.Thentheserversendstherequeststoallothergroup ID and location if and only if A can correctly solve IMSI members. Each certificated member has to send his/her KeyB Security Problem. and locations to the server responding to this request. Next the JOIN server searches the nearby friendsfrom the location Proof: Assume that adversary A can find the correspon- record table and provides some group information (e.g., the dence between ID and location. Given the location associated top 10 restaurants, coffee shops, or bakeries with fresh-baked with k(B ), adversary A can know its corresponding ID of j bread) from the service data in the cloud-based ODB. Then the customer. By eavesdropping the link between the device thisgroupinformationaswellasKeyBandthelocationofthe and the certifying server, adversary A can get the customer’s nearby friends are sent to the certifying server. The certifying k(A ). If k(A ) = k(A ), the answer for IMSI Security j j i server decodes KeyB with a genuine identity and generates a Problem is “YES”; otherwise, the answer is “NO”. By new KeyB based on the proposed IGS algorithm. Finally, the contrast, assume an adversary A can solve IMSI Security customerreceivestheserviceresultfromthecertifyingserver. Problem.Foranygivenk(B )anditscorrespondinglocation j Fig. 5(b) shows the storage process of the JOIN services. l,adversaryAcananswerIMSI SecurityProblemforeach The JOIN server storeseach customer’sKeyBand location in KeyA in the database records. If the answer is “YES” for a the cloud database. Note that storing a less accurate location certain KeyA with ID d, then d is located at l. in the ODB can result in a higher privacy protection. For To reduce the possibility of IMSI Security Problem example, storing New York City is more secure than storing to be solved, the hardness assumption of the Hash function Manhattan in the database. Finally, data in the memory will used as the key generation function is given as follows: It is be cleared after the activity is finished. computationallyinfeasible in determiningwhether two output 6 Client Certificate Server Compute KeyAfrom IMSI Register ID PW KeyA Alan 123 AA (a) Client Certificate Server Compute KeyAfrom IMSI Login ID PW KeyA KeyB Alan 123 AA 8w Verify KeyAand Compute KeyB (b) Fig.4. Register andloginprocessoftheJOINservices. Hash values lead to the fact that part of two corresponding Proof: Let input messages are the same. P ,Pr{A[k(A ),k(B )]=Yes|IMSI =IMSI } (3) Y i j i j B. Analysis of Network Coding based Pseudonym and Nowweevaluateoursecurityframeworkforwhichnetwork P ,Pr{A[k(A ),k(B )]=No|IMSI 6=IMSI } . (4) N i j i j coding techniques are used to encrypt the customer’s IMSI. Referring to Theorem 1, the perfect privacy protection is to Without loss of generality, assume an LBS system with only ensure the probability of IMSI Security Problem being twocustomers.WehavethatPr{Anadversarycorrectlysolves solvedisequaltotheprobabilityofrandomguessing.Next,we IMSI Security Problem}= 21PY +21PN ≤ 14+12ε1+41+ give the definition that the considered LBS system is secure. 21ε2. Thus, it follows that |Pr{An adversary correctly solves IMSI Security Problem}− 1|≤ε, where ε= 1(ε +ε ) 2 2 1 2 is negligible. Definition 1 The system is secure if |Pr{An adversary cor- Next, we analyze the independence of the IMSI and the rectly solves IMSI Security Problem}− 1| ≤ ε, where 2 generated pseudonyms. ε is negligible. Note that a function f : N → R is called negligible if for every polynomial p there exists a positive Theorem 2 The information mutually held between IMSI integer N(p) such that |f(n)≤ p(1n)| for all n≥N(p). and KeyA or KeyB of a customer is I(k(Ai);IMSIi) = I(k(B );IMSI ) = 0, where the mutual information I() is j j Lemma 1 Assume an LBS system generate {k(A ),k(B )}. i j defined by I(X;Y)= p(x,y)log p(x,y) . Let A[k(Ai),k(Bj)] be the guess of adversary A. If yP∈Y xP∈X (cid:16)p(x)p(y)(cid:17) 1 Pr{A[k(A ),k(B )]=Yes|IMSI =IMSI }≤ +ε Proof: Let e(h) represent a subset containing arbitrary h i j i j 1 2 (1) components of vector e. Denote ei:j as the subvector formed and from the i-th to the j-th position of vector e. The set of the i-th and the j-th rows of matrix D is represented as D . 1 i:j Pr{A[k(Ai),k(Bj)]=No|IMSIi 6=IMSIj}≤ +ε2 , AssumethatthelengthofIMSIbemandthelengthKeyAand 2 (2) KeyBbek. Representb asthe uncodeddata (e.g.,the mixing where ε and ε are negligible, then the system is secure. of IMSI and watchwords), where b are uniformly distributed 1 2 i 7 Client Certificate Server JOIN Server Initiate Activity KeyB Location 8w 40.7,-74 All other group members. Request location of all other group members. KeyB Location 8w 40.7,-74 z9 40.9,-73.9 QK 42.3,-71.1 Find out nearby members. ID KeyB Sally z9 Replace KeyBwith ID. Generate keyBwith IGS algorithm . ID Location Sally 40.9,-73.9 (a) JOIN Server Cloud Database Store KeyB City z9 New York QK Boston 8w New York (b) Fig.5. Activity initiation andstorageprocessoftheJOINservices. independent random variables over F with entropy H(b )= where q i n H(b), and H() is defined by H(X) = − p(x )logp(x ). i i H(b(m)|c ) For simplicity, the key is selected from aribP=it1rary contiguous m p+1:p+k k components of c = (c1,··· ,cn)T = Ab. The mutual = H(bseq(j)|cp+1:p+k, information I(cp+1:p+k;b(m)) for 0 ≤ p ≤ n − k can be Xj=1 calculated as: bseq(j−1) =ˇbseq(j−1),··· ,bseq(1) =ˇbseq(1)) , (6) In(6)seq(j)isthej-thelementofarandomintegersequence within the range 1 to n, andˇb represents the given value for I(c ;b(m)) i p+1:p+k the random variable b . Since the n×n Vendermondematrix i =I(b(m);cp+1:p+k) A is nonsingular[38], we can applythe Gaussian elimination =H(b(m))−H(b(m)|c ) , (5) toobtainthereducedrowechelonformofthesubmatrixS, in p+1:p+k 8 mp ... mp 1 ... 0 mp ... mp 1 p p+k+1 n Mp+1:p+k = ... ... ... 0 ... ... ... ... ... (7) mp+k−1 ... mp+k−1 0 ... 1 mp+k−1 ... mp+k−1 1 p p+k+1 n [S ]=[A ], p+1≤i,j ≤p+k. ThenAdversary Reduced same as random guessing. This is implied that i,j i,j Matrix M is obtained as (7), where the other element of M are the same as A. We can also obtain Adversary Reduced Pr{A[k(Ai),k(Bj)]=Yes|IMSIi =IMSIj} vector v of which the elements vp+1···vp+k are represented = Pr{A[k(Ai),k(Bj)]=Yes} as 1 = 2 vp+1 1 vp+1:p+k = ... . (8) ≤ 2 +ε1 (15) vp+k and EachelementinMandvresultsfromthebasicrowoperations Pr{A[k(Ai),k(Bj)]=No|IMSIi 6=IMSIj} to reduce S to Ik. Thus, we obtain k equations to solve n = Pr{A[k(Ai),k(Bj)]=No} unknown elements b : 1 i = 2 p n 1 vi = mji−1bj +bi+ mij−1bj , (9) ≤ 2 +ε2 . (16) Xj=1 j=pX+k+1 Based on Lemma 1, we conclude that the proposed network wherep+1≤i≤p+k.Sincewecannotsolveanyb without coding scheme can achieve the unconditional security level. i n−k components of b for 1≤j ≤n−k, it follows that VI. DISCUSSION H(bseq(j)|cp+1:p+k,bseq(j−1) =ˇbseq(j−1),··· ,bseq(1) =ˇbseq(1)) In this section, we discuss the security issues in terms of =H(bseq(j)) privacy,authentication,and continuity for the proposed group =H(b) . (10) LBS scheme. Then, we compare the security performance of the proposed pseudonym scheme with that of traditional For n−k≤j ≤m, the numberof equationsis morethanthe pseudonym schemes. number of unknown elements. Thus, we obtain A. Data Ownership Privacy H(bseq(j)|cp+1:p+k,bseq(j−1) =ˇbseq(j−1),··· ,bseq(1) =ˇbseq(1)) As shown in the previous section, network coding can =0 . (11) provideunconditionalsecurityratherthancomputationalsecu- rity provided by Hash functions. Note that the unconditional Substituting (10) and (11) into (6), we have security property of the proposed IMSI pseudonym is con- tributed from the uncertainty of the customer’s watchword. mH(b) , m≤n−k H(b(m)|cp+1:p+k) =(cid:26) (n−k)H(b) , m>n−k . (12) Therefore, the privacy of location data ownership is fully preserved even if one can eavesdrop IMSI, KeyA, or KeyB. Since b are i.i.d random variables, it follows that This property ensures that the proposed pseudonym scheme i canpreventtheownershipinformationfrombeingaccessedby H(b(m)) = H b ,b ,...,b eavesdroppers. For the same reason, our system is collusion- seq(1) seq(2) seq(m) = mH(cid:0)(b) . (cid:1) (13) resistanttotheinsiderattackevenwhenaserviceproviderand a cloud database provider collaboratively attempt to decrypt Finally, substituting (12) and (13) into (5), we can obtain the pseudonyms.Furthermore,it is discussed that the connec- tion between a pseudonymand its ownership informationcan 0 , m≤n−k be derived by analyzing messages (e.g., location and group I c ;b(m) = . (cid:16) p+1:p+k (cid:17) (cid:26) (m−n+k)H(b) , m>n−k member information) subsequently [39]. In the proposed IGS (14) algorithm, customers’ keyB is exchanged randomly within Notethatweselectk =n/2andm≤n/2inournetworkcod- the same group after a silence period, thereby increasing the ing based pseudonymscheme to ensureI cp+1:p+k;b(m) = difficulty of de-anonymization. Compared to the approach 0. (cid:0) (cid:1) using a new KeyB, the proposed method can confuse the According to Theorem 2, we know that KeyA or KeyB eavesdropper because the customers in the same group have are independent of IMSI. As a result, the probability for an similar interests and movement patterns. The silence period adversaryto correctlysolve IMSI Security Problem is the is a design parameter specified by a random variable within 9 a certain range [37]. When the durations of two customers’ 600 KeyBs are overlapped, the temporal relation of these two MD5 customers cannot be correctly linked from the new KeyB 550 SHA−1 SHA−2 and the old KeyB. In general, a longer silence period results 500 Network Coding :GF(28) in lower location timeliness (i.e., the elapsed time since the 450 Network Coding :GF(210) location was acquired)butprovideshigherprivacyprotection. ms)400 Thus, a cloud database cannot store the precise locations me (350 because they are strongly associated with customers. For g Ti300 n example,ifacustomerusuallystaysinoneplace,wecanguess ssi250 thathe/she mayown the house.Fortunately,manygroupLBS Proe200 do not require precise locations for each customer. At last, 150 we notice that the collision issue (i.e., two or more customers 100 havingthesameKeyA/KeyB)willnothappeninourproposed 50 scheme since the certifying server will check the validity of each key in the login process. 20 40 60 80 100 120 140 Input Data Length (bits) B. Customer Authentication Fig.6. Processingtimefordifferent keygeneration schemes. Withthehelpoftheauthenticationcenterofcellularmobile service operators, the proposed IMSI pseudonym can prevent greatrobustnessagainstbruteforceattacksincloudcomputing illegal access which will be blocked in the login process. In environment. Besides, the designed two-tier coding method the proposed IMSI pseudonym, a customer generates KeyA can reduce the computational complexity and energy con- based on his/her own IMSI. A customer can update location sumption in mobile devices compared to Hash schemes. A information only if he/she is an authorized customer. Even if customer only needs to encode its identity one time when an adversary can steal a customer’s SIM card to get IMSI, logging in to the system. No further decoding procedures this adversary still cannot log in to the system without the are required. Moreover, the designed certifying server can customer’s watchword. authenticateand authorizeindividualcustomers. Note that the certifying server handles the key collision and uncloaks the C. Service Continuity pseudonymonlyaccordingtoone’skeyB.Thus,thecertifying In group LBS, service continuity is important because the process can be implemented in a distributed manner to avoid system providercanrecordandanalyzethe historicallocation the service bottleneck and the single point of failure in the data of a customer.TheproposedIMSI-basedpseudonymcan system [40], [41]. Last but not least, the proposed network retaintheindividualidentificationofacustomer’spseudonyms coding based pseudonym is compatible with the existing because the unique IMSI is used in pseudonym generation. pseudonymapproachesandsecurity communicationprotocols Therefore, a customer can enquire his/her location record by such as IP security (IPSec) and secure socket layer (SSL). providingKeyAto thecertifyingserver.Inaddition,historical location can be used for social data mining services. In VII. EXPERIMENTALRESULTS contrast, most of conventional random pseudonym approach Since mobile devices should perform key generation when will eliminate the individual identification of pseudonyms. customers log in to the LBS or update their locations, pro- Therefore, new location data cannot be appended to the cessing delay and power consumption for key generation same historic records when users change their identifications, functionareimportantperformanceissues.Theapplicationsof reinstallthe program,orchangedevices.Inthis case, location network coding may be limited because of the computational records become difficult to be traced since the ownership complexityandenergyconsumptioninmobiledevices[42].To information of a pseudonym is hard to be known. With the examine the delay performance and the energy consumption uniquenessprovidedbyIMSI,theproposedprivacyprotection of the proposed network coding scheme, we implemented scheme can be operated and retain the service continuity the JOIN services and performed experiments based on HTC simultaneously. Desire with a Qualcomm QSD8250, which is a low-power ARM based 1 GHZ processor. We choose the three most D. Comparison with Conventional Pseudonym Schemes widely used Hash functions (i.e., MD5, SHA-1, and SHA-2), Compared to conventionalrandom pseudonym approaches, and network coding schemes with differentGalois field sizes. the proposed network coding based pseudonym has the fol- Figure 6 shows the processing time versus the length of lowing advantages. First, as mentioned above, the proposed IMSI for different key generation schemes. One can observe IMSI pseudonymcan furtherimprovedata ownershipprivacy, that the processing time of SHA increases sharply when the customer authenticity, and service continuity. Especially, the length of IMSI is around 140 bits. This is because input data proposed pseudonym scheme can provide unconditional se- arebrokendownintochunksforprocessinginSHAandMD5. curity rather than the computational security of Hash-based Chunksofinputdatawillbeappendedtofitinwiththechunk pseudonym. The property of unconditional security offers size. Also, the processingtime of networkcodingapproachis 10 pseudonym scheme. Our theoretical results show that the 1600 Network Coding:GF(210) proposed scheme can achieve unconditionally secure privacy SHA−1 protection.Ourexperimentalresultsindicatethattheproposed 1400 SHA−2 network coding based pseudonym cannot only exhibit better MD5 1200 Network Coding:GF(28) delayperformance,butalsoprovidelowerenergyconsumption compared to Hash-based pseudonyms. Our proposed scheme ule)1000 isfullycompatiblewithexistingdatasecuritytechniques.This o J y ( 800 workisthefirststeptoexploitthepotentialofnetworkcoding g er in providing secure pseudonym. Besides the LBS, we will n E 600 further investigate the applicability and limitation of network coding based pseudonym in the applications of machine-to- 400 machine (M2M) communications. 200 REFERENCES 5 10 15 20 25 30 35 40 45 50 55 60 Time (Minutes) [1] J. Reilly, S. Dashti, M. Ervasti, J. D. Bray, S. D. Glaser, and A. M. Bayen,“Mobilephonesasseismologicsensors:Automatingdataextrac- tionfortheiShake system,”IEEETransactions onAutomationScience Fig. 7. Energy consumption of smartphones for different key generation andEngineering, vol.10,no.2,pp.242–251,2013. schemes. [2] S. Yu, “Big privacy: Challenges and opportunities of privacy study in theageofbigdata,”IEEEaccess,vol.4,pp.2751–2763, 2016. [3] L. J. Chen, C. R. Hong, D. Deng, H. C. Lee, and H. H. Hsieh, nothighlyaffectedbyGaloisfieldsizebecausetheGaloisfield “SUDO: a secure database outsourcing solution for location-based systems,”InternationalWorkshoponPrivacyinGeographicInformation mathematical operations are always in the range of field size Collection andAnalysis,p.4,2014. rather than growing exponentially. Finally, it is important to [4] S.M.Khanand K.W.Hamlen, “AnonymousCloud: Adata ownership notethattheprocessingtimeofnetworkcodingisshortenthan privacy provider framework in cloud computing,” IEEE 11th Interna- tional Conference on Trust, Security and Privacy in Computing and that of Hash functions when the IMSI length is shorter than Communications, pp.170–176,2012. 120 bits. In the existing communication systems, a standard [5] W. Lou and K. Ren, “Security, privacy, and accountability in wireless IMSI is usually 50 bitslong oreven shorter.Our resultsshow accessnetworks,”IEEEWirelessCommunications,vol.16,August2009. [6] Y.Zhu,D.Ma,D.Huang,andC.Hu,“Enablingsecurelocation-based that using network coding to encrypt a standard IMSI can services in mobile cloud computing,” Proceedings of the Second ACM reduce more than 95% of processing time in mobile devices SIGCOMMWorkshoponMobileCloudComputing, 2013. compared to using Hash functions. [7] A.BeresfordandF.Stajano,“Locationprivacyinpervasivecomputing,” IEEEPervasiveComputing, pp.46–55,Jan-Mar2003. Figure 7 shows the system energy consumption versus [8] D.Singele´eandB.Preneel,“Locationprivacyinwirelesspersonalarea runningtimefordifferentkeygenerationschemes.Thesystem networks,”Proceedingsofthe5thACMWorkshoponWirelesssecurity, energy,includingbothcodingenergyandtransmissionenergy, 2006. [9] J.M.delA´lamo,A.M.Ferna´ndez,R.Trapero,J.C.Yelmo,andM.A. is tested with eight packets coded together, each of one KB Monjas, “Aprivacy-considerate framework foridentity management in length, and a transmission energy per bit of 200 pJ/bit [43]. mobileservices,”MobileNetworksandApplications,vol.16,no.4,pp. Paper [43] has identified when a small field size is used, the 446–459, 2011. [10] Z.ZhuandG.Cao,“Towardprivacypreservingandcollusionresistance total system energy consumption is dominated by the extra in a location proof updating system,” IEEE Transactions on Mobile RF retransmissions. In contrast, as field size becomes large, Computing, vol.12,no.1,pp.51–64,2013. significantly increased energy is required for performing the [11] X.Liu,K.Liu,L.Guo,X.Li,andY.Fang,“Agame-theoreticapproach forachievingk-anonymityinlocationbasedservices,”IEEEConference coding processwith less influence on the expectednumberof onComputer Communications, pp.2985–2993, 2013. transmitted packets. Thus, it is worthwhile to investigate the [12] R.Lu,X.Li,T.H.Luan,X.Liang,andX.Shen,“Pseudonymchanging optimal field size of the proposed network coding scheme in at social spots: An effective strategy for location privacy in vanets,” IEEETransactions onVehicular Technology, vol.61,no.1,pp.86–96, terms of energy consumption. As shown in Fig. 7, although 2012. network coding with field size 210 consumes more energy [13] S. Mathews and B. Jinila, “An effective strategy for pseudonym comparedto MD5 or SHA, network coding with field size 28 generation & changing scheme with privacy preservation for vanet,” International Conference on Electronics and Communication Systems, can reduce about 10% of the energy consumption compared 2014. toMD5orSHA.Hence,wecanconcludethatnetworkcoding [14] F.Julien,M.M.Hossein,H.Jean,andP.David,“Non-cooperativeloca- with field size 28 is a promising function for energy-limited tionprivacy,”IEEETransactionsonDependableandSecureComputing, vol.10,no.2,pp.84–98,2013. devices.Finally,itisworthwhilementioningthattheproposed [15] B.Ying,D.Makrakis,andH.Mouftah,“Dynamicmix-zoneforlocation scheme can provide unconditional security with any network privacy invehicular networks,” IEEECommunications Letters, vol.17, coding parameters. no.8,pp.1524–1527,2013. [16] C. Teat and S. Peltsverger, “The security of cryptographic hashes,” Proceedings of the 49th ACM Annual Southeast Regional Conference, VIII. CONCLUSIONS 2011. [17] H.Kumar,S.Kumar,R.Joseph,D.Kumar,S.K.S.Singh,A.Kumar, In this paper, we proposed a lightweight network coding and P. Kumar, “Rainbow table to crack password using md5 hashing pseudonym scheme to protect the ownership of mobile data algorithm,”IEEEConferenceonInformationandCommunicationTech- in an untrusted cloud database by disconnecting the relation nologies, 2013. [18] M. Gruteser and D. Grunwald, “Anonymous usage of location-based of data and its identity. We develop the IMSI-based group servicesthroughspatialandtemporalcloaking,”TheFirstInternational secure (IGS) algorithm for group LBS based on the proposed Conference onMobileSystems,Applications, andServices, 2003.