Optimized, Direct Sale of Privacy in Personal-Data Marketplaces

Javier Parra-Arnau

arXiv:1701.00740v1 [cs.CR] 3 Jan 2017

Abstract: Very recently, we have witnessed the emergence of a number of start-ups that enable individuals to sell their private data directly to brokers and businesses. While this new paradigm may shift the balance of power between individuals and companies that harvest data, it raises some practical, fundamental questions for users of these services: how they should decide which data must be vended and which data protected, and what a good deal is. In this work, we investigate a mechanism that aims at helping users address these questions. The investigated mechanism relies on a hard-privacy model and allows users to share partial or complete profile data with broker companies in exchange for an economic reward. The theoretical analysis of the trade-off between privacy and money posed by such a mechanism is the object of this work. We adopt a generic measure of privacy, although part of our analysis focuses on some important examples of Bregman divergences. We find a parametric solution to the problem of optimal exchange of privacy for money, and obtain a closed-form expression and characterize the trade-off between profile-disclosure risk and economic reward for several interesting cases.

Index Terms: user privacy, disclosure risk, data brokers, privacy-money trade-off.

The author is with the Dept. of Comput. Sci., Math., Universitat Rovira i Virgili (URV), Tarragona, Spain. E-mail: [email protected]

1 INTRODUCTION

Over recent years, much attention has been paid to government surveillance, and the indiscriminate collection and storage of tremendous amounts of information in the name of national security. However, what most people are not aware of is that a more serious and subtle threat to their privacy is posed by hundreds of companies they have probably never heard of, in the name of commerce.

They are called data brokers, and they gather, analyze and package massive amounts of sensitive personal information, which they sell as a product to each other, to advertising companies or marketers, often without our knowledge or consent. A substantial chunk of this is the kind of harmless consumer marketing that has been going on for years. Nevertheless, what has recently changed is the amount and nature of the data being extracted from the Internet and the rapid growth of a tremendously profitable industry that operates with no control whatsoever. Our habits, preferences, our friends, personal data such as date of birth, number of children or home address, and even our daily movements, are some examples of the personal information we are giving up without being aware it is being collected, stored and finally sold to a wide range of companies.

A majority of the population understands that this is part of an unwritten contract whereby they get content and services free in return for letting advertisers track their behavior; this is the barter economy that, for example, currently sustains the Web. But while a significant part of the population finds this tracking invasive, there are people who do not care about being mined for data [1].

Very recently, we have witnessed the emergence of a number of start-ups that hope to exploit this by buying access to our social-network accounts and banking data. One such company is Datacoup, which lets users connect their apps and services via APIs in order to sell their data. Datacoup and similar start-ups, however, do not provide raw data to potential purchasers (among others, retailers, insurance companies and banks). Rather, they typically build a profile that gives these companies an overview of a user's data.

Fig. 1: Screenshot of Datacoup, which allows users to earn money by sharing their personal data.

The emergence of these start-ups is expected to provide a win-win situation both for users and data buyers. On the one hand, users will receive payments, discounts or various rewards from purchasing companies, which will take advantage of the notion that users are receiving a poor deal when they trade personal data in for access to "free" services. On the other hand, companies will earn more money, as the quality of the data these start-ups will offer to them will be much greater than that currently provided by traditional brokers; the problem with the current brokers is often stale and inaccurate data [2].

The possibility that individuals may vend their private data directly to businesses and retailers will be one step closer with the emergence of companies like Datacoup. For many, this can have a liberating effect. It permeates the opaque data-exchange process with a new transparency, and empowers online users to decide what to sell and what to retain. However, the prospect of people selling data directly to brokers poses a myriad of new problems for their owners. How should they manage the sale of their data? How should they decide which elements must be offered up and which data protected? What is a good deal?

1.1 Contribution and Plan of this Paper

In this paper, we investigate a mechanism that aims at helping users address these questions. The investigated mechanism builds upon the new data-purchasing paradigm developed by broker companies like Datacoup, CitizenMe and DataWallet, which allows users to sell their private data directly to businesses and retailers. The mechanism analyzed in this work, however, relies on a variant of this paradigm which gives priority to users, in the sense that they are willing to disclose partial or complete profile data only when they have an offer on the table from a data buyer, and not the other way round. Also, we assume a hard-privacy model by which users take charge of protecting their private data on their own, without the requirement of trusted intermediaries.

The theoretical analysis of the trade-off between disclosure risk and economic reward posed by said mechanism is the object of this work. We tackle the issue in a mathematically systematic fashion, drawing upon the methodology of multiobjective optimization. We present a mathematical formulation of optimal exchange of profile data for money, which takes into account the trade-off between both aspects by treating them as two sides of the same coin, and which contemplates a rich variety of functions as quantifiable measures of user-profile privacy. Our theoretical analysis finds a closed-form solution to the problem of optimal sale of profile data, and characterizes the optimal trade-off between privacy and money.

Sec. 2 introduces our mechanism for the exchange of profile data for money, proposes a model of user profile, and formulates the trade-off between privacy and economic reward. We proceed with a theoretical analysis in Sec. 3, while Sec. 4 numerically illustrates the main results. Next, Sec. 5 reviews the state of the art and conclusions are drawn in Sec. 6.

Fig. 2: Conceptual depiction of the data-purchasing model assumed in this work. In this model, users first send the data broker their category rates, that is, the money they would like to be paid for completely exposing their actual interests in each of the categories of a profile. Based on the rates chosen for each category, data buyers then decide whether to pay the user for learning their profile and gaining access to the underlying data. Finally, depending on the offer made, the disclosure may range from portions of their profile to the complete actual profile.

2 A MECHANISM FOR THE EXCHANGE OF PRIVATE DATA FOR MONEY

In this section, we present a mechanism that allows users to share portions of their profile with data-broker companies, in exchange for an economic reward. The description of our mechanism is prefaced by a brief introduction of the concept of hard privacy and our data-purchasing model.

2.1 Hard-Privacy and Data-Purchasing Model

Privacy-enhancing technologies (PETs) can be classified depending on the level of trust placed by their users [3], [4]. A privacy mechanism providing soft privacy assumes that users entrust their private data to an entity, which is thereafter responsible for the protection of their data. In the literature, numerous attempts to protect privacy have followed the traditional method of pseudonymization and anonymization [5], which are essentially based on the assumptions of soft privacy. Unfortunately, these methods are not completely effective; they normally come at the cost of infrastructure, and suppose that users are willing to trust other parties.

The mechanism investigated in this work, per contra, capitalizes on the principle of hard privacy, which assumes that users mistrust communicating entities and are therefore reluctant to delegate the protection of their privacy to them. In the motivating scenario of this work, hard privacy means that users do not trust the new data brokerage firms (not to mention data purchasers) to safeguard their personal data. Consequently, because users just trust themselves, it is their own responsibility to protect their privacy.

In the data-purchasing model supported by most of these new data brokers, users, just after registering and without having received any money yet, must give these companies access to one or several of their accounts. As mentioned in the introductory section, brokers at first do not provide raw data to potential buyers. Rather, purchasers are shown a profile of the data available at those accounts, which gives them an accurate-enough description of a user's interests, so as to make a decision on whether to bid or not for that particular user. If a purchaser is finally interested in a given profile, the data of the corresponding account are sold at the price fixed by the broker. Obviously, the buyer can at that point verify that the purchased data correspond to the profile it was initially shown, that is, it can check the profile was built from such data. At the end of this process, users are notified of the purchase.

In this work, we assume a variation of this data-purchasing model that reverses the order in which transactions are made. In essence, we consider a scenario where, first, users receive an economic reward, and then, based on that reward, their data are partly or completely disclosed to the bidding companies; this variation is in line with the literature of pricing private data [8], examined in Sec. 5. Also, we contemplate that users themselves take charge of this information disclosure, without the intervention of any external entity, following the principle of hard privacy.

More specifically, users of our data-buying model first notify brokers of the compensation they wish to receive for fully disclosing each of the components of their profile; we shall henceforth refer to these compensations as category rates. For example, if profiles represent purchasing habits across a number of categories, a user might specify low rates for completely revealing their shopping activity in groceries, and they might impose higher prices on more sensitive purchasing categories like health care. Afterwards, based on these rates, interested buyers try to make a bid for the entire profile. However, as commented above, it is now up to the user to decide whether to accept or decline the offer. Should it be accepted, the user would disclose their profile according to the money offered, and give the buyer (and the intermediary broker) access to the corresponding data.

As we shall describe more precisely in the coming subsections, we shall assume a controlled disclosure of user information that will hinge upon the particular economic reward given. Basically, the more money is offered to a user, the more similar the disclosed profile will be to the actual one. Furthermore, we shall assume that there exists a communication protocol enabling this exchange of information for money, and that users behave honestly in all steps of said data-transaction process. This work does not tackle the practical details of an implementation of this protocol and the buying model described above.
This is nevertheless an important issue, and dispelling 𝑞1 𝛿1=0.3 𝑡2 𝑡3 the assumption that must behave honestly is one of the many 𝑡1 excitingdirectionsforfuturework. Fig. 3: We provide an example of how our profile-disclosure mechanism 2.2 User-ProfileRepresentation operates.Inthisexample,weconsidercategoryratesof1dollarforeach ofthen=3categories,anda2-dollarofferbyapurchasingcompany.We We model user private data (e.g., posts and tags on social showanapparentprofiletthatresultsfromapplyingacertaindisclosure networks, transactions in a bank account) as a sequence of strategyontheactual profileq.Thedisclosuredeparts fromtheuniform random variables (r.v.’s) taking on values in a common finite distribution.Theselectedstrategyfullyrevealstheinterestoftheuserin alphabet of categories, in particular the set X = {1,...,n} thecategory3.However,thecompensationoffereddoesnotallowthemto dothesamewiththecategories1and2.Rather,theuserdecidestoexpose for some integer n (cid:62) 2. In our mathematical model, we 30and70percentoftheinterestvaluesinthesetwocategoriesrespectively, assumetheser.v.’sareindependentandidenticallydistributed. whichequatestothereceivedreward. This assumption allows us to represent the profile of a user particulargroup.Ineithercase,theultimateaimofanattacker by means of the probability mass function (PMF) according couldbefrompricediscriminationtosocialsorting. to which such r.v.’s are distributed, a model that is widely In our mathematical model, the selection of either privacy acceptedintheprivacyliterature[9],[10],[11]. model entails choosing a reference, initial profile p the user Conceptually, we may interpret a profile as a histogram of wishestoimpersonatewhennomoneyisofferedfortheirdata. relative frequencies of user data within that set of categories. 
Forexample,intheprofile-densitymodel,ausermightwantto For instance, in the case of a bank account, grocery shopping exhibit very common interests or habits and p might therefore and traveling expenses could be two categories. In the case of be the average profile of the population. In the classification social-networks accounts, on the other hand, posts could be model,theusermightbecomfortablewithshowingtheprofile classifiedacrosstopicssuchaspolitics,sportsandtechnology. of a less-sensitive group. As we shall explain in the next In our scenario of data monetization, users may accept subsection, the initial profile will provide a “neutral” starting unveiling some pieces of their profile, in exchange for an pointforthedisclosureoftheactualprofileq. economic reward. Users may consider, for example, revealing a fraction of their purchases on Zappoos, and may avoid 2.4 DisclosureMechanismandPrivacyFunction disclosingtheirpaymentsatnightclubs.Clearly,dependingon the offered compensation, the profile observed by the broker In this section, we propose a profile-disclosure mechanism and buying companies will resemble, to a greater or lesser suitable for the data-buying and privacy models described extent,thegenuineshoppinghabitsoftheuser.Inthiswork,we previously. The proposed technique operates between these shallrefertothesetwoprofilesastheactualuserprofileandthe two extreme cases. When there is no economic compensation apparentuserprofile,anddenotethembyq andt,respectively. for having access to a user account, the disclosed profile coin- cideswiththeinitialdistributionp,andtheobservationofthis information by the data broker or potential purchasers does 2.3 PrivacyModels notposeanyprivacyrisktotheuser.Whentheuserisoffered Before deciding how to disclose a profile for a given reward, sufficientreward,however,theactualprofileqisfullydisclosed users must bear in mind the privacy objective they want to andtheirprivacycompletelycompromised. achieve by such disclosure. 
In the literature of information Our disclosure mechanism reveals the deviation of the privacy, this objective is inextricably linked to the concrete user’sinitial,falseinteresttotheactualvalue.Informalterms, assumptions about the attacker against which a user wants to we define the disclosure rate δi as the percentage of disclosure protect.Thisisknownastheadversarymodelanditsimportance lyingonthelinesegmentbetweenpi andqi.Concordantly,we lies in the fact that the level of privacy provided is measured define the user’s apparent profile t as the convex combination withrespecttoit. t = (1−δ)p+δq, where δ = (δ1,...,δn) is some disclosure In this work, we consider two privacy objectives for users, strategy specified by the user. The disclosure mechanism may which may also be interpreted from an attacker perspective; beinterpretedintuitivelyasarollerblind.Thestartingposition in our case, data brokers, data-buying companies and in gen- δ = 0 corresponds to leaving the roller in the value p, that is, eral any entity with access to profile information may all be t = p. Depending on whether qi < pi or qi > pi, a positive regardedasprivacyadversaries. δ maytranslateintoloweringorraisingtherollerrespectively. • On the one hand, we assume a profile-density model, in Fig. 3 illustrates this effect for a uniform initial profile, that is, which a user wishes to make their profile more common, pi =1/nforalli=1,...,n. tryingtohideitinthecrowd. In our model, the user therefore must decide a disclosure • Ontheotherhand,weconsideraclassificationmodelwhere strategy that shifts t from the initial PMF to the actual one; the user does not want to be identified as a member of a clearly,thedisclosedinformationmustequatetothemoneyof- givengroupofusers. 
feredbythedatapurchaser.Thequestionthatfollowsnaturally Intermsofanadversarymodel,theformerobjectivecould is, what is the privacy loss due to this shift, or said otherwise, be defined under the assumption that the attacker aims at howdowemeasuretheprivacyoftheapparentprofile? targeting peculiar users, that is, users who deviate from a In this work, we do not contemplate a single, specific typicalbehavior.Thelattermodel,ontheotherhand,couldfit privacy metric, nor consider that all users evaluate privacy withanadversarywhowishestolabelauserasbelongingtoa the same way. Instead, each user is allowed to choose the 4 most appropriate measure for their privacy requirements. In only contemplates the case when all given probabilities and particular,wequantifyauser’sprivacyriskgenericallyas categoryratesarestrictlypositive: R=f(t,p)=f((1−δ)p+δq,p), qi,pi >0foralli=1,...,n. (2) wheref: (t,p)(cid:55)→f(t,p)isaprivacyfunctionthatmeasuresthe Withoutlossofgenerality,weshallassumethat extenttowhichtheuserisdiscontentwhentheinitialprofileis pandtheapparentprofileist. qi (cid:54)=pi foralli=1,...,n. (3) X A particularly interesting class of those privacy functions We note that we can always restrict the alphabet to those are the dissimilarity or distance metrics, which have been exten- categorieswhereqi (cid:54)= pi holds,andredefinethetwoprobabil- sively used to measure the privacy of user profiles. The intu- itydistributionsaccordingly. itive reasoning behind these metrics is that apparent profiles In this work, we shall limit our analysis to the case of closer to p offer better privacy protection than those closer to privacy functions f: (t,p) (cid:55)→ f(t,p) that are twice differen- q, which is consistent with the two privacy models described tiable on the interior of their domains. 
In addition, we shall inSec.2.3.ExamplesofthesefunctionscomprisetheEuclidean consider these functions capture a measure of dissimilarity or distance,Kullback-Leibler(KL)divergence[13],andthecosine distancebetweenthePMFstandp,andaccordinglyassumethat andHammingdistances. f(t,p)(cid:62)0,withequalityif,andonlyif,t=p.Occasionally,we shall denote f more compactly as a function of δ, on account of the fact that t = (1−δ)p+δq, and that p and q are fixed 2.5 FormulationoftheOptimalTrade-OffbetweenPrivacy variables. andMoney Beforeestablishingsomenotationalaspectsanddivinginto Equipped with a measure of the privacy risk incurred by a themathematicalanalysis,itisimmediatefromthedefinitionof disclosure strategy, the proposed mechanism aims at finding the privacy-money function and the assumptions made above the strategy that yields the minimum risk for a given reward. that its initial value is R(0) = 0. The characterization of the Next, we formalize the problem of choosing said strategy optimal trade-off curve modeled by R(µ) at any other values as a multiobjective optimization problem whereby users can ofµisthefocusofthissection. configureasuitabletrade-offbetweenprivacyandmoney. Let w = (w1,...,wn) be the tuple of category rates spec- 3.1 NotationandPreliminaries ified by a user, that is, the amount of money they require to completely disclose their interests or habits in each category. We shall adopt the same notation for vectors used in [12]. Since in our data-buying model users have no motivation for Specifically,wedelimitvectorsandmatriceswithsquarebrack- giving their private data for free, we shall assume these rates ets,withthecomponentsseparatedbyspace,anduseparenthe- are positive. Accordingly, for a given economic compensation sestoconstructcolumnvectorsfromcommaseparatedlists. 
µ,wedefinetheprivacy-moneyfunctionas Occasionally, we shall use the notation xTy to indicate the standard inner product on Rn, (cid:80)ni=1xiyi, and (cid:107)·(cid:107) to denote R(µ)= min f(t,p), (1) the Euclidean norm, i.e., (cid:107)x(cid:107) = (xTx)1/2. Recall [12] that a δ (cid:80)(cid:80)iwitiiδ=i=1,µ, hyperplaneisasetoftheform 0(cid:54)δi(cid:54)1 {x:vTx=b}, whichcharacterizestheoptimaltrade-offbetweenprivacyand where v ∈ Rn, v (cid:54)= 0, and b ∈ R. Geometrically, a hyperplane economiccompensation. may be regarded as the set of points with a constant inner Conceptually, the result of this optimization problem is a product to a vector v. Note that a hyperplane separates Rn disclosurestrategyδ∗thattellsus,foragivenamountofmoney, into two halves; each of these halves is called a halfspace. The howtounveilaprofilesothatthelevelofprivacyismaximized. resultsdevelopedinthecomingsubsectionswillbuildupona Intuitively,iff isaprofile-similarityfunction,thedisclosureis particularintersectionofhalfspaces,usuallyreferredtoasslab. chosentominimizethediscrepanciesbetweentheapparentand Concretely,aslabisasetoftheform theinitialprofiles.Naturally,theminimizationmustsatisfythat the compensation offered is effectively exchanged for private {x:bl (cid:54)vTx(cid:54)bu}, (cid:80) information. This is what the(cid:80)condition µ = iwiδi means. the boundary of which are two hyperplanes. Informally, we Theotherequalitycondition, iti =1,merelyreflectsthatthe shallrefertothemasthelowerandupperhyperplanes. resultingapparentprofilemustbeaprobabilitydistribution. In closing, the problem (1) gives a disclosure rule that not 3.2 MonotonicityandConvexity only assists users in protecting their privacy, but also allows themtofindtheoptimalexchangeofprivacyformoney. Ourfirsttheoreticalcharacterization,namelyTheorems1and3, investigates two elementary properties of the privacy-money trade-off. 
The theorems in question show that the trade-off is 3 OPTIMAL DISCLOSURE OF PROFILE INFORMATION nondecreasingandconvex.Theimportanceofthesetwoprop- Thissectionisentirelydevotedtothetheoreticalanalysisofthe ertiesisthattheyconfirmtheevidencethataneconomicreward privacy-moneyfunction(1)definedinSec.2.5.Inourattemptto will never lead to an improvement in privacy protection. In characterizethetrade-offbetweenprivacyriskandmoney,we otherwords,acceptingmoneyfromadatapurchaserdoesnot shall present a solution to the optimization problem inherent lowerprivacyrisk.Together,thesetworesultswillallowusto inthedefinitionofthisfunction.Afterwards,weshallanalyze determinetheshapeofR(µ). (cid:80) some fundamental properties of said trade-off for several in- Beforeproceeding,defineµmax = iwiandnotethatwhen (cid:80) terestingcases.Forthesakeofbrevity,ourtheoreticalanalysis µ=µmax,theequalitycondition iwiδi =µimpliesδi =1for 5 alli.Hence,R(µ )=f(q,p).Also,observethattheprivacy- (a) follows from the fact that f(t,p) is convex in the pairs of max money function is not defined for a compensation µ > µ probabilitydistributions[13,§2],and max sincetheoptimizationprobleminherentinthedefinitionofthis (b) reflects that δλ is not necessarily the solution to the mini- functionisnotfeasible. mizationproblemR((1−λ)µ+λµ(cid:48)). (cid:4) Theconvexityoftheprivacy-moneyfunction(1)guarantees Theorem 1 (Monotonicity). The privacy-money function R(µ) its continuity on the interior of its domain, namely (0,µ ). max isnondecreasing. 
However, it can be checked, directly from the definition of R(µ), that continuity also holds at the interval endpoints, 0 Proof: Consider an alternative privacy-money function Rtwao(µin)ewquhaelrietythceoncsotnradiintitos,nµ(cid:80)(cid:54)iw(cid:80)iδiiw=iδiµ(cid:54)isµrmeapxl.aWceedsbhyallthfiersset arensduLlµatsmstaslxyh.,owwen winouthldis lsiukbesetoctipooni,nwt hoiucthtahreegveanliedrafloitryaofwtidhee show that this function is nondecreasing and, based on it, we variety of privacy functions f(t,p), provided that they are shallprovethemonotonicityofR(µ). non-negative,twicedifferentiableandconvexinthepair(t,p). Let 0 (cid:54) µ < µ(cid:48) (cid:54) µ , and denote by δ(cid:48) the solution to max Some examples of functions meeting these properties are the theminimizationproblemcorrespondingtoRa(µ(cid:48)).Clearly,δ(cid:48) squaredEuclideandistance(SED)andKLdivergence. is feasible to the problem Ra(µ) since µ(cid:48) > µ. Because the feasibilityofδ(cid:48)doesnotnecessarilyimplythatitisaminimizer oftheproblemcorrespondingtoRa(µ),itfollowsthat 3.3 ParametricSolution Ra(µ)(cid:54)f((1−δ(cid:48))p+δ(cid:48)q, p)=Ra(µ(cid:48)), Our next result, Lemma 4, provides a parametric solution to the minimization problem involved in the formulation of and hence that the alternative privacy-money function is non- the privacy-money trade-off (1) for certain privacy functions. decreasing. Eventhoughsaidlemmaprovidesaparametric-formsolution, This alternative function can be expressed in terms of the fortunately we shall be able to proceed towards an explicit originalone,bytakingR(µ)asaninneroptimizationproblem closed-formexpression,albeitpiecewise,forsomespecialcases of Ra(µ), namely Ra(µ) = minµ(cid:54)α(cid:54)µ R(α). Based on this and values of n. For the sake of notational compactness, we max expression, it is straightforward to verify that the only condi- definethedifferencetupled=(q1−p1,...,qn−pn). 
tionconsistentwiththefactthatRa(µ)isnondecreasingisthat Lemma 4 (General Parametric Solution). Let f be additively R(µ)benondecreasingtoo. (cid:4) separableintothefunctionsfi fori = 1,...,n.Foralli,let Next,wedefineaninterestingpropertyborrowedfrom[13] fi : [0,1] → R be twice differentiable in the interior of its forKLdivergence,thatwillbeusedinTheorem3toshowthe domain, with f(cid:48)(cid:48) > 0, and hence strictly convex. Because i convexityoftheprivacy-moneyfunction. f(cid:48)(cid:48) > 0, f(cid:48) is strictly increasing and therefore invertible. i i Definition2.Afunctionf(t,p)isconvexinthepair(t,p)if Denote the inverse by f(cid:48)−1. Now consider the following i optimizationprobleminthevariablesδ1,...,δn: f(λt +(1−λ)t ,λp +(1−λ)p ) 1 2 1 2 n (cid:54)λf(t1,p1)+(1−λ)f(t2,p2), (4) minimize (cid:88)fi(δi) faonrdaallllp0ai(cid:54)rsλof(cid:54)p1ro.babilitydistributions(t1,p1)and(t2,p2) subjectto 0i=(cid:54)1δi (cid:54)1fori=1,...,n, (5) n n (cid:88) (cid:88) diδi =0and wiδi =µ. Theorem3(Convexity).Iff(t,p)isconvexinthepair(t,p),then i=1 i=1 thecorrespondingprivacy-moneyfunctionR(µ)isconvex. Thesolutiontotheproblemexists,isuniqueandoftheform (cid:110) (cid:111) Proof: The proof closely follows the proof of Theorem 1 δ∗ =max 0,min{f(cid:48)−1(αd +βw ),1} , i i i i of [14]. We proceed by checking the definition of convexity, thatis,that for some real numbers α,β such that (cid:80)idiδi∗ = 0 and (1−λ)R(µ)+λR(µ(cid:48))(cid:62)R((1−λ)µ+λµ(cid:48)) (cid:80)iwiδi∗ =µ. for all 0 (cid:54) µ < µ(cid:48) (cid:54) µ and all 0 (cid:54) λ (cid:54) 1. Denote by δ and Proof:Weorganizetheproofintwosteps.Inthefirststep, max weshowthattheoptimizationproblemstatedinthelemmais δ(cid:48) the solutions to R(µ) and R(µ(cid:48)), respectively, and define convex; then we apply Karush-Kuhn-Tucker (KKT) conditions δλ =(1−λ)δ+λδ(cid:48).Accordingly, to said problem, and finally reformulate these conditions into (1−λ)R(µ)+λR(µ(cid:48))=(1−λ)f((1−δ)p+δq, p) a reduced number of equations. 
$$+\,\lambda f\big((1-\delta')p+\delta' q,\; p\big) \overset{(a)}{\geq} f\big((1-\lambda)((1-\delta)p+\delta q)+\lambda((1-\delta')p+\delta' q),\; p\big) = f\big((1-\delta_\lambda)p+\delta_\lambda q,\; p\big) \overset{(b)}{\geq} R\big((1-\lambda)\mu+\lambda\mu'\big),$$

where

The bulk of this proof comes later, in the second step, where we proceed to solve the system of equations.

To see that the problem is convex, simply observe that the objective function $f$ is the sum of strictly convex functions $f_i$, and that the inequality and equality constraint functions are affine. The existence and uniqueness of the solution are then a consequence of the fact that we minimize a strictly convex function over a convex set. Since the objective and constraint functions are also differentiable and Slater's constraint qualification holds, KKT conditions are necessary and sufficient conditions for optimality [12, §5]. The application of these optimality conditions leads to the following Lagrangian cost,

$$\mathcal{L} = \sum_i f_i(\delta_i) - \sum_i \lambda_i\delta_i + \sum_i \mu_i(\delta_i - 1) - \alpha\sum_i d_i\delta_i - \beta\Big(\sum_i w_i\delta_i - \mu\Big),$$

and finally to the conditions

$f_i'(\delta_i) - \lambda_i + \mu_i - \alpha d_i - \beta w_i = 0$ (dual optimality),
$\lambda_i\delta_i = 0,\ \mu_i(\delta_i - 1) = 0$ (complementary slackness),
$\lambda_i, \mu_i \geq 0$ (dual feasibility),
$0 \leq \delta_i \leq 1,\ \sum_i d_i\delta_i = 0,\ \sum_i w_i\delta_i = \mu$ (primal feasibility).

We may rewrite the dual optimality condition as $\lambda_i = f_i'(\delta_i) + \mu_i - \alpha d_i - \beta w_i$ and $\mu_i = \alpha d_i + \beta w_i - f_i'(\delta_i) + \lambda_i$. By eliminating the slack variables $\lambda_i$, $\mu_i$, and by substituting the above expressions into the complementary slackness conditions, we can formulate the dual optimality and complementary slackness conditions equivalently as

$$f_i'(\delta_i) + \mu_i \geq \alpha d_i + \beta w_i, \tag{6}$$
$$f_i'(\delta_i) - \lambda_i \leq \alpha d_i + \beta w_i, \tag{7}$$
$$\big(f_i'(\delta_i) + \mu_i - \alpha d_i - \beta w_i\big)\,\delta_i = 0, \tag{8}$$
$$\big(f_i'(\delta_i) - \lambda_i - \alpha d_i - \beta w_i\big)(\delta_i - 1) = 0. \tag{9}$$

In the following, we shall proceed to solve these equations which, together with the primal and dual feasibility conditions, are necessary and sufficient conditions for optimality. To this end, we consider these three possibilities for each $i$: $\delta_i = 0$, $0 < \delta_i < 1$ and $\delta_i = 1$.

We first assume $\delta_i = 0$. By complementary slackness, it follows that $\mu_i = 0$ and, in virtue of (6), that $f_i'(0) \geq \alpha d_i + \beta w_i$. We now suppose that this latter inequality holds and that $\delta_i > 0$. However, if $\delta_i$ is positive, by equation (7) we have $f_i'(\delta_i) \leq \alpha d_i + \beta w_i$, which contradicts the fact that $f_i'$ is strictly increasing. Hence, $\delta_i = 0$ if, and only if, $\alpha d_i + \beta w_i \leq f_i'(0)$.

Next, we consider the case $0 < \delta_i < 1$. Note that, when $\delta_i > 0$, it follows from the conditions (7) and (8) that $f_i'(\delta_i) \leq \alpha d_i + \beta w_i$, which, by the strict monotonicity of $f_i'$, implies $f_i'(0) < \alpha d_i + \beta w_i$. On the other hand, when $\delta_i < 1$, the conditions (9) and (6) and again the fact that $f_i'$ is strictly increasing imply that $\alpha d_i + \beta w_i < f_i'(1)$.

To show the converse, that is, that $f_i'(0) < \alpha d_i + \beta w_i < f_i'(1)$ is a sufficient condition for $0 < \delta_i < 1$, we proceed by contradiction and suppose that the left-hand side inequality holds and the solution is zero. Under this assumption, equation (9) implies that $\mu_i = 0$, and in turn that $f_i'(0) \geq \alpha d_i + \beta w_i$, which is inconsistent with the fact that $f_i'$ is strictly increasing. Further, assuming $\alpha d_i + \beta w_i < f_i'(1)$ and $\delta_i = 1$ implies that $\lambda_i = 0$ and, on account of (7), that $f_i'(1) \leq \alpha d_i + \beta w_i$, a contradiction. Consequently, the condition $0 < \delta_i < 1$ is equivalent to

$$f_i'(0) < \alpha d_i + \beta w_i < f_i'(1),$$

and the only conclusion consistent with (6) and (7) is that $f_i'(\delta_i) = \alpha d_i + \beta w_i$, or equivalently,

$$\delta_i = f_i'^{-1}(\alpha d_i + \beta w_i).$$

The last possibility corresponds to the case when $\delta_i = 1$, which by equations (8) and (7) implies $f_i'(1) \leq \alpha d_i + \beta w_i$. Next, we check that this latter condition is sufficient for $\delta_i = 1$. We first assume $0 < \delta_i < 1$. In this case, $\lambda_i = \mu_i = 0$ and the dual optimality conditions reduce to $f_i'(\delta_i) = \alpha d_i + \beta w_i$, which contradicts the fact that $f_i'$ is strictly increasing. Assuming $\delta_i = 0$, on the other hand, leads to $f_i'(0) \geq \alpha d_i + \beta w_i$, which runs contrary to the condition $f_i'(1) \leq \alpha d_i + \beta w_i$ and the strict monotonicity of $f_i'$.

In summary, $\delta_i = 0$ if $\alpha d_i + \beta w_i \leq f_i'(0)$, or equivalently, $f_i'^{-1}(\alpha d_i + \beta w_i) \leq 0$; $\delta_i = f_i'^{-1}(\alpha d_i + \beta w_i)$ if $f_i'(0) < \alpha d_i + \beta w_i < f_i'(1)$, or equivalently, $0 < f_i'^{-1}(\alpha d_i + \beta w_i) < 1$; and $\delta_i = 1$ if $\alpha d_i + \beta w_i \geq f_i'(1)$, or equivalently, $f_i'^{-1}(\alpha d_i + \beta w_i) \geq 1$. Accordingly, it is immediate to obtain the solution form given in the statement. ∎

As mentioned at the beginning of this subsection, the optimization problem presented in the lemma is the same as that of (1) but for additively separable, twice differentiable objective functions, with strictly increasing derivatives. Although these requirements obviously restrict the space of possible privacy functions of our analysis, some of the best known dissimilarity and distance functions satisfy them. This is the case of some of the most important examples of Bregman divergences [15], such as the SED, the KL divergence and the Itakura-Saito distance (ISD) [16]. In the interest of brevity, many of the results shown in this section will be derived only for some of these three particular distance measures. Due to its mathematical tractability, however, special attention will be given to the SED.

For notational simplicity, hereafter we shall denote by $z_i$ and $\gamma$ the column vectors $(d_i, w_i)$ and $(\alpha, \beta)$, respectively. A compelling result of Lemma 4 is the maximin form of the solution and its dependence on the inverse of the derivative of the privacy function. The particular form that each of the $n$ components of the solution takes, however, hinges on whether $d_i\alpha + w_i\beta$ is greater or less than the value of the derivative of $f_i$ at 0 and 1; equivalently, in our vector notation, the lemma shows that the solution is determined by the specific configuration of the $n$ slabs

$$\nabla f(0) \preceq z^T\gamma \preceq \nabla f(1),$$

where $\nabla f(0)$ denotes the gradient of $f$ at 0, and $z_i$ are the columns of $z$. In particular, the $i$-th component of the solution is equal to 0, 1 or $f_i'^{-1}(z_i^T\gamma)$ if, and only if, $z_i^T\gamma \leq f_i'(0)$, $z_i^T\gamma \geq f_i'(1)$, or $f_i'(0) < z_i^T\gamma < f_i'(1)$, respectively.

From the lemma, it is clear then that $\gamma$, which must satisfy the primal equality constraints $d^T\delta = 0$ and $w^T\delta = \mu$, is the parameter that configures the point of operation within the $\alpha$-$\beta$ plane where all such halfspaces lie. Informally, the region of this plane where $\gamma$ falls is what determines which precise components are 0, 1 and $f_i'^{-1}(z_i^T\gamma)$. Nevertheless, the problem when trying to determine the particular form of each of the $n$ components is the apparent arbitrariness and lack of regularity of the layout drawn by their corresponding slabs, which makes it difficult to obtain an explicit closed-form solution for any given $\mu$, $q$, $p$, $w$ and $n$. Especially for large values of $n$, conducting a general study of the optimal trade-off between privacy and economic reward becomes intractable.

Motivated by all this, our analysis of the solution and the corresponding trade-off focuses on some specific, albeit riveting, cases of slabs layouts. In particular, Sec. 3.5 will examine several instantiations of the problem (5) for small values of $n$. Afterwards, Sec. 3.6 will tackle the case of large $n$ for some special layouts that will permit us to systematize our theoretical analysis. Fig. 4 shows a configuration of slabs for $n = 6$, and illustrates the conditions that define an optimal strategy.
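The max-min form of the solution in Lemma 4 can be sketched numerically. The following Python sketch is illustrative only and is not part of the original analysis. It assumes the SED privacy function, taken here as $f_i(\delta_i) = d_i^2\delta_i^2$ so that $f_i'^{-1}(x) = x/(2d_i^2)$, and it assumes $d_i \neq 0$ and $0 < \mu < \sum_i w_i$. The multipliers $\gamma = (\alpha, \beta)$ are found by a nested bisection, which is a choice made for this example rather than a method taken from the text; the names `disclosure_sed`, `solve_multipliers` and the search bound `alpha_bound` are hypothetical.

```python
def clip01(x):
    """Project x onto [0, 1]; this is the max-min form of Lemma 4."""
    return min(1.0, max(0.0, x))


def disclosure_sed(d, w, alpha, beta):
    """Disclosure strategy for given multipliers gamma = (alpha, beta).

    Assumes the SED, f_i(delta) = d_i^2 * delta^2, so the inverse of f_i'
    is x / (2 d_i^2). Requires d_i != 0 for every i.
    """
    return [clip01((alpha * di + beta * wi) / (2.0 * di * di))
            for di, wi in zip(d, w)]


def _bisect(g, lo, hi, iters=200):
    """Approximate root of a nondecreasing function g on [lo, hi]."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def solve_multipliers(d, w, mu, alpha_bound=1e3):
    """Find (alpha, beta) with sum d_i delta_i = 0 and sum w_i delta_i = mu.

    Hypothetical helper: nested bisection, assuming 0 < mu < sum(w) and
    that |alpha| <= alpha_bound brackets the inner root.
    """
    if not 0.0 < mu < sum(w):
        raise ValueError("mu must lie strictly between 0 and sum(w)")

    def alpha_of(beta):
        # For fixed beta, sum_i d_i * delta_i is nondecreasing in alpha.
        def balance(alpha):
            delta = disclosure_sed(d, w, alpha, beta)
            return sum(di * x for di, x in zip(d, delta))
        return _bisect(balance, -alpha_bound, alpha_bound)

    def reward_gap(beta):
        # Reward collected at the balanced alpha, minus the target mu.
        delta = disclosure_sed(d, w, alpha_of(beta), beta)
        return sum(wi * x for wi, x in zip(w, delta)) - mu

    beta_hi = 1.0
    while reward_gap(beta_hi) < 0.0:  # grow the bracket until mu is reachable
        beta_hi *= 2.0
    beta = _bisect(reward_gap, 0.0, beta_hi)
    return alpha_of(beta), beta


# Example with n = 2: d = q - p sums to zero, w holds the per-category rates.
d = [0.2, -0.2]
w = [1.0, 3.0]
alpha, beta = solve_multipliers(d, w, mu=1.0)
delta = disclosure_sed(d, w, alpha, beta)
# Both components should come out close to mu / sum(w) = 0.25.
```

In this example the two equality constraints force $\delta_1 = \delta_2$ and $4\delta_1 = 1$, so the numerical result can be checked by hand. The bisection is only a convenient way to locate $\gamma$ on the $\alpha$-$\beta$ plane; any root finder that respects the clipping would serve the same purpose.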
Fig. 4: Slabs layout on the $\alpha$-$\beta$ plane for $n = 6$ categories. Each component of the solution is determined by a slab and, in particular, by the specific $\gamma$ falling on the plane. We show in dark blue the lower and upper hyperplanes of the $i$-th slab. In general, it will be difficult to proceed towards an explicit closed-form solution and to study the corresponding optimal privacy-money trade-off for any configuration of these slabs and any $\gamma$ and $n$.

3.4 Origin of Lower Hyperplanes

Despite the arbitrariness of the layout depicted by the slabs associated with a particular instantiation of the problem (5), we shall next derive an interesting property for some specific privacy functions. The property in question is related to the need to establish a fixed point of reference for the geometry of the solutions space.

Proposition 5 (Intersection of Lower Hyperplanes). In the case when $q \neq p$, if $d_i f_j'(0) = d_j f_i'(0)$ for all $i,j = 1,\dots,n$ and $i \neq j$, then the hyperplanes $z_i^T\gamma = f_i'(0)$ for $i = 1,\dots,n$ all intersect at a single point $O$ on the plane $\alpha$-$\beta$.

Proof: Clearly, the consequent of the statement is true if, and only if, the system of equations $z^T\gamma = \nabla f(0)$ has a unique solution. We proceed by proving that the rank of the coefficient and augmented matrices is equal to 2 under the conditions stated in the proposition.

On the one hand, recall that $z_i = (d_i, w_i)$ is the $i$-th column of $z$, and check that the rank of $z$ is two if, and only if, $d_i w_j \neq d_j w_i$ for some $i,j = 1,\dots,n$ and $i \neq j$. That said, now we show that the consequent of this biconditional statement is true provided that $q \neq p$. To this end, we assume, by contradiction, that $\mathrm{sgn}(d_1) = \cdots = \mathrm{sgn}(d_n)$, where $\mathrm{sgn}(\cdot)$ is the sign function. If $d_i = q_i - p_i > 0$ for $i = 1,\dots,n$, we have $1 = \sum_i q_i > \sum_i p_i = 1$, a contradiction. The case $d_i < 0$ for all $i$ leads to an analogous contradiction, and the case $d_i = 0$ (for all $i$) contradicts the fact that $q \neq p$. Hence, the condition $q \neq p$ implies that there must exist some indexes $i,j$ with $i \neq j$ such that $\mathrm{sgn}(d_i) \neq \mathrm{sgn}(d_j)$, which in turn implies that $d_i w_j \neq d_j w_i$, and that $\mathrm{rank}(z) = 2$.

On the other hand, to check the rank of the augmented matrix, observe that the determinant of any $3\times 3$ submatrix with rows $i,j,k$ yields

$$\det\big(z \,|\, \nabla f(0)\big) = -\,w_i\big(d_j f_k'(0) - d_k f_j'(0)\big) + w_j\big(d_i f_k'(0) - d_k f_i'(0)\big) - w_k\big(d_i f_j'(0) - d_j f_i'(0)\big).$$

From this expression, it is easy to verify that $\mathrm{rank}(z \,|\, \nabla f(0)) = 2$ if all terms $d_i f_j'(0) - d_j f_i'(0)$ with $i \neq j$ vanish, which ensures, by the Rouché-Capelli theorem [17], that there exists a unique solution to $z^T\gamma = \nabla f(0)$. ∎

The importance of Proposition 5 is clear: for some privacy functions and distributions $q$ and $p$, the existence of a sort of origin of coordinates in the slabs layout may reveal certain regularities which may help us systematize the analysis of the solutions space. For example, a trivial consequence of the intersection of all lower hyperplanes on $O$ is that any $\gamma$ lying in the interior of a bounded polyhedron will lead to a solution with at least one component of the form $f_i'^{-1}(z_i^T\gamma)$. When the assumptions of the above proposition are not satisfied, however, this property may not hold for any $n$ and the choice of the origin may not be evident.

In the next subsections, we shall investigate the optimal trade-off between privacy and money for several particular cases. As we shall see, these cases will leverage certain regularities derived from, or as a result of, said reference point on the $\alpha$-$\beta$ plane. Before that, however, our next result, Corollary 6, provides such a point for each of the three privacy functions considered in our analysis.

Corollary 6. Consider the nontrivial case when $q \neq p$. The solution to $z^T\gamma = \nabla f(0)$ is unique and yields $(0,0)$ for the squared Euclidean and the Itakura-Saito distances, and $(1,0)$ for the KL divergence.

Proof: We obtain the result as a direct application of Proposition 5. Note that the gradient of the squared Euclidean and the Itakura-Saito distances vanishes at $\delta = 0$. In the case of the KL divergence, $\nabla f(0) = (d_1,\dots,d_n)$. Clearly, in the three cases investigated, the condition $d_i f_j'(0) = d_j f_i'(0)$ for all $i \neq j$ in the proposition is satisfied, which implies that the solution is unique. Then, it is immediate to derive the solutions claimed in the statement. ∎

Although it seems rather obvious, the above corollary actually tells us something of real substance. In particular, for the three privacy functions under study, $O$ depends neither on a user's profile nor on the particular initial distribution chosen. This result therefore shows the appropriateness of basing our analysis on such functions.

3.5 Case n ≤ 3

We start our analysis of several specific instantiations of the problem (5) for small values of the number of interest categories $n$. We shall first tackle the case $n = 2$ and afterwards the case $n = 3$.

The special case $n = 2$ reflects a situation in which a user may be willing to group the original set of topics (e.g., business, entertainment, health, religion, sports) into a "sensitive" category (e.g., health, religion) and a "non-sensitive" category (e.g., business, entertainment, sports), and disclose their interests accordingly. Evidently, this grouping would require that the user specify the same rate $w_i$ for all topics belonging to one of these two categories. Our next result, Theorem 7, presents a closed-form solution to the minimization problem involved in the definition of function (1) for this special case. As we shall see now, this result can be derived directly from the primal feasibility conditions.

Theorem 7 (Case n = 2, and SED and KL divergence). Let $f : [0,1]\times[0,1] \to \mathbb{R}_+$ be continuous on the interior of its domain.

(i) For any $\mu \in [0, \mu_{\max}]$ and $i = 1,2$, the optimal disclosure strategy is $\delta_i^* = \mu/\mu_{\max}$.

(ii) In the case of the SED and KL divergence, the corresponding minimum distance yields the privacy-money functions

$$R_{\mathrm{SED}}(\mu) = 2\left(d_i\,\frac{\mu}{\mu_{\max}}\right)^2 \quad\text{and}\quad R_{\mathrm{KL}}(\mu) = \sum_{i=1}^{2}\left(d_i\,\frac{\mu}{\mu_{\max}} + p_i\right)\log\left(\frac{d_i\,\mu/\mu_{\max}}{p_i} + 1\right).$$

Proof: Since $n = 2$, we have that $d_1 = -d_2$, which, by virtue of the primal condition $\sum_i d_i\delta_i^* = 0$, implies that $\delta_1^* = \delta_2^*$. Then, from the other primal condition $\sum_i w_i\delta_i^* = \mu$, it is immediate to obtain the solution claimed in assertion (i) of the theorem. Finally, it suffices to substitute the expression of $\delta^*$ into the functions $f_{\mathrm{SED}}(\delta) = \sum_i (t_i^* - p_i)^2$ and $f_{\mathrm{KL}}(t^*, p) = \sum_i t_i^*\log(t_i^*/p_i)$, to derive the optimal trade-off function $R(\mu)$ in each case. ∎

In light of Theorem 7, we would like to remark on the simple, linear form of the solution, which, more importantly, is valid for a set of privacy functions which is larger than that considered in Lemma 4. In particular, not only the KL divergence, the squared Euclidean and the Itakura-Saito distances satisfy the conditions of this theorem, but also many others which are not differentiable (e.g., total variation distance) nor additively separable (e.g., Mahalanobis distance).

Another straightforward consequence of Theorem 7 is that the optimal strategy implies revealing both categories (e.g., sensitive and non-sensitive) simultaneously and with the same level of disclosure. In other words, if a user decides to show a fraction of their interest in one category, that same fraction must be disclosed on the other category so as to attain the maximum level of privacy protection.

Before proceeding with Theorem 8, first we shall introduce what we term money thresholds, two rates that will play an important role in the characterization of the solution to the minimization problem (5) for $n = 3$. Also, we shall introduce some definitions that will facilitate the exposition of the aforementioned theorem.

For $i = 1,\dots,n$, denote by $m_i$ the slope of vector $z_i$, i.e., $m_i = w_i/d_i$. Let $\bar m_i$ and $\sigma^2_{m_i}$ be the arithmetic mean and variance of all but the $i$-th slope. When the subindex $i \notin X$, observe that the mean and variance are computed from all slopes. Accordingly, define the money thresholds $\mu_j$ as

$$\mu_j = \min_{i \neq 2j}\ \frac{(j+1)\,d_i\,\sigma^2_{m_{2j}}}{m_i - \bar m_{2j}} \quad\text{for } j = 1,2.$$

Additionally, we define the relative coefficient of variation of the ratio $w_i/d_i$ as

$$v_{i,j} = \frac{m_i - \bar m_j}{\sigma^2_{m_j}} \tag{10}$$

for $i,j = 1,\dots,n$, which may be regarded as the inverse of the index of dispersion [18], a measure commonly utilized in statistics and probability theory to quantify the dispersion of a probability distribution. As we shall show in the following result, our coefficient of variation will determine the closed-form expression of the optimal disclosure strategy.

Theorem 8 (Case n = 3 and SED). For $n = 3$ and the SED function, assume without loss of generality $m_1 \geq m_2 \geq m_3$. Either $w_{j+1} \leq d_{j+1}\bar m_{j+1}$ for $j = 1$ and $m_1 > m_3$, or $w_j > d_j\bar m_j$ for $j = 2$. For the corresponding index $j$ and for any $\mu \leq \mu_j$, the optimal disclosure strategy is

$$\delta_i^* = \begin{cases} \dfrac{v_{i,2j}}{(j+1)\,d_i}\,\mu, & i \neq 2j,\\[2mm] 0, & i = 2j, \end{cases}$$

and the corresponding minimum SED yields the privacy-money function

$$R_{\mathrm{SED}}(\mu) = \frac{\mu^2}{(j+1)\,\sigma^2_{m_{2j}}}.$$

Proof: It is straightforward to verify that the SED function exposes the structure of the optimization problem addressed in Lemma 4. Note that, according to the lemma, the components of the solution such that $0 < \delta_i < 1$ for some $i = 1,2,3$ are given by the inverse of the derivative of the privacy function and yield

$$f_i'^{-1}(\alpha d_i + \beta w_i) = \frac{\alpha}{2d_i} + \frac{w_i\,\beta}{2d_i^2}.$$

To check that a solution does not admit only one positive component, simply observe that, in that case, the system of equations composed of the two primal equality conditions $\sum_i d_i\delta_i = 0$ and $\sum_i w_i\delta_i = \mu$ is inconsistent.

Having shown that there must be at least two positive components, we apply such primal equality conditions to a solution with $0 < \delta_1, \delta_3 < 1$. To verify these two equalities are met, first note that the former is equivalent to $\alpha + \beta\,\bar m_2 = 0$, and the latter can be written equivalently as

$$\alpha\,\bar m_2 + \frac{\beta}{2}\sum_{i=1,3} m_i^2 = \mu.$$

Then, observe that the condition $m_1 > m_3$ in the theorem ensures that the determinant of the homogeneous system is nonzero, and, accordingly, that the Lagrange multipliers that solve these two equations are

$$\alpha = -\frac{\bar m_2}{\sigma^2_{m_2}}\,\mu \quad\text{and}\quad \beta = \frac{1}{\sigma^2_{m_2}}\,\mu. \tag{11}$$

Finally, it suffices to substitute the expressions of $\alpha$ and $\beta$ into the function $f_i'^{-1}$, to obtain the solution with two nonzero optimal components claimed in the theorem.

Next, we derive the conditions under which this solution is defined. With this aim, just note that the inequalities $z_1^T\gamma > f_1'(0)$ and $z_3^T\gamma > f_3'(0)$ are equivalent to $d_1(m_1 - \bar m_2) > 0$ and $d_3(m_3 - \bar m_2) > 0$, respectively. On the other hand, $\delta_2 = 0$ if, and only if, $z_2^T\gamma \leq f_2'(0)$, or equivalently, $d_2(m_2 - \bar m_2) \leq 0$.

We now show that when there are two components $0 < \delta_i, \delta_j < 1$, then $i = 1$ and $j = 3$. To this end, we shall examine the case $0 < \delta_2, \delta_3 < 1$ and $\delta_1 = 0$. The other possible case, $0 < \delta_1, \delta_2 < 1$ and $\delta_3 = 0$, proceeds along the same lines and is omitted.

First, though, we shall verify that $d_1 \geq 0$, a condition that will be used later on. We proceed by contradiction. Since $w_i > 0$ for all $i$, a negative $d_1$ implies, by the ordering assumption $m_1 \geq m_2 \geq m_3$, that $d_2, d_3 < 0$. But having $d_i < 0$ for $i = 1,2,3$ leads us to the contradiction $0 > \sum_i d_i = \sum_i q_i - \sum_i p_i = 0$. Consequently, $d_1$ is nonnegative, but by virtue of (3), it follows that $d_1 > 0$.

Having verified the positiveness of $d_1$, next we contemplate the case when $0 < \delta_2, \delta_3 < 1$ and $\delta_1 = 0$. Note that, in this case, the condition $\delta_1 = 0$ holds if, and only if, $d_1(m_1 - \bar m_1) \leq 0$. However, since $d_1 > 0$, we have that $m_1 \leq \tfrac{1}{2}(m_2 + m_3)$, which contradicts the fact that $m_1 \geq m_2 \geq m_3$ and $m_1 > m_3$. Consequently, it is not possible to have $0 < \delta_2, \delta_3 < 1$ and $\delta_1 = 0$. The case when $0 < \delta_1, \delta_2 < 1$ and $\delta_3 = 0$ leads to another contradiction and the conclusion that $0 < \delta_1, \delta_3 < 1$ and $\delta_2 = 0$.

Next, we check the validity of the conditions under which this solution is defined. Recall that these conditions are $d_1(m_1 - \bar m_2) > 0$, $d_3(m_3 - \bar m_2) > 0$ and $d_2(m_2 - \bar m_2) \leq 0$. It is easy to verify that the former two inequalities hold, since the arithmetic mean is strictly smaller (greater) than the extreme value $m_1$ ($m_3$); the strictness of the inequality is due to the assumption $m_1 > m_3$ in the statement. On the other hand, the latter inequality is the condition assumed in the statement of the theorem. Therefore, we have $0 < \delta_1, \delta_3 < 1$ and $\delta_2 = 0$ if, and only if, $w_2 \leq d_2\bar m_2$.

Next, we turn to the case when $0 < \delta_1, \delta_2, \delta_3 < 1$. By applying the two primal equality constraints of the optimization problem (5), we obtain the system of equations

$$\frac{3}{2}\begin{bmatrix} 1 & \bar m_0 \\ \bar m_0 & \frac{1}{3}\sum_{i=1}^{3} m_i^2 \end{bmatrix}\begin{bmatrix}\alpha \\ \beta\end{bmatrix} = \begin{bmatrix} 0 \\ \mu \end{bmatrix},$$

and note that the solution is unique on account of the fact that $\mathrm{sgn}(d_i) \neq \mathrm{sgn}(d_j)$ for some $i,j = 1,2,3$ and $i \neq j$, which implies that $\sigma^2_{m_0} > 0$. Substituting the values

$$\alpha = -\frac{2\,\bar m_0}{3\,\sigma^2_{m_0}}\,\mu \quad\text{and}\quad \beta = \frac{2}{3\,\sigma^2_{m_0}}\,\mu \tag{12}$$

into $f_i'^{-1}(z_i^T\gamma)$ gives the expression of the optimal disclosure strategy stated in the theorem for $0 < \delta_1, \delta_2, \delta_3 < 1$.

Now, we examine the necessary and sufficient conditions for this optimal strategy to be possible, which, according to the lemma, are $0 < z_i^T\gamma < 2d_i^2$ for $i = 1,2,3$. To this end, note that the left-hand inequalities can be recast as $d_i(m_i - \bar m_0) > 0$, for $i = 1,2,3$. We immediately check that the inequalities for $i = 1$ and $i = 3$ hold, as the mean is again strictly smaller (greater) than the extreme value $m_1$ ($m_3$). The strictness of these two inequalities is due to the fact that $\sum_{i=1}^{3} d_i = 0$ and the assumption (3). On the other hand, observe that

$$\mathrm{sgn}(m_2 - \bar m_0) = \mathrm{sgn}(m_2 - \bar m_2),$$

and therefore that the condition $d_2(m_2 - \bar m_0) > 0$ is equivalent to $d_2(m_2 - \bar m_2) > 0$. That said, note that $d_2(m_2 - \bar m_2) > 0$ is the negation of the condition for having a solution with two nonzero components smaller than one. Accordingly, we have either two or three components of this form, as stated in the theorem.

To show the validity of the solution in terms of $\mu$, observe that, for $w_2 \leq d_2\bar m_2$, the parameterized line $(\alpha(\mu), \beta(\mu))$ moves within the space determined by the intersection of the slabs 1 and 3. To obtain the range of validity of a solution such that $0 < \delta_1, \delta_3 < 1$ and $\delta_2 = 0$, we need to find the closest point of intersection (to the origin) with either the upper hyperplane 1 or the upper hyperplane 3. Put differently, we require finding the minimum $\mu$ such that either $z_1^T\gamma = f_1'(1)$ or $z_3^T\gamma = f_3'(1)$. By plugging the values of $\alpha$ and $\beta$ given in (11) into these two equalities, it is straightforward to derive the money threshold $\mu_1$. We proceed similarly to show the interval of validity $[0, \mu_2]$ in the case when $w_2 > d_2\bar m_2$, bearing in mind that now $\alpha$ and $\beta$ are given by (12).

To conclude the proof, it remains only to write $R(\mu)$ in terms of the optimal apparent distribution, that is, $R(\mu) = \sum_{i=1}^{n}(t_i - p_i)^2 = \sum_{i=1}^{n} d_i^2\,\delta_i^2$, and from this, it is routine to obtain the expression given at the end of the statement. ∎

Theorem 8 provides an explicit closed-form solution to the problem of optimal profile disclosure, and characterizes the corresponding trade-off between privacy and money. Although it rests on the assumption that $\mu < \mu_1, \mu_2$ and, for the sake of tractability and brevity, tackles only the case of SED, the provided results shed light on the behavior of the solution and the trade-off, and enable us to establish interesting connections with concepts from statistics and estimation theory.

In particular, the most significant conclusion that follows from the theorem is the intuitive principle upon which the optimal disclosure strategy operates. On the one hand, in line with the results obtained in Theorem 7, the solution does not admit only one positive component: we must have either two or three active components. On the other hand, and more importantly, the optimal strategy is linear with the relative coefficient of variation of the ratio $w_i/d_i$, a quantity that is closely related to the index of dispersion, also known as Fano factor¹.

The solution, however, does not only depend on $v_{i,j}$ but also on the difference between the interest value of the actual profile and that of the initial PMF. Essentially, the optimized disclosure works as follows. We consider the category $i$ with the largest value $w_i$, which in practice may correspond to the most sensitive category. For that category, if $d_i$ is small and $m_i$ is the ratio that deviates the most from the mean value (relative to the variance), then the optimal strategy suggests disclosing the profile mostly in that given category. This conforms to intuition since, informally, revealing small differences $q_i - p_i$ when $w_i$ is large may be sufficient to satisfy the broker's demand, i.e., the condition $\sum_i w_i\delta_i = \mu$, and this revelation may not have a significant impact on user privacy². On the other hand, if $d_i$ is comparable to $w_i$, and $m_i$ is close to the mean value, then $\delta^*$ recommends that the user give priority to other categories when unfolding their profile.

Also, from this theorem we deduce that the optimal trade-off depends quadratically on the offered money, exactly as with the case $n = 2$, and inversely on the variance of the ratios $m_1, m_2, m_3$.

Last but not least, we would like to remark that, although Theorem 8 does not completely³ characterize the optimal strategy nor the corresponding trade-off for any $q$, $p$, $w$ and $\mu$ for $n = 3$, the proof of this result does show how to systematize the analysis of the solution for any instance of those variables. Sec. 4 provides an example that illustrates this point.

¹ The difference with respect to these quantities is that our measure of dispersion inverts the ratio of variance to mean, and also reflects the deviation with the particular value attained by a given component.
² Bear in mind that, when using SED to assess privacy, small values of $d_i$ lead to quadratically small values of privacy risk.
³ That is, for all values of $\mu$.

3.6 Case n ≥ 3 and Conical Regular Configurations

In this subsection, we analyze the privacy-money trade-off for large values of $n$, starting from 3. To systematize this analysis, however, we shall restrict it to a particular configuration of the slabs layout, defined next. Then, Proposition 10 will show an interesting property of this configuration, which will allow us to derive an explicit closed-form expression of both the solution and trade-off for an arbitrarily large number of categories.

Definition 9. For a given $q$, $p$, $w$ and $n \geq 3$, let $C$ be the collection of slabs on the plane $\alpha$-$\beta$ that determines the corresponding solution to (5) stated in Lemma 4. Without loss of generality, assume $\frac{1}{m_1} > \cdots > \frac{1}{m_n}$. Define $A_i$, $b_i$ and $b_i'$ as

$$A_i = \begin{bmatrix} z_i^T \\ z_{i-1}^T \\ z_1^T \end{bmatrix},\quad b_i = \begin{bmatrix} f_i'(0) \\ f_{i-1}'(1) \\ f_1'(1) \end{bmatrix} \quad\text{and}\quad b_i' = \begin{bmatrix} f_i'(1) \\ f_{i-1}'(1) \\ f_1'(0) \end{bmatrix}.$$

Then, $C$ is called a conical regular configuration if each of the systems of equations $A_i\gamma = b_i$ and $A_i\gamma = b_i'$ for $i = 3,\dots,n$ has a unique solution.

Fig. 5: A conical regular configuration for $n = 4$ on the $\alpha$-$\beta$ plane. In this figure, we show the segments of hyperplanes $r_2(\varphi)$ and $r_3(\varphi)$, given respectively by the angular coordinates $\varphi_2 \leq \varphi \leq \varphi_3$ and $\varphi_3 \leq \varphi \leq \varphi_4$. The cone defined by $r \geq 0$ and $\varphi_3 \leq \varphi \leq \varphi_4$ is intersected by the upper hyperplanes 1, 2 and 3. However, none of these hyperplanes intersect one another in the interior of the cone in question.

Proposition 10. Suppose that there exists a conical regular configuration $C$ for some $q$, $p$, $w$ and $n$. Denote by $\gamma^{a,b}_{i,j}$ the unique solution to

$$\begin{cases} z_i^T\gamma = f_i'(a) \\ z_j^T\gamma = f_j'(b) \end{cases}$$

for $i,j = 1,\dots,n$ with $i \neq j$, and $a,b \in \{0,1\}$. Assume $f_i'(0) \neq f_i'(1)$ for all $i$. Then, except for $\gamma^{1,1}_{1,n}$, $C$ satisfies

$$z_k^T\gamma^{a,b}_{i,j} = f_k'(0) \tag{13}$$

for some $k = 1,\dots,n$ and all $i \neq j$.

Proof: The existence and uniqueness of $\gamma^{a,b}_{i,j}$ is guaranteed by the fact that $\frac{1}{m_1} > \cdots > \frac{1}{m_n}$. The property stated in the proposition follows from the fact that the systems of equations $A_i\gamma = b_i$ and $A_i\gamma = b_i'$ for $i = 3,\dots,n$ have a unique solution.

The systems of equations of the form $A_i\gamma = b_i$ ensure that $\gamma^{1,1}_{i,1} = \gamma^{0,1}_{i+1,1}$ for $i = 2,\dots,n-1$. Obviously, any $\gamma^{a,b}_{i,j}$ such that $a = 0$ or $b = 0$ with $i \neq j$ satisfies (13) for $k = i$ or $k = j$. Accordingly, we just need to prove the case $a = b = 1$.

Suppose $i > j$. Note that $A_i\gamma = b_i'$ implies, on the one hand, that

$$\gamma^{1,1}_{i,i-1} = \gamma^{1,0}_{i-1,1} = \gamma^{1,0}_{i-2,1} = \cdots = \gamma^{1,0}_{j,1},$$

and on the other hand, that $\gamma^{1,1}_{j,j-1} = \gamma^{1,0}_{j,1}$. Thus, $\gamma^{1,1}_{i,i-1} = \gamma^{1,1}_{j,j-1}$, from which it follows that $\gamma^{1,1}_{i,j} = \gamma^{1,0}_{j,1}$. The exception, i.e., $z_k^T\gamma^{1,1}_{1,n} \neq f_k'(0)$ for all $k = 1,\dots,n$, is justified by the conditions $f_i'(0) \neq f_i'(1)$ for all $i$, which guarantee that all slabs have nonempty interiors, and the strict ordering $\frac{1}{m_1} > \cdots > \frac{1}{m_n}$. ∎

The previous proposition shows a remarkable feature of the conical regular configuration: at a practical level, the fact that all intersections on the plane $\alpha$-$\beta$ (except $\gamma^{1,1}_{1,n}$) lie on lower hyperplanes suggests utilizing these hyperplanes, parameterized in polar coordinates with respect to the origin $O$, to efficiently delimit the solutions space. In other words, in our endeavor to systematize the study of the solution and trade-off, it may suffice to use a reduced number of cases, bounded by angles and segments of hyperplanes.

On the other hand and from a geometric standpoint, any consecutive pair of lower hyperplanes defines a cone without intersections in its interior; hence the name of the configuration. Finally, as the slabs are sorted in increasing order of their slopes, we can go counter-clockwise from slab 1 to $n$, and start again at the line through $O$ and $\gamma^{1,1}_{1,n}$, which serves as a reference axis.

Before we continue examining this concrete configuration, we shall introduce some notation. Let $\varphi$ and $r$ be the polar coordinates of $\gamma$. Define the angle thresholds $\varphi_k$ as

$$\varphi_k = \begin{cases} \arctan\left(-\dfrac{d_k}{w_k}\right), & k = 1,\dots,n,\\[2mm] \arctan\dfrac{d_n f_1'(1) - d_1 f_n'(1)}{w_1 f_n'(1) - w_n f_1'(1)}, & k = n+1,\\[2mm] \varphi_{k-n-1} + \pi, & k = n+2,\dots,2n+1, \end{cases}$$

and the segments of upper hyperplanes $r_j$ as

$$r_j(\varphi) = \frac{f_j'(1)}{z_j^T\begin{bmatrix}\cos\varphi \\ \sin\varphi\end{bmatrix}}$$

for $j = 1,\dots,n$. Note that $\varphi_{n+1}$ is the angular coordinate of $\gamma^{1,1}_{1,n}$. Occasionally, we shall omit the dependence of these line segments on the angular coordinate $\varphi$. Figure 5 illustrates these coordinates and segments on a conical regular configuration for $n = 4$.

Our next result, Lemma 11, provides a parametric solution in the special case when the slabs layout exhibits such a configuration. The solution is determined by the aforementioned thresholds and line segments, and is valid for any privacy function satisfying the properties stated in Lemma 4. As we shall show next, this result will be instrumental in proving Theorem 12.

Lemma 11 (Conical Regular Configurations). Under the conditions of Lemma 4, assume that there exists a conical regular configuration. Consider the following cases:

(a) $\varphi_k < \varphi \leq \varphi_{k+1}$ for $k = 1$ and, either $r < r_j$ for $j = 1$ or $r_{j-1} \leq r$ for $j = 2$; and $\varphi_k < \varphi \leq \varphi_{k+1}$ for $k = 2$ and, either $r < r_j$ for $j = 1$, or $r_{j-1} \leq r < r_j$ for $j = 2$, or $r \geq r_{j-1}$ for $j = 3$.
(b) $\varphi_k < \varphi \leq \varphi_{k+1}$ for some $k = 3,\dots,n$ and, either $r < r_{j+1}$ for $j = 1$, or $r_j \leq r < r_{j+1}$ for some $j = 2,\dots,k-2$, or $r_j \leq r < r_{j+2\,(\mathrm{mod}\,k)}$ for $j = k-1$, or $r_{j+1\,(\mathrm{mod}\,k)} \leq r < r_j$ for $j = k$, or $r \geq r_{j-1}$ for $j = k+1$.
(c) $\varphi_k < \varphi < \varphi_{k+1}$ for $k = n+1$ and, either $r < r_{j+1}$ for $j = 1$, or $r_j \leq r < r_{j+1}$ for some $j = 2,\dots,n-1$, or $r_j \leq r < r_1$ for $j = n$, or $r \geq r_{j-n}$ for $j = n+1$.
(d) $\varphi_k \leq \varphi < \varphi_{k+1}$ for some $k = n+2,\dots,2n$ and, either $r < r_{n-j+1}$ for $j = 1$, or $r_{n-j+2} \leq r < r_{n-j+1}$ for some $j = 2,\dots,2n-k+1$, or $r \geq r_{n-j+2}$ for $j = 2(n+1)-k$.

Let $\delta^*$ be the solution to the problem (5). Accordingly,
