Hypothesis Testing under Maximal Leakage Privacy Constraints Jiachun Liao, Lalitha Sankar Flavio P. Calmon Vincent Y. F. Tan School of Electrical, Computer IBM T.J. Watson Research Center Department of ECE, and Energy Engineering, in Yorktown Heights, New York Department of Mathematics, Arizona State University Email: [email protected] National University of Singapore Email: {jiachun.liao,lalithasankar}@asu.edu Email: [email protected] Abstract—Theproblemofpublishingprivacy-guaranteeddata binary sources for any leakage level; for arbitrary source 7 1 forhypothesistestingisstudiedusingthemaximalleakage(ML) alphabets we obtain the PUT in the high and low privacy as a metric for privacy and the type-II error exponent as the 0 regimes. utility metric. The optimal mechanism (random mapping) that 2 Our results show that the maximal leakage mechanism in maximizesutilityforaboundedleakageguaranteeisdetermined n for the entire leakage range for binary datasets. For non-binary trying to simultaneously ensure utility and restrict leakage a datasets, approximations in the high privacy and high utility yields output symbols that reveal either complete or partial J regimes are developed. The results show that, for any desired knowledge about a subset of the source symbols. Specifically 4 leakage level, maximizing utility forces the ML privacy mecha- for the binary case, the mechanism is such that the published 2 nismtorevealpartialtocompleteknowledgeaboutasubsetofthe data reveals the original data for one of the source letters source alphabet. The results developed on maximizing a convex ] function over a polytope may also of an independent interest. without any uncertainty! This behavior results from a combi- T nationofthemaximalleakagerequirement(feasiblepolytope) I I. INTRODUCTION and the convexity of the relative entropy as a function of the . s privacy mechanism (randomized mapping). c There is tremendous value to publishing datasets for a vari- Notation: We use bold capital letters to represent matrices, [ ety of statistical inference applications; however, it is crucial e.g.Xisamatrixwhoseith columnisX and(i,j)th entryis 1 to ensure that the published dataset while providing utility i X . We use bold lower case letters to represent row vectors, v does not reveal other information. Specifically, the published ij e.g. x is a vector with the ith entry x . We denote sets by 9 dataset should allow the intended inference to be made while i 9 capitalcalligraphicletters,e.g.,X.Foravectorxwithentries limiting other inferences. This requires using a randomizing 0 x ,[x]isadiagonalmatrixwhosetheith diagonalentryisx . mechanism (i.e., a noisy channel) that guarantees a certain i i 7 We use (cid:107)·(cid:107) and log(·) to denote the (cid:96) norm and logarithm 0 measure of privacy; however, any such privacy mechanism 2 with base 2, respectively; D denotes the relative entropy. . will, in turn, reduce the fidelity of the intended inference 1 The phase “column permutation” implies the application of leading to a tradeoff between utility of the published data and 0 a permutation operation on the columns of a matrix. 7 the privacy of the respondents in the dataset. 1 Recently, in [1], Issa et al. (see also [2]) propose the II. PROBLEMFORMULATION : metric of maximal leakage (ML) as a measure of the gain, v Binary hypothesis testing is a statistical inference problem i in bits, in guessing any function of the original data from concerning the decision between two distinct probability dis- X the published data and show that it is effectively the Sibson tributions of the observed data. Let Xn = (X ,X ,...,X ) ar mutual information of order ∞ of the randomized mapping denote a sequence of n random variables,1who2se entrines (the privacy mechanism) from the alphabet of the original X ∈ X,i ∈ {1,2,...,n}, are independent and identically i to that of the published dataset. Inspired by the operational distributed (i.i.d.) according to a distribution p that is hypoth- significance of the ML metric, for the statistical application esized to be either H : p = p or H : p = p . Let 1 1 2 2 of binary hypothesis testing, we determine the privacy-utility β(n) and β(n) be the probabilities of error, such that type- 1 2 tradeoff (PUT) using ML as the privacy metric and the type- I error β(n) (resp. type-II error β(n)) is the probability of II error exponent as the utility metric. We consider the local 1 2 choosingH (resp.H )whenoriginaldataxn isgeneratedby 2 1 privacy model in which the same (memoryless) mechanism p (resp. p ). From the Chernoff-Stein lemma [3, Chap. 11], 1 2 is applied independently to each entry of the dataset. This under a constraint that β(n) ∈(0,1), the maximal asymptotic capturesalargeclassofapplicationsinwhichtherespondents 1 error exponent of β(n) is D(p (cid:107)p ). The relative entropy of a dataset can apply a privacy mechanism before sharing 2 1 2 D(p (cid:107)p ) is a measure of the accuracy of hypothesis testing. data. We present closed-form expressions for the PUT for 1 2 For i.i.d. datasets considered here, we restrict our analysis to memoryless privacy mechanisms. A memoryless privacy Thiswork issupportedinpart bytheNational Science Foundationunder grantsCCF-1350914andCIF-1422358. mechanism independently maps entry Xi ∼ p of Xn to an output Xˆ ∈ Xˆ to obtain a released dataset Xˆn. Formally, a measures, respectively, is given by the following non-convex i privacymechanismWisanM×M row-stochasticconditional optimization probability matrix with entries W = Pr{Xˆ = j|X = i}, ij i,j ∈ {1,...,M}. As a result of applying a privacy mech- max D(p1W(cid:107)p2W) W anism, the hypothesis test is now performed on the i.i.d. M sequence Xˆn distributed as either p1W or p2W. It is easy s.t. (cid:88)maxW ≤2l (3a) ij to see that the resulting type-II error exponent of the test is i j=1 D(p W(cid:107)p W).Wechoosethisfunction(ofW)astheutility 1 2 M of the privacy-guaranteed hypothesis test. (cid:88) W =1 for all i (3b) ForrandomvariablesX andXˆ aswellastheprivacymech- ij j=1 anismWdefinedabove,thefollowingpropositionsummarizes W ≥0 for all i,j (3c) ij the definition and simplification of the maximal leakage in [1, Def. 1, Thm. 1, Cor. 1]. where l∈[0,logM]. Proposition 1. Given a joint distribution [p]W of (X,Xˆ), By adding M slack variables (cid:15)j, j ∈ {1,...,M}, the the maximal leakage from X to Xˆ is defined as privacy constraint in (3a) can be rewritten as Pr{U =Uˆ} W ≤(cid:15) for all i,j ∈{1,...,M} (4) L(X →Xˆ)= sup log . (1) ij j U−X−Xˆ−Uˆ maxu∈UPr{U =u} (cid:88)M (cid:15) ≤2l. (5) j The expression in (1) is equivalent to j=1 M L(X →Xˆ)=log(cid:16)(cid:88)max{W }(cid:17)=I (X;Xˆ). (2) From (4) and (5) in conjunction with (3b) and (3c), we note i ij ∞ that the feasible region of (3) is a M2 + M dimensional j=1 polytope resulting from 2M2 + M + 1 linear constraints. Remark 1. Inthesequel,weassumethatp andp havethe The optimization problem is non-convex since it involves 1 2 same (and full) support. Under this assumption, we note that maximizingaconvexfunction.Furthermore,sincethefeasible the expression in (2) is independent of the source distribution. regionisapolytope,theoptimalsolutionsareontheboundary. In particular, the single constraint on the M2 entries of W Specifically, at least one corner point of the polytope is an in (2) suggests that multiple mechanisms can achieve the optimal solution of (3). However, enumerating the vertices sameleakage.However,thesemechanismswillgenerallyhave of the polytope is infeasible. As a first step to obtaining different implications with respect to privacy protection. For closed-form solutions, in the following theorem, we highlight example, for M = 4, the following two matrices, W and properties of the optimal solutions. 1 W , have the same maximal leakage of 1 bit. 2 Theorem 1. For an optimal solution W∗ of (3), all column 1 0 0 0 0.5 0.5 0 0 permutations of W∗ are also optimal solutions. If W∗ has 0 0.3 0.3 0.4 0.5 0.5 0 0 at least one zero column, an infinite number of solutions are W1 =0 0.3 0.3 0.4 W2 = 0 0 0.5 0.5 optimal. 0 0.3 0.3 0.4 0 0 0.5 0.5 AdetailedproofisinAppendixB,andwebrieflyhighlight However, for W , given an output symbol, there is some the intuition here. Optimality of all permutations of W∗ 2 uncertainty regarding the input symbol. In contrast, for W , follows from the fact that the column permutation of W∗ 1 oneoftheoutputsymbolcompletelyrevealsthecorresponding preserves both the objective and the constraints. Furthermore, input symbol. when W∗ has at least one all-zero column, convex combina- tionsofW∗ andanycolumnpermutationofW∗ thatinvolves Lemma 1. The function I (X;Xˆ)(cid:44)I (p,W) satisfies the ∞ ∞ permuting one of the all-zero columns are also optimal. For following properties: example, let 1. 0≤I (p,W)≤logM; ∞ 2. I (p,W)=0 ⇔ (cid:80) max {W }=1 ⇔ W is rank-1; 0 1−a a 1−a 0 a ∞ j i ij 3. I∞(p,W) = logM ⇔ W is a permutation of identity W1 =0 1−b b and W2 =1−b 0 b. matrix I. 0 1−c c 1−c 0 c The proof of Lemma 1 is in Appendix A. Basically these For the maximal leakage l=1+max{a,b,c}−min{a,b,c}, properties follow from (2). where a,b,c ∈ (0,1), if W is an optimal solution of (3), 1 then so is W , and so are all convex combinations of W 2 1 A. Privacy-Utility Trade-off and W , since they preserve the objective. 2 The PUT for the binary hypothesis testing problem with Inthefollowingsection,weobtainaclosed-formexpression maximal leakage and relative entropy as privacy and utility for the PUT in (3) for binary sources. (cid:13)1 and (cid:13)4 result from column permutations of W, and thus, have the same utility. Similarly, vertices (cid:13)2 and (cid:13)3 have the same utility. On the other hand, vertices (cid:13)5 and (cid:13)6 achieve zero utility. Therefore, it suffices to compare the utilities at vertices (cid:13)1 and (cid:13)2 given by f in (6) and f in (7), 1 2 respectively. From (9), we note that for all l and both cases (f ≥f or 1 2 f ≤ f ), there is no uncertainty of the input symbol given 1 2 oneoftheoutputsymbols.Thisisadirectconsequenceofthe convexity of the relative entropy and the linearity of the ML constraint. These observations, coupled with fact that 2l ≥ 1 for the binary case, forces one of the ρ to be 1 (or 0). i IV. TRADE-OFFFORARBITRARYALPHABETS Inthissection,weconsidernon-binarysources,i.e.,M >2. Fig. 1: Feasible region of for binary sources: ρ and ρ are 1 2 Because it is challenging to find closed-form solutions, we the off-diagonal entries of privacy mechanism W focus on two extremal regimes: namely, the high privacy (l≈ III. TRADE-OFFFORBINARYSOURCES 0)lowutilityandlowprivacy(l≈logM)highutilityregimes. Without loss of generality, for binary sources, the probabil- In each regime, we exploit the continuous differentiability of ity distributions p and p can be represented by Bernoulli D in W to approximate the relative entropy objective about 1 2 parametersp andp .Furthermore,notethatl∈[0,1].Define the extremal points (presented in Lemma 1). This allows us 1 2 to simplify the optimization problem and subsequently obtain f (p ,p ,l) 1 1 2 closed-form solutions. (1−p )(cid:0)(2−2l)+p (2l−1)(cid:1) (cid:44)(p −1)(2l−1)log 2 1 A. Euclidean Approximation in High Privacy regime 1 (1−p )(cid:0)(2−2l)+p (2l−1)(cid:1) 1 2 From Lemma 1, recall that for the perfect privacy case, +log(2−2l)+p1(2l−1) (6) i.e., l = 0, the optimal mechanism W0 is a rank-1 row (2−2l)+p (2l−1) stochastic matrix. In particular, all rows of W are the same 2 0 f2(p1,p2,l) vectorw0.Thusthetwooutputdistributionsarethesame,i.e., p (cid:0)1+p (1−2l)(cid:1) 1+p (1−2l) pkW0 =w0 for all k =1,2. In the high privacy regime, we (cid:44)p1(2l−1)logp1(cid:0)1+p2(1−2l)(cid:1) +log1+p1(1−2l). introduce an Euclidean information theoretic (EIT) approxi- 2 1 2 mation (see also [4], [5]) of D(p W(cid:107)p W) by restricting its 1 2 (7) Taylor series about W =W to the second (quadratic) term 0 Theorem 2. For binary hypotheses H1 : p = p1 and H2 : 21(cid:13)(cid:13)(p1 − p2)W[(w0)−21](cid:13)(cid:13)2 (see also [6]). The mechanism p=p ,anda chosenmaximalleakage l∈[0,1],themaximal W isassumedtobeinasmallneighborhoodaboutW ,such 2 0 utility (error exponent) is given by that for some small δ ∈ [0,1], |(pW) −w | ≤ δw for j 0j 0j (cid:8) (cid:9) all j. This, in turn, implies that l ≤ log(1+δ). Therefore, max f (p ,p ,l),f (p ,p ,l) (8) 1 1 2 2 1 2 for l ∈ [0,1], the EIT approximation of the utility function and is achieved by D(p W(cid:107)p W) results in the following optimization 1 2 (cid:34)i2f f−11(2pl1,p22l,0−l)1≥(cid:35)fo2r(p(cid:34)1,2pl20−,l1) 2−12l(cid:35), mWsa.tx. 21(cid:88)M(cid:13)(cid:13)(pm1a−xWp2ij)W≤[2(lw0)−21](cid:13)(cid:13)2 W∗ = (9) j=1 i (11) (cid:34)i2flf0−(p1 ,p2−,1l)2l≤(cid:35)for(p(cid:34),2p−1,2l)l 2l0−1(cid:35). (cid:88)jM=1Wij =1 for all i 1 1 2 2 1 2 W ≥0 for all i,j. ij Due to space restrictions, we briefly outline the proof (see Since the feasible regions in (11) and (3) are the same, the Appendix C for details). Let ρ1 and ρ2 be the off-diagonal optimal solutions of the EIT approximation in (11) are also entries of the privacy mechanism W. The constraint in (3a) feasible for the original PUT in (3), thus the utility of the simplifies to optimalsolutionsof(11)isalowerboundoftheoptimalutility of (3), and the EIT approximation is tight for l very close to 2−2l ≤ρ +ρ ≤2l (10) 1 2 0, i.e., δ ≈0. and along with these in (3b) and (3c) yields the shaded Observe that in (11), the objective depends on p − p 1 2 region in the (ρ ,ρ ) space as shown in Fig. 1. The vertices and being a convex function, is maximized on the boundary 1 2 of the feasible region. Let I (cid:44) {i : p − p > 0} and matrix ΨWT and Ψ is the partial derivative matrix of + 1i 2i I (cid:44){i:p −p ≤0}.Thefollowingtheoremgiveoptimal D(p W(cid:107)p W) calculated at the identity matrix, and has − 1i 2i 1 2 solutions of the EIT approximation in (11) for l≤1. entries equal to Theorem 3. For non-binary sources in the high privacy Ψ =p (cid:16)logp1j +loge(cid:17)−p (cid:16)p1j loge(cid:17). (13) regime, i.e., l∈[0,1], the maximal utility of the EIT approxi- ij 1i p 2i p 2j 2j mation (11) is The resulting PUT in the high utility regime is (2l−1) 2 (cid:107)p1−p2(cid:107)21. (12) max Tr(ΨWT) W The optimal mechanism W∗ has two unique columns: one of M (cid:88) thetwocolumnshasnonzeroentriesgivenby2l−1onlyinthe s.t. maxWij ≤2l i positions indexed by I+ while the other column has the same j=1 (14) nonzero entries only in the positions indexed by I . Each of M − (cid:88) the remaining M −2 columns of W∗ has the same entry in Wij =1 for all i each row and the sum of these M −2 entries is 2−2l. j=1 W ≥0 for all i,j ij TheproofofTheorem3isinAppendixD.Theproofhinges on the following two simplifications: where l ≥ log(M −1). The optimization in (14) is a linear (i) Sincep −p containsbothpositiveandnegativeentries, program and can be efficiently solved. The following lemma 1 2 maximizing (cid:107)(p −p )W(cid:107) requires that every column presents a property of the optimal solutions of (14). 1 2 of the optimal solution W∗ has either the maximal and Lemma 2. For the high utility regime, i.e., l ≥ log(M −1), minimal values of the column in the positions indexed the optimal solution W∗ of (14) has no less than M(M−2) by I and I , respectively, or vice versa. This allows + − zero entries such that all diagonal entries are positive and finding the structure of W∗. every input symbol is mapped to at most two output symbols. (ii) WefurtherexploittheEITapproximationthatallrowsof the optimal mechanism W∗ are in a small neighborhood The proof of Lemma 2 is in Appendix E. Basically, the centered at w to find the optimal w . This relies on (i) proof follows from the fact that the maximal entry in each 0 0 above in exploiting the structure of W∗. row of Ψ in (13) is on the diagonal and that the objective in Thus, Theorem 3 shows that to preserve the utility of binary (14) is the sum of convex combinations of every row of Ψ. hypothesis testing while matching the maximal leakage, the Thus, in the high utility regime, at least one revealed symbol privacy mechanism first splits all input symbols into two reduces uncertainty about the input to at most one bit! subsets S and S = X \S . Specifically, every symbol in 1 2 1 S has a higher probability under H than under H , i.e., the V. CONCLUDINGREMARKS 1 1 2 indices of symbols in S (resp. S ) are in J (resp. J ). The 1 2 + − We have developed the PUTs for the hypothesis testing privacymechanismthenmapsallsymbolsofS withthesame 1 problemusingMLastheprivacymetricandtheerrorexponent probability to a single output. Similarly, all symbols in S are 2 as the utility metric. Our results for both binary and M-ary mapped with the same probability to a single output that is data suggest that the mechanism guaranteeing a bounded ML, distinct from that of S . Therefore, observing one of these 1 i.e.,limitingguessesaboutarbitraryfunctionsofXn,isableto two output symbols, we know the corresponding input subset maximize the utility only by partially or fully revealing a few even if we cannot identify the exact input symbols within the input symbols (even in the high privacy regime). This raises subset. concern about the appropriateness of ML for this problem. In Remark 2. From Theorem 3, we have that if either I or I contrast, in the high privacy regime, the mutual information + − has only one element, the privacy mechanism W∗ of the EIT privacy metric [6] yields output distributions with little or no approximation will reveal one input symbol as is. semblance to the original distributions for all input symbols. B. Linear Approximation in High Utility Regime APPENDIX From Lemma (1), recall that for the perfect utility, i.e., A. Proof of Lemma 1 l = logM, all column permutations of the identity matrix areoptimal.Withoutlossofgenerality,wechoosetheidentity Proof. Fromtherowstochasticityconditionoftheconditional matrix, i.e., W = I, as the perfect utility achieving mech- probability matrix W, we have anism. In the high utility regime, i.e., l ≥ log(M − 1), 0≤W ≤1 (15) by expanding the Taylor series around the identity ma- ij trix, we can approximate the objective to the first order M (cid:88) as D(p1W(cid:107)p2W) = Tr(ΨWT) + o((cid:107)(p1 − p2)W(cid:107)2), Wij =1 for all j, (16) where the expression Tr(ΨWT) represents the trace of the j=1 directly from which, the upper and lower bounds of the sum C. Proof of Theorem 2 of the maximal value of every column of W are Proof. For binary sources, privacy mechanism W is a 2×2 matrix, which can be expressed as M (cid:88) 1≤ max{W }≤M ·1=M, (17) (cid:20) (cid:21) ij 1−ρ ρ i W= 1 1 (21) j=1 ρ 1−ρ 2 2 and then, the function I∞(p,W) is bounded by where 1≥ρ1,ρ2 ≥0. Then, 0≤I∞(p,W)≤logM, (18) (cid:88)2 maxW =(cid:40)2−ρ1−ρ2, ρ1+ρ2 <1 (22) ij where the left equality holds if and only if every column j=1 i ρ1+ρ2, ρ1+ρ2 >1 of W has the same value, i.e., for each j ∈ {1,...,M}, Therefore, the privacy constraint in (3a) simplifies as max {W }=W for all i, thus, the W is a rank-1 matrix; i ij ij and the right equality holds if and only if the maximal value 2−2l ≤ρ1+ρ2 ≤2l, (23) of every column is 1, i.e., max {W } = 1 for all j, and i ij and along with the non-negativity of entries of W, requiring therefore, W is a permutation of identity matrix I. ρ ,ρ ≥ 0, yield the feasible region shown as the shaded 1 2 region in Fig. 1. B. Proof of Theorem 1 The original PUT (3) maximizes the convex function Proof. From the constraints in (3), any column permutation D(p W(cid:107)p W) over the shaded polytope as shown in Fig. 1 2 of a feasible W is also feasible since the unchanged leakage 1, thus at least one of the corner points of the polytope and row stochasticity conditions are satisfied. Let W∗ be an is optimal, i.e., an optimal solution is one of the vertexes optimal solution of (3) achieving (cid:13)1-(cid:13)6 in Fig. 1. The utilities at vertexes (cid:13)5 and (cid:13)6 are 0. (cid:0) (cid:1) (cid:0) (cid:1) Thevertexes(cid:13)1 resp. (cid:13)2 and(cid:13)4 resp. (cid:13)3 arefromcolumn D(p W∗(cid:107)p W∗)=(cid:88)M p W∗logp1Wj∗ (19) permutations, and thus, referring to Theorem 1, (cid:13)1 (cid:0)resp. (cid:13)2(cid:1) 1 2 j=1 1 j p2Wj∗ and (cid:13)4 (cid:0)resp. (cid:13)3(cid:1) have the same utility. Thus, to get the optimalvalue,itsufficestocomparetheutilitiesof(cid:13)1 and (cid:13)2. where W∗ is the jth column of W∗. Since addition is j 1. The privacy mechanism W at (cid:13)1 is commutative, thus any column permutation of W∗ gives the same objective value, i.e., any column permutation of W∗ is (cid:20)2−2l 2l−1(cid:21) , (24) also an optimal solution of (3). 1 0 IfW∗ hasonezerocolumn,wecangenerateadifferentop- and then, the utility of (cid:13)1 is given by (6). timalsolutionW¯ ∗ bypermutingthezerocolumnwithanother 2. The privacy mechanism W at (cid:13)2 is non-zerocolumn.Withoutlossofgenerality,assumeallentries (cid:20) (cid:21) of the first column of W∗ are zero and W¯ ∗ is generated by 0 1 , (25) permutingthefirsttwocolumnsofW∗.LetW∗ andW¯ ∗,for 2l−1 2−2l j j all j ∈ {1,...,M}, denote the jth columns of W∗ and W¯ ∗, and then, the utility of (cid:13)2 is presented in (7). respectively. Thus, W∗ = W¯ ∗ = 0 and W∗ = W¯ ∗. The Therefore,forM =2theoptimalutilityof(3)isthemaximal 1 2 2 1 feasible region of (3) is convex, thus all convex combinations value of (6) and (7). of W∗ and W¯ ∗, indicated as λW∗+(1−λ)W¯ ∗, λ∈[0,1], are also feasible. The objective value of λW∗+(1−λ)W¯ ∗ D. Proof for Theorem 3 can be written as Proof. The objective in (11) can be bounded as =D(cid:88)M(pp1(λ(cid:0)WλW∗∗++(1(1−−λ)λW)¯W¯∗)∗(cid:107)(cid:1)plo2(gλpW1(cid:0)∗λ+W(j∗1+−(λ1)W−¯λ∗))W)¯ j∗(cid:1) (cid:13)(cid:13)(p1−p2)W[(w0)−21](cid:13)(cid:13)2 =(cid:88)jM=1 (cid:0)(p1−wp02j)Wj(cid:1)2 1 j j p (cid:0)λW∗+(1−λ)W¯ ∗(cid:1) M (cid:0)(cid:80) (p −p )W +(cid:80) (p −p )W (cid:1)2 j=1 2 j j =(cid:88) i∈I+ 1i 2i ij i∈I− 1i 2i ij =(1−λ)p W¯ ∗log(1−λ)p1W¯ 1∗ +λp W∗logλp1W2∗ j=1 w0j 1 1 (1−λ)p W¯ ∗ 1 2 λp W∗ (26a) 2 1 2 2 +(cid:88)M p1Wj∗logpp1WWj∗∗ (20) ≤(cid:88)M (cid:18)maxiWij(cid:80)√iw∈I+(p1i−p2i) (26b) j=3 2 j j=1 0j =(cid:88)M p W∗logp1Wj∗ =D(p W∗(cid:107)p W∗). + miniWij(cid:80)√i∈I−(p1i−p2i)(cid:19)2 j=2 1 j p2Wj∗ 1 2 w0j That is, all convex combinations of W∗ and W¯ ∗ are optimal. =1(cid:107)p −p (cid:107)2(cid:88)M (maxiWij −miniWij)2 (26c) 4 1 2 1 w Thus, an infinite number of solutions are optimal. j=1 0j where the inequality (26c) directly results from can verify that this simplification leads to two unique rows (cid:12) (cid:12) (cid:12) (cid:12) in W∗ given by [W +2l −1,W ,(cid:15) ,(cid:15) ,...,(cid:15) ] (cid:12) (cid:88) (cid:12) (cid:12) (cid:88) (cid:12) 1 1,min 2,min 1 2 M−2 (cid:12)(cid:12) (p1i−p2i)(cid:12)(cid:12)=(cid:12)(cid:12) (p1i−p2i)(cid:12)(cid:12)= 2(cid:107)p1−p2(cid:107)1. and [W1,min,W2,min +2l −1,(cid:15)1,(cid:15)2,...,(cid:15)M−2]. All column i∈I− i∈I+ permutations of W∗, i.e., permuting the two rows above Let W and W be the maximal and minimal values simultaneously, are permitted. j,max j,min of the jth column of W, respectively. The sufficient and From(26c),wededucethatw0j isrelevantonlyforj =1,2 necessary condition for the equality of (26b) is that the jth and we now use the EIT approximation to determine these column of W, for all j ∈{1,...,M}, has entries as either values. Since the EIT approximation requires that all rows of the W∗ are in a ball of radius δ about w , we can find the (cid:40) 0 W = Wj,max for i∈I+ (27) corresponding w0 as the average of the two unique rows of ij W for i∈I W∗ tosatisfytheneighborhoodcondition.Duetothefactthat j,min − entriesofthefirsttwocolumnsareeitherW∗ orW∗ + or k,min k,min (cid:40) 2l−1,k =1,2,thefirsttwoentriesofthew are 2W1∗,min+2l−1 W for i∈I 0 2 W = j,min + (28) 2W∗ +2l−1 ij and 2,min .Therefore,thecorrespondingoptimalvalue Wj,max for i∈I− 2 of (11) is WenowshowthattheoptimalsolutionW∗of(11)hasatmost two columns for which Wj∗,max−Wj∗,min >0. For l>0, i.e., (2l−1)2(cid:107)p1−p2(cid:107)21 (cid:88)2 1 , 2l > 1, since W∗ achieves the maximal leakage of l bits 4 2W∗ +2l−1 k=1 k,min and is row stochastic, the maximal values of the columns of which in turn is maximized by W∗ = 0, k = 1,2. Thus, W∗ are not all in the same row. Thus, W∗ has at least two k,min the optimal privacy mechanism is columns, one in the form of (27) and the other in the form of (28). The remaining columns of W∗ are either in the form 2l−1 0 (cid:15) (cid:15) ... (cid:15) oanfd(2J7) o=r(cid:8)ojf (:2W8).∗LisetinJ1th=e f(cid:8)ojrm:Wofi∗j(2is8)i(cid:9)n.tThehufso,rmweohfa(v2e7)} ... ... ...1 ...2 ... M...−2 2 ij W∗ =2l−1 0 (cid:15)1 (cid:15)2 ... (cid:15)M−2 (33) (cid:88)M 0 2l−1 (cid:15)1 (cid:15)2 ... (cid:15)M−2 Wj∗,max =2l (29a) ... ... ... ... ... ... j=1 (cid:88) (cid:88) 0 2l−1 (cid:15) (cid:15) ... (cid:15) W∗ + W∗ =1 (29b) 1 2 M−2 j,max j,min j∈J1 j∈J2 where the first m rows are the same and the remaining (cid:88) W∗ + (cid:88) W∗ =1. (29c) M − 2 rows are the same. Note that the non-negative (cid:15)i, j,min j,max i∈{1,...,M −2}, sum up to 2−2l. j∈J1 j∈J2 From (29a)-(29c), we get E. Proof of Lemma 2 M Proof. From(13),weknowthatdiagonalentriesofthepartial (cid:88) (W∗ −W∗ )=2l+1−2 (30) derivative matrix Ψ is j,max j,min j=1 Ψ =p logp1i for all i∈{1,...,M}. (34) Furthermore, subtracting (29b) or (29c) from (29a) and ob- ii 1i p 2i serving that W∗ −W∗ ≥0 for all j, we have j,max j,min In addition, for all i,j ∈{1,...,M}, we have (cid:88) (Wj∗,max−Wj∗,min)=2l−1 for k =1,2 (31) Ψii−Ψij j∈Jk =p logp1i −p (cid:16)logp1j +loge(cid:17)+p (cid:16)p1j loge(cid:17) ⇒ Wj∗,max−Wj∗,min ≤2l−1 j ∈Jk, k =1,2, (32) 1i p2i 1i p2j 2i p2j p p (cid:16) p p (cid:17) where the equality in (32) holds if and only if W∗ has at =−p log 2i 1j +p loge −1+ 2i 1j 1i p p 1i p p mosttwocolumns,indexedbyj∗ ∈J andj∗ ∈J suchthat 1i 2j 1i 2j 1 1 2 2 (cid:16)p p (cid:17) (cid:16)p p (cid:17) W∗ −W∗ >0, k =1,2, which also means that for ≥−p 2i 1j −1 loge+p 2i 1j −1 loge=0. thejrk∗e,mmaaxiningMjk∗,m−in2columnsofW∗,W∗ −W∗ =0, 1i p1ip2j 1i p1ip2j j,max j,min (35) j ∈ {1,...,M} \ {j∗,j∗}. Without loss of generality, let 1 2 j∗ = 1, j∗ = 2, I = {i ∈ {1,...,m},m ≤ M − 1} That is to say, for every row of the partial derivative matrix 1 2 + and I = {i ∈ {m + 1,...,M}}. Thus, the first m Ψ, the maximal value is on the diagonal. − entries of the first column of W∗ are W + 2l − 1 LetW∗ betheoptimalsolutionof(14)andpositiveintegers 1,min while the remaining entries are W . Similarly, for the n , i ∈ {1,...,M}, indicate the number of nonzero entries 1,min i second column, the first m entries are W while the of the ith row in W∗. Since all rows of W∗ sum up to 1, 2,min remaining entries are W +2l−1. Finally, the ith column the objective in (14) is the sum of convex combinations of 2,min has the same entry (cid:15) ≥ 0 for i ∈ {3,...,M}. One everyrowofΨ.Therefore,whilesatisfyingtheMLconstraint, i−2 maximizing Tr(ΨWT) requires that choosing the diagonal entriesofW∗aslargeaspossible,andtheconvexcombination oftheith rowofΨinvolvesthen maximalentriesoftherow. i In general, for all i ∈ {1,...,M}, n should be as small as i possible. Specially, for l = logM, the optimal value of (14) is D(p (cid:107)p ) such that all diagonal entries of W∗ should be 1 2 1 and n =1 for all i, i.e., W∗ is the identity matrix. i For l ≥ log(M − 1), we have (cid:80)M max W∗ = 2l ≥ j=1 i ij M − 1. Construct a matrix W such that every row of W has only two non-zero entries at positions indexed by the first and second maximal entries of the corresponding row of Ψ. Specifically, all diagonal entries of W are 2l (2l ≥ M−1) M M M and the another non-zero entry of each row in W is M−2l M (M−2l ≤ 1 ). The W is feasible and n can be at least as M M i smallas2foralli.Therefore,sincethenumberofzeroentries of W is (M−2)M, the number of zero entries of W∗ is no less than (M −2)M. REFERENCES [1] I. Issa, S. Kamath, and A. B. Wagner, “An operational measure of informationleakage,”in2016AnnualConferenceonInformationScience andSystems(CISS),2016. [2] S.A.Mario,K.Chatzikokolakis,C.Palamidessi,andG.Smith,“Measur- inginformationleakageusinggeneralizedgainfunctions,”in2012IEEE 25thComputerSecurityFoundationsSymposium,2012. [3] T.M.CoverandJ.A.Thomas,ElementsofInformationTheory,2nded. Wiley-Interscience,2006. [4] S.BoradeandL.Zheng,“Euclideaninformationtheory,”in2008IEEE InternationalZurichSeminaronCommunications,2008. [5] S. Huang, C. Suh, and L. Zheng, “Euclidean information theory of networks,”IEEETrans.onInform.Th.,vol.61,no.12,pp.6795–6814, 2015. [6] J.Liao,L.Sankar,V.Y.F.Tan,andF.duPinCalmon,“Hypothesistesting in the high privacy limit,” in Allerton Conference 2016, Monticello, IL, 2016.