Private Information Retrieval from MDS Coded Data with Colluding Servers: Settling a Conjecture by Freij-Hollanti et al.

Hua Sun and Syed A. Jafar

Abstract

A $(K, N, T, K_c)$ instance of the MDS-TPIR problem is comprised of $K$ messages and $N$ distributed servers. Each message is separately encoded through a $(K_c, N)$ MDS storage code. A user wishes to retrieve one message, as efficiently as possible, while revealing no information about the desired message index to any colluding set of up to $T$ servers. The fundamental limit on the efficiency of retrieval, i.e., the capacity of MDS-TPIR, is known only at the extremes where either $T$ or $K_c$ belongs to $\{1, N\}$. The focus of this work is a recent conjecture by Freij-Hollanti, Gnilke, Hollanti and Karpuk which offers a general capacity expression for MDS-TPIR. We prove that the conjecture is false by presenting as a counterexample a PIR scheme for the setting $(K, N, T, K_c) = (2, 4, 2, 2)$, which achieves the rate $3/5$, exceeding the conjectured capacity, $4/7$. Insights from the counterexample lead us to capacity characterizations for various instances of MDS-TPIR, including all cases with $(K, N, T, K_c) = (2, N, T, N-1)$, where $N$ and $T$ can be arbitrary.

Hua Sun (email: [email protected]) and Syed A. Jafar (email: [email protected]) are with the Center of Pervasive Communications and Computing (CPCC) in the Department of Electrical Engineering and Computer Science (EECS) at the University of California Irvine.

1 Introduction

Private Information Retrieval (PIR) is the problem of retrieving one out of $K$ messages from $N$ distributed servers (each of which stores all $K$ messages) in such a way that any individual server learns no information about which message is being retrieved. The rate of a PIR scheme is the ratio of the number of bits of the desired message to the total number of bits downloaded from all servers. The supremum of achievable rates is the capacity of PIR. The capacity of PIR was shown in [1] to be

$$C_{\mathrm{PIR}} = \left(1 + \frac{1}{N} + \frac{1}{N^2} + \cdots + \frac{1}{N^{K-1}}\right)^{-1} \qquad (1)$$

The capacity of several variants of PIR has also since been characterized in [1, 2, 3, 4, 5].

The focus of this work is a recent conjecture by Freij-Hollanti, Gnilke, Hollanti and Karpuk (FGHK conjecture, in short) in [6], which offers a capacity expression for a generalized form of PIR, called MDS-TPIR. MDS-TPIR involves two additional parameters, $K_c$ and $T$, which generalize the storage and privacy constraints, respectively. Instead of replication, each message is encoded through a $(K_c, N)$ MDS storage code, so that the information stored at any $K_c$ servers is exactly enough to recover all $K$ messages. Privacy must be preserved not just from each individual server, but from any colluding set of up to $T$ servers. MDS-TPIR is a generalization of PIR, because setting both $T = 1$ and $K_c = 1$ reduces MDS-TPIR to the original PIR problem, for which the capacity is already known (see (1)).

The capacity of MDS-TPIR is known only at the degenerate extremes, i.e., when either $T$ or $K_c$ takes the value $1$ or $N$. If either $T$ or $K_c$ is equal to $N$, then by analogy to the single server setting it follows immediately that the user must download all messages, i.e., the capacity is $1/K$. If $K_c = 1$ or $T = 1$, then the problem specializes to TPIR and MDS-PIR, respectively.
The capacity of TPIR ($K_c = 1$) was shown in [2] to be

$$C_{\mathrm{TPIR}} = \left(1 + \frac{T}{N} + \frac{T^2}{N^2} + \cdots + \frac{T^{K-1}}{N^{K-1}}\right)^{-1} \qquad (2)$$

The capacity of MDS-PIR ($T = 1$) was characterized by Banawan and Ulukus in [5] as

$$C_{\mathrm{MDS\text{-}PIR}} = \left(1 + \frac{K_c}{N} + \frac{K_c^2}{N^2} + \cdots + \frac{K_c^{K-1}}{N^{K-1}}\right)^{-1} \qquad (3)$$

It is notable that $K_c$ and $T$ play similar roles in the two capacity expressions.

The capacity achieving scheme of Banawan and Ulukus [5] improved upon a scheme proposed earlier by Tajeddine and Rouayheb in [7]. Tajeddine and Rouayheb also proposed an achievable scheme for MDS-TPIR for the $T = 2$ setting. The scheme was generalized by Freij-Hollanti et al. [6] to the $(K, N, T, K_c)$ setting with $T + K_c \leq N$, where it achieves the rate $1 - \frac{T + K_c - 1}{N}$. Remarkably, the rate achieved by this scheme does not depend on the number of messages, $K$. In support of the plausible asymptotic ($K \to \infty$) optimality of their scheme, and based on the intuition from existing capacity expressions for PIR, MDS-PIR and TPIR, Freij-Hollanti et al. conjecture that if $T + K_c \leq N$, then the capacity of MDS-TPIR is given by the following expression.

FGHK Conjecture [6]:

$$C^{\mathrm{conj}}_{\mathrm{MDS\text{-}TPIR}} = \left(1 + \frac{T + K_c - 1}{N} + \cdots + \frac{(T + K_c - 1)^{K-1}}{N^{K-1}}\right)^{-1} \qquad (4)$$

The conjecture is appealing for its generality and elegance, as it captures all four parameters $K, N, T, K_c$ in a compact form. $T$ and $K_c$ appear as interchangeable terms, and the capacity expression appears to be a natural extension of the capacity expressions for TPIR and MDS-PIR. Indeed, the conjectured capacity recovers the known capacity of TPIR if we set $K_c = 1$ and that of MDS-PIR if we set $T = 1$. However, in all non-degenerate cases where $T, K_c \notin \{1, N\}$, the capacity of MDS-TPIR, and therefore the validity of the conjecture, is unknown. In fact, in all these cases the problem is open on both sides, i.e., the conjectured capacity expression is neither known to be achievable, nor known to be an outer bound. The lack of any non-trivial outer bounds for MDS-TPIR is also recently highlighted in [8]. This intriguing combination of plausibility, uncertainty and generality of the FGHK conjecture motivates our work. Our contribution is summarized next.

Summary of Contribution

As the main outcome of this work, we disprove the FGHK conjecture. For our counterexample, we consider the setting $(K, N, T, K_c) = (2, 4, 2, 2)$, where the data is stored using the $(2, 4)$ MDS code $(x, y) \to (x, y, x+y, x+2y)$. The conjectured capacity for this setting is $4/7$. We show that the rate $3/5 > 4/7$ is achievable, thus disproving the conjecture. As a converse argument, we show that no (scalar or vector) linear PIR scheme can achieve a rate higher than $3/5$ for this MDS storage code subject to $T = 2$ privacy.

The insights from the counterexample lead us to characterize the exact capacity of various instances of MDS-TPIR. This includes all cases with $(K, N, T, K_c) = (2, N, T, N-1)$, where $N$ and $T$ can be arbitrary. The capacity for these cases turns out to be

$$C = \frac{N^2 - N}{2N^2 - 3N + T} \qquad (5)$$

Note that this is the information theoretic capacity, i.e., for $K = 2$ messages, no $(N-1, N)$ MDS storage code and no PIR scheme (linear or non-linear) can beat this rate, which is achievable with the simple MDS storage code $(x_1, x_2, \cdots, x_{N-1}) \to (x_1, x_2, \cdots, x_{N-1}, \sum_{i=1}^{N-1} x_i)$ and a linear PIR scheme.

The general capacity expression for MDS-TPIR remains unknown. However, we are able to show that it cannot be symmetric in $K_c$ and $T$, i.e., the two parameters are not interchangeable in general.
Also, between $K_c$ and $T$, the capacity expression does not consistently favor one over the other. These findings are illustrated by the following four cases, for which the capacity is settled.

$(K, N, T, K_c)$   (2,4,2,3)    (2,4,3,2)        (2,4,1,3)    (2,4,3,1)
Capacity           6/11         4/7              4/7          4/7
Ref.               Theorem 3    Section 7.1.2    [5]          [2]

The first two columns show that the capacity is not symmetric in $K_c$ and $T$, since switching their values changes the capacity. The first two columns also suggest that increasing $K_c$ hurts capacity more than increasing $T$. However, considering columns 3 and 4 as the baseline, where the capacities are equal, and comparing the drop in capacity from column 3 to column 1 when $T$ is increased, versus no change in capacity from column 4 to column 2 when $K_c$ is increased, shows the opposite trend. Therefore, neither $T$ nor $K_c$ is consistently dominant in terms of the sensitivity of capacity to these two parameters.

Finally, taking an asymptotic view of the capacity of MDS-TPIR, we show that if $T + K_c > N$, then the capacity collapses to 0 as the number of messages $K \to \infty$. This is consistent with the restriction $T + K_c \leq N$ that is required by the achievable scheme of Freij-Hollanti et al., whose rate does not depend on $K$.

Notation: For $n_1, n_2 \in \mathbb{Z}$, define the notation $[n_1 : n_2]$ as the set $\{n_1, n_1 + 1, \cdots, n_2\}$, $A_{n_1:n_2}$ as the vector $(A_{n_1}, A_{n_1+1}, \cdots, A_{n_2})$, and $S(n_1 : n_2, :)$ as the submatrix of a matrix $S$ formed by retaining only the $n_1$-th to the $n_2$-th rows. The notation $X \sim Y$ is used to indicate that $X$ and $Y$ are identically distributed. The cardinality of a set $\mathcal{I}$ is denoted as $|\mathcal{I}|$. The determinant of a matrix $S$ is denoted as $|S|$. For an index set $\mathcal{I} = \{i_1, \cdots, i_n\}$ such that $i_1 < \cdots < i_n$, the notation $A_{\mathcal{I}}$ represents the vector $(A_{i_1}, \cdots, A_{i_n})$. $(V_1; V_2; \cdots; V_n)$ refers to a matrix whose $i$-th row vector is $V_i$, $i \in [1:n]$.

2 Problem Statement

Consider$^1$ $K$ independent messages $W_1, \cdots, W_K \in \mathbb{F}_p^{L \times 1}$, each represented as an $L \times 1$ vector comprised of $L$ i.i.d. uniform symbols from a finite field $\mathbb{F}_p$ for a prime $p$. In $p$-ary units,

$$H(W_1) = \cdots = H(W_K) = L \qquad (6)$$
$$H(W_1, \cdots, W_K) = H(W_1) + \cdots + H(W_K) \qquad (7)$$

There are $N$ servers. The $n$-th server stores $(W_{1n}, W_{2n}, \cdots, W_{Kn})$, where $W_{kn} \in \mathbb{F}_p^{\frac{L}{K_c} \times 1}$ represents $L/K_c$ symbols from $W_k$, $k \in [1:K]$.

$$H(W_{kn} \mid W_k) = 0, \qquad H(W_{kn}) = L/K_c \qquad (8)$$

We require the storage system to satisfy the MDS property, i.e., from the information stored in any $K_c$ servers, we can recover each message, i.e.,

$$\text{[MDS]} \quad H(W_k \mid W_{k\mathcal{K}_c}) = 0, \ \forall \mathcal{K}_c \subset [1:N], \ |\mathcal{K}_c| = K_c \qquad (9)$$

Let us use $F$ to denote a random variable privately generated by the user, whose realization is not available to the servers. $F$ represents the randomness in the strategies followed by the user. Similarly, $G$ is a random variable that determines the random strategies followed by the servers, and whose realizations are assumed to be known to all the servers and to the user. The user privately generates $\theta$ uniformly from $[1:K]$ and wishes to retrieve $W_\theta$ while keeping $\theta$ a secret from each server. $F$ and $G$ are generated independently and before the realizations of the messages or the desired message index are known, so that

$$H(\theta, F, G, W_1, \cdots, W_K) = H(\theta) + H(F) + H(G) + H(W_1) + \cdots + H(W_K) \qquad (10)$$

Suppose $\theta = k$. In order to retrieve $W_k$, $k \in [1:K]$, privately, the user privately generates $N$ random queries, $Q_1^{[k]}, \cdots, Q_N^{[k]}$.
1 N [k] [k] H(Q ,··· ,Q |F) = 0,∀k ∈ [1 :K] (11) 1 N 1While the problem statement is presented in its general form, we will primarily consider cases with K = 2 messages in this paper (outerboundsfor larger K are presented in Section 7.4). 4 The user sends query Q[k] to the nth server, n ∈ [1 : N]. Upon receiving Q[k], the nth server n n [k] [k] generates an answering string A , which is a function of the received query Q , the stored n n information W ,··· ,W and G, 1n Kn H(A[k]|Q[k],W ,··· ,W ,G) = 0 (12) n n 1n Kn Each server returns to the user its answer A[k].2 n [k] [k] From all the information that is now available at the user (A ,Q ,F,G), the user decodes 1:N 1:N the desired message W according to a decoding rule that is specified by the PIR scheme. Let P k e denote the probability of error achieved with the specified decoding rule. Toprotecttheuser’sprivacy,theK strategiesmustbeindistinguishable(identicallydistributed) from the perspective of any subset T ⊂ [1 : N] of at most T colluding servers, i.e., the following privacy constraint must be satisfied. [k] [k] [k′] [k′] [T-Privacy] (Q ,A ,G,W ,··· ,W )∼ (Q ,A ,G,W ,··· ,W ), T T 1T KT T T 1T KT ∀k,k′ ∈ [1 : K],∀T ⊂ [1 : N],|T| = T (13) The PIR rate characterizes how many bits of desired information are retrieved per downloaded bit and is defined as follows. R = L/D (14) where D is the expected value of the total number of bits downloaded by the user from all the servers. A rate R is said to be ǫ-error achievable if there exists a sequence of PIR schemes, indexed by L, each of rate greater than or equal to R, for which P → 0 as L → ∞. Note that for such a e sequence of PIR schemes, from Fano’s inequality, we must have 1 [k] [k] [Correctness] o(L) = H(W |A ,Q ,F,G) (15) L k 1:N 1:N (11) 1 [k] = H(W |A ,F,G) (16) L k 1:N whereo(L) represents a term whose value approaches zero as L approaches infinity. Thesupremum of ǫ-error achievable rates is called the capacity C.3 3 Settling the Conjecture Our main result, which settles the FGHK conjecture, is stated in the following theorem. Theorem 1 For the MDS-TPIR problem with K = 2 messages, N = 4 servers, T = 2 privacy and the (K ,N) = (2,4) MDSstorage code (x,y) → (x,y,x+y,x+2y), a rate of 3/5 isachievable. Since c the achievable rate exceeds the conjectured capacity of 4/7 for this setting, the FGHK conjecture is false. 2If the A[k] are obtained as inner products of query vectors and stored message vectors, then such a PIR scheme n is called a linear PIR scheme. 3Alternatively, the capacity may be defined with respect to zero error criterion, i.e., the supreme of zero error achievable rates where a rate R is said to be zero error achievable if there exists (for some L) a PIR scheme of rate greater than or equal toR for which P =0. e 5 Proof: We present a scheme that achieves rate 3/5. We assume that each message is comprised of L = 12 symbols from F for a sufficiently4 large prime p. Define a ∈ F6×1 as the 6×1 vector p p (a ;a ;··· ;a ) comprised of i.i.d. uniform symbols a ∈ F . Vectors b,c,d are defined similarly. 1 2 6 i p Messages W ,W are defined in terms of these vectors as follows. 1 2 W = (a;b) W = (c;d) (17) 1 2 3.1 Storage Code The storage is specified as (W ,W ,W ,W ) = (a,b,a+b,a+2b) (18) 11 12 13 14 (W ,W ,W ,W ) = (c,d,c+d,c+2d) (19) 21 22 23 24 Recall that W is the information about message W that is stored at Server n. Thus, Server 1 kn k stores(a,c),Server2stores(b,d),Server3stores(a+b,c+d),andServer4stores(a+2b,c+2d). 
In particular, each server stores 6 symbols for each message, for a total of 12 symbols per server. Any two servers store just enough information to recover both messages, thus the MDS storage criterion is satisfied.

3.2 Construction of Queries

The query to each server, $Q_n^{[k]}$, is comprised of two parts, denoted as $Q_n^{[k]}(W_1), Q_n^{[k]}(W_2)$. Each part contains 3 row vectors, also called query vectors, along which the server should project its corresponding stored message symbols.

$$Q_n^{[k]} = \left(Q_n^{[k]}(W_1), \ Q_n^{[k]}(W_2)\right) \qquad (20)$$

In preparation for the construction of the queries, let us denote the set of all full rank $6 \times 6$ matrices over $\mathbb{F}_p$ as $\mathcal{S}$. The user privately chooses two matrices, $S$ and $S'$, independently and uniformly from $\mathcal{S}$. Label the rows of $S$ as $V_1, V_2, V_3, V_4, V_5, V_6$, and the rows of $S'$ as $U_0, U_1, U_2, U_3, U_4, U_5$. Define

$$\mathcal{V}_1 = \{V_1, V_2, V_3\}, \qquad \mathcal{U}_1 = \{U_0, U_6, U_8\} \qquad (21)$$
$$\mathcal{V}_2 = \{V_1, V_4, V_5\}, \qquad \mathcal{U}_2 = \{U_0, U_7, U_9\} \qquad (22)$$
$$\mathcal{V}_3 = \{V_2, V_4, V_6\}, \qquad \mathcal{U}_3 = \{U_0, U_1, U_3\} \qquad (23)$$
$$\mathcal{V}_4 = \{V_3, V_5, V_6\}, \qquad \mathcal{U}_4 = \{U_0, U_2, U_4\} \qquad (24)$$

$U_6, U_7, U_8, U_9$ are obtained as follows.

$$U_6 = U_1 + U_2, \qquad U_7 = U_1 + 2U_2 \qquad (25)$$
$$U_8 = U_3 + U_4, \qquad U_9 = U_3 + 2U_4 \qquad (26)$$

As a preview of what we are trying to accomplish, we note that for Server $n \in [1:4]$, $\mathcal{V}_n$ will be used as the query vectors for desired message symbols, while $\mathcal{U}_n$ will be used as the query vectors for undesired message symbols. Since $K_c = 2$, the same query vector $V_i$ sent to two different servers will recover 2 independent desired symbols. Each $V_i$, $i \in [1:6]$, is used exactly twice, so all queries for desired symbols will return independent information, for a total of 12 independent desired symbols. On the other hand, for undesired symbols, note that $U_0$ is used as the query vector to all 4 servers, but because $K_c = 2$, it can only produce 2 independent symbols, i.e., 2 of the 4 symbols are redundant. The dependencies introduced via (25), (26) are carefully chosen to ensure that the queries along $U_1, U_2, U_6, U_7$ will produce only 3 independent symbols. Similarly, the queries along $U_3, U_4, U_8, U_9$ will produce only 3 independent symbols. Thus, all the queries for the undesired message will produce a total of only 8 independent symbols. The 12 independent desired symbols and 8 independent undesired symbols will be resolved from a total of $12 + 8 = 20$ downloaded symbols, to achieve the rate $12/20 = 3/5$. To ensure $T = 2$ privacy, the $U_i$ and $V_i$ queries will be made indistinguishable from the perspective of any 2 colluding servers. The key to the $T = 2$ privacy is that any $\mathcal{V}_n, \mathcal{V}_{n'}$, $n \neq n'$, have one element in common. Similarly, any $\mathcal{U}_n, \mathcal{U}_{n'}$, $n \neq n'$, also have one element in common. This is a critical aspect of the construction.

$^4$It suffices to choose $p = 349$ for Theorem 1. In general, the appeal to large field size, analogous to the random coding argument in information theory, is made to prove the existence of a scheme, but may not be essential to the construction of the PIR scheme. To underscore this point, Section 7.1 includes some examples of MDS-TPIR capacity achieving schemes over small fields.

Next we provide a detailed description of the queries and downloads for message $W_k$, $k \in [1:2]$, both when $W_k$ is desired and when it is not desired. To simplify the notation, we will denote $W_k = (x; y)$. Note that when $k = 1$, $(x; y) = (a; b)$, and when $k = 2$, $(x; y) = (c; d)$.

3.2.1 Case 1. $W_k$ is Desired

The query sent to Server $n$ is a $3 \times 6$ matrix whose rows are the 3 vectors in $\mathcal{V}_n$.
The ordering of the rows is uniformly random, i.e.,

$$\text{Server } n: \quad Q_n^{[k]}(W_k) = \pi_n(\mathcal{V}_n), \quad n \in [1:4] \qquad (27)$$

For a set $\mathcal{V} = \{V_{i_1}, V_{i_2}, V_{i_3}\}$, $\pi_n(\mathcal{V})$ is equally likely to return any one of the 6 possibilities: $(V_{i_1}; V_{i_2}; V_{i_3})$, $(V_{i_1}; V_{i_3}; V_{i_2})$, $(V_{i_2}; V_{i_1}; V_{i_3})$, $(V_{i_2}; V_{i_3}; V_{i_1})$, $(V_{i_3}; V_{i_1}; V_{i_2})$ and $(V_{i_3}; V_{i_2}; V_{i_1})$. The $\pi_n$ are independently chosen for each $n \in [1:4]$.

After receiving the 3 query vectors $Q_n^{[k]}(W_k)$, Server $n$ projects its stored $W_k$ symbols along these vectors. This creates three linear combinations of $W_{kn}$ symbols, denoted as $A_n^{[k]}(W_k)$.

$$A_n^{[k]}(W_k) = Q_n^{[k]}(W_k) \, W_{kn} \qquad (28)$$

Define $k_c = 3 - k$ as the complement of $k$, i.e., $k_c = 1$ if $k = 2$ and vice versa. The answers $A_n^{[k]}$ to be sent to the user will be constructed eventually by combining $A_n^{[k]}(W_k)$ and $A_n^{[k]}(W_{k_c})$, since separately sending these answers would be too inefficient. The details of this combining process will be specified later. Next we note an important property of the construction.

Desired Symbols Are Independent: We show that if the user can recover $A_{1:4}^{[k]}(W_k)$ from the downloads, then he can recover all 12 symbols of $W_k$. From $A_{1:4}^{[k]}(W_k)$ the user recovers the 12 symbols $V_1 x$, $V_2 x$, $V_3 x$, $V_1 y$, $V_4 y$, $V_5 y$, $V_2(x+y)$, $V_4(x+y)$, $V_6(x+y)$, $V_3(x+2y)$, $V_5(x+2y)$, $V_6(x+2y)$. From these 12 symbols, he recovers $V_i x$ and $V_i y$ for all $i \in [1:6]$ (each $V_i$ appears at exactly two servers whose stored combinations of $x$ and $y$ are linearly independent). Since $S = (V_1; V_2; V_3; V_4; V_5; V_6)$ has full rank (invertible) and the user knows $V_{1:6}$, he recovers all symbols in $x$ and $y$ (thus $W_k$).

3.2.2 Case 2. $W_k$ is Undesired

Similarly, the query sent to Server $n$ is a $3 \times 6$ matrix whose rows are the 3 vectors in $\mathcal{U}_n$. The ordering of the rows is uniformly random for each $n$, and independent across all $n \in [1:4]$.

$$\text{Server } n: \quad Q_n^{[k_c]}(W_k) = \pi'_n(\mathcal{U}_n), \quad n \in [1:4] \qquad (29)$$

Each server projects its stored $W_{kn}$ symbols along the 3 query vectors to obtain

$$A_n^{[k_c]}(W_k) = Q_n^{[k_c]}(W_k) \, W_{kn} \qquad (30)$$

Interfering Symbols Have Dimension 8: $A_{1:4}^{[k_c]}(W_k)$ is comprised of $U_0 x$, $U_6 x$, $U_8 x$, $U_0 y$, $U_7 y$, $U_9 y$, $U_0(x+y)$, $U_1(x+y)$, $U_3(x+y)$, $U_0(x+2y)$, $U_2(x+2y)$, $U_4(x+2y)$. We now show that these 12 symbols are dependent and have dimension only 8.$^5$ Because of (25) and (26), we have

$$U_0 x + U_0 y = U_0(x+y)$$
$$U_0 x + 2U_0 y = U_0(x+2y)$$
$$U_6 x + U_7 y - U_1(x+y) = U_2(x+2y)$$
$$U_8 x + U_9 y - U_3(x+y) = U_4(x+2y) \qquad (31)$$

Thus, of the 12 symbols recovered from $A_{1:4}^{[k_c]}(W_k)$, at least 4 are linear combinations of the remaining 8. It follows that $A_{1:4}^{[k_c]}(W_k)$ contains no more than 8 dimensions. The number of dimensions is also not less than 8 because the following 8 undesired symbols (two symbols from each server) are independent.

$$\text{Server 1}: \ U_0 x, \ U_6 x = (U_1 + U_2)x$$
$$\text{Server 2}: \ U_0 y, \ U_9 y = (U_3 + 2U_4)y$$
$$\text{Server 3}: \ U_1(x+y), \ U_3(x+y)$$
$$\text{Server 4}: \ U_2(x+2y), \ U_4(x+2y) \qquad (32)$$

To see that the 8 symbols are independent, we add 4 new symbols, $(U_1 x, U_3 y, U_5 x, U_5 y)$, such that from the resulting 12 symbols we can recover all 12 undesired symbols $(S'x, S'y)$. Since the 4 new symbols cannot contribute more than 4 dimensions, the original 8 symbols must occupy at least 8 dimensions.

3.3 Combining Answers for Efficient Download

Based on the queries, each server has 3 linear combinations of symbols of $W_1$ in $A_n^{[k]}(W_1)$ and 3 linear combinations of symbols of $W_2$ in $A_n^{[k]}(W_2)$, for a total of 12 linear combinations of desired symbols and 12 linear combinations of undesired symbols across all servers. However, recall that there are only 8 independent linear combinations of undesired symbols.
This is a fact that can be exploited to improve theefficiency of download. Specifically, wewill combine the 6queried symbols (i.e., the 6 linear combinations) from each server into 5 symbols to be downloaded by the user. Intuitively, 5 symbols from each server will give the user a total of 20 symbols, from which he can resolve the 12 desired and 8 undesired symbols. 5Equivalently,thejoint entropy of these12 variables, conditioned on U is only 8 p-ary units. 0:9 8 The following function maps 6 queried symbols to 5 downloaded symbols. L(X ,X ,X ,Y ,Y ,Y ) = (X ,X ,Y ,Y ,X +Y ) (33) 1 2 3 1 2 3 1 2 1 2 3 3 Note that the first four symbols are directly downloaded and only the last symbol is mixed. The desired and undesired symbols are combined to produce the answers as follows. A[k] = L(C A[k](W ),C A[k](W )) (34) n n n 1 n n 2 where C are deterministic 3×3 matrices, that are required to satisfy the following two properties. n Denote the first 2 rows of C as C . n n P1. All C must have full rank. n P2. For all (3!)4 distinct realizations of π′,n ∈ [1 : 4], the 8 linear combinations of the un- n [k] desired message symbols that are directly downloaded (2 from each server), C A (W ), 1 1 kc [k] [k] [k] C A (W ), C A (W ), C A (W ) are independent. 2 2 kc 3 3 kc 4 4 kc As we will prove in the sequel, it is not difficult to find matrices that satisfy these properties. In fact, these properties are ‘generic’, i.e., uniformly random choices of C matrices will satisfy n these properties with probability approaching 1 as the field size approaches infinity. The appeal to generic property will be particularly useful as we consider larger classes of MDS-TPIR settings. Those (weaker) proofs apply here as well. However, for the particular setting of Theorem 1, based on a brute force search we are able to strengthen the proof by presenting the following explicit choice of C ,n ∈ [1 :4] which satisfies both properties over F . n 349 1 2 3 1 7 3 1 10 8 1 3 5 C = 6 5 4 , C = 11 9 8 , C = 7 5 4 , C = 12 9 3 (35) 1 2 3 4 0 0 1 0 0 1 0 0 1 0 0 1 Property P1 is trivially verified. Property P2 is verified by considering one by one, all of the 64 distinct realizations of π′,n ∈ [1 : 4]. To show how this is done, let us consider one case here. n Suppose the realization of the permutations is such that π′(U ) = (U ,U ,U ) (36) 1 1 0 6 8 π′(U ) = (U ,U ,U ) (37) 2 2 0 9 7 π′(U ) = (U ,U ,U ) (38) 3 3 1 3 0 π′(U ) = (U ,U ,U ) (39) 4 4 2 4 0 then we have 1 2 0 −3 0 3 0 3 U x 0 6 5 0 −4 0 4 0 4 U x 6 0 −3 1 7 3 0 3 0 U y 0 (C1A[1k](Wkc);··· ;C4A[4k](Wkc))= 08 −08 181 09 18 100 08 00 U (Ux9+y y) (40) 1 4 0 4 0 7 5 0 0 U (x+y) 3 5 0 10 0 0 0 1 3 U (x+2y) 2 3 0 6 0 0 0 12 9 U (x+2y) 4 ,C | {z } 9 The determinant of C over F is 321. Since the determinant is non-zero, all of its 8 rows are 349 linearly independent. Note that the test for property P2 does not depend on the realizations of U vectors. To see why this is true, note that the 8 linear combinations of (x,y) in the rightmost i column vector of (40) are linearly independent. Therefore, if C is an invertible matrix then the 8 directly downloaded linear combinations on the LHS of (40) are also independent (have joint entropy 8 p-ary units, conditioned on U ). 0:9 At this point the construction of the scheme is complete. All that remains now is to prove that the scheme is correct, i.e., it retrieves the desired message, and that it is T = 2 private. 
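Before turning to those proofs, we note that the brute-force verification of Property P2 described above lends itself to a short computational check. The sketch below is ours, not the authors' code. It models the 12 symbols $U_i x, U_i y$, $i \in [0:5]$, as independent coordinates (valid since $S'$ has full rank and, as noted above, the test does not depend on the realizations of the $U_i$), expands $U_6, \ldots, U_9$ via (25) and (26), and checks that for every one of the $6^4$ orderings the 8 directly downloaded combinations have rank 8 over $\mathbb{F}_{349}$.

```python
# A sketch (ours, not the authors' code) of the brute-force check of Property P2
# for the explicit matrices C_1,...,C_4 in (35), over F_349. The 12 symbols
# U_i x, U_i y (i = 0..5) are modeled as independent coordinates, which is valid
# because S' has full rank; U_6,...,U_9 are expanded via (25)-(26).

from itertools import permutations, product

p = 349

# Expansion of U_0,...,U_9 in terms of the independent rows U_0,...,U_5 of S'.
coef = {i: {i: 1} for i in range(6)}
coef[6] = {1: 1, 2: 1}   # U_6 = U_1 + U_2
coef[7] = {1: 1, 2: 2}   # U_7 = U_1 + 2 U_2
coef[8] = {3: 1, 4: 1}   # U_8 = U_3 + U_4
coef[9] = {3: 1, 4: 2}   # U_9 = U_3 + 2 U_4

def symbol(label, server):
    """Coordinates (over U_0 x,...,U_5 x, U_0 y,...,U_5 y) of U_label applied to
    the content of the given server: 1 -> x, 2 -> y, 3 -> x+y, 4 -> x+2y."""
    cx, cy = {1: (1, 0), 2: (0, 1), 3: (1, 1), 4: (1, 2)}[server]
    v = [0] * 12
    for j, c in coef[label].items():
        v[j] = (v[j] + c * cx) % p          # coefficient of U_j x
        v[6 + j] = (v[6 + j] + c * cy) % p  # coefficient of U_j y
    return v

# Query sets U_n from (21)-(24) and the first two rows of C_1,...,C_4 from (35).
U_sets = {1: [0, 6, 8], 2: [0, 7, 9], 3: [0, 1, 3], 4: [0, 2, 4]}
Cbar = {1: [[1, 2, 3], [6, 5, 4]],
        2: [[1, 7, 3], [11, 9, 8]],
        3: [[1, 10, 8], [7, 5, 4]],
        4: [[1, 3, 5], [12, 9, 3]]}

def rank_mod_p(rows):
    """Rank over F_p of a list of coordinate vectors (Gaussian elimination)."""
    rows = [r[:] for r in rows]
    rank, col = 0, 0
    while rank < len(rows) and col < 12:
        piv = next((i for i in range(rank, len(rows)) if rows[i][col]), None)
        if piv is None:
            col += 1
            continue
        rows[rank], rows[piv] = rows[piv], rows[rank]
        inv = pow(rows[rank][col], p - 2, p)
        rows[rank] = [x * inv % p for x in rows[rank]]
        for i in range(len(rows)):
            if i != rank and rows[i][col]:
                c = rows[i][col]
                rows[i] = [(a - c * b) % p for a, b in zip(rows[i], rows[rank])]
        rank, col = rank + 1, col + 1
    return rank

ok = True
for orders in product(*(list(permutations(U_sets[n])) for n in (1, 2, 3, 4))):
    downloaded = []  # the 8 directly downloaded undesired combinations
    for n, order in zip((1, 2, 3, 4), orders):
        answers = [symbol(label, n) for label in order]   # 3 projected symbols
        for row in Cbar[n]:                               # 2 direct downloads
            downloaded.append([sum(row[t] * answers[t][j] for t in range(3)) % p
                               for j in range(12)])
    ok &= (rank_mod_p(downloaded) == 8)

print("Property P2 holds for all 6^4 = 1296 orderings:", ok)
```

Restricting the outer loop to the single ordering in (36)-(39) checks exactly the particular case that is verified through the matrix $C$ in (40).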
3.4 The Scheme is Correct (Retrieves Desired Message)

As noted previously, the first 4 variables in the output of the $L$ function are obtained directly, i.e., $\bar{C}_1 A_1^{[k]}(W_1)$, $\bar{C}_2 A_2^{[k]}(W_1)$, $\bar{C}_3 A_3^{[k]}(W_1)$, $\bar{C}_4 A_4^{[k]}(W_1)$ and $\bar{C}_1 A_1^{[k]}(W_2)$, $\bar{C}_2 A_2^{[k]}(W_2)$, $\bar{C}_3 A_3^{[k]}(W_2)$, $\bar{C}_4 A_4^{[k]}(W_2)$ are all directly recovered. By property P2 of the $C_n$, $\bar{C}_1 A_1^{[k]}(W_{k_c})$, $\bar{C}_2 A_2^{[k]}(W_{k_c})$, $\bar{C}_3 A_3^{[k]}(W_{k_c})$, $\bar{C}_4 A_4^{[k]}(W_{k_c})$ are linearly independent. Since the user has recovered 8 independent dimensions of interference, and interference only spans 8 dimensions, all interference is recovered and eliminated. Once the interference is eliminated, since the $C_n$ matrices have full rank, the user is left with 12 independent linear combinations of desired symbols, from which he is able to recover the 12 desired message symbols. Therefore the scheme is correct.

3.5 The Scheme is Private (to any T = 2 Colluding Servers)

To prove that the scheme is $T = 2$ private (refer to (13)), it suffices to show that the queries for any 2 servers are identically distributed, regardless of which message is desired. Since each query is made up of two independently generated parts, one for each message, it suffices to prove that the query vectors for a message (say $W_k$) are identically distributed, regardless of whether the message is desired or undesired,

$$\left(Q_{n_1}^{[k]}(W_k), Q_{n_2}^{[k]}(W_k)\right) \sim \left(Q_{n_1}^{[k_c]}(W_k), Q_{n_2}^{[k_c]}(W_k)\right), \quad \forall n_1, n_2 \in [1:4], \ n_1 < n_2 \qquad (41)$$

Note that

$$\left(Q_{n_1}^{[k]}(W_k), Q_{n_2}^{[k]}(W_k)\right) = \left(\pi_{n_1}(\mathcal{V}_{n_1}), \pi_{n_2}(\mathcal{V}_{n_2})\right) \qquad (42)$$
$$\left(Q_{n_1}^{[k_c]}(W_k), Q_{n_2}^{[k_c]}(W_k)\right) = \left(\pi'_{n_1}(\mathcal{U}_{n_1}), \pi'_{n_2}(\mathcal{U}_{n_2})\right) \qquad (43)$$

Therefore, to prove (41) it suffices to show the following.

$$\left(V_{i_1}, V_{i_2}, V_{i_3}, V_{i_4}, V_{i_5}\right) \sim \left(U_0, U_{j_1}, U_{j_2}, U_{j_3}, U_{j_4}\right) \qquad (44)$$

where $\mathcal{V}_{n_1} = \{V_{i_1}, V_{i_2}, V_{i_3}\}$, $\mathcal{V}_{n_2} = \{V_{i_1}, V_{i_4}, V_{i_5}\}$, $\mathcal{U}_{n_1} = \{U_0, U_{j_1}, U_{j_2}\}$, $\mathcal{U}_{n_2} = \{U_0, U_{j_3}, U_{j_4}\}$. Because $S$ is uniformly chosen from the set of all full rank matrices, we have

$$\left(V_{i_1}, V_{i_2}, V_{i_3}, V_{i_4}, V_{i_5}\right) \sim \left(V_1, V_2, V_3, V_4, V_5\right) \qquad (45)$$

Next we note that there is a bijection between

$$\left(U_0, U_{j_1}, U_{j_2}, U_{j_3}, U_{j_4}\right) \leftrightarrow \left(U_0, U_1, U_2, U_3, U_4\right) \qquad (46)$$