On a Duality Between Recoverable Distributed Storage and Index Coding

Arya Mazumdar
Department of ECE, University of Minnesota–Twin Cities, Minneapolis, MN 55455
email: [email protected]

arXiv:1401.2672v2 [cs.IT] 10 Mar 2014

Abstract: In this paper, we introduce a model of a single-failure locally recoverable distributed storage system. This model appears to give rise to a problem seemingly dual to the well-studied index coding problem. The relation between the dimensions of an optimal index code and an optimal distributed storage code of our model is established in this paper. We also show some extensions to vector codes.

I. INTRODUCTION

Recently, the local repair property of error-correcting codes has been the center of a lot of research activity. In a distributed storage system, a single server failure is the most common error event, and in that case the aim is to reconstruct the content of the server from as few other servers as possible (or by downloading a minimal amount of data from other servers). The study of such regenerative storage systems was initiated in [9] and then followed up in several recent works. In [11], a particularly neat characterization of a local repair property is provided. It is assumed that each symbol of an encoded message is stored at a different node in the network (since the symbol alphabet is unconstrained, a symbol could represent a packet or block of bits of arbitrary size). Accordingly, [11] investigates codes allowing any single symbol of any codeword to be recovered from at most a constant number of other symbols of the codeword, i.e., from a number of symbols that does not grow with the length of the code.

The work of [11] was then further generalized in several directions, and a number of impossibility results regarding, as well as constructions of, locally repairable codes were presented (see, for example, [5], [12], [17], [20], [22]), culminating in the very recent construction of [21].

However, the topology of the network of the distributed storage system is missing from the above definition of local repairability. Namely, all servers are treated equally irrespective of their physical positions, proximities, and connections. Here we take a step to include that into consideration. We study the case when the topology of the storage system is fixed and the network of storage is given by a graph. In our model, the servers are represented by the vertices of a graph, and two servers are connected by an edge if it is easier to establish an up-or-down link between them, for reasons such as the physical locations of the servers, the architecture of the distributed system, or the homogeneity of software. It turns out that our model is closely related to the following index coding problem on a side information graph. In this paper, we formalize this relation.

A. Index Coding

A very natural "source coding" problem on a network, called index coding, was introduced in [3], and has since been a subject of extensive research. In the index coding problem a side information graph $G(V,E)$ is given. Each vertex $v \in V$ represents a receiver that is interested in knowing a uniform random variable $Y_v \in \mathbb{F}_q$. For any $v \in V$, define $N(v) = \{u \in V : (v,u) \in E\}$ to be the neighborhood of $v$. The receiver at $v$ knows the values of the variables $Y_u, u \in N(v)$. How much information should a broadcaster transmit, such that every receiver knows the value of its desired random variable? Let us give the formal definition from [3], adapted here for a $q$-ary alphabet.

Definition 1: An index code $C$ for $\mathbb{F}_q^n$ with side information graph $G(V,E)$, $V = \{1,2,\dots,n\}$, is a set of codewords in $\mathbb{F}_q^{\ell}$ together with:
1) An encoding function $f$ mapping inputs in $\mathbb{F}_q^n$ to codewords, and
2) A set of deterministic decoding functions $g_1,\dots,g_n$ such that $g_i\bigl(f(Y_1,\dots,Y_n), \{Y_j : j \in N(i)\}\bigr) = Y_i$ for every $i = 1,\dots,n$.

The encoding and decoding functions depend on $G$. The integer $\ell$ is called the length of $C$, or $\mathrm{len}(C)$. Given a graph $G$, the minimum possible length of an index code is denoted by $\mathrm{INDEX}_q(G)$.

In [3], a connection was made between the length of an index code and a quantity called the minrank of the graph. Let $A = (a_{ij})$ be an $n \times n$ matrix over $\mathbb{F}_q$. It is said that $A$ fits $G(V,E)$ over $\mathbb{F}_q$ if $a_{ii} \ne 0$ for all $i$ and $a_{ij} = 0$ whenever $(i,j) \notin E$ and $i \ne j$.

Definition 2: The minrank of a graph $G(V,E)$ over $\mathbb{F}_q$ is defined to be
$$\mathrm{minrank}_q(G) = \min\{\mathrm{rank}_{\mathbb{F}_q}(A) : A \text{ fits } G\}. \tag{1}$$

It was shown in [3] that
$$\mathrm{INDEX}_q(G) \le \mathrm{minrank}_q(G), \tag{2}$$
and indeed, $\mathrm{minrank}_q(G)$ is the minimum length of an index code on $G$ when the encoding function and the decoding functions are all linear. The above inequality can be strict in many cases [1], [14].

[Fig. 1. Example of a distributed storage graph.]

In [1], the problem of index coding is further generalized. We only describe here what is important for our context. Just for this part, assume $q = 2$. To characterize the optimal size of an index code, [1] introduces the notion of a confusion graph. Two input strings $x = (x_1,\dots,x_n)$, $y = (y_1,\dots,y_n) \in \mathbb{F}_2^n$ are called confusable if there exists some $i \in \{1,\dots,n\}$ such that $x_i \ne y_i$, but $x_j = y_j$ for all $j \in N(i)$. The confusion graph of $G$ has $2^n$ vertices, and each vertex represents a different $\{0,1\}$-string of length $n$. There exists an edge between two vertices if and only if the corresponding two strings are confusable with respect to the graph $G$. The maximum size of an independent set of the confusion graph is denoted by $\gamma(G)$.

However, the confusion graph and $\gamma(G)$ were used in [1] as tools to characterize the rate of index coding; they were not used to model any immediate practical problem. In this paper, we show that this notion of confusable strings fits perfectly the situation of local recovery in a distributed storage system.
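To make the definition of confusable strings concrete, the following Python sketch builds the confusion graph of a small side information graph and finds $\gamma(G)$ by exhaustive search. It is only an illustration for tiny $n$. The function names `confusable` and `gamma`, the 0-based vertex labels, the optional alphabet size `q`, and the two example graphs (an undirected triangle and a directed 3-cycle) are assumptions made here for illustration and are not taken from [1].

```python
from itertools import product


def confusable(x, y, nbrs):
    """Return True if x and y differ at some vertex i but agree on all of N(i)."""
    return any(
        x[i] != y[i] and all(x[j] == y[j] for j in nbrs[i])
        for i in range(len(nbrs))
    )


def gamma(nbrs, q=2):
    """Size of a largest independent set of the confusion graph (brute force)."""
    words = list(product(range(q), repeat=len(nbrs)))
    # Adjacency sets of the confusion graph: w ~ v iff they are confusable.
    adj = {w: {v for v in words if v != w and confusable(w, v, nbrs)}
           for w in words}
    best = 0

    def extend(size, candidates):
        nonlocal best
        best = max(best, size)
        # Prune: even taking every remaining candidate cannot beat `best`.
        if size + len(candidates) <= best:
            return
        head, rest = candidates[0], candidates[1:]
        extend(size + 1, [v for v in rest if v not in adj[head]])  # take head
        extend(size, rest)                                         # skip head

    extend(0, words)
    return best


# Hypothetical example graphs, given as neighbor lists N(i) with 0-based labels.
TRIANGLE = [[1, 2], [0, 2], [0, 1]]   # undirected triangle
DIRECTED_CYCLE = [[1], [2], [0]]      # directed 3-cycle 0 -> 1 -> 2 -> 0

print(gamma(TRIANGLE))        # 4
print(gamma(DIRECTED_CYCLE))  # 2
```

For the triangle, two strings are confusable exactly when they differ in a single position, so the four even-weight strings form a largest independent set. For the directed 3-cycle, two non-confusable strings must differ in every position, so an independent set has at most two elements. The search is exponential in $n$ and is meant only for checking small examples.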
In particular, $\gamma(G)$ becomes, in our problem, the largest possible size of a locally recoverable code for a system with topology given by $G$.

B. Organization

The paper is organized in the following way. In Section II, we formally introduce the model of a recoverable distributed storage system. The notion of an optimal recoverable distributed storage code for a given graph and its relation to the optimal index code are also described there. In Section III, we provide an algorithmic proof of the main duality relation between the index code and the distributed storage code. Our proof is based on a covering argument in the Hamming space, and relies on the fact that for any given subset of the Hamming space there exists a translation of the set that has very small overlap with the original subset. We conclude with an extension of the duality theorem to vector codes and a remark on the optimal linearly recoverable distributed storage codes.$^1$

$^1$ After the first version of this paper appeared on arXiv, we were made aware of a parallel independent work [19] where the duality between RDSS and index codes is proved for vector linear codes (see the discussion preceding Eq. (4)). The authors of [19] use that observation to give an upper bound on the optimal linear sum rate of the multiple unicast network coding problem. In this paper we have a different focus: we show a proof of (approximate) duality for general (nonlinear) codes.

II. RECOVERABLE DISTRIBUTED STORAGE SYSTEMS

Consider a network of distributed storage, for example, the one of Fig. 1. As mentioned in the introduction, whether two servers are connected by an edge depends on the ease of establishing a link between the servers. It is also possible (and sensible, perhaps) to model this as a directed graph (especially when uplink and downlink constructions have varying difficulties). In the following, we assume that the graph is directed; an undirected graph is just a special case.

If the data of any one server is lost, we want to recover it from the nearby servers, i.e., the ones with which it is easy to establish a link. This notion is formalized below.

Suppose the directed graph $G(V,E)$ represents the network of storage. Each element of $V$ represents a server, and in the case of a server failure (say, $v \in V$ is the failed server) one must be able to reconstruct its content from its neighborhood $N(v)$.

Given this constraint, what is the maximum amount of information one can store in the system? Without loss of generality, assume $V = \{1,2,\dots,n\}$ and let the variables $X_1, X_2, \dots, X_n$ respectively denote the contents of the vertices, where $X_i \in \mathbb{F}_q$, $i = 1,\dots,n$.

Definition 3: A recoverable distributed storage system (RDSS) code $C \subseteq \mathbb{F}_q^n$ with storage recovery graph $G(V,E)$, $V = \{1,2,\dots,n\}$, is a set of vectors in $\mathbb{F}_q^n$ together with:
- A set of deterministic recovery functions $f_i : \mathbb{F}_q^{|N(i)|} \to \mathbb{F}_q$ for $i = 1,\dots,n$ such that for any codeword $(X_1, X_2, \dots, X_n) \in \mathbb{F}_q^n$,
$$X_i = f_i(\{X_j : j \in N(i)\}), \quad i = 1,\dots,n. \tag{3}$$

Again, the recovery functions depend on $G$. The log-size of the code, $\log_q |C|$, is called the dimension of $C$, or $\dim(C)$. Given a graph $G$, the maximum possible dimension of an RDSS code is denoted by $\mathrm{RDSS}_q(G)$.

For example, consider the graph of Fig. 1 again. Here, $V = \{1,2,3,4,5\}$. The recovery sets of the vertices (or storage nodes) are given by:
$$N(1) = \{2,3,4,5\},\quad N(2) = \{1,3\},\quad N(3) = \{1,2,4\},\quad N(4) = \{1,3,5\},\quad N(5) = \{1,4\}.$$
Suppose the contents of the nodes $1,2,\dots,5$ are $X_1, X_2, \dots, X_5$ respectively, where $X_i \in \mathbb{F}_q$, $i = 1,\dots,5$. Moreover,
$$X_1 = f_1(X_2,X_3,X_4,X_5),\quad X_2 = f_2(X_1,X_3),\quad X_3 = f_3(X_1,X_2,X_4),\quad X_4 = f_4(X_1,X_3,X_5),\quad X_5 = f_5(X_1,X_4).$$

Assume the functions $f_i$, $i = 1,\dots,5$, in this example are linear. That is, for $\alpha_{ij} \in \mathbb{F}_q$, $1 \le i,j \le 5$,
$$\begin{aligned}
X_1 &= \alpha_{12}X_2 + \alpha_{13}X_3 + \alpha_{14}X_4 + \alpha_{15}X_5\\
X_2 &= \alpha_{21}X_1 + \alpha_{23}X_3\\
X_3 &= \alpha_{31}X_1 + \alpha_{32}X_2 + \alpha_{34}X_4\\
X_4 &= \alpha_{41}X_1 + \alpha_{43}X_3 + \alpha_{45}X_5\\
X_5 &= \alpha_{51}X_1 + \alpha_{54}X_4.
\end{aligned}$$
This implies that $(X_1, X_2, \dots, X_5)$ must belong to the null-space (over $\mathbb{F}_q$) of
$$D \equiv \begin{pmatrix}
1 & -\alpha_{12} & -\alpha_{13} & -\alpha_{14} & -\alpha_{15}\\
-\alpha_{21} & 1 & -\alpha_{23} & 0 & 0\\
-\alpha_{31} & -\alpha_{32} & 1 & -\alpha_{34} & 0\\
-\alpha_{41} & 0 & -\alpha_{43} & 1 & -\alpha_{45}\\
-\alpha_{51} & 0 & 0 & -\alpha_{54} & 1
\end{pmatrix}.$$
The dimension of the null-space of $D$ is $n$ minus the rank of $D$. Since $D$ fits $G$, the best choice of the coefficients $\alpha_{ij}$ gives a linear RDSS code of dimension $n - \mathrm{minrank}_q(G)$. Also, the null-space of a linear index code for $G$ is a linear RDSS code for the same graph $G$ (see Eq. (2)). From the above discussion, we have
$$\mathrm{RDSS}_q(G) \ge n - \mathrm{minrank}_q(G), \tag{4}$$
and $n - \mathrm{minrank}_q(G)$ is the maximum possible dimension of an RDSS code when the recovery functions are all linear. At this point, it is tempting to make the assertion $\mathrm{RDSS}_q(G) = n - \mathrm{INDEX}_q(G)$; however, that would be wrong. This is shown in the following example.

This example is present in [1], and the distributed storage graph, a pentagon, is shown in Fig. 2. For this graph, a maximum-sized binary RDSS code consists of the codewords $\{00000, 01100, 00011, 11011, 11101\}$. The recovery functions are given by
$$X_1 = X_2 \wedge X_5,\quad X_2 = X_1 \vee X_3,\quad X_3 = X_2 \wedge \bar{X}_4,\quad X_4 = \bar{X}_3 \wedge X_5,\quad X_5 = X_1 \vee X_4.$$

[Fig. 2. A distributed storage graph (the pentagon) that shows $\mathrm{RDSS}(G) \ne n - \mathrm{INDEX}(G)$.]

If all the recovery functions were linear, we could not have an RDSS code with so many codewords. Here $\mathrm{RDSS}_2(G) = \log_2 5$. On the other hand, the minimum length of an index code for this graph is 3, i.e., $\mathrm{INDEX}_2(G) = 3$, and this is achieved by the following linear mappings. The broadcaster transmits $Y_1 = X_2 + X_3$, $Y_2 = X_4 + X_5$ and $Y_3 = X_1 + X_2 + X_3 + X_4 + X_5$. The decoding functions are $X_1 = Y_1 + Y_2 + Y_3$; $X_2 = Y_1 + X_3$; $X_3 = Y_1 + X_2$; $X_4 = Y_2 + X_5$; $X_5 = Y_2 + X_4$.

Although in general $\mathrm{RDSS}_q(G) \ne n - \mathrm{INDEX}_q(G)$, these two quantities are not too far from each other. In particular, for a large enough alphabet, the left and right hand sides can be arbitrarily close. This is reflected in Thm. 1 below.

It is to be noted that we refrain from using ceiling and floor functions for clarity in this paper. In many cases, it is clear that the number of interest is not an integer and should be rounded off to the nearest larger or smaller integer. The main results do not change because of this.

A. Implication of the results of [1]

The result of [1] can be cast in our context in the following way.

Theorem 1: Given a graph $G(V,E)$, we must have
$$n - \mathrm{RDSS}_q(G) \le \mathrm{INDEX}_q(G) \le n - \mathrm{RDSS}_q(G) + \log_q\bigl(\min\{n \ln q,\ 1 + \mathrm{RDSS}_q(G) \ln q\}\bigr). \tag{5}$$

This result is purely graph-theoretic in the way it was presented in [1]. In particular, the size of a maximum independent set of the confusion graph, $\gamma(G)$, was identified as the size of the RDSS code, and its relation to the chromatic number of the confusion graph, which represents the size of the index code, was found. Namely, the proof depended on the following two crucial steps.
1) The chromatic number of the graph can only be so far away from the fractional chromatic number (see [1] for the detailed definition).
2) The confusion graph is vertex transitive. This implies that the maximum size of an independent set is equal to the number of vertices divided by the fractional chromatic number.

A proof of the first fact above can be found in [13]. In what follows, for completeness, we give a simple coding-theoretic proof of this main theorem, without using the notion of the confusion graph or its vertex transitivity.

III. THE PROOF OF THE DUALITY

We prove Theorem 1 with the help of the following two lemmas. The first of them is immediate.

Lemma 2: If there exists an index code $C$ of length $\ell$ for a side information graph $G$ on $n$ vertices, then there exists an RDSS code of dimension $n - \ell$ for the distributed storage graph $G$.

Proof: Suppose the encoding and decoding functions of the index code $C$ are $f : \mathbb{F}_q^n \to \mathbb{F}_q^{\ell}$ and $g_i : \mathbb{F}_q^{\ell + |N(i)|} \to \mathbb{F}_q$, $i = 1,\dots,n$. By the pigeonhole principle, there must exist some $x \in \mathbb{F}_q^{\ell}$ such that $|\{y \in \mathbb{F}_q^n : f(y) = x\}| \ge q^{n-\ell}$. Let $D_x \equiv \{y \in \mathbb{F}_q^n : f(y) = x\}$ be the distributed storage code, with recovery functions
$$f_i(\{X_j, j \in N(i)\}) \equiv g_i(x, \{X_j, j \in N(i)\}).$$

The second lemma is the more interesting one.

Lemma 3: If there exists an RDSS code $C$ of dimension $k$ for a distributed storage graph $G$ on $n$ vertices, then there exists an index code of length $n - k + \log_q \min\{n \ln q,\ 1 + k \ln q\}$ for the side information graph $G$.

To prove this result, we need the help of a number of other lemmas. First of all, notice that a translation of any RDSS code is an RDSS code.

Lemma 4: Suppose $C \subseteq \mathbb{F}_q^n$ is an RDSS code. Then any known translation of $C$ is also an RDSS code of the same dimension. That is, for any $a \in \mathbb{F}_q^n$, $C + a \equiv \{y + a : y \in C\}$ is an RDSS code of dimension $\log_q |C|$.

Proof: Let $(X_1,\dots,X_n) \in C$. Also assume $a = (a_1,\dots,a_n)$ and $X'_i = X_i + a_i$. We know that there exist recovery functions such that $X_i = f_i(\{X_j : j \in N(i)\})$. Now,
$$X'_i = X_i + a_i = f_i(\{X_j : j \in N(i)\}) + a_i \equiv f'_i(\{X'_j : j \in N(i)\}),$$
where $f'_i$ first subtracts the known shifts $a_j$ from its arguments, applies $f_i$, and then adds $a_i$.

In particular, Lemma 3 crucially uses the existence of a covering of the entire $\mathbb{F}_q^n$ by translations of an RDSS code.

Proof of Lemma 3: We will show that there exist $C_1,\dots,C_m$, $C_i \subseteq \mathbb{F}_q^n$, $i = 1,\dots,m$, all of which are RDSS codes of dimension $k$, such that
$$\bigcup_{i=1}^{m} C_i = \mathbb{F}_q^n, \tag{6}$$
where $m = q^{n-k} \min\{n \ln q,\ 1 + k \ln q\}$. Assume the above is true. Then any $y \in \mathbb{F}_q^n$ must belong to at least one of the $C_i$. Suppose $y \equiv (Y_1,\dots,Y_n) \in \mathbb{F}_q^n$ and $y \in C_i$. Then the encoding function of the desired index code $D$ is simply given by $f(y) = i$. If the recovery functions of $C_i$ are $f^i_j$, $j = 1,\dots,n$, then the decoding functions of $D$ are given by
$$g_j(i, \{Y_l : l \in N(j)\}) = f^i_j(\{Y_l : l \in N(j)\}).$$
Clearly the length of the index code is $\log_q m = n - k + \log_q(\min\{n \ln q,\ 1 + k \ln q\})$.

It remains to show the existence of RDSS codes $C_1,\dots,C_m$ of dimension $k$ each with property (6). We will show that there exist $m$ vectors $x_j$, $j = 1,\dots,m$, such that
$$C_i = C + x_i \equiv \{y + x_i : y \in C\}. \tag{7}$$
From Lemma 4, $C_i$, $i = 1,\dots,m$, are all RDSS codes of dimension $k$. Suppose $x_i$, $i = 1,\dots,m'$, are randomly and independently chosen from $\mathbb{F}_q^n$. Now,
$$\Pr\Bigl(\bigcup_{i=1}^{m'} C_i \ne \mathbb{F}_q^n\Bigr) \le q^n (1 - |C|/q^n)^{m'} < q^n e^{-m'|C|/q^n} \le 1,$$
when we set $m' = q^{n-k} n \ln q$ in the above expression (see [2, Prop. 3.12]).

If, instead, we set $m' = q^{n-k} k \ln q$, then the right-hand side above becomes $q^{n-k}$, which also bounds the expected number of points that do not belong to any of the $m'$ translations. To cover all these remaining points we need at most $q^{n-k}$ further translations. Hence, there must exist a covering with $q^{n-k} k \ln q + q^{n-k} = q^{n-k}(k \ln q + 1)$ translations. Taking the smaller of the two counts, $m$ translations suffice.

The proof of Lemma 3 can also be given via a greedy algorithm. In the greedy algorithm about $\log m$ vectors are recursively chosen instead of $m$ random vectors. We provide the construction/proof next.

A. A greedy algorithm for the proof of Lemma 3

Note that, to prove Lemma 3, we need to show the existence of a covering of the entire $\mathbb{F}_q^n$ by translations of an RDSS code. What we show here is that the translations themselves form a linear subspace. The greedy covering argument that we employ below was used to show the existence of good linear covering codes in [8] (see also [7], [10]), and was reintroduced in [15] to show the existence of balancing sets.

Lemma 5 (Bassalygo-Elias): Suppose $C, B \subseteq \mathbb{F}_q^n$. Then
$$\sum_{x \in \mathbb{F}_q^n} |(C + x) \cap B| = |C||B|. \tag{8}$$

Proof:
$$\begin{aligned}
\sum_{x \in \mathbb{F}_q^n} |(C + x) \cap B| &= |\{(x,y) : x \in \mathbb{F}_q^n, y \in B, y \in C + x\}|\\
&= |\{(x,y) : x \in \mathbb{F}_q^n, y \in B, x \in y - C\}|\\
&= |\{(x,y) : y \in B, x \in y - C\}|\\
&= |B||y - C| = |C||B|,
\end{aligned}$$
where $y - C \equiv \{y - a : a \in C\}$.

For any set $F \subseteq \mathbb{F}_q^n$, define
$$Q(F) \equiv 1 - \frac{|F|}{q^n}. \tag{9}$$
In words, $Q(F)$ denotes the proportion of $\mathbb{F}_q^n$ that is not covered by $F$. The following property is a result of Lemma 5.

Lemma 6: For every subset $F \subseteq \mathbb{F}_q^n$,
$$q^{-n} \sum_{x \in \mathbb{F}_q^n} Q(F \cup (F + x)) = Q(F)^2. \tag{10}$$

Proof: We have
$$|F \cup (F + x)| = 2|F| - |F \cap (F + x)|.$$
Therefore,
$$Q(F \cup (F + x)) = 1 - 2|F|q^{-n} + |F \cap (F + x)|q^{-n},$$
and hence,
$$\begin{aligned}
q^{-n} \sum_{x \in \mathbb{F}_q^n} Q(F \cup (F + x)) &= 1 - 2|F|q^{-n} + q^{-2n} \sum_{x \in \mathbb{F}_q^n} |F \cap (F + x)|\\
&= 1 - 2|F|q^{-n} + q^{-2n}|F|^2\\
&= (1 - |F|q^{-n})^2,
\end{aligned}$$
where in the second line we have used Lemma 5.

The implication of the above lemma is the following result.

Lemma 7: For every subset $F \subseteq \mathbb{F}_q^n$, there exist $m = q^n |F|^{-1} \min\{n \ln q,\ 1 + \ln |F|\}$ vectors $x_0 = 0, x_1, x_2, \dots, x_{m-1} \in \mathbb{F}_q^n$ such that
$$\bigcup_{i=0}^{m-1} (F + x_i) = \mathbb{F}_q^n.$$

Proof: From Lemma 6, for every subset $F \subseteq \mathbb{F}_q^n$, there exists $x \in \mathbb{F}_q^n$ such that
$$Q(F \cup (F + x)) \le Q(F)^2.$$
For the set $F \equiv F_0$, recursively define, for $i = 1, 2, \dots$,
$$F_i = F_{i-1} \cup (F_{i-1} + z_{i-1}),$$
where $z_i \in \mathbb{F}_q^n$ is such that
$$Q(F_i \cup (F_i + z_i)) \le Q(F_i)^2, \quad i = 0, 1, \dots$$
Clearly,
$$Q(F_t) \le Q(F_0)^{2^t} = \bigl(1 - q^{-n}|F|\bigr)^{2^t} \le e^{-q^{-n}|F| 2^t}.$$
At this point we can just use the argument at the end of the proof of Lemma 3, with $2^t$ playing the role of $m'$.

[Fig. 3. The recursive construction of the sets $F_1, F_2, F_3$ of Lemma 7.]

On the other hand, $F_t$ contains $F_0$ and its $2^t - 1$ translations (see Figure 3 for an illustration). Hence, there exist
$$m = \min\Bigl\{\frac{q^n n \ln q}{|F|},\ \frac{q^n (1 + \ln |F|)}{|F|}\Bigr\}$$
vectors $x_0 = 0, x_1, x_2, \dots, x_{m-1} \in \mathbb{F}_q^n$ such that
$$\bigcup_{i=0}^{m-1} (F + x_i) = \mathbb{F}_q^n.$$

To complete the proof of Lemma 3, as before, we just show the existence of RDSS codes $C_0 \equiv C, C_1, \dots, C_{m-1}$ of dimension $k$ each with property (6). This is achieved by choosing $m - 1$ vectors $x_j$, $j = 1,\dots,m-1$, such that
$$C_i = C + x_i \equiv \{y + x_i : y \in C\}. \tag{11}$$
From Lemma 4, $C_i$, $i = 1,\dots,m-1$, are all RDSS codes of dimension $k$. Moreover, from Lemma 7, we already know the existence of $x_j$, $j = 1,\dots,m-1$, such that property (6) is satisfied. However, from Lemma 7 it is also clear that these $m$ vectors form a linear subspace and can be generated by only $\log_q m$ vectors.

Corollary 8: For every subset $F \subseteq \mathbb{F}_q^n$, there exists a linear subspace $D \subseteq \mathbb{F}_q^n$ such that $|D| = q^n |F|^{-1} n \ln q$ and
$$\bigcup_{x \in D} (F + x) = \mathbb{F}_q^n.$$

The above result is helpful in the decoding process of the index code. If $C$ is an RDSS code and $D$ is the linear subspace such that $\bigcup_{x \in D} (C + x) = \mathbb{F}_q^n$, then the decoding of the obtained index code can be performed from $x \in \mathbb{F}_q^{\log_q |D|}$ by first multiplying $x$ with the generator matrix of $D$ and then shifting $C$ by the result. Hence, if there is a polynomial time decoding algorithm for $C$ then there will be one for the index code. This would not be so in the case of random choice, where we must maintain a look-up table of size exponential in $n$.

Remark 1: It is perhaps not so surprising that the method of [1], that is, the random choice (or, in fact, the method of [13]), gives the exact same result as the greedy algorithm method.

IV. EXTENSION TO VECTOR CODES AND THE CAPACITY OF LINEAR CODES

The literature on distributed storage often considers vector linear codes, and the same is true for [19]. However, in the context of general nonlinear codes, vector codes do not bring any further technical novelty and can just be thought of as codes over a larger alphabet.

For vector index codes, as earlier, a side information graph $G(V,E)$ is given. Each vertex $v \in V$ represents a receiver that is interested in knowing a uniform random vector $Y_v \in \mathbb{F}_q^p$. The receiver at $v$ knows the values of the variables $Y_u, u \in N(v)$. A vector index code $C$ for $\mathbb{F}_q^{np}$ with side information graph $G(V,E)$, $V = \{1,2,\dots,n\}$, is a set of codewords in $\mathbb{F}_q^{\ell p}$ ($\ell$ is the length of the code) together with:
1) An encoding function $f$ mapping inputs in $\mathbb{F}_q^{np}$ to codewords, and
2) A set of deterministic decoding functions $g_1,\dots,g_n$ such that $g_i\bigl(f(Y_1,\dots,Y_n), \{Y_j : j \in N(i)\}\bigr) = Y_i$ for every $i = 1,\dots,n$.

Given a graph $G$, the minimum possible value of $\ell$ is denoted by $\mathrm{INDEX}_q^p(G)$ (also called the broadcast capacity). When the functions $f$ and $g_i$, for all $1 \le i \le n$, are linear in all of their arguments in $\mathbb{F}_q$, the code is called vector linear.

A similar generalization is possible for the definition of RDSS codes. A vector RDSS code $C \subseteq (\mathbb{F}_q^p)^n$ with storage recovery graph $G(V,E)$, $V = \{1,2,\dots,n\}$, is a set of vectors in $\mathbb{F}_q^{np}$ together with a set of deterministic recovery functions $f_i : \mathbb{F}_q^{|N(i)|p} \to \mathbb{F}_q^p$ for $i = 1,\dots,n$ such that for any codeword $(X_1, X_2, \dots, X_n)$, $X_i \in \mathbb{F}_q^p$,
$$X_i = f_i(\{X_j : j \in N(i)\}), \quad i = 1,\dots,n. \tag{12}$$
The normalized log-size of the code, $\frac{1}{p} \log_q |C|$, is called the dimension of $C$. Given a graph $G$, the maximum possible dimension of a vector RDSS code is denoted by $\mathrm{RDSS}_q^p(G)$. When the recovery functions $f_i$, $1 \le i \le n$, are linear in all their arguments (in $\mathbb{F}_q$), the code is called vector linear.

General (nonlinear) vector index or RDSS codes can also be thought of as scalar codes over an alphabet of size $q^p$. Hence,
$$n - \mathrm{RDSS}_q^p(G) \le \mathrm{INDEX}_q^p(G) \le n - \mathrm{RDSS}_q^p(G) + \frac{\log_q(pn \ln q)}{p}.$$
As a consequence, even for a constant $q$, if $p = \Omega(\log n)$, then $\mathrm{INDEX}_q^p(G)$ and $n - \mathrm{RDSS}_q^p(G)$ differ by at most 1 for any graph $G$; for larger $p$, this difference goes to zero.

Although general vector codes do not lead to a different analysis, we next show that vector linear codes can achieve a dimension sufficiently close to $\mathrm{RDSS}_q^p(G)$ for any graph $G(V,E)$. This should be contrasted with results such as [4, Thm. 1.2], which show that a rather large gap must exist between vector linear and nonlinear index coding (or network coding) rates.

Proposition 9: There exists a polynomial time (in $n$) constructible vector linear RDSS code with dimension at least
$$\frac{\mathrm{RDSS}_q^p(G)}{\beta \log n \cdot \log\log n}$$
for a large enough integer $p$ and a constant $\beta < 5$.

Proof: In [19], it was shown that the linear algebraic dual of a vector linear index code is a vector linear RDSS code (see Section II of this paper for scalar codes). This implies that, for a vector linear index code of length $\ell$, the dual code is a vector linear RDSS code of dimension $n - \ell$. In [6], a vector linear index code of length $\ell$ was constructed in polynomial time such that
$$n - \ell \ge \frac{n - \mathrm{INDEX}_q^p(G)}{\alpha \log n \cdot \log\log n}$$
(this result of [6] was also used in [19]), where $\alpha$ is a constant (see [18], the building-block of [6], for the value of the constant). The dual code of this code must be a vector RDSS code of dimension $k = n - \ell$. From the above discussion, it is evident that
$$n - \mathrm{INDEX}_q^p(G) \ge \mathrm{RDSS}_q^p(G) - \frac{\log_q(pn \ln q)}{p}.$$
Hence,
$$k \ge \frac{\mathrm{RDSS}_q^p(G) - \frac{\log_q(pn \ln q)}{p}}{\alpha \log n \cdot \log\log n}.$$
Hence, if $p$ is large enough, the statement of the proposition is proved.

Remark 2: How large does $p$ need to be for the above proposition to hold? It is clear that $p = \Omega(\log n)$ is enough to diminish the additive error term $\frac{\log_q(pn \ln q)}{p}$. However, for the algorithm of [6] to work, $p$ needs to be as large as the denominator of a linear programming solution (see [16]) that is used crucially in [6]. Hence $p$, depending on the number of cycles in the graph, may be required to be exponential in $n$.

Acknowledgement: The author thanks A. Agarwal, A. G. Dimakis and K. Shanmugam for useful references. This work was supported in part by a grant from University of Minnesota.

REFERENCES

[1] N. Alon, E. Lubetzky, U. Stav, A. Weinstein, and A. Hassidim. Broadcasting with side information. In Foundations of Computer Science, 2008. FOCS'08. IEEE 49th Annual IEEE Symposium on, pages 823–832. IEEE, 2008.
[2] L. Babai. Automorphism groups, isomorphism, and reconstruction, chapter 27 of Handbook of Combinatorics. North-Holland–Elsevier, pages 1447–1540, 1995.
[3] Z. Bar-Yossef, Y. Birk, T. Jayram, and T. Kol. Index coding with side information. In Foundations of Computer Science, 2006. FOCS'06. 47th Annual IEEE Symposium on, pages 197–206. IEEE, 2006.
[4] A. Blasiak, R. Kleinberg, and E. Lubetzky. Lexicographic products and the power of non-linear network coding. In Foundations of Computer Science (FOCS), 2011 IEEE 52nd Annual Symposium on, pages 609–618. IEEE, 2011.
[5] V. Cadambe and A. Mazumdar. An upper bound on the size of locally recoverable codes. In Proc. IEEE Int. Symp. Network Coding, June 2013.
[6] M. A. R. Chaudhry, Z. Asad, A. Sprintson, and M. Langberg. On the complementary index coding problem. In Information Theory Proceedings (ISIT), 2011 IEEE International Symposium on, pages 244–248. IEEE, 2011.
[7] G. Cohen. A nonconstructive upper bound on covering radius. Information Theory, IEEE Transactions on, 29(3):352–353, 1983.
[8] P. Delsarte and P. Piret. Do most binary linear codes achieve the Goblick bound on the covering radius? (corresp.). Information Theory, IEEE Transactions on, 32(6):826–828, 1986.
[9] A. G. Dimakis, P. B. Godfrey, Y. Wu, M. J. Wainwright, and K. Ramchandran. Network coding for distributed storage systems. IEEE Trans. Inform. Theory, 56(9):4539–4551, Sep. 2010.
[10] T. J. Goblick. Coding for a discrete information source with a distortion measure. PhD thesis, Massachusetts Institute of Technology, 1963.
[11] P. Gopalan, C. Huang, H. Simitci, and S. Yekhanin. On the locality of codeword symbols. IEEE Trans. Inform. Theory, 58(11):6925–6934, Nov. 2012.
[12] G. M. Kamath, N. Prakash, V. Lalitha, and P. V. Kumar. Codes with local regeneration. arXiv preprint arXiv:1211.1932, 2012.
[13] L. Lovász. On the ratio of optimal integral and fractional covers. Discrete Mathematics, 13(4):383–390, 1975.
[14] E. Lubetzky and U. Stav. Nonlinear index coding outperforming the linear optimum. Information Theory, IEEE Transactions on, 55(8):3544–3551, 2009.
[15] A. Mazumdar, R. M. Roth, and P. O. Vontobel. On linear balancing sets. Advances in Mathematics of Communications (AMC), 4(3):345–361, 2010.
[16] Z. Nutov and R. Yuster. Packing directed cycles efficiently. In Mathematical Foundations of Computer Science 2004, pages 310–321. Springer, 2004.
[17] D. S. Papailiopoulos and A. G. Dimakis. Locally repairable codes. In Proc. Int. Symp. Inform. Theory, pages 2771–2775, Cambridge, MA, July 2012.
[18] P. D. Seymour. Packing directed circuits fractionally. Combinatorica, 15(2):281–288, 1995.
[19] K. Shanmugam and A. G. Dimakis. Connections between index coding, locally repairable codes and the multiple unicast problem. Personal communication, 2014.
[20] N. Silberstein, A. S. Rawat, O. O. Koyluoglu, and S. Vishwanath. Optimal locally repairable codes via rank-metric codes. Preprint, arXiv:1301.6331, 2013.
[21] I. Tamo and A. Barg. A family of optimal locally recoverable codes. arXiv preprint arXiv:1311.3284, 2013.
[22] I. Tamo, D. S. Papailiopoulos, and A. G. Dimakis. Optimal locally repairable codes and connections to matroid theory. Preprint, arXiv:1301.7693, 2013.