On a Duality Between Recoverable Distributed Storage and Index Coding
Arya Mazumdar
Department of ECE
University of Minnesota – Twin Cities
Minneapolis, MN 55455
email: arya@umn.edu
Abstract—In this paper, we introduce a model of a single-failure locally recoverable distributed storage system. This model appears to give rise to a problem seemingly dual to the well-studied index coding problem. The relation between the dimensions of an optimal index code and an optimal distributed storage code of our model is established in this paper. We also show some extensions to vector codes.

I. INTRODUCTION

Recently, the local repair property of error-correcting codes has been at the center of a lot of research activity. In a distributed storage system, a single server failure is the most common error event, and in that case the aim is to reconstruct the content of the server from as few other servers as possible (or by downloading a minimal amount of data from other servers). The study of such regenerative storage systems was initiated in [9] and then followed up in several recent works. In [11], a particularly neat characterization of a local repair property is provided. It is assumed that each symbol of an encoded message is stored at a different node in the network (since the symbol alphabet is unconstrained, a symbol could represent a packet or block of bits of arbitrary size). Accordingly, [11] investigates codes allowing any single symbol of any codeword to be recovered from at most a constant number of other symbols of the codeword, i.e., from a number of symbols that does not grow with the length of the code.

The work of [11] has since been generalized in several directions, and a number of impossibility results regarding, as well as constructions of, locally repairable codes have been presented (see, for example, [5], [12], [17], [20], [22]), culminating in the very recent construction of [21].

However, the topology of the network of the distributed storage system is missing from the above definition of local repairability. Namely, all servers are treated equally, irrespective of their physical positions, proximities, and connections. Here we take a step to include that into consideration. We study the case when the topology of the storage system is fixed and the network of storage is given by a graph. In our model, the servers are represented by the vertices of a graph, and two servers are connected by an edge if it is easier to establish an up- or down-link between them, for reasons such as the physical locations of the servers, the architecture of the distributed system, or homogeneity of software. It turns out that our model is closely related to the following index coding problem on a side information graph. In this paper, we formalize this relation.

A. Index Coding

A very natural "source coding" problem on a network, called index coding, was introduced in [3], and has since been a subject of extensive research. In the index coding problem a side information graph G(V,E) is given. Each vertex v ∈ V represents a receiver that is interested in knowing a uniform random variable Y_v ∈ F_q. For any v ∈ V, define N(v) = {u ∈ V : (v,u) ∈ E} to be the neighborhood of v. The receiver at v knows the values of the variables Y_u, u ∈ N(v). How much information should a broadcaster transmit, such that every receiver knows the value of its desired random variable? Let us give the formal definition from [3], adapted for a q-ary alphabet here.

Definition 1: An index code C for F_q^n with side information graph G(V,E), V = {1,2,...,n}, is a set of codewords in F_q^ℓ together with:
  1) an encoding function f mapping inputs in F_q^n to codewords, and
  2) a set of deterministic decoding functions g_1,...,g_n such that g_i( f(Y_1,...,Y_n), {Y_j : j ∈ N(i)} ) = Y_i for every i = 1,...,n.

The encoding and decoding functions depend on G. The integer ℓ is called the length of C, or len(C). Given a graph G, the minimum possible length of an index code is denoted by INDEX_q(G).

In [3], a connection has been made between the length of an index code and a quantity called the minrank of the graph. Suppose A = (a_{ij}) is an n × n matrix over F_q. It is said that A fits G(V,E) over F_q if a_{ii} ≠ 0 for all i and a_{ij} = 0 whenever (i,j) ∉ E and i ≠ j.

Definition 2: The minrank of a graph G(V,E) over F_q is defined to be

    minrank_q(G) = min{ rank_{F_q}(A) : A fits G }.                          (1)

It was shown in [3] that

    INDEX_q(G) ≤ minrank_q(G),                                               (2)

and indeed, minrank_q(G) is the minimum length of an index code on G when the encoding function and the decoding functions are all linear. The above inequality can be strict in many cases [1], [14].
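For very small graphs, Definition 2 can be evaluated exhaustively. The following sketch is an illustration added here (it is not code from [3]; the helper names rank_f2 and minrank2 are ours): it enumerates every matrix over F_2 that fits a given side information graph and reports the smallest rank. For the pentagon used later in Fig. 2 it returns 3, consistent with INDEX_2(G) = 3 in the example of Section II.

```python
from itertools import product

def rank_f2(rows, n):
    """Rank over F_2; each row is an int whose bit j holds the entry in column j."""
    rows, r = list(rows), 0
    for c in range(n):
        piv = next((k for k in range(r, len(rows)) if rows[k] >> c & 1), None)
        if piv is None:
            continue
        rows[r], rows[piv] = rows[piv], rows[r]
        for k in range(len(rows)):
            if k != r and rows[k] >> c & 1:
                rows[k] ^= rows[r]
        r += 1
    return r

def minrank2(n, N):
    """Brute-force minrank over F_2: minimum rank of a matrix that fits G,
    i.e., 1's on the diagonal and free entries only at positions (i, j), j in N[i]."""
    edges = [(i, j) for i in range(n) for j in N[i] if j != i]
    best = n
    for bits in product((0, 1), repeat=len(edges)):
        A = [1 << i for i in range(n)]            # start from the identity
        for (i, j), b in zip(edges, bits):
            A[i] |= b << j                         # optional entry on an edge
        best = min(best, rank_f2(A, n))
    return best

# Side information graph of the pentagon (5-cycle), 0-indexed: N(i) = {i-1, i+1}.
pentagon = {i: {(i - 1) % 5, (i + 1) % 5} for i in range(5)}
print(minrank2(5, pentagon))   # prints 3; by Eq. (2), INDEX_2(G) <= 3 here
```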
In [1], the problem of index coding is further generalized. We only describe here what is important for our context. Just for this part, assume q = 2. To characterize the optimal size of an index code, [1] introduces the notion of a confusion graph. Two input strings x = (x_1,...,x_n), y = (y_1,...,y_n) ∈ F_2^n are called confusable if there exists some i ∈ {1,...,n} such that x_i ≠ y_i, but x_j = y_j for all j ∈ N(i). In the confusion graph of G, the total number of vertices is 2^n, and each vertex represents a different {0,1}-string of length n. There is an edge between two vertices if and only if the corresponding two strings are confusable with respect to the graph G. The maximum size of an independent set of the confusion graph is denoted by γ(G).

However, the confusion graph and γ(G) in [1] were used as tools to characterize the rate of index coding; they were not used to model any immediate practical problem. In this paper, we show that this notion of confusable strings fits perfectly the situation of local recovery of a distributed storage system. Namely, γ(G) in our problem becomes the largest possible size of a locally recoverable code for a system with topology given by G.
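The confusability relation is straightforward to check mechanically. Below is a small helper, our own illustration rather than code from [1] (the names confusable and is_independent are hypothetical), that tests whether two binary strings are confusable with respect to G, and whether a set of strings forms an independent set of the confusion graph, i.e., whether no receiver can be confused between any two of them.

```python
def confusable(x, y, N):
    """True if x, y in {0,1}^n are confusable w.r.t. the side information graph:
    some i has x_i != y_i while x and y agree on every coordinate in N(i)."""
    return any(x[i] != y[i] and all(x[j] == y[j] for j in N[i])
               for i in range(len(x)))

def is_independent(strings, N):
    """True iff the strings are pairwise non-confusable, i.e., an independent
    set of the confusion graph (hence gamma(G) >= len(strings))."""
    return all(not confusable(x, y, N)
               for i, x in enumerate(strings) for y in strings[i + 1:])

# Pentagon (5-cycle), 0-indexed neighborhoods N(i) = {i-1, i+1}.
N = {i: {(i - 1) % 5, (i + 1) % 5} for i in range(5)}
print(confusable((0, 0, 0, 0, 0), (1, 0, 0, 0, 0), N))  # True: the receiver at the
# first vertex sees identical side information but wants different values
```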
B. Organization

The paper is organized in the following way. In Section II, we formally introduce the model of a recoverable distributed storage system. The notion of an optimal recoverable distributed storage code given a graph, and its relation to the optimal index code, is also described there. In Section III, we provide an algorithmic proof of the main duality relation between the index code and the distributed storage code. Our proof is based on a covering argument in the Hamming space, and relies on the fact that for any given subset of the Hamming space there exists a translation of the set that has very small overlap with the original subset. We conclude with an extension of the duality theorem to vector codes and a remark on optimal linearly recoverable distributed storage codes.¹

¹ After the first version of this paper appeared in arXiv, we were made aware of a parallel independent work [19] where, for vector linear codes, the duality between RDSS and index codes (see the discussion preceding Eq. (4)) is proved. The authors of [19] use that observation to give an upper bound on the optimal linear sum rate of the multiple unicast network coding problem. In this paper we have a different focus: we show a proof of (approximate) duality for general (nonlinear) codes.

II. RECOVERABLE DISTRIBUTED STORAGE SYSTEMS

Consider a network of distributed storage, for example the one of Fig. 1. As mentioned in the introduction, the property of two servers being connected by an edge is based on the ease of establishing a link between the servers. It is also possible (and sensible, perhaps) to model this as a directed graph (especially when uplink and downlink constructions have varying difficulties). In the following, we assume that the graph is directed, and an undirected graph is just a special case.

[Fig. 1. Example of a distributed storage graph.]

If the data of any one server is lost, we want to recover it from the nearby servers, i.e., the ones with which it is easy to establish a link. This notion is formalized below.

Suppose the directed graph G(V,E) represents the network of storage. Each element of V represents a server, and in the case of a server failure (say, v ∈ V is the failed server) one must be able to reconstruct its content from its neighborhood N(v).

Given this constraint, what is the maximum amount of information one can store in the system? Without loss of generality, we assume V = {1,2,...,n} and the variables X_1, X_2, ..., X_n respectively denote the contents of the vertices, where X_i ∈ F_q, i = 1,...,n.

Definition 3: A recoverable distributed storage system (RDSS) code C ⊆ F_q^n with storage recovery graph G(V,E), V = {1,2,...,n}, is a set of vectors in F_q^n together with:
  - a set of deterministic recovery functions f_i : F_q^{|N(i)|} → F_q for i = 1,...,n such that for any codeword (X_1, X_2, ..., X_n) ∈ F_q^n,

    X_i = f_i({X_j : j ∈ N(i)}),   i = 1,...,n.                              (3)

Again, the decoding functions depend on G. The log-size of the code, log_q |C|, is called the dimension of C, or dim(C). Given a graph G, the maximum possible dimension of an RDSS code is denoted by RDSS_q(G).

For example, consider the graph of Fig. 1 again. Here, V = {1,2,3,4,5}. The recovery sets of the vertices (or storage nodes) are given by:

    N(1) = {2,3,4,5},  N(2) = {1,3},  N(3) = {1,2,4},
    N(4) = {1,3,5},    N(5) = {1,4}.

Suppose the contents of the nodes 1,2,...,5 are X_1, X_2, ..., X_5 respectively, where X_i ∈ F_q, i = 1,...,5. Moreover, X_1 = f_1(X_2,X_3,X_4,X_5), X_2 = f_2(X_1,X_3), X_3 = f_3(X_1,X_2,X_4), X_4 = f_4(X_1,X_3,X_5), X_5 = f_5(X_1,X_4).
Assume the functions f_i, i = 1,...,5, in this example are linear. That is, for α_{ij} ∈ F_q, 1 ≤ i,j ≤ 5,

    X_1 = α_{12}X_2 + α_{13}X_3 + α_{14}X_4 + α_{15}X_5
    X_2 = α_{21}X_1 + α_{23}X_3
    X_3 = α_{31}X_1 + α_{32}X_2 + α_{34}X_4
    X_4 = α_{41}X_1 + α_{43}X_3 + α_{45}X_5
    X_5 = α_{51}X_1 + α_{54}X_4.

This implies that (X_1, X_2, ..., X_5) must belong to the null-space (over F_q) of

            [    1      -α_{12}   -α_{13}   -α_{14}   -α_{15} ]
            [ -α_{21}      1      -α_{23}      0         0    ]
    D  ≡    [ -α_{31}   -α_{32}      1      -α_{34}      0    ]
            [ -α_{41}      0      -α_{43}      1      -α_{45} ]
            [ -α_{51}      0         0      -α_{54}      1    ].

The dimension of the null-space of D is n minus the rank of D. Note that D fits G in the sense of Definition 2 and, up to scaling of its rows, every matrix that fits G arises this way. Hence, it is evident that the largest dimension of such a linear RDSS code is n − minrank_q(G). Also, the null-space of a linear index code for G is a linear RDSS code for the same graph G (see Eq. (2)). From the above discussion, we have

    RDSS_q(G) ≥ n − minrank_q(G),                                            (4)

and n − minrank_q(G) is the maximum possible dimension of an RDSS code when the recovery functions are all linear. At this point, it is tempting to make the assertion RDSS_q(G) = n − INDEX_q(G); however, that would be wrong. This is shown in the following example.

This example is present in [1], and the distributed storage graph, a pentagon, is shown in Fig. 2. For this graph, a maximum-sized binary RDSS code consists of the codewords {00000, 01100, 00011, 11011, 11101}. The recovery functions are given by

    X_1 = X_2 ∧ X_5,  X_2 = X_1 ∨ X_3,  X_3 = X_2 ∧ X̄_4,
    X_4 = X̄_3 ∧ X_5,  X_5 = X_1 ∨ X_4.

[Fig. 2. A distributed storage graph (the pentagon) that shows RDSS(G) ≠ n − INDEX(G).]

If all the recovery functions were linear, we could not have an RDSS code with so many codewords. Here RDSS_2(G) = log_2 5. On the other hand, the minimum length of an index code for this graph is 3, i.e., INDEX_2(G) = 3, and this is achieved by the following linear mappings. The broadcaster transmits Y_1 = X_2 + X_3, Y_2 = X_4 + X_5 and Y_3 = X_1 + X_2 + X_3 + X_4 + X_5. The decoding functions are X_1 = Y_1 + Y_2 + Y_3; X_2 = Y_1 + X_3; X_3 = Y_1 + X_2; X_4 = Y_2 + X_5; X_5 = Y_2 + X_4.
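Both claims in this example can be verified mechanically. The short check below is our own illustration (the helper names are hypothetical): it confirms that the five codewords are consistent with the stated recovery functions, and that the three broadcast bits together with the pentagon side information determine every X_i.

```python
from itertools import product

# The binary RDSS code of Fig. 2, written as (X1, ..., X5).
code = [(0,0,0,0,0), (0,1,1,0,0), (0,0,0,1,1), (1,1,0,1,1), (1,1,1,0,1)]

# Recovery functions (0-indexed coordinates); each uses only the pentagon
# neighborhood of its vertex.
recover = [
    lambda x: x[1] & x[4],           # X1 = X2 AND X5
    lambda x: x[0] | x[2],           # X2 = X1 OR X3
    lambda x: x[1] & (1 - x[3]),     # X3 = X2 AND (NOT X4)
    lambda x: (1 - x[2]) & x[4],     # X4 = (NOT X3) AND X5
    lambda x: x[0] | x[3],           # X5 = X1 OR X4
]
assert all(x[i] == recover[i](x) for x in code for i in range(5))

# The linear index code: three broadcast bits and their decoders.
for x in product((0, 1), repeat=5):
    y1 = x[1] ^ x[2]                        # Y1 = X2 + X3
    y2 = x[3] ^ x[4]                        # Y2 = X4 + X5
    y3 = x[0] ^ x[1] ^ x[2] ^ x[3] ^ x[4]   # Y3 = X1 + X2 + X3 + X4 + X5
    assert x[0] == y1 ^ y2 ^ y3             # X1 = Y1 + Y2 + Y3
    assert x[1] == y1 ^ x[2]                # X2 = Y1 + X3 (side information)
    assert x[2] == y1 ^ x[1]                # X3 = Y1 + X2
    assert x[3] == y2 ^ x[4]                # X4 = Y2 + X5
    assert x[4] == y2 ^ x[3]                # X5 = Y2 + X4
print("RDSS code and index code of the pentagon verified")
```

The first assertion shows dim ≥ log_2 5 for the RDSS problem on this graph, while the loop over all 32 inputs shows that length 3 suffices for the index coding problem.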
Although in general RDSS_q(G) ≠ n − INDEX_q(G), these two quantities are not too far from each other. In particular, for a large enough alphabet, the left- and right-hand sides can be arbitrarily close. This is reflected in Thm. 1 below.

It is to be noted that we refrain from using ceiling and floor functions for clarity in this paper. In many cases, it is clear that the number of interest is not an integer and should be rounded off to the nearest larger or smaller integer. The main results do not change for this.

A. Implication of the results of [1]

The result of [1] can be cast in our context in the following way.

Theorem 1: Given a graph G(V,E), we must have

    n − RDSS_q(G) ≤ INDEX_q(G)
                  ≤ n − RDSS_q(G) + log_q( min{n ln q, 1 + RDSS_q(G) ln q} ).   (5)

This result is purely graph-theoretic, the way it was presented in [1]. In particular, the size of a maximum independent set of the confusion graph, γ(G), was identified as the size of the RDSS code, and its relation to the chromatic number of the confusion graph, which represents the size of the index code, was found. Namely, the proof depended on the following two crucial steps.
  1) The chromatic number of a graph can only be so much away from the fractional chromatic number (see [1] for a detailed definition).
  2) The confusion graph is vertex transitive. This implies that the maximum size of an independent set is equal to the number of vertices divided by the fractional chromatic number.

A proof of the first fact above can be found in [13]. In what follows, we give a simple coding-theoretic proof of this main theorem, without using the notion of the confusion graph or its vertex transitivity, for completeness.

III. THE PROOF OF THE DUALITY

We prove Theorem 1 with the help of the following two lemmas. The first of them is immediate.

Lemma 2: If there exists an index code C of length ℓ for a side information graph G on n vertices, then there exists an RDSS code of dimension n − ℓ for the distributed storage graph G.

Proof: Suppose the encoding and decoding functions of the index code C are f : F_q^n → F_q^ℓ and g_i : F_q^{ℓ+|N(i)|} → F_q, i = 1,...,n. There must exist some x ∈ F_q^ℓ such that |{y ∈ F_q^n : f(y) = x}| ≥ q^{n−ℓ}. Let D_x ≡ {y ∈ F_q^n : f(y) = x} be the distributed storage code with recovery functions

    f_i({X_j, j ∈ N(i)}) ≡ g_i(x, {X_j, j ∈ N(i)}).

The second lemma is the more interesting one.

Lemma 3: If there exists an RDSS code C of dimension k for a distributed storage graph G on n vertices, then there exists an index code of length n − k + log_q min{n ln q, 1 + k ln q} for the side information graph G.

To prove this result, we need the help of a number of other lemmas. First of all, notice that a translation of any RDSS code is an RDSS code.

Lemma 4: Suppose C ⊆ F_q^n is an RDSS code. Then any known translation of C is also an RDSS code of the same dimension. That is, for any a ∈ F_q^n, C + a ≡ {y + a : y ∈ C} is an RDSS code of dimension log_q |C|.
Proof: Let (X_1,...,X_n) ∈ C. Also assume a = (a_1,...,a_n), and X'_i = X_i + a_i. We know that there exist recovery functions such that X_i = f_i({X_j : j ∈ N(i)}). Now, X'_i = X_i + a_i = f_i({X_j : j ∈ N(i)}) + a_i ≡ f'_i({X'_j : j ∈ N(i)}).

In particular, Lemma 3 crucially uses the existence of a covering of the entire F_q^n by translations of an RDSS code.

Proof of Lemma 3: We will show that there exist C_1, ..., C_m, C_i ⊆ F_q^n, i = 1,...,m, all of which are RDSS codes of dimension k, such that

    ∪_{i=1}^{m} C_i = F_q^n,                                                 (6)

where m = q^{n−k} min{n ln q, 1 + k ln q}. Assume the above is true. Then any y ∈ F_q^n must belong to at least one of the C_i's. Suppose y ≡ (Y_1,...,Y_n) ∈ F_q^n and y ∈ C_i. Then the encoding function of the desired index code D is simply given by f(y) = i. If the recovery functions of C_i are f_j^i, j = 1,...,n, then the decoding functions of D are given by:

    g_j(i, {Y_l : l ∈ N(j)}) = f_j^i({Y_l : l ∈ N(j)}).

Clearly the length of the index code is log_q m = n − k + log_q(min{n ln q, 1 + k ln q}).

It remains to show the existence of RDSS codes C_1,...,C_m of dimension k, each with property (6). We will show that there exist m vectors x_j, j = 1,...,m, such that

    C_i = C + x_i ≡ {y + x_i : y ∈ C}.                                       (7)

From Lemma 4, C_i, i = 1,...,m, are all RDSS codes of dimension k. Suppose x_i, i = 1,...,m, are randomly and independently chosen from F_q^n. Now,

    Pr( ∪_{i=1}^{m'} C_i ≠ F_q^n ) ≤ q^n (1 − |C|/q^n)^{m'} < q^n e^{−m'|C|/q^n} ≤ 1,

when we set m' = q^{n−k} n ln q ≤ m in the above expression (see [2, Prop. 3.12]). If, instead, we set m' = q^{n−k} k ln q, then the expected number of points that do not belong to any of the m' translations is at most q^{n−k}. To cover all these remaining points we need at most q^{n−k} additional translations. Hence, there must exist a covering such that q^{n−k} k ln q + q^{n−k} = q^{n−k}(k ln q + 1) ≤ m translations suffice.
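The random-choice construction above is easy to simulate for small parameters. The following sketch is illustrative only (build_index_code, encode and decode are hypothetical names): it draws uniform translations of a binary RDSS code until F_2^n is covered, encodes a string by the index of a translate containing it, and decodes a coordinate by a consistency lookup inside that translate, which is well defined by property (3) and Lemma 4.

```python
import random
from itertools import product

def translate(C, x):
    return {tuple(a ^ b for a, b in zip(c, x)) for c in C}

def build_index_code(C, n, rng=random.Random(1)):
    """Random-choice covering of Eqs. (6)-(7): draw shifts x_i until the
    translates C + x_i cover {0,1}^n; the broadcast is the index i."""
    universe = set(product((0, 1), repeat=n))
    covered, shifts = set(), []
    while covered != universe:
        x = tuple(rng.randrange(2) for _ in range(n))
        shifts.append(x)
        covered |= translate(C, x)
    return shifts

def encode(y, C, shifts):
    """Transmit the index of some translate C + x_i that contains y."""
    return next(i for i, x in enumerate(shifts) if y in translate(C, x))

def decode(i, j, side, C, shifts, N):
    """Recover Y_j from the broadcast index i and the side information
    side = {l: Y_l for l in N[j]}; by property (3) the answer is unique."""
    return next(w[j] for w in translate(C, shifts[i])
                if all(w[l] == v for l, v in side.items()))

# Demo with the pentagon RDSS code of Fig. 2.
C = [(0,0,0,0,0), (0,1,1,0,0), (0,0,0,1,1), (1,1,0,1,1), (1,1,1,0,1)]
N = {j: {(j - 1) % 5, (j + 1) % 5} for j in range(5)}
shifts = build_index_code(C, 5)
y = (1, 0, 1, 1, 0)
i = encode(y, C, shifts)
assert all(decode(i, j, {l: y[l] for l in N[j]}, C, shifts, N) == y[j]
           for j in range(5))
print(len(shifts), "translations drawn; index code length = log2 of that")
```

The table lookup inside a translate plays the role of the recovery function f_j^i in the proof, and the number of random shifts needed is of the same order as the bound q^{n−k} n ln q used above.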
The proof of Lemma 3 can also be given via a greedy algorithm. In the greedy algorithm, about log m vectors are chosen recursively instead of m random vectors. We provide the construction/proof next.

A. A greedy algorithm for the proof of Lemma 3

Note that to prove Lemma 3 we need to show the existence of a covering of the entire F_q^n by translations of an RDSS code. What we show here is that the translations themselves form a linear subspace. The greedy covering argument that we employ below was used to show the existence of good linear covering codes in [8] (see also [7], [10]), and was reintroduced in [15] to show the existence of balancing sets.

Lemma 5 (Bassalygo-Elias): Suppose C, B ⊆ F_q^n. Then

    Σ_{x ∈ F_q^n} |(C + x) ∩ B| = |C||B|.                                    (8)

Proof:

    Σ_{x ∈ F_q^n} |(C + x) ∩ B| = |{(x,y) : x ∈ F_q^n, y ∈ B, y ∈ C + x}|
                                = |{(x,y) : x ∈ F_q^n, y ∈ B, x ∈ y − C}|
                                = |{(x,y) : y ∈ B, x ∈ y − C}|
                                = |B||y − C| = |C||B|,

where y − C ≡ {y − a : a ∈ C}.

For any set F ⊆ F_q^n, define

    Q(F) ≡ 1 − |F|/q^n.                                                      (9)

In words, Q(F) denotes the proportion of F_q^n that is not covered by F. The following property is a consequence of Lemma 5.

Lemma 6: For every subset F ⊆ F_q^n,

    q^{−n} Σ_{x ∈ F_q^n} Q(F ∪ (F + x)) = Q(F)^2.                            (10)

Proof: We have

    |F ∪ (F + x)| = 2|F| − |F ∩ (F + x)|.

Therefore,

    Q(F ∪ (F + x)) = 1 − 2|F| q^{−n} + |F ∩ (F + x)| q^{−n},

and hence,

    q^{−n} Σ_{x ∈ F_q^n} Q(F ∪ (F + x)) = 1 − 2|F| q^{−n} + q^{−2n} Σ_{x ∈ F_q^n} |F ∩ (F + x)|
                                        = 1 − 2|F| q^{−n} + q^{−2n} |F|^2
                                        = (1 − |F| q^{−n})^2,

where in the second line we have used Lemma 5.
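Lemma 5 is a pure double-counting identity, and Lemma 6 follows from it by averaging, so both are easy to sanity-check numerically. A small check of Eq. (8) over F_2^n for arbitrary random sets (our own illustration) is shown below.

```python
import random
from itertools import product

def check_lemma5(n, sizeC, sizeB, seed=0):
    """Verify Eq. (8): sum over x of |(C + x) ∩ B| equals |C||B| over F_2^n."""
    rng = random.Random(seed)
    space = list(product((0, 1), repeat=n))
    C = rng.sample(space, sizeC)                  # arbitrary small sets C and B
    B = set(rng.sample(space, sizeB))
    total = sum(len({tuple(a ^ b for a, b in zip(c, x)) for c in C} & B)
                for x in space)                   # sum over all translations x
    assert total == sizeC * sizeB
    return total

print(check_lemma5(4, 5, 7))   # prints 35 = |C| * |B|
```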
The implication of the above lemma is the following result.

Lemma 7: For every subset F ⊆ F_q^n, there exist m = q^n |F|^{−1} min{n ln q, 1 + ln|F|} vectors x_0 = 0, x_1, x_2, ..., x_{m−1} ∈ F_q^n such that

    ∪_{i=0}^{m−1} (F + x_i) = F_q^n.

Proof: From Lemma 6, for every subset F ⊆ F_q^n, there exists x ∈ F_q^n such that

    Q(F ∪ (F + x)) ≤ Q(F)^2.

For the set F ≡ F_0, recursively define, for i = 1, 2, ...,

    F_i = F_{i−1} ∪ (F_{i−1} + z_{i−1}),

where z_i ∈ F_q^n is such that

    Q(F_i ∪ (F_i + z_i)) ≤ Q(F_i)^2,   i = 0, 1, ....

Clearly,

    Q(F_t) ≤ Q(F_0)^{2^t} = (1 − q^{−n}|F|)^{2^t} ≤ e^{−q^{−n}|F| 2^t}.

At this point we can just use the argument at the end of the proof of Lemma 3, with 2^t playing the role of m'. On the other hand, F_t contains F_0 and its 2^t − 1 translations (see Fig. 3 for an illustration). Hence, there exist

    m = min{ q^n n ln q / |F|,  q^n (1 + ln|F|) / |F| }

vectors x_0 = 0, x_1, x_2, ..., x_{m−1} ∈ F_q^n such that ∪_{i=0}^{m−1}(F + x_i) = F_q^n.

[Fig. 3. The recursive construction of the sets F_1, F_2, F_3 of Lemma 7.]

To complete the proof of Lemma 3, as before, we just show the existence of RDSS codes C_0 ≡ C, C_1, ..., C_{m−1} of dimension k, each with property (6). This is achieved by choosing m − 1 vectors x_j, j = 1,...,m−1, such that

    C_i = C + x_i ≡ {y + x_i : y ∈ C}.                                       (11)

From Lemma 4, C_i, i = 1,...,m−1, are all RDSS codes of dimension k. Moreover, from Lemma 7, we already know the existence of x_j, j = 1,...,m−1, such that property (6) is satisfied. However, from Lemma 7 it is also clear that these m vectors form a linear subspace and can be generated by only log_q m vectors.

Corollary 8: For every subset F ⊆ F_q^n, there exists a linear subspace D ⊆ F_q^n such that |D| = q^n |F|^{−1} n ln q and

    ∪_{x ∈ D} (F + x) = F_q^n.

The above result is helpful in the decoding process of the index code. If C is an RDSS code and D is the linear subspace such that ∪_{x ∈ D}(C + x) = F_q^n, then the decoding of the obtained index code can be performed from x ∈ F_q^{log_q |D|} by first multiplying x with the generator matrix of D and then shifting C by it. Hence, if there is a polynomial-time decoding algorithm for C then there will be one for the index code. It would not be so for the case of random choice, where we must maintain a look-up table of size exponential in n.

Remark 1: It is perhaps not so surprising that the method of [1], that is, the random choice (or, in fact, the method of [13]), gives the exact same result as the greedy algorithm method.
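The following is a minimal sketch of the greedy covering behind Lemma 7 and Corollary 8, assuming q = 2 (the helper names are ours): at every step a shift z is chosen that at least squares the uncovered fraction Q, the current set is replaced by the union with its translate, and the subset sums of the chosen shifts, which form a linear subspace of F_2^n, give the covering translations.

```python
from itertools import product

def greedy_cover(F, n):
    """Greedy doubling of Lemma 7 over F_2: returns shifts z_1, ..., z_t such
    that the subset sums of the z_i (a subspace of F_2^n) translate F onto
    all of {0,1}^n."""
    space = [tuple(v) for v in product((0, 1), repeat=n)]
    shift = lambda S, z: {tuple(a ^ b for a, b in zip(s, z)) for s in S}
    Ft, zs = set(F), []
    while len(Ft) < len(space):
        # Lemma 6: averaged over z, Q(F_t ∪ (F_t + z)) equals Q(F_t)^2, so the
        # best z does at least as well; pick it greedily.
        z = max(space, key=lambda z: len(Ft | shift(Ft, z)))
        zs.append(z)
        Ft |= shift(Ft, z)
    return zs

# The pentagon RDSS code again: the subspace generated by the returned shifts
# covers F_2^5 by translations of C, as in Corollary 8.
C = [(0,0,0,0,0), (0,1,1,0,0), (0,0,0,1,1), (1,1,0,1,1), (1,1,1,0,1)]
zs = greedy_cover(C, 5)
print(len(zs), "generators, i.e., at most", 2 ** len(zs), "translations")
```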
IV. EXTENSION TO VECTOR CODES AND THE CAPACITY OF LINEAR CODES

The literature on distributed storage often considers vector linear codes, and the same is true for [19]. However, in the context of general nonlinear codes, vector codes do not bring any further technical novelty and can just be thought of as codes over a larger alphabet.

For vector index codes, as earlier, a side information graph G(V,E) is given. Each vertex v ∈ V represents a receiver that is interested in knowing a uniform random vector Y_v ∈ F_q^p. The receiver at v knows the values of the variables Y_u, u ∈ N(v). A vector index code C for F_q^{np} with side information graph G(V,E), V = {1,2,...,n}, is a set of codewords in F_q^{ℓp} (ℓ is the length of the code) together with:
  1) an encoding function f mapping inputs in F_q^{np} to codewords, and
  2) a set of deterministic decoding functions g_1,...,g_n such that g_i( f(Y_1,...,Y_n), {Y_j : j ∈ N(i)} ) = Y_i for every i = 1,...,n.

Given a graph G, the minimum possible value of ℓ is denoted by INDEX_q^p(G) (also called the broadcast capacity). When the functions f, g_i are linear, for all 1 ≤ i ≤ n, in all of their arguments in F_q, the code is called vector linear.

A similar generalization is possible for the definition of RDSS codes. A vector RDSS code C ⊆ (F_q^p)^n with storage recovery graph G(V,E), V = {1,2,...,n}, is a set of vectors in F_q^{np} together with a set of deterministic recovery functions f_i : F_q^{|N(i)|p} → F_q^p for i = 1,...,n such that for any codeword (X_1, X_2, ..., X_n), X_i ∈ F_q^p,

    X_i = f_i({X_j : j ∈ N(i)}),   i = 1,...,n.                              (12)

The normalized log-size of the code, (1/p) log_q |C|, is called the dimension of C. Given a graph G, the maximum possible dimension of a vector RDSS code is denoted by RDSS_q^p(G). When the recovery functions f_i, 1 ≤ i ≤ n, are linear in all their arguments (in F_q), the code is called vector linear.

General (nonlinear) vector index or RDSS codes can also be thought of as scalar codes over the alphabet of size q^p. Hence,

    n − RDSS_q^p(G) ≤ INDEX_q^p(G) ≤ n − RDSS_q^p(G) + log_q(pn ln q) / p.

As a consequence, even for a constant q, if p = Ω(log n), we have that INDEX_q^p(G) and n − RDSS_q^p(G) differ by at most 1 for any graph G, and for larger p this difference goes to zero.

Although general vector codes do not lead to a different analysis, we next show that vector linear codes can achieve a dimension sufficiently close to RDSS_q^p(G) for any graph G(V,E). This should be put into contrast with results, such as [4, Thm. 1.2], which show that a rather large gap must exist between vector linear and nonlinear index coding (or network coding) rates.

Proposition 9: There exists a polynomial-time (in n) constructible vector linear RDSS code with dimension at least

    RDSS_q^p(G) / (β log n · log log n)

for a large enough integer p and a constant β < 5.

Proof: In [19], it was shown that the linear-algebraic dual of a vector linear index code is a vector linear RDSS code (see Section II of this paper for scalar codes). This implies that, for a vector linear index code of length ℓ, the dual code is a vector linear RDSS code of dimension n − ℓ. In [6], a vector linear index code of length ℓ was constructed in polynomial time, such that n − ℓ ≥ (n − INDEX_q^p(G)) / (α log n · log log n) (this result of [6] was also used in [19]), where α is a constant (see [18], the building block of [6], for the value of the constant). The dual code of this code must be a vector RDSS code of dimension k = n − ℓ. From the above discussion, it is evident that

    n − INDEX_q^p(G) ≥ RDSS_q^p(G) − log_q(pn ln q) / p.

Hence,

    k ≥ ( RDSS_q^p(G) − log_q(pn ln q) / p ) / (α log n · log log n).

Hence, if p is large enough, the statement of the proposition is proved.

Remark 2: How large does p need to be for the above proposition to hold? It is clear that p = Ω(log n) is enough to diminish the additive error term of log_q(pn ln q)/p. However, for the algorithm of [6] to work, p needs to be as large as the denominator of a linear programming solution (see [16]) that is used crucially in [6]. Hence p, depending on the number of cycles in the graph, may be required to be exponential in n.

Acknowledgement: The author thanks A. Agarwal, A. G. Dimakis and K. Shanmugam for useful references. This work was supported in part by a grant from University of Minnesota.

REFERENCES

[1] N. Alon, E. Lubetzky, U. Stav, A. Weinstein, and A. Hassidim. Broadcasting with side information. In Foundations of Computer Science, 2008. FOCS'08. IEEE 49th Annual IEEE Symposium on, pages 823-832. IEEE, 2008.
[2] L. Babai. Automorphism groups, isomorphism, and reconstruction, chapter 27 of Handbook of Combinatorics. North-Holland/Elsevier, pages 1447-1540, 1995.
[3] Z. Bar-Yossef, Y. Birk, T. Jayram, and T. Kol. Index coding with side information. In Foundations of Computer Science, 2006. FOCS'06. 47th Annual IEEE Symposium on, pages 197-206. IEEE, 2006.
[4] A. Blasiak, R. Kleinberg, and E. Lubetzky. Lexicographic products and the power of non-linear network coding. In Foundations of Computer Science (FOCS), 2011 IEEE 52nd Annual Symposium on, pages 609-618. IEEE, 2011.
[5] V. Cadambe and A. Mazumdar. An upper bound on the size of locally recoverable codes. In Proc. IEEE Int. Symp. Network Coding, June 2013.
[6] M. A. R. Chaudhry, Z. Asad, A. Sprintson, and M. Langberg. On the complementary index coding problem. In Information Theory Proceedings (ISIT), 2011 IEEE International Symposium on, pages 244-248. IEEE, 2011.
[7] G. Cohen. A nonconstructive upper bound on covering radius. Information Theory, IEEE Transactions on, 29(3):352-353, 1983.
[8] P. Delsarte and P. Piret. Do most binary linear codes achieve the Goblick bound on the covering radius? (Corresp.). Information Theory, IEEE Transactions on, 32(6):826-828, 1986.
[9] A. G. Dimakis, P. B. Godfrey, Y. Wu, M. J. Wainwright, and K. Ramchandran. Network coding for distributed storage systems. IEEE Trans. Inform. Theory, 56(9):4539-4551, Sep. 2010.
[10] T. J. Goblick. Coding for a discrete information source with a distortion measure. PhD thesis, Massachusetts Institute of Technology, 1963.
[11] P. Gopalan, C. Huang, H. Simitci, and S. Yekhanin. On the locality of codeword symbols. IEEE Trans. Inform. Theory, 58(11):6925-6934, Nov. 2012.
[12] G. M. Kamath, N. Prakash, V. Lalitha, and P. V. Kumar. Codes with local regeneration. arXiv preprint arXiv:1211.1932, 2012.
[13] L. Lovász. On the ratio of optimal integral and fractional covers. Discrete Mathematics, 13(4):383-390, 1975.
[14] E. Lubetzky and U. Stav. Nonlinear index coding outperforming the linear optimum. Information Theory, IEEE Transactions on, 55(8):3544-3551, 2009.
[15] A. Mazumdar, R. M. Roth, and P. O. Vontobel. On linear balancing sets. Advances in Mathematics of Communications (AMC), 4(3):345-361, 2010.
[16] Z. Nutov and R. Yuster. Packing directed cycles efficiently. In Mathematical Foundations of Computer Science 2004, pages 310-321. Springer, 2004.
[17] D. S. Papailiopoulos and A. G. Dimakis. Locally repairable codes. In Proc. Int. Symp. Inform. Theory, pages 2771-2775, Cambridge, MA, July 2012.
[18] P. D. Seymour. Packing directed circuits fractionally. Combinatorica, 15(2):281-288, 1995.
[19] K. Shanmugam and A. G. Dimakis. Connections between index coding, locally repairable codes and the multiple unicast problem. Personal communication, 2014.
[20] N. Silberstein, A. S. Rawat, O. O. Koyluoglu, and S. Vishwanath. Optimal locally repairable codes via rank-metric codes. Preprint, arXiv:1301.6331, 2013.
[21] I. Tamo and A. Barg. A family of optimal locally recoverable codes. arXiv preprint arXiv:1311.3284, 2013.
[22] I. Tamo, D. S. Papailiopoulos, and A. G. Dimakis. Optimal locally repairable codes and connections to matroid theory. Preprint, arXiv:1301.7693, 2013.