Update-Efficient Regenerating Codes with Minimum Per-Node Storage

Yunghsiang S. Han*, Hong-Ta Pai†, Rong Zheng‡ and Pramod K. Varshney§
*Dept. of Electrical Engineering, National Taiwan University of Science and Technology, Taipei, Taiwan. Email: [email protected]
†Dept. of Communication Engineering, National Taipei University, Taipei, Taiwan
‡Dept. of Computing and Software, McMaster University, Hamilton, ON, Canada
§Dept. of EECS, Syracuse University, Syracuse, USA

arXiv:1301.2497v1 [cs.IT] 11 Jan 2013

Abstract: Regenerating codes provide an efficient way to recover data at failed nodes in distributed storage systems. It has been shown that regenerating codes can be designed to minimize the per-node storage (called MSR) or minimize the communication overhead for regeneration (called MBR). In this work, we propose a new encoding scheme for [n,d] error-correcting MSR codes that generalizes our earlier work on error-correcting regenerating codes. We show that by choosing a suitable diagonal matrix, any generator matrix of the [n,α] Reed-Solomon (RS) code can be integrated into the encoding matrix. Hence, MSR codes with the least update complexity can be found. An efficient decoding scheme is also proposed that utilizes the [n,α] RS code to perform data reconstruction. The proposed decoding scheme has better error correction capability and incurs the least number of node accesses when errors are present.

I. INTRODUCTION

Cloud storage is gaining popularity as an alternative to enterprise storage, where data is stored in virtualized pools of storage typically hosted by third-party data centers. Reliability is a key challenge in the design of distributed storage systems that provide cloud storage. Both crash-stop and Byzantine failures (as a result of software bugs and malicious attacks) are likely to be present during data retrieval. A crash-stop failure makes a storage node unresponsive to access requests.
In contrast, a Byzantine failure responds to access requests with erroneous data. To achieve better reliability, one common approach is to replicate data files on multiple storage nodes in a network. Erasure coding is employed to encode the original data, and the encoded data is then distributed to storage nodes. Typically, more than one storage node needs to be accessed to recover the original data. One popular class of erasure codes is the maximum-distance-separable (MDS) codes. With [n,k] MDS codes such as Reed-Solomon (RS) codes, k data items are encoded and then distributed to and stored at n storage nodes. A user or a data collector can retrieve the original data by accessing any k of the storage nodes, a process referred to as data reconstruction.

Any storage node can fail due to hardware or software damage. Data stored at the failed nodes needs to be recovered (regenerated) so that the system remains able to perform data reconstruction. The process of recovering the stored (encoded) data at a storage node is called data regeneration. Regenerating codes, first introduced in the pioneering works by Dimakis et al. [1], [2], allow efficient data regeneration. To facilitate data regeneration, each storage node stores α symbols, and a total of d surviving nodes are accessed to retrieve β ≤ α symbols from each node. A trade-off exists between the storage overhead and the regeneration (repair) bandwidth needed for data regeneration. Minimum Storage Regenerating (MSR) codes first minimize the amount of data stored per node and then the repair bandwidth, while Minimum Bandwidth Regenerating (MBR) codes carry out the minimization in the reverse order. There have been many works that focus on the design of regenerating codes [3]-[10]. Recently, Rashmi et al. proposed optimal exact-regenerating codes that recover the stored data at the failed node exactly (and thus the name exact-regenerating) [10]; however, the authors only consider crash-stop failures of storage nodes. Han et al. extended Rashmi's work to construct error-correcting regenerating codes for exact regeneration that can handle Byzantine failures [11]. In [11], the encoding and decoding algorithms for both MSR and MBR error-correcting codes were also provided. In [12], the code capability and resilience were discussed for error-correcting regenerating codes.

In addition to bandwidth efficiency and error correction capability, another desirable feature for regenerating codes is update complexity [13], defined as the maximum number of encoded symbols that must be updated when a single data symbol is modified. Low update complexity is desirable in scenarios where updates are frequent. Clearly, the update complexity of a regenerating code is determined by the number of non-zero elements in the row of the encoding matrix with the maximum Hamming weight. The smaller this number, the lower the update complexity.
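To make this notion concrete, the following minimal Python sketch (an editorial illustration, not part of the paper; the matrix below is arbitrary) computes the update complexity of an encoding matrix as the maximum row Hamming weight.

```python
def update_complexity(encoding_matrix):
    """Update complexity = largest number of non-zero entries in any row.

    Each data symbol multiplies one row of the encoding matrix, so modifying
    that symbol forces an update of every encoded symbol whose column has a
    non-zero entry in that row.
    """
    return max(sum(1 for x in row if x != 0) for row in encoding_matrix)

# Arbitrary toy example: the dense last row gives update complexity 5,
# while a sparser (e.g., systematic) matrix would give a smaller value.
G_example = [
    [1, 0, 0, 1, 1],
    [0, 1, 0, 1, 2],
    [0, 0, 1, 3, 1],
    [1, 2, 3, 4, 5],
]
print(update_complexity(G_example))  # -> 5
```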
One drawback of the decoding algorithms for MSR codes given in [11] is that, when one or more storage nodes have erroneous data, the decoder needs to access extra data from many storage nodes (at least k more nodes) for data reconstruction. Furthermore, when one symbol in the original data is updated, all storage nodes need to update their respective data. Thus, the MSR and MBR codes in [11] have the maximum possible update complexity. Both deficiencies are addressed in this paper. First, we propose a general encoding scheme for MSR codes. As a special case, least-update-complexity codes are designed. Second, a new decoding algorithm is presented. It not only provides better error correction capability but also incurs low communication overhead when errors occur in the accessed data.

II. ERROR-CORRECTING MSR REGENERATING CODES

In this section, we give a brief overview of data regenerating codes and the MSR code construction presented in [11].

A. Regenerating Codes

Let α be the number of symbols stored at each storage node and β ≤ α the number of symbols downloaded from each node during regeneration. To repair the stored data at the failed node, a helper node accesses d surviving nodes. The design of regenerating codes ensures that the total regenerating bandwidth is much less than the size of the original data, B.

A regenerating code must be capable of reconstructing the original data symbols and regenerating coded data at a failed node. An [n,k,d] regenerating code requires at least k and d surviving nodes to ensure successful data reconstruction and regeneration [10], respectively, where n is the number of storage nodes and k ≤ d ≤ n−1.

The cut-set bound given in [2], [3] provides a constraint on the repair bandwidth. By this bound, any regenerating code must satisfy

  B ≤ Σ_{i=0}^{k−1} min{α, (d−i)β} .   (1)

From (1), α or β can be minimized, achieving either the minimum storage requirement or the minimum repair bandwidth requirement, but not both. The two extreme points in (1) are referred to as the minimum storage regeneration (MSR) and minimum bandwidth regeneration (MBR) points, respectively. The values of α and β for the MSR point can be obtained by first minimizing α and then minimizing β:

  α = d − k + 1,   B = k(d − k + 1) = kα ,   (2)

where we normalize β to 1. (It has been proved that, when designing [n,k,d] MSR codes for k/(n+1) ≤ 1/2, it suffices to consider those with β = 1 [10].)

There are two categories of approaches to regenerate data at a failed node. If the replacement data is exactly the same as that previously stored at the failed node, we call it exact regeneration. Otherwise, if the replacement data only guarantees the correctness of the data reconstruction and regeneration properties, it is called functional regeneration. In practice, exact regeneration is more desirable since there is no need to inform each node in the network regarding the replacement. Furthermore, it is easy to keep the codes systematic via exact regeneration, so that partial data can be retrieved without accessing all k nodes. The codes designed in [10], [11] allow exact regeneration.

B. MSR Regenerating Codes With Error Correction Capability

Next, we describe the MSR code construction given in [11]. In the rest of the paper, we assume d = 2α. The information sequence m = [m_0, m_1, ..., m_{B−1}] can be arranged into an information vector U = [Z_1 Z_2] of size α × d such that Z_1 and Z_2 are symmetric matrices of dimension α × α. An [n, d = 2α] RS code is adopted to construct the MSR code [11]. Let a be a generator of GF(2^m). In the encoding of the MSR code, we have

  U · G = C ,   (3)

where

  G = [ 1                    1                    ···  1
        a^0                  a^1                  ···  a^{n−1}
        (a^0)^2              (a^1)^2              ···  (a^{n−1})^2
        ⋮
        (a^0)^{α−1}          (a^1)^{α−1}          ···  (a^{n−1})^{α−1}
        (a^0)^α · 1          (a^1)^α · 1          ···  (a^{n−1})^α · 1
        (a^0)^α · a^0        (a^1)^α · a^1        ···  (a^{n−1})^α · a^{n−1}
        (a^0)^α · (a^0)^2    (a^1)^α · (a^1)^2    ···  (a^{n−1})^α · (a^{n−1})^2
        ⋮
        (a^0)^α · (a^0)^{α−1}  (a^1)^α · (a^1)^{α−1}  ···  (a^{n−1})^α · (a^{n−1})^{α−1} ]
    = [ G¯ ; G¯∆ ] ,   (4)

where the semicolon denotes vertical stacking, and C is the codeword vector with dimension (α × n). G¯ contains the first α rows of G, and ∆ is a diagonal matrix with (a^0)^α, (a^1)^α, (a^2)^α, ..., (a^{n−1})^α as its diagonal elements. Note that if the RS code is over GF(2^m) for m ≥ ⌈log_2 nα⌉, then it can be shown that (a^0)^α, (a^1)^α, (a^2)^α, ..., (a^{n−1})^α are all distinct. After encoding, the i-th column of C is distributed to storage node i for 1 ≤ i ≤ n.
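For illustration, the following self-contained Python sketch (not part of the original paper) builds G as in (4) and carries out the encoding in (3). The field GF(2^4) with primitive polynomial x^4 + x + 1, the parameters n = 6, k = 3, d = 4, α = 2, the helper functions gf_mul/gf_pow, and the message matrices Z1, Z2 are all illustrative choices satisfying d = 2α and m ≥ ⌈log_2 nα⌉, not values fixed by the paper.

```python
M, POLY = 4, 0b10011      # GF(2^4) with primitive polynomial x^4 + x + 1
a = 2                     # the element x, a generator of GF(2^4)*
n, k, alpha = 6, 3, 2     # d = 2*alpha = 4, and alpha = d - k + 1

def gf_mul(x, y):
    """Carry-less multiplication modulo POLY, i.e., multiplication in GF(2^M)."""
    r = 0
    while y:
        if y & 1:
            r ^= x
        y >>= 1
        x <<= 1
        if x & (1 << M):
            x ^= POLY
    return r

def gf_pow(x, e):
    r = 1
    for _ in range(e):
        r = gf_mul(r, x)
    return r

# G_bar: alpha x n matrix with entry (i, j) = (a^j)^i, the first alpha rows of (4).
G_bar = [[gf_pow(gf_pow(a, j), i) for j in range(n)] for i in range(alpha)]

# Delta: diagonal entries (a^j)^alpha for j = 0..n-1; distinct for these parameters.
delta = [gf_pow(gf_pow(a, j), alpha) for j in range(n)]
assert len(set(delta)) == n

# G = [G_bar; G_bar * Delta], a (2*alpha) x n matrix as in (4).
G = G_bar + [[gf_mul(G_bar[i][j], delta[j]) for j in range(n)] for i in range(alpha)]

# Encode C = U . G with U = [Z1 Z2]; Z1, Z2 symmetric and holding B = k*alpha = 6 symbols.
Z1 = [[1, 2], [2, 3]]
Z2 = [[4, 5], [5, 6]]
U = [Z1[i] + Z2[i] for i in range(alpha)]          # alpha x d
C = [[0] * n for _ in range(alpha)]
for i in range(alpha):
    for j in range(n):
        s = 0
        for t in range(2 * alpha):
            s ^= gf_mul(U[i][t], G[t][j])          # addition in GF(2^m) is XOR
        C[i][j] = s
# Column j of C is the data stored at storage node j+1.
```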
III. ENCODING SCHEMES FOR ERROR-CORRECTING MSR CODES

RS codes are known to have very efficient decoding algorithms and exhibit good error correction capability. From (4) in Section II-B, a generator matrix G for MSR codes needs to satisfy:
1) G = [ G¯ ; G¯∆ ], where G¯ contains the first α rows of G and ∆ is a diagonal matrix with distinct elements on its diagonal.
2) G¯ is a generator matrix of the [n,α] RS code and G is a generator matrix of the [n, d = 2α] RS code.

Next, we present a sufficient condition on G¯ and ∆ such that G is a generator matrix of an [n,d] RS code.

Theorem 1: Let G¯ be a generator matrix of the [n,α] RS code C_α that is generated by the generator polynomial with roots a^1, a^2, ..., a^{n−α}. Let the diagonal elements of ∆ be (a^0)^α, (a^1)^α, ..., (a^{n−1})^α, where m ≥ ⌈log_2 n⌉ and gcd(2^m − 1, α) = 1. Then G is a generator matrix of the [n,d] RS code C_d that is generated by the generator polynomial with roots a^1, a^2, ..., a^{n−d}.

Proof: We need to show that each row of G¯∆ is a codeword of C_d and that all rows of G are linearly independent. Let c = (c_0, c_1, ..., c_{n−1}) be any row of G¯. Then the polynomial representation of c∆ is

  Σ_{i=0}^{n−1} c_i (a^i)^α x^i = Σ_{i=0}^{n−1} c_i (a^α x)^i .   (5)

Since c ∈ C_α, c(x) has roots a^1, a^2, ..., a^{n−α}. It is then easy to see that (5) has roots a^{−α+1}, a^{−α+2}, ..., a^{n−2α}, which clearly contain a^1, a^2, ..., a^{n−2α}. Hence, c∆ ∈ C_d.

In order to show that all rows of G are linearly independent, it is sufficient to show that c∆ ∉ C_α for all nonzero c ∈ C_α. Assume that c∆ ∈ C_α. Then Σ_{i=0}^{n−1} c_i (a^α x)^i must have roots a^1, a^2, ..., a^{n−α}. It follows that c(x) must have a^{α+1}, a^{α+2}, ..., a^n as roots. Recall that c(x) also has roots a^1, a^2, ..., a^{n−α}. Since n−1 ≥ d = 2α, we have n−α ≥ α+1. Hence, c(x) has the n distinct roots a^1, a^2, ..., a^n. This is impossible since the degree of c(x) is at most n−1. Thus, c∆ ∉ C_α.

One advantage of the proposed scheme is that it can operate on a smaller finite field than that of the scheme in [11]. Another advantage is that one can choose G¯ (and ∆ accordingly) freely, as long as it is a generator matrix of an [n,α] RS code. In particular, as discussed in Section I, to minimize update complexity it is desirable to choose a generator matrix in which the row with the maximum Hamming weight has the least number of nonzero elements. Next, we present a least-update-complexity generator matrix that satisfies (4).

Corollary 1: Let ∆ be the one given in Theorem 1. Let G¯ be the generator matrix of a systematic [n,α] RS code, namely G¯ = [D | I], where

  D = [ b_00        b_01        b_02        ···  b_0(n−α−1)
        b_10        b_11        b_12        ···  b_1(n−α−1)
        b_20        b_21        b_22        ···  b_2(n−α−1)
        ⋮
        b_(α−1)0    b_(α−1)1    b_(α−1)2    ···  b_(α−1)(n−α−1) ] ,   (6)

I is the (α×α) identity matrix, and x^{n−α+i} = u_i(x) g(x) + b_i(x) for 0 ≤ i ≤ α−1. Then G = [ G¯ ; G¯∆ ] is a least-update-complexity generator matrix.

Proof: The result holds since each row of G¯ is a nonzero codeword with the minimum Hamming weight n−α+1.

IV. EFFICIENT DECODING SCHEME FOR ERROR-CORRECTING MSR CODES

Unlike the decoding scheme in [11], which uses the [n,d] RS code, we propose to use a subcode of the [n,d] RS code, the [n, α = k−1] RS code generated by G¯, to perform the data reconstruction. The advantage of using the [n,k−1] RS code is two-fold. First, its error correction capability is higher (namely, it can tolerate ⌊(n−k+1)/2⌋ instead of ⌊(n−d)/2⌋ errors). Second, it only requires the access of two additional storage nodes (as opposed to d−k+2 = k nodes) to correct the first error.

Without loss of generality, we assume that the data collector retrieves encoded symbols from k+2v (v ≥ 0) storage nodes, j_0, j_1, ..., j_{k+2v−1}. We also assume that there are v storage nodes whose received symbols are erroneous. The stored information of the k+2v storage nodes is collected as the k+2v columns of Y_{α×(k+2v)}. The k+2v columns of G corresponding to storage nodes j_0, j_1, ..., j_{k+2v−1} are denoted as the columns of G_{k+2v}. First, we discuss data reconstruction when v = 0. The decoding procedure is similar to that in [10].

No Error: In this case, v = 0 and there is no error in Y. Then,

  Y = U G_k = [Z_1 Z_2] [ G¯_k ; G¯_k∆ ] = Z_1 G¯_k + Z_2 G¯_k∆ .   (7)

Multiplying G¯_k^T and Y in (7), we have [10]

  G¯_k^T Y = G¯_k^T U G_k = G¯_k^T Z_1 G¯_k + G¯_k^T Z_2 G¯_k∆ = P + Q∆ .   (8)

Since Z_1 and Z_2 are symmetric, P and Q are symmetric as well. The (i,j)-th element of P + Q∆, for 1 ≤ i,j ≤ k and i ≠ j, is

  p_ij + q_ij a^{(j−1)α} ,   (9)

and the (j,i)-th element is

  p_ji + q_ji a^{(i−1)α} .   (10)

Since a^{(j−1)α} ≠ a^{(i−1)α} for all i ≠ j, p_ij = p_ji, and q_ij = q_ji, the values of p_ij and q_ij can be obtained by combining (9) and (10). Note that we only obtain k−1 values for each row of P and Q, since no elements on the diagonal of P or Q are obtained.
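A minimal Python sketch of the elimination step implied by (9) and (10) is given below. It is an illustration rather than the authors' implementation: the field GF(2^4), the helper functions gf_mul/gf_inv, and the sample values are assumptions. In characteristic 2, addition and subtraction are XOR, which makes the 2x2 solve a one-liner.

```python
M, POLY = 4, 0b10011   # toy field GF(2^4), primitive polynomial x^4 + x + 1

def gf_mul(x, y):
    """Multiplication in GF(2^M) by carry-less multiply and reduction."""
    r = 0
    while y:
        if y & 1:
            r ^= x
        y >>= 1
        x <<= 1
        if x & (1 << M):
            x ^= POLY
    return r

def gf_inv(x):
    """x^(2^M - 2) = x^(-1) for nonzero x in GF(2^M)."""
    r = 1
    for _ in range((1 << M) - 2):
        r = gf_mul(r, x)
    return r

def solve_pq(y_ij, y_ji, delta_j, delta_i):
    """Recover (p_ij, q_ij) from the (i,j) and (j,i) entries of P + Q*Delta.

    Using p_ij = p_ji and q_ij = q_ji, XOR-ing the two entries cancels p and
    leaves q_ij * (delta_j ^ delta_i), which is invertible since the diagonal
    elements of Delta are distinct.
    """
    q = gf_mul(y_ij ^ y_ji, gf_inv(delta_j ^ delta_i))
    p = y_ij ^ gf_mul(q, delta_j)
    return p, q

# Round-trip check with arbitrary symbols.
p, q, d_i, d_j = 9, 5, 3, 12
y_ij = p ^ gf_mul(q, d_j)
y_ji = p ^ gf_mul(q, d_i)
assert solve_pq(y_ij, y_ji, d_j, d_i) == (p, q)
```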
To decode P, recall that P = G¯_k^T Z_1 G¯_k. P can be treated as a portion of the codeword matrix G¯_k^T Z_1 G¯. By the construction of G¯, it is easy to see that G¯ is a generator matrix of the [n, k−1] RS code. Hence, each row of the matrix G¯_k^T Z_1 G¯ is a codeword. Since we know k−1 components in each row of P, it is possible to decode G¯_k^T Z_1 G¯ by the error-and-erasure decoder of the [n, k−1] RS code. (The error-and-erasure decoder of an [n, k−1] RS code can successfully decode a received vector if s + 2v < n − k + 2, where s is the number of erasure (no symbol) positions, v is the number of errors in the received portion of the vector, and n − k + 2 is the minimum Hamming distance of the [n, k−1] RS code.)

Since one cannot locate any erroneous position from the decoded rows of P, the decoded α codewords are accepted as G¯_k^T Z_1 G¯. By collecting the last α columns of G¯ as G¯_α and finding its inverse (here it is an identity matrix), one can recover G¯_k^T Z_1 from G¯_k^T Z_1 G¯_α. Note that α = k−1. Since any α rows of G¯_k^T are independent and thus form an invertible submatrix, we can pick any α of them to recover Z_1. Z_2 can be obtained similarly from Q.

Multiple Errors: Before presenting the proposed decoding algorithm, we first prove that a decoding procedure can always successfully decode Z_1 and Z_2 if v ≤ ⌊(n−k+1)/2⌋ and all storage nodes are accessed. Due to space limitations, all proofs are omitted in this section.

Assume the storage nodes with errors correspond to the ℓ_0-th, ℓ_1-th, ..., ℓ_{v−1}-th columns of the received matrix Y_{α×n}. Then,

  G¯^T Y_{α×n} = G¯^T U G + G¯^T E = G¯^T [Z_1 Z_2] [ G¯ ; G¯∆ ] + G¯^T E = [G¯^T Z_1 G¯ + G¯^T Z_2 G¯∆] + G¯^T E ,   (11)

where

  E = [ 0_{α×(ℓ_0−1)} | e_{ℓ_0}^T | 0_{α×(ℓ_1−ℓ_0−1)} | ··· | e_{ℓ_{v−1}}^T | 0_{α×(n−ℓ_{v−1})} ] .

Lemma 1: There are at least n−k+2 errors in each of the ℓ_0-th, ℓ_1-th, ..., ℓ_{v−1}-th columns of G¯^T Y_{α×n}.

We next have the main theorem for performing data reconstruction.

Theorem 2: Let G¯^T Y_{α×n} = P˜ + Q˜∆. Furthermore, let Pˆ be the portion of the decoded codeword vector corresponding to P˜, and let E_P = Pˆ ⊕ P˜ be the error pattern vector. Assume that the data collector accesses all storage nodes and that there are v, 1 ≤ v ≤ ⌊(n−k+1)/2⌋, of them with errors. Then, there are at least n−k+2−v nonzero elements in the ℓ_j-th column of E_P, 0 ≤ j ≤ v−1, and at most v nonzero elements in each of the remaining columns of E_P.

The above theorem allows us to design a decoding algorithm that can correct up to ⌊(n−k+1)/2⌋ errors. (In constructing P˜ we only obtain n−1 values per row, excluding the diagonal. Since the minimum Hamming distance of an [n, k−1] RS code is n−k+2, the error-and-erasure decoding can only correct up to ⌊(n−1−k+2)/2⌋ errors.) In particular, we need to examine the erroneous positions in G¯^T E. Since 1 ≤ v ≤ ⌊(n−k+1)/2⌋, we have n−k+2−v ≥ ⌊(n−k+1)/2⌋+1 > v. Thus, the way to locate all erroneous columns of P˜ is to find all columns of E_P in which the number of nonzero elements is greater than or equal to ⌊(n−k+1)/2⌋+1. After we locate all erroneous columns, we can follow a procedure similar to that given in the no-error case to recover Z_1 from Pˆ.
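The column-classification rule above is simple to state in code. The sketch below is an illustration (not the authors' implementation) and glosses over the erased diagonal entries of P˜; matrices are assumed to be lists of rows with field elements encoded as integers.

```python
def locate_erroneous_columns(P_tilde, P_hat, n, k):
    """Flag the columns of P_tilde that must come from erroneous storage nodes.

    Per Theorem 2, when all n nodes are accessed and at most floor((n-k+1)/2)
    of them are erroneous, a column from an erroneous node differs from the
    decoded codewords in at least floor((n-k+1)/2) + 1 positions, while any
    other column differs in at most floor((n-k+1)/2) positions.
    """
    threshold = (n - k + 1) // 2 + 1
    rows, cols = len(P_tilde), len(P_tilde[0])
    flagged = []
    for j in range(cols):
        mismatches = sum(1 for i in range(rows) if P_tilde[i][j] ^ P_hat[i][j])
        if mismatches >= threshold:
            flagged.append(j)
    return flagged
```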
The above decoding procedure guarantees the recovery of Z_1 when all n storage nodes are accessed. However, it is not very efficient in terms of bandwidth usage. Next, we present a progressive decoding version of the proposed algorithm that accesses extra nodes only when necessary. Before presenting it, we need the following corollary.

Corollary 2: Consider that one accesses k+2v storage nodes, among which v nodes are erroneous and 1 ≤ v ≤ ⌊(n−k+1)/2⌋. There are at least v+2 nonzero elements in the ℓ_j-th column of E_P, 0 ≤ j ≤ v−1, and at most v in each of the remaining columns of E_P.

Based on Corollary 2, we can design a progressive decoding algorithm [14] that retrieves extra data from the remaining storage nodes when necessary. To handle Byzantine fault tolerance, it is necessary to perform an integrity check after the original data is reconstructed. Two verification mechanisms have been suggested in [11]: cyclic redundancy check (CRC) and cryptographic hash function. Both mechanisms introduce redundancy to the original data before it is encoded and are suitable to be used in combination with the decoding algorithm.
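The paper leaves the concrete choice of integrity check open. As one possibility (an illustration, not the authors' construction), the following sketch appends a SHA-256 digest to the data before encoding and verifies it after reconstruction.

```python
import hashlib

def add_integrity_tag(data: bytes) -> bytes:
    """Append a SHA-256 digest so the reconstructed data can be verified."""
    return data + hashlib.sha256(data).digest()

def integrity_check(tagged: bytes) -> bool:
    """Return True if the trailing 32-byte digest matches the payload."""
    payload, tag = tagged[:-32], tagged[-32:]
    return hashlib.sha256(payload).digest() == tag

tagged = add_integrity_tag(b"original data symbols")
assert integrity_check(tagged)
assert not integrity_check(b"corrupted" + tagged[9:])   # tampered payload fails
```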
The progressive decoding algorithm starts by accessing k storage nodes. Error-and-erasure decoding succeeds only when there is no error. If the integrity check passes, then the data collector recovers the original data. If the decoding procedure fails or the integrity check fails, then the data collector retrieves two more blocks of data from the remaining storage nodes. Since the data collector now has k+2 blocks of data, the error-and-erasure decoding can correctly recover the original data if there is at most one erroneous storage node among the k+2 nodes accessed. If the integrity check passes, then the data collector recovers the original data. If the decoding procedure fails or the integrity check fails, then the data collector again retrieves two more blocks of data from the remaining storage nodes. The data collector repeats the same procedure until it recovers the original data or runs out of storage nodes. The detailed decoding procedure is summarized in Algorithm 1.

Algorithm 1: Decoding of MSR Codes Based on the (n, k−1) RS Code for Data Reconstruction

begin
  v = 0; j = k;
  The data collector randomly chooses k storage nodes and retrieves the encoded data Y_{α×j};
  while v ≤ ⌊(n−k+1)/2⌋ do
    Collect the j columns of G¯ corresponding to the accessed storage nodes as G¯_j;
    Calculate G¯_j^T Y_{α×j};
    Construct P˜ and Q˜ by using (9) and (10);
    Perform progressive error-and-erasure decoding on each row of P˜ to obtain Pˆ;
    Locate erroneous columns of Pˆ by searching for columns with at least v+2 errors; assume that ℓ_e columns are found in this step;
    Locate columns of Pˆ with at most v errors; assume that ℓ_c columns are found in this step;
    if (ℓ_e = v and ℓ_c = k+v) then
      Copy the ℓ_e erroneous columns of Pˆ to their corresponding rows to make Pˆ a symmetric matrix;
      Collect any α columns among the above ℓ_c columns of Pˆ as Pˆ_α and find the corresponding G¯_α;
      Multiply Pˆ_α by the inverse of G¯_α to recover G¯_j^T Z_1;
      Recover Z_1 by the inverse of any α rows of G¯_j^T;
      Recover Z_2 from Q˜ by the same procedure;
      Recover m˜ from Z_1 and Z_2;
      if integrity-check(m˜) = SUCCESS then
        return m˜;
    j ← j + 2;
    Retrieve 2 more blocks of encoded data from the remaining storage nodes and merge them into Y_{α×j};
    v ← v + 1;
  return FAIL;
end
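To make the access pattern of Algorithm 1 explicit, the short Python sketch below (an illustration, not code from the paper) enumerates the progressive rounds: round v accesses j = k + 2v nodes and, by Corollary 2, can cope with up to v erroneous nodes.

```python
def progressive_rounds(n, k):
    """Yield (round, nodes_accessed, erroneous_nodes_tolerated) for Algorithm 1.

    Round v accesses j = k + 2v nodes and tolerates up to v erroneous nodes.
    The loop stops once v exceeds floor((n-k+1)/2) or no more nodes remain.
    """
    v, j = 0, k
    while v <= (n - k + 1) // 2 and j <= n:
        yield v, j, v
        j += 2
        v += 1

# For the [20,10,18] MSR code simulated in the paper:
for v, j, tol in progressive_rounds(20, 10):
    print(f"round {v}: access {j} nodes, tolerate up to {tol} erroneous nodes")
```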
The proposed data reconstruction algorithm for MSR codes is evaluated by Monte Carlo simulations and compared with the previous data reconstruction algorithm in [11]. Each data point is generated from 10^3 simulation runs. Storage nodes may fail arbitrarily with the Byzantine failure probability ranging from 0 to 0.5. [n,k,d] and m are chosen to be [20,10,18] and 5, respectively. Figure 1 shows that the proposed algorithm can successfully reconstruct the data with much higher probability than the one presented in [11] at the same node failure probability. For example, at a node failure probability of 0.1, data for about 1 percent of node failure patterns cannot be reconstructed using the proposed algorithm. On the other hand, data for over 50 percent of node failure patterns cannot be reconstructed using the previous algorithm in [11]. The advantage of the proposed algorithm is also overwhelming in the average number of accessed nodes for data reconstruction. Due to space limitations, those simulation results are omitted.

[Fig. 1: Failure rate of reconstruction versus node failure probability for the [20,10,18] MSR code, comparing the previous algorithm in [11] with the proposed algorithm.]

V. CONCLUSION

In this work we proposed a new encoding scheme for [n,2α] error-correcting MSR codes built from the generator matrix of any [n,α] RS code. It generalizes the previously proposed MSR codes in [11] and has several salient advantages. It allows the construction of least-update-complexity codes with a properly chosen systematic generator matrix. More importantly, the encoding scheme leads to an efficient decoding scheme that can tolerate more errors at the storage nodes and access additional storage nodes only when necessary. A progressive decoding scheme was thereby devised with low communication overhead.

Possible future work includes extension of the encoding and decoding schemes to MBR points, and the study of encoding schemes with optimal update complexity and good regenerating capability.

ACKNOWLEDGMENT

This work was supported in part by CASE: The Center for Advanced Systems and Engineering, a NYSTAR center for advanced technology at Syracuse University; the National Science Council (NSC) of Taiwan under grants no. NSC 99-2221-E-011-158-MY3 and NSC 101-2221-E-011-069-MY3; the US National Science Foundation under grant no. CNS-1117560; and the McMaster University new faculty startup fund.

REFERENCES

[1] A. G. Dimakis, P. B. Godfrey, M. Wainwright, and K. Ramchandran, "Network coding for distributed storage systems," in Proc. of the 26th IEEE International Conference on Computer Communications (INFOCOM), Anchorage, Alaska, May 2007, pp. 2000-2008.
[2] A. G. Dimakis, P. B. Godfrey, Y. Wu, M. Wainwright, and K. Ramchandran, "Network coding for distributed storage systems," IEEE Trans. Inform. Theory, vol. 56, pp. 4539-4551, September 2010.
[3] Y. Wu, A. G. Dimakis, and K. Ramchandran, "Deterministic regenerating codes for distributed storage," in Proc. of the 45th Annual Allerton Conference on Control, Computing, and Communication, Urbana-Champaign, Illinois, September 2007.
[4] Y. Wu, "Existence and construction of capacity-achieving network codes for distributed storage," IEEE Journal on Selected Areas in Communications, vol. 28, pp. 277-288, February 2010.
[5] D. F. Cullina, "Searching for minimum storage regenerating codes," California Institute of Technology Senior Thesis, 2009.
[6] Y. Wu and A. G. Dimakis, "Reducing repair traffic for erasure coding-based storage via interference alignment," in Proc. IEEE International Symposium on Information Theory, Seoul, Korea, July 2009, pp. 2276-2280.
[7] K. V. Rashmi, N. B. Shah, P. V. Kumar, and K. Ramchandran, "Explicit construction of optimal exact regenerating codes for distributed storage," in Proc. of the 47th Annual Allerton Conference on Control, Computing, and Communication, Urbana-Champaign, Illinois, September 2009, pp. 1243-1249.
[8] S. Pawar, S. E. Rouayheb, and K. Ramchandran, "Securing dynamic distributed storage systems against eavesdropping and adversarial attacks," arXiv:1009.2556v2 [cs.IT], 27 Apr 2011.
[9] F. Oggier and A. Datta, "Byzantine fault tolerance of regenerating codes," arXiv:1106.2275v1 [cs.DC], 12 Jun 2011.
[10] K. V. Rashmi, N. B. Shah, and P. V. Kumar, "Optimal exact-regenerating codes for distributed storage at the MSR and MBR points via a product-matrix construction," IEEE Trans. Inform. Theory, vol. 57, pp. 5227-5239, August 2011.
[11] Y. S. Han, R. Zheng, and W. H. Mow, "Exact regenerating codes for Byzantine fault tolerance in distributed storage," in Proc. of the IEEE INFOCOM 2012, Orlando, FL, March 2012.
[12] K. Rashmi, N. Shah, K. Ramchandran, and P. Kumar, "Regenerating codes for errors and erasures in distributed storage," in Proc. of the 2012 IEEE International Symposium on Information Theory, Cambridge, MA, July 2012.
[13] A. S. Rawat, S. Vishwanath, A. Bhowmick, and E. Soljanin, "Update efficient codes for distributed storage," in Proc. of the 2011 IEEE International Symposium on Information Theory, Saint Petersburg, Russia, July 2011.
[14] Y. S. Han, S. Omiwade, and R. Zheng, "Progressive data retrieval for distributed networked storage," IEEE Trans. on Parallel and Distributed Systems, vol. 23, pp. 2303-2314, December 2012.

[Figure: Average number of accessed nodes for data reconstruction versus node failure probability for the [20,10,18] MSR code, comparing the previous algorithm with the proposed algorithm.]
