
Group Testing using left-and-right-regular sparse-graph codes



Avinash Vem, Nagaraj T. Janakiraman, Krishna R. Narayanan
Department of Electrical and Computer Engineering
Texas A&M University
{vemavinash,tjnagaraj,krn}@tamu.edu

arXiv:1701.07477v1 [cs.IT] 25 Jan 2017

Abstract: We consider the problem of non-adaptive group testing of $N$ items out of which $K$ or fewer items are known to be defective. We propose a testing scheme based on left-and-right-regular sparse-graph codes and a simple iterative decoder. We show that for any arbitrarily small $\epsilon > 0$ our scheme requires only $m = c_\epsilon K \log \frac{c_1 N}{K}$ tests to recover a $(1-\epsilon)$ fraction of the defective items with high probability (w.h.p.), i.e., with probability approaching 1 asymptotically in $N$ and $K$, where the values of the constants $c_\epsilon$ and $\ell$ are functions of the desired error floor $\epsilon$ and the constant $c_1 = \ell / c_\epsilon$ (observed to be approximately equal to 1 for various values of $\epsilon$). More importantly, the iterative decoding algorithm has a sub-linear computational complexity of $O(K \log \frac{N}{K})$, which is known to be optimal. Also, for $m = c_2 K \log K \log \frac{N}{K}$ tests our scheme recovers the whole set of defective items w.h.p. These results are valid for both the noiseless and noisy versions of the problem as long as the number of defective items scales sub-linearly with the total number of items, i.e., $K = o(N)$. The simulation results validate the theoretical results by showing a substantial improvement in the number of tests required when compared to the testing scheme based on left-regular sparse graphs.

I. INTRODUCTION

The problem of Group Testing (GT) refers to testing a large population of $N$ items for $K$ defective items (or sick people) where grouping multiple items together for a single test is possible. The output of the test is negative if all the grouped items are non-defective; otherwise the output is positive. In the scenario when $K \ll N$, the objective of GT is to design the testing scheme such that the total number of tests $m$ to be performed is minimized.

This problem was first introduced to the field of statistics by Dorfman [1] during World War II for testing soldiers for syphilis without having to test each soldier individually. Since then group testing has found application in a wide variety of problems like clone library screening, non-linear optimization, multi-access communication, etc. [2], and in fields like biology [3], machine learning [4], data structures [5] and signal processing [6]. A comprehensive survey of group testing algorithms, both combinatorial and probabilistic, can be found in [2], [7], [8].

In the literature on Group Testing, three kinds of reconstruction guarantees have been considered: combinatorial, probabilistic and approximate. In the combinatorial designs for the GT problem, the probability of recovery for any given defective set should be equal to 1, whereas in the probabilistic version one is interested in recovering all the defective items with high probability (w.h.p.), i.e., with probability approaching 1 asymptotically in $N$ and $K$. Another variant of the probabilistic version is that the probability of recovery is required to be greater than or equal to $(1-\varepsilon)$ for a given $\varepsilon > 0$. For the approximate recovery version one is interested in recovering only a $(1-\epsilon)$ fraction of the defective items (not the whole set of defective items) w.h.p.

For the combinatorial GT the best known lower bound on the number of tests required is $\Omega\left(K^2 \frac{\log N}{\log K}\right)$ [9], [10], whereas the best known achievability bound is $O(K^2 \log N)$ [11], [12]. Most of these results were based on algorithms relying on exhaustive searches and thus have a high computational complexity of at least $O(K^2 N \log N)$. Only recently a scheme with efficient decoding was proposed by Indyk et al. [13], where all the defective items are guaranteed to be recovered using $m = O(K^2 \log N)$ tests in $\mathrm{poly}(K) \cdot O(m \log^2 m) + O(m^2)$ time.

If we consider the probabilistic version of the problem, it was shown in [7], [8] that the number of tests necessary is $\Omega(K \log \frac{N}{K})$, which is the best known lower bound in the literature. Regarding the best known achievability bound, Mazumdar [14] proposed a construction that has an asymptotically decaying error probability with $O\left(K \frac{\log^2 N}{\log K}\right)$ tests. For the approximate version it was shown [8] that the required number of tests scales as $O(K \log N)$, and to the best of our knowledge this is the tightest bound known.

In [15] the authors Lee, Pedarsani and Ramchandran proposed a testing scheme based on left-regular sparse-graph codes and a simple iterative decoder based on the peeling decoder, which are popular tools in channel coding [16], for the non-adaptive group testing problem. They refer to the scheme as SAFFRON (Sparse-grAph codes Framework For gROup testiNg), a name which we will follow throughout this document. The authors proved that using the SAFFRON scheme $m = c_\epsilon K \log N$ tests are enough to identify at least a $(1-\epsilon)$ fraction of the defective items (the approximate version of GT) w.h.p. The precise value of the constant $c_\epsilon$ as a function of the required error floor $\epsilon$ is also given. More importantly, the computational complexity of the proposed peeling-based decoder is only $O(K \log N)$. They also showed that with $m = c \cdot K \log K \log N$ tests, i.e., with an additional $\log K$ factor, the whole defective set (the probabilistic version of GT) can be recovered with an asymptotically high probability of $1 - O(K^{-\alpha})$.

Our Contributions

In this work, we propose a non-adaptive GT scheme that is similar to SAFFRON, but we employ left-and-right-regular sparse-graph codes instead of left-regular sparse-graph codes and show that we only require $c_\epsilon K \log \frac{N\ell}{c_\epsilon K}$ tests for an error floor of $\epsilon$ in the approximate version of the GT problem. Although the testing complexity of our scheme has the same asymptotic order $O(K \log N)$ as that of [15], which as far as we are aware is the best known order result for the required number of tests in the approximate GT, it provides a better explicit upper bound of $\Theta(K \log \frac{N}{K})$ with optimal computational complexity $O(K \log \frac{N}{K})$ and also a significant improvement in the required number of tests for finite values of $K, N$. Following the approach in [15], we extend our proposed scheme with the singleton-only variant of the decoder to tackle the probabilistic version of the GT problem. In Sec. V we show that for $m = c \cdot K \log K \log \frac{N}{K}$ tests, i.e., with an additional $\log K$ factor, the whole defective set can be recovered w.h.p. Note that the testing complexity of our scheme is only a $\log K$ factor away from the best known lower bound of $\Omega(K \log \frac{N}{K})$ [7] for the probabilistic GT problem. We also extend our scheme to the noisy GT problem, where the test results are corrupted by noise, using an error-correcting code similar to the approach taken in [15]. We demonstrate the improvement in the required number of tests due to left-and-right-regular graphs for finite values of $K, N$ via simulations.

II. PROBLEM STATEMENT

Formally the group testing problem can be stated as follows. Given a total number of $N$ items out of which $K$ are defective, the objective is to perform $m$ different tests and identify the locations of the $K$ defective items from the test outputs. For now we consider only the noiseless group testing problem, i.e., the result of each test is exactly equal to the boolean OR of all the items participating in the test.

Let the support vector $x \in \{0,1\}^N$ denote the list of items, in which the indices with non-zero values correspond to the defective items. A non-adaptive testing scheme consisting of $m$ tests can be represented by a matrix $A \in \{0,1\}^{m \times N}$ where each row $a_i$ corresponds to a test. The non-zero indices in row $a_i$ correspond to the items that participate in the $i$th test. The output corresponding to the vector $x$ and the testing scheme $A$ can be expressed in matrix form as

$$y = A \odot x$$

where $\odot$ is the usual matrix multiplication in which the arithmetic multiplications are replaced by the boolean AND operation and the arithmetic additions are replaced by the boolean OR operation.

III. REVIEW: SAFFRON

As mentioned earlier, the SAFFRON scheme [15] is based on left-regular sparse-graph codes and is applied to the non-adaptive group testing problem. In this section we briefly review their testing scheme, their iterative decoding scheme (reconstruction of $x$ given $y$) and their main results. The SAFFRON testing scheme consists of two stages. The first stage is based on a left-regular sparse-graph code which pools the $N$ items into $M$ non-disjoint bins, where each item belongs to exactly $\ell$ bins. The second stage produces $h$ testing outputs at each bin, where the $h$ different combinations of the pooled items (from the first stage) at the respective bin are defined according to a universal signature matrix. For the first stage the authors consider a bipartite graph with $N$ variable nodes (corresponding to the $N$ items) and $M$ bin nodes. Each variable node is connected to $\ell$ bin nodes chosen uniformly at random from the $M$ available bin nodes. All the variable nodes (historically depicted on the left side of the graph in coding theory) have degree $\ell$, hence the term left-regular, whereas the degree of a bin node on the right is a random variable in the range $[0 : N]$.

Definition 1 (Left-regular sparse graph ensemble). Let $\mathcal{G}_\ell(N,M)$ be the ensemble of left-regular bipartite graphs where for each variable node the $\ell$ right node connections are chosen uniformly at random from the $M$ right nodes.

Let $T_G \in \{0,1\}^{M \times N}$ be the adjacency matrix corresponding to a graph $G \in \mathcal{G}_\ell(N,M)$, i.e., each column in $T_G$ corresponds to a variable node and has exactly $\ell$ ones. Let the rows of the matrix $T_G$ be given by $T_G = [t_1^T, t_2^T, \ldots, t_M^T]^T$. For the second stage let the universal signature matrix defining the $h$ tests at each bin be $U \in \{0,1\}^{h \times N}$. Then the overall testing matrix is $A := [A_1^T, \ldots, A_M^T]^T$, where $A_i = U \, \mathrm{diag}(t_i)$ of size $h \times N$ defines the $h$ tests at the $i$th bin. Thus the total number of tests is $m = M \times h$.

The signature matrix $U$ in a more general setting with parameters $r$ and $p$ can be given by

$$U_{r,p} = \begin{bmatrix} b_1 & b_2 & \cdots & b_r \\ \bar{b}_1 & \bar{b}_2 & \cdots & \bar{b}_r \\ b_{\pi^1_1} & b_{\pi^1_2} & \cdots & b_{\pi^1_r} \\ \bar{b}_{\pi^1_1} & \bar{b}_{\pi^1_2} & \cdots & \bar{b}_{\pi^1_r} \\ \vdots & & \ddots & \vdots \\ b_{\pi^{p-1}_1} & b_{\pi^{p-1}_2} & \cdots & b_{\pi^{p-1}_r} \\ \bar{b}_{\pi^{p-1}_1} & \bar{b}_{\pi^{p-1}_2} & \cdots & \bar{b}_{\pi^{p-1}_r} \end{bmatrix} \tag{1}$$

where $b_i \in \{0,1\}^{\lceil \log_2 r \rceil}$ is the binary expansion vector for $i$ and $\bar{b}_i$ is the complement of $b_i$. $\pi^k = [\pi^k_1, \pi^k_2, \ldots, \pi^k_r]$ denotes a permutation chosen at random from the symmetric group $S_r$. Henceforth $U_{r,p}$ will refer either to the ensemble of matrices generated over the choices of the permutations $\pi^k$ for $k \in [1 : p-1]$ or to a matrix picked uniformly at random from the said ensemble. The reference should be sufficiently clear from the context. In the SAFFRON scheme the authors employed a signature matrix from $U_{r,p}$ with $r = N$ and $p = 3$, thus resulting in a $U$ of size $h \times N$ with $h = 6 \log_2 N$.

Decoding

Before describing the decoding process let us review some terminology. A bin is referred to as a singleton if there is exactly one non-zero variable node connected to the bin, and similarly as a double-ton in the case of two non-zero variable nodes. In the case where we know the identity of one of them, leaving the decoder to decode the identity of the other one, the bin is referred to as a resolvable double-ton. If the bin has more than two non-zero variable nodes attached we refer to it as a multi-ton. The first part of the decoder, which is referred to as the bin decoder, will be able to detect and decode exactly the identity of the non-zero variable nodes connected to the bin if and only if the bin is a singleton or a resolvable double-ton. If the bin is a multi-ton, the bin decoder will detect it as neither a singleton nor a resolvable double-ton with high probability. The second part of the decoder, which is commonly referred to as the peeling decoder [17], when given the identities of some of the non-zero variable nodes by the bin decoder, identifies the bins connected to the recovered variable nodes and looks for newly uncovered resolvable double-tons in these bins. This process of recovering new non-zero variable nodes from already discovered non-zero variable nodes proceeds in an iterative manner (historically referred to as peeling off from the graph). For details of the decoder we refer the reader to [15].

The results stated next are from the series of lemmas and theorems in [15] that enabled the authors to show that their SAFFRON scheme with the described peeling decoder solves the group testing problem with $c \cdot K \log N$ tests and $O(K \log N)$ computational complexity.

Lemma 3 (Bin decoder analysis). For a signature matrix $U_{r,p}$ as described in (1), the bin decoder successfully detects and resolves the bin if it is either a singleton or a resolvable double-ton. In the case of the bin being a multi-ton, the bin decoder declares a wrong hypothesis of either a singleton or a resolvable double-ton with a probability no greater than $\frac{1}{r^{p-1}}$.

Proof: This result was proved in [15] for the choice of parameters $r = N$ and $p = 3$. The extension of the result to general $r, p$ parameters is straightforward.

For convenience the performance of the peeling decoder is analyzed independently of the bin decoder, i.e., a peeling decoder is considered which assumes that the bin decoder is working accurately; this will be referred to as the oracle-based peeling decoder. Another simplification is that a pruned graph is considered, where all the zero variable nodes and their respective edges are removed from the graph. Also, the oracle-based peeling decoder is assumed to decode a variable node if it is connected to a bin node with degree one, or with degree two with one of the two variable nodes already decoded, in an iterative fashion. Any right node with degree more than two is untouched by this oracle-based peeling decoder. It is easy to verify that the original decoder with accurate bin decoding is equivalent to this simplified oracle-based peeling decoder on a pruned graph.
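The oracle-based peeling decoder on a pruned graph can be sketched in a few lines. The snippet below is only an illustration under stated assumptions, not the implementation used in [15]: the function names `sample_pruned_graph` and `oracle_peel` are hypothetical, and each defective item is assumed to pick its $\ell$ bins without repetition. Bins of degree greater than two are never used, which mirrors the fact that the OR measurements do not allow decoded items to be subtracted out.

```python
import random


def sample_pruned_graph(K, M, ell, rng):
    """Pruned left-regular graph: each of the K defective items is
    connected to ell distinct bins chosen uniformly at random
    (sampling without repetition is an assumption of this sketch)."""
    return [rng.sample(range(M), ell) for _ in range(K)]


def oracle_peel(item_bins, M):
    """Oracle-based peeling on a pruned graph.

    item_bins[i] lists the bins of defective item i. A perfect bin
    decoder is assumed: it resolves singletons and double-tons with
    one known member, and leaves every other bin untouched.
    Returns the set of recovered defective items."""
    bin_items = [set() for _ in range(M)]
    for item, bins in enumerate(item_bins):
        for b in bins:
            bin_items[b].add(item)

    recovered = set()
    # Start from the items sitting alone in some bin (singletons).
    frontier = [next(iter(s)) for s in bin_items if len(s) == 1]
    while frontier:
        item = frontier.pop()
        if item in recovered:
            continue
        recovered.add(item)
        for b in item_bins[item]:
            # A degree-two bin containing a recovered item is a
            # resolvable double-ton: its other member is decoded.
            if len(bin_items[b]) == 2:
                (other,) = bin_items[b] - {item}
                if other not in recovered:
                    frontier.append(other)
    return recovered


if __name__ == "__main__":
    rng = random.Random(0)
    K, M, ell = 100, 613, 7  # M = 6.13 K, a (c, ell) pair from Table I
    graph = sample_pruned_graph(K, M, ell, rng)
    found = oracle_peel(graph, M)
    print("recovered fraction:", len(found) / K)
```

Because only degree-one and degree-two bins are ever inspected, the work per recovered item is proportional to its degree $\ell$, which is consistent with the decoding effort growing linearly in $K$ on the pruned graph.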
The overall group testing decoder comprises these two decoders working in conjunction as follows. In the first step, given the $m$ test outputs, the bin decoder is applied to the $M$ bins and the set of variable nodes that are connected to singletons is decoded and output. We denote the decoded set of non-zero variable nodes by $\mathcal{D}$. Now, in an iterative manner, at each iteration a variable node from $\mathcal{D}$ is considered and the bin decoder is applied to the bins connected to this variable node. The main idea is that one of these bins may be detected as a resolvable double-ton, thus resulting in decoding a new non-zero variable node. The variable node considered in the previous iteration is moved from $\mathcal{D}$ to a set of peeled-off variable nodes $\mathcal{P}$, and the non-zero variable node newly decoded in the previous iteration, if any, is placed in the set $\mathcal{D}$ before continuing to the next iteration. The decoder terminates when $\mathcal{D}$ is empty and is declared successful if the set $\mathcal{P}$ equals the set of defective items.

Remark 2. Note that we are not literally peeling off the decoded nodes from the graph, because the non-linear OR operation on the non-zero variable nodes at each bin prevents us from subtracting the effect of a non-zero node from the measurements of the bin node, unlike in the problems of compressed sensing or LDPC codes on the binary erasure channel.

Now we state the series of lemmas and theorems from [15].

Definition 4 (Pruned graph ensemble). Let the pruned graph ensemble $\tilde{\mathcal{G}}_\ell(N,K,M)$ be the set of all bipartite graphs obtained by removing a random $N-K$ subset of variable nodes from a graph from the ensemble $\mathcal{G}_\ell(N,M)$. Note that graphs from the pruned ensemble have $K$ variable nodes.

Before we analyze the pruned graph ensemble let us define the right-node degree distribution (d.d.) of an ensemble as $R(x) = \sum_i R_i x^i$, where $R_i$ is the probability that a right node in any graph from the ensemble has degree $i$. Similarly the edge d.d. $\rho(x) = \sum_i \rho_i x^{i-1}$ is defined, where $\rho_i$ is the probability that a random edge in the graph is connected to a right node of degree $i$. Note that the left-degree distribution is regular (i.e., $L(x) = x^\ell$) even for the pruned graph ensemble and hence is not specifically discussed.

Lemma 5 (Edge d.d. of pruned graph). For the pruned ensemble $\tilde{\mathcal{G}}_\ell(N,K,M)$, it was shown that in the limit $K, N \to \infty$, $\rho_1 = e^{-\lambda}$ and $\rho_2 = \lambda e^{-\lambda}$, where $\lambda = \ell / c_\epsilon$ for $M = c_\epsilon K$ for any constant $c_\epsilon$.

Lemma 6. For the pruned graph ensemble $\tilde{\mathcal{G}}_\ell(N,K,M)$ the oracle-based peeling decoder fails to peel off at least a $(1-\epsilon)$ fraction of the variable nodes with exponentially decaying probability if $M \geq c_\epsilon K$, where the required $c_\epsilon$ and $\ell$ for various values of $\epsilon$ are given in Table I.

TABLE I. Constants for various error floor values.

  $\epsilon$                                   | $10^{-3}$ | $10^{-4}$ | $10^{-5}$ | $10^{-6}$ | $10^{-7}$ | $10^{-8}$ | $10^{-9}$
  $c_\epsilon$ (listed as $c_1(\epsilon)$)     | 6.13      | 7.88      | 9.63      | 11.36     | 13.10     | 14.84     | 16.57
  $\ell$                                       | 7         | 9         | 10        | 12        | 14        | 15        | 17

Proof: Instead of reworking the whole proof here from [15], we list the main steps involved in the proof which we will use further along. Let $p_j$ be the probability that a random defective item is not identified at iteration $j$ of the decoder, in the limit $N$ and $K \to \infty$. Then one can write the density evolution (DE) equations relating $p_{j+1}$ to $p_j$ as

$$p_{j+1} = \left[1 - \left(\rho_1 + \rho_2 (1 - p_j)\right)\right]^{\ell-1}.$$

For this DE, we can see that 0 is not a fixed point and hence $p_j \nrightarrow 0$ as $j \to \infty$. Therefore numerically optimizing the values of $c_\epsilon$ and $\ell$ such that $\lim_{j \to \infty} p_j \leq \epsilon$ gives the optimal values for $c_\epsilon$ and $\ell$ given in Table I. It was also shown [15], [16] that for such sparse-graph systems the actual fraction of undecoded variable nodes deviates from the average undecoded fraction of variable nodes given by the DE with exponentially low probability.

Combining the lemmas and remarks above, the main result from [15] can be summarized as below.

Theorem 7. A random testing matrix from the SAFFRON scheme with $m = 6 c_\epsilon K \log_2 N$ tests recovers at least a $(1-\epsilon)$ fraction of the defective items w.h.p. of at least $1 - O\left(\frac{K}{N^2}\right)$. The computational complexity of the decoding scheme is $O(K \log N)$. The constant $c_\epsilon$ is given in Table I for some values of $\epsilon$.

IV. PROPOSED SCHEME

The main difference between the SAFFRON scheme described in Sec. III and our proposed scheme is that we use a left-and-right-regular sparse graph instead of a left-regular sparse graph in the first stage for the binning operation.

Definition 8 (Left-and-right-regular sparse graph ensemble). We define $\mathcal{G}_{\ell,r}(N,M)$ to be the ensemble of left-and-right-regular graphs where the $N\ell$ edge connections from the left and the $Mr \, (= N\ell)$ edge connections from the right are paired up according to a permutation $\pi$ chosen at random from $S_{N\ell}$.

Let $T_G \in \{0,1\}^{M \times N}$ be the adjacency matrix corresponding to a graph $G \in \mathcal{G}_{\ell,r}(N,M)$, i.e., each column in $T_G$, corresponding to a variable node, has exactly $\ell$ ones and each row, corresponding to a bin node, has exactly $r$ ones. Let the universal signature matrix $U \in \{0,1\}^{h \times r}$ be chosen from the $U_{r,p}$ ensemble. Then the overall testing matrix is $A := [A_1^T, \ldots, A_M^T]^T$, where $A_i \in \{0,1\}^{h \times N}$ defining the $h$ tests at the $i$th bin is given by

$$A_i = [0, \ldots, 0, u_1, 0, \ldots, u_2, 0, \ldots, u_r], \quad \text{where} \quad t_i = [0, \ldots, 0, 1, 0, \ldots, 1, 0, \ldots, 1]. \tag{2}$$

Note that $A_i$ is defined by placing the $r$ columns of $U$ at the $r$ non-zero indices of $t_i$, and the remaining columns are padded with zeros. We can observe that the total number of tests for this scheme is $m = M \times h$ where $h = 2p \log_2 r$.

Example 9. Let us look at an example for $(N,M) = (6,3)$ and $(\ell,r) = (2,4)$. Then the adjacency matrix $T_G$ of a graph $G \in \mathcal{G}_{2,4}(6,3)$ and a signature matrix $U \in \{0,1\}^{4 \times 4}$ for $p = 1$ and $\log_2 r = 2$ are given by

$$T_G = \begin{bmatrix} 1 & 1 & 0 & 1 & 0 & 1 \\ 0 & 1 & 1 & 1 & 1 & 0 \\ 1 & 0 & 1 & 0 & 1 & 1 \end{bmatrix}, \qquad U = \begin{bmatrix} 0 & 0 & 1 & 1 \\ 0 & 1 & 0 & 1 \\ 1 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 \end{bmatrix}.$$

Then the measurement matrix $A$ with $m = 2pM \lceil \log_2 r \rceil = 12$ tests is given by

$$A = \begin{bmatrix} 0 & 0 & 0 & 1 & 0 & 1 \\ 0 & 1 & 0 & 0 & 0 & 1 \\ 1 & 1 & 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 1 & 0 \\ 0 & 0 & 1 & 0 & 1 & 0 \\ 0 & 1 & 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 1 \\ 0 & 0 & 1 & 0 & 0 & 1 \\ 1 & 0 & 1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 & 1 & 0 \end{bmatrix}.$$

Definition 10 (Regular-SAFFRON). Let the ensemble of testing matrices be $\mathcal{G}_{\ell,r}(N,M) \times U_{r,p}$, where a graph $G$ from $\mathcal{G}_{\ell,r}(N,M)$ and a signature matrix $U$ from $U_{r,p}$ are chosen at random and the testing matrix $A$ is defined according to Eq. (2). Note that the total number of tests is $2pM \log_2 r$ where $r = \frac{N\ell}{M}$.

For the regular-SAFFRON testing ensemble defined in Def. 10, we employ the iterative decoder described in Sec. III. Similar to the SAFFRON scheme we will analyze the peeling decoder and the bin decoder separately and union bound the total error probability of the decoding scheme. As we have already mentioned, the analysis of just the peeling decoder part can be carried out by considering a simplified oracle-based peeling decoder on a pruned graph with only the non-zero variable nodes remaining.

Definition 11 (Pruned graph ensemble). We define the pruned graph ensemble $\tilde{\mathcal{G}}_{\ell,r}(N,K,M)$ as the set of all graphs obtained by removing a random $N-K$ subset of variable nodes from a graph from the left-and-right-regular sparse-graph ensemble $\mathcal{G}_{\ell,r}(N,M)$.

Note that graphs from the pruned ensemble have $K$ variable nodes with degree $\ell$, whereas the right degree is not regular anymore.

[Figure 1: two bipartite graphs side by side, each with $N$ variable nodes of left degree $\ell$ connected through a permutation $\pi$ to $c_1 K$ bins. Left (SAFFRON): right degree $r \leq N$, about $\log N$ tests per bin with $U \in \{0,1\}^{c \log N \times N}$, $\Theta(K \log N)$ tests in total. Right (regular-SAFFRON): right degree $r = \frac{N\ell}{c_1 K}$, about $\log r$ tests per bin with $U \in \{0,1\}^{c \log r \times r}$, $\Theta(K \log \frac{N}{K})$ tests in total.]

Fig. 1. Illustration of the main differences between SAFFRON [15] on the left and our regular-SAFFRON scheme on the right.

Lemma 12 (Edge d.d. of pruned graph). For the pruned graph ensemble $\tilde{\mathcal{G}}_{\ell,r}(N,K,M)$ it can be shown in the limit $K, N \to$
In both the schemes the peelingdecoderonsparsegraphrequiresΘ(K)bins.Butforthebindecoderpart,inSAFFRONschemetherightdegreeisarandomvariablewithamaximum valueofN andthusrequiresΘ(logN)testsateachbin.Whereasourschemebasedonright-regularsparsegraphhasaconstantrightdegreeofΘ(N)and K thusrequires onlyΘ(logN)testsateachbin.ThuswecanimprovethenumberoftestsfromΘ(KlogN)toΘ(KlogN). K K ∞ and K = o(N) that the edge d.d coefficients approach ǫ) fraction of the variable nodes with exponentially decaying ρ = e−λ and ρ = λe−λ where λ = ℓ/c for the choice of probabilityfor M =c K where ℓ,c for variousǫ is given in 1 2 ǫ ǫ M =cK, c being some constant. Table. I. Proof: We will first derive R(x) for the pruned graph Proof: We showed in Lemma. 12 that, in the limit of ensembleandthenusetherelationρ(x)= RR′′((x1))[16]toderive K,N → ∞, the edge degree distribution coefficients ρ1 and theedged.d.Notethatallthebinnodeshaveauniformdegree ρ2 approachthesamevaluesasintheSAFFRON scheme(see r before pruning. In the pruning operation we are removing Lem. 5). Now we follow the exact same approach as that of a N −K subset of variable nodes at random which means Lem.6wherethelimitingvaluesofρ1 =e−λ andρ2 =λe−λ from the bin node perspective, in an asymptotic sense, this is areusedintheDEequationstoshowthatforthegivenvalues equivalenttoremovingeachconnectededgewithaprobability of ℓ and cǫ limj→∞pj ≤ǫ. 1−β where β := KN. Under this process the right-node d.d Theorem 14. Let p ∈ Z such that K = o(N1−1/p). A can be written as random testing matrix from the proposed regular SAFFRON R1 =rβ(1−β)r−1, and similarly (3) ensemble Gℓ,cNǫKℓ (N,cǫK)×UcNǫKℓ ,p with m=c·Klog2 c2KN r tests recovers atleast (1−ǫ) fraction of the defective items Ri = βi(1−β)r−i ∀i<=r w.h.p. The computationalcomplexity of the decoding scheme (cid:18)i(cid:19) is O(Klog N). The constants are c= 2pc ,c = ℓ where ℓ thus giving us R(x)=(βx+(1−β))r. This gives us K ǫ 2 cǫ and c for various values of ǫ are given in Table. I. 
ǫ rβ(βx+(1−β))r−1 ρ(x)= Proof: It remains to be shown that for the proposed reg- rβ ular SAFFRON scheme the total probability of error vanishes =(βx+(1−β))r−1. asymptotically in K and N. Let E be the event of oracle- 1 Thus we can compute that ρ = (1−β)r−1 and ρ = (r− based peeling decoder terminating without recovering atleast 1 2 1)β(1−β)r−2. For M =cK we evaluate these quantities in (1−ǫ)K variablenodes.LetE betheeventofthebindecoder 2 the limit K,N →∞ as makinganerrorduringthe entiretyofthepeelingprocessand Nℓ−1 Ebin be the event of one instance of bin decoder making an K cK lim ρ1 = lim 1− error. The total probability of error Pe can be upper bounded K,N→∞ K,N→∞(cid:18) N(cid:19) by ℓ =e−λ where λ= c P ≤Pr(E )+Pr(E ) e 1 2 Similarly we can show lim ρ =λe−λ. ≤Pr(E )+Kℓ Pr(E ) K,N→∞ 2 1 bin Note that even if our initial ensemble is left-and-right- Kp ∈O regular the pruned graph ensemble has asymptotically the (cid:18)Np−1(cid:19) same degree distribution as in the SAFFRON scheme where where the second inequality is due to the union bound over the initial ensemble is left-regular. a maximum of Kℓ (number of edges in the pruned graph) Lemma 13. For the pruned graph ensemble G˜ (N,K,M) instances of bin decoding. The third line is due to the fact ℓ,r the oracle-based peeling decoder fails to peel off atleast (1− that Pr(E ) is exponentially decaying in K (see Lemma. 13) 1 and Pr(Ebin)=(cNǫKℓ )p−1 (see Lemma. 3 and Def. 10) VI. ROBUST GROUP TESTING V. TOTALRECOVERY: SINGLETON-ONLYVARIANT Inthissectionwe extendourschemetothegrouptesting problem where the test results can be noisy. Formally, the In this section we will look at the proposed regular- signal model can be described as SAFFRON scheme but with a decoder that uses only the singleton bins. To elaborate, the only difference is in the y=A⊙x+w, decoder which is not iterative in this framework and recovers the variable nodes connected to only the singleton bin nodes where w ∈ {0,1}N is an i.i.d. 
noise vector distributed and terminates.We will refer to this scheme as singleton-only according to Bernoulli distribution with parameter 0<q < 1 regular-SAFFRON scheme. The trade-off is that we can now 2 and the addition is over binary field. recoverthe whole defective set instead of just a large fraction of the defective items with an additional logK factor tests. Since we do notneedto be able to recoverresolvabledouble- Testing Scheme tonswe only need 2log r numberof tests at each bin i.e. we 2 choose p=1 for the signature matrix in Eqn. (1). In[15]fortherobustgrouptestingproblem,thesignature Theorem 15. Let K = o(N). For M = cαKlogK and matrixusedfornoiselessgrouptestingproblemismodifiedus- (ℓ,r) = (cαlogK,NK) a random testing matrix from the inganerrorcontrolcodesuchthatitcanhandlesingletonsand regular SAFFRON ensemble Gℓ,r(N,M)×Ur,1 with m = resolvable doubletons in the presence of noise. The binning 2cαKlogKlog2 KN tests the singleton-only decoder fails to operationas definedby the bipartite graphis exactly identical recover all the non-zero variable nodes with a vanishing to that of noiseless case. We describe the modifications to probability of O(K−α) where cα =e(1+α). the signature matrix and the bin detection decoding scheme as given in [15] for the sake of completeness and then state Proof: First we observe that for the choice of (ℓ,r) = the performance bounds for our scheme for the noisy group (c logK, N) number of bins M = Nℓ = c KlogK and α K r α testing problem. the number of tests in each bin is 2log N. From Lem. 3 2 K LetC bea binaryerror-correctingcodewith the follow- we know that a singleton bin is guaranteed to be decoded by n ing definition: the bin decoder. Thus it is enough if we show that for this choice for the numberof bins M all the variable nodesin the • Let the encoder and decoder functions be f :{0,1}n → opfru1ne−dOgr(aKph−αar)e. 
connected to atleast one singleton bin w.h.p {0,1}Rn and g : {0,1}Rn → {0,1}n respectively where R is the rate of the code. In theprunedgraphensemble,foranyparticularvariable node,the probabilitythatanyofthe ℓconnectedbitnodesare For ease of analysis and tight upper bound for the number of not a singleton can be given by (1−R )ℓ where R is the tests we will use random codes and the optimal maximum- 1 1 probability that a bin node in the pruned graph ensemble is a likelihood decoder which gives us the properties: singleton.InthelimitK,N →∞thevalueofR approaches (from Eq. 3) 1 • There exists a sequence of codes {Cn} with the rate of each code being R satisfying R = lim rβ(1−β)r−1 1 K,N→∞ R<1−H(q)−δ =1+qlog q+qlog q−δ (4) K NK−1 2 2 = lim 1− N→∞(cid:18) N(cid:19) foranyarbitrarysmallconstantδsuchthattheprobability K =e−1 of error Pr(g(x+w)6=x) < 2−κn for some κ > 0. In Eqn. 4, q :=1−q. By using union bound over all the K variable nodes in the pruned graph, the probability P that the singleton-only Even though the computational complexity of using random e decoder fails to recover a defective item can be bounded by codes is exponential in block length of the code since the block length for our application is O(log N) and hence we K Pe ≤K(1−R1)ℓ have an overall computational complexity of O(N). But in practice one can use any of the popular error-correcting =O K(1−e−1)e(1+α)logK codes such as spatially-coupled LDPC codes or polar codes (cid:16) (cid:17) =O Ke−e−1e(1+α)logK which are known to be capacity achieving [18], [19] whose (cid:16) (cid:17) computational complexity is linear in block length. =O K−α . The modifiedsignaturematrix U′ can be describedvia (cid:0) (cid:1) r,p In third line we used (1−x)≤e−x. U given in Eq. (1) and encoding function f for C where r,p n n=⌈log r⌉ as follows: a singleton with probability no greater than p . The robust 2 rκ bin decoder wrongly declares a singleton with probability no f(b ) f(b ) ··· f(b ) 1 2 r greater than 1 . 
 f(b1) f(b2) ··· f(br)  rpκ+p−1 f(b ) f(b ) ··· f(b )  π11 π21 πr1  Proof: Let Ei be the event that the error-controldecoder U′r,p := f(·b·π·11) f(bπ21) ··...· f(bπr1)  (5) gth(ayti1P)r(cEoim)m=it2s−aκnloegrrro=r art−sκec.tTiohneir.oFbruosmt bEinqnd.ec(4o)dewremkinssoews   a singleton if the error-control decoder g(y ) commits an f(b ) f(b ) ··· f(b ) i1  π1p−1 π2p−1 πrp−1  error at any one section. Thus the probability of missing a f(bπ1p−1) f(bπ2p−1) ··· f(bπrp−1) singletoncanbeupperboundedbyapplyingunionboundover all the sections i∈[0:p−1] giving the required result. Then the overall testing matrix A is defined in identical fashion to the definition in Sec. III for the case of noiseless Consider a singleton bin and let the event where the case except that U will be replaced by U′ in Eqn. (5). robust bin decoder outputs a singleton hypothesis but the Formally it can be defined as A := [AT,...,AT ]T where wrong index be E . This event happens when the error- 1 M1 bin A = U′diag(t ) where the binary vectors t are defined in control decoder commits an error and outputs the exact same i i i Sec. III. wrong index at each and every section. We assume that given the error-control decoder makes an error, the output Decoding is uniformly random among all the remaining indices. Thus Pr(E ) can be upper bounded by 1 ( 1 )p−1 which upon Thedecodingschemefortherobustgrouptesting,similar bin rκ r1+κ simplification gives us the required result. tothecaseofnoiselesscase,hastwopartswiththepeelingpart Thefractionofmissedsingletonscanbecompensatedby of the decoder identical to that of the noiseless case whereas using M(1+ p ) instead of M such that the total number of the bin detection part differs slightly with an extra step of rκ singletons decoded will be M(1+ p )(1− p )≈M. decoding for the error control code involved. rκ rκ Given the test output vector at a bin y = Theorem 17. Let p ∈ Z such that K = o N1−1/p . 
[yT ,yT ,yT ,...,yT ]T, the bin detection for the noisy The proposed robust regular SAFFRON scheme u(cid:0)sing m =(cid:1) ca0se1 ca0n2be11summar(ipz−ed1)2as following: The decoder ∀i ∈ [0 : c · Klog2 cNǫKℓ tests recovers atleast (1 − ǫ) fraction of the p−1] applies the decoding function g(·) to the first segment defective items w.h.p. where c=2pβ(q)cǫ and β(q)=1/R. y in each section i and obtainsthe location l whose binary i1 i Proof: Similar to the noiseless case the total probability expansion is equal to the error-correcting decoder output of error P is dominated by the performance of bin decoder. g(y ). The decoder then declares the bin as a singleton if e i1 πi =l ∀i. l0 i Pe ≤Pr(E1)+Kℓ Pr(Ebin) Similarly given that one of the variable nodes connected Kp+pκ =Pr(E )+O to the bin is already decoded to be non-zero, the resolvable 1 (cid:18)Np−1+pκ)(cid:19) double-tondecodingcan be summarizedas following.Let the N(p−1)(1+κ) location of the already recovered variable node in the bin =O (cid:18) Np−1+pκ) (cid:19) (originally a double-ton) be l then the test output can be 0 given as ∈O(N−κ) yy0012 ff((bbll00)) ff((bbll11)) ww0012 awisnhddeurKee tt≤ohetNhsee(pcf−oac1n)td/tphliafnoterPirlsa(Erdg1ue)eeintsooeuxLgpehomnK.e,n1Nt6ia.lalnyddethceaytihnigrdinlinKe yyp1...12=ul0 ∨ul1 +w=ff((bb...ππll1p00))∨ff((bb...ππll1p11))+ww...1p12 In this sectVioInI.wSeIMwUillLAevTaIOluNatReEthSeULpTeSrformance of the where the location of the second non-zero variable node l proposedregular-SAFFRON scheme via Monte Carlo simula- 1 needs to be recovered.Given y=u ∨u +w and u , the tions and compare it with the results of SAFFRON scheme l0 l1 l0 first segments of each section in u +w can be recovered provided in [15] for both the noiseless and noisy models. l1 since for each segmentof u either the vectorf(b ) or it’s complement is available. 
Onl0ce the first section f(bπlk0 )+w Noiseless Group Testing πi of each segment i is recovered, we apply singleton dle1coding As per Thm. 14 the proposedregularSAFFRON scheme procedure and rules as described above. requires only 6c Klog Nℓ tests as opposed to 6c KlogN ǫ cǫK ǫ Lemma 16 (Robust Bin Decoder Analysis). For a signature tests of SAFFRON scheme to recover (1 − ǫ) fraction of matrixU′ asdescribedin(5),therobustbindecodermisses defective items with a high probability. We demonstrate this r,p 100 100 fectives 10−1 ectives 1100−−12 de 10−2 ef d d e d entifi 10−3 ntifie 10−3 d e uni nid 10−4 of 10−4 ℓ=3 fu n ℓ=5 o Fractio 10−5 ℓℓ==73, regular action 10−5 qq==00..0034 r ℓ=5, regular F 10−6 q=0.05 q=0.03,regular ℓ=7,regular 10−6 q=0.04,regular 100 200 300 400 500 600 700 10−7 q=0.05,regular Number of tests per defective item (m/K) 4 6 8 10 12 Number of tests (×105) Fig. 2. MonteCarlo simulations for K =100,N =216. Wecompare the SAFFRON scheme [15] with the proposed regular SAFFRON scheme for various left degrees ℓ∈{3,5,7}. Theplots inblue indicate the SAFFRON Fig. 3. MonteCarlo simulations forK =128,N =232. Wecompare the schemeandtheplotsinredindicateourregularSAFFRONschemebasedon SAFFRON scheme with the proposed regular-SAFFRON scheme for a left left-and-right-regular bipartite graphs. degree ℓ = 12. We fix the number of bins and vary the rate of the error controlcodeused.Theplotsinblueindicate theSAFFRONscheme[15]and the plots in red indicate the regular-SAFFRON scheme based on left-and- by simulating the performance for the system parameters right-regular bipartite graphs. summarized below. • We fix N =216 and K =100 of 4×7(4+2e)bits at each bin and the total number of tests • For ℓ∈{3,5,7} we vary the number of bins M =cK. m=28M(4+2e). • In Eqn. 1 the parameter p=2 is chosen for matrix U • Thus the bin detection size is h=6log2 cNKℓ VIII. 
CONCLUSION • Hence the total number of tests m=6cKlog2 cNKℓ (cid:0) (cid:1) WeaddressedtheGroupTestingproblemforK defective The results are shown in Fig. 2. We observe that there is items out of N items. Using left-and-right-regular sparse- clear improvement in performance for the proposed regular graph codes we propose a new construction for the testing SAFFRON scheme whencomparedto the SAFFRON scheme matrix based on that of Lee et al., [15]. We show that this for each ℓ∈{3,5,7}. improvesthe testing complexity upon the previous results for the approximate version of the Group Testing problem and Noisy Group Testing achievesasymptoticallyvanishingerrorprobabilityundersub- Similar to the noiseless group testing problem we simu- linear time, orderoptimal,computationalcomplexity.We also late the performanceof our robustregular-SAFFRON scheme show that the proposed scheme with a variant of the original and compare it with that of the SAFFRON scheme. For decoder has a testing complexity that is only logK factor convenienceof comparison we choose our system parameters away from the lower bound for the probabilistic version of identical to the choices in [15]. The system parameters are the Group Testing problem with order optimal computational summarized below: complexity. • N =232,K =27. We fix ℓ=12,M =11.36K REFERENCES • BSC noise parameter q ∈{0.03,0.04,0.05} • In Eqn. 1 the parameter p=1 is chosen for matrix U [1] R.Dorfman,“Thedetectionofdefectivemembersoflargepopulations,” • Thus the bin detection size is h=4log2 NMℓ [2] TDh.-eZA.nDnaulsaonfdMaFt.heKm.atHicwalanSgta,tiCstoicms,bvinoal.to1r4i,alnog.r4o,uppp.t4e3s6ti–n4g40a,n1d94i3ts. applications, vol.12. WorldScientific, 1999. The results are shown in Fig. 3. Note that for the above set [3] H.-B.Chenand F.K.Hwang, “A surveyonnonadaptive group testing of parameters the right degree r = Nℓ ≈ 26. 
We choose to algorithms through the angle of decoding,” Journal of Combinatorial M operate in field GF(27) thus giving us a message length of 4 Optimization, vol.15,no.1,pp.49–59,2008. [4] D.M.MalioutovandK.R.Varshney,“Exactrulelearning viaboolean symbols. For the choice of code we use a (4+2e,4) Reed- compressed sensing.,” in Proc. Int. Conf. on Machine Learning., Solomon code for e ∈ [0 : 8] thus giving us a column length pp.765–773, 2013. [5] M.T.Goodrich,M.J.Atallah,andR.Tamassia,“Indexinginformation fordata forensics,” inInternational Conference onApplied Cryptogra- phyandNetwork Security, pp.206–221, Springer, 2005. [6] A. Emad and O. Milenkovic, “Poisson group testing: A probabilistic model for nonadaptive streaming boolean compressed sensing,” in Int. Conf.onAcoustics,Speech&SignalProc.,pp.3335–3339,IEEE,2014. [7] C. L. Chan, S. Jaggi, V. Saligrama, and S. Agnihotri, “Non-adaptive group testing: Explicit bounds andnovel algorithms,” IEEETrans. Inf. Theory,vol.60,no.5,pp.3019–3035, 2014. [8] G. K. Atia and V. Saligrama, “Boolean compressed sensing and noisy grouptesting,”IEEETrans.Inf.Theory,vol.58,no.3,pp.1880–1901, 2012. [9] A.G.D’yachkovandV.V.Rykov,“Boundsonthelengthofdisjunctive codes,”ProblemyPeredachiInformatsii,vol.18,no.3,pp.7–13,1982. [10] P.Erdo¨s,P.Frankl, andZ.Fu¨redi, “Families offinite setsinwhich no setiscovered bytheunionofrothers,” IsraelJournal ofMathematics, vol.51,no.1,pp.79–89,1985. [11] W. Kautz and R. Singleton, “Nonrandom binary superimposed codes,” IEEETransactionsonInformationTheory,vol.10,no.4,pp.363–377, 1964. [12] E.Porat andA.Rothschild, “Explicit nonadaptive combinatorial group testing schemes,” IEEE Transactions on Information Theory, vol. 57, no.12,pp.7982–7989, 2011. [13] P. Indyk, H. Q. Ngo, and A. Rudra, “Efficiently decodable non- adaptivegrouptesting,”inProceedingsofthetwenty-firstannualACM- SIAM symposium on Discrete Algorithms, pp. 1126–1142, Society for Industrial andAppliedMathematics, 2010. 
[14] A. Mazumdar, "Nonadaptive group testing with random set of defectives via constant-weight codes," arXiv preprint arXiv:1503.03597, 2015.
[15] K. Lee, R. Pedarsani, and K. Ramchandran, "Saffron: A fast, efficient, and robust framework for group testing based on sparse-graph codes," arXiv preprint arXiv:1508.04485, 2015.
[16] T. Richardson and R. Urbanke, Modern coding theory. Cambridge University Press, 2008.
[17] X. Li, S. Pawar, and K. Ramchandran, "Sub-linear time compressed sensing using sparse-graph codes," in Proc. IEEE Int. Symp. Information Theory, pp. 1645–1649, 2015.
[18] S. Kumar, A. J. Young, N. Macris, and H. D. Pfister, "Threshold saturation for spatially coupled LDPC and LDGM codes on BMS channels," IEEE Trans. Inf. Theory, vol. 60, no. 12, pp. 7389–7415, 2014.
[19] S. Kudekar, T. Richardson, and R. L. Urbanke, "Spatially coupled ensembles universally achieve capacity under belief propagation," IEEE Trans. Inf. Theory, vol. 59, no. 12, pp. 7761–7813, 2013.
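
As a numerical companion to the bounds of Lemma 16 and the test counts quoted in Sec. VII, the following Python sketch evaluates those closed-form expressions. It is only an illustration under stated assumptions, not the authors' simulation code: the function names are hypothetical, the values of $\kappa$ and $c$ used in the example calls are arbitrary choices that do not come from the paper, and the same constant $c$ is assumed for both schemes in the noiseless comparison.

```python
import math


def lemma16_bounds(r, kappa, p):
    """Evaluate the two bounds discussed around Lemma 16.

    miss: union bound over p sections, each failing with prob. r**(-kappa).
    false_singleton: (1 / r**kappa) * (1 / r**(1 + kappa))**(p - 1).
    """
    miss = p * r ** (-kappa)
    false_singleton = r ** (-kappa) * r ** (-(1 + kappa) * (p - 1))
    return miss, false_singleton


def noiseless_tests(N, K, ell, c):
    """Test counts from Sec. VII (p = 2): (regular SAFFRON, SAFFRON).

    Assumption: the same constant c is used for both schemes.
    """
    regular = 6 * c * K * math.log2(N * ell / (c * K))
    saffron = 6 * c * K * math.log2(N)
    return regular, saffron


def noisy_tests(M, e):
    """Total tests m = 28 * M * (4 + 2e) for a (4+2e, 4) Reed-Solomon code."""
    return 28 * M * (4 + 2 * e)


# Hypothetical illustration values: kappa = 1.0 and c = 2.0 are NOT from the paper.
r = 2 ** 26                      # right degree quoted for the noisy setup
miss, false_singleton = lemma16_bounds(r, kappa=1.0, p=2)
print("miss bound:", miss)
print("false-singleton bound:", false_singleton)

for ell in (3, 5, 7):
    regular, saffron = noiseless_tests(N=2 ** 16, K=100, ell=ell, c=2.0)
    print(ell, round(regular), round(saffron))

M = 11.36 * 2 ** 7               # number of bins in the noisy setup
for e in range(0, 9):
    print(e, round(noisy_tests(M, e)))
```

The false-singleton expression is written in its unsimplified product form so that it can be compared directly against the simplified exponent $p\kappa + p - 1$.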
