


Learning Immune-Defectives Graph through Group Tests

Abhinav Ganesan, Sidharth Jaggi, and Venkatesh Saligrama, Senior Member, IEEE

arXiv:1503.00555v3 [cs.IT] 16 Aug 2015

Abstract: This paper deals with an abstraction of a unified problem of drug discovery and pathogen identification. Pathogen identification involves identification of disease-causing biomolecules. Drug discovery involves finding chemical compounds, called lead compounds, that bind to pathogenic proteins and eventually inhibit the function of the protein. In this paper, the lead compounds are abstracted as inhibitors, pathogenic proteins as defectives, and the mixture of "ineffective" chemical compounds and non-pathogenic proteins as normal items. A defective could be immune to the presence of an inhibitor in a test. So, a test containing a defective is positive iff it does not contain its "associated" inhibitor. The goal of this paper is to identify the defectives, inhibitors, and their "associations" with high probability, or in other words, learn the Immune Defectives Graph (IDG) efficiently through group tests. We propose a probabilistic non-adaptive pooling design, a probabilistic two-stage adaptive pooling design and decoding algorithms for learning the IDG. For the two-stage adaptive pooling design, we show that the sample complexity of the number of tests required to guarantee recovery of the inhibitors, defectives, and their associations with high probability, i.e., the upper bound, exceeds the proposed lower bound by a logarithmic multiplicative factor in the number of items. To be precise, lower and upper bounds of $\Omega((r+d)\log n + rd)$ and $O(rd\log n)$ tests respectively are identified for classifying $r$ inhibitors and $d$ defectives amongst $n$ items, and their associations. For the non-adaptive pooling design, we show that the upper bound (given by $O((r+d)^2\log n)$ tests) exceeds the proposed lower bound (given by $\min\left\{\Omega((r+d)\log n + rd),\ \Omega\left(\frac{r^2}{\log r}\log n\right),\ \Omega(d^2)\right\}$ tests) by at most a logarithmic multiplicative factor in the number of items.

This work was done when A. Ganesan was a Post Doctoral Fellow at the Chinese University of Hong Kong, Hong Kong SAR (e-mail: [email protected]). S. Jaggi is with the Chinese University of Hong Kong (e-mail: [email protected]). V. Saligrama is with Boston University, Boston, USA (e-mail: [email protected]). A part of the content of this paper has been presented at the 2015 IEEE International Symposium on Information Theory (ISIT).

I. INTRODUCTION

Preliminary stages of drug discovery involve finding 'blocker' or 'lead' compounds that bind to a biomolecular target, which is a disease-causing pathogenic protein, in order to inhibit the function of the protein. Such compounds are later used to produce new drugs. These lead compounds have to be identified amidst billions of chemical compounds [1], [2], and hence drug discovery is a tedious process. A complementary problem involves identifying pathogenic proteins amidst non-pathogenic ones, both of which are structurally identical in some respects. For instance, out of five known species of ebolavirus, only four are pathogenic to humans (see p. 5 in [2]), and a similar example can be found in arenavirus [3]. Some of these pathogenic proteins might share a common inhibitory mechanism against a lead compound which serves to distinguish them from the non-pathogenic ones [3]. So, finding potential pathogenic proteins amidst a large collection of biomolecules by testing them against known inhibitory compounds is a problem complementary to the problem of lead compound discovery. The lead compounds can be abstracted as inhibitor items, the pathogenic proteins as defective items, and the others as normal items. Now, the above problems can be combined to be viewed as an inhibitor-defective classification problem on the mixture of pathogenic and non-pathogenic proteins, and billions of chemical compounds. This unifies the process of finding both the pathogenic proteins and the lead compounds.

An efficient means of solving this problem could potentially be applied in high-throughput screening for drugs and pathogens or computer-assisted drug and pathogen identification. A natural consideration is that, while some pathogenic proteins might be inhibited by some lead compounds, other pathogenic proteins might be immune to some of these lead compounds present in the mixture of items. In other words, each defective item is possibly immune to the presence of some inhibitor items so that its expression cannot be prevented by the presence of those inhibitors when tested together. By definition, an inhibitor inhibits at least one defective. Learning this inhibitor-defective interaction as well as classifying the inhibitors and defectives efficiently through group testing is presented in this work.

A representation of this model, which we refer to as the Immune-Defectives Graph (IDG) model, is given in Fig. 1. The presence of a directed edge between a pair of vertices $(w_{i_{k_1}}, w_{j_{k_2}})$ represents the inhibition of the defective $w_{j_{k_2}}$ by the inhibitor $w_{i_{k_1}}$, and the absence of a directed edge between a pair of vertices $(w_{i_{k_1}}, w_{j_{k_2}})$ indicates that the inhibitor $w_{i_{k_1}}$ does not affect the expression of the defective $w_{j_{k_2}}$ when tested together. A formal presentation of the IDG model and the goals of this paper appear in the next section.

Fig. 1: A representation of the IDG Model, where I represents the set of inhibitors and D represents the set of defectives.

Example 1: An instance of the IDG model is given in Fig. 2. In this example, the outcome of a test is positive iff a defective $w_{j_{k_2}}$, for some $k_2$, is present in the test and its associated inhibitor $w_{i_{k_2}}$ does not appear in the test. Observe that if the item-pair $(w_{i_{k_1}}, w_{j_{k_2}})$, for $k_1 \neq k_2$, appears in a test and $w_{i_{k_2}}$ does not appear in the test, then the outcome is positive. Also, if the item-pair $(w_{i_{k_2}}, w_{j_{k_2}})$ appears in a test and if $w_{j_{k_2'}}$ also appears in the test but not $w_{i_{k_2'}}$, then the test outcome is positive. But if the appearance of every defective $w_{j_{k_2'}}$ in a test is compensated by the appearance of its associated inhibitor $w_{i_{k_2'}}$ in the test, then the test outcome is negative. The outcome of a test is also negative when none of the defectives appear in a test.

Fig. 2: An example for the IDG Model where each defective is associated with a distinct inhibitor so that $r = d$.

The IDG model can also be viewed as a generalization of the 1-inhibitor model introduced by Farach et al. in [4]. This model was motivated by errors in blood testing where blocker compounds (i.e., inhibitors) block the expression of defectives in a test [5]. This is also motivated by drug discovery applications where the inhibitors are actually desirable items that inhibit the pathogens [6]. In the 1-inhibitor model, a test outcome is positive iff there is at least one defective and no inhibitors in the test. So, the presence of a single inhibitor is sufficient to ensure that the test outcome is negative.

Efficient testing involves pooling different items together in every test so that the number of tests can be minimized [7]. Such a testing methodology is called group testing. The pooling methodology can be of two kinds, namely non-adaptive and adaptive pooling designs. In non-adaptive pooling designs, any pool constructed for testing is independent of the previous test outcomes, while in adaptive pooling designs, some constructed pools might depend on the previous test outcomes. A k-stage adaptive pooling design is comprised of pool construction and testing in k stages, where the pools constructed for (non-adaptive) testing in the kth stage depend on the outcomes in the previous stages. While adaptive group testing requires fewer tests than non-adaptive group testing, the latter inherently supports parallel testing of multiple pools. Thus, non-adaptive group testing is more economical (because it allows for automation) and also saves time (because the pools can be prepared all at once), both of which are of concern in library screening applications [8]. The 1-inhibitor model has been extensively studied, and several adaptive and non-adaptive pooling designs for classification of the inhibitors and the defectives are known (see [9]–[12]). A detailed survey of known non-adaptive and adaptive pooling designs for the 1-inhibitor model is given in [13]. The best (in terms of number of tests) known non-adaptive pooling design that guarantees high probability classification of the inhibitors and defectives is proposed in [13]. The non-adaptive pooling design proposed in [13] requires $O(d\log n)$ tests in the $r = O(d)$ regime and $O\left(\frac{r^2}{d}\log n\right)$ tests in the $d = o(r)$ regime to guarantee classification of both the inhibitors and defectives with high probability (see Footnote 1). In the small inhibitor regime, i.e., $r = O(d)$, the upper bound on the number of tests matches the lower bound, while in the large inhibitor regime, i.e., $d = o(r)$, the upper bound exceeds the lower bound of $O\left(\frac{r^2}{d\log\frac{r}{d}}\log n\right)$ by a $\log\frac{r}{d}$ multiplicative factor. Nonetheless, the 1-inhibitor model constrains that every inhibitor must inhibit every defective, which is likely to be a tight requirement in practice. So, the IDG model is a more practical variant of the 1-inhibitor model. A formal presentation of the IDG model and the goals of this paper are given in the next section.

Footnote 1: The number of inhibitors, defectives and normal items are denoted by $r$, $d$, and $n-d-r$ respectively.

Notations: The Bernoulli distribution with parameter $p$ is denoted by $\mathcal{B}(p)$, where $p$ denotes the probability of the Bernoulli random variable taking a value of one. The set of binary numbers is denoted by $\mathbb{B}$. Matrices are indicated by boldface uppercase letters and vectors by boldface lowercase letters. The row-$i$, column-$j$ entry of a matrix $\mathbf{M}$ is denoted by $\mathbf{M}(i,j)$, and the coordinate-$i$ of a vector $\mathbf{y}$ is denoted by $\mathbf{y}(i)$. All the logarithms in this paper are taken to the base two. The probability of an event $E$ is denoted by $\Pr\{E\}$. The notation $f(n) \approx g(n)$ represents approximation of a function $f(n)$ by $g(n)$. Mathematically, the approximation denotes that for every $\epsilon > 0$, there exists $n_0$ such that for all $n > n_0$, $1-\epsilon < \frac{|f(n)|}{|g(n)|} < 1+\epsilon$.

II. THE IDG MODEL

Consider a set of items $W$ indexed as $w_1, \cdots, w_n$ comprised of $r$ inhibitors, $d$ defectives, and $n-r-d$ normal items. It is assumed throughout the paper that $r, d = o(n)$.

Definition 1: An item pair $(w_i, w_j)$, for $i \neq j$, is said to be associated when the inhibitor $w_i$ inhibits the expression of the defective $w_j$. An item pair $(w_i, w_j)$, for $i \neq j$, is said to be non-associated if either the inhibitor $w_i$ does not inhibit the expression of the defective $w_j$ or if $w_i$ is not an inhibitor or if $w_j$ is not a defective.

In general, the mention of an item pair $(w_i, w_j)$ need not mean that $w_i$ is an inhibitor and $w_j$ is a defective. This is understood from the context.

Definition 2: An association graph is a left to right directed bipartite graph $\mathbf{B} = (I, D, E)$, where the set of vertices (on the left hand side) $I = \{w_{i_1}, w_{i_2}, \cdots, w_{i_r}\} \subset W$ denotes the set of inhibitors, the set of vertices (on the right hand side) $D = \{w_{j_1}, w_{j_2}, \cdots, w_{j_d}\} \subset W$ denotes the set of defectives, and $E$ is a collection of directed edges from $I$ to $D$. A directed edge $e = (w_i, w_k) \in E$, for $i \in \{i_1, \cdots, i_r\}$, $k \in \{j_1, \cdots, j_d\}$, denotes that the inhibitor $w_i$ inhibits the expression of the defective $w_k$.

We refer to $E(I,D)$ conditioned on the sets $(I,D)$ to be the association pattern on $(I,D)$.

A pooling design is denoted by a test matrix $\mathbf{M} \in \mathbb{B}^{T\times n}$, where the $j$th item appears in the $i$th test iff $\mathbf{M}(i,j) = 1$. A test outcome is positive iff the test contains at least one defective without any of its associated inhibitors. A positive outcome is denoted by one and a negative outcome by zero.

It is assumed throughout the paper that the defectives are not mutually obscuring, i.e., a defective does not function as an inhibitor for some other defective. In other words, the set of inhibitors $I$ and the set of defectives $D$ are disjoint.

The goal of this paper is to identify the association graph, or in informal terms, learn the IDG. Thus, the objectives are two-fold as represented by Fig. 3.
1) Identify all the defectives.
2) Identify all the inhibitors and also their association pattern with the defectives.

Fig. 3: Here, the presence of a directed arrow represents an association between an inhibitor and a defective. The problem statement is to identify the set of inhibitors $I$, defectives $D$ and the association pattern $E(I,D)$.

This problem is further mathematically formulated as follows. Denote the actual set of inhibitors, normal items, and defectives by $I$, $N$, and $D$ respectively so that $I \cup N \cup D = W$. The actual association pattern between the actual inhibitor and defective sets is represented by $E(I,D)$. Let $\hat{I}$, $\hat{N}$, $\hat{D}$, and $\hat{E}(\hat{I},\hat{D})$ denote the declared set of inhibitors, normal items, defectives, and declared association pattern between $(\hat{I},\hat{D})$ respectively. The target is to meet the following error metric.

$$\max_{I,\,D,\,E(I,D)} \Pr\left\{\left(\hat{I}, \hat{D}, \hat{E}(\hat{I},\hat{D})\right) \neq \left(I, D, E(I,D)\right)\right\} \leq c\,n^{-\delta}, \qquad (1)$$

for some constants $c, \delta > 0$. We propose pooling designs and decoding algorithms, and lower bounds on the number of tests required to satisfy the above error metric. It is assumed that the defective and the inhibitor sets are distributed uniformly across the items, i.e., the probability that any given set of $r+d$ items constitutes all the defectives and inhibitors is given by $\frac{1}{\binom{n}{d}\binom{n-d}{r}}$. It is also assumed that the association pattern $E(I,D)$ is uniformly distributed over all possible association patterns on $(I,D)$.

We consider two variants of the IDG model. The first is the case where the maximum number of inhibitors that can inhibit any defective, given by $I_{max}$, is known. We refer to this model as the IDG with side information (IDG-WSI) model. For example, Fig. 2 represents a case where $I_{max} = 1$. While it is known that $I_{max} = 1$, it is unknown which among the items $w_1, \cdots, w_n$ represent which inhibitors and defectives. For a given value of $(r,d)$, not all positive integer values of $I_{max} \leq r$ might be feasible. For instance, if $(r,d) = (3,2)$, then $I_{max} = 1$ is not feasible because, by definition, each inhibitor is associated with at least one defective. So, in the IDG-WSI model, we assume that the given value of $I_{max}$ is feasible for the $(r,d)$ tuple. In particular, if $(c-1)d < r \leq cd$ for some integer $c \geq 1$, then $I_{max} \geq c$. This immediately follows from the fact that each inhibitor must be associated with at least one defective. The other variant of the IDG model we consider in this paper is the case where there is no side information about the inhibitor-defective associations, which means that each defective can be inhibited by as many as $r$ inhibitors. We refer to this model as the IDG-No Side Information (IDG-NSI) model. For both the models, the goals (as stated in the beginning of this section) are the same.

The contributions of this paper for the IDG models are summarized below.
• The sample complexity of the number of tests sufficient to recover the association graph while satisfying the error metric (1) using the proposed
  – non-adaptive pooling design is given by $T_{NA} = O\left((r+d)^2\log n\right)$ and $T_{NA} = O\left((I_{max}+d)^2\log n\right)$ tests for the IDG-NSI and IDG-WSI models respectively (Theorem 1, Section III).
  – two-stage adaptive pooling design is given by $T_A = O(rd\log n)$ and $T_A = O(I_{max}d\log n)$ tests for the IDG-NSI and IDG-WSI models respectively (Theorem 2, Section III).
• In Section IV (Theorem 4 and Theorem 5), lower bounds of
  $$\max\left\{\Omega((r+d)\log n + rd),\ \Omega\left(\frac{r^2}{\log r}\log n\right),\ \Omega(d^2)\right\},$$
  $$\max\left\{\Omega((r+d)\log n + I_{max}d),\ \Omega\left(\frac{I_{max}^2}{\log I_{max}}\log n\right),\ \Omega(d^2)\right\}$$
  are obtained for non-adaptive pooling designs for the IDG-NSI and IDG-WSI models respectively. The first lower bounds for both the models are valid for adaptive pooling designs also. The third lower bound for the IDG-WSI model is valid under some mild restrictions on $I_{max}$ and $r$, the details of which are given in Theorem 5.

The pooling design matrix $\mathbf{M}$ constructed in this paper uses carefully chosen "random matrices", i.e., the entries of the matrices are chosen independently from a suitable Bernoulli distribution. Such matrices are known to permit ease of analysis [14]. Notwithstanding the simplicity of the pooling design construction, figuring out a good decoding algorithm with a reasonable computational complexity and good lower bounds, especially for non-adaptive pooling designs, is a challenging task. The goodness of the pooling design, decoding algorithm tuple and the proposed lower bounds is measured in terms of the closeness of the upper bounds to the lower bounds on the number of tests. For non-adaptive pooling designs, this can be observed from Table I. For the proposed adaptive pooling design, the upper bound exceeds the lower bound by at most a $\log n$ multiplicative factor for both IDG-NSI and IDG-WSI models. Also, the proposed decoding algorithms have a computational complexity of $O(nT_{NA})$ and $O(nT_A)$ time units for the non-adaptive and adaptive pooling designs, respectively. This intuitively means that an item is "processed" at most a constant number of times per test.

Extension of the results on the upper and lower bounds on the number of tests to the case where only upper bounds on the number of inhibitors (given by $R$) and defectives (given by $D$) are known instead of their exact numbers is straightforward. The target error metric in (1) is re-formulated as a maximum error probability criterion over all combinations of number of inhibitors and defectives. The results for this case follow by replacing $r$ by $R$ and $d$ by $D$ in the upper and lower bounds on the number of tests.

There are various generalizations of the 1-inhibitor model considered in the literature. These models are summarized in the following sub-section to show that the model considered in this paper, to the best of our knowledge, has not been studied in the literature.

A. Prior Works

The 1-inhibitor model can be generalized in various directions, mostly influenced by generalizations of the classical group testing model. The various generalizations are listed below and briefly described. Though none of these generalizations include the model studied in this paper, it is worthwhile to understand the differences between these models and the IDG model.

A generalization of the 1-inhibitor model, namely the k-inhibitor model, was introduced in [15]. In the k-inhibitor model, an outcome is positive iff a test contains at least one defective and no more than $k-1$ inhibitors. So, the number of inhibitors must be no less than a certain threshold $k$ to cancel the effect of any defective. This model is different from the model introduced in this paper because, in the IDG model, a single associated inhibitor is enough to cancel the effect of a defective. Further, none of the inhibitors might be able to cancel the effect of a defective because the defective might not be associated with any inhibitor. A model loosely related to the 1-inhibitor model, namely the mutually obscuring defectives model, was introduced in [16]. Here, it was assumed that multiple defectives could cancel the effect of each other, and hence the outcome of a test containing multiple defectives could be negative. Thus, a defective can also function as an inhibitor. However, in this paper, the sets of defectives and inhibitors are assumed to be disjoint. The threshold (classical) group testing model is where a test outcome is positive if the test contains at least $u$ defectives, negative if it contains no more than $l$ defectives and arbitrarily positive or negative otherwise [17]. This model was combined with the k-inhibitor model, and non-adaptive pooling designs for the resulting model were proposed in [18].

A non-adaptive pooling design for the general inhibitor model was proposed in [19]. Here, the goal was to identify all the defectives with no prior assumption on the cancellation effect of the inhibitors on the defectives, i.e., the underlying unknown inhibitor model could be a 1-inhibitor model, a k-inhibitor model, or even the IDG model introduced in this paper. However, the difference from our work is that we aim to identify the association graph or, in other words, the cancellation effect of the inhibitors, apart from identification of the defectives. But this cancellation effect does not include the k-inhibitor model cancellation effect as noted earlier. Group testing on the complex model was introduced in [20]. In the complex model, a test outcome is positive iff the test contains at least one of the defective sets. So, here the notion of defective items is generalized to sets of defective items called defective sets. This complex model was combined with the general inhibitor model, and non-adaptive pooling designs for identification of defectives were proposed in [21]. Our work is different from [21] for the same reasons as stated for [19]. Group testing on bipartite graphs was proposed in [22] as a special case of the complex model. Here, the left hand side of the bipartite graph represents the bait proteins and the right hand side represents the prey proteins. It is known a priori which items are baits and which ones are preys. The edges in the bipartite graph represent associations between the baits and preys. A test outcome is positive iff the test contains associated items, and the goal was to identify these associations. Clearly, this model is different from the IDG because, in the IDG model, there are three types of items involved and the interactions between the three types of items are different from that in [22].

In the next section, we propose a probabilistic non-adaptive and a probabilistic two-stage adaptive pooling design and decoding algorithms for both the variants of the IDG model discussed in this section.

TABLE I: Necessary and sufficient number of tests for various regimes of the number of inhibitors, defectives, and $I_{max}$ are given. In the large inhibitor regime, i.e., $d = O(r)$ for the IDG-NSI model and $d = O(I_{max})$ for the IDG-WSI model, the upper bounds exceed the lower bounds by multiplicative factors of $\log r$ and $\log I_{max}$ for the IDG-NSI and IDG-WSI models respectively. In the small inhibitor regime, i.e., $r = o(d)$ for the IDG-NSI model and $I_{max} = o(d)$ for the IDG-WSI model, the upper bounds exceed the lower bounds by multiplicative factors of $\log n$ for both IDG-NSI and IDG-WSI models.

Model | Large inhibitor regime ($d = O(r)$, $d = O(I_{max})$) | Small inhibitor regime ($r = o(d)$, $I_{max} = o(d)$)
IDG-NSI | Upper bound: $O(r^2\log n)$; Lower bound: $\Omega\left(\frac{r^2}{\log r}\log n\right)$ | Upper bound: $O(d^2\log n)$; Lower bound: $\Omega(d^2)$
IDG-WSI | Upper bound: $O(I_{max}^2\log n)$; Lower bound: $\Omega\left(\frac{I_{max}^2}{\log I_{max}}\log n\right)$ | Upper bound: $O(d^2\log n)$; Lower bound: $\Omega(d^2)$

III. POOLING DESIGNS AND DECODING ALGORITHM

In this section, we propose a non-adaptive pooling design and decoding algorithm as well as a two-stage adaptive pooling design and decoding algorithm for the IDG-WSI Model. The pooling designs and decoding algorithms for the IDG-NSI model follow from those for the IDG-WSI Model by replacing $I_{max}$ by $r$.

Non-adaptive pooling design: The pools are generated from the matrix $\mathbf{M}_{NA} \in \mathbb{B}^{T_{NA}\times n}$. The entries of $\mathbf{M}_{NA}$ are i.i.d. as $\mathcal{B}(p_1)$. Test the pools denoted by the rows of $\mathbf{M}_{NA}$. Let the outcome vector be given by $\mathbf{y} \in \mathbb{B}^{T_{NA}\times 1}$.
The exact value of $T_{NA}$ is specified in (11) and (12) (where $T_{NA} = \beta_{NA}\log n$) in Sub-section III-A, and its scaling is given in Theorem 1 (which appears before the beginning of Sub-section III-A). The exact value of $p_1$ is also given in Theorem 1.

Adaptive pooling design: A set of pools is generated from the matrix $\mathbf{M}_1 \in \mathbb{B}^{T_1\times n}$ whose entries are i.i.d. as $\mathcal{B}(p_1)$. The pools denoted by the rows of $\mathbf{M}_1$ are tested first and all the defectives are classified from the outcome vector $\mathbf{y}_1 \in \mathbb{B}^{T_1\times 1}$. Denote the number of items declared defectives by $\hat{d}$ and the set of declared defectives by $\{\hat{u}_1, \hat{u}_2, \cdots, \hat{u}_{\hat{d}}\}$. If $\hat{d} \neq d$, an error is declared. We keep these declared defectives aside and generate another pooling matrix $\mathbf{M}_2 \in \mathbb{B}^{T_2\times(n-d)}$, whose entries are i.i.d. as $\mathcal{B}(p_2)$, for the rest of the items. Now, test the pools denoted by the rows of the matrix $\mathbf{M}_2$ along with each of the items declared defectives; the outcomes are denoted by $\mathbf{y}_{\hat{u}_1}, \mathbf{y}_{\hat{u}_2}, \cdots, \mathbf{y}_{\hat{u}_d} \in \mathbb{B}^{T_2\times 1}$. The two stages of testing are done non-adaptively as represented in Fig. 4, and hence the pooling scheme is a two-stage adaptive pooling design. The exact values of $p_1$ and $p_2$ are given in Theorem 2 (which appears before the beginning of Sub-section III-A). The scaling of $T_1$ and $T_2$ are also given in Theorem 2 and their exact values are given in (11) and (13) (where $T_i = \beta_i\log n$). The total number of tests is given by $T_1 + dT_2$.

Fig. 4: The proposed two-stage adaptive pooling design scheme is demonstrated here. The symbol $\bigoplus$ indicates that the pooling matrix $\mathbf{M}_2$ is tested along with the items $\hat{u}_i$ which are declared defectives. The items non-associated with $\hat{u}_i$ are determined from the outcome vector $\mathbf{y}_{\hat{u}_i}$, for $i = 1, 2, \cdots, d$.

The defectives are expected to participate in a higher fraction of positive outcome tests than the normal items or the inhibitors. And, once the defectives are identified, tests of each one of them with the rest of the items can be used to determine their associations. We show that this can be done non-adaptively as well. The decoding algorithm proceeds in two steps for both non-adaptive and adaptive pooling design. The first step will identify the defectives from the outcome vectors $\mathbf{y}$ and $\mathbf{y}_1$ in the non-adaptive and adaptive pooling designs respectively, according to the fraction of positive outcome tests in which an item participates. The second step will identify the inhibitors and their associations with the declared defectives using subsets of the outcome vector $\mathbf{y}$ in the non-adaptive pooling design and the outcome vectors $\mathbf{y}_{\hat{u}_1}, \mathbf{y}_{\hat{u}_2}, \cdots, \mathbf{y}_{\hat{u}_d}$ in the adaptive pooling design.

Let us define the following notations (see Footnote 2) with respect to the pools represented by $\mathbf{M}_{NA}$ and $\mathbf{M}_1$ which are eventually useful in characterizing the statistics of the different types of items that are used in the decoding algorithm.

Footnote 2: From here on, we reserve the notation $u$ to represent a defective, $v$ to represent a normal item and $s$ to represent an inhibitor.

Notations:
• $I(u)$ denotes the set of inhibitors that the defective $u$ is associated with.
• $F_{u_k}$ denotes the event that none of the inhibitors associated with a defective $u_k$ appears in a test, given that the defective $u_k$ appears in the test.
• $D_i^{(j)} \subseteq P(\{u_1, \cdots, u_d\})$ denotes the $j$th-set in the (arbitrarily) ordered set of all $i$-tuple subsets of the defective set denoted by $D_i$, for $j = 1, \cdots, \binom{d}{i}$, where $u_i$ denotes a defective and $P(\{u_1, \cdots, u_d\})$ denotes the power set of the set of defectives.
• $D(s)$ denotes the defectives associated with the inhibitor $s$ and its complement is given by $\bar{D}(s) = D - D(s)$.
• $\bar{D}(s)_i$ denotes the (arbitrarily) ordered set of all $i$-tuple subsets of the defective set $\bar{D}(s)$ and the $j$th-set in $\bar{D}(s)_i$ is denoted by $\bar{D}(s)_i^{(j)}$.

Example 2: Realizations of the above notations for the association graph in Fig. 2 considered in Example 1 are given below. The inhibitor set is given by $I = \{s_1, \cdots, s_r\} \subset W$ and the defective set is given by $D = \{u_1, \cdots, u_d\} \subset W$ with $r = d$. An inhibitor $s_i$ is associated with a distinct defective $u_i$, and so
• $I(u)$ for $u = u_i$ is given by $I(u_i) = \{s_i\}$.
• $F_{u_1}$ represents the event that the inhibitor $s_1$, associated with the defective $u_1$, does not appear in a test, given that the defective $u_1$ appears in the test.
• Realizations of $D_i$ for $i = 1, 2$ are given by
  $D_1 = \{\{u_1\}, \{u_2\}, \cdots, \{u_d\}\}$,
  $D_2 = \{\{u_1,u_2\}, \{u_1,u_3\}, \cdots, \{u_1,u_d\}, \{u_2,u_3\}, \cdots, \{u_2,u_d\}, \cdots, \{u_{d-1},u_d\}\}$.
  Realizations of $D_i^{(j)}$ for $(i,j) = (1,2)$ and $(i,j) = (2,3)$ are given by
  $D_1^{(2)} = \{u_2\}$, $D_2^{(3)} = \{u_1, u_4\}$.
• $D(s)$ for $s = s_1$ is given by $D(s_1) = \{u_1\}$ and its complement is given by $\bar{D}(s_1) = \{u_2, \cdots, u_d\}$.
• Realizations of $\bar{D}(s)_i$ for $s = s_1$ and $i = 1, 2$ are given by
  $\bar{D}(s_1)_1 = \{\{u_2\}, \cdots, \{u_d\}\}$,
  $\bar{D}(s_1)_2 = \{\{u_2,u_3\}, \{u_2,u_4\}, \cdots, \{u_2,u_d\}, \{u_3,u_4\}, \cdots, \{u_3,u_d\}, \cdots, \{u_{d-1},u_d\}\}$.
  Realizations of $\bar{D}(s)_i^{(j)}$ with $s = s_1$, for $(i,j) = (1,2)$ and $(i,j) = (2,3)$ are given by
  $\bar{D}(s)_1^{(2)} = \{u_3\}$, $\bar{D}(s)_2^{(3)} = \{u_2, u_5\}$.

We now define the following statistics corresponding to the different types of items. The following statistics also hold good when $\mathbf{y}_1$ is replaced by $\mathbf{y}$, as entries of both $\mathbf{M}_{NA}$ and $\mathbf{M}_1$ have the same statistics.

$$q_1^{(u)} \triangleq \Pr\left\{\mathbf{y}_1(l) = 1 \mid \text{defective } u \text{ is present in the } l\text{th test}\right\} \geq (1-p_1)^{|I(u)|} \geq (1-p_1)^{I_{max}}, \qquad (2)$$

$$q_2^{(v)} \triangleq \Pr\left\{\mathbf{y}_1(l) = 1 \mid \text{normal item } v \text{ is present in the } l\text{th test}\right\}$$
$$= \sum_{i=1}^{d} p_1^i (1-p_1)^{d-i} \sum_{j=1}^{\binom{d}{i}} \Pr\left\{\bigcup_{u_k \in D_i^{(j)}} F_{u_k}\right\} \triangleq q_2 \qquad (3)$$
$$\leq \sum_{i=1}^{d} p_1^i (1-p_1)^{d-i} \binom{d}{i} = 1-(1-p_1)^d \triangleq q_2^{UB}, \qquad (4)$$

$$q_3^{(s)} \triangleq \Pr\left\{\mathbf{y}_1(l) = 1 \mid \text{inhibitor } s \text{ is present in the } l\text{th test}\right\}$$
$$= \sum_{i=1}^{|\bar{D}(s)|} p_1^i (1-p_1)^{|\bar{D}(s)|-i} \sum_{j=1}^{|\bar{D}(s)_i|} \Pr\left\{\bigcup_{u_k \in \bar{D}(s)_i^{(j)}} F_{u_k}\right\}, \quad \text{if } |\bar{D}(s)| \geq 1, \qquad (5)$$
$$= 0, \quad \text{otherwise.}$$

Since the outer and inner summations in (5) are over a subset of those in (3), $\max_s q_3^{(s)} \leq q_2^{(v)} = q_2$. It is also intuitive that a positive outcome for an inhibitor in a test is less probable than that for a normal item. The equality in (3) follows from the fact that a test outcome is positive iff at least one defective appears in the test (which is captured by the outer summation term) and none of the inhibitors associated with at least one of these defectives appears in the test (which is captured by the union of the events $F_{u_k}$ over $u_k$). A similar explanation holds true for (5). The upper bound in (4) follows from the upper bound of one on the probability terms of (3). In hindsight, the lower bound in (2) and the upper bound in (4) can be easily obtained as follows. The lower bound on the positive outcome statistics for a defective item in (2) follows from the worst case statistics when all the inhibitors inhibit the expression of every defective. The upper bound on the statistics for a normal item in (4) follows by using the best case positive outcome statistics, in the absence of inhibitors, where the appearance of any defective gives a positive test outcome. In the sequel, we shall exploit the difference between (2) and (4) to identify the defectives, notwithstanding the fact that one of them could be a loose bound for specific association graphs. For example, (2) is tight for the 1-inhibitor model whereas (4) could be a loose upper bound for the same association graph, depending on the values of $p$, $r$, and $d$. However, fortunately, $p_1$ can be chosen appropriately so that the looseness in the bounds does not affect the scaling of the upper bound on the number of tests required to identify the defectives, and the dominant scaling is determined by the number of tests required to identify the association pattern.

Denote the worst case negative outcome statistic for a defective by

$$b_{max} = 1-(1-p_1)^{I_{max}}. \qquad (6)$$

Denote the set of tests corresponding to outcome vector $\mathbf{y}$ in which an item $w_j$ participates by $T_{w_j}(\mathbf{y})$ and the set of positive outcome tests in which the item $w_j$ participates by $S_{w_j}(\mathbf{y})$, for $j = 1, 2, \cdots, n$. The decoding algorithm is given as follows.

1) Step 1 (Identifying the defectives for both non-adaptive and adaptive pooling designs): For the non-adaptive pooling design, if $|S_{w_j}(\mathbf{y})| > |T_{w_j}(\mathbf{y})|\,[1 - b_{max}(1+\tau)]$ with $b_{max}$ as defined in (6), declare the item $w_j$ to be a defective. For the adaptive pooling design, we use the same criterion, replacing $\mathbf{y}$ by $\mathbf{y}_1$. Denote the number of items declared as defectives by $\hat{d}$ and the set of declared defectives by $\{\hat{u}_1, \hat{u}_2, \cdots, \hat{u}_{\hat{d}}\}$. If $\hat{d} \neq d$, declare an error. Denote the remaining unclassified items in the population by $\{w'_1, \cdots, w'_{n-d}\} \triangleq \{w_1, \cdots, w_n\} - \{\hat{u}_1, \cdots, \hat{u}_d\}$.

2) Step 2 (Identifying the inhibitors and their associations for non-adaptive pooling design): Let $P_k$ denote the set of pools in $\mathbf{M}_{NA}$ that contain only the declared defective $\hat{u}_k$ and none of the other declared defectives, for $k = 1, \cdots, d$. Also, let the outcomes corresponding to these pools be positive. This means that the pools in $P_k$ do not contain any inhibitor from the set $I(\hat{u}_k)$, which denotes the set of inhibitors associated with the item $\hat{u}_k$ if $\hat{u}_k$ is indeed a defective. Now, consider only the outcomes corresponding to these pools, denoted by $\mathbf{y}_{P_1} \subset \mathbf{y}, \cdots, \mathbf{y}_{P_d} \subset \mathbf{y}$. The associations of the declared defectives are identified as follows.
   • For each $k = 1$ to $d$, declare $(w'_j, \hat{u}_k)$ to be a non-associated inhibitor-defective pair if $w'_j$ participates in at least one of the tests corresponding to the outcome vector $\mathbf{y}_{P_k}$ and declare the rest of the items to be associated with $\hat{u}_k$.
   The items declared as non-associated for all $k$ are declared to be normal items. If $P_k = \emptyset$ for some $k$, declare an error.

3) Step 2 (Identifying the inhibitors and their associations for adaptive pooling design): Let $S(\mathbf{y}_{\hat{u}_k})$ denote the set of positive outcome tests corresponding to $\mathbf{y}_{\hat{u}_k}$, i.e., these pools do not contain any inhibitor from the set $I(\hat{u}_k)$ if $\hat{u}_k$ is a defective.
   • For each $k = 1$ to $d$, declare $(w'_j, \hat{u}_k)$ to be a non-associated inhibitor-defective pair if $w'_j$ participates in at least one of the tests in the set $S(\mathbf{y}_{\hat{u}_k})$ and declare the rest of the items to be associated with $\hat{u}_k$.
   The items declared as non-associated for all $k$ are declared to be normal items. If $S(\mathbf{y}_{\hat{u}_k}) = \emptyset$ for some $k$, declare an error.

The following toy example demonstrates the operation of the above decoding algorithm for non-adaptive pooling design.

Example 3: Consider the following non-adaptive pooling design matrix $\mathbf{M}_{NA} \in \mathbb{B}^{5\times 5}$ and the outcome vector $\mathbf{y} \in \mathbb{B}^{5\times 1}$ for the underlying association graph shown in Fig. 5. The item $w_5$ is a normal item. Here, $r = d = 2$, $n = 5$, $T_{NA} = 5$.

Fig. 5: The underlying association graph for Example 3.

$$\mathbf{M}_{NA} = \begin{bmatrix} 1 & 1 & 0 & 0 & 1 \\ 0 & 1 & 0 & 1 & 0 \\ 0 & 1 & 1 & 0 & 1 \\ 1 & 0 & 0 & 1 & 1 \\ 0 & 0 & 1 & 1 & 1 \end{bmatrix} \Rightarrow \mathbf{y} = \begin{bmatrix} 0 \\ 1 \\ 1 \\ 1 \\ 0 \end{bmatrix}$$

We recall that column-$j$ of the matrix $\mathbf{M}_{NA}$ corresponds to the item $w_j$. The threshold for identifying the defectives in Step 1 of the decoding algorithm is such that any item $w_j$ that satisfies the condition $\frac{|S_{w_j}(\mathbf{y})|}{|T_{w_j}(\mathbf{y})|} > \frac{1}{2}$ is declared to be a defective. Now, observe the operation of the decoding algorithm.

Step 1: We observe that

$$\frac{|S_{w_1}(\mathbf{y})|}{|T_{w_1}(\mathbf{y})|} = \frac{1}{2},\quad \frac{|S_{w_2}(\mathbf{y})|}{|T_{w_2}(\mathbf{y})|} = \frac{2}{3},\quad \frac{|S_{w_3}(\mathbf{y})|}{|T_{w_3}(\mathbf{y})|} = \frac{1}{2},\quad \frac{|S_{w_4}(\mathbf{y})|}{|T_{w_4}(\mathbf{y})|} = \frac{2}{3},\quad \frac{|S_{w_5}(\mathbf{y})|}{|T_{w_5}(\mathbf{y})|} = \frac{1}{2}.$$

Items $w_2$ and $w_4$ are the only items that satisfy the condition $\frac{|S_{w_j}(\mathbf{y})|}{|T_{w_j}(\mathbf{y})|} > \frac{1}{2}$, and hence are declared defectives. Therefore, the declared defectives are given by $\hat{u}_1 = w_2$, $\hat{u}_2 = w_4$ and the remaining unclassified items are given by $w'_1 = w_1$, $w'_2 = w_3$, $w'_3 = w_5$.

Step 2: The "useful" pools used for identifying the "non-associations" are obtained as $P_1 = \{3\}$, $P_2 = \{4\}$. This is because the third test outcome, in which $\hat{u}_1$ participates and $\hat{u}_2$ does not participate, is positive, and the fourth test outcome, in which $\hat{u}_2$ participates and $\hat{u}_1$ does not participate, is also positive. Since the items $w'_2$ and $w'_3$ participate in the third test, $(w'_2, \hat{u}_1) = (w_3, w_2)$ and $(w'_3, \hat{u}_1) = (w_5, w_2)$ are declared to be non-associated inhibitor-defective pairs and $(w'_1, \hat{u}_1) = (w_1, w_2)$ is declared to be an associated inhibitor-defective pair. Similarly, $(w'_1, \hat{u}_2) = (w_1, w_4)$ and $(w'_3, \hat{u}_2) = (w_5, w_4)$ are declared to be non-associated item-pairs and $(w'_2, \hat{u}_2) = (w_3, w_4)$ is declared to be an associated inhibitor-defective pair. Since the item $w'_3 = w_5$ is declared to be non-associated with both $\hat{u}_1$ and $\hat{u}_2$, it is declared to be a normal item.

We emphasize that this is a toy example to demonstrate the operation of the proposed decoding algorithm and is not representative of the values of $p$ or $\tau$ or $T_{NA}$ for the given values of $r$, $d$, $n$.

Remark 1: (Step 1) The first step in the decoding algorithm, which is the same for both the non-adaptive and adaptive pooling design, is similar to the defective classification algorithm used in [13] for the 1-inhibitor model. The underlying common principle used is that there exists a statistical difference between the defective items and the rest of the items.

$T_A = T_1 + dT_2 = O(I_{max}d\log n)$, where $q_2^{UB}$ and $b_{max}$
Hence, with sufficient number of tests, the defectives are defined in (4) and (6) respectively. can be classified by “matching” the tests in which an item Remark 3: The value of τ = 1−bmax−q2UB chosen in participatesandthepositiveoutcometests.Theitemsinvolved 2bmax the above theorems implies that the decoding algorithm de- in a large fraction of positive outcome tests are declared clares item w to be a defective if |Swj(y)|,|Swj(y1)| > to be defectives. A similar decoding algorithm was used in j |Twj(y)| |Twj(y1)| the classical group testing framework with noisy tests [23]. (1−bmax)+q2UB. This threshold is simply an average between 2 Here, the inhibitors of a defective item, if any, behave like the worse-case positive outcome statistic for a defective and a noise due to probabilistic presence in a test. The (worst the best-case positive outcome statistic for a normal item or case) expected number of positive outcome tests in which a an inhibitor. The values of p and p are chosen so that the 1 defective participates is at least |Twj(y)|[1−bmax]. Like in former is greater than the latter. [13], the Chernoff-Hoeffding concentration inequality [24] is Thefollowingsub-sectionconstitutestheproofoftheabove used to bound the error probability and obtained the exact theorems. The exact number of tests required to guarantee number of tests required to achieve a target (vanishing) error vanishing error probability for recovery of the association probability.Itisimportanttonotethat,apriori,itisnotclear graph are also obtained. The proof is exactly the same for if a fixed threshold technique can sieve the defectives under the IDG-NSI model, but replacing I by r. worstcasepositiveoutcomestatisticsandtherestoftheitems max under best case positive outcome statistics, with vanishing error probability. The fact that this is indeed possible will be proved in the following sub-section. Remark 2: (Step 2) In the IDG model, the inhibitors for A. 
Error Analysis of the Proposed Algorithm each defective might be distinct. Hence, an inhibitor for one defective behaves as a normal item from the perspective of another defective. This defective-specific interaction is absent As mentioned in Section II, we require that in the 1-inhibitor model. So, any inhibitor can be identified using any defective, i.e, an inhibitor’s behaviour is defective- (cid:110) (cid:16) (cid:17)(cid:111) invariantinthe1-inhibitormodel,whichwasexploitediniden- max Pr (I,D,E(I,D))(cid:54)= Iˆ,Dˆ,Eˆ(Iˆ,Dˆ ≤cn−δ, I,D,E(I,D) tifying the inhibitors in [13]. Since each inhibitor’s behaviour canbedefective-specificintheIDGmodel,weneedtoidentify the defectives first and then identify its associated inhibitors forsomeconstantc>0andfixedδ >0.Forthenon-adaptive by observing the interaction of the other items with each of pooling design, we find the number of tests T required NA these defectives. to upper bound the error probability of the first step of the The following theorems state the values of the parameters decoding algorithm by c1n−δ1 and that of the second step p1, p2, and τ, and the scaling of the number of tests required of the decoding algorithm by c2n−δ2, for some constants c1 fortheproposednon-adaptiveandadaptivepoolingdesignsto and c . A similar approach is taken for the two-stage adaptive 2 determine the association graph with high probability. Similar pooling design to find the number of tests T and the value of 1 resultscanbestatedfortheIDG-NSImodelbyreplacingImax T2.Finally,thevaluesofδ1 andδ2 arechosensothatthetotal by r in the following theorems. errorprobabilityisupperboundedbycn−δ,forsomeconstant Theorem 1 (Non-adaptive pooling design): Choose the c and given δ >0. pooling design matrix M of size T ×n with its entries NA NA 1) Error Analysis of the First Step: Since the first step of chosen i.i.d. as B(p ) with p = 1 for the IDG-WSI 1 1 3(Imax+d) the decoding algorithm is the same for both the non-adaptive model. 
Test the pools denoted by the rows of the matrix and adaptive pooling design, the bounds on the number of M non-adaptively. The scaling of the number of tests NA tests obtained below for adaptive pooling design applies for sufficient to guarantee vanishing error probability (1) using the proposed decoding algorithm with τ = 1−bmax−q2UB is the non-adaptive pooling design also. The three possible error given by T = O(cid:0)(I +d)2logn(cid:1), wher2ebmqaUxB and eventsinthefirststepofthedecodingalgorithmforbothnon- NA max 2 adaptive and adaptive pooling design are given by b are defined in (4) and (6) respectively. max Theorem 2 (Adaptive pooling design): Choose the pooling 1) A defective is not declared as one. design matrices M and M of sizes T ×n and T ×n with 2) A normal item is declared as a defective. 1 2 1 2 its entries chosen i.i.d. as i.i.d. B(p ) and B(p ) respectively, 3) An inhibitor is declared as a defective. 1 2 with p = 1 and p = 1 for the IDG-WSI model.1Test t3h(Iempaxo+odls) denoted2 by t2hIemarxows of the matrices Clearly, the defective that has the largest probability of a (cid:16) (cid:17) M non-adaptivelyandclassifythedefectives.Now,testeach negative outcome, given by b = max 1− q(u) , has 1 1max u 1 of the pools from M2 along with the d classified defectives the largest probability of not being declared as a defective. individually. The scaling of the number of tests sufficient to So, with T =β logn, the probability of the first error event 1 1 guarantee vanishing error probability (1) using the proposed for all the defectives can be upper bounded (using the union decoding algorithm with τ = 1−bmax−q2UB is given by bound over all defectives) as 2bmax 9 The term 1−b −q can be lower bounded as follows. max 2 d(cid:88)T1 (cid:32)T1(cid:33)pt(1−p )T1−t (cid:88)t (cid:32)t(cid:33)bv (1−b )t−v 1−bmax−q2 ≥(1−p1)Imax −(1−(1−p1)d) t 1 1 v 1max 1max t=0 v=tbmax(1+τ) ≥1−(Imax+d)p1. 
=d(cid:88)T1 (cid:32)T1(cid:33)pt(1−p )T1−t × The last lower bound above follows from the fact that t=0 t 1 1 (1−p1)Imax ≥(1−Imaxp1) and (1 − p1)d ≥ (1 − dp1). t (cid:32) (cid:33) Optimizing the denominator terms of (10) with respect to p , v=tb1max+t(bm(cid:88)ax−b1max+bmaxτ) vt bv1max(1−b1max)t−v fwoer shuafvfiecipe1nt=ly l3a(rIgmea1xn+dit).suHffiencecse,thuasting r,d = o(n) in (101), (≤a)d(cid:88)T1 (cid:32)Tt1(cid:33)pt1e−2t(bmax−b1max+bmaxτ)2(1−p1)T1−t 27(Imax+d)(cid:16)ln(nl−nnd−r) +δ1(cid:17)ln2 t=0 β ,β ≥ , (11) (=b)d(cid:104)1−p1+p1e−2(bmax−b1max+bmaxτ)2(cid:105)β1logn NA 1 (1−e−2) where T =β logn. (≤c)dexp(cid:110)−β1p1logn(cid:16)1−e−2(bmax−b1max+bmaxτ)2(cid:17)(cid:111)≤n−δ1 2) ErrNoAr AnalNysAis of the Second Step: In the error analysis of the second step, we assume that all the defectives have (⇐d)dexp(cid:8)−β p logn(cid:0)1−e−2(cid:1)(b −b +b τ)2(cid:9) 1 1 max 1max max been correctly declared. Errors due to error propagation from ≤n−δ1 the first step shall be analyzed later. (cid:0)lnd +δ (cid:1)ln2 Non-adaptive pooling design: ⇒β ≥ lnn 1 , 1 p (1−e−2)(b −b +b τ)2 The only error event for the non-adaptive pooling design in 1 max 1max max the second step is that there does not exist a set of pools where (a) follows from Chernoff-Hoeffding bound [24]3, (b) P such that they contain only the defective u and none of followsfrombinomialexpansion,(c)followsfromthefactthat k k 1−c≤e−c,and(d)followsfromthefactthat(cid:16)1−e−2x2(cid:17)≥ itsassociatedinhibitorsI(uk),andallitsnon-associateditems appearinatleastoneofsuchpools.Denotethiserroreventby (cid:0)1−e−2(cid:1)x2, for 0 < x < 1. Using the fact that b ≤ U(u ).Clearly,noneoftheinhibitorsassociatedwithu will 1max k k b , where b is defined in (6), the following bound on be declared as non-associated with u . This follows from the max max k β suffices. definition of the set of pools P and the decoding algorithm. 1 k (cid:0)lnd +δ (cid:1)ln2 Theprobabilityofthefavourableeventthatanon-associated β ≥ lnn 1 . 
(7) item appears along with a defective u , but none of its 1 p (1−e−2)(b τ)2 k 1 max associated inhibitors and none of the other defectives appear Similarly, to guarantee vanishing probability for the second in a pool from M is given by NA errorevent(union-boundedoverallnormalitems)andthethird error event (union-bounded over all inhibitors), it suffices that b(uk) (cid:44)p21(1−p1)|I(uk)|(1−p1)d−1. (cid:16)ln(nl−nnd−r) +δ1(cid:17)ln2 Now, probability of the error event U(uk) is upper bounded β ≥ , by 1 p (1−e−2)(1−b (1+τ)−q )2 1 max 2 β ≥ (cid:0)llnnnr +δ1(cid:1)ln2 , (8) Pr{U(uk)}≤(n−d−|I(uk)|)(cid:16)1−b(uk)(cid:17)TNA 1 (cid:16) (cid:17)2 p1(1−e−2) 1−bmax(1+τ)−q3(s) ≤(n−d−|I(uk)|)e−TNAb(uk) ≤n−δ2, if (cid:16) (cid:17) Sasiynmceptmotsaicxalqly3(s)red≤unqd2anatnfdorrall=vaol(unes),otfheτ.bSoou,nsdubinsti(tu8t)inigs βNA ≥ ln(n−dl−nn|Ib(u(ukk)|)) +δ2 ln2. the upper bound on q2 defined in (4) by q2UB, it suffices that Since (1−p1)|I(uk)| ≥ (1−p1)Imax ≥ (1−Imaxp1) and (cid:16)ln(n−d−r) +δ (cid:17)ln2 (1−p1)d−1 ≥(1−dp1), substituting for p1, it suffices that lnn 1 β1 ≥ p (1−e−2)(cid:0)1−b (1+τ)−qUB(cid:1)2 (9) β ≥ 81(I +d)2(cid:18)ln(n−d) +δ (cid:19)ln2. (12) 1 max 2 NA 4 max lnn 2 Now, the value of τ chosen to optimize the denominators of (7)and(9)isgivenbyτ = 1−bmax−q2(UB).Therefore,wehave Adaptive pooling design: 2bmax Like in non-adaptive pooling design the only error event, β ≥max(cid:40) 4(cid:0)llnnnd +δ1(cid:1)ln2 , (10) denotedbyE(uk),isthatitemswj notassociatedwithuk are 1 p1(1−e−2)(cid:0)1−bmax−q2UB(cid:1)2 declared as associated inhibitors, i.e., the item wk does not 4(cid:16)ln(nl−nnd−r) +δ1(cid:17)ln2 . anpopneeaorfinthaeniynhoifbitthoerspaosssioticvieatoeudtcwoimtheutkestwsilSl(byeukd)e.clCarleeadrlays, p1(1−e−2)(cid:0)1−bmax−q2UB(cid:1)2 non-associated with uk. Let T = β logn. 
The number of tests required to guar- 2 2 antee vanishing error probability for the error event E(u ) is 3Ifthetermbmax(1+τ)>1,thentheprobabilityoftheerroreventunder k considerationisequaltozero.So,itcanbeassumedthatbmax(1+τ)≤1. evaluated as follows. Let wj ∈/ I(uk). Define 10 B. Adaptation for the IDG-NSI Model a(wujk) (cid:44) Pr(cid:8)yuk(l)=1|wj is present in lth-test(cid:9) The only modification required in the pooling design and decodingalgorithmproposedfortheIDG-WSImodeltoadapt ≥(1−p2)|I(uk)| (cid:44)a(uk). it for the IDG-NSI model is that I is replaced by r. For max Now, we have the sake of clarity, we list the only changes below. Pr{E(uk)}≤(n−d−|I(uk)|)(cid:16)1−a(uk)p2(cid:17)T2 ≤n−δ2 1) The1 po,opling=de1s.ign parameters are chosen as p=p1 = (cid:16) (cid:17) 3(r+d) 2 2r ln(n−d−|I(uk)|) +δ ln2 2) In Step 1 of the decoding algorithm the threshold for lnn 2 ⇐ β ≥ . identifying the defectives is chosen as |S (y )| > 2 p2(1−p2)|I(uk)| |T (y )|[1−b (1+τ))],whereb =1−wj(1−1p )r. wj 1 max max 1 Using the fact that (1−p )|I(uk)| ≥ 1 − |I(u )|p , and Intuitively, this worst-case threshold corresponds to a 2 k 2 substituting p = 1 , we have the following bound. scenario where every inhibitor inhibits every defective, 2 2Imax i.e., the 1-inhibitor model. (cid:18) (cid:19) ln(n−d−|I(u )|) β2 ≥4Imax lnn k +δ2 ln2 3) The values of βNA, β1 and β2 are chosen as  (cid:16) (cid:17) ⇐ β2 ≥4Imax(cid:18)ln(lnn−n d) +δ2(cid:19)ln2. (13) βNA ≥max27(r+d) (l1n(−nl−nend−−2r)) +δ1 ln2 ,  3) Analysis of Total Error Probability: Assuming that the (cid:18) (cid:19) (cid:27) 81 ln(n−d) targettotalerrorprobabilityisO(n−δ),thevaluesofδ1 andδ2 4 (r+d)2 lnn +δ2 ln2 , need to be determined. Towards that end, define the following (cid:16) (cid:17) events. 27(r+d) ln(n−d−r) +δ ln2 lnn 1 Eij (cid:44) Event of declaring (wi,wj),i(cid:54)=j, to be an associated β1 ≥ (1−e−2) , pair, (cid:18)ln(n−d) (cid:19) β ≥4r +δ ln2. 
W (cid:44) Event that at least one actual defective has not been 2 lnn 2 declared as a defective. Hence, the total number of tests required for the IDG-NSI model scales as T = O(cid:0)(r+d)2logn(cid:1) for the non- Let E denote the correct association pattern for some realiza- NA adaptive pooling design and T = O(rdlogn) for the two- tion {I,D}. Now, the total probability of error is given by A stage adaptive pooling design.    (cid:91) (cid:91)  (cid:88) Inthenextsection,lowerboundsonthenumberoftestsfor Pr E W ≤ Pr{E }+Pr{W} ij ij non-adaptive and adaptive pooling designs are obtained.   (wi,wj)∈/E (wi,wj)∈/E (cid:88) (cid:88) (cid:88) (cid:88) ≤ Pr{Eij}+ Pr{Eij} (14) IV. LOWERBOUNDSFORNON-ADAPTIVEANDADAPTIVE wi(cid:54)=wjwj∈N∪I wi∈/I(wj)wj∈D POOLINGDESIGN + Pr{W} In this section, two lower bounds on the number of tests <n×2n−δ1 +dn−δ2 +n−δ1. (15) requiredfornon-adaptivepoolingdesignsforsolvingtheIDG- There are two possible ways in which the event E , for NSI and IDG-WSI problems with vanishing error probability ij are obtained. One of the lower bounds is simply obtained (w ,w ) ∈/ E, can occur. One possibility is that the item i j by counting the entropy in the system and this lower bound w has been erroneously declared as a defective in the first j also holds good for adaptive pooling designs. The other lower step of the algorithm, and hence any item w declared to i boundisobtainedusingalowerboundresultforthe1-inhibitor be associated with w is an erroneous association. The first j model which is stated below. We recall that all the inhibitors term in (14) represents this possibility. The other possibility inhibit the expression of every defective in the 1-inhibitor is that w has been correctly identified as a defective, but the j model. item w is erroneously declared to be associated with w . The i j Theorem 3 (Th. 1, [13]): An asymptotic lower bound on second term in (14) represents this possibility. 
The last term the number of tests required for non-adaptive pooling designs accounts for the fact that a defective might be missed out in in order to classify r inhibitors amidst d defectives and the first step of the algorithm. Note that the other two terms n − d normal items in the 1-inhibitor model is given by do not capture this error event. Finally, (15) follows from the (cid:16) (cid:17) error analysis of the first and second steps of the decoding Ω dlro2gr logn , in the d=o(r),r =o(n) regime4. d algorithm.Therefore,ifthetargeterrorprobabilityisO(n−δ), Thesecondlowerboundinthefollowingtheoremdominates then choose δ ,δ =δ+1. in the large inhibitor regime, i.e., the number of inhibitors 1 2 Recall that the number of tests required for non-adaptive is large compared to the number of defectives. It conveys and adaptive pooling designs are given by T =β logn NA NA and T = T + dT = (β + dβ )logn respectively. 4ThoughTheorem1in[13]isstatedfortheclassificationofboththedefec- A 1 2 1 2 tivesandinhibitorsinthe1-inhibitormodel,itisalsovalidforclassification Therefore, from (11), (12), and (13) we have that T = NA of inhibitors alone. This is because the entropy in the system is dominated O(cid:0)(Imax+d)2logn(cid:1) and TA =O(Imaxdlogn). bythenumberofinhibitors,inthelargeinhibitorregime.

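The parameter choices in Theorems 1 and 2 and the sufficient conditions (11), (12), and (13) can be evaluated numerically. The following is a minimal sketch under stated assumptions: the helper name `design_parameters` and the `DesignParameters` container are hypothetical; the closed form $q_2^{UB} = 1 - (1 - p_1)^d$ is inferred from the lower bound on $1 - b_{\max} - q_2$ rather than quoted from (4); and $\log n$ is taken to be base 2, which is consistent with the $\ln 2$ factors in the bounds but is not stated explicitly in this section.

```python
import math
from dataclasses import dataclass


@dataclass
class DesignParameters:
    p1: float
    p2: float
    b_max: float
    q2_ub: float
    tau: float
    threshold: float
    beta_na: float
    beta_1: float
    beta_2: float
    t_na: int
    t_a: int


def design_parameters(n: int, d: int, r: int, i_max: int, delta: float) -> DesignParameters:
    """Evaluate the IDG-WSI design parameters (hypothetical helper)."""
    if n - d - r <= 0 or i_max < 1 or d < 1:
        raise ValueError("need n > d + r, i_max >= 1 and d >= 1")
    delta_1 = delta_2 = delta + 1.0           # choice made in the total error analysis
    p1 = 1.0 / (3.0 * (i_max + d))            # Theorem 1
    p2 = 1.0 / (2.0 * i_max)                  # Theorem 2
    b_max = 1.0 - (1.0 - p1) ** i_max         # equation (6)
    q2_ub = 1.0 - (1.0 - p1) ** d             # ASSUMED closed form of the bound in (4)
    tau = (1.0 - b_max - q2_ub) / (2.0 * b_max)
    threshold = ((1.0 - b_max) + q2_ub) / 2.0  # Remark 3

    ln_n, ln_2 = math.log(n), math.log(2.0)
    bound_11 = (27.0 * (i_max + d) * (math.log(n - d - r) / ln_n + delta_1) * ln_2
                / (1.0 - math.exp(-2.0)))
    bound_12 = 81.0 / 4.0 * (i_max + d) ** 2 * (math.log(n - d) / ln_n + delta_2) * ln_2
    bound_13 = 4.0 * i_max * (math.log(n - d) / ln_n + delta_2) * ln_2

    beta_na = max(bound_11, bound_12)         # both steps of the non-adaptive decoder
    beta_1, beta_2 = bound_11, bound_13       # two stages of the adaptive design
    log2_n = math.log2(n)                     # ASSUMED base-2 logarithm
    t_na = math.ceil(beta_na * log2_n)
    t_a = math.ceil(beta_1 * log2_n) + d * math.ceil(beta_2 * log2_n)
    return DesignParameters(p1, p2, b_max, q2_ub, tau, threshold,
                            beta_na, beta_1, beta_2, t_na, t_a)


params = design_parameters(n=10000, d=3, r=6, i_max=4, delta=1.0)
print(params)
```

The constants here are the sufficient (not necessarily tight) values from the error analysis, so the resulting test counts mainly illustrate how $T_{NA}$ and $T_A$ scale with $I_{\max}$ and $d$ rather than practical test budgets.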