Series on Applied Mathematics
Volume 18

POOLING DESIGNS AND NONADAPTIVE GROUP TESTING
Important Tools for DNA Sequencing

Ding-Zhu Du
Frank K Hwang

World Scientific

SERIES ON APPLIED MATHEMATICS
Editor-in-Chief: Frank Hwang
Associate Editors-in-Chief: Zhong-ci Shi and U Rothblum

Vol. 1 International Conference on Scientific Computation, eds. T. Chan and Z.-C. Shi
Vol. 2 Network Optimization Problems: Algorithms, Applications and Complexity, eds. D.-Z. Du and P. M. Pardalos
Vol. 3 Combinatorial Group Testing and Its Applications, by D.-Z. Du and F. K. Hwang
Vol. 4 Computation of Differential Equations and Dynamical Systems, eds. K. Feng and Z.-C. Shi
Vol. 5 Numerical Mathematics, eds. Z.-C. Shi and T. Ushijima
Vol. 6 Machine Proofs in Geometry, by S.-C. Chou, X.-S. Gao and J.-Z. Zhang
Vol. 7 The Splitting Extrapolation Method, by C. B. Liem, T. Lu and T. M. Shih
Vol. 8 Quaternary Codes, by Z.-X. Wan
Vol. 9 Finite Element Methods for Integrodifferential Equations, by C. M. Chen and T. M. Shih
Vol. 10 Statistical Quality Control: A Loss Minimization Approach, by D. Trietsch
Vol. 11 The Mathematical Theory of Nonblocking Switching Networks, by F. K. Hwang
Vol. 12 Combinatorial Group Testing and Its Applications (2nd Edition), by D.-Z. Du and F. K. Hwang
Vol. 13 Inverse Problems for Electrical Networks, by E. B. Curtis and J. A. Morrow
Vol. 14 Combinatorial and Global Optimization, eds. P. M. Pardalos, A. Migdalas and R. E. Burkard
Vol. 15 The Mathematical Theory of Nonblocking Switching Networks (2nd Edition), by F. K. Hwang
Vol. 16 Ordinary Differential Equations with Applications, by S. B. Hsu
Vol. 17 Block Designs: Analysis, Combinatorics and Applications, by D. Raghavarao and L. V. Padgett
Vol. 18 Pooling Designs and Nonadaptive Group Testing: Important Tools for DNA Sequencing, by D.-Z. Du and F. K. Hwang

Ding-Zhu Du
University of Texas at Dallas, USA and Xi'an Jiaotong University, China

Frank K Hwang
National Chiao Tung University, Taiwan, ROC

World Scientific
NEW JERSEY • LONDON • SINGAPORE • BEIJING • SHANGHAI • HONGKONG • TAIPEI • CHENNAI

Published by World Scientific Publishing Co. Pte. Ltd.
5 Toh Tuck Link, Singapore 596224
USA office: 27 Warren Street, Suite 401-402, Hackensack, NJ 07601
UK office: 57 Shelton Street, Covent Garden, London WC2H 9HE

British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library.

Copyright © 2006 by World Scientific Publishing Co. Pte. Ltd.
All rights reserved. This book, or parts thereof, may not be reproduced in any form or by any means, electronic or mechanical, including photocopying, recording or any information storage and retrieval system now known or to be invented, without written permission from the Publisher.

For photocopying of material in this volume, please pay a copying fee through the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, USA. In this case permission to photocopy is not required from the publisher.

ISBN 981-256-822-0

Printed in Singapore by World Scientific Printers (S) Pte Ltd

Preface

Group testing is well known for its applications in blood testing, chemical leakage testing, electric shortage testing, coding, and multi-access channel communication, among others, all of which are well documented in our book Combinatorial Group Testing and Its Applications (1993, 2nd ed. 2000).
However, its application to molecular biology, especially to the design of screening experiments, is quite recent and still under rapid development (a group testing algorithm for this application is often referred to as a pooling design). For example, our 1993 book mentioned this application on only one page, and the 2000 edition covered the topic in three chapters. Now we have to write a whole book to report its progress, and to make painful choices about what to include.

The new application also brings new problems to the theory of group testing. We mention three fundamental differences.

1. To achieve a given screening objective, the number of biological experiments required is huge, and each such experiment is time-consuming (compared to electronic testing). It is therefore of utmost importance to use parallel, or nonadaptive, testing, in which all experiments can be performed in parallel, or at least in a few rounds. Traditionally, the theory of group testing has focused on sequential testing, because it requires fewer tests. The focus of pooling design has now shifted to the nonadaptive kind. Since a nonadaptive design is just a binary matrix, many mathematical tools can now be brought to bear on group testing.

2. Biological experiments are known for their unreliability, in contrast to electronic testing, which is highly accurate. Error detection and error correction, long ignored in traditional group testing theory, therefore have to be dealt with squarely. A lucky and unexpected outcome is that the newly developed theory of pooling design not only can deal with errors, but does so in a highly structured way: the treatment of errors is not ad hoc, and the treatments in different models share the same features.

3. Attempts have been made before to set up traditional group testing in graph models, but such attempts are superfluous in the sense that only the terminology, and not the substance, is graph-theoretic. In the complex model of pooling design, we now have a genuine extension of group testing to graph testing, arising from a real need in molecular biology. Not only is the theory of group testing enriched; the extension also provides many new, interesting and challenging problems for graph theory.

A one-semester graduate course on the first nine chapters of this book was taught at the National Chiao Tung University in Hsinchu in Spring 2005. Chapters 6 and 7 were taught as a 3-week short course at the National Center of Theoretical Sciences. The second author thanks the institutions for financial support and the students for participation.

We wish to thank Dr. Hong Zhao, Dr. Ying Liu and Dr. My Thai for their help during the preparation of this book.

Contents

Preface

Chapter 1 Introduction
1.1 Group Testing
1.2 Nonadaptive Group Testing
1.3 Applications in Molecular Biology
1.4 Pooling Designs for Two Simple Applications
1.5 Pooling Designs and Mathematics
1.6 An Outline of the Book
References

Chapter 2 Basic Theory on Separating Matrices
2.1 d-Separable and d̄-Separable Matrices
2.2 d-Disjunct Matrices
2.3 The Minimum Number of Pools for Given d and n
2.4 Combinatorial Bounds for d-Disjunct Matrices with Constant Weight
2.5 Asymptotic Lower and Upper Bounds
2.6 (d, r)-Disjunct Matrices
2.7 Error-Tolerance
References

Chapter 3 Deterministic Designs
3.1 t-Designs and t-Packing
3.2 Direct Construction
3.3 Explicit Construction of Selectors
3.4 Grid Designs
3.5 Error-Correcting Code
3.6 Transversal Designs
3.7 The d = 2 Case
References

Chapter 4 Deterministic Designs from Partial Orders
4.1 Subset Containment Designs
4.2 Partial Order of Faces in a Simplicial Complex
4.3 Monotone Graph Properties
4.4 Partial Order of Linear Spaces over a Finite Field
4.5 Atomic Poset
References

Chapter 5 Random Pooling Designs and Probabilistic Analysis
5.1 Introduction to Random Designs
5.2 A General Approach to Compute Probabilities of Unresolved Clones
5.3 Random Incidence Designs
5.4 Random k-Set Designs
5.5 Random r-Size Designs
5.6 Random Distinct k-Set Designs
5.7 Intersection Pooling Designs
5.8 Subset Containment Designs in Extended Use
5.9 Edge-Representative Decoding with r = 2 and d = 3
5.10 Some Trivial 2-Stage Pooling Designs
References

Chapter 6 Pooling Designs on Complexes
6.1 Introduction
6.2 A Construction of (H : d; z)-Disjunct Matrix
6.3 (d, r; z]-Disjunct Matrix
6.4 Constructions for (d, r; z]-Disjunct Matrices
6.5 Random Designs
6.6 Trivial Two-stage Pooling Designs for Complete r-graphs
6.7 Sequential Algorithms for H_r
References

Chapter 7 Contig Sequencing
7.1 Introduction
7.2 Some Probability Analysis of a k-subset
7.3 Sequential Algorithms
7.4 Nonadaptive Algorithms for Matching
7.5 The 3-Stage Procedure
References

Chapter 8 The Inhibitor Model
8.1 Introduction
8.2 1-Round Algorithm
8.3 Sequential and k-Round Algorithms
8.4 Some Other Inhibitor Models
References

Chapter 9 Hyperplane Designs
9.1 Introduction
9.2 m-Dimensional Arrays
9.3 A K_r × K_c Decomposition of K_n
9.4 Efficiency
9.5 Other Transversal Designs
9.6 Two Recent Applications
References

Chapter 10 Non-unique Probe Selection
10.1 Introduction
10.2 Complexity of Pooling Designs
10.3 Complexity of Minimum Pooling Designs
10.4 Approximations of Minimum Pooling Designs
References

Index