Statistics in Molecular Biology and Genetics: Selected Proceedings of a 1997 Joint AMS-IMS-SIAM Summer Conference on Statistics in Molecular Biology (Lecture Notes-Monograph Series)

308 Pages·1999·17.02 MB·English

Institute of Mathematical Statistics
LECTURE NOTES-MONOGRAPH SERIES
Volume 33

Statistics in Molecular Biology and Genetics
Selected Proceedings of a 1997 Joint AMS-IMS-SIAM Summer Conference on Statistics in Molecular Biology

Francoise Seillier-Moiseiwitsch, Editor

Published by the Institute of Mathematical Statistics and the American Mathematical Society
American Mathematical Society, Providence, Rhode Island
Institute of Mathematical Statistics, Hayward, California

Institute of Mathematical Statistics Lecture Notes-Monograph Series
Editorial Board: Andrew A. Barbour, Joseph Newton, and David Ruppert (Editor)

The production of the IMS Lecture Notes-Monograph Series is managed by the IMS Business Office: Julia A. Norton, IMS Treasurer, and Elyse Gustafson, IMS Business Manager.

This volume was co-published with the American Mathematical Society.
Library of Congress Catalog Card Number: 99-076060
International Standard Book Number 0-940600-47-1
Copyright © 1999 Institute of Mathematical Statistics
All rights reserved
Printed in the United States of America

TABLE OF CONTENTS

Preface
F. Seillier-Moiseiwitsch

Genetic Mechanisms

On a Markov Model for Chromatid Interference
H. Zhao and T. Speed

Population Genetics

Some Statistical Aspects of Cytonuclear Disequilibria
S. Datta

Diffusion Process Calculations for Mutant Genes in Nonstationary Populations
R. Fan and K. Lange

The Coalescent with Partial Selfing and Balancing Selection: An Application of Structured Coalescent Processes
M. Nordborg

Human Genetics

Statistical Aspects of the Transmission/Disequilibrium Test (TDT)
W. Ewens

Estimation of Conditional Multilocus Gene Identity among Relatives
E. Thompson and S. Heath

Quantitative Genetics

A Review of Methods for Identifying QTL's in Experimental Crosses
K. Broman and T. Speed

Evolutionary Genetics

Markov Chain Monte Carlo for the Bayesian Analysis of Evolutionary Trees from Aligned Molecular Sequences
M. Newton, B. Mau and B. Larget

Likelihoods on Coalescents: A Monte Carlo Sampling Approach to Inferring Parameters from Population Samples of Molecular Data
J. Felsenstein, M. Kuhner, J. Yamato and P. Beerli

Uses of Statistical Parsimony in HIV Analyses
K. Crandall

Linear Estimators for the Evolution of Transposable Elements
P. Joyce, L. Fox, N. Casavant and H. Wichman

A Conditional Approach to the Detection of Correlated Mutations
M. Karnoub, F. Seillier-Moiseiwitsch and P.K. Sen

Correlated Mutations in Protein Sequences: Phylogenetic and Structural Effects
A. Lapedes, B. Giraud, L. Liu and G. Stormo

Sequence Motifs

Compound Poisson Approximations for Occurrences of Multiple Words
G. Reinert and S. Schbath

Protein Structure

Deriving Interatomic Distance Bounds from Chemical Structure
M. Trosset and G. Phillips

Protein Fold Class Prediction is a New Field for Statistical Classification and Regression
L. Edler and J. Grassmann

PREFACE

The papers in this volume were presented at one of the 1997 Summer Research Conferences in the Mathematical Sciences jointly sponsored by the Institute of Mathematical Statistics, the American Mathematical Society and the Society for Industrial and Applied Mathematics. The theme of the meeting was "Statistics in Molecular Biology and Genetics". That this volume is published jointly by the Institute of Mathematical Statistics and the American Mathematical Society reflects the emerging importance of Statistics in these fields.

These papers fall into broad categories: population genetics, evolutionary genetics, protein structure, genetic mechanisms, quantitative genetics, human genetics and sequence motifs.
While some of these areas have a long history of statistical input (and have motivated some mainstream statistical developments), others are new statistical applications. The talks by Professors D. Botstein, M.-C. King and M. Olson underlined the great need for statistical expertise in cutting-edge biological technology. Their stimulating presentations treated us to wonderfully clear overviews of current directions in important areas of genetic research (namely, physical mapping, genetic mapping and functional genetics).

The manuscripts underwent a rigorous review process: each was scrutinized by two anonymous referees. For their critical reviews, my gratitude goes to: D. Balding, L. Edler, W. Ewens, J. Felsenstein, J. Hein, P. Joyce, A. Kong, M. Kuhner, K. Lange, A. Lapedes, M. Man, P. Marjoram, K. Mengersen, M.-S. McPeek, M. Moehle, D. Nelson, M. Nordborg, I. Painter, A. Pluzhnikov, A. Ramaswami, G. Reinert, S. Sawyer, M. Stephens, E. Thompson and M. Trosset.

I particularly want to thank Professors M. Waterman and P. Donnelly for their help in organizing the conference. Financial support for the meeting was provided by the National Science Foundation and the National Institutes of Health. Professor P. Donnelly deserves a double thank-you for dealing with the papers with which I had a conflict of interest. I extend my thanks to the anonymous referees he selected.

Finally, I am enormously indebted to R. Budrevich for his tireless help in the preparation of this volume.

F. Seillier-Moiseiwitsch
Chapel Hill

Statistics in Molecular Biology
IMS Lecture Notes - Monograph Series (1999) Volume 33

ON A MARKOV MODEL FOR CHROMATID INTERFERENCE

BY HONGYU ZHAO AND TERENCE P. SPEED
Yale University School of Medicine and University of California, Berkeley

Meiotic exchange between homologous chromosomes takes place after the formation of a bundle of four chromatids. Crossovers are precise breakage-and-reunion events.
Random strand involvements (no chromatid interference) and random distribution of crossovers (no chiasma interference) are usually assumed in analyzing genetic data. In this paper, we discuss a Markov model for chromatid interference. A closed-form expression for the probability of any multilocus recombination/tetrad pattern is derived. Both chromatid interference and chiasma interference can be studied together using this model. In particular, we discuss chi-square models for chiasma interference.

1. Introduction. In diploid cells, each chromosome is paired with its homologue during meiosis. Each member of a given homologous pair has two identical sister chromatids, so that each synapsed paired structure consists of four chromatids. Usually one or more crossovers occur among the four chromatids. A crossover is a precise breakage-and-reunion event occurring between two nonsister chromatids.

The types of genetic data considered here are single spore data, in which the products of a single meiosis are recovered separately, and tetrad data, in which all four meiotic products are recovered together. A tetrad consists of four spores, each of which is haploid, encased in a structure called an ascus. In some organisms, such as Neurospora crassa (red bread mold), tetrads consist of four spores in a linear order corresponding to the meiotic divisions; these are called ordered tetrads. In other organisms, such as Saccharomyces cerevisiae (baker's yeast), the four spores are produced as a group without order and are called unordered tetrads. Griffiths et al. (1996) cover the relevant genetic background.

In this paper, genes (markers, loci) are denoted by script letters. For example, we use 𝒜 and ℬ to denote two genes. Alleles of 𝒜 are denoted by A and a, while alleles of ℬ are denoted by B and b. Suppose that 𝒜 and ℬ are on the same chromosome arm, and consider a diploid cell having AB and ab on homologous chromosomes.
There are four possible products at these loci resulting from meiosis of this cell, namely, AB, ab, Ab, and aB. The first two are called parental types or nonrecombinants; the other two types, Ab and aB, are called recombinants. If two markers are recombined by crossovers in a meiotic product, then during meiosis an odd number of crossovers must have occurred between the two markers on the strand carrying them. The proportion of recombinants, r_𝒜ℬ, is called the recombination fraction.

Supported in part by NIH grant HG-01093.
AMS 1991 subject classifications. Primary 60J20; secondary 60P10.
Key words and phrases. Chromatid interference, chiasma interference, Markov model, chi-square model, single spore data, tetrad data.

Because recombination fractions are not additive, genetic (or map) distance is used as an additive measure of distance between loci. Genetic distance between two markers is defined as the average number of crossovers per strand per meiosis between these markers. The unit of genetic distance is the Morgan (M). Two markers are 1 M apart if on average there is one crossover occurring on a single strand per meiosis between these two markers. In practice, the centiMorgan (cM = 0.01 M) is more commonly used in genetic mapping.

The occurrence of crossovers cannot be observed directly and has to be inferred from observed recombination events. In the case of single spore data, a given meiotic product may be scored as recombinant or non-recombinant for each pair of markers. The map distance between two markers can be estimated from the observed recombination fraction. In the case of unordered tetrad data, there are three possible observed outcomes for each pair of markers: parental ditype when all four strands are non-recombinants, tetratype when exactly two of the four strands show recombination between the two markers, and nonparental ditype when all four strands are recombinants.
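The three tetrad outcomes described above can be sketched in a few lines of Python. This is only an illustration: the function names and the example counts are hypothetical, and the conversion to map distance uses Haldane's map function, which assumes neither chromatid nor chiasma interference. It is a familiar baseline, not the Markov model developed in this paper.

```python
import math


def classify_tetrad(recombinant_strands: int) -> str:
    """Label an unordered tetrad for one marker pair.

    0 recombinant strands -> parental ditype (PD)
    2 recombinant strands -> tetratype (T)
    4 recombinant strands -> nonparental ditype (NPD)
    """
    labels = {0: "PD", 2: "T", 4: "NPD"}
    if recombinant_strands not in labels:
        raise ValueError("expected 0, 2, or 4 recombinant strands")
    return labels[recombinant_strands]


def recombination_fraction(n_pd: int, n_t: int, n_npd: int) -> float:
    """Proportion of recombinant strands among all strands scored.

    A tetratype contributes 2 of 4 recombinant strands and a
    nonparental ditype contributes 4 of 4, so r = (T/2 + NPD) / total.
    """
    total = n_pd + n_t + n_npd
    if total == 0:
        raise ValueError("no tetrads observed")
    return (0.5 * n_t + n_npd) / total


def haldane_distance(r: float) -> float:
    """Map distance in Morgans under Haldane's map function.

    ASSUMPTION: no interference of either kind, so
    r = (1 - exp(-2d)) / 2, i.e. d = -ln(1 - 2r) / 2.
    """
    if not 0.0 <= r < 0.5:
        raise ValueError("r must lie in [0, 0.5)")
    return -0.5 * math.log(1.0 - 2.0 * r)


# Hypothetical counts, for illustration only.
r = recombination_fraction(n_pd=70, n_t=28, n_npd=2)
d_cm = 100.0 * haldane_distance(r)  # centiMorgans
```

With these made-up counts the recombination fraction is 0.16. Any other map function could be substituted for `haldane_distance`; the choice here is purely for simplicity.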
The map distance between two markers can be estimated from the observed proportions of these three tetrad types. In the case of ordered tetrads bearing one marker 𝒜 with alleles A and a, there are six distinguishable configurations:

1  2  3  4  5  6
A  a  a  A  A  a
A  a  A  a  a  A
a  A  a  A  a  A
a  A  A  a  A  a

Configurations 1 and 2 are called first division segregation (FDS) patterns and configurations 3 to 6 are called second division segregation (SDS) patterns. Because of random spindle to centromere attachments during meiosis, configurations 1 and 2 have the same probability, and the four configurations showing SDS pattern also have the same probability (Griffiths et al. 1996). The map distance between a marker and the centromere can then be estimated from the observed SDS proportion.

To estimate map distance from the observed data, a model is needed which connects the process of crossover to the observed outcomes. Any model has to
