DNA STRUCTURE AND FUNCTION

Richard R. Sinden
Albert B. Alkek Institute of Biosciences and Technology
Center for Genome Research
Department of Biochemistry and Biophysics
Texas A&M University
Houston, Texas

ACADEMIC PRESS
An Imprint of Elsevier
San Diego New York Boston London Sydney Tokyo Toronto

Front cover: Visualization of the TATA-box binding protein (TBP) and its associated proteins that form a preinitiation complex in all eukaryotes for transcribing DNA to messenger RNA. Acrylic painting in collaboration with Dr. S. K. Burley, Howard Hughes Medical Institute, Rockefeller University. Illustration copyright by Irving Geis.

This book is printed on acid-free paper.

Copyright © 1994 by ACADEMIC PRESS, INC.
All Rights Reserved.
No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopy, recording, or any information storage and retrieval system, without permission in writing from the publisher.

Permissions may be sought directly from Elsevier's Science and Technology Rights Department in Oxford, UK. Phone: (44) 1865 843830, Fax: (44) 1865 853333, e-mail: [email protected]. You may also complete your request on-line via the Elsevier homepage: http://www.elsevier.com by selecting "Customer Support" and then "Obtaining Permissions".

Academic Press, Inc.
An Imprint of Elsevier
525 B Street, Suite 1900, San Diego, California 92101-4495

United Kingdom Edition published by
Academic Press Limited
24-28 Oval Road, London NW1 7DX

Library of Congress Cataloging-in-Publication Data
Sinden, Richard R.
DNA structure and function / Richard R. Sinden.
p. cm.
Includes bibliographical references and index.
ISBN-13: 978-0-12-645750-6
ISBN-10: 0-12-645750-6
1. DNA. 2. Molecular genetics. I. Title.
QP624.S56 1994
574.87'3282--dc20 94-10464
CIP

PRINTED IN THE UNITED STATES OF AMERICA
Transferred to Digital Printing 2009

To my father and mother for their genetic contribution; to my wife, Jane, for sharing her DNA; and to my children, David and Laura: may you treat the DNA of your ancestors with care and respect.

Preface

For years after the description of the right-handed DNA double helix by Watson and Crick, DNA was viewed by many as a uniform molecule. With the genetic information encoded as a linear array of triplet codons, it seemed that the key to understanding the regulation of gene expression, or the processes of replication, repair, and recombination, would likely lie in the interaction of specific proteins with regions of defined base sequence within uniform DNA molecules. However, within the past 15 years our understanding of the complex nature of DNA structure has grown considerably. Defined, ordered sequence DNA (dosDNA), including inverted repeats, mirror repeats, direct repeats, and homopurine-homopyrimidine elements, can form a number of alternative DNA structures. Phased A tracts lead to stable DNA bending, inverted repeats can form cruciform structures, alternating purine-pyrimidine sequences can form left-handed Z-DNA, and homopurine-homopyrimidine regions with mirror repeat symmetry can form intramolecular triplex structures. Other dosDNA sequences include A + T-rich regions, which can exist as stably unwound regions at origins of DNA replication and at matrix attachment regions, and guanine-rich regions at telomeres, which can form triplex and quadruplex structures. The list of alternative DNA conformations that can form in sequences found in the human genome, and in the genomes of other organisms, will continue to grow.
Although enormous progress has been made in elucidating the structures formed by dosDNA elements, relatively little is definitively known about the biology of alternative DNA conformations. The human genome is littered with homopurine-homopyrimidine elements and alternating purine-pyrimidine tracts that can form triplex structures and left-handed Z-DNA, respectively. Probably every human gene that has been sequenced has one or more dosDNA elements that could participate in the formation of an alternative DNA conformation. Why is the human genome so full of dosDNA elements if they are not biologically important? Could these elements simply be the results of errors in DNA polymerization, or the products of unequal genetic recombination events that lead to genetic expansion? Are dosDNA elements integral parts of the sophisticated, elaborate interactive dance between DNA and the many proteins that leads to the coordinated, developmentally regulated gene expression responsible for development from sperm and egg to adult?

This book is intended to serve as a source of information about the many structures of DNA that can form in dosDNA elements. It should be useful for graduate students, advanced undergraduates, and all scientists interested in a survey of the structures of DNA and the possibilities for the involvement of DNA structure in biological reactions. The first chapter describes the basics of DNA organization (the bases, the base pairs, the B-DNA helix, and the properties of the B-DNA double helix) and surveys various chemicals and enzymes that react with DNA. The next chapter, on DNA bending, provides a detailed example of how the primary sequence of DNA directs a defined shape of the DNA double helix. This chapter also begins to introduce the experimental rationale and procedures used to study DNA structure.
Specific experiments are presented in "Details of Selected Experiments" sections at the end of several chapters, and other experiments (or concepts beyond the scope of the general text) are presented in boxes. Boxed sections add another level of sophistication to an appreciation of the details of DNA structure or of structural studies. Chapter 3 presents a simplified description of DNA supercoiling and its biological significance. The next three chapters (Chapters 4-6) discuss three major alternative helical forms of DNA that have been studied extensively in the last 15 years: cruciforms, Z-DNA, and intramolecular triplex DNA. Chapter 7 contains a brief description of other non-B-DNA alternative forms of DNA. The list of structures presented in Chapter 7 is necessarily incomplete, since new DNA structures are continually being described. Although an attempt is made to present the structures of alternative DNA conformations and the possibilities for their involvement in biology, the reader should remember that this is an area of intense research. In 5 years we may have a much clearer understanding of alternative DNA structures and their role in replication, repair, recombination, and the regulation of gene expression. Chapter 8 presents basic principles of the interactions between DNA and proteins. This is one of the most active fields in the study of DNA, with great progress being made in solving the X-ray crystal structures of DNA-protein cocrystals. Chapter 9 briefly discusses the significance of the organization of DNA into chromosomes.

I am deeply indebted to the many scientists who have shared their ideas with me, in both written and verbal form. I greatly appreciate the efforts of Iain L. Cartwright, James E. Dahlberg, Maxim Frank-Kamenetskii, Fred Gimble, Myron Goodman, Paul J. Hagerman, James Hu, Terumi Kohwi-Shigematsu, David M. J. Lilley, Donal S. Luse, and Miriam Ziegler for reading selected chapters of this book.
I thank Richard Gumport and William Scovell for reading the entire manuscript. I greatly appreciate the contributions of Jan Klysik, David Ussery, and especially Karl Drlica for very careful reading of multiple drafts of the entire manuscript. I thank Vladimir Potaman for carefully proofreading the page proofs. I am also very grateful for the patience and persistence of artist Patti Restle of Calyx Studio, Cincinnati, Ohio (formerly with the Medical Illustration Department at the University of Cincinnati College of Medicine, Cincinnati, Ohio). Patti drew all the figures for this book, either de novo or from my modifications of drawings from the literature. I also thank Beverly Domingue for assistance with preparation and proofreading of the manuscript.

Richard R. Sinden

Foreword

The past fifteen years have produced an immense refinement of our ideas about DNA structure and its interactions with proteins. Before then, DNA was known as a straight, right-handed double helix composed of two strands held together through complementary Watson-Crick base pairing. While this picture remains correct for the bulk of DNA, the work done over the past few years has demonstrated that almost none of these points is immutable. Within the context of the "standard" B-DNA duplex, systematic, sequence-dependent structural variation occurs, particularly in the way in which sequential base pairs are oriented relative to one another. These properties become especially important when deformation of DNA is required, as when it bends around proteins such as a histone core. On a larger scale, DNA can become left-handed and can adopt three- and four-stranded conformations. The trajectory of the helix axis may be systematically deformed, as shown by the curved structure adopted by phased oligoadenine tracts.
Base pairing can be broken or rearranged to form helical junctions and cruciform structures, and alternative base-base interactions such as Hoogsteen pairing and the guanine tetrad are also important.

This new appreciation of the wealth of conformations available to DNA is the result of a number of advances in techniques. Some of these can fairly be placed at the high-tech end of the scale of laboratory techniques, while others are rather less so. Probably the single most important contribution has come from organic chemists, who have provided methods that enable us to chemically synthesize virtually any oligonucleotide sequence up to about 100 bases in length, sufficiently pure for the most demanding of physical techniques. This has enabled single-crystal X-ray studies to generate an immense structural resource, which has been extended to solution by multidimensional NMR spectroscopic methods. These standard structural methods are now being supplemented by new approaches, such as cryoelectron microscopy and fluorescence resonance energy transfer, that generate larger-scale structural information. A second powerful approach involves the application of the methods of molecular genetics to the study of DNA structure. The ability to clone any sequence into multicopy plasmids and then study it under conditions of negative supercoiling has opened an entirely new window on DNA conformational flexibility. This combination of physical chemistry and molecular biology provides powerful structural, thermodynamic, and kinetic information for studying local DNA structure. Third, enzyme and chemical probing approaches have also been very important in the study of local DNA structure. The reactivity or accessibility of certain positions within a structure may be probed using chemicals such as dimethyl sulphate or osmium tetroxide, or the effects of modifications introduced at the time of synthesis may be studied.
Topology expands the structural repertoire of DNA considerably. Negative supercoiling provides a way of trapping large amounts of free energy within the structure of circular DNA molecules, energy that can be used to stabilize otherwise improbable DNA structures. This is of considerable biological importance, and subtle interplay can occur. For example, since promoter function requires changes in DNA winding, many promoters are sensitive to the prevailing level of superhelical stress in the template. Yet we now know that transcription can itself generate local supercoiling effects in some circumstances. These two effects can be coupled together in a complex manner.

Perhaps the most important aspects of DNA structural variation are likely to be found in the mechanics of molecular recognition and manipulation by proteins. Even for proteins whose main function is simply to bind a specific DNA sequence and repress transcription, distortion of DNA is almost the norm, with either local bending or twisting accompanying binding. Some proteins are required to manipulate the DNA structure to carry out their function. Take the initiation of transcription as an example: in the eubacteria the cAMP-dependent activator CAP bends its cognate sequence by about 90°, while in eukaryotes the TATA box-binding protein TBP introduces a massive distortion into the DNA, both bending it and opening the minor groove. Similarly, proteins involved in site-specific recombination generate precise wrapping of DNA to juxtapose specific sequences for splicing reactions, while certain classes of nucleases and other proteins recognize the geometry of branched DNA structures in a highly selective manner. The fact that a given sequence can adopt a certain structure in the test tube does not guarantee that it will occur inside the cell, and a major goal in this area is the elucidation of the biological role of DNA structural variability.
There is no doubt that DNA possesses an immense conformational flexibility that can be exploited through topology or in interactions with proteins. Doubtless the next fifteen years will generate many more examples of this, and hopefully some more surprises.

David M. J. Lilley