SCIENCE, ENGINEERING, AND BIOLOGY INFORMATICS
Series Editor: Jason T. L. Wang
(New Jersey Institute of Technology, USA)
Published:
Vol. 1: Advanced Analysis of Gene Expression Microarray Data
(Aidong Zhang)
Vol. 2: Life Science Data Mining
(Stephen T. C. Wong & Chung-Sheng Li)
Vol. 3: Analysis of Biological Data: A Soft Computing Approach
(Sanghamitra Bandyopadhyay, Ujjwal Maulik & Jason T. L. Wang)
Vol. 4: Machine Learning Approaches to Bioinformatics
(Zheng Rong Yang)
Vol. 5: Biodata Mining and Visualization: Novel Approaches
(Ilkka Havukkala)
Vol. 6: Database Technology for Life Sciences and Medicine
(Claudia Plant & Christian Böhm)
Vol. 7: Advances in Genomic Sequence Analysis and Pattern Discovery
(Laura Elnitski, Helen Piontkivska & Lonnie R. Welch)
Advances in Genomic
Sequence Analysis and
Pattern Discovery
Editors
Laura Elnitski
National Human Genome Research Institute,
National Institutes of Health, USA
Helen Piontkivska
Kent State University, USA
Lonnie R. Welch
Ohio University, USA
World Scientific
NEW JERSEY • LONDON • SINGAPORE • BEIJING • SHANGHAI • HONG KONG • TAIPEI • CHENNAI
Published by
World Scientific Publishing Co. Pte. Ltd.
5 Toh Tuck Link, Singapore 596224
USA office: 27 Warren Street, Suite 401-402, Hackensack, NJ 07601
UK office: 57 Shelton Street, Covent Garden, London WC2H 9HE
British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library.
ADVANCES IN GENOMIC SEQUENCE ANALYSIS AND PATTERN DISCOVERY
Science, Engineering, and Biology Informatics — Vol. 7
Copyright © 2011 by World Scientific Publishing Co. Pte. Ltd.
All rights reserved. This book, or parts thereof, may not be reproduced in any form or by any means,
electronic or mechanical, including photocopying, recording or any information storage and retrieval
system now known or to be invented, without written permission from the Publisher.
For photocopying of material in this volume, please pay a copying fee through the Copyright
Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, USA. In this case permission to
photocopy is not required from the publisher.
ISBN-13 978-981-4327-72-5
ISBN-10 981-4327-72-7
Typeset by Stallion Press
Email: enquiries@stallionpress.com
Printed in Singapore.
December16,2010 16:55 9inx6in AdvancesinGenomicSequenceAnalysisandPatternDiscovery b1051-fm
Contents
Preface
About the Editors
Part I: Pattern Discovery Methods
Chapter 1: Large-Scale Gene Regulatory Motif Discovery with NestedMICA
Matias Piipari, Thomas A. Down
and Tim J. P. Hubbard
Chapter 2: R’MES: A Tool to Find Motifs with a Significantly
Unexpected Frequency in Biological Sequences
Sophie Schbath and Mark Hoebeke
Chapter 3: An Intricate Mosaic of Genomic Patterns
at Mid-range Scale
Alexei Fedorov and Larisa Fedorova
Chapter 4: Motif Finding from Chips to ChIPs
Giulio Pavesi
Chapter 5: A New Approach to the Discovery of RNA
Structural Elements in the Human Genome
Lei Hua, Miguel Cervantes-Cervantes
and Jason T. L. Wang
Part II: Performance and Paradigms
Chapter 6: Benchmarking of Methods for Motif
Discovery in DNA
Kjetil Klepper, Geir Kjetil Sandve,
Morten Beck Rye, Kjersti Hysing Bolstad
and Finn Drabløs
Chapter 7: Encyclopedias of DNA Elements for Plant Genomes
Jens Lichtenberg, Alper Yilmaz, Kyle Kurz,
Xiaoyu Liang, Chase Nelson, Thomas Bitterman,
Eric Stockinger, Erich Grotewold
and Lonnie R. Welch
Chapter 8: Manycore High-Performance Computing
in Bioinformatics
Jean-Stéphane Varré, Bertil Schmidt,
Stéphane Janot and Mathieu Giraud
Chapter 9: Natural Selection and the Genome
Austin L. Hughes
Index
Preface
Those involved in mapping genomic landscapes are participating in
one of the most exciting frontiers of science. We have the
opportunity to reverse engineer the blueprints and the control systems of
living organisms. Computational tools are key enablers in the deciphering
process. Thus, this book provides an in-depth presentation of some of the
important computational biology approaches to genomic sequence analysis.
The first part of the book discusses methods for discovering patterns
in DNA and RNA. The second part reflects on these methods from several
angles, including performance, usage and paradigms.
Part I, Pattern Discovery Methods, provides a collection of computational
methods and tools. Chapter 1, “Large-Scale Gene Regulatory Motif
Discovery with NestedMICA,” presents an algorithmic approach, describes
usage of the tool based on the algorithm, and illustrates its usage via a
detailed case study. In Chapter 2, “R’MES: A Tool to Find Motifs with a
Significantly Unexpected Frequency in Biological Sequences,” the authors
describe a software tool that contains rigorous statistical models of DNA
words. “An Intricate Mosaic of Genomic Patterns at Mid-range Scale,”
Chapter 3 of the book, focuses on intricate mosaics found in genomes; a
number of specific patterns are identified. The fourth chapter, “Motif Finding
from Chips to ChIPs,” provides a comprehensive survey of methods
for the de novo discovery of putative over-represented transcription factor
binding sites in nucleotide sequences. Part I concludes with a chapter that
considers the discovery of RNA structural motifs: “A New Approach to the
Discovery of RNA Structural Elements in the Human Genome.”
The second part, Performance and Paradigms, consists of chapters
that contemplate the effectiveness of relevant computational biology techniques.
Chapter 6, “Benchmarking of Methods for Motif Discovery in
DNA,” presents a variety of metrics for assessing the performance of
the class of methods described in Part I. In “Encyclopedias of DNA Elements
for Plant Genomes,” the application of methods is illustrated with
case studies. The topic of scalable algorithmic approaches is considered in
Chapter 8, “Manycore High-Performance Computing in Bioinformatics.”
Chapter 9, “Natural Selection and the Genome,” discusses the evolution of
genomic sequences and the role that natural selection plays in directing
genome evolution. It also provides a conceptual framework for a better
understanding of the evolutionary implications and insights that are generated
through genomic sequence analyses, and emphasizes the critical
role of purifying selection.
About the Editors
Dr. Laura Elnitski is a molecular and computational
biologist who studies noncoding functional elements
in vertebrate genomes. She has served as an analyst
for the Mouse, Rat, Chicken and Bovine Genome
Consortia.
Dr. Elnitski is extensively involved in NHGRI’s
ENCODE (Encyclopedia of DNA Elements) project,
which aims to produce a comprehensive catalog
of functional elements in the human genome.
Dr. Elnitski’s research uses integrative analyses to elucidate both the presence
and activity of functional elements in the human genome that have
been historically difficult to characterize. For example, computationally,
her work predicts mutations in coding sequences that affect proper splicing.
Targets of these mutations include exonic splicing enhancers and
silencers. In experimental analyses, she is mapping elements that silence
gene expression using an assay system designed in her lab.
Driving towards a molecular understanding of ovarian cancer,
Dr. Elnitski combines in silico and wet-bench techniques. She has annotated
bidirectional promoters in the human genome, including those regulating
noncoding genes, using data collected in RNA-seq assays. These
results are being used to find gene silencing events caused by aberrant
methylation in tumor samples. She is also addressing functional consequences
of mutations in those tumors.
Dr. Elnitski is the recipient of a Ruth L. Kirschstein Postdoctoral Fellowship
through the NIH (2000–2003), an Outstanding Research Achievement
Award (International Symposium on Bioinformatics Research and
Applications, 2007) and a Genome Technology Young Investigator
Award (2009), and was a featured scientist in the Women in Bioinformatics
Research documentary (2007). She serves as an ad hoc reviewer for the NIH GCAT
Scientific Grant Review Panel and is an associate editor of BMC Genomics
and formerly of Genome Research.