
Comparative Genomics: RECOMB 2005 International Workshop, RCG 2005, Dublin, Ireland, September 18-20, 2005. Proceedings PDF

174 Pages·2005·3.213 MB·English


Lecture Notes in Bioinformatics 3678
Edited by S. Istrail, P. Pevzner, and M. Waterman
Editorial Board: A. Apostolico, S. Brunak, M. Gelfand, T. Lengauer, S. Miyano, G. Myers, M.-F. Sagot, D. Sankoff, R. Shamir, T. Speed, M. Vingron, W. Wong
Subseries of Lecture Notes in Computer Science

Aoife McLysaght, Daniel H. Huson (Eds.)
Comparative Genomics
RECOMB 2005 International Workshop, RCG 2005
Dublin, Ireland, September 18-20, 2005
Proceedings

Series Editors
Sorin Istrail, Celera Genomics, Applied Biosystems, Rockville, MD, USA
Pavel Pevzner, University of California, San Diego, CA, USA
Michael Waterman, University of Southern California, Los Angeles, CA, USA

Volume Editors
Aoife McLysaght, University of Dublin, Smurfit Institute of Genetics, Trinity College, Ireland
E-mail: [email protected]
Daniel H. Huson, Tübingen University, Center for Bioinformatics (ZBIT), 72076 Tübingen, Germany
E-mail: [email protected]

Library of Congress Control Number: 2005932241
CR Subject Classification (1998): F.2, G.3, E.1, H.2.8, J.3
ISSN 0302-9743
ISBN-10 3-540-28932-1 Springer Berlin Heidelberg New York
ISBN-13 978-3-540-28932-6 Springer Berlin Heidelberg New York

This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable to prosecution under the German Copyright Law.
Springer is a part of Springer Science+Business Media
springeronline.com
© Springer-Verlag Berlin Heidelberg 2005
Printed in Germany
Typesetting: Camera-ready by author, data conversion by Scientific Publishing Services, Chennai, India
Printed on acid-free paper SPIN: 11554714 06/3142 543210

Preface

The complexity of genome evolution poses many exciting challenges to developers of mathematical models and algorithms, who have recourse to a spectrum of algorithmic, statistical and mathematical techniques, ranging from exact, heuristic, fixed-parameter and approximation algorithms for problems based on parsimony models to Monte Carlo Markov Chain algorithms for Bayesian analysis of problems based on probabilistic models.

The annual RECOMB Satellite Workshop on Comparative Genomics (RECOMB Comparative Genomics) is a forum on all aspects and components of this field, ranging from new quantitative discoveries about genome structure and process to theorems on the complexity of computational problems inspired by genome comparison. The informal steering committee for this meeting consists of David Sankoff, Jens Lagergren and Aoife McLysaght.

This volume contains the papers presented at the 3rd RECOMB Comparative Genomics meeting, which was held in Dublin, Ireland, on September 18–20, 2005. The first two meetings of this series were held in Minneapolis, USA (2003) and Bertinoro, Italy (2004).

This year, 21 papers were submitted, of which the Program Committee selected 14 for presentation at the meeting and inclusion in these proceedings. Each submission was refereed by at least three members of the Program Committee.
After completion of the referees' reports, an extensive Web-based discussion took place for making decisions. The RECOMB Comparative Genomics 2005 Program Committee consisted of the following 27 members: Vineet Bafna, Anne Bergeron, Mathieu Blanchette, Avril Coghlan, Dannie Durand, Nadia El-Mabrouk, Niklas Eriksen, Aaron Halpern, Rose Hoberman, Daniel Huson, Jens Lagergren, Giuseppe Lancia, Emmanuelle Lerat, Aoife McLysaght, Istvan Miklos, Bernard Moret, Pavel Pevzner, Ben Raphael, Marie-France Sagot, David Sankoff, Cathal Seoighe, Beth Shapiro, Igor Sharakhov, Mike Steel, Jens Stoye, Glenn Tesler and Louxin Zhan. We would like to thank the Program Committee members for their dedication and hard work.

RECOMB Comparative Genomics 2005 had several invited speakers, including: Anne Bergeron (Université du Québec à Montréal, Canada), Laurent Duret (Laboratoire de Biometrie et Biologie Evolutive, Université Claude Bernard, Lyon, France), Eddie Holmes (Department of Biology, Pennsylvania State University, USA), Jeffrey Lawrence (Department of Biological Sciences, University of Pittsburgh, USA), Stephan Schuster (Department of Biochemistry and Molecular Biology, Pennsylvania State University, USA), Ken Wolfe (Genetics Department, Trinity College Dublin, Ireland) and Sophia Yancopoulos (Institute for Medical Research, New York, USA).

In addition to the invited talks and the contributed talks, an important ingredient of the program was the lively poster session.

RECOMB Comparative Genomics 2005 would like to thank Science Foundation Ireland (SFI) and Hewlett-Packard for providing financial support for the conference. We would like to thank the University of Dublin, Trinity College, for hosting the meeting. We would like to thank Nadia Browne for administrative support.

In closing, we would like to thank all the people who submitted papers and posters and those who attended RECOMB Comparative Genomics 2005 with enthusiasm.
September 2005
Aoife McLysaght and Daniel Huson

Table of Contents

Lower Bounds for Maximum Parsimony with Gene Order Data
  Abraham Bachrach, Kevin Chen, Chris Harrelson, Radu Mihaescu, Satish Rao, Apurva Shah
Genes Order and Phylogenetic Reconstruction: Application to γ-Proteobacteria
  Guillaume Blin, Cedric Chauve, Guillaume Fertin
Maximizing Synteny Blocks to Identify Ancestral Homologs
  Guillaume Bourque, Yasmine Yacef, Nadia El-Mabrouk
An Expectation-Maximization Algorithm for Analysis of Evolution of Exon-Intron Structure of Eukaryotic Genes
  Liran Carmel, Igor B. Rogozin, Yuri I. Wolf, Eugene V. Koonin
Likely Scenarios of Intron Evolution
  Miklós Csűrös
OMA, a Comprehensive, Automated Project for the Identification of Orthologs from Complete Genome Data: Introduction and First Achievements
  Christophe Dessimoz, Gina Cannarozzi, Manuel Gil, Daniel Margadant, Alexander Roth, Adrian Schneider, Gaston H. Gonnet
The Incompatible Desiderata of Gene Cluster Properties
  Rose Hoberman, Dannie Durand
The String Barcoding Problem is NP-Hard
  Marcello Dalpasso, Giuseppe Lancia, Romeo Rizzi
A Partial Solution to the C-Value Paradox
  Jeffrey M. Marcus
Individual Gene Cluster Statistics in Noisy Maps
  Narayanan Raghupathy, Dannie Durand
Power Boosts for Cluster Tests
  David Sankoff, Lani Haque
Reversals of Fortune
  David Sankoff, Chungfang Zheng, Aleksander Lenert
Very Low Power to Detect Asymmetric Divergence of Duplicated Genes
  Cathal Seoighe, Konrad Scheffler
A Framework for Orthology Assignment from Gene Rearrangement Data
  Krister M.
Swenson, Nicholas D. Pattengale, Bernard M.E. Moret
Author Index

Lower Bounds for Maximum Parsimony with Gene Order Data

Abraham Bachrach, Kevin Chen, Chris Harrelson, Radu Mihaescu, Satish Rao, and Apurva Shah
Department of Computer Science, UC Berkeley

Abstract. In this paper, we study lower bound techniques for branch-and-bound algorithms for maximum parsimony, with a focus on gene order data. We give a simple $O(n^3)$ time dynamic programming algorithm for computing the maximum circular ordering lower bound, where $n$ is the number of leaves. The well-known gene order phylogeny program, GRAPPA, currently implements two heuristic approximations to this lower bound. Our experiments show a significant improvement over both these methods in practice. Next, we show that the linear programming-based lower bound of Tang and Moret (Tang and Moret, 2005) can be greatly simplified, allowing us to solve the LP in $O^*(n^3)$ time in the worst case, and in $O^*(n^{2.5})$ time amortized over all binary trees. Finally, we formalize the problem of computing the circular ordering lower bound, when the tree topologies are generated bottom-up, as a Path-Constrained Travelling Salesman Problem, and give a polynomial-time 3-approximation algorithm for it. This is a special case of the more general Precedence-Constrained Travelling Salesman Problem and has not previously been studied, to the best of our knowledge.

1 Introduction

Currently, the most accurate methods for phylogenetic reconstruction from gene order data are based on branch-and-bound search for the most parsimonious tree under various distance measures. These include GRAPPA [1], BP-Analysis [2], and the closely-related MGR [3]. Since branch-and-bound for this problem is potentially a super-exponential-time process, computing good pruning lower bounds is very important. However, scoring a particular partial or full tree topology is a hard computational problem for many metrics, in particular for gene order data.
There has been a good deal of recent work on designing good lower bounds for various distance measures. These techniques are divided between those specially designed for specific distance measures [4–6] and those that hold for arbitrary metrics [7–9]. Our lower bounds fall into the latter category. Lower bounds that hold for arbitrary metrics are particularly appealing in the context of gene order phylogeny, because an important direction for the field is to extend current methods to use more realistic metrics than the breakpoint or inversion distances currently used. There is a growing body of algorithmic work on various distance measures, including transpositions, chromosome fusions/fissions, insertions/deletions and various combinations of these (see [10] for a comprehensive survey) and our lower bounds apply to all of them. One notable exception to this is the tandem duplication and random loss model [11], which is well-suited to animal mitochondrial genomes, but is asymmetric and therefore does not fit into the standard metric parsimony framework.

A. McLysaght et al. (Eds.): RECOMB 2005 Ws on Comparative Genomics, LNBI 3678, pp. 1–10, 2005. © Springer-Verlag Berlin Heidelberg 2005

In this paper, we give efficient implementations of two lower bounds. The first is a simple dynamic programming algorithm to compute the maximum circular ordering lower bound in $O(n^3)$ time and $O(n^2)$ space. Since the exact running time of the algorithm often depends on the choice of root, we also provide an algorithm to compute the optimal root for a given un-rooted tree topology in $O(n^2)$ time. Next, we greatly simplify the LP-based lower bound of [9] and show how to implement it in $O^*(n^3)$ time¹ in the worst case and $O^*(n^{2.5})$ time amortized over all binary trees. Finally, we study the problem of lower bounding the tree score when only a partial topology has been constructed so far and rephrase this as a Path-Constrained Travelling Salesman Problem.
This is a special case of the Precedence-Constrained Travelling Salesman Problem [12], in which we are given a partial order graph on a subset of the cities and asked to return a min cost tour that respects the partial ordering. Our version of the problem is simply the case where the partial order graph is a directed path. To our knowledge, considering the effect of a restricted partial order on this problem has not been previously studied, and we give a simple and fast algorithm that computes a 3-approximation for the case of a line. The solution can then be transformed into a lower bound by dividing the score by 3. Finally, we have implemented our dynamic programming lower bound and show that it gives better results on the benchmark Campanulaceae data set.

¹ The notation $O^*(f(n))$ omits factors that are logarithmic in $n$.

2 The Circular Ordering Lower Bound

Given a rooted binary tree in which one of each pair of children is designated the left child and the other the right child, consider the left-to-right ordering of the leaves, $\pi$, induced by some depth-first search of the tree. For a given metric $d(\cdot)$ on pairs of leaves, say the inversion distance, we define the circular ordering lower bound to be $C(\pi) = \sum_{i=1}^{n} d(\pi(i), \pi(i+1))$, where we define $\pi(n+1) = \pi(1)$ for notational convenience. By repeatedly invoking the triangle inequality, it is easy to see that $C(\pi)/2$ is a lower bound on the cost of the tree, and the bound is tight if the distance $d(\cdot)$ is the shortest path metric induced by the tree.

The same tree topology can induce more than one leaf ordering $\pi$ by swapping the left and right children at internal nodes of the tree, and some leaf orderings may produce a higher lower bound than others. A brute-force exponential time algorithm for computing the maximum circular ordering is to enumerate all possible combinations of swaps at internal nodes. This method has been considered too expensive to work well in practice [13]. Two heuristic approximations
are implemented in GRAPPA. The first is the Swap-as-you-go heuristic [7], in which a DFS traversal of the tree is performed, and a swap is performed at each internal node when it is visited as long as it improves the lower bound. This heuristic has the attribute of running in linear time. The second heuristic does a similar traversal, but when deciding whether to swap the left and right children of an internal node, the algorithm tries both of them, and keeps the one which gives the better lower bound for the current subproblem. This latter approach runs in $O(n^4)$ time.

In this section, we show that, in fact, the maximum circular ordering for a given tree can be computed in $O(n^3)$ time by a straightforward dynamic programming algorithm. We first note that the choice of the tree's root does not affect the parsimony score or the maximum circular ordering lower bound. Since the branch-and-bound algorithm searches over unrooted topologies, for the presentation of the algorithm we assume an arbitrarily chosen root. On the other hand, the exact running time of our algorithm will depend on the position of the root, so we also consider the problem of optimal root placement after giving the description of the algorithm.

At each internal node, $v$, let $S_v$ be the set of leaves in the subtree rooted at $v$. The dynamic programming algorithm constructs a table $M_v$ which contains, for each pair of leaves $A, B \in S_v$, the maximum score attainable by a linear ordering of the vertices in $S_v$ that begins with $A$ and ends with $B$, if one exists. Note that such an ordering exists if and only if $A$ and $B$ are leaves in subtrees rooted at different children of $v$.

Let the children of $v$ be $l$ and $r$. We inductively assume that the tables $M_r$ and $M_l$ have already been constructed. Let $ll$ and $lr$ be $l$'s children and $rl$ and $rr$ be $r$'s children, and let us assume that the subtrees rooted at $ll$, $lr$, $rl$ and $rr$ have $a$, $b$, $c$ and $d$ leaves respectively (if $r$ or $l$ are leaves, then this step may be omitted).
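As a concrete reference point for the quantities used so far, the following minimal Python sketch computes the circular sum C(π) defined at the start of this section and maximizes it by the brute-force enumeration of child swaps. The nested-tuple tree encoding, the function names and the toy distance are illustrative assumptions; none of them is taken from GRAPPA or from the authors' implementation.

```python
from itertools import product


def leaf_orders(tree):
    """Yield every leaf ordering reachable by swapping children at internal nodes.

    A tree is either a leaf label or a pair (left, right) of subtrees
    (hypothetical encoding chosen for this sketch).
    """
    if not isinstance(tree, tuple):
        yield [tree]
        return
    left, right = tree
    for lo, ro in product(list(leaf_orders(left)), list(leaf_orders(right))):
        yield lo + ro  # children kept in place
        yield ro + lo  # children swapped


def circular_sum(order, d):
    """C(pi): sum of d over consecutive leaves, wrapping from the last to the first."""
    n = len(order)
    return sum(d(order[i], order[(i + 1) % n]) for i in range(n))


def brute_force_max(tree, d):
    """Exponential-time maximum of C(pi) over all swap combinations."""
    return max(circular_sum(order, d) for order in leaf_orders(tree))


def dist(x, y):
    """Toy metric (assumption): leaves are points on a line."""
    return abs(x - y)


tree = ((0, 3), (1, 2))
print(circular_sum([0, 3, 2, 1], dist))  # 3 + 1 + 1 + 1 = 6
print(brute_force_max(tree, dist))       # 8, attained by the ordering 0, 3, 1, 2
```

A tree with n leaves has n - 1 internal nodes, so the enumeration visits 2^(n-1) orderings, which is why it serves only as a correctness check on small inputs. Halving the returned value gives the corresponding bound on the tree cost.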
Intuitively, we could construct $M_v$ by considering all possible quartets of leaves $A \in S_{ll}$, $B \in S_{lr}$, $C \in S_{rl}$ and $D \in S_{rr}$. We will then perform $O(abcd)$ operations at node $v$, which would lead to a running time of $O(n^4)$ for the whole tree. We can do better than this naive implementation in the following way. Let $A \in S_{ll}$ and $C \in S_{rl}$. Let

$$\delta(A,C) = \max_{B \in S_{lr}} \left[ M_l(A,B) + d(B,C) \right]. \tag{1}$$

So $\delta(A,C)$ is the highest score attainable by a linear ordering of the leaves in $S_l$ which also takes a final step to $C$. Now for $D \in S_{rr}$ we obtain

$$M_v(A,D) = \max_{C \in S_{rl}} \left[ \delta(A,C) + M_r(C,D) \right]. \tag{2}$$

The maximum circular ordering lower bound is given at the root by the expression

$$\max_{A \in S_l,\, D \in S_r} \left[ M_v(A,D) + d(A,D) \right]$$
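Recurrences (1) and (2) can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the nested-tuple tree encoding, the dictionary-based tables and all function names are assumptions made for the example. Rather than treating the four grandchild subtrees separately, the sketch lets A range over all of S_l and relies on the child tables containing only feasible (start, end) pairs, which covers the symmetric cases as well.

```python
def end_table(tree, d):
    """Return M_v as a dict mapping (A, B) to the best score of a leaf
    ordering of `tree` that starts at A and ends at B.

    A tree is either a leaf label or a pair (left, right) of subtrees
    (hypothetical encoding chosen for this sketch).
    """
    if not isinstance(tree, tuple):
        return {(tree, tree): 0}  # a single leaf: nothing to traverse
    m_l, m_r = end_table(tree[0], d), end_table(tree[1], d)
    starts_l = {a for a, _ in m_l}
    starts_r = {c for c, _ in m_r}
    neg_inf = float("-inf")

    # Equation (1): delta(A, C) = max over B of M_l(A, B) + d(B, C).
    delta = {}
    for (a, b), score in m_l.items():
        for c in starts_r:
            delta[a, c] = max(delta.get((a, c), neg_inf), score + d(b, c))

    # Equation (2): M_v(A, D) = max over C of delta(A, C) + M_r(C, D).
    table = {}
    for a in starts_l:
        for (c, e), score in m_r.items():
            cand = delta[a, c] + score
            if cand > table.get((a, e), neg_inf):
                # d is symmetric, so the reversed ordering scores the same.
                table[a, e] = table[e, a] = cand
    return table


def max_circular_ordering(tree, d):
    """Close each best linear ordering into a cycle and take the maximum."""
    return max(score + d(a, e) for (a, e), score in end_table(tree, d).items())


def dist(x, y):
    """Toy metric (assumption): leaves are points on a line."""
    return abs(x - y)


print(max_circular_ordering(((0, 3), (1, 2)), dist))  # prints 8
```

At a node whose children have p and q leaves, both loops do at most pq(p + q) work, and the products pq summed over all internal nodes count each pair of leaves once, so the sketch stays within the O(n^3) total stated above.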
