
Counting Graphs: An introduction with specific interest in phylogeny

134 Pages·0.544 MB·English


Counting Graphs: An introduction with specific interest in phylogeny
(Dietmar Cieslik, University of Greifswald)

Contents

1 Introduction
2 Networks
  2.1 Graphs
  2.2 Connected graphs
  2.3 Degree sequences
  2.4 Trees and forests
  2.5 The matrix of adjacency
  2.6 Planar graphs
  2.7 Digraphs
  2.8 Further reading
3 Labeled Graphs
  3.1 All graphs
  3.2 The number of connected graphs
  3.3 Eulerian graphs
  3.4 The number of planar graphs
  3.5 Random graphs
  3.6 Tournaments
4 The Number of Labeled Trees
  4.1 Permutations
  4.2 Trees with a given degree sequence
  4.3 The Prüfer code
  4.4 The number of labeled forests
5 Unlabeled Graphs
  5.1 Isomorphic graphs
  5.2 Labeled and unlabeled graphs
  5.3 The number of graphs
  5.4 The number of connected graphs
  5.5 Polyhedrons
  5.6 Regular polyhedrons
6 The Number of Unlabeled Trees
  6.1 Upper and lower bounds
  6.2 Generating functions
7 Phylogenetic Networks
  7.1 Phylogenetic trees
  7.2 Semi-labeled trees
  7.3 Double stars
  7.4 The structure of rooted trees
  7.5 The number of rooted trees
  7.6 Generalized binary trees
  7.7 Multifurcating trees
  7.8 Phylogenetic forests
8 The collection of Trees
  8.1 Splits and trees
  8.2 Consensus Trees
  8.3 The metric spaces of all trees
  8.4 Further reading
9 Spanning trees
  9.1 The number of spanning trees
  9.2 Generating all spanning trees
  9.3 A recursive procedure
  9.4 The matrix-tree theorem
  9.5 Applications
10 Graphs inside
  10.1 Subgraph isomorphism
  10.2 Trees inside
  10.3 Counting perfect matchings
  10.4 Alignments
  10.5 Forbidden subgraphs
11 Ramsey Theory
  11.1 Coloring the edges
  11.2 Ramsey’s theorem
  11.3 Known Ramsey numbers
  11.4 Asymptotics
  11.5 Generalized Ramsey numbers
A Orders of growing
  A.1 The Landau symbols
  A.2 Approximations
B Polynomic approaches
  B.1 Factorials and double factorials
  B.2 Binomial coefficients
  B.3 Multinomial coefficients
C The Harmonic Numbers
  C.1 The sequence of harmonic numbers
  C.2 Approximations
D The growing of the factorials
  D.1 Stirling’s inequalities
  D.2 Approximations
E The Partition of Integers
  E.1 Composition of integers
  E.2 The partition numbers
F The Catalan Numbers
  F.1 Routes in grids
  F.2 A recurrence relation for the Catalan numbers
  F.3 An explicit formula for the Catalan numbers
G Fixed points in Permutations
  G.1 Derangements
  G.2 A given number of fixed points
H Metric Spaces
  H.1 Distances
  H.2 Examples
I Computational Complexity
  I.1 Sources for algorithms in graph theory
  I.2 P versus NP
  I.3 The complexity of enumeration problems
  I.4 The spectrum of computational complexity
J The Linnaeus’ system

Chapter 1 Introduction

Graphs lend themselves as natural models of transportation as well as communication networks. They are among the most basic of all mathematical structures. Correspondingly, they have many different versions, representations and incarnations. The fact is that graph theory serves as a mathematical model for any system involving a binary relation.

Trees were first used in 1847 by Kirchhoff in his work on electrical networks; they were later redeveloped and named by Cayley in order to enumerate different isomers of specific chemical molecules. Since that time, enumerative methods for counting various classes of graphs, including trees, have been developed, but are still far from completely scientific.

As it became accepted that evolution was to be understood in terms of Mendelian genetics and Darwinian natural selection, so too it became clear that this understanding could not be sought only at a qualitative level.
A fundamental problem is the reconstruction of species’ evolutionary past, which is called the phylogeny of those species. The underlying principle of phylogeny is to try to group "living entities" according to their level of similarity. Trees are widely used to represent evolutionary relationships.¹ In biology, for example, the dominant view of the evolution of life is that all existing organisms are derived from some common ancestor and that a new species arises by the splitting of one population into two or more populations that do not cross-breed, rather than from the mixing of two populations into one. Here, the high-level history of life is ideally organized and displayed as a tree. This was already seen by Darwin in 1859. Darwin spoke of "descent with modification", which is the central phrase of biological evolution; it refers to a genealogical relationship of species through time. These relationships are described in a phylogenetic tree, which can therefore be thought of as a central metaphor for evolution, providing a natural and meaningful way to order data and containing an enormous amount of evolutionary information within its branches.²

¹ Thorley and Page [88]: "The holy grail of phylogenetics is the reconstruction of the one true tree of life."

Doolittle [35]: It has been argued that the "Tree of Life" is perhaps really a "Web of Life", as mechanisms such as hybridization, recombination and swapping of genes probably play a role in evolution.

Furthermore: In the widest sense, a classification scheme may represent simply a convenient method for organizing a large set of data so that the retrieval of information may be made more efficient. In this sense, classification is the beginning of all science. A classification is the formal naming of a group of individuals N.³ The following statements are pairwise equivalent.
• C is a classification for N.
• C represents a rooted N-tree.
• C consists of a series of partitions for N which become finer and finer.
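The third of the equivalent statements above can be made concrete with a small sketch. It checks that a series of partitions of a set N becomes finer and finer, and collects the distinct blocks, which can be read as the vertices of a rooted tree with N at the root. The function names and the four-element example are illustrative assumptions, not taken from the text.

```python
def is_partition(blocks, ground):
    """True if the blocks are nonempty, pairwise disjoint, and cover `ground`."""
    covered = set()
    for block in blocks:
        if not block or covered & block:
            return False
        covered |= block
    return covered == ground


def refines(fine, coarse):
    """True if every block of `fine` lies inside some block of `coarse`."""
    return all(any(b <= c for c in coarse) for b in fine)


def is_hierarchy(levels, ground):
    """True if `levels` is a series of partitions of `ground`, each refining the previous one."""
    return all(is_partition(p, ground) for p in levels) and all(
        refines(levels[i + 1], levels[i]) for i in range(len(levels) - 1)
    )


# Hypothetical example: four individuals, classified in four steps.
N = {"a", "b", "c", "d"}
levels = [
    [{"a", "b", "c", "d"}],
    [{"a", "b"}, {"c", "d"}],
    [{"a"}, {"b"}, {"c", "d"}],
    [{"a"}, {"b"}, {"c"}, {"d"}],
]

# The distinct blocks over all levels; ordered by inclusion they form a rooted tree.
clusters = {frozenset(block) for partition in levels for block in partition}

print(is_hierarchy(levels, N))  # True
print(len(clusters))            # 7
```

Two partitions that cut N in incompatible ways, such as {a,b},{c,d} followed by {a,c},{b,d}, fail the check, which matches the intuition that they cannot come from one rooted tree.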
² Note that in Darwin’s fundamental book The origin of species [32] there is exactly one figure, and this shows the description of the evolutionary history by a tree. Historically, this was a new idea: The concept of species having a continuity through time was only developed in the late 17th century; higher life forms were no longer thought to transmute into different kinds during the lifetime of an individual. It took over 150 years from the development of this concept before a rooted tree was proposed by Darwin.

We will embed this question in the context of counting graphs in general, but it is not the purpose of this script to provide a complete survey of counting methods for trees and related networks. We will focus on the counting of specific classes of trees which are important in investigations about phylogeny and other subjects in mathematical biology. We will only use methods which are taught in the first courses of undergraduate studies. Further studies need more mathematics than is given here, in particular methods of higher algebra; see the pioneering work by Polya [93].

Graph counting is a well-established subject in combinatorics, but it is more than simple enumeration; it also includes
• Questions whether certain objects exist, and if yes, whether they are rare or not.
• Combinatorics is the "Art of Counting". Many of the problems can be phrased in the form "How many ways...?", "Does there exist an object such that...?" or "Can we construct...?" Combinatorics is a modern mathematical field, which now has rapidly increasing research activity with applications in many areas of science. More and more combinatorial structures are used in genetics, biochemistry, evolution, agriculture, experimental design and other parts of modern biology. Here, the results are very powerful and the research frontiers are perhaps more accessible than in some more traditional areas of applied mathematics.
• To describe the border between polynomial, exponential and super-exponential growth of the (time) amount needed to deal with graphs, and to understand the complexity of algorithms.⁴
• Counting problems are closely associated with probability. Indeed, any problem of the kind "How many objects are there which..." has the closely related form "What fraction of all objects ...", which in turn can be posed as "What is the probability that a randomly chosen object ...?" when expressed in terms of the theory of probability. In this sense Laplace defined probability. There is a deep interplay between graph enumeration and the theory of random graphs.
• In view of the description of polyhedrons as specific graphs it will be possible to count such objects.

For readers who are interested in further facts about counting, generating and storing graphs and trees we give a list of books which continue our considerations and give several new hints and investigations. In particular:
1. Harary: Graph Theory; [54].
2. Harary, Palmer: Graphical Enumeration; [55].
3. Martin: Counting: The Art of Enumerative Combinatorics; [82].

If some facts about discrete and combinatorial mathematics are not familiar to the reader, the book includes an appendix with most of the important results.

³ Everitt [38]: "Naming is classifying."
⁴ For centuries almost all mathematicians believed that any mathematical problem could be solved using an algorithm. However, this view has been questioned over the course of time as more and more problems have arisen for which no algorithmic solution has been found or for which the algorithms are too difficult to deal with.

Chapter 2 Networks

2.1 Graphs

We have to introduce some basic notions about graphs and networks. A graph $G$ is defined to be a pair $(V,E)$ where
• $V$ is a nonempty and finite set of elements, called vertices, and
• $E$ is a finite family of elements which are unordered pairs of vertices, called edges.
The notation $e = uv$ means that the edge $e$ joins the vertices $u$ and $v$. In this case, we say that $u$ and $v$ are incident to this edge and that $u$ and $v$ are the endvertices of $e$.
Two vertices $u$ and $v$ are called adjacent in the graph $G$ if $uv$ is an edge of $G$. Different edges $e_1 = vw$ and $e_2 = vw$ are called multiple or parallel edges. A graph with multiple edges is called a multigraph. Any graph is also a multigraph.¹

¹ In any case, we assume that $u \neq v$; that means we do not admit loops.

$N(v) = N_G(v)$ denotes the set of all vertices adjacent to the vertex $v$ and is called the set of all neighbors of $v$ in $G$.

For a vertex $v$ of a graph $G$ the degree $g_G(v)$ is defined as the number of edges which are incident to $v$. If $G$ has no parallel edges, then the cardinality of $N(v) = N_G(v)$ is the degree of the vertex:

$$g(v) = g_G(v) = |N_G(v)|. \tag{2.1}$$

If we sum up all the vertex degrees in a graph, we count each edge exactly twice, once from each of its endvertices. Thus,

Observation 2.1.1 In any graph $G = (V,E)$ the equality

$$\sum_{v \in V} g_G(v) = 2 \cdot |E| \tag{2.2}$$

holds. In particular, in every graph the number of vertices with odd degree is even.

A graph is called regular, or more exactly regular of degree $r$, if each vertex has degree exactly $r$. A graph which is regular of degree 1 is called a perfect matching, of degree 2 is a collection of so-called cycles, and of degree 3 is called a cubic graph. In view of 2.1.1 we find

Corollary 2.1.2 For a graph $G = (V,E)$ which is regular of degree $r$ it holds that

$$r \cdot |V| = 2 \cdot |E|. \tag{2.3}$$

Consequently, if $r$ is odd then $|V|$ must be an even number.

It is easy to see that if $r$ and $n$ are not both odd and $0 \le r \le n-1$, then there always exists an $r$-regular graph with $n$ vertices. (Exercise)

A graph $G$ is said to be a complete graph if any two vertices are adjacent. A complete graph with $n$ vertices has exactly

$$\binom{n}{2} = \frac{n(n-1)}{2} \tag{2.4}$$

edges, and each vertex is of degree $n-1$.

Let $V_1$ and $V_2$ be two disjoint sets with $n_1$ and $n_2$ elements, respectively. The complete bipartite graph $K_{n_1,n_2}$ is defined by

$$K_{n_1,n_2} = (V_1 \cup V_2, \{vw : v \in V_1, w \in V_2\}). \tag{2.5}$$

The complete bipartite graph $K_{n_1,n_2}$ has $n_1 + n_2$ vertices and $n_1 \cdot n_2$ edges.

In general a graph $G = (V,E)$ is called bipartite if it is possible to split $V$ into subsets $V_1$ and $V_2$ such that every edge joins a vertex of $V_1$ to a vertex of $V_2$. The following theorem should be an exercise for the reader.

Theorem 2.1.3 A graph is bipartite if and only if it contains no cycle of odd length.

$Q_d$ denotes the $d$-dimensional hypercube. That is the graph whose set of vertices consists of all binary vectors from $\{0,1\}^d$, with an edge joining two vectors if and only if they differ in exactly one coordinate. Compute the number of vertices and edges, and decide whether $Q_d$ is bipartite.

Let $G = (V,E)$ be a graph. Then $G' = (V',E')$ is called a subgraph of $G$ if $V'$ is a subset of $V$ and $E'$ is a subset of $E$ such that any edge in $E'$ joins vertices from $V'$. In other terms,

$$V' \subseteq V \tag{2.6}$$

and

$$E' \subseteq E \cap \binom{V'}{2}. \tag{2.7}$$

Let $W \subseteq V$ be a set of vertices, then

$$G[W] = \left(W, E \cap \binom{W}{2}\right) \tag{2.8}$$

is called the induced subgraph of $W$ in $G = (V,E)$; that means all edges of $G$ that connect vertices of $W$ are also edges of $G[W]$.

2.2 Connected graphs

A chain is a sequence $v_1, e_1, v_2, e_2, v_3, \ldots, v_m, e_m, v_{m+1}$ of edges and vertices of $G$ such that the edge $e_i$ is incident to the vertices $v_i$ and $v_{i+1}$ for any index $i = 1, \ldots, m$. A chain in which each vertex appears at most once is called a path; more exactly, the path interconnecting the vertices $v_1$ and $v_{m+1}$. Then the number $m$ denotes the length of the path. A single vertex is a path of length 0.

A cycle is a chain with at least one edge and with the following properties: No edge appears twice in the sequence and the two endvertices of the chain are the same. A graph which does not contain a cycle is called acyclic.

A key notion in graph theory is that of a connected graph. It is intuitively clear what this should mean: A graph $G = (V,E)$ is called a connected graph if for any two vertices there is a path (or, equivalently, a chain) interconnecting them.
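To make the definitions so far concrete, the following sketch builds the hypercube $Q_d$ from Section 2.1 for a small $d$, checks the degree-sum identity (2.2) and the regularity identity (2.3), tests the parity of the coordinate sum as a candidate bipartition, and verifies connectedness in the sense just defined. The representation (vertices as 0/1 tuples, edges as two-element frozensets) and all function names are assumptions made for this illustration; running it for a few values of $d$ is only a numerical check of the exercise, not a proof.

```python
from itertools import product


def hypercube(d):
    """Return (vertices, edges) of Q_d; vertices are 0/1 tuples of length d."""
    vertices = list(product((0, 1), repeat=d))
    edges = set()
    for v in vertices:
        for i in range(d):
            # Flip coordinate i to obtain a neighbor of v.
            w = v[:i] + (1 - v[i],) + v[i + 1:]
            edges.add(frozenset((v, w)))
    return vertices, edges


def degrees(vertices, edges):
    """Degree g(v) of every vertex: the number of edges incident to v."""
    deg = {v: 0 for v in vertices}
    for e in edges:
        for v in e:
            deg[v] += 1
    return deg


def is_connected(vertices, edges):
    """True if every vertex can be reached from the first one along edges."""
    neighbors = {v: set() for v in vertices}
    for e in edges:
        u, w = tuple(e)
        neighbors[u].add(w)
        neighbors[w].add(u)
    start = vertices[0]
    reached, stack = {start}, [start]
    while stack:
        v = stack.pop()
        for w in neighbors[v] - reached:
            reached.add(w)
            stack.append(w)
    return len(reached) == len(vertices)


d = 4
V, E = hypercube(d)
deg = degrees(V, E)

print(len(V), len(E))                       # 16 32
print(sum(deg.values()) == 2 * len(E))      # True, identity (2.2)
print(all(g == d for g in deg.values()))    # True, Q_d is regular of degree d
print(d * len(V) == 2 * len(E))             # True, identity (2.3)

# Candidate bipartition: vertices with even versus odd coordinate sum.
# Every edge flips exactly one coordinate, so it changes the parity.
crosses = all(sum(u) % 2 != sum(w) % 2 for u, w in map(tuple, E))
print(crosses)                              # True
print(is_connected(V, E))                   # True
```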
Clearly,

Observation 2.2.1 The relation "There is a path in $G$ connecting $v$ and $v'$" is an equivalence relation on $V$.

The equivalence classes of this relation divide $V$ into subsets, which induce connected subgraphs of $G$. These classes are called the connected components, or briefly the components, of the graph $G$. A component is a maximal subgraph that is connected. A connected graph has exactly one component. Of course, the number of components is an integer between 1 and the number of vertices. It is easy to see:

Theorem 2.2.2 Let $G = (V,E)$ be a graph with $n = |V|$ vertices and $c$ components. Then

$$n - c \le |E| \le \frac{(n-c)(n-c+1)}{2} = \binom{n-c+1}{2}. \tag{2.9}$$

An edge $e$ of a graph $G = (V,E)$ is called a bridge if $G' = (V, E \setminus \{e\})$ has one more component than $G$.

Observation 2.2.3 An edge $e$ of a connected graph $G$ is a bridge if and only if $e$ does not lie in a cycle of $G$.

As an exercise prove the following fact.
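The notions of this section (components, the bounds (2.9), and bridges) can be tried out on a small example. The sketch below is a minimal illustration assuming a graph without parallel edges, stored as a vertex list and a list of vertex pairs; the function names and the example graph are hypothetical. Bridges are found directly from the definition, by deleting each edge in turn and recounting the components, which is simple but not efficient.

```python
def components(vertices, edges):
    """Return the connected components as a list of vertex sets."""
    neighbors = {v: set() for v in vertices}
    for u, w in edges:
        neighbors[u].add(w)
        neighbors[w].add(u)
    seen, result = set(), []
    for start in vertices:
        if start in seen:
            continue
        # Collect everything reachable from `start`.
        comp, stack = {start}, [start]
        while stack:
            v = stack.pop()
            for w in neighbors[v] - comp:
                comp.add(w)
                stack.append(w)
        seen |= comp
        result.append(comp)
    return result


def bridges(vertices, edges):
    """Edges whose removal increases the number of components by one."""
    c = len(components(vertices, edges))
    return [
        e for e in edges
        if len(components(vertices, [f for f in edges if f != e])) == c + 1
    ]


# Hypothetical example: a triangle 1-2-3, a pendant edge 3-4, and an isolated vertex 5.
V = [1, 2, 3, 4, 5]
E = [(1, 2), (2, 3), (1, 3), (3, 4)]

comps = components(V, E)
n, c = len(V), len(comps)

print(c)                                                # 2
print(n - c <= len(E) <= (n - c) * (n - c + 1) // 2)    # True, bounds (2.9): 3 <= 4 <= 6
print(bridges(V, E))                                    # [(3, 4)]
```

The only bridge is the pendant edge, while the three triangle edges lie on a cycle; this agrees with Observation 2.2.3 applied to the component containing the triangle.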
