Nonbinary tree-based phylogenetic networks

Laura Jetten and Leo van Iersel

arXiv:1601.04974v3 [q-bio.PE] 30 Sep 2016

Abstract

Rooted phylogenetic networks are used to describe evolutionary histories that contain non-treelike evolutionary events such as hybridization and horizontal gene transfer. In some cases, such histories can be described by a phylogenetic base-tree with additional linking arcs, which can for example represent gene transfer events. Such phylogenetic networks are called tree-based. Here, we consider two possible generalizations of this concept to nonbinary networks, which we call tree-based and strictly-tree-based nonbinary phylogenetic networks. We give simple graph-theoretic characterizations of tree-based and strictly-tree-based nonbinary phylogenetic networks. Moreover, we show for each of these two classes that it can be decided in polynomial time whether a given network is contained in the class. Our approach also provides a new view on tree-based binary phylogenetic networks. Finally, we discuss two examples of nonbinary phylogenetic networks in biology and show how our results can be applied to them.

Index Terms

Phylogenetic tree, phylogenetic network, evolution, tree-of-life, tree-based

1 INTRODUCTION

Rooted phylogenetic networks are becoming increasingly popular as a way to describe evolutionary histories that cannot be described by a phylogenetic tree [1], [2]. The leaves of such a network are labelled and represent, for example, currently-living species, while the root of the network represents a common ancestor of those species. Vertices with two or more outgoing arcs represent a divergence event in which a lineage split into two or more lineages, while vertices with two or more incoming arcs represent a convergence of different lineages into a single lineage. The latter events are called reticulate evolutionary events and include, for example, hybridization, introgression and horizontal gene transfer.
Therefore, these vertices are called reticulations. A phylogenetic network without reticulations is a (rooted) phylogenetic tree. Hence, phylogenetic networks are a more general model for evolutionary histories than phylogenetic trees.

Although the occurrence of reticulate evolutionary events is well-accepted, there are different views on their importance. One possibility is to see evolution as a mainly tree-like (vertical) process with sporadic horizontal events. The other extreme is to completely abandon the idea of a tree-of-life and to see evolution purely as a network [3], [4], [5], [6]. This discussion is especially relevant for prokaryotes, where the main form of non-treelike evolution is horizontal gene transfer, i.e. genetic material is transferred from one species to another coexisting species that is not a descendant. If the evolutionary history of a group of prokaryotes is mainly tree-like, then one could describe such a history as a phylogenetic species tree with additional cross-connecting arcs describing the horizontal gene transfer events. However, if their evolution is inherently network-like, then it might not be possible to identify any tree-like signal at all.

This discussion has recently led to the introduction of a new class of phylogenetic networks called “tree-based” [7], which contains those networks that can be described by a phylogenetic base-tree with additional linking arcs between branches of the base-tree. This notion was motivated by the observation that this is not always possible, i.e. there exist networks that cannot be described as a base-tree with linking arcs [8].

Francis and Steel showed recently that there is a polynomial-time algorithm to decide whether a given binary phylogenetic network is tree-based or not [7]. In addition, it was shown that any phylogenetic network can be made tree-based by the addition of leaves. Hence, this notion has to be used with caution in the presence of possible extinctions or under-sampling.
Even more recently, a simple graph-theoretic characterization was given that can also be used to decide whether a given binary network is tree-based or not [9]. Unfortunately, these results are all restricted to binary phylogenetic networks, in which all vertices have at most two incoming and at most two outgoing arcs (see the next section for precise definitions). Moreover, the techniques used by these authors do not (easily) extend to nonbinary networks.

• E-mail: [email protected]@gmail.com
• Delft University of Technology, PO-box 5, 2600 AA, Delft, The Netherlands.
• Leo van Iersel was partly supported by a Vidi grant from the Netherlands Organisation for Scientific Research (NWO) and partly by the 4TU Applied Mathematics Institute (4TU.AMI).

TABLE 1
Examples of nonbinary phylogenetic networks in biology. The reticulation number is defined as the total number of “additional branches” in the network, i.e. a reticulation with p parents adds p − 1 to the reticulation number.

Network               Reticulation number  Leaves  Reticulate process  Tree-based  Reference
Violets               21                   16      polyploidisation    no          [15, Figure 4], Figure 19
Origin of Eukaryotes  6                    41      endosymbiosis       yes         [16, Figure 2], Figure 20
Influenza             5                    7       reassortment        yes         [17, Figure 1]
Cichlids (fish)       5                    19      hybridisation       yes         [18, Figure 4]

Here, we also consider nonbinary phylogenetic networks. In such a network, a vertex can have more than two outgoing arcs, representing uncertainty in the order of divergence events, or more than two incoming arcs, representing uncertainty in the order of reticulate events. See Table 1 for real biological examples of nonbinary phylogenetic networks (also see [10], [11], [12]). In general, such uncertainties cannot simply be overcome by collecting more data [13], [14]. Since the tree-basedness of nonbinary networks has not been introduced or studied before, we discuss different possible definitions of tree-based in the nonbinary case.

Roughly speaking, we call a nonbinary phylogenetic network strictly tree-based if it can be obtained from a rooted (nonbinary) phylogenetic tree by adding linking arcs between the branches of the tree, such that no two linking arcs attach at the same point. Consequently, in such a network all vertices have at most two incoming arcs. In addition, all vertices with more than two outgoing arcs correspond to vertices of the base-tree, because the new vertices that are created by the addition of linking arcs all get at most two outgoing arcs: one of the base-tree and one linking arc. Hence, a strictly-tree-based network can be nonbinary only because the base-tree can be nonbinary.

We also consider tree-based nonbinary phylogenetic networks, which are networks that can be obtained from a rooted (nonbinary) phylogenetic tree by adding linking arcs between branches and/or vertices of the tree. It turns out that a network is in this class precisely if it has at least one binary refinement that is a tree-based binary phylogenetic network. This is a more general class than the strictly-tree-based variant.

Our main results are as follows. We first present an alternative view on binary tree-based phylogenetic networks, which can partly be extended to nonbinary networks. We introduce omnians, which we define as non-leaf vertices of which all children are reticulations. We then use this notion to obtain a new, simpler characterization of binary tree-based phylogenetic networks. We show that a binary phylogenetic network is tree-based if and only if every subset S of its omnians has at least |S| different children. We use this to derive, in an alternative (independently discovered) way, the characterization of binary tree-based networks in terms of zig-zag paths [9] and a new matching-based algorithm for deciding whether a given binary network is tree-based. We also show that every binary network with at most two reticulations is tree-based and give a new sufficient condition for a binary network to be tree-based.

We then proceed to nonbinary networks.
We show that our characterization of binary tree-based phylogenetic networks in terms of omnians can easily be generalized to the nonbinary case. We then obtain the first polynomial-time algorithm for deciding whether a nonbinary phylogenetic network is tree-based. Additionally, we give a simple counterexample showing that the characterization based on zig-zag paths cannot be used to characterize nonbinary tree-based networks. However, we also show that nonbinary strictly-tree-based phylogenetic networks can be characterized using zig-zag paths. Consequently, also for this class of networks it can be decided in polynomial time whether a given network belongs to the class.

We also discuss “stable” phylogenetic networks [19], in which for each reticulation r there exists some leaf x such that all paths from the root to x go through r. We show that, although all binary stable phylogenetic networks are tree-based, this is not always the case for nonbinary networks.

This paper is organized as follows. We first give the definitions and new results for binary networks in Section 2, then the definitions and results for nonbinary networks in Section 3. Examples of how these results can be applied to real, biological, nonbinary phylogenetic networks are given in Section 4. We end with a discussion in Section 5.

2 BINARY PHYLOGENETIC NETWORKS

2.1 Preliminaries

First, some essential concepts around binary phylogenetic networks will be explained. Phylogenetic networks contain vertices and directed edges. Directed edges will be called arcs from now on.

Definition 1. A (rooted) binary phylogenetic network is a directed acyclic graph N = (V, A), which contains a single root with indegree 0 and outdegree 1 or 2 and may in addition contain the following types of vertices:
• vertices with outdegree 0, called leaves, which are labelled;
• vertices with indegree 2 and outdegree 1, called reticulations;
• vertices with indegree 1 and outdegree 2, called tree-vertices.
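
Definition 1 amounts to a degree check on each vertex. The following is a minimal Python sketch of that check, assuming the network is supplied as a list of arcs (u, v). The function name classify_vertices and the small example network are hypothetical and do not come from the paper, and acyclicity is not verified.

```python
from collections import Counter

def classify_vertices(arcs):
    """Label each vertex of a binary network by the degree rules of Definition 1.

    `arcs` is a list of (tail, head) pairs. Acyclicity is NOT checked here.
    Raises ValueError if a vertex fits none of the four types or if the
    number of roots is not exactly one.
    """
    indeg = Counter(head for _, head in arcs)
    outdeg = Counter(tail for tail, _ in arcs)
    kinds = {}
    for x in set(indeg) | set(outdeg):
        i, o = indeg[x], outdeg[x]
        if i == 0 and o in (1, 2):
            kinds[x] = "root"
        elif o == 0:
            kinds[x] = "leaf"
        elif i == 2 and o == 1:
            kinds[x] = "reticulation"
        elif i == 1 and o == 2:
            kinds[x] = "tree-vertex"
        else:
            raise ValueError(f"vertex {x!r} has indegree {i}, outdegree {o}")
    if sum(kind == "root" for kind in kinds.values()) != 1:
        raise ValueError("a network must have exactly one root")
    return kinds

# Hypothetical example: one reticulation r with parents p and q.
example = [("rho", "t"), ("t", "p"), ("t", "q"), ("p", "a"),
           ("p", "r"), ("q", "r"), ("q", "b"), ("r", "c")]
kinds = classify_vertices(example)
```

In this example, kinds maps "rho" to the root, "r" to a reticulation, "t", "p" and "q" to tree-vertices, and "a", "b" and "c" to leaves.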

An example of a binary phylogenetic network is given in Figure 1, in which leaves are coloured blue and reticulations are indicated with a pink shading around the nodes. A (rooted) binary phylogenetic tree is a binary phylogenetic network that contains no reticulations. Although every arc is drawn without an arrowhead, all arcs are directed towards the lower vertex. This is the case throughout the paper, unless explicitly mentioned otherwise.

Fig. 1. An example of a binary phylogenetic network with leaf labels a, b and c, which can e.g. represent three present-day species, and omnians x and y.

Take (u, v) = a ∈ A, an arc from vertex u to v. Then, a is called an outgoing arc of u and an incoming arc of v. Vertex u is a parent of v and v is called a child of u. If there is also an arc (u, w) ∈ A, then vertices w and v have a joint parent, so w and v are called siblings. When a non-leaf vertex z has only reticulations as children, then z is called an omnian. For example in Figure 1, vertices x and y are omnians, since both children of these vertices are reticulations. Vertices can be omnian and reticulation at the same time, see e.g. vertices u and v in Figure 3. Because of the importance of omnians, which will become clear later on in the paper, we always use square nodes for omnians and circular nodes for all other vertices.

Definition 2. A binary phylogenetic network N is tree-based with base-tree T, when N can be obtained from T via the following steps:
(i) Add vertices to the arcs of T. These vertices, called attachment points, have in- and outdegree 1.
(ii) Add arcs, called linking arcs, between pairs of attachment points, so that N remains binary and acyclic.
(iii) Suppress every attachment point that is not incident to a linking arc.

Note in particular that it is not allowed to create multiple linking arcs between the same pair of attachment points since N is required to be a binary phylogenetic network.

A binary phylogenetic network is tree-based if it is tree-based with base-tree T for some binary phylogenetic tree T. An example of the procedure is displayed in Figure 2. An example of a binary network that is not tree-based is given in Figure 3.
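
The construction in Definition 2 can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: the function names, the ("att", k, ...) labels for attachment points and the example tree are hypothetical. Each linking arc is described by the two arcs it connects, both of which are subdivided by a fresh attachment point, so step (iii) is vacuous because no unused attachment points are ever created.

```python
def _is_acyclic(arcs):
    """Kahn's algorithm: True if the directed graph has no directed cycle."""
    indeg, children = {}, {}
    for u, v in arcs:
        indeg.setdefault(u, 0)
        indeg[v] = indeg.get(v, 0) + 1
        children.setdefault(u, []).append(v)
    queue = [x for x, d in indeg.items() if d == 0]
    removed = 0
    while queue:
        x = queue.pop()
        removed += 1
        for c in children.get(x, []):
            indeg[c] -= 1
            if indeg[c] == 0:
                queue.append(c)
    return removed == len(indeg)

def add_linking_arcs(tree_arcs, links):
    """Build a network from base-tree arcs and a list of linking arcs.

    `tree_arcs`: iterable of (u, v) tuples, the arcs of the base-tree T.
    `links`: list of (tail_arc, head_arc); link k runs from a new attachment
    point on tail_arc to a new attachment point on head_arc. Later links may
    name sub-arcs created by earlier links.
    """
    arcs = set(tree_arcs)
    for k, (tail_arc, head_arc) in enumerate(links):
        if tail_arc == head_arc or tail_arc not in arcs or head_arc not in arcs:
            raise ValueError(f"link {k} must join two distinct existing arcs")
        s, t = ("att", k, "out"), ("att", k, "in")
        # Step (i): subdivide both arcs by an attachment point.
        for (u, v), point in ((tail_arc, s), (head_arc, t)):
            arcs.remove((u, v))
            arcs.update({(u, point), (point, v)})
        # Step (ii): add the linking arc between the two attachment points.
        arcs.add((s, t))
    if not _is_acyclic(arcs):
        raise ValueError("the linking arcs create a directed cycle")
    return arcs

# Hypothetical base-tree with leaves a and b, and one linking arc.
tree = [("rho", "t"), ("t", "a"), ("t", "b")]
network = add_linking_arcs(tree, [(("t", "a"), ("t", "b"))])
```

Because every attachment point receives exactly one linking arc, each new vertex has either indegree 1 and outdegree 2 or indegree 2 and outdegree 1, so the result stays binary whenever the base-tree is binary.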

A rooted spanning tree τ of a phylogenetic network N is a subgraph of N that is a rooted tree and contains all vertices (and a subset of the arcs) of N. A dummy leaf of a rooted spanning tree τ is a vertex that is not a leaf in network N, but is a leaf in τ. Hence, a binary phylogenetic network is tree-based if and only if it has a rooted spanning tree without dummy leaves.

We will make heavy use of the following bipartite graph. Let N = (V, A) be a binary phylogenetic network. The bipartite graph associated to N is the bipartite graph B = (U ∪ R, E) containing a vertex v_o ∈ U for each v ∈ V that is an omnian, a vertex w_r ∈ R for each w ∈ V that is a reticulation, and an edge {v_o, w_r} ∈ E for each (v, w) ∈ A with v an omnian and w a reticulation. Hence, for a vertex v ∈ V that is a reticulation and an omnian, there is a vertex v_o in U as well as a vertex v_r in R. For ease of notation, we will omit the subscripts from now on and refer to v_o and v_r simply as v. An example is given in Figure 4.

Fig. 2. From phylogenetic tree to phylogenetic network in steps (a) to (e), which shows that the network from Figure 1 is tree-based. Panels: (a) (Base-)tree T; (b) Attachment points added; (c) Linking arcs added; (d) Attachment points suppressed; (e) Redrawn as the network from Figure 1.

Fig. 3. An example of a non-tree-based binary phylogenetic network. Because arcs i and j are the only outgoing arcs of vertices u and v respectively, they would both have to be present in the base-tree. However, vertex x would then have two incoming arcs in the base-tree, which is not allowed.

Let N = (V, A) be a binary phylogenetic network. An antichain is a set of vertices K ⊆ V for which there is no directed path from any vertex in K to any other vertex in K. Network N satisfies the antichain-to-leaf property if for every antichain K in N there exists a path from every vertex in K to a leaf, so that these paths are arc-disjoint. This means, for example, that if there is an antichain of three vertices and there are only two leaves in the network, then the network clearly does not satisfy the antichain-to-leaf property.
An example of an antichain can be seen in Figure 3, where vertices α_1 and α_2 form an antichain.
The network does not satisfy the antichain-to-leaf property, because when we look at the antichain formed by vertices u and v, there are no arc-disjoint paths to leaves.

A vertex v is called stable if there exists a leaf l for which every path from the root to l passes through v. A network is called stable if every reticulation is stable.

Let G = (V, E) be a graph. If v, w ∈ V so that (v, w) ∈ E, then w is a neighbour of v. For a set S ⊆ V, the neighbours of S are denoted by Γ(S). A matching M ⊆ E is a set of edges so that no vertex v ∈ V is incident with more than one edge in M. A maximal path in G is a path that is not contained in a larger path.

The following known results will be useful.

Proposition 2.1. [7] Consider a binary phylogenetic network N over leaf set X.
(i) If the parents of each reticulation of N are tree-vertices, then N is tree-based.
(ii) If N has a reticulation whose parents are both reticulations, then N is not tree-based.

Proposition 2.2. [19] In a stable binary phylogenetic network N, the child and the parents of each reticulation are tree-vertices.

The following observation follows directly from the previous two propositions.

Corollary 2.3. Every binary stable phylogenetic network is tree-based.

2.2 Results

The following theorem will be used to obtain a simple graph-theoretic characterization of binary tree-based phylogenetic networks.

Theorem 2.4. Let N be a binary phylogenetic network and B = (U ∪ R, E) the bipartite graph associated to N. Network N is tree-based if and only if there exists a matching M in B with |M| = |U|.

Proof. Assume there exists a matching M in B with |M| = |U|, i.e., all omnians are covered by M. Construct a set A of arcs as follows: add the outgoing arc of every reticulation and the incoming arc of all tree-vertices to A. Additionally, for each edge of M, add the corresponding arc of N to A, if it has not yet been added. For every reticulation that has not yet been covered, add one of its incoming arcs to A, arbitrarily. The tree T, consisting of all vertices of N and the set of arcs A, is a rooted spanning tree, because there is precisely one incoming arc of every vertex contained in T.
Moreover, there are no dummy leaves, because U is covered by M. Hence, it follows that N is tree-based.

Now, assume that N is tree-based with base-tree T. Colour every edge of B that corresponds to an arc in T. When an omnian has outdegree 2 and both outgoing arcs are contained in T, decolourize one of the two corresponding edges of B, arbitrarily. Hence, each vertex of U is incident to at most one coloured edge. Since T is a rooted tree, it contains at most one incoming arc of each reticulation. Hence, also each vertex of R is incident to at most one coloured edge. So the coloured edges of B form a matching M. Because T is a base-tree, there are no dummy leaves, and so all omnians are covered by M.

This theorem can be used to verify whether a binary phylogenetic network N is tree-based or not in polynomial time, using an algorithm for maximum cardinality bipartite matching (see e.g. [20]).

We will look at an example of a binary phylogenetic network N and the bipartite graph B = (U ∪ R, E) associated to N in Figure 4. Since there exists a matching, which is coloured blue and dash-dotted in Figure 4(b), that covers U, the binary phylogenetic network in Figure 4(a) is tree-based. A base-tree T of network N can be seen in Figure 5, where the arcs that correspond to edges of the matching are dash-dotted and linking arcs are dashed.

Fig. 4. Using Theorem 2.4 to show that this is a tree-based binary phylogenetic network. Panels: (a) A rooted binary phylogenetic network N; (b) The bipartite graph associated to N, with a matching covering U indicated by dash-dotted lines.

Fig. 5. A base-tree T of the network in Figure 4(a).

Since a binary phylogenetic network that contains no reticulations is a rooted tree, such a network is clearly tree-based. The next theorem shows that this is still the case for all networks with one or two reticulations. On the other hand, Figure 6 shows a part of a network N that contains three reticulations and is not tree-based. So it follows that not all networks with three reticulations are tree-based.

Theorem 2.5.
If a binary phylogenetic network N contains at most two reticulations, then N is tree-based.

Proof. If N contains only one reticulation, then both parents of this reticulation are tree-vertices and with Proposition 2.1 it follows that N is tree-based.

Now consider the case that N contains exactly two reticulations x and y. If x and y do not have a parent-child relation, then both parents of x and y are tree-vertices and it follows from Proposition 2.1 that N is tree-based. Now suppose that x is the parent of y. There are two possibilities, x and y having a joint parent and x and y having different parents, both displayed in Figure 7.

Fig. 6. Local situation in a network that has three reticulations and is not tree-based.

Fig. 7. The two possibilities that can occur when reticulation x is the parent of reticulation y, used in the proof of Theorem 2.5. Panels: (a) N_1; (b) N_2.

From partial networks N_1 and N_2 of Figure 7 we create two bipartite graphs, A = (U ∪ R, E) associated to N_1 and B = (U ∪ R, E) associated to N_2, that are displayed in Figure 8.

Fig. 8. The bipartite graphs associated to the partial networks in Figure 7. Panels: (a) Bipartite graph A associated to network N_1; (b) Bipartite graph B associated to network N_2.

In both cases in Figure 8 it is easy to see that there is a matching that covers U. It then follows from Theorem 2.4 that N is tree-based.

To obtain a simple characterization of binary tree-based networks, we will use Hall’s Theorem, which is stated below.

Theorem 2.6 (Hall’s Theorem [21]). Let B = (U ∪ W, E) be a bipartite graph. There exists a matching in B that covers U if and only if, for every U_1 ⊆ U, the number of different neighbours of the vertices in U_1 is at least |U_1|.

Consider Hall’s Theorem and Theorem 2.4. Combining those two theorems gives a characterization for a binary phylogenetic network to be tree-based.

Corollary 2.7. Let N be a binary phylogenetic network and U the set of all omnians of N. Then N is tree-based if and only if for all S ⊆ U the number of different children of the vertices in S is greater than or equal to the number of omnians in S.

Proof. Follows directly from Theorem 2.4 and Theorem 2.6.
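
Theorem 2.4 turns the tree-basedness test into a bipartite matching problem. The following Python sketch illustrates this, assuming the network is given as a list of arcs. It uses a simple augmenting-path matching routine rather than any particular algorithm from [20], and the function name is_tree_based and both example networks are hypothetical. The recursion is adequate for small examples; very large networks would call for an iterative matching algorithm.

```python
from collections import defaultdict

def is_tree_based(arcs):
    """Test the condition of Theorem 2.4 on a binary network given as arcs.

    Builds the bipartite graph B between omnians (U) and reticulations (R)
    and checks whether some matching covers every omnian.
    """
    children = defaultdict(list)
    indeg = defaultdict(int)
    for u, v in arcs:
        children[u].append(v)
        indeg[v] += 1
    reticulations = {v for v, d in indeg.items() if d >= 2}
    # Omnians: non-leaf vertices all of whose children are reticulations.
    omnians = [u for u, cs in children.items()
               if all(c in reticulations for c in cs)]

    match = {}  # reticulation -> omnian currently matched to it

    def augment(o, seen):
        """Try to match omnian o, re-routing earlier matches if needed."""
        for r in children[o]:
            if r in seen:
                continue
            seen.add(r)
            if r not in match or augment(match[r], seen):
                match[r] = o
                return True
        return False

    return all(augment(o, set()) for o in omnians)

# Hypothetical network with two reticulations x and y, where x is a parent of y.
good = [("rho", "t1"), ("rho", "t2"), ("t1", "p"), ("t1", "q"),
        ("p", "x"), ("p", "a"), ("q", "x"), ("q", "b"),
        ("x", "y"), ("t2", "y"), ("t2", "c"), ("y", "d")]

# Hypothetical network in which reticulation x has two reticulation parents u, v.
bad = [("rho", "t1"), ("rho", "t2"), ("t1", "p1"), ("t1", "p2"),
       ("t2", "p3"), ("t2", "p4"), ("p1", "u"), ("p1", "a"),
       ("p2", "u"), ("p2", "b"), ("p3", "v"), ("p3", "d"),
       ("p4", "v"), ("p4", "e"), ("u", "x"), ("v", "x"), ("x", "c")]
```

In the second network the omnians u and v share their only child x, so S = {u, v} violates the condition of Corollary 2.7 and no matching covers U. This agrees with Proposition 2.1(ii).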

An example of how this theorem and corollary can be applied is given in Figure 9, where an example of a binary phylogenetic network N is displayed in (a) and the bipartite graph B = (U ∪ R, E) associated to N in (b). Omnians are indicated as square nodes and reticulations are marked with a pink shading around the nodes.

From the bipartite graph in Figure 9 it follows with Hall’s Theorem, with S = U, that there exists no matching in B that covers U. Therefore, with Theorem 2.4 it follows that N in Figure 9(a) is not tree-based. Indeed, we can directly see in N that the omnians in S = {a, i, h, f, g} have only four different children {b, c, d, e} (note that, in general, the set S and the set of children of vertices in S do not have to be disjoint). Hence this network is not tree-based.

Fig. 9. Example of a non-tree-based binary phylogenetic network N and the bipartite graph B associated to N.

In a previous article, the following necessary condition for a network to be tree-based was found.

Proposition 2.8. [7] If a binary phylogenetic network over leaf set X is tree-based, then it satisfies the antichain-to-leaf property.

On the other hand, if a network is not tree-based, it can still satisfy the antichain-to-leaf property, as shown by the example from [7] in Figure 10. However, it is difficult to see in this network what is going on. Therefore, we show in Figure 11 two examples of local structures that cause a network to be non-tree-based. The example in Figure 11(a) is similar to the local structure that causes the network of Figure 10 not to be tree-based, which can be checked using Corollary 2.7.

At first sight, it might look like networks containing the local structures in Figure 11 cannot satisfy the antichain-to-leaf property. However, in Figure 12 we show that it is indeed possible that they do satisfy this property. Although it was already known that networks that satisfy the antichain-to-leaf property are not necessarily tree-based, these figures illuminate why this is possible.

Looking at the examples in Figure 11, we see that a pattern has emerged. In (a) the pattern is marked dashed in green.
Starting at vertex o_1 and ending at vertex o_5, we see a zigzag starting with an omnian, alternating between reticulations and omnians, eventually ending with an omnian. The last omnian in the pattern can be a reticulation that is already part of the path, as can be seen in Figure 11(b).

Fig. 10. Not tree-based binary phylogenetic network satisfying the antichain-to-leaf property [7].

Fig. 11. Examples of local structures of binary phylogenetic networks that are not tree-based. Panels: (a) and (b).

The next theorem shows that every binary phylogenetic network that is not tree-based contains a local structure as in the examples in Figure 11.

Theorem 2.9. Let N be a binary phylogenetic network and B = (U ∪ R, E) the bipartite graph associated to N. Network N is tree-based if and only if B contains no maximal path which starts and ends in U.

Proof. Notice that every vertex in B is of degree at most 2. Therefore, B is a disjoint union of paths and cycles. Hence, for each connected component B′ = (U′ ∪ R′, E′) of B, there are four possible topologies:
i) A maximal path that begins and ends in R.
ii) A maximal path that begins in U and ends in R.
iii) A maximal path that begins and ends in U.
iv) A circuit.

i) All vertices in R are of degree at most 2. Because the maximal path begins and ends in R, all vertices in U′ have degree 2. Let S ⊆ U′. Recall that Γ(S) denotes the set of neighbours of vertices in S. The number of edges incident to S = 2|S| ≤ the number of edges incident to Γ(S) ≤ 2|Γ(S)|. Therefore, |S| ≤ |Γ(S)| for all S ⊆ U′. It follows from Hall’s Theorem that there exists a matching in B′ that covers U′.

ii) All vertices in R are of degree at most 2. All vertices in U′ have degree 2, except for the omnian o_1 where the maximal path begins. Let S ⊆ U′. Consider the subgraph of B′ induced by S ∪ Γ(S). It consists of paths. First consider such a path that does not contain o_1. Then the path must begin and end in R, because every omnian in S except for o_1 has two neighbours in Γ(S). Hence, the paths contain more reticulations than omnians. Now consider a path that contains o_1.
Then it is a path that begins in U and ends in R. Hence, it contains as many omnians as reticulations. It follows that all paths together, i.e. the subgraph of B′ induced by S ∪ Γ(S), contain at least as many reticulations as omnians, so |S| ≤ |Γ(S)|. Again, since this holds for all S ⊆ U′, it follows from Hall’s Theorem that there exists a matching in B′ that covers U′.

iii) Let S = U′. Then |S| = |Γ(S)| + 1. Hence, it follows from Hall’s Theorem that there does not exist a matching in B that covers U.

iv) All vertices in B′ are of degree 2 and it follows in the same way as in case i) that |S| ≤ |Γ(S)| for all S ⊆ U′ and hence that there exists a matching in B′ that covers U′.

Hence, there exists a matching in B that covers U precisely if there is no maximal path that starts and ends in U. The theorem now follows from Theorem 2.4.

Fig. 12. Local structures of binary phylogenetic networks that satisfy the antichain-to-leaf property but are not tree-based. Panels: (a) and (b).

Proposition 2.1 showed that a binary phylogenetic network is tree-based if for each reticulation both parents are tree-vertices and not tree-based if for at least one reticulation both parents are reticulations. However, in the situation in which a reticulation in N has one parent that is a reticulation and the other a tree-vertex it is not immediately clear if N is tree-based or not. The next corollary shows that such networks are tree-based if an additional condition is fulfilled.

Corollary 2.10. If for every reticulation r in a binary phylogenetic network N either
(i) both parents of r are tree-vertices; or
(ii) one parent of r is a tree-vertex and the sibling of r is a tree-vertex or a leaf,
then N is tree-based.

Proof. Let B = (U ∪ R, E) be the bipartite graph associated to N. Since all vertices in B have degree at most two, each connected component of B is a path or a cycle. Assume that there exists a maximal path P that starts and ends in U. Let u ∈ U be the first vertex on this path. Since u is an omnian with only one child, it is also a reticulation. Let r be the only child of u. Since one parent of r is a reticulation, the other parent p of r must be a tree-vertex and the sibling of r also a tree-vertex or a leaf.
However, that means that p is not an omnian. Hence, path P ends in r, which is a contradiction to the assumption that P ended in U. It follows that there is no path that starts and ends in U. By Theorem 2.4 it follows that N is tree-based.

The following characterization of binary tree-based phylogenetic networks follows directly from Theorem 2.9, thus providing an alternative proof of this characterization which was independently discovered (in a slightly different form) by Louxin Zhang [9]. See Figure 11 for examples. We call a sequence (u_1, v_1, ..., u_k, v_k, u_{k+1}) of 2k + 1 vertices (k ≥ 1) of a network N a zig-zag path if v_i is the child of u_i and u_{i+1} for i = 1, ..., k.

Corollary 2.11. A binary phylogenetic network is tree-based if and only if it contains no zig-zag path (o_1, r_1, ..., o_k, r_k, o_{k+1}), with k ≥ 1, in which r_1, ..., r_k are reticulations, o_1, ..., o_{k+1} are omnians and o_1 and o_{k+1} are reticulations as well as omnians.

3 NONBINARY PHYLOGENETIC NETWORKS

3.1 Preliminaries

We start with the definition of nonbinary networks.¹ An example is given in Figure 13.

¹ Whenever we refer to nonbinary, we mean “not-necessarily-binary”.

Fig. 13. Example of a nonbinary phylogenetic network.

Definition 3. A (rooted) nonbinary phylogenetic network is a directed, acyclic graph N = (V, A) that contains a single root with indegree 0 and outdegree 1 or more and may additionally contain the following types of vertices:
• vertices with indegree 1 and outdegree 0, called leaves (coloured blue in Figure 13), which are labelled;
• vertices with outdegree 1 and indegree 2 or more, called reticulations (marked in pink in Figure 13);
• vertices with indegree 1 and outdegree 2 or more, called tree-vertices.

A nonbinary phylogenetic tree is a nonbinary phylogenetic network without reticulations.

We will consider two different variants of tree-basedness of nonbinary networks, which we name “tree-based” and “strictly-tree-based”.

Definition 4. A nonbinary phylogenetic network N is called tree-based with base-tree T, when N can be obtained from T via the following steps:
(a) Add some vertices to arcs of T. These vertices, called attachment points, have in- and outdegree 1.
(b) Add arcs, called linking arcs, between pairs of attachment points and from tree-vertices to attachment points, so that N remains acyclic and so that attachment points have indegree or outdegree 1.
(c) Suppress every attachment point that is not incident to a linking arc.

Definition 5. A nonbinary phylogenetic network N is called strictly tree-based with base-tree T, when N can be obtained from T via the following steps:
(1) Add some vertices to arcs in T. These vertices, called attachment points, have in- and outdegree 1.
(2) Add arcs, called linking arcs, between pairs of attachment points, so that N remains acyclic and so that exactly one linking arc is attached to each attachment point.

A nonbinary phylogenetic network is tree-based if it is tree-based with base-tree T for some nonbinary phylogenetic tree T. Similarly, a nonbinary phylogenetic network is strictly tree-based if it is strictly tree-based with base-tree T for some nonbinary rooted phylogenetic tree T.

The distinction between tree-based and strictly tree-based is illustrated by two examples in Figure 14. An example of a strictly-tree-based nonbinary network can be found in Figure 18(a).

Fig. 14. Two nonbinary networks that are both tree-based but not strictly tree-based. The black solid lines indicate possible base-trees, while the dashed, grey lines are the linking arcs. Panels: (a) This network is not strictly tree-based because the linking arc is attached to a vertex of the base-tree, rather than to an attachment point, and this cannot be avoided; (b) This network is not strictly tree-based because two linking arcs are attached to the same attachment point, and this cannot be avoided.

The following observations are easily verified.