Bioinformatics Whole-Plant Growth Stage Ontology for Angiosperms and Its Application in Plant Biology1[OA] AnuradhaPujar2,PankajJaiswal2,ElizabethA.Kellogg2,KaticaIlic2,LeszekVincent2,ShulamitAvraham2, PeterStevens2,FelipeZapata2,LeonoreReiser3,SeungY.Rhee,MartinM.Sachs,MarySchaeffer, LincolnStein,DoreenWare,andSusanMcCouch* Department of Plant Breeding, Cornell University, Ithaca, New York 14853 (A.P., P.J., S.M.); Department of Biology, University of Missouri, St. Louis, Missouri 63121 (E.A.K., P.S., F.Z.); Department of Plant Biology, Carnegie Institution, Stanford, California 94305 (K.I., L.R., S.Y.R.); Division of Plant Sciences, University of Missouri, Columbia, Missouri 65211 (L.V., M.S.); Cold Spring Harbor Laboratory, Cold Spring Harbor, New York 11724 (S.A., L.S., D.W.); Missouri Botanical Garden, St. Louis, Missouri 63110 (P.S., F.Z.); Maize Genetics Cooperation-Stock Center, Department of Crop Sciences, University of Illinois, Urbana, Illinois 61801 (M.M.S.); and Agricultural Research Service, United States Department of Agriculture, Washington, DC 20250 (M.M.S., M.S., D.W.) Plant growth stages are identified as distinct morphological landmarks in a continuous developmental process. The terms describingthesedevelopmentalstagesrecordthemorphologicalappearanceoftheplantataspecificpointinitslifecycle.The widely differing morphology of plant species consequently gave rise to heterogeneous vocabularies describing growth and development.Eachspeciesorfamilyspecificcommunitydevelopeddistinctterminologiesfordescribingwhole-plantgrowth stages. This semantic heterogeneity made it impossible to use growth stage description contained within plant biology databasestomakemeaningfulcomputationalcomparisons.ThePlantOntologyConsortium(http://www.plantontology.org) wasfoundedtodevelopstandardontologiesdescribingplantanatomicalaswellasgrowthanddevelopmentalstagesthatcan be used for annotation of gene expression patterns and phenotypes of all flowering plants. In this article, we describe the developmentofagenericwhole-plantgrowthstageontologythatdescribesthespatiotemporalstagesofplantgrowthasaset of landmark events that progress from germination to senescence. This ontology represents a synthesis and integration of terms and concepts from a variety of species-specific vocabularies previously used for describing phenotypes and genomic information.Itprovidesacommonplatformforannotatinggenefunctionandgeneexpressioninrelationtothedevelopmental trajectory of a plant described at the organismal level. As proof of concept the Plant Ontology Consortium used the plant ontologygrowthstageontologytoannotategenesandphenotypesinplantswithinitialemphasisonthoserepresentedinThe Arabidopsis Information Resource, Gramenedatabase,andMaizeGDB. Plant systems are complex, both structurally and thisdatawithinthecontextofanyparticularbiological operationally, and the information regarding plant system (Gopalacharyulu et al., 2005). The ability to development requires extensive synthesis to provide extract knowledge from historical sources and inte- a coherent view of their growth and development. grateitwithnewinformationderivedfromglobaldata- The difficulty of developing such a synthesis is exac- setsrequires a sophisticated approach to data mining erbated by the deluge of new technologies such as andintegration. high-throughputgenotyping, microarrays, proteomics, Historically, the growth and development of culti- transcriptomics, etc., that generate large amounts of vated plants have been monitored at the whole-plant data rapidly. The speed and magnitude of data depo- level with the help of scales of easily recognizable sitionchallengesourabilitytorepresentandinterpret growthstages.Consequently,thereexistlargevolumes of literature detailing growth stages for individual 1ThisworkwassupportedbytheNationalScienceFoundation plant species or closely related groups of species. For (grantno.0321666). example,Zadok’sscale(Zadoketal.,1974)wasdevel- 2Theseauthorscontributedequallytothepaper. opedfortheTriticeaecropsandiswidelyusedtostage 3Present address: Molecular Sciences Institute, 2168 Shattuck the growth and development of cereal crops in the Ave.,Berkeley,CA94704. UnitedStates.Theflexibilityofthisscalehasallowedit *Corresponding author; e-mail [email protected]; fax 1–607– to be extended to other cultivated plants, and a uni- 255–6683. form code called the Biologische Bundesanstalt, Bun- Theauthorresponsiblefordistributionofmaterialsintegraltothe dessortenamt, and Chemical Industry (BBCH) code findings presented in this article in accordance with the policy was developed from it (Meier, 1997). The BBCH scale describedintheInstructionsforAuthors(www.plantphysiol.org)is: is quite generic and encompasses multiple crops, SusanMcCouch([email protected]). [OA]Open Access articles can be viewed online without a sub- including monocot and eudicot species. It offers scription. standardized descriptions of plant development in www.plantphysiol.org/cgi/doi/10.1104/pp.106.085720 the order of phenological appearance, and has coded 414 Plant Physiology, October 2006, Vol. 142, pp. 414–428, www.plantphysiol.org (cid:2) 2006 American Society of Plant Biologists Plant Growth Stage Ontology each stage for easy computer retrieval. It should be ogies (controlled vocabularies) to facilitate the anno- noted that Arabidopsis (Arabidopsis thaliana), as a tationofgeneticinformation.Forexample,theGramene representative of the Brassicaceae and by virtue of database(Jaiswaletal.,2006)designeditscerealgrowth not being a cultivated species, did not have a specific stage ontology based on the stages described in the growth stage vocabulary or scale until 2001 when standard evaluation system for rice (Oryza sativa; Boyes et al. (2001) developed an experimental plat- INGER, 1996) and those described by Counce et al. form describing the Arabidopsis growth stages using (2000)forrice,byZadoketal.(1974)forTriticeae(wheat the BBCH scale. This work created a crucial semantic [Triticumaestivum],oat[Avenasativa],andbarley[Hordeum link between Arabidopsis and cultivated plants. In vulgare]),andbyDoggett(1988)forsorghum.Exceptfor additiontofacilitatingthedescriptionandsynthesisof the sorghum, which is a less studied crop, these species large amounts of data within a crop species, vocabu- had fairly well-described growth staging vocabularies. lariesliketheBBCHandZadok’sscalealsomakepos- MaizeGDB (Lawrence et al., 2005) developed a very sible transfer of information among researchers and extensivecontrolledvocabularyfromamodifiedversion provide a common language for comparative pur- ofthatdescribedbyRitchieetal.(1993).TheArabidopsis poses (Counce etal., 2000). InformationResource(TAIR;Rheeetal.,2003)developed In the post genomic era, these scales have proved the Arabidopsis growth stage ontology from the scale inadequate to handle the deluge of information that described by Boyes et al. (2001). However, ontologies requiredlarge-scalecomputationforcomparativeanal- createdintheseprojectsremainedrestrictedtoparticular ysis. This called for the conversion of existing scales species or families, whereas comparative genomics re- into ontology that have an advantage over simple quiresthatacommonstandardvocabularybeappliedto scales because their hierarchical organization facili- abroadrangeofspecies.TheuniformBBCHscale(Meier, tates computation across them. Terms in an ontology 1997) appeared to be a suitable model to develop a areorganizedintheformofatree,thenodesofthetree unifiedontologysincethisscalehadalreadysynthesized represent entities at greater or lesser levels of detail monocotandeudicotcropstagesintoasinglevocabulary. (Smith, 2004). The branches connecting the nodes ThePlantOntologyConsortium(POC)wasinaugu- represent the relation between two entities such that rated in 2003 for the purpose of developing common thetermradicleemergencestageisachildofthepar- ontologies to describe the anatomy, morphology, and ent term germination stage (Fig. 1). Individual stages growthstagesoffloweringplants(Jaiswaletal.,2005). ofascalearethenpartsthatcanberelatedtothewhole Itsprimarytaskwastointegrateandnormalizeexist- by their order of appearance during plant growth. ingspecies-specificontologiesorvocabulariesthathad Each term carries a unique identifier and strictly been developed by several major databases for the specified relationships between the terms allow sys- purpose of annotating gene expression and mutant tematicorderingofdatawithinadatabase,thisinturn phenotype. The plant ontology (PO) is divided into improvesinputandretrievalofinformation(Bardand two aspects. The first is the plant structure ontology Rhee, 2004; Harris et al., 2004). (PSO), a vocabulary of anatomical terms (K. Ilic, E.A. Consequently, several species-specific databases Kellogg, P. Jaiswal, F. Zapata, P. Stevens, L. Vincent, converted BBCH and other scales into formal ontol- S.Avraham,L.Reiser,A.Pujar,M.M.Sachs,S.McCouch, Figure1. Theparentandchildtermorga- nization in the whole-plant GSO. The solidcurvedlinesjoiningthetermsrepre- sent IS_A relationship and the dotted curvedlinessuggestaPART_OFrelation- ship between the child and the parent terms.Atermmayormaynothaveachild term. In this example, germination IS_A vegetativestageandfloweringIS_Arepro- ductive stage. Similarly vegetative stage, reproductive stage, senescence, and dor- mancyaresubtypes(IS_A)ofwhole-plant growthstage. Rootemergence andshoot emergence are PART_OF the seedling growth stage. The seedling growth stage andimbibitionarePART_OFgermination. Inthisimagenotallthechildrentermsare shownforeveryparenttermintheGSO. Plant Physiol. Vol. 142, 2006 415 Pujar et al. M. Schaeffer, D. Ware, L. Stein, and S.Y. Rhee, unpub- tive growth and flowering terms are related by IS_A, lisheddata),which,sinceitsreleasetothepublicdomain because flowering is a type of reproductive growth. in 2004, has become widely used by plant genome On the other hand, seedling growth is related to its databases(Jaiswaletal.,2005).Thesecondaspectisthe children terms radicle emergence and shoot emer- plantgrowth and developmentalstages ontology. This gence by PART_OF, because seedling growth is com- componentofPOisfurtherdividedintothewhole-plant prised of the two processes of radicle emergence and growthstageontology(GSO)developedbythisproject shoot emergence(Figs. 1 and2A). and the plant part developmental stages. This article Eachtermisgivenauniqueaccessionnumbernamed focuses on the whole-plant GSO; we will discuss the PO:XXXXXXX where the series of X’s is a seven-digit history,design,andapplicationsoftheGSOandshow number (Fig. 2A). Accession numbers are never re- how it simplifies the description of a continuous and used, even when the term is retired or superseded. complex series of events in plant development. The Obsolete terms are instead moved to a location in the plant part developmental stages will be reviewed hierarchyunderneathatermnamedobsolete_growth_ elsewhere. and_developmental_stage(Fig.2A).Thisensuresthat there is never any confusion about which term an ac- cessionnumberrefersto.Eachtermalsohasahuman- RESULTS readablenamelikeseedlinggrowth,aparagraph-length definitionthatdescribesthecriteriaforidentifyingthe TheGSOwasdevelopedoveraperiodoftwoyears stage, and citations that attribute the term to a source (2004–2006) by a team of plant biologists comprising database,journalarticle,oranexistingstagingsystem. systematists, molecular biologists, agronomists, plant Many terms also have a synonym list; these are de- breeders, and bioinformaticians. We worked to de- scribed in moredetailbelow. velop a set of terms to describe plant development Our choice of the GO data model was driven by from germination to senescence that would be valid numerouspracticalconsiderations,foremostofwhich across a range of morphologically distinct and evolu- wasthefactthatthedatamodelissupportedbyarich tionarilydistantspecies.Althoughtherateofaddition setofdatabaseschemas,editingtools,annotationsys- of new terms to the GSO has slowed since its initial tems, andvisualization tools. stagesofdevelopment,itisstillunderactivedevelop- ment as we refine the ontology in response to user Naming of Plant Growth Stages input andfeedback from database curators. Currently, the GSO has a total of 112 active terms, Thenextissuewedealtwithwashowtonameplant eachorganizedhierarchically(Figs.1and2A)andasso- growthstages.Althoughdevelopmentinanyorganism ciatedwithahuman-readabledefinition.Althoughwe is a continuous process, it is important to have land- startedwithexistingsystemssuchastheBBCH(Meier, marksthatidentifydiscretemilestonesoftheprocessin 1997)aswellasthecontrolledvocabulariesdeveloped awaythatiseasilyreproducible.Extantsystemseither forArabidopsisbyTAIR,forrice,Triticeae(wheat,oat, name growth stages according to a landmark (e.g. and barley), and sorghum by Gramene database, and three-leaf stage) or by assigning a number or other formaize(Zeamays)byMaizeGDB,thecurrentversion arbitrarylabeltoeachstage.Wechosetodefinegrowth ofGSOisquitedistinctfromitspredecessors.Wewill stagesusingmorphologicallandmarksthatarevisible first discuss the major design issues we dealt with to the naked eye (Counce et al., 2000), because such duringthedevelopmentoftheGSO,andthendescribe descriptive terms are more intuitive, self explanatory the structure of the GSO and its applications to real- to the users, and easy to record in an experiment. To worldproblems.Theontologyterms,database,andgene minimize differences among species, we were able to annotation statistics provided here are based on the describe many growth stages using measurements/ April,2006 releaseof the POC database. landmarks that are in proportion to the fully mature state. For example, the inflorescence stages are de- Architecture of the Ontology scribedinprogressionstartingfromthe‘‘inflorescence justvisible,’’‘‘1/4inflorescencelengthreached,’’‘‘1/2 We chose to make use of the data model originally inflorescence length reached,’’ to ‘‘full inflorescence developed for the gene ontology (GO) to describe the length reached.’’ This provides an objective measure- GSO. This data model uses a directed acyclic graph ment of the degree of maturation of the inflorescence (DAG) to organize a hierarchy of terms such that the inawaythatisnotdependentontheabsolutevalueof most general terms are located toward the top of the theinflorescencelength. hierarchy while the most specific ones are located at the bottom of the hierarchy (Figs. 1 and 2, A and B). Synonyms Each parent term has one or more children, and the relationshipbetweenaparentandoneofitschildrenis Because the GSO crosses species and community named either IS_A to indicate that the child term is a boundaries, we needed to acknowledge the fact that specific type of the parent term, or PART_OF to each community has its own distinct vocabulary for indicate that the child term is a component of the describing plant structures and growth stages. To ac- parentterm(Smith,2004).Forexample,thereproduc- commodate this, we made liberal use of the GO data 416 Plant Physiol. Vol. 142, 2006 Plant Growth Stage Ontology model’s synonym lists, which allow any GSO term to Asanexampleofasynonym,consider‘‘doughstagein have one or more synonyms from species-specific wheat’’ and ‘‘kernel ripening in maize,’’ both of which vocabularythatareconsideredequivalenttotheofficial essentiallyrefertothefruitripeningstage.Theseterms termname.TheGSOcurrentlycontains997synonyms are included as synonyms to the generic (species- takenfromseveralplantspecies.Onanaveragethere independent) GSO term ‘‘ripening’’ (PO:0007010; areaboutninesynonymsperGSOterm(TableI).Like http://www.plantontology.org/amigo/go.cgi?view5 terms,weattributesynonymstothedatabase,literature details&show_associations5terms&search_constraint5 reference,ortextbookfromwhichtheywerederived. terms&depth50&query5PO:0007010). From the end Figure2. TheGSOasseenontheontologybrowseravailableathttp://www.plantontology.org/amigo/go.cgi.A,Forbrowsing, simplyclickonthe[1]iconbeforethetermnameplantgrowthanddevelopmentalstages,andthenonthe[1]nexttowhole- plantgrowthstages(GSO).Thiswillexpandthetreebyopeningthechildrenterms.ThePOIDistheterm’saccessionnumber, andthenumberfollowedbythetermnameisthetotalnumberofassociationsthathavebeencuratedtothegenesforagiven term.Thisnumberwillchangedependingonthegeneproductfilterausermayhavechosen.Userscanalsogetapiechart showingthedistributionofdataassociationstoaterm’schildrenterm.Inthisimage,thegenerallevel(toplevel)termsintheGSO are‘‘A_Vegetativegrowth,’’‘‘B_Reproductivegrowth,’’‘‘C_Senescence,’’and‘‘D_Dormancy.’’Thesubstagesof‘‘A_Vegetative growth’’ are ‘‘0_Germination,’’ ‘‘1_Main Shoot Growth,’’ and ‘‘2_Formation of Axillary Shoot,’’ while the substages of ‘‘B_Reproductive growth’’ are ‘‘3_Inflorescence Visible,’’ ‘‘4_Flowering,’’ ‘‘5_Fruit Formation,’’ and ‘‘6_Ripening.’’ Neither ‘‘C_Senescence’’ nor ‘‘D_Dormancy’’ currently has substages beneath them. The alphanumeric prefixes serve to make the substagesappearintheorderinwhichtheyoccurduringtheplant’slifecycle.Ifthetemporalorderisnotdefinedconsistentlyin allplants,thetermsmaynothavetheseprefixes.Theprefixesareusuallyabbreviationsofthetermname;forexample,LPisfor leafproduction,SEisforstemelongation.Thenumericalportionusesdoubledigitsstartingwith01,02,andsoon.Eachofthe substagesmayhavemorespecificstagesbeneathit.Whenatermisretiredorsuperseded,itisconsideredObsolete.Suchterms aremovedtoalocationinthehierarchyunderneathatermnamed‘‘obsolete_growth_and_developmental_stage.’’B,Adetailed viewofthesubstagePO:0007133,‘‘Leafproduction’’anditschildren.Childrentermsupto20leavesvisiblewereaddedto accommodatethegrowthstagerequirementsofthemaizeplant. Plant Physiol. Vol. 142, 2006 417 Pujar et al. ordering is represented using either the DERIVED_ TableI. Asummaryofthenumberofsynonymsintegratedintothe FROM, DEVELOPS_FROM, or OCCURS_AT_OR_ GSOfromeachspecies/family/source AFTER relationship to indicate that one structure is The integration of synonyms for Soybean and Solanaceae is in derivedfromanotherorthatonestagefollowsanother. progress. However, we found these solutions to be unworkable Species/Source No.ofSynonyms fortheGSObecauseoftherequirementthattheontol- Arabidopsis 93 ogy must represent growth stages across multiple Rice 23 species. For example, consider the process of main Maize 162 shoot growth. In the wheat plant, main shoot growth Wheat 65 may be completed at the nine-leaf stage, while in rice Oat 65 and maize, shoot growth may be completed at the Barley 65 11-and20-leafstages,respectively,andthisvarieswith Sorghum 13 BBCHandZadokscales 381 different cultivars/germplasms (Fig. 3). Transition to Soybean 79 the subsequent stage of reproductive growth is thus Solanaceae(mainlytomato) 51 staggered for each species, and cannot be accurately Allthespecies 997 described by an ontology in which each stage rigidly Averageno.ofsynonymsperGSOterm About9 followsanother. Our compromise is to visually order the display of user’spoint of view, thesynonyms can beused inter- terms in a temporal and spatial fashion, but not to changeably with the generic terms when searching build this ordering into the structure of the ontology databasesthat usethe PO.This means that dataasso- itself. In practice, what we do is add alphabetic and ciatedwiththeripeningstageofallplantsisaccessible numeric prefixes to each term. When terms are dis- eventoana¨ıveuser,irrespectiveofthevariabletermi- playedtheuserinterfacetoolssortthemalphabetically nology, diverse morphologies, and differing develop- so that later stages follow earlier ones (Fig. 2A). This mentaltimelinesofplantssuchaswheatandmaize. compromise is similar to the one taken by the Dro- By using synonyms, we were able to merge 98% of sophiladevelopmentalstagesontology(Flybase,http:// terms from the various species-specific source ontol- flybase.org/;OBO,2005). ogies into unambiguous generic terms. In a few cases As an example of how this works, we describe the we encountered identical terms that are used by dif- stages of leaf production using terms named ‘‘LP.01 ferentcommunitiestorefertobiologicallydistinctstages. one leaf visible,’’ ‘‘LP.02 two leaves visible,’’ ‘‘LP.03 Weresolvedsuchcasesbyusingthesensuqualifierto three leaves visible,’’ and so forth. When displayed indicatethatthetermhasaspecies-specific(notgeneric) using the ontology web browser, the terms appear in meaning.Oneexampleofthis,describedinmoredetail theirnaturalorder(Fig.2B).However,thereisnothing later,is‘‘inflorescencevisible’’(tothenakedeye)versus hard wired into the ontology that indicates that LP.01 ‘‘inflorescence visible (sensu Poaceae).’’ In most plants oneleaf visible precedes LP.02 two leaves visible. theinflorescencebecomesvisibletothenakedeyesoon A related issue is the observation that during plant after it forms, whereas in Poaceae (grasses), the inflo- maturation, multiple developmental programs can rescenceonlybecomes visiblemuchlaterinits devel- proceed in parallel. For instance, the processes of leaf opment,afteremergencefromtheflagleafsheath. production and stem elongation, although coupled, aretemporallyoverlappingandcanproceedatdiffer- ent relative rates among species and among cultivars Spatiotemporal Representation withinaspecies.Werepresentsuchprocessesasinde- Less satisfactory is the design compromise that we pendentchildrenofamoregenericterm.Inthecaseof reachedtorepresentthespatialandtemporalordering the previous example, both leaf production and stem of terms. The existing plant growth scales are orga- elongation are represented as types of main shoot nized by the temporal progression of developmental growth usingthe IS_Arelationship (Figs.1 and 2A). events.However,theGOdatamodelpresentsunique challenges in designing an ontology that represents Description of the Ontology thetemporalorderingoftermsacrossmultiplespecies that display small butkeyvariationsinthat ordering. ThefourmaindivisionsoftheGSOare‘‘A_Vegetative In particular, the GO data model does not have a growth,’’ ‘‘B_Reproductive growth,’’ ‘‘C_Senescence,’’ standard mechanism for representing organisms’ de- and‘‘D_Dormancy’’(Fig.2A).Asdescribedearlier,the velopmentaltimelines.Thishasforcedeachorganism alphabeticprefixistheretoforcethesefourdivisionsto database that has sought to represent developmental bedisplayedintheorderinwhichtheyoccurduringthe eventsusingtheGOmodeltograpplewiththeissueof plant’slifecycleingeneral.Thesubstagesof‘‘vegetative representingadynamicprocessinastaticrepresenta- growth’’are‘‘0_Germination,’’‘‘1_MainShootGrowth,’’ tion.Someanimalmodelorganismdatabases,suchas and‘‘2_FormationofAxillaryShoot,’’whilethesubstages WormBase for Caenorhabditis elegans, Flybase for Dro- of‘‘reproductivegrowth’’are‘‘3_InflorescenceVisible,’’ sophila, Zfin for Zebra fish, have developed develop- ‘‘4_Flowering,’’‘‘5_FruitFormation,’’and‘‘6_Ripening.’’ mentalstageontologies(OBO,2005)inwhichtemporal Neither ‘‘C_Senescence’’ nor ‘‘D_Dormancy’’ currently 418 Plant Physiol. Vol. 142, 2006 Plant Growth Stage Ontology Figure3. Correspondinggrowthstagesindifferentplantsandadvantagesofusingbroadandgranulartermsforannotations.In thisexampleonecansayfloweringoccursinplantAatthesix-leafvisiblestage,inplantBatthenine-leafvisiblestage,andin plantCatthe11-leafvisiblestage.PlantsAtoCrepresenteitherdifferentgermplasmaccessions/cultivarsofthesamespeciesor accessions/cultivarsfromdifferentspecies.Thisnomenclatureallowstheresearchertorecordwhenageneisexpressedora phenotypeisobservedbyfollowingthegradualprogressionoftheplant’slifecycle.Forexample,ifageneisexpressedatthesix- leaforthefifth-internodestage,themeaningisnowclear,whileinthepast,theinformationhadtoberecordedasthefifthleaf fromthetopoftheplant.Suchannotationrequiredthatonewaituntiltheplantcompleteditslifecycletocountthenumberof leavesfromthetop,orthatonemakeanassumptionhowmanyleavestherewouldbeintheplant/populationusedinthestudy. Note:thenumberofnodesandthenumberofleavesisalwayslessthanthenumberofinternodesbyone.Thearrowpointing upwardssuggeststhatthenumbersarecountedinthatdirectioninascendingorderstartingwith1andgoingupton,wherencan beanynumberdependingontheplant. has substages beneath them. Again, the numeric pre- Germination(PO:0007057) fixesarethereonlytomakethesubstagesappearina This node in the GSO has eight children that are logicalorder.Eachofthesubstageshasmultiple,more broadly applicable to seed germination. The stages specificstagesbeneathit. under ‘‘seedling growth’’ and ‘‘shoot emergence’’ are Although the BBCH scale (Meier, 1997) was the not given numerical prefixes, as it is not clear which starting point for the GSO, we have diverged from it event precedes the other among the various species. in many important aspects. A major difference is Only events of seed germination were considered in the number of top-level terms (Figs. 1 and 2A). The this ontology, whereas the BBCH scale equates seed BBCH scale has 10 principle stages as its top-level germination with germination of vegetatively propa- terms, but the GSO only has four. We collapsed four gated annual plants and perennials such as bud BBCH top-level stages (germination, leaf develop- sprouting. The two processes areinfact quite distinct ment, stem elongation, and tillering) into our top-level in terms of organs developing at this stage, the phys- vegetative growth term, and collapsed another six iology, and various metabolic processes, and thus we BBCH top-level terms (booting, inflorescence emer- felt that combining them was inappropriate. gence, flowering, fruit development, and ripening) into reproductive growth. We felt justified in intro- ducing the binning terms vegetative growth and re- Main ShootGrowth(PO:0007112) productive growth for several reasons; (1) to help annotate genes that act throughout these phases; (2) Thisreferstothestageoftheplantwhentheshootis persistentuseincurrentscientificliterature,especially undergoingrapidgrowth.Itcanbeassessedindiffer- when the specific stage of gene action or expression entwaysdependingonthespeciesandtheinterestsof remains unclear; and (3) they were requested by our the biologist. Plants may be equally well described in scientificreviewerstoenhancetheimmediateutilityof termsofleavesvisibleonthemainshootorintermsof theontology. the number of nodes detectable (Zadok et al., 1974), We now look in more detail at some of the more andbiologistsstudyingArabidopsiscommonlyassess importantparts of the ontology. the size of the rosette. To accommodate existing data Plant Physiol. Vol. 142, 2006 419 Pujar et al. associatedtothesetermswecreatedthreeinstancesof visible’’(PO:0007047),‘‘flowering’’(PO:0007026),‘‘fruit main shoot growth, namely the ‘‘leaf production,’’ ‘‘ro- formation’’ (PO:0007042), and ‘‘ripening’’ (PO:0007010; sette growth,’’ and ‘‘stem elongation,’’ with a strong Fig. 2A). The ‘‘inflorescence visible (sensu Poaceae)’’ recommendation to use ‘‘leaf production’’ wherever (PO:0007012) specific to grass family is an instance of possible. the genericterm ‘‘inflorescence visible’’(PO:0007047). Thisinturnhastwoinstances:‘‘booting’’(PO:0007014) and ‘‘inflorescence emergence from flag leaf sheath’’ Leaf Production(PO:0007133) (PO:0007041). As described earlier, the generic ‘‘inflo- Leaves are produced successively so that the pro- rescence visible’’ stage is considered separate from gressionthroughthisstagecanbemeasuredbycount- ‘‘inflorescence visible (sensu Poaceae),’’ as the former ingthenumberofvisibleleavesontheplant(Figs.2B includes allplantswhereinflorescenceformation and and3).Inanyspecies,leavesarealwayscountedinthe visibility coincide, while in members of the Poaceae, same way (Meier, 1997; described in detail later). In manydevelopmentaleventsinthereproductivephase plantsotherthanmonocotyledons,leavesarecounted start during the vegetative phase but manifest them- when they are visibly separated from the terminal selvesasvisiblemorphologicalmarkersmuchlater. bud. The recognition of the associated internode (be- Other stages are similar in their organization to the low)followsthesamerule(Fig.3).Leavesarecounted existing scales, but as we continue including various singly unless they are in pairs or whorls visibly sep- species from families Solanceae and Fabaceae, we aratedbyaninternode,inwhichcasetheyarecounted anticipate that changes in the organization may be as pairs or whorls. In taxa with a hypogeal type of required to accommodate them into theGSO. germination,thefirstleafontheepicotylisconsidered tobeleafoneandingrassesthecoleoptileisleafone. User Interface IntheGSOthestagesofleafproductioncontinueup to 20 leaves/pairs/whorls of visible leaves (Fig. 2A), The GSO terms are in a simple hierarchy that is butthiscanbeemendedtoaccommodatehighernum- intuitivetouse.TheGSOisarelativelysmallontology bers, as new species are included. This is unlike the andhasatotalof112terms,excludingtheobsoletenode. BBCH scale(Meier, 1997), whereonly nine leaves can It has four top nodes, 15 interior nodes (terms associ- becountedandalltherestwouldbeannotatedtonine ated with children terms), and 88 leaf nodes (terms leavesormore.Thiswasdonetoaccommodatetheleaf without any children terms; Fig. 2A). New terms are development stages of maize, where depending on added based on user requests after thorough discus- cultivars the number of leaves can be few as five or sions.AresearchercanbrowsetheGSOusingtheontol- have20ormoreleaves.Themaizecommunityandthe ogybrowseravailableathttp://www.plantontology.org/ MaizeGDBdatabase(Lawrenceetal.,2005)useamodi- amigo/go.cgi. This is a Web-based tool for searching fiedversionofRitchie’sscale(Ritchieetal.,1993)inwhich andbrowsingontologiesandtheirassociationstodata. the stages of the maize plant are measured solely by It has been developed by the GO consortium (http:// counting the leaves from the seedling through the www.geneontology.org/GO.tools.shtml#in_house)and vegetative stages,and thenodesarenot counted. modifiedtosuitourneeds.Tobrowse,clickingonthe Stem elongation can be assessed by the number of [1]signinfrontofthetermexpandsthetreetoshow visible nodes; this metric is commonly applied to the childrenterms (Fig.2A).Thisview providesinforma- Triticeae,forwhichtheZadok’s(Zadoketal.,1974)or tion on the PO identification (ID) of the GSO term, BBCH scales (Meier,1997) wereoriginallydeveloped. term name, followed by a number of associated data Stem elongation begins when the first node becomes such as genes. For every green-colored parent term a detectable.Thisisusuallyequivalenttonodenumber summaryofthedataassociatedtoitschildrentermsis seven (the number varies in different cultivars), since presentedasapiechart.Theuserhasanoptiontofilter earlier nodes are not detectable before elongation the number of associated data displayed based on commencesinthegrasses.Boyesetal.(2001)considers species, data sources, and evidence codes. The icons Arabidopsis rosette growth analogous to stem elon- for [i] and [p] suggest the relationship types between gation in the grasses, and uses leaf expansion as the the parent and child term as described in the legend. common factor linking the rosette growth and stem While browsing, a usercan click on the term name to elongation stages. In our model, ‘‘rosette growth’’ getthedetailsatanytime(Fig.4B).Theuserswillsee (PO:0007113)and‘‘stem elongation’’(PO:0007089)are the icon [d] fordevelops_fromrelationship type.This treated as separate instances of sibling stages (Figs. 1 relationship type is used strictly in the PSO and not and 2A), mainly to provide language continuity for GSO. It suggests that a plant structure develops from users, rather than for biological reasons. another structure(Jaiswal et al., 2005). In addition to the browse utility, users may search byenteringthenameofatermoragene.Forexample, ReproductiveGrowth(PO:0007130) queryingwith‘‘germination’’resultsinthreeterms,of Reproductive growth and its child terms are orga- which two are from the GSO section of the plant nized a little differently from vegetative growth. Re- growthanddevelopmentstageontologyandonefrom productive growth has four instances: ‘‘inflorescence thePSO.Toavoidgettingalargelist,usersmaychoose 420 Plant Physiol. Vol. 142, 2006 Plant Growth Stage Ontology Figure4. AnexampleofaGSOsearchusingtheontologybrowserandsearchWebinterface.A,Ontologysearchresultsfor0 germinationbyusingtheexactmatchandtermsfilter.Tostartsearching,visitthewww.plantontology.orgWebsiteandclickon the‘‘SearchandBrowsePlantOntology’’linkonthepagemenu.Anontologybrowserpageopensthathasasearchoptiononthe left-handside.Typethetermnameofinterest,suchas‘‘germination’’foragenericsearchor‘‘0germination’’foranexactmatch. Selectthetermfilterandsubmitquery.Clickonthetermnametovisitthetermdetailpageorbrowsethelineageofthistermin theontologybyclickingthetreeiconnexttothecheckbox.B,Thetermdetailpageprovidesinformationonthetermname, accession/ID,synonyms,definition,comments,andassociationstogenes.C,Thelistofgenesassociatedtothetermarelistedin thebottomhalfofthetermdetailpage.Adefaultlistgivesallthegeneswitheverytypeofevidencecodeandsource.The evidencetype,species,andsourcefilterscanbeusedtogeneratethelistasdesired.Thelistprovidesthegenesymbol,name, source,evidence,andacitation.Thegenesymbollinkstothegenedetailpageandthesourcelinkstotheoriginalrecordinthe contributor’sdatabase(e.g.TAIR/Gramene),theevidencecodelinkstoitsdetailsandthereferencelinkstotheoriginalcitation referredtobythecontributorforinferringtheontologyassociationtothegene.D,Thegenedetailpageprovidesinformationon thesymbol,name,synonym,source,alistofallthetermsintheGSOandPSO,evidence,andthecitations.Thisviewsuggests whereandwhenageneisexpressedand/oranassociatedphenotypeisobserved. theexactmatchoptionbeforesubmittingthequery.A externalreferencesandlinks,ifany,andtheassociated searchfor‘‘0germination’’choosingexactmatchgives data.Theassociationsectionallowsausertoselectthe one result (Fig. 4A). A user may browse the parents sourcedatabase,speciesname,andtheevidencecode and children of this term by clicking on the blue- (TableII)usedtomaketheannotationtolimitthedata coloredtreeiconandfollowingthe[1]signnexttothe displayed.Forexample,thereare138geneassociations term name, which suggests that there are additional totheterm‘‘0germination’’(PO:0007057;Fig.4B).The terms under this term, or simply clicking on the term listofassociateddata(Fig.4C)givesinformationabout name ‘‘0 germination’’ for more details. The term thename,symbol,type(e.g.gene),thesource,andthe detail page (Fig. 4B) provides information on the ID, species, in addition to the evidence used for inferring aspectontology(plantstructureorgrowthanddevel- theassociationtotheterm.Thegenesymbolprovidesa opment),species-specificsynonyms,ifany,definition, hyperlinktothegenedetailpage(Fig.4D),andthedata Plant Physiol. Vol. 142, 2006 421 Pujar et al. immediateparents(Fig.5,AandB)revealsthatmany TableII. ListofevidencecodesforuseinannotationstoGSO of these genes with GSO annotations in TAIR are as- Theseareusedinbuildingtheannotationinferencesthatindicatethe sociated to germination stages, which is a vegetative type of experiment cited by the researcher whose data was used to stage.Similarlythevegetativestages,particularlyfive- determine the protein and/ortranscript expressionand phenotypeof to six-leaf stages (children of leaf production) and mutant(s)orquantitativetraitloci. reproductive stages, namely the inflorescence visible EvidenceCode Name (sensu Poaceae), a child of inflorescence visible, fruit IC Inferredbycurator formation, and ripening stages in therice plant areof IDA Inferredfromdirectassay particular importance (Fig. 5B). The Solanaceae Ge- IEA Inferredfromelectronicannotation nome Network (SGN) has adapted the GSO and has IEP Inferredfromexpressionpattern created a mapping file for Solanaceae (tomato [Lyco- IMP Inferredfrommutantphenotype persiconesculentum])synonymsthatisusedtoassociate IGI Inferredfromgeneticinteraction IPI Inferredfromphysicalinteraction theirdata. Tomato mutants areinitially being curated ISS Inferredfromsequenceorstructuralsimilarity to these terms and, predictably, a large number of NAS Nontraceableauthorstatement mutantswillbeassociatedtotheripeningstages(data TAS Traceableauthorstatement not shown). As we continue to solicit data from collaborating databases and annotate using the GSO, weobtainaglobalviewofhowdataisassociatedwith differentstagesof plant growth(Fig.5B). source links to the same entry on the provider’s Web site. This allows a user to search for extended details thatmaynotbeprovidedinthePOCdatabase,suchas Application information on genome location, biochemical char- This section provides examples of genetic analyses acterization, associations to the GOs, etc. For help at thattypicallyusewhole-plantontogenyasafeatureof anytime,userscanclickonthehelpmenuatthebot- theexperimentaldesignanddataanalysis.Itindicates tomofthebrowserpageorvisitthelinkhttp://www. some of the difficulties of extracting spatiotemporal plantontology.org/amigo/docs/user_guide/index.html. informationfromthe literatureandshows theadvan- tagesofcuratinggenomicinformationusingtheplant Annotations to GSO ontologies (GSO and PSO), which allow the users to Annotation is the process of tagging snippets of query when and where a gene is assayed, expressed, information to the genomic element by skilled biolo- or its effects become visible during the life cycle of a gists to extract its biological significance and deepen plant. In addition the PO database supports queries our understanding of the biological processes (Stein, such as what are the genes that are expressed during 2001).Thecuratorattributestheaddedinformationto thegerminationstageinArabidopsisandriceorshow its sourceby the use of evidence codes (http://www. meallofthephenotypesinthereproductivestagesofa plantontology.org/docs/otherdocs/evidence_codes.html) riceplant when mutated. indicatingthekindofexperimentthatwascarriedout toinfertheassociationtoaGSOterm,suchasinferred Annotation Examples of Mutant Phenotypes from expression pattern (IEP) involving northern, western, and/or microarray experiment, or inferred The primary description of phenotypic data is usu- fromdirectassay(IDA)suchasisolatedenzymeand/ allyatthewhole-plantlevelanditisrarelyastraight- or in situ assays, etc. (Table II). The user interface has forward exercise of term-to-term association for the queryfilteroptionstosearchforgenesannotatedwith curator. For example, characterization of dwarf mu- agiventypeofevidencecode.Explicitspatiotemporal tants is done in different ways, most often by the leaf information related to the whole plant is extracted or node number that is affected, counted either top fromliteraturebyacuratoranddescribedusingterms down or bottom up; in this system the leaf and the from the GSO. The current build of the GSO has over internode below it can be used to define the same 600genesassociatedtoitfromtheTAIRandGramene stage.Thisisdistinctfromnodevisiblestagesthatare databases (Fig. 5A). Analysis of the data at this point lessreliable,asthefirstnodethatisvisibleisavariable may not be entirely reflective of current research in numberin grasses(Fig. 3). Arabidopsisandrice,asmanualcurationisadynamic An example is provided by recording of internode and evolving process and will necessarily lag behind elongation, the main morphological feature that is the actual state of research. In TAIR, about 130 gene affectedindwarfplants,isattributedamongothersto associations to whole-plant growth stages carry the the effect of gibberellin and brassinosteroids (Chory, evidence code IEP, while in the Gramene database, a 1993; Ashikari et al., 1999). Yamamuro et al. (2000) majority of GSO annotations (about 480) carry the show that brassinosteroid plays important roles in evidencecodeinferredfrommutantphenotype(IMP) internode elongation in rice and have characterized and a smaller number of IEP and inferred by genetic dwarf mutants based on the specific internode that is interaction(IGI)associations.Acloserlookatthenum- affected. In the dn-type mutant all the nodes are ber of genes associated to various terms and their uniformly affected (the total number of nodes in a 422 Plant Physiol. Vol. 142, 2006 Plant Growth Stage Ontology Figure5. SummaryoftheArabidopsisandricegeneannotationstotheGSO.A,Growthstage-specificgeneannotationsfrom Arabidopsisand rice.The stagesprefixedwithA toD are the top mostcategories of thegrowth stages,namelyvegetative, reproductive,senescence,anddormancy.Thestagesprefixedwith0to2arevegetativesubstages,andthosewith3to6are reproductivesubstages.AllstagesmeansalltheGSOterms.B,AlistofselectedArabidopsisandricegenesannotatedtofive specificgrowthstageterms,suggestingthecurrentstateofannotationsandnottheactualgrowthstage-specificprofile.Asimilarlist canbegeneratedtogetgrowthstage-specificgeneexpressionprofilesforagivenspecies.Incolumns2and3,thenumbers(written inbold)appearingbeforetheparenthesesarethetotalnumberofgeneannotations;species-specificgenesarewritteninitalics. given mature rice plant). However, in the nl-type to the researcher and curator as well as the user (Fig. mutant, only the fourth internode is affected, while 3). Currently by using the IMP filter, more than 500 in the case of the sh-type mutant, only the first inter- genes annotated to different growth stages are avail- node is affected. However, in this case,the authors of ablein the PO database. the study number the internodes from top down (the uppermost internode below the panicle is the first Cross-Database Comparison of Gene Annotations internode).TobeconsistentwiththeGSO,thesenum- bershavetobeconvertedtotheappropriateleaf/node Almost all organismal databases are mutually ex- countingfromthebaseoftheshoot(Fig.3).Thishasto clusive and provide little or no overlap in their sche- be achieved by the curator’s personal knowledge of maswithotherdatabases.Thustheycatertoexclusive the plant, from legacy information available for the user communities. To illustrate how the use of ontol- speciesandgermplasmaccession,orbycontactingthe ogies can overcome database interoperability prob- authors. Unlike the above example, generally leaves lems, we compare the related processes of flowering arecountedfrombelowandthecuratorextractsinfor- time in Arabidopsis and heading date in rice (Fig. 6). mation from statements such as when the plant is at Thegenenetworkunderlyingthephotoperiodicflow- the three-leaf stage. This permits an immediate visu- ering response involves photoreceptors, circadian alizationofthemorphologicalappearanceoftheplant clock systems, and floral regulator genes (Yanovsky Plant Physiol. Vol. 142, 2006 423
Description: