Chepelev et al. BMC Bioinformatics 2012, 13:3
http://www.biomedcentral.com/1471-2105/13/3

RESEARCH ARTICLE (Open Access)

Self-organizing ontology of biochemically relevant small molecules

Leonid L Chepelev1*, Janna Hastings2, Marcus Ennis2, Christoph Steinbeck2 and Michel Dumontier1,3,4

Abstract

Background: The advent of high-throughput experimentation in biochemistry has led to the generation of vast amounts of chemical data, necessitating the development of novel analysis, characterization, and cataloguing techniques and tools. Recently, a movement to publicly release such data has advanced biochemical structure-activity relationship research, while providing new challenges, the biggest being the curation, annotation, and classification of this information to facilitate useful biochemical pattern analysis. Unfortunately, the human resources currently employed by the organizations supporting these efforts (e.g. ChEBI) are expanding linearly, while new useful scientific information is being released in a seemingly exponential fashion. Compounding this, currently existing chemical classification and annotation systems are not amenable to automated classification, formal and transparent chemical class definition axiomatization, facile class redefinition, or novel class integration, thus further limiting chemical ontology growth by necessitating human involvement in curation. Clearly, there is a need for the automation of this process, especially for novel chemical entities of biological interest.

Results: To address this, we present a formal framework based on Semantic Web technologies for the automatic design of chemical ontology which can be used for automated classification of novel entities. We demonstrate the automatic self-assembly of a structure-based chemical ontology based on 60 MeSH and 40 ChEBI chemical classes. This ontology is then used to classify 200 compounds with an accuracy of 92.7%.
We extend these structure-based classes with molecular feature information and demonstrate the utility of our framework for classification of functionally relevant chemicals. Finally, we discuss an iterative approach that we envision for future biochemical ontology development.

Conclusions: We conclude that the proposed methodology can ease the burden of chemical data annotators and dramatically increase their productivity. We anticipate that the use of formal logic in our proposed framework will make chemical classification criteria more transparent to humans and machines alike and will thus facilitate predictive and integrative bioactivity model development.

*Correspondence: [email protected]
1 Department of Biology, Carleton University, Ottawa, Canada
Full list of author information is available at the end of the article

© 2012 Chepelev et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Background

Over the hundreds of years of biochemical research, humanity has encountered myriads of chemical entities with countless combinations of functional groups that imparted upon their bearers distinct reactivities and properties. According to the structure-activity relationship (SAR) principle, grouping these entities into structure- and property-based classes within a larger chemical ontology (formal logical specification of a chemical hierarchy conceptualization) can enable the recapitulation of or improvements in the prediction of their biological functionality and chemical reactivity patterns, thus providing indispensable assistance in understanding the molecular nature of metabolism, toxicity, and bioactivity [1,2]. In addition to this, the assignment of individual chemical entities to a given class within a chemical ontology or hierarchy may facilitate the inference of the potential anticipated roles and properties of these entities. This capacity of chemical ontologies may help support the development of the rising systems sciences, such as chemogenomics and systems chemistry [3].

Despite the numerous efforts to develop automated chemical classification and ontology construction approaches (discussed below), this process has so far practically remained firmly within the hands of human curators, as exemplified by perhaps the largest chemical ontology constructed to date - the Chemical Entities of Biological Interest (ChEBI) database and ontology [4]. As more chemical data becomes readily available from large-scale experimentation, ChEBI and numerous similar repositories of biological and chemical information (e.g. [5-7]) are increasingly coming under strain due to lack of human curators, the overwhelming wealth and diversity of the new chemical information, and the boundlessness of the chemical space. Clearly, a means to automate and facilitate these classification efforts is urgently needed if we desire to understand, expose, and integrate the entirety of available chemical information.

Manual chemical classification efforts frequently involve the tedious work of multiple human experts who carefully peruse the relevant chemical literature to gain competence in assigning chemical entities to a range of classes. Although the reliance of biochemical activity-based classification efforts on the predictions or informed estimates of human experts has been largely relegated to the informal or speculative domain of scientific communications thanks to the development of numerous quantitative structure-activity relationship (QSAR) elucidation approaches [8], chemical hierarchy construction is still very much a human-dominated domain. Chemical hierarchies in common use, such as ChEBI and the Medical Subject Headings (MeSH) organic chemistry hierarchy [9], are structure-based but often have the capacity to incorporate chemical classes that are based on functional chemical attributes, such as those bearing a certain pharmaceutical activity (e.g. ChEBI antibiotics). Although structure- and function-based classification schemes involve somewhat different approaches for construction, the basic accepted SAR dogma is that chemical structure and chemical function are inseparably linked, as noted earlier. Therefore, structure-based classification efforts could be useful in initiating the construction of predictive models of small molecule bioactivity.

The correlation and overlap between chemical topology-based and biological function-based chemical hierarchies is expected since much of biological activity is enacted through interactions and transformations that are specific to a particular set of structural features of small molecules. For instance, most enzymes operate on a very limited set of chemical structures: carbonyl reductases operate on carbonyl and alcohol functional groups, while bacterial transpeptidases bind to and operate upon structures that resemble the peptide bond, leading to the potency of beta lactam-based drugs in the inhibition of their operation. While there may be more than one 2D chemical structure that results in an interaction with a given enzyme (due to its 3D configuration), we claim that there is a finite set of 2D chemical skeletons that result in a bioactive interaction, and that given a sufficiently large dataset of compounds known to be bioactive, it is possible to characterize a class of functionally active compounds as a collection of classes with well-defined, consistent structural features. Concerns for the predictive capacity of such classifications, like the distinction between stereoisomers and a range of chemical properties such as molecular volume that may result in disqualifying an otherwise well-suited 2D skeleton from a bioactivity-inducing interaction, can be laid to rest by explicitly incorporating these features into class definition. In this work, we demonstrate that given the logical expressivity afforded by the Web Ontology Language (OWL) [10], we can automatically create formal logical definitions for a range of topology and property-based chemical classes which can then self-assemble into a chemical ontology.

While manual literature searching and experimental data analysis is often involved in the classification of chemical entities into functional classes, structure-based hierarchy construction is far better defined and easily amenable to automation. In constructing structure-based chemical classes, a modern curator would familiarize themselves with any existing informal class definitions, as well as any known chemical class members. Based on this expertise, the curator would then define a class, create a short textual or pictorial description of this class, and add child/parent relationships to the other classes, as well as including several representative entities. In essence, this approach requires the resulting ontology to consist of a formalisation of expert domain knowledge in the area, while providing no means for this knowledge to be captured in a machine-understandable format. The result of this is a lack of facile re-use of the work of human curators in classification and annotation of novel entities and establishing child/parent relationships for newly added classes when the ontology grows or is modified. Unfortunately, the majority of chemical ontologies, such as the ChEBI database and ontology, MeSH chemical classifications, and LIPIDMAPS [11], suffer from this, resulting in artificial discipline-specific barriers to research, since none of these can be easily or automatically integrated. As the chemical ontologies grow in size and the range of relationships covered, manual ontology development becomes increasingly difficult due to the multiplying number of consistency checks that have to be performed for each new class and relationship added.

To date, automated chemical classification has mostly belonged to the domain of QSAR studies which involve the characterization of a training set of chemical entities with respect to a particular set of computable or measurable molecular features, as well as the property of interest, such as biochemical activity or biomedical potency [8]. The features that are used for training are often selected such that the cost of their acquisition is significantly lower than that of the modelled property. This feature set is then used to construct a predictive model using one of many available mathematical procedures, including the creation of decision trees [12], multidimensional parametric fits [13], or three-dimensional pharmacophore models [14], to name a few. The resultant models are then verified for their capacity to adequately classify molecules into a range of classes of interest using a test set of compounds not used in model creation. While the resultant models may often achieve a great degree of accuracy, the vast majority of such models are not readily amenable to human understanding or automated incorporation into a chemical hierarchy. In order to fully appreciate the structural and physicochemical molecular features of relevance to a particular functional or structural classification, post-processing of the resultant mathematical models is often required. Although we have recently demonstrated the conversion of decision tree-based QSAR models into formally axiomatized OWL chemical ontologies [15], the problem of integration of disparate, problem-specific functional chemical classifications into a single chemical ontology is still largely unsolved. Compounding this problem is the fact that chemical class definitions derived using statistical machine learning approaches are often inherently probabilistic - that is, no class membership requirement is formally stated as a necessary or sufficient condition. In order to derive a truly self-organizing chemical ontology, a deterministic classification in terms of well-defined features and their combinations is necessary.

While function-based chemical hierarchy construction has not been adequately addressed, methods to generate structure-based chemical ontologies have been studied in the past to some extent. For example, in [16], a very early attempt at formalizing chemical classes and their relationships for chemical inference was carried out, but never practically implemented for any large-scale classification exercise. In [17], a chemical ontology was manually constructed with classes formally defined as containing molecules that possessed a particular functional group or a set of functional groups. While this work has been successfully applied for practical chemical functionality analysis through chemical semantic similarity identification enrichment, the basic ontology construction still followed a manual process and was limited to a set of pre-defined functional groups, insufficient for characterizing a number of more complex chemical classes. Furthermore, the lack of a formal, accessible framework for chemical classification (in the form of OWL) has resulted in limited logical chemical class axiom expressivity, therefore limiting the applicability of the approach for characterizing some chemical classes (e.g. those requiring cardinality restrictions, such as dienes). Finally, the work of [18] has resulted in a structure-based chemical ontology that employed combinations of chemical substructures of increasing complexity to produce increasingly specialized chemical classes, with the aim of improving the structural analysis and pattern identification in sets of biologically active compounds.

Unfortunately, a framework or approach for fully automated construction of chemical ontologies has never been achieved. In this work, we propose a radically novel approach for structure-based chemical classification automation by abstracting and automating the majority of chemical curation to construct machine- and human-understandable, formally axiomatized chemical class definitions in OWL. Further, we use these automatically generated chemical classes to enable machine reasoning over the ontology and demonstrate ontology self-organization, as well as classification of individual chemical entities represented using the Chemical Entity Semantic Specification formalism [19]. Finally, we demonstrate that certain functionally-relevant classes can be defined unambiguously using our approach, and integrated into the overall chemical ontology seamlessly.

Results

We have reduced the activity of a human curator to several steps, namely a) the identification of high-confidence chemical class members, b) identification of consensus chemical patterns that have to be present in a given compound to qualify for membership in a given class, c) creation of formal chemical class definition axioms, and d) chemical classification and class relationship assignments based on the acquired expertise (Figure 1). The ontology that we aim to generate with the framework we have developed is primarily based on, but not limited to, chemical structure. We operate on the premise that chemical class sub-specialization is enabled through chemical structure extension. We claim that for each class, a collection of functional groups exists that are necessary for a molecule to be considered a member of this class. This forms the basis for the self-organization of our automatically generated chemical ontologies.

Figure 1. An abstract view of the curation effort for structure-based chemical hierarchies.

Automated Derivation of Formal Chemical Class Definitions

Given a set of chemical entities known to be members of a given chemical class, our framework is capable of abstracting and generalizing a set of consensus molecular sub-graphs and features that are present in all representative members of this class (see Methods). In identifying such features, we consider not only the chemical graph that is formed by the overlap of several smallest class representatives, but also the chemical sub-graphs formed by the fragmentation of class members, and atomic features, such as ring membership, aromaticity, and connectivity. Therefore, we generate a set of both canonicalized SMILES [20] and canonicalized SMARTS [21] patterns (see Methods) that are present in a given molecule. Canonicalization of SMILES and SMARTS patterns is important because the fragmentation of chemical structures and their generalized forms often yields chemical patterns that are equivalent, but can be written down as valid SMILES or SMARTS in multiple ways. Canonicalization of consensus patterns ensures that each pattern has exactly one definition, thus allowing for facile integration and comparison of consensus fingerprints from multiple class definition generation procedures.

The defining features, or chemical structural motifs, that are present in every member of a given class are deemed to be necessary for class membership and are termed consensus chemical features, and their collection may be viewed to constitute a consensus chemical fingerprint for a given chemical class. The consensus chemical features that contain but are not contained in other consensus chemical features are termed by us principal characteristic substructures. These substructures can be used directly by curators for quick validation of the automatically generated class definition. For instance, for the ChEBI Peptide class (CHEBI:16670), one would expect the principal characteristic substructure to constitute the amino acid backbone (Figure 2), which is indeed the case.

Figure 2. Some of the consensus structural features identified for the class of peptides using the proposed methodology. Note that in the two consensus fragments that are not contained in other consensus fragments, the basic peptide backbone is conserved, even for proline, a highly exotic amino acid, since 'N' in SMARTS does not necessarily mean an NH3 in practice.

In the process of determining principal characteristic substructures, we have identified certain classes for which our algorithm could not converge on an expected substructure set. Surprisingly, this was observed for carboxylic acid esters, where only the pattern associated with a carboxylic acid, but without extension to create an ester, was found. Upon closer investigation, we have discovered that these difficulties in convergence were exclusively due to errors in the manual curation of input data. In the case of carboxylic acid esters, the failure of convergence was due to the inclusion of a molecule containing the RC(=O)ONR pattern instead of the RC(=O)OCR pattern in the training set. Technically, carboxylic acid esters conform to a pattern RC(=O)OR', where both R and R' are aryls or alkyls [22], meaning that the inclusion of the nitrogen-linked ester derivative was erroneous. This was a very fortunate development, since this identified one possible method to verify the consistency and correctness of the work of human curators.

Our second premise is that for one class to be considered a child class of another chemical class, it has to contain all the consensus structural features of the parent class, and some additional features specific to it only (most likely, but not necessarily, principal characteristic substructures). Therefore, while the principal characteristic substructures will be useful in establishing the unique structural identity of a given class, the rest of the consensus structural features are useful in establishing the class hierarchy (Figure 3). This basic premise applies to both simple consensus structural fingerprints and the formal logical definitions of chemical classes, and forms the foundation of the self-organizing nature of the chemical ontology generated by our framework.

Figure 3. Establishing a chemical class hierarchy through direct observation of consensus structural fingerprints is similar to logical hierarchy construction. It is possible to infer parent/child relationships based solely on the consensus features of a class. Formal logic affords much more powerful expressions.

The next step in self-organizing chemical ontology generation is the formalization of chemical class definitions based on the consensus structural fingerprints. For this, we generate and maintain a single OWL ontology that is constantly updated with the unique consensus chemical substructures that are identified as a result of class analysis, using previously reported principles of unique, canonical, and invariable URIs for each functional group class [19]. The functional groups defined in this ontology are then used in converting the consensus structural fingerprints. For instance, a class of 'Organic Alcohols' is found to contain the consensus structural features expressed in SMARTS notation as [#6] (any carbon atom, aromatic or not), [#8] (any oxygen atom), [#6][#8] (carbon and oxygen in a single bond), and [#6][OX2H] (carbon connected to an OH group). This can be translated into the following equivalent class statement in OWL using the Manchester DL query syntax [23], which states that an instance of the class of organic alcohols is exactly defined by being an instance of a molecular entity and having at least one (keyword: some) instance of each of the four functional groups:

    'Organic Alcohols'
    EquivalentTo
    'Molecular entity'
    and 'has part' some '[#6]'
    and 'has part' some '[#8]'
    and 'has part' some '[#6][#8]'
    and 'has part' some '[#6][OX2H]'

Clearly, this specification is human-readable, and can be further enriched by adding annotations on the functional groups involved, including graphical representations or textual descriptions for the various functional groups, in order to facilitate chemical class definition understanding.

Further, if one wishes to handle cases where more expressive logical statements are necessary, this is always an option. Consider, for example, the class of Glycols (also sometimes referred to as Diols), which require the presence of a minimum of two alcohol functional groups in a given molecule. Please note that we state that diols contain at least two alcohol functional groups, since in our approach, triols should be identified as simply a subclass of diols. The cardinality restriction to at least two alcohol functional groups allows for this. Thus, diols can be defined as follows. Please also note that cardinality is not currently screened for by the framework proposed here and all the cardinality restrictions in our automatically generated ontologies were added manually.

    Diols
    EquivalentTo
    'Molecular entity'
    and 'has part' some '[#6]'
    and 'has part' some '[#8]'
    and 'has part' some '[#6][#8]'
    and 'has part' min 2 '[#6][OX2H]'

Attributes and molecular descriptors can also be handled with such definitions. For instance, we may find out that a given enzyme catalyzes the oxidation of organic alcohols that are no heavier than 500 Daltons. This mixing of chemical sub-graphs and physicochemical attributes is also possible, using the principles and concepts defined in the CHEMINF ontology [24].

    'Reactive Diols'
    EquivalentTo
    'Molecular entity'
    and 'has part' some '[#6]'
    and 'has part' some '[#8]'
    and 'has part' some '[#6][#8]'
    and 'has part' min 2 '[#6][OX2H]'
    and 'has attribute' some ('molecular weight' and 'has value' [<= 500] and 'has unit' some 'Dalton')

Using the dataset of diols annotated with their computed molecular weight with the concepts from the CHEMINF ontology, we were able to correctly identify that all 14 diols in the ChEBI training set for this class were indeed 'Reactive Diols' as per the logical definition above.

Self-Organizing Chemical Ontology

In order to test the integrative capacity and accuracy of our chemical classification framework, we have manually generated a dataset containing 40 ChEBI and 60 MeSH chemical classes, for a total of 100 classes with at least 7 and at most 20 representatives in each chemical class. To demonstrate the self-organizing chemical ontology creation from a specific source, we have created a stand-alone self-organizing ontology for each of the ChEBI and MeSH data sets, containing the formal class definitions derived from consensus chemical fingerprints of each class (Figure 4). Upon application of the Pellet reasoner to classify these ontologies, we observed self-organization of the formally axiomatized chemical classes into chemical ontologies with a hierarchy (Figures 5 and 6). Each child-parent inference was screened by a curator to verify that the automatically generated ontologies were consistent for both datasets. In this screening, we have considered the structural viewpoint - that is, it was permissible for ethylamines, for example, to be classified as the child class of methylamines, since an ethylamine constitutes a structural extension of a methylamine, and because the training sets obtained for these classes (from MeSH in this case) included among methylamines chemical entities that were derivatized on the carbon atom that should have otherwise been terminal (a CH3 carbon). While we do not make any claims with respect to the validity of this approach, we note that this sub-classification inference in our work mirrors the manually assigned sub-classification in the MeSH hierarchy of organic chemicals.

Figure 4. An example of a 'flat' chemical ontology with chemical class definitions, within the Protégé 4.1 ontology editor. No classification has been carried out.

Figure 5. MeSH chemical class hierarchy automatically inferred using the Pellet reasoner. Please note that all 'is-a' relationships except those to 'molecular entity' are inferred automatically. Please note that all classes deal with organic chemicals.

Figure 6. ChEBI chemical class hierarchy automatically inferred using the Pellet reasoner. Please note that all 'is-a' relationships except those to 'molecular entity' are inferred automatically. Please note that all classes deal with organic chemicals.

Because both ontologies were built using one central functional group ontology and the CHEMINF ontology for chemical descriptors, we were then able to trivially import the ChEBI ontology into the MeSH ontology within Protégé 4.1 [25] by specifying direct ChEBI ontology import within the MeSH ontology, and to apply the Pellet machine reasoner [26] to the resultant ontology to reconstitute a unified chemical ontology with 100 chemical classes (Figure 7). Thus, for the first time, we were able to automatically and trivially integrate and re-use the manual curation efforts from multiple data sources in order to construct a single seamless chemical ontology. Manual analysis of the class relationship inferences was carried out to ensure that the inferred chemical hierarchy was consistent and coherent from the viewpoint of chemical structure.

Figure 7. The chemical class hierarchy inferred for an ontology that has resulted from the automated integration of the MeSH and ChEBI ontologies. Please note that all classes deal with organic chemicals.

Chemical Classification Accuracy

For a demonstration of chemical classification, we have randomly selected a set of 200 chemical entities and employed the integrated ontology in order to classify each individual compound. Because each chemical entity could be classified into multiple classes, the number of class membership inferences (452) exceeded the number of tested chemical entities. Within this dataset, 91% of chemical entities received inferences that exactly matched their annotation within the test set and an additional 8.5% were matched to classes consistent with their test set-assigned class. For example, if a compound was a member of the alcohols in the test set, but actually contained two OH groups, this compound was correctly identified as belonging to diols or glycols, without the explicit specification of its membership among the alcohols, even though this membership is implied.

Overall, 92.7% of the entity inferences were correct (Table 1). Most of the observed errors were in the classes that necessitated the definition of negations. For example, a carboxylic acid is a structural derivative of an alcohol, but is strictly speaking not one. From a structural perspective, classification of a carboxylic acid as an alcohol is not incorrect. However, from a classical organic chemistry viewpoint, this is an incorrect classification, and has been noted as such in this work. In addition to this, we have observed that identifying incorrect inferences in some cases made it easy to identify adjustments to the training sets that would automatically generate a more accurate chemical entity description, or suggested manual refinements to definitions (e.g. glycols, diols, and alkadienes with cardinality restrictions). An additional table shows the classification results in more detail (see Additional file 1).

The entire classification process for 200 compounds took 34 minutes and 10 seconds, running on a Core i7-870 machine with 6GB of RAM, executed serially, while the computational time to classify non-populated ontologies was negligible.

Seamless Integration of Novel Chemical Classes: Enzyme Substrates

In order to demonstrate the flexibility and facile amenability of our ontology to the incorporation of novel chemical classes, and especially chemical classes of biological relevance, we have created a stand-alone class of chemical entities that are substrates to yeast alcohol dehydrogenase, using information in the BRENDA database [27]. Because this data was experimentally derived and the reactions indicated were confirmed to occur in nature, no human curation (beyond eliminating redundant entries) was necessary for the training set composed of 23 unique chemical entities. Upon the repetition of the exercise described above with this chemical class, we have, unsurprisingly, discovered that the resultant class of yeast alcohol dehydrogenase substrates was a subclass of alcohols, with the added requirement of possessing an oxygen atom connected to one hydrogen atom and a non-aromatic carbon atom.

Discussion

Here, we have presented a radically streamlined framework and approach to chemical ontology construction based on explicit formal semantic specification of chemical classes and rooted in chemical graph analysis. By eliminating the manual assertion of ancestry relationships between the various chemical classes and instead emphasizing the adequate curation of the limited (as small as five entities) training set of the starting chemical class representatives and the automatically generated logical class definitions, we have greatly simplified the work of human curators, thus potentially improving their productivity and making the annotation and classification of the rapidly multiplying chemical information more manageable.

One may argue that curators may also easily define chemical patterns in order to define a given chemical class instead of starting with identifying a diverse set of compounds that would then be used to automatically derive the relevant patterns. While this is indeed true, automated consensus chemical structure identification has numerous benefits, including the exhaustive identification of primitive structural features that are useful in constructing class hierarchies, as well as in the inherently objective identification of the structural patterns of interest. Manual supplementation of class definitions is always possible, as we have shown with class customization, and is sometimes required, in cases where our

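The translation of a consensus fingerprint into a Manchester syntax class statement, as shown for 'Organic Alcohols' and Diols in the Results, is mechanical enough to sketch in a few lines. The helper below is a hypothetical illustration, not the authors' implementation: it assumes the class expression always starts from 'Molecular entity', always quotes the class name, and takes any cardinality restrictions as a manually supplied mapping, mirroring the statement that cardinalities were added by hand.

```python
from typing import Dict, Iterable, Optional


def manchester_definition(
    class_name: str,
    patterns: Iterable[str],
    min_counts: Optional[Dict[str, int]] = None,
) -> str:
    """Build an EquivalentTo statement from consensus SMARTS patterns.

    Patterns listed in min_counts get a 'min N' cardinality restriction;
    all other patterns get the existential 'some' restriction.
    """
    min_counts = min_counts or {}
    lines = [f"'{class_name}'", "EquivalentTo", "'Molecular entity'"]
    for pattern in patterns:
        count = min_counts.get(pattern)
        restriction = f"min {count}" if count is not None else "some"
        lines.append(f"and 'has part' {restriction} '{pattern}'")
    return "\n".join(lines)


alcohol_patterns = ["[#6]", "[#8]", "[#6][#8]", "[#6][OX2H]"]

alcohols = manchester_definition("Organic Alcohols", alcohol_patterns)
diols = manchester_definition("Diols", alcohol_patterns, {"[#6][OX2H]": 2})
print(alcohols)
print(diols)
```

Keeping the cardinality map separate from the pattern list reflects the division of labour described in the text: the patterns come from automated consensus analysis, while the 'min 2' restriction is a curator-supplied refinement.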