Using OWL to model biological knowledge
Introduction
In this paper we investigate the ontological needs of biology and the associated discipline of bioinformatics. Much has been written about what knowledge representation languages such as the description logic (DL) variant of the Web Ontology Language (OWL) can offer domain experts in terms of modelling facilities (Dean et al., 2002). Much less has been written about what particular domains need to capture in such modelling languages. In this paper, we put forth the knowledge modelling requirements of biology and bioinformatics and, in doing so, highlight the limits of modern DLs as knowledge representation languages. The expressive restrictions of DLs are well known (Baader et al., 2003, Chapter 1); in this article, we take the perspective of the needs of a domain, rather than that of a computer science research agenda.
OWL-DL is underpinned by a DL (Baader et al., 2003), a fragment of first order logic. This means that an OWL-DL ontology is expressed in a formalism with well-defined semantics and over which automated reasoning can take place. We will describe OWL-DL's use in this context and how it captures biology and bioinformatics domain knowledge in ontologies. One major question to be asked is whether the logical approach followed by OWL-DL suits the description of the natural world, with all its complexities and inconsistencies.
Bioinformatics is the use of computational and mathematical techniques to store, manage and analyse biological data in order to answer biological questions (Kaminski, 2000). At the centre of bioinformatics is the analysis of DNA and protein sequences. Its goal is to characterise nucleic acid sequences (genes) and their products, primarily proteins. Biology, however, is unlike physics and much of chemistry in that, although it contains many laws and models, few of these are reduced to a mathematical form. It is not possible to take a protein's sequence of amino acids, apply some formula, and derive a set of characteristics such as location, functionality, forms of modification, regulation, etc.
Instead of mathematical laws, bioinformaticians use similarity. The central dogma of bioinformatics is that if an uncharacterised sequence is sufficiently similar to one that has been characterised, then the understanding can be transferred from the characterised to the uncharacterised. Many tools are provided for comparing sequences against databases of other sequences (Attwood and Miller, 2001). This search for similarity is, however, not simply done on the basis of some statistical measures. A good bioinformatician will use all the facts recorded about the entity and the nature of the matches between the sequences in order to infer any biological relationship. This is why both biology and bioinformatics have been characterised as a “knowledge based discipline” (Baker et al., 1999).
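The similarity-based transfer of understanding described above can be sketched in a few lines of Python. This is a deliberately naive illustration, not a description of any real tool: the functions `percent_identity` and `transfer_annotation`, the toy sequences and annotations, and the 0.8 threshold are all assumptions made for this example. Real sequence comparison relies on alignment and statistical significance, and, as noted above, on the bioinformatician's wider knowledge of the entities involved.

```python
def percent_identity(a: str, b: str) -> float:
    """Fraction of identical positions between two equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sketch only handles non-empty, equal-length sequences")
    matches = sum(x == y for x, y in zip(a, b))
    return matches / len(a)


def transfer_annotation(query, characterised, threshold=0.8):
    """Return the annotation of the most similar characterised sequence,
    or None if no sequence reaches the (hypothetical) threshold."""
    best_annotation, best_score = None, 0.0
    for seq, annotation in characterised.items():
        if len(seq) != len(query):
            continue  # the naive measure cannot compare different lengths
        score = percent_identity(query, seq)
        if score >= threshold and score > best_score:
            best_annotation, best_score = annotation, score
    return best_annotation


# Toy "database" of characterised sequences (invented for illustration).
KNOWN = {
    "MKTAYIAKQR": "hypothetical kinase",
    "GGGSSGGGSS": "hypothetical linker",
}
```

Under these assumptions, `transfer_annotation("MKTAYIAKQK", KNOWN)` transfers the annotation of the first entry, because nine of the ten positions match, whereas a query with little similarity to any entry receives no annotation at all.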
As a consequence of needing to record this knowledge in a consistent and computationally amenable form, ontologies of various kinds have become very important in bioinformatics (The Gene Ontology Consortium, 2000, Stevens et al., 2003). Molecular biologists wish to describe and record a wide range of knowledge items. These include, but are not limited to:
- names of things;
- classifications (such as species);
- the size (absolute and ranges, both real and integers), shape and numbers of things;
- functions, processes and behaviours of things;
- structure and substance (atoms, molecules, tissues, etc.);
- evidence (both experimental and literature) for facts about the world;
- patterns (regular expressions in sequences indicative of some feature, etc.);
- parts of things to describe anatomy, composition of molecules and assemblies of molecules, etc.;
- the order of things and their transformation, such as life cycle stages, metabolic pathway reactions, exons in genes;
- degree of match and similarity of things.
In Section 2 we describe the approach followed by OWL-DL, its modelling constructs, and the application of automated reasoning to OWL-DL ontologies. This section can be skipped by those familiar with OWL-DL. We then present a protein family as a case study for ontological modelling in Section 3 and an ontology of that family in Section 4 to describe what can be straightforwardly captured in OWL-DL. Then, in Section 5 we show how some of the limitations of OWL-DL can be circumvented with the use of Ontology Design Patterns. Finally, in Section 6 we discuss what cannot be captured in OWL-DL and use Section 7 to provide a general discussion of the limitations of OWL-DL in representing knowledge in the life sciences.
Section snippets
The OWL-DL model of the World
DLs are a decidable fragment of first order logic and thus have a well-defined, two-valued semantics, i.e., they allow us to express what is universally true (Baader et al., 2003). In OWL-DL, the basic unit of an ontology is a class, which represents a set of individuals, its instances. Moreover, we consider properties, which represent (binary) relations between individuals. Individuals, together with the information about which individual is an instance of which class, and how the individuals
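The set-based reading of classes and properties given above can be illustrated with a small Python sketch. It assumes a single, finite interpretation in which classes are sets of individuals and a property is a set of pairs; all names (`Protein`, `has_domain`, `PhosphataseDomain`, the individuals `p1`, `d1`, and so on) are invented for this example. Note that a DL reasoner draws conclusions that hold in every interpretation of an ontology, whereas this sketch only evaluates expressions in one.

```python
# One toy interpretation: classes are sets, properties are sets of pairs.
Protein = {"p1", "p2"}
Domain = {"d1", "d2"}
PhosphataseDomain = {"d1"}                  # hypothetical class of domains
has_domain = {("p1", "d1"), ("p2", "d2")}   # hypothetical binary relation


def some_values_from(prop, filler):
    """Existential restriction: individuals with at least one
    prop-successor that is an instance of filler."""
    return {s for (s, o) in prop if o in filler}


def only_values_from(prop, filler, universe):
    """Universal restriction: individuals whose prop-successors are all
    instances of filler (vacuously true when there are no successors)."""
    return {
        x for x in universe
        if all(o in filler for (s, o) in prop if s == x)
    }


def is_subclass_here(sub, sup):
    """Subset check in this one interpretation only, not entailment."""
    return sub <= sup


universe = Protein | Domain
with_phosphatase_domain = some_values_from(has_domain, PhosphataseDomain)
only_phosphatase_domains = only_values_from(has_domain, PhosphataseDomain, universe)
```

In this interpretation the existential restriction picks out only `p1`, while the universal restriction also includes `d1` and `d2`, which have no `has_domain` successors at all. This vacuous satisfaction is a common source of surprise when modelling with universal restrictions.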
A knowledge case study
Proteins are divided into broad functional classifications called families. Protein phosphatases and protein kinases are two families that control the phosphorylation events in a cell (Alberts et al., 1989). Biologists classify phosphatases according to their functionality and evolutionary relationships to each other. Tertiary structure units
A phosphatase family ontology
We developed a phosphatase ontology to help semi-automatically support a phosphatase protein family database (Wolstencroft et al., 2005a, Wolstencroft et al., 2005c) and to automatically classify proteins found in a genome (Wolstencroft et al., 2005b, Wolstencroft et al., 2006).
In this ontology, the classes of phosphatase were defined in terms of their p-domain composition. Fig. 3 shows how the p-domain composition of each protein can be sufficient to recognise to which phosphatase sub-family
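The idea of recognising a sub-family from p-domain composition can be sketched in Python as follows. This is only an illustration under stated assumptions: the sub-family names (`SubFamilyX`, `SubFamilyY`), the domain names and the count ranges are hypothetical and are not the definitions used in the ontology. In the ontology itself, recognition is performed by a DL reasoner working from class definitions, not by hand-written checks like these.

```python
from collections import Counter

# Hypothetical definitions: each sub-family lists, per domain, the
# (minimum, maximum) number of occurrences; a maximum of None is unbounded.
DEFINITIONS = {
    "SubFamilyX": {"CatalyticDomain": (1, None), "TransmembraneDomain": (1, 1)},
    "SubFamilyY": {"CatalyticDomain": (2, 2)},
}


def satisfies(composition: Counter, definition: dict) -> bool:
    """True if every domain count falls inside the range the definition demands."""
    for domain, (lowest, highest) in definition.items():
        count = composition[domain]  # Counter returns 0 for absent domains
        if count < lowest or (highest is not None and count > highest):
            return False
    return True


def classify(domains):
    """Return, sorted by name, every sub-family whose definition the
    protein's domain composition satisfies."""
    composition = Counter(domains)
    return sorted(
        name for name, definition in DEFINITIONS.items()
        if satisfies(composition, definition)
    )
```

For example, `classify(["CatalyticDomain", "TransmembraneDomain"])` yields `["SubFamilyX"]`. A protein may satisfy more than one definition, or none, which is the kind of result that flags either an overlap in the definitions or a protein that does not fit the current classification.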
Using OWL with Ontology Design Patterns
In this section, we will concentrate on those limitations of OWL that can be worked around by using Ontology Design Patterns (ODPs). We do not exhaustively explore ODPs for OWL, but illustrate how they can use OWL-DL's current expressivity to work around some of the inherent
The boundary of the OWL World
In this section, we cross the boundary of the OWL-DL view of the world and explore aspects of biology that OWL-DL cannot represent. Some of these aspects cannot be expressed in OWL-DL or any decidable description logic, because they are known to lead to undecidability, semantic problems, or currently unmanageable computational complexity. Other aspects cannot be expressed in OWL-DL, but it is known that an extension of OWL-DL with the corresponding expressive means would be possible. These
Discussion
In this paper, we have explored the ontological requirements posed by biology and bioinformatics and how well OWL-DL's model matches those requirements. There are obviously large areas of the world of biology that can be represented using OWL-DL with great success. It is possible to create OWL-DL descriptions of molecular biology that are both ontologically good and useful in driving applications. Yet, it is relatively easy to find features of biology that do not fit into this strict, universal
References (37)
- et al. Which craft is best in bioinformatics? Computers and Chemistry (2001)
- et al. Molecular Biology of the Cell (1989)
- et al. Structural and evolutionary relationships among protein tyrosine phosphatase domains. Molecular and Cellular Biology (2001)
- Baader, F., Hanschke, P., 1991. A schema for integrating concrete domains into concept languages. In: Proceedings of...
- Baader, F., Calvanese, D., McGuinness, D., Nardi, D., Patel-Schneider, P.F. (Eds.), 2003. The Description Logic...
- et al. An ontology for bioinformatics applications. Bioinformatics (1999)
- et al. The OWL instance store: system description
- et al. Reasoning in expressive description logics
- Self-splicing of group introns. Annual Review of Biochemistry (1990)
- Dean, M., Connolly, D., van Harmelen, F., Hendler, J., Horrocks, I., McGuinness, D.L., Patel-Schneider, P.F., Stein,...
- Design Patterns: Elements of Reusable Object-Oriented Software. Addison-Wesley Professional Computing Series
- RACER system description
- Bioinformatics. A user's perspective. American Journal of Respiratory Cell Molecular Biology
- 1. Mikel Egaña Aranguren is funded by an EPSRC studentship.
- 2. Katy Wolstencroft's work on the phosphatase ontology was funded by an MRC studentship.
- 3. Matthew Horridge and Nick Drummond are funded by JISC - Semantic Grid and Autonomic Computing Programme grant.