Concept drift and how to identify it
Introduction
Knowledge organisation systems (KOS), such as formal ontologies (e.g. modelled in OWL), thesauri or taxonomies (e.g. described in SKOS) or other term classification schemes, play a crucial role in providing semantic interoperability in many domains and use cases. They have become critical to the Web of Data, for structured access to documents in libraries or to patient records based on diagnostic information, and for many more applications. In almost all modern types of KOS, concepts are the central constructs used to describe sets of objects with shared characteristics. Although it is widely recognised to be an oversimplification, most current systems consider their underlying KOS to be stable over time. For many applications this is becoming a critical problem, and this paper attempts to provide a theory of what we call concept drift. As the world is continuously changing, concepts also change over time. For example, a concept may refer to different objects at different points in time: the term Government of the Netherlands refers to different people in 1999 and in 2009. Or consider the concept Middle class, which has very different properties in different periods of time. Concept drift is known to be a problem in data mining and machine learning, where learned models lose their predictive power over time [50]. Here, we deal with concepts that are explicitly and symbolically represented in some KOS, and where the drift needs to be made explicit as well. In this sense our contribution is orthogonal and complementary to the work in the machine learning literature.
In this paper we formally define what concept drift in a KOS is, and how to study its impact, given two different ontological paradigms and without committing to a particular type of knowledge organisation scheme. These definitions are not intended to provide new philosophical insights, but aim at making existing accepted notions of intension, extension and labelling applicable in the context of dynamics of semantics.
Semantic drift. Throughout the theoretical part of the paper we will use the generic example of the European Union (the concept European Union) and the countries of the EU (EUCountry) as our concepts of reference. The historical overview provided by the EU on [5] gives an interesting discussion of its development. Starting in the early Fifties “the European Union is set up” with “Belgium, France, Germany, Italy, Luxembourg and the Netherlands” as founding members. Within the next six decades many other countries joined, and the European Union slowly transformed from an economic community into an “international organisation governing common economic, social, and security policies” [4]. The following list gives some definitions for the concept of the European Union over time.
- (1979) The European Community is a common denominator for the European Economic Community (EEC), the European Coal and Steel Community (ECSC), and the European Atomic Energy Community (EAEC) [3].
- (1999) The European Community is the new stage in the implementation of increasing the Union of the European people [2].
- (2003) The European Union or EU is an international organisation of European states, established by the Treaty on European Union [52].
- (2006) The European Union (EU) is a supranational and intergovernmental union of 25 independent, democratic member states [53].
- (2010) The European Union (EU) is an economic and political union of 27 member countries, located primarily in Europe [51].
Most of the aspects of the change of meaning of a concept over time that we introduce in this paper can be identified in this example. The set of instances of the countries of the European Union, also called its extension, changes (e.g. from 25 to 27 between 2006 and 2007). The label of the concept changes from European Community to European Union. From a judicial perspective those are different concepts. However, the European Union website [5] considers the EU and the EC to be the same concept. Note that it uses the label European Union to signify the concept that existed in 1952, although at that time there was no European Union, only the European Community. In our opinion the website refers to an abstract concept of an organisational unit which stands for the European idea, which we now refer to as the European Union. Of course, the properties of this European Union have changed significantly as well: from a union bound by economic treaties to a full political, military and social organisation. Ontologically, we say that the intension of the concept European Union has changed. These three types of change will be discussed as three different aspects of semantic drift: extensional, label and intensional drift respectively.
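The three aspects can be illustrated with a minimal sketch that represents one version of a concept as a label, a set of properties (intension) and a set of instances (extension), and reports which aspects differ between two versions. The class and function names are hypothetical, the property strings are toy placeholders, and the second extension is deliberately truncated; this is not the formalisation given later in the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConceptVersion:
    """One version of a concept at a point in time (illustrative only)."""
    label: str
    intension: frozenset  # properties ascribed to the concept
    extension: frozenset  # instances the concept refers to


def drift_aspects(old: ConceptVersion, new: ConceptVersion) -> dict:
    """Report, per aspect, whether the two versions differ."""
    return {
        "label": old.label != new.label,
        "intension": old.intension != new.intension,
        "extension": old.extension != new.extension,
    }


# Toy values: the property strings are placeholders chosen for illustration.
ec_1979 = ConceptVersion(
    label="European Community",
    intension=frozenset({"economic community"}),
    extension=frozenset({
        "Belgium", "France", "Germany", "Italy", "Luxembourg",
        "Netherlands", "Denmark", "Ireland", "United Kingdom",
    }),
)
eu_2010 = ConceptVersion(
    label="European Union",
    intension=frozenset({"economic union", "political union"}),
    # Truncated for brevity; the 2010 definition above speaks of 27 members.
    extension=ec_1979.extension | {"Greece", "Spain", "Portugal"},
)

print(drift_aspects(ec_1979, eu_2010))
```

A boolean per aspect is the coarsest possible signal; a graded similarity per aspect would be needed to compare how strongly each aspect has drifted.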
Identity-based concept drift. The impact of such drift is difficult to measure, and for that purpose we would like to identify more drastic and qualitative changes in meaning. In the example of the EU countries there are some of these more drastic changes. Recall that the EU started out with just 6 countries, all of which were Central European. The effect of the expansion of the EU is that the original meaning of the concept EUCountry in terms of its members in 1950 is now far closer to the concept CES of Central European states than to the concept EUCountry. We will call such changes shift. Another way of looking at drift is to study the similarity between the meanings over time. The following table shows, e.g., the number of European member countries of NATO and EU:
Studying the similarity between these sets, e.g. in this simple example as the ratio of the number of new versus old instances, indicates the moments when the most drastic changes occurred (in both cases between 1995 and 2004). We say that we measure the stability of the concept as compared to other concepts. In principle, there are two lessons: for each concept we can identify the most unstable moments in a temporal chain, but we can also compare the average or overall instability of a concept over time. An interesting comparison can be made with the concept of European Countries within NATO, which has also seen expansion over time. Note that the latter is more extensionally stable, as countries joined more gradually.
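As a rough illustration of this kind of analysis, the sketch below scores each transition between consecutive membership snapshots and picks the least stable one. Two assumptions are made here: the Jaccard coefficient is used as the similarity measure (the text above mentions a ratio of new versus old instances), and the accession data are filled in from general knowledge of EU enlargement rounds rather than taken from the paper's table.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two sets; two empty sets count as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# EU accession rounds up to 2007 (assumption: general knowledge, not the paper's table).
ACCESSIONS = {
    1952: {"Belgium", "France", "Germany", "Italy", "Luxembourg", "Netherlands"},
    1973: {"Denmark", "Ireland", "United Kingdom"},
    1981: {"Greece"},
    1986: {"Portugal", "Spain"},
    1995: {"Austria", "Finland", "Sweden"},
    2004: {"Cyprus", "Czech Republic", "Estonia", "Hungary", "Latvia",
           "Lithuania", "Malta", "Poland", "Slovakia", "Slovenia"},
    2007: {"Bulgaria", "Romania"},
}


def snapshots(accessions: dict) -> dict:
    """Accumulate accession rounds into one extension per year."""
    members, result = set(), {}
    for year in sorted(accessions):
        members = members | accessions[year]
        result[year] = members
    return result


def transitions(extensions: dict) -> list:
    """Similarity between each pair of consecutive snapshots."""
    years = sorted(extensions)
    return [((y1, y2), jaccard(extensions[y1], extensions[y2]))
            for y1, y2 in zip(years, years[1:])]


def least_stable(extensions: dict) -> tuple:
    """The transition with the lowest extensional similarity."""
    return min(transitions(extensions), key=lambda t: t[1])


eu = snapshots(ACCESSIONS)
print(least_stable(eu))  # ((1995, 2004), 0.6)
```

Under these assumptions the least stable transition is 1995 to 2004, which agrees with the observation in the text; averaging the scores over all transitions would give one possible overall stability figure for comparing concepts.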
Morphing-based concept drift. The previous analysis, identifying shift and stability of the meaning of a concept, makes sense under the assumption, or ontological commitment, that the concept of the European Union in 1995 is considered to be the same concept as the European Union in 2007. This is just one possible interpretation, and we do not want to restrict our framework to just this one world-view. The discussion on identity is probably as old as philosophy itself, and has played a role in ontology engineering as well. We try to keep out of this discussion by providing methods that work both with identity and without it. For the latter we consider concepts to pertain to just one moment in time and to evolve (morph) into new, but highly similar, concepts at each moment in time. Meaning drift is then defined in terms of the degree of dissimilarity of these maximally similar concepts over time. Again, we will study drift with respect to intension, extension and labels.
In this alternative conceptualisation of concept drift we also want to identify some qualitative notions to describe more drastic notions of drift. One example is the notion of split which occurs when the different semantic elements (intension, extension or label) morph into different concepts. When a series of concepts at different moments are linked by this morphing relation, then a morphing chain is formed. The strength of the morphing chain, i.e., how similar the morphed concepts are, is another notion to study.
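A morphing chain can be sketched as follows, under simplifying assumptions: only extensions are compared (the text also considers intension and label), Jaccard similarity stands in for the similarity measure, and chain strength is taken to be the weakest link. All names and the toy timeline are hypothetical.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two sets; two empty sets count as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def morph_chain(timeline: list, start: str) -> list:
    """Follow a concept through time by always moving to the most similar
    concept in the next snapshot.

    timeline: list of {concept name: extension set}, ordered by time;
    every snapshot is assumed to be non-empty.
    Returns [(name, similarity to predecessor)], with None for the start.
    """
    chain = [(start, None)]
    current = timeline[0][start]
    for snapshot in timeline[1:]:
        name, sim = max(
            ((n, jaccard(current, ext)) for n, ext in snapshot.items()),
            key=lambda pair: pair[1],
        )
        chain.append((name, sim))
        current = snapshot[name]
    return chain


def chain_strength(chain: list) -> float:
    """Assumed definition: the weakest link determines the strength."""
    sims = [sim for _, sim in chain[1:]]
    return min(sims) if sims else 1.0


# Toy timeline with placeholder instances (integers) and concept names.
timeline = [
    {"A": {1, 2, 3}, "B": {7, 8}},
    {"A2": {1, 2, 3, 4}, "B2": {7, 8, 9}},
    {"A3": {1, 2, 4, 5}, "B3": {8, 9}},
]
chain = morph_chain(timeline, "A")
print(chain, chain_strength(chain))
```

Taking the minimum makes one abrupt change dominate the score; an average over the links would instead reward chains that are mostly smooth.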
Research questions. To our knowledge there has been no formalisation of what concept drift actually means and implies. In order to identify different types of changes in concepts and to understand the impact of concept drift, such a formalisation is critical. Therefore, this paper focuses on the following research questions:
- (1) What is concept drift, and how can it be formalised?
- (2) Can we identify the impact of concept drift?
It is important to note that we want to develop a generic framework which allows us to study concept drift without a priori choosing one of the above described ontological commitments (identity versus morphing) or a particular type of KOS.
Methodology. We provide a generic formalisation of the meaning of concepts in terms of label, intension and extension. Depending on the ontological commitment to an identity-based or a morphing-based model, concept drift is defined differently, and so are the qualitative follow-up notions.
For the former, the instability over a time period and concept shift between time points (where part of the meaning of a concept shifts to some other concept) are crucial notions. For the latter, the notion of split and the morphing strength will be introduced.
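For the identity-based view, a concept shift between two time points can be sketched as a check on whether the earlier meaning of a concept is closer to some other concept at the later time point than to the concept itself. The sketch below compares extensions only and uses Jaccard similarity; both choices, the function names and the placeholder instances are assumptions made for illustration.

```python
from typing import Optional


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two sets; two empty sets count as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def detect_shift(name: str, before: dict, after: dict) -> Optional[str]:
    """Return the concept that the earlier meaning of `name` has shifted to,
    or None if the earlier meaning is still closest to `name` itself.

    before, after: {concept name: extension set} at two time points.
    """
    old = before[name]
    own_similarity = jaccard(old, after.get(name, set()))
    best = max(after, key=lambda other: jaccard(old, after[other]))
    if best != name and jaccard(old, after[best]) > own_similarity:
        return best
    return None


# Placeholder instances: concept C grows strongly, while D absorbs C's old members.
before = {"C": {"a", "b", "c"}, "D": {"x"}}
after = {
    "C": {"a", "b", "c", "d", "e", "f", "g", "h", "i"},
    "D": {"a", "b", "c", "x"},
}
print(detect_shift("C", before, after))
```

In this toy case the old extension of C overlaps more with the later D than with the later C, so a shift to D is reported; a concept whose extension is unchanged yields None.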
As our proposal is intended to be very generic, we need to discuss the notions of intension and extension in more detail.
Case-studies. We instantiate our framework in four case-studies, studying concept drift in our motivating case, a political ontology in SKOS [17] used by communication scientists for political analysis, as well as in a general-purpose RDFS [30] ontology, DBpedia [25]; a legal OWL [31] ontology, LKIF-Core [15]; and three OBO biomedical ontologies [41]. We investigate the introduced mechanisms for studying concept drift in these four different KR models. Our experiments show the feasibility of both the formalisation and the identification mechanisms by pointing to some examples of concept (in)stability, shift, split, and morphing strength which were identified as relevant by collaborating domain experts. We are aware that these case-studies do not constitute a formal evaluation, still less an empirical proof of the validity of our approach. However, they indicate the broad coverage and potential of the toolkit (although not all tools are useful in all scenarios). Furthermore, selected domain experts have confirmed the usefulness of some of the results in three of the four cases, the fourth being general enough to allow an informal assessment without expert input.
Contributions. This paper operationalises established (philosophical) insights into a general pragmatic framework for dealing with concept drift and develops a general toolkit to detect drift in different domains. We believe that we contribute to a better understanding of the temporal change of meaning in formal knowledge organisation schemes and of its impact in practical applications. We motivate and define the crucial notions of shift, split, stability and morphing strength for different ontological commitments and different types of knowledge organisation systems. The four case-studies illustrate the feasibility of our framework for analysing concept drift in knowledge organisation schemes of varying expressiveness. This is a significantly extended version of our previous paper [47], in which some of the ideas were introduced. We now provide a far more detailed analysis and formalisation, including an alternative model for dealing with changes based on concept morphing (with the related notions of split and morphing strength). We also provide a far more thorough evaluation, including new datasets.
Structure of the paper. Section 2 gives a general introduction to the four types of knowledge organisation schemes investigated and discusses the most relevant related work. In Section 3, we provide the basic formalisation of the meaning of concepts on which the two theories of concept drift, identity-based and morphing-based, are built. These two theories are defined in Section 4 together with a practical toolkit for analysing concept drift. In Section 5, we apply our general framework in four different case studies involving four different KOSs. Finally, Section 6 concludes the paper.
Section snippets
Background
In this section, we describe some general notions that are necessary for understanding the remainder of the paper. In addition, we describe what has been done on formalising concept drift in knowledge systems.
Two theories of concept drift
Arguably, the meaning of concepts changes over time. In this section we define precisely what this actually means, as there are different philosophical views on the matter. The two core alternatives can briefly be summarised upfront:
- Although possibly changing its meaning, a concept can exist over periods of time. Different variants of the same concepts can differ in meaning.
- Concepts only exist at specific moments in time, and evolve gradually into some other concepts (possibly with almost the
A qualitative toolkit for analysing concept drift
In the previous section, we have formally defined concept drift as the change of aspects of the meaning of a concept over time. Concept drift happens regularly. Even if it can be measured, it is often difficult to grasp its impact, which probably results in a far too fine-grained analysis. In this section, we will define some more practical notions that describe changes in meaning: our toolkit for analysing concept drift.
This toolkit takes both possible notions of drift into account, and we
Case-studies
In this section we will discuss four different case-studies as examples of how to use our methods, and to give some evidence of the potential of the ideas introduced in Section 3. The purpose of these case-studies is twofold: first, we want to indicate how to practically apply our analysis and toolkit by exemplifying the usage in a number of very different scenarios, and based on a variety of different knowledge organisation principles. Secondly, we believe that we gather evidence of the
Conclusion
More and more applications critically depend on some kind of concept schemes for the semantic interoperability of their data. However, although it is recognised by many as a critical problem, the continuous change in meaning of concepts (called drift in this paper) has not yet received the attention it deserves in the ontology modelling community. Despite the significant efforts that have gone into topics such as ontology evolution, semantic versioning or temporal modelling and reasoning, most
Acknowledgements
We are especially grateful to Janet Takens, Jan Kleinnijenhuis, Sebastian Köhler, Peter Robinson, Sandra Dölken, Paea LePendu, Rinke Hoekstra and Doug Howe, who participated in our use-case studies and provided crucial feedback.
References (53)
- et al. Swoop: a web ontology editing browser. Web Semantics: Sci. Services Agents World Wide Web (2006)
- et al. Understanding ontology evolution: a change detection approach. Web Semantics: Sci. Services Agents World Wide Web (2007)
- Language (1933)
- Brockhaus: Europaeische gemeinschaft, translated,...
- DTV, DTV Atlas,...
- Encyclopedia britannica, online version visited 27 May 2010,...
- European Union, The history of the european union, Website, <http://europa.eu/abc/history/index_en.htm>,...
- et al. Conceptual clustering and its application to concept drift and novelty detection
- et al. Ontology change: classification and survey. Knowledge Eng. Rev. (2008)
- Über Sinn und Bedeutung. Z. Philosophie Philosophische Kritik (1892)
- Evaluating ontological decisions with ontoclean. Commun. ACM
- A model theoretic semantics for ontology versioning
- Étude comparative de la distribution florale dans une portion des alpes et des jura. Bull. Société Vaudoise Sci. Naturelles
- Ontology mapping: the state of the art. Knowledge Eng. Rev.
- Ontology versioning and change detection on the web
- Concept versioning: A methodology for tracking evolutionary concept drift in dynamic concept systems
- Dbpedia – a crystallization point for the web of data. J. Web Semantics