
Applied Soft Computing

Volume 57, August 2017, Pages 615-626

Full Length Article
Unsupervised Mode of Rejection of Foreign Patterns

https://doi.org/10.1016/j.asoc.2017.04.036

Highlights

  • The rejection option in pattern recognition problems is studied.

  • Recognizing native (proper) patterns and rejecting foreign (erroneous) patterns.

  • Clustering in unsupervised mode to discover data structures.

  • Novel unsupervised mode to reject foreign patterns.

  • Empirical evaluations on a suite of publicly available medical datasets.

Abstract

The study deals with the issue of recognizing native (proper) patterns and rejecting foreign (erroneous) patterns. We present a novel unsupervised approach to rejecting foreign patterns. We construct a geometrical model that identifies regions in the feature space predominantly occupied by native patterns and determines regions where foreign patterns are localized. The model is constructed in an unsupervised mode: we engage clustering to discover structures in the data and use the revealed geometry to form regions with a high likelihood of being occupied by native patterns and regions in which foreign patterns are likely to be localized. The geometry of the region of rejected patterns is adjusted by two parameters, which are tuned to achieve a sound balance between rejection of foreign patterns and acceptance of native patterns. It is shown that the proposed method is applicable not only to multiclass data processing problems but could also be beneficial in situations when the only available information concerns a single phenomenon (so-called one-class data). We demonstrate the usefulness of the proposed approach by studying several publicly available medical datasets.

Introduction

A standard approach to a classification task concerns the formation of a model (classifier) that assigns a class label to each input pattern so that a certain performance measure (say, classification error) is minimized. However, we often encounter problems with pattern quality, which hinder the performance of the resulting classifier. The first compelling example concerns a situation when highly contaminated (distorted) data are processed. For instance, some data samples may be completely erroneous and as such should not be classified at all; say, a segmentation procedure is flawed and improperly extracts patterns, or two streams of data coming from two experiments have been mistakenly merged into a single one. Second, an example of a real-world problem that does not fall within the realm of a standard data processing task is novelty detection [21], [30]. Novelty detection can be expressed as a one-class classification problem [8], [20]. Patterns that do not belong to the recognized class are assumed to be the novelty. In particular, a noteworthy area of application of novelty detection methods is computer-aided medical diagnosis. Say we have data describing patients who suffer from a certain illness. With the use of novelty detection methods, a new instance (patient) can be recognized as native to the recognized class (and as such we consider the patient sick and proceed with appropriate medical care) or foreign to the recognized class (rejected, being healthy). In this way, without any knowledge about the characteristics of the foreign class, we may reject certain patterns based on their “dissimilarity” to native patterns. The ability to form a binary decision rule (to accept or reject a pattern) based only on knowledge about native patterns is especially precious when samples of the other class (samples of the foreign class) are very difficult to obtain or vary greatly. The described issues are a common problem in computer-aided medical diagnosis. Therefore, novelty detection is a vital issue in medical applications, where the majority (or all) of the gathered data comes from ill subjects and we may use it to train a one-class model that could help us determine whether new material belongs to this class or not. Such an application of machine learning could help doctors in their decision-making and reduce the number of invasive and costly tests otherwise needed to formulate a diagnosis. Several computer-aided diagnosis systems based on classification principles, including classifiers with a reject option, have already been presented. The study reported in [19] presented an automated model diagnosing vertebral column pathologies, [22] proposed a method for detection of antinuclear autoantibodies, and [16] discussed a fuzzy rule-based model for assessing coronary artery disease.

With the above motivation in mind, in this study we present a novel approach to the rejection of foreign patterns. By foreign patterns we mean patterns that should not be classified into any class and, at the same time, do not form their own class(es). We propose an unsupervised approach to forming the rejection mechanism with the ultimate objective of separating foreign patterns from native patterns. The issue of constructing multiclass classifiers for native patterns is not considered. Instead, whenever we deal with a native dataset comprising several classes, we regard it as a single-class problem (viz. a single class of native patterns). We define models that determine regions predominantly occupied by native patterns. Patterns that do not fall into such regions are rejected. The formation of these regions is realized in an unsupervised mode. This entails that no prior knowledge about foreign patterns is required to establish the rejection mechanism.

The ultimate objective of this paper is to present and investigate the properties of a novel unsupervised approach to the rejection of foreign patterns. The proposed approach is related to geometrical approaches to rejection. The introduced perspective links the notion of distance with the membership grades encountered in fuzzy sets. Both criteria are endowed with parameters, which allows for some flexibility to adjust the criteria to fit a particular problem at hand. In this regard, it is worth stressing that the contribution of this study is original and novel. In the paper, we show that the method can be applied to versatile pattern recognition and classification problems, including critical areas such as computer-aided medical diagnosis.
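To make the link between distance and membership grades concrete, the sketch below computes standard fuzzy c-means membership grades from the distances of patterns to a set of cluster prototypes. This is background material rather than the paper's rejection rule (which is introduced in Section 3); the prototypes, the fuzzifier m, and the toy data are illustrative assumptions.

```python
import numpy as np

def fcm_memberships(X, prototypes, m=2.0, eps=1e-12):
    """Standard fuzzy c-means membership grades:
    u[i, j] = 1 / sum_k (d(x_i, v_j) / d(x_i, v_k)) ** (2 / (m - 1)).
    X: (n, d) patterns, prototypes: (c, d) cluster centres, fuzzifier m > 1."""
    # distances from every pattern to every prototype, shape (n, c)
    d = np.linalg.norm(X[:, None, :] - prototypes[None, :, :], axis=2) + eps
    ratio = (d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1.0))
    return 1.0 / ratio.sum(axis=2)  # rows sum to one

# toy illustration: two prototypes in a two-dimensional feature space
V = np.array([[0.0, 0.0], [3.0, 3.0]])
X = np.array([[0.1, -0.2], [2.8, 3.1], [10.0, 10.0]])
print(fcm_memberships(X, V))
# The distant pattern [10, 10] still receives sizeable memberships, because
# grades are relative; this is why the approach pairs membership grades with
# an explicit distance criterion.
```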

The proposed method is first investigated at the conceptual and algorithmic level and subsequently applied to several medical datasets.

Though the literature in the area of classification is abundant, relatively few attempts have been made to deal with the problem of rejection of foreign patterns (as a matter of fact, this type of problem formulation has not been encountered very often). Among the techniques related to the approach outlined here are methods that produce classification decisions in the form of flexible scores. A degree of membership to a class is evaluated on a certain scale. Therefore, rejection is realized by eliminating those patterns for which the scores are low. The literature on the topic offers several classification methods that work in this manner, for example [1], [17], [24]. Classification scores could be obtained, for instance, with an extension of linear discriminant analysis, as reported in the second of the mentioned papers. Apart from the theoretical studies, the literature offers studies where rejection methods are applied to aid particular pattern recognition problems. For example, [14] contains a study on handwritten word recognition. Score-based rejection itself was recognized quite a long time ago: in 1970, C.K. Chow proposed to reinforce probabilistic classifiers with an ambiguity rejection option that basically assumed that patterns with low scores were rejected [5]. In the same sense, the rejection option appears in the recalled papers on computer-aided medical diagnosis [16], [19], [22].
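For illustration, the snippet below shows Chow-style ambiguity rejection in its simplest form: a probabilistic classifier is trained, and any pattern whose top class score falls below a threshold is rejected. The classifier, the dataset, and the threshold value of 0.9 are illustrative choices and are not taken from the paper.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)).fit(X_tr, y_tr)
proba = clf.predict_proba(X_te)       # estimated class posteriors
top_score = proba.max(axis=1)         # confidence of the winning class

threshold = 0.9                       # illustrative rejection threshold
accepted = top_score >= threshold     # Chow's rule: reject low-confidence patterns
decisions = np.where(accepted, proba.argmax(axis=1), -1)   # -1 marks "rejected"
print(f"accepted {accepted.mean():.1%} of the test patterns")
```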

It should be stressed that foreign patterns are not outliers. Outliers are native patterns that substantially differ from the majority of the data. We remove outliers because they tend to cause problems with the construction of a classifier. In contrast, foreign elements are unknown to us at the stage of data preprocessing and classifier construction, and we have to reject them because they do not belong to the data at all. Notably, there has been substantial development in the area of outlier detection, for instance as reported in [6]. The scope of studies on outlier detection extends to semi-supervised learning methods as well, as reported in [23], and it has been shown that we can efficiently remove native outliers from the data, but the issue of unknown, foreign elements remains rarely addressed.
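To emphasize the distinction, the sketch below flags native outliers with a density-based detector in the spirit of LOF [6]; the use of scikit-learn's LocalOutlierFactor and the synthetic data are our own illustrative choices. The patterns flagged here are still native, only atypical; foreign patterns are, by definition, absent at this stage.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(0)
native = rng.normal(loc=0.0, scale=1.0, size=(200, 2))   # native patterns
native[:3] += 6.0                                         # a few atypical (outlying) native patterns

lof = LocalOutlierFactor(n_neighbors=20)
labels = lof.fit_predict(native)                          # -1 for outliers, +1 for inliers
print("native patterns flagged as outliers:", np.where(labels == -1)[0])
```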

As we mentioned at the beginning of the introduction, one-class classification (unary classification) is a stream of studies tightly related to the subject of this paper. Let us reiterate that in one-class classification we construct a model based on a given training set of one class. The model is able to evaluate how much a given new pattern resembles the patterns from the training set. Resemblance is typically calculated based on distance or another similarity measure of choice. Among the different methods for one-class classification, one should mention estimators relying on a certain data distribution [31]. This group of methods shares a common weakness: their reliability is relatively low when we have a limited amount of data for model training. As a remedy, the so-called boundary approaches, which consider only a closed boundary around the training set, were proposed. In this stream of studies we shall mention methods such as k-centers [31] and Support Vector Data Description [25]. These methods aim at constructing a boundary around a given dataset. To minimize the chance of accepting foreign patterns (the cited works call them outliers), the volume of the region enclosing the dataset is minimized.
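As an illustration of the boundary approaches, the sketch below encloses a one-class (native) training set with scikit-learn's OneClassSVM, which for the RBF kernel is closely related to Support Vector Data Description [25]. The nu and gamma values and the synthetic data are assumptions made only for this example.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(1)
native_train = rng.normal(size=(300, 2))        # only native (one-class) data is available

# nu upper-bounds the fraction of training patterns left outside the boundary,
# so it controls how tightly the native region is enclosed
boundary = OneClassSVM(kernel="rbf", nu=0.05, gamma=0.5).fit(native_train)

new_native = rng.normal(size=(5, 2))
foreign = rng.uniform(low=4.0, high=6.0, size=(5, 2))     # patterns far from the native region
print(boundary.predict(new_native))   # mostly +1: accepted as native
print(boundary.predict(foreign))      # -1: rejected
```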

Finally, one may mention Learning from Positive and Unlabeled Examples (PU learning). This approach is usually discussed as an example of partially supervised classification. The evolution of this approach was motivated by classification cases where there was one large class (called positive) together with a large number of other small classes or unlabeled patterns, and no so-called negative instances. In such a case, standard approaches to classification are unable to distinguish negative instances, as there were none for classifier training. Such problems appear in domains like text mining and biomedical informatics, where we often encounter problems with one dominant class and a multitude of rare classes, which are sometimes very hard to label correctly. Typically, the PU learning approach is based on assigning similarity scores between new patterns and the training set of the positive class. Among the early papers concerning the ideas of PU learning, one may mention the study reported in [29].
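A minimal sketch of the scoring idea mentioned above: each unlabeled pattern is scored by its distance to the nearest labeled positive pattern, and a cut-off on that score decides which unlabeled patterns are treated as positive. The nearest-neighbour scorer, the threshold, and the synthetic data are illustrative assumptions, not the method of [29].

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(2)
positives = rng.normal(size=(200, 2))                        # labeled positive patterns
unlabeled = np.vstack([rng.normal(size=(20, 2)),             # hidden positives
                       rng.uniform(5.0, 8.0, size=(20, 2))]) # hidden negatives

nn = NearestNeighbors(n_neighbors=1).fit(positives)
dist, _ = nn.kneighbors(unlabeled)
score = -dist.ravel()                 # higher score = more similar to the positive class

threshold = -2.0                      # illustrative cut-off on the similarity score
likely_positive = score >= threshold
print(f"{likely_positive.sum()} of {len(unlabeled)} unlabeled patterns scored as positive")
```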

This paper is structured as follows. Section 2 presents basic notions of clustering and, in particular, of fuzzy clustering. Section 3 introduces the proposed approach to foreign patterns rejection. Section 4 covers empirical experiments, where we apply our method and discuss its properties. Section 5 concludes this paper and highlights future research directions.

Section snippets

Essential features of fuzzy clustering

Before we proceed with the description of the proposed rejection mechanism, it is necessary to introduce formal notation and algorithms, which are used in the later parts of this paper.

A straightforward idea for pattern rejection is transferred from the field of clustering. We see a high resemblance between the clustering task and the rejection task. Clustering aims at the identification of similar subsets of objects within a given dataset. By analogy, a reversed foreign patterns rejection task may be depicted as

Identifying regions of foreign data: unsupervised learning approach

The underlying observation behind the proposed approach is that clustering determines a certain geometry in the feature space in which the patterns are located. Owing to the membership grades, one can effectively identify regions in which there is a high likelihood of foreign patterns. Furthermore, as this development is based on unsupervised learning, it is free of any explicit assumptions about specific (say, statistical) characteristics of the foreign data. At the same time, it is

Experimental studies

In this section, we apply the proposed method in three case studies concerning different medical datasets. We investigate the properties of the proposed procedure. In particular, we are interested in model tuning. We show how to evaluate different combinations of the ε and δ parameters, and we look into the properties of foreign patterns rejection and native patterns acceptance.
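The snippet of this section does not reproduce the rejection rule itself, so the sketch below only illustrates how two rejection parameters could be evaluated over a grid: for every (ε, δ) pair we measure the native acceptance rate and the foreign rejection rate and look for a balanced combination. The stand-in predicate is_rejected, the balance criterion (a plain average of the two rates), and the synthetic data are hypothetical; the actual geometric rule and tuning procedure are those of Section 3 and of this section.

```python
import numpy as np

def is_rejected(x, prototypes, eps, delta):
    """Hypothetical stand-in for the geometric rejection rule: reject x when it
    lies farther than eps * (1 + delta) from every cluster prototype."""
    return np.linalg.norm(prototypes - x, axis=1).min() > eps * (1.0 + delta)

def tune(native, foreign, prototypes, eps_grid, delta_grid):
    # Evaluate every (eps, delta) pair by native acceptance and foreign rejection rates.
    results = []
    for eps in eps_grid:
        for delta in delta_grid:
            na = np.mean([not is_rejected(x, prototypes, eps, delta) for x in native])
            fr = np.mean([is_rejected(x, prototypes, eps, delta) for x in foreign])
            results.append((eps, delta, na, fr))
    # pick the pair with the best balance (here: the average of the two rates)
    return max(results, key=lambda r: 0.5 * (r[2] + r[3]))

rng = np.random.default_rng(3)
prototypes = np.array([[0.0, 0.0], [3.0, 3.0]])
native = np.vstack([rng.normal(size=(100, 2)), rng.normal(loc=3.0, size=(100, 2))])
foreign = rng.uniform(6.0, 9.0, size=(50, 2))
print(tune(native, foreign, prototypes, np.linspace(0.5, 3.0, 6), np.linspace(0.0, 1.0, 5)))
```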

As we mentioned in the introduction, an important area of application of foreign patterns rejection techniques is computer-aided

Conclusions

This paper deals with the issue of recognition of native (proper) patterns and rejection of foreign (outlying, erroneous) patterns. In the study, we presented a novel unsupervised approach to the rejection of foreign patterns. We proposed the construction of a geometric model, which defines regions in the feature space for native and for foreign patterns. It is worth highlighting that in order to construct such a model, only the native patterns are considered. In the development of the method, no specific information

Acknowledgment

The research is supported by the National Science Centre, grant No 2012/07/B/ST6/01501, decision no. UMO-2012/07/B/ST6/01501.

References (31)

  • L. Breiman, A. Cutler, A. Liaw, M. Wiener, Package randomForest:...
  • M.M. Breunig et al., LOF: identifying density-based local outliers, Proc. of the 2000 ACM SIGMOD International Conference on Management of Data (2000)
  • C.K. Chow, On optimum recognition error and reject tradeoff, IEEE Trans. Inf. Theory (1970)
  • B. Desgraupes, Clustering Indices, published in 2013, online source:...
  • M. Ester et al.