
Pattern Recognition

Volume 48, Issue 12, December 2015, Pages 3969-3982

On the usefulness of one-class classifier ensembles for decomposition of multi-class problems

https://doi.org/10.1016/j.patcog.2015.06.001

Highlights

  • A novel approach for tackling multi-class problems with an ensemble of one-class classifiers.

  • A thorough analysis of the areas of applicability of one-class decomposition.

  • Identification of problems for which one-class decomposition is superior to binarization.

  • Examination of correlations between the fusion technique used and the decomposition scheme.

  • An exhaustive experimental study that gives a detailed outlook on the quality of the investigated methods.

Abstract

Multi-class classification can be addressed in a plethora of ways. One of the most promising research directions is to apply the divide-and-conquer rule, decomposing the given problem into a set of simpler sub-problems and then reconstructing the original decision space from the local responses.

In this paper, we propose to investigate the usefulness of applying one-class classifiers to this task, by assigning a dedicated one-class descriptor to each class and combining them under three main approaches: one-versus-one, one-versus-all and trained fusers. Despite not using all the available knowledge, one-class classifiers display several desirable properties that may benefit the decomposition task. They can adapt to the unique properties of the target class, trying to fit the best concept description. Thus they are robust to many difficulties embedded in the nature of data, such as noise or imbalanced and complex distributions. We analyze the possibilities of applying an ensemble of one-class methods to tackle multi-class problems, with special attention paid to the final stage – the reconstruction of the original multi-class problem. Although binary decomposition is more suitable for most standard datasets, we identify the specific areas of applicability for one-class classifier decomposition.

To do so, we develop a two-part study: first, for a given fusion method, we compare one-class and binary classifiers to find correlations between classifier models and fusion algorithms. Then, we compare the best methods from each group (one-versus-one, one-versus-all and trained fusers) to draw conclusions about the overall performance of one-class solutions. We show, backed up by a thorough statistical analysis, that one-class decomposition is a worthwhile approach, especially in the case of problems with complex distributions and a large number of classes.

Introduction

Multi-class problems are abundant in real-life applications, ranging from several classes in chemometrics [4] and medicine [10], through dozens in object recognition [16] and computer vision [17], to hundreds in biometrics [11]. Often, an increase in the number of classes comes with increased complexity of the estimated decision rules. This raises the risk of overfitting and increases the computational cost of the recognition system. Additionally, classification difficulties may be present only for some classes, while others can be separated with minimal error.

Building a classifier that handles only a reduced subset of classes may be a solution to these problems. As binary classification itself has been well studied in recent years [47], binary decomposition methods have gained significant attention from the machine learning community [1]. Binary classifiers return simpler decision boundaries and allow us to reduce the competence area of each classifier, thus producing locally specialized learners. This leads to the creation of an ensemble of binary learners [63], each dedicated to a sub-problem. From their local decisions, a dedicated fusion method must reconstruct the original multi-class problem. While binary decomposition has been proven to perform very well on most multi-class problems [21], it has some limitations, such as a strong dependence on the selected fusion method and low robustness to imbalanced or sparse distributions. That is why novel methods for tackling multi-class data need to be examined, keeping in mind that binary decomposition is a well-established point of reference.

In this work, we turn our attention towards one-class classification (OCC), which is a relatively young, yet challenging machine learning domain [35]. OCC is a natural way of decomposing a multi-class dataset: we consider each class as independent and train a one-class model on each of them. Then, we reconstruct the original problem with a dedicated fusion algorithm, just as in binary decomposition.
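To make the scheme concrete, the sketch below trains one one-class descriptor per class and fuses their supports with a simple maximum rule. It assumes scikit-learn's OneClassSVM as the base learner and the iris data as a placeholder problem; the paper itself evaluates several descriptors and fusion strategies, so this is only an illustration of the decomposition idea, not the authors' exact pipeline.

```python
# Minimal sketch of one-class decomposition of a multi-class problem
# (assumed base learner: OneClassSVM; assumed fusion: maximum support).
import numpy as np
from sklearn.datasets import load_iris
from sklearn.svm import OneClassSVM

X, y = load_iris(return_X_y=True)
classes = np.unique(y)

# Train one one-class descriptor per class, using only that class's objects.
descriptors = {c: OneClassSVM(gamma="scale", nu=0.1).fit(X[y == c]) for c in classes}

# Reconstruct the multi-class decision: assign each object to the class
# whose descriptor returns the highest support.
supports = np.column_stack([descriptors[c].decision_function(X) for c in classes])
y_pred = classes[np.argmax(supports, axis=1)]
print("resubstitution accuracy:", np.mean(y_pred == y))
```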

The application of OCC to problems where generating negative examples is costly, dangerous or simply impossible [24] is obvious. However, one may ask: what is the point of applying OCC to multi-class problems, where representatives of all classes are given beforehand? Some studies show that, when objects from all classes are available, it is preferable to use binary classifiers rather than their one-class counterparts [8], [26].

However, in this work we evaluate the hypothesis that one-class ensembles can achieve high accuracy when handling multi-class datasets, despite discarding counterexamples during the training phase.

OCC methods have several desirable properties that may aid the process of multi-class decomposition. Their primary difference from binary learners lies in the nature of the training phase. Binary classifiers try to find a decision boundary that minimizes the error on objects from both classes. OCC aims at capturing the unique properties of the target class, by finding the best possible description that describes the analyzed set without overfitting to the given data. Due to these different principles, decomposition with OCC presents itself as an interesting tool for handling complex data structures. Our previous studies point out that for some specific cases OCC decomposition may be of better use than other methods [37].

We aim at presenting an exhaustive study of the usefulness and effectiveness of multi-class decomposition with one-class classifiers. To put our findings into context, we compare the proposed approach with state-of-the-art methods for binary decomposition, which is currently the most popular approach for this task. We show that for specific multi-class cases OCC can be a better choice than the binary approach, as opposed to the reports in the literature [8], [26]. Building on our previous experience, we analyze the influence of combination methods on the decomposition ensembles and compare the most popular binary and one-class classifiers.

The contributions of this work are as follows:

  • We identify the areas of applicability for OCC decomposition and give an outlook on its potential usefulness in this domain. We emphasize that OCC is not a universal solution for handling multi-class datasets. However, we aim at presenting the cases in which OCC can outperform the binary approach, despite having less information during the learning phase. We discuss the reasons behind this and show that for complex data OCC offers a worthwhile alternative to binary methods.

  • We study three types of aggregation methods for rebuilding the original multi-class problem – one-versus-one, one-versus-all and trained fusers. Although such a comparison exists for binary classifiers [21] (and we use its findings in our work), a corresponding analysis for one-class classifiers is missing. So far, only combination strategies for one-class problems (not multi-class decomposition) have been examined [52].

  • We present a comparison of multi-class decomposition approaches, carried out with ensembles of one-class and binary classifiers. We apply state-of-the-art combination strategies over a diverse set of real-life datasets. With the use of a thorough statistical analysis, we look for the best combinations of base classifiers and aggregation methods. To the best of our knowledge, such a comparison, spanning binary and one-class decomposition ensembles and their fusion algorithms, has not been carried out so far.

We want to answer the following questions: can OCC, despite discarding information about counterexamples, be of use for dealing with multi-class datasets? When performing multi-class decomposition, when is it preferable to try the OCC approach and when should one use standard binarization? This paper does not concentrate on the problem of when to use decomposition (this topic has been discussed in [21]) – it deals with the issue of how to use it.

We base our discussion on an extensive empirical study. A diverse set of 19 multi-class benchmarks is used. We test two one-class classifiers, representing the main groups of methods in this area (density estimation and boundary estimation), and compare them with well-established binary methods [22]. Based on our previous experience with binary [21] and one-class [61] fusion methods, we selected the best representatives of each of the three combination groups. We include an in-depth analysis of the results to draw conclusions about the performance of the analyzed methods and the areas of suitability of OCC decomposition.

The rest of this paper is organized as follows. The next section presents the background of the one-class classification area. In Section 3, we describe the essentials of decomposition techniques for handling multi-class data, while in Section 4 we present in detail the selected state-of-the-art combination methods for reconstructing the multi-class task. Section 5 presents the setup of the experimental framework used and its individual elements. The results of the experimental investigations, together with a thorough discussion, are given in Section 6. The final Section 7 concludes the paper.

Section snippets

One-class classification

In this section, we present an overview of the topic of one-class classification. We show the unique properties of this pattern recognition problem and discuss the most popular families of methods used to tackle this task.

Here it is assumed that during the training stage only objects coming from a single class are available. These form the target concept, denoted by ω_T. The purpose of OCC is to calculate a decision boundary that encloses all available data samples, thus describing
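As an illustration of the two main families of one-class methods referred to later in the paper (boundary estimation and density estimation), the sketch below fits a boundary-based descriptor (a one-class SVM) and a density-based one (kernel density estimation) on target-class objects only. The synthetic data, parameter values and rejection threshold are assumptions made purely for illustration.

```python
# Sketch of boundary-based vs. density-based one-class classification,
# trained on target-class samples only (all values are illustrative).
import numpy as np
from sklearn.neighbors import KernelDensity
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
X_target = rng.normal(loc=0.0, scale=1.0, size=(200, 2))  # samples of the target concept
X_new = np.array([[0.2, -0.1], [4.0, 4.0]])               # a likely inlier and a likely outlier

# Boundary-based descriptor: enclose the target samples with a flexible boundary.
boundary = OneClassSVM(gamma="scale", nu=0.05).fit(X_target)
print(boundary.predict(X_new))   # +1 = accepted as target, -1 = rejected

# Density-based descriptor: estimate the target density and reject low-density objects.
density = KernelDensity(bandwidth=0.5).fit(X_target)
threshold = np.quantile(density.score_samples(X_target), 0.05)  # accept ~95% of target data
print(density.score_samples(X_new) >= threshold)  # True = accepted as target
```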

Decomposition techniques for multi-class problems

In the following section, we give the necessary background in the area of multi-class decomposition and binarization. We describe the groups of techniques for aggregating the binary classifiers that are used in this study – OVO, OVA and trained fusers. Finally, we introduce the concept of applying one-class classifiers to handle multi-class datasets, and the differences and similarities between this approach and binarization.
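For reference, the binarization baselines themselves can be sketched with scikit-learn's OVO and OVA meta-estimators as below. The RBF SVM base learner and the iris data are assumptions for illustration only and do not correspond to the specific binary classifiers compared in the paper.

```python
# Sketch of the two classic binarization schemes (OVO and OVA) built
# around an assumed binary base learner.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

ovo = OneVsOneClassifier(SVC(kernel="rbf", gamma="scale"))   # K(K-1)/2 pairwise binary learners
ova = OneVsRestClassifier(SVC(kernel="rbf", gamma="scale"))  # K class-versus-rest binary learners

print("OVO accuracy:", cross_val_score(ovo, X, y, cv=5).mean())
print("OVA accuracy:", cross_val_score(ova, X, y, cv=5).mean())
```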

Ensemble fusion methods for aggregating local decisions

In this section, we describe the five fusion methods for aggregating decomposed decisions used in this paper. They represent three different groups of fusers: OVA, OVO and trained combiners. Our choice of OVO and OVA methods was dictated by the results of the comprehensive study on binary decomposition [21]. Following the suggestions presented in that study, we have selected three well-performing aggregation schemes. Additionally, we have expanded our study with two trained fusers that
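As a rough illustration of the difference between an untrained and a trained fuser over one-class supports, the sketch below compares a maximum-support rule with a combiner trained on the vector of class supports. The logistic-regression combiner is a placeholder assumption, not one of the five fusion methods examined in the paper.

```python
# Sketch: untrained (max-support) vs. trained fusion of one-class supports.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import OneClassSVM

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
classes = np.unique(y_tr)

# One one-class descriptor per class, trained on that class's training objects only.
descriptors = [OneClassSVM(gamma="scale", nu=0.1).fit(X_tr[y_tr == c]) for c in classes]

def supports(X_):
    # Vector of supports: one score per class descriptor for every object.
    return np.column_stack([d.decision_function(X_) for d in descriptors])

# Untrained fusion: pick the class with the maximum support.
y_max = classes[np.argmax(supports(X_te), axis=1)]

# Trained fusion: learn how to combine the supports (here fitted on the same
# training split for brevity; in practice a separate validation set is preferable).
fuser = LogisticRegression(max_iter=1000).fit(supports(X_tr), y_tr)
y_fused = fuser.predict(supports(X_te))

print("max-rule accuracy:     ", np.mean(y_max == y_te))
print("trained-fuser accuracy:", np.mean(y_fused == y_te))
```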

Experimental setup

In this section, we describe the set-up of the experimental framework used. We give details about the data, classification algorithms and statistical tests used.
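This snippet does not state which statistical test is applied. As an illustration of one standard option that appears in the reference list, the sketch below implements the combined 5 × 2 cv F test (Alpaydin, 1999) for comparing two classifiers; the two base learners and the dataset are arbitrary placeholders, and this is not presented as the authors' exact procedure.

```python
# Sketch of the combined 5x2 cv F test for comparing two classifiers
# (placeholder learners and data; illustrative only).
import numpy as np
from scipy.stats import f as f_dist
from sklearn.datasets import load_iris
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
clf_a, clf_b = SVC(gamma="scale"), DecisionTreeClassifier(random_state=0)

diffs = np.zeros((5, 2))  # p_i^(j): error-rate differences per replication i, fold j
for i in range(5):        # 5 replications of 2-fold cross-validation
    skf = StratifiedKFold(n_splits=2, shuffle=True, random_state=i)
    for j, (tr, te) in enumerate(skf.split(X, y)):
        err_a = 1 - clf_a.fit(X[tr], y[tr]).score(X[te], y[te])
        err_b = 1 - clf_b.fit(X[tr], y[tr]).score(X[te], y[te])
        diffs[i, j] = err_a - err_b

s2 = np.sum((diffs - diffs.mean(axis=1, keepdims=True)) ** 2, axis=1)  # per-replication variance
f_stat = np.sum(diffs ** 2) / (2 * np.sum(s2))                         # ~ F(10, 5) under H0
p_value = 1 - f_dist.cdf(f_stat, 10, 5)
print(f"F = {f_stat:.3f}, p = {p_value:.3f}")
```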

Experimental results

We have carried out an extensive experimental comparison, using the framework described in Section 5. In doing so, we wanted to answer the following questions:

  • Can applying one-class classifiers to decompose multi-class datasets bring better results than using binarization, despite rejecting the information about counterexamples?

  • Is there a difference, from the decomposition point of view, between density-based and boundary-based one-class methods?

  • What kind of aggregation method is most suitable for

Conclusions and future works

In this paper, we have evaluated the hypothesis that one-class classification can be effectively used for handling multi-class datasets. Although OCC discards information about the counterexamples, its major advantage lies in its training principle: it captures the unique properties of the target class. Therefore, for multi-class decomposition it does not try to find the best separation boundaries – it aims at creating individual descriptions of each of the considered classes. This leads to a completely

Conflict of interest

None declared.

Acknowledgments

Bartosz Krawczyk and Michał Woźniak were partially supported by the Polish National Science Centre under the Grant PRELUDIUM no. DEC-2013/09/N/ST6/03504, realized in the years 2014–2016.

Francisco Herrera was partially supported by the Spanish Ministry of Education and Science under Project TIN2011-28488 and the Andalusian Research Plans P10-TIC-6858 and P11-TIC-7765.


References (65)

  • P. Juszczak et al., Minimum spanning tree based one-class classifier, Neurocomputing (2009)
  • M.W. Koch et al., Cueing, feature discovery, and one-class learning for synthetic aperture radar automatic target recognition, Neural Netw. (1995)
  • B. Krawczyk, One-class classifier ensemble pruning and weighting with firefly algorithm, Neurocomputing (2015)
  • B. Krawczyk et al., Diversity measures for one-class classifier ensembles, Neurocomputing (2014)
  • B. Krawczyk et al., Clustering-based ensembles for one-class classification, Inf. Sci. (2014)
  • L. Kuncheva et al., Decision templates for multiple classifier fusion: an experimental comparison, Pattern Recognit. (2001)
  • L.I. Kuncheva, Using measures of similarity and inclusion for multiple classifier fusion by decision templates, Fuzzy Sets Syst. (2001)
  • T.C.W. Landgrebe et al., The interaction between classification and reject performance for distance-based reject-option classifiers, Pattern Recognit. Lett. (2006)
  • Y. Liu et al., A novel and quick SVM-based multi-class classifier, Pattern Recognit. (2006)
  • L. Manevitz et al., One-class document classification via neural networks, Neurocomputing (2007)
  • M. Tohme et al., Maximum margin one class support vector machines for multiclass problems, Pattern Recognit. Lett. (2011)
  • A. Wang et al., A novel pattern recognition algorithm combining ART network with SVM to reconstruct a multi-class classifier, Comput. Math. Appl. (2009)
  • T. Wilk et al., Soft computing methods applied to combination of one-class classifiers, Neurocomputing (2012)
  • T. Windeatt et al., Coding and decoding strategies for multi-class learning problems, Inf. Fusion (2003)
  • M. Woźniak et al., A survey of multiple classifier systems as hybrid systems, Inf. Fusion (2014)
  • E.L. Allwein et al., Reducing multiclass to binary: a unifying approach for margin classifiers, J. Mach. Learn. Res. (2001)
  • E. Alpaydin, Combined 5 × 2 cv F test for comparing supervised classification learning algorithms, Neural Comput. (1999)
  • E. Alpaydin, E. Mayoraz, Learning error-correcting output codes from data, in: IEEE Conference Publication, vol. 2,...
  • T. Ban, S. Abe, Implementing multi-class classifiers by one-class classification methods, in: IEEE International Joint...
  • A. Bartkowiak, R. Zimroz, Outliers analysis and one class classification approach for planetary gearbox diagnosis, J....
  • C. Bellinger, S. Sharma, N. Japkowicz, One-class versus binary classification: which and when? in: 2012 11th...
  • R. Burduk, P. Trajdos, Construction of sequential classifier using confusion matrix, in: Computer Information Systems...

Bartosz Krawczyk received a B.Sc. Engineering degree in Computer Science in 2011 and an M.Sc. degree with distinction in 2012 from Wroclaw University of Technology, Poland. He was awarded best M.Sc. graduate by the Rector of Wroclaw University of Technology. He is currently a Research Assistant and a Ph.D. Candidate in the Department of Systems and Computer Networks at the same university. His research is focused on machine learning, multiple classifier systems, one-class classifiers, class imbalance, and interdisciplinary applications of these methods. So far, he has published more than 90 papers in international journals and conferences. He has received numerous prestigious awards for his scientific achievements, including the IEEE Richard E. Merwin Scholarship, the PRELUDIUM and ETIUDA grants from the Polish National Science Center, the Scholarship of the Polish Minister of Science and Higher Education, and the START award from the Foundation for Polish Science, among others. He has served as a Guest Editor for four special issues of journals devoted to ensemble learning and data stream classification. He is a member of the Program Committee of over 40 international conferences and a reviewer for dozens of journals.

Michał Woźniak is a Professor of Computer Science in the Department of Systems and Computer Networks, Wroclaw University of Technology, Poland. He received an M.S. degree in biomedical engineering in 1992 from the Wroclaw University of Technology, and Ph.D. and D.Sc. (habilitation) degrees in computer science in 1996 and 2007, respectively, from the same university. His research focuses on machine learning, distributed algorithms and teleinformatics. He has published over 200 papers, authored three books, and edited eight others. He has been involved in several research projects related to the above-mentioned topics; moreover, he has been a consultant on several commercial projects for well-known Polish companies and for the public administration. He is a senior member of the IEEE and a member of the International Biometric Society.

    Francisco Herrera received his M.Sc. in mathematics in 1988 and Ph.D. in mathematics in 1991, both from the University of Granada, Spain. He is currently a Professor in the Department of Computer Science and Artificial Intelligence at the University of Granada. He has been the supervisor of 28 Ph.D. students. He has published more than 240 papers in international journals. He is coauthor of the book “Genetic Fuzzy Systems: Evolutionary Tuning and Learning of Fuzzy Knowledge Bases” (World Scientific, 2001).

He currently acts as Editor in Chief of the international journal "Progress in Artificial Intelligence" (Springer). He acts as an area editor of the International Journal of Computational Intelligence Systems and as an associate editor of the journals IEEE Transactions on Fuzzy Systems, Information Sciences, Knowledge and Information Systems, Advances in Fuzzy Systems, and International Journal of Applied Metaheuristics Computing; he also serves as a member of several journal editorial boards, among others: Fuzzy Sets and Systems, Applied Intelligence, Information Fusion, Evolutionary Intelligence, International Journal of Hybrid Intelligent Systems, Memetic Computation, and Swarm and Evolutionary Computation.

    He received the following honors and awards: ECCAI Fellow 2009, IFSA 2013 Fellow, 2010 Spanish National Award on Computer Science ARITMEL to the “Spanish Engineer on Computer Science”, International Cajastur “Mamdani” Prize for Soft Computing (Fourth Edition, 2010), IEEE Transactions on Fuzzy System Outstanding 2008 Paper Award (bestowed in 2011), and 2011 Lotfi A. Zadeh prize best paper Award of the International Fuzzy Systems Association.

    His current research interests include computing with words and decision making, bibliometrics, data mining, big data, data preparation, instance selection, fuzzy rule based systems, genetic fuzzy systems, knowledge extraction based on evolutionary algorithms, memetic algorithms and genetic algorithms.
