Multi-label classification using a fuzzy rough neighborhood consensus
Introduction
Research in machine learning concerns the ability to learn a reliable prediction model from a set of observations. For example, in a classification context, a learner trains a classification model on the given elements, whose outcomes are known, and uses the derived model to predict the outcome of previously unseen instances. In the traditional single-label setting, each observation is associated with one outcome, its class label. Multi-label learning [16], [18], [60] represents a more general approach, where an observation can belong to several classes at the same time, that is, more than one class label can be associated with the same instance. The total number of classes is known, but the number of labels per instance can differ across the dataset. Multi-label learning can be more challenging than single-label learning or than predicting each class independently, as correlations between some classes may be present. An example of such a situation is the existence of a label hierarchy, which needs to be taken into account in the prediction process [37]. Application domains of multi-label classification include image processing (e.g. [24], [49]), text categorization (e.g. [29], [31]) and bioinformatics (e.g. [5], [44]).
In a multi-label dataset, every instance x is described by a number of input features and associated with a labelset. This labelset is represented as a binary vector L(x) = (l1(x), …, lm(x)), with m the total number of possible class labels in the dataset and (∀i)(li(x) ∈ {0, 1}). The value li(x) indicates whether or not x belongs to class li. The task of a multi-label classifier is to predict the complete labelset of a target instance. This is inherently different from single-label classification, where only one outcome label needs to be predicted.
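To make this representation concrete, the following sketch encodes the labelsets of a hypothetical toy dataset (not one from the paper) as rows of a binary matrix:

```python
import numpy as np

# Hypothetical toy multi-label dataset: 4 instances, m = 3 possible labels.
# Row i is the labelset L(x_i); entry j is 1 iff instance i belongs to class l_j.
Y = np.array([
    [1, 0, 1],   # x1 belongs to classes l1 and l3
    [0, 1, 0],   # x2 belongs to class l2 only
    [1, 1, 0],   # x3 belongs to classes l1 and l2
    [1, 0, 1],   # x4 has the same labelset as x1
])

m = Y.shape[1]                        # total number of classes
labels_per_instance = Y.sum(axis=1)   # can differ across the dataset
print(labels_per_instance)            # [2 1 2 2]
```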
Several approaches to multi-label classification have been proposed in the literature. The recent overview book [18] distinguishes between two main families, the data transformation methods and the method adaptation algorithms. The former group applies a transformation to the multi-label dataset, such that it degenerates to one or more easier-to-handle single-label problems, on which a single-label classifier can be applied. Two well-known representatives of this family are the binary relevance (BR, [17]) and label powerset (LP, [3]) transformations. BR creates m binary single-label datasets, one for each class. Each dataset contains the same instances as the original multi-label dataset, but their labelsets are transformed to a single label. For the dataset associated with class li, an instance x receives the label ‘positive’ when li(x) = 1 and ‘negative’ otherwise. The LP transformation, on the other hand, creates only one single-label dataset. Each possible labelset receives an identifier, such that labelsets that entirely coincide are associated with the same identifier. This identifier is used as the single new class label. The second family of multi-label classification algorithms handles the multi-label dataset directly and is often based on modifications or generalizations of existing single-label classification schemes. An example is the MLKNN method proposed in [59].
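As an illustration of the two transformations, the sketch below (on a hypothetical toy label matrix) derives the BR target vectors and the LP identifiers:

```python
import numpy as np

# Toy label matrix: 4 instances, 3 classes (rows are labelsets).
Y = np.array([[1, 0, 1],
              [0, 1, 0],
              [1, 1, 0],
              [1, 0, 1]])

# Binary relevance: one binary target vector per class (m single-label problems).
br_targets = [Y[:, j] for j in range(Y.shape[1])]

# Label powerset: map each distinct labelset to an identifier,
# so identical labelsets share the same new class label.
ids = {}
lp_targets = []
for row in Y:
    key = tuple(row)
    if key not in ids:
        ids[key] = len(ids)
    lp_targets.append(ids[key])

print(lp_targets)  # [0, 1, 2, 0]  — x1 and x4 share a labelset, hence an identifier
```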
In this paper, we focus on nearest neighbor methods for multi-label classification, of which MLKNN is an example. The nearest neighbor approach [11] is an intuitive way to predict an outcome for a new observation based on a set of known instances. In its simplest form, it requires no training phase, since no classification model is built. Instead, all known instances are stored in memory as prototypes. In order to predict the outcome of a target instance, its nearest element (or set of nearest elements) is extracted from the prototype set and the prediction is derived from the outcomes of these neighbors. In particular, to classify an instance x, the k nearest neighbor classifier (kNN) locates the k nearest elements among the stored instances and aggregates their class labels into a prediction for x. In single-label classification, this is commonly achieved by a majority vote. The kNN classifier is a simple and understandable algorithm and remains popular in the machine learning community [46]. Several multi-label classifiers based on or extending kNN have been proposed in the literature. In this contribution, we propose a new member of this family, using a novel way, based on fuzzy rough set theory, to aggregate the labelsets of the k nearest neighbors into a prediction.
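A minimal sketch of this baseline scheme, using Euclidean distance and a per-label majority vote over the neighbors' labelsets (one common but by no means the only aggregation choice in the multi-label case):

```python
import numpy as np

def knn_labelsets(X_train, Y_train, x, k):
    """Return the labelsets of the k nearest training instances (Euclidean distance)."""
    d = np.linalg.norm(X_train - x, axis=1)
    idx = np.argsort(d)[:k]
    return Y_train[idx]

def majority_vote(neighbor_labelsets):
    """Per-label majority vote: predict label j iff more than half the neighbors have it."""
    k = neighbor_labelsets.shape[0]
    return (neighbor_labelsets.sum(axis=0) > k / 2).astype(int)

# Toy prototypes: two instances near the origin, two near (1, 1).
X = np.array([[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.9, 1.1]])
Y = np.array([[1, 0], [1, 1], [0, 1], [0, 1]])

pred = majority_vote(knn_labelsets(X, Y, np.array([0.05, 0.05]), k=3))
print(pred)  # [1 1]
```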
Fuzzy rough set theory [12] is an alternative to traditional set theory and models uncertainty in data. It covers two complementary aspects of uncertainty, namely vagueness (fuzziness) and indiscernibility (roughness). The former relates to unclear descriptions of concepts, to which elements can belong to a certain degree. As an example, the set of elements that are similar to a given element x is necessarily fuzzy, since some elements are intrinsically more similar to x than others and making a strict division between similar and non-similar is difficult. The membership degree of an element to a fuzzy set is represented by a real number between 0 and 1. Roughness in a dataset concerns the issue that observations that are indiscernible with respect to their descriptive features can have distinct outcomes. In such a situation, it is challenging to sharply delineate the outcome concept based on the input features. Instead, a lower and an upper approximation are provided. Fuzzy rough set theory was developed as a hybridization of fuzzy set theory [58] and rough set theory [32] and has been used successfully in a variety of machine learning techniques [40]. It provides a framework to approximate a concept by two fuzzy sets, the fuzzy rough lower and upper approximation.
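In the standard formulation, the lower approximation of a fuzzy concept A under a similarity relation R is (R↓A)(x) = inf_y I(R(x, y), A(y)) and the upper approximation is (R↑A)(x) = sup_y T(R(x, y), A(y)), with I an implicator and T a t-norm. The sketch below uses the Łukasiewicz pair I(a, b) = min(1, 1 − a + b) and T(a, b) = max(0, a + b − 1), one common choice among several:

```python
import numpy as np

def lower_approx(R_x, A, implicator=lambda a, b: np.minimum(1.0, 1.0 - a + b)):
    # (R down A)(x) = inf_y I(R(x, y), A(y)); Lukasiewicz implicator by default
    return np.min(implicator(R_x, A))

def upper_approx(R_x, A, tnorm=lambda a, b: np.maximum(0.0, a + b - 1.0)):
    # (R up A)(x) = sup_y T(R(x, y), A(y)); Lukasiewicz t-norm by default
    return np.max(tnorm(R_x, A))

R_x = np.array([1.0, 0.8, 0.3])   # similarity of x to each instance y
A   = np.array([1.0, 0.6, 0.0])   # membership degree of each y in concept A

lo = lower_approx(R_x, A)   # min(1.0, 0.8, 0.7) = 0.7
up = upper_approx(R_x, A)   # max(1.0, 0.4, 0.0) = 1.0
```

The lower approximation measures how strongly x certainly belongs to A (penalized by similar instances outside A), while the upper approximation measures how strongly x possibly belongs to A.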
The fuzzy rough approximation operators are essentially based on a similarity relation that measures the degree to which elements are similar to each other. As such, they are related to nearest neighbor approaches. In this paper, we use these operators to derive a consensus prediction from the labelsets of the k nearest neighbors of a target instance. In particular, each of the neighbors of the target instance may have a different labelset and the challenge is to aggregate this information into one predicted labelset. Fuzzy rough set theory forms an ideal means to do so. Based on the similarity of the neighbors to the target, an appropriate consensus labelset is derived. We will experimentally show that our approach can outperform state-of-the-art nearest neighbor based multi-label classifiers. Following the recent study of Reyes et al. [34], we limit the comparison of our proposal to the state-of-the-art within the same classifier family, the nearest neighbor methods.
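The fuzzy rough consensus operators themselves are defined in Section 3. Purely as a placeholder illustration of similarity-based aggregation — not the proposed FRONEC operator — the sketch below weights each neighbor's labelset by its similarity to the target and thresholds per label:

```python
import numpy as np

def weighted_consensus(sims, labelsets, threshold=0.5):
    """Similarity-weighted average of the neighbors' labelsets, thresholded per label.
    A stand-in aggregation; the paper replaces this with fuzzy rough approximations."""
    w = sims / sims.sum()
    scores = w @ labelsets            # per-label weighted vote in [0, 1]
    return (scores >= threshold).astype(int)

sims = np.array([0.9, 0.8, 0.2])      # similarity of each neighbor to the target
labelsets = np.array([[1, 0, 1],
                      [1, 1, 0],
                      [0, 1, 0]])

pred = weighted_consensus(sims, labelsets)
print(pred)  # [1 1 0] — the dissimilar third neighbor barely influences the outcome
```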
The remainder of the paper is structured as follows. In Section 2, we review the existing nearest neighbor based multi-label classifiers. Section 3 recalls fuzzy rough set theory and describes our proposed classification method. Our method is carefully evaluated in an experimental study, of which the set-up is described in Section 4. Section 5 lists and analyzes the experimental results. Finally, Section 6 concludes the paper. We note that additional content, including the full experimental results, is made available at our web page http://www.cwi.ugent.be/sarah.php.
Related work: nearest neighbor based multi-label classifiers
To focus our discussion, we consider a subgroup of multi-label classifiers, namely those based on the nearest neighbor paradigm. Several nearest neighbor based multi-label classifiers have been proposed in the literature. We provide an overview in this section. In our experimental study, we select a number of these methods to compare among each other and against our proposed classifier. This selection is made based on the popularity and performance of these methods in other experimental studies.
Fuzzy rough multi-label classifier
In this section, we present our proposal, which uses fuzzy rough set theory to construct an appropriate consensus among the labelsets of the k nearest neighbors of a target instance. In Section 3.1, we first provide the necessary background on fuzzy rough set theory. Section 3.2 introduces our proposal.
Experimental set-up
In this section, we establish the details of our experimental study, of which the results will be reported in Section 5. We describe the datasets (Section 4.1) on which we conduct our experiments and the metrics we use to evaluate the classification performance of the multi-label methods (Section 4.2). In Section 4.3, we specify how the different parameters of the algorithms are set. Section 4.4 describes the statistical tests used in our analysis.
Experimental evaluation
We now proceed with the empirical analysis of our proposal and a comparison of FRONEC to popular nearest neighbor multi-label classifiers. In Section 5.1, we compare the six versions of FRONEC among each other. Sections 5.2 and 5.3 compare our proposal to the five selected nearest neighbor multi-label classifiers. The former does so on the 30 synthetic datasets, while the latter uses the six real-world datasets listed in Table 1. As a reminder, the full experimental results are available at http://www.cwi.ugent.be/sarah.php.
Conclusion
In a multi-label classification dataset, each observation is associated with one or more classes. The challenge is to predict all classes at once, between which correlations may exist. Among the multi-label classifiers proposed in the literature, we have reviewed the family of methods based on the nearest neighbor classification principle. Generally put, the predicted labelset is derived based on the information contained in the k nearest neighbors of the instance to classify.
We have proposed a new member of this family, FRONEC, which uses fuzzy rough set theory to derive a consensus labelset from the labelsets of the k nearest neighbors of a target instance.
Acknowledgments
The research of Sarah Vluymans is funded by the Special Research Fund (BOF) of Ghent University (Grant no. BOF.DOC.2014.0074). Yvan Saeys is an ISAC Marylou Ingram Scholar.
References (62)
- et al., Learning multi-label scene classification, Pattern Recognit. (2004)
- et al., Attribute selection with fuzzy decision reducts, Inf. Sci. (Ny) (2010)
- et al., Neighborhood classifiers, Expert Syst. Appl. (2008)
- et al., FSKNN: multi-label text categorization based on fuzzy similarity and k nearest neighbors, Expert Syst. Appl. (2012)
- et al., Neighbor selection for multilabel classification, Neurocomputing (2016)
- et al., A multi-label classification based approach for sentiment classification, Expert Syst. Appl. (2015)
- et al., Efficient classification of multi-labeled text streams by clashing, Expert Syst. Appl. (2014)
- et al., A comparative study of fuzzy rough sets, Fuzzy Sets Syst. (2002)
- et al., Effective lazy learning algorithm based on a data gravitation model for multi-label learning, Inf. Sci. (Ny) (2016)
- et al., A framework to generate synthetic multi-label datasets, Electron. Notes Theor. Comput. Sci. (2014)