Dependent binary relevance models for multi-label classification
Introduction
Multi-label classification (MLC) is a machine learning problem in which models are sought that assign a subset of (class) labels to each object, unlike conventional (single-class) classification that involves predicting only a single class. Multi-label classification problems are ubiquitous and naturally occur, for instance, in assigning keywords to a paper, tags to resources in a social network, objects to images or emotional expressions to human faces.
There is a considerable amount of literature, in which state-of-the-art binary or multi-class classification algorithms are adapted and extended to the setting of MLC, including methods using decision trees [1], instance-based algorithms [2], neural networks [3], support vector machines [4], naive Bayes [5], conditional random fields [6] and boosting [7]. Besides, there is also another line of research, in which approaches of that kind are completely put aside; instead, the development of specialized methods that consider the particularities of multi-label data is advocated.
In general, the problem of multi-label learning comes with two fundamental challenges. The first bears on the computational complexity of the algorithms: if the number of labels is large, a complex approach might not be applicable in practice. Therefore, the scalability of algorithms is a key issue in this field. The second problem is related to the very nature of multi-label data. Not only is the number of classes typically larger than in multi-class classification tasks, but each example also belongs to a variable-sized subset of labels simultaneously. Moreover, and perhaps even more importantly, the labels will normally not occur independently of each other; instead, there are statistical dependencies between them. From a learning and prediction point of view, these relationships constitute a promising source of information, in addition to that coming from the mere description of the objects. Thus, it is hardly surprising that research on MLC has very much focused on the design of new methods that are able to detect, and benefit from, interdependencies among labels.
In recent years, many papers have analyzed the presence of label correlations, including theoretical analyses of label dependence in the context of MLC [8]. In this regard, different types of dependence have been formally distinguished, such as conditional dependence [6], [9], [10], [11], [12] and marginal (unconditional) dependence [3], [13], [14]. Other papers aim at exploiting relations in different sets of labels, such as pairwise relations [3], [4], [7], [15], [16], relations in sets of different sizes [11], [17], [18], or relations in the whole set of labels [10], [13], [14]. Exploiting label dependence entails the induction of complex models: the more label combinations are considered, the more complex the models become. This does not mean that exploiting pairwise correlations is preferable to exploiting full-order correlations, since the former may fail to capture the true dependencies, while the latter may not work well if the labels display complex relations that are difficult to deal with.
This paper proposes dependent binary relevance (DBR) models as an efficient and effective approach to induce multi-label classifiers that exploit conditional label dependence. Instead of studying them in combination with independent classifiers, like in [10], our goal is to explore their behavior when used in isolation, extending the work presented in [19] in which this approach was favorably compared with several state-of-the-art methods [3], [11], [13], [18]. The DBR approach is conceived as a natural extension of the simple binary relevance strategy, which does not allow for exploiting conditional label dependence. We shall elaborate on the positioning of our approach more closely in Section 3, where we argue that this approach combines properties of two other meta-techniques for MLC, namely chaining [11] and stacking [14], and that it fills a “gap” within the spectrum of methods that have been devised so far.
A key contribution of this paper is a deep analysis of the properties of dependent binary models, in which we characterize those conditions under which they should work well in practice. These models require label estimations (produced by any multi-label classifier) at prediction time. This issue is analyzed throughout the paper, concluding that the more reliable these estimations are, the better the overall performance becomes.
Another contribution of this work is to present a comprehensive study of methods based on chaining [11] and stacking [14] strategies. Our goal is to analyze these two approaches, which are closely connected, and to study those factors that have an influence on their performance. A key distinction between both approaches is the type of training data they rely on, which in turn has a decisive impact on the kind of label dependence captured.
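The distinction just described can be made concrete by looking at the training sets the two strategies construct. The following is a rough sketch under our own assumptions (the function names and the plain numpy representation are illustrative, not the paper's notation): a chaining approach trains the j-th binary problem on the features augmented with the true values of the preceding labels, whereas a stacking approach augments the features with first-level predictions for all labels.

```python
# Sketch of the training data built by chaining vs. stacking strategies.
# X is an (n, d) feature matrix, Y an (n, m) binary label matrix, and
# Y_hat an (n, m) matrix of first-level predictions (e.g. from binary
# relevance). All names here are illustrative assumptions.
import numpy as np

def chaining_training_sets(X, Y):
    """Classifier chains: the j-th binary problem is trained on the
    original features augmented with the TRUE values of labels 0..j-1."""
    return [(np.hstack([X, Y[:, :j]]), Y[:, j]) for j in range(Y.shape[1])]

def stacking_training_sets(X, Y, Y_hat):
    """Stacking: every meta-level binary problem is trained on the
    original features augmented with the PREDICTIONS Y_hat for ALL labels."""
    return [(np.hstack([X, Y_hat]), Y[:, j]) for j in range(Y.shape[1])]
```

The key difference is visible in the inputs: chaining uses ground-truth label values at training time (and must substitute predictions at prediction time), while stacking already trains on predicted labels, so training and prediction conditions match.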
The rest of the paper is organized as follows. The next section introduces multi-label classification in a more formal way. Stacking and chaining methods are reviewed in Section 3. Section 4 is devoted to the new DBR technique; we describe this approach formally and provide a detailed analysis of its properties. Finally, experimental results are reported in Section 5, before concluding the paper in Section 6.
Multi-label classification
Before describing some previous approaches to tackle multi-label classification, we present this learning task in a more formal way. The point of departure is a finite and non-empty set of labels L = {λ1, λ2, …, λm} and a training set S = {(x1, y1), …, (xn, yn)}. The elements of this set are supposed to be independently and randomly drawn according to an unknown probability distribution on X × Y, where X and Y are the input and the output space, respectively. The former is the space of the object
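As a point of reference for the methods discussed below, the simple binary relevance decomposition of this setting, one independent binary classifier per label, can be sketched as follows. The toy prototype base learner and all names are illustrative assumptions, not the paper's implementation; any binary classifier could be substituted.

```python
# Binary relevance sketch: the (n, m) label matrix Y is split into m
# independent binary problems, one per column. The base learner here is
# a deliberately trivial nearest-prototype classifier; it assumes both
# classes occur in each column of Y.
import numpy as np

class PrototypeClassifier:
    """Toy binary classifier: predicts the class whose mean feature
    vector (prototype) is closest to the query point."""
    def fit(self, X, y):
        self.protos = {c: X[y == c].mean(axis=0) for c in (0, 1)}
        return self
    def predict(self, X):
        d0 = np.linalg.norm(X - self.protos[0], axis=1)
        d1 = np.linalg.norm(X - self.protos[1], axis=1)
        return (d1 < d0).astype(int)

def binary_relevance_fit(X, Y, base=PrototypeClassifier):
    # One independent binary problem per label: column j of Y is the target.
    return [base().fit(X, Y[:, j]) for j in range(Y.shape[1])]

def binary_relevance_predict(models, X):
    return np.column_stack([m.predict(X) for m in models])
```

Because each binary model sees only the features and its own label column, this baseline cannot exploit any dependence between labels, which is precisely the limitation the methods in the following sections address.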
Modeling label dependence
The arguably most natural way to capture label dependencies is to learn classifier models that condition the prediction of a label yi not only on the object features but also on some of the other labels yj. This idea of conditioning can be realized in different ways. In particular, the following distinctions can be made:
- (i) Full vs. partial conditioning: The prediction of yi can be conditioned on all other labels or only on a subset of these labels. The most “sparse”
Dependent binary relevance classifiers
Our proposal of dependent binary relevance (DBR) models relies on two main hypotheses: First, taking conditional label dependencies into account is important for performing well in multi-label classification tasks. Second, modeling and learning such dependencies in a redundant way, for example using cyclic graph structures instead of simple chains, may further improve performance in practice.
The first assumption is at the core of research in multi-label classification and quite uncontested in
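A minimal sketch may help to make the DBR construction concrete: the binary model for each label is trained on the features augmented with the true values of all other labels, and at prediction time those values, being unavailable, are replaced by estimates from an inner base model (plain binary relevance in this sketch). The toy prototype learner and all names are our own illustrative assumptions, not the paper's implementation.

```python
# DBR sketch: each label j gets a classifier trained on features plus the
# TRUE values of all other labels; at prediction time the other labels are
# first estimated by an inner binary relevance model. The nearest-prototype
# base learner is a toy stand-in and assumes both classes occur per label.
import numpy as np

def nearest_proto_fit(X, y):
    # Class prototypes: mean feature vector of each class.
    return {c: X[y == c].mean(axis=0) for c in (0, 1)}

def nearest_proto_predict(protos, X):
    d0 = np.linalg.norm(X - protos[0], axis=1)
    d1 = np.linalg.norm(X - protos[1], axis=1)
    return (d1 < d0).astype(int)

def dbr_fit(X, Y):
    m = Y.shape[1]
    # Inner BR model: used at prediction time to estimate the other labels.
    br = [nearest_proto_fit(X, Y[:, j]) for j in range(m)]
    # DBR models: label j conditioned on features AND all other true labels.
    dbr = [nearest_proto_fit(np.hstack([X, np.delete(Y, j, axis=1)]), Y[:, j])
           for j in range(m)]
    return br, dbr

def dbr_predict(br, dbr, X):
    m = len(dbr)
    # Step 1: estimate all labels with the inner BR model.
    Y_hat = np.column_stack([nearest_proto_predict(p, X) for p in br])
    # Step 2: predict each label from features plus the OTHER estimates.
    return np.column_stack([
        nearest_proto_predict(dbr[j], np.hstack([X, np.delete(Y_hat, j, axis=1)]))
        for j in range(m)])
```

Note the redundancy: every label conditions on every other label, rather than only on its predecessors in a chain, which reflects the second hypothesis stated above. The quality of the final prediction then hinges on how reliable the inner estimates Y_hat are.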
Experiments
This section reports the settings and the results of the experiments performed. The main goal of these experiments is to make an exhaustive comparison between stacking and chaining approaches, including our newly proposed one. The aim is to determine which approach performs best for each of the metrics considered, and whether any of them is superior with regard to capturing label dependence. For a comparison of some of these methods, for instance CC, with other multi-label learners, see [31].
Conclusions and future work
Although multi-label classification can be seen as a simple extension of the well-studied single-class classification, it comes with the challenge that labels usually display dependencies amongst each other. This paper proposes the dependent binary relevance (DBR) approach to cope with multi-label classification under the hypothesis that, in general, the prediction of each label can benefit from information about the other labels. To that end, our learner employs an extended feature space
Conflict of interest
None declared.
Acknowledgments
This research has been supported by Spanish Ministerio de Economía y Competitividad (Grant TIN2011-23558) and by the German Research Foundation (DFG).
Elena Montañes received her M.Sc. degree in Mathematics in 1998 and her Ph.D. in Computer Science from the University of Oviedo, Spain, in 2003. She is currently a senior Lecturer and a member of the Artificial Intelligence Center, Gijón. Her research interests are focused on several machine learning problems, e.g. multi-label and multi-instance tasks.
References (33)
- M.-L. Zhang, Z.-H. Zhou, Ml-knn: a lazy learning approach to multi-label learning, Pattern Recognition (2007)
- G. Madjarov, D. Kocev, D. Gjorgjevikj, S. Džeroski, An extensive experimental comparison of methods for multi-label learning, Pattern Recognition (2012)
- A. Clare, R.D. King, Knowledge discovery in multi-label phenotype data, in: European Conference on Data Mining and…
- M.-L. Zhang, Z.-H. Zhou, Multilabel neural networks with applications to functional genomics and text categorization, IEEE Transactions on Knowledge and Data Engineering (2006)
- A. Elisseeff, J. Weston, A kernel method for multi-labelled classification, in: ACM Conference on Research and…
- A.K. McCallum, Multi-label text classification with a mixture model trained by EM, in: AAAI 99 Workshop on Text…
- N. Ghamrawi, A. McCallum, Collective multi-label classification, in: ACM International Conference on Information and…
- R.E. Schapire, Y. Singer, BoosTexter: a boosting-based system for text categorization, Machine Learning (2000)
- K. Dembczyński, W. Waegeman, W. Cheng, E. Hüllermeier, On label dependence and loss minimization in multi-label classification, Machine Learning (2012)
- K. Dembczyński, W. Cheng, E. Hüllermeier, Bayes optimal multilabel classification via probabilistic classifier chains, …
- J. Read, B. Pfahringer, G. Holmes, E. Frank, Classifier chains for multi-label classification, Machine Learning (2011)
- W. Cheng, E. Hüllermeier, Combining instance-based learning and logistic regression for multilabel classification, Machine Learning (2009)
- J. Fürnkranz, E. Hüllermeier, E. Loza Mencía, K. Brinker, Multilabel classification via calibrated label ranking, Machine Learning (2008)
Robin Senge received his M.Sc. degree in Computer Science from the University of Marburg, Germany, in 2006. After graduation, he worked as a software engineer in industry, where he was consigned to consulting and developed software for financial applications. In 2009, he joined the Computational Intelligence Group (CIG) at the University of Marburg as a doctoral student. His research interests are focused on machine learning and fuzzy systems.
Jose Barranquero is a predoctoral researcher at Artificial Intelligence Center (University of Oviedo). He has a M.Sc. in Web Engineering and a M.Sc. in Soft Computing and Intelligent Data Analysis from the University of Oviedo. His research interests include machine learning, quantification, opinion mining and microblogging networks.
José Ramón Quevedo received his M.Sc. and his Ph.D. degrees in Computer Science from the University of Oviedo, Spain, in 1997 and 2000 respectively. He is currently a Senior Lecturer and a member of the Artificial Intelligence Center, Gijón. His current research is focused on applying machine learning methods to bioinformatics applications.
Juan José del Coz received his Ph.D. degree in Computer Science from the University of Oviedo at Gijón, Spain, in 2000. In 1997, he joined the Computer Science Department of the University of Oviedo, where he is currently a Tenured Associate Professor. He has authored over thirty papers in peer reviewed journals and conferences including articles in NIPS, ICML, JMLR and Pattern Recognition.
Eyke Hüllermeier is with the Department of Mathematics and Computer Science at Marburg University (Germany), where he holds an appointment as a full professor and heads the Computational Intelligence Group. In research, he is mostly interested in machine learning, uncertainty in knowledge-based systems, and applications in bioinformatics.