Dependent binary relevance models for multi-label classification
Introduction
Multi-label classification (MLC) is a machine learning problem in which models are sought that assign a subset of (class) labels to each object, unlike conventional (single-class) classification that involves predicting only a single class. Multi-label classification problems are ubiquitous and naturally occur, for instance, in assigning keywords to a paper, tags to resources in a social network, objects to images or emotional expressions to human faces.
There is a considerable amount of literature, in which state-of-the-art binary or multi-class classification algorithms are adapted and extended to the setting of MLC, including methods using decision trees [1], instance-based algorithms [2], neural networks [3], support vector machines [4], naive Bayes [5], conditional random fields [6] and boosting [7]. Besides, there is also another line of research, in which approaches of that kind are completely put aside; instead, the development of specialized methods that consider the particularities of multi-label data is advocated.
In general, the problem of multi-label learning comes with two fundamental challenges. The first bears on the computational complexity of the algorithms: if the number of labels is large, a complex approach might not be applicable in practice. Therefore, the scalability of algorithms is a key issue in this field. The second problem is related to the very nature of multi-label data. Not only is the number of classes typically larger than in multi-class classification tasks, but each example also belongs to a variable-sized subset of labels simultaneously. Moreover, and perhaps even more importantly, the labels will normally not occur independently of each other; instead, there are statistical dependencies between them. From a learning and prediction point of view, these relationships constitute a promising source of information, in addition to that coming from the mere description of the objects. Thus, it is hardly surprising that research on MLC has very much focused on the design of new methods that are able to detect, and benefit from, interdependencies among labels.
In recent years, many papers have analyzed the presence of label correlations, including theoretical analyses of label dependence in the context of MLC [8]. In this regard, different types of dependence have been formally distinguished, such as conditional dependence [6], [9], [10], [11], [12] and marginal (unconditional) dependence [3], [13], [14]. Other papers aim at exploiting relations in different sets of labels, such as pairwise relations [3], [4], [7], [15], [16], relations in sets of different sizes [11], [17], [18], or relations in the whole set of labels [10], [13], [14]. Exploiting label dependence entails the induction of complex models: the more label combinations are considered, the more complex the models become. This does not mean that exploiting pairwise correlations is preferable to exploiting full-order correlations, since the former may fail to capture the true dependencies, while the latter may not work well if the labels display complex relations that are difficult to deal with.
This paper proposes dependent binary relevance (DBR) models as an efficient and effective approach to induce multi-label classifiers that exploit conditional label dependence. Instead of studying them in combination with independent classifiers, like in [10], our goal is to explore their behavior when used in isolation, extending the work presented in [19] in which this approach was favorably compared with several state-of-the-art methods [3], [11], [13], [18]. The DBR approach is conceived as a natural extension of the simple binary relevance strategy, which does not allow for exploiting conditional label dependence. We shall elaborate on the positioning of our approach more closely in Section 3, where we argue that this approach combines properties of two other meta-techniques for MLC, namely chaining [11] and stacking [14], and that it fills a “gap” within the spectrum of methods that have been devised so far.
A key contribution of this paper is a deep analysis of the properties of dependent binary models, in which we characterize those conditions under which they should work well in practice. These models require label estimations (produced by any multi-label classifier) at prediction time. This issue is analyzed throughout the paper, concluding that the more reliable these estimations are, the better the overall performance becomes.
Another contribution of this work is to present a comprehensive study of methods based on chaining [11] and stacking [14] strategies. Our goal is to analyze these two approaches, which are closely connected, and to study those factors that have an influence on their performance. A key distinction between both approaches is the type of training data they rely on, which in turn has a decisive impact on the kind of label dependence captured.
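The distinction just described can be made concrete by looking at the training sets the two strategies construct. The following is a rough sketch under our own assumptions (the function names and the plain numpy representation are illustrative, not the paper's notation): a chaining approach trains the j-th binary problem on the features augmented with the true values of the preceding labels, whereas a stacking approach augments the features with first-level predictions for all labels.

```python
# Sketch of the training data built by chaining vs. stacking strategies.
# X is an (n, d) feature matrix, Y an (n, m) binary label matrix, and
# Y_hat an (n, m) matrix of first-level predictions (e.g. from binary
# relevance). All names here are illustrative assumptions.
import numpy as np

def chaining_training_sets(X, Y):
    """Classifier chains: the j-th binary problem is trained on the
    original features augmented with the TRUE values of labels 0..j-1."""
    return [(np.hstack([X, Y[:, :j]]), Y[:, j]) for j in range(Y.shape[1])]

def stacking_training_sets(X, Y, Y_hat):
    """Stacking: every meta-level binary problem is trained on the
    original features augmented with the PREDICTIONS Y_hat for ALL labels."""
    return [(np.hstack([X, Y_hat]), Y[:, j]) for j in range(Y.shape[1])]
```

The key difference is visible in the inputs: chaining uses ground-truth label values at training time (and must substitute predictions at prediction time), while stacking already trains on predicted labels, so training and prediction conditions match.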
The rest of the paper is organized as follows. The next section introduces multi-label classification in a more formal way. Stacking and chaining methods are reviewed in Section 3. Section 4 is devoted to the new DBR technique; we describe this approach formally and provide a detailed analysis of its properties. Finally, experimental results are reported in Section 5, before concluding the paper in Section 6.
Multi-label classification
Before describing some previous approaches to tackle multi-label classification, we present this learning task in a more formal way. The point of departure is a finite and non-empty set of labels L = {λ1, λ2, …, λm} and a training set S = {(x1, y1), …, (xn, yn)}. The elements of this set are supposed to be independently and randomly drawn according to an unknown probability distribution on X × Y, where X and Y are the input and the output space, respectively. The former is the space of the object
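As a point of reference for the methods discussed below, the simple binary relevance decomposition of this setting, one independent binary classifier per label, can be sketched as follows. The toy prototype base learner and all names are illustrative assumptions, not the paper's implementation; any binary classifier could be substituted.

```python
# Binary relevance sketch: the (n, m) label matrix Y is split into m
# independent binary problems, one per column. The base learner here is
# a deliberately trivial nearest-prototype classifier; it assumes both
# classes occur in each column of Y.
import numpy as np

class PrototypeClassifier:
    """Toy binary classifier: predicts the class whose mean feature
    vector (prototype) is closest to the query point."""
    def fit(self, X, y):
        self.protos = {c: X[y == c].mean(axis=0) for c in (0, 1)}
        return self
    def predict(self, X):
        d0 = np.linalg.norm(X - self.protos[0], axis=1)
        d1 = np.linalg.norm(X - self.protos[1], axis=1)
        return (d1 < d0).astype(int)

def binary_relevance_fit(X, Y, base=PrototypeClassifier):
    # One independent binary problem per label: column j of Y is the target.
    return [base().fit(X, Y[:, j]) for j in range(Y.shape[1])]

def binary_relevance_predict(models, X):
    return np.column_stack([m.predict(X) for m in models])
```

Because each binary model sees only the features and its own label column, this baseline cannot exploit any dependence between labels, which is precisely the limitation the methods in the following sections address.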
Modeling label dependence
The arguably most natural way to capture label dependencies is to learn classifier models that condition the prediction of a label yi not only on the object features but also on some of the other labels yj. This idea of conditioning can be realized in different ways. In particular, the following distinctions can be made:
- (i) Full vs. partial conditioning: The prediction of yi can be conditioned on all other labels or only on a subset of these labels. The most “sparse”
Dependent binary relevance classifiers
Our proposal of dependent binary relevance (DBR) models relies on two main hypotheses: First, taking conditional label dependencies into account is important for performing well in multi-label classification tasks. Second, modeling and learning such dependencies in a redundant way, for example using cyclic graph structures instead of simple chains, may further improve performance in practice.
The first assumption is at the core of research in multi-label classification and quite uncontested in
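A minimal sketch may help to make the DBR construction concrete: the binary model for each label is trained on the features augmented with the true values of all other labels, and at prediction time those values, being unavailable, are replaced by estimates from an inner base model (plain binary relevance in this sketch). The toy prototype learner and all names are our own illustrative assumptions, not the paper's implementation.

```python
# DBR sketch: each label j gets a classifier trained on features plus the
# TRUE values of all other labels; at prediction time the other labels are
# first estimated by an inner binary relevance model. The nearest-prototype
# base learner is a toy stand-in and assumes both classes occur per label.
import numpy as np

def nearest_proto_fit(X, y):
    # Class prototypes: mean feature vector of each class.
    return {c: X[y == c].mean(axis=0) for c in (0, 1)}

def nearest_proto_predict(protos, X):
    d0 = np.linalg.norm(X - protos[0], axis=1)
    d1 = np.linalg.norm(X - protos[1], axis=1)
    return (d1 < d0).astype(int)

def dbr_fit(X, Y):
    m = Y.shape[1]
    # Inner BR model: used at prediction time to estimate the other labels.
    br = [nearest_proto_fit(X, Y[:, j]) for j in range(m)]
    # DBR models: label j conditioned on features AND all other true labels.
    dbr = [nearest_proto_fit(np.hstack([X, np.delete(Y, j, axis=1)]), Y[:, j])
           for j in range(m)]
    return br, dbr

def dbr_predict(br, dbr, X):
    m = len(dbr)
    # Step 1: estimate all labels with the inner BR model.
    Y_hat = np.column_stack([nearest_proto_predict(p, X) for p in br])
    # Step 2: predict each label from features plus the OTHER estimates.
    return np.column_stack([
        nearest_proto_predict(dbr[j], np.hstack([X, np.delete(Y_hat, j, axis=1)]))
        for j in range(m)])
```

Note the redundancy: every label conditions on every other label, rather than only on its predecessors in a chain, which reflects the second hypothesis stated above. The quality of the final prediction then hinges on how reliable the inner estimates Y_hat are.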
Experiments
This section reports the settings and the results of the experiments performed. The main goal of these experiments is to make an exhaustive comparison between stacking and chaining approaches, including our newly proposed one. The aim is to determine which approach performs best for each of the metrics considered, and whether any of them is superior with regard to capturing label dependence. For a comparison of some of these methods, for instance CC, with other multi-label learners, see [31].
Conclusions and future work
Although multi-label classification can be seen as a simple extension of the well-studied single-class classification, it comes with the challenge that labels usually display dependencies amongst each other. This paper proposes the dependent binary relevance (DBR) approach to cope with multi-label classification under the hypothesis that, in general, the prediction of each label can benefit from information about the other labels. To that end, our learner employs an extended feature space
Conflict of interest
None declared.
Acknowledgments
This research has been supported by Spanish Ministerio de Economía y Competitividad (Grant TIN2011-23558) and by the German Research Foundation (DFG).
Elena Montañes received her M.Sc. degree in Mathematics in 1998 and her Ph.D. in Computer Science from the University of Oviedo, Spain, in 2003. She is currently a senior Lecturer and a member of the Artificial Intelligence Center, Gijón. Her research interests are focused on several machine learning problems, e.g. multi-label and multi-instance tasks.
References (33)
- M.-L. Zhang, Z.-H. Zhou, Ml-knn: a lazy learning approach to multi-label learning, Pattern Recognition (2007)
- G. Madjarov, D. Kocev, D. Gjorgjevikj, S. Džeroski, An extensive experimental comparison of methods for multi-label learning, Pattern Recognition (2012)
- A. Clare, R.D. King, Knowledge discovery in multi-label phenotype data, in: European Conference on Data Mining and…
- M.-L. Zhang, Z.-H. Zhou, Multilabel neural networks with applications to functional genomics and text categorization, IEEE Transactions on Knowledge and Data Engineering (2006)
- A. Elisseeff, J. Weston, A kernel method for multi-labelled classification, in: ACM Conference on Research and…
- A.K. McCallum, Multi-label text classification with a mixture model trained by EM, in: AAAI 99 Workshop on Text…
- N. Ghamrawi, A. McCallum, Collective multi-label classification, in: ACM International Conference on Information and…
- R.E. Schapire, Y. Singer, BoosTexter: a boosting-based system for text categorization, Machine Learning (2000)
- K. Dembczyński, W. Waegeman, W. Cheng, E. Hüllermeier, On label dependence and loss minimization in multi-label classification, Machine Learning (2012)
- K. Dembczyński, W. Cheng, E. Hüllermeier, Bayes optimal multilabel classification via probabilistic classifier chains, …
- J. Read, B. Pfahringer, G. Holmes, E. Frank, Classifier chains for multi-label classification, Machine Learning (2011)
- W. Cheng, E. Hüllermeier, Combining instance-based learning and logistic regression for multilabel classification, Machine Learning (2009)
- J. Fürnkranz, E. Hüllermeier, E. Loza Mencía, K. Brinker, Multilabel classification via calibrated label ranking, Machine Learning (2008)
Robin Senge received his M.Sc. degree in Computer Science from the University of Marburg, Germany, in 2006. After graduation, he worked as a software engineer in industry, where he was consigned to consulting and developed software for financial applications. In 2009, he joined the Computational Intelligence Group (CIG) at the University of Marburg as a doctoral student. His research interests are focused on machine learning and fuzzy systems.
Jose Barranquero is a predoctoral researcher at Artificial Intelligence Center (University of Oviedo). He has a M.Sc. in Web Engineering and a M.Sc. in Soft Computing and Intelligent Data Analysis from the University of Oviedo. His research interests include machine learning, quantification, opinion mining and microblogging networks.
José Ramón Quevedo received his M.Sc. and his Ph.D. degrees in Computer Science from the University of Oviedo, Spain, in 1997 and 2000 respectively. He is currently a Senior Lecturer and a member of the Artificial Intelligence Center, Gijón. His current research is focused on applying machine learning methods to bioinformatics applications.
Juan José del Coz received his Ph.D. degree in Computer Science from the University of Oviedo at Gijón, Spain, in 2000. In 1997, he joined the Computer Science Department of the University of Oviedo, where he is currently a Tenured Associate Professor. He has authored over thirty papers in peer reviewed journals and conferences including articles in NIPS, ICML, JMLR and Pattern Recognition.
Eyke Hüllermeier is with the Department of Mathematics and Computer Science at Marburg University (Germany), where he holds an appointment as a full professor and heads the Computational Intelligence Group. In research, he is mostly interested in machine learning, uncertainty in knowledge-based systems, and applications in bioinformatics.