Abstract
Multi-label classification allows each instance to belong to several classes at once. It has received significant attention in machine learning and has found many real-world applications in recent years, such as text categorization, automatic video annotation and functional genomics, leading to the development of many multi-label classification methods. From the labeled examples in the training dataset, a multi-label method extracts the inherent information needed to output a function that predicts the labels of unseen data. Owing to several problems, such as errors in the input vectors or in their labels, this information may be misleading and may cause the multi-label algorithm to fail. In this paper, we propose a simple algorithm that addresses these problems by editing the existing training dataset, and we adapt the edited set to different multi-label classification methods. Evaluation on benchmark datasets demonstrates the usefulness and effectiveness of our approach.
References
Barbedo JGA, Lopes A (2007) Automatic genre classification of musical signals. EURASIP J Adv Signal Process 2007(1):064960. doi:10.1155/2007/64960
Blockeel H, De Raedt L, Ramon J (1998) Top-down induction of clustering trees. In: 15th international conference on machine learning, pp 55–63. Morgan Kaufmann, London
Brighton H, Mellish C (2002) Advances in instance selection for instance-based learning algorithms. Data Min Knowl Discov 6(2):153–172. doi:10.1023/A:1014043630878
Brodley CE, Friedl MA (1999) Identifying mislabeled training data. J Artif Intell Res 11:131–167. doi:10.1613/jair.606
de Carvalho A, Freitas AA (2009) A tutorial on multi-label classification techniques. In: Foundations of computational intelligence. Studies in computational intelligence, vol 5, pp 177–195. Springer, Berlin. doi:10.1007/978-3-642-01536-6_8
Clare A, King RD (2001) Knowledge discovery in multi-label phenotype data. In: Proceedings of the 5th European conference on principles of data mining and knowledge discovery (PKDD ’01), vol 2168, pp 42–53. Springer-Verlag, London
Cover T, Hart P (1967) Nearest neighbor pattern classification. IEEE Trans Inf Theory 13(1):21–27. doi:10.1109/TIT.1967.1053964
Dasarathy BV (1991) Nearest neighbor (NN) norms: NN pattern classification techniques. IEEE Computer Society Press, London. doi:10.1109/2.84880
Denoeux T (1995) A k-nearest neighbor classification rule based on Dempster–Shafer theory. IEEE Trans Syst Man Cybern 25(5):804–813
Denoeux T, Younes Z, Abdallah F (2010) Representing uncertainty on set-valued variables using belief functions. Artif Intell 174(7–8):479–499. doi:10.1016/j.artint.2010.02.002
Devijver PA (1986) On the editing rate of the Multiedit algorithm. Pattern Recogn Lett 4(1):9–12. doi:10.1016/0167-8655(86)90066-8
Elisseeff A, Weston J (2001) Kernel methods for multi-labelled classification and categorical regression problems. In: Advances in neural information processing systems, vol 14, pp 681–687. MIT Press, Cambridge
García S, Derrac J, Cano JR, Herrera F (2012) Prototype selection for nearest neighbor classification: taxonomy and empirical study. IEEE Trans Pattern Anal Mach Intell 34(3):417–435. doi:10.1109/TPAMI.2011.142
Garcia S, Herrera F (2008) An extension on “statistical comparisons of classifiers over multiple data sets” for all pairwise comparisons. J Mach Learn Res 9:2677–2694
Guan D, Yuan W, Lee YK, Lee S (2009) Nearest neighbor editing aided by unlabeled data. Inf Sci 179(13):2273–2282. doi:10.1016/j.ins.2009.02.011
Hattori K, Takahashi M (2000) A new edited k-nearest neighbor rule in the pattern classification problem. Pattern Recogn 33(3):521–528. doi:10.1016/S0031-3203(99)00068-0
Jin B, Muller B, Zhai C, Lu X (2008) Multi-label literature classification based on the gene ontology graph. BMC Bioinform 9:525. doi:10.1186/1471-2105-9-525
Kanj S, Abdallah F, Denoeux T (2012) Purifying training data to improve performance of multi-label classification algorithms. In: Proceedings of the 15th international conference on information fusion (FUSION 2012), pp 1784–1792. IEEE, Singapore
Koplowitz J, Brown TA (1981) On the relation of performance to editing in nearest neighbor rules. Pattern Recogn 13(3):251–255. doi:10.1016/0031-3203(81)90102-3
Li Y, Hu Z, Cai Y, Zhang W (2005) Support vector based prototype selection method for nearest neighbor rules. In: Proceedings of the first international conference on advances in natural computation, pp 528–535. Springer, Berlin. doi:10.1007/11539087_68
Madjarov G, Kocev D, Gjorgjevikj D, Džeroski S (2012) An extensive experimental comparison of methods for multi-label learning. Pattern Recogn 45(9):3084–3104. doi:10.1016/j.patcog.2012.03.004
Pavlidis P, Grundy WN (2000) Combining microarray expression data and phylogenetic profiles to learn gene functional categories using support vector machines. In: Technical report, Department of Computer Science, Columbia University, New York
Pestian JP, Brew C, Matykiewicz P, Hovermale DJ, Johnson N, Cohen KB, Duch W (2007) A shared task involving multi-label classification of clinical free text. In: Proceedings of the workshop on BioNLP 2007: biological, translational, and clinical language processing (BioNLP ’07), pp 97–104. Association for Computational Linguistics, Prague
Pękalska E, Duin RP, Paclík P (2006) Prototype selection for dissimilarity-based classifiers. Pattern Recogn 39(2):189–208. doi:10.1016/j.patcog.2005.06.012
Qi GJ, Hua XS, Rui Y, Tang J, Mei T, Zhang HJ (2007) Correlative multi-label video annotation. In: Proceedings of the 15th international conference on multimedia—MULTIMEDIA ’07, p 17. ACM Press, Augsburg. doi:10.1145/1291233.1291245
Read J, Pfahringer B, Holmes G, Frank E (2011) Classifier chains for multi-label classification. Mach Learn 85(3):333–359. doi:10.1007/s10994-011-5256-5
Sánchez J, Pla F, Ferri F (1997) Prototype selection for the nearest neighbour rule through proximity graphs. Pattern Recogn Lett 18(6):507–513. doi:10.1016/S0167-8655(97)00035-4
Schapire RE, Singer Y (1999) Improved boosting algorithms using confidence-rated predictions. Mach Learn 37(3):297–336. doi:10.1023/A:1007614523901
Schapire RE, Singer Y (2000) BoosTexter: a boosting-based system for text categorization. Mach Learn 39(2–3):135–168. doi:10.1023/A:1007649029923
Sebastiani F (2002) Machine learning in automated text categorization. ACM Comput Surv 34(1):1–47. doi:10.1145/505282.505283
Shetty J, Adibi J (2004) The enron email dataset database schema and brief statistical report. In: Technical report, Information Sciences Institute, University of Southern California
Tahir MA, Kittler J, Bouridane A (2012) Multilabel classification using heterogeneous ensemble of multi-label classifiers. Pattern Recogn Lett 33(5):513–523. doi:10.1016/j.patrec.2011.10.019
Tang L, Rajan S, Narayanan VK (2009) Large scale multi-label classification via metalabeler. In: Proceedings of the 18th international conference on world wide web (WWW ’09), p 211. ACM Press, Madrid. doi:10.1145/1526709.1526738
Tomek I (1976) Two modifications of CNN. IEEE Trans Syst Man Cybern 6(11):769–772. doi:10.1109/TSMC.1976.4309452
Trohidis K, Tsoumakas G, Kalliris G, Vlahavas I (2008) Multi-label classification of music into emotions. In: Proceedings of the 9th international conference on music information retrieval (ISMIR ’08), pp 325–330, Philadelphia
Tsoumakas G, Katakis I (2007) Multi-label classification: an overview. Int J Data Wareh Min 3(3):1–13
Tsoumakas G, Katakis I, Vlahavas I (2010) Mining multi-label data. In: Maimon O, Rokach L (eds) Data mining and knowledge discovery handbook, pp 667–685. Springer. doi:10.1007/978-0-387-09823-4_34
Tsoumakas G, Katakis I, Vlahavas I (2011) Random k-labelsets for multilabel classification. IEEE Trans Knowl Data Eng 23(7):1079–1089. doi:10.1109/TKDE.2010.164
Van Hulse J, Khoshgoftaar T (2009) Knowledge discovery from imbalanced and noisy data. Data Knowl Eng 68(12):1513–1542. doi:10.1016/j.datak.2009.08.005
Wilson DL (1972) Asymptotic properties of nearest neighbor rules using edited data. IEEE Trans Syst Man Cybern 2(3):408–421. doi:10.1109/TSMC.1972.4309137
Wu X, Kumar V, Ross Quinlan J, Ghosh J, Yang Q, Motoda H, McLachlan GJ, Ng A, Liu B, Yu PS, Zhou ZH, Steinbach M, Hand DJ, Steinberg D (2007) Top 10 algorithms in data mining. Knowl Inf Syst 14(1):1–37. doi:10.1007/s10115-007-0114-2
Xu J (2011) An extended one-versus-rest support vector machine for multi-label classification. Neurocomputing 74(17):3114–3124. doi:10.1016/j.neucom.2011.04.024
Xu J (2012) An efficient multi-label support vector machine with a zero label. Expert Syst Appl 39(5):4796–4804. doi:10.1016/j.eswa.2011.09.138
Younes Z, Abdallah F, Denoeux T, Snoussi H (2011) A dependent multilabel classification method derived from the k-nearest neighbor rule. EURASIP J Adv Signal Process 2011(1):645964. doi:10.1155/2011/645964
Zhang ML, Zhou ZH (2006) Multilabel neural networks with applications to functional genomics and text categorization. IEEE Trans Knowl Data Eng 18(10):1338–1351. doi:10.1109/TKDE.2006.162
Zhang ML, Zhou ZH (2007) ML-KNN: a lazy learning approach to multi-label learning. Pattern Recogn 40(7):2038–2048. doi:10.1016/j.patcog.2006.12.019
Zhou ZH, Zhang ML (2007) Multi-instance multi-label learning with application to scene classification. In: Advances in neural information processing systems, vol 19, pp 1609–1616. MIT Press, Cambridge
Zouhal L, Denoeux T (1998) An evidence-theoretic k-NN rule with parameter optimization. IEEE Trans Syst Man Cybern C Appl Rev 28(2):263–271. doi:10.1109/5326.669565
Appendices
Appendix 1: Evaluation measures
As discussed in Sect. 3.2, performance evaluation for multi-label learning systems differs from that of single-label classification. Let \(\mathcal {H}:\mathbb {X}\rightarrow 2^\mathcal {Y}\) be a multi-label classifier that assigns a predicted label subset of \(\mathcal {Y}=\{\omega _1,\ldots ,\omega _{Q}\}\) to each instance \(\mathbf {x}\in \mathbb {X}\), and let \(f:\mathbb {X}\times \mathcal {Y}\rightarrow [0,1]\) be the corresponding scoring function, which gives each label \(\omega _q\) a score interpreted as the probability that \(\omega _q\) is relevant. The function \(f(\cdot ,\cdot )\) can be transformed into a ranking function \({\rm{rank}}_f(\cdot ,\cdot )\), which maps the outputs of \(f(\mathbf {x},\omega )\) for all \(\omega \in \mathcal {Y}\) to ranks in \(\{1,2,\ldots ,Q\}\), so that \(f(\mathbf {x}_i,\omega _q)>f(\mathbf {x}_i,\omega _r)\) implies \({\rm{rank}}_f(\mathbf {x}_i,\omega _q)<{\rm{rank}}_f(\mathbf {x}_i,\omega _r)\).
Given a set \(\mathcal {S}=\{(\mathbf {x}_1,Y_1),\ldots ,(\mathbf {x}_m,Y_m)\}\) of \(m\) test examples, the evaluation metrics of multi-label learning systems are divided into two groups: prediction-based and ranking-based metrics. Prediction-based measures compare the predicted label sets with the ground-truth label sets, averaged over all test examples. Ranking-based metrics evaluate the quality of the label ranking induced by the scoring function \(f(\cdot ,\cdot )\).
1.1 Prediction-based measures
Hamming loss: The Hamming loss metric is defined as the fraction of labels whose relevance is incorrectly predicted:
\[ {\rm HLoss}(\mathcal {H},\mathcal {S})=\frac{1}{m}\sum _{i=1}^{m}\frac{1}{Q}\left| \mathcal {H}(\mathbf {x}_i)\,\triangle \,Y_i\right| , \]
where \(\triangle\) denotes the symmetric difference between two sets.
Accuracy: The accuracy metric gives an average degree of similarity between the predicted and the ground-truth label sets:
\[ {\rm Accuracy}(\mathcal {H},\mathcal {S})=\frac{1}{m}\sum _{i=1}^{m}\frac{|\mathcal {H}(\mathbf {x}_i)\cap Y_i|}{|\mathcal {H}(\mathbf {x}_i)\cup Y_i|}. \]
Precision: The precision metric computes the proportion of predicted labels that are true positives:
\[ {\rm Precision}(\mathcal {H},\mathcal {S})=\frac{1}{m}\sum _{i=1}^{m}\frac{|\mathcal {H}(\mathbf {x}_i)\cap Y_i|}{|\mathcal {H}(\mathbf {x}_i)|}. \]
Recall: This metric estimates the proportion of true labels that have been predicted as positives:
\[ {\rm Recall}(\mathcal {H},\mathcal {S})=\frac{1}{m}\sum _{i=1}^{m}\frac{|\mathcal {H}(\mathbf {x}_i)\cap Y_i|}{|Y_i|}. \]
F1-measure: The F1-measure is defined as the harmonic mean of precision and recall:
\[ {\rm F1}(\mathcal {H},\mathcal {S})=\frac{1}{m}\sum _{i=1}^{m}\frac{2\,|\mathcal {H}(\mathbf {x}_i)\cap Y_i|}{|\mathcal {H}(\mathbf {x}_i)|+|Y_i|}. \]
Note that the smaller the value of the Hamming loss, the better the performance. For the other metrics, higher values correspond to better classification quality.
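The prediction-based measures above can be sketched in a few lines of Python, assuming each example's ground-truth set \(Y_i\) and prediction \(\mathcal {H}(\mathbf {x}_i)\) are represented as Python sets and \(Q\) is the total number of labels (the function names are illustrative, not from the paper):

```python
def hamming_loss(true_sets, pred_sets, Q):
    """Fraction of label relevances predicted incorrectly (symmetric difference)."""
    m = len(true_sets)
    return sum(len(Y ^ H) for Y, H in zip(true_sets, pred_sets)) / (m * Q)

def accuracy(true_sets, pred_sets):
    """Average Jaccard similarity between true and predicted label sets."""
    m = len(true_sets)
    return sum(len(Y & H) / len(Y | H) for Y, H in zip(true_sets, pred_sets)) / m

def precision(true_sets, pred_sets):
    """Average fraction of predicted labels that are actually relevant."""
    m = len(true_sets)
    return sum(len(Y & H) / len(H) for Y, H in zip(true_sets, pred_sets)) / m

def recall(true_sets, pred_sets):
    """Average fraction of relevant labels that were predicted."""
    m = len(true_sets)
    return sum(len(Y & H) / len(Y) for Y, H in zip(true_sets, pred_sets)) / m

def f1_measure(true_sets, pred_sets):
    """Per-example harmonic mean of precision and recall, averaged."""
    m = len(true_sets)
    return sum(2 * len(Y & H) / (len(Y) + len(H))
               for Y, H in zip(true_sets, pred_sets)) / m
```

Note that this sketch assumes every \(Y_i\) and \(\mathcal {H}(\mathbf {x}_i)\) is non-empty; real implementations must guard against empty sets in the denominators.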
1.2 Ranking-based measures
One-Error: This metric computes how often the top-ranked label is not in the true label set of the instance; the relevance of all other labels is ignored:
\[ {\rm OneErr}(f,\mathcal {S})=\frac{1}{m}\sum _{i=1}^{m}\left\langle \mathop {\arg \max }\limits _{\omega \in \mathcal {Y}} f(\mathbf {x}_i,\omega )\notin Y_i\right\rangle , \]
where, for any proposition \(H\), \(\langle H \rangle\) equals 1 if \(H\) holds and 0 otherwise. Note that, for single-label classification problems, the One-Error is identical to the ordinary classification error.
Coverage: Coverage computes how far, on average, we need to move down the ranked label list to cover all the labels assigned to a test instance:
\[ {\rm Coverage}(f,\mathcal {S})=\frac{1}{m}\sum _{i=1}^{m}\max _{\omega \in Y_i}{\rm rank}_f(\mathbf {x}_i,\omega )-1. \]
Ranking loss: This metric computes the average fraction of label pairs in which an incorrect label is ranked at least as high as a correct label:
\[ {\rm RLoss}(f,\mathcal {S})=\frac{1}{m}\sum _{i=1}^{m}\frac{\left| \{(\omega _q,\omega _r)\in Y_i\times \overline{Y}_i : f(\mathbf {x}_i,\omega _q)\le f(\mathbf {x}_i,\omega _r)\}\right| }{|Y_i|\,|\overline{Y}_i|}, \]
where \(\overline{Y}_i\) is the complementary set of \(Y_i\) in \(\mathcal {Y}\).
Average precision: This metric evaluates, for each label \(\omega \in Y_i\), the average fraction of labels ranked at or above \(\omega\) that are actually in \(Y_i\):
\[ {\rm AvPrec}(f,\mathcal {S})=\frac{1}{m}\sum _{i=1}^{m}\frac{1}{|Y_i|}\sum _{\omega \in Y_i}\frac{\left| \{\omega '\in Y_i : {\rm rank}_f(\mathbf {x}_i,\omega ')\le {\rm rank}_f(\mathbf {x}_i,\omega )\}\right| }{{\rm rank}_f(\mathbf {x}_i,\omega )}. \]
Note that AvPrec\((f, \mathcal {S}) = 1\) means that the labels are perfectly ranked. For the other metrics, smaller values correspond to a better label ranking quality.
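The ranking-based measures can likewise be sketched in Python, assuming each instance's scores are given as a dict mapping every label in \(\mathcal {Y}\) to its value of \(f(\mathbf {x}_i,\omega )\) (function names and the tie-breaking in `rank_of` are illustrative choices, not prescribed by the paper):

```python
def rank_of(scores, label):
    """Rank of `label` under the scoring function: rank 1 = highest score."""
    return 1 + sum(1 for s in scores.values() if s > scores[label])

def one_error(score_list, true_sets):
    """Fraction of instances whose top-ranked label is not relevant."""
    m = len(true_sets)
    return sum(max(scores, key=scores.get) not in Y
               for scores, Y in zip(score_list, true_sets)) / m

def coverage(score_list, true_sets):
    """Average depth in the ranking needed to cover all relevant labels."""
    m = len(true_sets)
    return sum(max(rank_of(scores, w) for w in Y) - 1
               for scores, Y in zip(score_list, true_sets)) / m

def ranking_loss(score_list, true_sets, labels):
    """Average fraction of (relevant, irrelevant) pairs that are mis-ordered."""
    m, total = len(true_sets), 0.0
    for scores, Y in zip(score_list, true_sets):
        Y_bar = set(labels) - Y
        bad = sum(1 for q in Y for r in Y_bar if scores[q] <= scores[r])
        total += bad / (len(Y) * len(Y_bar))
    return total / m

def average_precision(score_list, true_sets):
    """Average precision of the relevant labels over the induced ranking."""
    m, total = len(true_sets), 0.0
    for scores, Y in zip(score_list, true_sets):
        s = 0.0
        for w in Y:
            r_w = rank_of(scores, w)
            s += sum(1 for w2 in Y if rank_of(scores, w2) <= r_w) / r_w
        total += s / len(Y)
    return total / m
```

For example, with labels \(\{a,b,c\}\), scores \(\{a\!:\!0.9, b\!:\!0.5, c\!:\!0.1\}\) for an instance with true set \(\{a\}\) give zero One-Error, Coverage and Ranking loss, and an Average precision of 1.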
Appendix 2: Multi-labeled dataset statistics
Given a multi-labeled dataset \(\mathcal {D}=\{(\mathbf {x}_i,Y_i),i=1,\ldots ,n \}\) with \(\mathbf {x}_i \in \mathbb {X}\) and \(Y_i \subseteq \mathcal {Y}\), the dataset can be characterized by the number of instances (\(n\)), the number of attributes in the input space, and the number of labels (\(Q\)). In the following, we review some statistics about the multi-labeled dataset \(\mathcal {D}\) [36].
Label cardinality: The label cardinality (LCard) of \(\mathcal {D}\) is the average number of labels per instance:
\[ {\rm LCard}(\mathcal {D})=\frac{1}{n}\sum _{i=1}^{n}|Y_i|. \]
Label density: The label density (LDen) of \(\mathcal {D}\) is the average number of labels per instance divided by the total number of labels \(Q\):
\[ {\rm LDen}(\mathcal {D})=\frac{{\rm LCard}(\mathcal {D})}{Q}=\frac{1}{nQ}\sum _{i=1}^{n}|Y_i|. \]
Both metrics indicate the number of alternative labels that characterize the examples of a multi-labeled dataset. Label cardinality is independent of the total number of labels in the classification problem, while label density takes into consideration the total number of labels. Two datasets with the same label cardinality but with different label densities may present different properties that influence the performance of the multi-label classification methods.
Distinct label sets: The distinct label sets (DL) measure counts the number of different label sets appearing among the \(n\) examples:
\[ {\rm DL}(\mathcal {D})=\left| \{Y\subseteq \mathcal {Y} \mid \exists \, i\in \{1,\ldots ,n\}: Y_i=Y\}\right| . \]
This measure gives an idea of the regularity of the labeling scheme.
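The three statistics above can be computed directly from the list of label sets; a minimal sketch, assuming the dataset's label sets are given as Python sets (function names are illustrative):

```python
def label_cardinality(label_sets):
    """LCard: average number of labels per instance."""
    return sum(len(Y) for Y in label_sets) / len(label_sets)

def label_density(label_sets, Q):
    """LDen: label cardinality normalized by the total number of labels Q."""
    return label_cardinality(label_sets) / Q

def distinct_label_sets(label_sets):
    """DL: number of different label sets occurring in the dataset."""
    return len({frozenset(Y) for Y in label_sets})
```

For instance, a dataset with label sets \(\{1\},\{1,2\},\{1,2\},\{3\}\) over \(Q=3\) labels has LCard \(=1.5\), LDen \(=0.5\) and DL \(=3\).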
Kanj, S., Abdallah, F., Denœux, T. et al. Editing training data for multi-label classification with the k-nearest neighbor rule. Pattern Anal Applic 19, 145–161 (2016). https://doi.org/10.1007/s10044-015-0452-8