Abstract
Semi-supervised learning methods exploit abundant unlabeled data to learn a better classifier when the number of labeled instances is very small. A common approach, used in two major paradigms of semi-supervised learning, self-training and co-training, is to select the unlabeled instances on which the current classifier has the highest classification confidence, label them to enlarge the labeled training set, and then update the classifier. However, the original labeled instances are more reliable than the self-labeled instances that are labeled by the classifier: if unlabeled instances are assigned wrong labels and then used to update the classifier, classification accuracy is jeopardized. In this paper, we present a new instance selection method based on the original labeled data (ISBOLD). ISBOLD considers not only the prediction confidence of the current classifier on the unlabeled data but also its performance on the original labeled data. In each iteration, ISBOLD uses the change in accuracy of the newly learned classifier on the original labeled data as a criterion to decide whether the selected most confident unlabeled instances will be accepted into the next iteration. We conducted experiments in self-training and co-training scenarios using Naive Bayes as the base classifier. Experimental results on 26 UCI datasets show that ISBOLD can significantly improve the accuracy and AUC of self-training and co-training.
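The self-training variant described in the abstract can be sketched as follows. This is an illustrative reconstruction based only on the abstract, not the authors' implementation: the function name `isbold_self_training` and the parameters `n_select` and `max_iter` are hypothetical, and scikit-learn's `GaussianNB` stands in for the paper's Naive Bayes base classifier. The key ISBOLD step is the acceptance test: newly self-labeled instances are kept only if the retrained classifier's accuracy on the original labeled data does not drop.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB


def isbold_self_training(X_l, y_l, X_u, n_select=5, max_iter=10):
    """Self-training with ISBOLD-style instance selection (illustrative sketch).

    Each iteration labels the most confident unlabeled instances with the
    current classifier; the enlarged training set is accepted only if the
    retrained classifier's accuracy on the ORIGINAL labeled data (X_l, y_l)
    does not decrease.
    """
    X_train, y_train = X_l.copy(), y_l.copy()
    clf = GaussianNB().fit(X_train, y_train)
    best_acc = clf.score(X_l, y_l)      # accuracy on the original labeled data
    pool = X_u.copy()                   # remaining unlabeled instances

    for _ in range(max_iter):
        if len(pool) == 0:
            break
        # Select the unlabeled instances the classifier is most confident about.
        conf = clf.predict_proba(pool).max(axis=1)
        idx = np.argsort(conf)[-n_select:]
        X_sel = pool[idx]
        y_sel = clf.predict(X_sel)      # self-labels (may be wrong)

        # Tentatively retrain on the enlarged training set.
        cand = GaussianNB().fit(np.vstack([X_train, X_sel]),
                                np.concatenate([y_train, y_sel]))
        cand_acc = cand.score(X_l, y_l)

        # ISBOLD acceptance criterion: keep the new instances only if
        # accuracy on the original labeled data did not drop.
        if cand_acc >= best_acc:
            X_train = np.vstack([X_train, X_sel])
            y_train = np.concatenate([y_train, y_sel])
            clf, best_acc = cand, cand_acc

        # Either way, remove the selected instances from the unlabeled pool.
        pool = np.delete(pool, idx, axis=0)

    return clf, best_acc
```

By construction, the accuracy on the original labeled data is monotonically non-decreasing across accepted iterations, which is how this criterion guards against self-labeling errors degrading the classifier.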
© 2011 Springer-Verlag Berlin Heidelberg
Cite this paper
Guo, Y., Zhang, H., Liu, X. (2011). Instance Selection in Semi-supervised Learning. In: Butz, C., Lingras, P. (eds) Advances in Artificial Intelligence. Canadian AI 2011. Lecture Notes in Computer Science(), vol 6657. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-21043-3_19
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-21042-6
Online ISBN: 978-3-642-21043-3