ABSTRACT
Facing ever-increasing volumes of data but limited human annotation capacity, active learning strategies for selecting the most informative labels gain in importance. However, choosing an appropriate active learning strategy is itself a complex task that requires considering different criteria, such as the informativeness of the selected labels, the versatility with respect to classification algorithms, and the processing speed. This raises the question of which combinations of active learning strategies and classification algorithms are the most promising to apply. A general answer to this question, one that does not require application-specific, label-intensive experiments on each dataset, is highly desirable, since active learning is applied precisely in situations where labelled data are scarce. This paper therefore studies several combinations of active learning strategies and classification algorithms and evaluates them in a series of comparative experiments.
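To make the notion of "selecting the most informative labels" concrete, the following is a minimal sketch of uncertainty sampling, one of the classic pool-based strategies of the kind compared in such studies. It is an illustration, not the paper's method: the `pool` of posterior estimates and the entropy criterion are assumptions for the example.

```python
import math

def uncertainty_sample(probabilities):
    """Return the index of the pool instance whose predicted class
    distribution has the highest Shannon entropy, i.e. the instance
    the current classifier is least certain about."""
    def entropy(dist):
        return -sum(p * math.log(p) for p in dist if p > 0)
    return max(range(len(probabilities)),
               key=lambda i: entropy(probabilities[i]))

# Hypothetical posterior estimates from some classifier for three
# unlabeled instances; the second is the most uncertain (0.5/0.5).
pool = [[0.9, 0.1], [0.5, 0.5], [0.7, 0.3]]
print(uncertainty_sample(pool))  # → 1
```

The selected instance would then be sent to a human annotator, its label added to the training set, and the classifier retrained; the informativeness of such selections is one of the criteria the comparative experiments evaluate.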