Domain Adaptation for Text Categorization by Feature Labeling

Kadar, Cristina; Iria, José

doi:10.1007/978-3-642-20161-5_42

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6611)

Included in the following conference series:

European Conference on Information Retrieval

Abstract

We present a novel approach to domain adaptation for text categorization that requires only that the source domain data be weakly annotated in the form of labeled features. The main advantage of our approach is that labeling words is less expensive than labeling documents. We propose two methods. The first seeks to minimize the divergence between the distributions of the source domain, which contains labeled features, and the target domain, which contains only unlabeled data. The second augments the set of labeled features in an unsupervised way, by discovering a latent concept space shared between source and target. We show empirically that our approach outperforms standard supervised and semi-supervised methods and obtains results competitive with those reported for state-of-the-art domain adaptation methods, while requiring considerably less supervision.
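
The abstract describes both methods only at a high level. The Python sketch below is a rough illustration under stated assumptions, not the authors' formulation. For the first method it assumes a generalized-expectation-style penalty: for each labeled word, the KL divergence between a reference class distribution and the classifier's average predicted distribution over target documents containing that word. For the second it assumes a topic-word weight matrix from some latent topic model (for example LDA) and copies each labeled word's distribution to the top words of its dominant topic. The function names `feature_label_divergence` and `augment_features`, the `predict_proba` callable, the `top_n` parameter and the toy data are all hypothetical.

```python
import math

# Hypothetical sketch; not the authors' implementation.


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as equal-length lists."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


def feature_label_divergence(labeled_features, target_docs, predict_proba):
    """Assumed GE-style penalty: sum over labeled words of KL(reference || model).

    labeled_features: dict word -> reference class distribution (list of floats)
    target_docs: list of token lists from the unlabeled target domain
    predict_proba: hypothetical callable, doc -> class distribution
    """
    total = 0.0
    for word, ref in labeled_features.items():
        docs = [d for d in target_docs if word in d]
        if not docs:
            continue  # the word never occurs in the target domain; no constraint
        expected = [0.0] * len(ref)
        for d in docs:
            # Average the model's predicted distribution over documents containing the word.
            for c, p in enumerate(predict_proba(d)):
                expected[c] += p / len(docs)
        total += kl_divergence(ref, expected)
    return total


def augment_features(labeled_features, topic_word, vocab, top_n=2):
    """Assumed augmentation: copy each labeled word's distribution to the
    top_n words of the topic in which that word has the highest weight.

    topic_word: per-topic weight lists over vocab (e.g. from a topic model)
    """
    index = {w: i for i, w in enumerate(vocab)}
    augmented = dict(labeled_features)  # existing labels are never overwritten
    for word, ref in labeled_features.items():
        if word not in index:
            continue
        i = index[word]
        topic = max(range(len(topic_word)), key=lambda t: topic_word[t][i])
        ranked = sorted(range(len(vocab)), key=lambda j: -topic_word[topic][j])
        for j in ranked[:top_n]:
            augmented.setdefault(vocab[j], ref)
    return augmented


# Toy usage: two classes (sports, politics) and a two-topic model.
labeled = {"goal": [0.9, 0.1], "vote": [0.1, 0.9]}
vocab = ["goal", "match", "vote", "party"]
topic_word = [[0.5, 0.4, 0.05, 0.05], [0.05, 0.05, 0.5, 0.4]]
target_docs = [["goal", "match"], ["vote", "party"]]


def uniform(doc):
    """An uninformed classifier that ignores the document."""
    return [0.5, 0.5]


print(feature_label_divergence(labeled, target_docs, uniform))  # positive: predictions ignore the feature labels
print(augment_features(labeled, topic_word, vocab))
# {'goal': [0.9, 0.1], 'vote': [0.1, 0.9], 'match': [0.9, 0.1], 'party': [0.1, 0.9]}
```

In a real system the divergence term would be minimized with respect to the classifier's parameters; here it is only evaluated, to show what the penalty measures.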

Author information

Authors and Affiliations

IBM Research Zurich, Säumerstrasse 4, CH-8804, Rüschlikon, Switzerland
Cristina Kadar & José Iria

Editor information

Editors and Affiliations

Information School, University of Sheffield, Regent Court, 211 Portobello Street, S1 4DP, Sheffield, UK
Paul Clough
CLARITY: Centre for Sensor Web Technologies, School of Computing, Dublin City University, Glasnevin, Dublin 9, Ireland
Colum Foley, Cathal Gurrin & Hyowon Lee
Centre for Next Generation Localisation, School of Computing, Dublin City University, Glasnevin, Dublin 9, Ireland
Gareth J. F. Jones
TNO Human Factors, Brassersplein 2, 2612 CT, Delft, The Netherlands
Wessel Kraaij
Yahoo! Research, 177 Diagonal, 08018, Barcelona, Spain
Vanessa Murdock

About this paper

Cite this paper

Kadar, C., Iria, J. (2011). Domain Adaptation for Text Categorization by Feature Labeling. In: Clough, P., et al. Advances in Information Retrieval. ECIR 2011. Lecture Notes in Computer Science, vol 6611. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-20161-5_42

DOI: https://doi.org/10.1007/978-3-642-20161-5_42
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-20160-8
Online ISBN: 978-3-642-20161-5
eBook Packages: Computer Science, Computer Science (R0)