Domain Adaptation for Text Categorization by Feature Labeling

  • Conference paper
Advances in Information Retrieval (ECIR 2011)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6611)

Abstract

We present a novel approach to domain adaptation for text categorization, which requires only that the source domain data be weakly annotated in the form of labeled features. The main advantage of our approach is that labeling words is less expensive than labeling documents. We propose two methods. The first seeks to minimize the divergence between the distributions of the source domain, which contains labeled features, and the target domain, which contains only unlabeled data. The second augments the set of labeled features in an unsupervised way, via the discovery of a shared latent concept space between source and target. We empirically show that our approach outperforms standard supervised and semi-supervised methods, and obtains results competitive with those reported by state-of-the-art domain adaptation methods, while requiring considerably less supervision.
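
To make the notion of divergence between source and target distributions concrete, the following is a minimal sketch and not the authors' method. It assumes Kullback-Leibler divergence over Laplace-smoothed unigram distributions, a choice the abstract does not specify. The toy documents and the helper names `word_distribution` and `kl_divergence` are hypothetical and exist only for illustration.

```python
import math
from collections import Counter


def word_distribution(docs, vocab, alpha=1.0):
    """Laplace-smoothed unigram distribution over a fixed vocabulary.

    Smoothing (alpha > 0) keeps every probability non-zero, so the
    logarithm in the KL divergence is always defined.
    """
    counts = Counter(w for doc in docs for w in doc.split() if w in vocab)
    total = sum(counts.values()) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def kl_divergence(p, q):
    """KL(p || q) in nats; p and q must share the same support."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in p)


# Hypothetical toy corpora standing in for a source and a target domain.
source_docs = ["the match ended in a draw", "the team won the match"]
target_docs = ["the election ended in a tie", "the party won the vote"]

# Shared vocabulary so both distributions live on the same support.
vocab = sorted({w for d in source_docs + target_docs for w in d.split()})

p_src = word_distribution(source_docs, vocab)
p_tgt = word_distribution(target_docs, vocab)

kl_same = kl_divergence(p_src, p_src)   # identical distributions
kl_cross = kl_divergence(p_src, p_tgt)  # source versus target

print(f"KL(source || source) = {kl_same:.4f}")
print(f"KL(source || target) = {kl_cross:.4f}")
```

The divergence is zero when a distribution is compared with itself and positive when the two domains use words with different frequencies. A method of the first kind described above would adjust the model so that a quantity of this sort becomes small.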




Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kadar, C., Iria, J. (2011). Domain Adaptation for Text Categorization by Feature Labeling. In: Clough, P., et al. Advances in Information Retrieval. ECIR 2011. Lecture Notes in Computer Science, vol 6611. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-20161-5_42

  • DOI: https://doi.org/10.1007/978-3-642-20161-5_42

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-20160-8

  • Online ISBN: 978-3-642-20161-5

  • eBook Packages: Computer Science, Computer Science (R0)
