A theory of learning from different domains

  • Open access
  • Published: 23 October 2009
  • Volume 79, pages 151–175, (2010)
  • Shai Ben-David1,
  • John Blitzer2,
  • Koby Crammer3,
  • Alex Kulesza4,
  • Fernando Pereira5 &
  • Jennifer Wortman Vaughan6

Abstract

Discriminative learning methods for classification perform well when training and test data are drawn from the same distribution. Often, however, we have plentiful labeled training data from a source domain but wish to learn a classifier which performs well on a target domain with a different distribution and little or no labeled training data. In this work we investigate two questions. First, under what conditions can a classifier trained from source data be expected to perform well on target data? Second, given a small amount of labeled target data, how should we combine it during training with the large amount of labeled source data to achieve the lowest target error at test time?

We address the first question by bounding a classifier’s target error in terms of its source error and the divergence between the two domains. We give a classifier-induced divergence measure that can be estimated from finite, unlabeled samples from the domains. Under the assumption that there exists some hypothesis that performs well in both domains, we show that this quantity together with the empirical source error characterizes the target error of a source-trained classifier.
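For readers skimming this page, the bound described in the paragraph above can be written out explicitly. The following is a hedged reconstruction from the abstract's description (the notation d_{HΔH} for the classifier-induced divergence and λ for the best joint error follow the paper's conventions, but treat the exact form as a sketch rather than a quotation):

```latex
% Sketch of the adaptation bound: target error is controlled by source
% error, the classifier-induced divergence between the two domains, and
% the error \lambda of the best single hypothesis on both domains.
\epsilon_T(h) \;\le\; \epsilon_S(h)
  \;+\; \tfrac{1}{2}\, d_{H\Delta H}(\mathcal{D}_S, \mathcal{D}_T)
  \;+\; \lambda,
\qquad
\lambda \;=\; \min_{h' \in H} \bigl[\, \epsilon_S(h') + \epsilon_T(h') \,\bigr].
```

Because the divergence is classifier-induced, it can be estimated from finite unlabeled samples by training a model to discriminate source examples from target examples: the easier the domains are to tell apart, the larger the divergence. A minimal sketch of that estimator follows; the logistic-regression model, the 50/50 split, and the `proxy_a_distance` name are illustrative assumptions, not the paper's prescription.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def proxy_a_distance(source_X, target_X, seed=0):
    """Estimate domain divergence from unlabeled samples (a sketch).

    Trains a domain classifier to tell source from target. If the domains
    are indistinguishable, the held-out error approaches 0.5 and the
    estimate approaches 0; perfectly separable domains give 2.
    """
    X = np.vstack([source_X, target_X])
    # Domain labels: 0 = source, 1 = target.
    y = np.concatenate([np.zeros(len(source_X)), np.ones(len(target_X))])
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.5, random_state=seed, stratify=y)
    clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    err = 1.0 - clf.score(X_te, y_te)  # held-out domain-classification error
    return max(0.0, 2.0 * (1.0 - 2.0 * err))
```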

We answer the second question by bounding the target error of a model which minimizes a convex combination of the empirical source and target errors. Previous theoretical work has considered minimizing just the source error, just the target error, or weighting instances from the two domains equally. We show how to choose the optimal combination of source and target error as a function of the divergence, the sample sizes of both domains, and the complexity of the hypothesis class. The resulting bound generalizes the previously studied cases and is always at least as tight as a bound which considers minimizing only the target error or an equal weighting of source and target errors.
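A minimal sketch of the α-weighted learner described above, assuming a scikit-learn-style estimator and using logistic loss as a convex surrogate for the 0–1 errors the theory bounds (the weighting scheme mirrors the convex combination in the abstract; the model choice and function name are illustrative):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_alpha_weighted(source_X, source_y, target_X, target_y, alpha):
    """Minimize alpha * (empirical target error) + (1 - alpha) * (source error).

    Each target example is weighted alpha / m_T and each source example
    (1 - alpha) / m_S, so the total weighted loss equals the stated
    convex combination of the two empirical errors.
    """
    m_s, m_t = len(source_X), len(target_X)
    X = np.vstack([source_X, target_X])
    y = np.concatenate([source_y, target_y])
    w = np.concatenate([
        np.full(m_s, (1.0 - alpha) / m_s),
        np.full(m_t, alpha / m_t),
    ])
    return LogisticRegression(max_iter=1000).fit(X, y, sample_weight=w)
```

Setting alpha = 1 recovers target-only training and alpha = 0 recovers source-only training; the paper's contribution is a bound that dictates how to pick alpha between these extremes from the divergence, the two sample sizes, and the hypothesis-class complexity.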




References

  • Ando, R., & Zhang, T. (2005). A framework for learning predictive structures from multiple tasks and unlabeled data. Journal of Machine Learning Research, 6, 1817–1853.

  • Anthony, M., & Bartlett, P. (1999). Neural network learning: theoretical foundations. Cambridge: Cambridge University Press.

  • Bartlett, P., & Mendelson, S. (2002). Rademacher and Gaussian complexities: risk bounds and structural results. Journal of Machine Learning Research, 3, 463–482.

  • Batu, T., Fortnow, L., Rubinfeld, R., Smith, W., & White, P. (2000). Testing that distributions are close. In: IEEE symposium on foundations of computer science (Vol. 41, pp. 259–269).

  • Baxter, J. (2000). A model of inductive bias learning. Journal of Artificial Intelligence Research, 12, 149–198.

  • Ben-David, S., Eiron, N., & Long, P. (2003). On the difficulty of approximately maximizing agreements. Journal of Computer and System Sciences, 66, 496–514.

  • Ben-David, S., Blitzer, J., Crammer, K., & Pereira, F. (2006). Analysis of representations for domain adaptation. In: Advances in neural information processing systems.

  • Bickel, S., Brückner, M., & Scheffer, T. (2007). Discriminative learning for differing training and test distributions. In: Proceedings of the international conference on machine learning.

  • Bikel, D., Miller, S., Schwartz, R., & Weischedel, R. (1997). Nymble: a high-performance learning name-finder. In: Conference on applied natural language processing.

  • Blitzer, J., Crammer, K., Kulesza, A., Pereira, F., & Wortman, J. (2007a). Learning bounds for domain adaptation. In: Advances in neural information processing systems.

  • Blitzer, J., Dredze, M., & Pereira, F. (2007b) Biographies, Bollywood, boomboxes and blenders: domain adaptation for sentiment classification. In: ACL.

  • Collins, M. (1999). Head-driven statistical models for natural language parsing. PhD thesis, University of Pennsylvania.

  • Cortes, C., Mohri, M., Riley, M., & Rostamizadeh, A. (2008). Sample selection bias correction theory. In: Proceedings of the 19th annual conference on algorithmic learning theory.

  • Crammer, K., Kearns, M., & Wortman, J. (2008). Learning from multiple sources. Journal of Machine Learning Research, 9, 1757–1774.

  • Dai, W., Yang, Q., Xue, G., & Yu, Y. (2007). Boosting for transfer learning. In: Proceedings of the international conference on machine learning.

  • Das, S., & Chen, M. (2001). Yahoo! for Amazon: extracting market sentiment from stock message boards. In: Proceedings of the Asia Pacific finance association annual conference.

  • Daumé, H. (2007). Frustratingly easy domain adaptation. In: Association for computational linguistics (ACL).

  • Finkel, J. R., & Manning, C. D. (2009). Hierarchical Bayesian domain adaptation. In: Proceedings of the North American association for computational linguistics.

  • Heckman, J. (1979). Sample selection bias as a specification error. Econometrica, 47, 153–161.

  • Huang, J., Smola, A., Gretton, A., Borgwardt, K., & Schoelkopf, B. (2007). Correcting sample selection bias by unlabeled data. In: Advances in neural information processing systems.

  • Jiang, J., & Zhai, C. (2007). Instance weighting for domain adaptation. In: Proceedings of the association for computational linguistics.

  • Kifer, D., Ben-David, S., & Gehrke, J. (2004). Detecting change in data streams. In: Very large data bases.

  • Li, X., & Bilmes, J. (2007). A Bayesian divergence prior for classification adaptation. In: Proceedings of the international conference on artificial intelligence and statistics.

  • Mansour, Y., Mohri, M., & Rostamizadeh, A. (2009a). Domain adaptation with multiple sources. In: Advances in neural information processing systems.

  • Mansour, Y., Mohri, M., & Rostamizadeh, A. (2009b). Multiple source adaptation and the Rényi divergence. In: Proceedings of the conference on uncertainty in artificial intelligence.

  • McAllester, D. (2003). Simplified PAC-Bayesian margin bounds. In: Proceedings of the sixteenth annual conference on learning theory.

  • Pang, B., Lee, L., & Vaithyanathan, S. (2002). Thumbs up? Sentiment classification using machine learning techniques. In: Proceedings of empirical methods in natural language processing.

  • Ratnaparkhi, A. (1996). A maximum entropy model for part-of-speech tagging. In: Proceedings of empirical methods in natural language processing.

  • Sugiyama, M., Suzuki, T., Nakajima, S., Kashima, H., von Bünau, P., & Kawanabe, M. (2008). Direct importance estimation for covariate shift adaptation. Annals of the Institute of Statistical Mathematics, 60, 699–746.

  • Thomas, M., Pang, B., & Lee, L. (2006). Get out the vote: determining support or opposition from congressional floor-debate transcripts. In: Proceedings of empirical methods in natural language processing.

  • Turney, P. (2002). Thumbs up or thumbs down? Semantic orientation applied to unsupervised classification of reviews. In: Proceedings of the association for computational linguistics.

  • Vapnik, V. (1998). Statistical learning theory. New York: Wiley.

  • Zhang, T. (2004). Solving large-scale linear prediction problems with stochastic gradient descent. In: Proceedings of the international conference on machine learning.


Author information

Authors and Affiliations

  1. David R. Cheriton School of Computer Science, University of Waterloo, Waterloo, ON, Canada

    Shai Ben-David

  2. Department of Computer Science, UC Berkeley, Berkeley, CA, USA

    John Blitzer

  3. Department of Electrical Engineering, The Technion, Haifa, Israel

    Koby Crammer

  4. Department of Computer and Information Science, University of Pennsylvania, Philadelphia, PA, USA

    Alex Kulesza

  5. Google Research, Mountain View, CA, USA

    Fernando Pereira

  6. School of Engineering and Applied Sciences, Harvard University, Cambridge, MA, USA

    Jennifer Wortman Vaughan


Corresponding author

Correspondence to John Blitzer.

Additional information

Editors: Nicolo Cesa-Bianchi, David R. Hardoon, and Gayle Leen.

Preliminary versions of the work contained in this article appeared in Advances in Neural Information Processing Systems (Ben-David et al. 2006; Blitzer et al. 2007a).

Rights and permissions

Open Access This is an open access article distributed under the terms of the Creative Commons Attribution Noncommercial License (https://creativecommons.org/licenses/by-nc/2.0), which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.


About this article

Cite this article

Ben-David, S., Blitzer, J., Crammer, K. et al. A theory of learning from different domains. Mach Learn 79, 151–175 (2010). https://doi.org/10.1007/s10994-009-5152-4

  • Received: 28 February 2009

  • Revised: 12 September 2009

  • Accepted: 18 September 2009

  • Published: 23 October 2009

  • Issue Date: May 2010

  • DOI: https://doi.org/10.1007/s10994-009-5152-4


Keywords

  • Domain adaptation
  • Transfer learning
  • Learning theory
  • Sample-selection bias
