Abstract
Lexical paraphrasing is an inherently context-sensitive problem because a word’s meaning depends on its context. Most paraphrasing work finds patterns and templates that can replace other patterns or templates in some context; we instead attempt to make decisions for a specific context. In this paper we develop a global classifier that takes a word v and its context, along with a candidate word u, and determines whether u can replace v in the given context while maintaining the original meaning.
We develop an unsupervised, bootstrapped learning approach to this problem. Key to our approach is the use of a very large amount of unlabeled data to derive a reliable supervision signal, which is then used to train a supervised learning algorithm. We demonstrate that our approach performs significantly better than state-of-the-art paraphrasing approaches and generalizes well to unseen pairs of words.
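To make the input/output contract of such a classifier concrete, the following is a minimal toy sketch of a decision function over (v, context, u). It is not the method of the paper: the names `build_context_profiles` and `can_replace`, the window size, the overlap threshold, and the three-sentence corpus are all assumptions introduced for illustration. The sketch gathers neighbouring words for every word in a small unlabeled corpus and accepts u as a replacement for v when enough of v's current neighbours have also been seen near u.

```python
from collections import Counter, defaultdict


def build_context_profiles(corpus, window=2):
    """Map each word to a Counter of the words seen within `window` tokens of it.

    `corpus` is an iterable of plain, unlabeled sentences (hypothetical input).
    """
    profiles = defaultdict(Counter)
    for sentence in corpus:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    profiles[word][tokens[j]] += 1
    return profiles


def can_replace(v, context, u, profiles, window=2, threshold=2):
    """Toy decision: may u replace v in this particular context?

    Returns True when at least `threshold` distinct neighbours of v in
    `context` were also observed near u in the unlabeled corpus.
    This overlap heuristic is an illustrative stand-in, not a trained classifier.
    """
    v, u = v.lower(), u.lower()
    tokens = context.lower().split()
    if v not in tokens:
        raise ValueError(f"{v!r} does not occur in the given context")
    i = tokens.index(v)
    neighbours = set(tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window])
    seen_near_u = profiles.get(u, Counter())
    overlap = sum(1 for w in neighbours if seen_near_u[w] > 0)
    return overlap >= threshold


# Hypothetical unlabeled corpus, far smaller than anything realistic.
corpus = [
    "she will command the army",
    "he will lead the army",
    "pipes made of lead metal",
]
profiles = build_context_profiles(corpus)
military = can_replace("command", "they command the army today", "lead", profiles)
computing = can_replace("command", "type the command in shell", "lead", profiles)
```

The point of the sketch is only that the same pair (v, u) can receive different answers in different contexts, which is the decision the abstract describes; a realistic system would replace the overlap count with a learned model trained on a supervision signal derived from much more data.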
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Connor, M., Roth, D. (2007). Context Sensitive Paraphrasing with a Global Unsupervised Classifier. In: Kok, J.N., Koronacki, J., Mantaras, R.L.d., Matwin, S., Mladenič, D., Skowron, A. (eds) Machine Learning: ECML 2007. ECML 2007. Lecture Notes in Computer Science, vol 4701. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74958-5_13
DOI: https://doi.org/10.1007/978-3-540-74958-5_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-74957-8
Online ISBN: 978-3-540-74958-5
eBook Packages: Computer Science, Computer Science (R0)