Unsupervised Relation Extraction in Specialized Corpora Using Sequence Mining

  • Conference paper
Advances in Intelligent Data Analysis XV (IDA 2016)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9897)

Abstract

This paper deals with the extraction of semantic relations from scientific texts. Pattern-based representations are compared to word embeddings in unsupervised clustering experiments, according to their potential to discover new types of semantic relations and recognize their instances. The results indicate that sequential pattern mining can significantly improve pattern-based representations, even in a completely unsupervised setting.

Notes

  1. Entities corresponding to multiword expressions each have their own vector, since word2vec includes an internal module for recognizing multiword expressions.

Acknowledgments

This work is part of the program “Investissements d’Avenir” overseen by the French National Research Agency, ANR-10-LABX-0083 (Labex EFL). The authors would like to thank the anonymous reviewers for their valuable comments.

Author information

Corresponding author

Correspondence to Kata Gábor.

Copyright information

© 2016 Springer International Publishing AG

About this paper

Cite this paper

Gábor, K., Zargayouna, H., Tellier, I., Buscaldi, D., Charnois, T. (2016). Unsupervised Relation Extraction in Specialized Corpora Using Sequence Mining. In: Boström, H., Knobbe, A., Soares, C., Papapetrou, P. (eds) Advances in Intelligent Data Analysis XV. IDA 2016. Lecture Notes in Computer Science, vol 9897. Springer, Cham. https://doi.org/10.1007/978-3-319-46349-0_21

  • DOI: https://doi.org/10.1007/978-3-319-46349-0_21

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-46348-3

  • Online ISBN: 978-3-319-46349-0

  • eBook Packages: Computer Science, Computer Science (R0)
