Abstract
This paper addresses the extraction of semantic relations from scientific texts. Pattern-based representations are compared to word embeddings in unsupervised clustering experiments, with respect to their ability to discover new types of semantic relations and to recognize their instances. The results indicate that sequential pattern mining can significantly improve pattern-based representations, even in a completely unsupervised setting.
Notes
1. Entities corresponding to multiword expressions will have their unique vector, since word2vec includes an internal module for recognizing multiword expressions.
Acknowledgments
This work is part of the program “Investissements d’Avenir” overseen by the French National Research Agency, ANR-10-LABX-0083 (Labex EFL). The authors would like to thank the anonymous reviewers for their valuable comments.
Copyright information
© 2016 Springer International Publishing AG
About this paper
Cite this paper
Gábor, K., Zargayouna, H., Tellier, I., Buscaldi, D., Charnois, T. (2016). Unsupervised Relation Extraction in Specialized Corpora Using Sequence Mining. In: Boström, H., Knobbe, A., Soares, C., Papapetrou, P. (eds) Advances in Intelligent Data Analysis XV. IDA 2016. Lecture Notes in Computer Science, vol 9897. Springer, Cham. https://doi.org/10.1007/978-3-319-46349-0_21
DOI: https://doi.org/10.1007/978-3-319-46349-0_21
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-46348-3
Online ISBN: 978-3-319-46349-0
eBook Packages: Computer Science, Computer Science (R0)