Abstract
The continuous development of information technology has led to explosive growth of information in many domains, and obtaining the required information from large-scale text quickly and accurately has become a major challenge. Keyword extraction is an effective way to address this challenge and is one of the core technologies of text mining. However, most texts are not supplied with keywords, and some keywords of a text do not appear in its content; existing algorithms offer no elegant solution for this case. To address it, this paper proposes a keyword extraction method based on word vectors. Training word vectors with the word2vec algorithm maps the concepts of a text into a vector space that a computer can process. The method trains all words and keywords that appear in the training text into a set of vectors, and then replaces the words of a test text with their word vectors. The Euclidean distances between every candidate keyword and every word of the text are calculated, and the top-N closest candidates are taken as the automatically extracted keywords. The experiment uses papers from the computer field as the training text. The results show that the method can improve the accuracy of phrase keyword extraction and can find keywords that do not appear in the text.
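The ranking step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the two-dimensional vectors are hand-written stand-ins for trained word2vec vectors, the names `VECTORS` and `top_n_keywords` are hypothetical, and averaging the distances from a candidate to all text words is an assumption, since the abstract does not say how the pairwise distances are combined.

```python
import math

# Toy 2-D "word vectors" with made-up values. In the method described above
# these would come from word2vec training on a corpus (assumption: one vector
# per word or keyword phrase).
VECTORS = {
    "neural": (0.9, 0.1),
    "network": (0.8, 0.2),
    "training": (0.7, 0.3),
    "deep learning": (0.85, 0.15),
    "text mining": (0.2, 0.9),
    "database": (0.1, 0.5),
}


def top_n_keywords(text_words, candidates, vectors, n):
    """Rank candidate keywords by mean Euclidean distance to the text words.

    Taking the mean is an illustrative choice; other aggregations
    (minimum, sum) would fit the abstract's description equally well.
    """
    text_vecs = [vectors[w] for w in text_words if w in vectors]
    if not text_vecs:
        return []
    scored = []
    for cand in candidates:
        if cand not in vectors:
            continue  # no vector was trained for this candidate
        mean_dist = sum(math.dist(vectors[cand], v) for v in text_vecs) / len(text_vecs)
        scored.append((mean_dist, cand))
    scored.sort()  # smallest mean distance first
    return [cand for _, cand in scored[:n]]


text = ["neural", "network", "training"]
candidates = ["deep learning", "text mining", "database"]
result = top_n_keywords(text, candidates, VECTORS, n=2)
print(result)
```

In this toy data the closest candidate, "deep learning", does not occur among the text words, which mirrors the abstract's point that a vector-space comparison can surface keywords absent from the text itself.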
Acknowledgements
This work is partially supported by the National Natural Science Foundation of China (No. 61303097) and the Ph.D. Programs Foundation of the Ministry of Education of China (No. 20123108120026).
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Li, Q., Zhu, W., Lu, Z. (2016). Predicting Abstract Keywords by Word Vectors. In: Xie, J., Chen, Z., Douglas, C., Zhang, W., Chen, Y. (eds) High Performance Computing and Applications. HPCA 2015. Lecture Notes in Computer Science, vol 9576. Springer, Cham. https://doi.org/10.1007/978-3-319-32557-6_20
DOI: https://doi.org/10.1007/978-3-319-32557-6_20
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-32556-9
Online ISBN: 978-3-319-32557-6
eBook Packages: Computer Science, Computer Science (R0)