Predicting Abstract Keywords by Word Vectors

Li, Qing; Zhu, Wenhao; Lu, Zhiguo

doi:10.1007/978-3-319-32557-6_20

Conference paper
First Online: 20 July 2016

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9576)

Abstract

The continuous development of information technology has led to explosive growth in many information domains. Obtaining the required information from large-scale text quickly and accurately has become a great challenge. Keyword extraction is an effective way to address this problem and is one of the core technologies in text mining. Currently, most texts are not provided with keywords, and some keywords of a text do not appear in the text content itself. Existing algorithms offer no elegant solution for this case. To address it, this paper proposes a keyword extraction method based on word vectors. Training word vectors with the word2vec algorithm maps the concepts of a text into a space that a computer can work with. The method uses word2vec to train all the words and keywords that appear in the training text into a set of vectors, and then replaces the words in the test text with their word vectors. The Euclidean distances between every candidate keyword and every word of the text are calculated, and the top-N closest candidates are taken as the automatically extracted keywords. The experiment uses computer science papers as the training text. The results show that the method can improve the accuracy of phrase keyword extraction and can find keywords that do not appear in the text.
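The ranking step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The 2-D vectors below are made-up stand-ins for trained word2vec embeddings, the function name top_n_keywords is hypothetical, and averaging the distances over the text words is an assumption, since the abstract does not say how the per-word distances are combined.

```python
import math

# Toy 2-D vectors standing in for trained word2vec embeddings (hypothetical values).
VECTORS = {
    "neural": (0.9, 0.1),
    "network": (0.8, 0.2),
    "training": (0.7, 0.3),
    "deep_learning": (0.85, 0.15),  # candidate keyword that is absent from the text
    "database": (0.1, 0.9),
    "compiler": (0.2, 0.7),
}


def top_n_keywords(text_words, candidates, vectors, n=1):
    """Rank candidates by mean Euclidean distance to the text's word vectors.

    Taking the mean is an assumption made for this sketch.
    """
    known = [vectors[w] for w in text_words if w in vectors]
    if not known:
        return []
    scored = []
    for cand in candidates:
        if cand not in vectors:
            continue  # skip candidates without a trained vector
        mean_dist = sum(math.dist(vectors[cand], v) for v in known) / len(known)
        scored.append((mean_dist, cand))
    scored.sort()  # smallest mean distance first
    return [cand for _, cand in scored[:n]]


text = ["neural", "network", "training"]
candidates = ["deep_learning", "database", "compiler"]
print(top_n_keywords(text, candidates, VECTORS, n=2))
```

In this toy setup the candidate "deep_learning" ranks first even though it never occurs in the text, which is the behaviour the abstract claims for keywords missing from the text content.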

Acknowledgements

This work is partially supported by the National Natural Science Foundation of China (No. 61303097) and the Ph.D. Programs Foundation of the Ministry of Education of China (No. 20123108120026).

Author information

Authors and Affiliations

School of Computer Engineering and Science, Shanghai University, Shanghai, China
Qing Li & Wenhao Zhu
Shanghai University Library, Shanghai University, Shanghai, China
Zhiguo Lu

Corresponding author

Correspondence to Zhiguo Lu.

Editor information

Editors and Affiliations

School of Computer Engineering, Shanghai University, Shanghai, China
Jiang Xie
Chemical and Petroleum Engineering, University of Calgary, Calgary, Alberta, Canada
Zhangxin Chen
Mathematics Department, University of Wyoming, Laramie, Wyoming, USA
Craig C. Douglas
School of Computer Engineering, Shanghai University, Shanghai, China
Wu Zhang
Mathematics and Informatics, South China Agricultural University, Guangzhou, China
Yan Chen

About this paper

Cite this paper

Li, Q., Zhu, W., Lu, Z. (2016). Predicting Abstract Keywords by Word Vectors. In: Xie, J., Chen, Z., Douglas, C., Zhang, W., Chen, Y. (eds) High Performance Computing and Applications. HPCA 2015. Lecture Notes in Computer Science, vol 9576. Springer, Cham. https://doi.org/10.1007/978-3-319-32557-6_20

DOI: https://doi.org/10.1007/978-3-319-32557-6_20
Published: 20 July 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-32556-9
Online ISBN: 978-3-319-32557-6
eBook Packages: Computer Science, Computer Science (R0)
