Predicting Abstract Keywords by Word Vectors

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9576)

Abstract

The continuing development of information technology has led to explosive growth in many information domains, and obtaining the required information from large-scale text quickly and accurately has become a major challenge. Keyword extraction is an effective way to address this challenge and is one of the core technologies in text mining. Currently, most texts are not supplied with keywords, and some keywords of a text do not appear in the text itself. Existing algorithms offer no elegant solution to this problem. To solve it, this paper proposes a keyword extraction method based on word vectors. Training word vectors with the word2vec algorithm maps the concepts of a text into a space that a computer can process. The method trains all the words and keywords that appear in the text into a set of vectors, and the words in the test text are then replaced by their word vectors. The Euclidean distance between every candidate keyword and every word of the text is calculated, and the top-N closest candidates are taken as the automatically extracted keywords. The experiment uses papers from the computer science field as the training text. The results show that the method can improve the accuracy of phrase keyword extraction and can find keywords that do not appear in the text.
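The ranking step described in the abstract can be illustrated with a minimal sketch. It assumes word vectors have already been trained (for example with word2vec) and are available as a plain dictionary. The function name `top_n_keywords`, the toy 2-D vectors, and the choice to aggregate by mean distance are assumptions made for illustration; the abstract does not specify how the per-word distances are combined, so this is not necessarily the authors' implementation.

```python
import math


def top_n_keywords(text_words, candidates, vectors, n=3):
    """Rank candidate keywords by mean Euclidean distance to the text's words.

    `vectors` maps a word or phrase to its embedding (a tuple of floats).
    Averaging the distances is an assumed aggregation, not taken from the paper.
    """
    # Replace each text word by its vector, skipping words without one.
    text_vecs = [vectors[w] for w in text_words if w in vectors]
    if not text_vecs:
        return []

    scored = []
    for cand in candidates:
        if cand not in vectors:
            continue
        # Euclidean distance from this candidate to every word in the text.
        dists = [math.dist(vectors[cand], v) for v in text_vecs]
        scored.append((sum(dists) / len(dists), cand))

    # Smaller mean distance means the candidate is closer to the text.
    scored.sort()
    return [cand for _, cand in scored[:n]]


# Hypothetical 2-D vectors standing in for trained word2vec embeddings.
vectors = {
    "neural": (1.0, 0.9),
    "network": (0.9, 1.0),
    "training": (0.8, 0.8),
    "deep learning": (0.9, 0.9),
    "database": (-1.0, 0.2),
    "compiler": (0.1, -1.0),
}
text = ["neural", "network", "training"]
candidates = ["deep learning", "database", "compiler"]
print(top_n_keywords(text, candidates, vectors, n=2))
```

In this toy example the candidate "deep learning" ranks first even though it never occurs in the text, which is the behaviour the abstract highlights: keywords are chosen by proximity in the vector space rather than by occurrence in the document.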

Acknowledgements

This work is partially supported by the National Natural Science Foundation of China (No. 61303097) and the Ph.D. Programs Foundation of the Ministry of Education of China (No. 20123108120026).

Author information

Corresponding author

Correspondence to Zhiguo Lu.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Li, Q., Zhu, W., Lu, Z. (2016). Predicting Abstract Keywords by Word Vectors. In: Xie, J., Chen, Z., Douglas, C., Zhang, W., Chen, Y. (eds) High Performance Computing and Applications. HPCA 2015. Lecture Notes in Computer Science, vol 9576. Springer, Cham. https://doi.org/10.1007/978-3-319-32557-6_20

  • DOI: https://doi.org/10.1007/978-3-319-32557-6_20

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-32556-9

  • Online ISBN: 978-3-319-32557-6

  • eBook Packages: Computer Science, Computer Science (R0)
