DOI: 10.1145/3132847.3132956

MIKE: Keyphrase Extraction by Integrating Multidimensional Information

Published: 06 November 2017

ABSTRACT

Traditional supervised keyphrase extraction models depend on features of labelled keyphrases, while prevailing unsupervised models rely mainly on the structure of the word graph, in which candidate words are nodes and edges capture co-occurrence information between words. However, systematically integrating all of this multidimensional, heterogeneous information into a unified model remains relatively unexplored. In this paper, we focus on how to exploit multidimensional information effectively to improve keyphrase extraction performance. Specifically, we propose MIKE, a parametric random-walk model that learns a latent representation for each candidate keyphrase, capturing the mutual influences among all of the information, and simultaneously optimizes the model parameters and the ranking scores of candidates in the word graph. We optimize the model with gradient descent and report comprehensive experiments on two publicly available computer science datasets, WWW and KDD. Experimental results demonstrate that our approach significantly outperforms state-of-the-art graph-based keyphrase extraction approaches.
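
To make the word-graph setting concrete, the following is a minimal sketch of the generic unsupervised baseline the abstract refers to: a co-occurrence graph over candidate words, scored by a PageRank-style random walk. It is not the MIKE model, which additionally learns parameters and latent representations by gradient descent. The function names, the window size, the damping factor, and the iteration count are all illustrative assumptions rather than settings taken from the paper.

```python
from collections import defaultdict


def build_cooccurrence_graph(tokens, window=2):
    """Build an undirected weighted word graph (assumed construction).

    Nodes are words; an edge weight counts how often two distinct words
    appear within `window` tokens of each other.
    """
    graph = defaultdict(lambda: defaultdict(float))
    for i, word in enumerate(tokens):
        for j in range(i + 1, min(i + window + 1, len(tokens))):
            other = tokens[j]
            if other != word:
                graph[word][other] += 1.0
                graph[other][word] += 1.0
    return graph


def random_walk_scores(graph, damping=0.85, iterations=50):
    """Score nodes by PageRank-style power iteration on the weighted graph."""
    nodes = list(graph)
    n = len(nodes)
    scores = {node: 1.0 / n for node in nodes}
    # Total edge weight leaving each node, used to normalize transitions.
    out_weight = {node: sum(graph[node].values()) for node in nodes}
    for _ in range(iterations):
        new_scores = {}
        for node in nodes:
            # Mass arriving from each neighbor, proportional to edge weight.
            incoming = sum(
                scores[nbr] * w / out_weight[nbr]
                for nbr, w in graph[node].items()
            )
            new_scores[node] = (1.0 - damping) / n + damping * incoming
        scores = new_scores
    return scores


if __name__ == "__main__":
    text = "keyphrase extraction ranks candidate words in a word graph"
    graph = build_cooccurrence_graph(text.split(), window=2)
    ranked = sorted(random_walk_scores(graph).items(), key=lambda kv: -kv[1])
    for word, score in ranked[:3]:
        print(f"{word}\t{score:.4f}")
```

In this baseline the transition weights are fixed by co-occurrence counts alone. A parametric model of the kind the abstract describes would instead make those weights depend on learned parameters.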

    • Published in

      CIKM '17: Proceedings of the 2017 ACM on Conference on Information and Knowledge Management
      November 2017
      2604 pages
      ISBN: 9781450349185
      DOI: 10.1145/3132847

      Copyright © 2017 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Qualifiers

      • research-article

      Acceptance Rates

      CIKM '17 paper acceptance rate: 171 of 855 submissions, 20%. Overall acceptance rate: 1,861 of 8,427 submissions, 22%.
