ABSTRACT
Traditional supervised keyphrase extraction models depend on features of labelled keyphrases, while prevailing unsupervised models rely mainly on the structure of the word graph, in which candidate words are nodes and edges capture the co-occurrence information between words. However, systematically integrating all of this multidimensional heterogeneous information into a unified model remains relatively unexplored. In this paper, we focus on how to effectively exploit multidimensional information to improve keyphrase extraction performance. Specifically, we propose a random-walk parametric model, MIKE, that learns a latent representation for each candidate keyphrase capturing the mutual influences among all the information, and simultaneously optimizes the parameters and the ranking scores of candidates in the word graph. We optimize the model with gradient descent and report comprehensive experiments on two publicly available computer science datasets, WWW and KDD. Experimental results demonstrate that our approach significantly outperforms state-of-the-art graph-based keyphrase extraction approaches.
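To make the word-graph setting concrete, the sketch below builds a co-occurrence graph over candidate words and ranks them with a plain damped random walk, in the style of the unsupervised graph-based baselines the abstract mentions. It is not the MIKE model: it has no learned parameters, no latent representations, and no gradient-descent step. The function names, the window size, the damping factor, and the toy token list are all assumptions made for illustration.

```python
from collections import defaultdict


def build_cooccurrence_graph(tokens, window=2):
    """Build an undirected weighted word graph.

    Nodes are words; the weight of an edge counts how often the two
    words appear within `window` positions of each other.
    """
    graph = defaultdict(lambda: defaultdict(float))
    for i, word in enumerate(tokens):
        for j in range(i + 1, min(i + window + 1, len(tokens))):
            other = tokens[j]
            if other != word:  # skip self-loops
                graph[word][other] += 1.0
                graph[other][word] += 1.0
    return graph


def random_walk_scores(graph, damping=0.85, iterations=50):
    """Score nodes with a damped random walk (PageRank-style iteration).

    Hypothetical baseline for illustration only; every node in `graph`
    has at least one edge, so the outgoing weight is never zero.
    """
    nodes = list(graph)
    scores = {n: 1.0 / len(nodes) for n in nodes}
    out_weight = {n: sum(graph[n].values()) for n in nodes}
    for _ in range(iterations):
        new_scores = {}
        for n in nodes:
            # Mass flowing into n from each neighbour m, proportional
            # to the share of m's edge weight that points at n.
            incoming = sum(
                scores[m] * graph[m][n] / out_weight[m] for m in graph[n]
            )
            new_scores[n] = (1.0 - damping) / len(nodes) + damping * incoming
        scores = new_scores
    return scores


# Toy usage: rank the words of a short, already-filtered token list.
example_tokens = ["graph", "ranking", "graph", "keyphrase", "graph", "model"]
example_graph = build_cooccurrence_graph(example_tokens, window=1)
example_scores = random_walk_scores(example_graph)
ranked_words = sorted(example_scores, key=example_scores.get, reverse=True)
```

In this baseline the transition weights come only from co-occurrence counts. The approach described in the abstract differs in that the ranking scores and the model parameters are optimized jointly.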
- MIKE: Keyphrase Extraction by Integrating Multidimensional Information