research-article

MIKE: Keyphrase Extraction by Integrating Multidimensional Information

Authors:
Yuxiang Zhang, Civil Aviation University of China, Tianjin, China
Yaocheng Chang, Civil Aviation University of China, Tianjin, China
Xiaoqing Liu, Civil Aviation University of China, Tianjin, China
Sujatha Das Gollapalli, A*STAR, Singapore, Singapore
Xiaoli Li, A*STAR, Singapore, Singapore
Chunjing Xiao, Civil Aviation University of China, Tianjin, China

CIKM '17: Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, November 2017, Pages 1349–1358
https://doi.org/10.1145/3132847.3132956

Published: 06 November 2017

ABSTRACT

Traditional supervised keyphrase extraction models depend on features of labelled keyphrases, while prevailing unsupervised models rely mainly on the structure of the word graph, in which candidate words are nodes and edges capture co-occurrence information between words. However, systematically integrating all of this multidimensional, heterogeneous information into a unified model remains relatively unexplored. In this paper, we focus on how to effectively exploit multidimensional information to improve keyphrase extraction performance. Specifically, we propose MIKE, a parametric random-walk model that learns a latent representation for each candidate keyphrase, capturing the mutual influences among all the information, and simultaneously optimizes the parameters and the ranking scores of candidates in the word graph. We optimize the model with gradient descent and report comprehensive experiments on two publicly available Computer Science datasets, WWW and KDD. Experimental results demonstrate that our approach significantly outperforms state-of-the-art graph-based keyphrase extraction approaches.
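
To make the word-graph setting concrete, the following is a minimal sketch of the kind of unsupervised baseline the abstract contrasts against: a co-occurrence word graph scored by a damped random walk (in the style of PageRank/TextRank). It is not the MIKE model, which additionally learns parameters and latent representations. The function names, the window size, the damping factor, and the iteration count are all illustrative assumptions.

```python
from collections import defaultdict


def build_cooccurrence_graph(tokens, window=2):
    """Build an undirected weighted word graph.

    Edge weight = number of times two distinct words appear within
    `window` positions of each other (window size is an assumption).
    """
    graph = defaultdict(lambda: defaultdict(float))
    for i, word in enumerate(tokens):
        for j in range(i + 1, min(i + window + 1, len(tokens))):
            other = tokens[j]
            if other != word:
                graph[word][other] += 1.0
                graph[other][word] += 1.0
    return graph


def random_walk_scores(graph, damping=0.85, iterations=50):
    """Score nodes with a damped random walk (plain power iteration)."""
    nodes = sorted(graph)
    n = len(nodes)
    scores = {w: 1.0 / n for w in nodes}
    # Total edge weight leaving each node; every node has at least one edge.
    out_weight = {w: sum(graph[w].values()) for w in nodes}
    for _ in range(iterations):
        updated = {}
        for w in nodes:
            # Mass arriving at w from each neighbour v, in proportion to
            # the share of v's edge weight that points to w.
            incoming = sum(
                scores[v] * graph[v][w] / out_weight[v] for v in graph[w]
            )
            updated[w] = (1.0 - damping) / n + damping * incoming
        scores = updated
    return scores


if __name__ == "__main__":
    text = "keyphrase extraction ranks candidate words in a word graph"
    word_graph = build_cooccurrence_graph(text.split(), window=2)
    ranking = sorted(random_walk_scores(word_graph).items(),
                     key=lambda item: item[1], reverse=True)
    for word, score in ranking[:3]:
        print(f"{word}\t{score:.4f}")
```

In this baseline the edge weights are fixed co-occurrence counts; according to the abstract, MIKE instead optimizes model parameters and ranking scores jointly, which this sketch does not attempt.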

Index Terms
Information systems → Information systems applications

Published in
CIKM '17: Proceedings of the 2017 ACM on Conference on Information and Knowledge Management
November 2017
2604 pages
ISBN:9781450349185
DOI:10.1145/3132847
General Chairs:
Ee-Peng Lim, Singapore Management University, Singapore
Marianne Winslett, University of Illinois at Urbana-Champaign, USA, and Advanced Digital Sciences Center, Singapore
Program Chairs:
Mark Sanderson, RMIT, Australia
Ada Fu, Chinese University of Hong Kong, Hong Kong
Jimeng Sun, Georgia Tech, USA
Shane Culpepper, RMIT, Australia
Eric Lo, Chinese University of Hong Kong, Hong Kong
Joyce Ho, Emory University, USA
Debora Donato, Mix Tech, Inc., USA
Rakesh Agrawal, Data Insights Laboratories, USA
Yu Zheng, Microsoft Research Asia, China
Carlos Castillo, Qatar Computing Research Institute, Qatar
Aixin Sun, Nanyang Technological University, Singapore
Vincent S. Tseng, National Cheng Kung University, Taiwan
Chenliang Li, Wuhan University, China
Copyright © 2017 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
graph-based keyphrase extraction approach
keyphrase extraction
multidimensional information
parametric model
Qualifiers
- research-article
Acceptance Rates
CIKM '17 Paper Acceptance Rate: 171 of 855 submissions, 20%
Overall Acceptance Rate: 1,861 of 8,427 submissions, 22%