Hashtag Biased Ranking for Keyword Extraction from Microblog Posts

Li, Lin; Su, Chang; Sun, Yueqing; Xiong, Shengwu; Xu, Guandong

doi:10.1007/978-3-319-25159-2_32

Lin Li²²,
Chang Su²²,
Yueqing Sun²²,
Shengwu Xiong²² &
…
Guandong Xu²³

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 9403))

Included in the following conference series:

International Conference on Knowledge Science, Engineering and Management

2970 Accesses
1 Citations

Abstract

Nowadays, a huge amount of text is being generated for social networking purpose on the Web. Keyword extraction from such text benefit many applications such as advertising, search, and content filtering. Recent studies show that graph based ranking is more effective than traditional term or document frequecy based approaches. However, most work in the literature constructs word to word graph within a document or a collection of documents before applying a kind of random walk. Such a graph does not consider the influence of document importance on keyword extraction. Moreover, social text like a microblog post usually has speical social features such as hashtag and so on, which can help us understand its topic. In this paper, we propose hashtag biased ranking for keyword extraction from a collection of microblog posts. We first build a word-post weighted graph by taking into account the posts themselves. Then, a hashtag biased random walk is applied on this graph, which guides our approach to extract keywords according to the hashtag topic. Last, the final ranking of a word is determined by the stationary probability after a number of interations. We evaluate our proposed method on a real Chinese microblog posts. Experiments show that our method is more effective than the traditional word to word graph based ranking in terms of precision.

This research was undertaken as part of Project 15BGL048, 2015AA015403, 2015BAA072 and 61303029 and supported by Hubei Key Laboratory of Transportation Internet of Things.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

References

Bi, B., Tian, Y., Sismanis, Y., Balmin, A., Cho, J.: Scalable topic-specific influence analysis on microblogs. In: Proceedings of the 7th ACM International Conference on Web Search and Data Mining, WSDM 2014, pp. 513–522. ACM (2014)
Google Scholar
Cormen, T.H., Leiserson, C.E., Rivest, R.L., Stein, C.: Introduction to Algorithms. McGraw-Hill (1990)
Google Scholar
Grineva, M., Grinev, M., Lizorkin, D.: Extracting key terms from noisy and multitheme documents. In: Proceedings of the 18th International Conference on World Wide Web, WWW 2009, pp. 661–670. ACM (2009)
Google Scholar
Haveliwala, T.H.: Topic-sensitive pagerank: A context-sensitive ranking algorithm for web search. IEEE Trans. on Knowl. and Data Eng. 15(4), 784–796 (2003)
Article Google Scholar
Hu, X., Tang, J., Liu, H.: Leveraging knowledge across media for spammer detection in microblogging. In: Proceedings of the 37th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2014, pp. 547–556. ACM (2014)
Google Scholar
Hulth, A.: Improved automatic keyword extraction given more linguistic knowledge. In: Proceedings of the 2003 Conference on Empirical Methods in Natural Language Processing, EMNLP 2003, pp. 216–223 (2003)
Google Scholar
Karypis, G., Kumar, V.: Multilevel k-way partitioning scheme for irregular graphs. J. Parallel Distrib. Comput. 48(1), 96–129 (1998)
Article MATH Google Scholar
Kwak, H., Lee, C., Park, H., Moon, S.: What is twitter, a social network or a news media? In: Proceedings of the 19th International Conference on World Wide Web, WWW 2010, pp. 591–600. ACM, New York (2010)
Google Scholar
Li, Z., Zhou, D., Juan, Y.-F., Han, J.: Keyword extraction for social snippets. In: Proceedings of the 19th International Conference on World Wide Web, WWW 2010, pp. 1143–1144. ACM (2010)
Google Scholar
Liu, Z., Huang, W., Zheng, Y., Sun, M.: Automatic keyphrase extraction via topic decomposition. In: Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, EMNLP 2010, pp. 366–376. Association for Computational Linguistics, Stroudsburg (2010)
Google Scholar
Liu, Z., Li, P., Zheng, Y., Sun, M.: Clustering to find exemplar terms for keyphrase extraction. In: Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, EMNLP 2009, vol. 1, pp. 257–266. Association for Computational Linguistics (2009)
Google Scholar
Manning, C.D., Raghavan, P., Schutze, H.: Introduction to information retrieval (2008)
Google Scholar
Mihalcea, R., Tarau, P.: Textrank: bringing order into texts. In: Lin, D., Wu, D. (eds.), Proceedings of EMNLP 2004, pp. 404–411. Association for Computational Linguistics 2004
Google Scholar
Page, L., Brin, S., Motwani, R., Winograd, T.: The pagerank citation ranking: Bringing order to the web. Technical report, Stanford Digital Library Technologies Project (1998)
Google Scholar
Qiang, R., Liang, F., Yang, J.: Exploiting ranking factorization machines for microblog retrieval. In: Proceedings of the 22nd ACM international conference on Conference on Information and knowledge management, CIKM 2013, pp. 1783–1788. ACM (2013)
Google Scholar
Tong, H., Faloutsos, C., Pan, J.-Y.: Fast random walk with restart and its applications. In Proceedings of the Sixth International Conference on Data Mining, ICDM 2006, pp. 613–622. IEEE Computer Society (2006)
Google Scholar
Turney, P.D.: Learning algorithms for keyphrase extraction. Inf. Retr. 2(4), 303–336 (2000)
Article Google Scholar
Vosecky, J., Leung, K.W.-T., Ng, W.: Collaborative personalized twitter search with topic-language models. In: Proceedings of the 37th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2014, pp. 53–62. ACM (2014)
Google Scholar
Wan, X., Xiao, J.: Single document keyphrase extraction using neighborhood knowledge. In: Proceedings of the 23rd National Conference on Artificial Intelligence, AAAI 2008, vol. 2, pp. 855–860. AAAI Press (2008)
Google Scholar
Wang, J., Liu, J., Wang, C.: Keyword extraction based on pagerank. In: Li, H., Yang, Q., Zhou, Z.-H. (eds.) PAKDD 2007. LNCS (LNAI), vol. 4426, pp. 857–864. Springer, Heidelberg (2007)
Chapter Google Scholar
Wang, W., Xu, H., Yang, W., Huang, X.: Constrained-hLDA for topic discovery in Chinese Microblogs. In: Ho, T.B., Zhou, Z.-H., Chen, A.L.P., Kao, H.-Y., Tseng, V.S. (eds.) PAKDD 2014, Part II. LNCS, vol. 8444, pp. 608–619. Springer, Heidelberg (2014)
Chapter Google Scholar
Wang, X., Wang, L., Li, J., Li, S.: Exploring simultaneous keyword and key sentence extraction: Improve graph-based ranking using wikipedia. In: Proceedings of the 21st ACM International Conference on Information and Knowledge Management, CIKM 2012, pp. 2619–2622. ACM (2012)
Google Scholar
Wu, W., Zhang, B., Ostendorf, M.: Automatic generation of personalized annotation tags for twitter users. In Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, HLT 2010, pp. 689–692 (2010)
Google Scholar
Zhang, W., Feng, W., Wang, J.: Integrating semantic relatedness and words’ intrinsic features for keyword extraction. In: Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence, IJCAI 2013, pp. 2225–2231. AAAI Press (2013)
Google Scholar
Zhao, W.X., Jiang, J., He, J., Song, Y., Achananuparp, P., Lim, E.-P., Li, X.: Topical keyphrase extraction from twitter. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, HLT 2011, vol.1, pp. 379–388 (2011)
Google Scholar
Zhiyuan, L., Xinxiong, C., Maosong, S.: Mining the interests of chinese microbloggers via keyword extraction. Foundations and Trends in Information Retrieval 6(1), 76–87 (2012)
MathSciNet Google Scholar
Zhou, D., Bousquet, O., Lal, T.N., Weston, J., Schlkopf, B.: Learning with local and global consistency. In: Advances in Neural Information Processing Systems 16, pp. 321–328. MIT Press (2004)
Google Scholar

Download references

Author information

Authors and Affiliations

School of Computer Science & Technology, Wuhan University of Technology, Wuhan, China
Lin Li, Chang Su, Yueqing Sun & Shengwu Xiong
Advanced Analytics Institute, University of Technology, Sydney, Ultimo, Australia
Guandong Xu

Authors

Lin Li
View author publications
You can also search for this author in PubMed Google Scholar
Chang Su
View author publications
You can also search for this author in PubMed Google Scholar
Yueqing Sun
View author publications
You can also search for this author in PubMed Google Scholar
Shengwu Xiong
View author publications
You can also search for this author in PubMed Google Scholar
Guandong Xu
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Shengwu Xiong .

Editor information

Editors and Affiliations

Chinese Academy of Sciences, Beijing, China
Songmao Zhang
Ludwig-Maximilians-Universität München, Munich, Germany
Martin Wirsing
Southwest University, Chongqing, China
Zili Zhang

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Li, L., Su, C., Sun, Y., Xiong, S., Xu, G. (2015). Hashtag Biased Ranking for Keyword Extraction from Microblog Posts. In: Zhang, S., Wirsing, M., Zhang, Z. (eds) Knowledge Science, Engineering and Management. KSEM 2015. Lecture Notes in Computer Science(), vol 9403. Springer, Cham. https://doi.org/10.1007/978-3-319-25159-2_32

Download citation

DOI: https://doi.org/10.1007/978-3-319-25159-2_32
Published: 03 November 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-25158-5
Online ISBN: 978-3-319-25159-2
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics