Abstract
Nowadays, a huge amount of text is being generated for social networking purpose on the Web. Keyword extraction from such text benefit many applications such as advertising, search, and content filtering. Recent studies show that graph based ranking is more effective than traditional term or document frequecy based approaches. However, most work in the literature constructs word to word graph within a document or a collection of documents before applying a kind of random walk. Such a graph does not consider the influence of document importance on keyword extraction. Moreover, social text like a microblog post usually has speical social features such as hashtag and so on, which can help us understand its topic. In this paper, we propose hashtag biased ranking for keyword extraction from a collection of microblog posts. We first build a word-post weighted graph by taking into account the posts themselves. Then, a hashtag biased random walk is applied on this graph, which guides our approach to extract keywords according to the hashtag topic. Last, the final ranking of a word is determined by the stationary probability after a number of interations. We evaluate our proposed method on a real Chinese microblog posts. Experiments show that our method is more effective than the traditional word to word graph based ranking in terms of precision.
This research was undertaken as part of Project 15BGL048, 2015AA015403, 2015BAA072 and 61303029 and supported by Hubei Key Laboratory of Transportation Internet of Things.
Preview
Unable to display preview. Download preview PDF.
References
Bi, B., Tian, Y., Sismanis, Y., Balmin, A., Cho, J.: Scalable topic-specific influence analysis on microblogs. In: Proceedings of the 7th ACM International Conference on Web Search and Data Mining, WSDM 2014, pp. 513–522. ACM (2014)
Cormen, T.H., Leiserson, C.E., Rivest, R.L., Stein, C.: Introduction to Algorithms. McGraw-Hill (1990)
Grineva, M., Grinev, M., Lizorkin, D.: Extracting key terms from noisy and multitheme documents. In: Proceedings of the 18th International Conference on World Wide Web, WWW 2009, pp. 661–670. ACM (2009)
Haveliwala, T.H.: Topic-sensitive pagerank: A context-sensitive ranking algorithm for web search. IEEE Trans. on Knowl. and Data Eng. 15(4), 784–796 (2003)
Hu, X., Tang, J., Liu, H.: Leveraging knowledge across media for spammer detection in microblogging. In: Proceedings of the 37th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2014, pp. 547–556. ACM (2014)
Hulth, A.: Improved automatic keyword extraction given more linguistic knowledge. In: Proceedings of the 2003 Conference on Empirical Methods in Natural Language Processing, EMNLP 2003, pp. 216–223 (2003)
Karypis, G., Kumar, V.: Multilevel k-way partitioning scheme for irregular graphs. J. Parallel Distrib. Comput. 48(1), 96–129 (1998)
Kwak, H., Lee, C., Park, H., Moon, S.: What is twitter, a social network or a news media? In: Proceedings of the 19th International Conference on World Wide Web, WWW 2010, pp. 591–600. ACM, New York (2010)
Li, Z., Zhou, D., Juan, Y.-F., Han, J.: Keyword extraction for social snippets. In: Proceedings of the 19th International Conference on World Wide Web, WWW 2010, pp. 1143–1144. ACM (2010)
Liu, Z., Huang, W., Zheng, Y., Sun, M.: Automatic keyphrase extraction via topic decomposition. In: Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, EMNLP 2010, pp. 366–376. Association for Computational Linguistics, Stroudsburg (2010)
Liu, Z., Li, P., Zheng, Y., Sun, M.: Clustering to find exemplar terms for keyphrase extraction. In: Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, EMNLP 2009, vol. 1, pp. 257–266. Association for Computational Linguistics (2009)
Manning, C.D., Raghavan, P., Schutze, H.: Introduction to information retrieval (2008)
Mihalcea, R., Tarau, P.: Textrank: bringing order into texts. In: Lin, D., Wu, D. (eds.), Proceedings of EMNLP 2004, pp. 404–411. Association for Computational Linguistics 2004
Page, L., Brin, S., Motwani, R., Winograd, T.: The pagerank citation ranking: Bringing order to the web. Technical report, Stanford Digital Library Technologies Project (1998)
Qiang, R., Liang, F., Yang, J.: Exploiting ranking factorization machines for microblog retrieval. In: Proceedings of the 22nd ACM international conference on Conference on Information and knowledge management, CIKM 2013, pp. 1783–1788. ACM (2013)
Tong, H., Faloutsos, C., Pan, J.-Y.: Fast random walk with restart and its applications. In Proceedings of the Sixth International Conference on Data Mining, ICDM 2006, pp. 613–622. IEEE Computer Society (2006)
Turney, P.D.: Learning algorithms for keyphrase extraction. Inf. Retr. 2(4), 303–336 (2000)
Vosecky, J., Leung, K.W.-T., Ng, W.: Collaborative personalized twitter search with topic-language models. In: Proceedings of the 37th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2014, pp. 53–62. ACM (2014)
Wan, X., Xiao, J.: Single document keyphrase extraction using neighborhood knowledge. In: Proceedings of the 23rd National Conference on Artificial Intelligence, AAAI 2008, vol. 2, pp. 855–860. AAAI Press (2008)
Wang, J., Liu, J., Wang, C.: Keyword extraction based on pagerank. In: Li, H., Yang, Q., Zhou, Z.-H. (eds.) PAKDD 2007. LNCS (LNAI), vol. 4426, pp. 857–864. Springer, Heidelberg (2007)
Wang, W., Xu, H., Yang, W., Huang, X.: Constrained-hLDA for topic discovery in Chinese Microblogs. In: Ho, T.B., Zhou, Z.-H., Chen, A.L.P., Kao, H.-Y., Tseng, V.S. (eds.) PAKDD 2014, Part II. LNCS, vol. 8444, pp. 608–619. Springer, Heidelberg (2014)
Wang, X., Wang, L., Li, J., Li, S.: Exploring simultaneous keyword and key sentence extraction: Improve graph-based ranking using wikipedia. In: Proceedings of the 21st ACM International Conference on Information and Knowledge Management, CIKM 2012, pp. 2619–2622. ACM (2012)
Wu, W., Zhang, B., Ostendorf, M.: Automatic generation of personalized annotation tags for twitter users. In Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, HLT 2010, pp. 689–692 (2010)
Zhang, W., Feng, W., Wang, J.: Integrating semantic relatedness and words’ intrinsic features for keyword extraction. In: Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence, IJCAI 2013, pp. 2225–2231. AAAI Press (2013)
Zhao, W.X., Jiang, J., He, J., Song, Y., Achananuparp, P., Lim, E.-P., Li, X.: Topical keyphrase extraction from twitter. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, HLT 2011, vol.1, pp. 379–388 (2011)
Zhiyuan, L., Xinxiong, C., Maosong, S.: Mining the interests of chinese microbloggers via keyword extraction. Foundations and Trends in Information Retrieval 6(1), 76–87 (2012)
Zhou, D., Bousquet, O., Lal, T.N., Weston, J., Schlkopf, B.: Learning with local and global consistency. In: Advances in Neural Information Processing Systems 16, pp. 321–328. MIT Press (2004)
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Li, L., Su, C., Sun, Y., Xiong, S., Xu, G. (2015). Hashtag Biased Ranking for Keyword Extraction from Microblog Posts. In: Zhang, S., Wirsing, M., Zhang, Z. (eds) Knowledge Science, Engineering and Management. KSEM 2015. Lecture Notes in Computer Science(), vol 9403. Springer, Cham. https://doi.org/10.1007/978-3-319-25159-2_32
Download citation
DOI: https://doi.org/10.1007/978-3-319-25159-2_32
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-25158-5
Online ISBN: 978-3-319-25159-2
eBook Packages: Computer ScienceComputer Science (R0)