ABSTRACT
Many users now share their daily events and opinions on Twitter. Some of these tweets are useful because they comment on aspects of a user's real life, e.g., eating, traffic, weather, and disasters. A post such as "The train is not coming!" belongs to the "Traffic" aspect and helps users who want to ride the train. A tweet such as "The train is not coming due to heavy rain" belongs to both the "Traffic" and "Weather" aspects. In this paper, we propose a multi-label method that estimates the appropriate aspects of unknown tweets by extending the two phase extraction method. In this method, many topics are extracted from a large collection of tweets using Latent Dirichlet Allocation (LDA). Associations between the many topics and the smaller number of aspects are built using a small set of labeled tweets. Aspect scores for an unknown tweet are calculated from its extracted terms using the associations between the topics and the aspects. Aspects are then assigned to the tweet by averaging the aspect scores. When one aspect score is much larger than the others, that aspect is estimated for the tweet. When several aspect scores are large and similar in value, all of those aspects are estimated. Experimental evaluations on a large number of actual tweets demonstrate the high efficiency of our proposed multi-label classification method. Based on the evaluation results, our prototype system demonstrates that the proposed method can appropriately estimate one or more aspects of each unknown tweet.
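The scoring and labeling steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy tables TERM_TOPIC and TOPIC_ASPECT, the function names, and the ratio threshold of 0.8 are all assumptions made for illustration. In particular, the abstract does not state how "much larger" or "similar" scores are decided, so the relative-to-maximum rule below is only one plausible reading.

```python
from collections import defaultdict

# Hypothetical toy data (not from the paper).
# TERM_TOPIC: term -> {topic id: weight}, standing in for the LDA output.
# TOPIC_ASPECT: topic id -> {aspect: weight}, standing in for the
# associations built from a small set of labeled tweets.
TERM_TOPIC = {
    "train": {0: 1.0},
    "rain": {1: 1.0},
    "lunch": {2: 1.0},
}
TOPIC_ASPECT = {
    0: {"Traffic": 0.9, "Weather": 0.1},
    1: {"Traffic": 0.2, "Weather": 0.8},
    2: {"Eating": 1.0},
}


def aspect_scores(terms, term_topic, topic_aspect):
    """Average the per-term aspect scores over the known terms of one tweet."""
    totals = defaultdict(float)
    counted = 0
    for term in terms:
        topics = term_topic.get(term)
        if not topics:
            continue  # term does not appear in the topic model
        counted += 1
        for topic, topic_weight in topics.items():
            for aspect, aspect_weight in topic_aspect.get(topic, {}).items():
                totals[aspect] += topic_weight * aspect_weight
    if counted == 0:
        return {}
    return {aspect: total / counted for aspect, total in totals.items()}


def select_aspects(scores, ratio=0.8):
    """Keep every aspect whose score is at least `ratio` times the top score.

    The threshold rule is an assumption: one dominant score yields a single
    label, several similar scores yield several labels.
    """
    if not scores:
        return []
    top = max(scores.values())
    if top <= 0:
        return []
    return sorted(a for a, s in scores.items() if s >= ratio * top)


if __name__ == "__main__":
    for tweet_terms in (["train"], ["train", "rain"]):
        scores = aspect_scores(tweet_terms, TERM_TOPIC, TOPIC_ASPECT)
        print(tweet_terms, select_aspects(scores))
```

With these toy weights, a tweet containing only "train" has one dominant score and receives a single label, while a tweet containing both "train" and "rain" has two similar scores and receives both labels.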
Index Terms
- Two Phase Extraction Method for Multi-label Classification of Real Life Tweets