
Two Phase Extraction Method for Multi-label Classification of Real Life Tweets

Authors:
Shuhei Yamamoto

Graduate School of Library, Information and Media Studies, University of Tsukuba, 1-2 Kasuga, Tsukuba-city, Ibaraki, Japan

Tetsuji Satoh

Graduate School of Library, Information and Media Studies, University of Tsukuba, 1-2 Kasuga, Tsukuba-city, Ibaraki, Japan


IIWAS '13: Proceedings of International Conference on Information Integration and Web-based Applications & Services, December 2013, Pages 16–25, https://doi.org/10.1145/2539150.2539197

Published: 02 December 2013


ABSTRACT

Many users now share their daily events and opinions on Twitter. Some of these tweets are useful because they comment on aspects of a user's real life, such as eating, traffic, weather, and disasters. A post such as "The train is not coming!" belongs to the "Traffic" aspect and helps users who want to ride the train. A tweet such as "The train is not coming due to heavy rain" belongs to both the "Traffic" and "Weather" aspects. In this paper, we propose a multi-label method that estimates the appropriate aspects of unknown tweets by extending the two phase extraction method. In this method, many topics are extracted from a large collection of tweets using Latent Dirichlet Allocation (LDA). Associations between the many topics and the fewer aspects are built using a small set of labeled tweets. Aspect scores for unknown tweets are calculated from the extracted terms, using the associations between the topics and the aspects. Appropriate aspects are then assigned to unknown tweets by averaging the aspect scores. Experimental evaluations on a large number of actual tweets demonstrate the high efficiency of our proposed multi-label classification method. When one aspect score is much larger than the others, that aspect is assigned to the tweet. When several aspect scores are large and close in value, all of those aspects are assigned. Based on the experimental evaluation results, our prototype system demonstrates that the proposed method can appropriately estimate some aspects of each unknown tweet.
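The scoring and labeling steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the exact formulas, so the weighted-sum score and the "above the mean score" rule below are one possible reading of "averaging of the aspect scores". The names `ASPECTS`, `ASSOCIATIONS`, `aspect_scores`, and `select_aspects`, as well as all numeric values, are hypothetical.

```python
from typing import Dict, List

# Hypothetical aspect names and topic-to-aspect association strengths.
# In the paper these associations are built from labeled tweets; the
# numbers here are invented for illustration only.
ASPECTS: List[str] = ["Eating", "Traffic", "Weather", "Disaster"]
ASSOCIATIONS: List[Dict[str, float]] = [
    {"Traffic": 0.9},   # topic 0: trains, delays, stations
    {"Weather": 0.9},   # topic 1: rain, wind, temperature
    {"Eating": 0.8},    # topic 2: lunch, ramen, cafe
]


def aspect_scores(topic_weights: List[float]) -> Dict[str, float]:
    """Weight each topic's aspect associations by the tweet's topic mixture."""
    scores = {aspect: 0.0 for aspect in ASPECTS}
    for weight, assoc in zip(topic_weights, ASSOCIATIONS):
        for aspect, strength in assoc.items():
            scores[aspect] += weight * strength
    return scores


def select_aspects(scores: Dict[str, float]) -> List[str]:
    """Return the aspects whose score is above the mean of all aspect scores."""
    mean = sum(scores.values()) / len(scores)
    return [aspect for aspect in ASPECTS if scores[aspect] > mean]


# Topic mixtures are assumed inputs (for example, inferred by an LDA model).
print(select_aspects(aspect_scores([0.90, 0.05, 0.05])))  # one dominant aspect
print(select_aspects(aspect_scores([0.50, 0.45, 0.05])))  # two similar aspects
```

With a mean-based cutoff, a tweet dominated by one topic receives a single label, while a tweet whose scores are split between two aspects receives both, which matches the behavior the abstract describes for single-aspect and multi-aspect tweets.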


Index Terms

Two Phase Extraction Method for Multi-label Classification of Real Life Tweets
1. Information systems
  1. Information retrieval
    1. Retrieval models and ranking



Published in
IIWAS '13: Proceedings of International Conference on Information Integration and Web-based Applications & Services
December 2013
753 pages
ISBN:9781450321136
DOI:10.1145/2539150
Conference Chairs:
Edgar Weippl,
Maria Indrawan-Santiago,
Matthias Steinbauer,
Gabriele Kotsis,
Ismail Khalil
Copyright © 2013 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
Latent Dirichlet Allocation
Multi-label
Real Life
Twitter
Two Phase Extraction
Qualifiers
- research-article
- Research
- Refereed limited