DOI: 10.1145/2539150.2539197
research-article

Two Phase Extraction Method for Multi-label Classification of Real Life Tweets

Published: 02 December 2013

ABSTRACT

Many users now share their daily events and opinions on Twitter. Some of these posts are useful because they comment on aspects of a user's real life, e.g., eating, traffic, weather, and disasters. A post such as "The train is not coming!" belongs to the "Traffic" aspect and helps users who want to ride the train. A tweet such as "The train is not coming due to heavy rain" belongs to both the "Traffic" and "Weather" aspects. In this paper, we propose a multi-label method that estimates the appropriate aspects of unknown tweets by extending the two phase extraction method. In this method, many topics are first extracted from a large collection of tweets using Latent Dirichlet Allocation (LDA). Associations between the many topics and the far fewer aspects are then built using a small set of labeled tweets. Aspect scores for an unknown tweet are calculated from its extracted terms using these associations between topics and aspects, and the tweet is labeled with the appropriate aspects by averaging the aspect scores. When one aspect score is much larger than the others, that aspect alone is assigned to the tweet. When several aspect scores are large and of similar value, all of those aspects are assigned. Experimental evaluations on a large number of actual tweets demonstrate the high efficiency of the proposed multi-label classification method, and, based on these results, our prototype system shows that the method can appropriately estimate the aspects of each unknown tweet.
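The scoring pipeline described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the topic-term table `TOPIC_TERMS` is a hand-written stand-in for LDA output, the labeled set `LABELED` is invented, the uniform topic prior is an assumption, and the relative threshold `ratio` used to decide between one and several labels is a hypothetical rule chosen only to mimic the "much larger" versus "similar values" behavior.

```python
from collections import defaultdict

# Toy stand-in for LDA output: P(term | topic). Values are invented.
TOPIC_TERMS = {
    "t0": {"train": 0.5, "delay": 0.3, "station": 0.2},
    "t1": {"rain": 0.6, "heavy": 0.3, "umbrella": 0.1},
    "t2": {"lunch": 0.7, "ramen": 0.3},
}

# Small hand-labeled set of tweets: (extracted terms, aspects). Invented data.
LABELED = [
    (["train", "delay"], {"Traffic"}),
    (["station", "train"], {"Traffic"}),
    (["rain", "umbrella"], {"Weather"}),
    (["heavy", "rain", "train"], {"Traffic", "Weather"}),
    (["ramen", "lunch"], {"Eating"}),
]


def topic_weights(terms):
    """Normalized topic weights for a bag of terms (uniform topic prior assumed)."""
    raw = {t: sum(dist.get(w, 0.0) for w in terms) for t, dist in TOPIC_TERMS.items()}
    total = sum(raw.values())
    return {t: v / total for t, v in raw.items()} if total > 0 else {}


def build_associations(labeled):
    """Associate each topic with aspects using the labeled tweets."""
    assoc = defaultdict(lambda: defaultdict(float))
    mass = defaultdict(float)
    for terms, aspects in labeled:
        for topic, w in topic_weights(terms).items():
            mass[topic] += w
            for aspect in aspects:
                assoc[topic][aspect] += w
    # Normalize so each value is the share of a topic's mass carrying that aspect.
    return {t: {a: v / mass[t] for a, v in d.items()}
            for t, d in assoc.items() if mass[t] > 0}


def aspect_scores(terms, assoc):
    """Average the per-term aspect scores; terms unknown to the topics are skipped."""
    per_term = []
    for w in terms:
        weights = topic_weights([w])
        if not weights:
            continue
        scores = defaultdict(float)
        for topic, weight in weights.items():
            for aspect, v in assoc.get(topic, {}).items():
                scores[aspect] += weight * v
        per_term.append(scores)
    if not per_term:
        return {}
    aspects = {a for s in per_term for a in s}
    return {a: sum(s.get(a, 0.0) for s in per_term) / len(per_term) for a in aspects}


def assign_aspects(scores, ratio=0.7):
    """Keep every aspect whose score is within `ratio` of the best score."""
    if not scores:
        return set()
    top = max(scores.values())
    if top <= 0:
        return set()
    return {a for a, v in scores.items() if v >= ratio * top}


ASSOC = build_associations(LABELED)
single = assign_aspects(aspect_scores(["train", "delay"], ASSOC))
multi = assign_aspects(aspect_scores(["train", "coming", "heavy", "rain"], ASSOC))
```

With this toy data, a tweet whose terms all point to one topic receives a single aspect, while a tweet mixing train and rain terms receives both "Traffic" and "Weather". In practice the topic-term table would come from an LDA model trained on a large tweet collection, and the threshold rule would need tuning against labeled data.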

    • Published in

      ACM Other conferences
      IIWAS '13: Proceedings of International Conference on Information Integration and Web-based Applications & Services
      December 2013
      753 pages
      ISBN:9781450321136
      DOI:10.1145/2539150

      Copyright © 2013 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 2 December 2013

      Qualifiers

      • research-article
      • Research
      • Refereed limited
