DOI: 10.1145/2811411.2811535
research-article

Improving tweet clustering using bigrams formed from word associations

Published: 09 October 2015

ABSTRACT

In this work we propose an innovative clustering algorithm for Twitter data. In the context of e-commerce, we use the Apriori algorithm to form 2-gram association rules and cluster tweets using self-organizing maps. Since tweets are relatively short, word association becomes all the more important in mining information from them. To check whether 2-grams formed using word associations help increase clustering tendency, we use the Hopkins index. Tested on two separate datasets of 200 and 10,000 tweets, each related to the keyword "Amazon", our analysis shows an improvement in clustering tendency on both datasets. This improvement is potentially useful because grouping customers based on their tweets can help businesses determine new trends and identify customers with different sentiments.
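The abstract names two measurable steps: forming word-pair association rules with Apriori, and scoring clustering tendency with the Hopkins index. The following is a minimal sketch of both, not the authors' implementation. The tokenization (lowercased whitespace split), the thresholds `min_support` and `min_confidence`, and the function names are all assumptions made for illustration. The Hopkins convention used here, H = sum(u) / (sum(u) + sum(w)) on plain Euclidean distances, is one common variant; the paper may use a different one. The self-organizing map step is omitted.

```python
import math
import random
from collections import Counter
from itertools import combinations


def frequent_bigrams(tweets, min_support=0.5, min_confidence=0.8):
    """Apriori limited to itemsets of size 1 and 2 (hypothetical helper).

    Returns {(antecedent, consequent): (support, confidence)}.
    """
    n = len(tweets)
    token_sets = [set(t.lower().split()) for t in tweets]

    # Pass 1: frequent single words. Apriori pruning: a pair can only be
    # frequent if both of its words are frequent.
    word_counts = Counter(w for s in token_sets for w in s)
    frequent = {w for w, c in word_counts.items() if c / n >= min_support}

    # Pass 2: count candidate pairs built only from frequent words.
    pair_counts = Counter()
    for s in token_sets:
        for pair in combinations(sorted(s & frequent), 2):
            pair_counts[pair] += 1

    rules = {}
    for (a, b), c in pair_counts.items():
        support = c / n
        if support < min_support:
            continue
        # Try the rule in both directions: x -> y.
        for x, y in ((a, b), (b, a)):
            confidence = c / word_counts[x]
            if confidence >= min_confidence:
                rules[(x, y)] = (support, confidence)
    return rules


def hopkins(points, sample_size, rng):
    """Hopkins statistic under the convention H = U / (U + W).

    U sums nearest-neighbour distances from uniform random points (drawn in
    the bounding box of the data) to the data; W sums nearest-neighbour
    distances from sampled data points to the rest of the data. Under this
    convention, values near 0.5 suggest no structure and values near 1
    suggest clustering.
    """
    dims = len(points[0])
    lows = [min(p[d] for p in points) for d in range(dims)]
    highs = [max(p[d] for p in points) for d in range(dims)]

    def nearest(q, exclude=None):
        return min(math.dist(q, p) for i, p in enumerate(points) if i != exclude)

    sample_idx = rng.sample(range(len(points)), sample_size)
    w = sum(nearest(points[i], exclude=i) for i in sample_idx)
    u = sum(
        nearest([rng.uniform(lo, hi) for lo, hi in zip(lows, highs)])
        for _ in range(sample_size)
    )
    return u / (u + w)
```

In a pipeline like the one the abstract describes, rules returned by `frequent_bigrams` could be appended to each tweet's feature vector as extra 2-gram terms, and `hopkins` could then be computed on the feature matrix with and without those terms to compare clustering tendency.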


Published in

RACS '15: Proceedings of the 2015 Conference on Research in Adaptive and Convergent Systems
October 2015
540 pages
ISBN: 9781450337380
DOI: 10.1145/2811411

Copyright © 2015 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States


Acceptance Rates

RACS '15 Paper Acceptance Rate: 75 of 309 submissions, 24%
Overall Acceptance Rate: 393 of 1,581 submissions, 25%
