ABSTRACT
In this work we propose an innovative clustering algorithm for Twitter data. In the context of e-commerce, we use the Apriori algorithm to form 2-gram association rules and cluster tweets using self-organizing maps. Because tweets are short, word associations become all the more important for mining the information they contain. To check whether 2-grams formed from word associations increase clustering tendency, we use the Hopkins index. Tested on two separate datasets of 200 and 10,000 tweets, each related to the keyword "Amazon", our analysis shows an improvement in clustering tendency on both datasets. This improvement is potentially useful because grouping customers by their tweets can help businesses detect new trends and identify customers with different sentiments.
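To make the clustering-tendency check concrete, the following is a minimal sketch of one common form of the Hopkins statistic. It is an illustration under stated assumptions, not the implementation used in the paper. The function name `hopkins_statistic`, the default sample size `m`, the Euclidean distance, and the bounding-box sampling are all assumptions. Conventions also differ between sources: here values near 0.5 suggest no clustering tendency and values near 1 suggest strong tendency, while some implementations report the complementary value.

```python
import numpy as np


def hopkins_statistic(X, m=None, seed=0):
    """Hopkins statistic for a feature matrix X of shape (n_points, n_features).

    Assumed convention: H = sum(u) / (sum(u) + sum(w)), where
      u = nearest-neighbour distances from m uniform random points to the data,
      w = nearest-neighbour distances from m sampled data points to the rest.
    """
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    if m is None:
        m = max(1, n // 10)  # sample size is an assumption, not from the paper
    rng = np.random.default_rng(seed)

    # m real data points, sampled without replacement
    idx = rng.choice(n, size=m, replace=False)

    # m synthetic points drawn uniformly from the bounding box of the data
    lo, hi = X.min(axis=0), X.max(axis=0)
    U = rng.uniform(lo, hi, size=(m, d))

    def nearest(points, exclude=None):
        # Pairwise Euclidean distances from each query point to every data point
        dist = np.linalg.norm(points[:, None, :] - X[None, :, :], axis=2)
        if exclude is not None:
            # A sampled data point must not match itself
            dist[np.arange(len(points)), exclude] = np.inf
        return dist.min(axis=1)

    u = nearest(U)
    w = nearest(X[idx], exclude=idx)
    return u.sum() / (u.sum() + w.sum())
```

In the setting described above, `X` would be a numeric representation of the tweets (for example term counts), computed once with single words only and once with the mined 2-grams added, so that the two resulting values can be compared. The pairwise distance array grows with `m * n * d`, so a larger corpus may need a nearest-neighbour index instead.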
Index Terms
Improving tweet clustering using bigrams formed from word associations