Abstract
Twitter, currently the leading microblogging social network, has attracted a large body of research. This paper proposes a data analysis framework to discover groups of similar Twitter messages posted about a given event. Analyzing these groups makes it possible to extract user emotions or thoughts that appear to be associated with specific events, as well as the aspects that characterize an event according to user perception. To deal with the inherent sparseness of micro-messages, the proposed approach relies on a multiple-level strategy that can cluster text data with a variable distribution. Clusters are then characterized by the most representative words appearing in their messages, and association rules are used to highlight correlations among these words. To measure the relevance of specific words for a given event, text data is represented in the Vector Space Model using TF-IDF weights. As a case study, two real Twitter datasets have been analyzed.
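The representation and characterization steps named in the abstract can be illustrated with a small, stdlib-only sketch. It is not the authors' implementation, and several parts are assumptions: the TF-IDF variant (length-normalized term frequency times log(N/df)), the hypothetical tokenized messages, and the cluster membership, which is assigned by hand because the multiple-level clustering step itself is not shown. Association rules are found by brute-force counting of word pairs, a simplified stand-in for a full frequent-itemset miner such as FP-growth.

```python
import math
from collections import Counter
from itertools import combinations


def tfidf_vectors(messages):
    """Return one sparse TF-IDF vector {word: weight} per tokenized message."""
    n = len(messages)
    # Document frequency: number of messages that contain each word at least once.
    df = Counter(word for msg in messages for word in set(msg))
    vectors = []
    for msg in messages:
        counts = Counter(msg)
        # Assumed variant: tf = count / message length, idf = log(N / df).
        vectors.append(
            {w: (c / len(msg)) * math.log(n / df[w]) for w, c in counts.items()}
        )
    return vectors


def top_words(cluster_vectors, k):
    """Return the k words with the highest TF-IDF weight summed over a cluster."""
    totals = Counter()
    for vec in cluster_vectors:
        totals.update(vec)  # adds each word's weight to its running total
    return [w for w, _ in totals.most_common(k)]


def pair_rules(cluster_messages, min_support, min_confidence):
    """Return (lhs, rhs, support, confidence) for single-word rules lhs -> rhs."""
    n = len(cluster_messages)
    sets = [set(m) for m in cluster_messages]
    single = Counter(w for s in sets for w in s)
    # Count co-occurring word pairs; sorting keeps each pair in a canonical order.
    pairs = Counter(p for s in sets for p in combinations(sorted(s), 2))
    rules = []
    for (a, b), count in pairs.items():
        support = count / n
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / single[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return rules


# Hypothetical, already tokenized messages about one event.
messages = [
    ["goal", "messi", "amazing"],
    ["goal", "messi", "barca"],
    ["referee", "penalty", "goal"],
    ["messi", "amazing", "barca"],
]
vectors = tfidf_vectors(messages)

# Hypothetical cluster (messages 0, 1 and 3), standing in for the clustering output.
cluster_ids = [0, 1, 3]
cluster_top = top_words([vectors[i] for i in cluster_ids], 2)
cluster_rules = pair_rules([messages[i] for i in cluster_ids], 0.5, 0.9)
```

In this toy run, `cluster_top` holds the two tied words "amazing" and "barca" rather than "messi": a word that occurs in three of the four messages gets a low IDF, so TF-IDF treats it as less discriminative for the event. `cluster_rules` contains amazing -> messi, goal -> messi and barca -> messi, each with support 2/3 and confidence 1.0.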
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Baralis, E., Cerquitelli, T., Chiusano, S., Grimaudo, L., Xiao, X. (2013). Analysis of Twitter Data Using a Multiple-level Clustering Strategy. In: Cuzzocrea, A., Maabout, S. (eds) Model and Data Engineering. MEDI 2013. Lecture Notes in Computer Science, vol 8216. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-41366-7_2
DOI: https://doi.org/10.1007/978-3-642-41366-7_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-41365-0
Online ISBN: 978-3-642-41366-7
eBook Packages: Computer Science, Computer Science (R0)