Abstract
Twitter continues to gain popularity as a source of up-to-date news and information. As a result, numerous event detection techniques have been proposed to cope with the steadily increasing rate and volume of social media data streams. Although most of these works conduct some evaluation of the proposed technique, comparing their effectiveness is a challenging task. In this paper, we examine the challenges in reproducing evaluation results for event detection techniques. We apply several event detection techniques and vary four parameters, namely the time window (15, 30, or 60 minutes), stopwords (included vs. excluded), retweets (included vs. excluded), and the number of terms that define an event (1 to 5 terms). Our experiments use real-world Twitter streaming data and show that varying these parameters alone significantly influences the outcomes of the event detection techniques, sometimes in unforeseen ways. We conclude that even minor variations in event detection techniques may lead to major difficulties in reproducing experiments.
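To give a sense of the size of the parameter space described above, the following minimal sketch enumerates the combinations of the four parameters. It assumes a full cross product of all values, which the abstract does not state, and the dictionary keys and the function name are illustrative rather than taken from the paper.

```python
from itertools import product

# Parameter values as listed in the abstract. The key names are
# hypothetical labels chosen for this sketch, not identifiers from the paper.
PARAMETER_GRID = {
    "window_minutes": [15, 30, 60],
    "include_stopwords": [True, False],
    "include_retweets": [True, False],
    "terms_per_event": [1, 2, 3, 4, 5],
}


def enumerate_configurations(grid):
    """Return every combination of parameter values as a list of dicts."""
    names = list(grid)
    value_lists = [grid[name] for name in names]
    return [dict(zip(names, values)) for values in product(*value_lists)]


# Assumption: a full cross product, giving 3 * 2 * 2 * 5 combinations.
configurations = enumerate_configurations(PARAMETER_GRID)
print(len(configurations))
```

Under this assumption, each event detection technique would be run once per configuration, so even four small parameters yield dozens of distinct experimental settings per technique.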
Notes
1. https://dev.twitter.com (April 28, 2016).
Acknowledgement
The research presented in this paper is funded in part by the Deutsche Forschungsgemeinschaft (DFG), Grant No. GR 4497/4: “Adaptive and Scalable Event Detection Techniques for Twitter Data Streams” and by a fellowship within the FITweltweit programme of the German Academic Exchange Service (DAAD). We would also like to thank the students Christina Papavasileiou, Harry Schilling, and Wai-Lok Cheung for their contributions to the implementations of WATIS, EDCoW, and enBlogue.
Copyright information
© 2016 Springer International Publishing AG
About this paper
Cite this paper
Weiler, A., Beel, J., Gipp, B., Grossniklaus, M. (2016). Stability Evaluation of Event Detection Techniques for Twitter. In: Boström, H., Knobbe, A., Soares, C., Papapetrou, P. (eds) Advances in Intelligent Data Analysis XV. IDA 2016. Lecture Notes in Computer Science, vol 9897. Springer, Cham. https://doi.org/10.1007/978-3-319-46349-0_32
DOI: https://doi.org/10.1007/978-3-319-46349-0_32
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-46348-3
Online ISBN: 978-3-319-46349-0
eBook Packages: Computer Science (R0)