Abstract
We reproduce three classification approaches with diverse feature sets for the task of classifying the sentiment expressed in a given tweet as positive, neutral, or negative. The reproduced approaches are also combined in an ensemble that averages the individual classifiers’ confidence scores for the three classes and decides sentiment polarity based on these averages. Our experimental evaluation on SemEval data shows that our re-implementations slightly outperform their respective originals. Moreover, in the SemEval Twitter sentiment detection tasks of 2013 and 2014, the ensemble of reproduced approaches would have been ranked in the top 5 among 50 participants. An error analysis shows that the ensemble classifier makes few severe misclassifications, such as identifying a positive sentiment in a negative tweet or vice versa. Instead, it tends to label non-neutral tweets as neutral, which can be viewed as the safest kind of error.
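The averaging step described in the abstract can be illustrated with a minimal sketch. It assumes that each classifier outputs one confidence score per class in the fixed order positive, neutral, negative, and that ties are broken in favor of the class listed first. The function names, the label order, and the example scores are hypothetical and not taken from the paper.

```python
from typing import List, Sequence

# Assumed class order; the paper's actual ordering is not given in the abstract.
LABELS = ("positive", "neutral", "negative")


def average_confidences(scores: Sequence[Sequence[float]]) -> List[float]:
    """Average the per-class confidence scores over all classifiers."""
    if not scores:
        raise ValueError("at least one classifier score vector is required")
    for vector in scores:
        if len(vector) != len(LABELS):
            raise ValueError("each score vector needs one entry per class")
    count = len(scores)
    return [sum(vector[i] for vector in scores) / count
            for i in range(len(LABELS))]


def ensemble_label(scores: Sequence[Sequence[float]]) -> str:
    """Pick the class with the highest averaged confidence.

    Ties go to the class that comes first in LABELS (an assumption).
    """
    averaged = average_confidences(scores)
    best_index = max(range(len(LABELS)), key=lambda i: averaged[i])
    return LABELS[best_index]


if __name__ == "__main__":
    # Hypothetical scores from three classifiers for a single tweet.
    example = [
        [0.9, 0.1, 0.0],  # classifier A: strongly positive
        [0.3, 0.4, 0.3],  # classifier B: weakly neutral
        [0.3, 0.4, 0.3],  # classifier C: weakly neutral
    ]
    print(average_confidences(example))
    print(ensemble_label(example))
```

In this made-up example, two of the three classifiers individually prefer neutral, yet the averaged scores favor positive because one classifier is far more confident. This is the sense in which averaging confidence scores differs from a simple majority vote over predicted labels.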
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Hagen, M., Potthast, M., Büchner, M., Stein, B. (2015). Twitter Sentiment Detection via Ensemble Classification Using Averaged Confidence Scores. In: Hanbury, A., Kazai, G., Rauber, A., Fuhr, N. (eds) Advances in Information Retrieval. ECIR 2015. Lecture Notes in Computer Science, vol 9022. Springer, Cham. https://doi.org/10.1007/978-3-319-16354-3_81
DOI: https://doi.org/10.1007/978-3-319-16354-3_81
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-16353-6
Online ISBN: 978-3-319-16354-3
eBook Packages: Computer Science, Computer Science (R0)