Opinion mining on large scale data using sentiment analysis and k-means clustering

Cluster Computing

Abstract

With the rapid growth of web technology and easy access to the internet, online shopping has increased. People now express their opinions and share their experiences online, which strongly influences prospective buyers' purchasing decisions and generates large data sets. These data are valuable for analyzing customers' preferences, needs, and behavior toward a product, but companies face the challenge of extracting customer opinion from such a sheer volume of data. To address this challenge, we performed phrase-level sentiment analysis on real-world customer review data, identifying customer preferences by analyzing subjective expressions. We then calculated the strength of each sentiment word to determine the intensity of each expression, and applied clustering to place the words into clusters according to their intensity. We also compared the results of our technique with the star ratings given on the same dataset and found a drastic difference between the two. Finally, we provide a visual representation of our results that gives clear insight into customer preference and behavior, helping decision makers make better decisions.
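The abstract describes three steps: finding sentiment phrases, measuring each expression's intensity, and grouping words by intensity with k-means clustering. This preview does not give the lexicon, the intensity formula, or the number of clusters, so the following is only a minimal Python sketch under stated assumptions: the small `LEXICON`, the `INTENSIFIERS` multipliers, the negation flip, `k=3`, and the evenly spaced initialization are all hypothetical choices, not the authors' method. The clustering step is plain Lloyd's k-means on one-dimensional intensity values.

```python
from statistics import mean

# HYPOTHETICAL prior-intensity lexicon (illustrative values, not the paper's).
LEXICON = {
    "excellent": 3.0, "great": 2.5, "good": 1.5, "ok": 0.5,
    "poor": -1.5, "bad": -2.0, "terrible": -3.0,
}
# HYPOTHETICAL modifier rules: intensifiers scale, negators flip the sign.
INTENSIFIERS = {"very": 1.5, "extremely": 2.0, "slightly": 0.5}
NEGATORS = {"not", "never", "no"}


def phrase_intensities(review: str) -> list[tuple[str, float]]:
    """Return (phrase, signed intensity) for each lexicon word in a review."""
    tokens = review.lower().replace(",", " ").replace(".", " ").split()
    results = []
    for i, tok in enumerate(tokens):
        if tok not in LEXICON:
            continue
        score, start = LEXICON[tok], i
        # Look back at most two tokens for an intensifier and/or a negator.
        for j in (i - 1, i - 2):
            if j < 0:
                break
            prev = tokens[j]
            if prev in INTENSIFIERS:
                score *= INTENSIFIERS[prev]
            elif prev in NEGATORS:
                score = -score
            else:
                break
            start = j
        results.append((" ".join(tokens[start:i + 1]), score))
    return results


def kmeans_1d(values: list[float], k: int, iters: int = 100):
    """Lloyd's k-means on scalars (requires k >= 2 and len(values) >= k)."""
    ordered = sorted(values)
    # Deterministic init: k evenly spaced values from the sorted data.
    centroids = [ordered[round(q * (len(ordered) - 1) / (k - 1))] for q in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        # Assignment step: nearest centroid by absolute distance.
        labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        # Update step: mean of members; an empty cluster keeps its centroid.
        new = [
            mean([v for v, lab in zip(values, labels) if lab == c]) if c in labels else centroids[c]
            for c in range(k)
        ]
        if new == centroids:
            break
        centroids = new
    return centroids, labels


reviews = [
    "The screen is excellent and the battery is very good.",
    "Delivery was bad, and the case is not very good.",
    "Sound is ok, price is great.",
    "Terrible support, extremely poor packaging.",
]
phrases = [p for r in reviews for p in phrase_intensities(r)]
centroids, labels = kmeans_1d([s for _, s in phrases], k=3)
for (phrase, score), label in zip(phrases, labels):
    print(f"{phrase:20s} {score:+.2f} cluster={label}")
```

In this example cluster 0 collects the strongly negative expressions, cluster 1 the weak one ("ok"), and cluster 2 the strongly positive ones. A library implementation such as scikit-learn's `KMeans` assigns arbitrary label numbers, so its labels would need to be ordered by centroid to give the same reading.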

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9
Fig. 10

Notes

  1. http://times.cs.uiuc.edu/~wang296/Data/.

  2. https://opal-convert-json-to-csv-to-json.en.softonic.com/.

  3. http://opennlp.sourceforge.net/models-1.5/.

  4. http://mpqa.cs.pitt.edu/corpora/mpqa_corpus/.

  5. A subjective expression is any word or phrase used to express an opinion, emotion, evaluation, stance, and speculation.
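Notes 4 and 5 point to the MPQA resources and define a subjective expression. The preview does not say which MPQA resource or format the authors used, so the sketch below assumes the one-line `key=value` clue format of the MPQA Subjectivity Lexicon (for example `type=strongsubj ... word1=... priorpolarity=positive`) and shows how such clues could flag subjective words in a review. The sample clue lines and the `TYPE_WEIGHT` values (strong clues count double) are hypothetical and for illustration only.

```python
import string

# Illustrative clue lines in the MPQA Subjectivity Lexicon "key=value" style;
# these entries are examples written for this sketch, not copied from it.
SAMPLE_CLUES = [
    "type=strongsubj len=1 word1=excellent pos1=adj stemmed1=n priorpolarity=positive",
    "type=weaksubj len=1 word1=flimsy pos1=adj stemmed1=n priorpolarity=negative",
    "type=strongsubj len=1 word1=awful pos1=adj stemmed1=n priorpolarity=negative",
    "type=weaksubj len=1 word1=fine pos1=adj stemmed1=n priorpolarity=positive",
]

# HYPOTHETICAL weights: strong subjectivity clues count double.
TYPE_WEIGHT = {"strongsubj": 2, "weaksubj": 1}
POLARITY_SIGN = {"positive": 1, "negative": -1}  # "neutral"/"both" map to 0


def parse_clue(line: str) -> dict[str, str]:
    """Split one clue line into a {field: value} dict."""
    return dict(field.split("=", 1) for field in line.split() if "=" in field)


def load_lexicon(lines: list[str]) -> dict[str, tuple[str, str]]:
    """Map each clue word to (subjectivity type, prior polarity)."""
    lexicon = {}
    for line in lines:
        clue = parse_clue(line)
        if "word1" in clue:
            # Simplification: a word listed under several POS keeps the last entry.
            lexicon[clue["word1"]] = (clue.get("type", ""), clue.get("priorpolarity", "neutral"))
    return lexicon


def subjective_expressions(text: str, lexicon: dict[str, tuple[str, str]]) -> list[tuple[str, int]]:
    """Return (word, signed weight) for every lexicon word found in text."""
    hits = []
    for raw in text.lower().split():
        word = raw.strip(string.punctuation)
        if word in lexicon:
            clue_type, polarity = lexicon[word]
            hits.append((word, TYPE_WEIGHT.get(clue_type, 0) * POLARITY_SIGN.get(polarity, 0)))
    return hits


lexicon = load_lexicon(SAMPLE_CLUES)
print(subjective_expressions("Excellent camera, but the strap feels flimsy.", lexicon))
# [('excellent', 2), ('flimsy', -1)]
```

Words with neutral prior polarity are still reported (with weight 0), which matches note 5: an expression can be subjective, for example speculative, without carrying positive or negative sentiment.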

Author information

Corresponding author

Correspondence to M. Kamran.

About this article

Cite this article

Riaz, S., Fatima, M., Kamran, M. et al. Opinion mining on large scale data using sentiment analysis and k-means clustering. Cluster Comput 22 (Suppl 3), 7149–7164 (2019). https://doi.org/10.1007/s10586-017-1077-z

  • DOI: https://doi.org/10.1007/s10586-017-1077-z