SentiHealth-Cancer: A sentiment analysis tool to help detecting mood of patients in online social networks

https://doi.org/10.1016/j.ijmedinf.2015.09.007

Highlights

  • Hashtags and emoticons are helpful to the Sentiment Analysis (SA) of patients.

  • SA helps to identify the mood of authors when they themselves are the target.

  • Proposed SentiHealth identifies the mood of the people in the disease context.

  • Proposed SentiHealth-Cancer helps to monitor the mood of people related to cancer.

Abstract

Background

Cancer is a critical disease that affects millions of people and families around the world. In 2012, about 14.1 million new cases of cancer occurred globally. For reasons such as the severity of some cases, the side effects of some treatments and the death of other patients, cancer patients tend to be affected by serious emotional disorders such as depression. Thus, monitoring patients' mood is an important part of their treatment. Many cancer patients use online social networks, and many of them take part in virtual cancer communities where they exchange messages about their treatment or give support to other patients. Most of these communities are publicly accessible and are therefore useful sources of information about the mood of patients. Sentiment Analysis methods can thus be useful for automatically detecting the positive or negative mood of cancer patients by analyzing their messages in these online communities.

Objective

The objective of this work is to present a Sentiment Analysis tool, named SentiHealth-Cancer (SHC-pt), that improves the detection of the emotional state of patients in Brazilian online cancer communities by inspecting their posts written in Portuguese. SHC-pt is tailored specifically to detect positive, negative or neutral messages of patients in online communities of cancer patients. We conducted a comparative study of the proposed method against a set of general-purpose sentiment analysis tools adapted to this context.

Methods

Different collections of posts were obtained from two cancer communities on Facebook. The posts were analyzed by sentiment analysis tools that support the Portuguese language (Semantria and SentiStrength) and by the tool SHC-pt, developed based on the method proposed in this paper, called SentiHealth. As a second alternative for analyzing the Portuguese texts, the collected texts were automatically translated into English and submitted to sentiment analysis tools that do not support Portuguese (AlchemyAPI and Textalytics) and also to Semantria and SentiStrength, using the English option of these tools. Six experiments were conducted with some variations and different origins of the collected posts. The results were measured using the following metrics: precision, recall, F1-measure and accuracy.

Results

The proposed tool SHC-pt reached the best average accuracy and F1-measure (the harmonic mean of precision and recall) across the three sentiment classes addressed (positive, negative and neutral) in all experimental settings. Moreover, the worst accuracy achieved by SHC-pt in any experiment (58%) is 11.53% higher, in relative terms, than the best accuracy (52%) achieved by the other tools. Likewise, the worst average F1 reached by SHC-pt in any experiment (48.46%) is 4.14% higher, in relative terms, than the best average F1 (46.53%) achieved by the other tools. Thus, SHC-pt performs better even when its results in the most difficult scenario are compared with the other tools' results in easier scenarios.
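As a minimal sketch of how these metrics relate, the Python below computes per-class precision, recall and F1, the F1 averaged over the three classes, and accuracy, and then recomputes the relative gains from the accuracy and F1 figures quoted above. The function names and the toy labels are illustrative assumptions; this is not the authors' evaluation code.

```python
from collections import Counter

CLASSES = ("positive", "negative", "neutral")


def per_class_metrics(gold, predicted, classes=CLASSES):
    """Return {class: (precision, recall, f1)} for two parallel label lists."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted class p, but the gold label was different
            fn[g] += 1  # gold class g was missed
    result = {}
    for c in classes:
        precision = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        recall = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        # F1 is the harmonic mean of precision and recall.
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        result[c] = (precision, recall, f1)
    return result


def accuracy(gold, predicted):
    """Fraction of messages whose predicted label equals the gold label."""
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


def average_f1(gold, predicted, classes=CLASSES):
    """Unweighted mean of the per-class F1 values."""
    metrics = per_class_metrics(gold, predicted, classes)
    return sum(f1 for _, _, f1 in metrics.values()) / len(classes)


def relative_gain(ours, theirs):
    """Relative improvement of `ours` over `theirs`, in percent."""
    return (ours - theirs) / theirs * 100


# Toy labels, invented for illustration only.
gold = ["positive", "negative", "neutral", "positive", "negative", "neutral"]
pred = ["positive", "negative", "positive", "positive", "neutral", "neutral"]

print(per_class_metrics(gold, pred))
print(accuracy(gold, pred), average_f1(gold, pred))
print(relative_gain(58, 52))        # about 11.5 (accuracy, worst vs. best)
print(relative_gain(48.46, 46.53))  # about 4.1 (average F1, worst vs. best)
```

The relative gain divides the difference by the competing tool's score, which is why a 6-point gap in accuracy corresponds to a gain of roughly 11.5%.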

Conclusions

This paper presents two contributions. First, it proposes the method SentiHealth to detect the mood of cancer patients who are also users of patient communities in online social networks. Second, it presents a tool instantiated from the method, called SentiHealth-Cancer (SHC-pt), dedicated to automatically analyzing posts in communities of cancer patients. This context-tailored tool outperformed general-purpose sentiment analysis tools, at least in the cancer context. This suggests that the SentiHealth method could be instantiated as other disease-specific tools in future work, for instance SentiHealth-HIV, SentiHealth-Stroke and SentiHealth-Sclerosis.

Introduction

Sentiment Analysis (SA) is widely used to analyze opinions from people about a target, for example a product or a service. Existing SA techniques can be divided into three categories, depending on the level at which the analysis is made in the text [1], [2], [3], [4]: document level, sentence level and entity/aspect level.

At the document level, the opinion in a document is classified as positive, negative or neutral. In this type of analysis it is not possible to classify a document that covers more than one entity, because each document is interpreted as having a text referencing just a single entity [5].

In contrast, sentence-level analysis examines each sentence of the document separately and classifies its opinion as positive, negative or neutral [5].

Both document-level and sentence-level analysis use only the language constructs to classify an opinion. Entity- and aspect-level analysis, however, assumes that every opinion has a target and therefore seeks to identify the target of each opinion in the text. This makes it possible to analyze more than one opinion in the same sentence [5]. For example, the phrase “Although a bad service, I still like that restaurant.” expresses a more positive than negative opinion about the restaurant, but it in fact evaluates two aspects: the service offered and the restaurant itself. These are the targets of the opinion.

Several studies have examined SA, especially by comparing existing tools. The work presented in Ref. [6] compares nine SA tools: AlchemyAPI, Lymbix, MLAnalyzer, Repustate, Semantria, Sentigem, Skytle, Textalytics and Textprocessing. To calculate the accuracy of each tool, texts from different sources were collected (news, comments and tweets). The tools with the greatest accuracy were Textalytics (75%), Skytle (73%) and Semantria (68%).

Another work [7] also compared SA tools. Twenty tools were chosen: fifteen stand-alone SA tools (SentiStrength, Chatterbox, Sentiment140, Textalytics, Intridea, AiApplied, ViralHeat, Lymbix, SentimentAnalyzer, TextProcessing, Semantria, uClassify, MLAnalyzer, Repustate and a last one referred to as Anonymous by the authors) and five workbench tools (BPEF, Lightside, FRN, EWGA, RapidMiner). The texts used to evaluate the tools were tweets related to the following themes: telecommunications, pharmaceuticals, security, technology and retail consumer products. Among the stand-alone tools, SentiStrength had the greatest average accuracy (67%). Among the workbench tools, BPEF presented the best average accuracy (71%).

Another study, Ref. [8], tested AlchemyAPI, OpenAmplify and Texterra. The last one was a new tool proposed in that article and presented an accuracy of 79%, higher than AlchemyAPI (42%) and OpenAmplify (57%).

The study presented in Ref. [8] conducted experiments using texts written in English and Russian. The English texts cover general affairs, politics and movie reviews. The Russian texts are comments about movies, books and cameras.

The tools AlchemyAPI, Semantria, SentiStrength and Textalytics are used in the experiments reported in this work because they presented good results in related work. They also provide a Java API (Application Programming Interface) [9], which facilitates integration into an application, and they can analyze arbitrary texts. The Texterra tool, which according to Ref. [8] is more accurate than the others, was not considered in this article because it presented access failures during the tests.

Although there are several studies about SA, few scientific studies use SA to classify a person's emotional state considering the person himself as the target of the analysis. However, it is possible to know whether a person has more positive or negative thoughts by analyzing his texts [10]. For example, if most of a person's texts within a window of time are negative, this person is probably in a negative emotional state.
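The window-of-time idea can be sketched as follows. This is a minimal illustration under stated assumptions: the 30-day default window, the strict-majority rule, the "mixed" fallback label and the function name are all hypothetical choices and are not taken from the article.

```python
from collections import Counter
from datetime import date, timedelta


def mood_in_window(posts, end, days=30):
    """Estimate an author's mood from already-classified posts.

    posts: iterable of (date, label) pairs, where label is "positive",
           "negative" or "neutral".
    end:   last day of the window (inclusive).
    days:  window length in days (assumed value, not from the article).
    Returns the label held by a strict majority of the posts in the window,
    "mixed" if no label has a majority, or None if the window is empty.
    """
    start = end - timedelta(days=days)
    counts = Counter(label for day, label in posts if start < day <= end)
    total = sum(counts.values())
    if total == 0:
        return None
    label, n = counts.most_common(1)[0]
    return label if n > total / 2 else "mixed"


# Invented example: three posts fall inside the window ending 2015-03-10.
posts = [
    (date(2015, 1, 1), "positive"),
    (date(2015, 3, 1), "negative"),
    (date(2015, 3, 5), "negative"),
    (date(2015, 3, 9), "positive"),
]
print(mood_in_window(posts, end=date(2015, 3, 10)))  # negative
```

Requiring a strict majority is only one reading of "most texts"; a weighted score or a different threshold would be equally defensible.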

Unlike Refs. [6], [7], [8], this article considers texts written in Portuguese, extracted from posts appearing in Facebook communities of cancer patients. The article regards the authors of the texts as the targets of the analysis and uses SA solutions to classify the sentiment of the authors.

According to Ref. [11], SA techniques applied to posts in online cancer communities may be used not only to detect a pessimistic emotional state but also to detect changes in a person's mood as a consequence of his interactions with other patients in a community. Someone who writes a text without the specific purpose of reporting his emotional state may end up revealing, unintentionally, whether he is more positive or negative. This can be used to give emotional support to patients.

A chronic patient may face various difficulties such as physical pain, stress, extreme anxiety, anger, depression and frustration [12]. These difficulties can cause suffering during the treatment and may even lead the patient to interrupt it.

So, many patients seek support in social networks to obtain information, encouragement, motivation, feedback, emotional support, tangible support, and network support exchanged among peers [13], [14], [15], [16], [17], [18], [19] to have a better quality of life during their treatment.

Thus, the automatic analysis of patients' mood can be very useful for caregivers, families and patients themselves. SA methods are good options for this analysis. However, most SA works assess opinions about a target that is different from the author of the opinion [20], [21], [22], [23], [24]. In addition, few studies focus on analyzing the sentiment of cancer patients and their families, who also go through strong experiences and are usually influenced emotionally by the context surrounding the patient [11], [25]. Finally, few works propose context-driven SA solutions aimed at improving the accuracy of the classification result [26], [27], [28]. In most cases, the techniques are generalist and as such do not perform well in specific contexts.

Thus, some research questions can be listed:

  • How well do the tools AlchemyAPI, Semantria, SentiStrength and Textalytics perform when analyzing the emotional state of the authors of messages in online cancer communities?

  • Does the use of specific information of the cancer field and forms of communication on the Internet improve the accuracy of a lexical approach method of SA?

  • Does the origin of the analyzed groups' messages influence the accuracy of SA?

The objective of this work is to present an SA tool, named SentiHealth-Cancer (SHC-pt), that improves the detection of the emotional state of patients in Brazilian online cancer communities by inspecting their posts written in Portuguese.

Section snippets

Study context

Cancer, according to the American Cancer Society, is the name given to a set of more than 100 diseases that have in common the uncontrolled growth of (malignant) cells that invade tissues and organs and can spread (metastasize) to other parts of the body [29].

The World Health Organization (WHO) estimates that in the year 2012 there were 14.1 million new cancer cases and 8.2 million deaths due to this disease in the world [30]. Specifically in Brazil, according to the National Cancer Institute José

Methods

This work is part of a project evaluated and approved by the Ethics Committee under the number 31191214.7.0000.5083/UFG. Moreover, all data collected from the online social network had been published by the users as public texts in public groups. A Java application was developed to connect to the social network’s API and collect posts from selected groups.

We selected Facebook as the source of texts because it contains user groups on the theme of this work and also

Results and output data of the study

In this section we present the results obtained by applying the variations of the tools on the different collections defined in Section 3.4.2. The experiments executed consider the language in which the messages are written, the amount of texts analyzed, the origin of messages and changes in the dictionaries used by the tools.

Answers to study questions

In the Introduction of this article we presented three main research questions that we aimed to answer. The first concerns the effectiveness of the existing SA tools AlchemyAPI, Semantria, SentiStrength and Textalytics for performing SA on posts from Brazilian cancer-related groups on Facebook. Tests were executed with the tools that recognize only English and also with the English option of the tools that support both English and Portuguese. In

Conclusion

Existing SA tools have very low accuracy when applied to web texts from the cancer context written in Portuguese. To address this problem, we developed a new tool (SHC-pt) for sentence-level SA that uses a lexicon and heuristics to analyze texts by people involved with cancer. These texts were collected from posts in Brazilian cancer groups on Facebook. Unlike other tools [6], [7], [8], the method proposed in this work considers the author of the messages as the target of the analysis.
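To make the idea of sentence-level SA with a lexicon and heuristics concrete, here is a deliberately small Python sketch. It is not the SHC-pt implementation: the word, emoticon and hashtag scores are toy values invented for illustration, the tokenizer is a simple regular expression, and the only heuristic shown is that emoticons and hashtags contribute to the score alongside ordinary words.

```python
import re

# Toy dictionaries, invented for illustration; NOT the SHC-pt dictionaries.
WORD_SCORES = {
    "feliz": 2,      # happy
    "esperança": 2,  # hope
    "dor": -2,       # pain
    "medo": -2,      # fear
    "triste": -2,    # sad
}
EMOTICON_SCORES = {":)": 1, ":D": 2, ":(": -1, ":'(": -2}
HASHTAG_SCORES = {
    "#fé": 2,     # faith
    "#força": 2,  # strength
    "#luto": -3,  # mourning
}

# Hashtags first, then a few simple emoticons, then plain words.
TOKEN_RE = re.compile(r"#\w+|[:;]'?[()D]|\w+")


def score_sentence(sentence):
    """Sum the lexicon scores of every token found in the sentence."""
    score = 0
    for token in TOKEN_RE.findall(sentence):
        if token in EMOTICON_SCORES:
            score += EMOTICON_SCORES[token]
        else:
            key = token.lower()
            score += HASHTAG_SCORES.get(key, WORD_SCORES.get(key, 0))
    return score


def classify_sentence(sentence):
    """Map the summed score to positive, negative or neutral."""
    score = score_sentence(sentence)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(classify_sentence("Hoje estou feliz :) #fé"))       # positive
print(classify_sentence("Sinto muita dor e medo :("))     # negative
print(classify_sentence("Consulta marcada para amanhã"))  # neutral
```

A real lexicon-based tool would also need to handle negation, intensifiers and domain-specific vocabulary, none of which this sketch attempts.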

Many

Author contributions

Celso Camilo-Junior coordinated the research of this study, advising the researchers to use more effective techniques in the applied context. Furthermore, Celso Camilo, along with Thierson Couto and Ramon Gouveia, conducted the analyses of the data collected to verify the accuracy of the method developed and helped to classify texts into positive, negative and neutral. Finally, he helped Ramon Gouveia in the creation and definition of the method and the sentimental strengths of the dictionaries

Competing interests

The authors declare no conflicts of interest.

Summary points

What was already known about this study:

  • Tools already exist that can perform sentiment analysis on texts from online social networks.

  • The existing methods of sentiment analysis do not consider the author himself as the target.

  • Sentiment analysis can help people undergoing cancer treatment and their families.

What this study has added to our knowledge:

  • The sentiment analysis, considering the author himself as the target of analysis,

Acknowledgements

We thank CAPES (Coordenação de Aperfeiçoamento de Pessoal de Nível Superior, the Higher Education Personnel Training Coordination) for providing a scholarship to this project.

References (61)

  • K.M. AlGhamdi et al., Internet use by the public to search for health-related information, Int. J. Med. Inf. (2012)
  • H.S. Wentzer et al., Narratives of empowerment and compliance: studies of communication in online patient support groups, Int. J. Med. Inf. (2013)
  • R.S. Valdez et al., Exploring patients health information communication practices with social network members as a foundation for consumer health IT design, Int. J. Med. Inf. (2015)
  • S.E. Bedell et al., A systematic critique of diabetes on the world wide web for patients and their physicians, Int. J. Med. Inf. (2004)
  • A. Dey et al., Perceptions and behavior of access of the Internet: a study of women attending a breast screening service in Sydney, Australia, Int. J. Med. Inf. (2008)
  • J.F. Etter, Internet-based smoking cessation programs, Int. J. Med. Inf. (2006)
  • B.S. Shenker, The accuracy of Internet search engines to predict diagnoses from symptoms can be assessed with a validated scoring system, Int. J. Med. Inf. (2014)
  • T. Ramani et al., Survey, A techniques implemented on opinion mining, Int. J. Comput. Sci. Eng. Technol. (2014)
  • R. Tejwani, Sentiment Analysis: A Survey (2014)
  • S.K. Yadav, Sentiment analysis and classification: a survey, Int. J. Adv. Res. Comput. Sci. Manage. Stud. (2015)
  • B. Liu, Sentiment Analysis and Opinion Mining, Synthesis Lectures on Human Language Technologies (2012)
  • M. Cieliebak et al., Potential and limitations of commercial sentiment detection tools
  • A. Abbasi et al., Benchmarking Twitter sentiment analysis tools
  • D.Y. Turdakov et al., Texterra: a framework for text analysis, Program. Comput. Software (2014)
  • O. Corporation, Java, https://www.java.com/en/, 2014 (accessed November,...
  • A.D.I. Kramer et al., Experimental evidence of massive-scale emotional contagion through social networks, Proc. Natl. Acad. Sci. U. S. A. (2014)
  • K. Portier et al., Understanding topics and sentiment in an online cancer survivor community, J. Natl. Cancer Inst. Monogr. (2013)
  • P.-C. Sian et al., A survey on quality of life and situational motivation among parents of children with autism spectrum disorder in Malaysia, Int. Conf. Sociality Humanities (2012)
  • C.H. Kroenke et al., Social networks, social support, and survival after breast cancer diagnosis, J. Clin. Oncol. (2006)
  • J.S.M. Rodrigues et al., Structure and functionality of the social support network for adults with cancer, Acta Paulista Enfermagem (2012)
    (2012)