Sentiment Analysis (SA) is widely used to analyze people's opinions about a target, such as a product or a service. Existing SA techniques can be divided into three categories according to the level of the text at which the analysis is performed [1], [2], [3], [4]: document level, sentence level and entity/aspect level.
At the document level, the opinion expressed in a whole document is classified as positive, negative or neutral. This type of analysis cannot handle a document that covers more than one entity, because each document is assumed to refer to a single entity [5].
In contrast, sentence-level analysis classifies each sentence of the document separately into the same three classes: positive, negative or neutral [5].
Both document-level and sentence-level analysis rely only on language constructs to classify an opinion. Entity/aspect-level analysis, however, assumes that every opinion has a target, and therefore seeks to identify the target of each opinion in the text. This makes it possible to analyze more than one opinion in the same sentence [5]. For example, the sentence “Although a bad service, I still like that restaurant.” is more positive than negative about the restaurant, but it actually evaluates two aspects: the service offered and the restaurant itself. These are the targets of the opinion.
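To make the difference between the three levels concrete, the following Python sketch applies a tiny hand-made polarity lexicon at each level. The lexicon, the function names and the nearest-aspect heuristic are illustrative assumptions; they are not taken from the cited works or from any of the tools discussed here, and aspect terms are assumed to be single lowercase words.

```python
import re

# Hypothetical toy lexicon: word -> polarity weight (illustrative only).
LEXICON = {"like": 1, "good": 1, "great": 1, "bad": -1, "awful": -1}


def tokens(text):
    """Lower-case word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def label(score):
    """Map a summed score to a polarity class."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def score(text):
    """Sum of lexicon weights of the words in the text."""
    return sum(LEXICON.get(t, 0) for t in tokens(text))


def document_level(doc):
    """One label for the whole document, as if it had a single target."""
    return label(score(doc))


def sentence_level(doc):
    """One label per sentence, splitting on '.', '!' and '?'."""
    sentences = [s for s in re.split(r"[.!?]+", doc) if s.strip()]
    return [label(score(s)) for s in sentences]


def aspect_level(sentence, aspects):
    """Naive heuristic: credit each opinion word to the nearest aspect term."""
    toks = tokens(sentence)
    positions = [(i, a) for i, t in enumerate(toks) for a in aspects if t == a]
    totals = {a: 0 for a in aspects}
    for i, t in enumerate(toks):
        if t in LEXICON and positions:
            _, nearest = min((abs(i - p), a) for p, a in positions)
            totals[nearest] += LEXICON[t]
    return {a: label(s) for a, s in totals.items()}


text = "Although a bad service, I still like that restaurant."
print(document_level(text))                         # neutral
print(aspect_level(text, ["service", "restaurant"]))  # {'service': 'negative', 'restaurant': 'positive'}
```

On the example sentence the toy document-level score cancels out to neutral, hiding both opinions, while the aspect-level heuristic separates a negative opinion about the service from a positive one about the restaurant.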
Several studies have compared SA tools. Ref. [6] compares nine SA tools: AlchemyAPI, Lymbix, MLAnalyzer, Repustate, Semantria, Sentigem, Skytle, Textalytics and Textprocessing. To measure the accuracy of each tool, texts from different sources (news, comments and tweets) were collected. The most accurate tools were Textalytics (75%), Skytle (73%) and Semantria (68%).
Another work [7] also compared SA tools. Twenty tools were chosen: fifteen stand-alone SA tools (SentiStrength, Chatterbox, Sentiment140, Textalytics, Intridea, AiApplied, ViralHeat, Lymbix, SentimentAnalyzer, TextProcessing, Semantria, uClassify, MLAnalyzer, Repustate and a last one that the authors refer to as Anonymous¹) and five workbench tools (BPEF, Lightside, FRN, EWGA, RapidMiner). The tools were evaluated on tweets about telecommunications, pharmaceuticals, security, technology and retail consumer products. Among the stand-alone tools, SentiStrength had the highest average accuracy (67%); among the workbench tools, BPEF had the highest average accuracy (71%).
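Accuracy in these comparisons is the fraction of texts whose predicted polarity matches a reference (gold-standard) label. The sketch below computes it, together with an average accuracy over several datasets; treating the average as an unweighted mean of per-dataset accuracies is an assumption, since [6] and [7] may aggregate their results differently. The labels are hypothetical.

```python
def accuracy(predicted, gold):
    """Fraction of texts whose predicted label equals the reference label."""
    if not gold or len(predicted) != len(gold):
        raise ValueError("expected two non-empty label lists of equal length")
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


def average_accuracy(datasets):
    """Unweighted mean of per-dataset accuracies.

    datasets: list of (predicted, gold) pairs, e.g. one pair per source
    (news, comments, tweets). Weighting each dataset equally is an assumption.
    """
    if not datasets:
        raise ValueError("expected at least one dataset")
    scores = [accuracy(pred, gold) for pred, gold in datasets]
    return sum(scores) / len(scores)


# Hypothetical labels for one tool on two small datasets.
news = (["pos", "neg", "neu", "pos"], ["pos", "neg", "pos", "pos"])  # 3 of 4 correct
tweets = (["neg", "neg"], ["neg", "pos"])                            # 1 of 2 correct
print(accuracy(*news))                   # 0.75
print(average_accuracy([news, tweets]))  # 0.625
```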
Ref. [8] tested AlchemyAPI, OpenAmplify and Texterra. Texterra, a new tool proposed in that article, achieved an accuracy of 79%, higher than AlchemyAPI (42%) and OpenAmplify (57%).
The experiments in Ref. [8] used texts written in English and Russian. The English texts cover general affairs, politics and movie reviews; the Russian texts are comments about movies, books and cameras.
The experiments reported in this work use AlchemyAPI, Semantria, SentiStrength and Textalytics because these tools performed well in related work. They also provide access through a Java API (Application Programming Interface) [9], which facilitates their integration into a single application, and they can analyze arbitrary texts. Texterra, which according to Ref. [8] is more accurate than the others, was not considered in this article because access to it failed during the tests.
Although there are many studies on SA, few use it to classify a person's emotional state while treating that person as the target of the analysis. However, it is possible to know whether a person has more positive or more negative thoughts by analyzing the texts that person writes [10]. For example, if most of a person's texts within a time window are negative, that person is probably in a negative emotional state.
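The time-window reasoning above can be sketched as a majority vote over the polarity labels of one person's posts. The 7-day default window, the strict-majority rule and the "mixed" result are assumptions made for illustration; the labels themselves would come from an SA tool, and the posts below are hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta


def emotional_state(posts, end, window=timedelta(days=7)):
    """Majority polarity of a person's posts in the window [end - window, end].

    posts: iterable of (timestamp, label) pairs produced by some SA tool.
    Returns the label held by more than half of the posts in the window,
    "mixed" if no label has a strict majority, or None if the window is empty.
    """
    start = end - window
    counts = Counter(polarity for ts, polarity in posts if start <= ts <= end)
    if not counts:
        return None
    top, n = counts.most_common(1)[0]
    return top if n > sum(counts.values()) / 2 else "mixed"


# Hypothetical labelled posts by one community member.
posts = [
    (datetime(2023, 12, 1), "positive"),
    (datetime(2024, 1, 1), "negative"),
    (datetime(2024, 1, 3), "negative"),
    (datetime(2024, 1, 5), "positive"),
]
print(emotional_state(posts, datetime(2024, 1, 6)))   # negative
print(emotional_state(posts, datetime(2024, 1, 10)))  # mixed
```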
Unlike Refs. [6], [7], [8], this article considers texts written in Portuguese, extracted from posts in Facebook communities of cancer patients. It treats the authors of the texts as the targets of the analysis and uses SA solutions to classify the authors' sentiment.
According to Ref. [11], SA techniques applied to posts in online cancer communities can be used not only to detect a pessimistic emotional state but also to detect changes in a person's mood resulting from interactions with other patients in the community. People who write a text without the specific purpose of reporting their emotional state may unintentionally reveal whether they are more positive or more negative. This information can be used to give emotional support to patients.
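One simple way to look for such mood changes is to compare the share of positive posts before and after an event, such as the day a patient joined a community. This before/after comparison is a hypothetical illustration, not the method of [11], and by itself it does not show that the interaction caused the change. The function names and posts are assumptions.

```python
from datetime import datetime


def positive_share(labels):
    """Fraction of labels that are "positive"; None for an empty list."""
    if not labels:
        return None
    return sum(label == "positive" for label in labels) / len(labels)


def mood_shift(posts, event_time):
    """Share of positive posts after event_time minus the share before it.

    posts: iterable of (timestamp, label) pairs. A positive result means the
    author posted relatively more positive texts after the event.
    Returns None if either side of the event has no posts.
    """
    posts = list(posts)
    before = positive_share([lab for ts, lab in posts if ts < event_time])
    after = positive_share([lab for ts, lab in posts if ts >= event_time])
    if before is None or after is None:
        return None
    return after - before


# Hypothetical posts around the day a patient joined a community.
history = [
    (datetime(2024, 2, 1), "negative"),
    (datetime(2024, 2, 4), "negative"),
    (datetime(2024, 2, 12), "positive"),
    (datetime(2024, 2, 15), "negative"),
]
print(mood_shift(history, datetime(2024, 2, 10)))  # 0.5
```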
A chronic patient may face various difficulties such as physical pain, stress, extreme anxiety, anger, depression and frustration [12]. These difficulties can cause suffering during treatment and may even lead the patient to interrupt it.
Therefore, many patients turn to social networks to exchange information, encouragement, motivation, feedback, emotional support, tangible support and network support with their peers [13], [14], [15], [16], [17], [18], [19], in order to have a better quality of life during treatment.
Thus, automatic analysis of patients' mood can be very useful for caregivers, family members and the patients themselves, and SA methods are good candidates for this analysis. However, most SA works assess opinions about a target that is different from the author of the opinion [20], [21], [22], [23], [24]. In addition, few studies focus on analyzing the sentiment of cancer patients and their families, who also go through intense experiences and are usually affected emotionally by the context surrounding the patient [11], [25]. Finally, few works propose context-driven SA solutions aimed at improving classification accuracy [26], [27], [28]. In most cases, the techniques are general-purpose and therefore do not perform well in specific contexts.
These gaps motivate the following research questions:
- What is the performance of AlchemyAPI, Semantria, SentiStrength and Textalytics in analyzing the emotional state of the authors of messages in online cancer communities?
- Does using information specific to the cancer domain and to forms of communication on the Internet improve the accuracy of a lexicon-based SA method?
- Does the origin of the messages of the analyzed groups influence the accuracy of SA?
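The second research question concerns lexicon-based SA. The sketch below shows the general mechanism: a text is scored by summing the polarities of the lexicon entries it contains, so adding domain terms and Internet emoticons to a general lexicon can change the resulting label. All lexicon entries are hypothetical English placeholders; they are not the lexicon of SHC-pt, which targets Portuguese.

```python
import re

# Hypothetical general-purpose lexicon: token -> polarity weight.
GENERAL = {"good": 1, "happy": 1, "hope": 1, "bad": -1, "sad": -1, "pain": -1}

# Hypothetical domain additions: cancer-related terms and Internet emoticons.
DOMAIN = {"remission": 1, "cured": 1, "metastasis": -1, "relapse": -1, ":)": 1, ":(": -1}

# Emoticons are matched before plain words so they survive tokenization.
TOKEN = re.compile(r":\)|:\(|[a-z]+")


def classify(text, lexicon):
    """Sum the weights of known tokens and map the total to a polarity."""
    total = sum(lexicon.get(tok, 0) for tok in TOKEN.findall(text.lower()))
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "neutral"


extended = {**GENERAL, **DOMAIN}
post = "The last scan showed remission :)"
print(classify(post, GENERAL))   # neutral
print(classify(post, extended))  # positive
```

The general lexicon finds no opinion words in the example post, whereas the extended lexicon picks up both the domain term and the emoticon.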
The objective of this work is to present an SA tool, named SentiHealth-Cancer (SHC-pt), that improves the detection of the emotional state of patients in Brazilian online cancer communities by inspecting their posts written in Portuguese.