
1 Introduction

Conversational user interfaces (CUIs) are interactive user interfaces that allow users to express themselves conversationally, and are often powered by a combination of humans and machines at the back end [1]. Across a wide range of applications, from assisting with voice-command texting while driving to sending alerts when household consumables need to be reordered, CUIs have become a part of our everyday lives. In particular, bots within messaging platforms have seen rapid consumer adoption. These platforms cater to a wide spectrum of human queries and messages, both domain-specific and general-purpose [2].

Depending on the type of issue being discussed, human-to-human conversations in CUIs (i.e., conversations between the human in the CUI backend and the human end-user) can involve varying amounts of emotional content. While some of these emotions are expressed in the conversation, others are felt or experienced internally by the individual [12]. An expressed emotion need not correspond to what the user is actually experiencing internally. For example, one can suppress an internal feeling and express a different emotion in order to conform to socio-cultural norms [23]. Since experienced emotions are felt internally, they may not be easily perceived by others.

Understanding the relationship between expressed and experienced emotions could facilitate better communication between the end-user and the human in the CUI backend [7]. Analyzing experienced emotions could also help uncover aspects of an individual that need attention and care. For example, a feeling of extreme sadness within an individual could be expressed externally as anger [6]. Employing this type of emotion metric could enhance both the scope and usage of CUIs. In this paper, we propose such a metric by developing a machine learning method to estimate the probabilities of experienced emotions from the expressed emotions of a user.

Problem Setting: We consider the scenario of textual conversations involving individuals who need emotional support. For convenience, we refer to these individuals as users. On the other end of the conversation platform are human listeners (typically counselors). The human listener chats directly with the user through a text-only interface, and our algorithm (i.e., the machine) analyzes the end-user's texts. The machine provides a quantitative assessment of the experienced emotions in the user's text. All assessments are specific to the user under consideration.

The machine first evaluates the conditional probability of experiencing an emotion \(emo_{n}\) internally given that an emotion \(emo_{m}\) is explicitly expressed. In the rest of this paper we represent this conditional probability as \(P_t(emo_{n} | emo_{m})\). For example, the probability of experiencing sadness internally given that anger has been expressed, is represented as \(P_t(sad|angry)\). From these conditional probabilities, the probabilities of various experienced emotions (\(P(emo_n)\)) are obtained. A detailed explanation of the procedure is described in Sect. 3.

2 Related Work

CUIs are used for a variety of applications. For example, IBM's Watson technology has been used to create a teaching assistant for a course taught at Georgia Tech [13], and Google's chatbot "Danielle" can act like a book character [14]. There are also emotion-based models for chatbots, such as [25], wherein the authors model the emotions of a conversational agent.

A summary of affect computing measures is provided in D’Mello et al. [16]. Mower et al. [27] propose an emotion classification paradigm based on emotion profiles. There have been efforts to make machines social and emotionally aware [23]. There are methods to understand sentiments in human-computer dialogues [18], in naturalistic user behavior [24] and even in handwriting [26]. However, we are not aware of any work that estimates the underlying, experienced emotions in text conversations.

Bayesian theory has been used to model many kinds of relationships in domains such as computer vision, natural language processing, economics, and medicine. For example, Ba et al. [15] use Bayesian methods for head and pose estimation, and Dang et al. [17] leverage a Bayesian framework for metaphor identification. Bayesian inference has also been used to develop algorithms for identifying e-mail spam [28]. More recently, Microsoft Research created a Bayesian network with the goal of accurately modeling the relative skill of players in head-to-head competitions [29]. Our work describes a new application of Bayesian theory, namely, estimating experienced emotions in text conversations.

3 Method

Let the conditional probability of experiencing an emotion \(emo_{n}\) given that an emotion \(emo_{m}\) is expressed be denoted by \(P_t(emo_{n}|emo_{m})\). We evaluate \(P_t(emo_{n}|emo_{m})\) using a Bayesian framework. These conditional probabilities are then marginalized over the space of all expressed emotions \(emo_m\) to obtain the probabilities of the various experienced emotions \(emo_n\).

First, an emotion recognition algorithm is run on the end-user's texts to determine the probabilities of the various expressed emotions. These probabilities serve as priors in the Bayesian framework. Next, we leverage large datasets containing emotional content from many people (such as blogs) to measure the similarity between words corresponding to a pair of emotions. This similarity is computed across several people and reflects the general relatedness between two emotion-indicating words (for example, between the words "sad" and "angry"). The measure is then normalized across all pairs of emotions under consideration to constitute the likelihood probability in the Bayesian framework. The priors and likelihoods are then combined to obtain \(P_t(emo_{n}|emo_{m})\), a conditional probability specific to the end-user under consideration. Finally, this is marginalized over all possible choices of expressed emotions to obtain the probabilities of experienced emotions for that end-user.

While a variety of other approaches could be used for this computation, our choice of the Bayesian framework is motivated by the following considerations. First, Bayesian models have successfully characterized several aspects of human cognition, such as inductive learning, causal inference, language processing, social cognition, reasoning, and perception [30]. Second, Bayesian learning incorporates the notion of prior knowledge, which is a crucial element of human learning. Finally, these models can learn from limited data, akin to human inference [31].

3.1 Estimation of Priors

During the course of the user's conversation with a human listener, we perform text analysis at regular time intervals to obtain the probabilities of different emotions. These probabilities are determined from the occurrences of emotion-indicative words in the user's text. In our setting, we measure the probabilities of the following emotions: happy, sad, angry, scared, surprised, worried, and troubled. We arrived at these seven emotions by augmenting emotions commonly observed on counseling platforms with those widely accepted in psychological research [32]. These probabilities provide "prior" information about the user's emotions and hence serve as the priors in the Bayesian framework.

Let the prior probability of an emotion i be denoted by \(P_p(emo_i)\). Thus, there are multiple emotion variables, each taking a value in the range [0, 1] indicating its probability. We leverage word synsets, i.e., sets of synonyms of a word, to obtain a rich set of words related to each of the emotions we want to recognize. We refer to the union of the synsets across all the emotion categories as the emotion vocabulary. The words in a user's text are matched against the emotion vocabulary, and the matches are weighted (normalized) by their frequency of occurrence to obtain the probability of each emotion. We found this simple approach quite reliable for our data. This yields the probabilities of the various expressed emotions.
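To make the procedure concrete, the following is a minimal sketch of the prior computation in Python. The small word lists shown are illustrative stand-ins for the full synset-based emotion vocabulary, and the function name `estimate_priors` is hypothetical; it is one plausible reading of the approach described above, not the paper's actual implementation.

```python
from collections import Counter

# Illustrative emotion vocabulary: each emotion maps to a few indicative
# words. A real system would expand these sets via WordNet synsets.
EMOTION_VOCAB = {
    "happy":     {"happy", "glad", "joyful", "pleased"},
    "sad":       {"sad", "unhappy", "down", "miserable"},
    "angry":     {"angry", "mad", "furious", "annoyed"},
    "scared":    {"scared", "afraid", "terrified", "fearful"},
    "surprised": {"surprised", "astonished", "amazed"},
    "worried":   {"worried", "anxious", "nervous", "uneasy"},
    "troubled":  {"troubled", "distressed", "bothered"},
}

def estimate_priors(text):
    """Count emotion-vocabulary hits in the user's text and normalize
    the counts into prior probabilities P_p(emo_i)."""
    tokens = text.lower().split()
    counts = Counter()
    for emotion, synset in EMOTION_VOCAB.items():
        counts[emotion] = sum(1 for t in tokens if t in synset)
    total = sum(counts.values())
    if total == 0:  # no emotion words observed: fall back to a flat prior
        n = len(EMOTION_VOCAB)
        return {e: 1.0 / n for e in EMOTION_VOCAB}
    return {e: c / total for e, c in counts.items()}
```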

3.2 Estimation of Likelihoods

We estimate similarities between words corresponding to a pair of emotions by training neural word embeddings on large datasets [9]. This similarity measures the relatedness between two emotion-indicating words in a general sense. For example, if the word "sad" has a higher similarity with the word "anger" than with the word "worry", then we assume that the relatedness between the emotions "sad" and "anger" is higher than that between "sad" and "worry". This may not hold for every individual user, but it holds in an average sense, since the calculation is based on very large datasets of emotional content from several people. Since this measure is data-dependent, we have to choose appropriate datasets containing significant emotional content to get reliable estimates. We then normalize the similarity scores to obtain the likelihood probability. The details are as follows.

Specifically, we train a skip-gram model on a large corpus (over a million words) of news articles, blogs, and conversations that contain information pertaining to people's emotions, behavior, reactions, and opinions. As a result, the model provides an estimate of the relatedness \(r_{emo_i-emo_j}\) between two emotions (\(emo_i\) and \(emo_j\)), leveraging information across a wide set of people and contexts. This quantity captures the relatedness between two emotions only in a general sense and is not specific to a particular user. We compute the likelihood probability of observing emotion \(emo_j\) given \(emo_i\), \(P_l(emo_j|emo_i)\), by normalizing the similarities \(r_{emo_i-emo_j}\) over the space of all possible emotions under consideration. Thus,

$$\begin{aligned} P_l(emo_j|emo_i)=\frac{r_{emo_i-emo_j}}{\sum _{k} r_{emo_i-emo_k}} \end{aligned}$$
(1)

The emotion pairs considered in Eq. (1) do not necessarily represent expressed or experienced emotions; the likelihood probability is just a measure of relatedness between a pair of emotions computed from large datasets.
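As a concrete illustration, Eq. (1) can be sketched using gensim's skip-gram implementation. The corpus iterator `sentences`, the hyperparameters, and the clipping of negative cosine similarities below are our assumptions, not details specified in the text; this is one plausible realization rather than the paper's exact setup.

```python
from gensim.models import Word2Vec

EMOTIONS = ["happy", "sad", "angry", "scared", "surprised",
            "worried", "troubled"]

# Assumption: `sentences` is an iterable of tokenized documents from an
# emotion-rich corpus (news, blogs, conversations). sg=1 selects skip-gram.
model = Word2Vec(sentences, vector_size=100, window=5, sg=1, min_count=5)

def likelihoods(model):
    """Normalize embedding similarities r_{emo_i-emo_j} over all target
    emotions to obtain P_l(emo_j | emo_i), as in Eq. (1). Assumes every
    emotion word occurs often enough in the corpus to be in the vocabulary."""
    table = {}
    for ei in EMOTIONS:
        # Cosine similarity can be negative; clipping at 0 (one possible
        # choice) makes the scores behave like unnormalized probabilities.
        sims = {ej: max(float(model.wv.similarity(ei, ej)), 0.0)
                for ej in EMOTIONS}
        z = sum(sims.values()) or 1.0
        table[ei] = {ej: s / z for ej, s in sims.items()}
    return table
```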

3.3 Estimating Conditional Probabilities

We employ a Bayesian framework to integrate the emotion priors with the likelihood probabilities. Let \(P_p(emo_n)\) be the prior probability of an emotion \(emo_n\) as obtained from the emotion analysis algorithm, and let \(P_l(emo_m|emo_n)\) be the likelihood probability of \(emo_m\) given \(emo_n\), obtained using an appropriate training dataset. Then, the posterior probability of experiencing an emotion \(emo_n\) given an expressed emotion \(emo_m\) is

$$\begin{aligned} P_t(emo_n|emo_m)= \frac{P_l(emo_m|emo_n)\,P_p(emo_n)}{\sum _{k} P_l(emo_m|emo_k)\,P_p(emo_k)} \end{aligned}$$
(2)

The above quantity is specific to the user under consideration.
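A minimal sketch of Eq. (2), assuming `priors` maps each emotion to \(P_p(emo)\) (the output of the hypothetical `estimate_priors` in Sect. 3.1) and `likelihood[emo_n][emo_m]` holds \(P_l(emo_m|emo_n)\) (the output of `likelihoods` in Sect. 3.2):

```python
def posterior(priors, likelihood):
    """Eq. (2): P_t(emo_n | emo_m) is proportional to
    P_l(emo_m | emo_n) * P_p(emo_n), normalized over all candidate
    experienced emotions emo_n."""
    post = {}
    for emo_m in priors:                      # expressed emotion
        scores = {emo_n: likelihood[emo_n][emo_m] * priors[emo_n]
                  for emo_n in priors}        # candidate experienced emotions
        z = sum(scores.values()) or 1.0       # guard against an all-zero row
        post[emo_m] = {emo_n: s / z for emo_n, s in scores.items()}
    return post
```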

3.4 Estimating Probabilities of Experienced Emotions

The conditional probabilities computed from Eq. (2) are specific to a user. By marginalizing these conditional probabilities over all possible choices of expressed emotions, weighted by the corresponding priors, we obtain the probabilities of the various experienced emotions. Specifically,

$$\begin{aligned} P(emo_b)= \sum _{a} P_t(emo_b|emo_a) P_p(emo_a) \end{aligned}$$
(3)

where \(emo_a\) is an expressed emotion and \(emo_b\) is an experienced emotion. The sets of expressed and experienced emotions need not be mutually exclusive.
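Continuing the sketch, Eq. (3) sums the posterior over expressed emotions, weighted by the priors. The commented usage below chains the hypothetical functions from the previous sketches end to end.

```python
def experienced(priors, post):
    """Eq. (3): P(emo_b) = sum_a P_t(emo_b | emo_a) * P_p(emo_a)."""
    return {emo_b: sum(post[emo_a][emo_b] * priors[emo_a]
                       for emo_a in priors)
            for emo_b in priors}

# End-to-end usage (all names are from the earlier sketches):
# priors = estimate_priors(user_text)   # Sect. 3.1: P_p(emo)
# lik    = likelihoods(model)           # Sect. 3.2: Eq. (1)
# post   = posterior(priors, lik)       # Sect. 3.3: Eq. (2)
# p_exp  = experienced(priors, post)    # Sect. 3.4: Eq. (3)
```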

3.5 Dataset

We studied the performance of the algorithm on a dataset consisting of 16 anonymized user conversations with a human listener, spanning a total of more than 20 h. The conversations dealt with a variety of topics, such as relationship issues, emotional wellbeing, and friendship problems. On average, a conversation between a user and the human listener lasted approximately 30 min. Some conversations lasted more than an hour (the longest was 70 min), while some lasted only 10 min. We divided each conversation into segments corresponding to the time a user spoke uninterrupted by the human listener. For convenience, we refer to each segment as a "transcript". Transcripts numbered A.x are all contiguous parts of the same conversation A. There were over fifty transcripts in the dataset.

4 Results

We illustrate the performance of the proposed method on some user conversations. Users converse with a human listener, henceforth abbreviated "HL". All results pertain only to the user's side of the conversation and to the specific time interval under consideration. The identities of the users and the human listener were anonymized by the conversation platform. Note that an experienced emotion could become expressed at a later time, so the sets of expressed and experienced emotions are not mutually exclusive. Also, the algorithm can compute the probabilities of experienced emotions only for those emotions for which a prior is available.

4.1 Case Studies

Transcript 1.1 (0\(^{th}\)–10\(^{th}\) min)

  • user: Hi, can you please help me with anxiety.

  • hl: I’m sorry you’re feeling anxious. Can you tell me more about it?

  • user: I have no self confidence and have a girlfriend who I really like. I can’t cope thinking she is going to find someone better. I am drinking to kill the anxiety.

  • hl: It sounds like you’re feeling really anxious about your girlfriend staying with you. That sounds really difficult.

  • user: She is out with work tonight and a colleague who she dated for a bit is there. I don’t know how to cope.

  • hl: It sounds like you’re feeling really anxious that she is out with other people including her ex. And you not being there with her is making you feel worse. I’m sorry - that’s a really hard feeling.

  • user: Can you help?

  • hl: I can listen to you. And I really am sorry that you’re feeling so anxious. Maybe you can tell me more about your relationship and why you are feeling insecure.

  • user: I am an insecure person. I am a good looking guy, always get chatted up, but I have no confidence.

Tables 1 and 2 list the expressed and experienced emotions during the first 10 min of the conversation.

Table 1. Expressed emotions for transcript 1.1: 0\(^{th}\)–10\(^{th}\) min
Table 2. Experienced emotions for transcript 1.1: 0\(^{th}\)–10\(^{th}\) min

Transcript 1.2 (10\(^{th}\)–20\(^{th}\) min)

  • user: I Dont know why I am insecure with her, I just feel inadequate.

  • hl: You feel insecure and inadequate with her. Have you felt like this with other girlfriends?

  • user: Once before but not as bad. She is beautiful.

  • hl: It sounds like she is really special to you - it’s nice that you have a beautiful girlfriend.

  • user: She really is. But I Dont think the same the other way.

  • hl: I’m not sure I understand what you mean. You mentioned that you were also good looking.

  • user: I Dont know if she feels the same. Yes I am. Not by my own admission but by what people tell me.

  • hl: So you think she is beautiful but you’re not sure how she feels about you?

  • user: I Dont know, I think I might be over eager and care for her too much.

Tables 3 and 4 list the results of the algorithm.

Table 3. Expressed emotions for transcript 1.2: 10\(^{th}\)–20\(^{th}\) min
Table 4. Experienced emotions for transcript 1.2: 10\(^{th}\)–20\(^{th}\) min

Transcript 1.3 (20\(^{th}\)–30\(^{th}\) min) was analyzed in the same way; its results are listed in Tables 5 and 6. Similar analysis was carried out throughout the conversation. The following is the last transcript of this conversation.

Table 5. Expressed emotions for transcript 1.3: 20\(^{th}\)–30\(^{th}\) min
Table 6. Experienced emotions for transcript 1.3: 20\(^{th}\)–30\(^{th}\) min

Transcript 1.4 (40\(^{th}\)–50\(^{th}\) min)

  • user: I Dont have anyone I can confide in.

  • hl: That sounds lonely. I think many people feel like that which is why it’s nice that we can be there for each other online.

  • user: Very lonely. Which is why I’m afraid of losing her. I’ve told her everything about myself.

  • hl: it’s nice that you’ve found a confidant in her. and of course now you don’t want to loose that connection.

  • user: I made it a point to tell her everything, something which I haven’t done previous. Its part the reason why in terrified to lose her.

  • hl: yeah, it sounds like you feel really open but also very vulnerable because of everything you’ve shared. that’s hard.

  • user: I’m very vulnerable. Should I go to the doctor?

  • hl: I’m not sure. If you’re thinking about it, it might be a good idea. What kind of advice are you looking for from them?

  • user: I Dont know, maybe medication

  • hl: Ah, I see what you’re saying. Medication can help a lot with anxiety for sure. It sounds like you’re feeling really bad and anxious and really don’t want to feel like this anymore. I think it’s always good to find out if a doctor can help with something like that...

Tables 7 and 8 provide the assessment of expressed/experienced emotions.

Table 7. Expressed emotions for transcript 1.4: 40\(^{th}\)–50\(^{th}\) min
Table 8. Experienced emotions for transcript 1.4: 40\(^{th}\)–50\(^{th}\) min

We present another case study. For brevity, we omit the HL's side of the conversation (the machine analyzes only the user's texts) and show results for the first part of the conversation. Similar analysis was carried out for the rest of the conversation.

Transcript 2.1 (0\(^{th}\)–10\(^{th}\) min)

  • user: okay, so I am 18 and my boyfriend is 17. He has BAD anger, it’s never been anything physical. but he always gets mad over the littlest things and he always acts like everything bothers him when I say something wrong... but when he does something like that I am supposed to take it as a joke. and then he gets mad and tries to blow it off when I say something as a joke like “yep.” “yeah.” “nope I am fine.” and acts short (Tables 9 and 10).

Table 9. Expressed emotions for transcript 2.1: 0\(^{th}\)–10\(^{th}\) min
Table 10. Experienced emotions for transcript 2.1: 0\(^{th}\)–10\(^{th}\) min

Transcript 2.2 (10\(^{th}\)–20\(^{th}\) min)

  • user: yeah i just need help getting through that it. yeah... and i’m worried with me going to college it’ll get worse. I guess... it’s just hard, not only that but my mom is freaking out on me and mad at me. all the time when i haven’t done ANYTHING and that is really stressing me out... i don’t know... i really don’t she makes me feel lie i am a failure because i don have a job or anything and it doesn’t help that she going through a change because she is 50... her and my little brother and stepfather constantly gang up on me. my brother is the worst. My boyfriend says i should leave since i am 18 but i have no where to go because i do not have a job nor any money (Tables 11 and 12).


Table 11. Expressed emotions for transcript 2.2: 10\(^{th}\)–20\(^{th}\) min
Table 12. Experienced emotions for transcript 2.2: 10\(^{th}\)–20\(^{th}\) min

4.2 Analysis

Validation with Human Experts: To investigate the effectiveness of the algorithm, we asked human experts to state the top 3 emotions the user in a given transcript was experiencing. The experts were chosen based on their knowledge of and experience in the psychology of active listening. They were not restricted to the set of emotions the machine could identify; instead, they were free to mention anything they found appropriate. To compare with the machine's output, we mapped similar emotion-describing words to the same category; for example, "anxious" was mapped to "worried". In 75% of the transcripts, the top emotion chosen by the evaluators matched the top experienced emotion computed by the machine. In the absence of ground truth (i.e., we did not have information from the users as to what they were actually experiencing), this accuracy is reasonable.
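For illustration, the agreement computation could look like the sketch below. The synonym map is hypothetical apart from the "anxious" to "worried" mapping mentioned above, and the function name and data layout are our assumptions.

```python
# Hypothetical mapping from the experts' free-text labels to the machine's
# emotion categories; only "anxious" -> "worried" is stated in the paper.
SYNONYM_MAP = {"anxious": "worried", "fearful": "scared"}

def top1_agreement(expert_top3, machine_top):
    """Fraction of transcripts in which the experts' top emotion (after
    category mapping) matches the machine's top experienced emotion.
    `expert_top3` is a list of 3-element label lists per transcript;
    `machine_top` is the machine's top emotion per transcript."""
    hits = 0
    for expert, machine in zip(expert_top3, machine_top):
        label = SYNONYM_MAP.get(expert[0], expert[0])
        hits += (label == machine)
    return hits / len(machine_top)
```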

Note that with more information about the user (such as their conversation history), the machine would be able to uncover more hidden emotions. Also, given that the human evaluation was itself subjective, the machine's result can serve as an additional source of information. For example, for the user in Transcript 2, the machine suggested that sadness was the highest experienced emotion. Interestingly, none of the human experts identified sadness among the top 3 experienced emotions. However, given the user's situation, it is not unreasonable to say that sadness likely underlies all her other emotions.

Understanding the User: One of the goals of this study was to understand the patterns of expressed and experienced emotions in users. Figure 1 plots the highest expressed and experienced emotions at every time interval for the user in Transcript 1. Throughout, the expressed emotion is consistent with the experienced emotion, and there is no statistically significant difference between the degrees of expressed and experienced emotion. Figure 2 plots the lowest expressed and experienced emotions. Except for one time interval (the last, wherein the lowest expressed emotion is worried and the lowest experienced emotion is fear), the lowest expressed and experienced emotions are the same, with no statistically significant difference in their intensity.

Fig. 1. Highest expressed and experienced emotions for user in Transcript 1.

Fig. 2. Lowest expressed and experienced emotions for user in Transcript 1.

Thus, this user is mostly expressing what s/he is experiencing. As another case study, consider the user in Transcript 2. Figure 3 summarizes the highest expressed and experienced emotions for this user, and Fig. 4 plots the lowest expressed and experienced emotions. As can be seen from Figs. 3 and 4, this user is always expressing what she is experiencing.

Fig. 3. Highest expressed and experienced emotions for user in Transcript 2.

Fig. 4. Lowest expressed and experienced emotions for user in Transcript 2.

There is generally a gap between what people express and what they experience. The aforementioned case studies were illustrations wherein one user mostly expressed what was experienced and the other always did. However, there could be cases where people mostly hide certain emotions or never exhibit them. Thus, such quantitative studies of expressed and experienced emotions can be useful in constructing "emotion profiles" of users. An emotion profile can be thought of as the characteristic pattern a user exhibits in expressing and experiencing emotions. Understanding such details can help both users and the counselors assisting them. For example, if someone is scared but only shows anger, it would be helpful to gently show this user that his/her underlying emotion is fear so that s/he can address it better. Such insights would also help a counselor recommend suitable solution strategies.

5 Conclusions

We presented an approach to understanding the relationship between expressed and experienced emotions during the course of a conversation. Specifically, we evaluated the probability of a user experiencing an emotion based on knowledge of their expressed emotions, and we discussed how the relationship between the two can be leveraged to understand a user. Such an emotion analytic can be deployed in conversation platforms that have machines or humans in the backend. We hope our findings will help provide personalized solutions to end-users of a CUI by augmenting CUIs with an algorithmically guided emotional sense, enabling more effective conversations with end-users.