Elsevier

Computers in Human Behavior

Volume 34, May 2014, Pages 187-193

Letter repetitions in computer-mediated communication: A unique link between spoken and online language

https://doi.org/10.1016/j.chb.2014.01.047

Highlights

  • The study explores how letter repetitions in CMC (e.g. yessss) are used as cues.

  • Over 500,000 email messages from the Enron Corpus were analyzed.

  • It is shown that letter repetitions often, but not always, emulate spoken nonverbal cues.

  • A longitudinal analysis demonstrates the dynamic nature of the link to spoken cues.

Abstract

Computer-mediated communication (CMC) affords many CMC cues that augment the verbal content of a message: all-uppercase letters, asterisks, emoticons, punctuation marks, chronemics (time-related messages), and letter repetitions, to name a few. Letter repetitions are unique among CMC cues in that they appear to be a written emulation of a spoken paralinguistic cue, phoneme extension. In this study we explore letter repetitions as a CMC cue, with specific emphasis on elucidating their link to spoken nonverbal cues. We study letter repetitions in the Enron Corpus, a large, ecologically valid collection of approximately 500,000 e-mail messages sent by and to employees of the Enron Corporation. We conclude that letter repetitions in the corpus often, but not always, emulate spoken nonverbal cues. We examine this conclusion in a longitudinal analysis that demonstrates the dynamic nature of this cue and suggests that the usage of letter repetitions is increasing over time, while their link to spoken language is diminishing.

Introduction

In computer-mediated communication (CMC), CMC cues are one of the tools used to convey important social and relational information. The information these cues convey cannot be extracted from the lexical or literal meaning of the words that make up the message, and their creation and interpretation are complex and context dependent. These characteristics are reminiscent of nonverbal cues in traditional communication (Burgoon & Hoobler, 2002), which have been defined as “those behaviors that could reasonably function as messages within a given speech community. More specifically, it includes those behaviors other than words themselves that form a socially shared coding system” (p. 244). In this paper, we use the term CMC cues as an analog to traditional nonverbal cues, and define CMC cues as those modifications of a CMC message that, within a socially shared coding system, modify the meaning of the message while preserving its words and their sequence.

This paper focuses on elucidating the mechanism by which one category of CMC cues, letter repetitions, is used to enrich online language. We begin with a brief review of the controversy over the richness of online language, and show that although the emerging consensus is that CMC is capable of conveying social and relational information, our understanding of the mechanisms through which it does so is inadequate. We then examine some of these mechanisms for letter repetitions through an in-depth analysis of a large corpus of CMC messages.

Over the past two decades, there has been extensive debate in the literature about the richness of text-based computer-mediated communication (CMC). Media richness theory labeled CMC as poor in relation to other media such as face-to-face or phone communication (Daft & Lengel, 1986), and the cues-filtered-out model emphasized the impoverishment of CMC given its reduced social context cues (Sproull & Kiesler, 1986). Later work explored the impact of media leanness on the outcomes of group decision making (Baltes et al., 2002, Dennis and Kinney, 1998), on online collaboration (Kerr & Murthy, 2009), on communication in very large groups (Lowry, Romano, Jenkins, & Guthrie, 2009), and more (e.g. Otondo et al., 2008, Sia et al., 2002). The results suggest that the early theories could not account for mounting evidence that CMC is being used extensively and effectively in contexts requiring subtle interpersonal and socially oriented communication. More contemporary frameworks such as social information processing (SIP) and social identity/deindividuation (SIDE) theory (Walther, 2011, Walther and Parks, 2002) explore the conditions under which CMC is as effective as traditional modes of communication, or even more effective. Both SIP and SIDE acknowledge that CMC does not transmit the same nonverbal cues as traditional spoken conversation, and both emphasize the importance of the cues that CMC does transmit. SIP puts special emphasis on chronemic cues and the role of time in online communication (Walther, 2002). SIDE emphasizes paralanguage, which includes alternative uses of characters in the written message such as capitalization, spelling, and punctuation marks (e.g. Lea & Spears, 1992). We review the evidence for the existence of CMC cues, their prevalence, and their usage, as well as the relatively scant research on the mechanisms that enable CMC to convey socio-emotional information through these cues.
Following the review, we focus on one category of cues, letter repetitions, and explore their link to spoken nonverbal cues. We demonstrate the strength of this link in a large corpus of email messages from the late 20th century. In our discussion of these findings we present evidence that the usage of this CMC cue is dynamic, and that as its usage increases over time, the link to spoken language diminishes.

In this section we review the cues used in CMC, starting with those that received more extensive attention in past research, namely chronemic cues and emoticons, and continuing with those that have not been studied as extensively. We conclude with a proposed definition for all CMC cues.

One category of cues that has been extensively studied with respect to its role in social communication is chronemics. Chronemics refers to time-related messages and the ways in which the temporal aspects of messaging influence communication. The pioneering experimental study of chronemic nonverbal cues in e-mail by Walther and Tidwell (1995) showed that response latency, as well as the time of day a message is sent, can influence one’s perception of the communicator. They also demonstrated that these chronemic cues are context sensitive and can interact with message valence. Later studies of CMC chronemics further demonstrated how chronemic cues can influence the ways in which communicators perceive and make attributions about the social and interpersonal characteristics of those with whom they are communicating (Döring and Pöschl, 2008, Kalman and Rafaeli, 2011, Sheldon et al., 2006).

Another category of cues that has received extensive attention is emoticons. Emoticons are graphical icons that express emotion through the representation of a human face. Under some conditions, they have been shown to affect message interpretation (e.g. Derks et al., 2007, Walther and D’Addario, 2001). Not unlike nonverbal cues in traditional communication, emoticons are employed in a highly context-sensitive manner (Huffaker and Calvert, 2005, Wolf, 2000).

While chronemic cues and emoticons are the two most extensively investigated cues in the literature, many other CMC cues exist. One of the earliest experimental manipulations of these cues is described by Lea and Spears (1992), who report two studies exploring the role of what they labeled paralinguistic cues in CMC. In the first study, the messages either included or did not include (1) spelling errors in two words; (2) two mistyped words in which the order of a pair of letters was reversed; and (3) exclamation marks added to the end of one sentence and ellipses added to the end of another. The results showed that these minor changes in the paralinguistic content of the messages significantly influenced the impressions subjects formed of the anonymous authors. In the second study, the investigators collected transcripts of online discussions between partners who were either individuated or de-individuated, and who were placed under high or low group salience conditions. The transcripts were analyzed for a series of paralinguistic cues (ellipses, inverted commas, question marks and exclamation marks, as well as sequences of symbols). The results showed significant correlations between paralanguage use and perceived personal attributes. For example, in the high group salience condition there was a strong positive correlation between the use of these paralinguistic cues and measures such as warmth, dominance, liking and responsibility, whereas in the low group salience condition the correlation was either weakened or reversed. These studies support the notion that paralanguage can be a conduit of social information in CMC. In a later study, Postmes and colleagues (Postmes, Spears, & Lea, 2000) examined the distribution of the same cues, as well as additional ones, in online groups formed among students taking an academic course.
The additional cues included nonconventional spelling, deliberately distorted spelling, use of a foreign language, capital-letter “shouting”, message length, and chronemic aspects of the communication such as time of day and communication frequency. The authors showed the gradual formation of diverse CMC styles in the different groups, styles defined by some of the CMC cues but not by others. This is further evidence for the social meaning of CMC cues. Additional evidence for the role in social communication of CMC cues other than chronemics and emoticons comes from a study of Short Message Service (SMS) messages posted to a public interactive TV website (Herring & Zelenkauskaite, 2009). An analysis of the 160-character SMS messages posted to the website showed that every message had 8–9 nonstandard typographic features, and that usage of this nonstandard typography differed by gender: women used more repeated punctuation and more insertions in their messages. The authors conclude that “the resources of written language are employed variably to communicate social meanings that are traditionally conveyed through speech” (p. 27).

While these latter studies begin to expand the notion of CMC cues beyond chronemic cues and emoticons, many cues in text-based CMC remain relatively unexplored. In the next paragraph we describe some of the key studies that attempted to identify and classify text-based CMC cues.

One of the earliest studies of the wide range of CMC cues is Carey’s (1980) work on paralanguage in CMC. Carey identified five categories of cues: vocal spelling (e.g. “biznis” and “weeeeel”); lexical surrogates and vocal surrogates (e.g. “I like the idea, but then again, it was mine (she said blushingly)” and “hmmm”, respectively); spatial arrays, which include letters arranged to make a picture, as well as devices such as extra spaces between words to indicate a pause or set off a word or phrase; manipulation of grammatical markers (e.g. multiple exclamation marks or words written in capital letters); and minus features, that is, the absence of certain features in the text. This last category lends a tone to the message, as in the case where no attention has been given to correcting spelling errors. Another brief exploration of the strategies used to enhance and enrich the written word is that of Spitzer (1986), who described a host of typographical devices or “gimmicks”, such as capital letters, asterisks, blank spaces, or character repetitions, as well as combinations of these devices, and how they are used for emphasis, to show anger, to express humor, and so on. The next extensive exploration of CMC cues was by Blackman (1990), who identified 22 types of nonverbal surrogates.
These were divided into seven categories: kinesic surrogates (kinesic descriptions such as <grin>, kinesic pictographs such as :-), and self-pointing, such as an arrow pointing at the source’s name <===); vocalic surrogates (multiple punctuation marks, all-caps, asterisk bracketing, extended letter repetition, spaces between letters, run-together words, ellipsis, blank spaces in a line, vocal characterizations such as (cough), vocal segregates such as er that are used to fill pauses, and interjections such as oops); haptic surrogates (touch descriptions such as KISS and haptic pictographs such as xoxoxo for kisses and hugs); physical appearance surrogates (appearance descriptions and handle pictographs such as (spider/o/)); artifact surrogates (object displays, which occur when a user mentions owning or using some object or substance); action surrogates (action descriptions and sound effects); and miscellaneous (conventional symbols such as $ or #). This study carefully analyzed the frequency of these cues in synchronous and asynchronous CompuServe forums. It reported a rate of about 180 nonverbal surrogates per thousand words in one synchronous messaging mode, about 50 per thousand in another synchronous mode, and about 20 per thousand in a third, asynchronous mode. Finally, Riordan and Kreuz (2010) studied these cues in several contemporary online corpora and reported frequencies ranging from 0.19% to 0.98% across the corpora. An automated Linguistic Inquiry and Word Count (LIWC) analysis (Tausczik & Pennebaker, 2010) of one third of the cue-laden words revealed that the two largest categories these words fall under are affect words and words indicating cognitive mechanisms.

Unlike chronemics and emoticons, which have been clearly defined and carefully studied in various media and contexts, the dozens of other cue categories in the aforementioned studies (Blackman, 1990, Carey, 1980, Riordan and Kreuz, 2010, Spitzer, 1986) have received far less attention. This is not surprising, given the resource-intensive methodologies these studies require: careful reading of messages, manual classification of a large number of cues (e.g. Blackman, 1990), and individual interpretation of the meaning of these cues (e.g. Crystal, 2001). Because these cues are often subtle and highly variable, and their relative frequency is often low, these studies rarely measured their distributions or identified regularities in the data that could elucidate the mechanisms that allow these cues to convey socio-emotional information.

In this study, we address this gap by supplementing manual coding with automated search. This allows us to focus on one specific cue, letter repetitions, and explore its usage in an extensive dataset of e-mails. This cue has been described in several of the descriptive studies reviewed above. It has also been included, aggregated with other cues, in several SIDE-oriented experimental studies that demonstrated the ability of such cues to convey social and relational information. Nevertheless, none of these descriptive or experimental studies attempted to elucidate the principles by which this cue operates in CMC. In this study we aim to collect a sufficiently large and diverse sample of letter repetitions and use it to ask specific research questions about the general principles by which these repetitions act as cues in CMC.

In this study we examine the role of letter repetitions in e-mail messages. Like many other CMC cues besides chronemics and emoticons, letter repetitions have not been systematically studied, and their usage is not well understood. Given the uniqueness of letter repetitions and their potential link to spoken nonverbal cues, the goal of the study is to understand the usage of letter repetitions in an ecologically valid, large-scale sample, and to elucidate possible mechanisms for the way CMC cues operate.

The study was conducted on a single extensive and comprehensive dataset (the Enron Corpus, see below) and employed a search tool developed specifically to study letter repetitions in this corpus. The research question was:

This general question was split into several subsidiary questions. The first is primarily concerned with the link between letter repetitions and spoken communication. Letter repetitions could be interpreted as emulating an extension (repetition) of the phoneme encoded by the repeated letter. If so, it should be possible to vocally articulate the extended phoneme. By determining whether letter repetitions are articulable, we gain insight into whether they might be used to convey the equivalent of the spoken paralinguistic cue of extending a phoneme. For more details on how this classification was made, see the Method section.
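To make the articulability criterion concrete, the Python sketch below applies a crude, hypothetical letter-based rule: a run of a repeated letter is treated as articulable when the letter usually spells a sound that can be prolonged in speech (vowels, fricatives, nasals, liquids), and as not articulable when it usually spells a stop such as t or k. The letter sets, the names `is_articulable` and `classify_token`, and the three-letter minimum for a repetition are all assumptions made for illustration. English spelling maps imperfectly to phonemes (silent letters, c as /k/ or /s/), so a rule like this could at most pre-sort candidates for the classification procedure described in the Method section, not replace it.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical rule, not the authors' procedure: letters that usually spell
# stops (plosives) cannot be prolonged, so a run of them is "not articulable".
STOP_LETTERS = frozenset("bdgkpqt")
# Letters whose sound depends on context (c = /k/ or /s/, x = /ks/) stay undecided.
AMBIGUOUS_LETTERS = frozenset("cx")
# Assumption: a letter repetition is a run of three or more identical letters.
RUN = re.compile(r"([A-Za-z])\1{2,}")


def is_articulable(letter: str) -> Optional[bool]:
    """True if the letter plausibly spells an extendable sound, False for stops, None if unclear."""
    letter = letter.lower()
    if len(letter) != 1 or not letter.isalpha():
        raise ValueError("expected a single letter")
    if letter in AMBIGUOUS_LETTERS:
        return None
    return letter not in STOP_LETTERS


def classify_token(token: str) -> List[Tuple[str, Optional[bool]]]:
    """Classify every repeated-letter run in a token, e.g. 'yessss' -> [('s', True)]."""
    return [(m.group(1).lower(), is_articulable(m.group(1))) for m in RUN.finditer(token)]


if __name__ == "__main__":
    for word in ["yessss", "hmmm", "Cooooollllll", "grrrreat", "stttop"]:
        print(word, classify_token(word))
```

Running the script prints one line per word, for instance `stttop [('t', False)]`; tokens with several runs, such as Cooooollllll, yield one judgment per run.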

The second subsidiary question arose from an informal examination of a small sample of letter repetitions from a collection of emails, which revealed that many of the repetitions appeared in onomatopoeic words (e.g. boooo or hmmm). This led to two questions, one specific to onomatopoeic words and another, more general, about parts of speech: To what extent are the words that contain letter repetitions onomatopoeic? And to which parts of speech do these words belong? Answering these questions could provide evidence about the function that repetitions play, and insight into their relation to cues in spoken language.

Further, we asked whether letter repetitions are a cue used by only a small subset of the people represented in our corpus (described in detail below), or whether they are more widely used. Since CMC cues are meaningful within a specific social context, if a cue is used by only a small subset of the group, it would be important to identify this subset and to limit our conclusions to it. We therefore examined the number of letter repetitions in the email messages composed by different users in our corpus.
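The per-user analysis described above can be sketched in Python as follows. It assumes that each message is available as a (sender, body) pair with quoted and forwarded text already stripped, and that a letter repetition is any run of three or more identical letters. The sketch counts repetitions per sender and reports what share of all repetitions the k most prolific senders account for; a share close to 1 for small k would indicate a cue confined to a small subset of users. The function names are hypothetical, and the regular expression will also match non-cue strings such as www, so its hits would still need manual screening.

```python
import re
from collections import Counter
from typing import Iterable, Tuple

# Assumption: a letter repetition is any run of three or more identical letters.
REPETITION = re.compile(r"([A-Za-z])\1{2,}")


def repetitions_per_sender(messages: Iterable[Tuple[str, str]]) -> Counter:
    """Count letter repetitions in the text each sender wrote.

    Senders with no repetitions are kept with a count of 0, so the result
    also records how many users never used the cue.
    """
    counts: Counter = Counter()
    for sender, body in messages:
        counts[sender] += sum(1 for _ in REPETITION.finditer(body))
    return counts


def top_share(counts: Counter, k: int) -> float:
    """Fraction of all repetitions produced by the k senders who used the most."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(n for _, n in counts.most_common(k)) / total


if __name__ == "__main__":
    sample = [
        ("alice", "yessss! see you there"),
        ("bob", "ok, sounds good"),
        ("alice", "hmmm, maybe not"),
        ("carol", "sooooo glad it worked"),
    ]
    counts = repetitions_per_sender(sample)
    users = sum(1 for n in counts.values() if n > 0)
    print(counts, users, top_share(counts, 1))
```

In the toy sample, two of the three senders use the cue, and the single most prolific sender accounts for two thirds of all repetitions.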


Method

The corpus used in this study is the Enron Corpus. It is based on the email archives of the Enron Corporation, which were confiscated and published online as part of the U.S. Federal Energy Regulatory Commission’s investigation of the company (Berman, 2003). The original dataset was then processed to accommodate the needs of academic researchers, resulting in a corpus of approximately 500,000 e-mail messages in .txt format (Cohen, 2005). This corpus is one of the few publicly

Frequency and usage of letter repetitions in the Enron Corpus

The full concordance included 815 independent entries, representing a total of 2926 occurrences of letter repetitions. Thus, each independent entry corresponded on average to 3.6 occurrences (dependent entries) in the dataset (range: 1–54). These entries were collapsed by root word, resulting in a list of 201 root words. The 16 root words that appeared ten or more times as independent entries are listed in Table 1.
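The concordance-and-collapse step can be illustrated with a small Python sketch. It is not the CorpusCruizer tool mentioned in the acknowledgements; it assumes that a letter repetition is a run of three or more identical letters, and it uses a hypothetical rule for recovering the root word: shrink each run to one letter, or to two letters if that yields a word in a caller-supplied lexicon (so cooool maps to cool and hmmm to hmm). Each root then lists the tokens found for it, so the number of keys corresponds to the number of root words and the list lengths to occurrences. As a quick check of the reported figures, 2926 / 815 ≈ 3.59, which rounds to the average of 3.6 given above.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Set

# Assumption: a letter repetition is a run of three or more identical letters.
REPETITION = re.compile(r"([A-Za-z])\1{2,}")
WORD = re.compile(r"[A-Za-z]+")


def root_word(token: str, lexicon: Set[str]) -> str:
    """Hypothetical normalisation of a repeated-letter token to its root word.

    Every run is shrunk to one letter ('yessss' -> 'yes'); if that is not in the
    lexicon but the two-letter version is ('cooool' -> 'cool'), the latter wins.
    Unknown forms fall back to the one-letter version for manual review.
    """
    single = REPETITION.sub(r"\1", token).lower()
    double = REPETITION.sub(r"\1\1", token).lower()
    if single in lexicon:
        return single
    if double in lexicon:
        return double
    return single


def concordance(texts: Iterable[str], lexicon: Set[str]) -> Dict[str, List[str]]:
    """Map each root word to the repeated-letter tokens found for it, in order."""
    entries: Dict[str, List[str]] = defaultdict(list)
    for text in texts:
        for token in WORD.findall(text):
            if REPETITION.search(token):
                entries[root_word(token, lexicon)].append(token)
    return dict(entries)


if __name__ == "__main__":
    lexicon = {"yes", "cool", "hmm", "so"}
    texts = ["Yessss! that is cooool", "hmmm... sooo cool", "yesss"]
    conc = concordance(texts, lexicon)
    print(len(conc), "root words,", sum(map(len, conc.values())), "occurrences")
    print(conc)
```

With the toy input, the script reports 4 root words and 5 occurrences. Tokens whose runs need different treatments (one collapsed to a single letter, another to a double) fall through to the one-letter form, which is where manual review would take over.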

Of the 815 entries, the vast majority (767) were classified as articulable. For example:

Discussion

This study explores the frequency and usage of letter repetitions in e-mail communication. The findings suggest that letter repetitions often, but not always, emulate spoken nonverbal cues, as evidenced by the fact that over 94% of the classified repetitions were found to be articulable, and by the disproportionate representation of onomatopoeic words among the words with letter repetitions. We provide several examples from the corpus that demonstrate this linkage between

Conclusion

Our findings on the presence and usage of letter repetitions in CMC reveal a link between this textual cue and paralinguistic cues used in spoken conversation. We also present evidence that letter repetitions are more likely to be employed in emotionally laden interjections. These findings are in line with the suggestion that letter repetitions in CMC serve as CMC cues that extend the lexical meaning of the words, add character and richness to the sentences, and allow the fine-tuning and

Acknowledgements

We thank Alberto Gonzalez for his work on the development of CorpusCruizer, Amanda Yentz for her work on the Enron Corpus, and Joe Walther for helpful and inspiring discussions. This work was funded, in part, by National Science Foundation grant #0953943.

References (43)

  • Brody, S., & Diakopoulos, N. (2011). Cooooooooooooooollllllllllllll!!!!!!!!!!!!!! using word lengthening to detect...
  • Burgoon, J. K., et al. Nonverbal signals.
  • Carey, J. (1980). Paralanguage in computer mediated communication. In Proceedings of the 18th annual meeting on...
  • Clark, J., et al. (2006). An introduction to phonetics and phonology.
  • Cohen, W. W. (2005). Enron email dataset....
  • Crystal, D. (2001). Language and the internet.
  • Daft, R. L., et al. (1986). Organizational information requirements, media richness and structural design. Management Science.
  • Dennis, A. R., et al. (1998). Testing media richness theory in the new media: The effects of cues, feedback, and task equivocality. Information Systems Research.
  • Döring, N., & Pöschl, S. (2008). Nonverbal cues in mobile phone text messages: The effects of chronemics and proxemics....
  • Herring, S. C., et al. (2009). Symbolic capital in a virtual heterosexual market: Abbreviation and insertion in Italian iTV SMS. Written Communication.
  • Huffaker, D. A., et al. (2005). Gender, identity, and language use in teenage blogs. Journal of Computer-Mediated Communication.