1 Introduction

Data visualization (hereafter abbreviated as DV) is a rapidly developing visual means of communication, strongly influenced by the advent of new digital technologies [1]. The amount of accessible data is greater than ever [2], and the forms of representation are constantly being developed further [3]. Consequently, the desire to express a variety of meanings through DV is increasing, and so are the graphical opportunities to do so. Such ʻnew modes of production bring with them new affordancesʼ [4], which means that the conventions that connect visual expressions to culturally shared meanings are constantly under development. A specific graphical element, namely the line that is used to connect two entities (hereafter referred to as the connecting line), appears in many different visualization types (such as line charts, network diagrams and route maps), and has done so since long before the digital age. However, the possibilities to signify specific meanings with such connecting lines have increased, because applying transparency, interaction effects, animations etc. has become much easier in the digital era.

In the practice field of DV, uncertainty is a much-disputed topic. One reason is that many of the large and complex datasets available today include elements of uncertainty related to confidence, variability, trends etc. [2]. The consequence for DV designers (here used as a collective term for all persons involved in the DV production process) is that they have to find ways to visualize this uncertainty. Especially when designing visualizations for lay audiences, depicting uncertainty remains a challenge, so great that the uncertainty is sometimes not visualized at all [10].

Modality, as investigated by linguists and semioticians in verbal and visual text, is a concept that to some extent overlaps with uncertainty, as it is discussed in the practice field of DV. However, modality has not been brought into that field so far.

In the following, I will present different perspectives on modality, both from a functional grammar point of view [following 11] and from a multimodality point of view [following 4 and 12], and relate the concept of modality to the concept of uncertainty. This is done because the linguistic concept of modality is highly elaborated, and a hypothesis underlying this study is that it is also relevant and useful in the investigation of DVs. Following this theoretical trajectory, a two-level method for analysing modality and uncertainty in DVs will be presented.

This method is applied in the second, analytical part of the article, which presents a corpus analysis of 163 award-winning DVs that include connecting lines. The focus of the analysis is whether and how modality and uncertainty are expressed through connecting lines. Summing up, this article has three aims: (1) to clarify the relation between the concepts of modality and uncertainty in the field of DV, (2) to present a two-level method for analysing modality and uncertainty in this text type, and (3) to reveal graphical variations and conventions concerning the expression of modality and uncertainty through connecting lines within a corpus of award-winning DVs.

Corpus-based studies of current digital DVs in general, and particularly those focusing on single graphical elements, are still rare. Possible reasons are the low availability of ready-to-use corpora and the fact that methods for analysing such material accurately and time-effectively are still at an early stage [13,14,15].

2 Theoretical Perspectives on Modality and Uncertainty

2.1 Modality in Verbal Language

As a discussion capturing the breadth of work on the concept of modality is well outside the scope of this article, I shall here only briefly introduce the work of the linguist Michael Halliday, who extended the system of modality with several aspects [16] that are relevant in this context.

Halliday sees modality as ʻthe speaker’s judgement, or request of the judgement of the listener, on the status of what is being saidʼ [11]. Modality ʻconstrue[s] the region of uncertainty that lies between “yes” and “no”ʼ [11] and is therefore an ʻexpression of indeterminacyʼ [11]. His system of modality – as applied on the clause level – includes four variables: the ʻmodality typeʼ, ʻvalueʼ, ʻpolarityʼ and ʻorientationʼ [11]. Below, I go deeper into the first two of these variables. The modality type ʻmodalizationʼ [11] is most relevant in the context of DVs because it covers clauses that indicate some degree of a proposition’s probability or usuality. Indications of probability, verbally expressed, for example, with adverbs (modal disjuncts) like certainly, probably or possibly, play an important role in DVs that aim to communicate an element of uncertainty. These adverbs express a high, median and low modality value, respectively, which are the three modal judgement options suggested by Halliday [11]. To use an example from Halliday [11]: It certainly is expresses a higher probability of the proposition than It possibly is, but both lie in between It is and It isn’t.

2.2 Modality in Visual Material

As seen above, Halliday looked at the ways in which single words or word groups can express different degrees of probability. When it comes to modality in visual material, Gunther Kress and Theo van Leeuwen have borrowed the basic concepts from Halliday’s functional grammar [12, 16]. The different levels of modality (modality value) are defined on scales of modality markers, such as colour saturation [12]. What constitutes a modality marker and where exactly on the scale the highest or lowest modality value is determined, is dependent on the ʻcoding orientationʼ [12]. Coding orientation, as Kress and Van Leeuwen further explain, refers to what counts as real in different social practices. Four types are named: the ʻtechnologicalʼ, ʻsensoryʼ, ʻabstractʼ and ʻnaturalistic coding orientationʼ [12].

In contexts where the semiotic content is a ʻgeneral patternʼ or a ʻdeeper “essence” of what it depictsʼ [16] (as it often is in DVs), an abstract coding orientation will be applied. In such cases, semiotic reduction is crucial. This means a DV is valued as realistic if the most ʻreduced articulationʼ [16] possible is used. A photo, on the other hand, is, according to Van Leeuwen [16], judged realistic if the colours, the articulation of depth, light and shadow, detail and background etc. are natural. Thus, a naturalistic coding orientation is applied.

By introducing the concept of coding orientation to different types of visual material, the issue of the ʻconstrual and evaluation of the reliability of messagesʼ [4] is focused. This constitutes a different aspect of a statement than probability. Thus, expressions of probability (it will probably rain tomorrow) and reliability (you can believe me when I say that it will rain tomorrow) have to be considered separately. However, especially in the context of statements realized by DVs, expressions of probability and reliability may be combined (you can believe me when I say that it will probably rain tomorrow). Moreover, it should be noted that the exemplary visual analyses carried out by Kress and Van Leeuwen regard the visual representation mainly as a whole [17] and therefore evaluate whether it, in its entirety, represents the ʻgiven “proposition” (…) as true or notʼ [12]. In contrast to that, Halliday [11] looks at modality expressions on the clause level which means that the modality value of single sentences within a verbal text can vary.

2.3 Relating Modality Theory to the Analysis of Data Visualizations

As Halliday’s statement that modality ʻconstrue[s] the region of uncertainty that lies between “yes” and “no”ʼ [11] implies, modality and uncertainty are intertwined. However, uncertainty is not only a research object for linguists and semioticians, but is also widely debated within the practice field of DV. In that context, uncertainty is related to different stages of the DV communication process. As a basis for the production of a DV, the designer has collected data about an aspect of the world that is either certain or uncertain. If the data is uncertain, this is what Dasgupta et al. [18] call data uncertainty. Data uncertainty may have several causes, such as measurement imprecision, incompleteness of data (including missing values, sampling, aggregation), inference (including predictions, modeling and describing past events), disagreement and data incredibility [19].

During the design phase, the designer must decide which level of certainty is most expedient to signal. The designer can decide to signal a high or low level of probability and reliability – or not to signal modality at all. After that decision, the designer chooses and applies visual techniques for intentionally signalling a certain level of probability and reliability. The results can be seen as visual expressions of modality. In most cases, what Dasgupta et al. [18] call visual uncertainty correlates with an intention to express a lowered level of probability (based on data uncertainty) or reliability. But it can also result from an unintended or unconscious application of visual forms that readers, by convention or earlier experience, associate with uncertainty.

Summing up, uncertainty is a wider concept than modality, because it includes all factors causing uncertainty on the side of the reader, whether or not intended by the producer. In the present study, I am only interested in the visual expressions of modality that relate to lowered probability or reliability.

This comparative discussion of uncertainty (as discussed in the practice field of DV) and modality (as discussed by linguists and semioticians) allows for applying a more nuanced vocabulary when talking about uncertainty in DVs. It also allows for developing a detailed analysis method of modality in DVs, as presented below. The method is designed to answer the following research questions:

  • How are lowered probability and reliability expressed by connecting lines in a corpus of award-winning, digital DVs?

  • Does the corpus indicate any clear conventions concerning this issue?

3 A Two-Level Analysis Approach to Modality in Data Visualizations

3.1 Visual Segmentation

In the following, I propose a two-level approach to the investigation of modality in DVs. The two levels, hereafter called the detail level and the global level, refer to which parts of the DV are in focus. This visual segmentation is inspired by Morten Boeriis’ [17] dynamic functional rank scale. In Tekstzoom [17], he claims that a visual text can have several modality profiles on different text levels, and he differentiates between four text levels. For analysing modality in DVs, I propose that distinguishing between two zoom levels is sufficient.

At the detail level, only single graphical elements, like single lines or points, and the associated words, are considered (see right part of Fig. 1). This unit is comparable to a verbal sentence, as a part of a whole text. Here, we are interested in how these graphical elements – together with associated words – signal a certain level of probability and reliability, related to the detail statement they represent.

At the global level, the whole visualization (which may be integrated into a larger multimodal text including more verbal text or other visualizations) is in focus (see the left, black part of Fig. 1). The pertinent question on this level is whether and how the choice of visual style signals that the visualization is a true reflection of an aspect of the world. The issue of coding orientation is central here, considering e.g. the effect that a hand drawing might have on reliability, compared to a digitally produced DV. However, it may also be possible to find verbal hints of data uncertainty (expressing lowered probability) that concern not only the detail statement, but also the global statement of the whole DV. These verbal hints may be found within the global level, or in the surrounding co-text, as it might exist e.g. in a news article (see the grey area in Fig. 1).

Such a separation into two text levels allows for the investigation of whether and how single graphical elements, as well as the visual style of the whole visualization, signal modality.

Although this study focuses on the detail level, because the connecting line constitutes the object of study, it is important to understand this model as a holistic concept. Boeriis claims that the overall modality of a text is a product of all modality profiles on all levels [17]. In other words, modality expressions on different text levels influence each other. However, how exactly this influence takes place and what effect it has on the overall modality profile is not a focus of this study.

Fig. 1. Left: abstract representation of a line graph (black part = global level) and the co-text (grey) within a website; Right: only the detail level.

3.2 Operationalizing the Theory

Based on the two proposed levels for the DV analysis, Table 1 introduces concrete questions for an analysis of modality in DVs as well as the answer options. It should be understood as an extensible proposal that may be adjusted to fit analyses of other semiotic material or other research foci.

Table 1. Questions and answer options for an analysis of modality in DVs.

3.3 Description of the Visual Appearance of Connecting Lines

In Table 2, I suggest a set of visual variables and manifestation categories that can be used when focusing on connecting lines on the detail level. They are based on the system of ʻvisual variablesʼ suggested by Jacques Bertin [21], as well as on the work of other scholars [22,23,24,25,26,27] who developed Bertin’s visual variables further or contributed to a nuanced description of the visual appearance of lines. Figure 2 shows visual examples of each visual variable in Table 2.

Table 2. The visual variables a line can have, and a suggestion of manifestation categories.

Fig. 2. Some examples of each of the visual variables from Table 2.

3.4 How to Identify Visual Indications of Lowered Probability and Reliability

Based on existing literature, we can assume there are three ways to identify visual indications of lowered probability and reliability in DVs. First, some visualization types are specifically developed to represent data uncertainty. Second, users may judge a visual element as an indication of uncertainty based on an analogy to the ʻexperiential worldʼ [12]. Third, the user judgement may be based on criteria for what is real in the coding orientation applied.

Within the field of statistics, error bars and several newer visualization types, like gradient plots, violin plots or fan plots, are designed for indicating data uncertainty [28]. Other visualization types can also express data uncertainty, as is the case e.g. in various kinds of weather forecasts (see Hullman et al. [29] for other examples). However, in this study, which focuses on the semiotic functions of connecting lines, it is most relevant to consider ways to identify signals of lowered probability and reliability on the detail level.

A first hint of potentially signalled lowered probability or reliability (referring to question 5 of Table 1) can be found when the line in focus resembles directly or metaphorically what the uncertainty indicates [30]. Analogies to our ʻexperiential worldʼ [12] can be the reason why certain characteristics are intuitively interpreted as signs of uncertainty. The sketchiness of hand-drawn lines may metaphorically signal uncertainty [30], as may the visual degradation of the line (through blur), since ʻthe harder it is to see …, the more uncertain it appearsʼ [31]. Thus, blurry, sketchy, animated lines or lines with a pattern that leads to interruption (e.g., dashed lines) and lines with certain colour characteristics (e.g., low saturation) can indicate uncertainty [30, 31]. Also, if the visual appearance of the line changes along its length, this may hint at an indication of uncertainty. An analyst has to consider these aspects together with the coding orientation in use.

Given that one needs to apply an abstract coding orientation when analysing a line in a DV, the question to ask is: Is this the most ʻreduced articulationʼ to represent the ʻgeneral patternʼ or ʻthe deeper “essence” of what it depictsʼ [16] or not? Depending on the DV type and context, a line with the characteristics of the 3rd column of Table 2 (a straight, single-coloured, continuous, non-transparent etc. line) is counted as using the most reduced articulation. Whenever a more elaborated visual appearance is used, and other reasons behind this specific visual appearance can be ruled out, the line visually signals lowered probability or reliability. Such reasons can be: a) the intention to differentiate between different categories by different kinds of lines (as seen in Fig. 7); b) the intention to create a certain aesthetic effect, or c) the technical production tools favouring that kind of visual appearance.

In order to differentiate between signalled lowered probability and lowered reliability, it is often helpful to observe clues in the verbal text. If the visually depicted modality represents data uncertainty (and therefore lowered probability), the visual signal will normally be accompanied by explicit verbal clues (e.g., forecast, scenario, 95% confidence). If that is not the case, and yet, the line visually signals some kind of modality, the analyst can conclude that the line signals lowered reliability. This conclusion can be based on the existence of ʻintermodal tensionʼ [32], i.e. that the verbal and the visual modes offer different, incompatible information. Engebretsen also states, that the conventions within ʻgenres focusing on informativity and fact-oriented learning … points [sic] toward a rhetoric of clarity and unambiguousnessʼ [32]. Thus, unclarity and polysemy within visualizations have a negative impact on the reliability. In practical analysis, it can be difficult to judge whether incidents of such tension represent an intended use of modality or an unintended visual uncertainty expression.
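The decision rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the coding scheme used in the study: the names `Modality` and `classify_modality` and the two boolean inputs are hypothetical. The sketch assumes that the analyst has already ruled out other reasons for an elaborated line (category differentiation, aesthetic effect, production tools), and it treats a verbal clue without a matching visual signal as intermodal tension as well, which follows the counting applied in the corpus analysis in Section 4.3.

```python
from enum import Enum


class Modality(Enum):
    """Possible outcomes for a single connecting line on the detail level."""
    NONE = "no modality signalled"
    LOWERED_PROBABILITY = "lowered probability"
    LOWERED_RELIABILITY = "lowered reliability"


def classify_modality(visual_signal: bool, verbal_clue: bool) -> Modality:
    """Classify one line in focus (hypothetical helper).

    visual_signal: the line departs from the most reduced articulation and
                   other reasons for this appearance have been ruled out.
    verbal_clue:   the verbal text explicitly marks data uncertainty
                   (e.g. "forecast", "scenario", "95% confidence").
    """
    if visual_signal and verbal_clue:
        # Visual and verbal modes agree: data uncertainty is communicated.
        return Modality.LOWERED_PROBABILITY
    if visual_signal or verbal_clue:
        # Only one mode signals modality: intermodal tension.
        return Modality.LOWERED_RELIABILITY
    return Modality.NONE


if __name__ == "__main__":
    # A dashed line accompanied by the word "forecast" in the legend.
    print(classify_modality(visual_signal=True, verbal_clue=True).value)
```

The rule is deliberately coarse. As noted above, it cannot tell whether an instance of tension was intended by the designer.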

4 Corpus Analysis

4.1 Data Selection and Database Setup

The method suggested in the previous section was applied to a corpus of 163 DVs. Due to the focus on the connecting line in this study, only the detail level was included in the analysis. The DVs were collected from the winner lists of the 2015, 2016 and 2017 Kantar Information is Beautiful Awards [33] and the Malofiej Awards number 24, 25 and 26 [34]. All DVs but one were targeted at the general public and were published in online news media or other channels of public information. All winners with publicly available digital DVs (at the date of data collection) that contained one or more central DVs with one or more connecting line(s) in the leading role of communicating the DV’s meaning were selected. The result of this filtering process was 163 single DVs stemming from 105 award-winning websites. To establish a stable data basis for the analysis, over 400 screenshots, PDF documents and screencasts were created and organized in a relational database.

Due to the nature of the World Wide Web, it is impossible to claim that this corpus is a representative sample of the whole population of DVs with the characteristics mentioned above. Thus, the results of this analysis can by no means be used to generate valid statements about the whole population. However, this corpus contains a broad variety of DVs produced during the named time frame in the western world, and the results of research based on this material can be seen as a good approximation of how DVs in these countries developed in this specific time frame. Moreover, such awards attract publicity, and these DVs are judged by experts as ʻbest practicesʼ and viewed by a broad audience, including practitioners. Therefore, they are expected to serve as models and to have strong convention-forming abilities.

4.2 Method

Each DV was coded according to the method proposed in Section 3, using a detailed coding scheme. The detailed coding scheme contained the same questions and answer options as those in Table 1, with a description of criteria for choosing each option. Before that, an inter-rater reliability study of a random sample of 25 DVs (approx. 15% of the corpus) was performed for the questions that contain judgement variables. This was necessary in order to ʻestimate how reliable the categorisation (coding) isʼ [35], and therefore to make sure that the stated questions and offered answer categories are precise and adequate. Two coders (a second coder and I) used the same coding instructions and worked independently. From the answers of both coders, Gwet’s AC1 and Gwet’s AC2 coefficients [36] were calculated. Results showed that, for all questions, the coders reached substantial agreement or higher when analysed according to Gwet’s benchmarking method [36]. This level of agreement was deemed sufficient, and the coding method was generally approved.
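For readers unfamiliar with the coefficient, the following sketch shows how Gwet’s AC1 can be computed for two coders and one nominal question. It is an illustration only. The function name `gwet_ac1` and the example ratings are hypothetical and are not the data of this study, and the article does not state which software was used for the actual calculation. The sketch uses the standard two-rater definition, AC1 = (p_a − p_e) / (1 − p_e), where p_a is the observed share of agreement and p_e is the chance agreement derived from the average share of ratings per answer option. AC2, the weighted variant used for ordered answer options, is not covered.

```python
from collections import Counter
from typing import Hashable, Sequence


def gwet_ac1(rater_a: Sequence[Hashable],
             rater_b: Sequence[Hashable],
             categories: Sequence[Hashable]) -> float:
    """Gwet's AC1 for two raters and nominal categories (illustrative)."""
    n = len(rater_a)
    if n == 0 or n != len(rater_b):
        raise ValueError("both raters must code the same, non-empty set of items")
    q = len(categories)
    if q < 2:
        raise ValueError("AC1 needs at least two answer options")
    if not set(rater_a) | set(rater_b) <= set(categories):
        raise ValueError("ratings must come from the listed categories")

    # Observed agreement: share of items coded identically by both raters.
    p_a = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # pi_k: average share of ratings that fall into category k.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    shares = [(counts_a[k] + counts_b[k]) / (2 * n) for k in categories]

    # Chance agreement as defined for AC1.
    p_e = sum(pi * (1 - pi) for pi in shares) / (q - 1)

    return (p_a - p_e) / (1 - p_e)


# Hypothetical codings of five DVs for one yes/no question.
coder_1 = ["yes", "yes", "no", "no", "yes"]
coder_2 = ["yes", "no", "no", "no", "yes"]
ac1 = gwet_ac1(coder_1, coder_2, categories=["yes", "no"])
print(f"AC1 = {ac1:.2f}")
```

In this toy example the coders agree on four of five items, so p_a = 0.8, and both answer options have an average share of 0.5, so p_e = 0.5 and AC1 = 0.6.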

However, follow-up discussions between the two raters after the pre-test and also during the start of the single-coded analysis revealed that a few small adjustments of the coding scheme would still improve the rating process. Following an iterative method, these changes were made, resulting in the final coding scheme, which was then applied to the whole corpus. In instances of doubt, the second rater of the inter-rater reliability test was contacted to discuss the final codings. The (single-coded) analysis of the whole corpus then made it possible to generate frequency counts of whether and how modality is signalled with connecting lines in this corpus.

4.3 Analysis Findings

This section presents the results of the analysis on the detail level, using questions 1 to 10 in Table 1. Due to the selection criteria for this corpus, the main statement in each DV is represented through graphical lines. For each DV, only one line is in focus in the analysis. For all except two of the 163 lines in focus, an abstract coding orientation needs to be applied. For the remaining two, a naturalistic coding orientation is the most suitable.

As shown in Fig. 3, the connecting lines in focus in 26 (18 + 8) of the 163 DVs are found to indicate modality (lowered probability or lowered reliability) visually. Within 33 (15 + 18) DVs, it is explicitly stated verbally that data uncertainty is represented within the detail statement represented through the focused connecting line. However, in only 18 DVs is modality signalled both visually through the connecting line in focus and through a corresponding verbal clue for data uncertainty. These 18 lines are therefore considered to signal lowered probability, while the reliability is not reduced. Under the earlier presented assumption that intermodal tension causes lowered reliability, this means that, on the detail level, the focused lines of 23 DVs (8 + 15) are found to be involved in an instance of lowered reliability.
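The arithmetic behind these counts can be retraced with a short script. The three base counts are taken from the paragraph above. The variable names are merely illustrative, and the number of DVs without any modality signal is derived here by subtraction rather than reported in the text.

```python
# Counts of lines in focus, as reported for Fig. 3.
TOTAL_DVS = 163
visual_and_verbal = 18  # visual signal plus verbal clue: lowered probability
visual_only = 8         # visual signal without verbal clue
verbal_only = 15        # verbal clue without visual signal

visual_total = visual_and_verbal + visual_only                 # 26
verbal_total = visual_and_verbal + verbal_only                 # 33
any_modality = visual_and_verbal + visual_only + verbal_only   # 41
intermodal_tension = visual_only + verbal_only                 # 23
no_modality = TOTAL_DVS - any_modality                         # 122 (derived)

print(f"visually signalled:  {visual_total}")
print(f"verbally signalled:  {verbal_total}")
print(f"any modality signal: {any_modality} of {TOTAL_DVS}")
print(f"intermodal tension:  {intermodal_tension} of {any_modality}")
```

Note that the 26 visual and 33 verbal cases overlap in 18 DVs, which is why the number of DVs with any modality signal is 41 and not 59.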

Of the 41 DVs that verbally and/or visually signal modality on the detail level, 68% are either route maps (41%) or line graphs (27%). The high occurrence of these two DV types also reflects the fact that they are the most common types in this corpus (36% route maps, 23% line graphs).

Fig. 3. Distribution of visually and/or verbally signalled modality.

I will now look at which visual variables of the connecting lines signal which kind of modality. When the lines in focus signal lowered probability (based on data uncertainty), different manifestation categories of the visual variable pattern were the most commonly used, especially those with pattern changes (see column three of Table 3). Changes between a continuous line and large interruption(s) and between a continuous line and a dashed/dotted line are used 6 and 5 times, respectively. Figure 4 presents an example of the latter. However, other visual characteristics, namely transparency, lowered crispness, colour variations, inconsistent line pressure, three or more forces (curved) and dynamics in size, are also used for that purpose.

Table 3. Distribution of visual characteristics of connecting lines used to signal modality (lowered probability and reliability) and the distribution of the same characteristics being used for other purposes. Note that in some DVs, several visual characteristics are used simultaneously to signal modality. N for each line in this table is 163 connecting lines from 163 DVs.

Fig. 4. Screenshot from A timeline of earth’s average temperature, indicating data uncertainty by pattern change to a dashed line. © Randall Munroe [37]. Distributed under CC BY-NC 2.5.

When we look at how lowered reliability is signalled by the focused connecting lines (see column four of Table 3), the results reveal that the visual variable pattern does not play such a prominent role. The pattern and the curviness of the lines signal lowered reliability three times. However, dynamics (in size, orientation and position) and transparency are also found once each.

An example of a visualization where curvature signals lowered reliability can be found in An interactive visualization of every line in Hamilton [38, see Fig. 5]. Here, the semiotic motivation behind some connecting lines being curved, while others are straight, is not clear. Because the DV does not use the most reduced articulation possible (while applying an abstract coding orientation), it is rated as expressing lowered reliability.

As shown in Table 3, most of the visual characteristics of lines used to express modality are not used exclusively for that purpose. Column five shows how many times the visual characteristics highlighted in columns three and four are used for other purposes. For instance, in the visualization The Stories Behind a Line [39], different categories of transport means are visualized through different dashed/dotted lines (see Fig. 7). Another example of dashed lines not signalling modality is found in Syrian war explained in 5 min [40: 5:00, see Fig. 6]. Here, the animated dashes iconically represent moving bombs.

Fig. 5. Screenshot from An interactive visualization of every line in Hamilton, where curvature indicates lowered reliability. © Shirley Wu [38]. Reproduced with permission. Photos are blurred for copyright reasons.

Fig. 6. Abstract representation of a film frame of Syrian war explained in 5 min [40: 5:00]. The dashes move towards the square field named ʻrebelsʼ, iconically representing moving bombs.

Fig. 7. Screenshot of the legend of The Stories Behind a Line, using interrupted lines for different categories of transport means. © Federica Fragapane, designed in collaboration with Alex Piacentini [39]. Reproduced with permission.

4.4 Limitations of the Results

Because most of the DVs were only single-coded and some questions contain judgement variables, it has to be kept in mind that my cultural background and previous knowledge might have influenced the interpretation. To counter this, the coding instructions were made as detailed as possible and strictly followed, and the inter-rater reliability study was performed.

Moreover, since only one connecting line was in focus on the detail level of each DV, even though some DVs contained several connecting lines, it is possible that the results would have been different if I had chosen to focus on other lines. Therefore, I have been careful, when reporting these numbers, to refer only to the connecting lines ʻin focusʼ, not to all connecting lines in the material.

4.5 Implications and Conclusion

Within this corpus of 163 DVs, out of the 41 visualizations indicating some kind of modality on the detail level, 23 exhibit cases of intermodal tension. This number indicates that intermodal tension, meaning that the verbal and the visual resources offer conflicting signals, is fairly common in this field of DV-based communication. One implication of this finding is that the potential for DV designers to avoid unintended ambiguity by giving more attention to multimodal coherence is high.

The results further indicate a convention that pattern change is well suited for visually signalling data uncertainty, corresponding to the modality category lowered probability. There may be several reasons why pattern change, in the shaping of connecting lines in DVs, is emerging as a conventionalized signal of modality. First, it must be assumed that pattern change potentially signals modality based on an analogy to the ʻexperiential worldʼ [12]. Furthermore, the use of patterns, or larger interruptions, is not expected as a typical line form in any DV type (unlike e.g. curvature, which is common in spline graphs), so such characteristics are free to be used as modality markers. Moreover, it is technically easy with most design tools to apply different patterns to a graphical line (unlike e.g. dynamics). Last, patterns can also be used in two-coloured DVs, and they can be printed and drawn by analogue means, which points to a long history of application. For signalling lowered reliability, however, no such convention was found, as the results show a more varied and unsystematic use of characteristics indicating this kind of modality.

Summing up, the study reveals that various visual characteristics of connecting lines are used to signal modality in this corpus of award-winning DVs. However, pattern change is used more often than any of the other variables found in the corpus. Due to the relatively low number of observations in this corpus, it is impossible to provide practitioners with a simple recipe for what visual clues are most effectively applied to signal modality in DVs. Nonetheless, the results provide an overview of the current practices in using lines for indicating modality, which is helpful for practitioners to make informed design decisions.

5 Further Research

In this article, a method for analysing modality in DVs is presented, based on a body of pre-existing theory and terminology around modality and uncertainty. A newly collected corpus of digital DVs is analysed with the suggested method, offering detailed knowledge about how certain visual characteristics of the graphical line are used for signalling modality. The findings indicate certain conventions regarding the semiotic potential of the graphical line in relation to modality. Such insights are valuable both for designers and scholars in relevant fields, as they contribute to the colouring of some of the white spots on the map of a graphical language still in the making. However, more empirical research is needed in order to draw a more detailed and reliable map of the field of multimodal modality. The findings, as well as the methodology presented in this study, will hopefully contribute to this future work.