Computers & Graphics

Volume 76, November 2018, Pages 1-17
Parallel hierarchies: A visualization for cross-tabulating hierarchical categories

https://doi.org/10.1016/j.cag.2018.07.009

Highlights

  • Parallel Hierarchies visualize hierarchical categorical data for cross-tabulation.

  • They use Icicle Plots as axes whose pairwise frequency counts are shown as ribbons.

  • Suitable axes orders for different tasks and clutter reduction methods are proposed.

  • Use cases in the domains of demography and biology are reported on and discussed.

  • A study shows Parallel Hierarchies’ potential for breaking down complex analyses.

Abstract

The visualization of categorical datasets is an open field of research. While a number of standard diagramming techniques exist to investigate data distributions across multiple properties, these are rarely geared to take advantage of additional data properties – either given or derived. As a result, the data display is not as expressive as it could be when incorporating these properties, and it misses out on the potential of leveraging these properties for the data’s interactive exploration. In this paper, we present the visualization technique Parallel Hierarchies that is specifically tailored to take hierarchical categorizations into account. With Parallel Hierarchies, it is possible to individually adjust the desired level of detail for each categorical data property through drill-down and roll-up operations. This enables the analyst to selectively change levels of detail as the data analysis progresses and new questions arise. We illustrate the utility of Parallel Hierarchies with a demographic and a biological use case, and we report on a qualitative user study evaluating this visualization technique in an industrial scenario.

Introduction

Many aspects of our daily lives are hierarchically categorized: the jobs we perform are specified using the Standard Occupational Classification (SOC) hierarchy [1], the books we borrow from the library are organized by the Dewey Decimal Classification (DDC) [2], the illnesses we get are catalogued in the International Statistical Classification of Diseases and Related Health Problems (ICD-10) [3], and so on. These hierarchical categorizations become particularly interesting when they are applied to the same set of individuals or items, as this enables us to systematically explore dependencies or cross-correlations between them. For example, people in certain occupations may be more likely to get certain health problems, and people with particular health problems may be more likely to read books on specific self-help topics, and vice versa. Particularly when these dependencies are not yet known, interactively exploring different hierarchies in various combinations and at different levels of detail can reveal unexpectedly high or low numbers (so-called frequency counts) between categories from different hierarchies.

Data visualization can enable such an interactive exploration of how numerical quantities distribute across multiple hierarchies. For relating hierarchical data to each other, the most common visualization approach is to draw different hierarchies side by side and to connect them with visual links [4]. For lack of a name for this type of visualization, it has been described “as what Parallel Coordinates would resemble if the axes were hierarchical in nature” [4, p.10]. Commonly, this approach is focused on structural comparisons between similar hierarchies, such as showing the overlap between them, or determining which nodes have been added, removed, or changed with each version of a hierarchy [5], [6]. Yet for quantitative comparisons between entirely different hierarchies, this type of visualization has never actually been introduced, its design implications have never been discussed, and the resulting representations have never been evaluated.

This paper sets out to change this by providing a thorough description of this type of visualization, which we call Parallel Hierarchies and which is illustrated in Fig. 1. By contributing the necessary details to decide when, in which way, and to what end to use Parallel Hierarchies, this paper provides a point of reference for future implementations, derivations, and applications of this visualization technique. This contribution breaks down into three aspects:

  • a formulation of the data analysis problem addressed by Parallel Hierarchies in Section 2 and an overview of related work and visually similar techniques pertaining to Parallel Hierarchies in Section 3;

  • a description of the visual and interactive design aspects of Parallel Hierarchies in Section 4, as well as a discussion of common layout issues and limitations, and possible remedies for them in Section 7;

  • use case examples illustrating how to apply Parallel Hierarchies in practice in Section 5 and a qualitative evaluation highlighting some user responses and observed usage patterns in Section 6.

In addition, we made the JavaScript/D3-based source code of Parallel Hierarchies freely available under an Apache 2.0 license at https://parallelhierarchies.github.io.

Section snippets

The hierarchical cross-tabulation problem

Parallel Hierarchies are a visual-interactive solution to the problem of cross-tabulating numerical aggregates over hierarchical categories. This section unpacks and describes this problem by breaking it down into the properties of the input data (Which data is to be processed and shown by the visualization?) and the necessary affordances of the visual output (Which actions must be possible to perform on the visualization?). We exemplify both using the 1990 US Census data, of which 1% and 5%

Related work

From a data perspective, visualization techniques tailored to categorical datasets with additional properties have been presented in various contexts. These techniques range from time-oriented categorical data (e.g., to study patient data over time [14], [15]) to geospatial categorical data (e.g., to study election results [16], [17]).

From a visual perspective, visualizations in the style of Parallel Coordinates [18], [19], [20] using interconnected parallel axes have been

The parallel hierarchies technique

Parallel Hierarchies is designed specifically to (1) navigate multiple hierarchies defined over the categorical data properties to find suitable aggregation levels, (2) cross-tabulate pairs of categorical data properties at their respective aggregation level, and (3) switch effortlessly between the two. Together with common guidelines for designing categorical displays [43], [44], [45], these specifications informed our visualization design.
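The three requirements above can be made concrete with a small sketch. It is not taken from the published source code: the record layout, the helper names (`makeAxis`, `drillDown`, `rollUp`, `categoryOn`, `crossTabulate`), and the sample data are all hypothetical. It also assumes that drilling into a category restricts the axis to records inside that category's subtree, which the text above does not specify.

```javascript
// Hypothetical sample data: each property stores the path from the
// top-level category down to the leaf category of that record.
const records = [
  { occupation: ["Service", "Food", "Cook"], industry: ["Retail", "Restaurants"] },
  { occupation: ["Service", "Food", "Waiter"], industry: ["Retail", "Restaurants"] },
  { occupation: ["Service", "Cleaning", "Janitor"], industry: ["Education", "Schools"] },
  { occupation: ["Technical", "Engineering", "Civil engineer"], industry: ["Construction", "Roads"] },
  { occupation: ["Technical", "Engineering", "Software engineer"], industry: ["Education", "Universities"] },
];

// An axis shows one property and remembers the path drilled into so far.
function makeAxis(property) {
  return { property, active: [] };
}

// (1) Navigation: drill-down appends a category, roll-up removes the last one.
function drillDown(axis, category) {
  return { ...axis, active: [...axis.active, category] };
}
function rollUp(axis) {
  return { ...axis, active: axis.active.slice(0, -1) };
}

// Category of a record at the axis' current aggregation level, or null if
// the record lies outside the active subtree (assumption: such records are
// left out) or has no category below the active one.
function categoryOn(axis, record) {
  const path = record[axis.property];
  const inside = axis.active.every((name, i) => path[i] === name);
  if (!inside || path.length <= axis.active.length) return null;
  return path[axis.active.length];
}

// (2) Cross-tabulation: frequency counts for every pair of categories on
// two axes, each axis at its own aggregation level.
function crossTabulate(left, right, data) {
  const counts = new Map();
  for (const record of data) {
    const a = categoryOn(left, record);
    const b = categoryOn(right, record);
    if (a === null || b === null) continue;
    const key = `${a} | ${b}`;
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return counts;
}

// (3) Switching between the two: navigate, then re-tabulate.
const occupationTop = makeAxis("occupation");
const industryTop = makeAxis("industry");
const top = crossTabulate(occupationTop, industryTop, records);
// top.get("Service | Retail") is 2

const occupationService = drillDown(occupationTop, "Service");
const detail = crossTabulate(occupationService, industryTop, records);
// detail holds "Food | Retail" -> 2 and "Cleaning | Education" -> 1
```

In a visualization, each entry of the resulting map would correspond to one ribbon between two neighboring axes, with the count determining its width. The axes are treated as immutable values here so that a roll-up simply restores an earlier state.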

Parallel Hierarchies feature an arrangement of

Applying parallel hierarchies

The following describes two use cases in which we applied Parallel Hierarchies – one use case looking at demographic data from the US, and the other use case dealing with genome data of yeast. These two examples are meant to give a first impression of Parallel Hierarchies in action in two very different fields.

Evaluating parallel hierarchies

Our particular realization of Parallel Hierarchies, as it was described in the previous section, was initially developed as an interactive visual analysis technique in the domain of product costing. Product costing involves analyses where one wants to break down the overall costs of a product along various aspects, such as cost types (e.g., labor, materials, patent fees, and taxes) and product components (e.g., frame, tires, electronics, engine, and seats) to find cost drivers and thus

Scalability considerations

Parallel Hierarchies have been designed with scalability in mind, which is realized by means of the simplified Icicle plots that reduce the display of a full hierarchy to only the ancestors of an active subcategory and its immediate children. This design decision alone leads to a clearer view of the currently focused categories, as one can see by comparing Figs. 1 and 12, where the latter illustrates what Fig. 1 would have looked like without the simplified Icicle plots. Yet
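The reduction performed by the simplified Icicle plots can be sketched as follows. This is a minimal illustration rather than the published implementation: the nested `{ name, children }` tree, the function name `simplifiedIcicle`, and the sample hierarchy are hypothetical, and the sketch assumes that the root and the active subcategory itself are displayed along with the ancestors.

```javascript
// Hypothetical hierarchy given as nested { name, children } objects.
const tree = {
  name: "All occupations",
  children: [
    {
      name: "Service",
      children: [
        { name: "Food", children: [{ name: "Cook", children: [] }, { name: "Waiter", children: [] }] },
        { name: "Cleaning", children: [{ name: "Janitor", children: [] }] },
      ],
    },
    { name: "Technical", children: [{ name: "Engineering", children: [] }] },
  ],
};

// Returns one row of labels per Icicle level: a single node per level along
// the active path (the ancestors and the active subcategory), followed by
// one row with the immediate children of the active subcategory.
function simplifiedIcicle(root, activePath) {
  const rows = [[root.name]];
  let node = root;
  for (const name of activePath) {
    const next = node.children.find((child) => child.name === name);
    if (!next) throw new Error(`No category "${name}" below "${node.name}"`);
    rows.push([next.name]);
    node = next;
  }
  if (node.children.length > 0) {
    rows.push(node.children.map((child) => child.name));
  }
  return rows;
}

const overview = simplifiedIcicle(tree, []);
// [["All occupations"], ["Service", "Technical"]]

const focused = simplifiedIcicle(tree, ["Service"]);
// [["All occupations"], ["Service"], ["Food", "Cleaning"]]
```

Siblings of the ancestors (here "Technical" once "Service" is active) drop out of the result, so the number of displayed nodes grows with the depth of the active path and the fan-out of one node instead of with the size of the whole hierarchy.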

Summary and outlook

Parallel Hierarchies present a unique way of displaying and exploring categorical aggregates. Their combination of tree visualization elements and set visualization elements in the same display space allows for a rich interaction with hierarchical and categorical aspects of the data at the same time. This interaction can be utilized for a variety of analysis goals: to drill down into large datasets to find data items with particular characteristics, to identify data items that contribute most or

Acknowledgments

The authors thank Jochen Rode, Stefan Hesse, Martin Luboschik, and the anonymous reviewers for their support and suggestions that helped in shaping this paper into its final form. Thanks also to Isa Usmanov for implementing the first prototype and to Esther Lapczyna for advising us on graphic design aspects of Parallel Hierarchies. Furthermore, we thank all attendees of the SAP PLC co-development workshops, the biologists at the CZ Biohub, and the demographers from the Max Planck Institute for

References (92)

  • A. Telea et al.

    Code flows: visualizing structural evolution of source code

    Comput Graph Forum

    (2008)
  • S. Johansson et al.

    Interactive quantification of categorical variables in mixed data sets

    Proceedings of the international conference information visualisation (IV’08)

    (2008)
  • V. Ahl et al.

    Hierarchy theory: a vision, vocabulary, and epistemology

    (1996)
  • G.W. Furnas et al.

    Multitrees: enriching and reusing hierarchical structure

    Proceedings of the ACM SIGCHI conference on human factors in computing systems (CHI’94)

    (1994)
  • G. Robertson et al.

    Polyarchy visualization: visualizing multiple intersecting hierarchies

    Proceedings of the ACM SIGCHI conference on human factors in computing systems (CHI’02)

    (2002)
  • N. Elmqvist et al.

    Hierarchical aggregation for information visualization: overview, techniques, and design guidelines

    IEEE Trans Vis Comput Graph

    (2010)
  • M. Monroe et al.

    Temporal event sequence simplification

    IEEE Trans Vis Comput Graph

    (2013)
  • S. Malik et al.

    Cohort comparison of event sequences with balanced integration of visual analytics and statistics

    Proceedings of the international conference on intelligent user interfaces (IUI’15)

    (2015)
  • F. Stoffel et al.

    Proportions in categorical and geographic data: Visualizing the results of political elections

    Proceedings of the international working conference on advanced visual interfaces (AVI’12)

    (2012)
  • H.-J. Schulz et al.

    A visualization approach for cross-level exploration of spatiotemporal data

    Proceedings of the international conference on knowledge management and knowledge technologies (i-Know’13)

    (2013)
  • A. Inselberg

    Parallel coordinates: visual multidimensional geometry and its applications

    (2009)
  • J. Heinrich et al.

    State of the art of parallel coordinates

    Proceedings of the Eurographics 2013 - state of the art reports

    (2013)
  • J. Johansson et al.

    Evaluation of parallel coordinates: overview, categorization and guidelines for future research

    IEEE Trans Vis Comput Graph

    (2016)
  • C. Collins et al.

    Parallel tag clouds to explore and analyze faceted text corpora

    Proceedings of the IEEE symposium on visual analytics science and technology (VAST’09)

    (2009)
  • J. Johansson et al.

    Depth cues and density in temporal parallel coordinates

    Proceedings of the Eurographics/ IEEE-VGTC symposium on visualization (EuroVis’07)

    (2007)
  • S. Ghani et al.

    Visual analytics for multimodal social network analysis: A design study with social scientists

    IEEE Trans Vis Comput Graph

    (2013)
  • W. Freiler et al.

    Interactive visual analysis of set-typed data

    IEEE Trans Vis Comput Graph

    (2008)
  • B. Alsallakh et al.

    The state-of-the-art of set visualization

    Comput Graph Forum

    (2016)
  • A. Rusu

    Tree drawing algorithms

  • H.-J. Schulz

    Treevis.net: a tree visualization reference

    IEEE Comput Graph Appl

    (2011)
  • M. Friendly

    Visualizing categorical data

    (2000)
  • F. Bendix et al.

    Parallel sets: visual analysis of categorical data

    Proceedings of the IEEE symposium on information visualization (InfoVis’05)

    (2005)
  • R. Kosara et al.

    Parallel sets: interactive exploration and visual analysis of categorical data

    IEEE Trans Vis Comput Graph

    (2006)
  • W.C. Brinton

    Graphic presentation

    (1939)
  • J. Strickland

    Cosmograph? What’s a Cosmograph?

    Comput Hist Mus Volunteer Inf Exch

    (2012)
  • K.S. Candan et al.

    PhC: multiresolution visualization and exploration of text corpora with parallel hierarchical coordinates

    ACM Trans Intell Syst Technol

    (2012)
  • Y.-H. Fua et al.

    Hierarchical parallel coordinates for exploration of large datasets

    Proceedings of the IEEE conference on visualization (Vis’99)

    (1999)
  • E.N. Argyriou et al.

    A fraud detection visualization system utilizing radial drawings and heat-maps

    Proceedings of the international conference on information visualization theory and applications (IVAPP’14)

    (2014)
  • A. Arleo et al.

    GiViP: a visual profiler for distributed graph processing systems

    Proceedings of the international symposium on graph drawing and network visualization (GD’17)

    (2018)
  • J. Stasko et al.

    Focus+context display and navigation techniques for enhancing radial, space-filling hierarchy visualizations

    Proceedings of the IEEE symposium on information visualization (InfoVis’00)

    (2000)
  • B. Alsallakh et al.

    Reinventing the contingency wheel: scalable visual analytics of large categorical data

    IEEE Trans Vis Comput Graph

    (2012)
  • D. Holten

    Hierarchical edge bundles: visualization of adjacency relations in hierarchical data

    IEEE Trans Vis Comput Graph

    (2006)
  • T.A. Keahey et al.

    Generating an outside-in hierarchical tree visualization

    Patent application US...
  • V. Guchev et al.

    Design guidelines for correlated quantitative data visualizations

    Proceedings of the international working conference on advanced visual interfaces (AVI’12)

    (2012)

This article was recommended for publication by Tobias Isenberg.