Parallel hierarchies: A visualization for cross-tabulating hierarchical categories
Introduction
Many aspects of our daily lives are hierarchically categorized: the jobs we perform are specified using the Standard Occupational Classification (SOC) hierarchy [1], the books we borrow from the library are organized by the Dewey Decimal Classification (DDC) [2], the illnesses we get are catalogued in the International Statistical Classification of Diseases and Related Health Problems (ICD-10) [3], and so on. These hierarchical categorizations become particularly interesting when they are applied to the same set of individuals or items, as this enables us to systematically explore dependencies or cross-correlations between them. For example, people in certain occupations may be more likely to get certain health problems, and people with particular health problems may be more likely to read books on specific self-help topics, and vice versa. Particularly when these dependencies are not yet known, interactively exploring different hierarchies in various combinations and relations to each other, and at different levels of detail, can reveal unexpectedly high or low numbers (so-called frequency counts) between categories from different hierarchies.
Data visualization can enable such an interactive exploration of how numerical quantities distribute across multiple hierarchies. For relating hierarchical data to each other, the most common visualization approach is to draw different hierarchies side by side and to connect them with visual links [4]. Lacking a name, this type of visualization has been described “as what Parallel Coordinates would resemble if the axes were hierarchical in nature” [4, p.10]. Commonly, this approach is focused on structural comparisons between similar hierarchies, such as showing the overlap between them, or determining which nodes have been added, removed, or changed with each version of a hierarchy [5], [6]. Yet for quantitative comparisons between entirely different hierarchies, this type of visualization has never actually been introduced, its design implications have never been discussed, and the resulting representations have never been evaluated.
This paper sets out to change this by providing a thorough description of this type of visualization, which we call Parallel Hierarchies and which is illustrated in Fig. 1. By contributing the details necessary to decide when, in which way, and to what end to use Parallel Hierarchies, this paper provides a point of reference for future implementations, derivations, and applications of this visualization technique. This contribution breaks down into three aspects:
- a formulation of the data analysis problem addressed by Parallel Hierarchies in Section 2 and an overview of related work and visually similar techniques pertaining to Parallel Hierarchies in Section 3;
- a description of the visual and interactive design aspects of Parallel Hierarchies in Section 4, as well as a discussion of common layout issues, limitations, and possible remedies for them in Section 7;
- use case examples illustrating how to apply Parallel Hierarchies in practice in Section 5 and a qualitative evaluation highlighting some user responses and observed usage patterns in Section 6.
In addition, we made the JavaScript/D3-based source code of Parallel Hierarchies freely available under an Apache 2.0 license at https://parallelhierarchies.github.io.
Section snippets
The hierarchical cross-tabulation problem
Parallel Hierarchies are a visual-interactive solution to the problem of cross-tabulating numerical aggregates over hierarchical categories. This section unpacks and describes this problem by breaking it down into the properties of the input data (Which data is to be processed and shown by the visualization?) and the necessary affordances of the visual output (Which actions must be possible to perform on the visualization?). We exemplify both using the 1990 US Census data, of which 1% and 5%
Related work
From a data perspective, visualization techniques tailored to categorical datasets with additional properties have been presented in various contexts. These techniques range from time-oriented categorical data (e.g., to study patient data over time [14], [15]) to geospatial categorical data (e.g., to study election results [16], [17]).
From a visual perspective, visualizations in the style of Parallel Coordinates [18], [19], [20] using interconnected parallel axes have been
The parallel hierarchies technique
Parallel Hierarchies is designed specifically to (1) navigate multiple hierarchies defined over the categorical data properties to find suitable aggregation levels, (2) cross-tabulate pairs of categorical data properties at their respective aggregation level, and (3) switch effortlessly between the two. Together with common guidelines for designing categorical displays [43], [44], [45], these specifications informed our visualization design.
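To make goals (1) and (2) concrete, the following is a minimal Python sketch of cross-tabulating two hierarchical properties at chosen aggregation levels. It is not taken from the authors' JavaScript/D3 implementation: the toy hierarchies, the records, and the names `aggregate` and `cross_tabulate` are all hypothetical and serve only as illustration.

```python
from collections import Counter

# Hypothetical toy hierarchies: each leaf category maps to its path from the top level down.
OCCUPATION = {
    "nurse": ("health", "nurse"),
    "surgeon": ("health", "surgeon"),
    "teacher": ("education", "teacher"),
}
AGE = {
    "20-29": ("young", "20-29"),
    "30-39": ("young", "30-39"),
    "60-69": ("old", "60-69"),
}

# Hypothetical data items: (occupation leaf, age leaf, frequency count).
RECORDS = [
    ("nurse", "20-29", 3),
    ("nurse", "60-69", 1),
    ("surgeon", "30-39", 2),
    ("teacher", "20-29", 4),
]


def aggregate(leaf, hierarchy, level):
    """Return the ancestor of `leaf` at depth `level` (0 = top level).

    Levels deeper than the leaf's own path are clamped to the leaf itself.
    """
    path = hierarchy[leaf]
    return path[min(level, len(path) - 1)]


def cross_tabulate(records, level_a, level_b):
    """Sum frequency counts for each pair of categories at the given levels."""
    counts = Counter()
    for occupation, age, weight in records:
        key = (
            aggregate(occupation, OCCUPATION, level_a),
            aggregate(age, AGE, level_b),
        )
        counts[key] += weight
    return counts


# Coarse view: both properties aggregated to their top level.
coarse = cross_tabulate(RECORDS, 0, 0)
# Drill-down: occupation at leaf level, age still at the top level.
fine = cross_tabulate(RECORDS, 1, 0)
```

Changing only the level arguments mimics switching between aggregation levels while keeping the cross-tabulation itself unchanged; the total count is the same at every level because each record contributes to exactly one cell.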
Parallel Hierarchies feature an arrangement of
Applying parallel hierarchies
The following describes two use cases in which we applied Parallel Hierarchies: one looking at demographic data from the US, and the other dealing with genome data of yeast. These two examples are meant to provide a first impression of Parallel Hierarchies in action in two very different fields.
Evaluating parallel hierarchies
Our particular realization of Parallel Hierarchies, as it was described in the previous section, was initially developed as an interactive visual analysis technique in the domain of product costing. Product costing involves analyses where one wants to break down the overall costs of a product along various aspects, such as cost types (e.g., labor, materials, patent fees, and taxes) and product components (e.g., frame, tires, electronics, engine, and seats) to find cost drivers and thus
Scalability considerations
Parallel Hierarchies have been designed with scalability in mind. This is realized by means of simplified Icicle plots, which reduce the display of a full hierarchy to only the ancestors of an active subcategory and its immediate children. This design decision alone leads to a clearer view of the currently focused categories, as one can see by comparing Figs. 1 and 12, where the latter illustrates what Fig. 1 would have looked like without the simplified Icicle plots. Yet
Summary and outlook
Parallel Hierarchies present a unique way of displaying and exploring categorical aggregates. Their combination of tree visualization elements and set visualization elements in the same display space allows for rich interaction with hierarchical and categorical aspects of the data at the same time. This interaction can be utilized for a variety of analysis goals: to drill down into large datasets to find data items with particular characteristics, to identify data items that contribute most or
Acknowledgments
The authors thank Jochen Rode, Stefan Hesse, Martin Luboschik, and the anonymous reviewers for their support and suggestions that helped in shaping this paper into its final form. Thanks also to Isa Usmanov for implementing the first prototype and to Esther Lapczyna for advising us on graphic design aspects of Parallel Hierarchies. Furthermore, we thank all attendees of the SAP PLC co-development workshops, the biologists at the CZ Biohub, and the demographers from the Max Planck Institute for
References (92)
- et al. Rock: a robust clustering algorithm for categorical attributes. Inf Syst (2000)
- et al. A novel virtual node approach for interactive visual analytics of big datasets in parallel coordinates. Fut Gen Comput Syst (2016)
- et al. Escaping RGBland: selecting colors for statistical graphics. Comput Stat Data Anal (2009)
- et al. Interacting with parallel coordinates. Interact Comput (2006)
- et al. Drawing graphs in two layers. Theor Comput Sci (1994)
- Standard occupational classification manual (2018)
- et al. Dewey decimal classification: principles and application (2003)
- International statistical classification of diseases and related health problems, 10th revision (2016)
- et al. A survey of multiple tree visualisation. Inf Vis (2010)
- et al. Visual comparison of hierarchically organized data. Comput Graph Forum (2008)
- Code flows: visualizing structural evolution of source code. Comput Graph Forum
- Interactive quantification of categorical variables in mixed data sets. Proceedings of the international conference information visualisation (IV’08)
- Hierarchy theory: a vision, vocabulary, and epistemology
- Multitrees: enriching and reusing hierarchical structure. Proceedings of the ACM SIGCHI conference on human factors in computing systems (CHI’94)
- Polyarchy visualization: visualizing multiple intersecting hierarchies. Proceedings of the ACM SIGCHI conference on human factors in computing systems (CHI’02)
- Hierarchical aggregation for information visualization: overview, techniques, and design guidelines. IEEE Trans Vis Comput Graph
- Temporal event sequence simplification. IEEE Trans Vis Comput Graph
- Cohort comparison of event sequences with balanced integration of visual analytics and statistics. Proceedings of the international conference on intelligent user interfaces (IUI’15)
- Proportions in categorical and geographic data: visualizing the results of political elections. Proceedings of the international working conference on advanced visual interfaces (AVI’12)
- A visualization approach for cross-level exploration of spatiotemporal data. Proceedings of the international conference on knowledge management and knowledge technologies (i-Know’13)
- Parallel coordinates: visual multidimensional geometry and its applications
- State of the art of parallel coordinates. Proceedings of the Eurographics 2013 - state of the art reports
- Evaluation of parallel coordinates: overview, categorization and guidelines for future research. IEEE Trans Vis Comput Graph
- Parallel tag clouds to explore and analyze faceted text corpora. Proceedings of the IEEE symposium on visual analytics science and technology (VAST’09)
- Depth cues and density in temporal parallel coordinates. Proceedings of the Eurographics/IEEE-VGTC symposium on visualization (EuroVis’07)
- Visual analytics for multimodal social network analysis: a design study with social scientists. IEEE Trans Vis Comput Graph
- Interactive visual analysis of set-typed data. IEEE Trans Vis Comput Graph
- The state-of-the-art of set visualization. Comput Graph Forum
- Tree drawing algorithms
- Treevis.net: a tree visualization reference. IEEE Comput Graph Appl
- Visualizing categorical data
- Parallel sets: visual analysis of categorical data. Proceedings of the IEEE symposium on information visualization (InfoVis’05)
- Parallel sets: interactive exploration and visual analysis of categorical data. IEEE Trans Vis Comput Graph
- Graphic presentation
- Cosmograph? What’s a Cosmograph? Comput Hist Mus Volunteer Inf Exch
- PhC: multiresolution visualization and exploration of text corpora with parallel hierarchical coordinates. ACM Trans Intell Syst Technol
- Hierarchical parallel coordinates for exploration of large datasets. Proceedings of the IEEE conference on visualization (Vis’99)
- A fraud detection visualization system utilizing radial drawings and heat-maps. Proceedings of the international conference on information visualization theory and applications (IVAPP’14)
- GiViP: a visual profiler for distributed graph processing systems. Proceedings of the international symposium on graph drawing and network visualization (GD’17)
- Focus+context display and navigation techniques for enhancing radial, space-filling hierarchy visualizations. Proceedings of the IEEE symposium on information visualization (InfoVis’00)
- Reinventing the contingency wheel: scalable visual analytics of large categorical data. IEEE Trans Vis Comput Graph
- Hierarchical edge bundles: visualization of adjacency relations in hierarchical data. IEEE Trans Vis Comput Graph
- Design guidelines for correlated quantitative data visualizations. Proceedings of the international working conference on advanced visual interfaces (AVI’12)
This article was recommended for publication by Tobias Isenberg.