Abstract
A visual is successful when the information encoded in the data is efficiently transmitted to an audience. Data visualization is the discipline devoted to the principles and methods of translating data to visual form. In this chapter we discuss the principles that produce successful visualizations. The second section illustrates these principles through examples of best and worst practice. In the final section, we walk through the construction of our best-example graphics.
The drawing shows me at one glance what might be spread over ten pages in a book.
— Ivan Turgenev, Fathers and Sons
Notes
1. Three great examples:
   - The Upshot from the New York Times: http://www.nytimes.com/section/upshot.
   - FiveThirtyEight, Nate Silver's organization, which has largely invented the field of data science journalism: http://fivethirtyeight.com.
   - Flowing Data, a site created by Nathan Yau dedicated to creating beautiful and informative data visualizations: http://flowingdata.com.
2. If you are building interactive graphics or large-scale graphics for the web, there are better tools. Check out Bootstrap, D3, and Crossfilter.
3. When ordering is a problem, it is often referred to as the "Alabama First!" problem, given how often Alabama ends up at the top of lists that are thoughtlessly put, or left, in alphabetical order. Arrange your lists, like your factors, in an order that makes sense.
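The chapter's examples use R factors, but the "Alabama First!" fix is the same in any language: sort categories by the quantity being displayed, not by name. A minimal Python sketch (the state names and rates are invented for illustration):

```python
# Hypothetical per-state rates; the values are made up for illustration.
state_rates = {"Alabama": 5.1, "Wyoming": 3.9, "Montana": 4.4, "Vermont": 2.9}

# Alphabetical order is the thoughtless default -- Alabama first.
alpha_order = sorted(state_rates)

# Ordering by the plotted value makes comparisons immediate.
value_order = sorted(state_rates, key=state_rates.get, reverse=True)

print(alpha_order)  # ['Alabama', 'Montana', 'Vermont', 'Wyoming']
print(value_order)  # ['Alabama', 'Montana', 'Wyoming', 'Vermont']
```

In R the same effect is achieved by setting the factor's levels explicitly before plotting.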
4. Kernels are an interesting side area of statistics, and we will encounter them later in the chapter when we discuss loess smoothers. For a function to be a kernel, it must integrate to 1 and be symmetric about 0. The kernel is used to average the points in a neighborhood of a given value x. A simple average corresponds to a uniform kernel (all points get the same weight). Most high-performing kernels use weights that diminish to 0 as you move farther from a given x. The Epanechnikov kernel, which drops off with the square of distance and goes to zero outside a neighborhood, can be shown to be optimal with respect to mean squared error. Most practitioners use Gaussian kernels, the default in R.
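The kernel-as-weighted-average idea can be made concrete with a few lines of code. The chapter works in R; this is a Python/NumPy sketch of the same construction, showing the Gaussian and Epanechnikov kernels and the resulting density estimate (the bandwidth h = 0.5 and the simulated data are arbitrary choices for illustration):

```python
import numpy as np

def gaussian_kernel(u):
    # Symmetric about 0 and integrates to 1: the standard normal density.
    return np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)

def epanechnikov_kernel(u):
    # Drops off with the square of distance; exactly zero outside |u| <= 1.
    return np.where(np.abs(u) <= 1, 0.75 * (1 - u**2), 0.0)

def kde(x, data, kernel, h):
    """Kernel density estimate at x: average the kernel weights
    centered on each observation, scaled by the bandwidth h."""
    u = (x - data[:, None]) / h          # shape: (n_obs, n_eval)
    return kernel(u).mean(axis=0) / h

rng = np.random.default_rng(0)
data = rng.normal(size=500)
grid = np.linspace(-3, 3, 61)
f_hat = kde(grid, data, epanechnikov_kernel, h=0.5)
# Because the kernel integrates to 1, f_hat is itself (approximately) a density.
```

Swapping `epanechnikov_kernel` for `gaussian_kernel` changes only the weighting scheme, which is exactly why the choice of kernel matters less in practice than the choice of bandwidth.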
5. In practice, d is almost always 1 or 2.
6. The purpose of these last two questions is to learn how to extract the bandwidth information directly from R.
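In R itself, the bandwidth actually used by `density(x)` can be read off as `density(x)$bw`, and `bw.SJ()` implements the Sheather–Jones selector. R's default, `bw.nrd0`, is essentially Silverman's rule of thumb, which is simple enough to compute by hand. A Python sketch of that rule (the simulated data are for illustration only):

```python
import numpy as np

def silverman_bandwidth(x):
    """Silverman's rule-of-thumb bandwidth for a Gaussian kernel:
    h = 0.9 * min(sd, IQR / 1.34) * n^(-1/5)."""
    x = np.asarray(x)
    n = x.size
    sd = x.std(ddof=1)
    iqr = np.subtract(*np.percentile(x, [75, 25]))  # 75th minus 25th percentile
    return 0.9 * min(sd, iqr / 1.34) * n ** (-0.2)

rng = np.random.default_rng(1)
h = silverman_bandwidth(rng.normal(size=1000))
# For standard-normal data with n = 1000, h is roughly 0.9 * 1000^(-1/5), about 0.23.
```

The point of the exercise stands either way: the bandwidth is not a black box, and you should know how to inspect the value your software chose.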
© 2016 Springer International Publishing Switzerland
Steele, B., Chandler, J., Reddy, S. (2016). Data Visualization. In: Algorithms for Data Science. Springer, Cham. https://doi.org/10.1007/978-3-319-45797-0_5
Print ISBN: 978-3-319-45795-6
Online ISBN: 978-3-319-45797-0