Abstract
A visual is successful when the information encoded in the data is efficiently transmitted to an audience. Data visualization is the discipline devoted to the principles and methods of translating data to visual form. In this chapter we discuss the principles that produce successful visualizations. The second section illustrates these principles through examples of best and worst practice. In the final section, we walk through the construction of our best-example graphics.
The drawing shows me at one glance what might be spread over ten pages in a book.
— Ivan Turgenev, Fathers and Sons
Notes
1. Three great examples:
   - The Upshot from the New York Times: http://www.nytimes.com/section/upshot.
   - FiveThirtyEight, Nate Silver's organization, which has largely invented the field of data science journalism: http://fivethirtyeight.com.
   - Flowing Data, a site created by Nathan Yau dedicated to creating beautiful and informative data visualizations: http://flowingdata.com.
2. If you are building interactive graphics or large-scale graphics for the web, there are better tools. Check out Bootstrap, D3, and Crossfilter.
3. When ordering is a problem, it is often referred to as the "Alabama First!" problem, given how often Alabama ends up at the top of lists that are thoughtlessly put, or left, in alphabetical order. Arrange your lists, like your factors, in an order that makes sense.
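The chapter's examples use R factors, but the "Alabama First!" fix is the same in any language: sort categories by the quantity being displayed, not by name. A minimal Python sketch (the state names and rates are invented for illustration):

```python
# Hypothetical per-state rates; the values are made up for illustration.
state_rates = {"Alabama": 5.1, "Wyoming": 3.9, "Montana": 4.4, "Vermont": 2.9}

# Alphabetical order is the thoughtless default -- Alabama first.
alpha_order = sorted(state_rates)

# Ordering by the plotted value makes comparisons immediate.
value_order = sorted(state_rates, key=state_rates.get, reverse=True)

print(alpha_order)  # ['Alabama', 'Montana', 'Vermont', 'Wyoming']
print(value_order)  # ['Alabama', 'Montana', 'Wyoming', 'Vermont']
```

In R the same effect is achieved by setting the factor's levels explicitly before plotting.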
4. Kernels are an interesting side area of statistics, and we will encounter them later in the chapter when we discuss loess smoothers. For a function to be a kernel, it must integrate to 1 and be symmetric about 0. The kernel is used to average the points in a neighborhood of a given value x. A simple average corresponds to a uniform kernel (all points get the same weight). Most high-performing kernels use weights that diminish to 0 as you move farther from a given x. The Epanechnikov kernel, which drops off with the square of distance and goes to zero outside a neighborhood, can be shown to be optimal with respect to mean squared error. Most practitioners use Gaussian kernels, the default in R.
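The kernel-as-weighted-average idea can be made concrete with a few lines of code. The chapter works in R; this is a Python/NumPy sketch of the same construction, showing the Gaussian and Epanechnikov kernels and the resulting density estimate (the bandwidth h = 0.5 and the simulated data are arbitrary choices for illustration):

```python
import numpy as np

def gaussian_kernel(u):
    # Symmetric about 0 and integrates to 1: the standard normal density.
    return np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)

def epanechnikov_kernel(u):
    # Drops off with the square of distance; exactly zero outside |u| <= 1.
    return np.where(np.abs(u) <= 1, 0.75 * (1 - u**2), 0.0)

def kde(x, data, kernel, h):
    """Kernel density estimate at x: average the kernel weights
    centered on each observation, scaled by the bandwidth h."""
    u = (x - data[:, None]) / h          # shape: (n_obs, n_eval)
    return kernel(u).mean(axis=0) / h

rng = np.random.default_rng(0)
data = rng.normal(size=500)
grid = np.linspace(-3, 3, 61)
f_hat = kde(grid, data, epanechnikov_kernel, h=0.5)
# Because the kernel integrates to 1, f_hat is itself (approximately) a density.
```

Swapping `epanechnikov_kernel` for `gaussian_kernel` changes only the weighting scheme, which is exactly why the choice of kernel matters less in practice than the choice of bandwidth.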
5. In practice, d is almost always 1 or 2.
6. The purpose of these last two questions is to learn how to extract the bandwidth information directly from R.
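In R itself, the bandwidth actually used by `density(x)` can be read off as `density(x)$bw`, and `bw.SJ()` implements the Sheather–Jones selector. R's default, `bw.nrd0`, is essentially Silverman's rule of thumb, which is simple enough to compute by hand. A Python sketch of that rule (the simulated data are for illustration only):

```python
import numpy as np

def silverman_bandwidth(x):
    """Silverman's rule-of-thumb bandwidth for a Gaussian kernel:
    h = 0.9 * min(sd, IQR / 1.34) * n^(-1/5)."""
    x = np.asarray(x)
    n = x.size
    sd = x.std(ddof=1)
    iqr = np.subtract(*np.percentile(x, [75, 25]))  # 75th minus 25th percentile
    return 0.9 * min(sd, iqr / 1.34) * n ** (-0.2)

rng = np.random.default_rng(1)
h = silverman_bandwidth(rng.normal(size=1000))
# For standard-normal data with n = 1000, h is roughly 0.9 * 1000^(-1/5), about 0.23.
```

The point of the exercise stands either way: the bandwidth is not a black box, and you should know how to inspect the value your software chose.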
© 2016 Springer International Publishing Switzerland
Steele, B., Chandler, J., Reddy, S. (2016). Data Visualization. In: Algorithms for Data Science. Springer, Cham. https://doi.org/10.1007/978-3-319-45797-0_5
Print ISBN: 978-3-319-45795-6
Online ISBN: 978-3-319-45797-0