If our use of color in visualizations follows the Five Ws (and one H), weâre spending too much time on how, and not enough time on why. Color is too often seen as a tool to make pretty pictures, when it should be used to inform our audiences.
For one example, look at this âColor Emotion Guideâ from marketing site TapClicks (one of the top results when you Google âcolor in data visualizationâ đ§).
The implicit suggestion here is that we should use colorâwhenever and however we canâto capture some form of emotion. Not only is this specific categorization rather odd (Monster energy drinks are peaceful, Virgin Mobile embodies âbold excitementâ, and Harley Davidson motorcycles just scream âcheerful friendlinessâ đ€©), it leads novice practitioners to believe that color should be used just for the sake of using color.
Examples of gratuitous color use abound.
These examples illustrate my thesis: Too often, we ask how we can use color in our visualizations when we should be asking why we are using it.
Some combination of 1) default software settings, 2) an obsession with pretty color palettes, and 3) a lack of emphasis on careful color consideration has led to a sloppy use of color in some of our most popular data visualizations.
One of the most common errors I see is the overuse of color. In the charts above, for example, itâs evident that there are far too many colors, with no apparent reason for the
Often times, those creating visualizations will argue that they must include a 14 colors in their chart because the dataset has 14 data points of interest! It doesnât help that the default settings of some of the most popular data viz tools (such as Excel) by default map categorical variables to colors.
The reality is, however, that if you need more than a handful of colors in your chart, you can probably present your data in a different way. Take this example from Datawrapper:
The takeaway? When you emphasize everything, you end up emphasizing nothing. Thatâs why its important that we stop uncritically asking how we can use color in our charts.
A data visualization is nothing more than a pretty picture if it does not inform its viewer. And if your chart presents 14 different data points all mapped to different colors, what kind of story is it telling? I really like this from Appleâs data visualization practitioner Elijah Meeks:
Rather than trying to find that impossible 20-color palette, stop using color when you have so many dimensions. Itâs indistinguishable, itâs confusing and youâre just off-loading the complexity and decision-making to your reader.
Thatâs why color should be used more sparingly and more thoughtfully. Color is one of the most important parts of our visualizations, yet their current use is far too often gratuitous and overwhelming.
So, how should you use color?
Color is not the enemy. Rather, the (far too frequent) abuse and misuse of color is. So, how can you use color correctly? It depends on the purpose of your visualization, and, as a corollary, the purpose of color. You should ask: why am I using color?
1) Color to differentiate
One use of color is to draw attention to a data point of interest. This kind of color use would fall into the category of explanatory visualization, as opposed to its exploratory counterpart. If youâve already explored, analyzed, and probed your data, you now need to deliver those insights to someone else (a supervisor, a client, or a curious friend). It would be a waste of time to present to them all of the exploratory work you did, which is why your presentation should make use of color to focus on your findings.
As practitioner Andy Kirk puts it, visualization practitioners in this stage of presentation should make grey their best friend. This is because the absence of color, not the excessive use of it, helps paint a picture and tell a story. By using grey as the primary color in a visualization, we automatically draw our viewersâ eyes to whatever isnât grey. That way, if we are interested in telling a story about one data point, we can do so quite easily.
Hereâs a quick example I made in R a while back:
The point of the visualization is not to show our audience the kindergarten vaccination rate of every county in Texas. It is instead to highlight the lowest rateâTerry County. This visualization leverages the grey fill of every other bar to immediately draw the audiencesâ eyes to Terry County. Because we used only two colors, we can also highlight text in the subtitle to make the connection even clearer for our audience. Colorâif used prudentlyâmakes our visualizations more digestible and more informative.
Now, imagine if I visualized that same data in the following way:
Or, even worse:
After seeing those eyesores, arenât you thankful that we used color sparingly in the initial plot?
Perhaps youâre interested in a county-by-county overview of 2016 election results. Although you might be tempted to code all counties according to their Trump/Clinton split, thatâs prettier than it is insightful. What if, instead, we focused on those notable counties which flipped from one party to another between 2012 and 2016? From Kieran Healyâs book Data Visualization:
As we can see here, more counties flipped from majority-Democratic in 2012 to majority-Republican in 2016 than vice-versa. Because weâre focusing on only a fraction of all of the data points, we can also observe trends: The majority of counties that flipped had a small black population. No county that had over a ~53% black population flipped in either direction (annotation my own):
We also notice that flipping direction might be correlated with population size: nearly all of the flipping counties with a log population under 100,000 flipped to Republicans, while a greater proportion of all switching counties flipped toward the Democratic Party if their log population was greater than 100,000.
Color can, and should, be used to focus on the key parts of your visualization that you want your audience to see. By using color strategically, we can reduce the cognitive load required to understand what a visualization is depicting. Kalyuga et al. found that color-coding âameliorated split-attention effects, resulting in lower perceived difficulty.â Other researchers have reported reductions in cognitive load when experiment participants were provided color-coding.
The overuse of colors can have the opposite effect. In one 2019 paper, researchers found âtask-irrelevant digit colour information hampers the learning process only in instances where it triggers a conflict with the semantic properties of the base-code words.â What does this mean? If color encoding conflicts with the objects it represents, it impedes learning and weakens understanding. You donât have to understand what all this means (I donât dully understand all of it). This point is simply to illustrate: we should care about color. Color can be confusing and complicated, which is why its use should be intentional and minimal. It should be used to draw attention to the important parts of our charts.
2) Color to explore
Color does not have to be used in contrast to plain old grey. It can also be used as a tool to showcase a variety of data points all mapped to different colors. The important note is that this usage should be sparing. Given our prior examples, we definitely donât want a plot like this one (from Chapter 19 of Claus Wilkeâs Fundamentals of Data Visualization):
A more appropriate alternative may look something like this:
This way, the audience can still see general patterns in the data, but they donât have to treat the fill legend like a lookup table! Most ideally, this chart would have some interactivity, so that a user could hover over a point to see its respective data.
In examples like this, where we are exploring rather than explaining, we leave the exploration to our users; they can see what they want to see and we give them the information they need to do exactly that.
Another common use of color for exploratory purposes is showing data progression across a gradient (e.g. low to high, bad to good, cold to warm).
These examples are most commonly found in choropleth maps, where the shade of color in a given state (or county, or region) corresponds to a value of interest. As an example, hereâs a choropleth map I created using D3.js, which visualizes opioid-involved overdose deaths in the United States:
As we can see from the map, ârust beltâ states are suffering from the opioid crisis to a much greater degree than those in western states. Choropleth maps utilize color to show regional variation and illustrate the power of color shading to represent the severity or extent of a given variable.
However, color in this context can also be misused. Data may be mapped according to a categorical color scale, or a rainbow scale which makes it difficult to see progression. This chart from Claus O. Wilke is a good (bad) example:
This chart is pretty! When rainbow scales are used in the media or elsewhere, it may be because their creator asked âHow can I use color to make a pretty picture?â But this use of a non-monotonic color scale means that the relative size of differences between data points are difficult to detect. As an illustration, see how long it takes you to answer this question: using the legend above, what is the percent difference between this color and this color? (The answer is ~70%.)
In 2019, a group of researchers asked climate scientists to evaluate maps depicting changes in climate, where maps were either rainbow (similar to the Texas map above) or monotonic (such as the opioid map earlier). The researchers found that evaluations of magnitude difference in these maps were significantly more accurate when they were encoded with monotonic luminance scales rather than traditional rainbow scales. Other research consistently finds that rainbow color scales are both harmful and popular. All this to say: color matters.
So, color can be used for exploratory purposes. That is to say, color doesnât have to be used exclusively for focus, and it can be used in charts that have colors other than just grey! But exploratory color usage (as with all use of color) requires caution.
Conclusion
Color is tricky. While it can take your plot to the next level, it can also ruin it. The difference? The questions we ask. Are we questioning how we can use color; which of the many palettes to use, and how we can map them across variables? These are important questions, but they have to be preceded by the question of why we use color in the first place. If color doesnât serve the purpose of informing, clarifying, or guiding our audience, what purpose does it serve?