I am making a choropleth for an upcoming local race using D3. There are two candidates running for an election. For any given polygon on my map, if the red candidate has more votes, the color should be red. If the blue candidate has more votes, the color should be blue. If the blue candidate is winning by a lot, the color should be a deeper blue. If the red candidate is winning by a lot, the color should be a deeper red.
Should I use a discrete or continuous scale for this? That is, should I create a color ramp from red to blue for a given saturation or intensity? Or should I create bins, so that a polygon is assigned one of a few colors depending on which range its red/blue value falls in? From tutorials, it seems like most people make bins.
Answer
There are benefits and drawbacks to each way of doing it. To make a long story short, I would recommend creating "bins." A couple of notes to help you choose, and about designing choropleths in general:
A direct mapping of data value to color (an 'unclassed' map) could be considered the most accurate way to display the data; however, classified maps (maps with 'bins') can be more legible for several reasons.
If you use an unclassed map and the data are skewed, or there are outliers in the data set, the outliers will stand out clearly, while many of the polygons can end up very similar in color. This would highlight the fact that a few areas are radically different than others (in your case, if a couple of areas had a significantly greater preference for one candidate over the other than most areas), but it is more difficult to distinguish the relationships within the rest of the map area.
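To make the outlier effect concrete, here is a minimal sketch in plain JavaScript. The margins are invented numbers, and `rampPosition` is a hypothetical helper that does a simple linear (min-max) mapping onto a 0 to 1 color ramp, which is roughly what an unclassed map does.

```javascript
// Hypothetical margins (in percentage points) for six areas; one is an outlier.
const margins = [2, 3, 4, 5, 6, 60];

// Linear (min-max) mapping of a value to a position along a color ramp, 0..1.
function rampPosition(value, min, max) {
  return (value - min) / (max - min);
}

const min = Math.min(...margins);
const max = Math.max(...margins);
const positions = margins.map((m) => rampPosition(m, min, max));

// The outlier sits at the far end of the ramp (position 1), while the other
// five areas are squeezed into the bottom 7% of it and look nearly identical.
console.log(positions.map((p) => p.toFixed(3)));
```

With data like this, the five ordinary areas would be almost indistinguishable on the map, even though one of them has three times the margin of another.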
In a classed map, each class should be visually distinct, so it is easy to tell where an area lies in the data, at the cost of some of the finer distinctions being lost.
Another problem is that perception of color intensity is not strictly linear. So if you had a color ramp from white to blue, corresponding to no candidate receiving more votes to the maximum lead for one candidate, the color that is 75% of the way between white and blue might not be perceived as being 75% of the way between the two colors, and therefore the map user would make a false assumption about what data value it represented.
Classed maps, on the other hand, can have the color of each class carefully chosen to be perceived clearly and distinctly. I do not know enough to design a set of colors that does this, but Cynthia Brewer and Mark Harrower do, and they created colorbrewer2.org, a great (free) tool to help cartographers choose good color schemes for their maps. You can choose from a variety of schemes, choose the number of classes, and it gives a preview of what the scheme might look like in practice, and the RGB, HEX, or CMYK values for each color in the scheme. Very useful, and simply fun to play with.
For these reasons, I would recommend making a classed map. The recommended number of classes is usually an odd number from 5-9 or so. Using an odd number gives a distinct middle class, and this number of classes is generally considered to be enough to give useful distinctions in the data, but not so many that the classes become indistinguishable. Since you are using a diverging color scheme (a light color in the middle and a different color at each end), you can get away with more classes, perhaps 7-9.
Head over to colorbrewer, choose "diverging" for the nature of your data, select the red to blue color scheme, choose your number of classes, and away you go!
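As a rough sketch of how that could look in code, the snippet below assigns one of seven colors to a margin value. The hex values are the 7-class red-to-blue (RdBu) diverging scheme from ColorBrewer; the class breaks and the `colorForMargin` name are my own assumptions for illustration, not recommended values.

```javascript
// ColorBrewer 7-class RdBu (diverging), ordered from strongest red to strongest blue.
const RD_BU_7 = ["#b2182b", "#ef8a62", "#fddbc7", "#f7f7f7", "#d1e5f0", "#67a9cf", "#2166ac"];

// Hypothetical class breaks on blue's margin in percentage points
// (blue share minus red share). Negative means red is ahead.
const BREAKS = [-30, -15, -5, 5, 15, 30];

// Return the class color for a margin: count how many breaks the value has
// reached, and use that count as the index into the color list.
function colorForMargin(margin) {
  let i = 0;
  while (i < BREAKS.length && margin >= BREAKS[i]) i++;
  return RD_BU_7[i];
}

console.log(colorForMargin(-40)); // strong red lead
console.log(colorForMargin(0));   // near tie, neutral middle class
console.log(colorForMargin(50));  // strong blue lead
```

In D3 itself, `d3.scaleThreshold()` with these breaks as the domain and these colors as the range should do the same job; the hand-written loop is only here to keep the example free of dependencies.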
For much of this, there is not a hard rule. The standard is, "does the map communicate the data well?" Playing around with parameters until you get something that "works" can be a good thing.
Now, a note on making choropleths. My apologies if this is familiar ground for you:
A point of interest when using a classed map is how the data is divided into classes. Is it broken at equal intervals along the range? Are a certain number of data points assigned to each class? A certain number of standard deviations from the mean? Is it broken at "natural" breaks in the data? Which method you use makes a difference in how data are portrayed. I am not much of a programmer, and I am not sure which method the script you link to uses. "Natural breaks" are usually a good choice. For data with a clear midpoint like polling data (the midpoint being a 50/50 split), standard deviations can be useful.
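To show how much the classing method matters, here is a small sketch of the two simplest ones, equal intervals and quantiles (an equal number of data points per class). The function names and the sample vote shares are made up for illustration. Natural breaks is more involved and is not shown here.

```javascript
// Equal-interval breaks: split the range [min, max] into k classes of equal width.
function equalIntervalBreaks(values, k) {
  const lo = Math.min(...values);
  const hi = Math.max(...values);
  const step = (hi - lo) / k;
  const breaks = [];
  for (let i = 1; i < k; i++) breaks.push(lo + i * step);
  return breaks;
}

// Quantile breaks: pick breaks so each class holds roughly the same number of areas.
function quantileBreaks(values, k) {
  const sorted = [...values].sort((a, b) => a - b);
  const breaks = [];
  for (let i = 1; i < k; i++) {
    breaks.push(sorted[Math.floor((i * sorted.length) / k)]);
  }
  return breaks;
}

// Hypothetical vote shares (percent for one candidate) in ten areas, one outlier.
const shares = [40, 42, 45, 48, 50, 52, 55, 58, 60, 95];

console.log(equalIntervalBreaks(shares, 5)); // [51, 62, 73, 84]
console.log(quantileBreaks(shares, 5));      // [45, 50, 55, 60]
```

With equal intervals, the single outlier stretches the range so that nine of the ten areas land in the lowest two classes. With quantiles, every class gets two areas, but the top class now lumps a 60 together with a 95. Neither is wrong; they simply tell different stories about the same data.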
When making a choropleth, it is good to use data that is standardized over a unit of area. For instance, instead of using total population in a county, it is better to map population per square mile in each county. The reason being that a larger area will tend to have more people in it than a smaller one, so dividing by the area of each mapped unit gives a more accurate portrayal of trends. Data can also be standardized as a percent. For instance, a poverty rate rather than a number of people in poverty.
For your purposes, it is more revealing to map percentage of votes cast for a candidate than raw number of votes cast for that candidate.
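A minimal sketch of that standardization, assuming each polygon carries raw vote counts for the two candidates (the `blueMargin` name and the numbers are hypothetical):

```javascript
// Blue's margin in percentage points of the two-candidate vote:
// positive means blue leads, negative means red leads.
function blueMargin(blueVotes, redVotes) {
  const total = blueVotes + redVotes;
  if (total === 0) return null; // no votes cast, so there is no meaningful share
  return (100 * (blueVotes - redVotes)) / total;
}

// A large area with a narrow lead and a small area with a lopsided one.
console.log(blueMargin(5200, 4800)); // 4  (blue leads by 400 votes)
console.log(blueMargin(300, 100));   // 50 (blue leads by only 200 votes)
```

By raw vote lead the large area looks like the stronger blue area, but by percentage the small one is far more lopsided, and that is usually what a reader expects a deeper color to mean.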
Anyway, I hope some of this is useful, and that your map turns out well!
For much of this discussion I drew on Thematic Cartography and Geovisualization by Slocum et al.