Polygon Visualization of Multiple Variables in R

Choropleths are best suited to mapping a single variable per area. When more than one variable needs to be mapped for each area, other visualizations are needed.

Example data for this tutorial is in 2017-state-data.zip.

Dot-Density Maps

One common visualization when mapping counts is the dot-density map, where dots are placed randomly in areas based on a count associated with that area. A second variable can be mapped by varying the color or size of the dots accordingly.
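The random placement can be pictured as rejection sampling: draw candidate points in the polygon's bounding box and keep the ones that fall inside the polygon. The following base R sketch illustrates the idea on a made-up L-shaped polygon. The helper names point_in_polygon() and random_dots() are hypothetical, and this is not necessarily how any particular mapping package places its dots.

```r
# Vertices of a made-up L-shaped polygon (not real map data)
vx = c(0, 2, 2, 1, 1, 0)
vy = c(0, 0, 1, 1, 2, 2)

# Ray-casting test: count how many polygon edges a horizontal ray
# from the point crosses; an odd count means the point is inside
point_in_polygon = function(px, py, vx, vy) {
  n = length(vx)
  inside = FALSE
  j = n
  for (i in seq_len(n)) {
    if (((vy[i] > py) != (vy[j] > py)) &&
        (px < (vx[j] - vx[i]) * (py - vy[i]) / (vy[j] - vy[i]) + vx[i])) {
      inside = !inside
    }
    j = i
  }
  inside
}

# Draw candidates in the bounding box until n of them land inside
random_dots = function(n, vx, vy) {
  dots = matrix(NA_real_, nrow=n, ncol=2, dimnames=list(NULL, c("x", "y")))
  kept = 0
  while (kept < n) {
    px = runif(1, min(vx), max(vx))
    py = runif(1, min(vy), max(vy))
    if (point_in_polygon(px, py, vx, vy)) {
      kept = kept + 1
      dots[kept, ] = c(px, py)
    }
  }
  dots
}

set.seed(1)
dots = random_dots(25, vx, vy)
```

Calling plot(dots) afterwards shows the dots scattered evenly over the L shape, regardless of where anything is actually located within it.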

One drawback of this technique is that the dots are placed at random within each area, so if the people or events they represent are not also evenly dispersed, the map is deceptive. The following script creates a dot-density map of Democratic (blue) and Republican (red) votes in the 2012 presidential election, with each dot representing up to 100,000 votes. While this map illustrates the sparse population of some states better than a choropleth, it does not show the concentration of population in large urban areas.

# Load state-level polygons with election data

library(rgdal)
states = readOGR(dsn=".", layer="2017-state-data", stringsAsFactors=F)
states = states[!(states$ST %in% c("HI", "AK")),]

# Reproject to Albers Equal-Area Conic and draw state outlines

usa_albers = CRS("+proj=aea +lat_1=20 +lat_2=60 +lat_0=40 +lon_0=-96 +x_0=0 +y_0=0 +datum=NAD83 +units=m +no_defs")
states = spTransform(states, usa_albers)
plot(states, col="white", border="gray")

# Create random dots based on counts of votes for the two parties
# Each dot represents up to 100,000 votes (counts are rounded up)

library(maptools)

gop = as.integer(ceiling(states$GOP2012 / 100000))
gopdots = dotsInPolys(states, gop)

dem = as.integer(ceiling(states$DEM2012 / 100000))
demdots = dotsInPolys(states, dem)

# Add the dots to the plot

plot(gopdots, pch=19, cex=0.5, col="red2", add=T)
plot(demdots, pch=19, cex=0.5, col="navy", add=T)

Dot-Density Map of Democratic / Republican Votes in the 2012 Presidential Election
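The conversion from vote totals to dot counts can be checked on its own. This is a minimal sketch with made-up vote totals for three hypothetical states (A, B and C); the variable names are illustrative only. Because ceiling() rounds up, every area with at least one vote gets a dot, and the dots can overstate an area's total by just under one dot's worth.

```r
# Hypothetical vote totals for three made-up states (not real data)
votes = c(A=1250000, B=99000, C=300000)
votes_per_dot = 100000

# Same rounding as the script above; as.integer() drops the names,
# so they are restored afterwards
dots = as.integer(ceiling(votes / votes_per_dot))
names(dots) = names(votes)

# How many votes the dots imply beyond the actual totals
overstatement = dots * votes_per_dot - votes

print(data.frame(votes, dots, overstatement))
```

Using round() instead of ceiling() would reduce the overstatement, but areas with fewer than half a dot's worth of votes would then get no dot at all.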

Bubble Charts

Graduated bubbles with varying colors are another way of visualizing two different variables for both points and areas.

For this example, we map the percentage of Democratic votes in the 2012 US presidential election (bubble color) and the population of each state (bubble size):

library(rgdal)
states = readOGR(dsn=".", layer="2017-state-data", stringsAsFactors=F)
states = states[!(states$ST %in% c("HI", "AK")),]

plot(states, col="white", border="gray")

# include.lowest=T keeps the minimum value in the first class instead of NA
breaks = quantile(states$PCDEM2012)
categories = as.numeric(cut(states$PCDEM2012, breaks, include.lowest=T))

palette = colorRampPalette(c("red", "navy"))
ramp = palette(4)
colors = ramp[categories]
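The classification step can be tried in isolation. This is a small sketch using made-up percentages rather than the tutorial data. It shows that cut() leaves the minimum value unclassified (NA) unless include.lowest=TRUE is given, because the intervals are open on the left by default.

```r
# Made-up percentages standing in for states$PCDEM2012
pct = c(36.9, 40.8, 44.1, 48.4, 51.0, 54.2, 58.4, 60.7)

# Quartile breaks: five break points define four classes
breaks = quantile(pct)

# Without include.lowest, the minimum value falls outside every class
without_lowest = as.numeric(cut(pct, breaks))

# With include.lowest, every value gets a class from 1 to 4
categories = as.numeric(cut(pct, breaks, include.lowest=TRUE))

# Four colors from red (lowest class) to navy (highest class)
ramp = colorRampPalette(c("red", "navy"))(4)
colors = ramp[categories]

print(data.frame(pct, without_lowest, categories, colors))
```

A color of NA is simply not drawn by plot(), so a missing class shows up as a missing bubble rather than as an error.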

The gCentroid() function from the rgeos library creates a centroid for each of the polygons (byid=T). A centroid is the geometric center of a polygon: the average position of all the points in the polygon.

library(rgeos)
centroids = gCentroid(states, byid=T)
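For a simple polygon without holes, the centroid can be computed directly from the vertices with the standard area-weighted formula. The sketch below does this in base R for a made-up L-shaped polygon. The function name polygon_centroid() is hypothetical, and this is an illustration of the geometry rather than the rgeos implementation.

```r
# Centroid of a simple polygon from its vertices (shoelace formula).
# The vertices must be in order around the ring, without repeating
# the first vertex at the end.
polygon_centroid = function(vx, vy) {
  # Index of the next vertex, wrapping around to the first
  nxt = c(seq_along(vx)[-1], 1)
  cross = vx * vy[nxt] - vx[nxt] * vy
  area = sum(cross) / 2
  c(x = sum((vx + vx[nxt]) * cross) / (6 * area),
    y = sum((vy + vy[nxt]) * cross) / (6 * area))
}

# Made-up L-shaped polygon (not real map data)
vx = c(0, 2, 2, 1, 1, 0)
vy = c(0, 0, 1, 1, 2, 2)

centroid = polygon_centroid(vx, vy)
vertex_mean = c(x = mean(vx), y = mean(vy))

print(centroid)
print(vertex_mean)
```

The centroid differs from the plain average of the vertices, because it weights by area rather than by how many vertices happen to describe each part of the outline.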

The cex parameter to plot() adjusts the size of the symbol, with a value of one being the default size. Sizes in this example are calculated relative to the max() (largest) value of the variable.

The pch=19 parameter chooses the character code for the plot symbol: 19 is a solid circle.

You may need to tweak the scaling factor used to convert your variable to the cex value that makes your bubbles large enough to be visible but not so large that they overlap excessively or obscure the polygons.

scale_factor = 3
sizes = scale_factor * states$POP2014 / max(states$POP2014)

plot(centroids, pch=19, cex=sizes, col=colors, add=T)

Bubble Map of 2012 Presidential Voting vs Population (linear size scale)

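If the range of your variable is so wide that the small bubbles are hard to see, taking the square root of that ratio will compress the range of bubble sizes. Because cex scales the width of the symbol, this also makes the area of each bubble proportional to the value: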

scale_factor = 3
sizes = scale_factor * sqrt(states$POP2014 / max(states$POP2014))

plot(centroids, pch=19, cex=sizes, col=colors, add=T)

Bubble Map of 2012 Presidential Voting vs Population (square-root scaling)
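The effect of the two scalings can be compared numerically. This sketch uses three made-up population values and treats the drawn area of a symbol as proportional to the square of its cex value, which follows from cex scaling the symbol's width.

```r
# Made-up populations spanning a wide range (not real data)
pop = c(600000, 5000000, 38000000)
scale_factor = 3

# Linear and square-root scaling, as in the two examples above
linear = scale_factor * pop / max(pop)
root = scale_factor * sqrt(pop / max(pop))

# Relative drawn area of each bubble compared with the largest one
area_linear = (linear / max(linear))^2
area_root = (root / max(root))^2

print(data.frame(pop, linear, root, area_linear, area_root))
```

With linear scaling the smallest bubble's area shrinks with the square of the population ratio, while with square-root scaling its area matches the population ratio itself.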

Adding a legend for such a map is a bit more complex because it needs symbols for both the sizes and the colors, as well as headings for the two variables. Headings are legend entries with no symbol (NA). The relevant parameters to legend() are:

legend is a vector of labels
pch is the symbol for each entry, with NA for the headings
pt.cex is the size scaling for the symbols
col is the color of each symbol
x.intersp and y.intersp space the entries out horizontally and vertically
ncol places the legend in two columns, one for each variable

labels = c("Population",
	max(states$POP2014),
	min(states$POP2014), 
	"Democratic Vote",
	paste0(min(states$PCDEM2012), "%"), 
	paste0(max(states$PCDEM2012),"%"))
pch = c(NA, 19, 19, NA, 19, 19)
cex = c(NA, max(sizes), min(sizes), NA, 2, 2)
col = c(NA, "gray", "black", NA, "red", "navy")

legend(x="bottomleft", legend=labels, pch=pch, pt.cex=cex, col=col, 
	bg="white", bty="n", y.intersp=1.3, x.intersp=2, ncol=2)

Bubble Map of 2012 Presidential Voting vs Population With Legend
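Raw values can make legend labels hard to read, so it may be worth formatting them before passing them to legend(). This is a minimal sketch using made-up minimum and maximum values in place of the tutorial data. format() adds thousands separators and round() trims the percentages to one decimal place.

```r
# Made-up minimum and maximum values (not real data)
pop = c(584153, 38802500)
pct = c(24.7508, 70.5456)

# Largest population first, to match the order of the legend above
pop_labels = format(rev(pop), big.mark=",", trim=TRUE)

# Percentages rounded to one decimal place
pct_labels = paste0(round(pct, 1), "%")

labels = c("Population", pop_labels, "Democratic Vote", pct_labels)
print(labels)
```

The resulting vector has the same six entries, in the same order, as the labels vector in the legend example.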

Cartograms

Cartograms are two-variable visualizations where polygon colors are used as a choropleth and the geographic areas of polygons are distorted to represent a second value, often a count like population.

Cartograms can be created in R using the cartogram() function from the cartogram library. The data for this example is in 2017-state-data.zip.

library(rgdal)
states = readOGR(dsn=".", layer="2017-state-data", stringsAsFactors=F)
states = states[!(states$ST %in% c("HI", "AK")),]

library(cartogram)
carto = cartogram(states, "POP2010")

ramp = c("navy", "red2")
colors = ifelse(states$WIN2012 == "Obama", ramp[1], ramp[2])

plot(carto, col=colors)
legend(x="bottomleft", pch=15, col=ramp, legend=c("Democratic", "Republican"))

Cartogram of the 2012 US Presidential Election