Geospatial Data Accuracy and Lying With Maps

Maps are charismatic and imply absolute, objective truth. However, when you critically evaluate how maps are created, you can see the subjectivities that are embedded in almost every element of the map's design, as well as the underlying data itself.

This tutorial addresses geospatial data issues associated with accuracy, reliability, and deception.

Accuracy

All data is a simplified representation (a model) of the complex real world. Accuracy is the extent to which that model conforms to the real world.
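One common way to put a number on how well positions in a dataset conform to the real world is root-mean-square error (RMSE) against higher-accuracy reference positions. The sketch below is only an illustration: the function name and the coordinates are hypothetical, and it assumes both point lists are in the same projected coordinate system with units of meters.

```python
import math

def positional_rmse(measured, reference):
    """Root-mean-square distance between measured points and reference points."""
    if len(measured) != len(reference) or not measured:
        raise ValueError("need two non-empty point lists of equal length")
    # Squared straight-line distance for each measured/reference pair.
    squared_errors = [
        (mx - rx) ** 2 + (my - ry) ** 2
        for (mx, my), (rx, ry) in zip(measured, reference)
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical coordinates in meters, for illustration only.
surveyed = [(0.0, 0.0), (100.0, 0.0), (100.0, 100.0)]   # treated as "truth"
digitized = [(3.0, 4.0), (100.0, 0.0), (100.0, 95.0)]   # the data being checked

rmse = positional_rmse(digitized, surveyed)
print(f"Positional RMSE: {rmse:.2f} m")
```

A lower RMSE means the digitized points sit closer to the reference positions. It describes only positional accuracy, not the other types of accuracy.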

Kimerling et al (2016, 278) note five types of accuracy associated with geospatial data:

Similarly, there are a number of sources of inaccuracy:

Inaccuracies have a variety of causes, including:

Certainty

Certainty is how well you understand the accuracy of your data. Being uncertain about the accuracy of your data limits the confidence with which you can present that data. Likewise, mistaken certainty can lead to conflict when decisions made on the basis of invalid data collide with reality.

Note that certainty and accuracy are not necessarily connected. You can be certain that your data is inaccurate, and you can be uncertain whether your data is accurate.

Optimally, we strive for both accuracy and certainty, but there are few times where that is completely possible.

There are a number of approaches to dealing with uncertainty about data accuracy:

Authoritative

Authoritative data is produced and/or supplied by a known authority, usually a government or governing body of some kind.

The implication is that either the data can be reliably trusted to be accurate, or that it represents formally-sanctioned information that should (or must) be used.

Data that is authoritative may not be either accurate or certain. For example, official economic statistics released by totalitarian regimes are often surrounded by uncertainty, and assumed to be inaccurate in a way that makes the regime look more effective than it actually is.

Sampling and Margin of Error

Because it is often impossible or impractical to gather complete information about some phenomenon (such as soil types or voting preferences), random samples are commonly used with inferential statistics to estimate values for the phenomenon as a whole.

Because a sample may not exactly mirror the whole population, values calculated from the sample may be off. Statistical calculations based on the sampled values and the number of people sampled are used to calculate the likely range of values that we can be 95% certain contains the actual value for the full population. The distance from the estimate to either end of this range is called the margin of error, and it expresses the amount of uncertainty associated with values created from sampled data.
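As a rough illustration of that calculation, the sketch below computes a 95% margin of error for a proportion estimated from a simple random sample, using the textbook normal approximation. The function name and the sample figures are hypothetical, and real surveys with complex sampling designs use more elaborate methods than this.

```python
import math

Z_95 = 1.96  # standard normal critical value for a 95% confidence level

def proportion_margin_of_error(p_hat, n, z=Z_95):
    """Margin of error for a sample proportion (normal approximation)."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    if not 0 <= p_hat <= 1:
        raise ValueError("proportion must be between 0 and 1")
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

# Hypothetical poll: 50% of 1,000 randomly sampled people favor a measure.
p_hat = 0.5
moe = proportion_margin_of_error(p_hat, 1000)
low, high = p_hat - moe, p_hat + moe
print(f"Estimate {p_hat:.1%}, margin of error +/- {moe:.1%}")
print(f"Likely range: {low:.1%} to {high:.1%}")
```

Under these assumptions the margin of error comes out at roughly three percentage points on either side of the estimate.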

For example, the American Community Survey (ACS) is conducted by the US Census Bureau on an ongoing basis to capture a wide variety of data about American people and households. Unlike the decennial census, which is mandated by Article I, Section 2 of the US Constitution to count every person in the US, the ACS is given to randomly selected households in communities across the US.

The margin of error increases as the number of people sampled decreases, or as the number of people in the population decreases. For example, estimates of income in sparsely-populated rural counties or estimates of the number of native speakers of obscure languages can have especially high margins of error.

The animation below demonstrates how to view the margin of error for American Community Survey data provided through American FactFinder.
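One way to judge whether a published margin of error is "especially high" is to compare it with the estimate itself. The sketch below is a minimal illustration: the helper name is hypothetical and the income figures are invented, not real ACS values.

```python
def summarize_estimate(estimate, moe):
    """Return the range implied by an estimate and its margin of error,
    plus the margin expressed as a fraction of the estimate."""
    if estimate <= 0 or moe < 0:
        raise ValueError("estimate must be positive and margin non-negative")
    return {
        "low": estimate - moe,
        "high": estimate + moe,
        "relative_moe": moe / estimate,
    }

# Hypothetical median household income estimates (dollars), for illustration only.
urban = summarize_estimate(62000, 1500)   # large sample, small margin
rural = summarize_estimate(41000, 8200)   # small sample, large margin

for name, s in (("Urban county", urban), ("Rural county", rural)):
    print(f"{name}: {s['low']:,} to {s['high']:,} "
          f"(margin is {s['relative_moe']:.0%} of the estimate)")
```

When the margin is a large share of the estimate, as in the hypothetical rural county, apparent differences between places on a map may be nothing more than sampling noise.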

Finding a Margin of Error in American FactFinder

Lying With Maps

Even if the data used to create a map is accurate, certain and authoritative, design decisions made by the cartographer can dramatically influence how that data is interpreted by readers. While the data represented by a map may be factually correct, cartographic choices can be manipulated to inspire interpretations that may not be consistent with the facts, or which represent a specific ideological perspective on the facts.

Below are some ways that maps can be used to lie as defined by Monmonier (2005) and Kimerling et al (2016).

Volunteered Geographic Information

Volunteered Geographic Information (VGI) is geographic data that comes from the general public. It can be passively collected, such as the location information captured from cellphone calls, or actively contributed by volunteers, such as with OpenStreetMap, a sort of Wikipedia of geographic information.

For example, while ArcGIS Online is completely controlled by ESRI, it connects a community of GIS students and professionals that make their data available to that community. The VGI in ArcGIS Online is usually just some variant on publicly-available data that you could upload yourself (if you know where to look), although occasionally there is an original dataset.

VGI can be of widely varying quality, and you should verify the source when using volunteered data that does not come from ESRI.

VGI in ArcGIS Online