Geospatial Data

Geospatial data indicates what is where. One of the special things about geospatial data is that it comes in those two parts: what and where.

Attribute (what) data and location (where) data are fundamentally different things and have different characteristics.

A distinction can be drawn between data, information and knowledge with data being raw facts or numbers and information being the interpretation of data that forms the basis of knowledge. However, the three terms are often used interchangeably and interpreted differently by different people, so you should use caution and consider context when you hear or use these words.

Location Data

Latitude and Longitude

The Earth is a spheroid - it is round like a ball or sphere but flattened slightly by the centrifugal force of rotation. The Earth is about 24,900 miles in circumference, but around 27 miles wider than it is tall.

To specify locations on the surface of the earth, we use angles that describe where those locations are relative to the center of the earth. Angles are measured in degrees, with one degree representing 1/360th of a circle.

While we think of and experience distances on the planet in terms of length (feet, miles, kilometers) rather than angles (degrees), the earth is lumpy and actual distances across the surface of the planet depend on how high you are (distance from the center of the earth). Even if the Earth's surface were perfectly smooth, manipulating distances across a three-dimensional surface requires fiendishly complex calculations. Specifying locations in degrees is simpler and more accurate than specifying locations in distances.

Since the earth is a three-dimensional object, two angles are required to specify locations on the surface of the earth.

Latitude is the angle that tells you how far you are north or south of the Equator. The Equator is zero degrees latitude and, commonly, negative numbers are south, positive numbers are north. The range is from negative 90 degrees at the South Pole to positive 90 degrees at the North Pole. Lines on maps are drawn east/west around the planet to show latitude.

Lines of Latitude

Longitude is the angle that tells you how far you are east or west from the Prime Meridian in Greenwich, England (longitude 0.0) - think LONG stretching across your body. Negative numbers up to negative 180 are west of England. Positive numbers up to positive 180 are east of England. Lines on maps are drawn north/south between the poles to show longitude.

Lines of Longitude

Prime Meridian and the International Date Line

Latitude is based on specific physical characteristics of the earth. Zero degrees is set at the Equator, the imaginary line around the center of the spinning earth. The axis of the Earth's rotation also happens to point at Polaris (the North Star), a very bright star in the northern sky. It is possible to find your latitude in the northern hemisphere by measuring the angle of Polaris above the horizon using a device like a sextant or astrolabe.

Finding Your Latitude Using Polaris

However, longitude is more difficult to define since there are no clear geological or celestial markers to define where zero degrees should be located.

Early naval navigators had to carry clocks on their ships to keep track of the time at a location of known longitude. When the sun was directly overhead (12:00 noon), the navigators could tell how many degrees east or west they were of that known longitude by multiplying the number of hours difference between their local 12:00 noon and the 12:00 noon at that known location by 15 degrees (360 degrees rotation of the planet divided by 24 hours in one rotation). Longitude has an interesting history, dramatized in this episode of Nova.
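The noon-difference arithmetic described above can be sketched in a few lines. This is a minimal illustration, not a navigation tool: the function name and the east-positive sign convention are assumptions made for the example, and it ignores real-world corrections such as the seasonal variation in the time of solar noon.

```python
DEGREES_PER_HOUR = 360 / 24  # the Earth turns 15 degrees per hour


def longitude_from_noon(reference_clock_hours, reference_longitude=0.0):
    """Estimate longitude in degrees (east positive, west negative).

    reference_clock_hours is the reading, in hours, of a clock kept on
    the time of the known location at the moment of local noon.
    """
    # If the reference clock already reads later than 12:00, noon happened
    # earlier at the reference location, so the ship is to the west.
    hours_difference = 12.0 - reference_clock_hours
    return reference_longitude + hours_difference * DEGREES_PER_HOUR


# The reference clock reads 15:00 at local noon: three hours, or 45 degrees, west.
print(longitude_from_noon(15.0))
```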

Greenwich, England was chosen as zero degrees longitude, or the Prime Meridian, by agreement at a conference of 25 nations called by US president Chester Arthur in Washington, DC in 1884. This was a time when Britain was at the height of its imperial and maritime dominance and this conference simply formalized what was already a common practice on printed nautical charts. The United States government had started mandating use of the Greenwich meridian in 1850 for nautical purposes.

Greenwich
Prime Meridian Monument at the Royal Observatory, Greenwich (Daniel Case via Wikimedia Commons)

On the opposite side of the planet is the International Date Line. Unlike the Prime Meridian, this line is not exactly -180 (or +180) degrees longitude, but weaves its way around different islands to deal with political and economic considerations.

The International Date Line (Wikimedia Commons)

Coordinates

A specific location on the surface can be specified with a longitude and a latitude. This pair of numbers is called a set of coordinates.

A challenge with geographic coordinates is that there are multiple ways of expressing them. The traditional form is in degrees, minutes, seconds and direction, with a minute being 1/60th of a degree and a second being 1/60th of a minute. North or South means north or south of the Equator and East or West means east or west of the Prime Meridian.

38° 53' 50" N,  77° 02' 12" W

While that traditional method of specifying coordinates is clear and easy to read by humans, that type of notation is cumbersome for computers to work with. With geospatial technology, coordinates are generally specified as decimal coordinates, with positive latitudes meaning North of the Equator and negative longitudes being West of the Prime Meridian.

38.897192, -77.03896
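Converting from degrees, minutes and seconds to decimal degrees is simple arithmetic: divide the minutes by 60 and the seconds by 3,600, add them to the degrees, and make the result negative for South or West. A minimal sketch, with a hypothetical function name chosen for this example:

```python
def dms_to_decimal(degrees, minutes, seconds, direction):
    """Convert degrees, minutes, seconds and a direction letter to decimal degrees."""
    if direction not in ("N", "S", "E", "W"):
        raise ValueError(f"unknown direction: {direction!r}")
    value = degrees + minutes / 60 + seconds / 3600
    # South and West are negative in the decimal convention
    return -value if direction in ("S", "W") else value


# The degrees-minutes-seconds values from the traditional-form example
lat = dms_to_decimal(38, 53, 50, "N")
lon = dms_to_decimal(77, 2, 12, "W")
print(round(lat, 6), round(lon, 6))
```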

An additional challenge with decimal coordinates is that although latitude is usually given as the first number (lat-long), sometimes longitude is the first number (long-lat). The long-lat order is consistent with the convention in geometry where coordinates on a two-dimensional grid are specified as X,Y. When using coordinates from an unfamiliar source, you should make sure you know which order the coordinates are given in.

-77.03896, 38.897192
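A range check can sometimes reveal the order, because latitude can never fall outside -90 to 90 while longitude can reach -180 to 180. The sketch below is only a heuristic under that assumption; the function name is hypothetical and the Denver coordinates are approximate values chosen for illustration. When both numbers lie between -90 and 90, as with the White House pair, the check cannot decide and you still need to consult the source.

```python
def guess_coordinate_order(first, second):
    """Return 'lat-long', 'long-lat', 'ambiguous' or 'invalid' for a decimal pair."""
    first_could_be_lat = -90 <= first <= 90
    second_could_be_lat = -90 <= second <= 90
    first_could_be_lon = -180 <= first <= 180
    second_could_be_lon = -180 <= second <= 180

    lat_long = first_could_be_lat and second_could_be_lon
    long_lat = first_could_be_lon and second_could_be_lat

    if lat_long and long_lat:
        return "ambiguous"  # both readings are possible; ranges alone cannot decide
    if lat_long:
        return "lat-long"
    if long_lat:
        return "long-lat"
    return "invalid"


print(guess_coordinate_order(-104.99, 39.74))        # roughly Denver, longitude first
print(guess_coordinate_order(38.897192, -77.03896))  # both values fit either role
```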

With GIS software, it is generally best to stick with decimal lat-long coordinates that are easily stored and used in numeric data tables. You can verify that your coordinates are lat/long by typing them into the search bar in Google Maps.

38.897192,-77.03896
(The White House)

Elevation

In many situations it is adequate to reference locations with two dimensions (X and Y, or longitude and latitude) since maps and computer displays are in two dimensions. However, we live in a three-dimensional world. Elevation (altitude or Z) is sometimes given as a third coordinate.

Unlike latitude and longitude, which are angles given in degrees, elevations are linear distances measured from a reference surface, and are usually given in feet or meters. Exactly what elevation zero is depends on the application: while elevation is often specified as height above mean sea level, different applications use different reference points. GPS uses height above the reference ellipsoid (see below).

Geospatial data that includes elevation is usually called 2.5-D rather than 3-D because for each X/Y (longitude/latitude) coordinate there is only one Z (elevation) value. 2.5D data cannot fully represent structures like buildings that have multiple levels or floors stacked on top of each other at different elevations for each X/Y coordinate.

A digital elevation model (DEM) for the State of Colorado as an example of 2.5-D data (USGS)
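The one-elevation-per-location limit of 2.5-D data can be illustrated with a small sketch. The grid cell and elevation values below are made up for the example: a surface keyed by X/Y can hold only one Z for each cell, while a true 3-D collection keeps every X/Y/Z point separately.

```python
# A 2.5-D surface: each (x, y) grid cell maps to exactly one elevation.
surface = {}
surface[(0, 0)] = 1609.0  # ground level in meters (made-up value)
surface[(0, 0)] = 1612.5  # a second floor at the same cell replaces the first value

# A 3-D representation stores each (x, y, z) point on its own,
# so two floors at the same x/y location can coexist.
points_3d = {(0, 0, 1609.0), (0, 0, 1612.5)}

print(len(surface), len(points_3d))
```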

Datums

Reality rarely conforms exactly to simple mathematical models, so the topic of elevation also raises an issue with exactly how the pure, spherical angles of latitude and longitude reflect the rugged, three-dimensional reality of the earth. In addition to the planet being slightly flattened from a perfect sphere by the force of rotation, the surface is covered with lumpy mountains and oceans. If you want to be very precise about locations (such as when engineering a bridge or dropping a bomb on someone) these lumps can result in errors that can cause problems.

This challenge is addressed in cartography in three stages:

There are a number of different datums that are used in different parts of the world, for different purposes, and at different times. These different datums reflect:

Datums you will commonly encounter include:

The Relationship Between the Earth, a Geoid, and an Ellipsoid

Data Models and Geometry

There are two broad models for storing geospatial data: Vector and Raster. They are called models because they are simplified representations of the objects or phenomena they are used to represent.

Vector Data

Vector data stores locations as discrete geometric objects: points, lines or polygons.

Areas are occasionally represented with centroids, which are points at the geometric center of the area (the average position of all the points in it). For example, the political boundaries of cities are best represented with an area, but on a large map covering an entire nation, individual cities may be represented with points to make the map easier to read.

Examples of Points, Lines and Polygons To Model Features in Denver, CO (Base Map from Google)
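A polygon's centroid is commonly computed as its area-weighted center using the standard shoelace-style formula. The sketch below is one way to do it, assuming the polygon is simple (it does not cross itself) and that its vertices are planar X/Y pairs. Applying it directly to longitude/latitude values is only an approximation, reasonable for small areas; GIS software normally handles this for you. The function name is hypothetical.

```python
def polygon_centroid(vertices):
    """Centroid of a simple polygon given as a list of (x, y) vertices in order."""
    area2 = 0.0  # accumulates twice the signed area
    cx = cy = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]  # wrap around to close the polygon
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if area2 == 0:
        raise ValueError("polygon has zero area")
    # The centroid formula divides by 6 * area, which equals 3 * area2
    return cx / (3 * area2), cy / (3 * area2)


rectangle = [(0, 0), (4, 0), (4, 2), (0, 2)]
print(polygon_centroid(rectangle))
```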

Raster Data

Raster data is stored in a grid of regularly-spaced pixels of attribute data that cover an area of interest. Raster data is most useful for representing data about areas where there are unclear boundaries, such as with elevation, temperature or amounts of vegetation. The best known type of raster data is photographic image data. The digital elevation model described above is another example of geospatial data stored in rasters.

Raster Satellite Imagery of Denver (Google)
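To make the grid idea concrete, here is a minimal sketch of a raster as rows of values plus just enough location information to find the pixel covering a given coordinate. The class, its field names, and the sample elevation values are all assumptions made for this illustration; real raster formats store considerably more detail.

```python
from dataclasses import dataclass


@dataclass
class Raster:
    west: float       # longitude of the left edge of the grid
    north: float      # latitude of the top edge of the grid
    cell_size: float  # width and height of each pixel, in degrees
    cells: list       # rows of attribute values, top (northernmost) row first

    def value_at(self, lon, lat):
        """Return the value of the pixel that covers the given location."""
        col = int((lon - self.west) // self.cell_size)
        row = int((self.north - lat) // self.cell_size)
        if not (0 <= row < len(self.cells) and 0 <= col < len(self.cells[row])):
            raise IndexError("location is outside the raster")
        return self.cells[row][col]


# A 2 x 2 grid of made-up elevations in meters
grid = Raster(west=-105.0, north=40.0, cell_size=0.5,
              cells=[[1600, 1700],
                     [1500, 1550]])
print(grid.value_at(-104.9, 39.9))
```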

Although many different types of data can be stored as rasters, data about discrete objects with clear boundaries is usually more appropriately and accurately stored as vector rather than raster. GIS software allows conversion between raster and vector, although the conversion process between the two models often involves inaccuracy and uncertainty.

Point Clouds

An emerging third type of model is the point cloud, which stores geospatial information as a collection of points in three-dimensional space (latitude, longitude and elevation). Point clouds are commonly captured using an aerial laser scanning technique called Lidar. Unlike vector and raster data, which are analogous to a flat two-dimensional map, point clouds can be used to more faithfully represent structures and topography, albeit at the cost of greater storage and processing demands.

Elevation Point Cloud of Downtown Spokane, WA (USGS)

Attributes

Spatial locations by themselves aren't particularly useful or interesting. You need some what to go with the where.

Attributes are the what part of what is where. Attributes are also sometimes referred to as fields, variables, or columns.

Geospatial data is based on the georelational model that connects location data (geo) with attribute data - representations of what is there, often stored in relational databases.
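The georelational idea can be sketched as two tables that share a feature identifier, one holding locations and one holding attributes, joined on that identifier. Everything in this example is illustrative: the identifiers, attribute names and the approximate Denver coordinates are assumptions, and real systems would use a database or GIS file format rather than dictionaries.

```python
# Location table: feature id -> (latitude, longitude)
locations = {
    101: (38.897192, -77.03896),
    102: (39.74, -104.99),  # approximate
}

# Attribute table: feature id -> descriptive fields (illustrative values)
attributes = {
    101: {"name": "The White House", "type": "building"},
    102: {"name": "Denver", "type": "city"},
}


def join_features(locations, attributes):
    """Combine location and attribute records that share a feature id."""
    features = []
    for feature_id, (lat, lon) in locations.items():
        record = {"id": feature_id, "lat": lat, "lon": lon}
        # Features with no attribute record keep only their location
        record.update(attributes.get(feature_id, {}))
        features.append(record)
    return features


features = join_features(locations, attributes)
for feature in features:
    print(feature)
```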

General Classifications of Attribute Data

Classes of Attributes

There are techniques called mixed methods for converting (coding) qualitative data to quantitative values. A common example is the Likert scale that asks survey participants to rate some quality or feeling on a numeric scale of one to five. However, as with all data conversions, this introduces uncertainty about whether the numbers fully and meaningfully capture the characteristics being studied.

Primary vs Secondary Data

Primary data is data you capture yourself.

Secondary data is data you recycle from someone else.

Primary data is often more expensive to obtain since you will have to do work to get it, such as by surveying with GPS devices or conducting interviews with human subjects. But if you're doing something novel, you will likely need novel data that you get yourself rather than from someone else.

Population vs Sampled Data

Generally, the more data you have, the better your conclusions when you analyze that data. If you have population data for everything you are studying, that is ideal.

An example of population data is the census conducted every ten years, where the US Federal Government attempts to find and count every person in the country and get basic information on who they are and where they live (what is where).

However, in many if not most research situations, it is too difficult or expensive to capture complete data. For example, if you are running a congressional campaign, you cannot survey every voter in your congressional district to find out how they are planning to vote.

In such cases it is usually adequate to capture a sample of the data and then use statistical techniques to make an inference from that sample about the full population. The use of sampling rather than a full census introduces uncertainty about whether your sample reflects the overall population. However, there is always some uncertainty when gathering data (especially about humans) and the uncertainty with sampled data can be quantified and considered as part of the analytical process.
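One common way to quantify sampling uncertainty for a yes/no question is a margin of error around the sample proportion. The sketch below uses the normal approximation and assumes a simple random sample that is reasonably large; the function name and the survey numbers are made up for illustration.

```python
import math


def proportion_with_margin(successes, sample_size, z=1.96):
    """Sample proportion and its margin of error (z=1.96 for roughly 95% confidence)."""
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    p = successes / sample_size
    # Standard error of a proportion, scaled by the z value
    margin = z * math.sqrt(p * (1 - p) / sample_size)
    return p, margin


# Hypothetical poll: 520 of 1,000 sampled voters favor a candidate
p, margin = proportion_with_margin(520, 1000)
print(f"{p:.1%} plus or minus {margin:.1%}")
```

A larger sample shrinks the margin, but only with the square root of the sample size, which is why surveying four times as many people roughly halves the uncertainty.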

Individual vs Aggregated Data

Individual data is data about individual persons or objects.

Aggregated data is data that combines data from groups of individuals (often based on location in different geographic areas) into a smaller set of numbers, usually averages or medians. An example of aggregated data is US census data that is aggregated by census tract or county in order to preserve the privacy of individual census respondents.

As with sampling, aggregation introduces uncertainty as important individual distinctions can be lost when people are combined into groups and summarized.
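A small sketch shows both the mechanics of aggregation and what it hides. The tract names and income values are invented for the example; the grouping uses only the Python standard library.

```python
from collections import defaultdict
from statistics import median


def median_by_area(records):
    """Aggregate (area, value) records into one median value per area."""
    groups = defaultdict(list)
    for area, value in records:
        groups[area].append(value)
    return {area: median(values) for area, values in groups.items()}


# Individual records (made-up household incomes by census tract)
incomes = [
    ("tract_a", 30000), ("tract_a", 40000), ("tract_a", 250000),
    ("tract_b", 52000), ("tract_b", 58000),
]

summary = median_by_area(incomes)
print(summary)
```

The summary reports a single number per tract, so the one very high income in the first tract is no longer visible once the individual records are discarded.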

One issue with aggregated data is the ecological fallacy, when you make assumptions about individuals based on aggregated data. For example, states are often classified as red states or blue states based on whether the majority of the voters in that state vote Republican or Democratic, respectively. However, even in very red Utah, Democratic President Obama got about 25% of the vote in 2012, so assuming that everyone you meet in Utah is conservative is incorrect.

2012 Electoral College Results Choropleth

The opposite of the ecological fallacy is the exception fallacy, where an assumption is made about a group based on a few exceptional individuals. For example, if you meet a tall basketball player from Ohio, the assumption that everyone in Ohio is tall would be incorrect.

LeBron James with the Cleveland Cavaliers in 2014 (Keith Allison via Wikimedia Commons)

Structured vs Unstructured Data

Structured data is strictly organized so that for every field or record in your data, you know what it represents. If your data is organized in a table with columns and rows, it is probably structured. Most geospatial data that you deal with in conventional geographic information systems is structured.

Unstructured data is data that is not clearly organized in a way that it can be simply processed by computers. Examples of unstructured data include text like Facebook posts, text messages or tweets. There is meaningful data there, but turning it into something useful requires analysis (often by humans or complex computer algorithms) to give it some kind of structure. Contemporary societies generate tremendous amounts of unstructured data and the analysis and use of that data is an area of heavy research (and investment) called big data.

Accuracy vs Precision

The precision with which you express a number should be in keeping with the amount of accuracy that your data possesses.

Accuracy is how closely your count or measurement reflects the reality that you are counting or measuring.

Precision is the amount of accuracy that you express with the numbers you use to represent your data. Usually what this means is the number of significant digits used. For example:

Precision is often used to deceive people into thinking that you have a better understanding of reality than you actually do. For example:

Spatial Accuracy vs Precision

The world is a thing of infinite, wondrous complexity.

When we try to understand that world as numbers, we have to simplify. Our measuring devices and techniques cannot be exactly accurate. We have to round numbers so our precision reflects our accuracy.

With latitudes and longitudes in degrees, the number of decimal places you use (precision) reflects the accuracy of that location on the ground. That accuracy can be expressed as +/- a distance on the ground.

The table below shows the approximate distances for each fraction of a degree in Manhattan:

Degrees       Latitude            Longitude
0.1           6.91 miles          5.23 miles
0.01          3,648 feet          2,764 feet
0.001         365 feet            276.4 feet
0.0001        36.5 feet           27.6 feet
0.00001       43.8 inches         33.2 inches
0.000001      4.38 inches         3.32 inches
0.0000001     11.1 millimeters    8.43 millimeters
0.00000001    1.11 millimeters    0.843 millimeters
Approximate Distances For Each Fraction Of A Degree In Manhattan At 40.75, -74
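Distances like these can be approximated with a little trigonometry. The sketch below assumes a perfectly spherical Earth with a mean radius of 6,371 km, which is a simplification: a degree of latitude is the same length everywhere on a sphere, while a degree of longitude shrinks with the cosine of the latitude as the meridians converge toward the poles. The function name is hypothetical and the results are in meters.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean radius in meters; assumes a spherical Earth


def meters_per_degree(latitude):
    """Approximate ground distance in meters of one degree of latitude and of longitude."""
    per_degree_lat = 2 * math.pi * EARTH_RADIUS_M / 360
    # Lines of longitude converge toward the poles, so scale by cos(latitude)
    per_degree_lon = per_degree_lat * math.cos(math.radians(latitude))
    return per_degree_lat, per_degree_lon


lat_m, lon_m = meters_per_degree(40.75)
for places in range(1, 9):
    step = 10 ** -places
    print(f"{step:.{places}f} degrees: "
          f"{lat_m * step:.4f} m latitude, {lon_m * step:.4f} m longitude")
```

At this latitude a degree of latitude comes out at roughly 111 km and a degree of longitude at roughly 84 km, so each additional decimal place divides the implied ground distance by ten.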

Levels of Measurement

A concept from statistics that you may deal with when working with geospatial data is levels of measurement, which is similar to the general categorization given above. The most commonly-used division of levels was developed by psychologist Stanley Smith Stevens, and has four general levels with a handful of specific sub-levels:

Metadata

If you don't know what you have, then you don't have it.

Metadata is data about your data. Typical items kept in metadata include:

While simply remembering what you have is adequate for small projects with short-term needs (like class projects), failure to document your data may make the data useless if you ever want to reuse that data in the future.

However, because the creation of metadata is usually separate from the capture and processing of the data itself, and is more about the future than getting the task at hand done, it is common for metadata to be missing or skeletal. While it is often possible to look at the data and make a reasonable guess as to what it represents and where it came from, a few minutes adding metadata can save a great deal of effort in the future for you and for other people who use your data.

Philosophical Considerations

Rene Descartes, 1648 (Frans Hals via Wikimedia Commons)

X/Y coordinates are called Cartesian Coordinates after the French philosopher Rene Descartes (1596 - 1650). Descartes is also remembered for his statement Cogito ergo sum (I think therefore I am) as evidence of our own existence.

Cartesian coordinates combine the principles of geometry codified by the Greek philosopher Euclid (c. 300 BC) with the techniques of algebra codified by the Persian mathematician Muḥammad ibn Mūsā al-Khwārizmī (c. 780–850) to create a useful and powerful synthesis that is a foundation for contemporary geospatial technology.

Descartes' coordinate system was part of a broader philosophical quest to find objective knowledge about the world that is independent of our perspective and preconceptions. Descartes believed that the only essential properties of matter were geometric, so an objective understanding of the world could be had through a sufficiently advanced geometry (Shand 2002, p. 72).

Descartes' philosophy, to some extent, is a foundation for the geospatial technology that is based on Descartes' way of representing the world in his coordinate system. This is also the foundation of a critique of geospatial technology: that reducing the world to X/Y coordinates obscures not only important qualitative and subjective meanings, but also the political and economic forces (and associated moral questions) underlying the development and use of geospatial technology.

Geospatial technologies, with their vivid visualizations and massive scope, are charismatic, and charisma can deceive.