Areas

A classic means of organizing space, both in the material world and in geographic information systems, is division of land into areas.

There are a variety of different types of areas used for different types of management and analysis. Those areas can be identified with a variety of different naming and numbering schemes. Areas are commonly visualized as maps, and analysis of the distribution patterns of attributes across areas is one of the classical techniques of geography. These topics are the subjects of this tutorial.

  1. Polygons
  2. Types of Areas
  3. Area Identifiers
  4. Uncertainty
  5. Describing Area Patterns

Polygons

An area is "a particular extent of space or surface" (Merriam-Webster 2020).

Areas are usually represented in GIS with vector polygons.

Polygons are formed by connecting node points at specific latitudes and longitudes with edge lines that form boundaries.

Polygon nodes and edges
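
As a minimal sketch of this structure (in Python, using made-up coordinates rather than a real boundary), a polygon can be stored as an ordered list of node coordinates, with an edge implied between each pair of consecutive nodes. The function names here are hypothetical and are not taken from any GIS package.

```python
# A polygon stored as an ordered ring of (longitude, latitude) nodes.
# These coordinates are hypothetical and chosen only for illustration.
ring = [(-88.0, 41.0), (-87.0, 41.0), (-87.0, 42.0), (-88.0, 42.0)]


def ring_edges(nodes):
    """Return the edges of a ring as pairs of consecutive nodes,
    including the closing edge from the last node back to the first."""
    return [(nodes[i], nodes[(i + 1) % len(nodes)]) for i in range(len(nodes))]


def planar_area(nodes):
    """Shoelace formula: area in squared coordinate units (here, square
    degrees). Coordinates are treated as planar, so this is not a ground area."""
    total = 0.0
    for (x1, y1), (x2, y2) in ring_edges(nodes):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


edges = ring_edges(ring)
area = planar_area(ring)
print(len(edges), "edges; planar area =", area, "square degrees")
```

Real GIS software normally projects coordinates before measuring area, because a degree of longitude covers less ground distance the further you go from the equator.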

Curved boundaries are usually stored in GIS as polygons using closely spaced nodes that appear as curves when viewed.

Polygon curves

Individual features sometimes consist of multiple polygons that are treated as a single entity in GIS. In the example below, the border of England includes islands off the coast. In the second example, the country of South Africa wraps around the country of Lesotho, and the border of Lesotho is a second polygon carving out a hole in the contiguous area of the country.

Multiple polygons per feature: Islands and holes
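
One way to picture islands and holes is as nested lists of rings. The sketch below assumes a layout similar to a GeoJSON MultiPolygon (a list of polygons, each a list of rings, with the first ring as the outer boundary and any later rings as holes). The shapes are hypothetical squares, not real borders, and the point-in-polygon test uses the common ray-casting technique.

```python
# Hypothetical feature laid out like a GeoJSON MultiPolygon:
# a list of polygons, each a list of rings. The first ring of each
# polygon is the outer boundary; any further rings are holes.
feature = [
    [   # "mainland": a 10 x 10 square with a 2 x 2 hole in the middle
        [(0, 0), (10, 0), (10, 10), (0, 10)],
        [(4, 4), (6, 4), (6, 6), (4, 6)],
    ],
    [   # "island": a separate 1 x 1 square off the coast
        [(12, 0), (13, 0), (13, 1), (12, 1)],
    ],
]


def in_ring(point, ring):
    """Ray casting: toggle 'inside' each time a ray running to the right
    of the point crosses an edge of the ring."""
    x, y = point
    inside = False
    for i in range(len(ring)):
        (x1, y1), (x2, y2) = ring[i], ring[(i + 1) % len(ring)]
        if (y1 > y) != (y2 > y):  # edge spans the ray's y value
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def in_feature(point, polygons):
    """A point belongs to the feature if it is inside the outer ring of
    some polygon and not inside any of that polygon's holes."""
    for outer, *holes in polygons:
        if in_ring(point, outer) and not any(in_ring(point, h) for h in holes):
            return True
    return False


print(in_feature((1, 1), feature))       # mainland
print(in_feature((5, 5), feature))       # inside the hole
print(in_feature((12.5, 0.5), feature))  # island
```

In this layout, a point inside the hole is not part of the feature, in the same way that a location in Lesotho is not part of South Africa.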

Human-defined areas such as political boundaries or building footprints are discrete phenomena with clearly defined edges that are usually best represented as vector polygon objects. However, some phenomena, especially environmental phenomena, are continuous phenomena that often do not have clear boundaries and are best represented as rasters. In the example raster below, there are clearly areas of high and low vegetation across Africa, but those levels vary continuously across the landscape.

Vegetation levels as a continuous phenomenon (NASA 2005)

Types of Areas

Getis et al. (2014, 14) defined a taxonomy of four different types of regions that expanded on Hartshorne's (1959) original three-part taxonomy. In the world of GIS, this taxonomy is useful for understanding these different types of areas and how information about them can best be captured, analyzed, and communicated.

Administrative Areas

Administrative areas are areas that are "created by laws, treaties, or regulations" and are usually associated with government, military, or business control or operation (Getis et al. 2014, 14). Administrative areas have clear, rigorously surveyed boundaries that are well-represented by discrete object polygons.

In the United States, there is a rough nested hierarchy of administrative areas that divide the country into areas that are managed by different levels of government:

The hierarchy of administrative areas containing Chicago, IL, USA

Local Administrative Areas

At the local level, there is a much wider range of different areas used to divide cities into manageable chunks overseen by separate administrators and departments, such as:

Land use zones are an especially complicated type of administrative area that govern the types of buildings and businesses that can operate in particular areas of cities. The boundaries are dictated by a variety of political and economic factors.

City of Chicago zoning web map

Cadastres

Another related example of administrative areas is property boundaries. The set of property boundaries within an administrative area (like a county) is called a cadastre. A cadastre is "an official register of the quantity, value, and ownership of real estate used in apportioning taxes" (Merriam-Webster 2020).

President Obama's Hyde Park house in the Cook County property information map

Census Tracts

The US Census Bureau (USCB) is the part of the US federal government responsible for collecting data about people (demographics) and the economy in the United States. The Census Bureau has its roots in Article I, section 2 of the US Constitution, which mandates an enumeration of the entire US population every ten years (the decennial census) in order to set the number of members from each state in the House of Representatives (the lower house of the US Congress) and Electoral College (that selects the US President) (USCB 2017).

To preserve the privacy of people who respond to the census or other surveys run by the USCB, the USCB only releases data to the public that has been aggregated to areas. The smallest areas for which the USCB releases data are census tracts and, sometimes, block groups. Accordingly, data by census tract provides a fine-grained GIS view of community demographics that is very useful in social and economic research.

Census tracts are subdivisions of counties that are drawn based on clearly identifiable features to ideally contain around 4,000 residents, although in practice the range of population is usually between 1,200 and 8,000 (USCB 2019).

Median household income by census tract in Illinois 2014-2018 (US Census Bureau American Community Survey)

Formal Areas

Formal areas (uniform regions) are areas that each have a common set of physical or social characteristics.

Although Hartshorne's (1959) original taxonomy considered administrative areas to simply be a specific type of formal areas, this tutorial follows Getis et al. (2014, 14) in seeing formal areas as defined by their contents rather than by the decisions of governing authorities.

One challenge of this approach is that in the absence of a clear authority to specify the exact boundaries of formal areas, the boundaries can be ambiguous and/or contested.

For example, social scientists often make a distinction between nations and states. Nations are groups of people that have a common identity, often based on common history, religion, or social practices. In contrast, a state (country) is a governmental entity that controls a specific geographical area. This definition should not be confused with the use of the term state in the US to represent one of the 50 United States that are part of the federal system of government.

While governments usually strive to create a national identity within their borders, different geographic parts of states can be home to different nations. Conflicts between different nations within states often result in violent civil war and / or violent efforts by the state to suppress national identities that conflict with the state.

Such an example is Kurdistan. The Kurds are an ethnic group distributed across a largely contiguous area that falls within the states of Turkey, Syria, Iraq, and Iran. Many Kurds (Kurdish nationalists) aspire to create an independent Kurdish state, which has resulted in often violent suppression by existing state governments that do not want to lose control of the territory that would form that independent state.

Kurdistan (CIA 1992)

Similarly, areas of different environmental characteristics can be considered formal areas. For example, physiographic regions are areas classified by common geological structures and histories based on a taxonomy initially developed by Nevin Fenneman (1917).

Physiographic regions of the US (USGS 2000 after Fenneman and Johnson 1946)

Functional Areas

Functional areas (nodal regions) are focused on a central point, with diminishing influence the further you go away from that central point.

A primary example of this is a metropolitan area, which is "a major city together with its suburbs and nearby cities, towns, and environs over which the major city exercises a commanding economic and social influence" (Encyclopedia Britannica 2020). Metropolitan functional area boundaries often cross multiple city, county, and state administrative boundaries.

For tabulating purposes, the US Census Bureau defines a set of administrative areas as core-based statistical areas (CBSA) that include metropolitan statistical areas (big cities) and micropolitan statistical areas (small cities).

However, as with formal areas, there are no clear boundaries that define where the influence of the core cities (central points) starts and ends. Indeed, different types of influence (such as access to health care or commuting distances) can have different extents, and ties to the global economy can spread influence worldwide. Accordingly, you should interpret maps of functional areas with this ambiguity in mind.

The Chicago-Naperville-Elgin, IL-IN-WI metropolitan statistical area (USCB, ESRI, OpenStreetMap)

Another common example of functional areas, used in urban transportation planning, is the area within walking distance of transit stops. In the map below, the red circles encompass areas within 1/2 mile of stations on the CTA El rail lines.

(CTA 2020, Open Street Map)
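
A circular area like this can be approximated by testing whether a location falls within a fixed straight-line distance of a station. The sketch below uses the haversine great-circle formula; the coordinates are hypothetical (not actual CTA stations), the function names are made up for illustration, and the Earth radius is an approximate mean value.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # approximate mean radius of the Earth


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def within_walking_distance(place, station, limit_miles=0.5):
    """True if 'place' is within 'limit_miles' of 'station' in a straight line."""
    return haversine_miles(*place, *station) <= limit_miles


# Hypothetical (latitude, longitude) pairs for illustration only.
station = (41.8800, -87.6300)
nearby = (41.8830, -87.6300)  # roughly 0.2 miles north of the station
far = (41.9000, -87.6300)     # roughly 1.4 miles north of the station

print(within_walking_distance(nearby, station))
print(within_walking_distance(far, station))
```

Straight-line distance is only a rough proxy for walking distance, since actual walking routes follow streets and must go around barriers such as rivers and highways.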

Vernacular Areas

Vernacular areas (perceptual regions) are areas that are socially-defined by shared history and common identities. Accordingly, the boundaries of these regions are ambiguous and fluid (Wikipedia 2020).

A primary example of this is city neighborhoods. Neighborhood identities evolve over time under the influence of ethnic groups, real estate developers, and urban planners (city government). Neighborhoods grow and shrink based on time and social attitudes. Residents in adjacent homes may think of themselves as living in different neighborhoods even if there are no clear physical markers to indicate any difference between the neighborhood identity of their properties.

City governments often create maps showing clear neighborhood boundaries for planning purposes, but some residents of those neighborhoods may have a different opinion of which neighborhood they belong in. For example, this neighborhood map created by the City of Chicago was based on a field survey from 1978 asking randomly selected residents what neighborhood they lived in:

Neighborhoods in Chicago, IL, USA (The City of Chicago 2016)

Area Identifiers

Names

Once a set of areas have been defined, you need a way of identifying individual areas. Perhaps the most common way of identifying specific areas is with names. Countries have names, cities have names, and neighborhoods have names.

A challenge of using names with geographic information systems is that GIS uses coordinate systems (usually latitude and longitude) to represent locations. Names often do not uniquely identify a specific latitude and longitude on the surface of the planet.

The process of converting place names to latitudes and longitudes is called geocoding. This requires large databases and complex algorithms to deal with the idiosyncrasies of place names, such as:

The 28 different counties named Washington across the US

Latitudes and Longitudes

The polygons used to represent discrete areas are created using points at specific latitudes and longitudes that are connected using lines. In some cases, one feature may contain multiple polygons for things like separate islands, or to represent holes in polygons, such as for bodies of water.

To simplify identification of areas, these collections of points are often generalized into a single point. This is a common practice with GPS apps when giving directions from one area to another.

These points can be placed at entrances to areas, such as doorways or front gates. However, a more common technique is to place points at centroids, which are locations within polygons that are the geometric center of mass of the polygon.

Centroids
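
As an illustration, the centroid of a simple polygon can be calculated from its nodes with the standard area-weighted (shoelace) formula. The sketch below uses a hypothetical L-shaped polygon in arbitrary planar units and assumes a single ring with no holes.

```python
def polygon_centroid(nodes):
    """Area-weighted centroid of a simple polygon given as a list of
    (x, y) nodes. Works for clockwise or counter-clockwise node order."""
    signed_area_x2 = 0.0  # twice the signed area
    cx_sum = 0.0
    cy_sum = 0.0
    for i in range(len(nodes)):
        x1, y1 = nodes[i]
        x2, y2 = nodes[(i + 1) % len(nodes)]
        cross = x1 * y2 - x2 * y1
        signed_area_x2 += cross
        cx_sum += (x1 + x2) * cross
        cy_sum += (y1 + y2) * cross
    # Centroid = sums / (6 * signed area) = sums / (3 * twice the signed area)
    return cx_sum / (3.0 * signed_area_x2), cy_sum / (3.0 * signed_area_x2)


# Hypothetical L-shaped parcel in arbitrary planar units.
l_shape = [(0, 0), (4, 0), (4, 2), (2, 2), (2, 4), (0, 4)]
cx, cy = polygon_centroid(l_shape)
print(round(cx, 3), round(cy, 3))
```

Note that this is not the same as averaging the node coordinates, and that for strongly concave shapes (such as a crescent) the calculated center of mass can fall outside the polygon, which is one reason some software offers an alternative "point on surface" placement.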

Linear Addresses

Urban areas are commonly referenced using addresses based on a numeric location along a named street. The standard formats of these addresses vary widely by country, and even within countries can have a significant amount of variation.

In the United States, the common format for street addresses is a number followed by a street name, for example 101 Main Street. Individual apartments or offices at a single street address are often identified by a suite number following the street address (101 Main Street #204). The street address then needs an additional city and state name to distinguish it from identical addresses in other cities.

As with place names, street addresses have multiple challenges that make the geocoding of addresses into latitudes and longitudes difficult and imperfect:

Linear addressing

Standardized Codes

The ambiguity associated with place names and linear addresses can be resolved by using standardized coding systems that uniquely identify areas and other types of locations.

ISO Country Codes

At the international level, the International Organization for Standardization (ISO) has defined two-letter and three-letter country codes (ISO 3166-1 alpha-2 and alpha-3) that uniquely identify past and current country boundaries. These codes mitigate some of the challenges associated with country names as place names that were described above.

ISO country codes (large countries)

ZIP Codes

Beginning in 1963, the US Postal Service divided the country into mail delivery service administrative areas that are identified with ZIP codes (Zone Improvement Plan codes). Postal systems in most countries of the world have similar areas, albeit with different ways of identifying those areas. These codes are used when physically mailing letters or packages.

ZIP codes in Chicago, IL, USA (City of Chicago 2020)

FIPS Codes

The United States Federal Information Processing Standards (FIPS) included a set of two-digit state codes (FIPS 5-1 from 1970 and FIPS 5-2 from 1987) and three-digit county codes that are combined with the state code to form five-digit county identifiers (FIPS 6-4 from 1990). In 2008, the management of the standard moved from the National Institute of Standards and Technology (NIST) to ANSI's InterNational Committee for Information Technology Standards, with the county code standard becoming INCITS 31-2009 (US Census Bureau 2020).

The US Census Bureau's GEOID coding system is used to uniquely identify various geographic units in its data files based on these standards, and preserves the FIPS acronym (US Census Bureau 2020).

State FIPS Codes
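
Because GEOIDs are built by concatenating codes from the levels of the geographic hierarchy, they can be taken apart by position. The sketch below assumes the 11-digit census tract layout (a two-digit state code, a three-digit county code, and a six-digit tract code). The function name is hypothetical, and the example tract code is used only to illustrate the layout.

```python
def split_tract_geoid(geoid):
    """Split an 11-digit census tract GEOID into its component codes.

    Codes are kept as strings so that leading zeros are preserved
    (for example, a state code of "01" must not become the number 1).
    """
    if len(geoid) != 11 or not geoid.isdigit():
        raise ValueError("expected an 11-digit tract GEOID, got %r" % (geoid,))
    return {
        "state": geoid[:2],    # two-digit state FIPS code
        "county": geoid[2:5],  # three-digit county code within the state
        "tract": geoid[5:],    # six-digit tract code within the county
    }


# "17" is Illinois and "031" is Cook County; the tract portion of this
# example is illustrative.
parts = split_tract_geoid("17031010100")
print(parts)
```

A common pitfall when joining census tables to polygons is that spreadsheet software may silently convert GEOIDs to numbers and drop leading zeros, so it is usually safer to store them as text.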

Local Codes

Local governments use a wide variety of idiosyncratic identifier codes for parcels and structures.

For example, the City of New York's Primary Land Use Tax Lot Output (PLUTO) system organizes parcels (individual areas of owned land) by block and lot numbers.

PLUTO listing for New York City's Empire State Building (The City of New York 2020)

The US Public Land Survey System

The ceding of vast land areas between the Appalachian Mountains and the Mississippi River to the US by the British following the Revolutionary War resulted in a demand for a partitioning scheme that could organize Euro-American settlement and private ownership. The Land Ordinance of 1785 established the US Public Land Survey System (PLSS), which provided a surveying methodology for clearly identifying parcels of land.

The PLSS is based around a set of initial points, of which 35 total were eventually chosen across the western and southern United States:

The Public Land Survey System (USGS 2017)

A specific parcel is then specified with the following convention IN REVERSE ORDER:

While archaic and fraught with idiosyncrasies that reflect both the technical and political limitations of its era, the PLSS is still the basis for property records in some parts of the western US. Similar surveying systems also persist as part of property records even in locations not covered by the PLSS. Therefore, the PLSS remains significant nearly two and a half centuries after its creation.

Uncertainty

Data is a simplified abstraction that represents the infinitely-complex real world. The process of capturing data by translating the real world into an abstraction always introduces some level of uncertainty about the correspondence of our data to the facts on the ground.

Uncertainty means that something is "not clearly identified or defined" (Merriam-Webster 2021). While the rigorous computational technology of GIS implies absolute truth, uncertainty means that you need to interpret geospatial data with an understanding that the data may deviate from the actual ground truth, or the representations of that data may give a mistaken impression of the actual ground truth.

Since the outputs of GIS are often used to guide decision-making, it is important for the GIS professional to communicate uncertainty so the decision-makers understand the limitations of the information they have been given. Failure to do this can be an ethical issue that can result in harm. Negligence in adhering to professional protocols can result in legal liability.

Longley et al. (2015, 99 - 127) define three levels of uncertainty in geospatial data that are particularly relevant when working with areas: conception, representation, and analysis.

Conceptual Uncertainty

The abstract polygons used to represent areas have exact nodal coordinates and clear lines. Administrative areas usually have stable, rigorously surveyed borders that both lend themselves to representation in GIS and demand use of GIS to preserve accuracy.

However, the borders of formal, functional, and vernacular areas are often less clear. In order to draw borders around a phenomenon you need to be able to classify it into specific categories that will form clear boundaries. Such clarity is often unavailable because categories all have some measure of the following qualities:

Representational Uncertainty

The way in which geospatial data is stored in GIS can add additional uncertainty.

Topology: clean (left), gap / sliver (center), overlap (right)

Analytical Uncertainty

Finally, the way that areas are analyzed or interpreted adds additional uncertainty.

The Modifiable Areal Unit Problem (MAUP): As described above, the choice of where to draw area boundaries is subjective and, in the case of administrative boundaries, determined by factors other than the phenomenon being analyzed. Using different sets of areas can result in different analytical results, even when the underlying phenomenon is the same.

This is the modifiable areal unit problem. There is no universally applicable solution to the MAUP. In cases where accuracy is not essential (such as with data exploration) or not possible (such as with data that has been aggregated for privacy reasons), the uncertainty presented by the MAUP may be acceptable as long as caveats are provided along with the analysis. In other cases, more-sophisticated analytical approaches may be more appropriate.

For example, demographic information is often aggregated at the state or county level. This groups large collections of dissimilar people into a smaller set of numbers that may hide important differences in specific areas. If you change the boundaries (especially by subdividing large areas into smaller ones), your analysis will change, even though the facts on the ground are the same.

The Modifiable Areal Unit Problem
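
The effect can be demonstrated with a small, entirely made-up example: eight households along a street, each with a hypothetical income value, aggregated into zones in two different ways. The household values never change, but the zone averages do.

```python
# Hypothetical incomes (in thousands) for eight households along one street.
values = [20, 20, 20, 20, 80, 80, 80, 80]

# Two hypothetical ways of drawing zone boundaries over the same households.
# Each zone is a list of household positions (indexes into 'values').
scheme_a = [[0, 1, 2, 3], [4, 5, 6, 7]]    # two zones of four households
scheme_b = [[0, 1], [2, 3, 4, 5], [6, 7]]  # three zones, middle one mixed


def zone_means(values, zones):
    """Average the values that fall in each zone."""
    return [sum(values[i] for i in zone) / len(zone) for zone in zones]


means_a = zone_means(values, scheme_a)
means_b = zone_means(values, scheme_b)
print("Scheme A zone means:", means_a)
print("Scheme B zone means:", means_b)
```

Under the first scheme the street appears to be split into one low-income zone and one high-income zone. Under the second scheme a middle-income zone appears in the center, even though no individual household has a middle income.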

Metadata: Metadata is information about a dataset, such as the date(s) represented, the original source / author, the definitions for the numeric units, etc. Because metadata is stored and created separately from the data, it is frequently missing, out of date, or incorrect. Even when present, metadata is often ignored. This can result in uncertain analysis because the data upon which that analysis is based is unreliable, out-of-date, or inappropriate.

Describing Area Patterns

Objects in space often tend to cluster together with other like objects. For example, wealthy neighborhoods are clusters of expensive houses and poor neighborhoods are clusters of more modest dwellings. Habitats can be more hospitable to different species of flora and fauna, and so different types of plants and animal species often live close together.

There are a variety of ways to describe the spatial distribution patterns of phenomena. These descriptions both make complex phenomena more understandable and provide a starting point for more rigorous analysis.

One of the primary ways of analyzing the spatial distribution of a phenomenon is looking at where points group together, or where high or low values are clustered together. A more formal term for clustering is spatial autocorrelation, which is the amount of clustering within a single set of values within a specific space.

Regional Descriptions

One way of describing patterns is by indicating which regions have higher levels or a greater concentration of a phenomenon than other areas.

For example, in the map below of projected annual precipitation changes in 2040-2059 based on the RCP 6.0 climate change scenario, the desert southwest region is modeled to see a reduction of up to 50 mm per year, while the Appalachian region in the northeastern part of the country is projected to be wetter, with 100 - 150 mm of precipitation above the historic average.

Projected precipitation change 2040-2059 under the RCP 6.0 climate change scenario (National Center for Atmospheric Research (NCAR) and ESRI)

Core-Based Descriptions

Functional areas (core-based clusters) represent spatial correlation between a value and proximity to core areas.

For example, in the State of Illinois, higher monthly homeowner mortgage costs tend to cluster in major urban areas. This variable exhibits strong autocorrelation with itself and strong correlation with proximity to urban cores (regional description).

Median monthly mortgage costs in Illinois 2014-2018 (US Census Bureau American Community Survey)

Moran's I

While visible observation of spatial autocorrelation is useful when exploring data, more-rigorous means of quantifying autocorrelation are helpful when performing serious research.

Patrick Alfred Pierce Moran (1950) developed a technique for quantifying autocorrelation. Values of the Moran's I statistic vary from -1 (evenly dispersed = evenly spread out) through 0 (no autocorrelation) to +1 (completely clustered).

Simplified examples of different values of the Moran's I statistic
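
To make the statistic more concrete, the sketch below calculates Moran's I for values on a small square grid. It assumes binary weights (1 for cells that share an edge, 0 otherwise) with no row standardization, which is only one of several possible weighting choices. The function names and grid values are hypothetical, and this is not the R script used for the maps in this tutorial.

```python
def rook_neighbors(rows, cols):
    """Map each grid cell index to the indexes of cells that share an edge."""
    neighbors = {}
    for r in range(rows):
        for c in range(cols):
            adjacent = []
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    adjacent.append(rr * cols + cc)
            neighbors[r * cols + c] = adjacent
    return neighbors


def morans_i(values, neighbors):
    """Moran's I with binary weights:
    I = (n / W) * sum_ij(w_ij * d_i * d_j) / sum_i(d_i ** 2),
    where d_i is the deviation of value i from the mean and W is the
    sum of all weights."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    weight_total = sum(len(adj) for adj in neighbors.values())
    cross = sum(dev[i] * dev[j] for i, adj in neighbors.items() for j in adj)
    return (n / weight_total) * (cross / sum(d * d for d in dev))


grid = rook_neighbors(4, 4)

# Checkerboard: every cell differs from all of its neighbors (dispersed).
dispersed = [(r + c) % 2 for r in range(4) for c in range(4)]
# Left half high, right half low (clustered).
clustered = [1 if c < 2 else 0 for r in range(4) for c in range(4)]

i_dispersed = morans_i(dispersed, grid)
i_clustered = morans_i(clustered, grid)
print(round(i_dispersed, 3), round(i_clustered, 3))
```

The checkerboard pattern sits at the dispersed end of the scale, while the half-and-half pattern gives a clearly positive value. On a small grid like this, even a strongly clustered pattern does not reach +1, because cells along the dividing line still have dissimilar neighbors.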

In a real-world Illinois example, clustering of median household income has high autocorrelation (I = 0.54), with high income counties largely clustered in a ring around Chicago, and low income counties clustered throughout the state.

Median household income by county in Illinois 2014 - 2018 (USCB American Community Survey 2020)

In contrast, the percentage of people who work at home exhibits no significant autocorrelation (I = 0.09) without either regular dispersion or regular clustering.

Percentage of workers who work at home by county in Illinois 2014 - 2018 (USCB American Community Survey 2020)

(R script for calculating and mapping Moran's I)