Proximity Analysis in ArcGIS Pro

Proximity analysis is the extraction of information from geospatial data based on distance between features. Proximity analysis is especially useful for evaluating two characteristics:

Proximity analysis assumes distance decay, where the strength of the relationship between locations decreases as the distance between them increases. Distance decay is a fundamental assumption behind gravity models.
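
As a rough illustration of distance decay, a simple gravity model can be sketched in plain Python. The function name, the decay exponent beta, the scaling constant k, and the town populations are all assumptions made for this example; in practice the parameters would be calibrated to observed data.

```python
def gravity_interaction(mass_a, mass_b, distance, beta=2.0, k=1.0):
    """Estimated interaction between two places under a simple gravity model.

    beta (distance-decay exponent) and k (scaling constant) are assumed
    values for illustration only.
    """
    if distance <= 0:
        raise ValueError("distance must be positive")
    return k * mass_a * mass_b / distance ** beta


# Two hypothetical towns of 10,000 and 50,000 people.
near = gravity_interaction(10_000, 50_000, distance=10)
far = gravity_interaction(10_000, 50_000, distance=20)

# With beta = 2, doubling the distance cuts the interaction to one quarter.
print(near / far)
```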

Three methods involving proximity are covered in this tutorial.

Categorical Proximity

Categorical proximity analysis divides areas into two classes (adjacent and non-adjacent) based on proximity to specific locations.

This example will demonstrate categorical proximity analysis between petroleum refineries and census tracts to answer the question, "How are neighborhoods adjacent to refineries demographically different from neighborhoods that are not adjacent to refineries?"

Location Data: Refineries

Proximity analysis can be performed on two sets of features, either of which can be points, lines, or polygons.

For this example, we will analyze proximity between points and areas.

Petroleum refineries are large facilities that transform crude oil pumped from the ground into gasoline and a wide variety of other products that are essential to contemporary life in the United States.

Figure
Exxon Mobil Refinery, Baton Rouge, Louisiana (Wikipedia 2017)

The example point locations will be petroleum refinery locations from the Energy Information Administration (EIA) U.S. Energy Atlas.

  1. Give the initial map a meaningful name (Refineries Map).
  2. Under Analysis and Tools, use the Export Features tool to copy the data from the EIA website into a new feature class in the project geodatabase.
  3. Input Features: Copy the GeoServices link from the EIA website. Remove the ending query? and everything after it.
  4. Output Feature Class: Provide a meaningful name (Refineries).
  5. Under Filter select the features based on the State field for the state you wish to analyze (Illinois).
Importing refinery data into the project geodatabase
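
The URL trimming in step 3 can also be done programmatically. This is a minimal sketch; the function name and the example.com URL are hypothetical placeholders, not the actual EIA service address.

```python
def layer_url_from_query_url(query_url):
    """Drop a trailing 'query?...' so the URL points at the layer itself."""
    base, _, _ = query_url.partition("/query?")
    return base.rstrip("/")


# Hypothetical GeoServices link (not the real EIA URL).
example = (
    "https://example.com/arcgis/rest/services/"
    "Refineries/FeatureServer/0/query?outFields=*&where=1%3D1"
)
print(layer_url_from_query_url(example))
```

A URL that has no query? portion is returned unchanged, so the function is safe to apply to links that have already been trimmed.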

Area Data: Census Tracts

The areas we use for this example will be census tracts in Illinois with demographic data.

The American Community Survey (ACS) is an ongoing survey by the US Census Bureau that provides information about people in the United States beyond the basic information collected in the decennial census. The ACS is commonly used by a wide variety of researchers when they need information about the general public.

Census tracts are subdivisions of counties that are drawn by the US Census Bureau based on clearly identifiable features to ideally contain around 4,000 residents, although in practice the range of population is usually between 1,200 and 8,000 (USCB 2019). The census tract is the smallest reliable area of aggregation in data made public by the US Census Bureau.

  1. Under Analysis and Tools, use the Export Features tool to copy the data from census tract feature service into a new feature class in the project geodatabase.
  2. Input Features: Search in ArcGIS Online for the Minn 2015-2019 ACS Tracts feature layer.
  3. Output Feature Class: Provide a meaningful name (Tracts).
  4. Under Filters select the features based on the ST field for the state you wish to analyze (IL).
Exporting census tract data into the project geodatabase

ModelBuilder

While you can run the analysis tools manually, it can be helpful to use a ModelBuilder diagram that will save you from repeated typing as you debug your workflow.

Use of ModelBuilder also preserves your workflow so that you can update and reproduce your analysis in the future.

Note that you will probably want to run the Export Features steps manually rather than in your ModelBuilder diagram. Saving project packages that reference feature services can be very slow because the packaging process downloads all data from the feature service to incorporate in the package.

Creating and renaming a blank ModelBuilder diagram

Buffer and Overlay

One common technique for categorical proximity analysis is buffer and overlay, where buffer polygons (usually circles) of a fixed size or radius are placed around locations and those buffers are overlaid over another set of areas to find areas that are adjacent to the buffered locations.

Oil refineries emit a wide variety of toxic chemicals (UCAR 2023). Accordingly, we might hypothesize that the areas around oil refineries would be considered undesirable places to live, reducing the property values and rents in those areas and attracting marginalized populations without the economic means to live in more desirable locations.

To represent the presumed undesirable area around refineries, we will create ten-mile (approximately 16,000-meter) buffers, corresponding to research that found increased cancer risk within ten miles of refineries (Williams et al. 2020).

  1. Add the Buffer tool to create the buffers.
  2. Add the Summarize Within tool to count the number of refinery buffers that intersect each tract as an estimate of exposure to emissions.
  3. Add the Calculate Field tool to create a categorical field that classifies each tract as refinery or non-refinery.
  4. Change the Symbology to the adjacency field.
Creating and joining ten-mile refinery buffers with census tracts
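
The logic of the steps above can be sketched in plain Python. This is a simplified illustration, not the ArcGIS implementation: it assumes hypothetical projected coordinates in meters and represents each tract by a single centroid point, whereas the Buffer and Summarize Within tools work with the full tract polygons.

```python
import math

BUFFER_M = 10 * 1609.344  # ten miles in meters (about 16,093 m)

# Hypothetical projected coordinates in meters (x, y).
refineries = {"R1": (0.0, 0.0), "R2": (50_000.0, 0.0)}
tract_centroids = {
    "T1": (5_000.0, 5_000.0),
    "T2": (30_000.0, 30_000.0),
    "T3": (45_000.0, 0.0),
}


def count_refineries_within(point, refinery_points, radius_m):
    """Count refineries whose buffer of radius_m would cover the point."""
    return sum(1 for r in refinery_points if math.dist(point, r) <= radius_m)


# Classify each tract, mirroring the Calculate Field step.
tract_classes = {}
for tract_id, centroid in tract_centroids.items():
    count = count_refineries_within(centroid, refineries.values(), BUFFER_M)
    tract_classes[tract_id] = "refinery" if count > 0 else "non-refinery"

print(tract_classes)
```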

Group Comparison Table

We can create a table showing the differences between summarized demographic values, grouped by the Refinery_Adjacent field.

You can then review the table to find demographic characteristics that are significantly different between the refinery and non-refinery tracts.

Summarizing demographics between refinery and non-refinery tracts
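
A group comparison of this kind amounts to summarizing a value separately for each class. The following is a minimal sketch in plain Python; the median_income field and all of the values are hypothetical and stand in for whichever demographic fields the tract layer contains.

```python
from collections import defaultdict
from statistics import median

# Hypothetical tract records; only Refinery_Adjacent comes from the tutorial.
tracts = [
    {"Refinery_Adjacent": "refinery", "median_income": 48_000},
    {"Refinery_Adjacent": "refinery", "median_income": 52_000},
    {"Refinery_Adjacent": "refinery", "median_income": 61_000},
    {"Refinery_Adjacent": "non-refinery", "median_income": 55_000},
    {"Refinery_Adjacent": "non-refinery", "median_income": 67_000},
    {"Refinery_Adjacent": "non-refinery", "median_income": 72_000},
    {"Refinery_Adjacent": "non-refinery", "median_income": 90_000},
]


def group_medians(rows, group_field, value_field):
    """Return the median of value_field for each distinct group_field value."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_field]].append(row[value_field])
    return {group: median(values) for group, values in groups.items()}


summary = group_medians(tracts, "Refinery_Adjacent", "median_income")
print(summary)
```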

Group Comparison Box Plot

We can also use box plots to compare distributions of attribute values between groups. For this example we use a box plot to compare percent foreign born between refinery and non-refinery tracts.

Although the difference between the medians of the two groups indicates some general association of higher percentages of foreign born residents with proximity to refineries, the overlap between ranges indicates that the association is vague and not predictive.

  1. In the Contents pane, right click on the Refinery_Tracts layer.
  2. Select Create Chart and Box Plot.
Comparing percent foreign-born between refinery and non-refinery tracts
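
The values a box plot displays can be approximated with a five-number summary per group. This sketch uses hypothetical percentages and Python's inclusive quartile method, which is an assumption; the quartile method used by the ArcGIS Pro chart may differ slightly.

```python
from statistics import quantiles


def five_number_summary(values):
    """Minimum, quartiles, and maximum of a list of numbers."""
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return {"min": min(values), "q1": q1, "median": q2, "q3": q3, "max": max(values)}


# Hypothetical percent foreign-born values for each group of tracts.
pct_foreign_born = {
    "refinery": [2.0, 4.0, 6.0, 8.0, 10.0],
    "non-refinery": [1.0, 2.0, 3.0, 4.0, 5.0],
}

summaries = {group: five_number_summary(v) for group, v in pct_foreign_born.items()}
for group, stats in summaries.items():
    print(group, stats)
```

Comparing the medians shows the central difference between groups, while comparing the min-to-max ranges shows how much the two distributions overlap.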

Limitations of Buffer and Overlay

Simple circular buffers assume that the influence of, or accessibility to, a point extends evenly in all directions around the point.

To represent the dispersion of toxic emissions, a transport model is needed to model wind and water flow patterns and how they transport the toxics. These models can be quite complex (Beyea and Hatch 1999).

Figure
Refinery analysis ModelBuilder diagram

Weighted Proximity Analysis

Weighted proximity analysis uses distance between features to quantify proximity. These continuous distance values (weights) can then be used with techniques like multiple regression.

This example will demonstrate weighted proximity analysis between Amtrak stations and counties to answer the questions:

Hub and spoke analysis is a weighted proximity analysis technique that analyzes accessibility to a small number of centralized hubs where distance from peripheral points to the closest hub is calculated with imaginary spoke lines. Hub-and-spoke is useful for estimating or optimizing accessibility for hubs like distribution centers or transit stops.

Figure
Hub-and-spoke analysis

Location Data: Amtrak Stations

The destination points used for these examples will be Amtrak stations from the US Bureau of Transportation Statistics Geospatial at BTS Open Data Catalog.

Amtrak is a quasi-public US passenger railroad corporation that was formed by the US federal government in 1971 to take over US intercity passenger service when private railroads were facing economic crisis and were eager to discontinue unprofitable passenger operations (Minn 2016).

Unlike transit, which has utility only when within a convenient walking or driving distance, Amtrak carries a significant number of leisure riders for whom the value is the journey as much as or more than the destination (Losada-Rojas et al. 2019). This hub-and-spoke analysis assumes that some Amtrak riders may be willing to drive a considerable distance to access service, although longer distances would make the trip less desirable (distance decay).

Figure
Boarding Amtrak's Crescent in Birmingham, Alabama (Minn 2007)

As above, you could use the feature service for this data directly, but copying it into a new feature class will likely improve speed and reliability.

  1. Insert a new Map and give it a meaningful name (Amtrak Map).
  2. Go to the National Transportation Atlas Database and search for Amtrak Stations.
  3. Under Analysis and Tools, open the Export Features tool to copy the station information from the feature service into the project geodatabase.
Importing Amtrak stations

Area Data: Counties

Counties in the United States are geographical divisions of states for government administration.

  1. Under Analysis and Tools, open the Export Features tool.
  2. Input Features: Search in ArcGIS Online for the Minn 2015-2019 ACS Counties layer.
  3. Output Feature Class: Counties
  4. Add Filters to exclude Alaska and Hawaii, which have no Amtrak service.
Importing county data

Proximity Calculation

  1. Add the Near tool to find the distance between each county and the nearest Amtrak station.
  2. Update the symbology to symbolize by Km_to_Station.
Finding the distance from each county to the nearest Amtrak station
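
The nearest-station calculation can be sketched in plain Python using great-circle (haversine) distance. This is an illustration of the idea rather than the Near tool itself: the station names and coordinates are hypothetical, and each county is assumed to be represented by a single latitude/longitude point.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/long points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_station(county_point, stations):
    """Return (station_name, distance_km) for the closest station."""
    lat, lon = county_point
    return min(
        ((name, haversine_km(lat, lon, s_lat, s_lon)) for name, (s_lat, s_lon) in stations.items()),
        key=lambda pair: pair[1],
    )


# Hypothetical stations and county point as (latitude, longitude).
stations = {"Station A": (40.0, -89.0), "Station B": (41.0, -90.0)}
county_point = (40.0, -90.0)

station_name, km_to_station = nearest_station(county_point, stations)
print(station_name, round(km_to_station, 1))
```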

Underserved Areas

Underserved counties that have high populations but distant service can be identified by multiplying population by distance.

Two groups of counties stand out on the list: moderate population counties (such as the counties in the Texas Rio Grande valley) that are far from an Amtrak station, and high population counties (such as Los Angeles or San Diego) that have a limited number of large stations.

  1. Add the Calculate Field tool.
  2. Duplicate the layer and symbolize by the Person_km attribute.
  3. Find a threshold value in the sorted Attribute Table and add a Definition Query to display only the top 20 underserved counties.
  4. Add labels with county names.
Identifying counties with high population underserved by Amtrak
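
The underserved-area ranking can be sketched as follows. The county names, populations, and distances are hypothetical, and the example keeps the top two counties rather than the top 20 to stay short.

```python
# Hypothetical county records; Km_to_Station and Person_km follow the tutorial's field names.
counties = [
    {"name": "County A", "population": 900_000, "Km_to_Station": 5.0},
    {"name": "County B", "population": 400_000, "Km_to_Station": 150.0},
    {"name": "County C", "population": 20_000, "Km_to_Station": 300.0},
]

# Equivalent of the Calculate Field step: population multiplied by distance.
for county in counties:
    county["Person_km"] = county["population"] * county["Km_to_Station"]

# Sort descending and find the threshold that keeps only the top N counties.
TOP_N = 2
ranked = sorted(counties, key=lambda c: c["Person_km"], reverse=True)
threshold = ranked[TOP_N - 1]["Person_km"]
underserved = [c["name"] for c in ranked if c["Person_km"] >= threshold]

print(threshold, underserved)
```

The threshold value plays the role of the number used in the Definition Query: only counties at or above it are displayed.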

Exploratory Regression

You can perform exploratory regression to find the county demographic characteristics most associated with proximity to Amtrak stations.

Exploratory regression is implemented in the Exploratory Regression tool:

For this set of variables, the strongest models have an R² of 0.15, indicating only a weak general association of any combination of variables with distance to the nearest Amtrak station.

Exploratory regression
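
To give a sense of what an R² value measures, the sketch below screens candidate variables one at a time against distance to the nearest station. This is a deliberate simplification: the Exploratory Regression tool evaluates combinations of multiple explanatory variables, while this example only computes single-variable R². The variable names and values are hypothetical.

```python
def r_squared(x, y):
    """R-squared of a one-variable linear regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)


# Hypothetical county values.
km_to_station = [10.0, 20.0, 30.0, 40.0, 50.0]
candidates = {
    "pct_rural": [12.0, 22.0, 32.0, 42.0, 52.0],
    "median_age": [40.0, 35.0, 42.0, 38.0, 41.0],
}

scores = {name: r_squared(values, km_to_station) for name, values in candidates.items()}
best = max(scores, key=scores.get)

for name, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: R2 = {score:.3f}")
```

An R² near 1 means the variable explains almost all of the variation in distance, while a value near 0 means it explains very little.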

Limitations of Weighted Proximity Analysis

Use of Euclidean distance assumes direct access to locations, while actual walking distance and effort is affected by street grids, physical obstacles (like roads, buildings, and private property), and topographical barriers (like creeks or steep hills).

Use of counties invokes the modifiable areal unit problem since distances calculated by the Near tool are based on the centroid of the county, and actual individual travel distances to stations may vary considerably within a county, especially if the county is large.

Figure
Amtrak analysis ModelBuilder diagram

Centrographics

Centrography is "statistical analyses concerned with centers of population, median centers, median points, and related methods" (Sviatlovsky and Eells 1939). Much like general statistical measures of central tendency, centrographic measures can be useful for summarizing large amounts of data, assessing change over time, and optimizing proximity.

Location Data: Major League Ballparks

This example will use the locations of major league baseball parks in the US.

  1. Download the CSV file.
  2. Under Analysis and Tools, open the XY Table to Point tool to import the ballpark data into a new feature class in the project geodatabase.
Loading the ballpark CSV file

The mean center is the point at the mean of all latitudes and mean of all longitudes for a group of features.

When working with points of interest, the mean center represents a central location that minimizes the sum of squared Euclidean distances to the points of interest.

One use for a mean center is to find a central location that would minimize travel to a centralized location from a group of geographically dispersed cities. If baseball general managers were planning on having a group meeting, or if a championship game were planned at a neutral site, the mean center might be a good choice.

The Mean Center tool creates a new feature class with one point at the mean center.

The optional weighting parameter can be used to specify a field that indicates which locations are more important than others.
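
The calculation described above can be sketched in a few lines of Python. This is an illustration of the arithmetic, not the Mean Center tool's implementation; the coordinates and weights are hypothetical, and points are given as (longitude, latitude) pairs.

```python
def mean_center(points, weights=None):
    """Return the (x, y) mean center of points, optionally weighted."""
    if weights is None:
        weights = [1.0] * len(points)
    total = sum(weights)
    x = sum(w * p[0] for p, w in zip(points, weights)) / total
    y = sum(w * p[1] for p, w in zip(points, weights)) / total
    return x, y


# Hypothetical locations as (longitude, latitude).
points = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]

unweighted = mean_center(points)
# Weights might represent something like attendance or capacity (assumed here).
weighted = mean_center(points, weights=[3.0, 1.0, 0.0, 0.0])

print(unweighted, weighted)
```

A weight of zero removes a point's influence entirely, while larger weights pull the center toward the corresponding location.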

The mean center of US major league ballparks is in Missouri between Kansas City and St. Louis at lat/long 39.18905, -92.63477. This is slightly north of the 2020 census center of population in south central Missouri at 37.415725, -92.346525 (USCB 2021).

Because any meeting of any size would presumably need to be in a major city with copious lodging and a major airport, St. Louis or Kansas City would be suitable locations.

Mean center

Limitations

The major caveat with mean centers is that they may not represent optimal travel distance, since transportation networks or physical obstacles may impose indirect routing and access.