Proximity Analysis in ArcGIS Pro

Rev. 1 February 2025

Proximity analysis is the extraction of information from geospatial data based on distance between features. Proximity analysis is especially useful for evaluating two characteristics:

Influence (core to periphery): Projection of materials or power from central locations outward, such as political control, chemical emissions, or habitat extent
Accessibility (periphery to core): Ability to reach desirable destinations like transit hubs or green space

Proximity analysis assumes distance decay, where the power of the relationship between locations decreases as distance between locations increases. Distance decay is a fundamental assumption behind gravity models.

With desirable characteristics (like appeal to tourists), distance decay means that distant locations are less desirable than closer locations (closer locations require less travel effort).
With undesirable characteristics (like toxic emissions), this means that distant locations are more desirable than closer locations (pollution disperses over space and is less intense the further you are from the source).
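As a minimal sketch of distance decay, assume the inverse-power form often used in gravity models, where interaction falls off as distance is raised to an exponent. The function name, the exponent of 2, and the sample values are illustrative assumptions rather than values from this tutorial.

```python
def gravity_interaction(mass_a, mass_b, distance_km, exponent=2.0):
    """Gravity-model style interaction: larger masses interact more, distance reduces it."""
    if distance_km <= 0:
        raise ValueError("distance must be positive")
    return (mass_a * mass_b) / (distance_km ** exponent)

# Hypothetical masses (for example, populations in thousands) at increasing distances.
for km in (10, 20, 40):
    print(km, gravity_interaction(100, 50, km))
```

With an exponent of 2, each doubling of distance cuts the modeled interaction to one quarter.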

Three methods involving proximity are covered in this tutorial.

Categorical proximity (buffer and overlay)
Weighted proximity (hub and spoke)
Centrographics

Categorical Proximity

Categorical proximity analysis divides areas into two classes (adjacent and non-adjacent) based on proximity to specific locations.

This example will demonstrate categorical proximity analysis between petroleum refineries and census tracts to answer the question, "How are neighborhoods adjacent to refineries demographically different from neighborhoods that are not adjacent to refineries?"

Location Data: Refineries

Proximity analysis can be performed on two sets of features, either of which can be points, lines, or polygons.

For this example, we will analyze proximity between points and areas.

Petroleum refineries are large facilities that transform crude oil pumped from the ground into gasoline and a wide variety of other products that are essential to contemporary life in the United States.

Exxon Mobil Refinery, Baton Rouge, Louisiana (Wikipedia 2017)

The example point locations will be petroleum refinery locations from the Energy Information Administration (EIA) U.S. Energy Atlas.

Give the initial map a meaningful name (Refineries Map).
Under Analysis and Tools, use the Export Features tool to copy the data from the EIA website into a new feature class in the project geodatabase.
Input Features: Copy the GeoServices link from the EIA website. Remove the ending "query?" and everything after it.
Output Feature Class: Provide a meaningful name (Refineries).
Under Filters, select the features based on the State field for the state you wish to analyze (Illinois).
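The link trimming described in the Input Features step can be sketched in plain Python. The URL below is a made-up placeholder, not the real EIA link, and the helper name is hypothetical. Dropping the trailing slash is an extra assumption beyond what the step asks for.

```python
def service_layer_url(geoservice_link):
    """Return the link with 'query?' and everything after it removed."""
    head, sep, _ = geoservice_link.partition("query?")
    # If 'query?' is absent, return the link unchanged.
    return head.rstrip("/") if sep else geoservice_link

# Hypothetical link for illustration only.
link = ("https://example.com/arcgis/rest/services/Refineries/"
        "FeatureServer/0/query?outFields=*&where=1%3D1")
print(service_layer_url(link))
```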

Importing refinery data into the project geodatabase

Area Data: Census Tracts

The areas we use for this example will be census tracts in Illinois with demographic data.

The American Community Survey (ACS) is an ongoing survey by the US Census Bureau that provides information about people in the United States beyond the basic information collected in the decennial census. The ACS is commonly used by a wide variety of researchers when they need information about the general public.

Census tracts are subdivisions of counties that are drawn by the US Census Bureau based on clearly identifiable features to ideally contain around 4,000 residents, although in practice the range of population is usually between 1,200 and 8,000 (USCB 2019). The census tract is the smallest reliable area of aggregation in data made public by the US Census Bureau.

Under Analysis and Tools, use the Export Features tool to copy the data from census tract feature service into a new feature class in the project geodatabase.
Input Features: Search in ArcGIS Online for the Minn 2015-2019 ACS Tracts feature layer.
Output Feature Class: Provide a meaningful name (Tracts).
Under Filters, select the features based on the ST field for the state you wish to analyze (IL).

Exporting census tract data into the project geodatabase

ModelBuilder

While you can run the analysis tools manually, it can be helpful to use a ModelBuilder diagram that will save you from repeated typing as you debug your workflow.

Use of ModelBuilder also preserves your workflow so that you can update and reproduce your analysis in the future.

Note that you will probably wish to run the Export Features steps manually rather than in your ModelBuilder diagram. Saving project packages with references to feature services can be very slow because the packaging process downloads all data from the feature service to incorporate in the package.

Creating and renaming a blank ModelBuilder diagram

Buffer and Overlay

One common technique for categorical proximity analysis is buffer and overlay, where buffer polygons (usually circles) of a fixed size or radius are placed around locations and those buffers are overlaid over another set of areas to find areas that are adjacent to the buffered locations.

Oil refineries emit a wide variety of toxic chemicals (UCAR 2023). Accordingly, we might hypothesize that the areas around oil refineries would be considered undesirable places to live, reducing the property values and rents in those areas and attracting marginalized populations without the economic means to live in more desirable locations.

To represent the presumed undesirable area around refineries, we will create ten-mile (approximately 16,000-meter) buffers corresponding to research finding increased cancer risk in the ten miles around refineries (Williams et al. 2020).

Add the Buffer tool to create the buffers.

Input Features: Refineries
Output Feature Class: Refinery_Buffers
Distance: 16000 Meters

Add the Summarize Within tool to count the number of refinery buffers that intersect each tract as an estimate of exposure to emissions.

Input Polygons: Tracts
Input Summary Features: Refinery_Buffers
Output Feature Class: Refinery_Tracts
Check Keep All Input Polygons so that tracts that intersect no refinery buffers are retained for mapping.
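The logic of buffer and overlay can be sketched outside ArcGIS Pro with a simplified point test. This sketch assumes each tract is reduced to a single representative point and uses great-circle distance, whereas the tools above intersect buffer polygons with full tract polygons. All coordinates and names are hypothetical.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius, a common approximation

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two latitude/longitude points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def buffers_covering(point, sites, radius_m=16000):
    """Count sites whose circular buffer of radius_m contains the point."""
    return sum(1 for s in sites if haversine_m(*point, *s) <= radius_m)

# Hypothetical coordinates, not real refinery or tract locations.
refineries = [(41.00, -88.00), (41.05, -88.05)]
tract_points = {"A": (41.02, -88.02), "B": (42.00, -89.00)}
for name, pt in tract_points.items():
    print(name, buffers_covering(pt, refineries))
```

A count above zero corresponds to a tract that would be classified as adjacent to a refinery.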

Add the Calculate Field tool to create a categorical field that classifies each tract as refinery or non-refinery.

Input Table: Refinery_Tracts
Field Name: Refinery_Adjacent
Field Type: Text
Expression: reclass(!Polygon_Count!)
Code Block:

def reclass(Count):
	# Treat a missing (null) count the same as zero intersecting buffers
	if Count is not None and Count > 0:
		return "Refinery"
	else:
		return "Non-Refinery"

Change the symbology to use the adjacency field.

Primary symbology: Unique Values
Field: Refinery_Adjacent
Reset colors

Creating and joining ten-mile refinery buffers with census tracts

Group Comparison Table

We can create a table that compares summarized demographic values between the two groups defined by the Refinery_Adjacent field.

You can then review the table to find demographic characteristics that are noticeably different between the refinery and non-refinery tracts.

Add the Summarize Attributes tool.
Input Layer: Refinery_Tracts
Output Name: Refinery_Summary
Fields: Refinery_Adjacent
Summary Fields: Select quantitative demographics and show the median.
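The grouped median that this step produces can be sketched in plain Python. The Median_Income field and all values below are hypothetical, and skipping missing values is an assumption about how nulls should be handled.

```python
from collections import defaultdict
from statistics import median

def median_by_group(rows, group_field, value_field):
    """Return the median of value_field for each distinct value of group_field."""
    groups = defaultdict(list)
    for row in rows:
        value = row[value_field]
        if value is not None:  # skip missing values
            groups[row[group_field]].append(value)
    return {group: median(values) for group, values in groups.items()}

# Hypothetical tract records.
tracts = [
    {"Refinery_Adjacent": "Refinery", "Median_Income": 48000},
    {"Refinery_Adjacent": "Refinery", "Median_Income": 52000},
    {"Refinery_Adjacent": "Refinery", "Median_Income": 61000},
    {"Refinery_Adjacent": "Non-Refinery", "Median_Income": 55000},
    {"Refinery_Adjacent": "Non-Refinery", "Median_Income": None},
    {"Refinery_Adjacent": "Non-Refinery", "Median_Income": 70000},
]
print(median_by_group(tracts, "Refinery_Adjacent", "Median_Income"))
```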

Summarizing demographics between refinery and non-refinery tracts

Group Comparison Box Plot

We can also use box plots to compare distributions of attribute values between groups. For this example we use a box plot to compare percent foreign born between refinery and non-refinery tracts.

Although the difference between the medians of the two groups indicates some general association of higher percentages of foreign-born residents with proximity to refineries, the overlap between ranges indicates that the association is weak and not predictive.

In the Contents pane, right click on the Refinery_Tracts layer.
Select Create Chart and then Box Plot.

Numeric field(s): Percent Foreign Born
Categories: Refinery_Adjacent
Check Show outliers
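The numbers behind a box plot can be sketched as follows. This assumes the common convention of flagging outliers beyond 1.5 times the interquartile range and uses Python's "inclusive" quartile method, which may not match the exact method ArcGIS Pro uses. The sample values are hypothetical.

```python
from statistics import quantiles

def box_stats(values):
    """Return quartiles and the values outside the 1.5 * IQR fences."""
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [v for v in values if v < low or v > high]
    return {"q1": q1, "median": q2, "q3": q3, "outliers": outliers}

def boxes_overlap(a, b):
    """True if the interquartile boxes of two groups overlap."""
    return a["q1"] <= b["q3"] and b["q1"] <= a["q3"]

# Hypothetical percent foreign born values for each group of tracts.
refinery = [4, 6, 8, 10, 12, 14, 40]
non_refinery = [2, 3, 5, 7, 9, 11, 13]
print(box_stats(refinery))
print(box_stats(non_refinery))
print(boxes_overlap(box_stats(refinery), box_stats(non_refinery)))
```

Overlapping boxes, as in this made-up sample, are the pattern that suggests a difference in medians is not predictive for individual tracts.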

Comparing percent foreign-born between refinery and non-refinery tracts

Limitations of Buffer and Overlay

Simple circular buffers assume that influence from a point, or accessibility to it, is the same in every direction around the point.

In the case of toxic emissions, a transport model is needed to represent wind and water flow patterns and how they carry the toxics. These models can be quite complex (Beyea and Hatch 1999).

Weighted Proximity Analysis

Weighted proximity analysis uses distance between features to quantify proximity. These continuous distance values (weights) can then be used with techniques like multiple regression.

This example will demonstrate weighted proximity analysis between Amtrak stations and counties to answer the questions:

Which counties have the greatest and least access to Amtrak service?
What county demographic characteristics are associated with closer Amtrak service?

Hub and spoke analysis is a weighted proximity analysis technique that analyzes accessibility to a small number of centralized hubs where distance from peripheral points to the closest hub is calculated with imaginary spoke lines. Hub-and-spoke is useful for estimating or optimizing accessibility for hubs like distribution centers or transit stops.

Location Data: Amtrak Stations

The destination points used for these examples will be Amtrak stations from the US Bureau of Transportation Statistics Geospatial at BTS Open Data Catalog.

Amtrak is a quasi-public US passenger railroad corporation that was formed by the US federal government in 1971 to take over US intercity passenger service when private railroads were facing economic crisis and were eager to discontinue unprofitable passenger operations (Minn 2016).

Unlike transit, which has utility only when within a convenient walking or driving distance, Amtrak carries a significant number of leisure riders for whom the value is the journey as much as or more than the destination (Losada-Rojas et al. 2019). This hub and spoke analysis assumes that some Amtrak riders may be willing to drive considerable distance to access service, although longer distances would make the trip less desirable (distance decay).

Boarding Amtrak's *Crescent* in Birmingham, Alabama (Minn 2007)

As above, you could use the feature service for this data directly, but copying it into a new feature class will likely improve speed and reliability.

Insert a new Map and give it a meaningful name (Amtrak Map)
Go to the National Transportation Atlas Database and search for Amtrak Stations.

Click I want to use this, View API Resources, and copy the link from the GeoService. Remove the "query?" and all subsequent text from the link.

Under Analysis and Tools, open the Export Features tool to copy the station information from the feature service into the project geodatabase.

Input Features: In ArcGIS Online, find the Amtrak Rail Stations feature layer from the Federal User Community organization.
Output Feature Class: Stations
The tool should read the features into the new feature class in a few seconds.

Importing Amtrak stations

Area Data: Counties

Counties in the United States are geographical divisions of states for government administration.

Under Analysis and Tools, open the Export Features tool.
Input Features: Search in ArcGIS Online for the Minn 2015-2019 ACS Counties layer.
Output Feature Class: Counties
Add Filters to exclude Alaska and Hawaii, which have no Amtrak service.

Importing county data

Proximity Calculation

Add the Near tool to find the distance between each county and the nearest Amtrak station.

Input Features: Counties
Near Features: Stations
Search Radius: 1000 Kilometers
Field Names Property: Distance
Field Name: Km_to_Station
Distance Unit: Kilometers
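The nearest-hub search can be sketched in plain Python. This sketch assumes each county is reduced to one representative point and uses great-circle distance, which is a simplification of how the Near tool measures distance from polygon features. Station names and coordinates are hypothetical.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in kilometers between two latitude/longitude points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))

def nearest(origin, stations):
    """Return (station_name, distance_km) for the station closest to origin."""
    pairs = ((name, haversine_km(*origin, *loc)) for name, loc in stations.items())
    return min(pairs, key=lambda pair: pair[1])

# Hypothetical stations and county points.
stations = {"North": (41.0, -88.0), "South": (38.0, -89.0)}
county_points = {"County X": (40.5, -88.2), "County Y": (38.2, -89.1)}
for county, pt in county_points.items():
    name, km = nearest(pt, stations)
    print(county, name, round(km, 1))
```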

Update the symbology to symbolize by Km_to_Station.

Finding the distance from each county to the nearest Amtrak station

Underserved Areas

Underserved counties that have high populations but distant service can be identified by multiplying population by distance.

Two groups of counties stand out on the list: moderate population counties (such as the counties in the Texas Rio Grande valley) that are far from an Amtrak station, and high population counties (such as Los Angeles or San Diego) that have a limited number of large stations.

Add the Calculate Field tool.

Input Table: Counties
Field Name: Person_Km
Field Type: Double
Expression: !Total_Population! * !Km_to_Station!

Duplicate the layer and symbolize by the Person_Km attribute.
Find a threshold value in the sorted Attribute Table and add a Definition Query to display only the top 20 underserved counties.
Add labels with county names.
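The person-kilometer ranking can be sketched in plain Python. The dictionary keys mirror the field names used in this section, and the county records are hypothetical.

```python
def top_underserved(counties, n=20):
    """Rank counties by population times distance (person-kilometers), highest first."""
    scored = [
        dict(c, Person_Km=c["Total_Population"] * c["Km_to_Station"])
        for c in counties
    ]
    return sorted(scored, key=lambda c: c["Person_Km"], reverse=True)[:n]

# Hypothetical county records.
counties = [
    {"Name": "A", "Total_Population": 500000, "Km_to_Station": 5.0},
    {"Name": "B", "Total_Population": 80000, "Km_to_Station": 200.0},
    {"Name": "C", "Total_Population": 2000, "Km_to_Station": 300.0},
]
for c in top_underserved(counties, n=2):
    print(c["Name"], c["Person_Km"])
```

In this made-up sample, the moderately populated but distant county B outranks the large county A that has a station nearby.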

Identifying counties with high population underserved by Amtrak

Exploratory Regression

Exploratory regression is an automated approach to regression analysis which involves trying all possible combinations of predictor variables to find the best models.

Exploratory regression violates the philosophy behind the (deductive) classical scientific method where you begin with a hypothesis and then use your models to test your hypothesis.

However, in situations where the underlying processes are not well understood, inductive analysis with tools like exploratory regression can be useful for giving new insights that inform the development of hypotheses that can then be tested on other data sets (ESRI 2023).

You can perform exploratory regression to find the county demographic characteristics most associated with proximity to Amtrak stations.

Exploratory regression is implemented in the Exploratory Regression tool:

Input Features: Counties
Dependent Variable: Km_to_Station
Candidate Explanatory Variables: Choose derived demographic amount variables (as opposed to count variables like population) that you hypothesize may be associated with distance to Amtrak stations.
Report File: Give a name for your file at a specific path (Exploratory.txt).
This tool may take a few minutes to run depending on the size of your data set and the number of independent variables you are exploring.
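The core idea of trying every combination of candidate variables can be sketched in plain Python. This is a simplified illustration that ranks ordinary least squares models by plain R² only. The Exploratory Regression tool reports more diagnostics than this, and the field names and values below are hypothetical. The solver assumes the candidate variables are not perfectly collinear.

```python
from itertools import combinations

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def r_squared(rows, y_field, x_fields):
    """Fit least squares with an intercept via the normal equations and return R-squared."""
    xs = [[1.0] + [row[f] for f in x_fields] for row in rows]
    ys = [row[y_field] for row in rows]
    k = len(xs[0])
    xtx = [[sum(x[i] * x[j] for x in xs) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * y for x, y in zip(xs, ys)) for i in range(k)]
    beta = solve(xtx, xty)
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - sum(b * v for b, v in zip(beta, x))) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot

def explore(rows, y_field, candidates, max_vars=2):
    """Return (R-squared, variables) for every combination, best first."""
    results = []
    for size in range(1, max_vars + 1):
        for combo in combinations(candidates, size):
            results.append((r_squared(rows, y_field, combo), combo))
    return sorted(results, reverse=True)

# Hypothetical county records.
rural = [10, 20, 30, 40, 50, 60]
age = [35, 41, 33, 44, 38, 40]
km = [22, 39, 61, 80, 99, 121]
rows = [{"Pct_Rural": r, "Median_Age": a, "Km_to_Station": d}
        for r, a, d in zip(rural, age, km)]
for score, combo in explore(rows, "Km_to_Station", ["Pct_Rural", "Median_Age"]):
    print(round(score, 3), combo)
```

Plain R² never decreases when variables are added, so larger models always rank at or near the top in this sketch.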

For this set of variables, the strongest models have an R² of 0.15, indicating only a weak general association of any combination of variables with distance to the nearest Amtrak station.

Exploratory regression

Limitations of Weighted Proximity Analysis

Use of Euclidean distance assumes direct access to locations, while actual travel distance and effort are affected by street grids, physical obstacles (like roads, buildings, and private property), and topographical barriers (like creeks or steep hills).

Use of counties invokes the modifiable areal unit problem. The Near tool assigns each county a single distance, measured from the closest part of the county polygon (zero if a station lies inside the county), while actual individual travel distances to stations may vary considerably within a county, especially if the county is large.

Centrographics

Centrography is "statistical analyses concerned with centers of population, median centers, median points, and related methods" (Sviatlovsky and Eells 1939). Much like general statistical measures of central tendency, centrographic measures can be useful for summarizing large amounts of data, assessing change over time, and optimizing proximity.

Location Data: Major League Ballparks

This example will use the locations of major league baseball parks in the US.

Download the CSV file.
Under Analysis and Tools, open the XY Table to Point tool to import the ballpark data into a new feature class in the project geodatabase.

Input Table: Find the CSV in your local storage
Output Feature Class: Ballparks
X Field: Longitude
Y Field: Latitude

Loading the ballpark CSV file

The mean center is the point at the mean of all latitudes and mean of all longitudes for a group of features.

When working with points of interest, the mean center is the location that minimizes the sum of squared Euclidean distances to the points of interest, which makes it a reasonable approximation of a central location.

One use for a mean center is to find a central location that would minimize travel to a centralized location from a group of geographically dispersed cities. If baseball general managers were planning on having a group meeting, or if a championship game were planned at a neutral site, the mean center might be a good choice.

The Mean Center tool creates a new feature class with one point at the mean center.

The optional weighting parameter can be used to specify a field that indicates which locations are more important than others.
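The mean center calculation, with and without weights, can be sketched in plain Python. This sketch averages longitude and latitude values directly, which is a planar simplification, and the coordinates and weights are hypothetical rather than real ballpark data.

```python
def mean_center(points, weights=None):
    """Return the (optionally weighted) mean of the x and y coordinates."""
    if weights is None:
        weights = [1.0] * len(points)
    total = sum(weights)
    x = sum(w * p[0] for p, w in zip(points, weights)) / total
    y = sum(w * p[1] for p, w in zip(points, weights)) / total
    return x, y

# Hypothetical (longitude, latitude) pairs.
parks = [(-90.0, 38.0), (-94.0, 39.0), (-87.0, 42.0), (-118.0, 34.0)]
print(mean_center(parks))
# Weighting the last point three times as heavily pulls the center toward it.
print(mean_center(parks, weights=[1, 1, 1, 3]))
```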

The mean center of US major league ballparks is in Missouri between Kansas City and St. Louis at lat/long 39.18905, -92.63477. This is slightly north of the 2020 census center of population in south central Missouri at 37.415725, -92.346525 (USCB 2021).

Because a meeting of any size would presumably need to be in a major city with copious lodging and a major airport, St. Louis or Kansas City would be suitable locations.

Mean center

Limitations

The major caveat with mean centers is that they may not represent optimal travel distance since transportation networks or physical obstacles may impose indirect routing and access.