Proximity Analysis in ArcGIS Pro
Rev. 14 October 2024
Proximity analysis is the extraction of information from geospatial data based on distance between features. Proximity analysis is especially useful for evaluating two characteristics:
- Influence (core to periphery): Projection of materials or power from central locations outward, such as political control, chemical emissions, or habitat extent
- Accessibility (periphery to core): Ability to reach desirable destinations like transit hubs or green space
Proximity analysis assumes distance decay, where the power of the relationship between locations decreases as distance between locations increases. Distance decay is a fundamental assumption behind gravity models.
- With desirable characteristics (like appeal to tourists), distance decay means that distant locations are less desirable than closer locations (closer locations require less travel effort).
- With undesirable characteristics (like toxic emissions), this means that distant locations are more desirable than closer locations (pollution disperses over space and is less intense the further you are from the source).
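Distance decay can be illustrated with a small sketch. The inverse-power form and the exponent of 2 used below are assumptions chosen for illustration (a common gravity-model convention), not values specified by this tutorial.

```python
def decay_weight(distance_km, beta=2.0, min_distance_km=1.0):
    """Relative strength of a relationship at a given distance.

    Inverse-power form: weight = 1 / distance ** beta.
    beta and min_distance_km are illustrative assumptions; the floor on
    distance avoids division by zero for very close locations.
    """
    d = max(distance_km, min_distance_km)
    return 1.0 / d ** beta


# Weights shrink as distance grows
weights = [decay_weight(d) for d in (1, 5, 10, 50)]
```

Whether a lower weight is good or bad depends on the characteristic: for a tourist attraction it means less appeal, while for a pollution source it means less exposure.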
Three methods involving proximity are covered in this tutorial.
Categorical Proximity
Categorical proximity analysis divides areas into two classes (adjacent and non-adjacent) based on proximity to specific locations.
This example will demonstrate categorical proximity analysis between petroleum refineries and census tracts to answer the question, "How are neighborhoods adjacent to refineries demographically different from neighborhoods that are not adjacent to refineries?"
Location Data: Refineries
Proximity analysis can be performed on two sets of features, either of which can be points, lines, or polygons.
For this example, we will analyze proximity between points and areas.
Petroleum refineries are large facilities that transform crude oil pumped from the ground into gasoline and a wide variety of other products that are essential to contemporary life in the United States.
The example point locations will be petroleum refinery locations from the Energy Information Administration (EIA) U.S. Energy Atlas.
- Give the initial map a meaningful name (Refineries Map).
- Under Analysis and Tools, use the Export Features tool to copy the data from the EIA website into a new feature class in the project geodatabase.
- Input Features: Copy the GeoServices link from the EIA website. Remove the ending "query?" and everything after it.
- Output Feature Class: Provide a meaningful name (Refineries).
- Under Filter select the features based on the State field for the state you wish to analyze (Illinois).
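The link trimming described in the Input Features step can be sketched in a few lines of Python. The URL below is a made-up placeholder, not the actual EIA service address.

```python
def service_layer_url(query_url):
    """Return the part of a GeoServices link that precedes 'query?'."""
    marker = "query?"
    index = query_url.find(marker)
    if index == -1:
        return query_url  # no query suffix, nothing to trim
    return query_url[:index]


# Hypothetical link for illustration only
example = ("https://example.com/arcgis/rest/services/Refineries/"
           "FeatureServer/0/query?where=1%3D1&outFields=*&f=json")
layer_url = service_layer_url(example)
```

The trimmed link points at the layer itself rather than at a single query result, which is the form the Input Features parameter expects in the step above.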
Area Data: Census Tracts
The areas we use for this example will be census tracts in Illinois with demographic data.
The American Community Survey (ACS) is an ongoing survey by the US Census Bureau that provides information about people in the United States beyond the basic information collected in the decennial census. The ACS is commonly used by a wide variety of researchers when they need information about the general public.
Census tracts are subdivisions of counties that are drawn by the US Census Bureau based on clearly identifiable features to ideally contain around 4,000 residents, although in practice the range of population is usually between 1,200 and 8,000 (USCB 2019). The census tract is the smallest reliable area of aggregation in data made public by the US Census Bureau.
- Under Analysis and Tools, use the Export Features tool to copy the data from census tract feature service into a new feature class in the project geodatabase.
- Input Features: Search in ArcGIS Online for the Minn 2015-2019 ACS Tracts feature layer.
- Output Feature Class: Provide a meaningful name (Tracts).
- Under Filters select the features based on the ST field for the state you wish to analyze (IL).
ModelBuilder
While you can run the analysis tools manually, it can be helpful to use a ModelBuilder diagram that will save you from repeated typing as you debug your workflow.
Use of ModelBuilder also preserves your workflow so that you can update and reproduce your analysis in the future.
Note that you will probably wish to run the Export Features steps manually rather than in your ModelBuilder diagram because saving project packages with references to feature services can be very slow while the packaging process downloads all data from the feature service to incorporate in the package.
Buffer and Overlay
One common technique for categorical proximity analysis is buffer and overlay, where buffer polygons (usually circles) of a fixed radius are placed around locations and those buffers are overlaid on another set of areas to find the areas that are adjacent to the buffered locations.
Oil refineries emit a wide variety of toxic chemicals (UCAR 2023). Accordingly, we might hypothesize that the areas around oil refineries would be considered undesirable places to live, reducing the property values and rents in those areas and attracting marginalized populations without the economic means to live in more desirable locations.
To represent the presumed undesirable area around refineries, we will create ten mile (16,000 meter) buffers corresponding to research finding increased cancer risk in the ten miles around refineries (Williams et al. 2020).
- Add the Buffer tool to create the buffers.
- Input Features: Refineries
- Output Feature Class: Refinery_Buffers
- Distance: 16000 Meters
- Add the Summarize Within tool to count the number of refinery buffers that intersect each tract as an estimate of exposure to emissions.
- Input Polygons: Tracts
- Input Summary Features: Refinery_Buffers
- Output Feature Class: Refinery_Tracts
- Check Keep All Input Polygons so that tracts with no refineries are retained for mapping.
- Add the Calculate Field tool to create a categorical field that classifies each tract as refinery or non-refinery.
- Input Table: Refinery_Tracts
- Field Name: Refinery_Adjacent
- Field Type: Text
- Expression: reclass(!Polygon_Count!)
- Code Block:
    def reclass(Count):
        if Count > 0:
            return "Refinery"
        else:
            return "Non-Refinery"
- Change the Symbology to the adjacency field
- Primary symbology: Unique Values
- Field: Refinery_Adjacent
- Reset colors
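The buffer, count, and reclassify logic above can be sketched outside ArcGIS Pro in plain Python. This is a simplified illustration, not what the tools do internally: each tract is reduced to a single point (an assumption), distances are great-circle distances, and all coordinates are made up.

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean Earth radius in meters
BUFFER_M = 16000            # buffer distance used in this tutorial


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat/long points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def reclass(count):
    """Same rule as the Calculate Field code block."""
    return "Refinery" if count > 0 else "Non-Refinery"


def classify_tracts(tracts, refineries, buffer_m=BUFFER_M):
    """Map tract id -> class by counting refineries within buffer_m."""
    result = {}
    for tract_id, (lat, lon) in tracts.items():
        count = sum(
            1 for rlat, rlon in refineries
            if haversine_m(lat, lon, rlat, rlon) <= buffer_m
        )
        result[tract_id] = reclass(count)
    return result


# Hypothetical coordinates for illustration only
refineries = [(41.65, -88.05)]
tracts = {"near": (41.70, -88.00), "far": (40.00, -89.50)}
classes = classify_tracts(tracts, refineries)
```

Testing whether a tract point falls within 16,000 meters of a refinery is equivalent to testing whether it falls inside that refinery's circular buffer, which is why no buffer geometry is built here.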
Group Comparison Table
We can create a table showing the differences between summarized demographic values based on the Refinery_Adjacent field.
You can then review the table to find demographic characteristics that are significantly different between the refinery and non-refinery tracts.
- Add the Summarize Attributes tool.
- Input Layer: Refinery_Tracts
- Output Name: Refinery_Summary
- Fields: Refinery_Adjacent
- Summary Fields: Select quantitative demographics and show the median.
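The grouping performed by Summarize Attributes can be sketched as a median per group. The field name Median_Income and the values below are hypothetical stand-ins for whichever demographic fields you select.

```python
from collections import defaultdict
from statistics import median


def median_by_group(rows, group_field, value_field):
    """Return {group: median of value_field}, skipping missing values."""
    groups = defaultdict(list)
    for row in rows:
        value = row[value_field]
        if value is not None:
            groups[row[group_field]].append(value)
    return {name: median(values) for name, values in groups.items()}


# Made-up rows for illustration only
rows = [
    {"Refinery_Adjacent": "Refinery", "Median_Income": 52000},
    {"Refinery_Adjacent": "Refinery", "Median_Income": 61000},
    {"Refinery_Adjacent": "Refinery", "Median_Income": 58000},
    {"Refinery_Adjacent": "Non-Refinery", "Median_Income": 70000},
    {"Refinery_Adjacent": "Non-Refinery", "Median_Income": 64000},
    {"Refinery_Adjacent": "Non-Refinery", "Median_Income": None},
]
summary = median_by_group(rows, "Refinery_Adjacent", "Median_Income")
```

Medians are less sensitive than means to a few extreme tracts, which is one reason to prefer them when comparing groups of very different sizes.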
Group Comparison Box Plot
We can also use box plots to compare distributions of attribute values between groups. For this example we use a box plot to compare percent foreign born between refinery and non-refinery tracts.
Although the difference between the medians of the two groups indicates some general association of higher percentages of foreign born residents with proximity to refineries, the overlap between ranges indicates that the association is vague and not predictive.
- In the Contents pane, right click on the Refinery_Tracts layer.
- Select Create Chart and Box Plot
- Numeric field(s): Percent Foreign Born
- Categories: Refinery_Adjacent
- Check Show outliers
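The numbers behind a box plot, and the overlap between two groups' boxes, can be sketched as follows. The values are made up, and the quartile method used here (Python's "inclusive" method) is an assumption that may differ slightly from how ArcGIS Pro computes quartiles.

```python
from statistics import quantiles


def box_stats(values):
    """Quartiles and interquartile range, the core of a box plot."""
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return {"q1": q1, "median": q2, "q3": q3, "iqr": q3 - q1}


def boxes_overlap(a, b):
    """True if the interquartile boxes of two groups overlap."""
    return a["q1"] <= b["q3"] and b["q1"] <= a["q3"]


# Made-up percent foreign born values for illustration only
refinery = box_stats([4.0, 6.0, 8.0, 10.0, 12.0])
non_refinery = box_stats([2.0, 4.0, 6.0, 8.0, 10.0])
overlap = boxes_overlap(refinery, non_refinery)
```

A higher median in one group combined with overlapping boxes is the pattern described above: a general tendency that is too weak to predict the value for any individual tract.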
Limitations of Buffer and Overlay
Simple circular buffers assume that influence from, or accessibility to, a point is equal in all directions around the point.
In the case of dispersion of toxic emissions, a transport model is needed to represent wind and water flow patterns and how they carry the toxics. These models can be quite complex (Beyea and Hatch 1999).
Weighted Proximity Analysis
Weighted proximity analysis uses distance between features to quantify proximity. These continuous distance values (weights) can then be used with techniques like multiple regression.
This example will demonstrate weighted proximity analysis between Amtrak stations and counties to answer the questions:
- Which counties have the greatest and least access to Amtrak service?
- What county demographic characteristics are associated with closer Amtrak service?
Hub and spoke analysis is a weighted proximity analysis technique that analyzes accessibility to a small number of centralized hubs where distance from peripheral points to the closest hub is calculated with imaginary spoke lines. Hub-and-spoke is useful for estimating or optimizing accessibility for hubs like distribution centers or transit stops.
Location Data: Amtrak Stations
The destination points used for these examples will be Amtrak stations from the Geospatial at BTS Open Data Catalog of the US Bureau of Transportation Statistics (BTS).
Amtrak is a quasi-public US passenger railroad corporation that was formed by the US federal government in 1971 to take over US intercity passenger service when private railroads were facing economic crisis and were eager to discontinue unprofitable passenger operations (Minn 2016).
Unlike transit, which has utility only when within a convenient walking or driving distance, Amtrak carries a significant number of leisure riders for whom the journey is valued as much as or more than the destination (Losada-Rojas et al. 2019). This hub and spoke analysis assumes that some Amtrak riders may be willing to drive considerable distance to access service, although longer distances would make the trip less desirable (distance decay).
As above, you could use the feature service for this data directly, but copying it into a new feature class will likely improve speed and reliability.
- Insert a new Map and give it a meaningful name (Amtrak Map)
- Go to the National Transportation Atlas Database and search for Amtrak Stations.
- Click I want to use this, View API Resources, and copy the link from the GeoService. Remove the "query?" and all subsequent text from the link.
- Under Analysis and Tools, open the Export Features tool to copy the station information from the feature service into the project geodatabase.
- Input Features: In ArcGIS Online, find the Amtrak Rail Stations feature layer from the Federal User Community organization.
- Output Feature Class: Stations
- The tool should read the features into the new feature class in a few seconds.
Area Data: Counties
Counties in the United States are geographical divisions of states for government administration.
- Under Analysis and Tools, open the Export Features tool.
- Input Features: Search in ArcGIS Online for the Minn 2015-2019 ACS Counties layer.
- Output Feature Class: Counties
- Add Filters to exclude Alaska and Hawaii, which have no Amtrak service.
Proximity Calculation
- Add the Near tool to find the distance between each county and the nearest Amtrak station.
- Input Features: Counties
- Near Features: Stations
- Search Radius: 1000 Kilometers
- Field Names Property: Distance
- Field Name: Km_to_Station
- Distance Unit: Kilometers
- Update the symbology to symbolize by Km_to_Station.
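The nearest-hub search at the heart of hub and spoke analysis can be sketched in plain Python. This is a simplified illustration rather than the Near tool's implementation: each county is reduced to a single representative point (an assumption), distances are great-circle distances, and the station coordinates are approximate.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0088):
    """Great-circle distance in kilometers between two lat/long points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


def nearest_hub(point, hubs, search_radius_km=1000):
    """Return (hub name, km) for the closest hub within the search radius.

    Returns (None, None) when no hub is within search_radius_km.
    """
    best_name, best_km = None, None
    for name, (lat, lon) in hubs.items():
        km = haversine_km(point[0], point[1], lat, lon)
        if km <= search_radius_km and (best_km is None or km < best_km):
            best_name, best_km = name, km
    return best_name, best_km


# Approximate coordinates for illustration only
hubs = {"Chicago": (41.88, -87.64), "St. Louis": (38.62, -90.20)}
name, km = nearest_hub((40.11, -88.24), hubs)  # a point near Champaign, IL
```

The returned distance plays the role of the Km_to_Station field: one continuous value per county that can be mapped or used as a variable in later analysis.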
Underserved Areas
Underserved counties that have high populations but distant service can be identified by multiplying population by distance.
Two groups of counties stand out on the list: moderate population counties (such as the counties in the Texas Rio Grande valley) that are far from an Amtrak station, and high population counties (such as Los Angeles or San Diego) that have a limited number of large stations.
- Add the Calculate Field tool.
- Input Table: Counties
- Field Name: Person_Km
- Field Type: Double
- Expression: !Total Population! * !Km_to_Station!
- Duplicate the layer and symbolize by the Person_Km attribute.
- Find a threshold value in the sorted Attribute Table and add a Definition Query to display only the top 20 underserved counties.
- Add labels with county names.
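The Person_Km calculation and the top-N selection above can be sketched as follows. The county rows and the Total_Population key are made up for illustration.

```python
def person_km(population, km_to_station):
    """Population multiplied by distance to the nearest station."""
    return population * km_to_station


def top_underserved(counties, n=20):
    """Add Person_Km to each county and keep the n highest values."""
    scored = [
        dict(c, Person_Km=person_km(c["Total_Population"], c["Km_to_Station"]))
        for c in counties
    ]
    return sorted(scored, key=lambda c: c["Person_Km"], reverse=True)[:n]


# Made-up counties for illustration only
counties = [
    {"Name": "A", "Total_Population": 800000, "Km_to_Station": 10.0},
    {"Name": "B", "Total_Population": 400000, "Km_to_Station": 150.0},
    {"Name": "C", "Total_Population": 5000, "Km_to_Station": 300.0},
]
ranked = top_underserved(counties, n=2)
```

In this made-up data, county B ranks first even though A has more people and C is farther away, because the product rewards counties that are both populous and distant.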
Exploratory Regression
You can perform exploratory regression to find the county demographic characteristics most associated with proximity to Amtrak stations.
Exploratory regression is implemented in the Exploratory Regression tool:
- Input Features: Counties
- Dependent Variable: Km_to_Station
- Candidate Explanatory Variables: Choose derived demographic amount variables (as opposed to count variables like population) that you hypothesize may be associated with distance to Amtrak stations.
- Report File: Give a name for your file at a specific path (Exploratory.txt).
- This tool may take a few minutes to run depending on the size of your data set and the number of independent variables you are exploring.
For this set of variables, the strongest models have an R² of 0.15, indicating only a weak general association of any combination of variables with distance to the nearest Amtrak station.
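To show what the R² value measures, the sketch below scores each candidate variable on its own with a simple least-squares line and ranks the results. This is a much reduced stand-in for the Exploratory Regression tool, which also evaluates combinations of variables and additional diagnostics. The variable names and values are made up.

```python
from statistics import mean


def r_squared(x, y):
    """R² of a simple least-squares line fitted to y as a function of x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        return 0.0  # a constant variable explains nothing
    return sxy ** 2 / (sxx * syy)


def rank_candidates(candidates, y):
    """Return (name, R²) pairs sorted from strongest to weakest."""
    scores = [(name, r_squared(x, y)) for name, x in candidates.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Made-up data for illustration only
km_to_station = [10.0, 20.0, 30.0, 40.0, 50.0]
candidates = {
    "Pct_A": [1.0, 2.0, 3.0, 4.0, 5.0],  # perfectly linear with distance
    "Pct_B": [3.0, 1.0, 4.0, 1.0, 5.0],  # weak relationship
}
ranking = rank_candidates(candidates, km_to_station)
```

An R² near 1 means the variable accounts for nearly all of the variation in distance, while a value near 0.15 means most of the variation is left unexplained.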
Limitations of Weighted Proximity Analysis
Use of Euclidean distance assumes direct access to locations, while actual walking distance and effort are affected by street grids, physical obstacles (like roads, buildings, and private property), and topographical barriers (like creeks or steep hills).
Use of counties invokes the modifiable areal unit problem: the Near tool measures distance from the closest point on each county polygon to the nearest station (zero when a station lies inside the county), and actual individual travel distances to stations may vary considerably within a county, especially if the county is large.
Centrographics
Centrography is "statistical analyses concerned with centers of population, median centers, median points, and related methods" (Sviatlovsky and Eells 1939). Much like general statistical measures of central tendency, centrographic measures can be useful for summarizing large amounts of data, assessing change over time, and optimizing proximity.
Location Data: Major League Ballparks
This example will use the locations of major league baseball parks in the US.
- Download the CSV file.
- Under Analysis and Tools, open the XY Table to Point tool to import the ballpark data into a new feature class in the project geodatabase.
- Input Table: Find the CSV in your local storage
- Output Feature Class: Ballparks
- X Field: Longitude
- Y Field: Latitude
The mean center is the point at the mean of all latitudes and mean of all longitudes for a group of features.
When working with points of interest, the mean center is the location that minimizes the sum of squared Euclidean distances to the points of interest.
One use for a mean center is to find a central location that would minimize travel to a centralized location from a group of geographically dispersed cities. If baseball general managers were planning on having a group meeting, or if a championship game were planned at a neutral site, the mean center might be a good choice.
The Mean Center tool creates a new feature class with one point at the mean center.
The optional weighting parameter can be used to specify a field that indicates which locations are more important than others.
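The mean center calculation, with and without weights, can be sketched directly from the definition above. This version averages latitude and longitude values as described in this tutorial; the points and weights are made up, and the Mean Center tool itself works on the coordinates of the feature class it is given.

```python
def mean_center(points, weights=None):
    """Return the (lat, lon) mean center of a list of (lat, lon) points.

    weights is an optional list giving the importance of each point.
    """
    if weights is None:
        weights = [1.0] * len(points)
    total = sum(weights)
    lat = sum(w * p[0] for p, w in zip(points, weights)) / total
    lon = sum(w * p[1] for p, w in zip(points, weights)) / total
    return lat, lon


# Made-up points and weights for illustration only
points = [(40.0, -90.0), (42.0, -88.0), (38.0, -92.0)]
center = mean_center(points)
weighted = mean_center(points, weights=[1, 3, 0])
```

A weight of zero removes a point from the calculation entirely, and a larger weight pulls the center toward that point, which is how a field such as attendance or population would shift the result.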
The mean center of US major league ballparks is in Missouri between Kansas City and St. Louis at lat/long 39.18905, -92.63477. This is slightly north of the 2020 census center of population in south central Missouri at 37.415725, -92.346525 (USCB 2021).
Because any meeting of any size would presumably need to be in a major city with copious lodging and a major airport, St. Louis or Kansas City would be suitable locations.
Limitations
The major caveat with mean centers is that they may not represent optimal travel distance, since transportation networks or physical obstacles may impose indirect routing and access.