Proximity Analysis in ArcGIS Pro
Rev. 19 April 2025
Proximity analysis is the extraction of information from geospatial data based on distance between features. Proximity analysis is especially useful for evaluating two characteristics:
- Influence (core to periphery): Projection of materials or power from central locations outward, such as political control, chemical emissions, or habitat extent
- Accessibility (periphery to core): Ability to reach desirable destinations like transit hubs or green space
Proximity analysis assumes distance decay, where the power of the relationship between locations decreases as distance between locations increases. Distance decay is a fundamental assumption behind gravity models.
- With desirable characteristics (like appeal to tourists), distance decay means that distant locations are less desirable than closer locations (closer locations require less travel effort).
- With undesirable characteristics (like toxic emissions), this means that distant locations are more desirable than closer locations (pollution disperses over space and is less intense the further you are from the source).
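Distance decay can be illustrated with a simple inverse-power gravity form. This is only a sketch: the function name gravity_interaction, the decay exponent beta, and the sample masses are assumptions chosen for illustration and do not come from this tutorial's data.

```python
def gravity_interaction(mass_a, mass_b, distance_km, beta=2.0):
    """Illustrative gravity-model interaction: (mass_a * mass_b) / distance ** beta.

    beta is an assumed decay exponent; larger values mean faster decay.
    """
    if distance_km <= 0:
        raise ValueError("distance_km must be positive")
    return (mass_a * mass_b) / distance_km ** beta


# Hypothetical masses (for example, populations in thousands)
near = gravity_interaction(100, 50, distance_km=2)
far = gravity_interaction(100, 50, distance_km=10)
print(near, far)  # the closer pair has the stronger relationship
```

With these assumed values, the relationship at 2 km is 25 times stronger than at 10 km, which is the pattern that distance decay describes.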
Multiple methods involving proximity are covered in this tutorial.
- Example Data
- Categorical Proximity Analysis
- Weighted Proximity Analysis
- Proportional Proximity Analysis
- Aggregated Proximity Analysis
- Centrographics
Example Data
This tutorial will use data for St. Clair County, Illinois, which is a part of the St. Louis metropolitan area on the Illinois side of the Mississippi River.
ACS Tracts
The American Community Survey (ACS) is an ongoing survey by the US Census Bureau that provides information about people in the United States beyond the basic information collected in the decennial census. The ACS is commonly used by a wide variety of researchers when they need information about the general public.
Census tracts are subdivisions of counties that are drawn by the US Census Bureau based on clearly identifiable features to ideally contain around 4,000 residents, although in practice the range of population is usually between 1,200 and 8,000 (USCB 2019). The census tract is the smallest reliable area of aggregation in data made public by the US Census Bureau.
This example uses precompiled data from the Minn 2019-2023 ACS feature service in the University of Illinois ArcGIS Online organization.
- Google the county name and FIPS code to find the five-digit FIPS code for the desired county. For this example, the FIPS code for St. Clair County, IL is 17163.
- Construct a GEOIDFQ prefix for the desired tracts. The GEOID for tracts begins with 1400000US, followed by the five-digit county FIPS code, followed by the tract ID. Given the FIPS code found above, tract GEOIDs in St. Clair County begin with 1400000US17163.
- Under Analysis and Tools, use the Export Features tool to copy the data from the census tract feature service into a new feature class in the project geodatabase.
- Input Features: Search in ArcGIS Online for the Minn 2015-2019 ACS Tracts feature layer.
- Output Feature Class: Provide a meaningful name (ACS_Tracts).
- Filter Expression: Add a filter for GEOIDFQ begins with the GEOIDFQ prefix found above.
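The prefix construction and the begins-with filter can be sketched in plain Python. The helper name tract_geoid_prefix and the two sample GEOIDFQ values are hypothetical and are used only to show the logic; in ArcGIS Pro the filter is built in the Export Features tool.

```python
TRACT_SUMMARY_LEVEL = "1400000US"  # tract prefix given in the steps above


def tract_geoid_prefix(county_fips):
    """Build the GEOIDFQ prefix shared by all tracts in a county."""
    if len(county_fips) != 5 or not county_fips.isdigit():
        raise ValueError("county FIPS code must be five digits")
    return TRACT_SUMMARY_LEVEL + county_fips


prefix = tract_geoid_prefix("17163")  # St. Clair County, IL

# Hypothetical GEOIDFQ values used only to demonstrate the filter
sample_geoids = ["1400000US17163500100", "1400000US17031010100"]
selected = [g for g in sample_geoids if g.startswith(prefix)]
print(prefix, selected)
```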
PLACES Tracts
The CDC's PLACES project uses small area estimation methods to obtain 36 modeled chronic disease measures for counties, tracts, places, and ZCTAs in the United States.
- Under Analysis and Tools, use the Export Features tool to copy the data from the census tract feature service into a new feature class in the project geodatabase.
- Input Features: Search in ArcGIS Online for the Minn 2023 PLACES feature service and select the tracts layer.
- Output Feature Class: Provide a meaningful name (PLACES_Tracts).
- Filter Expression: Add a filter for GEOIDFQ begins with the GEOIDFQ prefix found above.
- Join the ACS and PLACES data into a single feature class using the Join Features tool.
- Target Layer: ACS_Tracts
- Join Layer: PLACES_Tracts
- Output Dataset: Tracts
- Relationship: One to Many (copies all fields into the joined feature class)
- Attribute Relationship: GEOIDFQ (ACS) to GEOID (PLACES)
Interstate Highways
The Topologically Integrated Geographic Encoding and Referencing (TIGER) database is a collection of geospatial data maintained by the US Census Bureau.
TIGER/Line data is available as shapefiles, and includes geographic features useful for mapping like roads, railroads, and water features.
- Visit the TIGER/Line shapefiles page and download and unzip the shapefile of Primary and Secondary Roads for the state you are analyzing.
- Zoom your map to the county you will be analyzing.
- Under Analysis and Tools, use the Export Features tool to select and copy features from the shapefile into a new feature class in the project geodatabase.
- Input Features: Search for the .shp file.
- Output Feature Class: Browse to the project geodatabase and provide a meaningful name (Interstates). Note that you must browse to the geodatabase rather than simply typing a name; typing a bare name will copy the features to a new shapefile instead.
- Filter Expression: Add a filter for RTTYP is equal to I (interstate highways).
- Environment: Set the extent for export to the current map.
Transit Stops
The National Transportation Atlas Database (NTAD) is an ArcGIS Hub portal of transportation data from the US Department of Transportation.
- Go to the NTAD website, search for Transit Stops and zoom to the analysis area containing your transit system.
- Click on one of the stops in your analysis area to find the nta_id for the transit system serving the area (70006).
- Go to I want to use this and API Resources and copy the API endpoint.
- Under Analysis and Tools, use the Export Features tool to select and copy features from the feature service into a new feature class in the project geodatabase.
- Input Features: Paste the API URL and remove everything from query to the end of the URL.
- Output Feature Class: Provide a meaningful name (Transit).
- Filter Expression: Add a filter for nta_id is equal to the transit system ID you found above.
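Trimming the copied API endpoint back to the layer URL can be sketched as a small string operation. The function name and the example.com URL below are hypothetical placeholders; the real endpoint comes from the API Resources panel on the NTAD site.

```python
def service_url_from_query_endpoint(api_url):
    """Return the part of the URL before the '/query' operation."""
    head, separator, _ = api_url.partition("/query")
    if not separator:
        return api_url  # nothing to trim
    return head


# Hypothetical endpoint used only to demonstrate the trimming step
example = (
    "https://example.com/arcgis/rest/services/Transit_Stops/"
    "FeatureServer/0/query?where=1%3D1&outFields=*"
)
layer_url = service_url_from_query_endpoint(example)
print(layer_url)
```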
A backup snapshot of the national transit stops feature class from 8 April 2025 is available here as a zipped shapefile.
ModelBuilder
While you can run the analysis tools manually, it can be helpful to use a ModelBuilder diagram that will save you from repeated typing as you debug your workflow.
Use of ModelBuilder also preserves your workflow so that you can update and reproduce your analysis in the future.
Note that you will probably wish to run the Export Features steps above manually rather than in your ModelBuilder diagram. Packaging a model that includes Export Features copies duplicate data from the source feature service or shapefile into the project package and can dramatically slow the packaging process.
Categorical Proximity Analysis
Categorical proximity analysis divides areas into two classes (adjacent and non-adjacent) based on proximity to specific locations.
This analysis examines the differences in asthma rates between census tracts that are adjacent to interstate highways and those that are not. While interstates are essential transportation arteries in contemporary American life, they are also sources of significant air pollution for nearby residents and are associated with poor pulmonary health (Samuels and Freemark 2022; Brugge et al. 2007).
Categorical proximity operates on the imperfect assumption that influence or accessibility is distributed evenly around each location, and is therefore only useful for rough estimation of influence or accessibility. In the case of pollutant dispersion, a transport model is needed to more accurately model wind and flow patterns and how they transport the toxins. These models can be quite complex (Beyea and Hatch 1999).
Classification
- Add the Join Features tool to count the number of interstate segments within one kilometer of each census tract.
- Target Layer: Tracts
- Join Layer: Interstates
- Output Feature Class: Interstate_Tracts
- Keep All Target Features: Check
- Relationship: One to One
- Spatial Relationship: Planar near
- Near Distance: 1 kilometer
- Add the Calculate Field tool to create a categorical field that classifies each tract as near (1) or not near (0) an interstate. This classification creates a numeric value so it can be used as a dummy variable later in regression.
- Input Table: Interstate_Tracts
- Field Type: Long
- Field Name: Near_Interstate
- Expression: reclass(!Count!)
- Code Block:
def reclass(Count):
    # 1 = at least one interstate segment within the near distance, 0 = none
    if Count > 0:
        return 1
    else:
        return 0
- Primary symbology: Unique Values
- Field: Near_Interstate
- Reset colors
Group Comparison Table
Add the Summarize Attributes tool to compare attributes between the two categories of tracts.
- Input Layer: Interstate_Tracts
- Output Name: Interstate_Summary
- Fields: Near_Interstate
- Summary Fields:
- Total_Population (Sum)
- Median_Household_Income (Mean)
- Asthma (Mean)
- 40% of residents in St. Clair County (78,988 / (78,988 + 117,752)) live near an interstate, the mean median household income is 37% lower near interstates (53,976 / 85,593), and the mean asthma rate is 21% higher near interstates (12.3 / 10.2).
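The percentages in the comparison above follow from simple ratios of the summary table values. The variable names below are illustrative; the numbers are the ones reported above.

```python
# Values reported in the summary table above
pop_near, pop_far = 78_988, 117_752
income_near, income_far = 53_976, 85_593
asthma_near, asthma_far = 12.3, 10.2

share_near = pop_near / (pop_near + pop_far)   # share of residents near an interstate
income_gap = 1 - income_near / income_far      # how much lower income is near interstates
asthma_gap = asthma_near / asthma_far - 1      # how much higher asthma is near interstates

print(f"{share_near:.0%} live near an interstate")
print(f"income is {income_gap:.0%} lower near interstates")
print(f"asthma is {asthma_gap:.0%} higher near interstates")
```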
Categorical Proximity Regression
Health conditions often involve multiple factors, and simple group comparison can easily lead your audience into the post hoc fallacy of assuming causation simply because of an observed bivariate relationship. More sophisticated techniques like multiple regression are needed to help separate the different factors associated with complex health conditions.
When performing multiple regression using proximity, the choice of whether to use the proximity variable as an independent or dependent variable will depend on what you are trying to model.
- When analyzing influence, the influenced factor will generally be the dependent variable and proximity will be one of the independent variables.
- When analyzing accessibility, proximity will generally be the dependent variable and associated predictive factors will be the independent variables.
Asthma disproportionately affects low-income children, and parents with limited English skills are less likely to achieve optimal asthma management (Banta et al. 2021). While the asthma rates in the PLACES data are adult rates, and there is no language variable in the provided ACS data, we assume that adult rates are similar to childhood rates and that percent foreign born is a suitable proxy for English proficiency.
In this model, proximity to the interstate is a dummy variable used as an independent variable to investigate any relationship to the dependent variable of asthma prevalence. A dummy variable is a model variable that has a value of either zero or one to indicate the presence or absence of a condition (Wikipedia 2025).
- Download and install the mmregression toolbox as shown in this tutorial and add the mmregression tool to the ModelBuilder diagram.
- Input Feature Class: Interstate_Tracts
- Output Feature Class: Categorical_Residuals
- Dependent_Variable: Asthma (modeled crude percent prevalence rate of asthma among adults aged >=18 years in 2021)
- Independent Variables:
- Near_Interstate
- Median_Household_Income
- Percent_Foreign_Born
- The model predicts 78% of the variance in asthma rates (Adjusted R-squared = 0.775), the spatial lag has the best fit (lowest AIC = 63.5), and income makes the strongest contribution to the model.
- However, the spatial lag coefficients show that proximity to the interstate makes limited contribution to the model (near_interstate = 0.098 vs. income = -0.593), indicating that being near an interstate highway has limited predictive power of asthma rates after controlling for income given this data in St. Clair County, IL.
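How a dummy variable behaves in regression can be sketched with a one-predictor ordinary least squares fit, where the slope on the dummy equals the difference between the two group means. This is a deliberately simplified illustration and not the mmregression tool, which fits multiple regression and spatial models; the function simple_ols and the six data values are made up for the example.

```python
from statistics import mean


def simple_ols(x, y):
    """Ordinary least squares for one predictor; returns (intercept, slope)."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Made-up values for illustration only (not from the St. Clair County data)
near_interstate = [1, 1, 1, 0, 0, 0]
asthma_rate = [12.0, 13.0, 11.0, 10.0, 9.0, 11.0]

intercept, slope = simple_ols(near_interstate, asthma_rate)
print(intercept, slope)  # intercept = mean of the 0 group, slope = gap between groups
```

Adding control variables such as income changes this interpretation: the dummy coefficient then reflects the group difference that remains after the controls are accounted for.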
Weighted Proximity Analysis
Weighted proximity analysis involves measuring the distance from each feature in one set of features to the nearest feature in another set of features.
- When analyzing phenomena with distance decay, these distances represent decreasing influence (such as with dilution of pollution further from industrial facilities) or decreasing accessibility (such as with increased walking distance to transit stops).
- Use of continuous weights rather than categorical near/not-near classification can be helpful for measuring phenomena where the boundaries of accessibility or influence are ambiguous.
Distance Calculation
In this example, we perform weighted proximity analysis to find the distance from each census tract to the nearest interstate highway.
Tract centroids are used to find distances that are more clearly reflective of accessibility for the majority of the area than proximity to the nearest polygon edge.
- Add the Feature to Point tool to create a feature class of area centroids.
- Input Features: Tracts
- Output Feature Class: Tract_Centroids
- Inside: Check to create centroids
- Add the Near tool to find the distance between each tract and the nearest interstate highway.
- Input Features: Tract_Centroids
- Near Features: Interstates
- Search Radius: 1000 Kilometers (use a large distance to avoid -1 values with tracts that are a long distance from the nearest interstate)
- Method: Planar
- Field Names: Distance (Km_to_Interstate)
- Distance Unit: Kilometers
- Add Tract_Centroids to the display and symbolize with graduated symbols to validate the distances.
Note that the Near tool calculates distance from the nearest adjacent vertex for lines and polygons. If you are working with long straight line features with a limited number of vertices, you may need to use the Generate Points Along Lines tool to convert the lines to a sequence of points that will more clearly reflect distance from a point or polygon to the nearest line.
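The difference between measuring to a line's vertices and measuring to the line itself can be sketched with plain planar geometry. This is not the Near tool's implementation; the function names and coordinates are hypothetical and only illustrate why converting a sparse line to closely spaced points changes a vertex-based distance.

```python
import math


def distance_to_segment(p, a, b):
    """Planar distance from point p to the line segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp to its endpoints
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def distance_to_nearest_vertex(p, vertices):
    """Planar distance from point p to the closest vertex in a list."""
    return min(math.hypot(p[0] - vx, p[1] - vy) for vx, vy in vertices)


def densify(a, b, spacing):
    """Points spaced along segment a-b, including both endpoints."""
    length = math.hypot(b[0] - a[0], b[1] - a[1])
    steps = max(1, math.ceil(length / spacing))
    return [
        (a[0] + (b[0] - a[0]) * i / steps, a[1] + (b[1] - a[1]) * i / steps)
        for i in range(steps + 1)
    ]


# Hypothetical coordinates in kilometers
centroid, start, end = (5.0, 3.0), (0.0, 0.0), (10.0, 0.0)
to_line = distance_to_segment(centroid, start, end)
to_endpoints = distance_to_nearest_vertex(centroid, [start, end])
to_dense_points = distance_to_nearest_vertex(centroid, densify(start, end, 1.0))
print(to_line, to_endpoints, to_dense_points)
```

In this example the distance to the two endpoints overstates the distance to the line, while the densified points recover it.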
Weighted Proximity Regression
As with categorical proximity analysis, we can use multiple regression to examine the association of asthma rates (the dependent variable) with proximity to an interstate highway while controlling for demographic factors (the independent variables).
- Download and install the mmregression toolbox as shown in this tutorial and add the mmregression tool to the ModelBuilder diagram.
- Input Feature Class: Tract_Centroids
- Output Feature Class: Weighted_Residuals
- Dependent_Variable: Asthma
- Independent Variables:
- Km_to_Interstate
- Median_Household_Income
- Percent_Foreign_Born
- The model predicts 80% of the variance in adult asthma rates (Adjusted R-squared = 0.796), the spatial lag has the best fit (lowest AIC = 65.1), and income makes the strongest contribution to the model.
- Distance to the interstate makes a slightly greater contribution to the model than the categorical variable above (coefficient 0.136 > 0.098), but the variable still makes significantly less of a contribution to the model than median household income.
Weighted Proximity with Points
Hub and spoke analysis is a proximity analysis technique that analyzes accessibility to a small number of centralized hubs where distances from peripheral points to the closest hub (weights) are calculated with imaginary spoke lines. Hub-and-spoke is useful for estimating or optimizing accessibility for hubs like distribution centers or transit stops.

In this example we use stops on the MetroLink light rail service that provides service into downtown St. Louis and Lambert Field International Airport.

Since riders can drive to MetroLink stops, distance from each tract can be used as a proxy for convenience, as opposed to categorical access to bus stops where usability is limited to short walking distances. While the use of simple Euclidean distance generally overestimates accessibility (Biba et al. 2010), hub and spoke analysis is useful for making rough estimates with limited available data.
- Add the Select tool to isolate only the MetroLink stop points. In this data set, the light rail stops are distinguished from bus stops by the presence of METROLINK in the stop name.
- Input Features: Transit
- Output Feature Class: Rail
- Filter Expression: name contains METROLINK
- Add the Feature to Point tool to create a feature class of area centroids (skip this step if Tract_Centroids already exists from the Distance Calculation section above).
- Input Features: Tracts
- Output Feature Class: Tract_Centroids
- Inside: Check to create centroids
- Add the Near tool to find the distance between each tract centroid and the nearest rail station.
- Input Features: Tract_Centroids
- Near Features: Rail
- Search Radius: 1000 Kilometers (use a large distance to avoid -1 values with tracts that are a long distance from the nearest stop)
- Method: Planar
- Field Names: Distance (Km_to_Station)
- Distance Unit: Kilometers
- Add Tract_Centroids to the display and symbolize with graduated symbols to validate the distances.
- Download and install the mmregression toolbox as shown in this tutorial and add the mmregression tool to the ModelBuilder diagram.
- Input Feature Class: Tract_Centroids
- Output Feature Class: Rail_Residuals
- Dependent_Variable: Km_to_Station
- Independent Variables:
- Median_Household_Income
- Median_Age
- Percent_No_Vehicle
- Pop_per_Square_Mile (population density)
- The model predicts 45% of the variance in distance to rail station (Adjusted R-squared = 0.447), the spatial lag has the best fit (lowest AIC = 101.8), and percent of residents without a vehicle is the variable that makes the strongest contribution to the model.
- Notably, income and age make a minimal contribution to the model, indicating that the siting of stations does not exhibit clear bias associated with income or resident age.
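The hub-and-spoke distance step can be sketched as finding, for each centroid, the closest hub and the straight-line distance to it. The station names, tract names, and coordinates below are hypothetical, and the sketch assumes projected planar coordinates in kilometers.

```python
import math


def nearest_hub(point, hubs):
    """Return (hub_name, planar distance) for the hub closest to point."""
    return min(
        ((name, math.hypot(point[0] - hx, point[1] - hy)) for name, (hx, hy) in hubs.items()),
        key=lambda pair: pair[1],
    )


# Hypothetical projected coordinates in kilometers
stations = {"Station A": (0.0, 0.0), "Station B": (10.0, 0.0)}
centroids = {"Tract 1": (3.0, 4.0), "Tract 2": (9.0, 0.0)}

# Each entry is one imaginary spoke from a tract centroid to its closest hub
spokes = {tract: nearest_hub(xy, stations) for tract, xy in centroids.items()}
print(spokes)
```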
Proportional Proximity Analysis
When working with large areas of aggregated data that exceed the width of influence or accessibility zones around locations, Join Features can capture large areas where much of the area's population is outside the zone of significant influence.
Proportional proximity analysis aggregates values with weighting based on the proportion of each area covered by the zone of influence. This technique assumes an even spatial distribution of the population within each area, which is rare in practice, but it mitigates the effect of minimal coverage of large areas.
- Add the Buffer tool to create the buffers.
- Input Features: Interstates
- Output Feature Class: Interstate_Buffer
- Distance: 1 kilometer
- Dissolve: Yes
- Add the Summarize Within tool to estimate population and perform a weighted mean of asthma rates and household income.
- Input Polygons: Interstate_Buffer
- Input Summary Features: Tracts
- Output Feature Class: Interstate_Summary
- Check Keep All Input Polygons so that buffer polygons that overlap no tracts are retained for mapping.
- Summary Values
- Total_Population (sum)
- Asthma (mean)
- View the attribute table to see the summarized value for comparison to overall values.
- Note that proportional summing and averaging results in significantly different values from the categorical analysis above for both total population (33,212 proportional vs. 78,988 categorical) and asthma rates (6.6% proportional vs. 12.3% categorical).
Error 100014 (Summarize Within failed) may be thrown if you have long field names (such as Median_Household_Income), in which case you will need to remove those variables from your analysis.
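The area-proportional weighting described in this section can be sketched in plain Python. The weighting scheme here is an assumption made for illustration (counts scaled by the share of each tract inside the zone, rates averaged with overlap area as the weight) and is not a description of the Summarize Within tool's internals; the function name, dictionary keys, and tract values are hypothetical.

```python
def proportional_summary(tracts):
    """Area-proportional population sum and overlap-weighted mean rate.

    Each tract is a dict with its total area, the area overlapping the zone,
    a population count, and a rate. Assumes population is spread evenly.
    """
    shares = [t["overlap_area"] / t["area"] for t in tracts]
    population = sum(s * t["population"] for s, t in zip(shares, tracts))
    total_overlap = sum(t["overlap_area"] for t in tracts)
    if total_overlap == 0:
        return population, None  # zone touches no tracts
    mean_rate = sum(t["overlap_area"] * t["rate"] for t in tracts) / total_overlap
    return population, mean_rate


# Hypothetical tracts (areas in square kilometers, rate in percent)
sample_tracts = [
    {"area": 10.0, "overlap_area": 5.0, "population": 4000, "rate": 12.0},
    {"area": 20.0, "overlap_area": 15.0, "population": 2000, "rate": 8.0},
]
population, mean_rate = proportional_summary(sample_tracts)
print(population, mean_rate)
```

A categorical join would count all 6,000 residents of these two tracts, while the proportional estimate counts only the share assumed to live inside the zone.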
Aggregated Proximity Analysis
Categorical and weighted proximity analysis will be ineffective when evaluating accessibility where large areas of aggregated data can contain multiple destination points.
Aggregated proximity analysis involves aggregating and normalizing the counts of points within areas for proximity comparison across those areas.
This example analyzes accessibility to bus stops within census tracts. While this purely spatial technique does not consider service characteristics (frequency, time of day, vehicle size) or whether the system can take riders to the most needed destinations in a timely manner (circuity), with a small bus system this technique provides a rough guide to where transit is most and least available.
- Add the Join Features tool to count the number of transit stops within each census tract.
- Target Layer: The census tracts (ACS_Tracts)
- Join Layer: The transit stops (Transit)
- Output Name: The feature class that will contain the count of transit stops (Transit_Tracts)
- Join Operation: One to One
- Keep All Target Features: Check
- Spatial Relationship: Planar Near
- Spatial Near Distance: 0.8 kilometers (standard assumption that people will walk up to 1/2 mile to a transit stop)
- Add the Calculate Field tool to normalize counts by the population. The value is multiplied by 1000 (per 1k) to make numbers more readable.
- Input Table: Transit_Tracts
- Field Type: Double
- Field Name: Stops_per_1k
- Expression: 1000 * !COUNT! / !Total_Population!
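The normalization can also be written as a small function. The guard for tracts with no residents is an addition for illustration and is not part of the expression above; the function name is hypothetical.

```python
def stops_per_1k(stop_count, total_population):
    """Transit stops per 1,000 residents; None when the tract has no residents."""
    if not total_population:
        return None  # avoids division by zero for unpopulated tracts
    return 1000 * stop_count / total_population


print(stops_per_1k(12, 4000))  # hypothetical tract with 12 stops and 4,000 residents
print(stops_per_1k(5, 0))      # unpopulated tract
```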
Aggregated Proximity Regression
We can use multiple regression to examine demographic factors associated with transit accessibility. Because we are evaluating proximity as accessibility, the proximity variable will be the dependent variable. The residuals can be used to identify underserved areas given the existing demographic profile of well served areas.
- Download and install the mmregression toolbox as shown in this tutorial and add the mmregression tool to the ModelBuilder diagram.
- Input Feature Class: Transit_Tracts
- Output Feature Class: Transit_Residuals
- Dependent_Variable: Stops_per_1k
- Independent Variables:
- Median_Household_Income
- Median_Age
- Percent_No_Vehicle
- Pop_per_Square_Mile (population density)
- The model predicts 56% of the variance in number of stops per 1k residents (Adjusted R-squared = 0.559), the spatial lag has the best fit (lowest AIC = 92.1), and percent of residents with no vehicle makes the strongest contribution to the model.
- Median household income also makes a fairly strong contribution to the model, consistent with the role of public transit in the United States as a service provided by the state for residents who cannot afford automobility (Moulding 2005).
Centrographics
Centrography is "statistical analyses concerned with centers of population, median centers, median points, and related methods" (Sviatlovsky and Eells 1939). Much like general statistical measures of central tendency, centrographic measures can be useful for summarizing large amounts of data, assessing change over time, and optimizing proximity.
Mean Center
The mean center is the point at the mean of all latitudes and mean of all longitudes for a group of features.
When working with points of interest, the mean center represents an optimal location that minimizes the total squared Euclidean distance to the points of interest.
The Mean Center tool creates a new feature class with one point at the mean center.
As might be expected, the mean center of transit stops for the Metro Transit service in the St. Louis metropolitan area is in downtown St. Louis.
Although mean centers can be used in planning to minimize travel distance from peripheral locations (such as for distribution hubs or professional conferences), mean centers may not represent perfectly optimized travel distance since transportation networks or physical obstacles may impose indirect routing and access.
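The mean center calculation can be sketched in a few lines. The sketch assumes planar coordinates and uses hypothetical stop locations; it also checks that moving away from the mean center increases the total squared distance to the points.

```python
from statistics import mean


def mean_center(points):
    """Mean of the x coordinates and mean of the y coordinates."""
    return mean(x for x, _ in points), mean(y for _, y in points)


def sum_squared_distance(center, points):
    """Total squared planar distance from center to every point."""
    return sum((x - center[0]) ** 2 + (y - center[1]) ** 2 for x, y in points)


# Hypothetical projected coordinates in kilometers
stops = [(0.0, 0.0), (4.0, 0.0), (4.0, 6.0), (0.0, 6.0), (2.0, 8.0)]
center = mean_center(stops)

print(center)
print(sum_squared_distance(center, stops))
print(sum_squared_distance((2.5, 4.0), stops))  # a nearby alternative scores worse
```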
Standard Deviational Ellipse
A standard deviational ellipse visually summarizes the center, dispersion, and directional trend of a set of features. Standard deviational ellipses can be used to visually compare spatial distributions of differing sets of points.
For this example we compare the standard deviational ellipses for the St. Louis Metro bus vs. light rail service.
- In this data set, light rail stations are distinguished by the word METROLINK in the station title. Add the Calculate Field tool to add a Mode attribute to the Transit stop point feature class with the values Bus or Rail.
- Add the Directional Distribution (Standard Deviational Ellipse) tool.
- Input Feature Class: Transit
- Output Ellipse Feature Class: Transit_Ellipse
- Case Field: Mode
- The much smaller ellipse for rail service is consistent with the more limited number of stops on that small system as compared to the much more sprawling bus service covering much of the metropolitan area.
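One common way to describe a deviational ellipse is through the covariance of the point coordinates. The sketch below takes that approach under stated assumptions: planar coordinates, population (divide by n) variances, one standard deviation along each axis, and an angle measured counterclockwise from the x axis. The Directional Distribution tool may use different scaling and angle conventions, so treat this as an illustration of the idea rather than a reproduction of the tool's output. The function name and sample points are hypothetical.

```python
import math


def deviational_ellipse(points):
    """Return (center, major_sd, minor_sd, angle_degrees) for planar points."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    var_x = sum((x - cx) ** 2 for x, _ in points) / n
    var_y = sum((y - cy) ** 2 for _, y in points) / n
    cov_xy = sum((x - cx) * (y - cy) for x, y in points) / n

    # Eigenvalues of the 2x2 covariance matrix give the variance along each axis
    half_sum = (var_x + var_y) / 2
    spread = math.hypot((var_x - var_y) / 2, cov_xy)
    major_sd = math.sqrt(half_sum + spread)
    minor_sd = math.sqrt(max(0.0, half_sum - spread))
    angle = math.degrees(0.5 * math.atan2(2 * cov_xy, var_x - var_y))
    return (cx, cy), major_sd, minor_sd, angle


# Hypothetical stops strung along a southwest-to-northeast line
rail = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (3.0, 3.0)]
center, major_sd, minor_sd, angle = deviational_ellipse(rail)
print(center, major_sd, minor_sd, angle)
```

Points that fall along a single line produce an ellipse with no width, while a sprawling set of points produces large values on both axes.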