Proximity Analysis in ArcGIS Pro

Rev. 19 April 2025

Proximity analysis is the extraction of information from geospatial data based on the distance between features. It is especially useful for evaluating two characteristics: influence and accessibility.

Proximity analysis assumes distance decay, where the strength of the relationship between locations decreases as the distance between them increases. Distance decay is a fundamental assumption behind gravity models.
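
As a rough illustration of distance decay, the sketch below uses an inverse-power gravity formulation. The function name, the exponent beta = 2, the constant k, and the population figures are all hypothetical and chosen only for illustration; they are not taken from this tutorial's data.

```python
def gravity_interaction(mass_a, mass_b, distance_km, beta=2.0, k=1.0):
    """Hypothetical gravity model: interaction falls off with distance ** beta."""
    if distance_km <= 0:
        raise ValueError("distance must be positive")
    return k * mass_a * mass_b / distance_km ** beta


# Two places of equal (made-up) size at increasing separation
near = gravity_interaction(1000, 1000, 2)
far = gravity_interaction(1000, 1000, 4)

# With beta = 2, doubling the distance quarters the interaction
print(near, far, near / far)
```

In practice the exponent is usually estimated from observed flows rather than assumed.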

Multiple methods involving proximity are covered in this tutorial.

Example Data

This tutorial will use data for St. Clair County, Illinois, which is a part of the St. Louis metropolitan area on the Illinois side of the Mississippi River.

ACS Tracts

The American Community Survey (ACS) is an ongoing survey by the US Census Bureau that provides information about people in the United States beyond the basic information collected in the decennial census. The ACS is commonly used by a wide variety of researchers when they need information about the general public.

Census tracts are subdivisions of counties that are drawn by the US Census Bureau based on clearly identifiable features to ideally contain around 4,000 residents, although in practice the range of population is usually between 1,200 and 8,000 (USCB 2019). The census tract is the smallest reliable area of aggregation in data made public by the US Census Bureau.

This example uses precompiled data from the Minn 2019-2023 ACS feature service in the University of Illinois ArcGIS Online organization.

Exporting census tract data into the project geodatabase

PLACES Tracts

The CDC's PLACES project uses small area estimation methods to obtain 36 modeled chronic disease measures for counties, tracts, places, and ZCTAs in the United States.

Exporting PLACES census tract estimates into the project geodatabase

Interstate Highways

The Topologically Integrated Geographic Encoding and Referencing (TIGER) database is a collection of geospatial data maintained by the US Census Bureau.

TIGER/Line data is available as shapefiles, and includes geographic features useful for mapping like roads, railroads, and water features.

Exporting interstate highways into the project geodatabase

Transit Stops

The National Transportation Atlas Database (NTAD) is an ArcGIS Hub portal of transportation data from the US Department of Transportation.

Exporting transit stops into the project geodatabase

A backup snapshot of the national transit stops feature class from 8 April 2025 is available here as a zipped shapefile.

ModelBuilder

While you can run the analysis tools manually, it can be helpful to use a ModelBuilder diagram that will save you from repeated typing as you debug your workflow.

Use of ModelBuilder also preserves your workflow so that you can update and reproduce your analysis in the future.

Note that you will probably wish to run the Export Features steps above manually rather than in your ModelBuilder diagram because packaging models with Export Features will copy duplicate data from the source feature service or shapefile into the project package and can dramatically slow the packaging process.

Creating a blank ModelBuilder diagram

Categorical Proximity Analysis

Categorical proximity analysis divides areas into two classes (adjacent and non-adjacent) based on proximity to specific locations.

This analysis examines the differences in asthma rates between census tracts that are adjacent to interstate highways and those that are not. While interstate highways are essential transportation arteries in contemporary American life, they are also sources of significant air pollution for nearby residents and are associated with poor pulmonary health (Samuels and Freemark 2022; Brugge et al. 2007).

Categorical proximity operates on the imperfect assumption that influence or accessibility is distributed evenly around a location, and it is therefore only useful for rough estimation. In the case of pollutant dispersion, a transport model is needed to more accurately represent wind and flow patterns and how they carry toxins. These models can be quite complex (Beyea and Hatch 1999).

Classification

Categorizing tracts by proximity to an interstate highway
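
The classification idea can be sketched in plain Python, assuming that a tract counts as adjacent when its centroid lies within a chosen threshold distance of the highway centerline. This is a simplification for illustration and not the ArcGIS implementation; the coordinates, the 1,000 m threshold, and the function names are hypothetical, and coordinates are assumed to be in a projected system measured in meters.

```python
import math


def point_segment_distance(p, a, b):
    """Planar distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp to its endpoints
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def near_highway(centroid, polyline, threshold_m=1000.0):
    """Return 1 if the centroid is within threshold_m of the polyline, else 0."""
    nearest = min(point_segment_distance(centroid, a, b)
                  for a, b in zip(polyline, polyline[1:]))
    return 1 if nearest <= threshold_m else 0


highway = [(0, 0), (5000, 0), (10000, 2000)]      # made-up centerline vertices
centroids = {"tract_a": (2500, 400), "tract_b": (2500, 3000)}
classes = {name: near_highway(xy, highway) for name, xy in centroids.items()}
print(classes)
```

The resulting 0/1 values play the role of the two proximity classes used in the rest of this section.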

Group Comparison Table

Add the Summarize Attributes tool to compare attributes between the two categories of tracts.

Creating a comparison table
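
The comparison amounts to grouping tracts by the proximity class and averaging an attribute within each group. The sketch below shows that arithmetic with made-up asthma percentages; the values and variable names are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows: (near-highway flag, asthma prevalence in percent)
tracts = [(1, 11.2), (1, 12.0), (1, 10.9), (0, 9.4), (0, 9.9), (0, 10.1)]

# Collect the attribute values for each proximity class
groups = defaultdict(list)
for flag, asthma in tracts:
    groups[flag].append(asthma)

# Mean asthma prevalence per class, rounded for display
summary = {flag: round(mean(values), 2) for flag, values in groups.items()}
print(summary)
```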

Categorical Proximity Regression

Health conditions often involve multiple factors, and simple group comparison can easily lead your audience into the post hoc fallacy of assuming causation simply because of an observed bivariate relationship. More sophisticated techniques like multiple regression are needed to help separate the different factors associated with complex health conditions.

When performing multiple regression using proximity, the choice of whether to use the proximity variable as an independent or dependent variable will depend on what you are trying to model.

Asthma disproportionately affects low-income children, and parents with limited English skills are less likely to achieve optimal asthma management (Banta et al. 2021). While the asthma rates in the PLACES data are adult rates, and there is no language variable in the provided ACS data, we assume that adult rates are similar to childhood rates and that percent foreign born is a suitable proxy for English proficiency.

In this model, proximity to the interstate is a dummy variable used as an independent variable to investigate any relationship to the dependent variable of asthma prevalence. A dummy variable is a model variable that has a value of either zero or one to indicate the presence or absence of a condition (Wikipedia 2025).

Modeling asthma rates with proximity and demographic variables
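
To show how a dummy variable behaves alongside a control variable, the sketch below fits ordinary least squares by solving the normal equations in plain Python. It is an illustration only, not the regression tool used in ArcGIS Pro; the data are made up so that asthma equals 8 + 1.5 x near + 0.1 x poverty exactly, and the function and variable names are hypothetical.

```python
from statistics import mean


def ols(rows, y):
    """Least squares coefficients from the normal equations (X'X) b = X'y."""
    n, k = len(rows), len(rows[0])
    # Augmented matrix [X'X | X'y]
    a = [[sum(rows[r][i] * rows[r][j] for r in range(n)) for j in range(k)]
         + [sum(rows[r][i] * y[r] for r in range(n))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * w for v, w in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


# Hypothetical tracts: near-highway dummy, percent below poverty, asthma percent
near = [1, 1, 1, 0, 0, 0]
poverty = [30, 20, 25, 10, 15, 20]
asthma = [12.5, 11.5, 12.0, 9.0, 9.5, 10.0]

rows = [[1, d, p] for d, p in zip(near, poverty)]   # leading 1 is the intercept
intercept, b_near, b_poverty = ols(rows, asthma)

# Unadjusted difference in group means, for comparison with b_near
naive_gap = (mean(rate for rate, d in zip(asthma, near) if d == 1)
             - mean(rate for rate, d in zip(asthma, near) if d == 0))
print(round(naive_gap, 2), round(b_near, 2), round(b_poverty, 2))
```

In this made-up data the unadjusted gap between the two groups is larger than the coefficient on the proximity dummy, because the tracts near the highway also have higher poverty. That is the kind of confounding multiple regression is meant to separate.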

Weighted Proximity Analysis

Weighted proximity analysis involves measuring the distance from each feature in one set of features to the nearest feature in another set of features.

Distance Calculation

In this example, we perform weighted proximity analysis to find the distance from each census tract to the nearest interstate highway.

Distances are measured from tract centroids because they reflect accessibility for the majority of the tract area more clearly than distance from the nearest polygon edge.

Weighted proximity

Note that the Near tool calculates distance from the nearest adjacent vertex for lines and polygons. If you are working with long straight line features with a limited number of vertices, you may need to use the Generate Points Along Lines tool to convert the lines to a sequence of points that will more clearly reflect distance from a point or polygon to the nearest line.
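
The effect of sparse vertices can be illustrated with a short sketch that measures distance to vertices only, before and after adding intermediate points along a line. It mimics the idea behind Generate Points Along Lines rather than calling the tool; the function names, coordinates, and 100 m spacing are hypothetical, and coordinates are assumed to be projected meters.

```python
import math


def densify(polyline, spacing):
    """Add evenly spaced points so consecutive points are at most `spacing` apart."""
    points = [polyline[0]]
    for (ax, ay), (bx, by) in zip(polyline, polyline[1:]):
        steps = max(1, math.ceil(math.hypot(bx - ax, by - ay) / spacing))
        for i in range(1, steps + 1):
            t = i / steps
            points.append((ax + t * (bx - ax), ay + t * (by - ay)))
    return points


def nearest_vertex_distance(p, points):
    """Distance from p to the closest point in a list (vertices only)."""
    return min(math.hypot(p[0] - x, p[1] - y) for x, y in points)


line = [(0, 0), (10000, 0)]          # one long straight segment, two vertices
centroid = (5000, 300)               # made-up tract centroid

sparse = nearest_vertex_distance(centroid, line)
dense = nearest_vertex_distance(centroid, densify(line, 100))
print(round(sparse), round(dense))
```

With only the two end vertices the measured distance is about 5 km, while the densified line gives the 300 m offset that a reader would expect.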

Weighted Proximity Regression

As with categorical proximity analysis, we can use multiple regression to examine the association of asthma rates (the dependent variable) with proximity to an interstate highway while controlling for demographic factors (the independent variables).

Multiple regression with weighted proximity

Weighted Proximity with Points

Hub-and-spoke analysis is a proximity analysis technique for analyzing accessibility to a small number of centralized hubs, where the distances (weights) from peripheral points to the closest hub are calculated along imaginary spoke lines. Hub-and-spoke analysis is useful for estimating or optimizing accessibility for hubs like distribution centers or transit stops.

Figure: Hub-and-spoke analysis

In this example we use stops on the MetroLink light rail service that provides service into downtown St. Louis and Lambert Field International Airport.

St. Louis MetroLink (Minn 2014)

Since riders can drive to MetroLink stops, distance from each tract can be used as a proxy for convenience, as opposed to categorical access to bus stops where usability is limited to short walking distances. While the use of simple Euclidean distance generally overestimates accessibility (Biba et al. 2010), hub and spoke analysis is useful for making rough estimates with limited available data.
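
A minimal sketch of the hub-and-spoke calculation follows: each tract centroid is paired with its closest hub by straight-line distance. The station and tract names and their coordinates are hypothetical, and coordinates are assumed to be projected meters.

```python
import math

# Hypothetical projected coordinates in meters
hubs = {"station_1": (0, 0), "station_2": (8000, 0)}
tracts = {"tract_a": (1000, 1000), "tract_b": (7000, -3000), "tract_c": (4500, 0)}

spokes = {}
for tract, (tx, ty) in tracts.items():
    # Pick the hub with the shortest straight-line distance to this tract
    hub, (hx, hy) = min(
        hubs.items(),
        key=lambda item: math.hypot(tx - item[1][0], ty - item[1][1]),
    )
    spokes[tract] = (hub, round(math.hypot(tx - hx, ty - hy)))

print(spokes)
```

Each entry is one spoke: the hub it connects to and its length in meters.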

Weighted proximity

Proportional Proximity Analysis

When working with large areas of aggregated data that exceed the width of the zones of influence or accessibility around locations, Join Features can capture large areas where much of the area's population lies outside the zone of significant influence.

Proportional proximity analysis aggregates values with weighting based on the proportion of each area covered by the zone of influence. This technique assumes that the population is evenly distributed within each area, which is rare in practice, but it mitigates the effect of large areas that are only minimally covered by the zone.
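
The weighting arithmetic can be sketched as follows, assuming each tract contributes population in proportion to the share of its area inside the zone, and that rates are averaged using those proportional populations as weights. The figures and field names are hypothetical, and the exact statistics produced by the ArcGIS tool may differ.

```python
# Hypothetical tracts: total area, area inside the zone (same units),
# population, and asthma prevalence in percent
tracts = [
    {"area": 10.0, "area_in_zone": 2.0, "population": 4000, "asthma_pct": 11.0},
    {"area": 4.0, "area_in_zone": 3.0, "population": 2000, "asthma_pct": 9.0},
]

zone_population = 0.0
weighted_asthma = 0.0
for t in tracts:
    share = t["area_in_zone"] / t["area"]      # proportion of the tract covered
    people = t["population"] * share           # assumes evenly spread residents
    zone_population += people
    weighted_asthma += people * t["asthma_pct"]

# Population-weighted asthma rate for the residents estimated to be in the zone
zone_asthma_pct = weighted_asthma / zone_population
print(round(zone_population), round(zone_asthma_pct, 2))
```

The large first tract contributes only a fifth of its residents, so it does not dominate the zone estimate the way it would under a simple join.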

Finding proportional population and asthma rates in tracts within 1 km of interstate highways

Error 100014 (Summarize Within failed) may be thrown if you have long field names (such as Median_Household_Income). If this happens, you will need to remove those variables from your analysis.

Aggregated Proximity Analysis

Categorical and weighted proximity analysis will be ineffective when evaluating accessibility where large areas of aggregated data can contain multiple destination points.

Aggregated proximity analysis involves aggregating and normalizing the counts of points within areas for proximity comparison across those areas.

This example analyzes accessibility to bus stops within census tracts. While this purely spatial technique does not consider service characteristics (frequency, time of day, vehicle size) or whether the system can take riders to the most needed destinations in a timely manner (circuity), with a small bus system this technique provides a rough guide to where transit is most and least available.
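
The aggregation and normalization can be sketched as a count of stops per tract followed by a residents-per-stop ratio. The tract labels, stop assignments, and populations below are hypothetical, and the sketch assumes each stop has already been assigned to the tract that contains it.

```python
# Hypothetical data: one tract label per stop, and tract populations
stops_by_tract = ["A", "A", "B", "A", "B", "A"]
population = {"A": 4000, "B": 3000, "C": 2500}

# Count the stops that fall in each tract, including tracts with none
stop_counts = {tract: 0 for tract in population}
for tract in stops_by_tract:
    stop_counts[tract] += 1

# Residents per stop; None marks tracts with no stops at all
residents_per_stop = {
    tract: (population[tract] / count if count else None)
    for tract, count in stop_counts.items()
}
print(residents_per_stop)
```

Tracts with no stops need explicit handling because the ratio is undefined there, even though they are the least served.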

Finding the number of residents for each transit stop as a metric for level of service

Aggregated Proximity Regression

We can use multiple regression to examine demographic factors associated with transit accessibility. Because we are evaluating proximity as accessibility, the proximity variable will be the dependent variable. The residuals can be used to identify underserved areas given the existing demographic profile of well served areas.
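
The use of residuals can be illustrated with a simplified single-predictor fit; the tutorial's model uses several demographic variables, so this is only a sketch of the idea. The predictor (percent of households without a vehicle), the stops per 1,000 residents, and all values are hypothetical.

```python
from statistics import mean

# Hypothetical tracts: (percent of households without a vehicle,
#                       transit stops per 1,000 residents)
tracts = {"A": (5.0, 1.0), "B": (10.0, 2.0), "C": (15.0, 3.0), "D": (20.0, 2.0)}

xs = [x for x, _ in tracts.values()]
ys = [y for _, y in tracts.values()]
x_bar, y_bar = mean(xs), mean(ys)

# Simple least squares fit of stops on the demographic variable
slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
intercept = y_bar - slope * x_bar

# A negative residual means fewer stops than the fitted line predicts
residuals = {name: y - (intercept + slope * x) for name, (x, y) in tracts.items()}
underserved = min(residuals, key=residuals.get)
print(underserved, round(residuals[underserved], 2))
```

The tract with the most negative residual has less service than tracts with a similar profile, which makes it a candidate for closer review.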

Multiple regression with aggregated proximity

Centrographics

Centrography is "statistical analyses concerned with centers of population, median centers, median points, and related methods" (Sviatlovsky and Eells 1939). Much like general statistical measures of central tendency, centrographic measures can be useful for summarizing large amounts of data, assessing change over time, and optimizing proximity.

Mean Center

The mean center is the point at the mean of all latitudes and mean of all longitudes for a group of features.

When working with points of interest, the mean center represents an optimal location that minimizes the sum of squared Euclidean distances to the points of interest.

The Mean Center tool creates a new feature class with one point at the mean center.
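
The calculation behind a mean center is simple enough to sketch directly. The example below averages made-up projected coordinates (meters) rather than latitudes and longitudes, and also checks that moving away from the mean center increases the total squared distance to the points. The stop coordinates and function name are hypothetical.

```python
from statistics import mean

# Hypothetical stop coordinates in a projected system (meters)
stops = [(0, 0), (4000, 0), (4000, 3000), (0, 3000), (2000, 9000)]

# Mean center: the average x and the average y
mean_center = (mean(x for x, _ in stops), mean(y for _, y in stops))


def sum_squared_distance(point):
    """Total squared Euclidean distance from a candidate point to every stop."""
    px, py = point
    return sum((px - x) ** 2 + (py - y) ** 2 for x, y in stops)


shifted = (mean_center[0] + 100, mean_center[1])
print(mean_center)
print(sum_squared_distance(mean_center) < sum_squared_distance(shifted))
```

Note how the single outlying stop at y = 9000 pulls the mean center north of the cluster of four.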

As might be expected, the mean center of transit stops for the Metro Transit service in the St. Louis metropolitan area is in downtown St. Louis.

Although mean centers can be used in planning to minimize travel distance from peripheral locations (such as for distribution hubs or professional conferences), mean centers may not represent perfectly optimized travel distance since transportation networks or physical obstacles may impose indirect routing and access.

Mean center

Standard Deviational Ellipse

A standard deviational ellipse visually summarizes the center, dispersion, and directional trend of a set of features. Standard deviational ellipses can be used to visually compare spatial distributions of differing sets of points.
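
One common way to derive such an ellipse is from the covariance of the point coordinates, as sketched below. This is an illustration under stated assumptions, not the ArcGIS algorithm: the angle here is measured counterclockwise from the x (east) axis and the axes are one standard deviation long, whereas ArcGIS Pro may report rotation and axis lengths using different conventions. The points and the function name are hypothetical.

```python
import math
from statistics import mean


def deviational_ellipse(points):
    """Center, semi-axes (one standard deviation), and major-axis angle in degrees."""
    cx = mean(x for x, _ in points)
    cy = mean(y for _, y in points)
    n = len(points)
    sxx = sum((x - cx) ** 2 for x, _ in points) / n
    syy = sum((y - cy) ** 2 for _, y in points) / n
    sxy = sum((x - cx) * (y - cy) for x, y in points) / n
    # Eigenvalues of the 2x2 covariance matrix are the variances along the axes
    half_trace = (sxx + syy) / 2
    spread = math.hypot((sxx - syy) / 2, sxy)
    major = math.sqrt(half_trace + spread)
    minor = math.sqrt(max(0.0, half_trace - spread))
    # Angle of the major axis, counterclockwise from the x (east) axis
    angle = math.degrees(0.5 * math.atan2(2 * sxy, sxx - syy))
    return (cx, cy), major, minor, angle


points = [(0, 0), (2, 2), (4, 4), (1, 3), (3, 1)]   # made-up, trending northeast
center, major, minor, angle = deviational_ellipse(points)
print(center, round(major, 3), round(minor, 3), round(angle, 1))
```

The center captures location, the axis lengths capture dispersion, and the angle captures directional trend, which are the three properties the ellipse summarizes.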

For this example we compare the standard deviational ellipses for the St. Louis Metro bus vs. light rail service.

Standard deviational ellipse