Creating Subsets of Data in ArcGIS Pro

You will often have a large collection of features but only want to map or analyze a subset of them, such as the features in a particular area. ArcGIS Pro offers a variety of overlapping (and sometimes redundant) methods for subsetting data, which can make it confusing to decide which one to use for a particular application.

This tutorial will cover some means of selecting subsets of data in ArcGIS Pro.

Example Data

The example data for this tutorial will be three feature classes created in a project geodatabase using data from the US Census Bureau (USCB).

A feature class is a collection of features each having the same spatial representation (point, line, or polygon) and the same set of attributes (ESRI 2023). The three sets of USCB features will be loaded into three separate feature classes.

A geodatabase is a collection of feature classes, rasters, and/or tables held in a common file system folder (file geodatabase) or relational database management system (ESRI 2023).

A project geodatabase is a file geodatabase created for each ArcGIS Pro project. It is stored in the project directory and serves as the default geodatabase for new feature classes created by tools run in the project.

Census Tracts

The US Census Bureau (USCB) is the US federal government agency responsible for collecting data about people and the economy in the United States. The Census Bureau has its roots in Article I, section 2 of the US Constitution, which mandates an enumeration of the entire US population every ten years (the decennial census) in order to set the number of members from each state in the House of Representatives and Electoral College (USCB 2017). The Census Act of 1840 established a central office for conducting the decennial census, and that office became the Census Bureau under the Department of Commerce and Labor in 1903 (USCB 2021).

Demographic data is "the statistical characteristics of human populations (such as age or income)." Etymologically, the word is a combination of the Greek words dêmos (people) and graphein (write) - literally, writing about people (Merriam-Webster 2020).

Census tracts are organizational boundaries used for USCB data collection that are drawn to roughly align with neighborhood borders. Ideally, each tract contains 4,000 residents, although the number of residents can vary depending on area (USCB 2019).

The Minn 2015-2019 ACS Tracts feature layer in the University of Illinois ArcGIS Online organization provides basic demographic data for census tracts and counties in the US from the 2015-2019 American Community Survey five-year estimates.

To avoid issues with access speed and availability, we use the Export Features tool to copy the data from the feature service into a new feature class in the project geodatabase (US_Tracts).

Importing census tracts into the project geodatabase

Roads

County-level road data is sourced from the USCB TIGER/Line Shapefiles All Roads download (County_Roads).

Importing county roads data into the project geodatabase

Places

USCB places data from the USCB TIGER/Line Shapefiles contains boundaries of municipalities and unincorporated areas (State_Places).

Importing places into the project geodatabase

Filters vs. Definition Queries

There are two broad approaches to subsetting based on attributes: filters and definition queries.

A definition query isolates a subset of displayed features on a map layer, but leaves the source feature class intact.

Definition queries are set on layer properties in maps and are most appropriate when you just need a simple subset for mapping or basic analysis.

This example shows how to use a definition query to subset Mount Pulaski, IL from the State_Places feature class using the NAME field.

Subsetting using a definition query

A filter isolates data as it is being moved or processed into a new feature class by a tool.

This example shows how to use a filter to subset Mount Pulaski, IL from the State_Places feature class, using the Select tool with an expression on the NAME field. The tool writes the matching feature to a new feature class in the project geodatabase (City_Boundary).

Subsetting using a filter
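The difference between the two approaches resembles a familiar database distinction. The sketch below is an analogy only, using Python's built-in sqlite3 module rather than ArcGIS: a definition query behaves somewhat like a SQL view, which stores only the query and always reflects the current source rows, while a tool filter behaves like CREATE TABLE ... AS SELECT, which copies the matching rows once. The table names mirror this tutorial, but the records are hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical stand-in for the State_Places attribute table (names only).
con.execute("CREATE TABLE State_Places (NAME TEXT)")
con.executemany("INSERT INTO State_Places VALUES (?)",
                [("Mount Pulaski",), ("Lincoln",), ("Springfield",)])

# Definition-query analogy: a view stores only the WHERE clause, not the rows.
con.execute("CREATE VIEW Places_Layer AS "
            "SELECT * FROM State_Places WHERE NAME = 'Mount Pulaski'")

# Filter analogy: CREATE TABLE ... AS SELECT copies matching rows into a new table.
con.execute("CREATE TABLE City_Boundary AS "
            "SELECT * FROM State_Places WHERE NAME = 'Mount Pulaski'")

def count(table):
    """Return the number of rows in a table or view."""
    return con.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]

# The source table is left intact by both approaches.
print(count("State_Places"), count("Places_Layer"), count("City_Boundary"))  # 3 1 1

# If another matching record is added to the source, only the view reflects it.
con.execute("INSERT INTO State_Places VALUES ('Mount Pulaski')")
print(count("Places_Layer"), count("City_Boundary"))  # 2 1
```

Because an exported feature class is a snapshot, the tool (or a ModelBuilder model containing it) has to be re-run when the source data changes.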

ModelBuilder

ModelBuilder is a visual programming language in ArcGIS Pro that lets you use a graphical editor to build custom tools that automate complex, tedious, or repetitive tasks made up of consistent step-by-step sequences of operations (workflows).

ModelBuilder is useful when working with tool filters because you can easily diagnose filter problems and re-run tools without having to completely re-enter the tool and filter parameters.

This example demonstrates creating a new ModelBuilder diagram and adding an Export Features tool using the same filter from the interactive example above.

Creating a ModelBuilder diagram

Behind the scenes, ModelBuilder creates Python code that uses the ArcPy API. While most users never need to see this code, you can export and examine it if needed for diagnostics or to submit for an assignment.

You can view and copy the code with the Send To Python Window command on the ModelBuilder tab.

Exporting ModelBuilder Python

Filter Expressions

Filter expressions are used to define the criteria for subsetting features. Expressions are used in both tool filters and definition queries.

Aside from simple exact value matches like the NAME example above, expressions can be created to perform a variety of different comparisons.

Quantitative Ranges

You can create subsets based on ranges of quantitative variables.

For this example, we Export Features for census tracts with a median household income below the 2015-2019 Illinois median $65,886 (Low_Income_Tracts).

Subset by quantitative variable

Multiple Conditions Combined with AND

To filter by multiple conditions, you can use the AND operator to subset only the features that meet all of two or more conditions. For this example, we Export Features for census tracts in Illinois (ST = IL) with median household income less than $65,886 (Low_Income_Tracts).

Subset with multiple conditions connected by AND
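To see how an AND condition is evaluated row by row, here is a small sketch that uses an in-memory SQLite table as a stand-in for the tract attribute table. The field names follow this tutorial's examples; the records and incomes are made up, and SQLite's SQL dialect is close to, but not identical to, the SQL used by ArcGIS geodatabases.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical tract records with made-up incomes, for illustration only.
con.execute("CREATE TABLE US_Tracts (GEO_ID TEXT, ST TEXT, Median_Household_Income INTEGER)")
con.executemany(
    "INSERT INTO US_Tracts VALUES (?, ?, ?)",
    [
        ("tract-a", "IL", 52000),  # Illinois and below the threshold: selected
        ("tract-b", "IL", 81000),  # Illinois, but above the threshold
        ("tract-c", "IN", 48000),  # below the threshold, but not in Illinois
    ],
)

# Both comparisons must be true for a row to be selected.
where = "(ST = 'IL') AND (Median_Household_Income < 65886)"
rows = con.execute(f"SELECT GEO_ID FROM US_Tracts WHERE {where}").fetchall()
print(rows)  # [('tract-a',)]
```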

Partial Text Match

You can create subsets based on partial matches of names. This can be useful when names may contain additional text, such as directions (east, west) or unpredictable suffixes like "Road" vs. "Rd."

For this example, we use the Select tool with an expression that subsets places across the US whose names contain the word "Mount" (Mount_Places).

Subset by partial text match
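In SQL, a contains match is written with LIKE and the % wildcard, which matches any sequence of characters (including none). This SQLite sketch, using hypothetical records, also shows a common side effect of partial matching: "Mountain View" matches '%Mount%' too.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical place names for illustration.
con.execute("CREATE TABLE State_Places (NAME TEXT)")
con.executemany("INSERT INTO State_Places VALUES (?)",
                [("Mount Pulaski",), ("Rocky Mount",), ("Mountain View",), ("Lincoln",)])

# '%' on both sides lets 'Mount' appear anywhere in the name.
names = [r[0] for r in con.execute(
    "SELECT NAME FROM State_Places WHERE NAME LIKE '%Mount%' ORDER BY NAME")]
print(names)  # ['Mount Pulaski', 'Mountain View', 'Rocky Mount']
```

Review partial-match results before using them. Note also that SQLite's LIKE ignores case for ASCII letters by default, while case sensitivity in ArcGIS depends on the underlying data source, so this sketch is not a guide to case behavior.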

Begins With

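A begins with condition matches text that starts with a given search string.

One application of begins with is subsetting census tracts in specific counties based on the GEO_ID field included in USCB data. Because a tract GEO_ID starts with a summary-level code followed by the two-digit state FIPS code and the three-digit county FIPS code, all tracts in a county share the same prefix.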

Subset by begins with
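The begins-with approach works because the tract GEO_ID is a structured code. The sketch below assumes the FactFinder-style format used in the Logan County example later in this tutorial: the prefix 1400000US, then a two-digit state FIPS code, a three-digit county FIPS code, and a six-digit tract code. The helper names and tract IDs are hypothetical.

```python
def split_tract_geo_id(geo_id: str) -> dict:
    """Split a FactFinder-style tract GEO_ID such as '1400000US17107950100'."""
    prefix, _, fips = geo_id.partition("US")
    if prefix != "1400000" or len(fips) != 11:
        raise ValueError(f"not a tract GEO_ID: {geo_id!r}")
    return {"state": fips[:2], "county": fips[2:5], "tract": fips[5:]}

def in_county(geo_id: str, state_county_fips: str) -> bool:
    """Python equivalent of a begins-with clause like GEO_ID LIKE '1400000US17107%'."""
    return geo_id.startswith("1400000US" + state_county_fips)

# Hypothetical tract IDs; only the first two fall in Logan County, IL (17107).
ids = ["1400000US17107950100", "1400000US17107950200", "1400000US17167000100"]
print(split_tract_geo_id(ids[0]))  # {'state': '17', 'county': '107', 'tract': '950100'}
print([g for g in ids if in_county(g, "17107")])
```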

Multiple Conditions Combined with OR

You can add additional clauses connected with the OR operator to match multiple criteria.

For this example, we subset interstates and major roads that have an RTTYP (route type code) of I (interstate), U (US highway), or S (state highway) to create a new feature class of highways (County_Highways).

Subset by combining multiple criteria with OR
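Each OR clause widens the selection. In standard SQL, a chain of OR comparisons on the same field can also be written with IN; ArcGIS Pro SQL expressions generally accept IN as well, though it is worth checking the SQL reference for your data source. This SQLite sketch, using hypothetical road records, confirms that the two forms select the same rows.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical road records with TIGER-style route type codes.
con.execute("CREATE TABLE County_Roads (FULLNAME TEXT, RTTYP TEXT)")
con.executemany(
    "INSERT INTO County_Roads VALUES (?, ?)",
    [("I- 55", "I"), ("US Hwy 136", "U"), ("State Rte 121", "S"),
     ("Main St", "M"), ("Co Rd 10", "C")],
)

or_clause = "(RTTYP = 'I') OR (RTTYP = 'U') OR (RTTYP = 'S')"
in_clause = "RTTYP IN ('I', 'U', 'S')"

def select(where):
    """Return road names matching a WHERE clause, sorted for stable comparison."""
    sql = f"SELECT FULLNAME FROM County_Roads WHERE {where} ORDER BY FULLNAME"
    return [r[0] for r in con.execute(sql)]

print(select(or_clause))                      # ['I- 55', 'State Rte 121', 'US Hwy 136']
print(select(or_clause) == select(in_clause))  # True
```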

Subsetting Based on Proximity

The distinguishing characteristic of geospatial data is the where component, and ArcGIS Pro can be used to subset data based on location relative to features in other layers.

Select by Location

The Select Layer by Location tool selects features based on their proximity to features in another layer. Because this tool only selects features, you also need to run the Export Features tool to copy those features to a new feature class.

For this example, we create a feature class of census tracts (Interstate_Tracts) within one mile of interstates, which could be useful for analyzing the health effects of exposure to high levels of auto and truck pollution.

Subsetting based on proximity
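Conceptually, the Within a distance relationship keeps features that come within the search distance of any selecting feature. The following plain-Python sketch simplifies this by treating each tract as a single point (a hypothetical centroid) and the interstate as a polyline, with all coordinates in a projected coordinate system measured in meters. The real tool compares full geometries, so a centroid test like this one can miss large tracts whose edges come close to the road.

```python
import math

MILE_M = 1609.344  # one international mile in meters

def point_segment_distance(p, a, b):
    """Planar distance from point p to segment a-b (all (x, y) tuples in meters)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0:  # degenerate segment: just a point
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment's line and clamp the parameter to [0, 1].
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len2))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def point_polyline_distance(p, line):
    """Shortest distance from p to any segment of a polyline."""
    return min(point_segment_distance(p, a, b) for a, b in zip(line, line[1:]))

# Hypothetical interstate polyline and tract centroids (projected meters).
interstate = [(0, 0), (5000, 0), (10000, 3000)]
centroids = {"tract-a": (2000, 800), "tract-b": (4000, 2500), "tract-c": (9000, 3500)}

near = sorted(name for name, pt in centroids.items()
              if point_polyline_distance(pt, interstate) <= MILE_M)
print(near)  # ['tract-a', 'tract-c']
```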

Clipping

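Clipping subsets the portions of features that fall within the boundaries of one or more polygons in another layer, cutting any features that cross those boundaries. (The Erase tool does the opposite, keeping the portions that fall outside.)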

In this example, we use the Clip tool to subset roads (County_Roads) within Mount Pulaski, IL (City_Boundary) into a new feature class (City_Roads).

Subsetting based on proximity
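Unlike Select Layer by Location, which keeps or drops whole features, Clip changes the geometry of features that cross the clip boundary. The Clip tool works with arbitrary polygons; to keep this sketch short, it clips a single line segment to an axis-aligned rectangle using the Liang-Barsky algorithm, a textbook technique swapped in for illustration rather than the tool's own implementation.

```python
def clip_segment_to_box(p0, p1, xmin, ymin, xmax, ymax):
    """Liang-Barsky clipping of segment p0-p1 to an axis-aligned box.

    Returns the clipped (start, end) pair, or None if the segment lies outside.
    """
    (x0, y0), (x1, y1) = p0, p1
    dx, dy = x1 - x0, y1 - y0
    t0, t1 = 0.0, 1.0
    # Each (p, q) pair tests one box edge: left, right, bottom, top.
    for p, q in ((-dx, x0 - xmin), (dx, xmax - x0), (-dy, y0 - ymin), (dy, ymax - y0)):
        if p == 0:
            if q < 0:      # parallel to this edge and entirely outside it
                return None
            continue
        t = q / p
        if p < 0:          # the segment enters the box across this edge
            if t > t1:
                return None
            t0 = max(t0, t)
        else:              # the segment leaves the box across this edge
            if t < t0:
                return None
            t1 = min(t1, t)
    return (x0 + t0 * dx, y0 + t0 * dy), (x0 + t1 * dx, y0 + t1 * dy)

# A road crossing the box is cut at both edges; one entirely outside is dropped.
print(clip_segment_to_box((-5, 2), (15, 2), 0, 0, 10, 10))    # ((0.0, 2.0), (10.0, 2.0))
print(clip_segment_to_box((12, 12), (20, 20), 0, 0, 10, 10))  # None
```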

Subsets by Drawing Selections

In some cases, the subsetting criteria may be primarily visual or too amorphous to encode in a formal filter or location expression.

In such cases, you can subset data by manually drawing selections and then using the Export Features tool to export those selected features into a new feature class (Chicago_Metro_Tracts).

This technique is inherently arbitrary and difficult to reproduce or automate, and should be used only when there is absolutely no reasonable way to create a filter expression.

For this example, we display selected tracts around Springfield, IL.

Manual selection of features exported into a new feature class

SQL Expressions

Enterprise GIS commonly stores geospatial data in the same types of enterprise relational databases that are ubiquitous in information technology.

Structured query language (SQL) is a language used to interact with relational databases. ArcGIS Pro generally hides SQL behind the user interface, so most users never need to know it. However, developers and other users who work on the information technology side of GIS need some basic familiarity with SQL.

Filter and query expressions can be specified in ArcGIS Pro using the same syntax as SQL WHERE clauses. SQL can sometimes be easier to work with than dialog combo boxes when expressions incorporate multiple comparisons.

For example, this expression subsets features from the State_Places feature class that have a NAME of Mount Pulaski.

NAME = 'Mount Pulaski'

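Note that text values used for comparison must be enclosed in single quotation marks, while numeric values are written as-is. For example, this expression selects places with a land area (ALAND, in square meters) of at least 100 square kilometers: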

ALAND >= 100000000
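When expressions are built in scripts rather than typed into a dialog, the quoting rule is easy to get wrong, especially for names that contain an apostrophe. Standard SQL escapes a single quote inside a string literal by doubling it. The helpers below are hypothetical (not ArcPy functions) and apply that rule; a SQLite table is used only to confirm that the resulting clause parses and matches.

```python
import sqlite3

def sql_literal(value):
    """Format a Python value as a SQL literal (hypothetical helper)."""
    if isinstance(value, bool):
        raise TypeError("booleans have no portable SQL literal")
    if isinstance(value, (int, float)):
        return repr(value)  # numbers are written as-is
    if isinstance(value, str):
        return "'" + value.replace("'", "''") + "'"  # double embedded quotes
    raise TypeError(f"unsupported type: {type(value).__name__}")

def where_equals(field, value):
    """Build a simple equality WHERE clause (hypothetical helper)."""
    return f"{field} = {sql_literal(value)}"

print(where_equals("NAME", "Mount Pulaski"))  # NAME = 'Mount Pulaski'
print(where_equals("NAME", "O'Fallon"))       # NAME = 'O''Fallon'
print(f"ALAND >= {sql_literal(100000000)}")   # ALAND >= 100000000

# Check that the escaped clause is valid SQL and matches the intended row.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE State_Places (NAME TEXT)")
con.executemany("INSERT INTO State_Places VALUES (?)", [("O'Fallon",), ("Lincoln",)])
clause = where_equals("NAME", "O'Fallon")
matches = con.execute(f"SELECT COUNT(*) FROM State_Places WHERE {clause}").fetchone()[0]
print(matches)  # 1
```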

SQL can be used in both definition queries and tool filters by selecting the SQL switch in the dialog.

A definition query with SQL

Comparison Operators

Comparison operators such as = (equal), <> (not equal), <, <=, >, >=, and LIKE are used to specify selection criteria.

Logical Operators

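Logical operators such as AND, OR, and NOT are used to combine comparisons. AND is evaluated before OR, so use parentheses to make the intended grouping explicit whenever an expression mixes them.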
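Because AND binds more tightly than OR, leaving out parentheses can silently change which features are selected. This SQLite sketch, with made-up tract records, runs the same comparisons with and without parentheses and also shows NOT.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical tracts with made-up incomes.
con.execute("CREATE TABLE US_Tracts (GEO_ID TEXT, ST TEXT, Median_Household_Income INTEGER)")
con.executemany("INSERT INTO US_Tracts VALUES (?, ?, ?)", [
    ("tract-a", "IL", 52000),
    ("tract-b", "IN", 52000),
    ("tract-c", "MO", 90000),
    ("tract-d", "IL", 90000),
])

def ids(where):
    """Return tract IDs matching a WHERE clause, sorted."""
    sql = f"SELECT GEO_ID FROM US_Tracts WHERE {where} ORDER BY GEO_ID"
    return [r[0] for r in con.execute(sql)]

# Read as: ST = 'IL' OR (ST = 'MO' AND income < 65886), so high-income IL tracts slip in.
loose = ids("ST = 'IL' OR ST = 'MO' AND Median_Household_Income < 65886")
# Parentheses apply the income test to both states.
grouped = ids("(ST = 'IL' OR ST = 'MO') AND Median_Household_Income < 65886")
# NOT inverts a comparison.
negated = ids("NOT (ST = 'IL')")

print(loose)    # ['tract-a', 'tract-d']
print(grouped)  # ['tract-a']
print(negated)  # ['tract-b', 'tract-c']
```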

Examples

These examples are based on the dialog subsets demonstrated above.

Census tracts with a median household income below the 2015-2019 Illinois median $65,886

Median_Household_Income < 65886

Census tracts in Illinois with median household income less than $65,886

(ST = 'IL') AND (Median_Household_Income < 65886)

Counties across the US that contain the word "Pulaski" in their name

NAME LIKE '%Pulaski%'

Census tracts in Logan County, IL (FIPS code 17107)

FactFinder_GEO_ID LIKE '1400000US17107%'

Interstates and major roads that have an RTTYP (route type code) of I (interstate), U (US highway), or S (state highway)

(RTTYP = 'I') OR (RTTYP = 'U') OR (RTTYP = 'S')