Geospatial Data from the US Census Bureau
The US Census Bureau (USCB) is the US federal government agency responsible for collecting data about people and the economy in the United States. The Census Bureau has its roots in Article I, section 2 of the US Constitution, which mandates an enumeration of the entire US population every ten years (the decennial census) in order to set the number of members from each state in the House of Representatives and Electoral College (USCB 2017). The Census Act of 1840 established a central office for conducting the decennial census, and that office became the Census Bureau under the Department of Commerce and Labor in 1903 (USCB 2021).
This tutorial covers basic techniques for acquiring US Census Bureau data for use in ArcGIS Pro, the ArcGIS Online Map Viewer, Python, and R.
- The American Community Survey
- TIGER Shapefiles
- Using Precompiled USCB Data
- Using Living Atlas USCB Data
- Using USCB Table Data
- Using USCB API Data
The American Community Survey
Among the Census Bureau's many programs is the American Community Survey (ACS), an ongoing survey that provides information on an annual basis about people in the United States beyond the basic information collected in the decennial census. The ACS is commonly used by a wide variety of researchers when they need information about the general public.
Unlike the constitutionally-mandated decennial census, which is only taken every ten years, the ACS continuously surveys people in America's communities so that the ACS data can be more detailed and current than the decennial census. However, because the ACS is a survey rather than a complete count like the decennial census, there is uncertainty about how accurately the sampling represents the facts on the ground, and that uncertainty is expressed in a statistical margin of error (MOE) on most ACS values (US Census Bureau 2018).
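As a worked illustration of how a margin of error can be read, the sketch below converts an estimate and its MOE into an interval and a standard error. It assumes the MOE corresponds to a 90 percent confidence level (z = 1.645); the function name and the sample values are hypothetical and are not taken from any ACS table.

```python
# Sketch: interpret an ACS estimate and its margin of error (MOE).
# Assumption: the MOE corresponds to a 90 percent confidence level,
# so the standard error is MOE / 1.645. Sample values are made up.

Z_90 = 1.645  # z-score for a 90 percent confidence level


def moe_summary(estimate, moe):
    """Return the interval bounds, standard error, and coefficient of variation."""
    standard_error = moe / Z_90
    return {
        "lower": estimate - moe,
        "upper": estimate + moe,
        "standard_error": standard_error,
        "cv_percent": 100 * standard_error / estimate,
    }


# Hypothetical estimate of 60,000 with a margin of error of 3,290
summary = moe_summary(60000, 3290)
for name, value in summary.items():
    print(f"{name}: {value:,.1f}")
```

A wide interval relative to the estimate is a signal to consider the five-year data or a larger geography.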
Spatial Aggregation
In order to preserve the confidentiality of respondents (and the associated willingness of people to respond to highly-personal questions), the US Census Bureau generally only releases data that has been aggregated (combined) into areas at various geographic scales:
- National data represents the United States as a whole.
- States are the fifty governmental jurisdictions into which the United States is divided. Non-state territories like Guam and Puerto Rico are also generally included in state-level data.
- Counties are the largest territorial divisions for local governments within the fifty states of the United States.
- Core-Based Statistical Areas (CBSA) represent metropolitan areas based around central (core) cities that are connected by social and economic ties. CBSAs can cross county and state lines.
- Places are incorporated cities and unincorporated communities. Each place is contained within one state, although place boundaries can cross county lines.
- ZIP Code Tabulation Areas (ZCTA) are simplifications of the ZIP code delivery service areas used by the US Postal Service. ZCTAs often cross social boundaries (like neighborhoods), so they are often problematic for social research.
- Census tracts are organizational boundaries used for USCB data collection that are drawn to roughly align with neighborhood borders. Ideally, each tract contains 4,000 residents, although the number of residents can vary depending on area (USCB 2019).
- Block groups are subdivisions of some census tracts. Block group data is unreliable for social analysis due to random noise added for disclosure avoidance.
Temporal Aggregation
Although ACS data is captured through surveys that are administered on an ongoing basis, it is aggregated into time-periods to improve geographic coverage and reduce margin of error.
ACS data is released annually in aggregation by two different time-periods.
One-Year Interval | Five-Year Interval |
---|---|
Useful when you need the most current data about a characteristic that changes frequently | Useful when you need the most accurate data about a characteristic that stays fairly stable over time |
Useful for areas that are changing rapidly | Useful for areas that are well-established |
Often has gaps in sparsely-populated rural areas | Data is more complete |
Based on fewer surveys, so it has wider margins of error | Based on more surveys, so it has lower margins of error |
FIPS Codes and GEOIDs
Geographies in USCB data are uniquely identified with FIPS (Federal Information Processing Standards) codes. FIPS codes for different geographies can be found with Google.
FIPS codes build left to right from the more general to the more specific.
- State codes are two digits. For example, the FIPS code for Illinois is 17.
- County codes add three digits to the right of the state code. For example, the FIPS code for Cook County, IL is 17031.
- Census tract codes add an additional six digits to the county code. For example, the FIPS code for the census tract containing the Willis Tower (Sears Tower) in downtown Chicago, IL is 17031839100.
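The left-to-right structure described above can be sketched in a few lines of Python. This is only an illustration: the function name is hypothetical, and it handles just the three lengths listed above (state, county, and tract). Codes are kept as strings so that leading zeroes (such as 01 for Alabama) are not lost.

```python
# Sketch: split a FIPS code into its nested components.
# The widths (2 state + 3 county + 6 tract digits) follow the list above;
# the function name is hypothetical.


def split_fips(code):
    """Break a FIPS code string into state, county, and tract parts."""
    code = str(code)
    if len(code) not in (2, 5, 11) or not code.isdigit():
        raise ValueError(f"Unexpected FIPS code: {code!r}")
    parts = {"state": code[:2]}
    if len(code) >= 5:
        parts["county"] = code[:5]  # state digits + county digits
    if len(code) == 11:
        parts["tract"] = code  # state + county + tract digits
    return parts


print(split_fips("17"))
print(split_fips("17031"))
print(split_fips("17031839100"))
```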
FIPS codes are commonly used as GEOID values in US Census Bureau data.
US Census Bureau data sets usually also include a fully-qualified GEOID in a column named GEO_ID (in data tables), GEOIDFQ (in TIGER shapefiles), or AFFGEOID (in older data). Fully-qualified GEOIDs clearly express what type of area is being represented and also avoid the issue with leading zeroes being removed from FIPS codes when software represents the GEOIDs as numbers in spreadsheets or feature classes.
- The first three digits specify the type of area: 040 for states, 050 for counties, 140 for census tracts, and 310 for core-based statistical areas (metropolitan areas).
- The next four digits represent the geographic variant and component, which are usually 0000 for states, counties, and tracts.
- The letters US
- The FIPS code
For example:
- 0400000US17 is the GEOID for Illinois.
- 0500000US17031 is the GEOID for Cook County, Illinois.
- 1400000US17031839100 is the GEOID for the census tract containing the Willis Tower.
- 310M500US16980 is the GEOID for the Chicago-Naperville-Elgin, IL-IN-WI CBSA (Chicago metropolitan area).
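The layout of a fully-qualified GEOID can be sketched as a pair of helper functions. The function names and the lookup dictionary are hypothetical, and the sketch assumes the four-part layout described above (three-character area type, four-character variant and component, the letters US, then the FIPS code).

```python
# Sketch: build and parse fully-qualified GEOIDs.
# Assumed layout: 3-character area type, 4-character variant/component,
# the letters "US", then the FIPS code. Names here are hypothetical.

AREA_TYPES = {
    "040": "state",
    "050": "county",
    "140": "census tract",
    "310": "core-based statistical area",
}


def make_geoid(area_type, fips, variant="0000"):
    """Assemble a fully-qualified GEOID from its parts."""
    return f"{area_type}{variant}US{fips}"


def parse_geoid(geoid):
    """Split a fully-qualified GEOID into its parts."""
    if len(geoid) < 10 or geoid[7:9] != "US":
        raise ValueError(f"Unexpected GEOID: {geoid!r}")
    area_type = geoid[:3]
    return {
        "area_type": area_type,
        "description": AREA_TYPES.get(area_type, "unknown"),
        "variant": geoid[3:7],
        "fips": geoid[9:],
    }


print(make_geoid("050", "17031"))
print(parse_geoid("310M500US16980"))
```

Because the result is text rather than a number, the FIPS portion keeps its leading zeroes when the value passes through a spreadsheet.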
Community Profile Pages
If you are looking for quick information on a specific state, county, city or community, the USCB provides profile pages in data.census.gov that include basic demographic information about population, income, education, etc.
You can access a profile page by typing the name of the area of interest into the search bar and waiting for it to autocomplete. If there is a profile page, a link to that page will appear for you to select.
TIGER Shapefiles
The Topologically Integrated Geographic Encoding and Referencing (TIGER) database is a collection of geospatial data maintained by the US Census Bureau.
- Cartographic boundary files are simplified representations of selected geographic areas from the TIGER database that are specifically designed for small scale thematic mapping.
- The core TIGER/Line files are also available, and include other geographic features useful for mapping like roads, railroads, and water features.
Shapefiles utilize a file format developed by ESRI in the 1990s that is actually a collection of files that each contain separate information, such as the coordinates, attributes, projection, and metadata.
- Because all these files need to be kept together, shapefiles are commonly distributed in .zip archives that collect and compress the separate files into a single, compact file with .zip at the end of the name.
- Despite the limitations of the shapefile format (most notably constraining field names to ten characters), the format is supported by a wide variety of software, and is still commonly used to store and distribute geospatial data.
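To make the "collection of files" idea concrete, the sketch below lists the members of a zipped shapefile and labels each by its extension. The archive is a stand-in built in memory with empty placeholder files (the name example_counties is hypothetical), so the sketch runs without a download; with a real download you would pass the path of the .zip file instead.

```python
import io
import zipfile

# Roles of common shapefile component extensions.
COMPONENTS = {
    ".shp": "feature geometry (coordinates)",
    ".shx": "index into the geometry",
    ".dbf": "attribute table",
    ".prj": "coordinate system (projection)",
    ".xml": "metadata",
}


def describe_archive(zip_source):
    """Map each file in a zipped shapefile to the role of its extension."""
    listing = {}
    with zipfile.ZipFile(zip_source) as archive:
        for name in archive.namelist():
            extension = "." + name.rsplit(".", 1)[-1].lower()
            listing[name] = COMPONENTS.get(extension, "other")
    return listing


# Build a stand-in archive in memory (empty placeholder files only).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    for extension in (".shp", ".shx", ".dbf", ".prj"):
        archive.writestr("example_counties" + extension, b"")

listing = describe_archive(buffer)
for name, role in listing.items():
    print(name, "->", role)
```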
ArcGIS Pro
This video demonstrates how to download, unzip, and import a shapefile into an ArcGIS Pro project geodatabase. This example uses county polygon boundaries compatible with the ACS table data described below.
- Go to the USCB's TIGER Cartographic Boundary Files page and download the appropriate type of geography. The lowest or medium resolution files are fine unless you are doing high accuracy mapping.
- Using Windows Explorer, open the file, copy the contents, and paste them into the Downloads directory.
- Bring the data into the project geodatabase with the Export Features tool (County_Economics).
A geodatabase is a special type of database containing "a collection of geographic datasets of various types" (ESRI 2020).
A feature class is a geographic dataset within a geodatabase that contains features of the same geometric type (points, lines, polygons) and a common set of attributes (ESRI 2024).
Each individual geographic entity in a feature class is called a feature. For example, in a feature class for roads in a county, each road segment would be a feature (ESRI 2020).
A project geodatabase is the default geodatabase used for storing feature classes that are imported or created as part of an ArcGIS Pro project.
- The project geodatabase is a file geodatabase kept in the project folder (Documents\ArcGIS\Projects\project_name) under the name project_name.gdb
- The project geodatabase does not store data brought in to maps with Add Data from feature services or from geospatial data files (like shapefiles) unless you explicitly copy that data into the project file geodatabase using a tool like Export Features or Copy Features.
- The contents of the project database can be viewed in the Catalog Pane.
- The system files that make up the project geodatabase can be viewed with the Windows File Explorer. These files have cryptic names assigned by the software and should not be changed outside of the software.
ArcGIS Online
Zipped shapefiles can be read directly into the ArcGIS Online Map Viewer to create new feature classes for mapping and analysis.
Python
GeoPandas is a Python package for working with geospatial data.
Matplotlib is a Python package for plotting graphs.
- Find the URL to the appropriate TIGER cartographic boundary zipped shapefile from the link on the page shown above.
- Read the cartographic boundary file directly from the USCB website into a GeoDataFrame object using the read_file() function.
- The to_crs() method is used to reproject the data to the North America Lambert Conformal Conic projection suitable for North America (ESRI 102009).
- The GeoDataFrame plot() method plots the polygons.
- The Matplotlib Axes set_axis_off() method turns off the axis scale that is unnecessary with a projected map.
- The pyplot show() function displays the plotted map.
import geopandas
import matplotlib.pyplot as plt

military = geopandas.read_file("https://www2.census.gov/geo/tiger/TIGER2023/MIL/tl_2023_us_mil.zip")
states = geopandas.read_file("https://www2.census.gov/geo/tiger/GENZ2023/shp/cb_2023_us_state_5m.zip")

states = states.to_crs("ESRI:102009")
military = military.to_crs("ESRI:102009")

axis = states.plot(facecolor='none', edgecolor='gray')
military.plot(facecolor='red', ax=axis)
axis.set_axis_off()
plt.show()
R
Functions from the sf (simple features) library are used to work with vector geospatial data.
- Load the shapefile polygons into a simple features data.frame using st_read().
- Although st_read() can read from URLs, it cannot process zipped shapefiles, so we need to use download.file() to download the files first.
- Adding /vsizip/ to the start of a file name will filter the zipped shapefiles through a virtual file system that exposes the shapefile contents to st_read().
- Reproject the polygons into a cartographically appropriate projection with st_transform(). For this data we use the North America Lambert Conformal Conic projection suitable for North America (ESRI 102009).
- To map only the contiguous 48 states, we exclude Alaska, Hawaii, and territories using their FIPS codes.
library(sf)

download.file('https://www2.census.gov/geo/tiger/TIGER2023/MIL/tl_2023_us_mil.zip', 'temp.zip')
military = st_read('/vsizip/temp.zip')

download.file('https://www2.census.gov/geo/tiger/GENZ2023/shp/cb_2023_us_state_5m.zip', 'temp.zip')
states = st_read('/vsizip/temp.zip')

military = st_transform(military, "ESRI:102009")
states = st_transform(states, "ESRI:102009")

states = states[!(states$STATEFP %in% c('02', '15', '60', '66','69', '72', '78')),]

plot(states$geometry, col=NA, border='gray')
plot(military$geometry, col='red', add=T)
Using Precompiled USCB Data
Although data.census.gov is the definitive source for US Census Bureau data, the amount of available data is vast, and that data is made available in formats that require additional processing to use in GIS. Accordingly, subsets of that data are sometimes made available within organizations in pre-processed forms to facilitate easier use.
The following precompiled layers are available on this website as GeoJSON and as feature services from the University of Illinois ArcGIS Online organization:
- 2015-2019 ACS states
- 2015-2019 ACS core-based statistical areas (metropolitan areas)
- 2015-2019 ACS counties
- 2015-2019 ACS census tracts
ArcGIS Pro
The video below demonstrates how to download data from the Minn 2015-2019 ACS Tracts feature service available from the University of Illinois ArcGIS Online organization.
To avoid the speed, reliability, and feature count limitations of a large feature service like this, it may be advisable to use the Export Features tool to copy the data from the feature service into the project geodatabase.
Filter by State
If you need a subset of the features, you can set a Filter in the Export Features tool.
This precompiled data has an ST field with the USPS state abbreviation that can be used to subset tracts in individual states.
Definition Query
Alternatively, if you have already imported the full data set and need a subset of the features, you can use a definition query to limit display and analysis to a specific state.
Filter Tracts by County GEO_ID
Filtering by county requires a definition query based on the GEO_ID, which is described above.
The GEO_ID for tracts begins with 1400000US, followed by the five digit county FIPS code, followed by the tract ID.
For this example, the FIPS code for Cook County, IL is 17031, so tract GEO_IDs in Cook County begin with 1400000US17031.
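The prefix logic above can be sketched as a SQL-style clause of the kind a definition query uses, along with the same test applied to a plain list of GEO_ID strings. The function names are hypothetical, and the clause assumes that the LIKE operator with a % wildcard is accepted where the query is entered.

```python
# Sketch: build a SQL-style clause for a definition query and apply the
# same prefix test to a list of GEO_ID strings. Function names are hypothetical.

TRACT_PREFIX = "1400000US"  # area type 140, variant/component 0000, "US"


def tract_clause_for_county(county_fips, field="GEO_ID"):
    """Return a LIKE clause matching every tract GEO_ID in one county."""
    county_fips = str(county_fips).zfill(5)  # restore any dropped leading zero
    return f"{field} LIKE '{TRACT_PREFIX}{county_fips}%'"


def tracts_in_county(geo_ids, county_fips):
    """Keep only the tract GEO_IDs that start with the county prefix."""
    prefix = TRACT_PREFIX + str(county_fips).zfill(5)
    return [geo_id for geo_id in geo_ids if geo_id.startswith(prefix)]


print(tract_clause_for_county("17031"))

sample_ids = [
    "1400000US17031839100",  # Cook County, IL tract from the example above
    "1400000US17019000100",  # hypothetical tract ID in another county
]
print(tracts_in_county(sample_ids, "17031"))
```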
MMUSCB
Customizable GeoJSON files of ACS data at the county or state tract level can be downloaded using MMUSCB and imported using the JSON To Features tool.
ArcGIS Online
The video below demonstrates how to add the Minn 2015-2019 ACS Counties feature service available from the University of Illinois ArcGIS Online organization.
- Search for the layer in ArcGIS Online.
- If needed, resymbolize the layer and select the variable you wish to display. For this example we use median monthly rent.
- If needed, change the color ramp and/or adjust the scale to accentuate the differences.
- Change the blending mode to Multiply so you can see the base map as geographic context for your data.
- Save the map under a meaningful name (Minn County Income).
- Share the map.
- Copy the URL to get a link.
Python
To load precompiled ACS data into a GeoDataFrame in Python:
- GeoPandas is a Python package for working with geospatial data.
- Matplotlib is a Python package for plotting graphs.
- Read the geospatial data from the file into a GeoDataFrame object using the read_file() function.
- The to_crs() method is used to reproject the data to the North America Lambert Conformal Conic projection suitable for North America (ESRI 102009).
- The GeoDataFrame plot() method can be used with the name of an attribute to visualize that variable as a choropleth map.
- The Matplotlib Axes set_axis_off() method turns off the axis scale around the map, which is unnecessary with a projected map.
- The pyplot show() function displays the plotted map.
import geopandas
import matplotlib.pyplot as plt

counties = geopandas.read_file("https://michaelminn.net/tutorials/data/2015-2019-acs-counties.geojson")
counties = counties.to_crs("ESRI:102009")

axis = counties.plot("Median Household Income", cmap="coolwarm", legend=True, scheme="quantiles")
axis.set_axis_off()
plt.show()
We can filter the counties to show only the continental US to more effectively use the mapped area. We can also overlay a map of state outlines over the counties for geographic context.
counties = counties[~counties["ST"].isin(['AK', 'HI', 'PR'])]

states = geopandas.read_file("https://michaelminn.net/tutorials/data/2015-2019-acs-states.geojson")
states = states[~states["ST"].isin(['AK', 'HI', 'PR'])]
states = states.to_crs(counties.crs)

axis = counties.plot("Median Household Income", scheme="naturalbreaks", cmap="coolwarm_r", edgecolor="none", legend=True, legend_kwds={"bbox_to_anchor":(0.2, 0.4)})
states.plot(facecolor="none", edgecolor="#808080", ax=axis)
axis.set_axis_off()
plt.show()
R
Functions from the sf (simple features) library are used to work with vector geospatial data.
- Load the GeoJSON polygons into a simple features data.frame using st_read().
- Reproject the polygons into a cartographically appropriate projection with st_transform(). For this data we use the North America Lambert Conformal Conic projection suitable for North America (ESRI 102009).
- Create an appropriate diverging colorRampPalette.
- plot() a choropleth colored by the desired variable.
library(sf)

counties = st_read("https://michaelminn.net/tutorials/data/2015-2019-acs-counties.geojson")
counties = st_transform(counties, "ESRI:102009")

redblue = colorRampPalette(c("red", "lightgray", "navy"))
plot(counties["Median.Household.Income"], breaks="quantile", pal=redblue, lwd=0.1)
We can filter the counties to show only the continental US to more effectively use the mapped area. We can also overlay a map of state outlines over the counties for geographic context.
states = st_read("https://michaelminn.net/tutorials/data/2015-2019-acs-states.geojson")
states = states[!(states$ST %in% c('AK', 'HI', 'PR')),]
states = st_transform(states, st_crs(counties))

counties = counties[!(counties$ST %in% c('AK', 'HI', 'PR')),]

plot(counties["Median.Household.Income"], breaks="quantile", pal=redblue, border=NA, reset=F)
plot(states$geometry, col=NA, border="#404040", add=T)
ESRI's Living Atlas
If you are using ArcGIS Pro or ArcGIS Online and you are not too particular about the symbology of your map, ESRI's Living Atlas of the World contains a variety of layers of demographic data. Some of this data is from the American Community Survey, although ESRI also makes data available that they collect from other sources.
ArcGIS Pro
The video below shows how to add a Living Atlas layer of median age to a map in ArcGIS Pro. Note that this layer is scale-dependent and changes the types of areas being displayed (states, counties, census tracts) depending on how closely you are zoomed in to the map.
Note that with aggregated income numbers, the median is often used instead of the mean (average) because income is usually not evenly distributed across a population, and a handful of wealthy people can distort averages so they are not representative of the typical economic well-being of people living in a particular area (Yates 2020).
If you want to use the Living Atlas data for analysis, use the Export Features tool to copy the data from the Living Atlas into a new feature class in the project database.
ArcGIS Online
The video below shows how to add a Living Atlas layer of median household income to a map in ArcGIS Online. Note that this layer is scale-dependent and changes the types of areas being displayed (states, counties, census tracts) depending on how closely you are zoomed in to the map.
- Create a new map in ArcGIS Online.
- Select Add and Living Atlas Layers.
- Search for the data by name, in this case median household income.
- Zoom in on the area you want to display. This particular layer is a scale-dependent layer that changes the types of areas displayed depending on how closely you are zoomed in to an area.
- Change the Blending to Multiply so you can see the base map and labels as geographic context for your data.
- Save the map under a meaningful name, share it, and copy the URL for a link (Minn_2019_County_Economics).
You can get metadata on the source and year of the information in a layer by opening the Properties panel and clicking on the link below Information.
Using USCB Table Data
Data from a variety of different programs is available on data.census.gov for download as table data.
These ACS demographic profile (DP) tables contain useful groups of data:
- DP02 Selected Social Characteristics (marriage, fertility, education, ancestry)
- DP03 Selected Economic Characteristics (employment, sectors, income)
- DP04 Selected Housing Characteristics (housing size, age, and costs)
- DP05 Demographic and Housing Estimates (race, ethnicity, and age groups)
If you need variables that are unavailable from precompiled sources, you can download separate table and polygon data from the USCB and join the data together in ArcGIS Pro.
A join is a database operation where two tables are connected based on common key values. In GIS, an attribute join is used to connect data from external tables (such as in a CSV file) to geospatial locations defined in a feature class that comes from a shapefile or file geodatabase.
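The idea of an attribute join can be sketched with plain Python dictionaries standing in for the polygon attributes and the external table. The function name and the value field are hypothetical, and the numbers are placeholders rather than census figures; GIS software performs the same key matching against real feature classes.

```python
# Sketch: an attribute join on a common key, using plain dictionaries.
# The "value" field and its numbers are placeholders, not census data.

features = [
    {"AFFGEOID": "0500000US17031", "NAME": "Cook"},
    {"AFFGEOID": "0500000US17019", "NAME": "Champaign"},
]

table = [
    {"GEO_ID": "0500000US17031", "value": 1.0},
]


def attribute_join(features, table, feature_key, table_key):
    """Copy table fields onto each feature whose key has a matching row."""
    lookup = {row[table_key]: row for row in table}
    joined = []
    for feature in features:
        match = lookup.get(feature[feature_key], {})
        extra = {name: value for name, value in match.items() if name != table_key}
        joined.append({**feature, **extra})  # unmatched features gain no fields
    return joined


joined = attribute_join(features, table, "AFFGEOID", "GEO_ID")
for feature in joined:
    print(feature)
```

Every feature is kept whether or not it has a match, which is why it is worth checking for features that gained no new fields after a join.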
Downloading Table Data
The video below demonstrates downloading selected variables from the DP03 table with county-level data from data.census.gov.
- From the data.census.gov home page, search for the desired table (DP03). The default table shows values for the entire USA.
- Click Geos to select the type of geographic area. For this example, we will use County and All counties within the United States and Puerto Rico.
- Click Download Table.
- Select the appropriate Table Vintages. For this example, we use the 2019 five-year estimates for maximum accuracy in the pre-COVID world.
- Download the zipped CSV.
- In the Windows File Explorer, open the .zip archive, and open the file with a name containing the word "data" in it.
- Remove all unnecessary columns, and rename the columns to meaningful names. For this example, these are the variables we keep.
- GEO_ID (needed to join to the area polygons later)
- Mean Minutes to Work (DP03_0025E)
- Median Household Income (DP03_0062E)
- Percent Workforce Participation (DP03_0002PE)
- Percent Professional (DP03_0041PE)
- Remove the 2nd row with the descriptive column information and leave just the top header row and data rows.
- Look through the rows and remove any rows with non-numeric data.
- Save the spreadsheet as a CSV file under a meaningful name (County_Economics.csv).
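The cleanup steps above can also be scripted rather than done by hand in a spreadsheet. The following is a minimal sketch using only the Python standard library. The sample text stands in for the downloaded file, and every value in it (including the 99999 county) is a placeholder rather than real census data; the function names are hypothetical.

```python
import csv
import io

# Columns to keep, mapped to the meaningful names listed above.
KEEP = {
    "GEO_ID": "GEO_ID",
    "DP03_0025E": "Mean Minutes to Work",
    "DP03_0062E": "Median Household Income",
    "DP03_0002PE": "Percent Workforce Participation",
    "DP03_0041PE": "Percent Professional",
}

# Stand-in for the downloaded file; every value below is a placeholder.
SAMPLE = (
    "GEO_ID,NAME,DP03_0025E,DP03_0062E,DP03_0002PE,DP03_0041PE\n"
    "Geography,Geographic Area Name,label,label,label,label\n"
    '0500000US17031,"Cook County, Illinois",30.0,60000,65.0,15.0\n'
    "0500000US99999,Placeholder County,N,-,(X),N\n"
)


def is_number(text):
    """Report whether a cell can be read as a number."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def clean_rows(source):
    """Keep selected columns, skip the label row, and drop non-numeric rows."""
    reader = csv.DictReader(source)
    next(reader)  # skip the second row of descriptive column labels
    cleaned = []
    for row in reader:
        kept = {new: row[old] for old, new in KEEP.items()}
        values = [value for name, value in kept.items() if name != "GEO_ID"]
        if all(is_number(value) for value in values):
            cleaned.append(kept)
    return cleaned


cleaned = clean_rows(io.StringIO(SAMPLE))

# Write the cleaned rows as CSV text (a file opened for writing works the same way).
output = io.StringIO()
writer = csv.DictWriter(output, fieldnames=list(KEEP.values()))
writer.writeheader()
writer.writerows(cleaned)
print(output.getvalue())
```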
Mapping Table Data in ArcGIS Pro
Although the tables downloaded from data.census.gov contain geographic area identifiers, they do not contain the polygon information needed to map that data as areas in software, so we need to join the table data to area polygons for mapping.
For the join key, we use the fully-qualified GEOID, which is stored in the GEO_ID field of the table downloaded from data.census.gov and in the AFFGEOID field of the cartographic boundary shapefile.
- Download table data from data.census.gov as demonstrated above.
- Import an area polygon shapefile as demonstrated above.
- Under Analysis, Tools, find the Join Field tool to copy the data from the CSV table into the polygon feature class.
- Input Features: Find the feature class in your project geodatabase (County_Economics).
- Input Join Field: The AFFGEOID field for the polygons
- Join Table: Find the cleaned CSV file of downloaded census data
- Join Table Field: GEO_ID
- Leave the other options as the default.
- Validate Join to confirm that records are being joined.
- Run to join the data.
- Symbolize the updated layer to verify the new fields have been joined into the data.
If using data that covers a broad area, you may need to isolate particular subsets of the data.
For this example, we use a definition query to subset only counties in Illinois.
- TIGER files have different identification fields depending on the geography. With county-level data like this, the STATEFP field contains the two-digit FIPS code that indicates the state.
- Examine the fields in a location in the area you want to select to find the appropriate code. In this case, the STATEFP for Illinois is 17.
- Right click on the layer, select Properties and select Definition Query.
- Add a New Definition Query.
- Select the identification field (STATEFP), is equal to, and the identification code (17).
- Click OK and only the locations matching the criteria should be visible.
Mapping Table Data in ArcGIS Online
For the join key, we use the fully-qualified GEOID, which is stored in the GEO_ID field of the table downloaded from data.census.gov and in the AFFGEOID field of the cartographic boundary shapefile.
- Download table data from data.census.gov as demonstrated above.
- Download an area polygon shapefile as demonstrated above.
- On your ArcGIS Online Content page, click New Item and upload the zipped shapefile as a hosted feature layer.
- Click New Item and upload the CSV as a hosted feature table.
- After the layer completes publishing, Open in Map Viewer.
- Click the Analysis icon on the right side panel and select Tools, Summarize Data, Join Features
- For the Target layer, select the layer created with the shapefile.
- For the Join layer, Browse layers and find the new table service.
- Under Attribute relationships, the target field will be AFFGEOID and the Join field will be GEO_ID.
- Under Result layer, give a meaningful Output name (Minn_2019_County_Economics).
- Estimate credits to make sure the operation will not use too many credits. County joins should use under 10 credits.
- Run. The join may take a while depending on how many features you are joining. If the join takes longer than five minutes, the tool may have completed without notifying you. You can return to the home page, return to the map, and then add the layer from your content.
- Update the polygon Symbology based on the newly joined data variable
- Save the map under a meaningful title (Minn_2019_County_Economics).
- Back in your Content page, remove the original shapefile and table.
Mapping Table Data in Python
A conventional technique for acquiring USCB data is to download table data from data.census.gov and join it to TIGER polygons. Downloading is preferred over API access if you wish to preserve a snapshot of the data at a particular time, or you want to avoid the unreliability and download times associated with APIs.
- Download and clean up the table data as described above.
- Pandas is a Python package for working with tabular data.
- GeoPandas is a Python package for working with geospatial data.
- Matplotlib is a Python package for plotting graphs.
- CSV files can be read into Pandas DataFrames using the read_csv() function.
- Find the URL to the appropriate TIGER cartographic boundary zipped shapefile from the link on the page shown above.
- Read the cartographic boundary file directly from the USCB website into a GeoDataFrame object using the read_file() function.
- Use the to_crs() method to reproject the data to the North America Lambert Conformal Conic projection suitable for North America (ESRI 102009).
- Join the table DataFrame with the county polygon GeoDataFrame using the Pandas merge() method, matching the AFFGEOID field in the polygons to the GEO_ID field in the table.
- To map only the contiguous 48 states, we exclude Alaska, Hawaii, and Puerto Rico using their FIPS codes.
import pandas
import geopandas
import matplotlib.pyplot as plt

county_data = pandas.read_csv("https://michaelminn.net/tutorials/gis-census/2023_County_Economics.csv")

counties = geopandas.read_file("https://www2.census.gov/geo/tiger/GENZ2019/shp/cb_2019_us_county_5m.zip")
counties = counties.to_crs("ESRI:102009")

counties = counties.merge(county_data, left_on="AFFGEOID", right_on="GEO_ID")
counties = counties[~counties["STATEFP"].isin(['02', '15', '72'])]

axis = counties.plot("Percent Workforce Participation", cmap="coolwarm_r", legend=True, scheme="quantiles")
axis.set_axis_off()
plt.show()
Mapping Table Data in R
A conventional technique for acquiring USCB data is to download table data from data.census.gov and join it to TIGER polygons. Downloading is preferred over API access if you wish to preserve a snapshot of the data at a particular time, or you want to avoid the unreliability and download times associated with APIs.
- Download and clean up the table data as described above.
- Read the table data into an R data.frame using the read.csv() function.
- Find the URL to the appropriate TIGER cartographic boundary zipped shapefile from the link on the page shown above.
- download.file() to a temporary file with the .shz extension. You must go through this intermediate step so the file has the .shz extension so that st_read() knows this is a zipped shapefile.
- Load the shapefile polygons into a simple features data.frame using st_read().
- Reproject the polygons into a cartographically appropriate projection with st_transform(). For this data we use the North America Lambert Conformal Conic projection suitable for North America (ESRI 102009).
- merge() the polygons and the table, matching the AFFGEOID field in the polygons to the GEO_ID field in the table.
- To map only the contiguous 48 states, we exclude Alaska, Hawaii, and Puerto Rico using their FIPS codes.
- Create an appropriate diverging colorRampPalette.
- plot() a choropleth colored by the desired variable.
library(sf)

county_data = read.csv("https://michaelminn.net/tutorials/gis-census/2023_County_Economics.csv")

download.file("https://www2.census.gov/geo/tiger/GENZ2019/shp/cb_2019_us_county_5m.zip", "temp.shz")
counties = st_read("temp.shz")
counties = st_transform(counties, "ESRI:102009")

counties = merge(counties, county_data, by.x="AFFGEOID", by.y="GEO_ID")
counties = counties[!(counties$STATEFP %in% c('02', '15', '72')),]

redblue = colorRampPalette(c("red", "lightgray", "navy"))
plot(counties["Percent.Workforce.Participation"], pal=redblue, breaks="quantile")
If you want to save a copy of your processed data for later use, you can use the st_write() function to create a variety of different types of geospatial data files.
st_write(counties, "2019_County_Economics.geojson")
The US Census Bureau API
The USCB makes much of its data available through application programming interfaces (APIs) that permit direct access to current versions of USCB data via services.
Although APIs have a learning curve, if you are using USCB data in R or Python, access through an API can be a much more flexible way of accessing USCB data than manually downloading and cleaning table data.
ACS Variables
ACS variables are referenced by cryptic variable names that indicate the source table and the number of the variable in that table, along with letters that indicate whether the variable represents estimated values or margins of error. Adding to the complexity, variables representing different types of summarization are stored in different reference files.
For this example, we focus on data from the 2015-2019 ACS five-year estimates that represent the final data release reflecting the pre-COVID world.
Typical variables from the profile variable list (DP02 - DP05) include:
- DP02_0016E: Average household size
- DP02_0065PE: Percent bachelor's degree
- DP02_0066PE: Percent graduate degree
- DP02_0070PE: Percent veterans
- DP02_0093PE: Percent foreign born
- DP03_0025E: Mean minutes to work
- DP03_0062E: Median household income
- DP04_0003PE: Percent vacant units
- DP04_0046PE: Percent homeowners
- DP04_0047PE: Percent renters
- DP04_0089E: Median home value
- DP04_0101E: Median monthly mortgage costs
- DP04_0134E: Median rent
- DP05_0018E: Median age
Typical variables from the subject variable list include:
- S0101_C01_001E: Total population
- S0101_C01_002E: Population under 5
- S0101_C01_022E: Population under 18 years
- S0101_C01_030E: Population 65 years or over
- S0801_C01_003E: Percent drive alone to work
- S0801_C01_009E: Percent transit to work
- S0801_C01_013E: Percent work at home
- S1301_C05_001E: Percent single mothers
- S1301_C04_001E: Annual births per 1K women
- S2504_C02_027E: Percent of homes with no vehicle
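The naming pattern in the variable lists above can be sketched as a regular expression. This is an assumption-laden illustration rather than an official grammar: the function name is hypothetical, the pattern is based only on names shaped like the ones listed above, and the suffix meanings assumed here are E for estimate, M for margin of error, PE for percent estimate, and PM for percent margin of error.

```python
import re

# Assumed meanings of the trailing letters in ACS variable names.
SUFFIXES = {
    "E": "estimate",
    "M": "margin of error",
    "PE": "percent estimate",
    "PM": "percent margin of error",
}

PATTERN = re.compile(
    r"(?P<table>[A-Z]+\d+)"      # source table, such as DP03 or S0101
    r"(?:_C(?P<column>\d{2}))?"  # column number, present in subject variables
    r"_(?P<number>\d{3,4})"      # number of the variable within the table
    r"(?P<suffix>PE|PM|E|M)"     # type of value
)


def describe_variable(name):
    """Split a variable name into its table, number, and value type."""
    match = PATTERN.fullmatch(name)
    if match is None:
        raise ValueError(f"Unrecognized variable name: {name!r}")
    parts = match.groupdict()
    parts["meaning"] = SUFFIXES[parts["suffix"]]
    return parts


for name in ("DP03_0062E", "DP03_0002PE", "S0101_C01_001E"):
    print(name, describe_variable(name))
```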
API Calls
USCB API calls are URLs with path components and parameters that return the requested data. The USCB's list of available APIs describes the different API options.
For this example, we will download state-level median household income (DP03_0062E) from the 2015-2019 ACS five-year estimates.
For this API query URL:
https://api.census.gov/data/2019/acs/acs5/profile?get=GEO_ID,DP03_0062E&for=state:*
- The path for profile variables is https://api.census.gov/data/2019/acs/acs5/profile
- The get= parameter is a list of requested variable names.
- The GEO_ID variable is the GEOID needed for the join with the TIGER polygons (below).
- The for= parameter indicates the FIPS code(s) for the desired areas. A wildcard (*) requests all areas of that type.
- An optional key= parameter specifies the API key (if needed for high volume applications).
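The URL components listed above can be assembled programmatically. This is a minimal sketch using only the Python standard library; the function name build_acs_url and its argument names are hypothetical, and the in= clauses follow the county and tract queries used later in this tutorial.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.census.gov/data"


def build_acs_url(year, dataset, variables, for_clause, in_clauses=(), key=None):
    """Assemble a USCB API query URL from its path and parameters."""
    params = [("get", ",".join(variables)), ("for", for_clause)]
    # Each enclosing geography (state, county) gets its own in= parameter
    params += [("in", clause) for clause in in_clauses]
    if key:
        params.append(("key", key))
    # Leave commas, colons, and asterisks unescaped so the URL stays readable
    return f"{BASE_URL}/{year}/{dataset}?" + urlencode(params, safe=",:*")


# State-level median household income, matching the URL above
print(build_acs_url(2019, "acs/acs5/profile", ["GEO_ID", "DP03_0062E"], "state:*"))

# Tract-level query for Cook County, Illinois
print(build_acs_url(2019, "acs/acs5/profile", ["GEO_ID", "DP03_0062E"],
                    "tract:*", in_clauses=["state:17", "county:031"]))
```

Building the URL from parts makes it easier to change the variable list or geography without retyping the whole query string.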
Using USCB API Data in Python
To create a GeoDataFrame of ACS data:
- GeoPandas is a Python package for working with geospatial data.
- Matplotlib is a Python package for plotting graphs.
- Load the API data into a DataFrame by passing the API URL to the Pandas read_json() function.
- Use iloc to remove the header row and the redundant state column.
- rename() the columns with meaningful names.
- astype(float) to convert the data column from text to numeric.
- Find the URL to the appropriate TIGER cartographic boundary zipped shapefile from the link on the page shown above.
- Read the cartographic boundary file directly from the USCB website into a GeoDataFrame object using the read_file() function.
- Use the to_crs() method to reproject the data to the North America Lambert Conformal Conic projection (ESRI 102009).
- Join the table DataFrame with the state polygon GeoDataFrame using the Pandas merge() method, matching the GEO_ID field in the table to the AFFGEOID field in the polygons.
- To map only the contiguous 48 states, we exclude Alaska, Hawaii, and Puerto Rico.
    import pandas
    import geopandas
    import matplotlib.pyplot as plt

    # Request GEO_ID and median household income for every state
    state_income = pandas.read_json("https://api.census.gov/data/2019/acs/acs5/profile?get=GEO_ID,DP03_0062E&for=state:*")

    # Drop the header row and the redundant state column, then name the columns
    state_income = state_income.iloc[1:, 0:2]
    state_income = state_income.rename(columns={0:"GEO_ID", 1:"Median Household Income"})

    # Convert the income column from text to numeric
    state_income["Median Household Income"] = state_income["Median Household Income"].astype(float)

    # Read and reproject the state cartographic boundary polygons
    states = geopandas.read_file("https://www2.census.gov/geo/tiger/GENZ2022/shp/cb_2022_us_state_5m.zip")
    states = states.to_crs("ESRI:102009")

    # Join the table to the polygons and keep only the contiguous 48 states
    state_income = states.merge(state_income, left_on="AFFGEOID", right_on="GEO_ID")
    state_income = state_income[~state_income["STUSPS"].isin(['AK', 'HI', 'PR'])]

    # The scheme parameter requires the mapclassify package
    axis = state_income.plot("Median Household Income", scheme="naturalbreaks", cmap="coolwarm_r", edgecolor="#808080", legend=True)
    axis.set_axis_off()
    plt.show()
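The header row and text-to-number steps above follow from the layout of the API response, which is a JSON array of arrays whose first row holds the column names and whose values all arrive as strings. The sketch below shows that layout using only the standard library. The sample text and its income values are placeholders invented for illustration, not real ACS estimates, and the function name rows_to_records is hypothetical.

```python
import json

# Hypothetical response text in the API's array-of-arrays layout.
# The income values are placeholders, not real ACS estimates.
SAMPLE_RESPONSE = """
[["GEO_ID", "DP03_0062E", "state"],
 ["0400000US17", "11111", "17"],
 ["0400000US18", "22222", "18"]]
"""


def rows_to_records(response_text, numeric=("DP03_0062E",)):
    """Convert an API response into a list of dictionaries keyed by column name."""
    rows = json.loads(response_text)
    header, data = rows[0], rows[1:]   # the first row holds the column names
    records = []
    for row in data:
        record = dict(zip(header, row))
        for name in numeric:
            if record[name] is not None:
                record[name] = float(record[name])   # values arrive as strings
        records.append(record)
    return records


records = rows_to_records(SAMPLE_RESPONSE)
print(records[0])
```

Passing a list of records like this to pandas.DataFrame() gives a table with named columns, which is an alternative to selecting columns by position with iloc.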
County-Level Data
County-level data can be loaded by modifying the API parameters and joining with the TIGER counties file.
This example maps median household income (DP03_0062E) by Illinois county (state FIPS 17).
    # Request GEO_ID and median household income for every county in Illinois
    county_income = pandas.read_json("https://api.census.gov/data/2019/acs/acs5/profile?get=GEO_ID,DP03_0062E&for=county:*&in=state:17")

    # Drop the header row and redundant columns, then name the columns
    county_income = county_income.iloc[1:, 0:2]
    county_income = county_income.rename(columns={0:"GEO_ID", 1:"Median Household Income"})

    # Convert the income column from text to numeric
    county_income["Median Household Income"] = county_income["Median Household Income"].astype(float)

    # Read and reproject the national county polygons
    counties = geopandas.read_file("https://www2.census.gov/geo/tiger/GENZ2019/shp/cb_2019_us_county_20m.zip")
    counties = counties.to_crs("ESRI:102009")

    # The default inner join keeps only the Illinois counties returned by the API
    county_income = counties.merge(county_income, left_on="AFFGEOID", right_on="GEO_ID")

    axis = county_income.plot("Median Household Income", scheme="naturalbreaks", cmap="coolwarm_r", edgecolor="#808080", legend=True, legend_kwds={"bbox_to_anchor":(0.2, 0.4)})
    axis.set_axis_off()
    plt.show()
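The joins above match the API's GEO_ID values against the AFFGEOID field in the cartographic boundary files. It can help to see how those identifiers are built: a prefix that begins with a three-digit summary level (040 for states, 050 for counties, 140 for tracts), the letters US, and then the concatenated FIPS codes. The sketch below is illustrative only; the function name parse_geo_id is hypothetical, and it assumes the state, county, and tract levels used in this tutorial.

```python
def parse_geo_id(geo_id):
    """Split a GEO_ID such as 0500000US17031 into its component codes."""
    prefix, fips = geo_id.split("US")
    return {
        "summary_level": prefix[:3],    # 040 state, 050 county, 140 tract
        "geoid": fips,                  # short GEOID without the prefix
        "state": fips[:2],              # two-digit state FIPS code
        "county": fips[2:5] or None,    # three-digit county FIPS code
        "tract": fips[5:11] or None,    # six-digit tract code
    }


print(parse_geo_id("0400000US17"))           # Illinois
print(parse_geo_id("0500000US17031"))        # Cook County, Illinois
print(parse_geo_id("1400000US17031010100"))  # a tract in Cook County
```

The short geoid value is useful when a polygon file carries only a GEOID field rather than AFFGEOID.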
Tract-Level Data
Census tracts are organizational boundaries used for USCB data collection that are drawn to roughly align with neighborhood borders. Ideally, each tract contains about 4,000 residents, although the actual number varies by area (USCB 2019).
This example maps median household income (DP03_0062E) by census tract in Cook County, Illinois (state FIPS 17, county FIPS 031).
Note that tract-level data is commonly undisclosed or unavailable and is represented with the negative number -666666666. This code resets those values to zero to avoid distorting the legend.
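    # Request GEO_ID and median household income for every tract in Cook County
    tract_income = pandas.read_json("https://api.census.gov/data/2019/acs/acs5/profile?get=GEO_ID,DP03_0062E&for=tract:*&in=state:17&in=county:031")

    # Drop the header row and redundant columns, then name the columns
    tract_income = tract_income.iloc[1:, 0:2]
    tract_income = tract_income.rename(columns={0:"GEO_ID", 1:"Median Household Income"})

    # Convert the income column from text to numeric
    tract_income["Median Household Income"] = tract_income["Median Household Income"].astype(float)

    # Reset undisclosed values (-666666666) to zero in the income column only
    tract_income.loc[tract_income["Median Household Income"] < 0, "Median Household Income"] = 0

    # Read and reproject the Illinois tract polygons
    tracts = geopandas.read_file("https://www2.census.gov/geo/tiger/GENZ2019/shp/cb_2019_17_tract_500k.zip")
    tracts = tracts.to_crs("ESRI:102009")

    # The default inner join keeps only the Cook County tracts returned by the API
    tract_income = tracts.merge(tract_income, left_on="AFFGEOID", right_on="GEO_ID")

    axis = tract_income.plot("Median Household Income", scheme="naturalbreaks", cmap="coolwarm_r", edgecolor="none", legend=True, legend_kwds={"bbox_to_anchor":(0.2, 0.4)})
    axis.set_axis_off()
    plt.show()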
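Resetting undisclosed values to zero places those tracts in the lowest legend class, where they cannot be told apart from tracts with genuinely low incomes. An alternative, sketched below under the assumption that a negative income can only be a placeholder, is to mark those values as missing (NaN). The function name clean_estimate is hypothetical.

```python
import math

UNDISCLOSED = -666666666   # placeholder value described in the text above


def clean_estimate(value):
    """Return the estimate as a float, or NaN when the value is a negative placeholder."""
    number = float(value)
    # Assumption: median household income is never negative,
    # so any negative number is treated as a placeholder.
    return math.nan if number < 0 else number


print(clean_estimate("54321"))
print(clean_estimate(str(UNDISCLOSED)))
```

In Pandas the same idea can be expressed with the Series mask() method, and the GeoPandas plot() method accepts a missing_kwds parameter that controls how polygons with missing values are drawn.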
Using USCB API Data in R
To create a data.frame of ACS data:
- The sf (simple features) library provides functions for working with vector geospatial data.
- The jsonlite library is a JSON parser.
- Load API data as a data.frame by passing the API URL to the jsonlite fromJSON() function.
- Remove the header row and the redundant state column.
- Provide meaningful column names().
- Use as.numeric() to convert the data column from text to numeric.
- Find the URL to the appropriate TIGER cartographic boundary zipped shapefile from the link on the page shown above.
- Use download.file() to save the zipped shapefile to a temporary file with the .shz extension. This intermediate step is needed because st_read() relies on the .shz extension to recognize a zipped shapefile.
- Load the shapefile polygons into a simple features data.frame using st_read().
- Reproject the polygons into a cartographically appropriate projection with st_transform(). For this data we use the North America Lambert Conformal Conic projection (ESRI 102009).
- merge() the polygons and the table, matching the AFFGEOID field in the polygons to the GEO_ID field in the table.
- To map only the contiguous 48 states, we exclude Alaska, Hawaii, and Puerto Rico using their postal abbreviations in the STUSPS field.
- Create an appropriate diverging colorRampPalette.
- plot() a choropleth colored by the desired variable.
    library(sf)
    library(jsonlite)

    # Request GEO_ID and median household income for every state
    state_data = fromJSON("https://api.census.gov/data/2019/acs/acs5/profile?get=GEO_ID,DP03_0062E&for=state:*")

    # Drop the header row and the redundant state column
    state_data = as.data.frame(state_data[2:nrow(state_data),1:2])

    # Convert the income column from text to numeric and name the columns
    state_data[,2] = as.numeric(state_data[,2])
    names(state_data) = c("GEO_ID", "Median Household Income")

    # Download, read, and reproject the state cartographic boundary polygons
    download.file("https://www2.census.gov/geo/tiger/GENZ2022/shp/cb_2022_us_state_5m.zip", "temp.shz")
    states = st_read("temp.shz")
    states = st_transform(states, "ESRI:102009")

    # Join the table to the polygons and keep only the contiguous 48 states
    state_income = merge(states, state_data, by.x="AFFGEOID", by.y="GEO_ID")
    state_income = state_income[!(state_income$STUSPS %in% c('AK', 'HI', 'PR')),]

    # Plot a choropleth with a diverging color ramp
    redblue = colorRampPalette(c("red", "lightgray", "navy"))
    plot(state_income["Median Household Income"], pal=redblue, breaks="quantile")
County-Level Data
County-level data can be loaded by modifying the API parameters and joining with the TIGER counties file.
This example maps median household income (DP03_0062E) by Illinois county (state FIPS 17).
    # Request GEO_ID and median household income for every county in Illinois
    county_data = fromJSON("https://api.census.gov/data/2019/acs/acs5/profile?get=GEO_ID,DP03_0062E&for=county:*&in=state:17")

    # Drop the header row and redundant columns
    county_data = as.data.frame(county_data[2:nrow(county_data),1:2])

    # Convert the income column from text to numeric and name the columns
    county_data[,2] = as.numeric(county_data[,2])
    names(county_data) = c("GEO_ID", "Median Household Income")

    # Download, read, and reproject the national county polygons
    download.file("https://www2.census.gov/geo/tiger/GENZ2019/shp/cb_2019_us_county_20m.zip", "temp.shz")
    counties = st_read("temp.shz")
    counties = st_transform(counties, "ESRI:102009")

    # merge() keeps only the Illinois counties returned by the API
    county_income = merge(counties, county_data, by.x="AFFGEOID", by.y="GEO_ID")

    redblue = colorRampPalette(c("red", "lightgray", "navy"))
    plot(county_income["Median Household Income"], pal=redblue, breaks="quantile")
Tract-Level Data
Census tracts are subdivisions of counties that are drawn based on clearly identifiable features to ideally contain around 4,000 residents, although in practice the range of population is usually between 1,200 and 8,000 (USCB 2019).
This example maps median household income (DP03_0062E) by census tract in Cook County, Illinois (state FIPS 17, county FIPS 031).
Note that tract-level data is commonly undisclosed or unavailable and is represented with the negative number -666666666. This code resets those values to zero to avoid distorting the legend.
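    # Request GEO_ID and median household income for every tract in Cook County
    tract_data = fromJSON("https://api.census.gov/data/2019/acs/acs5/profile?get=GEO_ID,DP03_0062E&for=tract:*&in=state:17&in=county:031")

    # Drop the header row and redundant columns
    tract_data = as.data.frame(tract_data[2:nrow(tract_data),1:2])

    # Convert the income column from text to numeric and name the columns
    tract_data[,2] = as.numeric(tract_data[,2])
    names(tract_data) = c("GEO_ID", "Median Household Income")

    # Reset undisclosed values (-666666666) to zero
    tract_data[which(tract_data[,2] < 0), 2] = 0

    # Download, read, and reproject the tract polygons
    download.file("https://www2.census.gov/geo/tiger/GENZ2019/shp/cb_2019_us_tract_500k.zip", "temp.shz")
    tracts = st_read("temp.shz")
    tracts = st_transform(tracts, "ESRI:102009")

    # merge() keeps only the Cook County tracts returned by the API
    tract_income = merge(tracts, tract_data, by.x="AFFGEOID", by.y="GEO_ID")

    redblue = colorRampPalette(c("red", "lightgray", "navy"))
    plot(tract_income["Median Household Income"], pal=redblue, breaks="quantile", lwd=0.1)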