Importing US Census Bureau Data into ArcGIS Pro
The US Census Bureau does make some of its data available in geospatial data files, but those offerings tend to be a limited subset of the available data. Therefore, if you want USCB data that is not already available in an existing service, you will need to do an attribute join between tabular data from the USCB's data.census.gov website and the USCB's TIGER/Line cartographic boundary shapefiles. This tutorial describes that process in ArcGIS Pro.
The US Census Bureau and the American Community Survey
Demographic data is "the statistical characteristics of human populations (such as age or income)." Etymologically, the word is a combination of the Greek words dêmos (people) and graphein (write) - literally, writing about people (Merriam-Webster 2020). Typical demographic variables for an area include:
- Median Household Income
- Median Age
- Median Family Size
- Percent of Residents With a College Degree
- Percent of Residents That Were Born Outside the USA
- Percent of Residents That Are Military Veterans
The US Census Bureau (USCB) is the part of the US federal government responsible for collecting data about people and the economy in the United States. The Census Bureau has its roots in Article I, section 2 of the US Constitution, which mandates an enumeration of the entire US population every ten years (the decennial census) in order to set the number of members from each state in the House of Representatives and Electoral College (USCB 2017).
The American Community Survey
Among the Census Bureau's many programs is the American Community Survey (ACS), an ongoing survey that provides information on an annual basis about people in the United States beyond the basic information collected in the decennial census. The ACS is commonly used by a wide variety of researchers when they need information about the general public.
Unlike the constitutionally mandated decennial census, which is taken only every ten years, the ACS continuously surveys people in America's communities, so ACS data can be more detailed and current than the decennial census. However, because it is a survey rather than a complete count like the census, there is uncertainty about how accurately the sampling represents the facts on the ground, and that uncertainty is expressed in a statistical margin of error (MOE) on most ACS values (US Census Bureau 2018).
ACS data is released as 1-year and 5-year estimates, based on data gathered over those intervals. The 1-year estimates are more current but less accurate, and are often available only for a limited set of areas (like highly populous counties). The 5-year estimates are more accurate but less current. You should choose the data set based on whether the analysis you are performing needs data that is more accurate or more up-to-date. For this tutorial, we go with accuracy by choosing the most recent 5-year estimates.
Profile Pages on data.census.gov
The USCB's primary data portal is data.census.gov.
Although the portal provides a wide variety of ways to access the USCB's data, profile pages can be useful if you are looking for summary demographic and economic information about a specific area.
You can find these pages in data.census.gov by searching for a place name along with the word "profile."
The video below shows how to access the profile pages for the USA, Illinois, and Urbana, IL.
Downloading Data Tables Using data.census.gov
The video demonstrates how to download data tables from data.census.gov and prepare them for use in ArcGIS Pro using the Excel 365 web app. This example uses median household income by county in Illinois.
- Go to data.census.gov and search for the variable you need. You may need to dig around for a while or consider a substitute if what you are looking for is unavailable. In this example, B19013 Median Household Income is just what we're looking for.
- Select the type of geographic areas you are looking for. In this case we are looking at counties in Illinois. You may need to select Transpose Table to see all of your values.
- Make sure that you have the appropriate number of values. In this example, the 1-year estimates do not contain all counties, so we switch to the 5-year estimates. The 5-year data is almost always the better choice for county or local level data.
- Download the data as a CSV to your hard drive.
- Using Windows Explorer, navigate into the .zip archive file and copy the file that contains "data_with_overlays" to your desktop.
- Open that file in a spreadsheet program like Excel 365, remove the top row with cryptic names, and rename the variable column to something short but descriptive.
- Save it back to your desktop under a meaningful name.
- Go to the USCB's TIGER Cartographic Boundary Shapefiles page and download the appropriate type of geography. The smaller (lower resolution) files are fine unless you are doing high accuracy mapping.
- Using Windows Explorer, find the downloaded .zip archive, open it, and copy the individual shapefile files to your Desktop.
- Create a new project map in ArcGIS Pro.
- Add Data from your computer, navigate to your Desktop, and add the shapefile that has a .shp suffix.
- If needed, modify the projection to something cartographically valid like WGS 1984 Web Mercator.
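The spreadsheet cleanup in the steps above can also be scripted. The following is a minimal sketch using only the Python standard library. It assumes the downloaded file has two header rows (a coded row followed by a descriptive row whose first column is named id) and that the estimate and margin of error sit in the third and fourth columns. The sample rows, the income values, and the short column names MedHHInc and MedHHIncM are made up for illustration.

```python
import csv
import io

# Made-up sample in the assumed two-header-row layout. The income values are
# illustrative only, not real ACS estimates.
RAW = """GEO_ID,NAME,B19013_001E,B19013_001M
id,Geographic Area Name,Estimate!!Median household income,Margin of Error!!Median household income
0500000US17019,"Champaign County, Illinois",50000,1500
0500000US17031,"Cook County, Illinois",60000,500
"""


def prepare_table(raw_text, estimate_name="MedHHInc", moe_name="MedHHIncM"):
    """Drop the coded top row and give the two value columns short names."""
    rows = list(csv.reader(io.StringIO(raw_text)))
    header = list(rows[1])      # rows[0] is the coded header row we discard
    header[2] = estimate_name   # assumed position of the estimate column
    header[3] = moe_name        # assumed position of the margin of error column
    return [header] + rows[2:]


def to_csv_text(rows):
    """Serialize the cleaned rows back to CSV text."""
    buf = io.StringIO(newline="")
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()


cleaned = prepare_table(RAW)
print(to_csv_text(cleaned))
```

Keeping the new column names to ten characters or fewer avoids truncation if the table is later written into a shapefile.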
TIGER Cartographic Boundary Shapefiles
The USCB maintains a collection of geospatial data on political boundaries in their Topologically Integrated Geographic Encoding and Referencing (TIGER) database.
They make a version of that data suitable for mapping as TIGER/Line cartographic boundary shapefiles.
Shapefiles utilize an old file format that ESRI developed in the 1990s. Despite their limitations (most notably limiting field names to ten characters), the format is supported by a wide variety of software, and is still commonly used to store and distribute geospatial data.
A shapefile is actually a collection of separate files that each contain separate information, such as the coordinates, attributes, projection, and metadata. Because all these files need to be kept together, shapefiles are commonly distributed in .zip archives that collect and compress the separate files into a single, compact file with .zip at the end of the name.
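Because a shapefile only works when its component files travel together, it can be worth checking an archive before importing it. The sketch below is one way to do that with the Python standard library, treating .shp, .shx, and .dbf as required and .prj as recommended. The archive here is a hypothetical stand-in built in memory with empty files, so the example runs without downloading anything.

```python
import io
import os
import zipfile

# Components a usable shapefile needs. The .prj file is technically optional,
# but without it the software cannot tell which coordinate system applies.
REQUIRED = {".shp", ".shx", ".dbf"}
RECOMMENDED = {".prj"}


def missing_components(zip_bytes):
    """Return (missing_required, missing_recommended) extensions, each sorted."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        found = {os.path.splitext(name)[1].lower() for name in archive.namelist()}
    return sorted(REQUIRED - found), sorted(RECOMMENDED - found)


# Build a tiny stand-in archive (hypothetical file names, empty contents).
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as archive:
    for ext in (".shp", ".shx", ".dbf"):
        archive.writestr("counties_example" + ext, b"")

missing_required, missing_recommended = missing_components(buf.getvalue())
print("Missing required:", missing_required)
print("Missing recommended:", missing_recommended)
```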
This video demonstrates how to download, unzip, and import a shapefile into ArcGIS Pro. This example uses county polygon boundaries compatible with the ACS data described above.
Joining Tables with Spatial Data in ArcGIS Pro
We now have a table of county-level data and a layer of features defining the boundaries of those counties. In order to map that data, we need to perform an attribute join.
A join is a database operation where two tables are connected based on common key values. In GIS, an attribute join is used to connect data from external tables (such as in a CSV file) to geospatial locations defined in a feature class that comes from a shapefile or file geodatabase.
For the join key, we use the USCB geographic identifier, a field common to both the table downloaded from data.census.gov and the TIGER/Line shapefile.
- Right click on the layer, select Joins and Relates, and select Add Join.
- The Layer Name should be the polygon layer.
- The Input Join Field should be AFFGEOID.
- In Join Table, navigate to the spreadsheet you edited.
- The Output Join Field should be the id field from the spreadsheet.
- Uncheck Keep All Target Features so only the boundaries with matching entries in the data table are kept.
- Run the tool and you should have new fields joined to the polygon layer.
- Change the Symbology on that layer to style the polygons by your variable.
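To make the logic of the join concrete, here is a conceptual sketch in plain Python. It is not how ArcGIS Pro implements Add Join; it only illustrates matching rows on a shared key, with a keep_all flag standing in for the Keep All Target Features checkbox. The function name, the stand-in rows, and the income values are all hypothetical.

```python
# Stand-in attribute rows for the polygon layer (geometry omitted).
counties = [
    {"AFFGEOID": "0500000US17019", "NAME": "Champaign"},
    {"AFFGEOID": "0500000US17031", "NAME": "Cook"},
    {"AFFGEOID": "0500000US17113", "NAME": "McLean"},
]

# Stand-in rows from the edited spreadsheet. Income values are made up.
table = [
    {"id": "0500000US17019", "MedHHInc": 50000},
    {"id": "0500000US17031", "MedHHInc": 60000},
]


def attribute_join(features, rows, feature_key, row_key, keep_all=False):
    """Attach table fields to each feature whose key matches a table row."""
    lookup = {row[row_key]: row for row in rows}
    joined = []
    for feature in features:
        match = lookup.get(feature[feature_key])
        if match is None and not keep_all:
            continue  # drop features with no matching table row
        joined.append({**feature, **(match or {})})
    return joined


matched_only = attribute_join(counties, table, "AFFGEOID", "id")
everything = attribute_join(counties, table, "AFFGEOID", "id", keep_all=True)
print(len(matched_only), "features kept when unmatched features are dropped")
print(len(everything), "features kept when all target features are kept")
```

With unmatched features dropped, any county missing from the table disappears from the result, which is a quick way to notice that the table is incomplete.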
Publishing The Data As A Service
If you want to share your data with others, or want to put it in an ArcGIS Online web map, you need to publish it as a feature layer.
- In the Contents pane, right click on the layer, select Data and Export Features. Because the join exists only in the project, you need to export your joined layer as a completely new layer (feature class).
- Give the feature class a meaningful name. Note that this name should only contain letters (no spaces or punctuation marks).
- Run the tool.
- Change the symbology to color the layer by your variable.
- Right click on the new layer, select Sharing and Share as Web Layer.
- Give the layer a meaningful name, along with a summary.
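The letters-only naming rule for the exported feature class can be applied mechanically. The helper below is a small hypothetical sketch (the function name is not part of any ArcGIS API) that keeps only runs of letters from a descriptive title and joins them in capitalized form.

```python
import re


def letters_only_name(title):
    """Build a letters-only name by joining the capitalized words of a title."""
    words = re.findall(r"[A-Za-z]+", title)
    return "".join(word[0].upper() + word[1:] for word in words)


name = letters_only_name("Illinois median income, 2019")
print(name)
```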
Cautions When Interpreting Census Data
Unlike the Decennial Census which attempts to collect information from everybody, the American Community Survey only gets information from a randomly selected group of people (a sample), and then uses statistical techniques to make an inference about the overall characteristics of the areas where those people live or work.
Because ACS data comes from a sample, we cannot be absolutely certain what the actual overall value is. ACS values have a margin of error that gives a range of likely values, published at a 90% confidence level. These margins of error can be especially high in sparsely populated rural areas, or with values that apply to small numbers of people.
While using ACS values as-is is usually fine for mapping, when your analysis involves high levels of accuracy or rigor, you need to be aware of the margins of error on your variables and communicate your uncertainty to your audience.
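A short worked example shows how a margin of error can be turned into numbers that are easier to reason about. This sketch assumes the margin of error is published at the 90% confidence level, which is the Census Bureau's practice for ACS data and corresponds to a z-score of 1.645. The estimate and margin of error are hypothetical values chosen to keep the arithmetic simple.

```python
Z90 = 1.645  # z-score for a 90% confidence level (assumed for published ACS MOEs)
Z95 = 1.960  # z-score for a 95% confidence level


def standard_error(moe_90):
    """Recover the standard error from a 90% margin of error."""
    return moe_90 / Z90


def moe_at_95(moe_90):
    """Rescale a 90% margin of error to the 95% confidence level."""
    return standard_error(moe_90) * Z95


def coefficient_of_variation(estimate, moe_90):
    """Standard error as a percentage of the estimate (lower is more reliable)."""
    return standard_error(moe_90) / estimate * 100


def interval(estimate, moe):
    """Lower and upper bounds implied by an estimate and its margin of error."""
    return (estimate - moe, estimate + moe)


estimate, moe = 50000, 3290  # hypothetical median household income and MOE
low, high = interval(estimate, moe)
print(f"90% interval: {low} to {high}")
print(f"95% margin of error: {moe_at_95(moe):.0f}")
print(f"Coefficient of variation: {coefficient_of_variation(estimate, moe):.1f}%")
```

Comparing the coefficient of variation across areas is one simple way to flag estimates, often those for small or rural populations, that deserve extra caution.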
While the USCB collects data from individuals, it always aggregates that data by geographic areas to preserve anonymity and privacy.
As with sampling, aggregation introduces uncertainty as important individual distinctions can be lost when people are combined into groups and summarized.
One issue with aggregated data is the ecological fallacy, which occurs when you make assumptions about individuals based on aggregated data. For example, states are often classified as red states or blue states based on whether the majority of the voters in that state vote Republican or Democratic, respectively. However, even in very red Utah, Democratic President Obama received about 25% of the vote in 2012, so assuming that everyone you meet in Utah is conservative is incorrect.
The opposite of the ecological fallacy is the exception fallacy, where an assumption is made about a group based on a few exceptional individuals. For example, if you meet a tall basketball player from Ohio, the assumption that everyone in Ohio is tall would be incorrect.