Creating Shapefiles With American FactFinder Data

This tutorial describes how to map data from the US Census Bureau's American FactFinder website using an attribute join with TIGER/Line shapefiles.

A join is a database operation where two tables are connected based on common key values. In GIS, an attribute join is used to connect data from external tables (such as in a CSV file) to geospatial locations defined in a feature class that comes from a shapefile or file geodatabase.
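To make the idea of a key-based join concrete, here is a minimal pure-Python sketch that needs no GIS software. The rows and income values are hypothetical and made up for illustration, and `attribute_join` is an illustrative helper, not part of any GIS library. The key column names (`GEOID` and `GEO.id2`) follow the ones used in the code later in this tutorial. Keys are kept as text so that any leading zeros in a GEOID survive.

```python
# Hypothetical attribute table standing in for an American FactFinder CSV.
# The values are invented for illustration only.
attributes = [
    {"GEO.id2": "53063000200", "HD01_VD01": "31250"},
    {"GEO.id2": "53063000300", "HD01_VD01": "28400"},
]

# Hypothetical feature table standing in for a shapefile's attribute table.
features = [
    {"GEOID": "53063000200", "NAME": "2"},
    {"GEOID": "53063000300", "NAME": "3"},
    {"GEOID": "53063000400", "NAME": "4"},
]


def attribute_join(features, attributes, feature_key, attribute_key):
    """Return copies of the feature rows with matching attribute columns added.

    Features without a matching attribute row are kept unchanged (a left join).
    """
    # Index the attribute rows by key value for fast lookup.
    lookup = {row[attribute_key]: row for row in attributes}
    joined = []
    for feature in features:
        merged = dict(feature)
        match = lookup.get(feature[feature_key])
        if match is not None:
            for name, value in match.items():
                if name != attribute_key:
                    merged[name] = value
        joined.append(merged)
    return joined


joined = attribute_join(features, attributes, "GEOID", "GEO.id2")
for row in joined:
    print(row)
```

This sketch keeps features that have no matching attribute row. Dropping the unmatched rows instead would give an inner join.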

Attribute Join Illustration

Attribute Join and Conversion with ArcMap

  1. Joining a CSV file containing a single variable (0:45)
    • Download a single-variable CSV file from American FactFinder
    • Download census tract polygons from the TIGER website
    • Add a field to a shapefile
    • Join the shapefile and CSV data
    • Use the field calculator to convert the text field in the CSV file to a numeric field in the shapefile
    • Select only needed joined fields
    • Save the new shapefile with the ACS data field
  2. Joining a multiple-variable summary CSV file (8:25)
    • Download the summary CSV file from American FactFinder
    • Download census tract polygons from the TIGER website
    • Add field(s) to a shapefile
    • Join the shapefile and CSV data
    • Use the field calculator to convert the text field(s) in the CSV file to numeric field(s) in the shapefile
    • Select only needed joined fields
    • Save the new shapefile with the ACS data field
  3. Joining within a TIGER file geodatabase (13:50)
    • Download an American Community Survey (ACS) file geodatabase from the TIGER website
    • Search the metadata for the names of the desired fields
    • Join the feature class with the needed file geodatabase tables
    • Select only needed joined fields
    • Save as a new feature class in the file geodatabase
    • Set aliases so fields have human-readable names
  4. Exporting a layer to KMZ for use in Google Maps (optional) (19:20)
  5. ArcGIS Online, the future? (20:40)

Note that with some data, the field calculator will freeze up when the data contains non-numeric characters representing missing data. You can work around this issue by:

Attribute Join in ArcMap with ArcPy

The following Python code can be modified to reflect the paths to your files and then entered in the ArcMap Python console (Geoprocessing -> Python) to join ACS data to TIGER data and convert the column types to numeric values.

import arcpy
tractfile = "N:/spokane/tl_2014_53_tract.shp"
attributefile = "N:/spokane/ACS_14_5YR_B19013_with_ann.csv"
fields = {"HD01_VD01": "MEDHHINC"}
fieldtype = "LONG"

# Load data
arcpy.MakeFeatureLayer_management(tractfile, "tracts")
arcpy.MakeTableView_management(attributefile, "attributes")

# Has to be converted to a DBF file to have an object ID - Oy!
arcpy.TableToTable_conversion("attributes", "N:/spokane", "temp.dbf")

# Join new columns - this may take many minutes depending on the size of your files
arcpy.JoinField_management("tracts", "GEOID", "temp", "GEO_id2", list(fields.keys()))

# Create new field, field calculator to convert, and delete old field
for infield in fields:
    arcpy.AddField_management("tracts", fields[infield], fieldtype)
    arcpy.CalculateField_management("tracts", fields[infield], "[" + infield + "]")
    arcpy.DeleteField_management("tracts", infield)
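The note earlier in this tutorial mentions that the field calculator can freeze when a column contains non-numeric characters that represent missing data. One possible way to handle such values, offered here as an assumption rather than as the workaround the tutorial had in mind, is a small conversion function that falls back to a placeholder value. The function name `to_int_or_default`, the placeholder `-1`, and the sample strings are all hypothetical. The sketch is plain Python and does not call ArcPy.

```python
def to_int_or_default(value, default=-1):
    """Convert a text value to an integer, or return `default` if it is not numeric.

    The default of -1 is an arbitrary placeholder for missing data (an assumption).
    """
    try:
        # Strip whitespace and thousands separators before converting.
        return int(float(str(value).strip().replace(",", "")))
    except (ValueError, OverflowError):
        return default


# Hypothetical sample values: two numbers, then several non-numeric markers.
samples = ["31250", " 28400 ", "-", "**", "", "250,000+"]
converted = [to_int_or_default(s) for s in samples]
print(converted)
```

In ArcMap, a function like this could in principle be supplied through the `code_block` parameter of `arcpy.CalculateField_management` with the `"PYTHON_9.3"` expression type and an expression such as `to_int_or_default(!HD01_VD01!)`. That usage has not been tested against the data in this tutorial.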

Attribute Join with QGIS

Importing American FactFinder Data Into R

# Import attribute CSV file

attributes = read.csv("ACS_15_5YR_B19013_with_ann.csv", stringsAsFactors=F)


# Import tract polygons
# The "dsn" parameter is the directory where the shapefile is located.
# A dot "." means the current directory.
# The "layer" parameter is the shapefile name (without the .shp suffix)

library(rgdal)
tracts = readOGR(dsn=".", layer="cb_2015_53_tract_500k", stringsAsFactors=F)


# Merge (join) the attributes to the shapefile

tracts = merge(tracts, attributes, by.x = "GEOID", by.y = "GEO.id2", all.x=F)


# Convert median household income column to a new integer column.
# Data read from a CSV file will be characters rather than numbers
# and need to be converted to numbers using as.integer().
# The name of this column will vary depending on which variable(s) you use

tracts$MEDHHINC = as.integer(tracts$HD01_VD01)


# Keep only needed columns

tracts@data = tracts@data[,c("GEOID", "NAME","MEDHHINC")]


# Write shapefile

writeOGR(tracts, dsn=".", layer="medhhinc", driver="ESRI Shapefile", overwrite_layer=T)


# Optional diagnostic plot

library(sp)
spplot(tracts, zcol="MEDHHINC")
2015 ACS Spokane County, WA Median Household Income (USCB 2016)