Mapping Socrata Panel Data With ArcGIS Online

This tutorial describes how to download and map panel data from a Socrata data portal. These examples utilize data from the Centers for Disease Control and Prevention.

The Centers for Disease Control and Prevention (CDC) is a US federal agency charged with protecting the health of Americans. The CDC is headquartered in Atlanta, GA and has over 12,000 employees in a number of facilities across the country.

The CDC makes a wide variety of data freely available to the public through their data.cdc.gov Socrata data portal.

For these examples, we will use obesity data collected through the Behavioral Risk Factor Surveillance System (BRFSS), a continuous, state-based surveillance system that collects information about modifiable risk factors for chronic diseases and other leading causes of death.

Panel Data

Many of the CDC data sets are available as panel data, which is multi-dimensional data.

The dimensions of data can be thought of as the dimensions of the spreadsheet that would be needed to represent the data. What these dimensions represent differs depending on the phenomenon the data describes, but with geospatial panel data the three dimensions are commonly where, what, and when.

First Dimension: Where

If your data has one column, such as a list of states, that is one-dimensional data. One-dimensional data represents locations with no consideration of differences between those locations.

One-Dimensional Data

Second Dimension: What

If there is attribute information (what) to go with the location information (where), that forms two-dimensional data. This example is obesity rates by state. Thematic maps commonly only map one attribute at a time.

Two-Dimensional Data

Two-dimensional data can contain multiple attributes per location. For this example, each state location has attributes for the different rates of obesity for different demographic groups. The arrangement of multiple attributes is called stratification, which is discussed in more detail below.

Stratified Two-Dimensional Data

Third Dimension: When

If all of that data is collected in multiple time periods, such as by week or year, that would be three-dimensional data.

Three-Dimensional Data
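
One way to picture the three dimensions is as a "long" table with one row per combination of where, what, and when. The sketch below uses plain Python; the field names and all numeric values are made-up placeholders, not CDC figures.

```python
# Long-format panel data: one row per (where, what, when) combination.
# Field names and values are placeholders for illustration, not CDC data.
panel_rows = [
    {"where": "New York",   "what": "Male",   "when": 2015, "value": 1.0},
    {"where": "New York",   "what": "Female", "when": 2015, "value": 2.0},
    {"where": "New York",   "what": "Male",   "when": 2016, "value": 3.0},
    {"where": "New York",   "what": "Female", "when": 2016, "value": 4.0},
    {"where": "New Jersey", "what": "Male",   "when": 2015, "value": 5.0},
    {"where": "New Jersey", "what": "Female", "when": 2015, "value": 6.0},
    {"where": "New Jersey", "what": "Male",   "when": 2016, "value": 7.0},
    {"where": "New Jersey", "what": "Female", "when": 2016, "value": 8.0},
]


def dimension_sizes(rows):
    """Count the distinct entries along each of the three dimensions."""
    return {dim: len({row[dim] for row in rows}) for dim in ("where", "what", "when")}


sizes = dimension_sizes(panel_rows)
# A complete panel has one row for every combination of the three dimensions.
is_complete = len(panel_rows) == sizes["where"] * sizes["what"] * sizes["when"]
print(sizes, is_complete)
```

Storing the data this way keeps the table two-dimensional while still recording all three dimensions, which is the same trade-off a spreadsheet-style data portal has to make.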

Stratification

In many situations involving the study of people, we may want to stratify our subjects into strata (categories), such as men/women, different age groups, etc. so that we can study (and perhaps address) differences between groups.

These stratification categories represent separate columns in the second dimension described above.

For example, the CDC's BRFSS data for obesity has the following stratification categories:

To get a particular value from the data set you must specify an index for each dimension and stratification category. For example:

  1. Location: New York (first dimension)
  2. Year: 2016 (third dimension)
  3. Question: Percent of adults aged 18 years and older who have obesity
  4. Demographic Group: Male (Gender)
  5. Data value = 25.9%

This means that in 2016 an estimated 25.9% of adult men in New York State were obese.
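
The lookup above can be sketched in Python as a search for the single row that matches every index. The field names here are simplified assumptions rather than the exact CDC column names, and only the 25.9 value comes from the example above; the other row holds a placeholder.

```python
QUESTION = "Percent of adults aged 18 years and older who have obesity"

rows = [
    # The 25.9 value is the one quoted in the example; 0.0 is a placeholder.
    {"Location": "New York", "Year": 2016, "Question": QUESTION,
     "Group": "Male", "Data_Value": 25.9},
    {"Location": "New York", "Year": 2016, "Question": QUESTION,
     "Group": "Total", "Data_Value": 0.0},
]


def lookup(rows, **criteria):
    """Return the data value of the one row matching every criterion."""
    matches = [row for row in rows
               if all(row[key] == wanted for key, wanted in criteria.items())]
    if len(matches) != 1:
        # Too few criteria leave several rows; wrong criteria leave none.
        raise ValueError(f"expected exactly one matching row, found {len(matches)}")
    return matches[0]["Data_Value"]


value = lookup(rows, Location="New York", Year=2016, Question=QUESTION, Group="Male")
print(value)
```

Leaving out any one of the criteria makes the lookup ambiguous, which is why the function raises an error instead of picking a row arbitrarily.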

Create a Bubble Map of State-Level Data

Filter Data in Socrata

Much of the CDC's data is made available in their data catalog through Socrata Open Data, a web app used by many government agencies to distribute data to the general public. The app is flexible, but it can be confusing and has a bit of a learning curve.

The Socrata Open Data Web App

The challenge with using Socrata with three-dimensional panel data is that Socrata can only represent two-dimensional spreadsheets. The second dimension therefore has to be represented with stratification columns that specify the stratification categories for the data value in a particular row.

Therefore, to map panel data, you must choose a specific set of dimensions and then reshape the data using filters that isolate a subset of the data based on the values in the stratification columns.

Keep adding filters until you get the data series you want.

You will need to apply one filter for each dimension and stratification category. When you are unfamiliar with a data set, this may require some trial and error to reshape the data so there is one value per geographical area.

When using state-level data, keep adding filters until you have 50 rows left in your data, or perhaps a few more than 50 if your data set includes both states and territories.
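
If you are working with an exported copy of the data, this stopping condition can be checked by counting rows per location. The following is a minimal sketch; the LocationDesc column is the one named later in this tutorial, while the Year column name and the sample rows are placeholders.

```python
from collections import Counter


def repeated_locations(rows, location_column="LocationDesc"):
    """Return locations that still appear in more than one row."""
    counts = Counter(row[location_column] for row in rows)
    return {location: n for location, n in counts.items() if n > 1}


# Placeholder rows: no Year filter has been applied, so each state appears twice.
partially_filtered = [
    {"LocationDesc": "Alabama", "Year": "2015"},
    {"LocationDesc": "Alabama", "Year": "2016"},
    {"LocationDesc": "Alaska", "Year": "2015"},
    {"LocationDesc": "Alaska", "Year": "2016"},
]
fully_filtered = [row for row in partially_filtered if row["Year"] == "2016"]

print(repeated_locations(partially_filtered))
print(repeated_locations(fully_filtered))
```

An empty result means every location has exactly one row, so the data is ready to map.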

The video below demonstrates how to find data in the CDC Data Catalog and apply a sequence of filters to get a mappable CSV file. The exact set of dimensions and filter values will be different for other data sets. For this example, the following filters are applied to get one overall obesity value for each of the 50 states.

  1. Year: 2016
  2. Question: Percent of adults aged 18 years and older who have obesity
  3. Total: Total (all demographic groups)
Filtering CDC Data In Socrata
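
The same three filters can be sketched in Python against an unfiltered CSV export. This is an illustration of the reshaping idea, not how Socrata itself is implemented. The Year and Stratification column names are simplified assumptions, "Some other question" is a hypothetical entry, and all data values are placeholders.

```python
import csv
import io

# Tiny stand-in for an unfiltered export (placeholder values, assumed columns).
SAMPLE_CSV = """Year,LocationDesc,Question,Stratification,Data_Value
2015,Alabama,Percent of adults aged 18 years and older who have obesity,Total,1.0
2016,Alabama,Percent of adults aged 18 years and older who have obesity,Total,2.0
2016,Alabama,Percent of adults aged 18 years and older who have obesity,Male,3.0
2016,Alabama,Some other question,Total,4.0
2016,Alaska,Percent of adults aged 18 years and older who have obesity,Total,5.0
"""

# One filter per dimension or stratification category.
FILTERS = {
    "Year": "2016",
    "Question": "Percent of adults aged 18 years and older who have obesity",
    "Stratification": "Total",
}


def apply_filters(csv_text, filters):
    """Keep only the rows whose columns equal every filter value."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader
            if all(row[column] == wanted for column, wanted in filters.items())]


state_rows = apply_filters(SAMPLE_CSV, FILTERS)
for row in state_rows:
    print(row["LocationDesc"], row["Data_Value"])
```

After all three filters are applied, each state appears exactly once, which is the shape a thematic map needs.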

Export to a CSV and Clean Up the Data

Once you have your filtered data, export it as CSV for Excel. CSV stands for comma-separated values, and if you open the file in a text editor like Notepad or TextEdit, you will see that it is exactly what it says: rows of values, with the values in each row separated by commas.

Example CSV file

When you download the file, you should then remove all unneeded columns and rename the columns so their content is clear:

  1. With state level data, renaming LocationDesc to State will help ArcGIS Online clearly understand what the file contains
  2. Renaming the generic Data Value column to the name of the variable will alleviate confusion when mapping the data or reusing it in the future
  3. Save the data as a CSV file to your computer
CSV File Cleanup
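
If you prefer a script to a spreadsheet program, the cleanup steps above can be sketched with Python's standard csv module. The input columns other than LocationDesc and Data_Value, the new name Obesity_Percent, and the data values are assumptions made for illustration.

```python
import csv
import io


def clean_csv(csv_text, rename):
    """Keep only the columns named in `rename`, written under their new names."""
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=list(rename.values()), lineterminator="\n")
    writer.writeheader()
    for row in reader:
        writer.writerow({new: row[old] for old, new in rename.items()})
    return out.getvalue()


# Placeholder export with extra columns that are not needed for mapping.
raw = """YearStart,LocationAbbr,LocationDesc,Data_Value
2016,AL,Alabama,1.0
2016,AK,Alaska,2.0
"""

cleaned = clean_csv(raw, {"LocationDesc": "State", "Data_Value": "Obesity_Percent"})
print(cleaned)
```

To work with real files, read the downloaded file's text into `clean_csv` and write the returned string to a new .csv file.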

Import Into an ArcGIS Online Bubble Map

You can then make a quick graduated bubble map of the data in ArcGIS Online:

  1. Create a new map from your ArcGIS Online home page
  2. Select Add Layer From File and select the CSV file you just saved
  3. For Locate features by, the app should see your State column
  4. For Choose an attribute to show, select the column containing the variable value, in this case, Data_Value
Creating a Bubble Map From a CSV File in ArcGIS Online

Create a Choropleth of State-Level Data

Another type of visualization is a choropleth, where areas are colored according to some variable. In order to create a choropleth with state-level data, we need a layer of state polygons to color with the data.

You should generally use choropleths only to map relative values (such as percentages) rather than absolute values (like population or the number of newly diagnosed cases of a disease). Because the population density of areas varies widely, using choropleths with absolute values can give a misleading impression.

For example, on a choropleth of newly diagnosed flu cases (an absolute value), a large area with a small number of people (like Wyoming) will stand out more than a small area with a large number of people (like New Jersey), making a widespread flu epidemic seem milder than it is. Mapping the prevalence of the disease as a percentage of the population (a relative value) will more accurately communicate the distribution of infection.
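
The difference between absolute and relative values can be shown with a small calculation. The areas and numbers below are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def prevalence_percent(cases, population):
    """Convert an absolute case count into a percentage of the population."""
    return 100.0 * cases / population


# Hypothetical areas with made-up counts, not real surveillance data.
areas = {
    "Area A": {"cases": 5_000, "population": 500_000},
    "Area B": {"cases": 20_000, "population": 8_000_000},
}

rates = {name: prevalence_percent(a["cases"], a["population"])
         for name, a in areas.items()}
most_cases = max(areas, key=lambda name: areas[name]["cases"])
highest_rate = max(rates, key=rates.get)

print(rates)
print(most_cases, highest_rate)
```

Area B has four times as many cases, but Area A has four times the prevalence, so the two measures would shade the map in opposite ways.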

Example Choropleth of Obesity Rates in the USA

Download a TIGER Shapefile and Upload to ArcGIS Online

The US Census Bureau makes a variety of zipped shapefiles available through their TIGER (Topologically Integrated Geographic Encoding and Referencing) website. A shapefile is a file format developed by ESRI in the 1990s that is still often used for distributing geospatial data. A shapefile is actually composed of a number of related files, which are usually combined into a ZIP archive file with a .zip extension.
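
Before uploading, you can check that a ZIP archive contains the component files a shapefile requires (.shp, .shx, and .dbf). The sketch below builds small in-memory archives of empty files so it runs without a download; the file name "states" is a placeholder.

```python
import io
import os
import zipfile

# The three component files every shapefile must include.
REQUIRED_EXTENSIONS = {".shp", ".shx", ".dbf"}


def missing_shapefile_parts(zip_file):
    """Return required extensions absent from the archive (path or file object)."""
    with zipfile.ZipFile(zip_file) as archive:
        found = {os.path.splitext(name)[1].lower() for name in archive.namelist()}
    return REQUIRED_EXTENSIONS - found


def make_demo_zip(member_names):
    """Build an in-memory ZIP of empty files, standing in for a real download."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name in member_names:
            archive.writestr(name, b"")
    buffer.seek(0)
    return buffer


complete = make_demo_zip(["states.shp", "states.shx", "states.dbf", "states.prj"])
incomplete = make_demo_zip(["states.shp", "states.shx"])

print(missing_shapefile_parts(complete))
print(missing_shapefile_parts(incomplete))
```

With a real download, pass the path of the .zip file to `missing_shapefile_parts` instead of an in-memory buffer.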

Use the cartographic boundary files, which are specifically designed to cover political boundaries rather than statistical boundaries used for other Census Bureau data collection.

Select the type of areas you are mapping. In this case, these are state boundaries. For a national map like this, you can sacrifice some detail for smaller size and faster loading by using the 500K low-resolution file.

To upload the shapefile to ArcGIS Online, use Add layer from file with the zipped shapefile.

To start with, just visualize location as Single Symbol to make sure the shapefile uploaded successfully.

Downloading and Uploading a TIGER Shapefile of State Boundaries

Join the Data to the Shapefile

To color the shape polygons, you need to perform a database operation called a join that joins the individual data rows from a CSV file to their associated polygons in the shapefile.

Joining a Shapefile with CSV Data
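
Conceptually, a join of this kind matches each polygon to the data row that shares its key value. The sketch below illustrates the idea in plain Python and is not how ArcGIS Online implements it. The NAME and State field names are assumptions, the geometries are stubbed out as strings, and the values are placeholders.

```python
def join_attributes(polygons, data_rows, polygon_key, data_key):
    """Attach each data row to the polygon whose key value matches it."""
    rows_by_key = {row[data_key]: row for row in data_rows}
    joined, unmatched = [], []
    for polygon in polygons:
        match = rows_by_key.get(polygon[polygon_key])
        if match is None:
            unmatched.append(polygon[polygon_key])
        else:
            # Merge the polygon's fields with the matching data row's fields.
            joined.append({**polygon, **match})
    return joined, unmatched


# Stand-ins for shapefile records and cleaned CSV rows (placeholder values).
polygons = [
    {"NAME": "Alabama", "geometry": "<polygon>"},
    {"NAME": "Alaska", "geometry": "<polygon>"},
    {"NAME": "Guam", "geometry": "<polygon>"},
]
data_rows = [
    {"State": "Alabama", "Obesity_Percent": 1.0},
    {"State": "Alaska", "Obesity_Percent": 2.0},
]

joined, unmatched = join_attributes(polygons, data_rows, "NAME", "State")
print([(p["NAME"], p["Obesity_Percent"]) for p in joined])
print(unmatched)
```

Polygons without a matching row, such as territories missing from the data, end up in the unmatched list and would not be colored on the map.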

Styling the Choropleth

When the analysis completes, you can color the polygons. In this map we see a clear band of states with high obesity in the Deep South.

  1. Change style and base the style on the variable value
  2. Because obesity is generally considered undesirable, choose the Counts and Amounts (Color) style and use a red fill
  3. Save As under a meaningful name
  4. Share it with Everyone and copy the URL if you want to share it
Styling a State-Level Choropleth

County-Level Data and Map

County-level data will give you a more detailed view and is useful for more complex spatial analysis. However, only a small handful of data sets on the CDC website have county-level data, and it is a bit more challenging to work with than state-level data.

Find and Filter Data

You can find county-level data by searching "county" in the CDC Data Catalog. Some county-level data available as of this writing includes:

This example uses Stroke Mortality Data Among US Adults (35+) by State/Territory and County. The data contains multiple years and four stratification categories, so five filters were needed to reshape the data down to the 62 counties of New York.

Filtering County-Level Data

Export to a CSV and Clean Up the Data

  1. Export the data as CSV for Excel and open it in a spreadsheet program
  2. Rename LocationAbbr to State and LocationDesc to County. This hints to ArcGIS Online that the data is for the USA, rather than letting it default to assuming world data
  3. Rename the Data Value column to a descriptive variable name
  4. Delete all other unused columns
  5. Save the CSV file to your computer
Exporting and Cleaning Up County-Level Data

Import Into ArcGIS Online For a Bubble Map

When you Add layer from file, make sure it chooses State for the state column, and select Address or Place for geocoding.

You can then style it to your liking. In this case, colored rather than sized bubbles make the pattern of high stroke mortality in upstate New York more obvious.

Creating Bubble Map With County-Level Data

Create a Hosted Layer of County Polygons

If you want to make a choropleth with the data, you will need to first create a layer of county boundary polygons.

The US Census Bureau provides cartographic boundary files for counties, although you can only download a shapefile for all US counties, which is too large to add as a regular layer to an ArcGIS Online map. Therefore, you will need to create a hosted layer.

  1. Download the 500k resolution file
  2. Go to your ArcGIS Online Content page
  3. Add an item from my computer with that shapefile to create a hosted layer

Note that because hosted layers are shared among ArcGIS Online users, the names of hosted layers must be unique. Simple names like Counties will already be taken, and you may want to use your name as a prefix to create a unique name.

Importing a County Boundary Shapefile

Join The Data To The Polygons

You can use a spatial join to join the data from your CSV file to the hosted county polygons layer so that data can be used to color the polygons.

  1. In a new map, Search for layers in My Content and add the county polygons hosted layer to map all counties in the USA
  2. From the Analysis button on the counties layer, select Summarize Data, then Join Features
  3. As above, the target layer will be the counties because the final shapes you want are the county boundary polygons
  4. The layer to join to the target layer will be the CSV data layer you added above when creating the bubble map
  5. Choose a spatial relationship, Intersects. This will cause data to be joined from the points that intersect (are contained in) the county boundaries
  6. Give the new layer a meaningful name
  7. Uncheck Use current map extent so that all intersecting features are joined, regardless of whether they appear in the current map view
  8. Run Analysis. This may take a few minutes to process all the counties
  9. Remove the original state shapefile layer to reduce clutter
  10. Style the new layer according to the data variable
  11. Save the map under a meaningful name
  12. Share the map with everyone
  13. You will be given a prompt to update sharing of the hosted layers too - accept that
  14. Copy the link
Joining County-Level Data
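
The Intersects relationship used in the steps above can be illustrated with a point-in-polygon test. This is a simplified planar sketch using the common ray-casting method. It ignores map projections and polygons with holes, and it is not a description of how ArcGIS Online performs the join. The square "counties" and the values are hypothetical.

```python
def point_in_polygon(x, y, polygon):
    """Ray casting: count how many polygon edges a ray to the right crosses."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the point's y coordinate can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def join_points_to_polygons(polygons, points):
    """Collect, for each polygon, the values of the points that fall inside it."""
    return {
        name: [p["value"] for p in points if point_in_polygon(p["x"], p["y"], ring)]
        for name, ring in polygons.items()
    }


# Hypothetical square "counties" and data points with placeholder values.
polygons = {
    "County A": [(0, 0), (4, 0), (4, 4), (0, 4)],
    "County B": [(4, 0), (8, 0), (8, 4), (4, 4)],
}
points = [
    {"x": 2, "y": 2, "value": 10.0},
    {"x": 6, "y": 1, "value": 20.0},
    {"x": 9, "y": 9, "value": 30.0},
]

joined = join_points_to_polygons(polygons, points)
print(joined)
```

The third point lies outside both polygons, so its value is not joined to anything, which is what happens to geocoded points that fall outside every county boundary.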

Directory Cleanup

To avoid clutter in your home content directory, go to your Content page, create a new directory for this project, and move all the layers and web maps there.

Directory Cleanup