# Mapping Socrata Panel Data With ArcGIS Online

This tutorial describes how to download and map panel data from a Socrata data portal. These examples utilize data from the Centers for Disease Control and Prevention.

The Centers for Disease Control and Prevention (CDC) is a US federal agency charged with protecting the health of Americans. The CDC is headquartered in Atlanta, GA and has over 12,000 employees in a number of facilities across the country.

The CDC makes a wide variety of data freely available to the public through their data.cdc.gov Socrata data portal.

For these examples, we will use obesity data collected through the Behavioral Risk Factor Surveillance System (BRFSS), a continuous, state-based surveillance system that collects information about modifiable risk factors for chronic diseases and other leading causes of death.

## Panel Data

Many of the CDC data sets are available as panel data, which is multi-dimensional data.

The dimensions of data can be thought of as the dimensions of the spreadsheet that would be needed to represent the data. What these dimensions represent depends on the phenomenon being described, but with geospatial panel data the three dimensions are commonly where, what, and when.

### First Dimension: Where

If your data has one column, such as a list of states, that is one-dimensional data. One-dimensional data represents locations with no consideration of differences between those locations.

### Second Dimension: What

If there is attribute information (what) to go with the location information (where), that forms two-dimensional data. This example is obesity rates by state. Thematic maps commonly only map one attribute at a time.

Two-dimensional data can contain multiple attributes per location. For this example, each state location has attributes for the different rates of obesity for different demographic groups. The arrangement of multiple attributes is called stratification, which is discussed in more detail below.

### Third Dimension: When

If all of that data is collected in multiple time periods, such as by week or year, that would be three-dimensional data.

### Stratification

In many situations involving the study of people, we may want to stratify our subjects into strata (categories), such as men/women, different age groups, etc. so that we can study (and perhaps address) differences between groups.

These stratification categories represent separate columns in the second dimension described above.

For example, the CDC's BRFSS data for obesity has the following stratification categories:

• Question
    • Percent of adults aged 18 years and older who have obesity
    • Percent of adults aged 18 years and older who have an overweight classification
    • Percent of adults who report consuming fruit less than one time daily
    • Percent of adults who report consuming vegetables less than one time daily
    • Percent of adults who engage in muscle-strengthening activities on 2 or more days a week
• Age
    • 18 - 24
    • 25 - 34
    • 35 - 44
    • 45 - 54
    • 55 - 64
• Education
    • Less than high school
    • High school graduate
    • Some college or technical school
• Gender
    • Male
    • Female
• Income
    • Less than \$15,000
    • \$15,000 - \$24,999
    • \$25,000 - \$34,999
    • \$35,000 - \$49,999
    • \$50,000 - \$74,999
• Race
    • Non-Hispanic White
    • Non-Hispanic Black
    • Hispanic
    • Asian
    • Hawaiian/Pacific Islander

To get a particular value from the data set you must specify an index for each dimension and stratification category. For example:

1. Location: New York (first dimension)
2. Year: 2016 (third dimension)
3. Question: Percent of adults aged 18 years and older who have obesity
4. Demographic Group: Male (Gender)
5. Data value = 25.9%

This means that in 2016 an estimated 25.9% of adult men in New York State were obese.
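The lookup above can be sketched in a few lines of Python. This is a minimal illustration, not the CDC's data layout: the column names (`Location`, `Year`, `Category`, `Group`, `Value`) and the `lookup` helper are hypothetical, and only the New York / 2016 / Male value of 25.9 comes from the example above. The other rows hold made-up placeholder values.

```python
# Each row is one observation in "long" form: one data value plus the
# index columns that locate it in the panel.
# Only the New York / 2016 / Male value (25.9) comes from the text;
# every other value is a made-up placeholder for illustration.
OBESITY_QUESTION = "Percent of adults aged 18 years and older who have obesity"

rows = [
    {"Location": "New York", "Year": 2016, "Question": OBESITY_QUESTION,
     "Category": "Gender", "Group": "Male", "Value": 25.9},
    {"Location": "New York", "Year": 2016, "Question": OBESITY_QUESTION,
     "Category": "Gender", "Group": "Female", "Value": 11.1},  # placeholder
    {"Location": "New York", "Year": 2015, "Question": OBESITY_QUESTION,
     "Category": "Gender", "Group": "Male", "Value": 22.2},  # placeholder
    {"Location": "Ohio", "Year": 2016, "Question": OBESITY_QUESTION,
     "Category": "Gender", "Group": "Male", "Value": 33.3},  # placeholder
]


def lookup(rows, **index):
    """Return the single value whose index columns all match `index`."""
    matches = [r for r in rows
               if all(r[key] == wanted for key, wanted in index.items())]
    if len(matches) != 1:
        raise LookupError(f"expected exactly 1 matching row, found {len(matches)}")
    return matches[0]["Value"]


value = lookup(rows, Location="New York", Year=2016,
               Question=OBESITY_QUESTION, Group="Male")
print(value)
```

Leaving out one of the indexes (for example, the year) matches more than one row, which is why the helper raises an error instead of silently picking a value.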

## Create a Bubble Map of State-Level Data

### Filter Data in Socrata

Much of the CDC's data is made available in their data catalog using Socrata Open Data, a web app used by many government agencies to distribute data to the general public. The app is flexible, but it can be confusing and has a bit of a learning curve.

The challenge with using Socrata with three-dimensional panel data is that Socrata can only represent two-dimensional spreadsheets. The second dimension therefore has to be represented with stratification columns that specify the stratification categories for the data value in a particular row.

Therefore, to map panel data, you must choose a specific set of dimensions and then reshape the data using filters that isolate a subset of the data based on the values in the stratification columns.

Keep adding filters until you get the data series you want.

You will need to apply one filter for each dimension and stratification category. When you are unfamiliar with a data set, it may take some trial and error to get the data reshaped so there is one value per geographical area.

When using state-level data, keep adding filters until you have 50 rows left in your data, or perhaps a few more than 50 if your data set includes both states and territories.

The video below demonstrates how to find data in the CDC Data Catalog and apply a sequence of filters to get a mappable CSV file. The exact set of dimensions and filter values will be different for other data sets. For this example, the following filters are applied to get one overall obesity value for each of the 50 states.

1. Year: 2016
2. Question: Percent of adults aged 18 years and older who have obesity
3. Total: Total (all demographic groups)
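The same filtering logic can be sketched offline in Python, which may help when checking whether a set of filters really leaves one row per location. This is only an illustration: the sample rows, the shortened question labels, and the column names other than `LocationDesc` and `Data_Value` are assumptions, and the helper functions are hypothetical.

```python
import csv
import io

# Hypothetical three-state extract; column names and values are illustrative.
RAW = """Year,LocationDesc,Question,Stratification,Data_Value
2016,Alabama,Obesity,Total,1.0
2016,Alabama,Obesity,Male,2.0
2015,Alabama,Obesity,Total,3.0
2016,Alaska,Obesity,Total,4.0
2016,Alaska,Overweight,Total,5.0
2016,Arizona,Obesity,Total,6.0
2016,Arizona,Obesity,Female,7.0
"""


def apply_filters(rows, filters):
    """Apply equality filters one at a time, reporting the row count after each."""
    for column, wanted in filters:
        rows = [r for r in rows if r[column] == wanted]
        print(f"{column} = {wanted!r}: {len(rows)} rows left")
    return rows


def one_row_per_location(rows, location_column="LocationDesc"):
    """True when no location appears more than once (the data is mappable)."""
    locations = [r[location_column] for r in rows]
    return len(locations) == len(set(locations))


all_rows = list(csv.DictReader(io.StringIO(RAW)))
filtered = apply_filters(all_rows, [
    ("Year", "2016"),
    ("Question", "Obesity"),
    ("Stratification", "Total"),
])
print(one_row_per_location(filtered))
```

Printing the row count after each filter mirrors the trial-and-error process in the Socrata interface: the count should keep shrinking until it matches the number of areas you intend to map.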

### Export to a CSV and Clean Up the Data

Once you have your filtered data, export it as CSV for Excel. CSV stands for comma-separated values, and if you open the file in a text editor like Notepad or TextEdit, you will see that it is exactly what it says: rows of values, with the values in each row separated by commas.

When you download the file, you should then remove all unneeded columns and rename the columns so their content is clear:

1. With state level data, renaming LocationDesc to State will help ArcGIS Online clearly understand what the file contains
2. Renaming the generic Data Value column to the name of the variable will alleviate confusion when mapping the data or reusing it in the future
3. Save the data as a CSV file to your computer
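If you prefer to script the cleanup rather than edit the file in a spreadsheet, the steps above could look roughly like the sketch below. It uses only Python's standard `csv` module. The `clean_csv` helper, the new column name `ObesityPercent`, and the sample columns `YearStart` and `LocationAbbr` are assumptions made for illustration; check the header row of your own export before relying on any of them.

```python
import csv
import io


def clean_csv(source, destination, renames, keep):
    """Copy a CSV, renaming columns and keeping only the listed ones.

    `renames` maps old column names to new ones; `keep` lists the final
    column names (after renaming) in the order they should be written.
    """
    reader = csv.DictReader(source)
    writer = csv.DictWriter(destination, fieldnames=keep, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        renamed = {renames.get(name, name): value for name, value in row.items()}
        writer.writerow({name: renamed[name] for name in keep})


# Tiny in-memory stand-in for the exported file; the values are placeholders.
exported = io.StringIO(
    "YearStart,LocationAbbr,LocationDesc,Data_Value\n"
    "2016,XX,Example State,12.3\n"
)
cleaned = io.StringIO()
clean_csv(exported, cleaned,
          renames={"LocationDesc": "State", "Data_Value": "ObesityPercent"},
          keep=["State", "ObesityPercent"])
print(cleaned.getvalue())
```

With real files, the two `io.StringIO` objects would be replaced by files opened with `open(path, newline="")` for reading and writing.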

### Import Into an ArcGIS Online Bubble Map

You can then make a quick graduated bubble map of the data in ArcGIS Online:

1. Create a new map from your ArcGIS Online home page
2. Select Add Layer From File and select the CSV file you just saved
3. For Locate features by, the app should see your State column
4. Choose an attribute to show with the column containing the variable value, in this case, Data_Value

## Create a Choropleth of State-Level Data

Another type of visualization is a choropleth, where areas are colored according to some variable. In order to create a choropleth with state-level data, we need a layer of state polygons to color with the data.

You should generally use choropleths only to map relative values (such as percentages) rather than absolute values (like population or the number of newly diagnosed cases of a disease). Because the population density of areas varies widely, using choropleths with absolute values can give a misleading impression.

For example, on a choropleth of newly diagnosed flu cases (an absolute value), the small number of people in a large area (like Wyoming) will stand out more than a small area with a large number of people (like New Jersey), making a widespread flu epidemic seem more mild than it is. Creating such a map with prevalence of the disease as a percentage of population (a relative value) will more accurately communicate the distribution of infection.
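Converting an absolute count into a relative value is a one-line calculation, sketched here in Python. The two areas and their figures are made up purely to show the arithmetic and are not real surveillance data; `prevalence_percent` is a hypothetical helper name.

```python
def prevalence_percent(cases, population):
    """Convert an absolute case count into a percentage of the population."""
    if population <= 0:
        raise ValueError("population must be positive")
    return 100.0 * cases / population


# Made-up figures for two hypothetical areas, not real surveillance data.
areas = {
    "Large sparse area": {"cases": 5_000, "population": 500_000},
    "Small dense area": {"cases": 50_000, "population": 10_000_000},
}

rates = {name: prevalence_percent(a["cases"], a["population"])
         for name, a in areas.items()}

for name, rate in rates.items():
    print(f"{name}: {areas[name]['cases']} cases, {rate:.1f}% of population")
```

In this made-up case the dense area has ten times as many cases but half the prevalence, so a map of counts and a map of percentages would rank the two areas in opposite order.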

### Download a TIGER Shapefile and Upload to ArcGIS Online

The US Census Bureau makes a variety of zipped shapefiles available through their TIGER (Topologically Integrated Geographic Encoding and Referencing) website. A shapefile is a file format developed by ESRI in the 1990s that is still often used for distributing geospatial data. A shapefile is actually composed of a number of related files, which are usually combined into a ZIP archive file with a .zip extension.

Use the cartographic boundary files, which are specifically designed to cover political boundaries rather than statistical boundaries used for other Census Bureau data collection.

Select the type of areas you are mapping. In this case, these are state boundaries. For a national map like this, you can sacrifice some detail for smaller size and faster loading by using the 500K low-resolution file.

To upload the shapefile to ArcGIS Online, use Add layer from file and select the zipped shapefile.

To start with, just visualize location as Single Symbol to make sure the shapefile uploaded successfully.

### Join the Data to the Shapefile

To color the shape polygons, you need to perform a database operation called a join that joins the individual data rows from a CSV file to their associated polygons in the shapefile.

• Select Perform Analysis, Summarize Data, Join Features

• The Target layer is the state polygons, because our ultimate target is a layer of colored polygons
• The Join layer is your CSV table of data
• The Type of join will be Choose a spatial relationship, Intersects. This will cause data to be joined from the points that intersect (are contained in) the state boundaries
• Give a meaningful Result layer name
• Click off Use current map extent so that all intersecting features are joined, regardless of whether they appear in the current map view
• Remove the original state shapefile layer to reduce clutter
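To make the idea of an Intersects join more concrete, here is a simplified Python sketch of what such a join does conceptually: test which polygon each point falls inside, then copy the point's value onto that polygon. This is not how ArcGIS Online implements Join Features. It uses a basic ray-casting test on flat coordinates, and the square "states", point locations, and values are all hypothetical.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: is (x, y) inside the polygon given as [(x, y), ...]?"""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def join_points_to_polygons(polygons, points):
    """Attach each point's value to the polygon that contains it."""
    joined = {}
    for name, shape in polygons.items():
        for point in points:
            if point_in_polygon(point["x"], point["y"], shape):
                joined[name] = point["value"]
    return joined


# Hypothetical square "states" and geocoded points with placeholder values.
polygons = {
    "West": [(0, 0), (10, 0), (10, 10), (0, 10)],
    "East": [(10, 0), (20, 0), (20, 10), (10, 10)],
}
points = [
    {"x": 5, "y": 5, "value": 1.5},
    {"x": 15, "y": 5, "value": 2.5},
]
joined = join_points_to_polygons(polygons, points)
print(joined)
```

The sketch also shows why the data must be filtered down to one row per area first: if two points fell inside the same polygon, only one of their values could end up coloring it.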

### Styling the Choropleth

When the analysis completes, you can color the polygons. In this map we see a clear band of states with high obesity in the Deep South.

1. Change style and base the style on the variable value
2. Because obesity is generally considered to be undesirable, we style Counts and Amounts, Color and use a red fill
3. Save As under a meaningful name
4. Share it with Everyone and copy the URL if you want to share it

## County-Level Data and Map

County-level data will give you a more detailed view and is useful for more complex spatial analysis. However, only a small handful of data sets on the CDC website have county-level data, and it is a bit more challenging to work with than state-level data.

### Find and Filter Data

You can find county-level data by searching "county" in the CDC Data Catalog. Some county-level data available as of this writing includes:

• Drug Poisoning Mortality
• Stroke Mortality
• Teen Birth Rates
• Heart Disease Mortality
• Behavioral Risk Factors (Variety of Questions)
• Air Quality (particulate matter)

This example uses Stroke Mortality Data Among US Adults (35+) by State/Territory and County. The data contains multiple years and four stratification categories, so five filters were needed to reshape the data down to the 62 counties of New York.

• Year: 2013
• LocationAbbr (State): NY
• GeographicLevel (State- or County-Level): County
• Stratification1 (Gender): Overall
• Stratification2 (Race): Overall

### Export to a CSV and Clean Up the Data

1. Export the data as CSV for Excel and open it in a spreadsheet program
2. Rename LocationAbbr to State and LocationDesc to County. This hints to ArcGIS Online that the data is for the USA, rather than leaving it to assume by default that this is world data
3. Rename the Data Value column to a descriptive variable name
4. Delete all other unused columns
5. Save the CSV file to your computer

### Import Into ArcGIS Online For a Bubble Map

When you use Add layer from file, make sure the app chooses State for the state column, and select Address or Place for geocoding.

You can then style it to your liking. In this case, colored rather than sized bubbles make the pattern of high stroke mortality in upstate New York more obvious.

### Create a Hosted Layer of County Polygons

If you want to make a choropleth with the data, you will need to first create a layer of county boundary polygons.

The US Census Bureau provides cartographic boundary files for counties, although you can only download a shapefile for all US counties, which is too large to add as a regular layer to an ArcGIS Online map. Therefore, you will need to create a hosted layer.

1. Download the zipped county cartographic boundary shapefile from the US Census Bureau
2. Go to your ArcGIS Online Content page
3. Add an item from my computer with that shapefile to create a hosted layer

Note that because hosted layers are shared among ArcGIS Online users, the names of hosted layers must be unique. Simple names like Counties will already be taken, and you may want to use your name as a prefix to create a unique name.

### Join The Data To The Polygons

You can use a spatial join to join the data from your CSV file to the hosted county polygons layer so that data can be used to color the polygons.

1. In a new map, Search for layers in My Content and add the county polygons hosted layer to map all counties in the USA
2. From the Analysis button on the counties layer, select Summarize Data and Join Features
3. As above, the target layer will be the counties because the final shapes you want are the county boundary polygons
4. The layer to join to the target layer will be the CSV data layer you added above when creating the bubble map
5. Choose a spatial relationship, Intersects. This will cause data to be joined from the points that intersect (are contained in) the county boundaries
6. Give the new layer a meaningful name
7. Click off Use current map extent so that all intersecting features are joined, regardless of whether they appear in the current map view
8. Run Analysis. This may take a few minutes to process all the counties
9. Remove the original county polygons layer to reduce clutter
10. Style the new layer according to the data variable
11. Save the map under a meaningful name
12. Share the map with everyone
13. You will be given a prompt to update sharing of the hosted layers too - accept that
14. Copy the link

## Directory Cleanup

To avoid clutter in your home content directory, go to your Content page, create a new directory for this project, and move all the layers and web maps there.