Mapping Free and Open Data in ArcGIS Online

Open data is "data that can be freely used, re-used and redistributed by anyone - subject only, at most, to the requirement to attribute and sharealike" (Open Knowledge Foundation 2018).

A fundamental value of open data is interoperability, which is "the ability of diverse systems and organizations to work together" and intermix different datasets (Open Knowledge Foundation 2018). When people can work together with a minimum of restrictions, the result is greater individual productivity that, ideally, benefits society as a whole.

These definitions draw a distinction with proprietary data, where access is controlled and usually limited to subscribers who pay to license use of that data. All data requires effort (and, therefore, money) to capture and maintain. In the proprietary model, the data users provide those resources (plus a profit to the company controlling the data). In the open model, the cost is borne by the public as a whole (such as with taxpayer-supported government data), or by individual companies that build business models that leverage open data with their own contributions.

The concept of open data should also be distinguished from free data, which is data that can be accessed (usually as downloads from public websites) at no cost. Open data is free, but free data is sometimes subject to restrictions (often loosely enforced) on how the data can be used and / or redistributed.

Despite the ideal of perfectly interoperable systems built on open data, the current reality is that free and open geospatial data is commonly made available across many different types of websites in many different formats. Using that data often requires tedious cleanup. Although estimates vary and are primarily anecdotal, a common view is that data scientists spend 80% of their time finding, cleaning, and reorganizing data, and only 20% of their time doing meaningful data analysis (Ruiz 2017).

This tutorial will cover some basic techniques on how to acquire free and open data from public websites and add it to ArcGIS Online for mapping.

Free / Non-Free vs. Open / Proprietary

Geospatial Data Portals

Many government agencies make their open geospatial data available on data portal websites. The ideal situation is when open geospatial data is available as a feature service or some form of geospatial data file that can be accessed on a data portal and directly imported into GIS with no additional effort.

Socrata Shapefiles

One content management system commonly used by city governments for their open data portals is Socrata. Depending on the configuration, Socrata enables download of geospatial data as zipped shapefiles and GeoJSON files. Socrata also permits download of table data, and the software can be used as an interface into complex panel data.

For example, the Chicago Open Data Portal uses Socrata, and geospatial data sets usually have an export option for shapefiles.

  1. Acquire the data: Download the zipped shapefile from the portal.
  2. Store the data:
    • On your ArcGIS Online Content page, click Add Item.
    • Browse to find the zipped shapefile on your hard drive.
    • Give it a meaningful name and some tags.
    • Add metadata (summary, description, credits).
    • Adjust the sharing as needed.
  3. Communicate:
    • After the service is created, view the feature service in the map viewer.
    • Change the styling if desired and Save Layer so that styling is the default when the layer is loaded in the future.
Creating a feature service from a zipped shapefile downloaded from a Socrata data portal
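Before uploading, it can save a failed upload to confirm that the downloaded archive actually contains the component files a shapefile needs. The following is a minimal Python sketch, not part of the ArcGIS Online workflow itself. It assumes the .shp, .shx, .dbf, and .prj files are the ones that matter, and the function name missing_shapefile_parts and the "parks" file names are made up for illustration.

```python
import io
import zipfile
from pathlib import PurePosixPath

# Assumption: these four component files are the ones an upload needs.
REQUIRED_EXTENSIONS = {".shp", ".shx", ".dbf", ".prj"}


def missing_shapefile_parts(zip_file):
    """Return the required extensions that are absent from a zipped shapefile.

    zip_file may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(zip_file) as archive:
        found = {PurePosixPath(name).suffix.lower() for name in archive.namelist()}
    return REQUIRED_EXTENSIONS - found


# Demonstration with an in-memory archive (hypothetical names) lacking a .prj.
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as archive:
    for name in ("parks.shp", "parks.shx", "parks.dbf"):
        archive.writestr(name, b"")
demo.seek(0)
print(sorted(missing_shapefile_parts(demo)))  # prints ['.prj']
```

To check a real download, pass the path of the zip file instead of the in-memory buffer. An empty result means all four parts are present.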

Socrata Tables with Latitudes and Longitudes

Point data in data portals like Socrata is sometimes made available as tables with latitudes and longitudes rather than in geospatial data formats like shapefiles.

For this example, we create a feature service of locations with radiation-producing equipment in New York City from the NYC Open Data Socrata portal.

  1. Acquire the data: Download the CSV for Excel file from the portal.
  2. Store the data:
    • On your ArcGIS Online Content page, click Add Item.
    • Browse to find the CSV file on your hard drive.
    • Give it a meaningful name and some tags.
    • Under Locate features by, make sure Coordinates is selected.
    • Make sure the location fields for your latitude and longitude have been correctly identified in the field table.
    • When the service information page pops up, add metadata (summary, description, credits).
    • Adjust the sharing as needed.
  3. Communicate:
    • View the feature service in a Map Viewer.
    • Change the styling if desired and Save Layer so that styling is the default when the layer is loaded in the future.
Creating a feature service from a point CSV file downloaded from a Socrata data portal
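Rows with blank or impossible coordinates will not map correctly, so it can help to scan the CSV before uploading it. The sketch below is one way to do that in Python. The column names Latitude and Longitude and the sample rows are assumptions for illustration, so adjust them to match the headers in your download.

```python
import csv
import io


def invalid_coordinate_rows(csv_text, lat_field="Latitude", lon_field="Longitude"):
    """Return (line_number, row) pairs whose coordinates are missing or out of range."""
    problems = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for line_number, row in enumerate(reader, start=2):  # the header is line 1
        try:
            lat = float(row[lat_field])
            lon = float(row[lon_field])
        except (KeyError, TypeError, ValueError):
            # Missing column, short row, blank cell, or non-numeric text.
            problems.append((line_number, row))
            continue
        if not (-90 <= lat <= 90 and -180 <= lon <= 180):
            problems.append((line_number, row))
    return problems


# Made-up sample rows: one valid, one blank, one with an impossible latitude.
sample = (
    "Name,Latitude,Longitude\n"
    "Site A,40.71,-74.00\n"
    "Site B,,\n"
    "Site C,4071.0,-74.00\n"
)
for line_number, row in invalid_coordinate_rows(sample):
    print(line_number, row["Name"])
```

A range check like this will not catch every problem (for example, swapped latitude and longitude values can both fall inside the valid ranges), so it is still worth inspecting the points on the map after upload.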

ArcGIS Hub

Another content management system commonly used by city governments for their open data portals is ArcGIS Hub.

Because ArcGIS Hub is implemented with ArcGIS Enterprise server software, ArcGIS Hub sites will often be configured to give you the option to use a feature service or download a data file (usually a shapefile). Feature services are more valuable for smaller data sets where you want your maps to display the most current version of the data as soon as it becomes available. Downloaded data files that you can then upload to create your own feature services are more useful if you want a static picture at a given point in time, if the data set is large, or if the portal is unreliable. Feature services are dependent on the provider, can be published or retracted at will, and are often slow with large data sets. If you need reliability and performance, building your own feature services is usually a better choice.

For this example, we use data from the Rochester (NY) Open Data Portal for abandoned buildings slated for demolition because of safety or health threats.

  1. Acquire the data: Download the zipped shapefile from the portal.
  2. Store the data:
    • On your ArcGIS Online Content page, click Add Item.
    • Browse to find the zipped shapefile on your hard drive.
    • Give it a meaningful name and some tags.
    • Add metadata (summary, description, credits).
    • Adjust the sharing as needed.
  3. Communicate:
    • After the service is created, view the feature service in the map viewer.
    • Change the styling if desired and Save Layer so that styling is the default when the layer is loaded in the future.
Creating a feature service from a zipped shapefile downloaded from an ArcGIS Hub data portal
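When a Hub site exposes a feature service rather than (or in addition to) a download, the layer can also be queried directly through its REST endpoint. The sketch below only builds a query URL and does not contact any server. It assumes the service supports the standard query operation with the where, outFields, f, and resultRecordCount parameters. The layer URL is a placeholder, not the Rochester portal's actual address, and build_query_url is a hypothetical helper name.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_query_url(layer_url, where="1=1", out_fields="*", fmt="geojson",
                    max_records=None):
    """Build a query URL for one layer of a feature service."""
    params = {"where": where, "outFields": out_fields, "f": fmt}
    if max_records is not None:
        params["resultRecordCount"] = max_records
    return layer_url.rstrip("/") + "/query?" + urlencode(params)


# Placeholder layer URL for illustration only.
url = build_query_url(
    "https://example.com/arcgis/rest/services/Demolitions/FeatureServer/0",
    max_records=10,
)
print(url)
# Decode the query string again to confirm what will be sent.
print(parse_qs(urlsplit(url).query))
```

Requesting a small number of records first is a quick way to see how responsive a provider's service is before deciding whether to rely on it or to download the data and host your own copy.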

Tabular Data Portals

Some data portals provide data that has a mappable geospatial component, but only provide it in table form. One such portal is provided by the World Bank, which is a group of international agencies created at the end of World War II to provide funding and knowledge to promote economic development in developing countries.

The World Bank collects a vast array of country-level data (including public health data), and makes it available to the general public on their data.worldbank.org data portal as part of their mission to be a source of knowledge that promotes economic development.

Because the only geospatial components the World Bank tables contain are country names and ISO country codes, you have two options for mapping tabular data in ArcGIS Online: geocoding the country names to create bubble maps, and joining the data to an existing feature class of polygons to create choropleths.

Geocoding Tables

When your table contains place names, you can map the table as a bubble map (graduated symbols) by geocoding the place names.

Geocoding is an imperfect process because alternative spellings, abbreviations, and name variations can confuse the geocoding software. If your table has something like a country code that can be joined more accurately (see below), a join may be a better option. Geocoding can also be costly if you have very large numbers (tens or hundreds of thousands) of addresses to geocode. And because geocoding is commonly implemented with public servers, it can violate privacy if you are working with confidential data (like medical records).

  1. Acquire the data: Download the table and open it in a spreadsheet program like Excel.
  2. Process the data:
    • Remove all unneeded rows and columns. Note that in this case we need to remove the aggregated area rows.
    • Make sure the top row contains only the names of your variables.
    • Make sure all rows in the location column have valid location names.
    • Use Save As to save the spreadsheet as a comma-separated values (CSV) file.
  3. Store the data:
    • On your ArcGIS Online Content page, click Add Item.
    • Browse to open the CSV file.
    • Give it a meaningful name and some tags.
    • Check to make sure the place name has been selected for the Location.
    • Add metadata (summary, description, credits).
    • Change sharing as needed.
  4. Communicate:
    • After the service is created, view the feature service in the map viewer.
    • Change the styling if desired and Save Layer so that styling is the default when the layer is loaded in the future.
Creating a feature service from a geocoded table
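The spreadsheet cleanup in the processing step can also be scripted, which helps when you refresh the same table repeatedly. The following Python sketch removes aggregate rows and unneeded columns from CSV text. The column names, the year column "2020", and the data values are placeholders, and AGGREGATE_CODES is only a small illustrative subset of the aggregate codes that appear in World Bank tables.

```python
import csv
import io

# Illustrative subset of aggregate-area codes (World, European Union,
# High income, Low income). The full list in a real download is longer.
AGGREGATE_CODES = {"WLD", "EUU", "HIC", "LIC"}


def drop_aggregates(csv_text, code_field="Country Code",
                    keep_fields=("Country Name", "Country Code", "2020")):
    """Return CSV text without aggregate rows, keeping only keep_fields."""
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=list(keep_fields),
                            extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    for row in reader:
        if row[code_field] not in AGGREGATE_CODES:
            writer.writerow(row)
    return out.getvalue()


# Placeholder rows; the values are not real statistics.
sample = (
    "Country Name,Country Code,Indicator Name,2020\n"
    "Brazil,BRA,Demo indicator,1.0\n"
    "World,WLD,Demo indicator,2.0\n"
)
print(drop_aggregates(sample))
```

The function assumes the first line of the text is the header row, so any title or metadata lines above the header in a downloaded file would need to be removed first.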

Attribute Joins for Choropleths

Since geocoding only converts place names to points, if you want to create a choropleth with your data, you will need to join it to a layer of polygons based on a common attribute.

One challenge is that an attribute join needs a variable in each layer whose values match exactly. Variations in country names and abbreviations can cause gaps in the data because of failure to match. Fortunately, the World Bank data contains standard ISO three-letter country codes. For this example, we join it with the Minn 2017 World Polygons layer from the University of Illinois ArcGIS Online organization to create a feature service that can be mapped as a choropleth.

  1. Acquire the data: Download the table and open it in a spreadsheet program like Excel.
  2. Process the data (Excel):
    • Remove all unneeded rows and columns. Note that in this case we need to remove the aggregated area rows.
    • Use Save As to save the spreadsheet as a comma-separated values (CSV) file.
  3. Process the data (ArcGIS Online):
    • In ArcGIS Online, create a new Map.
    • Click Add, then Add Layer from File, and browse to open the CSV file.
    • On the Add CSV Layer dialog, for Locate features by: select None, add as table.
    • Add the layer of polygons. In this example it is the Minn 2017 World Polygons layer.
    • Click the Analyze button on the polygon and select Summarize Data and Join Features.
    • The target layer will be the polygons and the layer to join will be the CSV table.
    • Select Choose the fields to match and choose the fields from the two data sets that should be used to join the two tables. For this example, the target is ISO3_CODE and the join layer field is Country Code.
    • Give a meaningful Result layer name.
    • Uncheck Use current extent so that all polygons are joined.
    • Run the tool, which may take a few minutes.
  4. Communicate:
    • Remove all the layers except the joined layer.
    • Change the styling to the desired default variable and Save Layer so that styling is the default when the layer is loaded in the future.
Creating a feature service from a joined table
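Because an attribute join only succeeds where the key values match exactly, it can be useful to compare the two sets of codes before running the join tool. This Python sketch is one way to list the mismatches. It assumes you have already extracted the code values from each source as lists of strings, and the function name and sample codes are illustrative only.

```python
def unmatched_keys(table_codes, polygon_codes):
    """Compare join keys exactly, the way an attribute join would.

    Returns two sorted lists: codes found only in the table, and codes
    found only in the polygon layer.
    """
    table = set(table_codes)
    polygons = set(polygon_codes)
    return sorted(table - polygons), sorted(polygons - table)


# Illustrative values: "fra" fails to match "FRA" because case differs.
only_in_table, only_in_polygons = unmatched_keys(
    ["USA", "fra", "XKX"],
    ["USA", "FRA", "ATA"],
)
print("Table rows that will not join:", only_in_table)
print("Polygons that will have no data:", only_in_polygons)
```

Codes that appear only in the table are rows that would be dropped from the map, while codes that appear only in the polygon layer are areas that would show up with no data.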

Spatial Joins for Choropleths

In cases where you have place names that will geocode fairly cleanly but do not have a column in your table that will cleanly join with a polygon layer, you can first geocode to create a point map, and then perform a spatial join to join the points to the polygon layer.

  1. Acquire the data: Download the table and open it in a spreadsheet program like Excel.
  2. Process the data (Excel):
    • Remove all unneeded rows and columns. Note that in this case we need to remove the aggregated area rows.
    • Use Save As to save the spreadsheet as a comma-separated values (CSV) file.
  3. Process the data (ArcGIS Online):
    • In ArcGIS Online, create a new Map.
    • Click Add, then Add Layer from File, and browse to open the CSV file.
    • On the Add CSV Layer dialog, for Locate features by: select Place names and make sure the place name in your data is set to the appropriate type of location. This will create a point map.
    • Add the layer of polygons. In this example it is the Minn 2017 World Polygons layer.
    • Click the Analyze button on the polygon and select Summarize Data and Join Features.
    • The target layer will be the polygons and the layer to join will be the geocoded point layer created from the CSV file.
    • Choose Join by spatial relationship.
    • Choose Intersects.
    • Give a meaningful Result layer name.
    • Uncheck Use current extent so that all polygons are joined.
    • Run the tool, which may take a few minutes.
  4. Communicate:
    • Remove all the layers except the joined layer.
    • Change the styling to the desired default variable and Save Layer so that styling is the default when the layer is loaded in the future.
Creating a feature service from a spatially joined table
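To make the idea of a spatial join concrete, the sketch below assigns points to polygons with a simple ray-casting test in plain Python. It is a conceptual illustration under simplifying assumptions (planar coordinates, single-ring polygons with no holes) and is not how the ArcGIS Online tool is implemented. The function names and the sample squares are made up.

```python
def point_in_polygon(x, y, ring):
    """Ray-casting test. ring is a list of (x, y) vertices of a simple polygon."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does a horizontal ray running right from the point cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def join_points_to_polygons(points, polygons):
    """Return a dict mapping each polygon name to the names of points inside it."""
    joined = {name: [] for name in polygons}
    for point_name, (x, y) in points.items():
        for polygon_name, ring in polygons.items():
            if point_in_polygon(x, y, ring):
                joined[polygon_name].append(point_name)
    return joined


# Two adjacent squares and three points, one of which falls outside both.
polygons = {
    "A": [(0, 0), (10, 0), (10, 10), (0, 10)],
    "B": [(10, 0), (20, 0), (20, 10), (10, 10)],
}
points = {"p1": (5, 5), "p2": (15, 5), "p3": (25, 5)}
print(join_points_to_polygons(points, polygons))
# prints {'A': ['p1'], 'B': ['p2']}
```

The point p3 is not listed anywhere in the result, which mirrors what happens when a place name geocodes to a location outside every polygon in the target layer.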

HTML Tables

Data is sometimes made available on websites as HTML tables on web pages. As with the tabular data above, it is possible to copy that data into a spreadsheet and save it as a CSV file for mapping.

For example, Wikipedia pages commonly contain tables of data with place names that you can import into ArcGIS Online. However, note that Wikipedia is an online encyclopedia and the data in Wikipedia usually comes from some other original source. Before using data copied from Wikipedia, you may want to look at the references for the original data source and use that instead so you can be assured of the most current and accurate data.

  1. Process the data:
    • Select the data on the web page and copy it into a blank spreadsheet.
    • Remove all unneeded rows and columns.
    • Make sure the top row contains only the names of your variables. Rename as needed.
    • Make sure all rows in the location column have valid location names.
    • Use Save As to save the spreadsheet as a comma-separated values (CSV) file.
  2. Store the data:
    • On your ArcGIS Online Content page, click Add Item.
    • Browse to open the CSV file.
    • Give it a meaningful name and some tags.
    • Check to make sure the place name has been selected for the Location.
    • Add metadata (summary, description, credits).
    • Change sharing as needed.
  3. Communicate:
    • After the service is created, view the feature service in the map viewer.
    • Change the styling if desired and Save Layer so that styling is the default when the layer is loaded in the future.
Creating a feature service from a geocoded HTML table
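As an alternative to copying and pasting, the first table in a saved web page can be converted to CSV with Python's standard library. The sketch below is a minimal example that assumes a simple table without nested tables or merged cells. The class and function names and the sample HTML are invented for illustration.

```python
import csv
import io
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the cell text of the first <table> in an HTML document."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None      # cells of the row being read, or None
        self._cell = None     # text fragments of the cell being read, or None
        self._in_table = False
        self._done = False    # True once the first table has ended

    def _close_cell(self):
        if self._cell is not None:
            # Join fragments and collapse runs of whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None

    def _close_row(self):
        if self._row is not None:
            self._close_cell()
            self.rows.append(self._row)
            self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "table" and not self._done:
            self._in_table = True
        elif self._in_table and tag == "tr":
            self._close_row()
            self._row = []
        elif self._row is not None and tag in ("td", "th"):
            self._close_cell()
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._row is not None:
            self._close_cell()
        elif tag == "tr":
            self._close_row()
        elif tag == "table" and self._in_table:
            self._close_row()
            self._in_table = False
            self._done = True

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def html_table_to_csv(html):
    """Return the first HTML table in the text as CSV."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(parser.rows)
    return out.getvalue()


# Invented sample table; the link markup inside the cell is discarded.
sample_html = (
    "<table><tr><th>City</th><th>Population</th></tr>"
    "<tr><td><a href='#'>Springfield</a></td><td>1,000</td></tr></table>"
)
print(html_table_to_csv(sample_html))
```

The resulting text still needs the same checks as a pasted table: confirm that the header row contains only variable names and that the location column contains valid place names before uploading.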

Manual Data Entry

In some situations where governments or organizations lack either the resources or the ideological inclination to support open data portals, data may only be available as tables in Portable Document Format (PDF) files. Even in places with open data portals, some agencies may make their data available only as PDF files, especially historic documents.

While tables can sometimes be copied directly from PDF files into a spreadsheet for import into GIS, PDF is a format designed for print. Copied data often needs a significant amount of cleanup, and in the worst cases it may have to be entered entirely by hand. However, if you need the data and you can't find another electronic source, you may not have a choice.

For example, the US Census Bureau has published a variety of statistical abstracts since the late 19th century that summarize data about the US. The older documents are scanned into PDF files. This video shows manual entry of data about the number of miles of railroad in operation in 1877 by state, which is then geocoded for mapping in ArcGIS Online.

Creating a feature service from a geocoded manually-entered table

Suggested Governmental and NGO Open Data Websites

International Data

World Bank

The World Bank collects a vast array of country-level data and makes it available to the general public on its data.worldbank.org data portal as part of its mission to be a source of knowledge that promotes economic development. The use of World Bank data is detailed above in the Tabular Data Portals section.

The World Bank

EU Open Data Portal

Eurostat is the European Union's open data portal with a wide variety of demographic and economic data for the countries in the EU.

Eurostat

FAOSTAT

FAOSTAT is the data portal for the Food and Agriculture Organization of the United Nations.

FAOSTAT

WHO Global Health Observatory

The World Health Organization (WHO) is the public health organization of the United Nations, and it makes its data available through its Global Health Observatory. Unfortunately, the data is displayed as text and images that cannot be copied into a table, so you will need to manually type this data into a spreadsheet for mapping in ArcGIS Online unless you can find that data from another source (like the World Bank).

WHO Global Health Observatory

BP Statistical Review

The multinational energy company BP (formerly British Petroleum) has produced a compendium of energy data by country for over 60 years in its annual Statistical Review of World Energy. Current and historical data is available in an Excel workbook that can be downloaded from their website.

The BP Statistical Review of World Energy

US National Data

US government agencies at the national, state, and local level have generally embraced the concept of open data and make many of their geospatial and non-geospatial data sets freely available to the public. This is less true of other regions of the world, especially in countries with authoritarian governments.

US Census Bureau

The US Census Bureau (USCB) is the part of the US federal government responsible for collecting data about people and the economy in the United States. The Census Bureau has its roots in Article I, section 2 of the US Constitution, which mandates an enumeration of the entire US population every ten years (the decennial census) in order to set the number of members from each state in the House of Representatives and Electoral College (USCB 2017).

Among the Census Bureau's many programs is the American Community Survey (ACS), an ongoing survey that provides information on an annual basis about people in the United States beyond the basic information collected in the decennial census. The ACS is commonly used by a wide variety of researchers when they need information about the general public.

The US Census Bureau makes its data available as tables at data.census.gov. You can find more information in this tutorial on Mapping US Census Bureau Data With ArcGIS Online.

data.census.gov

US Centers for Disease Control and Prevention

The Centers for Disease Control and Prevention (CDC) is the US federal government's public health organization. One challenge with CDC data is that the agency does not have a centralized data portal or unified standard for distributing data. Therefore, you may have to dig around the CDC website a bit to find what you are looking for. Also, most publicly available data is aggregated to a high level (generally the state level), limiting its utility for meaningful health analysis.

Centers for Disease Control and Prevention

State / Local Data

Most US state and city governments make some open data available, although larger (and more politically liberal) jurisdictions tend to be more open with their data than smaller and/or more conservative areas. States and localities commonly use Socrata or ArcGIS Hub (see above). Since the data situation is geographically variable, searching the web for the name of the location plus "open data" is usually the easiest way to find the open data portal for a given city or state.

NYC Open Data