Point Cluster Analysis in ArcGIS Online

GIS software allows you not only to visualize geospatial data, but also to use statistical techniques that expose spatial patterns that might not be visible to the naked eye. Finding areas with unusually low or high concentrations of some characteristic (such as crime or disease) can help inform individual actions (such as knowing where to buy a house) or collective actions (such as public policy interventions to address environmental hazards or causes of criminal activity).

This tutorial covers a variety of techniques for basic point cluster analysis in ArcGIS Online, using crime data from Baltimore as an example data set. All of these techniques have strengths and weaknesses, and the choice of which one is appropriate for any given situation depends on the level of rigor needed and the end use of the analysis.

Filtering Socrata Crime Data

Many large-city police departments make historical geocoded crime data available to the public through the Socrata open data platform. The Socrata interface is flexible, but it can be confusing and has a bit of a learning curve.

ArcGIS Online is a cloud service where you pay only for what you use. The software keeps track of your usage by charging you a certain number of credits for each action you take in ArcGIS Online. Some operations like displaying layers require only fractions of a credit, while other operations can cost considerably more.

A major issue with the use of ArcGIS Online for crime analysis is that large numbers of crime points can consume a large number of service credits. Accordingly, you should generally filter your crime points for specific types of crime and limited ranges of dates to avoid exhausting your credit quota.

In Socrata, you add filters to select specific types of data from a data set.

The video below gives an example of the use of filters with the City of Baltimore's Socrata site to select victim-based crime data for robbery during 2017 and export it to a CSV file. Data for other cities may require different filters.

Filtering Socrata Data
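
The same kind of filter can also be expressed as a query against the Socrata Open Data API (SODA), which accepts `$where` and `$limit` parameters on a data set's `/resource/` endpoint. The sketch below only builds such a URL. The domain, data set ID, and the column names `description` and `crimedate` are hypothetical placeholders, so check the column list of the data set you are using before relying on it.

```python
from urllib.parse import urlencode


def build_soda_url(domain, dataset_id, crime_type, year, limit=50000):
    """Build a SODA CSV export URL filtered by crime type and year."""
    # Hypothetical column names: replace with the ones in your data set.
    # Values are inserted as-is; this sketch does not escape quotes.
    where = (
        f"description = '{crime_type}' "
        f"AND crimedate >= '{year}-01-01T00:00:00' "
        f"AND crimedate < '{year + 1}-01-01T00:00:00'"
    )
    params = {"$where": where, "$limit": limit}
    return f"https://{domain}/resource/{dataset_id}.csv?{urlencode(params)}"


# Placeholder domain and data set ID, for illustration only.
url = build_soda_url("data.example.gov", "abcd-1234", "ROBBERY", 2017)
print(url)
```

Opening a URL of this form in a browser should download only the matching rows, which keeps the point count (and the later credit cost) down.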

Importing Crime Data Into ArcGIS Online

Large data sets with thousands of points, like crime data, cannot be imported directly into an ArcGIS Online web map. Instead, the data must be imported as a hosted layer, which can then be added to a map for analysis.

The video below demonstrates how to import a CSV file into ArcGIS Online as a hosted layer, and then add that hosted layer to a new map.

Importing Crime Data as a Hosted Layer
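
Because credit use grows with the number of points, it can help to check a CSV file before uploading it. The following is a minimal sketch, assuming the file has coordinate columns named `Latitude` and `Longitude` (hypothetical names; your export may differ). It counts the rows and how many of them have usable coordinates.

```python
import csv
import io


def summarize_points_csv(text, lat_field="Latitude", lon_field="Longitude"):
    """Return (total_rows, rows_with_valid_coordinates) for CSV text."""
    total = 0
    valid = 0
    for row in csv.DictReader(io.StringIO(text)):
        total += 1
        try:
            lat = float(row[lat_field])
            lon = float(row[lon_field])
        except (KeyError, TypeError, ValueError):
            continue  # missing column or blank / non-numeric value
        if -90 <= lat <= 90 and -180 <= lon <= 180:
            valid += 1
    return total, valid


# Tiny made-up sample; the second row has no coordinates.
sample = """Description,Latitude,Longitude
ROBBERY,39.2904,-76.6122
ROBBERY,,
ROBBERY,39.3100,-76.6200
"""
print(summarize_points_csv(sample))  # (3, 2)
```

For a real file, read its contents with `open(path, newline="").read()` and pass the text to the function.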

Clustering

A layer with a large number of data points will not be very useful on its own, since the overlapping points are too densely packed to observe clear patterns.

One way to improve interaction with such a map is to turn on clustering. As you zoom in and out, points are combined into bubbles representing clusters of points. By clicking on a cluster bubble, you can scroll through information on the points in that cluster.

Clustering
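
To see why the bubbles change as you zoom, consider a simple stand-in for clustering: snapping points to a square grid and counting the points in each cell. This is only an illustration of the idea, not the algorithm ArcGIS Online uses, and the `cell_size` parameter is an assumed proxy for zoom level.

```python
from collections import Counter
import math


def cluster_points(points, cell_size):
    """Group (x, y) points into square cells and return {cell: count}."""
    counts = Counter()
    for x, y in points:
        cell = (math.floor(x / cell_size), math.floor(y / cell_size))
        counts[cell] += 1
    return dict(counts)


points = [(0.2, 0.3), (0.8, 0.1), (1.5, 0.4), (3.2, 3.9)]
print(cluster_points(points, 1.0))  # small cells (zoomed in): three bubbles
print(cluster_points(points, 4.0))  # large cells (zoomed out): one bubble
```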

Heat Map

Another way of analyzing the density of points is visualization with a heat map.

  1. Change Style and choose Heat Map
  2. Adjust the Area of Influence to change the sensitivity of the heat map
  3. Note that the calculation of colors in a heat map is scale dependent: it varies based on the area you are viewing and how far you are zoomed in or out

Visualizing Crime Data as a Heat Map

Density Contour Map

Although ArcGIS Online heat maps are easy, effective visualizations, they do not support precise analysis of the data. Because the area of influence changes with scale (how far you are zoomed in or out), viewing at different scales can under- or over-emphasize concentrations of points. Also, the colors are not tied to stated density values, so the map cannot be read quantitatively.

Kernel density analysis produces contour lines that delineate areas with different ranges of densities of points.

Density Analysis
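
The idea behind kernel density can be sketched in a few lines: each point spreads its weight over a circle of a chosen radius, and the density at a location is the sum of the contributions from nearby points. The sketch below assumes a quartic kernel and a fixed search radius; the function names and class breaks are illustrative, and the ArcGIS Online tool may use different defaults.

```python
import math


def kernel_density(x, y, points, radius):
    """Quartic-kernel density (points per unit area) at location (x, y)."""
    total = 0.0
    for px, py in points:
        # Squared distance to the point, as a fraction of the search radius.
        u2 = ((x - px) ** 2 + (y - py) ** 2) / radius ** 2
        if u2 < 1.0:
            total += (3.0 / math.pi) * (1.0 - u2) ** 2
    return total / radius ** 2


def density_class(value, breaks):
    """Return the index of the density range that value falls into."""
    return sum(value >= b for b in breaks)


crimes = [(0.0, 0.0), (0.1, 0.1), (0.2, 0.0), (2.0, 2.0)]
for location in [(0.1, 0.0), (1.0, 1.0), (2.0, 2.0)]:
    d = kernel_density(location[0], location[1], crimes, radius=0.5)
    print(location, round(d, 2), density_class(d, breaks=[0.5, 2.0, 5.0]))
```

Unlike a heat map, the radius and class breaks here are fixed numbers, so the resulting ranges mean the same thing at every zoom level.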

Neighborhood Choropleth

If you have a set of pre-defined areas such as neighborhoods, police precincts, or census tracts, you can use the Aggregate Points tool to count the number of points in each area for display as a choropleth.

A major issue with this type of analysis is the modifiable areal unit problem (MAUP). Because the boundaries of neighborhoods are often the historical legacies of political processes, those boundaries often cut across areas where points are clustered. This can under- or over-emphasize concentrations in some areas, and it can produce dramatically different results when the same analysis is run with different types of areas.

  1. Download a neighborhood shapefile from the city's open data portal. If no neighborhood file exists, you might consider using census tracts or zip codes, which you will need to add as a hosted layer (see above)
  2. Add the layer from My Content
  3. Perform Analysis, Summarize Data, Aggregate Points to find the number of points in each neighborhood
  4. Give the new layer a meaningful name
  5. Show Credits to make sure the analysis will not consume over ten credits and exhaust your quota
  6. Run Analysis. This may take several minutes depending on the number of points and neighborhoods
  7. Symbolize the new layer on Join Count
  8. Remove the neighborhood layer

Density Choropleth
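
Conceptually, aggregating points means testing each point against each area boundary and keeping a count per area. The following is a minimal sketch using a ray-casting point-in-polygon test on planar coordinates. The area names and shapes are made up, and this is not how ArcGIS Online implements the tool.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: is (x, y) inside the polygon (list of vertices)?"""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge cross the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def aggregate_points(points, areas):
    """Count the points that fall in each named area."""
    counts = {name: 0 for name in areas}
    for x, y in points:
        for name, polygon in areas.items():
            if point_in_polygon(x, y, polygon):
                counts[name] += 1
                break  # assign each point to at most one area
    return counts


# Two made-up square "neighborhoods" sharing an edge at x = 1.
areas = {
    "west": [(0, 0), (1, 0), (1, 1), (0, 1)],
    "east": [(1, 0), (2, 0), (2, 1), (1, 1)],
}
points = [(0.5, 0.5), (0.9, 0.2), (1.5, 0.5), (3.0, 3.0)]
print(aggregate_points(points, areas))  # {'west': 2, 'east': 1}
```

Moving the shared boundary from x = 1 to x = 0.8 would shift one point from west to east, which is the MAUP in miniature.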

Correlating Neighborhood Characteristics

The Enrich Layer tool uses proprietary data and algorithms from Esri to find demographic and marketing characteristics of areas.

For this analysis we will enrich the layer of neighborhood crime counts with three variables we hypothesize may correlate with crime: median household income, median age, and number of food service and drinking places.

Note that this tool is very credit intensive, so you should perform the operation as few times as possible, with only the variables you need.

Neighborhood Data Enrichment

We can then export this data to Google Sheets to perform correlation analysis:

  1. Export the layer to an Excel workbook
  2. Import the workbook into Google Sheets
  3. Give columns meaningful names if needed
  4. Create an X/Y scatter chart
  5. Make the axes logarithmic if points are clustered at far right
  6. Add a trendline and give it a contrasting color
  7. Display the R²
  8. Move the chart to its own sheet
  9. Publish the chart to share or place in a Story Map

Neighborhood Data Enrichment
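
The R² that the spreadsheet displays for a linear trendline is the square of the Pearson correlation between the two columns. As a rough sketch of that calculation, including the logarithmic axis from step 5, the code below uses made-up income and robbery values purely for illustration; they are not results from the Baltimore data.

```python
import math


def r_squared(xs, ys):
    """R-squared of a straight-line fit of ys on xs (squared Pearson r)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy ** 2 / (sxx * syy)


# Illustrative values only, not real neighborhood data.
income = [25000, 40000, 60000, 90000, 150000]
robberies = [120, 80, 45, 30, 10]

print(round(r_squared(income, robberies), 3))

# Same comparison with a logarithmic x axis.
log_income = [math.log10(v) for v in income]
print(round(r_squared(log_income, robberies), 3))
```

A value near 1 indicates that the points lie close to the trendline; it does not by itself show that one variable causes the other.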

Point Hot Spot Analysis

The most statistically rigorous of the point density analysis techniques in ArcGIS Online is hot-spot analysis using the Getis-Ord Gi* statistic. This technique involves creating a fishnet grid of areas, counting the number of points in each grid cell, and then using a statistical test to determine whether adjacent cells containing large numbers of points represent a statistically significant concentration relative to the other cells in the analysis.

Red cells are areas where there is a high level of confidence that the area around the cell is a hot spot. Blue cells are areas where there is a high level of confidence that the area around the cell is a cool spot of low density.

A major issue with this technique is that it is very computationally intensive, so it consumes a large number of credits when run on a large number of points. Therefore, you should perform hot-spot analysis as few times as possible, on as small an area as possible, with as few points as possible. Before running a hot-spot analysis, check the number of credits it will consume and compare that with the number of credits remaining in your quota.

Point Hot Spot Analysis
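
For readers curious about what the statistic computes, here is a minimal sketch of the Gi* z-score on a small grid of counts. It assumes binary weights over each cell and the cells that touch it (including the cell itself); the ArcGIS Online tool chooses its own cell size and neighborhood and may differ in detail. The grid values are made up.

```python
import math


def gi_star(grid):
    """Return a grid of Getis-Ord Gi* z-scores for a grid of counts."""
    rows, cols = len(grid), len(grid[0])
    values = [v for row in grid for v in row]
    n = len(values)
    mean = sum(values) / n
    # Standard deviation of all cell counts (must be non-zero).
    s = math.sqrt(sum(v * v for v in values) / n - mean ** 2)
    z = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            local_sum = 0.0
            w = 0  # number of cells in the neighborhood (binary weights)
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols:
                        local_sum += grid[rr][cc]
                        w += 1
            # With binary weights, the sum of squared weights equals w.
            denom = s * math.sqrt((n * w - w * w) / (n - 1))
            z[r][c] = (local_sum - mean * w) / denom
    return z


# Made-up counts with a concentration in the upper-left corner.
grid = [
    [9, 8, 1, 0, 0],
    [8, 9, 1, 0, 0],
    [1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
z = gi_star(grid)
print(round(z[0][0], 2), round(z[4][4], 2))
```

A z-score above about 1.96 corresponds to a hot spot at roughly the 95 percent confidence level, and a score below about -1.96 to a cold spot. The sketch assumes the counts are not all identical and that the grid is larger than a single neighborhood.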

Area Hot Spot Analysis

Although hot-spot analysis is commonly used with points, it actually is an area analysis technique, where hot spots represent groups of neighboring areas that have statistically significantly higher values of some attribute than other neighboring areas in the data set.

For example, performing hot-spot analysis on the number of robberies per neighborhood gives a somewhat more dramatic map than the point hot-spot map, with a clear hot spot in the center city and a clear cold spot in wealthy northern neighborhoods like Roland Park and Wyndhurst.

Area Hot Spot Analysis

Sharing and Cleanup

When you are done with your map, make sure to save it and share it.

You should also create a new folder in your Content directory to store all the web maps and layers for this project. This will minimize clutter in your home directory.

Save and Clean Up