Spatial Analysis of Crime Point Data in ArcGIS Pro

Spatial analysis consists of "methods to study the location, distribution, and relationship of spatial phenomena" (Bäing 2014). Because geospatial data is fundamentally numeric, these methods are often based on complex statistics that can be impenetrable to most GIS users. However, the ideas behind these methods are often conceptually straightforward, and because the algorithms are implemented in software, most GIS users only need a general understanding of the capabilities and limitations of these methods to use them effectively.

This module demonstrates some spatial analysis methods that can be used with crime point data in ArcGIS Pro.

Crime Data Limitations

Some major cities make georeferenced crime location data available for analysis through their open data portals. This data has a number of limitations that you should be aware of as you perform your analysis:

Downloading Crime Data

The Chicago Police Department (CPD) provides a dataset of reported incidents of crime (with the exception of murders where data exists for each victim) that occurred in the City of Chicago since 2001. This dataset is made available through a dashboard and web map for convenient online analysis, and is also made available as raw CSV point data that can be imported into GIS. The original source of the data is the Chicago Police Department's CLEAR (Citizen Law Enforcement Analysis and Reporting) system.

The Chicago data portal uses the Socrata web app, which permits download of the data as CSV files with columns of latitude and longitude.

The app also provides a facility for filtering data. Because this data covers a variety of crimes over more than 20 years, downloading the millions of points and 1.6 gigabytes of data will likely overwhelm your computer. Therefore, in this example, a filter is used to download only data for the year 2020. Because consecutive years are used for comparison below, the process is repeated for 2019.

Downloading 2020 Chicago crime data from the Chicago Data Portal
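
If you have already downloaded a larger extract, the same kind of filter can be applied locally before import. The following is a minimal Python sketch, not part of the portal workflow. It assumes the CSV has columns named Year, Latitude, and Longitude (assumed names; check your file's header) along with the Primary Type column mentioned later in this lesson. The sample rows are made up for illustration.

```python
import csv
import io

def filter_crimes(csv_file, year, primary_type=None):
    """Yield rows for one year (and optionally one crime type) that have coordinates."""
    for row in csv.DictReader(csv_file):
        if row["Year"] != str(year):
            continue
        if primary_type and row["Primary Type"].upper() != primary_type.upper():
            continue
        # Rows without a location cannot be mapped, so skip them.
        if not row["Latitude"] or not row["Longitude"]:
            continue
        yield row

# Made-up sample standing in for a downloaded file (column names are assumptions).
sample = io.StringIO(
    "ID,Year,Primary Type,Latitude,Longitude\n"
    "1,2020,HOMICIDE,41.88,-87.63\n"
    "2,2020,THEFT,41.90,-87.65\n"
    "3,2019,HOMICIDE,41.75,-87.60\n"
    "4,2020,HOMICIDE,,\n"
)
rows_2020 = list(filter_crimes(sample, 2020, "Homicide"))
print([row["ID"] for row in rows_2020])
```

With a real file, you would pass an open file object (for example, open(path, newline="")) instead of the in-memory sample.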

Import into ArcGIS Pro

The examples in this lesson will focus on homicide data. The US FBI's Uniform Crime Reporting (UCR) Program (FBI 2020) defines "criminal homicide" as:

  1. Murder and nonnegligent manslaughter: the willful (nonnegligent) killing of one human being by another. Deaths caused by negligence, attempts to kill, assaults to kill, suicides, and accidental deaths are excluded. The program classifies justifiable homicides separately and limits the definition to: (1) the killing of a felon by a law enforcement officer in the line of duty; or (2) the killing of a felon, during the commission of a felony, by a private citizen.
  2. Manslaughter by negligence: the killing of another person through gross negligence. Deaths of persons due to their own negligence, accidental deaths not resulting from gross negligence, and traffic fatalities are not included in the category manslaughter by negligence.

Point data in CSV files with longitudes and latitudes can be imported into ArcGIS Pro and stored in the project database.

  1. Create a new map in a new project.
    • In Properties, General, rename the map so you can keep track of different visualizations during your analysis.
  2. Import the data into the project geodatabase.
    • Click Add Data, XY Point Data, which will open the XY Table To Point tool.
    • For Input Table, browse on your hard drive to find the CSV of crime points that you downloaded.
    • For Output Feature Class, give a meaningful name with no spaces or punctuation. This will copy the CSV points as new features in this feature class in the project database.
    • Make sure the X Field is set to the longitude column and the Y Field is set to the latitude column.
  3. Set a definition query to show only a specific type of crime.
    • Right-click on the new layer in the table of contents and select Properties, and Definition Query.
    • Add a New definition query.
    • Select the field containing the type of crime. For Chicago data it is Primary Type.
    • Set the relationship to is equal to and select a crime type. For this analysis we will choose Homicide.
    • Apply and select OK.
  4. Modify the projection
    • New maps inherit the projection of the first layer added. In this example, the crime data is in unprojected WGS 84 lat/long, which stretches features east-west at latitudes as far from the equator as Chicago. Changing this to a Web Mercator or state plane projection will reduce this distortion.
    • Right-click on the map in the table of contents, select Properties, and select Coordinate System.
    • Select the new projection. In this case we use Web Mercator.
  5. Repeat the import for the 2019 data
Importing into ArcGIS Pro
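
To see why the projection step matters, the sketch below estimates how much narrower a degree of longitude is than a degree of latitude at Chicago, and converts a longitude/latitude pair to Web Mercator coordinates using the standard spherical formula. This is an illustration in plain Python rather than what ArcGIS Pro runs internally; the function names and the sample coordinates (roughly downtown Chicago) are assumptions made for the example.

```python
import math

EARTH_RADIUS_M = 6378137.0  # sphere radius used by Web Mercator (WGS 84 semi-major axis)

def degree_width_ratio(lat_deg):
    """Ground width of one degree of longitude relative to one degree of latitude
    (spherical approximation)."""
    return math.cos(math.radians(lat_deg))

def lonlat_to_web_mercator(lon_deg, lat_deg):
    """Convert WGS 84 longitude/latitude in degrees to Web Mercator x/y in meters."""
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y

chicago_lat, chicago_lon = 41.88, -87.63  # approximate, for illustration only
ratio = degree_width_ratio(chicago_lat)
print(f"One degree of longitude is about {ratio:.2f} of a degree of latitude here")
print(lonlat_to_web_mercator(chicago_lon, chicago_lat))
```

Drawing degrees as if they were square therefore makes Chicago look wider than it is on the ground, which is the distortion the projection change addresses.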

Point Visualization

When analyzing crime data by types of crime, a common task is to look for areas where there are large numbers of crime incidents that have clustered together over time. Those areas can then be targeted for deployment of increased law enforcement presence or remedial social programs to help address the underlying causes of crime.

Hollow Point Symbols

Depending on the type of crime and time period, you will often have large clusters of points that make it difficult to distinguish different levels of density. One symbology change that can help in some situations is to use hollow dots. When they overlap, the edges clump together to visually indicate areas of high density.

  1. Right-click on the layer and select Symbology.
  2. Click on the symbol to change it.
  3. Select the hollow symbol from the list.
  4. If desired, select Properties and change the color and size.
Hollow point symbols

Heat Map

Heat map symbology draws clumps of point features in lighter colors so that higher densities of points appear "hotter" than areas of low density. The densities are recalculated as you zoom in or out. This is a quick symbology technique for situations where you have large numbers of points that cannot be easily distinguished.

  1. View the Catalog Pane, copy/paste your point map, and give the copied map a meaningful name.
  2. Change the Symbology to Heat Map.
Heat map symbology

Kernel Density

One challenge with heat map symbology is that it is scale dependent and changes as you zoom in and out. While this makes it good for exploratory analysis, it is less effective when you want to rigorously identify clear areas of clustering.

The kernel density tool creates a raster by calculating a distance-weighted count of the points within a certain radius (kernel) of each pixel. This gives a density surface that stays the same regardless of zoom level.

  1. Duplicate your point map and give the copied map a meaningful name.
  2. Click Analysis and Tools and search for the Kernel Density Tool.
  3. For the Input point or polyline features, select the crime point layer.
  4. For the Output raster, provide a short but meaningful name with no punctuation or spaces.
  5. Leave the Search radius empty. If you are using a projected coordinate system, this can be set to give a specific distance of influence to use for the kernel.
  6. Change Area units to square miles to display the values in crime incidents per square mile.
  7. Set the Method to geodesic because the crime data is in latitudes and longitudes.
Creating a kernel density raster

While the example above allows the tool to choose a radius, if you need more clearly defined values to compare over time, you can choose a specific distance value. This requires reprojecting the layer to a planar coordinate system (such as a state plane coordinate system). For the example above, using the Illinois West SPCS in feet and using a search radius of one mile (5280 feet) gives a similar set of values to the default.
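
The idea behind a kernel density surface can be sketched in a few lines of Python. This is a simplified illustration, not the tool's implementation: it assumes planar coordinates in feet, a fixed search radius, and a quartic kernel (an assumption made for this sketch), and it evaluates the density at a single cell center rather than building a raster. The point coordinates are made up.

```python
import math

def kernel_density(points, cell_x, cell_y, radius):
    """Distance-weighted count of points per unit area at one cell center.

    Uses a quartic kernel: points at the cell center count the most, and the
    weight falls smoothly to zero at the search radius.
    """
    total = 0.0
    r2 = radius ** 2
    for px, py in points:
        d2 = (px - cell_x) ** 2 + (py - cell_y) ** 2
        if d2 < r2:
            total += (3 / (math.pi * r2)) * (1 - d2 / r2) ** 2
    return total

FEET_PER_MILE = 5280
# Made-up incident locations in feet; the third is outside the one-mile radius.
points = [(0, 0), (1000, 0), (20000, 20000)]
per_sq_foot = kernel_density(points, 0, 0, FEET_PER_MILE)
per_sq_mile = per_sq_foot * FEET_PER_MILE ** 2
print(f"{per_sq_mile:.2f} incidents per square mile")
```

A larger radius spreads each point's influence over a wider area and produces a smoother surface, which is why the radius should be held constant when comparing time periods.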

Aggregation by Area

A common analysis technique for large numbers of points is to aggregate them into areas of some kind, and then perform visualization or analysis on the count of points in each area.

For this example, we will perform a spatial join to aggregate the points into census tracts. This layer also contains a variety of different variables from the American Community Survey that can be used to examine relationships between crime counts and underlying neighborhood conditions.

  1. Duplicate the point map and give it a meaningful name.
  2. Add Data with the layer of areas for aggregation. For this example we use the Minn 2015-2019 ACS Chicago Tracts layer of census tracts from the U of I organization.
  3. Perform the spatial join.
    • Right-click on the tract layer and select Joins and Relates and Spatial Join.
    • The Target Layer should be the layer of polygons (the tracts).
    • The Join Layer is what will be counted (the crime points).
    • Give the new layer that will be created a meaningful name.
    • Run the tool.
  4. Visualize the counts.
    • Right-click on the new layer and symbolize by Join Count.
Aggregating by census tract
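
Conceptually, the spatial join tests which polygon each point falls in and keeps a running count per polygon. The sketch below shows that logic in plain Python with a ray-casting point-in-polygon test. It is an illustration under simplifying assumptions (simple polygons without holes, made-up square "tracts" and points), not how ArcGIS Pro performs the join.

```python
def point_in_polygon(x, y, ring):
    """Ray casting: count how many polygon edges a ray to the right of the point crosses."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge spans the point's y value
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def count_points_by_area(areas, points):
    """Return a join count for each named area (points outside all areas are ignored)."""
    counts = {name: 0 for name in areas}
    for x, y in points:
        for name, ring in areas.items():
            if point_in_polygon(x, y, ring):
                counts[name] += 1
                break
    return counts

# Two made-up square areas and four made-up points.
areas = {
    "A": [(0, 0), (1, 0), (1, 1), (0, 1)],
    "B": [(1, 0), (2, 0), (2, 1), (1, 1)],
}
points = [(0.5, 0.5), (0.2, 0.8), (1.5, 0.5), (3.0, 3.0)]
counts = count_points_by_area(areas, points)
print(counts)
```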

Rates per Capita

Although the US Census Bureau generally defines the borders of census tracts to encompass around 4,000 residents, the number of people that live in each census tract can vary widely, especially in commercial or rural areas where few people live. Accordingly, low crime counts can make low-population areas seem safer, when the data may just indicate that there are fewer crimes because there are fewer potential victims.

One way to address this issue is to divide the crime counts by the population of each tract to get per capita crime rates using the expression calculator. This makes it possible to more fairly compare different areas. Because homicide is (thankfully) fairly rare, multiplying by 1,000 to give homicides per 1,000 residents will be easier to interpret.

Visualizing crime rates
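
The arithmetic behind the rate is simple, but it needs a guard for tracts with no residents. A minimal Python sketch follows; the function name and the sample counts and populations are made up for illustration.

```python
def rate_per_1000(count, population):
    """Incidents per 1,000 residents, or None where there is no resident population."""
    if not population:
        return None  # avoid dividing by zero in unpopulated tracts
    return 1000 * count / population

print(rate_per_1000(4, 4000))  # a typical tract
print(rate_per_1000(1, 50))    # one incident in a tract with very few residents
print(rate_per_1000(2, 0))     # no residents: the rate is undefined
```

The second case shows how a single incident in a sparsely populated tract produces a very high rate, so rates in such tracts should be interpreted with care.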

Hot Spots

While the counts and rates displayed above by tract may be adequate for your needs, unusual events (like an isolated mass shooting), a random spike in a low population area (the small numbers problem), or displacement of locations for anonymization can cause individual area numbers to be artificially low or high. Aggregation by area is also subject to the modifiable areal unit problem (MAUP) since using a different set of areas that are larger, smaller, or have different boundaries can result in significantly different results.

With data like this, it can be helpful to analyze areas in the context of their immediate neighbors and the area as a whole.

The Hot Spot Analysis tool calculates the Getis-Ord Gi* statistic (pronounced G-i-star) for each feature in a dataset. The resultant z-scores and p-values tell you where features with either high or low values cluster spatially. This tool works by looking at each feature within the context of neighboring features. To be a statistically significant hot spot, a feature will have a high value and be surrounded by other features with high values as well. The local sum for a feature and its neighbors is compared proportionally to the sum of all features; when the local sum is very different from the expected local sum, and when that difference is too large to be the result of random chance, a statistically significant z-score results.

Use of this tool will help you assess whether the patterns you see in your data are the result of real spatial processes at work or just the result of random chance. The use of calculated probabilities makes this tool more rigorously defensible in research than simple visual inspection of points or the arbitrary radius choices used with kernel density.

  1. Duplicate your tract count map and rename it so you preserve your prior visualization.
  2. Click Analysis and Tools and search for the Hot Spot Analysis (Getis-Ord Gi*) tool.
  3. For the Input Feature Class, select the tract layer with crime counts that you created above.
  4. For the Input Field, use the Join_Count field created by the spatial join.
  5. For Output Feature Class, give a short but meaningful name to store the analysis results in the project geodatabase.
  6. Use the defaults for everything else and Run.

The resulting layer is colored red for hot spots where there are statistically significant clusters of higher levels of crime. The areas are colored blue for cold spots where there are statistically significant clusters of low levels of crime.

Getis-Ord Gi* hot spot analysis
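
To make the comparison of local sums to the overall sum more concrete, the following Python sketch computes a Gi* z-score for each feature using binary weights (a feature and its listed neighbors each get a weight of one). It is a simplified illustration: the tool offers several ways of defining neighbors and applies corrections that are omitted here, and the row of eight made-up "tracts" and their counts are assumptions for the example.

```python
import math

def getis_ord_gi_star(values, neighbors):
    """Gi* z-scores with binary weights; neighbors[i] lists the indices adjacent to i."""
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum(v * v for v in values) / n - mean ** 2)
    z_scores = []
    for i in range(n):
        local = set(neighbors[i]) | {i}   # the "star": the feature itself is included
        w_sum = len(local)                # sum of the 0/1 weights
        w_sq_sum = len(local)             # sum of squared weights (same for 0/1)
        local_sum = sum(values[j] for j in local)
        numerator = local_sum - mean * w_sum
        denominator = s * math.sqrt((n * w_sq_sum - w_sum ** 2) / (n - 1))
        z_scores.append(numerator / denominator)
    return z_scores

# Eight made-up tracts in a row; each is a neighbor of the tract on either side.
values = [9, 8, 9, 1, 0, 1, 0, 1]
n = len(values)
neighbors = [[j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)]
z = getis_ord_gi_star(values, neighbors)
print([round(score, 2) for score in z])
```

In this example the second tract has a high count and high-count neighbors, so its z-score is strongly positive, while tracts at the low end of the row have negative scores. The sketch would fail if every value were identical (the standard deviation would be zero), a case it does not handle.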

Outlier Analysis

A similar analysis technique is cluster and outlier analysis (Anselin Local Moran's I). Like the hot spot tool, this tool identifies clusters. But this tool also points out outliers of high values surrounded by low values or outlier low values surrounded by high values.

Because there is some level of randomness to infrequent social phenomena like diseases or crime, outliers may simply be random occurrences. However, a high crime outlier in an area that is generally low crime may deserve further investigation as a possible new crime cluster. Conversely, a low crime outlier in an otherwise high crime cluster may be worth attention to see what is keeping crime low there, and how those practices might be applied to the surrounding high crime cluster.

  1. Duplicate your tract count map and rename it so you preserve your prior visualization.
  2. Under Analysis and Tools, search for the Cluster and Outlier tool.
  3. The Input Feature Class is the tracts with counts of crime.
  4. The Input Field is the crime count variable.
  5. Give the Output Feature Class a meaningful name.
  6. Run the tool.
The Anselin local Moran's I tool to detect clusters and outliers
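
The cluster and outlier categories can be illustrated by comparing each feature's value, and the average value of its neighbors, to the overall mean. The Python sketch below does only that classification (HH and LL for clusters, HL and LH for outliers). It is a simplification: it omits the Local Moran's I statistic itself and the significance testing the tool performs, so it labels every feature rather than only the statistically significant ones. The values and neighbor lists are made up.

```python
def moran_quadrants(values, neighbors):
    """Label each feature by its own value and its neighbors' mean, relative to the overall mean."""
    mean = sum(values) / len(values)
    labels = []
    for i, value in enumerate(values):
        lag = sum(values[j] for j in neighbors[i]) / len(neighbors[i])  # neighbors' average
        labels.append(("H" if value > mean else "L") + ("H" if lag > mean else "L"))
    return labels

# Eight made-up tracts in a row; each is a neighbor of the tract on either side.
values = [9, 8, 9, 2, 0, 8, 0, 1]
n = len(values)
neighbors = [[j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)]
labels = moran_quadrants(values, neighbors)
print(labels)
```

Here the sixth tract (count of 8 between two tracts with zero) is labeled HL, the kind of high value surrounded by low values described above.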

Changes over Time

Changes in the density of crime over time can be useful for assessing the success of past interventions, and identifying emerging areas that may need additional intervention in the future.

  1. Duplicate your tract count map and rename it so you preserve your prior visualization.
  2. Perform a second spatial join to get crime counts from the previous period.
  3. Rename the JOIN_COUNT fields so it is clear what years they represent. The leftmost JOIN_COUNT field will be the one for the most recent join.
  4. Change the Symbology for the layer to an expression that shows the increase or decrease in murders by showing the 2020 crime rates minus the 2019 rates.
  5. Customize the Color Scheme to a diverging color scheme so places where there were fewer homicides are blue and places where there were more are red.
Comparing time periods using kernel density contour lines
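
The symbology expression in step 4 amounts to a subtraction per tract. A minimal Python sketch of that calculation is shown below; the tract identifiers and counts are made up, and tracts missing from one year are treated as having a count of zero (an assumption for this sketch).

```python
def count_change(counts_new, counts_old):
    """Later-period count minus earlier-period count for every tract in either period."""
    tracts = sorted(set(counts_new) | set(counts_old))
    return {t: counts_new.get(t, 0) - counts_old.get(t, 0) for t in tracts}

# Made-up homicide counts by tract.
counts_2020 = {"tract_a": 5, "tract_b": 1, "tract_c": 2}
counts_2019 = {"tract_a": 3, "tract_b": 4, "tract_d": 1}
change = count_change(counts_2020, counts_2019)
print(change)
```

Positive values (increases) would be drawn in red and negative values (decreases) in blue under the diverging color scheme described in step 5.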

Crime and Social Variables

While having a descriptive understanding of where crime has occurred is useful, often we want to gain some understanding of why it occurred where it occurred (explanatory model), or be able to have some idea about where it could occur in the future (predictive model).

Spatial analysis provides a wide variety of techniques for relating variables to each other in order to build models. While crime, like most social phenomena, is driven by complex chains of causation and elements of randomness, it is possible to build models that offer insights into these processes. The examples below are highly simplified and do not provide a particularly strong fit with crime, but they are provided to offer some insights into the types of modeling that you can do with GIS.

Social Disorganization Theory

Social disorganization theory is a social ecology theory that asserts that crime is the result of an "inability of a community structure to realize the common values of its residents and maintain effective social controls" (Lersch 2004, 46; Sampson and Groves 1989, 177). Social disorganization theory has its roots in research performed in the School of Sociology at the University of Chicago beginning in the 1920s, most notably by Robert Park, Ernest Burgess, Clifford Shaw, and Henry McKay. Accordingly, these ideas are often referred to as Chicago School ideas.

Although the complexity of human society makes analysis of social disorganization similarly complex, a handful of neighborhood characteristics are theorized to lead to breakdowns in formal and informal social controls, and tend to correlate with higher levels of crime in specific areas (Lersch 2004, 50-53; 148; Sampson and Raudenbush 1999; 2001):

Although the American Community Survey tract data used in this example does not contain variables that directly measure the social disorganization theory factors, it does contain potential proxy variables that can be assumed to correlate with those factors. Those variables include:

Bivariate Correlation Charts

A quick way to look for relationships between variables is to use an x/y scatter chart to look for bivariate correlation (correlation between two variables). If there is correlation, the points in an X/Y scatter chart will form something like a diagonal line upwards (positive) or downwards (negative) across the chart. If there is no correlation, the points will be distributed around the chart area.

The R² value indicates the strength of the correlation. Zero means no correlation and one means perfect correlation.
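
For a two-variable scatter chart with a linear trend line, this value is the square of the Pearson correlation coefficient. The Python sketch below computes it from scratch for illustration; the function name and the small made-up data sets are assumptions, and the function does not handle a variable that is constant across all features.

```python
import math

def r_squared(xs, ys):
    """Square of the Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    r = cov / math.sqrt(var_x * var_y)
    return r * r

print(r_squared([1, 2, 3, 4], [2, 4, 6, 8]))    # points on a rising line
print(r_squared([1, 2, 3, 4], [8, 6, 4, 2]))    # points on a falling line
print(r_squared([1, 2, 3, 4], [1, -1, -1, 1]))  # no linear relationship
```

Because the correlation is squared, the value does not show whether the relationship is positive or negative; the direction has to be read from the slope of the points in the chart.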

  1. Duplicate your tract count map and rename it so you preserve your prior visualization.
  2. Select the layer in the Contents pane, and then select Data, Create Chart, and Scatter Plot.
  3. Set the X and Y variables to examine correlations.

As shown in the video, the R² values for all four variables are fairly low (0.16 or less) and the X/Y scatter charts show a spreading pattern (heteroskedasticity) that indicates that high crime areas tend to be low income, but low crime areas can be both low and high income. This is consistent with crime/adversity mismatch research that shows no consistent relationship between socioeconomic disadvantage and levels of violence (Manguel 2021).

Creating X/Y scatter charts

Charts can be added to layouts along with map frames to create descriptive charts.

Creating X/Y scatter charts

Ordinary Least Squares (OLS) Regression

Regression is a statistical technique that involves creating a formula that models a dependent variable in terms of one or more independent (explanatory) variables. OLS regression is a stronger analysis technique than bivariate correlation because OLS considers the effects of multiple variables, which reflects the reality that complex social and environmental phenomena are often explained as the confluence of multiple factors.

There are a wide variety of different regression techniques that address the wide variety of ways in which independent variables can interact to explain the dependent variable.

One of the simplest techniques is ordinary least squares linear regression, which combines the explanatory variables into a single linear formula and adjusts the coefficients on each variable to get the best fit.

  1. Duplicate your tract count map and rename it so you preserve your prior visualization.
  2. Under Analysis, Tools search for Ordinary Least Squares.
  3. The Input Feature Class should be the tracts.
  4. The Unique ID Field should be TARGET_FID.
  5. For Output Feature Class give a meaningful name.
  6. The Dependent Variable should be the JOIN_COUNT with the number of homicides per tract.
  7. Select the Explanatory Variables
  8. The Output Report File is an optional file where the results of the analysis can be written.
  9. After running the tool the Results list the algorithm outputs that can be used to evaluate how well the variables work in the model.
  10. The default symbology of the new layer is the residuals (where the model differs from the actual values).
  11. You can also symbolize by the Expected variable to see how the model predictions do or do not match the actual values.
Ordinary least squares linear regression
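
The fitting step can be illustrated without GIS software. The Python sketch below solves the least squares normal equations for an intercept and any number of explanatory variables, then computes residuals. It is a bare-bones illustration, not the tool's implementation: it produces none of the diagnostics in the tool's report, and the data are made up so that the dependent variable follows an exact linear formula.

```python
def ols_fit(rows, y):
    """Return [intercept, b1, b2, ...] minimizing squared error (normal equations)."""
    X = [[1.0] + list(r) for r in rows]  # prepend a column of ones for the intercept
    k = len(X[0])
    A = [[sum(xi[a] * xi[b] for xi in X) for b in range(k)] for a in range(k)]
    rhs = [sum(xi[a] * yi for xi, yi in zip(X, y)) for a in range(k)]
    # Gaussian elimination with partial pivoting, then back substitution.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        rhs[col], rhs[pivot] = rhs[pivot], rhs[col]
        for r in range(col + 1, k):
            factor = A[r][col] / A[col][col]
            for c in range(col, k):
                A[r][c] -= factor * A[col][c]
            rhs[r] -= factor * rhs[col]
    coef = [0.0] * k
    for r in reversed(range(k)):
        coef[r] = (rhs[r] - sum(A[r][c] * coef[c] for c in range(r + 1, k))) / A[r][r]
    return coef

def predict(coef, row):
    return coef[0] + sum(b * x for b, x in zip(coef[1:], row))

# Made-up data generated from y = 1 + 2*x1 - 0.5*x2.
rows = [[1, 2], [2, 1], [3, 5], [4, 3], [5, 8]]
y = [1 + 2 * x1 - 0.5 * x2 for x1, x2 in rows]
coef = ols_fit(rows, y)
residuals = [yi - predict(coef, r) for yi, r in zip(y, rows)]
print([round(c, 3) for c in coef])
```

With real tract data the residuals will not be zero, and mapping them (as the tool's default symbology does) shows where the model over- or under-predicts.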

Geographically Weighted Regression

Ordinary least squares linear regression is a non-spatial technique that does not take into account autocorrelation and local variation across space. When working with data that is highly clustered (autocorrelated), this can produce results that suggest explanatory variables are more or less important than they actually are.

Geographically weighted regression (GWR) is a powerful exploratory method in spatial data analysis. It serves for detecting local variations in spatial behavior and understanding local details, which may be masked by global regression models. Unlike SR, where regression coefficients for each independent variable and the intercept are obtained for the whole study region, in GWR, regression coefficients are computed for every spatial zone. Therefore, the regression coefficients can be mapped and the appropriateness of stationarity assumption in the conventional regression analyses can be checked. (Duzgun and Kemec 2008).

As the spatial data contain autocorrelation, the lack of ability to include this property in non-SR led analysts to develop SR models for better treatment of spatial data. In this way, the elimination of the main shortcomings of non-SR, which are assumptions of identically and independently distributed (i.i.d.) explanatory variables (Xᵢ's) and uncorrelated error terms, is attempted by relaxing the regression method with the allowance of spatial autocorrelation.

  1. Duplicate your tract count map and rename it so you preserve your prior visualization.
  2. Under Analysis, Tools search for Geographically Weighted Regression.
  3. For Input Features, choose the tract layer with counts of crime incidents.
  4. For Dependent Variable, choose the Join_Count with the number of incidents per tract.
  5. For the Model Type, choose Count (Poisson) because number of incidents is a count.
  6. Select the Explanatory Variables.
  7. For Output Features give a meaningful name.
  8. For the Neighborhood Type, choose Distance band.
  9. For the Neighborhood Selection Method, choose Golden search.
  10. After running the tool, you can View Details to see the model fit results.
  11. The created feature layer contains a variable for Deviance Residuals (to see where the model doesn't fit) and a variable for Predicted to see the predicted incidence counts.
Geographically weighted regression
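
The core idea of fitting a separate regression at every location, with nearby features weighted more heavily than distant ones, can be sketched in plain Python. This is a simplified illustration and differs from the steps above in important ways: it uses a linear model with one explanatory variable rather than the Count (Poisson) model, a fixed bandwidth rather than one chosen by Golden search, and a Gaussian distance weighting that is an assumption of this sketch. The data are made up so that the relationship reverses from one side of the study area to the other.

```python
import math

def gwr_local_coefficients(coords, x, y, bandwidth):
    """Return an (intercept, slope) pair fitted at each location by weighted least squares."""
    results = []
    for cx, cy in coords:
        # Gaussian distance decay: nearby features get weights near 1, distant ones near 0.
        w = [math.exp(-0.5 * (math.hypot(px - cx, py - cy) / bandwidth) ** 2)
             for px, py in coords]
        sw = sum(w)
        mean_x = sum(wi * xi for wi, xi in zip(w, x)) / sw
        mean_y = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - mean_x) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - mean_x) * (yi - mean_y) for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx
        results.append((mean_y - slope * mean_x, slope))
    return results

# Ten made-up locations in a row. In the west half y rises with x; in the east half it falls.
coords = [(i, 0) for i in range(10)]
x = [1, 3, 2, 4, 1, 3, 2, 4, 1, 3]
y = [2 * xi if i < 5 else -xi for i, xi in enumerate(x)]
local = gwr_local_coefficients(coords, x, y, bandwidth=1.5)
print([round(slope, 2) for _, slope in local])
```

A single global regression on these data would report one compromise slope, while the local slopes change sign across the study area. Mapping coefficients in this way is what reveals the local variation described above.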

Ground Truth

In 1982, George L. Kelling and James Q. Wilson published a highly influential article entitled Broken Windows: The police and neighborhood safety that asserted a connection between crime and disorder as typified by the physical care of a neighborhood. If a window on a building is broken and no one fixes it, that is a sign that the people in the neighborhood do not care about their community. This leads to a breakdown in the informal community controls that normally hold crime in check.

Further research has suggested that broken windows is a fallacious confusion of correlation with causation (Thacher 2004), and the policing practices that follow from broken windows theory (like "stop-and-frisk") have been vigorously critiqued as counterproductive and socially unjust. However, the theory does lead us to ask questions about the relationship between crime and the built environment, something more fully fleshed out in urban planning practices like crime prevention through environmental design (CPTED) (Jeffery 1971).

Google Street View allows you to take a virtual walking tour of a neighborhood, and street view can be used to assess whether analysis performed in GIS is consistent with the conditions on the ground (ground truth). Care should be used in such qualitative analysis to mitigate the effects of confirmation bias that can reinforce preconceived notions and stereotypes rather than validate analysis.

You can copy coordinates from a map in ArcGIS Pro by right-clicking on the location in the map and selecting Copy Coordinates, and then searching for that location in Google Maps.

Visiting a location virtually in Google Street View
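
If you have many locations to check, links can be generated from coordinates instead of pasting them one at a time. The Python sketch below builds links using the URL patterns described in the Google Maps URLs documentation. The function names and the sample coordinates are assumptions for illustration, and the coordinates must be decimal degrees in latitude, longitude order, which may differ from the format ArcGIS Pro copies depending on your display units.

```python
from urllib.parse import urlencode

def google_maps_search_url(lat, lon):
    """Link that opens Google Maps at a latitude/longitude."""
    params = {"api": 1, "query": f"{lat},{lon}"}
    return "https://www.google.com/maps/search/?" + urlencode(params)

def street_view_url(lat, lon):
    """Link that opens the nearest Street View panorama, where one is available."""
    params = {"api": 1, "map_action": "pano", "viewpoint": f"{lat},{lon}"}
    return "https://www.google.com/maps/@?" + urlencode(params)

# Made-up sample location (roughly downtown Chicago).
search_link = google_maps_search_url(41.8781, -87.6298)
pano_link = street_view_url(41.8781, -87.6298)
print(search_link)
print(pano_link)
```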