Spatial Analysis of Crime Point Data in ArcGIS Pro
Spatial analysis consists of "methods to study the location, distribution, and relationship of spatial phenomena" (Bäing 2014).
Because geospatial data is fundamentally quantitative, these methods are often based on complex statistics that can be impenetrable to most GIS users. However, the fundamental ideas behind these methods are often conceptually straightforward, and because the algorithms are implemented in software, most GIS users only need a general understanding of the capabilities and limitations of these methods in order to use them effectively.
This module demonstrates some spatial analysis methods that can be used with point data in ArcGIS Pro. While this tutorial uses Chicago crime data as an example, these techniques are applicable to any data consisting of points for a single type of phenomenon.
Crime Data Limitations
Some major cities make georeferenced crime location data available for analysis through their open data portals. This data has a number of limitations that you should be aware of as you perform your analysis:
- Data Recording Errors: The data is often based upon preliminary information supplied by reporting parties that has not been verified, is subject to change, and may be corrupted by mechanical or human error.
- Classification Problems: Preliminary crime classifications may be changed at a later date based upon additional investigation. Some crime classifications are ambiguous or overly broad. For example, burglary encompasses both residential and commercial burglary, but the groups committing those crimes differ, and different strategies are needed to address each type.
- Anonymization: In order to protect the privacy of crime victims, addresses are shown at the block level only and specific locations are not identified. Accordingly, your analysis will need to be at a broad enough scale that this intentional spatial inaccuracy will not be a major factor.
- Geocoding Errors: As with all locations geocoded from place names, geocoding errors may give latitudes and longitudes that are not at the specified location. In some cases, these errors are obvious, such as when the points are at locations outside the police department's jurisdiction, or when coordinates are missing completely. However, an unknown number of points may also be placed at wrong locations. Most of these techniques work with large groups of points, so the effects of individual geocoding errors should be minimized.
- Temporal Errors: The dates and times in the data may not reflect the actual incident time, especially with types of crime (like domestic abuse) that occur over periods of time. Accordingly, comparisons over time may be skewed one way or the other.
- Underreporting: While records of highly visible crimes like murder and arson are fairly complete, more than half of violent crimes in general and three-quarters of rapes are not reported to police (Bureau of Justice Statistics 2012; RAINN 2018). In neighborhoods with poor community/police relations, victims may be less inclined to report crime to the police. Accordingly, analysis based on incident data from police departments will be incomplete for some crimes.
Acquire and Process the Data
Crime Data
The Chicago Police Department (CPD) provides a dataset of reported incidents of crime (with the exception of murders where data exists for each victim) that occurred in the City of Chicago since 2001. This dataset is made available through a dashboard and web map for convenient online analysis, and is also made available as raw CSV point data that can be imported into GIS. The original source of the data is the Chicago Police Department's CLEAR (Citizen Law Enforcement Analysis and Reporting) system.
The Chicago data portal uses the Socrata web app, which permits download of the data as CSV files with columns of latitude and longitude.
The app also provides a facility for filtering data. Because this data covers a variety of crimes over more than 20 years, downloading all of the millions of points (1.6 gigabytes) will result in a dataset that is very slow to analyze with ArcGIS Pro.
Therefore, in this example, a filter is used to download only data for the years 2019 and 2023.
Point data in CSV files with longitudes and latitudes can be imported into ArcGIS Pro and stored in the project database.
- Create a new project and a new map.
- Download a CSV file of crime data from the Chicago Data Portal, filtered to show only 2019 and 2023.
- Under Analysis, Tools, open the XY Table To Point tool to import the points into the project geodatabase.
- Input Table: browse on your local machine to find the CSV of crime points that you downloaded.
- Output Feature Class: Give a meaningful name with no spaces or punctuation (Crime_Points).
- X Field and Y Field: Make sure X Field is set to the longitude column and Y Field is set to the latitude column.
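If the portal filter is unavailable, or you want to check a download before importing it, the same filtering can be sketched in plain Python. This is only an illustration: the column names Year, Latitude, and Longitude are assumptions about the CSV header, and the sample rows are made up. Note that longitude is the x coordinate and latitude is the y coordinate.

```python
import csv
import io

# Assumed column names; check them against the header of your download.
YEAR_COL, LAT_COL, LON_COL = "Year", "Latitude", "Longitude"

def filter_rows(csv_text, years):
    """Keep rows from the wanted years that have numeric coordinates."""
    kept = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row[YEAR_COL] not in years:
            continue
        try:
            lat, lon = float(row[LAT_COL]), float(row[LON_COL])
        except ValueError:
            continue  # coordinates are missing or not numeric
        kept.append({**row, LAT_COL: lat, LON_COL: lon})
    return kept

# Made-up sample rows for illustration only.
sample = """ID,Year,Latitude,Longitude
1,2019,41.88,-87.63
2,2021,41.90,-87.65
3,2023,,
4,2023,41.75,-87.60
"""

rows = filter_rows(sample, {"2019", "2023"})
print([r["ID"] for r in rows])  # rows 1 and 4 survive the filter
```

Row 2 is dropped because of its year and row 3 because its coordinates are missing, which is the kind of record that would otherwise import as a point with no location.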
Neighborhoods
While neighborhoods are vernacular areas that often have unclear and contested boundaries, cities commonly create data files for neighborhoods that define unofficial boundaries that are useful for reference and context.
The City of Chicago makes neighborhood boundaries available as shapefiles that can be downloaded, unzipped, and imported into ArcGIS Pro.
- Download the zipped shapefile of neighborhood boundaries from the Chicago Data Portal.
- Use Windows File Explorer to extract the contents of the .zip file so they are visible to ArcGIS Pro.
- Under Analysis, Tools open the Export Features tool to import the boundaries into the project geodatabase.
- Input Features: Navigate to the .shp file.
- Output Features: Navigate to the project geodatabase and specify a new feature class name (Neighborhood_Areas)
Demographics
For this example, we use demographic data from the US Census Bureau's 2015-2019 American Community Survey five-year estimates. For convenience, we will use the Minn 2015-2019 ACS Tracts feature service from the University of Illinois ArcGIS Online organization.
Under Analysis, Tools open the Export Features tool to import the boundaries into the project geodatabase.
- Input Features: Under ArcGIS Online, find the Minn 2015-2019 ACS Tracts feature service.
- Output Feature Class: Navigate to a new name in your project geodatabase (Census_Tracts)
- Expression: Add a filter to import only Illinois (ST = 'IL').
Analyze the Data
ModelBuilder
ModelBuilder is a visual programming language in ArcGIS Pro that allows you to use a graphical editor to create custom tools that automate complex, tedious, or repetitive tasks that follow a consistent step-by-step workflow.
Using ModelBuilder, you graphically chain together sequences of tools from the toolbox. This will be useful for this example because you will be executing a long sequence of tools, and using them in a ModelBuilder diagram will both make it easier to keep track of what you are doing and will allow you to easily rerun the analysis if you need to modify or fix one step in the analysis.
To start a new ModelBuilder diagram, on the Analysis ribbon, select ModelBuilder.
Crime Points
The example in this tutorial focuses on robberies. The Chicago Police Department (2024) defines "robbery" as:
The taking or attempting to take anything of value from the care, custody, or control of a person or persons by force or threat of force or violence and/or by putting the victim in fear, including attempted offenses.
To facilitate analysis, we will separate the crime data into separate feature classes of robberies in 2019 and robberies in 2023.
- Add the Export Features tool to your diagram.
- Input Features: Crime_Points (imported above)
- Output Feature Class: Crime_2019
- Expression: Add a filter to export only crimes for the analysis type in 2019. Also filter by latitude greater than 41 to exclude incorrectly geocoded points.
- Add a second Export Features tool to your diagram.
- Input Features: Crime_Points (imported above)
- Output Feature Class: Crime_2023
- Expression: Add a filter to export only crimes for the analysis type in 2023. Also filter by latitude greater than 41 to exclude incorrectly geocoded points.
- Insert a new Map.
- Under Properties, rename the map to something meaningful (Crime_2023)
- On your ModelBuilder diagram, right-click on the Crime_2023 feature class and Add to Display.
- Change the Symbology for your points to semi-transparent hollow dots that will make it possible to distinguish areas of high and low density.
- Under Properties modify the map Coordinate System to use a cartographically appropriate projection (Web Mercator).
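The Expression filters in the two Export Features steps above combine three conditions: crime type, year, and a minimum latitude. The sketch below builds such a filter as a SQL-style where clause. The field names Primary_Type, Year, and Latitude and the value ROBBERY are assumptions; use whatever names and values appear in your imported Crime_Points attribute table.

```python
def crime_filter(crime_type, year, min_latitude=41.0,
                 type_field="Primary_Type", year_field="Year",
                 lat_field="Latitude"):
    """Build a SQL-style where clause for one crime type and one year.

    All field names are hypothetical defaults, not confirmed names from
    the Chicago dataset.
    """
    safe_type = crime_type.replace("'", "''")  # escape single quotes
    return (f"{type_field} = '{safe_type}' "
            f"AND {year_field} = {int(year)} "
            f"AND {lat_field} > {float(min_latitude)}")

print(crime_filter("ROBBERY", 2019))
print(crime_filter("ROBBERY", 2023))
```

Keeping the clause in one function means the 2019 and 2023 exports differ only in the year, which makes it harder for the two filters to drift apart.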
Viewing Statistics
You can Explore Statistics by selecting columns in the Attribute Table.
Save ModelBuilder
You should periodically save your ModelBuilder diagrams from the ModelBuilder ribbon as you work on your project. This will mitigate loss of work if the software crashes.
Note that the ModelBuilder Save button is separate from the Save Project button at the top of the screen for the project as a whole.
Make sure to Save the ModelBuilder diagram before exporting a project package, or your changes may not be saved in the package.
Neighborhood Crime Counts
A spatial join is a join where data from a join data set is copied into a target data set based on proximity of features in the two data sets.
Spatial joins can also be used to count the number of points within polygons. In this step, we use a spatial join to get counts of crime points within neighborhood polygons.
- Add the Join Features tool to your diagram to perform a spatial join to get neighborhood crime counts for 2019.
- Target Layer: Neighborhood_Areas
- Join Layer: Crime_2019
- Output Name: Counts_2019
- Spatial Relationship: Intersects
- Add the Alter Field tool to rename the new Count.
- Input Table: Counts_2019
- Field Name: COUNT
- New Field Name: Count_2019
- Add the Join Features tool to your diagram to perform a spatial join to get neighborhood crime counts for 2023.
- Target Layer: Counts_2019
- Join Layer: Crime_2023
- Output Name: Counts_2023
- Spatial Relationship: Intersects
- Add the Alter Field tool to rename the new Count.
- Input Table: Counts_2023
- Field Name: Count
- New Field Name: Count_2023
- Insert a new Map.
- Under Properties, give the map a meaningful name (Counts_2023)
- On your ModelBuilder diagram, right-click on the Counts_2023 feature class and Add to Display.
- Symbolize by the crime count in the color palette of your choosing.
- Modify the Blending to Multiply so the base map is visible through the layer.
- View the Attribute Table if you wish to know the neighborhoods with the highest and lowest crime counts.
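Conceptually, counting points with a spatial join is a point-in-polygon test repeated for every point and polygon. The sketch below uses a simple ray-casting test on flat x/y coordinates. It illustrates the idea only and is not how ArcGIS Pro implements the Join Features tool; the two square "neighborhoods" and the points are made up, and points that fall exactly on a boundary are not handled specially.

```python
def point_in_polygon(x, y, ring):
    """Ray-casting test; ring is a list of (x, y) vertices."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside  # the ray crossed one more edge
    return inside

def count_points(polygons, points):
    """Count how many points fall inside each named polygon."""
    counts = {name: 0 for name in polygons}
    for x, y in points:
        for name, ring in polygons.items():
            if point_in_polygon(x, y, ring):
                counts[name] += 1
    return counts

# Hypothetical neighborhoods and incident points.
polygons = {
    "West": [(0, 0), (2, 0), (2, 2), (0, 2)],
    "East": [(2, 0), (4, 0), (4, 2), (2, 2)],
}
points = [(1, 1), (0.5, 1.5), (3, 1), (5, 5)]

counts = count_points(polygons, points)
print(counts)
```

The point at (5, 5) lies outside both polygons and is not counted anywhere, just as incorrectly geocoded points outside every neighborhood drop out of the neighborhood counts.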
Neighborhood Demographics
Demographics are "the statistical characteristics of human populations (such as age or income)" (Merriam-Webster 2024).
To analyze the relationship of crime to community characteristics, we need demographic data for the neighborhoods.
The neighborhood data from the City of Chicago contains only boundaries, so we will have to use another spatial join to aggregate that information with census tract demographic data from the US Census Bureau's American Community Survey.
- Add the Feature to Point tool to your diagram. This will convert the tract data to centroid points so that tracts that overlap neighborhood boundaries are only counted in one neighborhood.
- Input Features: Census_Tracts (imported above)
- Output Feature Class: Tract_Centroids
- Add the Join Features tool to your diagram.
- Target Layer: Counts_2023
- Join Layer: Tract_Centroids
- Output Name: Demographics
- Spatial Relationship: Intersects
- Summary Fields: Add fields that will be transferred from the tract data along with how the software should handle multiple join features (tracts) within a single target feature (neighborhood)
- Total Population (sum)
- Median Household Income (mean)
- Median Age (mean)
- Percent Foreign Born (mean)
- Insert a new Map.
- Under Properties, give the map a meaningful name (Demographics)
- On your ModelBuilder diagram, right-click on the Demographics feature class and Add to Display.
- Symbolize by the variable of your choice in the color palette of your choosing.
- Modify the Blending to Multiply so the base map is visible through the layer.
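The Summary Fields step above amounts to grouping tract centroids by the neighborhood that contains them, then summing or averaging each field. A minimal sketch with made-up tract values follows; the dictionary keys are hypothetical names, not fields from the ACS feature service.

```python
from collections import defaultdict

def summarize(tracts):
    """Aggregate tract rows to neighborhoods: sum population, average income."""
    groups = defaultdict(list)
    for tract in tracts:
        groups[tract["neighborhood"]].append(tract)
    summary = {}
    for name, members in groups.items():
        summary[name] = {
            "total_population": sum(m["population"] for m in members),
            "mean_median_income": (
                sum(m["income"] for m in members) / len(members)
            ),
        }
    return summary

# Made-up tract centroids already assigned to neighborhoods.
tracts = [
    {"neighborhood": "A", "population": 1000, "income": 40000},
    {"neighborhood": "A", "population": 3000, "income": 60000},
    {"neighborhood": "B", "population": 2000, "income": 55000},
]

summary = summarize(tracts)
print(summary["A"])
```

Population is summed because it is a count, while the median fields are averaged. The average of tract medians is an unweighted approximation, not a true neighborhood median.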
Neighborhood Rates
Neighborhoods vary by population, and places with more people could be expected to have more crime, so maps of crime counts often become simple maps of population that obscure where crime is actually a more serious problem.
Normalization is the adjustment of variable values to a common scale so they are comparable across space and time (Wikipedia 2023).
In this example we normalize crime counts by dividing by population to get crime rates that are comparable across different neighborhoods.
The small numbers problem occurs when small changes in counts create exceptionally high changes in rates when the population or incidence counts are small (Taklar et al. 2009). In this case, because the number of reported robberies per neighborhood is often small (there was one reported robbery in the Museum Campus neighborhood), the small numbers problem can cause the results to overstate or understate the severity of the condition.
- Add an Export Features tool to your diagram. You need to duplicate the Neighborhoods feature class because ModelBuilder will not add the same created feature class to multiple maps.
- Input Features: Demographics
- Output Feature Class: Rates_2023
- Add the Calculate Field tool to your diagram.
- Input Table: Rates_2023
- Field Type: Double (change this before adding the new name)
- Field Name: Rate_2023
- Expression: Divide Count_2023 by Total_Population and multiply by 1000 to get the rate per 1k.
- Insert a new Map.
- Under Properties, give the map a meaningful name (Rates_2023)
- On your ModelBuilder diagram, right-click on the updated Rates_2023 feature class and Add to Display.
- Symbolize by the Rate_2023 in the color palette of your choosing.
- Modify the Blending to Multiply so the base map is visible through the layer.
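The rate calculation and the small numbers problem can be illustrated with a few lines of Python. The populations and counts below are invented for the example, and returning None for a missing or zero population is a choice made for this sketch rather than the behavior of the Calculate Field tool.

```python
def rate_per_1000(count, population):
    """Incidents per 1,000 residents, or None when population is unusable."""
    if not population:
        return None  # avoid dividing by zero or by a missing value
    return count / population * 1000

# Hypothetical large neighborhood: one extra incident barely moves the rate.
large_before = rate_per_1000(500, 50000)
large_after = rate_per_1000(501, 50000)

# Hypothetical tiny neighborhood: one extra incident doubles the rate.
small_before = rate_per_1000(1, 50)
small_after = rate_per_1000(2, 50)

print(large_before, large_after)
print(small_before, small_after)
```

A single additional robbery changes the large neighborhood's rate by about 0.02 per 1,000 but doubles the small neighborhood's rate, which is why rates for places with few residents should be read with caution.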
Neighborhood Change
Changes in the density of crime over time can be useful for assessing the success of past interventions, and identifying emerging areas that may need additional intervention in the future.
Since we have crime counts for two different years, we can map percentage change between those two periods.
- Add an Export Features tool to your diagram. You need to duplicate the feature class because ModelBuilder will not add the same created feature class to multiple maps.
- Input Features: Rates_2023
- Output Feature Class: Change_2023
- Add the Calculate Field tool to your diagram.
- Input Table: Change_2023
- Field Type: Double (change this before adding the new name)
- Field Name: Percent_Change
- Expression: Divide Count_2023 by Count_2019, subtract one, and multiply by 100 to get percent change. Add 0.1 to the 2019 count to avoid a divide by zero problem in neighborhoods where there were zero reported incidents in 2019.
- Insert a new Map.
- Under Properties, give the map a meaningful name (Change_2023)
- On your ModelBuilder diagram, right-click on the updated Change_2023 feature class and Add to Display.
- Symbolize by Percent_Change.
- A diverging color scheme can be useful to distinguish areas where crime is increasing vs. decreasing.
- Adjust the legend label precision to match the data accuracy.
- Modify the Blending to Multiply so the base map is visible through the layer.
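The percent change expression described above, including the 0.1 offset on the 2019 count, can be written out as follows. This is a sketch of the arithmetic with invented counts, not the exact text to paste into the Calculate Field tool.

```python
def percent_change(count_new, count_old, offset=0.1):
    """Percent change from count_old to count_new.

    The offset is added to the old count so that a zero count does not
    cause a division by zero.
    """
    return (count_new / (count_old + offset) - 1) * 100

print(percent_change(110, 100))  # slightly under 10 because of the offset
print(percent_change(3, 0))      # very large: a zero baseline inflates change
```

The offset keeps the calculation from failing, but it also shifts every result slightly and produces extreme values wherever the 2019 count was zero or very small, so those neighborhoods are another instance of the small numbers problem.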
Hot Spot Analysis
While the counts and rates displayed above by neighborhood may be adequate for your needs, maps can be deceiving.
- Unusual events (like an isolated mass shooting) can result in random spikes in low population areas (the small numbers problem).
- Displacement of locations for anonymization to protect victims can cause individual area numbers to be artificially low or high.
- Aggregation by area is subject to the modifiable areal unit problem (MAUP) since using a different set of areas that are larger, smaller, or have different boundaries can result in significantly different results.
With data like this, it can be helpful to analyze areas in the context of their immediate neighbors and the area as a whole.
The Hot Spot Analysis tool calculates the Getis-Ord Gi* statistic (pronounced gee eye star) for each feature in the context of its neighbors and reports the results as z-scores and p-values that indicate whether the clustering of high values (hot spot) or low values (cold spot) around that feature is statistically significant.
- To be a statistically significant hot spot, a feature will have a high value and be surrounded by other features with high values as well.
- The local sum for a feature and its neighbors is compared proportionally to the sum of all features.
- When the local sum is very different from the expected local sum, and when that difference is too large to be the result of random chance, a statistically significant z-score results.
- Use of this tool will help you assess whether the patterns you see in your data are the result of real spatial processes at work or just the result of random chance.
The use of calculated probabilities makes this tool more rigorously defensible in research than simple visual inspection of points.
- Add the Hot Spot Analysis tool to your diagram.
- Input Feature Class: Change_2023
- Input Field: Rate_2023
- Output Feature Class: Hot_Spots_2023
- Insert a new Map.
- Under Properties, give the map a meaningful name (Hot_Spots_2023)
- On your ModelBuilder diagram, right-click the Hot_Spots_2023 feature class and Add to Display.
- The resulting layer is colored red for hot spots where there are statistically significant clusters of higher levels of crime. The areas are colored blue for cold spots where there are statistically significant clusters of low levels of crime.
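To make the comparison of local sums to the overall sum concrete, the sketch below computes the Gi* z-score with the standard formula, using binary weights (1 for a feature and each of its neighbors, 0 otherwise). The neighbor lists and values are invented. The Hot Spot Analysis tool chooses its own conceptualization of spatial relationships and offers further corrections, so the numbers here will not match the tool's output.

```python
import math

def getis_ord_gi_star(values, neighbors):
    """Return one Gi* z-score per feature using binary weights.

    neighbors[i] lists the indices adjacent to feature i. The feature
    itself is always included, which is what the star in Gi* denotes.
    """
    n = len(values)
    mean = sum(values) / n
    variance = sum(v * v for v in values) / n - mean ** 2
    s = math.sqrt(max(variance, 0.0))
    scores = []
    for i in range(n):
        window = set(neighbors[i]) | {i}
        w_sum = len(window)  # sum of weights; also sum of squared weights
        local_sum = sum(values[j] for j in window)
        spread = (n * w_sum - w_sum ** 2) / (n - 1)
        if s == 0 or spread <= 0:
            scores.append(0.0)  # no variation, or the window covers everything
            continue
        scores.append((local_sum - mean * w_sum) / (s * math.sqrt(spread)))
    return scores

# Four hypothetical areas in a row: two high-rate areas next to two low-rate areas.
values = [4, 4, 0, 0]
neighbors = [[1], [0, 2], [1, 3], [2]]

scores = getis_ord_gi_star(values, neighbors)
print([round(z, 3) for z in scores])
```

Positive scores mark features whose neighborhood sum is higher than expected and negative scores mark the opposite. With only four features these z-scores are not meaningful for inference; the example shows the arithmetic only.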
View Python
Behind the scenes, ModelBuilder creates Python code that you can view.
On the ModelBuilder ribbon, click the Export options (green arrow) and select Send To Python Window to view or copy the code.
Crime and Social Variables
While having a descriptive understanding of where crime has occurred is useful, often we want to gain some understanding of why it occurred where it occurred (explanatory model), or to have some idea about where it could occur in the future (predictive model).
Spatial analysis provides a wide variety of techniques for relating variables to each other in order to build models. While crime, like most social phenomena, is driven by complex chains of causation and elements of randomness, it is possible to build models that offer insights into these processes. The examples below are highly simplified and do not provide a particularly strong fit with crime, but they are provided to offer some insights into the types of modeling that you can do with GIS.
Social Disorganization Theory
Social disorganization theory is a social ecology theory that asserts that crime is the result of an "inability of a community structure to realize the common values of its residents and maintain effective social controls" (Lersch 2004, 46; Sampson and Groves 1989, 177). Social disorganization theory has its roots in research performed in the School of Sociology at the University of Chicago beginning in the 1920s, most notably by Robert Park, Ernest Burgess, Clifford Shaw, and Henry McKay. Accordingly, these ideas are often referred to as Chicago School ideas.
Although the complexity of human society makes analysis of social disorganization similarly complex, a handful of neighborhood characteristics are theorized to lead to breakdowns in formal and informal social controls, and tend to correlate with higher levels of crime in specific areas (Lersch 2004, 50-53; 148; Sampson and Raudenbush 1999; 2001):
- Poverty: Poorer places tend to have higher crime
- Mixed land use: Places that have a mix of residential and commercial activity have higher levels of crime
- Population density: Places where people live very close together tend to have fewer people watching out for each other
- Residential turnover: In places where people are constantly moving in and out, it is more difficult to know who is an intruder
- Family instability: Neighborhoods with higher rates of divorce, separation, and single-parent households have lower levels of formal and informal social control.
Although the American Community Survey tract data used in this example does not contain variables that directly measure the social disorganization theory factors, it does contain potential proxy variables that can be assumed to correlate with those factors. Those variables include:
- Pop per Square Mile (population density)
- Median Household Income (poverty)
- % Vacant Units (neighborhood stability)
- % Single Mothers (family instability)
Bivariate Correlation Charts
Correlation is "a relation existing between phenomena or things or between mathematical or statistical variables which tend to vary, be associated, or occur together in a way not expected on the basis of chance alone" (Merriam-Webster 2020).
While correlation does not prove that one of the measured characteristics causes the other, correlation is a common exploratory data analysis technique used to identify relationships in geospatial data that are deserving of further investigation.
A quick way to look for correlation between variables is to use an x/y scatter chart.
- If there is correlation, the points in an X/Y scatter chart will form something like a diagonal line upwards (positive) or downwards (negative) across the chart.
- If there is no correlation, the points will be erratically distributed around the chart area.
Whether a coefficient of determination (R²) should be considered strong depends on the type of phenomena being studied.
- The range of R² is from 0.000 (no correlation) to 1.000 (perfect correlation).
- Values of less than 0.100 can usually be considered to represent no meaningful correlation.
- In the social sciences where relationships often involve the complex interplay of ambiguous factors, values as low as the 0.200s or 0.300s can be considered strongly correlated enough to merit further investigation.
- In the natural sciences, values above 0.600 are often expected from variables that are strongly correlated.
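For a straight-line fit between two variables, the coefficient of determination (R squared) equals the square of the Pearson correlation coefficient, so it can be computed directly from the paired values. The sketch below shows that calculation; the income and robbery rate figures are invented for illustration and are not results from the Chicago data.

```python
def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant variable cannot correlate with anything
    return (sxy * sxy) / (sxx * syy)

# Hypothetical neighborhood values: median household income and robbery rate.
income = [30000, 45000, 60000, 75000, 90000]
robbery_rate = [6.1, 2.0, 4.5, 1.2, 3.0]

print(round(r_squared(income, robbery_rate), 3))
print(r_squared([1, 2, 3, 4], [2, 4, 6, 8]))  # points on a line give R squared of 1
```

Because the value is squared, it does not show whether the relationship is positive or negative; the direction has to be read from the slope of the points in the scatter chart.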
To create an x/y scatter chart:
- Select the layer you want to analyze in the Contents pane and then select Data, Visualize, Create Chart, Scatter Plot.
- Set the X and Y variables to examine correlations.
- Using a log scale can be useful when the values are skewed to the low end of the distribution.
- If desired, you can add the chart to a layout.
As shown in the video, the R² values for all four variables are fairly low (0.13 or less) and the X/Y scatter charts show a spreading pattern (heteroskedasticity) that indicates that high crime areas tend to be low income, but low crime areas can be both low and high income.
This is consistent with crime / adversity mismatch research that shows no consistent relationship between socioeconomic disadvantage and levels of violence (Manguel 2021).
Ground Truth
In 1982, George L. Kelling and James Q. Wilson published a highly influential article entitled Broken Windows: The police and neighborhood safety that asserted a connection between crime and disorder as typified by the physical care of a neighborhood. If a window on a building is broken and no one fixes it, that is a sign that the people in the neighborhood do not care about their community. This leads to a breakdown in the informal community controls that normally hold crime in check.
Further research has suggested that broken windows is a fallacious confusion of correlation with causation (Thacher 2004), and the policing practices that follow from broken windows theory (like "stop-and-frisk") have been vigorously critiqued as counterproductive and socially unjust. However, the theory does lead us to ask questions about the relationship between crime and the built environment, something more fully fleshed out in urban planning practices like crime prevention through environmental design (CPTED) (Jeffery 1971).
Google Street View allows you to take a virtual walking tour of a neighborhood, and street view can be used to assess whether analysis performed in GIS is consistent with the conditions on the ground (ground truth). Care should be used in such qualitative analysis to mitigate the effects of confirmation bias that can reinforce preconceived notions and stereotypes rather than validate analysis.
You can copy coordinates from a map in ArcGIS Pro by right-clicking on the location in the map and selecting Copy Coordinates. You can then search for that location in Google Maps.
Communicate the Results
These different types of visualizations can be integrated into a single infographic layout that summarizes your analysis.
Neat Lines and Marginalia
- Insert a Layout sized 11" x 17" in landscape orientation
- Properties, General rename the layout to Infographic.
- Add rectangles as neat lines.
- Frame line: 15 x 8 @ 1 x 2
- Logo: 2 x 1 @ 1 x 1
- Title: 10 x 1 @ 3 x 1
- Credits: 3 x 1 @ 13 x 1
- Insert a Picture of the logo.
- Add Straight text for the title (Robbery in Chicago, 2023)
- Add a Rectangle text box for the credits:
- Cartographer: Your Name
- Date: Current date
- Source: City of Chicago
Main Map
- Map frame: 5 x 7 @ 1 x 2
- Hide the base map service credits.
- Add a Straight text for the caption (There were 11,054 reported robberies in Chicago in 2023)
- Use 24 point Arial to be consistent across the layout.
Small Maps with Legends
- Small map frames: 2.3 x 3
- Remove base map
- Add a legend
- Add caption text (18 pt Arial)
Correlation Results
You can add the correlation x/y scatter chart to the layout.