Geocoding in ArcGIS Pro

Geocoding is the process of converting text descriptions of locations, such as place names or street addresses, into geographic points, lines, or polygons that can be mapped and analyzed in GIS.

ArcGIS Online and ArcGIS Enterprise provide geocoding services that are suitable for most geocoding tasks. However, there are costs and limitations to geocoding services that must be considered when deciding how to handle geocoding errors and when determining if alternative geocoding methods are needed.

This tutorial will introduce geocoding in ArcGIS Pro and methods for dealing with the limitations of geocoding.

Geocoding Problems

Geocoding is an imperfect process that can result in errors such as missing locations or locations placed in the wrong spot.

With street addresses, street names present a special challenge. Many of the typical issues are catalogued in Falsehoods programmers believe about addresses (Tandy 2013).

Manual Geocoding

The examples in this tutorial use a collection of locations (including invalid addresses) in Champaign-Urbana, Illinois, USA that were collected using Google Maps.

If you have a small number of points, the easiest approach to accurate geocoding may be simply to get lat/long locations from Google Maps and add them as columns in your data in a spreadsheet program.

You can then export the spreadsheet to a CSV file and import it into ArcGIS Pro by choosing Map, Add Data, XY Point Data, which runs the XY Table to Point tool.

Manual geocoding with Google Maps
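If you prefer to script this step, the same XY Table to Point tool can be called from ArcPy. This is a minimal sketch: the CSV path and the Longitude and Latitude column names are assumptions for illustration.

import arcpy

# Assumed CSV file containing manually collected Longitude and Latitude columns
csv_filename = 'U:\\Downloads\\google_maps_points.csv'

# Google Maps coordinates are WGS 84 lat/long values
wgs84 = arcpy.SpatialReference(4326)

# Create a point feature class in the project geodatabase
arcpy.management.XYTableToPoint(csv_filename, 'Google_Maps',
                                x_field='Longitude', y_field='Latitude',
                                coordinate_system=wgs84)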

Geocoding Service

Geocoding services provide internet access to servers that use large location databases and sophisticated algorithms to identify addresses and place names in order to return lat/long coordinates for those locations. There are a variety of geocoding services, with Google's geocoding service incorporated into Google Maps search probably being the most prominent.

The easiest and (often) most accurate way to geocode addresses or place names in ArcGIS Pro is using the ArcGIS World Geocoding Service, which is available using the Geocode Addresses tool.

  1. The tool reads addresses from the Input Table.
  2. The tool sends geocoding requests to the server for those addresses.
  3. The server uses its algorithms and databases to find lat/long locations for those addresses.
  4. The server responds to ArcGIS Pro with the lat/long locations.
  5. The tool copies those lat/long responses into a new feature class in the project geodatabase.
  6. The data from the feature class is added to the map for rendering.

The major downside to the ArcGIS World Geocoding Service is that there is a small charge in credits for each geocoding operation, and those charges can be substantial when you have a task that requires geocoding thousands of addresses.

To geocode a CSV file:

  1. Under the Analysis tab and Toolbox, find the Geocode Addresses tool.
  2. For Input Table, select the CSV file.
  3. For Input Address Locator, use the ArcGIS World Geocoding Service.
  4. For Input Address Fields, select Multiple Field and the appropriate address fields from your CSV file. If you use standard column names like Address or City, the tool should automatically select those fields.

  5. For Output Feature Class, give a name for the new feature class in the project geodatabase.
  6. For Country select the appropriate country (United States).
  7. For Preferred Location Type, select Address location and the appropriate subcategory Street Address.
Geocoding with the ArcGIS World Geocoder service
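The same operation can also be scripted with the Geocode Addresses geoprocessing tool in ArcPy. The sketch below assumes a CSV with Address, City, and State columns; the locator URL and the field map string are assumptions, and the exact field map syntax depends on the locator, so verify it against the tool help (or copy the Python command from a successful interactive run in the geoprocessing history).

import arcpy

# Assumed CSV of addresses with Address, City, and State columns
csv_filename = 'U:\\Downloads\\addresses.csv'

# The ArcGIS World Geocoding Service, referenced by URL; you must be
# signed in to ArcGIS Online, and each geocoded address consumes credits
locator = 'https://geocode.arcgis.com/arcgis/rest/services/World/GeocodeServer'

# Pairs of locator fields and CSV columns; treat this string as illustrative
address_fields = "'Address or Place' Address;City City;State State"

arcpy.geocoding.GeocodeAddresses(csv_filename, locator, address_fields, 'World_Geocoder')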

Reverse Geocoding

Reverse geocoding is the process of converting lat/long coordinates to street addresses and/or location names.

The ArcGIS World Geocoding Service supports reverse geocoding using the Reverse Geocode tool in ArcGIS Pro.

  1. For Input Feature Class or Layer, select the layer of points (Google_Maps).
  2. For Input Address Locator, use the ArcGIS World Geocoding Service.
  3. For Output Feature Class, give a name for the new feature class in the project geodatabase.
  4. Select the Feature Type that you would like returned.
  5. Right click on the layer and view the Attribute Table to see the returned values.
Reverse geocoding with the ArcGIS World Geocoder service
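Reverse geocoding can likewise be scripted with the Reverse Geocode geoprocessing tool. In this sketch, the locator URL and output name are assumptions, and the optional Feature Type parameter is left at its default.

import arcpy

# The ArcGIS World Geocoding Service, referenced by URL (consumes credits)
locator = 'https://geocode.arcgis.com/arcgis/rest/services/World/GeocodeServer'

# Reverse geocode the manually collected points into addresses / place names
arcpy.geocoding.ReverseGeocode('Google_Maps', locator, 'Reverse_Geocoded')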

Street Address Locator

An older geocoding technique involves the use of custom street address locators created from datasets that use street segments with names and address ranges to estimate street address locations.

Geocoding services are generally preferred to street address locators because street address locators often mismatch or fail to match street names, and because locations interpolated from building number ranges can have significant deviation from reality.

A street address locator may be the appropriate solution if you are geocoding a large number of addresses where service cost would be an issue, or if you are geocoding confidential data, such as the home addresses of participants in medical research.

Download Centerlines

Street centerline data used for creating custom locators must include attributes indicating the range of addresses associated with each street segment.

For each feature you are geocoding:

  1. The geocoder breaks each address down into street number and street name components.
  2. The geocoder does a fuzzy search through the locator to find street segments that match the street name.
  3. The geocoder further selects the segment whose street number range contains the address street number.
  4. The geocoder selects the side of the segment whose street number range contains the address street number.
  5. The geocoder interpolates the location on the street segment based on where the address street number sits in the range of possible street numbers associated with that segment.
Street Layer Geocoding Parameters
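The interpolation in the last step can be illustrated with a short sketch in plain Python. This is not how the ArcGIS locator is implemented internally; it only demonstrates the arithmetic of placing a house number proportionally along a matched segment's address range.

def interpolate_along_segment(house_number, range_from, range_to, segment_length):
    """Estimate how far along a street segment a house number falls,
    given the segment's address range and the segment's length."""
    if range_to == range_from:
        return segment_length / 2
    fraction = (house_number - range_from) / (range_to - range_from)
    # Clamp to the segment in case the number sits at the edge of the range
    return max(0.0, min(1.0, fraction)) * segment_length

# A house number of 1301 on a segment addressed 1300 - 1398 that is
# 120 meters long falls very near the start of the segment
print(interpolate_along_segment(1301, 1300, 1398, 120))  # about 1.2 meters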

One source for suitable centerline shapefiles is the US Census Bureau's TIGER/Line all lines (edges) shapefiles. Some cities also maintain street centerline files, such as the New York City Department of City Planning's LION files.

Downloading a TIGER all lines shapefile
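If you prefer to script the download, county all lines (edges) shapefiles can be fetched directly from the Census Bureau's TIGER/Line directory. The URL pattern, year, and county FIPS code below (17019 is Champaign County, Illinois) are assumptions to verify against the current TIGER/Line release before running.

import urllib.request
import zipfile

# Assumed URL for the Champaign County, IL all lines (edges) shapefile;
# adjust the year and the five-digit state + county FIPS code as needed
url = 'https://www2.census.gov/geo/tiger/TIGER2022/EDGES/tl_2022_17019_edges.zip'

zip_filename = 'U:\\Downloads\\tl_2022_17019_edges.zip'

# Download the zip archive and extract the shapefile components
urllib.request.urlretrieve(url, zip_filename)

with zipfile.ZipFile(zip_filename) as archive:
    archive.extractall('U:\\Downloads\\tl_2022_17019_edges')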

Create Locator

To create a locator:

  1. Unzip the centerline file.
  2. Add the shapefile to a map.
  3. Under Analysis and Toolbox, find the Create Locator tool.
  4. Select the centerlines as the Primary Table with a Role of Street Address.
  5. For the TIGER shapefile, the following fields are appropriate:

    • Feature ID: FID
    • Left House Number From: LFROMADD
    • Left House Number To: LTOADD
    • Right House Number From: RFROMADD
    • Right House Number To: RTOADD
    • Street Name: FULLNAME
    • Left ZIP: ZIPL
    • Right ZIP: ZIPR
    • Language Code: English
  6. Select a location for the Output Locator in the project geodatabase (Champaign).
Creating a custom locator from a centerlines layer
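Locator creation can also be scripted with the Create Locator geoprocessing tool. Treat the sketch below as approximate only: the shapefile path is an assumption, and the exact syntax of the reference data and field mapping strings varies by ArcGIS Pro version, so confirm it against the Create Locator tool help (or copy the Python command from a successful interactive run).

import arcpy

# Assumed path to the unzipped TIGER edges shapefile
centerlines = 'U:\\Downloads\\tl_2022_17019_edges\\tl_2022_17019_edges.shp'

# Map Street Address role fields to the TIGER fields listed above; the
# string syntax here is an assumption to verify against the tool help
field_mapping = ("'StreetAddress.HOUSE_NUMBER_FROM_LEFT tl_2022_17019_edges.LFROMADD';"
                 "'StreetAddress.HOUSE_NUMBER_TO_LEFT tl_2022_17019_edges.LTOADD';"
                 "'StreetAddress.HOUSE_NUMBER_FROM_RIGHT tl_2022_17019_edges.RFROMADD';"
                 "'StreetAddress.HOUSE_NUMBER_TO_RIGHT tl_2022_17019_edges.RTOADD';"
                 "'StreetAddress.STREET_NAME tl_2022_17019_edges.FULLNAME'")
# ... the feature ID and the left and right ZIP fields are mapped the same way

arcpy.geocoding.CreateLocator('USA', centerlines + ' StreetAddress',
                              field_mapping, 'Champaign', 'ENG')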

Centerline Geocode

Under Analysis and Toolbox, find the Geocode Addresses tool.

  1. Input Table: Your CSV file of addresses
  2. Input Address Locator: The custom locator created above (Champaign)
  3. Input Address Fields: Select only the fields that have corresponding fields in the locator (Address). Deselect City and State since there are no city or state fields in the locator; selecting them will cause the geocoder to return zero matches.
  4. Output Feature Class: Select a meaningful name (Custom_Locator)
  5. Category: Address and Street Address
Geocoding addresses with the custom centerlines locator

Nominatim

There are a variety of additional commercial and open geocoding services available beyond those provided by ESRI and Google Maps. However, since ArcGIS Pro does not contain a native tool for using other services, you will need to use an ArcPy script if you want to access them.

Nominatim (from the Latin, "by name") is an open geocoding service that can be used to search OpenStreetMap data by name and address (geocoding) and to generate synthetic addresses of OSM points (reverse geocoding) (Nominatim 2022).

Nominatim may be useful if you wish to save the expense of using a commercial geocoder, and either you are geocoding place names, or you are geocoding street addresses in an area where you do not have access to data for a custom street address locator.

While Nominatim is free to use, it is a community-developed and supported service that can be less accurate than commercial geocoding services, and since it is not designed for or intended for bulk geocoding, it has significant volume and speed limitations. Be aware of the Nominatim usage policies.

Nominatim API search requests are URLs in the following form:

https://nominatim.openstreetmap.org/search?<params>

There are a variety of specific parameters available, but the most straightforward search involves a free-form query (q=<query>) with an output format (format=json).

For example, to make an API request to find the lat/long for 1301 W. Green St. Urbana, IL, you can use the following URL.

Note that the query string has spaces replaced with plus signs (+) and an ampersand (&) between the query string and the format parameter.

https://nominatim.openstreetmap.org/search?q=1301+West+Green+Street+Urbana+Illinois+USA&format=json

Nominatim returns the results as JSON, which a browser will display in a readable form.

Nominatim GeoJSON
import csv
import json
import arcpy
import urllib.request

# Parameters

csv_filename = 'U:\\Downloads\\addresses.csv'

query_columns = ['Address', 'City', 'State']

output_feature_class = 'Nominatim'


# Read the header from the CSV file

csv_file = open(csv_filename, newline='')

csv_reader = csv.reader(csv_file, delimiter = ',')

csv_header = next(csv_reader)


# Create the new feature class and add the fields

wgs84 = arcpy.SpatialReference(4326)

if not arcpy.Exists(output_feature_class):
	arcpy.management.CreateFeatureclass("", output_feature_class, "Point", "", "", "", wgs84)

	for fieldname in csv_header:
		arcpy.management.AddField(output_feature_class, fieldname, "TEXT")


# Create the insert cursor and loop through the file

cursor_fields = list(csv_header)

cursor_fields.append('SHAPE@X')

cursor_fields.append('SHAPE@Y')

print(cursor_fields)

outcursor = arcpy.da.InsertCursor(output_feature_class, cursor_fields)


# Rewind the file and use a dictionary reader to loop through the file

csv_file.seek(0)

csv_reader = csv.DictReader(csv_file, delimiter=',', quotechar='|')



for row in csv_reader:

	# Create the query
	
	query = '+'.join(str(row[key]) for key in query_columns)

	query = query.replace(' ', '+')

	endpoint = 'https://nominatim.openstreetmap.org/search?'

	url = endpoint + 'q=' + query + '&format=json'
	
	print(url)


	# Send the query to the Nominatim API and parse the returned JSON

	user_agent = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3)"

	request = urllib.request.Request(url, headers={'User-Agent': user_agent})

	json_string = urllib.request.urlopen(request).read()

	results = json.loads(json_string.decode("utf-8"))

	if len(results) <= 0:
		continue


	# Append the long/lat to the attributes and insert the new point feature
	
	new_row = list(row.values())

	new_row.append(float(results[0]['lon']))
	
	new_row.append(float(results[0]['lat']))
	
	print(new_row)

	outcursor.insertRow(new_row)

del outcursor
Nominatim geocoding script

Security Certificate Error

If you get a URLError SSL: CERTIFICATE_VERIFY_FAILED message, this is caused by outdated security certificates in your Python installation.

The formal way to fix this problem is to update the certificates in your Python installation. However, that is difficult with this special installation for ArcGIS Pro.

One way around this issue is to replace the urlopen() call above with a variant that uses a context that skips certificate verification:

import ssl

request_context = ssl.create_default_context()

request_context.check_hostname = False

request_context.verify_mode = ssl.CERT_NONE

user_agent = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3)"

request = urllib.request.Request(url, headers={'User-Agent': user_agent})

json_string = urllib.request.urlopen(request, context=request_context).read()

BOM Error

If your diagnostic display of header names contains junk characters before your first field, you probably saved your CSV file as UTF-8 with a BOM (byte order mark) at the beginning of the file.

BOM characters at the start of the header

This commonly happens when saving a CSV from Excel on a Mac using the UTF-8 character set, which can represent characters beyond the standard English ASCII letters, numbers, and punctuation marks.

There are two options for addressing this issue.

  1. Add an encoding to the open() function that indicates that the file is in UTF-8 and that if it begins with a BOM, it should not be included in the data read from the file.

    csv_file = open(csv_filename, newline='', encoding='utf_8_sig')
    
  2. Reopen the file in Excel and save as a plain text CSV file.

Analysis

There are three factors you should consider in assessing the quality of geocoding (Precisely 2022). Two of them, match rate and positional accuracy, are examined below.

Match Rate

You can validate the match rate of the geocoder by viewing the attribute table for the geocoded points.

Finding the unmatched points in the attribute table
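Match rate can also be tallied in a quick ArcPy snippet by counting the Status codes that the Geocode Addresses tool writes to its output ('M' for matched, 'T' for tied, 'U' for unmatched). The layer name below is an assumption.

import arcpy
from collections import Counter

# Count the match status codes in the geocoded output
status_counts = Counter(row[0] for row in
                        arcpy.da.SearchCursor('World_Geocoder', ['Status']))

matched = status_counts.get('M', 0) + status_counts.get('T', 0)
total = sum(status_counts.values())

print(status_counts)
print('Match rate: {:.0%}'.format(matched / total))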

Positional Accuracy

Desire lines are an analysis tool that draws lines between related points in different layers. In ArcGIS Business Analyst, this tool is used to visualize the spatial relationships between stores and their customers.

Desire lines can be used to analyze the positional accuracy of a geocoder by finding the distances between reference points and the geocoded points.

The geocoder with the lowest median distance between reference points and geocoded points has the best positional accuracy.

  1. Under Analysis, Tools, find the Generate Desire Lines tool.
  2. Symbolize the lines so they are visible over the base map.
  3. Select the desire lines layer in the Contents pane, then under the Data ribbon, choose Visualize, Create Chart, Histogram to show the distribution of deviation distances.
  4. Add the Median line to find the median amount of deviation. Median will likely be preferred over mean since the distribution of values is often skewed.
  5. If some points failed to geocode, their distances will be -1 and may skew your median. Add a Definition Query to filter them out.
Analyzing positional accuracy with desire lines

Infographic

You can present your analysis results in a summative infographic.

In this case, the two geocoders have very similar positional accuracies. The big difference is in match rate. The ESRI World Geocoder matches all valid points, but does mismatch an invalid address. The street geocoder is more conservative with a lower match rate, but that conservatism avoids mismatching any invalid addresses. So the trade-off seems to be match rate completeness vs. accuracy.

  1. Create an 11" x 8.5" landscape orientation layout and give it a meaningful name.
  2. Add neat lines and marginalia.
Infographic neat lines and marginalia
  1. Add a map frame of the reference and geocoded points.
  2. Remove the service credits.
  3. Add a map zoomed in on an area with notably wide deviations between the different geocoders.
Infographic maps
  1. Add the histograms.
  2. Add analysis results.
Infographic analysis
  1. Add a legend.
Infographic legend

Analysis with ArcPy

The manual analysis performed above can also be performed using an ArcPy script.

Deviation Lines

Given the four geocoders used above, we can use an ArcPy notebook script to create hub-and-spoke lines to visualize and analyze the deviations (in geodesic meters) between a particular standard (in this case, the locations manually geocoded with Google Maps) and the locations returned by the other three geocoders.

import arcpy

# Parameters

hub_feature_class = "Google_Maps"

search_feature_classes = {"World_Geocoder": "USER_Address", 
                          "Custom_Locator": "USER_Address", 
                          "Nominatim": "Address"}
key_field = "Address"

distance_field = "Distance"

geocoder_field = "Geocoder"

spoke_feature_class = "Deviation"


# Create the spoke lines feature class and cursor

wgs84 = arcpy.SpatialReference(4326)

arcpy.management.CreateFeatureclass("", spoke_feature_class, 
                                    "POLYLINE", "", "", "", wgs84)

arcpy.management.AddField(spoke_feature_class, key_field, "TEXT")

arcpy.management.AddField(spoke_feature_class, geocoder_field, "TEXT")

arcpy.management.AddField(spoke_feature_class, distance_field, "FLOAT")

spoke_fields = ["SHAPE@", key_field, geocoder_field, distance_field]

spoke_cursor = arcpy.da.InsertCursor(spoke_feature_class, spoke_fields)


# Create a cursor to loop through all hub points

hub_fields = ["SHAPE@", key_field]

hub_cursor = arcpy.da.SearchCursor(hub_feature_class, hub_fields)

for hub in hub_cursor:
    
    # Loop through all comparison feature classes
        
    for search_name, search_key in search_feature_classes.items():

        search_fields = ["SHAPE@", search_key]
        
        search_cursor = arcpy.da.SearchCursor(search_name, search_fields)
        
        for search in search_cursor:
            
            # Only compare points geocoded to the same address
            if (hub[1] != search[1]) or (hub[0] is None) or (search[0] is None):
                continue
                
            # Create spoke line 

            segment = arcpy.Polyline(arcpy.Array([hub[0].centroid, search[0].centroid]), wgs84)

            distance = segment.getLength("GEODESIC", "METERS")
            
            print(search_name, search[1], distance)
            
            spoke_cursor.insertRow([segment, hub[1], search_name, distance])
            
        del search_cursor
            
del hub_cursor

del spoke_cursor
Deviation spoke lines

Pivot Tables

The NumPy and pandas libraries can be used to create descriptive statistics for the deviation distances.

import numpy as np
import pandas as pd

# Convert the feature class attributes to a Pandas data frame

column_names = [x.name for x in arcpy.ListFields(spoke_feature_class)]

distances = arcpy.da.FeatureClassToNumPyArray(spoke_feature_class, "*")

distances = [list(x) for x in distances]

distances = pd.DataFrame(distances, columns=column_names)


# Display pivot tables

print("\nMax")

print(pd.pivot_table(distances, values=distance_field,
                     index=[geocoder_field],
                     aggfunc=np.max))

print("\nMedian")

print(pd.pivot_table(distances, values=distance_field,
                     index=[geocoder_field],
                     aggfunc=np.median))

print("\nMin")

print(pd.pivot_table(distances, values=distance_field,
                     index=[geocoder_field],
                     aggfunc=np.min))

print("\nMatches")

print(pd.pivot_table(distances, values=distance_field,
                     index=[geocoder_field],
                     aggfunc=np.count_nonzero))

The output shows the maximum, median, and minimum deviations for each of the geocoders compared to the manually geocoded Google Maps points. It also shows the number of addresses that each geocoder matched to some location (the match rate).

These four statistics can be used to assess the quality of the geocoders in a particular geographic area. Ideally, the "best" geocoder would have the highest match rate and the lowest deviations in all three categories. However, if the geocoders rank differently in different categories, your evaluation of quality will depend on your needs. For example, an application mapping a large number of addresses for large physical buildings would probably value getting the most points geocoded (match rate) over pinpoint accuracy.

The output shows that the custom street address locator had the lowest maximum and median deviation compared to Google Maps, but it also had the worst match rate (5 of 9 = 56%) compared to the services. This might make it the best geocoder where accuracy is more important than full coverage.

Max
                   Distance
Geocoder                   
Custom_Locator   108.147087
Nominatim       3989.762695
World_Geocoder  4921.140137

Median
                 Distance
Geocoder                 
Custom_Locator  60.280296
Nominatim       70.182304
World_Geocoder  64.972946

Min
                 Distance
Geocoder                 
Custom_Locator  23.634283
Nominatim       17.633583
World_Geocoder  24.911308

Matches
                Distance
Geocoder                
Custom_Locator         5
Nominatim              9
World_Geocoder         9