Geodatabases
An ArcGIS geodatabase is an organized collection of geographic datasets held in a common file system folder (file geodatabase) or a multiuser relational database management system (RDBMS) (ESRI 2024).
- The ArcGIS geodatabase is the native data format in ArcGIS Pro for managing data.
- ArcGIS geodatabases provide the capability to set rules for data integrity (domains) and explicitly define relationships between data objects (relations).
- ArcGIS geodatabases support a transaction model for managing workflows.
- ArcGIS geodatabases can fully preserve the organization of data imported from other formats, such as CAD, raster grids, or shapefiles.
- ArcGIS geodatabases can be used with non-ESRI software (such as QGIS), although full capabilities are only available with ESRI software.
There are two types of geodatabases that you will commonly work with in ArcGIS Pro:
- File geodatabases
- Enterprise geodatabases
This tutorial will cover basic geodatabase concepts and techniques in ArcGIS Pro and ArcGIS Enterprise.
File Geodatabases
File geodatabases are collections of files in a folder in local storage on your personal machine that are used to store geodatabase tables and relationships (ESRI 2024).
A project geodatabase is a file geodatabase unique to every ArcGIS Pro project that is the default location for new data created or imported into the project.
The actual files in file geodatabases are managed by ArcGIS Pro and are cryptically named by the software, so you will generally want to avoid making any direct changes to the files outside of ArcGIS Pro.
Tables
A relational database table is a collection of related data organized as columns and rows.
- Columns represent characteristics.
- Rows represent individual things that have those characteristics.
For example, this is a simple table of Illinois Counties in the Chicago metropolitan area (Chicago–Naperville–Elgin, IL–IN–WI Metropolitan Statistical Area).
County | Population | Median_Household_Income |
---|---|---|
Cook | 5,275,541 | 76,632 |
Dekalb | 100,420 | 68,590 |
DuPage | 932,877 | 102,152 |
Grundy | 52,533 | 89,993 |
Kane | 516,522 | 93,343 |
Kendall | 131,869 | 101,447 |
Lake | 714,342 | 101,442 |
McHenry | 310,229 | 98,907 |
Will | 696,355 | 96,668 |
An ArcGIS feature class is a table representing a set of common features that have a common set of attributes and a common spatial representation (point, line, or polygon). Feature classes stored in relational databases have a hidden shape column and associated spatial characteristics (such as the type of projection) (ESRI 2024).
Data Types
Relational database table columns have specific data types that represent the types of values that can be stored in those columns. The names for these data types come from the underlying programming language(s) used to store the data in computer memory and in databases. Data types commonly used for feature classes include:
- Double (real numbers: numbers with a fractional component)
- Long (integers: whole numbers with no fractional component)
- Text
- Date
Other types you will encounter (especially in older data) include the following. These are shorter variants on the types given above and can be used when trying to minimize storage needs for tables with extremely large numbers of records on systems with limited resources.
- Float (real numbers)
- Short (integers)
Tables used for feature classes in ArcGIS Enterprise contain additional columns with feature IDs, shape size, and shape geometries. These are maintained by the software and you will generally want to leave these columns alone to avoid corrupting your data.
You can view the data types in a feature class table by showing the Attribute Table and switching to Fields View.
Export Features
The Export Features tool can be used to copy data from shapefiles or feature services into new feature classes in a geodatabase.
For this example we create a Counties feature class in the project geodatabase from the Minn 2015-2019 ACS Counties feature service in the University of Illinois ArcGIS Online organization, with a filter to bring in just counties in Illinois.
When importing shapefiles, you need to first unzip the files if they were distributed in a .zip archive. You also need to make sure that the destination is in the project geodatabase rather than simply another shapefile.
For this example we create a County_Roads feature class from a TIGER/Line shapefiles of roads in Peoria County, IL.
Export Table
The analogous Export Table tool can be used to import CSV files into non-spatial tables in a geodatabase.
For this example, we creae a Manufacturing table with manufacturing data by county from the US Census Bureau's 2017 Economic Census.
- Download the data.
- Go to data.census.gov to download the table data.
- Search for EC1731BASIC Manufacturing economic census data.
- For Geos, summarize by County.
- Under Columns, add the Geographic identifier code (GEOID).
- Download the CSV file.
- Clean up the data.
- Remove unneeded columns.
- Rename the columns to shorter names.
- Find/replace the "D" values used for missing data.
- Remove any message text in numeric columns.
- Save As the file as a CSV with a meaningful name (Manufacturing.csv).
- Run the Export Table tool to bring the data into the project geodatabase.
Table Data Input
You can create a new database table in the Catalog Pane by right-clicking on the database, selecting New, and choosing the type of table or feature class to create.
Standalone File Geodatabases
File geodatabases separate from the project geodatabase can be created and used in ArcGIS Pro using the Create File Geodatabase tool.
Some organizations will share data as zipped file geodatabases, and you can create new file geodatabases if you need to keep your data separate from your project.
Domains
While column data types provide useful restrictions on the ranges of values you can place in columns, there are situations where you will want to further constrain the values that can be entered into specific columns to maintain consistency across the table and mitigate the entry of erroneous or impossible values.
Domains are geodatabase rules that describe and limit the possible values for fields.
Domains are defined for a geodatabase as a whole, and then applied to specific table fields as needed.
Categorical domains are used to restrict the values of text field to a specific set of possible categorical values.
- Right-click on a feature class in the Catalog Pane and select Domains.
- Start typing in the list to add the domain.
- Domain Name: Provide a descriptive name (Candidate_2020)
- Description: Provide a description.
- Field Type: Text
- Domain Type: Coded Value Domain
- In the list on the right side, add Code values and descriptions of those codes.
- Click Save.
- Right-click on the feature class in the Catalog Pane to which you want to add the domain and select Fields View.
- Double click in the Domain column on the field to select the desired domain.
Quantitative domains can be used to restrict numeric fields (integer and real) to specific ranges of values.
Date domains can be used to restrict date fields to specific ranges of dates.
Enterprise Geodatabases
ArcGIS geodatabases are an extension of relational databases.
When you are collaborating with multiple users in an organization (such as a government office), you will often want to keep your data in an enterprise database managed with a relational database management system (RDBMS).
- Relational database management systems (RDBMS) are software and hardware used to manage the relational databases that are ubiquitous in business and government for storing organizational data.
- Enterprise geodatabases are geodatabases that are kept in RDBMS from a variety of commercial vendors and open source projects including Oracle, Microsoft (SQL Server), and PostgreSQL / PostGIS.
- PostgreSQL is an open source RDBMS commonly used with ArcGIS Enterprise because it is robust and full-featured while also being less complex and expensive than proprietary solutions.
- PostGIS extensions to PostgreSQL allow PostgreSQL to store feature classes with geospatial table columns. PostGIS also provides extensions to the SQL language for operations that consider geospatial proximity, distance, and size.
Relational databases store data as sets of tables that are connected by a set of logical relationships. This method of organizing data was first developed by Edgar F. Codd (1970) and the first commercial relational database management system was released by Software Development Laboratories (now Oracle) in 1979 (Ménard 2023).
While the organization of relational databases can initially seem overly complex, this technique for storing large amounts of complex data has proven to be exceptionally flexible and powerful, which explains the popularity and ubiquity of relational databases for enterprise data storage.
Data Stores
An ArcGIS Enterprise data store is database used for storing geospatial data in an enterprise environment.
- Portal has data stores that are used for hosted feature services (relational data stores), tile service caches (tile cache data store), and other types of data stored by the portal.
- Enterprise geodatabases in RDBMS can be used as additional data store(s) in ArcGIS Enterprise.
- Authoritative data constantly maintained by the organization can be kept in the enterprise database while different snapshots and views of that data can be published as feature services in the portal data store.
RDBMS data stores can provide significant advantages over keeping everything in the portal data store.
- Full set of geodatabase and relational database capabilities
- Integration with other non-spatial organizational data
- Speed
- Capacity
- Versioning
- Backup and restore capabilities
- Data update in live feature services without making layers editable
However, RDBMS require significant additional maintenance compared to using the portal data store for everything, and smaller organizations may find the portal data store adequate to satisfy organizational needs.
Database Architecture
- A server has one RDBMS.
- The RDBMS is administered by the database administrator.
- The database administrator is named postgres.
- The database administrator can do anything (superuser) so this high risk role is commonly limited to high-level systems administrators.
- Each RDBMS contains one or more databases.
- A database that has configured for ArcGIS spatial database engine (SDE) is a geodatabase.
- All geodatabases are administered by the geodatabase administrator.
- The geodatabase administrator is named sde.
- Each geodatabase can contain multiple schema.
- A schema is a set of table definitions and relations between tables in a relational database.
- A schema has one user who is the schema's data owner.
- The data owner user name and schema user name are always the same.
- The data owner can grant privileges to other users.
- Each schema can contain one or more tables.
- One schema named sde is owned by the geodatabase administrator and is used for system tables.
Connecting to an Enterprise Geodatabase
You can connect to an enterprise geodatabase in the ArcGIS Pro Catalog Pane by right-clicking on Databases and selecting Connect.
You will need to enter a database user name and password supplied by your systems administrator. Note that these credentials will probably be different from the credentials you use to login to portal.
Copying Tables into an Enterprise Geodatabase
If you have a table or feature class in an existing project geodatabase, you can drag and drop in the Catalog Pane to copy the data into (or from) an enterprise geodatabase.
Disconnecting from a Database
Unfortunately, ArcGIS Pro has no mechanism for disconnecting from a database.
This will cause permissions problems if you need to reconnect as an administrator or different user.
The only way to disconnect from a database is to close ArcGIS Pro and restart.
Administrative Consoles
A database administrative console is a user interface that administrators of enterprise databases can use to analyze performance and diagnose issues.
pgAdmin 4 is the administrative console that is included in installations of PostgreSQL. pgAdmin has a rich set of tools but can be somewhat challenging to install and use, especially for novices.
Adminer is a web-based console in PHP that supports multiple types of databases and is somewhat simpler to use with PostgreSQL than pgAdmin. An administrator can install Adminer on a Linux server running PHP.
To interact with a database, you need to log in with credentials supplied by the database administrator.
- To view table fields, click on the table name.
- To enter a query, click the SQL command link.
Creating Enterprise Geodatabases
Database administrators can create new enterprise databases in an RDBMS using the Create Enterprise Geodatabase tool.
- Database type: The type of database (PostDatabase administrator: Named postgres by default
- Instance: The server name or domain name for the RDBMS
- Database: A name for the new database
- Database Administrator: Usually postgres for PostgreSQL databases
- Database Administrator Password
- Geodatabase Administrator Password
- Tablespace Name: Leave blank for PostgreSQL
- Authorization File: Located in C:\Program Files\ESRI\Licensex.x\sysgen\keycodes
An administrator can delete a database.
DROP DATABASE db_name;
Creating Users and Schemas
A geodatabase administrator can create a new user, which also creates a new schema.
- Connect to the database as the geodatabase administrator (sde).
- Run the
Create Database User tool.
- Input database connection: Find the current connection
- Database user: The name for the new user and schema
- Database user password
- Role: Leave this blank
To list users:
SELECT * FROM pg_user;
To view users currently connected to the database:
SELECT * FROM pg_stat_activity
To change a user password:
ALTER USER user_name WITH PASSWORD 'new_password';
To drop users:
DROP USER username;
Granting Privileges
Schema owners can grant privileges to view and/or edit tables to other users in ArcGIS Pro by right-clicking on the table and selecting Privileges...
- Select Add User and select all users you wish to share the table.
- Select PUBLIC to share with everyone.
- Choose the privileges you wish to grant. Select allows privileged users to read the table.
- To remove a user from the list, deselect all privileges.
DB administrators can create group roles, grant privileges to group roles, and add individual login roles to group roles
Selection
Structured query language (SQL) is the primary language for communicating with databases for creating tables, inserting data into tables, querying data from tables, and performing administrative actions on the database.
While Portal for ArcGIS and ArcGIS Pro hide geodatabase interaction behind graphical user interfaces and save users from the learning curve and complexity of SQL, there are tools in ArcGIS Pro where some knowledge of SQL can be helpful for performing complex operations, and general knowledge of SQL can be a useful job skill when working with databases in the non-ESRI world.
SELECT Statements
SELECT is the primary SQL command for querying data from tables.
The simplest form of a SELECT statement is:
SELECT column_names FROM table_name WHERE condition;
For example, this statement selects all counties in Illinois.
SELECT * FROM Counties WHERE ST = 'IL';
- The asterisk (*) is a wildcard that indicates to select all columns.
- FROM Counties indicates to select rows from the Counties table.
- WHERE ST = 'IL' indicates to select only records where the value in the ST column are IL (Illinois).
SELECT statement WHERE clauses indicate which records to select. They are composed of one or more comparisons, which can be tied together with logical operators.
Simple Comparisons
Simple comparisons indicate a set or range of field values that are acceptable for records to be selected. Operators include:
- = (equals)
- <> (not equal)
- > (greater than)
- >= (greater than or equal)
- < (less than)
- <= (less than or equal)
Note that text (strings) needs to be included in single quotation marks.
SELECT * FROM Counties WHERE Region = 'Midwest';
The greater than and less than symbols can be used to select features within ranges of values.
SELECT * FROM Counties WHERE Population >= 100000;
Logical Operators
Logical operators are used in SQL to combine multiple comparisons into compound statements.
- x AND y (both comparison x and comparison y must be true)
- x OR y (comparison x or comparison y must be true)
- NOT x (comparison x is not true)
Parentheses are commonly used to avoid ambiguous syntax and make statements easier to interpret.
For example, to isolate small, high-income counties:
SELECT * FROM Counties WHERE (Total_Population <= 100000) AND (Median_Household_Income > 70000);
Logical operators can be used to isolate ranges of quantitative values. This example is counties in the middle quartiles of median household income.
SELECT * FROM Counties WHERE (Median_Household_Income >= 43000) AND (Median_Household_Income <= 60000);
Mathematical Operators
Mathematical operators permit calculations prior to comparison. The operands can be fixed values or names of fields.
- + (add)
- - (subtract)
- * (multiply)
- / (divide)
- % (modulo: remainder after division)
For example, this statement selects countries with a lower population density than the US median.
SELECT * FROM Counties WHERE (Total_Population / Land_Area_Sq_km) <= 36;
Range Comparisons
Range comparisons are statements that test whether field values are in a particular range or set of values.
- x BETWEEN y AND z (x is within the range y to z, inclusive)
- x IN (a, b, c...) (x is one of the specified values)
- x LIKE wildcard (x contains a specific word or phrase)
BETWEEN ranges can be clearer than compound comparison statements.
SELECT * FROM Counties WHERE Median_Household_Income BETWEEN 43000 AND 60000;
Likewise with IN set operations. This example selects the five Midwestern states.
SELECT * FROM Counties WHERE ST IN ('OH', 'IN', 'IL', 'WI', 'MI');
Percent signs (%) are used as wildcards in LIKE statements to match any text. This statment isolates counties that have "Red" in their name.
SELECT * FROM Counties WHERE Name LIKE '%Red%'
LIKE statements can be useful for working with groups of US Census Bureau GEOIDs. For example, this statement isolates all census tracts in Peoria County (FIPS 17143).
SELECT * FROM Tracts WHERE GEOID LIKE '1400000US17143%'
Column Selection
SELECT statements can also be used to select specific columns from tables.
- A comma separated list of column names can follow the SELECT keyword.
- The statements above all use the asterisk (*) wildcard to select all columns.
- Limiting a specific set of columns can be useful to make the resulting selection easier to work with if your source table has a large number of columns.
- Limiting the number of selected columns can also improve selection speed and memory usage when you are working with large volumes of data or large numbers of transactions (such as with a public website).
- If you are working with feature classes, you also need to include the shape column to preserve the points, lines, or polygons associated with features for mapping or analysis.
For example, to select only a minimum number of columns to make a choropleth of median age in Illinois:
SELECT GEOID, Name, Median_Age, shape FROM Counties WHERE ST = 'IL';
Aggregation
Aggregation is the combination of multiple values into a smaller set of values for comparison or analysis. SQL has multiple functions for aggregation and commonly used functions include:
- MIN(x) returns the smallest value in column x
- MAX(x) returns the largest value in column x
- COUNT(x) returns the number of rows in column x
- SUM(x) returns the total sum of the values in numerical column x
- AVG(x) returns the average value of the values in numerical column x
One or more of these functions can be used in the SELECT column list. For example, this statement shows the range and average of county land area in square miles for US counties.
SELECT MIN(Square_Miles), AVG(Square_Miles), MAX(Square_Miles) FROM Counties
Aggregation functions can also be used for comparisons, although the syntax is a bit more complex than a simple SELECT.
Subqueries
Conceptually, SQL SELECT statements operate on one record at a time while aggregation functions operate on multiple records at a time.
Subqueries are queries within queries that can be used for performing complex operations that involve working with more than one row at a time.
Subqueries are enclosed in parentheses in a statement. For example, this statement selects the county with the highest median household income.
SELECT * FROM Counties WHERE Median_Household_Income = (SELECT MAX(Median_Household_Income) FROM Counties);
This statement selects countries with above average median household income.
SELECT * FROM Counties WHERE Median_Household_Income > (SELECT AVG(Median_Household_Income) FROM Counties);
Subqueries can become exceptionally complicated, but are often useful to encode complex queries in SQL where they can be optimized and executed quickly by the RDBMS.
Filters
There are three places you can perform selects in ArcGIS Pro:
- Filters on tools like Export Features
- Definition queries
- Query layers
Filters in ArcGIS Pro are used to subset groups of features for display or tool operations. Filters can be constructed using dialog options for assistance, or can be defined using the same syntax used for SQL WHERE clauses.
The Export Features tool includes a filter option. The Export Features tool is more appropriate than other select options when:
- You have a simple query.
- You are working with data that could change or disappear (such as a feature service or enterprise geodatabase table) and you need to keep a stable snapshot at a specific point in time, or
- You are importing data from a shapefile into a project geodatabase or an enterprise geodatabase.
This is an example of filtering large counties from the Minn 2015-2019 ACS Counties feature service with a population of 1 million persons or more (Large_Counties).
Definition Queries
Definition queries are filters added to map layers so they display only the features that satisfy the filter conditions. Definition queries also limit the number of features processed when that layer is used with an analysis tool.
A definition query is more appropriate than other select options when:
- You have a simple query.
- You are working with a feature class that is used in different layers with different queries.
- You are working with data that can be updated and you want your layer to be automatically updated in the future.
Definition queries are added by right-clicking on the layer, selecting Properties, and then selecting Definition Query.
For example, this adds a definition query to show only countries with a population density of 100 residents per square mile or fewer.
Query Layers
Query layers in ArcGIS Pro are map layers that use SQL SELECT statements to get data from enterprise geodatabases. Query layers permit access to the broad range of SQL SELECT syntax beyond simple WHERE clauses created as filters.
A query layer is more appropriate than other options when:
- You are working with data in an enterprise database that can change and you want your layer to be automatically updated in the future.
- You wish to use features specific to an enterprise geodatabase, such as PostGIS spatial operators.
- You have a complex query, such as with subqueries or aliases.
- You are working with a feature class that is used in different layers with different queries.
For example, this query layer selects specific columns for counties with higher than average median household income.
- Under Analysis, Tools run the Make Query Layer tool.
- Input Database Connection: Select the enterprise database
- Output Layer Name: Provide a meaningful name (Wealthy_Counties)
- Query: Always include the Objectid and shape in your selected columns.
- Define the spatial properties of the layer: If unchecked, the tool will attempt to infer the spatial properties (shape, CRS) from the first record returned from the database. Check this box and specify those properties below if no symbols display.
- Unique Identifier Field(s): Use a field (text or numeric) that can be used to uniquely identify features (Objectid).
- Shape Type: Select the type of shape used by the feature class (Polygon). Selection of a different shape type from the data will cause the tool to silently fail and display no features.
- Coordinate System: Select the feature class's coordinate system. This example data matches Current Map, but specification of an incorrect coordinate system will cause the tool to silently fail and display no features.
SELECT Objectid, Name, ST, Median_Household_Income, shape FROM Counties WHERE Median_Household_Income > (SELECT AVG(Median_Household_Income) FROM Counties)
Unfortunately, the Make Query Layer tool will accept your query even if there are problems with the syntax, and the layer will appear in the Contents pane with no features.
You can view and validate a layer's query under Properties, Source.
Subset Columns in ArcGIS Pro
When working with tables containing large numbers of columns, selecting only the necessary subset of those columns can make working with the data easier.
The Export Features tool includes a Field Map that can be used to add, delete, rename, and reorder fields, as well as change other field properties.
For example, this creates an County_Income feature class by selecting specific fields from the Minn 2015-2019 ACS Counties feature service.
When working with existing tables, you can define aliases and hide fields in the Attribute Table. Alias field names in ArcGIS Pro are alternative (often more-readable) field names that can be used while retaining the original field names in the data.
Joins
Relationships
Relationships are the logical connections between tables based on keys, and this aspect of relational databases distinguishes relational databases from other methods of storing data.
A key is a field in a table that can be used to uniquely identify specific rows in that table.
A foreign key is a field or set of fields in one table that can be used to connect related rows in another related table in a join.
For example, take this table of mean Commuting time (in minutes) in each county.
A one-to-one relationship between tables means that at most one row in one table matches with a row in the other table. In this example of voter participation percentages in the 2020 election, exactly one row in the Particpation table is matched to exactly one row in the Counties table above.
County | Participation |
---|---|
Cook | 72.2 |
Dekalb | 74.2 |
DuPage | 76.6 |
Grundy | 79.9 |
Kane | 73.7 |
Kendall | 76.7 |
Lake | 69.4 |
McHenry | 69.7 |
Will | 75.1 |
A many-to-one relationship means that multiple rows in one table can match a single row in the other table. For example, this Releases table represents toxics releases from facilities in different counties. Each county can have more than one facility.
County | Company | Release tons 2022 |
---|---|---|
Cook | Wieland Metals | 3,619 |
Cook | Ingredion | 428 |
DuPage | SWD | 47 |
DuPage | KNS | 9 |
McHenry | Rohm and Haas | 190 |
Will | ExxonMobil | 1,000 |
Schema
A schema is a set of table definitions and relations between those tables in a relational database.
Entity-relationship diagrams (E-R diagrams) graphically represent relational database schema and are commonly used in the creation and analysis of relational database designs.
E-R diagrams can be quite complex when the database is used for complex business data.
The Generate Schema Report tool can be used in ArcGIS Pro create detailed reports on the feature classes, tables, domains, and relations con+figured in a file or enterprise geodatabase. While these reports do not contain graphical E-R diagrams, the reports can be useful for diagnosing database problems and creating new databases with similar capabilities.
Table Joins
A join is a database operation where you connect two related tables based on keys in the tables that are used to match rows in one table to rows in the other table.
Joins are especially useful for attaching non-spatial data tables to spatial feature classes.
When joining two tables, the table you start with (usually on the left side of a diagram) is the target table, which is often a feature class when working with geodatabases. The table you are joining to the target table is the join table.
For example, a join of the Counties and Participation table based on the common County key would result in this table:
County | Population | Median_Household_Income | Participation |
---|---|---|---|
Cook | 5,275,541 | 76,632 | 72.2 |
Dekalb | 100,420 | 68,590 | 74.2 |
DuPage | 932,877 | 102,152 | 76.6 |
Grundy | 52,533 | 89,993 | 79.9 |
Kane | 516,522 | 93,343 | 73.7 |
Kendall | 131,869 | 101,447 | 76.7 |
Lake | 714,342 | 101,442 | 69.4 |
McHenry | 310,229 | 98,907 | 69.7 |
Will | 696,355 | 96,668 | 75.1 |
Join Types
You will commonly use two of the four different types of joins in ArcGIS Pro.
- A left join keeps all target features regardless of whether there are matching records in the join layer. Missing values from the join table are given a NULL value.
- An inner join only keeps features where there are matching features in both the target and join layer.
- An outer join only keeps all rows regardless of whether they match between the two tables. These are not used in ArcGIS Pro because they would result in records with missing geometric features.
- A right join keeps all target features regardless of whether there are matching records in the join layer. These are not used in ArcGIS Pro because they would result in records with missing geometric features.
Join Field
The Join Field tool can be used to perform attribute joins based on a common key field.
This example joins the Counties and Participation tables described above using the county name as the key.
Join Field adds columns to the target feature class, and if you want to keep a copy of the feature class without the new columns, you should first run Export Features to create a duplicate that will be modified (Participation_Counties).
Join Field keeps all input features to create a left join.
Add Join
ArcGIS Pro supports in-memory joins that can be added to layers by right-clicking in the Contents Pane and selecting Joins and Relates and Add Join.
- In-memory joins affect only specific map layers and do not change the underlying feature class or join table.
- In-memory joins are dynamic like query layers in that the joined data updates when the underlying feature classes or tables update.
- In-memory joins can be used with data from the project geodatabase or external shapefiles.
- In-memory joins cannot be automated and cannot be shared when creating web layers.
- In-memory joins disappear if the join table is deleted or the target table is exported, and you should use the Join Field tool as described above if you want your join to be permanent.
You can leave Keep all input records checked for a left join or uncheck for an inner join.
SQL Joins
In an SQL SELECT statement the LEFT JOIN keywords are added after the column list or wildcard, folowed by the keyword ON and the condition connecting the keys for matching rows.
Because you are working with multiple tables, if you have columns in the two tables that have the same name, you need to identify fields with both the name of the table and name of the column in the format table_name.column_name.
SELECT Counties.*, Participation.Participation FROM Counties LEFT JOIN Participation ON Counties.Name = Participation.County
An INNER JOIN would use the same syntax:
SELECT Counties.*, Participation.Participation FROM Counties INNER JOIN Participation ON Counties.Name = Participation.County
Additional WHERE conditions can be added after the join if needed.
SELECT Counties.*, Participation.Participation FROM Counties INNER JOIN Participation ON US_Counties.Name = Participation.County WHERE Median_Household_Income > 60000;
WHERE can also be used to create an inner join of tables in the FROM clause.
SELECT * FROM Counties, Participation WHERE Counties.Name = Participation.County
Joins in Query Layers
Query layers in ArcGIS Pro are map layers that use SQL SELECT statements to get data from enterprise geodatabases.
- Query layers are useful when you want your joined data to update automatically when the data in the enterprise geodatabase is updated.
- Query layers can be used with complex SQL queries (such as subqueries) that would otherwise require multiple ArcGIS Pro tools.
- If you need a reliable, fixed snapshot of your data, you can run Export Features to copy the query layer into a new feature class.
- A disadvantage of query layers is that they can only be used with enterprise geodatabases.
When performing joins using tables in file or enterprise geodatabases, you should select only specific columns from the join table to avoid bringing in duplicate OBJECTID or key fields that will cause the join to fail without any kind of warning or error message.
SELECT Counties.*, Participation.Participation FROM Counties INNER JOIN Participation ON Counties.Name = Participation.County
Mismatched Names
Joins are very sensitive to idiosyncrasies of the underlying data, and there are a variety of problems that can cause joins to have missing rows or fail altogether.
Text keys used for joins must match exactly and the use of place names as keys can result in incomplete or failed joins.
- Different abbreviations (Democratic Republic of the Congo vs. Dem Rep Congo)
- Alternative names (Iran vs. The Islamic Republic of Iran)
- Different articles (The Plurinational State of Bolivia vs. Plurinational State of Bolivia)
- Different capitalizations (WASHINGTON vs. Washington)
- Different punctuation (ST. vs. ST)
- Different languages (The Ivory Coast vs. Côte d'Ivoire)
- Different character sets (China vs. 中國)
- Leading or trailing spaces ("United States " vs. "United States")
A common adage is that data scientists spend 50% to 80% of their time cleaning data so that it can be analyzed (Lohr 2014).
Options for addressing these mismatches include:
- If available, use standardized codes (like ISO country codes or USCB GEOIDs) instead of place names.
- Geocode the table on the place name and use a spatial join.
- If no other option is available, manually modify one set of keys to match the other set.
Mismatched Key Data Types
Keys used for joins must be of the same data type, and mismatched key data types are especially common when working with numeric codes that are represented as text in one table and an integer in the other table.
You can use Calculate Field to create a numeric version of the text code
This example changes the text zip code field from the TIGER cartographic boundary file (Zip_Codes) to numeric to match numeric zip codes from Illinois State Cancer Registry data (Breast_Cancer)
- View the feature class Attribute Table and select Add.
- Field Name: Give the field a meaningful name similar but distinctive to the field you are converting (Zip_Numeric).
- Alias: Duplicate the field name.
- Data type: Choose a numeric field type. Long is usually appropriate for numeric codes.
- Click Save on the Fields ribbon.
- Return to the Attribute Table, right click on the new field, and select Calculate Field.
- Double click to add the field you are converting from.
- Run Export Features tool to create a copy of your feature class that will be modified below (Cancer_Zip).
- Run the Join Field tool and join the two tables using the converted field.
- Input Table: The feature class
- Input Field: The converted numeric key field (Zip_Numeric)
- Join Table: The table
- Join Field: The table numeric key field
- Keep all input records: Uncheck to leave
- Validate Join: Click and confirm that records are joined in the message output.
You can avoid the headaches of field calculation when working with query layers by using the CAST operator to transform (cast) the text field to a numeric (INT) field.
SELECT Zip_Codes.*, Breast_Cancer_Count FROM Zip_Codes INNER JOIN Breast_Cancer ON CAST(Zip_Codes.NAME20 AS INT) = Breast_Cancer.Zip_Code
Leading Zero Problem
Numeric codes like FIPS codes or zip codes sometimes have leading zeroes, and the inadvertent removal of leading zeros by the software creates a variation of the mismatched name problem.
- For example, the zip code for Portland, Maine's city hall is 04101.
- When data is processed in a spreadsheet program or transferred via a CSV file, numeric codes may be changed to a numeric data type that removes the leading zero (4101).
- Even if the number is converted back to text data types, the codes will not match if one table has the leading zeroes and the other doesn't.
Although it is possible to reformat the numbers as text with the leading zero, the easiest course of action will probably be to convert both the feature class and table fields to numeric (Long integer).
Spatial Queries
Spatial queries are database queries defined by spatial relationships between features, such as intersection or proximity.
Select by Location
The Select Layer by Location tool selects features based on a spatial relationship to features in another dataset or the same dataset.
Note that the "select" in the name of the tool refers to selection in the editor and if you need to preserve the editor selection, you will also need to run Export Features to export the selection to a new feature class.
This example creates a new Hospitals layer in or near Peoria County, Illinois. The hospital location data is from the US Department of Homeland Security open data site.
Summarize Within
The Summarize Within tool can be used to aggregate values from one layer (summary features) into polygon areas (input polygons).
This example summarized the count of hospitals and total hospital beds by county.
- Input Polygons: The counties layer (Counties)
- Input Summary Features: The hospitals layer (Hospitals)
- Output Feature Class: Provide a meaningful name (County_Hospital_Summary)
- Keep all input polygons: Uncheck to show only counties with hospitals
- Summary fields: Beds (Sum)
- Add shape summary attributes: Check to get the count of hospitals
Nearest Neighbor
The Near tool calculates distance between each input feature and the closest feature in another feature class.
In this example we calculate the distance from each Illinois county to the nearest hospital. Because the Near tool modifies the input feature class, we first use the Export Features tool to create a new feature class (Hospital_Distance) to preserve the original data.
- Input Features: The areas (Counties)
- Near Features: The points (Hospitals)
- Search Radius: The maximum search distance from each area (500 kilometers)
- Method: Geodesic
- Field Names: Keep the defaults
- Distance Unit: Use the approate unit for your locale (Kilometers)
Spatial Join
A spatial join copies attributes from a join layer into features in a target layer that match based on a spatial relationship.
The Join Features tool can be used to perform both attribute joins (above) and spatial joins.
In this example we spatial join hospitals to counties to compare neighborhood income distribution.
- Target Layer: Features that will receive the attributes (Hospitals)
- Join Layer: Features that are the source of the attributes (Counties)
- Output Dataset: Provide a meaningful name (Hospitals_Income)
- Join Operation: Join one to one (default)
- Spatial Relationship: Within
- Summary Fields: Add the fields you want to copy from the join layer to the target layer
Geometry Attributes
The Calculate Geometry Attributes tool can be used to add columns to a feature class with geometric characteristics of individual features like centroid lat/long, perimeter, or area.
PostGIS
The PostGIS extensions to PostgreSQL provide additional operators that can be used to create WHERE selection conditions on spatial relationships like proximity or intersection.
- Query layers using PostGIS functions can be preferred over Join Features if you are working with data that is subject to change and you want your layer to update automatically without having to re-run Join Features with each update.
- However, spatial queries can be slow, especially with large numbers of features or complex queries. If a SQL query is absolutely needed, creating the query layer and exporting it to a new feature class may be the most time-efficient option.
- PostGIS has an extensive collection of spatial functions that are covered in the Official Manual.
- Further information on what you can do with PostGIS is included in this quick start and this Introduction to PostGIS.
Select by Location
The PostGIS ST_DWithin() function indicates which features in the first parameter feature class are contained with a specified distance (third parameter) feature(s) in the second parameter feature class.
- ST_DWithin can be used in the WHERE clause to find points within a given distance from a specified shape.
- For this example, we select Hospitals in or near Peoria County, Illinois (County_Boundary).
- Since the features are in WGS 84 or NAD 83 lat/long, we use the ST_Transform() function to transform both sets of features to a North American Lambert conformal conic projection (ESRI 102009) so the distance can be specified in meters (50,000 meters = 50 kilometers).
SELECT Hospitals.* FROM Hospitals, County_Boundary WHERE public.st_dwithin( public.st_transform(Hospitals.shape, 102009), public.st_transform(County_Boundary.shape, 102009), 50000)
A query using ST_DWithin() can be used as query layers.
Summarize Within
The PostGIS ST_Intersects() function indicates which features in the first parameter feature class intersect (overlap) features in a second parameter feature class.
- ST_Intersects() can be used to JOIN hospitals to counties for grouping and summarization.
- The GROUP BY clause specifies the field(s) used to aggregate hospital features for the SUM function.
- The ORDER BY clause specifies the field(s) used to sort the output list.
SELECT Counties.Name, SUM(Hospitals.beds) AS beds FROM Counties JOIN Hospitals ON public.st_intersects( public.st_transform(Counties.shape, 102009), public.st_transform(Hospitals.shape, 102009)) GROUP BY Counties.Name ORDER BY Counties.Name
If you want to create a query layer with a summarized intersection, you also need to include the polygon shape field in the selected columns and the GROUP BY so the software has a geospatial field to map.
SELECT SUM(Hospitals.beds) AS beds, COUNT(Hospitals.beds) AS hospitals, Counties.Name, Counties.shape FROM Counties JOIN Hospitals ON public.st_intersects( public.st_transform(Counties.shape, 102009), public.st_transform(Hospitals.shape, 102009)) GROUP BY Counties.Name, Counties.shape ORDER BY Counties.Name
Nearest Neighbor
The MIN function with the PostGIS ST_Distance() function can be used to calculate distance between each feature in the first parameter feature class to the the closest feature in the second parameter feature class.
SELECT MIN(public.st_distance( public.st_transform(County_Tracts.shape, 102009), public.st_transform(Hospitals.shape, 102009))) / 1000 AS distance_km, County_Tracts.objectid, County_Tracts.shape FROM County_Tracts, Hospitals GROUP BY County_Tracts.objectid, County_Tracts.shape
Spatial Join
A spatial join copies attributes from a join layer into features in a target layer that match based on a spatial relationship.
The ST_Within() function indicates which features in the first parameter feature class are located entirely within a second parameter feature class.
In this example we spatial join hospitals to census tracts of ACS data in Peoria County, IL to find the median household income in the neighborhoods surrounding each hospital.
SELECT County_Tracts.Median_Household_Income, County_Hospitals.* FROM County_Hospitals INNER JOIN County_Tracts ON public.st_within( public.st_transform(County_Hospitals.shape, 102009), public.st_transform(County_Tracts.shape, 102009))
Joined feature classes can also be used in query layers, although the performance may be unacceptably slow when joining feature classes with large numbers of features.
SELECT INTO
If you have a complex, slow query and access to an administrative console that will let you submit queries, you can execute queries and save them to new feature classes or tables using an INTO clause.
If you create a table with a geospatial shape column, you will need to right-click on the new feature class in the Catalog Pane and select Register with Geodatabase to use the layer in ArcGIS Pro.
SELECT Counties.Name, SUM(Hospitals.beds), COUNT(Hospitals.beds), Counties.shape INTO County_Hospital_Summary FROM Counties JOIN Hospitals ON public.st_intersects( public.st_transform(Counties.shape, 102009), public.st_transform(Hospitals.shape, 102009)) GROUP BY Counties.Name, Counties.shape ORDER BY Counties.Name
Geometry Attributes
PostGIS functions can be used to calculate geometric characteristics of individual features like centroid lat/long, perimeter, or area.
Area calculations use the North America Albers equal area conic projection (ESRI 102008)
SELECT Counties.name, public.st_area(public.st_transform(Counties.shape, 102008)) / 1000000 AS square_km, public.st_perimeter(public.st_transform(Counties.shape, 102008)) / 1000 AS perimeter_km, public.st_astext(public.st_centroid(Counties.shape)) AS longlat, Counties.objectid, Counties.shape FROM Counties
Publishing
Publishing in ArcGIS Enterprise involves making data available across a network with a service URL or REST endpoint. Published data can be used to create web maps, web apps, or ArcGIS Pro layers.
There are two sources of data for a service:
- Reference registered data: Service data is stored in an enterprise geodatabase and updates when the databases is updated. Because such use of such services requires constantly accessing the enterprise geodatabase, their speed and performance can be slow or erratic.
- Copy all data: Service data is a snapshot of data from the source geodatabase that is copied into the portal data store. Feature services sourced from the portal data store can have good performance, but are difficult to update without completely republishing the services and breaking any links to the old services.
There are characteristics that vary across four types of published services:
Characteristic | Map image | Feature | Image tile | Vector tile |
---|---|---|---|---|
User can change styling | No | Yes | No | No |
Supports pop-up queries | No | Yes | No | No |
Performance with large data sets | Slow | Slow | Fast | Fast |
Downloadable data | No | Yes | No | No |
Requires enterprise geodatabase | Yes | No | No | No |
Publishing Feature Classes
Feature classes can be published in all four different types of services.
- Reference registered data (data read directly from the enterprise database) can only be published as a map image with the option of also publishing a feature services.
- Image or vector tiles provide the best performance but cannot be queried or downloaded as data.
- Only feature services allow the map user the options to change the styling or download the underlying data.
To publish a feature class:
- Add the feature class to a map in ArcGIS Pro.
- Adjust the Symbology as desired.
- Right click on the layer in the contents and select Sharing, Share as Web Layer
- Name: Give the layer a meaningful name.
- Summary: Add a description with the source.
- Copy all data: Chose the desired source and type of service
- Reference registered data updates data from the enterprise geodatabase.
- Copy all data stores the data in the portal relational data store as a snapshot (the example below).
- Share with: Choose the appropriate sharing level.
- Click Analyze.
- If you get the error "Unique numeric IDs are not assigned," click the ellipsis (...) on the message and select Auto-Assign Ids Sequentially.
- If you are publishing a feature class from an enterprise geodatabase you may need to rename the layer to remove the database name so it is not published with the feature service.
- Click Publish.
- In the Contents page in your portal, verify that the feature service has been published.
Feature services that draw from the portal data store are listed in the portal Content page as Feature layer (hosted).
Feature services that draw from an enterprise geodatabase (registered data) are listed in the portal Content simply as Feature layer.
If you get Error 00231 Layer's data source must be registered with the server when trying to publish, right click on the ellipsis (...) in the error, select Register Data Source With Server and register your schema with the portal. Your data store name should a generalized name you can reuse in the future (e.g. Peoria_County) rather a name specific to one feature service (e.g. Peoria_County_Parcels). You can view registered data stores with the Manage Registered Data Stores tool.
Publishing Raster Data
Rasters can be published as map images or tile layers.
- Tile layers are scaled and sliced into tile caches when they are published to give fast web map performance.
- However, tile layer caches can get large if rendered at high zoom levels, so the visibility range is usually constrained.
- Map images are scaled in the browser, but load and navigation performance can be slower on devices with weaker browsers or poor network connectivity (mobile devices).
Group Layers
Group layers combine multiple related feature layers into a single bundle that can be easily added to web maps and kept together in the layers list.
Group layers are useful for layers that function as a group (such as police stations and police precinct boundaries) or represent different scales of the same variable (states + counties + tracts).
- Add all layers to the project and give them related names (Minn_2023_PLACES_Counties, Minn_2023_PLACES_Tracts).
- Symbolize the layers consistently.
- For each layer, right click on the layer and select Properties. On the General tab, Add visibility ranges so that tracts are visible when zoomed in and counties are visible when zoomed out.
- Tract = Minimum scale 1:2,000,000
- County = Maximum scale 1:2,000,000
- Right click on the map in the Contents Pane and select New Group Layer.
- Right click on the group layer, and in Properties, give it a meaningful name (Minn_2023_PLACES_Health_Data).
- Drag the layers into the group layer
- Right click on the group layer and select Sharing, Share as Web Layer.
- Name: Provide a meaningful name
- Summary: Add metadata
- Copy all data: Feature
- Share with: Everyone
- Click Analyze.
- If you get the message Unique numeric IDs are not assigned, click on the ellipsis (...) beside the message and select Auto-Assign IDs Sequentially.
- Ignore the error 24035 Group Layer type is not supported for web feature layers.
- Click Publish.
- Go into your portal, select the new feature layer, and Open in Map Viewer to verify that the symbology and visibility work as desired.
Relationship Classes
Relationship classes define relationships between geodatabase tables.
- Relationship classes tie tables together so that deleted or added rows in one table are reflected in the relationship tables.
- Data in related tables is available in identification pop-ups in ArcGIS Pro.
- Data in related tables can be published together with the feature service to portal.
- Relationship classes create explicit documentation of relations between tables that would not necessarily be obvious without examining the data in depth or consulting offline schema documentation.
- The use of relationship classes in publishing web maps is only available in ArcGIS Online or ArcGIS Enterprise 11.2 or later.
To publish a feature class and related table together in a feature service:
- Add both the feature class and the related table to the map.
- Run the Create Relationship Class tool.
- Origin table: The feature class (Counties)
- Destination table: The table of related data (Manufacturing)
- Output Relationship Class: Give this a meaningful name
- Origin Primary Key: The key from the origin table (GEOID)
- Origin Foreign Key: The key from the data table (GEO_ID)
- Click on a feature to verify the data is related.
- Right click on the feature class and Share as Web Layer as a Feature Layer.
- Auto-assign feature IDs sequentially
- After publication finishes, Create a new map with the service
- Edit the popups, Add Content with Related Records.
ArcGIS Enterprise Sites
ArcGIS Enterprise Sites is a web interface for distributing content from portal to users outside your organization. Sites is the same user interface used for ArcGIS Hub, a cloud platform commonly used by government agencies for open data portals.
Create and Customize the Site
- On your portal Content page, select Create App and Sites.
- Give your site a meaningful name (Peoria County Illinois Open Data) and Create site.
- Delete the unused placeholder sections.
- Update the placeholder contact e-amil address.
- Update the copyright.
- Change the top row background image.
Content Library
The site content library is the collection of portal services data objects that are available to site visitors.
- Open the site in edit mode.
- In the second drop-down menu on the edit navigation bar (the name of the site), select Content Library.
- Click the Find existing content
- Select services listed under Data.
- When two Datasets with the same name are listed, one will probably be the service definition (.sd file), and the other is the feature service you want. You will need to try opening them to find out which is which.
- After adding all services, click Save.
- When you are done, click Save on the lower left side of the screen.
Content Cards
Content cards are icons in the Explore your data section of your site home page that have links to different categories of available data.
The categorization of data for these categories can be made with content tags, with explicit category assignments, or with portal groups. For a small site like this, the easiest option is to use tags.
- Adjust the Content cards to define and represent categories.
- Add tags to assign the data to categories.
- View the categories to verify successful categorization.
Data Export
One of the advantages of using ArcGIS Enterprise Sites for open data portals is that the software provides the ability for users to download feature data in multiple formats: CSV, zipped shapefile, GeoJSON, file geodatabase, and Excel spreadsheets.
To allow users to export data from a feature service, go into the Settings for the feature service layer(s) and select Allow others to export to different formats.
Publish the Site
Changes made in the site editor are staged as drafts until you publish them.
Deleting a Site
You can delete a site by clicking the ellipsis (...) at the top of the editor page and selecting Delete site.