Geospatial Data Storage in ArcGIS Pro

This tutorial will give a basic overview of how geospatial data is stored and organized in ArcGIS Pro.

Storage Architecture in ArcGIS Pro

The key words in geographic information systems are geographic information.

Data is "factual information (such as measurements or statistics) used as a basis for reasoning, discussion, or calculation" (Merriam-Webster 2020).

Geospatial data is "data about objects, events, or phenomena that have a location on the surface of the earth" (Stock and Guesgen 2016). Geospatial data is "what is where" (Gritzner 2002).

A database is "a usually large collection of data organized especially for rapid search and retrieval (as by a computer)" (Merriam-Webster 2020).

A geodatabase is a special type of database containing "a collection of geographic datasets of various types" (ESRI 2020).

Feature classes are individual geographic datasets within a geodatabase. In non-ESRI terms, these can also be referred to as tables.

Each individual geographic entity in a feature class is called a feature. For example, you could have a feature class for roads in a county, where each road would be a feature (ESRI 2020).
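To make the relationship between a feature class and its features concrete, here is a minimal Python sketch that models a feature class as a named collection of features, each holding a geometry and a set of attribute values. The class names, field names, and coordinates are hypothetical, and this is only a conceptual model, not how ArcGIS Pro stores data internally.

```python
from dataclasses import dataclass, field


@dataclass
class Feature:
    """One geographic entity: a geometry plus its attribute values."""
    geometry: list      # list of (longitude, latitude) vertices
    attributes: dict    # field name -> value


@dataclass
class FeatureClass:
    """A named collection of features that share one geometry type."""
    name: str
    geometry_type: str
    features: list = field(default_factory=list)

    def add(self, feature: Feature) -> None:
        self.features.append(feature)


# Hypothetical roads feature class: each road is one feature.
roads = FeatureClass(name="County_Roads", geometry_type="Polyline")
roads.add(Feature(geometry=[(-88.20, 40.10), (-88.15, 40.12)],
                  attributes={"NAME": "Main St", "LANES": 2}))
roads.add(Feature(geometry=[(-88.18, 40.08), (-88.18, 40.14)],
                  attributes={"NAME": "Oak Ave", "LANES": 4}))

print(len(roads.features))                              # 2
print([f.attributes["NAME"] for f in roads.features])   # ['Main St', 'Oak Ave']
```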

Figure
The relationship between geodatabases, feature classes, features, maps, and layouts

File Systems

Computer file systems are collections of files (often on a single storage device) that are organized in a hierarchical structure of folders, which are also sometimes referred to as directories.

Hierarchies

Hierarchical file systems organize data files in categories and subcategories. This categorization not only makes it easier to understand how to access specific files or groups of files, but also facilitates portability, sharing, and access control.

In the era of the cloud, end users often access conceptually flat collections of data using search engines (on the web) and search functions (in apps), but the administrators and developers of online systems still need to organize data hierarchically to make management easier. Accordingly, as a GIS professional, you will need to be able both to navigate existing hierarchical file systems and to develop hierarchical categorizations to organize your data.

The following is an example of a hierarchy:

Figure
Example hierarchy of food sources

Hierarchies are commonly written as text, using indentation and different bullet point symbols to show the relationships visually.
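As a small illustration of writing a hierarchy as indented text, the sketch below stores the folder hierarchy from the file path example in this tutorial as a nested Python dictionary and prints it as an outline, indenting each level and rotating through bullet symbols. The function name and the choice of bullet characters are illustrative assumptions.

```python
BULLETS = "-*+"  # a different bullet symbol for each level, repeating


def outline(tree: dict, depth: int = 0) -> list[str]:
    """Return one indented, bulleted line per node of a nested-dict hierarchy."""
    lines = []
    for name, children in tree.items():
        bullet = BULLETS[depth % len(BULLETS)]
        lines.append("  " * depth + bullet + " " + name)
        # Children are written one level deeper than their parent.
        lines.extend(outline(children, depth + 1))
    return lines


folders = {
    "U:": {
        "Documents": {
            "ArcGIS": {
                "Projects": {
                    "Street Work": {"Street Work.aprx": {}},
                },
            },
        },
    },
}

print("\n".join(outline(folders)))
```

The nested dictionary mirrors the hierarchy directly: each key is a folder or file, and its value holds whatever sits inside it.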

File Paths

A file path points to a specific file within the hierarchical folder structure of a file system (W3Schools 2022). A path consists of three components:

This example shows the folder hierarchy in Windows Explorer leading to the file at:

U:\Documents\ArcGIS\Projects\Street Work\Street Work.aprx
Figure
Example folder hierarchy and file path
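One way to see the pieces of this path is to let Python's standard pathlib module split it. PureWindowsPath parses Windows-style paths on any operating system without touching the disk, so this sketch runs even if the file does not exist. Labeling the pieces as file system, folders, and file name is an interpretation for illustration.

```python
from pathlib import PureWindowsPath

path = PureWindowsPath(r"U:\Documents\ArcGIS\Projects\Street Work\Street Work.aprx")

print(path.drive)    # U:  (the file system)
print(path.parent)   # U:\Documents\ArcGIS\Projects\Street Work  (the folders)
print(path.name)     # Street Work.aprx  (the file name)
print(path.stem)     # Street Work
print(path.suffix)   # .aprx  (the file name extension)
print(path.parts)    # each level of the hierarchy as a separate item
```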

File System Names

Computers can have multiple file systems. The specific file system is the leftmost term in a file path. File systems can be specified in multiple ways on Windows machines (Microsoft 2022):

Windows File Systems

The primary (local) file system on a Windows computer is usually the C: drive. A: and B: were drives for removable floppy disks, which are now obsolete (Wikipedia 2022).

The primary file system is used for operating system and installed program files.

Figure
A typical C: drive

Home Directories

A home directory is "a file system directory on a multi-user operating system containing files for a given user of the system" (Wikipedia 2022).

Home directories on Windows systems have a standard set of subfolders for user files.

The full paths to these locations depend on the configuration of the system.

You can view the location of a folder by right-clicking on the folder in Windows Explorer and selecting Properties. In this example, the Documents user subfolder in a home directory on a network drive is located at \\192.168.100.3\DeptUsers\minn2\Documents.

Figure
A user folder on a network file system

Network file systems are sometimes also mounted to drive letters. For example, the Documents folder in this network file system is also located under the U: drive letter.

Figure
A network folder also visible under a drive letter
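The leftmost part of a path names the file system, and it looks different for a network path than for a drive letter. In this sketch, pathlib reports the \\server\share prefix as the "drive" of a network (UNC) path and the letter as the drive of a mapped path. Note that pathlib only parses text: it cannot tell that U: happens to be mapped to this network share on a particular machine.

```python
from pathlib import PureWindowsPath

# The same folder, written as a network (UNC) path and as a mapped drive letter.
unc = PureWindowsPath(r"\\192.168.100.3\DeptUsers\minn2\Documents")
mapped = PureWindowsPath(r"U:\Documents")

# For a UNC path, the file system is the \\server\share prefix.
print(unc.drive)      # \\192.168.100.3\DeptUsers
print(unc.parts)      # share root first, then the folders below it

# For a mapped path, the file system is just the drive letter.
print(mapped.drive)   # U:
```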

File Types

A file's type determines what kinds of software can operate on that file. A file name extension is a suffix on a file name consisting of a period and (usually) three or four letters that indicate the file type.
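Software often decides how to treat a file by looking only at its extension. The sketch below does the same thing with a small lookup table; the table is hypothetical and built from file types mentioned in this tutorial, not a complete registry. Extension matching is case-insensitive here because Windows file names are.

```python
from pathlib import PurePath

# Hypothetical lookup table of extensions that appear in this tutorial.
FILE_TYPES = {
    ".aprx": "ArcGIS Pro project",
    ".ppkx": "ArcGIS Pro project package",
    ".csv": "Comma-separated values table",
    ".shp": "Shapefile geometry",
    ".kml": "Keyhole Markup Language",
    ".geojson": "GeoJSON",
    ".gpx": "GPS Exchange Format",
    ".zip": "Compressed archive",
}


def describe(filename: str) -> str:
    """Guess a file's type from its extension (case-insensitive)."""
    suffix = PurePath(filename).suffix.lower()
    return FILE_TYPES.get(suffix, "Unknown type")


print(describe("Street Work.aprx"))  # ArcGIS Pro project
print(describe("ROADS.SHP"))         # Shapefile geometry
print(describe("notes"))             # Unknown type
```

The extension is only a naming convention: renaming a file does not change its contents, so a mislabeled file will still fail to open in the expected software.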

Some common file extensions (with the software typically used to view or edit the files) include:

Projects

An ArcGIS Pro project brings together data from different sources to perform mapping and analysis.

Figure
Projects in ArcGIS Pro

Project Folders

Files associated with a project are kept in a project folder, which is usually a subdirectory under <home>\Documents\ArcGIS\Projects.

The project folder contains:

Examining a project folder

Project File (.aprx)

A project file is a file with an .aprx suffix that contains design and source information for data, maps, and layouts that are part of the project.

Project Geodatabase

The project geodatabase is the default geodatabase used for storing geospatial data that is imported or created as part of your project.

Viewing the contents of the project file geodatabase in the Catalog Pane

Project Packages

A project package is a single file that bundles all of the files, feature classes, and map design information used in a project. Project package file names have the suffix .ppkx.

Figure
Projects in ArcGIS Pro

To create a project package file:

  1. On the Share tab, in the Package group, select Project.
  2. Give your project a meaningful name. The same name as the project is usually a good choice.
  3. Unless you are working with a group, you can usually just leave the Summary and Tags boxes blank.
  4. Share Outside Organization should be checked if your project uses files (like shapefiles or feature services) that are not included in your project geodatabase.
  5. Include History should generally not be checked. Project packages require your history to be free of errors. If you run a tool and it fails, your history will contain that error and saving the project package may fail with a cryptic error message.
  6. Include Toolboxes should be checked if you are using ModelBuilder or Python notebooks.
  7. Click Analyze and fix any identified problems. Unfortunately, Analyze is of limited value and will often miss major problems that will cause your packaging to fail later.
  8. Click Package to create the package and upload. This can take a few minutes if your project contains large data files like rasters.
  9. To get a link to your project package, view your Contents page in ArcGIS Online, click on the name to open the information page for the project package, and copy the URL from the location bar.
Saving a project package

Reopening Project Packages

To reopen a project from a project package, go to Project, Open, Portal, My Content, and select the package to open.

Reopening a project from a project package

Project Versions

Once you have reopened a project package and modified the project, you should always work from a reopened project package.

Distribution and Storage File Formats

Although geospatial data is often distributed as services, there are still numerous instances where geospatial data needs to be stored and / or distributed in computer files.

This section briefly describes some common types of geospatial data file formats and how to import them into ArcGIS Pro for mapping and/or publication.

CSV / Excel With Latitudes and Longitudes

Point features can be stored in simple table formats like comma-separated values (CSV) files, where each row holds the latitude and longitude of a point in two columns, along with the attributes associated with that location.
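The sketch below shows what reading such a file involves: each row becomes a point with numeric coordinates plus its attributes. The sample rows and approximate coordinates are hypothetical, and the "Latitude" and "Longitude" column names are assumed to match the file. Note that longitude is the x coordinate and latitude is the y coordinate, a common source of swapped points on a map.

```python
import csv
import io

# Hypothetical sample data; in practice you would open() a .csv file instead.
SAMPLE = """Name,Latitude,Longitude
Springfield,39.80,-89.65
Jefferson City,38.58,-92.17
"""


def read_points(csv_text: str) -> list[dict]:
    """Turn each CSV row into a point record with numeric x/y coordinates."""
    points = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        lat = float(row["Latitude"])
        lon = float(row["Longitude"])
        # Reject coordinates outside the valid latitude/longitude ranges.
        if not (-90 <= lat <= 90 and -180 <= lon <= 180):
            raise ValueError(f"Invalid coordinates in row: {row}")
        # Longitude is x (east-west) and latitude is y (north-south).
        points.append({"x": lon, "y": lat, "attributes": row})
    return points


for p in read_points(SAMPLE):
    print(p["attributes"]["Name"], p["x"], p["y"])
```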

Aside from ease of use, the simplicity of a CSV file has an advantage in its potential for data preservation since CSV files will likely be readable for generations to come, while complex file formats (especially proprietary formats) will become obsolete as technology changes.

While you can use CSV files with feature name fields to represent lines (like roads) or polygons (areas like neighborhoods or census tracts), these files can be complex and unwieldy to manage manually, and you will generally want to use specialized geospatial data file formats like the shapefile for non-point geometries.

The following video demonstrates how to import a CSV file with latitudes and longitudes to create a feature class. This example uses data on COVID-19 cases and fatalities in US states gathered by Johns Hopkins University of Medicine and made available through GitHub.

  1. Acquire: Create or download the CSV file with location coordinates in columns named "Latitude" and "Longitude"
  2. Start: Create a new map in ArcGIS Pro and give it a meaningful name.
  3. Add: Select Add Data -> XY Point Data to run the XY Table to Point tool.
  4. Symbolize: Adjust the symbology of the layer as needed.
  5. Present: Create a print layout with a legend.
    • Export to create a figure or printable map.
Creating a Feature Class From a CSV File With Latitudes and Longitudes

CSV / Excel With Place Names

You can also store point data in CSV files that have place names rather than lat / long for the location identifiers.

Locations on the surface of the earth are often referenced using place names rather than numeric coordinates, most notably in street addresses. Geocoding is the process of converting from place names to latitude/longitude coordinates. Geocoding involves parsing the place names into component parts (state, city, street name, number, east/west, etc.), and then looking up those parts in a large database of possible locations in an area.
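The two stages of geocoding, parsing and lookup, can be sketched in a few lines. This is a toy illustration, not how a production geocoder works: the regular expression handles only simple "number, optional direction, street name" addresses, and the one-entry ADDRESS_DB is a hypothetical stand-in with made-up coordinates for the large address database a real geocoder uses.

```python
import re

# Hypothetical stand-in for a real geocoder's address database.
# The coordinates are made up for illustration.
ADDRESS_DB = {
    ("100", "N", "MAIN ST", "SPRINGFIELD", "IL"): (39.80, -89.65),
}

# House number, optional one-letter direction, then the street name.
PATTERN = re.compile(r"^(\d+)\s+(?:([NSEW])\s+)?(.+)$")


def parse_address(street: str, city: str, state: str) -> tuple:
    """Split an address into number, direction, street, city, and state."""
    match = PATTERN.match(street.strip().upper())
    if match is None:
        raise ValueError(f"Cannot parse street address: {street!r}")
    number, direction, name = match.groups()
    return (number, direction or "", name, city.strip().upper(), state.strip().upper())


def geocode(street: str, city: str, state: str):
    """Return (latitude, longitude), or None when no match is found."""
    return ADDRESS_DB.get(parse_address(street, city, state))


print(parse_address("100 N Main St", "Springfield", "IL"))
# ('100', 'N', 'MAIN ST', 'SPRINGFIELD', 'IL')
print(geocode("100 n main st", "springfield", "il"))  # (39.8, -89.65)
print(geocode("999 Elm St", "Springfield", "IL"))     # None
```

Real geocoders also standardize abbreviations (Street vs. St), tolerate typos, and interpolate positions along street segments, which is part of why the results carry uncertainty.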

Because there are billions of possible addresses, and a variety of different formats for writing addresses, accurate geocoding requires large, expensive databases and powerful computers. Geocoding using artificial intelligence is an active area of development (Lee, Claridades, and Lee 2020). Google Maps is popular, in part, because Google has built the technology and resources to support lightning-fast geocoding of both addresses and landmark/business names.

Geocoding always involves some level of uncertainty, and you should verify that all the geocoded points are in the right place if the accuracy of your map is important.

In this example, we use a CSV file of in-person FedEx shipping or printing locations from the company's website. Although they provide a map, the only location information directly available is the address, which was entered into a spreadsheet with the columns: Name, Address, City, and State. Geocoding of addresses is a common task needed when performing business analysis in GIS.

  1. Acquire: Download the CSV file or create it in a spreadsheet program like Excel and save it as a CSV file.
  2. Start: Create a new map in ArcGIS Pro.
  3. Add: In a new map, Add Data with the table.
    • Right-click on the table and select Geocode Table.
    • Follow the instructions to choose the options for geocoding.
  4. Symbolize: Adjust the symbology for the new layer.
    • Change the map projection to WGS 1984 Web Mercator, which is a good generic projection used with web maps.
    • Add labels, if desired.
    • Adjust the base map in case the map is cluttered.
  5. Present: Create a print layout
    • Add a legend (if needed).
    • Export to print or insert as a figure.
Creating a Feature Class From a CSV File With Addresses

Shapefiles

The shapefile is a geospatial data file format developed by ESRI with a standard published in 1998.

While the age of this format is reflected in its numerous limitations (such as column name length limit of 10 characters), this format is still commonly used for distributing geospatial data because it is reliable and well supported by a wide variety of GIS software.

The term shapefile is a misnomer since a shapefile is actually a collection of at least three (and usually more) separate files that store the locational data, the characteristics associated with those locations, and other information about the data.

Some common files associated with a shapefile include (listed by the file extension):

Figure
The four files of a shapefile viewed in the Windows file explorer (.dbf, .prj, .shp, and .shx)

To help keep all these files together, they are usually compressed into a single .zip archive file for distribution on websites and servers.
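Because a shapefile is several files, a downloaded .zip archive can be incomplete. The sketch below lists an archive's contents with Python's zipfile module, groups the files by their shared base name, and reports which of the three files every shapefile requires (.shp, .shx, and .dbf) are missing. The function name and the sample archive are hypothetical.

```python
import io
import zipfile
from pathlib import PurePosixPath

# The three files every shapefile needs; .prj and others are optional.
REQUIRED = {".shp", ".shx", ".dbf"}


def shapefiles_in_zip(zip_file) -> dict:
    """Map each shapefile base name in an archive to its missing required parts."""
    groups = {}
    with zipfile.ZipFile(zip_file) as archive:
        for member in archive.namelist():
            p = PurePosixPath(member)
            base = p.with_suffix("").as_posix()
            groups.setdefault(base, set()).add(p.suffix.lower())
    # Only groups that contain a .shp file are treated as shapefiles.
    return {
        base: sorted(REQUIRED - suffixes)
        for base, suffixes in groups.items()
        if ".shp" in suffixes
    }


# Build a small in-memory archive: "rivers" is missing its .shx file.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as z:
    for name in ["roads.shp", "roads.shx", "roads.dbf", "roads.prj",
                 "rivers.shp", "rivers.dbf"]:
        z.writestr(name, b"")

print(shapefiles_in_zip(buf))  # {'roads': [], 'rivers': ['.shx']}
```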

  1. Acquire: Download the shapefile .zip archive from the website.
  2. Process: In Windows Explorer, extract the contents of the .zip archive file.
  3. Store: Under Analysis, Tools find the Export Features tool to export the shapefile data into a feature class in the project geodatabase.
  4. Communicate: Symbolize as needed.
Importing a shapefile as a new feature class

Keyhole Markup Language (KML)

Google Earth/Maps exchanges geospatial data in the Keyhole Markup Language (KML) format that is based on Extensible Markup Language (XML).

KML is designed for the web and contains information on how the geospatial data should be displayed on a web map like Google Maps or in Google Earth. While you can use KML files for general exchange of spatial data, KML is usually best suited to exchanging data with Google apps.
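Because KML is XML, its contents can be read with an ordinary XML parser. This sketch uses Python's xml.etree.ElementTree to pull point placemarks out of a small hypothetical KML document. It handles only Point geometries, and it relies on KML writing each coordinate as longitude,latitude with an optional altitude.

```python
import xml.etree.ElementTree as ET

KML_NS = {"kml": "http://www.opengis.net/kml/2.2"}

# Hypothetical minimal KML document with one point placemark.
SAMPLE_KML = """<kml xmlns="http://www.opengis.net/kml/2.2">
  <Document>
    <Placemark>
      <name>Sample point</name>
      <Point><coordinates>-88.2434,40.1164,0</coordinates></Point>
    </Placemark>
  </Document>
</kml>"""


def placemark_points(kml_text: str) -> list[tuple]:
    """Return (name, longitude, latitude) for each Point placemark."""
    root = ET.fromstring(kml_text)
    points = []
    for pm in root.iterfind(".//kml:Placemark", KML_NS):
        coords = pm.find(".//kml:Point/kml:coordinates", KML_NS)
        if coords is None:
            continue  # skip lines, polygons, and other geometry types
        # KML order is longitude,latitude[,altitude]; keep the first two.
        lon, lat = (float(v) for v in coords.text.strip().split(",")[:2])
        name = pm.findtext("kml:name", default="", namespaces=KML_NS)
        points.append((name, lon, lat))
    return points


print(placemark_points(SAMPLE_KML))  # [('Sample point', -88.2434, 40.1164)]
```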

KML import

GeoJSON

GeoJSON is a geospatial data format based on JavaScript Object Notation (JSON) that is commonly used for data displayed in web maps.
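Since GeoJSON is plain JSON with an agreed structure, a feature can be written and read with Python's standard json module. This sketch builds a FeatureCollection holding one point; the point and its properties are hypothetical. As with KML, GeoJSON positions list longitude before latitude.

```python
import json


def point_feature(lon: float, lat: float, properties: dict) -> dict:
    """Build a GeoJSON Feature; positions are [longitude, latitude]."""
    return {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        "properties": properties,
    }


collection = {
    "type": "FeatureCollection",
    "features": [point_feature(-88.2434, 40.1164, {"name": "Sample point"})],
}

text = json.dumps(collection, indent=2)
print(text)

# Reading GeoJSON back is ordinary JSON parsing.
parsed = json.loads(text)
print(parsed["features"][0]["geometry"]["coordinates"])  # [-88.2434, 40.1164]
```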

GeoJSON import

GPX

The GPS Exchange Format (GPX) is another XML-based format that is commonly exported by GPS tracker apps in smartphones to store GPS points.

Although GPX files can contain a variety of different types of data, the data of primary interest is usually a sequence of waypoints, which are GPS latitude/longitude locations that are regularly captured by the tracking app as it records.

Waypoints also commonly include the date and time when they were recorded and an elevation supplied by GPS.
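The sketch below reads these recorded points from a GPX file with Python's xml.etree.ElementTree. It assumes the GPX 1.1 namespace and that the tracking app stored its points as trkpt (track point) elements, which is common for recorded tracks; the sample track is hypothetical. Latitude and longitude are attributes of each point, while elevation and time are child elements.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

GPX_NS = {"gpx": "http://www.topografix.com/GPX/1/1"}

# Hypothetical two-point track recorded 30 seconds apart.
SAMPLE_GPX = """<gpx xmlns="http://www.topografix.com/GPX/1/1" version="1.1" creator="example">
  <trk><trkseg>
    <trkpt lat="40.1000" lon="-88.2000"><ele>220.0</ele><time>2022-05-01T14:00:00Z</time></trkpt>
    <trkpt lat="40.1010" lon="-88.2000"><ele>221.5</ele><time>2022-05-01T14:00:30Z</time></trkpt>
  </trkseg></trk>
</gpx>"""


def track_points(gpx_text: str) -> list[dict]:
    """Return latitude, longitude, elevation, and time for each track point."""
    root = ET.fromstring(gpx_text)
    points = []
    for pt in root.iterfind(".//gpx:trkpt", GPX_NS):
        ele = pt.findtext("gpx:ele", namespaces=GPX_NS)
        stamp = pt.findtext("gpx:time", namespaces=GPX_NS)
        points.append({
            "lat": float(pt.get("lat")),
            "lon": float(pt.get("lon")),
            "ele": float(ele) if ele is not None else None,
            # Older fromisoformat() versions reject "Z", so use +00:00 (UTC).
            "time": datetime.fromisoformat(stamp.replace("Z", "+00:00")) if stamp else None,
        })
    return points


for p in track_points(SAMPLE_GPX):
    print(p["lat"], p["lon"], p["ele"], p["time"])
```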

The approximate path of travel can be reconstructed by connecting the waypoints with lines. Speed can be estimated by dividing the distance between consecutive waypoints by the difference in their recorded times.
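Here is a minimal sketch of that speed estimate. It measures the distance between consecutive waypoints with the haversine great-circle formula, which assumes a spherical Earth with a mean radius of 6,371 km, and divides by the elapsed seconds. The two-point track is hypothetical. Because the path between waypoints is treated as a straight line, the result is an approximation that can underestimate distance on winding routes.

```python
import math
from datetime import datetime, timezone

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters (spherical approximation)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))


def segment_speeds(points):
    """Speed in meters per second between consecutive (lat, lon, time) waypoints."""
    speeds = []
    for (lat1, lon1, t1), (lat2, lon2, t2) in zip(points, points[1:]):
        seconds = (t2 - t1).total_seconds()
        if seconds > 0:  # skip duplicate or out-of-order timestamps
            speeds.append(haversine_m(lat1, lon1, lat2, lon2) / seconds)
    return speeds


# Hypothetical track: about 111 m due north in 30 seconds.
track = [
    (40.1000, -88.2000, datetime(2022, 5, 1, 14, 0, 0, tzinfo=timezone.utc)),
    (40.1010, -88.2000, datetime(2022, 5, 1, 14, 0, 30, tzinfo=timezone.utc)),
]
print(segment_speeds(track))  # roughly 3.7 m/s
```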

GPX files can be imported into ArcGIS Pro using the GPX To Features tool, which can import the waypoints as individual point features, or as a path line.

GPX import

ESRI File Geodatabases

In 2006, ESRI introduced a geospatial data file format called the file geodatabase.

Figure
Individual files in a project file geodatabase displayed in Windows Explorer

AutoCAD

If you work with GIS in the construction industry, you will likely see geospatial data stored in the files used by the engineering drafting program AutoCAD. However, AutoCAD is a general use drafting program for objects of all sizes and the proprietary file format often does not contain adequate coordinate or attribute information to allow data to be transferred directly into GIS software.

Figure
AutoCAD

Raster Data Formats

Remotely-sensed raster data from satellites and other aerial platforms is stored in a wide variety of formats like:

These file formats are specialized to raster data and are discussed in much greater practical detail in classes or tutorials on remote sensing.

Figure
MODIS NDVI For the USA

Data Management Tasks

ArcGIS Pro has dozens of tools for performing helpful data management tasks, which you can review in the ESRI documentation page "An overview of the Data Management toolbox."

This section covers a handful of common data management tools and tasks.

Figure
Overview of the data management toolbox

Exporting Data

There may be occasions where you need to export data to a file either to archive the data or to share it with a collaborator or the community. As described above, while shapefiles have serious limitations, they are a safe choice when sharing geospatial data.

  1. Under Analysis, Tools search for the Feature Class to Shapefile tool.
  2. The Input Features is the feature class you want to export.
  3. Under Output Folder create a new folder for the shapefile files.
  4. Run the tool.
  5. In Windows Explorer, right-click on the folder containing the shapefile files, and select Send to, Compressed (zipped) folder.
  6. You can then use the .zip file to share the shapefile.
Exporting a shapefile
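The zipping step can also be scripted, which helps when you export many shapefiles. This sketch uses Python's shutil.make_archive to compress the contents of a folder into a .zip file placed next to it. The function name and the placeholder files are hypothetical, and unlike Windows' Send to command, this sketch places the files at the top level of the archive rather than inside a folder.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path


def zip_folder(folder) -> str:
    """Compress every file in `folder` into <folder>.zip beside it."""
    folder_path = Path(folder)
    # make_archive appends ".zip" to base_name and returns the archive path.
    return shutil.make_archive(str(folder_path), "zip", root_dir=str(folder_path))


# Demonstration with empty placeholder files in a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    shp_folder = Path(tmp) / "roads"
    shp_folder.mkdir()
    for ext in (".shp", ".shx", ".dbf", ".prj"):
        (shp_folder / f"roads{ext}").write_bytes(b"")
    archive = zip_folder(shp_folder)
    with zipfile.ZipFile(archive) as z:
        members = sorted(Path(n).name for n in z.namelist() if not n.endswith("/"))

print(Path(archive).name)  # roads.zip
print(members)             # ['roads.dbf', 'roads.prj', 'roads.shp', 'roads.shx']
```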

Merging Data

You may need to merge (combine) two or more feature classes into a single feature class when updating or augmenting a dataset.

  1. If you are bringing in data from a shapefile, unzip the shapefile into a folder that can be read by ArcGIS Pro.
  2. Under Analysis, Tools find the Merge tool.
  3. For the Input Datasets select the feature class(es) and shapefile(s) that you want to merge (Illinois_Tourism feature class and Missouri_Tourism shapefile).
  4. Give the Output Dataset a meaningful name in the project geodatabase (Illinois_Missouri).
  5. If needed, adjust the Field Map to make sure all fields are combined to appropriate output fields.
  6. Run the tool.
  7. Symbolize the merged dataset if needed.
Merging data from a shapefile
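The Field Map step matters when the inputs use different names for the same information. The sketch below imitates that idea for attribute tables only (no geometry): each source comes with a hypothetical mapping from output field names to its own input field names, so differently named fields land in the same output column. The field names and values are made up, and this is a conceptual illustration rather than how the Merge tool is implemented.

```python
def merge(sources):
    """Merge records from several sources into one list using field maps.

    `sources` is a list of (records, field_map) pairs, where field_map maps
    each output field name to the matching input field name in that source.
    """
    merged = []
    for records, field_map in sources:
        for record in records:
            # Input fields missing from a record become None instead of an error.
            merged.append({out: record.get(src) for out, src in field_map.items()})
    return merged


# Hypothetical attribute tables with differently named fields.
illinois = [{"SITE_NAME": "Site A", "CITY": "Springfield"}]
missouri = [{"Name": "Site B", "Town": "Hannibal"}]

illinois_missouri = merge([
    (illinois, {"NAME": "SITE_NAME", "CITY": "CITY"}),
    (missouri, {"NAME": "Name", "CITY": "Town"}),
])
print(illinois_missouri)
# [{'NAME': 'Site A', 'CITY': 'Springfield'}, {'NAME': 'Site B', 'CITY': 'Hannibal'}]
```

Without the mapping, "CITY" and "Town" would end up as two separate, half-empty columns.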

Repairing Broken Source Paths

You may occasionally encounter a broken source path, where the data for a layer is no longer at the file path stored in the project file. A broken layer is indicated by a red exclamation mark beside it in the Contents pane.

Situations like this can occur when:

To fix a broken source path:

  1. Recover the data needed for the layer and make sure it is in your local file system.
  2. Use the Windows File Explorer to find the folder name where the data is located.
  3. Right-click on the broken layer and select Properties and Source.
  4. Click Set Data Source to change the source path to the correct location.
  5. When you click OK, the data should appear on your map.
Repairing a broken source path
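At its core, a broken source path is a stored path that no longer points to an existing file. This sketch checks a hypothetical dictionary of layer names and source paths, standing in for the paths a project file stores, and reports the layers whose data cannot be found. It does not read .aprx files; it only illustrates the check.

```python
import os
import tempfile


def broken_sources(layers: dict) -> list:
    """Return the names of layers whose data source path does not exist."""
    return [name for name, path in layers.items() if not os.path.exists(path)]


# Demonstration: one layer's data exists, the other has been moved or deleted.
with tempfile.TemporaryDirectory() as folder:
    roads_path = os.path.join(folder, "roads.shp")
    open(roads_path, "w").close()  # create an empty placeholder file
    layers = {
        "Roads": roads_path,
        "Parcels": os.path.join(folder, "parcels.shp"),  # never created
    }
    result = broken_sources(layers)

print(result)  # ['Parcels']
```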

Set Data Source

When possible, it is a good practice to bring your data into the project geodatabase rather than relying on external shapefiles or file geodatabases. Shapefiles can be slow to work with and it is easy to misplace external data files, rendering your project unusable.

For this example, the layer on the map was originally added from a shapefile with Add Data. To copy the data into the geodatabase and change the source without having to recreate the symbology:

  1. From Analysis, Tools run the Export Features tool to copy the shapefile data into a new feature class in the project geodatabase.
  2. Remove the new layer automatically added by Export Features since you will be setting the data source on your old layer.
  3. In the Contents pane, open the layer's Properties, view the Source, and use Set Data Source to change the source from the shapefile to the new feature class.
Set data source from a shapefile to a feature class in the project geodatabase

Appendix

Digital Data

In contemporary geographic information systems, geospatial data is stored as digital data.

As the name implies, digital data consists of digits or numbers. Internally, digital electronic technology represents data as binary signals (bits) that are either on or off. This binary representation allows a high level of flexibility and accuracy in the representation and processing of data.

For historical reasons, bits are clumped into groups of eight that are called bytes. If you run through all the possible combinations of eight bits, you will find that a byte can have 256 different values (numbers 0 - 255). This is enough for each byte to represent a single English letter, digit, or punctuation mark, so a five-character word like Hello requires five bytes to store (languages with larger character sets need multiple bytes per character).
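Python can show this byte arithmetic directly. The sketch below counts the values a byte can hold, the bytes needed to store Hello, and the bit pattern of a single letter. It also shows that in the widely used UTF-8 encoding, characters outside basic English text take more than one byte.

```python
# Eight bits give 2**8 distinct patterns, so one byte holds the values 0-255.
values_per_byte = 2 ** 8
print(values_per_byte)                  # 256

# ASCII text uses one byte per character.
hello_bytes = "Hello".encode("ascii")
print(len(hello_bytes))                 # 5

# The letter "H" is stored as the number 72, i.e. these eight bits.
print(ord("H"), format(ord("H"), "08b"))  # 72 01001000

# UTF-8 uses two to four bytes for characters outside ASCII.
print(len("é".encode("utf-8")))         # 2
print(len("日本".encode("utf-8")))       # 6 (three bytes per character)
```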

To improve speed, modern computers process multiple bytes at one time as words. Although older computers and many embedded devices use 32-bit words (four bytes), most contemporary laptops, desktops, and smartphones use 64-bit words (eight bytes).

Figure
Computer memory words

The amount of storage in a computer or storage device is usually measured by the number of bytes that it can store. Because storage devices can store trillions or quadrillions of bytes, Greek prefixes (kilo, mega, giga, tera, peta) are used to make referring to numbers of bytes easier. However, because this is digital data, these prefixes are often based on powers of two rather than powers of ten (one kilobyte is often counted as 1,024 bytes rather than 1,000), which makes the decimal numbers look a bit sloppy.
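The mismatch between powers of ten and powers of two explains why a drive sold as 1 TB (10^12 bytes) is reported by Windows File Explorer as roughly 931 GB. This sketch converts a byte count using powers of two; it labels the units with the explicit binary names (KiB, MiB, GiB, and so on) to make the distinction visible. The function name is illustrative.

```python
def describe_size(num_bytes: int) -> str:
    """Express a byte count using powers of two (1 KiB = 1,024 bytes)."""
    units = ["bytes", "KiB", "MiB", "GiB", "TiB", "PiB"]
    size = float(num_bytes)
    for unit in units:
        # Stop at the first unit where the value drops below 1,024.
        if size < 1024 or unit == units[-1]:
            if unit == "bytes":
                return f"{num_bytes:,} bytes"
            return f"{size:,.1f} {unit}"
        size /= 1024


# A drive sold as "1 TB" holds 10**12 bytes (powers of ten).
print(describe_size(10 ** 12))  # 931.3 GiB
print(describe_size(2 ** 10))   # 1.0 KiB
print(describe_size(500))       # 500 bytes
```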

Physical Storage Media

Digital data is stored on a variety of physical media, depending on how quickly the data needs to be accessed, how much data needs to be stored, and whether the data needs to continue to exist when the digital device is turned off or rebooted.

Considerations When Choosing Storage Formats and Platforms

A number of factors need to be considered when choosing the appropriate storage hardware and formats for a project. Those needs are driven by the organizational size and mission: What are you ultimately trying to accomplish with the data?

  1. Number of readers:
    • How many people need to access the data?
    • How quickly do they need access to the data?
  2. Number of editors:
    • How many people capture, process, and maintain the data?
    • Will multiple people be working on the data at the same time?
  3. Frequency of change:
    • How often is the data changed?
    • How quickly do changes need to be available to users?
  4. Volume and types of data:
    • How much data exists?
    • How much data will exist?
    • How many different types of data need to be kept together?
    • How will needs grow or shrink over time?
  5. Access security:
    • Who needs access to the data?
    • Who should be kept out of the data?
    • Do federal or state regulations require restricting access to the data?
    • How do the costs of a security breach balance against the costs of security?
  6. Availability security:
    • What would happen if this data were lost or destroyed?
    • Who will perform backups?
    • Does this data need to survive this project?
  7. Cost:
    • Will this be compatible with existing processes?
    • What are the set-up and maintenance costs for storage?
    • What can we afford in terms of both capital investment and manpower?
    • Do managers or co-workers have a preconceived bias against a technology?