Geospatial Data Storage in ArcGIS Pro

Rev. 14 January 2025

This tutorial will give a basic overview of how geospatial data is stored and organized in ArcGIS Pro.

Storage Architecture in ArcGIS Pro

The key words in geographic information systems are geographic information.

Data is "factual information (such as measurements or statistics) used as a basis for reasoning, discussion, or calculation" (Merriam-Webster 2020).

Geospatial data is "data about objects, events, or phenomena that have a location on the surface of the earth" (Stock and Guesgen 2016). Geospatial data is "what is where" (Gritzner 2002).

A database is "a usually large collection of data organized especially for rapid search and retrieval (as by a computer)" (Merriam-Webster 2020).

A geodatabase is a special type of database containing "a collection of geographic datasets of various types" (ESRI 2020).

A feature class is a geographic dataset within a geodatabase that contains features of the same geometric type (points, lines, or polygons) and a common set of attributes (ESRI 2024).

Each individual geographic entity in a feature class is called a feature. For example, in a feature class for roads in a county, each road segment would be a feature (ESRI 2020).

Figure
The relationship between geodatabases, feature classes, features, maps, and layouts

File Systems

Computer file systems are collections of files (often on a single storage device) that are organized in a hierarchical structure of folders, which are also sometimes referred to as directories.

Hierarchies

Hierarchical file systems organize data files in categories and subcategories. This categorization not only makes it easier to understand how to access specific files or groups of files, but also facilitates portability, sharing, and access control.

While in the era of the cloud, end users often access conceptually flat collections of data using search engines (on the web) and search functions (in apps), the administrators and developers of online systems still need to organize data hierarchically to make management easier. Accordingly, as a GIS professional, you will need to be able to both navigate existing hierarchical file systems and develop hierarchical categorizations to organize your data.

The following is an example of a hierarchy:

Figure
Example hierarchy of food sources

Hierarchies are commonly written in text using indentation and different bullet point symbols to show the relationships visually.
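As a minimal sketch of that convention, the following Python script prints a nested hierarchy as indented text with a different bullet symbol at each level. The category names are hypothetical and are not taken from the figure.

```python
# Hypothetical hierarchy (names are illustrative, not taken from the figure).
hierarchy = {
    "Food": {
        "Plants": {"Fruits": {}, "Vegetables": {}},
        "Animals": {"Meat": {}, "Dairy": {}},
    }
}

BULLETS = ["•", "◦", "▪"]  # a different symbol for each level


def outline(tree, depth=0):
    """Return the hierarchy as a list of indented, bulleted text lines."""
    lines = []
    for name, children in tree.items():
        bullet = BULLETS[depth % len(BULLETS)]
        lines.append("  " * depth + bullet + " " + name)
        lines.extend(outline(children, depth + 1))
    return lines


print("\n".join(outline(hierarchy)))
```

Each level of nesting adds two spaces of indentation, so subcategories appear beneath and to the right of their parent category.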

File Paths

A file path points to a specific file within the hierarchical folder structure of a file system (W3Schools 2022). A path consists of three components:

This example shows the hierarchy in Windows Explorer to the file at:

U:\Documents\ArcGIS\Projects\Street Work\Street Work.aprx
Figure
Example folder hierarchy and file path
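The example path can also be taken apart in code. This is a sketch using Python's standard pathlib module. The breakdown into drive, folders, and file name is one plausible way to view the parts of a path and is an assumption rather than the tutorial's own list.

```python
from pathlib import PureWindowsPath

# The example path from this section.
path = PureWindowsPath(r"U:\Documents\ArcGIS\Projects\Street Work\Street Work.aprx")

print(path.drive)   # file system (drive letter)
print(path.parent)  # chain of folders leading to the file
print(path.name)    # file name
print(path.suffix)  # file name extension

# Every level of the hierarchy, from the drive down to the file.
for part in path.parts:
    print(part)
```

PureWindowsPath only manipulates the text of the path, so this runs on any operating system and does not require the file to exist.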

File System Names

Computers can have multiple file systems. The specific file system is the leftmost term in a file path. File systems can be specified in multiple ways on Windows machines (Microsoft 2022):

Windows File Systems

The primary (local) file system on a Windows computer is usually the C: drive. A: and B: were drives for removable floppy disks, which are now obsolete (Wikipedia 2022).

The primary file system is used for operating system and installed program files.

Figure
A typical C: drive

Home Directories

A home directory is "a file system directory on a multi-user operating system containing files for a given user of the system" (Wikipedia 2022).

Home directories on Windows systems have a standard set of subfolders for user files.

The full paths to these locations depend on the configuration of the system.

You can view the location of a folder by right-clicking on the folder in Windows Explorer and opening its Properties. In this example, the Documents user subfolder in a home directory on a network drive is located at \\192.168.100.3\DeptUsers\minn2\Documents.

Figure
A user folder on a network file system

Network file systems are sometimes also mounted to drive letters. For example, the Documents folder in this network file system is also located under the U: drive letter.

Figure
A network folder also visible under a drive letter
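The two ways of naming the same network folder can be compared with a short sketch using Python's pathlib. The mapped path U:\Documents is an assumption based on the drive letter described above.

```python
from pathlib import PureWindowsPath

# Network (UNC) form of the folder from the example above.
unc = PureWindowsPath(r"\\192.168.100.3\DeptUsers\minn2\Documents")
# Assumed drive-letter form of the same folder.
mapped = PureWindowsPath(r"U:\Documents")

print(unc.drive)     # server and share name act as the file system
print(mapped.drive)  # drive letter acts as the file system
print(unc.name == mapped.name)  # both end at a folder named Documents
```

In the network form, the server address and share name together play the role that the drive letter plays in the mapped form.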

File Types

A file's type determines what kinds of software can operate on that file. A file name extension is a suffix on a file name consisting of a period and (usually) three or four letters that indicates the file type.

Some common file extensions (with the software typically used to view or edit the files) include:

Projects

An ArcGIS Pro project brings together data from different sources to perform mapping and analysis.

Figure
Projects in ArcGIS Pro

Project Folders

Files associated with a project are kept in a project folder, which is usually a subdirectory under <home>\Documents\ArcGIS\Projects.

The project folder contains:

Examining a project folder

Project File (.aprx)

A project file is a file with an .aprx suffix that contains design and source information for data, maps, and layouts that are part of the project.

Project Geodatabase

The project geodatabase is the default geodatabase used for storing geospatial data that is imported or created as part of your project.

Viewing the contents of the project file geodatabase in the Catalog Pane

Project Packages

A project package is a single-file bundle of all of the files, feature classes, and map design information used in a project. Project package file names have the suffix .ppkx.

Figure
Projects in ArcGIS Pro

To create a project package file:

  1. On the Share tab, in the Package area, select Project.
  2. Give your project a meaningful name. The same name as the project is usually a good choice.
  3. Unless you are working with a group, you can usually just leave the Summary and Tags boxes blank.
  4. Share Outside Organization should be checked if your project uses files (like shapefiles or feature services) that are not included in your project geodatabase.
  5. Include History should generally not be checked. Project packages require your history to be free of errors. If you run a tool and it fails, your history will contain that error and saving the project package may fail with a cryptic error message.
  6. Include Toolboxes should be checked if you are using ModelBuilder or Python notebooks.
  7. Click Analyze and fix any identified problems. Unfortunately, Analyze is of limited value and will often miss major problems that will cause your packaging to fail later.
  8. Click Package to create the package and upload. This can take a few minutes if your project contains large data files like rasters.
  9. To get a link to your project package, view your Contents page in ArcGIS Online, click on the name to open the information page for the project package, and copy the URL from the location bar.
Saving a project package

Reopening Project Packages

To reopen a project from a project package, go to Project, Open, Portal, My Content, and select the package to open.

Reopening a project from a project package

Project Versions

Once you have reopened a project from a project package and modified it, you should always continue working from the reopened project.

Tabular Data

Geospatial data is often stored and distributed in tables.

Figure
A CSV file with latitudes and longitudes opened for editing in Excel
Figure
A CSV file opened in Notepad showing rows and comma column delimiters

CSV with Latitudes and Longitudes

CSV files can store geospatial data as rows in which latitude and longitude columns give a location and the remaining columns give the attributes at that location.

The following video demonstrates how to import a CSV file with latitudes and longitudes to create a feature class. This example uses data on COVID-19 cases and fatalities in US states gathered by Johns Hopkins University of Medicine and made available through GitHub.

  1. Acquire: Create or download the CSV file with location coordinates in columns named "Latitude" and "Longitude".
  2. Start: Create a new map in ArcGIS Pro and give it a meaningful name.
  3. Add: Select Add Data -> XY Point Data to run the XY Table to Point tool.
  4. Symbolize: Adjust the symbology of the layer as needed.
  5. Present: Create a print layout with a legend.
    • Export to create a figure or printable map.
Creating a Feature Class From a CSV File With Latitudes and Longitudes
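Before importing, it can help to confirm that every row has usable coordinates. The following is a minimal sketch using Python's csv module. It assumes the "Latitude" and "Longitude" column names from step 1, and the "Name" and "Cases" columns and all sample values are hypothetical.

```python
import csv
import io

# Hypothetical sample data; a real file would be opened with open(path, newline="").
sample = io.StringIO(
    "Name,Latitude,Longitude,Cases\n"
    "Site A,40.11,-88.24,12\n"
    "Site B,95.00,-88.24,7\n"  # latitude out of range
)


def read_points(handle):
    """Split rows into usable points and rows with missing or invalid coordinates."""
    good, bad = [], []
    for row in csv.DictReader(handle):
        try:
            lat, lon = float(row["Latitude"]), float(row["Longitude"])
        except (KeyError, TypeError, ValueError):
            bad.append(row)
            continue
        if -90 <= lat <= 90 and -180 <= lon <= 180:
            good.append((row["Name"], lat, lon))
        else:
            bad.append(row)
    return good, bad


good, bad = read_points(sample)
print(len(good), "usable rows;", len(bad), "rejected")
```

Rows that fail this check are worth fixing in the spreadsheet first, since points with swapped or out-of-range coordinates will land in the wrong place on the map.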

CSV with ISO Country Codes

The International Organization for Standardization (ISO), an international non-governmental organization, defines a set of three-letter country codes (alpha-3) that are commonly used to uniquely identify countries in data.

Indicator data from the World Bank includes ISO country codes, and they can be used to join downloaded table data with country polygons for choropleth mapping. ISO codes are preferred to country names to avoid mismatches due to differences in spelling, capitalization, and name formality.

For this example we will use the Annual freshwater withdrawals, total (% of internal resources) indicator from the World Bank. Country polygons are from Natural Earth via the Minn 2023 World Polygons feature service in the U of I ArcGIS Online organization.

  1. Acquire: Download the indicator data into Excel.
    • Remove the unneeded columns.
    • Rename the year data column to something meaningful (Freshwater_Percent_2021).
    • Export to a CSV file (freshwater.csv).
  2. Store: Run Export Table to bring the table into the project geodatabase (Freshwater_Table).
  3. Store: Under Analysis, Tools run the Export Features tool to import the world polygons.
  4. Process: Under Analysis, Tools run the Join Field tool to join the data field to the country polygons.
  5. Communicate: Symbolize the resulting layer by the new variable.
Creating a feature class by joining using ISO country codes
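The reason codes are preferred to names can be illustrated with a small sketch of a key-based join in plain Python. This is not how the Join Field tool is implemented. The field names (name, iso_a3) and all values are hypothetical and are not World Bank or Natural Earth data.

```python
def join_by_key(polygons, key_field, table):
    """Return (matched, unmatched) polygon records for a one-to-one key join."""
    matched, unmatched = [], []
    for polygon in polygons:
        key = polygon[key_field]
        if key in table:
            matched.append({**polygon, "value": table[key]})
        else:
            unmatched.append(polygon)
    return matched, unmatched


# Hypothetical polygon attributes.
polygons = [
    {"name": "United States of America", "iso_a3": "USA"},
    {"name": "Côte d'Ivoire", "iso_a3": "CIV"},
]

# The same hypothetical table keyed two different ways.
table_by_code = {"USA": 10.0, "CIV": 2.5}
table_by_name = {"United States": 10.0, "Cote d'Ivoire": 2.5}

code_matched, code_unmatched = join_by_key(polygons, "iso_a3", table_by_code)
name_matched, name_unmatched = join_by_key(polygons, "name", table_by_name)

print(len(code_matched), "matched by code")
print(len(name_matched), "matched by name")
```

With these sample values, both countries match on the code, while neither matches on the name because of differences in formality and accents.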

CSV with US GEOIDFQ Codes

Data from the US Census Bureau (USCB) can be downloaded from data.census.gov in table format for a variety of geographic area types.

The USCB tables include fully-qualified GEOIDs (GEOIDFQ) that combine FIPS codes (see below) with prefixes that indicate what type of area (state, county, tract, etc.) is represented on each row.

For this example we create a feature class from the 2019-2023 ACS DP02 table of the estimated number of people in each county who lived in a different state one year prior (DP02_0086E). We also include the population field (DP02_0079E) so we can normalize the number of people to a percent.

  1. Acquire: Download and unzip the county-level DP02 table from data.census.gov.
  2. Process: Clean up the file in Excel.
  3. Acquire: Download the TIGER Cartographic Boundary File with county polygons and unzip the file.
  4. Store: Run Export Table to bring the table into the project geodatabase (Different_State_Table).
  5. Store: Under Analysis, Tools run the Export Features tool to export the shapefile polygons into a feature class in the project geodatabase (Different_State).
  6. Process: Under Analysis, Tools run the Join Field tool to join the data field to the county polygons.
  7. Communicate: Symbolize the resulting layer by the different state count normalized by population.
Creating a feature class by joining using fully-qualified GEOIDs
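The structure of a fully-qualified GEOID can be seen by splitting it at the letters "US". This is a small sketch in plain Python, and the sample values are illustrative: the prefix 0500000US marks a county row and 0400000US marks a state row.

```python
def split_geoidfq(geoidfq):
    """Split a fully-qualified GEOID into its area-type prefix and FIPS-based GEOID."""
    prefix, separator, geoid = geoidfq.partition("US")
    return prefix + separator, geoid


# Illustrative values: a county row and a state row.
for value in ("0500000US17019", "0400000US17"):
    prefix, geoid = split_geoidfq(value)
    print(value, "->", prefix, geoid)
```

The portion after "US" is the plain GEOID built from FIPS codes, here a two-digit state code followed by a three-digit county code for the county row.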

CSV with FIPS Codes

US data from organizations other than the US Census Bureau often provides locations as FIPS codes that can be used to construct fully-qualified GEOIDs for joining with US Census Bureau polygon data. GEOIDs and FIPS codes are described in more detail in this tutorial.

For this example, we use county-level data from the 2020 US Religion Census, which is produced by the Association of Statisticians of American Religious Bodies.

  1. Acquire: Download and open the county-level summary data.
  2. Store: Import the CSV file into the project geodatabase.
  3. Acquire: Download the TIGER Cartographic Boundary File with county polygons and unzip the file.
  4. Store: Import the county polygons into the project geodatabase.
  5. Process: Under Analysis, Tools run the Join Field tool to join the data field to the county polygons.
  6. Communicate: Symbolize the resulting layer by the desired variable.
Creating a feature class by joining using FIPS codes
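Constructing a county GEOIDFQ from FIPS codes can be sketched in plain Python as follows. The helper names are hypothetical, and whether a given dataset stores state and county codes separately or as one combined code is an assumption you would need to check against the actual file. The zero-padding matters because spreadsheet programs often drop leading zeros from numeric codes.

```python
COUNTY_PREFIX = "0500000US"  # prefix for county-level rows


def county_geoidfq(state_fips, county_fips):
    """Build a county GEOIDFQ from separate state and county FIPS codes."""
    state = str(state_fips).strip().zfill(2)    # state codes are two digits
    county = str(county_fips).strip().zfill(3)  # county codes are three digits
    return COUNTY_PREFIX + state + county


def county_geoidfq_from_combined(fips):
    """Build a county GEOIDFQ from a combined state and county FIPS code."""
    return COUNTY_PREFIX + str(fips).strip().zfill(5)


print(county_geoidfq(6, 37))
print(county_geoidfq_from_combined(1001))
```

Both helpers accept numbers or text, so a code that was read as the number 1001 is restored to the five-digit text 01001 before the prefix is added.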

Geocoding to Points

Locations on the surface of the earth are often referenced using place names rather than numeric coordinates - most notably in street addresses.

Geocoding is the process of converting place names to latitude/longitude coordinates.

In this example, we use a CSV file of in-person FedEx shipping or printing locations from the company's website. Although they provide a map, the only location information directly available is the address, which was entered into a spreadsheet with the columns: Name, Address, City, and State. Geocoding of addresses is a common task needed when performing business analysis in GIS.

  1. Acquire: Download the CSV file or create it in a spreadsheet program like Excel and save it as a CSV file.
  2. Start: Create a new map in ArcGIS Pro.
  3. Add: In the new map, use Add Data to add the table.
    • Right-click on the table and select Geocode Table.
    • Follow the instructions to choose the options for geocoding.
  4. Symbolize: Adjust the symbology for the new layer.
    • Change the map projection to WGS 1984 Web Mercator, which is a good generic projection used with web maps.
    • Add labels, if desired.
    • Adjust the base map in case the map is cluttered.
  5. Present: Create a print layout.
    • Add a legend (if needed).
    • Export to print or insert as a figure.
Creating a feature class from a CSV file with addresses

Geocoding to Areas

If you have data for areas, but only have place names, you can geocode to points and then spatial join the points to area polygons.

Geocoding to areas is unreliable and error prone, so this is a technique of last resort and you should use a standardized code whenever one is available.

The Food and Agriculture Organization of the United Nations (FAO) collects a vast array of open country-level agricultural data and makes it available to the public through their FAOSTAT web portal.

This example uses FAO data for almond production by country. While almonds are delicious and nutritious (THC School of Public Health 2023), growing almonds is very water-intensive (Fulton, Norton, and Shilling 2019), so knowing where almonds are grown can point us to areas where agriculture may have a detrimental water footprint.

The only geospatial components the FAO tables contain are country names and FAO country codes, so one approach to creating choropleths is to geocode by country name to create points that can be joined to polygons for choropleth mapping.

  1. Acquire: Download the table and open it in a spreadsheet program like Excel.
  2. Process: Clean up the table.
  3. Process: Under Analysis, Tools run the Geocode Addresses tool.
  4. Store: Under Analysis, Tools run the Export Features tool to import the world polygons.
  5. Process: Under Analysis, Tools run the Spatial Join tool to join the table data to the country polygons.
  6. Communicate: Symbolize the resulting layer by the new variable.
Creating a choropleth from a geocoded table

Geospatial File Formats

There are a variety of common data file formats specifically designed for geospatial data that can be read into ArcGIS Pro for mapping and/or publication.

Shapefiles

The shapefile is a geospatial data file format developed by ESRI with a standard published in 1998.

While the age of this format is reflected in its numerous limitations (such as a column name length limit of 10 characters), this format is still commonly used for distributing geospatial data because it is reliable and well supported by a wide variety of GIS software.

The term shapefile is a misnomer since a shapefile is actually a collection of at least three (and usually more) separate files that store the locational data, the characteristics associated with those locations, and other information about the data.

Some common files associated with a shapefile include (listed by the file extension):

Figure
The four files of a shapefile viewed in the Windows file explorer (.dbf, .prj, .shp, and .shx)

To help keep all these files together, they are usually compressed into a single .zip archive file for distribution on websites and servers.

  1. Acquire: Download the shapefile .zip archive from the website.
  2. Process: In Windows Explorer, extract the contents of the .zip archive file.
  3. Store: Under Analysis, Tools find the Export Features tool to export the shapefile data into a feature class in the project geodatabase.
  4. Communicate: Symbolize as needed.
Importing a shapefile as a new feature class in the project geodatabase
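Because a shapefile is several files, an import can fail if one of them was lost when the archive was extracted. The following sketch checks a list of file names for the three mandatory components (.shp, .shx, and .dbf). The function name and the sample file names are hypothetical.

```python
from pathlib import PurePath

# The three mandatory components of a shapefile.
REQUIRED = {".shp", ".shx", ".dbf"}


def missing_parts(file_names):
    """Return the required shapefile extensions that are absent from file_names."""
    found = {PurePath(name).suffix.lower() for name in file_names}
    return sorted(REQUIRED - found)


# Hypothetical folder listings.
print(missing_parts(["roads.shp", "roads.dbf", "roads.prj"]))
print(missing_parts(["roads.shp", "roads.shx", "roads.dbf", "roads.prj"]))
```

An empty result means all three mandatory files are present. Optional files such as .prj are not checked here, although a missing .prj means the coordinate system is undefined.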

GeoJSON

GeoJSON is a geospatial data format based on JavaScript Object Notation (JSON) that is commonly used for data displayed in web maps.

GeoJSON import
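To give a sense of what a GeoJSON file contains, here is a minimal sketch that builds and reads back a one-point feature collection with Python's json module. The coordinates and the name property are hypothetical.

```python
import json

# A minimal feature collection with one hypothetical point feature.
feature_collection = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            # GeoJSON positions are ordered [longitude, latitude].
            "geometry": {"type": "Point", "coordinates": [-88.24, 40.11]},
            "properties": {"name": "Example point"},
        }
    ],
}

text = json.dumps(feature_collection, indent=2)  # what would be saved to a file
loaded = json.loads(text)                        # what a reader would parse

lon, lat = loaded["features"][0]["geometry"]["coordinates"]
print(lat, lon)
```

Note that the coordinate order is longitude first, which is the reverse of the latitude, longitude order commonly used in conversation and in many spreadsheets.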

GPX

The GPS Exchange Format (GPX) is an XML-based format that is commonly exported by GPS tracker apps on smartphones to store GPS points.

GPX files can be imported into ArcGIS Pro using the GPX To Features tool, which can import the waypoints as individual point features, or as a path line.

GPX import
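The waypoints in a GPX file can also be inspected outside ArcGIS Pro. This is a sketch using Python's built-in XML parser on a small hypothetical GPX 1.1 document. It is not a substitute for the GPX To Features tool.

```python
import xml.etree.ElementTree as ET

# Namespace used by GPX version 1.1 documents.
GPX_NS = {"gpx": "http://www.topografix.com/GPX/1/1"}

# Hypothetical GPX content with two waypoints.
sample = """<gpx version="1.1" creator="example" xmlns="http://www.topografix.com/GPX/1/1">
  <wpt lat="40.11" lon="-88.24"><name>Start</name></wpt>
  <wpt lat="40.12" lon="-88.22"><name>End</name></wpt>
</gpx>"""

root = ET.fromstring(sample)

# Each waypoint stores its coordinates as lat and lon attributes.
waypoints = [
    (
        wpt.findtext("gpx:name", namespaces=GPX_NS),
        float(wpt.get("lat")),
        float(wpt.get("lon")),
    )
    for wpt in root.findall("gpx:wpt", GPX_NS)
]

for name, lat, lon in waypoints:
    print(name, lat, lon)
```

Recorded tracks use a different set of elements than waypoints, so a file that contains only a track would return an empty list here.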

ESRI File Geodatabases

In 2006, ESRI introduced a geospatial data file format called the file geodatabase.

Figure
Individual files in a project file geodatabase displayed in Windows Explorer

Keyhole Markup Language (KML)

Google Earth/Maps exchanges geospatial data in the Keyhole Markup Language (KML) format that is based on Extensible Markup Language (XML).

KML is designed for the web and contains information on how the geospatial data should be displayed on a web map like Google Maps, or in Google Earth. While you can use KML files for general exchange of spatial data, it is usually only optimal when exchanging data to or from Google apps.

KML import

AutoCAD

If you work with GIS in the construction industry, you will likely see geospatial data stored in the files used by the engineering drafting program AutoCAD. However, AutoCAD is a general use drafting program for objects of all sizes and the proprietary file format often does not contain adequate coordinate or attribute information to allow data to be transferred directly into GIS software.

Figure
AutoCAD

Raster Data Formats

Remotely-sensed raster data from satellites and other aerial platforms is stored in a wide variety of formats like:

These file formats are specialized to raster data and are discussed in much greater practical detail in classes or tutorials on remote sensing.

Figure
MODIS NDVI For the USA

Data Management Tasks

ArcGIS Pro has dozens of tools for performing helpful data management tasks that you can review in An overview of the Data Management toolbox.

This section covers a handful of common data management tools and tasks.

Figure
Overview of the data management toolbox

Exporting Data

There may be occasions where you need to export data to a file either to archive the data or to share it with a collaborator or the community. As described above, while shapefiles have serious limitations, they are a safe choice when sharing geospatial data.

  1. Under Analysis, Tools search for the Feature Class to Shapefile tool.
  2. The Input Features is the feature class you want to export.
  3. Under Output Folder create a new folder for the shapefile files.
  4. Run the tool.
  5. In Windows Explorer, right click on the folder containing the shapefile files, and select Send to, Compressed (zipped) folder.
  6. You can then use the .zip file to share the shapefile.
Exporting a shapefile
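The compression in step 5 can also be scripted. The following sketch uses Python's shutil module and demonstrates it with empty placeholder files in a temporary folder. The folder and file names are hypothetical.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path


def zip_folder(folder):
    """Create <folder>.zip next to the folder and return the archive path."""
    folder = Path(folder)
    archive = shutil.make_archive(str(folder), "zip", root_dir=folder)
    return Path(archive)


# Demonstration with empty placeholder files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    export = Path(tmp) / "export"
    export.mkdir()
    for ext in (".shp", ".shx", ".dbf", ".prj"):
        (export / f"roads{ext}").touch()

    archive = zip_folder(export)
    with zipfile.ZipFile(archive) as zf:
        names = sorted(zf.namelist())

print(names)
```

This version places the shapefile files at the top level of the archive, whereas compressing the folder from Windows Explorer also includes the folder itself. Either layout keeps the files together for sharing.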

Merging Data

You may have a need to merge (combine) two or more feature classes into a single feature class when updating or augmenting a data set.

  1. If you are bringing in data from a shapefile, unzip the shapefile into a folder that can be read by ArcGIS Pro.
  2. Under Analysis, Tools find the Merge tool.
  3. For the Input Datasets select the feature class(es) and shapefile(s) that you want to merge (Illinois_Tourism feature class and Missouri_Tourism shapefile).
  4. Give the Output Dataset a meaningful name in the project geodatabase (Illinois_Missouri).
  5. If needed, adjust the Field Map to make sure all fields are combined to appropriate output fields.
  6. Run the tool.
  7. Symbolize the merged dataset if needed.
Merging data from a shapefile

Repairing Broken Source Paths

You may occasionally encounter a broken source path, where the data for a layer is no longer at the file path stored in the project file. This is indicated by a red exclamation mark beside the broken layer in the Contents pane.

Situations like this can occur when:

To fix a broken source path:

  1. Recover the data needed for the layer and make sure it is in your local file system.
  2. Use the Windows File Explorer to find the folder name where the data is located.
  3. Right-click on the broken layer and select Properties and Source.
  4. Click Set Data Source to change the source path to the correct location.
  5. When you click OK, the data should appear on your map.
Repairing a broken source path
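The idea behind a broken source path can be sketched as a check that each stored path still exists. This is a plain Python illustration with hypothetical layer names and paths, not how ArcGIS Pro performs the check. It only applies to file-based sources such as shapefiles, since feature classes inside a geodatabase are not individual files.

```python
import tempfile
from pathlib import Path


def broken_sources(sources):
    """Return the layers whose source path does not exist on this machine."""
    return {layer: path for layer, path in sources.items() if not Path(path).exists()}


# Demonstration with a temporary folder standing in for a data folder.
with tempfile.TemporaryDirectory() as tmp:
    present = Path(tmp) / "roads.shp"
    present.touch()
    sources = {
        "Roads": str(present),                                # file is where expected
        "Parcels": str(Path(tmp) / "moved" / "parcels.shp"),  # file is not there
    }
    broken = broken_sources(sources)

print(sorted(broken))
```

Any layer reported here would need its data recovered and its source path reset as described in the steps above.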

Set Data Source

When possible, it is a good practice to bring your data into the project geodatabase rather than relying on external shapefiles or file geodatabases. Shapefiles can be slow to work with and it is easy to misplace external data files, rendering your project unusable.

For this example, the layer on the map was originally added from a shapefile with Add Data. To copy the data into the geodatabase and change the source without having to recreate the symbology:

  1. From Analysis, Tools run the Export Features tool to copy the shapefile data into a new feature class in the project geodatabase.
  2. Remove the new layer automatically added by Export Features since you will be setting the data source on your old layer.
  3. In the Contents pane for the layer's Properties, view the Source and Set Data Source from the shapefile to the new feature class.
Set data source from a shapefile to a feature class in the project geodatabase

Appendix

Digital Data

In contemporary geographic information systems, geospatial data is stored as digital data.

As the name implies, digital data consists of digits or numbers. Internally, digital electronic technology represents data as binary signals (bits) that are either on or off. This binary representation allows a high level of flexibility and accuracy in the representation and processing of data.

For historical reasons, bits are clumped into groups of eight that are called bytes. If you run through all the possible combinations of eight bits, you will find that a byte can have 256 different values (numbers 0 - 255). This is enough for each byte to represent a single character in English and many other alphabetic languages, so a five-character word like Hello requires five bytes to store.
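The arithmetic above can be checked with a few lines of Python. The choice of the ASCII and UTF-8 encodings is an assumption for illustration. Characters outside the basic Latin alphabet can take more than one byte in common encodings such as UTF-8.

```python
# Eight bits, each on or off, give 2 to the 8th power combinations.
values_per_byte = 2 ** 8
print(values_per_byte)

# Each character of Hello is stored as one byte in the ASCII encoding.
encoded = "Hello".encode("ascii")
print(len(encoded))
print(list(encoded))  # the numeric value of each byte

# An accented letter needs two bytes in the UTF-8 encoding.
accented = "\u00e9".encode("utf-8")  # the letter e with an acute accent
print(len(accented))
```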

To improve speed, modern computers process multiple bytes at one time as words. Although older computers and some mobile devices use 32-bit words (four bytes), most contemporary laptops and desktops use 64-bit words (eight bytes).

Figure
Computer memory words

The amount of storage in a computer or storage device is usually measured by the number of bytes that it can store. Because storage devices can store trillions or quadrillions of bytes, Greek prefixes are used to make referring to numbers of bytes easier. However, because this is digital data, powers of two are used, making the decimal numbers look a bit sloppy.
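The powers of two behind the common prefixes can be computed directly. This sketch follows the powers-of-two convention described above. The selection of prefixes is an assumption, and storage vendors often use powers of ten instead.

```python
# Each successive prefix multiplies the previous size by 2**10 (1,024).
PREFIXES = ["kilo", "mega", "giga", "tera", "peta"]

sizes = {prefix: 2 ** (10 * power) for power, prefix in enumerate(PREFIXES, start=1)}

for prefix, size in sizes.items():
    print(f"{prefix}byte = {size:,} bytes")
```

A kilobyte under this convention is 1,024 bytes rather than an even 1,000, which is why the decimal values are not round numbers.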

Physical Storage Media

Digital data is stored on a variety of physical media, depending on how quickly the data needs to be accessed, how much data needs to be stored, and whether the data needs to continue to exist when the digital device is turned off or rebooted.

Considerations When Choosing Storage Formats and Platforms

A number of factors need to be considered when choosing the appropriate storage hardware and formats for a project. Those needs are driven by the organizational size and mission: What are you ultimately trying to accomplish with the data?

  1. Number of readers:
    • How many people need to access the data?
    • How quickly do they need access to the data?
  2. Number of editors:
    • How many people capture, process and maintain the data?
    • Will multiple people be working on the data at the same time?
  3. Frequency of change:
    • How often is the data changed?
    • How quickly do changes need to be available to users?
  4. Volume and types of data:
    • How much data exists?
    • How much data will exist?
    • How many different types of data need to be kept together?
    • How will needs grow or shrink over time?
  5. Access security:
    • Who needs access to the data?
    • Who should be kept out of the data?
    • Do federal or state regulations require restricting access to the data?
    • How do the costs of a security breach balance against the costs of security?
  6. Availability security:
    • What would happen if this data were lost or destroyed?
    • Who will perform backups?
    • Does this data need to survive this project?
  7. Cost:
    • Will this be compatible with existing processes?
    • What are the set-up and maintenance costs for storage?
    • What can we afford in terms of both capital investment and manpower?
    • Do managers or co-workers have a preconceived bias against a technology?