Geospatial Data Storage and Distribution in ArcGIS Pro
Geospatial Data and ArcGIS Pro
The key words in geographic information systems are geographic information.
Data is "factual information (such as measurements or statistics) used as a basis for reasoning, discussion, or calculation" (Merriam-Webster 2020).
Geospatial data is "data about objects, events, or phenomena that have a location on the surface of the earth" (Stock and Guesgen 2016). Geospatial data is "what is where" (Gritzner 2002).
A database is "a usually large collection of data organized especially for rapid search and retrieval (as by a computer)" (Merriam-Webster 2020).
A geodatabase is a special type of database containing "a collection of geographic datasets of various types" (ESRI 2020).
ESRI refers to individual geographic datasets within a geodatabase as feature classes. In non-ESRI terms, these can also be referred to as tables. Each individual element in a feature class is called a feature. For example, you could have a feature class for roads in a county, where each road would be a feature (ESRI 2020).
In ArcGIS Pro, a project brings together data from one or more geodatabases to perform analysis and mapping.
Local vs. Server Databases
A server is a computer on a network that is dedicated to managing network resources. Servers provide services to other computers on that network called clients that need those services (Techopedia 2020).
A cloud is a collection of servers in one or more data centers that offer a variety of services to clients via the internet. While a cloud looks like a single server providing different services, in reality those services can be provided by multiple servers located around the world to improve speed and provide backup when server hardware fails.
In the ESRI GIS world, the primary cloud environment is ArcGIS Online, which provides not just data storage services, but also supports a variety of web apps and other GIS services.
If you are working in the ArcGIS Online web app, everything you do is kept on ESRI's servers in a cloud, and all your local computer is doing is displaying the information from the server in a browser window.
However, when working with a desktop GIS program like ArcGIS Pro, you have a choice whether to keep your data in a local geodatabase on the desktop computer or keep the data on a server. ArcGIS Pro is tightly integrated with ArcGIS Online, making it easy to store your data there. But there are often times when it is more appropriate to keep your data in a local project geodatabase.
|Local Machine / Project||Server / ArcGIS Online|
|Working alone||Working with a group|
|Working only on one desktop machine||Need access on multiple machines (such as for work-from-home)|
|Temporarily exploring new data that you are not sure you will need||Data that you will be using in the future|
|Data requiring heavy access for analysis (such as remote sensing analysis)||Data requiring frequent access in short bursts|
|Data changes rarely||Data changes frequently and you need the most current version|
|Private data that you don't want anyone else to see||Data that will be shared within an organization and/or with the general public|
|Unimportant data that can easily be replaced if something bad happens to your machine||Mission-critical data that needs to be protected and backed-up by IT professionals|
Using Services in ArcGIS Pro
Public Feature Services
Data stored on a server can be made accessible to the public through a feature service endpoint, which is a URL to a service on a server. These are sometimes referred to as REST endpoints for the type of technology they use (representational state transfer).
In this example, the US Geological Survey makes its National Map data available through endpoints listed on its service endpoints page, and we will use the federal wetlands service.
- Copy the REST endpoint URL.
- In ArcGIS Pro, create a new map and on the Map -> Add Data dropdown, select Data From Path and paste in the endpoint URL.
- Because this data is very detailed, the layer is configured to only display in small geographic areas. So we zoom in on Urbana, IL.
- Create a print layout and add a legend.
- Export the map for printing or insertion as a figure.
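Outside of ArcGIS Pro, the same kind of endpoint can be queried directly over HTTP. The following is a minimal sketch, not the USGS's own method: it only builds a query URL using the standard `where`, `outFields`, `geometry`, and `f` parameters of the ArcGIS REST API. The endpoint URL is a hypothetical placeholder, the bounding box around Urbana, IL is approximate, and not every server supports the `f=geojson` output format.

```python
from urllib.parse import urlencode


def build_query_url(layer_url, where="1=1", bbox=None, out_fields="*"):
    """Build a query URL for a single layer of an ArcGIS REST service.

    layer_url is the endpoint of one layer (it usually ends in a layer number).
    bbox is an optional (min_lon, min_lat, max_lon, max_lat) tuple in WGS 84.
    """
    params = {
        "where": where,           # SQL-style attribute filter
        "outFields": out_fields,  # attribute columns to return
        "f": "geojson",           # response format (assumed to be supported)
    }
    if bbox is not None:
        params.update({
            "geometry": ",".join(str(v) for v in bbox),
            "geometryType": "esriGeometryEnvelope",
            "inSR": "4326",
            "spatialRel": "esriSpatialRelIntersects",
        })
    return layer_url.rstrip("/") + "/query?" + urlencode(params)


# Hypothetical endpoint: substitute a URL copied from a real services page.
EXAMPLE_LAYER = "https://example.com/arcgis/rest/services/Wetlands/MapServer/0"

# Approximate bounding box around Urbana, IL (illustrative values only).
url = build_query_url(EXAMPLE_LAYER, bbox=(-88.25, 40.08, -88.15, 40.15))
print(url)
```

Pasting the printed URL into a browser (with a real endpoint substituted) should return the features inside the bounding box, which is similar to what ArcGIS Pro requests as you zoom and pan.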
Open Data Portals
Government agencies often make their geospatial data available through their open data portals. These portals occasionally provide links to feature services that are accessible to the general public, although they more commonly simply provide links to download the data as files.
You can find these open data portals by Googling "name_of_place open data" or "name_of_place gis data."
Although a variety of software exists to implement data portals, one package commonly used by city, county, and state governments is ESRI's ArcGIS Hub.
In this example, the Alabama GeoHub is implemented using ArcGIS Hub, which is obvious from the look of the site as well as the arcgis.com at the end of the URL.
We will use their layer of hospital locations.
- Copy the GeoService link from the API dropdown. API stands for application programming interface; these links are commonly used by programmers when they are creating web maps.
- In ArcGIS Pro, on the Map -> Add Data dropdown, select Data From Path and paste in the endpoint URL.
- Adjust the symbology appropriately.
- Create a print layout and add a legend (if needed).
- Export the map for printing or insertion as a figure.
ESRI's Living Atlas
Services can be protected so that they can only be used within a specific organization, or by people who pay for access to those services.
ESRI's Living Atlas of the World is a collection of geospatial data that is made available to the general public through the ArcGIS Online web app, and to ArcGIS Pro users, who can access premium content as part of their subscriptions.
The steps below show how to add a Living Atlas layer to a map in ArcGIS Pro:
- On your map, select Map -> Add Data, and navigate to the Portal -> Living Atlas.
- Search for a variable of interest. For this example we use median household income.
- Perform a definition query if you want to limit the area displayed.
- Change the symbology if desired.
- Create a print layout and add a legend.
- Export to create a figure or printable map.
- See the properties to get the endpoint URL.
- View the endpoint to get the source info.
Feature layers that you create in ArcGIS Pro can be published as "web layers," which are services that can be used in both ArcGIS Pro and on ArcGIS Online web maps.
For this example, we use addresses of FedEx locations in Chambana, IL that were geocoded from a CSV file as described later in this document.
- Add the feature class as a layer to your map and adjust the symbology to whatever you want the default to be when someone adds the new service to a map.
- Edit the metadata for the layer to provide a meaningful title, summary, description, and credit for the original source of the data.
- Right click the layer and click Share -> Share As Web Layer.
- You can adjust the sharing of your service in ArcGIS Online so that it is visible only to you, only to people in your organization, or to everyone.
- You can also verify that it works in a web map by opening the service in a new ArcGIS Online map viewer.
Distribution and Storage Formats
Although in a utopian (or dystopian) world, all geospatial data would be instantly accessible to everyone as services, for now, data is often distributed and stored in a variety of different types of computer files that can be shared on websites, flash drives, or via e-mail.
This section briefly describes some common types of geospatial data files and how to import them into ArcGIS Pro for mapping and/or publication.
When you initially load these files into ArcGIS Pro from your computer's hard drive, the data is stored in a private geodatabase that is part of your project. If you only need it for your personal use, you can leave it in that geodatabase. If you need to share it with others, you can then publish that data to upload it to ArcGIS Online as a new service that others can use.
CSV / Excel With Latitudes and Longitudes
Geospatial data can be stored in simple table formats like comma-separated values (CSV) files, where each row contains a latitude, a longitude, and the attributes associated with that location. However, this approach is largely limited to points. For lines (like roads) and areas (like neighborhoods or census tracts) you need to save data in a specialized geospatial data file format.
The simplicity of a CSV file also has an advantage in its potential for preserving data. File formats that are more complex (especially proprietary formats) will become obsolete as technology changes. But data in a CSV file will likely be readable for generations to come.
The following steps demonstrate how to import a CSV file with latitudes and longitudes to create a feature class. This example uses data on COVID-19 cases and fatalities in US states gathered by Johns Hopkins University of Medicine and made available through GitHub.
- Create or download the CSV file with location coordinates in columns named "Latitude" and "Longitude"
- Create a new map and give it a meaningful name.
- Select Add Data -> XY Point Data and select the CSV file from your hard drive.
- Give the new layer a meaningful name. Note that this name should contain only letters and numbers with no spaces or punctuation.
- Run the tool to perform the import.
- Symbolize the layer as needed.
- Create a print layout with a legend.
- Export to create a figure or printable map.
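To make the idea of "rows of coordinates become point features" concrete, here is a minimal sketch using only the Python standard library. It is not what the ArcGIS Pro tool does internally; it simply converts a CSV with "Latitude" and "Longitude" columns into a GeoJSON FeatureCollection. The function name and the sample rows are made up for illustration.

```python
import csv
import io
import json


def csv_points_to_geojson(csv_file, lat_col="Latitude", lon_col="Longitude"):
    """Convert rows of an open CSV file into a GeoJSON FeatureCollection of points."""
    features = []
    for row in csv.DictReader(csv_file):
        try:
            lat = float(row[lat_col])
            lon = float(row[lon_col])
        except (KeyError, TypeError, ValueError):
            continue  # skip rows with missing or non-numeric coordinates
        # Every other column becomes an attribute of the point.
        properties = {k: v for k, v in row.items() if k not in (lat_col, lon_col)}
        features.append({
            "type": "Feature",
            # GeoJSON stores coordinates as [longitude, latitude].
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            "properties": properties,
        })
    return {"type": "FeatureCollection", "features": features}


# Made-up sample rows, for illustration only.
sample = io.StringIO(
    "Name,Latitude,Longitude\n"
    "Site A,40.11,-88.21\n"
    "Site B,,\n"
)
collection = csv_points_to_geojson(sample)
print(json.dumps(collection, indent=2))
```

Note the coordinate order: spreadsheets usually list latitude first, while GeoJSON and most GIS software expect longitude (x) before latitude (y). Swapped columns are a common cause of points landing in the wrong place.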
CSV / Excel With Addresses
Locations on the surface of the earth are often referenced using descriptions rather than numeric coordinates - most notably in street addresses.
The process of converting from location names to latitude/longitude is called geocoding. This process involves parsing the address or description into its component parts (state, city, street name, number, east/west, etc), and then looking up those parts in a large database of all addresses in the world.
Because there are billions of possible addresses, accurate geocoding requires large, expensive databases and powerful computers. Google Maps is popular, in part, because Google has built the technology and resources to support lightning-fast geocoding of both addresses and landmark/business names. But geocoding always involves some level of uncertainty, and you should verify that all the geocoded points are in the right place if the accuracy of your map is important.
In this example, we use a CSV file of in-person FedEx shipping or printing locations from the company's website. Although they provide a map, the only location information directly available is the address, which was entered into a spreadsheet with the columns: Name, Address, City, and State. Geocoding of addresses is a common task needed when performing business analysis in GIS.
- Download the CSV file or create it in a spreadsheet program like Excel and save it as a CSV file.
- In a new map, use Add Data to add the table.
- Right-click on the table and select Geocode Table.
- Follow the instructions to choose the options for geocoding.
- Change the map projection to WGS 1984 Web Mercator, which is a good generic projection used with web maps.
- Adjust the symbology for the new layer.
- Add labels, if desired.
- Adjust the base map in case the map is cluttered.
- Create a print layout, add a legend (if needed), and export to print or insert as a figure.
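The Web Mercator projection mentioned in the steps above converts longitude and latitude in degrees into x and y coordinates in meters. As a rough sketch of the arithmetic (assuming the usual spherical formulas with the WGS 84 semi-major axis as the radius, and a hypothetical function name), the forward projection looks like this:

```python
import math

EARTH_RADIUS_M = 6378137.0  # WGS 84 semi-major axis, used as the sphere radius


def lonlat_to_web_mercator(lon_deg, lat_deg):
    """Project WGS 84 longitude/latitude (degrees) to Web Mercator x/y (meters)."""
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y


# Approximate coordinates for Urbana, IL (illustrative values only).
x, y = lonlat_to_web_mercator(-88.21, 40.11)
print(round(x), round(y))
```

ArcGIS Pro performs this conversion for you when you change the map's coordinate system. The sketch is only meant to show why Web Mercator coordinates look like large numbers of meters rather than degrees.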
Shapefiles
The shapefile is a geospatial data file format that was developed by ESRI in the early 1990s. While the age of this format is reflected in its numerous limitations (such as a column name length limit of 10 characters), this format is supported by a wide variety of GIS software and is still commonly used for distributing geospatial data by municipal governments, including Denver, Chicago, Los Angeles, and New York, among many others.
The shapefile is actually a collection of at least three (and usually more) separate files that store the locational data, the characteristics associated with those locations, and other information about the data. Some common files associated with a shapefile include (listed by the file extension):
- .shp: Contains the feature geometry (points, lines, polygons)
- .shx: An index file that indicates where specific features are in the .shp file
- .dbf: A dBase IV database file of attributes associated with each of the shapes in the .shp file
- .prj: The coordinate system and projection used by the feature geometry (optional)
- .cpg: The character encoding used by the attributes (optional)
- .qpj: The coordinate system and projection in a format used by QGIS (optional)
For convenience, all these files are usually compressed into a single .zip archive file for distribution on websites and servers.
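Because a shapefile is unusable if one of its core files is missing, it can be worth checking a downloaded archive before unpacking it. The following is a minimal sketch using Python's standard `zipfile` module. It treats .shp, .shx, and .dbf as the required set, and the function names and file names are hypothetical.

```python
import io
import zipfile
from pathlib import PurePosixPath

REQUIRED = {".shp", ".shx", ".dbf"}
OPTIONAL = {".prj", ".cpg", ".qpj"}


def shapefile_parts(zip_source):
    """Map each shapefile base name in a .zip archive to the extensions found."""
    parts = {}
    with zipfile.ZipFile(zip_source) as archive:
        for name in archive.namelist():
            path = PurePosixPath(name)
            ext = path.suffix.lower()
            if ext in REQUIRED | OPTIONAL:
                parts.setdefault(path.stem, set()).add(ext)
    return parts


def missing_required(parts):
    """Return the required extensions that each shapefile is missing."""
    return {
        base: sorted(REQUIRED - exts)
        for base, exts in parts.items()
        if REQUIRED - exts
    }


# Build a small in-memory archive of empty placeholder files for demonstration.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    for filename in ("roads.shp", "roads.shx", "roads.dbf", "roads.prj", "parks.shp"):
        archive.writestr(filename, b"")
buffer.seek(0)

parts = shapefile_parts(buffer)
print(missing_required(parts))
```

In practice you would pass the path of a downloaded .zip file instead of the in-memory buffer.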
- Download the shapefile from the website.
- In Windows Explorer, right click the .zip file to open it.
- Copy the files and paste them to your Downloads directory or to your desktop.
- Open a new map in ArcGIS Pro and use Add Data to add the shapefile.
- Update the symbology as needed.
- Create a print layout and export as a figure or printable document.
Other Common Vector File Formats
The following are some file formats that are commonly used in GIS, although you may not encounter them as often as those above.
ESRI File Geodatabases
In 2006, ESRI introduced a more-sophisticated file format for exchanging geospatial data called a file geodatabase. This is a proprietary data file format that is designed to fully support the features of ESRI software, and support for it outside of ESRI software is limited.
When a geodatabase is stored on your local machine, this is the format that ArcGIS Pro uses. Like a shapefile, the file geodatabase is a collection of different files, with all files kept in a folder that has a .gdb extension. And, as with shapefiles, these files can be exchanged by copying them in .zip archives that can be e-mailed or posted on websites.
Keyhole Markup Language (KML)
Google Earth/Maps exchanges geospatial data in the Keyhole Markup Language (KML) format that is based on Extensible Markup Language (XML). KML is designed for the web and contains information on how the geospatial data should be displayed on a web map like Google Maps, or in Google Earth. Since KML was designed for simple web mapping, it is not particularly good for storing complex attribute data. KML can be imported and exported to/from ArcGIS Pro using the KML to Layer and Layer to KML tools, respectively. Most GIS software can read KML files, but shapefiles are usually preferred for serious analysis or when working with data sets of any significant size.
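Because KML is XML, simple files can be read with a general-purpose XML parser. The sketch below uses Python's standard `xml.etree.ElementTree` module to pull point placemarks out of a tiny, made-up KML document. It assumes the KML 2.2 namespace and ignores lines, polygons, and styling, so it illustrates the file structure rather than replacing the KML to Layer tool.

```python
import xml.etree.ElementTree as ET

KML_NS = {"kml": "http://www.opengis.net/kml/2.2"}

# Made-up sample document, for illustration only.
SAMPLE_KML = """<kml xmlns="http://www.opengis.net/kml/2.2">
  <Document>
    <Placemark>
      <name>Example point</name>
      <Point><coordinates>-88.21,40.11,0</coordinates></Point>
    </Placemark>
  </Document>
</kml>"""


def kml_points(kml_text):
    """Return (name, longitude, latitude) for each point placemark in a KML string."""
    root = ET.fromstring(kml_text)
    points = []
    for placemark in root.iter("{http://www.opengis.net/kml/2.2}Placemark"):
        name = placemark.findtext("kml:name", default="", namespaces=KML_NS)
        coords = placemark.findtext("kml:Point/kml:coordinates", namespaces=KML_NS)
        if coords is None:
            continue  # not a point placemark
        # KML coordinates are ordered longitude,latitude[,altitude].
        lon, lat = (float(v) for v in coords.strip().split(",")[:2])
        points.append((name, lon, lat))
    return points


print(kml_points(SAMPLE_KML))
```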
The GPS Exchange Format (GPX) is another XML-based format that is commonly exported by GPS tracker apps in smartphones to store GPS points. GPX files can be imported into ArcGIS Pro using the GPX To Features tool.
If you work with GIS in the construction industry, you will likely see geospatial data stored in the files used by the engineering drafting program AutoCAD. However, AutoCAD is a general use drafting program for objects of all sizes and the proprietary file format often does not contain adequate coordinate or attribute information to allow data to be transferred directly into GIS software.
Raster Data Formats
Remotely sensed raster data from satellites and other aerial platforms is stored in a wide variety of file formats.
These file formats are specialized to raster data and are discussed in much greater practical detail in classes or tutorials on remote sensing.
The Digital Dark Ages
In the developed world, we capture and store almost everything that can be stored: security video, electronic communications, smartphone photos of events momentous and trivial.
Almost none of that data will survive us.
Although storage becomes cheaper every year, technology changes every year. Data must be migrated from old storage media and file formats, or it is lost to physical degradation or technological obsolescence.
Data in The Cloud never has a permanent physical home. The Cloud is a performance and requires constant flows of capital and resources to stay in operation. Changes in the economics of The Cloud will necessitate loss of some of that data. Which data will be lost to time?
Contrast the impermanence of the digital with papyrus text from 2500 BC or clay tablets from as far back as 3300 BC.
While security camera video from an ATM where there has been no criminal activity may not be something that should outlive us, your grandchildren may want to see some of those thousands of baby pictures that you took of your son in the first year of his life. You should plan accordingly.
United States Geological Survey. 2020. "The National Map." https://www.usgs.gov/core-science-systems/national-geospatial-program/national-map. Accessed 12 July 2020.
ESRI. 2020. "What is a geodatabase?" https://pro.arcgis.com/en/pro-app/help/data/geodatabases/overview/what-is-a-geodatabase-.htm. Accessed 12 July 2020.
ESRI. 2020. "Feature Class Basics." https://pro.arcgis.com/en/pro-app/help/data/geodatabases/overview/feature-class-basics.htm. Accessed 22 July 2020.
Merriam-Webster. 2020. "Database." https://www.merriam-webster.com/dictionary/database. Accessed 12 July 2020.
Gritzner, Charles F. 2002. "What Is Where, Why There, and Why Care?" Journal of Geography 101 (1): 38-40. https://doi.org/10.1080/00221340208978465.
Stock, Kristin and Hans Guesgen. 2016. "Geospatial Reasoning With Open Data." In Automating Open Source Intelligence: Algorithms for OSINT, edited by Robert Layton and Paul A. Watters, 171-204. Elsevier. https://www.sciencedirect.com/science/article/pii/B9780128029169000105.
Merriam-Webster. 2020. "Data." https://www.merriam-webster.com/dictionary/data. Accessed 12 July 2020.
Zins, Chaim. 2007. "Conceptual approaches for defining data, information, and knowledge." Journal of the American Society for Information Science and Technology 58 (4): 479-493.
Techopedia. 2020. "Server." https://www.techopedia.com/definition/2282/server. Accessed 12 July 2020.
In contemporary geographic information systems, geospatial data is stored as digital data.
As the name implies, digital data consists of digits or numbers. Internally, digital electronic technology represents data as binary signals (bits) that are either on or off. This binary representation allows a high level of flexibility and accuracy in the representation and processing of data.
For historical reasons, bits are clumped into groups of eight that are called bytes. If you run through all the possible combinations of eight bits, you will find that a byte can have 256 different values (numbers 0 - 255). This is enough for a single byte to represent any character of basic English text, so a five-character word like Hello requires five bytes to store. Characters from many other writing systems require more than one byte each.
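The arithmetic above can be checked in a few lines of Python. This is only an illustrative sketch: the ASCII and UTF-8 encodings are chosen here as common examples and are not named in the text above.

```python
# Eight bits, each either on or off, give 2**8 possible combinations.
values_per_byte = 2 ** 8
print(values_per_byte)            # 256

# Encoding plain English text as ASCII uses one byte per character.
encoded = "Hello".encode("ascii")
print(len(encoded))               # 5
print(list(encoded))              # [72, 101, 108, 108, 111]

# The first byte ("H" = 72) written out as its eight bits.
print(format(encoded[0], "08b"))  # 01001000

# Characters outside basic English text can need more than one byte in UTF-8.
print(len("é".encode("utf-8")))   # 2
```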
To improve speed, modern computers process multiple bytes at one time as words. Although older computers and some mobile devices use 32-bit words (four bytes), most contemporary laptops and desktops use 64-bit words (eight bytes).
The amount of storage in a computer or storage device is usually measured by the number of bytes that it can store. Because storage devices can store trillions or quadrillions of bytes, Greek prefixes are used to make referring to numbers of bytes easier. However, because this is digital data, powers of two are used, making the decimal numbers look a bit sloppy:
- One kilobyte (KB) = 2^10 bytes = 1,024 bytes
- One megabyte (MB) = 2^20 bytes = 1,048,576 bytes = 1,024 KB
- One gigabyte (GB) = 2^30 bytes = 1,073,741,824 bytes = 1,024 MB
- One terabyte (TB) = 2^40 bytes = 1,099,511,627,776 bytes = 1,024 GB
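The list above can be reproduced by repeatedly dividing by 1,024. The following minimal sketch converts a raw byte count into the largest convenient unit. The function name is hypothetical, and it stops at terabytes to match the list.

```python
UNITS = ["bytes", "KB", "MB", "GB", "TB"]


def format_bytes(count):
    """Express a byte count in the largest unit that keeps the number below 1,024."""
    size = float(count)
    unit = UNITS[0]
    for unit in UNITS:
        if size < 1024 or unit == UNITS[-1]:
            break
        size /= 1024  # step up to the next unit
    return f"{size:.2f} {unit}"


for power in (10, 20, 30, 40):
    print(f"2^{power} = {2 ** power:,} bytes = {format_bytes(2 ** power)}")
```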
Physical Storage Media
Digital data is stored on a variety of physical media, depending on how quickly the data needs to be accessed, how much data needs to be stored, and whether the data needs to continue to exist when the digital device is turned off or rebooted.
- Random access memory (RAM) is made with silicon transistors to quickly store and access data that is being actively used. RAM is fast but more expensive than other forms of memory, and the data is lost when the device is turned off or rebooted
- Magnetic hard disks are spinning platters coated with magnetic material that stores data in magnetic patterns on the disk. Hard drives can store very large amounts of data (in the terabytes), but this data takes longer to access than RAM. Hard drives are a reliable, established technology. Data on a hard drive remains even after the hard drive is powered down, but hard drives do not last forever and will eventually fail, often taking their data with them
- Flash memory is made with transistors like RAM, but built with a special structure (floating-gate MOSFET) that allows the data to persist even if the power is turned off. Flash memory has become ubiquitous in consumer devices (SD cards, thumb drives, smartphones, etc) because it has high capacity and has become inexpensive over the past decade. Flash memory is slowly replacing magnetic hard disks with solid-state drives that are faster and use less power. However, flash memory is limited in the number of times it can be written to, so solid-state drives do not last as long as magnetic hard drives and are prone to unexpected failures
- Optical disks such as compact disks (CDs) and digital versatile disks (DVDs) store bits as indentations in aluminum or chemical films that are then encased in plastic disks. Optical disks have high capacity and are inexpensive to manufacture in bulk. However, they are generally used only for data that will not change for extended periods of time, and they are commonly used to archive and back up data from magnetic and flash drives. It is uncertain how long data on a CD or DVD can be expected to last, and optical disks are rapidly becoming obsolete
- Magnetic floppy disks store data in a similar manner to magnetic hard disks, except on a removable plastic disk nestled in a protective case. You may occasionally encounter old data stored on floppy disks, although this technology is obsolete and unreliable. You should migrate any important data off these disks and onto a hard drive as soon as possible so the data is not lost to physical degradation
- Magnetic tape is a roll of plastic film coated with a magnetic material and used to store bits in a similar way as magnetic hard drives. Although tape is one of the oldest technologies for storing digital data, tape drives are still used to back up hard drives for long-term storage
Considerations When Choosing Storage Formats and Platforms
A number of factors need to be considered when choosing the appropriate storage hardware and formats for a project. Those needs are driven by the organizational size and mission: What are you ultimately trying to accomplish with the data?
- Number of readers:
  - How many people need to access the data?
  - How quickly do they need access to the data?
- Number of editors:
  - How many people capture, process and maintain the data?
  - Will multiple people be working on the data at the same time?
- Frequency of change:
  - How often is the data changed?
  - How quickly do changes need to be available to users?
- Volume and types of data:
  - How much data exists?
  - How much data will exist?
  - How many different types of data need to be kept together?
  - How will needs grow or shrink over time?
- Access security:
  - Who needs access to the data?
  - Who should be kept out of the data?
  - Do federal or state regulations require restricting access to the data?
  - How do the costs of a security breach balance against the costs of security?
- Availability security:
  - What would happen if this data were lost or destroyed?
  - Who will perform backups?
  - Does this data need to survive this project?
- Will this be compatible with existing processes?
- What are the set-up and maintenance costs for storage?
- What can we afford in terms of both capital investment and manpower?
- Do managers or co-workers have a preconceived bias against a technology?