Geospatial Data Storage and Distribution in ArcGIS Pro

Geospatial Data and ArcGIS Pro

The key words in geographic information systems are geographic information.

Data is "factual information (such as measurements or statistics) used as a basis for reasoning, discussion, or calculation" (Merriam-Webster 2020).

Geospatial data is "data about objects, events, or phenomena that have a location on the surface of the earth" (Stock and Guesgen 2016). Geospatial data is "what is where" (Gritzner 2002).

A database is "a usually large collection of data organized especially for rapid search and retrieval (as by a computer)" (Merriam-Webster 2020).

A geodatabase is a special type of database containing "a collection of geographic datasets of various types" (ESRI 2020).

ESRI refers to individual geographic datasets within a geodatabase as feature classes. In non-ESRI terms, these can also be referred to as tables. Each individual element in a feature class is called a feature. For example, you could have a feature class for roads in a county, where each road would be a feature (ESRI 2020).

In ArcGIS Pro, a project brings together data from one or more geodatabases to perform analysis and mapping.

The Relationship Between Databases, Projects, and Maps

Local vs. Server Databases

A server is a computer on a network that is dedicated to managing network resources. Servers provide services to other computers on that network, called clients, that need those services (Techopedia 2020).

The Client-Server Model (Vignoni 2011 via Wikipedia)

A cloud is a collection of servers in one or more data centers that offer a variety of services to clients via the internet. While a cloud looks like a single server providing different services, in reality those services can be provided by multiple servers located around the world to improve speed and provide backup when server hardware fails.

Cloud Computing (Johnson 2009 via Wikipedia)

In the ESRI GIS world, the primary cloud environment is ArcGIS Online, which provides not just data storage services, but also supports a variety of web apps and other GIS services.

If you are working in the ArcGIS Online web app, everything you do is kept on ESRI's servers in a cloud, and all your local computer is doing is displaying the information from the server in a browser window.

However, when working with a desktop GIS program like ArcGIS Pro, you have a choice whether to keep your data in a local geodatabase on the desktop computer or keep the data on a server. ArcGIS Pro is tightly integrated with ArcGIS Online, which makes it easy to store your data in the cloud. But there are often times when it is more appropriate to keep your data in a local project geodatabase.

Local Machine / Project:
  • Working alone
  • Working only on one desktop machine
  • Temporarily exploring new data that you are not sure you will need
  • Data requiring heavy access for analysis (such as remote sensing analysis)
  • Data changes rarely
  • Private data that you don't want anyone else to see
  • Unimportant data that can easily be replaced if something bad happens to your machine

Server / ArcGIS Online:
  • Working with a group
  • Need access on multiple machines (such as for work-from-home)
  • Data that you will be using in the future
  • Data requiring frequent access in short bursts
  • Data changes frequently and you need the most current version
  • Data that will be shared within an organization and/or with the general public
  • Mission-critical data that needs to be protected and backed up by IT professionals

Using Services in ArcGIS Pro

Public Feature Services

Data stored on a server can be made accessible to the public through a feature service endpoint, which is a URL that points to a service on a server. These are sometimes called REST endpoints, after the architecture they use: representational state transfer (REST).
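To make the idea of an endpoint concrete, here is a minimal Python sketch that builds a query URL for one layer of a feature service. It assumes the standard ArcGIS REST API query parameters (where, outFields, returnGeometry, f); the layer URL and the STATE field are hypothetical placeholders, not a real USGS endpoint.

```python
from urllib.parse import urlencode

def build_query_url(layer_url, where="1=1", out_fields="*", fmt="json"):
    """Build a query URL for one layer of an ArcGIS REST feature service.

    `layer_url` is supplied by the caller; the parameter names below are the
    standard ones accepted by the REST API's query operation.
    """
    params = {
        "where": where,            # SQL-style attribute filter; "1=1" matches every feature
        "outFields": out_fields,   # comma-separated field names, or "*" for all fields
        "returnGeometry": "true",  # include the shapes, not just the attributes
        "f": fmt,                  # response format, e.g. "json" or "geojson"
    }
    return layer_url.rstrip("/") + "/query?" + urlencode(params)

# Hypothetical layer URL and field name, for illustration only
url = build_query_url(
    "https://services.example.com/arcgis/rest/services/Wetlands/FeatureServer/0",
    where="STATE = 'AL'",
)
print(url)
```

Pasting a URL like this into a browser asks the server for the matching features as text; ArcGIS Pro makes similar requests behind the scenes when you add a service layer to a map.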

In this example, the US Geological Survey makes its National Map data available through service endpoints listed on its service endpoints page, and we will use the federal wetlands service.

Creating a Map Using a National Map Feature Service

Open Data Portals

Government agencies often make their geospatial data available through their open data portals. These portals occasionally provide links to feature services that are accessible to the general public, although they more commonly simply provide links to download the data as files.

You can find these open data portals by Googling "name_of_place open data" or "name_of_place gis data."

Although a variety of software exists to implement data portals, one package commonly used by city, county, and state governments is ESRI's ArcGIS Hub.

In this example, the Alabama GeoHub is implemented using ArcGIS Hub, which is obvious from the look of the site as well as the arcgis.com at the end of the URL.

We will use their layer of hospital locations.

Creating a Map Using a Public Service

ESRI's Living Atlas

Services can be protected so that they can only be used within a specific organization, or only by people who pay for access to those services.

ESRI's Living Atlas of the World is a collection of geospatial data that is made available to the general public through the ArcGIS Online web app, and to ArcGIS Pro users, who can access premium content as part of their subscriptions.

The video below shows how to add a Living Atlas layer to a map in ArcGIS Pro:

Creating a Map Using a Living Atlas Feature Service

User-Created Services

Feature layers that you create in ArcGIS Pro can be published as "web layers," which are services that can be used in both ArcGIS Pro and on ArcGIS Online web maps.

For this example, we use addresses of FedEx locations in Chambana, IL that were geocoded from a CSV file as described later in this document.

Publishing a User-Created Layer

Distribution and Storage Formats

In a utopian (or dystopian) world, all geospatial data would be instantly accessible to everyone as services. For now, however, data is often distributed and stored in a variety of different types of computer files that can be shared on websites, flash drives, or via e-mail.

This section briefly describes some common types of geospatial data files and how to import them into ArcGIS Pro for mapping and/or publication.

When you initially load these files into ArcGIS Pro from your computer's hard drive, the data is stored in a private geodatabase that is part of your project. If you only need it for your personal use, you can leave it in that geodatabase. If you need to share it with others, you can then publish that data to upload it to ArcGIS Online as a new service that others can use.

CSV / Excel With Latitudes and Longitudes

Geospatial data can be stored in simple table formats like comma-separated values (CSV) files, where each row holds the attributes of one location along with columns for its latitude and longitude. However, this approach is largely limited to points rather than areas. For lines (like roads) and areas (like neighborhoods or census tracts), you need to save data in a specialized geospatial data file format.
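Before a CSV file becomes a feature class, each row's coordinate columns have to be read as numbers and checked. The following minimal Python sketch does that with the standard csv module. The column names latitude and longitude are assumptions (your file may use Lat/Long or similar), and rows with missing or out-of-range coordinates are simply skipped.

```python
import csv
import io

def read_points(csv_text, lat_field="latitude", lon_field="longitude"):
    """Return (lat, lon, attributes) tuples for rows with usable coordinates.

    Rows whose coordinates are missing, non-numeric, or out of range are skipped.
    """
    points = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        try:
            lat = float(row[lat_field])
            lon = float(row[lon_field])
        except (KeyError, TypeError, ValueError):
            continue  # column missing, cell empty, or not a number
        if not (-90 <= lat <= 90 and -180 <= lon <= 180):
            continue  # outside the valid range for decimal degrees
        # Every column except the coordinates becomes an attribute of the point
        attributes = {k: v for k, v in row.items() if k not in (lat_field, lon_field)}
        points.append((lat, lon, attributes))
    return points

# Made-up sample rows for illustration
sample = "name,latitude,longitude\nSite A,40.11,-88.24\nSite B,,\nSite C,95,10\n"
print(read_points(sample))  # → [(40.11, -88.24, {'name': 'Site A'})]
```

Skipped rows are worth reviewing by hand, since a single swapped or mistyped coordinate can put a point on the wrong continent.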

The simplicity of a CSV file also gives it an advantage for preserving data. More complex file formats (especially proprietary formats) are likely to become obsolete as technology changes, but data in a CSV file will likely be readable for generations to come.

The following video demonstrates how to import a CSV file with latitudes and longitudes to create a feature class. This example uses data on COVID-19 cases and fatalities in US states gathered by Johns Hopkins University & Medicine and made available through GitHub.

Creating a Feature Class From a CSV File With Latitudes and Longitudes

CSV / Excel With Addresses

Locations on the surface of the earth are often referenced using descriptions rather than numeric coordinates, most notably street addresses.

The process of converting location names or descriptions to latitude/longitude is called geocoding. This process involves parsing the address or description into its component parts (state, city, street name, number, east/west, etc.) and then looking up those parts in a large reference database of addresses.
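The two steps of geocoding, parsing and lookup, can be sketched in a few lines of Python. This is a toy illustration under heavy assumptions: it only understands addresses shaped like "100 E Main St, Springfield, IL", and the one-entry REFERENCE table, with made-up coordinates, is a hypothetical stand-in for the large address database a real geocoder uses.

```python
import re

# Simple one-line US address: "number [direction] street, city, ST".
# Real address parsers handle far more variation (units, ZIP codes, typos).
ADDRESS_PATTERN = re.compile(
    r"^\s*(\d+)\s+(?:([NSEW])\.?\s+)?(.+?),\s*([^,]+?),\s*([A-Za-z]{2})\s*$",
    re.IGNORECASE,
)

def parse_address(address):
    """Split a one-line address into components, or return None if it doesn't fit."""
    match = ADDRESS_PATTERN.match(address)
    if match is None:
        return None
    number, direction, street, city, state = match.groups()
    return {
        "number": number,
        "direction": (direction or "").upper(),
        "street": street.upper(),
        "city": city.upper(),
        "state": state.upper(),
    }

# Hypothetical reference table; the coordinates are made up for illustration.
REFERENCE = {
    ("100", "E", "MAIN ST", "SPRINGFIELD", "IL"): (39.8, -89.6),
}

def geocode(address):
    """Return (lat, lon) if the parsed address is in the reference table, else None."""
    parts = parse_address(address)
    if parts is None:
        return None
    key = (parts["number"], parts["direction"], parts["street"], parts["city"], parts["state"])
    return REFERENCE.get(key)

print(geocode("100 E. Main St, Springfield, IL"))  # → (39.8, -89.6)
```

Real geocoders also interpolate positions along street segments and report a match score, which is one source of the uncertainty discussed below.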

Because there are billions of possible addresses, accurate geocoding requires large, expensive databases and powerful computers. Google Maps is popular, in part, because Google has built the technology and resources to support lightning-fast geocoding of both addresses and landmark/business names. But geocoding always involves some level of uncertainty, and you should verify that all the geocoded points are in the right place if the accuracy of your map is important.
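One inexpensive check is to flag geocoded points that fall outside the area where you expect them. The sketch below assumes a simple latitude/longitude bounding box for the study area; the box, the store names, and the coordinates are all hypothetical.

```python
def outside_study_area(points, bbox):
    """Return the points that fall outside a bounding box.

    `points` holds (name, lat, lon) tuples; `bbox` is (min_lat, min_lon, max_lat, max_lon).
    """
    min_lat, min_lon, max_lat, max_lon = bbox
    return [
        (name, lat, lon)
        for name, lat, lon in points
        if not (min_lat <= lat <= max_lat and min_lon <= lon <= max_lon)
    ]

# Hypothetical study area and geocoded results, for illustration only
study_area = (40.0, -88.4, 40.2, -88.1)
geocoded = [("Store 1", 40.11, -88.24), ("Store 2", 39.78, -89.65)]

for name, lat, lon in outside_study_area(geocoded, study_area):
    print(f"Check {name}: ({lat}, {lon}) is outside the study area")
```

A check like this catches gross errors, such as an address matched to a street with the same name in another city, but not points that are off by a block, so a visual review on the map is still worthwhile.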

In this example, we use a CSV file of in-person FedEx shipping or printing locations from the company's website. Although they provide a map, the only location information directly available is the address, which was entered into a spreadsheet with the columns: Name, Address, City, and State. Geocoding of addresses is a common task needed when performing business analysis in GIS.

Creating a Feature Class From a CSV File With Addresses

Shapefiles

The shapefile is a geospatial data file format that was developed by ESRI in the early 1990s. While the age of this format is reflected in its numerous limitations (such as a 10-character limit on column names), it is supported by a wide variety of GIS software and is still commonly used for distributing geospatial data by municipal governments, including Denver, Chicago, Los Angeles, and New York, among many others.

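The shapefile is actually a collection of at least three (and usually more) separate files that store the locational data, the characteristics associated with those locations, and other information about the data. Some common files associated with a shapefile include (listed by file extension):
  • .shp: the geometry (the point, line, or polygon shapes); required
  • .shx: an index of the shape geometries; required
  • .dbf: the attribute table, in dBASE format; required
  • .prj: the coordinate system and projection
  • .cpg: the character encoding used for the attributes
  • .sbn / .sbx: a spatial index
  • .shp.xml: metadata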

For convenience, all these files are usually compressed into a single .zip archive file for distribution on websites and servers.
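Because a shapefile is only usable when its pieces travel together, it can be worth checking a downloaded .zip archive before importing it. This Python sketch uses the standard zipfile module and assumes the three required components are .shp, .shx, and .dbf; the file names in the example archive are hypothetical.

```python
import io
import zipfile

# Every shapefile needs these three parts; .prj, .cpg, and others are optional.
REQUIRED_EXTENSIONS = (".shp", ".shx", ".dbf")

def missing_shapefile_parts(zip_source):
    """Map each shapefile base name in a .zip to the required extensions it lacks.

    `zip_source` can be a file path or a binary file-like object.
    Shapefiles that have all three required parts are left out of the result.
    """
    with zipfile.ZipFile(zip_source) as archive:
        names = [n.lower() for n in archive.namelist() if not n.endswith("/")]
    found = {}
    for name in names:
        for ext in REQUIRED_EXTENSIONS:
            if name.endswith(ext):
                found.setdefault(name[: -len(ext)], set()).add(ext)
    return {
        base: [ext for ext in REQUIRED_EXTENSIONS if ext not in exts]
        for base, exts in found.items()
        if len(exts) < len(REQUIRED_EXTENSIONS)
    }

# Build a small in-memory archive with hypothetical file names to try it out
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    for name in ("roads.shp", "roads.shx", "roads.prj",
                 "parcels.shp", "parcels.shx", "parcels.dbf"):
        archive.writestr(name, b"")

print(missing_shapefile_parts(buffer))  # → {'roads': ['.dbf']}
```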

Importing a Shapefile As a New Feature Class

Other Common Vector File Formats

The following are some file formats that are commonly used in GIS, although you may not encounter them as often as those above.

ESRI File Geodatabases

In 2006, ESRI introduced a more sophisticated file format for exchanging geospatial data called a file geodatabase. This is a proprietary data file format that is designed to fully support the features of ESRI software. It is best supported by ESRI software, although some open-source tools, such as GDAL, can also read it.

When a geodatabase is stored on your local machine, this is the format that ArcGIS Pro uses. Like a shapefile, the file geodatabase is a collection of different files, all kept in a folder that has a .gdb extension. And, as with shapefiles, that folder can be compressed into a .zip archive that can be e-mailed or posted on websites.

Keyhole Markup Language (KML)

Google Earth/Maps exchanges geospatial data in the Keyhole Markup Language (KML) format, which is based on Extensible Markup Language (XML). KML is designed for the web and contains information on how the geospatial data should be displayed on a web map like Google Maps or in Google Earth. Since KML was designed for simple web mapping, it is not particularly good for storing complex attribute data. KML can be imported into and exported from ArcGIS Pro using the KML To Layer and Layer To KML tools, respectively. Most GIS software can read KML files, but shapefiles are usually preferred for serious analysis or when working with data sets of any significant size.
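Because KML is plain XML, its point data can be read with a general-purpose XML parser. The sketch below uses Python's xml.etree.ElementTree and assumes a KML 2.2 document (namespace http://www.opengis.net/kml/2.2). It extracts only Placemark points and ignores styling, lines, and polygons; the sample document is made up.

```python
import xml.etree.ElementTree as ET

KML_NS = {"kml": "http://www.opengis.net/kml/2.2"}

def kml_points(kml_text):
    """Return (name, lon, lat) for every Placemark that contains a Point."""
    root = ET.fromstring(kml_text)
    points = []
    for placemark in root.iter("{http://www.opengis.net/kml/2.2}Placemark"):
        coords = placemark.find("kml:Point/kml:coordinates", KML_NS)
        if coords is None or not coords.text:
            continue  # lines, polygons, and empty placemarks are skipped
        # KML writes a position as "longitude,latitude[,altitude]"
        lon, lat = (float(v) for v in coords.text.strip().split(",")[:2])
        name = placemark.findtext("kml:name", default="", namespaces=KML_NS)
        points.append((name, lon, lat))
    return points

sample = """<kml xmlns="http://www.opengis.net/kml/2.2"><Document>
  <Placemark><name>Example</name>
    <Point><coordinates>-88.24,40.11,0</coordinates></Point></Placemark>
  <Placemark><name>Route</name>
    <LineString><coordinates>-88.2,40.1 -88.3,40.2</coordinates></LineString></Placemark>
</Document></kml>"""
print(kml_points(sample))  # → [('Example', -88.24, 40.11)]
```

Note that KML lists longitude before latitude, the reverse of the order people usually say them in.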

GPX

The GPS Exchange Format (GPX) is another XML-based format that is commonly exported by GPS tracker apps on smartphones to store GPS points. GPX files can be imported into ArcGIS Pro using the GPX To Features tool.
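GPX is also XML, but it stores each point's coordinates as attributes rather than as text. A minimal sketch, assuming a GPX 1.1 file (namespace http://www.topografix.com/GPX/1/1) that records a track as trk/trkseg/trkpt elements; waypoints and routes are ignored, and the sample track is made up.

```python
import xml.etree.ElementTree as ET

GPX_NS = {"gpx": "http://www.topografix.com/GPX/1/1"}

def gpx_track_points(gpx_text):
    """Return (lat, lon, time) for every track point in a GPX 1.1 document."""
    root = ET.fromstring(gpx_text)
    points = []
    for trkpt in root.findall(".//gpx:trk/gpx:trkseg/gpx:trkpt", GPX_NS):
        # Coordinates are attributes of the point; the timestamp is an optional child
        lat = float(trkpt.get("lat"))
        lon = float(trkpt.get("lon"))
        time = trkpt.findtext("gpx:time", default=None, namespaces=GPX_NS)
        points.append((lat, lon, time))
    return points

sample = """<gpx xmlns="http://www.topografix.com/GPX/1/1" version="1.1" creator="example">
  <trk><trkseg>
    <trkpt lat="40.11" lon="-88.24"><time>2020-07-12T10:00:00Z</time></trkpt>
    <trkpt lat="40.12" lon="-88.25"/>
  </trkseg></trk>
</gpx>"""
print(gpx_track_points(sample))
# → [(40.11, -88.24, '2020-07-12T10:00:00Z'), (40.12, -88.25, None)]
```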

GeoJSON

GeoJSON is a format based on JavaScript Object Notation (JSON) that is commonly used for data displayed in web maps. ArcGIS Pro can convert to and from GeoJSON using the Features To JSON and JSON To Features tools, respectively.
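The structure of a GeoJSON point feature can be shown with Python's standard json module. This sketch takes its input as (latitude, longitude, properties) tuples, a hypothetical layout chosen for illustration. The key detail is that GeoJSON, as specified in RFC 7946, orders each position as longitude first, then latitude.

```python
import json

def to_geojson(points):
    """Build a GeoJSON FeatureCollection string from (lat, lon, properties) tuples."""
    features = [
        {
            "type": "Feature",
            # GeoJSON positions are [longitude, latitude], x before y
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            "properties": properties,
        }
        for lat, lon, properties in points
    ]
    return json.dumps({"type": "FeatureCollection", "features": features}, indent=2)

print(to_geojson([(40.11, -88.24, {"name": "Site A"})]))
```

Swapping the coordinate order is one of the most common GeoJSON mistakes; a point that lands in Antarctica or the Indian Ocean is often a sign of it.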

AutoCAD

If you work with GIS in the construction industry, you will likely see geospatial data stored in the files used by the engineering drafting program AutoCAD. However, AutoCAD is a general-purpose drafting program for objects of all sizes, and its proprietary file format often does not contain adequate coordinate or attribute information to allow data to be transferred directly into GIS software.


Raster Data Formats

Remotely sensed raster data from satellites and other aerial platforms is stored in a wide variety of formats, such as GeoTIFF and HDF (Hierarchical Data Format).

These file formats are specialized to raster data and are discussed in much greater practical detail in classes or tutorials on remote sensing.

MODIS NDVI For the USA

The Digital Dark Ages

In the developed world, we capture and store almost everything that can be stored: security video, electronic communications, smartphone photos of events momentous and trivial.

Almost none of that data will survive us.

Although storage becomes cheaper every year, technology changes every year. Data must be migrated from old storage media and file formats, or it is lost to physical degradation or technological obsolescence.

Data in The Cloud never has a permanent physical home. The Cloud is a performance and requires constant flows of capital and resources to stay in operation. Changes in the economics of The Cloud will necessitate loss of some of that data. Which data will be lost to time?

Contrast the impermanence of the digital with papyrus text from 2500 BC or clay tablets from as far back as 3300 BC.

Mesopotamian Cuneiform Tablet, ca. 2000 BC (Musée de Mariemont via Wikipedia)

While security camera video from an ATM where there has been no criminal activity may not be something that should outlive us, your grandchildren may want to see some of those thousands of baby pictures that you took of your son in the first year of his life. You should plan accordingly.

Me and My Dad, ca. 1966

Bibliography

United States Geological Survey. 2020. "The National Map." https://www.usgs.gov/core-science-systems/national-geospatial-program/national-map. Accessed 12 July 2020.

ESRI. 2020. "What is a geodatabase?" https://pro.arcgis.com/en/pro-app/help/data/geodatabases/overview/what-is-a-geodatabase-.htm. Accessed 12 July 2020.

ESRI. 2020. "Feature Class Basics." https://pro.arcgis.com/en/pro-app/help/data/geodatabases/overview/feature-class-basics.htm. Accessed 22 July 2020.

Merriam-Webster. 2020. "Database." https://www.merriam-webster.com/dictionary/database. Accessed 12 July 2020.

Gritzner, Charles F. 2002. "What Is Where, Why There, and Why Care?" Journal of Geography 101(1): 38 - 40. https://doi.org/10.1080/00221340208978465.

Stock, Kristin and Hans Guesgen. 2016. "Geospatial Reasoning With Open Data." In Automating Open Source Intelligence: Algorithms for OSINT, edited by Robert Layton and Paul A. Watters, 171 - 204. Elsevier. https://www.sciencedirect.com/science/article/pii/B9780128029169000105.

Merriam-Webster. 2020. "Data." https://www.merriam-webster.com/dictionary/data. Accessed 12 July 2020.

Zins, Chaim. 2007. "Conceptual approaches for defining data, information, and knowledge." Journal of the American Society for Information Science and Technology 58 (4): 479-493.

Techopedia. 2020. "Server." https://www.techopedia.com/definition/2282/server. Accessed 12 July 2020.

Appendix

Digital Data

In contemporary geographic information systems, geospatial data is stored as digital data.

As the name implies, digital data consists of digits or numbers. Internally, digital electronic technology represents data as binary signals (bits) that are either on or off. This binary representation allows a high level of flexibility and accuracy in the representation and processing of data.

For historical reasons, bits are clumped into groups of eight called bytes. If you run through all the possible combinations of eight bits, you will find that a byte can have 256 different values (the numbers 0 through 255). This is enough for one byte to represent a single character of basic English text, so a five-character word like Hello requires five bytes to store. Characters outside that basic set, such as accented letters or Chinese characters, usually require two or more bytes each.

To improve speed, modern computers process multiple bytes at one time as words. Although older computers and many small embedded devices use 32-bit words (four bytes), most contemporary laptops, desktops, and smartphones use 64-bit words (eight bytes).
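The byte and word arithmetic above can be checked directly in Python. The character examples assume UTF-8, the most widely used text encoding on the web, in which plain ASCII characters take one byte and many other characters take two to four.

```python
BITS_PER_BYTE = 8

# Each added bit doubles the number of combinations: 2**8 = 256 values (0-255)
values_per_byte = 2 ** BITS_PER_BYTE

# Bytes needed to store some text, assuming UTF-8 encoding
hello_bytes = len("Hello".encode("utf-8"))   # plain ASCII: 1 byte per character
accented_bytes = len("é".encode("utf-8"))    # accented Latin letter: 2 bytes
chinese_bytes = len("地".encode("utf-8"))     # Chinese character: 3 bytes

# Distinct values that a 32-bit (4-byte) and a 64-bit (8-byte) word can hold
values_32_bit = 2 ** (4 * BITS_PER_BYTE)
values_64_bit = 2 ** (8 * BITS_PER_BYTE)

print(values_per_byte, hello_bytes, accented_bytes, chinese_bytes)  # → 256 5 2 3
print(f"{values_32_bit:,}")  # → 4,294,967,296
print(f"{values_64_bit:,}")  # → 18,446,744,073,709,551,616
```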

The amount of storage in a computer or storage device is usually measured by the number of bytes that it can store. Because storage devices can store trillions or quadrillions of bytes, Greek prefixes (kilo, mega, giga, tera) are used to make referring to numbers of bytes easier. However, because this is digital data, these prefixes are often interpreted as powers of two (1 kilobyte = 1,024 bytes), which makes the decimal numbers look a bit sloppy.
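The binary multiples, and how they drift away from round decimal numbers, can be computed directly. This sketch assumes the traditional convention in which each prefix is 1,024 times the one before it. The IEC standard names these units kibibyte (KiB), mebibyte (MiB), and so on, while storage manufacturers usually use the decimal meaning (1 kilobyte = 1,000 bytes).

```python
# Traditional binary meaning of each prefix: 1,024 times the one before it
PREFIXES = ["kilo", "mega", "giga", "tera", "peta"]

def binary_bytes(prefix):
    """Bytes in one unit using the power-of-two convention (1 kilobyte = 1,024 bytes)."""
    return 1024 ** (PREFIXES.index(prefix) + 1)

def decimal_bytes(prefix):
    """Bytes in one unit using the decimal convention (1 kilobyte = 1,000 bytes)."""
    return 1000 ** (PREFIXES.index(prefix) + 1)

for prefix in PREFIXES:
    print(f"1 {prefix}byte = {binary_bytes(prefix):,} bytes (binary) "
          f"vs {decimal_bytes(prefix):,} bytes (decimal)")
```

The gap grows with each prefix: a binary terabyte is about 10% larger than a decimal one, which is one reason a new hard drive can appear smaller in the operating system than on the box.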

Physical Storage Media

Digital data is stored on a variety of physical media, depending on how quickly the data needs to be accessed, how much data needs to be stored, and whether the data needs to continue to exist when the digital device is turned off or rebooted.

Considerations When Choosing Storage Formats and Platforms

A number of factors need to be considered when choosing the appropriate storage hardware and formats for a project. Those needs are driven by the organizational size and mission: What are you ultimately trying to accomplish with the data?

  1. Number of readers:
    • How many people need to access the data?
    • How quickly do they need access to the data?
  2. Number of editors:
    • How many people capture, process, and maintain the data?
    • Will multiple people be working on the data at the same time?
  3. Frequency of change:
    • How often is the data changed?
    • How quickly do changes need to be available to users?
  4. Volume and types of data:
    • How much data exists?
    • How much data will exist?
    • How many different types of data need to be kept together?
    • How will needs grow or shrink over time?
  5. Access security:
    • Who needs access to the data?
    • Who should be kept out of the data?
    • Do federal or state regulations require restricting access to the data?
    • How do the costs of a security breach balance against the costs of security?
  6. Availability security:
    • What would happen if this data were lost or destroyed?
    • Who will perform backups?
    • Does this data need to survive this project?
  7. Cost:
    • Will this be compatible with existing processes?
    • What are the set-up and maintenance costs for storage?
    • What can we afford in terms of both capital investment and manpower?
    • Do managers or co-workers have a preconceived bias against a technology?