Geospatial Data Storage and Distribution in ArcGIS Pro

This tutorial will give a basic overview of how geospatial data is stored and distributed using ArcGIS Pro and ArcGIS Online.

Storage Architecture in ArcGIS Pro

The key words in geographic information systems are geographic information.

Data is "factual information (such as measurements or statistics) used as a basis for reasoning, discussion, or calculation" (Merriam-Webster 2020).

Geospatial data is "data about objects, events, or phenomena that have a location on the surface of the earth" (Stock and Guesgen 2016). Geospatial data is "what is where" (Gritzner 2002).

A database is "a usually large collection of data organized especially for rapid search and retrieval (as by a computer)" (Merriam-Webster 2020).

A geodatabase is a special type of database containing "a collection of geographic datasets of various types" (ESRI 2020).

ESRI refers to individual geographic datasets within a geodatabase as feature classes. In non-ESRI terms, these can also be referred to as tables. Each individual element in a feature class is called a feature. For example, you could have a feature class for roads in a county, where each road would be a feature (ESRI 2020).

Geodatabases contain feature classes. Feature classes contain features and are added to maps as layers that display those features. Maps are included on layouts for printing.

The relationship between geodatabases, feature classes, features, maps, and layouts
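As a rough illustration of that hierarchy, the following Python sketch models a geodatabase, a feature class, and features as plain data classes. The class names and the sample roads are hypothetical and are not part of any ESRI API. This is only a way to picture the containment, not how ArcGIS Pro stores data internally.

```python
from dataclasses import dataclass, field


@dataclass
class Feature:
    """One element in a feature class, such as a single road."""
    name: str
    attributes: dict = field(default_factory=dict)


@dataclass
class FeatureClass:
    """A named collection of features of the same kind."""
    name: str
    features: list = field(default_factory=list)


@dataclass
class Geodatabase:
    """A collection of feature classes, keyed by name."""
    name: str
    feature_classes: dict = field(default_factory=dict)

    def add(self, feature_class):
        self.feature_classes[feature_class.name] = feature_class


# Hypothetical sample data: a county roads feature class with two features.
roads = FeatureClass("county_roads", [
    Feature("Main Street", {"lanes": 2}),
    Feature("Route 45", {"lanes": 4}),
])
gdb = Geodatabase("county.gdb")
gdb.add(roads)

# A map layer refers to one feature class and draws each of its features.
layer_feature_names = [f.name for f in gdb.feature_classes["county_roads"].features]
print(layer_feature_names)
```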

Projects and Project Packages

In ArcGIS Pro, a project brings together data from one or more geodatabases, and from independent geospatial data files like shapefiles or CSV files, to perform analysis and mapping.

When you File -> Save a project, the project information is saved in an .aprx file. Note that this file only saves information about where the different components in a project are kept on your local hard drive or in the cloud. It does not save the files or feature classes as part of the file. If you just copy the .aprx file to another machine, it will have missing components if it references any data or files from your local hard drive.

However, projects can be saved as project packages (.ppkx files) that bundle all of the files, feature classes, and map design information used in the project into a single file. Project package files can be uploaded to ArcGIS Online, and then downloaded to a new machine so you can be assured that you are using all of the same components that you used when you last edited the project.

Project packages are especially helpful when you are working in a remote desktop environment where you get a new virtual machine every time you log in, and you cannot be assured that anything you saved on your local drives will persist after a single session. They are also useful for archiving the local contents of projects so that you have everything together in one place if you wish to revisit that project in the future.

Standalone projects in ArcGIS Pro

Services

If you are working alone on a simple research task, keeping everything in a project on your desktop computer may be adequate. However, if you are collaborating with other people in an organization or want to make your data available to the general public, you need a system architecture designed for serving data.

A server is a computer on a network that is dedicated to managing network resources. Servers provide services to other computers on that network, called clients, that need those services (Techopedia 2020). Clients are user devices like desktop computers and mobile devices that interact with data on servers. This client-server model of system architecture is fundamental to contemporary internet-based communication.

The client-server model
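The client-server exchange can be sketched with Python's standard library alone. In this minimal example, a tiny server runs on the local machine and answers one request from a client. The handler class, the URL path, and the sample road record are all made up for illustration. Real GIS servers are far more elaborate, but the request-and-response pattern is the same.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import ProxyHandler, build_opener


class RoadHandler(BaseHTTPRequestHandler):
    """Server side: answer every GET request with one made-up road record."""

    def do_GET(self):
        body = json.dumps({"name": "Main Street", "type": "road"}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the example quiet


def fetch_from_local_server():
    """Start the server, act as a client for one request, then shut down."""
    server = HTTPServer(("127.0.0.1", 0), RoadHandler)  # port 0 = any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        port = server.server_address[1]
        opener = build_opener(ProxyHandler({}))  # contact localhost directly
        with opener.open(f"http://127.0.0.1:{port}/roads/1", timeout=5) as response:
            return json.loads(response.read().decode("utf-8"))
    finally:
        server.shutdown()
        server.server_close()


record = fetch_from_local_server()
print(record)
```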

The Cloud

The cloud is a collection of servers in one or more data centers that offer a variety of services to clients via the internet. While a cloud looks like a single server providing different services, in reality those services can be provided by multiple servers located around the world to improve speed and provide backup when server hardware fails.

With cloud infrastructure, you generally do not know (or care) exactly where your data is physically being stored or served. Cloud services can be moved around or scaled up and down depending on demand (and budget) in order to best serve the needs of the organization using that infrastructure. Accordingly, businesses do not need to maintain large collections of underutilized servers.

A challenge with using cloud services (and services in general) is that they require a reliable internet connection to access data on the servers. While this is generally not a major issue in urban areas in the developed world, it can necessitate designing systems that can work in a standalone mode when operating in rural areas or areas in the Global South with limited connectivity.

Cloud architecture

Feature Services

Feature services are one way to share your geospatial data with other people. ESRI provides a cloud environment called ArcGIS Online that enables users to publish feature classes as feature services. You (or people you have granted access) can then use those feature services in ArcGIS Pro, in web maps, or in geospatial apps. If you make updates to the published data, those changes will automatically be made available to everyone using your feature service.

Feature services can be configured in ArcGIS Online to restrict access to just yourself, to members of a group, to everyone in your organization, or to everyone on the internet. Feature services can also be configured so that users can modify data, which permits easy collaboration when multiple users need to be able to update data.

ArcGIS Online also provides the ArcGIS Online web app, which users can access in a web browser or on a mobile device to create basic web maps and feature services and to manage data and services in the ArcGIS Online environment.

ArcGIS Online

The ArcGIS environment also allows companies or government agencies who wish to have more control over their GIS environment to set up their own servers using the ArcGIS Enterprise software. These enterprise servers can be physical servers at a company data center or cloud servers. These enterprise servers can be affiliated with ArcGIS Online so that services can be provided by whichever server is best suited and least expensive for the task.

ArcGIS Enterprise

Local vs. Cloud

If you are working in the ArcGIS Online web app, everything you do is kept on ESRI's servers in a cloud, and all your local computer is doing is displaying the information from the server in a browser window.

However, when working with a desktop GIS program like ArcGIS Pro, you can choose whether to keep your data locally on the desktop computer or on a server. ArcGIS Pro is tightly integrated with ArcGIS Online, making it easy to store your data in the cloud. But there are often times when it is more appropriate to keep your data in a local project geodatabase.

Local Machine | Server / ArcGIS Online
Working alone | Working with a group or sharing data with the general public
Working only on one desktop machine | Working with data on multiple different machines (such as for work-from-home) and / or on shared devices (like remote desktops)
Temporarily exploring new data that you are not sure you will keep | Working with data that you will be using in the future
Heavy use of high volumes of data in a desktop program (such as remote sensing analysis) | Big data analysis requiring high-performance computing resources
Personal data that you don't want anyone else to see | Data that will be shared within an organization and/or with the general public
Unimportant data that can easily be replaced if something bad happens to your machine | Mission-critical data that needs to be protected and backed-up by IT professionals

Using Services in ArcGIS Pro

Public Feature Services

Data stored on a server can be made accessible to clients through a feature service endpoint, which is a URL to a service on a server. These are sometimes referred to as REST endpoints after the architectural style they use for communication (representational state transfer).
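To give a sense of what a client sends to such an endpoint, here is a minimal sketch that builds a query URL for one layer of a feature service. The service URL is a placeholder, not a real endpoint, and the function name is hypothetical. The parameters `where`, `outFields`, `resultRecordCount`, and `f` come from the query operation in ESRI's ArcGIS REST API, but whether a particular service accepts `f=geojson` depends on the server, so treat this as an assumption to verify against the service you use.

```python
from urllib.parse import urlencode

# Placeholder endpoint: replace with the REST URL copied from a real service.
SERVICE_URL = "https://example.com/arcgis/rest/services/Wetlands/FeatureServer"


def build_query_url(service_url, layer_id=0, where="1=1", max_records=5):
    """Return a URL that asks one layer of a feature service for features."""
    params = {
        "where": where,                    # SQL-style filter; 1=1 matches every row
        "outFields": "*",                  # return every attribute column
        "resultRecordCount": max_records,  # limit how many features come back
        "f": "geojson",                    # requested response format
    }
    return f"{service_url.rstrip('/')}/{layer_id}/query?{urlencode(params)}"


url = build_query_url(SERVICE_URL)
print(url)
```

Pasting a URL like this into a browser is a quick way to check that an endpoint responds before adding it to a map.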

In this example, the US Geological Survey makes their National Map data available through service endpoints listed on their service endpoints page, and we will use the federal wetlands service.

  1. Acquire: Copy the REST endpoint URL.
  2. Start: Create a new map in ArcGIS Pro and give it a meaningful name.
  3. Add: On the Map -> Add Data dropdown, select Data From Path and paste in the endpoint URL.
  4. Symbolize: This example layer has predefined symbology, but you can change that symbology if desired.
    • Because this data is very detailed, the layer is configured to only display in small geographic areas. So we zoom in on Urbana, IL.
    • Feature services can be slow and may take a few seconds to load and display.
  5. Present: Create a print layout
    • Add a legend (if needed).
    • Export the map for printing or insertion as a figure.
Creating a Map Using a National Map Feature Service

Open Data Portals

Government agencies often make their geospatial data available through their open data portals. These portals occasionally provide links to feature services that are accessible to the general public, although they more commonly simply provide links to download the data as files.

You can find these open data portals by Googling "name_of_place open data" or "name_of_place gis data."

Although a variety of software exists to implement data portals, one package commonly used by city, county, and state governments is ESRI's ArcGIS Hub.

In this example, the Alabama GeoHub is implemented using ArcGIS Hub, which is obvious from the look of the site as well as the arcgis.com at the end of the URL.

We will use their layer of hospital locations.

  1. Acquire: Copy the GeoService link from the API dropdown. API stands for application programming interface; these links are commonly used by programmers when they are creating web maps.
  2. Start: Create a new map in ArcGIS Pro and give it a meaningful name.
  3. Add: In ArcGIS Pro, on the Map -> Add Data dropdown, select Data From Path and paste in the endpoint URL.
  4. Symbolize: Adjust the symbology appropriately.
  5. Present: Create a print layout
    • Add a legend (if needed).
    • Export the map for printing or insertion as a figure.
Creating a Map Using a Public Service

ESRI's Living Atlas

Services can be protected so that they can only be used within a specific organization, or by people who pay for access to those services.

ESRI's Living Atlas of the World is a collection of geospatial data feature services that can be accessed in ArcGIS Pro or in ArcGIS Online. Some of the services are open to the general public, while others are proprietary and accessible only to licensed users.

The video below shows how to add a Living Atlas layer to a map in ArcGIS Pro:

  1. Start: Create a new map in ArcGIS Pro and give it a meaningful name.
  2. Add: On your map, select Map -> Add Data, and navigate to the Portal -> Living Atlas.
  3. Symbolize: Adjust the symbology if desired.
  4. Present: Create a print layout.
Creating a Map Using a Living Atlas Feature Service

User-Created Services

Feature layers that you create in ArcGIS Pro can be published as "web layers," which are services that can be used in both ArcGIS Pro and on ArcGIS Online web maps. Published features are stored in geodatabases on ESRI's servers in the cloud and can be accessed through REST endpoints and through the ArcGIS Online web app. Organizations that have their own ArcGIS Enterprise servers can also host these endpoints on those servers.

For this example, we use addresses of FedEx locations in Chambana, IL that were geocoded from a CSV file as described later in this document.

  1. Add the feature class as a layer to your map and adjust the symbology to whatever you want the default to be when someone adds the new service to a map.
  2. Edit the metadata for the layer to provide a meaningful title, summary, description, and credit for the original source of the data.
  3. Right click the layer and click Share -> Share As Web Layer.
  4. You can adjust the sharing of your service in ArcGIS Online so that it is visible only to you, only to people in your organization, or to everyone.
  5. You can also verify that it works in a web map by opening the service in a new ArcGIS Online map viewer.
Publishing a User-Created Layer

Distribution and Storage File Formats

Although in a utopian (or dystopian) world, all geospatial data would be instantly accessible to everyone as services, for now, data is often distributed and stored in a variety of different types of computer files that can be shared on websites, flash drives, or via e-mail.

This section briefly describes some common types of geospatial data files and how to import them into ArcGIS Pro for mapping and/or publication.

When you initially load these files into ArcGIS Pro from your computer's hard drive, the data is stored in a private geodatabase that is part of your project. If you only need it for your personal use, you can leave it in that geodatabase. If you need to share it with others, you can then publish that data to upload it to ArcGIS Online as a new service that others can use.

CSV / Excel With Latitudes and Longitudes

Geospatial data can be stored in simple table formats like comma-separated values (CSV) files, where each row holds a latitude, a longitude, and the attributes associated with that location. However, this approach is limited to points. For lines (like roads) and areas (like neighborhoods or census tracts), you need to save data in a specialized geospatial data file format like the shapefile.

The simplicity of a CSV file also has an advantage in its potential for preserving data. File formats that are more complex (especially proprietary formats) will become obsolete as technology changes. But data in a CSV file will likely be readable for generations to come.

The following video demonstrates how to import a CSV file with latitudes and longitudes to create a feature class. This example uses data on COVID-19 cases and fatalities in US states gathered by Johns Hopkins University and made available through GitHub.

  1. Acquire: Create or download the CSV file with location coordinates in columns named "Latitude" and "Longitude".
  2. Start: Create a new map in ArcGIS Pro and give it a meaningful name.
  3. Add: Select Add Data -> XY Point Data and select the CSV file from your hard drive.
    • Give the new layer a meaningful name. Note that this name should contain only letters and numbers with no spaces or punctuation.
    • Run the tool to perform the import.
  4. Symbolize: Adjust the symbology of the layer as needed.
  5. Present: Create a print layout with a legend.
    • Export to create a figure or printable map.
Creating a Feature Class From a CSV File With Latitudes and Longitudes
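Outside of ArcGIS Pro, the same idea can be sketched in a few lines of Python: read each row, convert the Latitude and Longitude columns to numbers, and keep the remaining columns as attributes. The sample rows and the `read_points` function below are made up for illustration and do not reproduce the Johns Hopkins data or the XY Point Data tool. They only show what "a table of points" means in practice.

```python
import csv
import io

# Made-up sample rows standing in for a downloaded CSV file.
SAMPLE_CSV = """Name,Latitude,Longitude,Cases
Illinois,40.3495,-88.9861,100
Alabama,32.3182,-86.9023,50
Broken row,not_a_number,-90.0,0
"""


def read_points(csv_text):
    """Turn rows with Latitude/Longitude columns into simple point records."""
    points = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        try:
            lat = float(row["Latitude"])
            lon = float(row["Longitude"])
        except ValueError:
            continue  # skip rows whose coordinates are not numeric
        if not (-90 <= lat <= 90 and -180 <= lon <= 180):
            continue  # skip coordinates outside the valid range
        attributes = {
            key: value for key, value in row.items()
            if key not in ("Latitude", "Longitude")
        }
        # x is longitude (east-west) and y is latitude (north-south).
        points.append({"x": lon, "y": lat, "attributes": attributes})
    return points


points = read_points(SAMPLE_CSV)
print(len(points), points[0])
```

Checking for non-numeric or out-of-range coordinates before importing can save time, since a single swapped latitude and longitude will place a point on the wrong side of the world.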

CSV / Excel With Addresses

Locations on the surface of the earth are often referenced using descriptions rather than numeric coordinates - most notably in street addresses.

The process of converting from location names to latitude/longitude is called geocoding. This process involves parsing the address or description into its component parts (state, city, street name, number, east/west, etc.), and then looking up those parts in a large reference database of addresses.

Because there are billions of possible addresses, accurate geocoding requires large, expensive databases and powerful computers. Google Maps is popular, in part, because Google has built the technology and resources to support lightning-fast geocoding of both addresses and landmark/business names. But geocoding always involves some level of uncertainty, and you should verify that all the geocoded points are in the right place if the accuracy of your map is important.
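The parse-then-look-up process can be sketched with a toy example. Everything here is hypothetical: the reference table holds a single invented address with invented coordinates, and the parsing handles only the simplest "number street" pattern. Production geocoders deal with misspellings, unit numbers, interpolation along street segments, and much more.

```python
# Tiny made-up reference table; real geocoders search enormous address databases.
REFERENCE = {
    ("100", "MAIN ST", "URBANA", "IL"): (40.1100, -88.2070),
}

# Common street-type spellings normalized to one abbreviation.
ABBREVIATIONS = {"STREET": "ST", "AVENUE": "AVE", "ROAD": "RD"}


def parse_address(address, city, state):
    """Split a street address into normalized parts used as a lookup key."""
    words = address.upper().replace(".", "").split()
    number, street_words = words[0], words[1:]
    street = " ".join(ABBREVIATIONS.get(word, word) for word in street_words)
    return (number, street, city.strip().upper(), state.strip().upper())


def geocode(address, city, state):
    """Return (latitude, longitude), or None when no match is found."""
    return REFERENCE.get(parse_address(address, city, state))


print(geocode("100 Main Street", "Urbana", "IL"))
print(geocode("999 Nowhere Rd.", "Urbana", "IL"))
```

The second lookup finds no match, which is the situation that forces real geocoders to fall back on approximate matches and is one source of the uncertainty described above.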

In this example, we use a CSV file of in-person FedEx shipping or printing locations from the company's website. Although they provide a map, the only location information directly available is the address, which was entered into a spreadsheet with the columns: Name, Address, City, and State. Geocoding of addresses is a common task needed when performing business analysis in GIS.

  1. Acquire: Download the CSV file or create it in a spreadsheet program like Excel and save it as a CSV file.
  2. Start: Create a new map in ArcGIS Pro.
  3. Add: Use Add Data to add the CSV table to the new map.
    • Right-click on the table and select Geocode Table.
    • Follow the instructions to choose the options for geocoding.
  4. Symbolize: Adjust the symbology for the new layer.
    • Change the map projection to WGS 1984 Web Mercator, which is a good generic projection used with web maps.
    • Add labels, if desired.
    • Adjust the base map in case the map is cluttered.
  5. Present: Create a print layout
    • Add a legend (if needed).
    • Export to print or insert as a figure.
Creating a Feature Class From a CSV File With Addresses
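For readers curious about what the Web Mercator projection does to coordinates, the sketch below applies the standard spherical Mercator formulas, using the WGS 1984 equatorial radius as the sphere radius. ArcGIS Pro performs this reprojection for you, so this is only an illustration. The function name and the sample coordinates are assumptions, not part of the tutorial's workflow.

```python
import math

EARTH_RADIUS_M = 6378137.0  # WGS 1984 equatorial radius, in meters


def to_web_mercator(lat, lon):
    """Convert latitude/longitude in degrees to Web Mercator x/y in meters."""
    x = EARTH_RADIUS_M * math.radians(lon)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y


# Approximate, made-up sample location in central Illinois.
x, y = to_web_mercator(40.11, -88.21)
print(round(x), round(y))
```

Because y grows without bound as latitude approaches the poles, Web Mercator maps exaggerate areas at high latitudes, which is worth remembering when the map is used for comparing sizes.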

Shapefiles

The shapefile is a geospatial data file format that was developed by ESRI in the 1990s. While the age of this format is reflected in its numerous limitations (such as a column name length limit of 10 characters), this format is supported by a wide variety of GIS software and is still commonly used for distributing geospatial data by municipal governments, including Denver, Dallas, Los Angeles, and New York, among many others.

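The term shapefile is a misnomer since a shapefile is actually a collection of at least three (and usually more) separate files that store the locational data, the characteristics associated with those locations, and other information about the data. Some common files associated with a shapefile include (listed by the file extension):

  • .shp: the feature geometry (the locational data)
  • .shx: an index that speeds up access to the geometry
  • .dbf: the attribute table (the characteristics associated with each feature)
  • .prj: the coordinate system and projection information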

For convenience, all these files are usually compressed into a single .zip archive file for distribution on websites and servers.
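Because a shapefile is unusable when one of its mandatory parts (.shp, .shx, and .dbf) is missing, it can be worth checking a downloaded archive before importing it. The following sketch does that with Python's standard library. The function name and the in-memory archive with empty placeholder files are made up for demonstration.

```python
import io
import zipfile
from pathlib import PurePosixPath

REQUIRED = {".shp", ".shx", ".dbf"}  # mandatory parts of a shapefile


def missing_parts(zip_file):
    """Return the required shapefile extensions absent from a .zip archive."""
    with zipfile.ZipFile(zip_file) as archive:
        found = {PurePosixPath(name).suffix.lower() for name in archive.namelist()}
    return sorted(REQUIRED - found)


# Build a small in-memory archive with empty placeholder files.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    for name in ("hospitals.shp", "hospitals.dbf", "hospitals.prj"):
        archive.writestr(name, b"")
buffer.seek(0)

print(missing_parts(buffer))
```

With a real download, you would pass the path to the .zip file instead of the in-memory buffer.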

  1. Acquire: Download the shapefile .zip archive from the website.
    • In Windows Explorer, Extract the contents of the .zip archive file (0:29).
  2. Start: Create a new map in ArcGIS Pro (0:53).
  3. Add: Add Data and add the shapefile (1:10).
    • Right click on the new shapefile layer and select Data -> Export Features. This will start the Feature Class to Feature Class tool. This step is needed to copy the shapefile data into the project geodatabase so it will be saved when you package your project (1:24).
    • For Output Feature Class provide a short but descriptive name with no punctuation or spaces.
    • Run the tool to save the shapefile data. This should add the data as a new layer to your map (1:50).
    • Delete the old shapefile layer (2:08).
  4. Symbolize: Update the symbology as desired (2:19).
  5. Present: Create a print layout
    • Export as a figure or printable document (2:45).
Importing a Shapefile As a New Feature Class

Other Common Vector File Formats

The following are some file formats that are commonly used in GIS, although you may not encounter them as often as those above.

ESRI File Geodatabases

In 2006, ESRI introduced a more sophisticated file format for exchanging geospatial data called a file geodatabase. This is a proprietary data file format that is designed to fully support the features of ESRI software. It is best supported by ESRI software, although some open-source tools (such as GDAL) can read it.

When a geodatabase is stored on your local machine, this is the format that ArcGIS Pro uses. Like a shapefile, the file geodatabase is a collection of different files, with all files kept in a folder that has a .gdb extension. And, as with shapefiles, these files can be exchanged by copying them in .zip archives that can be e-mailed or posted on websites.

Keyhole Markup Language (KML)

Google Earth/Maps exchanges geospatial data in the Keyhole Markup Language (KML) format that is based on Extensible Markup Language (XML). KML is designed for the web and contains information on how the geospatial data should be displayed on a web map like Google Maps, or in Google Earth. KML supports vector points, lines, and polygons.

Since KML was designed for simple web mapping, it is not particularly good for storing complex attribute data. KML can be imported and exported to/from ArcGIS Pro using the KML to Layer and Layer to KML tools, respectively. Most GIS software can read KML files, but shapefiles are usually preferred for serious analysis or when working with data sets of any significant size.
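Because KML is plain XML, a simple file can be read with a general-purpose XML parser. The sketch below parses a made-up placemark using Python's standard library. The sample document and function name are illustrative only, and the sketch handles just point placemarks. Note that KML lists coordinates as longitude, latitude, and optional altitude, in that order.

```python
import xml.etree.ElementTree as ET

KML_NS = {"kml": "http://www.opengis.net/kml/2.2"}

# Made-up placemark; coordinates are longitude,latitude,altitude.
SAMPLE_KML = """<kml xmlns="http://www.opengis.net/kml/2.2">
  <Document>
    <Placemark>
      <name>Sample point</name>
      <Point><coordinates>-88.2070,40.1100,0</coordinates></Point>
    </Placemark>
  </Document>
</kml>
"""


def read_placemarks(kml_text):
    """Return the name and position of each point placemark in a KML string."""
    root = ET.fromstring(kml_text)
    placemarks = []
    for placemark in root.iterfind(".//kml:Placemark", KML_NS):
        name = placemark.findtext("kml:name", default="", namespaces=KML_NS)
        coords = placemark.findtext(
            "kml:Point/kml:coordinates", default="", namespaces=KML_NS
        )
        if not coords.strip():
            continue  # not a point placemark
        lon, lat = (float(value) for value in coords.strip().split(",")[:2])
        placemarks.append({"name": name, "lat": lat, "lon": lon})
    return placemarks


placemarks = read_placemarks(SAMPLE_KML)
print(placemarks)
```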

GPX

The GPS Exchange Format (GPX) is another XML-based format that is commonly exported by GPS tracker apps in smartphones to store GPS points. GPX files can be imported into ArcGIS Pro using the GPX To Features tool.
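A GPX track is similarly easy to inspect. In this minimal sketch, each track point is a `trkpt` element whose `lat` and `lon` attributes hold the position. The two sample points and the function name are made up, and the sketch assumes a GPX 1.1 file; it ignores waypoints, routes, and timestamps.

```python
import xml.etree.ElementTree as ET

GPX_NS = {"gpx": "http://www.topografix.com/GPX/1/1"}

# Made-up track with two points.
SAMPLE_GPX = """<gpx version="1.1" creator="example" xmlns="http://www.topografix.com/GPX/1/1">
  <trk><trkseg>
    <trkpt lat="40.1100" lon="-88.2070"><ele>222.0</ele></trkpt>
    <trkpt lat="40.1105" lon="-88.2065"><ele>223.5</ele></trkpt>
  </trkseg></trk>
</gpx>
"""


def read_track_points(gpx_text):
    """Return (latitude, longitude) pairs for every track point."""
    root = ET.fromstring(gpx_text)
    return [
        (float(point.get("lat")), float(point.get("lon")))
        for point in root.iterfind(".//gpx:trkpt", GPX_NS)
    ]


track = read_track_points(SAMPLE_GPX)
print(track)
```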

GeoJSON

GeoJSON is an extension to JavaScript Object Notation (JSON) that is used for data displayed in web maps. GeoJSON supports vector points, lines, and polygons. ArcGIS Pro can convert to and from GeoJSON using the Features To JSON and JSON To Features tools, respectively.
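To show what the three vector types look like in GeoJSON, the sketch below builds a small feature collection with one point, one line, and one polygon. The coordinates, names, and the `make_feature` helper are made up for illustration. GeoJSON positions are written as longitude first, then latitude, and a polygon ring must end on the same position where it starts.

```python
import json


def make_feature(geometry_type, coordinates, **properties):
    """Build one GeoJSON feature from a geometry and its attributes."""
    return {
        "type": "Feature",
        "geometry": {"type": geometry_type, "coordinates": coordinates},
        "properties": properties,
    }


collection = {
    "type": "FeatureCollection",
    "features": [
        make_feature("Point", [-88.207, 40.110], name="Sample point"),
        make_feature(
            "LineString", [[-88.21, 40.11], [-88.20, 40.12]], name="Sample road"
        ),
        make_feature(
            "Polygon",
            # One closed ring: the last position repeats the first.
            [[[-88.21, 40.11], [-88.20, 40.11], [-88.20, 40.12], [-88.21, 40.11]]],
            name="Sample area",
        ),
    ],
}

geojson_text = json.dumps(collection, indent=2)
print(geojson_text)
```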

AutoCAD

If you work with GIS in the construction industry, you will likely see geospatial data stored in the files used by the engineering drafting program AutoCAD. However, AutoCAD is a general use drafting program for objects of all sizes and the proprietary file format often does not contain adequate coordinate or attribute information to allow data to be transferred directly into GIS software.

AutoCAD

Raster Data Formats

Remotely-sensed raster data from satellites and other aerial platforms is stored in a wide variety of formats like:

These file formats are specialized to raster data and are discussed in much greater practical detail in classes or tutorials on remote sensing.

MODIS NDVI For the USA

The Digital Dark Ages

The Dark Ages was a period of European history from around 500 to 1000 AD when there was little urban life. Accordingly, there is very little written information that survives from that period of time (Encyclopedia Britannica, 2021).

In the developed world, we capture and store almost everything that can be stored: security video, electronic communications, smartphone photos of events momentous and trivial.

Almost none of that data will survive us.

Although storage becomes cheaper every year, technology also changes every year. Data must be migrated to new storage media and modes of expression, or it is lost to physical degradation or technological obsolescence. It is likely that only a small fraction of the data from our time will be migrated, and eventually almost all of the rest will be lost, leaving future generations with only a small fraction of the vast body of information we have accumulated in our time.

Data in the cloud never has a permanent physical home. The cloud is a performance and requires constant flows of capital and resources to stay in operation. Changes in the economics of the cloud will necessitate loss of some of that data.

Which data will be lost to time? Will this period be as "dark" to future historians as medieval Europe is to present day historians?

Contrast the impermanence of the digital with papyrus text from 2500 BC or clay tablets from as far back as 3300 BC.

Mesopotamian Cuneiform Tablet, ca. 2000 BC (Musée de Mariemont via Wikipedia)

While security camera video from an ATM where there has been no criminal activity may not be something that should outlive us, your grandchildren may want to see some of those thousands of baby pictures that you took of your son in the first year of his life. You should plan accordingly.

Me and My Dad, ca. 1966

Bibliography

United States Geological Survey. 2020. The National Map. https://www.usgs.gov/core-science-systems/national-geospatial-program/national-map. Accessed 12 July 2020.

ESRI. 2020. "What is a geodatabase?" https://pro.arcgis.com/en/pro-app/help/data/geodatabases/overview/what-is-a-geodatabase-.htm. Accessed 12 July 2020.

ESRI. 2020. "Feature Class Basics." https://pro.arcgis.com/en/pro-app/help/data/geodatabases/overview/feature-class-basics.htm. Accessed 22 July 2020.

Merriam-Webster. 2020. "Database." https://www.merriam-webster.com/dictionary/database. Accessed 12 July 2020.

Gritzner, Charles F. 2002. "What Is Where, Why There, and Why Care?" Journal of Geography 101 (1): 38-40. https://doi.org/10.1080/00221340208978465.

Stock, Kristin and Hans Guesgen. 2016. "Geospatial Reasoning With Open Data." In Automating Open Source Intelligence: Algorithms for OSINT, edited by Robert Layton and Paul A. Watters, 171-204. Elsevier. https://www.sciencedirect.com/science/article/pii/B9780128029169000105.

Merriam-Webster. 2020. "Data." https://www.merriam-webster.com/dictionary/data. Accessed 12 July 2020.

Zins, Chaim. 2007. "Conceptual approaches for defining data, information, and knowledge." Journal of the American Society for Information Science and Technology 58 (4): 479-493.

Techopedia. 2020. "Server." https://www.techopedia.com/definition/2282/server. Accessed 12 July 2020.

Appendix

Digital Data

In contemporary geographic information systems, geospatial data is stored as digital data.

As the name implies, digital data consists of digits or numbers. Internally, digital electronic technology represents data as binary signals (bits) that are either on or off. This binary representation allows a high level of flexibility and accuracy in the representation and processing of data.

For historical reasons, bits are clumped into groups of eight that are called bytes. If you run through all the possible combinations of eight bits, you will find that a byte can have 256 different values (numbers 0 - 255). This is enough for each byte to represent a single character in English and many other languages that use the Latin alphabet, so a five-character word like Hello requires five bytes to store.
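The arithmetic can be checked directly in Python. This short sketch counts the values a byte can hold and shows the bytes behind the word Hello. The final line uses the UTF-8 encoding, which is an assumption about how the text is stored, to show that characters outside the basic Latin alphabet can need more than one byte.

```python
# Eight bits, each either on or off, give 2 ** 8 possible combinations.
values_per_byte = 2 ** 8
print(values_per_byte)

# Each character of "Hello" is stored as one byte in ASCII.
encoded = "Hello".encode("ascii")
print(len(encoded))
print(list(encoded))               # the numeric value of each byte
print(format(encoded[0], "08b"))   # the eight bits that make up "H"

# Characters outside the basic Latin alphabet can take several bytes in UTF-8.
kanji_length = len("日".encode("utf-8"))
print(kanji_length)
```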

To improve speed, modern computers process multiple bytes at one time as words. Although older computers and older mobile devices use 32-bit words (four bytes), most contemporary laptops, desktops, and smartphones use 64-bit words (eight bytes).

The amount of storage in a computer or storage device is usually measured by the number of bytes that it can store. Because storage devices can store trillions or quadrillions of bytes, Greek prefixes are used to make referring to numbers of bytes easier. However, because this is digital data, powers of two are used, making the decimal numbers look a bit sloppy.
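The powers of two behind those prefixes can be computed in a short loop. This sketch prints each binary size next to the round decimal number it approximates. The choice of prefixes shown is an illustrative selection, and note that some manufacturers label storage using the decimal values instead.

```python
PREFIXES = ["kilo", "mega", "giga", "tera", "peta"]

binary_sizes = {}
for step, prefix in enumerate(PREFIXES, start=1):
    binary = 2 ** (10 * step)    # each prefix is another factor of 2^10 = 1024
    decimal = 10 ** (3 * step)   # the round decimal number it approximates
    binary_sizes[prefix] = binary
    print(f"{prefix}byte: 2^{10 * step} = {binary:,} bytes (about {decimal:,})")
```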

Physical Storage Media

Digital data is stored on a variety of physical media, depending on how quickly the data needs to be accessed, how much data needs to be stored, and whether the data needs to continue to exist when the digital device is turned off or rebooted.

Considerations When Choosing Storage Formats and Platforms

A number of factors need to be considered when choosing the appropriate storage hardware and formats for a project. Those needs are driven by the organizational size and mission: What are you ultimately trying to accomplish with the data?

  1. Number of readers
    • How many people need to access the data?
    • How quickly do they need access to the data?
  2. Number of editors
    • How many people capture, process and maintain the data?
    • Will multiple people be working on the data at the same time?
  3. Frequency of change:
    • How often is the data changed?
    • How quickly do changes need to be available to users?
  4. Volume and types of data:
    • How much data exists?
    • How much data will exist?
    • How many different types of data need to be kept together?
    • How will needs grow or shrink over time?
  5. Access security:
    • Who needs access to the data?
    • Who should be kept out of the data?
    • Do federal or state regulations require restricting access to the data?
    • How do the costs of a security breach balance against the costs of security?
  6. Availability security:
    • What would happen if this data were lost or destroyed?
    • Who will perform backups?
    • Does this data need to survive this project?
  7. Cost:
    • Will this be compatible with existing processes?
    • What are the set-up and maintenance costs for storage?
    • What can we afford in terms of both capital investment and manpower?
    • Do managers or co-workers have a preconceived bias against a technology?