Geographic Information Systems Architecture
Geographic information systems consist of a variety of different hardware, software, and human components that work together harmoniously. The architecture of a computer system is "the manner in which the components of a computer or computer system are organized and integrated" (Merriam-Webster 2020). Different types of system architecture are used for different needs. These architectures can be simple, such as a stand-alone home desktop system, or they can be intricately complex, like cloud architectures that rely on layers of developers, engineers, and technicians to satisfy challenging business or governmental requirements.
Understanding the available software and hardware options helps GIS professionals make system design choices that meet the needs of the organization as efficiently as possible.
Types of Architecture
There are four fundamental components to a geographic information system:
- The user interface where users interact with the system. This component consists of hardware like laptops, desktop computers, and mobile devices, as well as software like browsers and apps that the user accesses directly.
- The applications are software components that the user can use to manipulate or analyze the data.
- The database management system (DBMS) is software that controls access to the data and converts it into formats that can be used by the tools and user interfaces.
- The database is where geospatial data is stored.
The simplest GIS architecture is a stand-alone desktop or laptop that houses all components. This architecture is appropriate for users who are working alone and do not need to regularly share data. Examples of this include ArcMap software or, in the non-spatial world, use of Microsoft desktop programs like Word, Excel, or PowerPoint on files stored on your personal hard drive.
Since most GIS projects and training involve some measure of collaboration, most architectures separate components in the client-server model. Servers are computers on a network that are dedicated to managing network resources. Servers provide services to other computers on that network, called clients, that need those services (Techopedia 2020).
For an academic classroom lab or small business, the simplest form of client-server architecture is stand-alone computers connected to a central file server or database server through a local network. In such cases, the applications are still on individual desktop computers, but the data is kept on a server and managed by the DBMS. The users see the data as if it were on their own machines.
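The client-server split can be sketched with Python's standard library. This is only an illustration under stated assumptions: the building record, the `/buildings` path, and the names `BuildingHandler` and `fetch_buildings` are hypothetical, and a real GIS server would sit in front of a DBMS rather than an in-memory list. The server holds the data, and the client receives it over the network as if it were local.

```python
import json
import threading
from http.client import HTTPConnection
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical data that lives only on the server side.
BUILDINGS = [{"BIN": 1001, "Number": 701, "Street": "South Gregory"}]


class BuildingHandler(BaseHTTPRequestHandler):
    """Server role: answers every GET request with the building records as JSON."""

    def do_GET(self):
        body = json.dumps(BUILDINGS).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet by suppressing per-request logging


def fetch_buildings(port):
    """Client role: requests the records from the server and decodes them."""
    conn = HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        conn.request("GET", "/buildings")
        return json.loads(conn.getresponse().read().decode("utf-8"))
    finally:
        conn.close()


# Port 0 asks the operating system for any free port on this machine.
server = HTTPServer(("127.0.0.1", 0), BuildingHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
try:
    records = fetch_buildings(server.server_port)
finally:
    server.shutdown()
    server.server_close()

print(records)
```

Here both roles run on one machine for convenience. In a lab or office, the server would run on a separate computer and many clients would connect to it.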
As the needs of an organization get larger and the types of clients become more diverse, an enterprise architecture is needed to separate the system components across multiple servers and networks in various configurations. In this usage, the term enterprise refers to large businesses and government organizations.
As a network continues to grow, the expense and difficulty of maintaining increasing numbers of dedicated physical server computers becomes an issue. This challenge, along with the ubiquitous deployment of high-speed internet connections, led in the 1990s to the development of the cloud architecture.
In a cloud architecture, massive racks of servers run in large data centers, and customers contract with cloud providers to access services provided across the internet by virtual servers hosted on that hardware. While this architecture may appear to users as if it were a traditional enterprise architecture, users do not know, or need to know, exactly which physical server is providing a service at any given time. Server administrators can increase or decrease server capacity as needs dictate, allowing for more flexibility and more economical use of resources.
The flexibility and low cost of this architecture for users and companies have made cloud architecture increasingly dominant for both simple private consumer needs and large enterprise needs. The Google apps (Gmail, Google Sheets, Google Docs) and Office 365 (from Microsoft) are examples of cloud-based consumer applications.
Companies needing to build enterprise networks commonly contract with cloud service providers who handle the construction and maintenance of the hardware and networks. Cloud services are available from a number of providers, but as of this writing three major companies dominate cloud computing: Amazon (Amazon Web Services, AWS), Microsoft (Azure), and Google.
Database Management Systems
Outside of single-user projects, data needs to be shared by groups of people. These groups of people can have a variety of different data needs, use different devices and applications to access the data, and need access from a variety of different geographic locations.
A database management system is "a specialized computer program for organizing and manipulating data" (Bolstad 2019, 331). Following the client-server model, the database management system for a project or organization is located on centralized server(s), and users access the data on the server through a computer network, which can include the internet.
The use of centralized database management systems provides numerous advantages over keeping data on individual machines:
- Centralized control: The data can be kept in an organized structure, and there are always authoritative versions of the data that users can trust to be the most current or accurate versions.
- Multiple views: Different users may need to use the data in different ways with different programs, and keeping the data centralized in a general, standardized structure makes support of these different programs easier and less error-prone.
- Security: Administrators can limit access and/or changes to specific parts of the data to users with appropriate authorization.
- Backup / restore: The data can be efficiently backed up (preferably to a remote location), which allows at least partial restoration and recovery in the case of a hardware failure, major software failure, or malicious hack.
- Concurrency control: When multiple users need to update the data at the same time, those requests can be handled in a way that resolves conflicts and prevents the data from becoming corrupted or inconsistent.
- Data independence: A database can usually be designed in a way that reflects the phenomena being represented rather than how applications are designed to use that data. This makes it possible to conveniently design and upgrade applications without having to completely redo your database design.
- Speed: Database management systems use a variety of techniques that have evolved over the decades to provide extremely fast access to large volumes of data.
- Efficiency: Established design techniques and software optimizations permit the most efficient use of available storage space.
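One way a DBMS keeps data from becoming inconsistent is the transaction, where a group of changes either all succeed or all fail. The sketch below uses Python's built-in `sqlite3` module as a small stand-in for a server DBMS. The `Parcels` table and its values are hypothetical. Because the second change fails, the first is rolled back as well.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute("CREATE TABLE Parcels (ParcelID INTEGER PRIMARY KEY, Owner TEXT NOT NULL)")
conn.execute("INSERT INTO Parcels VALUES (1, 'City'), (2, 'County')")
conn.commit()

try:
    # "with conn" commits if the block succeeds and rolls back if it raises.
    with conn:
        conn.execute("UPDATE Parcels SET Owner = 'State' WHERE ParcelID = 1")
        # This insert reuses primary key 2, so the DBMS rejects it.
        conn.execute("INSERT INTO Parcels VALUES (2, 'Duplicate')")
except sqlite3.IntegrityError:
    pass  # the whole transaction, including the UPDATE, has been undone

owners = [row[0] for row in conn.execute("SELECT Owner FROM Parcels ORDER BY ParcelID")]
print(owners)
```

The table ends up exactly as it was before the failed transaction, so no user ever sees a half-applied change.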
Most contemporary geospatial databases are extensions of relational databases, which are general-purpose databases first proposed by computer scientist E. F. Codd (1970).
Relational databases are composed of sets of tables. Tables are like spreadsheets in that they are arrays of data arranged in rows and columns.
With geospatial data, the rows are records that represent individual features, and the columns represent attributes for each feature. The attribute columns are referred to as fields.
Unlike cells in a spreadsheet, each field must have a specific data type indicating what kind of values it can have (text, integers, real numbers, etc.).
Fields also sometimes have domains, which indicate the range of acceptable values. Setting domains on fields can be useful to prevent accidental insertion of invalid values.
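Field types and domains can be sketched with Python's built-in `sqlite3` module. The table layout here is an assumption: the `Latitude` and `Longitude` columns stand in for a true geometry type, which a spatial database would provide. SQLite also enforces declared types more loosely than most server DBMS, so in this sketch the CHECK constraints are what act as the domain.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute("""
    CREATE TABLE Buildings (
        BIN       INTEGER PRIMARY KEY,
        Number    INTEGER NOT NULL,
        Street    TEXT    NOT NULL,
        -- Domains: only values in the valid coordinate ranges are accepted.
        Latitude  REAL    NOT NULL CHECK (Latitude  BETWEEN -90  AND 90),
        Longitude REAL    NOT NULL CHECK (Longitude BETWEEN -180 AND 180)
    )
""")

# A valid record is accepted.
conn.execute("INSERT INTO Buildings VALUES (1001, 701, 'South Gregory', 40.1064, -88.2217)")

# A mistyped latitude (400.1064) falls outside the domain and is refused.
try:
    conn.execute("INSERT INTO Buildings VALUES (1002, 705, 'South Gregory', 400.1064, -88.2217)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM Buildings").fetchone()[0]
print(rejected, row_count)
```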
As an example, the following is a table that represents building locations.
A database table has one or more columns referred to as keys. Primary key values in each row uniquely identify that particular row. In the building table example above, the BIN (building identification number) column is the primary key.
Primary keys, as well as columns of spatial data, can be connected to indexes, which are structures designed to dramatically speed up searches for specific values in the indexed field.
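As a rough illustration of indexing, the sketch below uses Python's built-in `sqlite3` module to add an index on a street-name column and then asks the database how it would run a search. The index name and every street other than South Gregory are hypothetical. Spatial indexes work on the same principle but use structures suited to geometry, which plain SQLite does not include.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute("CREATE TABLE Buildings (BIN INTEGER PRIMARY KEY, Number INTEGER, Street TEXT)")
conn.executemany(
    "INSERT INTO Buildings VALUES (?, ?, ?)",
    [
        (1001, 701, "South Gregory"),
        (1002, 1301, "West Green"),
        (1003, 605, "East Springfield"),
    ],
)

# Build an index so searches on Street do not have to scan every row.
conn.execute("CREATE INDEX idx_buildings_street ON Buildings (Street)")

# EXPLAIN QUERY PLAN reports the strategy the DBMS would use for a query.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM Buildings WHERE Street = ?",
    ("South Gregory",),
).fetchall()
plan_text = " ".join(row[-1] for row in plan)  # last column holds the description
print(plan_text)
```

The reported plan names `idx_buildings_street`, showing that the search goes through the index rather than reading the whole table.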
The relational part of the name relational database means that the tables in the database are related to each other. Keys allow rows in one table to be associated with rows in another table.
For example, below is a table of restaurants where the BIN field is a foreign key that identifies the building for each restaurant.
Separating information into multiple tables connected by keys reduces duplicated information and wasted space. It also makes changes simpler, since information only needs to be changed in one record rather than across multiple records. The mathematically ideal structure for a database is called a normal form. The process of structuring a relational database in this way is called normalization.
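A minimal sketch of two related tables, using Python's built-in `sqlite3` module. The restaurant names and the `RestaurantID` column are hypothetical. Each restaurant row stores only the BIN of its building, and a JOIN reassembles the combined information when it is needed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when asked
conn.execute("CREATE TABLE Buildings (BIN INTEGER PRIMARY KEY, Number INTEGER, Street TEXT)")
conn.execute("""
    CREATE TABLE Restaurants (
        RestaurantID INTEGER PRIMARY KEY,
        Name         TEXT    NOT NULL,
        BIN          INTEGER NOT NULL REFERENCES Buildings (BIN)  -- foreign key
    )
""")

# The address is stored once, in Buildings.
conn.execute("INSERT INTO Buildings VALUES (1001, 701, 'South Gregory')")
conn.executemany(
    "INSERT INTO Restaurants VALUES (?, ?, ?)",
    [(1, "Example Cafe", 1001), (2, "Sample Diner", 1001)],
)

# Match each restaurant to its building through the shared BIN value.
rows = conn.execute("""
    SELECT r.Name, b.Number, b.Street
    FROM Restaurants AS r
    JOIN Buildings AS b ON b.BIN = r.BIN
    ORDER BY r.RestaurantID
""").fetchall()
print(rows)

# A restaurant pointing at a building that does not exist is refused.
try:
    conn.execute("INSERT INTO Restaurants VALUES (3, 'Orphan Grill', 9999)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
print(orphan_rejected)
```

If the building's address changes, only the single row in `Buildings` needs to be updated, and both restaurants pick up the change.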
Interaction with a relational database is commonly performed with a language called structured query language (SQL). Even when geospatial data is being handled transparently by a consumer app, behind the scenes, the app may be using SQL to extract, add, or modify information in the database.
SQL is a complex and powerful language, and there are variations on that language used by different DBMS. However, giving examples of a few common commands will give you some sense of what can be done with SQL.
The primary command for extracting information from a table is SELECT. For example, to show all rows in the Segments table where the street is South Gregory:
SELECT * FROM Segments WHERE Street = 'South Gregory';

+------+--------+---------------+--------------------------+
| BIN  | Number | Street        | Geometry                 |
+------+--------+---------------+--------------------------+
| 1001 | 701    | South Gregory | POINT(40.1064, -88.2217) |
+------+--------+---------------+--------------------------+
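The other common commands follow the same pattern: INSERT adds rows, UPDATE changes them, and DELETE removes them. The sketch below runs each one through Python's built-in `sqlite3` module. The second record is hypothetical, and the geometry is stored as plain text copied from the listing above, since SQLite has no geometry type without a spatial extension.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute(
    "CREATE TABLE Segments (BIN INTEGER PRIMARY KEY, Number INTEGER, Street TEXT, Geometry TEXT)"
)

# INSERT adds new rows. The ? placeholders keep the values separate from the SQL text.
conn.execute(
    "INSERT INTO Segments VALUES (?, ?, ?, ?)",
    (1001, 701, "South Gregory", "POINT(40.1064, -88.2217)"),
)
conn.execute(
    "INSERT INTO Segments VALUES (?, ?, ?, ?)",
    (1002, 100, "Hypothetical Avenue", None),
)

# UPDATE changes values in the rows that match the WHERE clause.
conn.execute("UPDATE Segments SET Number = ? WHERE BIN = ?", (703, 1001))

# DELETE removes the rows that match the WHERE clause.
conn.execute("DELETE FROM Segments WHERE BIN = ?", (1002,))
conn.commit()

remaining = conn.execute("SELECT BIN, Number, Street FROM Segments").fetchall()
print(remaining)
```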
Software Business Models
Expenses are associated with development, operation, and maintenance costs for software and services. Accordingly, all types of geographic information systems have associated business models that define how income is generated to pay for the costs associated with the software and services, and in the case of private companies, how those companies will make a profit off the software and services they provide. Different architectures are conducive to different business models.
There are four general business models that are common in GIS:
- Software-Based Licensing
  - Users pay an up-front cost to license the software.
  - Users pay additional charges for upgrades and/or support.
  - This is the classic model for proprietary desktop and enterprise network architectures.
- Subscription-Based Licensing
  - Users pay a regular fee to access proprietary software and services.
  - This creates a steady revenue stream for providers.
  - This model is commonly used with cloud-based services.
- Advertising-Based
  - There are no direct costs to users for ordinary use of services.
  - Advertisers are the customers who pay for access to user eyeballs and data.
  - Users are the product: they receive services in exchange for monitoring of their activity.
  - This model is used with cloud-based services.
- Open Source
  - Software and data can be freely downloaded and customized.
  - The software is supported by a community of companies and individuals rather than a single company.
  - End users are on their own for support and hosting of services.
Proprietary Software and Services
The term proprietary means "something that is used, produced, or marketed under exclusive legal right of the inventor or maker" (Merriam-Webster 2021). Proprietary software is completely controlled by a single company, and the details of how that software is built (the source code) and how that software operates and shares data are often not shared with users outside the company.
With proprietary GIS software, this monopoly power is the basis of the company's business model that enables it to pay for the continued development of the software and services while making a profit for its shareholders.
The dominant company in enterprise GIS is ESRI, which was founded in 1969 by Jack and Laura Dangermond as a land-use consulting firm.
Although the company came to prominence with stand-alone software running on minicomputers and desktops, over the past decade the company has increasingly moved to cloud-based architectures. ESRI is probably best known to academics for desktop software like ArcMap and ArcGIS Pro, and for the ArcGIS Online cloud environment.
ESRI also offers server-based architectures built around their ArcGIS Enterprise software that can be configured in a wide variety of ways to suit business requirements.
While Google is, perhaps, best known as a web search company, their geospatial apps, services, and APIs are integral to the geospatial web. The integration of technology from acquired companies into Google Maps in 2005 revolutionized web mapping.
CARTO is a cloud computing platform that provides GIS, web mapping, and spatial data science tools. The company markets itself as a "Location Intelligence" platform that is readily usable for data analysis and visualization without prior GIS experience.
While the web app software is open source, the online service and data sets are available as a subscription service.
AutoCAD is popular desktop engineering design software that includes toolsets that can be used to integrate and visualize geospatial data. Design data is commonly exchanged between GIS and CAD, although the process often requires some manual tweaking, and experience moving data between CAD and GIS is a useful job skill for work in government and with consulting firms.
Bentley MicroStation is another commonly used engineering CAD software package that incorporates mapping capability.
MapInfo is a desktop mapping application first introduced in 1986 that still has a user base, and you may encounter MapInfo files when working with organizations that still use it.
Open collaboration is "any system of innovation or production that relies on goal-oriented yet loosely coordinated participants who interact to create a product (or service) of economic value, which they make available to contributors and noncontributors alike" (Jemielniak and Przegalinska 2020).
The open model exists in contrast to the proprietary model, under the belief that a community is stronger when its members stand on each other's shoulders rather than on each other's toes. The open model emerged as an offshoot of the free software movement in the 1990s. Rather than one company having to bear the total burden of development costs, multiple individuals and organizations make smaller contributions that over time add up to robust software.
The open model is manifest in GIS in three ways: open-source software, open standards, and open data.
Open Source Software
Open source software is software where the programming source code can be accessed and modified, although development expertise is needed to actually make such modifications. More importantly to most users, open source software is also usually freely downloadable.
While open source GIS software projects are developed and supported by a variety of community groups, the Open Source Geospatial Foundation (OSGeo) is a not-for-profit organization whose mission is to foster global adoption of open geospatial technology by being an inclusive software foundation devoted to an open philosophy and participatory, community-driven development. While not having any direct control over open projects, OSGeo promotes selected projects and, in some cases, serves as a conduit for development funding.
Some notable open-source GIS projects:
- QGIS is open-source desktop GIS software with a user interface similar to ArcMap.
- MapServer and GeoServer are open source platforms for publishing spatial data and interactive mapping applications to the web.
- PostGIS is a set of extensions to the open-source PostgreSQL relational database that permits storage and manipulation of geospatial data.
- The open source R statistical computing package and the Python general-purpose computing language both have libraries for manipulating geospatial data and performing geospatial data science.
For example, the diagram below compares simplified open and proprietary architectures for publishing web maps.
On the left, the open stack uses QGIS to import and process data stored in a PostGIS database. MapServer is used to render the geospatial data into services that can then be accessed by web or mobile clients over the internet.
On the right, the proprietary stack from ESRI is fully integrated. ArcGIS Pro is used to import and process data stored in a SQL Server database. The ArcGIS Enterprise software is at the center of all operations, including rendering the geospatial data into services that can then be accessed by web or mobile clients over the internet.
A major challenge with geospatial data is that it is stored and disseminated in a variety of different and, often, proprietary formats. This creates a situation where GIS professionals often have to spend significant amounts of time reformatting or recreating data so it can be used in their GIS.
An approach to mitigating this wasted effort is the promotion of open standards that "ensure interoperability, enhance collaboration, and create a diverse, interoperable, decentralized software and data ecosystem that benefits all participants." This makes it possible to create a "data ecosystem where diverse data sources and software can easily be combined in novel ways to create value and provide a platform for innovation" (Alameh 2020).
Even proprietary software companies like ESRI or Microsoft develop their software to utilize open standards to enable interoperability with software from other vendors (giving the proprietary software a wider potential audience), and to eliminate the burden of maintaining proprietary protocols and formats.
The Open Geospatial Consortium (OGC) is a professional community that creates free, publicly available geospatial standards. Open standards can be used by both open and proprietary systems.
Some notable standards include:
- SRID – Spatial Reference Identifier: an identifier for spatial coordinate systems
- WFS – Web Feature Service: for retrieving or altering feature descriptions
- WMS – Web Map Service: provides map images
- WMTS – Web Map Tile Service: provides map image tiles
- KML – Keyhole Markup Language: XML-based language schema for expressing geographic annotation and visualization on existing (or future) Web-based, two-dimensional maps and three-dimensional Earth browsers
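To give a sense of how such a standard is used in practice, the sketch below builds a WMS GetMap request URL with Python's standard library. The parameter names follow WMS version 1.3.0, but the server address `https://example.org/wms`, the layer name `buildings`, and the helper `build_getmap_url` are hypothetical. Treat it as an illustration of the request pattern rather than a complete client.

```python
from urllib.parse import urlencode


def build_getmap_url(base_url, layer, bbox, width, height):
    """Return a WMS 1.3.0 GetMap URL asking for a PNG image of one layer."""
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.3.0",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",  # empty value requests the server's default style
        "CRS": "EPSG:4326",
        # In WMS 1.3.0 with EPSG:4326 the order is min lat, min lon, max lat, max lon.
        "BBOX": ",".join(str(value) for value in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": "image/png",
    }
    return base_url + "?" + urlencode(params)


url = build_getmap_url(
    "https://example.org/wms",            # hypothetical server
    "buildings",                          # hypothetical layer name
    (40.09, -88.24, 40.12, -88.21),       # area around the example building
    512,
    512,
)
print(url)
```

Because the request format is standardized, the same URL pattern works against any compliant server, whether it is built on open or proprietary software.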
The open model also applies to data. Many governmental organizations around the world make their data freely available to the general public through open data portals, although this data is collected and maintained by government employees.
One effort that represents collaborative creation is OpenStreetMap (OSM), which is a collection of geospatial data built by a community of mappers that contribute and maintain data about roads, trails, cafés, railway stations, and much more, all over the world. While the site is often thought of as the OSM web map that is similar to Google Maps, the primary focus of the OSM project is the collecting and disseminating of the data itself.
The project was initially started by Steve Coast in 2004 following the lead of Wikipedia as an open-source encyclopedia. As of 21 December 2020, OSM had around seven million users, and the OSM database contained around 8.4 billion nodes (lat-long points) and around 726 million ways (line or polygon features).
The community emphasizes:
- Community Driven: OpenStreetMap's contributors include enthusiast mappers, GIS professionals, engineers running the OSM servers, humanitarians mapping disaster-affected areas, etc. The community associated with OSM includes for-profit companies that both use and contribute to OSM, such as Craigslist, MapQuest, JMP (statistical software), Foursquare, Mapbox, and many more.
- Local Knowledge: Contributors use aerial imagery, GPS devices, and low-tech field maps to verify that OSM is accurate and up to date. Data is also initially sourced from government entities like the US Census Bureau, and is often based on aerial imagery that for-profit companies like Yahoo and Microsoft (Bing) have permitted to be used for reference.
- Open Data: OpenStreetMap is open data that you are free to use for any purpose as long as you credit OpenStreetMap and its contributors. To protect both the data and the project, OpenStreetMap is licensed under the Open Data Commons Open Database License, and if you alter or build upon the data in certain ways, you may distribute the result only under the same license.
- Foundation Governance: The OSM website and related services are formally operated by the OpenStreetMap Foundation (OSMF) on behalf of the community. Hosting is supported by the UCL VR Centre, Imperial College London and Bytemark Hosting, and other partners.
Considerations With the Open Model
There are a number of issues that should be considered when choosing what kind of software business model will be most appropriate for your situation.
- Total cost of ownership (TCO): Proprietary software licenses can be expensive, but the total cost over time needed to customize and support "free" open source software may be higher.
- Vendor lock-in: Changing software and business processes can be disruptive and expensive. However, choosing a proprietary solution may lock you into a continued relationship with a vendor that may limit your options and increase your costs in the future.
- License management: Proprietary systems frequently have complex licensing arrangements and software configurations that can be burdensome when working in a dynamic situation that involves changing numbers of users and shifting software needs.
- Customizability: Open-source solutions facilitate customization, although significant knowledge is needed to create such customization, and hiring employees and/or contractors with those skills can be a significant additional expense if customization is not a fundamental part of the usage of the software.
- Integration: Since proprietary software is controlled by a single company, they can design the different components of their software to work together seamlessly. Depending on the application, additional effort and expense can be needed to make disparate open source software components work together.
- Security: Open-source software is often (but not always) considered more secure than closed, proprietary software since there are more eyes looking at the software. However, proprietary software companies can be more responsive to security issues since their company reputation is at stake.
- Compliance: Large proprietary software companies have lawyers that can certify their software for legal compliance with the Health Insurance Portability and Accountability Act of 1996 (HIPAA), the Family Educational Rights and Privacy Act (FERPA), and Section 508 accessibility (VPATs).