CyberGISX

This tutorial demonstrates how to create a basic Jupyter notebook that uses spatial data on the CyberGISX platform, which is provided by the CyberGIS Center for Advanced Digital and Spatial Studies and the CyberInfrastructure and Geospatial Information Laboratory (CIGI) at the University of Illinois.

Background

CyberGIS is a conceptual framework that merges cyberinfrastructure and geographic information systems to facilitate computationally intensive and collaborative spatial analysis and modeling (Wang 2010).

The term cyberinfrastructure emerged in the 1990s with, perhaps, the earliest appearance in a comment by Jeffrey Hunker (then director of the Critical Infrastructure Assurance Office) at a 1998 press conference on Presidential Decision Directive NSC 63: Critical Infrastructure Protection (The White House 1998):

One of the key conclusions of the President's commission that laid the intellectual framework for the President's announcement today was that while we certainly have a history of some real attacks, some very serious, to our cyber-infrastructure, the real threat lay in the future. And we can't say whether that's tomorrow or years hence. But we've been very successful as a country and as an economy in wiring together our critical infrastructures. This is a development that's taken place really over the last 10 or 15 years -- the Internet, most obviously, but electric power, transportation systems, our banking and financial systems.

The term was more clearly defined in a 2003 NSF report (Atkins et al. 2003):

The term infrastructure has been used since the 1920s to refer collectively to the roads, power grids, telephone systems, bridges, rail lines, and similar public works that are required for an industrial economy to function. Although good infrastructure is often taken for granted and noticed only when it stops functioning, it is among the most complex and expensive thing that society creates.

The newer term cyberinfrastructure refers to infrastructure based upon distributed computer, information and communication technology. If infrastructure is required for an industrial economy, then we could say that cyberinfrastructure is required for a knowledge economy.

For the purposes of CyberGIS in an academic institution, a more specific definition of cyberinfrastructure was developed in 2009 by the EDUCAUSE Campus Cyberinfrastructure Working Group and Coalition for Academic Scientific Computation (EDUCAUSE 2009, Stewart et al. 2010):

Cyberinfrastructure consists of computational systems, data and information management, advanced instruments, visualization environments, and people, all linked together by software and advanced networks to improve scholarly productivity and enable knowledge breakthroughs and discoveries not otherwise possible.

The CyberGIS Center for Advanced Digital and Spatial Studies "was established in 2013 as a cross- and trans-disciplinary center engaging a number of units at the University of Illinois at Urbana-Champaign and diverse partners in the US and world" (CyberGIS Center 2021).

In 2014, the CyberGIS Center received a National Science Foundation major research instrumentation grant to establish the ROGER (Resourcing Open Geospatial Education and Research) cyberGIS supercomputer (Wikipedia 2021). The physical ROGER was later supplanted by the cloud-based Virtual ROGER, which is integrated with the Keeling compute cluster operated by the U of I School of Earth, Society, and Environment (SESE) (CyberGIS Center 2018).

CyberGISX is an integrated development and sharing platform running on Virtual ROGER that provides support for geospatial software and applications.

The ROGER CyberGIS supercomputer (Wikipedia 2016)

Creating a New Session

UIUC students and faculty can log in to the CyberGISX Hub using their U of I NetID and password. Users from more than 4,000 additional universities, research institutes, and academic organizations in the US and worldwide can register and start using CyberGISX with their institutional credentials.

Logging in to CyberGISX

Create the Notebook

CyberGISX uses Jupyter notebooks as the programming interface.

A notebook is an interactive interface that allows you to integrate programming code with documentation, analysis, and visualizations.

To create a new Jupyter notebook in CyberGISX:

  1. On the Home Page, select New Notebook, Python 3.
  2. Right-click the new Untitled.ipynb notebook file and Rename it to something meaningful (Minn 2023 State Energy).
Creating a new notebook in CyberGISX

Markdown Cells

Jupyter notebooks are composed of cells, which are individual sections of the notebook that can contain programming code (Python) or text (markdown).

To add a heading to your notebook, enter markdown text like the following in a markdown cell:

# State Energy Profiles

Michael Minn

28 August 2023
Creating a markdown cell
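Behind the interface, a notebook is saved as a JSON file (.ipynb) in which each cell records whether it holds markdown or code. The following is a minimal sketch of that structure using only the Python standard library. The cell contents are illustrative, and notebooks saved by Jupyter carry additional metadata.

```python
import json

# A minimal notebook structure (nbformat 4) with one markdown cell and one code cell
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": ["# State Energy Profiles\n", "\n", "Michael Minn\n"],
        },
        {
            "cell_type": "code",
            "metadata": {},
            "execution_count": None,
            "outputs": [],
            "source": ["import geopandas\n"],
        },
    ],
}

# Serialize to the text that would be stored in an .ipynb file, then read it back
notebook_text = json.dumps(notebook, indent=1)
cell_types = [cell["cell_type"] for cell in json.loads(notebook_text)["cells"]]
print(cell_types)
```

Because the file is plain JSON text, it can be downloaded, uploaded, and version-controlled like any other text file.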

Code Cells

New cells for code can be added by clicking the plus sign (+) on the toolbar.

The following code loads state energy profile data that will be used in this tutorial.

import geopandas

import matplotlib.pyplot as plt

states = geopandas.read_file("https://michaelminn.net/tutorials/data/2019-state-energy.geojson")

states.info()

RangeIndex: 53 entries, 0 to 52
Data columns (total 42 columns):
 #   Column                           Non-Null Count  Dtype   
---  ------                           --------------  -----   
 0   ST                               53 non-null     object  
 1   Name                             53 non-null     object  
 2   GEOID                            53 non-null     object  
 3   AFFGEOID                         53 non-null     object  
 4   Square.Miles.Land                53 non-null     float64 
 5   Square.Miles.Water               53 non-null     float64  
 6   State.Name                       51 non-null     object  
...
 36  CO2.Per.Capita.Tonnes            51 non-null     float64 
 37  Renewable.Standard.Type          51 non-null     object  
 38  Renewable.Standard.Name          51 non-null     object  
 39  Renewable.Standard.Year          38 non-null     float64 
 40  Senators.Party                   50 non-null     object  
 41  geometry                         53 non-null     geometry
dtypes: float64(33), geometry(1), object(8)
memory usage: 17.5+ KB
Creating a code cell
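GeoJSON, the format of the file loaded above, is JSON text in which each feature pairs a properties dictionary (the attribute columns) with a geometry. The sketch below parses a single hypothetical point feature with the standard json module rather than GeoPandas, only to show the structure that read_file() turns into rows and columns. The coordinates and attribute values are made up for illustration, and the tutorial file contains state polygons rather than points.

```python
import json

# A hypothetical one-feature GeoJSON document (values are illustrative only)
geojson_text = """
{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "properties": {"ST": "IL", "Name": "Illinois"},
      "geometry": {"type": "Point", "coordinates": [-89.2, 40.1]}
    }
  ]
}
"""

collection = json.loads(geojson_text)

# Each feature becomes one row: properties are columns, geometry is the geometry column
for feature in collection["features"]:
    longitude, latitude = feature["geometry"]["coordinates"]
    print(feature["properties"]["ST"], feature["geometry"]["type"], longitude, latitude)
```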

Mapping with GeoPandas

Mapping Categorical Attributes

Methods are functions that are associated with specific classes of objects. While functions can stand alone, methods, as the name implies, perform actions on or based on the contents of the classed objects that they are called with.

Methods are foundational to object-oriented programming, where the complexity of operations on objects is hidden from programmers so they can focus on the high-level objectives of the program rather than the low-level details of operations on complex objects.
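The difference between a function and a method can be sketched with a small class. The Circle class and the area_of_circle() function below are hypothetical names invented for this illustration and are not part of GeoPandas.

```python
import math


def area_of_circle(radius):
    """A standalone function: all data must be passed in as parameters."""
    return math.pi * radius ** 2


class Circle:
    """A class whose objects carry their own data."""

    def __init__(self, radius):
        self.radius = radius

    def area(self):
        """A method: it works on the object it is called with."""
        return math.pi * self.radius ** 2


circle = Circle(2.0)

print(area_of_circle(2.0))  # function call
print(circle.area())        # method call on an object
```

In the same way, plot() is a method called on the states object, so it already has access to the geometries and attributes stored in that object.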

states.plot("Renewable.Standard.Type", legend=True)

plt.show()
Default plot of the types of renewable energy standards in each state

Categorical Color Map

There are a variety of predefined colormaps that you can use to give a map more descriptive colors by passing the name of the colormap in the cmap parameter of plot().

states.plot("Renewable.Standard.Type", legend=True, cmap="coolwarm")

plt.show()
Map of renewable energy standards by state as of 2019 using an explicit predefined colormap
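The names accepted by the cmap parameter come from matplotlib. The sketch below lists the registered colormap names and checks a few candidates. The qualitative colormaps named here (Set2, tab10, Paired) are suggestions for categorical data rather than choices made in this tutorial.

```python
import matplotlib.pyplot as plt

# All colormap names registered with matplotlib
names = plt.colormaps()
print(len(names), "colormaps available")

# Qualitative colormaps have a small number of distinct colors,
# which may suit categorical attributes
candidates = ["Set2", "tab10", "Paired"]
available = [name for name in candidates if name in names]
print(available)

# The number of colors in a colormap is stored in its N attribute
print(plt.get_cmap("tab10").N, plt.get_cmap("coolwarm").N)
```

Any of these names can be passed to plot() in place of "coolwarm".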

Mapping Quantitative Attributes

The plot() method can also be used to map quantitative attributes, although, again, you may need additional parameters to get what you want.

The following creates a map of per-capita energy use in each state in millions of BTUs. The default plot uses the matplotlib default color ramp (viridis), which runs from purple through blue and green to yellow:

states.plot("Consumption.Per.Capita.MM.BTU", legend=True)

plt.show()
Default plot of energy use per state in millions of BTU in 2019

Quantitative Color Map

You can choose a colormap with the cmap parameter and use easier-to-read categorized colors by passing the scheme="naturalbreaks" parameter to plot().

states.plot("Consumption.Per.Capita.MM.BTU", legend=True, cmap="coolwarm_r", scheme="naturalbreaks")

plt.show()
Choropleth with categorized colors
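A classification scheme chooses a small number of break values and then assigns each feature to a class, so the map uses a handful of discrete colors instead of a continuous ramp. GeoPandas relies on the separate mapclassify package to choose the breaks when scheme is given. The sketch below illustrates only the assignment step using the standard library. It does not compute natural breaks, and the values and break points are hypothetical.

```python
from bisect import bisect_left


def classify(values, upper_bounds):
    """Return the class index of each value.

    Class i holds values that are less than or equal to upper_bounds[i]
    and greater than the previous bound.
    """
    return [bisect_left(upper_bounds, value) for value in values]


# Hypothetical per-capita energy values and class upper bounds
values = [150, 200, 250, 400]
upper_bounds = [200, 300, 400]

classes = classify(values, upper_bounds)
print(classes)
```

Each class index then maps to one color from the colormap, which is why the legend shows ranges rather than a continuous color bar.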

Selection

There will likely be situations where you only want to use a specific selection of features from a geospatial data set.

A GeoPandas GeoDataFrame is an extension of a Pandas DataFrame, so rows can be selected by attribute using the same techniques used in Pandas.

northeast = states[states.ST.isin(
    ['ME', 'VT', 'NH', 'CT', 'RI', 'NY', 'NJ', 'MA', 'PA'])]

northeast.plot("Consumption.Per.Capita.MM.BTU", legend=True, cmap="coolwarm", scheme="naturalbreaks")

plt.show()
Choropleth with selected data
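Selection works by building a column of True and False values (a boolean mask) and using it to keep only the matching rows. The sketch below uses a small plain Pandas DataFrame with hypothetical values as a stand-in for the states data, so it runs without downloading anything.

```python
import pandas as pd

# Hypothetical miniature table standing in for the states GeoDataFrame
table = pd.DataFrame({
    "ST": ["IL", "NY", "AK", "HI", "PA"],
    "Consumption.Per.Capita.MM.BTU": [300.0, 190.0, 830.0, 200.0, 310.0],
})

# isin() returns True for rows whose value is in the list
northeast = table[table["ST"].isin(["NY", "PA"])]

# The ~ operator inverts the mask to exclude rows instead
not_ak_hi = table[~table["ST"].isin(["AK", "HI"])]

# Comparisons also produce masks; column names containing dots need bracket syntax
high_use = table[table["Consumption.Per.Capita.MM.BTU"] > 250]

print(list(northeast["ST"]))
print(list(not_ak_hi["ST"]))
print(list(high_use["ST"]))
```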

Projections

The Earth exists in three-dimensions but, other than globes, most representations of the earth, such as printed maps or web maps, are two dimensional. A projection is a set of mathematical transformations used to represent the three-dimensional world in two dimensions.

By default, the plot() function plots the geospatial data using an equirectangular projection that may be undesirable depending on what part of the world you are mapping and what you are using the map for.
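To make the idea of a projection as a mathematical transformation concrete, the sketch below implements two simple projections on a sphere using only the standard library: equirectangular, which treats longitude and latitude directly as x and y, and spherical Mercator. The function names are hypothetical, and the formulas are the textbook spherical versions rather than the code used internally by GeoPandas.

```python
import math

EARTH_RADIUS_M = 6378137.0  # WGS 84 semi-major axis in meters


def equirectangular(longitude, latitude):
    """Scale degrees of longitude and latitude directly to meters."""
    x = EARTH_RADIUS_M * math.radians(longitude)
    y = EARTH_RADIUS_M * math.radians(latitude)
    return x, y


def spherical_mercator(longitude, latitude):
    """Spherical Mercator: same x as equirectangular, but y stretches toward the poles."""
    x = EARTH_RADIUS_M * math.radians(longitude)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(latitude) / 2))
    return x, y


# The same location gets different y coordinates under the two projections
print(equirectangular(-96.0, 60.0))
print(spherical_mercator(-96.0, 60.0))
```

The two projections agree at the equator and diverge at higher latitudes, which is one reason the choice of projection matters more for some parts of the world than others.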

The to_crs() method can be used to reproject a geospatial object to a new projection. The parameter accepts an EPSG code, an ESRI WKID, or a proj-string that describes the desired projection.

For this map of the US, we use an ESRI WKID for a Lambert conformal conic projection centered on the continental US.

continental = states[~states.ST.isin(['AK', 'HI'])]

continental = continental.to_crs("ESRI:102009")

continental.plot("Consumption.Per.Capita.MM.BTU", legend=True, cmap="coolwarm", scheme="naturalbreaks")

plt.show()
Choropleth using a Lambert conformal conic projection
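The different ways of describing a projection to to_crs() can be sketched with a single point instead of the full states data. The point location and the proj-string parameters below are hypothetical values chosen for illustration and are not the definition of ESRI:102009.

```python
import geopandas
from shapely.geometry import Point

# One hypothetical point in longitude/latitude (EPSG:4326 is WGS 84)
points = geopandas.GeoDataFrame(
    {"name": ["center"]},
    geometry=[Point(-96.0, 39.0)],
    crs="EPSG:4326",
)

# Reproject using an authority code
web_mercator = points.to_crs("EPSG:3857")

# Reproject using a proj-string for a Lambert conformal conic projection
# whose origin is placed at the point itself (illustrative parameters)
lcc_string = "+proj=lcc +lat_1=33 +lat_2=45 +lat_0=39 +lon_0=-96 +datum=WGS84 +units=m"
lcc = points.to_crs(lcc_string)

print(web_mercator.crs.to_epsg())
print(web_mercator.geometry.iloc[0])
print(lcc.geometry.iloc[0])
```

Because the proj-string places the projection origin at the point, the reprojected coordinates are approximately zero, while the Web Mercator coordinates are in the millions of meters.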

Correlation

The pandas package upon which geopandas is built is used for data manipulation and analysis. While there are specialized analysis functions that take advantage of the unique characteristics of spatial data, simple non-spatial functions can be used for data exploration.

For example, there is a field for gross domestic product (GDP.B), which represents the total amount of economic activity in the state. Greater economic activity is generally associated with higher energy use. To examine whether that is true at the state level, we can plot an X/Y scatter chart between the two attributes to see if the plot shows a correlation.

plt.scatter(states["GDP.B"], states["Consumption.Total.B.BTU"])
plt.ylabel("2019 Energy Consumption (Billion BTU)")
plt.xlabel("2019 GDP ($B)")
plt.yscale("log")
plt.xscale("log")
plt.show()
X/Y scatter chart comparing GDP with total energy use by state
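A scatter chart shows correlation visually, and the Pearson correlation coefficient expresses it as a single number from -1 to 1. The sketch below computes the coefficient from its definition using only the standard library. The pearson() function name and the sample values are hypothetical and are not taken from the states data. With the tutorial data, the Pandas Series corr() method offers a comparable calculation.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


# Hypothetical GDP and energy values spanning several orders of magnitude
gdp = [50.0, 200.0, 800.0, 3000.0]
energy = [4.0e5, 1.2e6, 3.5e6, 9.0e6]

# Taking logarithms first mirrors the log-scaled axes of the scatter chart
r_log = pearson([math.log10(v) for v in gdp], [math.log10(v) for v in energy])
print(round(r_log, 3))
```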

Linear Model

We can use the OLS.fit() method from the statsmodels module to create a simple bivariate linear model for the relationship between GDP and energy consumption.

import statsmodels.api as sm

y = states["Consumption.Total.B.BTU"]

x = states[["GDP.B"]]

model = sm.OLS(y, x, missing="drop").fit()

model.summary()

R² ranges from zero (the model explains none of the variation) to one (a perfect fit). The adjusted R² value of 0.779 indicates a strong relationship between GDP and energy consumption, as expected.

Linear model of total energy use by state as a function of GDP
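To show what R² measures, the sketch below fits a line by least squares and computes R² and adjusted R² by hand using only the standard library. The function names and sample values are hypothetical. Unlike the model above, this sketch includes an intercept: statsmodels does not add one unless a constant column is supplied (for example with sm.add_constant()), and it reports an uncentered R² for models without a constant, so the numbers are not directly comparable.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(ys, predictions):
    """Share of the variation in ys that the predictions account for."""
    mean_y = sum(ys) / len(ys)
    ss_residual = sum((y - p) ** 2 for y, p in zip(ys, predictions))
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_residual / ss_total


def adjusted_r_squared(r2, n, predictors):
    """Penalize R² for the number of predictors in a model with an intercept."""
    return 1 - (1 - r2) * (n - 1) / (n - predictors - 1)


# Hypothetical data that is close to, but not exactly on, a straight line
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(xs, ys)
predictions = [slope * x + intercept for x in xs]
r2 = r_squared(ys, predictions)
r2_adjusted = adjusted_r_squared(r2, len(xs), 1)
print(round(slope, 3), round(intercept, 3), round(r2, 4), round(r2_adjusted, 4))
```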

Regression Line

Finally we can plot model predictions as a regression line across the scatter chart.

plt.scatter(states["GDP.B"], states["Consumption.Total.B.BTU"])

y_model = model.predict(x)

plt.plot(x, y_model, color="maroon")

plt.text(50, 7e6, "R^2 = " + str(round(model.rsquared_adj, 3)))
plt.ylabel("2019 Energy Consumption (Billion BTU)")
plt.xlabel("2019 GDP ($B)")
plt.yscale("log")
plt.xscale("log")

plt.show()
X/Y scatter chart comparing GDP with total energy use

Finishing Up

Render

Rendering a notebook is the process of transforming a notebook and the computed results into a format that can be read outside of the Jupyter interface.

Jupyter notebooks are commonly rendered into HTML for viewing in web browsers, or PDF files for printing.

Rendering a notebook to HTML

Log Out

When you are finished with your notebook, log out.

Ending a session

Reopening a Notebook

To reopen a notebook, find it in the directory list on the left side of the CyberGISX screen and double-click it.

Reopening a notebook

Sharing with OneDrive

CyberGISX kernels are local to the CyberGISX environment, so if you want to share your notebook or associated data over the Internet, you need to put it on a server.

Sharing a Notebook on OneDrive

  1. Right click on the notebook and Download the notebook to your local storage.
  2. Upload the file to OneDrive.
  3. Share with Anyone with the link.
Sharing a notebook with OneDrive

Sharing HTML on OneDrive

  1. File, Save and Export Notebook As..., HTML
  2. Upload the file to OneDrive
Sharing HTML file with OneDrive

Sharing with GitHub

GitHub is an internet hosting service for software developers that uses the Git version control software (Wikipedia 2023).

GitHub provides a wide variety of sophisticated features for collaborating on complex software projects, but some easily accessible features can be useful to users with limited development experience. GitHub is integrated into CyberGISX for sharing notebooks and data.

New Account

You can create a new GitHub account by clicking the Sign in link on the GitHub.com home page and then clicking the Create an account link.

Creating a new GitHub account

New Repository

Collections of related project files are kept together in GitHub repositories.

  1. To create a new repository, navigate to the Repositories page and select New.
  2. Give the repository a meaningful name (cybergisx).
  3. Click Create repository.
Creating a new GitHub repository

Sharing Notebooks

  1. Click the Restart the kernel and rerun the whole notebook button to make sure the notebook runs from the top and contains graphics.
  2. Save the notebook to update the file.
  3. Right click on the notebook and Download.
  4. Upload the file to GitHub.
  5. Commit the changes to the repository.
  6. Click the file to see the file.
  7. Copy permalink to get a link you can share with others.
Sharing a notebook via GitHub

Updating Files

To upload a new version of a file:

  1. Re-run, save, and download your notebook.
  2. Add file and Upload files using a file with the same name.
  3. Commit the changed file.
  4. Git tracks changes to different versions of the same file, and you can see prior versions of a file in its History page.
Updating a file in GitHub

Sharing Data Files

If you have a data file that you wish to share so people who have your notebook can access that data through the internet, you can upload it to GitHub and get a shared link to incorporate in your notebook.

Sharing a data file on GitHub
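When code reads a file from GitHub, it needs a link to the raw file contents rather than to the GitHub web page that displays the file. For public repositories, raw links follow a regular pattern, which the sketch below builds from its parts. The github_raw_url() function name, the account name, and the branch name are hypothetical placeholders; substitute your own values.

```python
def github_raw_url(owner, repository, branch, path):
    """Build the raw-content URL for a file in a public GitHub repository."""
    return f"https://raw.githubusercontent.com/{owner}/{repository}/{branch}/{path}"


# Hypothetical account and branch; "cybergisx" is the example repository name used above
url = github_raw_url("example-user", "cybergisx", "main", "2019-state-energy.geojson")
print(url)

# In a notebook, the link could then be passed to GeoPandas:
# states = geopandas.read_file(url)
```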