Introduction to Python
Rev. 5 September 2023
This tutorial gives a brief introduction to the syntax of the Python programming language.
About Python
Python is a cross-platform, open-source, general-purpose programming language developed by Dutch programmer Guido van Rossum and first released in 1991.
Van Rossum was solely responsible for the project until he ceded responsibility for governance to a five-member steering committee in 2019.
Van Rossum named the language Python because he was reading scripts from the Monty Python's Flying Circus TV show while trying to come up with a name, and he felt that "Python" would be appropriately "short, unique, and slightly mysterious" (Python Software Foundation 2022).
Of the myriad uses for Python, four areas are notable:
- Education: Python has a clear syntax and is feature rich, which makes it both accessible and powerful, which in turn makes it a popular language for teaching programming concepts.
- Analytics: Those same characteristics make Python a common language (along with R) for data analytics and data science. Python works well in notebooks with integrated text.
- Software plugins and extensions: Python is commonly used to customize and extend the capabilities of software using Python application programming interfaces (APIs). Examples include GIS software like ArcGIS Pro and QGIS, and the 3D graphics software Blender.
- Web development: Python can be used for server-side web programming with frameworks like Django.
Getting Python
Windows and macOS installers can be downloaded for free from Python.org. Linux users can install from the standard Debian and RPM repositories.
Note that unless you are working on old legacy software, you should always use one of the Python 3.x.x versions rather than Python 2.7.
Python Console
You can directly interact with Python from the Python console, where you can type in commands line-by-line and see results.
Scripts
A script is "a sequence of instructions or commands for a computer to execute" (Merriam-Webster 2022).
Scripts allow you to easily repeat complex sequences of operations. And if you find an error in your script, you can fix it and rerun the script without the labor of having to repeat long sequences of button clicks that you would need when using software with a graphical user interface like ArcGIS Pro.
Although there are programmer-focused integrated development environments (IDEs) like PyCharm available for download, many users can adequately edit and run Python scripts using the Integrated Development and Learning Environment (IDLE) editor that is included with the standard Python installation. The IDLE editor has a simple beginner-friendly user interface while providing syntax highlighting and context-sensitive help.
This video demonstrates creating and running a short script in IDLE.
Notebooks
A notebook is an interactive interface that allows you to integrate programming code with documentation, analysis, and visualizations.
- Jupyter notebooks were developed by Project Jupyter, which was spun off from the IPython interactive computing project in 2014.
- The name Jupyter is a portmanteau formed from the names of the three core languages supported by the project: Julia, Python, and R (Wikipedia 2021).
- Notebooks are an incarnation of the concept of literate programming, codified by Donald Knuth in 1984.
- Knuth proposed that since programming methodology had progressed to a point where programs should be considered "works of literature," this should result in a paradigm shift for programmers from "imagining that our main task is to instruct a computer what to do," to concentrating "rather on explaining to human beings what we want a computer to do."
Expressions
At its simplest, you can use Python as a calculator and it will display the value of mathematical expressions.
2 + 2
4
Python expressions are similar to traditional mathematical notation and use the same mathematical symbols or operators: + - * /. The double asterisk operator (**) is used for exponents.
Operation | Example | Output |
---|---|---|
Addition | 10 + 2 | 12 |
Subtraction | 10 - 2 | 8 |
Multiplication | 10 * 2 | 20 |
Division | 10 / 2 | 5.0 |
Exponents | 10**2 | 100 |
As with traditional mathematical notation, parentheses can be used to add clarity to expressions, or to override the normal precedence of operators.
3 + 2 * 4
11
(3 + 2) * 4
20
Objects
To make it possible to use the values of calculations in subsequent formulas, you can assign values to named objects.
The symbolic names used to refer to objects are called variables.
In Python, objects are areas in memory where data is stored, and variables are names that point to those areas in memory (Eubank 2022).
You can use variables in later expressions to save the effort of repeating calculations, or simply to make expressions easier to read.
To display the contents of an object at the console, you simply type in the variable.
x = 10 * 2
x
20
x + 15
35
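As a small illustration of variables being names that point to objects, the sketch below assigns one object to two names and then reassigns one of them. The variable names are arbitrary, and the is operator (not otherwise covered in this tutorial) is used here only to test whether two names refer to the same object.

```python
x = 10 * 2            # x points to an object holding the value 20
y = x                 # y now points to the same object as x
same_object = y is x  # True when both names refer to one object
print(same_object)

x = 5                 # x is re-pointed to a different object
print(x, y)           # y still points to the original object
```

Reassigning x does not change y, because assignment changes which object a name points to rather than changing the object itself.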
Variable Naming Styles
Variables must start with a letter or an underscore and are case sensitive.
hello = 12
Hello = 15
hello
12
Hello
15
You should always try to make your variables meaningful so that you and other people can understand what your objects are. Rather than calling an object containing a standard deviation "s" you might call it "stdev". The extra time spent typing now may save you confusion later.
Variables cannot contain spaces. However, there are techniques for representing multi-word variables that get around this issue:
- wordword (lower case)
- word_word (underscore)
- wordWord (camelback or camelCase)
- WordWord (CapWords)
Note that the Style Guide for Python Code (PEP 8) recommends lowercase words separated by underscores for variable and function names and reserves CapWords for class names, although this is far from a universally followed convention.
Strings
One of the most powerful features of Python is that objects can contain many different types of data.
Objects can contain text. Segments of text are called strings of characters. You assign text by enclosing your text in either double or single quotation marks.
x = "Hello"
x
'Hello'
Be aware that text strings and variable names are separate things.
Hello = "Goodbye"
Hello
'Goodbye'
The plus (+) operator can be used to concatenate (combine end to end) multiple strings.
country = "Iceland"
anthem = "Lofsöngur"
print('The national anthem of ' + country + ' is "' + anthem + '."')
The national anthem of Iceland is "Lofsöngur."
Lists
In statistical calculations, we commonly deal with multiple numbers at the same time. One of the most powerful features of Python is that it permits objects to contain multiple values at the same time. These collections of values are called lists.
Lists can be created by enclosing multiple numbers, strings, or variables in square brackets, and separating the values with commas:
x = [1, 3, 5, 7, 10]
x
[1, 3, 5, 7, 10]
You can perform operations on lists using mathematical operators.
The plus sign concatenates (combines) two lists.
x = ['Alpha', 'Beta', 'Gamma']
y = [1, 2, 3]
x + y
['Alpha', 'Beta', 'Gamma', 1, 2, 3]
The multiplication sign repeats the contents of a list by the given number of times.
y = [1, 2, 3]
y * 3
[1, 2, 3, 1, 2, 3, 1, 2, 3]
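Individual list elements are reached with a numbered index in square brackets, counting from zero. The following is a minimal sketch using a hypothetical list named letters; the built-in len() function, which reports the number of elements, is an addition not covered elsewhere in this tutorial.

```python
letters = ['Alpha', 'Beta', 'Gamma']

first = letters[0]    # indices start at 0, so this is 'Alpha'
last = letters[-1]    # negative indices count back from the end
count = len(letters)  # number of elements in the list

letters[1] = 'Delta'  # an index can also be used to replace a value
print(first, last, count)
print(letters)
```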
Dictionaries
Dictionaries in Python are collections of objects similar to lists, except rather than accessing elements with numbered indices, elements in a dictionary are accessed using values called keys.
Dictionaries can be constructed by specifying key:value pairs within curly braces { }.
Dictionary values can be accessed by specifying the key in square brackets [ ].
anthems = {
    'United States': 'The Star-Spangled Banner',
    'Canada': 'O Canada',
    'Mexico': 'Himno Nacional Mexicano',
    'Russia': 'Patrioticheskaya Pesnya'
}
anthems['Russia']
'Patrioticheskaya Pesnya'
You can add or change dictionary values using that same square bracket notation:
anthems['Ukraine'] = 'Derzhavnyy Himn Ukrayiny'
anthems['Ukraine']
'Derzhavnyy Himn Ukrayiny'
Dictionary entries can be used in expressions just like other variables.
anthems['Russia'] = 'Госудáрственный гимн Росси́йской Федерáции'
print('The national anthem of Russia is ' + anthems['Russia'] + '.')
The national anthem of Russia is Госудáрственный гимн Росси́йской Федерáции.
Variables can be used as keys.
country = "Ukraine"
print("The national anthem of " + country + " is " + anthems[country] + ".")
The national anthem of Ukraine is Derzhavnyy Himn Ukrayiny.
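Asking for a key that is not in the dictionary raises a KeyError, so it can be useful to check first. The sketch below uses a shortened, hypothetical copy of the anthems dictionary, the in operator (described later in the Iteration section), and the dictionary get() method, which returns a fallback value when the key is missing.

```python
anthems = {
    'Canada': 'O Canada',
    'Mexico': 'Himno Nacional Mexicano'
}

has_mexico = 'Mexico' in anthems    # True: the key exists
has_iceland = 'Iceland' in anthems  # False: the key is missing

# get() returns the second argument instead of raising KeyError
fallback = anthems.get('Iceland', 'unknown')
print(has_mexico, has_iceland, fallback)
```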
Functions
A Python function is "a series of statements which returns some value to a caller" (Python Software Foundation 2022).
You call a function with a function name, an open parenthesis, a set of zero or more parameters separated by commas, and then a closing parenthesis. The function then returns an object based on the parameters.
name(parameter1, parameter2, ...)
Python has dozens of built-in functions that are available with a default installation.
The basic descriptive statistical functions are similar to those available in Excel.
x = [2, 5, 13, 24, 35, 40, 35, 24, 13, 5, 2]
max(x)
40
Functions that return numeric values can be used in mathematical expressions just like numbers or variables.
y = sum(x) + 2
y
200
print()
The built-in print() function is often used in scripts to display the value of variables or expressions to the screen. While simply typing in the expression or variable will display the value in the Python console, in scripts simply putting a variable alone on a line causes no action.
x = 45 + 72
print(x)
117
str()
If you wish to append a numeric value to a string, you must first use a type converter like the built-in str() function to convert the number to a string before you can append.
This example also uses the round() function to round the value to two decimal places, which is consistent with display as dollars and cents.
dinner = (18.50 + 4.5 + 22.99 + 4.5) * 1.24
print("The total cost for our dinner with tax and tip was $" + str(round(dinner, 2)) + ".")
The total cost for our dinner with tax and tip was $62.61.
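As an alternative to str() and concatenation, formatted string literals (f-strings, available in Python 3.6 and later) can embed values directly in a string. This is a sketch of the same dinner calculation; the variable name message is hypothetical, and the :.2f format specifier requests two decimal places.

```python
dinner = (18.50 + 4.5 + 22.99 + 4.5) * 1.24

# The expression inside { } is evaluated and formatted to two decimals
message = f"The total cost for our dinner with tax and tip was ${dinner:.2f}."
print(message)
```

Unlike round(), the :.2f specifier always shows two digits after the decimal point, which suits currency values.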
range()
The built-in range() function is useful for generating sequences of values. The first parameter is the starting value, and the second parameter is the value immediately after the last value.
range(1, 10)
range(1, 10)
range() returns a range object, and you can use the list() function to convert the range to a list.
list(range(1,10))
[1, 2, 3, 4, 5, 6, 7, 8, 9]
An optional third parameter gives the spacing between values (defaulting to one).
list(range(1, 10, 3))
[1, 4, 7]
Modules
A module is a set of functions and other objects that you can include in your script.
Specialized functions can be brought in from modules that permit different types of operations to be performed on different types of data.
For example, the statistics module adds functions for calculating descriptive statistics for lists of numeric values.
Modules are loaded with the import command.
import statistics
x = [2, 5, 13, 24, 35, 40, 35, 24, 13, 5, 2]
statistics.mean(x)
18
statistics.median(x)
13
statistics.stdev(x)
14.26184
The math Module
The math module provides a wide variety of mathematical functions, such as trigonometric functions.
For this example, we use sin() to calculate the length of the opposite leg of a right triangle, given the angle and the length of the hypotenuse (radius).
Note that angles in math functions are specified in radians (2π = 360 degrees).
import math
degrees = 75
radius = 5
radians = math.pi * (degrees / 180)
length = math.sin(radians) * radius
print("The length of the opposite side of a right triangle",
      "with a radius of", str(radius), "and angle", str(degrees),
      "is", str(length) + ".")
The length of the opposite side of a right triangle with a radius of 5 and angle 75 is 4.8296291314453415.
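The degree-to-radian conversion can also be delegated to the math module. The sketch below repeats the calculation using math.radians(), with math.degrees() shown as the inverse conversion; the variable name back_to_degrees is hypothetical.

```python
import math

degrees = 75
radius = 5

radians = math.radians(degrees)          # same as math.pi * (degrees / 180)
length = math.sin(radians) * radius      # opposite leg of the right triangle
back_to_degrees = math.degrees(radians)  # inverse conversion

print(round(length, 4))
```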
Great-Circle Distance
Module functions can facilitate complex calculations.
Great-circle distance is the shortest possible distance across the surface of a sphere, and is used to find the straight-line distance between two points on the surface of the earth.
The Haversine formula can be used to calculate the distance between two points on the surface of the earth specified with latitudes and longitudes.
Complex formulas like the Haversine formula need to be used when calculating distances across the surface of the earth because the earth is three-dimensional and simple two-dimensional formulas like the Pythagorean theorem are inadequate for making three-dimensional calculations.
The "from math import *" import statement allows you to use imported math functions without having to type the namespace before the function name: sin() instead of math.sin().
from math import *
lat1 = 40.10900
long1 = -88.22699
name1 = "Illini Union"
lat2 = 40.10621
long2 = -88.22719
name2 = "Foellinger Auditorium"
radius = 6378137 # meters
flattening = 1/298.257223563
start_x = long1 * pi / 180
start_y = atan2((1 - flattening) * sin(lat1 * pi / 180), cos(lat1 * pi / 180))
end_x = long2 * pi / 180
end_y = atan2((1 - flattening) * sin(lat2 * pi / 180), cos(lat2 * pi / 180))
arc_distance = (sin((end_y - start_y) / 2) ** 2) + \
    (cos(start_y) * cos(end_y) * (sin((end_x - start_x) / 2) ** 2))
distance = 2 * radius * atan2(sqrt(arc_distance), sqrt(1 - arc_distance))
print("The distance between " + name1 + " and " + name2 + " is " + str(round(distance)) + " meters.")
The distance between Illini Union and Foellinger Auditorium is 310 meters.
User-Defined Functions
Users can create custom functions using the def keyword.
For example, this function calculates the hypotenuse of a right triangle using the Pythagorean theorem.
- The def line contains the function name, followed by the function parameters, followed by a colon (:).
- All lines in the block of code that make up the function are indented one level from the def line (four spaces by convention). Python is an indent-based language.
- The return line specifies the value returned by the function.
- This function uses the square root math.sqrt() function from the math module.
import math
def hypotenuse(rise, run):
    return math.sqrt((rise**2) + (run**2))
hypotenuse(3, 4)
5.0
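Functions can contain several statements and can give parameters default values. As a sketch, the Haversine calculation from the Great-Circle Distance section could be wrapped in a reusable function. The function name great_circle_distance and its parameter names are hypothetical, and this version assumes a perfectly spherical earth with a mean radius of 6,371,000 meters rather than applying the flattening adjustment used earlier, so results will differ slightly.

```python
import math

def great_circle_distance(lat1, long1, lat2, long2, radius=6371000):
    """Return the haversine distance in meters between two points in degrees."""
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    delta_phi = math.radians(lat2 - lat1)
    delta_lambda = math.radians(long2 - long1)
    # Haversine formula on a sphere (an assumption; no flattening correction)
    a = (math.sin(delta_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(delta_lambda / 2) ** 2)
    return 2 * radius * math.atan2(math.sqrt(a), math.sqrt(1 - a))

# Same coordinates as the Great-Circle Distance example
distance = great_circle_distance(40.10900, -88.22699, 40.10621, -88.22719)
print(round(distance), "meters")
```

Once defined, the function can be called repeatedly with different coordinates, and the optional radius parameter can be overridden for other spheres.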
Paths: The os Module
A commonly used module for accessing operating system capabilities is the os module.
The os.getcwd() function returns the name of the current working directory.
import os
os.getcwd()
'C:\\Program Files\\Python39'
The os.listdir() function returns a list of the files and directories in the directory passed as a parameter. This can be useful if you need to perform some kind of operation on every file in a directory.
path = os.getcwd()
os.listdir(path)
['Documents', 'Downloads', 'Photos', 'Music']
Directories exist within a hierarchical system of directories used to organize files.
- On Windows, a path is a sequence of directories separated by backslashes (\).
- Because the backslash is an "escape" symbol in Python strings, each backslash in a path is written as two backslashes back-to-back (\\).
- On Windows systems, the top level (beginning) directory is usually underneath a drive letter indicating the system where the directories are located.
C:\\Program Files\\Python39
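Doubling every backslash is only one way to write a Windows path. The sketch below shows two alternatives that are not covered elsewhere in this tutorial: a raw string (prefixed with r), in which backslashes are not treated as escapes, and the PureWindowsPath class from the standard pathlib module, which accepts forward slashes and converts them. The variable names are hypothetical.

```python
from pathlib import PureWindowsPath

escaped = "C:\\Program Files\\Python39"  # doubled backslashes
raw = r"C:\Program Files\Python39"       # raw string: no doubling needed

# PureWindowsPath joins parts with / and renders them with backslashes
windows_path = PureWindowsPath("C:/Program Files") / "Python39"

print(escaped == raw)
print(str(windows_path) == raw)
```

All three forms describe the same path, so the choice is mostly a matter of readability.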
The os.listdir() function accepts any directory path. For example, the Program Files directory on Windows systems contains (as the name indicates) the files for the software installed on the system.
os.listdir("c:\\Program Files")
['7-Zip', 'Agisoft', 'ArcGIS', 'Common Files', 'Dell', 'desktop.ini', 'dotnet', 'Emulex', 'Exelis', 'GDAL', 'GeoDa Software', 'Golden Software', 'Google', 'Gwb', 'Internet Explorer', 'LAPS', 'LAStools', 'Managed Defender', 'MATLAB', 'Microsoft MPI', 'Microsoft Office', 'Microsoft Policy Platform', 'Microsoft Silverlight', 'MSBuild', 'Notepad++', 'PackageManagement', 'Python310', 'Python37', 'Python38', 'Python39', 'QGIS 2.18', 'QGIS 3.10', 'QGIS 3.16', 'QGIS 3.22.7', 'QGIS 3.4', 'R', 'Reference Assemblies', 'RStudio', 'rtools40', 'SedInConnect', 'SIFT3D 1.4.5', 'TauDEM', 'tempini', 'Uninstall Information', 'VcXsrv', 'Windows Defender', 'Windows Firewall Configuration Provider', 'Windows Mail', 'Windows Media Player', 'Windows Multimedia Platform', 'Windows NT', 'Windows Photo Viewer', 'Windows Portable Devices', 'Windows Sidebar', 'WindowsApps', 'WindowsPowerShell', 'Zabbix']
Graphs: The matplotlib Module
The matplotlib library permits visualization of data in Python.
The plot() function draws graphs. By default, when passed a single list, the plot() function draws a line graph.
import matplotlib.pyplot as plt
y = [2, 5, 13, 24, 35, 40, 35, 24, 13, 5, 2]
graph = plt.plot(y)
plt.show()
Histograms can be plotted with the hist() function.
graph = plt.hist(x)
plt.show()
Conditions
Python provides operators for comparing values that return logical values (True or False):
- Equals: a == b
- Not Equals: a != b
- Less than: a < b
- Less than or equal to: a <= b
- Greater than: a > b
- Greater than or equal to: a >= b
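A minimal sketch of these operators, using the arbitrary values 10 and 20 and hypothetical variable names, shows that each comparison evaluates to True or False and can be stored like any other value.

```python
a = 10
b = 20

equal = a == b           # False: 10 is not equal to 20
not_equal = a != b       # True
less = a < b             # True
greater_or_equal = a >= b  # False

print(equal, not_equal, less, greater_or_equal)
```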
Comparison operators are commonly used with if statements to choose whether to execute a block of code.
Note that the if statement ends with a colon (:) and, as with functions, the block of code controlled by the if statement is indented.
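latitude = 91
if latitude > 90:
    print("Invalid latitude:", str(latitude))
Invalid latitude: 91
latitude = 89
if latitude > 90:
    print("Invalid latitude:", str(latitude))
When latitude is 89, the condition is false, so the indented print() line is skipped and nothing is displayed.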
Iteration
A for loop is used to run a block of code on all items in a list.
values = [14, 69, 32, 75]
for x in values:
    print(x)
14
69
32
75
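Loops are often used to accumulate a result rather than just print each item. The sketch below, with hypothetical variable names total and counted, adds up the same values by hand and then loops over a range() object to repeat a block a fixed number of times.

```python
values = [14, 69, 32, 75]

total = 0
for x in values:
    total = total + x  # add each item to the running total
print(total)

counted = []
for i in range(1, 4):        # range(1, 4) yields 1, 2, 3
    counted = counted + [i]  # list concatenation with +
print(counted)
```

The built-in sum() function gives the same total in one step; the explicit loop is shown only to illustrate how the loop body runs once per item.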
One common application for for loops is to perform some operation on all the files in a given directory.
import os
path = "C:/Documents/ArcGIS Pro/Projects/Merge/Sources"
for file in os.listdir(path):
    print(path + '/' + file)
C:/Documents/ArcGIS Pro/Projects/Merge/Sources/Alpha
C:/Documents/ArcGIS Pro/Projects/Merge/Sources/Beta
C:/Documents/ArcGIS Pro/Projects/Merge/Sources/Gamma
C:/Documents/ArcGIS Pro/Projects/Merge/Sources/Delta
Python has an in operator that can be used to examine whether one object contains another object. With strings, the in operator can be used to check whether a string contains a substring.
In this example, in is used with an if statement to list only the Word document (.docx) files in a directory and ignore all others.
import os
path = "C:/Documents/ArcGIS Pro/Projects/Merge/Sources"
for file in os.listdir(path):
    if ".docx" in file:
        print(path + '/' + file)
C:/Documents/ArcGIS Pro/Projects/Merge/Sources/Alpha
C:/Documents/ArcGIS Pro/Projects/Merge/Sources/Beta
C:/Documents/ArcGIS Pro/Projects/Merge/Sources/Gamma
C:/Documents/ArcGIS Pro/Projects/Merge/Sources/Delta
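Because in matches a substring anywhere in the file name, it can also match names where ".docx" is not the actual extension. The sketch below compares in with the string endswith() method on a made-up list of file names; the list, the variable names, and the use of the list append() method are illustrative assumptions rather than part of the example above.

```python
# Hypothetical file names standing in for os.listdir() output
files = ["Alpha.docx", "Beta.xlsx", "Gamma.docx", "notes.docx.bak"]

matched_with_in = []
matched_with_endswith = []
for file in files:
    if ".docx" in file:           # substring anywhere in the name
        matched_with_in.append(file)
    if file.endswith(".docx"):    # only names that end with .docx
        matched_with_endswith.append(file)

print(matched_with_in)
print(matched_with_endswith)
```

The endswith() test skips the backup file, which may be closer to what is wanted when filtering by file type.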
Comments
Comments in scripts are lines that the program ignores. These lines are used for documenting the authorship of scripts and for adding comments that explain what is going on when you have complex sequences of expressions and function calls.
Comments start with a pound sign (#) and tell the Python interpreter to ignore everything that follows on that line.
# Name of script (date)
# This is a comment that explains what the line after it does
x = 2 + 2
print(x)
4
Packages
Python has a wide variety of modules available beyond those that come with a standard Python installation (like math or statistics).
Because there are so many different modules, and because those modules can be interrelated, there is a hierarchy of structures used to manage modules.
- A module is a set of functions and other objects that you can include in your script.
- A package is a collection of related modules distributed and managed as a group.
- A library is a collection of related packages along with operating system files that contain the system code needed by the modules in the packages in the library.
Modules also sometimes provide bindings. Bindings are specialized Python modules containing functions and methods that can be used to call libraries created to be used with other programming languages, usually C or C++.
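Repositories are collections of packages on the internet. The main repository, the Python Package Index (PyPI), is operated by the Python Software Foundation, while the packages themselves are maintained by their authors.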
You can install packages from repositories using PIP, the package installer for Python.
PIP handles installation of dependencies. Dependencies are additional libraries and packages that must also be installed in order to use the modules in a package. Dependencies can get messy and cause confusing installation error messages when they are not carefully configured, or when you are installing a package on a machine with an unusual configuration.
If an import command fails because the module is not installed, you can probably install the needed packages with PIP.
In this example, the numpy module needed by the matplotlib module is not installed and fails when you attempt to import.
Open the Windows Command Prompt and run:
pip install <module_name>
Alternatively, if PIP has not been set up with appropriate environment variables, you can run PIP via Python:
py -m pip install <module_name>
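Before installing anything, it can help to confirm from within Python whether a module is actually missing. The sketch below uses importlib.util.find_spec() from the standard library; the helper name is_installed and the deliberately nonexistent module name are hypothetical.

```python
import importlib.util

def is_installed(module_name):
    """Return True if a top-level module with this name can be found."""
    return importlib.util.find_spec(module_name) is not None

# "math" ships with Python; the last name is made up and should not exist
for name in ["math", "statistics", "a_module_that_does_not_exist_12345"]:
    if is_installed(name):
        print(name, "is available")
    else:
        print(name, "is missing and may need to be installed with PIP")
```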