# Patterns

Visualizing data makes it possible to see patterns that indicate general trends and relationships across time or space. Identifying these patterns can guide further exploration that helps us understand our data and, more importantly, the phenomena that the data represent.

Many of the examples in this tutorial are from the Google Public Data Explorer. R code for the simulated curves is here.

## Temporal Trends

Time-series line graphs show changes in a variable
over a period of time. Trends observed in time-series graphs
are **temporal** patterns because they are related to time
rather than space.

Patterns in line graphs are commonly described based on mathematical formulas for curves that roughly model their shapes.

### Linear

Linear patterns form something close to a straight line.

For example, population has grown in a fairly linear fashion since 1960.

With most human or environmental phenomena, patterns of change have some irregularity. For example, there has been a nearly linear increase in the number of secure internet servers since 2001, although there was a notable spike and dip around the late 2000s financial crisis.
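As a rough illustration of describing a nearly linear trend with a formula, the sketch below fits a straight line by ordinary least squares. The yearly values are hypothetical and are not taken from the Google Public Data Explorer; `fit_line` is a helper name introduced here for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance_x = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance_x
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical yearly counts with some irregularity around a straight line.
years = [2001, 2002, 2003, 2004, 2005, 2006]
counts = [10.0, 12.5, 14.0, 17.0, 18.5, 21.0]

slope, intercept = fit_line(years, counts)

# Residuals measure how far each observation falls from the fitted line.
residuals = [y - (slope * x + intercept) for x, y in zip(years, counts)]
print(round(slope, 2), [round(r, 2) for r in residuals])
```

Small residuals scattered on both sides of the line suggest that a straight line is a reasonable description of the trend.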

### Exponential

Exponential patterns reflect a constant percentage growth rate. Because growth compounds over successive years, the curve slopes upward at an increasing rate.

For example, air travel worldwide has grown exponentially since 1970.
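The compounding behind an exponential curve can be sketched in a few lines. The starting value and the 5% rate below are hypothetical, chosen only to show the shape, and `compound` is an illustrative helper name.

```python
def compound(start, rate, periods):
    """Return a series that grows by the same percentage every period."""
    values = [start]
    for _ in range(periods):
        values.append(values[-1] * (1 + rate))
    return values

# Hypothetical series: 100 units growing 5% per year for 30 years.
series = compound(100.0, 0.05, 30)

# The percentage growth is constant, but the absolute increase per year rises.
increases = [later - earlier for earlier, later in zip(series, series[1:])]
print(round(increases[0], 2), round(increases[-1], 2))
```

The first yearly increase is small and the last is several times larger, which is what bends the curve upward.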

### Logarithmic

Logarithmic curves are the mathematical inverse of exponential curves. They start with a steep upward slope, then flatten as growth slows.

For example, the number of tractors per 100 square kilometers of arable (farmable) land increased swiftly during the green revolution of the 1960s, but has leveled off as mechanized farming has become the norm around the world in countries that can afford such high-energy techniques.
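A minimal sketch of a logarithmic trend, assuming the form y = a + b·ln(t) with hypothetical coefficients. It also shows the inverse relationship with the exponential function. Note that a true logarithmic curve never becomes completely flat; its gains only keep shrinking.

```python
import math

# Hypothetical coefficients for y = a + b * ln(t), with t = 1..30.
a, b = 10.0, 25.0
periods = list(range(1, 31))
log_series = [a + b * math.log(t) for t in periods]

# Each step adds less than the one before: the curve rises but flattens.
gains = [later - earlier for earlier, later in zip(log_series, log_series[1:])]

# Exponentiating undoes the logarithm and recovers the original t values.
recovered = [math.exp((y - a) / b) for y in log_series]
print(round(gains[0], 2), round(gains[-1], 2))
```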

### Sigmoid Curve

Sigmoid curves are commonly called S-curves because they roughly resemble the shape of the letter "S." Sigmoid curves commonly represent the life cycle of a phenomenon, starting with slow growth following the invention of a technology, exponential growth as that technology is deployed, then a logarithmic leveling off at a plateau as the technology matures or the market saturates.

For example, nuclear power saw significant growth worldwide in the 1970s and 1980s, but plateaued in the 1990s as high costs and growing public concerns made construction of new plants increasingly difficult.
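One common way to model an S-curve is the logistic function, used here as an assumption rather than as the formula behind the example above. The ceiling, rate, and midpoint values are hypothetical.

```python
import math

def logistic(t, ceiling, rate, midpoint):
    """Logistic S-curve: slow start, rapid middle growth, plateau at ceiling."""
    return ceiling / (1.0 + math.exp(-rate * (t - midpoint)))

# Hypothetical technology adoption over 50 periods, saturating at 100.
curve = [logistic(t, ceiling=100.0, rate=0.4, midpoint=25) for t in range(51)]

# Growth per period is largest near the midpoint and small at both ends.
growth = [later - earlier for earlier, later in zip(curve, curve[1:])]
peak_step = growth.index(max(growth))
print(peak_step, round(curve[0], 2), round(curve[-1], 2))
```

The early part of the curve resembles exponential growth and the late part resembles a leveling-off, matching the life-cycle description above.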

### Cyclical

Many human and environmental phenomena follow regular or irregular cycles of increases and decreases. While some of these cycles are highly predictable (such as seasonal temperature cycles), other cycles can be more erratic.

For example, growth in gross domestic product (GDP, the total amount of economic activity in a country) follows business cycles where the economy booms, then slows down, then returns to growth again.
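A highly regular cycle, such as seasonal temperature, can be approximated with a sine wave. This is a sketch with hypothetical values (mean 12, amplitude 10, 12-month period); erratic cycles such as business cycles do not follow a fixed period like this.

```python
import math

def seasonal(month, mean=12.0, amplitude=10.0, period=12):
    """Idealized seasonal value that repeats every `period` months."""
    return mean + amplitude * math.sin(2 * math.pi * month / period)

# Three years of hypothetical monthly temperatures.
temps = [seasonal(m) for m in range(36)]

# The same month in consecutive years has (nearly) the same value.
print(round(temps[3], 1), round(temps[15], 1), round(temps[9], 1))
```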

## Spatial Patterns

Thematic maps are commonly used to find **patterns** that
indicate the presence or absence of some meaningful influence
on the geospatial distribution of a phenomenon. For example,
disease hot spots where large numbers of cases are clustered
together may indicate the presence of some negative environmental
characteristic. On a more positive note, clusters of specific
demographic groups may guide businesses
seeking to locate retail outlets that target those groups.

While there is an infinite variety of patterns, they can be grouped into three very broad categories: clustered, random, and regular.

### Clustered Patterns

A clustered pattern means that there are clear groupings
of high and low values, so nearby locations tend to have similar values.
This is referred to as **spatial autocorrelation**: the
data correlates with itself (auto) in space (spatial).

For example, in the northeast United States, there are clear clusters of high income counties in the wealthy suburbs of New Jersey (adjacent to NYC) and Virginia/Maryland (adjacent to Washington, DC), while there are obvious clusters of low income in Appalachia (West Virginia, Eastern Kentucky).
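Spatial autocorrelation can be quantified. One widely used statistic is Moran's I, which this tutorial does not name and which is introduced here only as an illustration. The sketch below computes it on small hypothetical grids, treating cells that share an edge as neighbors; `morans_i` is an illustrative helper name.

```python
def morans_i(grid):
    """Moran's I for a rectangular grid with edge-sharing (rook) neighbors."""
    rows, cols = len(grid), len(grid[0])
    values = [v for row in grid for v in row]
    n = len(values)
    mean = sum(values) / n
    total_variation = sum((v - mean) ** 2 for v in values)
    cross_products = 0.0  # sum over neighbor pairs of deviation products
    weight_sum = 0        # number of (directed) neighbor pairs
    for r in range(rows):
        for c in range(cols):
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols:
                    cross_products += (grid[r][c] - mean) * (grid[nr][nc] - mean)
                    weight_sum += 1
    return (n / weight_sum) * (cross_products / total_variation)

# Hypothetical maps: high values grouped on one side vs. perfectly alternating.
clustered = [[9, 9, 1, 1] for _ in range(4)]
checkerboard = [[9 if (r + c) % 2 == 0 else 1 for c in range(4)] for r in range(4)]

print(round(morans_i(clustered), 2), round(morans_i(checkerboard), 2))
```

Positive values indicate clustering of similar values, values near zero are consistent with a random arrangement, and strongly negative values indicate a regular alternation of high and low values.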

### Random Pattern

A random pattern means that there is no obvious arrangement to a distribution. For example, this is the distribution of cropland in Central Illinois. Almost all the agriculture in this part of the state is devoted to alternating crops of corn and soybeans. The choice of whether to plant corn or soybeans is determined by the history of a particular field and market conditions, so there is no broad pattern to which areas are corn and which are soybeans.

### Regular Pattern

A regular pattern means that the shapes or values appear in some kind of consistent geometric relationship such as a grid or spiral. For example, counties in Iowa were laid out in the 19th century following the Public Land Survey System, which divided land into a rectangular grid measured from surveyed meridians and baselines rather than following any existing, irregular human or environmental features on the ground. The occasional (but regular) jagged edges reflect adjustments for the curvature of the earth.

## Correlation Patterns

One simple and effective way to visualize the relationship between
two variables is with an **X/Y scatter chart** that places one
variable on the x-axis and the other variable on the y-axis.
The extent to which change in one variable is associated with
change in the other variable is called **correlation**.
Strongly correlated variables form
a fairly clear curve (usually a line) on the chart.
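The strength and direction of a linear relationship are often summarized with the Pearson correlation coefficient, which ranges from -1 to 1. The tutorial does not specify a particular measure, so this is one reasonable choice, sketched with hypothetical data points.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variation_x = sum((x - mean_x) ** 2 for x in xs)
    variation_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(variation_x * variation_y)

# Hypothetical observations: one variable rises with x, the other falls.
x = [1, 2, 3, 4, 5, 6]
rising = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]
falling = [12.0, 9.8, 8.1, 6.0, 4.2, 1.9]

print(round(pearson_r(x, rising), 3), round(pearson_r(x, falling), 3))
```

A value close to 1 corresponds to points rising in a narrow band from left to right, a value close to -1 to points falling in a narrow band, and a value near 0 to no linear relationship.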

### Positive Correlation

A **positive correlation** means that as one variable goes up, the other
goes up as well. When two variables with a positive correlation are plotted on
the two axes of an X/Y scatter chart, the points form a rough line or curve
upward from left to right.

For example, there is a positive geographic correlation between GDP per capita and electricity consumption per capita. Countries at a higher level of development use more electricity for things like air conditioning, home appliances, street lights, etc.

### Negative Correlation

A **negative correlation** means that as one variable goes up, the other
goes down. When two variables with a negative correlation are plotted on the
two axes of an X/Y scatter chart, the points form a rough line or curve
downward from left to right.

Using our prior example, there is a negative geographic correlation between GDP per capita and mortality of children under the age of five. Wealthier countries tend to have better nutrition, medical care, and social order than poorer countries. The wealthier the country, the lower the chance that a child will die before the age of five.

### Weak Correlation

In the positive and negative correlation examples given above, the correlation between the two variables is fairly strong, as indicated by the narrow band of dots forming a clear line.

In some cases, the correlation is not quite as strong, meaning
that there is a general upward or downward pattern on the X/Y scatter
chart, but there are numerous **outliers** that are exceptions
to the trend.

For example, the graph below shows that there is a weak negative correlation between GDP per capita and the adolescent fertility rate (births per 1,000 women ages 15-19, a measure of teen pregnancy). While women in wealthy countries generally wait until their 20s or 30s to have children, women in poorer countries often begin having children much earlier in life, reflecting characteristics like the traditions of that country, education levels for women, and the ability of women to control their own destiny.

However, there are a number of notable exceptions, such as the Sub-Saharan country of Burundi, which is poor but has an adolescent fertility rate similar to that of the wealthy United States.

A graph of weak correlation often exhibits **heteroskedasticity**, where the
spread of the points around the trend is not constant across the graph. The dots
spread out more on one side of the graph than they do on the other side, so the
relationship looks weaker there.
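One simple way to check for this uneven spread is to compare how far points fall from the trend in the lower and upper halves of the x range. The data below are hypothetical and deliberately constructed so that deviations grow with x, and the trend line is assumed to be known rather than fitted.

```python
import statistics

# Hypothetical data: a trend of y = 2x with deviations that grow with x.
xs = list(range(1, 21))
trend = [2.0 * x for x in xs]
ys = [t + 0.3 * x * (1 if x % 2 == 0 else -1) for x, t in zip(xs, trend)]

# Residuals are the vertical distances from the (assumed) trend line.
residuals = [y - t for y, t in zip(ys, trend)]

# Compare the spread of residuals on the left and right sides of the graph.
half = len(xs) // 2
spread_low = statistics.pstdev(residuals[:half])
spread_high = statistics.pstdev(residuals[half:])
print(round(spread_low, 2), round(spread_high, 2))
```

A much larger spread on one side than the other is the fanning-out shape described above.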

### No Correlation

When two variables with **no correlation** are plotted
on the two axes of an X/Y scatter chart, the points form a
diffuse cloud.

For example, there is no correlation between GDP per capita and the proportion of seats held by women in the national legislature. There are wealthy countries such as Sweden that have a high percentage of female legislators and there are poor countries like Rwanda that also have a high percentage of female legislators.