# Spatial Sampling and Interpolation

Many phenomena that we want to understand occur over wide geographical areas. While there are numerous cases where remote sensing techniques using satellite or aerial photography can be used to capture such data, there are often situations where it is not possible to directly capture a complete surface data set.

In such cases, you will need take **sample** measurements at selected
points and then **interpolate** the values for the areas in between.

For example, it would be impossible to completely measure the amount and quality of gold ore deep under a large field. But if we could take samples from handful of drilled boreholes and estimate the nature of the deposit in the areas between those boreholes, that would allow us to estimate the volume and quality of the reserve, and know whether further investment would be justified.

While the techniques for spatial interpolation are often domain-specific and, accordingly, complex, there are some fundamental spatial sampling and interpolation techniques that are applicable across a wide variety of domains, and this tutorial will introduce how to execute those techniques in R.

## Surfaces and Rasters

The values of a variable across space are commonly conceived of as
**surfaces**. While the imagery of a surface implies elevation (height),
variables modeled with surfaces can be anything that has a distribution across
space, such as prevalance of an animal species, temperature, or concentrations
of pollutants.

Surfaces are commonly represented as **raster** grids of evenly spaced
cells. In R, the **raster** library provides a wide variety of functions for
building, manipulating, and analysing surfaces represented with rasters.

For example, **digital elevation models (DEM)** are
rasters of elevation data. The US Geological Survey
distributes DEMs via
The National Map. The example below uses this DEM data
for an area around Spokane, WA. The **plot()** function gives a
**false-color** view of the data.

dem = raster("imgn48w118_1.img") plot(dem)

The **persp()** function creates a perspective visualization of surface data.

The exact parameters needed to make the visualization useful will depend on your data. In the example below:

*border=NA*turns the borders off, because with high-resolution data like this, borders turn the surface black.*shade=0.5*enables hillshading, giving the effect of light and shadows*scale=F*preserves the aspect ratio of cells (rather than expanding them to fit the display) and*expand=0.0001*reduces the scaling of elevation since x/y coordinates are in degrees but elevation is in meters

persp(dem, col='tan', scale=F, expand = 0.0001, shade=0.5, border=NA)

## Spatial Sampling

There are a variety of spatial **probability sampling**
approaches that are similar to their non-spatial counterparts.
de Smith, Goodchild, and Longley (2015) point out four
systematic approaches:

- Regularly spaced X/Y locations on a grid
- Random X/Y locations
- Regular grid with random X/Y offsets
- Regular grid with random row or column offsets

However, as with other types of sampling like polls and surveys,
strict adherence to rigorous regularity or randomness may not be practical.
**Nonprobability sampling** techniques like **convenience sampling** where samples are
gathered at the most accessible or permissible locations may be necessary.

Although the statistical validity of inferences made from that data is reduced and uncertainty is increased, depending on the nature of the sampled phenomenon and the desired level of rigor, such data may be of value for preliminary research or conclusions that are presented with caveats.

## Example GPX Data

The following examples will demonstrate spatial interpolation using GPX data gathered as a convenience sample by traversing walkways on the Eastern Washington University campus. This demonstration GPX file can be downloaded HERE....

While spatial interpolation is used for a wide variety of applications other than elevation, elevation is a readily-understandable and accessible example for demonstration.

GPX is an XML-based format used to store GPS tracking information. There are a wide variety of apps available for both iPhone and Android devices that can create GPX files using the GPS receivers in all contemporary smartphones.

GPX files can contain a number of different sets of points that
are treated as layers by OGR, although some layers may be empty.
You can list the layers with
the **ogrListLayer()** function from the **rgdal** library.
You can list information about a layer with the **ogrInfo()** function.

> library(rgdal) > ogrListLayers("2017-05-010-175527.gpx") [1] "waypoints" "routes" "tracks" "route_points" "track_points" attr(,"driver") [1] "GPX" attr(,"nlayers") [1] 5 > ogrInfo("2017-05-010-175527.gpx", "track_points") Source: "2017-05-010-175527.gpx", layer: "track_points" Driver: GPX; number of rows: 261 Feature type: wkbPoint with 2 dimensions Extent: (-117.5863 47.48662) - (-117.5771 47.49273) CRS: +proj=longlat +datum=WGS84 +no_defs Number of fields: 26 name type length typeName 1 track_fid 0 0 Integer 2 track_seg_id 0 0 Integer 3 track_seg_point_id 0 0 Integer 4 ele 2 0 Real 5 time 11 0 DateTime 6 magvar 2 0 Real 7 geoidheight 2 0 Real 8 name 4 0 String 9 cmt 4 0 String 10 desc 4 0 String 11 src 4 0 String 12 link1_href 4 0 String 13 link1_text 4 0 String 14 link1_type 4 0 String 15 link2_href 4 0 String 16 link2_text 4 0 String 17 link2_type 4 0 String 18 sym 4 0 String 19 type 4 0 String 20 fix 4 0 String 21 sat 0 0 Integer 22 hdop 2 0 Real 23 vdop 2 0 Real 24 pdop 2 0 Real 25 ageofdgpsdata 2 0 Real 26 dgpsid 0 0 Integer

The **readOGR()** function will read points from
a GPX file into a SpatialPointsDataFrame object:

> points = readOGR(dsn='2017-05-010-175527.gpx', 'track_points', stringsAsFactors=F) OGR data source with driver: GPX Source: "2017-05-010-175527.gpx", layer: "track_points" with 261 features It has 26 fields > names(points) [1] "track_fid" "track_seg_id" "track_seg_point_id" [4] "ele" "time" "course" [7] "speed" "magvar" "geoidheight" [10] "name" "cmt" "desc" [13] "src" "url" "urlname" [16] "sym" "type" "fix" [19] "sat" "hdop" "vdop" [22] "pdop" "ageofdgpsdata" "dgpsid" > plot(points)

If your data includes points gathered outside the particular area of interest, they can be removed by subsetting as with any other SpatialPointsDataFrame. For example, the following selection removes possibly errant points from the beginning and ending of the survey.

> range(points$time) [1] "2017/05/01 23:52:11+00" "2017/05/02 00:55:30+00" > points = points[(points$time >= '2017/05/01 23:54') & (points$time < '2017/05/02 00:50:00'),] > range(points$time) [1] "2017/05/01 23:54:04+00" "2017/05/02 00:49:44+00" > spplot(points, zcol='ele')

An OpenStreetMap base map can be used to give geographic context:

library(OpenStreetMap) basemap = openmap(upperLeft = c(47.494, -117.588), lowerRight = c(47.486, -117.576), type="osm") palette = colorRampPalette(c("#a6611a", "#018571")) breaks = qunif(seq(0, 1, 0.2), min=min(points$ele), max = max(points$ele)) colors = palette(5)[cut(points$ele, breaks)] plot(basemap) plotpoints = spTransform(points, osm()) points(plotpoints, pch=19, col=colors)

Spatial interpolation will be more meaningful if a planar coordinate system is used, where coordinates represent distances on the ground. The State Plane Coordinate system is commonly used in the US. Note that the transformation casuses the map to be tilted off the north-south axis due to the conic transformation needed to make ground distances accurate.

# NAD 1983 HARN StatePlane Washington North FIPS 4601 # http://spatialreference.org/ref/esri/102348/ washington_north = CRS("+proj=lcc +lat_1=47.5 +lat_2=48.73333333333333 +lat_0=47 +lon_0=-120.8333333333333 +x_0=500000 +y_0=0 +ellps=GRS80 +units=m +no_defs") points = spTransform(points, washington_north) basemap = openproj(basemap, washington_north) plot(basemap) points(points, pch=19, col=colors)

## Rasterization

The first step to creating a surface interpolated from points is to define the spatial extent and cell dimension for the raster used to represent the surface.

The function below creates a template raster with no values that can be used by subsequent functions to create rasters with the same dimensions. Because the state plane coordinate system used is in feet, the pixel size of 25 means each pixel is 25 feet square. The value you use will be based on how much spatial resolution you need. Higher resolutions require more memory, and imply higher spatial accuracy, which may not be justified if you do not have enough data.

library(raster) pixelsize = 25 box = round(extent(points) / pixelsize) * pixelsize template = raster(box, crs = washington_north, nrows = (box@ymax - box@ymin) / pixelsize, ncols = (box@xmax - box@xmin) / pixelsize)

This template raster has no data, but the
**rasterize()** function from the **raster** library can
be used to visualize the sampled data in the context of the surface
that will be created, and assess whether the pixel size is appropriate
for your data. The **fun=mean**
parameter is passed to use the mean() function to find the average elevation
when multiple points overlap single raster pixel.

points$PRESENT = 1 pointraster = rasterize(points, template, field = 'ele', fun = mean) plot(pointraster)

## Thin Plate Spline

Interpolation techniques are based on **models** that use patterns in the
data (and sometimes other fields) to predict what missing values should be.
There are a variety of different models that can be fed into the
**interpolate()** function from the raster library, along with a template
raster, to create interpolated surfaces.

A simple model is a **thin plate spline**, which can be created
using the **Tps()** function from the **fields** library. Thin plate
splines were introduced to geometric design
Jean Duchon in 1977.

As the name implies, a thin plate spline can be visualized metaphorically as the shape resulting from draping a slightly rigid sheet of metal over a set of elevation points. This creates a smooth surface.

Because this is a geometric technique, it may not fully model the processes causing differentiation in your data. While you may get a surface that looks attractive, it may not accurately represent the actual modeled phenomenon.

library(fields) spline = Tps(points@coords, points$ele) splined = interpolate(template, spline) plot(splined)

Creating a perspective visualization allows a richer view of the surface.

persp(splined, scale=F, expand=15, shade=0.5, col='tan')

Perspective plots can be **false-colored** to accentuate differences,
although this is a bit cumbersome with persp(). Colors passed to persp() are the facets
between points, not the points themselves, so the color matrix needs one fewer
rows and one fewer columns. And the matrix needs to be transposed. And rows
need to be reversed for some reason. These quirks explain the ugly formula on
the first line of code below.

colors = t(as.array(splined)[nrow(splined):2, 2:ncol(splined), 1]) colors = (colors - min(colors)) / diff(range(colors)) colors = rgb(colorRamp(c("#a6611a", "white", "#018571"))(colors), maxColorValue = 255) persp(splined, col = colors, scale=F, expand=10)

## Inverse-Distance Weighting

Another simple model for interpolation is **inverse-distance weighting**,
which interpolates points based on an average of all the sampled points, but which
gives heigher weights to closer points so nearby points have more influence
than more-distant points.

The **gstat()** function from the eponymous **gstat** library
can extract the necessary values and distances from SpatialPointsDataFrame.
The formula in this case tells gstat() to model the elevation attribute
based on one times the inverse distance for each point.

library(gstat) model = gstat(formula = ele ~ 1, data = points) idist = interpolate(template, model) plot(idist)

This model brings out more fine-grained differences than the thin-plate spline, although in this case those differences are as much a result of GPS elevation inaccuracy as actual changes in elevation. This is made clearer in a perspective plot:

persp(idist, scale=F, expand=15, shade=0.5, col='tan')

## Ordinary Kriging

Perhaps the most common spatial interpolation technique
is **kriging**, which was initially proposed by South African
mining engineer Danie Gerhardus Krige in 1951 and further
developed by later mathematicians. Krige is pronounced with a
hard 'G' (rhymes with 'dig').

Kriging assumes that spatial variation in the phenomenon being estimated is statistically homogeneous throughout the surface. The same pattern of variation can be observed at all locations on the surface. Accordingly, ordinary kriging is not appropriate when dealing with surfaces known to have anomolous pits, spikes, or inconsistent changes, and inverse distance weighting might be the better choice. (ESRI 2008)

There are three types of kriging:

**Ordinary kriging**involves fitting a simple curve to the variogram to model a single variable in terms of itself**Universal kriging**involves a model that considers a broad trend over the surface, a random, but spatially correlated component (like ordinary kriging and IDW), and random noise**Co-Kriging**involves a model built from kriging multiple variables

Ordinary kriging involves first creating a **variogram** which models
how much two samples will differ from each other depending on
the distance between those two samples.

Variograms can be created with the **variogram()** function from the
**gstat** library. That variogram can then be
viewed to determine the correct curve to fit to that model.

library(gstat) vario = variogram(ele ~ 1, data = points) plot(vario)

The safest model to start with is the Gaussian model.
The **vgm()** function
creates the model and the **fit.variogram()** adjusts the
model parameters to fit the variogram.

variogram_model = vgm(psill = max(vario$gamma), model = 'Gau', range = max(vario$dist), nugget = min(vario$gamma)) fitted_model = fit.variogram(vario, variogram_model) plot(vario, model=fitted_model)

That model can then be fed into gstat(), and the gstat model can be fed into the interpolate() function to create an interpolated surface.

gstat_model = gstat(formula = ele ~ 1, data = points, model = fitted_model) kriged = interpolate(template, gstat_model) plot(kriged)

The perspective plot shows the smoothing associated with the Gaussian curve:

persp(kriged, scale=F, expand=15, shade=0.5, col='tan')

There are a variety of different model curves and a variety
of different parameters used to tweak those different models.
However, the process can be a bit frustrating, and you might
explore the **autofitVariogram()** function from the
**automap** library if you are having issues getting
fit.variogram() to fit your model.

For example, even though the variogram points form a fairly straight line, fit.variogram() issues a warning and subsequent interpolate() calls will fail.

> linear_model = vgm(psill = max(vario$gamma), model = 'Lin', range = max(vario$dist), nugget = min(vario$gamma)) > linear_fit = fit.variogram(vario, linear_model) Warning message: In fit.variogram(vario, linear_model) : singular model in variogram fit > gstat_model = gstat(formula = ele ~ 1, data = points, model = linear_fit) > kriged = interpolate(template, gstat_model) [using ordinary kriging] There were 50 or more warnings (use warnings() to see the first 50)

## Comparing Methods

Using an actual DEM from the USGS, and reprojecting it to our area of analysis, we can see the actual elevation does not cleanly match the output of any of the interpolation methods. Ironically, the simplest of the models (thin-plate spline) probably gets closest, although the high error associated with the GPS data may also be a factor in poor model fit.

Nonetheless, this example dramatizes how different interpolation techniques can give different results with the same data, and that consideration of the underlying modeled phenomena is important in choosing an interpolation technique.

actual = raster("imgn48w118_1.img") actual = projectRaster(actual, template) persp(actual, scale=F, expand=15, shade=0.5, col='tan')

## Kriging With Meuse Data

Many of the examples of Kriging on the web use the meuse data set that comes with the standard installation of R. This data set gives locations and topsoil heavy metal concentrations, along with a number of soil and landscape variables at the observation locations, collected in a flood plain of the river Meuse, near the village of Stein (NL). Heavy metal concentrations are from composite samples of an area of approximately 15 m x 15 m.

While these examples are often arcane in structure and difficult to relate to for people who are not soil scientists, this multivariable data set is especially useful to demonstrate universal kriging and co-kriging.

library(sp) data(meuse) # Rijksdriehoek (RDH) (Netherlands topographical) map coordinates netherlands = CRS("+init=epsg:28992 +towgs84=565.237,50.0087,465.658,-0.406857,0.350733,-1.87035,4.0812") wgs84 = CRS("+proj=longlat +datum=WGS84 +no_defs") points = SpatialPointsDataFrame(coords = meuse[,c('x','y')], data = meuse, proj4string = netherlands) spplot(points, zcol='zinc') library(OpenStreetMap) keymap = openmap(upperLeft = c(51.0919, 5.5069), lowerRight = c(50.8523, 5.993), type="osm") plot(keymap) points(spTransform(points, osm()), pch=15, col='red', cex=0.5)

### Surface Raster Template and OpenStreetMap Base Map

pixelsize = 20 box = round(extent(points) / pixelsize) * pixelsize template = raster(box, crs = netherlands, vals = 0, nrows = (box@ymax - box@ymin) / pixelsize, ncols = (box@xmax - box@xmin) / pixelsize) library(OpenStreetMap) outline = projectExtent(template, longlat()) basemap = openmap(c(ymax(outline), xmin(outline)), c(ymin(outline), xmax(outline)), type='stamen-toner')

### Ordinary Kriging

vario = variogram(log(zinc) ~ 1, points) fitted = fit.variogram(vario, vgm(1, "Sph", 300, 1)) plot(vario, model=fitted)

gstat_model = gstat(formula = log(zinc) ~ 1, locations = points, model = fitted) kriged = interpolate(template, gstat_model) plot(basemap) palette = colorRampPalette(c("#a6611ac0", "white", "#018571c0"), alpha=T)(256) plot(projectRaster(kriged, projectExtent(kriged, osm())), col=palette, add=T)

### Universal Kriging

Universal Kriging incorporates a model based on another spatial variable. In this case, zinc as a function of elevation.

vario = variogram(log(zinc) ~ elev, points) fitted = fit.variogram(vario, vgm(1, "Sph", 300, 1)) names(template) = 'elev' gstat_model = gstat(formula = log(zinc) ~ elev, locations = points, model = fitted) kriged = interpolate(template, gstat_model, xyOnly = F) plot(basemap) palette = colorRampPalette(c("#a6611ac0", "white", "#018571c0"), alpha=T)(256) plot(projectRaster(kriged, projectExtent(kriged, osm())), col=palette, add=T)

### Co-Kriging

Co-Kriging involves performing kriging on multiple variables to build a model based on covariance with other variables. This model has the clearest definition of the group.

model = gstat(formula = log(zinc) ~ 1, data = points) model = gstat(g = model, formula = elev ~ 1, data = points) model = gstat(g = model, formula = cadmium ~ 1, data = points) model = gstat(g = model, formula = copper ~ 1, data = points) covario = variogram(model) cofitted = fit.lmc(covario, model, vgm(model = 'Sph', range = 1000)) cokriged = interpolate(template, cofitted) plot(basemap) palette = colorRampPalette(c("#a6611ac0", "white", "#018571c0"), alpha=T)(256) plot(projectRaster(cokriged, projectExtent(cokriged, osm())), col=palette, add=T)