1 Setup

Make sure your have rgdal, raster and igraph installed:

library('raster')
library('rgdal')

2 Raster data objects: raster

The sp package supports raster (grid) data with the SpatialGridDataFrame and SpatialPixelsDataFrame classes. However, the raster package provides more specialized functionalities to raster data. Similar to the sp package, the raster package provides a number of classes and a number of functionalities to operate and interact with these classes. The RasterLayer, RasterStack and RasterBrick classes are the most important.

Whereas sp mainly focuses on the classes itself, the raster package has functions for creating, reading, manipulating, and writing raster data. The package provides, among other things, general raster data manipulation functions that can easily be used to develop more specific functions.

REMEMBER:

When required, the conversion from Spatial*** raster representations to the raster package raster representations is just a single command raster(sp_raster_variable_name) away.

3 Raster representations

A 2D matrix is actually also a representation of a GIS raster, as it provides the possibility to do element-wise calculations:

example_matrix <- matrix(1:6, nrow = 2, ncol = 3)
example_matrix
##      [,1] [,2] [,3]
## [1,]    1    3    5
## [2,]    2    4    6

Hence, a reclassification of the raster data can be achieved as follows:

example_matrix[example_matrix >= 3]  <- 3.
example_matrix[example_matrix < 3]  <- 0.
example_matrix
##      [,1] [,2] [,3]
## [1,]    0    3    3
## [2,]    0    3    3

However, for GIS raster representations, the classes provided by the raster package provide these type of operations while having the spatial extent and the projection integrated as well.

3.1 RasterLayer

A RasterLayer object represents single-layer (variable) raster data. A RasterLayer object always stores a number of fundamental parameters that describe it:

  • the number of columns and rows (resolution)
  • the spatial extent (cfr. bbox)
  • the Coordinate Reference System (CRS)

To create a RasterLayer manually, provide the required minimal information:

# specify the RasterLayer with the following parameters:
# - minimum x coordinate (left border)
# - minimum y coordinate (bottom border)
# - maximum x coordinate (right border)
# - maximum y coordinate (top border)
# - resolution (cell size) in each dimension
r_example <- raster(xmn = 4.42, ymn = 50.9, xmx = 5.4, ymx = 51.4, 
            resolution = c(0.1, 0.1))
r_example
## class       : RasterLayer 
## dimensions  : 5, 10, 50  (nrow, ncol, ncell)
## resolution  : 0.1, 0.1  (x, y)
## extent      : 4.42, 5.42, 50.9, 51.4  (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0

From the output we learn that with the default parameters, the CRS is defined here in degrees. However, when it would not make sense, the CRS is default put to NA:

raster(ncol = 36, nrow = 18, xmn = -1000, xmx = 1000, ymn = -100, ymx = 900)
## class       : RasterLayer 
## dimensions  : 18, 36, 648  (nrow, ncol, ncell)
## resolution  : 55.55556, 55.55556  (x, y)
## extent      : -1000, 1000, -100, 900  (xmin, xmax, ymin, ymax)
## coord. ref. : NA

Consider the example r_example defined in degrees and extract some additional information:

class(r_example)
## [1] "RasterLayer"
## attr(,"package")
## [1] "raster"

Indeed, we created a single layer raster layer…

res(r_example)
## [1] 0.1 0.1

with a resolution of 0.1 degrees, which we can change:

res(r_example) <- 0.05
r_example
## class       : RasterLayer 
## dimensions  : 10, 20, 200  (nrow, ncol, ncell)
## resolution  : 0.05, 0.05  (x, y)
## extent      : 4.42, 5.42, 50.9, 51.4  (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0

However, the data currently only defines the skeleton of a raster data set. That is, it knows about its location, resolution, etc., but there are no values associated with it:

hasValues(r_example)
## [1] FALSE

Let’s provide some random numbers from a uniform distribution:

r_example <- setValues(r_example, runif(200, min = 1., max = 10.))
r_example
## class       : RasterLayer 
## dimensions  : 10, 20, 200  (nrow, ncol, ncell)
## resolution  : 0.05, 0.05  (x, y)
## extent      : 4.42, 5.42, 50.9, 51.4  (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0 
## data source : in memory
## names       : layer 
## values      : 1.00307, 9.942678  (min, max)
plot(r_example)

These values can be sliced, just as we would do with vectors and matrices:

values(r_example)[1:10]
##  [1] 8.434002 1.673186 2.903345 8.621475 8.962932 6.403370 4.129534
##  [8] 8.900143 4.838132 6.870831

An important feature of the raster package is the option to work inMemory or not:

inMemory(r_example)
## [1] TRUE

This defines the data itself is stored in the working memory of your computer. For manually created RasterLayers, the data will be in memory. However, when accessing data from files, raster support data access without actually loading all the data in memory.

REMEMBER:

A RasterLayer objects can be created from scratch, a file, an Extent object, a matrix, an ‘image’ object, or from a Raster*, Spatial***,… object

A useful functionality is the conversion between XYZ columns spatial data representation and a raster representation for points on a regular grid:

xyz_example <- rasterToPoints(r_example)
head(xyz_example)
##          x      y    layer
## [1,] 4.445 51.375 8.434002
## [2,] 4.495 51.375 1.673186
## [3,] 4.545 51.375 2.903345
## [4,] 4.595 51.375 8.621475
## [5,] 4.645 51.375 8.962932
## [6,] 4.695 51.375 6.403370
plot(rasterFromXYZ(xyz_example))

However, in many occasions these XYZ combination are not provided on a regular grid. See the rasterize function for points that are not on a regular grid. An example of points that are not on a regular grid:

longitude <- c(4.4543, 5.02789, 4.94202, 4.49238, 4.49054, 4.54044, 5.95192, 
               4.49496, 4.4958, 5.68327, 4.49054, 4.49054, 4.4958, 4.48938)
latitude <- c(51.33847, 51.24824, 51.24325, 51.26218, 51.25701, 51.27518, 
              51.30803, 51.25803, 51.26106, 51.04185, 51.25701, 51.25701, 
              51.26106, 51.25955)
lonlat <- cbind(longitude, latitude)
head(lonlat)
##      longitude latitude
## [1,]   4.45430 51.33847
## [2,]   5.02789 51.24824
## [3,]   4.94202 51.24325
## [4,]   4.49238 51.26218
## [5,]   4.49054 51.25701
## [6,]   4.54044 51.27518

Exercise:

Check the documentation of the rasterize command provided by the raster package and convert the lonlat set of irregular points to a regular grid with a resolution of 0.1 degrees and an appropriate spatial extent (e.g. Belgium). Make sure that the resulting raster values represent the number of points per grid cell. Make a plot of the resulting raster.

The output should look like:

3.2 RasterStack and RasterBrick

Single layer raster objects are very common. Still, a collection of rasters with the same spatial extent and resolution is a common case when doing spatial analysis (combining rasters with representing each a variable). In fact, a RasterStack/RasterBrick is a collection of RasterLayer objects with the same spatial extent and resolution.

The main difference between RasterStack and RasterBrick is that a RasterStack is a loose collection of RasterLayer objects that can refer to different files (but must all have the same extent and resolution), whereas a RasterBrick can only point to a single file. A typical example of a RasterBrick object is the representation of a multi-band satellite image.

As an example, create a dummy RasterStack with 3 alternatives of the r_example bundled together:

s_example <- stack(r_example, r_example**2, r_example/2.)
plot(s_example)

When checking the properties

s_example
## class       : RasterStack 
## dimensions  : 10, 20, 200, 3  (nrow, ncol, ncell, nlayers)
## resolution  : 0.05, 0.05  (x, y)
## extent      : 4.42, 5.42, 50.9, 51.4  (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0 
## names       :   layer.1,   layer.2,   layer.3 
## min values  : 1.0030703, 1.0061501, 0.5015352 
## max values  :  9.942678, 98.856836,  4.971339

Make sure you notice the additional dimension in the description (nlayers).

3.3 Projection

To transform a RasterLayer to another CRS you can use the function projectRaster. However, whereas the transformation of coordinates (vector formats) is rather trivial, the re-projection of a raster representation can result in a loss of precision and estimates for the values of new cells must be made based on the values in the old cells. If the values are class data, the nearest neighbor is commonly used. Otherwise some sort of interpolation (e.g. bilinear) is required.

An introduction to the CRS class definition to specify the coordinate reference system is provided in the 07-gis-r-vectors.Rmd notebook. The definition of the CRS object is completely similar:

crs_lambert <- CRS("+init=epsg:31370") 

Using the CRS, the re-projection with the projectRaster:

r_reproject <- projectRaster(r_example, crs = crs_lambert)
r_reproject
## class       : RasterLayer 
## dimensions  : 12, 24, 288  (nrow, ncol, ncell)
## resolution  : 3500, 5560  (x, y)
## extent      : 146566.1, 230566.1, 171613.3, 238333.3  (xmin, xmax, ymin, ymax)
## coord. ref. : +init=epsg:31370 +proj=lcc +lat_1=51.16666723333333 +lat_2=49.8333339 +lat_0=90 +lon_0=4.367486666666666 +x_0=150000.013 +y_0=5400088.438 +ellps=intl +towgs84=-106.8686,52.2978,-103.7239,0.3366,-0.457,1.8422,-1.2747 +units=m +no_defs 
## data source : in memory
## names       : layer 
## values      : 0.9594252, 9.641255  (min, max)

However, using the default parameters, we do not have control on the spatial resolution (dimensions) of the resulting reprojected raster. Setting the resolution provides more control:

r_reproject <- projectRaster(r_example, crs = crs_lambert, 
                             res = 5000)
r_reproject
## class       : RasterLayer 
## dimensions  : 13, 16, 208  (nrow, ncol, ncell)
## resolution  : 5000, 5000  (x, y)
## extent      : 148566.1, 228566.1, 172773.3, 237773.3  (xmin, xmax, ymin, ymax)
## coord. ref. : +init=epsg:31370 +proj=lcc +lat_1=51.16666723333333 +lat_2=49.8333339 +lat_0=90 +lon_0=4.367486666666666 +x_0=150000.013 +y_0=5400088.438 +ellps=intl +towgs84=-106.8686,52.2978,-103.7239,0.3366,-0.457,1.8422,-1.2747 +units=m +no_defs 
## data source : in memory
## names       : layer 
## values      : 1.346314, 9.209351  (min, max)

But is generally advised to project a raster to another raster object. By providing an existing RasterLayer object, your newly projected data perfectly aligns with it.

# Create a base_layer to project data to
base_raster <- raster(ncol = 20, nrow = 10, 
                      xmn = 153566.1, xmx = 223869.7, 
                      ymn = 176630, ymx = 232773.3)
crs(base_raster) <- crs_lambert
# reproject to the new raster
r_reproject <- projectRaster(r_example, base_raster)
plot(r_reproject)

Notice:

The function projectExtent is a great utility to only project the extent of the data and retrieve similar boundaries in the new CRS.

4 Reading data: rgdal or raster

First unzip the .tif example data in the data file to the scratch folder:

unzip('../data/NE1_50m_SR.zip', exdir = "../scratch")

Note that in most cases where real data is analyzed, these raster* objects are created from a file. The functionality to read raster data is provided by the readGDAL function of the rgdal package:

r_data = readGDAL("../scratch/NE1_50M_SR/NE1_50M_SR.tif")
## ../scratch/NE1_50M_SR/NE1_50M_SR.tif has GDAL driver GTiff 
## and has 5400 rows and 10800 columns
class(r_data)
## [1] "SpatialGridDataFrame"
## attr(,"package")
## [1] "sp"

which results in a SpatialGridDataFrame.

Actually, the set of data formats you can interact with, does not depend on R, but is dependent on your GDAL installation. To check if the installation supports a specific raster data format, you can get an overview of them by the gdalDrivers() command:

head(gdalDrivers()) # Just the first 6 records are shown here

Notice:

Most formats supported for reading can also be written to. This is supported by both writeGDAL (rgdal package) and writeRaster (raster package). Furthermore, for large rasters, the writeValues function is useful as well, as it supports writing the data in chunks.

Still, the raster command of the raster package can also directly read raster files in several formats, also relying on the rgdal package (and the underlying GDAL driver) behind the scenes.

Let’s check the space this SpatialGridDataFrame requires in memory:

print(object.size(r_data) , units = "auto")
## 667.4 Mb

and converting to a RasterLayer with the raster command:

r_data <- raster(r_data)
class(r_data)
## [1] "RasterLayer"
## attr(,"package")
## [1] "raster"

Let’s check again the space this requires in memory after conversion to a raster object:

print(object.size(r_data) , units = "auto")
## 222.5 Mb

Hence, for this type of raster GIS data, the raster representation is less memory demanding. Moreover, when reading these files directly with the raster command, the package will not load the data values in memory, while extracting the spatial information (extent, crs, dimensions):

r_data_raster <- raster("../scratch/NE1_50M_SR/NE1_50M_SR.tif")
r_data_raster
## class       : RasterLayer 
## band        : 1  (of  3  bands)
## dimensions  : 5400, 10800, 58320000  (nrow, ncol, ncell)
## resolution  : 0.03333333, 0.03333333  (x, y)
## extent      : -180, 180, -90, 90  (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0 
## data source : C:\Users\stijn_vanhoey\Documents\GitHub\course_gis_scripting\scratch\NE1_50M_SR\NE1_50M_SR.tif 
## names       : NE1_50M_SR 
## values      : 0, 255  (min, max)
inMemory(r_data_raster)
## [1] FALSE

In the case of multi-layer files, the raster function reads by default the first layer only. For multi-layer objects (e.g. multi-layer GeoTIFF), the brick or stack functions can be used to read data from file:

landstat_example <- brick('../data/LE71700552001036SGS00_SR_Gewata_INT1U.tif')
landstat_example
## class       : RasterBrick 
## dimensions  : 593, 653, 387229, 6  (nrow, ncol, ncell, nlayers)
## resolution  : 30, 30  (x, y)
## extent      : 829455, 849045, 825405, 843195  (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=utm +zone=36 +ellps=WGS84 +units=m +no_defs 
## data source : C:\Users\stijn_vanhoey\Documents\GitHub\course_gis_scripting\data\LE71700552001036SGS00_SR_Gewata_INT1U.tif 
## names       : LE7170055//ta_INT1U.1, LE7170055//ta_INT1U.2, LE7170055//ta_INT1U.3, LE7170055//ta_INT1U.4, LE7170055//ta_INT1U.5, LE7170055//ta_INT1U.6 
## min values  :                     4,                     6,                     3,                    18,                     6,                     2 
## max values  :                    39,                    56,                    71,                   102,                   138,                   408
plot(landstat_example)

(Overview about the landstat bands/layers wavelengths is given here )

REMEMBER:

For reading in raster file data, try with the raster command first. If this is not successful, check if the readGDAL function can read it.

Exercise:

In the data folder, an Arc/Info Binary Grid is provided, called grnt_bodem. read in the data with the variable-name grnt_bodem and make a plot using the bpy.colors color scale.

(In the case your GDAL driver does not support Arc/Info Binary Grid, read in the grote_nete_bodem.tif file in the data folder)

The output should look like:

5 Raster data manipulation

5.1 Raster algebra

Many generic functions that allow for simple and elegant raster algebra have been implemented for Raster* objects, including the normal algebraic operators such as {}, logical operators such as >, >=, <, ==, ! and functions like abs, round, ceiling, floor, trunc, sqrt, log, log10, exp, cos, sin, atan, tan, max, min, range, prod, sum, any, all. In these functions you can mix raster objects with numbers, as long as the first argument is a raster object.

plot(sqrt(r_example + 10)/2, 
     col = heat.colors(20))

Again, also boolean indexing (conditional replacement) is provided, which supports the reclassification of a raster data set:

r_example[r_example >= 3]  <- 3.
r_example[r_example < 3]  <- 0.
plot(r_example)

Notice:

The reclassify function provides the same functionality, but as a higher level function. For larger reclassify operations (many classes), the application of reclassify is worthwhile to use.

Exercise:

In the data folder, a text file systemtable_example.txt is available containing the information of a reclassification of the grnt_bodem variable. Read the text-file as a data.frame and save the content as variable class_table. Use the class_table to reclassify the grnt_bodem raster. Plot the output with any colors you would like to use.

The output could look like (depends on your color preferences):

To decide about the color you could use for maps, the colorbrewer website is a great start. You can copy/paste the color codes from the website itself, but the package scales provides the brewer_pal function, providing direct load of the colors as a colormap:

plot(grnt_bodem_reclassified, breaks = c(0, 1, 2, 3), 
     col = scales::brewer_pal(palette = "Greens")(3))

To perform calculations between the individual layers of a RasterBrick or RasterStack, the expression should refer to the individual layers of the object. Referring to individual layers in a RasterBrick or RasterStack object is done by using double square brackets [[]]. As an example, the calculation of the NDVI:

\[NDVI = \frac{NIR - Red}{NIR + Red}\]

ndvi <- (landstat_example[[4]] - landstat_example[[3]]) / (landstat_example[[4]] + landstat_example[[3]])
plot(ndvi)

Although this is a quick way to perform the calculation, directly adding, subtracting, multiplying, etc, the layers of big raster objects is not recommended. When working with big objects, it is advisable to use the calc() function to perform these types of calculations. The reason is that R needs to load all the data first into its internal memory before performing the calculation and then runs everything in one block. It is really easy to run out of memory when doing that. A big advantage of the calc() function is that it has a built-in block processing option for any vectorized function, allowing such calculations to be fully “RAM friendly”. The example below illustrates how to calculate NDVI from the same date set using the calc() function.

## Define the function to calculate NDVI from 
ndvi.calc <- function(x) {
    ndvi <- (x[[4]] - x[[3]]) / (x[[4]] + x[[3]])
    return(ndvi)
}
ndvi2 <- calc(x = landstat_example, fun = ndvi.calc)
plot(ndvi2)

5.2 High-level functions

Several high level functions (i.e. typical GIS software functions) have been implemented for RasterLayer objects. Examples are contour(), focal() (moving window), clump() (detect patches of connected cells), zonal() (zonal statistics), terrain() (slope, aspect and other terrain characteristics from dem), (dis)aggregate (change resolution)… See the help files for more detailed descriptions of each function. (also calc is actually a high level function).

REMEMBER:

These functions work equally well for raster data sets that cannot be loaded into memory!

As an example, converting the grnt_bodem raster to a binary mask with 0/1 values can be achieved by the clump function:

plot(clump(grnt_bodem), col = c("darkgreen"))

Another GIS function worthwhile to mention, is the clipping if a raster with a vector layer, which is provided by the function mask:

library(rgdal)
deelbekkens <- readOGR("../data/deelbekkens/Deelbekken.shp")
## OGR data source with driver: ESRI Shapefile 
## Source: "C:\Users\stijn_vanhoey\Documents\GitHub\course_gis_scripting\data\deelbekkens\Deelbekken.shp", layer: "Deelbekken"
## with 102 features
## It has 8 fields
## Integer64 fields read as strings:  UIDN OIDN
crs_wgs84 <- CRS("+init=epsg:4326") 
deelbekkens_wgs84 <- spTransform(deelbekkens, crs_wgs84)
r_data_crop <- crop(r_data, deelbekkens_wgs84)
r_data_mask <- mask(r_data_crop, deelbekkens_wgs84)
plot(r_data_mask)

Notice:

The gdalwarp functionality to crop a vector file can also be executed directly with GDAL from the command line. As such, similar to the subprocess trick in Python (05-the-power-of-gdal.ipynb) to run a GDAL command within Python as if it would be on the command line, this can be achieved by using system() (instead of subprocess).

6 Mapping raster data

Actually, the plot function provided by the raster package (find specific help by executing ?raster::plot in the console) provides a lot of functionality. Also other plot functions from the system R will work on the Raster* objects:

hist(r_example)

For multi-band files, the plotRGB provides the option to plot false color composites:

plotRGB(landstat_example, r = 5, g = 4, b = 3)

Notice:

Another interesting package, is the rasterVis package. The package provides a set of methods for enhanced visualization and interaction. Specifically the space-time plots are well designed.