KrigR Workshop - Introduction to the R Package

Climate Data for your Study Needs

Recording Availability

A recording of me presenting much of the contents herein can be found on YouTube.

Setting Things Up

Installing KrigR

First of all, we need to install KrigR. Since we are currently preparing the package for submission to CRAN, it is only available through the associated GitHub repository (https://github.com/ErikKusch/KrigR) for the time being. There, you may also find presentations as well as this guide/workshop.

Here, we use the devtools package within R to get access to the install_github() function. For this to run, we need to tell R to not stop the installation from GitHub as soon as a warning is produced. This would stop the installation process as early as one of the KrigR dependencies hits a warning as benign as “Package XYZ was built under R Version X.X.X” which can usually be ignored safely.

Sys.setenv(R_REMOTES_NO_ERRORS_FROM_WARNINGS = "true")
devtools::install_github("https://github.com/ErikKusch/KrigR")
library(KrigR)

For the sake of this tutorial, we need some extra packages for visualisations. We check if they are already installed, subsequently install (if necessary) and load them with this user-defined function:

install.load.package <- function(x) {
  if (!require(x, character.only = TRUE)) {
    install.packages(x, repos = "http://cran.us.r-project.org")
  }
  require(x, character.only = TRUE)
}
package_vec <- c(
  "tidyr", # for turning rasters into ggplot-dataframes
  "ggplot2", # for plotting
  "viridis", # colour palettes
  "cowplot", # gridding multiple plots
  "rosm", # obtaining satellite maps
  "gimms" # to get some pre-existing data to match in our downscaling
)
sapply(package_vec, install.load.package)
##   tidyr ggplot2 viridis cowplot    rosm   gimms 
##    TRUE    TRUE    TRUE    TRUE    TRUE    TRUE

Before we can proceed, you need to open up an account at the CDS and create an API key by following this link: https://cds.climate.copernicus.eu/api-how-to. This is required so that you may issue download requests through the KrigR package. Once you have done so, please register the user ID and API Key as characters as seen below:

API_User <- "XXXXXX"
API_Key <- "XXXXXXX"

Setting Up Directories

The tutorial is designed to run completely from scratch. For this to work in a structured way, we create a folder/directory structure so that we got some nice compartments on our hard drives for where each separate Kriging process is run. We create the following directories:
- A data directory for all of our individual Kriging processes
- A shapefile directory (located within our data directory) for all of the shapefiles that we will use

Dir.Base <- getwd() # identifying the current directory
Dir.Data <- file.path(Dir.Base, "Data") # folder path for data
Dir.Shapes <- file.path(Dir.Data, "Shapes") # folder path for shapefiles
Dirs <- sapply(c(Dir.Data, Dir.Shapes), function(x) if (!dir.exists(x)) dir.create(x))

Downloading Shapefiles

Here, we download some of the most commonly used shapefiles (for our analyses using KrigR so far, at least). For repeat code-sourcing, we have implemented checks which only trigger the download of a given shapefile set if the data in question is not present in our Shapes directory yet.

This tutorial doesn’t make use of all the shapefiles we download here. We simply thought it prudent to show you what we have found to be useful and how to get your hands on the data in a reproducible way.

Land Cover

As we will see in this tutorial, masking out unnecessary areas from our analyses speeds up Kriging tremendously. Often, this will entail limiting data sets to terrestrial habitats. This land mask here does a terrific job at masking out non-land areas.

#### LAND MASK (for masking covariates according to land vs. sea)
if (!file.exists(file.path(Dir.Shapes, "LandMask.zip"))) { # if not downloaded yet
  download.file("https://www.naturalearthdata.com/http//www.naturalearthdata.com/download/10m/physical/ne_10m_land.zip",
    destfile = file.path(Dir.Shapes, "LandMask.zip")
  ) # download cultural vector
  unzip(file.path(Dir.Shapes, "LandMask.zip"), exdir = Dir.Shapes) # unzip data
}
LandMask <- readOGR(Dir.Shapes, "ne_10m_land", verbose = FALSE) # read
ggplot() +
  geom_polygon(data = LandMask, aes(x = long, y = lat, group = group), colour = "darkred", fill = "black") +
  theme_bw() +
  labs(x = "Longitude", y = "Latitude")

States & Provinces

Political divisions are often what policy makers are after. Hence, we also download a shapefile for states and provinces across the Globe.

#### STATE MASK (for within-nation borders)
if (!file.exists(file.path(Dir.Shapes, "StateMask.zip"))) { # if not downloaded yet
  download.file("https://www.naturalearthdata.com/http//www.naturalearthdata.com/download/10m/cultural/ne_10m_admin_1_states_provinces.zip",
    destfile = file.path(Dir.Shapes, "StateMask.zip")
  ) # download cultural vector
  unzip(file.path(Dir.Shapes, "StateMask.zip"), exdir = Dir.Shapes) # unzip data
}
StateMask <- readOGR(Dir.Shapes, "ne_10m_admin_1_states_provinces", verbose = FALSE) # read
ggplot() +
  geom_polygon(data = StateMask, aes(x = long, y = lat, group = group), colour = "darkred", fill = "black") +
  theme_bw() +
  labs(x = "Longitude", y = "Latitude")

As you can see, this shapefile provides a set of rather small (geographically speaking) regions. We will use these for showing how Kriging works within this tutorial because Kriging runs in a timely manner for these regions.

Additionally, it is worth pointing out that www.naturalearthdata.com offers a large host of further shapefile data sets including (but not limited to):
- National borders
- Rivers and Lakes
- Urban areas
- Reefs and Bathymetry

We strongly recommend looking there early on in your projects for any shapefile needs you may have.

Ecoregions

When we’re not bound by political boundaries, ecoregions can often help to limit the spatial scope of our macroecological studies to manageable sizes (as far as Kriging effort is concerned).

#### ECOREGIONS (for ecoregional borders)
if (!file.exists(file.path(Dir.Shapes, "WWF_ecoregions"))) { # if not downloaded yet
  download.file("http://assets.worldwildlife.org/publications/15/files/original/official_teow.zip",
    destfile = file.path(Dir.Shapes, "wwf_ecoregions.zip")
  ) # download regions
  unzip(file.path(Dir.Shapes, "wwf_ecoregions.zip"), exdir = file.path(Dir.Shapes, "WWF_ecoregions")) # unzip data
}
EcoregionMask <- readOGR(file.path(Dir.Shapes, "WWF_ecoregions", "official", "wwf_terr_ecos.shp"), verbose = FALSE) # read
ggplot() +
  geom_polygon(data = EcoregionMask, aes(x = long, y = lat, group = group), colour = "darkred", fill = "black") +
  theme_bw() +
  labs(x = "Longitude", y = "Latitude")

This particular data set offers the shapes of biomes and biogeographical realms across the Earth. As far as Kriging with the KrigR package is concerned, we heavily advise against Kriging using biogeographical realms without further consideration since these large regions lead to extreme processing requirements.

Plotting Functions

In order to easily visualise our Kriging procedure including (1) inputs, (2) covariates, and (3) outputs without repeating too much of the same code, we create a few plotting functions of our own here.

Don’t worry about understanding how these work off the bat here. Kriging and the package KrigR are what we want to demonstrate here - not visualisation strategies.

Raw Data

This function plots the raw data that we will krig and exports a single plot of all layers of the input raster:

Plot_Raw <- function(Raw, Shp = NULL, Dates, Legend = "Air Temperature [K]") {
  Raw_df <- as.data.frame(Raw, xy = TRUE) # turn raster into dataframe
  colnames(Raw_df)[c(-1, -2)] <- Dates # set colnames
  Raw_df <- gather(data = Raw_df, key = Values, value = "value", colnames(Raw_df)[c(-1, -2)]) #  make ggplot-ready
  Raw_plot <- ggplot() + # create a plot
    geom_raster(data = Raw_df, aes(x = x, y = y, fill = value)) + # plot the raw data
    facet_wrap(~Values) + # split raster layers up
    theme_bw() +
    labs(x = "Longitude", y = "Latitude") + # make plot more readable
    scale_fill_gradientn(name = Legend, colours = inferno(100)) # add colour and legend
  if (!is.null(Shp)) { # if a shape has been designated
    Raw_plot <- Raw_plot + geom_polygon(data = Shp, aes(x = long, y = lat, group = group), colour = "black", fill = "NA") # add shape
  }
  return(Raw_plot)
} # export the plot

Covariates

This function plots the covariate data we will be using by showing each variable on both resolution levels side-by-side:

Plot_Covs <- function(Covs, Shp = NULL) {
  Plots_ls <- as.list(rep(NA, nlayers(Covs[[1]]) * 2)) # create as many plots as there are covariates variables * 2
  Counter <- 1 # counter for list position
  for (Variable in 1:nlayers(Covs[[1]])) { # loop over all covariate variables
    Covs_Iter <- list(Covs[[1]][[Variable]], Covs[[2]][[Variable]]) # extract the data for this variable
    for (Plot in 1:2) { # loop over both resolutions for the current variable
      Cov_df <- as.data.frame(Covs_Iter[[Plot]], xy = TRUE) # turn raster into data frame
      colnames(Cov_df)[3] <- "Values" # assign a column name to the data column
      Plots_ls[[Counter]] <- ggplot() + # create plot
        geom_raster(data = Cov_df, aes(x = x, y = y, fill = Values)) + # plot the covariate data
        theme_bw() +
        labs(x = "Longitude", y = "Latitude") + # make plot more readable
        scale_fill_gradientn(name = names(Covs_Iter[[Plot]]), colours = cividis(100)) + # add colour and legend
        theme(plot.margin = unit(c(0, 0, 0, 0), "cm")) # reduce margins (for fusing of plots)
      if (!is.null(Shp)) { # if a shape has been designated
        Plots_ls[[Counter]] <- Plots_ls[[Counter]] + geom_polygon(data = Shp, aes(x = long, y = lat, group = group), colour = "black", fill = "NA") # add shape
      }
      Counter <- Counter + 1 # raise list counter
    } # end of resolution loop
  } # end of variable loop
  ggPlot <- plot_grid(plotlist = Plots_ls, ncol = 2, labels = "AUTO") # fuse the plots into one big plot
  return(ggPlot)
} # export the plot

Kriged Products

This function plots the Kriging outputs by splitting them up according to whether they are Kriging predictions or the uncertainties associated with them:

Plot_Krigs <- function(Krigs, Shp = NULL, Dates, Legend = "Air Temperature [K]") {
  Type_vec <- c("Prediction", "Standard Error") # these are the output types of krigR
  Colours_ls <- list(inferno(100), rev(viridis(100))) # we want separate colours for the types
  Plots_ls <- as.list(NA, NA) # this list will be filled with the output plots
  for (Plot in 1:2) { # loop over both output types
    Krig_df <- as.data.frame(Krigs[[Plot]], xy = TRUE) # turn raster into dataframe
    colnames(Krig_df)[c(-1, -2)] <- paste(Type_vec[Plot], Dates) # set colnames
    Krig_df <- gather(data = Krig_df, key = Values, value = "value", colnames(Krig_df)[c(-1, -2)]) # make ggplot-ready
    Plots_ls[[Plot]] <- ggplot() + # create plot
      geom_raster(data = Krig_df, aes(x = x, y = y, fill = value)) + # plot the kriged data
      facet_wrap(~Values) + # split raster layers up
      theme_bw() +
      labs(x = "Longitude", y = "Latitude") + # make plot more readable
      scale_fill_gradientn(name = Legend, colours = Colours_ls[[Plot]]) + # add colour and legend
      theme(plot.margin = unit(c(0, 0, 0, 0), "cm")) # reduce margins (for fusing of plots)
    if (!is.null(Shp)) { # if a shape has been designated
      Plots_ls[[Plot]] <- Plots_ls[[Plot]] + geom_polygon(data = Shp, aes(x = long, y = lat, group = group), colour = "black", fill = "NA") # add shape
    }
  } # end of type-loop
  ggPlot <- plot_grid(plotlist = Plots_ls, ncol = 1, labels = "AUTO") # fuse the plots into one big plot
  return(ggPlot)
} # export the plot

Kriging - The Three Steps

Using KrigR in this way has us use the functions download_ERA(), download_DEM(), and krigR(). Running these functions by themselves as opposed to executing the KrigR pipeline (which we cover later in this tutorial) gives you the most control and oversight of the KrigR workflow.

We will start with a simple Kriging process and subsequently make it more sophisticated during this tutorial.

The most simple way in which you can run the functions of the KrigR package is by specifying a rectangular bounding box (i.e. an extent) to specify your study region(s). Here, we will run a small downscaling exercise of my birth-state of Saxony. It is a medium-sized state in the east of Germany and lends itself to some nice statistical downscaling since it presents us with mountainous regions along the south-eastern border and flatland areas towards the north-western edges.

Here’s the full area we will be obtaining and downscaling data for:

Extent <- extent(c(11.8, 15.1, 50.1, 51.7)) # roughly the extent of Saxony
bmaps.plot(bbox = Extent, type = "AerialWithLabels", quiet = TRUE, progress = "none")

Here, you can see a map of the region with the mountains of the Erzgebirge along the southern border to the Czech Republic as well as the flat terrains around Leipzig in the north of Saxony.

Now, let’s create a separate directory within which we will store the raw ERA5-land data, GMTED2010 DEM covariate data, and Kriging outputs:

Dir.StateExt <- file.path(Dir.Data, "State_Extent")
dir.create(Dir.StateExt)

Climate Data

No we are ready to carry out our first download of climate reanalysis data through the KrigR package!

For this part of the tutorial, we download air temperature for a three-day interval around my birthday (03-01-1995) using the extent of my home-state of Saxony.

Notice that the downloading of ERA-family reanalysis data may take a short while to start as the download request gets queued with the CDS of the ECMWF before it is executed.

State_Raw <- download_ERA(
  Variable = "2m_temperature",
  DataSet = "era5-land",
  DateStart = "1995-01-02",
  DateStop = "1995-01-04",
  TResolution = "day",
  TStep = 1,
  Extent = Extent,
  Dir = Dir.StateExt,
  API_User = API_User,
  API_Key = API_Key
)
## [1] "donwload_ERA() is starting. Depending on your specifications, this can take a significant time."
## [1] "Staging 1 downloads sequentially."
## [1] "0001_2m_temperature_1995-01-02_1995-01-04_day.nc download queried"
## 
  |                                                                                                                                                                                                    
  |                                                                                                                                                                                              |   0%
  |                                                                                                                                                                                                    
  |===============                                                                                                                                                                               |   8%
  |                                                                                                                                                                                                    
  |==============================                                                                                                                                                                |  16%
  |                                                                                                                                                                                                    
  |==============================================                                                                                                                                                |  24%
  |                                                                                                                                                                                                    
  |==============================================================                                                                                                                                |  32%
  |                                                                                                                                                                                                    
  |=============================================================================                                                                                                                 |  41%
  |                                                                                                                                                                                                    
  |=============================================================================================                                                                                                 |  49%
  |                                                                                                                                                                                                    
  |============================================================================================================                                                                                  |  57%
  |                                                                                                                                                                                                    
  |============================================================================================================================                                                                  |  65%
  |                                                                                                                                                                                                    
  |=======================================================================================================================================                                                       |  71%
  |                                                                                                                                                                                                    
  |=======================================================================================================================================================                                       |  79%
  |                                                                                                                                                                                                    
  |======================================================================================================================================================================                        |  88%
  |                                                                                                                                                                                                    
  |======================================================================================================================================================================================        |  96%
  |                                                                                                                                                                                                    
  |==============================================================================================================================================================================================| 100%
## [1] "Checking for known data issues."
## [1] "Loading downloaded data for masking and aggregation."
## [1] "Aggregating to temporal resolution of choice"
Plot_Raw(State_Raw, Dates = c("02-01-1995", "03-01-1995", "04-01-1995"))

As you can see the download_ERA() function updates you on what it is currently working on at each major step. I implemented this to make sure people don’t get too anxious staring at an empty console in R. If this feature is not appealing to you, you can turn this progress tracking off by setting verbose = FALSE in the function call to download_ERA(). For the rest of this workshop, I supress all output from download_ERA() via other means so that when you execute the code of this material by yourself, you get progress tracking.

Now, let’s look at the raster that was produced:

State_Raw
## class      : RasterBrick 
## dimensions : 19, 36, 684, 3  (nrow, ncol, ncell, nlayers)
## resolution : 0.1, 0.09999996  (x, y)
## extent     : 11.65, 15.25, 49.95, 51.85  (xmin, xmax, ymin, ymax)
## crs        : +proj=longlat +datum=WGS84 +no_defs 
## source     : memory
## names      :  index_1,  index_2,  index_3 
## min values : 267.8741, 266.8182, 262.5400 
## max values : 273.2644, 272.0682, 269.5377

As you can see, we obtained a RasterBrick object with 3 layers of data (one for each day we are interested in). Notice that extent of our downloaded data set does not fit the extent we set earlier manually. This is a precaution we have taken within KrigR to make sure that all data cells you are interested in are covered.

KrigR widens the spatial extent that is specified to ensure full coverage of the respective ERA5(-Land) raster cells. Global downloads are not affected by this and work just as you’d expect.

Additionally, as could be gleamed from the plots above, it is apparent that my home-state got a lot cooler on the day after my birth (04/01/1995) when compared to the two preceding days. What to make of this, I leave up to you because I don’t think I like the interpretation myself if we were to assume causality here.

Keep in mind that every function within the KrigR package produces NetCDF (.nc) files in the specified directory (Dir argument in the function call) to allow for further manipulation outside of R if necessary (for example, using Panoply).

Covariates

Next, we use the download_DEM() function which comes with KrigR to obtain elevation data as our covariate of choice. This produces two rasters:

  1. A raster of training resolution which matches the input data in all attributes except for the data in each cell
  2. A raster of target resolution which matches the input data as closely as possible in all attributes except for the resolution (which is specified by the user) and the data in each cell

Both of these products are bundled into a list where the first element corresponds to the training resolution and the second element contains the target resolution covariate data. Here, we specify a target resolution of .02.

Covs_ls <- download_DEM(
  Train_ras = State_Raw,
  Target_res = .02,
  Dir = Dir.StateExt,
  Keep_Temporary = TRUE
)
Plot_Covs(Covs_ls)

Alternatively to specifying a target resolution, you can specify a different raster which should be matched in all attributes by the raster at target resolution. We get to this again at a later point in this tutorial.

For now, let’s simply inspect our list of covariate rasters:

Covs_ls
## [[1]]
## class      : RasterLayer 
## dimensions : 19, 36, 684  (nrow, ncol, ncell)
## resolution : 0.1, 0.09999996  (x, y)
## extent     : 11.65, 15.25, 49.95, 51.85  (xmin, xmax, ymin, ymax)
## crs        : +proj=longlat +datum=WGS84 +no_defs 
## source     : memory
## names      : DEM 
## values     : 59.11391, 926.513  (min, max)
## 
## 
## [[2]]
## class      : RasterLayer 
## dimensions : 114, 216, 24624  (nrow, ncol, ncell)
## resolution : 0.01666667, 0.01666667  (x, y)
## extent     : 11.64986, 15.24986, 49.94986, 51.84986  (xmin, xmax, ymin, ymax)
## crs        : +proj=longlat +datum=WGS84 +no_defs 
## source     : memory
## names      : DEM 
## values     : 46.5, 1150.25  (min, max)

This set of covariates has 684 and 24624 cells containing values at training and target resolution, respectively.

Kriging

Now let’s krig these data:

KrigStart <- Sys.time()
State_Krig <- krigR(
  Data = State_Raw, # data we want to krig as a raster object
  Covariates_coarse = Covs_ls[[1]], # training covariate as a raster object
  Covariates_fine = Covs_ls[[2]], # target covariate as a raster object
  Keep_Temporary = FALSE, # we don't want to retain the individually kriged layers on our hard-drive
  Cores = 1, # we want to krig on just one core
  FileName = "State_Ext.nc", # the file name for our full kriging output
  Dir = Dir.StateExt # which directory to save our final input in
)
KrigStop <- Sys.time()
Plot_Krigs(State_Krig, Dates = c("02-01-1995", "03-01-1995", "04-01-1995"))

There we go. All the data has been downscaled and we do have uncertainties recorded for all of our outputs. As you can see, the elevation patterns show up clearly in our kriged air temperature output. Furthermore, you can see that our certainty of Kriging predictions drops on the 04/01/1995 in comparison to the two preceding days. However, do keep in mind that a maximum standard error of 0.195, 0.206, 0.333 (for each layer of our output respectively) on a total range of data of 6.553, 6.318, 7.788 (again, for each layer in the output respectively) is evident of a downscaling result we can be confident in.

Now, what does the output actually look like?

State_Krig[-3] # we will talk later about why we leave out the third list element produced by krigR here
## $Kriging_Output
## class      : RasterBrick 
## dimensions : 114, 216, 24624, 3  (nrow, ncol, ncell, nlayers)
## resolution : 0.01666667, 0.01666667  (x, y)
## extent     : 11.64986, 15.24986, 49.94986, 51.84986  (xmin, xmax, ymin, ymax)
## crs        : +proj=longlat +datum=WGS84 +no_defs 
## source     : memory
## names      : var1.pred.1, var1.pred.2, var1.pred.3 
## min values :    266.8146,    265.6941,    261.6227 
## max values :    273.3675,    272.0124,    269.4109 
## 
## 
## $Kriging_SE
## class      : RasterBrick 
## dimensions : 114, 216, 24624, 3  (nrow, ncol, ncell, nlayers)
## resolution : 0.01666667, 0.01666667  (x, y)
## extent     : 11.64986, 15.24986, 49.94986, 51.84986  (xmin, xmax, ymin, ymax)
## crs        : +proj=longlat +datum=WGS84 +no_defs 
## source     : memory
## names      : var1.stdev.1, var1.stdev.2, var1.stdev.3 
## min values :    0.1624252,    0.1706871,    0.2581532 
## max values :    0.1951452,    0.2059939,    0.3330932

As output of the krigR() function, we obtain a list of downscaled data as the first element and downscaling standard errors as the second list element.

Finally, let’s see how long it took to carry out this shapefile-informed downscaling:

BaseTime <- KrigStop - KrigStart
BaseTime
## Time difference of 50.11185 secs

Kriging - The Pipeline

Now that we’ve seen how you can execute the KrigR workflow using three separate functions, it is time that we show you the most simplified function call to obtain some downscaled products: the pipeline.

First, let’s create a directory which the KrigR pipeline will leak .nc files to. Don’t worry, this leak in the pipeline is very much intentional and of no environmental concern (unless you are running low on disk space).

Dir.StatePipe <- file.path(Dir.Data, "State_Pipe")
dir.create(Dir.StatePipe)

We have coded the krigR() function in such a way that it can either be addressed at already present spatial products within your R environment, or handle all the downloading and resampling of input data and covariates for you from scratch. To run the exact same Kriging approach as within our extent-example, we can specify the krigR() function as such:

Pipe_Krig <- krigR(
  Variable = "2m_temperature",
  Type = "reanalysis",
  DataSet = "era5-land",
  DateStart = "1995-01-02",
  DateStop = "1995-01-04",
  TResolution = "day",
  TStep = 1,
  Extent = Extent,
  API_User = API_User,
  API_Key = API_Key,
  Target_res = .02,
  Cores = 1,
  FileName = "State_Pipe.nc",
  Dir = Dir.StatePipe,
  Keep_Temporary = FALSE
)
Pipe_Krig[-3] # we will talk later about why we leave out the third list element produced by krigR here
Plot_Krigs(Pipe_Krig, Dates = c("02-01-1995", "03-01-1995", "04-01-1995"))

Let’s just check how this compares to non-pipeline product:

all.equal(State_Krig[[1]], Pipe_Krig[[1]])
## [1] TRUE

Surprise! There is no difference.

Getting More out of KrigR

Kriging By Shapefile

The computational cost of Kriging depends on many factors, one of which is number of observation (i.e. raster cells), which we can limit by using a shapefile.

Let’s revisit our above Saxony-Kriging example, but this time, we will use a shapefile to mask for the exact boundaries of the state itself! We extract the shapefile from the states & provinces data set as follows:

Position <- which(StateMask$name_en == "Saxony") # find Saxony
State_Shp <- StateMask[Position,]
bmaps.plot(bbox = extent(State_Shp), type = "AerialWithLabels", quiet = TRUE, progress = "none", stoponlargerequest = FALSE)

Here, you can see a map of the region with. Notice the outline of the state of Saxony; we will reconstruct that using our shapefile.

Again, we create a directory for all the .nc files that our functions create:

Dir.StateShp <- file.path(Dir.Data, "State_Shape")
dir.create(Dir.StateShp)

Climate Data

Again, we download air temperature data from the ERA5-Land data set for the three days from the second to the fourth of January 1995. This time, however, we limit the data by shapefile.

State_Raw <- download_ERA(
  Variable = "2m_temperature",
  DataSet = "era5-land",
  DateStart = "1995-01-02",
  DateStop = "1995-01-04",
  TResolution = "day",
  TStep = 1,
  Extent = State_Shp,
  Dir = Dir.StateShp,
  API_User = API_User,
  API_Key = API_Key
)
Plot_Raw(State_Raw, Shp = State_Shp, Dates = c("02-01-1995", "03-01-1995", "04-01-1995"))

Note that we have implemented masking by shapefile in such a way that every cell that is even partially covered by your shapefile is retained in the downloaded ERA5(-Land) data set. Regular masking via the raster package-contained mask() function only retains those cells whose centroids fall within the shapefile boundaries.

Covariates

Again, we obtain elevation data as our covariate of choice:

Covs_ls <- download_DEM(
  Train_ras = State_Raw,
  Target_res = .02,
  Shape = State_Shp,
  Dir = Dir.StateShp,
  Keep_Temporary = TRUE
)
Plot_Covs(Covs_ls, State_Shp)

Notice that despite the covariate rasters (and input rasters, for that matter) containing 578 and 2.080810^{4} for training and target resolution respectively, we only obtain data for 299 and 8765 cells respectively due to our specification of a shapefile.

Kriging

Again, we execute our Kriging functionality using the krigR() function:

KrigStart <- Sys.time() # record when we started
State_Krig <- krigR(
  Data = State_Raw, # what to krig
  Covariates_coarse = Covs_ls[[1]], # covariates at training resolution
  Covariates_fine = Covs_ls[[2]], # covariates at target resolution
  Keep_Temporary = FALSE, # delete temporary krigs of layers
  Cores = 1, # only run this on 1 core
  FileName = "State_Shape.nc", # save the finished file as this
  Dir = Dir.StateShp # where to save the output
)
KrigStop <- Sys.time() # record when the function was done
Plot_Krigs(State_Krig, Shp = State_Shp, Dates = c("02-01-1995", "03-01-1995", "04-01-1995"))

The results look just like the ones that we obtained without using the shapefile (this time, we only see what’s contained in the shapefile, though). It is noteworthy to highlight the change in uncertainties here. Overall, our uncertainty decreased in going from extent-informed to shapefile-informed kriging in this example. However, we now see some serious chessboard pattern of the uncertainties in our outputs. This is due to the uncertainty decreasing as fine-resolution cells are more closely located towards the centre of a coarse-scaled cell which they have been downscaled from. This behaviour is normal. To obtain more useful kriging standard errors (i.e. uncertainties), we use local Kriging (we get to later in this tutorial).

Let’s see how long this took:

ShapeTime <- KrigStop - KrigStart
ShapeTime
## Time difference of 3.879688 secs

Running this downscaling process for Saxony using the shapefile took 0.0774206 of the original downscaling time using an extent statement.

Using a shapefile rather than an extent statement speeds up the Kriging process!

Kriging For Point-Data

KrigR also support point-data which is usually used for plot/observation measurements.

Often times, we only require data for a certain number of geo-referenced locations which may be separated by large areas of no interest to a given analysis. For these analyses, drawing a shapefile around all locations is already a big step up from just using the extent of the area within which the locations fall. That being said, such a shapefile could still cover big areas of which are of no consideration to our analysis. We have this implemented support for point-data input in KrigR.

Let’s first look at the data for this example. Here, we will use hypothetical sightings of Bigfoot:

Bigfoot.URL <- "https://opendata.arcgis.com/datasets/9947fc49e6c44120b4a1b3133c073dbc_0.csv" # link to the data
Bigfoot.data <- read.csv(Bigfoot.URL) # red the data
Bigfoot_ext <- extent(min(Bigfoot.data$Lon), max(Bigfoot.data$Lon),
                      min(Bigfoot.data$Lat), max(Bigfoot.data$Lat))
BigfootLand <- crop(LandMask , Bigfoot_ext)
bmaps.plot(bbox = Bigfoot_ext, type = "AerialWithLabels", quiet = TRUE, progress = "none")
osm.points(x = Bigfoot.data$Lon,
           y = Bigfoot.data$Lat,
           pch=10, cex=0.3, col = "red")

As we can see, Bigfoot is mostly reported to be found in the west and east of the contiguous US with some occurrences ranging across Canada and Alaska. Quite obviously, there are large areas here, within which we do not require fine-resolution climate data to understand Bigfoot occurrence.

What does this look like with our land mask?

ggplot() + 
  geom_polygon(data = BigfootLand, aes(x = long, y = lat, group = group), colour = "red", fill = "lightgrey") +
  geom_point(data = Bigfoot.data, aes(x = Lon, y = Lat), pch = 3) +
  theme_bw() + labs(x = "Longitude", y = "Latitude")

We could use the cropped land mask above to download some data. This would already be better than going by extent of the point-data since we bypass sea-areas. However, KrigR can do better. To visualise how this works, we create a smaller subset of the Bigfoot location data here:

set.seed(42)
Bigfoot_sample <- sample(x = 1:dim(Bigfoot.data)[1], size = 60)
Bigfoot_df <- Bigfoot.data[Bigfoot_sample, ]

So what does this look like on a map?

Bigfoot_ext <- extent(min(Bigfoot_df$Lon), max(Bigfoot_df$Lon),
                      min(Bigfoot_df$Lat), max(Bigfoot_df$Lat))
BigfootLand <- crop(LandMask , Bigfoot_ext)
ggplot() + 
  geom_polygon(data = BigfootLand, aes(x = long, y = lat, group = group), colour = "red", fill = "lightgrey") +
  geom_point(data = Bigfoot_df, aes(x = Lon, y = Lat), pch = 3, cex = 4) +
  theme_bw() + labs(x = "Longitude", y = "Latitude")

We’ve obtained a nice sample showing distinct patches of Bigfoot occurrences in the west and east parts of the contiguous US. Again, the shapefile we would make from this with the landmask would cover vast areas which we would not be interested in and thus rack up processing time without a payoff. But what if we drew a tiny extent around each of our locations? KrigR can do that for you!

Before we get stuck in with this, we create a directory for our following analysis before we proceed:

Dir.Points <- file.path(Dir.Data, "Point_Data")
dir.create(Dir.Points)

Climate Data

Now, it is time to obtain climate data for our Bigfoot sightings. To do so within KrigR we make use of three arguments:

  1. Extent - Here, we provide KrigR with a data frame containing Lat and Lon coordinates with columns that are named as such.
  2. Buffer - How big a rectangular (square, to be precise) buffer to draw around each location. This is measured in centesimal degrees.
  3. ID - Column name in our Extent-object containing unique location/observation/plot identifiers.

You might already ask yourself? “What if two or more buffers overlap?” Well, these buffers are fused automatically to build a single polygon.

Here, we are after a climatology or air temperature spanning the years 1982 - 2012 in a 0.1 degree square buffer around around each of our sampled Bigfoot occurrences. Since downloading this data set takes quite a while, I have implemented a check here to make skip downloading and formatting of the data if it has already been done:

if (!file.exists(file.path(Dir.Points, "AT_Climatology.nc"))) {
  Points_Raw <- download_ERA(
    Variable = "2m_temperature",
    DataSet = "era5-land",
    DateStart = "1982-01-01",
    DateStop = "2012-12-31",
    TResolution = "year",
    TStep = 30,
    Extent = Bigfoot_df, # the point data with Lon and Lat columns
    Buffer = 0.1, # a 0.1 degree buffer should be drawn around each point
    ID = "FID_", # this is the column which holds point IDs
    API_User = API_User,
    API_Key = API_Key,
    Dir = Dir.Points,
    FileName = "AT_Climatology.nc"
  )
} else {
  Points_Raw <- raster(file.path(Dir.Points, "AT_Climatology.nc"))
}
Plot_Raw(Points_Raw, Shp = BigfootLand, Dates = c("Climatology"))

Setting the Extent argument to be a data.frame object containing Lat and Lon data, you can obtain climate reanalysis data around locations/points. This can be controlled using the Buffer and ID arguments.

Covariates

Next, we need covariates. We obtain these using the same arguments as we used in our call to download_ERA:

Covs_ls <- download_DEM(
  Train_ras = Points_Raw,
  Target_res = .02,
  Keep_Temporary = TRUE,
  Shape = Bigfoot_df, # the point data with Lon and Lat columns
  Buffer = 0.1, # a 0.1 degree buffer should be drawn around each point
  ID = "FID_", # this is the column which holds point IDs
  Dir = Dir.Points
)
Plot_Covs(Covs_ls, Shp = BigfootLand)

Kriging

Finally, we are ready to krig our data. This is done the exact same way as in any other approach. However, to be sensible, I add the use of the maxdist argument (passed on to gstat::krige) here. I set it to the same value as the Buffer argument previously so that we perform local Kriging (more on that later) by looking only at data that is no further than what we set our maxdist to:

KrigStart <- Sys.time() # record when we started
Points_Krig <- krigR(
  Data = Points_Raw,
  Covariates_coarse = Covs_ls[[1]],
  Covariates_fine = Covs_ls[[2]],
  Keep_Temporary = FALSE,
  Cores = 1,
  Dir = Dir.Points,
  FileName = "Points.nc",
  maxdist = 0.1 # only krig in areas of 0.1 around each location
)
KrigStop <- Sys.time() # record when the function was done
Plot_Krigs(Points_Krig, Shp = BigfootLand, Dates = c("Climatology"))

That went by quick! We can’t really see the results clearly here and will look at the in more detail in just a second. Firstly, let’s see how long this took:

PointsTime <- KrigStop - KrigStart
PointsTime
## Time difference of 7.69957 secs

From experience, Kriging for a region this large instead of just for small regions around locations would easily take upwards of 10 hours! This is a big improvement.

What did we gain by doing this? Why not just obtain the raw reanalysis data and be done with it? For this, we plot our data:

## DATA EXTRACTION
AirTemp_Krig <- raster::extract(
  x = stack(Points_Krig[-3]),
  y = Bigfoot_df[, c("Lon", "Lat")]
)
AirTemp_Coarse <- raster::extract(
  x = stack(Points_Raw),
  y = Bigfoot_df[, c("Lon", "Lat")]
)
## DATA PREPARATION
ggplot_df <- data.frame(
  Data = c(AirTemp_Coarse, AirTemp_Krig[, 1]),
  Source = rep(c("Native", "Kriged"), each = nrow(Bigfoot_df)),
  Difference = c(AirTemp_Coarse - AirTemp_Krig[, 1]),
  Uncertainty = AirTemp_Krig[, 2],
  Index = 1:nrow(Bigfoot_df)
)
## DENSITIES
Densities_gg <- ggplot(data = ggplot_df, aes(x = Data, fill = Source)) +
  geom_density(alpha = 0.42) +
  theme_bw() +
  labs(x = "Density", y = "Air Temperature")
## DIFFERENCES
Difference_gg <- ggplot(data = ggplot_df, aes(x = Index, y = Difference)) +
  geom_point(cex = 5, pch = 18) +
  geom_errorbar(aes(ymin = Difference - Uncertainty, ymax = Difference + Uncertainty)) +
  geom_abline(intercept = 0, slope = 0, col = "red", lwd = 1.5) +
  theme_bw() +
  labs(x = "Location", y = "Native vs. Kriged Temprature-Difference")
### FINAL PLOT
plot_grid(plotlist = list(Densities_gg, Difference_gg), nrow = 1, labels = "AUTO")

Looking at the density distributions we can see a higher peak of intermediate temperature values when compared to the native Era5-Land data. However, that is not a big difference. This is even more evident when looking at the differences between native and kriged data. Many of the data points therein fall close to a difference of 0K. Taking into account the kriging uncertainty, not many fall significantly outside of the 0K range. Thjat being said, there are outliers which might be very important in understanding the physiology of Bigfoot.

By obtaining climate reanalysis data around locations/points and subsequently downscaling these, we can obtain more precise local data.

Of course, you can do what we did above in just one call to the KrigR pipeline with the following specification:

krigR(
  Variable = "2m_temperature",
  DataSet = "era5-land",
  Type = "reanalysis",
  DateStart = "1982-01-01",
  DateStop = "2012-12-31",
  TResolution = "year",
  TStep = 30,
  Extent = Bigfoot_df, # the point data with Lon and Lat columns
  Buffer = 0.1, # a 0.1 degree buffer should be drawn around each point
  ID = "FID_", # this is the column which holds point IDs
  API_User = API_User,
  API_Key = API_Key,
  Target_res = .02,
  Cores = 1,
  maxdist = 0.1,
  FileName = "Points.nc",
  Keep_Temporary = FALSE
)

Multi-Core Kriging

The computational cost of Kriging can be divided up amongst multiple cores using our krigR() function and the Cores argument therein. This enables parallel computation of multi-layer rasters.

Let’s stick with our shapefile-informed downscaling of air temperature records across Saxony. First, we again obtain our data:

State_Raw <- download_ERA(
  Variable = "2m_temperature",
  DataSet = "era5-land",
  DateStart = "1995-01-02",
  DateStop = "1995-01-04",
  TResolution = "day",
  TStep = 1,
  Extent = State_Shp,
  Dir = Dir.StateShp,
  API_User = API_User,
  API_Key = API_Key
)
Covs_ls <- download_DEM(
  Train_ras = State_Raw,
  Target_res = .02,
  Shape = State_Shp,
  Dir = Dir.StateShp,
  Keep_Temporary = TRUE
)

This time, however, we make use of the Cores argument in the krigR() function to run the downscaling of our three time steps worth of data in paralle:.

KrigStart <- Sys.time() # record when we started
State_Krig <- krigR(
  Data = State_Raw, # what to krig
  Covariates_coarse = Covs_ls[[1]], # covariates at training resolution
  Covariates_fine = Covs_ls[[2]], # covariates at target resolution
  Keep_Temporary = FALSE, # delete temporary krigs of layers
  Cores = 3, # run this on 3 core
  FileName = "State_Cores.nc", # save the finished file as this
  Dir = Dir.StateShp # where to save the output
)
KrigStop <- Sys.time() # record when the function was done
Plot_Krigs(State_Krig, Shp = State_Shp, Dates = c("02-01-1995", "03-01-1995", "04-01-1995"))

The results look just like the ones that we obtained without using the shapefile and just one-core processing. Let’s see how long this took:

CoreTime <- KrigStop - KrigStart
CoreTime
## Time difference of 4.621171 secs

Running this downscaling process for Saxony using the shapefile took 1.1911192 of the original downscaling time using an extent statement (the effects multi-core Kriging really kick in when looking at rasters with many bands because the initial opening of clusters for parallel processing takes some time by itself.).

Using the Cores argument with a value of n (with n > 1) speeds up the Kriging process of multi-layer rasters by an order of n!

Weighted Local Kriging

By default Kriging of the krigR() function uses all cells in a spatial product to downscale individual cells of rasters. The nmax argument can circumvent this.

Let’s build further on our above example by adding the nmax argument (passed on to gstat::krige()) to our krigR() function call. This argument controls how many of the closest cells the Kriging algorithm should consider in the downscaling of individual coarse, training cells.

State_Krig <- krigR(
  Data = State_Raw, # what to krig
  Covariates_coarse = Covs_ls[[1]], # covariates at training resolution
  Covariates_fine = Covs_ls[[2]], # covariates at target resolution
  Keep_Temporary = FALSE, # delete temporary krigs of layers
  Cores = 3, # only run this on 1 core
  FileName = "State_Local.nc", # save the finished file as this
  Dir = Dir.StateShp, # where to save the output
  nmax = 15 # how many nearest neighbours to consider in Kriging
)
Plot_Krigs(State_Krig, Shp = State_Shp, Dates = c("02-01-1995", "03-01-1995", "04-01-1995"))

The air temperature prediction/downscaling results look just like the ones that we obtained above. However, we seriously improved our localised understanding of Kriging uncertainties (i.e. we see much more localised patterns of Kriging standard error). In this case for Saxony, our uncertainties seem to be highest for areas where the landscape is dominated by large, abrupt changes in elevation (e.g. around the Fichtelberg along the southern border) and water-dominated areas such as streams and lakes (e.g. the lakes around Leipzig in the North of Saxony).

Using the nmax argument helps to identify highly localised patterns in the Kriging uncertainty as well as predictions!

We are working on an automatic feature for the identification of an appropriate number of nmax straight from the data itself.

Aggregation of Temporal Resolutions

The KrigR package is capable of some neat pre-processing of raw climate data. - both spatially and temporally.

The functions used for downloading reanalysis data with the KrigR package (download_ERA() and download_UERRA()) accept the following arguments which you can use to control the temporal resolution of your climate data:
- TResolution controls the time-line that TStep indexes. You can specify anything from the following: 'hour', 'day', 'month', or 'year'. The default is 'month'.
- TStep controls how many time-steps to aggregate into one layer of data each. Aggregation is done via taking the mean per cell in each raster comprising time steps that go into the final, aggregated time-step. The default is 1.
- FUN controls which values to calculate for the temporal aggregates, e.g.: 'min', 'max', or 'mean' (default).

Specifying FUN only does something when a Tstep > 1 is specified - otherwise, there is no temporal aggregation to be done.

KrigR automatically identifies which data set to download from given your temporal aggregation specification. As soon as TResolution is set to 'month' or 'year', the package automatically downloads monthly mean data from the CDS. We do this to make the temporal aggregtation calculation more light-weight on your computing units and to make downloads less heavy.

If you want to obtain the most extreme minima/maxima for a given time-period, you need to specify a TResolution of 'hour' or 'day' so that temporal aggregation is carried out on hourly data.

Let’s run through a few examples to make clear how desired temporal resolution of data can be achieved using the KrigR package:

What we want TResolution TStep
Hourly intervals 'hour' 1 / don’t specify
6-hour intervals 'hour' 6
Half-day intervals 'hour' 12
Daily intervals 'day' 1 / don’t specify
3-day intervals 'day' 3
Weekly intervals 'day' 7
Monthly aggregates 'month' / don’t specify 1 / don’t specify
4-month intervals 'month' / don’t specify 4
Annual intervals 'year' 1 / don’t specify
10-year intervals 'year' 10

Specifying TResolution of 'month' will result in the download of full month aggregates for every month included in your time series.

For example, DateStart = "2000-01-20", DateStop = "2000-02-20" with TResolution = 'month', and TStep = 1 does not result in the mean aggregate for the month between the 20/01/200 and the 20/02/2000, but does result in the monthly aggregates for January and February 2000. If you desire the former, you would need to specify DateStart = "2000-01-20", DateStop = "2000-02-20" with TResolution = 'day', and TStep = 32 (the number of days between the two dates).

The reanalysis download functions of the KrigR package automatically break down download requests into monthly intervals thus circumventing the danger of running into making a download request that is too big for the CDS.

For example, DateStart = "2000-01-20", DateStop = "2000-02-20" with TResolution = 'day', and TStep = 8 will lead to two download requests to the CDS: (1) hourly data in the range 20/01/2000 00:00 to 31/01/2000 23:00, and (2) hourly data in the range 01/02/2000 00:00 to 20/02/2000 23:00. These data sets are subsequently fused in R, aggregated to daily aggregates, and finally, aggregated to four big aggregates.

This gives you a lot of flexibility, but always keep in mind that third-party data sets might not account for leap-years so make sure the dates of third-party data (should you chose to use some) lines up with the ones as specified by your calls to the functions of the KrigR package.

Use the TResolution, and TStep arguments in the download_ERA(), and download_UERRA() functions of the KrigR package to adjust the temporal resolution of your climate data when downloading reanalysis data with our package.

Kriging to Match Already Available Data

We expect that you won’t want to downscale to specific resolutions most of the time, but rather, match an already existing spatial data set in terms of spatial resolution and extent. Again, the KrigR package got you covered!

Usually, you probably want to krig to match a certain pre-existing data set rather than a certain resolution.

Here, we illustrate this with an NDVI-based example. The NDVI is a satellite-derived vegetation index which tells us how green the Earth is. It comes in bi-weekly intervals and at a spatial resolution of .08333 (roughly 9km). Here, we download all NDVI data for the year 2000 and then create the annual mean. This time, we do so for the state of California because of its size and topographical variety.

Position <- which(StateMask$name_en == "California") # find our state
State_Shp <- StateMask[Position, ] # extract shapefile
if (!file.exists(file.path(Dir.Data, "ReferenceRaster.nc"))) { # if reference raster doesn't already exist
  gimms_files <- downloadGimms(
    x = as.Date("2000-01-01"), # download from January 2000
    y = as.Date("2000-12-31"), # download to December 2000
    dsn = Dir.Data, # save downloads in data folder
    quiet = FALSE
  ) # show download progress
  Reference_ras <- rasterizeGimms(x = gimms_files, remove_header = TRUE) # rasterize the data
  Negatives <- which(values(Reference_ras) < 0) # identify all negative values
  values(Reference_ras)[Negatives] <- 0 # set threshold for barren land (NDVI<0)
  Reference_ras <- calc(Reference_ras, fun = mean, na.rm = TRUE) # annual mean
  Reference_ras <- crop(Reference_ras, State_Shp) # crop to fit shape
  Reference_ras <- mask(Reference_ras, State_Shp) # mask by shape
  writeRaster(x = Reference_ras, filename = file.path(Dir.Data, "ReferenceRaster"), format = "CDF") # write raster to data directory
  unlink(gimms_files, recursive = TRUE) # delete raw GIMMs data
  rm(list = c("Negatives", "gimms_files")) # delete from environment
} else { # if reference raster already exists
  Reference_ras <- raster(file.path(Dir.Data, "ReferenceRaster.nc")) # GIMMs NDVI data as baseline raster
}

So what does this raster look like?

Reference_ras
## class      : RasterLayer 
## dimensions : 114, 124, 14136  (nrow, ncol, ncell)
## resolution : 0.08333333, 0.08333333  (x, y)
## extent     : -124.4167, -114.0833, 32.5, 42  (xmin, xmax, ymin, ymax)
## crs        : +proj=longlat +datum=WGS84 +no_defs 
## source     : D:/Documents/Website/content/post/krigr-mats/28-01-2021-WorkShop/Data/ReferenceRaster.nc 
## names      : layer 
## values     : 0.007266667, 0.8722875  (min, max)
## zvar       : layer

As stated above, we want to match this with our output.

So, let’s create our directory for this as we did before:

Dir.NDVI <- file.path(Dir.Data, "NDVI_Matching")
dir.create(Dir.NDVI)

We could do this whole analysis in our three steps as outlined above, but why bother when the pipeline described above gets the job done just as well?

Matching Kriging outputs with a pre-existing data set is as easy as plugging the pre-existing raster into the Target_res argument of the krigR() or the download_DEM() function.

This time we want to downscale from ERA5 resolution (roughly 30km) because the ERA5-Land data already matches the NDVI resolution (roughly 9km). Here’s how we do this:

NDVI_Krig <- krigR(
  Variable = "2m_temperature",
  Type = "reanalysis",
  DataSet = "era5",
  DateStart = "2000-01-01",
  DateStop = "2000-12-31",
  TResolution = "year",
  TStep = 1,
  Extent = State_Shp,
  API_User = API_User,
  API_Key = API_Key,
  Target_res = Reference_ras,
  Cores = 1,
  FileName = "AirTemp_NDVI.nc",
  nmax = 80, # we are still working on an automatic optimisation of this. Here, let's try a bigger number
  Dir = Dir.NDVI
)

So? Did we match the pre-existing data?

NDVI_Krig[[1]]
## class      : RasterBrick 
## dimensions : 114, 124, 14136, 1  (nrow, ncol, ncell, nlayers)
## resolution : 0.08333333, 0.08333333  (x, y)
## extent     : -124.4167, -114.0833, 32.5, 42  (xmin, xmax, ymin, ymax)
## crs        : +proj=longlat +datum=WGS84 +no_defs 
## source     : memory
## names      : var1.pred 
## min values :  271.4974 
## max values :   299.116

We nailed this!

Let’s take one final look at our (A) raw ERA5 data, (B) NDVI data, (C) Kriged ERA5 data, and (D) standard error of our Kriging output:

### ERA-Plot
ERA_df <- as.data.frame(raster(file.path(Dir.NDVI, "2m_temperature_2000-01-01_2000-12-31_year.nc")), xy = TRUE) # turn raster into dataframe
colnames(ERA_df)[c(-1, -2)] <- "Air Temperature 2000 (ERA5)"
ERA_df <- gather(data = ERA_df, key = Values, value = "value", colnames(ERA_df)[c(-1, -2)]) #  make ggplot-ready
Raw_plot <- ggplot() + # create a plot
  geom_raster(data = ERA_df, aes(x = x, y = y, fill = value)) + # plot the raw data
  facet_wrap(~Values) + # split raster layers up
  theme_bw() +
  labs(x = "Longitude", y = "Latitude") + # make plot more readable
  scale_fill_gradientn(name = "Air Temperature [K]", colours = inferno(100)) + # add colour and legend
  geom_polygon(data = State_Shp, aes(x = long, y = lat, group = group), colour = "black", fill = "NA") # add shape
### NDVI-Plot
NDVI_df <- as.data.frame(Reference_ras, xy = TRUE) # turn raster into dataframe
colnames(NDVI_df)[c(-1, -2)] <- "NDVI 2000"
NDVI_df <- gather(data = NDVI_df, key = Values, value = "value", colnames(NDVI_df)[c(-1, -2)]) #  make ggplot-ready
NDVI_plot <- ggplot() + # create a plot
  geom_raster(data = NDVI_df, aes(x = x, y = y, fill = value)) + # plot the raw data
  facet_wrap(~Values) + # split raster layers up
  theme_bw() +
  labs(x = "Longitude", y = "Latitude") + # make plot more readable
  scale_fill_gradientn(name = "NDVI", colours = rev(terrain.colors(100))) + # add colour and legend
  geom_polygon(data = State_Shp, aes(x = long, y = lat, group = group), colour = "black", fill = "NA") # add shape
### KRIGED-Plots
Dates <- c("Kriged Air Temperature 2000 (NDVI Resolution)")
Type_vec <- c("Prediction", "Standard Error") # these are the output types of krigR
Colours_ls <- list(inferno(100), rev(viridis(100))) # we want separate colours for the types
Plots_ls <- as.list(NA, NA) # this list will be filled with the output plots
for (Plot in 1:2) { # loop over both output types
  Krig_df <- as.data.frame(NDVI_Krig[[Plot]], xy = TRUE) # turn raster into dataframe
  colnames(Krig_df)[c(-1, -2)] <- paste(Type_vec[Plot], Dates) # set colnames
  Krig_df <- gather(data = Krig_df, key = Values, value = "value", colnames(Krig_df)[c(-1, -2)]) # make ggplot-ready
  Plots_ls[[Plot]] <- ggplot() + # create plot
    geom_raster(data = Krig_df, aes(x = x, y = y, fill = value)) + # plot the kriged data
    facet_wrap(~Values) + # split raster layers up
    theme_bw() +
    labs(x = "Longitude", y = "Latitude") + # make plot more readable
    scale_fill_gradientn(name = "Air Temperature [K]", colours = Colours_ls[[Plot]]) + # add colour and legend
    theme(plot.margin = unit(c(0, 0, 0, 0), "cm")) + # reduce margins (for fusing of plots)
    geom_polygon(data = State_Shp, aes(x = long, y = lat, group = group), colour = "black", fill = "NA") # add shape
} # end of type-loop
### FINAL PLOT
plot_grid(plotlist = list(Raw_plot, NDVI_plot, Plots_ls[[1]], Plots_ls[[2]]), nrow = 2, labels = "AUTO")

The Call List

So far, we have only ever looked at the first two elements in the list returned by krigR(). A quick look at the help file, the code, or this guide reveals that there is a third list element - the call list. When coding this feature into krigR() we intended for this to be a neat, clean, storage-friendly way of keeping track of how the spatial product was created. It does so without storing additional spatial products. Let’s have a look at it:

NDVI_Krig[[3]]
## $Data
## $Data$Class
## [1] "RasterLayer"
## attr(,"package")
## [1] "raster"
## 
## $Data$Dimensions
## $Data$Dimensions$nrow
## [1] 20
## 
## $Data$Dimensions$ncol
## [1] 23
## 
## $Data$Dimensions$ncell
## [1] 460
## 
## 
## $Data$Extent
## class      : Extent 
## xmin       : -125.16 
## xmax       : -113.659 
## ymin       : 32.282 
## ymax       : 42.282 
## 
## $Data$CRS
## CRS arguments: +proj=longlat +datum=WGS84 +no_defs 
## 
## $Data$layers
## [1] "index_1"
## 
## 
## $Covariates_coarse
## $Covariates_coarse$Class
## [1] "RasterLayer"
## attr(,"package")
## [1] "raster"
## 
## $Covariates_coarse$Dimensions
## $Covariates_coarse$Dimensions$nrow
## [1] 20
## 
## $Covariates_coarse$Dimensions$ncol
## [1] 23
## 
## $Covariates_coarse$Dimensions$ncell
## [1] 460
## 
## 
## $Covariates_coarse$Extent
## class      : Extent 
## xmin       : -125.16 
## xmax       : -113.659 
## ymin       : 32.282 
## ymax       : 42.282 
## 
## $Covariates_coarse$CRS
## CRS arguments: +proj=longlat +datum=WGS84 +no_defs 
## 
## $Covariates_coarse$layers
## [1] "DEM"
## 
## 
## $Covariates_fine
## $Covariates_fine$Class
## [1] "RasterLayer"
## attr(,"package")
## [1] "raster"
## 
## $Covariates_fine$Dimensions
## $Covariates_fine$Dimensions$nrow
## [1] 114
## 
## $Covariates_fine$Dimensions$ncol
## [1] 124
## 
## $Covariates_fine$Dimensions$ncell
## [1] 14136
## 
## 
## $Covariates_fine$Extent
## class      : Extent 
## xmin       : -124.4167 
## xmax       : -114.0833 
## ymin       : 32.5 
## ymax       : 42 
## 
## $Covariates_fine$CRS
## CRS arguments: +proj=longlat +datum=WGS84 +no_defs 
## 
## $Covariates_fine$layers
## [1] "DEM"
## 
## 
## $KrigingEquation
## ERA ~ DEM
## <environment: 0x00000000310c1ae8>
## 
## $Cores
## [1] 1
## 
## $FileName
## [1] "AirTemp_NDVI.nc"
## 
## $Keep_Temporary
## [1] TRUE
## 
## $nmax
## [1] 80
## 
## $Data_Retrieval
## $Data_Retrieval$Variable
## [1] "2m_temperature"
## 
## $Data_Retrieval$Type
## [1] "reanalysis"
## 
## $Data_Retrieval$PrecipFix
## [1] FALSE
## 
## $Data_Retrieval$DataSet
## [1] "era5"
## 
## $Data_Retrieval$DateStart
## [1] "2000-01-01"
## 
## $Data_Retrieval$DateStop
## [1] "2000-12-31"
## 
## $Data_Retrieval$TResolution
## [1] "year"
## 
## $Data_Retrieval$TStep
## [1] 1
## 
## $Data_Retrieval$Extent
## class       : SpatialPolygonsDataFrame 
## features    : 1 
## extent      : -124.4092, -114.1191, 32.53167, 41.99954  (xmin, xmax, ymin, ymax)
## crs         : +proj=longlat +datum=WGS84 +no_defs 
## variables   : 83
## names       :         featurecla, scalerank, adm1_code, diss_me, iso_3166_2,                               wikipedia, iso_a2, adm0_sr,       name,  name_alt, name_local,  type, type_en, code_local, code_hasc, ... 
## value       : Admin-1 scale rank,         2,  USA-3521,    3521,      US-CA, http://en.wikipedia.org/wiki/California,     US,       8, California, CA|Calif.,         NA, State,   State,       US06,     US.CA, ...

This lengthy list should contain all information you need to trace how you created a certain data set using krigR(). If you feel like anything is missing in this list, please contact us.

Using Third-Party Data

Our krigR() function is designed to work with non-ERA climate data as well as non-GMTED2010 covariate data. To krig your own spatial products using different covariate data than the GMTED2010 DEM we use as a default, you need to step into the three-step workflow.

ATTENTION: Kriging only works on square-cell spatial products!

krigR() supports any combination of ERA-family reanalysis, GMTED2010, third-party climate data, and third-party covariate data. Here, we just demonstrate the use of other covariates than the GMTED2010 used by KrigR by default.

For this example, let’s go back to Saxony:

Position <- which(StateMask$name_en == "Saxony") # find Saxony
State_Shp <- StateMask[Position, ]

Now, let’s download some covariate data:
- TKSat - thermal conductivity of saturated soil
- TKDry - thermal conductivity of dry soil

Dir.TP <- file.path(Dir.Data, "Third-Party_Data")
dir.create(Dir.TP)
#### Thermal Conductivity of saturated soil
if (!file.exists(file.path(Dir.TP, "TKSat.zip"))) { # if not downloaded yet
  download.file("http://globalchange.bnu.edu.cn/download/data/worldptf/tksat.zip",
    destfile = file.path(Dir.TP, "TKSat.zip")
  ) # download cultural vector
  unzip(file.path(Dir.TP, "TKSat.zip"), exdir = Dir.TP) # unzip data
}
TKSat_ras <- raster(file.path(Dir.TP, "tksatu_l1.nc"))

#### Thermal Conductivity of dry soil
if (!file.exists(file.path(Dir.TP, "TKDry.zip"))) { # if not downloaded yet
  download.file("http://globalchange.bnu.edu.cn/download/data/worldptf/tkdry.zip",
    destfile = file.path(Dir.TP, "TKdry.zip")
  ) # download cultural vector
  unzip(file.path(Dir.TP, "TKdry.zip"), exdir = Dir.TP) # unzip data
}
TKDry_ras <- raster(file.path(Dir.TP, "tkdry_l1.nc"))

Climate Data

This time, let’s mix it up a bit and look at soil temperature in the uppermost depth layer that ERA5(-Land) recognizes (0-7cm):

State_Raw <- download_ERA(
  Variable = "soil_temperature_level_1",
  DataSet = "era5-land",
  DateStart = "1995-01-02",
  DateStop = "1995-01-04",
  TResolution = "day",
  TStep = 1,
  Extent = State_Shp,
  Dir = Dir.TP,
  API_User = API_User,
  API_Key = API_Key
)
Plot_Raw(State_Raw, Shp = State_Shp, Dates = c("02-01-1995", "03-01-1995", "04-01-1995"), Legend = "Soil Temperature (0-7cm) [K]")

Covariates

As far as covariate data is concerned, we downloaded and loaded into R global data sets which now limit to Saxony:

## TKSat
TKSat_ras <- crop(TKSat_ras, extent(State_Raw)) # crop to the right extent of our data we want to krig
Shape_ras <- rasterize(State_Shp, TKSat_ras, getCover = TRUE) # identify which cells are covered by the shape
Shape_ras[Shape_ras == 0] <- NA # set all cells which the shape doesn't touch to NA
TKSat_ras <- mask(TKSat_ras, Shape_ras) # mask for the shape of the state
## TKDry
TKDry_ras <- crop(TKDry_ras, extent(State_Raw)) # crop to the right extent of our data we want to krig
Shape_ras <- rasterize(State_Shp, TKDry_ras, getCover = TRUE) # identify which cells are covered by the shape
Shape_ras[Shape_ras == 0] <- NA # set all cells which the shape doesn't touch to NA
TKDry_ras <- mask(TKDry_ras, Shape_ras) # mask for the shape of the state

Personally, I am not content with just looking at thermal conductivity of dry and wet soil in predicting soil temperatures. Other covariates that come to mind are elevation and air temperature (which we can easily provide at target resolution through Kriging it reliably with elevation). Here, we just use some elevation data. Also, let’s push the boundaries and downscale all the way to a 0.01. resolution:

Covs_ls <- download_DEM(
  Train_ras = State_Raw,
  Target_res = .01,
  Shape = State_Shp,
  Dir = Dir.StateShp,
  Keep_Temporary = TRUE
)

Now we combine these into a list containing to stacks of rasters and plot our third-party covariates:

# resample fine-resolution covariate data to match coarse input data
Coarsecovs <- stack(resample(x = TKSat_ras, y = State_Raw), resample(x = TKDry_ras, y = State_Raw), Covs_ls[[1]])
names(Coarsecovs) <- c("TKWet", "TKDry", "DEM") # setting some proper names
# High-resolution to which we downscale
Finecovs <- stack(TKSat_ras, TKDry_ras, Covs_ls[[2]])
names(Finecovs) <- c("TKWet", "TKDry", "DEM") # setting some proper names
# combine data into covariate list
Covs_ls <- list(Coarsecovs, Finecovs)
Plot_Covs(Covs_ls, State_Shp)

Kriging

Finally, we can krig our soil temperature data using the thermal conductivity of the soil. For this, we need to specify a new KrigingEquation. If we don’t, krigR() will notify us that it our default formula cannot be executed and will attempt to build a formula from the data it can find (this auto-generated formula would be the same as the one we specify here - an additive combination of all covariates found both at coarse and fine resolutions). Of course, this formula can also be specified to reflect interactive effects.

State_Krig <- krigR(
  Data = State_Raw, # what to krig
  Covariates_coarse = Covs_ls[[1]], # covariates at training resolution
  Covariates_fine = Covs_ls[[2]], # covariates at target resolution
  Keep_Temporary = FALSE, # delete temporary krigs of layers
  Cores = 3, # run this on 3 cores
  KrigingEquation = "ERA ~ DEM + TKDry + TKWet", # this time, we need to specify a new formula. Try and see what happens if you forget to do this.
  FileName = "State_TP.nc", # save the finished file as this
  Dir = Dir.TP, # where to save the output
  nmax = 15 # how many nearest neighbours to consider in Kriging
)
## Warning in KrigR:::check_Krig(Data = Data, CovariatesCoarse = Covariates_coarse, : It is not recommended to use kriging for statistical downscaling of more than one order of magnitude. You are
## currently attempting this. Kriging will proceed.
Plot_Krigs(State_Krig, Shp = State_Shp, Dates = c("02-01-1995", "03-01-1995", "04-01-1995"), Legend = "Soil Temperature (0-7cm) [K]")

Setting aside the output of the Kriging itself, you find a warning message produced by krigR() - since we specified a resolution of 0.01 as our target and our raw data (both third-party and GMTED2010) comes at a native resolution of 0.0083, we cannot aggregate to 0.01 and are stuck with 0.0083. krigR() warns us that this means we krig below the threshold of 0.09 which could be argued to be sensible bny rule of thumb as our lowest resolution we might go to, but proceeds with the Kriging.

You can use the KrigingEquation argument in krigR() to specify the use of third-party covariates and/or data to be kriged. You can specify additive as well as interaction effects.

Session Info

sessionInfo()
## R version 4.0.2 (2020-06-22)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 18363)
## 
## Matrix products: default
## 
## locale:
## [1] LC_COLLATE=English_United Kingdom.1252  LC_CTYPE=English_United Kingdom.1252    LC_MONETARY=English_United Kingdom.1252 LC_NUMERIC=C                           
## [5] LC_TIME=English_United Kingdom.1252    
## 
## attached base packages:
## [1] parallel  stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] gimms_1.2.0       rosm_0.2.5        cowplot_1.1.1     viridis_0.6.0     viridisLite_0.4.0 ggplot2_3.3.3     tidyr_1.1.3       KrigR_0.1.0       httr_1.4.2        stars_0.5-2      
## [11] abind_1.4-5       fasterize_1.0.3   sf_0.9-8          lubridate_1.7.10  automap_1.0-14    doParallel_1.0.16 iterators_1.0.13  foreach_1.5.1     rgdal_1.5-23      raster_3.4-5     
## [21] sp_1.4-5          stringr_1.4.0     keyring_1.1.0     ecmwfr_1.3.0      ncdf4_1.17       
## 
## loaded via a namespace (and not attached):
##  [1] xts_0.12.1         R.cache_0.14.0     tools_4.0.2        backports_1.2.1    bslib_0.2.4        utf8_1.2.1         R6_2.5.0           zyp_0.10-1.1       KernSmooth_2.23-17 rgeos_0.5-5       
## [11] DBI_1.1.1          colorspace_2.0-0   withr_2.4.2        tidyselect_1.1.0   gridExtra_2.3      curl_4.3           compiler_4.0.2     gstat_2.0-7        labeling_0.4.2     bookdown_0.21     
## [21] sass_0.3.1         scales_1.1.1       classInt_0.4-3     proxy_0.4-25       digest_0.6.27      rmarkdown_2.7      R.utils_2.10.1     jpeg_0.1-8.1       pkgconfig_2.0.3    htmltools_0.5.1.1 
## [31] styler_1.4.1       highr_0.9          fastmap_1.1.0      rlang_0.4.10       FNN_1.1.3          farver_2.1.0       jquerylib_0.1.3    generics_0.1.0     zoo_1.8-9          jsonlite_1.7.2    
## [41] dplyr_1.0.5        R.oo_1.24.0        magrittr_2.0.1     Rcpp_1.0.6         munsell_0.5.0      fansi_0.4.2        lifecycle_1.0.0    R.methodsS3_1.8.1  stringi_1.5.3      yaml_2.2.1        
## [51] plyr_1.8.6         grid_4.0.2         crayon_1.4.1       lattice_0.20-41    knitr_1.32         pillar_1.6.0       rjson_0.2.20       boot_1.3-25        spacetime_1.2-4    codetools_0.2-16  
## [61] glue_1.4.2         evaluate_0.14      blogdown_1.3       png_0.1-7          vctrs_0.3.7        gtable_0.3.0       purrr_0.3.4        rematch2_2.1.2     reshape_0.8.8      assertthat_0.2.1  
## [71] cachem_1.0.4       xfun_0.22          lwgeom_0.2-6       e1071_1.7-6        class_7.3-17       Kendall_2.2        tibble_3.1.1       intervals_0.15.2   memoise_2.0.0      units_0.7-1       
## [81] ellipsis_0.3.1
Erik Kusch
Erik Kusch
PhD Student

In my research, I focus on statistical approaches to understanding complex processes and patterns in biology using a variety of data banks.

Related