Recording Availability
A recording of me presenting much of the contents herein can be found on YouTube.
Setting Things Up
Installing KrigR
First of all, we need to install KrigR
. Since we are currently preparing the package for submission to CRAN, it is only available through the associated GitHub repository (https://github.com/ErikKusch/KrigR) for the time being. There, you may also find presentations as well as this guide/workshop.
Here, we use the devtools
package within R
to get access to the install_github()
function. For this to run, we need to tell R
to not stop the installation from GitHub as soon as a warning is produced. This would stop the installation process as early as one of the KrigR
dependencies hits a warning as benign as “Package XYZ was built under R Version X.X.X” which can usually be ignored safely.
Sys.setenv(R_REMOTES_NO_ERRORS_FROM_WARNINGS = "true")
devtools::install_github("https://github.com/ErikKusch/KrigR")
library(KrigR)
For the sake of this tutorial, we need some extra packages for visualisations. We check if they are already installed, subsequently install (if necessary) and load them with this user-defined function:
install.load.package <- function(x) {
if (!require(x, character.only = TRUE)) {
install.packages(x, repos = "http://cran.us.r-project.org")
}
require(x, character.only = TRUE)
}
package_vec <- c(
"tidyr", # for turning rasters into ggplot-dataframes
"ggplot2", # for plotting
"viridis", # colour palettes
"cowplot", # gridding multiple plots
"rosm", # obtaining satellite maps
"gimms" # to get some pre-existing data to match in our downscaling
)
sapply(package_vec, install.load.package)
## tidyr ggplot2 viridis cowplot rosm gimms
## TRUE TRUE TRUE TRUE TRUE TRUE
Before we can proceed, you need to open up an account at the CDS and create an API key by following this link: https://cds.climate.copernicus.eu/api-how-to. This is required so that you may issue download requests through the KrigR
package. Once you have done so, please register the user ID and API Key as characters as seen below:
API_User <- "XXXXXX"
API_Key <- "XXXXXXX"
Setting Up Directories
The tutorial is designed to run completely from scratch. For this to work in a structured way, we create a folder/directory structure so that we got some nice compartments on our hard drives for where each separate Kriging process is run. We create the following directories:
- A data directory for all of our individual Kriging processes
- A shapefile directory (located within our data directory) for all of the shapefiles that we will use
Dir.Base <- getwd() # identifying the current directory
Dir.Data <- file.path(Dir.Base, "Data") # folder path for data
Dir.Shapes <- file.path(Dir.Data, "Shapes") # folder path for shapefiles
Dirs <- sapply(c(Dir.Data, Dir.Shapes), function(x) if (!dir.exists(x)) dir.create(x))
Downloading Shapefiles
Here, we download some of the most commonly used shapefiles (for our analyses using KrigR
so far, at least). For repeat code-sourcing, we have implemented checks which only trigger the download of a given shapefile set if the data in question is not present in our Shapes directory yet.
This tutorial doesn’t make use of all the shapefiles we download here. We simply thought it prudent to show you what we have found to be useful and how to get your hands on the data in a reproducible way.
Land Cover
As we will see in this tutorial, masking out unnecessary areas from our analyses speeds up Kriging tremendously. Often, this will entail limiting data sets to terrestrial habitats. This land mask here does a terrific job at masking out non-land areas.
#### LAND MASK (for masking covariates according to land vs. sea)
if (!file.exists(file.path(Dir.Shapes, "LandMask.zip"))) { # if not downloaded yet
download.file("https://www.naturalearthdata.com/http//www.naturalearthdata.com/download/10m/physical/ne_10m_land.zip",
destfile = file.path(Dir.Shapes, "LandMask.zip")
) # download cultural vector
unzip(file.path(Dir.Shapes, "LandMask.zip"), exdir = Dir.Shapes) # unzip data
}
LandMask <- readOGR(Dir.Shapes, "ne_10m_land", verbose = FALSE) # read
ggplot() +
geom_polygon(data = LandMask, aes(x = long, y = lat, group = group), colour = "darkred", fill = "black") +
theme_bw() +
labs(x = "Longitude", y = "Latitude")
States & Provinces
Political divisions are often what policy makers are after. Hence, we also download a shapefile for states and provinces across the Globe.
#### STATE MASK (for within-nation borders)
if (!file.exists(file.path(Dir.Shapes, "StateMask.zip"))) { # if not downloaded yet
download.file("https://www.naturalearthdata.com/http//www.naturalearthdata.com/download/10m/cultural/ne_10m_admin_1_states_provinces.zip",
destfile = file.path(Dir.Shapes, "StateMask.zip")
) # download cultural vector
unzip(file.path(Dir.Shapes, "StateMask.zip"), exdir = Dir.Shapes) # unzip data
}
StateMask <- readOGR(Dir.Shapes, "ne_10m_admin_1_states_provinces", verbose = FALSE) # read
ggplot() +
geom_polygon(data = StateMask, aes(x = long, y = lat, group = group), colour = "darkred", fill = "black") +
theme_bw() +
labs(x = "Longitude", y = "Latitude")
As you can see, this shapefile provides a set of rather small (geographically speaking) regions. We will use these for showing how Kriging works within this tutorial because Kriging runs in a timely manner for these regions.
Additionally, it is worth pointing out that www.naturalearthdata.com offers a large host of further shapefile data sets including (but not limited to):
- National borders
- Rivers and Lakes
- Urban areas
- Reefs and Bathymetry
We strongly recommend looking there early on in your projects for any shapefile needs you may have.
Ecoregions
When we’re not bound by political boundaries, ecoregions can often help to limit the spatial scope of our macroecological studies to manageable sizes (as far as Kriging effort is concerned).
#### ECOREGIONS (for ecoregional borders)
if (!file.exists(file.path(Dir.Shapes, "WWF_ecoregions"))) { # if not downloaded yet
download.file("http://assets.worldwildlife.org/publications/15/files/original/official_teow.zip",
destfile = file.path(Dir.Shapes, "wwf_ecoregions.zip")
) # download regions
unzip(file.path(Dir.Shapes, "wwf_ecoregions.zip"), exdir = file.path(Dir.Shapes, "WWF_ecoregions")) # unzip data
}
EcoregionMask <- readOGR(file.path(Dir.Shapes, "WWF_ecoregions", "official", "wwf_terr_ecos.shp"), verbose = FALSE) # read
ggplot() +
geom_polygon(data = EcoregionMask, aes(x = long, y = lat, group = group), colour = "darkred", fill = "black") +
theme_bw() +
labs(x = "Longitude", y = "Latitude")
This particular data set offers the shapes of biomes and biogeographical realms across the Earth. As far as Kriging with the KrigR
package is concerned, we heavily advise against Kriging using biogeographical realms without further consideration since these large regions lead to extreme processing requirements.
Plotting Functions
In order to easily visualise our Kriging procedure including (1) inputs, (2) covariates, and (3) outputs without repeating too much of the same code, we create a few plotting functions of our own here.
Don’t worry about understanding how these work off the bat here. Kriging and the package KrigR
are what we want to demonstrate here - not visualisation strategies.
Raw Data
This function plots the raw data that we will krig and exports a single plot of all layers of the input raster:
Plot_Raw <- function(Raw, Shp = NULL, Dates, Legend = "Air Temperature [K]") {
Raw_df <- as.data.frame(Raw, xy = TRUE) # turn raster into dataframe
colnames(Raw_df)[c(-1, -2)] <- Dates # set colnames
Raw_df <- gather(data = Raw_df, key = Values, value = "value", colnames(Raw_df)[c(-1, -2)]) # make ggplot-ready
Raw_plot <- ggplot() + # create a plot
geom_raster(data = Raw_df, aes(x = x, y = y, fill = value)) + # plot the raw data
facet_wrap(~Values) + # split raster layers up
theme_bw() +
labs(x = "Longitude", y = "Latitude") + # make plot more readable
scale_fill_gradientn(name = Legend, colours = inferno(100)) # add colour and legend
if (!is.null(Shp)) { # if a shape has been designated
Raw_plot <- Raw_plot + geom_polygon(data = Shp, aes(x = long, y = lat, group = group), colour = "black", fill = "NA") # add shape
}
return(Raw_plot)
} # export the plot
Covariates
This function plots the covariate data we will be using by showing each variable on both resolution levels side-by-side:
Plot_Covs <- function(Covs, Shp = NULL) {
Plots_ls <- as.list(rep(NA, nlayers(Covs[[1]]) * 2)) # create as many plots as there are covariates variables * 2
Counter <- 1 # counter for list position
for (Variable in 1:nlayers(Covs[[1]])) { # loop over all covariate variables
Covs_Iter <- list(Covs[[1]][[Variable]], Covs[[2]][[Variable]]) # extract the data for this variable
for (Plot in 1:2) { # loop over both resolutions for the current variable
Cov_df <- as.data.frame(Covs_Iter[[Plot]], xy = TRUE) # turn raster into data frame
colnames(Cov_df)[3] <- "Values" # assign a column name to the data column
Plots_ls[[Counter]] <- ggplot() + # create plot
geom_raster(data = Cov_df, aes(x = x, y = y, fill = Values)) + # plot the covariate data
theme_bw() +
labs(x = "Longitude", y = "Latitude") + # make plot more readable
scale_fill_gradientn(name = names(Covs_Iter[[Plot]]), colours = cividis(100)) + # add colour and legend
theme(plot.margin = unit(c(0, 0, 0, 0), "cm")) # reduce margins (for fusing of plots)
if (!is.null(Shp)) { # if a shape has been designated
Plots_ls[[Counter]] <- Plots_ls[[Counter]] + geom_polygon(data = Shp, aes(x = long, y = lat, group = group), colour = "black", fill = "NA") # add shape
}
Counter <- Counter + 1 # raise list counter
} # end of resolution loop
} # end of variable loop
ggPlot <- plot_grid(plotlist = Plots_ls, ncol = 2, labels = "AUTO") # fuse the plots into one big plot
return(ggPlot)
} # export the plot
Kriged Products
This function plots the Kriging outputs by splitting them up according to whether they are Kriging predictions or the uncertainties associated with them:
Plot_Krigs <- function(Krigs, Shp = NULL, Dates, Legend = "Air Temperature [K]") {
Type_vec <- c("Prediction", "Standard Error") # these are the output types of krigR
Colours_ls <- list(inferno(100), rev(viridis(100))) # we want separate colours for the types
Plots_ls <- as.list(NA, NA) # this list will be filled with the output plots
for (Plot in 1:2) { # loop over both output types
Krig_df <- as.data.frame(Krigs[[Plot]], xy = TRUE) # turn raster into dataframe
colnames(Krig_df)[c(-1, -2)] <- paste(Type_vec[Plot], Dates) # set colnames
Krig_df <- gather(data = Krig_df, key = Values, value = "value", colnames(Krig_df)[c(-1, -2)]) # make ggplot-ready
Plots_ls[[Plot]] <- ggplot() + # create plot
geom_raster(data = Krig_df, aes(x = x, y = y, fill = value)) + # plot the kriged data
facet_wrap(~Values) + # split raster layers up
theme_bw() +
labs(x = "Longitude", y = "Latitude") + # make plot more readable
scale_fill_gradientn(name = Legend, colours = Colours_ls[[Plot]]) + # add colour and legend
theme(plot.margin = unit(c(0, 0, 0, 0), "cm")) # reduce margins (for fusing of plots)
if (!is.null(Shp)) { # if a shape has been designated
Plots_ls[[Plot]] <- Plots_ls[[Plot]] + geom_polygon(data = Shp, aes(x = long, y = lat, group = group), colour = "black", fill = "NA") # add shape
}
} # end of type-loop
ggPlot <- plot_grid(plotlist = Plots_ls, ncol = 1, labels = "AUTO") # fuse the plots into one big plot
return(ggPlot)
} # export the plot
Kriging - The Three Steps
Using KrigR
in this way has us use the functions download_ERA()
, download_DEM()
, and krigR()
. Running these functions by themselves as opposed to executing the KrigR
pipeline (which we cover later in this tutorial) gives you the most control and oversight of the KrigR
workflow.
We will start with a simple Kriging process and subsequently make it more sophisticated during this tutorial.
The most simple way in which you can run the functions of the KrigR
package is by specifying a rectangular bounding box (i.e. an extent) to specify your study region(s). Here, we will run a small downscaling exercise of my birth-state of Saxony. It is a medium-sized state in the east of Germany and lends itself to some nice statistical downscaling since it presents us with mountainous regions along the south-eastern border and flatland areas towards the north-western edges.
Here’s the full area we will be obtaining and downscaling data for:
Extent <- extent(c(11.8, 15.1, 50.1, 51.7)) # roughly the extent of Saxony
bmaps.plot(bbox = Extent, type = "AerialWithLabels", quiet = TRUE, progress = "none")
Here, you can see a map of the region with the mountains of the Erzgebirge along the southern border to the Czech Republic as well as the flat terrains around Leipzig in the north of Saxony.
Now, let’s create a separate directory within which we will store the raw ERA5-land data, GMTED2010 DEM covariate data, and Kriging outputs:
Dir.StateExt <- file.path(Dir.Data, "State_Extent")
dir.create(Dir.StateExt)
Climate Data
No we are ready to carry out our first download of climate reanalysis data through the KrigR
package!
For this part of the tutorial, we download air temperature for a three-day interval around my birthday (03-01-1995) using the extent of my home-state of Saxony.
Notice that the downloading of ERA-family reanalysis data may take a short while to start as the download request gets queued with the CDS of the ECMWF before it is executed.
State_Raw <- download_ERA(
Variable = "2m_temperature",
DataSet = "era5-land",
DateStart = "1995-01-02",
DateStop = "1995-01-04",
TResolution = "day",
TStep = 1,
Extent = Extent,
Dir = Dir.StateExt,
API_User = API_User,
API_Key = API_Key
)
Plot_Raw(State_Raw, Dates = c("02-01-1995", "03-01-1995", "04-01-1995"))
Now, let’s look at the raster that was produced:
State_Raw
## class : RasterBrick
## dimensions : 19, 36, 684, 3 (nrow, ncol, ncell, nlayers)
## resolution : 0.1, 0.09999996 (x, y)
## extent : 11.65, 15.25, 49.95, 51.85 (xmin, xmax, ymin, ymax)
## crs : +proj=longlat +datum=WGS84 +no_defs
## source : memory
## names : index_1, index_2, index_3
## min values : 267.8741, 266.8182, 262.5400
## max values : 273.2644, 272.0682, 269.5377
As you can see, we obtained a RasterBrick object with 3 layers of data (one for each day we are interested in). Notice that extent of our downloaded data set does not fit the extent we set earlier manually. This is a precaution we have taken within KrigR
to make sure that all data cells you are interested in are covered.
KrigR
widens the spatial extent that is specified to ensure full coverage of the respective ERA5(-Land) raster cells. Global downloads are not affected by this and work just as you’d expect.
Additionally, as could be gleamed from the plots above, it is apparent that my home-state got a lot cooler on the day after my birth (04/01/1995) when compared to the two preceding days. What to make of this, I leave up to you because I don’t think I like the interpretation myself if we were to assume causality here.
Keep in mind that every function within the KrigR
package produces NetCDF (.nc) files in the specified directory (Dir
argument in the function call) to allow for further manipulation outside of R
if necessary (for example, using Panoply).
Covariates
Next, we use the download_DEM()
function which comes with KrigR
to obtain elevation data as our covariate of choice. This produces two rasters:
- A raster of training resolution which matches the input data in all attributes except for the data in each cell
- A raster of target resolution which matches the input data as closely as possible in all attributes except for the resolution (which is specified by the user) and the data in each cell
Both of these products are bundled into a list
where the first element corresponds to the training resolution and the second element contains the target resolution covariate data. Here, we specify a target resolution of .02
.
Covs_ls <- download_DEM(
Train_ras = State_Raw,
Target_res = .02,
Dir = Dir.StateExt,
Keep_Temporary = TRUE
)
Plot_Covs(Covs_ls)
Alternatively to specifying a target resolution, you can specify a different raster which should be matched in all attributes by the raster at target resolution. We get to this again at a later point in this tutorial.
For now, let’s simply inspect our list of covariate rasters:
Covs_ls
## [[1]]
## class : RasterLayer
## dimensions : 19, 36, 684 (nrow, ncol, ncell)
## resolution : 0.1, 0.09999996 (x, y)
## extent : 11.65, 15.25, 49.95, 51.85 (xmin, xmax, ymin, ymax)
## crs : +proj=longlat +datum=WGS84 +no_defs
## source : memory
## names : DEM
## values : 59.11391, 926.513 (min, max)
##
##
## [[2]]
## class : RasterLayer
## dimensions : 114, 216, 24624 (nrow, ncol, ncell)
## resolution : 0.01666667, 0.01666667 (x, y)
## extent : 11.64986, 15.24986, 49.94986, 51.84986 (xmin, xmax, ymin, ymax)
## crs : +proj=longlat +datum=WGS84 +no_defs
## source : memory
## names : DEM
## values : 46.5, 1150.25 (min, max)
This set of covariates has 684 and 24624 cells containing values at training and target resolution, respectively.
Kriging
Now let’s krig these data:
KrigStart <- Sys.time()
State_Krig <- krigR(
Data = State_Raw, # data we want to krig as a raster object
Covariates_coarse = Covs_ls[[1]], # training covariate as a raster object
Covariates_fine = Covs_ls[[2]], # target covariate as a raster object
Keep_Temporary = FALSE, # we don't want to retain the individually kriged layers on our hard-drive
Cores = 1, # we want to krig on just one core
FileName = "State_Ext.nc", # the file name for our full kriging output
Dir = Dir.StateExt # which directory to save our final input in
)
KrigStop <- Sys.time()
Plot_Krigs(State_Krig, Dates = c("02-01-1995", "03-01-1995", "04-01-1995"))
There we go. All the data has been downscaled and we do have uncertainties recorded for all of our outputs. As you can see, the elevation patterns show up clearly in our kriged air temperature output. Furthermore, you can see that our certainty of Kriging predictions drops on the 04/01/1995 in comparison to the two preceding days. However, do keep in mind that a maximum standard error of 0.195, 0.206, 0.333 (for each layer of our output respectively) on a total range of data of 6.553, 6.318, 7.788 (again, for each layer in the output respectively) is evident of a downscaling result we can be confident in.
Now, what does the output actually look like?
State_Krig[-3] # we will talk later about why we leave out the third list element produced by krigR here
## $Kriging_Output
## class : RasterBrick
## dimensions : 114, 216, 24624, 3 (nrow, ncol, ncell, nlayers)
## resolution : 0.01666667, 0.01666667 (x, y)
## extent : 11.64986, 15.24986, 49.94986, 51.84986 (xmin, xmax, ymin, ymax)
## crs : +proj=longlat +datum=WGS84 +no_defs
## source : memory
## names : var1.pred.1, var1.pred.2, var1.pred.3
## min values : 266.8146, 265.6941, 261.6227
## max values : 273.3675, 272.0124, 269.4109
##
##
## $Kriging_SE
## class : RasterBrick
## dimensions : 114, 216, 24624, 3 (nrow, ncol, ncell, nlayers)
## resolution : 0.01666667, 0.01666667 (x, y)
## extent : 11.64986, 15.24986, 49.94986, 51.84986 (xmin, xmax, ymin, ymax)
## crs : +proj=longlat +datum=WGS84 +no_defs
## source : memory
## names : var1.stdev.1, var1.stdev.2, var1.stdev.3
## min values : 0.1624252, 0.1706871, 0.2581532
## max values : 0.1951452, 0.2059939, 0.3330932
As output of the krigR()
function, we obtain a list of downscaled data as the first element and downscaling standard errors as the second list element.
Finally, let’s see how long it took to carry out this shapefile-informed downscaling:
BaseTime <- KrigStop - KrigStart
BaseTime
## Time difference of 50.12492 secs
Kriging - The Pipeline
Now that we’ve seen how you can execute the KrigR
workflow using three separate functions, it is time that we show you the most simplified function call to obtain some downscaled products: the pipeline.
First, let’s create a directory which the KrigR
pipeline will leak .nc files to. Don’t worry, this leak in the pipeline is very much intentional and of no environmental concern (unless you are running low on disk space).
Dir.StatePipe <- file.path(Dir.Data, "State_Pipe")
dir.create(Dir.StatePipe)
We have coded the krigR()
function in such a way that it can either be addressed at already present spatial products within your R
environment, or handle all the downloading and resampling of input data and covariates for you from scratch. To run the exact same Kriging approach as within our extent-example, we can specify the krigR()
function as such:
Pipe_Krig <- krigR(
Variable = "2m_temperature",
Type = "reanalysis",
DataSet = "era5-land",
DateStart = "1995-01-02",
DateStop = "1995-01-04",
TResolution = "day",
TStep = 1,
Extent = Extent,
API_User = API_User,
API_Key = API_Key,
Target_res = .02,
Cores = 1,
FileName = "State_Pipe.nc",
Dir = Dir.StatePipe,
Keep_Temporary = FALSE
)
Pipe_Krig[-3] # we will talk later about why we leave out the third list element produced by krigR here
Plot_Krigs(Pipe_Krig, Dates = c("02-01-1995", "03-01-1995", "04-01-1995"))
Let’s just check how this compares to non-pipeline product:
all.equal(State_Krig[[1]], Pipe_Krig[[1]])
## [1] TRUE
Surprise! There is no difference.
Getting More out of KrigR
Kriging By Shapefile
The computational cost of Kriging depends on many factors, one of which is number of observation (i.e. raster cells), which we can limit by using a shapefile.
Let’s revisit our above Saxony-Kriging example, but this time, we will use a shapefile to mask for the exact boundaries of the state itself! We extract the shapefile from the states & provinces data set as follows:
Position <- which(StateMask$name_en == "Saxony") # find Saxony
State_Shp <- StateMask[Position,]
bmaps.plot(bbox = extent(State_Shp), type = "AerialWithLabels", quiet = TRUE, progress = "none", stoponlargerequest = FALSE)
Here, you can see a map of the region with. Notice the outline of the state of Saxony; we will reconstruct that using our shapefile.
Again, we create a directory for all the .nc files that our functions create:
Dir.StateShp <- file.path(Dir.Data, "State_Shape")
dir.create(Dir.StateShp)
Climate Data
Again, we download air temperature data from the ERA5-Land data set for the three days from the second to the fourth of January 1995. This time, however, we limit the data by shapefile.
State_Raw <- download_ERA(
Variable = "2m_temperature",
DataSet = "era5-land",
DateStart = "1995-01-02",
DateStop = "1995-01-04",
TResolution = "day",
TStep = 1,
Extent = State_Shp,
Dir = Dir.StateShp,
API_User = API_User,
API_Key = API_Key
)
Plot_Raw(State_Raw, Shp = State_Shp, Dates = c("02-01-1995", "03-01-1995", "04-01-1995"))

Note that we have implemented masking by shapefile in such a way that every cell that is even partially covered by your shapefile is retained in the downloaded ERA5(-Land) data set. Regular masking via the raster
package-contained mask()
function only retains those cells whose centroids fall within the shapefile boundaries.
Covariates
Again, we obtain elevation data as our covariate of choice:
Covs_ls <- download_DEM(
Train_ras = State_Raw,
Target_res = .02,
Shape = State_Shp,
Dir = Dir.StateShp,
Keep_Temporary = TRUE
)
Plot_Covs(Covs_ls, State_Shp)
Notice that despite the covariate rasters (and input rasters, for that matter) containing 578 and 2.080810^{4} for training and target resolution respectively, we only obtain data for 299 and 8765 cells respectively due to our specification of a shapefile.
Kriging
Again, we execute our Kriging functionality using the krigR()
function:
KrigStart <- Sys.time() # record when we started
State_Krig <- krigR(
Data = State_Raw, # what to krig
Covariates_coarse = Covs_ls[[1]], # covariates at training resolution
Covariates_fine = Covs_ls[[2]], # covariates at target resolution
Keep_Temporary = FALSE, # delete temporary krigs of layers
Cores = 1, # only run this on 1 core
FileName = "State_Shape.nc", # save the finished file as this
Dir = Dir.StateShp # where to save the output
)
KrigStop <- Sys.time() # record when the function was done
Plot_Krigs(State_Krig, Shp = State_Shp, Dates = c("02-01-1995", "03-01-1995", "04-01-1995"))
The results look just like the ones that we obtained without using the shapefile (this time, we only see what’s contained in the shapefile, though). It is noteworthy to highlight the change in uncertainties here. Overall, our uncertainty decreased in going from extent-informed to shapefile-informed kriging in this example. However, we now see some serious chessboard pattern of the uncertainties in our outputs. This is due to the uncertainty decreasing as fine-resolution cells are more closely located towards the centre of a coarse-scaled cell which they have been downscaled from. This behaviour is normal. To obtain more useful kriging standard errors (i.e. uncertainties), we use local Kriging (we get to later in this tutorial).
Let’s see how long this took:
ShapeTime <- KrigStop - KrigStart
ShapeTime
## Time difference of 3.642711 secs
Running this downscaling process for Saxony using the shapefile took 0.0726727 of the original downscaling time using an extent statement.
Using a shapefile
rather than an extent
statement speeds up the Kriging process!
Kriging For Point-Data
KrigR
also support point-data which is usually used for plot/observation measurements.
Often times, we only require data for a certain number of geo-referenced locations which may be separated by large areas of no interest to a given analysis. For these analyses, drawing a shapefile around all locations is already a big step up from just using the extent of the area within which the locations fall. That being said, such a shapefile could still cover big areas of which are of no consideration to our analysis. We have this implemented support for point-data input in KrigR
.
Let’s first look at the data for this example. Here, we will use hypothetical sightings of Bigfoot:
Bigfoot.URL <- "https://opendata.arcgis.com/datasets/9947fc49e6c44120b4a1b3133c073dbc_0.csv" # link to the data
Bigfoot.data <- read.csv(Bigfoot.URL) # red the data
Bigfoot_ext <- extent(min(Bigfoot.data$Lon), max(Bigfoot.data$Lon),
min(Bigfoot.data$Lat), max(Bigfoot.data$Lat))
BigfootLand <- crop(LandMask , Bigfoot_ext)
bmaps.plot(bbox = Bigfoot_ext, type = "AerialWithLabels", quiet = TRUE, progress = "none")
osm.points(x = Bigfoot.data$Lon,
y = Bigfoot.data$Lat,
pch=10, cex=0.3, col = "red")
As we can see, Bigfoot is mostly reported to be found in the west and east of the contiguous US with some occurrences ranging across Canada and Alaska. Quite obviously, there are large areas here, within which we do not require fine-resolution climate data to understand Bigfoot occurrence.
What does this look like with our land mask?
ggplot() +
geom_polygon(data = BigfootLand, aes(x = long, y = lat, group = group), colour = "red", fill = "lightgrey") +
geom_point(data = Bigfoot.data, aes(x = Lon, y = Lat), pch = 3) +
theme_bw() + labs(x = "Longitude", y = "Latitude")
We could use the cropped land mask above to download some data. This would already be better than going by extent of the point-data since we bypass sea-areas. However, KrigR
can do better. To visualise how this works, we create a smaller subset of the Bigfoot location data here:
set.seed(42)
Bigfoot_sample <- sample(x = 1:dim(Bigfoot.data)[1], size = 60)
Bigfoot_df <- Bigfoot.data[Bigfoot_sample, ]
So what does this look like on a map?
Bigfoot_ext <- extent(min(Bigfoot_df$Lon), max(Bigfoot_df$Lon),
min(Bigfoot_df$Lat), max(Bigfoot_df$Lat))
BigfootLand <- crop(LandMask , Bigfoot_ext)
ggplot() +
geom_polygon(data = BigfootLand, aes(x = long, y = lat, group = group), colour = "red", fill = "lightgrey") +
geom_point(data = Bigfoot_df, aes(x = Lon, y = Lat), pch = 3, cex = 4) +
theme_bw() + labs(x = "Longitude", y = "Latitude")
We’ve obtained a nice sample showing distinct patches of Bigfoot occurrences in the west and east parts of the contiguous US. Again, the shapefile we would make from this with the landmask would cover vast areas which we would not be interested in and thus rack up processing time without a payoff. But what if we drew a tiny extent around each of our locations? KrigR
can do that for you!
Before we get stuck in with this, we create a directory for our following analysis before we proceed:
Dir.Points <- file.path(Dir.Data, "Point_Data")
dir.create(Dir.Points)
Climate Data
Now, it is time to obtain climate data for our Bigfoot sightings. To do so within KrigR
we make use of three arguments:
Extent
- Here, we provideKrigR
with a data frame containing Lat and Lon coordinates with columns that are named as such.
Buffer
- How big a rectangular (square, to be precise) buffer to draw around each location. This is measured in centesimal degrees.
ID
- Column name in ourExtent
-object containing unique location/observation/plot identifiers.
You might already ask yourself? “What if two or more buffers overlap?” Well, these buffers are fused automatically to build a single polygon.
Here, we are after a climatology or air temperature spanning the years 1982 - 2012 in a 0.1 degree square buffer around around each of our sampled Bigfoot occurrences. Since downloading this data set takes quite a while, I have implemented a check here to make skip downloading and formatting of the data if it has already been done:
if (!file.exists(file.path(Dir.Points, "AT_Climatology.nc"))) {
Points_Raw <- download_ERA(
Variable = "2m_temperature",
DataSet = "era5-land",
DateStart = "1982-01-01",
DateStop = "2012-12-31",
TResolution = "year",
TStep = 30,
Extent = Bigfoot_df, # the point data with Lon and Lat columns
Buffer = 0.1, # a 0.1 degree buffer should be drawn around each point
ID = "FID_", # this is the column which holds point IDs
API_User = API_User,
API_Key = API_Key,
Dir = Dir.Points,
FileName = "AT_Climatology.nc"
)
} else {
Points_Raw <- raster(file.path(Dir.Points, "AT_Climatology.nc"))
}
Plot_Raw(Points_Raw, Shp = BigfootLand, Dates = c("Climatology"))

Setting the Extent
argument to be a data.frame
object containing Lat and Lon data, you can obtain climate reanalysis data around locations/points. This can be controlled using the Buffer
and ID
arguments.
Covariates
Next, we need covariates. We obtain these using the same arguments as we used in our call to download_ERA
:
Covs_ls <- download_DEM(
Train_ras = Points_Raw,
Target_res = .02,
Keep_Temporary = TRUE,
Shape = Bigfoot_df, # the point data with Lon and Lat columns
Buffer = 0.1, # a 0.1 degree buffer should be drawn around each point
ID = "FID_", # this is the column which holds point IDs
Dir = Dir.Points
)
Plot_Covs(Covs_ls, Shp = BigfootLand)
Kriging
Finally, we are ready to krig our data. This is done the exact same way as in any other approach. However, to be sensible, I add the use of the maxdist
argument (passed on to gstat::krige
) here. I set it to the same value as the Buffer
argument previously so that we perform local Kriging (more on that later) by looking only at data that is no further than what we set our maxdist
to:
KrigStart <- Sys.time() # record when we started
Points_Krig <- krigR(
Data = Points_Raw,
Covariates_coarse = Covs_ls[[1]],
Covariates_fine = Covs_ls[[2]],
Keep_Temporary = FALSE,
Cores = 1,
Dir = Dir.Points,
FileName = "Points.nc",
maxdist = 0.1 # only krig in areas of 0.1 around each location
)
KrigStop <- Sys.time() # record when the function was done
Plot_Krigs(Points_Krig, Shp = BigfootLand, Dates = c("Climatology"))
That went by quick! We can’t really see the results clearly here and will look at the in more detail in just a second. Firstly, let’s see how long this took:
PointsTime <- KrigStop - KrigStart
PointsTime
## Time difference of 8.56543 secs
From experience, Kriging for a region this large instead of just for small regions around locations would easily take upwards of 10 hours! This is a big improvement.
What did we gain by doing this? Why not just obtain the raw reanalysis data and be done with it? For this, we plot our data:
## DATA EXTRACTION
AirTemp_Krig <- raster::extract(
x = stack(Points_Krig[-3]),
y = Bigfoot_df[, c("Lon", "Lat")]
)
AirTemp_Coarse <- raster::extract(
x = stack(Points_Raw),
y = Bigfoot_df[, c("Lon", "Lat")]
)
## DATA PREPARATION
ggplot_df <- data.frame(
Data = c(AirTemp_Coarse, AirTemp_Krig[, 1]),
Source = rep(c("Native", "Kriged"), each = nrow(Bigfoot_df)),
Difference = c(AirTemp_Coarse - AirTemp_Krig[, 1]),
Uncertainty = AirTemp_Krig[, 2],
Index = 1:nrow(Bigfoot_df)
)
## DENSITIES
Densities_gg <- ggplot(data = ggplot_df, aes(x = Data, fill = Source)) +
geom_density(alpha = 0.42) +
theme_bw() +
labs(x = "Density", y = "Air Temperature")
## DIFFERENCES
Difference_gg <- ggplot(data = ggplot_df, aes(x = Index, y = Difference)) +
geom_point(cex = 5, pch = 18) +
geom_errorbar(aes(ymin = Difference - Uncertainty, ymax = Difference + Uncertainty)) +
geom_abline(intercept = 0, slope = 0, col = "red", lwd = 1.5) +
theme_bw() +
labs(x = "Location", y = "Native vs. Kriged Temprature-Difference")
### FINAL PLOT
plot_grid(plotlist = list(Densities_gg, Difference_gg), nrow = 1, labels = "AUTO")
Looking at the density distributions we can see a higher peak of intermediate temperature values when compared to the native Era5-Land data. However, that is not a big difference. This is even more evident when looking at the differences between native and kriged data. Many of the data points therein fall close to a difference of 0K. Taking into account the kriging uncertainty, not many fall significantly outside of the 0K range. Thjat being said, there are outliers which might be very important in understanding the physiology of Bigfoot.
By obtaining climate reanalysis data around locations/points and subsequently downscaling these, we can obtain more precise local data.
Of course, you can do what we did above in just one call to the KrigR
pipeline with the following specification:
krigR(
Variable = "2m_temperature",
DataSet = "era5-land",
Type = "reanalysis",
DateStart = "1982-01-01",
DateStop = "2012-12-31",
TResolution = "year",
TStep = 30,
Extent = Bigfoot_df, # the point data with Lon and Lat columns
Buffer = 0.1, # a 0.1 degree buffer should be drawn around each point
ID = "FID_", # this is the column which holds point IDs
API_User = API_User,
API_Key = API_Key,
Target_res = .02,
Cores = 1,
maxdist = 0.1,
FileName = "Points.nc",
Keep_Temporary = FALSE
)
Multi-Core Kriging
The computational cost of Kriging can be divided up amongst multiple cores using our krigR()
function and the Cores
argument therein. This enables parallel computation of multi-layer rasters.
Let’s stick with our shapefile-informed downscaling of air temperature records across Saxony. First, we again obtain our data:
State_Raw <- download_ERA(
Variable = "2m_temperature",
DataSet = "era5-land",
DateStart = "1995-01-02",
DateStop = "1995-01-04",
TResolution = "day",
TStep = 1,
Extent = State_Shp,
Dir = Dir.StateShp,
API_User = API_User,
API_Key = API_Key
)
Covs_ls <- download_DEM(
Train_ras = State_Raw,
Target_res = .02,
Shape = State_Shp,
Dir = Dir.StateShp,
Keep_Temporary = TRUE
)
This time, however, we make use of the Cores
argument in the krigR()
function to run the downscaling of our three time steps worth of data in paralle:.
KrigStart <- Sys.time() # record when we started
State_Krig <- krigR(
Data = State_Raw, # what to krig
Covariates_coarse = Covs_ls[[1]], # covariates at training resolution
Covariates_fine = Covs_ls[[2]], # covariates at target resolution
Keep_Temporary = FALSE, # delete temporary krigs of layers
Cores = 3, # run this on 3 core
FileName = "State_Cores.nc", # save the finished file as this
Dir = Dir.StateShp # where to save the output
)
KrigStop <- Sys.time() # record when the function was done
Plot_Krigs(State_Krig, Shp = State_Shp, Dates = c("02-01-1995", "03-01-1995", "04-01-1995"))
The results look just like the ones that we obtained without using the shapefile and just one-core processing. Let’s see how long this took:
CoreTime <- KrigStop - KrigStart
CoreTime
## Time difference of 4.51963 secs
Running this downscaling process for Saxony using the shapefile took 1.2407325 of the original downscaling time using an extent statement (the effects multi-core Kriging really kick in when looking at rasters with many bands because the initial opening of clusters for parallel processing takes some time by itself.).
Using the Cores
argument with a value of n (with n > 1) speeds up the Kriging process of multi-layer rasters by an order of n!
Weighted Local Kriging
By default Kriging of the krigR()
function uses all cells in a spatial product to downscale individual cells of rasters. The nmax
argument can circumvent this.
Let’s build further on our above example by adding the nmax
argument (passed on to gstat::krige()
) to our krigR()
function call. This argument controls how many of the closest cells the Kriging algorithm should consider in the downscaling of individual coarse, training cells.
State_Krig <- krigR(
Data = State_Raw, # what to krig
Covariates_coarse = Covs_ls[[1]], # covariates at training resolution
Covariates_fine = Covs_ls[[2]], # covariates at target resolution
Keep_Temporary = FALSE, # delete temporary krigs of layers
Cores = 3, # only run this on 1 core
FileName = "State_Local.nc", # save the finished file as this
Dir = Dir.StateShp, # where to save the output
nmax = 15 # how many nearest neighbours to consider in Kriging
)
Plot_Krigs(State_Krig, Shp = State_Shp, Dates = c("02-01-1995", "03-01-1995", "04-01-1995"))
The air temperature prediction/downscaling results look just like the ones that we obtained above. However, we seriously improved our localised understanding of Kriging uncertainties (i.e. we see much more localised patterns of Kriging standard error). In this case for Saxony, our uncertainties seem to be highest for areas where the landscape is dominated by large, abrupt changes in elevation (e.g. around the Fichtelberg along the southern border) and water-dominated areas such as streams and lakes (e.g. the lakes around Leipzig in the North of Saxony).
Using the nmax
argument helps to identify highly localised patterns in the Kriging uncertainty as well as predictions!
We are working on an automatic feature for the identification of an appropriate number of nmax
straight from the data itself.
Aggregation of Temporal Resolutions
The KrigR
package is capable of some neat pre-processing of raw climate data. - both spatially and temporally.
The functions used for downloading reanalysis data with the KrigR
package (download_ERA()
and download_UERRA()
) accept the following arguments which you can use to control the temporal resolution of your climate data:
- TResolution
controls the time-line that TStep
indexes. You can specify anything from the following: 'hour'
, 'day'
, 'month'
, or 'year'
. The default is 'month'
.
- TStep
controls how many time-steps to aggregate into one layer of data each. Aggregation is done via taking the mean per cell in each raster comprising time steps that go into the final, aggregated time-step. The default is 1
.
- FUN
controls which values to calculate for the temporal aggregates, e.g.: 'min'
, 'max'
, or 'mean'
(default).
Specifying FUN
only does something when a Tstep
> 1 is specified - otherwise, there is no temporal aggregation to be done.
KrigR
automatically identifies which data set to download from given your temporal aggregation specification. As soon as TResolution
is set to 'month'
or 'year'
, the package automatically downloads monthly mean data from the CDS. We do this to make the temporal aggregtation calculation more light-weight on your computing units and to make downloads less heavy.
If you want to obtain the most extreme minima/maxima for a given time-period, you need to specify a TResolution
of 'hour'
or 'day'
so that temporal aggregation is carried out on hourly data.
Let’s run through a few examples to make clear how desired temporal resolution of data can be achieved using the KrigR
package:
What we want | TResolution |
TStep |
---|---|---|
Hourly intervals | 'hour' |
1 / don’t specify |
6-hour intervals | 'hour' |
6 |
Half-day intervals | 'hour' |
12 |
Daily intervals | 'day' |
1 / don’t specify |
3-day intervals | 'day' |
3 |
Weekly intervals | 'day' |
7 |
Monthly aggregates | 'month' / don’t specify |
1 / don’t specify |
4-month intervals | 'month' / don’t specify |
4 |
Annual intervals | 'year' |
1 / don’t specify |
10-year intervals | 'year' |
10 |
Specifying TResolution
of 'month'
will result in the download of full month aggregates for every month included in your time series.
For example, DateStart = "2000-01-20"
, DateStop = "2000-02-20"
with TResolution = 'month'
, and TStep = 1
does not result in the mean aggregate for the month between the 20/01/200 and the 20/02/2000, but does result in the monthly aggregates for January and February 2000. If you desire the former, you would need to specify DateStart = "2000-01-20"
, DateStop = "2000-02-20"
with TResolution = 'day'
, and TStep = 32
(the number of days between the two dates).
The reanalysis download functions of the KrigR
package automatically break down download requests into monthly intervals thus circumventing the danger of running into making a download request that is too big for the CDS.
For example, DateStart = "2000-01-20"
, DateStop = "2000-02-20"
with TResolution = 'day'
, and TStep = 8
will lead to two download requests to the CDS: (1) hourly data in the range 20/01/2000 00:00 to 31/01/2000 23:00, and (2) hourly data in the range 01/02/2000 00:00 to 20/02/2000 23:00. These data sets are subsequently fused in R
, aggregated to daily aggregates, and finally, aggregated to four big aggregates.
This gives you a lot of flexibility, but always keep in mind that third-party data sets might not account for leap-years so make sure the dates of third-party data (should you chose to use some) lines up with the ones as specified by your calls to the functions of the KrigR
package.
Use the TResolution
, and TStep
arguments in the download_ERA()
, and download_UERRA()
functions of the KrigR
package to adjust the temporal resolution of your climate data when downloading reanalysis data with our package.
Kriging to Match Already Available Data
We expect that you won’t want to downscale to specific resolutions most of the time, but rather, match an already existing spatial data set in terms of spatial resolution and extent. Again, the KrigR
package got you covered!
Usually, you probably want to krig to match a certain pre-existing data set rather than a certain resolution.
Here, we illustrate this with an NDVI-based example. The NDVI is a satellite-derived vegetation index which tells us how green the Earth is. It comes in bi-weekly intervals and at a spatial resolution of .08333
(roughly 9km). Here, we download all NDVI data for the year 2000 and then create the annual mean. This time, we do so for the state of California because of its size and topographical variety.
Position <- which(StateMask$name_en == "California") # find our state
State_Shp <- StateMask[Position, ] # extract shapefile
if (!file.exists(file.path(Dir.Data, "ReferenceRaster.nc"))) { # if reference raster doesn't already exist
gimms_files <- downloadGimms(
x = as.Date("2000-01-01"), # download from January 2000
y = as.Date("2000-12-31"), # download to December 2000
dsn = Dir.Data, # save downloads in data folder
quiet = FALSE
) # show download progress
Reference_ras <- rasterizeGimms(x = gimms_files, remove_header = TRUE) # rasterize the data
Negatives <- which(values(Reference_ras) < 0) # identify all negative values
values(Reference_ras)[Negatives] <- 0 # set threshold for barren land (NDVI<0)
Reference_ras <- calc(Reference_ras, fun = mean, na.rm = TRUE) # annual mean
Reference_ras <- crop(Reference_ras, State_Shp) # crop to fit shape
Reference_ras <- mask(Reference_ras, State_Shp) # mask by shape
writeRaster(x = Reference_ras, filename = file.path(Dir.Data, "ReferenceRaster"), format = "CDF") # write raster to data directory
unlink(gimms_files, recursive = TRUE) # delete raw GIMMs data
rm(list = c("Negatives", "gimms_files")) # delete from environment
} else { # if reference raster already exists
Reference_ras <- raster(file.path(Dir.Data, "ReferenceRaster.nc")) # GIMMs NDVI data as baseline raster
}
So what does this raster look like?
Reference_ras
## class : RasterLayer
## dimensions : 114, 124, 14136 (nrow, ncol, ncell)
## resolution : 0.08333333, 0.08333333 (x, y)
## extent : -124.4167, -114.0833, 32.5, 42 (xmin, xmax, ymin, ymax)
## crs : +proj=longlat +datum=WGS84 +no_defs
## source : D:/Documents/Website/content/post/krigr-mats/28-01-2021-WorkShop/Data/ReferenceRaster.nc
## names : layer
## values : 0.007266667, 0.8722875 (min, max)
## zvar : layer
As stated above, we want to match this with our output.
So, let’s create our directory for this as we did before:
Dir.NDVI <- file.path(Dir.Data, "NDVI_Matching")
dir.create(Dir.NDVI)
We could do this whole analysis in our three steps as outlined above, but why bother when the pipeline described above gets the job done just as well?
Matching Kriging outputs with a pre-existing data set is as easy as plugging the pre-existing raster into the Target_res
argument of the krigR()
or the download_DEM()
function.
This time we want to downscale from ERA5 resolution (roughly 30km) because the ERA5-Land data already matches the NDVI resolution (roughly 9km). Here’s how we do this:
NDVI_Krig <- krigR(
Variable = "2m_temperature",
Type = "reanalysis",
DataSet = "era5",
DateStart = "2000-01-01",
DateStop = "2000-12-31",
TResolution = "year",
TStep = 1,
Extent = State_Shp,
API_User = API_User,
API_Key = API_Key,
Target_res = Reference_ras,
Cores = 1,
FileName = "AirTemp_NDVI.nc",
nmax = 80, # we are still working on an automatic optimisation of this. Here, let's try a bigger number
Dir = Dir.NDVI
)
So? Did we match the pre-existing data?
NDVI_Krig[[1]]
## class : RasterBrick
## dimensions : 114, 124, 14136, 1 (nrow, ncol, ncell, nlayers)
## resolution : 0.08333333, 0.08333333 (x, y)
## extent : -124.4167, -114.0833, 32.5, 42 (xmin, xmax, ymin, ymax)
## crs : +proj=longlat +datum=WGS84 +no_defs
## source : memory
## names : var1.pred
## min values : 271.4974
## max values : 299.116
We nailed this!
Let’s take one final look at our (A) raw ERA5 data, (B) NDVI data, (C) Kriged ERA5 data, and (D) standard error of our Kriging output:
### ERA-Plot
ERA_df <- as.data.frame(raster(file.path(Dir.NDVI, "2m_temperature_2000-01-01_2000-12-31_year.nc")), xy = TRUE) # turn raster into dataframe
colnames(ERA_df)[c(-1, -2)] <- "Air Temperature 2000 (ERA5)"
ERA_df <- gather(data = ERA_df, key = Values, value = "value", colnames(ERA_df)[c(-1, -2)]) # make ggplot-ready
Raw_plot <- ggplot() + # create a plot
geom_raster(data = ERA_df, aes(x = x, y = y, fill = value)) + # plot the raw data
facet_wrap(~Values) + # split raster layers up
theme_bw() +
labs(x = "Longitude", y = "Latitude") + # make plot more readable
scale_fill_gradientn(name = "Air Temperature [K]", colours = inferno(100)) + # add colour and legend
geom_polygon(data = State_Shp, aes(x = long, y = lat, group = group), colour = "black", fill = "NA") # add shape
### NDVI-Plot
NDVI_df <- as.data.frame(Reference_ras, xy = TRUE) # turn raster into dataframe
colnames(NDVI_df)[c(-1, -2)] <- "NDVI 2000"
NDVI_df <- gather(data = NDVI_df, key = Values, value = "value", colnames(NDVI_df)[c(-1, -2)]) # make ggplot-ready
NDVI_plot <- ggplot() + # create a plot
geom_raster(data = NDVI_df, aes(x = x, y = y, fill = value)) + # plot the raw data
facet_wrap(~Values) + # split raster layers up
theme_bw() +
labs(x = "Longitude", y = "Latitude") + # make plot more readable
scale_fill_gradientn(name = "NDVI", colours = rev(terrain.colors(100))) + # add colour and legend
geom_polygon(data = State_Shp, aes(x = long, y = lat, group = group), colour = "black", fill = "NA") # add shape
### KRIGED-Plots
Dates <- c("Kriged Air Temperature 2000 (NDVI Resolution)")
Type_vec <- c("Prediction", "Standard Error") # these are the output types of krigR
Colours_ls <- list(inferno(100), rev(viridis(100))) # we want separate colours for the types
Plots_ls <- as.list(NA, NA) # this list will be filled with the output plots
for (Plot in 1:2) { # loop over both output types
Krig_df <- as.data.frame(NDVI_Krig[[Plot]], xy = TRUE) # turn raster into dataframe
colnames(Krig_df)[c(-1, -2)] <- paste(Type_vec[Plot], Dates) # set colnames
Krig_df <- gather(data = Krig_df, key = Values, value = "value", colnames(Krig_df)[c(-1, -2)]) # make ggplot-ready
Plots_ls[[Plot]] <- ggplot() + # create plot
geom_raster(data = Krig_df, aes(x = x, y = y, fill = value)) + # plot the kriged data
facet_wrap(~Values) + # split raster layers up
theme_bw() +
labs(x = "Longitude", y = "Latitude") + # make plot more readable
scale_fill_gradientn(name = "Air Temperature [K]", colours = Colours_ls[[Plot]]) + # add colour and legend
theme(plot.margin = unit(c(0, 0, 0, 0), "cm")) + # reduce margins (for fusing of plots)
geom_polygon(data = State_Shp, aes(x = long, y = lat, group = group), colour = "black", fill = "NA") # add shape
} # end of type-loop
### FINAL PLOT
plot_grid(plotlist = list(Raw_plot, NDVI_plot, Plots_ls[[1]], Plots_ls[[2]]), nrow = 2, labels = "AUTO")
The Call List
So far, we have only ever looked at the first two elements in the list returned by krigR()
. A quick look at the help file, the code, or this guide reveals that there is a third list element - the call list. When coding this feature into krigR()
we intended for this to be a neat, clean, storage-friendly way of keeping track of how the spatial product was created. It does so without storing additional spatial products. Let’s have a look at it:
NDVI_Krig[[3]]
## $Data
## $Data$Class
## [1] "RasterLayer"
## attr(,"package")
## [1] "raster"
##
## $Data$Dimensions
## $Data$Dimensions$nrow
## [1] 20
##
## $Data$Dimensions$ncol
## [1] 23
##
## $Data$Dimensions$ncell
## [1] 460
##
##
## $Data$Extent
## class : Extent
## xmin : -125.16
## xmax : -113.659
## ymin : 32.282
## ymax : 42.282
##
## $Data$CRS
## CRS arguments: +proj=longlat +datum=WGS84 +no_defs
##
## $Data$layers
## [1] "index_1"
##
##
## $Covariates_coarse
## $Covariates_coarse$Class
## [1] "RasterLayer"
## attr(,"package")
## [1] "raster"
##
## $Covariates_coarse$Dimensions
## $Covariates_coarse$Dimensions$nrow
## [1] 20
##
## $Covariates_coarse$Dimensions$ncol
## [1] 23
##
## $Covariates_coarse$Dimensions$ncell
## [1] 460
##
##
## $Covariates_coarse$Extent
## class : Extent
## xmin : -125.16
## xmax : -113.659
## ymin : 32.282
## ymax : 42.282
##
## $Covariates_coarse$CRS
## CRS arguments: +proj=longlat +datum=WGS84 +no_defs
##
## $Covariates_coarse$layers
## [1] "DEM"
##
##
## $Covariates_fine
## $Covariates_fine$Class
## [1] "RasterLayer"
## attr(,"package")
## [1] "raster"
##
## $Covariates_fine$Dimensions
## $Covariates_fine$Dimensions$nrow
## [1] 114
##
## $Covariates_fine$Dimensions$ncol
## [1] 124
##
## $Covariates_fine$Dimensions$ncell
## [1] 14136
##
##
## $Covariates_fine$Extent
## class : Extent
## xmin : -124.4167
## xmax : -114.0833
## ymin : 32.5
## ymax : 42
##
## $Covariates_fine$CRS
## CRS arguments: +proj=longlat +datum=WGS84 +no_defs
##
## $Covariates_fine$layers
## [1] "DEM"
##
##
## $KrigingEquation
## ERA ~ DEM
## <environment: 0x000000001fe55b00>
##
## $Cores
## [1] 1
##
## $FileName
## [1] "AirTemp_NDVI.nc"
##
## $Keep_Temporary
## [1] TRUE
##
## $nmax
## [1] 80
##
## $Data_Retrieval
## $Data_Retrieval$Variable
## [1] "2m_temperature"
##
## $Data_Retrieval$Type
## [1] "reanalysis"
##
## $Data_Retrieval$PrecipFix
## [1] FALSE
##
## $Data_Retrieval$DataSet
## [1] "era5"
##
## $Data_Retrieval$DateStart
## [1] "2000-01-01"
##
## $Data_Retrieval$DateStop
## [1] "2000-12-31"
##
## $Data_Retrieval$TResolution
## [1] "year"
##
## $Data_Retrieval$TStep
## [1] 1
##
## $Data_Retrieval$Extent
## class : SpatialPolygonsDataFrame
## features : 1
## extent : -124.4092, -114.1191, 32.53167, 41.99954 (xmin, xmax, ymin, ymax)
## crs : +proj=longlat +datum=WGS84 +no_defs
## variables : 83
## names : featurecla, scalerank, adm1_code, diss_me, iso_3166_2, wikipedia, iso_a2, adm0_sr, name, name_alt, name_local, type, type_en, code_local, code_hasc, ...
## value : Admin-1 scale rank, 2, USA-3521, 3521, US-CA, http://en.wikipedia.org/wiki/California, US, 8, California, CA|Calif., NA, State, State, US06, US.CA, ...
This lengthy list should contain all information you need to trace how you created a certain data set using krigR()
. If you feel like anything is missing in this list, please contact us.
Using Third-Party Data
Our krigR()
function is designed to work with non-ERA climate data as well as non-GMTED2010 covariate data. To krig your own spatial products using different covariate data than the GMTED2010 DEM we use as a default, you need to step into the three-step workflow.
ATTENTION: Kriging only works on square-cell spatial products!
krigR()
supports any combination of ERA-family reanalysis, GMTED2010, third-party climate data, and third-party covariate data. Here, we just demonstrate the use of other covariates than the GMTED2010 used by KrigR
by default.
For this example, let’s go back to Saxony:
Position <- which(StateMask$name_en == "Saxony") # find Saxony
State_Shp <- StateMask[Position, ]
Now, let’s download some covariate data:
- TKSat
- thermal conductivity of saturated soil
- TKDry
- thermal conductivity of dry soil
Dir.TP <- file.path(Dir.Data, "Third-Party_Data")
dir.create(Dir.TP)
#### Thermal Conductivity of saturated soil
if (!file.exists(file.path(Dir.TP, "TKSat.zip"))) { # if not downloaded yet
download.file("http://globalchange.bnu.edu.cn/download/data/worldptf/tksat.zip",
destfile = file.path(Dir.TP, "TKSat.zip")
) # download cultural vector
unzip(file.path(Dir.TP, "TKSat.zip"), exdir = Dir.TP) # unzip data
}
TKSat_ras <- raster(file.path(Dir.TP, "tksatu_l1.nc"))
#### Thermal Conductivity of dry soil
if (!file.exists(file.path(Dir.TP, "TKDry.zip"))) { # if not downloaded yet
download.file("http://globalchange.bnu.edu.cn/download/data/worldptf/tkdry.zip",
destfile = file.path(Dir.TP, "TKdry.zip")
) # download cultural vector
unzip(file.path(Dir.TP, "TKdry.zip"), exdir = Dir.TP) # unzip data
}
TKDry_ras <- raster(file.path(Dir.TP, "tkdry_l1.nc"))
Climate Data
This time, let’s mix it up a bit and look at soil temperature in the uppermost depth layer that ERA5(-Land) recognizes (0-7cm):
State_Raw <- download_ERA(
Variable = "soil_temperature_level_1",
DataSet = "era5-land",
DateStart = "1995-01-02",
DateStop = "1995-01-04",
TResolution = "day",
TStep = 1,
Extent = State_Shp,
Dir = Dir.TP,
API_User = API_User,
API_Key = API_Key
)
Plot_Raw(State_Raw, Shp = State_Shp, Dates = c("02-01-1995", "03-01-1995", "04-01-1995"), Legend = "Soil Temperature (0-7cm) [K]")
Covariates
As far as covariate data is concerned, we downloaded and loaded into R
global data sets which now limit to Saxony:
## TKSat
TKSat_ras <- crop(TKSat_ras, extent(State_Raw)) # crop to the right extent of our data we want to krig
Shape_ras <- rasterize(State_Shp, TKSat_ras, getCover = TRUE) # identify which cells are covered by the shape
Shape_ras[Shape_ras == 0] <- NA # set all cells which the shape doesn't touch to NA
TKSat_ras <- mask(TKSat_ras, Shape_ras) # mask for the shape of the state
## TKDry
TKDry_ras <- crop(TKDry_ras, extent(State_Raw)) # crop to the right extent of our data we want to krig
Shape_ras <- rasterize(State_Shp, TKDry_ras, getCover = TRUE) # identify which cells are covered by the shape
Shape_ras[Shape_ras == 0] <- NA # set all cells which the shape doesn't touch to NA
TKDry_ras <- mask(TKDry_ras, Shape_ras) # mask for the shape of the state
Personally, I am not content with just looking at thermal conductivity of dry and wet soil in predicting soil temperatures. Other covariates that come to mind are elevation and air temperature (which we can easily provide at target resolution through Kriging it reliably with elevation). Here, we just use some elevation data. Also, let’s push the boundaries and downscale all the way to a 0.01. resolution:
Covs_ls <- download_DEM(
Train_ras = State_Raw,
Target_res = .01,
Shape = State_Shp,
Dir = Dir.StateShp,
Keep_Temporary = TRUE
)
Now we combine these into a list containing to stacks of rasters and plot our third-party covariates:
# resample fine-resolution covariate data to match coarse input data
Coarsecovs <- stack(resample(x = TKSat_ras, y = State_Raw), resample(x = TKDry_ras, y = State_Raw), Covs_ls[[1]])
names(Coarsecovs) <- c("TKWet", "TKDry", "DEM") # setting some proper names
# High-resolution to which we downscale
Finecovs <- stack(TKSat_ras, TKDry_ras, Covs_ls[[2]])
names(Finecovs) <- c("TKWet", "TKDry", "DEM") # setting some proper names
# combine data into covariate list
Covs_ls <- list(Coarsecovs, Finecovs)
Plot_Covs(Covs_ls, State_Shp)
Kriging
Finally, we can krig our soil temperature data using the thermal conductivity of the soil. For this, we need to specify a new KrigingEquation
. If we don’t, krigR()
will notify us that it our default formula cannot be executed and will attempt to build a formula from the data it can find (this auto-generated formula would be the same as the one we specify here - an additive combination of all covariates found both at coarse and fine resolutions). Of course, this formula can also be specified to reflect interactive effects.
State_Krig <- krigR(
Data = State_Raw, # what to krig
Covariates_coarse = Covs_ls[[1]], # covariates at training resolution
Covariates_fine = Covs_ls[[2]], # covariates at target resolution
Keep_Temporary = FALSE, # delete temporary krigs of layers
Cores = 3, # run this on 3 cores
KrigingEquation = "ERA ~ DEM + TKDry + TKWet", # this time, we need to specify a new formula. Try and see what happens if you forget to do this.
FileName = "State_TP.nc", # save the finished file as this
Dir = Dir.TP, # where to save the output
nmax = 15 # how many nearest neighbours to consider in Kriging
)
## Warning in KrigR:::check_Krig(Data = Data, CovariatesCoarse = Covariates_coarse, : It is not recommended to use kriging for statistical downscaling of more than one order of magnitude. You are
## currently attempting this. Kriging will proceed.
Plot_Krigs(State_Krig, Shp = State_Shp, Dates = c("02-01-1995", "03-01-1995", "04-01-1995"), Legend = "Soil Temperature (0-7cm) [K]")
Setting aside the output of the Kriging itself, you find a warning message produced by
krigR()
- since we specified a resolution of 0.01 as our target and our raw data (both third-party and GMTED2010) comes at a native resolution of 0.0083, we cannot aggregate to 0.01 and are stuck with 0.0083. krigR()
warns us that this means we krig below the threshold of 0.09 which could be argued to be sensible bny rule of thumb as our lowest resolution we might go to, but proceeds with the Kriging.
You can use the KrigingEquation
argument in krigR()
to specify the use of third-party covariates and/or data to be kriged. You can specify additive as well as interaction effects.
Session Info
sessionInfo()
## R version 4.0.2 (2020-06-22)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 18363)
##
## Matrix products: default
##
## locale:
## [1] LC_COLLATE=English_United Kingdom.1252 LC_CTYPE=English_United Kingdom.1252 LC_MONETARY=English_United Kingdom.1252 LC_NUMERIC=C
## [5] LC_TIME=English_United Kingdom.1252
##
## attached base packages:
## [1] parallel stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] gimms_1.1.3 rosm_0.2.5 cowplot_1.1.0 viridis_0.5.1 viridisLite_0.3.0 ggplot2_3.3.2 tidyr_1.1.2 KrigR_0.1.0 httr_1.4.2 stars_0.4-3
## [11] abind_1.4-5 fasterize_1.0.3 sf_0.9-6 lubridate_1.7.9 automap_1.0-14 doParallel_1.0.15 iterators_1.0.12 foreach_1.5.0 rgdal_1.5-18 raster_3.3-13
## [21] sp_1.4-4 stringr_1.4.0 keyring_1.1.0 ecmwfr_1.3.0 ncdf4_1.17
##
## loaded via a namespace (and not attached):
## [1] xts_0.12.1 R.cache_0.14.0 tools_4.0.2 backports_1.1.10 R6_2.5.0 KernSmooth_2.23-17 zyp_0.10-1.1 rgeos_0.5-5 DBI_1.1.0 colorspace_1.4-1
## [11] withr_2.3.0 tidyselect_1.1.0 gridExtra_2.3 curl_4.3 compiler_4.0.2 gstat_2.0-6 labeling_0.3 bookdown_0.21 scales_1.1.1 classInt_0.4-3
## [21] digest_0.6.27 rmarkdown_2.6 R.utils_2.10.1 jpeg_0.1-8.1 pkgconfig_2.0.3 htmltools_0.5.0 styler_1.3.2 rlang_0.4.10 FNN_1.1.3 generics_0.0.2
## [31] farver_2.0.3 zoo_1.8-8 jsonlite_1.7.2 dplyr_1.0.2 R.oo_1.24.0 magrittr_2.0.1 Rcpp_1.0.5 munsell_0.5.0 lifecycle_0.2.0 R.methodsS3_1.8.1
## [41] stringi_1.5.3 yaml_2.2.1 plyr_1.8.6 grid_4.0.2 crayon_1.3.4 lattice_0.20-41 knitr_1.30 pillar_1.4.6 boot_1.3-25 rjson_0.2.20
## [51] spacetime_1.2-3 codetools_0.2-16 glue_1.4.2 evaluate_0.14 blogdown_1.0.2 vctrs_0.3.4 png_0.1-7 gtable_0.3.0 purrr_0.3.4 rematch2_2.1.2
## [61] reshape_0.8.8 assertthat_0.2.1 xfun_0.20 lwgeom_0.2-5 e1071_1.7-3 class_7.3-17 Kendall_2.2 tibble_3.0.3 intervals_0.15.2 memoise_1.1.0
## [71] units_0.6-7 ellipsis_0.3.1