Change Analysis

Preparing The Work

Let’s create our basic structure for this document:

Preamble

I am keeping the same preamble as last time because we will need to index the data and the plot directory in this exercise. Our preamble then looks like this:

rm(list=ls()) # clearing the entire environment
Dir.Base <- getwd() # identifying the current directory
Dir.Data <- paste(Dir.Base, "Data", sep="/") # generating the folder path for data folder
Dir.Plots <- paste(Dir.Base, "Plots", sep="/") # generating the folder path for figures folder

Notice, that we do not call the function dir.create() this time. We don’t need to do so, because we already created the two directories established above in our last exercise. Usually, we would create this entire analysis of your BFTP project in one R code script. In this case, we would only have one preamble which defines and creates directories instead of doing this step for every single sub-part of the analysis. Alas, we want to break this down for you. Therefore, you see this preamble here and will again in the next exercise.

This time, we actually do load packages here as we really only need the raster package. By now, I am assuming you know what we use it for:

library(raster) # the raster package for rasters

Additionally, we reload our .RData workspace from the last exercise to gain back our mclust model objects in particular.

load(file = "Workspace.RData") # load workspace

Coding

Again, all of the important Coding happens after the head and the preamble are written and run in R. Basically, this is the rest of this document once more.

Change Analysis

Loading Data

Firstly, we need the raw NDVI mean and seasonality data for the time frames we want to compare. Let’s deal with that right quick.

1982

Let’s load our 1982 NDVI mean and seasonality data:

Mean1982_ras <- raster(paste(Dir.Data, "1982Mean.nc", sep="/")) # loading means
Season1982_ras <- raster(paste(Dir.Data, "1982Season.nc", sep="/")) # loading seasonalities
All1982_ras <- stack(Mean1982_ras, Season1982_ras) # creating a stack
names(All1982_ras) <- c("Mean", "Seasonality") # assign names to stack layers
Vals1982_mat <- values(All1982_ras) # extract data
rownames(Vals1982_mat) <- 1:dim(Vals1982_mat)[1] # rownames to index raster cell number
Vals1982_mat <- na.omit(Vals1982_mat) # omit all rows which contain at least one NA record
summary(Vals1982_mat) # a summary of the data
##       Mean       Seasonality  
##  Min.   :0.00   Min.   :0.00  
##  1st Qu.:0.22   1st Qu.:0.55  
##  Median :0.33   Median :0.67  
##  Mean   :0.32   Mean   :0.64  
##  3rd Qu.:0.41   3rd Qu.:0.76  
##  Max.   :0.84   Max.   :1.00

2015

In order to assess how biome distributions have changed, we need another time frame to compare our 1982 data to. For this, I have re-run the code from our first BFTP exercise for the year 2015. We now load that data into R:

Mean2015_ras <- raster(paste(Dir.Data, "2015Mean.nc", sep="/")) # loading means
Season2015_ras <- raster(paste(Dir.Data, "2015Season.nc", sep="/")) # loading seasonalities
All2015_ras <- stack(Mean2015_ras, Season2015_ras) # creating a stack
names(All2015_ras) <- c("Mean", "Seasonality") # assign names to stack layers
Vals2015_mat <- values(All2015_ras) # extract data
rownames(Vals2015_mat) <- 1:dim(Vals2015_mat)[1] # rownames to index raster cell number
Vals2015_mat <- na.omit(Vals2015_mat) # omit all rows which contain at least one NA record
summary(Vals2015_mat) # a summary of the data
##       Mean       Seasonality  
##  Min.   :0.00   Min.   :0.00  
##  1st Qu.:0.25   1st Qu.:0.56  
##  Median :0.33   Median :0.67  
##  Mean   :0.34   Mean   :0.64  
##  3rd Qu.:0.43   3rd Qu.:0.77  
##  Max.   :0.83   Max.   :1.00

Notice, that the output of the summary() function is different for both matrices built from raster data values. This is important to ensure that our analysis actually references different time frames.

Predictions

Secondly, we want to compare cluster assignments. To do so, we need to use our mclust models to predict cluster assignments for each cell in our target region raster using the NDVI mean and seasonality data that we loaded previously.

1982

Let’s deal with the 1982 data first. mod2 is the mclust model object for 4 clusters from our last exercise. Here, we predict clusters and place them on a raster:

ModPred1982 <- predict.Mclust(mod2, Vals1982_mat) # prediction
Pred1982_ras <- Mean1982_ras # establishing a rediction raster
values(Pred1982_ras) <- NA # set everything to NA
# set values of prediction raster to corresponding classification according to rowname
values(Pred1982_ras)[as.numeric(rownames(Vals1982_mat))] <- as.vector(ModPred1982$classification)
colours <- rainbow(mod2$G) # define 4 colours
plot(Pred1982_ras, # what to plot
     col = colours, # colours for groups
     colNA = "black", # which colour to assign to NA values
     legend.shrink=1, # vertical size of legend
     legend.width=2 # horizontal size of legend
     )

2015

Now, we deal with the 2015 time frame. Notice, that we are using the mod2 mclust model which was established for 1982 in our last exercise. It is important that we use the same model when predicting our classes between time frames to ensure comparability. After all, we want to make sure that cluster 1 is the same in 1982 as 2015. It is debatable whether we should use a cluster model built from just one year of data or even from the same time frame as one of the the time frames which are to be compared. In fact, I would argue that we should establish a mclust model for the mean annual NDVI and mean annual seasonality of NDVI across the entire time for which data is available. For now, we simply use the 1982-reliant model:

ModPred2015 <- predict.Mclust(mod2, Vals2015_mat) # prediction
Pred2015_ras <- Mean2015_ras # establishing a rediction raster
values(Pred2015_ras) <- NA # set everything to NA
# set values of prediction raster to corresponding classification according to rowname
values(Pred2015_ras)[as.numeric(rownames(Vals2015_mat))] <- as.vector(ModPred2015$classification)
colours <- rainbow(mod2$G) # define 4 colours
plot(Pred2015_ras, # what to plot
     col = colours, # colours for groups
     colNA = "black", # which colour to assign to NA values
     legend.shrink=1, # vertical size of legend
     legend.width=2 # horizontal size of legend
     )

You can already see some cluster assignment changes on Nunivak island.

Initial Assesment

Let’s first assess how many raster cells have changed cluster assignment between our two time frames:

# identify how many cell assignments don't match between rasters
Change <- sum(ModPred1982$classification != ModPred2015$classification)
# divide number of mismatches by number of all cells
Change/length(ModPred2015$classification)
## [1] 0.22

As you can see, there is a proportion of 0.22 raster cells which have changed cluster assignment between the two time frames. Now, let’s put this on a map:

PredChange_ras <- Mean2015_ras # establishing a rediction raster
values(PredChange_ras) <- NA # set everything to NA
# set values of prediction raster to corresponding classification according to rowname
values(PredChange_ras)[as.numeric(rownames(Vals2015_mat))] <- 
  ModPred1982$classification != ModPred2015$classification
colours <- c("green", "red") # define 2 colours
plot(PredChange_ras, col = colours, colNA = "black",
     legend.shrink=1, legend.width=2)

I leave it to you to interpret these patterns (there actually is an interpretation to be had here).

In-Depth Assesment

I’d argue that a simple understanding whether things have changed won’t be what we want to report. What we want, is to know which cluster took over the cells of which raster. I.e., I’d like to answer the question: “Which clusters take over the regions of other clusters and which ones?”. I hope you’re interested in this, too. Here’s how we can analyse this: For each cluster assignment we:
- Identify the cells corresponding to it in 1982/the past
- Count how many of these cells are classified as the same cluster in 2015/the present
- Repeat the above for all combinations of cluster assignments imaginable

NClusters <- mod2$G # identify the number of clusters
present <- as.vector(Pred2015_ras) # assignments in 2015
past <- as.vector(Pred1982_ras) # assignments in 1982
# this matrix will hold the data, rows will show past state, columns will show present state
changematrix <- matrix(rep(NA, NClusters^2), nrow=NClusters, ncol=NClusters)
changevec <- rep(NA, NClusters) # this vector will fill rows in our matrix
for(k in 1:NClusters){ # loop over clusters in past
  changerun <- changevec 
  changeperc <- changevec
  for(m in 1:NClusters){ # loop over clusters in present
    presentcells <- which(present==m) # figure out which cells hold value m
    pastcells <- which(past==k) # figure out which cells hold value k
    # figure out how many of the cell denominators are shared by the two vectors
    rate <- length(Reduce(intersect, list(pastcells,presentcells))) 
    changerun[m] <- rate # save rate to changerun in place m
  } # end of present-loop
  changematrix[k,] <- changerun # save changerun to k row in matrix
  for(n in 1:NClusters){ # turn rates into portions
    # divide number of in a cell by total number of cells in its row
    changeperc[n] <- changematrix[k,n] / sum(changematrix[k,])
  } # end of percentages
  changematrix[k,] <- changeperc # save changeperc to row k
} # end of past-loop
changematrix <- changematrix*100 # turn everything into percentages
rownames(changematrix) <- paste0("Past", 1:NClusters)
colnames(changematrix) <- paste0("Present", 1:NClusters)
changematrix # show the matrix
##       Present1 Present2 Present3 Present4
## Past1   76.268     23.7        0     0.00
## Past2    4.437     82.1       13     0.84
## Past3    0.029     16.8       80     3.03
## Past4    0.073      3.6       36    60.30

As you can see, there’s quite a bit going on here. Let me explain how to read this. From past (1982) to present (2015), 76.27% of raster cells assigned to cluster 1 in 1982 are assigned to cluster 1 in 2015 as well. 23.73% of raster cells previously assigned to cluster 1 are classified as cluster 2 in 2015. Notice, how all rows sum up to 100% each. Representing the total of assigned raster cells in the 1982 record.

Given the biological counterparts of the clusters, how would you interpret these shifts?

Previous