Now we’ve fitted some models and estimated abundance, we can estimate the variance associated with the abundance estimate (and plot it).

Aims

By the end of this practical, you should feel comfortable:

Load packages and data

library(dsm)
## Loading required package: mgcv
## Loading required package: nlme
## This is mgcv 1.8-7. For overview type 'help("mgcv-package")'.
## Loading required package: mrds
## This is mrds 2.1.14
## Built: R 3.2.0; ; 2015-07-30 10:07:19 UTC; unix
## This is dsm 2.2.11
## Built: R 3.2.2; ; 2015-10-23 20:20:41 UTC; unix
library(raster)
## Loading required package: sp
## Warning: no function found corresponding to methods exports from 'raster'
## for: 'overlay'
## 
## Attaching package: 'raster'
## 
## The following object is masked from 'package:nlme':
## 
##     getData
library(ggplot2)
library(viridis)
library(plyr)
library(knitr)
library(rgdal)
## rgdal: version: 1.0-4, (SVN revision 548)
##  Geospatial Data Abstraction Library extensions to R successfully loaded
##  Loaded GDAL runtime: GDAL 1.11.2, released 2015/02/10
##  Path to GDAL shared files: /Library/Frameworks/R.framework/Versions/3.2/Resources/library/rgdal/gdal
##  Loaded PROJ.4 runtime: Rel. 4.9.1, 04 March 2015, [PJ_VERSION: 491]
##  Path to PROJ.4 shared files: /Library/Frameworks/R.framework/Versions/3.2/Resources/library/rgdal/proj
##  Linking to sp version: 1.1-1

Load the models and prediction grid:

load("dsms.RData")
load("dsms-xy.RData")
load("predgrid.RData")

Estimation of variance

Depending on the model response (count or Horvitz-Thompson) we can use either dsm.var.prop or dsm.var.gam, respectively. dsm_nb_xy_ms doesn’t include any covariates at the observer level in the detection function, so we can use the variance propagation method and estimate the uncertainty in detection function parameters in one step.

# need to remove the NAs as we did when plotting
predgrid_var <- predgrid[!is.na(predgrid$Depth),]
# now estimate variance
var_nb_xy_ms <- dsm.var.prop(dsm_nb_xy_ms, predgrid_var, off.set=predgrid_var$off.set)

To summarise the results of this variance estimate:

summary(var_nb_xy_ms)
## Summary of uncertainty in a density surface model calculated
##  by variance propagation.
## 
## Quantiles of differences between fitted model and variance model
##       Min.    1st Qu.     Median       Mean    3rd Qu.       Max. 
## -1.476e-05 -4.005e-07  6.570e-09  7.103e-07  4.632e-07  4.101e-05 
## 
## Approximate asymptotic confidence interval:
##       5%     Mean      95% 
## 1139.275 1710.406 2567.850 
## (Using delta method)
## 
## Point estimate                 : 1710.406 
## Standard error                 : 358.4382 
## Coefficient of variation       : 0.2096

Try this out for some of the other models you’ve saved. Remember to use dsm.var.gam when there are covariates in the detection function and dsm.var.prop when there aren’t.

Summarise multiple models

We can again summarise all the models, as we did with the DSMs and detection functions, now including the variance:

summarize_dsm_var <- function(model, predgrid){

  summ <- summary(model)
  
  vp <- summary(dsm.var.prop(model, predgrid, off.set=predgrid$off.set))
  unconditional.cv.square <- vp$cv^2
  asymp.ci.c.term <- exp(1.96*sqrt(log(1+unconditional.cv.square)))
  asymp.tot <- c(vp$pred.est / asymp.ci.c.term,
                 vp$pred.est,
                 vp$pred.est * asymp.ci.c.term)

  data.frame(response = model$family$family,
             terms    = paste(rownames(summ$s.table), collapse=", "),
             AIC      = AIC(model),
             REML     = model$gcv.ubre,
             "Deviance_explained" = paste0(round(summ$dev.expl*100,2),"%"),
             "lower_CI" = round(asymp.tot[1],2),
             "Nhat" = round(asymp.tot[2],2),
             "upper_CI" = round(asymp.tot[3],2)
             )
}
# make a list of models (add more here!)
model_list <- list(dsm_nb_xy, dsm_nb_x_y, dsm_nb_xy_ms, dsm_nb_x_y_ms)
# give the list names for the models, so we can identify them later
names(model_list) <- c("dsm_nb_xy", "dsm_nb_x_y", "dsm_nb_xy_ms", "dsm_nb_x_y_ms")
per_model_var <- ldply(model_list, summarize_dsm_var, predgrid=predgrid_var)
kable(per_model_var)
.id response terms AIC REML Deviance_explained lower_CI Nhat upper_CI
dsm_nb_xy Negative Binomial(0.105) s(x,y) 775.2593 392.6460 40.49% 1081.45 1687.29 2632.52
dsm_nb_x_y Negative Binomial(0.085) s(x), s(y) 789.7554 395.8609 31.11% 1025.49 1586.08 2453.12
dsm_nb_xy_ms Negative Binomial(0.114) s(x,y), s(Depth), s(DistToCAS), s(SST), s(EKE), s(NPP) 754.0326 382.7591 39.16% 1139.27 1710.41 2567.85
dsm_nb_x_y_ms Negative Binomial(0.116) s(x), s(y), s(Depth), s(DistToCAS), s(SST), s(EKE), s(NPP) 752.5570 383.0325 40.68% 766.70 1442.13 2712.58

Plotting

We can plot a map of the coefficient of variation, but we first need to estimate the variance per prediction cell, rather than over the whole area. This calculation takes a while!

# use the split function to make each row of the predictiond data.frame into
# an element of a list
predgrid_var_split <- split(predgrid_var, 1:nrow(predgrid_var))
var_split_nb_xy_ms <- dsm.var.prop(dsm_nb_xy_ms, predgrid_var_split, off.set=predgrid_var$off.set)

Now we have the per-cell coefficients of variation, we assign that to a column of the prediction grid data and plot it as usual:

predgrid_var_map <- predgrid_var
cv <- sqrt(var_split_nb_xy_ms$pred.var)/unlist(var_split_nb_xy_ms$pred)
predgrid_var_map$CV <- cv
p <- ggplot(predgrid_var_map) +
       geom_tile(aes(x=x, y=y, fill=CV, width=10*1000, height=10*1000)) +
       scale_fill_viridis() +
       coord_equal() +
       geom_point(aes(x=x,y=y, size=count), data=dsm_nb_xy_ms$data[dsm_nb_xy_ms$data$count>0,])
print(p)

Note that here we overplot the segments where sperm whales were observed (and scale the size of the point according to the number observed), using geom_point().

We can also overplot the effort, which can be a useful way to see what the cause of uncertainty is. Though it may not only be caused by lack of effort but also covariate coverage, this can be useful to see.

First we need to load the segment data from the gdb

tracks <- readOGR("Analysis.gdb", "Segments")
## OGR data source with driver: OpenFileGDB 
## Source: "Analysis.gdb", layer: "Segments"
## with 949 features
## It has 8 fields
tracks <- fortify(tracks)

We can then just add this to the plot object we have built so far (with +), but this looks a bit messy with the observations, so let’s start from scratch:

p <- ggplot(predgrid_var_map) +
       geom_tile(aes(x=x, y=y, fill=CV, width=10*1000, height=10*1000)) +
       scale_fill_viridis() +
       coord_equal() +
       geom_path(aes(x=long,y=lat, group=group), data=tracks)
print(p)

Try this with the other models you fitted and see what the differences are between the maps of coefficient of variation.

Save the uncertainty maps to raster files

As with the predictions, we’d like to save our uncertainty estimates to a raster layer so we can plot them in ArcGIS. Again, this involves a bit of messing about with the data format before we can save.

# setup the storage for the cvs
cv_raster <- raster(predictorStack)
# we removed the NA values to make the predictions and the raster needs them
# so make a vector of NAs, and insert the CV values...
cv_na <- rep(NA, nrow(predgrid))
cv_na[!is.na(predgrid$Depth)] <- cv
# put the values in, making sure they are numeric first
cv_raster <- setValues(cv_raster, cv_na)
# name the new, last, layer in the stack
names(cv_raster) <- "CV_nb_xy"

We can then save that object to disk as a raster file:

writeRaster(cv_raster, "cv_raster.img", datatype="FLT4S")

Extra credit