By the end of this practical, you should feel comfortable:
predict()
functiondsm.var.prop
and when to use dsm.var.gam
summary()
output for uncertainty estimateslibrary(knitr)
library(dsm)
## Loading required package: mgcv
## Loading required package: nlme
## This is mgcv 1.8-33. For overview type 'help("mgcv-package")'.
## Loading required package: mrds
## This is mrds 2.2.3
## Built: R 4.0.2; ; 2020-08-01 10:33:56 UTC; unix
## Loading required package: numDeriv
## This is dsm 2.3.0
## Built: R 4.0.2; ; 2020-07-16 23:56:50 UTC; unix
library(ggplot2)
Load our models and data:
load("spermwhale.RData")
# load models
load("dsms.RData")
load("dsms-checked.RData")
# rename one model
dsm_nb_xy_ms <- dsm_nb_xy_ms_p1_4
Our prediction data lives in the predgrid
object. This data was prepared from various sources and the since some of the covariates were dynamic (SST, NPP and EKE), the dates for these were arbitrarily chosen as 2 June 2004, or Julian date 153.
If you’re preparing this data yourself, the rerddap
package can be used to obtain marine data from NOAA for both segments and predictions.
We can take a look at our data using head()
to see the first few rows:
head(predgrid)
## x y Depth SST NPP DistToCAS EKE off.set
## 126 547984.6 788254 153.59825 12.04609 1462.521 11788.974 0.0008329031 1e+08
## 127 557984.6 788254 552.31067 12.81379 1465.410 5697.248 0.0009806611 1e+08
## 258 527984.6 778254 96.81992 12.90251 1429.432 13722.626 0.0011575423 1e+08
## 259 537984.6 778254 138.23763 13.21393 1424.862 9720.671 0.0013417297 1e+08
## 260 547984.6 778254 505.14386 13.75655 1379.351 8018.690 0.0026881567 1e+08
## 261 557984.6 778254 1317.59521 14.42525 1348.544 3775.462 0.0045683752 1e+08
## long lat
## 126 -66.52252 40.94697
## 127 -66.40464 40.94121
## 258 -66.76551 40.86781
## 259 -66.64772 40.86227
## 260 -66.52996 40.85662
## 261 -66.41221 40.85087
As we saw in the slides, the predict()
function will fill in the variables in the model formula with these values to obtain the predictions at each cell.
For example using the model dsm_nb_xy_ms
:
pp <- predict(dsm_nb_xy_ms, predgrid, off.set=predgrid$off.set)
This is just a list of numbers – the predicted abundance per cell. We can sum these to get the estimated abundance for the study area:
sum(pp, na.rm=TRUE)
## [1] 1580.121
We can also plot this to get a spatial representation of the predictions:
# assign the predictions to the prediction grid data.frame
predgrid$Nhat_nb_xy <- pp
# plot!
p <- ggplot(predgrid) +
geom_tile(aes(x=x, y=y, fill=Nhat_nb_xy, width=10*1000, height=10*1000)) +
coord_equal() +
# better colour scheme
scale_fill_viridis_c() +
# cleaner plot
theme_minimal() +
# set the label for the fill colour
labs(fill="Abundance")
print(p)
Use the chunks above as a template and make predictions for the other models you saved in the previous exercises. In particular, compare the models with only spatial terms to those with environmental covariates included.
Depending on the covariates in the model, we can use either dsm.var.prop
or dsm.var.gam
. Remember: - dsm.var.gam
- assumes spatial model and detection function are independent - dsm.var.prop
- propagates uncertainty from detection function to spatial model - only works for count
models - covariates can only vary at segment level
If there are no covariates in the detection function, use dsm.var.gam
since there’s no covariance between the detection function and the spatial model.
dsm_nb_xy_ms
doesn’t include any covariates at the observer level in the detection function, so we can use dsm.var.gam
to estimate the uncertainty.
# now estimate variance
var_nb_xy_ms <- dsm.var.gam(dsm_nb_xy_ms, predgrid,
off.set=predgrid$off.set)
To summarise the results of this variance estimate:
summary(var_nb_xy_ms)
## Summary of uncertainty in a density surface model calculated
## analytically for GAM, with delta method
##
## Approximate asymptotic confidence interval:
## 2.5% Mean 97.5%
## 1111.871 1580.121 2245.569
## (Using log-Normal approximation)
##
## Point estimate : 1580.121
## CV of detection function : 0.06670757
## CV from GAM : 0.168
## Total standard error : 285.6377
## Total coefficient of variation : 0.1808
Try this out for some of the other models you’ve saved.
We can plot a map of the coefficient of variation, but we first need to estimate the variance per prediction cell, rather than over the whole area.
# use the split function to make each row of the predictiond data.frame into
# an element of a list
predgrid_var_split <- split(predgrid, 1:nrow(predgrid))
var_split_nb_xy_ms <- dsm.var.gam(dsm_nb_xy_ms, predgrid_var_split,
off.set=predgrid$off.set)
Now we have the per-cell coefficients of variation, we assign that to a column of the prediction grid data and plot it as usual:
cv <- sqrt(var_split_nb_xy_ms$pred.var)/unlist(var_split_nb_xy_ms$pred)
predgrid$CV <- cv
p <- ggplot(predgrid) +
geom_tile(aes(x=x, y=y, fill=CV, width=10*1000, height=10*1000)) +
scale_fill_viridis_c() +
coord_equal() +
geom_point(aes(x=x,y=y, size=count), alpha=0.4,
data=dsm_nb_xy_ms$data[dsm_nb_xy_ms$data$count>0,]) +
theme_minimal()
print(p)
Note that here we overplot the segments where sperm whales were observed (and scale the size of the point according to the number observed), using geom_point()
. The alpha=
argument makes the points semi-transparent.
Try this with the other models you fitted and see what the differences are between the maps of coefficient of variation.
Try plotting the effort over a CV plot, using the data in the model.