By the end of this practical, you should feel comfortable:

- Fitting a DSM with multiple smooth terms in it
- Selecting terms to discard using \(p\)-values and shrinkage smooths
- Comparing response distributions using quantile-quantile plots
- Fitting a DSM with `abundance.est` as the response
library(Distance)
## Loading required package: mrds
## This is mrds 2.2.3
## Built: R 4.0.2; ; 2020-08-01 10:33:56 UTC; unix
##
## Attaching package: 'Distance'
## The following object is masked from 'package:mrds':
##
## create.bins
library(dsm)
## Loading required package: mgcv
## Loading required package: nlme
## This is mgcv 1.8-33. For overview type 'help("mgcv-package")'.
## Loading required package: numDeriv
## This is dsm 2.3.0
## Built: R 4.0.2; ; 2020-07-16 23:56:50 UTC; unix
library(ggplot2)
library(patchwork)
library(knitr)
Load the data and the fitted detection function objects from the previous exercises:
load("spermwhale.RData")
load("df-models.RData")
We can plot the covariates together using the following code (don’t worry too much about understanding what that code is doing at the moment).
# make a list to hold our plots
p <- list()

# make a plot for each covariate
for(covname in c("Depth", "SST", "NPP", "DistToCAS", "EKE")){
  # make the plot for this covariate
  p[[covname]] <- ggplot() +
    # covariates are plotted as tiles
    geom_tile(aes_string(x="x", y="y", fill=covname), data=predgrid) +
    # plot the observations on top, sized by group size
    geom_point(aes(x=x, y=y, size=size),
               alpha=0.6,
               data=subset(obs, size>0)) +
    # remove grey background etc
    theme_minimal() +
    # remove axis labels and fiddle with the legend
    theme(axis.title.x=element_blank(),
          axis.text.x=element_blank(),
          axis.ticks.x=element_blank(),
          axis.title.y=element_blank(),
          axis.text.y=element_blank(),
          axis.ticks.y=element_blank(),
          legend.position="right", legend.key.width=unit(0.005, "npc")) +
    # make the fill scale colourblind friendly
    scale_fill_viridis_c()
}

# use patchwork to stick the plots together
p[["Depth"]] + p[["SST"]] + p[["NPP"]] + p[["DistToCAS"]] + p[["EKE"]] +
  plot_layout(ncol=3, nrow=3)
We could improve these plots by adding a map of the USA, but this will do for now!
As we did in the previous exercise, we must remove the observations from the spatial data that we excluded when we fitted the detection function, i.e. those observations at distances greater than the truncation.
obs <- obs[obs$distance <= df_hn$ddf$meta.data$width, ]
Here we used the truncation distance stored in the detection function object (`df_hn$ddf$meta.data$width`), but we could also have supplied the numeric value directly (which we can find by checking the model's `summary()`).
Again, note that if you want to fit DSMs using detection functions with different truncation distances, you'll need to reload `spermwhale.RData` and redo the truncation for that detection function.
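For example, a quick check of the truncation distance (a minimal sketch; both calls use only the `df_hn` object loaded above):

# truncation distance stored in the fitted detection function
df_hn$ddf$meta.data$width

# summary() also reports the truncation as part of its output
summary(df_hn)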
We can build a really big model using `+` to include all the terms that we want in the model. We can check what covariates are available to us by using `head()` to look at the segment table:
head(segs)
## Sample.Label CenterTime SegmentID Depth DistToCAS SST
## 1 1 2004-06-24 07:27:04 1 118.5027 14468.1533 15.54390
## 2 2 2004-06-24 08:08:04 2 119.4853 10262.9648 15.88358
## 3 3 2004-06-24 09:03:18 3 177.2779 6900.9829 16.21920
## 4 4 2004-06-24 09:51:27 4 527.9562 1055.4124 16.45468
## 5 5 2004-06-24 10:25:39 5 602.6378 1112.6293 16.62554
## 6 6 2004-06-24 11:00:22 6 1094.4402 707.5795 16.83725
## EKE NPP x y Effort X Y Survey
## 1 0.0014442616 1908.129 214544.0 689074.3 10288.91 -70.48980 40.18245 en04395
## 2 0.0014198086 1889.540 222654.3 682781.0 10288.91 -70.39681 40.12377 en04395
## 3 0.0011704842 1842.057 230279.9 675473.3 10288.91 -70.30994 40.05597 en04395
## 4 0.0004101589 1823.942 239328.9 666646.3 10288.91 -70.20708 39.97406 en04395
## 5 0.0002553244 1721.949 246686.5 659459.2 10288.91 -70.12361 39.90731 en04395
## 6 0.0006556266 1400.281 254307.0 652547.2 10288.91 -70.03713 39.84294 en04395
## TransectID Beaufort
## 1 en0439520040624 1.4
## 2 en0439520040624 2.3
## 3 en0439520040624 1.2
## 4 en0439520040624 1.0
## 5 en0439520040624 1.2
## 6 en0439520040624 1.0
We can then fit a model with the available covariates in it, each as an `s()` term.
dsm_nb_xy_ms <- dsm(count ~ s(x, y, bs="ts") +
                      s(Depth, bs="ts") +
                      s(DistToCAS, bs="ts") +
                      s(SST, bs="ts") +
                      s(EKE, bs="ts") +
                      s(NPP, bs="ts"),
                    df_hn, segs, obs,
                    family=nb())
summary(dsm_nb_xy_ms)
##
## Family: Negative Binomial(0.114)
## Link function: log
##
## Formula:
## count ~ s(x, y, bs = "ts") + s(Depth, bs = "ts") + s(DistToCAS,
## bs = "ts") + s(SST, bs = "ts") + s(EKE, bs = "ts") + s(NPP,
## bs = "ts") + offset(off.set)
##
## Parametric coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -20.7732 0.2295 -90.5 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
## edf Ref.df Chi.sq p-value
## s(x,y) 1.8636924 29 19.141 3.05e-05 ***
## s(Depth) 3.4176460 9 46.263 < 2e-16 ***
## s(DistToCAS) 0.0000801 9 0.000 0.9067
## s(SST) 0.0002076 9 0.000 0.5403
## s(EKE) 0.8563344 9 5.172 0.0134 *
## s(NPP) 0.0001018 9 0.000 0.7822
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) = 0.0947 Deviance explained = 39.3%
## -REML = 382.76 Scale est. = 1 n = 949
Notes:

- We use `bs="ts"` to get shrinkage thin plate regression splines. More technical detail on these smooths can be found on their manual page, `?smooth.construct.ts.smooth.spec`.
- We are not specifying basis complexity (`k`) at the moment. Note that if you want to specify the same complexity for multiple terms, it's often easier to create a variable that can then be provided to `k` (for example, specify `k1 <- 15` and then set `k=k1` in the required `s()` terms), as in the sketch below.
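A minimal sketch of sharing one `k` across terms (the value `15`, the model name and the reduced term set here are purely illustrative):

# purely illustrative: one variable sets the basis complexity everywhere
k1 <- 15
dsm_k_demo <- dsm(count ~ s(x, y, bs="ts", k=k1) +
                    s(Depth, bs="ts", k=k1) +
                    s(EKE, bs="ts", k=k1),
                  df_hn, segs, obs,
                  family=nb())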
Let’s plot the smooths from this model:
plot(dsm_nb_xy_ms, pages=1)
Notes:

- Setting `shade=TRUE` gives prettier confidence bands (by default `shade=FALSE`).
- As with `vis.gam()`, the response is on the link scale.
- `scale=0` puts each plot on a different \(y\)-axis scale, making it easier to see the effects. Setting `scale=-1` (the default) will put the plots on a common \(y\)-axis scale (so you can gauge relative importance).

We can also plot the bivariate smooth of `x` and `y` using `vis.gam()`:
vis.gam(dsm_nb_xy_ms, view=c("x","y"), plot.type="contour", too.far=0.1,
        main="s(x,y) (link scale)", asp=1)
Compare this plot to the equivalent plot generated in the previous exercise, when only `x` and `y` were included in the model.
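If you saved that model (assumed here to be called `dsm_nb_xy`; substitute whatever name you used), a side-by-side comparison might look like:

# dsm_nb_xy is assumed to be the x/y-only model from the previous exercise
par(mfrow=c(1, 2))
vis.gam(dsm_nb_xy, view=c("x","y"), plot.type="contour", too.far=0.1,
        main="s(x,y), x/y-only model", asp=1)
vis.gam(dsm_nb_xy_ms, view=c("x","y"), plot.type="contour", too.far=0.1,
        main="s(x,y), full model", asp=1)
par(mfrow=c(1, 1))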
As was covered in the lectures, we can select terms by (approximate) \(p\)-values and by looking for terms that have EDFs significantly less than 1 (those which have been shrunk).
Decide on a significance level that you'll use to discard terms from the model. It's easiest to either comment out the terms to be removed (using `#`) or to copy the code chunk above and paste it below.

Remove the terms that are non-significant at that level and re-run the above checks, summaries and plots to see what happens (one possible first pass is sketched below). It's helpful to make notes for yourself as you go.
Having removed one smooth and reviewed your model, you may decide you wish to remove another. Repeat the process of removing a term and looking at plots and diagnostics again.
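As a starting point, a minimal sketch that drops the three terms shrunk to near-zero EDFs in the summary above (whether these are the right terms to remove is for you to decide):

# one possible first pass: shrunk terms are commented out, not deleted,
# so they are easy to reinstate
dsm_nb_xy_ms2 <- dsm(count ~ s(x, y, bs="ts") +
                       s(Depth, bs="ts") +
                       # s(DistToCAS, bs="ts") +
                       # s(SST, bs="ts") +
                       # s(NPP, bs="ts") +
                       s(EKE, bs="ts"),
                     df_hn, segs, obs,
                     family=nb())
summary(dsm_nb_xy_ms2)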
Try doing the same thing but using \(p\)-values. Are the resulting models different? Why?
Compare these results to the diagram in the lecture notes. Do your results differ?
We can see how well a response distribution performs by comparing quantile-quantile plots (Q-Q plots). The `qq.gam()` function can create these plots for you.
qq.gam(dsm_nb_xy_ms, asp=1, rep=100)
The `rep` argument gives us the grey “envelope” that allows us to determine how far away the points are from the line.
Try this out with your own models, comparing the results between two different response distributions (remember that you can change the response distribution using the `family=` argument to `dsm()`; our two usual options are `tw()` and `nb()`). A sketch of such a comparison follows.
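For instance, a minimal sketch refitting the full model with a Tweedie response and putting the two Q-Q plots side by side (the `main=` titles are just labels, passed through to the plotting call):

# refit the full model, changing only the response distribution
dsm_tw_xy_ms <- dsm(count ~ s(x, y, bs="ts") +
                      s(Depth, bs="ts") +
                      s(DistToCAS, bs="ts") +
                      s(SST, bs="ts") +
                      s(EKE, bs="ts") +
                      s(NPP, bs="ts"),
                    df_hn, segs, obs,
                    family=tw())

# compare the Q-Q plots side by side
par(mfrow=c(1, 2))
qq.gam(dsm_nb_xy_ms, asp=1, rep=100, main="Negative binomial")
qq.gam(dsm_tw_xy_ms, asp=1, rep=100, main="Tweedie")
par(mfrow=c(1, 1))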
As with the detection functions in practical 1, here is a quick function to generate model results tables with response distribution, smooth terms list and AIC:
summarize_dsm <- function(model){
  summ <- summary(model)
  data.frame(response = model$family$family,
             terms    = paste(rownames(summ$s.table), collapse=", "),
             AIC      = AIC(model))
}
We can make a list of the models and pass the list to the above function.
# add your models to this list!
model_list <- list(dsm_nb_xy_ms)
# use plyr to go from list to data.frame via summarize_dsm
library(plyr)
summary_table <- ldply(model_list, summarize_dsm)
# make the row names whatever you like
row.names(summary_table) <- c("Full model, `dsm_nb_xy_ms`")
# sort that table by AIC
summary_table <- summary_table[order(summary_table$AIC, decreasing=TRUE),]
# print it in a nice format
kable(summary_table,
      caption = "Model selection table")
Table: Model selection table

|                           |response                 |terms                                                   |      AIC|
|:--------------------------|:------------------------|:-------------------------------------------------------|--------:|
|Full model, `dsm_nb_xy_ms` |Negative Binomial(0.114) |s(x,y), s(Depth), s(DistToCAS), s(SST), s(EKE), s(NPP)  | 754.0326|
So far we have only looked at models with `count` as the response. Try using a detection function with observation-level covariates and use `abundance.est`, instead of `count`, as the response in the chunk below:
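A minimal sketch, assuming a detection function with group size as a covariate was saved as `df_hr_ss` in the previous exercise (substitute the name of your own covariate model, and whichever terms you settled on):

# df_hr_ss is a placeholder name for your covariate detection function;
# abundance.est folds covariate-dependent detectability into the response
dsm_nb_xy_ms_ae <- dsm(abundance.est ~ s(x, y, bs="ts") +
                         s(Depth, bs="ts") +
                         s(EKE, bs="ts"),
                       df_hr_ss, segs, obs,
                       family=nb())
summary(dsm_nb_xy_ms_ae)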
Now save the models that you'd like to check later. I recommend saving as many models as you can, so that you can compare the results.
# add your models here
save(dsm_nb_xy_ms,
file="dsms.RData")