## Aims
By the end of this practical, you should feel comfortable:
- Making a prediction using the `predict()` function
- Summing the prediction cells to obtain a total abundance for a given area
- Plotting a map of predictions
- Knowing when to use `dsm.var.prop` and when to use `dsm.var.gam`
- Estimating variance for a given prediction area
- Estimating variance per-cell for a prediction grid
- Interpreting the `summary()` output for uncertainty estimates
- Making maps of the coefficient of variation in R
## Loading the packages and data
```{r load-packages}
library(knitr)
library(dsm)
library(ggplot2)
library(patchwork)
```
Load our models and data:
```{r load-models}
load("spermwhale.RData")
# load models
load("dsms.RData")
load("dsms-checked.RData")
# rename one model
dsm_nb_xy_ms <- dsm_nb_xy_ms_p1_4
```
## Prediction data
Our prediction data lives in the `predgrid` object. This data was prepared from various sources and the since some of the covariates were dynamic (SST, NPP and EKE), the dates for these were arbitrarily chosen as 2 June 2004, or Julian date 153.
If you're preparing this data yourself, the [`rerddap` package](https://cran.r-project.org/web/packages/rerddap/) can be used to obtain marine data from NOAA for both segments and predictions.
We can take a look at our data using `head()` to see the first few rows:
```{r inspect-preddata}
head(predgrid)
```
As we saw in the slides, the `predict()` function will fill in the variables in the model formula with these values to obtain the predictions at each cell.
For example using the model `dsm_nb_xy_ms`:
```{r makepred1}
pp <- predict(dsm_nb_xy_ms, predgrid, off.set=predgrid$off.set)
```
This is just a list of numbers -- the predicted abundance per cell. We can sum these to get the estimated abundance for the study area:
```{r predsum}
sum(pp, na.rm=TRUE)
```
We can also plot this to get a spatial representation of the predictions:
```{r predsp, fig.cap="Predicted surface for abundance estimates with bivariate spatial smooth along with environmental covariates."}
# assign the predictions to the prediction grid data.frame
predgrid$Nhat_nb_xy <- pp
# plot!
p <- ggplot(predgrid) +
geom_tile(aes(x=x, y=y, fill=Nhat_nb_xy, width=10*1000, height=10*1000)) +
coord_equal() +
# better colour scheme
scale_fill_viridis_c() +
# cleaner plot
theme_minimal() +
# set the label for the fill colour
labs(fill="Abundance")
print(p)
```
Use the chunks above as a template and make predictions for the other models you saved in the previous exercises. In particular, compare the models with only spatial terms to those with environmental covariates included.
---
**Making predictions**
Let's try making predictions with a few models. Let's compare two count models (negative binomial and Tweedie) an two estimated abundance models (negative binomial and Tweedie).
I'll make the plots then use the `patchwork` package to add together plots and put them together at the end.
First the Tweedie count, with a smooth of space.
```{r makepred-tw}
predgrid$Nhat_tw_xy <- predict(dsm_tw_xy_ms, predgrid,
off.set=predgrid$off.set)
# plot
p_tw_xy <- ggplot(predgrid) +
geom_tile(aes(x=x, y=y, fill=Nhat_tw_xy,
width=10*1000, height=10*1000)) +
coord_equal() +
scale_fill_viridis_c() +
theme_minimal() +
ggtitle("dsm_tw_xy_ms") +
labs(fill="Abundance")
```
Now make a plot for the negative binomial habitat count model:
```{r makepred-nb}
predgrid$Nhat_nb_noxy <- predict(dsm_nb_noxy_ms_refit,
predgrid,
off.set=predgrid$off.set)
# plot
p_nb_noxy <- ggplot(predgrid) +
geom_tile(aes(x=x, y=y, fill=Nhat_nb_noxy,
width=10*1000, height=10*1000)) +
coord_equal() +
scale_fill_viridis_c() +
theme_minimal() +
ggtitle("dsm_nb_noxy_ms_refit") +
labs(fill="Abundance")
```
Now the estimated abundance, with negative binomial response:
```{r makepred-nb-ae}
predgrid$Nhat_nb_ae <- predict(dsm_nb_xy_ms_ae, predgrid,
off.set=predgrid$off.set)
# plot
p_nb_ae <- ggplot(predgrid) +
geom_tile(aes(x=x, y=y, fill=Nhat_nb_ae,
width=10*1000, height=10*1000)) +
coord_equal() +
scale_fill_viridis_c() +
theme_minimal() +
ggtitle("dsm_nb_xy_ms_ae") +
labs(fill="Abundance")
```
Finally estimated abundance, with Tweedie response:
```{r makepred-tw-ae}
predgrid$Nhat_tw_ae <- predict(dsm_tw_xy_ms_ae, predgrid,
off.set=predgrid$off.set)
# plot
p_tw_ae <- ggplot(predgrid) +
geom_tile(aes(x=x, y=y, fill=Nhat_tw_ae,
width=10*1000, height=10*1000)) +
coord_equal() +
scale_fill_viridis_c() +
theme_minimal() +
ggtitle("dsm_tw_xy_ms_ae") +
labs(fill="Abundance")
```
Now plotting these all together:
```{r makeplot, fig.height=10, fig.width=10}
(p_tw_xy + p_nb_noxy)/(p_tw_ae + p_nb_ae)
```
We can see a similar pattern generally in all the plots with high values at the shelf edge, though we can see that `dsm_nb_noxy_ms_refit` has much more fine detail in it than the other models.
Now make some abundance estimates:
```{r abundance-ests}
sum(predgrid$Nhat_tw_xy)
sum(predgrid$Nhat_nb_noxy)
sum(predgrid$Nhat_tw_ae)
sum(predgrid$Nhat_nb_ae)
```
We do see a big difference here between the estimated abundance models and the count models. Bear in mind that the count models were both fitted just using the half-normal detection function.
```{r hab-refit-df}
load("df-models.RData")
# need to build the binned Beaufort covariate as was
# in the detection function
segs$BeaufortCut <- cut(segs$Beaufort, seq(0, 5, by=1),
include.lowest=TRUE,
ordered.result=TRUE)
# do a quick model selection, based off previous results
dsm_nb_xy_ms_refit <- dsm(count~s(x,y, bs="ts", k=60) +
s(Depth, bs="ts", k=20) +
#s(DistToCAS, bs="ts") + # 1
#s(SST, bs="ts") + # 2
s(EKE, bs="ts", k=20),# +
#s(NPP, bs="ts"), # 3
df_hr_beau, segs, obs,
family=nb())
summary(dsm_nb_xy_ms_refit)
```
Plotting that resulting model:
```{r makepred-nb-xy}
predgrid$Nhat_nb_xy <- predict(dsm_nb_xy_ms_refit,
predgrid,
off.set=predgrid$off.set)
# plot
p_nb_xy <- ggplot(predgrid) +
geom_tile(aes(x=x, y=y, fill=Nhat_nb_xy,
width=10*1000, height=10*1000)) +
coord_equal() +
scale_fill_viridis_c() +
theme_minimal() +
labs(fill="Abundance")
print(p_nb_xy)
```
Now we get something more like the estimated abundance models, so we can see that including sea state can have a really big effect!
```{r new-abund-xy}
sum(predgrid$Nhat_nb_xy)
```
And let's do the same thing for the Tweedie:
```{r hab-refit-df-tw}
# do a quick model selection, based off previous results
dsm_tw_xy_ms_refit <- dsm(count~s(x,y, bs="ts", k=60) +
s(Depth, bs="ts", k=20) +
#s(DistToCAS, bs="ts", k=20) + # 1
s(SST, bs="ts", k=20),# +
#s(EKE, bs="ts", k=20),# +
#s(NPP, bs="ts", k=20), # 2
df_hr_beau, segs, obs,
family=tw())
summary(dsm_tw_xy_ms_refit)
```
So now with SST in the model, let's see what that looks like:
```{r makepred-tw-refit}
predgrid$Nhat_tw_xy <- predict(dsm_tw_xy_ms_refit,
predgrid,
off.set=predgrid$off.set)
# plot
p_tw_xy <- ggplot(predgrid) +
geom_tile(aes(x=x, y=y, fill=Nhat_tw_xy,
width=10*1000, height=10*1000)) +
coord_equal() +
scale_fill_viridis_c() +
theme_minimal() +
labs(fill="Abundance")
print(p_tw_xy)
```
However this doesn't change the predicted abundance.
```{r new-abund}
sum(predgrid$Nhat_tw_xy)
```
However, let's remember that this was not our favoured model previously and remains so: the quantile-quantile plot shows the negative binomial model has a better fit:
```{r tw-nb-qq}
par(mfrow=c(1,2))
qq.gam(dsm_nb_xy_ms_refit, rep=200, main="negative binomial")
qq.gam(dsm_tw_xy_ms_refit, rep=200, main="negative binomial")
```
So we won't pursue this model (`dsm_tw_xy_ms_refit`) further.
---
## Variance estimation
Depending on the covariates in the model, we can use either `dsm.var.prop` or `dsm.var.gam`. Remember:
- `dsm.var.gam`
- assumes spatial model and detection function are independent
- `dsm.var.prop`
- propagates uncertainty from detection function to spatial model
- only works for `count` models
- covariates can only vary at segment level
If there are no covariates in the detection function, use `dsm.var.gam` since there's no covariance between the detection function and the spatial model.
`dsm_nb_xy_ms` doesn't include any covariates at the observer level in the detection function, so we can use `dsm.var.gam` to estimate the uncertainty.
```{r varest}
# now estimate variance
var_nb_xy_ms <- dsm.var.gam(dsm_nb_xy_ms, predgrid,
off.set=predgrid$off.set)
```
To summarise the results of this variance estimate:
```{r varest-summary}
summary(var_nb_xy_ms)
```
Try this out for some of the other models you've saved.
---
**Variance estimation**
We can use either the delta method (`dsm.var.gam`) or variance propagation (`dsm.var.prop`) with our count model:
```{r var-count}
dsm_nb_vargam <- dsm.var.gam(dsm_nb_xy_ms_refit, predgrid,
off.set=predgrid$off.set)
summary(dsm_nb_vargam)
dsm_nb_varprop <- dsm.var.prop(dsm_nb_xy_ms_refit, predgrid,
off.set=predgrid$off.set)
summary(dsm_nb_varprop)
```
Looking at these summaries we see that we actually get a reduction in the CV for our predictions! This is due to negative covariance between the spatial and detection parts of the model.
The table at the top of the `dsm.var.prop` output lets us check whether anything went wrong during the variance estimation. Values in the `Refitted.model` column should be no more than twice `Fitted.model.se` away from their corresponding `Fitted.model` values.
For the estimated abundance models, we must use `dsm.var.gam`:
```{r var-count-ae}
dsm_nb_vargam_ae <- dsm.var.gam(dsm_nb_xy_ms_ae, predgrid,
off.set=predgrid$off.set)
summary(dsm_nb_vargam_ae)
dsm_tw_vargam_ae <- dsm.var.gam(dsm_tw_xy_ms_ae, predgrid,
off.set=predgrid$off.set)
summary(dsm_tw_vargam_ae)
```
Again the Tweedie model here is just for example, we didn't prefer this model based on quantile-quantile plots.
---
## Plotting
We can plot a map of the coefficient of variation, but we first need to estimate the variance per prediction cell, rather than over the whole area.
```{r per-cell-var}
# use the split function to make each row of the predictiond data.frame into
# an element of a list
predgrid$height <- predgrid$width <- 10*1000
predgrid_var_split <- split(predgrid, 1:nrow(predgrid))
var_split_nb_xy_ms <- dsm.var.gam(dsm_nb_xy_ms, predgrid_var_split,
off.set=predgrid$off.set)
```
Now we have the per-cell coefficients of variation, we assign that to a column of the prediction grid data and plot it as usual:
```{r varest-map-obs, fig.cap="Uncertainty (CV) in prediction surface from bivariate spatial smooth with environmental covariates. Sightings overlaid."}
cv <- sqrt(var_split_nb_xy_ms$pred.var)/unlist(var_split_nb_xy_ms$pred)
predgrid$CV <- cv
p <- ggplot(predgrid) +
geom_tile(aes(x=x, y=y, fill=CV, width=10*1000, height=10*1000)) +
scale_fill_viridis_c() +
coord_equal() +
geom_point(aes(x=x,y=y, size=count), alpha=0.4,
data=dsm_nb_xy_ms$data[dsm_nb_xy_ms$data$count>0,]) +
theme_minimal()
print(p)
```
Note that here we overplot the segments where sperm whales were observed (and scale the size of the point according to the number observed), using `geom_point()`. The `alpha=` argument makes the points semi-transparent.
Try this with the other models you fitted and see what the differences are between the maps of coefficient of variation.
Try plotting the effort over a CV plot, using the data in the model.
---
**Plotting uncertainty**
Let's make plots to compare the maps using the delta method and variance propagation for the count model `dsm_nb_xy_ms_refit`:
```{r var-count-plot}
# vargam plot
dsm_nb_vargam_plot <- dsm.var.gam(dsm_nb_xy_ms_refit, predgrid_var_split,
off.set=predgrid$off.set)
# can use plot=FALSE to get the ggplot object that we can modify
pl_vg <- plot(dsm_nb_vargam_plot, plot=FALSE, observations=FALSE)
# modify plot
pl_vg <- pl_vg +
geom_point(aes(x=x, y=y), data=dsm_nb_noxy_ms_refit$data,
colour="lightgrey", alpha=.5, size=0.2) +
theme(legend.position="bottom") +
scale_fill_viridis_c()
# varprop plot
dsm_nb_varprop_plot <- dsm.var.prop(dsm_nb_xy_ms_refit, predgrid_var_split,
off.set=predgrid$off.set)
pl_vp <- plot(dsm_nb_varprop_plot, plot=FALSE, observations=FALSE)
pl_vp <- pl_vp +
geom_point(aes(x=x, y=y), data=dsm_nb_noxy_ms_refit$data,
colour="lightgrey", alpha=.5, size=0.2) +
theme(legend.position="bottom") +
scale_fill_viridis_c()
```
Putting that together in a plot:
```{r plot-vg-vp}
pl_vg + pl_vp
```
We can see some differences here in the spatial pattern of the ucnertainty as well as the overall size we see in the `summary` difference above.
---
---
**One more thing...**
What about using an interaction of EKE and Depth? I'll try this just for the count model with negative binomial response.
We can construct this using a *tensor product* which takes two (or more) univariate smooths and create a 2 (or more) dimensional smooth. We can set `k` for each dimension (we can even set different basis functions in each direction) but note that it's the term will use the product of these `k`s number of parameters (`k=c(10,10)` leads to a 100 parameter term!). The tensor product includes the univariate terms (it's a full interaction), so we need to remove them from the model:
```{r te}
dsm_nb_te <- dsm(count~s(x,y, bs="ts", k=30) +
#s(DistToCAS, bs="ts") + # 1
#s(SST, bs="ts") + # 3
te(EKE, Depth, bs="ts", k=c(10, 10)),# +
#s(NPP, bs="ts"), # 2
df_hr_beau, segs, obs,
family=nb())
summary(dsm_nb_te)
```
We can plot this just like any other model:
```{r plot-te}
plot(dsm_nb_te, scheme=2, shade=TRUE)
```
A useful test here of whether we need the tensor would be using the `ti` form of the tensor. This allows us to look at the parts of the interaction on their own. We can write a tensor product `te(x, y)` as `ti(x) + ti(y) + ti(x, y)`, so we get the main effects and interactions separately in the model `summary` table. Doing that:
```{r ti}
dsm_nb_ti <- dsm(count~s(x,y, bs="ts", k=30) +
#s(DistToCAS, bs="ts") +
#s(SST, bs="ts") +
ti(EKE, bs="ts", k=10) +
ti(Depth, bs="ts", k=10) +
ti(EKE, Depth, bs="ts", k=c(10, 10)),# +
#s(NPP, bs="ts"),
df_hr_beau, segs, obs,
family=nb())
summary(dsm_nb_ti)
```
We can see that the interaction term `ti(EKE, Depth)` has a very small EDF and isn't significant at the 0.05 level. So I think this one isn't a go-er unfortunately, even if the biological/oceanographic story is more compelling.
---