Multivariate smoothing, model selection
========================================
author: David L Miller
css: custom.css
transition: none
Recap
======
- How GAMs work
- How to include detection info
- Simple spatial-only models
- How to check those models
```{r setup, include=FALSE}
library(knitr)
library(magrittr)
library(viridis)
library(ggplot2)
library(reshape2)
library(dsm)
library(animation)
opts_chunk$set(cache=TRUE, echo=FALSE)
load("../spermwhale-analysis/count-models.RData")
load("../spermwhale-analysis/df-models.RData")
load("../spermwhale-analysis/sperm-data.RData")
library(dsm)
```
Univariate models are fun, but...
==================================
type: section
Ecology is not univariate
============================
- Many variables affect distribution
- Want to model the **right** ones
- Select between possible models
- Smooth term selection
- Response distribution
- Large literature on model selection
Tobler's first law of geography
==================================
type:section
"Everything is related to everything else, but near things are more related than distant things"
Tobler (1970)
Implications of Tobler's law
==============================
```{r pairrrrs, fig.width=10}
plot(segs[,c("x","y","SST","EKE","NPP","Depth")], pch=19, cex=0.4)
```
blah
=====
type:section
title:none
Covariates are not only correlated (linearly)...
...they are also "concurve"
What can we do about this?
===========================
- Careful inclusion of smooths
- Fit models using robust criteria (REML)
- Test for concurvity
- Test for sensitivity
Models with multiple smooths
===========================
type:section
Adding smooths
===============
- Already know that `+` is our friend
- Add everything then remove smooth terms?
```{r add-term, echo=TRUE}
dsm_all_tw <- dsm(count~s(x, y, bs="ts") +
s(Depth, bs="ts") +
s(DistToCAS, bs="ts") +
s(SST, bs="ts") +
s(EKE, bs="ts") +
s(NPP, bs="ts"),
ddf.obj=df_hr,
segment.data=segs, observation.data=obs,
family=tw(), method="REML")
```
Now we have a huge model, what do we do?
=========================================
type: section
Smooth term selection
================
- Classically two main approaches:
- Stepwise - path dependence
- All possible subsets - computationally expensive
Removing terms by shrinkage
=============================
- Remove smooths using a penalty (shrink the EDF)
- Basis `"ts"` - thin plate splines with shrinkage
- "Automatic"
p-values
=================
- $p$-values can be used
- They are **approximate**
- Reported in `summary`
- Generally useful though
Let's employ a mixture of these techniques
===========================================
type: section
How do we select smooth terms?
===============================
1. Look at EDF
- Terms with EDF<1 may not be useful
- These can usually be removed
2. Remove non-significant terms by $p$-value
- Decide on a significance level and use that as a rule
Example of selection
======================
type: section
Selecting smooth terms
========================
```{r add-term-summary}
summary(dsm_all_tw)
```
Shrinkage in action
====================
```{r smooth-shrinkage, fig.width=15}
plot(dsm_all_tw, pages=1, scale=0)
```
Same model with no shrinkage
=============================
```{r add-term-nots}
dsm_all_tw_nots <- dsm(count~s(x, y) +
s(Depth) +
s(DistToCAS) +
s(SST) +
s(EKE) +
s(NPP),
ddf.obj=df_hr,
segment.data=segs, observation.data=obs,
family=tw(), method="REML")
```
```{r smooth-no-shrinkage, fig.width=15}
plot(dsm_all_tw_nots, pages=1, scale=0)
```
Let's remove some smooth terms & refit
=======================================
```{r add-term-rm-terms, echo=TRUE}
dsm_all_tw_rm <- dsm(count~s(x, y, bs="ts") +
s(Depth, bs="ts") +
#s(DistToCAS, bs="ts") +
#s(SST, bs="ts") +
s(EKE, bs="ts"),#+
#s(NPP, bs="ts"),
ddf.obj=df_hr,
segment.data=segs, observation.data=obs,
family=tw(), method="REML")
```
What does that look like?
===========================
```{r add-term-summary-rm}
summary(dsm_all_tw_rm)
```
Removing EKE...
=================
```{r add-term-rm-terms2}
dsm_all_tw_rm <- dsm(count~s(x, y, bs="ts") +
s(Depth, bs="ts"),# +
#s(DistToCAS, bs="ts") +
#s(SST, bs="ts") +
#s(EKE, bs="ts"), #+
#s(NPP, bs="ts"),
ddf.obj=df_hr,
segment.data=segs, observation.data=obs,
family=tw(), method="REML")
summary(dsm_all_tw_rm)
```
General strategy
=================
For each response distribution and non-nested model structure:
1. Build a model with the smooths you want
2. Make sure that smooths are flexible enough (`k=...`)
3. Remove smooths that have been shrunk
4. Remove non-significant smooths
Comparing models
=================
type: section
Nested vs. non-nested models
==============================
- Compare `~s(x)+s(depth)` with `~s(x)`
- nested models
- What about `s(x) + s(y)` vs. `s(x, y)`
- don't want to have all these in the model
- not nested models
Measures of "fit"
===================
- Two listed in `summary`
- Deviance explained
- Adjusted $R^2$
- Deviance is a generalisation of $R^2$
- Highest likelihood value (*saturated* model) minus estimated model value
- (These are usually not very high for DSMs)
A quick note about REML scores
================================
- Use REML to select the smoothness
- Can also use the score to do model selection
- **BUT** only compare models with the same fixed effects
- (i.e. same "linear terms" in the model)
- $\Rightarrow$ **All terms** must be penalised (e.g. `bs="ts"`)
- Alternatively set `select=TRUE` in `gam()`
Selecting between response distributions
==========================================
type: section
Goodness of fit tests
======================
- Q-Q plots
- Closer to the line == better
```{r gof-qq, fig.width=15}
dsm_x_nb <- dsm(count~s(x, bs="ts"),
ddf.obj=df_hr,
segment.data=segs, observation.data=obs,
family=nb(), method="REML")
par(mfrow=c(1,2))
qq.gam(dsm_all_tw, asp=1, main="Tweedie")
qq.gam(dsm_x_nb, asp=1, main="Negative binomial")
```
Going back to concurvity
==========================
type:section
"How much can one smooth be approximated by one or more other smooths?"
Concurvity (model/smooth)
=========================
```{r concurvity, echo=TRUE}
concurvity(dsm_all_tw)
```
Concurvity between smooths
===========================
```{r concurvity-all, echo=TRUE}
concurvity(dsm_all_tw, full=FALSE)$estimate
```
Visualising concurvity between terms
==========================
```{r concurvity-all-vis}
vis.concurvity <- function(b, type="estimate"){
cc <- concurvity(b, full=FALSE)[[type]]
diag(cc) <- NA
cc[lower.tri(cc)]<-NA
layout(matrix(1:2, ncol=2), widths=c(5,1))
opar <- par(mar=c(5, 6, 5, 0) + 0.1)
# main plot
image(z=cc, x=1:ncol(cc), y=1:nrow(cc), ylab="", xlab="",
axes=FALSE, asp=1, zlim=c(0,1))
axis(1, at=1:ncol(cc), labels = colnames(cc), las=2)
axis(2, at=1:nrow(cc), labels = rownames(cc), las=2)
# legend
opar <- par(mar=c(5, 0, 4, 3) + 0.1)
image(t(matrix(rep(seq(0, 1, len=100), 2), ncol=2)),
x=1:3, y=1:101, zlim=c(0,1), axes=FALSE, xlab="", ylab="")
axis(4, at=seq(1,101,len=5), labels = round(seq(0,1,len=5),1), las=2)
par(opar)
}
vis.concurvity(dsm_all_tw)
```
***
- Previous matrix output visualised
- Diagonal/lower triangle left out for clarity
- High values (yellow) = BAD
Path dependence
==================
type:section
Sensitivity
============
- General path dependency?
- What if there are highly concurve smooths?
- Is the model is sensitive to them?
What can we do?
=================
- Fit variations excluding smooths
- Concurve terms that are excluded early on
- Appendix of Winiarski et al (2014) has an example
Sensitivity example
======================
- `s(Depth)` and `s(x, y)` are highly concurve (`r round(concurvity(dsm_all_tw_rm, full=FALSE)$estimate[2,3],4)`)
- Refit removing `Depth` first
```{r nodepth}
dsm_no_depth <- dsm(count~s(x, y, bs="ts") +
#s(DistToCAS, bs="ts") +
#s(SST, bs="ts") +
s(EKE, bs="ts") +
s(NPP, bs="ts"),
ddf.obj=df_hr,
segment.data=segs, observation.data=obs,
family=tw(), method="REML")
```
```{r sensitivity-anova}
cat("# with depth")
summary(dsm_all_tw_rm)$s.table
cat("# without depth")
summary(dsm_no_depth)$s.table
```
Comparison of spatial effects
===============================
```{r sensitivity-vis, fig.width=12}
par(mfrow=c(1,2))
vis.gam(dsm_all_tw_rm, view=c("x","y"), plot.type="contour", main="s(x,y) + s(Depth)", asp=1, too.far=0.1, zlim=c(-6, 2))
vis.gam(dsm_no_depth, view=c("x","y"), plot.type="contour", main="s(x,y)+s(EKE)+s(NPP)", asp=1, too.far=0.1, zlim=c(-6, 2))
```
Sensitivity example
======================
- Refit removing `x` and `y`...
```{r noxy}
dsm_no_xy <- dsm(count~
#s(DistToCAS, bs="ts") +
s(SST, bs="ts") +
s(Depth, bs="ts"),# +
#s(EKE, bs="ts") +
#s(NPP, bs="ts"),
ddf.obj=df_hr,
segment.data=segs, observation.data=obs,
family=tw(), method="REML")
```
```{r sensitivity-anova-noxy, fig.width=12}
cat("# without xy")
summary(dsm_no_xy)$s.table
cat("# with xy")
summary(dsm_all_tw_rm)$s.table
```
Comparison of depth smooths
============================
```{r sensitivity-depth, fig.width=12}
par(mfrow=c(1,2))
plot(dsm_all_tw_rm, select=2, ylim=c(-5,5))
plot(dsm_no_xy, select=2, ylim=c(-5,5))
```
Comparing those three models...
================================
```{r sensitivity-table, results="asis"}
sens <- data.frame(Name = c("`s(x,y) + s(Depth)`",
"`s(x,y)+s(EKE)+s(NPP)`",
"`s(SST)+s(Depth)`"),
Rsq = c(summary(dsm_all_tw_rm)$r.sq,
summary(dsm_no_depth)$r.sq,
summary(dsm_no_xy)$r.sq),
Deviance = c(summary(dsm_all_tw_rm)$dev.expl,
summary(dsm_no_depth)$dev.expl,
summary(dsm_no_xy)$dev.expl))
sens$Deviance <- sens$Deviance*100
sens[,2] <- round(sens[,2],4)
sens[,3] <- round(sens[,3],2)
kable(sens)
```
- "Full" model still explains most deviance
- No depth model requires spatial smooth to "mop up" extra variation
- We'll come back to this when we do prediction
Recap
========
type: section
Recap
=======
- Adding smooths
- Removing smooths
- $p$-values
- shrinkage
- Comparing models
- Comparing response distributions
- Sensitivity