Multivariate smoothing, model selection ======================================== author: David L Miller css: custom.css transition: none Recap ====== - How GAMs work - How to include detection info - Simple spatial-only models - How to check those models ```{r setup, include=FALSE} library(knitr) library(magrittr) library(viridis) library(ggplot2) library(reshape2) library(dsm) library(animation) opts_chunk$set(cache=TRUE, echo=FALSE) load("../spermwhale-analysis/count-models.RData") load("../spermwhale-analysis/df-models.RData") load("../spermwhale-analysis/sperm-data.RData") library(dsm) ``` Univariate models are fun, but... ================================== type: section Ecology is not univariate ============================ - Many variables affect distribution - Want to model the **right** ones - Select between possible models - Smooth term selection - Response distribution - Large literature on model selection Tobler's first law of geography ================================== type:section "Everything is related to everything else, but near things are more related than distant things" Tobler (1970) Implications of Tobler's law ============================== ```{r pairrrrs, fig.width=10} plot(segs[,c("x","y","SST","EKE","NPP","Depth")], pch=19, cex=0.4) ``` blah ===== type:section title:none Covariates are not only correlated (linearly)...

...they are also "concurve" What can we do about this? =========================== - Careful inclusion of smooths - Fit models using robust criteria (REML) - Test for concurvity - Test for sensitivity Models with multiple smooths =========================== type:section Adding smooths =============== - Already know that `+` is our friend - Add everything then remove smooth terms? ```{r add-term, echo=TRUE} dsm_all_tw <- dsm(count~s(x, y, bs="ts") + s(Depth, bs="ts") + s(DistToCAS, bs="ts") + s(SST, bs="ts") + s(EKE, bs="ts") + s(NPP, bs="ts"), ddf.obj=df_hr, segment.data=segs, observation.data=obs, family=tw(), method="REML") ``` Now we have a huge model, what do we do? ========================================= type: section Smooth term selection ================ - Classically two main approaches: - Stepwise - path dependence - All possible subsets - computationally expensive Removing terms by shrinkage ============================= - Remove smooths using a penalty (shrink the EDF) - Basis `"ts"` - thin plate splines with shrinkage - "Automatic" p-values ================= - $p$-values can be used - They are **approximate** - Reported in `summary` - Generally useful though Let's employ a mixture of these techniques =========================================== type: section How do we select smooth terms? =============================== 1. Look at EDF - Terms with EDF<1 may not be useful - These can usually be removed 2. Remove non-significant terms by $p$-value - Decide on a significance level and use that as a rule Example of selection ====================== type: section Selecting smooth terms ======================== ```{r add-term-summary} summary(dsm_all_tw) ``` Shrinkage in action ==================== ```{r smooth-shrinkage, fig.width=15} plot(dsm_all_tw, pages=1, scale=0) ``` Same model with no shrinkage ============================= ```{r add-term-nots} dsm_all_tw_nots <- dsm(count~s(x, y) + s(Depth) + s(DistToCAS) + s(SST) + s(EKE) + s(NPP), ddf.obj=df_hr, segment.data=segs, observation.data=obs, family=tw(), method="REML") ``` ```{r smooth-no-shrinkage, fig.width=15} plot(dsm_all_tw_nots, pages=1, scale=0) ``` Let's remove some smooth terms & refit ======================================= ```{r add-term-rm-terms, echo=TRUE} dsm_all_tw_rm <- dsm(count~s(x, y, bs="ts") + s(Depth, bs="ts") + #s(DistToCAS, bs="ts") + #s(SST, bs="ts") + s(EKE, bs="ts"),#+ #s(NPP, bs="ts"), ddf.obj=df_hr, segment.data=segs, observation.data=obs, family=tw(), method="REML") ``` What does that look like? =========================== ```{r add-term-summary-rm} summary(dsm_all_tw_rm) ``` Removing EKE... ================= ```{r add-term-rm-terms2} dsm_all_tw_rm <- dsm(count~s(x, y, bs="ts") + s(Depth, bs="ts"),# + #s(DistToCAS, bs="ts") + #s(SST, bs="ts") + #s(EKE, bs="ts"), #+ #s(NPP, bs="ts"), ddf.obj=df_hr, segment.data=segs, observation.data=obs, family=tw(), method="REML") summary(dsm_all_tw_rm) ``` General strategy ================= For each response distribution and non-nested model structure: 1. Build a model with the smooths you want 2. Make sure that smooths are flexible enough (`k=...`) 3. Remove smooths that have been shrunk 4. Remove non-significant smooths Comparing models ================= type: section Nested vs. non-nested models ============================== - Compare `~s(x)+s(depth)` with `~s(x)` - nested models - What about `s(x) + s(y)` vs. `s(x, y)` - don't want to have all these in the model - not nested models Measures of "fit" =================== - Two listed in `summary` - Deviance explained - Adjusted $R^2$ - Deviance is a generalisation of $R^2$ - Highest likelihood value (*saturated* model) minus estimated model value - (These are usually not very high for DSMs) A quick note about REML scores ================================ - Use REML to select the smoothness - Can also use the score to do model selection - **BUT** only compare models with the same fixed effects - (i.e. same "linear terms" in the model) - $\Rightarrow$ **All terms** must be penalised (e.g. `bs="ts"`) - Alternatively set `select=TRUE` in `gam()` Selecting between response distributions ========================================== type: section Goodness of fit tests ====================== - Q-Q plots - Closer to the line == better ```{r gof-qq, fig.width=15} dsm_x_nb <- dsm(count~s(x, bs="ts"), ddf.obj=df_hr, segment.data=segs, observation.data=obs, family=nb(), method="REML") par(mfrow=c(1,2)) qq.gam(dsm_all_tw, asp=1, main="Tweedie") qq.gam(dsm_x_nb, asp=1, main="Negative binomial") ``` Going back to concurvity ========================== type:section "How much can one smooth be approximated by one or more other smooths?" Concurvity (model/smooth) ========================= ```{r concurvity, echo=TRUE} concurvity(dsm_all_tw) ``` Concurvity between smooths =========================== ```{r concurvity-all, echo=TRUE} concurvity(dsm_all_tw, full=FALSE)$estimate ``` Visualising concurvity between terms ========================== ```{r concurvity-all-vis} vis.concurvity <- function(b, type="estimate"){ cc <- concurvity(b, full=FALSE)[[type]] diag(cc) <- NA cc[lower.tri(cc)]<-NA layout(matrix(1:2, ncol=2), widths=c(5,1)) opar <- par(mar=c(5, 6, 5, 0) + 0.1) # main plot image(z=cc, x=1:ncol(cc), y=1:nrow(cc), ylab="", xlab="", axes=FALSE, asp=1, zlim=c(0,1)) axis(1, at=1:ncol(cc), labels = colnames(cc), las=2) axis(2, at=1:nrow(cc), labels = rownames(cc), las=2) # legend opar <- par(mar=c(5, 0, 4, 3) + 0.1) image(t(matrix(rep(seq(0, 1, len=100), 2), ncol=2)), x=1:3, y=1:101, zlim=c(0,1), axes=FALSE, xlab="", ylab="") axis(4, at=seq(1,101,len=5), labels = round(seq(0,1,len=5),1), las=2) par(opar) } vis.concurvity(dsm_all_tw) ``` *** - Previous matrix output visualised - Diagonal/lower triangle left out for clarity - High values (yellow) = BAD Path dependence ================== type:section Sensitivity ============ - General path dependency? - What if there are highly concurve smooths? - Is the model is sensitive to them? What can we do? ================= - Fit variations excluding smooths - Concurve terms that are excluded early on - Appendix of Winiarski et al (2014) has an example Sensitivity example ====================== - `s(Depth)` and `s(x, y)` are highly concurve (`r round(concurvity(dsm_all_tw_rm, full=FALSE)$estimate[2,3],4)`) - Refit removing `Depth` first ```{r nodepth} dsm_no_depth <- dsm(count~s(x, y, bs="ts") + #s(DistToCAS, bs="ts") + #s(SST, bs="ts") + s(EKE, bs="ts") + s(NPP, bs="ts"), ddf.obj=df_hr, segment.data=segs, observation.data=obs, family=tw(), method="REML") ``` ```{r sensitivity-anova} cat("# with depth") summary(dsm_all_tw_rm)$s.table cat("# without depth") summary(dsm_no_depth)$s.table ``` Comparison of spatial effects =============================== ```{r sensitivity-vis, fig.width=12} par(mfrow=c(1,2)) vis.gam(dsm_all_tw_rm, view=c("x","y"), plot.type="contour", main="s(x,y) + s(Depth)", asp=1, too.far=0.1, zlim=c(-6, 2)) vis.gam(dsm_no_depth, view=c("x","y"), plot.type="contour", main="s(x,y)+s(EKE)+s(NPP)", asp=1, too.far=0.1, zlim=c(-6, 2)) ``` Sensitivity example ====================== - Refit removing `x` and `y`... ```{r noxy} dsm_no_xy <- dsm(count~ #s(DistToCAS, bs="ts") + s(SST, bs="ts") + s(Depth, bs="ts"),# + #s(EKE, bs="ts") + #s(NPP, bs="ts"), ddf.obj=df_hr, segment.data=segs, observation.data=obs, family=tw(), method="REML") ``` ```{r sensitivity-anova-noxy, fig.width=12} cat("# without xy") summary(dsm_no_xy)$s.table cat("# with xy") summary(dsm_all_tw_rm)$s.table ``` Comparison of depth smooths ============================ ```{r sensitivity-depth, fig.width=12} par(mfrow=c(1,2)) plot(dsm_all_tw_rm, select=2, ylim=c(-5,5)) plot(dsm_no_xy, select=2, ylim=c(-5,5)) ``` Comparing those three models... ================================ ```{r sensitivity-table, results="asis"} sens <- data.frame(Name = c("`s(x,y) + s(Depth)`", "`s(x,y)+s(EKE)+s(NPP)`", "`s(SST)+s(Depth)`"), Rsq = c(summary(dsm_all_tw_rm)$r.sq, summary(dsm_no_depth)$r.sq, summary(dsm_no_xy)$r.sq), Deviance = c(summary(dsm_all_tw_rm)$dev.expl, summary(dsm_no_depth)$dev.expl, summary(dsm_no_xy)$dev.expl)) sens$Deviance <- sens$Deviance*100 sens[,2] <- round(sens[,2],4) sens[,3] <- round(sens[,3],2) kable(sens) ``` - "Full" model still explains most deviance - No depth model requires spatial smooth to "mop up" extra variation - We'll come back to this when we do prediction Recap ======== type: section Recap ======= - Adding smooths - Removing smooths - $p$-values - shrinkage - Comparing models - Comparing response distributions - Sensitivity