class: title-slide, inverse, center, middle #Extras<br/>and</br>advanced topics <div style="position: absolute; bottom: 15px; vertical-align: center; left: 10px"> <img src="images/02-foundation-vertical-white.png" height="200"> </div> --- class: inverse, center, middle # More complicated effects --- # `s(x,y)` doesn't always work - Only works for `bs="tp"` or `bs="ts"` - Covariates are isotropic - What if we wanted to use lat/long? - Or, more generally: interactions between covariates? --- # Enter `te()` .pull-left[ - We can built interactions using `te()` - Construct 2D basis from 2 1D bases - 💠"marginal 1Ds, join them up" ] .pull-right[ ![](dsmX-advanced-topics_files/figure-html/tensor-1.png)<!-- --> ] --- # Using `te()` Just like `s()`: ```r dsm_te <- dsm(count ~ te(Depth, SST), ddf.obj=df_hr, observation.data=obs, segment.data=segs, family=tw()) ``` --- # `summary` ``` ## ## Family: Tweedie(p=1.282) ## Link function: log ## ## Formula: ## count ~ te(Depth, SST) + offset(off.set) ## ## Parametric coefficients: ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) -20.3862 0.2831 -72.02 <2e-16 *** ## --- ## Signif. codes: ## 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Approximate significance of smooth terms: ## edf Ref.df F p-value ## te(Depth,SST) 11.79 14.03 7.104 <2e-16 *** ## --- ## Signif. codes: ## 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## R-sq.(adj) = 0.117 Deviance explained = 36.6% ## -REML = 387.64 Scale est. = 4.5541 n = 949 ``` --- # Things to fiddle with - Setting `k=` 2 ways: - `k=5`: 5 for all covariates (total `\(5*5=25\)`) - `k=c(3,5)`: per basis, in order (total `\(3*5=15\)`) - Setting `bs=` 2 ways: - `bs="tp"`: tprs for all bases - `bs=c("tp", "tp")`: tprs per basis --- # Pulling `te()` apart: `ti()` - Can we look at the components of the `te()` - `te(x, y) = ti(x, y) + ti(x) + ti(y)` ```r dsm_ti <- dsm(count ~ ti(Depth, SST) + ti(Depth) + ti(SST), ddf.obj=df_hr, observation.data=obs, segment.data=segs, family=tw()) ``` --- # `summary` ``` ## ## Family: Tweedie(p=1.281) ## Link function: log ## ## Formula: ## count ~ ti(Depth, SST) + ti(Depth) + ti(SST) + offset(off.set) ## ## Parametric coefficients: ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) -20.4337 0.2868 -71.25 <2e-16 *** ## --- ## Signif. codes: ## 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Approximate significance of smooth terms: ## edf Ref.df F p-value ## ti(Depth,SST) 2.295 2.794 2.068 0.124 ## ti(Depth) 3.477 3.817 16.905 < 2e-16 *** ## ti(SST) 3.175 3.505 8.492 4.08e-06 *** ## --- ## Signif. codes: ## 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## R-sq.(adj) = 0.114 Deviance explained = 36% ## -REML = 387.37 Scale est. = 4.5448 n = 949 ``` --- # Space x time - We had a 2d spatial model, add time? - `te(x, y, year)` ? - `d=` groups covariates - `te(x, y, year, d=c(2, 1))` gives `x, y` smooth and `year` smooth tensor - (Assuming default `k=` and `bs=` for bases above) --- # Fiddling - Often fewer temporal replicates - Fewer years than unique locations - `k=` smaller for temporal covariate? - Use cubic spline basis for time? - simpler basis, even knot placement - When using `ti()` arguments (`k`, `bs`) need to match up between terms - if `k=3` for `Depth` in one term it needs to be that in all terms --- class: inverse, center, middle # Other effects --- # Random effects - "Simple" random slope/random intercept models - `s(..., bs="re")` - **think** about what these models mean --- # Factor-smooth interactions - What if we only have a few "years"? - What if we don't think the "years" are smooth? - (Before/after?) - Terms like `s(Depth, by=year)` change the smooth by year - also `s(Depth, year, bs="fs")` (lots of ways to specify) - see [Pedersen et al. (2019)](https://peerj.com/articles/6876/) for more on these models --- class: inverse, center, middle # Availability --- # Availability - Is an animal *available* to be detected - e.g., diving marine mammals - Primitive way to do this in `dsm` - `availability=` for each segment (only for `count` models) - Active research area! --- class: inverse, center, middle # g(0), MRDS etc --- # Mark-recapture distance sampling - Will be able to include these models in next `dsm` release - Only independent observer ("`io`") and trial ("`trial`") modes supported - [Example here](https://github.com/densitymodelling/nefsc_fin_mrds_dsm) --- class: inverse, center, middle # Combining multiple surveys --- # Combining multiple surveys - What about combining aerial/shipboard data? - Different detection functions - Again, next `dsm` release allows this - [Fitting complicated models](https://github.com/densitymodelling/nefsc_fin_mrds_dsm) example --- class: inverse, center, middle # Finally... --- # Recent developments - New `dsm` out in the next few weeks! - [Fitting DSMs in JAGS/Nimble](https://github.com/densitymodelling/nimble_scrubjay) - [DenMod project](https://synergy.st-andrews.ac.uk/denmod/about/) has produced lots of methodology - Society for Marine Mammalogy meeting December --- class: inverse, center, middle # Extra bits --- class: inverse, center, middle # Deviance explained, explained --- # Deviance explained, explained - Avoid `\(R^2\)` (see [these notes](https://www.stat.cmu.edu/~cshalizi/mreg/15/lectures/10/lecture-10.pdf) for more info) - But what about deviance explained? - First, what is it? $$ D = -2(\mathcal{l}_s - \mathcal{l}) $$ where `\(\mathcal{L}_s\)` is the *saturated* log likelihood and `\(\mathcal{L}\)` is the likelihood of our model. - Saturated means the "best" model we can get, one parameter per data point. - So meaning is it's relative to the best we can do *for this model* --- # Deviance explained, explained - `mgcv` reports "Deviance explained" as a percentage $$ D_{\%} = 100 (\mathcal{l}_s - \mathcal{l})/ \mathcal{l}_s $$ - Problem: for different models (with different numbers of parameters) `\(\mathcal{l}_s\)` is different - So are we making fair comparisons? - AIC is simpler and easier to think about! [More info on deviance for GAMs](https://stats.stackexchange.com/a/191235) --- # More difficulties with explanatory power - Low (<60%) deviance is common. But why? - Sampling a temporally variable system - Revisiting the same place multiple times, we might get zero counts twice and then one large count. - What should the model make of this? - Without explicit temporal model, it tries to average - So prediction will be a "medium" count, bad prediction for the zeros and the large counts - No one is happy! - See observed vs. expected diagnostics etc --- class: inverse, center, middle # That's all folks!