```{r setup, include=FALSE}
# setup
library(knitr)
library(magrittr)
library(viridis)
library(reshape2)
library(animation)
opts_chunk$set(cache=TRUE, echo=FALSE, warning=FALSE, error=FALSE,
message=FALSE, fig.height=8, fig.width=10)
# some useful libraries
library(RColorBrewer)
library(ggplot2)
library(cowplot)
theme_set(theme_cowplot(20))
```
class: title-slide, inverse, center, middle
# Practical advice
---
class: inverse, center, middle
# Real survey data is messy
---
# Distance sampling in the Real World
- We've talked a lot about models
- We've also talked about assumptions
- Our example is relatively well-behaved
- What can we do about all the nasty real-world stuff?
---
# Some days...
![2 fire emoji, a computer emoji, 2 fire emoji](images/firecomputer.png)
---
# Aims
- Here we want to cover common questions
- Not definitive answers
- Some guidance on where to look for answers
---
class: inverse, center, middle
# What should my sample size be?
---
# What do we mean by "sample size"?
- Number of animals (or groups) recorded
  - *detection function*
- Number of segments
  - *spatial model*
- Number of segments with observations
  - *spatial model*
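---
# Counting sample sizes (sketch)
All three counts can be read straight off the data. A minimal sketch, assuming hypothetical toy tables laid out like `dsm` input: one row per segment in `segs`, one row per detected animal (group) in `obs`, linked by `Sample.Label`.

```r
# Hypothetical toy data: 6 segments, 5 detections on 3 of them
segs <- data.frame(Sample.Label = paste0("seg", 1:6),
                   Effort = rep(10, 6))
obs <- data.frame(object = 1:5,
                  Sample.Label = c("seg1", "seg1", "seg3", "seg4", "seg4"))

# Detection function: number of detected animals (groups)
n_detections <- nrow(obs)
# Spatial model: total number of segments
n_segments <- nrow(segs)
# Spatial model: segments with at least one observation
n_nonzero <- sum(segs$Sample.Label %in% obs$Sample.Label)

c(detections = n_detections, segments = n_segments, nonzero = n_nonzero)
```

Which count matters depends on the model: `n_detections` for the detection function, the other two for the spatial model.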
---
class: inverse, center, middle
# Re-frame
---
# How would we know when we have enough samples?
- We don't
- Heavily context-dependent
- Go back to assumptions
---
# "How many data?"
```{r df-obs, echo=FALSE, fig.width=15, fig.height=9}
library(Distance)
set.seed(321)
# n=30 half-normal distances
dist1 <- abs(rnorm(30, sd=0.2))
mod1 <- ds(data.frame(distance=dist1, object=seq_along(dist1)), adjustment=NULL)
# n=1000: a spike near zero plus a wider half-normal
dist2 <- c(abs(rnorm(250, sd=0.02)),
           abs(rnorm(750, sd=0.5)))
dat2 <- data.frame(distance=dist2, object=seq_along(dist2))
mod2_full <- ds(dat2, adjustment=NULL)
mod2_05 <- ds(dat2, adjustment=NULL, truncation=0.5)
mod2_004 <- ds(dat2, adjustment=NULL, truncation=0.04)
par(mfrow=c(2,3))
plot(mod1, showpoints=FALSE, pl.den=0)
title(main="n=30", cex.main=5)
# two empty panels to fill out the top row
plot.new()
plot.new()
plot(mod2_full, showpoints=FALSE, pl.den=0)
title(main="n=1000", cex.main=5)
# n after truncation is the number of distances within the truncation distance
plot(mod2_05, showpoints=FALSE, pl.den=0)
title(main=paste0("n=", sum(dist2 <= 0.5)), cex.main=5)
plot(mod2_004, showpoints=FALSE, pl.den=0)
title(main=paste0("n=", sum(dist2 <= 0.04)), cex.main=5)
```
---
# Pilot studies and "you get what you pay for"
- Designing surveys is hard
- Designing surveys is essential
- Better to fail for one season than for 5 or 10 years
- Get information early, get it cheap
- Inform design from a pilot study
---
# Avoiding rules of thumb
- Think about assumptions
  - Detection function
  - Spatial model
- Think about design
  - Spatial coverage
  - Covariate coverage
---
# Spatial coverage (IWC POWER)
---
# Covariate coverage
```{r coverage, fig.width=12, echo=FALSE}
library(dsm)
load("../practicals/spermwhale.RData")
df_hr <- ds(dist, truncation=6000, key="hr")
# fit a quick model from previous exercises
dsm_all_tw_rm <- dsm(count~s(x, y, bs="ts") +
                       s(Depth, bs="ts"),
                     ddf.obj=df_hr,
                     segment.data=segs, observation.data=obs,
                     family=tw(), method="REML")
# refit a depth-only model using only segments shallower than `bottom`
# or deeper than `top`, leaving a gap in the sampled depth range
exclude_fn <- function(top, bottom){
  segs2 <- segs[(segs$Depth > 0 & segs$Depth < bottom) | segs$Depth > top, ]
  dsm_a <- dsm(count~s(Depth, bs="ts"),
               ddf.obj=df_hr,
               segment.data=segs2, observation.data=obs,
               family=tw(), method="REML")
  plot(dsm_a, scale=0, main=paste0(">", top, " and <", bottom), xlim=c(0,5200))
}
par(mfrow=c(1,3), cex.main=2, cex.lab=2, cex.axis=2,
    lwd=2, mar=c(5,6,4,2) + 0.1)
# all depths, then gaps at 1000-3000m and 3000-5000m
exclude_fn(0, Inf)
exclude_fn(3000, 1000)
exclude_fn(5000, 3000)
```
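---
# Checking covariate coverage (sketch)
The previous slide shows the estimated depth effect changing when part of the depth range is missing from the survey. A cheap check before modelling is whether the covariate values we want to predict at were actually sampled. A minimal base R sketch, with simulated (hypothetical) depths standing in for the segment and prediction-grid data:

```r
set.seed(1)
# Hypothetical depths (m): surveyed segments vs. prediction grid cells
seg_depth  <- runif(200, min = 100, max = 3000)
pred_depth <- runif(1000, min = 0, max = 5000)

# Range of the covariate that the survey actually sampled
sampled <- range(seg_depth)

# Grid cells outside that range mean extrapolating the depth smooth
outside <- pred_depth < sampled[1] | pred_depth > sampled[2]
mean(outside)

# Compare the distributions, not just the ranges
rbind(survey = quantile(seg_depth), prediction = quantile(pred_depth))
```

The range is only a first check: gaps *inside* the range (as in the middle and right panels) need a look at the full distributions, e.g. overlaid histograms.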
---
# Sometimes things are complicated
- Weather has a big effect on detectability
- Need to record it during the survey
- Disambiguate between distribution and detectability
- Potential confounding can be BAD
![weather or density?](images/weather_or_density.png)
---
# Visibility during POWER 2014
Thanks to Hiroto Murase and co. for this data!
---
# Covariates can make a big difference!
---
# Disappointment
- Sometimes you don't have enough data
- Or enough coverage
- Or the right covariates
Sometimes, you can't build a spatial model
---
class: inverse, center, middle
[@kitabet](http://twitter.com/kitabet)
---
class: inverse, center, middle
# "Which of options X, Y, Z is correct?"
---
# Alternatives problem
- When faced with options, try them.
- **Where** does the sensitivity lie?
- What's **really** going on?
- What is your **objective**?
![](images/nvsmap.png)
---
class: inverse, center, middle
# "How big should our segments be?"
---
# Segment size
- If you think it's an issue, test it
- Resolution of covariates is also important
- Maybe species-/domain-dependent?
- (Solutions on the horizon to avoid this)
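---
# Testing segment size (sketch)
One way to test whether segment size matters: build segments at a fine resolution once, merge them into coarser segments, refit the same model to each version and compare. A minimal base R sketch of the merging step; the transect, segment lengths and counts are all hypothetical:

```r
# Hypothetical fine-scale segments: 12 x 2 km along one transect
fine <- data.frame(Transect.Label = "T1",
                   seg_index = 1:12,
                   Effort = 2,   # km
                   count = c(0, 1, 0, 0, 2, 0, 0, 0, 1, 0, 3, 0))

# Merge every k consecutive fine segments (within a transect) into one
coarsen <- function(d, k) {
  d$coarse_id <- (d$seg_index - 1) %/% k + 1
  aggregate(cbind(Effort, count) ~ Transect.Label + coarse_id,
            data = d, FUN = sum)
}

seg4km <- coarsen(fine, 2)   # 6 segments of 4 km
seg8km <- coarsen(fine, 4)   # 3 segments of 8 km
seg8km
```

Covariates need summarising over the merged segments too (e.g. mean depth), which is where covariate resolution comes in.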
---
class: inverse, center, middle
# "Is our model right?"
---
# Model validation
- Some variety of cross-validation
  - Temporal replication
    - Leave out 1 year, fit to others, predict, assess
  - Spatial "pseudo-jackknife"
    - Leave out every $n^{th}$ segment, refit, ...
    - (Maybe leave out 2, 3, etc.)
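---
# Cross-validation (sketch)
Both schemes on the previous slide are the same loop with different fold labels: fit to the retained segments, predict the held-out ones, score. A minimal `mgcv` sketch on simulated (hypothetical) data; in practice the model would be the `dsm` fit, and squared error is only one possible score for counts:

```r
library(mgcv)
set.seed(42)

# Hypothetical segment data: 3 survey years, one covariate, Poisson counts
n <- 300
dat <- data.frame(year = rep(2010:2012, each = n / 3),
                  depth = runif(n, 0, 3000))
dat$count <- rpois(n, exp(-1 + sin(dat$depth / 1000)))

# For each fold: fit without it, predict it, return mean squared error
cv_score <- function(fold, data) {
  folds <- sort(unique(fold))
  scores <- sapply(folds, function(f) {
    train <- data[fold != f, ]
    test  <- data[fold == f, ]
    m <- gam(count ~ s(depth), family = poisson(), data = train,
             method = "REML")
    pred <- predict(m, newdata = test, type = "response")
    mean((test$count - pred)^2)
  })
  names(scores) <- folds
  scores
}

# Temporal replication: leave out one year at a time
cv_year <- cv_score(dat$year, dat)
# Spatial "pseudo-jackknife": fold f holds out every 5th segment
cv_jack <- cv_score(seq_len(n) %% 5, dat)
```

The jackknife folds only spread the held-out segments through space if the rows are ordered along the transects (an assumption of this sketch).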
---
class: inverse, center, middle
# Modelling philosophy
---
# Which covariates should we include?
- Dynamic vs static variables
- Spatial terms? Habitat models?
---
class: inverse, center, middle
# Getting help
---
# Resources
- Bibliography has pointers to these topics
- Distance sampling Google Group
  - Friendly, helpful, low traffic
  - See [distancesampling.org/distancelist.html](http://distancesampling.org/distancelist.html)
---
class: inverse, center, middle
# Advanced topics
---
class: inverse, center, middle
# This is a whirlwind tour...
---
class: inverse, center, middle
# ...and some of this is experimental
---
class: inverse, center, middle
# Smoother zoo
---
# Cyclic smooths
- What if things "wrap around"? (Time, angles, ...)
  - Match value and derivatives at the ends
- Use `bs="cc"`
  - See `?smooth.construct.cc.smooth.spec`
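---
# Cyclic smooth example (sketch)
A minimal `mgcv` sketch of a cyclic smooth, assuming a hypothetical day-of-year covariate whose end (day 365) should join up with its start (day 1). Supplying exactly two knots for a `"cc"` smooth sets where the cycle begins and ends:

```r
library(mgcv)
set.seed(1)

# Hypothetical data: counts by day of year, peaking at the turn of the year
dat <- data.frame(doy = sample(1:365, 400, replace = TRUE))
dat$count <- rpois(400, exp(1 + cos(2 * pi * dat$doy / 365)))

# Cyclic cubic regression spline; the two knots are the ends of the cycle
m_cc <- gam(count ~ s(doy, bs = "cc", k = 10), family = poisson(),
            data = dat, knots = list(doy = c(0.5, 365.5)), method = "REML")

# The smooth takes the same value at both ends of the cycle
ends <- as.numeric(predict(m_cc, newdata = data.frame(doy = c(0.5, 365.5))))
ends
```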
---
# Smoothing in complex regions
.pull-left[
- Edges are important
  - Whales don't live on land
  - Bad things happen when we don't account for this
- Include boundary info in smoother
  - `?soap`
]
.pull-right[
![Example of smoothers versus the Antarctic peninsula](images/soap.png)
]
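---
# Soap film example (sketch)
`mgcv` implements soap film smoothing as the `"so"` basis, with the boundary passed through `xt`. This sketch closely follows the example in mgcv's soap film documentation, using its built-in test region (`fs.boundary()`, `fs.test()`) in place of a real coastline:

```r
library(mgcv)

# Boundary: a list of closed loops; names must match the covariate names
fsb <- list(fs.boundary())
names(fsb[[1]]) <- c("v", "w")

# Interior knots, which must lie strictly inside the boundary
knots <- data.frame(v = rep(seq(-0.5, 3, by = 0.5), 4),
                    w = rep(c(-0.6, -0.3, 0.3, 0.6), rep(8, 4)))

# Simulate noisy data over a rectangle, keep only points inside the region
set.seed(0)
n <- 600
v <- runif(n) * 5 - 1
w <- runif(n) * 2 - 1
y <- fs.test(v, w, b = 1) + rnorm(n) * 0.3
ind <- inSide(fsb, x = v, y = w)
dat <- data.frame(y = y[ind], v = v[ind], w = w[ind])

# Soap film smooth: boundary information goes in via xt
m_soap <- gam(y ~ s(v, w, k = 30, bs = "so", xt = list(bnd = fsb)),
              data = dat, knots = knots)
```

With real data the boundary would be the coastline polygon of the study area, and knot placement inside it takes some care.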
---
# Multivariate smooths
- Thin plate splines are *isotropic*
  - 1 unit in any direction is equal
  - Fine for space, not for covariates on different scales
---
# Tensor products
- $s_{x,z}(x,z) = \sum_{k_1}\sum_{k_2} \beta_{k_1 k_2} s_{x,k_1}(x) s_{z,k_2}(z)$
- As many covariates as you like! (But fitting takes longer)
- `te()` or `ti()` (instead of `s()`)
![Tensor product example](images/tensor.png)
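---
# Isotropic vs tensor product (sketch)
To see the difference, fit both kinds of smooth to covariates measured on very different scales. A minimal `mgcv` sketch, with simulated (hypothetical) depth in metres and sea surface temperature in degrees:

```r
library(mgcv)
set.seed(2)

# Hypothetical covariates on very different scales
n <- 400
dat <- data.frame(depth = runif(n, 0, 4000), sst = runif(n, 0, 20))
dat$y <- rnorm(n, sin(dat$depth / 1000) * cos(dat$sst / 5), 0.3)

# Isotropic thin plate spline: one smoothing parameter for both covariates
m_tp <- gam(y ~ s(depth, sst), data = dat, method = "REML")

# Tensor product: a marginal smooth (and penalty) per covariate
m_te <- gam(y ~ te(depth, sst), data = dat, method = "REML")

# ti(): main effects plus a pure interaction term
m_ti <- gam(y ~ s(depth) + s(sst) + ti(depth, sst), data = dat,
            method = "REML")

AIC(m_tp, m_te, m_ti)
```

`te()` estimates one smoothing parameter per margin, so each covariate gets its own wiggliness; `s(depth, sst)` has a single one and treats 1 m of depth like 1 degree of temperature.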
---
# Black bears like to sunbathe
---
# Random effects
- Normal random effects
- Exploits equivalence of random effects and splines (see `?gam.vcomp`)
- Useful when you just have a "few" random effects
- `?random.effects`
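---
# Random effect example (sketch)
A minimal `mgcv` sketch of a normal random intercept fitted as a smooth (`bs="re"`), assuming hypothetical counts from 8 survey vessels that each have their own offset:

```r
library(mgcv)
set.seed(3)

# Hypothetical data: 8 vessels x 50 segments, vessel-level offsets
n_vessel <- 8
dat <- data.frame(vessel = factor(rep(seq_len(n_vessel), each = 50)),
                  depth = runif(400, 0, 3000))
vessel_effect <- rnorm(n_vessel, 0, 0.5)
dat$count <- rpois(400, exp(0.5 + vessel_effect[dat$vessel] +
                              sin(dat$depth / 1000)))

# bs = "re": i.i.d. normal random intercepts, estimated as a penalised smooth
m_re <- gam(count ~ s(depth) + s(vessel, bs = "re"), family = poisson(),
            data = dat, method = "REML")

# Smoothing parameters re-expressed as variance components (std. devs)
gam.vcomp(m_re)
```

Each level adds a coefficient to the model, which is why this route suits a modest number of random effect levels.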
---
class: inverse, center, middle
# Making things faster
---
# Parallel processing
- Some models are very big/slow
- Run on multiple cores
- Use `engine="bam"`!
- Some constraints on what you can do
- Wood, Goude and Shaw (2015)
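---
# Fitting big models with bam (sketch)
As we understand it, `engine="bam"` in `dsm` switches the fit to `mgcv`'s `bam()`. This sketch calls `bam()` directly on simulated (hypothetical) data; `discrete=TRUE` (which requires `method="fREML"`) and `nthreads` are two of its speed-ups:

```r
library(mgcv)
set.seed(4)

# Hypothetical "big" segment data set
n <- 20000
dat <- data.frame(x = runif(n), y = runif(n), depth = runif(n, 0, 3000))
dat$count <- rpois(n, exp(-1 + sin(2 * pi * dat$x) * cos(2 * pi * dat$y) +
                            dat$depth / 3000))

# Large-data fit: discretised covariates, work spread over 2 threads
m_big <- bam(count ~ s(x, y, k = 50) + s(depth), family = poisson(),
             data = dat, method = "fREML", discrete = TRUE, nthreads = 2)
summary(m_big)
```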
---
# Summary
- Lots of complicated problems
- Lots of potential solutions
  - (see also "other approaches" mini-lecture)
- Need to get simple things right first
- **Trade assumptions for data**