Estimating variance
========================================
author: David L Miller
css: custom.css
transition: none
Now we can make predictions
=============================
type:section
Now we are dangerous.
```{r setup, include=FALSE}
library(knitr)
library(viridis)
library(ggplot2)
library(dsm)
opts_chunk$set(cache=TRUE, echo=FALSE)
load("../spermwhale-analysis/count-models.RData")
load("../spermwhale-analysis/sperm-data.RData")
load("../spermwhale-analysis/df-models.RData")
load("../spermwhale-analysis/predgrid.RData")
# fit a quick model from previous exercises
dsm_all_tw_rm <- dsm(count~s(x, y, bs="ts") +
s(Depth, bs="ts"),
ddf.obj=df_hr,
segment.data=segs, observation.data=obs,
family=tw(), method="REML")
```
Predictions are useless without uncertainty
============================================
- We are doing statistics
- We want to know about **uncertainty**
- This is the most useful part of the analysis
What do we want the uncertainty for?
======================================
- Variance of total abundance
- Map of uncertainty (coefficient of variation)
Where does uncertainty come from?
==================================
type:section
Sources of uncertainty
=======================
- Detection function
- GAM parameters
Let's think about smooths first
================================
type:section
Uncertainty in smooths
========================
- Dashed lines indicate $\pm 2$ standard errors
- How do we translate this to uncertainty in $\hat{N}$?
```{r twmod-unc-smooth, fig.width=15}
plot(dsm_all_tw_rm, pages=1, scale=0)
```
Back to bases
===============
- Before we expressed smooths as:
- $s(x) = \sum_{k=1}^K \beta_k b_k(x)$
- Theory tells us that:
- $\boldsymbol{\beta} \sim N(\boldsymbol{\hat{\beta}}, \mathbf{V}_\boldsymbol{\beta})$
- where $\mathbf{V}_\boldsymbol{\beta}$ is a bit complicated
- We can use this to put uncertainty on $\hat{N}$
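One way to use this result is posterior simulation: draw coefficients from $N(\boldsymbol{\hat{\beta}}, \mathbf{V}_\boldsymbol{\beta})$, push each draw through the model and look at the spread of the resulting abundance estimates. A minimal sketch with `mgcv` on simulated data (standing in for the sperm whale model):

```{r betasim-sketch, echo=TRUE}
# sketch: simulate from the approximate posterior of beta
library(mgcv)
set.seed(2)
dat <- gamSim(1, n=200, dist="poisson", scale=0.25, verbose=FALSE)
b <- gam(y ~ s(x2), data=dat, family=poisson, method="REML")
betas <- rmvn(1000, coef(b), vcov(b))   # 1000 draws of beta
Xp <- predict(b, type="lpmatrix")       # maps beta to link-scale predictions
Nhat <- rowSums(exp(betas %*% t(Xp)))   # "abundance" per draw
sd(Nhat)/mean(Nhat)                     # a CV for Nhat
```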
Predictions to prediction variance (roughly)
================================================
- "map" data onto fitted values $\mathbf{X}\boldsymbol{\beta}$
- "map" prediction matrix to predictions $\mathbf{X}_p \boldsymbol{\beta}$
- Here $\mathbf{X}_p$ needs to take the smooths into account
- pre-/post-multiply by $\mathbf{X}_p$ to "transform the variance"
- $\Rightarrow \mathbf{X}_p \mathbf{V}_\boldsymbol{\beta} \mathbf{X}_p^\text{T}$
- this is on the link scale; another transformation is needed for the response scale
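The pre-/post-multiplication can be done by hand using `mgcv`: `vcov()` gives $\mathbf{V}_\boldsymbol{\beta}$ and `predict(..., type="lpmatrix")` gives $\mathbf{X}_p$. A sketch on simulated data (not the sperm whale model):

```{r lpmat-sketch, echo=TRUE}
library(mgcv)
set.seed(1)
dat <- gamSim(1, n=200, verbose=FALSE)
b <- gam(y ~ s(x2), data=dat, method="REML")
Xp <- predict(b, newdata=dat[1:5, ], type="lpmatrix")  # prediction matrix
Vb <- vcov(b)                                          # V_beta
Vp <- Xp %*% Vb %*% t(Xp)   # link-scale prediction variance-covariance
sqrt(diag(Vp))              # per-prediction standard errors
```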
Adding in detection functions
===============================
type:section
GAM + detection function uncertainty
=====================================
(Getting a little fast-and-loose with the mathematics)
From previous lectures we know:
$$
\text{CV}^2\left( \hat{N} \right) \approx \text{CV}^2\left( \text{GAM} \right) +\\
\text{CV}^2\left( \text{detection function}\right)
$$
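Plugging in some made-up CVs shows the arithmetic, and why the larger component dominates:

```{r cv-combine-sketch, echo=TRUE}
# made-up component CVs, just to illustrate
cv_gam <- 0.2
cv_df  <- 0.1
sqrt(cv_gam^2 + cv_df^2)  # approximate CV of Nhat, ~0.224
```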
Not that simple...
====================
- Assumes detection function and GAM are **independent**
- Maybe this is okay?
A better way (for some models)
===============================
- Include the detectability as a "fixed" term in GAM
- Mean effect is zero
- Variance effect included
- Uncertainty "propagated" through the model
- Details in bibliography (too much detail to go into here)
That seemed complicated...
===========================
type:section
R to the rescue
=================
type:section
In R...
=========
- Functions in `dsm` to do this
- `dsm.var.gam`
- assumes spatial model and detection function are independent
- `dsm.var.prop`
- propagates uncertainty from detection function to spatial model
- only works for `count` models (more or less)
Variance of abundance
======================
Using `dsm.var.gam`
```{r, var-tot-abund-gam, echo=TRUE}
dsm_tw_var_ind <- dsm.var.gam(dsm_all_tw_rm, predgrid, off.set=predgrid$off.set)
summary(dsm_tw_var_ind)
```
Variance of abundance
======================
Using `dsm.var.prop`
```{r, var-tot-abund, echo=TRUE}
dsm_tw_var <- dsm.var.prop(dsm_all_tw_rm, predgrid, off.set=predgrid$off.set)
summary(dsm_tw_var)
```
Plotting - data processing
============================
- Calculate uncertainty per-cell
- `dsm.var.*` treats `predgrid` as one "region"
- Need to split data into cells (using `split()`)
- (Could be arbitrary sets of cells, see exercises)
- Need `width` and `height` of cells for plotting
Plotting (code)
=================
```{r var-map-split, echo=TRUE}
predgrid$width <- predgrid$height <- 10*1000
predgrid_split <- split(predgrid, 1:nrow(predgrid))
head(predgrid_split,3)
dsm_tw_var_map <- dsm.var.prop(dsm_all_tw_rm, predgrid_split,
off.set=predgrid$off.set)
```
CV plot
========
```{r plotit}
p <- plot(dsm_tw_var_map, observations=FALSE, plot=FALSE) +
coord_equal() +
scale_fill_viridis()
print(p)
```
***
```
p <- plot(dsm_tw_var_map, observations=FALSE, plot=FALSE) +
coord_equal() +
scale_fill_viridis()
print(p)
```
Interpreting CV plots
======================
- Plotting coefficient of variation
- Standardise the standard error by the estimate
- $\text{CV} = \text{se}(\hat{N})/\hat{N}$ (per cell)
- Can be useful to overplot survey effort
Effort overplotted
=====================
```{r loadtracks, results="hide"}
library(rgdal)
tracks <- readOGR("../spermwhale-analysis/Analysis.gdb", "Segments")
tracks <- fortify(tracks)
```
```{r plottracks, cache=FALSE}
library(ggplot2)
p <- plot(dsm_tw_var_map, observations=FALSE, plot=FALSE) +
coord_equal() +
scale_fill_viridis() +
geom_path(aes(x=long,y=lat, group=group), colour="white", data=tracks)
print(p)
```
Big CVs
========
- Here CVs are "well behaved"
- Not always the case (huge CVs possible)
- These can be a pain to plot
- Use `cut()` in R to make categorical variable
- e.g. `c(seq(0,1, len=100), 2:4, Inf)` or somesuch
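As a concrete sketch of the binning (with made-up per-cell CVs):

```{r cv-cut-sketch, echo=TRUE}
cv <- c(0.08, 0.4, 1.3, 3.2, 25)  # made-up per-cell CVs
cut(cv, breaks=c(seq(0, 1, len=5), 2, 4, Inf))
```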
Recap
======
- How does uncertainty arise in a DSM?
- Estimate variance of abundance estimate
- Map coefficient of variation
Let's try that!
=================
type:section