Lecture 2: Generalized Additive Models

class: title-slide, inverse, center, middle

# Lecture 2 : Generalized Additive Models

---

# Overview

- The count model, from scratch
- What is a GAM?
- What is smoothing?
- Fitting GAMs using `dsm`

---
# Building a model, from scratch

- Know count `$n_j$` in segment `$j$`
--

- Want :
$$
n_j = f( [\text{environmental covariates}]_j)
$$

--
- How to build `$f$`?

--
- Additive model of smooths `$s$`:
$$
n_j = \color{green}{\exp} \left[ \color{grey}{ \beta_0 + s(\text{y}_j) + s(\text{Depth}_j)} \right]
$$
- `$\color{grey}{\text{model terms}}$`
- `$\color{green}{\exp}$` is the *link function*

---
# Building a model, from scratch

- What about area and detectability?
$$
n_j = \color{red}{A_j}\color{blue}{\hat{p}_j}\color{green}{\exp} \left[ \color{grey}{ \beta_0 + s(\text{y}_j) + s(\text{Depth}_j)} \right]
$$

- `$\color{red}{A_j}$` area of segment - "offset"
- `$\color{blue}{\hat{p}_j}$` probability of detection in segment

---
# Building a model, from scratch

- It's a statistical model so:
$$
n_j = \color{red}{A_j}\color{blue}{\hat{p}_j}\color{green}{\exp} \left[ \color{grey}{ \beta_0 + s(\text{y}_j) + s(\text{Depth}_j)}\right] + \epsilon_j
$$

- `$n_j$` has a distribution (count)
- `$\epsilon_j$` are *residuals* (differences between model and observations)

---
class: inverse, center, middle
# That's a Generalized Additive Model!

---
class: inverse, center, middle
# Now let's look at each bit...

---
# Response

$$
\color{red}{n_j} = A_j\hat{p}_j \exp\left[ \beta_0 + s(\text{y}_j) + s(\text{Depth}_j) \right] + \epsilon_j
$$
<br/>
where `$\color{red}{n_j\sim \text{count distribution}}$`

---
# Count distributions

.pull-left[
![](dsm2-gams_files/figure-html/countshist-1.svg)
]

.pull-right[
- Response is a count
- Often, it's mostly zero
- mean `$\neq$` variance
  - (Poisson isn't good at this)
]

---
# Tweedie distribution

.pull-left[
![](dsm2-gams_files/figure-html/tweedie-1.svg)
(NB there is a point mass at zero not plotted)
]
.pull-right[
-  `$\text{Var}\left(\text{count}\right) = \phi\mathbb{E}(\text{count})^q$`
- Poisson is `$q=1$` 
- We estimate:
  - `$q$` (`p` in R), "power" parameter
  - `$\phi$` (`Scale est.` in R), scale parameter
]

---
# Negative binomial distribution

.pull-left[
![](dsm2-gams_files/figure-html/negbin-1.svg)
]
.pull-right[
- `$\text{Var}\left(\text{count}\right) =$` `$\mathbb{E}(\text{count}) + \kappa \mathbb{E}(\text{count})^2$`
- Estimate `$\kappa$`
- No scale parameter (`Scale est.`=1 always)
- (Poisson: `$\text{Var}\left(\text{count}\right) =\mathbb{E}(\text{count})$`) 
]

---
# Smooths

$$
n_j = A_j\hat{p}_j \exp\left[ \beta_0 + \color{red}{s(\text{y}_j) + s(\text{Depth}_j}) \right] + \epsilon_j
$$

---
# What about these "s" things?

.pull-left[
![](dsm2-gams_files/figure-html/wiggles-1.svg)
]
.pull-right[
- *Think* `$s$`=**smooth**
- Want a line that is "close" to all the data
- Balance between interpolation and "fit"
]

---
class: inverse, middle, center
# What is smoothing?

---
# Smoothing

- We think underlying phenomenon is *smooth*
  - "Abundance is a smooth function of depth"
- 1, 2 or more dimensions

---
# Estimating smooths

.pull-left[
- We set:
  - "type": *bases* (made up of *basis functions*)
  - "maximum wigglyness": *basis size* (sometimes: dimension/complexity)
- Automatically estimate:
  - "how wiggly it needs to be": *smoothing parameter(s)*
]
.pull-right[
![](dsm2-gams_files/figure-html/wiggles-plot2-1.svg)
]
---

# Splines

- Functions made of other, simpler functions
- **Basis functions** `$b_k$`, estimate `$\beta_k$` 
- `$s(x) = \sum_{k=1}^K \beta_k b_k(x)$`

![splines](images/basis-animate-1.gif)

---
# Thinking about wigglyness

- Visually:
  - Lots of wiggles `$\Rightarrow$` *not smooth*
  - Straight line `$\Rightarrow$` *very smooth*
- Smoothing parameter ( `$\lambda$` ) controls this

![](dsm2-gams_files/figure-html/wiggles-plot-1.svg)

---

# How wiggly are things?

- Measure the **effective degrees of freedom** (EDF)
- Set **basis complexity** or "size", `$k$`
- Set `$k$` "large enough"

![](dsm2-gams_files/figure-html/wiggles-plot-EDF-1.svg)

---
# Getting more out of GAMs

.pull-left[
![](images/igam.jpg)
]

.pull-right[
- I can't teach you all of GAMs in 1 week
- Good intro book
- (also a good textbook on GLMs and GLMMs)
- Quite technical in places
- More resources on course website
- `dsm` is based on `mgcv` by Simon Wood
]

---
class: inverse, middle, center
# Fitting GAMs using dsm

---
# Translating maths into R

$$
n_j = A_j\hat{p}_j \exp\left[ \beta_0 + s(\text{y}_j) \right] + \epsilon_j
$$
<br/>
where `$\epsilon_j$` are some errors, `$\quad n_j\sim$` count distribution
<br/>
- inside the link: `formula=count ~ s(y)`
- response distribution: `family=nb()` or `family=tw()`
- detectability: `ddf.obj=df_hr`
- offset, data: `segment.data=segs, observation.data=obs`

---
# Your first DSM

```r
library(dsm)
dsm_x_tw <- dsm(count~s(x), ddf.obj=df,
                segment.data=segs, observation.data=obs,
                family=tw())
```

---
# `summary(dsm_x_tw)`

```
## 
## Family: Tweedie(p=1.326) 
## Link function: log 
## 
## Formula:
## count ~ s(x) + offset(off.set)
## 
## Parametric coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -19.8115     0.2277  -87.01   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##        edf Ref.df     F  p-value    
## s(x) 4.962  6.047 6.403 1.79e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.0283   Deviance explained = 17.9%
## -REML = 409.94  Scale est. = 6.0413    n = 949
```

---
# Plotting

.pull-left[
![](dsm2-gams_files/figure-html/plotsmooth-1.svg)
]

.pull-right[
- `plot(dsm_x_tw)`
- Dashed lines indicate +/- 2 standard errors
- Rug plot
- On the link scale
- EDF on `$y$` axis
]

---
# Adding a term

- Just use `+`

```r
dsm_xy_tw <- dsm(count ~ s(x) + s(y),
                 ddf.obj=df,
                 segment.data=segs,
                 observation.data=obs,
                 family=tw())
```

---
# `summary(dsm_xy_tw)`

```
## 
## Family: Tweedie(p=1.306) 
## Link function: log 
## 
## Formula:
## count ~ s(x) + s(y) + offset(off.set)
## 
## Parametric coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -20.0908     0.2381  -84.39   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##        edf Ref.df     F p-value    
## s(x) 4.943  6.057 3.224 0.00425 ** 
## s(y) 5.293  6.419 4.034 0.00033 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.0678   Deviance explained = 27.4%
## -REML = 399.84  Scale est. = 5.3157    n = 949
```

---
# Plotting

```r
plot(dsm_xy_tw, pages=1)
```
![](dsm2-gams_files/figure-html/plotsmooth-xy2-1.svg)
- `scale=0`: each plot on different scale
- `pages=1`: plot together

---
# Bivariate terms

- Assumed an additive structure
- No interaction
- We can specify `s(x,y)` (and `s(x,y,z,...)`)

---
# Bivariate spatial term

```r
dsm_xyb_tw <- dsm(count ~ s(x, y),
                 ddf.obj=df,
                 segment.data=segs,
                 observation.data=obs,
                 family=tw())
```

---
# `summary(dsm_xyb_tw)`

```
## 
## Family: Tweedie(p=1.29) 
## Link function: log 
## 
## Formula:
## count ~ s(x, y) + offset(off.set)
## 
## Parametric coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -20.2745     0.2477  -81.85   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##          edf Ref.df     F p-value    
## s(x,y) 16.89  21.12 4.333  <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.102   Deviance explained = 34.7%
## -REML = 394.86  Scale est. = 4.8248    n = 949
```

---
# Plotting

.pull-left[

```r
plot(dsm_xyb_tw, select=1,
     scheme=2, asp=1)
```
- On link scale
- `scheme=2` makes heatmap
- (set `too.far` to exclude points far from data)

]
.pull-right[
![](dsm2-gams_files/figure-html/twodee-1.svg)
]

---
# Comparing bivariate and additive models

![](dsm2-gams_files/figure-html/xy-x-y-1.svg)

---

class: inverse, middle, center

# Let's have a go...