Distance sampling: Advanced topics

David L Miller

Recap

Line transects - general idea

Calculate average detection probability
- using detection function (\( g(x) \))
\( \hat{p} = \int_0^w \frac{1}{w} g(x; \hat{\theta}) dx \)
\( \frac{1}{w} \) tells us about assumed density wrt line
- uniform from the line (out to \( w \))

plot of chunk pi-y

Line transects - distances

Model drop-off using a detection function
Use extra information estimate \( \hat{N} \)
How should we adjust \( n \)? (inflate by \( n/\hat{p}) \))

Fitting detection functions

Using the package Distance
Need to have data setup a certain way
- At least columns called object, distance

library(Distance)
df_hn <- ds(distdata, truncation=6000, adjustment = NULL)

Model summary

summary(df_hn)


Summary for distance analysis 
Number of observations :  132 
Distance range         :  0  -  6000 

Model : Half-normal key function 
AIC   : 2252.06 

Detection function parameters
Scale Coefficients:  
            estimate         se
(Intercept) 7.900732 0.07884776

                       Estimate          SE         CV
Average p             0.5490484  0.03662569 0.06670757
N in covered region 240.4159539 21.32287580 0.08869160

Plotting models

plot of chunk unnamed-chunk-2

plot(df_hn)

New stuff

Overview

Here we'll look at:

Model checking and selection
What else affects detection?
Estimating abundance and uncertainty
More R!

Why check models?

AIC best model can still be a terrible model
AIC only measures relative fit
Don't know if the model gives “sensible” answers

What to check?

Convergence
- Fitting ended, but our model is not good
Monotonicity
- Our model is “lumpy”
“Goodness of fit”
- Our model sucks statistically
(Other sampling assumptions are also important!)

Convergence

Distance will warn you about this:

** Warning: Problems with fitting model. Did not converge**
Error in detfct.fit.opt(ddfobj, optim.options, bounds, misc.options) :
  No convergence.

This can be complicated, see ?"mrds-opt" for info.

Monotonicity

Only a problem with adjustments
check.mono can help

check.mono(df_hr$ddf)

[1] TRUE

Monotonicity (when it goes wrong)

plot of chunk checkmonobad

Goodness of fit

plot of chunk unnamed-chunk-3

ddf.gof(df_hn$ddf)

Check fitted distribution of distances matches empirical
# distances below distance vs. # observations below given cumulative probability

Goodness of fit

As well as quantile-quantile plot, tests
Absolute measure of fit (vs. AIC)
Kolmogorov-Smirnov: largest distance on Q-Q plot
Cramer-von Mises: tests sum of distances

Goodness of fit

plot of chunk qq-expl

blue: Kolmogorov-Smirnov
red: Cramer-von Mises

Detection function model selection

Fit models
Look at summary and plot (fitting issues?)
Look at goodness of fit results, ddf.gof
AIC to select between models
- Parsimonous: “robust” and “efficient” models

Example: fitting detection functions

df_hn <- ds(distdata, truncation=6000, adjustment = NULL)

df_hn_cos <- ds(distdata, truncation=6000, adjustment = "cos")

df_hr <- ds(distdata, truncation=6000, key="hr", adjustment = NULL)

df_hr_cos <- ds(distdata, key="hr",  truncation=6000, adjustment = "cos")

Plotting those models

plot of chunk df-plots

Q-Q plots

plot of chunk df-qqplots

AIC

df_hn$ddf$criterion

[1] 2252.06

df_hn_cos$ddf$criterion

[1] 2247.69

## same model!
df_hr$ddf$criterion

[1] 2247.594

df_hr_cos$ddf$criterion

[1] 2247.594

Selection

Not much between these models!
You'll get to investigate these and more in the lab

What else affects detectability?

Covariates

Observer characteristics
- observer name
- platform
Animal characteristics
- sex
- size
- group size

Weather conditions
- sea state
- glare
- fog

How do we include covariates?

Affects scale, not shape

plot of chunk dfcovs

Covariates in the scale

\[ \exp \left( \frac{-x^2}{2\sigma^2}\right) \text{ or } 1-\exp \left[ \left( \frac{-x}{\sigma}\right)^{-b}\right] \]

Decompose \( \sigma=\exp \left( \beta_0 + \beta_1 z_1 + \ldots\right) \)

What does detectability mean?

\( \hat{p} \) is now \( \hat{p_i} \) (or \( \hat{p}(\mathbf{z}_i) \))
Average probability of detection (average over distances)
Also calculate an average \( \hat{p} \) as a summary

Covariates in R

Add formula=... to our ds() call:

df_hr_ss <- ds(distdata, truncation=6000,
               key="hr", formula=~SeaState)

df_hr_ss_size <- ds(distdata, truncation=6000,
                    key="hr", formula=~SeaState+size)

Summaries of covariate models

summary(df_hr_ss)


Summary for distance analysis 
Number of observations :  132 
Distance range         :  0  -  6000 

Model : Hazard-rate key function 
AIC   : 2247.347 

Detection function parameters
Scale Coefficients:  
              estimate        se
(Intercept)  8.1019226 0.7906353
SeaState    -0.4473291 0.2797965

Shape parameters:  
              estimate        se
(Intercept) 0.07319982 0.2417426

                       Estimate          SE        CV
Average p             0.3583687  0.07308615 0.2039412
N in covered region 368.3357858 79.54571167 0.2159598

"Average p"

\[ \hat{p}(\mathbf{z}_i) = \int_0^w g(x; \boldsymbol{\hat{\theta}}, \mathbf{z}_i) dx \quad \text{for } i=1, \ldots, n \]

unique(predict(df_hr_ss$ddf)$fitted)

 [1] 0.3360342 0.3876026 0.2895189 0.2480620 0.3985064 0.4439768 0.2723358
 [8] 0.2559550 0.2808264 0.3459473 0.3263237 0.3663789 0.5684780 0.2114896
[15] 0.3560627 0.4677557 0.1795108 0.7000862

Group size

What are groups?

Functional definition (NO ecology!)
- If animals are near each other, they are in a group
This probably affects detectability
- Bigger groups \( \Rightarrow \) easier to detect
Two inferential targets
- abundance of groups
- abundance of individuals

Detection and group size

plot of chunk groupcovs

Not a huge change here
Bigger effect for animals that occur in large groups
- Seabirds
- Dolphins

Estimating abundance

As before, assume density same in sampled/unsampled area
Horvitz-Thompson estimator

\[ \hat{N} = \frac{A}{a} \sum_{i=1}^n \frac{s_i}{\hat{p_i}} \]

where \( s_i \) is group size, \( n \) is number of observations (groups)

Estimating uncertainty

Sources of uncertainty

\[ \hat{N} = \frac{A}{a} \sum_{i=1}^\color{blue}{n} \frac{s_i}{\color{red}{\hat{p_i}}} \]

Uncertainty in \( n \) is from sampling
Uncertainty in \( \hat{p} \) is from the model

Uncertainty from sampling

Usually calculate encounter rate variance
Encounter rate is \( n/L \)
(Measure of spatial variability \( \Rightarrow \) uncertainty)
“Objects per unit length of transect surveyed”
Fewster et al. (2009) is the definitive reference

Uncertainty from the model

Model uncertainty from estimating parameters
Maximum likelihood theory gives uncertainty in model pars

Putting those parts together

Obtain overall CV by adding squared CVs:

\[ \text{CV}^2\left( \hat{D} \right) \approx \text{CV}^2\left( \frac{n}{L} \right) + \text{CV}^2\left( \hat{p}\right) \]

(Running through this quickly, see bibliography for more details)

(One other thing...)

Assume that group size is recorded correctly
This is almost never true
There are ways to deal with this
See bibliography for more details

Variance and abundance in R...

Data required

Need three tables
- region: whole area
- sample: the samples (transects)
- observation: relate samples to observations

Schematic

plot of chunk plottables

region
sample
observations

Region table

head(region.table)

  Region.Label      Area
1    StudyArea 5.285e+11

Sample table

head(sample.table)

     Sample.Label    Effort Region.Label
1 en0439520040624 144044.67    StudyArea
2 en0439520040625 167646.84    StudyArea
3 en0439520040626  59997.33    StudyArea
4 en0439520040627  33821.89    StudyArea
5 en0439520040628 147414.92    StudyArea
6 en0439520040629 101107.83    StudyArea

Observation table

head(obs.table)

  object    Sample.Label Region.Label
1      1 en0439520040628    StudyArea
2      2 en0439520040628    StudyArea
3      3 en0439520040628    StudyArea
4      4 en0439520040628    StudyArea
5      5 en0439520040629    StudyArea
6      6 en0439520040629    StudyArea

Abundance and variance

This generates a lot of output (here is a snippit):

dht(df_hr$ddf, region.table, sample.table, obs.table)

Summary for individuals

Summary statistics:
     Region      Area  CoveredArea  Effort     n           ER        se.ER     cv.ER mean.size
1 StudyArea 5.285e+11 113981689066 9498474 238.7 2.513035e-05 5.667492e-06 0.2255238  1.808333
    se.mean
1 0.1020928

Abundance:
  Label Estimate       se        cv      lcl      ucl       df
1 Total 3053.558 943.7425 0.3090632 1682.187 5542.912 170.9157

More investigation in the practical exercises…

From that summary...

Individuals observed: \( n = 238.7 \)
Covered area: \( a = 113,981,689,066m^2 \)
Study area: \( A = 5.285\times 10^{11}m^2 \)
Detectability: \( \hat{p}=0.3625 \)

\[ \hat{N}= \frac{n}{\hat{p}} \frac{A}{a} = 3053.558 \]

Recap

Summary

How to check detection function models
Covariates can affect detectability
Group size
Sources of uncertainty
Estimation of abundance and variance