Introduction to distance sampling

David L Miller

Overview

Line transects
Simple estimates of abundance
Why is detectability important?
What is a detection function?
First look at fitting models in R

How many animals are there? (500!)

plot of chunk plot

General strategy

Take a sample in some fixed areas
Find density/abundance in covered area
Multiply up to get abundance

General strategy (What did we assume?)

Take a sample in some fixed areas
- Sample is representative
Find density/abundance in covered area
- Estimator is “good”
Multiply up to get abundance
- Sample is representative

Plot sampling

plot of chunk plotsampling

Surveyed 10 quadrats (each \( 0.1^2 \) units)
- Total covered area \( a=10 * 0.1^2 = \) 0.1
Saw \( n= \) 59 animals
Estimated density \( \hat{D}=n/a= \) 590
Total area \( A=1 \)
Estimated abundance \( \hat{N}=\hat{D}A= \) 590
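
The plot sampling arithmetic above can be reproduced in a few lines of R. This is a minimal sketch using the numbers from the slide; the variable names are illustrative only.

```r
# Plot sampling: numbers taken from the slide above
n_quadrats   <- 10      # quadrats surveyed
quadrat_area <- 0.1^2   # area of each quadrat
n            <- 59      # animals counted
A            <- 1       # total study area

a     <- n_quadrats * quadrat_area   # covered area
D_hat <- n / a                       # estimated density
N_hat <- D_hat * A                   # estimated abundance
print(c(a = a, D_hat = D_hat, N_hat = N_hat))
```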

Strip transect

plot of chunk strip

Surveyed 4 lines (each \( 1*0.025 \) units)
- Total covered area \( a=4*1*0.025 = \) 0.1
Saw \( n= \) 57 animals
Estimated density \( \hat{D}=n/a= \) 570
Total area \( A=1 \)
Estimated abundance \( \hat{N}=\hat{D}A= \) 570

Detectability

Detectability matters!

We've assumed certain detection so far
This rarely happens in the field
Distance to the object is important
- (Other things too, more on that later)
- Detectability should decrease with increasing distance

Distance and detectability

Dolphins near and far from the bow of a ship. Credit Scott and Mary Flanders

Recording distances is more efficient

Plots: what if an animal is just outside the box?
Strips: what if an animal is just outside the strip?
Line transects: record everything (within reason), then discard later
- Decide strip width (truncation distance) later

Detection as a function of distance

plot of chunk df

Model probability of detection, given distance
Fit models for the curve
Derive a probability of detection from this model

Line transect

plot of chunk lt

Line transects - distances

plot of chunk distance-hist

Distances from the line (sampler) to animal
Now that we have recorded distances, what do they look like?
“Fold” distribution over, left/right doesn't matter
Drop-off in number of observations with increasing distance
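
The "folding" step can be sketched as follows. The signed distances here are hypothetical values invented for illustration (negative meaning left of the line); they are not data from the survey shown.

```r
# Hypothetical signed perpendicular distances: negative = left of the line
signed_dist <- c(-0.012, 0.003, 0.018, -0.007, 0.001, -0.015, 0.009)

# "Fold" over the line: the side does not matter, only the absolute distance
distance <- abs(signed_dist)

# Count observations in distance bins (the basis of the histogram)
breaks <- seq(0, 0.02, by = 0.005)
counts <- table(cut(distance, breaks = breaks, include.lowest = TRUE))
print(counts)
```

With real data, `hist(distance)` would draw the corresponding histogram.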

Distance sampling animation

Animation of line transect survey

"You should model that"

Detection function

plot of chunk df-fit

Using distance information

Detection function: \( \mathbb{P}(\text{ detection } \vert \text{ at distance } x) \)
Integrate out the conditioning \( \Rightarrow \mathbb{P}(\text{ detection }) = \hat{p} \)
“Inflate” \( n \) by dividing by \( \hat{p} \) (that is, \( n/\hat{p} \)) to estimate abundance

Integrating out distance

Distance sampling estimate

Surveyed 5 lines (each \( 1*2*0.02 \) units: length 1, half-width \( w=0.02 \) on each side)
- Total covered area \( a=5*1*2*0.02 = \) 0.2
Probability of detection \( \hat{p} = \int_0^w \frac{g(x)}{w}dx= \) 0.5981
Saw \( n= \) 60 animals
Inflate to \( n/\hat{p}= \) 100.31
Estimated density \( \hat{D}=\frac{n/\hat{p}}{a}= \) 502
Total area \( A=1 \)
Estimated abundance \( \hat{N}=\hat{D}A= \) 502
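
The estimate above can be traced step by step in R. This is a minimal sketch: `n`, `p_hat`, `a` and `A` are the values from the slide, while the half-normal form of \( g \), the half-width `w` and the value of `sigma` in the second part are assumptions chosen for illustration.

```r
# Values from the slide above
n     <- 60       # animals detected
p_hat <- 0.5981   # estimated average probability of detection
a     <- 0.2      # covered area
A     <- 1        # total study area

n_covered <- n / p_hat       # animals estimated to be in the covered area
D_hat     <- n_covered / a   # estimated density
N_hat     <- D_hat * A       # estimated abundance
print(round(N_hat))

# How a value like p_hat arises: average a detection function g over 0..w.
# Assumptions (not from the slides): half-normal g, w = 0.02, sigma = w / 2.
w     <- 0.02
sigma <- w / 2
g     <- function(x) exp(-x^2 / (2 * sigma^2))
p_example <- integrate(function(x) g(x) / w, lower = 0, upper = w)$value
print(p_example)
```

With `sigma` set to half of `w`, the integral comes out close to the 0.5981 quoted on the slide, which makes it a convenient illustrative choice.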

Summary: line transects

Efficient survey design
Relax the assumption of perfect detection
Exchange assumptions for data
More information = better inference

Assumptions

Animals are distributed independent of lines
On the line, detection is certain
Distances are recorded correctly
Animals don't move before detection

Animals are distributed independent of lines

plot of chunk lt-assumption-unif

When transects follow features
Difficult to work out detectability vs. distribution

On the line, detection is certain

plot of chunk df-g0-issue

Perception bias
Availability bias
Don't know \( y \) axis scale

Perception bias

Seal peeping out

Credit MAKY_OREL

Orca porpoising

Credit Minette Layne

Distances are recorded correctly

plot of chunk df-measurement-issue

Measurement error
Don't know \( x \) axis scale
This can be systematic

Animals don't move before detection

plot of chunk lt-assumption-movement

Animals can be attracted or repelled
Problems with distribution with respect to the line and/or measurement error

Attraction to the line

Dolphins bowriding. Credit Cork Whale Watch.

Detection functions

What are detection functions?

Model \( \mathbb{P}\left( \text{detection } \vert \text{ animal at distance } x \right) \)
(Hence the integration)
Many different forms, depending on the data
All share some characteristics

Detection function assumptions

Have a “shoulder”
- we see things nearby easily
Monotonic decreasing
- never increasing with increasing distance
“Model robust”
- lots of forms/flexible models
“Pooling robust”
- individual heterogeneity averages out
“Efficient”
- models don't need lots of parameters

Possible detection functions

There are many options
A restricted set we'll cover in this course…
- Half-normal
- Hazard-rate
- adjustments to the above
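
The two key functions named above are commonly written as in the sketch below. The function names `hn` and `hr` and the parameter values are illustrative choices, not taken from the slides.

```r
# Half-normal key: g(x) = exp(-x^2 / (2 * sigma^2))
hn <- function(x, sigma) exp(-x^2 / (2 * sigma^2))

# Hazard-rate key: g(x) = 1 - exp(-(x / sigma)^(-b))
hr <- function(x, sigma, b) 1 - exp(-(x / sigma)^(-b))

x    <- seq(0, 1, length.out = 101)
g_hn <- hn(x, sigma = 0.3)
g_hr <- hr(x, sigma = 0.3, b = 3)

# Both equal 1 on the line (x = 0) and do not increase with distance
print(round(cbind(x, g_hn, g_hr)[c(1, 26, 51, 101), ], 3))
```

`plot(x, g_hn, type = "l")` followed by `lines(x, g_hr, lty = 2)` draws the two curves for comparison; the hazard-rate curve has the wider, flatter shoulder.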

Half-normal detection functions

plot of chunk df-hn

Hazard-rate detection functions

plot of chunk df-hr

Adjustment terms

These models are flexible
What about adding more flexibility by “adjusting” them?
Options:
- Cosine series
- Polynomials
- Hermite polynomials
Add extra flexibility
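
A cosine adjustment can be sketched as below, assuming the commonly used form \( g(x) = k(x)\left[1 + \sum_j a_j \cos(j\pi x/w)\right] / \left[1 + \sum_j a_j\right] \), where \( k \) is the key function and the denominator rescales so that \( g(0)=1 \). The function name `hn_cos` and all parameter values are hypothetical.

```r
# Half-normal key with cosine adjustment terms, rescaled so that g(0) = 1.
# a      : adjustment coefficients (illustrative values)
# orders : order j of each cosine term
hn_cos <- function(x, sigma, w, a, orders) {
  key    <- exp(-x^2 / (2 * sigma^2))
  series <- rep(1, length(x))   # 1 + sum of the adjustment terms
  for (i in seq_along(a)) {
    series <- series + a[i] * cos(orders[i] * pi * x / w)
  }
  key * series / (1 + sum(a))
}

x     <- seq(0, 1, length.out = 5)
g_key <- hn_cos(x, sigma = 0.4, w = 1, a = numeric(0), orders = integer(0))
g_adj <- hn_cos(x, sigma = 0.4, w = 1, a = 0.2, orders = 2)
print(round(cbind(x, g_key, g_adj), 3))
```

Adjusted curves are not guaranteed to stay monotonic or within 0 and 1, so it is worth checking the fitted shape after adding terms.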

Half-normal (with cosine adjustments)

plot of chunk df-hn-cos

Okay, but how can we actually do this?

Modelling strategy

Pick some formulations, fit models
Check whether assumptions are violated
Goodness of fit
Select models
Estimate \( \hat{N} \) (and uncertainty!)
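
The fit, select and estimate steps can be illustrated in base R. This is a rough sketch on simulated distances, not the method used by any particular package: the data, the truncation distance `w`, the starting values and the helper names (`g_hn`, `g_hr`, `p_avg`, `nll`) are all assumptions, and goodness-of-fit and uncertainty are left out.

```r
set.seed(1)
w <- 1   # truncation distance (assumed)

# Simulated detected distances: with a half-normal detection function and
# animals uniform with respect to the line, detected distances are |Normal|
x <- abs(rnorm(400, sd = 0.35))
x <- x[x <= w]

# Candidate detection functions; parameters are on the log scale
g_hn <- function(x, par) exp(-x^2 / (2 * exp(par[1])^2))
g_hr <- function(x, par) -expm1(-(x / exp(par[1]))^(-exp(par[2])))

# Average detection probability over 0..w (midpoint rule)
grid  <- (seq_len(1000) - 0.5) * w / 1000
p_avg <- function(par, detfn) mean(detfn(grid, par))

# Negative log-likelihood: density of a detected distance is g(x) / (w * p)
nll <- function(par, detfn) -sum(log(detfn(x, par) / (w * p_avg(par, detfn))))

fit_hn <- optimize(nll, interval = log(c(0.05, 5)), detfn = g_hn)
fit_hr <- optim(log(c(0.3, 3)), nll, detfn = g_hr)

# AIC = 2 * (number of parameters) + 2 * (minimised negative log-likelihood)
aic   <- c(hn = 2 * 1 + 2 * fit_hn$objective, hr = 2 * 2 + 2 * fit_hr$value)
p_hat <- c(hn = p_avg(fit_hn$minimum, g_hn), hr = p_avg(fit_hr$par, g_hr))
print(rbind(aic, p_hat))

# Abundance in the covered region under the model with the lowest AIC
best      <- names(which.min(aic))
N_covered <- length(x) / p_hat[[best]]
print(N_covered)
```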

Distance sampling data

Data need to be set up in a certain way
- a data.frame with one row per observation
- at least 2 columns, named “object” and “distance”

   distance object size SeaState
1  246.0173      1    2      3.0
2 1632.3934      2    2      2.5
3 2368.9941      3    1      3.0
4  244.6977      4    1      3.5
5 2081.3468      5    1      4.0
6 1149.2632      6    1      2.4
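
A data.frame in this layout can be built and checked as follows. This sketch uses only the six rows printed above; the checks are an illustrative suggestion rather than a requirement stated in the slides.

```r
# The six rows from the printout above, rebuilt as a data.frame
distdata <- data.frame(
  distance = c(246.0173, 1632.3934, 2368.9941, 244.6977, 2081.3468, 1149.2632),
  object   = 1:6,
  size     = c(2, 2, 1, 1, 1, 1),
  SeaState = c(3.0, 2.5, 3.0, 3.5, 4.0, 2.4)
)

# Basic checks on the layout described above
stopifnot(
  all(c("object", "distance") %in% names(distdata)),  # required columns
  !anyDuplicated(distdata$object),                    # one row per observation
  all(distdata$distance >= 0)                         # distances are non-negative
)
str(distdata)
```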

Fitting detection functions (in R!)

Using the package Distance
Function ds() does most of the work

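library(Distance)
# half-normal key function (the default), no adjustment terms
df_hn <- ds(distdata, truncation=6000, adjustment = NULL)
# hazard-rate key function, no adjustment terms, same truncation
df_hr <- ds(distdata, truncation=6000, key="hr", adjustment = NULL)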

Model summary

summary(df_hn)


Summary for distance analysis 
Number of observations :  132 
Distance range         :  0  -  6000 

Model : Half-normal key function 
AIC   : 2252.06 

Detection function parameters
Scale Coefficients:  
            estimate         se
(Intercept) 7.900732 0.07884776

                       Estimate          SE         CV
Average p             0.5490484  0.03662569 0.06670757
N in covered region 240.4159539 21.32287580 0.08869160
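
The last two rows of the summary can be recovered by hand from the scale coefficient. This sketch assumes the `(Intercept)` estimate is the log of the half-normal scale parameter \( \sigma \), an assumption that is consistent with the numbers printed above.

```r
# Values copied from the summary output above
log_sigma <- 7.900732   # "(Intercept)" scale coefficient (assumed log scale)
w         <- 6000       # truncation distance
n         <- 132        # number of observations

sigma <- exp(log_sigma) # half-normal scale on the distance scale

# Average detection probability: integral of g(x) / w over 0..w
g     <- function(x) exp(-x^2 / (2 * sigma^2))
p_avg <- integrate(g, lower = 0, upper = w)$value / w

# Estimated number of animals in the covered region
N_covered <- n / p_avg
print(c(sigma = sigma, p_avg = p_avg, N_covered = N_covered))
```

The results should agree closely with the "Average p" (about 0.549) and "N in covered region" (about 240.4) rows of the summary.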

Plotting models

plot of chunk unnamed-chunk-4

Truncation

plot of chunk trunc

We set truncation=6000, why?
Remove observations in the tail of the distribution
Care about \( g \) near 0!
Trade-off! (Here we use ~96% of the data)
Len Thomas suggests choosing \( w \) so that \( g(w)\approx 0.15 \)
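
The rule of thumb can be checked against the half-normal fit reported in the model summary. This sketch assumes the scale coefficient 7.900732 is \( \log\sigma \); the variable names are illustrative.

```r
# Half-normal detection function using the scale estimate from the summary
sigma <- exp(7.900732)   # assumed: coefficient is on the log scale
g     <- function(x) exp(-x^2 / (2 * sigma^2))

g_at_w <- g(6000)                        # detection probability at truncation
w_015  <- sigma * sqrt(-2 * log(0.15))   # distance at which g drops to 0.15
print(c(g_at_w = g_at_w, w_015 = w_015))
```

Under these assumptions \( g(6000) \) is roughly 0.085, and \( g \) reaches 0.15 at a distance of roughly 5260, so the guideline would point to a slightly shorter truncation distance for this model.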

Recap

Distance sampling

What's next?