Introductory distance sampling workshop

CREEM, Univ of St Andrews

Addressing size bias through covariate in detection function

2019-08-16

Size bias in distance sampling surveys

As shown in the lecture, if detectability is a function not only of distance but also of group size (big groups are easier to see than small groups), then groups in the sample are likely to be larger than groups in the entire population. Consequently, when the density of groups is scaled up to the density of individuals using the mean size of the detected groups, \[\hat{D}_{indiv} = \hat{D}_{groups} \times \overline{size}_{group}\]

\(\hat{D}_{indiv}\) is overestimated.
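
The direction of the bias can be sketched in one line. The notation here is introduced only for illustration and is not from the lecture: \(\pi(s)\) is the proportion of groups in the population with size \(s\), and \(P_a(s)\) is the average probability of detecting a group of size \(s\) within the truncation distance. The expected size of a detected group is then

\[E[s \mid \text{detected}] = \frac{\sum_s s\,\pi(s)\,P_a(s)}{\sum_s \pi(s)\,P_a(s)} = E[s] + \frac{\mathrm{Cov}\left(s, P_a(s)\right)}{E\left[P_a(s)\right]}\]

If larger groups are easier to detect, the covariance is positive and the mean size in the sample exceeds the population mean \(E[s]\). The excess grows with the variability in group size, which is the contrast the two examples below illustrate.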

A resolution to this problem is to explicitly model the probability of detection as a function of group size, using size as a covariate in the detection function. I will demonstrate this with two simulated examples, where the true answer is known: one where group size variability is small and one where it is large.

The necessary syntax to include covariates, group size in this instance, in the detection function is:

a.covariate <- ds(my.data, transect="line", key="hn", formula=~size)
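
To make concrete what `formula=~size` asks the model to do, here is a minimal sketch of a half-normal detection function whose scale parameter depends on group size through a log link. The function name `hn_detect` and the coefficient values `beta0` and `beta1` are hypothetical, chosen only for illustration; in a real analysis the coefficients are estimated by `ds()` from the data.

```r
# Half-normal detection function with group size as a covariate.
# beta0 and beta1 are hypothetical values, not estimates from any data set.
hn_detect <- function(distance, size, beta0 = 3, beta1 = 0.05) {
  # The scale parameter grows with group size (log link keeps it positive)
  sigma <- exp(beta0 + beta1 * size)
  # Probability of detecting a group at this perpendicular distance
  exp(-distance^2 / (2 * sigma^2))
}

# Detection probability at 40 distance units for groups of size 2, 10 and 30
hn_detect(distance = 40, size = c(2, 10, 30))
```

With a positive `beta1`, larger groups have a wider detection function, so at any given distance they are more likely to be seen than smaller groups.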

Example 1: perhaps a terrestrial ungulate

Here animals occur in small herds. The distribution of herd size is Poisson with a mean herd size of 10.

You can see it is very rare for herds to exceed a size of twice the mean.
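
The rarity of large herds can also be checked directly from the Poisson distribution. This short sketch computes the probability that a herd exceeds twice the mean, and the expected number of such herds in a population of 200 (the variable names are illustrative only).

```r
mean_herd <- 10

# P(herd size > 20) under a Poisson distribution with mean 10
p_exceed <- ppois(2 * mean_herd, lambda = mean_herd, lower.tail = FALSE)
p_exceed

# Expected number of such herds among 200
200 * p_exceed
```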

I’ll create a population with this distribution of herd size, with a true number of herds of 200; hence true number of individuals in the population is 2000. Those of you who are here next week will learn the details of the simulation process.
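
As a rough base R stand-in for that population (not the workshop's simulation machinery, and with an arbitrary seed), herd sizes can be drawn as follows. The realised total varies around 2000 from one draw to the next.

```r
set.seed(2019)  # arbitrary seed, for reproducibility only

n_herds <- 200
herd_size <- rpois(n_herds, lambda = 10)  # one Poisson draw per herd

# Expected population size versus the total in this particular draw
expected_N <- n_herds * 10
c(expected = expected_N, realised = sum(herd_size))
```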

Do we have the tell-tale sign of size bias, namely small groups missing at large distances?
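
One way to look for that sign without a plot is to compare the mean size of detected groups near the transect with those far from it. The sketch below is a self-contained toy simulation, not the workshop's code: the truncation distance `w`, the number of groups, and the detection coefficients (3 and 0.05) are all assumptions made for illustration.

```r
set.seed(1)  # arbitrary seed

n <- 20000   # many groups, so the pattern is not obscured by noise
w <- 100     # assumed truncation distance

size     <- rpois(n, lambda = 10)       # Poisson herd sizes, mean 10
distance <- runif(n, min = 0, max = w)  # groups uniformly distributed in distance

# Half-normal detection with a hypothetical size effect on the scale parameter
sigma <- exp(3 + 0.05 * size)
p_det <- exp(-distance^2 / (2 * sigma^2))
seen  <- runif(n) < p_det

# Mean size of detected groups in the near and far halves of the strip
near_mean <- mean(size[seen & distance <  w / 2])
far_mean  <- mean(size[seen & distance >= w / 2])

c(population = mean(size), detected = mean(size[seen]),
  near = near_mean, far = far_mean)
```

If size affects detectability, the detected groups far from the line are larger on average than those close to it, because small groups at large distances tend to be missed.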

With size included as a covariate, the distribution of computed average group size is centred on the true size of 10, and there was no problem fitting a detection function. Averaged over the simulations, the estimated number of individuals was 2015.28.

As a comparison, what happens if we don’t include size as a covariate in our detection function?

The distribution of computed average group sizes is shown at right. We would expect mean group size to be overestimated because small groups at large distances are missing from our sample, but that effect is small in this instance. Consequently, the average \(\hat{N}_{indiv}\) across all simulations is 2016.48, essentially the same as when size is included as a covariate.

Example 2: Possible dolphin pods or seabird rafts

I use a different distribution to mimic the group size distribution. A log normal distribution (you heard about it during the precision lecture) is like a normal distribution that has had its right tail pulled out.

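The median of this distribution is 12 (not far from 10 in the previous example), but because of the right tail, the mean is about 21.9. This changes the true number of individuals in the population to 4388. (The product of the rounded figures, 200 × 21.9, is 4380, so 4388 presumably reflects an unrounded mean of about 21.94.)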
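
The gap between median and mean follows from the lognormal formulas: the median is \(e^{\mu}\) and the mean is \(e^{\mu + \sigma^2/2}\). The sketch below backs out the log-scale parameters implied by a median of 12 and a mean of 21.94 (the latter taken as 4388/200). These are derived values, not parameters stated in the handout, and the seed is arbitrary.

```r
median_size <- 12
mean_size   <- 21.94   # assumed: 4388 individuals / 200 groups

# Lognormal: median = exp(meanlog), mean = exp(meanlog + sdlog^2 / 2)
meanlog <- log(median_size)
sdlog   <- sqrt(2 * log(mean_size / median_size))
c(meanlog = meanlog, sdlog = sdlog)

# Check by simulation that draws reproduce the stated median and mean
set.seed(42)
draws <- rlnorm(1e5, meanlog = meanlog, sdlog = sdlog)
c(median = median(draws), mean = mean(draws))
```

Real group sizes are whole numbers, so a simulation would still need to round or otherwise discretise these draws; that step is left out here.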

How about “missingness” of small groups at large distances?

Analysis with covariate

When including size as a covariate, estimates of average group size are not affected (figure in right margin). Likewise, mean \(\hat{N}_{indiv}\) is effectively unbiased: 4333.99.

Analysis without the covariate

Now mean \(\hat{N}_{indiv}\) is considerably biased: 5350.5, 21.9 percent larger than the true number of individuals in the population, 4388.
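
The percentage bias quoted above, and the corresponding figures for the other three analyses, can be reproduced from the simulation averages reported in this handout. The helper `rel_bias` and the column names are illustrative only.

```r
# Percent relative bias of an estimate against the known truth
rel_bias <- function(estimate, truth) 100 * (estimate - truth) / truth

results <- data.frame(
  example   = c("Poisson", "Poisson", "lognormal", "lognormal"),
  covariate = c(TRUE, FALSE, TRUE, FALSE),
  mean_Nhat = c(2015.28, 2016.48, 4333.99, 5350.5),
  truth     = c(2000, 2000, 4388, 4388)
)
results$pct_bias <- round(rel_bias(results$mean_Nhat, results$truth), 1)
results
```

Only the lognormal analysis without the size covariate shows a bias of more than about one percent.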

Take home message

When variability in group size is small for your study animal, size bias is unlikely to cause a problem: even if small groups at large distances are missed, the average size in the detected sample does not differ much from the average size in the population. However, when group size variation is large, the average size in the sample can be considerably larger than the average group size in the population, inducing positive bias in the estimated number of individuals in the population. In those situations, include group size as a covariate when modelling the detection function.