11 Mark-recapture distance sampling of golftees
This document is designed to give you some pointers so that you can perform the Mark-Recapture Distance Sampling practical directly using the mrds
package in R, rather than via the Distance graphical interface. I assume you have some knowledge of R, the mrds
package, and Distance.
11.1 Golf tee survey
The golf tee dataset is provided as part of the mrds
package, as well as in the Distance GolfteesExercise
project.
Distance 7 instructions
Data layers associated with the golftees dataset-
Note particularly the fields in the Observation layer: object, observer, detected.
Open R and load the mrds
package and golf tee dataset.
library(knitr)
library(mrds)
data(book.tee.data)
#investigate the structure of the dataset
str(book.tee.data)
List of 4
$ book.tee.dataframe:'data.frame': 324 obs. of 7 variables:
..$ object : num [1:324] 1 1 2 2 3 3 4 4 5 5 ...
..$ observer: Factor w/ 2 levels "1","2": 1 2 1 2 1 2 1 2 1 2 ...
..$ detected: num [1:324] 1 0 1 0 1 0 1 0 1 0 ...
..$ distance: num [1:324] 2.68 2.68 3.33 3.33 0.34 0.34 2.53 2.53 1.46 1.46 ...
..$ size : num [1:324] 2 2 2 2 1 1 2 2 2 2 ...
..$ sex : num [1:324] 1 1 1 1 0 0 1 1 1 1 ...
..$ exposure: num [1:324] 1 1 0 0 0 0 1 1 0 0 ...
$ book.tee.region :'data.frame': 2 obs. of 2 variables:
..$ Region.Label: Factor w/ 2 levels "1","2": 1 2
..$ Area : num [1:2] 1040 640
$ book.tee.samples :'data.frame': 11 obs. of 3 variables:
..$ Sample.Label: num [1:11] 1 2 3 4 5 6 7 8 9 10 ...
..$ Region.Label: Factor w/ 2 levels "1","2": 1 1 1 1 1 1 2 2 2 2 ...
..$ Effort : num [1:11] 10 30 30 27 21 12 23 23 15 12 ...
$ book.tee.obs :'data.frame': 162 obs. of 3 variables:
..$ object : int [1:162] 1 2 3 21 22 23 24 59 60 61 ...
..$ Region.Label: int [1:162] 1 1 1 1 1 1 1 1 1 1 ...
..$ Sample.Label: int [1:162] 1 1 1 1 1 1 1 1 1 1 ...
#extract the list elements from the dataset into easy-to-use objects
detections <- book.tee.data$book.tee.dataframe
#make sure sex and exposure are factor variables
detections$sex <- as.factor(detections$sex)
detections$exposure <- as.factor(detections$exposure)
region <- book.tee.data$book.tee.region
samples <- book.tee.data$book.tee.samples
obs <- book.tee.data$book.tee.obs
We’ll start by fitting the initial full independence model, with only distance as a covariate - just as was done in the “FI - MR dist” model in Distance. Indeed, if you did fit that model in Distance, you can look in the Log tab at the R code Distance generated, and compare it with the code we use here.
Feel free to use ?
to find out more about any of the functions used – e.g., ?ddf
will tell you more about the ddf function.
#Fit the model
fi.mr.dist <- ddf(method='trial.fi',mrmodel=~glm(link='logit',formula=~distance),
data=detections,meta.data=list(width=4))
#Create a set of tables summarizing the double observer data (this is what Distance does)
detection.tables <- det.tables(fi.mr.dist)
#Print these detection tables
detection.tables
Observer 1 detections
Detected
Missed Detected
[0,0.4] 1 25
(0.4,0.8] 2 16
(0.8,1.2] 2 16
(1.2,1.6] 6 22
(1.6,2] 5 9
(2,2.4] 2 10
(2.4,2.8] 6 12
(2.8,3.2] 6 9
(3.2,3.6] 2 3
(3.6,4] 6 2
Observer 2 detections
Detected
Missed Detected
[0,0.4] 4 22
(0.4,0.8] 1 17
(0.8,1.2] 0 18
(1.2,1.6] 2 26
(1.6,2] 1 13
(2,2.4] 2 10
(2.4,2.8] 3 15
(2.8,3.2] 4 11
(3.2,3.6] 2 3
(3.6,4] 1 7
Duplicate detections
[0,0.4] (0.4,0.8] (0.8,1.2] (1.2,1.6] (1.6,2] (2,2.4] (2.4,2.8]
21 15 16 20 8 8 9
(2.8,3.2] (3.2,3.6] (3.6,4]
5 1 1
Observer 1 detections of those seen by Observer 2
Missed Detected Prop. detected
[0,0.4] 1 21 0.9545455
(0.4,0.8] 2 15 0.8823529
(0.8,1.2] 2 16 0.8888889
(1.2,1.6] 6 20 0.7692308
(1.6,2] 5 8 0.6153846
(2,2.4] 2 8 0.8000000
(2.4,2.8] 6 9 0.6000000
(2.8,3.2] 6 5 0.4545455
(3.2,3.6] 2 1 0.3333333
(3.6,4] 6 1 0.1428571
# They could also be plotted, but I've not done so in the interest of space
# plot(detection.tables)
#Produce a summary of the fitted detection function object
summary(fi.mr.dist)
Summary for trial.fi object
Number of observations : 162
Number seen by primary : 124
Number seen by secondary (trials) : 142
Number seen by both (detected trials): 104
AIC : 452.8094
Conditional detection function parameters:
estimate se
(Intercept) 2.900233 0.4876238
distance -1.058677 0.2235722
Estimate SE CV
Average p 0.6423252 0.04069409 0.06335434
Average primary p(0) 0.9478579 0.06109655 0.06445750
N in covered region 193.0486185 15.84826458 0.08209468
#Produce goodness of fit statistics and a qq plot
gof.result <- ddf.gof(fi.mr.dist,
main="Full independence, trial mode goodness of fit\nGolftee data")
chi.distance <- gof.result$chisquare$chi1$chisq
chi.markrecap <- gof.result$chisquare$chi2$chisq
chi.total <- gof.result$chisquare$pooled.chi
Distance 7 instructions
Specification of the full independence model with distance as the only covariate in the mark-recapture model, fitted with a logit transform
Having run this model in Distance 7, convince yourself the parameter estimates for sigma and the distance covariate are the same when using the Distance 7 interface as when using mrds() directly.
Abbreviated \(\chi^2\) goodness of fit assessment shows the \(\chi^2\) contribution from the distance sampling model to be 11.5 and the \(\chi^2\) contribution from the mark-recapture model to be 3.4. The combination of these elements produces a total \(\chi^2\) of 14.9 with 17 degrees of freedom, resulting in a P-value of 0.604
#Calculate density estimates using the dht function
tee.abund <- dht(fi.mr.dist,region,samples,obs)
kable(tee.abund$individuals$summary, digits=2,
caption="Survey summary statistics for golftees")
Region | Area | CoveredArea | Effort | n | ER | se.ER | cv.ER | mean.size | se.mean |
---|---|---|---|---|---|---|---|---|---|
1 | 1040 | 1040 | 130 | 229 | 1.76 | 0.12 | 0.07 | 3.18 | 0.21 |
2 | 640 | 640 | 80 | 152 | 1.90 | 0.33 | 0.18 | 2.92 | 0.23 |
Total | 1680 | 1680 | 210 | 381 | 1.81 | 0.14 | 0.08 | 3.07 | 0.15 |
kable(tee.abund$individuals$N, digits=2,
caption="Abundance estimates for golftee population with two strata")
Label | Estimate | se | cv | lcl | ucl | df |
---|---|---|---|---|---|---|
1 | 356.52 | 32.35 | 0.09 | 294.54 | 431.53 | 17.13 |
2 | 236.64 | 44.14 | 0.19 | 147.33 | 380.09 | 5.06 |
Total | 593.16 | 60.38 | 0.10 | 478.32 | 735.57 | 16.06 |
Now, see if you can work out how to change the call to ddf to fit the other models mentioned in the exercise, and then write code to enable you to compare the models and select among them.
11.2 Crabeater seal survey
This analysis is described in Borchers et al. (2005) Biometrics paper of aerial survey data looking for seals in the Antarctic pack ice. There were four observers in the plane, two on each side (front and back).
The data from the survey has been saved in a .csv
file. This file is read into R using read.table()
. Note that these tables are only needed when estimating abundance by scaling up from the covered region to the study area.
library(Distance)
crabseal <- read.csv("crabbieMRDS.csv")
# Half normal detection function, 700m truncation distance,
# logit function for mark-recapture component
crab.ddf.io <- ddf(method="io", dsmodel=~cds(key="hn"),
mrmodel=~glm(link="logit", formula=~distance),
data=crabseal, meta.data=list(width=700))
summary(crab.ddf.io)
Summary for io.fi object
Number of observations : 1740
Number seen by primary : 1394
Number seen by secondary : 1471
Number seen by both : 1125
AIC : 3011.463
Conditional detection function parameters:
estimate se
(Intercept) 2.107762345 0.0994391200
distance -0.003087713 0.0003159216
Estimate SE CV
Average primary p(0) 0.8916554 0.009606428 0.010773701
Average secondary p(0) 0.8916554 0.009606428 0.010773701
Average combined p(0) 0.9882614 0.002081614 0.002106339
Summary for ds object
Number of observations : 1740
Distance range : 0 - 700
AIC : 22314.4
Detection function:
Half-normal key function
Detection function parameters
Scale coefficient(s):
estimate se
(Intercept) 5.828703 0.0268578
Estimate SE CV
Average p 0.5845871 0.01247837 0.02134562
Summary for io object
Total AIC value : 25325.86
Estimate SE CV
Average p 0.5777249 0.01239179 0.02144929
N in covered region 3011.8139211 79.84197966 0.02650960
Distance 7 instructions
Specification of the point independence model with distance as the only covariate in the mark-recapture model. However under point independence, not only is the a mark-recapture model but also a distance sampling model (shown here)
Having run this model in Distance 7, convince yourself the parameter estimates for sigma and the distance covariate are the same when using the Distance 7 interface as when using mrds() directly.
Goodness of fit could be examined in the same manner as the golf tees by the use of ddf.gof(crab.ddf.io)
but I have not shown this step.
Following model criticism and selection, estimation of abundance ensues. The estimates of abundance for the study area are arbitrary because inference of the study was restricted to the covered region. Hence the estimates of abundance here are artificial. For illustration, the checkdata()
function produces the region, sample, and observation tables. From these tables Horvitz-Thompson like estimators can be applied to produce estimates of \(\hat{N}\). The use of convert.units
adjusts the units of perpendicular distance measurement (m) to units of transect effort (km). Be sure to perform the conversion correctly or your abundance estimates will be off by orders of magnitude.
tables <- Distance:::checkdata(crabseal[crabseal$observer==1,])
crab.ddf.io.abund <- dht(region=tables$region.table,
sample=tables$sample.table, obs=tables$obs.table,
model=crab.ddf.io, se=TRUE, options=list(convert.units=0.001))
kable(crab.ddf.io.abund$individuals$summary, digits=3,
caption="Summary information from crabeater seal aerial survey.")
Region | Area | CoveredArea | Effort | n | ER | se.ER | cv.ER | mean.size | se.mean |
---|---|---|---|---|---|---|---|---|---|
1 | 1e+06 | 8594.082 | 6138.63 | 2053 | 0.334 | 0.033 | 0.097 | 1.18 | 0.013 |
kable(crab.ddf.io.abund$individual$N, digits=3,
caption="Crabeater seal abundance estimates for study area of arbitrary size.")
Label | Estimate | se | cv | lcl | ucl | df |
---|---|---|---|---|---|---|
Total | 413493.2 | 41201.49 | 0.09964248 | 339670.9 | 503359.6 | 128.6257 |
Distance 7 instructions
Specification of the point independence model with distance as the only covariate in the mark-recapture model. However under point independence, not only is the a mark-recapture model but also a distance sampling model (shown here).
The abundance estimates are not an exact match between the Distance 7 analysis and the mrds() analysis because of slight differences in the datasets. However, there is general agreement that the seal population size in this arbitrary study area is approximately 400,000 with a CV of ~10%.
11.2.1 Extra credit – Crabeater seals with MCDS
We can also analyse the crabeater seals data as if it were MCDS data (ignoring \(g(0)\)).
Data identical to that avaiable in the Distance project CrabbieMCDSExercise.zip
has been ported to crabbieMCDS.csv
as if you had entered these data yourself into a spreadsheet.
This short exercise guides you through the import of those data into R, and fits a simple half-normal detection function examining the possible improvement of the model by incorporating a side of plane covariate.
library(Distance)
crab.covariate <- read.csv("crabbieMCDS.csv")
head(crab.covariate, n=3)
Study.area Region.Label Area Sample.Label Effort distance size side
1 Nominal_area 1 1000000 99A21 59.72 144.49 1 R
2 Nominal_area 1 1000000 99A21 59.72 125.16 1 L
3 Nominal_area 1 1000000 99A21 59.72 421.40 1 L
exp fatigue gscat vis glare ssmi altitude obsname
1 0.0 61.90 1 G N 79 43.05763 YH
2 211.7 62.61 1 G N 79 43.05763 MF
3 0.0 62.86 1 G N 79 43.05763 MH
Noting that the data have been read into R appropriately, we are ready to fit a detection function.
Note that side of plane is assigned characters (L/R) in our data. Before using side of plane as a covariate, we need to tell R that side of plane takes on discrete values, hence characters are acceptable for use; this happens with the as.factor()
function.
ds.side <- ds(crab.covariate, key="hn", formula=~as.factor(side), truncation=700)
Model contains covariate term(s): no adjustment terms will be included.
Fitting half-normal key function
AIC= 22304.742
Having fitted a detection function using side of plane as a covariate to the half-normal detection function, we would like to assess the fit of this function to our data. Two visual assessments are provided by the panels below: histogram and function on the left and QQ plot on the right.
par(mfrow = c(1, 2))
plot.title <- "Two sets of points\none for each 'side' of plane"
plot(ds.side, pch=19, cex=0.5, main = plot.title)
gof.result <- ds.gof(ds.side, lwd = 2, lty = 1, pch = ".", cex = 0.5)
message <- paste("CVM GOF p-value=", round(gof.result$dsgof$CvM$p, 4))
text(0.6, 0.2, message, cex=0.5)
We could also fit a detection function without the covariate and contrast the two models.
ds.nocov <- ds(crab.covariate, key="hn", formula=~1, truncation=700)
Starting AIC adjustment term selection.
Fitting half-normal key function
Key only model: not constraining for monotonicity.
AIC= 22314.398
Fitting half-normal key function with cosine(2) adjustments
AIC= 22308.645
Fitting half-normal key function with cosine(2,3) adjustments
AIC= 22304.015
Fitting half-normal key function with cosine(2,3,4) adjustments
AIC= 22305.943
Half-normal key function with cosine(2,3) adjustments selected.
AIC score for model without covariate is 22304 and AIC score for model with side as a covariate is 22305 so the model that includes side as a covariate is preferred.
Abundance estimates are provided if you print the contents of summary.covar
or summary.nocovar
to your console, but we do not provide those results.
References
Borchers, D. L., J. L. Laake, C. Southwell, and C. G. M. Paxton. 2005. “Accommodating Unmodeled Heterogeneity in Double-Observer Distance Sampling Surveys.” Biometrics 62 (2). Wiley-Blackwell: 372–78. doi:10.1111/j.1541-0420.2005.00493.x.