Here is a “solution” for practical 1. As with any data analysis, there is no correct answer, but this shows how I would approach this analysis.
Much of the text below is as in the exercise itself, so it should be relatively easy to navigate.
First we need to load the requisite R libraries:
library(rgdal)
## Loading required package: sp
## rgdal: version: 1.0-4, (SVN revision 548)
## Geospatial Data Abstraction Library extensions to R successfully loaded
## Loaded GDAL runtime: GDAL 1.11.2, released 2015/02/10
## Path to GDAL shared files: /Library/Frameworks/R.framework/Versions/3.2/Resources/library/rgdal/gdal
## Loaded PROJ.4 runtime: Rel. 4.9.1, 04 March 2015, [PJ_VERSION: 491]
## Path to PROJ.4 shared files: /Library/Frameworks/R.framework/Versions/3.2/Resources/library/rgdal/proj
## Linking to sp version: 1.1-1
library(ggplot2)
library(Distance)
## Loading required package: mrds
## This is mrds 2.1.14
## Built: R 3.2.0; ; 2015-07-30 10:07:19 UTC; unix
library(knitr)
The observations are located in a "geodatabase" we created in Arc. We want to pull out the "Sightings" table (called a "layer") and make it into a data.frame (so that it's easier for R to manipulate).
distdata <- readOGR("Analysis.gdb", layer="Sightings")
## OGR data source with driver: OpenFileGDB
## Source: "Analysis.gdb", layer: "Sightings"
## with 137 features
## It has 7 fields
distdata <- as.data.frame(distdata)
We can check it has the correct format using head:
head(distdata)
## Survey GroupSize SeaState Distance SightingTime SightingID
## 1 en04395 2 3.0 246.0173 2004/06/28 10:22:21 1
## 2 en04395 2 2.5 1632.3934 2004/06/28 13:18:14 2
## 3 en04395 1 3.0 2368.9941 2004/06/28 14:13:34 3
## 4 en04395 1 3.5 244.6977 2004/06/28 15:06:01 4
## 5 en04395 1 4.0 2081.3468 2004/06/29 10:48:31 5
## 6 en04395 1 2.4 1149.2632 2004/06/29 14:35:34 6
## SegmentID coords.x1 coords.x2
## 1 48 -65.636 39.576
## 2 50 -65.648 39.746
## 3 51 -65.692 39.843
## 4 52 -65.717 39.967
## 5 56 -65.820 40.279
## 6 59 -65.938 40.612
The Distance package expects certain column names to be used. Renaming is much easier to do in R than in ArcGIS, so we do it here.
distdata$distance <- distdata$Distance
distdata$object <- distdata$SightingID
distdata$size <- distdata$GroupSize
Let’s see what we did:
head(distdata)
## Survey GroupSize SeaState Distance SightingTime SightingID
## 1 en04395 2 3.0 246.0173 2004/06/28 10:22:21 1
## 2 en04395 2 2.5 1632.3934 2004/06/28 13:18:14 2
## 3 en04395 1 3.0 2368.9941 2004/06/28 14:13:34 3
## 4 en04395 1 3.5 244.6977 2004/06/28 15:06:01 4
## 5 en04395 1 4.0 2081.3468 2004/06/29 10:48:31 5
## 6 en04395 1 2.4 1149.2632 2004/06/29 14:35:34 6
## SegmentID coords.x1 coords.x2 distance object size
## 1 48 -65.636 39.576 246.0173 1 2
## 2 50 -65.648 39.746 1632.3934 2 2
## 3 51 -65.692 39.843 2368.9941 3 1
## 4 52 -65.717 39.967 244.6977 4 1
## 5 56 -65.820 40.279 2081.3468 5 1
## 6 59 -65.938 40.612 1149.2632 6 1
We now have three "extra" columns.
Before setting off fitting detection functions, let’s look at the relationship of various variables in the data.
Don’t worry too much about understanding the code that generates these plots at the moment.
Obviously, the most important covariate in a distance sampling analysis is distance itself. We can plot a histogram of the distances to check that (1) we imported the data correctly and (2) it conforms to the usual shape for line transect data.
hist(distdata$distance, xlab="Distance (m)", main="Distance to sperm whale observations")
We might expect that there will be a relationship between the distance at which we see animals and the size of the groups observed (larger groups are easier to see at larger distances), so let's plot that to help us visualise the relationship.
# plot of size versus distance and sea state vs distance, linear model and LOESS smoother overlay
# put the data into a simple format, only selecting what we need
distplot <- distdata[,c("distance","size","SeaState")]
names(distplot) <- c("Distance", "Size", "Beaufort")
library(reshape2)
# "melt" the data to have only three columns (try head(distplot))
distplot <- melt(distplot, id.vars="Distance", value.name="covariate")
# make the plot
p <- ggplot(distplot, aes(x=covariate, y=Distance)) +
geom_point() +
facet_wrap(~variable, scale="free") +
geom_smooth(method="loess", se=FALSE) +
geom_smooth(method="lm", se=FALSE) +
labs(x="Covariate value", y="Distance (m)")
print(p)
We might also expect that increasing sea state would result in a drop in observations. We can plot histograms of distance for each sea state level (binning sea state into the unit intervals [0,1], (1,2], ..., (4,5] for this).
distdata$SeaStateCut <- cut(distdata$SeaState,seq(0,5,by=1), include.lowest=TRUE)
p <- ggplot(distdata) +
geom_histogram(aes(distance)) +
facet_wrap(~SeaStateCut) +
labs(x="Distance (m)", y="Count")
print(p)
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
Given we are including data from two different surveys we can also investigate the relationship between survey and distances observed.
p <- ggplot(distdata) +
geom_histogram(aes(distance)) +
facet_wrap(~Survey) +
labs(x="Distance (m)", y="Count")
print(p)
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
It's now time to fit some detection function models. We'll use the ds() function from the Distance package to fit the detection function. You can access the help file for the ds() function by typing ?ds – this will give you information about what the different arguments to the function are and what they do.
We can fit a very simple model with the following code:
df_hn <- ds(data=distdata, truncation=6000, key="hn", adjustment=NULL)
## Fitting half-normal key function
## Key only models do not require monotonicity contraints. Not constraining model for monotonicity.
## AIC= 2252.06
## No survey area information supplied, only estimating detection function.
Let's dissect the call and see what each argument means:

- data=distdata: the data to use to fit the model, as we prepared above.
- truncation=6000: sets the truncation distance. Here, observations at distances greater than 6000 m will be discarded before fitting the detection function.
- key="hn": the key function to use for the detection function, in this case half-normal (?ds lists the other options).
- adjustment=NULL: the adjustment term series to fit. NULL here means that no adjustments should be fitted (again ?ds lists all options).

Other useful arguments for this practical are:

- formula=: gives the formula to use for the scale parameter. By default it takes the value ~1, meaning the scale parameter is constant and not a function of covariates.
- order=: specifies the "order" of the adjustments to be used. This is a number (or vector of numbers) specifying the order of the terms. For example, order=2 fits order 2 adjustments, and order=c(2,3) fits a model with order 2 and 3 adjustments (mathematically, it only makes sense to include order 3 along with order 2). By default the value is NULL, which has ds() select the number of adjustments by AIC.
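For instance (not run here, and the object names are just illustrative), a half-normal model with cosine adjustments chosen by AIC, or one whose scale parameter depends on sea state, could be fitted with something like:
# half-normal key with cosine adjustments, orders selected by AIC
df_hn_cos <- ds(data=distdata, truncation=6000, key="hn", adjustment="cos")
# half-normal key, no adjustments, scale parameter depending on sea state
df_hn_ss <- ds(data=distdata, truncation=6000, key="hn", adjustment=NULL, formula=~SeaState)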
We can look at the summary of the fitted detection function using the summary() function:
summary(df_hn)
##
## Summary for distance analysis
## Number of observations : 132
## Distance range : 0 - 6000
##
## Model : Half-normal key function
## AIC : 2252.06
##
## Detection function parameters
## Scale Coefficients:
## estimate se
## (Intercept) 7.900732 0.07884776
##
## Estimate SE CV
## Average p 0.5490484 0.03662569 0.06670757
## N in covered region 240.4159539 21.32287580 0.08869160
Goodness of fit quantile-quantile plot and test results can be accessed using the ddf.gof() function:
ddf.gof(df_hn$ddf, asp=1)
##
## Goodness of fit results for ddf object
##
## Chi-square tests
## [0,545] (545,1.09e+03] (1.09e+03,1.64e+03] (1.64e+03,2.18e+03]
## Observed 33.000000 20.00000000 19.000000000 8.000000
## Expected 21.708156 20.84245788 19.213254554 17.005089
## Chisquare 5.873634 0.03405238 0.002366986 4.768669
## (2.18e+03,2.73e+03] (2.73e+03,3.27e+03] (3.27e+03,3.82e+03]
## Observed 9.000000 13.0000000 14.000000
## Expected 14.450499 11.7899695 9.235669
## Chisquare 2.055842 0.1241881 2.457737
## (3.82e+03,4.36e+03] (4.36e+03,4.91e+03] (4.91e+03,5.45e+03]
## Observed 3.000000 4.0000000 7.000000
## Expected 6.946241 5.0159948 3.477683
## Chisquare 2.241906 0.2057908 3.567525
## (5.45e+03,6e+03] Total
## Observed 2.00000000 132.00000
## Expected 2.31498601 132.00000
## Chisquare 0.04285822 21.37457
##
## P =0.011087 with 9 degrees of freedom
##
## Distance sampling Kolmogorov-Smirnov test
## Test statistic = 0.11192 P = 0.073241
##
## Distance sampling Cramer-von Mises test(unweighted)
## Test statistic = 0.39618 P = 0.073947
Note three things here:
1. We use the $ddf element of the detection function object.
2. We're ignoring the \(\chi^2\) test results, as they rely on binning the distances to calculate test statistics, whereas the Cramer-von Mises and Kolmogorov-Smirnov tests do not.
3. We use asp=1 to set the aspect ratio to 1, so the plot is not distorted.
We can plot the models simply using the plot() function:
plot(df_hn)
The dots on the plot indicate the distances where observations are. We can remove them (particularly useful for a model without covariates) using the additional argument showpoints=FALSE (try this out!).
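For example, for the model we just fitted:
plot(df_hn, showpoints=FALSE)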
Now try fitting a few models and comparing them using AIC. Don’t try to fit all possible models, just try a selection (say, a hazard-rate, a model with adjustments and a couple with different covariates). You can also try out changing the truncation distance.
Here's an example to work from. Some tips before you start:

- You can run a line or chunk of code with Control+Enter (on Windows/Linux; Command+Enter on Mac).
- Give your models informative names. I've used the prefix df_ to indicate that this is a detection function, then shortened forms of the model form and covariates, separated by underscores, but use what makes sense to you (and future you!).

df_hr_ss_size <- ds(distdata, truncation=6000, adjustment=NULL, key="hr", formula=~SeaState+size)
## Fitting hazard-rate key function
## AIC= 2249.327
## No survey area information supplied, only estimating detection function.
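A quick way of comparing models fitted with the same truncation distance is to pull out their AICs directly; they are stored in the $ddf$criterion element of each fitted object (the same slot used by the table-building code below). For example:
# AICs of the two models fitted so far (both use the 6000m truncation)
c(hn=df_hn$ddf$criterion, hr_ss_size=df_hr_ss_size$ddf$criterion)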
Once you’ve got the hang of writing models and looking at the differences between them, you should move onto the next section.
Since I wasn’t constrained for time, here we can fit all possible detection function models with half-normal and hazard-rate functions using each combination of covariates. This is somewhat brute force.
We’ll store all the models in a list that we can then iterate over later.
models <- list()
# half-normal models
models$hn <- ds(distdata, truncation=6000, adjustment=NULL)
## Fitting half-normal key function
## Key only models do not require monotonicity contraints. Not constraining model for monotonicity.
## AIC= 2252.06
## No survey area information supplied, only estimating detection function.
models$hn.ss <- ds(distdata, truncation=6000, adjustment=NULL, formula=~SeaState)
## Fitting half-normal key function
## AIC= 2248.832
## No survey area information supplied, only estimating detection function.
models$hn.size <- ds(distdata, truncation=6000, adjustment=NULL, formula=~size)
## Fitting half-normal key function
## AIC= 2253.892
## No survey area information supplied, only estimating detection function.
models$hn.ss.size <- ds(distdata, truncation=6000, adjustment=NULL, formula=~size+SeaState)
## Fitting half-normal key function
## AIC= 2250.75
## No survey area information supplied, only estimating detection function.
models$hn.survey <- ds(distdata, truncation=6000, adjustment=NULL, formula=~as.factor(Survey))
## Fitting half-normal key function
## AIC= 2252.35
## No survey area information supplied, only estimating detection function.
models$hn.survey.ss <- ds(distdata, truncation=6000, adjustment=NULL, formula=~as.factor(Survey)+SeaState)
## Fitting half-normal key function
## AIC= 2250.441
## No survey area information supplied, only estimating detection function.
models$hn.survey.size <- ds(distdata, truncation=6000, adjustment=NULL, formula=~as.factor(Survey)+size)
## Fitting half-normal key function
## AIC= 2254.339
## No survey area information supplied, only estimating detection function.
models$hn.survey.size.ss <- ds(distdata, truncation=6000, adjustment=NULL, formula=~as.factor(Survey)+size+SeaState)
## Fitting half-normal key function
## AIC= 2252.378
## No survey area information supplied, only estimating detection function.
# hazard-rate models
models$hr <- ds(distdata, truncation=6000, adjustment=NULL, key="hr")
## Fitting hazard-rate key function
## Key only models do not require monotonicity contraints. Not constraining model for monotonicity.
## AIC= 2247.594
## No survey area information supplied, only estimating detection function.
models$hr.ss <- ds(distdata, truncation=6000, adjustment=NULL, formula=~SeaState, key="hr")
## Fitting hazard-rate key function
## AIC= 2247.347
## No survey area information supplied, only estimating detection function.
models$hr.size <- ds(distdata, truncation=6000, adjustment=NULL, formula=~size, key="hr")
## Fitting hazard-rate key function
## AIC= 2249.427
## No survey area information supplied, only estimating detection function.
models$hr.ss.size <- ds(distdata, truncation=6000, adjustment=NULL, formula=~size+SeaState, key="hr")
## Fitting hazard-rate key function
## AIC= 2249.327
## No survey area information supplied, only estimating detection function.
models$hr.survey <- ds(distdata, truncation=6000, adjustment=NULL, formula=~as.factor(Survey), key="hr")
## Fitting hazard-rate key function
## AIC= 2248.871
## No survey area information supplied, only estimating detection function.
models$hr.survey.ss <- ds(distdata, truncation=6000, adjustment=NULL, formula=~as.factor(Survey)+SeaState, key="hr")
## Fitting hazard-rate key function
## AIC= 2248.744
## No survey area information supplied, only estimating detection function.
models$hr.survey.size <- ds(distdata, truncation=6000, adjustment=NULL, formula=~as.factor(Survey)+size, key="hr")
## Fitting hazard-rate key function
## AIC= 2250.146
## No survey area information supplied, only estimating detection function.
models$hr.survey.size.ss <- ds(distdata, truncation=6000, adjustment=NULL, formula=~as.factor(Survey)+size+SeaState, key="hr")
## Fitting hazard-rate key function
## AIC= 2250.454
## No survey area information supplied, only estimating detection function.
Phew!
Looking at the models individually can be a bit unwieldy – it’s nicer to put that data into a table and sort the table by the relevant statistic.
The code below will make a results table with relevant statistics for model selection in it. Don’t worry about how this code exactly works at the moment.
make_table <- function(models){
# this function extracts the model data for a single model (row)
extract_model_data <- function(model){
c(summary(model)$ds$key,
model$ddf$ds$aux$ddfobj$scale$formula,
model$ddf$criterion,
ddf.gof(model$ddf, qq=FALSE)$dsgof$CvM$p,
summary(model)$ds$average.p,
summary(model)$ds$average.p.se
)
}
# applying that to all the models then putting it into a data.frame
res <- as.data.frame(t(as.data.frame(lapply(models, extract_model_data))),
stringsAsFactors=FALSE)
# making sure the correct columns are numeric
res[,3] <- as.numeric(res[,3])
res[,4] <- as.numeric(res[,4])
res[,5] <- as.numeric(res[,5])
res[,6] <- as.numeric(res[,6])
# giving the columns names
colnames(res) <- c("Key function", "Formula", "AIC", "Cramer-von Mises $p$-value",
"$\\hat{P_a}$", "se($\\hat{P_a}$)")
# creating a new column for the AIC difference to the best model
res[["$\\Delta$AIC"]] <- res$AIC - min(res$AIC, na.rm=TRUE)
# ordering the model by AIC score
res <- res[order(res$AIC),]
# returning the data.frame
return(res)
}
The make_table() function expects a list of models as its input. We can do that with the two models fitted above like so:
#models <- list()
#models$df_hn <- df_hn
#models$df_hr_ss_size <- df_hr_ss_size
(You can add the models you fitted above into this list.)
Here is the resulting table from the code above, made using the kable function from knitr:
model_table <- make_table(models)
kable(model_table, digits=3)
Model | Key function | Formula | AIC | Cramer-von Mises \(p\)-value | \(\hat{P_a}\) | se(\(\hat{P_a}\)) | \(\Delta\)AIC |
---|---|---|---|---|---|---|---|
hr.ss | hr | ~SeaState | 2247.347 | 0.878 | 0.358 | 0.073 | 0.000 |
hr | hr | ~1 | 2247.594 | 0.904 | 0.362 | 0.077 | 0.247 |
hr.survey.ss | hr | ~as.factor(Survey) + SeaState | 2248.744 | 0.895 | 0.362 | 0.073 | 1.397 |
hn.ss | hn | ~SeaState | 2248.832 | 0.070 | 0.542 | 0.040 | 1.485 |
hr.survey | hr | ~as.factor(Survey) | 2248.871 | 0.875 | 0.377 | 0.076 | 1.524 |
hr.ss.size | hr | ~size + SeaState | 2249.327 | 0.880 | 0.355 | 0.074 | 1.980 |
hr.size | hr | ~size | 2249.427 | 0.899 | 0.355 | 0.077 | 2.079 |
hr.survey.size | hr | ~as.factor(Survey) + size | 2250.146 | 0.859 | 0.365 | 0.076 | 2.799 |
hn.survey.ss | hn | ~as.factor(Survey) + SeaState | 2250.441 | 0.101 | 0.538 | 0.038 | 3.094 |
hr.survey.size.ss | hr | ~as.factor(Survey) + size + SeaState | 2250.454 | 0.844 | 0.352 | 0.075 | 3.107 |
hn.ss.size | hn | ~size + SeaState | 2250.750 | 0.075 | 0.541 | 0.039 | 3.403 |
hn | hn | ~1 | 2252.060 | 0.074 | 0.549 | 0.037 | 4.713 |
hn.survey | hn | ~as.factor(Survey) | 2252.350 | 0.082 | 0.545 | 0.036 | 5.003 |
hn.survey.size.ss | hn | ~as.factor(Survey) + size + SeaState | 2252.378 | 0.108 | 0.538 | 0.037 | 5.030 |
hn.size | hn | ~size | 2253.892 | 0.076 | 0.549 | 0.037 | 6.544 |
hn.survey.size | hn | ~as.factor(Survey) + size | 2254.339 | 0.082 | 0.545 | 0.037 | 6.992 |
The four best models by AIC are hr.ss, hr, hr.survey.ss and hn.ss. We can plot these four models:
par(mfrow=c(2,2))
plot(models$hr.ss, main="hr.ss")
plot(models$hr, main="hr")
plot(models$hr.survey.ss, main="hr.survey.ss")
plot(models$hn.ss, main="hn.ss")
and produce the corresponding quantile-quantile plots and goodness of fit test results
par(mfrow=c(2,2))
ddf.gof(models$hr.ss$ddf, main="hr.ss")
ddf.gof(models$hr$ddf, main="hr")
ddf.gof(models$hr.survey.ss$ddf, main="hr.survey.ss")
ddf.gof(models$hn.ss$ddf, main="hn.ss")
(That was a lot of output; we can get rid of it by setting results="hide" in the chunk above.)
Note that there is a considerable spike in our distance data. This may be down to observers guarding the trackline (spending too much time searching near zero distance). It’s therefore possible that the hazard-rate model is overfitting to this peak. So we’d like to investigate results from the half-normal model too and see what the effects are on the final spatial models.
Just for fun, let's estimate abundance from these models using a Horvitz-Thompson-type estimator.
We know the Horvitz-Thompson estimator has the following form:

\[ \hat{N} = \frac{A}{a} \sum_{i=1}^n \frac{s_i}{p_i} \]

We can calculate each part of this equation in R:

- \(A\) is the total area of the region we want to estimate abundance for. This was \(A=5.285 \times 10^{11}\,\text{m}^2\).
- \(a\) is the total area we surveyed. We know that the total transect length was 9,498,474 m and we know the truncation distance \(w\); since \(a=2wL\), we can calculate \(a\).
- the group sizes \(s_i\) can be extracted from the fitted model object as df_hn$ddf$data$size.
- the estimated detection probabilities \(p_i\) are given by predict(df_hn$ddf)$fitted.

We know that in general operations are vectorised in R, so calculating c(1, 2, 3)/c(4, 5, 6) will give c(1/4, 2/5, 3/6); we can therefore just divide the \(s_i\) values by the \(p_i\) values and then use the sum() function to sum them up.
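As a sketch of those pieces for df_hn (values as quoted above; the intermediate object names are just for illustration):
A <- 5.285e+11                    # total study area (m^2)
L <- 9498474                      # total transect length (m)
w <- df_hn$ddf$meta.data$width    # truncation distance (m)
a <- 2*w*L                        # covered area, a = 2wL
s_i <- df_hn$ddf$data$size        # observed group sizes
p_i <- predict(df_hn$ddf)$fitted  # estimated detection probabilities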
Try estimating abundance using the formula above with both df_hn and your favourite model from the previous section. Just trying that out for df_hn:
N_hn <- (5.285e+11/(2*df_hn$ddf$meta.data$width*9498474)) *
sum(df_hn$ddf$data$size/predict(df_hn$ddf)$fitted)
N_hn
## [1] 2015.82
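Doing the same with my preferred model from the table above (hr.ss) just means swapping in that model object; a sketch (not run here, so no numbers are quoted):
N_hr_ss <- (5.285e+11/(2*models$hr.ss$ddf$meta.data$width*9498474)) *
  sum(models$hr.ss$ddf$data$size/predict(models$hr.ss$ddf)$fitted)
N_hr_ss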
Note that in the solutions to this exercise (posted on the course website) I show how to use the function dht() to estimate abundance (and uncertainty) for a detection function analysis. This involves some more complex data manipulation steps, so we've left it out here in favour of getting to grips with the mathematics.
As promised, here is an example using the data from GIS and the dht() function. Note that this code may not run until you've got the data as in practical 2.
First we need to load the effort data from the geodatabase; this is in the "Segments" table. We can then use this to build the sample.table.
Don’t worry too much about understanding this code!
# load the segment data
segs <- readOGR("Analysis.gdb", "Segments")
## OGR data source with driver: OpenFileGDB
## Source: "Analysis.gdb", layer: "Segments"
## with 949 features
## It has 8 fields
# convert to data.frame
segs <- as.data.frame(segs)
# remove unused columns
segs$Shape_Length <- NULL
segs$BeginTime <- NULL
segs$CenterTime <- NULL
segs$EndTime <- NULL
segs$FIRST_Survey <- NULL
library(plyr)
# make a copy of the segs data.frame, remove segment ID
segs2 <- segs
segs2$SegmentID <- NULL
# sum the length per transect
sample.table <- ddply(segs2, .(EffortDayID), summarize, Transect.Length=sum(Length))
# give the columns the right names
names(sample.table) <- c("Sample.Label", "Effort")
sample.table$Region.Label <- "StudyArea"
The total area of the study area is \(A=5.285 \times 10^{11}\,\text{m}^2\); we can use that to construct the region table, with one region in it:
region.table <- data.frame(Region.Label = "StudyArea",
Area = 5.285e+11)
Finally we can construct the observation table to link the samples to the detection function. This involves taking each distance data entry and finding the corresponding observation segment entry.
# set the segment labels
segs$Sample.Label <- segs$SegmentID
distdata$Sample.Label <- distdata$SegmentID
# join the segment data onto the distance data
obs.table <- join(distdata, segs, by="Sample.Label")
distdata$Sample.Label <- NULL
obs.table$Sample.Label <- obs.table$EffortDayID
obs.table <- obs.table[,c("object","Sample.Label")]
obs.table$Region.Label <- "StudyArea"
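A quick sanity check on the join (my addition, not part of the original exercise): every sighting should have picked up a sample label, so any NA here would indicate a problem:
# stop if any sighting failed to match a segment
stopifnot(!any(is.na(obs.table$Sample.Label)))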
Now we can call the dht function that will estimate the abundance for us, given the tables we've constructed:
Nhat_hr.ss <- dht(models$hr.ss$ddf, region.table, sample.table, obs.table)
Nhat_hr.ss
##
## Summary for clusters
##
## Summary statistics:
## Region Area CoveredArea Effort n k ER
## 1 StudyArea 5.285e+11 113981689066 9498474 132 69 1.389697e-05
## se.ER cv.ER
## 1 2.701497e-06 0.1943947
##
## Abundance:
## Label Estimate se cv lcl ucl df
## 1 Total 1707.866 473.5064 0.2772503 998.3714 2921.565 185.4831
##
## Density:
## Label Estimate se cv lcl ucl
## 1 Total 3.231535e-09 8.95944e-10 0.2772503 1.889066e-09 5.528031e-09
## df
## 1 185.4831
##
## Summary for individuals
##
## Summary statistics:
## Region Area CoveredArea Effort n ER se.ER
## 1 StudyArea 5.285e+11 113981689066 9498474 238.7 2.513035e-05 5.667492e-06
## cv.ER mean.size se.mean
## 1 0.2255238 1.808333 0.1020928
##
## Abundance:
## Label Estimate se cv lcl ucl df
## 1 Total 2993.335 891.64 0.2978751 1683.154 5323.372 163.6538
##
## Density:
## Label Estimate se cv lcl ucl
## 1 Total 5.663832e-09 1.687115e-09 0.2978751 3.184776e-09 1.007261e-08
## df
## 1 163.6538
##
## Expected cluster size
## Region Expected.S se.Expected.S cv.Expected.S
## 1 Total 1.752675 0.1381592 0.07882757
## 2 Total 1.752675 0.1381592 0.07882757
We can also use the hn.ss model to get an abundance estimate:
Nhat_hn.ss <- dht(models$hn.ss$ddf, region.table, sample.table, obs.table)
Nhat_hn.ss
##
## Summary for clusters
##
## Summary statistics:
## Region Area CoveredArea Effort n k ER
## 1 StudyArea 5.285e+11 113981689066 9498474 132 69 1.389697e-05
## se.ER cv.ER
## 1 2.701497e-06 0.1943947
##
## Abundance:
## Label Estimate se cv lcl ucl df
## 1 Total 1130.211 226.3852 0.2003034 762.032 1676.278 87.70608
##
## Density:
## Label Estimate se cv lcl ucl
## 1 Total 2.138526e-09 4.283541e-10 0.2003034 1.441877e-09 3.171765e-09
## df
## 1 87.70608
##
## Summary for individuals
##
## Summary statistics:
## Region Area CoveredArea Effort n ER se.ER
## 1 StudyArea 5.285e+11 113981689066 9498474 238.7 2.513035e-05 5.667492e-06
## cv.ER mean.size se.mean
## 1 0.2255238 1.808333 0.1020928
##
## Abundance:
## Label Estimate se cv lcl ucl df
## 1 Total 1982.134 461.3902 0.2327745 1254.929 3130.738 80.50493
##
## Density:
## Label Estimate se cv lcl ucl
## 1 Total 3.750489e-09 8.730184e-10 0.2327745 2.374511e-09 5.923817e-09
## df
## 1 80.50493
##
## Expected cluster size
## Region Expected.S se.Expected.S cv.Expected.S
## 1 Total 1.753773 0.1383593 0.07889239
## 2 Total 1.753773 0.1383593 0.07889239
Now that was a lot of output! But looking at the Summary for individuals section and the Abundance table, we can see the differences between the two models' abundance estimates.
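To see the two estimates side by side, you can extract the individual abundance tables from the dht results (assuming, as in mrds, that they live in the $individuals$N element):
# bind the two individual abundance tables together for easy comparison
rbind(hr.ss=Nhat_hr.ss$individuals$N,
      hn.ss=Nhat_hn.ss$individuals$N)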
It's common, especially in marine surveys, for animals at zero distance to be missed by observers. There are several ways to deal with this issue. For now, we are just going to use a very simple constant correction factor to inflate the abundance.
From Palka (2006), we think that observations on the track line were such that \(g(0)=0.46\); we can apply that correction to our abundance estimate (in a very primitive way):
N_hn_percep <- N_hn/0.46
N_hn_percep
## [1] 4382.217
This kind of correction works fine when we have a single number to adjust by; in general we'd like to model the perception bias using "mark-recapture distance sampling" techniques.
Save your top few models in an RData file, so we can load them up later on. We'll also save the distance data we used to fit our models.
save(df_hn, df_hr_ss_size, # add your models here, followed by commas!
distdata,
file="df-models.RData")
You can check it worked by using the load() function to recover the models.
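For example, in a fresh R session something like the following should bring the objects back:
load("df-models.RData")
summary(df_hn)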
Palka, D. (2006). Summer Abundance Estimates of Cetaceans in US North Atlantic Navy Operating Areas. Northeast Fisheries Science Center Reference Document 06-03. Available online here