Distance sampling: Advanced topics ==================================== author: David L Miller css: custom.css transition: none Recap ======== type:section ```{r setup, include=FALSE} library(knitr) opts_chunk$set(cache=TRUE, echo=FALSE) # load the sperm whale data load("../spermwhale-analysis/df-models.RData") # remove the fitted models to avoid confusion #rm(df_hr, df_hr_ss) ``` Line transects - general idea ======================================================== - Calculate *average detection probability* - using detection function ($g(x)$) - $\hat{p} = \int_0^w \frac{1}{w} g(x; \hat{\theta}) dx$ - $\frac{1}{w}$ tells us about assumed density wrt line - *uniform* from the line (out to $w$) ```{r pi-y, fig.width=20, echo=FALSE} par(mfrow=c(1,3)) curve(exp(-x^2/(2*0.01^2)), from=0, to=0.025, xlab="Distance", ylab="Probability of detection") plot(seq(0,0.025, len=2), rep(1/0.25,2), type="l", xlab="Distance", ylab="Density wrt line", ylim=c(0, 1/0.25)) curve(exp(-x^2/(2*0.01^2))/0.025, from=0, to=0.025, xlab="Distance", ylab="Probability of detection") ``` Line transects - distances ======================================================== - Model drop-off using a *detection function* - Use extra information estimate $\hat{N}$ - How should we adjust $n$? (inflate by $n/\hat{p})$) Fitting detection functions ============================ - Using the package `Distance` - Need to have data setup a certain way - At least columns called `object`, `distance` ```{r simpledf, echo=TRUE} library(Distance) df_hn <- ds(distdata, truncation=6000, adjustment = NULL) ``` Model summary ================ ```{r echo=TRUE} summary(df_hn) ``` Plotting models ================ ```{r} plot(df_hn) ``` *** ``` plot(df_hn) ``` New stuff ========== type:section Overview ==================== Here we'll look at: - Model checking and selection - What else affects detection? - Estimating abundance and uncertainty - More R! Why check models? ================== - AIC best model can still be a terrible model - AIC only measures **relative** fit - Don't know if the model gives "sensible" answers What to check? =============== - Convergence - Fitting ended, but our model is not good - Monotonicity - Our model is "lumpy" - "Goodness of fit" - Our model sucks statistically - (Other sampling assumptions are also important!) Convergence ============ `Distance` will warn you about this: ``` ** Warning: Problems with fitting model. Did not converge** Error in detfct.fit.opt(ddfobj, optim.options, bounds, misc.options) : No convergence. ``` This can be complicated, see `?"mrds-opt"` for info. Monotonicity ============= - Only a problem with adjustments - `check.mono` can help ```{r checkmono, echo=TRUE} check.mono(df_hr$ddf) ``` Monotonicity (when it goes wrong) =================================== ```{r checkmonobad, results="hide"} df_hn_bad <- ds(distdata, truncation=6000, monotonicity = FALSE) check.mono(df_hn_bad$ddf, plot=TRUE) ``` Goodness of fit ================= ```{r results="hide"} ddf.gof(df_hn$ddf) ``` *** ``` ddf.gof(df_hn$ddf) ``` * Check fitted distribution of distances matches empirical * # distances below distance vs. # observations below given cumulative probability Goodness of fit ================= - As well as quantile-quantile plot, tests - Absolute measure of fit (vs. AIC) - Kolmogorov-Smirnov: largest distance on Q-Q plot - Cramer-von Mises: tests sum of distances Goodness of fit ================= left:60% ```{r qq-expl} par(mar=c(5, 4, 0, 0) + 0.1) # adapted from my RDistanceBook qq <- ddf.gof(df_hn$ddf) gof_tests_statplot <- function(model){ cdf_values <- ddf.gof(model, qq=FALSE)$dsgof$cdf # calculate EDF values edf_values <- ddf.gof(model, qq=FALSE)$dsgof$edf[,2] # plot Cramer-von Mises distances Map(function(edf, cdf){ lines(x=rep(edf,2), y=c(edf, cdf), lwd=1.5, col="red") }, edf_values, cdf_values) # find & plot which line-point distance is the test statistic for K-S test ks.ind <- which.max(abs(cdf_values-edf_values)) lines(x=rep(edf_values[ks.ind],2), y=c(edf_values[ks.ind], cdf_values[ks.ind]), lwd=2, col="blue") invisible() } gof_tests_statplot(df_hn$ddf) ``` *** - blue: Kolmogorov-Smirnov - red: Cramer-von Mises Detection function model selection =================================== - Fit models - Look at `summary` and `plot` (fitting issues?) - Look at goodness of fit results, `ddf.gof` - AIC to select between models - Parsimonous: "robust" and "efficient" models Example: fitting detection functions ======================================== ```{r some-dfs1, echo=TRUE} df_hn <- ds(distdata, truncation=6000, adjustment = NULL) ``` ```{r some-dfs2, echo=TRUE} df_hn_cos <- ds(distdata, truncation=6000, adjustment = "cos") ``` ```{r some-dfs3, echo=TRUE} df_hr <- ds(distdata, truncation=6000, key="hr", adjustment = NULL) ``` ```{r some-dfs4, echo=TRUE} df_hr_cos <- ds(distdata, key="hr", truncation=6000, adjustment = "cos") ``` Plotting those models ========================= ```{r df-plots} par(mfrow=c(2,2)) plot(df_hn, main="df_hn") plot(df_hr, main="df_hr") plot(df_hn_cos, main="df_hn_cos") plot(df_hr_cos, main="df_hr_cos") ``` Q-Q plots ========================= ```{r df-qqplots, messages=FALSE, results="hide"} par(mfrow=c(2,2)) ddf.gof(df_hn$ddf, main="df_hn") ddf.gof(df_hr$ddf, main="df_hr") ddf.gof(df_hn_cos$ddf, main="df_hn_cos") ddf.gof(df_hr_cos$ddf, main="df_hr_cos") ``` AIC ===== ```{r dfs-aic, echo=TRUE, messages=FALSE} df_hn$ddf$criterion df_hn_cos$ddf$criterion ## same model! df_hr$ddf$criterion df_hr_cos$ddf$criterion ``` Selection ========== - Not much between these models! - You'll get to investigate these and more in the lab What else affects detectability? ================================== type: section Covariates ================================== - Observer characteristics - observer name - platform - Animal characteristics - sex - size - group size *** - Weather conditions - sea state - glare - fog How do we include covariates? ================================== - Affects scale, not shape ```{r dfcovs, fig.width=15} par(mfrow=c(1,2)) plot(df_hr_ss, main="Sea state") df_hr_size <- ds(distdata, truncation=6000, key="hr", formula=~size, adjustment=NULL) plot(df_hr_size, main="Size") ``` Covariates in the scale ========================