By the end of this practical, you should feel comfortable:
summary()when called on a
kparameter of smooth terms to increase their flexibility
The example code below uses the
df_hn detection function in the density surface models. You can substitute this for your own best model as you go, or copy and paste the code at the end and see what results you get using your model for the detection function.
## Loading required package: mrds
## This is mrds 2.2.2 ## Built: R 4.0.0; ; 2020-05-12 17:00:13 UTC; unix
## ## Attaching package: 'Distance'
## The following object is masked from 'package:mrds': ## ## create.bins
## Loading required package: mgcv
## Loading required package: nlme
## This is mgcv 1.8-31. For overview type 'help("mgcv-package")'.
## Loading required package: numDeriv
## This is dsm 2.3.0 ## Built: R 4.0.0; ; 2020-05-04 08:34:45 UTC; unix
Loading sperm whale data again and the
RData files where we saved our results:
Before we fit a model using
dsm() we must first remove the observations from the spatial data that we excluded when we fitted the detection function – those observations at distances greater than the truncation.
obs <- obs[obs$distance <= df_hn$ddf$meta.data$width, ]
Here we’ve used the value of the truncation stored in the detection function object, but we could also use the numeric value (which we can also find by checking the detection function’s
(If we don’t do this, the DSM will still fit fine, we just get a warning.)
Using the data that we’ve saved so far, we can build a call to the
dsm() function and fit out first density surface model. Here we’re only going to look at models that include spatial smooths.
Let’s start with a very simple model – a bivariate smooth of
dsm_nb_xy <- dsm(count~s(x, y, k=25), ddf.obj=df_hn, segment.data = segs, observation.data=obs, family=nb())
Note again that we try to have informative model object names so that we can work out what the main features of the model were from its name alone.
We can look at a
summary() of this model. Look through the summary output and try to pick out the important information based on what we’ve talked about in the lectures so far.
## ## Family: Negative Binomial(0.093) ## Link function: log ## ## Formula: ## count ~ s(x, y, k = 25) + offset(off.set) ## ## Parametric coefficients: ## Estimate Std. Error z value Pr(>|z|) ## (Intercept) -20.4863 0.2207 -92.84 <2e-16 *** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Approximate significance of smooth terms: ## edf Ref.df Chi.sq p-value ## s(x,y) 13.91 17.55 66.36 1.47e-07 *** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## R-sq.(adj) = 0.08 Deviance explained = 34.9% ## -REML = 394.96 Scale est. = 1 n = 949
Things to look for:
s(x, y)) significant at the highest level?
n=) match the number of rows in
As discussed in the lectures, the
plot output is not terribly useful for bivariate smooths like these. We’ll use
vis.gam() to visualise the smooth instead:
vis.gam(dsm_nb_xy, view=c("x","y"), plot.type="contour", too.far=0.1, main="s(x,y) (link scale)", asp=1)
view=c("x","y")to display the smooths for
y(we can choose any two variables in our model to display like this)
plot.type="contour"gives this “flat” plot, set
plot.type="persp"for a “perspective” plot, in 3D.
too.far=0.1argument displays the values of the smooth not “too far” from the data (try changing this value to see what happens).
asp=1ensures that the aspect ratio of the plot is 1, making the pixels square.
?vis.gammanual page for more information on the plotting options.
We can set the basis complexity via the
k argument to the
s() term in the formula. For example the following re-fits the above model with a much smaller basis complexity than before:
dsm_nb_xy_smallk <- dsm(count~s(x, y, k=10), ddf.obj=df_hn, segment.data = segs, observation.data=obs, family=nb())
Compare the output of
vis.gam() for this model to the model with a larger basis complexity.
So far we’ve just used
count as the response. That is, we adjusted the offset of the model to make it take into account the “effective area” of the segments (see lecture notes for a refresher).
Instead of using
count we could use
abundance.est, which will leave the segment areas as they are and calculate the Horvitz-Thompson estimates of the abundance per segment and use that as the response in the model. This is most useful when we have covariates in the detection function (though we can use it any time).
Try copying the code that fits the model
dsm_nb_xy and make a model
dsm_nb_xy_ae that replaces
abundance.est in the model formula and uses the
df_hr_ss_size detection function. Compare the results of
plot output between this and the count model.
Instead of fitting a bivariate smooth of
s(x, y), we could instead use the additive nature and fit the following model:
dsm_nb_x_y <- dsm(count~s(x)+ s(y), ddf.obj=df_hn, segment.data = segs, observation.data=obs, family=nb())
Compare this model with
vis.gam() (Note you can display two plots side-by-side using
par(mfrow=c(1,2))). Investigate the output from
summary() too, comparing with the other models, adjust
So far, we’ve used
nb() as the response – the negative binomial distribution. We can also try out the Tweedie distribution as a response by replacing
Try this out and compare the results.
It’ll be interesting to see how these models compare to the more complex models we’ll see later on. Let’s save the fitted models at this stage (add your own models to this list so you can use them later).
# add your models here save(dsm_nb_x_y, dsm_nb_xy, file="dsms-xy.RData")
If you have time, try the following:
kvalue very big (~100 or so), what do you notice?