David L Miller
dsm
(via Natalie Kelly, CSIRO. Seen in Moby Dick.)
Taking the previous example…
\[ n_j = A_j\hat{p}_j \exp\left[ \beta_0 + s(\text{y}_j) + s(\text{Depth}_j) \right] + \epsilon_j \]
where \( \epsilon_j \sim N(0, \sigma^2) \), \( \quad n_j\sim \) count distribution
Taking the previous example…
\[ n_j = \color{red}{A_j}\color{blue}{\hat{p}_j} \color{green}{\exp}\left[\color{grey}{ \beta_0 + s(\text{y}_j) + s(\text{Depth}_j)} \right] + \epsilon_j \]
where \( \epsilon_j \sim N(0, \sigma^2) \), \( \quad n_j\sim \) count distribution
\[
\color{red}{n_j} = A_j\hat{p}_j \exp\left[ \beta_0 + s(\text{y}_j) + s(\text{Depth}_j) \right] + \epsilon_j
\]
where \( \epsilon_j \sim N(0, \sigma^2) \), \( \quad \color{red}{n_j\sim \text{count distribution}} \)
\[
n_j = A_j\hat{p}_j \exp\left[ \beta_0 + \color{red}{s(\text{y}_j) + s(\text{Depth}_j}) \right] + \epsilon_j
\]
where \( \epsilon_j \sim N(0, \sigma^2) \), \( \quad n_j\sim \) count distribution
\[
n_j = A_j\hat{p}_j \exp\left[ \beta_0 + s(\text{y}_j) \right] + \epsilon_j
\]
where \( \epsilon_j \sim N(0, \sigma^2) \), \( \quad n_j\sim \) count distribution
formula=count ~ s(y)
family=nb()
or family=tw()
ddf.obj=df_hr
segment.data=segs, observation.data=obs
library(dsm)
dsm_x_tw <- dsm(count~s(x), ddf.obj=df_hr,
segment.data=segs, observation.data=obs,
family=tw(), method="REML")
(method="REML"
uses REML to select the smoothing parameter)
dsm
is based on mgcv
by Simon Wood
summary(dsm_x_tw)
Family: Tweedie(p=1.326)
Link function: log
Formula:
count ~ s(x) + offset(off.set)
Parametric coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -19.7008 0.2277 -86.52 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Approximate significance of smooth terms:
edf Ref.df F p-value
s(x) 4.962 6.047 6.403 1.07e-06 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
R-sq.(adj) = 0.0283 Deviance explained = 17.7%
-REML = 409.94 Scale est. = 6.0413 n = 949
plot(dsm_x_tw)
+
dsm_xy_tw <- dsm(count ~ s(x) + s(y),
ddf.obj=df_hr,
segment.data=segs, observation.data=obs,
family=tw(), method="REML")
summary(dsm_xy_tw)
Family: Tweedie(p=1.306)
Link function: log
Formula:
count ~ s(x) + s(y) + offset(off.set)
Parametric coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -19.9801 0.2381 -83.93 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Approximate significance of smooth terms:
edf Ref.df F p-value
s(x) 4.943 6.057 3.224 0.004239 **
s(y) 5.293 6.419 4.034 0.000322 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
R-sq.(adj) = 0.0678 Deviance explained = 27.3%
-REML = 399.84 Scale est. = 5.3157 n = 949
plot(dsm_xy_tw, scale=0, pages=1)
scale=0
: each plot on different scalepages=1
: plot togethers(x,y)
(and s(x,y,z,...)
)dsm_xyb_tw <- dsm(count ~ s(x, y),
ddf.obj=df_hr,
segment.data=segs, observation.data=obs,
family=tw(), method="REML")
summary(dsm_xyb_tw)
Family: Tweedie(p=1.29)
Link function: log
Formula:
count ~ s(x, y) + offset(off.set)
Parametric coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -20.1638 0.2477 -81.4 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Approximate significance of smooth terms:
edf Ref.df F p-value
s(x,y) 16.89 21.12 4.333 3.73e-10 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
R-sq.(adj) = 0.102 Deviance explained = 34.5%
-REML = 394.86 Scale est. = 4.8248 n = 949
vis.gam(dsm_xyb_tw, view=c("x","y"), plot.type="contour", too.far=0.1, asp=1)
too.far
excludes points far from data
“perhaps the most important part of applied statistical modelling”
Simon Wood
k
per terms(x, k=10)
or s(x, y, k=100)
k
)gam.check(dsm_x_tw)
Method: REML Optimizer: outer newton
full convergence after 7 iterations.
Gradient range [-3.08755e-06,4.928062e-07]
(score 409.936 & scale 6.041307).
Hessian positive definite, eigenvalue range [0.7645492,302.127].
Model rank = 10 / 10
Basis dimension (k) checking results. Low p-value (k-index<1) may
indicate that k is too low, especially if edf is close to k'.
k' edf k-index p-value
s(x) 9.000 4.962 0.763 0.44
dsm_x_tw_k <- dsm(count~s(x, k=20), ddf.obj=df_hr,
segment.data=segs, observation.data=obs,
family=tw(), method="REML")
gam.check(dsm_x_tw_k)
Method: REML Optimizer: outer newton
full convergence after 7 iterations.
Gradient range [-2.301246e-08,3.930757e-09]
(score 409.9245 & scale 6.033913).
Hessian positive definite, eigenvalue range [0.7678456,302.0336].
Model rank = 20 / 20
Basis dimension (k) checking results. Low p-value (k-index<1) may
indicate that k is too low, especially if edf is close to k'.
k' edf k-index p-value
s(x) 19.000 5.246 0.763 0.36
k
and see what happensp-value
” and “k-index
”k
can cause problems (nullspace)
gam.check
left side can be helpfulrqgam.check