Introductory distance sampling workshop

CREEM, Univ of St Andrews

Demonstration of the readdst package

2019-08-26

Preface

Now that you know how to analyse distance sampling data in R, what should you do with the legacy data stored inside your Distance for Windows projects?

Answer: convert the projects using readdst.

Package installation

The package is not on CRAN, but it can be installed from GitHub, as you did for several of the packages used in this workshop:

remotes::install_github("DistanceDevelopment/readdst")

One idiosyncrasy for Windows users is that RStudio must use the 32-bit version of R. Set that in the Tools | Global Options | General menu.

Set RStudio to use the 32-bit version of R.

On Windows platforms, you will also need the RODBC package, which queries the Access-like database where Distance for Windows stores its information.
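Before converting a project, it can help to confirm that the current session meets the Windows requirements described above. The following is a minimal sketch using base R only; the variable names are illustrative, and the messages are suggestions of ours rather than anything readdst prints.

```r
# Gather the facts about this R session that matter for readdst on Windows.
session_info <- list(
  os_type   = .Platform$OS.type,        # "windows" or "unix"
  pointer   = .Machine$sizeof.pointer,  # 4 bytes in 32-bit R, 8 bytes in 64-bit R
  has_rodbc = requireNamespace("RODBC", quietly = TRUE)
)

is_windows <- session_info$os_type == "windows"
is_32bit   <- session_info$pointer == 4

# Only Windows users need to act on these checks.
if (is_windows && !is_32bit) {
  message("This is 64-bit R: switch RStudio to the 32-bit version of R.")
}
if (is_windows && !session_info$has_rodbc) {
  message("RODBC is not installed: install.packages(\"RODBC\")")
}
```

On other platforms both messages stay silent, since the 32-bit and RODBC requirements are described here for Windows only.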

Package use

We demonstrate the syntax for the main functions within readdst by working with some of the datasets used during the workshop.

The essential function convert_project()

As the package name implies, the .dst file of a Distance for Windows project is the sole source of information needed by readdst. The function convert_project() does what it says on the tin: its single required argument is the path to the .dst file you wish to make available in R. In the call below, the path is given without the .dst extension. I’ll demonstrate with the first of the projects mentioned above:

library(readdst)
wren.snap <- convert_project("P:\\distance.2019\\for-readdst\\Wren2\\D70Wren2")
## Loading required package: RODBC
## Warning in get_data(data_file): Data contains transects with repeated
## visits, 'Sample.Label's will not match Distance for Windows
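A mistyped path is an easy way for the conversion to fail, so it can be worth checking that the .dst file exists first. The sketch below is one way to do that; dst_file_for() is a hypothetical helper of ours (not part of readdst), and the project is an empty stand-in file created in a temporary directory, so the convert_project() call is left commented out.

```r
# Hypothetical helper: return the .dst file name that belongs to a project path,
# appending the extension only when the caller has not already supplied it.
dst_file_for <- function(project) {
  if (grepl("\\.dst$", project, ignore.case = TRUE)) {
    project
  } else {
    paste0(project, ".dst")
  }
}

# Stand-in project: an empty file in a temporary directory (not a real project).
demo_dir <- file.path(tempdir(), "Wren2")
dir.create(demo_dir, showWarnings = FALSE)
demo_project <- file.path(demo_dir, "D70Wren2")  # file.path() avoids escaped backslashes
file.create(dst_file_for(demo_project))

# Check that the file is where we think it is before attempting the conversion.
project_ok <- file.exists(dst_file_for(demo_project))
# if (project_ok) wren.snap <- readdst::convert_project(demo_project)
```

With a real project, replace demo_project with the path to your own .dst file and uncomment the final line.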

There is a small complaint by convert_project() that we can ignore for the moment. The question is, what has been accomplished by this call to the function? Let’s investigate the created object:

class(wren.snap)
## [1] "converted_distance_analyses"

The class of the object tells us exactly what it contains. Not only does the object contain distance sampling data, but also the analyses that may have been conducted and stored within the Distance for Windows project. For our purposes, we’ll not concern ourselves with the analyses, but concentrate upon finding where the data are located so we can conduct analyses of them from within the R environment.

Data in a converted project object

Data actually may live in a number of locations within the created project. Data are stored in one “master” location, but are also stored along with each completed analysis in the Distance for Windows project. Explore the structure of the object wren.snap:

str(wren.snap, max.level = 1)
## List of 1
##  $ New Analysis:List of 14
##   ..- attr(*, "class")= chr "converted_distance_analysis"
##  - attr(*, "flatfile")='data.frame': 275 obs. of  10 variables:
##   ..- attr(*, "unit_conversion")='data.frame':   2 obs. of  3 variables:
##  - attr(*, "class")= chr "converted_distance_analyses"

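Note the line - attr(*, "flatfile")='data.frame': 275 obs. of 10 variables: in this output. Because it is a data frame with lots of rows, you might guess this is the data, and you would be right. The challenge is how to access it: the data frame is stored as an attribute of the object rather than as a list element, so it is retrieved with attr().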

head(attr(wren.snap, "flatfile"))
Area  species  visit  distance  object  visits  Study.Area  Region.Label  Sample.Label  Effort
33.2  c        1      75        1       2       Montrave 2  Montrave      1-1           2
33.2  w        1      55        2       2       Montrave 2  Montrave      1-1           2
33.2  g        2      100       3       2       Montrave 2  Montrave      1-2           2
33.2  r        2      100       4       2       Montrave 2  Montrave      1-2           2
33.2  w        2      65        5       2       Montrave 2  Montrave      1-2           2
33.2  c        1      10        6       2       Montrave 2  Montrave      2-1           2
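The difference between an attribute and a list element is easy to trip over, so here is a small sketch that needs neither readdst nor a project file. The object toy is a hand-built stand-in that only mimics the structure shown by str() above; its three columns are copied from the first six rows of the real flatfile.

```r
# Toy data frame: three columns from the first six rows of the wren flatfile.
flat <- data.frame(
  species      = c("c", "w", "g", "r", "w", "c"),
  distance     = c(75, 55, 100, 100, 65, 10),
  Sample.Label = c("1-1", "1-1", "1-2", "1-2", "1-2", "2-1"),
  stringsAsFactors = FALSE
)

# Stand-in for a converted project: a list of analyses with the survey data
# attached as the "flatfile" attribute (structure assumed from the str() output).
toy <- structure(
  list(`New Analysis` = list(name = "New Analysis")),
  flatfile = flat,
  class = "converted_distance_analyses"
)

attribute_names <- names(attributes(toy))  # includes "flatfile"
survey_data     <- attr(toy, "flatfile")   # the data frame itself
wrong_way       <- toy$flatfile            # NULL: $ looks at list elements only
```

The last line shows why toy$flatfile is a dead end: the $ operator searches the list elements (here just New Analysis), not the attributes.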

Data reside elsewhere in the converted project object

When looking at the structure of wren.snap, note that the object is actually a list with a single element, named New Analysis. Examine the structure of this list element:

str(wren.snap$`New Analysis`)
## List of 14
##  $ call         : chr "mrds::ddf(dsmodel=~cds(key=\"hn\", formula=~1, adj.series=\"cos\", adj.order=NULL), meta.data=list(width=125,le"| __truncated__
##  $ aic.select   : num 5
##  $ status       : int 0
##  $ env          :<environment: 0x0c45a940> 
##  $ filter       : chr "species=='w'"
##  $ group_size   :List of 2
##   ..$ Bias: chr "GXLOG"
##   ..$ by  : chr "All"
##  $ detection_by : chr "All"
##  $ gof_intervals: NULL
##  $ estimation   :List of 1
##   ..$ by: chr "All"
##  $ name         : chr "New Analysis"
##  $ ID           : int 20
##  $ engine       : chr "CDS"
##  $ project      : chr "P:\\distance.2019\\for-readdst\\Wren2\\D70Wren2"
##  $ project_file : chr "P:\\distance.2019\\for-readdst\\Wren2\\D70Wren2.dst"
##  - attr(*, "class")= chr "converted_distance_analysis"
# equivalently
# str(wren.snap[["New Analysis"]])
# equivalently
# str(wren.snap[[1]])
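The call element is stored as a character string, so it can be parsed and inspected without fitting anything and without mrds being installed. The sketch below uses a shortened, made-up call string for illustration; it is not the full call stored in the wren project, which is truncated in the output above.

```r
# Illustrative call string (hypothetical; not the project's actual stored call).
call_string <- paste0(
  "mrds::ddf(dsmodel = ~cds(key = \"hn\", formula = ~1), ",
  "meta.data = list(width = 125), data = obs)"
)

# parse() turns the text into an unevaluated call; nothing is run at this point.
parsed <- parse(text = call_string)[[1]]

fitting_function <- deparse(parsed[[1]])       # which function would be called
argument_names   <- names(as.list(parsed))[-1] # drop the unnamed function slot
model_formula    <- deparse(parsed$dsmodel)    # detection function specification

# meta.data is a plain list() call, so it is safe to evaluate on its own.
truncation_width <- eval(parsed$meta.data)$width
```

Pulling the pieces apart this way is a quick check of which key function and truncation distance an analysis used before deciding whether to re-fit it in R.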

Another data frame can be found in the environment of the New Analysis list element:

str(wren.snap[["New Analysis"]]$env$data)
## 'data.frame':    118 obs. of  8 variables:
##  $ species     : chr  "w" "w" "w" "w" ...
##  $ visit       : int  1 2 1 1 2 1 1 1 2 2 ...
##  $ distance    : num  55 65 20 40 55 55 60 85 35 50 ...
##  $ object      : int  2 5 7 8 12 15 16 17 18 19 ...
##  $ visits      : int  2 2 2 2 2 2 2 2 2 2 ...
##  $ Study.Area  : chr  "Montrave 2" "Montrave 2" "Montrave 2" "Montrave 2" ...
##  $ Region.Label: chr  "Montrave" "Montrave" "Montrave" "Montrave" ...
##  $ Sample.Label: chr  "1-1" "1-2" "2-1" "2-1" ...

Note, however, that this data frame is not identical to the data frame we discovered earlier:

dim(attr(wren.snap, "flatfile"))[1]
## [1] 275
dim(wren.snap[["New Analysis"]]$env$data)[1]
## [1] 118

The discrepancy arises because wren.snap[["New Analysis"]]$env$data is associated with a particular analysis in our project. That analysis was of one species (wrens) in a multi-species survey. If you look harder at the wren.snap[["New Analysis"]] object, you will find the cause of the difference:

wren.snap[["New Analysis"]]$filter
## [1] "species=='w'"
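To see how a filter string like this accounts for the smaller data frame, it can be applied by hand to the full data. The sketch below does so on a toy data frame built from the first six rows of the flatfile shown earlier; it illustrates the idea only and is not necessarily how readdst applies the filter internally.

```r
# Toy data: selected columns from the first six rows of the wren flatfile.
flat <- data.frame(
  species  = c("c", "w", "g", "r", "w", "c"),
  distance = c(75, 55, 100, 100, 65, 10),
  object   = 1:6,
  stringsAsFactors = FALSE
)

# The filter is stored as text, in the form seen in the converted analysis.
analysis_filter <- "species=='w'"

# Evaluate the text as an R expression, using the data frame's columns
# as the variables; the result is one TRUE/FALSE per row.
keep  <- eval(parse(text = analysis_filter), envir = flat)
wrens <- flat[keep, , drop = FALSE]

nrow(wrens)      # number of wren detections among the six toy rows
wrens$distance   # their distances
```

The two rows that survive (objects 2 and 5, at distances 55 and 65) are the first two rows of the per-analysis data frame shown above. Applied to all 275 rows, the same filter leaves only the wren detections.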

Examples of other projects

Cue counts for wrens

wren.cue <- convert_project("P:\\distance.2019\\for-readdst\\Wren3\\D70Wren3")
## Warning in get_data(data_file): Data contains transects with repeated
## visits, 'Sample.Label's will not match Distance for Windows
head(wren.cue[["2"]]$env$data)
Cue.rate  Cue.rate.SE  Search.time  species  visit  distance  object  visits  Study.Area  Region.Label  Sample.Label
1.4558    0.2428       10           w        1      50        38      2       montrave 3  Montrave      1-1
1.4558    0.2428       10           w        1      55        39      2       montrave 3  Montrave      1-1
1.4558    0.2428       10           w        1      55        40      2       montrave 3  Montrave      1-1
1.4558    0.2428       10           w        1      55        41      2       montrave 3  Montrave      1-1
1.4558    0.2428       10           w        2      50        46      2       montrave 3  Montrave      1-2
1.4558    0.2428       10           w        2      50        47      2       montrave 3  Montrave      1-2

Line transects for wrens

wren.lt <- convert_project("P:\\distance.2019\\for-readdst\\Wren4\\D70Wren4")
## Warning in get_data(data_file): Data contains transects with repeated
## visits, 'Sample.Label's will not match Distance for Windows

hist(wren.lt[["New Analysis"]]$env$data$distance, main="Wren line transect")

Sika deer pellets

The analysis in this Distance for Windows project has quite a verbose name, so it must be wrapped in backticks when used with $.

pellets <- convert_project("P:\\distance.2019\\for-readdst\\Deer pellets solution\\D70new full sika")
hist(pellets$`HNcos mult 10% trunc stratified encounter rate (wght effort)`$env$data$distance,
     main="Sika pellets")
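Typing long analysis names is error-prone, so it may be easier to look them up with names() and index by name or by position. The sketch below uses a hand-built stand-in for a converted project; make_analysis() is a hypothetical helper of ours, and the analysis names and distances are invented for illustration.

```r
# Hypothetical helper: a minimal analysis with its data held in an environment,
# mirroring the $env$data layout seen in the converted projects.
make_analysis <- function(name, distances) {
  env <- new.env()
  env$data <- data.frame(distance = distances)
  list(name = name, env = env)
}

# Stand-in for a converted project with two analyses (made-up values).
toy <- list(
  `HNcos mult 10% trunc` = make_analysis("HNcos mult 10% trunc", c(1.2, 0.4, 2.5)),
  `New Analysis`         = make_analysis("New Analysis", c(0.3, 0.9))
)

analysis_names <- names(toy)  # discover the names rather than typing them

# Three equivalent routes to the same distances.
by_backticks <- toy$`HNcos mult 10% trunc`$env$data$distance
by_name      <- toy[[analysis_names[1]]]$env$data$distance
by_position  <- toy[[1]]$env$data$distance

# Number of observations held by each analysis.
n_obs <- vapply(toy, function(a) nrow(a$env$data), integer(1))
```

The final line generalises to any converted project with several analyses, giving a quick overview of how many observations each one used.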

Carrying out an analysis in R

On the off chance you might wish to re-run an analysis you conducted in Distance for Windows, there is another function in readdst named run_analysis(). We demonstrate with the amakihi data:

amakihi <- convert_project("P:\\distance.2019\\for-readdst\\fTAMAUK07\\D70fTAMAUK07")
hist(amakihi$`e5 - HR by strat w82.5`$env$data$distance)

tmp <- run_analysis(amakihi$`e5 - HR by strat w82.5`)
plot(tmp[[4]], pdf=TRUE, main="July 1993 survey")

About the analyses stored in our object

Don’t expect the results of detection function fitting in R to exactly match the detection functions fitted by Distance for Windows. The reasons for this are interesting only to the pathologically statistical; suffice it to say, there are different pieces of software doing the parameter estimation and there can be disagreements about the best fit, particularly for complex models.

Conclusion

More details

You can download a PDF poster describing the workings of readdst in more detail from our GitHub site.