Now that you know how to analyse distance sampling data in R, what should you do with the legacy data held inside your Distance for Windows projects? Answer: convert the projects using readdst.
The package is not on CRAN, but it can be installed from GitHub, as you did for several of the packages used in this workshop:
remotes::install_github("DistanceDevelopment/readdst")
One idiosyncrasy for Windows users is that RStudio must use the 32-bit version of R. Set that in the Tools | Global Options | General menu.
On Windows platforms, you will also need the RODBC package, which queries the Access-like database where Distance for Windows stores information.
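Before converting anything, it can help to confirm that the session meets these two requirements. The following is a minimal sketch using base R only; the variable names are illustrative, and the check reports the architecture of the running session rather than changing it.

```r
# Report whether the current R session is a 32-bit or 64-bit build.
# .Machine$sizeof.pointer is 4 bytes in a 32-bit build and 8 in a 64-bit build.
pointer_bytes <- .Machine$sizeof.pointer
session_bits <- pointer_bytes * 8
message("This R session is ", session_bits, "-bit")

# Check whether RODBC is installed, without attaching it.
has_rodbc <- requireNamespace("RODBC", quietly = TRUE)
if (.Platform$OS.type == "windows" && !has_rodbc) {
  message("RODBC is not installed; try install.packages(\"RODBC\")")
}
```

If the message reports a 64-bit session on Windows, switch the R version in the RStudio options as described above and restart.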
We demonstrate the syntax for the main functions within readdst by working with some of the datasets used during the workshop:
convert_project()
As the package name implies, the .dst file of a Distance for Windows project is the sole source of information needed by readdst. The function convert_project() does what it says on the tin; the only argument it requires is the path to the .dst file you wish to make available in R. I’ll demonstrate with the first of the projects mentioned above:
library(readdst)
wren.snap <- convert_project("P:\\distance.2019\\for-readdst\\Wren2\\D70Wren2")
## Loading required package: RODBC
## Warning in get_data(data_file): Data contains transects with repeated
## visits, 'Sample.Label's will not match Distance for Windows
There is a small complaint by convert_project() that we can ignore for the moment. The question is, what has been accomplished by this call to the function? Let’s investigate the created object:
class(wren.snap)
## [1] "converted_distance_analyses"
The class of the object tells us exactly what it contains. The object holds not only the distance sampling data, but also the analyses that may have been conducted and stored within the Distance for Windows project. For our purposes, we’ll not concern ourselves with the analyses, but concentrate on finding where the data are located so we can analyse them from within the R environment.
The data may actually live in several locations within the converted object. They are stored in one “master” location, and also alongside each completed analysis from the Distance for Windows project. Explore the structure of the object wren.snap:
str(wren.snap, max.level = 1)
## List of 1
## $ New Analysis:List of 14
## ..- attr(*, "class")= chr "converted_distance_analysis"
## - attr(*, "flatfile")='data.frame': 275 obs. of 10 variables:
## ..- attr(*, "unit_conversion")='data.frame': 2 obs. of 3 variables:
## - attr(*, "class")= chr "converted_distance_analyses"
In that output you’ll see the line - attr(*, "flatfile")='data.frame': 275 obs. of 10 variables:. Because it is a data frame with many rows, you might guess that this is the data, and you would be right. The challenge is how to access it; because it is stored as an attribute of the object, use attr():
head(attr(wren.snap, "flatfile"))
Area | species | visit | distance | object | visits | Study.Area | Region.Label | Sample.Label | Effort |
---|---|---|---|---|---|---|---|---|---|
33.2 | c | 1 | 75 | 1 | 2 | Montrave 2 | Montrave | 1-1 | 2 |
33.2 | w | 1 | 55 | 2 | 2 | Montrave 2 | Montrave | 1-1 | 2 |
33.2 | g | 2 | 100 | 3 | 2 | Montrave 2 | Montrave | 1-2 | 2 |
33.2 | r | 2 | 100 | 4 | 2 | Montrave 2 | Montrave | 1-2 | 2 |
33.2 | w | 2 | 65 | 5 | 2 | Montrave 2 | Montrave | 1-2 | 2 |
33.2 | c | 1 | 10 | 6 | 2 | Montrave 2 | Montrave | 2-1 | 2 |
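If you are unsure which attributes an object carries, you can list them before extracting one. The sketch below uses a small toy object built by hand to mimic the structure shown above; it is not produced by readdst, and its three rows are only stand-in values.

```r
# Toy stand-in for a converted project (hand-built, not real readdst output).
toy <- structure(
  list("New Analysis" = list(filter = "species=='w'")),
  flatfile = data.frame(species = c("c", "w", "g"),
                        distance = c(75, 55, 100)),
  class = "converted_distance_analyses"
)

# List the names of all attributes attached to the object.
attr_names <- names(attributes(toy))
print(attr_names)

# Extract the data frame stored in the "flatfile" attribute.
flat <- attr(toy, "flatfile")
print(flat)
```

The same two calls, names(attributes(...)) and attr(..., "flatfile"), should apply to wren.snap, since its str() output shows the same attribute.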
When looking at the structure of wren.snap, note that the object is actually a list with a single element, named New Analysis. Examine the structure of this list element:
str(wren.snap$`New Analysis`)
## List of 14
## $ call : chr "mrds::ddf(dsmodel=~cds(key=\"hn\", formula=~1, adj.series=\"cos\", adj.order=NULL), meta.data=list(width=125,le"| __truncated__
## $ aic.select : num 5
## $ status : int 0
## $ env :<environment: 0x0c45a940>
## $ filter : chr "species=='w'"
## $ group_size :List of 2
## ..$ Bias: chr "GXLOG"
## ..$ by : chr "All"
## $ detection_by : chr "All"
## $ gof_intervals: NULL
## $ estimation :List of 1
## ..$ by: chr "All"
## $ name : chr "New Analysis"
## $ ID : int 20
## $ engine : chr "CDS"
## $ project : chr "P:\\distance.2019\\for-readdst\\Wren2\\D70Wren2"
## $ project_file : chr "P:\\distance.2019\\for-readdst\\Wren2\\D70Wren2.dst"
## - attr(*, "class")= chr "converted_distance_analysis"
# equivalently
# str(wren.snap[["New Analysis"]])
# equivalently
# str(wren.snap[[1]])
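When a project holds several analyses, it can be convenient to tabulate a few fields from each list element before deciding which one to explore. Here is a minimal sketch on a hand-built toy list; the second analysis and its ID are hypothetical and are included only to show the pattern with more than one element.

```r
# Toy list mimicking two converted analyses (the second one is hypothetical).
toy <- list(
  "New Analysis" = list(name = "New Analysis", ID = 20L,
                        engine = "CDS", filter = "species=='w'"),
  "2" = list(name = "2", ID = 21L,
             engine = "CDS", filter = "species=='w'")
)

# Names of the analyses stored in the object.
analysis_names <- names(toy)

# One row per analysis, pulling the same fields out of each list element.
summary_table <- data.frame(
  name   = analysis_names,
  engine = vapply(toy, function(a) a$engine, character(1)),
  filter = vapply(toy, function(a) a$filter, character(1)),
  row.names = NULL
)
print(summary_table)
```

Using vapply() with character(1) makes the call fail loudly if an analysis lacks one of the fields, rather than silently returning a ragged result.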
Another data frame can be found in the environment of the New Analysis list element:
str(wren.snap[["New Analysis"]]$env$data)
## 'data.frame': 118 obs. of 8 variables:
## $ species : chr "w" "w" "w" "w" ...
## $ visit : int 1 2 1 1 2 1 1 1 2 2 ...
## $ distance : num 55 65 20 40 55 55 60 85 35 50 ...
## $ object : int 2 5 7 8 12 15 16 17 18 19 ...
## $ visits : int 2 2 2 2 2 2 2 2 2 2 ...
## $ Study.Area : chr "Montrave 2" "Montrave 2" "Montrave 2" "Montrave 2" ...
## $ Region.Label: chr "Montrave" "Montrave" "Montrave" "Montrave" ...
## $ Sample.Label: chr "1-1" "1-2" "2-1" "2-1" ...
Note, however, that this data frame is not identical to the one we discovered earlier:
dim(attr(wren.snap, "flatfile"))[1]
## [1] 275
dim(wren.snap[["New Analysis"]]$env$data)[1]
## [1] 118
The discrepancy arises because wren.snap[["New Analysis"]]$env$data is associated with a particular analysis in our project. That analysis was of one species (wrens) in a multi-species survey. If you look more closely at the wren.snap[["New Analysis"]] object, you will find the cause of the difference:
wren.snap[["New Analysis"]]$filter
## [1] "species=='w'"
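To see how such a filter accounts for the smaller row count, you can apply the filter string to the flat file yourself. This is a sketch only: it assumes the stored filter is a valid R expression over the flat file columns (as it appears to be here), and it is not necessarily how readdst applies the filter internally. The six rows are the species and distance values from the head() output above.

```r
# First six rows of the flat file shown earlier (species and distance only).
flat <- data.frame(
  species  = c("c", "w", "g", "r", "w", "c"),
  distance = c(75, 55, 100, 100, 65, 10),
  stringsAsFactors = FALSE
)

# The filter stored with the analysis, as a character string.
filter_text <- "species=='w'"

# Evaluate the filter expression using the data frame columns as variables.
keep <- eval(parse(text = filter_text), envir = flat)

# Keep only the rows that satisfy the filter.
wren_only <- flat[keep, ]
print(wren_only)
```

The two retained distances, 55 and 65, match the first two values of the distance column in the analysis data frame shown earlier. Note that the analysis data frame also has fewer columns than the flat file, which this row filter does not reproduce.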
wren.cue <- convert_project("P:\\distance.2019\\for-readdst\\Wren3\\D70Wren3")
## Warning in get_data(data_file): Data contains transects with repeated
## visits, 'Sample.Label's will not match Distance for Windows
head(wren.cue[["2"]]$data)
Cue.rate | Cue.rate.SE | Search.time | species | visit | distance | object | visits | Study.Area | Region.Label | Sample.Label |
---|---|---|---|---|---|---|---|---|---|---|
1.4558 | 0.2428 | 10 | w | 1 | 50 | 38 | 2 | montrave 3 | Montrave | 1-1 |
1.4558 | 0.2428 | 10 | w | 1 | 55 | 39 | 2 | montrave 3 | Montrave | 1-1 |
1.4558 | 0.2428 | 10 | w | 1 | 55 | 40 | 2 | montrave 3 | Montrave | 1-1 |
1.4558 | 0.2428 | 10 | w | 1 | 55 | 41 | 2 | montrave 3 | Montrave | 1-1 |
1.4558 | 0.2428 | 10 | w | 2 | 50 | 46 | 2 | montrave 3 | Montrave | 1-2 |
1.4558 | 0.2428 | 10 | w | 2 | 50 | 47 | 2 | montrave 3 | Montrave | 1-2 |
wren.lt <- convert_project("P:\\distance.2019\\for-readdst\\Wren4\\D70Wren4")
## Warning in get_data(data_file): Data contains transects with repeated
## visits, 'Sample.Label's will not match Distance for Windows
hist(wren.lt[["New Analysis"]]$env$data$distance, main="Wren line transect")
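The next project has quite a verbose analysis name in Distance for Windows. Because the name contains spaces and punctuation, it must be wrapped in backticks when used with $: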
pellets <- convert_project("P:\\distance.2019\\for-readdst\\Deer pellets solution\\D70new full sika")
hist(pellets$`HNcos mult 10% trunc stratified encounter rate (wght effort)`$env$data$distance,
main="Sika pellets")
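Long analysis names are tedious to type and easy to mistype. As an alternative, the element can be selected by position or by matching part of its name. A minimal sketch on a hand-built toy list follows; the list contents are placeholders, and only the name is taken from the project above.

```r
# Toy list with one element carrying the long analysis name (placeholder contents).
toy <- list(
  "HNcos mult 10% trunc stratified encounter rate (wght effort)" =
    list(note = "placeholder")
)

# Option 1: select by position.
by_position <- toy[[1]]

# Option 2: copy the name from names() rather than typing it.
long_name <- names(toy)[1]
by_name <- toy[[long_name]]

# Option 3: match the start of the name with a regular expression.
hit <- grep("^HNcos", names(toy))
by_pattern <- toy[[hit]]
```

Selecting by pattern assumes exactly one name matches; if several analyses share the prefix, grep() returns several indices and [[ will not select a single element.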
On the off chance you might wish to re-run an analysis you conducted in Distance for Windows, there is another function in readdst named run_analysis(). We demonstrate with the amakihi data:
amakihi <- convert_project("P:\\distance.2019\\for-readdst\\fTAMAUK07\\D70fTAMAUK07")
hist(amakihi$`e5 - HR by strat w82.5`$env$data$distance)
tmp <- run_analysis(amakihi$`e5 - HR by strat w82.5`)
plot(tmp[[4]], pdf=TRUE, main="July 1993 survey")
Don’t expect the results of detection function fitting in R to exactly match the detection functions fitted by Distance for Windows. The reasons for this are interesting only to the pathologically statistical; suffice it to say, there are different pieces of software doing the parameter estimation and there can be disagreements about the best fit, particularly for complex models.
You can download a PDF poster describing the workings of readdst in more detail from our GitHub site.