Aims

By the end of this practical, you should feel comfortable:

Loading data from a geodatabase file into R
Removing and renaming columns in a data.frame
Saving data to an RData file

Note we can (and should) re-run this file when we update the Analysis.gdb file to ensure that the data R uses has all of the covariates we want to use in our analysis.

Preamble

Load some useful packages:

library(rgdal)
library(knitr)

Load and arrange data

To fit our spatial models we require three objects:

The detection function we fitted previously.
The segment data (sometimes called effort data). This tells us how much effort was expended per segment (in this case how far the boat went) and includes the covariates that we want to use to fit our model.
The observation table. This links the observations in the detection function object to the segments.

In R we can use the rgdal package to access the geodatabase files generated by ArcGIS (R can also access shapefiles and rasters).

It can be useful in general to see which “layers” are available in the geodatabase, for that we can use the ogrListLayers() function:

ogrListLayers("Analysis.gdb")

## [1] "Study_Area"        "US_Atlantic_EEZ"   "Sightings"        
## [4] "Tracklines"        "Segments"          "Segment_Centroids"
## attr(,"driver")
## [1] "OpenFileGDB"
## attr(,"nlayers")
## [1] 6

Segment data

For our analysis the segment data is located in in the “Segment_Centroids” table in the geodatabase. We can import that into R using the readOGR() function:

segs <- readOGR("Analysis.gdb", layer="Segment_Centroids")

## OGR data source with driver: OpenFileGDB 
## Source: "Analysis.gdb", layer: "Segment_Centroids"
## with 949 features
## It has 5 fields

To verify we have the right data we can plot it. This will give the locations of each segment:

plot(segs)

A further check would be to use head() to check that the structure of the data is correct. In particular it’s worth checking that the column names are correct and that the number of rows in the data set are correct (dim() will give the number of rows and columns).

It can also be useful to check that the columns are the correct data types. Calling str(segs@data) (or any object loaded using readOGR appended with @data) will reveal the data types of each column. In this case we can see that the CenterTime column has been interpreted as a factor variable rather than as a date/time. We’re not going to use it in our analysis, so we don’t need to worry for now but str() can reveal potential problems with loaded data.

For a deeper look at the values in the data, summary() will give summary statistics for each of the covariates as well as the projection and range of location values (lat/long or in our case x and y). We can compare these with values in ArcGIS.

We can turn the object into a data.frame (so R can better understand it) and then check that it looks like it’s in the right format using head():

segs <- as.data.frame(segs)
head(segs)

##            CenterTime SegmentID   Length  POINT_X  POINT_Y coords.x1
## 1 2004/06/24 07:27:04         1 10288.91 214544.0 689074.3  214544.0
## 2 2004/06/24 08:08:04         2 10288.91 222654.3 682781.0  222654.3
## 3 2004/06/24 09:03:18         3 10288.91 230279.9 675473.3  230279.9
## 4 2004/06/24 09:51:27         4 10288.91 239328.9 666646.3  239328.9
## 5 2004/06/24 10:25:39         5 10288.91 246686.5 659459.2  246686.5
## 6 2004/06/24 11:00:22         6 10288.91 254307.0 652547.2  254307.0
##   coords.x2
## 1  689074.3
## 2  682781.0
## 3  675473.3
## 4  666646.3
## 5  659459.2
## 6  652547.2

As with the distance data, we need to give the columns of the data particular names for them to work with dsm:

segs$x <- segs$POINT_X
segs$y <- segs$POINT_Y
segs$Effort <- segs$Length
segs$Sample.Label <- segs$SegmentID

Observation data

The observation data is exactly what we used to fit out detection function in the previous exercise (though this is not necessarily always true).

obs <- readOGR("Analysis.gdb", layer="Sightings")

## OGR data source with driver: OpenFileGDB 
## Source: "Analysis.gdb", layer: "Sightings"
## with 137 features
## It has 7 fields

Again we can use a plot to see whether the data looks okay. This time we only have the locations of the observations:

plot(obs)

Again, converting the object to be a data.frame and checking it’s format using head():

obs <- as.data.frame(obs)
head(obs)

##    Survey GroupSize SeaState  Distance        SightingTime SightingID
## 1 en04395         2      3.0  246.0173 2004/06/28 10:22:21          1
## 2 en04395         2      2.5 1632.3934 2004/06/28 13:18:14          2
## 3 en04395         1      3.0 2368.9941 2004/06/28 14:13:34          3
## 4 en04395         1      3.5  244.6977 2004/06/28 15:06:01          4
## 5 en04395         1      4.0 2081.3468 2004/06/29 10:48:31          5
## 6 en04395         1      2.4 1149.2632 2004/06/29 14:35:34          6
##   SegmentID coords.x1 coords.x2
## 1        48   -65.636    39.576
## 2        50   -65.648    39.746
## 3        51   -65.692    39.843
## 4        52   -65.717    39.967
## 5        56   -65.820    40.279
## 6        59   -65.938    40.612

Finally, we need to rename some of the columns:

obs$distance <- obs$Distance
obs$object <- obs$SightingID
obs$Sample.Label <- obs$SegmentID
obs$size <- obs$GroupSize

Save the data

We can now save the data.frames that we’ve created into an RData file so we can use them later.

save(segs, obs, file="sperm-data.RData")