# Preparing survey data for spatial analysis

### \textit{Practical 2, Intermediate Distance Sampling workshop, CREEM, 2018}

## Aims

By the end of this practical, you should feel comfortable:

- Loading data from a geodatabase file into R
- Removing and renaming columns in a `data.frame`
- Saving data to an `RData` file

Note that we can (and should) re-run this file whenever we update the `Analysis.gdb` file, to ensure that the data R uses has all of the covariates we want to use in our analysis.

## Preamble

Load some useful packages:

```{r load-packages, message=FALSE}
library(rgdal)
library(knitr)
```

## Load and arrange data

To fit our spatial models we require three objects:

1. A detection function.
2. The segment data (sometimes called effort data). This tells us how much effort was expended per segment (in this case how far the boat went) and includes the covariates that we want to use to fit our model.
3. The observation table. This links the observations in the detection function object to the segments.

In R we can use the `rgdal` package to access the geodatabase files generated by ArcGIS. R can also access shapefiles and rasters, and there are interfaces to other GIS packages.

It can be useful in general to see which "layers" are available in the geodatabase. For that we can use the `ogrListLayers()` function:

```{r list-layers}
ogrListLayers("Analysis.gdb")
```

### Segment data

For our analysis the segment data is located in the "Segment_Centroids" table in the geodatabase. We can import that into R using the `readOGR()` function:

```{r segs-data-load}
segs <- readOGR("Analysis.gdb", layer="Segment_Centroids")
```

To verify we have the right data we can plot it. This will give the locations of each segment:

```{r segs-data-plot, fig.width=4, fig.height=4, fig.cap="Segment centroid locations for sperm whale dataset."}
plot(segs)
```

A further check would be to use `head()` to check that the structure of the data is correct.
In particular, it's worth checking that the column names are correct and that the number of rows in the data set is correct (`dim()` will give the number of rows and columns).

It can also be useful to check that the columns are the correct data types. Calling `str(segs@data)` (or `str()` on the `@data` slot of any object loaded using `readOGR()`) will reveal the data type of each column. In this case we can see that the `CenterTime` column has been interpreted as a `factor` variable rather than as a date/time. We're not going to use it in our analysis, so we don't need to worry for now, but `str()` can reveal potential problems with loaded data.

For a deeper look at the values in the data, `summary()` will give summary statistics for each of the covariates as well as the projection and range of location values (lat/long or, in our case, `x` and `y`). We can compare these with the values in ArcGIS.

We can turn the object into a `data.frame` (which is easier to work with in R) and then check that it looks like it's in the right format using `head()`:

```{r segs-data-df}
segs <- as.data.frame(segs)
head(segs)
```

As with the distance data, we need to give the columns of the data particular names for them to work with `dsm`:

```{r rename-segs-cols}
segs$x <- segs$POINT_X
segs$y <- segs$POINT_Y
segs$Effort <- segs$Length
segs$Sample.Label <- segs$SegmentID
```

### Observation data

The observation data is exactly what we used to fit our detection function in the previous exercise (though this is not necessarily always true).

```{r obs-data-load}
obs <- readOGR("Analysis.gdb", layer="Sightings")
```

Again we can use a plot to see whether the data looks okay.
This time we only have the locations of the observations:

```{r obs-data-plot, fig.height=4, fig.width=4, fig.cap="Sighting locations for sperm whale dataset."}
plot(obs)
```

Again, we convert the object to a `data.frame` and check its format using `head()`:

```{r obs-data-df}
obs <- as.data.frame(obs)
head(obs)
```

Finally, we need to rename some of the columns:

```{r rename-obs-cols}
obs$distance <- obs$Distance
obs$object <- obs$SightingID
obs$Sample.Label <- obs$SegmentID
obs$size <- obs$GroupSize
```

## Save the data

We can now save the `data.frame`s that we've created into an `RData` file so we can use them later.

```{r save-models}
save(segs, obs, file="sperm-data.RData")
```
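
Before moving on, it can be worth confirming that a saved file reloads cleanly, that the renamed columns are present, and that every observation points at a segment that exists. The chunk below is only a minimal sketch of such a check. To stay self-contained it uses two small made-up tables (`segs_toy` and `obs_toy`, hypothetical names and values) and a temporary file rather than the real `sperm-data.RData`; the same comparisons could be applied to `segs` and `obs`.

```{r check-saved-data}
# Toy stand-ins for the real tables (hypothetical values, illustration only)
segs_toy <- data.frame(Sample.Label = c("s1", "s2", "s3"),
                       Effort = c(1000, 1200, 900),
                       x = c(0, 1, 2),
                       y = c(0, 1, 2))
obs_toy <- data.frame(object = 1:3,
                      Sample.Label = c("s1", "s3", "s3"),
                      distance = c(120, 450, 30),
                      size = c(1, 2, 1))

# Save to a temporary file, then load it into a fresh environment so that
# nothing already in the workspace is overwritten
tmp <- tempfile(fileext = ".RData")
save(segs_toy, obs_toy, file = tmp)
chk <- new.env()
loaded <- load(tmp, envir = chk)  # load() returns the names of the objects it restored

# Column names created in the renaming steps of this practical
seg_cols <- c("x", "y", "Effort", "Sample.Label")
obs_cols <- c("distance", "object", "Sample.Label", "size")

# Any names listed here are missing from the reloaded tables
missing_seg_cols <- setdiff(seg_cols, names(chk$segs_toy))
missing_obs_cols <- setdiff(obs_cols, names(chk$obs_toy))

# Observations whose segment label does not appear in the segment table
unmatched <- setdiff(chk$obs_toy$Sample.Label, chk$segs_toy$Sample.Label)
```

If `missing_seg_cols`, `missing_obs_cols` and `unmatched` are all empty, the tables are at least consistent with each other. A non-empty `unmatched` would suggest that some sightings refer to segments that were dropped or renamed.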