Go through this document one chunk at a time to insert new soil moisture measurements data into the database.
First, load the necessary libraries
library(RPostgreSQL)
library(readr)
library(dplyr)
library(tidyr)
library(uuid)
library(magrittr)
library(kableExtra)
library(DT)
Then connect to the database
password <- scan(".pgpass", what="")
db <- dbConnect(PostgreSQL(),
host = "sesync-postgis01.research.sesync.org",
dbname = "choptank",
user = "palmergroup",
password = password)
Load up all of the functions defined in scripts in the R/
folder at once.
pathnames <- list.files(pattern="[.]R$", path="R", full.names=TRUE)
sapply(pathnames, FUN=source)
Upload the spreadsheet with your new soil moisture data. Either use a file browser:
new_measurements <- file.choose()
Or supply the file path:
new_measurements <- read_csv("../odm2-samples/data/Field_measurements - SOIL_field_measurements.csv")
The new_measurements
data frame should look something like this:
new_measurements %>% head() %>% kable(format = "markdown")
Date | Time | Site | Location | VWC_percent | soil_EC_mscm | soil_temp_C |
---|---|---|---|---|---|---|
November 28, 2017 | 08:52:00 | QB | SC B | 48.2 | 0.28 | 8.3 |
November 28, 2017 | 08:54:00 | QB | SC C | 42.0 | 0.27 | 8.1 |
November 28, 2017 | 08:57:00 | QB | SC D | 28.3 | 0.19 | 8.0 |
November 28, 2017 | 09:01:00 | QB | SC E | 12.8 | 0.16 | 7.8 |
November 28, 2017 | 10:14:00 | DK | SC A | 68.3 | 0.22 | 10.0 |
November 28, 2017 | 10:20:00 | DK | SC B | 57.9 | 0.23 | 10.1 |
Make sure the column names for data match the template. This chunk should return TRUE
.
all(c("VWC_percent", "soil_EC_mscm", "soil_temp_C") %in% names(new_measurements))
## [1] TRUE
Make a new column with site code names called samplingfeaturecode if it isn’t there already. If you have the wetland site code and the transect point in 2 separate columns, you can combine them using paste.
new_measurements$samplingfeaturecode <- paste(new_measurements$Site, new_measurements$Location)
Make sure all the sites in the new data are in the database by reading in the samplingfeatures table and checking all the site code names in the new data. The function check_samplingfeaturecodes (defined in helpers.R
) returns the names of any sites not in the database compared to a given vector.
check_samplingfeaturecodes(new_codes = new_measurements$samplingfeaturecode) %>% head()
## new_codes in_db
## 1 BB SC D FALSE
## 2 BB SC E FALSE
## 3 BB SC E FALSE
## 4 BB SC E FALSE
## 5 BB SC E FALSE
## 6 BB SC E FALSE
If you need help figuring out the exact site names, look for site names in the database with a given pattern using the function get_site_names_like()
get_site_names_like("QB SC")
## [1] "DB SC-E" "DB SC-D" "DB SC-C" "DB SC-B" "DB SC-A" "TB SC-A" "TB SC-B"
## [8] "TB SC-C" "TB SC-D" "TB SC-E" "BB SC-E" "BB SC-D" "BB SC-B" "BB SC-A"
## [15] "QB SC-E" "QB SC-D" "QB SC-C" "QB SC-B" "QB SC-A"
Fix site code names if needed. For example, soil chamber sites might need hyphens added.
new_measurements$samplingfeaturecode <- gsub(pattern = "SC ", "SC-", new_measurements$samplingfeaturecode)
Check site names again if needed. If there are no rows returned then all the site names match sampling feature codes in the database.
check_samplingfeaturecodes(new_measurements$samplingfeaturecode)
## [1] new_codes in_db
## <0 rows> (or 0-length row.names)
There needs to be a defined method in the methods table about how samples were collected. Make sure that sample collection method exists by checking the methods table online or reading it directly from the database using the dbReadTable()
function. Note the methodcode or make a new one with the db_add_method()
function.
dbReadTable(db, c("odm2", "methods")) %>% kable()
methodid | methodtypecv | methodcode | methodname | methoddescription | methodlink | organizationid |
---|---|---|---|---|---|---|
1 | Specimen collection | waterisotopefield | water isotope sample collection | Collect water sample without headspace for water isotopes | NA | NA |
3 | Specimen analysis | waterisotopelab | Picarro L1102-i | D-18O and D-D ratios measured using a cavity ringdown spectroscopic liquid water laser isotope analyzer (Picarro L1102-i) in the Virginia Tech Water Quality Lab housed within the Biological Systems Engineering Department. | NA | NA |
5 | Instrument deployment | soilmoistureTDR | Soil moisture measurement | Point measurement of soil moisture using TDR probe with 1.5 inch probes | NA | NA |
# db_add_method(methodname = "Soil moisture measurement",
# methodcode = "soilmoistureTDR",
# methodtypecv = "Instrument deployment",
# methoddescription = "Point measurement of soil moisture using TDR probe with 1.5 inch probes")
Save the methodcode as a variable
methodcode = "soilmoistureTDR"
Actions need a correctly formatted date and time. Format the date and time. If no time is given, assume midnight (to know that it isn’t a specific time). Also define the UTC offset variable (assuming it is the same as your computer’s timezone). Use ?strptime
to find the correct character string for the input data or refer to strtime.org
new_measurements$datetime <- strptime(
paste(new_measurements$Date, new_measurements$Time),
format = "%B %d, %Y %H:%M:%S", tz = "")
Define UTC offset as a variable. This will be either -4 or -5 depending on whether its daylight savings time or not.
utcoffset <- format(Sys.time(), "%z") %>% substr(1, 3) %>% as.integer()
Coming soon!
Add an annotation to an action or result
# dbReadTable(db, c("odm2", "cv_annotationtype"))
# db_add_annotation(annotationtypecv = "Action annotation",
# annotationtext = "measured less than 24 hours after rain")
Specify some necessary metadata parameters. You will likely not need to change these. Kelly should probably add these as default function arguments so its not cluttering up this document as much.
actiontypecv = "Instrument deployment"
resulttypecv = "Measurement"
censorcodecv = "Not censored"
qualitycodecv = "Unknown"
aggregationstatisticcv = "Sporadic"
timeaggregationinterval = 1 # assume 1 minute
timeaggregationintervalunitsid = dbGetQuery(db, "SELECT unitsid FROM odm2.units WHERE unitsname = 'Minute'") # minute
processlinglevelid = 1 # definition = "Raw data" # processinglevelcode = 0
valuecount = 1
sampledmediumcv = "Soil"
This is where the magic happens! The function db_insert_measurements()
writes and runs SQL code to upload data from one row of the new_measurements data frame at a time. So db_insert_measurements(1)
would insert data from the first row. Use an apply statement to upload all of the data at once. The eval=FALSE
argument means that this code will not run if you knit the whole document.
# db_insert_measurements_sm(1) # for just one measurement
sapply(1:nrow(new_measurements),
function(x) db_insert_measurements_sm(x))
Use this chunk to make a data frame called sm_db with all the soil data in the database
get_sm_data <- function(){
sql <- paste0("SELECT mrv.datavalue, mrv.valuedatetime, sf.samplingfeaturecode, r.featureactionid, v.variablecode, u.unitsname
FROM odm2.measurementresultvalues mrv, odm2.results r, odm2.variables v, odm2.units u, odm2.samplingfeatures sf, odm2.featureactions fa
WHERE r.variableid = v.variableid
AND r.featureactionid = fa.featureactionid
AND fa.samplingfeatureid = sf.samplingfeatureid
AND r.unitsid = u.unitsid
AND mrv.resultid = r.resultid
AND r.sampledmediumcv = 'Soil'")
sql <- gsub("\n", "", sql)
dbGetQuery(db, sql)
}
sm_db <- get_sm_data()
organize back to the original format using the spread function in the tidyr package
sm_db_spread <- sm_db %>% group_by(samplingfeaturecode) %>%
dplyr::select(-unitsname) %>%
spread(variablecode, datavalue)
sm_db_spread %>% datatable()