October 25th 2022
Usage
Manipulating Core Program data
First, load the example data. cp_data is included with bleutils and is a minimal example of what investigators need to give the IM team before we can proceed.
data("cp_data", package = "bleutils")
df <- cp_data
Adding standard columns to CP data
add_cp_cols adds columns such as node, lagoon, and lat/lon to a data.frame, given a column of station codes (e.g., “EWLD1”). Use it to quickly add the required columns to a dataset.
df <- add_cp_cols(df, station_code_col = "station")
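Conceptually, this step is a lookup join on station codes. The base-R sketch below illustrates that idea only; it is not bleutils' implementation, and the stations_lookup table, its column names, and all of its values are hypothetical placeholders standing in for the packaged stations data.frame.

```r
# Hypothetical sketch, not bleutils' implementation: add station attributes
# to a data.frame by looking up each row's station code in a table.
# stations_lookup and all of its values are placeholders.
stations_lookup <- data.frame(
  station   = c("STN1", "STN2"),
  node      = c("node_a", "node_b"),
  lagoon    = c("lagoon_a", "lagoon_b"),
  latitude  = c(70.1, 70.2),
  longitude = c(-150.1, -150.2),
  stringsAsFactors = FALSE
)

add_station_cols <- function(df, station_code_col, lookup) {
  codes <- as.character(df[[station_code_col]])
  # Stop early if any station code is missing from the lookup table
  unknown <- setdiff(unique(codes), lookup$station)
  if (length(unknown) > 0) {
    stop("Unknown station codes: ", paste(unknown, collapse = ", "))
  }
  # Row of the lookup table matching each data row, in data row order
  idx <- match(codes, lookup$station)
  extra <- lookup[idx, setdiff(names(lookup), "station"), drop = FALSE]
  rownames(extra) <- NULL  # drop the duplicated row names created by indexing
  cbind(df, extra)
}

# Usage with placeholder data
df <- data.frame(station = c("STN2", "STN1", "STN2"), value = 1:3,
                 stringsAsFactors = FALSE)
df <- add_station_cols(df, station_code_col = "station", lookup = stations_lookup)
```

Failing on unknown codes (rather than silently filling NA) is a design choice of this sketch, so typos in station codes surface immediately.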
Note that this function relies on the stations data.frame packaged with bleutils. The canonical copy of this table is kept on Box. To check whether bleutils has the latest version, run update_cp_stations and point source_file to the copy in your local Box folder. If the Box version differs from the one in bleutils, this updates the packaged version. If that happens, push the change to GitHub and re-install bleutils wherever it is used.
update_cp_stations(source_file = "path_goes_here")
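The core of this check is "does the Box copy differ from the packaged copy?". A minimal base-R sketch of that comparison follows; the function name, the placeholder table, and the assumption that the Box file is a CSV are all hypothetical and not taken from bleutils.

```r
# Hypothetical sketch (not bleutils' implementation): decide whether the
# canonical Box copy of the stations table differs from the packaged copy.
stations_changed <- function(box_copy, packaged_copy) {
  # all.equal() returns TRUE when equal, otherwise a character vector of differences
  !isTRUE(all.equal(box_copy, packaged_copy))
}

# Placeholder data standing in for the packaged stations table
packaged <- data.frame(station = c("STN1", "STN2"),
                       latitude = c(70.1, 70.2),
                       stringsAsFactors = FALSE)

# Simulate the Box copy as a CSV on disk (the real file format is an assumption)
box_file <- tempfile(fileext = ".csv")
write.csv(packaged, box_file, row.names = FALSE)
box <- read.csv(box_file, stringsAsFactors = FALSE)

unchanged <- stations_changed(box, packaged)  # contents match
box$latitude[2] <- 70.25                      # simulate an edit made on Box
changed <- stations_changed(box, packaged)    # packaged copy is now stale
```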
Inferring sampling season based on dates
infer_season outputs a vector of categorical season values (under ice/break up/open water) from a column of date values in a data.frame. If the dates are stored as character (chr) rather than Date or POSIXct (datetime), as in the example data, convert them first.
# Example when dates are chr, without a time portion, e.g., "6/27/2019"
library(lubridate)
df$date_collected <- as.Date(parse_date_time(df$date_collected, "mdy"))
# Now infer the season
df$season <- infer_season(df, date_col = "date_collected")
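If you would rather avoid the lubridate dependency, base R's as.Date() with an explicit format parses the same "m/d/Y" strings. The season step is sketched as a calendar-month lookup; that approach is an assumption, and the month-to-season table below is a placeholder, not the rules infer_season actually uses.

```r
# Base-R alternative to lubridate::parse_date_time() for "m/d/Y" strings
dates_chr <- c("4/15/2019", "6/27/2019", "8/2/2019")
dates <- as.Date(dates_chr, format = "%m/%d/%Y")

# Hypothetical sketch: classify dates by calendar month via a lookup table.
# The month-to-season mapping is a placeholder, NOT bleutils' rules.
classify_season <- function(dates, month_map) {
  months <- as.character(as.integer(format(dates, "%m")))
  # Months absent from month_map come back as NA
  unname(month_map[months])
}

placeholder_map <- c("4" = "under ice", "5" = "under ice",
                     "6" = "break up",
                     "7" = "open water", "8" = "open water")
seasons <- classify_season(dates, placeholder_map)
```

Returning NA for unmapped months, instead of forcing a label, makes out-of-season sampling dates easy to spot.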
Order data columns
Order columns according to the standard BLE order. See the function documentation for the exact order for each data type (water/sediment/mooring). Columns must be named exactly as specified, so rename them before running this function.
# In the example data, date_collected must be renamed to date_time
names(df)[names(df)=="date_collected"] <- "date_time"
# Now order the columns
df <- order_cp_cols(df, type = "water")
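Reordering columns amounts to indexing the data.frame by a reference vector of names. This hypothetical sketch puts the standard columns first and keeps any extra columns at the end; whether order_cp_cols keeps, drops, or rejects extra columns is not stated here, and the standard_order vector is a placeholder, not the BLE order.

```r
# Hypothetical sketch: standard columns first (in standard order), then extras.
# standard_order below is a placeholder, not the real BLE column order.
order_cols <- function(df, standard_order) {
  missing <- setdiff(standard_order, names(df))
  if (length(missing) > 0) {
    stop("Missing required columns: ", paste(missing, collapse = ", "))
  }
  extra <- setdiff(names(df), standard_order)
  df[, c(standard_order, extra), drop = FALSE]
}

df <- data.frame(value = 1:2,
                 date_time = c("2019-06-27", "2019-08-02"),
                 station = c("STN1", "STN2"),
                 stringsAsFactors = FALSE)
standard_order <- c("station", "date_time")
df <- order_cols(df, standard_order)
```

Erroring on missing required columns mirrors the requirement above that columns be named exactly as specified.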
Sort data rows
Sort data rows according to the standard BLE order. See the function documentation for the exact order for each data type (water/sediment/mooring). Columns must be named exactly as specified, so rename them before running this function.
df <- sort_cp_rows(df, type = "water")
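Sorting by several keys is a multi-key order() call. In this hypothetical sketch the sort keys (station, then date_time) are assumptions for illustration; the actual keys per data type are in the sort_cp_rows documentation.

```r
# Hypothetical sketch: sort rows by several key columns, in priority order.
sort_rows <- function(df, keys) {
  # unname() so a column called e.g. "method" is not matched to order()'s arguments
  ord <- do.call(order, unname(as.list(df[keys])))
  out <- df[ord, , drop = FALSE]
  rownames(out) <- NULL  # reset row names after reordering
  out
}

df <- data.frame(station = c("STN2", "STN1", "STN1"),
                 date_time = as.Date(c("2019-06-27", "2019-08-02", "2019-06-27")),
                 value = 1:3,
                 stringsAsFactors = FALSE)
sorted <- sort_rows(df, keys = c("station", "date_time"))
```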
Other useful IM tasks
Initializing a dataset
init_datapkg creates a directory structure and a templated R script for a new dataset. It warns you if the dataset ID already exists (based on the directory names in base_path) and reports the next available numeric ID in a message.
init_datapkg(base_path = getwd(),
dataset_id = 9999L,
dataset_nickname = "test")
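The ID check can be pictured as scanning the dataset directories under base_path, which follow the "<id>_<nickname>" pattern seen in the init_script example (e.g., 9999_test). This sketch is hypothetical: the function names are illustrative, and "next available ID = largest existing ID + 1" is an assumption about how init_datapkg chooses it.

```r
# Hypothetical sketch: detect existing dataset IDs from "<id>_<nickname>" folders.
existing_ids <- function(base_path) {
  dirs <- list.dirs(base_path, full.names = FALSE, recursive = FALSE)
  id_dirs <- grep("^[0-9]+_", dirs, value = TRUE)
  as.integer(sub("^([0-9]+)_.*$", "\\1", id_dirs))
}

id_exists <- function(base_path, dataset_id) {
  dataset_id %in% existing_ids(base_path)
}

# Assumption: the next available ID is the largest existing ID plus one
next_dataset_id <- function(base_path) {
  ids <- existing_ids(base_path)
  if (length(ids) == 0) 1L else max(ids) + 1L
}

# Usage in a throwaway directory
base <- tempfile("ble_demo")
dir.create(file.path(base, "1_first"), recursive = TRUE)
dir.create(file.path(base, "13_chlorophyll"), recursive = TRUE)
exists13 <- id_exists(base, 13)
next_id <- next_dataset_id(base)
```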
Initializing a data processing script
init_script creates an R script and populates it from a template. For a new dataset, init_datapkg already calls init_script(type = "init"), so no further action is needed. For updating an existing dataset, use type = "update". Note that the file directory must exist before running this function.
init_script(dataset_id = 9999,
file_dir = file.path(getwd(), "9999_test", "EML_RProject_9999", "2022"),
type = "update")
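Populating a script from a template is essentially string substitution followed by a file write. The sketch below assumes a hypothetical {{dataset_id}} placeholder token and a hypothetical output file name; the real templates in bleutils may use different markers. Like init_script, it refuses to run if the target directory does not exist.

```r
# Hypothetical sketch: fill a template and write it to an existing directory.
# The {{dataset_id}} token and file name are assumptions, not bleutils' format.
fill_template <- function(template_lines, dataset_id, file_dir, file_name) {
  if (!dir.exists(file_dir)) stop("Directory does not exist: ", file_dir)
  # fixed = TRUE treats the braces literally instead of as a regex
  script <- gsub("{{dataset_id}}", dataset_id, template_lines, fixed = TRUE)
  out_path <- file.path(file_dir, file_name)
  writeLines(script, out_path)
  out_path
}

template <- c("# Processing script for dataset {{dataset_id}}",
              "dataset_id <- {{dataset_id}}")
target_dir <- file.path(tempfile("ble_scripts"), "9999_test", "EML_RProject_9999", "2022")
dir.create(target_dir, recursive = TRUE)
path <- fill_template(template, 9999, target_dir, "9999_update.R")
result <- readLines(path)
```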
The main difference is that an update script also has code to pull the latest version of the dataset from EDI’s production server, so the new data can be appended to it.
If the templates need updating, they live in inst/.
Appending units to metadata attribute names
We decided to append abbreviated units to the end of attribute names (see the BLE handbook). To facilitate this, pass the output of MetaEgress::get_meta() to bleutils::append_units(). This saves us from having to store units in the attribute names in metabase, and makes updates easier.
From this point on, the example commands need a connection to a metabase instance and will not run without one. Feel free to run them if your machine can connect to metabase.
metadata <- MetaEgress::get_meta(dbname = "ble_metabase",
dataset_ids = 13,
user = "insert_or_enter_in_console",
password = "insert_or_enter_in_console")
metadata <- append_units(metadata)
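The renaming rule itself is simple string concatenation. This sketch works on plain vectors of names and unit abbreviations rather than on the list returned by MetaEgress::get_meta(), whose internal structure is not described here; the underscore separator and the example units are assumptions, so check the BLE handbook for the actual convention.

```r
# Hypothetical sketch: append a unit abbreviation to each attribute name,
# leaving names unchanged when no unit applies (NA or empty string).
# The "_" separator is an assumption, not necessarily the BLE convention.
append_unit_suffix <- function(attribute_names, units) {
  has_unit <- !is.na(units) & nzchar(units)
  ifelse(has_unit, paste0(attribute_names, "_", units), attribute_names)
}

renamed <- append_unit_suffix(c("station", "temperature", "conductivity"),
                              c(NA, "C", ""))
```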
Renaming attributes to match metadata
rename_attributes renames all columns in a data.frame to the attribute names in the metadata. The usual assumptions apply: the data has as many columns as the metadata has attributes, and both are in the same order.
df <- rename_attributes(metadata,
dataset_id = 13,
entity = 1,
x = df,
append_units = TRUE)
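Under those assumptions, renaming is positional assignment of names. This hypothetical sketch takes the attribute names as a plain character vector (how rename_attributes extracts them from the metadata is not shown here). It can enforce the column count, but it cannot detect columns that are in the wrong order, so that check stays with you.

```r
# Hypothetical sketch: rename columns positionally to metadata attribute names.
rename_to_metadata <- function(x, attribute_names) {
  # Only the count can be verified; column order is assumed to match
  if (ncol(x) != length(attribute_names)) {
    stop("Data has ", ncol(x), " columns but metadata lists ",
         length(attribute_names), " attributes")
  }
  names(x) <- attribute_names
  x
}

df <- data.frame(a = c("STN1", "STN2"), b = c(1.5, 2.5), stringsAsFactors = FALSE)
renamed <- rename_to_metadata(df, c("station", "temperature_C"))
```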
Exporting personnel from metabase to CSV
BLE’s Core Program datasets include a CSV of personnel with the years of data each person was involved in. This information is stored in metabase and is specific to BLE. Because MetaEgress::get_meta() doesn’t query it, use:
export_personnel(dataset_ids = 13,
file_name = "BLE_LTER_chlorophyll_personnel.csv")
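To show the general shape of such a file, here is a hypothetical sketch that collapses one-row-per-person-per-year records into one row per person and writes a CSV. The column names, the comma-separated year layout, and the placeholder people are assumptions; export_personnel's actual output format may differ.

```r
# Hypothetical sketch: one row per person, with their years collapsed into a string.
# Column names and layout are assumptions, not export_personnel's format.
personnel <- data.frame(
  name = c("Person A", "Person A", "Person B"),  # placeholder people
  year = c(2018, 2019, 2019),
  stringsAsFactors = FALSE
)

collapse_years <- function(p) {
  years <- tapply(p$year, p$name,
                  function(y) paste(sort(unique(y)), collapse = ", "))
  data.frame(name = names(years), years = as.vector(years),
             stringsAsFactors = FALSE)
}

out <- collapse_years(personnel)
csv_path <- tempfile(fileext = ".csv")
write.csv(out, csv_path, row.names = FALSE)
back <- read.csv(csv_path, stringsAsFactors = FALSE)
```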