Utility functions for BLE LTER IM team • bleutils

October 25th 2022

Orientation

bleutils is a collection of R functions for use by the BLE LTER IM team.

Installation

remotes::install_github("BLE-LTER/bleutils")

Usage

Manipulating Core Program data

First, load the example data. cp_data is included with bleutils and is a minimal example of what investigators need to give the IM team before we can proceed.

library(bleutils)
df <- cp_data

Adding standard columns to CP data

add_cp_cols adds columns such as node, lagoon, and lat/lon to a data.frame, given a column of station codes (e.g., “EWLD1”). Use it to quickly add the required columns to a dataset.

df <- add_cp_cols(df, station_code_col = "station")

Note that this function relies on the stations data.frame packaged with bleutils. The canonical copy of this table is kept on Box. To check whether bleutils has the latest version, run update_cp_stations and point source_file to the Box copy on your local machine. If the Box version differs from the packaged one, the function updates the package version. When that happens, push the change to GitHub and re-install bleutils wherever it is used.

update_cp_stations(source_file = "path_goes_here")
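One way to picture the station lookup is as a match of station codes against a reference table. The base R sketch below is only an illustration of that idea, not the bleutils implementation: stations_demo, its lagoon values, and the code "XXXX9" are made-up placeholders. It also shows how a mistyped code surfaces as a missing value, which is worth checking before calling add_cp_cols.

```r
# Placeholder lookup table; the real stations table ships with bleutils.
stations_demo <- data.frame(
  station = c("EWLD1", "XXXX9"),
  lagoon  = c("lagoon_a", "lagoon_b"),  # made-up values
  stringsAsFactors = FALSE
)

# Toy observations; the second code is deliberately mistyped.
obs <- data.frame(
  station = c("EWLD1", "ewld1 ", "XXXX9"),
  value   = c(1, 2, 3),
  stringsAsFactors = FALSE
)

# Codes present in the data but absent from the lookup table.
unknown <- setdiff(obs$station, stations_demo$station)

# match() returns the row of the lookup table for each code, or NA.
obs$lagoon <- stations_demo$lagoon[match(obs$station, stations_demo$station)]
```

Here unknown holds the mistyped code, and the corresponding lagoon entry is NA.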

Inferring sampling season based on dates

infer_season returns a vector of categorical season values (under ice/break up/open water) for a date column in a data.frame. If the date column is of type chr rather than Date or POSIXct (i.e., datetimes), as in the example data, convert it first.

# Example when dates are chr, without a time portion, e.g., "6/27/2019"
library(lubridate)
df$date_collected <- as.Date(parse_date_time(df$date_collected, "mdy"))

# Now infer the season
df$season <- infer_season(df, date_col = "date_collected")
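If lubridate is not available, base R can do the same conversion. The sketch below runs on made-up date strings and is not part of bleutils. It also reports values that fail to parse, which would otherwise turn into NA silently.

```r
# Made-up values in the same month/day/year style as the example data.
raw_dates <- c("6/27/2019", "8/3/2019", "not a date")

# With an explicit format, as.Date() returns NA for strings that do not match.
parsed <- as.Date(raw_dates, format = "%m/%d/%Y")

# Collect inputs that could not be converted before inferring seasons.
bad <- raw_dates[is.na(parsed) & !is.na(raw_dates)]
if (length(bad) > 0) {
  message("Could not parse: ", paste(bad, collapse = ", "))
}
```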

Order data columns

Order columns according to the standard BLE order. Check out the function documentation for the exact order depending on data type (water/sediment/mooring). Columns need to be named exactly as specified, so make sure that’s done before running this function.

# In the example data, date_collected must be renamed to date_time
names(df)[names(df)=="date_collected"] <- "date_time"

# Now order the columns
df <- order_cp_cols(df, type = "water")
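Because column names must match exactly, it can help to check for missing names before ordering. The following base R sketch uses a toy data.frame and a hypothetical expected_cols vector. The real required names are listed in the order_cp_cols documentation, and the reordering line only illustrates the general idea.

```r
# Hypothetical required names; see ?order_cp_cols for the real list.
expected_cols <- c("station", "date_time")

# Toy data.frame standing in for a real dataset.
toy <- data.frame(
  value          = 1,
  date_collected = as.Date("2019-06-27"),
  station        = "EWLD1",
  stringsAsFactors = FALSE
)

# Expected names that are absent from the data.
missing_cols <- setdiff(expected_cols, names(toy))

# Rename the offending column, as in the example above.
names(toy)[names(toy) == "date_collected"] <- "date_time"

# Expected columns first, in the given order, then any extras.
toy <- toy[, c(expected_cols, setdiff(names(toy), expected_cols)), drop = FALSE]
```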

Sort data rows

Sort data rows according to the standard BLE order. Check out the function documentation for the exact order depending on data type (water/sediment/mooring). Columns need to be named exactly as specified, so make sure that’s done before running this function.

df <- sort_cp_rows(df, type = "water")
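For readers unfamiliar with multi-key sorting in R, here is a minimal base R sketch. The sort keys (station, then date_time) and the station codes "A" and "B" are placeholders for illustration only. The keys sort_cp_rows actually uses are given in its documentation.

```r
# Toy data with placeholder station codes.
toy <- data.frame(
  station   = c("B", "A", "A"),
  date_time = as.Date(c("2019-06-27", "2019-08-03", "2019-06-27")),
  stringsAsFactors = FALSE
)

# order() with several keys sorts by the first and breaks ties with the next.
toy_sorted <- toy[order(toy$station, toy$date_time), ]

# Reset row names so they follow the new order.
rownames(toy_sorted) <- NULL
```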

Other useful IM tasks

Initializing dataset

init_datapkg creates a directory structure and a templated R script for a new dataset. It warns you if the dataset ID already exists (based on directory names in base_path) and prints a message with the next available numeric ID.

init_datapkg(base_path = getwd(),
             dataset_id = 9999L,
             dataset_nickname = "test")
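The ID check can be pictured as parsing numeric prefixes from directory names. The sketch below is an assumption about the general approach, not the code inside init_datapkg. It presumes directories are named "<numeric id>_<nickname>" (as in "9999_test"), takes "next available" to mean the highest existing ID plus one, and works in a temporary directory with made-up dataset folders.

```r
# Work in a throwaway directory so nothing real is touched.
base_path <- file.path(tempdir(), "datapkg_demo")
unlink(base_path, recursive = TRUE)
dir.create(base_path)
dir.create(file.path(base_path, "1_alpha"))  # made-up dataset folders
dir.create(file.path(base_path, "2_beta"))

# Keep only directories that start with digits followed by an underscore.
dirs <- list.dirs(base_path, full.names = FALSE, recursive = FALSE)
id_dirs <- grep("^[0-9]+_", dirs, value = TRUE)

# Extract the numeric prefix of each directory name.
ids <- as.integer(sub("^([0-9]+)_.*$", "\\1", id_dirs))

# Hypothetical helper: is a given ID already taken?
id_exists <- function(id) id %in% ids

# Assumed definition of "next available": one past the highest existing ID.
next_id <- if (length(ids) > 0) max(ids) + 1L else 1L
```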

Initializing a data processing script

init_script creates an R script and populates it from a template. For a new dataset, init_datapkg calls init_script(type = "init") and there’s no further action needed. For updating existing datasets, use the argument type = "update". Note that the file directory must exist before running this function.

init_script(dataset_id = 9999,
            file_dir = file.path(getwd(), "9999_test", "EML_RProject_9999", "2022"), 
            type = "update")

The main difference between the two script types is that an update script also has code to pull the latest version of the dataset from EDI’s production server, so the new data can be appended to it.

The templates live in inst/ if updates are needed.
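Since init_script expects the file directory to exist, it can be created beforehand with base R. This sketch builds the same nested path as the example above, but under tempdir() so that it is safe to run anywhere. Substitute your own base path in practice.

```r
# Same nested layout as the init_script example, rooted in a temp directory.
file_dir <- file.path(tempdir(), "9999_test", "EML_RProject_9999", "2022")

# recursive = TRUE creates any missing parent directories as well.
if (!dir.exists(file_dir)) {
  dir.create(file_dir, recursive = TRUE)
}
```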

Appending units to metadata attribute names

We decided to append abbreviated units to the end of attribute names (see BLE handbook). To facilitate that, wrap calls to MetaEgress::get_meta() in bleutils::append_units. This saves us from having to store units in attribute names in metabase and makes updates easier.

From this point on, the example commands will not work unless your machine can connect to an instance of metabase. Feel free to run them if it can.

metadata <- MetaEgress::get_meta(dbname = "ble_metabase",
                                 dataset_ids = 13, 
                                 user = "insert_or_enter_in_console", 
                                 password = "insert_or_enter_in_console")
metadata <- append_units(metadata)

Renaming attributes to match metadata

rename_attributes renames all columns in a data.frame to what’s in the metadata. The usual assumptions apply: that there are as many columns in data as in metadata, and they are ordered the same way.

df <- rename_attributes(metadata,
                        dataset_id = 13,
                        entity = 1,
                        x = df,
                        append_units = TRUE)
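The assumptions above (same number of columns, same order) can be checked explicitly before renaming. The base R sketch below illustrates the check on a toy data.frame. The attribute_names vector is hypothetical and stands in for names that would come from the metadata.

```r
# Toy data.frame with placeholder column names.
toy <- data.frame(a = 1:2, b = c("x", "y"), stringsAsFactors = FALSE)

# Hypothetical attribute names, in the same order as the columns.
attribute_names <- c("station", "depth_m")

# Stop early if the column count does not match the metadata.
stopifnot(ncol(toy) == length(attribute_names))

# Positional rename: the first name goes to the first column, and so on.
names(toy) <- attribute_names
```

A count check cannot detect columns that are present but in the wrong order, so a visual comparison of old and new names is still worthwhile.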

Exporting personnel from metabase to CSV

BLE’s Core Program datasets include a CSV of personnel with the years of data they were involved in. This information is stored in metabase and is specific to BLE. MetaEgress::get_meta() doesn’t query this information, so use:

export_personnel(dataset_ids = 13,
                 file_name = "BLE_LTER_chlorophyll_personnel.csv")