Reproducible Data Science in R: Flexible functions using tidy evaluation

Improve your functions with helpful dataframe evaluation patterns.

Date Posted December 17, 2024 Last Updated December 17, 2024
Author Elise Hinman
Reading Time 17 minutes
A smiling raindrop in a hardhat works on a pipe assembly line beneath the blog series title, Reproducible Data Science with R and to the left of the blog post title, Flexible functions using tidy evaluation.

Overview

This blog post is part of a series that works up from functional programming foundations through the use of the targets R package to create efficient, reproducible data workflows.

Where we're going: This article addresses the common challenges associated with writing flexible data wrangling functions that leverage the tidyverse ecosystem for dataframe operations. It will first introduce the quirks and patterns of tidy evaluation before covering an example that utilizes techniques such as data masking and tidy selection to support function flexibility. By the end of the blog, the reader will be familiar with tidy evaluation patterns useful for writing functions. This post may also serve as a reference for R programmers looking for a reminder on these techniques.

Audience: This article assumes some familiarity with writing functions in R and using common dplyr functions.

Function flexibility challenges

In code development, choosing the ideal level of input flexibility for a custom-written function is a non-trivial decision. While flexible functions are often broadly applicable across coding projects (yay reusability!), they require more up-front prep work to ensure specifications are clearly documented AND there’s consistency in the code’s functionality across the range of allowable inputs.

In particular, combining function input flexibility with the tidyverse ecosystem often requires adding some new pieces of syntax to the mix: for instance, have you noticed that dplyr functions don’t require you to add quotation marks around variable names? This is generally a very handy feature that saves time and keystrokes. However, when these functions are within custom functions that you write, they require some special handling. Why? Because in an attempt to make tidyverse functions efficient to use, the developers tweaked the way the functions evaluate inputs, straying away from the standard evaluation methods used by R. This form of non-standard evaluation is termed (surprise 🥳!) tidy evaluation.

Have you seen syntax like {{ }}, !!, .data[[ ]], or quo()/enquo() before? These are all examples of methods to alter evaluation contexts in R, which are particularly helpful while building your own flexible functions that leverage the tidyverse to seamlessly perform:

  • dynamic variable selection
  • grouped variable operations
  • plot parameterization

Evaluation in R

In the context of tidy evaluation, it bears reiterating (from previous blog posts) that environments play a role in the process of evaluating code in R. When you open up an R session, any objects you define (including functions) will be added to the global environment. Functions you have authored also have their own unique environments created and destroyed every time you run them. If you introduce something into your function that isn’t explicitly defined in the function or its inputs, like x in the following function, add_stuff:

x <- 5

add_stuff <- function(y){
  z <- x + y
  return(z)
}

add_stuff(y = 10)
#> [1] 15

…R will look for x in the add_stuff function environment first, but it’s not there (the only input is y)! It will then look into the function’s enclosing environment, which in this case is the global environment, locate the expression x <- 5, and use that in its evaluation of z <- x + y.

Functions in the tidyverse excel at modifying evaluation in an attempt to improve the user experience. Let’s take a look at a base R and tidyverse approach to the same operation. In the example below, we are selecting two columns from the palmerpenguins dataset by name:

library(palmerpenguins)
library(dplyr)

# base R
base_selection <- penguins[,c("species", "island")]

# dplyr
dplyr_selection <- select(penguins, species, island)

Notice how in base R, we select columns using a vector of strings in quotation marks, but using dplyr::select(), we don’t need to use quotation marks to denote column names. This tidyverse implementation strays from the usual evaluation methods in R, because the normal base R method would look for objects called species and island throughout all enclosing environments, turn up empty handed, and throw an error. The tidyverse’s “sleight of hand” under the hood sometimes delays evaluation of an input, or searches for an expression in a way that does not match R’s out-of-the-box methods for evaluation.

In the above example, you can think of dplyr as invisibly adding quotation marks to the arguments in select() before evaluating them, recognizing the inputs to select() as columns in the penguins dataframe. This is called data masking and is the result of dplyr essentially attaching the dataframe as its most immediate environment to search for data variables (like column names). Note that data variables are specific to the dataframe and are distinct from environment variables, which are present in the global environment and other enclosing environments (e.g. the dataframe penguins is an environment variable). The following dplyr functions utilize data masking:

  • arrange()
  • count()
  • filter()
  • group_by()
  • mutate()
  • summarise()
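The difference between data variables and environment variables is easiest to see in a quick sketch. In this minimal example (the dataframe and column names are made up for illustration), `flow` exists only as a column, never as an object in the global environment:

```r
library(dplyr)

toy <- data.frame(flow = c(10, 250, 40))

# Data masking: filter() attaches `toy` as the first place to look,
# so `flow` resolves to the data variable (the column), even though
# no object named `flow` exists in the global environment
high_flow <- filter(toy, flow > 100)

# The base R equivalent must state explicitly where `flow` lives
high_flow_base <- toy[toy$flow > 100, , drop = FALSE]
```

Evaluating `toy[flow > 100, ]` without the `toy$` prefix would throw an "object 'flow' not found" error, which is exactly the behavior data masking sidesteps.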

Another prominent tidy evaluation method is called tidy selection, which leverages a useful set of modifiers like all_of(), any_of(), starts_with(), and where() to determine which columns a dplyr function will use. The following dplyr functions utilize tidy selection (sometimes in addition to data masking):

  • across()
  • relocate()
  • rename()
  • select()
  • pull()
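A minimal sketch of tidy selection in action (the toy dataframe and its columns are hypothetical):

```r
library(dplyr)

toy <- data.frame(bill_length = 1:3, bill_depth = 4:6, island = "Biscoe")

# starts_with() selects columns by name pattern;
# all_of() selects columns named in a character vector (and errors
# if any are missing, unlike the more forgiving any_of())
bill_cols  <- select(toy, starts_with("bill_"))
named_cols <- select(toy, all_of(c("island")))
```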

Tidy evaluation patterns

The following expandable sections cover the most common tidy evaluation techniques and use cases. Click on each heading to learn more about when and how to use each technique. Next, we will use a real-world data analysis scenario to go into more detail on each technique.

.data[[ ]] pronoun
Category: Data masking

Input format within function(): column_name = "Column_A"

When to use: Your function uses tidyverse functions on column name inputs denoted by strings. Note that in most cases, you cannot use .data[[ ]] on the left-hand side of an operation (i.e. to the left of an = assignment). There's a bit of a funky reason for this that has to do with the fact that .data[[ ]] doesn't actually represent a dataframe; instead it is a construct used to grab data variables in the context of the dataframe being evaluated (here's a great discussion about it on the Posit forum). In other words, it's more or less a "stand-in" for referencing elements of a dataframe.

If you want to perform an operation (using something like mutate()) on a column you'd normally pass to the .data[[ ]] pronoun, you usually have to use the curly-curly braces on the left-hand side: {{ }}. Check out the next expandable section for more information on curly-curly.

Caveat: you can use .data[[ ]] on the left-hand side of a filtering function, like this: filter(.data[[column_name]] == .data[["Column_B"]]). Tricky.

Example structure: mutate(new_column = .data[[ column_name ]] * 2)
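A runnable sketch of the pattern above (the function and column names are hypothetical):

```r
library(dplyr)

# Double a column supplied as a string, using the .data pronoun
# on the right-hand side of the mutate() assignment
double_column <- function(df, column_name) {
  mutate(df, new_column = .data[[column_name]] * 2)
}

toy <- data.frame(Column_A = c(1, 2, 3))
doubled <- double_column(toy, column_name = "Column_A")
```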

curly-curly (a.k.a. embrace) {{ }} and injection :=
Category: Data masking

Input format within function(): column_name = Column_A; column_name = "Column_A"

When to use: Your function uses tidyverse functions on variable column name inputs, and the .data[[ ]] pronoun won't work. If your function accepts column names without quotation marks, you can use {{ }} on both the left and right sides of an operation, like within mutate(). If your function accepts column names with quotation marks, you can only use {{ }} on the left-hand side of the operation. The use of curly-curly is a recent addition to the tidy evaluation suite, replacing most uses of !! (bang bang) and quo()/enquo(). Check out this section of Stanford's Data Challenge Lab course for an excellent explanation of curly-curly's predecessor, the !!enquo() pattern.

Example structure(s): mutate({{ column_name }} := {{ column_name }} * 2)
mutate({{ column_name }} := .data[[ column_name ]] * 2)
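A runnable sketch of the unquoted-input variant (the function and column names are hypothetical):

```r
library(dplyr)

# With an unquoted input, {{ }} works on both sides of :=
double_in_place <- function(df, column_name) {
  mutate(df, {{ column_name }} := {{ column_name }} * 2)
}

toy <- data.frame(Column_A = c(1, 2, 3))
doubled <- double_in_place(toy, Column_A)
```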

all_of() and any_of()
Category: Tidy selection

Input format within function(): column_names = c("Column_A", "Column_B")

When to use: Your function performs tidyverse operations on more than one input column at a time and those column names have quotation marks.

Example structure: select(all_of(column_names))

across()
Category: Both

Input format within function(): column_names = c("Column_A", "Column_B")

When to use: You want to pass multiple column names with quotation marks to a function that uses data masking.

Example structure: group_by(across(all_of(column_names)))
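A runnable sketch of this pattern (the function, dataframe, and column names are hypothetical):

```r
library(dplyr)

# Pass several quoted column names into the data-masking group_by()
# by wrapping them in across(all_of())
count_by_groups <- function(df, column_names) {
  df |>
    group_by(across(all_of(column_names))) |>
    summarise(n = n(), .groups = "drop")
}

toy <- data.frame(site = c("A", "A", "B"), year = c(2020, 2020, 2021))
counts <- count_by_groups(toy, c("site", "year"))
```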

"{ }_name" :=
Category: Data masking

Input format within function(): column_name = "Column_A"

When to use: You want to use a column name input to name a new column produced by your function (using tidyverse operations) and the column name is in quotation marks. Note that this trick leverages the glue package to paste together inputs and character strings, and elsewhere you may need to call glue() specifically to include this functionality.

Example structure: mutate("{ column_name }_timestwo" := .data[[ column_name ]] * 2)
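A runnable sketch of this naming trick (the function and column names are hypothetical). Note that inside the := name, dplyr handles the glue-style interpolation for you, so glue() does not need to be called explicitly here:

```r
library(dplyr)

# Name the new column after the quoted input column
add_timestwo <- function(df, column_name) {
  mutate(df, "{ column_name }_timestwo" := .data[[column_name]] * 2)
}

toy <- data.frame(Column_A = c(1, 2))
out <- add_timestwo(toy, "Column_A")
```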

"{{ }}_name" :=
Category: Data masking

Input format within function(): column_name = Column_A

When to use: You want to use a column name input to name a new column produced by your function (using tidyverse operations) and the column name is not in quotation marks. Note that this trick leverages the glue package to paste together inputs and character strings, and elsewhere you may need to call glue() specifically to include this functionality (see data analysis workflow in next section for an example using the glue package).

Example structure: mutate("{{ column_name }}_timestwo" := {{ column_name }} * 2)

ellipsis ...
Category: Data masking

Input format within function(): Column_A, Column_B

When to use: You want to pass a list of column name inputs without quotation marks directly into a tidyverse function or set of functions.

Example structure: group_by(...) or select(...)
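A runnable sketch of forwarding the ellipsis (the function, dataframe, and column names are hypothetical):

```r
library(dplyr)

# Forward any number of unquoted grouping columns straight through
# to the data-masking group_by() via ...
mean_by <- function(df, value_col, ...) {
  df |>
    group_by(...) |>
    summarise(mean_value = mean({{ value_col }}), .groups = "drop")
}

toy <- data.frame(site = c("A", "A", "B"), value = c(1, 3, 10))
means <- mean_by(toy, value, site)
```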

Going deeper: an example workflow

Now that we have a better understanding of tidy evaluation and the patterns for working with it, let’s examine a scenario currently pertinent to users of Water Quality Portal (WQP) data. In 2024, the WQP team initiated a transition to the new WQX3.0 format from the WQX version 2 format (we will refer to it as “legacy”). While still under development, this switch includes changes to the field names (and in some cases, field values) returned from the database. NOTE that as of March 2024, newly uploaded or corrected USGS data are not available in the legacy data format. The dataRetrieval team has created an informative status page on this transition, complete with examples of WQX3.0 and legacy data pulls.

This scenario is a great opportunity to develop a function that leverages tidy evaluation so that it is flexible to both the legacy and WQX3.0 formats. Say we had already written a script that downloads data in the legacy format from the WQP from a single site and plots the most sampled water chemistry constituents (reported in micrograms per liter, ug/L) using an interactive ggplotly plot. We’ll use the Kawishiwi River outside of Ely, MN as our example site. Our legacy dataset might look something like this, using the most recent development version of dataRetrieval:

# Load required packages
# Install using install.packages() if you don't have them.
# Version numbers used provided in comments
library(dataRetrieval) # 2.7.17.9000, from GitHub
library(dplyr) # 1.1.4
library(lubridate) # 1.9.4
library(plotly) # 4.10.0
library(khroma) # 1.14.0, colorblind-accessible color palettes
library(glue) # 1.8.0
library(rlang) # 1.1.4

# Download legacy data
kawishiwi <- dataRetrieval::readWQPdata(
  siteid = 'USGS-05124480',
  service = 'Result'
)

# Take a look at what we're working with  
kawishiwi |>
  dplyr::arrange(ActivityStartDate, CharacteristicName) |>
  dplyr::select(ActivityStartDate, CharacteristicName,
                ResultMeasureValue, ResultMeasure.MeasureUnitCode) |>
  head()

#>   ActivityStartDate CharacteristicName ResultMeasureValue ResultMeasure.MeasureUnitCode
#> 1        1966-07-21      Acidity, (H+)            0.00013                          mg/l
#> 2        1966-07-21         Alkalinity                 11                    mg/l CaCO3
#> 3        1966-07-21           Altitude               1450                            ft
#> 4        1966-07-21           Aluminum                400                          ug/l
#> 5        1966-07-21        Bicarbonate                 13                          mg/l
#> 6        1966-07-21              Boron                 30                          ug/l

Our original script and plot output might look something like this (click to view code and plot):

# Make timeseries plot of WQ data

min_years <- 10

common_constituents <- kawishiwi |>
  # select columns used
  dplyr::select(ResultMeasure.MeasureUnitCode, ResultMeasureValue, 
                ActivityStartDate, CharacteristicName) |>
  # filter to micrograms per liter
  dplyr::filter(ResultMeasure.MeasureUnitCode %in% c("ug/l", "ug/L")) |>
  # ensure result value is numeric,
  # convert character string of dates to dates,
  # and grab year for filtering later
  dplyr::mutate(
    ResultMeasureValue_numeric = as.numeric(ResultMeasureValue),
    ActivityStartDate = as.Date(ActivityStartDate, "%Y-%m-%d"),
    year = lubridate::year(ActivityStartDate)
  ) |>
  # get rid of NAs
  dplyr::filter(!is.na(ResultMeasureValue_numeric)) |>
  # arrange dataset by date
  dplyr::arrange(ActivityStartDate) |>
  # count the number of years of data collected
  # per characteristic, and then filter to
  # characteristics with at least n years of data
  dplyr::group_by(CharacteristicName) |>
  dplyr::mutate(n_years = length(unique(year))) |>
  dplyr::filter(n_years >= min_years) |>
  dplyr::ungroup()
  
# use ggplot to create a timeseries plot
p <- ggplot2::ggplot(data = common_constituents, 
                     ggplot2::aes(x = ActivityStartDate,
                         y = ResultMeasureValue_numeric,
                         color = CharacteristicName)) +
  ggplot2::geom_line() +
  ggplot2::geom_point() +
  ggplot2::labs(x = "Date",
                y = "Concentration (ug/l)",
                color = "Characteristic Name") +
  ggplot2::theme_minimal() +
  khroma::scale_color_vibrant()

# Make it an interactive plot!
plotly::ggplotly(p)

Interactive timeseries scatterplot of constituents with at least 10 years of data on the Kawishiwi River using data from legacy WQX. Hover over the points with your mouse to see result details and click on and off different constituents in the legend.

Next, let’s download the same dataset in the WQX3.0 format and take a look at its new structure:

# Download the WQX3.0 version data
kawishiwi_3.0 <- dataRetrieval::readWQPdata(
  siteid = 'USGS-05124480',
  service = 'ResultWQX3',
  dataProfile = 'fullPhysChem'
)

kawishiwi_3.0 |>
  dplyr::arrange(Activity_StartDate, Result_Characteristic) |>
  dplyr::select(Activity_StartDate, Result_Characteristic, Result_Measure, Result_MeasureUnit) |>
  head()

#>   Activity_StartDate Result_Characteristic Result_Measure Result_MeasureUnit
#> 1         1966-07-21         Acidity, (H+)        0.00013               mg/L
#> 2         1966-07-21            Alkalinity             11               mg/L
#> 3         1966-07-21              Altitude           1450                 ft
#> 4         1966-07-21              Aluminum            400               ug/L
#> 5         1966-07-21           Bicarbonate             13               mg/L
#> 6         1966-07-21                 Boron             30               ug/L

The challenge: As we suspected, the column names differ between the two WQX formats (e.g. ActivityStartDate versus Activity_StartDate), among other slight discrepancies (“ug/l” versus “ug/L” units). It would be a bit tedious and error-prone to copy, paste, and adapt the code above to our new dataset. And then what if another similar dataset comes along in the future?

The solution(s): With a little bit of forethought, we can tackle these dataframe variability challenges by writing a function that accepts the column names as inputs and leverages some tidy evaluation tricks. We will show you two solutions to generalizing the script into a function: one that uses quotation marks around the inputs, and one that does not.

Option 1: Use quotation marks around function inputs

The function below expects the column name inputs to have quotation marks around them, and leverages .data[[ ]] liberally. The function documentation is clear on this detail. Also notice that we build the new column name "{ result_column }_numeric" by pasting strings together using the glue() function from the glue package. glue() is helpful because it evaluates expressions in curly brackets and pastes the result into an input string:

#' Make timeseries plot of WQ data
#'
#' @param wqp_dataframe A dataframe containing water quality data.
#' @param unit_column A character string in quotation marks matching the name 
#'  of the unit column in `wqp_dataframe`.
#' @param date_column A character string in quotation marks matching the name 
#'  of the date column in `wqp_dataframe`. Expects the date column to be data 
#'  type character in the format "YYYY-MM-DD".
#' @param result_column A character string in quotation marks matching the name 
#'  of the result value column in `wqp_dataframe`. Expects the result value 
#'  column to be data type numeric.
#' @param grouping_column A character string in quotation marks matching the 
#'  name of the column being used to group the data in `wqp_dataframe`.
#' @param min_years Numeric. A single number indicating the minimum number of
#'  years of data collected.
#'
#' @return A plotly plot showing water quality constituent concentrations for
#'  which min_years or more years of data have been collected.
#'

plot_timeseries <- function(wqp_dataframe,
                            unit_column,
                            date_column,
                            result_column,
                            grouping_column,
                            min_years) {
  
  plot_dataframe <- wqp_dataframe |>
    dplyr::select(dplyr::all_of(c(unit_column, date_column, result_column, grouping_column))) |>
    dplyr::mutate({{ unit_column }} := tolower(.data[[unit_column]])) |>
    # filter to micrograms per liter
    dplyr::filter(.data[[unit_column]] == "ug/l") |>
    # ensure result value is numeric,
    # convert character string of dates to dates,
    # and grab year for filtering later
    dplyr::mutate(
      "{ result_column }_numeric" := as.numeric(.data[[result_column]]),
      {{ date_column }} := as.Date(.data[[date_column]], "%Y-%m-%d"),
      year = lubridate::year(.data[[date_column]])
    ) |>
    # get rid of NAs
    dplyr::filter(!is.na(.data[[glue::glue("{ result_column }_numeric")]])) |>
    # arrange dataset by date
    dplyr::arrange(.data[[date_column]]) |>
    # count the number of years of data collected
    # per characteristic, and then filter to
    # characteristics with at least n years of data
    dplyr::group_by(.data[[grouping_column]]) |>
    dplyr::mutate(n_years = length(unique(year))) |>
    dplyr::filter(n_years >= min_years) |>
    dplyr::ungroup()
  
  # use ggplot to create a timeseries plot
  p <- ggplot2::ggplot(data = plot_dataframe, ggplot2::aes(
    x = .data[[date_column]],
    y = .data[[glue::glue("{ result_column }_numeric")]],
    color = .data[[grouping_column]])) +
    ggplot2::geom_line() +
    ggplot2::geom_point() +
    ggplot2::labs(x = "Date", y = "Concentration (ug/l)") +
    ggplot2::theme_minimal() +
    khroma::scale_color_muted()
  
  # Make it an interactive plot!
  pi <- plotly::ggplotly(p)
  
  return(pi)
}

Let’s give this function a go with our new Kawishiwi River dataset from WQX3.0.

# Plot WQX3.0 dataset
plot_timeseries(
  wqp_dataframe = kawishiwi_3.0,
  unit_column = "Result_MeasureUnit",
  date_column = "Activity_StartDate",
  result_column = "Result_Measure",
  grouping_column = "Result_Characteristic",
  min_years = 10
)

Interactive timeseries scatterplot of constituents with at least 10 years of data on the Kawishiwi River using data from WQX3.0. Hover over the points with your mouse to see result details, drag a box around the data to zoom in, and click on and off different constituents in the legend.

Option 2: No quotation marks around function inputs

Note: new syntax ahead! In this example, we have one line that requires more esoteric syntax. This is because it's a bit thorny to create and then re-use a new column with a name dependent upon an input column that has been supplied to the function without quotation marks.

First, we define the new column name that will hold numeric result values. In order to make this column play nice with the rest, though, we first need to "defuse" the result_column input to an expression using rlang::enquo() and then convert that expression to a string using rlang::as_label(). Once the result column name is categorized as a string, it can then be pasted to the additional text used to describe the new column (_numeric in the example below). This results in a string in quotation marks that is then converted to a symbol that the tidyverse functions can understand in the context of the dataframe being evaluated.
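This defuse-to-string-to-symbol round trip can be seen in isolation in a small sketch (the function name and column name are hypothetical):

```r
library(rlang)

# Defuse an unquoted input to an expression with enquo(), render it
# as a string with as_label(), paste on a suffix, and convert the
# result back into a symbol with sym()
make_derived_name <- function(column) {
  sym(paste0(as_label(enquo(column)), "_numeric"))
}

new_name <- make_derived_name(Result_Measure)
```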

The function below expects the column name inputs without quotation marks around them, and leverages {{ }} := {{ }} along with some rlang metaprogramming functions for column name genesis:

#' Make timeseries plot of WQ data
#'
#' @param wqp_dataframe A dataframe containing water quality data.
#' @param unit_column The unquoted name of the unit column in `wqp_dataframe`.
#' @param date_column The unquoted name of the date column in `wqp_dataframe`.
#'  Expects the date column to be data type character in the format
#'  "YYYY-MM-DD".
#' @param result_column The unquoted name of the result value column in
#'  `wqp_dataframe`. Expects the result value column to be data type numeric.
#' @param grouping_column The unquoted name of the column being used to group
#'  the data in `wqp_dataframe`.
#' @param min_years Numeric. A single number indicating the minimum number of
#'  years of data collected.
#'
#' @return A plotly plot showing water quality constituent concentrations for
#'  which min_years or more years of data have been collected.
#'

plot_timeseries <- function(wqp_dataframe,
                            unit_column,
                            date_column,
                            result_column,
                            grouping_column,
                            min_years) {
  
  # create new column name for column that will hold numeric version
  # of result_column. First, we'll "defuse" the input column name by
  # wrapping it in enquo(), then convert it to a string-like object
  # so it can be pasted together with "_numeric". Lastly, we use sym
  # to convert it to a symbol, much like all of the other input column
  # names.
  numeric_result_column <- rlang::sym(
    paste0(rlang::as_label(rlang::enquo(result_column)), "_numeric")
    )
  
  plot_dataframe <- wqp_dataframe |>
    dplyr::select({{ unit_column }}, {{ date_column }},
                  {{ result_column }}, {{ grouping_column }}) |>
    dplyr::mutate({{ unit_column }} := tolower({{ unit_column }})) |>
    # filter to micrograms per liter
    dplyr::filter({{ unit_column }} == "ug/l") |>
    # ensure result value is numeric,
    # convert character string of dates to dates,
    # and grab year for filtering later
    dplyr::mutate(
      {{ numeric_result_column }} := as.numeric({{ result_column }}),
      {{ date_column }} := lubridate::ymd({{ date_column }}),
      year = lubridate::year({{ date_column }})
    ) |>
    # get rid of NAs
    dplyr::filter(!is.na({{ numeric_result_column }})) |>
    # arrange dataset by date
    dplyr::arrange({{ date_column }}) |>
    # count the number of years of data collected
    # per characteristic, and then filter to
    # characteristics with at least n years of data
    dplyr::group_by({{ grouping_column }}) |>
    dplyr::mutate(n_years = length(unique(year))) |>
    dplyr::filter(n_years >= min_years) |>
    dplyr::ungroup()
  
  # use plotly to plot all characteristics on same plot
  p <- ggplot2::ggplot(data = plot_dataframe, ggplot2::aes(
    x = {{ date_column }},
    y = {{ numeric_result_column }},
    color = {{ grouping_column }})) +
    ggplot2::geom_line() +
    ggplot2::geom_point() +
    ggplot2::labs(x = "Date", y = "Concentration (ug/l)") +
    ggplot2::theme_minimal() +
    khroma::scale_color_muted()
  
  # Make it an interactive plot!
  pi <- plotly::ggplotly(p)
  
  return(pi)
}

For fun, let’s try making a slightly different timeseries plot for kawishiwi_3.0, using characteristic group rather than characteristic name to visualize water quality through time:

plot_timeseries(
  wqp_dataframe = kawishiwi_3.0,
  unit_column = Result_MeasureUnit,
  date_column = Activity_StartDate,
  result_column = Result_Measure,
  grouping_column = Result_CharacteristicGroup,
  min_years = 10
)

Interactive timeseries scatterplot of constituent groups on the Kawishiwi River using data from WQX3.0. Hover over the points with your mouse to see result details, drag a box around the data to zoom in, and click on and off different constituents in the legend.

Well, would you look at that flexibility? 😎

Hopefully by now you’ve started gaining familiarity with tidy evaluation and feel inspired to give it a try in some of your upcoming analyses. This blog post only scratches the surface, and there is so much more to learn on this topic. If you want to dig deeper, check out the resources below, several of which go into the nitty gritty details of code evaluation.

Helpful resources

Quick tip: writing a package you want to submit to CRAN? Here's something to consider (click to expand)
Tidy evaluation uses syntax like := and .data, which fundamentally belong to the rlang package and are part of the suite of metaprogramming symbols/notations. If these symbols are not explicitly imported from rlang in your package in the roxygen skeleton (ex. @importFrom rlang :=) and rlang is not added as an Import in the DESCRIPTION file, it can lead to issues with R CMD check/devtools::check throwing the note: no visible binding for global variable [insert symbol here]. Additionally, tidyverse's data variables construct can lead R CMD check to throw the "no visible binding" note because it confuses data variables for undefined global objects. One way to avoid this is to declare your data variables as global variables using utils::globalVariables(c("var1", "var2")). Check out this Posit forum for information on importing rlang syntax into your package and this Posit forum for information regarding data variables in an R package.
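A minimal sketch of the two fixes described above (the variable names "year" and "n_years" are placeholders for whichever data variables your package references):

```r
# In one of your package's R files: import the tidy eval operators
# from rlang so R CMD check can find them
#' @importFrom rlang := .data
NULL

# Declare data variables referenced inside dplyr verbs so that
# R CMD check does not flag them as undefined globals
utils::globalVariables(c("year", "n_years"))
```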


Links to some additional reading on tidy evaluation:

References

DeCicco, L.A., Hirsch, R.M., Lorenz, D., Watkins, W.D., Johnson, M., 2024, dataRetrieval: R packages for discovering and retrieving water data available from Federal hydrologic web services, v.2.7.15, doi:10.5066/P9X4L3GE

