Introducing the neonSoilFlux package

Introduction

Welcome to the neonSoilFlux package! This vignette will guide you through the process of using this package to acquire and compute soil CO\(_{2}\) fluxes at different sites in the National Ecological Observatory Network.

You can think about this package working in two primary phases:

  1. acquiring the environment data for a given month at a NEON site (acquire_neon_data). This includes:
    1. Soil temperature at different depths.
    2. Soil water content at different depths.
    3. Soil CO\(_{2}\) concentration.
    4. Atmospheric pressure
    5. Soil properties (bulk density, others)
  2. Given those properties, computing the soil surface fluxes and the associated uncertainty using a variety of methods to compute fluxes (compute_neon_flux).

We split these two functions in order to optimize time and that both were fundamentally different processes. Acquiring the NEON data makes use of the neonUtilities package.

This package takes the guess work out of which data products to collect, hoping to reduce the workflow needed. We rely very much on the tidyverse philosophy for computation and coding here. Figure @ref(fig:diagram) dispalys a conceptual diagram of the processes in neonSoilFlux.

Model diagram for data workflow for the `neonSoilFlux` R package. a) Acquire: Data are obtained from given NEON location and horizontal sensor location, which includes soil water content, soil temperature, CO$_{2}$ concentration, and atmospheric pressure. All data are screened for quality assurance; if gap-filling of missing data occurs, it is flagged for the user. b) Any belowground data are then harmonized to the same depth as CO$_{2}$ concentrations using linear regression. c) The flux across a given depth is computed via Fick's law, denoted with $F_{ijk}$, where $i$, $j$, or $k$ are either 0 or 1 denoting the layers the flux is computed across ($i$ = closest to surface, $k$ = deepest). $F_{000}$ represents a flux estimate where the gradient $dC/dz$ is the slope of a linear regression of CO$_{2}$ with depth.

Model diagram for data workflow for the neonSoilFlux R package. a) Acquire: Data are obtained from given NEON location and horizontal sensor location, which includes soil water content, soil temperature, CO\(_{2}\) concentration, and atmospheric pressure. All data are screened for quality assurance; if gap-filling of missing data occurs, it is flagged for the user. b) Any belowground data are then harmonized to the same depth as CO\(_{2}\) concentrations using linear regression. c) The flux across a given depth is computed via Fick’s law, denoted with \(F_{ijk}\), where \(i\), \(j\), or \(k\) are either 0 or 1 denoting the layers the flux is computed across (\(i\) = closest to surface, \(k\) = deepest). \(F_{000}\) represents a flux estimate where the gradient \(dC/dz\) is the slope of a linear regression of CO\(_{2}\) with depth.

Each NEON site has three measurement layers, so we denote the flux between which two layers as a three-digit subscript \(F_{ijk}\) with indicator variables \(i\), \(j\), and \(k\) indicate if a given layer was used (written in order of increasing depth), according to the following:

Uncertainty in all \(F_{ijk}\) is computed through quadrature (Taylor 2022).

Acquiring environmental data

Load up the relevant libraries:

library(tidyverse)
library(neonSoilFlux)

Let’s say we want to acquire the NEON soil data at the SJER site during the month June in 2021:

out_env_data <- acquire_neon_data(site_name = 'SJER',
                  download_date = '2022-06',
                  )

The output out_env_data for acquire_neon_data is a list of lists:

Two required inputs are needed to run the function acquire_neon_data:

The provisional option allows acquisition of NEON data not in an annual release cycle.

As the data are acquired various messages from the loadByProduct function from the neonUtilities package are shown - this is normal. Products are acquired from each spatial location (horizontalPosition) or vertical depth (verticalPosition) at a NEON site.

Outputs for acquire_neon_data are two nested data frames:

Data preparation

For each data product, the acquire_neon_data function also performs two additional checks:

  • The soil water content data product requires some additional calibration to correct both the soil sensor depth and calibration in the function swc_correct. Information about regarding this correction is found here: LINK. Once updated sensors are installed in the future we will depreciate this function.
  • The actual measurement depth (in meters) is extracted for each position.
  • The monthly mean for each measurement at each depth is computed, described below.

Computing the monthly mean

The monthly mean is utilized when a given measurement fails final QF checks. This function is provided by code from Zoey Werbin.

For a given half-hour, if any input variable \(\mathbf{m}\) (soil CO\(_2\) concentration, soil temperature, or soil moisture) at depth \(z\) is QA flagged. In this situation flagged measurements and their uncertainties were replaced with a bootstrapped monthly mean (\(\overline{m}\)) and monthly standard deviation (\(\overline{s}\)).

For each month, depth \(z\), and variable, we computed bootstrapped estimates of \(\overline{m}\) and \(\overline{s}\) from the vectors of unflagged measurements (\(\mathbf{m}\)), reported standard errors (\(\boldsymbol\sigma\)), and the 95% confidence interval (\(\boldsymbol\epsilon\), or expanded uncertainty). We also defined a bias vector \(\mathbf{b}=\sqrt{\boldsymbol\epsilon^{2}-\boldsymbol\sigma^{2}}\), which quantifies the spread of uncertainty in a given period and is incorporated into \(\overline{m}\). Each of these vectors (\(\mathbf{m}, \boldsymbol\sigma, \boldsymbol\epsilon, \mathbf{b}\)).

From these, 5000 bootstrap samples were generated for \(\mathbf{m}, \boldsymbol\sigma\), and \(\mathbf{b}\). For each sample (\(m_k, b_k, \sigma_k\)), we generated a vector \(\mathbf{n}\) (length \(N=5000\)) by drawing from a normal distribution with mean \(m_k+b_k\) and standard deviation \(\sigma_k\). The sample mean and standard deviation were then computed from \(\mathbf{n}\). The resulting distributions of sample means and sample standard deviations provided the bootstrapped monthly mean (\(\overline{m}\)) and standard error (\(\overline{s}\)) respectively.

If more than 15 days of a measurement are flagged, no gap-filling is conducted.

Visualizing outputs

With the resulting output from acquire_neon_data, you can then unnest the different data frames to make plots, for example:

# Extract data
VSWC_data <- out_env_data$site_data |>
  filter(measurement == 'VSWC') |>
  unnest(cols=c("data"))

# Plot data
VSWC_data |>
  ggplot(aes(x=startDateTime,y=VSWCMean)) +
  geom_point(aes(color=as.factor(VSWCFinalQF))) +
  facet_grid(verticalPosition~horizontalPosition) +
  labs(color = "QF Flags")

Diffusivity calculations

A key factor to consider is the soil diffusivity, \(D_{a}\). Soil diffusivity \(D_{a}\) at a given measurement depth is the product of the diffusivity in free air \(D_{a,0}\) (m\(^{2}\) s\(^{-1}\)) and the tortuosity \(\xi\) (no units) (millingtonDiffusionAggregatedPorous1971?).

We compute \(D_{a,0}\) with Equation @ref(eq:da0):

\[\begin{equation} D_{a,0} = 0.0000147 \cdot \left( \frac{T_{i} + 273.15}{293.15} \right)^{1.75} \cdot \left( \frac{P}{101.3} \right) \label{eq:da0} \end{equation}\]

where \(T_{i}\) is soil temperature (\(^\circ\)C) at depth \(i\) and \(P\) surface barometric pressure (kPa).

At low soil water content, the choice of tortuosity model can lead to order-of-magnitude differences in \(D_{a}\), which in turn affect modeled soil fluxes. The neonSoilFlux package currently includes two approaches to calculate \(\xi\), representing the range of tortuosity behavior reported in (sallamMeasurementGasDiffusion1984?).

The first approach is the Millington-Quirk model (millingtonDiffusionAggregatedPorous1971?), in which tortuosity depends on both porosity and soil water content:

\[\begin{equation} \xi = \frac{(\phi - SWC_{i})^{10/3}}{\phi^{2}} \label{eq:tortuosity-mq} \end{equation}\]

In Equation @ref(eq:tortuosity-mq), \(SWC\) is the soil water content at depth \(i\) and \(\phi\) is the porosity (determined in the data tables soil_megapit):

\[\begin{equation} \phi = \left(1- \frac{\rho_{s}}{\rho_{m}} \right) \left(1-f_{V}\right) \label{eq:porosity} \end{equation}\]

In Equation @ref(eq:porosity), \(\rho_{m}\) is the particle density of mineral soil (2.65 g cm\(^{-3}\)), \(\rho_{s}\) the soil bulk density (g cm\(^{-3}\)) excluding coarse fragments greater than 2 mm and \(f_{V}\) is a site-specific value that accounts for the proportion of soil fragments between 2-20 mm. We assume that rock fragments contain no internal pores.

The Millington-Quirk model assumes \(\xi\) is modulated by the amount of fluid saturation in soil pores (millingtonDiffusionAggregatedPorous1971?). In contrast, the Marshall model (marshallDiffusionGasesPorous1959?) expresses tortuosity as only a function of porosity (\(\xi = \phi^{1.5}\)), with \(\phi\) defined from Equation \(\ref{eq:porosity}\). The Marshall model is independent of soil water content and assumes tortuosity is only governed by soil structure. The neonSoilFlux package allows users to choose the tortuosity model most appropriate for site-specific conditions and research goals.

Computing fluxes

Once we have out_env_data from acquire_neon_flux, we then compute the fluxes at this site:

out_fluxes <- compute_neon_flux(input_site_env = out_env_data$site_data,
                  input_site_megapit = out_env_data$site_megapit
                  )

The resulting data frame out_fluxes is a list of lists, depending on the diffusivity model used to compute fluxes:

Each of the lists have the following variables:

A QF measurement fails when there is a monthly mean could not be computed for a measurement. Note that this would cause all flux calculations to fail at that given horizontal position.

Assessing Environmental QF flags

You can see the distribution the QF flags for each environmental measurement with env_fingerprint_plot:

env_fingerprint_plot(out_fluxes$millington_quirk)

(You can also use env_fingerprint_plot(out_fluxes$marshall), but the environmental data is the same, there should be no difference between the two.)

Explanation of QF check values:

  • “Pass” means that for the given timepoint, the monthly mean was not used or the sensor was not offline. This is the highest quality measurement.
  • “Monthly Mean” means that for the given timepoint the measurement value was replaced by the monthly mean.
  • “Fail” means that no measurement was available. This occurs if there is not sufficient data to compute the monthly mean. When a measurement fails it usually will be for the entire month.

Assessing flux QF flags

Similarly, you can see the distribution of QF flags for each diffusivity and flux computation with flux_fingerprint_plot:

flux_fingerprint_plot(out_fluxes$millington_quirk)

Explanation of QF check values:

  • “Pass” means that for the given timepoint, the computed flux measurement was not NA or positive (the sign of the derived flux conformed to expectations). Monthly means could be used in the computation.
  • “Fail” means that the flux was not computed. This occurs if there is not sufficient data to compute the monthly mean (one environmental measurement was “Fail”), or the computed flux was negative.

Visualizing outputs

To plot the flux results:

out_fluxes$marshall |>   # Can also use millington-quirk
  select(-diffusivity) |>
  unnest(cols=c(flux_compute)) |>
  ggplot(aes(x=startDateTime,y=flux,color=method)) +
    geom_line() +
    facet_wrap(~horizontalPosition,scales = "free_y")

The diffusivity can be plotted similarly:

out_fluxes$marshall |> # Can also use millington-quirk
  select(-flux_compute) |>
  unnest(cols=c(diffusivity)) |>
  ggplot(aes(x=startDateTime,y=diffusivity,color=as.factor(zOffset))) +
  geom_line() +
  facet_wrap(~horizontalPosition,scales = "free_y")  
Hirano, Takashi, Honghyun Kim, and Yumiko Tanaka. 2003. “Long-Term Half-Hourly Measurement of Soil CO2 Concentration and Soil Respiration in a Temperate Deciduous Forest.” Journal of Geophysical Research: Atmospheres 108 (D20). https://doi.org/10.1029/2003JD003766.
Tang, Jianwu, Dennis D Baldocchi, Ye Qi, and Liukang Xu. 2003. “Assessing Soil CO2 Efflux Using Continuous Measurements of CO2 Profiles in Soils with Small Solid-State Sensors.” Agricultural and Forest Meteorology 118 (3): 207–20. https://doi.org/10.1016/S0168-1923(03)00112-6.
Tang, Jianwu, Laurent Misson, Alexander Gershenson, Weixin Cheng, and Allen H. Goldstein. 2005. “Continuous Measurements of Soil Respiration with and Without Roots in a Ponderosa Pine Plantation in the Sierra Nevada Mountains.” Agricultural and Forest Meteorology 132 (3): 212–27. https://doi.org/10.1016/j.agrformet.2005.07.011.
Taylor, John R. 2022. An Introduction to Error Analysis: The Study of Uncertainties in Physical Measurements, Third Edition. 3rd ed. Melville, NY: University Science Press.

mirror server hosted at Truenetwork, Russian Federation.