
Workflow examples for avstrat

This vignette walks through a typical workflow for using the avstrat package. As a starting point for a project workflow, I suggest you copy the contents of this markdown file as a new document in your own project folder. I like to set up projects with a structure similar to packages, with this markdown file as my primary code workflow and a subfolder called ‘data’ where I store all the data that I’ll be analyzing.

Setup

To start, you’ll need to install any packages that you want to use. You only need to run this once, or again when you want the latest package updates. For current installation instructions, see the package README. The two options below will only work once the package is public.

#install the stable CRAN release:
#install.packages("avstrat") # this will only work once it is on CRAN

#install the development version from gitlab using the devtools package:
devtools::install_gitlab("vsc/tephra/tools/avstrat", 
                        host = "code.usgs.gov", 
                        build_vignettes = TRUE)

# Other packages you'll want installed if you don't have them already
install.packages("tidyverse")
install.packages("readxl")

You might need other packages. If you are using RStudio, it will usually alert you at the top of the editor to any packages you don’t have installed.

Every time you load R you’ll want to load the avstrat package. For this vignette, I also suggest loading a few others.

library(readxl)
#> Warning: package 'readxl' was built under R version 4.4.3
library(ggplot2)
#> Warning: package 'ggplot2' was built under R version 4.4.3
library(patchwork)
#> Warning: package 'patchwork' was built under R version 4.4.3
library(avstrat)

I also suggest overriding the default ‘ggplot2’ theme (because it’s ugly); the package provides a theme that works well with its figures, but you can use any theme you like.

theme_set(theme_avstrat())

Data import

The first step is to bring stratigraphy data into the coding environment. There is more than one way to prepare your data, and avstrat has functions for loading data from .xlsx files that follow one of two templates, described in the next sections. There is also an example dataset, example_data_strat, included with avstrat that you can view and use as a model for your own upload routine.
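
You can get a quick look at the target structure from the bundled dataset itself. This is just a minimal sketch; str() works whether example_data_strat is a single table or a list of tables.

# Peek at the structure of the bundled example dataset
str(example_data_strat)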

GeoDIVA forms

The Alaska Volcano Observatory has a mature database that includes stratigraphic layer data, “GeoDIVA”. This database has established data upload forms, which avstrat can load directly. These templates provide strict data validation fields that help ensure your data are compatible with plotting and database submission.

To copy the example templates into your working directory:

file.copy(path_samples, "example_samples_stations_upload_2024.xlsx")
file.copy(path_layers, "example_layers_upload_2024.xlsx")

I suggest putting these in a “data” folder in your project, similar to an R package structure. Now, you can read these files into the environment with readxl::read_xlsx().

station_sample_upload <- readxl::read_xlsx(path_samples, sheet = "Data")
#> New names:
#> • `SampleID` -> `SampleID...3`
#> • `sample_parent_id` -> `sample_parent_id...4`
#> • `SampType1` -> `SampType1...21`
#> • `SampType2` -> `SampType2...22`
#> • `` -> `...35`
#> • `` -> `...36`
#> • `SampleID` -> `SampleID...37`
#> • `sample_parent_id` -> `sample_parent_id...38`
#> • `SampType1` -> `SampType1...39`
#> • `SampType2` -> `SampType2...40`
layer_upload <- readxl::read_xlsx(path_layers, sheet = "Data")

Now these files need to be combined into a format that is uglier, but ready for avstrat.

mydata <- load_geodiva_forms(station_sample_upload = station_sample_upload,
                             layer_upload = layer_upload)
#> Imported sections:
#> 17KWLCI025
#> 21DVHD01
#> 21DVML04
#> 21DVML05
#> 21DVML08
#> 21LSHD02
#> fake1
#> fake2
#> fake3

You can see you successfully loaded 9 example sections that come with the package.

Individual upload

If you are not already uploading to GeoDIVA, you may prefer to use a more elegant upload. This is my preferred approach to preparing data for formal data releases. Each tab of a .xlsx spreadsheet contains a single data type, such as “stations” or “samples”. The tables are linked in a quasi-database style, with a linking column such as “station_id” denoting the station (location information) where a sample was collected. This way, if you collect multiple samples at a single station, you only have to list the station metadata once. This is especially handy for stratigraphic “layers”, where you always have many layers for a single section and station.

To copy an example of these options into your working directory:

file.copy(path_indiv, "example_inputs.xlsx")

There are several options in this example template:

  1. stations: a table of stations and location metadata.

  2. sections: a table of all sections with section metadata; each section must be linked to a station_id.

  3. stations_sections: since you typically have only a single section at each station, this tab gives you the option of merging stations and sections into one table.

  4. layers: a table with stratigraphic layer metadata. Each layer must be linked to a section_id.

  5. layers_sample: in this option, samples from a layer are noted in a nested list within the layer row, with multiple samples separated by a pipe “|”. This follows the GeoDIVA upload format. I personally DO NOT like this approach.

  6. samples: samples and sample metadata. Each sample should be linked to a station_id.

  7. samples_layer: this is my preferred approach to uploading samples; simply link the sample metadata to a station_id and layer_id.

To read this example data into your environment and the avstrat format:

stations_data <- readxl::read_xlsx(path_indiv, sheet = "stations")
sections_data <- readxl::read_xlsx(path_indiv, sheet = "sections")
layers_data <- readxl::read_xlsx(path_indiv, sheet = "layers")
samples_data <- readxl::read_xlsx(path_indiv, sheet = "samples_layer")


mydata <- load_stratdata_indiv(
  stations_upload = stations_data,
  sections_upload = sections_data,
  layers_upload = layers_data,
  samples_upload = samples_data
)
#> Imported sections:
#> 17KWLCI025
#> 21DVHD01
#> 21DVML04
#> 21DVML05
#> 21DVML08
#> 21LSHD02
#> fake1
#> fake2
#> fake3

This is one example of how to use this upload, but you can mix it up. For example, if you keep your sections with the station metadata, simply pass the “stations_sections” table to both “stations_upload” and “sections_upload”, as sketched below.
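
Here is a minimal sketch of that variant, assuming the combined “stations_sections” tab from the example template:

# Read the combined stations + sections tab from the example template
stations_sections_data <- readxl::read_xlsx(path_indiv, sheet = "stations_sections")

# Pass the same table to both the stations and sections arguments
mydata_combined <- load_stratdata_indiv(
  stations_upload = stations_sections_data,
  sections_upload = stations_sections_data,
  layers_upload = layers_data,
  samples_upload = samples_data
)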

Basic plotting

You can make a simple grainsize-depth section with either increasing or decreasing grainsize on the x-axis. You can also make a simple column with no grainsize data.

grainsize_increasing <- ggstrat(df = mydata, 
                                section_name = "21LSHD02")
grainsize_decreasing <- ggstrat(df = mydata, 
                                section_name = "21LSHD02", 
                                grainsize_direction = "decreasing")
grainsize_column <- ggstrat_column(df = mydata, 
                                section_name = "21LSHD02")
# Combine all 3 plots using Patchwork
grainsize_increasing + grainsize_decreasing + grainsize_column

The ggstrat() function also lets you choose different labels for the x-axis with grainsize_labs, zoom in on part of the section manually with ylim, use a different color palette for the layers with layer_fill_color, and map the layer color to a different categorical value with layer_fill (although you’ll need to supply a new fill palette), as well as other options. Below you can see the utility of the ylim option for seeing an area of small layers.
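
For example, here is a sketch of zooming in with ylim. I’m assuming here that ylim takes a two-value depth range; the window below is arbitrary, so pick one that suits your section.

# Zoom in on part of the section (hypothetical depth window)
grainsize_zoomed <- ggstrat(df = mydata, 
                            section_name = "21LSHD02", 
                            ylim = c(0, 50))
grainsize_increasing + grainsize_zoomed

You can also place a panel of sample labels next to the section with ggstrat_label():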

sample_label <- ggstrat_label(df = mydata, 
                                section_name = "21LSHD02")
grainsize_increasing + sample_label
#> Warning in grid.Call.graphics(C_text, as.graphicsAnnot(x$label), x$x, x$y, :
#> font family not found in Windows font database

Interactive app

For most projects, you might want to explore your sections spatially on a map. avstrat has an interactive Shiny app that lets you do this. The map is generated with the leaflet package, and when you click on a station, it will display the strat section on the right. Be sure to close the app popup window before coming back to this notebook!

run_ggstrat_app(df = mydata)

The app defaults to a combined ggstrat() + ggstrat_label() plot showing SampleID, but you can supply any plot function, including a custom function of your own design! Here is an example.

# A custom plot for the app: patchwork combines a grainsize section with two
# label panels. Each theme(plot.margin = ...) call adjusts the margins of the
# panel added just before it, and plot_layout()/& control the shared legend.
mystratplot <- function(df, 
                        section_name) {
  ggstrat(df = df, section_name = {{ section_name }}) +
    theme(plot.margin = unit(c(0.1, 0.1, 0.1, 0.1), "cm")) +
    ggstrat_label(df = df, section_name = {{ section_name }}, 
                  label = "SampleID") +
    theme(plot.margin = unit(c(0.1, 1.5, 0.1, 0.1), "cm")) +
    ggstrat_label(df = df, section_name = {{ section_name }}, 
                  label = "volcano_name") +
    theme(plot.margin = unit(c(0.1, 0.5, 0.1, 0.1), "cm")) +
    plot_layout(guides = 'collect') & theme(legend.position = 'bottom')
}

run_ggstrat_app(df = mydata, plot = mystratplot)

Saving plots

Individual plots can be saved with ggsave(), which allows you to define the plot size, the file type (e.g., png or pdf), and resolution.

ggsave(plot = grainsize_increasing, 
       filename = "mytestsection.png",
       width = 3,
       height = 6,
       units = "in",
       dpi = 300)

More likely, once you have a plot you’re happy with, you’ll want to save all of your plots at once, and there is a handy helper function for that. By default it saves the plots in a folder called StratSectionsPlotted. Note that you will need to go to the Console and enter “Y” after you run this function; this confirmation is there so you don’t accidentally run it on a dataset of thousands of sections.

ggstrat_bulk_save(df = mydata,
                  plotfunction = ggstrat_samples,
                  outdir = "StratSectionsPlotted",
                  file_type = "png",
                  width = 6,
                  height = 6,
                  units = "in")

Conclusions

Hopefully this provides a basic workflow that will help you with your own project! The plotting functions internally run some data processing functions, specifically add_depths(), which standardizes the depth data for plotting, and add_layer_width(), which takes grainsize observations and prepares them for making the polygons of ggstrat(). You can use these to make your own custom plots. If your plots seem widely useful, please pass them on so we can incorporate them into this package!
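
Here is a rough sketch of that idea, assuming (as with the plotting functions) that both helpers take the avstrat data object as their first argument and return an augmented version of it:

# Hypothetical usage: standardize depths, then compute layer polygon widths
mydata_depths <- add_depths(mydata)
mydata_widths <- add_layer_width(mydata_depths)
# The result could then feed a custom ggplot2 figure of your own design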
