BayLum: Specification of the YAML config file

Sebastian Kreutzer

Updated for BayLum package version >= 0.3.2 (2024-04-14)

1 Background and scope

In earlier versions of 'BayLum', measurement data and settings had to be prepared in a very particular way, using a multi-folder structure comprising various CSV and BIN/BINX files. This concept proved error-prone and left a lot of frustrated 'BayLum' users behind, who sometimes spent hours trying to understand unclear error messages, only to realise that there was a typo in one of the CSV files or that the folder structure was not precisely what 'BayLum' expected. By switching to a single configuration file, users have more options, while the settings are cleaner and less scattered over different files in numerous subfolders. The parameter naming follows the convention used in the “old” 'BayLum' CSV files.

The purpose of this document is to specify and describe the YAML configuration file (file ending *.yml) used by the function create_DataFile() to provide data input and settings to the 'BayLum' modelling. The YAML file is an alternative to, and a future replacement for, the previous folder structure with various CSV files required by the functions Generate_DataFile() and Generate_DataFile_MG().

2 Key concepts

The configuration file uses the YAML format, which uses indentation to nest parameters. Please see the cited documentation for details; the features that matter for 'BayLum' are illustrated by the examples in the next section.

3 Examples and detailed specifications

3.1 A single sample entry

A single sample entry appears as follows:

- sample: "samp1"
  files: 
    - "/yourhardrive/yourfolder/sample_one.binx"
  settings:
    dose_points: null
    dose_source: { value: 0.1535, error: 0.00005891 }
    dose_env: { value: 2.512, error: 0.05626 }
    rules:
        beginSignal: 6
        endSignal: 8
        beginBackground: 50
        endBackground: 55
        beginTest: 6
        endTest: 8
        beginTestBackground: 50
        endTestBackground: 55
        inflatePercent: 0.027
        nbOfLastCycleToRemove: 1

Each new record (aka sample) starts with a - and follows the indentation shown above. If you keep these simple rules in mind, you will have an easy time preparing your 'BayLum' analysis.
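To see how such a record looks once it has been read into R, the following minimal sketch parses the example with yaml::yaml.load() from the R package 'yaml' (the same package 'BayLum' relies on for reading the file). The object names config_text and config are chosen for this illustration only, and the file path is the placeholder from the example.

```r
# The single record from above as YAML text
# (the file path is the placeholder used in the example)
config_text <- '
- sample: "samp1"
  files:
    - "/yourhardrive/yourfolder/sample_one.binx"
  settings:
    dose_points: null
    dose_source: { value: 0.1535, error: 0.00005891 }
    dose_env: { value: 2.512, error: 0.05626 }
    rules:
        beginSignal: 6
        endSignal: 8
        beginBackground: 50
        endBackground: 55
        beginTest: 6
        endTest: 8
        beginTestBackground: 50
        endTestBackground: 55
        inflatePercent: 0.027
        nbOfLastCycleToRemove: 1
'

# yaml.load() parses YAML text; yaml::read_yaml() does the same for a file
config <- yaml::yaml.load(config_text)

length(config)                          # number of records
config[[1]]$sample                      # name of the first sample
config[[1]]$settings$dose_source$value  # source dose rate
names(config[[1]]$settings$rules)       # parameters on the rules level
```

Each record becomes one element of an unnamed list, and the indentation levels of the YAML text turn into nested list levels.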

3.2 Multiple records

While a single record is the simplest case, you probably have more than one sample to feed into the modelling. Although the number of records is not limited, we keep it simple here and show an entry with two records (dots replace entries already shown above):

# this is a comment for sample number one
- sample: "samp1"
  files: 
    - "/yourhardrive/yourfolder/sample_one.binx"
  settings:
    dose_points: null
    dose_source: { value: 0.1535, error: 0.00005891 }
    dose_env: { value: 2.512, error: 0.05626 }
    rules:
        beginSignal: 6
        .
        .
        .
        nbOfLastCycleToRemove: 1
# this is a comment for sample number two
- sample: "samp2"
  files: null
  settings:
    dose_points: null
    dose_source: { value: 0.1535, error: 0.00005891 }
    dose_env: { value: 2.512, error: 0.05626 }
    rules:
        beginSignal: 6
        .
        .
        .
        nbOfLastCycleToRemove: 1

As you can see from the example, you can also add comments to the records; comments start with #. The two records also show different entries for the parameter files. In the first record, a file path is given, which specifies where the measurement data can be found. In the second record, files is set to null; in this case it is assumed that an R object with the name samp2 can be found in the global environment of your R session. Both options are possible.

3.3 Parameter specification

3.3.1 Top level (sample, files)

The top level has two parameters:

3.3.1.1 sample

This parameter specifies the name of the sample. This name must be unique and is ideally free of non-ASCII characters and white space.
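The naming recommendations can be checked before the modelling is started. The sketch below is one possible way to do this in base R; check_sample_names() is a hypothetical helper written for this illustration and is not part of 'BayLum'.

```r
# Hypothetical helper (not part of 'BayLum'): flag problematic sample names
check_sample_names <- function(config) {
  nms <- vapply(config, function(rec) as.character(rec$sample), character(1))
  list(
    # names that occur more than once
    duplicated = unique(nms[duplicated(nms)]),
    # names with characters outside the printable ASCII range
    non_ascii  = nms[grepl("[^\\x20-\\x7E]", nms, perl = TRUE)],
    # names that contain spaces, tabs, or similar
    whitespace = nms[grepl("[[:space:]]", nms)]
  )
}

# Shortened records, reduced to the sample name for this illustration
config <- list(
  list(sample = "samp1"),
  list(sample = "samp 2"),
  list(sample = "samp1")
)

check_sample_names(config)
```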

3.3.1.2 files

This parameter is either null or followed by a list of entries, each starting with - and giving the path to a measurement file in quotes. Apart from dose_points on the settings level, files is the only parameter that can be set to null. The number of entries under files is not limited. Example:

  files: 
    - "/yourhardrive/yourfolder/sample_one_a.binx"
    - "/yourhardrive/yourfolder/sample_one_b.binx"

If the entry is null, the function BayLum::create_DataFile(), which uses the settings from the YAML file, will assume that R objects with the name specified in sample are available in the global environment of the session. For instance, the files can be imported and treated beforehand with Luminescence::read_BIN2R(...) |> subset(...) or similar. Setting files to null gives you all options to pre-process your measurement data and is the recommended mode of operation.

If files comes with file path entries, then BayLum::create_DataFile() will try to import those files using the appropriate import functions. This is very convenient. However, except for minimal filtering (e.g., removing non-OSL and non-IRSL curves), the measurement data remain untreated, and BayLum::create_DataFile() expects that all data are complete (e.g., identical number of curves), free of errors, and strictly follow the SAR structure.
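Before running the modelling, it can help to verify that every record can actually find its data. The following sketch uses only base R (file.exists() and exists()); check_data_source() is a hypothetical helper for this illustration, not a 'BayLum' function, and the object assigned to samp2 is merely a stand-in for pre-processed measurement data.

```r
# Hypothetical helper (not part of 'BayLum'): report where each record
# expects its data and whether that source is available
check_data_source <- function(config, envir = globalenv()) {
  do.call(rbind, lapply(config, function(rec) {
    if (is.null(rec$files)) {
      # files: null -> an R object named like the sample is expected
      data.frame(sample = rec$sample, source = "R object",
                 available = exists(rec$sample, envir = envir))
    } else {
      # files given -> every listed path should exist
      data.frame(sample = rec$sample, source = "file",
                 available = all(file.exists(rec$files)))
    }
  }))
}

# Shortened records for this illustration (path is the placeholder from above)
config <- list(
  list(sample = "samp1", files = "/yourhardrive/yourfolder/sample_one.binx"),
  list(sample = "samp2", files = NULL)
)

# Stand-in for data prepared beforehand, e.g. with
# Luminescence::read_BIN2R(...) |> subset(...); in an interactive session,
# 'samp2 <- ...' at the prompt has the same effect
assign("samp2", list(), envir = globalenv())

check_data_source(config)
```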

3.3.2 settings level

The settings level allows you to specify the dose rate of your source used for the irradiation in Gy/s (dose_source) and the environmental dose rate in Gy/ka (dose_env). Each value needs to be provided with its uncertainty, as shown in the example:

  settings:
    dose_points: null
    dose_source: { value: 0.1535, error: 0.00005891 }
    dose_env: { value: 2.512, error: 0.05626 }

Additionally, you can specify the regeneration dose points (in s). The default is null, because irradiation times are automatically extracted from the data by create_DataFile(). However, this information might be missing or, more likely, wrong, and it is very cumbersome to fix those numbers manually in the measurement data. Therefore, the dose points can be provided with the config file:

  settings:
    dose_points: [10, 20, 50, 0, 10]
    dose_source: { value: 0.1535, error: 0.00005891 }
    dose_env: { value: 2.512, error: 0.05626 }

The example corresponds to 5 (five) regeneration dose points of 10 s, 20 s, 50 s, 0 s, and 10 s.
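Because dose_source is given in Gy/s and the dose points in s, the corresponding regeneration doses follow from a simple multiplication. The lines below only illustrate this arithmetic with the example values; the variable names are chosen for this sketch, and the propagation of uncertainties is left out.

```r
# Values taken from the example configuration above
dose_points <- c(10, 20, 50, 0, 10)  # irradiation times in s
dose_source <- 0.1535                # source dose rate in Gy/s

# Regeneration doses in Gy (irradiation time x dose rate)
doses <- dose_points * dose_source
print(doses)
```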

Note

  • You should add the values as you have specified them in the measurement sequence, except for the natural dose point (0 s) and the test dose points, which must not be added.

  • The provided vector will be shortened automatically to fit the actual number of dose points.

  • An error will be thrown if you do not provide enough dose points.
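The last two points can be pictured with a small stand-alone function. This is a sketch of the described behaviour only, not the code used by 'BayLum': fit_dose_points() and n_required are hypothetical names, and it is an assumption of this sketch that shortening means dropping surplus values at the end of the vector.

```r
# Hypothetical helper (not 'BayLum' code): mimic the documented length rules
fit_dose_points <- function(dose_points, n_required) {
  if (length(dose_points) < n_required) {
    stop("not enough dose points: ", length(dose_points),
         " provided, ", n_required, " required")
  }
  # Assumption of this sketch: surplus values at the end are dropped
  dose_points[seq_len(n_required)]
}

# More values than needed: the vector is shortened
fit_dose_points(c(10, 20, 50, 0, 10), n_required = 4)

# Too few values: an error is raised (caught here to show the message)
tryCatch(
  fit_dose_points(c(10, 20), n_required = 4),
  error = function(e) conditionMessage(e)
)
```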

3.3.3 rules level

The rules level enables you to provide a set of parameters, which are used in the Bayesian modelling.

PARAMETER              TYPE     COMMENT
beginSignal            integer  Channel number start OSL signal integral (\(L_x\))
endSignal              integer  Channel number end OSL signal integral (\(L_x\))
beginBackground        integer  Channel number start OSL background integral (\(L_x\))
endBackground          integer  Channel number end OSL background integral (\(L_x\))
beginTest              integer  Channel number start OSL signal integral (\(T_x\))
endTest                integer  Channel number end OSL signal integral (\(T_x\))
beginTestBackground    integer  Channel number start OSL background integral (\(T_x\))
endTestBackground      integer  Channel number end OSL background integral (\(T_x\))
inflatePercent         double   Additional overdispersion value to inflate the uncertainty in percentage
nbOfLastCycleToRemove  integer  Number of SAR cycles to be removed from the measurement file

Example:

   rules:
        beginSignal: 6
        endSignal: 8
        beginBackground: 50
        endBackground: 55
        beginTest: 6
        endTest: 8
        beginTestBackground: 50
        endTestBackground: 55
        inflatePercent: 0.027
        nbOfLastCycleToRemove: 1

Please ensure that the set values correspond to your measurement data. For instance, if your OSL curve has only 100 channels (data points), it does not make sense to set larger integral settings (e.g., 1000), and such a setting will lead to an error. Integral values for (\(L_x\)) and (\(T_x\)) are usually set to identical values unless you have good reasons to use different integral settings.
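Such mismatches can be caught early with a simple plausibility check. The sketch below is one possible approach in base R; check_rules() and n_channels are hypothetical names introduced for this illustration, and the checks are not the ones 'BayLum' performs internally.

```r
# Hypothetical helper (not part of 'BayLum'): sanity checks for integral limits
check_rules <- function(rules, n_channels) {
  limits <- unlist(rules[c("beginSignal", "endSignal",
                           "beginBackground", "endBackground",
                           "beginTest", "endTest",
                           "beginTestBackground", "endTestBackground")])
  problems <- character(0)

  # No limit may lie beyond the last channel of the curve
  too_large <- limits > n_channels
  if (any(too_large)) {
    problems <- c(problems, paste("beyond last channel:",
                                  paste(names(limits)[too_large], collapse = ", ")))
  }

  # Each begin value must not exceed its matching end value
  begins <- limits[grepl("^begin", names(limits))]
  ends   <- limits[grepl("^end", names(limits))]
  if (any(begins > ends)) {
    problems <- c(problems, "a begin value is larger than its end value")
  }
  problems
}

# Values from the example above
rules <- list(
  beginSignal = 6, endSignal = 8, beginBackground = 50, endBackground = 55,
  beginTest = 6, endTest = 8, beginTestBackground = 50, endTestBackground = 55,
  inflatePercent = 0.027, nbOfLastCycleToRemove = 1
)

check_rules(rules, n_channels = 100)  # no problems reported

rules$endBackground <- 1000           # larger than the curve has channels
check_rules(rules, n_channels = 100)
```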

3.4 Final remarks

3.5 Auto-generate the config file using write_YAMLConfigFile()

To ease the generation of configuration files for many samples, you can use the function write_YAMLConfigFile().

The function has two different operation modes, which are shown below. Note that the function does not appear to have named parameters, because all parameters are extracted from a reference file within the package. All parameters in the reference file are allowed. A preset value applies to all records; the only exception is the parameter sample, whose length (e.g., write_YAMLConfigFile(sample = c("a1", "a2"))) determines the number of records in the configuration file output.

3.5.1 Show available parameters

In this mode, the function displays available parameters in the terminal and returns a list that can be modified in R and then passed to create_DataFile():

l <- write_YAMLConfigFile()
── Allowed function parameters (start) ─────────────────────────────────────────────────────────────────────────────────
sample
files
settings.dose_points
settings.dose_source.value
settings.dose_source.error
settings.dose_env.value
settings.dose_env.error
settings.rules.beginSignal
settings.rules.endSignal
settings.rules.beginBackground
settings.rules.endBackground
settings.rules.beginTest
settings.rules.endTest
settings.rules.beginTestBackground
settings.rules.endTestBackground
settings.rules.inflatePercent
settings.rules.nbOfLastCycleToRemove
── Allowed function parameters (end) ───────────────────────────────────────────────────────────────────────────────────
str(l)
List of 1
 $ :List of 3
  ..$ sample  : chr "reference"
  ..$ files   : NULL
  ..$ settings:List of 4
  .. ..$ dose_points: NULL
  .. ..$ dose_source:List of 2
  .. .. ..$ value: int 0
  .. .. ..$ error: int 0
  .. ..$ dose_env   :List of 2
  .. .. ..$ value: int 0
  .. .. ..$ error: int 0
  .. ..$ rules      :List of 10
  .. .. ..$ beginSignal          : int 0
  .. .. ..$ endSignal            : int 0
  .. .. ..$ beginBackground      : int 0
  .. .. ..$ endBackground        : int 0
  .. .. ..$ beginTest            : int 0
  .. .. ..$ endTest              : int 0
  .. .. ..$ beginTestBackground  : int 0
  .. .. ..$ endTestBackground    : int 0
  .. .. ..$ inflatePercent       : int 0
  .. .. ..$ nbOfLastCycleToRemove: int 0

3.5.2 Write YAML file

Alternatively, the function can be used to generate a config file with preset values. You can then modify the generated YAML file with any text editor.

l <- write_YAMLConfigFile(output_file = "<your filepath>")
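A call with preset values might look like the following sketch. It assumes that 'BayLum' (>= 0.3.2) is installed and that presets are passed under the parameter names listed in the output of write_YAMLConfigFile() shown earlier; the sample names and values are arbitrary, and a temporary file stands in for your own file path.

```r
# Runs only if 'BayLum' is installed
if (requireNamespace("BayLum", quietly = TRUE)) {
  out <- tempfile(fileext = ".yml")  # stand-in for your own file path

  BayLum::write_YAMLConfigFile(
    output_file = out,
    sample = c("a1", "a2"),          # two names -> two records
    settings.rules.beginSignal = 6,  # preset applied to all records
    settings.rules.endSignal = 8
  )

  # Show the generated file
  cat(readLines(out), sep = "\n")
}
```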

3.6 Internals

The YAML settings file is loaded and processed by the function BayLum::create_DataFile() using yaml::read_yaml() from the R package 'yaml'. yaml::read_yaml() returns a list in R, which is then processed by BayLum::create_DataFile(). Sometimes it makes sense to modify the settings on the fly in R. To avoid import and export of YAML files, BayLum::create_DataFile() always tries to process the input of its parameter config_file as a list before trying to load a YAML file from the hard drive. While this option is usually unnecessary, this information may help in more complex R scripts.
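Modifying the settings on the fly could look like the sketch below. The records are shortened to what the illustration needs and are therefore not complete configurations; the final call to BayLum::create_DataFile() is left as a comment, because it additionally requires R objects named samp1 and samp2 in the global environment.

```r
# Shortened two-record configuration (incomplete, for illustration only)
config <- yaml::yaml.load('
- sample: "samp1"
  files: null
  settings:
    rules:
        nbOfLastCycleToRemove: 1
- sample: "samp2"
  files: null
  settings:
    rules:
        nbOfLastCycleToRemove: 1
')

# Change one setting for all records without touching any file
config <- lapply(config, function(rec) {
  rec$settings$rules$nbOfLastCycleToRemove <- 2L
  rec
})

# Inspect the modified settings as YAML text
cat(yaml::as.yaml(config))

# With complete records, the list could be passed in place of a file path:
# BayLum::create_DataFile(config_file = config)
```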
