Database / indexing layer

# Minimal executable example — selectRecords() works entirely in memory
library(gmsp)
library(data.table)
#> 
#> Attaching package: 'data.table'
#> The following object is masked from 'package:base':
#> 
#>     %notin%
master <- data.table(
  RecordID  = c("aabbccdd00112233", "aabbccdd00112233", "eeff00112233aabb"),
  OwnerID   = c("NGAW", "NGAW", "CESMD"),
  EventID   = c("20100227T063452Z", "20100227T063452Z", "20110311T054624Z"),
  StationID = c("ANTU", "ANTU", "MYG004"),
  DIR       = c("H1", "H2", "H1"),
  EventMagnitude = c(8.8, 8.8, 9.1),
  Repi      = c(90, 90, 140)
)
sel <- selectRecords(master[EventMagnitude > 8 & DIR == "H1"])
print(sel)
#>            RecordID OwnerID          EventID StationID
#>              <char>  <char>           <char>    <char>
#> 1: eeff00112233aabb   CESMD 20110311T054624Z    MYG004
#> 2: aabbccdd00112233    NGAW 20100227T063452Z      ANTU

gmsp ships an optional layer for managing a local strong-motion record archive. It is separate from the signal-processing core (AT2TS, TS2IMF, TSL2PS, getIntensity) — you can use the core without ever touching the indexing layer.

The indexing layer assumes that records are stored on disk in a fixed directory structure. You choose the base paths; every function that touches disk takes an explicit path, path.records, or path.index argument.

Expected file layout

<recordsDir>/                                      ← you choose this
  <OwnerID>/                                       e.g. "NGAW", "CESMD", "ESM"
    <EventID>/                                     e.g. "20060803T030800Z"
      <StationID>/                                 e.g. "NTYB"
        raw.owner/                                 provider files as downloaded
          record.json                              owner-supplied metadata
          <component-files>                        .AT2 / .v2 / .ac / .tr / ...
        raw/                                       gmsp output of extractRecord()
          AT.<RecordID>.csv                        WIDE: provider OCID columns (scaled to mm)
          AT.<RecordID>.json                       DIR / OCID / NP / PGA / dt / Fs / Units

<indexDir>/                                        ← you choose this
  RawFileTable.<OwnerID>.csv                       provider file inventory
  RawRecordTable.<OwnerID>.csv                     one row per RecordID
  RawIntensityTable.<OwnerID>.csv                  per (RecordID, DIR), 20 IM scalars
  EventTable.<OwnerID>.csv                         event metadata
  StationTable.<OwnerID>.csv                       station metadata

<selectionDir>/                                    ← you choose this
  <name>.csv                                       writeSelection() output
  <name>.json                                      sidecar with audit metadata
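
As a rough illustration of the layout above, the following base-R sketch assembles the paths for one record. The helper name record_paths_sketch() and its arguments are hypothetical and not part of gmsp; the example IDs are taken from the layout or invented for illustration.

```r
# Hypothetical helper (not a gmsp function): build the expected paths for
# one record under the layout described above.
record_paths_sketch <- function(recordsDir, OwnerID, EventID, StationID,
                                RecordID, kind = "AT") {
  station <- file.path(recordsDir, OwnerID, EventID, StationID)
  list(
    raw_owner = file.path(station, "raw.owner"),
    csv       = file.path(station, "raw", paste0(kind, ".", RecordID, ".csv")),
    json      = file.path(station, "raw", paste0(kind, ".", RecordID, ".json"))
  )
}

p <- record_paths_sketch("records", "NGAW", "20060803T030800Z", "NTYB",
                         RecordID = "aabbccdd00112233")
p$csv
#> [1] "records/NGAW/20060803T030800Z/NTYB/raw/AT.aabbccdd00112233.csv"
```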

Provider formats supported

OwnerID  Format        Parser        Quantity  Notes
NGAW     AT2           readAT2()     AT        PEER NGA-West2 (4-line header, NPTS/DT)
CESMD    V2 / V2c      readV2()      AT        multi-channel V2 or single-channel V2c
NWZ      V2A           readV2A()     AT        NWZ-flavoured V2
GSC      TR (A/B/C/Z)  readTR()      AT        Geological Survey of Canada
IGP      ACA / LIS     readAC()      AT        Instituto Geofísico del Perú
UCR      ACB           readAC()      AT        Universidad de Costa Rica
Generic  two-col       readTwoCol()  AT        (t, s) ASCII columns; used by CAL, CENA, etc.
ISEE     ISEE          readISEE()    VT        Micromate / ISEE blasting seismograph (mm/s velocity, MicL dropped)

Each parser returns a LONG data.table(t, OCID, s) for one component file. parseRecord() is the dispatcher that consults .OWNER_FORMAT and calls the right parser for the owner.
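
To make the LONG shape concrete, here is a small sketch that builds a table with the same three columns by hand and reshapes it to the WIDE form (one column per OCID) with data.table::dcast(). The OCID values "HNE" and "HNN" and the sample values are invented for illustration; this is not how gmsp itself performs the conversion.

```r
library(data.table)

# LONG shape: one row per (t, OCID) pair, as returned by a parser.
long <- data.table(
  t    = rep(c(0, 0.01, 0.02), times = 2),
  OCID = rep(c("HNE", "HNN"), each = 3),   # illustrative channel codes
  s    = c(0.1, 0.3, -0.2, 0.0, 0.5, 0.4)
)

# WIDE shape: one row per time step, one column per OCID.
wide <- dcast(long, t ~ OCID, value.var = "s")
names(wide)
#> [1] "t"   "HNE" "HNN"
```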

Extraction pipeline

parseRecord()       ── reads raw.owner/* via the owner's parser
   │                   returns LONG (t, OCID, s) for all components
   ▼
mapComponents()     ── derives DIR labels H1 / H2 / UP from provider OCIDs
   │                   H1/H2 are derived processing directions
   │                   `extractRecord()` uses rotate = FALSE
   │                   Returns NULL for arrays or 2-comp records
   ▼
alignComponents()   ── pads (or truncates) to equal NP across components
   │
   ▼
extractRecord()     ── scales to canonical mm via .parseUnits + .getSF
                       writes raw/<KIND>.<RecordID>.csv + <KIND>.<RecordID>.json
                       CSV columns remain provider OCID values; the JSON
                       sidecar stores the DIR -> OCID mapping.
                       KIND ∈ {AT, VT, DT} -- derived from the Units
                       suffix by .parseKind(), or forced by the
                       `kind = "VT"` argument (e.g. for blasting
                       records whose Units may be missing).
                       Sidecar peak field is named accordingly:
                       PGA (KIND=AT) / PGV (KIND=VT) / PGD (KIND=DT).
                       RecordID = first 16 hex chars of md5(CSV).

extractRecord() is the orchestrator; parsers and mapComponents() are public so they can be reused or audited. Public calls use parseRecord(.x, path) and extractRecord(.x, path), where .x is the one-record master subset and path is the records root.
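
Two steps of the pipeline can be sketched in base R. Both helpers below are hypothetical stand-ins, not gmsp functions: the first shows zero-padding only (the real alignComponents() may also truncate), and the second assumes the RecordID hash is taken over the bytes of the written CSV file.

```r
# Hypothetical helper: pad shorter components with trailing fill values so
# that every component has the same number of points (NP).
pad_components_sketch <- function(components, fill = 0) {
  np <- max(lengths(components))
  lapply(components, function(s) c(s, rep(fill, np - length(s))))
}

# Hypothetical helper: first 16 hex characters of the file's md5 digest.
# Assumption: the digest is computed over the CSV file as written to disk.
record_id_sketch <- function(csv_file) {
  substr(unname(tools::md5sum(csv_file)), 1, 16)
}

padded <- pad_components_sketch(list(H1 = c(1, 2, 3), H2 = c(4, 5)))
lengths(padded)   # both components now have 3 points

tmp <- tempfile(fileext = ".csv")
write.csv(as.data.frame(padded), tmp, row.names = FALSE)
id <- record_id_sketch(tmp)
nchar(id)         # 16
```

Because the ID is derived from file content, two byte-identical CSVs would receive the same RecordID under this assumption.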

Indexing tables

After extractRecord() has produced raw/ outputs for some records, the indexing functions scan the records tree and emit the per-owner CSVs listed under <indexDir>/ in the expected file layout.

The provider-flatfile + USGS catalog join (buildEventTable()) is under development and ships in inst/dev/; it is not yet part of the exported API.

Master record catalog

buildMaster() joins, per owner:

and emits a data.table keyed by (RecordID, DIR). It adds:

After buildMaster(), filter the master and pass the subset to selectRecords() to produce a selection with the columns (RecordID, OwnerID, EventID, StationID). This selection is the input contract for the readTS() family and for writeSelection(), which persists the selection to disk for orchestration. readAT(), readVT(), and readDT() are KIND-specific wrappers around readTS(.x, path, kind = ...).
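
The shape of that contract can be illustrated with plain data.table operations. The sketch below is not the implementation of selectRecords(); it only shows how a master keyed by (RecordID, DIR) collapses to one row per record once the four identifier columns are kept. The table contents are invented for illustration.

```r
library(data.table)

# Toy master: two directions for the first record, one for the second.
master <- data.table(
  RecordID  = c("aabbccdd00112233", "aabbccdd00112233", "eeff00112233aabb"),
  OwnerID   = c("NGAW", "NGAW", "CESMD"),
  EventID   = c("20100227T063452Z", "20100227T063452Z", "20110311T054624Z"),
  StationID = c("ANTU", "ANTU", "MYG004"),
  DIR       = c("H1", "H2", "H1"),
  EventMagnitude = c(8.8, 8.8, 9.1)
)

# Filter on any master column, then keep only the identifier columns and
# drop the duplicates introduced by the per-direction rows.
hits      <- master[EventMagnitude > 8]
selection <- unique(hits[, .(RecordID, OwnerID, EventID, StationID)])
nrow(selection)   # one row per record
```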

Composing with the processing core

The natural composition for acceleration records is:

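# 1. Load the master catalog from the index directory
M   <- buildMaster(path = "<your index path>")
# 2. Filter the master and reduce the subset to a selection
Selection <- selectRecords(M[EventMagnitude > 7 & Repi < 100 & DIR == "H1"])
# 3. Read the acceleration time series for the selected records
TS  <- readAT(.x = Selection, path = "<your records path>")
# 4. Run AT2TS() once per record
ATS <- TS[, AT2TS(.SD, units.source = "mm", Fmax = 25),
          by = .(RecordID, OwnerID, EventID, StationID)]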

The output of readAT() is a wide table keyed by (RecordID, OwnerID, EventID, StationID, t) with one column per provider OCID. AT2TS() consumes it per record. The shape is identical for readVT() and readDT(); pair them with VT2TS() / DT2TS(). Blasting records (e.g. ISEE) typically flow through readVT() + VT2TS().

Audit helpers

Maintenance

archiveRawOwner(path) compresses raw.owner/ to raw.owner.tar.gz after extraction has succeeded, verifies the archive is readable, and only then unlinks the original.
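
The compress, verify, then unlink order can be sketched with base R's utils::tar() and utils::untar(). The function archive_dir_sketch() is a hypothetical stand-in and not the gmsp implementation; in particular, treating "every file appears in the archive listing" as the readability check is an assumption.

```r
# Hypothetical helper: archive a directory next to itself as <dir>.tar.gz,
# confirm the archive lists every file, and only then delete the original.
archive_dir_sketch <- function(dir) {
  stopifnot(dir.exists(dir))
  parent  <- dirname(normalizePath(dir))
  name    <- basename(dir)
  tarfile <- file.path(parent, paste0(name, ".tar.gz"))

  # Work from the parent so that archive entries are relative paths.
  old <- setwd(parent)
  on.exit(setwd(old), add = TRUE)

  expected <- file.path(name, list.files(name, recursive = TRUE))
  utils::tar(tarfile, files = name, compression = "gzip", tar = "internal")

  # Verification step: the archive must be readable and complete.
  listed <- tryCatch(utils::untar(tarfile, list = TRUE, tar = "internal"),
                     error = function(e) character(0))
  ok <- length(expected) > 0 && all(expected %in% listed)

  if (ok) unlink(name, recursive = TRUE)   # delete only after verification
  invisible(ok)
}

# Demonstration on a throwaway directory.
d <- file.path(tempdir(), "gmsp_sketch", "raw.owner")
dir.create(d, recursive = TRUE)
writeLines("{}", file.path(d, "record.json"))
ok <- archive_dir_sketch(d)
```

If verification fails, the original directory is left untouched, so a corrupt archive never replaces the only copy of the provider files.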

Notes