# Minimal executable example — selectRecords() works entirely in memory
library(gmsp)
library(data.table)
#>
#> Attaching package: 'data.table'
#> The following object is masked from 'package:base':
#>
#> %notin%
master <- data.table(
  RecordID       = c("aabbccdd00112233", "aabbccdd00112233", "eeff00112233aabb"),
  OwnerID        = c("NGAW", "NGAW", "CESMD"),
  EventID        = c("20100227T063452Z", "20100227T063452Z", "20110311T054624Z"),
  StationID      = c("ANTU", "ANTU", "MYG004"),
  DIR            = c("H1", "H2", "H1"),
  EventMagnitude = c(8.8, 8.8, 9.1),
  Repi           = c(90, 90, 140)
)
sel <- selectRecords(master[EventMagnitude > 8 & DIR == "H1"])
print(sel)
#>            RecordID OwnerID          EventID StationID
#>              <char>  <char>           <char>    <char>
#> 1: eeff00112233aabb   CESMD 20110311T054624Z    MYG004
#> 2: aabbccdd00112233    NGAW 20100227T063452Z      ANTU

gmsp ships an optional layer for managing a local
strong-motion record archive. It is separate from the
signal-processing core (AT2TS, TS2IMF,
TSL2PS, getIntensity) — you can use the core
without ever touching the indexing layer.
The indexing layer assumes records on disk in a fixed directory
structure. The base paths are yours to choose; functions that touch disk
take explicit path, path.records, or
path.index arguments.
<recordsDir>/                          ← you choose this
  <OwnerID>/                           e.g. "NGAW", "CESMD", "ESM"
    <EventID>/                         e.g. "20060803T030800Z"
      <StationID>/                     e.g. "NTYB"
        raw.owner/                     provider files as downloaded
          record.json                  owner-supplied metadata
          <component-files>            .AT2 / .v2 / .ac / .tr / ...
        raw/                           gmsp output of extractRecord()
          AT.<RecordID>.csv            WIDE: provider OCID columns (scaled to mm)
          AT.<RecordID>.json           DIR / OCID / NP / PGA / dt / Fs / Units
<indexDir>/                            ← you choose this
  RawFileTable.<OwnerID>.csv           provider file inventory
  RawRecordTable.<OwnerID>.csv         one row per RecordID
  RawIntensityTable.<OwnerID>.csv      per (RecordID, DIR), 20 IM scalars
  EventTable.<OwnerID>.csv             event metadata
  StationTable.<OwnerID>.csv           station metadata
<selectionDir>/                        ← you choose this
  <name>.csv                           writeSelection() output
  <name>.json                          sidecar with audit metadata
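As an illustration of the layout above, the following minimal sketch assembles the expected locations for one record with base R's `file.path()`. The helper name `recordPaths()` and its arguments are hypothetical and are not part of gmsp; only the directory and file naming follows the tree shown above.

```r
# Hypothetical helper (not gmsp API): build the on-disk locations for one record.
recordPaths <- function(recordsDir, OwnerID, EventID, StationID,
                        RecordID, kind = "AT") {
  station <- file.path(recordsDir, OwnerID, EventID, StationID)
  list(
    raw.owner = file.path(station, "raw.owner"),
    csv       = file.path(station, "raw", paste0(kind, ".", RecordID, ".csv")),
    json      = file.path(station, "raw", paste0(kind, ".", RecordID, ".json"))
  )
}

p <- recordPaths("records", "NGAW", "20060803T030800Z", "NTYB",
                 RecordID = "aabbccdd00112233")
print(p$csv)
#> [1] "records/NGAW/20060803T030800Z/NTYB/raw/AT.aabbccdd00112233.csv"
```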
| OwnerID | Format | Parser | Quantity | Notes |
|---|---|---|---|---|
| NGAW | AT2 | readAT2() | AT | PEER NGA-West2 (4-line header, NPTS/DT) |
| CESMD | V2 / V2c | readV2() | AT | multi-channel V2 or single-channel V2c |
| NWZ | V2A | readV2A() | AT | NWZ-flavoured V2 |
| GSC | TR (A/B/C/Z) | readTR() | AT | Geological Survey of Canada |
| IGP | ACA / LIS | readAC() | AT | Instituto Geofísico del Perú |
| UCR | ACB | readAC() | AT | Universidad de Costa Rica |
| Generic | two-col | readTwoCol() | AT | (t, s) ASCII columns; used by CAL, CENA, etc. |
| ISEE | ISEE | readISEE() | VT | Micromate / ISEE blasting seismograph (mm/s velocity, MicL dropped) |

Each parser returns a LONG data.table(t, OCID, s) for
one component file. parseRecord() is the dispatcher that
consults .OWNER_FORMAT and calls the right parser for the
owner.
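The dispatch idea can be sketched with a lookup table from owner to parser function. Everything below is illustrative: `readTwoColSketch()`, `parsers`, and `parseSketch()` are hypothetical names, and the real `.OWNER_FORMAT` table and parser signatures may differ. The only property taken from the text is the LONG `(t, OCID, s)` return shape.

```r
# Toy parser: read a two-column (t, s) ASCII file and return LONG (t, OCID, s).
# This is a stand-in, not gmsp's readTwoCol().
readTwoColSketch <- function(file, ocid) {
  x <- utils::read.table(file, col.names = c("t", "s"))
  data.frame(t = x$t, OCID = ocid, s = x$s)
}

# Hypothetical lookup table from OwnerID to parser function.
parsers <- list(CAL = readTwoColSketch, CENA = readTwoColSketch)

# Hypothetical dispatcher: pick the owner's parser, apply it to every
# component file (named by OCID), and stack the results.
parseSketch <- function(owner, files) {
  parser <- parsers[[owner]]
  if (is.null(parser)) stop("no parser registered for owner: ", owner)
  parts <- Map(parser, files, names(files))
  do.call(rbind, unname(parts))
}

# Usage on a throwaway file with three samples.
tmp <- tempfile(fileext = ".txt")
writeLines(c("0.00 0.1", "0.01 -0.3", "0.02 0.2"), tmp)
long <- parseSketch("CAL", c(HNE = tmp))
print(long)
```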
parseRecord()      ── reads raw.owner/* via the owner's parser
   │                  returns LONG (t, OCID, s) for all components
   ▼
mapComponents()    ── derives DIR labels H1 / H2 / UP from provider OCIDs
   │                  H1/H2 are derived processing directions
   │                  `extractRecord()` uses rotate = FALSE
   │                  Returns NULL for arrays or 2-comp records
   ▼
alignComponents()  ── pads (or truncates) to equal NP across components
   │
   ▼
extractRecord()    ── scales to canonical mm via .parseUnits + .getSF
                      writes raw/<KIND>.<RecordID>.csv + <KIND>.<RecordID>.json
                      CSV columns remain provider OCID values; the JSON
                      sidecar stores the DIR -> OCID mapping.
                      KIND ∈ {AT, VT, DT} -- derived from the Units
                      suffix by .parseKind(), or forced by the
                      `kind = "VT"` argument (e.g. for blasting
                      records whose Units may be missing).
                      Sidecar peak field is named accordingly:
                      PGA (KIND=AT) / PGV (KIND=VT) / PGD (KIND=DT).
                      RecordID = first 16 hex chars of md5(CSV).

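To make the alignment step concrete, here is a minimal sketch that pads components of unequal length to a common NP. The function `alignSketch()` is hypothetical, and padding the tail with zeros is an assumption made for illustration; the source only states that components are padded (or truncated) to equal NP.

```r
# Hypothetical alignment sketch (not gmsp's alignComponents()):
# zero-pad every component at the tail up to the longest NP.
alignSketch <- function(components) {
  np     <- lengths(components)
  target <- max(np)
  padded <- lapply(components, function(s) c(s, numeric(target - length(s))))
  list(data = as.data.frame(padded),
       NP   = target,
       pad  = max(np) - min(np))
}

comps <- list(HNE = c(1, -2, 3, 0.5),
              HNN = c(0.2, 0.4, -0.1),
              HNZ = c(0.1, 0.1))
out <- alignSketch(comps)
print(out$NP)   # 4: the longest component sets the target length
print(out$pad)  # 2: longest minus shortest
```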
extractRecord() is the orchestrator; parsers and
mapComponents() are public so they can be reused or
audited. Public calls use parseRecord(.x, path) and
extractRecord(.x, path), where .x is the
one-record master subset and path is the records root.
After extractRecord() has produced raw/
outputs for some records, the indexing functions scan the records tree
and emit per-owner CSVs to <indexDir>/:

- buildRawFileTable(): provider-file inventory (one row per ComponentID × FileID); reads raw.owner/record.json or raw.owner.tar.gz (post-archive safe).
- buildRawRecordTable(): one row per RecordID (NP = max(post-align), pad = max NP − min NP, Fs).
- buildRawIntensityTable(): calls getRawIntensities() per station; emits three rows per record (one per DIR), each carrying the 20 AT-derivable scalars from getIntensity().

The provider-flatfile + USGS catalog join
(buildEventTable()) is under development and ships in
inst/dev/; it is not yet part of the exported API.
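The scan-and-index pattern described above can be sketched in base R: walk the records tree, find the `raw/<KIND>.<RecordID>.json` sidecars, and recover the identifiers from the path. The function `scanSketch()` is hypothetical and much simpler than the gmsp index builders; it builds a throwaway tree in a temporary directory so that it runs on its own.

```r
# Build a throwaway records tree so the scan has something to find.
recordsDir <- tempfile("records")
rawDir <- file.path(recordsDir, "NGAW", "20060803T030800Z", "NTYB", "raw")
dir.create(rawDir, recursive = TRUE)
invisible(file.create(file.path(rawDir, c("AT.aabbccdd00112233.csv",
                                          "AT.aabbccdd00112233.json"))))

# Hypothetical scanner: one row per raw/<KIND>.<RecordID>.json sidecar.
scanSketch <- function(recordsDir) {
  files <- list.files(recordsDir,
                      pattern = "^(AT|VT|DT)\\.[0-9a-f]{16}\\.json$",
                      recursive = TRUE)
  parts <- strsplit(files, "/", fixed = TRUE)
  # Keep only <OwnerID>/<EventID>/<StationID>/raw/<file>.
  parts <- parts[lengths(parts) == 5]
  fname <- vapply(parts, `[`, "", 5)
  data.frame(
    OwnerID   = vapply(parts, `[`, "", 1),
    EventID   = vapply(parts, `[`, "", 2),
    StationID = vapply(parts, `[`, "", 3),
    KIND      = sub("\\..*$", "", fname),
    RecordID  = sub("^[A-Z]+\\.([0-9a-f]{16})\\.json$", "\\1", fname)
  )
}

idx <- scanSketch(recordsDir)
print(idx)
```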
buildMaster() joins, per owner:

- RawRecordTable.<O>.csv (record list),
- EventTable.<O>.csv (event scalars, merged via fcoalesce with source precedence *.owner > *.USGS > *.ISC),
- StationTable.<O>.csv (station scalars including Vs30),

and emits a data.table keyed at (RecordID, DIR). It adds:

- Repi: epicentral distance (haversine, km),
- Rhyp: hypocentral distance, \(\sqrt{\mathrm{Repi}^2 + \mathrm{EventDepth}^2}\) (km).

After buildMaster() you can filter the master and pass
the subset to selectRecords() to produce a
(RecordID, OwnerID, EventID, StationID) selection. That selection is
the input contract for the readTS() family and for
writeSelection(), which persists the selection to disk for
orchestration. readAT() / readVT() / readDT()
are KIND-specific wrappers around
readTS(.x, path, kind = ...).
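The two distance columns that buildMaster() adds can be illustrated with a short worked sketch. The function `haversineKm()` is hypothetical, and the Earth radius of 6371 km is an assumption; gmsp may use a different constant. Rhyp follows the formula given above.

```r
# Hypothetical haversine helper (not gmsp API). R = 6371 km is an assumed
# mean Earth radius. Inputs are in decimal degrees.
haversineKm <- function(lat1, lon1, lat2, lon2, R = 6371) {
  rad  <- pi / 180
  dlat <- (lat2 - lat1) * rad
  dlon <- (lon2 - lon1) * rad
  a <- sin(dlat / 2)^2 +
       cos(lat1 * rad) * cos(lat2 * rad) * sin(dlon / 2)^2
  2 * R * asin(pmin(1, sqrt(a)))
}

# Event at (0, 0) with 30 km depth; station one degree east on the equator.
Repi <- haversineKm(0, 0, 0, 1)
EventDepth <- 30
Rhyp <- sqrt(Repi^2 + EventDepth^2)

print(Repi)  # about 111.19 km, i.e. one degree of arc on the assumed sphere
print(Rhyp)  # never smaller than Repi
```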
The natural composition for acceleration records is:
M <- buildMaster(path = "<your index path>")
Selection <- selectRecords(M[EventMagnitude > 7 & Repi < 100 & DIR == "H1"])
TS <- readAT(.x = Selection, path = "<your records path>")
ATS <- TS[, AT2TS(.SD, units.source = "mm", Fmax = 25),
          by = .(RecordID, OwnerID, EventID, StationID)]

The output of readAT() is a wide table keyed by
(RecordID, OwnerID, EventID, StationID, t) with one column
per provider OCID. AT2TS() consumes it per
record. The shape is identical for readVT() and
readDT(); pair them with VT2TS() /
DT2TS(). Blasting records (e.g. ISEE) typically flow
through readVT() + VT2TS().
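The per-record contract can be illustrated without gmsp by splitting a toy wide table on RecordID and applying a stand-in function to each piece. The table below and `peakPerOCID()` are hypothetical; the stand-in only computes a peak absolute value per OCID column and does not reproduce what AT2TS() does.

```r
# Toy wide table in the shape described above: key columns, t,
# and one column per provider OCID (values assumed to be in mm-based units).
TS <- data.frame(
  RecordID = rep(c("aabbccdd00112233", "eeff00112233aabb"), each = 3),
  OwnerID  = rep(c("NGAW", "CESMD"), each = 3),
  t        = rep(c(0, 0.01, 0.02), times = 2),
  HNE      = c(1, -4, 2, 0.5, 0.2, -0.9),
  HNZ      = c(0.1, 0.3, -0.2, 0.0, 0.4, 0.1)
)

# Stand-in for the per-record processing step (not AT2TS()):
# peak absolute value of each OCID column.
peakPerOCID <- function(d, ocid = c("HNE", "HNZ")) {
  vapply(d[ocid], function(s) max(abs(s)), numeric(1))
}

# One call per record, mirroring the `by = .(RecordID, ...)` grouping.
peaks <- lapply(split(TS, TS$RecordID), peakPerOCID)
print(peaks)
```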
- auditSite(M): flags rows with missing or out-of-range StationVs30.
- auditDistances(M): flags lat/lon NA or out-of-range, negative depths, large Repi, geometric impossibility (Rhyp < Repi).
- auditParsers(.x = M, owner = "NGAW", path = ...): dry-run parseRecord() per (EventID, StationID) of one owner and report OK / FAIL / WARN with reason.

Maintenance

- archiveRawOwner(path) compresses raw.owner/ to raw.owner.tar.gz after extraction has succeeded, verifies the archive is readable, and only then unlinks the original.

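The compress, verify, then delete order can be sketched with base R's `utils::tar()` and `utils::untar()`. The function `archiveSketch()` is hypothetical and is not the gmsp implementation; in particular, the verification here only checks that every original file name appears in the archive listing, which is an assumption about what "readable" means.

```r
# Hypothetical sketch (not gmsp's archiveRawOwner()): archive raw.owner/
# inside one station directory, verify the listing, then remove the original.
archiveSketch <- function(stationDir) {
  old <- setwd(stationDir)
  on.exit(setwd(old), add = TRUE)
  if (!dir.exists("raw.owner")) stop("nothing to archive in ", stationDir)

  originals <- list.files("raw.owner", recursive = TRUE, all.files = TRUE)
  utils::tar("raw.owner.tar.gz", files = "raw.owner",
             compression = "gzip", tar = "internal")

  # Verify before deleting: every original file must appear in the listing.
  listed <- utils::untar("raw.owner.tar.gz", list = TRUE, tar = "internal")
  ok <- all(basename(originals) %in% basename(listed))
  if (!ok) stop("archive verification failed; raw.owner/ left in place")

  unlink("raw.owner", recursive = TRUE)
  invisible(file.path(stationDir, "raw.owner.tar.gz"))
}

# Usage on a throwaway station directory.
stationDir <- tempfile("station")
dir.create(file.path(stationDir, "raw.owner"), recursive = TRUE)
writeLines('{"StationID": "NTYB"}',
           file.path(stationDir, "raw.owner", "record.json"))
archive <- archiveSketch(stationDir)
print(file.exists(archive))
```

Deleting only after a successful listing means a failed or truncated archive leaves the provider files untouched.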
raw.owner/ is the user’s responsibility.
Examples under examples/maintenance/ in the source
repository show a pattern for ingestion (USGS catalog matching, staging
/ promote / rollback, etc.).

RecordID is a 16-character hex hash
(openssl::md5 of the WIDE CSV body, truncated). It is
stable across re-extraction of the same record.
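A content hash of this kind can be sketched in base R. The function `recordIdSketch()` is hypothetical: it uses `tools::md5sum()` on a temporary CSV file as a stand-in for `openssl::md5`, and its CSV serialization is an assumption, so the value it returns is not expected to match a RecordID produced by gmsp. It does show why identical content yields an identical ID.

```r
# Hypothetical sketch (not gmsp code): hash the CSV bytes of a wide table
# and keep the first 16 hex characters.
recordIdSketch <- function(wide) {
  tmp <- tempfile(fileext = ".csv")
  on.exit(unlink(tmp), add = TRUE)
  utils::write.csv(wide, tmp, row.names = FALSE)
  substr(unname(tools::md5sum(tmp)), 1, 16)
}

wide <- data.frame(t = c(0, 0.01, 0.02), HNE = c(1, -4, 2), HNZ = c(0.1, 0.3, -0.2))

id1 <- recordIdSketch(wide)
id2 <- recordIdSketch(wide)               # same content, same ID

changed <- wide
changed$HNE[2] <- -5
id3 <- recordIdSketch(changed)            # different content

print(identical(id1, id2))
print(nchar(id1))
```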