Introduction

This vignette of package groupedHyperframe (CRAN, GitHub, RPubs) documents the creation of a groupedHyperframe object, the batch processes for a groupedHyperframe, and the aggregation of various statistics over a multi-level grouping structure.

Prerequisite

Package groupedHyperframe may require the development versions of the spatstat family of packages, which can be installed from GitHub:

devtools::install_github('spatstat/spatstat')
devtools::install_github('spatstat/spatstat.data')
devtools::install_github('spatstat/spatstat.explore')
devtools::install_github('spatstat/spatstat.geom')
devtools::install_github('spatstat/spatstat.linnet')
devtools::install_github('spatstat/spatstat.model')
devtools::install_github('spatstat/spatstat.random')
devtools::install_github('spatstat/spatstat.sparse')
devtools::install_github('spatstat/spatstat.univar')
devtools::install_github('spatstat/spatstat.utils')

Note to Users

Examples in this vignette require the following packages on the search path:

library(groupedHyperframe)
library(spatstat.data)
library(survival) # to help hyperframe understand Surv object

Users should remove the argument mc.cores = 1L from all examples to use all CPU cores of the current host under macOS. The authors of package groupedHyperframe must set mc.cores = 1L in this vignette to pass CRAN’s submission check.
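A minimal base-R sketch of how an mc.cores value is typically consumed, assuming it is forwarded to parallel::mclapply() (the reference given for mc.cores in the Terms and Abbreviations table). The helper pick_cores() is hypothetical and not part of groupedHyperframe. Forking is unavailable on Windows, where mclapply() only accepts mc.cores = 1L.

```r
library(parallel)

# Hypothetical helper: use all detected cores where forking is available,
# and fall back to a single core on Windows (mclapply() cannot fork there)
pick_cores <- function() {
  if (.Platform$OS.type == 'windows') return(1L)
  n <- detectCores()
  if (is.na(n)) 1L else as.integer(n)
}

# A toy workload standing in for a per-row batch process
res_serial   <- mclapply(1:8, FUN = function(i) i^2, mc.cores = 1L)
res_parallel <- mclapply(1:8, FUN = function(i) i^2, mc.cores = pick_cores())
```

With mc.cores = 1L, as in the examples of this vignette, mclapply() simply falls back to lapply(), so the results are identical and only the run time differs.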

Terms and Abbreviations

Term / Abbreviation	Description	Reference
`|>`	Forward pipe operator	`?base::pipeOp` introduced in `R` 4.1.0
`attr`	Attributes	`base::attr`; `base::attributes`
`CRAN`, `R`	The Comprehensive R Archive Network	https://cran.r-project.org
`data.frame`	Data frame	`base::data.frame`
`formula`	Formula	`stats::formula`
`fv`, `fv.object`, `fv.plot`	(Plot of) function value table	`spatstat.explore::fv.object`, `spatstat.explore::plot.fv`
`groupedData`, `~ g1/.../gm`	Grouped data frame; nested grouping structure	`nlme::groupedData`; `nlme::lme`
`hypercolumns`, `hyperframe`	(Hyper columns of) hyper data frame	`spatstat.geom::hyperframe`
`inherits`	Class inheritance	`base::inherits`
`kerndens`	Kernel density	`stats::density.default()$y`
`mc.cores`	Number of CPU cores to use	`parallel::mclapply`; `parallel::detectCores`
`multitype`	Multitype object	`spatstat.geom::is.multitype`
`object.size`	Memory allocation	`utils::object.size`
`pmean`, `pmedian`	Parallel mean and median	`groupedHyperframe::pmean`; `groupedHyperframe::pmedian`
`pmax`, `pmin`	Parallel maxima and minima	`base::pmax`; `base::pmin`
`ppp`, `ppp.object`	(Marked) point pattern	`spatstat.geom::ppp.object`
`quantile`	Quantile	`stats::quantile`
`save`, `xz`	Save with `xz` compression	`base::save(., compress = 'xz')`; `base::saveRDS(., compress = 'xz')`; https://en.wikipedia.org/wiki/XZ_Utils
`S3`, `generic`, `methods`	`S3` object oriented system	`base::UseMethod`; `utils::methods`; `utils::getS3method`; https://adv-r.hadley.nz/s3.html
`search`	Search path	`base::search`
`Surv`	Survival object	`survival::Surv`
`trapz`, `cumtrapz`	(Cumulative) trapezoidal integration	`pracma::trapz`; `pracma::cumtrapz`; https://en.wikipedia.org/wiki/Trapezoidal_rule

Acknowledgement

This work is supported by NCI R01CA222847 (I. Chervoneva, T. Zhan, and H. Rui) and R01CA253977 (H. Rui and I. Chervoneva).

`groupedHyperframe` Class

The S3 class groupedHyperframe inherits from the hyperframe class, in the same way that the groupedData class inherits from the data.frame class.

In addition to the components of a hyperframe object, a groupedHyperframe object has the attribute(s)

attr(., 'group'), a formula to specify the (nested) grouping structure
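The sketch below mimics this design with a toy object in base R: a list carrying the class vector c('groupedHyperframe', 'hyperframe') and a group attribute. The toy object is hypothetical and is not built by the package; it only shows how inherits() and the formula stored in attr(., 'group') behave. Since / binds tighter than ~, all.vars() lists the grouping variables from the highest to the lowest level.

```r
# Toy stand-in for a groupedHyperframe (hypothetical; not created by the package)
x <- structure(
  list(id = c('a', 'a', 'b'), brick = c(1L, 2L, 1L)),
  class = c('groupedHyperframe', 'hyperframe'),
  group = ~ id/brick
)

is_hf    <- inherits(x, what = 'hyperframe')  # TRUE: hyperframe methods apply
grp      <- attr(x, which = 'group')          # the nested grouping formula
grp_vars <- all.vars(grp)                     # 'id', 'brick': highest to lowest
lowest   <- grp_vars[length(grp_vars)]        # 'brick', the lowest level
```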

Create a `groupedHyperframe`

From a `hyperframe`

The S3 method as.groupedHyperframe.hyperframe() converts a hyperframe to a groupedHyperframe. In data set spatstat.data::osteo, the serial number of the sampling volume brick is nested in the bone sample id:

osteo |> as.groupedHyperframe(group = ~ id/brick)
#> Grouped Hyperframe: ~id/brick
#> 
#> 40 brick nested in
#> 4 id
#> 
#>        id shortid brick   pts depth
#> 1  c77za4       4     1 (pp3)    45
#> 2  c77za4       4     2 (pp3)    60
#> 3  c77za4       4     3 (pp3)    55
#> 4  c77za4       4     4 (pp3)    60
#> 5  c77za4       4     5 (pp3)    85
#> 6  c77za4       4     6 (pp3)    90
#> 7  c77za4       4     7 (pp3)    95
#> 8  c77za4       4     8 (pp3)    65
#> 9  c77za4       4     9 (pp3)   100
#> 10 c77za4       4    10 (pp3)   100

From a `data.frame`

The S3 method as.groupedHyperframe.data.frame() converts a data.frame to a groupedHyperframe. It inspects the input according to the (nested) grouping structure, identifies the column(s) whose elements are not identical within the lowest-level group, and converts them into hypercolumns. In data set Ki67. of this package, column logKi67 is not identical within the nested grouping structure ~ patientID/tissueID.

(Ki67g = Ki67. |> as.groupedHyperframe(group = ~ patientID/tissueID, mc.cores = 1L))
#> Grouped Hyperframe: ~patientID/tissueID
#> 
#> 6 tissueID nested in
#> 6 patientID
#> 
#>     logKi67 tissueID Tstage  PFS recfreesurv_mon recurrence adj_rad adj_chemo
#> 1 (numeric) TJUe_I17      2 100+             100          0   FALSE     FALSE
#> 2 (numeric) TJUe_G17      1   22              22          1   FALSE     FALSE
#> 3 (numeric) TJUe_F17      1  99+              99          0   FALSE        NA
#> 4 (numeric) TJUe_D17      1  99+              99          0   FALSE      TRUE
#> 5 (numeric) TJUe_J18      1  112             112          1    TRUE      TRUE
#> 6 (numeric) TJUe_N17      4   12              12          1    TRUE     FALSE
#>   histology  Her2   HR  node  race age patientID
#> 1         3  TRUE TRUE  TRUE White  66   PT00037
#> 2         3 FALSE TRUE FALSE Black  42   PT00039
#> 3         3 FALSE TRUE FALSE White  60   PT00040
#> 4         3 FALSE TRUE  TRUE White  53   PT00042
#> 5         3 FALSE TRUE  TRUE White  52   PT00054
#> 6         2  TRUE TRUE  TRUE Black  51   PT00059
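The inspection described for as.groupedHyperframe.data.frame() can be illustrated in base R. This is a sketch under assumptions, not the package's implementation: the helper varying_cols() and the toy data are hypothetical, the lowest-level group is taken as the interaction of all grouping variables, and ordinary list-columns stand in for hypercolumns.

```r
# Hypothetical toy data: two patients, one tissue each, three cells per tissue
d <- data.frame(
  patientID = rep(c('P1', 'P2'), each = 3L),
  tissueID  = rep(c('T1', 'T2'), each = 3L),
  age       = rep(c(66, 42), each = 3L),        # constant within tissue
  logKi67   = c(0.1, 0.5, 0.2, 1.1, 0.9, 1.4)   # varies within tissue
)

# Hypothetical helper: names of the columns that are NOT constant within
# every lowest-level group defined by the grouping formula
varying_cols <- function(data, group) {
  g <- interaction(data[all.vars(group)], drop = TRUE)
  keep <- vapply(names(data), FUN = function(nm) {
    any(vapply(split(data[[nm]], f = g),
               FUN = function(v) length(unique(v)) > 1L,
               FUN.VALUE = NA))
  }, FUN.VALUE = NA)
  names(data)[keep]
}

vc <- varying_cols(d, group = ~ patientID/tissueID)

# Keep one row per lowest-level group; varying columns become list-columns
g <- interaction(d[c('patientID', 'tissueID')], drop = TRUE)
first <- !duplicated(g)
compact <- d[first, setdiff(names(d), vc), drop = FALSE]
for (nm in vc) {
  compact[[nm]] <- unname(split(d[[nm]], f = g)[as.character(g[first])])
}
```

Here only logKi67 varies within a tissue, so the result has one row per tissue, with the group-level columns kept as ordinary columns and logKi67 stored as a list of per-tissue vectors.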

Converting a data.frame of cell intensities and similar cell-level values into a groupedHyperframe reduces memory allocation, but does not reduce the size of the saved file by much when xz compression is used.

unclass(object.size(Ki67g)) / unclass(object.size(Ki67.))
#> [1] 0.1148083

f_g = tempfile(fileext = '.rds')
Ki67g |> saveRDS(file = f_g, compress = 'xz')
f = tempfile(fileext = '.rds')
Ki67. |> saveRDS(file = f, compress = 'xz')
file.size(f_g) / file.size(f) # not much reduction
#> [1] 0.9629481
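The memory saving reported above is plausible because group-level columns (such as age or recurrence) are repeated on every cell row of the data.frame, but are stored only once per group after conversion. The base-R sketch below, with hypothetical toy data of 1,000 cells in 10 groups, compares the two layouts with utils::object.size(); the exact ratio depends on the data and the platform.

```r
set.seed(1)
n_grp <- 10L
n_per <- 100L

# Long layout: one row per cell; 'age' is repeated within each group
long <- data.frame(
  grp     = rep(seq_len(n_grp), each = n_per),
  age     = rep(as.numeric(50 + seq_len(n_grp)), each = n_per),
  logKi67 = rnorm(n_grp * n_per)
)

# Compact layout: one row per group; cell-level values become a list-column
compact <- data.frame(grp = seq_len(n_grp), age = as.numeric(50 + seq_len(n_grp)))
compact$logKi67 <- unname(split(long$logKi67, f = long$grp))

ratio <- unclass(object.size(compact)) / unclass(object.size(long))
```

xz compression already exploits long runs of repeated values, which may explain why the saved file barely shrinks even though memory allocation does.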

Create a `groupedHyperframe` with `ppp`-`hypercolumn`

Function grouped_ppp() creates a groupedHyperframe with exactly one ppp-hypercolumn. In the following example, the argument formula specifies

the marks, e.g., the numeric mark hladr and the multitype mark phenotype, on the left-hand side;
the additional predictors and/or endpoints for downstream analysis, e.g., OS, gender and age, before the | separator on the right-hand side;
the grouping structure, e.g., image_id nested in patient_id, after the | separator on the right-hand side.

(s = grouped_ppp(formula = hladr + phenotype ~ OS + gender + age | patient_id/image_id, 
                 data = wrobel_lung, mc.cores = 1L))
#> Grouped Hyperframe: ~patient_id/image_id
#> 
#> 25 image_id nested in
#> 5 patient_id
#> 
#>       OS gender age    patient_id          image_id  ppp.
#> 1  3488+      F  85 #01 0-889-121 [40864,18015].im3 (ppp)
#> 2  3488+      F  85 #01 0-889-121 [42689,19214].im3 (ppp)
#> 3  3488+      F  85 #01 0-889-121 [42806,16718].im3 (ppp)
#> 4  3488+      F  85 #01 0-889-121 [44311,17766].im3 (ppp)
#> 5  3488+      F  85 #01 0-889-121 [45366,16647].im3 (ppp)
#> 6   1605      M  66 #02 1-037-393 [56576,16907].im3 (ppp)
#> 7   1605      M  66 #02 1-037-393 [56583,15235].im3 (ppp)
#> 8   1605      M  66 #02 1-037-393 [57130,16082].im3 (ppp)
#> 9   1605      M  66 #02 1-037-393 [57396,17896].im3 (ppp)
#> 10  1605      M  66 #02 1-037-393 [57403,16934].im3 (ppp)
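To make the three parts of the formula argument concrete, the base-R sketch below takes apart the same formula. It relies only on R's operator precedence: | binds more loosely than + and /, and ~ more loosely than |, so the right-hand side is a call to | with the predictors on its left and the grouping on its right. This illustrates the syntax only; it is not how grouped_ppp() is implemented internally.

```r
f <- hladr + phenotype ~ OS + gender + age | patient_id/image_id

lhs <- f[[2L]]   # hladr + phenotype
rhs <- f[[3L]]   # (OS + gender + age) | (patient_id/image_id)
stopifnot(identical(rhs[[1L]], as.name('|')))

mark_vars <- all.vars(lhs)                 # marks of the point pattern
pred_vars <- all.vars(rhs[[2L]])           # predictors and/or endpoints
grp       <- eval(call('~', rhs[[3L]]))    # the grouping formula ~patient_id/image_id
```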

Batch Process on `ppp`-`hypercolumn`

This section outlines the batch processes of spatial point pattern analysis applicable to the single ppp-hypercolumn of a hyperframe. Support for a hyperframe with multiple ppp-hypercolumns is not planned for the foreseeable future, because that would require checking for name clashes among the $marks of multiple ppp-hypercolumns.

… which adds an `fv`-`hypercolumn`

Batch Process	Workhorse in `spatstat.explore`	Applicable To	`fv`-`hypercolumn` Suffix
`Emark_()`	`Emark()`	`numeric` marks	`.E`
`Vmark_()`	`Vmark()`	`numeric` marks	`.V`
`markcorr_()`	`markcorr()`	`numeric` marks	`.k`
`markvario_()`	`markvario()`	`numeric` marks	`.gamma`
`Gcross_()`	`Gcross()`	`multitype` marks	`.G`
`Kcross_()`	`Kcross()`	`multitype` marks	`.K`
`Jcross_()`	`Jcross()`	`multitype` marks	`.J`

… which adds a `numeric`-`hypercolumn`

Batch Process	Workhorse in `spatstat.geom`	Applicable To	`numeric`-`hypercolumn` Suffix
`nncross_()`	`nncross.ppp(., what = 'dist')`	`multitype` marks	`.nncross`
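The pattern shared by the two tables (apply one workhorse to the point pattern in every row, then store the results as a new hypercolumn named after the mark plus a suffix) can be sketched without spatstat. In this hypothetical base-R sketch, a data.frame with list-columns stands in for the hyperframe, each element holds the numeric marks of one point pattern, and simple summaries (mean, range) stand in for the spatstat workhorses. The helper batch_() is not part of the package.

```r
library(parallel)

# Hypothetical stand-in for a hyperframe: one row per image,
# with a list-column of numeric marks in place of a ppp-hypercolumn
hf <- data.frame(image_id = c('img1', 'img2', 'img3'))
hf$hladr <- list(c(0.2, 0.4, 0.9), c(1.5, 0.3), c(0.7, 0.7, 0.1, 0.5))

# Hypothetical batch helper: apply FUN to the marks of every row and append
# the results as a new list-column named <mark><suffix>. Returning the
# augmented object lets several batch steps be chained with |>
batch_ <- function(x, mark, FUN, suffix, mc.cores = 1L) {
  x[[paste0(mark, suffix)]] <- mclapply(x[[mark]], FUN = FUN, mc.cores = mc.cores)
  x
}

out_toy <- hf |>
  batch_(mark = 'hladr', FUN = mean, suffix = '.mean') |>
  batch_(mark = 'hladr', FUN = range, suffix = '.range')
```

Because every step takes the (hyper)frame as its first argument and returns it augmented, the steps compose with the forward pipe, which is the same design that makes the batch processes pipe-operator compatible.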

Pipe operator compatible

Multiple batch processes may be applied to a hyperframe (or groupedHyperframe) in a pipeline.

r = seq.int(from = 0, to = 250, by = 10)
out = s |>
  Emark_(r = r, correction = 'best', mc.cores = 1L) |> # slow
  # Vmark_(r = r, correction = 'best', mc.cores = 1L) |> # slow
  # markcorr_(r = r, correction = 'best', mc.cores = 1L) |> # slow
  # markvario_(r = r, correction = 'best', mc.cores = 1L) |> # slow
  Gcross_(i = 'CK+.CD8-', j = 'CK-.CD8+', r = r, correction = 'best', mc.cores = 1L) |> # fast
  # Kcross_(i = 'CK+.CD8-', j = 'CK-.CD8+', r = r, correction = 'best', mc.cores = 1L) |> # fast
  nncross_(i = 'CK+.CD8-', j = 'CK-.CD8+', correction = 'best', mc.cores = 1L) # fast
#>

The returned hyperframe (or groupedHyperframe) has the following new hypercolumns:

fv-hypercolumn hladr.E, created by function Emark_() on numeric mark hladr
fv-hypercolumn phenotype.G, created by function Gcross_() on multitype mark phenotype
numeric-hypercolumn phenotype.nncross, created by function nncross_() on multitype mark phenotype

out
#> Grouped Hyperframe: ~patient_id/image_id
#> 
#> 25 image_id nested in
#> 5 patient_id
#> 
#>       OS gender age    patient_id          image_id  ppp. hladr.E phenotype.G
#> 1  3488+      F  85 #01 0-889-121 [40864,18015].im3 (ppp)    (fv)        (fv)
#> 2  3488+      F  85 #01 0-889-121 [42689,19214].im3 (ppp)    (fv)        (fv)
#> 3  3488+      F  85 #01 0-889-121 [42806,16718].im3 (ppp)    (fv)        (fv)
#> 4  3488+      F  85 #01 0-889-121 [44311,17766].im3 (ppp)    (fv)        (fv)
#> 5  3488+      F  85 #01 0-889-121 [45366,16647].im3 (ppp)    (fv)        (fv)
#> 6   1605      M  66 #02 1-037-393 [56576,16907].im3 (ppp)    (fv)        (fv)
#> 7   1605      M  66 #02 1-037-393 [56583,15235].im3 (ppp)    (fv)        (fv)
#> 8   1605      M  66 #02 1-037-393 [57130,16082].im3 (ppp)    (fv)        (fv)
#> 9   1605      M  66 #02 1-037-393 [57396,17896].im3 (ppp)    (fv)        (fv)
#> 10  1605      M  66 #02 1-037-393 [57403,16934].im3 (ppp)    (fv)        (fv)
#>    phenotype.nncross
#> 1          (numeric)
#> 2          (numeric)
#> 3          (numeric)
#> 4          (numeric)
#> 5          (numeric)
#> 6          (numeric)
#> 7          (numeric)
#> 8          (numeric)
#> 9          (numeric)
#> 10         (numeric)

Aggregation Over Nested Grouping Structure

When a nested grouping structure ~g1/g2/.../gm is present, we may aggregate the

fv-hypercolumns
numeric-hypercolumns
numeric marks in the ppp-hypercolumn

by any one of the grouping levels ~g1, ~g2, …, or ~gm. If the lowest grouping level ~gm is specified, no aggregation is performed.
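Choosing the aggregation level amounts to choosing which grouping variable defines the groups. The base-R sketch below, with hypothetical toy data, splits image-level rows by patient and by image: grouping by patient_id collects several images per group, whereas grouping by the lowest level image_id leaves exactly one row per group, so there is nothing to aggregate. The same reasoning explains why the aggregated outputs below drop image_id, which is not constant within a patient.

```r
# Hypothetical toy layout: 2 patients with 5 images in total
d <- data.frame(
  patient_id = c('P1', 'P1', 'P1', 'P2', 'P2'),
  image_id   = c('I1', 'I2', 'I3', 'I4', 'I5')
)

by_patient <- split(seq_len(nrow(d)), f = d$patient_id)  # row indices per patient
by_image   <- split(seq_len(nrow(d)), f = d$image_id)    # row indices per image

n_patient <- lengths(by_patient)  # 3 and 2 rows: aggregation combines rows
n_image   <- lengths(by_image)    # 1 row each: no aggregation needed
```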

Aggregation of `fv`-`hypercolumns`

Function aggregate_fv() aggregates

the function values, i.e., the black solid curve of fv.plot. In the following example, we have
- numeric-hypercolumns hladr.E.value and phenotype.G.value, the aggregated function values from fv-hypercolumns hladr.E and phenotype.G
the cumulative trapezoidal integration under the black solid curve. In the following example, we have
- numeric-hypercolumns hladr.E.cumtrapz and phenotype.G.cumtrapz, the aggregated cumulative trapezoidal integrations from fv-hypercolumns hladr.E and phenotype.G
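Both quantities can be sketched in base R under stated assumptions. cumtrapz_() below is a hypothetical re-implementation of cumulative trapezoidal integration in the spirit of pracma::cumtrapz() (it starts at 0 on the first grid point), and pmean_() is a hypothetical element-wise mean across curves tabulated on a common grid r, standing in for groupedHyperframe::pmean. Because trapezoidal integration is linear in the function values, averaging the integrals of several curves gives the same result as integrating their average, provided all curves share the grid.

```r
# Hypothetical cumulative trapezoidal integration on grid x (first value is 0)
cumtrapz_ <- function(x, y) {
  stopifnot(length(x) == length(y), length(x) >= 2L)
  c(0, cumsum(diff(x) * (y[-1L] + y[-length(y)]) / 2))
}

# Hypothetical element-wise (parallel) mean of equal-length numeric vectors
pmean_ <- function(...) rowMeans(do.call(cbind, list(...)))

r  <- seq.int(from = 0, to = 4, by = 1)
y1 <- r        # one curve per image, tabulated on the common grid r
y2 <- 3 * r

value    <- pmean_(y1, y2)                             # equals 2 * r
cum_area <- pmean_(cumtrapz_(r, y1), cumtrapz_(r, y2)) # equals r^2
```

The trapezoidal rule is exact for these linear toy curves, which is why the aggregated cumulative integral equals r^2 exactly.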

(afv = out |>
  aggregate_fv(by = ~ patient_id, f_aggr_ = pmean, mc.cores = 1L))
#> Column(s) image_id removed; as they are not identical per aggregation-group
#> Hyperframe:
#>      OS gender age    patient_id hladr.E.value hladr.E.cumtrapz
#> 1 3488+      F  85 #01 0-889-121     (numeric)        (numeric)
#> 2  1605      M  66 #02 1-037-393     (numeric)        (numeric)
#> 3   176      M  84 #03 2-080-378     (numeric)        (numeric)
#> 4 2042+      M  79 #04 2-223-153     (numeric)        (numeric)
#> 5 3747+      M  68 #05 2-286-740     (numeric)        (numeric)
#>   phenotype.G.value phenotype.G.cumtrapz
#> 1         (numeric)            (numeric)
#> 2         (numeric)            (numeric)
#> 3         (numeric)            (numeric)
#> 4         (numeric)            (numeric)
#> 5         (numeric)            (numeric)

Each of the numeric-hypercolumns contains values tabulated on the common grid r. A single “slice” of this grid, here at r = 50, may be extracted by

afv$hladr.E.cumtrapz |> .slice(j = '50')
#>         1         2         3         4         5 
#> 10.489960 10.463419 31.248955  3.162186 23.635120
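Conceptually, a slice picks, from every element of a numeric-hypercolumn, the value tabulated at the same grid point. A hypothetical base-R equivalent, assuming each element is a numeric vector named by the values of r (so that j = '50' matches r = 50), is sketched below; slice_() is not the package's .slice().

```r
# Hypothetical stand-in for a numeric-hypercolumn: one named vector per patient,
# tabulated on the common grid r = 0, 10, ..., 250 (names are the r values)
r  <- seq.int(from = 0, to = 250, by = 10)
hc <- lapply(1:3, FUN = function(k) setNames(k * r, nm = r))

# Pick the value at the grid point named j from every element
slice_ <- function(x, j) vapply(x, FUN = function(v) v[[j]], FUN.VALUE = numeric(1))
s50 <- slice_(hc, j = '50')
```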

Aggregation of `numeric`-`hypercolumns` and `numeric` mark(s) in `ppp`-`hypercolumn`

Function aggregate_quantile() aggregates the quantiles of

the numeric-hypercolumns. In the following example, we have
- numeric-hypercolumn phenotype.nncross.quantile, the aggregated quantiles of numeric-hypercolumn phenotype.nncross
the numeric mark(s) in the ppp-hypercolumn. In the following example, we have
- numeric-hypercolumn hladr.quantile, the aggregated quantiles of numeric mark hladr in the ppp-hypercolumn
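One way such an aggregation could work is sketched in base R under an explicit assumption: quantiles are first computed per image at the common probabilities probs, and the per-image quantile vectors are then averaged element-wise within each patient. The text above does not state whether aggregate_quantile() averages per-image quantiles or pools the raw values first, so treat this only as an illustration; the toy values are hypothetical.

```r
probs <- seq.int(from = 0, to = 1, by = .5)

# Hypothetical nearest-neighbour distances for two images of one patient
img1 <- c(1, 2, 3, 4, 5)
img2 <- c(2, 4, 6)

# Quantiles per image on the common probabilities, then an element-wise mean
q <- lapply(list(img1, img2), FUN = quantile, probs = probs, names = FALSE)
q_patient <- rowMeans(do.call(cbind, q))
```

Using the same probs for every image is what lines the quantile vectors up so that they can be averaged position by position.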

out |>
  aggregate_quantile(by = ~ patient_id, probs = seq.int(from = 0, to = 1, by = .1), mc.cores = 1L)
#> Column(s) image_id removed; as they are not identical per aggregation-group
#> Hyperframe:
#>      OS gender age    patient_id phenotype.nncross.quantile hladr.quantile
#> 1 3488+      F  85 #01 0-889-121                  (numeric)      (numeric)
#> 2  1605      M  66 #02 1-037-393                  (numeric)      (numeric)
#> 3   176      M  84 #03 2-080-378                  (numeric)      (numeric)
#> 4 2042+      M  79 #04 2-223-153                  (numeric)      (numeric)
#> 5 3747+      M  68 #05 2-286-740                  (numeric)      (numeric)

Function aggregate_kerndens() aggregates the kernel density of

the numeric-hypercolumns. In the following example, we have
- numeric-hypercolumn phenotype.nncross.kerndens, aggregated kernel density of numeric-hypercolumn phenotype.nncross
the numeric mark(s) in the ppp-hypercolumn. In the following example, we have
- numeric-hypercolumn hladr.kerndens, aggregated kernel density of numeric mark hladr in ppp-hypercolumn
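A base-R sketch of averaging kernel densities, assuming each image's density is evaluated with stats::density() on the same grid and the resulting $y vectors are averaged element-wise. Fixing from and to for every image, as the call with from = 0 and to = mdist does, is what places all densities on a common grid so that they can be averaged point by point. The toy distances are hypothetical.

```r
set.seed(1)
# Hypothetical nearest-neighbour distances for two images of one patient
img1 <- rexp(50, rate = 1 / 40)
img2 <- rexp(80, rate = 1 / 60)
upper <- max(img1, img2)   # plays the role of mdist

# Evaluate each density on the same grid: n points from 0 to upper
dens <- lapply(list(img1, img2), FUN = density, from = 0, to = upper, n = 256L)
r_grid <- dens[[1L]]$x
kd_patient <- rowMeans(do.call(cbind, lapply(dens, FUN = `[[`, 'y')))
```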

(mdist = out$phenotype.nncross |> unlist() |> max())
#> [1] 354.2968
out |> 
  aggregate_kerndens(by = ~ patient_id, from = 0, to = mdist, mc.cores = 1L)
#> Column(s) image_id removed; as they are not identical per aggregation-group
#> Hyperframe:
#>      OS gender age    patient_id phenotype.nncross.kerndens hladr.kerndens
#> 1 3488+      F  85 #01 0-889-121                  (numeric)      (numeric)
#> 2  1605      M  66 #02 1-037-393                  (numeric)      (numeric)
#> 3   176      M  84 #03 2-080-378                  (numeric)      (numeric)
#> 4 2042+      M  79 #04 2-223-153                  (numeric)      (numeric)
#> 5 3747+      M  68 #05 2-286-740                  (numeric)      (numeric)

Grouped Hyper Data Frame

Tingting Zhan
