Grouped Hyperframe

Introduction

This vignette for package groupedHyperframe documents the creation of groupedHyperframe objects, the batch processes defined for a groupedHyperframe, and aggregation over a multi-level grouping structure.

Prerequisite

Package groupedHyperframe requires the development versions of the spatstat family of packages.

devtools::install_github('spatstat/spatstat'); packageDate('spatstat')
devtools::install_github('spatstat/spatstat.data'); packageDate('spatstat.data')
devtools::install_github('spatstat/spatstat.explore'); packageDate('spatstat.explore')
devtools::install_github('spatstat/spatstat.geom'); packageDate('spatstat.geom')
devtools::install_github('spatstat/spatstat.linnet'); packageDate('spatstat.linnet')
devtools::install_github('spatstat/spatstat.model'); packageDate('spatstat.model')
devtools::install_github('spatstat/spatstat.random'); packageDate('spatstat.random')
devtools::install_github('spatstat/spatstat.sparse'); packageDate('spatstat.sparse')
devtools::install_github('spatstat/spatstat.univar'); packageDate('spatstat.univar')
devtools::install_github('spatstat/spatstat.utils'); packageDate('spatstat.utils')

Note to Users

Examples in this vignette require that the following packages are attached to the search path:

library(groupedHyperframe)
library(survival) # to help hyperframe understand Surv object

Users should remove the parameter mc.cores = 1L from all examples and use the default option, which on macOS engages all CPU cores on the current host. The authors are forced to set mc.cores = 1L in this vignette in order to pass CRAN’s submission check.
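
As a rough illustration of what mc.cores controls, the sketch below calls parallel::mclapply and parallel::detectCores directly. The toy function and the variable names are assumptions made for illustration; this is not the package's internal code.

```r
library(parallel)

# Number of CPU cores reported for the current host (may be NA on some systems)
n_cores = detectCores()

# mc.cores = 1L runs the calls serially;
# forking with mc.cores > 1L is not available on Windows
sq = mclapply(1:4, FUN = function(i) i^2, mc.cores = 1L)
unlist(sq)
```

On a Unix-alike host, replacing mc.cores = 1L with mc.cores = n_cores would distribute the calls over all detected cores.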

Additional Resources

A development version of package groupedHyperframe is hosted on GitHub.

devtools::install_github('tingtingzhan/groupedHyperframe', build_vignettes = TRUE)
vignette('intro', package = 'groupedHyperframe')

List of Terms and Abbreviations

Term / Abbreviation  Description                           Reference
attr                 Attributes                            base::attr; base::attributes
CRAN, R              The Comprehensive R Archive Network   https://cran.r-project.org
data.frame           Data frame                            base::data.frame
formula              Formula                               stats::formula
fv, fv.object        Function value table                  spatstat.explore::fv.object
groupedData          Grouped data frame                    nlme::groupedData
hypercolumn          Column of hyper data frame            spatstat.geom::hyperframe
hyperframe           Hyper data frame                      spatstat.geom::hyperframe
inherits             Class inheritance                     base::inherits
kerndens             Kernel density                        stats::density.default()$y
matrix               Matrix                                base::matrix
mc.cores             Number of CPU cores to use            parallel::mclapply, parallel::detectCores
multitype            Multitype object                      spatstat.geom::is.multitype
ppp, ppp.object      (Marked) point pattern                spatstat.geom::ppp.object
~ g1/.../gm          Nested grouping structure             nlme::groupedData; nlme::lme
quantile             Quantile                              stats::quantile
S3                   R’s simplest object-oriented system   https://adv-r.hadley.nz/s3.html
search               Search path                           base::search
Surv                 Survival object                       survival::Surv
trapz, cumtrapz      (Cumulative) trapezoidal integration  pracma::trapz; pracma::cumtrapz; https://en.wikipedia.org/wiki/Trapezoidal_rule

groupedHyperframe Class

The S3 class groupedHyperframe inherits from the hyperframe class, in the same way that the groupedData class inherits from the data.frame class.
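
The inheritance pattern can be sketched with base R alone. The class name toyGrouped and the attribute name group below are hypothetical, chosen only for illustration; they are not claimed to match the internals of groupedHyperframe or groupedData.

```r
# A plain data.frame with one grouping column
df = data.frame(id = c('a', 'a', 'b'), y = c(1, 2, 3))

# Prepend a (hypothetical) class and attach a (hypothetical) grouping attribute
gdf = structure(df, class = c('toyGrouped', class(df)), group = ~ id)

inherits(gdf, what = 'toyGrouped')  # the new class
inherits(gdf, what = 'data.frame')  # still a data.frame
attr(gdf, which = 'group')          # grouping structure stored as a formula
```

Because the parent class is retained in the class vector, S3 methods written for the parent class remain available to the derived object.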

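In addition to what a hyperframe object carries, a groupedHyperframe object has attribute(s) recording the grouping structure, which is printed in the header of the examples in this vignette (e.g., ~patient_id/image_id).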

groupedHyperframe with ppp-hypercolumn

Function grouped_ppp() creates a groupedHyperframe with one and only one ppp-hypercolumn. Multiple ppp-hypercolumns will not be supported in the foreseeable future, because that would require checking for name clashes in $marks across the ppp-hypercolumns.

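In the following example, the argument formula specifies the marks of the ppp-hypercolumn (hladr and phenotype) to the left of ~, the covariates (OS, gender and age) between ~ and |, and the nested grouping structure (patient_id/image_id) to the right of |.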

(s = grouped_ppp(formula = hladr + phenotype ~ OS + gender + age | patient_id/image_id, 
                 data = wrobel_lung, mc.cores = 1L))
#> 
#> Grouped Hyperframe: ~patient_id/image_id
#> 
#> 25 image_id nested in
#> 5 patient_id
#> 
#>       OS gender age    patient_id          image_id  ppp.
#> 1  3488+      F  85 #01 0-889-121 [40864,18015].im3 (ppp)
#> 2  3488+      F  85 #01 0-889-121 [42689,19214].im3 (ppp)
#> 3  3488+      F  85 #01 0-889-121 [42806,16718].im3 (ppp)
#> 4  3488+      F  85 #01 0-889-121 [44311,17766].im3 (ppp)
#> 5  3488+      F  85 #01 0-889-121 [45366,16647].im3 (ppp)
#> 6   1605      M  66 #02 1-037-393 [56576,16907].im3 (ppp)
#> 7   1605      M  66 #02 1-037-393 [56583,15235].im3 (ppp)
#> 8   1605      M  66 #02 1-037-393 [57130,16082].im3 (ppp)
#> 9   1605      M  66 #02 1-037-393 [57396,17896].im3 (ppp)
#> 10  1605      M  66 #02 1-037-393 [57403,16934].im3 (ppp)
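
The three parts of such a formula can be pulled apart with base R, as in the sketch below. This only illustrates how R parses the expression (~ binds more loosely than |, which binds more loosely than + and /); it is an assumption-free reading of the formula, not necessarily how grouped_ppp() processes it internally.

```r
f = hladr + phenotype ~ OS + gender + age | patient_id/image_id

lhs = f[[2L]]  # hladr + phenotype
rhs = f[[3L]]  # a call to `|`

as.character(rhs[[1L]])  # operator separating covariates and grouping
all.vars(lhs)            # variable names left of ~
all.vars(rhs[[2L]])      # variable names between ~ and |
all.vars(rhs[[3L]])      # variable names right of |, highest level first
```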

Function grouped_ppp() has a parameter coords, which specifies the column names of the \(x\)- and \(y\)-coordinates in the input data. The default coords = ~ x + y indicates the use of data$x and data$y for the \(x\)- and \(y\)-coordinates, respectively. Users may use coords = FALSE for data without \(x\)- and \(y\)-coordinates. In this case, the coordinates are filled with randomly generated numbers, and the returned groupedHyperframe has a pseudo.ppp-hypercolumn.

(s_a = grouped_ppp(Ki67 ~ Surv(recfreesurv_mon, recurrence) + race + age | patientID/tissueID, 
                  data = Ki67, coords = FALSE, mc.cores = 1L))
#> 
#> Grouped Hyperframe: ~patientID/tissueID
#> 
#> 207 tissueID nested in
#> 200 patientID
#> 
#>    recfreesurv_mon recurrence  race age patientID tissueID         ppp.
#> 1              100          0 White  66   PT00037 TJUe_I17 (pseudo.ppp)
#> 2               22          1 Black  42   PT00039 TJUe_G17 (pseudo.ppp)
#> 3               99          0 White  60   PT00040 TJUe_F17 (pseudo.ppp)
#> 4               99          0 White  53   PT00042 TJUe_D17 (pseudo.ppp)
#> 5              112          1 White  52   PT00054 TJUe_J18 (pseudo.ppp)
#> 6               12          1 Black  51   PT00059 TJUe_N17 (pseudo.ppp)
#> 7               64          0 Asian  50   PT00062 TJUe_J17 (pseudo.ppp)
#> 8               56          0 White  37   PT00068 TJUe_F19 (pseudo.ppp)
#> 9               79          0 White  68   PT00082 TJUe_P19 (pseudo.ppp)
#> 10              26          1 Black  55   PT00084 TJUe_O19 (pseudo.ppp)
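
To make the role of coords concrete, the sketch below shows how a one-sided formula such as ~ x + y can be resolved to column names in base R, and what a random fallback might look like when no coordinates are recorded. The toy data, the object names and the use of runif() are assumptions for illustration; the source does not state how grouped_ppp() generates the pseudo coordinates.

```r
coords = ~ x + y
xy = all.vars(coords)  # column names named by the formula

# Toy data with recorded coordinates and one numeric mark (made-up values)
dat = data.frame(x = c(0.1, 0.5, 0.9), y = c(0.2, 0.8, 0.4), m = c(3, 7, 5))
dat[xy]  # the coordinate columns selected via the formula

# Hypothetical fallback: random placeholders, carrying no spatial meaning
set.seed(1)
pseudo = data.frame(x = runif(nrow(dat)), y = runif(nrow(dat)))
```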

Batch Process on ppp-Hypercolumn

In this section, we outline the batch process of spatial point pattern analyses applicable to the ppp-hypercolumn of a hyperframe.

Note that these spatial point pattern analyses should not be applied to a pseudo.ppp-hypercolumn, as the \(x\)- and \(y\)-coordinates are randomly generated pseudo numbers.

Batch processes that add an fv-hypercolumn to the input hyperframe include

Batch process that adds an fv-hypercolumn
Function      Workhorse                    Applicable To
Emark_()      spatstat.explore::Emark      numeric marks (e.g., hladr) in ppp-hypercolumn
Vmark_()      spatstat.explore::Vmark      numeric marks
markcorr_()   spatstat.explore::markcorr   numeric marks
markvario_()  spatstat.explore::markvario  numeric marks
Gcross_()     spatstat.explore::Gcross     multitype marks (e.g., phenotype)
Kcross_()     spatstat.explore::Kcross     multitype marks
Jcross_()     spatstat.explore::Jcross     multitype marks

Batch processes that add a numeric-hypercolumn to the input hyperframe include

Batch process that adds a numeric-hypercolumn
Function    Workhorse                                     Applicable To
nncross_()  spatstat.geom::nncross.ppp(., what = 'dist')  multitype marks (e.g., phenotype)
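
The quantity returned by a nearest-neighbour distance between two mark types can be sketched in base R: for each point of type i, take the distance to the closest point of type j. The coordinates and the type labels A and B below are made up for illustration, and the brute-force distance matrix is a stand-in for the workhorse, not a substitute for spatstat.geom::nncross.ppp.

```r
# Toy point pattern with a two-level mark (made-up values)
x = c(0, 1, 4, 6)
y = c(0, 0, 3, 8)
type = c('A', 'B', 'A', 'B')

i = which(type == 'A')
j = which(type == 'B')

# Euclidean distances: rows are type-A points, columns are type-B points
d = sqrt(outer(x[i], x[j], FUN = '-')^2 + outer(y[i], y[j], FUN = '-')^2)

# Distance from each type-A point to its nearest type-B point
nn = apply(d, MARGIN = 1L, FUN = min)
```

The result has one distance per type-i point, so its length varies from one point pattern to the next.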

The following example shows that multiple batch processes may be applied to a hyperframe (or groupedHyperframe) in a pipeline (|>).

r = seq.int(from = 0, to = 250, by = 10)
out = s |>
  Emark_(r = r, correction = 'best', mc.cores = 1L) |> # slow
  # Vmark_(r = r, correction = 'best', mc.cores = 1L) |> # slow
  # markcorr_(r = r, correction = 'best', mc.cores = 1L) |> # slow
  # markvario_(r = r, correction = 'best', mc.cores = 1L) |> # slow
  Gcross_(i = 'CK+.CD8-', j = 'CK-.CD8+', r = r, correction = 'best', mc.cores = 1L) |> # fast
  # Kcross_(i = 'CK+.CD8-', j = 'CK-.CD8+', r = r, correction = 'best', mc.cores = 1L) |> # fast
  nncross_(i = 'CK+.CD8-', j = 'CK-.CD8+', correction = 'best', mc.cores = 1L) # fast
#> 

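The returned hyperframe (or groupedHyperframe) has the additional fv-hypercolumns hladr.E and phenotype.G, and the additional numeric-hypercolumn phenotype.nncross.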

out
#> 
#> Grouped Hyperframe: ~patient_id/image_id
#> 
#> 25 image_id nested in
#> 5 patient_id
#> 
#>       OS gender age    patient_id          image_id  ppp. hladr.E phenotype.G
#> 1  3488+      F  85 #01 0-889-121 [40864,18015].im3 (ppp)    (fv)        (fv)
#> 2  3488+      F  85 #01 0-889-121 [42689,19214].im3 (ppp)    (fv)        (fv)
#> 3  3488+      F  85 #01 0-889-121 [42806,16718].im3 (ppp)    (fv)        (fv)
#> 4  3488+      F  85 #01 0-889-121 [44311,17766].im3 (ppp)    (fv)        (fv)
#> 5  3488+      F  85 #01 0-889-121 [45366,16647].im3 (ppp)    (fv)        (fv)
#> 6   1605      M  66 #02 1-037-393 [56576,16907].im3 (ppp)    (fv)        (fv)
#> 7   1605      M  66 #02 1-037-393 [56583,15235].im3 (ppp)    (fv)        (fv)
#> 8   1605      M  66 #02 1-037-393 [57130,16082].im3 (ppp)    (fv)        (fv)
#> 9   1605      M  66 #02 1-037-393 [57396,17896].im3 (ppp)    (fv)        (fv)
#> 10  1605      M  66 #02 1-037-393 [57403,16934].im3 (ppp)    (fv)        (fv)
#>    phenotype.nncross
#> 1          (numeric)
#> 2          (numeric)
#> 3          (numeric)
#> 4          (numeric)
#> 5          (numeric)
#> 6          (numeric)
#> 7          (numeric)
#> 8          (numeric)
#> 9          (numeric)
#> 10         (numeric)

Aggregation Over Nested Grouping Structure

When a nested grouping structure ~g1/g2/.../gm is present, we may aggregate over the fv-hypercolumn(s), the numeric-hypercolumn(s) and the numeric mark(s) in the ppp-hypercolumn, by any one of the grouping levels ~g1, ~g2, …, or ~gm. If the lowest grouping ~gm is specified, then no aggregation is performed.

The aggregation functions aggregate_fv(), aggregate_quantile() and aggregate_kerndens() return a data.frame instead of a hyperframe. This is because the aggregated results are stored in matrix-columns, which the hyperframe class does not support.
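
A matrix-column in a data.frame, filled by a mean over a higher grouping level, can be sketched in base R as follows. The toy values and the names vals, patient and res are assumptions for illustration; the aggregation functions of the package may be implemented differently.

```r
# One row per image, one column per evaluation point (made-up values)
vals = rbind(c(1, 2, 3), c(3, 4, 5), c(10, 20, 30))
patient = c('p1', 'p1', 'p2')  # higher grouping level of each image

# Row indices of the images belonging to each patient
idx = split(seq_len(nrow(vals)), f = patient)

# Column-wise mean within each patient
agg = do.call(rbind, args = lapply(idx, FUN = function(k) {
  colMeans(vals[k, , drop = FALSE])
}))

# Store the aggregated matrix as a single matrix-column
res = data.frame(patient_id = names(idx))
res$value = unname(agg)
dim(res$value)
```

Each patient occupies one row of res, and the column value holds a whole row of the aggregated matrix.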

Aggregation of fv-hypercolumn(s)

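Function aggregate_fv() aggregates the fv-hypercolumn(s); in the example below, each fv-hypercolumn yields a .value column and a .cumtrapz column.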

afv = out |>
  aggregate_fv(by = ~ patient_id, f_aggr_ = 'mean', mc.cores = 1L)
#> Column(s) 'image_id' removed; as they are not identical per aggregation-group
nrow(afv) # number of patients
#> [1] 5
names(afv)
#> [1] "OS"                   "gender"               "age"                 
#> [4] "patient_id"           "hladr.E.value"        "hladr.E.cumtrapz"    
#> [7] "phenotype.G.value"    "phenotype.G.cumtrapz"
dim(afv$hladr.E.cumtrapz) # N(patient) by length(r)
#> [1]  5 25
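
The cumulative trapezoidal integration behind the .cumtrapz columns can be sketched in base R. The helper cumtrapz_sketch() is a hypothetical name introduced here, not a function of groupedHyperframe or pracma, and the integrand y = r is chosen only because its integral is known in closed form.

```r
# Cumulative area under the points (x, y) by the trapezoidal rule,
# starting from zero at the first x
cumtrapz_sketch = function(x, y) {
  c(0, cumsum(diff(x) * (y[-1L] + y[-length(y)]) / 2))
}

r = seq.int(from = 0, to = 250, by = 10)
ct = cumtrapz_sketch(x = r, y = r)  # integrand y = r
ct[length(ct)]                      # area under y = r on [0, 250]
```

This sketch keeps the leading zero, so its result has the same length as r.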

Aggregation of numeric-hypercolumn(s) and numeric mark(s) in ppp-hypercolumn

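Function aggregate_quantile() aggregates the quantiles of the numeric-hypercolumn(s) (e.g., phenotype.nncross) and of the numeric mark(s) in the ppp-hypercolumn (e.g., hladr).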

q = out |>
  aggregate_quantile(by = ~ patient_id, probs = seq.int(from = 0, to = 1, by = .1), mc.cores = 1L)
#> Column(s) 'image_id' removed; as they are not identical per aggregation-group
nrow(q)
#> [1] 5
names(q)
#> [1] "OS"                         "gender"                    
#> [3] "age"                        "patient_id"                
#> [5] "phenotype.nncross.quantile" "hladr.quantile"
dim(q$phenotype.nncross.quantile)
#> [1]  5 11
dim(q$hladr.quantile)
#> [1]  5 11
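
The 11 columns correspond to the 11 probabilities in probs. A base R sketch with stats::quantile on a made-up numeric vector shows the shape of one such row; how the package combines values across images within a patient is not shown here.

```r
probs = seq.int(from = 0, to = 1, by = .1)

# Toy numeric values, e.g., nearest-neighbour distances (made-up)
x = c(5, 1, 9, 3, 7)

qx = quantile(x, probs = probs)
length(qx)  # one value per probability
```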

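Function aggregate_kerndens() aggregates the kernel densities of the numeric-hypercolumn(s) (e.g., phenotype.nncross) and of the numeric mark(s) in the ppp-hypercolumn (e.g., hladr).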

(mdist = out$phenotype.nncross |> unlist() |> max())
#> [1] 354.2968
d = out |> 
  aggregate_kerndens(by = ~ patient_id, from = 0, to = mdist, mc.cores = 1L)
#> Column(s) 'image_id' removed; as they are not identical per aggregation-group
nrow(d)
#> [1] 5
names(d)
#> [1] "OS"                         "gender"                    
#> [3] "age"                        "patient_id"                
#> [5] "phenotype.nncross.kerndens" "hladr.kerndens"
dim(d$phenotype.nncross.kerndens)
#> [1]   5 512
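
The 512 columns match the default number of evaluation points of stats::density.default, listed under kerndens in the List of Terms and Abbreviations. The sketch below evaluates a kernel density on a common grid from 0 to an upper limit; the simulated values are an assumption for illustration, and the upper limit is copied from the mdist printed above.

```r
set.seed(1)
x = runif(50, min = 0, max = 300)  # made-up distances

# Common grid for all densities: from 0 to a shared upper limit
dens = stats::density.default(x, from = 0, to = 354.2968)
length(dens$y)  # number of evaluation points (default n = 512)
range(dens$x)   # the grid spans from `from` to `to`
```

Using the same from and to for every group places all densities on one grid, which is what allows them to be stacked as rows of a matrix.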
