Welcome to rupturesRcpp

Description

The R package provides an efficient, object-oriented R6 interface for offline change point detection, implemented in C++ for high performance. This was created as part of the Google Summer of Code 2025 program (see edelweiss611428/rupturesRcpp at gsoc-2025 for the project archive).

+------------------------------------------------------------+
|                                                            |
|          Google Summer of Code 2025 Program                |
|                                                            | 
|  Project: rupturesRcpp                                     |
|  Contributor: @edelweiss611428                             |
|  Mentors: @tdhock & @deepcharles                           |
|  Organisation: The R Project for Statistical Computing     |
|                                                            |
+------------------------------------------------------------+

Installation

To install the newest version of the package, use the following R code:

library(devtools)
install_github("edelweiss611428/rupturesRcpp") 

Getting started

To detect change-points using rupturesRcpp you need three main components:

Each component is implemented using an R6-based object-oriented design for modularity and maintainability.

Cost functions

Create a costFunc object by specifying the desired (supported) cost function and optional parameters:

library("rupturesRcpp")
costFuncObj = costFunc$new("L2")
costFunc$pass() #output attributes corresponding to the specified cost function.
$costFunc
[1] "L2"

The following table shows the list of supported cost functions. Here, n is segment length.

Cost function Description Parameters/active bindings Dimension Time complexity
"L1" Sum of L1 distances to the segment-wise median; robust to outliers. costFunc multi O(nlog(n))
"L2" Sum of squared L2 distances to the segment-wise mean; faster but less robust than L1. costFunc multi O(1)
"SIGMA" Log-determinant of empirical covariance; models varying mean&variance. costFunc, addSmallDiag, epsilon multi O(1)
"VAR" Sum of squared residuals from a vector autoregressive model with constant noise variance. costFunc, pVAR multi O(1)
"LinearL2" Sum of squared residuals from a linear regression model with constant noise variance. costFunc, intercept multi O(1)

If active binding costFunc is modified by assigning to costFuncObj$costFunc and the required parameters are missing, the default parameters will be used.

costFuncObj$costFunc = "VAR"
costFuncObj$pass()
$costFunc
[1] "VAR"

$pVAR
[1] 1

Segmentation methods

After initialising a costFunc object, create a segmentation object such as binSeg, Window, or PELT.

R6 Class Method Description Parameters/active bindings
binSeg Binary Segmentation Recursively splits the signal at points that minimise the cost. minSize, jump, costFunc, tsMat, covariates
Window Slicing Window Detects change-points using local gains over sliding windows. minSize, jump, radius, costFunc,tsMat, covariates
PELT Pruned Exact Linear Time Optimal segmentation with pruning for linear-time performance. minSize, jump, costFunc, tsMat, covariates

The covariates argument is optional and only required for models involving both dependent and independent variables (e.g., "LinearL2"). If not provided, the model is force-fitted using only an intercept term (i.e., a column of ones).

A PELT object, for example, can be initialised as follows:

detectionObj = PELT$new(minSize = 1L, jump = 1L, costFunc = costFuncObj)

All segmentation objects implement the following methods:

Active bindings (such as minSize or tsMat) can be modified at any time—either before or after the object is created via the $ operator. For consistency, if the object has already been fitted, modifying any active bindings will automatically trigger the re-fitting process.

detectionObj$minSize = 2L #Before fitting
detectionObj$fit(a_time_series_matrix) #Fitted
detectionObj$minSize = 1L #After fitting - automatically trigger `$fit()`

Simulated data examples

2-regime SIGMA example via binary segmentation

To demonstrate the package usage, we first consider a simple 2d time series with two piecewise Gaussian regimes and varying variance.

set.seed(1)
tsMat = cbind(c(rnorm(100,0), rnorm(100,5,5)),
              c(rnorm(100,0), rnorm(100,5,5)))

image

As our example involves regimes with varying variance, a suitable costFunc option is "SIGMA". Since the segmentation objects’ interfaces are similar, it is sufficient to demonstrate the usage of binSeg only.

SIGMAObj = costFunc$new("SIGMA", addSmallDiag = TRUE, epsilon = 1e-6)
binSegObj = binSeg$new(minSize = 1L, jump = 1L, costFunc = SIGMAObj) 
binSegObj$fit(tsMat) 

Once fitted, $predict() and $eval() can be used. To view the configurations of the binSeg object, we can use $describe().

binSegObj$describe(printConfig = TRUE) 
Binary Segmentation (binSeg)
minSize      : 1L
jump         : 1L
costFunc     : "SIGMA"
addSmallDiag : TRUE
epsilon      : 1e-06
fitted       : TRUE
n            : 200L
p            : 2L

To obtain an estimated segmentation, we can use the $predict() method and specify a non-negative penalty value pen, which should be properly tuned. This returns a sorted integer vector of end-points, including the number of observations by design.

Here, we set pen = 100.

binSegObj$predict(pen = 100)
[1] 100 200

After running $predict(), the segmentation output is temporarily saved to the binSeg object, allowing users to use the $plot() method without specifying endPts.

binSegObj$plot(d = 1:2, 
               main = "method: binSeg; costFunc: SIGMA; pen: 100")

image

2-regime VAR example: Modifying a binSeg object through its active bindings

You can also modify a binSeg object’s fields through its active bindings. To demonstrate this, we consider a piecewise vector autoregressive example with constant noise variance.

set.seed(1)
tsMat = matrix(c(filter(rnorm(100), filter = 0.9, method = "recursive"), 
                 filter(rnorm(100), filter = -0.9, method = "recursive")))

image

Here, the most suitable cost function is "VAR". Instead of creating a new binSeg object, we will modify the current binSegObj as follows:

VARObj = costFunc$new("VAR")
binSegObj$tsMat = tsMat
binSegObj$costFunc = VARObj

Modifying tsMat (or any other bindings) will automatically trigger self$fit() if a tsMat has already existed.

binSegObj$describe(printConfig = TRUE)
Binary Segmentation (binSeg)
minSize      : 1L
jump         : 1L
costFunc     : "VAR"
pVAR         : 1L
fitted       : TRUE
n            : 200L
p            : 1L

We can then perform binary segmentation with pen = 25 and plot the segmentation results.

binSegObj$predict(pen = 25)
binSegObj$plot(d = 1L, 
               main = "method: binSeg; costFunc: VAR; pen: 25")

image

Future development

Contributing

We welcome all contributions, whether big or small. If you encounter a bug or have a feature request, please open an issue to let us know.

Feel free to fork the repository and make your changes. For significant updates, it’s best to discuss them with us first. When your changes are ready, submit a pull request.

Thanks for helping us improve this project!

License

This project is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0) License.

References

mirror server hosted at Truenetwork, Russian Federation.