Getting Started with cyclicwave

Overview

cyclicwave is a modular toolkit for clustering time series data and detecting anomalies using classical, wavelet-based, Hilbert-based, and circular feature extraction methods. It supports DBSCAN and OPTICS clustering with consistent output formats and provides a comparison function for evaluating multiple feature/algorithm combinations in a single call.

We use the bundled power_consumption dataset, recorded at 10-minute intervals across three urban zones in Tetouan, Morocco.

library(cyclicwave)
data(power_consumption)

The data

Each row is a single time point. The first column is the timestamp, the last three columns are the zone-wise power consumption signals, and the columns in between are weather variables that we ignore in this example.

dim(power_consumption)
#> [1] 13906     9
head(power_consumption, 3)
#>        Datetime Temperature Humidity WindSpeed GeneralDiffuseFlows DiffuseFlows
#> 1 1/1/2017 0:00       6.559     73.8     0.083               0.051        0.119
#> 2 1/1/2017 0:10       6.414     74.5     0.083               0.070        0.085
#> 3 1/1/2017 0:20       6.313     74.5     0.080               0.062        0.100
#>   PowerConsumption_Zone1 PowerConsumption_Zone2 PowerConsumption_Zone3
#> 1               34055.70               16128.88               20240.96
#> 2               29814.68               19375.08               20131.08
#> 3               29128.10               19006.69               19668.43

For this walkthrough we will work with a 1000-row slice to keep everything fast. The exact same code runs on the full dataset; it just takes longer.

pwr <- power_consumption[1:1000, ]
zones_matrix <- as.matrix(pwr[, 7:9])

Step 1: reshape into long format

DBSCAN expects a 2D matrix where each row is one observation. We flatten zones_matrix into a single long vector and attach a zone identifier to each value.

flat <- flatten_with_zones(zones_matrix)
length(flat$values)
#> [1] 3000
table(flat$zones)
#> 
#>    1    2    3 
#> 1000 1000 1000

After this step we have a single long vector with 3000 values and a matching zones vector of identifiers.
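
For intuition, the same reshaping can be reproduced in base R. This is a minimal sketch, not necessarily how flatten_with_zones is implemented. It assumes column-by-column flattening, which is consistent with the feature table shown later, where the first rows all belong to zone 1. The toy matrix m and the names values and zones are illustrative.

```r
# Toy stand-in for zones_matrix: 4 time points x 3 zones (illustrative values).
m <- matrix(c(10, 11, 12, 13,
              20, 21, 22, 23,
              30, 31, 32, 33),
            nrow = 4, ncol = 3)

# as.vector() reads a matrix column by column, so all of zone 1 comes first.
values <- as.vector(m)

# One zone identifier per value: 1 repeated nrow(m) times, then 2, then 3.
zones <- rep(seq_len(ncol(m)), each = nrow(m))

length(values)
table(zones)
```

Because both vectors follow the same column-major order, values[i] always belongs to zones[i].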

Step 2: extract rolling features

Each observation needs more than a single value to be informative. We compute the rolling mean and standard deviation over a 10-point window:

rolling <- rolling_stats(zones_matrix,
                         window_size = 10,
                         stats = c("mean", "sd"))

rolling_stats returns a list of matrices. We flatten each to align with our long-format values.

raw_features <- cbind(
  zone  = flat$zones,
  value = flat$values,
  mavg  = as.vector(rolling$mean),
  sd    = as.vector(rolling$sd)
)
head(raw_features, 3)
#>      zone    value     mavg       sd
#> [1,]    1 34055.70 29712.61 2601.235
#> [2,]    1 29814.68 29197.97 2646.171
#> [3,]    1 29128.10 28740.98 2701.318

The first column is the zone identifier; it is metadata, not a feature. We will exclude it from clustering and normalization.
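
To make the rolling statistics concrete, here is a minimal base R sketch of a rolling mean and standard deviation. It is an illustration only: the helper roll_apply is hypothetical, and its trailing-window alignment with NA for incomplete windows is an assumption that may differ from how rolling_stats aligns windows and handles edges.

```r
# Rolling statistic over a trailing window; positions without a full window are NA.
# NOTE: alignment and edge handling are assumptions, not the package's behavior.
roll_apply <- function(x, window_size, stat) {
  n <- length(x)
  out <- rep(NA_real_, n)
  if (n < window_size) return(out)
  for (i in window_size:n) {
    out[i] <- stat(x[(i - window_size + 1):i])
  }
  out
}

x <- c(2, 4, 6, 8, 10, 12)
roll_mean <- roll_apply(x, window_size = 3, stat = mean)
roll_sd   <- roll_apply(x, window_size = 3, stat = sd)

# Applied column by column, this yields one matrix per statistic,
# with the same shape as the input matrix.
m <- cbind(zone1 = x, zone2 = rev(x))
mean_mat <- apply(m, 2, roll_apply, window_size = 3, stat = mean)
```

Keeping each statistic in a matrix of the same shape as the input is what makes as.vector() line up with the flattened values.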

Step 3: normalize

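DBSCAN is distance-based, so feature scales matter: without rescaling, the feature with the largest numeric range dominates the distance. We z-score the three feature columns so that each contributes comparably.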

raw_features[, 2:4] <- normalize_features(raw_features[, 2:4],
                                          method = "zscore")
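
As a point of reference, z-scoring can be written out in base R. The sketch below assumes that method = "zscore" means the usual column-wise standardization with the sample standard deviation; whether normalize_features uses exactly this definition is an assumption. The toy matrix x is illustrative.

```r
# Toy feature matrix with columns on very different scales (illustrative values).
x <- cbind(value = c(30000, 29000, 31000, 28000),
           sd    = c(2.5, 2.7, 2.6, 2.4))

# z-score by hand: subtract each column's mean, divide by its standard deviation.
z_manual <- sweep(sweep(x, 2, colMeans(x), "-"), 2, apply(x, 2, sd), "/")

# Base R's scale() does the same with its default arguments.
z_scale <- scale(x)

colMeans(z_manual)      # approximately 0 for every column
apply(z_manual, 2, sd)  # 1 for every column
```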

Step 4: choose epsilon (visual heuristic)

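DBSCAN needs an eps parameter, the neighborhood radius. The k-distance plot is the standard visual heuristic for choosing it: we plot each point's distance to its k-th nearest neighbor in sorted order and look for an elbow in the curve. Here k matches the min_pts value used in the next step.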

plot_k_distance(raw_features[, 2:4], k = 7)
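
The quantity behind a k-distance plot can be computed directly in base R. The following is a rough sketch on toy data, not the implementation of plot_k_distance; in particular, excluding the point itself when counting neighbors is an assumption. It builds the full distance matrix, so it is only suitable for small inputs.

```r
set.seed(1)
# Toy 2-D data: two tight blobs plus a few scattered points (illustrative).
x <- rbind(matrix(rnorm(60, mean = 0, sd = 0.2), ncol = 2),
           matrix(rnorm(60, mean = 3, sd = 0.2), ncol = 2),
           matrix(runif(10, min = -2, max = 5), ncol = 2))

k <- 7
d <- as.matrix(dist(x))  # full pairwise Euclidean distance matrix

# For each point, sort its distances. Position 1 is the point itself
# (distance 0), so the k-th nearest neighbor sits at position k + 1.
kth <- apply(d, 1, function(row) sort(row)[k + 1])

k_dist <- sort(kth)  # ascending curve; the elbow suggests a candidate eps

if (interactive()) {
  plot(k_dist, type = "l",
       xlab = "Points sorted by distance",
       ylab = paste0(k, "-NN distance"))
}
```

Points in dense regions have small k-th neighbor distances and form the flat part of the curve; the sharp rise at the end corresponds to isolated points.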

Step 5: run DBSCAN

result <- run_dbscan(raw_features[, 2:4],
                     eps = 0.3,
                     min_pts = 7)

result$n_clusters
#> [1] 3
result$n_noise
#> [1] 63

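The result is a list with a standardized structure. Besides the summary counts n_clusters and n_noise shown above, it holds the per-observation cluster assignments in result$cluster, which the following steps use.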
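
To show what eps and min_pts control, here is a compact base R sketch of the DBSCAN procedure. It is illustrative only and is not the code behind run_dbscan. The function name dbscan_sketch, the use of 0 as the noise label, and counting a point as its own neighbor are all assumptions of this sketch.

```r
# Minimal DBSCAN sketch. Returns integer labels, with 0 marking noise.
# Assumption: a point counts as its own neighbor.
dbscan_sketch <- function(x, eps, min_pts) {
  d <- as.matrix(dist(x))
  n <- nrow(d)
  neighbors <- lapply(seq_len(n), function(i) which(d[i, ] <= eps))
  is_core <- lengths(neighbors) >= min_pts
  labels <- integer(n)  # 0 = not yet assigned; stays 0 for noise
  cluster_id <- 0L
  for (i in seq_len(n)) {
    # Only an unassigned core point can start a new cluster.
    if (labels[i] != 0L || !is_core[i]) next
    cluster_id <- cluster_id + 1L
    labels[i] <- cluster_id
    queue <- neighbors[[i]]
    while (length(queue) > 0) {
      j <- queue[1]
      queue <- queue[-1]
      if (labels[j] != 0L) next
      labels[j] <- cluster_id
      # Core points extend the cluster; border points join it but do not expand it.
      if (is_core[j]) queue <- c(queue, neighbors[[j]])
    }
  }
  labels
}

set.seed(42)
# Two tight blobs of 20 points each, plus one isolated point (illustrative).
x <- rbind(matrix(rnorm(40, mean = 0, sd = 0.1), ncol = 2),
           matrix(rnorm(40, mean = 5, sd = 0.1), ncol = 2),
           c(20, 20))
labels <- dbscan_sketch(x, eps = 1, min_pts = 4)

n_clusters <- length(setdiff(unique(labels), 0L))
n_noise <- sum(labels == 0L)
```

Raising eps or lowering min_pts makes more points qualify as core points, which tends to merge clusters and reduce the noise count.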

Step 6: evaluate

The Davies-Bouldin Index summarizes how compact and separated the clusters are. Lower values are better.

davies_bouldin(raw_features[, 2:4], result$cluster)
#> [1] 0.4434717
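
For readers who want to see what the index measures, the sketch below computes the Davies-Bouldin Index in base R: for each cluster, take the worst ratio of summed within-cluster scatter to centroid separation, then average over clusters. The function db_index_sketch is hypothetical, and dropping points labeled 0 as noise is an assumption; the package's davies_bouldin may treat noise differently.

```r
# Davies-Bouldin index sketch.
# Assumption: label 0 marks noise and is excluded from the computation.
db_index_sketch <- function(x, labels) {
  keep <- labels != 0
  x <- x[keep, , drop = FALSE]
  labels <- labels[keep]
  ids <- sort(unique(labels))
  k <- length(ids)
  stopifnot(k >= 2)
  # Centroid of each cluster (one row per cluster).
  centroids <- do.call(rbind, lapply(ids, function(id) {
    colMeans(x[labels == id, , drop = FALSE])
  }))
  # Scatter: mean Euclidean distance from each member to its centroid.
  scatter <- sapply(seq_len(k), function(i) {
    pts <- x[labels == ids[i], , drop = FALSE]
    mean(sqrt(rowSums(sweep(pts, 2, centroids[i, ])^2)))
  })
  sep <- as.matrix(dist(centroids))  # centroid-to-centroid distances
  # For each cluster, the largest scatter-to-separation ratio against any other.
  worst <- sapply(seq_len(k), function(i) {
    max((scatter[i] + scatter[-i]) / sep[i, -i])
  })
  mean(worst)
}

# Two clusters with scatter 1 each, centroids 10 apart, plus one noise point.
x <- rbind(c(0, 0), c(0, 2), c(10, 0), c(10, 2), c(50, 50))
labels <- c(1, 1, 2, 2, 0)
db_index_sketch(x, labels)
#> [1] 0.2
```

In the toy example each ratio is (1 + 1) / 10, so tighter clusters or centroids further apart both push the index down.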

We can visualize the partition by projecting onto the first two principal components and coloring by cluster.

plot_clusters_pca(raw_features[, 2:4], result$cluster)
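
The projection itself can be sketched with base R's prcomp. This is an assumption about the general approach, not a description of how plot_clusters_pca is implemented; the toy data and the variable names are illustrative.

```r
set.seed(7)
# Toy 3-feature data with two groups (illustrative).
x <- rbind(matrix(rnorm(90, mean = 0), ncol = 3),
           matrix(rnorm(90, mean = 4), ncol = 3))
cluster <- rep(c(1, 2), each = 30)

pca <- prcomp(x)        # centers the columns by default
scores <- pca$x[, 1:2]  # coordinates on the first two principal components

if (interactive()) {
  # Shift labels by one so that a noise label of 0 would still map to a valid color.
  plot(scores, col = cluster + 1, pch = 19, xlab = "PC1", ylab = "PC2")
}
```

With only three features, two components usually retain most of the variance, so the 2D picture is a reasonable view of the partition.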

For function-level reference, see the help pages, e.g. ?run_dbscan.
