---
title: "Getting Started with cyclicwave"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Getting Started with cyclicwave}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 7,
  fig.height = 5
)
```

## Overview

`cyclicwave` is a modular toolkit for clustering time series data and
detecting anomalies using classical, wavelet-based, Hilbert-based, and
circular feature extraction methods. It supports DBSCAN and OPTICS
clustering with consistent output formats and provides a comparison
function that lets users compare multiple feature/algorithm combinations
with a single call.

We use the bundled `power_consumption` dataset, recorded at 10-minute
intervals across three urban zones in Tetouan, Morocco.

```{r setup}
library(cyclicwave)
data(power_consumption)
```

## The data

Each row is a single time point. The last three columns are the
zone-wise power consumption signals; the rest are weather variables we
will ignore in this example.

```{r}
dim(power_consumption)
head(power_consumption, 3)
```

For this walkthrough we will work with a 1000-row slice to keep
everything fast. The exact same code runs on the full dataset; it just
takes longer.

```{r}
pwr <- power_consumption[1:1000, ]
zones_matrix <- as.matrix(pwr[, 7:9])
```

## Step 1: reshape into long format

DBSCAN expects a 2D matrix in which each row is one observation. Our
matrix has one column per zone, so we first flatten it into a single
vector of values and attach a zone identifier to each value.

```{r}
flat <- flatten_with_zones(zones_matrix)
length(flat$values)   # total number of values across all zones
table(flat$zones)     # number of values per zone
```

After this step we have a single long vector with 3000 values and a
matching `zones` vector of identifiers.
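
As a rough mental model, the same reshaping can be sketched in base R on a small made-up matrix. This assumes that `flatten_with_zones()` stacks the columns one after another, which this vignette does not state explicitly; the names `demo_zones`, `demo_values`, and `demo_ids` are hypothetical and not part of the package.

```{r}
# Hypothetical 4 x 3 matrix: rows are time points, columns are zones.
demo_zones <- matrix(1:12, nrow = 4,
                     dimnames = list(NULL, c("z1", "z2", "z3")))

# Column-major flattening: all of zone 1, then zone 2, then zone 3.
demo_values <- as.vector(demo_zones)

# One zone id per value, repeated for every time point in that zone.
demo_ids <- rep(seq_len(ncol(demo_zones)), each = nrow(demo_zones))

length(demo_values)
table(demo_ids)
```

With 4 time points and 3 zones the sketch yields 12 values, in the same way that 1000 rows and 3 zones yield 3000 values above.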

## Step 2: extract rolling features

Each observation needs more than a single value to be informative.
We compute the rolling mean and standard deviation over a 10-point window.

```{r}
rolling <- rolling_stats(zones_matrix,
                         window_size = 10,
                         stats = c("mean", "sd"))
```
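
To make the idea of a rolling statistic concrete, here is a minimal base R sketch on a made-up vector. It assumes a trailing window that leaves the first `width - 1` positions as `NA`; whether `rolling_stats()` aligns and pads its windows the same way is not stated here, so check `?rolling_stats`. The helper `demo_roll()` is hypothetical.

```{r}
# Trailing rolling statistic for one numeric vector; the first
# (width - 1) positions have no full window and are left as NA.
demo_roll <- function(x, width, fun) {
  out <- rep(NA_real_, length(x))
  for (i in seq_along(x)) {
    if (i >= width) out[i] <- fun(x[(i - width + 1):i])
  }
  out
}

demo_signal <- c(2, 4, 6, 8, 10, 12)
demo_roll(demo_signal, width = 3, fun = mean)
demo_roll(demo_signal, width = 3, fun = sd)
```

The rolling mean tracks the local level of the signal, while the rolling standard deviation tracks its local variability; together they give each time point a small feature vector.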

`rolling_stats` returns a list of matrices. We flatten each to align with our long-format values.

```{r}
raw_features <- cbind(
  zone  = flat$zones,
  value = flat$values,
  mavg  = as.vector(rolling$mean),
  sd    = as.vector(rolling$sd)
)
head(raw_features, 3)
```

The first column is the zone identifier; it is metadata, not a feature.
We will exclude it from clustering and normalization.

## Step 3: normalize

DBSCAN is distance-based, so feature scales matter. We z-score the three
feature columns so that no single feature dominates the distance
computation.

```{r}
raw_features[, 2:4] <- normalize_features(raw_features[, 2:4],
                                          method = "zscore")
```
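
For reference, z-scoring subtracts each column's mean and divides by its standard deviation. The sketch below uses base R's `scale()` on made-up data and assumes that `method = "zscore"` follows this usual definition; whether the package uses the sample or the population standard deviation is not stated here.

```{r}
demo_x <- cbind(a = c(1, 2, 3, 4, 5), b = c(10, 20, 30, 40, 50))

# Subtract each column's mean and divide by its sample standard deviation.
demo_z <- scale(demo_x, center = TRUE, scale = TRUE)

round(colMeans(demo_z), 10)
apply(demo_z, 2, sd)
```

After scaling, both columns have mean 0 and standard deviation 1, even though column `b` was originally ten times larger than column `a`.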

## Step 4: choose epsilon (visual heuristic)

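DBSCAN needs an `eps` parameter: the neighborhood radius. The k-distance
plot is a standard visual heuristic for choosing it: we sort each point's
distance to its k-th nearest neighbor and look for an elbow in the
resulting curve. Here `k` matches the `min_pts` value used in the next
step.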

```{r kdist-plot}
plot_k_distance(raw_features[, 2:4], k = 7)
```
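
The quantity behind such a plot can be computed by hand. The following base R sketch uses made-up points and is only an illustration of the heuristic, not necessarily how `plot_k_distance()` is implemented; all `demo_*` names are hypothetical.

```{r}
set.seed(1)
demo_pts <- matrix(rnorm(200), ncol = 2)   # 100 made-up 2-D points
demo_k <- 7

demo_d <- as.matrix(dist(demo_pts))        # pairwise Euclidean distances

# For each point, sort its distances; position 1 is the point itself
# (distance 0), so position demo_k + 1 is the k-th nearest neighbor.
demo_kdist <- apply(demo_d, 1, function(row) sort(row)[demo_k + 1])

plot(sort(demo_kdist), type = "l",
     xlab = "Points sorted by k-distance",
     ylab = paste0(demo_k, "-NN distance"))
```

A distance just past the elbow is a reasonable starting value for `eps`: most points have their k-th neighbor closer than that, and the remaining points are candidates for noise.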

## Step 5: run DBSCAN

```{r}
result <- run_dbscan(raw_features[, 2:4],
                     eps = 0.3,
                     min_pts = 7)

result$n_clusters
result$n_noise
```

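The result is a list with a standardized structure. It includes the
per-observation cluster labels (`result$cluster`) alongside the summary
counts `n_clusters` and `n_noise` shown above.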
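
To see how the neighborhood radius and the minimum point count interact, the sketch below marks core points on a tiny made-up dataset in base R. It assumes the original DBSCAN convention in which a point counts itself among its neighbors; whether `run_dbscan()` follows that convention is not stated here. The `demo_*` names are hypothetical.

```{r}
# Two tight made-up groups plus one isolated point.
demo_xy <- rbind(
  cbind(c(0, 0.1, 0.2, 0.1), c(0, 0.1, 0, -0.1)),
  cbind(c(5, 5.1, 5.2, 5.1), c(5, 5.1, 5, 4.9)),
  c(10, 10)
)
demo_eps <- 0.3
demo_min_pts <- 3

# Count neighbors within demo_eps (each point counts itself).
demo_nbrs <- rowSums(as.matrix(dist(demo_xy)) <= demo_eps)
demo_core <- demo_nbrs >= demo_min_pts

demo_nbrs
demo_core
```

The eight grouped points are core points, while the isolated point has only itself in its neighborhood and lies within `demo_eps` of no core point, so DBSCAN would treat it as noise.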

## Step 6: evaluate

The Davies-Bouldin Index summarizes how compact and separated the
clusters are. Lower values are better.

```{r}
davies_bouldin(raw_features[, 2:4], result$cluster)
```
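
For intuition, the index averages, over all clusters, the worst-case ratio of within-cluster scatter to between-centroid distance. Below is a minimal base R sketch of the textbook definition on made-up data; `demo_db()` is a hypothetical helper, and details such as how `davies_bouldin()` treats noise points are not stated here.

```{r}
demo_db <- function(x, labels) {
  ids <- sort(unique(labels))
  # Centroid of each cluster (one row per cluster).
  cents <- do.call(rbind, lapply(ids, function(id) {
    colMeans(x[labels == id, , drop = FALSE])
  }))
  # Scatter: mean Euclidean distance from members to their centroid.
  scatter <- sapply(seq_along(ids), function(i) {
    pts <- x[labels == ids[i], , drop = FALSE]
    mean(sqrt(rowSums(sweep(pts, 2, cents[i, ])^2)))
  })
  sep <- as.matrix(dist(cents))            # centroid-to-centroid distances
  ratio <- outer(scatter, scatter, "+") / sep
  diag(ratio) <- -Inf                      # ignore i == j
  mean(apply(ratio, 1, max))
}

demo_x2 <- rbind(c(0, 0), c(0, 2), c(10, 0), c(10, 2))
demo_db(demo_x2, labels = c(1, 1, 2, 2))
```

Each made-up cluster has scatter 1 and the centroids are 10 apart, so the index is (1 + 1) / 10 = 0.2. When working with DBSCAN output, noise points (commonly given a reserved label such as 0 or -1) are usually dropped before applying a formula like this.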

We can visualize the partition by projecting onto the first two
principal components and coloring by cluster.

```{r cluster-plot}
plot_clusters_pca(raw_features[, 2:4], result$cluster)
```
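
The projection itself can be reproduced with base R's `prcomp()`. The sketch below uses made-up features and labels and shows only the general technique; the centering and scaling choices inside `plot_clusters_pca()` are not stated here, and the `demo_*` names are hypothetical.

```{r}
set.seed(42)
demo_feat <- cbind(rnorm(60), rnorm(60), rnorm(60))  # made-up features
demo_lab <- rep(1:3, each = 20)                      # made-up labels

# Principal components of the centered features.
demo_pca <- prcomp(demo_feat, center = TRUE, scale. = FALSE)
demo_scores <- demo_pca$x[, 1:2]                     # first two components

plot(demo_scores, col = demo_lab, pch = 19,
     xlab = "PC1", ylab = "PC2")
```

Because the first two components capture the directions of largest variance, the scatter plot is a 2D view of the feature space in which well-separated clusters tend to remain visibly apart.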

For function-level reference, see the help pages, e.g. `?run_dbscan`.
