vald.extractor

Robust Pipeline for VALD ForceDecks Data Extraction and Analysis

vald.extractor extends the valdr package with a fault-tolerant pipeline for extracting, cleaning, and visualizing VALD ForceDecks data across multiple sports. It is designed for CRAN submission, with comprehensive documentation and error handling that lets partial extractions succeed.

The Problem

Organizations using VALD ForceDecks face three critical challenges:

  1. API Stability: Manual exports or large data pulls frequently time out, leaving datasets incomplete
  2. Data Cleaning: Team/sport names are inconsistent (“Football” vs “Soccer” vs “FSI”), requiring hours of manual categorization
  3. Code Duplication: Analyzing multiple test types (CMJ, DJ, ISO) requires duplicate code for each metric suffix

The Solution

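vald.extractor addresses these problems through chunked batch extraction, automated sports classification, and test-type splitting with suffix removal. Each is described under Key Features.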

Installation

# Install from CRAN (when available)
install.packages("vald.extractor")

# Or install development version from GitHub
# install.packages("devtools")
devtools::install_github("praveenmaths89/vald.extractor")

Quick Start

library(vald.extractor)

# 1. Set VALD credentials
valdr::set_credentials(
  client_id     = "your_client_id",
  client_secret = "your_client_secret",
  tenant_id     = "your_tenant_id",
  region        = "aue"
)

# 2. Fetch test and trial data in chunks (prevents timeout)
vald_data <- fetch_vald_batch(
  start_date = "2020-01-01T00:00:00Z",
  chunk_size = 100
)

# 3. Fetch and standardize athlete metadata
metadata <- fetch_vald_metadata(
  client_id     = "your_client_id",
  client_secret = "your_client_secret",
  tenant_id     = "your_tenant_id"
)

athlete_metadata <- standardize_vald_metadata(
  profiles = metadata$profiles,
  groups   = metadata$groups
)

# 4. Apply automated sports classification
athlete_metadata <- classify_sports(athlete_metadata)
table(athlete_metadata$sports_clean)

# 5. Transform to wide format and join with metadata
# ... (see vignette for complete pipeline)

# 6. Split by test type with suffix removal
test_datasets <- split_by_test(final_analysis_data)

cmj_data <- test_datasets$CMJ  # Column names: "PEAK_FORCE_Both", not "PEAK_FORCE_Both_CMJ"
dj_data  <- test_datasets$DJ   # Same column names enable generic analysis

# 7. Generate summary statistics
summary_vald_metrics(cmj_data, group_vars = c("sex", "sports"))

# 8. Visualize trends and comparisons
plot_vald_trends(cmj_data, metric_col = "PEAK_FORCE_Both", group_col = "profileId")
plot_vald_compare(cmj_data, metric_col = "JUMP_HEIGHT_Both", group_col = "sports", fill_col = "sex")

Key Features

1. Fault-Tolerant Batch Extraction

# Processes 5000 tests without timeout errors
vald_data <- fetch_vald_batch(
  start_date = "2020-01-01T00:00:00Z",
  chunk_size = 100,  # Adjust based on API performance
  verbose = TRUE
)

# If chunk 23 fails, chunks 1-22 and 24+ still succeed
# Error messages indicate which rows failed for debugging

Why it matters: Organizations with large historical datasets (5000+ tests) cannot extract data in a single API call. The chunked approach with tryCatch error handling ensures partial extraction succeeds even if some chunks fail.
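
The chunk-and-continue pattern described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's implementation: fetch_chunk() is a hypothetical stand-in for an API call (rigged to fail on chunk 3), and fetch_in_chunks() is an assumed name as well.

```r
# Hypothetical stand-in for an API call; fails on chunk 3 to simulate a timeout.
fetch_chunk <- function(ids, chunk_no) {
  if (chunk_no == 3) stop("simulated timeout")
  data.frame(test_id = ids, value = ids * 2)
}

# Split ids into chunks, fetch each inside tryCatch, and keep whatever succeeds.
fetch_in_chunks <- function(ids, chunk_size = 100, verbose = TRUE) {
  chunks <- split(ids, ceiling(seq_along(ids) / chunk_size))
  results <- vector("list", length(chunks))
  failed <- integer(0)
  for (i in seq_along(chunks)) {
    res <- tryCatch(
      fetch_chunk(chunks[[i]], i),
      error = function(e) {
        if (verbose) {
          message(sprintf("Chunk %d failed: %s", i, conditionMessage(e)))
        }
        NULL  # signal failure without aborting the loop
      }
    )
    if (is.null(res)) {
      failed <- c(failed, i)
    } else {
      results[[i]] <- res
    }
  }
  # Drop the empty slots left by failed chunks before binding rows.
  ok <- Filter(Negate(is.null), results)
  list(data = do.call(rbind, ok), failed_chunks = failed)
}

out <- fetch_in_chunks(1:450, chunk_size = 100)
nrow(out$data)      # 350: chunks 1, 2, 4 and 5 succeeded
out$failed_chunks   # 3
```

Returning the indices of the failed chunks alongside the data makes it possible to retry only those chunks later instead of repeating the whole extraction.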

2. Automated Sports Taxonomy

metadata <- classify_sports(metadata, group_col = "all_group_names")

# Before:
# "Team A - Football", "Soccer U18", "FSI Elite", "Basketball", "BBall"

# After:
# "Football", "Football", "Football", "Basketball", "Basketball"

table(metadata$sports_clean)
#> Football    Basketball    Cricket    Swimming    Track & Field
#>      523           198        145          87              234

The Value Add: Multi-sport organizations waste hours manually categorizing athletes. This regex-based system handles 15+ sports out of the box and is easily extensible.
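
To illustrate how a regex-based taxonomy of this kind might work, here is a small base R sketch. The pattern table and the function name classify_groups() are hypothetical and cover only the two sports from the example above; the package's actual patterns may differ.

```r
# Hypothetical pattern table: clean sport name -> case-insensitive regex.
sport_patterns <- c(
  Football   = "football|soccer|fsi",
  Basketball = "basketball|bball"
)

# Assign the first matching sport to each group name, else "Unclassified".
classify_groups <- function(group_names, patterns = sport_patterns) {
  out <- rep("Unclassified", length(group_names))
  for (sport in names(patterns)) {
    hit <- out == "Unclassified" &
      grepl(patterns[[sport]], group_names, ignore.case = TRUE)
    out[hit] <- sport
  }
  out
}

groups <- c("Team A - Football", "Soccer U18", "FSI Elite", "Basketball", "BBall")
classify_groups(groups)
# "Football" "Football" "Football" "Basketball" "Basketball"
```

In a design like this, supporting another sport only requires adding one entry to the pattern table. Because the first match wins, more specific patterns should be listed before broader ones.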

3. Generic Test-Type Analysis

# Write analysis code ONCE that works for ALL test types
library(dplyr)  # provides %>% and mutate()

analyze_bilateral_asymmetry <- function(test_data) {
  test_data %>%
    mutate(
      # Percent difference between limbs, relative to their mean
      asymmetry = (PEAK_FORCE_Left - PEAK_FORCE_Right) /
                  ((PEAK_FORCE_Left + PEAK_FORCE_Right) / 2) * 100
    )
}

# Apply to CMJ, DJ, ISO without code changes
test_datasets <- split_by_test(final_data)
cmj_with_asymmetry <- analyze_bilateral_asymmetry(test_datasets$CMJ)
dj_with_asymmetry  <- analyze_bilateral_asymmetry(test_datasets$DJ)
iso_with_asymmetry <- analyze_bilateral_asymmetry(test_datasets$ISO)

DRY Principle: Without suffix removal, you’d need separate code for PEAK_FORCE_Left_CMJ, PEAK_FORCE_Left_DJ, etc. This package enables true generic programming.
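
The suffix-removal idea can be sketched in base R on toy data. This is an illustration only: the column names testType and profileId, the helper split_by_suffix(), and the numbers are assumptions, not the package's internals.

```r
# Toy wide data: one row per test, metric columns suffixed by test type.
wide <- data.frame(
  profileId            = c("a1", "a2", "a1"),
  testType             = c("CMJ", "CMJ", "DJ"),
  PEAK_FORCE_Left_CMJ  = c(1000, 900, NA),
  PEAK_FORCE_Right_CMJ = c(950, 900, NA),
  PEAK_FORCE_Left_DJ   = c(NA, NA, 1200),
  PEAK_FORCE_Right_DJ  = c(NA, NA, 1300)
)

# Keep id columns plus the columns ending in "_<type>", then strip the suffix.
split_by_suffix <- function(data, type_col = "testType", id_cols = "profileId") {
  types <- unique(as.character(data[[type_col]]))
  out <- lapply(types, function(type) {
    suffix <- paste0("_", type, "$")
    metric_cols <- grep(suffix, names(data), value = TRUE)
    rows <- data[[type_col]] == type
    subset_df <- data[rows, c(id_cols, metric_cols), drop = FALSE]
    names(subset_df) <- sub(suffix, "", names(subset_df))
    subset_df
  })
  names(out) <- types
  out
}

# One generic function now works on every subset.
asymmetry_pct <- function(d) {
  (d$PEAK_FORCE_Left - d$PEAK_FORCE_Right) /
    ((d$PEAK_FORCE_Left + d$PEAK_FORCE_Right) / 2) * 100
}

datasets <- split_by_suffix(wide)
names(datasets$CMJ)          # "profileId" "PEAK_FORCE_Left" "PEAK_FORCE_Right"
asymmetry_pct(datasets$DJ)   # -8
```

For the DJ row the arithmetic is (1200 - 1300) / ((1200 + 1300) / 2) * 100 = -100 / 1250 * 100 = -8, so a negative value indicates the right limb produced more force.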

4. Metadata Patching

# Fix missing/incorrect demographics from external Excel file
cmj_data <- patch_metadata(
  data = cmj_data,
  patch_file = "corrections.xlsx",
  fields_to_patch = c("sex", "dateOfBirth")
)

# Unknown values are replaced with corrections
table(cmj_data$sex)
#> Before: Male: 450, Female: 380, Unknown: 45
#> After:  Male: 470, Female: 405, Unknown: 0
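
A rough base R sketch of the patching logic follows, assuming rows are matched on profileId and that "Unknown", empty strings, and NA count as missing. The helper patch_fields() and the toy data are hypothetical; in practice the corrections table could be loaded from the Excel file with readxl::read_excel().

```r
# Toy data: one known, one "Unknown", one NA value for sex.
athletes <- data.frame(
  profileId = c("a1", "a2", "a3"),
  sex       = c("Male", "Unknown", NA),
  stringsAsFactors = FALSE
)

# Corrections, as they might appear in an external spreadsheet.
corrections <- data.frame(
  profileId = c("a2", "a3"),
  sex       = c("Female", "Male"),
  stringsAsFactors = FALSE
)

# Overwrite missing values in `fields` with values from `patch`, matched on `key`.
patch_fields <- function(data, patch, fields, key = "profileId",
                         missing_values = c("Unknown", "")) {
  idx <- match(data[[key]], patch[[key]])  # patch row per data row, NA if none
  for (field in fields) {
    needs_fix <- is.na(data[[field]]) | data[[field]] %in% missing_values
    has_patch <- !is.na(idx) & !is.na(patch[[field]][idx])
    fix <- needs_fix & has_patch
    data[[field]][fix] <- patch[[field]][idx[fix]]
  }
  data
}

patched <- patch_fields(athletes, corrections, fields = "sex")
patched$sex   # "Male" "Female" "Male"
```

Only values flagged as missing are overwritten, so a correction cannot silently replace a value that was already recorded.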

5. Publication-Ready Visualizations

# Longitudinal trends
plot_vald_trends(
  data = cmj_data,
  metric_col = "JUMP_HEIGHT_Both",
  group_col = "profileId",
  facet_col = "sports"
)

# Cross-sectional comparisons
plot_vald_compare(
  data = cmj_data,
  metric_col = "PEAK_FORCE_Both",
  group_col = "sports",
  fill_col = "sex"
)

Documentation

Production Use Cases

vald.extractor is designed for multi-sport organizations with large historical ForceDecks datasets (5000+ tests).

Comparison to Manual Workflow

Task                              | Manual Workflow          | vald.extractor
----------------------------------|--------------------------|-------------------------------
Extract 5000 tests                | ❌ API timeout errors     | ✅ Chunked processing (15 min)
Classify 500 athletes into sports | ❌ 2-3 hours manual work  | ✅ Automated (30 sec)
Analyze CMJ, DJ, ISO separately   | ❌ Duplicate code for each | ✅ Generic functions
Handle missing demographics       | ❌ Manual data entry      | ✅ Excel patch import
Generate summary tables           | ❌ Custom scripts         | ✅ summary_vald_metrics()
Create visualizations             | ❌ ggplot2 from scratch   | ✅ Pre-built themes

Roadmap for R Journal Submission

The R Journal article will focus on:

  1. Technical Innovation: Chunked extraction architecture with fault tolerance
  2. Domain Contribution: Automated sports taxonomy as a time-saving tool for practitioners
  3. Software Engineering: Modular design, comprehensive testing, CRAN-compliant documentation
  4. Reproducible Research: Complete workflow from raw API to publication figures

Key Message: “Automating domain-specific data taxonomy for multi-organizational sports science”

Citation

If you use vald.extractor in published research, please cite:

Chougale PD, Anathakumar U (2026). vald.extractor: Robust Pipeline for VALD
ForceDecks Data Extraction and Analysis. R package version 0.1.0.
https://github.com/praveenmaths89/vald.extractor

Contributing

Contributions are welcome! Please:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/new-sport-taxonomy)
  3. Add tests for new functionality
  4. Submit a pull request

Common contributions:

License

MIT License - see LICENSE file for details.

Acknowledgments

Support


Status: Ready for CRAN submission, pending:

  - [ ] Final testing on multiple VALD tenants
  - [ ] CRAN comment responses
  - [ ] Logo design (hex sticker)
  - [ ] pkgdown website deployment

Maintainer: Praveen D Chougale (praveenmaths89@gmail.com)
