vald.extractor

Robust Pipeline for VALD ForceDecks Data Extraction and Analysis

vald.extractor extends the valdr package by providing a production-ready, fault-tolerant pipeline for extracting, cleaning, and visualizing VALD ForceDecks data across multiple sports. Designed for CRAN submission with comprehensive documentation and enterprise-grade error handling.

The Problem

Organizations using VALD ForceDecks face three critical challenges:

API Stability: Manual exports or large data pulls frequently timeout, causing incomplete datasets
Data Cleaning: Team/sport names are inconsistent (“Football” vs “Soccer” vs “FSI”), requiring hours of manual categorization
Code Duplication: Analyzing multiple test types (CMJ, DJ, ISO) requires duplicate code for each metric suffix

The Solution

vald.extractor solves these problems through:

Chunked Batch Processing: Extracts data in manageable chunks (100 tests at a time) with fault-tolerant error handling
Automated Sports Taxonomy: Regex-based pattern matching standardizes inconsistent naming conventions
Generic Programming: Strips test-type suffixes to enable writing analysis code once that works for all tests

Installation

# Install from CRAN (when available)
install.packages("vald.extractor")

# Or install development version from GitHub
# install.packages("devtools")
devtools::install_github("praveenmaths89/vald.extractor")

Quick Start

library(vald.extractor)

# 1. Set VALD credentials
valdr::set_credentials(
  client_id     = "your_client_id",
  client_secret = "your_client_secret",
  tenant_id     = "your_tenant_id",
  region        = "aue"
)

# 2. Fetch test and trial data in chunks (prevents timeout)
vald_data <- fetch_vald_batch(
  start_date = "2020-01-01T00:00:00Z",
  chunk_size = 100
)

# 3. Fetch and standardize athlete metadata
metadata <- fetch_vald_metadata(
  client_id     = "your_client_id",
  client_secret = "your_client_secret",
  tenant_id     = "your_tenant_id"
)

athlete_metadata <- standardize_vald_metadata(
  profiles = metadata$profiles,
  groups   = metadata$groups
)

# 4. Apply automated sports classification
athlete_metadata <- classify_sports(athlete_metadata)
table(athlete_metadata$sports_clean)

# 5. Transform to wide format and join with metadata
# ... (see vignette for complete pipeline)

# 6. Split by test type with suffix removal
test_datasets <- split_by_test(final_analysis_data)

cmj_data <- test_datasets$CMJ  # Column names: "PEAK_FORCE_Both", not "PEAK_FORCE_Both_CMJ"
dj_data  <- test_datasets$DJ   # Same column names enable generic analysis

# 7. Generate summary statistics
summary_vald_metrics(cmj_data, group_vars = c("sex", "sports"))

# 8. Visualize trends and comparisons
plot_vald_trends(cmj_data, metric_col = "PEAK_FORCE_Both", group_col = "profileId")
plot_vald_compare(cmj_data, metric_col = "JUMP_HEIGHT_Both", group_col = "sports", fill_col = "sex")

Key Features

1. Fault-Tolerant Batch Extraction

# Processes 5000 tests without timeout errors
vald_data <- fetch_vald_batch(
  start_date = "2020-01-01T00:00:00Z",
  chunk_size = 100,  # Adjust based on API performance
  verbose = TRUE
)

# If chunk 23 fails, chunks 1-22 and 24+ still succeed
# Error messages indicate which rows failed for debugging

Why it matters: Organizations with large historical datasets (5000+ tests) cannot extract data in a single API call. The chunked approach with tryCatch error handling ensures partial extraction succeeds even if some chunks fail.

2. Automated Sports Taxonomy

metadata <- classify_sports(metadata, group_col = "all_group_names")

# Before:
# "Team A - Football", "Soccer U18", "FSI Elite", "Basketball", "BBall"

# After:
# "Football", "Football", "Football", "Basketball", "Basketball"

table(metadata$sports_clean)
#> Football    Basketball    Cricket    Swimming    Track & Field
#>      523           198        145          87              234

The Value Add: Multi-sport organizations waste hours manually categorizing athletes. This regex-based system handles 15+ sports out-of-the-box and is easily extensible.

3. Generic Test-Type Analysis

# Write analysis code ONCE that works for ALL test types
analyze_bilateral_asymmetry <- function(test_data) {
  test_data %>%
    mutate(
      asymmetry = (PEAK_FORCE_Left - PEAK_FORCE_Right) /
                  ((PEAK_FORCE_Left + PEAK_FORCE_Right) / 2) * 100
    )
}

# Apply to CMJ, DJ, ISO without code changes
test_datasets <- split_by_test(final_data)
cmj_with_asymmetry <- analyze_bilateral_asymmetry(test_datasets$CMJ)
dj_with_asymmetry  <- analyze_bilateral_asymmetry(test_datasets$DJ)
iso_with_asymmetry <- analyze_bilateral_asymmetry(test_datasets$ISO)

DRY Principle: Without suffix removal, you’d need separate code for PEAK_FORCE_Left_CMJ, PEAK_FORCE_Left_DJ, etc. This package enables true generic programming.

4. Metadata Patching

# Fix missing/incorrect demographics from external Excel file
cmj_data <- patch_metadata(
  data = cmj_data,
  patch_file = "corrections.xlsx",
  fields_to_patch = c("sex", "dateOfBirth")
)

# Unknown values are replaced with corrections
table(cmj_data$sex)
#> Before: Male: 450, Female: 380, Unknown: 45
#> After:  Male: 470, Female: 405, Unknown: 0

5. Publication-Ready Visualizations

# Longitudinal trends
plot_vald_trends(
  data = cmj_data,
  metric_col = "JUMP_HEIGHT_Both",
  group_col = "profileId",
  facet_col = "sports"
)

# Cross-sectional comparisons
plot_vald_compare(
  data = cmj_data,
  metric_col = "PEAK_FORCE_Both",
  group_col = "sports",
  fill_col = "sex"
)

Documentation

Vignette: End-to-End Pipeline: From API to Multi-Sport Analysis
Function Reference: ?fetch_vald_batch, ?standardize_vald_metadata, ?split_by_test, etc.
GitHub Repository: https://github.com/praveenmaths89/vald.extractor

Production Use Cases

vald.extractor is designed for:

Multi-Sport Organizations: National sport institutes, university athletic departments, professional academies
Longitudinal Research: Track athlete development over months/years with automated weekly updates
Cross-Sectional Studies: Compare performance across sports, sexes, age groups
Clinical Settings: Monitor return-to-sport progression, ACL rehab, injury risk

Comparison to Manual Workflow

Task	Manual Workflow	vald.extractor
Extract 5000 tests	❌ API timeout errors	✅ Chunked processing (15 min)
Classify 500 athletes into sports	❌ 2-3 hours manual work	✅ Automated (30 sec)
Analyze CMJ, DJ, ISO separately	❌ Duplicate code for each	✅ Generic functions
Handle missing demographics	❌ Manual data entry	✅ Excel patch import
Generate summary tables	❌ Custom scripts	✅ `summary_vald_metrics()`
Create visualizations	❌ ggplot2 from scratch	✅ Pre-built themes

Roadmap for R Journal Submission

The R Journal article will focus on:

Technical Innovation: Chunked extraction architecture with fault tolerance
Domain Contribution: Automated sports taxonomy as a time-saving tool for practitioners
Software Engineering: Modular design, comprehensive testing, CRAN-compliant documentation
Reproducible Research: Complete workflow from raw API to publication figures

Key Message: “Automating domain-specific data taxonomy for multi-organizational sports science”

Citation

If you use vald.extractor in published research, please cite:

Chougale PD, Ananthakumar U (2026). vald.extractor: Robust Pipeline for VALD
ForceDecks Data Extraction and Analysis. R package version 0.1.0.
https://github.com/praveenmaths89/vald.extractor

Contributing

Contributions are welcome! Please:

Fork the repository
Create a feature branch (git checkout -b feature/new-sport-taxonomy)
Add tests for new functionality
Submit a pull request

Common contributions:

Add patterns for new sports in classify_sports()
Improve error messages
Add new visualization themes
Extend to other VALD devices (NordBord, DynaMo, etc.)

License

MIT License - see LICENSE file for details.

Acknowledgments

VALD Performance: For providing the ForceDecks API and valdr package
tidyverse team: For creating the tools that make this package possible
Sports scientists: Who provided real-world use cases and taxonomy requirements

Support

Issues: GitHub Issues
Email: praveenmaths89@gmail.com
Documentation: Run vignette("end-to-end-pipeline", package = "vald.extractor")

Status: Ready for CRAN submission pending: - [ ] Final testing on multiple VALD tenants - [ ] CRAN comment responses - [ ] Logo design (hex sticker) - [ ] pkgdown website deployment

Maintainer: Praveen D Chougale (praveenmaths89@gmail.com)

mirror server hosted at Truenetwork, Russian Federation.