Type: Package
Title: Seamless AWS Cloud Bursting for Parallel R Workloads
Version: 0.3.8
Description: A 'future' backend that enables seamless execution of parallel R workloads on 'Amazon Web Services' ('AWS', https://aws.amazon.com), including 'EC2' and 'Fargate'. 'staRburst' handles environment synchronization, data transfer, quota management, and worker orchestration automatically, allowing users to scale from local execution to 100+ cloud workers with a single line of code change.
License: Apache License 2.0
Encoding: UTF-8
RoxygenNote: 7.3.3
Depends: R (≥ 4.0.0)
Imports: future (≥ 1.33.0), globals, paws.compute, paws.storage, paws.management, paws.security.identity, qs2, uuid, renv, jsonlite, crayon, digest, base64enc, processx
Suggests: future.apply, testthat (≥ 3.0.0), knitr, rmarkdown, withr, mockery
VignetteBuilder: knitr
URL: https://starburst.ing, https://github.com/scttfrdmn/starburst
BugReports: https://github.com/scttfrdmn/starburst/issues
NeedsCompilation: no
Packaged: 2026-03-16 00:17:30 UTC; scttfrdmn
Author: Scott Friedman [aut, cre]
Maintainer: Scott Friedman <help@starburst.ing>
Repository: CRAN
Date/Publication: 2026-03-19 14:50:02 UTC

starburst: Seamless AWS Cloud Bursting for Parallel R Workloads

Description

A 'future' backend that enables seamless execution of parallel R workloads on 'Amazon Web Services' ('AWS', https://aws.amazon.com), including 'EC2' and 'Fargate'. 'staRburst' handles environment synchronization, data transfer, quota management, and worker orchestration automatically, allowing users to scale from local execution to 100+ cloud workers with a single line of code change.

Author(s)

Maintainer: Scott Friedman <help@starburst.ing>

See Also

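Useful links:

https://starburst.ing

https://github.com/scttfrdmn/starburst

Report bugs at https://github.com/scttfrdmn/starburst/issues
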

Starburst Future Backend

Description

A future backend for running parallel R workloads on AWS ECS

Usage

StarburstBackend(
  workers = 10,
  cpu = 4,
  memory = "8GB",
  region = NULL,
  timeout = 3600,
  launch_type = "EC2",
  instance_type = "c6a.large",
  use_spot = FALSE,
  warm_pool_timeout = 3600,
  ...
)

Arguments

workers

Number of parallel workers

cpu

vCPUs per worker (1, 2, 4, 8, or 16)

memory

Memory per worker (supports GB notation, e.g., "8GB")

region

AWS region (default: from config or "us-east-1")

timeout

Maximum runtime in seconds (default: 3600)

launch_type

"EC2" or "FARGATE" (default: "EC2")

instance_type

EC2 instance type (e.g., "c6a.large")

use_spot

Use spot instances (default: FALSE)

warm_pool_timeout

Pool timeout in seconds (default: 3600)

...

Additional arguments

Value

A StarburstBackend object


StarburstFuture Constructor

Description

Creates a Future object for evaluation on AWS Fargate

Usage

StarburstFuture(
  expr,
  envir = parent.frame(),
  substitute = TRUE,
  globals = TRUE,
  packages = NULL,
  lazy = FALSE,
  seed = FALSE,
  stdout = TRUE,
  conditions = "condition",
  label = NULL,
  ...
)

Arguments

expr

Expression to evaluate

envir

Environment for evaluation

substitute

Whether to substitute the expression

globals

Globals to export (TRUE for auto-detection, list for manual)

packages

Packages to load

lazy

Whether to lazily evaluate (always FALSE for remote)

seed

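Random seed

stdout

Whether to capture stdout (TRUE, FALSE, or NA)

conditions

Character vector of condition classes to capture

label

Optional label for the future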

...

Additional arguments

Value

A StarburstFuture object


Atomically claim a pending task

Description

Helper that combines a get and a conditional update into a single operation, so that only one worker can claim a given pending task.

Usage

atomic_claim_task(session_id, task_id, worker_id, region, bucket)

Arguments

session_id

Session identifier

task_id

Task identifier

worker_id

Worker identifier claiming the task

region

AWS region

bucket

S3 bucket

Value

TRUE if claimed successfully, FALSE if already claimed
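
The get + conditional update pattern described above can be sketched without AWS. The example below is only an illustration: it uses an in-memory environment in place of S3 and an integer version counter in place of an ETag, and the names new_store, put_task, update_if_match and claim_task are hypothetical, not part of the package.

```r
# In-memory stand-in for S3: each task has a state and a version ("etag").
# All names here are hypothetical; the real implementation talks to S3.
new_store <- function() new.env(parent = emptyenv())

put_task <- function(store, task_id, state = "pending") {
  assign(task_id, list(state = state, etag = 1L, worker = NA_character_),
         envir = store)
  invisible(NULL)
}

# Conditional update: succeeds only if the stored etag still matches.
update_if_match <- function(store, task_id, expected_etag, new_state, worker_id) {
  current <- get(task_id, envir = store)
  if (!identical(current$etag, expected_etag)) return(FALSE)
  assign(task_id,
         list(state = new_state, etag = current$etag + 1L, worker = worker_id),
         envir = store)
  TRUE
}

# Claim = read status (with etag), check it is pending, then conditional update.
claim_task <- function(store, task_id, worker_id) {
  status <- get(task_id, envir = store)
  if (!identical(status$state, "pending")) return(FALSE)
  update_if_match(store, task_id, status$etag, "claimed", worker_id)
}

store <- new_store()
put_task(store, "task-1")
first  <- claim_task(store, "task-1", "worker-a")  # claims the pending task
second <- claim_task(store, "task-1", "worker-b")  # task is no longer pending
```

In this sketch the second worker is refused because the task has already left the "pending" state; the etag comparison covers the case where two workers read the same pending status before either writes.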


AWS Retry Logic

Description

Centralized retry logic for AWS operations with exponential backoff
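
As a rough illustration of retrying with exponential backoff, the sketch below wraps an arbitrary function call. The helper name with_backoff and its parameters (max_attempts, base_delay, max_delay, sleep) are assumptions made for this example; the package's own retry helper and its defaults are not documented here and may differ.

```r
# Hypothetical helper: retry `fn` with exponential backoff.
# `max_attempts`, `base_delay`, and `max_delay` are illustrative parameters.
with_backoff <- function(fn, max_attempts = 5, base_delay = 0.5, max_delay = 30,
                         sleep = Sys.sleep) {
  for (attempt in seq_len(max_attempts)) {
    result <- tryCatch(
      list(ok = TRUE, value = fn()),
      error = function(e) list(ok = FALSE, error = e)
    )
    if (result$ok) return(result$value)
    # Out of attempts: re-signal the last error.
    if (attempt == max_attempts) stop(result$error)
    # Delay doubles each attempt (base, 2*base, 4*base, ...), capped at max_delay.
    sleep(min(max_delay, base_delay * 2^(attempt - 1)))
  }
}

# Usage: a flaky operation that fails twice, then succeeds.
calls <- 0
flaky <- function() {
  calls <<- calls + 1
  if (calls < 3) stop("throttled")
  "ok"
}
delays <- numeric()
# Record the delays instead of actually sleeping.
out <- with_backoff(flaky, sleep = function(s) delays <<- c(delays, s))
```

Passing the sleep function as an argument keeps the sketch testable without waiting; a production version would typically also add random jitter and retry only on errors known to be transient.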


Build base Docker image with common dependencies

Description

Build base Docker image with common dependencies

Usage

build_base_image(region)

Build environment image

Description

Build environment image

Usage

build_environment_image(tag, region, use_public = NULL)

Arguments

tag

Image tag

region

AWS region

use_public

Logical, use public ECR base image (default NULL = from config)


Build initial environment

Description

Build initial environment

Usage

build_initial_environment(region)

Calculate Cloud Performance Predictions

Description

Calculate Cloud Performance Predictions

Usage

calculate_predictions(
  n_total,
  avg_time_per_task,
  local_specs,
  workers,
  cpu,
  memory,
  platform
)

Calculate task cost

Description

Calculate task cost

Usage

calculate_task_cost(future)

Calculate total cost

Description

Calculate total cost

Usage

calculate_total_cost(plan)

Check and Submit Wave if Ready

Description

Checks wave queue and submits next wave if current wave is complete

Usage

check_and_submit_wave(backend)

Arguments

backend

Backend environment
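
The wave logic can be pictured with a small self-contained sketch. The fields queue, current and done, and the function names below, are hypothetical; the actual layout of the backend environment is not documented here.

```r
# Hypothetical backend: tasks are split into waves of at most `wave_size`.
new_wave_backend <- function(task_ids, wave_size) {
  backend <- new.env()
  backend$queue <- split(task_ids, ceiling(seq_along(task_ids) / wave_size))
  backend$current <- character()  # task IDs in the wave now running
  backend$done <- character()     # task IDs that have finished
  backend
}

# Submit the next wave only when every task of the current wave is done.
sketch_check_and_submit <- function(backend) {
  wave_complete <- all(backend$current %in% backend$done)
  if (wave_complete && length(backend$queue) > 0) {
    backend$current <- backend$queue[[1]]
    backend$queue <- backend$queue[-1]
    return(TRUE)
  }
  FALSE
}

backend <- new_wave_backend(paste0("task-", 1:5), wave_size = 2)
submitted_1 <- sketch_check_and_submit(backend)  # first wave: task-1, task-2
submitted_2 <- sketch_check_and_submit(backend)  # current wave not finished
backend$done <- c("task-1", "task-2")
submitted_3 <- sketch_check_and_submit(backend)  # second wave: task-3, task-4
```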


Check AWS credentials

Description

Check AWS credentials

Usage

check_aws_credentials()

Value

Logical indicating if credentials are valid


Check ECR image age and suggest/force rebuild

Description

Check ECR image age and suggest/force rebuild

Usage

check_ecr_image_age(region, image_tag, ttl_days = NULL, force_rebuild = FALSE)

Arguments

region

AWS region

image_tag

Image tag to check

ttl_days

TTL setting (NULL = no check)

force_rebuild

Force rebuild if past TTL

Value

TRUE if image is fresh or doesn't exist, FALSE if stale
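
The freshness rule in the Value section can be sketched as a pure date comparison. The function image_is_fresh and its pushed_at and now arguments are hypothetical; the real function looks up the image's push time in ECR.

```r
# Hypothetical freshness check mirroring the documented return values:
# TRUE if there is no TTL, no image, or the image is within the TTL.
image_is_fresh <- function(pushed_at, ttl_days = NULL, now = Sys.time()) {
  if (is.null(ttl_days) || is.null(pushed_at)) return(TRUE)
  age_days <- as.numeric(difftime(now, pushed_at, units = "days"))
  age_days <= ttl_days
}

now <- as.POSIXct("2026-03-01 00:00:00", tz = "UTC")
fresh <- image_is_fresh(as.POSIXct("2026-02-20", tz = "UTC"), ttl_days = 30, now = now)
stale <- image_is_fresh(as.POSIXct("2026-01-01", tz = "UTC"), ttl_days = 30, now = now)
no_ttl <- image_is_fresh(as.POSIXct("2020-01-01", tz = "UTC"), ttl_days = NULL, now = now)
```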


Check if ECR image exists

Description

Check if ECR image exists

Usage

check_ecr_image_exists(tag, region)

Check Fargate vCPU quota

Description

Check Fargate vCPU quota

Usage

check_fargate_quota(region)

Arguments

region

AWS region

Value

List with quota information


Check if quota is sufficient for plan

Description

Check if quota is sufficient for plan

Usage

check_quota_sufficient(workers, cpu, region)
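
A plausible reading of this check is that the plan needs workers × cpu vCPUs and that this must fit within the unused part of the quota. The sketch below takes the quota figures as plain arguments (quota_limit and vcpus_in_use are hypothetical names) rather than querying AWS for a region.

```r
# Hypothetical arithmetic behind a quota check; the real function
# obtains the limit and current usage from AWS for the given region.
quota_is_sufficient <- function(workers, cpu, quota_limit, vcpus_in_use = 0) {
  required  <- workers * cpu
  available <- quota_limit - vcpus_in_use
  list(sufficient = required <= available,
       required = required,
       available = available)
}

check <- quota_is_sufficient(workers = 50, cpu = 4,
                             quota_limit = 256, vcpus_in_use = 64)
```

Here 50 workers at 4 vCPUs each need 200 vCPUs, but only 192 remain, so the plan would not fit.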

Clean up cluster resources

Description

Clean up cluster resources

Usage

cleanup_cluster(backend)

Cleanup S3 files

Description

Cleanup S3 files

Usage

cleanup_s3_files(plan)

Cleanup session

Description

Cleanup session

Usage

cleanup_session(session, stop_workers = TRUE, force = FALSE)

Arguments

session

Session object

stop_workers

Stop running ECS tasks (default TRUE)

force

Delete S3 files (default FALSE)


Collect results from session

Description

Collect results from session

Usage

collect_session_results(session, wait, timeout)

Compute environment image hash

Description

Computes the hash used to tag environment Docker images, combining the renv.lock file contents with the starburst package version. This ensures new images are built when either the R package environment or the starburst worker script changes.

Usage

compute_env_hash(lock_file)

Arguments

lock_file

Path to renv.lock file

Value

MD5 hash string
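
The idea of hashing the lock file together with the package version can be sketched in base R. The function sketch_env_hash and the way the version is appended are assumptions for illustration; the package's actual hashing (it imports 'digest') may combine the inputs differently and so produce different hash values.

```r
# Hypothetical sketch: MD5 over the lock file contents plus a version marker.
# tools::md5sum() is used so the example runs without extra packages.
sketch_env_hash <- function(lock_file, pkg_version) {
  lock_lines <- readLines(lock_file, warn = FALSE)
  combined <- tempfile()
  on.exit(unlink(combined))
  writeLines(c(lock_lines, paste0("starburst-version:", pkg_version)), combined)
  unname(tools::md5sum(combined))
}

# A stand-in lock file; a real renv.lock is much larger.
lock <- tempfile(fileext = ".lock")
writeLines('{"R": {"Version": "4.3.2"}}', lock)

h1 <- sketch_env_hash(lock, "0.3.8")
h2 <- sketch_env_hash(lock, "0.3.9")  # same lock file, newer package version
```

Because the version is part of the hashed content, h1 and h2 differ even though the lock file is unchanged, which is what triggers a new image tag.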


Get configuration directory

Description

Get configuration directory

Usage

config_dir()

Get configuration file path

Description

Get configuration file path

Usage

config_path()

Create ECR lifecycle policy to auto-delete old images

Description

Create ECR lifecycle policy to auto-delete old images

Usage

create_ecr_lifecycle_policy(region, repository_name, ttl_days = NULL)

Arguments

region

AWS region

repository_name

ECR repository name

ttl_days

Number of days to keep images (NULL = no auto-delete)
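
For orientation, an ECR lifecycle policy that expires images a fixed number of days after they were pushed has roughly the shape below. The helper lifecycle_policy, the rule priority and the description text are illustrative assumptions; the exact rule this package creates is not shown in this manual.

```r
# Hypothetical builder for an ECR lifecycle policy document.
lifecycle_policy <- function(ttl_days) {
  if (is.null(ttl_days)) return(NULL)  # NULL = no auto-delete
  list(rules = list(list(
    rulePriority = 1L,
    description = sprintf("Expire images older than %d days", as.integer(ttl_days)),
    selection = list(
      tagStatus = "any",
      countType = "sinceImagePushed",
      countUnit = "days",
      countNumber = as.integer(ttl_days)
    ),
    action = list(type = "expire")
  )))
}

policy <- lifecycle_policy(30)

# Serialize to the JSON text that ECR expects, if jsonlite is installed.
if (requireNamespace("jsonlite", quietly = TRUE)) {
  policy_json <- jsonlite::toJSON(policy, auto_unbox = TRUE, pretty = TRUE)
}
```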


Create ECR repository

Description

Create ECR repository

Usage

create_ecr_repository(repo_name, region)

Create ECS cluster

Description

Create ECS cluster

Usage

create_ecs_cluster(cluster_name, region)

Create session manifest in S3

Description

Create session manifest in S3

Usage

create_session_manifest(session_id, backend)

Arguments

session_id

Unique session identifier

backend

Backend configuration

Value

Invisibly returns NULL


Create session object with methods

Description

Create session object with methods

Usage

create_session_object(backend)

Arguments

backend

Backend environment

Value

Session object (environment)


Create S3 bucket for staRburst

Description

Create S3 bucket for staRburst

Usage

create_starburst_bucket(bucket_name, region)

Create task object

Description

Create task object

Usage

create_task(expr, globals, packages, plan)

Create task status in S3

Description

Create task status in S3

Usage

create_task_status(session_id, task_id, state = "pending", region, bucket)

Arguments

session_id

Session identifier

task_id

Task identifier

state

Initial state (default: "pending")

region

AWS region

bucket

S3 bucket

Value

Invisibly returns NULL


EC2 Pool Management for staRburst

Description

Functions for managing Auto-Scaling Groups and ECS Capacity Providers to maintain warm pools of EC2 instances for fast task execution.


Ensure base image exists

Description

Ensure base image exists

Usage

ensure_base_image(region, use_public = NULL)

Arguments

region

AWS region

use_public

Logical, use public ECR base image (default TRUE)


Ensure ECS instance IAM profile exists

Description

Ensure ECS instance IAM profile exists

Usage

ensure_ecs_instance_profile(region)

Arguments

region

AWS region

Value

Instance profile ARN


Ensure ECS security group exists

Description

Ensure ECS security group exists

Usage

ensure_ecs_security_group(region)

Arguments

region

AWS region

Value

Security group ID


Ensure environment is ready

Description

Ensure environment is ready

Usage

ensure_environment(region)

Ensure CloudWatch log group exists

Description

Ensure CloudWatch log group exists

Usage

ensure_log_group(log_group_name, region)

Error Handling for staRburst

Description

Improved error messages with context and solutions


Estimate Cost

Description

Estimate Cost

Usage

estimate_cost(
  workers,
  cpu,
  memory,
  estimated_runtime_hours = 1,
  launch_type = "FARGATE",
  instance_type = NULL,
  use_spot = FALSE
)
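
For the Fargate case, the estimate presumably follows the usual per-vCPU-hour plus per-GB-hour billing model. The sketch below makes that arithmetic explicit; the function name, the argument names and the rates are placeholders, not current AWS prices or the package's internal pricing table.

```r
# Hypothetical cost arithmetic for Fargate-style billing.
# Rates must be supplied by the caller; none are built in.
sketch_fargate_cost <- function(workers, cpu, memory_gb, hours = 1,
                                vcpu_hour_usd, gb_hour_usd) {
  per_worker_hour <- cpu * vcpu_hour_usd + memory_gb * gb_hour_usd
  workers * hours * per_worker_hour
}

# Placeholder rates chosen for easy arithmetic, not real prices.
cost <- sketch_fargate_cost(workers = 10, cpu = 4, memory_gb = 8, hours = 2,
                            vcpu_hour_usd = 0.04, gb_hour_usd = 0.005)
```

With these placeholder rates each worker costs 0.16 + 0.04 = 0.20 USD per hour, so 10 workers for 2 hours come to 4 USD.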

Extend session timeout

Description

Extend session timeout

Usage

extend_session_timeout(session, seconds)

Extract region from S3 key

Description

Extract region from S3 key

Usage

extract_region_from_key(key)

Create a Future using Starburst Backend

Description

This is the entry point called by the 'future' package when plan(starburst) is active.

Usage

## S3 method for class 'starburst'
future(
  expr,
  envir = parent.frame(),
  substitute = TRUE,
  lazy = FALSE,
  seed = FALSE,
  globals = TRUE,
  packages = NULL,
  stdout = TRUE,
  conditions = "condition",
  label = NULL,
  ...
)

Arguments

expr

Expression to evaluate

envir

Environment for evaluation

substitute

Whether to substitute the expression

lazy

Whether to lazily evaluate (always FALSE for remote)

seed

Random seed

globals

Globals to export (TRUE for auto-detection, list for manual)

packages

Packages to load

stdout

Whether to capture stdout (TRUE, FALSE, or NA)

conditions

Character vector of condition classes to capture

label

Optional label for the future

...

Additional arguments

Value

A StarburstFuture object


Get CPU architecture from instance type

Description

Get CPU architecture from instance type

Usage

get_architecture_from_instance_type(instance_type)

Arguments

instance_type

EC2 instance type (e.g., "c7g.xlarge", "c7i.xlarge", "c7a.xlarge")

Value

CPU architecture ("ARM64" or "X86_64")
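
One way to derive the architecture is from the instance family name: AWS Graviton families carry a "g" directly after the generation digit (c7g, m6gd, r7g), and the first-generation a1 family is also ARM. The sketch below encodes that heuristic; sketch_architecture is a hypothetical name, the rule does not cover every family (for example mac instances), and the package may use a different lookup.

```r
# Hypothetical heuristic: Graviton families look like <letters><digits>g...
sketch_architecture <- function(instance_type) {
  family <- sub("\\..*$", "", tolower(instance_type))  # "c7g.xlarge" -> "c7g"
  is_graviton <- family == "a1" || grepl("^[a-z]+[0-9]+g", family)
  if (is_graviton) "ARM64" else "X86_64"
}

types <- c("c7g.xlarge", "c7i.xlarge", "c7a.xlarge")
archs <- vapply(types, sketch_architecture, character(1))
```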


Get Auto Scaling client

Description

Get Auto Scaling client

Usage

get_autoscaling_client(region)

Arguments

region

AWS region

Value

Auto Scaling client


Get AWS account ID

Description

Get AWS account ID

Usage

get_aws_account_id()

Value

AWS account ID


Get base image source URI

Description

Get base image source URI

Usage

get_base_image_source(use_public = TRUE)

Arguments

use_public

Logical, use public ECR base image (default TRUE)


Get base image URI

Description

Get base image URI

Usage

get_base_image_uri(region)

Get Cloud Performance Ratio

Description

Returns estimated per-core performance of cloud vs local hardware. Based on empirical benchmarking data.

Usage

get_cloud_performance_ratio(local_specs, platform)

Get current vCPU usage

Description

Get current vCPU usage

Usage

get_current_vcpu_usage(region)

Get EC2 client

Description

Get EC2 client

Usage

get_ec2_client(region)

Arguments

region

AWS region

Value

EC2 client


Get EC2 instance pricing

Description

Get EC2 instance pricing

Usage

get_ec2_instance_price(instance_type, use_spot = FALSE)

Arguments

instance_type

EC2 instance type (e.g., "c7g.xlarge")

use_spot

Whether to use spot pricing

Value

Price per hour in USD


Get ECR client

Description

Get ECR client

Usage

get_ecr_client(region)

Arguments

region

AWS region

Value

ECR client


Get ECS client

Description

Get ECS client

Usage

get_ecs_client(region)

Arguments

region

AWS region

Value

ECS client


Get ECS-optimized AMI ID for region and architecture

Description

Get ECS-optimized AMI ID for region and architecture

Usage

get_ecs_optimized_ami(region, architecture = "X86_64")

Arguments

region

AWS region

architecture

CPU architecture ("X86_64" or "ARM64")

Value

AMI ID


Get IAM execution role ARN

Description

Returns the ARN for the ECS execution role (should be created during setup)

Usage

get_execution_role_arn(region)

Get instance specifications (vCPUs, memory)

Description

Get instance specifications (vCPUs, memory)

Usage

get_instance_specs(instance_type)

Get vCPU count for instance type

Description

Get vCPU count for instance type

Usage

get_instance_vcpus(instance_type)

Arguments

instance_type

EC2 instance type

Value

Number of vCPUs


Get Local Hardware Specifications

Description

Detects local CPU model and core count for performance estimation.

Usage

get_local_hardware_specs()

Value

List with cpu_name, cores, and architecture


Get or create security group

Description

Get or create security group

Usage

get_or_create_security_group(vpc_id, region)

Get or create subnets

Description

Get or create subnets

Usage

get_or_create_subnets(vpc_id, region)

Get or create task definition

Description

Get or create task definition

Usage

get_or_create_task_definition(plan)

Get pool status

Description

Query current state of the EC2 pool

Usage

get_pool_status(backend)

Arguments

backend

Backend configuration object

Value

List with pool status information


Get S3 client

Description

Get S3 client

Usage

get_s3_client(region)

Arguments

region

AWS region

Value

S3 client


Get Service Quotas client

Description

Get Service Quotas client

Usage

get_service_quotas_client(region)

Arguments

region

AWS region

Value

Service Quotas client


Get session manifest from S3

Description

Get session manifest from S3

Usage

get_session_manifest(session_id, region, bucket)

Arguments

session_id

Session identifier

region

AWS region

bucket

S3 bucket

Value

Session manifest list


Get session status

Description

Get session status

Usage

get_session_status(session)

Get starburst bucket name

Description

Get starburst bucket name

Usage

get_starburst_bucket()

Value

S3 bucket name


Get staRburst configuration

Description

Get staRburst configuration

Usage

get_starburst_config()

Value

List of configuration values


Get starburst security groups

Description

Get starburst security groups

Usage

get_starburst_security_groups(region)

Arguments

region

AWS region

Value

Vector of security group IDs


Get starburst subnets

Description

Get starburst subnets

Usage

get_starburst_subnets(region)

Arguments

region

AWS region

Value

Vector of subnet IDs


Get task ARN

Description

Get task ARN

Usage

get_task_arn(task_id)

Get task registry environment

Description

Get task registry environment

Usage

get_task_registry()

Get IAM task role ARN

Description

Returns the ARN for the ECS task role (should be created during setup)

Usage

get_task_role_arn(region)

Get task status from S3

Description

Get task status from S3

Usage

get_task_status(session_id, task_id, region, bucket, include_etag = FALSE)

Arguments

session_id

Session identifier

task_id

Task identifier

region

AWS region

bucket

S3 bucket

include_etag

Include ETag in result (for atomic operations)

Value

Task status list (with optional $etag field)


Get VPC configuration for ECS tasks

Description

Get VPC configuration for ECS tasks

Usage

get_vpc_config(region)

Get wave queue status

Description

Get wave queue status

Usage

get_wave_status(backend)

Arguments

backend

Backend environment


Initialize backend for detached mode

Description

Creates a backend for detached sessions without modifying the future plan

Usage

initialize_detached_backend(
  session_id,
  workers = 10,
  cpu = 4,
  memory = "8GB",
  region = NULL,
  timeout = 3600,
  absolute_timeout = 86400,
  launch_type = "EC2",
  instance_type = "c7g.xlarge",
  use_spot = TRUE,
  warm_pool_timeout = 3600
)

Arguments

session_id

Unique session identifier

workers

Number of workers

cpu

vCPUs per worker

memory

Memory per worker (GB notation like "8GB")

region

AWS region

timeout

Task timeout in seconds

absolute_timeout

Maximum session lifetime in seconds

launch_type

"FARGATE" or "EC2"

instance_type

EC2 instance type (for EC2 launch type)

use_spot

Use spot instances

warm_pool_timeout

Warm pool timeout for EC2

Value

Backend environment


Check if setup is complete

Description

Check if setup is complete

Usage

is_setup_complete()

Launch a future on the Starburst backend

Description

Launch a future on the Starburst backend

Usage

## S3 method for class 'StarburstBackend'
launchFuture(backend, future, ...)

Arguments

backend

A StarburstBackend object

future

The future object to launch

...

Additional arguments

Value

The future object (invisibly)


Launch workers for detached session

Description

Launches workers with bootstrap tasks that tell them the session ID

Usage

launch_detached_workers(backend)

Arguments

backend

Backend environment

Value

Invisibly returns NULL


List futures for StarburstBackend

Description

List futures for StarburstBackend

Usage

## S3 method for class 'StarburstBackend'
listFutures(backend, ...)

Arguments

backend

A StarburstBackend object

...

Additional arguments

Value

List of futures (empty for this backend)


List active clusters

Description

List active clusters

Usage

list_active_clusters(region)

List pending tasks in session

Description

List pending tasks in session

Usage

list_pending_tasks(session_id, region, bucket)

Arguments

session_id

Session identifier

region

AWS region

bucket

S3 bucket

Value

Character vector of pending task IDs


List all stored task ARNs

Description

List all stored task ARNs

Usage

list_task_arns()

List all task statuses in session

Description

List all task statuses in session

Usage

list_task_statuses(session_id, region, bucket)

Arguments

session_id

Session identifier

region

AWS region

bucket

S3 bucket

Value

Named list of task statuses (task_id -> status)


Load quota request history

Description

Load quota request history

Usage

load_quota_history()

Number of workers for StarburstBackend

Description

Number of workers for StarburstBackend

Usage

## S3 method for class 'StarburstBackend'
nbrOfWorkers(evaluator)

Arguments

evaluator

A StarburstBackend object

Value

Number of workers


Package installation error

Description

Package installation error

Usage

package_installation_error(package, error_message = NULL)

Parse memory specification

Description

Parse memory specification

Usage

parse_memory(memory)

Arguments

memory

Memory specification (numeric GB or string like "8GB")
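
A minimal sketch of accepting both forms is shown below. The function name sketch_parse_memory and the choice to return a number of gigabytes are assumptions; the unit of the package's actual return value is not documented here.

```r
# Hypothetical parser: numeric input is taken as GB, strings like "8GB" are parsed.
sketch_parse_memory <- function(memory) {
  if (is.numeric(memory)) return(as.numeric(memory))
  pattern <- "^\\s*([0-9]+(?:\\.[0-9]+)?)\\s*(GB|G)?\\s*$"
  m <- regmatches(memory,
                  regexec(pattern, memory, ignore.case = TRUE, perl = TRUE))[[1]]
  if (length(m) == 0) stop("Cannot parse memory specification: ", memory)
  as.numeric(m[2])  # first capture group: the numeric part
}

from_string  <- sketch_parse_memory("8GB")
from_number  <- sketch_parse_memory(16)
from_spaced  <- sketch_parse_memory("0.5 gb")
```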


Permission denied error

Description

Permission denied error

Usage

permission_error(service, operation, resource = NULL, iam_role = NULL)

staRburst Future Backend

Description

A future backend for running parallel R workloads on AWS (EC2 or Fargate)

Usage

## S3 method for class 'starburst'
plan(
  strategy,
  workers = 10,
  cpu = 4,
  memory = "8GB",
  region = NULL,
  timeout = 3600,
  auto_quota_request = interactive(),
  launch_type = "EC2",
  instance_type = "c7g.xlarge",
  use_spot = TRUE,
  warm_pool_timeout = 3600,
  detached = FALSE,
  ...
)

Arguments

strategy

The starburst strategy marker (ignored, for S3 dispatch)

workers

Number of parallel workers

cpu

vCPUs per worker (1, 2, 4, 8, or 16)

memory

Memory per worker (supports GB notation, e.g., "8GB")

region

AWS region (default: from config or "us-east-1")

timeout

Maximum runtime in seconds (default: 3600)

auto_quota_request

Automatically request quota increases (default: interactive())

launch_type

Launch type: EC2 or FARGATE (default: EC2)

instance_type

EC2 instance type when using EC2 launch type (default: c7g.xlarge)

use_spot

Use EC2 Spot instances for cost savings (default: TRUE)

warm_pool_timeout

Timeout for warm pool in seconds (default: 3600)

detached

Use detached session mode (deprecated, use starburst_session instead)

...

Additional arguments passed to future backend

Value

A future plan object

Examples


if (starburst_is_configured()) {
  future::plan(starburst, workers = 50)
  results <- future.apply::future_lapply(1:100, function(i) i^2)
}


Poll for result

Description

Poll for result

Usage

poll_for_result(future, timeout = 3600)

Print method for session status

Description

Print method for session status

Usage

## S3 method for class 'StarburstSessionStatus'
print(x, ...)

Arguments

x

A StarburstSessionStatus object

...

Additional arguments (ignored)

Value

Invisibly returns x.


Print Estimate Summary

Description

Print Estimate Summary

Usage

print_estimate_summary(pred, local_specs)

Quota exceeded error

Description

Quota exceeded error

Usage

quota_error(resource, requested, available, region, workers = NULL, cpu = NULL)

Reconstruct backend from session manifest

Description

Used when reattaching to an existing session

Usage

reconstruct_backend_from_manifest(manifest)

Arguments

manifest

Session manifest from S3

Value

Backend environment


Register cleanup handler

Description

Register cleanup handler

Usage

register_cleanup(evaluator)

Request quota increase

Description

Request quota increase

Usage

request_quota_increase(service, quota_code, desired_value, region, reason = "")

Arguments

service

AWS service (e.g., "fargate")

quota_code

Service quota code

desired_value

Desired quota value

region

AWS region

reason

Justification for increase

Value

Case ID if successful, NULL if failed


Check if StarburstFuture is Resolved

Description

Checks whether the future task has completed execution

Usage

## S3 method for class 'StarburstFuture'
resolved(x, ...)

Arguments

x

A StarburstFuture object

...

Additional arguments

Value

Logical indicating if the future is resolved


Get Result from StarburstFuture

Description

Retrieves the result from a resolved future

Usage

## S3 method for class 'StarburstFuture'
result(future, ...)

Arguments

future

A StarburstFuture object

...

Additional arguments

Value

A FutureResult object


Check if result exists

Description

Check if result exists

Usage

result_exists(task_id, region)

Run a StarburstFuture

Description

Submits the future task to AWS Fargate for execution

Usage

## S3 method for class 'StarburstFuture'
run(future, ...)

Arguments

future

A StarburstFuture object

...

Additional arguments

Value

The future object (invisibly)


Execute system command safely (no shell injection)

Description

Execute system command safely (no shell injection)

Usage

safe_system(
  command,
  args = character(),
  allowed_commands = c("docker", "aws", "uname", "sysctl", "cat", "nproc"),
  stdin = NULL,
  stdout = "|",
  stderr = "|",
  ...
)

Arguments

command

Command to execute (must be listed in allowed_commands)

args

Character vector of arguments

allowed_commands

Commands allowed to be executed

stdin

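Optional input to pass to stdin

stdout

Standard output handling, passed to processx::run() (the default "|" captures the output)

stderr

Standard error handling, passed to processx::run() (the default "|" captures the output)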

...

Additional arguments passed to processx::run()

Value

Result from processx::run()
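
The two safeguards implied here, an allowlist check and execution without a shell, can be sketched as follows. sketch_safe_run is a hypothetical stand-in that reuses the documented default allowlist; the package's real function accepts further arguments and may validate more.

```r
# Hypothetical sketch of allowlist-checked command execution.
sketch_safe_run <- function(command, args = character(),
                            allowed_commands = c("docker", "aws", "uname",
                                                 "sysctl", "cat", "nproc")) {
  # Reject anything that is not a single, explicitly allowed command name.
  if (!is.character(command) || length(command) != 1L ||
      !command %in% allowed_commands) {
    stop("Command not allowed: ", paste(command, collapse = " "))
  }
  # processx::run() passes `args` as an argument vector, not through a shell,
  # so metacharacters such as ";" or "$(...)" are not interpreted.
  processx::run(command, args = as.character(args))
}

# A command outside the allowlist is refused before anything is executed.
rejected <- tryCatch(
  sketch_safe_run("rm", c("-rf", "/tmp/example")),
  error = function(e) conditionMessage(e)
)
```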


Save staRburst configuration

Description

Save staRburst configuration

Usage

save_config(config)

Save quota request to local history

Description

Save quota request to local history

Usage

save_quota_request(case_id, service, quota_code, desired_value, region)

Serialize and upload to S3

Description

Serialize and upload to S3

Usage

serialize_and_upload(obj, bucket, key)

Detached Session API

Description

User-facing API for creating and managing detached sessions


Session Backend Initialization

Description

Backend initialization for detached session mode


S3 Session State Management

Description

Core S3 operations for managing detached session state


Setup EC2 capacity provider for ECS cluster

Description

Creates Launch Template, Auto-Scaling Group, and ECS Capacity Provider

Usage

setup_ec2_capacity_provider(backend)

Arguments

backend

Backend configuration object

Value

List with capacity provider details


Setup VPC resources

Description

Setup VPC resources

Usage

setup_vpc_resources(region)

Starburst strategy marker

Description

This function should never be called directly. Use plan(starburst, ...) instead.

Usage

starburst(...)

Arguments

...

Arguments passed to StarburstBackend()

Value

Does not return a value; always signals an error if called directly. This object exists as a strategy marker for plan.


Monitor quota increase request

Description

Monitor quota increase request

Usage

starburst_check_quota_request(case_id, region = NULL)

Arguments

case_id

Case ID from quota increase request

region

AWS region

Value

Invisibly returns the quota request details, or NULL on error.

Examples


if (starburst_is_configured()) {
  starburst_check_quota_request("case-12345")
}


Clean up staRburst ECR images

Description

Manually delete Docker images from ECR to save storage costs. Images will be rebuilt on next use (adds 3-5 min delay).

Usage

starburst_cleanup_ecr(force = FALSE, region = NULL)

Arguments

force

Delete all images immediately, ignoring TTL

region

AWS region (default: from config)

Value

Invisibly returns TRUE on success or FALSE if not configured.

Examples


if (starburst_is_configured()) {
  # Delete images past TTL
  starburst_cleanup_ecr()

  # Delete all images immediately (save $0.50/month)
  starburst_cleanup_ecr(force = TRUE)
}


Create a Starburst Cluster

Description

Creates a cluster object for managing AWS Fargate workers using the 'future' backend

Usage

starburst_cluster(
  workers = 10,
  cpu = 4,
  memory = "8GB",
  platform = "X86_64",
  region = NULL,
  timeout = 3600
)

Arguments

workers

Number of parallel workers

cpu

CPU units per worker

memory

Memory per worker

platform

CPU architecture (X86_64 or ARM64)

region

AWS region

timeout

Maximum runtime in seconds

Value

A starburst_cluster object

Examples


if (starburst_is_configured()) {
  cluster <- starburst_cluster(workers = 20)
  results <- cluster$map(data, function(x) x * 2)
}


Execute Map on Starburst Cluster

Description

Internal function to execute parallel map by creating StarburstFuture objects directly

Usage

starburst_cluster_map(cluster, .x, .f, .progress = TRUE)

Configure staRburst options

Description

Configure staRburst options

Usage

starburst_config(
  max_cost_per_job = NULL,
  cost_alert_threshold = NULL,
  auto_cleanup_s3 = NULL,
  ...
)

Arguments

max_cost_per_job

Maximum cost per job in dollars

cost_alert_threshold

Cost threshold for alerts

auto_cleanup_s3

Automatically clean up S3 files after completion

...

Additional configuration options

Value

Invisibly returns the updated configuration list.

Examples


if (starburst_is_configured()) {
  starburst_config(
    max_cost_per_job = 10,
    cost_alert_threshold = 5
  )
}


Create informative staRburst error

Description

Creates error messages with context, solutions, and links to documentation

Usage

starburst_error(
  message,
  context = list(),
  solution = NULL,
  call = sys.call(-1)
)

Arguments

message

Main error message

context

Named list of contextual information

solution

Suggested solution (optional)

call

Calling function (default: sys.call(-1))

Value

Error condition with class "starburst_error"
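
A condition of this kind can be built in base R as sketched below. sketch_error and its message layout are assumptions for illustration; only the class name "starburst_error" and the argument names come from this page.

```r
# Hypothetical constructor for a classed error condition with context.
sketch_error <- function(message, context = list(), solution = NULL, call = NULL) {
  parts <- message
  if (length(context) > 0) {
    parts <- c(parts, paste0("  ", names(context), ": ",
                             vapply(context, format, character(1))))
  }
  if (!is.null(solution)) parts <- c(parts, paste("Solution:", solution))
  structure(
    class = c("starburst_error", "error", "condition"),
    list(message = paste(parts, collapse = "\n"), call = call,
         context = context, solution = solution)
  )
}

# Callers can handle the class specifically with tryCatch().
caught <- tryCatch(
  stop(sketch_error("Quota exceeded",
                    context = list(region = "us-east-1", requested = 200),
                    solution = "Request a quota increase")),
  starburst_error = function(e) e
)
```

Keeping context and solution as fields on the condition, in addition to the formatted message, lets handlers inspect them programmatically.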


Estimate Cloud Performance and Cost

Description

Runs a small sample of tasks locally to estimate cloud execution time and cost, giving an informed prediction before any money is spent on cloud execution.

Usage

starburst_estimate(
  .x,
  .f,
  workers = 10,
  cpu = 2,
  memory = "8GB",
  platform = "X86_64",
  sample_size = 10,
  region = NULL,
  ...
)

Arguments

.x

A vector or list to iterate over

.f

A function to apply to each element

workers

Number of parallel workers to estimate for

cpu

CPU units per worker (1, 2, 4, 8, or 16)

memory

Memory per worker (e.g., "8GB")

platform

CPU architecture: "X86_64" (default) or "ARM64" (Graviton3)

sample_size

Number of items to run locally for estimation (default: 10)

region

AWS region

...

Additional arguments passed to .f

Value

Invisible list with estimates, prints summary to console

Examples


if (starburst_is_configured()) {
  # Estimate before running
  starburst_estimate(1:1000, expensive_function, workers = 50)

  # Then decide whether to proceed
  results <- starburst_map(1:1000, expensive_function, workers = 50)
}


Check if staRburst is configured

Description

Returns TRUE if starburst_setup() has been run, the configuration file exists, and AWS credentials are available. Useful for guarding example code that requires AWS credentials.

Usage

starburst_is_configured()

Value

TRUE if configured and credentials are available, FALSE otherwise.

Examples

starburst_is_configured()

List All Sessions

Description

List all detached sessions in S3

Usage

starburst_list_sessions(region = NULL)

Arguments

region

AWS region (default: from config)

Value

Data frame with session information

Examples


if (starburst_is_configured()) {
  sessions <- starburst_list_sessions()
  print(sessions)
}


View worker logs

Description

View worker logs

Usage

starburst_logs(task_id = NULL, cluster_id = NULL, last_n = 50, region = NULL)

Arguments

task_id

Optional task ID to view logs for specific task

cluster_id

Optional cluster ID to view logs for specific cluster

last_n

Number of last log lines to show (default: 50)

region

AWS region (default: from config)

Value

Invisibly returns the list of log events, or NULL if no events were found.

Examples


if (starburst_is_configured()) {
  # View recent logs
  starburst_logs()

  # View logs for specific task
  starburst_logs(task_id = "abc-123")

  # View last 100 lines
  starburst_logs(last_n = 100)
}


Map Function Over Data Using AWS Fargate

Description

Parallel map function that executes on AWS Fargate using the Future backend

Usage

starburst_map(
  .x,
  .f,
  workers = 10,
  cpu = 4,
  memory = "8GB",
  platform = "X86_64",
  region = NULL,
  timeout = 3600,
  .progress = TRUE,
  ...
)

Arguments

.x

A vector or list to iterate over

.f

A function to apply to each element

workers

Number of parallel workers (default: 10)

cpu

CPU units per worker (1, 2, 4, 8, or 16)

memory

Memory per worker (e.g., 8GB)

platform

CPU architecture (X86_64 or ARM64)

region

AWS region

timeout

Maximum runtime in seconds per task

.progress

Show progress bar (default: TRUE)

...

Additional arguments passed to .f

Value

A list of results, one per element of .x

Examples


if (starburst_is_configured()) {
  # Simple parallel computation
  results <- starburst_map(1:100, function(x) x^2, workers = 10)

  # With custom configuration
  results <- starburst_map(
    data_list,
    expensive_function,
    workers = 50,
    cpu = 4,
    memory = "8GB"
  )
}


Show quota status

Description

Show quota status

Usage

starburst_quota_status(region = NULL)

Arguments

region

AWS region (default: from config)

Value

Invisibly returns a list with quota information including current limit, usage, and any pending requests.

Examples


if (starburst_is_configured()) {
  starburst_quota_status()
}


Rebuild environment image

Description

Rebuild environment image

Usage

starburst_rebuild_environment(region = NULL, force = FALSE)

Arguments

region

AWS region (default: from config)

force

Force rebuild even if current environment hasn't changed

Value

Invisibly returns NULL. Called for its side effect of rebuilding and pushing the Docker environment image.

Examples


if (starburst_is_configured()) {
  starburst_rebuild_environment()
}


Request quota increase (user-facing)

Description

Request quota increase (user-facing)

Usage

starburst_request_quota_increase(vcpus = 500, region = NULL)

Arguments

vcpus

Desired vCPU quota

region

AWS region (default: from config)

Value

Invisibly returns TRUE if the increase was requested, FALSE if already sufficient or cancelled.

Examples


if (starburst_is_configured()) {
  starburst_request_quota_increase(vcpus = 500)
}
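
A value for vcpus can be estimated from the planned job shape. The sketch below assumes that every worker counts its full vCPU allocation against the quota, so the requirement is workers multiplied by cpu. The variable names and values are illustrative.

```r
# Planned job shape (illustrative values)
workers <- 50
cpu_per_worker <- 4

# Assumption: each worker counts its full vCPU allocation against the quota
vcpus_needed <- workers * cpu_per_worker

# Request the increase only when the package is installed and configured
if (requireNamespace("starburst", quietly = TRUE) &&
    starburst::starburst_is_configured()) {
  starburst::starburst_request_quota_increase(vcpus = vcpus_needed)
}
```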


Create a Detached Starburst Session

Description

Creates a new detached session that can run computations independently of your R session. You can close R and reattach later to collect results.

Usage

starburst_session(
  workers = 10,
  cpu = 4,
  memory = "8GB",
  region = NULL,
  timeout = 3600,
  session_timeout = 3600,
  absolute_timeout = 86400,
  launch_type = "EC2",
  instance_type = "c7g.xlarge",
  use_spot = TRUE,
  warm_pool_timeout = 3600
)

Arguments

workers

Number of parallel workers (default: 10)

cpu

vCPUs per worker (default: 4)

memory

Memory per worker, e.g., "8GB" (default: "8GB")

region

AWS region (default: from config or "us-east-1")

timeout

Task timeout in seconds (default: 3600)

session_timeout

Active timeout in seconds (default: 3600)

absolute_timeout

Maximum session lifetime in seconds (default: 86400)

launch_type

"FARGATE" or "EC2" (default: "EC2")

instance_type

EC2 instance type for EC2 launch (default: "c7g.xlarge")

use_spot

Use spot instances for EC2 (default: TRUE)

warm_pool_timeout

EC2 warm pool timeout in seconds (default: 3600)

Value

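A StarburstSession object. Its methods include submit(), status(), and collect(), as used in the examples for starburst_session() and starburst_session_attach().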

Examples


if (starburst_is_configured()) {
  # Create detached session
  session <- starburst_session(workers = 10)

  # Submit tasks
  task_ids <- lapply(1:100, function(i) {
    session$submit(quote(expensive_computation(i)))
  })

  # Save the session ID, then close R and come back later...
  session_id <- session$session_id

  # Reattach
  session <- starburst_session_attach(session_id)

  # Collect results
  results <- session$collect(wait = TRUE)
}


Reattach to Existing Session

Description

Reattach to a previously created detached session

Usage

starburst_session_attach(session_id, region = NULL)

Arguments

session_id

Session identifier

region

AWS region (default: from config)

Value

A StarburstSession object

Examples


if (starburst_is_configured()) {
  session <- starburst_session_attach("session-abc123")
  status <- session$status()
  results <- session$collect()
}


Setup staRburst

Description

One-time configuration to set up AWS resources for staRburst

Usage

starburst_setup(
  region = "us-east-1",
  force = FALSE,
  use_public_base = TRUE,
  ecr_image_ttl_days = NULL
)

Arguments

region

AWS region (default: "us-east-1")

force

Force re-setup even if already configured

use_public_base

Use public base Docker images (default: TRUE). Set to FALSE to build private base images in your ECR.

ecr_image_ttl_days

Number of days to keep Docker images in ECR (default: NULL = never delete). AWS will automatically delete images older than this many days. This prevents surprise costs if you stop using staRburst. Recommended: 30 days for regular users, 7 days for occasional users. When images are deleted, they will be rebuilt on next use (adds 3-5 min).

Value

Invisibly returns the configuration list.

Examples


if (starburst_is_configured()) {
  # Default: keep images forever (~$0.50/month idle cost)
  starburst_setup()

  # Auto-delete images after 30 days (saves money if you stop using it)
  starburst_setup(ecr_image_ttl_days = 30)

  # Use private base images with 7-day cleanup
  starburst_setup(use_public_base = FALSE, ecr_image_ttl_days = 7)
}
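
Automatic deletion after a fixed number of days is the kind of rule an ECR lifecycle policy expresses. The sketch below builds such a policy as an R list, on the assumption (not confirmed by this manual) that ecr_image_ttl_days maps to a rule of this shape. The helper make_expiry_policy is hypothetical; the field names follow the ECR lifecycle policy format.

```r
# Build a lifecycle policy that expires images N days after they were pushed.
# make_expiry_policy() is a hypothetical helper, not part of staRburst.
make_expiry_policy <- function(ttl_days) {
  stopifnot(is.numeric(ttl_days), length(ttl_days) == 1, ttl_days >= 1)
  list(rules = list(list(
    rulePriority = 1L,
    description = sprintf("Expire images older than %d days", as.integer(ttl_days)),
    selection = list(
      tagStatus = "any",
      countType = "sinceImagePushed",
      countUnit = "days",
      countNumber = as.integer(ttl_days)
    ),
    action = list(type = "expire")
  )))
}

policy <- make_expiry_policy(30)

# Serialize to JSON for inspection if jsonlite is available
if (requireNamespace("jsonlite", quietly = TRUE)) {
  cat(jsonlite::toJSON(policy, auto_unbox = TRUE, pretty = TRUE), "\n")
}
```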


Setup EC2 capacity providers for staRburst

Description

One-time setup for EC2 launch type. Creates IAM roles, instance profiles, and capacity providers for specified instance types.

Usage

starburst_setup_ec2(
  region = "us-east-1",
  instance_types = c("c7g.xlarge", "c7i.xlarge"),
  force = FALSE
)

Arguments

region

AWS region (default: "us-east-1")

instance_types

Character vector of instance types to setup (default: c("c7g.xlarge", "c7i.xlarge"))

force

Force re-setup even if already configured

Value

Invisibly returns TRUE on success or FALSE on failure or cancellation.

Examples


if (starburst_is_configured()) {
  # Setup with default instance types (Graviton and Intel)
  starburst_setup_ec2()

  # Setup with custom instance types
  starburst_setup_ec2(instance_types = c("c7g.2xlarge", "r7g.xlarge"))
}


Show staRburst status

Description

Show staRburst status

Usage

starburst_status()

Value

Invisibly returns a list with current configuration and quota information.


Start warm EC2 pool

Description

Scales the Auto Scaling group to the desired capacity and waits for the instances

Usage

start_warm_pool(backend, capacity, timeout_seconds = 180)

Arguments

backend

Backend configuration object

capacity

Desired number of instances

timeout_seconds

Maximum time to wait for instances (default: 180)

Value

Invisible NULL
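
Waiting for capacity with an upper bound is a poll-until-deadline loop. The sketch below is a generic illustration of that pattern, not the package's implementation. The helper wait_until and its check, interval, sleep, and now arguments are hypothetical; in real use check would query the number of running instances.

```r
# Poll 'check' until it returns TRUE or the deadline passes.
# Returns TRUE on success and FALSE on timeout.
wait_until <- function(check, timeout_seconds = 180, interval = 5,
                       sleep = Sys.sleep, now = Sys.time) {
  deadline <- now() + timeout_seconds
  repeat {
    if (isTRUE(check())) return(TRUE)
    if (now() >= deadline) return(FALSE)
    sleep(interval)
  }
}

# Usage: a stand-in check that succeeds on the third poll
polls <- 0
ready <- wait_until(
  check = function() { polls <<- polls + 1; polls >= 3 },
  timeout_seconds = 180,
  sleep = function(seconds) NULL  # skip real sleeping in this demonstration
)
```

Passing sleep and now as arguments keeps the loop easy to exercise without real delays.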


Stop running tasks

Description

Stop running tasks

Usage

stop_running_tasks(plan)


Stop warm pool

Description

Scales Auto-Scaling Group to zero

Usage

stop_warm_pool(backend)

Arguments

backend

Backend configuration object

Value

Invisible NULL


Store task ARN

Description

Store task ARN

Usage

store_task_arn(task_id, task_arn)


Submit detached worker to ECS

Description

Submit detached worker to ECS

Usage

submit_detached_worker(backend, task_id)

Arguments

backend

Backend environment

task_id

Task ID for the worker to execute

Value

Task ARN


Submit Task to ECS

Description

Internal function to submit a task to ECS (Fargate or EC2)

Usage

submit_task(future, backend)

Arguments

future

StarburstFuture object

backend

Backend/plan object


Submit task to session

Description

Submit task to session

Usage

submit_to_session(session, expr, envir, substitute, globals, packages)


Suggest appropriate quota based on needs

Description

Suggest appropriate quota based on needs

Usage

suggest_quota(vcpus_needed)

Arguments

vcpus_needed

vCPUs needed

Value

Suggested quota value


Task failure error

Description

Task failure error

Usage

task_failure_error(task_id, reason = NULL, exit_code = NULL, log_stream = NULL)


Update session manifest atomically

Description

Uses S3 ETags for optimistic locking to prevent race conditions.

Usage

update_session_manifest(session_id, updates, region, bucket, max_retries = 3)

Arguments

session_id

Session identifier

updates

Named list of fields to update

region

AWS region

bucket

S3 bucket

max_retries

Maximum number of retry attempts (default: 3)

Value

Invisibly returns updated manifest
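
Optimistic locking with ETags follows a read, modify, conditional-write cycle that is retried when another writer got there first. The sketch below illustrates the pattern with an in-memory stand-in for the S3 object. All helper names (new_store, store_get, store_put_if_match, update_with_retry) are hypothetical, and a version counter stands in for the ETag. It is not the package's implementation and makes no S3 calls.

```r
# In-memory stand-in for an S3 object: a value plus a tag that changes on write
new_store <- function(value) {
  store <- new.env()
  store$value <- value
  store$version <- 1L
  store
}

# Read the current value together with its tag
store_get <- function(store) {
  list(value = store$value, etag = as.character(store$version))
}

# Conditional write: succeeds only if the tag is unchanged since the read
store_put_if_match <- function(store, value, etag) {
  if (!identical(etag, as.character(store$version))) return(FALSE)
  store$value <- value
  store$version <- store$version + 1L
  TRUE
}

# Read, merge the updates, write conditionally, and retry on conflict
update_with_retry <- function(store, updates, max_retries = 3) {
  for (attempt in seq_len(max_retries)) {
    current <- store_get(store)
    merged <- utils::modifyList(current$value, updates)
    if (store_put_if_match(store, merged, current$etag)) {
      return(invisible(merged))
    }
  }
  stop("Manifest update failed after ", max_retries, " attempts")
}

manifest <- new_store(list(state = "running", completed = 0))
updated <- update_with_retry(manifest, list(completed = 5))
```

A write that carries a stale tag is rejected instead of silently overwriting a concurrent update, which is why the merge is recomputed from a fresh read on every attempt.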


Update task status with atomic write

Description

Update task status with atomic write

Usage

update_task_status(
  session_id,
  task_id,
  state,
  etag = NULL,
  region,
  bucket,
  updates = list()
)

Arguments

session_id

Session identifier

task_id

Task identifier

state

New state

etag

Optional ETag for conditional write (atomic claiming)

region

AWS region

bucket

S3 bucket

updates

Optional additional fields to update

Value

TRUE if successful, FALSE if conditional write failed


Upload task for detached session

Description

Upload task for detached session

Usage

upload_detached_task(task_id, task_data, backend)

Arguments

task_id

Task identifier

task_data

Task data (expr, globals, packages, session_id)

backend

Backend environment

Value

Invisibly returns NULL


Utility functions for staRburst

Description

Utility functions for staRburst


Retry AWS operations with exponential backoff

Description

Wraps AWS API calls with automatic retry logic for transient failures. Uses exponential backoff with jitter to avoid thundering herd.

Usage

with_aws_retry(
  expr,
  max_attempts = 3,
  base_delay = 1,
  max_delay = 60,
  retryable_errors = c("Throttling", "ThrottlingException", "RequestTimeout",
    "ServiceUnavailable", "InternalError", "InternalServerError", "TooManyRequests",
    "RequestLimitExceeded", "5\\d{2}"),
  operation_name = "AWS operation"
)

Arguments

expr

Expression to evaluate (AWS API call)

max_attempts

Maximum retry attempts (default: 3)

base_delay

Initial delay in seconds (default: 1)

max_delay

Maximum delay in seconds (default: 60)

retryable_errors

Regex patterns for retryable error messages

operation_name

Optional name for logging (default: "AWS operation")

Value

Result of expression
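
Exponential backoff with jitter doubles the delay ceiling after each failed attempt, caps it at a maximum, and sleeps for a random fraction of that ceiling so that many clients do not retry in lockstep. The sketch below illustrates the technique with a hypothetical helper, retry_with_backoff, that takes a function instead of an expression. It is not the implementation of with_aws_retry(), and the full-jitter variant shown is an assumption.

```r
# Hypothetical helper illustrating retry with exponential backoff and jitter
retry_with_backoff <- function(fn, max_attempts = 3, base_delay = 1,
                               max_delay = 60,
                               retryable = c("Throttling", "5\\d{2}"),
                               sleep = Sys.sleep) {
  for (attempt in seq_len(max_attempts)) {
    result <- tryCatch(
      list(ok = TRUE, value = fn()),
      error = function(e) list(ok = FALSE, error = e)
    )
    if (result$ok) return(result$value)

    # Retry only if the error message matches one of the regex patterns
    msg <- conditionMessage(result$error)
    is_retryable <- any(vapply(retryable, function(p) grepl(p, msg), logical(1)))
    if (!is_retryable || attempt == max_attempts) stop(result$error)

    # Delay ceiling doubles per attempt and is capped at max_delay
    ceiling_delay <- min(max_delay, base_delay * 2^(attempt - 1))
    # Full jitter: sleep a uniform random time below the ceiling
    sleep(stats::runif(1, 0, ceiling_delay))
  }
}

# Usage: a call that is throttled twice and then succeeds
calls <- 0
flaky <- function() {
  calls <<- calls + 1
  if (calls < 3) stop("ThrottlingException: rate exceeded")
  "ok"
}
res <- retry_with_backoff(flaky, sleep = function(seconds) NULL)
```

Errors that match none of the patterns are re-raised immediately, so permanent failures such as permission errors are not retried.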


Retry ECR operations

Description

Specialized wrapper for ECR operations with ECR-specific retry patterns

Usage

with_ecr_retry(expr, max_attempts = 3, operation_name = "ECR operation")

Arguments

expr

Expression to evaluate (AWS API call)

max_attempts

Maximum retry attempts (default: 3)

operation_name

Optional name for logging (default: "ECR operation")


Retry ECS operations

Description

Specialized wrapper for ECS operations with ECS-specific retry patterns

Usage

with_ecs_retry(expr, max_attempts = 3, operation_name = "ECS operation")

Arguments

expr

Expression to evaluate (AWS API call)

max_attempts

Maximum retry attempts (default: 3)

operation_name

Optional name for logging (default: "ECS operation")


Retry S3 operations

Description

Specialized wrapper for S3 operations with S3-specific retry patterns

Usage

with_s3_retry(expr, max_attempts = 3, operation_name = "S3 operation")

Arguments

expr

Expression to evaluate (AWS API call)

max_attempts

Maximum retry attempts (default: 3)

operation_name

Optional name for logging (default: "S3 operation")


Worker validation error

Description

Worker validation error

Usage

worker_validation_error(workers, max_allowed = 500)
