| Type: | Package |
| Title: | Base R Code Formatter |
| Version: | 0.1.0 |
| Description: | A minimal R code formatter following base R style conventions. Formats R code with consistent spacing, indentation, and structure. |
| License: | GPL-3 |
| URL: | https://github.com/cornball-ai/rformat |
| BugReports: | https://github.com/cornball-ai/rformat/issues |
| Encoding: | UTF-8 |
| Depends: | R (≥ 4.1.0) |
| Imports: | Rcpp |
| LinkingTo: | Rcpp |
| Suggests: | tinytest, simplermarkdown |
| VignetteBuilder: | simplermarkdown |
| NeedsCompilation: | yes |
| Packaged: | 2026-03-05 02:46:29 UTC; troy |
| Author: | Troy Hernandez |
| Maintainer: | Troy Hernandez <troy@cornball.ai> |
| Repository: | CRAN |
| Date/Publication: | 2026-03-09 16:30:09 UTC |
Add Control Braces (AST Version)
Description
Finds bare control flow bodies (if/for/while/repeat without braces) and transforms them according to the specified mode. Modes:
- 'TRUE' / '"single"': Add braces, keep on one line if short enough.
- '"multi"': Add braces, force multi-line.
- '"next_line"': Move same-line body to next line (no braces).
- '"same_line"': Move next-line body to same line; strip single-stmt braces.
Usage
add_control_braces(terms, mode = "single", indent_str = " ", line_limit = 80L)
Arguments
terms |
Enriched terminal DataFrame. |
mode |
Control brace mode. |
indent_str |
Indent string (for line width calculations). |
line_limit |
Maximum line width. |
Value
Updated DataFrame.
Compute Display Width of an Output Line
Description
Sums token text widths plus inter-token spaces for a given output line.
Usage
ast_line_width(terms, line_num, indent_str)
Arguments
terms |
Enriched terminal DataFrame (sorted by out_line, out_order). |
line_num |
The output line number. |
indent_str |
Indent string (e.g., '"    "' for 4 spaces). |
Value
Display width of the line.
Build Line Index
Description
Creates a named list mapping output line numbers to row indices in the terms DataFrame. Avoids repeated 'which(terms$out_line == ln)' scans.
Usage
build_line_index(terms)
Arguments
terms |
Enriched terminal DataFrame. |
Value
Named list where names are line numbers (as strings) and values are integer vectors of row indices.
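A line index of this shape can be sketched in base R with 'split()'. This is a minimal illustration, not the package's implementation: the toy 'terms' data frame and the helper name 'lookup_rows' are hypothetical, and only the 'out_line' column is taken from the documentation above.

```r
# Toy stand-in for the enriched terminal DataFrame (hypothetical data).
terms <- data.frame(
    out_line = c(1L, 1L, 2L, 4L, 4L, 4L),
    out_text = c("x", "<-", "1", "f", "(", ")"),
    stringsAsFactors = FALSE
)

# One pass over the rows: names are line numbers as strings,
# values are integer vectors of row indices.
line_index <- split(seq_len(nrow(terms)), terms$out_line)

# Hypothetical lookup helper: a missing line yields integer(0), not NULL.
lookup_rows <- function(lidx, line_num) {
    rows <- lidx[[as.character(line_num)]]
    if (is.null(rows)) integer(0) else rows
}

rows_line4 <- lookup_rows(line_index, 4L)
rows_line3 <- lookup_rows(line_index, 3L)
```

Each lookup is then a single list access by name rather than a scan of the whole 'out_line' column.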
Get Tab-Expanded Line Length
Description
Returns the display width of a line, with tabs expanded to 8-column stops.
Usage
code_width(line)
Arguments
line |
A single line of text. |
Value
Display width of the line.
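Tab expansion to 8-column stops can be sketched as follows. The function name 'expand_width' and the 'tab_stop' argument are hypothetical, and the sketch assumes every non-tab character occupies one column (wide characters are ignored); the package's own 'code_width()' may differ.

```r
# Hypothetical sketch: display width with tabs advancing to the next
# multiple of tab_stop. Assumes one column per non-tab character.
expand_width <- function(line, tab_stop = 8L) {
    width <- 0L
    for (ch in strsplit(line, "")[[1]]) {
        if (ch == "\t") {
            # Jump to the next tab stop.
            width <- (width %/% tab_stop + 1L) * tab_stop
        } else {
            width <- width + 1L
        }
    }
    width
}

w_plain <- expand_width("abc")
w_lead <- expand_width("\tx")
w_mid <- expand_width("ab\tc")
```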
Convert Tab-Expanded Column to Character Position
Description
R's getParseData() reports columns with tabs expanded to 8-column tab stops. This function converts such a column back to a character position for use with substring().
Usage
col_to_charpos(line, col)
Arguments
line |
A single line of text. |
col |
Tab-expanded column position (1-based). |
Value
Character position (1-based) in the string.
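The reverse mapping can be sketched by walking the line and tracking the expanded column until it reaches the target. The name 'col_to_pos' is hypothetical, one column per non-tab character is assumed, and returning 'NA' for a column past the end of the line is a choice made for this sketch, not documented behaviour of 'col_to_charpos()'.

```r
# Hypothetical sketch: map a tab-expanded column (1-based) back to a
# character position (1-based) usable with substring().
col_to_pos <- function(line, col, tab_stop = 8L) {
    chars <- strsplit(line, "")[[1]]
    width <- 0L
    for (i in seq_along(chars)) {
        if (chars[i] == "\t") {
            width <- (width %/% tab_stop + 1L) * tab_stop
        } else {
            width <- width + 1L
        }
        # The character ending at or beyond the target column is the hit.
        if (width >= col) return(i)
    }
    NA_integer_  # sketch choice: column lies past the end of the line
}

line <- "\tx <- 1"
pos <- col_to_pos(line, 9L)          # "x" sits at expanded column 9
first_char <- substring(line, pos, pos)
```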
Collapse Multi-Line Calls (AST Version)
Description
Finds multi-line parenthesized groups (function calls, control flow conditions) that would fit on one line and collapses them by setting all tokens' 'out_line' to the opening line.
Usage
collapse_calls(terms, indent_str, line_limit = 80L)
Arguments
terms |
Enriched terminal DataFrame. |
indent_str |
Indent string. |
line_limit |
Maximum line length. |
Value
Updated DataFrame.
Compute Indent at a Column Position
Description
Walks tokens on a line up to a given column, tracking braces and parens exactly as compute_nesting does. Returns the indent level that a hypothetical continuation line would receive.
Usage
compute_indent_at_col(nesting, line_toks, line_num, break_col)
Arguments
nesting |
Result from compute_nesting(). |
line_toks |
Tokens on the line. |
line_num |
Line number. |
break_col |
Column position to stop at (inclusive). |
Value
Integer indent level.
Compute Nesting Depth Per Line
Description
Shared function used by 'format_tokens' and wrap passes to compute identical depth-based indent levels from the parse tree.
Usage
compute_nesting(terminals, n_lines)
Arguments
terminals |
Terminal token data frame from 'getParseData()', ordered by 'line1, col1'. |
n_lines |
Number of source lines. |
Value
Named list with 'line_indent', 'line_end_brace', 'line_end_paren', 'line_end_pab' (all integer vectors of length 'n_lines').
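The idea of deriving per-line indent levels from terminal tokens can be illustrated with a simplified sketch that tracks brace depth only. It is not the package's 'compute_nesting()': parens, brackets, and the 'line_end_*' vectors are left out, and the variable names are chosen for this example.

```r
code <- "f <- function(x) {\n    if (x) {\n        y\n    }\n}"

# Parse once and keep only terminal tokens, ordered by position.
pd <- utils::getParseData(parse(text = code, keep.source = TRUE))
terminals <- pd[pd$terminal, ]
terminals <- terminals[order(terminals$line1, terminals$col1), ]

n_lines <- length(strsplit(code, "\n", fixed = TRUE)[[1]])
line_indent <- integer(n_lines)
seen <- logical(n_lines)
depth <- 0L

for (i in seq_len(nrow(terminals))) {
    tok <- terminals$token[i]
    ln <- terminals$line1[i]
    # A closing brace outdents before its own line is measured.
    if (tok == "'}'") depth <- depth - 1L
    if (!seen[ln]) {
        line_indent[ln] <- depth
        seen[ln] <- TRUE
    }
    # An opening brace indents everything that follows it.
    if (tok == "'{'") depth <- depth + 1L
}
```

Here 'line_indent' holds one brace depth per source line, taken at the first token of that line.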
Enrich Terminal Tokens for AST-Based Formatting
Description
Parses code and returns an enriched terminal-token DataFrame with per-token nesting state and output metadata. This is the foundation of the parse-once architecture: parse once, enrich once, transform the DataFrame through all passes, serialize to text once at the end.
Usage
enrich_terminals(pd, orig_lines)
Arguments
pd |
Parse data from 'getParseData()'. |
orig_lines |
Original source lines (split by newline). |
Value
Enriched terminal-token DataFrame with added columns: 'out_line', 'out_order', 'out_text', 'brace_depth', 'paren_depth', 'pab', 'nesting_level'.
Expand Bare If-Else in Function Call Arguments (AST Version)
Description
Finds bare 'if (cond) expr else expr' arguments inside function calls on overlong lines and expands them to braced multi-line form.
Usage
expand_call_if_args(terms, indent_str = " ", line_limit = 80L)
Arguments
terms |
Enriched terminal DataFrame. |
indent_str |
Indent string. |
line_limit |
Maximum line width. |
Value
Updated DataFrame.
Extract Expression Text from Source Lines
Description
Extract original text for a multi-line expression and re-indent it.
Usage
extract_expr_text(lines, tokens, target_indent)
Arguments
lines |
Source code lines. |
tokens |
Token data frame for the expression. |
target_indent |
Target indentation string for continuation lines. |
Value
Expression text with first line unindented, continuation lines re-indented.
Check if a Body Token Range is a Complete Statement
Description
Returns FALSE if the body has unclosed parens/brackets or ends with an operator that expects a continuation (assignment, binary ops).
Usage
find_bare_body_end(terms, body_start)
Arguments
terms |
Enriched terminal DataFrame. |
body_start |
Integer row index where the bare body begins. |
Value
Integer row index of the last token in the bare body.
Find Token Position in Formatted Line Output
Description
Computes the 1-based character position where the token at index 'idx' starts in the output of 'format_line_tokens(tokens)'. This replays the spacing logic to determine the exact output column.
Usage
find_token_pos_in_formatted(tokens, idx)
Arguments
tokens |
Data frame of tokens for one line (ordered by col1). |
idx |
Index into 'tokens' of the target token. |
Value
1-based character position of that token in the formatted output.
Fix Else Placement
Description
Ensures 'else' appears on the same line as the closing brace.
Usage
fix_else_placement(code)
Arguments
code |
Code string. |
Value
Code with corrected else placement.
Format Blank Lines
Description
Normalize blank lines between code blocks.
Usage
format_blank_lines(code)
Arguments
code |
Code string. |
Value
Code with normalized blank lines.
Format Tokens on a Single Line
Description
Formats the tokens of a single line into the line's text content, inserting spaces between tokens where needed.
Usage
format_line_tokens(tokens, prev_token = NULL, prev_prev_token = NULL)
Arguments
tokens |
Data frame of tokens for one line. |
prev_token |
Optional token to treat as the previous token when formatting a token subset (e.g., suffix after a collapsed call). |
prev_prev_token |
Optional token before prev_token for unary detection. |
Value
Formatted line content (no leading whitespace).
AST-Based Format Pipeline
Description
Single-pass pipeline: parse once, enrich the terminal DataFrame, run all transforms as DataFrame operations, serialize to text once.
Usage
format_pipeline(code, indent, wrap, expand_if, brace_style, line_limit,
function_space = FALSE, control_braces = FALSE, join_else = TRUE)
Arguments
code |
Code string for one top-level expression. |
indent |
Indent string or integer. |
wrap |
Continuation style: '"paren"' or '"fixed"'. |
expand_if |
Whether to expand all inline if-else. |
brace_style |
'"kr"' or '"allman"'. |
line_limit |
Maximum line length. |
function_space |
Add space after 'function'. |
control_braces |
Control brace mode. |
join_else |
If TRUE, move else to same line as preceding '}'. |
Value
Formatted code string.
Format R Code Using Token-Based Parsing
Description
Internal function to format R code using getParseData tokens. Calculates proper indentation based on nesting depth.
Usage
format_tokens(code, indent = 4L, wrap = "paren", expand_if = FALSE,
brace_style = "kr", line_limit = 80L, function_space = FALSE,
control_braces = FALSE, join_else = TRUE)
Arguments
code |
Character string of R code. |
indent |
Integer for spaces (default 4), or character string for literal indent (e.g., '"\t\t"' for vintage R Core style). |
wrap |
Continuation style: '"paren"' (default) aligns to opening parenthesis, '"fixed"' uses 8-space indent. |
expand_if |
Expand inline if-else to multi-line (default FALSE). |
brace_style |
Brace placement: '"kr"' (same line) or '"allman"' (new line). |
line_limit |
Maximum line length before wrapping (default 80). |
function_space |
If TRUE, add space before '(' in function definitions. |
control_braces |
If TRUE, add braces to bare one-line control flow bodies. |
join_else |
If TRUE, move else to same line as preceding '}'. |
Value
Formatted code as character string.
Insert Synthetic Tokens into the DataFrame
Description
Adds new token rows (e.g., for brace insertion). New tokens get unique IDs starting from 'max(existing_id) + 1'.
Usage
insert_tokens(terms, new_rows)
Arguments
terms |
Enriched terminal DataFrame. |
new_rows |
Data frame of new tokens to insert. Must have at minimum: 'token', 'out_text', 'out_line', 'out_order'. Other columns will be filled with defaults. |
Value
Updated DataFrame with new rows appended.
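The ID assignment and default filling described above can be sketched on a toy data frame. The helper name 'append_tokens', the 'id' column name, and the use of 'NA' as the default for missing columns are assumptions for this illustration; the package may use other column names and defaults.

```r
# Toy terms table (hypothetical; the real DataFrame has more columns).
terms <- data.frame(
    id = c(1L, 3L, 5L),
    token = c("SYMBOL", "LEFT_ASSIGN", "NUM_CONST"),
    out_text = c("x", "<-", "1"),
    out_line = 1L,
    out_order = 1:3,
    stringsAsFactors = FALSE
)

# New rows carry only the minimum columns.
new_rows <- data.frame(
    token = c("'{'", "'}'"),
    out_text = c("{", "}"),
    out_line = 1L,
    out_order = c(0L, 4L),
    stringsAsFactors = FALSE
)

append_tokens <- function(terms, new_rows) {
    # Unique IDs continue from the current maximum.
    new_rows$id <- max(terms$id) + seq_len(nrow(new_rows))
    # Assumed default: columns absent from new_rows become NA.
    for (col in setdiff(names(terms), names(new_rows))) {
        new_rows[[col]] <- NA
    }
    rbind(terms, new_rows[names(terms)])
}

updated <- append_tokens(terms, new_rows)
```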
Join Else to Preceding Close Brace
Description
AST transform that moves ELSE tokens (and any following tokens on the same line, like 'if' in 'else if') to the same output line as the preceding '}'. Skips if a COMMENT exists between '}' and 'else', or if joining would exceed the line limit.
Usage
join_else_transform(terms, indent_str, line_limit)
Arguments
terms |
Enriched terminal DataFrame. |
indent_str |
Indent string for line width calculation. |
line_limit |
Maximum line width. |
Value
Updated DataFrame.
Look Up Row Indices for a Line
Description
Returns the row indices of the terms DataFrame that belong to a given output line, using a pre-built line index.
Usage
line_index_get(lidx, line_num)
Arguments
lidx |
Line index from 'build_line_index()'. |
line_num |
Output line number. |
Value
Integer vector of row indices, or 'integer(0)' if none.
Compute Display Width Using Line Index
Description
Like 'ast_line_width()' but uses a pre-built line index for O(1) lookup.
Usage
line_index_width(terms, lidx, line_num, indent_str)
Arguments
terms |
Enriched terminal DataFrame. |
lidx |
Line index from 'build_line_index()'. |
line_num |
The output line number. |
indent_str |
Indent string. |
Value
Display width of the line.
Create a Synthetic Token Row
Description
Helper to build a single token row for insertion.
Usage
make_token(token, text, out_line, out_order, parent = 0L)
Arguments
token |
Token type string (e.g., '"'{'"', '"'}'"', '"','"') |
text |
Token text (e.g., '"{"', '"}"', '","') |
out_line |
Target output line. |
out_order |
Sort order within the line. |
parent |
Parent node ID (default 0). |
Value
Single-row data frame.
Determine If Space Needed Between Tokens
Description
Determines whether a space is needed between two adjacent tokens when a line is formatted.
Usage
needs_space(prev, tok, prev_prev = NULL)
Arguments
prev |
Previous token (data frame row). |
tok |
Current token (data frame row). |
prev_prev |
Token before prev (data frame row or NULL), for unary detection. |
Value
Logical.
Recompute Nesting State After Structural Changes
Description
Re-walks terminals and refreshes 'brace_depth', 'paren_depth', 'pab', and 'nesting_level' columns. Call after brace insertion, token removal, or any structural transform.
Usage
recompute_nesting(terms)
Arguments
terms |
Enriched terminal DataFrame. |
Value
Updated DataFrame with refreshed nesting columns.
Reformat Function Definitions (AST Version)
Description
Rewrites named function signatures to fit within the line limit. Short signatures go on one line; long ones wrap at commas with paren-aligned or fixed continuation indent. Operates on the DataFrame directly, avoiding the serialize/re-parse cycle that caused idempotency oscillation.
Usage
reformat_function_defs(terms, indent_str = " ", wrap = "paren",
brace_style = "kr", line_limit = 80L,
function_space = FALSE)
Arguments
terms |
Enriched terminal DataFrame. |
indent_str |
Indent string (e.g., '" "'). |
wrap |
Continuation style: '"paren"' or '"fixed"'. |
brace_style |
'"kr"' or '"allman"'. |
line_limit |
Maximum line length. |
function_space |
Whether to add space after 'function'. |
Value
Updated DataFrame.
Reformat Inline If-Else Assignments (AST Version)
Description
Finds 'var <- if (cond) true_expr else false_expr' patterns and expands them to braced multi-line form with duplicated assignment:
if (cond) {
    var <- true_expr
} else {
    var <- false_expr
}
Usage
reformat_inline_if(terms, indent_str = " ", line_limit = 0L)
Arguments
terms |
Enriched terminal DataFrame. |
indent_str |
Indent string. |
line_limit |
Maximum line width. Use 0 to expand all. |
Value
Updated DataFrame.
Renumber Output Lines Sequentially
Description
After transforms that insert or remove lines, renumber 'out_line' so values are sequential starting from 1, preserving relative order and gaps for blank lines.
Usage
renumber_lines(terms)
Arguments
terms |
Enriched terminal DataFrame. |
Value
Updated DataFrame with renumbered 'out_line'.
Restore Truncated String Constant Token Text
Description
'utils::getParseData()' truncates long 'STR_CONST' token text. Reconstruct the original literal from source lines so token-based rewrite passes can round-trip long strings without introducing parse-invalid placeholders.
Usage
restore_truncated_str_const_tokens(terminals, orig_lines)
Arguments
terminals |
Terminal token data frame from 'getParseData()'. |
orig_lines |
Original source lines. |
Value
'terminals' with long 'STR_CONST' text restored.
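One way to recover a long literal is to slice it out of the source line using the token's column range. The sketch below assumes a single-line, ASCII-only string with no tabs on the line, so columns map directly to character positions; it is an illustration of the idea, not the package's implementation.

```r
# Build a source line with a 1500-character string literal.
literal <- paste0("\"", strrep("a", 1500L), "\"")
src <- paste0("x <- ", literal)

pd <- utils::getParseData(parse(text = src, keep.source = TRUE))
tok <- pd[pd$token == "STR_CONST", ]

# tok$text may hold a shortened placeholder for such a long literal;
# the column range still spans the full literal in the source line.
restored <- substring(src, tok$col1, tok$col2)
```

Multi-line strings and lines containing tabs need extra handling (joining several source lines, converting expanded columns to character positions), which this sketch leaves out.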
Format R Code
Description
Format R code string according to base R style conventions.
Usage
rformat(code, indent = 4L, line_limit = 80L, wrap = "paren",
brace_style = "kr", control_braces = FALSE, expand_if = FALSE,
else_same_line = TRUE, function_space = FALSE, join_else = TRUE)
Arguments
code |
Character string of R code to format. |
indent |
Indentation per level: integer for spaces (default 4), or character string for literal indent (e.g., '"\t\t"' for vintage R Core style). |
line_limit |
Maximum line length before wrapping (default 80). |
wrap |
Continuation style for long function signatures: '"paren"' (default) aligns to opening parenthesis, '"fixed"' uses 8-space indent. |
brace_style |
Brace placement for function definitions: '"kr"' (default) puts opening brace on same line as ') {', '"allman"' puts it on a new line. |
control_braces |
If TRUE, add braces to bare one-line control flow bodies (e.g., 'if (x) y' becomes 'if (x) { y }'). Default FALSE matches R Core source code where 59% of control flow bodies are bare. |
expand_if |
Expand inline if-else to multi-line (default FALSE). |
else_same_line |
If TRUE (default), repair top-level '}\nelse' (which is a parse error in R) by joining to '} else' before formatting. When FALSE, unparseable input is returned unchanged with a warning. |
function_space |
If TRUE, add space before '(' in function definitions: 'function (x)' instead of 'function(x)'. Default FALSE matches 96% of R Core source code. |
join_else |
If TRUE (default), move 'else' to the same line as the preceding '}': '} else {'. Matches R Core source code where 70% use same-line else. When FALSE, '}\nelse' on separate lines is preserved. |
Value
Formatted code as a character string.
Examples
# Basic formatting: spacing around operators
rformat("x<-1+2")
# Add braces to bare control-flow bodies
rformat("if(x>0) y<-1", control_braces = TRUE)
# Expand inline if-else to multi-line
rformat("x <- if (a) b else c", expand_if = TRUE)
# Wrap long function signatures (default: paren-aligned)
long_sig <- paste0(
"f <- function(alpha, beta, gamma, delta, ",
"epsilon, zeta, eta) {\n 1\n}")
cat(rformat(long_sig), sep = "\n")
# Wrap with fixed 8-space continuation indent
cat(rformat(long_sig, wrap = "fixed"), sep = "\n")
# Allman brace style
rformat("f <- function(x) { x }", brace_style = "allman")
Format R Files in Directory
Description
Format all R files in a directory.
Usage
rformat_dir(path = ".", recursive = TRUE, dry_run = FALSE, indent = 4L,
line_limit = 80L, wrap = "paren", brace_style = "kr",
control_braces = FALSE, expand_if = FALSE, else_same_line = TRUE,
function_space = FALSE, join_else = TRUE)
Arguments
path |
Path to directory. |
recursive |
If TRUE, process subdirectories. |
dry_run |
If TRUE, report changes without writing. |
indent |
Indentation per level: integer for spaces (default 4), or character string for literal indent (e.g., '"\t\t"' for vintage R Core style). |
line_limit |
Maximum line length before wrapping (default 80). |
wrap |
Continuation style for long function signatures: '"paren"' (default) aligns to opening parenthesis, '"fixed"' uses 8-space indent. |
brace_style |
Brace placement for function definitions: '"kr"' (default) puts opening brace on same line as ') {', '"allman"' puts it on a new line. |
control_braces |
If TRUE, add braces to bare one-line control flow bodies. Default FALSE matches R Core majority style. |
expand_if |
Expand inline if-else to multi-line (default FALSE). |
else_same_line |
If TRUE (default), repair top-level '}\nelse' (which is a parse error in R) by joining to '} else' before formatting. |
function_space |
If TRUE, add space before '(' in function definitions: 'function (x)' instead of 'function(x)'. Default FALSE matches 96% of R Core source code. |
join_else |
If TRUE (default), move 'else' to the same line as the preceding '}'. |
Value
Invisibly returns a vector of modified file paths.
Examples
# Format all R files in a directory (dry run)
d <- tempfile()
dir.create(d)
writeLines("x<-1", file.path(d, "test.R"))
rformat_dir(d, dry_run = TRUE)
# Format and overwrite
rformat_dir(d)
unlink(d, recursive = TRUE)
Format R File
Description
Format an R file in place or write to a new file.
Usage
rformat_file(path, output = NULL, dry_run = FALSE, indent = 4L,
line_limit = 80L, wrap = "paren", brace_style = "kr",
control_braces = FALSE, expand_if = FALSE, else_same_line = TRUE,
function_space = FALSE, join_else = TRUE)
Arguments
path |
Path to R file. |
output |
Optional output path. If NULL, overwrites input file. |
dry_run |
If TRUE, return formatted code without writing. |
indent |
Indentation per level: integer for spaces (default 4), or character string for literal indent (e.g., '"\t\t"' for vintage R Core style). |
line_limit |
Maximum line length before wrapping (default 80). |
wrap |
Continuation style for long function signatures: '"paren"' (default) aligns to opening parenthesis, '"fixed"' uses 8-space indent. |
brace_style |
Brace placement for function definitions: '"kr"' (default) puts opening brace on same line as ') {', '"allman"' puts it on a new line. |
control_braces |
If TRUE, add braces to bare one-line control flow bodies. Default FALSE matches R Core majority style. |
expand_if |
Expand inline if-else to multi-line (default FALSE). |
else_same_line |
If TRUE (default), repair top-level '}\nelse' (which is a parse error in R) by joining to '} else' before formatting. |
function_space |
If TRUE, add space before '(' in function definitions: 'function (x)' instead of 'function(x)'. Default FALSE matches 96% of R Core source code. |
join_else |
If TRUE (default), move 'else' to the same line as the preceding '}'. |
Value
Invisibly returns formatted code.
Examples
# Format a file (dry run to see result without writing)
f <- tempfile(fileext = ".R")
writeLines("x<-1+2", f)
rformat_file(f, dry_run = TRUE)
# Format and overwrite
rformat_file(f)
readLines(f)
unlink(f)
Serialize Enriched Tokens to Formatted Code
Description
Converts the enriched terminal DataFrame to a formatted code string. This is the final step: tokens are emitted in '(out_line, out_order)' order with proper indentation and spacing.
Usage
serialize_tokens(terms, indent_str, wrap = "paren", line_limit = 80L)
Arguments
terms |
Enriched terminal DataFrame. |
indent_str |
Indent string (e.g., '"    "' for 4 spaces). |
wrap |
Continuation style: '"paren"' or '"fixed"'. |
line_limit |
Maximum line length. |
Value
Formatted code string.
Split Code into Top-Level Expressions
Description
Parses code to find top-level expressions, returning a list of chunks. Each chunk is either an expression (code string) or an inter-expression gap (comments, blank lines). Chunks concatenate back to the original.
Usage
split_toplevel(code)
Arguments
code |
Character string of R code. |
Value
List of 'list(text = "...", is_expr = TRUE/FALSE)' pairs.
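Locating top-level expressions can be sketched with the source references that 'parse()' attaches when 'keep.source = TRUE'. This simplified version returns only the expression chunks with their line ranges; rebuilding the gap chunks (comments, blank lines) from those ranges is omitted, and the field names 'first_line' and 'last_line' are chosen for this example.

```r
code <- "x <- 1\n\n# a comment\nf <- function(a) {\n    a + 1\n}\n"

exprs <- parse(text = code, keep.source = TRUE)
refs <- attr(exprs, "srcref")   # one srcref per top-level expression

chunks <- lapply(refs, function(ref) {
    pos <- as.integer(ref)      # first line, first byte, last line, ...
    list(
        text = paste(as.character(ref), collapse = "\n"),
        is_expr = TRUE,
        first_line = pos[1L],
        last_line = pos[3L]
    )
})
```

The lines not covered by any chunk (here lines 2 and 3) are the inter-expression gaps.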
Compute Indent Level for a Token
Description
Returns the depth-based indent level that should apply to a token's line. For closing tokens ('}', ')', ']'), the indent is one less than the token's own nesting level (they outdent to match their opening counterpart).
Usage
token_indent_level(terms, idx)
Arguments
terms |
Enriched terminal DataFrame. |
idx |
Index of the token (must be first on its line for indent). |
Value
Integer indent level.
Wrap Long Function Calls at Commas (AST Version)
Description
Finds single-line function calls on overlong lines and wraps them at commas. Continuation lines get depth-based indentation (or paren-aligned if 'wrap = "paren"').
Usage
wrap_long_calls(terms, indent_str, wrap = "paren", line_limit = 80L)
Arguments
terms |
Enriched terminal DataFrame. |
indent_str |
Indent string. |
wrap |
Continuation style: '"paren"' or '"fixed"'. |
line_limit |
Maximum line length. |
Value
Updated DataFrame.
Wrap Long Lines at Operators (AST Version)
Description
Finds overlong lines and breaks them after logical operators ('||', '&&', '|', '&'). Continuation lines get depth-based indentation.
Usage
wrap_long_operators(terms, indent_str, line_limit = 80L)
Arguments
terms |
Enriched terminal DataFrame. |
indent_str |
Indent string. |
line_limit |
Maximum line length. |
Value
Updated DataFrame.