Writing Your Own Checks

library(checktor)

checktor ships about thirty diagnostics, but every team also has house rules too local to send upstream: a function you have banned, a header you insist on, a habit you keep relapsing into. This vignette is for those. It walks through the handful of helpers in R/ast.R and shows how to write a new check against the parsed syntax tree in a few lines of XPath, while the orchestrator handles the bookkeeping.

The shape of a check

Every diagnostic function follows the same contract:

diagnose_<name> <- function(path, verbose = TRUE, parsed = NULL) {
  if (is.null(parsed)) parsed <- read_r_xml(path)
  if (length(parsed) == 0L) {
    return(checktor_check_result(TRUE, character(0), "<message>"))
  }
  # ... XPath logic ...
  checktor_check_result(passed, issues, "<message>")
}

The parsed argument is an optional parse cache. When checktor() runs all code-side checks together, it parses each file once and passes the result to every check through this internal argument, so 13 checks against a 200-file package cost 200 parses instead of 2,600. When you call a check on its own, leave parsed as NULL and the check parses the package itself.
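To make the saving concrete, here is a toy, base-R illustration of the parse-once pattern. parse_sources(), calls_to(), check_no_browser() and check_no_setseed() are hypothetical stand-ins, not checktor functions, and they use utils::getParseData() token tables rather than the XML that checktor works on; only the `parsed = NULL` contract mirrors the real checks.

```r
# Hypothetical stand-ins illustrating the parse-once cache; not checktor's code.
parse_count <- 0L

parse_sources <- function(sources) {
  lapply(sources, function(code) {
    parse_count <<- parse_count + 1L  # count every real parse
    parse(text = code, keep.source = TRUE)
  })
}

# Number of calls to `fun` in each parsed file.
calls_to <- function(parsed, fun) {
  vapply(parsed, function(exprs) {
    pd <- utils::getParseData(exprs)
    sum(pd$token == "SYMBOL_FUNCTION_CALL" & pd$text == fun)
  }, integer(1))
}

# Both toy checks follow the same contract: parse only if no cache was supplied.
check_no_browser <- function(sources, parsed = NULL) {
  if (is.null(parsed)) parsed <- parse_sources(sources)
  names(which(calls_to(parsed, "browser") > 0L))
}
check_no_setseed <- function(sources, parsed = NULL) {
  if (is.null(parsed)) parsed <- parse_sources(sources)
  names(which(calls_to(parsed, "set.seed") > 0L))
}

sources <- c(a.R = "f <- function() browser()",
             b.R = "set.seed(1)\nx <- runif(3)")
cache <- parse_sources(sources)            # one parse per file
check_no_browser(sources, parsed = cache)  # "a.R"
check_no_setseed(sources, parsed = cache)  # "b.R"
parse_count                                # still 2: both checks reused the cache
```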

Helpers in R/ast.R

read_r_xml(path)

Start here: this is what makes your sources queryable. It parses every R/*.R file in the package and returns a named list with one list(file, xml, error) entry per file. A file that fails to parse is recorded in its error slot instead of crashing the run.

parsed <- read_r_xml(".")
str(parsed[[1]])
#> List of 3
#>  $ file : chr "R/foo.R"
#>  $ xml  : xml_document
#>  $ error: NULL

The xml slot is an xml2 document produced by xmlparsedata::xml_parse_data(). Every parse-tree token is an XML element with line1, col1, line2, col2 attributes.
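If you want to see where the xml slot comes from, here is a minimal sketch of a reader with the same shape, built on xmlparsedata and xml2. read_r_xml_sketch() is hypothetical and is not checktor’s implementation; in particular, checktor may store parse errors differently from the plain message string kept here.

```r
library(xml2)
library(xmlparsedata)

# Hypothetical sketch of a read_r_xml()-style reader; not checktor's code.
read_r_xml_sketch <- function(path) {
  files <- list.files(file.path(path, "R"), pattern = "\\.[Rr]$",
                      full.names = TRUE)
  out <- lapply(files, function(f) {
    tryCatch({
      exprs <- parse(f, keep.source = TRUE)
      pd <- utils::getParseData(exprs)
      list(file = f, xml = read_xml(xml_parse_data(pd)), error = NULL)
    }, error = function(e) {
      # A file that fails to parse is recorded, not fatal to the whole run.
      list(file = f, xml = NULL, error = conditionMessage(e))
    })
  })
  names(out) <- basename(files)
  out
}

# Usage on a throwaway package with one good and one broken file.
pkg <- file.path(tempdir(), "sketchpkg")
dir.create(file.path(pkg, "R"), recursive = TRUE, showWarnings = FALSE)
writeLines("f <- function(x) set.seed(x)", file.path(pkg, "R", "good.R"))
writeLines("f <- function( {", file.path(pkg, "R", "bad.R"))
res <- read_r_xml_sketch(pkg)

# Every node carries position attributes such as line1.
xml_attr(xml_find_all(res[["good.R"]]$xml, "//SYMBOL_FUNCTION_CALL"), "line1")
res[["bad.R"]]$error
```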

xpath_lints(parsed, xpath, label = NULL)

The workhorse. Give it an XPath query and it returns a "basename:line" string for every match across every file, ready to hand to a check result’s $issues. The optional label is appended in parentheses after each hit.

hits <- xpath_lints(parsed,
                    "//SYMBOL_FUNCTION_CALL[text() = 'set.seed']")
hits
#> [1] "foo.R:42" "bar.R:17"
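The mapping from XPath matches to "basename:line" strings is simple enough to sketch with xml2 alone. xpath_lints_sketch() and the in-memory to_entry() helper are hypothetical, and reading each match’s line1 attribute is an assumption about how checktor picks the reported line. The sample sources also show that set.seed inside a comment or a string literal is not a SYMBOL_FUNCTION_CALL node, so it is not reported.

```r
library(xml2)
library(xmlparsedata)

# Hypothetical sketch of the xpath_lints() idea; not checktor's code.
xpath_lints_sketch <- function(parsed, xpath, label = NULL) {
  hits <- lapply(parsed, function(entry) {
    if (is.null(entry$xml)) return(character(0))   # skip unparseable files
    nodes <- xml_find_all(entry$xml, xpath)
    if (length(nodes) == 0L) return(character(0))  # else paste0() yields "foo.R:"
    out <- paste0(basename(entry$file), ":", xml_attr(nodes, "line1"))
    if (!is.null(label)) out <- paste0(out, " (", label, ")")
    out
  })
  unlist(hits, use.names = FALSE)
}

# Build a parsed-style entry from in-memory code (hypothetical helper).
to_entry <- function(file, code) {
  pd <- utils::getParseData(parse(text = code, keep.source = TRUE))
  list(file = file, xml = read_xml(xml_parse_data(pd)), error = NULL)
}

parsed <- list(
  to_entry("R/foo.R", c("x <- 1", "set.seed(42)", "# set.seed(1) in a comment")),
  to_entry("R/bar.R", c("msg <- 'set.seed(7)'", "f <- function() set.seed(7)"))
)
xpath_lints_sketch(parsed, "//SYMBOL_FUNCTION_CALL[text() = 'set.seed']")
#> [1] "foo.R:2" "bar.R:2"
xpath_lints_sketch(parsed, "//SYMBOL_FUNCTION_CALL[text() = 'set.seed']",
                   label = "set.seed")
#> [1] "foo.R:2 (set.seed)" "bar.R:2 (set.seed)"
```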

undesirable_function_check(parsed, funs, label = TRUE)

The most common pattern, “flag any call to function X”, has a canned helper:

issues <- undesirable_function_check(parsed,
                                     c("install.packages", "browser"))

This is checktor’s equivalent of lintr::undesirable_function_linter().
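One plausible way a helper like this turns a vector of function names into a single query is to OR together text() tests on SYMBOL_FUNCTION_CALL nodes. undesirable_xpath() below is a hypothetical illustration, not checktor’s code, and it assumes the names contain no single quotes.

```r
# Hypothetical: build one XPath that matches calls to any of `funs`.
undesirable_xpath <- function(funs) {
  tests <- sprintf("text() = '%s'", funs)  # assumes no quotes in `funs`
  sprintf("//SYMBOL_FUNCTION_CALL[%s]", paste(tests, collapse = " or "))
}

undesirable_xpath(c("install.packages", "browser"))
#> [1] "//SYMBOL_FUNCTION_CALL[text() = 'install.packages' or text() = 'browser']"
```

A string built this way can be passed straight to xpath_lints().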

not_under_fn_with_call_xpath(funs)

Returns an XPath predicate that keeps only nodes whose innermost enclosing function body does not also contain a call to any of funs. This is how option_changes enforces that options() is guarded by an on.exit() in the same function. The “innermost” part is what makes it correct for nested functions: an on.exit() in the outer function would not cover an options() call made inside an inner one.

predicate <- not_under_fn_with_call_xpath(c("on.exit", "local_options"))
xpath <- paste0(
  "//SYMBOL_FUNCTION_CALL[text() = 'options']",
  "[", predicate, "]"
)
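For intuition about what such a predicate can look like, here is a hypothetical version written against xml2 and xmlparsedata output. It assumes a function definition is an expr with a FUNCTION (or OP-LAMBDA, for \(x)) child, and it relies on ancestor:: being a reverse axis in XPath 1.0, so [1] selects the nearest enclosing function. It is a sketch, not checktor’s implementation: because it searches every descendant of that function, a guard call inside a further-nested function would still count for the outer one.

```r
library(xml2)
library(xmlparsedata)

# Hypothetical sketch; not checktor's not_under_fn_with_call_xpath().
not_under_fn_with_call_xpath_sketch <- function(funs) {
  guard <- paste(sprintf("text() = '%s'", funs), collapse = " or ")
  # ancestor:: is a reverse axis, so [1] is the innermost enclosing function.
  sprintf(
    "not(ancestor::expr[FUNCTION or OP-LAMBDA][1]//SYMBOL_FUNCTION_CALL[%s])",
    guard
  )
}

src <- c(
  "f <- function() {",
  "  old <- options(digits = 3)",
  "  on.exit(options(old))",
  "}",
  "g <- function() {",
  "  on.exit(NULL)",
  "  inner <- function() options(warn = 1)",
  "  inner()",
  "}",
  "options(scipen = 5)"
)
doc <- read_xml(xml_parse_data(
  utils::getParseData(parse(text = src, keep.source = TRUE))
))
xpath <- paste0(
  "//SYMBOL_FUNCTION_CALL[text() = 'options']",
  "[", not_under_fn_with_call_xpath_sketch(c("on.exit", "local_options")), "]"
)
# Line 7: inner() has no on.exit of its own; line 10: not inside any function.
xml_attr(xml_find_all(doc, xpath), "line1")  # c("7", "10")
```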

extract_rd_section(rd, tag) and collect_rd_text(node, skip)

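These two helpers walk .Rd files structurally, using the tree that tools::parse_Rd() returns. Here the \examples section is extracted and its text collected with \dontrun blocks skipped: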

rd <- tools::parse_Rd("man/my_fn.Rd")
ex <- extract_rd_section(rd, "\\examples")
collect_rd_text(ex, skip = "\\dontrun")
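The idea behind these helpers can be sketched in base R: tools::parse_Rd() returns nested lists whose elements carry an Rd_tag attribute, so finding a section is a scan of top-level tags, and collecting text is a recursive walk that drops skipped tags. rd_section_sketch() and rd_text_sketch() are hypothetical stand-ins; checktor’s versions may differ, for instance in what they return when a section is missing.

```r
# Hypothetical stand-ins for extract_rd_section() and collect_rd_text().
rd_section_sketch <- function(rd, tag) {
  for (node in rd) {
    if (identical(attr(node, "Rd_tag"), tag)) return(node)
  }
  NULL  # section not present
}

rd_text_sketch <- function(node, skip = character(0)) {
  tag <- attr(node, "Rd_tag")
  if (!is.null(tag) && tag %in% skip) return("")              # drop skipped subtree
  if (is.character(node)) return(paste(node, collapse = ""))  # leaf text
  paste(vapply(node, rd_text_sketch, character(1), skip = skip), collapse = "")
}

rd_file <- tempfile(fileext = ".Rd")
writeLines(c(
  "\\name{my_fn}", "\\alias{my_fn}", "\\title{My function}",
  "\\description{Does a thing.}",
  "\\examples{",
  "my_fn(1)",
  "\\dontrun{",
  "install.packages(\"foo\")",
  "}",
  "}"
), rd_file)

rd <- tools::parse_Rd(rd_file)
ex <- rd_section_sketch(rd, "\\examples")
cat(rd_text_sketch(ex, skip = "\\dontrun"))  # my_fn(1), without the \dontrun body
```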

Walked example: Sys.setenv() without cleanup

Suppose we want a check that flags any Sys.setenv() call whose enclosing function doesn’t also call on.exit(Sys.unsetenv(...)) or withr::local_envvar(). This has the same structure as diagnose_option_changes, and checktor ships it as diagnose_sys_setenv_no_reset. Here is its essential form:

diagnose_sys_setenv_no_reset <- function(path, verbose = TRUE,
                                         parsed = NULL) {
  if (is.null(parsed)) parsed <- read_r_xml(path)
  if (length(parsed) == 0L) {
    return(checktor_check_result(TRUE, character(0),
                                 "Sys.setenv reset check"))
  }
  guard <- not_under_fn_with_call_xpath(
    c("on.exit", "Sys.unsetenv", "local_envvar", "with_envvar")
  )
  xpath <- paste0(
    "//SYMBOL_FUNCTION_CALL[text() = 'Sys.setenv']",
    "[", guard, "]"
  )
  issues <- xpath_lints(parsed, xpath)
  passed <- length(issues) == 0L
  # a shipped check also calls emit_issue_summary(issues, verbose, ...) here
  # to print the cli summary when verbose = TRUE
  checktor_check_result(passed, issues, "Sys.setenv reset check")
}

About twenty lines, and the only interesting part is the XPath predicate. Everything else is bookkeeping shared with every other check.

The xmlparsedata XML structure

A call fn(a, b = 1) parses to the following (position attributes such as line1 and col1 omitted):

<expr>                              <!-- call expr -->
  <expr>                            <!-- function-name expr -->
    <SYMBOL_FUNCTION_CALL>fn</SYMBOL_FUNCTION_CALL>
  </expr>
  <OP-LEFT-PAREN>(</OP-LEFT-PAREN>
  <expr><SYMBOL>a</SYMBOL></expr>   <!-- first positional arg -->
  <OP-COMMA>,</OP-COMMA>
  <SYMBOL_SUB>b</SYMBOL_SUB>        <!-- named-arg name -->
  <EQ_SUB>=</EQ_SUB>
  <expr><NUM_CONST>1</NUM_CONST></expr>  <!-- named-arg value -->
  <OP-RIGHT-PAREN>)</OP-RIGHT-PAREN>
</expr>

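When you anchor on a SYMBOL_FUNCTION_CALL, parent::expr is the function-name wrapper and parent::expr/parent::expr is the call expr. The argument values are that call expr’s child expr elements after the first, and a named argument’s name is a SYMBOL_SUB child of the call expr.

A common bug is treating parent::expr as the call expr: the wrapper has only one child (the SYMBOL_FUNCTION_CALL itself), so child queries for arguments relative to it find nothing.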
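You can confirm this structure on real parser output. The snippet below assumes the xml2 and xmlparsedata packages (it uses no checktor API), parses the fn(a, b = 1) example, and inspects the nodes around the SYMBOL_FUNCTION_CALL.

```r
library(xml2)
library(xmlparsedata)

pd <- utils::getParseData(parse(text = "fn(a, b = 1)", keep.source = TRUE))
doc <- read_xml(xml_parse_data(pd))
sym <- xml_find_first(doc, "//SYMBOL_FUNCTION_CALL[text() = 'fn']")

# parent::expr is only the function-name wrapper: its sole child is the symbol.
wrapper <- xml_find_first(sym, "parent::expr")
length(xml_children(wrapper))
#> [1] 1

# Two levels up is the call expr, which holds the parentheses and arguments.
call_expr <- xml_find_first(sym, "parent::expr/parent::expr")
xml_name(xml_children(call_expr))
xml_text(xml_find_first(call_expr, "SYMBOL_SUB"))  # "b", the named argument
```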

Trying it out

# Parse every R/*.R file in the package
parsed <- read_r_xml("path/to/package")

# Find every call to install.packages()
xpath_lints(parsed,
            "//SYMBOL_FUNCTION_CALL[text() = 'install.packages']")

To plug a new check into checktor(), add a diagnose_<name> function to the appropriate R/diagnostics-*.R file and register it in that file’s run_checks(list(...), path, verbose) call as a closure that forwards the cache, for example my_check = function(p, v) diagnose_my_check(p, v, parsed = parsed). The closure is what lets your check share the parse-once cache; the orchestrator takes care of error catching and $passed bookkeeping.
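The registration pattern is ordinary R closures. The toy below mimics it in base R: run_checks_sketch() and diagnose_toy() are hypothetical, and the real run_checks() is not shown in this vignette, so it may do more or behave differently. The point is only that each closure captures parsed from the enclosing scope, and that the orchestrator, not the check, catches errors.

```r
# Hypothetical toy orchestrator; only mimics the run_checks() idea.
run_checks_sketch <- function(checks, path, verbose) {
  lapply(checks, function(check) {
    tryCatch(
      check(path, verbose),
      # An erroring check becomes a failed result instead of aborting the run.
      error = function(e) list(passed = FALSE, issues = conditionMessage(e))
    )
  })
}

diagnose_toy <- function(path, verbose = TRUE, parsed = NULL) {
  if (is.null(parsed)) parsed <- list()  # a real check would parse `path` here
  # Toy rule: pass only if the check actually received parsed files.
  list(passed = length(parsed) > 0L, issues = character(0))
}

parsed <- list(foo.R = "stand-in for a parsed file")  # the shared cache
results <- run_checks_sketch(list(
  toy = function(p, v) diagnose_toy(p, v, parsed = parsed),  # forwards the cache
  broken = function(p, v) stop("boom")
), path = ".", verbose = FALSE)

results$toy$passed  # TRUE: the closure handed the cache through
results$broken      # passed = FALSE, issues = "boom"
```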

Conclusion

Building on the parsed syntax tree buys the property that makes checktor trustworthy: a pattern inside a string literal or a comment is a different kind of node than a real call, so it is never mistaken for one. Write the XPath, let run_checks() carry the rest, and your house rule is enforced as rigorously as the checks that ship in the box.

See also