| Title: | Query the FDA Global Substance Registration System (GSRS) API |
| Version: | 0.1.0 |
| Description: | Provides functions to query the FDA Global Substance Registration System (GSRS) REST API (https://gsrs.ncats.nih.gov/api/v1/). Enables programmatic access to substance records, UNII identifiers, synonyms, external codes, and chemical structures for over 170,000 registered substances. |
| License: | MIT + file LICENSE |
| URL: | https://c1au6i0.github.io/rgsrs/, https://github.com/c1au6i0/rgsrs |
| BugReports: | https://github.com/c1au6i0/rgsrs/issues |
| Depends: | R (≥ 4.1) |
| Imports: | cli, httr2, janitor, pingr |
| Suggests: | fs, openxlsx, spelling, testthat (≥ 3.0.0), withr |
| Config/testthat/edition: | 3 |
| Encoding: | UTF-8 |
| Language: | en-US |
| RoxygenNote: | 7.3.3 |
| NeedsCompilation: | no |
| Packaged: | 2026-05-01 11:28:15 UTC; heverz |
| Author: | Claudio Zanettini |
| Maintainer: | Claudio Zanettini <claudio.zanettini@gmail.com> |
| Repository: | CRAN |
| Date/Publication: | 2026-05-05 15:08:24 UTC |
Retrieve comprehensive GSRS data for a set of UNIIs
Description
Convenience wrapper that calls gsrs_substance(), gsrs_names(),
gsrs_codes(), gsrs_structure(), and gsrs_hierarchy() in sequence and
returns a named list containing all five data frames. Each sub-function uses
with_graceful_exit internally, so partial failures return NULL for that
element without aborting the whole call.
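This partial-failure behavior can be sketched in base R with `tryCatch()`. The internal `with_graceful_exit` helper's signature is not documented here, so the sketch uses a hypothetical `safe_call()` wrapper and two stub fetchers in place of the real sub-functions:

```r
# Hypothetical stand-in for a graceful-exit wrapper: evaluate `expr`;
# on error, emit a warning and return NULL instead of aborting.
safe_call <- function(expr) {
  tryCatch(expr, error = function(e) {
    warning("lookup failed: ", conditionMessage(e), call. = FALSE)
    NULL
  })
}

# Hypothetical fetchers standing in for the real sub-functions.
fetch_ok   <- function(unii) data.frame(query = unii, value = 1)
fetch_fail <- function(unii) stop("HTTP 500")

collect_all <- function(unii) {
  # Each element is evaluated independently, so one failure leaves
  # NULL in that slot while the other elements are still filled.
  list(
    substance = safe_call(fetch_ok(unii)),
    structure = safe_call(fetch_fail(unii))
  )
}

res <- suppressWarnings(collect_all("R16CO5Y76E"))
str(res)
```

Because `list()` keeps `NULL` elements, the result always has the same names, which is why callers should check each element before using it.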
Usage
gsrs_all(unii, verbose = TRUE, delay = 0.5)
Arguments
unii |
Character vector of one or more UNII codes. |
verbose |
Logical. If TRUE (the default), print progress messages. |
delay |
Numeric. Seconds to wait between individual lookups.
Default 0.5. |
Value
A named list with five elements:
- substance
Data frame from gsrs_substance().
- names
Data frame from gsrs_names().
- codes
Data frame from gsrs_codes().
- structure
Data frame from gsrs_structure().
- hierarchy
Data frame from gsrs_hierarchy().
Returns NULL on error (with a warning).
See Also
gsrs_substance(), gsrs_names(), gsrs_codes(),
gsrs_structure(), gsrs_hierarchy()
Examples
Sys.sleep(2)
out <- gsrs_all("R16CO5Y76E") # aspirin
if (!is.null(out)) {
print(out$substance)
print(head(out$names))
print(head(out$codes))
if (!is.null(out$structure)) print(out$structure[, c("smiles", "formula", "mwt", "inchi_key")])
if (!is.null(out$hierarchy)) print(out$hierarchy[, c("depth", "type", "approval_id", "name")])
}
Browse all substance records in GSRS
Description
Retrieves a paginated list of all substance records from
GET /api/v1/substances. Useful for bulk workflows or building a local
catalog. Use top and skip to page through the ~170,000 available
records, or set top = Inf to fetch all of them (slow; use with care).
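How `top` and `skip` combine when records are fetched over several requests can be sketched as offset arithmetic. The page size of 1000 and the helper `page_offsets()` are assumptions for illustration; the per-request limit the GSRS server actually applies is not stated here.

```r
# Hypothetical helper: split a request for `total` records (total >= 1),
# starting at offset `skip`, into pages of at most `page_size` records.
page_offsets <- function(total, skip = 0L, page_size = 1000L) {
  starts <- seq(from = skip, to = skip + total - 1L, by = page_size)
  data.frame(
    skip = starts,
    # The last page may be shorter than page_size.
    top  = pmin(page_size, skip + total - starts)
  )
}

p <- page_offsets(total = 2500L, skip = 100L)
print(p)
```

Each row is one request; with `delay > 0` a client would pause between rows.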
Usage
gsrs_browse(top = 10L, skip = 0L, verbose = TRUE, delay = 0.5)
Arguments
top |
Integer. Maximum number of records to return per request.
Default |
skip |
Integer. Number of records to skip (offset). Default |
verbose |
Logical. If |
delay |
Numeric. Seconds to wait between paginated requests when
|
Value
A data frame with the same columns as gsrs_search().
Returns NULL on error (with a warning).
See Also
gsrs_search(), gsrs_substance()
Examples
Sys.sleep(2)
# Fetch the first 5 substance records
out <- gsrs_browse(top = 5, verbose = FALSE)
if (!is.null(out)) print(out[, c("approval_id", "preferred_name",
"substance_class")])
Retrieve chemical structure information by substance name or CAS number
Description
A convenience wrapper that resolves one or more substance identifiers to GSRS UNIIs and then fetches the embedded chemical structure data for each substance. The result is one wide row per input identifier containing both the resolved metadata and the full structure record.
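In R, an argument with a vector default such as `type = c("name", "cas", ...)` is conventionally validated with `match.arg()`, which returns the first value when the caller omits it. A sketch of that idiom together with the one-row-per-input contract, using a hypothetical `resolve_ids()` and a stub `lookup_unii()` in place of the real API calls (the package's own implementation is not shown here):

```r
resolve_ids <- function(identifiers,
                        type = c("name", "cas", "unii", "inchikey", "smiles")) {
  # Validate `type`; when omitted, match.arg() picks "name".
  type <- match.arg(type)

  # Hypothetical lookup table standing in for a GSRS call.
  lookup_unii <- function(id) {
    known <- c(aspirin = "R16CO5Y76E")
    if (id %in% names(known)) unname(known[id]) else NA_character_
  }

  # One row per input, in input order; unresolved ids keep NA.
  rows <- lapply(identifiers, function(id) {
    data.frame(query = id, type = type, unii = lookup_unii(id))
  })
  do.call(rbind, rows)
}

out <- resolve_ids(c("aspirin", "not-a-drug"))
print(out)
```

An invalid value such as `type = "bogus"` makes `match.arg()` stop with an error listing the allowed choices.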
Usage
gsrs_chem_info(
identifiers,
type = c("name", "cas", "unii", "inchikey", "smiles"),
verbose = TRUE,
delay = 0.5
)
Arguments
identifiers |
Character vector of substance identifiers. |
type |
Character scalar. The identifier type. One of "name", "cas",
"unii", "inchikey", or "smiles". |
verbose |
Logical. If TRUE (the default), print progress messages. |
delay |
Numeric. Seconds to wait between individual API calls.
Default 0.5. |
Value
A data frame with one row per input identifier and columns:
- query
The identifier supplied by the caller.
- type
The identifier type ("name", "cas", "unii", "inchikey", or "smiles").
- unii
Resolved UNII / approval ID.
- preferred_name
Preferred display name in GSRS.
- substance_class
Substance class (e.g., "chemical").
- smiles
Canonical SMILES string.
- formula
Molecular formula (e.g., "C9H8O4").
- mwt
Molecular weight (numeric).
- inchi_key
Standard InChIKey.
- inchi
Full InChI string.
- stereochemistry
Stereochemistry descriptor.
- optical_activity
Optical activity descriptor.
- charge
Formal charge (integer).
- stereo_centers
Number of stereocenters.
- defined_stereo
Number of defined stereocenters.
- ez_centers
Number of E/Z double-bond stereocenters.
- molfile
MDL molfile as a string.
- date_retrieved
Date the structure response was received.
Unresolved identifiers or non-chemical substances produce a row of NAs
with query and type set. Returns NULL on error (with a warning).
See Also
gsrs_structure(), gsrs_unii_from_name(), gsrs_codes(),
gsrs_structure_search()
Examples
Sys.sleep(2)
out <- gsrs_chem_info(c("aspirin", "ibuprofen"), type = "name")
if (!is.null(out)) print(out[, c("query", "unii", "formula", "mwt")])
Sys.sleep(2)
out_cas <- gsrs_chem_info(c("50-78-2", "15687-27-1"), type = "cas")
if (!is.null(out_cas)) print(out_cas[, c("query", "unii", "formula", "mwt")])
Sys.sleep(2)
out_unii <- gsrs_chem_info("R16CO5Y76E", type = "unii")
if (!is.null(out_unii)) print(out_unii[, c("query", "formula", "mwt")])
Sys.sleep(2)
out_ik <- gsrs_chem_info("BSYNRYMUTXBXSQ-UHFFFAOYSA-N", type = "inchikey")
if (!is.null(out_ik)) print(out_ik[, c("query", "unii", "formula")])
Sys.sleep(2)
out_smi <- gsrs_chem_info("CC(=O)Oc1ccccc1C(=O)O", type = "smiles")
if (!is.null(out_smi)) print(out_smi[, c("query", "unii", "formula")])
Retrieve external codes and identifiers for GSRS substances
Description
For each supplied UNII, calls GET /api/v1/substances(<UNII>)/codes and
returns all registered cross-references as a tidy data frame. These include
CAS numbers, PubChem CIDs, ChEMBL IDs, WHO-ATC codes, NDF-RT codes,
DrugBank IDs, and many more.
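The per-UNII endpoint and the `code_system` filter can be sketched locally. The base URL comes from the package description; `codes_url()` and `filter_codes()` are hypothetical helpers, and the filter runs on a small hand-made data frame rather than a live response. Whether the package filters client-side or server-side is not stated here.

```r
base_url <- "https://gsrs.ncats.nih.gov/api/v1/"

# Hypothetical helper: build the codes endpoint for one UNII.
codes_url <- function(unii) {
  paste0(base_url, "substances(", unii, ")/codes")
}

# Hand-made stand-in for a codes response for aspirin.
codes <- data.frame(
  code_system = c("CAS", "PUBCHEM", "ChEMBL"),
  code        = c("50-78-2", "2244", "CHEMBL25"),
  type        = "PRIMARY"
)

# Keep only the requested systems; NULL keeps everything.
filter_codes <- function(df, code_system = NULL) {
  if (is.null(code_system)) return(df)
  df[df$code_system %in% code_system, , drop = FALSE]
}

print(codes_url("R16CO5Y76E"))
print(filter_codes(codes, c("CAS", "PUBCHEM")))
```

Note that `%in%` matching is case-sensitive, so the code-system names must match exactly.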
Usage
gsrs_codes(unii, code_system = NULL, verbose = TRUE, delay = 0.5)
Arguments
unii |
Character vector of one or more UNII codes. |
code_system |
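Character vector of code systems to filter on
(e.g., c("CAS", "PUBCHEM")). Default NULL returns all codes. |
verbose |
Logical. If TRUE (the default), print progress messages. |
delay |
Numeric. Seconds to wait between individual lookups when
more than one UNII is supplied. Default 0.5. |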
Value
A data frame with columns:
- code_system
External database / code system name (e.g.,
"CAS", "PUBCHEM", "ChEMBL", "WHO-ATC").
- code
The identifier in that system.
- type
"PRIMARY" or "ALTERNATIVE".
- url
URL to the external record (when available).
- comments
Additional context for the code (e.g., ATC path).
- is_classification
Logical; TRUE for classification codes.
- uuid
Internal GSRS UUID for the code record.
- date_retrieved
Date the response was received.
- query
The UNII supplied by the caller.
Returns NULL on error (with a warning).
See Also
gsrs_substance(), gsrs_names(), gsrs_search()
Examples
Sys.sleep(2)
# All codes for aspirin
out <- gsrs_codes("R16CO5Y76E")
if (!is.null(out)) print(head(out))
Sys.sleep(2)
# Only CAS and PubChem codes
out_cas <- gsrs_codes("R16CO5Y76E", code_system = c("CAS", "PUBCHEM"))
if (!is.null(out_cas)) print(out_cas)
Retrieve the relationship hierarchy for GSRS substances
Description
For each supplied UNII, calls GET /api/v1/substances(<UNII>)/@hierarchy
and returns the flat parent/child relationship tree as a tidy data frame.
This is useful for navigating relationships such as salt forms to free base,
active metabolites, or component substances.
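Because the result is a flat table of `node_id`/`parent_id` pairs, the tree can be rebuilt locally. A minimal sketch on a hand-made table; the node labels are placeholders, not real GSRS output, and `children_of()` and `print_tree()` are hypothetical helpers:

```r
# Hand-made flat hierarchy: "#" marks a root node.
h <- data.frame(
  node_id   = c("1", "2", "3"),
  parent_id = c("#", "1", "1"),
  depth     = c(0L, 1L, 1L),
  name      = c("PARENT", "CHILD A", "CHILD B")
)

# Direct children of a node, looked up by parent_id.
children_of <- function(df, id) df$node_id[df$parent_id == id]

# Print an indented outline by walking the tree depth-first.
print_tree <- function(df, id = "#", indent = 0L) {
  for (child in children_of(df, id)) {
    label <- df$name[df$node_id == child]
    cat(strrep("  ", indent), label, "\n", sep = "")
    print_tree(df, child, indent + 1L)
  }
}

print_tree(h)
```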
Usage
gsrs_hierarchy(unii, verbose = TRUE, delay = 0.5)
Arguments
unii |
Character vector of one or more UNII codes. |
verbose |
Logical. If TRUE (the default), print progress messages. |
delay |
Numeric. Seconds to wait between individual lookups when
more than one UNII is supplied. Default 0.5. |
Value
A data frame with columns:
- node_id
Node identifier within the hierarchy tree (string index).
- parent_id
Parent node identifier ("#" for root nodes).
- depth
Depth in the tree (0 = root).
- type
Node type (e.g., "ROOT", "ACTIVE MOIETY", "SALT/SOLVATE").
- text
Human-readable label including UNII and name.
- expandable
Logical; TRUE if the node has children.
- approval_id
UNII of the substance at this node.
- name
Preferred name at this node.
- ref_uuid
Internal GSRS UUID of the related substance.
- substance_class
Substance class at this node.
- deprecated
Logical; TRUE if the node substance is deprecated.
- date_retrieved
Date the response was received.
- query
The UNII supplied by the caller.
Returns NULL on error (with a warning).
Examples
Sys.sleep(2)
out <- gsrs_hierarchy("R16CO5Y76E") # aspirin
if (!is.null(out)) print(out[, c("depth", "type", "approval_id", "name")])
Retrieve all names (synonyms) for GSRS substances
Description
For each supplied UNII, calls GET /api/v1/substances(<UNII>)/names and
returns every registered name record as a tidy data frame row.
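A common follow-up is picking out the preferred name and splitting the semicolon-separated `languages` column. A sketch on a hand-made table shaped like the documented output; the rows and language tags are illustrative, not real GSRS data:

```r
# Hand-made stand-in for gsrs_names() output (illustrative rows).
nm <- data.frame(
  name      = c("ASPIRIN", "ACETYLSALICYLIC ACID", "ACIDE ACETYLSALICYLIQUE"),
  type      = c("cn", "sys", "cn"),
  preferred = c(TRUE, FALSE, FALSE),
  languages = c("en;es", "en", "fr")
)

# The preferred name: first row flagged TRUE (NA if none is flagged).
preferred <- nm$name[which(nm$preferred)[1]]

# Split the semicolon-separated codes into one vector per name.
langs <- strsplit(nm$languages, ";", fixed = TRUE)

# Names tagged with a given language code.
fr_names <- nm$name[vapply(langs, function(x) "fr" %in% x, logical(1))]
print(preferred)
print(fr_names)
```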
Usage
gsrs_names(unii, verbose = TRUE, delay = 0.5)
Arguments
unii |
Character vector of one or more UNII codes. |
verbose |
Logical. If TRUE (the default), print progress messages. |
delay |
Numeric. Seconds to wait between individual lookups when
more than one UNII is supplied. Default 0.5. |
Value
A data frame with columns:
- name
The name string.
- std_name
Standardized (uppercased) name.
- type
Name type code (e.g., "bn" brand name, "cn" common name,
"sys" systematic name, "of" official name).
- preferred
Logical; TRUE when this is the preferred name.
- display_name
Logical; TRUE when this name is shown by default.
- languages
Semicolon-separated language codes.
- domains
Semicolon-separated domain tags.
- uuid
Internal GSRS UUID for the name record.
- date_retrieved
Date the response was received.
- query
The UNII supplied by the caller.
Returns NULL on error (with a warning).
See Also
gsrs_substance(), gsrs_codes(), gsrs_search()
Examples
Sys.sleep(2)
out <- gsrs_names("R16CO5Y76E") # aspirin
if (!is.null(out)) print(head(out))
Search the GSRS substance database
Description
Searches the FDA Global Substance Registration System (GSRS) using a free-text or Lucene-style field query. Returns a tidy data frame of matching substance records with key metadata fields.
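Field queries such as `root_names:aspirin` follow Lucene syntax, so multi-word values are normally wrapped in double quotes to be matched as a phrase. A sketch of building such a query and percent-encoding it with base R's `utils::URLencode()`, assuming GSRS honors standard Lucene phrase quoting; `lucene_phrase()` is a hypothetical helper:

```r
lucene_phrase <- function(field, value) {
  # Inside a quoted phrase only `"` and `\` need a backslash escape.
  escaped <- gsub('(["\\\\])', "\\\\\\1", value, perl = TRUE)
  paste0(field, ':"', escaped, '"')
}

q <- lucene_phrase("root_names", "acetylsalicylic acid")
print(q)

# Percent-encode for use as a URL query parameter.
enc <- utils::URLencode(q, reserved = TRUE)
print(enc)
```

With `reserved = TRUE`, characters such as `:` and `"` are encoded as well, which matters when the query is pasted into a URL by hand rather than passed through an HTTP client that encodes parameters itself.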
Usage
gsrs_search(query, top = 10L, skip = 0L, verbose = TRUE, delay = 0.5)
Arguments
query |
Character string. The search query. Supports free-text terms and
Lucene-style field queries (e.g., root_names:aspirin). |
top |
Integer. Maximum number of records to return per request.
Default 10. |
skip |
Integer. Number of records to skip (offset). Default 0. |
verbose |
Logical. If TRUE (the default), print progress messages. |
delay |
Numeric. Seconds to wait between paginated requests.
Default 0.5. |
Value
A data frame with columns:
- uuid
Internal GSRS UUID of the substance.
- approval_id
FDA UNII / approval ID.
- preferred_name
Preferred display name.
- substance_class
Substance class (e.g., "chemical", "structurallyDiverse").
- status
Record status (e.g., "approved").
- definition_type
"PRIMARY" or "ALTERNATIVE".
- definition_level
"COMPLETE" or "INCOMPLETE".
- version
Record version string.
- names_url
URL to retrieve all names for this substance.
- codes_url
URL to retrieve all codes for this substance.
- self_url
Full URL for this substance record.
- date_retrieved
Date the response was received from the server.
Returns NULL on error (with a warning).
See Also
gsrs_substance(), gsrs_names(), gsrs_codes()
Examples
Sys.sleep(2)
out <- gsrs_search("aspirin", top = 5)
if (!is.null(out)) print(head(out))
Retrieve chemical structure data for GSRS substances
Description
For each supplied UNII, fetches the full substance record from
GET /api/v1/substances(<UNII>) and extracts the embedded structure
object, returning chemical identifiers and properties as a tidy data frame.
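Extracting the embedded structure object, with a row of NAs when it is absent (as for proteins or polymers), can be sketched on plain R lists standing in for parsed JSON. The list layout (a `structure` element holding `smiles`, `formula`, and `mwt`) is an assumption for illustration, not the documented response schema, and `extract_structure()` is hypothetical:

```r
# Hypothetical parsed records: one chemical, one without a structure.
records <- list(
  list(approvalID = "R16CO5Y76E",
       structure  = list(smiles  = "CC(=O)Oc1ccccc1C(=O)O",
                         formula = "C9H8O4", mwt = 180.16)),
  list(approvalID = "PROTEIN-X")   # placeholder id, no structure
)

fields <- c("smiles", "formula", "mwt")

extract_structure <- function(rec) {
  s <- rec$structure
  # Missing fields (or a missing structure, since NULL[["x"]] is NULL)
  # become NA.
  vals <- lapply(fields, function(f) if (is.null(s[[f]])) NA else s[[f]])
  names(vals) <- fields
  df <- as.data.frame(vals)
  df$query <- rec$approvalID
  df
}

out <- do.call(rbind, lapply(records, extract_structure))
print(out)
```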
Usage
gsrs_structure(unii, verbose = TRUE, delay = 0.5)
Arguments
unii |
Character vector of one or more UNII codes. |
verbose |
Logical. If TRUE (the default), print progress messages. |
delay |
Numeric. Seconds to wait between individual lookups when
more than one UNII is supplied. Default 0.5. |
Value
A data frame with columns:
- smiles
Canonical SMILES string.
- formula
Molecular formula (e.g., "C9H8O4").
- mwt
Molecular weight (numeric).
- inchi_key
Standard InChIKey.
- inchi
Full InChI string.
- stereochemistry
Stereochemistry descriptor (e.g., "ACHIRAL", "RACEMIC", "ABSOLUTE").
- optical_activity
Optical activity (e.g., "UNSPECIFIED", "(+)", "(-)").
- charge
Formal charge (integer).
- stereo_centers
Number of stereocenters.
- defined_stereo
Number of defined stereocenters.
- ez_centers
Number of E/Z double-bond stereocenters.
- molfile
MDL molfile as a string.
- date_retrieved
Date the response was received.
- query
The UNII supplied by the caller.
Non-chemical substances (proteins, polymers, etc.) return a row of NAs
with query set. Returns NULL on error (with a warning).
See Also
gsrs_substance(), gsrs_structure_search(), gsrs_names(),
gsrs_codes()
Examples
Sys.sleep(2)
out <- gsrs_structure("R16CO5Y76E") # aspirin
if (!is.null(out)) print(out[, c("smiles", "formula", "mwt", "inchi_key")])
Search GSRS by chemical structure
Description
Searches the FDA Global Substance Registration System for substances matching a chemical structure query supplied as a SMILES string. Supports substructure, similarity, exact-match, and flexible (disconnected moiety) search types.
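For a similarity search, `cutoff` acts as a minimum score. Assuming a Tanimoto (Jaccard) coefficient over fingerprint bits, which is the usual choice for chemical similarity but is not confirmed by this documentation, the filtering step looks like this. The bit positions are made up and `tanimoto()` is a hypothetical helper:

```r
# Tanimoto coefficient between two fingerprints given as the positions
# of their "on" bits: size of intersection / size of union.
tanimoto <- function(a, b) {
  length(intersect(a, b)) / length(union(a, b))
}

query <- c(1, 4, 7, 9, 12)
hits  <- list(
  cand1 = c(1, 4, 7, 9, 12),   # identical bits
  cand2 = c(1, 4, 7, 9, 15),   # 4 shared out of 6 distinct
  cand3 = c(2, 3, 5)           # nothing shared
)

scores <- vapply(hits, tanimoto, numeric(1), b = query)
print(scores)

# Keep candidates at or above the cutoff (0.8 is the function default).
print(names(scores)[scores >= 0.8])
```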
Usage
gsrs_structure_search(
smiles,
type = c("sub", "sim", "exact", "flex"),
cutoff = 0.8,
top = 10L,
verbose = TRUE
)
Arguments
smiles |
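Character string. A valid SMILES or SMARTS string describing
the query structure (e.g., "CC(=O)Oc1ccccc1C(=O)O" for aspirin). |
type |
Character string. Search type. One of "sub" (substructure),
"sim" (similarity), "exact" (exact match), or "flex" (flexible,
disconnected-moiety match). |
cutoff |
Numeric in [0, 1]. Minimum similarity score for type = "sim".
Default 0.8. |
top |
Integer. Maximum number of records to return. Default 10. |
verbose |
Logical. If TRUE (the default), print progress messages. |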
Value
A data frame with the same columns as gsrs_search(), plus a
query_smiles column recording the input SMILES. Returns NULL on error
(with a warning).
See Also
gsrs_structure(), gsrs_search()
Examples
Sys.sleep(2)
# Exact match for aspirin
out <- gsrs_structure_search("CC(=O)Oc1ccccc1C(=O)O", type = "exact")
if (!is.null(out)) print(out[, c("approval_id", "preferred_name")])
Sys.sleep(2)
# Similarity search
out_sim <- gsrs_structure_search("CC(=O)Oc1ccccc1C(=O)O",
type = "sim", cutoff = 0.7, top = 5)
if (!is.null(out_sim)) print(out_sim[, c("approval_id", "preferred_name")])
Fetch a GSRS substance record by UNII
Description
Retrieves the top-level metadata for a single substance identified by its
UNII (Unique Ingredient Identifier / approval ID). Internally this performs
a filtered search using root_approvalID:<unii>.
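The per-UNII loop, with a `delay` between lookups and one output row per input, can be sketched as below. Only the `root_approvalID:<unii>` query string comes from the description above; `fake_search()` is a hypothetical stub standing in for the filtered search, and `lookup_uniis()` is likewise illustrative:

```r
# Hypothetical stub: returns a one-row match or an empty data frame.
fake_search <- function(q) {
  if (q == "root_approvalID:R16CO5Y76E") {
    data.frame(approval_id = "R16CO5Y76E", preferred_name = "ASPIRIN")
  } else {
    data.frame(approval_id = character(0), preferred_name = character(0))
  }
}

lookup_uniis <- function(unii, delay = 0.5) {
  rows <- vector("list", length(unii))
  for (i in seq_along(unii)) {
    hit <- fake_search(paste0("root_approvalID:", unii[i]))
    if (nrow(hit) == 0) {
      # Unrecognized UNII: keep a placeholder row of NAs.
      hit <- data.frame(approval_id = NA_character_,
                        preferred_name = NA_character_)
    }
    hit$query <- unii[i]
    rows[[i]] <- hit[1, , drop = FALSE]
    # Pause between requests, but not after the last one.
    if (i < length(unii)) Sys.sleep(delay)
  }
  do.call(rbind, rows)
}

out <- lookup_uniis(c("R16CO5Y76E", "NOTAUNII00"), delay = 0)
print(out)
```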
Usage
gsrs_substance(unii, verbose = TRUE, delay = 0.5)
Arguments
unii |
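Character vector of one or more UNII codes
(e.g., "R16CO5Y76E"). |
verbose |
Logical. If TRUE (the default), print progress messages. |
delay |
Numeric. Seconds to wait between individual lookups when
more than one UNII is supplied. Default 0.5. |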
Value
A data frame with the same columns as gsrs_search(), with one row
per input UNII. Rows for unrecognized UNIIs contain NA except for
the query column (which is always set to the input UNII). Returns
NULL on error (with a warning).
See Also
gsrs_search(), gsrs_names(), gsrs_codes()
Examples
Sys.sleep(2)
out <- gsrs_substance("R16CO5Y76E") # aspirin
if (!is.null(out)) print(out)
Look up UNII codes for substance names
Description
For each supplied name, queries GSRS using root_names:<name> and returns
the best-matching UNII together with the preferred substance name and
substance class. This is useful for converting common or systematic names
to the canonical FDA UNII identifier.
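With the default `top = 1L`, only the best-ranked candidate is kept per name. Reducing a table of ranked candidates to one row per query, with NAs for names that matched nothing, can be sketched as follows. The candidate table is hand-made (only aspirin's UNII is real; the others are placeholders) and `best_match()` is hypothetical:

```r
# Hand-made ranked candidates (placeholder ids except aspirin's UNII).
cands <- data.frame(
  query = c("aspirin", "aspirin", "ibuprofen"),
  unii  = c("CANDIDATE2", "R16CO5Y76E", "CANDIDATE3"),
  rank  = c(2L, 1L, 1L)
)

# Hypothetical reducer: one row per query, best rank first, NA if no match.
best_match <- function(cands, queries) {
  rows <- lapply(queries, function(n) {
    hit <- cands[cands$query == n, , drop = FALSE]
    hit <- hit[order(hit$rank), , drop = FALSE]
    if (nrow(hit) == 0) {
      hit <- data.frame(query = n, unii = NA_character_)
    }
    hit[1, c("query", "unii"), drop = FALSE]
  })
  do.call(rbind, rows)
}

out <- best_match(cands, c("aspirin", "ibuprofen", "unobtainium"))
print(out)
```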
Usage
gsrs_unii_from_name(names, top = 1L, verbose = TRUE, delay = 0.5)
Arguments
names |
Character vector of substance names to resolve. |
top |
Integer. Maximum number of candidate records to consider per
name query. Default 1. |
verbose |
Logical. If TRUE (the default), print progress messages. |
delay |
Numeric. Seconds to wait between individual lookups.
Default 0.5. |
Value
A data frame with columns:
- unii
The UNII / approval ID of the matched substance.
- preferred_name
Preferred display name in GSRS.
- substance_class
Substance class (e.g., "chemical").
- status
Record status.
- uuid
Internal GSRS UUID.
- date_retrieved
Date the response was received.
- query
The name supplied by the caller.
Unresolved names produce a row of NAs with query set. Returns NULL
on error (with a warning).
See Also
gsrs_substance(), gsrs_search(), gsrs_names()
Examples
Sys.sleep(2)
out <- gsrs_unii_from_name(c("aspirin", "ibuprofen"))
if (!is.null(out)) print(out)
Retrieve controlled vocabulary terms from GSRS
Description
Fetches all (or a page of) controlled vocabulary entries from
GET /api/v1/vocabularies. The result is one row per vocabulary term,
with the parent domain and type attached to every row. This is useful for
understanding allowed values for fields such as name type, substance class,
relationship type, code system, and more.
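Attaching the parent domain to every term amounts to flattening a nested list. A sketch on a hand-made structure; the nesting shown (a domain with a `terms` list of `value`/`display` pairs) and the display labels are assumptions for illustration, not taken from the API documentation, and `flatten_vocab()` is hypothetical:

```r
vocab_raw <- list(
  list(domain = "NAME_TYPE",
       terms  = list(list(value = "cn", display = "common name"),
                     list(value = "bn", display = "brand name"))),
  list(domain = "SUBSTANCE_CLASS",
       terms  = list(list(value = "chemical", display = "chemical")))
)

flatten_vocab <- function(domains) {
  rows <- lapply(domains, function(d) {
    data.frame(
      domain  = d$domain,  # recycled onto every term row
      value   = vapply(d$terms, `[[`, character(1), "value"),
      display = vapply(d$terms, `[[`, character(1), "display")
    )
  })
  do.call(rbind, rows)
}

vocab <- flatten_vocab(vocab_raw)
print(vocab[vocab$domain == "NAME_TYPE", c("value", "display")])
```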
Usage
gsrs_vocabularies(top = NULL, verbose = TRUE, delay = 0.5)
Arguments
top |
Integer. Maximum number of vocabulary domains to return per
request. Default NULL fetches all domains. |
verbose |
Logical. If TRUE (the default), print progress messages. |
delay |
Numeric. Seconds to wait between paginated requests.
Default 0.5. |
Value
A data frame with columns:
- domain
Vocabulary domain name (e.g., "NAME_TYPE", "SUBSTANCE_CLASS",
"RELATIONSHIP_TYPE").
- term_type
Vocabulary term type identifier.
- editable
Logical; TRUE if the vocabulary can be extended.
- filterable
Logical; TRUE if the vocabulary supports filtering.
- value
The controlled term value (used in the API/data).
- display
Human-readable display label for the term.
- hidden
Logical; TRUE if the term is hidden from the UI.
- selected
Logical; TRUE if the term is selected by default.
- date_retrieved
Date the response was received.
Returns NULL on error (with a warning).
Examples
Sys.sleep(2)
vocab <- gsrs_vocabularies(verbose = FALSE)
if (!is.null(vocab)) {
# See all name type values
print(vocab[vocab$domain == "NAME_TYPE", c("value", "display")])
}
Write a named list of data frames to an Excel workbook
Description
Each element of df_list is written to its own sheet. Requires the
openxlsx package (listed in Suggests).
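Because list names become sheet names, they must satisfy Excel's rules: at most 31 characters and none of `[ ] : \ / ? *`. This documentation does not say whether the function cleans names itself, so here is a sketch of a hypothetical pre-processing helper, `clean_sheet_names()`:

```r
# Hypothetical helper: make list names usable as Excel sheet names.
clean_sheet_names <- function(x) {
  # Replace characters Excel forbids in sheet names with "_".
  x <- gsub("[\\[\\]:\\\\/?*]", "_", x, perl = TRUE)
  # Excel caps sheet names at 31 characters.
  substr(x, 1, 31)
}

df_list <- list(`2024/Q1` = mtcars,
                `a very long descriptive sheet name here` = iris)
names(df_list) <- clean_sheet_names(names(df_list))
print(names(df_list))
```

The cleaned list can then be passed to write_dataframes_to_excel() as usual. Truncation can make two long names collide, so it is worth checking for duplicates afterwards.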
Usage
write_dataframes_to_excel(df_list, filename)
Arguments
df_list |
A named list of data frames. |
filename |
Character string. Path to the output .xlsx file. |
Value
The filename, returned invisibly.
Examples
tmp <- tempfile(fileext = ".xlsx")
write_dataframes_to_excel(list(sheet1 = mtcars, sheet2 = iris), tmp)