Package {PEIMAN2}


Title: Post-Translational Modification Enrichment, Integration, and Matching Analysis
Version: 1.1.0
Description: Functions and a mined database from 'UniProt' focusing on post-translational modifications, for singular enrichment analysis (SEA) and protein set enrichment analysis (PSEA). Payman Nickchi, Uladzislau Vadadokhau, Mehdi Mirzaie, Marc Baumann, Amir Ata Saei, Mohieddin Jafari (2025) <doi:10.1002/pmic.202400238>.
License: GPL (≥ 3)
Encoding: UTF-8
RoxygenNote: 7.3.3
VignetteBuilder: knitr
Depends: R (≥ 3.5)
Imports: ggplot2, dplyr, glue, lifecycle, purrr, rlang, stringr, graphics, forcats, stats, magrittr, jsonlite
Suggests: knitr, rmarkdown, testthat (≥ 3.0.0)
LazyData: true
NeedsCompilation: no
Packaged: 2026-06-17 01:50:18 UTC; payma
Author: Mohieddin Jafari [aut], Payman Nickchi [aut, cre]
Maintainer: Payman Nickchi <payman.nickchi@gmail.com>
Repository: CRAN
Date/Publication: 2026-06-17 06:30:10 UTC

PEIMAN2: Post-Translational Modification Enrichment, Integration, and Matching Analysis

Description

Functions and a mined database from 'UniProt' focusing on post-translational modifications, for singular enrichment analysis (SEA) and protein set enrichment analysis (PSEA). Payman Nickchi, Uladzislau Vadadokhau, Mehdi Mirzaie, Marc Baumann, Amir Ata Saei, Mohieddin Jafari (2025) doi:10.1002/pmic.202400238.

Author(s)

Maintainer: Payman Nickchi payman.nickchi@gmail.com

Authors:

Mohieddin Jafari

Example dataset 1

Description

A dataset with randomly selected proteins from UniProt.

Usage

exmplData1

Format

A list with 2 elements:

pl1

97 Homo sapiens (Human) proteins randomly selected from UniProt.

pl2

45 Homo sapiens (Human) proteins randomly selected from UniProt.

...

Source

https://www.uniprot.org/


Example dataset 2

Description

A test dataset of proteins identified from rat hippocampus proteome using label-free thermal proteome profiling. The score for each protein corresponds to the SEQUEST HT engine score of one arbitrary peptide-spectrum match (PSM) associated with that protein. This dataset is provided to demonstrate how a ranked list of proteins can be used within the PEIMAN2 package.

Usage

exmplData2

Format

A data frame with 209 rows and 2 columns:

UniProtAC

UniProt accession code of proteins

Score

SEQUEST HT score of one associated PSM (used for demonstration purposes)

...

Details

Proteins from the rat hippocampus proteome.

Source

https://pubmed.ncbi.nlm.nih.gov/33632781/


Return the exact taxonomy name for a list of proteins

Description

getTaxonomyName takes a character vector of proteins identified by their UniProt accession codes and returns the exact taxonomy name of their organism.

Usage

getTaxonomyName(x, database_version = "bundled")

Arguments

x

A character vector in which each entry represents a protein UniProt accession code.

database_version

Character string specifying which PEIMAN database version to use. The default is 'bundled', which uses the database included with the package. Use 'latest' for the newest cached database, or a specific version such as '2026-05-01'.
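Because getTaxonomyName expects UniProt accession codes, it can help to screen the input for malformed identifiers first. The sketch below uses a hypothetical helper, is_uniprot_ac (not part of PEIMAN2), built on the accession-number regular expression published in the UniProt documentation. It checks the format only; it does not check whether an entry exists or is present in the PEIMAN database.

```r
# Hypothetical helper (not part of PEIMAN2): TRUE where an entry matches the
# UniProt accession-number format, FALSE otherwise. perl = TRUE makes the
# [A-Z] ranges independent of the locale's collation order.
is_uniprot_ac <- function(x) {
  pattern <- "^([OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2})$"
  grepl(pattern, x, perl = TRUE)
}

ids <- c("P04637", "Q9Y6K9", "A0A023GPI8", "p04637", "not_an_ac")
is_uniprot_ac(ids)   # TRUE TRUE TRUE FALSE FALSE
```

Entries flagged FALSE (for example lower-case codes) could be fixed or dropped before calling getTaxonomyName(x = ids[is_uniprot_ac(ids)]).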

Value

The exact taxonomy name

Examples

getTaxonomyName(x = exmplData1$pl1)

Load a PEIMAN database

Description

Loads the PEIMAN database used internally by PEIMAN2. By default, this function loads the database bundled with the package. It can also load the latest cached database or a specific cached database version.

Usage

load_peiman_database(version = "bundled")

Arguments

version

Character string specifying which database version to load. Use 'bundled' to load the database included with the package, 'latest' to load the newest cached database, or a specific version such as '2026_05_01'.

Details

Cached databases are expected to be stored in the PEIMAN2 user cache directory, given by:

tools::R_user_dir('PEIMAN2', which = 'cache')

Cached database files should follow the naming format:

peiman_database_YYYY_MM_DD.rds

This function is intended mainly for internal package use.
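The 'latest' option implies selecting the most recent dated file in the cache. As an illustration only, the hypothetical helper latest_cached_database below (not part of PEIMAN2, whose own resolution logic is not documented here) lists files matching the peiman_database_YYYY_MM_DD.rds pattern and returns the one with the newest date. Note that tools::R_user_dir requires R >= 4.0.0.

```r
# Hypothetical helper (not part of PEIMAN2): path of the newest cached
# database file, assuming the peiman_database_YYYY_MM_DD.rds naming above.
latest_cached_database <- function(cache_dir = tools::R_user_dir("PEIMAN2", which = "cache")) {
  files <- list.files(cache_dir,
                      pattern = "^peiman_database_[0-9]{4}_[0-9]{2}_[0-9]{2}\\.rds$")
  if (length(files) == 0) return(NULL)   # nothing cached yet
  # Pull out the YYYY_MM_DD stamp and parse it as a calendar date
  stamps <- sub("^peiman_database_(.*)\\.rds$", "\\1", files)
  dates  <- as.Date(stamps, format = "%Y_%m_%d")
  file.path(cache_dir, files[which.max(dates)])
}

# Usage: inspect which cached file 'latest' would most plausibly refer to
# latest_cached_database()
```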

Value

A data frame containing the PEIMAN database.


Load a UniProt PTM list

Description

Loads the UniProt PTM list used internally by PEIMAN2. By default, this function loads the PTM list bundled with the package. It can also load the latest cached PTM list or a specific cached PTM list version.

Usage

load_ptmlist(version = "bundled")

Arguments

version

Character string specifying which PTM list version to load. The default is 'bundled', which uses the PTM list included with the package. Use 'latest' for the newest cached PTM list, or a specific version such as '2026-06-15'.

Details

Cached PTM list files are expected to be stored in the PEIMAN2 user cache directory, given by:

tools::R_user_dir('PEIMAN2', which = 'cache')

Cached PTM list files should follow the naming format:

uniprot_ptm_list_YYYY-MM-DD.rds

This function is intended mainly for internal package use.
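For a specific version, the documented naming convention determines where the cached file is expected to be. The hypothetical helper ptmlist_cache_path below (not part of PEIMAN2) builds that path and rejects version strings that do not follow the YYYY-MM-DD form. As documented, PTM list file names use hyphens, whereas database file names use underscores.

```r
# Hypothetical helper (not part of PEIMAN2): expected cache path for a given
# PTM list version, following uniprot_ptm_list_YYYY-MM-DD.rds.
ptmlist_cache_path <- function(version,
                               cache_dir = tools::R_user_dir("PEIMAN2", which = "cache")) {
  # Validate the version against the documented YYYY-MM-DD pattern
  if (!grepl("^[0-9]{4}-[0-9]{2}-[0-9]{2}$", version)) {
    stop("version must look like 'YYYY-MM-DD', e.g. '2026-06-15'")
  }
  file.path(cache_dir, paste0("uniprot_ptm_list_", version, ".rds"))
}

# Usage: check whether a given version is already cached
# file.exists(ptmlist_cache_path("2026-06-15"))
```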

Value

A data frame containing the UniProt PTM list.


Database of protein modifications

Description

Ontology database for post-translational modification terms (PSI-MOD). For more details, see the source file listed under Source.

Usage

data(mod_ont)

Format

A data frame with 2102 rows and 3 variables

Details

Source

https://raw.githubusercontent.com/HUPO-PSI/psi-mod-CV/master/PSI-MOD.obo


Run internal PEIMAN singular enrichment analysis

Description

Internal helper function used by runEnrichment to run singular enrichment analysis for a given protein list, organism, background list, and PEIMAN database version.

Usage

peiman(pro, os, background = NULL, am, db_version = "bundled")

Arguments

pro

A character vector of UniProt accession codes.

os

A character string giving the exact taxonomy name of the organism.

background

Optional character vector of UniProt accession codes to use as the background protein list. If NULL, all reviewed proteins for the selected organism in the PEIMAN database are used as the background.

am

Character string specifying the p-value adjustment method. This is passed to p.adjust.

db_version

Character string specifying which PEIMAN database version to use. Use 'bundled' for the database included with the package, 'latest' for the newest cached database, or a specific version such as '2026-05-01'.

Details

This function is intended for internal package use. User-facing enrichment analysis should be performed with runEnrichment.

Value

A list with two elements:

enrich

A data frame containing the enrichment results.

ms

A character vector of proteins missing from the selected PEIMAN database.
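This section does not state which statistical test peiman() applies. A common choice for singular enrichment analysis is a one-sided hypergeometric test per PTM term followed by p.adjust, and the sketch below assumes exactly that; it is not PEIMAN2's implementation. The annotation argument, a named list mapping each PTM keyword to its proteins, is a hypothetical stand-in for the PEIMAN database.

```r
# Hypothetical sketch of singular enrichment analysis (SEA).
# Assumption: one-sided hypergeometric test per PTM term; PEIMAN2's
# internal statistic may differ.
sea_sketch <- function(pro, background, annotation, am = "BH") {
  pro <- intersect(unique(pro), background)   # drop query proteins absent from the background
  N <- length(background)                     # background size
  n <- length(pro)                            # query size
  res <- lapply(names(annotation), function(term) {
    members <- intersect(annotation[[term]], background)
    K <- length(members)                      # background proteins carrying this PTM
    k <- length(intersect(pro, members))      # query proteins carrying this PTM
    # P(X >= k) for X ~ Hypergeometric(K, N - K, n)
    p <- stats::phyper(k - 1, K, N - K, n, lower.tail = FALSE)
    data.frame(PTM = term, k = k, K = K, pvalue = p, stringsAsFactors = FALSE)
  })
  res <- do.call(rbind, res)
  res$padj <- stats::p.adjust(res$pvalue, method = am)
  res[order(res$pvalue), , drop = FALSE]
}

# Toy example: 20 background proteins, 5 query proteins
bg  <- paste0("P", 1:20)
ann <- list(Phosphoprotein = paste0("P", 1:8), Acetylation = paste0("P", 9:20))
sea_sketch(paste0("P", 1:5), bg, ann)
```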


Plot and match singular enrichment results

Description

This function can be used to plot the results of singular enrichment analysis for one set of proteins. It can also integrate and match the results of two separate singular enrichment analyses and plot the common PTMs. For more details, see the examples.

Usage

plotEnrichment(x, y = NULL, sig.level = 0.05, number.rep = NULL, plotit = TRUE)

Arguments

x

A data frame that contains singular enrichment results generated by runEnrichment.

y

An optional second data frame of singular enrichment results generated by runEnrichment. The default value is NULL. If provided, the matched results of x and y are plotted.

sig.level

The significance level used to select post-translational modifications, based on their adjusted p-values. Note that sig.level applies to both x and y. The default value is 0.05.

number.rep

Only plot PTM terms that occur more than number.rep times in the UniProt database. The default value is NULL.

plotit

A logical indicating whether to draw the plot (TRUE, the default) or to return the plot (FALSE).
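The sig.level and number.rep filters, together with the matching of x and y, can be pictured as a row filter followed by an inner join on the PTM term. This is a minimal sketch on toy data frames; the column names PTM, padj and freq are hypothetical placeholders rather than the actual runEnrichment output columns, and the use of a strict cut-off (<) is an assumption.

```r
# Hypothetical sketch of the filter-and-match step (not PEIMAN2's code).
filter_match <- function(x, y = NULL, sig.level = 0.05, number.rep = NULL) {
  keep <- function(d) {
    d <- d[d$padj < sig.level, , drop = FALSE]               # significance filter
    if (!is.null(number.rep)) d <- d[d$freq > number.rep, , drop = FALSE]  # frequency filter
    d
  }
  x <- keep(x)
  if (is.null(y)) return(x)
  # Keep only PTM terms significant in both result sets
  merge(x, keep(y), by = "PTM", suffixes = c(".x", ".y"))
}

x <- data.frame(PTM = c("A", "B", "C"), padj = c(0.01, 0.2, 0.03), freq = c(10, 50, 3))
y <- data.frame(PTM = c("A", "C", "D"), padj = c(0.04, 0.01, 0.001), freq = c(10, 3, 7))
filter_match(x, y)                  # terms A and C
filter_match(x, y, number.rep = 5)  # term A only
```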

Value

Plot.

Examples

## Enrichment analysis for the first protein list
enrich1 <- runEnrichment(protein = exmplData1$pl1, os.name = 'Homo sapiens (Human)')
## Plot results for first protein list
plotEnrichment(x = enrich1)

## Enrichment analysis for the second protein list
enrich2 <- runEnrichment(protein = exmplData1$pl2, os.name = 'Homo sapiens (Human)')
## Plot results for second protein list
plotEnrichment(x = enrich2)

## Integrate and match the results of two separate singular enrichment analysis
plotEnrichment(x = enrich1, y = enrich2)
plotEnrichment(x = enrich1, y = enrich2, number.rep = 5)

Plot the results of protein set enrichment analysis (PSEA)

Description

plotPSEA can be used to plot the results of protein set enrichment analysis (PSEA) for a set of proteins obtained from an experiment.

Usage

plotPSEA(x, y = NULL, sig.level = 0.05, number.rep = NULL)

Arguments

x

A data frame returned by runPSEA function.

y

An optional second set of protein set enrichment results generated by runPSEA. The default value is NULL. If provided, the matched results of x and y are plotted.

sig.level

The significance level applied to the permutation-based adjusted p-values to filter PTMs for plotting. The default value is 0.05.

number.rep

Only plot PTM terms that occur more than number.rep times in UniProt. The default value is NULL.

Value

Plot

Examples

# We recommend at least nperm = 1000.
# The number of permutations was reduced to 10
# to accommodate CRAN policy on examples (run time <= 5 seconds).
psea_res <- runPSEA(protein = exmplData2, os.name = 'Rattus norvegicus (Rat)', nperm = 10)
plotPSEA(psea_res, sig.level = 0.05)


Plot running enrichment scores for PSEA results

Description

This function takes the results generated by runPSEA and plots the running enrichment score over the ranked proteins for each PTM.

Usage

plotRunningScore(
  x,
  nplot = length(x$psea.result),
  type = "l",
  lty = 1,
  lwd = 3,
  cex = 1.2,
  cex.axis = 1.2,
  cex.lab = 1.1,
  col = "blue"
)

Arguments

x

A list of six elements generated by the runPSEA function.

nplot

An integer that defines the number of running score plots to show. Default value is the number of enriched PTMs in x.

type

Type of line used in the plot.

lty

Line type.

lwd

Line width.

cex

Size of the title text.

cex.axis

Size of the tick labels.

cex.lab

Size of the axis label text.

col

Color of the running enrichment score line.

Value

Plot

Examples

# We recommend at least nperm = 1000.
# The number of permutations was reduced to 10
# to accommodate CRAN policy on examples (run time <= 5 seconds).
psea_res <- runPSEA(protein = exmplData2, os.name = 'Rattus norvegicus (Rat)', nperm = 10)
plotRunningScore(x = psea_res)

Translate PSEA results for mass spectrometry search tools

Description

This function translates protein set enrichment analysis results and extracts the information required by mass spectrometry search tools. The subset of protein modifications is from https://raw.githubusercontent.com/HUPO-PSI/psi-mod-CV/master/PSI-MOD.obo.

Usage

psea2mass(x, sig.level = 0.05, number.rep = NULL, ptmlist_version = "bundled")

Arguments

x

A list of psea results generated by runPSEA function.

sig.level

The significance level used to filter PTMs (applies to adjusted p-values). The default value is 0.05.

number.rep

Only consider PTM terms that occur more than number.rep times in UniProt. The default value is NULL.

ptmlist_version

Character string specifying which UniProt PTM list version to use. The default is 'bundled', which uses the PTM list included with the package. Use 'latest' for the newest cached PTM list, or a specific version such as '2026-06-15'.

Value

A database containing a subset of protein modifications.

Examples

# We recommend at least nperm = 1000.
# The number of permutations was reduced to 10
# to accommodate CRAN policy on examples (run time <= 5 seconds).
psea_res <- runPSEA(protein = exmplData2, os.name = 'Rattus norvegicus (Rat)', nperm = 10)
MS <- psea2mass(x = psea_res, sig.level = 0.05)

Controlled vocabulary for post-translational modifications (PTM) terms

Description

This data frame lists the post-translational modifications used in the UniProt Knowledgebase (Swiss-Prot and TrEMBL). The columns in this data frame are as follows:

Usage

data(ptmlist)

Format

A data frame with 686 rows and 5 variables

Details

Source

https://ftp.uniprot.org/pub/databases/uniprot/knowledgebase/complete/docs/ptmlist.txt


Run singular enrichment analysis (SEA) for a given list of proteins

Description

This function takes proteins identified by their UniProt accession codes, runs singular enrichment analysis (SEA), and returns the enrichment results.

Usage

runEnrichment(
  protein,
  os.name,
  blist = NULL,
  p.adj.method = "BH",
  database_version = "bundled"
)

Arguments

protein

A character vector with protein UniProt accession codes.

os.name

A character vector of length one with the exact taxonomy name of the species. If you do not know the exact taxonomy name of the species you are working with, see getTaxonomyName.

blist

An optional background list. If NULL (the default), the complete set of reviewed UniProt proteins for the selected organism is used as the background. Alternatively, if a vector of UniProt accession codes is provided, it is used as the background list for the enrichment analysis.

p.adj.method

The adjustment method used to correct for multiple testing. The default value is 'BH'. See p.adjust.methods for a list of possible methods.

database_version

Character string specifying which PEIMAN database version to use. The default is 'bundled', which uses the database included with the package. Use 'latest' for the newest cached database, or a specific version such as '2026-05-01'.
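The internal peiman helper documents that its adjustment method is passed to p.adjust, so the accepted values are the entries of p.adjust.methods (a character vector, not a function). The toy p-values below, chosen only for illustration, show how two methods differ and confirm that 'fdr', the default in runPSEA, is an alias for 'BH', the default here.

```r
# Toy p-values, for illustration only
p <- c(0.01, 0.02, 0.03, 0.04, 0.05)

p.adjust.methods                     # all accepted method names
p.adjust(p, method = "BH")           # Benjamini-Hochberg: all values are (numerically) 0.05 here
p.adjust(p, method = "bonferroni")   # 0.05 0.10 0.15 0.20 0.25
identical(p.adjust(p, method = "fdr"), p.adjust(p, method = "BH"))   # TRUE
```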

Value

The result is a data frame with the following columns:

Examples

enrich1 <- runEnrichment(protein = exmplData1$pl1, os.name = 'Homo sapiens (Human)')


Run Protein Set Enrichment Analysis (PSEA)

Description

This is the main function to run protein set enrichment analysis for a list of proteins and their scores.

Usage

runPSEA(
  protein,
  os.name,
  blist = NULL,
  pexponent = 1,
  nperm = 1000,
  p.adj.method = "fdr",
  sig.level = 0.05,
  minSize = 1,
  database_version = "bundled"
)

Arguments

protein

A data frame with two columns. The first column should be the protein UniProt accession code; the second column is the score.

os.name

A character vector of length one with the exact taxonomy name of the species. If you do not know the exact taxonomy name of the species you are working with, see getTaxonomyName.

blist

An optional background list. If NULL (the default), the complete set of reviewed UniProt proteins for the selected organism is used as the background. Alternatively, if a vector of UniProt accession codes is provided, it is used as the background list for the enrichment analysis.

pexponent

Enrichment weighting exponent, p. For values of p < 1, one can detect incoherent patterns in a set of proteins. If one expects a small number of proteins to be coherent in a large set, then p > 1 is a good choice.

nperm

Number of permutations used to estimate the false discovery rate (FDR). The default value is 1000.

p.adj.method

The adjustment method used to correct p-values for multiple testing in the enrichment. The default value is 'fdr'. See p.adjust.methods for a list of possible methods.

sig.level

The significance level used to filter PTMs (applies to adjusted p-values). The default value is 0.05.

minSize

PTMs with the number of proteins below this threshold are excluded.

database_version

Character string specifying which PEIMAN database version to use. The default is 'bundled', which uses the database included with the package. Use 'latest' for the newest cached database, or a specific version such as '2026-05-01'.
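The pexponent argument reads like the weighting exponent of the GSEA running-sum statistic, in which p = 0 gives an unweighted Kolmogorov-Smirnov-like statistic and p = 1 weights each hit by its score. Assuming PEIMAN2 follows that scheme, the sketch below computes the running score for one PTM term over a ranked protein list, takes the enrichment score as its maximum deviation from zero, and estimates a nominal permutation p-value by shuffling PTM membership. The helper names are hypothetical, and PEIMAN2's actual permutation and FDR procedure may differ.

```r
# Hypothetical sketch (not PEIMAN2's code): weighted running enrichment score
# for one PTM term. scores: protein scores; in_set: TRUE if the protein has the PTM.
running_score <- function(scores, in_set, p = 1) {
  ord <- order(scores, decreasing = TRUE)   # rank proteins by score
  w   <- abs(scores[ord])^p                 # hit weights; p plays the role of pexponent
  hit <- in_set[ord]
  n_miss <- sum(!hit)
  if (!any(hit) || n_miss == 0) stop("the PTM set must be a proper, non-empty subset")
  p_hit  <- cumsum(ifelse(hit, w, 0)) / sum(w[hit])   # weighted fraction of hits so far
  p_miss <- cumsum(!hit) / n_miss                     # fraction of misses so far
  p_hit - p_miss
}

# Enrichment score: the running-score value farthest from zero
enrichment_score <- function(rs) rs[which.max(abs(rs))]

# Nominal p-value from nperm shuffles of PTM membership (an assumption)
perm_pvalue <- function(scores, in_set, p = 1, nperm = 1000) {
  obs  <- enrichment_score(running_score(scores, in_set, p))
  null <- replicate(nperm, enrichment_score(running_score(scores, sample(in_set), p)))
  (sum(abs(null) >= abs(obs)) + 1) / (nperm + 1)
}

rs <- running_score(scores = c(5, 4, 3, 2, 1), in_set = c(TRUE, TRUE, FALSE, FALSE, FALSE))
enrichment_score(rs)   # 1: both hits sit at the top of the ranking
```

Plotting rs against rank gives a curve of the kind plotRunningScore draws for each PTM.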

Value

Returns a list of six elements. The first element is a data frame with the protein set enrichment analysis (PSEA) results, in which every row corresponds to a post-translational modification (PTM) keyword.

Examples

# We recommend at least nperm = 1000.
# The number of permutations was reduced to 10
# to accommodate CRAN policy on examples (run time <= 5 seconds).
psea_res <- runPSEA(protein = exmplData2, os.name = 'Rattus norvegicus (Rat)', nperm = 10)

Translate SEA results for mass spectrometry search tools

Description

This function translates singular enrichment analysis results and extracts the information required by mass spectrometry search tools. The subset of protein modifications is from https://raw.githubusercontent.com/HUPO-PSI/psi-mod-CV/master/PSI-MOD.obo.

Usage

sea2mass(x, sig.level = 0.05, number.rep = NULL, ptmlist_version = "bundled")

Arguments

x

A data frame of singular enrichment analysis results generated by the runEnrichment function.

sig.level

The significance level used to filter PTMs (applies to adjusted p-values). The default value is 0.05.

number.rep

Only consider PTM terms that occurred more than a specific number of times in UniProt. This number is set by number.rep parameter. The default value is NULL.

ptmlist_version

Character string specifying which UniProt PTM list version to use. The default is 'bundled', which uses the PTM list included with the package. Use 'latest' for the newest cached PTM list, or a specific version such as '2026-06-15'.

Value

A database containing a subset of protein modifications.

Examples

enrich1 <- runEnrichment(protein = exmplData1$pl1, os.name = 'Homo sapiens (Human)')
MS      <- sea2mass(x = enrich1, sig.level = 0.05)

Download and cache PEIMAN2 external data files

Description

Downloads external PEIMAN2 data files from the online PEIMAN2 database repository and stores them in the user's local PEIMAN2 cache directory. This can include the main PEIMAN database, the UniProt PTM list, or both.

Usage

update_peiman_database(version = "latest", refresh = FALSE, type = "all")

Arguments

version

Character string specifying which version to download. The default is 'latest', which downloads the newest available version for the selected file type listed in the online configuration file. A specific version such as '2026-05-01' can also be supplied.

refresh

Logical. If FALSE, a file is not downloaded again if it already exists in the local cache. If TRUE, the file is downloaded again and replaces the cached file.

type

Character string specifying which file type to download. Use 'database' for the main PEIMAN database, 'ptmlist' for the UniProt PTM list, or 'all' to download both. The default is 'all'.

Details

Cached files are stored in the PEIMAN2 user cache directory:

tools::R_user_dir('PEIMAN2', which = 'cache')

This function requires internet access. It is not called automatically when the package is loaded.

The online configuration file contains at least the following columns: type, version, file, and url.
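Given a configuration table with the type, version, file and url columns, resolving 'latest' amounts to choosing the newest row for the requested file type. The sketch below is hypothetical: the helper name, the example.org URLs and the assumption that versions are ISO dates such as '2026-05-01' are all illustrative, and type = 'all' would simply repeat the lookup for each type. The download step is left commented out because it needs internet access.

```r
# Hypothetical sketch (not PEIMAN2's code): pick the configuration row that a
# version request resolves to. Assumes versions are ISO dates (YYYY-MM-DD).
resolve_config_entry <- function(config, type = c("database", "ptmlist"), version = "latest") {
  type <- match.arg(type)
  rows <- config[config$type == type, , drop = FALSE]
  if (version != "latest") rows <- rows[rows$version == version, , drop = FALSE]
  if (nrow(rows) == 0) stop("no entry for type '", type, "' and version '", version, "'")
  rows[which.max(as.Date(rows$version)), , drop = FALSE]   # newest matching row
}

# Hypothetical configuration table with placeholder URLs
config <- data.frame(
  type    = c("database", "database", "ptmlist"),
  version = c("2026-01-01", "2026-05-01", "2026-06-15"),
  file    = c("peiman_database_2026_01_01.rds", "peiman_database_2026_05_01.rds",
              "uniprot_ptm_list_2026-06-15.rds"),
  url     = c("https://example.org/db1.rds", "https://example.org/db2.rds",
              "https://example.org/ptm.rds")
)
entry <- resolve_config_entry(config, "database")
# dest <- file.path(tools::R_user_dir("PEIMAN2", which = "cache"), entry$file)
# if (!file.exists(dest)) utils::download.file(entry$url, dest, mode = "wb")   # needs internet
```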

Value

Invisibly returns the path or paths to the cached file(s).
