RaMP-DB 3.0 Vignette

Introduction

This vignette will provide basic steps for interacting with RaMP-DB (Relational database of Metabolomic Pathways).

Details on RaMP-DB installation are also available through GitHub (https://github.com/ncats/RaMP-DB/). Questions can be asked through the Issues tab or by sending an email to NCATSRaMP@nih.gov. RaMP-DB supports queries and enrichment analyses for biochemical pathways, reactions, ontologies, and chemical descriptors (chemical classes and chemical properties).

Installation instructions can be found in the GitHub repository (https://github.com/ncats/RaMP-DB/). Once the R package is installed and loaded, the user has access to all versions of RaMP-DB in their local file cache and in the remote repository. To see a list of available versions, users can call the function listAvailableRaMPDbVersions().

The first step in using RaMP-DB is to initialize the RaMP-DB database object. The user can specify which version of the database to use; if no version is specified, the most recent version of the database is used. The object only needs to be initialized once and should then be passed on to RaMP-DB functions. If the object is not passed, each function call first initializes its own RaMP-DB instance, which takes longer. We therefore recommend creating the object once at the beginning and using it throughout.

Here’s how to do it:

# Load the library
library(RaMP)

# Load some other libraries that are useful for this vignette in displaying the data
if(!require("DT")) install.packages("DT")
library(DT) # for prettier tables in vignette
if(!require("dplyr")) install.packages("dplyr")
library(dplyr)
if(!require("magrittr")) install.packages("magrittr")
library(magrittr)

# List available RaMP-DB versions
listAvailableRaMPDbVersions()

## [1] "Locally available versions of RaMP SQLite DB, currently on your computer:"
## [1] "3.0.3" "2.6.5" "2.6.3" "2.5.4" "2.3.1"
## [1] "Available remote RaMP SQLite DB versions for download:"
## [1] "2.5.4" "2.5.0" "2.4.3" "2.4.2" "2.4.0" "2.3.2" "2.3.1"
## [1] "The following RaMP Database versions are available for download:"
## [1] "2.5.0" "2.4.3" "2.4.2" "2.4.0" "2.3.2"
## [1] "Use the command db <- RaMP(<new_version_number>) to download the specified version."

# Load a local RaMP database or download the latest RaMP database version from the repository.
# If the version is not specified, the latest local version will be used.
# If no local database is cached, then the latest remote version will be downloaded.
rampDB <- RaMP(branch="ramp3.0")
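If a specific version is preferred over the default, the version strings printed by listAvailableRaMPDbVersions() can be compared in base R. The following is a minimal sketch that assumes the version strings have been copied by hand into a character vector; the vector below simply repeats the locally available versions from the example output above.

```r
# Version strings copied from the example output above (local versions).
local_versions <- c("3.0.3", "2.6.5", "2.6.3", "2.5.4", "2.3.1")

# numeric_version() compares versions component by component,
# so "2.10.0" would correctly sort after "2.9.0".
latest <- as.character(max(numeric_version(local_versions)))
print(latest)

# With RaMP loaded, the chosen version could then be passed on, for example:
# rampDB <- RaMP(latest)
```

Comparing with numeric_version() rather than as plain strings avoids mis-ordering versions whose components have different numbers of digits.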

Preparing your input for RaMP-DB

Note that it is always preferable to use IDs rather than common names. When entering IDs, prepend each ID with the database of origin followed by a colon, for example kegg:C02712, hmdb:HMDB04824, etc. IDs from multiple sources can be combined in a single input. RaMP-DB currently supports the following ID types (which should be prepended):

  metabprefixes <- getPrefixesFromAnalytes("metabolite", db=rampDB)
  geneprefixes <- getPrefixesFromAnalytes("gene", db=rampDB)

  datatable(rbind(metabprefixes, geneprefixes))
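When a dataset stores bare IDs in separate columns, the prefixes have to be added before querying. The sketch below shows one way to do this in base R. The helper addPrefix() is hypothetical and is not part of the RaMP package; it only illustrates the "source:ID" convention described above.

```r
# Hypothetical helper (not a RaMP function): prepend a source prefix to
# bare IDs, leaving IDs that already carry the prefix untouched.
addPrefix <- function(ids, source) {
  ids <- trimws(ids)
  has_prefix <- grepl(paste0("^", source, ":"), ids, ignore.case = TRUE)
  ifelse(has_prefix, ids, paste0(source, ":", ids))
}

# Combine IDs from two different sources into one input vector.
analytes <- c(
  addPrefix(c("C02712", "kegg:C00186"), "kegg"),
  addPrefix("HMDB04824", "hmdb")
)
print(analytes)
```

The resulting character vector can be passed to the analyte arguments of the query functions shown later in this vignette.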

Input External Data Set

Users can input external data sources of analytes using the function createRaMPInput(). This function takes as input a data.frame, .csv, or .xlsx with metabolite metadata formatted as follows:

- Column names must correspond to an existing ID source (e.g., hmdb, kegg, entrez, etc.). See the function getPrefixesFromAnalytes() for supported ID sources. Unsupported ID sources are ignored.
- IDs of the supported types are then filled in for each row.
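As a rough illustration of this layout, the sketch below builds a small table with one column per ID source and one row per metabolite, then writes it to a temporary .csv file. Storing bare IDs (without the source prefix) in each column is an assumption made here for illustration; check the bundled ExampleRaMPInput.csv for the exact format expected by createRaMPInput().

```r
# One row per metabolite, one column per ID source.
# Assumption: cells hold bare IDs, and the column name supplies the source.
metab_input <- data.frame(
  hmdb = c("HMDB0000064", "HMDB0000148"),  # creatine, glutamate
  kegg = c("C00300", "C00025"),
  stringsAsFactors = FALSE
)

# Write to a temporary .csv and read it back to confirm the layout.
csv_path <- tempfile(fileext = ".csv")
write.csv(metab_input, csv_path, row.names = FALSE)
round_trip <- read.csv(csv_path, stringsAsFactors = FALSE)
print(round_trip)

# With RaMP loaded and a database object available, the file could be used as:
# exData <- createRaMPInput(filePath = csv_path, db = rampDB)
```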

Here’s an example built into the package:

# Retrieve the file path and name of the example data
dir   <- system.file("extdata", package="RaMP", mustWork=TRUE)
exInput <- file.path(dir, "ExampleRaMPInput.csv")

# Load in the data into RaMP-DB
exData <- createRaMPInput(filePath = exInput, db=rampDB)

# Now use this input to query pathways corresponding to those analytes (passing on the rampDB object
# we created previously)
testids <- getPathwayFromAnalyte(analytes = exData, db=rampDB)

datatable(testids)
new.data <- distinct(testids, commonName, inputId)
print(new.data)

Exploring biological pathways and performing multi-omic pathway enrichment

Users can retrieve analytes from input pathways, retrieve pathways from input analytes, and perform multi-omic pathway enrichment. RaMP-DB pulls pathway information from multiple resources, including HMDB, Reactome, WikiPathways, and PFOCR.

Retrieve Analytes From Input Pathway(s)

Analytes (genes/proteins, metabolites) can be retrieved by pathway. RaMP-DB does not explicitly distinguish between genes and proteins when it comes to pathways.

Users can either input the exact pathway name or part of a pathway name for a fuzzy search. By default, the function getAnalyteFromPathway() returns both proteins/genes and metabolites involved. This can be modified with the parameter “analyteType”.

Here is an example:

myanalytes <- getAnalyteFromPathway(pathway="Sphingolipid metabolism", db=rampDB)

## [1] "fired!"
## [1] "Timing .."
##    user  system elapsed 
##   0.634   0.385   1.162

datatable(myanalytes)

myanalytes <- getAnalyteFromPathway(pathway="Sphingolipid", db=rampDB, match = "fuzzy")

## [1] "fired!"
## [1] "Timing .."
##    user  system elapsed 
##   0.760   0.383   1.171

datatable(myanalytes)

To retrieve information from multiple pathways, input a vector of pathway names:

myanalytes <- getAnalyteFromPathway(pathway=c("Wnt Signaling Pathway", 
                                              "sphingolipid metabolism"), db=rampDB)

## [1] "fired!"
## [1] "Timing .."
##    user  system elapsed 
##   0.638   0.362   1.003

Retrieve Pathways From Input Analyte(s)

It is often useful to get a sense of which pathways are represented in a dataset (this is particularly true for metabolomics, where coverage of metabolites varies depending on the platform used). In other cases, one may be interested in exploring one or several metabolites to see which pathways they are represented in. By default, pathways with < 5 or > 150 analytes will not be returned. See the getPathwayFromAnalyte() documentation to change those defaults.

In this example, we will search for pathways that involve the two genes MDM2 and TP53, and the two metabolites glutamate and creatine. Note we are using their IDs for queries which is recommended (rather than using names).

pathwaydfids <- getPathwayFromAnalyte(c("ensembl:ENSG00000135679", "hmdb:HMDB0000064","hmdb:HMDB0000148", "ensembl:ENSG00000141510"), db=rampDB)

## [1] "Starting getPathwayFromAnalyte()"
## [1] "finished getPathwayFromAnalyte()"
## [1] "Found 2542 associated pathways."

datatable(pathwaydfids)

Each row returns a pathway attributed to one of the input analytes. To retrieve the number of unique pathways returned for all analytes or each analyte, try the following:

print(paste("Number of Unique Pathways Returned for All Analytes:", 
            length(unique(pathwaydfids$pathwayId))))

## [1] "Number of Unique Pathways Returned for All Analytes: 2353"

lapply(unique(pathwaydfids$commonName), function(x) {
        (paste("Number of Unique Pathways Returned for",x,":",
                length(unique(pathwaydfids[which(pathwaydfids$commonName==x),]$pathwayId))))})

## [[1]]
## [1] "Number of Unique Pathways Returned for TP53 : 906"
## 
## [[2]]
## [1] "Number of Unique Pathways Returned for L-Glutamate : 232"
## 
## [[3]]
## [1] "Number of Unique Pathways Returned for MDM2 : 1282"
## 
## [[4]]
## [1] "Number of Unique Pathways Returned for Creatine : 122"
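The same per-analyte count can be written more compactly with split() and sapply(). The sketch below uses a small toy data frame in place of pathwaydfids; only the column names commonName and pathwayId are taken from the example above, and the values are invented for illustration.

```r
# Toy stand-in for pathwaydfids (values are illustrative only).
toy <- data.frame(
  commonName = c("TP53", "TP53", "MDM2", "MDM2", "MDM2", "Creatine"),
  pathwayId  = c("P1",   "P2",   "P1",   "P3",   "P3",   "P4"),
  stringsAsFactors = FALSE
)

# Unique pathways across all analytes.
n_total <- length(unique(toy$pathwayId))

# Unique pathways per analyte, returned as a named integer vector.
n_per_analyte <- sapply(
  split(toy$pathwayId, toy$commonName),
  function(ids) length(unique(ids))
)

print(n_total)
print(n_per_analyte)
```

Note that the per-analyte counts do not sum to the overall count, because a pathway shared by several analytes (P1 here) is counted once per analyte but only once overall.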

Pathway Enrichment Analyses

RaMP-DB performs pathway enrichment analysis using Fisher’s tests with the function runEnrichPathways().
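To make the underlying test concrete, the following sketch runs a one-sided Fisher's exact test for a single pathway using base R. All counts are invented for illustration, and the way the 2x2 table and the background are built here is an assumption; runEnrichPathways() may construct its background differently.

```r
# Illustrative counts only (not taken from RaMP-DB).
hits_in_pathway <- 8     # input analytes annotated to the pathway
input_size      <- 20    # all input analytes
pathway_size    <- 40    # all background analytes annotated to the pathway
background_size <- 1000  # all background analytes

# 2x2 contingency table: input membership (rows) by pathway membership (columns).
tab <- matrix(
  c(hits_in_pathway, input_size - hits_in_pathway,
    pathway_size - hits_in_pathway,
    background_size - input_size - (pathway_size - hits_in_pathway)),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("in input", "not in input"),
                  c("in pathway", "not in pathway"))
)

# One-sided test: is the pathway over-represented among the input analytes?
res <- fisher.test(tab, alternative = "greater")
print(tab)
print(res$p.value)
```

In an enrichment analysis, a test of this kind is repeated for every candidate pathway, which is why the resulting p-values need to be adjusted for multiple testing.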

We can now run Fisher’s exact test to identify pathways that are enriched for a set of analytes of interest:

pathways.enriched <- runEnrichPathways(analytes = c("hmdb:HMDB0000033","hmdb:HMDB0000052","hmdb:HMDB0000094",
  "hmdb:HMDB0000161","hmdb:HMDB0000168","hmdb:HMDB0000191","hmdb:HMDB0000201","chemspider:10026",
  "hmdb:HMDB0006059", "Chemspider:6405", "CAS:5657-19-2","hmdb:HMDB0002511", "chemspider:20171375",
  "CAS:133-32-4", "CAS:5746-90-7", "CAS:477251-67-5", "hmdb:HMDB0000695", "chebi:15934", "CAS:838-07-3",
  "hmdb:HMDBP00789", "hmdb:HMDBP00283", "hmdb:HMDBP00284", "hmdb:HMDBP00850"), 
  db=rampDB)

Once we have the Fisher’s test results, we can format them into a new data frame and filter the pathways for significance. For this example we use a Holm-adjusted p-value cutoff of 0.05.

#Returning Fisher Pathways and P-Values
filtered.pathways.enriched <- filterEnrichResults(enrichResults=pathways.enriched, 
    pValType = 'holm', pValCutoff=0.05)

## [1] "Filtering Fisher Results..."
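The choice of adjustment method affects how many pathways pass the cutoff. The sketch below compares Holm and FDR (Benjamini-Hochberg) adjustment on a small vector of invented p-values using base R's p.adjust(); it does not call RaMP and only illustrates the difference between the two adjustment types.

```r
# Invented raw p-values, for illustration only.
pvals <- c(0.001, 0.008, 0.02, 0.045, 0.30)

# Holm controls the family-wise error rate and is the more conservative choice.
holm <- p.adjust(pvals, method = "holm")

# Benjamini-Hochberg controls the false discovery rate.
fdr <- p.adjust(pvals, method = "fdr")

print(data.frame(raw = pvals, holm = holm, fdr = fdr))

# Number of results passing a 0.05 cutoff under each adjustment.
n_holm <- sum(holm < 0.05)
n_fdr  <- sum(fdr < 0.05)
print(c(holm = n_holm, fdr = n_fdr))
```

On these values, fewer results survive the Holm adjustment than the FDR adjustment, which is the typical pattern when many tests are performed.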

Because RaMP-DB combines pathways from multiple sources, pathways may be represented more than once (e.g., the TCA cycle is represented in many databases). Further, due to the hierarchical nature of pathways and because Fisher’s testing assumes pathways are independent, subpathways and their parent pathways may appear in a list.

To help group together pathways that represent similar biological processes, we have implemented a clustering algorithm that groups pathways together if they share analytes in common.

clusters <- findCluster(filtered.pathways.enriched,
  percAnalyteOverlap = 0.2, percPathwayOverlap = 0.2, db=rampDB)

## [1] "Clustering pathways..."
## [1] "Finished clustering pathways..."

datatable(clusters$fishresults %>% mutate_if(is.numeric, ~ round(., 8)),
  rownames = FALSE
)

See the findCluster() function documentation for a description of input parameters. We suggest trying different values of percAnalyteOverlap and percPathwayOverlap to obtain meaningful clusters.
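To build intuition for what an overlap threshold means, the sketch below computes a pairwise overlap matrix between toy pathways from their analyte sets. The overlap coefficient used here (shared analytes divided by the size of the smaller pathway) is an assumption chosen for illustration; it is not confirmed to be the metric findCluster() uses internally.

```r
# Toy pathways and their analytes (illustrative only).
pathways <- list(
  A = c("m1", "m2", "m3", "m4"),
  B = c("m3", "m4", "m5"),
  C = c("m9", "m10")
)

# Overlap coefficient: shared analytes relative to the smaller set.
overlap <- function(x, y) {
  length(intersect(x, y)) / min(length(x), length(y))
}

# Pairwise overlap matrix across all pathways.
idx <- seq_along(pathways)
sim <- outer(idx, idx, Vectorize(function(i, j) {
  overlap(pathways[[i]], pathways[[j]])
}))
dimnames(sim) <- list(names(pathways), names(pathways))
print(round(sim, 2))

# Pairs at or above a threshold of 0.2 would be candidates for grouping.
print(sim >= 0.2)
```

Raising the threshold splits pathways into more, smaller groups, while lowering it merges loosely related pathways into larger ones.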

To plot pathway results with clusters, use the function plotPathwayResults() as follows:

plotPathwayResults(pathwaysSig=clusters,  interactive = TRUE, db=rampDB)

## The input pathway result has already been clustered. Defaulting to existing clustering.

Ontologies

RaMP-DB contains ontology annotations, obtained from HMDB, which include the following categories: Biofluid and excreta, Organ and components, Subcellular location, Industrial applications, Source (e.g. plant/animal/microbial), Health Condition, Tissue and substructures.

Retrieve Metabolites from Ontologies

The function getMetaFromOnto() retrieves metabolites that are associated with user-input ontology(ies).

ontologies.of.interest <- "Urine"
new.metabolites <- getMetaFromOnto(ontology = ontologies.of.interest, db=rampDB)

## [1] "Retreiving Metabolites for input ontology terms."
## [1] "Found 1 ontology term matches."
## [1] "Found 7363 metabolites associated with the input ontology terms."
## [1] "Finished getting metabolies from ontology terms."

# View the first 10 metabolites:
datatable(head(new.metabolites, n=10))

Retrieve Ontologies from Input Metabolites

To retrieve ontologies that are associated with metabolites of interest, we can use getOntoFromMeta(). This function takes in a vector of metabolites as an input and returns associated ontologies.

analytes.of.interest <- c("chebi:15422", "hmdb:HMDB0000064",
        "hmdb:HMDB0000148", "wikidata:Q426660")
new.ontologies <- getOntoFromMeta(mets = analytes.of.interest, db=rampDB)
datatable(new.ontologies)

Ontology Enrichment Analyses

RaMP-DB performs ontology enrichment analysis using Fisher’s tests with the function runEnrichOntologies().

ontologies.enriched <- runEnrichOntologies(mets = c("hmdb:HMDB0000033","hmdb:HMDB0000052","hmdb:HMDB0000094",
  "hmdb:HMDB0000161","hmdb:HMDB0000168","hmdb:HMDB0000191","hmdb:HMDB0000201","chemspider:10026",
  "hmdb:HMDB0006059", "Chemspider:6405", "CAS:5657-19-2","hmdb:HMDB0002511", "chemspider:20171375",
  "CAS:133-32-4", "CAS:5746-90-7", "CAS:477251-67-5", "hmdb:HMDB0000695", "chebi:15934", "CAS:838-07-3",
  "hmdb:HMDBP00789", "hmdb:HMDBP00283", "hmdb:HMDBP00284", "hmdb:HMDBP00850"), 
  db=rampDB)

# Filter results based on p-values:
filtered.ontologies.enriched <- filterEnrichResults(enrichResults=ontologies.enriched, 
    pValType = 'holm', pValCutoff=0.05)

datatable(filtered.ontologies.enriched$fishresults)

Reactions

RaMP-DB pulls reaction information from the curated resource Rhea and from HMDB. Only human reactions are retrieved. The information retrieved includes reaction classes, information on substrates and products, their enzymes (if any), and directionality. Both enzymatic and spontaneous biotransformations are represented.

The following analyses involving reactions can be performed: retrieving analytes involved in the same reaction and visualizing the associated networks, retrieving and interactively exploring reaction classes, and performing reaction class enrichment.

Retrieve Analytes Involved in the Same Reaction

The user may want to know which enzymes can catalyze reactions involving the metabolites in their experiment, and vice versa.

Users can input metabolites to retrieve associated enzymes or can input enzymes to return metabolites involved in the same chemical reactions. Again, using IDs is preferred over use of names. For Rhea, only UniProt (for proteins) and ChEBI (for metabolites) IDs are supported. Other ID types are supported for HMDB.
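Because the supported ID types differ between Rhea and HMDB, it can help to inspect a mixed input list by prefix before querying. The sketch below is a base R illustration that does not call RaMP; the input vector is invented for the example, and the set of Rhea-supported prefixes is taken from the paragraph above.

```r
# Mixed input list (illustrative).
inputs <- c("kegg:C00186", "hmdb:HMDB0000148", "ensembl:ENSG00000115850",
            "uniprot:Q99259", "chebi:15422")

# Extract the source prefix (text before the first colon), lower-cased.
prefix <- tolower(sub(":.*$", "", inputs))

# Group the IDs by source.
by_prefix <- split(inputs, prefix)
print(by_prefix)

# IDs usable for Rhea-based queries: ChEBI (metabolites) and UniProt (proteins).
rhea_ids <- inputs[prefix %in% c("chebi", "uniprot")]
print(rhea_ids)
```

A check like this makes it easier to see in advance which inputs will be covered by Rhea-based queries and which will only be matched through HMDB.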

# Input Metabolites and Proteins
inputs.of.interest <- c("kegg:C00186" , "hmdb:HMDB0000148", "kegg:C00780", "hmdb:HMDB0000064", "ensembl:ENSG00000115850", "uniprot:Q99259")


catalyzedby.output <- rampFastCata(analytes = inputs.of.interest, db=rampDB)

## [1] "Analyte ID-based reaction partner query."
## [1] "Building metabolite to gene relations."
## [1] "Number of met2gene relations: 132"
## [1] "Building gene to metabolite relations."
## [1] "Total Relation Count: 144"
## [1] "There are no ChEBI metabolite IDs in the input. Skipping metabolite to protein query step."

#just show HMDB analyte associations
datatable(catalyzedby.output)

The enzyme-metabolite relationships retrieved with the rampFastCata() function can be visualized and explored as a network.

plotCataNetwork(catalyzedby.output)

Retrieve reactions from input analytes

Curated reactions from Rhea can be returned given a list of input analytes with the function getReactionsForAnalytes(). Users input a vector of metabolite ChEBI IDs and/or gene/protein UniProt IDs. The function returns 3 reaction lists:

reactions involving input metabolites,
reactions involving input proteins, and
reactions that have at least one metabolite and one protein from the input analyte list.

For each list, the substrates, products, reaction direction, and other information are returned.

analytes.of.interest = c('chebi:58115', 'chebi:456215', 'chebi:58245', 'chebi:58450',
             'chebi:17596', 'chebi:16335', 'chebi:16750', 'chebi:172878',
             'chebi:62286', 'chebi:77897', 'uniprot:P30566','uniprot:P30520',
             'uniprot:P00568', 'uniprot:P23109', 'uniprot:P22102', 'uniprot:P15531')
reactionsLists <- getReactionsForAnalytes(analytes = analytes.of.interest, db=rampDB)

## Running getReactionsForAnalytes()

## Reporting Function: getReactionsForAnalytes

## The input list has 16 IDs.

## The input list has 10 chebi IDs.

## The input list has 6 uniprot IDs.

## Finished getReactionsForAnalytes()

# Just show the reactions with at least one metabolite and one protein from the input list belonging 
# to the same reaction.
datatable(subset(reactionsLists$metProteinCommonReactions))

Notice that the output includes the reaction classes for each reaction. The function plotAnalyteOverlapPerRxnLevel() generates an interactive UpSet plot showing the number of overlapping input analytes across reaction classes at level 1.

plotAnalyteOverlapPerRxnLevel(reactionsLists)

Retrieve reaction classes from input analytes

RaMP-DB includes reaction classes and Enzyme Commission numbers (EC numbers) for enzymes from Rhea. These can be retrieved using the function getReactionClassesForAnalytes(), given a vector of metabolite ChEBI IDs and/or gene/protein UniProt IDs.

analytes.of.interest = c('chebi:58115', 'chebi:456215', 'chebi:58245', 'chebi:58450',
             'chebi:17596', 'chebi:16335', 'chebi:16750', 'chebi:172878',
             'chebi:62286', 'chebi:77897', 'uniprot:P30566','uniprot:P30520',
             'uniprot:P00568', 'uniprot:P23109', 'uniprot:P22102', 'uniprot:P15531')
reaction.classes <- getReactionClassesForAnalytes(analytes = analytes.of.interest, db=rampDB)

## [1] "Starting reaction class query..."

## Reporting Function: getReactionClassesForAnalytes

## The input list has 16 IDs.

## The input list has 10 chebi IDs.

## The input list has 6 uniprot IDs.

## [1] "Passed the getReactionClassStats"
## [1] "Completed reaction class query..."

The retrieved reaction classes can be visualized in an interactive sunburst plot, allowing users to explore the reaction classes represented by their data as well as their associated analytes.

plotReactionClasses(reaction.classes)

Perform reaction class enrichment

RaMP-DB performs reaction class enrichment analysis using Fisher’s tests with the function runEnrichReactionClass(). Similar to the other functions related to reaction classes, input IDs supported are ChEBI for metabolites and UniProt for proteins.

rxn.enriched <- runEnrichReactionClass(analytes = analytes.of.interest, db=rampDB)

## Reporting Function: getReactionClassesForAnalytes

## The input list has 16 IDs.

## The input list has 10 chebi IDs.

## The input list has 6 uniprot IDs.

# Filter results based on p-values:
filtered.rxn.enriched <- filterEnrichResults(enrichResults = rxn.enriched, pValType = 'holm', pValCutoff=0.05)

Chemical Descriptors

Users can retrieve chemical classes and chemical property information from input metabolites, as well as perform chemical class enrichment.

Retrieve Chemical Classes from Input Metabolites

RaMP incorporates ClassyFire and LIPID MAPS classes. The getChemClass() function takes as input a vector of metabolites and outputs the classes associated with each input metabolite.

metabolites.of.interest = c("pubchem:64969", "chebi:16958", "chemspider:20549", "kegg:C05598", "chemspider:388809", "pubchem:53861142", "hmdb:HMDB0001138", "hmdb:HMDB0029412")
chemical.classes <- getChemClass(mets = metabolites.of.interest, db=rampDB)

## [1] "Starting Chemical Class Survey"
## [1] "...finished metabolite list query..."
## [1] "...finished DB population query..."
## [1] "...collating data..."
## [1] "...creating query efficiency summary..."
## [1] "Finished Chemical Class Survey"

metabolite.classes <- as.data.frame(chemical.classes$met_classes)
datatable(metabolite.classes)

Retrieve Chemical Property Information from Input Metabolites

Chemical properties captured by RaMP include SMILES, InChI, InChIKeys, monoisotopic masses, molecular formulas, and common names. The getChemicalProperties() function takes as input a vector of metabolites and outputs a list of chemical property information.

chemical.properties <- getChemicalProperties(metabolites.of.interest, db=rampDB)

## Starting Chemical Property Query

## Finished Chemical Property Query

chemical.data <- chemical.properties$chem_props
datatable(chemical.data)

Chemical Class Enrichment Analyses

RaMP-DB performs chemical class enrichment analysis using Fisher’s tests with the function runEnrichChemClass(). The function performs enrichment analysis for ClassyFire classes, sub-classes, and super-classes, and for LIPID MAPS categories, main classes, and sub-classes.

metabolites.of.interest = c("pubchem:64969", "chebi:16958", "chemspider:20549", "kegg:C05598", 
    "chemspider:388809", "pubchem:53861142", "hmdb:HMDB0001138", "hmdb:HMDB0029412")
chemClass.enrichment <- runEnrichChemClass(mets = metabolites.of.interest, db=rampDB)

## [1] "Starting Chemical Class Enrichment"
## [1] "Starting Chemical Class Survey"
## [1] "...finished metabolite list query..."
## [1] "...finished DB population query..."
## [1] "...collating data..."
## [1] "...creating query efficiency summary..."
## [1] "Finished Chemical Class Survey"
## [1] "check total summary"
## [1] "getting population totals"
## [1] "Finished Chemical Class Enrichment"

# Enrichment was performed on the following chemical classes:
names(chemClass.enrichment)

## [1] "ClassyFire_class"       "ClassyFire_sub_class"   "ClassyFire_super_class"
## [4] "result_type"

# To retrieve results for the ClassyFire Class:
classy_fire_classes <- chemClass.enrichment$ClassyFire_class
datatable(classy_fire_classes)

Connect to Different Versions of RaMP

Users can download previous versions of RaMP and run queries against them. Annotations may have been added or changed between versions, so results can differ.

#Example query for earlier version
Alternate.db <- RaMP('2.3.1')
Alternate.Ramp <- getAnalyteFromPathway(db = Alternate.db, pathway = c('Pentose Phosphate Pathway'))
datatable(Alternate.Ramp)

#Example query for a more recent version
Current.db <- RaMP('2.5.4')
path.search <- getAnalyteFromPathway(db = Current.db, pathway = c('Pentose Phosphate Pathway'))
datatable(path.search)
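One simple way to see what changed between two versions is to compare the analyte IDs returned by the same query. The sketch below uses invented placeholder ID vectors in place of real query results, since the actual IDs depend on the database versions installed; in practice the vectors would be taken from the ID column of each result table.

```r
# Placeholder IDs standing in for the analytes returned by each version.
old_version_ids <- c("hmdb:A", "hmdb:B", "kegg:C")
new_version_ids <- c("hmdb:A", "kegg:C", "chebi:D", "chebi:E")

# Analytes gained, lost, and shared between the two versions.
added   <- setdiff(new_version_ids, old_version_ids)
removed <- setdiff(old_version_ids, new_version_ids)
shared  <- intersect(old_version_ids, new_version_ids)

print(list(added = added, removed = removed, shared = shared))
```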

sessionInfo()

## R version 4.4.1 (2024-06-14)
## Platform: aarch64-apple-darwin20
## Running under: macOS Sonoma 14.0
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRblas.0.dylib 
## LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.0
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## time zone: America/New_York
## tzcode source: internal
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] magrittr_2.0.3 dplyr_1.1.4    DT_0.33        RaMP_3.0.2    
## 
## loaded via a namespace (and not attached):
##  [1] gtable_0.3.5         xfun_0.48            bslib_0.8.0         
##  [4] ggplot2_3.5.1        htmlwidgets_1.6.4    visNetwork_2.1.2    
##  [7] lattice_0.22-6       vctrs_0.6.5          tools_4.4.1         
## [10] crosstalk_1.2.1      generics_0.1.3       curl_5.2.3          
## [13] Polychrome_1.5.1     tibble_3.2.1         fansi_1.0.6         
## [16] RSQLite_2.3.7        blob_1.2.4           janeaustenr_1.0.0   
## [19] pkgconfig_2.0.3      tokenizers_0.3.0     Matrix_1.7-0        
## [22] data.table_1.16.2    dbplyr_2.5.0         scatterplot3d_0.3-44
## [25] lifecycle_1.0.4      compiler_4.4.1       farver_2.1.2        
## [28] munsell_0.5.1        htmltools_0.5.8.1    SnowballC_0.7.1     
## [31] sass_0.4.9           yaml_2.3.10          lazyeval_0.2.2      
## [34] tidytext_0.4.2       plotly_4.10.4        pillar_1.9.0        
## [37] jquerylib_0.1.4      tidyr_1.3.1          upsetjs_1.11.1      
## [40] cachem_1.1.0         tidyselect_1.2.1     digest_0.6.37       
## [43] stringi_1.8.4        purrr_1.0.2          labeling_0.4.3      
## [46] fastmap_1.2.0        grid_4.4.1           colorspace_2.1-1    
## [49] cli_3.6.3            utf8_1.2.4           withr_3.0.1         
## [52] filelock_1.0.3       scales_1.3.0         bit64_4.5.2         
## [55] rmarkdown_2.28       httr_1.4.7           bit_4.5.0           
## [58] memoise_2.0.1        evaluate_1.0.1       knitr_1.48          
## [61] BiocFileCache_2.12.0 viridisLite_0.4.2    rlang_1.1.4         
## [64] Rcpp_1.0.13          glue_1.8.0           DBI_1.2.3           
## [67] rstudioapi_0.16.0    jsonlite_1.8.9       R6_2.5.1