Quick Start with RaMP-DB - SQLite Version

Introduction

RaMP-DB (Relational database of Metabolomic Pathways) is a publicly available relational database that integrates multiple sources of biological, structural/chemical, disease, and ontology annotations for analytes (metabolites, genes, proteins) from multiple resources including HMDB, KEGG, Reactome, WikiPathways, LIPIDMAPS, ChEBI and Rhea. The resource was initially created to

enhance the ability to interpret metabolomic and multi-omic data using standardized annotations on analytes;
fill the gap left by the lack of a benchmark annotation database that could be used consistently to compare pathway enrichment methods.

Today, RaMP-DB includes 256,537 chemical structures, of which 43,338 are lipids, 16,443 genes, 53,952 pathways, 63,396 metabolic enzyme/metabolite reactions, and 699 ontologies. Notably, curation of the source data within RaMP-DB is semi-automatic to help ensure sustainability of the resource. The curation process involves a first automatic pass for errors in ID mappings for metabolites across resources (e.g. comparison of MW and chemical properties), followed by a second manual curation.

Expanded content for RaMP-DB 3.0

Source database updates:
Increased focus on lipid annotations
Incorporation of curated reactions from Rhea

Improved semi-automated integration scripts

SQLite version (replacing MySQL), which greatly facilitates R package installation and integration with other tools (COMETS Analytics, MetaboAnalyst, Bioconductor Computational Metabolomics Ecosystem)

New queries of biochemical reactions

Main Functionalities

RaMP-DB Content and Functions

RaMP-DB enables single and batch queries of pathways, common reactions, ontologies, and chemical structures/classes:

Retrieve Analytes From Input Pathway(s) and vice versa
Retrieve Ontologies From Input Metabolites and vice versa
Retrieve Analytes Involved in the Same Reaction as Input Analytes
Retrieve Chemical Classes from Input Metabolites and vice versa
Pathway and Chemical Class enrichment given input analytes

Access to SQLite Branch (to be released soon as v3.0)

R package: the latest package is on the SQLite branch and will be released this month!
User-friendly web interface (https://rampdb.nih.gov/).

Direct any questions through the Issues tab in GitHub or via email at NCATSRaMP@nih.gov.

Installation

For this workshop, use the sqlite branch install of RaMP. Be sure to have the latest R version (4.3.2).

Special Note: There is an incompatibility (reported here: https://stat.ethz.ch/pipermail/bioc-devel/2023-October/020003.html) between the version of BiocFileCache installed using BiocManager (2.8.0) and the actual latest version (2.10.1). The latter is required for compatibility with other RaMP-DB dependencies. To install the latest version, download the source file from Bioconductor (https://bioconductor.org/packages/release/bioc/html/BiocFileCache.html), then install it using the install.packages() function. On a Mac, this looks like:

# install.packages("YourDirectory/BiocFileCache_2.10.1.tgz")

You can install the RaMP-DB package directly from GitHub using the install_github() function available through the devtools package. In the R Console, type the following:

# Locally install RaMP
install.packages("devtools")
library(devtools)

# Check which BiocFileCache version is installed (2.10.1 is needed, see the note above)
packageVersion('BiocFileCache')
install_github("ncats/RAMP-DB", ref="sqlite")

Now you can load your package and list available database versions within your local file cache and in our remote repository.

library(RaMP)
# Some other packages 
library(DT) # for prettier tables in vignette
library(dplyr)
library(magrittr)

RaMP::listAvailableRaMPDbVersions()

## [1] "Locally available versions of RaMP SQLite DB, currently on your computer:"
## [1] "2.4.2" "2.4.1" "2.3.1"
## [1] "Available remote RaMP SQLite DB versions for download:"
## [1] "2.4.2" "2.4.0" "2.3.2" "2.3.1"
## [1] "The following RaMP Database versions are available for download:"
## [1] "2.4.0" "2.3.2"
## [1] "Use the command db <- RaMP(<new_version_number>) to download the specified version."

Next, initialize the RaMP database object. This automatically retrieves the latest version of the database unless you request a specific one (e.g., version = ‘2.3.2’), which can be useful for backward compatibility.

rampDB <- RaMP()

Special Note: When entering IDs, prepend each ID with the database of origin followed by a colon, for example kegg:C02712, hmdb:HMDB04824, etc. IDs from several different sources can be mixed in a single input. RaMP currently supports the following ID types (each of which should be prepended to the ID):

  metabprefixes <- getPrefixesFromAnalytes(db = rampDB, "metabolite")
  geneprefixes <- getPrefixesFromAnalytes(db = rampDB, "gene")

  datatable(rbind(metabprefixes, geneprefixes))

The supported ID types for searching chemical classes are: ‘iso_smiles’, ‘inchi_key’, ‘inchi_key_prefix’, ‘inchi’, ‘mw’, ‘monoisotop_mass’, ‘formula’, ‘common_name’.
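
When IDs arrive without their source prefix, the prefixing rule described above can be applied programmatically. The following is a minimal base R sketch; addPrefix() is a hypothetical helper written for illustration and is not part of the RaMP package.

```r
# Hypothetical helper (not part of RaMP): prepend a source prefix to raw IDs,
# leaving any ID that already starts with "prefix:" untouched.
addPrefix <- function(ids, prefix) {
  has_prefix <- grepl("^[A-Za-z_-]+:", ids)
  ifelse(has_prefix, ids, paste0(prefix, ":", ids))
}

raw_ids <- c("HMDB0000064", "hmdb:HMDB0000148", "C02712")

# IDs from different sources can be combined into one input vector
prefixed_hmdb <- addPrefix(raw_ids[1:2], "hmdb")
prefixed_kegg <- addPrefix(raw_ids[3], "kegg")
analytes <- c(prefixed_hmdb, prefixed_kegg)
print(analytes)
```

The resulting character vector has the same shape as the analyte vectors passed to the query functions in the rest of this vignette.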

Supported RaMP Queries

Pathway Queries

Retrieve analytes from pathway(s).

Retrieve analytes using one or more pathway names:

myanalytes <- getAnalyteFromPathway(db = rampDB, pathway="Sphingolipid metabolism")

## [1] "fired!"
## [1] "Timing .."
##    user  system elapsed 
##   0.721   0.067   0.790

# See results in a nice table
datatable(myanalytes)

# Can also input multiple pathways:
myanalytes <- getAnalyteFromPathway(db = rampDB, pathway=c("De Novo Triacylglycerol Biosynthesis", 
                                              "sphingolipid metabolism"))

## [1] "fired!"
## [1] "Timing .."
##    user  system elapsed 
##   0.833   0.072   0.907

Retrieve Pathways From Input Analyte(s)

Now let’s search for pathways that involve the two genes MDM2 and TP53, and the two metabolites glutamate and creatine.

pathwaydfids <- getPathwayFromAnalyte(db = rampDB, c("ensembl:ENSG00000135679", "hmdb:HMDB0000064",
        "hmdb:HMDB0000148", "ensembl:ENSG00000141510"))

## [1] "Starting getPathwayFromAnalyte()"
## [1] "Working on ID List..."
## [1] "finished getPathwayFromAnalyte()"
## [1] "Found 542 associated pathways."

datatable(pathwaydfids)

Note that each row returns a pathway attributed to one of the input analytes. To retrieve the number of unique pathways returned for all analytes or each analyte, try the following:

print(paste("Number of Unique Pathways Returned for All Analytes:", 
            length(unique(pathwaydfids$pathwayId))))

## [1] "Number of Unique Pathways Returned for All Analytes: 450"

lapply(unique(pathwaydfids$commonName), function(x) {
        (paste("Number of Unique Pathways Returned for",x,":",
                length(unique(pathwaydfids[which(pathwaydfids$commonName==x),]$pathwayId))))})

## [[1]]
## [1] "Number of Unique Pathways Returned for L-Glutamic acid,Glutamate : 217"
## 
## [[2]]
## [1] "Number of Unique Pathways Returned for TP53 : 223"
## 
## [[3]]
## [1] "Number of Unique Pathways Returned for MDM2 : 89"
## 
## [[4]]
## [1] "Number of Unique Pathways Returned for Creatine : 13"
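
The per-analyte counts above sum to 542 (217 + 223 + 89 + 13), the number of rows returned, while only 450 pathways are unique, because some pathways are shared between analytes. The same bookkeeping can be sketched on a toy table with base R. The rows below are made up; only the column names pathwayId and commonName are taken from the result used above.

```r
# Toy stand-in for a getPathwayFromAnalyte() result (made-up rows)
toy <- data.frame(
  commonName = c("A", "A", "A", "B", "B"),
  pathwayId  = c("p1", "p2", "p2", "p2", "p3"),
  stringsAsFactors = FALSE
)

# Number of distinct pathways per analyte
per_analyte <- tapply(toy$pathwayId, toy$commonName,
                      function(x) length(unique(x)))
print(per_analyte)

# Number of distinct pathways across all analytes; this is smaller than
# sum(per_analyte) whenever analytes share a pathway (here "p2")
total_unique <- length(unique(toy$pathwayId))
print(total_unique)
```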

Ontology Queries

The user can retrieve the metabolites that are associated with a specific ontology or vector of ontologies using the function getMetaFromOnto(). It does not matter which ontology the metabolites come from: the function returns all metabolites associated with the ontologies specified by the user.

ontologies.of.interest <- c("Colon", "Liver", "Lung")

new.metabolites <- RaMP::getMetaFromOnto(db = rampDB, ontology = ontologies.of.interest)

## [1] "Retreiving Metabolites for input ontology terms."
## [1] "Found 3 ontology term matches."
## [1] "Found 1544 metabolites associated with the input ontology terms."
## [1] "Finished getting metabolies from ontology terms."

datatable(head(new.metabolites, n=10))

Retrieve Ontologies from Input Metabolites

Ontologies are retrieved from HMDB and include annotations such as health condition, organ/components, tissue, biofluid, industrial applications, etc.

To retrieve ontologies from a user-input list of analytes, type the following:

analytes.of.interest <- c("ensembl:ENSG00000135679", "hmdb:HMDB0000064",
        "hmdb:HMDB0000148", "ensembl:ENSG00000141510")
new.ontologies <- RaMP::getOntoFromMeta(db = rampDB, analytes = analytes.of.interest)
datatable(new.ontologies)

Reaction Queries

Users can retrieve analytes belonging to the same reaction, which is useful for multi-omic applications (e.g., what genes catalyze reactions involving a list of metabolites?).

Here are examples where a metabolite or a protein is input, and the associated proteins or metabolites, respectively, are returned. A vector of several analytes can be supplied in the same way.

# Let's search for enzymes catalyzing reactions involving creatine:
mytranscripts <- rampFastCata(db = rampDB, analytes = "hmdb:HMDB0000064")

## [1] "Analyte ID-based reaction partner query."
## [1] "Building metabolite to gene relations."
## [1] "Number of met2gene relations: 10"
## [1] "Building gene to metabolite relations."
## [1] "Total Relation Count: 10"

datatable(mytranscripts)

# Let's search for metabolites involved in reactions catalyzed by GAD1
mymetabolites <- rampFastCata(db = rampDB, analytes = "uniprot:Q99259")

## [1] "Analyte ID-based reaction partner query."
## [1] "Building metabolite to gene relations."
## [1] "Number of met2gene relations: 0"
## [1] "Building gene to metabolite relations."
## [1] "Total Relation Count: 12"

datatable(mymetabolites)

Visualize your results:

plotCataNetwork(mytranscripts)

New ways to interact with Rhea reactions

Users can query Rhea reactions from a list of analytes and retrieve detailed information on the reactions. The function returns three data frames: 1) met2rxn, which maps metabolites to reactions; 2) prot2rxn, which maps proteins to reactions; and 3) metProteinCommonReactions, which lists reactions that involve at least one user-input metabolite and one user-input protein.

analytes.of.interest <- c('chebi:57368', 'uniprot:Q96N66', 'CHEBI:73003')
reactionsLists <- RaMP::getReactionsForAnalytes(db = rampDB, analytes = analytes.of.interest, includeTransportRxns = FALSE, humanProtein = TRUE)

## [1] "Retrieving reactions for compounds"
## [1] "Retrieving reactions for genes/proteins"

names(reactionsLists)

## [1] "met2rxn"                   "prot2rxn"                 
## [3] "metProteinCommonReactions"

# Just show reactions with at least one metabolite and one protein from the user list.
datatable(subset(reactionsLists$metProteinCommonReactions, select = -c(rxn_html_label)))

Chemical Classes and Properties Queries

RaMP incorporates ClassyFire and LIPIDMAPS classes, which can readily be retrieved by inputting metabolite IDs. Remember that each metabolite ID must be prepended by one of the following ID types:

getPrefixesFromAnalytes(db = rampDB, "metabolite")

##   analyteType
## 1 Metabolites
##                                                                                                                              idTypes
## 1 hmdb, chebi, chemspider, kegg, pubchem, CAS, wikidata, LIPIDMAPS, lipidbank, swisslipids, plantfa, kegg_glycan, rhea-comp, polymer

Query Chemical Classes

Here’s how to query chemical classes:

metabolites.of.interest = c('hmdb:HMDB0000056','hmdb:HMDB0000439','hmdb:HMDB0000479','hmdb:HMDB0000532',
                            'hmdb:HMDB0001015','hmdb:HMDB0001138','hmdb:HMDB0029159','hmdb:HMDB0029412',
                            'hmdb:HMDB0034365','hmdb:HMDB0035227','hmdb:HMDB0007973','hmdb:HMDB0008057',
                            'hmdb:HMDB0011211')
chemical.classes <- chemicalClassSurvey(db = rampDB, mets = metabolites.of.interest)

## [1] "Starting Chemical Class Survey"
## [1] "...finished metabolite list query..."
## [1] "...finished DB population query..."
## [1] "...collating data..."
## [1] "...creating query efficiency summary..."
## [1] "Finished Chemical Class Survey"

metabolite.classes <- as.data.frame(chemical.classes$met_classes)
datatable(metabolite.classes)

# Print unique class names:
print(unique(metabolite.classes$class_name))

##  [1] "Carboxylic acids and derivatives"                    
##  [2] "Amino acids, peptides, and analogues"                
##  [3] "Organic acids and derivatives"                       
##  [4] "Glycerophospholipids"                                
##  [5] "Glycerophosphocholines"                              
##  [6] "Lipids and lipid-like molecules"                     
##  [7] "Glycerophospholipids [GP]"                           
##  [8] "Glycerophosphocholines [GP01]"                       
##  [9] "Diacylglycerophosphocholines [GP0101]"               
## [10] "1-(1Z-alkenyl),2-acylglycerophosphocholines [GP0103]"
## [11] "Dicarboxylic acids and derivatives"

Query Chemical Properties

Here’s how to query chemical properties:

chemical.properties <- getChemicalProperties(db = rampDB, metabolites.of.interest)

## Starting Chemical Property Query

## Finished Chemical Property Query

chemical.data <- chemical.properties$chem_props
datatable(chemical.data)

Enrichment Analyses

RaMP performs pathway and chemical class overrepresentation analysis using Fisher’s exact tests.
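
To make the idea of an overrepresentation test concrete, here is a minimal base R sketch of a Fisher's exact test for a single pathway. All counts are made up, and the choice of background and of a one-sided alternative are assumptions for illustration; they are not taken from RaMP's implementation.

```r
# Hypothetical counts for one pathway (made up for illustration)
in_pathway_hits  <- 8     # input analytes that fall in the pathway
in_pathway_total <- 40    # all background analytes in the pathway
input_total      <- 50    # all input analytes (assumed to be part of the background)
background_total <- 2000  # all background analytes

# 2 x 2 contingency table, filled column by column
contingency <- matrix(
  c(in_pathway_hits,
    input_total - in_pathway_hits,
    in_pathway_total - in_pathway_hits,
    background_total - in_pathway_total - (input_total - in_pathway_hits)),
  nrow = 2,
  dimnames = list(c("in pathway", "not in pathway"),
                  c("in input", "not in input"))
)
print(contingency)

# One-sided test: is the pathway hit more often than expected by chance?
res <- fisher.test(contingency, alternative = "greater")
print(res$p.value)
```

With these counts, roughly one hit would be expected by chance (50 x 40 / 2000), so observing eight hits yields a small p-value.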

Pathway Enrichment (multi-omic)

Pathway enrichment can be run on genes/proteins and metabolites. Note that RaMP-DB focuses on human metabolic pathways.

fisher_results <- runCombinedFisherTest(db = rampDB, analytes = c(
                                                  "hmdb:HMDB0000033",
                                                  "hmdb:HMDB0000052",
                                                  "hmdb:HMDB0000094",
                                                  "hmdb:HMDB0000161",
                                                  "hmdb:HMDB0000168",
                                                  "hmdb:HMDB0000191",
                                                  "hmdb:HMDB0000201",
                                                  "chemspider:10026",
                                                  "hmdb:HMDB0006059",
                                                  "Chemspider:6405",
                                                  "CAS:5657-19-2",
                                                  "hmdb:HMDB0002511",
                                                  "chemspider:20171375",
                                                  "CAS:133-32-4",
                                                  "CAS:5746-90-7",
                                                  "CAS:477251-67-5",
                                                  "hmdb:HMDB0000695",
                                                  "chebi:15934",
                                                  "CAS:838-07-3",
                                                  "hmdb:HMDBP00789",
                                                  "hmdb:HMDBP00283",
                                                  "hmdb:HMDBP00284",
                                                  "hmdb:HMDBP00850"))
dim(fisher_results)

To filter the pathways for significance (e.g., Holm-adjusted p-value < 0.05):

filtered.fisher.results <- FilterFishersResults(fisher_results, pval_type = 'holm', pval_cutoff=0.05)

## [1] "Filtering Fisher Results..."
## [1] "Fisher Result Type: Pathway Enrichment"

# Quick view results:
datatable(filtered.fisher.results$fishresults)
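
The call above filters on Holm-adjusted p-values (pval_type = 'holm'). As a rough illustration of how this correction compares with a false discovery rate (Benjamini-Hochberg) correction, the sketch below applies base R's p.adjust() to a made-up vector of raw p-values. Whether RaMP computes its adjusted values with p.adjust() is an assumption here, not something stated in this vignette.

```r
# Made-up raw p-values, one per pathway
raw_p <- c(0.001, 0.008, 0.02, 0.045, 0.30)

# Holm controls the family-wise error rate and is the more conservative choice
holm_p <- p.adjust(raw_p, method = "holm")

# Benjamini-Hochberg controls the false discovery rate
fdr_p <- p.adjust(raw_p, method = "fdr")

print(data.frame(raw_p, holm_p, fdr_p))

# Number of pathways passing a 0.05 cutoff under each correction
n_holm <- sum(holm_p < 0.05)
n_fdr  <- sum(fdr_p < 0.05)
print(c(holm = n_holm, fdr = n_fdr))
```

Because Holm-adjusted values are never smaller than the corresponding Benjamini-Hochberg values, a Holm cutoff retains the same or fewer pathways.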

Note: RaMP-DB combines pathways from multiple sources, so similar or overlapping pathways may be represented more than once. Further, because pathways are hierarchical and Fisher’s testing assumes that pathways are independent, subpathways and their parent pathways may both appear in the results list.

The following clustering function can be run to help group together similar/overlapping pathways.

clusters <- RaMP::findCluster(db = rampDB, filtered.fisher.results,
  perc_analyte_overlap = 0.2,
  min_pathway_tocluster = 2, perc_pathway_overlap = 0.2
)

## [1] "Clustering pathways..."
## [1] "Finished clustering pathways..."

## print("Pathways with Holm-adjusted Pval < 0.05")

datatable(clusters$fishresults %>% mutate_if(is.numeric, ~ round(., 8)),
  rownames = FALSE)
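
The intuition behind grouping pathways by shared analytes can be sketched in base R. This is not RaMP's clustering algorithm: the pathway contents are made up, and the overlap definition used here (shared analytes divided by the size of the smaller pathway) is an assumption chosen for illustration. The 0.2 threshold mirrors the perc_analyte_overlap value used above.

```r
# Made-up analyte membership for three pathways
pathway_analytes <- list(
  pathwayA = c("m1", "m2", "m3", "m4"),
  pathwayB = c("m3", "m4", "m5"),
  pathwayC = c("m8", "m9")
)

# Assumed overlap measure: shared analytes relative to the smaller pathway
overlap_fraction <- function(a, b) {
  length(intersect(a, b)) / min(length(a), length(b))
}

# Pairwise overlap matrix (symmetric, with 1 on the diagonal)
ids <- names(pathway_analytes)
overlap <- sapply(ids, function(i) {
  sapply(ids, function(j) {
    overlap_fraction(pathway_analytes[[i]], pathway_analytes[[j]])
  })
})
print(round(overlap, 2))

# Pairs at or above the threshold are candidates for the same cluster
similar <- overlap >= 0.2
print(similar)
```

In this toy example pathwayA and pathwayB share two analytes and would be grouped, while pathwayC shares none and stays on its own.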

To visualize:

pathwayResultsPlot(db = rampDB, filtered.fisher.results, text_size = 8, perc_analyte_overlap = 0.2, 
    min_pathway_tocluster = 2, perc_pathway_overlap = 0.2, interactive = FALSE)

## [1] "Clustering pathways..."
## [1] "Finished clustering pathways..."

Note: To explicitly view the results of mapping input IDs to RaMP, users can run the getPathwayFromAnalyte() function as noted above in the section “Retrieve Pathways From Input Analyte(s)”.

Chemical Enrichment

Users can perform chemical enrichment analysis for ClassyFire classes, sub-classes, and super-classes, and for LIPIDMAPS categories, main classes, and sub-classes. Here’s how:

metabolites.of.interest = c('hmdb:HMDB0000056','hmdb:HMDB0000439','hmdb:HMDB0000479','hmdb:HMDB0000532',
                            'hmdb:HMDB0001015','hmdb:HMDB0001138','hmdb:HMDB0029159','hmdb:HMDB0029412',
                            'hmdb:HMDB0034365','hmdb:HMDB0035227','hmdb:HMDB0007973','hmdb:HMDB0008057',
                            'hmdb:HMDB0011211')
chemical.enrichment <- chemicalClassEnrichment(db = rampDB, mets = metabolites.of.interest)

## [1] "Starting Chemical Class Enrichment"
## [1] "Starting Chemical Class Survey"
## [1] "...finished metabolite list query..."
## [1] "...finished DB population query..."
## [1] "...collating data..."
## [1] "...creating query efficiency summary..."
## [1] "Finished Chemical Class Survey"
## [1] "check total summary"
## [1] "getting population totals"
## [1] "Finished Chemical Class Enrichment"

# Enrichment was performed on the following chemical classes:
names(chemical.enrichment)

## [1] "ClassyFire_class"       "ClassyFire_sub_class"   "ClassyFire_super_class"
## [4] "result_type"

# To retrieve results for the ClassyFire Class:
classy_fire_classes <- chemical.enrichment$ClassyFire_class
datatable(classy_fire_classes)

Note: To explicitly view the results of mapping input IDs to RaMP, users can run the chemicalClassSurvey() function as noted above in the section “Query Chemical Classes”.

And for good measure:

sessionInfo()

## R version 4.3.2 (2023-10-31)
## Platform: x86_64-apple-darwin20 (64-bit)
## Running under: macOS Sonoma 14.0
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/lib/libRblas.0.dylib 
## LAPACK: /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## time zone: America/Chicago
## tzcode source: internal
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] magrittr_2.0.3 dplyr_1.1.3    DT_0.30        RaMP_3.0.0    
## 
## loaded via a namespace (and not attached):
##  [1] gtable_0.3.4         xfun_0.41            bslib_0.5.1         
##  [4] ggplot2_3.4.4        htmlwidgets_1.6.2    visNetwork_2.1.2    
##  [7] lattice_0.21-9       vctrs_0.6.4          tools_4.3.2         
## [10] crosstalk_1.2.0      generics_0.1.3       curl_5.1.0          
## [13] tibble_3.2.1         fansi_1.0.5          RSQLite_2.3.3       
## [16] highr_0.10           blob_1.2.4           janeaustenr_1.0.0   
## [19] pkgconfig_2.0.3      tokenizers_0.3.0     Matrix_1.6-1.1      
## [22] data.table_1.14.8    dbplyr_2.4.0         lifecycle_1.0.4     
## [25] compiler_4.3.2       farver_2.1.1         munsell_0.5.0       
## [28] htmltools_0.5.7      SnowballC_0.7.1      sass_0.4.7          
## [31] yaml_2.3.7           tidytext_0.4.1       pillar_1.9.0        
## [34] jquerylib_0.1.4      tidyr_1.3.0          ellipsis_0.3.2      
## [37] cachem_1.0.8         tidyselect_1.2.0     digest_0.6.33       
## [40] stringi_1.7.12       purrr_1.0.2          labeling_0.4.3      
## [43] fastmap_1.1.1        grid_4.3.2           colorspace_2.1-0    
## [46] cli_3.6.1            utf8_1.2.4           withr_2.5.2         
## [49] filelock_1.0.2       scales_1.2.1         bit64_4.0.5         
## [52] rmarkdown_2.25       httr_1.4.7           bit_4.0.5           
## [55] memoise_2.0.1        evaluate_0.23        knitr_1.45          
## [58] BiocFileCache_2.10.1 rlang_1.1.2          Rcpp_1.0.11         
## [61] glue_1.6.2           DBI_1.1.3            rstudioapi_0.15.0   
## [64] jsonlite_1.8.7       R6_2.5.1