Introduction

This vignette walks through the basic steps for interacting with RaMP-DB (Relational database of Metabolomic Pathways).
The codebase for RaMP-DB is available on our GitHub site. Details on RaMP-DB installation are also available through GitHub, and questions can be asked through the Issues tab or by sending an email to NCATSRaMP@nih.gov.

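RaMP-DB supports queries and enrichment analyses. Supported queries are: analytes from input pathway(s), pathways from input analyte(s), metabolites from metabolite ontologies, ontologies from input metabolites, analytes involved in the same reaction, chemical classes from input metabolites, and chemical property information from input metabolites.

Supported enrichment analyses are: pathway enrichment and chemical enrichment.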

Once RaMP is installed, load the package and set up the database connection:

library(RaMP)
library(DT) # for prettier tables in vignette
library(dplyr)
library(magrittr)
pkg.globals <- setConnectionToRaMP(
  dbname = "ramp", username = "root", conpass = "",
  host = "localhost")

The last line, the call to setConnectionToRaMP(), must be kept as is, apart from the parameter values passed to the function (dbname, username, conpass, and host). For example, if your password is not the empty string "", set conpass = "mypassword". This line needs to be run once each time the package is loaded.
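Rather than typing a password into a script, the connection parameters can be collected first and then passed to setConnectionToRaMP(). The following is a minimal sketch, assuming hypothetical environment variable names (RAMP_DB_NAME, RAMP_DB_USER, RAMP_DB_PASSWORD, RAMP_DB_HOST) that are not defined by RaMP; the fallback values mirror the ones shown above.

```r
# Hypothetical environment variable names; they are NOT defined by RaMP.
# Each falls back to the value used in the example above when unset.
conn.args <- list(
  dbname   = Sys.getenv("RAMP_DB_NAME", unset = "ramp"),
  username = Sys.getenv("RAMP_DB_USER", unset = "root"),
  conpass  = Sys.getenv("RAMP_DB_PASSWORD", unset = ""),
  host     = Sys.getenv("RAMP_DB_HOST", unset = "localhost")
)

# With RaMP loaded and a database available, the call would then be:
# pkg.globals <- do.call(setConnectionToRaMP, conn.args)
```

The argument names in the list match the parameters of the call shown above, so do.call() passes them through unchanged.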

Supported RaMP Queries

Retrieve Analytes From Input Pathway(s)

Analytes (genes, proteins, metabolites) can be retrieved by pathway. Users must input the exact pathway name. Here is an example:

myanalytes <- getAnalyteFromPathway(pathway="sphingolipid metabolism")
## [1] "fired"
## [1] "Timing .."
##    user  system elapsed 
##   0.617   0.033   0.739
datatable(myanalytes)

To retrieve information from multiple pathways, input a vector of pathway names:

myanalytes <- getAnalyteFromPathway(pathway=c("De Novo Triacylglycerol Biosynthesis", 
                                              "sphingolipid metabolism"))
## [1] "fired"
## [1] "Timing .."
##    user  system elapsed 
##   0.193   0.005   0.314

Retrieve Pathways From Input Analyte(s)

It is often useful to get a sense of which pathways are represented in a dataset (this is particularly true for metabolomics, where coverage of metabolites varies depending on the platform used). In other cases, one may be interested in exploring one or several metabolites to see which pathways they are represented in.

Note that it is always preferable to use IDs rather than common names. When entering IDs, prepend each ID with the database of origin followed by a colon, for example kegg:C02712, hmdb:HMDB04824, etc. IDs from multiple sources can be combined in a single input. RaMP currently supports the following ID types (these are the prefixes to prepend):

  metabprefixes <- getPrefixesFromAnalytes("metabolite")
  geneprefixes <- getPrefixesFromAnalytes("gene")

  datatable(rbind(metabprefixes, geneprefixes))
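The prefix convention described above can be sketched in base R. This example, which does not call RaMP, builds prefixed IDs from raw identifiers used elsewhere in this vignette and checks only that each string has the shape prefix:id; the variable names are illustrative.

```r
# Raw identifiers grouped by their database of origin (IDs taken from this vignette).
raw.ids <- list(
  hmdb    = c("HMDB0000064", "HMDB0000148"),
  ensembl = c("ENSG00000135679", "ENSG00000141510")
)

# Prepend each ID with its source database and a colon.
prefixed.ids <- unlist(
  lapply(names(raw.ids), function(src) paste0(src, ":", raw.ids[[src]]))
)

# Shape check only: a non-empty prefix, one colon, then a non-empty ID.
# This does not verify that RaMP recognizes the prefix or the ID.
well.formed <- grepl("^[^:]+:[^:]+$", prefixed.ids)

print(prefixed.ids)
print(all(well.formed))
```

The prefixes returned by getPrefixesFromAnalytes() remain the authoritative list of what RaMP accepts.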

In this example, we will search for pathways that involve the two genes MDM2 and TP53, and the two metabolites glutamate and creatine.

pathwaydfids <- getPathwayFromAnalyte(c("ensembl:ENSG00000135679", "hmdb:HMDB0000064",
        "hmdb:HMDB0000148", "ensembl:ENSG00000141510"))
## [1] "Starting getPathwayFromAnalyte()"
## [1] "Working on ID List..."
## [1] "finished getPathwaytFromAnalyte()"
## [1] "Found 316 associated pathways."
datatable(pathwaydfids)

Note that each row holds a pathway attributed to one of the input analytes. To count the unique pathways returned across all analytes, or for each analyte, try the following:

print(paste("Number of Unique Pathways Returned for All Analytes:", 
            length(unique(pathwaydfids$pathwayId))))
## [1] "Number of Unique Pathways Returned for All Analytes: 247"
lapply(unique(pathwaydfids$commonName), function(x) {
        (paste("Number of Unique Pathways Returned for",x,":",
                length(unique(pathwaydfids[which(pathwaydfids$commonName==x),]$pathwayId))))})
## [[1]]
## [1] "Number of Unique Pathways Returned for TP53 : 185"
## 
## [[2]]
## [1] "Number of Unique Pathways Returned for MDM2 : 83"
## 
## [[3]]
## [1] "Number of Unique Pathways Returned for Glutamate; L-Glutamic acid : 39"
## 
## [[4]]
## [1] "Number of Unique Pathways Returned for Creatine : 9"
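The same counting logic can be tried without a database connection. The sketch below uses a made-up data frame that borrows only the two column names used above (pathwayId and commonName); the values are invented for illustration.

```r
# Toy stand-in for a getPathwayFromAnalyte() result; the values are made up,
# and only the column names pathwayId and commonName are taken from above.
toy.pathways <- data.frame(
  pathwayId  = c("P1", "P2", "P2", "P3", "P1", "P4"),
  commonName = c("TP53", "TP53", "TP53", "MDM2", "MDM2", "Creatine"),
  stringsAsFactors = FALSE
)

# Unique pathways across all analytes
n.total <- length(unique(toy.pathways$pathwayId))

# Unique pathways per analyte, returned as a named integer vector
n.per.analyte <- vapply(
  split(toy.pathways$pathwayId, toy.pathways$commonName),
  function(ids) length(unique(ids)),
  integer(1)
)

print(n.total)
print(n.per.analyte)
```

The per-analyte counts add up to more than the overall count whenever analytes share a pathway. This is also visible in the output above, where the per-analyte counts (185 + 83 + 39 + 9 = 316) exceed the 247 unique pathways found overall.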

Retrieve Metabolites from Metabolite Ontologies

The user can also retrieve the metabolites that are associated with a specific ontology or vector of ontologies, using the function getMetaFromOnto(). The function returns the metabolites associated with the specified ontologies, regardless of which of those ontologies each metabolite comes from.

ontologies.of.interest <- c("Colon", "Liver", "Lung")

new.metabolites <- RaMP::getMetaFromOnto(ontology = ontologies.of.interest)
## [1] "Retreiving Metabolites for input ontology terms."
## [1] "Found 3 ontology term matches."
## [1] "Found 1529 metabolites associated with the input ontology terms."
## [1] "Finished getting metabolies from ontology terms."
datatable(new.metabolites)

Retrieve Ontologies from Input Metabolites

RaMP contains information on the origin of metabolites, for example the biospecimen they are found in. This information is called ontology. Here are all the ontologies found in RaMP.

To retrieve the ontologies associated with our metabolites, we can use getOntoFromMeta(). This function takes a vector of metabolites as input and returns the ontologies associated with them.

analytes.of.interest <- c("ensembl:ENSG00000135679", "hmdb:HMDB0000064",
        "hmdb:HMDB0000148", "ensembl:ENSG00000141510")
new.ontologies <- RaMP::getOntoFromMeta(analytes = analytes.of.interest)
datatable(new.ontologies)

Retrieve Analytes Involved in the Same Reaction

The user may want to know which gene transcripts encode enzymes that can catalyze reactions involving metabolites in their experiment. RaMP can return this information.

We can return the gene transcripts using the rampFastCata() function. To use it, the user provides a vector of the metabolites they are interested in, along with the MySQL connection information (in this vignette, the connection set up earlier with setConnectionToRaMP() is used). The user can also input a vector of protein IDs or gene transcripts to return the metabolites involved in chemical reactions with the input proteins or with the proteins encoded by the input gene transcripts.

# Input analytes (metabolites and genes)
analytes.of.interest <- c("ensembl:ENSG00000135679", "hmdb:HMDB0000064",
        "hmdb:HMDB0000148", "ensembl:ENSG00000141510")

new.transcripts <- rampFastCata(analytes = analytes.of.interest)
## [1] "Analyte ID-based reaction partner query."
## [1] "Building metabolite to gene relations."
## [1] "Number of met2gene relations: 100"
## [1] "Building gene to metabolite relations."
## [1] "Total Relation Count: 104"
datatable(new.transcripts)
#Input Proteins
proteins.of.interest <- c("uniprot:094808", "uniprot:Q99259")

new.metabolites <- rampFastCata(analytes = proteins.of.interest)
## [1] "Analyte ID-based reaction partner query."
## [1] "Building metabolite to gene relations."
## [1] "Number of met2gene relations: 0"
## [1] "Building gene to metabolite relations."
## [1] "Total Relation Count: 12"
datatable(new.metabolites)

RaMP has a built-in function, plotCataNetwork(), that generates networks from the transcript data. It takes the data frame created by rampFastCata() as input, and the resulting plots are fully interactive.

plotCataNetwork(new.transcripts)

Retrieve Chemical Classes from Input Metabolites

RaMP incorporates ClassyFire and LIPID MAPS classes. The chemicalClassSurvey() function takes as input a vector of metabolites and outputs the classes associated with each input metabolite.

metabolites.of.interest = c('hmdb:HMDB0000056','hmdb:HMDB0000439','hmdb:HMDB0000479','hmdb:HMDB0000532',
                            'hmdb:HMDB0001015','hmdb:HMDB0001138','hmdb:HMDB0029159','hmdb:HMDB0029412',
                            'hmdb:HMDB0034365','hmdb:HMDB0035227','hmdb:HMDB0007973','hmdb:HMDB0008057',
                            'hmdb:HMDB0011211')
chemical.classes <- chemicalClassSurvey(mets = metabolites.of.interest)
## [1] "Starting Chemical Class Survey"
## [1] "...finished metabolite list query..."
## [1] "...finished DB population query..."
## [1] "...colating data..."
## [1] "...creating query efficiency summary..."
## [1] "Finished Chemical Class Survey"
metabolite.classes <- as.data.frame(chemical.classes$met_classes)
datatable(metabolite.classes)

Retrieve Chemical Property Information from Input Metabolites

Chemical properties captured by RaMP include SMILES, InChI, InChIKeys, monoisotopic masses, molecular formulas, and common names. The getChemicalProperties() function takes as input a vector of metabolites and outputs a list of chemical property information that can easily be converted into a data frame.

chemical.properties <- getChemicalProperties(metabolites.of.interest)
## Starting Chemical Property Query
## Finished Chemical Property Query
chemical.data <- chemical.properties$chem_props
datatable(chemical.data)

Enrichment Analyses

RaMP performs pathway and chemical class overrepresentation analysis using Fisher's exact tests.
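To make the idea of overrepresentation concrete, here is a base R sketch of a single Fisher's exact test on a 2x2 table. All counts are hypothetical, and the one-sided alternative and the way the background is defined are assumptions made for illustration; they are not taken from RaMP's implementation.

```r
# Hypothetical counts for one pathway (illustrative only, not RaMP output).
hits             <- 8     # user analytes that fall in the pathway
user.total       <- 20    # all user analytes
pathway.size     <- 50    # background analytes annotated to the pathway
background.total <- 1000  # all analytes in the background

user.not.in.pathway  <- user.total - hits
other.in.pathway     <- pathway.size - hits
other.not.in.pathway <- background.total - user.total - other.in.pathway

contingency <- matrix(
  c(hits,             user.not.in.pathway,
    other.in.pathway, other.not.in.pathway),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("in user list", "not in user list"),
                  c("in pathway", "not in pathway"))
)

# One-sided test: is the pathway over-represented in the user list?
result <- fisher.test(contingency, alternative = "greater")
print(contingency)
print(result$p.value)
```

With these counts, one hit would be expected by chance (20 x 50 / 1000), so eight hits yields a small p-value.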

Perform Pathway Enrichment

Once analytes have been mapped to pathways, as captured for our earlier example in the pathwaydfids data frame, we can run Fisher's exact test to identify pathways that are enriched for our analytes of interest. The example below uses a separate, larger list of analyte IDs:

fisher.results <- runCombinedFisherTest(analytes = c(
                                                  "hmdb:HMDB0000033",
                                                  "hmdb:HMDB0000052",
                                                  "hmdb:HMDB0000094",
                                                  "hmdb:HMDB0000161",
                                                  "hmdb:HMDB0000168",
                                                  "hmdb:HMDB0000191",
                                                  "hmdb:HMDB0000201",
                                                  "chemspider:10026",
                                                  "hmdb:HMDB0006059",
                                                  "Chemspider:6405",
                                                  "CAS:5657-19-2",
                                                  "hmdb:HMDB0002511",
                                                  "chemspider:20171375",
                                                  "CAS:133-32-4",
                                                  "CAS:5746-90-7",
                                                  "CAS:477251-67-5",
                                                  "hmdb:HMDB0000695",
                                                  "chebi:15934",
                                                  "CAS:838-07-3",
                                                  "hmdb:HMDBP00789",
                                                  "hmdb:HMDBP00283",
                                                  "hmdb:HMDBP00284",
                                                  "hmdb:HMDBP00850"
))

Note: To explicitly view the results of mapping input IDs to RaMP, users can run the getPathwayFromAnalyte() function, as noted above in the section “Retrieve Pathways From Input Analyte(s)”.

Once we have our Fisher results, we can format them into a new data frame and filter the pathways for significance. For this example we will use a Holm-adjusted p-value cutoff of 0.05.

#Returning Fisher Pathways and P-Values
filtered.fisher.results <- FilterFishersResults(fisher.results, pval_type = 'holm', pval_cutoff=0.05)
## [1] "Filtering Fisher Results..."
## [1] "Fisher Result Type: Pathway Enrichment"
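FilterFishersResults() is called above with pval_type = 'holm'. To illustrate what a Holm adjustment does, and how it differs from an FDR (Benjamini-Hochberg) adjustment, here is a base R sketch using p.adjust() on hypothetical p-values. It does not reproduce RaMP's internals.

```r
# Hypothetical raw p-values for five pathways (illustrative only).
raw.p <- c(pathwayA = 0.0004, pathwayB = 0.009, pathwayC = 0.02,
           pathwayD = 0.045, pathwayE = 0.3)

holm.p <- p.adjust(raw.p, method = "holm")  # controls family-wise error rate
fdr.p  <- p.adjust(raw.p, method = "fdr")   # controls false discovery rate

# Keep pathways whose adjusted p-value is below the cutoff.
cutoff <- 0.05
kept.holm <- names(holm.p)[holm.p < cutoff]
kept.fdr  <- names(fdr.p)[fdr.p < cutoff]

print(kept.holm)
print(kept.fdr)
```

On these values the Holm adjustment is the stricter of the two and retains fewer pathways at the same cutoff.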

Because RaMP combines pathways from multiple sources, pathways may be represented more than once. Further, pathways are hierarchical, whereas Fisher's testing treats them as independent, so subpathways and their parent pathways may both appear in a result list. To help group pathways that represent similar biological processes, we have implemented a clustering algorithm that groups pathways together if they share analytes.

clusters <- RaMP::findCluster(filtered.fisher.results,
  perc_analyte_overlap = 0.2,
  min_pathway_tocluster = 2, perc_pathway_overlap = 0.2
)
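The notion of pathways sharing analytes can be illustrated with a small base R sketch. The pathways, analyte IDs, and overlap measure below are all assumptions chosen for illustration; this is not the algorithm used by findCluster(), only a picture of what an analyte-overlap threshold such as 0.2 could mean.

```r
# Hypothetical pathways and their analyte sets (illustrative only).
pathway.analytes <- list(
  pathwayA = c("m1", "m2", "m3", "m4"),
  pathwayB = c("m2", "m3", "m4", "m5"),
  pathwayC = c("m8", "m9")
)

# Fraction of shared analytes relative to the smaller pathway.
# This overlap measure is an assumption, not RaMP's definition.
overlap <- function(a, b) length(intersect(a, b)) / min(length(a), length(b))

n <- length(pathway.analytes)
overlap.matrix <- matrix(0, n, n,
                         dimnames = list(names(pathway.analytes),
                                         names(pathway.analytes)))
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    overlap.matrix[i, j] <- overlap(pathway.analytes[[i]], pathway.analytes[[j]])
  }
}

# Pairs at or above the threshold would be candidates for the same cluster.
threshold <- 0.2
print(overlap.matrix >= threshold)
```

Here pathwayA and pathwayB share three of four analytes and would be grouped, while pathwayC shares none and would stay on its own.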

## print("Pathways with Holm-adjusted Pval < 0.05")

datatable(clusters$fishresults %>% mutate_if(is.numeric, ~ round(., 8)),
  rownames = FALSE
)

To view clustered pathway results:

pathwayResultsPlot(filtered.fisher.results, text_size = 8, perc_analyte_overlap = 0.2, 
    min_pathway_tocluster = 2, perc_pathway_overlap = 0.2, interactive = FALSE)

Perform Chemical Enrichment

After retrieving the chemical classes of metabolites, the chemicalClassEnrichment() function performs overrepresentation analysis using a Fisher's test and outputs classes that show enrichment in the user's input list of metabolites relative to the background metabolite population (all metabolites in RaMP). The function performs enrichment analysis for ClassyFire classes, sub-classes, and super-classes, and for LIPID MAPS categories, main classes, and sub-classes.

metabolites.of.interest = c('hmdb:HMDB0000056','hmdb:HMDB0000439','hmdb:HMDB0000479','hmdb:HMDB0000532',
                            'hmdb:HMDB0001015','hmdb:HMDB0001138','hmdb:HMDB0029159','hmdb:HMDB0029412',
                            'hmdb:HMDB0034365','hmdb:HMDB0035227','hmdb:HMDB0007973','hmdb:HMDB0008057',
                            'hmdb:HMDB0011211')
chemical.enrichment <- chemicalClassEnrichment(mets = metabolites.of.interest)
## [1] "Starting Chemical Class Enrichment"
## [1] "Starting Chemical Class Survey"
## [1] "...finished metabolite list query..."
## [1] "...finished DB population query..."
## [1] "...colating data..."
## [1] "...creating query efficiency summary..."
## [1] "Finished Chemical Class Survey"
## [1] "Finished Chemical Class Enrichment"
# Enrichment was performed on the following chemical classes:
names(chemical.enrichment)
## [1] "ClassyFire_class"       "ClassyFire_sub_class"   "ClassyFire_super_class"
## [4] "LipidMaps_category"     "LipidMaps_main_class"   "LipidMaps_sub_class"   
## [7] "result_type"
# To retrieve results for the ClassyFire Class:
classy_fire_classes <- chemical.enrichment$ClassyFire_class
datatable(classy_fire_classes)

Note: To explicitly view the results of mapping input IDs to RaMP, users can run the chemicalClassSurvey() function, as noted above in the section “Retrieve Chemical Classes from Input Metabolites”.

sessionInfo()
## R version 4.1.2 (2021-11-01)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Big Sur 10.16
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] magrittr_2.0.2 dplyr_1.0.8    DT_0.21        RaMP_2.0.0    
## 
## loaded via a namespace (and not attached):
##  [1] tidyselect_1.1.2  xfun_0.29         bslib_0.3.1       purrr_0.3.4      
##  [5] lattice_0.20-45   RMariaDB_1.2.1    colorspace_2.0-3  vctrs_0.3.8      
##  [9] generics_0.1.2    SnowballC_0.7.0   htmltools_0.5.2   tidytext_0.3.2   
## [13] yaml_2.3.5        utf8_1.2.2        rlang_1.0.2       jquerylib_0.1.4  
## [17] pillar_1.7.0      glue_1.6.2        DBI_1.1.2         bit64_4.0.5      
## [21] lifecycle_1.0.1   stringr_1.4.0     munsell_0.5.0     gtable_0.3.0     
## [25] visNetwork_2.1.0  htmlwidgets_1.5.4 evaluate_0.14     labeling_0.4.2   
## [29] knitr_1.37        fastmap_1.1.0     crosstalk_1.2.0   fansi_1.0.2      
## [33] highr_0.9         tokenizers_0.2.1  Rcpp_1.0.8.3      scales_1.1.1     
## [37] jsonlite_1.8.0    farver_2.1.0      bit_4.0.4         ggplot2_3.3.5    
## [41] hms_1.1.1         digest_0.6.29     stringi_1.7.6     grid_4.1.2       
## [45] cli_3.2.0         tools_4.1.2       sass_0.4.0        tibble_3.1.6     
## [49] janeaustenr_0.1.5 crayon_1.5.0      tidyr_1.2.0       pkgconfig_2.0.3  
## [53] ellipsis_0.3.2    Matrix_1.3-4      lubridate_1.8.0   assertthat_0.2.1 
## [57] rmarkdown_2.11    rstudioapi_0.13   R6_2.5.1          compiler_4.1.2