This vignette will provide basic steps for interacting with RaMP-DB
(Relational database of Metabolomic Pathways).
The codebase for RaMP-DB is available on our GitHub site,
sqlite branch. Details on RaMP-DB installation are also available
through GitHub, and questions can be asked through the Issues tab or by
sending an email to NCATSRaMP@nih.gov.
RaMP-DB supports queries and enrichment analyses. Supported queries are:
Supported enrichment analyses are:
Once installed, first load the package. The first call lists the database versions available in your local file cache and in our remote repository. Next, initialize a RaMP database object with RaMP(). This call references a RaMP-DB version in your local file cache for the current session, or downloads the latest version of the RaMP database. Note that RaMP() can also accept a version argument, for instance version = '2.3.2'. The supplied version should be one of the versions shown when listing available versions.
library(RaMP)
library(DT) # for prettier tables in vignette
library(dplyr)
library(magrittr)
RaMP::listAvailableRaMPDbVersions()
## [1] "Locally available versions of RaMP SQLite DB, currently on your computer:"
## [1] "No local versions of the RaMP Database were found."
## [1] "Please use the command 'db <- RaMP()' to download the latest version into local file cache."
## [1] "Alternatively you can use the command db <- RaMP(version = <remote_version_number>) using one of the versions listed below."
## [1] "Available remote RaMP SQLite DB versions for download:"
## [1] "2.3.2" "2.3.1"
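The listing step does not itself create the database object that later calls receive as db = rampDB. A minimal sketch of that initialization, based on the RaMP() call named in the messages above, might look like the following. The guard around the call and the ramp_installed variable are illustrative additions, not part of RaMP, so the snippet does nothing when the package is absent.

```r
# Check whether the RaMP package is installed before touching the database
# (ramp_installed is a hypothetical helper variable, not part of RaMP).
ramp_installed <- requireNamespace("RaMP", quietly = TRUE)

if (ramp_installed) {
  # Reference a cached RaMP-DB version, or download the latest one.
  rampDB <- RaMP::RaMP()
  # To pin one of the versions listed above instead:
  # rampDB <- RaMP::RaMP(version = "2.3.2")
}
```

The resulting rampDB object is what the remaining examples pass through the db argument.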
Analytes (genes, proteins, metabolites) can be retrieved by pathway. Users must input the exact pathway name. Here is an example:
## [1] "fired!"
## [1] "Timing .."
## user system elapsed
## 0.20 0.06 1.47
To retrieve information from multiple pathways, input a vector of pathway names:
myanalytes <- getAnalyteFromPathway(db = rampDB, pathway=c("De Novo Triacylglycerol Biosynthesis",
"sphingolipid metabolism"))
## [1] "fired!"
## [1] "Timing .."
## user system elapsed
## 0.17 0.15 0.93
It is often useful to get a sense of what pathways are represented in a dataset (this is particularly true for metabolomics, where coverage of metabolites varies depending on what platform is used). In other cases, one may be interested in exploring one or several metabolites to see what pathways they are represented in.
Note that it is always preferable to use IDs rather than common names. When entering IDs, prepend each ID with the database of origin followed by a colon, for example kegg:C02712, hmdb:HMDB04824, etc. It is possible to input IDs from multiple different sources. RaMP currently supports the following ID types (which should be prepended):
metabprefixes <- getPrefixesFromAnalytes(db = rampDB, "metabolite")
geneprefixes <- getPrefixesFromAnalytes(db = rampDB, "gene")
datatable(rbind(metabprefixes, geneprefixes))
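Because each ID must carry its source prefix, it can help to build and check the prefixed strings before querying. The helper names below (add_prefix, looks_prefixed) are hypothetical and not part of RaMP, and the pattern only checks the general prefix:id shape, not whether RaMP recognizes the prefix.

```r
# Prepend a source prefix to bare IDs (hypothetical helper, not a RaMP function).
add_prefix <- function(ids, source) {
  paste0(source, ":", ids)
}

# Check that each entry has the general "prefix:id" shape.
# This does not verify that the prefix is one RaMP supports.
looks_prefixed <- function(ids) {
  grepl("^[A-Za-z][A-Za-z0-9_.]*:.+$", ids)
}

kegg_ids <- add_prefix(c("C02712"), "kegg")
hmdb_ids <- add_prefix(c("HMDB04824"), "hmdb")

# A common name slipped in by mistake is flagged as not prefixed.
mixed <- c(kegg_ids, hmdb_ids, "glutamate")
looks_prefixed(mixed)
# TRUE TRUE FALSE
```

Comparing the prefixes in use against the table returned by getPrefixesFromAnalytes() would be the natural next check.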
In this example, we will search for pathways that involve the two genes MDM2 and TP53, and the two metabolites glutamate and creatine.
pathwaydfids <- getPathwayFromAnalyte(db = rampDB, c("ensembl:ENSG00000135679", "hmdb:HMDB0000064",
"hmdb:HMDB0000148", "ensembl:ENSG00000141510"))
## [1] "Starting getPathwayFromAnalyte()"
## [1] "Working on ID List..."
## [1] "finished getPathwayFromAnalyte()"
## [1] "Found 866 associated pathways."
Note that each row returns a pathway attributed to one of the input analytes. To retrieve the number of unique pathways returned for all analytes or each analyte, try the following:
print(paste("Number of Unique Pathways Returned for All Analytes:",
length(unique(pathwaydfids$pathwayId))))
## [1] "Number of Unique Pathways Returned for All Analytes: 722"
lapply(unique(pathwaydfids$commonName), function(x) {
(paste("Number of Unique Pathways Returned for",x,":",
length(unique(pathwaydfids[which(pathwaydfids$commonName==x),]$pathwayId))))})
## [[1]]
## [1] "Number of Unique Pathways Returned for MDM2 : 402"
##
## [[2]]
## [1] "Number of Unique Pathways Returned for TP53 : 214"
##
## [[3]]
## [1] "Number of Unique Pathways Returned for L-Glutamic acid,Glutamate : 238"
##
## [[4]]
## [1] "Number of Unique Pathways Returned for Creatine : 12"
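The same per-analyte count can be written more compactly with base R's tapply(). The sketch below runs on a small made-up data frame that only mimics the two columns used above (pathwayId and commonName); the IDs and names in it are placeholders, not RaMP output.

```r
# Toy stand-in for the data frame returned by getPathwayFromAnalyte();
# values are invented for illustration.
toy_pathways <- data.frame(
  pathwayId  = c("P1", "P2", "P2", "P3", "P1", "P4"),
  commonName = c("GeneA", "GeneA", "GeneA", "GeneA", "MetB", "MetB"),
  stringsAsFactors = FALSE
)

# Unique pathways across all analytes.
total_unique <- length(unique(toy_pathways$pathwayId))

# Unique pathways per analyte, grouped by common name.
per_analyte <- tapply(
  toy_pathways$pathwayId,
  toy_pathways$commonName,
  function(ids) length(unique(ids))
)
per_analyte
```

Counting unique IDs within each group avoids double counting when a pathway row is repeated for the same analyte.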
Conversely, the user can retrieve the metabolites that are associated with a specific ontology or vector of ontologies. We can accomplish this using the function getMetaFromOnto(). It should be noted that it does not matter which ontology the metabolites are from. The function will return all metabolites associated with all the ontologies specified by the user.
ontologies.of.interest <- c("Colon", "Liver", "Lung")
new.metabolites <- RaMP::getMetaFromOnto(db = rampDB, ontology = ontologies.of.interest)
## [1] "Retreiving Metabolites for input ontology terms."
## [1] "Found 3 ontology term matches."
## [1] "Found 1482 metabolites associated with the input ontology terms."
## [1] "Finished getting metabolies from ontology terms."
RaMP contains information on where metabolites originate, such as the biospecimen. This information is called ontology. Here are all the ontologies found in RaMP.
To retrieve ontologies that are associated with our metabolites we can use getOntoFromMeta(). This function takes a vector of metabolites as input and returns the ontologies associated with the user-defined metabolites.
The user may want to know what gene transcripts encode enzymes which can catalyze reactions involving metabolites in their experiment. RaMP can return this data to its user.
We can return the gene transcripts using the rampFastCata() function. To use it, the user provides the RaMP database object and a vector of the metabolites they are interested in. The user can also input a vector of protein or gene transcript IDs to return the metabolites involved in chemical reactions with the input proteins or the proteins encoded by the input transcripts.
# Input analytes (metabolites and genes)
analytes.of.interest <- c("ensembl:ENSG00000135679", "hmdb:HMDB0000064",
"hmdb:HMDB0000148", "ensembl:ENSG00000141510")
new.transcripts <- rampFastCata(db = rampDB, analytes = analytes.of.interest)
## [1] "Analyte ID-based reaction partner query."
## [1] "Building metabolite to gene relations."
## [1] "Number of met2gene relations: 100"
## [1] "Building gene to metabolite relations."
## [1] "Total Relation Count: 13062"
#Input Proteins
proteins.of.interest <- c("uniprot:094808", "uniprot:Q99259")
new.metabolites <- rampFastCata(db = rampDB, analytes = proteins.of.interest)
## [1] "Analyte ID-based reaction partner query."
## [1] "Building metabolite to gene relations."
## [1] "Number of met2gene relations: 0"
## [1] "Building gene to metabolite relations."
## [1] "Total Relation Count: 12"
RaMP has a built-in function, plotCataNetwork(), that generates networks from the transcript data. It takes the data frame created by rampFastCata() as input, and the resulting plots are fully interactive.
RaMP incorporates ClassyFire and LIPID MAPS classes. The chemicalClassSurvey() function takes as input a vector of metabolites and outputs the classes associated with each input metabolite.
metabolites.of.interest = c('hmdb:HMDB0000056','hmdb:HMDB0000439','hmdb:HMDB0000479','hmdb:HMDB0000532',
'hmdb:HMDB0001015','hmdb:HMDB0001138','hmdb:HMDB0029159','hmdb:HMDB0029412',
'hmdb:HMDB0034365','hmdb:HMDB0035227','hmdb:HMDB0007973','hmdb:HMDB0008057',
'hmdb:HMDB0011211')
chemical.classes <- chemicalClassSurvey(db = rampDB, mets = metabolites.of.interest)
## [1] "Starting Chemical Class Survey"
## [1] "...finished metabolite list query..."
## [1] "...finished DB population query..."
## [1] "...collating data..."
## [1] "...creating query efficiency summary..."
## [1] "Finished Chemical Class Survey"
Chemical properties captured by RaMP include SMILES, InChI, InChIKeys, monoisotopic masses, molecular formula, and common name. The getChemicalProperties() function takes as input a vector of metabolites and outputs a list of chemical property information that can easily be converted into a data frame.
## Starting Chemical Property Query
## Finished Chemical Property Query
RaMP performs pathway and chemical class overrepresentation analysis using Fisher’s exact tests.
Using the pathways that our analytes map to, captured in the pathwaydfids data frame in the previous step, we can now run Fisher’s Exact test to identify pathways that are enriched for our analytes of interest:
fisher.results <- runCombinedFisherTest(db = rampDB, analytes = c(
"hmdb:HMDB0000033",
"hmdb:HMDB0000052",
"hmdb:HMDB0000094",
"hmdb:HMDB0000161",
"hmdb:HMDB0000168",
"hmdb:HMDB0000191",
"hmdb:HMDB0000201",
"chemspider:10026",
"hmdb:HMDB0006059",
"Chemspider:6405",
"CAS:5657-19-2",
"hmdb:HMDB0002511",
"chemspider:20171375",
"CAS:133-32-4",
"CAS:5746-90-7",
"CAS:477251-67-5",
"hmdb:HMDB0000695",
"chebi:15934",
"CAS:838-07-3",
"hmdb:HMDBP00789",
"hmdb:HMDBP00283",
"hmdb:HMDBP00284",
"hmdb:HMDBP00850"
))
Note: To explicitly view the results of mapping input IDs to RaMP, users can run the getPathwayFromAnalyte() function as noted above in the section “Retrieve Pathways From Input Analyte(s)”.
Once we have our Fisher results, we can format them into a new data frame and filter the pathways for significance. For this example we will use a Holm-adjusted p-value cutoff of 0.05.
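To make the statistic concrete, the sketch below runs a one-sided Fisher's exact test for a single pathway with base R's fisher.test(). All counts are invented for illustration, and the way the 2x2 table and the background are built here is an assumption, not a description of how runCombinedFisherTest() constructs them internally.

```r
# Invented counts for one hypothetical pathway.
in_list_in_pathway  <- 8     # input analytes that map to the pathway
in_list_not_pathway <- 12    # input analytes that do not
bg_in_pathway       <- 92    # remaining background analytes in the pathway
bg_not_pathway      <- 888   # remaining background analytes outside it

# 2x2 contingency table: rows = pathway membership, columns = list membership.
contingency <- matrix(
  c(in_list_in_pathway, in_list_not_pathway,
    bg_in_pathway, bg_not_pathway),
  nrow = 2,
  dimnames = list(
    pathway = c("in pathway", "not in pathway"),
    analyte = c("in input list", "not in input list")
  )
)

# One-sided test: is the pathway overrepresented in the input list?
ft <- fisher.test(contingency, alternative = "greater")
ft$p.value
```

Here 8 of 20 input analytes fall in a pathway that covers 100 of 1000 analytes overall, so the one-sided p-value is small; repeating this for every pathway produces the raw p-values that are then adjusted for multiple testing.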
#Returning Fisher Pathways and P-Values
filtered.fisher.results <- FilterFishersResults(fisher.results, pval_type = 'holm', pval_cutoff=0.05)
## [1] "Filtering Fisher Results..."
## [1] "Fisher Result Type: Pathway Enrichment"
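The choice of adjustment affects how many pathways pass the cutoff. As a rough illustration with base R's p.adjust() on made-up p-values (not RaMP results), the Holm correction controls the family-wise error rate and is at least as conservative as the Benjamini-Hochberg FDR correction:

```r
# Made-up raw p-values for five hypothetical pathways.
raw_p <- c(0.001, 0.008, 0.02, 0.03, 0.2)

# Holm controls the family-wise error rate; BH controls the false discovery rate.
holm_p <- p.adjust(raw_p, method = "holm")
fdr_p  <- p.adjust(raw_p, method = "BH")

comparison <- data.frame(raw_p, holm_p, fdr_p)
comparison

# Number of pathways passing a 0.05 cutoff under each adjustment.
n_holm <- sum(holm_p < 0.05)
n_fdr  <- sum(fdr_p < 0.05)
```

With these values fewer pathways survive the Holm adjustment than the FDR adjustment, which is worth keeping in mind when reporting which correction a cutoff refers to.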
Because RaMP combines pathways from multiple sources, pathways may be represented more than once. Further, due to the hierarchical nature of pathways and because Fisher’s testing assumes pathways are independent, subpathways and their parent pathways may appear in a list. To help group together pathways that represent similar biological processes, we have implemented a clustering algorithm that groups pathways together if they share analytes in common.
clusters <- RaMP::findCluster(db = rampDB, filtered.fisher.results,
perc_analyte_overlap = 0.2,
min_pathway_tocluster = 2, perc_pathway_overlap = 0.2
)
## [1] "Clustering pathways..."
## [1] "Finished clustering pathways..."
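The idea of grouping pathways by shared analytes can be illustrated with a small overlap matrix. This is only a sketch on invented analyte sets: it uses the Jaccard index as the overlap measure and a 0.2 threshold, both of which are assumptions chosen for illustration and may differ from what findCluster() computes internally.

```r
# Invented analyte sets for three hypothetical pathways.
pathway_sets <- list(
  pathway_A = c("m1", "m2", "m3", "m4"),
  pathway_B = c("m2", "m3", "m4", "m5"),
  pathway_C = c("m8", "m9")
)

# Jaccard index: shared analytes divided by all analytes in either set.
jaccard <- function(a, b) {
  length(intersect(a, b)) / length(union(a, b))
}

# Pairwise overlap matrix between all pathways.
n <- length(pathway_sets)
overlap <- matrix(0, nrow = n, ncol = n,
                  dimnames = list(names(pathway_sets), names(pathway_sets)))
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    overlap[i, j] <- jaccard(pathway_sets[[i]], pathway_sets[[j]])
  }
}

# Distinct pathway pairs whose overlap meets the illustrative threshold.
similar_pairs <- which(overlap >= 0.2 & upper.tri(overlap), arr.ind = TRUE)
overlap
```

In this toy case only pathway_A and pathway_B share enough analytes to be grouped, while pathway_C stays on its own.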
## print("Pathways with Holm-adjusted Pval < 0.05")
datatable(clusters$fishresults %>% mutate_if(is.numeric, ~ round(., 8)),
rownames = FALSE
)
To view clustered pathway results:
pathwayResultsPlot(db = rampDB, filtered.fisher.results, text_size = 8, perc_analyte_overlap = 0.2,
min_pathway_tocluster = 2, perc_pathway_overlap = 0.2, interactive = FALSE)
## [1] "Clustering pathways..."
## [1] "Finished clustering pathways..."
After retrieving chemical classes of metabolites, the chemicalClassEnrichment() function will perform overrepresentation analysis using a Fisher’s test and output classes that show enrichment in the user input list of metabolites relative to the background metabolite population (all metabolites in RaMP). The function performs enrichment analysis for ClassyFire classes, sub-classes, and super-classes, and for LIPID MAPS categories, main classes, and sub-classes.
metabolites.of.interest = c('hmdb:HMDB0000056','hmdb:HMDB0000439','hmdb:HMDB0000479','hmdb:HMDB0000532',
'hmdb:HMDB0001015','hmdb:HMDB0001138','hmdb:HMDB0029159','hmdb:HMDB0029412',
'hmdb:HMDB0034365','hmdb:HMDB0035227','hmdb:HMDB0007973','hmdb:HMDB0008057',
'hmdb:HMDB0011211')
chemical.enrichment <- chemicalClassEnrichment(db = rampDB, mets = metabolites.of.interest)
## [1] "Starting Chemical Class Enrichment"
## [1] "Starting Chemical Class Survey"
## [1] "...finished metabolite list query..."
## [1] "...finished DB population query..."
## [1] "...collating data..."
## [1] "...creating query efficiency summary..."
## [1] "Finished Chemical Class Survey"
## [1] "check total summary"
## [1] "getting population totals"
## [1] "Finished Chemical Class Enrichment"
## [1] "ClassyFire_class" "ClassyFire_sub_class" "ClassyFire_super_class"
## [4] "result_type"
# To retrieve results for the ClassyFire Class:
classy_fire_classes <- chemical.enrichment$ClassyFire_class
datatable(classy_fire_classes)
Note: To explicitly view the results of mapping input IDs to RaMP, users can run the chemicalClassSurvey() function as noted above in the section “Retrieve Chemical Class from Input Metabolites”.
This code section demonstrates a Rhea reaction query.
analytes.of.interest <- c('chebi:57368', 'uniprot:Q96N66', 'CHEBI:73003')
reactionsLists <- RaMP::getReactionsForAnalytes(db = rampDB, analytes = analytes.of.interest, includeTransportRxns = FALSE, humanProtein = TRUE)
## [1] "Retrieving reactions for compounds"
## [1] "Retrieving reactions for genes/proteins"
# Show only the reactions with at least one metabolite and one protein in common.
datatable(subset(reactionsLists$metProteinCommonReactions, select = -c(rxn_html_label)))
Three reaction lists are returned: metabolites-to-reactions, proteins-to-reactions, and reactions that have at least one metabolite and one protein from the input analyte list.
## R version 4.1.0 (2021-05-18)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 22621)
##
## Matrix products: default
##
## locale:
## [1] LC_COLLATE=English_United States.1252
## [2] LC_CTYPE=English_United States.1252
## [3] LC_MONETARY=English_United States.1252
## [4] LC_NUMERIC=C
## [5] LC_TIME=English_United States.1252
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] magrittr_2.0.2 dplyr_1.1.2 DT_0.28 RaMP_3.0.0
##
## loaded via a namespace (and not attached):
## [1] Rcpp_1.0.8.3 lattice_0.20-45 tidyr_1.3.0
## [4] visNetwork_2.1.2 assertthat_0.2.1 digest_0.6.29
## [7] utf8_1.2.2 BiocFileCache_2.0.0 R6_2.5.1
## [10] RSQLite_2.3.1 evaluate_0.21 highr_0.10
## [13] httr_1.4.7 ggplot2_3.4.3 pillar_1.9.0
## [16] rlang_1.1.0 curl_4.3.2 rstudioapi_0.13
## [19] data.table_1.14.8 jquerylib_0.1.4 blob_1.2.4
## [22] R.utils_2.12.2 R.oo_1.24.0 Matrix_1.4-1
## [25] rmarkdown_2.24 labeling_0.4.2 tidytext_0.4.1
## [28] htmlwidgets_1.6.2 bit_4.0.4 munsell_0.5.0
## [31] compiler_4.1.0 janeaustenr_1.0.0 xfun_0.40
## [34] pkgconfig_2.0.3 htmltools_0.5.6 tidyselect_1.2.0
## [37] tibble_3.2.1 fansi_1.0.3 dbplyr_2.1.1
## [40] withr_2.5.0 R.methodsS3_1.8.1 rappdirs_0.3.3
## [43] SnowballC_0.7.1 grid_4.1.0 jsonlite_1.8.7
## [46] gtable_0.3.4 lifecycle_1.0.3 DBI_1.1.3
## [49] scales_1.2.1 tokenizers_0.3.0 cli_3.6.1
## [52] stringi_1.7.6 cachem_1.0.6 farver_2.1.1
## [55] bslib_0.4.0 ellipsis_0.3.2 filelock_1.0.2
## [58] generics_0.1.3 vctrs_0.6.3 tools_4.1.0
## [61] bit64_4.0.5 glue_1.6.2 purrr_1.0.1
## [64] crosstalk_1.2.0 fastmap_1.1.0 yaml_2.3.7
## [67] colorspace_2.1-0 memoise_2.0.1 knitr_1.43
## [70] sass_0.4.2