Running the linear model on the NCI-60 cell lines

Introduction

The NCI-60 cancer cell lines were developed as a drug screening tool focusing on a range of cancer types. In this vignette, we compare drug scores of NCI-60 cancer cell lines (1, 2).

Loading in IntLIM and Files

IntLIM, available from Github, can be installed as in the documentation. Once IntLIM is installed, what is necessary is loading in the library.

rm(list = ls())
if(!require("devtools")){
  install.packages("devtools")
}

## Loading required package: devtools

## Loading required package: usethis

library("devtools")
if(!require("IntLIM")){
  install_github("ncats/IntLIM")
}

## Loading required package: IntLIM

library(IntLIM)
library(rmarkdown)

For the NCI-60 cell lines, metabolomics and gene expression data were downloadable from the DTP website (https://wiki.nci.nih.gov/display/ncidtpdata/molecular+target+data). The Metabolon data consisting of 353 metabolites and 58 cell lines with 177 technical replicates total was filtered for metabolites that had a median coefficient of variation of below 0.3. The coefficient value was arbitrarily selected to filter out technical replicates having high variability. The resulting metabolite abundance data set of 280 metabolites was subsequently log2 transformed. Probes from the Chiron Affymetrix U133 were mapped to genes using the Ensembl database hgu133.plus.db. Probes mapping to more than one gene were removed. In cases where more than one probe was matching to a given gene, only the probe with the highest mean expression was used for analysis. This resulted in a total of 17,987 genes.

This data has been formatted for IntLIM. We load in the data as follows. The nci60.csv meta file contains a list with the phenotypic data file, metabolite data file, gene expression data file, metabolite meta file, and gene expression meta file. The function ShowStats() will give a summary of the NCI-60 data for all samples containing both gene expression and metabolite abundance. We find that we have gene expression data involving 17,987 genes and metabolite abundance data involving 280 metabolites with 57 cell lines.

csvfile <- file.path(getwd(), "/NCI60_data/nci60.csv")
inputData <- IntLIM::ReadData(inputFile = csvfile,analyteType1id='id',analyteType2id='id',
                              class.feat = list(drugscore = "numeric"))

## Warning in IntLIM::ReadData(inputFile = csvfile, analyteType1id = "id", : The
## following samples were not shared in all data types and were removed: RXF.393

## IntLIMData created

IntLIM::ShowStats(inputData)

##   Num_Analytes_Type1 Num_Analytes_Type2 Num_Samples
## 1              17987                280          57

Filtering Gene Expression and Metabolite Data

We remove genes with mean belows below the 10th percentile. Furthermore, we remove analytes with more than 80% missing values (where an NA value indicates missingness). This results in gene expression data involving 16,188 genes and metabolite abundance data involving 280 metabolites in 57 cell lines.

inputDatafilt <- IntLIM::FilterData(inputData,analyteType1perc=0.10, analyteMiss = 0.80)

## Warning in IntLIM::FilterData(inputData, analyteType1perc = 0.1, analyteMiss =
## 0.8): No filtering by percentile is applied for analyte type 2

IntLIM::ShowStats(inputDatafilt)

##   Num_Analytes_Type1 Num_Analytes_Type2 Num_Samples
## 1              16188                280          57

We can obtain boxplot distributions of the data as follows.

IntLIM::PlotDistributions(inputDatafilt)

## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo

Principal Component Analysis

The principal component analysis is performed on filtered metabolite and gene expression data to obtain visual representations showing how different sub-sections of the data could be grouped into different clusters. Common samples are shown. Darker blue samples indicate higher drug scores.

PlotPCA(inputDatafilt, stype = "drugscore")

Running Linear Model

The linear model for integrating multi-omics and data is: a_i = β0 + β1 a_j + β2 p + β3 (a_j:p) + Σ(β(3+c) C) + ε where ‘a_i’ and ‘a_j’ are (optionally log-transformed) analyte abundances, ‘p’ is phenotype (cancer type, patient diagnosis, treatment group, etc), ‘(a_j:p)’ is the association between analyte level and phenotype, C is a set of clinical covariates (e.g. age, sex, etc), c is a covariate index, and ‘ε’ is the error term that is normally distributed. A statistically significant p-value of the ‘(a_j:p)’ association term indicates that the slope relating analyte levels differs by phenotype. We run a linear model that included 16,188 genes and 280 metabolites (total of 4,532,640 possible associations and hence models). The model is run as below by calling RunIntLim(). DistPvalues() allows us to obtain a distribution of p-values for the (a_j:p) term, and DistRSquared() allows us to obtain a distribution of R^2 values for the models. InteractionCoefficientGraph() allows us to obtain an ordered plot of the (a_j:p) term coefficients. We also show a volcano plot showing the relationship between interaction coefficients and unadjusted p-values using pvalCoefVolcano().

Covariates (e.g. age and batch) may also be included in the models as a list using the covar parameter. Please be advised that not all covariates in the experiment need be included. In particular, colliders (variables influenced by one or more other variables) should not be included, and potential confounding variables should be included. The choice of covariates should be determined by the experiment.

Note that the user must select either analyte type 1 or analyte type 2 for the independent and outcome variables. This choice should be biologically motivated. For instance, if the user wants to investigate metabolite abundances that are influenced by gene expression level and metabolite abundance is analyte type 1 and gene expression is analyte type 2, then the outcome would be analyte type 1 and the independent variable would be analyte type 2.

myres <- IntLIM::RunIntLim(inputDatafilt, stype="drugscore", continuous = TRUE,
                           independent.var.type = 1, outcome = 2, save.covar.pvals = TRUE)

## Running the analysis on continuous data with range -1.09231221699814 0.81607409536015

## 10 % complete

## 20 % complete

## 30 % complete

## 40 % complete

## 50 % complete

## 60 % complete

## 70 % complete

## 80 % complete

## 90 % complete

IntLIM::DistPvalues(myres, adjusted = FALSE)

IntLIM::DistRSquared(myres)

IntLIM::InteractionCoefficientGraph(inputResults = myres, 
                                    interactionCoeffPercentile = 0.9)

IntLIM::pvalCoefVolcano(inputResults = myres, inputData = inputDatafilt,
                        pvalcutoff = 0.05)

## 4532640 pairs found given cutoffs

The next step is to process the results of this model by filtering the results of the linear model by FDR-adjusted p-value cutoff (0.10 selected here) for the (a_j:p) association coefficient. We further may only interested in results where the interaction coefficient is in the top Xth percentile (e.g. 50th shown here) and where the model fit as measured by R^2 value is above a given cutoff (e.g. 0.2). This is done with the ProcessResults() function.

It is also possible to filter for significant main effects (independent analyte level or phenotype). Users can then determine which significant pairs also have significant main effects.

myres.sig <- IntLIM::ProcessResults(myres,  inputDatafilt, pvalcutoff = 0.10, rsquaredCutoff = 0,
                                    coeffPercentile = 0.5)

## 2517 pairs found given cutoffs

myres.sig.stype <- IntLIM::ProcessResults(myres,  inputDatafilt, pvalcutoff = 0.10, rsquaredCutoff = 0,
                                    coeffPercentile = 0.5, coefficient = "stype")

## 5038 pairs found given cutoffs

myres.sig.a <- IntLIM::ProcessResults(myres,  inputDatafilt, pvalcutoff = 0.10, rsquaredCutoff = 0,
                                    coeffPercentile = 0.5, coefficient = "analyte")

## 3715 pairs found given cutoffs

OutputResults(inputResults = myres.sig, filename = "NCI60pairs.csv")

## Wrote results to NCI60pairs.csv

From this model we find 2,517 gene-metabolite correlations that meet our criteria. From the HistogramPairs() histogram plots below, we can see that most genes are associated with a single metabolite and that most metabolites are associated with a single gene. However, we see that some genes are associated with up to 12 metabolites and that some metabolites are associated with up to 359 genes.

We can see that a few of the metabolites associated with high numbers of genes by drug score are 2’-deoxyuridine 5’-triphosphate, inositol 1-phosphate, beta-nicotinamide adenine dinucleotide, and cholesterol.

# Plot histogram.
IntLIM::HistogramPairs(myres.sig, type = "outcome")

IntLIM::HistogramPairs(myres.sig, type = "independent")

# Tabulate gene and metabolite counts.
table.gene.count <- as.data.frame(table(as.character(myres.sig$Analyte1)))
table.metab.count <- as.data.frame(table(as.character(myres.sig$Analyte2)))
print(table.gene.count[order(-table.gene.count$Freq)[1:10],])

##         Var1 Freq
## 66     ANXA2   12
## 114  B4GALT4   10
## 1365   TSTA3   10
## 1210   SOCS3    8
## 212    CD151    7
## 505     GDI2    7
## 513     GLRB    7
## 1360 TSPAN17    7
## 1452  ZNF101    7
## 271    CORO7    6

print(table.metab.count[order(-table.metab.count$Freq)[1:10],])

##                                       Var1 Freq
## 54                                  X.1130  349
## 53                                  X.1122  310
## 56                                  X.1391  246
## 83                                  X.3091  181
## 115                                 X.4271  143
## 123                                 X.4727  128
## 84                                  X.3094  104
## 124       X2..deoxyuridine.5..triphosphate   90
## 23                    inositol.1.phosphate   81
## 8   beta.nicotinamide.adenine.dinucleotide   57

# Print gene associations with metabolites of interest.
str(myres.sig[which(myres.sig$Analyte2 == "X2..deoxyuridine.5..triphosphate"),])

## 'data.frame':    90 obs. of  10 variables:
##  $ Analyte1         : Factor w/ 16188 levels "PRPF8","CAPNS1",..: 2949 6259 3024 6486 16121 15827 6121 1335 3686 3847 ...
##  $ Analyte2         : Factor w/ 280 levels "X.p.Hydroxyphenyl.lactic.acid",..: 98 98 98 98 98 98 98 98 98 98 ...
##  $ interaction_coeff: num  -2.12 -1.94 -1.83 -1.73 -1.53 ...
##  $ Pval             : num  3.93e-05 4.80e-05 1.14e-05 3.93e-05 1.79e-06 ...
##  $ FDRadjPval       : num  0.06805 0.07638 0.03255 0.06805 0.00987 ...
##  $ rsquared         : num  0.321 0.314 0.352 0.326 0.391 ...
##  $ (Intercept)      : num  -1.114 -2.079 -1.771 -0.846 -2.091 ...
##  $ a                : num  -0.1021 0.0163 -0.0233 -0.1794 0.0205 ...
##  $ type             : num  18.2 13.4 15.2 11.1 10.2 ...
##  $ a:type           : num  -2.12 -1.94 -1.83 -1.73 -1.53 ...

str(myres.sig[which(myres.sig$Analyte2 == "inositol.1.phosphate"),])

## 'data.frame':    81 obs. of  10 variables:
##  $ Analyte1         : Factor w/ 16188 levels "PRPF8","CAPNS1",..: 12497 26 8779 13158 13537 8045 15507 8367 7732 9433 ...
##  $ Analyte2         : Factor w/ 280 levels "X.p.Hydroxyphenyl.lactic.acid",..: 32 32 32 32 32 32 32 32 32 32 ...
##  $ interaction_coeff: num  -2.69 -2.53 -2.48 -2.33 -2.05 ...
##  $ Pval             : num  1.10e-06 9.78e-06 2.57e-06 4.45e-05 4.40e-05 ...
##  $ FDRadjPval       : num  0.00738 0.02964 0.01249 0.07311 0.07263 ...
##  $ rsquared         : num  0.57 0.558 0.548 0.53 0.536 ...
##  $ (Intercept)      : num  3.55 6.98 2.09 3.4 2.29 ...
##  $ a                : num  -0.594 -0.757 -0.431 -0.548 -0.441 ...
##  $ type             : num  16.5 22.9 12.5 14.2 11 ...
##  $ a:type           : num  -2.69 -2.53 -2.48 -2.33 -2.05 ...

str(myres.sig[which(myres.sig$Analyte2 == "beta.nicotinamide.adenine.dinucleotide"),])

## 'data.frame':    57 obs. of  10 variables:
##  $ Analyte1         : Factor w/ 16188 levels "PRPF8","CAPNS1",..: 544 3817 6620 12752 7613 3909 13004 12384 12621 13901 ...
##  $ Analyte2         : Factor w/ 280 levels "X.p.Hydroxyphenyl.lactic.acid",..: 107 107 107 107 107 107 107 107 107 107 ...
##  $ interaction_coeff: num  -8.8 -6.56 -6.34 -5.85 -5.63 ...
##  $ Pval             : num  3.90e-05 1.53e-06 2.52e-05 5.60e-06 1.30e-07 ...
##  $ FDRadjPval       : num  0.06797 0.00908 0.05215 0.02078 0.00161 ...
##  $ rsquared         : num  0.284 0.363 0.293 0.338 0.417 ...
##  $ (Intercept)      : num  -1.333 7.9 3.633 5.278 0.859 ...
##  $ a                : num  0.0833 -0.9215 -0.8746 -1.1285 -0.1938 ...
##  $ type             : num  73.9 60.1 29.3 31.2 44.1 ...
##  $ a:type           : num  -8.8 -6.56 -6.34 -5.85 -5.63 ...

str(myres.sig[which(myres.sig$Analyte2 == "cholesterol"),])

## 'data.frame':    40 obs. of  10 variables:
##  $ Analyte1         : Factor w/ 16188 levels "PRPF8","CAPNS1",..: 209 8447 678 2873 495 8367 5437 1099 2091 1877 ...
##  $ Analyte2         : Factor w/ 280 levels "X.p.Hydroxyphenyl.lactic.acid",..: 14 14 14 14 14 14 14 14 14 14 ...
##  $ interaction_coeff: num  -2.35 -1.68 -1.59 -1.58 -1.38 ...
##  $ Pval             : num  7.44e-05 3.47e-07 4.24e-05 4.37e-05 7.65e-05 ...
##  $ FDRadjPval       : num  0.09491 0.00328 0.07138 0.0724 0.09633 ...
##  $ rsquared         : num  0.463 0.634 0.491 0.582 0.472 ...
##  $ (Intercept)      : num  1.56 1.94 2.68 6.59 2.89 ...
##  $ a                : num  -0.197 -0.38 -0.351 -0.784 -0.377 ...
##  $ type             : num  19.7 8.84 11.98 12.98 10.42 ...
##  $ a:type           : num  -2.35 -1.68 -1.59 -1.58 -1.38 ...

We can show some example plots of some of these pairs. The first example is the HHAT vs. cholesterol.

The PlotPairResiduals() function plots the standardized residuals for a given pair, allowing the user to determine whether model assumptions are fulfilled. In the example below, there is a slight linear trend in the residuals, indicating that the relationship between HHAT and cholesterol may better be approximated by a more complex model.

IntLIM::PlotPair(inputDatafilt, myres, 
                 independentAnalyteOfInterest = "HHAT", 
                 outcomeAnalyteOfInterest = "cholesterol",
                 independentVariable = 1,
                 outcome = 2)

## 
## Call:  stats::glm(formula = form, data = dataframe)
## 
## Coefficients:
## (Intercept)            g         type       g:type  
##     -1.9542       0.3361      -7.6420       1.2900  
## 
## Degrees of Freedom: 56 Total (i.e. Null);  53 Residual
## Null Deviance:       51.07 
## Residual Deviance: 19.74     AIC: 111.3

IntLIM::PlotPairResiduals(inputDatafilt, myres, 
                 independentAnalyteOfInterest = "HHAT", 
                 outcomeAnalyteOfInterest = "cholesterol",
                 independentVariable = 1,
                 outcome = 2)

Another example is CDKN1A vs. cholesterol. In the example below, there is a linear trend in the residuals, indicating that the relationship between CDKN1A and cholesterol may better be approximated by a more complex model.

IntLIM::PlotPair(inputDatafilt, myres, 
                 independentAnalyteOfInterest = "CDKN1A", 
                 outcomeAnalyteOfInterest = "cholesterol",
                 independentVariable = 1,
                 outcome = 2)

## 
## Call:  stats::glm(formula = form, data = dataframe)
## 
## Coefficients:
## (Intercept)            g         type       g:type  
##     -1.1986       0.1198      -4.8846       0.5108  
## 
## Degrees of Freedom: 56 Total (i.e. Null);  53 Residual
## Null Deviance:       51.07 
## Residual Deviance: 23.19     AIC: 120.5

IntLIM::PlotPairResiduals(inputDatafilt, myres, 
                 independentAnalyteOfInterest = "CDKN1A", 
                 outcomeAnalyteOfInterest = "cholesterol",
                 independentVariable = 1,
                 outcome = 2)

Cross-Validation

We then run 5-fold cross-validation to determine the overlap of significant pairs across folds. To replicate leave-one-out cross-validation as shown in the paper, change folds from 5 to 57.

folds <- 5
crossValResults <- RunCrossValidation(inputData = inputData, analyteType1perc = 0.10, 
                   analyteMiss = 0.80, stype="drugscore", outcome = c(2), 
                   independent.var.type = c(1), pvalcutoff = 0.10, 
                   rsquaredCutoff = 0.2,
                   folds = folds, continuous = TRUE,
                   interactionCoeffPercentile = 0.5)

## Warning in FilterData(inputData = inputDataFolds[[i]]$training, analyteType1perc
## = analyteType1perc, : No filtering by percentile is applied for analyte type 2

## Warning in FilterData(inputData = inputDataFolds[[i]]$training, analyteType1perc
## = analyteType1perc, : No filtering by percentile is applied for analyte type 2

## Warning in FilterData(inputData = inputDataFolds[[i]]$training, analyteType1perc
## = analyteType1perc, : No filtering by percentile is applied for analyte type 2

## Warning in FilterData(inputData = inputDataFolds[[i]]$training, analyteType1perc
## = analyteType1perc, : No filtering by percentile is applied for analyte type 2

## Warning in FilterData(inputData = inputDataFolds[[i]]$training, analyteType1perc
## = analyteType1perc, : No filtering by percentile is applied for analyte type 2

## Running the analysis on continuous data with range -1.09231221699814 0.81607409536015

## 10 % complete

## 20 % complete

## 30 % complete

## 40 % complete

## 50 % complete

## 60 % complete

## 70 % complete

## 80 % complete

## 90 % complete

## Running the analysis on continuous data with range -1.09231221699814 0.713663955547085

## 10 % complete

## 20 % complete

## 30 % complete

## 40 % complete

## 50 % complete

## 60 % complete

## 70 % complete

## 80 % complete

## 90 % complete

## Running the analysis on continuous data with range -1.09231221699814 0.81607409536015

## 10 % complete

## 20 % complete

## 30 % complete

## 40 % complete

## 50 % complete

## 60 % complete

## 70 % complete

## 80 % complete

## 90 % complete

## Running the analysis on continuous data with range -0.857229959662547 0.81607409536015

## 10 % complete

## 20 % complete

## 30 % complete

## 40 % complete

## 50 % complete

## 60 % complete

## 70 % complete

## 80 % complete

## 90 % complete

## Running the analysis on continuous data with range -1.09231221699814 0.81607409536015

## 10 % complete

## 20 % complete

## 30 % complete

## 40 % complete

## 50 % complete

## 60 % complete

## 70 % complete

## 80 % complete

## 90 % complete

## 5412 pairs found given cutoffs

## 5864 pairs found given cutoffs

## 14230 pairs found given cutoffs

## 4577 pairs found given cutoffs

## 2888 pairs found given cutoffs

IntLIM::PlotFoldOverlapUpSet(crossValResults$processed)

# Find number of pairs overlapping in at least half of the folds.
result <- lapply(1:folds, function(j){return(rownames(crossValResults$processed[[j]]))})
result_flat <- unlist(result)
fold_count <- table(result_flat)
length(which(fold_count > folds / 2))

# Find number of pairs from original data that overlap the cross-validation data.
resultPairs <- paste(myres.sig$Analyte1, myres.sig$Analyte2, sep = "__")
foldCountResults<- fold_count[which(names(fold_count) %in% resultPairs)]
str(foldCountResults)
length(which(foldCountResults > folds / 2))

Permutation

Finally, we run 5 permutations of the data. To replicate the results of the paper, change the num.permutations from 5 to 100. For large datasets and high numbers of permutations, you may wish to run your code on a supercomputer. For very large datasets, you may wish to create several batch jobs on a supercomputer, each of which calls PermuteIntLIM() with num.permutations = 1, and then concatenate the results after all jobs have completed. If you do this, it is recommended to change the seed for each call.

PermuteIntLIM() returns R^2 values averaged over all pairs, the number of significant pairs per permutation, and the significant pairs per permutation. These can be further refined using PermutationCountSummary(), which computes the number of significant pairs, independent variable analytes, and outcome analytes per permutation. Furthermore, a violin plot can be generated by setting plot = TRUE. In addition, PermutationPairSummary() computes the number of permutations in which each pair was found to be significant. This can also be visualized using a horizontal bar plot. For the NCI-60 data set, there are 4 pairs that were significant in 4 out of 5 permutations. In addition, the number of metabolites and pairs that were significant in the original data was higher than the number in the permuted data.

# Permute and print summary.
perm.res <- IntLIM::PermuteIntLIM(data = inputDatafilt, stype = "drugscore", outcome = 2,
                     independent.var.type = 1, pvalcutoff = 0.10, interactionCoeffPercentile = 0.5,
                  rsquaredCutoff = 0.2,
                  num.permutations = 5, continuous = TRUE)

## Running the analysis on continuous data with range -1.09231221699814 0.81607409536015

## 10 % complete

## 20 % complete

## 30 % complete

## 40 % complete

## 50 % complete

## 60 % complete

## 70 % complete

## 80 % complete

## 90 % complete

## 673 pairs found given cutoffs

## Running the analysis on continuous data with range -1.09231221699814 0.81607409536015

## 10 % complete

## 20 % complete

## 30 % complete

## 40 % complete

## 50 % complete

## 60 % complete

## 70 % complete

## 80 % complete

## 90 % complete

## 655 pairs found given cutoffs

## Running the analysis on continuous data with range -1.09231221699814 0.81607409536015

## 10 % complete

## 20 % complete

## 30 % complete

## 40 % complete

## 50 % complete

## 60 % complete

## 70 % complete

## 80 % complete

## 90 % complete

## 2761 pairs found given cutoffs

## Running the analysis on continuous data with range -1.09231221699814 0.81607409536015

## 10 % complete

## 20 % complete

## 30 % complete

## 40 % complete

## 50 % complete

## 60 % complete

## 70 % complete

## 80 % complete

## 90 % complete

## 1080 pairs found given cutoffs

## Running the analysis on continuous data with range -1.09231221699814 0.81607409536015

## 10 % complete

## 20 % complete

## 30 % complete

## 40 % complete

## 50 % complete

## 60 % complete

## 70 % complete

## 80 % complete

## 90 % complete

## 2259 pairs found given cutoffs

summaryCount <- IntLIM::PermutationCountSummary(permResults = perm.res, inputResults = myres.sig,
                              plot = TRUE)

print(summaryCount)

# Compute the number of times each pair significant in the original data was
# also significant in the permuted data.
summaryPairs <- IntLIM::PermutationPairSummary(permResults = perm.res, inputResults = myres.sig,
                              plot = TRUE)

str(summaryPairs)

The list of unique genes and metabolites in these pairs can be used to conduct a pathway enrichment analysis. Using RaMP, we query the set of pairs for shared pathways and shared reactions.

if(!require("RaMP")){
 install_github("ncats/RAMP-DB")
}

## Loading required package: RaMP

library(RaMP)

# Load files.
fdata <- inputDatafilt@analyteType2MetaData

# Match metabolite ID's to source.
metabid <- fdata[myres.sig$Analyte2, "databaseId"]

# Match gene ID's to source.
geneid <- paste0("gene_symbol:", myres.sig$Analyte1)

# Connect to RaMP. Replace with your MySQL password, or "" if not using a password.
password <- unlist(unname(read.csv(file.path("~/mysql_pw.txt"), header = FALSE)))
pkg.globals <- setConnectionToRaMP(dbname = "ramp", username = "root", conpass = password,
  host = "localhost")

# Find which pairs share pathways.
sharesPathway <- lapply(1:nrow(myres.sig), function(i){
  shares <- FALSE
  pwayResult <- getPathwayFromAnalyte(c(metabid[[i]], geneid[[i]]))
  pwayResultMetab <- pwayResult$pathwayId[which(pwayResult$inputId == metabid[[i]])]
  pwayResultGene <- pwayResult$pathwayId[which(pwayResult$inputId == geneid[[i]])]
  if(length(intersect(pwayResultMetab, pwayResultGene)) > 0){
    shares <- TRUE
  }
  return(shares)
})

# Find which pairs share reactions.
sharesRxn <- lapply(1:nrow(myres.sig), function(i){
  shares <- FALSE
  tryCatch({
    rxnResult <- rampFastCata(metabid[[i]])$rxn_partner_ids
    rxnResultAll <- unlist(lapply(rxnResult, function(res){
      return(strsplit(res, "; ")[[1]])
    }))
    if(geneid[[i]] %in% rxnResultAll){
      shares <- TRUE
    }
  }, error = function(cond){})
  return(shares)
})

# Further analyses reported in paper for comparison with original data
print(summaryPairs[which(summaryPairs$Pair == "HHAT__V__cholesterol"),])

##                      Pair Perm.Count
## 2205 HHAT__V__cholesterol          0

print(summaryPairs[which(summaryPairs$Pair == "CDKN1A__V__cholesterol"),])

##                        Pair Perm.Count
## 1818 CDKN1A__V__cholesterol          0

# Compare to pairs that share pathways.
pathwayPairs <- paste(myres.sig[which(sharesPathway == TRUE), "Analyte1"], 
                        myres.sig[which(sharesPathway == TRUE), "Analyte2"], sep = "__V__")
length(which(summaryPairs[which(summaryPairs$Pair %in% pathwayPairs), "Perm.Count"] == 0))

## [1] 44

# Compare to pairs that share reactions.
rxnPairs <- paste(myres.sig[which(sharesRxn == TRUE), "Analyte1"], 
                        myres.sig[which(sharesRxn == TRUE), "Analyte2"], sep = "__V__")
length(which(summaryPairs[which(summaryPairs$Pair %in% rxnPairs), "Perm.Count"] == 0))

## [1] 0

This chunk evaluates a subset of pairs using glm, illustrating the difference in running time between glm and IntLIM. This evaluation was conducted strictly for publication purposes and isn’t something the typical user will need to run. Nevertheless, the code is included here for those who wish to replicate the results of the paper.

library(stats)
library(rsq)

# Subset data.
inputDatafilt@analyteType1 <- inputDatafilt@analyteType1[1:10,]

# Run IntLIM.
myres <- IntLIM::RunIntLim(inputDatafilt, stype="drugscore", continuous = TRUE,
                           independent.var.type = 1, outcome = 2)

# Run GLM.
starttime <- Sys.time()
subdat <- NULL
suppressMessages({pvals = lapply(rownames(inputDatafilt@analyteType2), function(m){
  pvals_single = lapply(rownames(inputDatafilt@analyteType1), function(g){
    subdat = data.frame(drugscore=inputDatafilt@sampleMetaData,
                        m=inputDatafilt@analyteType2[m,],
                        g=inputDatafilt@analyteType1[g,])
    attach(subdat)
    form_glm = "m ~ g + drugscore + g:drugscore"
    myfit = glm(form_glm, data = subdat)

    pvals = coef(summary(myfit))[,4]
    result = data.frame(rsq(myfit), pvals["g"], pvals["g:drugscore"],
                        pvals["drugscore"])
    colnames(result) = c("Rsquared", "Pval_gene", "Pval_interaction", "Pval_drugscore")
    rownames(result) = paste(m, g, sep = "_")
    return(result)
  })
  result_metab = do.call("rbind", pvals_single)
})})
endtime <- Sys.time()
print(endtime - starttime)

References

Su, G., Burant, C.F., Beecher, C.W., Athey, B.D. and Meng, F. (2011) Integrated metabolome and transcriptome analysis of the NCI60 dataset. BMC bioinformatics, 12, S36.
Reinhold, W. C., Sunshine, M., Liu, H., Varma, S., Kohn, K. W., Morris, J., Doroshow, J., & Pommier, Y. (2012). CellMiner: a web-based suite of genomic and pharmacologic tools to explore transcript and drug patterns in the NCI-60 cell line set. Cancer Research, 72(14), 3499. https://doi.org/10.1158/0008-5472.CAN-12-1370.