sva_recount.Rmd

---
title: "SVA Recount"
author: "Christopher Lo"
date: "9/19/2022"
output: pdf_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

```{r echo=F, message=F, warning=F}
library(tidyverse)
library(recount3)
library(sva)
library(DESeq2)
library(grid)
library(gridExtra)
library(fastDummies)
library(ggfortify)
theme_set(theme_bw())
```


## Introduction

*What happens to the number of Surrogate Variables as we increase the number of studies in recount3?*

To illustrate this, we examined 4 studies that involved case/control prostate tumor and normal samples,
with a minimal of 25 samples per study. For each study, we examined the number of SVs (using method = "be"), 
and we also aggregated them one by one and examined the number of SVs. 

1.    SRP118614: "Overall design: Matched high-grade (GS=7(4+3)) prostate tumor and adjacent
normal specimens from 16 patients (8 AAM and 8 EAM) were subjected to two 
replicate runs of RNA-sequencing."

2.    SRP002628: "Overall design: We sequenced the transcriptome (polyA+) of 20 prostate cancer 
tumors and 10 matched normal tissues using Illumina GAII platform. 
Then we used bioinformatic approaches to identify prostate cancer specific 
aberrations which include gene fusion, alternative splicing, somatic mutation."

3.    SRP212704: "Overall design: Strand specific total RNA seq was performed using frozen 
patient matched prostate cancer tissue in biological duplicates.
Purpose: The goal of present study is to compare transcript level changes
between normal and tumor of same individuals"  

4.    SRP027258: "We utilized RNA sequencing to test the hypothesis that SFN modifies the 
expression of genes that are critical in prostate cancer progression. 
Normal prostate epithelial cells, and androgen-dependent and androgen-independent 
prostate cancer cells were treated with 15 µM SFN and the transcriptome was 
determined at 6 and 24 hour time points."

We also have a second case/control set on TB samples. 

1.    SRP098758 (n=428): "Samples are collected from subjects in a household contact study after a person comes back to the household, diagnosed with TB. Samples are collected every 6 months up to, 18 months. Some people go on to develop TB (cases) where as some others do not (controls). Here we are trying to establish a gene signature to predict the occurrence of TB."

2.    ERP115010 (n=360): "Close contacts of active TB were defined as individuals with a cumulative duration of exposure of greater than eight hours in a confined space to the index case prior to initiation of treatment. Known human immunodeficiency virus (HIV)-positive patients were excluded. At enrollment, interferon gamma release assays (IGRAs) were done using the QuantiFERON-TB Plus assay (Qiagen, Germany), and peripheral blood was collected into Tempus tubes for whole genome transcriptional profiling by RNA sequencing. Participants who progressed to active TB were identified by linkage with the national electronic TB register. Local case notes were reviewed to identify individuals who had received preventative treatment. This submission contains data from n=360 adult participants, of which n=9 progressed to TB during a median follow-up time of 1.9 years. [note: we identify case base on IGRA assay result, which does not distinguish active vs. latent TB. ]

3. SRP126580 (n=54): "Overall design: We undertook RNA Sequencing (RNA-Seq) of our earlier Berry et al. 2010 (GSE19444 and GSE19442) cohorts and additionally set up a prospective cohort study at Leicester (UK) in subject groups of incident TB and recent TB contacts, respectively. In the Leicester cohort, we performed systematic longitudinal sampling and clinical characterisation first, to validate our TB signature using RNA-Seq in a new and independent cohort of individuals with active TB and LTBI, and secondly to provide longitudinal data in a low TB incidence setting. All samples in this series were re-analyzed from GSE19444. There are links on each sample page to the original sample." [*QC note: 28% of genes have count of 0*]


```{r echo=F, message=F}
#Setup recount3
human_projects <- available_projects()
```

```{r echo=F, message=F}
#Studies we want to look at, and what we should grep for to find the case status
#under "sra.sample_attributes"
prostateCancerStudies = data.frame(studyName = c("SRP118614", "SRP002628", "SRP212704", "SRP027258"),
                        caseStatus_grep = c("prostate tumor", "Prostate cancer tissue", "Tumor", "prostate cancer"))

TB_Studies = data.frame(studyName = c("SRP098758", "ERP115010", "SRP126580"),
                        caseStatus_grep = c("case", "positive", "Active_TB"))
```


```{r echo=F, message=F}
#Define analysis functions.

analyzeSingleStudy = function(studyName, caseStatus_grep, logPlot = F) {
  #Load in phenotype metadata
  proj_info <- subset(human_projects, project == studyName)
  rse_gene <- create_rse(proj_info)
  phenotype = data.frame(colData(rse_gene))
  phenotype$caseStatus = grepl(caseStatus_grep, phenotype$sra.sample_attributes)
  mod = model.matrix(~caseStatus, data = phenotype)
  #Load in raw counts, and nomralize it via sizeFactors from DESeq. 
  geneCounts = assays(rse_gene)$raw_counts
  dds <- DESeqDataSetFromMatrix(countData = geneCounts, colData = phenotype, design = ~ caseStatus)
  dds <- estimateSizeFactors(dds)
  geneCountsNormalized <- counts(dds, normalized = TRUE)
  #Summary statistics of sizeFactors. 
  print(qplot(sizeFactors(dds), main = "Sizefactor estimates per sample"))
  print(qplot(colSums(geneCountsNormalized), main = paste0("Normalized total counts per sample for ", studyName)))
  
  #Estimate number of SVs
  n.sv = num.sv(geneCountsNormalized, mod, method = "be", vfilter = 3000)
  #PCA analysis
  pca = prcomp(t(geneCountsNormalized))
  variance = pca$sdev^2 / sum(pca$sdev^2)
  variance = variance[1:round(ncol(geneCounts) / 4, 1)]
  #PCA plots
  g = qplot(c(1:length(variance)), variance) + geom_line() + geom_point() +
        geom_hline(yintercept = 1/ncol(geneCounts), linetype = "dashed") +
        xlab("Principal Component") + ylab("Variance Explained") + 
        ggtitle(paste0(studyName, "\nNumber of SVs: ", n.sv))  + 
        ylim(0, 1)
  print(g)
  if(logPlot) {
    g = qplot(c(1:length(variance)), log10(variance)) + geom_line() + geom_point() +
        geom_hline(yintercept = log10(1/ncol(geneCounts)), linetype = "dashed") +
        xlab("Principal Component") + ylab("log10(Variance Explained)") + 
        ggtitle(paste0(studyName, "\nNumber of SVs: ", n.sv)) 
    print(g)
  }
  print(autoplot(pca, data = phenotype, colour = 'caseStatus'))
  
  #Estimate SVs:
  nullMod = as.matrix(mod[, 1])
  svobj = sva(geneCountsNormalized, mod, nullMod, n.sv = n.sv, vfilter = 3000)
  
  #Regression
  modsv = cbind(mod, svobj$sv)
  fitsv = lm.fit(modsv, t(geneCountsNormalized))
  
  return(list(svobj = svobj,
              geneCounts = geneCounts, #return raw counts. 
              mod = mod,
              fitsv = fitsv))
}
```


```{r echo=F, message=F}
#Define analysis functions.

sum_of_span_residue = function(X, Y) {
  X = as.matrix(X)
  Y = as.matrix(Y)
  stopifnot(nrow(X) == nrow(Y))
  #normalize X and Y by columns
  X = scale(X)
  Y = scale(Y)
  mod1 = lm.fit(X, Y)
  mod1_res = norm(mod1$fitted.values - Y, type = "F") / (nrow(Y) * ncol(Y))
  mod2 = lm.fit(Y, X)
  mod2_res = norm(mod2$fitted.values - X, type = "F") / (nrow(X) * ncol(X))
  return(mod1_res + mod2_res)
}

analyzeMultipleStudies = function(mods, geneCounts, singleStudy_SV, studyNames, forcedNumSV, logPlot = F) {
  stopifnot(class(mods) == "list", class(geneCounts) == "list", class(studyNames) == "character")
  stopifnot(length(mods) > 1, length(geneCounts) > 1, length(studyNames) > 1)
  
  #Get study variable
  studyNamesLong = character()
  for(j in 1:length(studyNames)) {
    studyNamesLong = c(studyNamesLong,
                       rep(studyNames[j], ncol(geneCounts[[j]])))
  }
  
  #Get sample size for each study
  singleStudy_size = unlist(lapply(singleStudy_SV, function(x) nrow(x$sv)))
  #Collapse geneCounts to dataframe
  geneCounts = do.call(cbind, geneCounts)
  #Collapse mods to dataframe, create dummy vars
  mods = do.call(rbind, mods)
  mods = dummy_cols(mods, select_columns = "i", remove_selected_columns = T, 
                    remove_first_dummy = T)
  mods = as.matrix(mods)
  
  #normalize across studies
  mods2 = as.data.frame(mods)
  for(col in 1:ncol(mods2)) {
    mods2[, col] = as.factor(mods2[, col])
  }
  dds <- DESeqDataSetFromMatrix(countData = geneCounts, colData = mods2, design = ~ caseStatusTRUE)
  dds <- estimateSizeFactors(dds)
  geneCountsNormalized <- counts(dds, normalized=TRUE)
  
  #SVA analysis
  n.sv = num.sv(geneCountsNormalized, as.matrix(mods), method = "be", vfilter = 3000)
  #PCA analysis
  pca = prcomp(t(geneCountsNormalized))
  variance = pca$sdev^2 / sum(pca$sdev^2)
  variance = variance[1:round(ncol(geneCounts) / 4, 1)]
  #PCA plots
  g = qplot(c(1:length(variance)), variance) + geom_line() + geom_point() + 
        geom_hline(yintercept = 1/ncol(geneCounts), linetype = "dashed") +
        xlab("Principal Component") + ylab("Variance Explained") + 
        ggtitle(paste0(studyNames, collapse = " + "),
                subtitle = paste0("Number of SVs: ", n.sv)) + 
        ylim(0, 1)
  print(g)
  if(logPlot) {
    g = qplot(c(1:length(variance)), log10(variance)) + geom_line() + geom_point() +
        geom_hline(yintercept = log10(1/ncol(geneCounts)), linetype = "dashed") +
        xlab("Principal Component") + ylab("log10(Variance Explained)") + 
        ggtitle(paste0(studyNames, collapse = " + "),
                subtitle = paste0("Number of SVs: ", n.sv)) 
    print(g)
  }

  mods_plot = as.data.frame(mods)
  mods_plot$studyName = studyNamesLong
  mods_plot$caseStatusTRUE = as.factor(mods_plot$caseStatusTRUE)
  print(autoplot(pca, data = mods_plot, colour = 'studyName', shape = 'caseStatusTRUE'))
  
  #Infer SV
  nullMods = mods[, -2]
  svobj = sva(geneCountsNormalized, mods, nullMods, n.sv = n.sv, vfilter = 3000)
  #Forced SVs
  span_residue = expand.grid(n.sv = 1:(forcedNumSV + 6),
                             study = 1:length(singleStudy_SV))
  span_residue$residue = NA

  for(n.sv in unique(span_residue$n.sv)) {
    forced_svobj = tryCatch({
       sva(geneCountsNormalized, mods, nullMods, n.sv = n.sv, vfilter = 3000)
    }, error = function(e) {
        joint_svobj_forced = NA
    }, finally = {})
    for(study in unique(span_residue$study)) {
      if(study == 1) {
        idx_start = 1
        idx_end = singleStudy_size[1]
      }else {
        idx_start = singleStudy_size[study - 1] + 1
        idx_end = singleStudy_size[study - 1] + singleStudy_size[study]
      }
      if(!is.na(forced_svobj)) {
        residue = sum_of_span_residue(singleStudy_SV[[study]]$sv,
                                      forced_svobj$sv[idx_start:idx_end ,])
        span_residue$residue[span_residue$n.sv == n.sv & span_residue$study == study] = residue
      }
    }
  }
  
  return(list(svobj = svobj,
              span_residue = span_residue))
}
```

```{r echo=F, message=F}
#Run analysis here.
#runAnalysis = function(study_info, logPlot = F) {
study_info = prostateCancerStudies
logPlot = F

agg_mods = list()
agg_geneCounts = list()
singleStudy_SV = list()
singleStudy_coefs = list()
multiStudy_SV = list()

multiStudy_span_residue = list()
expected_multiStudy_SV = 0
expected_multiStudy_SV_all = c()

for(i in 1:nrow(study_info)) {
  cat(i, "\n")
  study_i = analyzeSingleStudy(study_info$studyName[i], 
                               study_info$caseStatus_grep[i],
                               logPlot)
  
  agg_mods[[i]] = as.matrix(cbind(study_i$mod, i))
  agg_geneCounts[[i]] = study_i$geneCounts
  singleStudy_SV[[i]] = study_i$svobj
  singleStudy_coefs[[i]] = study_i$fitsv$coefficients
  expected_multiStudy_SV = expected_multiStudy_SV + study_i$svobj$n.sv
  
  if(i > 1) {
    multiStudy_i = analyzeMultipleStudies(agg_mods, 
                                          agg_geneCounts, 
                                          singleStudy_SV,
                                          study_info$studyName[1:i],
                                          expected_multiStudy_SV,
                                          logPlot)
    expected_multiStudy_SV_all[i] = expected_multiStudy_SV
    multiStudy_SV[[i]] = multiStudy_i$svobj
    multiStudy_span_residue[[i]] = multiStudy_i$span_residue
  }
  
}
```


```{r echo=F, message=F, fig.width=12, fig.height=10}
residue_plot_max = .2
plots = list()
blank <- grid.rect(gp=gpar(col="white"))

plots[[1]] = ggplot(multiStudy_span_residue[[2]] %>% filter(study == 1), aes(n.sv, residue)) + geom_point() + geom_line() + geom_vline(xintercept = multiStudy_SV[[2]]$n.sv, color = "red") + geom_vline(xintercept = expected_multiStudy_SV_all[2], color = "blue") + scale_y_continuous(limits = c(0, residue_plot_max))

plots[[2]] = ggplot(multiStudy_span_residue[[2]] %>% filter(study == 2), aes(n.sv, residue)) + geom_point() + geom_line() +  geom_vline(xintercept = multiStudy_SV[[2]]$n.sv, color = "red") + geom_vline(xintercept = expected_multiStudy_SV_all[2], color = "blue") + scale_y_continuous(limits = c(0, residue_plot_max))

plots[[3]] = blank
plots[[4]] = blank

plots[[5]] = ggplot(multiStudy_span_residue[[3]] %>% filter(study == 1), aes(n.sv, residue)) + geom_point() + geom_line() + geom_vline(xintercept = multiStudy_SV[[3]]$n.sv, color = "red") + geom_vline(xintercept = expected_multiStudy_SV_all[3], color = "blue") + scale_y_continuous(limits = c(0, residue_plot_max))

plots[[6]] = ggplot(multiStudy_span_residue[[3]] %>% filter(study == 2), aes(n.sv, residue)) + geom_point() + geom_line() + geom_vline(xintercept = multiStudy_SV[[3]]$n.sv, color = "red") + geom_vline(xintercept = expected_multiStudy_SV_all[3], color = "blue") + scale_y_continuous(limits = c(0, residue_plot_max))

plots[[7]] = ggplot(multiStudy_span_residue[[3]] %>% filter(study == 3), aes(n.sv, residue)) + geom_point() + geom_line() + geom_vline(xintercept = multiStudy_SV[[3]]$n.sv, color = "red") + geom_vline(xintercept = expected_multiStudy_SV_all[3], color = "blue") + scale_y_continuous(limits = c(0, residue_plot_max))

plots[[8]] = blank

plots[[9]] = ggplot(multiStudy_span_residue[[4]] %>% filter(study == 1), aes(n.sv, residue)) + geom_point() + geom_line() + geom_vline(xintercept = multiStudy_SV[[4]]$n.sv, color = "red") + geom_vline(xintercept = expected_multiStudy_SV_all[4], color = "blue") + scale_y_continuous(limits = c(0, residue_plot_max))

plots[[10]] = ggplot(multiStudy_span_residue[[4]] %>% filter(study == 2), aes(n.sv, residue)) + geom_point() + geom_line() + geom_vline(xintercept = multiStudy_SV[[4]]$n.sv, color = "red") + geom_vline(xintercept = expected_multiStudy_SV_all[4], color = "blue") + scale_y_continuous(limits = c(0, residue_plot_max))

plots[[11]] = ggplot(multiStudy_span_residue[[4]] %>% filter(study == 3), aes(n.sv, residue)) + geom_point() + geom_line() + geom_vline(xintercept = multiStudy_SV[[4]]$n.sv, color = "red") + geom_vline(xintercept = expected_multiStudy_SV_all[4], color = "blue") + scale_y_continuous(limits = c(0, residue_plot_max))

plots[[12]] = ggplot(multiStudy_span_residue[[4]] %>% filter(study == 4), aes(n.sv, residue)) + geom_point() + geom_line() + geom_vline(xintercept = multiStudy_SV[[4]]$n.sv, color = "red") + geom_vline(xintercept = expected_multiStudy_SV_all[4], color = "blue") + scale_y_continuous(limits = c(0, residue_plot_max))

grid.arrange(grobs = plots, nrow = 3)
```


```{r echo=F, message=F, fig.width=12, fig.height=9}
myPairs = function(df1, df2, bottom = "", left = "") {
  stopifnot(nrow(df1) == nrow(df2))
  plots = list()
  k = 1
  for(i in 1:ncol(df1)) {i
    for (j in 1:ncol(df2)) {
      correlation = round(cor(df1[, i], df2[, j]), 2)
      df = data.frame(x = df1[, i], y = df2[, j])
      plots[[k]] = ggplot(df, aes(x, y)) + geom_point(size = .8) + ggtitle(correlation) +
        theme(axis.text.x=element_blank(),
              axis.text.y=element_blank(),
              axis.ticks=element_blank(),
              legend.position="none",
              axis.title.x=element_blank(),
              axis.title.y=element_blank())
      k = k + 1
    }
  }
  top = paste0("SV Spanning Score: ", round(sum_of_span_residue(df1, df2), 4))
  print(marrangeGrob(plots, nrow = ncol(df1), ncol = ncol(df2), top = top, bottom = bottom, left = left))
}


```

```{r echo=F}
coefs = cbind(t(singleStudy_coefs[[1]][3:5 ,]),
              t(singleStudy_coefs[[2]][3:6 ,]),
              singleStudy_coefs[[3]][3 ,],
              t(singleStudy_coefs[[4]][3:4 ,]))
cor(coefs)
```


```{r echo=F, fig.width=10, fig.height=8}

# Compare the SVs to make sure they are similar between the joint and separate 
# analyses by plotting combinations of SVs against each other for the same samples
# 
# multiStudy2SV1 = multiStudy_SV[[2]]$sv[1:nrow(singleStudy_SV[[1]]$sv) ,]
# multiStudy2ForcedSV1 = multiStudy_forced_SV[[2]]$sv[1:nrow(singleStudy_SV[[1]]$sv) ,]
# multiStudy2SV2 = multiStudy_SV[[2]]$sv[(nrow(singleStudy_SV[[1]]$sv)+1):nrow(multiStudy_forced_SV[[2]]$sv) ,]
# multiStudy2ForcedSV2 = multiStudy_forced_SV[[2]]$sv[(nrow(singleStudy_SV[[1]]$sv)+1):nrow(multiStudy_forced_SV[[2]]$sv) ,]
# 
# multiStudy3SV1 = multiStudy_SV[[3]]$sv[1:nrow(singleStudy_SV[[1]]$sv) ,]
# multiStudy3ForcedSV1 = multiStudy_forced_SV[[3]]$sv[1:nrow(singleStudy_SV[[1]]$sv) ,]
# multiStudy3SV2 = multiStudy_SV[[3]]$sv[(nrow(singleStudy_SV[[1]]$sv)+1):(nrow(singleStudy_SV[[1]]$sv)+nrow(singleStudy_SV[[2]]$sv)) ,]
# multiStudy3ForcedSV2 = multiStudy_forced_SV[[3]]$sv[(nrow(singleStudy_SV[[1]]$sv)+1):(nrow(singleStudy_SV[[1]]$sv)+nrow(singleStudy_SV[[2]]$sv)) ,]
# multiStudy3SV3 = multiStudy_SV[[3]]$sv[(nrow(singleStudy_SV[[1]]$sv)+nrow(singleStudy_SV[[2]]$sv)+1):nrow(multiStudy_forced_SV[[3]]$sv) ,]
# multiStudy3ForcedSV3 = multiStudy_forced_SV[[3]]$sv[(nrow(singleStudy_SV[[1]]$sv)+nrow(singleStudy_SV[[2]]$sv)+1):nrow(multiStudy_forced_SV[[3]]$sv) ,]
# 
# multiStudy4SV1 = multiStudy_SV[[4]]$sv[1:nrow(singleStudy_SV[[1]]$sv) ,]
# multiStudy4ForcedSV1 = multiStudy_forced_SV[[4]]$sv[1:nrow(singleStudy_SV[[1]]$sv) ,]
# multiStudy4SV2 = multiStudy_SV[[4]]$sv[(nrow(singleStudy_SV[[1]]$sv)+1):(nrow(singleStudy_SV[[1]]$sv)+nrow(singleStudy_SV[[2]]$sv)) ,]
# multiStudy4ForcedSV2 = multiStudy_forced_SV[[4]]$sv[(nrow(singleStudy_SV[[1]]$sv)+1):(nrow(singleStudy_SV[[1]]$sv)+nrow(singleStudy_SV[[2]]$sv)) ,]
# multiStudy4SV3 = multiStudy_SV[[4]]$sv[(nrow(singleStudy_SV[[1]]$sv)+nrow(singleStudy_SV[[2]]$sv)+1):(nrow(singleStudy_SV[[1]]$sv)+nrow(singleStudy_SV[[2]]$sv)+nrow(singleStudy_SV[[3]]$sv)) ,]
# multiStudy4ForcedSV3 = multiStudy_forced_SV[[4]]$sv[(nrow(singleStudy_SV[[1]]$sv)+nrow(singleStudy_SV[[2]]$sv)+1):(nrow(singleStudy_SV[[1]]$sv)+nrow(singleStudy_SV[[2]]$sv)+nrow(singleStudy_SV[[3]]$sv)) ,]
# multiStudy4SV4 = multiStudy_SV[[4]]$sv[(nrow(singleStudy_SV[[1]]$sv)+nrow(singleStudy_SV[[2]]$sv) + nrow(singleStudy_SV[[3]]$sv)+1):nrow(multiStudy_forced_SV[[4]]$sv) ,]
# multiStudy4ForcedSV4 = multiStudy_forced_SV[[4]]$sv[(nrow(singleStudy_SV[[1]]$sv)+nrow(singleStudy_SV[[2]]$sv)+nrow(singleStudy_SV[[3]]$sv)+1):nrow(multiStudy_forced_SV[[4]]$sv) ,]
# 
# myPairs(singleStudy_SV[[1]]$sv, multiStudy2SV1, left = "Study 1", bottom = "Study 1 + 2")
# myPairs(singleStudy_SV[[2]]$sv, multiStudy2SV2, left = "Study 2", bottom = "Study 1 + 2")
# myPairs(singleStudy_SV[[1]]$sv, multiStudy2ForcedSV1, left = "Study 1", bottom = "Study 1 + 2 forced")
# myPairs(singleStudy_SV[[2]]$sv, multiStudy2ForcedSV2, left = "Study 2", bottom = "Study 1 + 2 forced")
# 
# myPairs(singleStudy_SV[[1]]$sv, data.frame(multiStudy3SV1), left = "Study 1", bottom = "Study 1 + 2 + 3")
# myPairs(singleStudy_SV[[2]]$sv, data.frame(multiStudy3SV2), left = "Study 2", bottom = "Study 1 + 2 + 3")
# myPairs(singleStudy_SV[[3]]$sv, data.frame(multiStudy3SV3), left = "Study 3", bottom = "Study 1 + 2 + 3")
# myPairs(singleStudy_SV[[1]]$sv, multiStudy3ForcedSV1,  left = "Study 1", bottom = "Study 1 + 2 + 3 forced")
# myPairs(singleStudy_SV[[2]]$sv, multiStudy3ForcedSV2, left = "Study 2", bottom = "Study 1 + 2 + 3 forced")
# myPairs(singleStudy_SV[[3]]$sv, multiStudy3ForcedSV3, left = "Study 3", bottom = "Study 1 + 2 + 3 forced")
# 
# myPairs(singleStudy_SV[[1]]$sv, multiStudy4SV1, left = "Study 1", bottom = "Study 1 + 2 + 3 + 4")
# myPairs(singleStudy_SV[[2]]$sv, multiStudy4SV2, left = "Study 2", bottom = "Study 1 + 2 + 3 + 4")
# myPairs(singleStudy_SV[[3]]$sv, multiStudy4SV3, left = "Study 3", bottom = "Study 1 + 2 + 3 + 4")
# myPairs(singleStudy_SV[[4]]$sv, multiStudy4SV4, left = "Study 4", bottom = "Study 1 + 2 + 3 + 4")
# myPairs(singleStudy_SV[[1]]$sv, multiStudy4ForcedSV1, left = "Study 1", bottom = "Study 1 + 2 + 3 + 4 forced")
# myPairs(singleStudy_SV[[2]]$sv, multiStudy4ForcedSV2, left = "Study 2", bottom = "Study 1 + 2 + 3 + 4 forced")
# myPairs(singleStudy_SV[[3]]$sv, multiStudy4ForcedSV3, left = "Study 3", bottom = "Study 1 + 2 + 3 + 4 forced")
# myPairs(singleStudy_SV[[4]]$sv, multiStudy4ForcedSV4, left = "Study 4", bottom = "Study 1 + 2 + 3 + 4 forced")

```


<!-- ## Analysis: Prostate Studies -->

<!-- ```{r echo=F} -->
<!-- runAnalysis(prostateCancerStudies) -->
<!-- ``` -->

<!-- ## Analysis: TB Studies -->

<!-- ```{r echo=F} -->

<!-- runAnalysis(TB_Studies, logPlot = T) -->

<!-- ``` -->