I present here an R function to generate Manhattan plots using ggplot
. It also returns a list of significant SNPs, according to different thresholds, if desired.
From Wikipedia, the free encyclopedia
A Manhattan plot is a type of scatter plot, usually used to display data with a large number of data-points - many of non-zero amplitude, and with a distribution of higher-magnitude values, for instance in genome-wide association studies (GWAS). In GWAS Manhattan plots, genomic coordinates are displayed along the X-axis, with the negative logarithm of the association P-value for each single nucleotide polymorphism (SNP) displayed on the Y-axis, meaning that each dot on the Manhattan plot signifies a SNP. Because the strongest associations have the smallest P-values (e.g., 10−15), their negative logarithms will be the greatest (e.g., 15).
Thus, the myManhattan
function has been used to show differential methylation as shown in a recent paper published in Clinical Epigenetics
Download the repository from github or clone it by typing in the terminal
git clone https://github.com/alfonsosaera/myManhattan.git
You can copy the function from the myManhattanFunction.R
file or load it using source
source("myManhattanFunction.R")
Load data and take a look to the file to show the required structure
ex <- read.table("example.txt")
head(ex)
## SNP CHR BP P
## rs12124819 rs12124819 1 766409 0.7668670
## rs28705211 rs28705211 1 890368 0.8026096
## rs9777703 rs9777703 1 918699 0.3812962
## rs3121567 rs3121567 1 933331 0.6391422
## rs9442372 rs9442372 1 1008567 0.6082297
## rs3737728 rs3737728 1 1011278 0.9558932
Using function with defaults settings
myManhattan(ex)
## Loading required package: ggplot2
The graph.title
argument is used to specify a title:
myManhattan(ex, graph.title = "My Manhattan Plot")
Use the font.size
argument (default is 12) to modify the size of all text elements in the graph.
myManhattan(ex, graph.title = "My Manhattan Plot", font.size = 15)
Where to draw a "genome-wide sigificant" (red) or "suggestive" (blue) line.
genomewideline
default is 5e-08.
suggestiveline
default is 1e-5.
Set to FALSE to disable.
myManhattan(ex, graph.title = "My Manhattan Plot",
suggestiveline = FALSE, genomewideline = 1e-8)
Both can be specified.
myManhattan(ex, graph.title = "My Manhattan Plot",
suggestiveline = 2e-4, genomewideline = 1e-6)
Line colors can also be customized if desired.
genomewidecolor
default is "red".
suggestivecolor
default is "blue".
myManhattan(ex, graph.title = "My Manhattan Plot",
suggestiveline = 2e-4, suggestivecolor = "orange",
genomewideline = 1e-06, genomewidecolor = "pink")
Use the highlight
argument (default is NULL
). The highlight.col
argument sets the color (Default is "green").
If set to a numeric value, all SNPs with a p-value lower than the specified value are marked.
myManhattan(ex, graph.title = "My Manhattan Plot", suggestiveline = FALSE,
genomewideline = 1e-8, highlight = 1e-8)
You can also mark specific SNP providing the names
my.SNPs <- as.character(ex$SNP[ex$P < 1e-4])
myManhattan(ex, graph.title = "My Manhattan Plot", suggestiveline = 1e-6,
genomewideline = 1e-8, highlight = my.SNPs)
Proportion of each chromosome is modified with even.facet
argument.
myManhattan(ex, graph.title = "My Manhattan Plot", suggestiveline = FALSE,
genomewideline = 1e-8, highlight = 1e-8, even.facet = T)
Specify chromosome names with crhom.lab
argument
myManhattan(ex, graph.title = "My Manhattan Plot", suggestiveline = FALSE,
genomewideline = 1e-8, highlight = 1e-8, even.facet = T,
chrom.lab = c(as.character(1:22),"X","Y","MT"))
Specify chromosome colors with col
argument, default is c("lightblue", "blue")
. See http://www.stat.columbia.edu/~tzheng/files/Rcolor.pdf for a list of colors. See https://www.r-bloggers.com/palettes-in-r/ for pallete use in R.
Example using rainbow
default R pallete. If the colors are too bright, try col = rainbow(25, s= 0.75)
.
myManhattan(ex, graph.title = "My Manhattan Plot",
suggestiveline = FALSE, genomewideline = 1e-8,
col = rainbow(25), even.facet = T,
chrom.lab = c(as.character(1:22),"X","Y","MT"))
Another example using the dark
pallete of the RColorBrewer
package. Note that the brewer.pal
function must be used to generate the color pallete.
library(RColorBrewer)
myManhattan(ex, graph.title = "My Manhattan Plot",
col = brewer.pal(8, "Dark2"),
highlight = 1e-4,
suggestiveline = 1e-2,
genomewideline = 1e-4,
chrom.lab = c(as.character(1:22),"X","Y","MT"))
Specify significance levels with significance
argument. This argument overrides genomewideline
and suggestiveline
. When argument report
is set to TRUE
the info of the significant SNPs is printed.
significance
argument can be a specific number
myManhattan(ex, graph.title = "My Manhattan Plot", suggestiveline = 2e-4,
genomewideline = 1e-6, even.facet = T,
chrom.lab = c(as.character(1:22),"X","Y","MT"), significance = 3e-5,
report = TRUE)
## SNP CHR BP P
## rs6550962 rs6550962 3 25356968 2.462824e-05
## rs10027212 rs10027212 4 30585306 1.357062e-05
## rs4733560 rs4733560 8 128848183 2.789866e-09
## rs10112382 rs10112382 8 128853579 9.641204e-16
## rs17769347 rs17769347 18 36989057 2.663669e-05
If significance
is set to "Bonferroni", genomewideline
is set to the corrected significance level and suggestiveline
is modified accordingly.
myManhattan(ex, graph.title = "My Manhattan Plot", suggestiveline = 2e-4,
genomewideline = 1e-6, even.facet = T,
chrom.lab = c(as.character(1:22),"X","Y","MT"), significance = "Bonferroni",
report = TRUE)
## Bonferroni correction significance level: 5.039103e-07
## SNP CHR BP P
## rs4733560 rs4733560 8 128848183 2.789866e-09
## rs10112382 rs10112382 8 128853579 9.641204e-16
If significance
is set to "FDR", genomewideline
is set to 0.05
and suggestiveline
to FALSE
.
myManhattan(ex, graph.title = "My Manhattan Plot", suggestiveline = 2e-4,
genomewideline = 1e-6, even.facet = T,
chrom.lab = c(as.character(1:22),"X","Y","MT"), significance = "FDR",
report = TRUE)
## SNP CHR BP P fdr
## rs4733560 rs4733560 8 128848183 2.789866e-09 1.384108e-04
## rs10112382 rs10112382 8 128853579 9.641204e-16 9.566388e-11
Since the function returns a ggplot
object you can add your own modifications.
First, store the function output in an object
mM <- myManhattan(ex, graph.title = "My Manhattan Plot", even.facet = T,
chrom.lab = c(as.character(1:22),"X","Y","MT"))
mM
Then, add your modifications. In this case, change the title of Y axis
mM + ylab(expression("Association to Phenotype (" * -log[10](italic(p)) *")"))