Skip to content

Latest commit

 

History

History
274 lines (190 loc) · 9.23 KB

README.md

File metadata and controls

274 lines (190 loc) · 9.23 KB

Manhattan plot with ggplot

I present here an R function to generate Manhattan plots using ggplot. It also returns a list of significant SNPs, according to different thresholds, if desired.

From Wikipedia, the free encyclopedia

A Manhattan plot is a type of scatter plot, usually used to display data with a large number of data-points - many of non-zero amplitude, and with a distribution of higher-magnitude values, for instance in genome-wide association studies (GWAS). In GWAS Manhattan plots, genomic coordinates are displayed along the X-axis, with the negative logarithm of the association P-value for each single nucleotide polymorphism (SNP) displayed on the Y-axis, meaning that each dot on the Manhattan plot signifies a SNP. Because the strongest associations have the smallest P-values (e.g., 10−15), their negative logarithms will be the greatest (e.g., 15).

Thus, the myManhattan function has been used to show differential methylation as shown in a recent paper published in Clinical Epigenetics

Installation

Download the repository from github or clone it by typing in the terminal

git clone https://github.com/alfonsosaera/myManhattan.git

Usage

You can copy the function from the myManhattanFunction.R file or load it using source

source("myManhattanFunction.R")

Input data

Load data and take a look to the file to show the required structure

ex <- read.table("example.txt")
head(ex)
##                   SNP CHR      BP         P
## rs12124819 rs12124819   1  766409 0.7668670
## rs28705211 rs28705211   1  890368 0.8026096
## rs9777703   rs9777703   1  918699 0.3812962
## rs3121567   rs3121567   1  933331 0.6391422
## rs9442372   rs9442372   1 1008567 0.6082297
## rs3737728   rs3737728   1 1011278 0.9558932

Using function with defaults settings

myManhattan(ex)
## Loading required package: ggplot2

Add title

The graph.title argument is used to specify a title:

myManhattan(ex, graph.title = "My Manhattan Plot")

Use the font.size argument (default is 12) to modify the size of all text elements in the graph.

myManhattan(ex, graph.title = "My Manhattan Plot", font.size = 15)

Indicative lines

Where to draw a "genome-wide sigificant" (red) or "suggestive" (blue) line. genomewideline default is 5e-08. suggestiveline default is 1e-5.

Set to FALSE to disable.

myManhattan(ex, graph.title = "My Manhattan Plot",
            suggestiveline = FALSE, genomewideline = 1e-8)

Both can be specified.

myManhattan(ex, graph.title = "My Manhattan Plot",
            suggestiveline = 2e-4, genomewideline = 1e-6)

Line colors can also be customized if desired. genomewidecolor default is "red". suggestivecolor default is "blue".

myManhattan(ex, graph.title = "My Manhattan Plot",
            suggestiveline = 2e-4, suggestivecolor = "orange",
            genomewideline = 1e-06, genomewidecolor = "pink")

Mark specific points

Use the highlight argument (default is NULL). The highlight.col argument sets the color (Default is "green").

If set to a numeric value, all SNPs with a p-value lower than the specified value are marked.

myManhattan(ex, graph.title = "My Manhattan Plot", suggestiveline = FALSE,
            genomewideline = 1e-8, highlight = 1e-8)

You can also mark specific SNP providing the names

my.SNPs <- as.character(ex$SNP[ex$P < 1e-4])

myManhattan(ex, graph.title = "My Manhattan Plot", suggestiveline = 1e-6,
            genomewideline = 1e-8, highlight = my.SNPs)

Graph appearance

Proportion of each chromosome is modified with even.facet argument.

myManhattan(ex, graph.title = "My Manhattan Plot", suggestiveline = FALSE,
            genomewideline = 1e-8, highlight = 1e-8, even.facet = T)

Specify chromosome names with crhom.lab argument

myManhattan(ex, graph.title = "My Manhattan Plot", suggestiveline = FALSE,
            genomewideline = 1e-8, highlight = 1e-8, even.facet = T,
            chrom.lab = c(as.character(1:22),"X","Y","MT"))

Specify chromosome colors with col argument, default is c("lightblue", "blue"). See http://www.stat.columbia.edu/~tzheng/files/Rcolor.pdf for a list of colors. See https://www.r-bloggers.com/palettes-in-r/ for pallete use in R.


Example using rainbow default R pallete. If the colors are too bright, try col = rainbow(25, s= 0.75).

myManhattan(ex, graph.title = "My Manhattan Plot",
            suggestiveline = FALSE, genomewideline = 1e-8,
            col = rainbow(25), even.facet = T,
            chrom.lab = c(as.character(1:22),"X","Y","MT"))

Another example using the dark pallete of the RColorBrewer package. Note that the brewer.pal function must be used to generate the color pallete.

library(RColorBrewer)

myManhattan(ex, graph.title = "My Manhattan Plot",
            col = brewer.pal(8, "Dark2"),
            highlight = 1e-4,
            suggestiveline = 1e-2,
            genomewideline = 1e-4,
            chrom.lab = c(as.character(1:22),"X","Y","MT"))

Significance level

Specify significance levels with significance argument. This argument overrides genomewideline and suggestiveline. When argument report is set to TRUE the info of the significant SNPs is printed.

significance argument can be a specific number

myManhattan(ex, graph.title = "My Manhattan Plot", suggestiveline = 2e-4,
            genomewideline = 1e-6, even.facet = T,
            chrom.lab = c(as.character(1:22),"X","Y","MT"), significance = 3e-5,
            report = TRUE)
##                   SNP CHR        BP            P
## rs6550962   rs6550962   3  25356968 2.462824e-05
## rs10027212 rs10027212   4  30585306 1.357062e-05
## rs4733560   rs4733560   8 128848183 2.789866e-09
## rs10112382 rs10112382   8 128853579 9.641204e-16
## rs17769347 rs17769347  18  36989057 2.663669e-05

If significance is set to "Bonferroni", genomewideline is set to the corrected significance level and suggestiveline is modified accordingly.

myManhattan(ex, graph.title = "My Manhattan Plot", suggestiveline = 2e-4,
            genomewideline = 1e-6, even.facet = T,
            chrom.lab = c(as.character(1:22),"X","Y","MT"), significance = "Bonferroni",
            report = TRUE)
## Bonferroni correction significance level: 5.039103e-07
##                   SNP CHR        BP            P
## rs4733560   rs4733560   8 128848183 2.789866e-09
## rs10112382 rs10112382   8 128853579 9.641204e-16

If significance is set to "FDR", genomewideline is set to 0.05 and suggestiveline to FALSE.

myManhattan(ex, graph.title = "My Manhattan Plot", suggestiveline = 2e-4,
            genomewideline = 1e-6, even.facet = T,
            chrom.lab = c(as.character(1:22),"X","Y","MT"), significance = "FDR",
            report = TRUE)
##                   SNP CHR        BP            P          fdr
## rs4733560   rs4733560   8 128848183 2.789866e-09 1.384108e-04
## rs10112382 rs10112382   8 128853579 9.641204e-16 9.566388e-11

Further customization

Since the function returns a ggplot object you can add your own modifications.

First, store the function output in an object

mM <- myManhattan(ex, graph.title = "My Manhattan Plot", even.facet = T,
             chrom.lab = c(as.character(1:22),"X","Y","MT"))
mM

Then, add your modifications. In this case, change the title of Y axis

mM + ylab(expression("Association to Phenotype (" * -log[10](italic(p)) *")"))