Manhattan plots are a widely used tool in statistical genetics to visualise the results of genome-wide association studies (GWAS). While they are simply scatter plots, the number of points to display is often in the millions and this makes them impractical to render as vector graphics in formats such as PDF or SVG. This is unfortunate as these formats are the gold standard for technical plots and are often requested by academic journals when submitting an article for publication.
This repo contains an R function to generate Manhattan plots with ggplot2 that can quickly be exported to a moderately sized PDF file with ggplot2::ggsave (they can also be exported to SVG, although the resulting file is larger; I haven't tested other vector formats). It is based on Yan Holtz's excellent Manhattan plot function, produced for the R Graph Gallery, which I extended by merging overlapping points into single shapes to simplify the resulting output.
The key idea is to use software for processing and plotting geographic features (specifically, I use the sf R package) to convert each data point into a circle (a polygon in simple features language), merge overlapping circles into a single shape (perform a union operation) and finally plot these simplified shapes.
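The three steps above (point to circle, union, plot) can be sketched with sf on toy data. This is only an illustration of the idea, not the code in this repo: the data, the circle radius of 0.3 and the fill colour are arbitrary assumptions, and the actual function may handle axis scaling, colours per chromosome and other details differently.

```r
library(sf)
library(ggplot2)

# Toy data: 200 random points in plot coordinates (hypothetical values).
set.seed(1)
pts <- data.frame(x = runif(200, 0, 10), y = runif(200, 0, 10))

# Step 1: turn each row into a point geometry. No CRS is set because
# these are plain plot coordinates, not geographic ones.
pts_sf <- st_as_sf(pts, coords = c("x", "y"))

# Step 2: buffer each point into a circle (a polygon approximating a disc).
# The radius 0.3 is an arbitrary choice for this sketch.
circles <- st_buffer(pts_sf, dist = 0.3)

# Step 3: merge all overlapping circles into one (multi)polygon.
merged <- st_union(circles)

# Step 4: draw the merged shape instead of 200 separate points.
p <- ggplot() +
  geom_sf(data = merged, fill = "steelblue", colour = NA) +
  theme_minimal()
```

Because overlapping circles collapse into shared outlines, the merged geometry needs far fewer drawing instructions than one glyph per point, which is what keeps the exported vector file small.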
The function to generate these plots is provided in the fn-ggmanh_vec.R script and is named ggmanh_vec. Please see the beginning of this file for a description of the different options available.
To illustrate how to use the function, I provide a simple reproducible example in the script make-manhattan-vec.R. I start by downloading summary statistics from a GWAS of standing height in European-ancestry samples in the UK Biobank, which includes 10.6 million variants, from Watanabe et al. (2019) (the article's preprint is available on bioRxiv and the results can be downloaded from the GWAS Atlas). I then call ggmanh_vec to make a basic Manhattan plot and export it to PDF format with ggplot2::ggsave, obtaining the file ukb-height-gwas.pdf, whose size is 773 kB.
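The export step can be sketched as follows. The plot here is a toy stand-in built with plain ggplot2 rather than the function from this repo (its arguments are documented at the top of its script), and the file name, dimensions and data are assumptions made for the example.

```r
library(ggplot2)

# Toy stand-in for a Manhattan plot (hypothetical data, not the GWAS results).
set.seed(1)
toy <- data.frame(pos = seq_len(500), logp = -log10(runif(500)))
p <- ggplot(toy, aes(x = pos, y = logp)) +
  geom_point(size = 0.5) +
  labs(x = "Position", y = "-log10(p)")

# Write the plot to a vector PDF; the device is chosen from the file
# extension, and width and height are in inches by default.
out_file <- file.path(tempdir(), "toy-manhattan.pdf")
ggsave(out_file, plot = p, width = 10, height = 4)

# Check the size of the exported file, in kB.
size_kb <- file.size(out_file) / 1000
message("PDF size: ", round(size_kb, 1), " kB")
```

Checking the file size after export is a quick way to confirm that the output stays small enough for submission.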
Making the example Manhattan plot mentioned above takes approximately 33 min using four Intel Skylake 2.4 GHz processors, each with 16 GB of RAM. A simplified version in which we only plot points with p-value lower than or equal to 0.01 (approximately 1.2 million points) takes only 3 min 55 s with the same resources and produces the file ukb-height-gwas-pv2.pdf.
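The p-value filter used for the simplified version can be sketched as below. The data frame and its column names (chr, bp, p) are hypothetical; the real summary statistics may use different names.

```r
# Simulated summary statistics (hypothetical column names and values).
set.seed(42)
n <- 10000
gwas <- data.frame(
  chr = sample(1:22, n, replace = TRUE),
  bp  = sample.int(1e6, n, replace = TRUE),
  p   = runif(n)
)

# Keep only variants with p-value lower than or equal to 0.01.
gwas_sub <- gwas[gwas$p <= 0.01, ]

# Fraction of variants retained after filtering.
frac_kept <- nrow(gwas_sub) / nrow(gwas)
message(nrow(gwas_sub), " of ", nrow(gwas), " variants kept")
```

The points removed all lie in the dense band at the bottom of the plot, which is why dropping them cuts the run time so sharply.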
The following R packages are required: doParallel, dplyr, foreach, ggplot2 and sf.
In addition, the sf R package requires that the GDAL library be available on the system.
This code has been tested in R 4.2.1 (on CentOS 7.9.2009 with GDAL 3.5.0 installed) with doParallel 1.0.17, dplyr 1.0.9, foreach 1.5.2, ggplot2 3.4.2 and sf 1.0-7.
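To compare a local setup with the versions listed above, a small check along these lines may help. It is a convenience sketch and not part of the repo's scripts; it only reports what is installed and does not install anything.

```r
# Packages named in the tested configuration above.
required <- c("doParallel", "dplyr", "foreach", "ggplot2", "sf")

# TRUE for each package that can be loaded, FALSE otherwise.
installed <- vapply(required, requireNamespace, logical(1), quietly = TRUE)

# Print the installed version of each available package.
for (pkg in required[installed]) {
  message(pkg, " ", as.character(packageVersion(pkg)))
}

# List anything that is missing.
if (any(!installed)) {
  message("Missing: ", paste(required[!installed], collapse = ", "))
}

# If sf is available, show the versions of the external libraries it
# was linked against (including GDAL).
if (installed[["sf"]]) {
  print(sf::sf_extSoftVersion())
}
```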