diff --git a/dev/.documenter-siteinfo.json b/dev/.documenter-siteinfo.json
index 12bf948..00c17e5 100644
--- a/dev/.documenter-siteinfo.json
+++ b/dev/.documenter-siteinfo.json
@@ -1 +1 @@
-{"documenter":{"julia_version":"1.11.2","generation_timestamp":"2024-12-30T05:32:13","documenter_version":"1.8.0"}}
\ No newline at end of file
+{"documenter":{"julia_version":"1.11.2","generation_timestamp":"2025-01-18T01:30:01","documenter_version":"1.8.0"}}
\ No newline at end of file
diff --git a/dev/api/index.html b/dev/api/index.html
index 09ef468..a5eceb1 100644
--- a/dev/api/index.html
+++ b/dev/api/index.html
@@ -1,44 +1,44 @@
 API · GeneticsMakie

API

GeneticsMakie.findclosestgeneMethod
findclosestgene(chr::AbstractString, bp::Real, gencode::DataFrame; start::Bool, proteincoding::Bool)
-findclosestgene(df::DataFrame, gencode::DataFrame; start::Bool, proteincoding::Bool)

Find the closest gene(s) to a genomic coordinate or a list of genomic coordinates using gencode.

Optionally, the closest gene can be defined from the gene start site using start, and only protein coding genes can be considered using proteincoding. The default start and proteincoding are false.

source
GeneticsMakie.findgeneMethod
findgene(gene::AbstractString, gencode::DataFrame)

Find chromosome, gene start, and gene stop sites for the gene of interest.

source
GeneticsMakie.findgwaslociMethod
findgwasloci(gwas::DataFrame; p::Real)
-findgwasloci(gwas::Vector{DataFrame}; p::Real)

Find genome-wide significant loci for gwas that are separated from each other by at least 1 Mb.

Alternatively, find genome-wide significant loci across multiple gwas that are all separated by at least 1 Mb. p determines the genome-wide significance threshold, which is 5e-8 by default.

source
GeneticsMakie.labelgenomeMethod
labelgenome(g::GridPosition, chromosome::AbstractString, range1::Real, range2::Real)

Label g with a given chromosome and genomic range between range1 and range2.

source
GeneticsMakie.mungesumstats!Method
mungesumstats!(gwas::DataFrame)
-mungesumstats!(gwas::Vector{DataFrame})

Munge gwas by harmonizing the names of columns, their types, and P values, among others.

source
GeneticsMakie.parsegtf!Method
parsegtf!(gencode::DataFrame)

Parse gencode by extracting gene_id, gene_name, gene_type, transcript_id, transcript_support_level information from the info column.

source
GeneticsMakie.plotgenes!Method
plotgenes!(ax::Axis, chromosome::AbstractString, range1::Real, range2::Real, gencode::DataFrame)
+findclosestgene(df::DataFrame, gencode::DataFrame; start::Bool, proteincoding::Bool)

Find the closest gene(s) to a genomic coordinate or a list of genomic coordinates using gencode.

Optionally, the closest gene can be defined from the gene start site using start, and only protein coding genes can be considered using proteincoding. The default start and proteincoding are false.

source
GeneticsMakie.findgeneMethod
findgene(gene::AbstractString, gencode::DataFrame)

Find chromosome, gene start, and gene stop sites for the gene of interest.

source
GeneticsMakie.findgwaslociMethod
findgwasloci(gwas::DataFrame; p::Real)
+findgwasloci(gwas::Vector{DataFrame}; p::Real)

Find genome-wide significant loci for gwas that are separated from each other by at least 1 Mb.

Alternatively, find genome-wide significant loci across multiple gwas that are all separated by at least 1 Mb. p determines the genome-wide significance threshold, which is 5e-8 by default.

source
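
The 1 Mb separation rule described above can be illustrated with a small greedy sketch. This is only one plausible way to pick loci and is not necessarily how findgwasloci works internally; the helper name `pickloci`, its arguments, and the strict `p < threshold` comparison are all assumptions made for illustration.

```julia
# Hypothetical helper, not part of GeneticsMakie.jl: greedily pick lead variants
# that are at least `mindist` base pairs apart when on the same chromosome.
function pickloci(chr::Vector{String}, bp::Vector{Int}, p::Vector{Float64};
                  threshold::Real = 5e-8, mindist::Int = 1_000_000)
    order = sortperm(p)            # visit variants from most to least significant
    kept = Int[]
    for i in order
        p[i] < threshold || break  # all remaining variants are not significant
        # keep i only if it is far enough from every lead variant kept so far
        if all(chr[i] != chr[j] || abs(bp[i] - bp[j]) >= mindist for j in kept)
            push!(kept, i)
        end
    end
    return kept                    # indices of the selected lead variants
end

chr = ["1", "1", "1", "2"]
bp  = [1_000_000, 1_400_000, 3_000_000, 1_000_000]
p   = [1e-12, 1e-10, 1e-9, 1e-3]
loci = pickloci(chr, bp, p)
```

Here the second variant is dropped because it lies within 1 Mb of a more significant one, and the fourth is dropped because it does not pass the threshold.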
GeneticsMakie.labelgenomeMethod
labelgenome(g::GridPosition, chromosome::AbstractString, range1::Real, range2::Real)

Label g with a given chromosome and genomic range between range1 and range2.

source
GeneticsMakie.mungesumstats!Method
mungesumstats!(gwas::DataFrame)
+mungesumstats!(gwas::Vector{DataFrame})

Munge gwas by harmonizing the names of columns, their types, and P values, among others.

source
GeneticsMakie.parsegtf!Method
parsegtf!(gencode::DataFrame)

Parse gencode by extracting gene_id, gene_name, gene_type, transcript_id, transcript_support_level information from the info column.

source
GeneticsMakie.plotgenes!Method
plotgenes!(ax::Axis, chromosome::AbstractString, range1::Real, range2::Real, gencode::DataFrame)
 plotgenes!(ax::Axis, chromosome::AbstractString, bp::Real, gencode::DataFrame)
 plotgenes!(ax::Axis, gene::AbstractString, gencode::DataFrame)

Plot collapsed gene bodies for genes within a given chromosome and genomic range between range1 and range2.

Alternatively, plot within a given chromosome and a certain window around a genomic coordinate bp or plot within a certain window around gene.

Keyword arguments

height      height of exons; default 0.25
 genecolor   color of genes; default :royalblue
 textcolor   color of gene labels; default :black
-window      window around bp or gene; default 1e6
source
GeneticsMakie.plotgenes!Method
plotgenes!(ax::Axis, chromosome::AbstractString, range1::Real, range2::Real, highlight::Tuple{AbstractVector, AbstractVector}, gencode::DataFrame; height::Real)
+window      window around bp or gene; default 1e6
source
GeneticsMakie.plotgenes!Method
plotgenes!(ax::Axis, chromosome::AbstractString, range1::Real, range2::Real, highlight::Tuple{AbstractVector, AbstractVector}, gencode::DataFrame; height::Real)
 plotgenes!(ax::Axis, chromosome::AbstractString, bp::Real, highlight::Tuple{AbstractVector, AbstractVector}, gencode::DataFrame; window::Real, height::Real)
-plotgenes!(ax::Axis, gene::AbstractString, highlight::Tuple{AbstractVector, AbstractVector}, gencode::DataFrame; window::Real, height::Real)

Plot gene bodies with a vector of genes highlighted by a vector of colors via highlight.

source
GeneticsMakie.plotgwas!Method
plotgwas!(ax::Axis, gwas::DataFrame)

Plot gwas results as a Manhattan plot.

Keyword arguments

ymax            maximum value for y-axis
+plotgenes!(ax::Axis, gene::AbstractString, highlight::Tuple{AbstractVector, AbstractVector}, gencode::DataFrame; window::Real, height::Real)

Plot gene bodies with a vector of genes highlighted by a vector of colors via highlight.

source
GeneticsMakie.plotgwas!Method
plotgwas!(ax::Axis, gwas::DataFrame)

Plot gwas results as a Manhattan plot.

Keyword arguments

ymax            maximum value for y-axis
 p               genome-wide significance threshold; default 5e-8
 linecolor       color of genome-wide significance line, which can be turned off by setting to nothing; default :red2
 scattercolor    color of genome-wide significant variants, which can be turned off by setting to nothing; default "#4DB069"
 chromcolors     colors of even and odd chromosomes; default ["#0D0D66", "#7592C8"]
-build::Int      human genome build; default 37 
source
GeneticsMakie.plotisoforms!Method
plotisoforms!(ax::Axis, gene::AbstractString, gencode::DataFrame)

Plot each isoform of a given gene on a separate row.

Keyword arguments

orderby::Union{Nothing, AbstractVector{<:AbstractString}}           order of isoforms; default nothing
+build::Int      human genome build; default 37 
source
GeneticsMakie.plotisoforms!Method
plotisoforms!(ax::Axis, gene::AbstractString, gencode::DataFrame)

Plot each isoform of a given gene on a separate row.

Keyword arguments

orderby::Union{Nothing, AbstractVector{<:AbstractString}}           order of isoforms; default nothing
 highlight::Union{Nothing, Tuple{AbstractVector, AbstractVector}}    isoforms to be highlighted and their colors; default nothing
 height                                                              height of exons; default 0.25
 isoformcolor                                                        color of isoforms; default :royalblue
 textcolor                                                           color of isoform labels; default :black
-text::Union{Bool, Symbol}                                           position of isoform labels; default :top
source
GeneticsMakie.plotldMethod
plotld(LD::AbstractMatrix; kwargs)
+text::Union{Bool, Symbol}                                           position of isoform labels; default :top
source
GeneticsMakie.plotldMethod
plotld(LD::AbstractMatrix; kwargs)
 plotld!(ax::Axis, LD::AbstractMatrix; kwargs)

Heatmap of symmetric correlation matrix LD with the diagonal elements on the x-axis.

Keyword arguments

threshold       threshold below which values are ignored; default 1/9
 colormap        colormap of values; default cgrad(:Blues_9, 9, categorical = true)
 colorrange      start and end points of colormap; default (0, 1)
-strokewidth     width of outline around heatmap boxes; default 0
source
GeneticsMakie.plotlocus!Method
plotlocus!(ax::Axis, chromosome::AbstractString, range1::Real, range2::Real, gwas::DataFrame; kwargs)
+strokewidth     width of outline around heatmap boxes; default 0
source
GeneticsMakie.plotlocus!Method
plotlocus!(ax::Axis, chromosome::AbstractString, range1::Real, range2::Real, gwas::DataFrame; kwargs)
 plotlocus!(ax::Axis, chromosome::AbstractString, bp::Real, gwas::DataFrame; kwargs)
 plotlocus!(ax::Axis, gene::AbstractString, gwas::DataFrame, gencode::DataFrame; kwargs)

Plot gwas results within a given chromosome and genomic range between range1 and range2.

Alternatively, plot within a given chromosome and a certain window around a genomic coordinate bp or plot within a certain window around gene.

Keyword arguments

ld::Union{Nothing, SnpData, Tuple{SnpData, Union{AbstractString, Tuple{AbstractString, Int}}}}      default nothing
 ymax                                                                                                maximum value for y-axis
-window                                                                                              window around genomic coordinate or gene; default 1e6
source
GeneticsMakie.plotloops!Method
plotloops!(ax::Axis, chromosome::AbstractString, range1::Real, range2::Real, loopdf::DataFrame)
+window                                                                                              window around genomic coordinate or gene; default 1e6
source
GeneticsMakie.plotloops!Method
plotloops!(ax::Axis, chromosome::AbstractString, range1::Real, range2::Real, loopdf::DataFrame)
 plotloops!(ax::Axis, chromosome::AbstractString, bp::Real, loopdf::DataFrame)
 plotloops!(ax::Axis, gene::AbstractString, loopdf::DataFrame, gencode::DataFrame)

Plot loops present in loopdf within a given chromosome and genomic range between range1 and range2.

Alternatively, plot within a given chromosome and a certain window around a genomic coordinate bp or plot within a certain window around gene.

Keyword arguments

ymax            maximum value for y-axis; default 102
 linewidth       line width of the arcs; default 0.25
 colorarc        color of arcs; default #9658B2
 colorend        color of arcs' ends; default ("#FFBB00", 0.5)
 resolution      plot `resolution` points along x-axis within the given range; default 1000
-window          window around genomic coordinate or gene; default 1e6
source
GeneticsMakie.plotqq!Method
plotqq!(ax::Axis, P::AbstractVector)
+window          window around genomic coordinate or gene; default 1e6
source
GeneticsMakie.plotqq!Method
plotqq!(ax::Axis, P::AbstractVector)
 plotqq!(ax::Axis, gwas::DataFrame)

Plot QQ plot of P values where the expected distribution is the uniform distribution.

Keyword arguments

xstep       x-axis ticks step size; default 1
-ystep       y-axis ticks step size; default 2
source
GeneticsMakie.plotrgMethod
plotrg(r::AbstractMatrix)
 plotrg!(ax::Axis, r::AbstractMatrix)

Correlation plot of matrix r.

Keyword arguments

circle          whether to draw circles instead of rectangles; default true
 diagonal        whether to visualize diagonal elements; default false
 colormap        colormap of values; default :RdBu_10
 colorrange      start and end points of colormap; default (-1, 1)
-strokewidth     width of outline around surrounding boxes; default 0.5
source
+strokewidth     width of outline around surrounding boxes; default 0.5
source
diff --git a/dev/examples/genes/index.html b/dev/examples/genes/index.html
index 0c0c897..daa72d7 100644
--- a/dev/examples/genes/index.html
+++ b/dev/examples/genes/index.html
@@ -60,4 +60,4 @@
 resize_to_layout!(f)
 vlines!(ax, start, color = (:gold, 0.5), linewidth = 0.5)
 vlines!(ax, stop, color = (:gold, 0.5), linewidth = 0.5)
-f

Then we can save the figure as below.

save("figs/$(gene)-gene.png", f, px_per_unit = 4)
+f

Then we can save the figure as below.

save("figs/$(gene)-gene.png", f, px_per_unit = 4)
diff --git a/dev/examples/gtf/index.html b/dev/examples/gtf/index.html
index 9cea72f..ad8fb56 100644
--- a/dev/examples/gtf/index.html
+++ b/dev/examples/gtf/index.html
@@ -9,4 +9,4 @@
 h = ["seqnames", "source", "feature", "start", "end", "score", "strand", "phase", "info"]
 gencode = CSV.read("data/gencode/$(file)", DataFrame; delim = "\t", comment = "#", header = h)
Human genome build

The latest human genome assembly is GRCh38, but we use an annotation with coordinates from the older version (GRCh37), because a lot of the GWAS results are shared in GRCh37 genomic coordinates. Make sure to use the matching human genome build when visualizing your results.

The ninth column of a GTF file contains rich information about features, so we can parse this column.

GeneticsMakie.parsegtf!(gencode)
Chromosome names

Chromosome names are munged to drop the “chr” prefix, and their type is String, since some chromosome names are non-numerical, such as the sex chromosomes and the mitochondrial genome.
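
As a minimal sketch of this convention, the prefix can be stripped with a regular expression while keeping the names as strings. The helper name `stripchr` is hypothetical and is not part of GeneticsMakie.jl; parsegtf! may handle this differently.

```julia
# Hypothetical helper: remove a leading "chr" and keep the result as a String,
# so that names like "X" and "MT" remain valid.
stripchr(name::AbstractString) = String(replace(name, r"^chr" => ""))

names = ["chr1", "chrX", "chrMT", "22"]
munged = stripchr.(names)
```

Names that already lack the prefix are returned unchanged.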

To reduce memory intake, we can also subset gencode to the columns most commonly used in downstream analyses.

select!(gencode, :seqnames, :feature, :start, :end, :strand, :gene_id, :gene_name, :gene_type, :transcript_id)

To further reduce memory intake, we can instead store and load GENCODE annotation as an Arrow file.

Arrow.write("data/gencode/$(splitext(file)[1]).arrow", gencode)
 gencode = Arrow.Table("data/gencode/$(splitext(file)[1]).arrow") |> DataFrame

Other transcriptome annotations, such as one from RefSeq, can be used for plotting functions as long as they contain the above columns with the right column names.

Once gencode is ready, we can look up where a gene is on the human genome.

GeneticsMakie.findgene("RBFOX1", gencode)
-GeneticsMakie.findgene("ENSG00000078328", gencode)
Gene names

Make sure to use the correct gene name in case the gene cannot be found. The latest gene names can be looked up in databases such as GeneCards.

+GeneticsMakie.findgene("ENSG00000078328", gencode)
Gene names

Make sure to use the correct gene name in case the gene cannot be found. The latest gene names can be looked up in databases such as GeneCards.

diff --git a/dev/examples/gwas/index.html b/dev/examples/gwas/index.html
index a842250..9e22f55 100644
--- a/dev/examples/gwas/index.html
+++ b/dev/examples/gwas/index.html
@@ -59,4 +59,4 @@
 rowgap!(f.layout, 1, 0)
 rowgap!(f.layout, 2, 5)
 resize_to_layout!(f)
-f

+f

diff --git a/dev/examples/isoforms/index.html b/dev/examples/isoforms/index.html
index d729c64..a2ab439 100644
--- a/dev/examples/isoforms/index.html
+++ b/dev/examples/isoforms/index.html
@@ -40,4 +40,4 @@
 GeneticsMakie.labelgenome(f[1, 1, Bottom()], chr, range1, range2)
 rowsize!(f.layout, 1, rs)
 resize_to_layout!(f)
-f

+f

diff --git a/dev/examples/locus/index.html b/dev/examples/locus/index.html
index cbc0e8b..993c851 100644
--- a/dev/examples/locus/index.html
+++ b/dev/examples/locus/index.html
@@ -86,4 +86,4 @@
     vlines!(axs[i], stop, color = (:gold, 0.5), linewidth = 0.5)
 end
 resize_to_layout!(f)
-f

By using Makie.jl's layout tools, it becomes easy to draw additional tracks. For example, in a separate track, the variants could be colored or could have varying sizes depending on their minor allele frequency. In another example, the variants could be colored based on their inclusion in a credible set post-fine-mapping.

Plotting the intersection of SNPs, not the union

GeneticsMakie.plotlocus! plots only the variants that are present in the reference panel when the ld keyword argument is specified. Although SNPs that are missing in the reference panel could be plotted differently (e.g. with varying transparency and shape), GeneticsMakie.jl is designed to visualize hundreds of phenotypes simultaneously, in which case such discrepancies are hard to discern and confusing. Hence, for more direct comparison of loci across phenotypes, only the variants that are found in the reference panel are shown.

Extremely small P values

There are several GWAS loci that harbor extremely small P values, in which case the P values will be clamped to the smallest floating point number. Such cases are going to be more common in phenotypes that are reaching saturation in terms of GWAS discovery (e.g. height). In those cases, it is more commonplace to observe allelic heterogeneity, and it might be more appropriate to plot alternative measures of strength of association (e.g. Z scores).

Patterns of LD

Oftentimes, chunks of LD blocks hug a single or multiple gene boundaries.

Covering the entire genome

Visualizing 1,500 genomic regions with a 2 Mb window will more or less cover the entire human genome. Note that empirically speaking, the probability of an arbitrary 2 Mb window harboring at least one genome-wide significant hit across multiple phenotypes is higher than not harboring any significant association.

Phenome-scale LocusZoom

To visualize hundreds of phenotypes simultaneously, summary statistics or other relevant genomic annotations should be converted to memory-friendly Arrow.jl or Parquet.jl files.

+f

By using Makie.jl's layout tools, it becomes easy to draw additional tracks. For example, in a separate track, the variants could be colored or could have varying sizes depending on their minor allele frequency. In another example, the variants could be colored based on their inclusion in a credible set post-fine-mapping.

Plotting the intersection of SNPs, not the union

GeneticsMakie.plotlocus! plots only the variants that are present in the reference panel when the ld keyword argument is specified. Although SNPs that are missing in the reference panel could be plotted differently (e.g. with varying transparency and shape), GeneticsMakie.jl is designed to visualize hundreds of phenotypes simultaneously, in which case such discrepancies are hard to discern and confusing. Hence, for more direct comparison of loci across phenotypes, only the variants that are found in the reference panel are shown.

Extremely small P values

There are several GWAS loci that harbor extremely small P values, in which case the P values will be clamped to the smallest floating point number. Such cases are going to be more common in phenotypes that are reaching saturation in terms of GWAS discovery (e.g. height). In those cases, it is more commonplace to observe allelic heterogeneity, and it might be more appropriate to plot alternative measures of strength of association (e.g. Z scores).
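
To make the clamping concrete, here is a minimal sketch assuming that "the smallest floating point number" means `floatmin(Float64)`, the smallest positive normal Float64. That reading, and the variable names, are assumptions; the package's own munging step may differ in detail.

```julia
# P values that underflow to 0.0 cannot be log-transformed directly.
pvals = [0.05, 1e-300, 0.0]

# Clamp to [floatmin(Float64), 1] before taking -log10, so every value is finite.
clamped = clamp.(pvals, floatmin(Float64), 1.0)
neglog10p = -log10.(clamped)
```

After clamping, every variant whose P value underflowed ends up at the same height on the y-axis, which is one reason Z scores can be more informative for such loci.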

Patterns of LD

Oftentimes, chunks of LD blocks hug a single or multiple gene boundaries.

Covering the entire genome

Visualizing 1,500 genomic regions with a 2 Mb window will more or less cover the entire human genome. Note that empirically speaking, the probability of an arbitrary 2 Mb window harboring at least one genome-wide significant hit across multiple phenotypes is higher than not harboring any significant association.
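
The arithmetic behind this estimate can be checked quickly. The genome length used below (roughly 3.1 billion base pairs) is an approximate figure assumed here for illustration, and the calculation assumes non-overlapping windows.

```julia
nregions  = 1_500          # number of genomic regions
window_bp = 2_000_000      # 2 Mb window per region
genome_bp = 3.1e9          # approximate human genome length (assumption)

covered_bp = nregions * window_bp   # total base pairs covered
fraction   = covered_bp / genome_bp # share of the genome covered
```

Under these assumptions, the windows span 3 billion base pairs, which is most of the genome.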

Phenome-scale LocusZoom

To visualize hundreds of phenotypes simultaneously, summary statistics or other relevant genomic annotations should be converted to memory-friendly Arrow.jl or Parquet.jl files.

diff --git a/dev/examples/loops/index.html b/dev/examples/loops/index.html
index 78d0a2d..18ffea5 100644
--- a/dev/examples/loops/index.html
+++ b/dev/examples/loops/index.html
@@ -112,4 +112,4 @@
     vlines!(axs[i], stop, color = (:gold, 0.5), linewidth = 0.5)
 end
 resize_to_layout!(f)
-f

As with the LocusZoom plots, by using Makie.jl's layout tools, it becomes easy to draw additional tracks. For example, in a separate track, we can include chromatin interactions present in other samples. In another example, we can include interactions found through other sequencing methods.

+f

As with the LocusZoom plots, by using Makie.jl's layout tools, it becomes easy to draw additional tracks. For example, in a separate track, we can include chromatin interactions present in other samples. In another example, we can include interactions found through other sequencing methods.

diff --git a/dev/examples/peaks/index.html b/dev/examples/peaks/index.html
index d089f15..99fe96a 100644
--- a/dev/examples/peaks/index.html
+++ b/dev/examples/peaks/index.html
@@ -195,4 +195,4 @@
 resize_to_layout!(f)
 f
 save("ss4.png",f, px_per_unit = 4)
-end

+end

diff --git a/dev/examples/summary/index.html b/dev/examples/summary/index.html
index ae5169e..0aec0b4 100644
--- a/dev/examples/summary/index.html
+++ b/dev/examples/summary/index.html
@@ -20,4 +20,4 @@
 GeneticsMakie.findclosestgene(loci, gencode; start = true) # closest gene from gene start site
 GeneticsMakie.findclosestgene(loci, gencode; proteincoding = true) # closest "protein-coding" gene

To reduce memory intake, we can store and load GWAS summary statistics as Arrow files.

for (i, key) in enumerate(keys(gwas))
     Arrow.write("data/gwas/$(key).arrow", dfs[i])
-end
+end
diff --git a/dev/examples/twas/index.html b/dev/examples/twas/index.html
index df86e66..b485df1 100644
--- a/dev/examples/twas/index.html
+++ b/dev/examples/twas/index.html
@@ -23,4 +23,4 @@
 Label(f[1, 1, Top()], text = "SCZ (2022): SCHEMA", fontsize = 8)
 rowsize!(f.layout, 1, 50)
 resize_to_layout!(f)
-f

+f

diff --git a/dev/index.html b/dev/index.html
index f76fbfa..dfe4723 100644
--- a/dev/index.html
+++ b/dev/index.html
@@ -1,2 +1,2 @@
-Home · GeneticsMakie

GeneticsMakie

The goal of GeneticsMakie.jl is to permit seamless data visualization and exploratory data analysis of the human genome within the larger Julia data science and OpenMendel ecosystems. The package provides convenient wrapper functions for wrangling genetic association results and plotting them using Makie.jl. Every component of a figure can be easily customized and extended, and the package generates high-quality, publication-ready figures.

"mhc"

Getting started

Please peruse the documentation of Makie.jl, CSV.jl, DataFrames.jl, and SnpArrays.jl. Familiarity with these packages will allow visualization of most types of genetic and genomic data. Makie.jl's default layout tools are particularly useful for plotting different genetic and genomic data modalities as separate layers.

A use case

If you have run a genome-wide association study (GWAS) at the variant level, and you would like to eyeball genome-wide significant loci across hundreds of phenotypes, then you are in the right place.

+Home · GeneticsMakie

GeneticsMakie

The goal of GeneticsMakie.jl is to permit seamless data visualization and exploratory data analysis of the human genome within the larger Julia data science and OpenMendel ecosystems. The package provides convenient wrapper functions for wrangling genetic association results and plotting them using Makie.jl. Every component of a figure can be easily customized and extended, and the package generates high-quality, publication-ready figures.

"mhc"

Getting started

Please peruse the documentation of Makie.jl, CSV.jl, DataFrames.jl, and SnpArrays.jl. Familiarity with these packages will allow visualization of most types of genetic and genomic data. Makie.jl's default layout tools are particularly useful for plotting different genetic and genomic data modalities as separate layers.

A use case

If you have run a genome-wide association study (GWAS) at the variant level, and you would like to eyeball genome-wide significant loci across hundreds of phenotypes, then you are in the right place.

diff --git a/dev/objects.inv b/dev/objects.inv
index 242193d..c8fd6bd 100644
Binary files a/dev/objects.inv and b/dev/objects.inv differ