GNU General Public License, GPLv3 (2015-2020)
pre-release version: v0.1.0
This package provides a high-level Julia interface to CoreArray Genomic Data Structure (GDS) data files. GDS files are portable across platforms and use a hierarchical structure to store multiple scalable, array-oriented data sets together with their metadata. The format is suited to large-scale datasets, especially data that are much larger than the available random-access memory. The jugds package offers efficient operations designed specifically for integers of fewer than 8 bits, since a diploid genotype, such as that of a single-nucleotide polymorphism (SNP), usually occupies less than a byte. Data compression and decompression are available with relatively efficient random access.
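To illustrate why sub-byte integers matter for genotype data, here is a minimal sketch in plain Julia that packs 2-bit values four to a byte and unpacks them again. The function names `pack2bit` and `unpack2bit` are hypothetical and are not part of jugds, and the bit order used here (lowest bits first) is an assumption for illustration; it is not claimed to match the on-disk layout of GDS files.

```julia
# Hypothetical helpers, not part of jugds: store values 0..3 using 2 bits each.
# Assumed layout: four values per byte, first value in the lowest two bits.
function pack2bit(vals::AbstractVector{<:Integer})
    out = zeros(UInt8, cld(length(vals), 4))
    for (i, v) in enumerate(vals)
        0 <= v <= 3 || throw(ArgumentError("value $v does not fit in 2 bits"))
        byte = div(i - 1, 4) + 1        # which byte holds value i
        shift = 2 * ((i - 1) % 4)       # bit offset inside that byte
        out[byte] |= UInt8(v) << shift
    end
    return out
end

# Recover the first n values from the packed bytes.
function unpack2bit(bytes::AbstractVector{UInt8}, n::Integer)
    return [Int((bytes[div(i - 1, 4) + 1] >> (2 * ((i - 1) % 4))) & 0x03) for i in 1:n]
end

geno = [0, 1, 2, 3, 2, 1]               # six 2-bit values
packed = pack2bit(geno)                 # occupies 2 bytes instead of 6
restored = unpack2bit(packed, length(geno))
```

Packing this way uses a quarter of the space of one byte per value, at the cost of some shifting and masking on every access.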
- Development version from GitHub, requiring julia >= v1.0
using Pkg
Pkg.status()
Pkg.add(PackageSpec(url="https://github.com/CoreArray/jugds.jl.git"))
Dr. Xiuwen Zheng (zhengxwen@gmail.com)
- Learn X in Y minutes (where X=Julia): http://learnxinyminutes.com/docs/julia/
Zheng X, Levine D, Shen J, Gogarten SM, Laurie C, Weir BS (2012). A High-performance Computing Toolset for Relatedness and Principal Component Analysis of SNP Data. Bioinformatics. DOI: 10.1093/bioinformatics/bts606.
Zheng X, Gogarten S, Lawrence M, Stilp A, Conomos M, Weir BS, Laurie C, Levine D (2017). SeqArray -- A storage-efficient high-performance data format for WGS variant calls. Bioinformatics. DOI: 10.1093/bioinformatics/btx145.
- CoreArray C++ library, LGPL-3 License, 2007-2017, Xiuwen Zheng
using jugds
fn = abspath(dirname(pathof(jugds)), "..", "demo", "data", "ceu_exon.gds")
f = open_gds(fn)
f
close_gds(f)
File: jugds/demo/data/ceu_exon.gds (32.5K)
+ [ ] *
|--+ description [ ] *
|--+ sample.id { Str8 90 LZMA_ra(35.8%), 258B } *
|--+ variant.id { Int32 1348 LZMA_ra(16.8%), 906B } *
|--+ position { Int32 1348 LZMA_ra(64.6%), 3.4K } *
|--+ chromosome { Str8 1348 LZMA_ra(4.63%), 158B } *
|--+ allele { Str8 1348 LZMA_ra(16.7%), 902B } *
|--+ genotype [ ] *
| |--+ data { Bit2 1348x90x2 LZMA_ra(26.3%), 15.6K } *
| |--+ extra.index { Int32 0x3 LZMA_ra, 19B } *
| \--+ extra { Int16 0 LZMA_ra, 19B }
|--+ phase [ ]
| |--+ data { Bit1 1348x90 LZMA_ra(0.91%), 138B } *
| |--+ extra.index { Int32 0x3 LZMA_ra, 19B } *
| \--+ extra { Bit1 0 LZMA_ra, 19B }
|--+ annotation [ ]
| |--+ id { Str8 1348 LZMA_ra(38.4%), 5.5K } *
| |--+ qual { Float32 1348 LZMA_ra(2.26%), 122B } *
| \--+ filter { Int32,factor 1348 LZMA_ra(2.26%), 122B } *
\--+ sample.annotation [ ]
   \--+ family { Str8 90 LZMA_ra(57.1%), 222B }
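The sizes in the listing above can be checked with a little arithmetic. The sketch below works through the `genotype/data` node, assuming that the dimensions `1348x90x2` mean variants x samples x allele copies, that `Bit2` means 2 bits per value, and that the percentage after `LZMA_ra` is the compressed size relative to the raw size. These readings are inferred from the listing rather than stated in it.

```julia
# Dimensions of genotype/data as printed: 1348x90x2 (assumed variants x samples x copies)
nvariant, nsample, ncopy = 1348, 90, 2
bits_per_value = 2                       # assumed meaning of Bit2

# Raw size if every value takes exactly 2 bits
raw_bytes = cld(nvariant * nsample * ncopy * bits_per_value, 8)

# Size if each value were stored in a full byte instead
byte_per_value = nvariant * nsample * ncopy

# Apply the printed ratio (26.3%) and convert to units of 1024 bytes
ratio = 0.263
approx_k = round(raw_bytes * ratio / 1024, digits=1)

println("raw: ", raw_bytes, " bytes, compressed: about ", approx_k, "K")
```

Under these assumptions the result agrees with the 15.6K shown for the node, and the 2-bit raw size is a quarter of what one byte per value would need.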
JSeqArray.jl: data manipulation of whole-genome sequencing variants in Julia