DuoHash is an advanced tool for the efficient calculation of forward and reverse hashes of spaced k-mers in nucleotide sequences, improving the analysis of genomic data by reducing processing time and computational resources.

leonardoGemin/DuoHash

DuoHash: Improving Spaced k-mer Extraction and Hash Encoding for Bioinformatics Applications

Methods

The DuoHash library provides two classes, DuoHash and DuoHash_multi, which handle a single spaced seed and multiple spaced seeds, respectively. The methods of the first class are

  • GetEncoding_naive(),
  • GetEncoding_FSH(),
  • and GetEncoding_ISSH().

The methods of the second class are

  • GetEncoding_naive(),
  • GetEncoding_FSH(),
  • GetEncoding_ISSH(),
  • GetEncoding_FSH_multi(),
  • GetEncoding_MISSH_v1(),
  • GetEncoding_MISSH_col(),
  • GetEncoding_MISSH_col_parallel(),
  • and GetEncoding_MISSH_row().

Both classes share the PrintFASTA() method for saving the resulting spaced k-mers to a file and other methods for handling the various parameters.
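To make the terminology concrete, here is a minimal sketch of what a naive spaced k-mer extraction with 2-bit encoding can look like. It does not use the DuoHash API and is not the library's implementation: the function names `encode_base` and `naive_spaced_encodings`, the A=0, C=1, G=2, T=3 mapping, and the seed format (a string of '1' for positions to keep and '0' for positions to skip) are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: map a nucleotide to a 2-bit code (A=0, C=1, G=2, T=3).
// The mapping is an assumption of this sketch; any other character is
// treated as 'A' to keep the example short.
inline std::uint64_t encode_base(char c) {
    switch (c) {
        case 'C': return 1;
        case 'G': return 2;
        case 'T': return 3;
        default:  return 0;
    }
}

// Hypothetical helper: slide the seed over the sequence and, for every
// window, pack the bases found at the '1' positions into one 64-bit value.
std::vector<std::uint64_t> naive_spaced_encodings(const std::string& seq,
                                                  const std::string& seed) {
    // Collect the offsets of the seed positions that must be kept.
    std::vector<std::size_t> care;
    for (std::size_t j = 0; j < seed.size(); ++j)
        if (seed[j] == '1') care.push_back(j);
    // 2 bits per base: at most 32 kept positions fit in 64 bits.
    assert(care.size() <= 32);

    std::vector<std::uint64_t> out;
    if (seed.empty() || seq.size() < seed.size()) return out;

    for (std::size_t i = 0; i + seed.size() <= seq.size(); ++i) {
        std::uint64_t enc = 0;
        for (std::size_t j : care)
            enc = (enc << 2) | encode_base(seq[i + j]);
        out.push_back(enc);
    }
    return out;
}
```

With the sequence "ACGTAC" and the seed "1101", the three windows yield the spaced k-mers ACT, CGA and GTC. This sketch recomputes every window from scratch, which is the baseline cost that faster extraction strategies aim to reduce.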

Each GetEncoding_<...>() method has four implementations. The first only extracts the spaced k-mers and computes their encodings; the second additionally post-processes the encodings to compute the forward and reverse hashes; the third post-processes the encodings to convert them into strings; and the fourth combines the second and third options.
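The two post-processing steps can be illustrated with a small, self-contained sketch. It is not taken from the library: it assumes a 2-bit encoding (A=0, C=1, G=2, T=3) with the first base in the most significant bits, it interprets "reverse" as the reverse complement of the extracted symbols, and the names `reverse_complement` and `decode` are hypothetical. The parameter `weight` is the number of bases packed into the encoding.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: reverse-complement a 2-bit encoded k-mer.
// With A=0, C=1, G=2, T=3 the complement of a code x is 3 - x.
std::uint64_t reverse_complement(std::uint64_t enc, unsigned weight) {
    std::uint64_t rc = 0;
    for (unsigned i = 0; i < weight; ++i) {
        // Take the last base, complement it, append it to the result.
        rc = (rc << 2) | (3 - (enc & 3));
        enc >>= 2;
    }
    return rc;
}

// Hypothetical helper: convert a 2-bit encoded k-mer back into a string.
std::string decode(std::uint64_t enc, unsigned weight) {
    static const char bases[] = {'A', 'C', 'G', 'T'};
    std::string s(weight, 'A');
    // Fill the string from the last base to the first.
    for (unsigned i = weight; i-- > 0;) {
        s[i] = bases[enc & 3];
        enc >>= 2;
    }
    return s;
}
```

Under these assumptions, the encoding of ACT decodes back to "ACT", and its reverse complement decodes to "AGT".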

Installation

Make sure CMake is installed on the system.

Download the repository using

$ git clone https://github.com/leonardoGemin/DuoHash.git

and build the library with

$ make build

This creates the static library build/libDuoHash.a inside the project directory.

Usage

To use DuoHash in a C++ project:

  • Include the DuoHash header in your code with #include <DuoHash.h>
  • Add the include directory (pass -I./include to the compiler)
  • Link your code against libDuoHash.a (pass -L./build -lDuoHash to the compiler)
  • Compile your code with g++-13 using -std=c++0x and -fopenmp (and preferably -O3)

Example

Compile the example/main.cpp file with

$ cd example
$ g++-13 -std=c++0x -O3 -fopenmp -I../include -o main main.cpp -L../build -lDuoHash

Thesis

Link to my Master's thesis: Gemin_Leonardo.pdf
