General Information
The pangenome models genomic variation in a population. There are many types of variation, such as single nucleotide polymorphisms (SNPs), insertions and deletions (indels), gene presence or absence, and structural variations. The population, in this case, could be a single tissue, a species, a subspecies, another taxonomic unit, or an ecological community at large. As defined in molecular biology and genetics, a pangenome is the entire set of genes from all strains within a clade, or the union of all the genomes of a clade.
There are three types of pangenome:
- Collection pangenome
- Graphical pangenome (Nodes and Edges)
- Presence/Absence pangenome
The choice among these types depends on the research question and the type of sequence data that is available.
A collection pangenome involves collecting genomic sequences, mapping them against a reference genome, and identifying the differences.
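As a minimal sketch of the "identify the differences" step, the snippet below compares sequences to a reference position by position. It assumes the sequences are already aligned to the reference (equal length, with `-` marking gaps); the sequences, sample names, and the `find_differences` helper are all hypothetical and only illustrate the idea. Real data would first go through a read mapper or aligner.

```python
def find_differences(reference, sample):
    """Return (position, ref_base, sample_base) for every mismatch.

    Positions are 1-based. Both sequences are assumed to be pre-aligned,
    so they must have the same length; '-' marks a gap.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (i + 1, ref_base, sample_base)
        for i, (ref_base, sample_base) in enumerate(zip(reference, sample))
        if ref_base != sample_base
    ]


# Hypothetical toy data: one reference and three aligned samples.
reference = "ATGCGTACGT"
samples = {
    "sample_A": "ATGCGTACGT",  # identical to the reference
    "sample_B": "ATGAGTACGT",  # one substitution
    "sample_C": "ATG-GTACTT",  # one gap and one substitution
}

variants = {name: find_differences(reference, seq) for name, seq in samples.items()}

for name, diffs in variants.items():
    print(name, diffs)
```

Every difference here is expressed relative to the single reference, which is what distinguishes this approach from the graph-based one.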
A presence/absence pangenome depicts the presence or absence of genes within a population. It is typically visualized with Venn diagrams that focus on core and accessory genes. Core genes are mainly associated with survival and are found in all the organisms under study. Accessory genes, on the other hand, are found in some but not all of the organisms under investigation. They are associated with variation and evolutionary trajectories. The accessory genes link phenotypes to genotypes and are used in species delineation.
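The core/accessory split can be sketched with plain set operations. The example below is illustrative only: the strain names, gene names, and the `classify_genes` helper are hypothetical, and it assumes each genome has already been reduced to a set of annotated gene identifiers. Core genes are taken as the intersection of all gene sets, and accessory genes as everything in the pangenome that is missing from at least one genome.

```python
def classify_genes(gene_sets):
    """Split a pangenome into core and accessory genes.

    gene_sets maps a genome name to the set of genes it carries.
    Returns (pan, core, accessory) as sets.
    """
    if not gene_sets:
        raise ValueError("at least one genome is required")
    all_sets = list(gene_sets.values())
    pan = set().union(*all_sets)          # every gene seen in any genome
    core = set.intersection(*all_sets)    # genes present in every genome
    accessory = pan - core                # genes absent from at least one genome
    return pan, core, accessory


# Hypothetical toy annotation: three strains and their gene content.
genomes = {
    "strain_1": {"geneA", "geneB", "geneC"},
    "strain_2": {"geneA", "geneB", "geneD"},
    "strain_3": {"geneA", "geneC", "geneE"},
}

pan, core, accessory = classify_genes(genomes)

# Presence/absence matrix: one row per gene, one column per genome (1 = present).
matrix = {
    gene: [int(gene in genes) for genes in genomes.values()]
    for gene in sorted(pan)
}

print("core:", sorted(core))
print("accessory:", sorted(accessory))
for gene, row in matrix.items():
    print(gene, row)
```

This only works when genes are well annotated and consistently named across genomes, since the whole approach depends on matching gene identifiers.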
Graphical pangenomes are characterized by nodes and edges.
Nodes are segments of genomic sequence.
Edges dictate how the individual segments are joined together.
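A minimal sketch of this node-and-edge model is shown below. It is not tied to any particular pangenome tool: the node IDs, sequences, genome names, and the `spell_path` helper are hypothetical. Each genome is stored as a path (an ordered list of node IDs), and walking a path along valid edges spells out that genome's sequence.

```python
# Nodes: segments of genomic sequence, keyed by a node ID.
nodes = {
    1: "ATG",
    2: "C",       # one allele at the variable site
    3: "A",       # the alternative allele at the same site
    4: "GTACGT",
}

# Edges: directed links stating which segment may follow which.
edges = {(1, 2), (1, 3), (2, 4), (3, 4)}

# Paths: each genome is a walk through the graph.
paths = {
    "genome_ref": [1, 2, 4],
    "genome_alt": [1, 3, 4],
}


def spell_path(path, nodes, edges):
    """Concatenate node sequences along a path, checking every step is an edge."""
    for source, target in zip(path, path[1:]):
        if (source, target) not in edges:
            raise ValueError(f"no edge from node {source} to node {target}")
    return "".join(nodes[node_id] for node_id in path)


sequences = {name: spell_path(path, nodes, edges) for name, path in paths.items()}

for name, sequence in sequences.items():
    print(name, sequence)
```

The two genomes share nodes 1 and 4 and differ only at the branch between nodes 2 and 3, so the single-base variant is represented once in the graph rather than once per genome.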
More information can be found here
Pangenomes have several applications:
- Precision medicine
- Studying structural variations within a population
- Evolutionary studies of closely related species
- Replacing a linear reference genome
This project's main goal was to mine and analyze arbovirus genomic data from East Africa and the world, as detailed here. Five arboviruses are common in East African countries, as shown in the table below.
| Viruses | Countries |
|---|---|
| Chikungunya | Kenya |
| Dengue | Uganda |
| West Nile | Tanzania |
| Yellow fever | Rwanda |
| Zika | Burundi |
| | South Sudan |
| | DRC |
The code used to fetch the metadata and the sequences from the database is available.
Pangenomics is an emerging field of genomics, so little work has been done in it, especially in viral pangenomics. Many challenges arose along the way because the available pangenomic tools are tailored toward bacterial genomes. Viruses have far fewer genes in their genomes than bacteria, and they also lack core genes. Moreover, the viral genes were not well annotated, which ruled out presence/absence pangenomes, the type for which tools are publicly available.
Therefore, it was necessary to develop a working pipeline for building pangenomic variation graphs, one that is not limited to viruses.
These graphs have numerous applications, including:
- Reducing bias in genome construction. Genomes reconstructed with a reference appear more similar to the reference than they actually are. Pangenomic reference systems can reduce this bias by relating new genomes directly to all the genomes represented in the pangenome.
- Capturing variation at the base level. Standard pangenomics focuses on the presence or absence of genes and does not capture the variation between their sequences. Pangenomic graphs attempt to provide a precise model relating many genomes to each other at the base level.