Home
Welcome to the MeSS wiki!
The Metagenomic Snakemake Simulator (MeSS) is a pipeline designed to generate metagenomic mock communities from a set of genomes with user-defined read proportions. MeSS can be broken down into two main steps:
The first step of the workflow relies on a set of rules from Assembly_finder. Given user-supplied taxonomy identifiers or scientific names, Assembly_finder searches the assemblies available from NCBI and selects among them according to multiple criteria such as RefSeq category, assembly status, contig count and GenBank release date. The selected assemblies are then downloaded.
MeSS uses art_illumina to generate sequencing reads with error profiles corresponding to a sequencing technology of choice. Because art_illumina generates a set of reads for each fasta header in the assembly file, all contigs of a fragmented assembly are merged into a single sequence, separated by 1000 N nucleotides, to avoid generating a separate set of reads for each contig.
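The contig-merging step described above can be sketched as follows. This is a minimal illustration, not the pipeline's actual code: the function name `merge_contigs`, its parameters, and the `merged` header are hypothetical, and the input is assumed to be a plain multi-record fasta string.

```python
def merge_contigs(fasta_text, header="merged", spacer_len=1000):
    """Join all sequences of a fasta string into one record.

    Contigs are separated by a run of `spacer_len` N nucleotides, so a
    read simulator that works per fasta header sees a single sequence.
    (Hypothetical helper, written for illustration only.)
    """
    contigs = []   # one string per contig
    current = []   # sequence lines of the contig being read
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous contig, if any.
            if current:
                contigs.append("".join(current))
                current = []
        else:
            current.append(line)
    if current:
        contigs.append("".join(current))

    spacer = "N" * spacer_len
    return f">{header}\n{spacer.join(contigs)}\n"


if __name__ == "__main__":
    example = ">contig_1\nACGT\nAC\n>contig_2\nGGTT\n"
    # A short spacer is used here only to keep the printed output readable.
    print(merge_contigs(example, spacer_len=5), end="")
```

The spacer of N nucleotides keeps the original contigs apart in the merged sequence; the default of 1000 matches the value given in the text.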
By default, MeSS distributes reads evenly among the species within each superkingdom. For example, if 9 bacterial and 2 viral species are queried, each bacterial species represents 1% and each viral species 0.5% of the total number of reads (that is, 9% bacterial and 1% viral reads overall). Furthermore, the pipeline offers the possibility to modify relative read abundance by setting read percentages for human, virus, bacteria and non-human eukaryotes. To generate the final metagenome file, scripts concatenate all reads into one fastq file while shuffling read order to avoid structure in the data.
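The two operations in the paragraph above, splitting a superkingdom's read share evenly among its species and shuffling the concatenated reads, can be sketched as below. This is an illustrative sketch under stated assumptions, not MeSS's own scripts: the function names and species labels are hypothetical, the 9% and 1% group shares are the totals implied by the example figures in the text, and the fastq input is assumed to be single-end with exactly four lines per record.

```python
import random


def per_species_percent(species_by_group, group_percent):
    """Split each group's read percentage evenly among its species."""
    shares = {}
    for group, species in species_by_group.items():
        share = group_percent[group] / len(species)
        for name in species:
            shares[name] = share
    return shares


def shuffle_fastq(fastq_texts, seed=0):
    """Concatenate fastq strings and shuffle the order of their records.

    Assumes single-end reads stored as 4-line records; paired-end files
    would need both mates shuffled in the same order.
    """
    records = []
    for text in fastq_texts:
        lines = text.strip().splitlines()
        if len(lines) % 4 != 0:
            raise ValueError("fastq input is not a multiple of 4 lines")
        for i in range(0, len(lines), 4):
            records.append("\n".join(lines[i:i + 4]))
    # A seeded generator makes the shuffle reproducible.
    random.Random(seed).shuffle(records)
    return "".join(record + "\n" for record in records)


if __name__ == "__main__":
    # Hypothetical species labels; group shares assumed from the example.
    species = {
        "bacteria": [f"bacterium_{i}" for i in range(1, 10)],
        "virus": ["virus_1", "virus_2"],
    }
    shares = per_species_percent(species, {"bacteria": 9.0, "virus": 1.0})
    print(shares["bacterium_1"], shares["virus_1"])

    reads_a = "@a1\nACGT\n+\nIIII\n@a2\nTTTT\n+\nIIII\n"
    reads_b = "@b1\nGGGG\n+\nIIII\n"
    print(shuffle_fastq([reads_a, reads_b], seed=1), end="")
```

Shuffling whole records rather than lines keeps each read's header, sequence and quality string together.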
Finally, to visualize the metagenome's contents, Krona charts representing read proportions can be generated.