-
Notifications
You must be signed in to change notification settings - Fork 9
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
* update hmmer to working version with PGAP * update hmmer to working version with PGAP (#59) (#60) * Adding option for using `bakta` (#58) * first bakta addition test * Fixing error when phigaro output was empty (#55) (#56) * adding checkIfExists rules to input loading * Update build.sh * fixed report if else statement * updated version and changelog * fix example samplesheet url * Create nf-core-bacannot-compare_logo_dark.png * Update phigaro.nf * Update CHANGELOG.md * Update CHANGELOG.md * Update .gitignore * Update bacannot.nf * workflow runs from top to bottom * workflow correctly working * changing version manifest * explaining how to use bakta * adding variables to generic annotation * adding victors backup db * report_general understands bakta * report_resistance understands bakta * update hmmer to working version with PGAP * update hmmer to working version with PGAP (#59) * update hmmer to working version with PGAP (#59) (#60) * Update resfinder2gff.py * Update addBedtoolsIntersect.R * Update addBedtoolsIntersect.R * Update addBedtoolsIntersect.R * Update addBedtoolsIntersect.R * Update docker.config * download db if not available * Update merge_annotations.nf * Update merge_annotations.nf * update process label * Update merge_annotations.nf * fix docker file ownerships * added retry label for bakta * update changelog * changing to new version * add falmeida-py scripts to docker * Create generate_dockers.sh * Minor improvements for custom database annotation (#67) * Add step making Blast DB when annotating custom db Blast database throw error for no alias or index file found for nucleotide database, when using blastn. Thus, make blast database before execute blastn to custom database on genome. * Remove space when generate link in Shiny markdown Whitespace occurs when generated links for custom database report in Shiny markdown. Specify seperate string to emtpy string to fix this. * fix: script was calling an out of bounds value * handle exceptions and missing values * increment test profile * also build for proteins * Use correct version of customdb_gffs in REPORTS module Co-authored-by: fmalmeida <felipemarques89@gmail.com> * fixed docker images * resfinder now uses conda env * rgi does not have --exclude_nudge anymore * add circos conf files * fix indentation * add circos sub-workflow calling * add circos tool * added karyotype and gc_skew modules * fix indentation * modules now have a channel suitable for summary generation * include summary generation module * update docker image * add aro index tsv * add amrfinder2tsv module * add bacannot json summary generation * update docker images * add new modules * added binaries for circos plot * update docker files * add label to module * finalise modules preparing data for circos * include whole circos 'pathway' in workflow * add circos module * circos plot currently functional * re-organised circos outputs * ameliorate default version of circos plot * fixed docker image * fixed custom annotation problem in server * added circos plot with easy_circos * circos plot properly being generated * update changelog * update readme * update documentation * fix tool name * fix dir * update indentation * fixed schema * add pipefail * fix docker image Co-authored-by: Cy <38537368+lam-c@users.noreply.github.com>
- Loading branch information
Showing
62 changed files
with
6,658 additions
and
785 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Large diffs are not rendered by default.
Oops, something went wrong.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,85 @@ | ||
#!/usr/bin/env python3 | ||
|
||
## calculate GC | ||
from __future__ import division | ||
|
||
# __author__ = "Wenchao Lin" | ||
# __copyright__ = "Copyright 2014,Tianjin Biochip Corp." | ||
# __license__ = "GPL" | ||
# __version__ = "1.0.0" | ||
# __maintainer__ = "Wenchao Lin" | ||
# __email__ = "linwenchao@yeah.net" | ||
|
||
|
||
from Bio import SeqIO | ||
import sys | ||
from optparse import OptionParser | ||
import numpy as np | ||
|
||
def GC_content_window(s): | ||
|
||
""" | ||
Return GC content of input sequence | ||
""" | ||
|
||
gc = sum(s.count(x) for x in ['G','C','g','c','S','s']) | ||
gc_content = gc/float(len(s)) | ||
return round(gc_content,4) | ||
|
||
|
||
def GC_skew_window(s): | ||
|
||
""" | ||
Reuturn GC skew of a sequence | ||
""" | ||
|
||
g = s.count('G')+s.count('g') | ||
c = s.count('C')+s.count('c') | ||
|
||
try: | ||
skew = (g-c)/float(g+c) | ||
except ZeroDivisionError: | ||
skew = 0 | ||
return round(skew,4) | ||
|
||
|
||
def main(): | ||
|
||
""" | ||
Calculate GC content and GC skew of input Fasta sequence | ||
""" | ||
|
||
usage = "usage: %prog -f input.fa [-w 1000] [-s 1000]" | ||
parser = OptionParser(usage = usage) | ||
parser.add_option("-f","--file",dest="filename",help="Input Fasta format file",metavar="FASTA") | ||
parser.add_option("-w","--window",dest="WindowSize",help="default:1000 WindowSize to calculate",default=1000,type='int') | ||
parser.add_option("-s","--step",dest="StepSize",help="default:1000 StepSize for slide widows",default=1000,type='int') | ||
(options,args) = parser.parse_args() | ||
|
||
window = options.WindowSize | ||
step = options.StepSize | ||
seqobj = SeqIO.parse(options.filename,'fasta') | ||
|
||
for record in seqobj: | ||
pos_array = [] | ||
gc_content_value_array = [] | ||
gc_skew_value_array = [] | ||
name = record.id | ||
seq = record.seq | ||
start = 0 | ||
end = 0 | ||
gc = 0 | ||
gc_skew = 0 | ||
|
||
for i in range(0,len(seq),step): | ||
subseq = seq[i:i+window] | ||
gc_content = (GC_content_window(subseq)) | ||
gc_skew = (GC_skew_window(subseq)) | ||
start = (i + 1 if (i+1<=len(seq)) else i) | ||
end = ( i + step if (i+ step<=len(seq)) else len(seq)) | ||
print ("%s\t%s\t%s\t%s\t%s" % (name,start,end,gc_content,gc_skew)) | ||
|
||
|
||
if __name__ == '__main__': | ||
main() | ||
sys.exit() |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,129 @@ | ||
#!/usr/bin/env Rscript | ||
# A supporting script for writing amrfinderplus results as csv! | ||
# To be used with fmalmeida/bacannot-compare | ||
# Author: Felipe Marques de Almeida <almeidafmarques@gmail.com> | ||
|
||
################################################# | ||
### Setting Help message and input parameters ### | ||
################################################# | ||
'usage: amrfinder2csv.R [ --input=<file> --output=<chr> --aro=<file> --sample=<chr> ] | ||
options: | ||
-i, --input=<file> Bacannot JSON summary | ||
-o, --output=<chr> Output file [default: out.tsv] | ||
-a, --aro=<file> CARD ARO categories | ||
-s, --sample=<chr> Filter results of only this sample [default: all] | ||
' -> doc | ||
|
||
# Parse parameters | ||
suppressMessages(library(docopt)) | ||
opt <- docopt(doc) | ||
|
||
if (is.null(opt$input)){ | ||
stop("At least one argument must be supplied (input file)\n", call.=FALSE) | ||
} | ||
|
||
# | ||
# load libraries | ||
# | ||
library(jsonlite) | ||
library(stringr) | ||
library(dplyr) | ||
|
||
# | ||
# functions for CARD metadata | ||
# | ||
aro_index <- | ||
read.csv( opt$aro, sep = "\t" ) | ||
aro_index$ARO.Accession <- | ||
str_remove(aro_index$ARO.Accession, "ARO:") | ||
|
||
get_aro_category <- function(aro) { | ||
#aro <- 3003923 | ||
df <- aro_index %>% dplyr::filter(ARO.Accession == aro) | ||
result <- str_split(df$Drug.Class, ";")[[1]][1] # tentar padronizar pegando so o primeiro | ||
return(result) | ||
} | ||
|
||
get_aro_gene_name <- function(aro) { | ||
# aro <- 3003923 | ||
df <- aro_index %>% filter(ARO.Accession == aro) | ||
result <- df$CARD.Short.Name | ||
return(result) | ||
} | ||
|
||
# | ||
# read annotation summary file | ||
# | ||
summary_file <- | ||
read_json( opt$input ) | ||
samples <- | ||
names( summary_file ) # get available samples in file | ||
|
||
# | ||
# start empty data.frames for parsing | ||
# | ||
identified_genes <- | ||
setNames( | ||
data.frame( | ||
matrix(ncol = 6, nrow = 0) | ||
), c("sample", "contig", "start", "end", "gene", "subclass") | ||
) | ||
without_aro <- | ||
setNames( | ||
data.frame( | ||
matrix(ncol = 6, nrow = 0) | ||
), c("sample", "contig", "start", "end", "gene", "subclass") | ||
) | ||
|
||
# | ||
# parse AMRFinderPlus Results | ||
# | ||
for (sample in samples) { | ||
subset <- summary_file[[sample]]$resistance$amrfinderplus | ||
all_entries <- names(subset) | ||
for (entry in all_entries) { | ||
if (entry != "total") { | ||
|
||
if (subset[[entry]][['type']] == 'AMR') { | ||
|
||
contig = subset[[entry]]$contig | ||
start = subset[[entry]]$start | ||
end = subset[[entry]]$end | ||
|
||
if (is.null(subset[[entry]][['card_aro']])) { | ||
subclass = subset[[entry]]$subclass | ||
without_aro[nrow(without_aro) + 1,] <- | ||
c( sample, contig, start, end, subset[[entry]]$gene, subclass ) | ||
} else { | ||
if (is.null(subset[[entry]][['subclass']])) { | ||
subclass = "None" | ||
} else { | ||
subclass = get_aro_category(subset[[entry]]$card_aro) | ||
} | ||
identified_genes[nrow(identified_genes) + 1,] <- | ||
c( sample, contig, start, end, subset[[entry]]$gene, subclass ) | ||
} | ||
|
||
} | ||
} | ||
} | ||
} | ||
|
||
# | ||
# filter results | ||
# | ||
if (as.character(opt$sample) != 'all') { | ||
identified_genes <- | ||
identified_genes %>% dplyr::filter( as.character(sample) == as.character(opt$sample) ) | ||
} | ||
|
||
# | ||
# save table as file | ||
# | ||
write.table( | ||
identified_genes, | ||
file = as.character( opt$output ), | ||
quote = FALSE, | ||
sep = "\t", | ||
row.names = FALSE | ||
) |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,93 @@ | ||
#!/usr/bin/env Rscript | ||
# A supporting script for writing VFDB results as csv! | ||
# To be used with fmalmeida/bacannot-compare | ||
# Author: Felipe Marques de Almeida <almeidafmarques@gmail.com> | ||
|
||
################################################# | ||
### Setting Help message and input parameters ### | ||
################################################# | ||
'usage: vfdb2csv.R [ --input=<file> --sample=<chr> --output=<chr> ] | ||
options: | ||
-i, --input=<file> Bacannot JSON summary | ||
-o, --output=<chr> Output file [default: out.tsv] | ||
-s, --sample=<chr> Filter results of only this sample [default: all] | ||
' -> doc | ||
|
||
# Parse parameters | ||
suppressMessages(library(docopt)) | ||
opt <- docopt(doc) | ||
|
||
if (is.null(opt$input)){ | ||
stop("At least one argument must be supplied (input file)\n", call.=FALSE) | ||
} | ||
|
||
# | ||
# load libraries | ||
# | ||
library(jsonlite) | ||
library(stringr) | ||
library(dplyr) | ||
|
||
# | ||
# read annotation summary file | ||
# | ||
summary_file <- | ||
read_json( opt$input ) | ||
samples <- | ||
names( summary_file ) # get available samples in file | ||
|
||
# | ||
# start empty data.frames for parsing | ||
# | ||
identified_genes <- | ||
setNames( | ||
data.frame( | ||
matrix(ncol = 6, nrow = 0) | ||
), c("sample", "contig", "start", "end", "gene", "subclass") | ||
) | ||
|
||
# | ||
# parse VFDB Results | ||
# | ||
for (sample in samples) { | ||
subset <- summary_file[[sample]]$virulence$VFDB | ||
all_entries <- names(subset) | ||
for (entry in all_entries) { | ||
if (entry != "total") { | ||
|
||
contig = subset[[entry]]$chr | ||
start = subset[[entry]]$start | ||
end = subset[[entry]]$end | ||
subclass = subset[[entry]]$virulence_factor | ||
gene = subset[[entry]]$gene | ||
|
||
identified_genes[nrow(identified_genes) + 1,] <- | ||
c( sample, contig, start, end, gene, subclass ) | ||
|
||
} | ||
} | ||
} | ||
|
||
identified_genes <- | ||
identified_genes %>% | ||
distinct() | ||
|
||
# | ||
# filter results | ||
# | ||
if (as.character(opt$sample) != 'all') { | ||
identified_genes <- | ||
identified_genes %>% | ||
dplyr::filter( as.character(sample) == as.character(opt$sample) ) | ||
} | ||
|
||
# | ||
# save table as file | ||
# | ||
write.table( | ||
identified_genes, | ||
file = as.character( opt$output ), | ||
quote = FALSE, | ||
sep = "\t", | ||
row.names = FALSE | ||
) |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.