Partial runs [Deprecated]
João Sequeira edited this page May 16, 2023 · 2 revisions
[Deprecated warning] The instructions in this chapter only work with MOSCA up to version 1.6.1. However, I'm leaving them here as someone might be interested in using this information.
You may not want to run MOSCA's entire workflow. Below are some examples of tasks that are better handled by running parts of MOSCA separately. The following commands assume you have installed MOSCA as instructed.
MOSCA's preprocessing script can be used standalone, as it automatically downloads all resources required.
python ~/anaconda3/envs/mosca/share/MOSCA/scripts/preprocess.py -i {your input reads (e.g. mg_R1.fq,mg_R2.fq)} -t {number of threads} -o {output directory} -adaptdir {resources directory}/adapters -rrnadbs {resources directory}/rRNA_databases -d {data_type (either "dna" or "mrna")} -rd {resources directory} -n --minlen {minimum length of reads to keep} --avgqual {minimum average quality of reads to keep}
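As a rough illustration of how the placeholders in the command above might be filled in, here is a minimal dry-run sketch that prints one preprocess.py call per dataset. The file names, directory names, and the default values for threads, minimum length, and average quality are all hypothetical; only the flags are taken from the command above.

```shell
#!/bin/sh
# Dry-run sketch: print one preprocess.py call per paired-end dataset.
# All values are hypothetical placeholders; the flags are copied from the
# command shown above and are not otherwise verified here.
scripts_dir="$HOME/anaconda3/envs/mosca/share/MOSCA/scripts"

# preprocess_cmd READS OUTPUT_DIR RESOURCES_DIR DATA_TYPE
#   READS      comma-separated read files (e.g. mg_R1.fq,mg_R2.fq)
#   DATA_TYPE  either "dna" or "mrna"
# THREADS, MINLEN and AVGQUAL may be set in the environment; the fallback
# values used here are arbitrary examples, not recommendations.
preprocess_cmd() {
    echo python "$scripts_dir/preprocess.py" \
        -i "$1" -t "${THREADS:-4}" -o "$2" \
        -adaptdir "$3/adapters" -rrnadbs "$3/rRNA_databases" \
        -d "$4" -rd "$3" -n \
        --minlen "${MINLEN:-100}" --avgqual "${AVGQUAL:-20}"
}

# Example: one metagenome and one metatranscriptome (hypothetical names)
preprocess_cmd mg_R1.fq,mg_R2.fq output resources dna
preprocess_cmd mt_R1.fq,mt_R2.fq output resources mrna
```

Removing the `echo` inside `preprocess_cmd` would run the commands instead of printing them.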
MOSCA's differential expression analysis module requires replicates. Without replicates, the rest of MOSCA's analysis is still possible by bypassing this task and running the other steps individually:
- First, preprocess your datasets as explained above
- Join your reads by sample by running the following commands, the first for each "forward" file and the second for each "reverse" file:
cat {forward_file} >> {output}/Preprocess/{sample}_forward.fastq
cat {reverse_file} >> {output}/Preprocess/{sample}_reverse.fastq
- Perform assembly by running this, for each sample
python ~/anaconda3/envs/mosca/share/MOSCA/scripts/assembly.py -r {output}/Preprocess/{sample}_forward.fastq,{output}/Preprocess/{sample}_reverse.fastq -t {threads} -o {output}/Assembly/{sample} -a {assembler (either "metaspades" or "megahit")} -m {max_memory}
- Perform binning, if you want to, by running, for each sample
python ~/anaconda3/envs/mosca/share/MOSCA/scripts/binning.py -c {output}/Assembly/{sample}/contigs.fasta -t {threads} -o {output}/Binning/{sample} -r {output}/Preprocess/{sample}_forward.fastq,{output}/Preprocess/{sample}_reverse.fastq -mset {markerset (either "107" or "40")}
- Perform gene calling and annotation over the contigs by running, for each sample
python ~/anaconda3/envs/mosca/share/MOSCA/scripts/annotation.py -i {output}/Assembly/{sample}/contigs.fasta -t {threads} -o {output}/Annotation/{sample} -em {error_model} -db {path/to/diamond_database.(fasta/dmnd)} -mts {diamond_max_target_seqs} --assembled
- Run UPIMAPI for each sample
upimapi.py -i {output}/Annotation/{sample}/aligned.blast -o {output}/Annotation/uniprotinfo --blast --full-id
- Run reCOGnizer for each sample
recognizer.py -f {output}/Annotation/{sample}/fgs.faa -t {threads} -o {output}/Annotation/{sample} -rd {path/to/resources_directory} --remove-spaces
- Run quantification, all at once
python ~/anaconda3/envs/mosca/share/MOSCA/scripts/quantification_analyser.py -e {path/to/experiments_file} -t {threads} -o {output} -if {input_format_of_experiments_file ("excel" or "tsv")}
- Join all information
python ~/anaconda3/envs/mosca/share/MOSCA/scripts/join_information.py -e {path/to/experiments_file} -t {threads} -o {output} -if {input_format_of_experiments_file ("excel" or "tsv")} -nm {normalization_method ("TMM" or "RLE")}
- Run KEGGCharter
kegg_charter.py -f {output}/MOSCA_Entry_Report.xlsx -o {output}/KEGG_maps -mm {metabolic_maps comma-separated (e.g. 00030,00680,...)} -gcol {mg_names comma-separated} -tcol {mt_names comma-separated} -tc 'Taxonomic lineage ({taxa_level})' -not {number_of_taxa} -keggc 'Cross-reference (KEGG)'
- Run final reporting
python ~/anaconda3/envs/mosca/share/MOSCA/scripts/report.py -e {path/to/experiments_file} -o {output} -ldir ~/anaconda3/envs/mosca/share/MOSCA/resources -if {input_format_of_experiments_file ("excel" or "tsv")}
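The read-joining and assembly steps listed above are repeated for every sample, so they lend themselves to small shell helpers. The following is a minimal sketch, not part of MOSCA: `join_reads` and `assembly_cmd` are hypothetical helper names, the sample and file names are placeholders, and the fallback values for threads and maximum memory are arbitrary. The assembly command expects `{sample}_forward.fastq` and `{sample}_reverse.fastq`, so the sketch writes reverse reads to the `_reverse` file.

```shell
#!/bin/sh
# Sketch of per-sample helpers; names and defaults are hypothetical.
scripts_dir="$HOME/anaconda3/envs/mosca/share/MOSCA/scripts"

# join_reads OUTPUT SAMPLE DIRECTION FILE...
# Appends every FILE to OUTPUT/Preprocess/SAMPLE_DIRECTION.fastq,
# where DIRECTION is "forward" or "reverse".
join_reads() {
    jr_out=$1
    jr_sample=$2
    jr_direction=$3
    shift 3
    mkdir -p "$jr_out/Preprocess"
    cat "$@" >> "$jr_out/Preprocess/${jr_sample}_${jr_direction}.fastq"
}

# assembly_cmd OUTPUT SAMPLE ASSEMBLER
# Prints (does not run) the assembly.py call for one sample, using the
# flags from the assembly step above. ASSEMBLER is "metaspades" or "megahit".
# THREADS and MAX_MEMORY may be set in the environment; the fallbacks are
# arbitrary example values.
assembly_cmd() {
    echo python "$scripts_dir/assembly.py" \
        -r "${1}/Preprocess/${2}_forward.fastq,${1}/Preprocess/${2}_reverse.fastq" \
        -t "${THREADS:-4}" -o "${1}/Assembly/${2}" -a "$3" -m "${MAX_MEMORY:-8}"
}

# Example usage (hypothetical inputs, left commented out):
# join_reads output sample1 forward run1_R1.fq run2_R1.fq
# join_reads output sample1 reverse run1_R2.fq run2_R2.fq
# assembly_cmd output sample1 megahit
```

Because `join_reads` appends, as the original `cat ... >>` commands do, rerunning it for the same sample duplicates reads unless the joined files are removed first. The same print-then-run pattern could be extended to the binning and annotation steps, which take the same per-sample paths.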