-
Notifications
You must be signed in to change notification settings - Fork 26
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GTF entries on sequence which is not present in FASTA #151
Comments
Hey - thanks for the issue. Breaking this down
TL,DR:The GTF file you provide as input most likely contains entries that are not present in the input FASTA file. How to fix?This is something that the pipeline should be able to handle, and it is not complicated to implement. Can probably do this later this week. If you need this more urgently, you could manually remove the problematic entries from the GTF file. Anything else?As the pipeline is still under a lot of development, please make sure you use the latest version ( |
Thank you very much for taking time to write me back. Really appreciate it. I wonder if you can provide me some programable way to exclude those transcriptome entries in gtf file that are not in genome.fa? I feel there must be some tools to handle it. I'll look for such tool myself as well. And I'll look forward to hearing the good news from you that the pipeline is fixed. |
Hey, could you check out the branch associated with this issue and give it a try? You can do this by adding |
Thank you, Nictru, I will test it out and let you know how it goes. |
Any updates? |
Sorry for the delay, and I ran the new dev version of the pipeline and here are the results: |
|
Okay great! Will create a PR |
However, the actual pipeline run with the user data and configuration failed at the 'trimgalore', or more specifically, 'cutadapt' stage: cutadapt: error: Compressed file ended before the end-of-stream marker was reached Cutadapt terminated with exit signal: '256'. |
Hey, could you verify that the corresponding input fastq file has not been corrupted? Maybe a download problem or a problem during compression? |
Yes, you are right. It turned out the region2 read file was truncated. I will try to find if I can get the good sequence files and rerun the test. Thank you for pointing this out to me. |
Description of the bug
recently I tried to help a user to run the nf-core/circRNA pipeline, but met tons of errors. And this is one of the latest after I downloaded the reference genome from aws, and rerun everything from scratch, that is, I removed all the residual ~/.nextflow, ~/.singularity and all that in my working directory when I run the command. Here is the command line I ran:
module purge
module load java/17.0.8
module load nextflow/23.04.3 # old 21.10.6 doesn't work
#Set the new Singularity cache directory path
export SING_DIR=/projectnb/zhangh/Arshiya_Saxena/HumanBrain_circRNA/nf-core_circrna_pipeline-NAC/yun_test3
# Export the NXF_SINGULARITY_CACHEDIR environment variable to point to the new cache directory
export NXF_SINGULARITY_CACHEDIR=$SING_DIR/.singularity
# Create the new Singularity cache directory if it doesn't already exist
mkdir -p $NXF_SINGULARITY_CACHEDIR
# Remove the existing ~/.singularity symbolic link
rm -rf ~/.singularity
# Create a new symbolic link for ~/.singularity that points to the new cache directory
ln -s $NXF_SINGULARITY_CACHEDIR ~/.singularity
cd $SING_DIR
# Yun Test_rerun on 06/16/2024:
nextflow run nf-core/circrna -profile singularity --genome GRCh37 --igenomes_base /projectnb/zhangh/Arshiya_Saxena/HumanBrain_circRNA/ref/references --input samples.csv --module 'circrna_discovery' --outdir test3_out -r dev
And here is my local references folder structure:
./Homo_sapiens
./Homo_sapiens/Ensembl
./Homo_sapiens/Ensembl/GRCh37
./Homo_sapiens/Ensembl/GRCh37/Annotation
./Homo_sapiens/Ensembl/GRCh37/Annotation/SmallRNA
./Homo_sapiens/Ensembl/GRCh37/Annotation/SmallRNA/mature.fa
./Homo_sapiens/Ensembl/GRCh37/Annotation/SmallRNA/hairpin.fa
./Homo_sapiens/Ensembl/GRCh37/Annotation/Genes
./Homo_sapiens/Ensembl/GRCh37/Annotation/Genes/genes.gtf
./Homo_sapiens/Ensembl/GRCh37/Annotation/Genes/genes.bed
./Homo_sapiens/Ensembl/GRCh37/Annotation/Genes/refFlat.txt.gz
./Homo_sapiens/Ensembl/GRCh37/Sequence
./Homo_sapiens/Ensembl/GRCh37/Sequence/WholeGenomeFasta
./Homo_sapiens/Ensembl/GRCh37/Sequence/WholeGenomeFasta/genome.dict
./Homo_sapiens/Ensembl/GRCh37/Sequence/WholeGenomeFasta/genome.fa
./Homo_sapiens/Ensembl/GRCh37/Sequence/WholeGenomeFasta/genome.fa.index
./Homo_sapiens/Ensembl/GRCh37/Sequence/WholeGenomeFasta/genome.fa.fai
./Homo_sapiens/Ensembl/GRCh37/Sequence/WholeGenomeFasta/GenomeSize.xml
./Homo_sapiens/Ensembl/GRCh37/Sequence/BowtieIndex
./Homo_sapiens/Ensembl/GRCh37/Sequence/BowtieIndex/genome.rev.2.ebwt
./Homo_sapiens/Ensembl/GRCh37/Sequence/BowtieIndex/genome.2.ebwt
./Homo_sapiens/Ensembl/GRCh37/Sequence/BowtieIndex/genome.1.ebwt
./Homo_sapiens/Ensembl/GRCh37/Sequence/BowtieIndex/genome.fa
./Homo_sapiens/Ensembl/GRCh37/Sequence/BowtieIndex/genome.3.ebwt
./Homo_sapiens/Ensembl/GRCh37/Sequence/BowtieIndex/genome.4.ebwt
./Homo_sapiens/Ensembl/GRCh37/Sequence/BowtieIndex/genome.rev.1.ebwt
./Homo_sapiens/Ensembl/GRCh37/Sequence/STARIndex
./Homo_sapiens/Ensembl/GRCh37/Sequence/STARIndex/chrName.txt
./Homo_sapiens/Ensembl/GRCh37/Sequence/STARIndex/exonInfo.tab
./Homo_sapiens/Ensembl/GRCh37/Sequence/STARIndex/SAindex
./Homo_sapiens/Ensembl/GRCh37/Sequence/STARIndex/Genome
./Homo_sapiens/Ensembl/GRCh37/Sequence/STARIndex/geneInfo.tab
./Homo_sapiens/Ensembl/GRCh37/Sequence/STARIndex/sjdbList.out.tab
./Homo_sapiens/Ensembl/GRCh37/Sequence/STARIndex/sjdbInfo.txt
./Homo_sapiens/Ensembl/GRCh37/Sequence/STARIndex/transcriptInfo.tab
./Homo_sapiens/Ensembl/GRCh37/Sequence/STARIndex/sjdbList.fromGTF.out.tab
./Homo_sapiens/Ensembl/GRCh37/Sequence/STARIndex/genome.fa
./Homo_sapiens/Ensembl/GRCh37/Sequence/STARIndex/chrLength.txt
./Homo_sapiens/Ensembl/GRCh37/Sequence/STARIndex/Log.out
./Homo_sapiens/Ensembl/GRCh37/Sequence/STARIndex/genes.gtf
./Homo_sapiens/Ensembl/GRCh37/Sequence/STARIndex/genomeParameters.txt
./Homo_sapiens/Ensembl/GRCh37/Sequence/STARIndex/chrStart.txt
./Homo_sapiens/Ensembl/GRCh37/Sequence/STARIndex/SA
./Homo_sapiens/Ensembl/GRCh37/Sequence/STARIndex/chrNameLength.txt
./Homo_sapiens/Ensembl/GRCh37/Sequence/STARIndex/exonGeTrInfo.tab
./Homo_sapiens/Ensembl/GRCh37/Sequence/Bowtie2Index
./Homo_sapiens/Ensembl/GRCh37/Sequence/Bowtie2Index/genome.3.bt2
./Homo_sapiens/Ensembl/GRCh37/Sequence/Bowtie2Index/genome.1.bt2
./Homo_sapiens/Ensembl/GRCh37/Sequence/Bowtie2Index/genome.rev.1.bt2
./Homo_sapiens/Ensembl/GRCh37/Sequence/Bowtie2Index/genome.fa
./Homo_sapiens/Ensembl/GRCh37/Sequence/Bowtie2Index/genome.4.bt2
./Homo_sapiens/Ensembl/GRCh37/Sequence/Bowtie2Index/genome.2.bt2
./Homo_sapiens/Ensembl/GRCh37/Sequence/Bowtie2Index/genome.rev.2.bt2
./Homo_sapiens/Ensembl/GRCh37/Sequence/BWAIndex
./Homo_sapiens/Ensembl/GRCh37/Sequence/BWAIndex/genome.fa.pac
./Homo_sapiens/Ensembl/GRCh37/Sequence/BWAIndex/genome.fa.ann
./Homo_sapiens/Ensembl/GRCh37/Sequence/BWAIndex/genome.fa
./Homo_sapiens/Ensembl/GRCh37/Sequence/BWAIndex/version0.6.0
./Homo_sapiens/Ensembl/GRCh37/Sequence/BWAIndex/version0.6.0/genome.fa.pac
./Homo_sapiens/Ensembl/GRCh37/Sequence/BWAIndex/version0.6.0/genome.fa.ann
./Homo_sapiens/Ensembl/GRCh37/Sequence/BWAIndex/version0.6.0/genome.fa
./Homo_sapiens/Ensembl/GRCh37/Sequence/BWAIndex/version0.6.0/genome.fa.sa
./Homo_sapiens/Ensembl/GRCh37/Sequence/BWAIndex/version0.6.0/genome.fa.amb
./Homo_sapiens/Ensembl/GRCh37/Sequence/BWAIndex/version0.6.0/genome.fa.bwt
./Homo_sapiens/Ensembl/GRCh37/Sequence/BWAIndex/version0.5.x
./Homo_sapiens/Ensembl/GRCh37/Sequence/BWAIndex/version0.5.x/genome.fa.pac
./Homo_sapiens/Ensembl/GRCh37/Sequence/BWAIndex/version0.5.x/genome.fa.ann
./Homo_sapiens/Ensembl/GRCh37/Sequence/BWAIndex/version0.5.x/genome.fa
./Homo_sapiens/Ensembl/GRCh37/Sequence/BWAIndex/version0.5.x/genome.fa.rpac
./Homo_sapiens/Ensembl/GRCh37/Sequence/BWAIndex/version0.5.x/genome.fa.rsa
./Homo_sapiens/Ensembl/GRCh37/Sequence/BWAIndex/version0.5.x/genome.fa.sa
./Homo_sapiens/Ensembl/GRCh37/Sequence/BWAIndex/version0.5.x/genome.fa.amb
./Homo_sapiens/Ensembl/GRCh37/Sequence/BWAIndex/version0.5.x/genome.fa.rbwt
./Homo_sapiens/Ensembl/GRCh37/Sequence/BWAIndex/version0.5.x/genome.fa.bwt
./Homo_sapiens/Ensembl/GRCh37/Sequence/BWAIndex/genome.fa.sa
./Homo_sapiens/Ensembl/GRCh37/Sequence/BWAIndex/genome.fa.amb
./Homo_sapiens/Ensembl/GRCh37/Sequence/BWAIndex/genome.fa.bwt
and the error I met is as below:
Jun-17 17:41:27.965 [Task monitor] DEBUG n.processor.TaskPollingMonitor - Task completed > TaskHandler[id: 11; name: NFCORE_CIRCRNA:CIRCRNA:QUANTIFICATION:TRANSCRIPTOME (transcriptome); status: COMPLETED; exit: 1; error: -; workDir: /projectnb/zhangh/Arshiya_Saxena/HumanBrain_circRNA/nf-core_circrna_pipeline-NAC/yun_test3/work/82/1063296a24d1d4c319a6bd85bdeb20]
Jun-17 17:41:27.970 [Task monitor] DEBUG nextflow.processor.TaskProcessor - Handling unexpected condition for
task: name=NFCORE_CIRCRNA:CIRCRNA:QUANTIFICATION:TRANSCRIPTOME (transcriptome); work-dir=/projectnb/zhangh/Arshiya_Saxena/HumanBrain_circRNA/nf-core_circrna_pipeline-NAC/yun_test3/work/82/1063296a24d1d4c319a6bd85bdeb20
error [nextflow.exception.ProcessFailedException]: Process
NFCORE_CIRCRNA:CIRCRNA:QUANTIFICATION:TRANSCRIPTOME (transcriptome)
terminated with an error exit status (1)Jun-17 17:41:27.987 [Task monitor] ERROR nextflow.processor.TaskProcessor - Error executing process > 'NFCORE_CIRCRNA:CIRCRNA:QUANTIFICATION:TRANSCRIPTOME (transcriptome)'
Caused by:
Process
NFCORE_CIRCRNA:CIRCRNA:QUANTIFICATION:TRANSCRIPTOME (transcriptome)
terminated with an error exit status (1)Command executed:
gffread
-g genome.fa
-w transcriptome.fasta
transcriptome.filtered.gtf
cat <<-END_VERSIONS > versions.yml
"NFCORE_CIRCRNA:CIRCRNA:QUANTIFICATION:TRANSCRIPTOME":
gffread: $(gffread --version 2>&1)
END_VERSIONS
Command exit status:
1
Command output:
(empty)
Command error:
FASTA index file genome.fa.fai created.
Warning: couldn't find fasta record for 'GL000191.1'!
Error: no genomic sequence available (check -g option!).
Work dir:
/projectnb/zhangh/Arshiya_Saxena/HumanBrain_circRNA/nf-core_circrna_pipeline-NAC/yun_test3/work/82/1063296a24d1d4c319a6bd85bdeb20
Looking into a big details about 'gffread' command, I think it is expecting a 'gff' file, while the pipeline provided a 'gtf' as an input file (transcriptome.filtered.gtf) and would this be the cause of the error? if so, how could this happen? is there anything missing in my command line? We used to be able to run the pipeline around Oct 2023, or maybe even this April, but now everything seems broken.
Please help. Thank you very much!!!
The text was updated successfully, but these errors were encountered: