
-t 24 not working threading #96

Open
jianshu93 opened this issue Nov 3, 2022 · 21 comments

Comments

@jianshu93

Hello Team,

parasail_aligner -a parasail_sg_qx_striped_32 -q SILVA_138.1_SSURef_tax_silva_prok_nr_sbsample_half.fasta -f Danish_01.fa -d -t 24 -g parasail.csv

With many sequences in the query file (-q) and only one in the -f file, I noticed that parasail does not run in parallel at all, even though I asked it to use 24 threads.

Any idea why?

Thanks,

Jianshu

@jeffdaily
Owner

I'm assuming you were building from source since the parasail_aligner app isn't shipped as part of the wheel installs.

  • Is this Windows or Linux or MacOS?
  • Did you build using configure, cmake, or meson?

If you run parasail_aligner -v, it will print one of the following messages:

threads: system-specific default, must be >= 1

or

threads: Warning: ignored; OpenMP was not supported by your compiler

Which one do you see?

@jeffdaily
Owner

Also, if OpenMP was not found during configuration, you should get a runtime warning when you specify -t, because threading is not supported in that build:

-t number of threads requested, but OpenMP was not found during configuration. Running without threads.

@jianshu93
Author

It is Linux. I am using CMake, and I see the first message. I do not get the OpenMP warning.

Jianshu

@jeffdaily
Owner

The code calls omp_set_num_threads and does nothing else special. I found the following answer on Stack Overflow that might help you: https://stackoverflow.com/a/11096742.

Try setting the env var OMP_DYNAMIC=0 and see if it helps.

Does your top or htop output verify whether threading is being used?
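Where htop cannot be used interactively, the thread count of a running process can also be read from the kernel. The Python sketch below assumes Linux (it reads the Threads: line of /proc/<pid>/status); parse_thread_count and thread_count are illustrative helper names, not part of parasail, and the PID would come from something like pgrep parasail_aligner. A count of 1 while the aligner is running suggests no OpenMP worker threads were created; a higher count only shows that threads exist, not that they are busy, so CPU usage in top remains the better signal.

```python
import os


def parse_thread_count(status_text: str) -> int:
    """Return the value of the 'Threads:' line from /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith("Threads:"):
            return int(line.split(":", 1)[1].strip())
    raise ValueError("no 'Threads:' line found")


def thread_count(pid: int) -> int:
    """Read the kernel's thread count for a running process (Linux only)."""
    with open(f"/proc/{pid}/status", encoding="utf-8") as fh:
        return parse_thread_count(fh.read())
```

For example, thread_count(12345) would report how many threads PID 12345 currently has.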

@jianshu93
Author

Hello Jeff,

I tried, but it will not let me run it in the background; it fails with the following error:

input file, query file, and stdin detected; max inputs is 2

I am using the following Slurm script to submit the job to a supercomputer:

#!/bin/bash

#SBATCH --partition=ieg_128g,ieg_lm ### Partition (like a queue in PBS)
#SBATCH --job-name=parasial_16S ### Job Name
#SBATCH -o /condo/ieg/jianshu/log/%x.%j.%N.out ### File in which to store job output
#SBATCH -e /condo/ieg/jianshu/log/%x.%j.%N.err ### File in which to store job error
#SBATCH --time=48:00:00 ### Wall clock time limit in Days-HH:MM:SS
#SBATCH --nodes=1 ### Node count required for the job
#SBATCH --ntasks=1 ### Number of tasks to be launched per node
#SBATCH --cpus-per-task=24 ### Number of threads per task (OMP threads)
#SBATCH --mem=60G ### memory for each job
#SBATCH --mail-type=FAIL ### When to send mail
#SBATCH --mail-user=jianshuzhao@yahoo.com ### Address to send mail to
#SBATCH --get-user-env ### Import your user environment setup
#SBATCH --requeue ### On failure, requeue for another try
#SBATCH --verbose

source ~/.bashrc
cd /home/jianshu/data
which parasail_aligner
parasail_aligner -a parasail_sg_qx_striped_32 -q SILVA_138.1_SSURef_tax_silva_prok_nr_new.fasta -f Danish_01.fa -d -t 24 -g parasail.csv

parasail_aligner was built with CMake (mkdir build; cd build; cmake ..; make -j 12).

Running it in the background with nohup and & fails with the same error.

Any idea why?

Thanks,

Jianshu
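A common way to keep the -t value in step with #SBATCH --cpus-per-task is to read the SLURM_CPUS_PER_TASK environment variable, which Slurm sets inside the job when --cpus-per-task is requested (in bash this would be -t "$SLURM_CPUS_PER_TASK"). The Python sketch below is a hypothetical wrapper, not part of parasail: slurm_threads, aligner_command, and aligner_env are illustrative names, and the command mirrors the one in the script above. It sets OMP_DYNAMIC to false rather than 0 because the OpenMP specification defines the accepted values as true and false.

```python
import os


def slurm_threads(env=None) -> int:
    """Threads granted by Slurm via --cpus-per-task, else the local CPU count."""
    env = os.environ if env is None else env
    value = env.get("SLURM_CPUS_PER_TASK", "")
    if value.isdigit() and int(value) > 0:
        return int(value)
    return os.cpu_count() or 1


def aligner_command(query: str, ref: str, out_csv: str, threads: int) -> list:
    """Same parasail_aligner invocation as in the Slurm script, with -t parameterized."""
    return [
        "parasail_aligner",
        "-a", "parasail_sg_qx_striped_32",
        "-q", query,
        "-f", ref,
        "-d",
        "-t", str(threads),
        "-g", out_csv,
    ]


def aligner_env(env=None) -> dict:
    """Copy of the environment with dynamic adjustment of the OpenMP thread count disabled."""
    new_env = dict(os.environ if env is None else env)
    new_env["OMP_DYNAMIC"] = "false"
    return new_env
```

The returned list and dict can be passed to subprocess.run(cmd, env=env) inside the job.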

@jeffdaily
Owner

Is this still related to your original question about OpenMP not working? Can we open a new issue for the new error, "input file, query file, and stdin detected; max inputs is 2"? It seems parasail_aligner needs some fixes to how it detects whether there is piped input on stdin.
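The thread does not show how parasail_aligner decides that stdin is an input, so the following is only a hypothesis: if the check is "stdin is not a terminal" (for example via isatty), then nohup, which redirects a terminal stdin from an unreadable file, and a batch job, where stdin is typically not a terminal, would both make stdin look like a third input. The Python sketch below contrasts that naive count with a more conservative rule that consults stdin only when fewer than two files were given on the command line. All function names here are hypothetical.

```python
import os
import sys

MAX_INPUTS = 2


def count_inputs_naive(has_file: bool, has_query: bool, stdin_tty: bool) -> int:
    """Counts stdin as an input whenever it is not a terminal (fragile under nohup/batch jobs)."""
    return int(has_file) + int(has_query) + int(not stdin_tty)


def count_inputs_conservative(has_file: bool, has_query: bool, stdin_tty: bool) -> int:
    """Treats stdin as an input only when the command line supplies fewer than two files."""
    from_args = int(has_file) + int(has_query)
    if from_args >= MAX_INPUTS:
        return from_args  # both inputs given explicitly; ignore whatever stdin is
    return from_args + int(not stdin_tty)


def current_stdin_is_tty() -> bool:
    """True when this process's stdin is attached to a terminal."""
    return os.isatty(sys.stdin.fileno())
```

With -q and -f both given and stdin not a terminal, the naive rule yields 3 inputs and would trip the "max inputs is 2" check, while the conservative rule yields 2.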

@jianshu93
Author

Yes! I need to run it in the background so that I can watch it with htop or top. It is a server, so I do not have other options, and the query sequence file is very large: 2 million sequences.

Thanks,

Jianshu

@jeffdaily
Owner

Please try pulling and building the following branch. It needs more testing, but I hope it resolves your current issue with stdin. Please let me know if it does resolve your issue so I can create a new release.

hotfix/2.6.1

@jianshu93
Author

jianshu93 commented Nov 4, 2022

Hello Jeff,

I still get the error; it is really strange (I downloaded the zip of the hotfix branch and compiled it). I was using:

nohup parasail_aligner -a parasail_sg_qx_striped_32 -q SILVA_138.1_SSURef_tax_silva_prok_nr_new.fasta -f Danish_01.fa -d -t 24 -g parasail.csv &

and error is:

input file, query file, and stdin detected; max inputs is 2
0.00user 0.01system 0:00.02elapsed 55%CPU (0avgtext+0avgdata 5936maxresident)k
0inputs+8outputs (0major+1560minor)pagefaults 0swaps

Thanks,

Jianshu

@jianshu93
Author

Do you see the same problem on your side, e.g., when running with nohup and &?

Thanks,

Jianshu

@jianshu93
Author

I do not have any problems on macOS after following exactly the same build steps, which is very strange.

Thanks,

Jianshu

@jeffdaily
Owner

I could reproduce this with nohup. Please try the following, where the query file is redirected to stdin under nohup. parasail_aligner does accept the query file on stdin.

nohup parasail_aligner -a parasail_sg_qx_striped_32 -f Danish_01.fa -d -t 24 -g parasail.csv < SILVA_138.1_SSURef_tax_silva_prok_nr_new.fasta &
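For scripting the same workaround, a Python sketch could start parasail_aligner in its own session, detached from the terminal much like nohup with &, with the query FASTA attached as stdin and output appended to a log file. This is an assumed wrapper, not part of parasail; stdin_query_command and launch_detached are hypothetical names, and launching requires parasail_aligner on the PATH.

```python
import subprocess


def stdin_query_command(ref: str, out_csv: str, threads: int) -> list:
    """parasail_aligner arguments for reading the query sequences from stdin (no -q)."""
    return ["parasail_aligner", "-a", "parasail_sg_qx_striped_32",
            "-f", ref, "-d", "-t", str(threads), "-g", out_csv]


def launch_detached(query_fasta: str, ref: str, out_csv: str, threads: int, log_path: str):
    """Start the aligner in a new session with the query FASTA as stdin; returns the Popen."""
    with open(query_fasta, "rb") as query, open(log_path, "ab") as log:
        # The child receives its own copies of these descriptors, so closing ours is safe.
        return subprocess.Popen(
            stdin_query_command(ref, out_csv, threads),
            stdin=query,
            stdout=log,
            stderr=subprocess.STDOUT,
            start_new_session=True,  # setsid(): not tied to the terminal's session, like nohup + &
        )
```

Usage would look like launch_detached("SILVA_138.1_SSURef_tax_silva_prok_nr_new.fasta", "Danish_01.fa", "parasail.csv", 24, "parasail.log").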

@jianshu93
Author

Just tried it; I still get the same error with < SILVA_138.1_SSURef_tax_silva_prok_nr_new.fasta

@jianshu93
Author

Hello Jeff,

I am confused by the output of the command above:

0,0,1490,1541,1059,1489,1540
1,0,1535,1541,1097,1534,1540
2,0,1534,1541,1095,1533,1540
3,0,1545,1541,912,1544,1539
4,0,1515,1541,998,1514,1539
5,0,1514,1541,990,1513,1539
6,0,1514,1541,988,1513,1539

Which column is the alignment score by default?

Thanks,

Jianshu

@jianshu93
Author

Hello Jeff,

A quick question: is the score a metric? In particular, does it satisfy the triangle inequality in sg mode?

Thanks
jianshu

@jeffdaily
Owner

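For the default/basic output, it writes one line per alignment performed. Sequences in both the input file and the query file are numbered starting from 0. Each line contains the following comma-separated fields, in order, so the alignment score is the fifth field: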

  • i
  • j
  • i_len
  • j_len
  • parasail_result_get_score(result)
  • parasail_result_get_end_query(result)
  • parasail_result_get_end_ref(result)
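Given that field order, a short Python sketch can load the default output and rank alignments by the score column, using the sample lines posted earlier in this thread. The Hit type and parse_hits function are illustrative names; the field names follow the order listed above.

```python
import csv
import io
from typing import NamedTuple


class Hit(NamedTuple):
    i: int
    j: int
    i_len: int
    j_len: int
    score: int
    end_query: int
    end_ref: int


def parse_hits(text: str) -> list:
    """Parse parasail_aligner's default comma-separated output into Hit records."""
    return [Hit(*map(int, row)) for row in csv.reader(io.StringIO(text)) if row]


sample = """0,0,1490,1541,1059,1489,1540
1,0,1535,1541,1097,1534,1540
2,0,1534,1541,1095,1533,1540
3,0,1545,1541,912,1544,1539
4,0,1515,1541,998,1514,1539
5,0,1514,1541,990,1513,1539
6,0,1514,1541,988,1513,1539
"""

hits = sorted(parse_hits(sample), key=lambda h: h.score, reverse=True)
for h in hits[:3]:
    print(h.i, h.score)  # prints "1 1097", then "2 1095", then "0 1059"
```

For the real output, parse_hits(open("parasail.csv").read()) would be used instead of the inline sample.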

@jianshu93
Author

Hello Jeff,

I found that parasail gives very different results compared to other global alignment tools such as vsearch and edlib. I used this command:

parasail_aligner -a parasail_sg_striped_32 -f Danish_01.fa -d -t 4 -g parasail.csv -q Danish_HQ_MQ_MAG_16S_new.fa

Test.zip

parasail.csv
edlib-aligner_Danish_01_new.txt

query_vsearch_new.txt

parasail.csv is the parasail output sorted by the score column. edlib-aligner_Danish_01_new.txt contains the results from edlib:

edlib-aligner -m HW -p -l Danish_HQ_MQ_MAG_16S.fa Danish_01.fa > edlib-aligner_Danish_01_alignment.txt

while query_vsearch_new.txt is from vsearch:

vsearch --usearch_global ./Danish_01.fa --db Danish_HQ_MQ_MAG_16S.fa --id 0.1 --strand both --maxaccepts 0 --maxrejects 0 --blast6out query_vsearch_new.txt --threads 4

I attached the 2 FASTA files in Test.zip. I double-checked that edlib and vsearch give very similar results for the top 5 hits, while parasail's are very different. I used semi-global alignment for all tools.

Any idea why?

Thanks,

Jianshu

@jianshu93
Author

Note that you may need to use grep to extract the FASTA IDs from the edlib and parasail output so that the query names can be compared with vsearch.

Jianshu

@jeffdaily
Owner

The alignment function you selected is semi-global, the "sg" in parasail_sg_striped_32. If you wanted global alignment, that would be "nw" for Needleman-Wunsch.
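To make the difference concrete, here is a minimal score-only Python sketch of global versus semi-global dynamic programming. It is an illustration, not parasail's implementation: it assumes a linear gap penalty and simple match/mismatch scores (parasail's functions use gap open/extend penalties and a substitution matrix), and the parameters free_query_ends and free_ref_ends are hypothetical names for the general idea of leaving sequence ends unaligned at no cost, not a mapping of parasail's sg variant suffixes.

```python
def align_score(query, ref, free_query_ends=False, free_ref_ends=False,
                match=2, mismatch=-1, gap=-2):
    """Score-only DP with a linear gap penalty (a simplification; parasail uses affine gaps)."""
    n, m = len(query), len(ref)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        # Leaving a query prefix unaligned is free only when query ends are free.
        H[i][0] = 0 if free_query_ends else i * gap
    for j in range(1, m + 1):
        # Same for a reference prefix.
        H[0][j] = 0 if free_ref_ends else j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = H[i - 1][j - 1] + (match if query[i - 1] == ref[j - 1] else mismatch)
            H[i][j] = max(diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
    candidates = [H[n][m]]
    if free_ref_ends:
        candidates += H[n]  # reference suffix may be skipped: any cell in the last row
    if free_query_ends:
        candidates += [H[i][m] for i in range(n + 1)]  # query suffix may be skipped
    return max(candidates)


print(align_score("ACGT", "TTACGTTT"))              # 0: global, all 4 end gaps penalized
print(align_score("ACGT", "TTACGTTT", True, True))  # 8: all end gaps free
```

With both flags off this is Needleman-Wunsch; which ends are free is exactly what distinguishes the semi-global flavors, so two "semi-global" tools can rank hits differently if they free different ends. A raw score is also not normalized for length, so ranking by score need not agree with a ranking by percent identity or edit distance.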

@jianshu93
Author

Yes, I want semi-global. The other two are both semi-global as well. Have you benchmarked against a standard dataset?

Thanks

Jianshu

@jeffdaily
Owner

Perhaps a more specific semi-global alignment behavior is what you were looking for? Please see the table of all semi-global options.

https://github.com/jeffdaily/parasail#standard-function-naming-convention

As for benchmarking against standard datasets, no. The tests I wrote only ensure that the reference (non-vectorized) implementations, such as parasail_sw, get the same results as all of the vectorized variants. The local alignment implementations were initially based on the SSW library and confirmed to get the same results. Early versions of this software only calculated the alignment score and some alignment statistics; when the traceback feature was added, the results were compared against EMBOSS and SSW on a handful of randomly selected sequences that are part of the parasail source tree under the data directory.

Also, this project is mostly in maintenance mode. I do not have the time to benchmark against any datasets.
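The testing approach described above, checking that optimized variants reproduce a reference implementation, can be sketched as a randomized differential test. As an assumption for illustration, a plain full-matrix Needleman-Wunsch scorer stands in for the reference and a two-row (linear-memory) version stands in for an optimized variant; neither is parasail code, and the scoring parameters are arbitrary.

```python
import random


def nw_reference(a, b, match=2, mismatch=-1, gap=-2):
    """Full-matrix global alignment score (the 'reference' implementation)."""
    H = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        H[i][0] = i * gap
    for j in range(len(b) + 1):
        H[0][j] = j * gap
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(H[i - 1][j - 1] + s, H[i - 1][j] + gap, H[i][j - 1] + gap)
    return H[len(a)][len(b)]


def nw_two_rows(a, b, match=2, mismatch=-1, gap=-2):
    """Same recurrence keeping only two rows (stand-in for an optimized variant)."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            cur[j] = max(prev[j - 1] + s, prev[j] + gap, cur[j - 1] + gap)
        prev = cur
    return prev[len(b)]


def differential_test(trials=500, seed=0):
    """Compare both implementations on random DNA pairs; returns the number checked."""
    rng = random.Random(seed)
    for _ in range(trials):
        a = "".join(rng.choice("ACGT") for _ in range(rng.randint(0, 30)))
        b = "".join(rng.choice("ACGT") for _ in range(rng.randint(0, 30)))
        assert nw_reference(a, b) == nw_two_rows(a, b), (a, b)
    return trials


print(differential_test())  # 500
```

A test like this only shows that the two implementations agree with each other; it says nothing about whether either matches another tool's scoring conventions, which is the distinction drawn above between consistency tests and benchmarking.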
