Hey everyone,
I am currently working on a new module (gClust) within the cosmosis-standard-library, which is internally heavily parallelized with OpenMP, as are, of course, CLASS and CAMB.
This might be a rookie question, but I was unfortunately not able to resolve it, even with the help of the web and ChatGPT, so I would be grateful for some insights and for other people's experiences with parallelization in cosmosis.
After running cosmosis with emcee for quite some time now, I've noticed that running emcee under MPI seems to disrupt the OpenMP parallelization within CLASS and my module. Since the single-task timing scaled well with the number of threads without MPI, I wasn't previously concerned about the performance of the chains. But now that I am optimizing the chains further, I've stumbled over this and wanted to ask whether others see similar issues and whether there might be an easy fix.
To be precise, here is the testing I did on my local PC (similar tests on two different clusters gave similar results; a minimal standalone reproducer is sketched right after these numbers):
Without MPI:
1 Thread: 1/1.7 sec. (CLASS/gClust)
10 Threads: 0.35/0.4 sec. (CLASS/gClust)
With MPI (-n 1):
1 Thread: 1/1.7 sec. (CLASS/gClust)
10 Threads: 1/1.7 sec. (CLASS/gClust)
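For reference, the kind of minimal standalone hybrid test I have in mind is sketched below. This is just an illustrative toy workload, not the actual cosmosis/CLASS timing code, but if the same loop stops scaling the moment it is launched through mpirun, the problem would sit in the launcher/runtime configuration rather than in the modules themselves.

```c
/* Minimal hybrid MPI+OpenMP scaling check (illustrative sketch only).
 * Compile e.g. with:  mpicc -fopenmp omp_scaling_check.c -o omp_scaling_check
 * Then compare:       ./omp_scaling_check   vs.   mpirun -n 1 ./omp_scaling_check
 */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const long n = 200000000L;   /* enough work to make the timing meaningful */
    double sum = 0.0;

    double t0 = MPI_Wtime();
    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < n; i++)
        sum += 1.0 / (1.0 + (double)i);
    double t1 = MPI_Wtime();

    printf("rank %d: %d threads, %.3f s (sum=%.6f)\n",
           rank, omp_get_max_threads(), t1 - t0, sum);

    MPI_Finalize();
    return 0;
}
```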
Already checked:
OMP_NUM_THREADS propagates correctly to the parallel region in both cases (see the snippet after this list)
srun and mpirun behave the same (while smp is not working for me on either machine; it simply hangs at startup without returning an error)
changing the number of MPI tasks, to check whether MPI overhead might be the issue, doesn't affect the single-task performance
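On the OMP_NUM_THREADS point referenced above, what I check is not only that the variable arrives, but also how many threads are actually spawned inside the parallel region and how many cores the process is allowed to run on. A minimal Linux-only sketch of such a check (sched_getaffinity and CPU_COUNT are glibc-specific):

```c
/* Quick check, inside the parallel region, of how many OpenMP threads are
 * actually spawned and how many cores this process may run on.
 * Compile e.g. with:  gcc -fopenmp check_threads.c -o check_threads
 * Then compare a plain run against  mpirun -n 1 ./check_threads
 */
#define _GNU_SOURCE
#include <omp.h>
#include <sched.h>
#include <stdio.h>

int main(void)
{
    cpu_set_t mask;
    sched_getaffinity(0, sizeof(mask), &mask);   /* affinity mask of this process */

    printf("cores available to this process: %d\n", CPU_COUNT(&mask));
    printf("omp_get_max_threads():           %d\n", omp_get_max_threads());

    #pragma omp parallel
    {
        #pragma omp single
        printf("threads inside parallel region:  %d\n", omp_get_num_threads());
    }
    return 0;
}
```

The reason I mention the affinity mask: if I understand the Open MPI defaults correctly, mpirun binds each rank to a single core when only one or two ranks are launched (and srun typically applies its own CPU binding), in which case all OpenMP threads time-share one core and the thread scaling disappears, much like in the "With MPI" numbers above. If the check reports only one available core under mpirun but ten without, then something like `mpirun --bind-to none`, `srun --cpu-bind=none`, or inspecting the output with `OMP_DISPLAY_AFFINITY=TRUE` might be worth a try. That is only a guess from the symptoms, though.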
Is this actually expected just from MPI overhead, or is OpenMP somehow not spawning the threads within the parallel region properly when running under MPI?
Thank you very much in advance!
Greetings,
Dennis