
MPI Parallelization in combination with OpenMP #166

Open
lindede249 opened this issue Jan 23, 2025 · 0 comments

Comments

@lindede249

Hey everyone,
I am currently working on a new module (gClust) within the cosmosis-standard-library, which is heavily parallelized internally with OpenMP, as are of course CLASS and CAMB.
This might be a rookie question, but I was unfortunately not able to resolve it, even with the help of the web and ChatGPT. So I would be grateful for some insights and for other people's experiences with parallelization in CosmoSIS.

After running CosmoSIS with emcee for quite some time, I've noticed that running emcee under MPI seems to disrupt the OpenMP parallelization within CLASS and within my module. Since the single-task timing scaled well with the number of threads when running without MPI, I was previously not concerned about the performance of the chains. Now that I am optimizing the chains further, I've stumbled over this and wanted to ask whether others see similar issues and whether there might be an easy fix.

To be precise, here are the timings from my local PC (similar tests on two different clusters gave similar results); a minimal standalone reproducer of the kind of test I mean follows the numbers:

Without MPI:

  • 1 Thread: 1/1.7 sec. (CLASS/gClust)
  • 10 Threads: 0.35/0.4 sec. (CLASS/gClust)

With MPI (-n 1):

  • 1 Thread: 1/1.7 sec. (CLASS/gClust)
  • 10 Threads: 1/1.7 sec. (CLASS/gClust)
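
To make the setup concrete, here is a minimal standalone reproducer in the spirit of this test. This is a hypothetical sketch, not code from gClust or CLASS; the loop is just a stand-in for the real OpenMP parallel zone:

```c
/* Minimal MPI + OpenMP timing test: times a trivial OpenMP loop inside
 * a single MPI rank, so the thread scaling can be compared with and
 * without mpirun.
 * Build e.g.:  mpicc -fopenmp repro.c -o repro
 * Run e.g.:    OMP_NUM_THREADS=10 mpirun -n 1 ./repro                  */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    const long n = 200000000L;
    double sum = 0.0;
    double t0 = omp_get_wtime();

    /* The OpenMP parallel region whose scaling is being measured. */
    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < n; i++)
        sum += 1.0 / (double)(i + 1);

    double t1 = omp_get_wtime();
    printf("max threads = %d, time = %.3f s (sum = %.6f)\n",
           omp_get_max_threads(), t1 - t0, sum);

    MPI_Finalize();
    return 0;
}
```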

Already checked:

  • OMP_NUM_THREADS propagates correctly into the parallel region in both cases (a sketch of this check is shown after the list)
  • srun and mpirun behave the same (while smp does not work for me on either machine: it simply hangs at the start without returning an error)
  • changing the number of tasks, to check whether MPI overhead might be the issue, does not affect the single-task performance
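
To illustrate the first check above, this is roughly how I verify the thread counts. It is a hypothetical helper for illustration, not actual gClust code; it just prints the requested versus the actually spawned number of threads:

```c
/* Sketch of the thread-count check: compares the requested thread count
 * (OMP_NUM_THREADS / omp_get_max_threads) with the number of threads
 * that actually start inside a parallel region.                        */
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

void report_threads(void)
{
    const char *env = getenv("OMP_NUM_THREADS");
    printf("OMP_NUM_THREADS=%s, omp_get_max_threads()=%d\n",
           env ? env : "(unset)", omp_get_max_threads());

    #pragma omp parallel
    {
        /* Executed by one thread of the team; reports the team size. */
        #pragma omp single
        printf("threads actually running in the parallel region: %d\n",
               omp_get_num_threads());
    }
}
```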

Is this actually expected behaviour due to MPI overhead, or is MPI somehow preventing the OpenMP threads from being spawned properly within the parallel region?

Thank you very much in advance!

Best regards,
Dennis
