Gambit segmentation fault #1

Open
aseshkdatta opened this issue Jul 22, 2021 · 47 comments

@aseshkdatta

Hi,

The example mpirun of gambit suggested on page 21
of the GUM paper on arXiv (2107.00030) leads to a
segmentation fault on a particular (desktop) machine.
The message is as follows.

Could you please help?

Thanks and regards.
Asesh

=================================================
asesh@albert:~/MyPlace/Packages/gambit_2.0$ time mpirun -n 4 gambit -f yaml_files/MDMSM_Tute.yaml

GAMBIT 2.0.0
http://gambit.hepforge.org

GAMBIT 2.0.0
http://gambit.hepforge.org

GAMBIT 2.0.0
http://gambit.hepforge.org

GAMBIT 2.0.0
http://gambit.hepforge.org


mpirun noticed that process rank 3 with PID 0 on node albert exited on signal 11 (Segmentation fault).

real 0m5.359s
user 0m6.412s
sys 0m0.089s
asesh@albert:~/MyPlace/Packages/gambit_2.0$

@patscott
Member

Hi Asesh,

@tegonzalo has been on holiday but I think he'll be back online at the start of next week and might be able to help out. In the meantime, can you please try installing a different MPI library, and also post the log files so that we can see where the segmentation fault seems to have occurred?

Thanks - Pat

@aseshkdatta
Author

aseshkdatta commented Jul 29, 2021 via email

@aseshkdatta
Author

aseshkdatta commented Jul 29, 2021 via email

@aseshkdatta
Author

aseshkdatta commented Jul 29, 2021 via email

@patscott
Member

In that case, please re-cmake and re-compile with MPI support completely turned off (-DWITH_MPI=OFF), confirm that you still get the error, and send us the single log from that run. It seems like the issue is probably some other library (not MPI). I think it unlikely that it is memory related, as GAMBIT takes much less memory to run than to compile.

@aseshkdatta
Author

aseshkdatta commented Jul 29, 2021 via email

@aseshkdatta
Author

aseshkdatta commented Jul 29, 2021 via email

@aseshkdatta
Author

aseshkdatta commented Jul 29, 2021 via email

@patscott
Member

patscott commented Jul 30, 2021

> Also, when I face the following message due to a previous 'make', how should I proceed; simply proceed pressing 'enter'?

This one indicates that something has gone wrong in the download or build of micromegas. In this case you need to nuke micromegas and try building it again: make nuke-micromegas_MDMSM; make micromegas_MDMSM.

As to the logs, what I mean is the default log in the output directory of the GAMBIT run that you are trying to launch, i.e. runs/MDMSM/logs/default.log. If that doesn't exist, then please send scratch/default.log.
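
The fallback described here (the run's own log first, the scratch log otherwise) can be sketched as a small shell helper. The function name find_default_log is hypothetical; the two relative paths are the ones named in this comment, and the GAMBIT root directory is passed as an argument.

```shell
#!/bin/sh
# Hypothetical helper: print the first default log that exists, preferring
# the run's own log over the scratch one. $1 is the GAMBIT root directory
# (defaults to the current directory).
find_default_log() {
    root="${1:-.}"
    for candidate in "runs/MDMSM/logs/default.log" "scratch/default.log"; do
        if [ -f "$root/$candidate" ]; then
            printf '%s\n' "$root/$candidate"
            return 0
        fi
    done
    echo "no default.log found under $root" >&2
    return 1
}
```

For example, find_default_log ~/Packages/gambit_2.0 would print the path of whichever of the two logs should be attached.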

To attach files, I think you need to log into GitHub (the ones you apparently attached by email don't seem to have come through).

@aseshkdatta
Author

aseshkdatta commented Jul 30, 2021 via email

@aseshkdatta
Author

aseshkdatta commented Jul 30, 2021 via email

@aseshkdatta
Author

aseshkdatta commented Jul 30, 2021 via email

@aseshkdatta
Author

Screenshot from 2021-07-30 16-48-01

@aseshkdatta
Author

Attached the mentioned screen-shot.
Thanks.

@patscott
Member

Hi Asesh - OK, maybe we should start with your full cmake output. In your build dir, please run

make nuke-all
rm -rf *
cmake [your cmake options] ...

and post the output here.

@aseshkdatta
Author

Hi Pat,

Thanks a lot.

After executing "make nuke-all" and "rm -rf *" from the "build" dir,
I ran "cmake -DWITH_MPI=OFF .." as suggested by you earlier.
cmake-log1.txt

I am attaching herewith the output of cmake.
Asesh

@tegonzalo
Contributor

Hi Asesh,

Apologies for the silence, I was on holiday. Thanks Pat for taking over.

> I am attaching herewith the output of cmake.

Your cmake output seems fine. MPI is correctly disabled and everything else seems to have configured correctly. Please now build gambit and the relevant backends like this:

make micromegas_MDMSM
make calchep
make -j<n> gambit

Once that is finished, if there are no errors, then run a simple test run, first using the spartan yaml file, i.e.

./gambit -f yaml_files/spartan.yaml

and if that works, then run the MDMSM.yaml file.

After every scan, a corresponding folder should have been created in the runs directory, e.g. runs/spartan, with logs, samples and scanner info. If any of the scans above fail but you still have the logs in that directory, please send them so we can have a look at them.

Cheers,
Tomas

@aseshkdatta
Author

Hi Tomas,

Thanks for the mail.

It seems the "runs" folder itself is not being created, and the Gambit run
ends in a segfault. Please see below.

asesh@albert:~/Packages/gambit_2.0$ ./gambit -f yaml_files/spartan.yaml

GAMBIT 2.0.0
http://gambit.hepforge.org

Segmentation fault (core dumped)
asesh@albert:~/Packages/gambit_2.0$

Please let me know how to proceed under the circumstances.

Cheers.
Asesh

@tegonzalo
Contributor

That is really unusual. And you said that there is nothing in scratch/run_time either, right? Could you run the following commands and tell me what you see?

First just do

./gambit

If everything is fine, that should print usage instructions, but it requires creating the scratch files, so you will probably see the segfault too. After that also do

./gambit modules

which should give you the list of built modules, but again, may fail if the problem is due to the scratch files.

Let me know what you see in both cases.

Thanks for your help!

Tomas
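
When reporting the results of checks like these, it can help to record how each command ended, not just its printed output. The following is a minimal sketch; the function name run_and_report is hypothetical. It relies on the common shell convention (bash, dash) that a process killed by a signal is reported with exit status 128 plus the signal number, so a segmentation fault (signal 11, as in the mpirun message at the top of this issue) appears as 139.

```shell
#!/bin/sh
# Hypothetical helper: run a command and say whether it exited normally
# or was killed by a signal. In bash and dash, death by signal N is
# reported as exit status 128 + N, so SIGSEGV (11) shows up as 139.
run_and_report() {
    status=0
    "$@" || status=$?
    if [ "$status" -gt 128 ]; then
        echo "killed by signal $((status - 128))"
    else
        echo "exited with status $status"
    fi
}
```

It would be called as run_and_report ./gambit and then run_and_report ./gambit modules from the GAMBIT directory.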

@aseshkdatta
Author

You are welcome, Tomas. The exercise is also very helpful to me.
Here is what I see.

asesh@albert:~/Packages/gambit_2.0$ ./gambit

GAMBIT 2.0.0
http://gambit.hepforge.org

Segmentation fault (core dumped)
asesh@albert:~/Packages/gambit_2.0$ ./gambit modules

GAMBIT 2.0.0
http://gambit.hepforge.org

Segmentation fault (core dumped)
asesh@albert:~/Packages/gambit_2.0$

Cheers.
Asesh

@aseshkdatta
Author

Right; "scratch/run_time" folder has nothing created inside it!

Asesh

@aseshkdatta
Author

aseshkdatta commented Aug 5, 2021 via email

@tegonzalo
Contributor

Hi Asesh,

I've been trying to figure out what could produce a segfault that early in the run; it doesn't make sense, as this has never happened before. Could you show me the backtrace of the segfault? Launch the debugger

gdb ./gambit

and then inside the debugger just run

r modules

once you get the segfault then write

backtrace

and show me what you get.
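
The same debugger session can also be run without typing at the gdb prompt, which makes the output easy to redirect into a file for attaching. This is a sketch rather than a required procedure; the wrapper name gdb_backtrace is hypothetical, while -batch, -ex and --args are standard gdb command-line options.

```shell
#!/bin/sh
# Hypothetical wrapper for: gdb ./gambit, then "r <args>", then "backtrace".
# -batch makes gdb exit once its commands have finished, each -ex runs one
# gdb command in order, and --args hands everything after the program
# name to the program itself.
gdb_backtrace() {
    gdb -batch -ex run -ex backtrace --args "$@"
}
```

For instance, gdb_backtrace ./gambit modules > backtrace.txt 2>&1 would capture the run and the backtrace in one file.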

@tegonzalo
Contributor

Also run

r models

just to make sure it has nothing to do with the yaml reader

Thanks again for your help in resolving this.

@aseshkdatta
Author

Thanks, Tomas. Here I am attaching the output.

Asesh
backtrace-1

@aseshkdatta
Author

aseshkdatta commented Aug 5, 2021 via email

@tegonzalo
Contributor

tegonzalo commented Aug 5, 2021

Ah, yes, it is definitely a Mathematica issue. Fortunately, you don't need Mathematica to run gambit; you only need it to run gum. So when compiling gambit you could do

cmake -Ditch="Mathematica" ..

and then you can recompile with make and try running it. With this you will (hopefully) get gambit working.

@aseshkdatta
Author

aseshkdatta commented Aug 5, 2021 via email

@aseshkdatta
Author

aseshkdatta commented Aug 5, 2021 via email

@tegonzalo
Contributor

I think it's because you are changing the location of the library in CMakeCache.txt. That file is automatically generated when you run cmake, so any changes to that file will be ignored. But your suggestion of using the static library should work anyway, you just need to pass it as a cmake flag. So when building gum run

cmake -DMathematica_WSTP_LIBRARY=<path_to_your_libWSTP64i4.a> ..

and then make gum with make.
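
To confirm which library path cmake actually recorded after passing the flag, the value can be read back from CMakeCache.txt instead of editing that file. A minimal sketch follows; the function name cached_value is hypothetical, and it relies only on the NAME:TYPE=VALUE layout of cache entries.

```shell
#!/bin/sh
# Hypothetical helper: print the cached value of a CMake variable.
# $1 = variable name, $2 = build directory (defaults to the current one).
# Cache entries look like NAME:TYPE=VALUE, e.g. WITH_MPI:BOOL=OFF.
cached_value() {
    sed -n "s/^$1:[A-Z]*=//p" "${2:-.}/CMakeCache.txt"
}
```

Running cached_value Mathematica_WSTP_LIBRARY in the build directory should then print the path given on the command line.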

@aseshkdatta
Author

Hi Tomas,

Thanks for your observations.

Using "cmake -Ditch="Mathematica" .. " during gambit build
and employing

cmake -DMathematica_WSTP_LIBRARY=/usr/local/Wolfram/Mathematica/10.0/SystemFiles/Links/WSTP/DeveloperKit/Linux-x86-64/CompilerAdditions/libWSTP64i4.a ..

during the gum build, I end up with the following error message from
mpirun of gambit. The segfault no longer appears, though.

I followed the directives given at the bottom of the right column on
page 19 of arXiv:2107.00030 for the MDMSM model and hence only
built "micromegas_MDMSM", "calchep", "gamlike" and "ddcalc".
The message is talking about DarkSUSY.

Could you please shed some light?

Thanks.
Asesh
gambit-MDMSM-log.txt

@tegonzalo
Contributor

That seems to imply that you need to also build darksusy. I don't think that is intended, as you already have micromegas. Let me investigate further. But if you want to continue with the testing, just build it:

make darksusy_generic_wimp

@pstoecker
Member

According to the version of MDMSM_Tute.yaml that made it into version 2.0.0 of gambit, it is intended: the function SimYieldTable_DarkSUSY is chosen to fulfil the capability SimYieldTable for the gamma-ray yields that are required by lnL_FermiLATdwarfs.

It might be a typo and it should be SimYieldTable_MicrOmegas. In that case there should be no need for darksusy as far as I see it.

@tegonzalo
Contributor

Yes, I just figured that out myself too. @aseshkdatta please just replace SimYieldTable_DarkSUSY with SimYieldTable_MicrOmegas in MDMSM_Tute.yaml and try with that.

I will modify that on master so that it goes out in the next release.
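
One way to make this substitution without touching the surrounding indentation is an in-place sed edit. The following is a sketch; the function name swap_yield_table is hypothetical, and the two function names are the ones discussed in this thread.

```shell
#!/bin/sh
# Hypothetical helper: swap the yield-table function name in a YAML file.
# $1 = path to the YAML file. The original is kept as "$1.bak", and only
# the matching text changes, so existing indentation is left untouched.
swap_yield_table() {
    sed -i.bak 's/SimYieldTable_DarkSUSY/SimYieldTable_MicrOmegas/g' "$1"
}
```

It would be called as swap_yield_table yaml_files/MDMSM_Tute.yaml from the GAMBIT directory.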

@aseshkdatta
Author

Hi, it's giving an error saying

Error reading Inifile "yaml_files/MDMSM_Tute.yaml"! Please check that file exist!
(yaml-cpp error: yaml-cpp: error at line 127, column 17: illegal map value )

Attaching the full message herewith.

Asesh
gambit-run-yaml-file-with-SimYieldTable_MicrOmegas.txt

@tegonzalo
Contributor

Can you send the yaml file so that I can see where the error is?

@aseshkdatta
Author

Here is the yaml file. I added ".txt" extension so that it can be uploaded to this page.

Asesh

MDMSM_Tute.yaml.txt

@tegonzalo
Contributor

Ok, in line 127 you need to make sure that the indentation matches. So function should line up directly below capability, using only spaces, not tabs.
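
Since YAML does not allow tab characters for indentation, a quick way to locate stray tabs is to list every line that contains one. A minimal sketch, with the hypothetical function name find_tabs:

```shell
#!/bin/sh
# Hypothetical helper: print "line-number:content" for every line of the
# file $1 that contains a tab character. The exit status is non-zero when
# no tabs are found (grep's usual behaviour for "no match").
find_tabs() {
    tab=$(printf '\t')
    grep -n "$tab" "$1"
}
```

For example, find_tabs yaml_files/MDMSM_Tute.yaml lists the offending lines, which can then be re-indented with spaces.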

@aseshkdatta
Author

Thanks for pointing that out. I corrected that and ran gambit again.
The error message looks like the following. The detailed one is attached herewith.

Asesh


/home/asesh/Packages/gambit_2.0/Backends/installed/calchep/3.6.27/sbin/newProcess: 87: exit: Illegal number: -1
Likelihood contribution from DarkBit::lnL_oh2_upperlimit: -196.799
Likelihood contribution from DarkBit::LUX_2016_GetLogLikelihood: -1.46732
Likelihood contribution from DarkBit::XENON1T_2018_GetLogLikelihood: -3.64953
PROCESS: ~chi,~chi -> mu+,mu-
This particle is absent in the model
Can not compile ~chi,~chi -> mu+,mu-
[albert:09020] *** Process received signal ***
[albert:09020] Signal: Segmentation fault (11)
[albert:09020] Signal code: Address not mapped (1)
[albert:09020] Failing at address: 0x28
[albert:09020] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x12980)[0x7f184caa7980]
[albert:09020] [ 1] /home/asesh/Packages/gambit_2.0/Backends/installed/calchep/3.6.27/lib/libcalchep.so(passParameters+0x73)[0x7f17bd6b1fdf]
[albert:09020] [ 2] gambit(+0x15ecb18)[0x55f206ebbb18]
gambit-run-corrected-yaml-file-SimYieldTable_MicrOmegas.txt

@tegonzalo
Contributor

It looks like something is wrong with calchep. Please try nuking it and rebuilding it again

make nuke-calchep
make calchep

@aseshkdatta
Author

Thanks, Tomas. Do I need to make gambit after that?

Asesh

@tegonzalo
Contributor

No, just calchep this time

@aseshkdatta
Author

Here is the error message:


theta12: 0.58376
theta13: 0.15495
theta23: 0.76958

nuclear_params_sigmas_sigmal:
deltad: -0.427
deltas: -0.085
deltau: 0.842
sigmal: 58
sigmas: 43

Raised at: line 74 in function void Gambit::BackendIniBit::CalcHEP_3_6_27_init() of /home/asesh/Packages/gambit_2.0/Backends/src/frontends/CalcHEP_3_6_27.cpp.
rank 1: FinalizeWithTimeout failed to sync for clean MPI shutdown, calling MPI_Abort...
rank 1: Issuing MPI_Abort command, attempting to terminate all processes...

MPI_ABORT was invoked on rank 1 in communicator MPI_COMM_WORLD
with errorcode 1.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.

rank 2: FinalizeWithTimeout failed to sync for clean MPI shutdown, calling MPI_Abort...
rank 2: Issuing MPI_Abort command, attempting to terminate all processes...
[albert:11423] 1 more process has sent help message help-mpi-api.txt / mpi-abort
[albert:11423] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages

@tegonzalo
Contributor

Can you send me the default log? (runs/MDMSM/logs/default.log)

@aseshkdatta
Author

All 4 of them? Or one will suffice?

Asesh

@tegonzalo
Contributor

From your message above, it looks like rank 1 was the one that found the error, so send me rank 1
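
When it is not obvious from the terminal output which rank hit the problem, the per-rank logs can be searched directly. The sketch below rests on two assumptions: that the per-rank files are named default.log_<rank> (inferred from the attachment name default.log_1.txt in this thread), and that the relevant lines contain the word "error" in some capitalisation. The function name logs_with_errors is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: list the per-rank logs that mention an error.
# $1 = log directory, e.g. runs/MDMSM/logs.
# Assumes files are named default.log_<rank>; -i ignores case and
# -l prints only the names of the files that match.
logs_with_errors() {
    grep -il 'error' "$1"/default.log_*
}
```

It would be called as logs_with_errors runs/MDMSM/logs.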

@aseshkdatta
Author

Here it is....
default.log_1.txt
