Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix: dispatching any action causes actmon/actlog to crash with SIGSEGV #2835

Open
wants to merge 3 commits into
base: alpha
Choose a base branch
from

Conversation

merlea
Copy link
Contributor

@merlea merlea commented Nov 25, 2024

  • Rename Svr -> Srv
  • Always use 2 arguments for callback_done signature

A possible fix for #2829 basically setting all the job_done, action_done callbacks to have a signature void (*ast)(void *astprm, char *msg) and setting msg to NULL when applicable. Using this fix with the case described in #2829 leads to no error.

* Rename Svr -> Srv

* Always use 2 arguments for callback_done signature
@zack-vii
Copy link
Contributor

I think we need to add a test maybe using actlog (terminal version of actmon).
What i found is that actmon/actlog dont handle null either .. further they need to be protected in case the action monitor server (mdsip) becomes offline otherwise it may generate a stack overflow.

@zack-vii
Copy link
Contributor

zack-vii commented Nov 25, 2024

If the ACTION_MONITOR server becomes unreachabel/offline The CheckIN will eventually call MessageAst with a NULL for message. This will cause if (!parseMsg(reply, event)) to skip and call CheckIN recursively until the program stack overflows.

We need to be able to detect this from the callback msg somehow, possibly by returning a special message instead. alternatively the CheckIN must be called from a async outside of MessageAst with some wait time to prevent a runaway process at 100% as well as the stack overflow due to recursion.

@merlea
Copy link
Contributor Author

merlea commented Nov 26, 2024

OK. I removed the recursive calls to CheckIn since it already had a loop with pauses.

I also put some stubs for tests of actlog based on the original issue. We could compare the output of actlog with the expected one. I am not too familiar with the tests setup so let me know how best to integrate this.

@joshStillerman
Copy link
Contributor

retest this please

@mwinkel-dev mwinkel-dev added the tool/actions Relates to the action tools (actions, actmon, actlog) label Dec 23, 2024
@WhoBrokeTheBuild
Copy link
Member

By changing the definition of the ast parameter to include arguments, you've triggered some compile errors. You can run a build/test with:

./deploy/build.sh --os=bootstrap
./deploy/build.sh --os=ubuntu24 --test

And here is the error from the Jenkins build:

gcc -Wall -Wextra -Werror   -g -DWITH_DEBUG_SYMBOLS -O0 -fvisibility=hidden  -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -fpic -shared-libgcc -fsigned-char -fno-strict-aliasing   -DLIBPREFIX=MDSplus -pthread -I/workspace/tests/64/_include -I/opt/jenkins/workspace/MDSplus_PR-2835/debian10-64/_include -I/workspace/tests/64/include -I/opt/jenkins/workspace/MDSplus_PR-2835/debian10-64/include  -m64 -c -o tcl_dispatch.o /opt/jenkins/workspace/MDSplus_PR-2835/debian10-64/tcl/tcl_dispatch.c

/opt/jenkins/workspace/MDSplus_PR-2835/debian10-64/tcl/tcl_dispatch.c: In function 'TclDispatch_command':

/opt/jenkins/workspace/MDSplus_PR-2835/debian10-64/tcl/tcl_dispatch.c:510:36: error: passing argument 5 of 'ServerDispatchCommand' from incompatible pointer type [-Werror=incompatible-pointer-types]

                                    CommandDone, c.cmd, iostatusp, NULL, 0);

                                    ^~~~~~~~~~~

In file included from /opt/jenkins/workspace/MDSplus_PR-2835/debian10-64/tcl/tcl_dispatch.c:35:

/opt/jenkins/workspace/MDSplus_PR-2835/debian10-64/include/servershr.h:41:63: note: expected 'void (*)(void *, char *)' but argument is of type 'void (*)(DispatchedCommand *, char *)' {aka 'void (*)(struct <anonymous> *, char *)'}

                                         char *command, void (*ast)(void *, char *),

                                                        ~~~~~~~^~~~~~~~~~~~~~~~~~~~

If you include a fix for that, we will rerun the Jenkins build. Otherwise, we will look into this when we have time.

Thank you for investigating this so thoroughly, we greatly appreciate your help :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
tool/actions Relates to the action tools (actions, actmon, actlog)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants