Actmon crashes the moment any action is dispatched. #2829

Open
cenkoloji opened this issue Nov 20, 2024 · 4 comments
Labels
bug An unexpected problem or unintended behavior tool/actions Relates to the action tools (actions, actmon, actlog)

Comments

@cenkoloji
Contributor

Affiliation
EPFL/Swiss Plasma Center

Version(s) Affected/Platform(s)
RHEL9 - mdsplus-alpha 7.148.2
RHEL7 - mdsplus-stable 7.142.81

Installation Method(s)
yum/dnf

Describe the bug
Actmon crashes the moment any dispatch command is run.

To Reproduce

  • On four separate terminals, start three mdsip servers and actmon (change the ports if these are not available):
mdsip -p 9999 -s -h ./mdsip.hosts  -c 9   # Dispatch server
mdsip -p 9998 -s -h ./mdsip.hosts  -c 9   # Action server
mdsip -p 9997 -s -h ./mdsip.hosts  -c 9   # Monitor server
actmon -monitor localhost:9997

  • Run the following program, which creates a tree with a single action node and then dispatches it:

import os
import time
import MDSplus

# Create tree path dir
user = os.environ['USER']
distest_path="/tmp/{}/distest".format(user)
os.system('mkdir -p {}'.format(distest_path))
os.environ['distest_path'] = distest_path

# Set environment vars for tree paths in the servers
for port in ["9997","9998","9999"]:
    c = MDSplus.Connection('localhost:' + port)
    c.get('setenv("distest_path=/tmp/{}/distest/")'.format(user))
    c.disconnect()

# Create a single action node
template = 'Build_Action(Build_Dispatch(2, "{}", "INIT", 10, *), {}, *, *, *)'
server = "localhost:9998"
function_expr = 'write(*,"Test action")'
expr = template.format(server, function_expr)

# Create main tree
t = MDSplus.Tree('distest',1,'NEW')
node = t.addNode('ACT01',usage='ACTION')
node.record = t.tdiCompile(expr)
t.write()
t.close()

# Run actions
MDSplus.tcl('dispatch /command /server=localhost:9999 set tree distest /shot=1')
time.sleep(1)
MDSplus.tcl('dispatch /command /server=localhost:9999 dispatch/build/monitor=localhost:9997')
time.sleep(1)
MDSplus.tcl('dispatch /command /server=localhost:9999 dispatch/phase/monitor=localhost:9997 init')

Expected behavior
Actmon doesn't crash

Additional context
Output of gdb --args actmon -monitor localhost:9997, followed by run and then backtrace after the crash:

(gdb) backtrace
#0  0x00007ffff76338aa in __strlen_sse2 () from /lib64/libc.so.6
#1  0x00007ffff76b1d43 in strdup () from /lib64/libc.so.6
#2  0x0000000000403173 in parseMsg (event=0x7fffe8000bd0, msg=0x1 <error: Cannot access memory at address 0x1>) at /opt/jenkins/workspace/MDSplus_alpha/rhel9/actions/actlogp.h:111
#3  MessageAst (dummy=<optimized out>, reply=0x1 <error: Cannot access memory at address 0x1>) at /opt/jenkins/workspace/MDSplus_alpha/rhel9/actions/actlogp.h:199
#4  0x00007ffff7d7d8b8 in Job_callback_done (j=0x4951a0, status=<optimized out>, remove=remove@entry=1) at /opt/jenkins/workspace/MDSplus_alpha/rhel9/servershr/Job.h:125
#5  0x00007ffff7d7e142 in Client_do_message (c=c@entry=0x495180, fdactive=fdactive@entry=0x7ffff6d6bd80) at /opt/jenkins/workspace/MDSplus_alpha/rhel9/servershr/Client.h:201
#6  0x00007ffff7d7ea7f in receiver_thread (sockptr=<optimized out>) at /opt/jenkins/workspace/MDSplus_alpha/rhel9/servershr/ServerSendMessage.c:443
#7  0x00007ffff769f802 in start_thread () from /lib64/libc.so.6
#8  0x00007ffff763f450 in clone3 () from /lib64/libc.so.6
  • actlog also crashes and returns a very similar backtrace
  • If needed I can also paste stdout/stderr of each action server.

Is this related to #2689?

@cenkoloji cenkoloji added the bug An unexpected problem or unintended behavior label Nov 20, 2024
@zack-vii
Contributor

It looks like the actmon callback does not expect msg to be 1, but it is. I think msg is supposed to contain information about what needs to be monitored, so it should be an address. Somewhere along the way, a dereferenced char* seems to be passed instead. I personally used actmon a lot about, it feels like, 5 years ago. There is a good chance that actmon itself has not been touched since, so the change may be somewhere in the sender half.

@merlea
Contributor

merlea commented Nov 22, 2024

I might be missing something obvious, but it looks to me like MessageAst is registered as the callback_done function, which is usually called via:

callback_done(callback_param);

while its signature is

static void MessageAst(void *dummy __attribute__((unused)), char *reply)

so the second argument is never specified when calling callback_done, and it seems to end up with the value 0x1, which indeed is not a valid pointer.

However, it seems that in some cases the handling is done through events and the event_ast function, where MessageAst is called with 2 arguments.
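
The mismatch described above can be sketched in isolation. This is only an illustration, not the MDSplus code and not the fix discussed in this thread: the type done_cb_one_arg and the functions two_arg_handler, one_arg_shim and complete_job are hypothetical names, and the one-argument callback type is an assumption based on the quoted call callback_done(callback_param). The sketch shows one well-defined way to connect a two-argument handler to a one-argument call site, namely a shim that supplies an explicit NULL reply instead of leaving the second parameter undefined.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical one-argument completion callback type (assumed shape,
 * modeled on the call `callback_done(callback_param)` quoted above). */
typedef void (*done_cb_one_arg)(void *param);

/* Records what the handler saw, so the behavior can be inspected. */
static size_t seen_length = 0;
static int seen_null = 0;

/* Two-argument handler with the same parameter list as MessageAst.
 * It checks reply before touching it. */
static void two_arg_handler(void *dummy, char *reply)
{
    (void)dummy;
    seen_null = (reply == NULL);
    seen_length = reply ? strlen(reply) : 0;
}

/* One-argument shim: its type matches done_cb_one_arg exactly, and it
 * passes an explicit NULL reply so the handler never reads an
 * unspecified second argument. */
static void one_arg_shim(void *param)
{
    two_arg_handler(param, NULL);
}

/* Hypothetical completion path that only has the callback parameter. */
static void complete_job(done_cb_one_arg cb, void *param)
{
    assert(cb != NULL);
    cb(param);
}
```

Calling complete_job(one_arg_shim, NULL) reaches the handler with reply set to NULL, whereas registering two_arg_handler directly through a cast would leave reply holding whatever happened to be in the second argument slot.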

@zack-vii
Contributor

It seems so: #2689 (comment).
The callback is called once with just the callback parameters and once with the callback parameters and a message.
It may come down to deciding which one can be adjusted without breaking user code or, if neither satisfies that criterion, which one offers the most flexibility. Possibly one could always call the callback with params + message, with message being null if not used. This should do the least harm. I am not sure whether it is always safe to pass extra NULL arguments to a function.
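
A minimal sketch of the "always params + message" idea, under stated assumptions: monitor_cb, record_message, notify_done and notify_event are hypothetical names invented for illustration and do not come from the MDSplus sources. On the safety question, the C standard treats a call through a function pointer whose type does not match the called function as undefined behavior, even though common calling conventions tend to tolerate an extra argument. Giving every call site the same two-argument type avoids relying on that, as long as handlers check for NULL before using the message.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical unified callback type: every call site passes both
 * the callback parameter and a message (NULL when there is none). */
typedef void (*monitor_cb)(void *param, const char *message);

/* Copy of the most recent message, or NULL if the last call had none. */
static char *last_message = NULL;

/* Hypothetical handler: unlike an unconditional strdup(msg), it
 * checks for NULL before copying the message. */
static void record_message(void *param, const char *message)
{
    (void)param;
    free(last_message);
    last_message = NULL;
    if (message == NULL)
        return;
    size_t len = strlen(message) + 1;
    last_message = malloc(len);
    if (last_message != NULL)
        memcpy(last_message, message, len);
}

/* Completion path: no message available, so pass an explicit NULL. */
static void notify_done(monitor_cb cb, void *param)
{
    assert(cb != NULL);
    cb(param, NULL);
}

/* Event path: forwards the message it received. */
static void notify_event(monitor_cb cb, void *param, const char *message)
{
    assert(cb != NULL);
    cb(param, message);
}
```

With this shape, notify_done(record_message, NULL) leaves last_message as NULL, and notify_event(record_message, NULL, "some text") stores a copy of the text. Existing one-argument user callbacks would still need a wrapper or a signature change, which is the compatibility trade-off raised above.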

@merlea
Contributor

merlea commented Nov 25, 2024

I submitted a possible fix in #2835 using 2 arguments in all cases.

@mwinkel-dev mwinkel-dev added the tool/actions Relates to the action tools (actions, actmon, actlog) label Nov 25, 2024