Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix: mdsip services zombie processes #2645

Open
wants to merge 2 commits into
base: alpha
Choose a base branch
from

Conversation

santorofer
Copy link
Contributor

When executing a shot cycle using D-Tacq's digitizers, sometimes mdsip processes become zombie processes. This, in turn, will break the shot cycle for the user.

Those mdsip process will contain sockets that were kept open, even when no data was being received.

Solution: the while loop that controls the data being received from the socket (in the device driver) has now a new logic, that includes using the value of the running node (false or true) to exit the loop correctly when executing the stop method.

For this to work, the user needs to be sure it runs the stop method for the device.

@WhoBrokeTheBuild WhoBrokeTheBuild added bug An unexpected problem or unintended behavior devices Relates to devices (c devices, tdi devices, python devices, java devices, device_support, etc) labels Apr 29, 2024
Copy link
Contributor

@zack-vii zack-vii left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume you intended to fix both devices the same way? Since it is a common problem with streaming sockets I had to deal with the very problem. I would solve is like this:

# reduce delay after stop
s.settimeout(1) 
while toread and self.running:
    try:
        nbytes = s.recv_into(
            view, min(self.io_buffer_size, toread))
    except socket.timeout:
         if self.running:
             continue
         self.running = False
         break
    if nbytes == 0:
        break                            
    if first:
        self.trig_time = time.time()
        first = False
    view = view[nbytes:]  # slicing views is cheap
    toread -= nbytes
  • If a blocking socket should returns 0 bytes, it has been closed and you would stop the stream
  • If running is false you would stop the stream
  • If the socket times out you would not discard partial buffers but check if it is still running

@santorofer
Copy link
Contributor Author

santorofer commented Jun 10, 2024

Yes, that is correct, we basically fixed two drivers for modules 435 and 423. Now, I think our change to the same part of the code are the same as your suggestion, am I right? Because we also check for a socket.timeout in the try when reading the memoryview(), which is where the check for the nbytes is. I must be missing something.

@@ -357,9 +357,14 @@ def run(self):
while toread:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this line seems different from line 299 of the other device while toread and self.running:.
given that the complexity of the run method is quite high it would probably be best to share one implementation, where possible.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug An unexpected problem or unintended behavior devices Relates to devices (c devices, tdi devices, python devices, java devices, device_support, etc)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants