Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Subscriptions break every time a new configuration is activated via Twincat XAE Shell and system gets restarted #169

Open
mareksuscak opened this issue Dec 18, 2024 · 12 comments
Labels
bug Something isn't working

Comments

@mareksuscak
Copy link

mareksuscak commented Dec 18, 2024

This has been happening since version 1 but is still happening in version 2 of ads-client. Whenever we make changes via Twincat XAE Shell and activate the new configuration in the target system. The system restarts and all subscriptions stop working - the system no longer syncs the state for us. The connection is apparently preserved and when we send individual read or write commands to the PLC, it still works as expected but subscriptions won't work until we restart our service. I have enabled logging (level 1) and am including some logs below:

  1. This is what happens when I activate the new configuration. Notice the timeout but no detection of lost connection (which is supposed to recreate all subscriptions.
    CleanShot 2024-12-18 at 01 59 13

  2. Without restarting the service, we can still send other commands successfully.
    CleanShot 2024-12-18 at 01 59 51

  3. When we shut down the service, which triggers a cleanup that tries to remove all existing subscriptions and disconnect from the PLC gracefully, we see errors generated by the ads-client
    CleanShot 2024-12-18 at 02 00 28

We use Twincat v3

@mareksuscak
Copy link
Author

Setting connectionDownDelay to 0 seems to address this issue as that triggers connectionLost event and recreates all subscriptions as it should. It seems to me that the issue here is that the current implementation doesn't detect system restarts correctly and doesn't restore subscriptions when the system is restarted too fast. The default settings also make this problem worse as connectionDownDelay's default is 5s and timeoutDelay is set to 2s by default so the system needs to be down for at least 7s for the connection to be flagged as lost and all subscriptions reinitialized. I feel like any command timeout may need to be followed by subscription reinitialization and not just connection loss.

@jisotalo
Copy link
Owner

Thanks for reporting!

I need to check this out. Have you tested if there are similar issues when setting TwinCAT to config mode and back to run? Just for comparison.

@jisotalo jisotalo added the bug Something isn't working label Dec 18, 2024
@mareksuscak
Copy link
Author

I have not tried that. Would it make any difference?

@mareksuscak
Copy link
Author

@jisotalo happy new year! Did you make any progress on this by any chance or have any plans to do so?

@jisotalo
Copy link
Owner

@mareksuscak happy new year!

Unfortunately I haven't yet had time to work on this. But eventually I will.

@mareksuscak
Copy link
Author

mareksuscak commented Jan 13, 2025

WARNING: Connection to target was lost. Trying to reconnect automatically...
WARNING: Reconnecting failed. Keeping trying in the background every 1000 ms...
WARNING: Connection to target was lost. Trying to reconnect automatically...
WARNING: Connection to target was lost. Trying to reconnect automatically...
WARNING: Connection to target was lost. Trying to reconnect automatically...
WARNING: Reconnected and all subscriptions were restored!
WARNING: Connection to target was lost. Trying to reconnect automatically...
WARNING: Reconnected and all subscriptions were restored!
WARNING: Connection to target was lost. Trying to reconnect automatically...
WARNING: Reconnecting failed. Keeping trying in the background every 1000 ms...
WARNING: Reconnected and all subscriptions were restored!
WARNING: Connection to target was lost. Trying to reconnect automatically...
WARNING: Reconnecting failed. Keeping trying in the background every 1000 ms...
WARNING: Reconnected and all subscriptions were restored!
WARNING: Connection to target was lost. Trying to reconnect automatically...
WARNING: Reconnected and all subscriptions were restored!
WARNING: Connection to target was lost. Trying to reconnect automatically...
WARNING: Reconnected and all subscriptions were restored!
WARNING: Connection to target was lost. Trying to reconnect automatically...
WARNING: An ADS notification with an unknown handle 1239 was received from 192.168.250.10.1.1:851 and automatically deleted
WARNING: Reconnected but failed to restore following subscriptions:
 - GVL_Remote.Var1
WARNING: Connection to target was lost. Trying to reconnect automatically...
WARNING: An ADS notification with an unknown handle 1254 was received from 192.168.250.10.1.1:851 and automatically deleted
WARNING: Reconnected but failed to restore following subscriptions:
 - GVL_Remote.Var2
WARNING: Connection to target was lost. Trying to reconnect automatically...
WARNING: An ADS notification with an unknown handle 1316 was received from 192.168.250.10.1.1:851 and automatically deleted
WARNING: Reconnected but failed to restore following subscriptions:
 - GVL_Remote.Var3
WARNING: Connection to target was lost. Trying to reconnect automatically...
WARNING: An ADS notification with an unknown handle 1343 was received from 192.168.250.10.1.1:851 and automatically deleted
WARNING: Reconnected but failed to restore following subscriptions:
 - GVL_Remote.Var4
WARNING: Connection to target was lost. Trying to reconnect automatically...
WARNING: An ADS notification with an unknown handle 1391 was received from 192.168.250.10.1.1:851 and automatically deleted
WARNING: Reconnected but failed to restore following subscriptions:
 - GVL_Remote.Var5
WARNING: Connection to target was lost. Trying to reconnect automatically...
WARNING: Reconnected and all subscriptions were restored!
WARNING: Connection to target was lost. Trying to reconnect automatically...
WARNING: Connection to target was lost. Trying to reconnect automatically...
WARNING: Reconnected but failed to restore following subscriptions:
 - GVL_Remote.Var6
 - GVL_Remote.Var7
 - GVL_Remote.Var8
 - GVL_Remote.Var9
 - GVL_Remote.Var10
 - GVL_Remote.Var11
 - GVL_Remote.Var12
 - GVL_Remote.Var13
 - GVL_Remote.Var14
 - GVL_Remote.Var15
 - GVL_Remote.Var16
 - GVL_Remote.Var17
 - GVL_Remote.Var18
 - GVL_Remote.Var19
 - GVL_Remote.Var20
 - GVL_Remote.Var21
WARNING: An ADS notification with an unknown handle 1444 was received from 192.168.250.10.1.1:851 and automatically deleted

I managed to collect some logs that I hope will be helpful in troubleshooting. It turns out that the reconnects are working but individual subscriptions break as we experience more reconnects over time - even without PLC restarts which is very odd. See the log above.

Would it be possible to re-create the subscription that failed to restore? If so, should we default to that behavior? It feels counterintuitive that individual subscriptions break over time. It made it hard for us to determine what was happening because random things would just stop working at a different time. What would be a common reason for a subscription to break when there's no restart of the PLC? Bad packets or something?

@jisotalo
Copy link
Owner

jisotalo commented Jan 18, 2025

As you described, it seems that the client will not detect that things have changed in the target system when activating configuration.

Changing connectionCheckInterval setting to 100 helped, as it's then polling the system state more often and detecting that there are changes.

I didn't manage to find out a way to detect this change. It needs further research.

It might be hard to fix this as activating configuration is such as "hard" operation.

@jisotalo
Copy link
Owner

Good news!

I had a feeling that there must a solution for this, so I spent couple of hours going through the TwinCAT.Ads source code.

I finally found a way. It's possible to read TwinCAT system state (extended version of our readTcSystemState()) that includes the following data:

{
  adsState: { adsState: 5, adsStateStr: 'Run', deviceState: 1 },
  restartIndex: 114,
  version: 3,
  revision: 1,
  build: 4024,
  platform: 1,
  osType: 3,
  reserved: <Buffer 00 00 00 00>
}

The restartIndex increases always when TwinCAT system is restarted. So after hitting the Activate configuration, the data becomes this:

{
  adsState: { adsState: 5, adsStateStr: 'Run', deviceState: 1 },
  restartIndex: 118,
  version: 3,
  revision: 1,
  build: 4024,
  platform: 1,
  osType: 3,
  reserved: <Buffer 00 00 00 00>
}

In the first tests things look good:

Image

I will need to work on this (we still need to support older devices as well) + add testing. Meanwhile you could test the connectionCheckInterval as mentioned in previous comment.

@mareksuscak
Copy link
Author

Thanks @jisotalo! I'll test the proposed change. However, I feel like there's another bug where we gradually lose all the subscriptions silently and the only clue are the logs that are printed to stdout. I am not sure what's causing this as I do not have a solid mental model of the how subscriptions work from the PLC perspective but it seems that when trying to restore backed up subscriptions upon a brief connection loss, some subscriptions become unrecoverable and just throw an error that isn't visible in the log entry. I'll continue troubleshooting to provide as much info as possible but was wondering if you had any clue what might be happening? I wonder if the PLC cleans up the unresponsive subscription so if the connection loss is caused by a network partition, the subscription cleaned up on the PLC side when a few updates fail and then it becomes unrecoverable?

@jisotalo
Copy link
Owner

Hmm, doesn't sound good. Haven't issued that myself.

Please set the debugging mode to 2 to get better logs:
https://github.com/jisotalo/ads-client?tab=readme-ov-file#debugging

@mareksuscak
Copy link
Author

mareksuscak commented Jan 24, 2025

ads-client unsubscribe(): Unsubscribing failed: ClientError: sendAdsCommand(): Response with ADS error received (error
 1812 - Notification handle is invalid) at Object.responseCallback (C:\Service\plc-conn-test\node_modules\ads-cli
ent\dist\ads-client.js:3149:39) at Client.onAdsCommandReceived (C:\Service\plc-conn-test\node_modules\ads-client\
dist\ads-client.js:1545:29) at Client.onAmsTcpPacketReceived (C:\Service\plc-conn-test\node_modules\ads-client\di
st\ads-client.js:1423:26) at Client.parseAmsTcpPacket (C:\Service\plc-conn-test\node_modules\ads-client\dist\ads-
client.js:1180:14)     at process.processImmediate (node:internal/timers:483:21) { trace: [ 'sendAdsCommand(): Response
with ADS error received (error 1812 - Notification handle is invalid)' ], adsError: { errorCode: 1812, errorStr: 'Notifi
cation handle is invalid' } } +15ms

Did that and while I'm not sure if this is related but we've seen this error pop up a bunch of times. It seems that handles are breaking for some reason, not sure if this would be client-side or on the PLC side as I don't have a full understanding of the internals. Does this help?

@jisotalo
Copy link
Owner

@mareksuscak please set the debugging level to get better logs

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants