-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Bug]: SDK47 Distribution Panic #16940
Comments
@alexanderbez @tac0turtle just tagging you as I think you might be familiar with this part of the codebase, but please let me know if there's anyone else I should reach out to instead! I really appreciate your help here! |
I was checking on this and found that both of those accounts are delegating to Here the last successful tx + first failed tx with this error (14 days ago): https://bigdipper.live/stride/transactions/8EB8195D344435B287BF57A8F0BB7476089F0446C5CB1CE005E4F88BA4DFE939 |
@shellvish it would be helpful to log everything inside |
I would review the validator's slashing history as @facundomedica pointed out. The delta is small, 0.00002579424357, so it could be a rounding issue in stake or something else. This particular area of the module's codebase is very...duct-tapey, so we'll need more insight into the validator's history prior to understanding the root issue. |
@shellvish friendly ping :) |
@alexanderbez @facundomedica Apologies for the delay here! We were a bit sidetracked with some post-ICS cleanup and took a bit longer to dive into this than I would have hoped.. Should be more responsive going forward! I dug into this a bit more and I think I've narrowed down the issue. Here are logs from the validator in question (in
This gives a final Token Amount of It seems like the
I'm still digging into how this "ghost slash" happened, but I'm having a bit of a hard time investigating. I'm curious if either of you might have any theories here? In terms of solving the issue, we think there are two possible solutions
I think (2) is the optimal route, but it might take some time to do this change correctly without introducing any new bugs (although would fully defer to you both here!). (1) might make sense as a short-term bandaid solution though, in order to allow delegators to this validator to have access to their funds again (right now, any tx or query touching this validator throws a panic, so users can't undelegate their STRD). Let me know what you think :) And I truly appreciate your help here! |
Interesting. So is it possible this "ghost" slash is something that Numia just didn't catch? Are the two known slashes from the current version of Stride? Perhaps the ghost one was from a previous version (i.e. pre-upgrade)? In any case, how does (2) really solve it? Could you describe that solution in more detail pls? |
Very possible they didn't catch it! But the logs above show that I'm not quite sure what (2) would entail, but I'm thinking something like: in an upgrade handler, we manually modify the Token -> Shares conversion to reflect that there were 2 slashes and not 3. This could either be increasing the number of tokens that the validator has staked or decreasing the number of shares in the validator. To be clear though, I'm purely speculating, as I'm quite unfamiliar with this part of the codebase! |
Yes, that's a good point. The only way that could happen is if there was a traditional export/import upgrade where one record might've been missed or perhaps an upgrade handler that modified these records. In any case, it is a bit worrisome. This part of the codebase is probably the most sensitive as well (primarily due to |
Hm, it's possible, but as far as I know, I don't think there was an export/import upgrade or an upgrade handler that modified these records? I just checked the v10 and v11 upgrades, and I don't think they modified the distribution records. Do you know which of those two approaches to solve the root cause is preferable, e.g. should we increase the number of tokens or decrease the number of shares? Also, for the short-term, I think we'll raise the threshold of the invariant check above to have a margin of error of ~200ustrd. I just want to confirm with you that you think this is okay / isn't likely to raise any alarm bells? Also happy to chat through this live if that's easier for you! |
I would probably keep the tokens untouched, and modify the shares to have parity with the tokens. Yeah, overall this is pretty weird. Typically, what we'd do to debug this is log all related slashing/distribution actions related to a specific validator and see what we can infer from that. Given that you're already at v11, this might be super tedious. So debugging this is gonna be a challenge. |
Stride added a fix for the specific validator Stride-Labs@1bc120b |
closing for now, i believe this could be related to custom logic in a fork |
Hello everyone, I would like to reopen this issue. The bug has not been corrected and also bitsong is now affected by this problem. It seems that there are "ghost slash" that have not been taken into consideration. |
Summary of Bug
Some addresses on Stride have reported an issue with claiming staking rewards and redelegating tokens. These users were able to claim staking rewards successfully prior to Stride's last upgrade, and the upgrade effectively only upgraded from SDK 46 to SDK 47. We are investigating on our end as well, but I suspect there might be a bug with SDK 47 (or, perhaps, a very precise way that state should be migrated from SDK 46 to 47).
Two examples of users are stride1ct5jdjxwcs2whefy4c4dpuc0t3vuv7ksrnh4j3 and stride1lf87nq0pxw7qzcvvc65evnntd7za6mxw8evpcx.
Importantly, both of these users delegated to a validator that got downtime slashed, restarted their validator, and then started producing blocks again.
Ways to Observe the Issue
Either of the links above link to BigDipper and show “0” staking rewards despite having many tokens staked.
In addition, this tx shows the error msg when a user tries to claim staking rewards. (TLDR is
calculated final stake for delegator stride1ct5... greater than current stake, final stake: 0.257945648504322936 current stake: 0.257919854260751778
).Lastly, this cmd line query
strided q distribution rewards stride1lf87nq0pxw7qzcvvc65evnntd7za6mxw8evpcx
also returns a similar errorError: rpc error: code = Unknown desc = calculated final stake for delegator stride1lf87nq0pxw7qzcvvc65evnntd7za6mxw8evpcx greater than current stake
.My initial guess as to what’s going on is that SDK 47 has some sort of slashing bug, and users are either getting slashed more than they should be, or there’s some accounting bug in the slashing code. It’s possible this error only appears with non-staking based tokens (e.g. native tokens), but extremely unclear right now.
Version
This is occurring on Stride version
v11.0.0
which uses Cosmos SDKv0.47.3
.Steps to Reproduce
This tx will show a transaction that is failing.
strided q distribution rewards stride1lf87nq0pxw7qzcvvc65evnntd7za6mxw8evpcx --node https://stride-rpc.polkachu.com:443 --chain-id stride-1
will also display the same issue.Please let me know if I can provide any more detail, or if there's anything I can do to help debug. Thanks so much for the help, it's very much appreciated.
The text was updated successfully, but these errors were encountered: