-
Notifications
You must be signed in to change notification settings - Fork 173
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Trace validation test #204
base: main
Are you sure you want to change the base?
Conversation
Please fix the CI failures to ensure timely review. Also it would be best to CC everyone that was involved with the previous PR |
3b47a57
to
33c203e
Compare
cc @serathius @ahrtr , here is the trace validation test with some additional improvements in TLA+ spec. The idea here is to set up a sample application based on raft, running with various faults injected. The produced traces will then be collected and validated by TLA+ trace validation. Just run |
1. Improve etcdraft spec to provide hooks for model-checking and trace validation. 2. Add new sample application based on raft and capable of injecting various faults. 3. Add new test to generate random traces (with faults) and run trace validation on them. 4. Fix some issues in etcdraft spec. Signed-off-by: Joshua Zhang <joshuazh@microsoft.com>
33c203e
to
9cee7cd
Compare
CCing everyone involved in #113 Your input on this PR would be a huge help! The successful integration of TLA+ has the potential to significantly improve the reliability of Raft and etcd. I'd be grateful for any insights you can share. |
@serathius: GitHub didn't allow me to request PR reviews from the following users: andreo, felipecrv, xiang90, zeu5, lemmy. Note that only etcd-io members and repo collaborators can review this PR, and authors cannot review their own PRs. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
A few changes to the TLA spec are unclear for me. I will summarise my understanding and list the concerns below. SummaryTwo new variables are added In the code, a new trace event Clarifications
Minor commentsThis is not related to the changes in this PR but unclear to me - when a node is stopped (as a failure introduced |
Thanks @zeu5 for the summary. Please allow me to clarify some of my thoughts when I made this change. Please correct me if I made any mistake here.
To my understanding,
The TLA+ spec focuses on state transitions. A non-functional node implies no state changes or message exchange, effectively modeled by behaviors excluding actions related to that node. The state space exploration will include behaviors representing such node downtime scenarios. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the clarification @joshuazh-x I have requested some additional minor changes.
For further updates, I suppose it would also be good to include as part of the merge workflow some model checking on the updated TLA models to ensure the correctness of the updates. @serathius please consider this.
1. Remove type annotations which are not needed here. 2. Add more comments for the actions. 3. Fix simulation argument in valide-model.sh. 4. Let validate-model.sh use all cores to improve performance. Signed-off-by: Joshua Zhang <joshuazh@microsoft.com>
1. Decouple SpecAction and relevant definitions from the core etcd raft spec. Put them into a middle layer spec etcdraft_control.tla 2. Fix racing issue in trace_validation_test.go. Signed-off-by: Joshua Zhang <joshuazh@microsoft.com>
I'll keep each fix in a separated commit during the review. If everything goes well in the end, I'll combine them into one before final merge. |
/cc @serathius @ahrtr @zeu5 @lemmy any feedbacks for this pr? |
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: joshuazh-x, lemmy The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Please try not to submit huge PR. Please try to breakdown this PR as much as you can,
|
Following up on previous discussion with @serathius and @ahrtr, I make some improvements to the Raft trace validation test:
make trace-validation
command.These changes aim to streamline the trace validation process and make it more efficient to automate testing.
Please let me know if you have any questions or feedback.