Fix for Health State Handling in Storage Health Check #774
Issue Summary
The previous implementation of the storage health check had issues in two separate files:
In bin/shield-pipe, the storage plugin test (test-store) failed on the first attempt because there was no delay between asset creation and verification, and the storage backend needed time to become consistent. A delay (sleep 5) was added in this file specifically to resolve that issue.
In core/scheduler/chore.go, the health check logic prematurely marked the storage system as unhealthy after the first failure, even when subsequent retries succeeded. This happened because store.Healthy was set to false after the first failure and was never corrected on success.
Symptoms Observed
Fix Implemented
The corrections were made as follows:
bin/shield-pipe: A delay (sleep 5) was added between asset creation and retrieval to ensure storage consistency.
core/scheduler/chore.go:
Validation and Testing
A NoSuchKey error was triggered, followed by successful retries.
Logs (Before and After Fix)
Before Fix:
After Fix:
Impact and Benefits