87 inconsistent ha peer upgrade order due to thread limitations #88
Summary
This PR changes how the firewall upgrade process handles High Availability (HA) synchronization checks. Previously, strict HA synchronization checks could halt the upgrade when HA peers were not fully synchronized, especially when the number of targeted firewalls exceeded the available thread count. This change relaxes the HA synchronization requirement to make the upgrade process more flexible and resilient.
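To illustrate the thread limitation mentioned above, here is a minimal sketch of a bounded worker pool. It assumes the upgrades are dispatched through something like `concurrent.futures.ThreadPoolExecutor`; the names `run_upgrades` and `upgrade`, and the sleep that stands in for real work, are hypothetical and not taken from this PR. With more targets than workers, some devices wait in the queue, so a firewall and its HA peer may be processed at different times.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def run_upgrades(hostnames, max_workers):
    """Run a placeholder upgrade per hostname and report peak concurrency."""
    lock = threading.Lock()
    in_flight = 0
    peak = 0

    def upgrade(hostname):
        nonlocal in_flight, peak
        with lock:
            in_flight += 1
            peak = max(peak, in_flight)
        time.sleep(0.05)  # stand-in for the real upgrade work (assumption)
        with lock:
            in_flight -= 1
        return hostname

    # Only max_workers devices are processed at once; the rest are queued.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        finished = list(pool.map(upgrade, hostnames))
    return finished, peak


if __name__ == "__main__":
    targets = ["fw1-a", "fw1-b", "fw2-a", "fw2-b"]  # hypothetical HA pairs
    finished, peak = run_upgrades(targets, max_workers=2)
    print(f"processed {len(finished)} devices, at most {peak} at a time")
```

The point of the sketch is only that the pool size caps how many devices are in flight, which is why peers in the same HA pair cannot be assumed to reach the sync check together.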
Changes Made
HA Synchronization Check Adjustment: The default behavior of the ha_sync_check_firewall function has been updated to not enforce strict HA synchronization (strict_sync_check: bool = False). This adjustment allows the upgrade process to proceed even in cases where HA synchronization might not be perfect, accommodating the asynchronous nature of multi-threaded operations.
Simplified HA Sync Check Execution: The execution of the ha_sync_check_firewall function within the upgrade_firewall function has been simplified by removing the conditional strict_sync_check argument. The function now operates under the assumption that strict sync checks are not required by default, thus streamlining the logic and reducing complexity.
Removed Redundant Code: The code that decided whether strict HA sync checks were needed based on a device's revisit status (is_device_to_revisit) has been removed. With the more lenient approach to HA sync checks, that decision is no longer required.
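The changes above can be sketched roughly as follows. This is an illustration only, not the code from this PR: the function name `ha_sync_check`, the shape of `ha_details`, the `"running_sync"` key, and the choice to raise `RuntimeError` in strict mode are all assumptions. Only the `strict_sync_check: bool = False` default comes from the description.

```python
import logging

logger = logging.getLogger(__name__)


def ha_sync_check(ha_details: dict, hostname: str, strict_sync_check: bool = False) -> bool:
    """Return True if the HA peers report as synchronized, otherwise False.

    In strict mode an unsynchronized pair raises instead of returning False.
    """
    # Assumed input shape: {"running_sync": "synchronized"} when peers are in sync.
    if ha_details.get("running_sync") == "synchronized":
        logger.info("%s: HA peers are synchronized", hostname)
        return True

    if strict_sync_check:
        # Strict mode: stop processing this device.
        raise RuntimeError(f"{hostname}: HA peers are not synchronized")

    # Relaxed mode (the new default): warn and let the caller continue.
    logger.warning("%s: HA peers are not synchronized, continuing", hostname)
    return False


def upgrade_device(ha_details: dict, hostname: str) -> str:
    """Hypothetical caller that no longer passes strict_sync_check."""
    in_sync = ha_sync_check(ha_details, hostname)
    return "proceed" if in_sync else "proceed-with-warning"
```

Under these assumptions, a caller that relies on the default never aborts on an out-of-sync pair; it only logs a warning. Strict behavior remains available by passing `strict_sync_check=True` explicitly.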
Implications
Testing
Extensive testing in both standalone and HA-configured environments is recommended to ensure that the relaxed synchronization checks do not introduce unexpected behaviors or instability.
Monitoring tools and procedures should be in place to quickly identify and address any HA sync issues following upgrades.
Conclusion
This PR represents a strategic shift towards a more resilient and adaptable upgrade process for Palo Alto Networks firewalls.
By accommodating the complexities of HA environments and the limitations of multi-threaded operations, it aims to provide a smoother and more reliable upgrade experience.