From 2e19fc3d94f9b7e48e68dae1e01e6a351d893f8c Mon Sep 17 00:00:00 2001
From: Rob Fisher
Date: Wed, 20 Dec 2023 15:53:10 -0500
Subject: [PATCH] Updated introductory paragraphs and added them to navigation

---
 documentation/modules/ROOT/nav.adoc           | 12 ++++++
 .../modules/ROOT/pages/API-Compatibility.adoc | 15 +++----
 .../modules/ROOT/pages/CNF-Upgrade-Prep.adoc  | 39 ++++++++++++-------
 .../modules/ROOT/pages/OCP-upgrade-prep.adoc  |  6 +--
 4 files changed, 47 insertions(+), 25 deletions(-)

diff --git a/documentation/modules/ROOT/nav.adoc b/documentation/modules/ROOT/nav.adoc
index 1db61fe..6ba9e73 100644
--- a/documentation/modules/ROOT/nav.adoc
+++ b/documentation/modules/ROOT/nav.adoc
@@ -10,3 +10,15 @@
 ** xref:API-Compatibility.adoc#k8s-skew[Kubernetes Version Skew]
 ** xref:API-Compatibility.adoc#ocp-upgrade-path[OpenShift Upgrade Path]
+* xref:CNF-Upgrade-Prep.adoc[CNF Upgrade Preparation]
+** xref:CNF-Upgrade-Prep.adoc#life-of-a-pod[Life of a POD]
+** xref:CNF-Upgrade-Prep.adoc#cnf-req-doc[CNF Requirements Document]
+** xref:CNF-Upgrade-Prep.adoc#pdb[POD Disruption Budget]
+** xref:CNF-Upgrade-Prep.adoc#pod-anti-affinity[POD Anti-affinity]
+
+* xref:OCP-upgrade-prep.adoc[OCP Upgrade Preparation]
+** xref:OCP-upgrade-prep.adoc#firmware-compatibility[Firmware compatibility]
+** xref:OCP-upgrade-prep.adoc#layer-product-compatibility[Layer product compatibility]
+** xref:OCP-upgrade-prep.adoc#prepare-mcp[Prepare MCPs]
+
+* xref:Applying-MCPs.adoc[Applying MCPs]
diff --git a/documentation/modules/ROOT/pages/API-Compatibility.adoc b/documentation/modules/ROOT/pages/API-Compatibility.adoc
index 12c0fbd..e599d50 100644
--- a/documentation/modules/ROOT/pages/API-Compatibility.adoc
+++ b/documentation/modules/ROOT/pages/API-Compatibility.adoc
@@ -17,17 +17,18 @@ The easiest way verify your application functionality will still work, is to mak
 [#ocp-upgrade-path]
 == OpenShift Upgrade Path
-Please also note that not all releases of OCP can be upgraded to any arbitrary Z-release even if they contain all of the required patches.
+Can I choose any Z-release in the new EUS or Y-stream version? -NO
+The new 4.Y+2(or+1).Z release needs to have the same patch level that your current 4.Y.Z release has.
+
+Why does 4.14.1 not have the same patches as 4.12.45?
+All “new” patches are applied upstream first. This means that after 4.14.0 was released, 4.15 became the upstream version.
+For example: the way that patches are applied to new and old releases is with new z-releases so a patch that is applied in X.Y+2.4 might have also been applied to X.Y.36.
+
 OpenShift upgrade process mandates that: If fix “A” is present in a specific X.Y.Z release of OCP Then fix “A” MUST be present in the X.Y+1.Z release that OCP is upgraded TO
-Consequence of the chosen destination version of 4.12.z defines which is the maximum version of OCP4.11.z, OCP4.10.z and OCP4.9.z
-not all 4.9.z version will permit to upgrade to a given version of OCP4.12.z
-A given version of OCP4.12.z will have requirements to a maximum version of OCP4.9z
-This is due to how fixes are backported into older releases of OCP.
-
-You can use the https://access.redhat.com/labs/ocpupgradegraph/update_path[upgrade graph tool] to determine if the path is valid for your z-release. You should also always verify with your Sales Engineer or Technical Account Manager at Red Hat to make sure the upgrade path is valid for Telco implementations.
+You can use the upgrade graph tool to determine if the path is valid for your z-release. You should also always verify with your Sales Engineer or Technical Account Manager at Red Hat to make sure the upgrade path is valid for Telco implementations.
 .K8s Version Skey
 image::k8s-vers-skew.png[]
diff --git a/documentation/modules/ROOT/pages/CNF-Upgrade-Prep.adoc b/documentation/modules/ROOT/pages/CNF-Upgrade-Prep.adoc
index 6b214ec..cf210af 100644
--- a/documentation/modules/ROOT/pages/CNF-Upgrade-Prep.adoc
+++ b/documentation/modules/ROOT/pages/CNF-Upgrade-Prep.adoc
@@ -5,37 +5,46 @@ include::_attributes.adoc[]
 The life of a POD is an important topic to understand. This section will describe several topics that are important to keeping your CNF PODs healthy and allow the cluster to properly schedule them during an upgrade.
+[#life-of-a-pod]
+== Life of a POD
+Why is this important?
+
+Pods don’t move or reboot. Pods are deleted and a new pod takes its place.
+
+There isn’t (or shouldn’t be) a single pod with its own set of specifications but instead it should have lots of other pods that are the exact same in a group, called a deployment. A deployment should spread the workload across all of the pods.
+
+This is specified because we need to move away from the idea that each and every single pod needs to be cared for like it is the only strand holding things together. The old saying goes, a rope is made up of many strands, which is what makes it stronger than any single strand.
+
+[#cnf-req-doc]
 == CNF Requirements Document
 Before you go any further, please read through the https://connect.redhat.com/sites/default/files/2022-05/Cloud%20Native%20Network%20Function%20Requirements%201-3.pdf[CNF requirements document]. In this section a few of the most important points will be discussed but the CNF Requirements Document has additional detail and other important topics.
+[#pdb]
 == POD Disruption Budget
-Each set of PODs in a deployment can be given a specific minimum number of PODs that should be running in order to keep
-from disrupting the functionality of the CNF, thus called the POD disruption budget (PDB). However, this budget can be
-improperly configured. +
-For example, if you have 4 PODs in a deployment and your PDB is set to 4, this means that you are telling the scheduler
-that you NEED 4 PODs running at all times. Therefore, in this scenario ZERO PODs can come down.
+Each set of PODs in a deployment can be given a specific minimum number of PODs that should be running in order to keep from disrupting the functionality of the CNF, thus called the POD disruption budget (PDB). However, this budget can be improperly configured.
+
+For example, if you have 4 PODs in a deployment and your PDB is set to 4, this means that you are telling the scheduler that you NEED 4 PODs running at all times. Therefore, in this scenario ZERO PODs can come down.
 .Deployment with no PDB
-image::../assets/images/PDB-full.jpg[]
+image::PDB-full.jpg[]
+
+To fix this, the PDB can be set to 2, letting 2 of the 4 PODs be scheduled as down and this would then let the worker nodes where those PODs are located be rebooted.
-To fix this, the PDB can be set to 2, letting 2 of the 4 pods to be scheduled as down and this would then let the worker
-nodes where those PODs are located be rebooted.
+This does NOT mean that your deployment will be running on only 2 pods for a period of time. This means that 2 new pods can be created to replace 2 current pods and there can be a short period of time as the new pods come online and the old pods are deleted.
 .Deployment with PDB
-image::../assets/images/PDB-down-2.jpg[]
+image::PDB-down-2.jpg[]
+[#pod-anti-affinity]
 == POD Anti-affinity
-True high availability requires a duplication of a process to be running on separate hardware, thus making sure that an
-application will continue to run if one piece of hardware goes down. OpenShift can easily make that happen since
-processes are automatically duplicated in separate PODs within a deployment. However, those PODs need to have
-anti-affinity set on them so that they are NOT running on the same hardware. It so happens that anti-affinity also
-helps during upgrades because it makes sure that PODs are on different worker nodes, therefore allowing enough PODs to
-come down even after considering their PDB.
+True high availability requires a duplication of a process to be running on separate hardware, thus making sure that an application will continue to run if one piece of hardware goes down. OpenShift can easily make that happen since processes are automatically duplicated in separate PODs within a deployment. However, those PODs need to have anti-affinity set on them so that they are NOT running on the same hardware.
+
+During an upgrade anti-affinity is important so that there aren’t too many pods on a node when it is time for it to reboot. For example: if there are 4 pods from a single deployment on a node, and the PDB is set to only allow 1 pod to be deleted at a time, then it will take 4 times as long for that node to reboot because it will be waiting on all 4 pods to be deleted.
 == Liveness / Readiness Probes
diff --git a/documentation/modules/ROOT/pages/OCP-upgrade-prep.adoc b/documentation/modules/ROOT/pages/OCP-upgrade-prep.adoc
index 39b77a5..15e8395 100644
--- a/documentation/modules/ROOT/pages/OCP-upgrade-prep.adoc
+++ b/documentation/modules/ROOT/pages/OCP-upgrade-prep.adoc
@@ -45,13 +45,13 @@ section, below, for more details on the pause/un-pause process.
 // insert image for MCP
 .Worker node MCPs in a 5 rack cluster
-image::../assets/images/5Rack-MCP.jpg[]
+image::5Rack-MCP.jpg[]
 The division and size of these MCPs can vary depending on many factors. In general the standard division is between 8 and 10 nodes per MCP to allow the operations team to control how many nodes are taken down at a time.
 .Separate MCPs inside of a group of Load Balancer or purpose built nodes
-image::../assets/images/LBorHT-MCP.jpg[]
+image::LBorHT-MCP.jpg[]
 In larger clusters there is quite often a need to separate out several nodes for purposes like Load Balancing or other high throughput purposes, which usually have different machine sets to configure SR-IOV. In these cases we do not want
@@ -60,7 +60,7 @@ out into at least 3 different MCPs and unpause them individually.
 // insert image for MCP
 .Small cluster worker MCPs
-image::../assets/images/Worker-MCP.jpg[]
+image::Worker-MCP.jpg[]
 Smaller cluster example with 1 rack