module updates

RedHatQuickCourses · Jan 10, 2025 · 418cc10
1 parent 7b8d138

Showing 8 changed files with 224 additions and 38 deletions.
diff --git a/content/modules/ROOT/nav.adoc b/content/modules/ROOT/nav.adoc
@@ -61,4 +61,6 @@
 ** xref:module-09.adoc#checkingressconfig[Check the _Ingress Controller_ configuration]
 ** xref:module-09.adoc#solution[Issue solution]
 
-* xref:module-10.adoc[10. Exploring etcd snapshots with koff]
+* xref:module-10.adoc[10. Exploring etcd snapshots with koff]
+** xref:module-10.adoc#gettingstarted[Getting Started with koff]
+** xref:module-10.adoc#koffget[Viewing resources in etcd]
diff --git a/content/modules/ROOT/pages/module-02.adoc b/content/modules/ROOT/pages/module-02.adoc
@@ -29,7 +29,7 @@ cd ~/Module2/
 [source,bash]
 ----
 $ omc use module2-demo-must-gather
-Must-Gather  : /home/lab-user/Module2/sno_demo/quay-io-openshift-release-dev-ocp-v4-0-art-dev-sha256-2de07af89683678ae6bb7a939615fc0d4ced7fe185add38b050f2c6f60023b6f
+Must-Gather  : /home/lab-user/Module2/module2-demo-must-gather/sno_demo/quay-io-openshift-release-dev-ocp-v4-0-art-dev-sha256-2de07af89683678ae6bb7a939615fc0d4ced7fe185add38b050f2c6f60023b6f
 Project      : default
 ApiServerURL : https://api.cluster-6fmht.dynamic.redhatworkshops.io:6443
 Platform     : None

diff --git a/content/modules/ROOT/pages/module-03.adoc b/content/modules/ROOT/pages/module-03.adoc
@@ -26,7 +26,7 @@ We recently patched this cluster to from *4.14.27* to *4.14.37*. Previous scale-
 
 . Check the cluster `nodes` and cluster `machines` to verify the nodes do not exist and the machines do
 
-.Click to show some commands if you need a hint
+.*Click to show some commands if you need a hint*
 [%collapsible]
 ====
 [source,bash]

diff --git a/content/modules/ROOT/pages/module-04.adoc b/content/modules/ROOT/pages/module-04.adoc
@@ -97,7 +97,7 @@ Start by taking a high level view. You can be both broad and granular with audit
 
 Look at the top usage for the common `--by=` groups like `resource` and `user`
 
-.Click to show some commands if you need a hint
+.*Click to show some commands if you need a hint*
 [%collapsible]
 ====
 [source,bash]
@@ -120,7 +120,7 @@ We spotted something suspicious, so let's drill down a little deeper.
 When evaluating the data, always factor in things like the total number of requests, time period and the number of nodes.
 ====
 
-.Click to show some details if you need a hint
+.*Click to show some details if you need a hint*
 [%collapsible]
 ====
 Our top 3 resources from the previous command were `nodes`, `configmaps` and `pods`:
@@ -140,7 +140,7 @@ Our top 3 users from the previous command were `sysdig-agent`, `apiserver` and `
 
 One of those sticks out a lot, but let's first take a look at our top 3 resources. For this we can use the `--resource=` flag, in addition to `--by=` and `-o top` to down on a specific resource.
 
-.Click to show some details if you need a hint
+.*Click to show some details if you need a hint*
 [%collapsible]
 ====
 ----
@@ -167,7 +167,7 @@ Let's try to answer the following:
 What is the user doing? +
 What is the problem?
 
-.Click to show some details if you need a hint
+.*Click to show some details if you need a hint*
 [%collapsible]
 ====
 ----
@@ -186,7 +186,7 @@ Top 10 "GET" (of 440076 total hits):
    8308x [   270.327µs] [403-8307] /api/v1/nodes/cluster-app-02.dmz/proxy/metrics    [system:serviceaccount:openshift-example-sysdig-agent:sysdig-agent]
 ----
 
-The conclusion is that there was an issue with the `SysDig`` monitoring component that was causing it to fail authentication when trying to collect `node` metrics and in turn spam the API server.
+The conclusion is that there was an issue with the `SysDig` monitoring component that was causing it to fail authentication when trying to collect `node` metrics and in turn spam the API server.
 ====
 
 I hope you found this introduction to the `kubectl-dev_tool` useful and can leverage it the next time you have an issue!

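The module-04 advice to always factor in the total number of requests, the time period and the number of nodes amounts to turning raw hit counts into a rate before comparing them. This is a minimal sketch. It assumes a hypothetical helper named `requests_per_node_hour`, which is not part of `kubectl-dev_tool`. The capture window and node count used in the example are hypothetical placeholders.

```shell
# Hypothetical helper (not part of kubectl-dev_tool): normalize a raw
# request count into requests per node per hour.
# Usage: requests_per_node_hour <total_requests> <period_hours> <node_count>
requests_per_node_hour() {
  local total=$1 hours=$2 nodes=$3
  # Refuse a zero divisor instead of letting awk fail.
  if [ "$hours" = 0 ] || [ "$nodes" = 0 ]; then
    echo "period_hours and node_count must be non-zero" >&2
    return 1
  fi
  awk -v t="$total" -v h="$hours" -v n="$nodes" \
    'BEGIN { printf "%.2f\n", t / (h * n) }'
}

# 440076 is the GET total reported in the module-04 output above;
# 24 hours and 6 nodes are hypothetical placeholders.
requests_per_node_hour 440076 24 6
```

A per-node agent such as a metrics collector scales with the node count. Dividing by both the window and the node count therefore makes its request rate comparable across clusters of different sizes, which helps separate normal background load from a client that is actually misbehaving.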
diff --git a/content/modules/ROOT/pages/module-05.adoc b/content/modules/ROOT/pages/module-05.adoc
@@ -63,7 +63,12 @@ Options:
 ====
 [source,bash]
 ----
-$ ocp_insights.sh --file insights_archive.tar.gz
+cd ~/Module5/
+----
+
+[source,bash]
+----
+$ ocp_insights.sh --file module5-insights-data
 
 Cluster Version: 4.14.27
 Channel: eus-4.14
@@ -152,7 +157,7 @@ To see all Alerts run: jq -r . insights-2024-08-14-144858/config/alerts.json
 ====
 [source,bash]
 ----
-$ ocp_insights.sh --file insights_archive.tar.gz --customer_memory
+$ ocp_insights.sh --file module5-insights-data --customer_memory
 ...
 Customer Namespace Memory Usage.
 
@@ -184,7 +189,7 @@ Total Customer Namespace Memory Usage: 121.9884G
 ====
 [source,bash]
 ----
-$ ocp_insights.sh --file insights_archive.tar.gz --etcd_metrics
+$ ocp_insights.sh --file module5-insights-data --etcd_metrics
 etcd server slow apply total
 
 etcd-ocp4-2nvq7-master-0,3548
@@ -209,7 +214,7 @@ etcd-ocp4-2nvq7-master-1,22
 ====
 [source,bash]
 ----
-$ ocp_insights.sh --file insights_archive.tar.gz --storage_classes
+$ ocp_insights.sh --file module5-insights-data --storage_classes
 ...
 StorageClass Information.
 

diff --git a/content/modules/ROOT/pages/module-06.adoc b/content/modules/ROOT/pages/module-06.adoc
@@ -60,7 +60,12 @@ options:
 ====
 [source,bash]
 ----
-$ etcd-ocp-diag.py --path <path_to_mg> --stats
+cd ~/Module6/
+----
+
+[source,bash]
+----
+$ etcd-ocp-diag.py --path module6-must-gather.6521552859184261155/ --stats
 Stats about etcd "apply request took too long" messages: etcd-ocp4-2nvq7-master-0
 	First Occurrence: 2024-07-28T04:00:27
 	Last Occurrence: 2024-08-14T15:18:23
@@ -135,7 +140,7 @@ Stats about etcd "slow fdatasync" messages: etcd-ocp4-2nvq7-master-2
 ====
 [source,bash]
 ----
-$ etcd-ocp-diag.py --path <path_to_mg> --errors
+$ etcd-ocp-diag.py --path module6-must-gather.6521552859184261155/ --errors
 POD                       	ERROR                                                 	COUNT
 etcd-ocp4-2nvq7-master-0	waiting for ReadIndex response took too long, retrying	 295
 etcd-ocp4-2nvq7-master-0	slow fdatasync                                        	  60
@@ -180,7 +185,7 @@ etcd-ocp4-2nvq7-master-2	sending buffer is full
 ====
 [source,bash]
 ----
-$ etcd-ocp-diag.py --path <path_to_mg> --ttl
+$ etcd-ocp-diag.py --path module6-must-gather.6521552859184261155/ --ttl
 POD                       	DATE      	COUNT
 etcd-ocp4-2nvq7-master-0	2024-07-28	121
 etcd-ocp4-2nvq7-master-0	2024-07-29	112
@@ -215,7 +220,7 @@ etcd-ocp4-2nvq7-master-2	2024-08-14	952
 ====
 [source,bash]
 ----
-$ etcd-ocp-diag.py --path <path_to_mg> --ttl --date 2024-07-28 --pod etcd-ocp4-2nvq7-master-1
+$ etcd-ocp-diag.py --path module6-must-gather.6521552859184261155/ --ttl --date 2024-07-28 --pod etcd-ocp4-2nvq7-master-1
 POD                       	DATE 	COUNT
 etcd-ocp4-2nvq7-master-1	05:16	12
 etcd-ocp4-2nvq7-master-1	05:31	13
@@ -232,7 +237,7 @@ etcd-ocp4-2nvq7-master-1	08:12	 3
 ====
 [source,bash]
 ----
-$ etcd-ocp-diag.py --path <path_to_mg> --ttl --compare
+$ etcd-ocp-diag.py --path module6-must-gather.6521552859184261155/ --ttl --compare
 Date: 2024-07-28
 POD                            COUNT
 etcd-ocp4-2nvq7-master-0     121
@@ -254,7 +259,7 @@ etcd-ocp4-2nvq7-master-2     152
 
 [source,bash]
 ----
-$ etcd-ocp-diag.py --path <path_to_mg> --ttl --date 2024-07-28 --compare
+$ etcd-ocp-diag.py --path module6-must-gather.6521552859184261155/ --ttl --date 2024-07-28 --compare
 Date: 04:02
 POD                            COUNT
 etcd-ocp4-2nvq7-master-0     8

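The module-06 `--ttl` output above reports warning counts per etcd pod and date. The following sketch illustrates that kind of tally by counting "apply request took too long" messages per day in a single etcd log file. It is not how `etcd-ocp-diag.py` is implemented, and the function name is hypothetical. It assumes JSON-formatted log lines carrying an ISO-8601 `"ts"` field, and it works whether or not each line also has an extra leading timestamp.

```shell
# Hypothetical helper (not how etcd-ocp-diag.py works): count etcd
# "apply request took too long" warnings per day in one log file.
# Assumes JSON log lines with an ISO-8601 "ts" field, for example:
#   {"level":"warn","ts":"2024-07-28T04:00:27.123Z","msg":"apply request took too long",...}
# Output: one "<date><TAB><count>" line per day, sorted by date.
count_slow_applies_per_day() {
  local logfile=$1
  grep -F '"msg":"apply request took too long"' "$logfile" \
    | sed -n 's/.*"ts":"\([0-9-]\{10\}\)T.*/\1/p' \
    | sort \
    | uniq -c \
    | awk '{ printf "%s\t%d\n", $2, $1 }'
}

# Example against a small hypothetical log file.
log=$(mktemp)
cat > "$log" <<'EOF'
{"level":"warn","ts":"2024-07-28T04:00:27.123Z","msg":"apply request took too long"}
{"level":"warn","ts":"2024-07-28T05:16:02.456Z","msg":"apply request took too long"}
{"level":"warn","ts":"2024-07-29T01:02:03.789Z","msg":"slow fdatasync"}
{"level":"warn","ts":"2024-07-29T08:12:00.001Z","msg":"apply request took too long"}
EOF
count_slow_applies_per_day "$log"
rm -f "$log"
```

Because the dates are in `YYYY-MM-DD` form, a plain lexical `sort` also puts them in chronological order before `uniq -c` counts them.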
diff --git a/content/modules/ROOT/pages/module-09.adoc b/content/modules/ROOT/pages/module-09.adoc
@@ -20,7 +20,7 @@ We are on a _UPI_ cluster and use an external _Load Balancer_ to send traffic to
 
 . Using the `omc use` command, set the `module9-must-gather.local` _must-gather_ as the current archive in use.
 
-.Click to show some commands if you need a hint
+.*Click to show some commands if you need a hint*
 [%collapsible]
 ====
 [source,bash]
@@ -46,7 +46,7 @@ OCP might receive critical networking bugfixes between different z-releases, the
 
 * Check the cluster version.
 
-.Click to show some commands if you need a hint
+.*Click to show some commands if you need a hint*
 [%collapsible]
 ====
 [source,bash]
@@ -57,7 +57,7 @@ omc get ClusterVersion version
 
 * Check which CNI (_Container Network Interface_) plugin is being used on the cluster.
 
-.Click to show some commands if you need a hint
+.*Click to show some commands if you need a hint*
 [%collapsible]
 ====
 [source,bash]
@@ -68,7 +68,7 @@ omc get Network cluster -o json | yq '.spec.networkType'
 
 * Check which are the installed _Incress Controllers_.
 
-.Click to show some commands if you need a hint
+.*Click to show some commands if you need a hint*
 [%collapsible]
 ====
 [source,bash]
@@ -99,7 +99,7 @@ The command `oc adm must-gather` does not collect data from all Namepsaces, but
 
 Which command should we ask to a customer in order to collect data of a specific Namespace ?
 
-.Click to show some commands if you need a hint
+.*Click to show some commands if you need a hint*
 [%collapsible]
 ====
 [source,bash]
@@ -115,7 +115,7 @@ In this lab, the _inspect_ of `fsi-project` is named `module9-inspect-fsi-projec
 The `omc` tool isn't restricted to _must-gathers_, but it can be set to read from a Namespace _inspect_ archive too.
 =====
 
-.Click to show some commands if you need a hint
+.*Click to show some commands if you need a hint*
 [%collapsible]
 ====
 [source,bash]
@@ -141,7 +141,7 @@ As the saying goes: _"When you hear hoofbeats behind you, don't expect to see a
 
 * First of all, it will be handy to find the Selector used by the Deployment `fsi-application` for its Pods. Let's check it and put it into a shell variable. 
 
-.Click to show some commands if you need a hint
+.*Click to show some commands if you need a hint*
 [%collapsible]
 ====
 [source,bash]
@@ -157,7 +157,7 @@ echo $SELECTOR_LABEL
 
 * Then, check that the Pod replicas in the reported Deployment `fsi-application` are all running.
 
-.Click to show some commands if you need a hint
+.*Click to show some commands if you need a hint*
 [%collapsible]
 ====
 [source,bash]
@@ -178,7 +178,7 @@ omc get pod -l $SELECTOR_LABEL
 When a Pod is correctly "connected" to a Service, its IP address will appear in the Endpoints object corresponding to the Service
 =====
 
-.Click to show some commands if you need a hint
+.*Click to show some commands if you need a hint*
 [%collapsible]
 ====
 [source,bash]
@@ -194,7 +194,7 @@ omc get pod -l $SELECTOR_LABEL -o wide
 
 * As reported by the customer, even if the above checks were successfull, we should still expect to see traffic logs (for example, _GET_ requests) only in the logs of one of the two Pods. Let's verify by checking all Pods logs.
 
-.Click to show some commands if you need a hint
+.*Click to show some commands if you need a hint*
 [%collapsible]
 ====
 [source,bash]
@@ -222,7 +222,7 @@ That is, let's analyze the application related Route and how it is configured in
 
 * Let's check the Route.
 
-.Click to show some commands if you need a hint
+.*Click to show some commands if you need a hint*
 [%collapsible]
 ====
 [source,bash]
@@ -243,14 +243,16 @@ fsi-route   fsi-route-fsi-project.apps.foobarbank.lab.upshift.rdu2.redhat.com
 
 [IMPORTANT]
 =====
-The application specific Route is contained into the _inspect_, however the `default` _Ingress Controller_ configuration is always contained into the _must-gather_.
+The application-specific Route is found inside the _inspect_ archive; however, the `default` _Ingress Controller_ configuration is only found in a full _must-gather_.
 =====
 
-.Click to show some commands if you need a hint
+.*Click to show some commands if you need a hint*
 [%collapsible]
 ====
+Switch back to the full _must-gather_ and use the `omc haproxy backends` command to list the HAProxy backends for the `fsi-project` Namespace.
 [source,bash]
 ----
+omc use module9-must-gather.local/
 omc haproxy backends fsi-project
 ----
 ====
@@ -263,18 +265,18 @@ NAMESPACE	NAME		INGRESSCONTROLLER	SERVICES	PORT		TERMINATION
 fsi-project	fsi-route	default			fsi-service	https(8443)	passthrough/Redirect	
 ----
 
-* Everything seems correct so far, therefore we need to dig deeper. Let's manually print the whole `fsi-route` Route "admission" directly from the `default` _Ingress Controller_ configuration file.
+* Everything seems correct so far, so we need to dig deeper. Let's print the `fsi-route` backend configuration directly from the `default` _Ingress Controller_ HAProxy configuration file.
 
 [TIP]
 =====
-In general, the `default` _Ingress Controller_ configuration file can be found at the following path: 
+In a full must-gather, the `default` _Ingress Controller_ configuration file can be found at the following path: 
 
 `<must-gather-archive>/quay-io-openshift-release-dev-ocp-v4-0-art-dev-sha256-<hash>/ingress_controllers/default/<ingress-default-pod>/haproxy.config`.
 
-Note that there is one `haproxy.config` file for each _Ingress Controller_ Pod.
+Note that there is one `haproxy.config` file for each _Ingress Controller_ Pod, although they should all be the same.
 =====
 
-.Click to show some commands if you need a hint
+.*Click to show some commands if you need a hint*
 [%collapsible]
 ====
 [source,bash]
@@ -309,7 +311,7 @@ backend be_tcp:fsi-project:fsi-route
 [#solution]
 == Issue solution
 
-Gothca ! The Route seems using the _balance_ of type `source`. We can verify whether this is the intended _Ingress Controller_ behavior by checking the official OCP documentation about link:https://docs.openshift.com/container-platform/4.17/networking/routes/route-configuration.html#nw-route-specific-annotations_route-configuration[_Route-specific annotations_]. 
+Gotcha! The Route is using the `source` _balance_ type. We can verify whether this is the intended _Ingress Controller_ behavior by checking the official OCP documentation about link:https://docs.openshift.com/container-platform/4.17/networking/routes/route-configuration.html#nw-route-specific-annotations_route-configuration[_Route-specific annotations_].
 
 There we can read:
 
@@ -318,5 +320,4 @@ There we can read:
 The default value is "source" for TLS passthrough routes. For all other routes, the default is "random".
 ----
 
-OCP is therefore correctly behaving. The issue is not a bug, but a misconfiguration by the customer who supposed the _balance_ type was `random` for all types of Routes.
-
+OCP is therefore behaving correctly. The issue is not a bug, but a misconfiguration (or misunderstanding) by the customer, who assumed the _balance_ type was `random` for all Routes.
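
The module-09 hunks above find the `balance` setting by reading the `be_tcp:fsi-project:fsi-route` backend directly from `haproxy.config`. This is a minimal sketch of that check. It uses hypothetical helper names, `print_backend` and `backend_balance`. It also assumes that, as in the router-generated config, section keywords such as `backend` and `frontend` start in column 0 and their settings are indented.

```shell
# Hypothetical helpers for reading one backend out of a router-generated
# haproxy.config. Assumes section keywords (global, defaults, frontend,
# backend, ...) start in column 0 and their settings are indented.

# print_backend <config> <backend-name>: print that backend section.
print_backend() {
  local config=$1 name=$2
  awk -v name="$name" '
    # Any non-indented, non-comment line starts a new section.
    /^[^ \t#]/ { in_section = ($1 == "backend" && $2 == name) }
    in_section { print }
  ' "$config"
}

# backend_balance <config> <backend-name>: print its "balance" algorithm.
backend_balance() {
  print_backend "$1" "$2" | awk '$1 == "balance" { print $2; exit }'
}

# Example against a small hypothetical config excerpt.
config=$(mktemp)
cat > "$config" <<'EOF'
backend be_http:other-project:other-route
  balance random

backend be_tcp:fsi-project:fsi-route
  balance source
  hash-type consistent
EOF
backend_balance "$config" be_tcp:fsi-project:fsi-route
rm -f "$config"
```

There is one `haproxy.config` per _Ingress Controller_ Pod. Running `backend_balance` against each copy is therefore also a quick way to confirm that the copies agree.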