diff --git a/docs/usage/trouble_shooting.md b/docs/usage/trouble_shooting.md
index a938fe10..1d564493 100644
--- a/docs/usage/trouble_shooting.md
+++ b/docs/usage/trouble_shooting.md
@@ -1,11 +1,14 @@
 # Trouble Shooting
 
 ## Kepler Pod failed to start
+
 ### Background
+
 Kepler uses eBPF to obtain performance counter readings and processes stats. Since eBPF requires kernel headers, Kepler will fail to start up when the kernel headers are missing.
 
 ### Diagnose
-To confirm, check the Kepler Pod logs with the following command and look for message `not able to load eBPF modules`. 
+
+To confirm, check the Kepler Pod logs with the following command and look for message `not able to load eBPF modules`.
 
 ```bash
 kubectl logs -n kepler daemonset/kepler-exporter
@@ -28,7 +31,7 @@ On OpenShift, install the MachineConfiguration [here](https://github.com/sustain
 
 ### Background
 
-Kepler uses RAPL counters on x86 platforms to read energy consumption. 
+Kepler uses RAPL counters on x86 platforms to read energy consumption.
 VMs do not have RAPL counters and thus Kepler estimates energy consumption based on the pre-trained ML models. The models use either hardware performance counters or cGroup stats to estimate energy consumed by processes. Currently the cGroup based models use cGroup v2 features such as `cgroupfs_cpu_usage_us`, `cgroupfs_memory_usage_bytes`, `cgroupfs_system_cpu_usage_us`, `cgroupfs_user_cpu_usage_us`, `bytes_read`, and `bytes_writes`.
 
 ### Diagnose
@@ -43,4 +46,4 @@ ls /sys/fs/cgroup/cgroup.controllers
 
 Enable cGroup v2 on the node by following [these Kubernetes instruction](https://kubernetes.io/docs/concepts/architecture/cgroups/).
 
-On OpenShift, apply [these cGroup v2 MachineConfiguration](https://github.com/sustainable-computing-io/kepler/tree/main/manifests/config/cluster-prereqs)
\ No newline at end of file
+On OpenShift, apply [these cGroup v2 MachineConfiguration](https://github.com/sustainable-computing-io/kepler/tree/main/manifests/config/cluster-prereqs)