docs: additional cleanup of code blocks in bullets
This can be squashed once reviewed.

Additional changes:
* Globally updated `> Note:` to `!!! note` as done in the original commit.
* Globally removed the `# ` prompt from code blocks that don't require
  root access.
* Globally indented code blocks in bullet items so the code block
  matched up with the bullet text.
* Changed all `kepler` to `Kepler` where applicable.
* Fixed typos and bad grammar as they were encountered.

Signed-off-by: Billy McFall <22157057+Billy99@users.noreply.github.com>
Billy99 committed May 2, 2024
1 parent 70b3cff commit 298f1a4
Showing 15 changed files with 239 additions and 175 deletions.
2 changes: 1 addition & 1 deletion docs/design/architecture.md
@@ -14,7 +14,7 @@ The main feature of `Kepler Model Server` is to return a [power estimation model

In addition, the online-trainer can be deployed as a sidecar container to the server (main container) to execute training pipelines and update the model on the fly when power metrics are available.

`Kepler Estimator` is a client module to the Kepler Model Server, running as a sidecar of the Kepler Exporter (main container).

This Python module serves a PowerRequest from the model package in the Kepler Exporter, as defined in estimator.go, via the unix domain socket `/tmp/estimator.sock`.
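
Below is a minimal, illustrative Go sketch of a client talking to such a unix-domain-socket estimator. The request and response fields shown here are hypothetical placeholders, not the actual PowerRequest schema defined in estimator.go.

```go
// Hypothetical client sketch: dial the estimator's unix socket, send a JSON
// request, and read a JSON response. Field names are illustrative only.
package main

import (
	"encoding/json"
	"fmt"
	"net"
)

func main() {
	conn, err := net.Dial("unix", "/tmp/estimator.sock")
	if err != nil {
		panic(err)
	}
	defer conn.Close()

	// Placeholder payload; the real PowerRequest fields live in estimator.go.
	req := map[string]interface{}{
		"metrics": []string{"bpf_cpu_time_us"},
		"values":  [][]float64{{1200.0}},
	}
	if err := json.NewEncoder(conn).Encode(req); err != nil {
		panic(err)
	}

	var resp map[string]interface{}
	if err := json.NewDecoder(conn).Decode(&resp); err != nil {
		panic(err)
	}
	fmt.Println("estimator response:", resp)
}
```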

8 changes: 4 additions & 4 deletions docs/design/ebpf_in_kepler.md
@@ -8,7 +8,7 @@
- [How to list all currently registered kprobes ?](#list-kprobes)
- [Hardware CPU Events Monitoring](#hardware-cpu-events-monitoring)
- [How to check if kernel supports perf_event_open?](#check-support-perf_event_open)
- [Kernel routine probed by Kepler](#kernel-routine-probed-by-kepler)
- [Hardware CPU events monitored by Kepler](#hardware-cpu-events-monitored-by-kepler)
- [Calculate process (aka task) total CPU time](#calculate-total-cpu-time)
- [Calculate task CPU cycles](#calculate-total-cpu-cycle)
@@ -63,13 +63,13 @@ Check presence of `/proc/sys/kernel/perf_event_paranoid` to know if kernel supports perf_event_open

**CAP_SYS_ADMIN** is the highest level of capability and carries significant security implications.
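
As a quick illustration of the check described above, a minimal sketch (not Kepler's actual code) that reads the paranoid level could look like this:

```go
// If /proc/sys/kernel/perf_event_paranoid exists, the kernel supports
// perf_event_open; its value controls which users may open perf events.
package main

import (
	"fmt"
	"os"
	"strings"
)

func main() {
	data, err := os.ReadFile("/proc/sys/kernel/perf_event_paranoid")
	if err != nil {
		fmt.Println("perf_event_open not supported or not readable:", err)
		return
	}
	fmt.Println("perf_event_paranoid level:", strings.TrimSpace(string(data)))
}
```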

## Kernel Routine Probed by Kepler

Kepler traps into the `finish_task_switch` kernel function [3], which is responsible for cleaning up after a task switch occurs. Since the probe is a `kprobe`, it is invoked when `finish_task_switch` is entered (unlike a `kretprobe`, which is invoked after the probed function returns).

When a context switch occurs inside the kernel, the function `finish_task_switch` is called on the new task which is going to use the CPU. This function receives an argument of type `task_struct*` which contains all the information about the task which is leaving the CPU.[3]

The probe function in Kepler is:

```c
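// kprobe entry handler: `prev` points to the task_struct of the task that is
// leaving the CPU when the context switch completes (see description above).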
int kprobe__finish_task_switch(struct pt_regs *ctx, struct task_struct *prev)
```

@@ -146,7 +146,7 @@ This value is stored in array `cpu_freq_array`

## Calculate 'page cache hit'

The probe functions in Kepler, `kprobe__set_page_dirty` and `kprobe__mark_page_accessed`, are used to track page cache hits for write and read actions respectively.

## Process Table

47 changes: 23 additions & 24 deletions docs/design/kepler-energy-sources.md
@@ -39,6 +39,22 @@ The support for different power domains varies according to the processor model.

## Reading Energy values

Kepler chooses to use one energy source in the following order of preference:

1. Sysfs
2. MSR
3. Hwmon

### Using RAPL Sysfs

From Linux Kernel version 3.13 onwards, RAPL values can be read using `Power Capping Framework`[2].

Linux Power Capping framework exposes power capping devices to user space via sysfs in the form of
a tree of objects.

This sysfs tree is mounted at `/sys/class/powercap/intel-rapl`. When RAPL is available, this path
exists and Kepler reads energy values from this path.
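
As a rough illustration of reading these values (a sketch only, not Kepler's actual implementation), the cumulative energy counters can be read from the `energy_uj` attribute of each powercap zone:

```go
// Read the cumulative energy counters (in microjoules) exposed by the
// intel-rapl powercap zones under /sys/class/powercap.
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strconv"
	"strings"
)

func main() {
	zones, err := filepath.Glob("/sys/class/powercap/intel-rapl:*")
	if err != nil || len(zones) == 0 {
		fmt.Println("RAPL sysfs not available")
		return
	}
	for _, zone := range zones {
		name, _ := os.ReadFile(filepath.Join(zone, "name"))        // e.g. "package-0" or "dram"
		energy, _ := os.ReadFile(filepath.Join(zone, "energy_uj")) // cumulative microjoules
		uj, _ := strconv.ParseUint(strings.TrimSpace(string(energy)), 10, 64)
		fmt.Printf("%s: %d uJ\n", strings.TrimSpace(string(name)), uj)
	}
}
```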

### Using RAPL MSR (Model Specific Registers)

The RAPL energy counters can be accessed through model-specific registers (MSRs). The counters are
@@ -56,45 +72,28 @@ There are basically two types of events that RAPL reports:
Static Events: thermal specifications, maximum and minimum power caps, and time windows.
Dynamic Events: RAPL domain energy readings from the chip such as PKG, PP0, PP1 or DRAM
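
A minimal sketch of reading the dynamic package energy counter through the msr driver (requires root and the `msr` kernel module; the register offsets below assume an Intel CPU and this is not Kepler's actual code) could look like this:

```go
// Read MSR_RAPL_POWER_UNIT (0x606) and MSR_PKG_ENERGY_STATUS (0x611) from
// /dev/cpu/0/msr and convert the raw counter to joules.
package main

import (
	"encoding/binary"
	"fmt"
	"os"
)

const (
	msrRaplPowerUnit   = 0x606 // energy status unit in bits 12:8
	msrPkgEnergyStatus = 0x611 // cumulative package energy in the lower 32 bits
)

func readMSR(f *os.File, offset int64) (uint64, error) {
	buf := make([]byte, 8)
	if _, err := f.ReadAt(buf, offset); err != nil {
		return 0, err
	}
	return binary.LittleEndian.Uint64(buf), nil
}

func main() {
	f, err := os.Open("/dev/cpu/0/msr")
	if err != nil {
		fmt.Println("msr driver not available:", err)
		return
	}
	defer f.Close()

	unit, _ := readMSR(f, msrRaplPowerUnit)
	raw, _ := readMSR(f, msrPkgEnergyStatus)

	// Energy unit is 1/2^ESU joules, where ESU is bits 12:8 of MSR_RAPL_POWER_UNIT.
	esu := (unit >> 8) & 0x1f
	joules := float64(raw&0xffffffff) / float64(uint64(1)<<esu)
	fmt.Printf("package energy counter: %.3f J\n", joules)
}
```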

### Using kernel driver xgene-hwmon

Using the xgene-hwmon driver, Kepler reads power from the APM X-Gene SoC. It supports reading CPU and IO
power in microwatts.
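
A hedged sketch of reading such values through the generic hwmon sysfs interface is shown below; the `power*_input` attribute name follows the standard hwmon ABI (values in microwatts), but the exact attributes exposed on X-Gene hardware may differ.

```go
// Enumerate hwmon power inputs and print their labels and readings (microwatts).
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

func main() {
	inputs, _ := filepath.Glob("/sys/class/hwmon/hwmon*/power*_input")
	for _, input := range inputs {
		val, err := os.ReadFile(input)
		if err != nil {
			continue
		}
		label, _ := os.ReadFile(strings.Replace(input, "_input", "_label", 1))
		fmt.Printf("%s (%s): %s uW\n",
			input, strings.TrimSpace(string(label)), strings.TrimSpace(string(val)))
	}
}
```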

### Using eBPF perf events

Not used in Kepler.

### Using PAPI library

Performance Application Programming Interface (PAPI) is not used in Kepler.

## Permissions required

### Sysfs (powercap)

Root access is required to use the powercap driver.

### MSRs

Root access is required to use the msr driver.

## References

108 changes: 74 additions & 34 deletions docs/design/metrics.md
@@ -6,51 +6,57 @@ scraped by any database that understands this format, such as [Prometheus][0] an
Kepler exports a variety of container metrics to Prometheus, where the main ones are those related
to energy consumption.

## Kepler Metrics Overview

All the metrics specific to the Kepler Exporter are prefixed with `kepler`.

## Kepler Metrics for Container Energy Consumption

- **kepler_container_joules_total** (Counter)

This metric is the aggregated package/socket energy consumption of CPU, DRAM, GPUs, and other host components for a given container.
Each component has individual metrics which are detailed next in this document.

This aggregated metric exists for performance reasons, since a very large PromQL query typically introduces
very high overhead on Prometheus.

- **kepler_container_core_joules_total** (Counter)

This measures the total energy consumption on CPU cores that a certain container has used.
Generally, when the system has access to [RAPL][3] metrics, this metric will reflect the proportional container energy
consumption of the RAPL Power Plane 0 (PP0), which is the energy consumed by all CPU cores in the socket.
However, this metric is processor model specific and may not be available on some server CPUs.
The RAPL CPU metric that is available on all processors that support RAPL is the package, which is detailed
in another metric below.

In some cases where RAPL is available but core metrics are not, Kepler may use the package energy consumption.
But note that package energy consumption is not just from CPU cores; it is all socket energy consumption.

In case [RAPL][3] is not available, Kepler might estimate this metric using the model server.

- **kepler_container_dram_joules_total** (Counter)

This metric describes the total energy spent in DRAM by a container.

- **kepler_container_uncore_joules_total** (Counter)

This measures the cumulative energy consumed by certain uncore components, which are typically the last level cache,
integrated GPU and memory controller, but the number of components may vary depending on the system.
The uncore metric is processor model specific and may not be available on some server CPUs.

When [RAPL][3] is not available, Kepler can estimate this metric using the model server if the node CPU supports the uncore metric.

- **kepler_container_package_joules_total** (Counter)

This measures the cumulative energy consumed by the CPU socket, including all cores and uncore components (e.g.
last-level cache, integrated GPU and memory controller).
RAPL package energy is typically the PP0 + PP1, but PP1 counter may or may not account for all energy usage
by uncore components. Therefore, package energy consumption may be higher than core + uncore.

When [RAPL][3] is not available, Kepler might estimate this metric using the model server.

- **kepler_container_other_joules_total** (Counter)

This measures the cumulative energy consumption on other host components besides the CPU and DRAM.
The vast majority of motherboards have an energy consumption sensor that can be accessed via the ACPI or IPMI kernel interfaces.
This sensor reports the energy consumption of the entire system.
@@ -60,30 +66,31 @@ All the metrics specific to the Kepler Exporter are prefixed with `kepler`.
Generally, this metric is the host energy consumption (from ACPI) less the RAPL Package and DRAM.

- **kepler_container_gpu_joules_total** (Counter)

This measures the total energy consumption on the GPUs that a certain container has used.
Currently, Kepler only supports NVIDIA GPUs, but this metric will also reflect other accelerators in the future.
So when the system has NVIDIA GPUs, Kepler can calculate the energy consumption of the container's GPU using the
per-process GPU energy consumption and utilization exposed via the NVIDIA NVML package.

- **kepler_container_energy_stat** (Counter)

This metric contains several container metrics labeled with container resource utilization cgroup metrics
that are used in the model server for predictions.

This metric is specific to the model server and might be updated at any time.

## Kepler Metrics for Container Resource Utilization

### Base Metric

- **kepler_container_bpf_cpu_time_us_total**

This measures the total CPU time used by the container using BPF tracing. This is part of the minimal set of exposed metrics.

### Hardware Counter Metrics

- **kepler_container_cpu_cycles_total**

This measures the total CPU cycles used by the container using hardware counters.
To support fine-grained analysis of performance and resource utilization, hardware counters are particularly desirable
due to their granularity and precision.
@@ -93,70 +100,95 @@ Note:
On systems where processors run at varying frequencies, CPU cycles and total CPU time will have different values.

- **kepler_container_cpu_instructions_total**

This measures the total CPU instructions used by the container using hardware counters.

CPU instructions are the de facto metric for accounting for CPU utilization.

- **kepler_container_cache_miss_total**

This measures the total cache misses that have occurred for a given container using hardware counters.

As there is no event counter that measures memory access directly, the number of last-level cache misses gives
a good proxy for the number of memory accesses. If an LLC read miss occurs, a read access to main memory
should occur (but note that this is not necessarily the case for LLC write misses under a write-back cache policy).

!!! note
    You can enable/disable the exposure of these metrics through the `expose-hardware-counter-metrics` Kepler execution option or the `EXPOSE_HW_COUNTER_METRICS` environment variable.

### cGroups Metrics

- **kepler_container_cgroupfs_cpu_usage_us_total**

This measures the total CPU time used by the container reading from cGroups stat.

- **kepler_container_cgroupfs_memory_usage_bytes_total**

This measures the total memory in bytes used by the container reading from cGroups stat.

- **kepler_container_cgroupfs_system_cpu_usage_us_total**

This measures the total CPU time in kernel space used by the container reading from cGroups stat.

- **kepler_container_cgroupfs_user_cpu_usage_us_total**

This measures the total CPU time in userspace used by the container reading from cGroups stat.

!!! note
    You can enable/disable the exposure of these metrics through the `EXPOSE_CGROUP_METRICS` environment variable.

### IRQ Metrics

- **kepler_container_bpf_net_tx_irq_total**

This measures the total packets transmitted to the network cards by the container, using BPF tracing.

- **kepler_container_bpf_net_rx_irq_total**

This measures the total packets received from the network cards by the container, using BPF tracing.

- **kepler_container_bpf_block_irq_total**

This measures the block I/O calls of the container using BPF tracing.

!!! note
    You can enable/disable the exposure of these metrics through the `EXPOSE_IRQ_COUNTER_METRICS` environment variable.

## Kepler Metrics for Node Information

- **kepler_node_info** (Counter)

This metric shows the node metadata like the node CPU architecture.

## Kepler Metrics for Node Energy Consumption

- **kepler_node_core_joules_total** (Counter)

Similar to container metrics, but representing the aggregation of all containers running on the node and operating system (i.e. "system_process").

- **kepler_node_uncore_joules_total** (Counter)

Similar to container metrics, but representing the aggregation of all containers running on the node and operating system (i.e. "system_process").

- **kepler_node_dram_joules_total** (Counter)

Similar to container metrics, but representing the aggregation of all containers running on the node and operating system (i.e. "system_process").

- **kepler_node_package_joules_total** (Counter)

Similar to container metrics, but representing the aggregation of all containers running on the node and operating system (i.e. "system_process").

- **kepler_node_other_host_components_joules_total** (Counter)

Similar to container metrics, but representing the aggregation of all containers running on the node and operating system (i.e. "system_process").

- **kepler_node_gpu_joules_total** (Counter)

Similar to container metrics, but representing the aggregation of all containers running on the node and operating system (i.e. "system_process").

- **kepler_node_platform_joules_total** (Counter)

This metric represents the total energy consumption of the host.

The vast majority of motherboards have an energy consumption sensor that can be accessed via the ACPI or IPMI kernel interfaces.
This sensor reports the energy consumption of the entire system.
@@ -166,18 +198,25 @@ Note:
Generally, this metric is the host energy consumption from the Redfish BMC or ACPI.

- **kepler_node_energy_stat** (Counter)

This metric contains multiple metrics from nodes labeled with container resource utilization cgroup metrics
that are used in the model server.

This metric is specific to the model server and can be updated at any time.

!!! note
"system_process" is a special indicator that aggregate all the non-container workload into system process consumption metric.

## Kepler Metrics for Node Resource Utilization

### Accelerator Metrics

- **kepler_node_accelerator_intel_qat**

This measures the utilization of the accelerator Intel QAT on a certain node. When the system has Intel QATs,
Kepler can calculate the utilization of the node's QATs through telemetry.

## Exploring Node Exporter Metrics Through the Prometheus Expression

All the energy consumption metrics are defined as counters, following the [Prometheus metrics guide](https://prometheus.io/docs/practices/naming/) for energy-related metrics.

@@ -188,9 +227,10 @@ Therefore, to get the container energy consumption you can use the following query:

```
sum by (pod_name, container_name, container_namespace, node)(irate(kepler_container_joules_total{}[1m]))
```

Note that we report the node label in the container metrics because the OS metrics "system_process" will have the
same name and namespace across all nodes and we do not want to aggregate them.

## RAPL Power Domain

[RAPL power domains supported](https://zhenkai-zhang.github.io/papers/rapl.pdf) in some
recent Intel microarchitectures (consumer-grade/server-grade):
Expand Down
3 changes: 2 additions & 1 deletion docs/design/power_model.md
@@ -13,7 +13,8 @@ of the trained model. This modeling can be used even if the power metric cannot
can be done in three levels: Node total power (including fan, power supply, etc.), Node internal component
powers (such as CPU, Memory), Pod power.

!!! note
    Also see [Get started with Kepler Model Server](../kepler_model_server/get_started.md)

- **Pre-trained Power Models**: We provide pre-trained power models for different deployment scenarios.
  Current x86_64 pre-trained models are developed on [Intel® Xeon® Processor E5-2667 v3][1]. Models with