docs: fix lint issues on main #154

Merged
2 changes: 1 addition & 1 deletion docs/design/architecture.zh.md
@@ -5,7 +5,7 @@ The Kepler Exporter exposes various metrics about the energy consumption of Kubernetes components (such as Pods and Nodes)

Please follow the link for the definitions of the related energy [metrics](metrics.md).

![](https://raw.githubusercontent.com/sustainable-computing-io/kepler/main/doc/kepler-arch.png)
![Kepler Architecture](https://raw.githubusercontent.com/sustainable-computing-io/kepler/main/doc/kepler-arch.png)

## Kepler Model Server
The Kepler Model Server mainly provides [power estimation models](../kepler_model_server/power_estimation.md). These models support requests at various granularities (such as node, node CPU, Pod, and Pod process) and return metrics together with accuracy-based model filters.
39 changes: 9 additions & 30 deletions docs/design/ebpf_in_kepler.md
@@ -1,36 +1,16 @@
# eBPF in Kepler

## Contents

- [Background](#background)
- [What is eBPF ?](#what-is-ebpf)
- [What is a kprobe?](#what-is-a-kprobe)
- [How to list all currently registered kprobes ?](#list-kprobes)
- [Hardware CPU Events Monitoring](#hardware-cpu-events-monitoring)
- [How to check if kernel supports perf_event_open?](#check-support-perf_event_open)
- [Kernel routine probed by Kepler](#kernel-routine-probed-by-kepler)
- [Hardware CPU events monitored by Kepler](#hardware-cpu-events-monitored-by-kepler)
- [Calculate process (aka task) total CPU time](#calculate-total-cpu-time)
- [Calculate task CPU cycles](#calculate-total-cpu-cycle)
- [Calculate task Ref CPU cycles](#calculate-total-cpu-ref-cycle)
- [Calculate task CPU instructions](#calculate-total-cpu-instr)
- [Calculate task Cache misses](#calculate-total-cpu-cache-miss)
- [Calculate 'On CPU Average Frequency'](#calculate-on-cpu-avg-freq)
- [Process Table](#process-table)
- [References](#references)

## Background

<!-- markdownlint-disable MD033 -->
### What is eBPF ? <a name="what-is-ebpf"></a>
### What is eBPF?

eBPF is a revolutionary technology with origins in the Linux kernel that can run sandboxed programs in a privileged context such as the operating system kernel. It is used to safely and efficiently extend the capabilities of the kernel without requiring changes to kernel source code or loading kernel modules. [1]

### What is a kprobe?

KProbes is a debugging mechanism for the Linux kernel which can also be used for monitoring events inside a production system. KProbes enables you to dynamically break into any kernel routine and collect debugging and performance information non-disruptively. You can trap at almost any kernel code address, specifying a handler routine to be invoked when the breakpoint is hit. [2]

#### How to list all currently registered kprobes ? <a name="list-kprobes"></a>
#### How to list all currently registered kprobes?

```bash
sudo cat /sys/kernel/debug/kprobes/list
@@ -44,7 +24,7 @@ Using the syscall `perf_event_open` [5], Linux allows setting up performance monitoring
This syscall takes `pid` and `cpuid` as parameters. Kepler passes `pid == -1` and the actual cpu id as `cpuid`.
This combination of pid and cpu allows measuring all processes/threads on the specified cpu.

#### How to check if kernel supports `perf_event_open`? <a name="check-support-perf_event_open"></a>
#### How to check if kernel supports `perf_event_open`?

Check for the presence of `/proc/sys/kernel/perf_event_paranoid` to determine whether the kernel supports `perf_event_open` and what is allowed to be measured.
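That check can be sketched in shell (the messages are illustrative; a Linux-style `/proc` is assumed):

```shell
# Probe for perf_event_open support by checking perf_event_paranoid.
PARANOID=/proc/sys/kernel/perf_event_paranoid
if [ -r "$PARANOID" ]; then
    echo "perf_event_open supported, paranoid level: $(cat "$PARANOID")"
else
    echo "perf_event_open not supported by this kernel"
fi
```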

@@ -91,7 +71,7 @@ Kepler opens monitoring for the following hardware cpu events

Performance counters are accessed via special file descriptors, with one file descriptor per virtual counter in use. Each file descriptor is associated with the corresponding perf array; when the bcc wrapper functions are used, they read the corresponding fd and return its value.

## Calculate process (aka task) total CPU time <a name="calculate-total-cpu-time"></a>
## Calculate process (aka task) total CPU time

The ebpf program (`bpfassets/bcc/bcc.c`) maintains a mapping from a `<pid, cpuid>` pair to a timestamp. The timestamp records the moment `kprobe__finish_task_switch` was called when that pid was scheduled onto cpu `<cpuid>`.

@@ -107,7 +87,7 @@ Within the function `get_on_cpu_time`, the difference between the current timest

This `on_cpu_time_delta` is used to accumulate the `process_run_time` metrics for the previous task.

## Calculate task CPU cycles <a name="calculate-total-cpu-cycle"></a>
## Calculate task CPU cycles

For task cpu cycles, the bpf program maintains an array named `cpu_cycles`, indexed by `cpuid`. This contains values from perf array `cpu_cycles_hc_reader`, which is a perf event type array.

@@ -120,20 +100,19 @@ On each task switch:

The delta thus calculated is the number of cpu cycles used by the process leaving the cpu.

## Calculate task Ref CPU cycles <a name="calculate-total-cpu-ref-cycle"></a>
## Calculate task Ref CPU cycles

The process is the same as for CPU cycles, except that the perf array used is `cpu_ref_cycles_hc_reader` and the previous value is stored in `cpu_ref_cycles`.

## Calculate task CPU instructions <a name="calculate-total-cpu-instr"></a>
## Calculate task CPU instructions

The process is the same as for CPU cycles, except that the perf array used is `cpu_instr_hc_reader` and the previous value is stored in `cpu_instr`.

## Calculate task Cache misses <a name="calculate-total-cpu-cache-miss"></a>
## Calculate task Cache misses

The process is the same as for CPU cycles, except that the perf array used is `cache_miss_hc_reader` and the previous value is stored in `cache_miss`.

## Calculate 'On CPU Average Frequency' <a name="calculate-on-cpu-avg-freq"></a>
<!-- markdownlint-enable MD033 -->
## Calculate 'On CPU Average Frequency'

```c
avg_freq = ((on_cpu_cycles_delta * CPU_REF_FREQ) / on_cpu_ref_cycles_delta) * HZ;
```
78 changes: 33 additions & 45 deletions docs/design/ebpf_in_kepler.zh.md
@@ -1,47 +1,34 @@
# eBPF in Kepler

## Contents

- [Background](#背景)
- [What is eBPF?](#什么是ebpf)
- [What is a kprobe?](#什么是kprobe?)
- [How to list all currently registered kprobes?](#如何查看已经注册的kprobes?)
- [Hardware CPU event monitoring](#CPU硬件事件监控)
- [How to check whether the kernel supports perf_event_open?](#如何检查Linux内核是否支持perf_event_open?)
- [Kernel routine probed by Kepler](#kepler探测内核进程)
- [Hardware CPU events monitored by Kepler](#Kepler监控CPU硬件事件)
- [Calculate process total CPU time](#计算进程CPU运行总时间)
- [Calculate process CPU cycles](#计算进程CPU周期)
- [Calculate process ref CPU cycles](#计算进程参考CPU周期)
- [Calculate process CPU instructions](#计算进程CPU指令)
- [Calculate process cache misses](#计算进程缓存失效)
- [Calculate on-CPU average frequency](#计算CPU上平均频率)
- [Process table](#进程表)
- [References](#参考)

## Background

### What is eBPF ? <a name="什么是ebpf"></a>
### What is eBPF?

eBPF is a revolutionary technology with origins in the Linux kernel that can run sandboxed programs in a privileged context such as the operating system kernel. It is used to safely and efficiently extend the capabilities of the kernel without changing kernel source code or loading kernel modules. [1]

### What is a kprobe?
### What is a kprobe?

KProbes is a debugging mechanism for the Linux kernel which can also be used for monitoring events inside a production system. KProbes enables you to dynamically break into any kernel routine and collect debugging and performance information non-disruptively. You can trap at almost any kernel code address, specifying a handler routine to be invoked when the breakpoint is hit. [2]

#### How to list all currently registered kprobes? <a name="如何查看已经注册的kprobes"></a>
```
#### How to list all currently registered kprobes?

```bash
sudo cat /sys/kernel/debug/kprobes/list
```

### Hardware CPU event monitoring

Performance counters are special hardware counters implemented on most modern CPUs. They count certain types of hardware events, such as instructions executed, cache misses, or branch mispredictions, without slowing down the kernel or applications. [4]

Using the syscall `perf_event_open` [5], Linux allows setting up performance monitoring of hardware and software performance. It returns a file descriptor from which performance information can be read.
This syscall takes `pid` and `cpuid` as parameters. Kepler passes `pid == -1` and the actual cpu id as `cpuid`.
This combination of pid and cpu allows measuring all processes/threads on the specified cpu.

#### How to check whether the kernel supports `perf_event_open`? <a name="check-support-perf_event_open"></a>
#### How to check whether the kernel supports `perf_event_open`?

Check for the presence of `/proc/sys/kernel/perf_event_paranoid` to determine whether the kernel supports `perf_event_open` and what is allowed to be measured.

```
```bash
The perf_event_paranoid file can be set to restrict
access to the performance counters.

@@ -62,13 +49,16 @@ Kepler probes the kernel function `finish_task_switch` [3], which is responsible during a task switch for
When the kernel performs a context switch, `finish_task_switch` is called as the new process gets onto the CPU. The function takes a `task_struct*` argument, which carries all the information about the process leaving the CPU. [3]

Kepler's probe function:
```

```c
int kprobe__finish_task_switch(struct pt_regs *ctx, struct task_struct *prev)
```

The first parameter is a pointer to a `pt_regs` structure, which holds the state of the CPU registers at kernel function entry. This structure contains fields corresponding to the CPU registers, such as the general-purpose registers (e.g., r0, r1, ...), the stack pointer (sp), the program counter (pc), and other architecture-specific registers.
The second parameter is a pointer to a `task_struct` holding the task information of the previous task, i.e., the task leaving the CPU.

## Hardware CPU events monitored by Kepler

Kepler monitors the following hardware CPU events

| PERF Type | Perf Count Type | Description | Array name <br>(in bpf program) |
@@ -80,10 +70,10 @@ Kepler monitors the following hardware CPU events

Performance counters are accessed via special file descriptors, with one file descriptor per virtual counter in use. Each file descriptor is associated with the corresponding array; when the bcc wrapper functions are used, they read the corresponding fd and return its value.

## Calculate process total CPU time<a name="calculate-total-cpu-time"></a>
## Calculate process total CPU time

The ebpf program (`bpfassets/perf_event/perf_event.c`) maintains a table keyed by `<pid, cpuid>` pairs and holding timestamps. The timestamp records the moment `kprobe__finish_task_switch` was called when that pid was scheduled onto cpu `<cpuid>`.

```
```c
// <Task PID, CPUID> => Context Switch Start time

typedef struct pid_time_t { u32 pid; u32 cpu; } pid_time_t;
@@ -94,7 +85,8 @@ BPF_HASH(pid_time, pid_time_t);

This `on_cpu_time_delta` is used to accumulate the `process_run_time` metric for the previous task.

## Calculate process CPU cycles <a name="calculate-total-cpu-cycle"></a>
## Calculate process CPU cycles

For a process's CPU cycles, the bpf program maintains an array named `cpu_cycles`, indexed by `cpuid`. It holds values from the perf array `cpu_cycles_hc_reader`, which is a perf event type array.

On each task switch:
@@ -105,17 +97,21 @@ BPF_HASH(pid_time, pid_time_t);

The delta thus calculated is the number of cpu cycles used by the process leaving the cpu.

## Calculate process ref CPU cycles <a name="calculate-total-cpu-ref-cycle"></a>
## Calculate process ref CPU cycles

The process is the same as for CPU cycles, except that the perf array used is `cpu_ref_cycles_hc_reader` and the previous value is stored in `cpu_ref_cycles`.

## Calculate process CPU instructions<a name="calculate-total-cpu-instr"></a>
## Calculate process CPU instructions

The process is the same as for CPU cycles, except that the perf array used is `cpu_instr_hc_reader` and the previous value is stored in `cpu_instr`.

## Calculate process cache misses <a name="calculate-total-cpu-cache-miss"></a>
## Calculate process cache misses

The process is the same as for CPU cycles, except that the perf array used is `cache_miss_hc_reader` and the previous value is stored in `cache_miss`.

## Calculate 'On CPU Average Frequency' <a name="calculate-on-cpu-avg-freq"></a>
```
## Calculate 'On CPU Average Frequency'

```c
avg_freq = ((on_cpu_cycles_delta * CPU_REF_FREQ) / on_cpu_ref_cycles_delta) * HZ;

CPU_REF_FREQ = 2500
```

@@ -144,19 +140,11 @@ The bpf program maintains a bpf hash named `processes`. This hash maintains per-process coun
## References

[1] [https://ebpf.io/what-is-ebpf/](https://ebpf.io/what-is-ebpf/) , [https://www.splunk.com/en_us/blog/learn/what-is-ebpf.html](https://www.splunk.com/en_us/blog/learn/what-is-ebpf.html) , [https://www.tigera.io/learn/guides/ebpf/](https://www.tigera.io/learn/guides/ebpf/)

[2] [An introduction to KProbes](https://lwn.net/Articles/132196/) , [Kernel Probes (Kprobes)](https://docs.kernel.org/trace/kprobes.html)

[3] [finish_task_switch - clean up after a task-switch](https://elixir.bootlin.com/linux/v6.4-rc7/source/kernel/sched/core.c#L5157)

[4] [Performance Counters for Linux](https://elixir.bootlin.com/linux/latest/source/tools/perf/design.txt)

[5] [perf_event_open(2) — Linux manual page](https://www.man7.org/linux/man-pages/man2/perf_event_open.2.html)
1 change: 0 additions & 1 deletion docs/design/power_model.zh.md
@@ -22,4 +22,3 @@ VM with node info and power passthrough from BM (x86 but no power meter)|Power E
VM with node info and power passthrough from BM (non-x86 with power meter)|Measurement + VM Mapping|Power Estimation|Power Ratio
VM with node info|Power Estimation|Power Estimation|Power Ratio
Pure VM|\-|\-|Power Estimation
|||
4 changes: 3 additions & 1 deletion docs/index.zh.md
@@ -9,9 +9,11 @@ Kepler (Kubernetes-based Efficient Power Level Exporter) is a prometheus expo
The project's GitHub repository ➡️ [Kepler](https://github.com/sustainable-computing-io/kepler).
The Chinese documentation is still under construction; contributions are welcome.

<!-- markdownlint-disable -->
</br></br></br></br></br></br></br></br>
<p style="text-align: center;">
The project has become a Cloud Native Computing Foundation sandbox project.

<img src="../cncf-color-bg.svg" width="40%" height="20%">
</p>
</p>
<!-- markdownlint-enable -->
18 changes: 9 additions & 9 deletions docs/installation/community-operator.zh.md
@@ -13,16 +13,16 @@ git clone https://github.com/sustainable-computing-io/kepler-operator.git
cd kepler-exporter
```
---
## Install the operator from Operator Hub
## Install the operator from Operator Hub

1. Select Operators > OperatorHub. Search for `Kepler`. Click `Install`
![](../fig/ocp_installation/operator_installation_ocp_1_0.6.z.png)
1. Select Operators > OperatorHub. Search for `Kepler`. Click `Install`
![Operator installation in OCP](../fig/ocp_installation/operator_installation_ocp_1_0.6.z.png)

2. Approve the installation
![](../fig/ocp_installation/operator_installation_ocp_7_0.6.z.png)
![Operator installation in OCP](../fig/ocp_installation/operator_installation_ocp_7_0.6.z.png)

3. Create a Kepler Custom Resource
![](../fig/ocp_installation/operator_installation_ocp_2_0.6.z.png)
![Operator installation in OCP](../fig/ocp_installation/operator_installation_ocp_2_0.6.z.png)
> Note: the current OCP console may show a JavaScript error (expected to be fixed in 4.13.5), but it does not affect the remaining steps. The fix is currently available in the 4.13.0-0.nightly-2023-07-08-165124 build of the OCP console.

---
@@ -40,17 +40,17 @@ hack/dashboard/openshift/deploy-grafana.sh

### Access the Grafana Console
Configure Networking > Routes.
![](../fig/ocp_installation/operator_installation_ocp_5a_0.6.z.png)
![](../fig/ocp_installation/operator_installation_ocp_5b_0.6.z.png)
![Operator installation](../fig/ocp_installation/operator_installation_ocp_5a_0.6.z.png)
![Operator installation](../fig/ocp_installation/operator_installation_ocp_5b_0.6.z.png)

### Grafana Dashboard
Log in to the Grafana Dashboard with the credentials `kepler:kepler`.
![](../fig/ocp_installation/operator_installation_ocp_6_0.6.z.png)
![Operator installation](../fig/ocp_installation/operator_installation_ocp_6_0.6.z.png)

---

## Troubleshooting

> Note: if there is a problem with the data source, check whether the API token has been updated correctly

![](../fig/ocp_installation/operator_installation_ocp_3_0.6.z.png)
![Operator installation](../fig/ocp_installation/operator_installation_ocp_3_0.6.z.png)
18 changes: 9 additions & 9 deletions docs/installation/kepler-operator.zh.md
@@ -1,15 +1,15 @@
# Install on Kind with the Kepler Operator

## Requirements:
## Requirements

Before you begin, make sure you have installed:

- `kubectl`
- Cloned the `kepler-operator` [repository](https://github.com/sustainable-computing-io/kepler-operator)
- A target k8s cluster. You can use Kind to easily build a local k8s cluster for this tutorial ([local cluster for testing](#run-a-kind-cluster-locally)), or run directly against your remote k8s cluster. Note that your controller will automatically use the current kubeconfig; you can check it with `kubectl cluster-info`.
- Cloned the `kepler-operator` [repository](https://github.com/sustainable-computing-io/kepler-operator)
- A target k8s cluster. You can use Kind to easily build a local k8s cluster for this tutorial ([run a kind cluster locally](#run-a-kind-cluster-locally)), or run directly against your remote k8s cluster. Note that your controller will automatically use the current kubeconfig; you can check it with `kubectl cluster-info`.
- A user with `kubeadmin` or `cluster-admin` privileges.

### Run a kind cluster locally
### Run a kind cluster locally <a name="run-a-kind-cluster-locally"></a>

``` sh
cd kepler-operator
```

@@ -50,9 +50,9 @@ kubectl port-forward svc/grafana 3000:3000 -n monitoring
To have `kube-prometheus` scrape the `kepler-exporter` service endpoint, you need to configure a service monitor.

!!! note
By default `kube-prometheus` does not scrape services outside the `monitoring` namespace. If your kepler is deployed outside the `monitoring` namespace, [see the following steps](#scrape-all-namespaces).
By default `kube-prometheus` does not scrape services outside the `monitoring` namespace. If your kepler is deployed outside the `monitoring` namespace, see [scrape all namespaces](#scrape-all-namespaces).

```
```cmd
kubectl apply -n monitoring -f - << EOF
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
```

@@ -92,7 +92,7 @@ EOF
- Log in at [localhost:3000](http://localhost:3000); the default username/password is `admin:admin`
- Import the default [dashboard](https://raw.githubusercontent.com/sustainable-computing-io/kepler/main/grafana-dashboards/Kepler-Exporter.json)

![](../fig/ocp_installation/kind_grafana.png)
![kind-grafana](../fig/ocp_installation/kind_grafana.png)

### Uninstall the operator
Uninstall with the following command:
@@ -104,7 +104,7 @@ make undeploy

## Troubleshooting

### Scrape all namespaces
### Scrape all namespaces <a name="scrape-all-namespaces"></a>

By default kube-prometheus does not scrape all namespaces; this is governed by RBAC.
The following configuration of the clusterrole `prometheus-k8s` will allow kube-prometheus to scrape all namespaces.
@@ -130,7 +130,7 @@ PolicyRule:

```

- To customize prometheus when creating the [local kind cluster](#run-a-kind-cluster-locally), refer to
the kube-prometheus documentation [Customizing Kube-Prometheus](https://github.com/prometheus-operator/kube-prometheus/blob/main/docs/customizing.md)
kube-prometheus文档[Customizing Kube-Prometheus](https://github.com/prometheus-operator/kube-prometheus/blob/main/docs/customizing.md)

- Make sure you apply [this jsonnet](https://github.com/prometheus-operator/kube-prometheus/blob/main/docs/customizations/monitoring-all-namespaces.md) so that prometheus scrapes all namespaces.
1 change: 0 additions & 1 deletion docs/installation/kepler.md
@@ -61,7 +61,6 @@ using `admin:admin`. Skip the window where Grafana asks to input a new password.

!!! note
To forward ports simply run:

```console
kubectl port-forward --address localhost -n kepler service/kepler-exporter 9102:9102 &
kubectl port-forward --address localhost -n monitoring service/prometheus-k8s 9090:9090 &