Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add logging guidelines #182

Merged
merged 3 commits into from
Jan 17, 2025
Merged
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion README.md
Original file line number Diff line number Diff line change
Expand Up @@ -34,7 +34,7 @@ Our community meeting is weekly at Th 10AM PDT; [zoom link here](https://zoom.us

We currently utilize the [#wg-serving](https://kubernetes.slack.com/?redir=%2Fmessages%2Fwg-serving) slack channel for communications.

Contributions are readily welcomed, thanks for joining us!
Contributions are readily welcomed, follow the [dev guide](./docs/dev.md) to start contributing!

### Code of conduct

Expand Down
67 changes: 67 additions & 0 deletions docs/dev.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,67 @@
<!-- Dev guide -->


## Logging

### Change log verbosity
We use the `k8s.io/klog/v2` package to manage logging.

We generally follow the [k8s instrumentation logging guidelines](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-instrumentation/logging.md), which states "the practical default level is V(2). Developers and QE environments may wish to run at V(3) or V(4)".

To configure logging verbosity, specify the `v` flag such as `--v=2`.

### Add logs

The [k8s instrumentation logging guidelines](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-instrumentation/logging.md) has the following definitions:

* `klog.V(0).InfoS` = `klog.InfoS` - Generally useful for this to **always** be visible to a cluster operator
* `klog.V(1).InfoS` - A reasonable default log level if you don't want verbosity.
* `klog.V(2).InfoS` - Useful steady state information about the service and important log messages that may correlate to significant changes in the system. This is the recommended default log level for most systems.
* `klog.V(3).InfoS` - Extended information about changes
* `klog.V(4).InfoS` - Debug level verbosity
* `klog.V(5).InfoS` - Trace level verbosity

We thus define the following constants to help choosing the level:
```
const(
REQUIRED=0
DEFAULT_MINIMAL=1
DEFAULT=2
VERBOSE=3
DEBUG=4
TRACE=5
liu-cong marked this conversation as resolved.
Show resolved Hide resolved
)
```

liu-cong marked this conversation as resolved.
Show resolved Hide resolved
The guidelines are written in the context of a k8s controller. Our [ext-proc](../pkg/ext-proc/) does more things such as handling requests and scraping metrics, therefore we adapt the guidelines as follows:

1. The server startup process. Logging at the default verbosity level is generally welcome here as this is only logged once at startup, and provides useful info for debugging.
liu-cong marked this conversation as resolved.
Show resolved Hide resolved
* `klog.V(REQUIRED).InfoS` = `klog.InfoS` Default, log things such as startup flags

2. Reconciler loops. The reconciler loops watch for CR changes such as the `InferenceModel` CR. And given changes in these CRs significantly affect the behavior of the extension, we recommend using v=1 verbosity level as default, and sparsely use higher verbosity levels.

* `klog.V(DEFAULT_MINIMAL).InfoS`
* Default log level in the reconcilers.
* Information about config (listening on X, watching Y)
* Errors that repeat frequently that relate to conditions that can be corrected (e.g., inference model not initialized yet)
* `klog.V(DEFAULT).InfoS`
* System state changing (adding/removing objects in the data store)
* `V(VERBOSE)` and above: Use your best judgement.

3. Inference request handling. These requests are expected to be much higher volume than the control flow in the reconcilers and therefore we should be mindful of log spamming. We recommend using v=2 to log important info about a request, such as the HTTP response code, and higher verbosity levels for less important info.

* `klog.V(DEFAULT).InfoS`
* Logging HTTP requests and their exit code
liu-cong marked this conversation as resolved.
Show resolved Hide resolved
* Important decision making such as picking the target model, target pod
* `klog.V(VERBOSE).InfoS`
* Detailed request scheduling algorithm operations, such as running the filtering logic
* `V(DEBUG)` and above: Use your best judgement.

4. Metric scraping loops. These loops run at a very high frequency, and logs can be very spammy if not handled properly.
* `klog.V(DEBUG).InfoS`
* Transient errors/warnings, such as failure to get response from a pod.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be logged as error so that it shows up clearly in the logs for the platform admin

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This loop runs very frequently (current default is 20ms). Even some transient error will create a spam. So I am suggesting using DEBUG for any logging here.

I think to solve the error visibility problem, we can have that in the debug loop under the "Misc" section below, which runs every 5s. Wdyt?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, we can discuss on the specific implementation PRs.

* `klog.V(TRACE).InfoS` - For other logs.

5. Misc
* `klog.V(VERBOSE).InfoS`
* A periodically (every 5s) printed debug message with the current pods and metrics. This is very important to debug the request scheduling algorithm, and yet not spammy compared to the metric scraping loop logs.