Track and report unpack performance #3610
Conversation
This is a thought experiment, based on our earlier discussion of tracking an asynchronous unpacker. I thought it would be fun to see some information on tarball compression ratios, as well as tracking the min/max unpack time in accessible metadata rather than just in logs. I need to jog back to Horreum, but before diving back into the pool of Java, I took a recreational break...

So I added a simple `server.unpack-perf` metadata, which is a JSON block like `{"min": <seconds>, "max": <seconds>, "count": <unpack_count>}`, and then played with the report generator to get some statistics. The sample below is for a `runlocal`, with a few small-ish tarballs. The big catch in deploying this is that none of the existing datasets will have `server.unpack-perf` until they're unpacked again, which definitely reduces the usefulness of this thought experiment. Nevertheless, I figured I might as well post it for consideration.

```
Cache report:
  5 datasets currently unpacked, consuming 51.7 MB
  8 datasets have been unpacked a total of 15 times
  The least recently used cache was referenced today, pbench-user-benchmark_example-vmstat_2018.10.24T14.38.18
  The most recently used cache was referenced today, uperf_rhel8.1_4.18.0-107.el8_snap4_25gb_virt_2019.06.21T01.28.57
  The smallest cache is 4.1 kB, nometadata
  The biggest cache is 19.6 MB, trafficgen_basic-forwarding-example_tg:trex-profile_pf:forwarding_test.json_ml:5_tt:bs__2019-08-27T14:58:38
  The worst compression ratio is 22.156%, uperf_rhel8.1_4.18.0-107.el8_snap4_25gb_virt_2019.06.21T01.28.57
  The best compression ratio is 96.834%, pbench-user-benchmark_example-vmstat_2018.10.24T14.38.18
  The fastest cache unpack is 0.013 seconds, nometadata
  The slowest cache unpack is 0.078 seconds, trafficgen_basic-forwarding-example_tg:trex-profile_pf:forwarding_test.json_ml:5_tt:bs__2019-08-27T14:58:38
  The fastest cache unpack streaming rate is 253.666 Mb/second, trafficgen_basic-forwarding-example_tg:trex-profile_pf:forwarding_test.json_ml:5_tt:bs__2019-08-27T14:58:38
  The slowest cache unpack streaming rate is 0.133 Mb/second, nometadata
```
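For clarity, the per-dataset bookkeeping behind `server.unpack-perf` is simple: merge each new unpack elapsed time into the running min/max and bump the count. A minimal sketch of that merge (the function name and the shape of the `metadata` dict are assumptions for illustration, not the actual server code):

```python
def update_unpack_perf(metadata: dict, elapsed: float) -> dict:
    """Merge one unpack timing into a hypothetical 'server.unpack-perf'
    JSON block of the form {"min": s, "max": s, "count": n}."""
    perf = metadata.get("server.unpack-perf")
    if perf is None:
        # First unpack recorded for this dataset.
        perf = {"min": elapsed, "max": elapsed, "count": 1}
    else:
        perf = {
            "min": min(perf["min"], elapsed),
            "max": max(perf["max"], elapsed),
            "count": perf["count"] + 1,
        }
    metadata["server.unpack-perf"] = perf
    return metadata
```

The report generator can then fold these blocks across all datasets to find the global fastest/slowest unpack and the total unpack count.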
The `pbench-tree-manage` utility supports a deep `ARCHIVE` tree display and also manages the periodic background cache reclamation. Curious after we saw a failed reclaim, I wanted to play with it a bit and realized that it does a full (`search`) discovery unconditionally. First, even with `--display` (which, now that we have a proper report generator, is rarely necessary) we can probably use the faster SQL discovery most of the time, although I added an option to select the slower `--search` discovery. More importantly, though, the cache reclaimer doesn't need a fully discovered cache manager, since it takes the short-cut of examining the `/srv/pbench/cache` tree directly; so we can move the discovery into the `--display` path.
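The control flow change described above can be sketched as follows. This is an illustrative outline only, under assumed option and step names, not the actual `pbench-tree-manage` implementation:

```python
import argparse

def plan(argv):
    """Return the ordered steps the tool would run for these arguments."""
    parser = argparse.ArgumentParser(prog="tree-manage-sketch")
    parser.add_argument("--display", action="store_true",
                        help="show a deep ARCHIVE tree report")
    parser.add_argument("--search", action="store_true",
                        help="use the slower filesystem-search discovery "
                             "instead of SQL discovery")
    args = parser.parse_args(argv)

    steps = []
    if args.display:
        # Only the display path needs a fully discovered cache manager;
        # default to the faster SQL discovery unless --search is given.
        steps.append("discover:search" if args.search else "discover:sql")
        steps.append("display")
    else:
        # The reclaimer examines the /srv/pbench/cache tree directly,
        # so it skips the expensive discovery step entirely.
        steps.append("reclaim")
    return steps
```

The point of the restructuring is visible in the last branch: the periodic reclaim run no longer pays the discovery cost at all.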
Also a few minor corrections identified during ops review.
I thought this morning about adding a CLI audit tool to query the audit log. While I wasn't quite motivated enough to write it, it occurred to me to at least cobble up a simple set of audit log statistics while eating breakfast. So here 'tis.
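The kind of quick breakfast-grade summary described here amounts to counting audit rows per operation, status, and user. A minimal sketch, assuming audit rows exposed as dicts with `operation`, `status`, and `user` keys (the field names are assumptions, not the actual `Audit` schema):

```python
from collections import Counter

def audit_summary(rows):
    """Tally audit-log rows by operation, status, and user."""
    summary = {"operation": Counter(), "status": Counter(), "user": Counter()}
    for row in rows:
        for field in summary:
            summary[field][row[field]] += 1
    return summary
```

The same aggregation could of course be done in SQL with `GROUP BY`, but a few lines of Python against already-fetched rows is enough for a one-off report.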
Well isn't that cute: an expired SSL cert trying to copy the IT CA cert!
And ... I can't log in to Jenkins to restart the build (just in case), because it seems to just ignore the login. Which might conceivably be related ... I suppose it's telling me to "enjoy my weekend and get off the computer"...
Some cleanup and review comments.
Looks good, although I do have one wrinkle for your consideration.
👍
I added a simple `server.unpack-perf` metadata, which is a JSON block like `{"min": <seconds>, "max": <seconds>, "count": <unpack_count>}`, and then played with the report generator to get some statistics. I also wrote a report of the `Audit` table contents to summarize the operations, statuses, and users involved in the Pbench Server. The sample below is for a `runlocal`, with a few small-ish tarballs. The big catch in deploying this is that none of the existing datasets will have `server.unpack-perf` until they're unpacked again (e.g., for TOC or visualize), which somewhat reduces the value of the statistics for now. Nevertheless, I figured I might as well post it for consideration. Some of the statistics (and how they're calculated and/or represented) are no doubt arguable; but I enjoyed seeing the numbers anyway. 😆