dataset-selectivity performance regression? #129
I have seen something in conbench.ursa.dev that I would love to use as an example scenario: do we have a performance regression, or do we maybe have a methodological weakness?

https://conbench.ursa.dev/benchmarks/4fe411bf67a94bc6aa9787fc0394bd03/

That is, around 2023-01-05 07:49 it was measured with apache/arrow@e5ec942 that the benchmark dataset-selectivity with case permutation 10%, nyctaxi_multi_parquet_s3 took almost two seconds in each of three iterations: [1.951955, 1.846497, 1.891674]. In the previous 1-2 weeks it took ~1.2 seconds.

Comments
https://conbench.ursa.dev/benchmarks/4fe411bf67a94bc6aa9787fc0394bd03/ does not show a lot of history; it goes back only to 22-12-28. Manual history extension: https://conbench.ursa.dev/benchmarks/db5ae59ff5944b2180dc73e2c35e2c43/

Data points from 22-12-22. Summarized observations:

- Returned to normal in the subsequent run.
- Notably, this was executed on …
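For illustration only (this is not Conbench's actual lookback logic, and the baseline numbers below are invented to match the ~1.2 s history described above), a simple distribution-based check would flag the slow run like this:

```python
import statistics

def looks_like_regression(history, new_iterations, threshold=3.0):
    """Flag a run whose mean duration sits more than `threshold`
    standard deviations above the historical mean (higher = slower)."""
    mu = statistics.mean(history)
    sigma = statistics.stdev(history)
    z = (statistics.mean(new_iterations) - mu) / sigma
    return z > threshold

# Hypothetical baseline around the ~1.2 s level from the preceding weeks.
history = [1.21, 1.18, 1.23, 1.19, 1.22, 1.20]
# The three iterations reported on 2023-01-05.
new = [1.951955, 1.846497, 1.891674]

print(looks_like_regression(history, new))  # True: far above the baseline
```

A single flagged run that returns to normal afterwards, as here, is exactly why one might require two or more consecutive flagged runs before alerting.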
Yeah, this is a circumstance that we have observed in a few places, even with dedicated runners: we have some benchmark cases that have blips like this that look like regressions initially but go away on the next run. I was talking to Austin about this the other day and wrote up (some of) what we talked about in conbench/conbench#572, which is a large(r) project, but one I think would be good to spec out so we can see how much it would take to add. Of course, as I mention there, that project should not preclude looking at this benchmark code to see if there's something we can do to make it more reliable. One question about the disk I/O interference: I thought you crafted this benchmark so that it was using RAM disks to prevent something like that from happening (though I'll admit I didn't follow super closely while I was out, so I might be wrong about that), which should make disk I/O not matter much, yeah?
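(For concreteness, and purely as a sketch rather than what this benchmark actually does: one way to get a RAM-backed data directory on Linux is to stage files under the tmpfs mount at /dev/shm. The source path below is hypothetical.)

```python
import shutil
import tempfile
from pathlib import Path

# /dev/shm is a tmpfs (RAM-backed) mount on most Linux systems, so files
# staged there are served from memory and unrelated disk I/O cannot
# disturb reads during the benchmark.
ram_dir = Path(tempfile.mkdtemp(dir="/dev/shm"))
src = Path("data/nyctaxi_multi_parquet")  # hypothetical on-disk dataset
dst = ram_dir / src.name
shutil.copytree(src, dst)
try:
    pass  # run the benchmark against `dst` here
finally:
    shutil.rmtree(ram_dir)  # release the memory afterwards
```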
Cool, replied there!
Of course, and instead of 'code' I'd more generally say 'method'. Careful thought about what a benchmark is actually supposed to measure, and about whether it makes sense for that measurement to be affected by disk I/O at all, is, I suppose, one of the most effective ways to fight instability.
I did that in 'my' benchmark
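(As an aside, a minimal sketch of that kind of separation, assuming pyarrow and a local copy of one nyctaxi Parquet file with a total_amount column; this is not the actual benchmark code, and the cutoff value is illustrative.)

```python
import time

import pyarrow.compute as pc
import pyarrow.parquet as pq

# Do the disk/network I/O up front, outside the timed region, so the
# measurement covers only the filtering work the benchmark is about.
table = pq.read_table("nyctaxi.parquet")  # hypothetical local file

start = time.perf_counter()
# Low-selectivity case: keep roughly the cheapest rides (illustrative cutoff).
mask = pc.less(table["total_amount"], 8.0)
selected = table.filter(mask)
elapsed = time.perf_counter() - start

print(f"{selected.num_rows}/{table.num_rows} rows in {elapsed:.6f} s")
```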
Ah, right right, of course, I got those wires crossed, sorry