Add knn result consistency test #14167
Hmm that is bad ... it means there is a test bug or test infra bug (separate from the scary bug this test is chasing!)? Oh, maybe force
or so?
As for the reproducibility problem, that may be caused by concurrent HNSW merging, which is nondeterministic.
@msokolov @mikemccand maybe the consistency I am testing isn't clear. First: index a bunch of vectors. I am not sure any merging or indexing-time changes would affect this, no?
I think our comments relate to the observation that the test does not reproducibly fail with the same seed.
🤦 for sure. Let me see if I can shore it up.
OK, I cleaned it all up and now have two separate tests: one multi-threaded and one single-threaded. The multi-threaded one is the only one that fails periodically, which explains the difficulty in reproducing it. Threads might be racing to explore their segments first, and so stop exploring other graphs sooner than in other runs. As for the single-threaded test, I haven't had it fail in tens of thousands of runs. That doesn't 100% mean there isn't an issue there as well; I just haven't had a failure yet.
OK, if I change to never use
@benwtrent Thanks for raising this. It indeed happens because of MultiLeafKnnCollector and search threads exchanging info about the globally collected results. Because it is not deterministic when each segment thread shares info with the global queue, we may get inconsistent results between runs. So far, I could not find a way to make it deterministic.
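To make the ordering effect concrete, here is a toy model in plain Java. It is only a sketch under loud assumptions: `SharedThresholdToy`, its `search` method, and the explicit `schedule` array are hypothetical names invented for illustration, and the logic is a simplification, not the actual MultiLeafKnnCollector code. Each "segment" is a list of scores in the order its graph walk would visit them, all segments share one global top-k queue, and a segment stops exploring as soon as it sees a score that is not competitive against that queue. Real threads are replaced by a fixed schedule so that both runs are repeatable.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

/** Toy model (hypothetical, not Lucene code) of segments sharing one global top-k queue. */
final class SharedThresholdToy {

  /**
   * segments[i] holds the scores segment i would visit, in visiting order.
   * schedule lists which segment takes the next step; it stands in for thread timing.
   */
  static List<Double> search(double[][] segments, int[] schedule, int k) {
    PriorityQueue<Double> global = new PriorityQueue<>(); // min-heap: head is the weakest kept score
    int[] next = new int[segments.length];        // next unvisited position per segment
    boolean[] done = new boolean[segments.length]; // true once a segment has stopped exploring

    for (int seg : schedule) {
      if (done[seg] || next[seg] >= segments[seg].length) {
        continue; // this segment already terminated or is exhausted
      }
      double score = segments[seg][next[seg]++];
      if (global.size() < k) {
        global.add(score); // queue not full yet: everything is competitive
      } else if (score > global.peek()) {
        global.poll(); // evict the weakest score and keep the better one
        global.add(score);
      } else {
        done[seg] = true; // not competitive against the shared queue: stop early
      }
    }

    List<Double> top = new ArrayList<>(global);
    top.sort(Collections.reverseOrder()); // best score first
    return top;
  }

  public static void main(String[] args) {
    // Segment 1 only reaches its best hit (0.95) after stepping through a weak one (0.5).
    double[][] segments = {{0.9, 0.8}, {0.5, 0.95}};

    // Segment 1 runs first: 0.5 is accepted while the queue is empty, so 0.95 is found.
    System.out.println(search(segments, new int[] {1, 1, 0, 0}, 2)); // [0.95, 0.9]

    // Segment 0 runs first: the queue already holds 0.9 and 0.8, so segment 1 stops at 0.5.
    System.out.println(search(segments, new int[] {0, 0, 1, 1}, 2)); // [0.9, 0.8]
  }
}
```

Same data, same query, same k, and the only thing that changed is which segment got to publish into the shared queue first. With real threads that order is decided by the scheduler, which would be consistent with the multi-threaded test failing only occasionally and not reproducing from a seed.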
Inspired by some weird behavior I have seen, adding a consistency test.
I found that indeed, this fails over some seeds.
Frustratingly, the seeded failures do not seem to be repeatable. But, running
Results in failures, though, not consistently. This seems to indicate some funky race condition.
Obviously, this shouldn't be merged until we figure out the consistency issue.
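For readers who want the shape of the check without the Lucene plumbing, a minimal sketch follows. It assumes the search can be wrapped in a `Supplier` that returns the hits as a list; `ConsistencyCheck` and `firstDivergentRun` are hypothetical names for illustration and are not part of this PR or of Lucene. The idea is simply to run the same query repeatedly against the same index and compare every run to the first.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical helper (not Lucene code): repeats a search and compares each run to the first. */
final class ConsistencyCheck {

  /** Returns the index of the first run whose result differs from run 0, or -1 if all runs agree. */
  static <T> int firstDivergentRun(Supplier<List<T>> search, int runs) {
    List<T> expected = new ArrayList<>(search.get()); // run 0 is the baseline
    for (int run = 1; run < runs; run++) {
      if (!expected.equals(search.get())) {
        return run; // same query, same index, different hits
      }
    }
    return -1;
  }

  public static void main(String[] args) {
    // Stand-in for a deterministic search: always the same doc ids in the same order.
    Supplier<List<Integer>> stable = () -> List.of(3, 1, 2);
    System.out.println(firstDivergentRun(stable, 100)); // -1

    // Stand-in for a flaky search: the third call returns a different ordering.
    int[] calls = {0};
    Supplier<List<Integer>> flaky = () -> calls[0]++ < 2 ? List.of(3, 1, 2) : List.of(3, 2, 1);
    System.out.println(firstDivergentRun(flaky, 100)); // 2
  }
}
```

A check like this only catches nondeterminism that actually shows up within the chosen number of runs, so a pass is weak evidence and a rare race may need many iterations to surface.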