Add knn result consistency test #14167

Status: Open, 2 commits, base: main

benwtrent
Member

Inspired by some weird behavior I have seen, adding a consistency test.

I found that indeed, this fails over some seeds.

Frustratingly, the seeded failures do not seem to be repeatable. But, running

./gradlew :lucene:core:test --tests "org.apache.lucene.search.TestSeededKnnFloatVectorQuery.testRandomConsistency" -Dtests.iters=1000

results in failures, though not consistently. This seems to indicate some funky race condition.

Obviously, this shouldn't be merged until we figure out the consistency issue.

@mikemccand
Member

Frustratingly, the seeded failures do not seem to be repeatable.

Hmm that is bad ... it means there is a test bug or test infra bug (separate from the scary bug this test is chasing!)?

Oh, maybe force SerialMergeScheduler on your RandomIndexWriter? CMS (ConcurrentMergeScheduler, Lucene's default, which RIW will sometimes pick) launches threads, and we can't make the JVM's/OS's thread scheduling deterministic, so that might explain the non-reproducibility. E.g.:

    // Force serial merges so no background merge threads affect the run.
    Random r = random();
    IndexWriterConfig iwc = LuceneTestCase.newIndexWriterConfig(r, new MockAnalyzer(r));
    iwc.setMergeScheduler(new SerialMergeScheduler());
    RandomIndexWriter riw = new RandomIndexWriter(r, dir, iwc);

or so?

@msokolov
Contributor

As for the reproducibility problem, that may be caused by concurrent HNSW merging, which is nondeterministic.

@benwtrent
Member Author

@msokolov @mikemccand maybe the consistency I am testing isn't clear.

First: Index a bunch of vectors
Second: do a single query on a static index to get the top-k
Repeat-N: verify the exact same query on the exact same index without changes results in the same docs and scores.

I am not sure any merging or indexing-time changes would affect this, no?
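
To make the three steps above concrete, here is a minimal plain-Java sketch of the same check. It is not the test in this PR: it swaps in a brute-force dot-product search over in-memory arrays instead of a Lucene index, and every name in it (`KnnConsistencySketch`, `Hit`, `topK`, `countMismatches`) is hypothetical. The only point is the shape of the loop: take one baseline result, then require every repeat to return the same docs and bit-identical scores.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.function.Supplier;

class KnnConsistencySketch {
  /** One hit: a doc id and its score (hypothetical stand-in for a scored doc). */
  static final class Hit {
    final int doc;
    final float score;
    Hit(int doc, float score) { this.doc = doc; this.score = score; }
  }

  /** Exact top-k by dot product; ties are broken by doc id so the order is fully defined. */
  static List<Hit> topK(float[][] vectors, float[] query, int k) {
    List<Hit> hits = new ArrayList<>();
    for (int doc = 0; doc < vectors.length; doc++) {
      float dot = 0f;
      for (int i = 0; i < query.length; i++) {
        dot += vectors[doc][i] * query[i];
      }
      hits.add(new Hit(doc, dot));
    }
    hits.sort((a, b) -> {
      int c = Float.compare(b.score, a.score); // higher score first
      return c != 0 ? c : Integer.compare(a.doc, b.doc);
    });
    return new ArrayList<>(hits.subList(0, Math.min(k, hits.size())));
  }

  /** True if both lists hold the same docs with bit-identical scores, in the same order. */
  static boolean sameHits(List<Hit> expected, List<Hit> actual) {
    if (expected.size() != actual.size()) return false;
    for (int i = 0; i < expected.size(); i++) {
      Hit e = expected.get(i);
      Hit a = actual.get(i);
      if (e.doc != a.doc || Float.compare(e.score, a.score) != 0) return false;
    }
    return true;
  }

  /** Runs the search once as a baseline, then `repeats` more times; returns how many runs differed. */
  static int countMismatches(Supplier<List<Hit>> search, int repeats) {
    List<Hit> baseline = search.get();
    int mismatches = 0;
    for (int i = 0; i < repeats; i++) {
      if (!sameHits(baseline, search.get())) mismatches++;
    }
    return mismatches;
  }

  /** Seeded random vectors, so the "index" itself is reproducible. */
  static float[][] randomVectors(Random r, int count, int dim) {
    float[][] vectors = new float[count][dim];
    for (float[] v : vectors) {
      for (int i = 0; i < dim; i++) v[i] = r.nextFloat();
    }
    return vectors;
  }

  public static void main(String[] args) {
    Random r = new Random(42);
    float[][] vectors = randomVectors(r, 500, 16); // First: index a bunch of vectors
    float[] query = randomVectors(r, 1, 16)[0];
    // Second + Repeat-N: one baseline query, then 100 repeats on the unchanged data
    int mismatches = countMismatches(() -> topK(vectors, query, 10), 100);
    System.out.println("mismatching runs: " + mismatches);
  }
}
```

Because this stand-in search is exact and single-threaded, it should never report a mismatch; the interesting case is plugging an approximate, concurrent search into the same `Supplier`.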

@msokolov
Contributor

I think our comments relate to the observation that the test does not reproducibly fail with the same seed

@benwtrent
Member Author

I think our comments relate to the observation that the test does not reproducibly fail with the same seed

🤦 for sure. Let me see if I can shore it up.

@benwtrent
Member Author

OK, I cleaned it all up, and now have two separate tests: one multi-threaded, one single-threaded.

The multi-threaded one is the only one that fails periodically, which explains the difficulty in replicating. Threads might be racing to explore their segments first and thus stop exploring other graphs sooner than in other runs.

As for the single-threaded test, I haven't had it fail in tens of thousands of runs. That doesn't 100% mean there isn't an issue there as well; I just haven't had a failure yet.

@benwtrent
Member Author

OK, if I change the search to never use MultiLeafKnnCollector, the multi-threaded consistency test passes. With that collector, it fails a couple of times over 10k+ repeats.

@mayya-sharipova
Contributor

@benwtrent Thanks for raising this. It does indeed happen because of MultiLeafKnnCollector, where search threads exchange info about the globally collected results. Because it is not deterministic when each segment thread shares info with the global queue, we may get inconsistent results between runs.

So far, I could not find a way to make it deterministic.
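
A toy model may help show why the timing matters. The sketch below is not Lucene's `MultiLeafKnnCollector` logic; it assumes a deliberately simplified rule (a segment stops at the first candidate scoring below the current global k-th best score) and replaces real threads with an explicit segment order, so that each "schedule" can be replayed deterministically. The class name `SharedThresholdSketch` and the sample scores are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

class SharedThresholdSketch {
  /**
   * Visits each segment's candidates in the given segment order. Under the toy rule assumed
   * here, a segment is abandoned at the first candidate that scores below the current
   * global k-th best score.
   */
  static List<Float> search(float[][] segments, int[] segmentOrder, int k) {
    PriorityQueue<Float> globalTopK = new PriorityQueue<>(); // min-heap: head is the k-th best
    for (int seg : segmentOrder) {
      for (float score : segments[seg]) {
        boolean full = globalTopK.size() == k;
        if (full && score < globalTopK.peek()) {
          break; // not globally competitive: skip the rest of this segment
        }
        globalTopK.add(score);
        if (globalTopK.size() > k) {
          globalTopK.poll(); // keep only the k best seen so far
        }
      }
    }
    List<Float> result = new ArrayList<>(globalTopK);
    result.sort(Collections.reverseOrder()); // best score first
    return result;
  }

  public static void main(String[] args) {
    float[][] segments = {
      {0.9f, 0.8f},  // segment 0
      {0.5f, 0.95f}, // segment 1: its best hit sits behind a weak candidate
    };
    // Segment 0 fills the global queue first, so segment 1 stops at 0.5 and never reaches 0.95.
    System.out.println(search(segments, new int[] {0, 1}, 2)); // prints [0.9, 0.8]
    // Segment 1 goes first, so 0.95 is collected before the threshold rises.
    System.out.println(search(segments, new int[] {1, 0}, 2)); // prints [0.95, 0.9]
  }
}
```

The data and the query are identical in both calls; only the order in which segments publish to the shared queue differs, and that alone changes the top-k. With real threads that order is decided by the scheduler, which is consistent with the occasional failures reported above.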
