Add knn result consistency test #14167

benwtrent · 2025-01-23T15:10:24Z

Inspired by some weird behavior I have seen, this PR adds a consistency test.

I found that it does indeed fail for some seeds.

Frustratingly, the seeded failures do not seem to be repeatable. However, running

    ./gradlew :lucene:core:test --tests "org.apache.lucene.search.TestSeededKnnFloatVectorQuery.testRandomConsistency" -Dtests.iters=1000

does produce failures, though not on every run. This seems to indicate some funky race condition.

Obviously, this shouldn't be merged until we figure out the consistency issue.

mikemccand · 2025-01-23T15:18:01Z

> Frustratingly, the seeded failures do not seem to be repeatable.

Hmm that is bad ... it means there is a test bug or test infra bug (separate from the scary bug this test is chasing!)?

Oh, maybe force SerialMergeScheduler on your RandomIndexWriter? ConcurrentMergeScheduler (CMS, Lucene's default, which RIW will sometimes pick) launches threads, and we don't know how to make the JVM's/OS's thread scheduling deterministic, so that might explain the non-reproducibility. E.g.:

    Random r = random();
    IndexWriterConfig iwc = LuceneTestCase.newIndexWriterConfig(r, new MockAnalyzer(r));
    iwc.setMergeScheduler(new SerialMergeScheduler()); // merges run on the calling thread, no background merge threads
    RandomIndexWriter riw = new RandomIndexWriter(r, dir, iwc);

or so?

msokolov · 2025-01-23T15:32:06Z

As for the reproducibility problem, that may be caused by concurrent HNSW merging, which is nondeterministic.

benwtrent · 2025-01-23T15:36:45Z

@msokolov @mikemccand maybe the consistency I am testing isn't clear.

First: Index a bunch of vectors
Second: do a single query on a static index to get the top-k
Repeat-N: verify the exact same query on the exact same index without changes results in the same docs and scores.

I am not sure any merging or indexing-time changes would affect this, no?
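The three steps above can be sketched without Lucene at all. The following is a minimal, hypothetical illustration, not the test in this PR: `KnnConsistencySketch`, `Hit`, `topK`, and `isConsistent` are made-up names, and a brute-force dot-product scan stands in for the real HNSW search. It only shows the shape of the check: take one baseline result on static data, then require every repeat to return the same docs and scores.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.function.Supplier;

public class KnnConsistencySketch {

  /** One search hit: document id plus its similarity score. */
  record Hit(int doc, float score) {}

  /** Generates n random vectors of the given dimension (stand-in for the indexed data). */
  static float[][] randomVectors(Random r, int n, int dim) {
    float[][] vectors = new float[n][dim];
    for (float[] v : vectors) {
      for (int i = 0; i < dim; i++) {
        v[i] = r.nextFloat();
      }
    }
    return vectors;
  }

  /** Brute-force top-k by dot product; ties go to the lower doc id so the order is total. */
  static List<Hit> topK(float[][] vectors, float[] query, int k) {
    List<Hit> hits = new ArrayList<>();
    for (int doc = 0; doc < vectors.length; doc++) {
      float score = 0f;
      for (int i = 0; i < query.length; i++) {
        score += vectors[doc][i] * query[i];
      }
      hits.add(new Hit(doc, score));
    }
    hits.sort((a, b) -> {
      int c = Float.compare(b.score(), a.score());
      return c != 0 ? c : Integer.compare(a.doc(), b.doc());
    });
    return new ArrayList<>(hits.subList(0, Math.min(k, hits.size())));
  }

  /** Runs the search once for a baseline, then checks that every repeat returns identical hits. */
  static boolean isConsistent(Supplier<List<Hit>> search, int repeats) {
    List<Hit> expected = search.get();
    for (int i = 0; i < repeats; i++) {
      if (!expected.equals(search.get())) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    Random r = new Random(42);
    float[][] vectors = randomVectors(r, 500, 16); // First: index a bunch of vectors
    float[] query = randomVectors(r, 1, 16)[0];    // Second: fix a single query
    // Repeat-N: the same query on the same data must return the same docs and scores.
    System.out.println(isConsistent(() -> topK(vectors, query, 10), 100));
  }
}
```

Because the data is never modified between repeats, nothing that happened at indexing or merge time can change the outcome here; any inconsistency would have to come from the search path itself.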

msokolov · 2025-01-23T15:39:07Z

I think our comments relate to the observation that the test does not reproducibly fail with the same seed

benwtrent · 2025-01-23T15:41:15Z

> I think our comments relate to the observation that the test does not reproducibly fail with the same seed

🤦 for sure. Let me see if I can shore it up.

benwtrent · 2025-01-23T16:46:16Z

OK, I cleaned it all up and now have two separate tests, one multi-threaded and one single-threaded.

The multi-threaded one is the only one that fails periodically, which explains the difficulty in replicating. Threads might be racing to explore their segments first, and thus stop exploring other graphs sooner than in other runs.

As for the single-threaded test, I haven't had it fail in tens of thousands of runs. That doesn't 100% mean there isn't an issue there as well; I just haven't had a failure yet.

benwtrent · 2025-01-23T17:20:36Z

OK, if I change the code to never use MultiLeafKnnCollector, the multi-threaded consistency test passes. But with that collector in use, it fails a couple of times over 10k+ repeats.

mayya-sharipova · 2025-01-23T20:46:31Z

@benwtrent Thanks for raising this. This indeed happens because of MultiLeafKnnCollector and search threads exchanging information about the globally collected results. Because it is not deterministic when each segment thread shares its information with the global queue, we may get inconsistent results between runs.

So far, I could not find a way to make it deterministic.
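The effect can be reproduced in a deliberately simplified toy model. The sketch below is an assumption-laden illustration and not Lucene's MultiLeafKnnCollector: `SharedThresholdSketch` and `search` are hypothetical names, each segment is reduced to the list of scores its traversal would visit in order, and the order in which segments run stands in for thread timing. A segment stops exploring as soon as it visits a score below the current k-th best score in the shared queue.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

public class SharedThresholdSketch {

  /**
   * Searches segments one after another in the given order, sharing one global top-k queue.
   * Each inner array lists a segment's scores in the order its traversal would visit them.
   */
  static List<Float> search(float[][] segments, int[] order, int k) {
    // Min-heap: once it holds k scores, its head is the global k-th best score.
    PriorityQueue<Float> global = new PriorityQueue<>();
    for (int seg : order) {
      for (float score : segments[seg]) {
        if (global.size() == k && score < global.peek()) {
          break; // early termination: this segment gives up on its remaining candidates
        }
        global.add(score);
        if (global.size() > k) {
          global.poll(); // drop the weakest score to keep only the top k
        }
      }
    }
    List<Float> result = new ArrayList<>(global);
    result.sort(Collections.reverseOrder());
    return result;
  }

  public static void main(String[] args) {
    // Segment 1 hides its best score (0.95) behind a weak one (0.3).
    float[][] segments = {{0.9f, 0.8f, 0.1f}, {0.3f, 0.95f}};
    System.out.println(search(segments, new int[] {0, 1}, 2)); // [0.9, 0.8]
    System.out.println(search(segments, new int[] {1, 0}, 2)); // [0.95, 0.9]
  }
}
```

When segment 0 fills the shared queue first, segment 1 sees 0.3 fall below the global threshold of 0.8 and stops before reaching 0.95. When segment 1 runs first, 0.95 is collected. Same data, same query, different top-k, depending only on which segment got to publish its results first.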

Add knn result consistency test

ca22de6

iter

2c19484
