-
The case kthGreatest == 0 is handled in the next lines of kthGreatest, but not when accessing the data, because the comparison in the DocIdSetIterator of MatchHashesAndScoreQuery is done against candidates instead of kgr.numNonZero.
-
I think your understanding is correct until the kthGreatest method.
In MatchHashesAndScoreQuery, we iterate over the documents that match each hash and report each one to the counter. So if a document has zero hits, we never see it and never report it to the counter. That's at least my understanding of how it works. I'm happy to be wrong if it means we can make the search run faster :)
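A minimal sketch of the iteration described above, using plain arrays as a stand-in for Lucene postings. The class and method names here are hypothetical and this is not the elastiknn code. It only illustrates that documents matching no hash are never touched by the loop, while a dense array sized by maxDoc still holds a zero entry for them.

```java
public class HitCountSketch {
    // postings[h] holds the doc ids that contain query hash h
    // (an assumed stand-in for iterating a Lucene postings list).
    static short[] countHits(int[][] postings, int maxDoc) {
        short[] counts = new short[maxDoc]; // every entry starts at zero
        for (int[] docs : postings) {
            for (int doc : docs) {
                counts[doc]++; // only matching docs are ever reported
            }
        }
        return counts;
    }
}
```

Docs that never appear in any postings list keep a count of zero, so whether they can be selected later depends entirely on the threshold applied to the counts.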
-
Also, I did add a log, and I thought that the countHits function in elastiknn-lucene/src/main/java/org/apache/lucene/search/MatchHashesAndScoreQuery.java
-
This was fixed in #720, released in https://github.com/alexklibisz/elastiknn/releases/tag/8.15.0.1
-
If I have understood correctly, elastiknn processes queries as follows (please correct me if I'm wrong):
Let's assume K=4 (hash functions), L=10 (hash tables) and that the query vector is V. This implies that there are 10 groups of 4 planes to discriminate.
First, for each of the 10 hash tables, you compute a 4-bit vector (each bit indicates whether the query vector lies on the positive or negative side of the corresponding hash function). This gives 10 values to query (QVs), each of the form [num_hash, vector].
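The hashing step above can be sketched roughly as follows. This is an illustration of sign-of-dot-product hashing under the K and L assumed in this post, not elastiknn's actual implementation; the class name, method names, and the bit-packing order are all hypothetical.

```java
import java.util.Random;

public class HyperplaneLshSketch {
    // planes[t][k] is the k-th hyperplane of hash table t (assumed layout).
    // Returns one value per table, packing the K sign bits into an int.
    static int[] hash(float[] v, float[][][] planes) {
        int[] out = new int[planes.length];
        for (int t = 0; t < planes.length; t++) {
            int bits = 0;
            for (int k = 0; k < planes[t].length; k++) {
                float dot = 0f;
                for (int i = 0; i < v.length; i++) {
                    dot += planes[t][k][i] * v[i];
                }
                // 1 if v is on the positive side of the plane, else 0
                bits = (bits << 1) | (dot > 0f ? 1 : 0);
            }
            out[t] = bits;
        }
        return out;
    }

    // Draws L * K random Gaussian hyperplanes of the given dimension.
    static float[][][] randomPlanes(int L, int K, int dims, long seed) {
        Random rng = new Random(seed);
        float[][][] planes = new float[L][K][dims];
        for (int t = 0; t < L; t++)
            for (int k = 0; k < K; k++)
                for (int i = 0; i < dims; i++)
                    planes[t][k][i] = (float) rng.nextGaussian();
        return planes;
    }
}
```

With L=10 and K=4, hash returns 10 integers, each in the range 0 to 15; pairing each with its table index gives the [num_hash, vector] form described above.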
Then for each shard,
new ArrayHitCounter(reader.maxDoc())
In this last step you use kthGreatest(candidates) to get the top N documents. To do this, you build a table that gives the number of hits for each document, and you determine the minimum number of hits needed to be in the top N. The problem is that this minimum can be zero, so you are reporting documents that have no hits at all.
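To make the concern concrete, here is a small sketch. It uses a naive sort-based stand-in for kthGreatest rather than the real elastiknn method, and all names are hypothetical. It shows how the threshold drops to zero when fewer than N documents have hits, and what clamping it to one would change.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class KthGreatestSketch {
    // Naive stand-in: the k-th greatest count (k is 1-based), found by sorting.
    static int kthGreatest(short[] counts, int k) {
        short[] sorted = counts.clone();
        Arrays.sort(sorted); // ascending
        int idx = Math.max(0, sorted.length - k);
        return sorted[idx];
    }

    // Doc ids whose hit count is at least the threshold.
    static List<Integer> select(short[] counts, int threshold) {
        List<Integer> docs = new ArrayList<>();
        for (int d = 0; d < counts.length; d++) {
            if (counts[d] >= threshold) docs.add(d);
        }
        return docs;
    }

    public static void main(String[] args) {
        short[] counts = {1, 0, 3, 1, 0}; // docs 1 and 4 have no hits
        int threshold = kthGreatest(counts, 4); // only 3 docs have hits
        System.out.println(select(counts, threshold));
        System.out.println(select(counts, Math.max(threshold, 1)));
    }
}
```

In this toy input only three documents have hits, so asking for the top 4 yields a threshold of zero and every document passes the comparison; clamping the threshold to at least one keeps only the documents that matched some hash.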
I think that a minimum of one hit should be required for a document to be selected.
I would change this line in kthGreatest from
while (kthGreatest > 0) {
to
while (kthGreatest > 1) {
I don't know if you agree ...