In NgramPostingLists we take the union of the results for the different n-grams. This means that if the query changes from "atlantis" to "an atlantis", the number of documents returned is much larger.
Do we want this?
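To make the effect concrete, here is a minimal sketch of union-based retrieval over a toy inverted index. It is not the actual NgramPostingLists code: the names `build_index` and `union_lookup`, the three sample documents, and the choice of whitespace-separated words as the n-grams are all assumptions made for illustration.

```python
from collections import defaultdict


def build_index(docs):
    """Map each token (a word unigram here) to the set of doc ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return index


def union_lookup(index, query):
    """Return every doc id that matches at least one token of the query."""
    result = set()
    for token in query.lower().split():
        result |= index.get(token, set())
    return result


# Hypothetical corpus: only doc 1 is about atlantis, but "an" is common.
docs = {
    1: "the lost city of atlantis",
    2: "an apple a day",
    3: "an introduction to search engines",
}
index = build_index(docs)

narrow = union_lookup(index, "atlantis")     # only doc 1
wide = union_lookup(index, "an atlantis")    # docs 1, 2 and 3
```

Adding the common token "an" pulls in every document that contains it, which is why the result set grows as soon as the query gets longer.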
Good point. I think we should still take the union, because if we take the intersection we might start dropping documents that are really great matches but don't contain every single n-gram from the query.
For example, suppose you have:
query = "what kind of company is google"
doc1 = "google is a technology company"
Whilst doc1 is a great match, it would be dropped for not matching every single n-gram from the query.
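The same example as a small sketch, assuming word unigrams as the n-grams (the real tokenization may differ). The helper names `ngrams`, `matches_any`, and `matches_all` are hypothetical and only serve to contrast the two strategies.

```python
def ngrams(text, n=1):
    """Return the set of word n-grams in text (lowercased, split on whitespace)."""
    tokens = text.lower().split()
    return {" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def matches_any(query, doc, n=1):
    """Union semantics: keep the doc if it shares at least one n-gram with the query."""
    return bool(ngrams(query, n) & ngrams(doc, n))


def matches_all(query, doc, n=1):
    """Intersection semantics: keep the doc only if it contains every query n-gram."""
    return ngrams(query, n) <= ngrams(doc, n)


query = "what kind of company is google"
doc1 = "google is a technology company"

shared = ngrams(query) & ngrams(doc1)    # n-grams present in both
missing = ngrams(query) - ngrams(doc1)   # query n-grams that doc1 lacks

kept_by_union = matches_any(query, doc1)          # True
kept_by_intersection = matches_all(query, doc1)   # False
```

doc1 shares "company", "is", and "google" with the query, yet intersection semantics would discard it because "what", "kind", and "of" are absent.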
I realize that taking the union means we will get more results for longer queries, but it's probably better not to drop potentially very relevant results. Taking the intersection might give better precision, but taking the union will almost certainly give better recall. In the precision/recall trade-off, I suppose we want recall more, since our answer-extraction component should (hopefully) take care of finding the best answer amongst the top results.