Searchable.search doesn't return keyword's position information #224
Hmmm, yeah, interesting issue. The search integration aims to return entities whose associated content matches the search terms. That said, I can see how this would be useful. Perhaps the searchContent endpoint should return a result set that links to the entity and also supplies additional information about the match, including pageNumber, text position, relevancy and so on.
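To make the idea of a richer result set concrete, here is a minimal sketch of what one entry might carry: a link back to the entity's content plus page, position and relevancy. `ContentMatch` and `byRelevancy` are hypothetical names for illustration only, not part of Spring Content, and the field meanings (1-based pages, character offsets) are assumptions.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical result type: one entry per match instead of one per entity.
final class ContentMatch {
    final String contentId;  // links the match back to the entity's content
    final int pageNumber;    // 1-based page on which the term was found (assumed)
    final int position;      // character offset of the match within that page (assumed)
    final double relevancy;  // score reported by the search engine

    ContentMatch(String contentId, int pageNumber, int position, double relevancy) {
        this.contentId = contentId;
        this.pageNumber = pageNumber;
        this.position = position;
        this.relevancy = relevancy;
    }

    // Returns a copy sorted by relevancy (highest first), then by page and position.
    static List<ContentMatch> byRelevancy(List<ContentMatch> matches) {
        List<ContentMatch> sorted = new ArrayList<>(matches);
        sorted.sort(Comparator.comparingDouble((ContentMatch m) -> m.relevancy)
                .reversed()
                .thenComparingInt(m -> m.pageNumber)
                .thenComparingInt(m -> m.position));
        return sorted;
    }
}
```

Because one entity can match on several pages, a result set shaped like this would naturally contain several entries per entity, unlike a plain list of matching entities.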
@lmtoo you are using Elasticsearch, correct?
Hi @paulcwarren, I removed spring-content's Elasticsearch module and implemented a similar feature myself.
My TextExtractor looks like this: `interface TextExtractor { }`. Its extract method returns the document's words page by page: each element in the list holds one page's words, and each page's words are mapped to a DocumentPage instance, which has contentId, pageNumber and pageContent.
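A minimal Java sketch of the per-page design described in the comment above. The original TextExtractor body did not survive, so the `extract(InputStream)` signature returning one `String` per page is an assumption, as are the hypothetical `FormFeedTextExtractor` (which treats form feeds as page breaks) and the `DocumentPage.fromContent` helper; none of this is spring-content API.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Assumed signature: returns one element per page, holding that page's words.
interface TextExtractor {
    List<String> extract(InputStream content);
}

// Hypothetical extractor for UTF-8 text whose pages are separated by form feeds ('\f').
class FormFeedTextExtractor implements TextExtractor {
    @Override
    public List<String> extract(InputStream content) {
        try {
            String text = new String(content.readAllBytes(), StandardCharsets.UTF_8);
            return Arrays.asList(text.split("\f", -1)); // -1 keeps trailing empty pages
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

// One instance per page of a stored document.
class DocumentPage {
    final String contentId;   // id of the document's content
    final int pageNumber;     // 1-based
    final String pageContent; // the page's extracted words, indexed on their own

    DocumentPage(String contentId, int pageNumber, String pageContent) {
        this.contentId = contentId;
        this.pageNumber = pageNumber;
        this.pageContent = pageContent;
    }

    // Maps each extracted page to a DocumentPage for the given content.
    static List<DocumentPage> fromContent(String contentId, InputStream content, TextExtractor extractor) {
        List<String> pages = extractor.extract(content);
        List<DocumentPage> result = new ArrayList<>(pages.size());
        for (int i = 0; i < pages.size(); i++) {
            result.add(new DocumentPage(contentId, i + 1, pages.get(i)));
        }
        return result;
    }
}
```

With one DocumentPage per page, any full-text hit on a DocumentPage already carries its pageNumber, which is exactly the information a document-level search result lacks.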
I see. So you have a custom solution for the page-numbers part of it, then. That makes sense because, to the best of my knowledge, neither Elasticsearch nor Solr can provide page number information. The closest features they offer are term vectors (for position) and highlighting (for marked-up abstracts). Even then, I don't think SolrJ (the client API we use) supports term vectors. Plus, I have little to no experience of how accurate the position information is when it is taken from the extracted text and then applied to the original document content. That said, I am definitely happy to extend the Spring Content fulltext modules to support both term vectors and highlighting, and then we can see whether there is a customization for supporting page numbers, but I can't think how to do that cleanly at the moment. While I'm there, I will have a go at tackling your previous issue #223 too.
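Lucene-based term vectors can store, for each term, its token positions and character offsets. The sketch below computes the same kind of data in plain Java for one page of extracted text, purely to illustrate what position information looks like; it is not the Elasticsearch or Solr term vector API, and the hypothetical `TermPositions` class uses a simplified letters-and-digits tokenizer that will not match any particular analyzer. Note that the offsets point into the extracted text, which is why mapping them back onto the original document is the uncertain part.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: term-vector-like positions for one page of extracted text.
class TermPositions {
    // One occurrence of the term: token index plus character offsets.
    static final class Occurrence {
        final int position;     // 0-based index of the token within the page
        final int startOffset;  // inclusive character offset into the extracted text
        final int endOffset;    // exclusive character offset into the extracted text

        Occurrence(int position, int startOffset, int endOffset) {
            this.position = position;
            this.startOffset = startOffset;
            this.endOffset = endOffset;
        }
    }

    // Simplified tokenizer: a token is a run of Unicode letters and digits.
    private static final Pattern TOKEN = Pattern.compile("[\\p{L}\\p{N}]+");

    // Finds every case-insensitive occurrence of a single-word term.
    static List<Occurrence> find(String pageText, String term) {
        String wanted = term.toLowerCase(Locale.ROOT);
        List<Occurrence> hits = new ArrayList<>();
        Matcher m = TOKEN.matcher(pageText);
        int position = 0;
        while (m.find()) {
            if (m.group().toLowerCase(Locale.ROOT).equals(wanted)) {
                hits.add(new Occurrence(position, m.start(), m.end()));
            }
            position++;
        }
        return hits;
    }
}
```

For example, searching "Spring content, spring data" for "spring" yields token positions 0 and 2, at character ranges 0–6 and 16–22.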
So, here is where we are with this one. Spring Content Solr, Elasticsearch and REST now all support custom search types, allowing you to define your own result type to be returned from a …
I would like to understand your solution more, though, to see how we progress from here. If I understand it correctly, you have one DocumentPage for each page of a document, and the page's content is associated with that DocumentPage instance. It is unclear to me whether you still use searchContent to search that content, or whether you run some other search against the word index directly.
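To illustrate the second option in the question above (searching a word index directly rather than going through searchContent), here is a hypothetical in-memory inverted index whose keys are words and whose postings are page numbers. `PageIndex` is an illustrative name, not part of Spring Content or of the solution described in this thread; a real deployment would delegate this to the search engine rather than keep it in memory.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical in-memory inverted index: word -> page numbers containing it.
class PageIndex {
    private final Map<String, SortedSet<Integer>> postings = new HashMap<>();

    // Indexes one page's text under its 1-based page number.
    void addPage(int pageNumber, String pageContent) {
        for (String token : pageContent.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) { // split yields "" when the text starts with a separator
                postings.computeIfAbsent(token, k -> new TreeSet<>()).add(pageNumber);
            }
        }
    }

    // Returns the pages containing the keyword in ascending order (empty if none).
    SortedSet<Integer> pagesContaining(String keyword) {
        SortedSet<Integer> pages = postings.get(keyword.toLowerCase(Locale.ROOT));
        return pages == null ? Collections.emptySortedSet() : Collections.unmodifiableSortedSet(pages);
    }
}
```

Keying the postings by page rather than by document is what lets a lookup answer "which pages" directly, at the cost of indexing each document once per page.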
Searchable.search doesn't return the keyword's position information, such as pageNumber or text position.