Dev minor #1574
* adds document summary to ingestion pipeline * cleanup impl * new hybrid document search * implement hybrid document search
* adds document summary to ingestion pipeline * cleanup impl * new hybrid document search * implement hybrid document search * add migration script
* make the summary change non-breaking * rollbk
* tweak downgrade * fix js sdk * fix js sdk * fix upgrade logic * up
👍 Looks good to me! Reviewed everything up to 116c7e0 in 1 minute and 29 seconds
More details
- Looked at 3329 lines of code in 58 files
- Skipped 2 files when reviewing.
- Skipped posting 8 drafted comments based on config settings.
1. py/core/providers/embeddings/litellm.py:217
- Draft comment:
Consider using `aiohttp` for asynchronous HTTP requests in the `arerank` method to avoid blocking the event loop.
- Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable:
The comment is redundant because the `arerank` method already uses `aiohttp` for asynchronous HTTP requests. The suggestion to use `aiohttp` is not needed as the implementation already follows this practice.
I might be missing some subtlety in the comment, such as a specific aspect of `aiohttp` usage that could be improved. However, the comment seems to be a general suggestion rather than a specific improvement.
The comment does not specify any particular improvement or issue with the current `aiohttp` usage, so it seems unnecessary.
The comment should be deleted because it suggests something that is already implemented, making it redundant.
2. py/core/providers/embeddings/litellm.py:297
- Draft comment:
Consider handling specific exceptions related to HTTP requests, such as `aiohttp.ClientResponseError`, to provide more precise error handling.
- Reason this comment was not posted:
Comment looked like it was already resolved.
3. py/core/providers/embeddings/ollama.py:185
- Draft comment:
The `rerank` and `arerank` methods currently return the input results without modification. Ensure this is the intended behavior or implement the necessary logic if reranking is required.
- Reason this comment was not posted:
Confidence changes required: 50%
In `py/core/providers/embeddings/ollama.py`, the `rerank` and `arerank` methods are implemented but do not perform any operations. This might be intentional, but it's worth noting in case functionality is expected.
4. py/core/providers/embeddings/openai.py:225
- Draft comment:
The `rerank` and `arerank` methods currently return the input results without modification. Ensure this is the intended behavior or implement the necessary logic if reranking is required.
- Reason this comment was not posted:
Confidence changes required: 50%
In `py/core/providers/embeddings/openai.py`, the `rerank` and `arerank` methods are implemented but do not perform any operations. This might be intentional, but it's worth noting in case functionality is expected.
5. py/migrations/versions/2fac23e4d91b_migrate_to_document_search.py:87
- Draft comment:
Consider parameterizing the file name `document_summaries.json` to allow flexibility in specifying different file paths or names.
- Reason this comment was not posted:
Confidence changes required: 40%
In `py/migrations/versions/2fac23e4d91b_migrate_to_document_search.py`, the `async_generate_all_summaries` function uses a hardcoded file name `document_summaries.json`. This could be parameterized for flexibility.
6. py/migrations/versions/2fac23e4d91b_migrate_to_document_search.py:122
- Draft comment:
Consider parameterizing the limit for fetching document chunks to allow flexibility in processing different batch sizes.
- Reason this comment was not posted:
Confidence changes required: 40%
In `py/migrations/versions/2fac23e4d91b_migrate_to_document_search.py`, the `async_generate_all_summaries` function uses a hardcoded limit of 10 for fetching document chunks. This could be parameterized for flexibility.
7. py/migrations/versions/d342e632358a_migrate_to_asyncpg.py:154
- Draft comment:
Ensure that the old table name is correctly specified. It currently defaults to the project name, which might not be accurate in all cases.
- Reason this comment was not posted:
Comment did not seem useful.
8. py/sdk/mixins/ingestion.py:41
- Draft comment:
The outer `ExitStack` is unnecessary since another `ExitStack` is opened inside. Consider removing the outer `ExitStack`.
- Reason this comment was not posted:
Confidence changes required: 50%
In `py/sdk/mixins/ingestion.py`, the `ingest_files` method has a nested `ExitStack` which is unnecessary. The outer `ExitStack` can be removed.
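Draft comment 8 above concerns a nested `ExitStack`. As a minimal sketch of the single-stack pattern it points toward, one `ExitStack` can own every file handle. The helper name `read_all` is hypothetical and is not taken from `ingest_files`; the actual method may manage its files differently.

```python
from contextlib import ExitStack


def read_all(paths):
    """Hypothetical helper: open every path under one ExitStack and read it."""
    with ExitStack() as stack:
        # enter_context registers each handle with the stack, so all of them
        # are closed together when the with-block exits, even on error.
        handles = [stack.enter_context(open(path, "rb")) for path in paths]
        return [handle.read() for handle in handles]
```

Because the single stack already closes every registered handle, wrapping it in a second `ExitStack` adds nothing.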
Workflow ID: wflow_0D81OfcQhgHplKbJ
Important
This pull request enhances search capabilities, updates database schemas, and improves SDK and provider functionalities for better document handling and search operations.
- Adds `SearchSettings` to replace `VectorSearchSettings` and `DocumentSearchSettings`.
- Updates `searchDocuments` in `r2rClient.ts` to use `vector_search_settings`.
- Adds a `search_documents` method in `PostgresDocumentHandler` for semantic, full-text, and hybrid search.
- Adds `summary` and `summary_embedding` columns to the `document_info` table in `migrate_to_document_search.py`.
- Adds `doc_search_vector` for full-text search in `migrate_to_document_search.py`.
- Changes `migrate_to_asyncpg.py`.
- Updates `RetrievalMixins` and `IngestionMixins` to support new search settings and document ingestion.
- Updates `DocumentOverviewResponse` to include `summary`.
- Updates `LiteLLMEmbeddingProvider` and `OllamaEmbeddingProvider` to support reranking.
- Changes `LiteLLMEmbeddingProvider`.
- Updates `pyproject.toml` with new dependencies.
- Updates the version in `package.json` to `0.3.16`.

This description was created by Ellipsis for 116c7e0. It will automatically update as commits are pushed.
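The summary mentions semantic, full-text, and hybrid search but does not say how the two rankings are combined. As an illustration only, reciprocal rank fusion is one common way to merge a semantic ranking with a full-text ranking. The function name, the constant `k=60`, and the document ids below are assumptions for this sketch, not details taken from `PostgresDocumentHandler`.

```python
def reciprocal_rank_fusion(semantic_ids, full_text_ids, k=60):
    """Merge two ranked id lists; ids ranked high in either list score higher."""
    scores = {}
    for ranking in (semantic_ids, full_text_ids):
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank) to the document's fused score.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by descending fused score, breaking ties by id for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical ids: "b" ranks well in both lists, so it comes first.
fused = reciprocal_rank_fusion(["a", "b", "c"], ["b", "d", "a"])
```

A document that appears in only one ranking still receives a score, so results found by just one of the two searches are kept rather than dropped.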