Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dev minor #1574

Merged
merged 7 commits into from
Nov 13, 2024
Merged

Dev minor #1574

merged 7 commits into from
Nov 13, 2024

Conversation

emrgnt-cmplxty
Copy link
Contributor

@emrgnt-cmplxty emrgnt-cmplxty commented Nov 11, 2024

Important

This pull request enhances search capabilities, updates database schemas, and improves SDK and provider functionalities for better document handling and search operations.

  • Search Enhancements:
    • Introduced SearchSettings to replace VectorSearchSettings and DocumentSearchSettings.
    • Updated searchDocuments in r2rClient.ts to use vector_search_settings.
    • Added search_documents method in PostgresDocumentHandler for semantic, full-text, and hybrid search.
  • Database and Schema Updates:
    • Added summary and summary_embedding columns to document_info table in migrate_to_document_search.py.
    • Created doc_search_vector for full-text search in migrate_to_document_search.py.
    • Renamed tables and added vector support in migrate_to_asyncpg.py.
  • SDK and API Changes:
    • Updated RetrievalMixins and IngestionMixins to support new search settings and document ingestion.
    • Modified DocumentOverviewResponse to include summary.
  • Provider Updates:
    • Enhanced LiteLLMEmbeddingProvider and OllamaEmbeddingProvider to support reranking.
    • Added async reranking in LiteLLMEmbeddingProvider.
  • Miscellaneous:
    • Updated pyproject.toml with new dependencies.
    • Incremented version in package.json to 0.3.16.

This description was created by Ellipsis for 116c7e0. It will automatically update as commits are pushed.

emrgnt-cmplxty and others added 3 commits November 11, 2024 11:21
* adds document summary to ingestion pipeline

* cleanup impl

* new hybrid document search

* implement hybrid document search
Copy link

vercel bot commented Nov 11, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
yc_demo ✅ Ready (Inspect) Visit Preview 💬 Add feedback Nov 12, 2024 8:52pm
yc-demo ✅ Ready (Inspect) Visit Preview 💬 Add feedback Nov 12, 2024 8:52pm
1 Skipped Deployment
Name Status Preview Comments Updated (UTC)
recommendation_platform ⬜️ Ignored (Inspect) Nov 12, 2024 8:52pm

* adds document summary to ingestion pipeline

* cleanup impl

* new hybrid document search

* implement hybrid document search

* add migration script
* make the summary change non-breaking

* rollbk
* tweak downgrade

* fix js sdk

* fix js sdk

* fix upgrade logic

* up
@emrgnt-cmplxty emrgnt-cmplxty marked this pull request as ready for review November 13, 2024 02:15
@emrgnt-cmplxty emrgnt-cmplxty merged commit 2378f58 into main Nov 13, 2024
22 of 23 checks passed
Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Looks good to me! Reviewed everything up to 116c7e0 in 1 minute and 29 seconds

More details
  • Looked at 3329 lines of code in 58 files
  • Skipped 2 files when reviewing.
  • Skipped posting 8 drafted comments based on config settings.
1. py/core/providers/embeddings/litellm.py:217
  • Draft comment:
    Consider using aiohttp for asynchronous HTTP requests in the arerank method to avoid blocking the event loop.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable:
    The comment is redundant because the arerank method already uses aiohttp for asynchronous HTTP requests. The suggestion to use aiohttp is not needed as the implementation already follows this practice.
    I might be missing some subtlety in the comment, such as a specific aspect of aiohttp usage that could be improved. However, the comment seems to be a general suggestion rather than a specific improvement.
    The comment does not specify any particular improvement or issue with the current aiohttp usage, so it seems unnecessary.
    The comment should be deleted because it suggests something that is already implemented, making it redundant.
2. py/core/providers/embeddings/litellm.py:297
  • Draft comment:
    Consider handling specific exceptions related to HTTP requests, such as aiohttp.ClientResponseError, to provide more precise error handling.
  • Reason this comment was not posted:
    Comment looked like it was already resolved.
3. py/core/providers/embeddings/ollama.py:185
  • Draft comment:
    The rerank and arerank methods currently return the input results without modification. Ensure this is the intended behavior or implement the necessary logic if reranking is required.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    In py/core/providers/embeddings/ollama.py, the rerank and arerank methods are implemented but do not perform any operations. This might be intentional, but it's worth noting in case functionality is expected.
4. py/core/providers/embeddings/openai.py:225
  • Draft comment:
    The rerank and arerank methods currently return the input results without modification. Ensure this is the intended behavior or implement the necessary logic if reranking is required.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    In py/core/providers/embeddings/openai.py, the rerank and arerank methods are implemented but do not perform any operations. This might be intentional, but it's worth noting in case functionality is expected.
5. py/migrations/versions/2fac23e4d91b_migrate_to_document_search.py:87
  • Draft comment:
    Consider parameterizing the file name document_summaries.json to allow flexibility in specifying different file paths or names.
  • Reason this comment was not posted:
    Confidence changes required: 40%
    In py/migrations/versions/2fac23e4d91b_migrate_to_document_search.py, the async_generate_all_summaries function uses a hardcoded file name document_summaries.json. This could be parameterized for flexibility.
6. py/migrations/versions/2fac23e4d91b_migrate_to_document_search.py:122
  • Draft comment:
    Consider parameterizing the limit for fetching document chunks to allow flexibility in processing different batch sizes.
  • Reason this comment was not posted:
    Confidence changes required: 40%
    In py/migrations/versions/2fac23e4d91b_migrate_to_document_search.py, the async_generate_all_summaries function uses a hardcoded limit of 10 for fetching document chunks. This could be parameterized for flexibility.
7. py/migrations/versions/d342e632358a_migrate_to_asyncpg.py:154
  • Draft comment:
    Ensure that the old table name is correctly specified. It currently defaults to the project name, which might not be accurate in all cases.
  • Reason this comment was not posted:
    Comment did not seem useful.
8. py/sdk/mixins/ingestion.py:41
  • Draft comment:
    The outer ExitStack is unnecessary since another ExitStack is opened inside. Consider removing the outer ExitStack.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    In py/sdk/mixins/ingestion.py, the ingest_files method has a nested ExitStack which is unnecessary. The outer ExitStack can be removed.

Workflow ID: wflow_0D81OfcQhgHplKbJ


You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

@emrgnt-cmplxty emrgnt-cmplxty deleted the dev-minor branch December 5, 2024 00:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant