Dev minor #1574

emrgnt-cmplxty · 2024-11-11T23:25:26Z

Important

This pull request enhances search capabilities, updates database schemas, and improves SDK and provider functionalities for better document handling and search operations.

Search Enhancements:
- Introduced SearchSettings to replace VectorSearchSettings and DocumentSearchSettings.
- Updated searchDocuments in r2rClient.ts to use vector_search_settings.
- Added search_documents method in PostgresDocumentHandler for semantic, full-text, and hybrid search.
Database and Schema Updates:
- Added summary and summary_embedding columns to document_info table in migrate_to_document_search.py.
- Created doc_search_vector for full-text search in migrate_to_document_search.py.
- Renamed tables and added vector support in migrate_to_asyncpg.py.
SDK and API Changes:
- Updated RetrievalMixins and IngestionMixins to support new search settings and document ingestion.
- Modified DocumentOverviewResponse to include summary.
Provider Updates:
- Enhanced LiteLLMEmbeddingProvider and OllamaEmbeddingProvider to support reranking.
- Added async reranking in LiteLLMEmbeddingProvider.
Miscellaneous:
- Updated pyproject.toml with new dependencies.
- Incremented version in package.json to 0.3.16.

^{This description was created by}^{for 116c7e0. It will automatically update as commits are pushed.}

* adds document summary to ingestion pipeline * cleanup impl * new hybrid document search * implement hybrid document search

vercel · 2024-11-11T23:26:58Z

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name	Status	Preview	Comments	Updated (UTC)
yc_demo	✅ Ready (Inspect)	Visit Preview	💬 Add feedback	Nov 12, 2024 8:52pm
yc-demo	✅ Ready (Inspect)	Visit Preview	💬 Add feedback	Nov 12, 2024 8:52pm

1 Skipped Deployment

Name	Status	Preview	Comments	Updated (UTC)
recommendation_platform	⬜️ Ignored (Inspect)			Nov 12, 2024 8:52pm

* adds document summary to ingestion pipeline * cleanup impl * new hybrid document search * implement hybrid document search * add migration script

* make the summary change non-breaking * rollbk

* tweak downgrade * fix js sdk * fix js sdk * fix upgrade logic * up

ellipsis-dev

👍 Looks good to me! Reviewed everything up to 116c7e0 in 1 minute and 29 seconds

More details

Looked at 3329 lines of code in 58 files
Skipped 2 files when reviewing.
Skipped posting 8 drafted comments based on config settings.

1. py/core/providers/embeddings/litellm.py:217

Draft comment:
Consider using aiohttp for asynchronous HTTP requests in the arerank method to avoid blocking the event loop.
Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable:
The comment is redundant because the arerank method already uses aiohttp for asynchronous HTTP requests. The suggestion to use aiohttp is not needed as the implementation already follows this practice.
I might be missing some subtlety in the comment, such as a specific aspect of aiohttp usage that could be improved. However, the comment seems to be a general suggestion rather than a specific improvement.
The comment does not specify any particular improvement or issue with the current aiohttp usage, so it seems unnecessary.
The comment should be deleted because it suggests something that is already implemented, making it redundant.

2. py/core/providers/embeddings/litellm.py:297

Draft comment:
Consider handling specific exceptions related to HTTP requests, such as aiohttp.ClientResponseError, to provide more precise error handling.
Reason this comment was not posted:
Comment looked like it was already resolved.

3. py/core/providers/embeddings/ollama.py:185

Draft comment:
The rerank and arerank methods currently return the input results without modification. Ensure this is the intended behavior or implement the necessary logic if reranking is required.
Reason this comment was not posted:
Confidence changes required: 50%
In py/core/providers/embeddings/ollama.py, the rerank and arerank methods are implemented but do not perform any operations. This might be intentional, but it's worth noting in case functionality is expected.

4. py/core/providers/embeddings/openai.py:225

Draft comment:
The rerank and arerank methods currently return the input results without modification. Ensure this is the intended behavior or implement the necessary logic if reranking is required.
Reason this comment was not posted:
Confidence changes required: 50%
In py/core/providers/embeddings/openai.py, the rerank and arerank methods are implemented but do not perform any operations. This might be intentional, but it's worth noting in case functionality is expected.

5. py/migrations/versions/2fac23e4d91b_migrate_to_document_search.py:87

Draft comment:
Consider parameterizing the file name document_summaries.json to allow flexibility in specifying different file paths or names.
Reason this comment was not posted:
Confidence changes required: 40%
In py/migrations/versions/2fac23e4d91b_migrate_to_document_search.py, the async_generate_all_summaries function uses a hardcoded file name document_summaries.json. This could be parameterized for flexibility.

6. py/migrations/versions/2fac23e4d91b_migrate_to_document_search.py:122

Draft comment:
Consider parameterizing the limit for fetching document chunks to allow flexibility in processing different batch sizes.
Reason this comment was not posted:
Confidence changes required: 40%
In py/migrations/versions/2fac23e4d91b_migrate_to_document_search.py, the async_generate_all_summaries function uses a hardcoded limit of 10 for fetching document chunks. This could be parameterized for flexibility.

7. py/migrations/versions/d342e632358a_migrate_to_asyncpg.py:154

Draft comment:
Ensure that the old table name is correctly specified. It currently defaults to the project name, which might not be accurate in all cases.
Reason this comment was not posted:
Comment did not seem useful.

8. py/sdk/mixins/ingestion.py:41

Draft comment:
The outer ExitStack is unnecessary since another ExitStack is opened inside. Consider removing the outer ExitStack.
Reason this comment was not posted:
Confidence changes required: 50%
In py/sdk/mixins/ingestion.py, the ingest_files method has a nested ExitStack which is unnecessary. The outer ExitStack can be removed.

Workflow ID: wflow_0D81OfcQhgHplKbJ

You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

emrgnt-cmplxty and others added 3 commits November 11, 2024 11:21

add add-hoc rerank implementation to embedding, add async rerank (#1572)

6ee6d9e

add HF defaults

1bc3cee

Feature/add document summary to ingestion (#1573)

c3a0273

* adds document summary to ingestion pipeline * cleanup impl * new hybrid document search * implement hybrid document search

vercel bot deployed to Preview – yc_demo November 11, 2024 23:26 View deployment

vercel bot deployed to Preview – yc-demo November 11, 2024 23:27 View deployment

Feature/add document summary to ingestion (#1575)

960543d

* adds document summary to ingestion pipeline * cleanup impl * new hybrid document search * implement hybrid document search * add migration script

vercel bot deployed to Preview – yc-demo November 12, 2024 01:40 View deployment

vercel bot deployed to Preview – yc_demo November 12, 2024 01:41 View deployment

make the summary change non-breaking (#1576)

a8e05de

* make the summary change non-breaking * rollbk

vercel bot deployed to Preview – yc_demo November 12, 2024 03:30 View deployment

vercel bot deployed to Preview – yc-demo November 12, 2024 03:30 View deployment

up

a7bed5c

vercel bot deployed to Preview – yc-demo November 12, 2024 03:42 View deployment

vercel bot deployed to Preview – yc_demo November 12, 2024 03:43 View deployment

Feature/tweak downgrade logic (#1577)

116c7e0

* tweak downgrade * fix js sdk * fix js sdk * fix upgrade logic * up

vercel bot deployed to Preview – yc-demo November 12, 2024 20:51 View deployment

vercel bot deployed to Preview – yc_demo November 12, 2024 20:52 View deployment

emrgnt-cmplxty marked this pull request as ready for review November 13, 2024 02:15

emrgnt-cmplxty merged commit 2378f58 into main Nov 13, 2024
22 of 23 checks passed

ellipsis-dev bot reviewed Nov 13, 2024

View reviewed changes

emrgnt-cmplxty deleted the dev-minor branch December 5, 2024 00:47

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Dev minor #1574

Dev minor #1574

emrgnt-cmplxty commented Nov 11, 2024 •

edited by ellipsis-dev bot

Loading

vercel bot commented Nov 11, 2024 •

edited

Loading

ellipsis-dev bot left a comment

Dev minor #1574

Dev minor #1574

Conversation

emrgnt-cmplxty commented Nov 11, 2024 • edited by ellipsis-dev bot Loading

vercel bot commented Nov 11, 2024 • edited Loading

ellipsis-dev bot left a comment

Choose a reason for hiding this comment

emrgnt-cmplxty commented Nov 11, 2024 •

edited by ellipsis-dev bot

Loading

vercel bot commented Nov 11, 2024 •

edited

Loading