dev-minor (#1509)

* adding bin support and make it default (#1508) * Feature/tweak actions (#1507) * up * tweak actions * adding bin sup and making it default * tested and vetted * up (#1510) * up * set verification to default false * cleanup (#1512) * cleanup * cleanup prompt mgmt * up * cleanup printout * cleanup new parser logic, set vlm as default for all providers * allow user to re-override * modify exp backoff implementation (#1513) * Feature/tweak actions (#1507) * up * tweak actions * modify exp backoff impl --------- Co-authored-by: emrgnt-cmplxty <68796651+emrgnt-cmplxty@users.noreply.github.com> * Patch/touchups (#1515) * cleanup * cleanup prompt mgmt * up * cleanup printout * cleanup new parser logic, set vlm as default for all providers * allow user to re-override * add touchups * add extra parsers (#1516) * add extra parsers (#1518) * minor fixes (#1514) * Feature/add back ollama provider (#1522) * add extra parsers * add back ollama * rvert auth workflow * Feature/add prompt tests and cleanup (#1523) * add extra parsers * add prompt tests, cleanup * add prompt tests, cleanup * merge * set mock console as default * set mock console as default * fix config * Update community model (#1524) * Feature/tweak actions (#1507) * up * tweak actions * Sync JS SDK, Harmonize Python SDK KG Methods (#1511) * Feature/move logging (#1492) * move logging provider out * move logging provider to own directory, remove singleton * cleanup * fix refactoring tweak (#1496) * Fix JSON serialization and Prompt ID Bugs for Prompts (#1491) * Bug in get prompts * Add tests * Prevent verbose logging on standup * Remove kg as required key in config, await get_all_prompts * Remove reference to fragment id * comment out ingestion * complete logging port (#1499) * Feature/dev rebased (#1500) * Feature/move logging (#1493) * move logging provider out * move logging provider to own directory, remove singleton * cleanup * Update js package (#1498) * fix refactoring tweak (#1496) * Fix JSON serialization 
and Prompt ID Bugs for Prompts (#1491) * Bug in get prompts * Add tests * Prevent verbose logging on standup * Remove kg as required key in config, await get_all_prompts * Remove reference to fragment id * comment out ingestion * complete logging port (#1499) --------- Co-authored-by: Nolan Tremelling <34580718+NolanTrem@users.noreply.github.com> * Fix handling for R2R exceptions (#1501) * fix doc test (#1502) * Harmonize python SDK KG methods for optional params, add missing JS methods --------- Co-authored-by: emrgnt-cmplxty <68796651+emrgnt-cmplxty@users.noreply.github.com> Co-authored-by: emrgnt-cmplxty <owen@algofi.org> * Clean up pagination and offset around KG (#1519) * Move to R2R light for integration testing (#1521) * Update community model --------- Co-authored-by: emrgnt-cmplxty <68796651+emrgnt-cmplxty@users.noreply.github.com> Co-authored-by: emrgnt-cmplxty <owen@algofi.org> * Patch/fix import bleed (#1526) * Feature/tweak actions (#1507) * up * tweak actions * Sync JS SDK, Harmonize Python SDK KG Methods (#1511) * Feature/move logging (#1492) * move logging provider out * move logging provider to own directory, remove singleton * cleanup * fix refactoring tweak (#1496) * Fix JSON serialization and Prompt ID Bugs for Prompts (#1491) * Bug in get prompts * Add tests * Prevent verbose logging on standup * Remove kg as required key in config, await get_all_prompts * Remove reference to fragment id * comment out ingestion * complete logging port (#1499) * Feature/dev rebased (#1500) * Feature/move logging (#1493) * move logging provider out * move logging provider to own directory, remove singleton * cleanup * Update js package (#1498) * fix refactoring tweak (#1496) * Fix JSON serialization and Prompt ID Bugs for Prompts (#1491) * Bug in get prompts * Add tests * Prevent verbose logging on standup * Remove kg as required key in config, await get_all_prompts * Remove reference to fragment id * comment out ingestion * complete logging port (#1499) 
--------- Co-authored-by: Nolan Tremelling <34580718+NolanTrem@users.noreply.github.com> * Fix handling for R2R exceptions (#1501) * fix doc test (#1502) * Harmonize python SDK KG methods for optional params, add missing JS methods --------- Co-authored-by: emrgnt-cmplxty <68796651+emrgnt-cmplxty@users.noreply.github.com> Co-authored-by: emrgnt-cmplxty <owen@algofi.org> * Clean up pagination and offset around KG (#1519) * Move to R2R light for integration testing (#1521) --------- Co-authored-by: Nolan Tremelling <34580718+NolanTrem@users.noreply.github.com> * Patch/fix import bleed (#1527) * Feature/tweak actions (#1507) * up * tweak actions * Sync JS SDK, Harmonize Python SDK KG Methods (#1511) * Feature/move logging (#1492) * move logging provider out * move logging provider to own directory, remove singleton * cleanup * fix refactoring tweak (#1496) * Fix JSON serialization and Prompt ID Bugs for Prompts (#1491) * Bug in get prompts * Add tests * Prevent verbose logging on standup * Remove kg as required key in config, await get_all_prompts * Remove reference to fragment id * comment out ingestion * complete logging port (#1499) * Feature/dev rebased (#1500) * Feature/move logging (#1493) * move logging provider out * move logging provider to own directory, remove singleton * cleanup * Update js package (#1498) * fix refactoring tweak (#1496) * Fix JSON serialization and Prompt ID Bugs for Prompts (#1491) * Bug in get prompts * Add tests * Prevent verbose logging on standup * Remove kg as required key in config, await get_all_prompts * Remove reference to fragment id * comment out ingestion * complete logging port (#1499) --------- Co-authored-by: Nolan Tremelling <34580718+NolanTrem@users.noreply.github.com> * Fix handling for R2R exceptions (#1501) * fix doc test (#1502) * Harmonize python SDK KG methods for optional params, add missing JS methods --------- Co-authored-by: emrgnt-cmplxty <68796651+emrgnt-cmplxty@users.noreply.github.com> Co-authored-by: 
emrgnt-cmplxty <owen@algofi.org> * Clean up pagination and offset around KG (#1519) * Move to R2R light for integration testing (#1521) * fix ollama pdf parser --------- Co-authored-by: Nolan Tremelling <34580718+NolanTrem@users.noreply.github.com> * email auth false and js bump * fix actions (#1528) * Feature/add poppler check and fallback (#1529) * fix actions * fallback * Patch/import shutil (#1530) * fix actions * fallback * import shutil * Feature/include basic pdf parsing everywhere (#1531) * fix actions * fallback * import shutil * add basic pdf as extra parser in all configs * Remove non existent user login? * attempt login * Change password back * add explicit setting, trigger rebuild --------- Co-authored-by: Shreyas Pimpalgaonkar <shreyas.gp.7@gmail.com> Co-authored-by: Nolan Tremelling <34580718+NolanTrem@users.noreply.github.com>
SciPhi-AI · Oct 30, 2024 · ddfe870
1 parent a85b67e
Showing 107 changed files with 2,109 additions and 1,569 deletions.
diff --git a/.github/actions/run-script-zerox-tests/action.yml b/.github/actions/run-script-zerox-tests/action.yml
diff --git a/.github/actions/run-sdk-prompt-management-tests/action.yml b/.github/actions/run-sdk-prompt-management-tests/action.yml
@@ -0,0 +1,42 @@
+name: 'Run SDK Prompt Management Tests'
+description: 'Runs SDK prompt management tests for R2R'
+runs:
+  using: "composite"
+  steps:
+    # First run basic prompt operations
+    - name: Add prompt test (SDK)
+      working-directory: ./py
+      shell: bash
+      run: poetry run python tests/integration/runner_sdk.py test_add_prompt
+
+    - name: Get prompt test (SDK)
+      working-directory: ./py
+      shell: bash
+      run: poetry run python tests/integration/runner_sdk.py test_get_prompt
+
+    - name: Get all prompts test (SDK)
+      working-directory: ./py
+      shell: bash
+      run: poetry run python tests/integration/runner_sdk.py test_get_all_prompts
+
+    - name: Update prompt test (SDK)
+      working-directory: ./py
+      shell: bash
+      run: poetry run python tests/integration/runner_sdk.py test_update_prompt
+
+    # Then run error handling and access control tests
+    - name: Prompt error handling test (SDK)
+      working-directory: ./py
+      shell: bash
+      run: poetry run python tests/integration/runner_sdk.py test_prompt_error_handling
+
+    - name: Prompt access control test (SDK)
+      working-directory: ./py
+      shell: bash
+      run: poetry run python tests/integration/runner_sdk.py test_prompt_access_control
+
+    # Finally run deletion test
+    - name: Delete prompt test (SDK)
+      working-directory: ./py
+      shell: bash
+      run: poetry run python tests/integration/runner_sdk.py test_delete_prompt
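Note on the action above: each step passes a single test function name to `tests/integration/runner_sdk.py`. The runner itself is not shown in this diff, so the following is only a minimal sketch of how a name-based dispatcher of this kind could look. The registry, the stub test bodies, and `run_named_test` are all hypothetical and are not taken from R2R.

```python
import sys

# Hypothetical stand-ins for the real integration tests (not R2R code).
def test_add_prompt():
    return "add ok"


def test_get_prompt():
    return "get ok"


# Explicit registry so only known test functions can be invoked by name.
TESTS = {
    "test_add_prompt": test_add_prompt,
    "test_get_prompt": test_get_prompt,
}


def run_named_test(argv):
    """Look up the test named in argv[1] and run it."""
    if len(argv) < 2:
        raise SystemExit("usage: runner.py <test_function_name>")
    name = argv[1]
    if name not in TESTS:
        raise SystemExit(f"unknown test: {name}")
    return TESTS[name]()


# Equivalent to invoking `python runner.py test_add_prompt`.
result = run_named_test(["runner.py", "test_add_prompt"])
```

Running one test per step, as the action does, means a failure is reported against a named step rather than somewhere inside a single long script run.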
diff --git a/.github/workflows/r2r-full-integration-deep-dive-tests.yml b/.github/workflows/r2r-full-integration-deep-dive-tests.yml
@@ -36,6 +36,3 @@ jobs:
 
       - name: Start R2R Full server
         uses: ./.github/actions/start-r2r-full
-
-      - name: Run Test Zerox
-        uses: ./.github/actions/run-script-zerox-tests
diff --git a/.github/workflows/r2r-full-py-integration-tests-graphrag.yml b/.github/workflows/r2r-full-py-integration-tests-graphrag.yml
@@ -16,6 +16,7 @@ on:
 jobs:
   test:
     runs-on: ${{ matrix.os }}
+    continue-on-error: true
 
     strategy:
       matrix:

diff --git a/.github/workflows/r2r-full-py-integration-tests-mac-and-windows.yml b/.github/workflows/r2r-full-py-integration-tests-mac-and-windows.yml
@@ -6,6 +6,7 @@ on:
 jobs:
   test:
     runs-on: ${{ matrix.os }}
+    continue-on-error: true
 
     strategy:
       matrix:

diff --git a/.github/workflows/r2r-full-py-integration-tests.yml b/.github/workflows/r2r-full-py-integration-tests.yml
@@ -16,6 +16,7 @@ on:
 jobs:
   test:
     runs-on: ${{ matrix.os }}
+    continue-on-error: true
 
     strategy:
       matrix:
@@ -27,6 +28,7 @@ jobs:
           - sdk-retrieval
           - sdk-auth
           - sdk-collections
+          - sdk-prompts
     env:
       OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
       TELEMETRY_ENABLED: 'false'
@@ -56,29 +58,27 @@ jobs:
       - name: Run CLI Ingestion Tests
         if: matrix.test_category == 'cli-ingestion'
         uses: ./.github/actions/run-cli-ingestion-tests
-        continue-on-error: true
 
       - name: Run CLI Retrieval Tests
         if: matrix.test_category == 'cli-retrieval'
         uses: ./.github/actions/run-cli-retrieval-tests
-        continue-on-error: true
 
       - name: Run SDK Ingestion Tests
         if: matrix.test_category == 'sdk-ingestion'
         uses: ./.github/actions/run-sdk-ingestion-tests
-        continue-on-error: true
 
       - name: Run SDK Retrieval Tests
         if: matrix.test_category == 'sdk-retrieval'
         uses: ./.github/actions/run-sdk-retrieval-tests
-        continue-on-error: true
 
       - name: Run SDK Auth Tests
         if: matrix.test_category == 'sdk-auth'
         uses: ./.github/actions/run-sdk-auth-tests
-        continue-on-error: true
 
       - name: Run SDK Collections Tests
         if: matrix.test_category == 'sdk-collections'
         uses: ./.github/actions/run-sdk-collections-tests
-        continue-on-error: true
+
+      - name: Run SDK Prompt Tests
+        if: matrix.test_category == 'sdk-prompts'
+        uses: ./.github/actions/run-sdk-prompt-management-tests
diff --git a/.github/workflows/r2r-light-py-integration-tests-graphrag.yml b/.github/workflows/r2r-light-py-integration-tests-graphrag.yml
@@ -18,6 +18,7 @@ on:
 jobs:
   test:
     runs-on: ${{ matrix.os }}
+    continue-on-error: true
 
     strategy:
       matrix:

diff --git a/.github/workflows/r2r-light-py-integration-tests-mac-and-windows.yml b/.github/workflows/r2r-light-py-integration-tests-mac-and-windows.yml
@@ -8,6 +8,7 @@ on:
 jobs:
   test:
     runs-on: ${{ matrix.os }}
+    continue-on-error: true
 
     strategy:
       matrix:

diff --git a/.github/workflows/r2r-light-py-integration-tests.yml b/.github/workflows/r2r-light-py-integration-tests.yml
@@ -18,6 +18,7 @@ on:
 jobs:
   test:
     runs-on: ${{ matrix.os }}
+    continue-on-error: true
 
     strategy:
       matrix:
@@ -29,6 +30,7 @@ jobs:
           - sdk-retrieval
           - sdk-auth
           - sdk-collections
+          - sdk-prompts
     env:
       OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
       TELEMETRY_ENABLED: 'false'
@@ -59,29 +61,27 @@ jobs:
       - name: Run CLI Ingestion Tests
         if: matrix.test_category == 'cli-ingestion'
         uses: ./.github/actions/run-cli-ingestion-tests
-        continue-on-error: true
 
       - name: Run CLI Retrieval Tests
         if: matrix.test_category == 'cli-retrieval'
         uses: ./.github/actions/run-cli-retrieval-tests
-        continue-on-error: true
 
       - name: Run SDK Ingestion Tests
         if: matrix.test_category == 'sdk-ingestion'
         uses: ./.github/actions/run-sdk-ingestion-tests
-        continue-on-error: true
 
       - name: Run SDK Retrieval Tests
         if: matrix.test_category == 'sdk-retrieval'
         uses: ./.github/actions/run-sdk-retrieval-tests
-        continue-on-error: true
 
       - name: Run SDK Auth Tests
         if: matrix.test_category == 'sdk-auth'
         uses: ./.github/actions/run-sdk-auth-tests
-        continue-on-error: true
 
       - name: Run SDK Collections Tests
         if: matrix.test_category == 'sdk-collections'
         uses: ./.github/actions/run-sdk-collections-tests
-        continue-on-error: true
+
+      - name: Run SDK Prompt Tests
+        if: matrix.test_category == 'sdk-prompts'
+        uses: ./.github/actions/run-sdk-prompt-management-tests
diff --git a/docs/api-reference/openapi.json b/docs/api-reference/openapi.json
diff --git a/docs/cookbooks/graphrag.mdx b/docs/cookbooks/graphrag.mdx
@@ -99,9 +99,6 @@ excluded_parsers = ["mp4"]
     semantic_similarity_threshold = 0.7
     generation_config = { model = "openai/gpt-4o-mini" }
 
-  [ingestion.extra_parsers]
-    pdf = "zerox"
-
 [database]
 provider = "postgres"
 batch_size = 256
@@ -204,9 +201,6 @@ max_characters = 1_024
 combine_under_n_chars = 128
 overlap = 256
 
-    [ingestion.extra_parsers]
-    pdf = "zerox"
-
 [orchestration]
 provider = "hatchet"
 kg_creation_concurrency_lipmit = 32

diff --git a/js/sdk/package-lock.json b/js/sdk/package-lock.json
diff --git a/js/sdk/package.json b/js/sdk/package.json
@@ -1,6 +1,6 @@
 {
   "name": "r2r-js",
-  "version": "0.3.11",
+  "version": "0.3.12",
   "description": "",
   "main": "dist/index.js",
   "browser": "dist/index.browser.js",

diff --git a/py/cli/command_group.py b/py/cli/command_group.py
@@ -2,7 +2,7 @@
 from asyncclick import pass_context
 from asyncclick.exceptions import Exit
 
-from r2r import R2RAsyncClient
+from sdk import R2RAsyncClient
 
 
 @click.group()

diff --git a/py/cli/commands/ingestion.py b/py/cli/commands/ingestion.py
@@ -11,7 +11,7 @@
 from cli.command_group import cli
 from cli.utils.param_types import JSON
 from cli.utils.timer import timer
-from core.base.abstractions import IndexMeasure, IndexMethod, VectorTableName
+from shared.abstractions import IndexMeasure, IndexMethod, VectorTableName
 
 
 async def ingest_files_from_urls(client, urls):
@@ -243,6 +243,7 @@ async def create_vector_index(
     index_measure,
     index_arguments,
     index_name,
+    index_column,
     no_concurrent,
 ):
     """Create a vector index for similarity search."""
@@ -254,6 +255,7 @@ async def create_vector_index(
             index_measure=index_measure,
             index_arguments=index_arguments,
             index_name=index_name,
+            index_column=index_column,
             concurrently=not no_concurrent,
         )
     click.echo(json.dumps(response, indent=2))

diff --git a/py/core/__init__.py b/py/core/__init__.py
@@ -134,6 +134,9 @@
     # Crypto provider
     "CryptoConfig",
     "CryptoProvider",
+    # Email provider
+    "EmailConfig",
+    "EmailProvider",
     # Database providers
     "DatabaseConfig",
     "DatabaseProvider",
@@ -192,9 +195,9 @@
     "AudioParser",
     "DOCXParser",
     "ImageParser",
-    "PDFParser",
+    "VLMPDFParser",
+    "BasicPDFParser",
     "PDFParserUnstructured",
-    "PDFParserMarker",
     "PPTParser",
     # Structured parsers
     "CSVParser",
@@ -233,6 +236,7 @@
     # Embeddings
     "LiteLLMEmbeddingProvider",
     "OpenAIEmbeddingProvider",
+    "OllamaEmbeddingProvider",
     # LLM
     "OpenAICompletionProvider",
     "LiteLLMCompletionProvider",

diff --git a/py/core/base/__init__.py b/py/core/base/__init__.py
@@ -106,6 +106,9 @@
     # Crypto provider
     "CryptoConfig",
     "CryptoProvider",
+    # Email provider
+    "EmailConfig",
+    "EmailProvider",
     # Database providers
     "DatabaseConfig",
     "DatabaseProvider",

diff --git a/py/core/base/parsers/base_parser.py b/py/core/base/parsers/base_parser.py
@@ -3,14 +3,11 @@
 from abc import ABC, abstractmethod
 from typing import AsyncGenerator, Generic, TypeVar
 
-from ..abstractions import DataType
-
 T = TypeVar("T")
 
 
 class AsyncParser(ABC, Generic[T]):
+
     @abstractmethod
-    async def ingest(
-        self, data: T, **kwargs
-    ) -> AsyncGenerator[DataType, None]:
+    async def ingest(self, data: T, **kwargs) -> AsyncGenerator[str, None]:
         pass
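Note on the change above: `AsyncParser.ingest` is now annotated as yielding plain `str` chunks instead of `DataType`. As a rough illustration of what a conforming subclass might look like, here is a self-contained sketch. `LineParser` and `collect` are hypothetical names introduced for the example, and the base class is re-declared locally so the snippet runs without R2R installed.

```python
import asyncio
from abc import ABC, abstractmethod
from typing import AsyncGenerator, Generic, TypeVar

T = TypeVar("T")


class AsyncParser(ABC, Generic[T]):
    # Local copy of the interface shown in the diff.
    @abstractmethod
    async def ingest(self, data: T, **kwargs) -> AsyncGenerator[str, None]:
        pass


class LineParser(AsyncParser[str]):
    """Hypothetical parser: yields each non-empty line as a string chunk."""

    async def ingest(self, data: str, **kwargs) -> AsyncGenerator[str, None]:
        for line in data.splitlines():
            stripped = line.strip()
            if stripped:
                yield stripped


async def collect(parser: AsyncParser, data) -> list:
    # Drain the async generator into a list for inspection.
    return [chunk async for chunk in parser.ingest(data)]


chunks = asyncio.run(collect(LineParser(), "first\n\n  second  \n"))
```

Because the method now yields `str`, callers can consume chunks with a plain `async for` and no longer need to import `DataType` from the abstractions module.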
diff --git a/py/core/base/providers/__init__.py b/py/core/base/providers/__init__.py
@@ -16,6 +16,7 @@
     UserHandler,
     VectorHandler,
 )
+from .email import EmailConfig, EmailProvider
 from .embedding import EmbeddingConfig, EmbeddingProvider
 from .ingestion import ChunkingStrategy, IngestionConfig, IngestionProvider
 from .llm import CompletionConfig, CompletionProvider
@@ -36,6 +37,9 @@
     # Crypto provider
     "CryptoConfig",
     "CryptoProvider",
+    # Email provider
+    "EmailConfig",
+    "EmailProvider",
     # Database providers
     "DatabaseConnectionManager",
     "DocumentHandler",

diff --git a/py/core/base/providers/auth.py b/py/core/base/providers/auth.py
@@ -10,6 +10,8 @@
 from ..api.models import UserResponse
 from .base import Provider, ProviderConfig
 from .crypto import CryptoProvider
+from .database import DatabaseProvider
+from .email import EmailProvider
 
 logger = logging.getLogger()
 
@@ -33,8 +35,17 @@ def validate_config(self) -> None:
 
 class AuthProvider(Provider, ABC):
     security = HTTPBearer(auto_error=False)
-
-    def __init__(self, config: AuthConfig, crypto_provider: CryptoProvider):
+    crypto_provider: CryptoProvider
+    email_provider: EmailProvider
+    database_provider: DatabaseProvider
+
+    def __init__(
+        self,
+        config: AuthConfig,
+        crypto_provider: CryptoProvider,
+        database_provider: DatabaseProvider,
+        email_provider: EmailProvider,
+    ):
         if not isinstance(config, AuthConfig):
             raise ValueError(
                 "AuthProvider must be initialized with an AuthConfig"
@@ -43,6 +54,8 @@ def __init__(self, config: AuthConfig, crypto_provider: CryptoProvider):
         self.admin_email = config.default_admin_email
         self.admin_password = config.default_admin_password
         self.crypto_provider = crypto_provider
+        self.database_provider = database_provider
+        self.email_provider = email_provider
         super().__init__(config)
         self.config: AuthConfig = config  # for type hinting
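Note on the change above: `AuthProvider.__init__` now takes `database_provider` and `email_provider` in addition to `crypto_provider`, so every call site has to supply all three. The sketch below mirrors that constructor shape with simplified stand-in classes. The class bodies, default values, and the dataclass form of `AuthConfig` are assumptions made for illustration, not the real R2R definitions.

```python
from dataclasses import dataclass


@dataclass
class AuthConfig:
    # Placeholder defaults, assumed for the example only.
    default_admin_email: str = "admin@example.com"
    default_admin_password: str = "change_me"


class CryptoProvider:
    """Stand-in for the real crypto provider."""


class DatabaseProvider:
    """Stand-in for the real database provider."""


class EmailProvider:
    """Stand-in for the real email provider."""


class AuthProvider:
    def __init__(
        self,
        config: AuthConfig,
        crypto_provider: CryptoProvider,
        database_provider: DatabaseProvider,
        email_provider: EmailProvider,
    ):
        if not isinstance(config, AuthConfig):
            raise ValueError(
                "AuthProvider must be initialized with an AuthConfig"
            )
        self.config = config
        self.admin_email = config.default_admin_email
        self.admin_password = config.default_admin_password
        # All collaborators are injected rather than constructed internally.
        self.crypto_provider = crypto_provider
        self.database_provider = database_provider
        self.email_provider = email_provider


auth = AuthProvider(
    config=AuthConfig(),
    crypto_provider=CryptoProvider(),
    database_provider=DatabaseProvider(),
    email_provider=EmailProvider(),
)
```

Passing the collaborators as keyword arguments, as in the usage above, guards against accidentally swapping the database and email providers at call sites that were written for the old two-argument signature.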