dev-minor (#1509)
* adding bin support and make it default (#1508)

* Feature/tweak actions (#1507)

* up

* tweak actions

* adding bin sup and making it default

* tested and vetted

* up (#1510)

* up

* set verification to default false

* cleanup (#1512)

* cleanup

* cleanup prompt mgmt

* up

* cleanup printout

* cleanup new parser logic, set vlm as default for all providers

* allow user to re-override

* modify exp backoff implementation (#1513)

* Feature/tweak actions (#1507)

* up

* tweak actions

* modify exp backoff impl

---------

Co-authored-by: emrgnt-cmplxty <68796651+emrgnt-cmplxty@users.noreply.github.com>

* Patch/touchups (#1515)

* cleanup

* cleanup prompt mgmt

* up

* cleanup printout

* cleanup new parser logic, set vlm as default for all providers

* allow user to re-override

* add touchups

* add extra parsers (#1516)

* add extra parsers (#1518)

* minor fixes (#1514)

* Feature/add back ollama provider (#1522)

* add extra parsers

* add back ollama

* revert auth workflow

* Feature/add prompt tests and cleanup (#1523)

* add extra parsers

* add prompt tests, cleanup

* add prompt tests, cleanup

* merge

* set mock console as default

* set mock console as default

* fix config

* Update community model (#1524)

* Feature/tweak actions (#1507)

* up

* tweak actions

* Sync JS SDK, Harmonize Python SDK KG Methods (#1511)

* Feature/move logging (#1492)

* move logging provider out

* move logging provider to own directory, remove singleton

* cleanup

* fix refactoring tweak (#1496)

* Fix JSON serialization and Prompt ID Bugs for Prompts (#1491)

* Bug in get prompts

* Add tests

* Prevent verbose logging on standup

* Remove kg as required key in config, await get_all_prompts

* Remove reference to fragment id

* comment out ingestion

* complete logging port (#1499)

* Feature/dev rebased (#1500)

* Feature/move logging (#1493)

* move logging provider out

* move logging provider to own directory, remove singleton

* cleanup

* Update js package (#1498)

* fix refactoring tweak (#1496)

* Fix JSON serialization and Prompt ID Bugs for Prompts (#1491)

* Bug in get prompts

* Add tests

* Prevent verbose logging on standup

* Remove kg as required key in config, await get_all_prompts

* Remove reference to fragment id

* comment out ingestion

* complete logging port (#1499)

---------

Co-authored-by: Nolan Tremelling <34580718+NolanTrem@users.noreply.github.com>

* Fix handling for R2R exceptions (#1501)

* fix doc test (#1502)

* Harmonize python SDK KG methods for optional params, add missing JS methods

---------

Co-authored-by: emrgnt-cmplxty <68796651+emrgnt-cmplxty@users.noreply.github.com>
Co-authored-by: emrgnt-cmplxty <owen@algofi.org>

* Clean up pagination and offset around KG (#1519)

* Move to R2R light for integration testing (#1521)

* Update community model

---------

Co-authored-by: emrgnt-cmplxty <68796651+emrgnt-cmplxty@users.noreply.github.com>
Co-authored-by: emrgnt-cmplxty <owen@algofi.org>

* Patch/fix import bleed (#1526)

* Feature/tweak actions (#1507)

* up

* tweak actions

* Sync JS SDK, Harmonize Python SDK KG Methods (#1511)

* Feature/move logging (#1492)

* move logging provider out

* move logging provider to own directory, remove singleton

* cleanup

* fix refactoring tweak (#1496)

* Fix JSON serialization and Prompt ID Bugs for Prompts (#1491)

* Bug in get prompts

* Add tests

* Prevent verbose logging on standup

* Remove kg as required key in config, await get_all_prompts

* Remove reference to fragment id

* comment out ingestion

* complete logging port (#1499)

* Feature/dev rebased (#1500)

* Feature/move logging (#1493)

* move logging provider out

* move logging provider to own directory, remove singleton

* cleanup

* Update js package (#1498)

* fix refactoring tweak (#1496)

* Fix JSON serialization and Prompt ID Bugs for Prompts (#1491)

* Bug in get prompts

* Add tests

* Prevent verbose logging on standup

* Remove kg as required key in config, await get_all_prompts

* Remove reference to fragment id

* comment out ingestion

* complete logging port (#1499)

---------

Co-authored-by: Nolan Tremelling <34580718+NolanTrem@users.noreply.github.com>

* Fix handling for R2R exceptions (#1501)

* fix doc test (#1502)

* Harmonize python SDK KG methods for optional params, add missing JS methods

---------

Co-authored-by: emrgnt-cmplxty <68796651+emrgnt-cmplxty@users.noreply.github.com>
Co-authored-by: emrgnt-cmplxty <owen@algofi.org>

* Clean up pagination and offset around KG (#1519)

* Move to R2R light for integration testing (#1521)

---------

Co-authored-by: Nolan Tremelling <34580718+NolanTrem@users.noreply.github.com>

* Patch/fix import bleed (#1527)

* Feature/tweak actions (#1507)

* up

* tweak actions

* Sync JS SDK, Harmonize Python SDK KG Methods (#1511)

* Feature/move logging (#1492)

* move logging provider out

* move logging provider to own directory, remove singleton

* cleanup

* fix refactoring tweak (#1496)

* Fix JSON serialization and Prompt ID Bugs for Prompts (#1491)

* Bug in get prompts

* Add tests

* Prevent verbose logging on standup

* Remove kg as required key in config, await get_all_prompts

* Remove reference to fragment id

* comment out ingestion

* complete logging port (#1499)

* Feature/dev rebased (#1500)

* Feature/move logging (#1493)

* move logging provider out

* move logging provider to own directory, remove singleton

* cleanup

* Update js package (#1498)

* fix refactoring tweak (#1496)

* Fix JSON serialization and Prompt ID Bugs for Prompts (#1491)

* Bug in get prompts

* Add tests

* Prevent verbose logging on standup

* Remove kg as required key in config, await get_all_prompts

* Remove reference to fragment id

* comment out ingestion

* complete logging port (#1499)

---------

Co-authored-by: Nolan Tremelling <34580718+NolanTrem@users.noreply.github.com>

* Fix handling for R2R exceptions (#1501)

* fix doc test (#1502)

* Harmonize python SDK KG methods for optional params, add missing JS methods

---------

Co-authored-by: emrgnt-cmplxty <68796651+emrgnt-cmplxty@users.noreply.github.com>
Co-authored-by: emrgnt-cmplxty <owen@algofi.org>

* Clean up pagination and offset around KG (#1519)

* Move to R2R light for integration testing (#1521)

* fix ollama pdf parser

---------

Co-authored-by: Nolan Tremelling <34580718+NolanTrem@users.noreply.github.com>

* email auth false and js bump

* fix actions (#1528)

* Feature/add poppler check and fallback (#1529)

* fix actions

* fallback

* Patch/import shutil (#1530)

* fix actions

* fallback

* import shutil

* Feature/include basic pdf parsing everywhere (#1531)

* fix actions

* fallback

* import shutil

* add basic pdf as extra parser in all configs

* Remove non-existent user login?

* attempt login

* Change password back

* add explicit setting, trigger rebuild

---------

Co-authored-by: Shreyas Pimpalgaonkar <shreyas.gp.7@gmail.com>
Co-authored-by: Nolan Tremelling <34580718+NolanTrem@users.noreply.github.com>
3 people authored Oct 30, 2024
1 parent a85b67e commit ddfe870
Showing 107 changed files with 2,109 additions and 1,569 deletions.
14 changes: 0 additions & 14 deletions .github/actions/run-script-zerox-tests/action.yml

This file was deleted.

42 changes: 42 additions & 0 deletions .github/actions/run-sdk-prompt-management-tests/action.yml
@@ -0,0 +1,42 @@
name: 'Run SDK Prompt Management Tests'
description: 'Runs SDK prompt management tests for R2R'
runs:
  using: "composite"
  steps:
    # First run basic prompt operations
    - name: Add prompt test (SDK)
      working-directory: ./py
      shell: bash
      run: poetry run python tests/integration/runner_sdk.py test_add_prompt

    - name: Get prompt test (SDK)
      working-directory: ./py
      shell: bash
      run: poetry run python tests/integration/runner_sdk.py test_get_prompt

    - name: Get all prompts test (SDK)
      working-directory: ./py
      shell: bash
      run: poetry run python tests/integration/runner_sdk.py test_get_all_prompts

    - name: Update prompt test (SDK)
      working-directory: ./py
      shell: bash
      run: poetry run python tests/integration/runner_sdk.py test_update_prompt

    # Then run error handling and access control tests
    - name: Prompt error handling test (SDK)
      working-directory: ./py
      shell: bash
      run: poetry run python tests/integration/runner_sdk.py test_prompt_error_handling

    - name: Prompt access control test (SDK)
      working-directory: ./py
      shell: bash
      run: poetry run python tests/integration/runner_sdk.py test_prompt_access_control

    # Finally run deletion test
    - name: Delete prompt test (SDK)
      working-directory: ./py
      shell: bash
      run: poetry run python tests/integration/runner_sdk.py test_delete_prompt
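Each step above invokes `runner_sdk.py` with a single test name as its argument. The runner itself is not part of this diff; the following is a minimal sketch, assuming the script looks up a `test_*` function by name and runs it, awaiting it if it is async. The two test functions and the `collect_tests`/`run_test`/`main` helpers are hypothetical placeholders, not R2R's actual prompt tests.

```python
import asyncio
import inspect
from typing import Any, Callable, Dict, List


# Hypothetical placeholder tests; the real ones live in
# tests/integration/runner_sdk.py and are not shown in this commit.
def test_add_prompt() -> str:
    return "added"


async def test_get_prompt() -> str:
    return "fetched"


def collect_tests(namespace: Dict[str, Any]) -> Dict[str, Callable[..., Any]]:
    """Keep only the callables whose names start with 'test_'."""
    return {
        name: obj
        for name, obj in namespace.items()
        if name.startswith("test_") and callable(obj)
    }


def run_test(name: str, tests: Dict[str, Callable[..., Any]]) -> Any:
    """Run one test by name, driving it with asyncio if it is a coroutine function."""
    if name not in tests:
        raise SystemExit(f"Unknown test: {name}")
    fn = tests[name]
    if inspect.iscoroutinefunction(fn):
        return asyncio.run(fn())
    return fn()


def main(argv: List[str]) -> Any:
    """Entry point: argv[1] is the test name, as in `runner_sdk.py test_add_prompt`."""
    if len(argv) < 2:
        raise SystemExit("usage: runner_sdk.py <test_name>")
    return run_test(argv[1], collect_tests(globals()))
```

A script like this would call `main(sys.argv)` under an `if __name__ == "__main__":` guard. Running one test per step, as the action does, makes the GitHub UI report exactly which prompt operation failed.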
3 changes: 0 additions & 3 deletions .github/workflows/r2r-full-integration-deep-dive-tests.yml
@@ -36,6 +36,3 @@ jobs:

- name: Start R2R Full server
uses: ./.github/actions/start-r2r-full

- name: Run Test Zerox
uses: ./.github/actions/run-script-zerox-tests
@@ -16,6 +16,7 @@ on:
jobs:
test:
runs-on: ${{ matrix.os }}
continue-on-error: true

strategy:
matrix:
@@ -6,6 +6,7 @@ on:
jobs:
test:
runs-on: ${{ matrix.os }}
continue-on-error: true

strategy:
matrix:
12 changes: 6 additions & 6 deletions .github/workflows/r2r-full-py-integration-tests.yml
@@ -16,6 +16,7 @@ on:
jobs:
test:
runs-on: ${{ matrix.os }}
continue-on-error: true

strategy:
matrix:
@@ -27,6 +28,7 @@ jobs:
- sdk-retrieval
- sdk-auth
- sdk-collections
- sdk-prompts
env:
OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
TELEMETRY_ENABLED: 'false'
@@ -56,29 +58,27 @@ jobs:
- name: Run CLI Ingestion Tests
if: matrix.test_category == 'cli-ingestion'
uses: ./.github/actions/run-cli-ingestion-tests
continue-on-error: true

- name: Run CLI Retrieval Tests
if: matrix.test_category == 'cli-retrieval'
uses: ./.github/actions/run-cli-retrieval-tests
continue-on-error: true

- name: Run SDK Ingestion Tests
if: matrix.test_category == 'sdk-ingestion'
uses: ./.github/actions/run-sdk-ingestion-tests
continue-on-error: true

- name: Run SDK Retrieval Tests
if: matrix.test_category == 'sdk-retrieval'
uses: ./.github/actions/run-sdk-retrieval-tests
continue-on-error: true

- name: Run SDK Auth Tests
if: matrix.test_category == 'sdk-auth'
uses: ./.github/actions/run-sdk-auth-tests
continue-on-error: true

- name: Run SDK Collections Tests
if: matrix.test_category == 'sdk-collections'
uses: ./.github/actions/run-sdk-collections-tests
continue-on-error: true

- name: Run SDK Prompt Tests
if: matrix.test_category == 'sdk-prompts'
uses: ./.github/actions/run-sdk-prompt-management-tests
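Each matrix job receives one `test_category`, and only the step whose `if:` matches it runs, so adding `sdk-prompts` to the matrix plus a matching step is enough to wire in the new action. The "6 additions & 6 deletions" count suggests that the per-step `continue-on-error: true` lines are the ones removed in favor of the new job-level setting; under that reading, a failing category no longer fails the whole workflow run. The Python below only models that control flow (it is not how GitHub Actions evaluates workflows), using the action paths from the diff; the helper names are hypothetical.

```python
from typing import Dict, List

# Category -> composite action, copied from the workflow's `if:` conditions.
STEP_ACTIONS: Dict[str, str] = {
    "cli-ingestion": "./.github/actions/run-cli-ingestion-tests",
    "cli-retrieval": "./.github/actions/run-cli-retrieval-tests",
    "sdk-ingestion": "./.github/actions/run-sdk-ingestion-tests",
    "sdk-retrieval": "./.github/actions/run-sdk-retrieval-tests",
    "sdk-auth": "./.github/actions/run-sdk-auth-tests",
    "sdk-collections": "./.github/actions/run-sdk-collections-tests",
    "sdk-prompts": "./.github/actions/run-sdk-prompt-management-tests",
}


def steps_to_run(test_category: str) -> List[str]:
    """Evaluate every step's `if: matrix.test_category == '...'` for one matrix job."""
    return [
        action
        for category, action in STEP_ACTIONS.items()
        if category == test_category
    ]


def run_passes(job_results: Dict[str, bool], job_continue_on_error: bool) -> bool:
    """The run fails only if some job failed without job-level continue-on-error."""
    return all(passed or job_continue_on_error for passed in job_results.values())
```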
@@ -18,6 +18,7 @@ on:
jobs:
test:
runs-on: ${{ matrix.os }}
continue-on-error: true

strategy:
matrix:
@@ -8,6 +8,7 @@ on:
jobs:
test:
runs-on: ${{ matrix.os }}
continue-on-error: true

strategy:
matrix:
12 changes: 6 additions & 6 deletions .github/workflows/r2r-light-py-integration-tests.yml
@@ -18,6 +18,7 @@ on:
jobs:
test:
runs-on: ${{ matrix.os }}
continue-on-error: true

strategy:
matrix:
@@ -29,6 +30,7 @@ jobs:
- sdk-retrieval
- sdk-auth
- sdk-collections
- sdk-prompts
env:
OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
TELEMETRY_ENABLED: 'false'
@@ -59,29 +61,27 @@ jobs:
- name: Run CLI Ingestion Tests
if: matrix.test_category == 'cli-ingestion'
uses: ./.github/actions/run-cli-ingestion-tests
continue-on-error: true

- name: Run CLI Retrieval Tests
if: matrix.test_category == 'cli-retrieval'
uses: ./.github/actions/run-cli-retrieval-tests
continue-on-error: true

- name: Run SDK Ingestion Tests
if: matrix.test_category == 'sdk-ingestion'
uses: ./.github/actions/run-sdk-ingestion-tests
continue-on-error: true

- name: Run SDK Retrieval Tests
if: matrix.test_category == 'sdk-retrieval'
uses: ./.github/actions/run-sdk-retrieval-tests
continue-on-error: true

- name: Run SDK Auth Tests
if: matrix.test_category == 'sdk-auth'
uses: ./.github/actions/run-sdk-auth-tests
continue-on-error: true

- name: Run SDK Collections Tests
if: matrix.test_category == 'sdk-collections'
uses: ./.github/actions/run-sdk-collections-tests
continue-on-error: true

- name: Run SDK Prompt Tests
if: matrix.test_category == 'sdk-prompts'
uses: ./.github/actions/run-sdk-prompt-management-tests
2 changes: 1 addition & 1 deletion docs/api-reference/openapi.json

Large diffs are not rendered by default.

6 changes: 0 additions & 6 deletions docs/cookbooks/graphrag.mdx
@@ -99,9 +99,6 @@ excluded_parsers = ["mp4"]
semantic_similarity_threshold = 0.7
generation_config = { model = "openai/gpt-4o-mini" }

[ingestion.extra_parsers]
pdf = "zerox"

[database]
provider = "postgres"
batch_size = 256
@@ -204,9 +201,6 @@ max_characters = 1_024
combine_under_n_chars = 128
overlap = 256

[ingestion.extra_parsers]
pdf = "zerox"

[orchestration]
provider = "hatchet"
kg_creation_concurrency_lipmit = 32
2 changes: 1 addition & 1 deletion js/sdk/package-lock.json

Some generated files are not rendered by default.

2 changes: 1 addition & 1 deletion js/sdk/package.json
@@ -1,6 +1,6 @@
{
"name": "r2r-js",
"version": "0.3.11",
"version": "0.3.12",
"description": "",
"main": "dist/index.js",
"browser": "dist/index.browser.js",
2 changes: 1 addition & 1 deletion py/cli/command_group.py
@@ -2,7 +2,7 @@
from asyncclick import pass_context
from asyncclick.exceptions import Exit

from r2r import R2RAsyncClient
from sdk import R2RAsyncClient


@click.group()
4 changes: 3 additions & 1 deletion py/cli/commands/ingestion.py
@@ -11,7 +11,7 @@
from cli.command_group import cli
from cli.utils.param_types import JSON
from cli.utils.timer import timer
from core.base.abstractions import IndexMeasure, IndexMethod, VectorTableName
from shared.abstractions import IndexMeasure, IndexMethod, VectorTableName


async def ingest_files_from_urls(client, urls):
@@ -243,6 +243,7 @@ async def create_vector_index(
index_measure,
index_arguments,
index_name,
index_column,
no_concurrent,
):
"""Create a vector index for similarity search."""
@@ -254,6 +255,7 @@ async def create_vector_index(
index_measure=index_measure,
index_arguments=index_arguments,
index_name=index_name,
index_column=index_column,
concurrently=not no_concurrent,
)
click.echo(json.dumps(response, indent=2))
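The CLI now forwards a new `index_column` argument to the client. The sketch below isolates that pass-through against a hypothetical `StubClient` that simply echoes its keyword arguments; it omits the command's other parameters hidden by the collapsed hunk, and the argument values are placeholders. It also shows the negative `no_concurrent` option being inverted before it is sent as `concurrently`.

```python
import asyncio
import json
from typing import Any, Dict, Optional


class StubClient:
    """Hypothetical stand-in for the SDK client: echoes the kwargs it receives."""

    async def create_vector_index(self, **kwargs: Any) -> Dict[str, Any]:
        return kwargs


async def create_vector_index(
    client: StubClient,
    index_measure: str,
    index_arguments: Optional[Dict[str, Any]],
    index_name: Optional[str],
    index_column: Optional[str],
    no_concurrent: bool,
) -> Dict[str, Any]:
    """Forward CLI options to the client (other options of the real command omitted)."""
    response = await client.create_vector_index(
        index_measure=index_measure,
        index_arguments=index_arguments,
        index_name=index_name,
        index_column=index_column,  # the option added in this commit
        # The CLI option is phrased negatively, so invert it for the client.
        concurrently=not no_concurrent,
    )
    print(json.dumps(response, indent=2))
    return response


if __name__ == "__main__":
    # Placeholder values, for illustration only.
    asyncio.run(
        create_vector_index(StubClient(), "cosine", None, "my_index", "vec", False)
    )
```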
8 changes: 6 additions & 2 deletions py/core/__init__.py
@@ -134,6 +134,9 @@
# Crypto provider
"CryptoConfig",
"CryptoProvider",
# Email provider
"EmailConfig",
"EmailProvider",
# Database providers
"DatabaseConfig",
"DatabaseProvider",
@@ -192,9 +195,9 @@
"AudioParser",
"DOCXParser",
"ImageParser",
"PDFParser",
"VLMPDFParser",
"BasicPDFParser",
"PDFParserUnstructured",
"PDFParserMarker",
"PPTParser",
# Structured parsers
"CSVParser",
@@ -233,6 +236,7 @@
# Embeddings
"LiteLLMEmbeddingProvider",
"OpenAIEmbeddingProvider",
"OllamaEmbeddingProvider",
# LLM
"OpenAICompletionProvider",
"LiteLLMCompletionProvider",
3 changes: 3 additions & 0 deletions py/core/base/__init__.py
@@ -106,6 +106,9 @@
# Crypto provider
"CryptoConfig",
"CryptoProvider",
# Email provider
"EmailConfig",
"EmailProvider",
# Database providers
"DatabaseConfig",
"DatabaseProvider",
7 changes: 2 additions & 5 deletions py/core/base/parsers/base_parser.py
@@ -3,14 +3,11 @@
from abc import ABC, abstractmethod
from typing import AsyncGenerator, Generic, TypeVar

from ..abstractions import DataType

T = TypeVar("T")


class AsyncParser(ABC, Generic[T]):

@abstractmethod
async def ingest(
self, data: T, **kwargs
) -> AsyncGenerator[DataType, None]:
async def ingest(self, data: T, **kwargs) -> AsyncGenerator[str, None]:
pass
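`AsyncParser.ingest` now declares that it yields plain `str` chunks rather than `DataType`. Because the base method is abstract, a concrete parser can implement it as an async generator. A minimal sketch, repeating the base class from the diff and adding a hypothetical `ParagraphParser` (not one of R2R's parsers) that decodes bytes and yields one paragraph at a time; the `encoding` keyword is likewise an illustrative assumption.

```python
import asyncio
from abc import ABC, abstractmethod
from typing import AsyncGenerator, Generic, List, TypeVar

T = TypeVar("T")


class AsyncParser(ABC, Generic[T]):
    @abstractmethod
    async def ingest(self, data: T, **kwargs) -> AsyncGenerator[str, None]:
        pass


class ParagraphParser(AsyncParser[bytes]):
    """Hypothetical parser: decodes bytes and yields non-empty paragraphs."""

    async def ingest(self, data: bytes, **kwargs) -> AsyncGenerator[str, None]:
        encoding = kwargs.get("encoding", "utf-8")
        for paragraph in data.decode(encoding).split("\n\n"):
            if paragraph.strip():
                yield paragraph.strip()


async def collect(parser: AsyncParser[bytes], data: bytes) -> List[str]:
    """Drain the async generator into a list of string chunks."""
    return [chunk async for chunk in parser.ingest(data)]


if __name__ == "__main__":
    print(asyncio.run(collect(ParagraphParser(), b"first\n\nsecond")))
```

Yielding `str` keeps the contract simple for downstream chunking, at the cost of parsers no longer being able to signal other data types through the return annotation.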
4 changes: 4 additions & 0 deletions py/core/base/providers/__init__.py
@@ -16,6 +16,7 @@
UserHandler,
VectorHandler,
)
from .email import EmailConfig, EmailProvider
from .embedding import EmbeddingConfig, EmbeddingProvider
from .ingestion import ChunkingStrategy, IngestionConfig, IngestionProvider
from .llm import CompletionConfig, CompletionProvider
@@ -36,6 +37,9 @@
# Crypto provider
"CryptoConfig",
"CryptoProvider",
# Email provider
"EmailConfig",
"EmailProvider",
# Database providers
"DatabaseConnectionManager",
"DocumentHandler",
17 changes: 15 additions & 2 deletions py/core/base/providers/auth.py
@@ -10,6 +10,8 @@
from ..api.models import UserResponse
from .base import Provider, ProviderConfig
from .crypto import CryptoProvider
from .database import DatabaseProvider
from .email import EmailProvider

logger = logging.getLogger()

@@ -33,8 +35,17 @@ def validate_config(self) -> None:

class AuthProvider(Provider, ABC):
security = HTTPBearer(auto_error=False)

def __init__(self, config: AuthConfig, crypto_provider: CryptoProvider):
crypto_provider: CryptoProvider
email_provider: EmailProvider
database_provider: DatabaseProvider

def __init__(
self,
config: AuthConfig,
crypto_provider: CryptoProvider,
database_provider: DatabaseProvider,
email_provider: EmailProvider,
):
if not isinstance(config, AuthConfig):
raise ValueError(
"AuthProvider must be initialized with an AuthConfig"
@@ -43,6 +54,8 @@ def __init__(self, config: AuthConfig, crypto_provider: CryptoProvider):
self.admin_email = config.default_admin_email
self.admin_password = config.default_admin_password
self.crypto_provider = crypto_provider
self.database_provider = database_provider
self.email_provider = email_provider
super().__init__(config)
self.config: AuthConfig = config # for type hinting

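`AuthProvider` now receives its database and email providers through the constructor alongside the crypto provider. A stripped-down sketch of that injection, using hypothetical stand-in classes (the real `AuthConfig` has more fields); unlike the real class, it does not inherit from `Provider` or call `super().__init__(config)`.

```python
from abc import ABC
from dataclasses import dataclass


# Hypothetical stand-ins for the real config and provider classes.
@dataclass
class AuthConfig:
    default_admin_email: str
    default_admin_password: str


class CryptoProvider: ...


class DatabaseProvider: ...


class EmailProvider: ...


class AuthProvider(ABC):
    crypto_provider: CryptoProvider
    email_provider: EmailProvider
    database_provider: DatabaseProvider

    def __init__(
        self,
        config: AuthConfig,
        crypto_provider: CryptoProvider,
        database_provider: DatabaseProvider,
        email_provider: EmailProvider,
    ):
        # Same guard as in the diff: reject anything that is not an AuthConfig.
        if not isinstance(config, AuthConfig):
            raise ValueError("AuthProvider must be initialized with an AuthConfig")
        self.config = config
        self.admin_email = config.default_admin_email
        self.admin_password = config.default_admin_password
        # Collaborators are injected rather than constructed here.
        self.crypto_provider = crypto_provider
        self.database_provider = database_provider
        self.email_provider = email_provider
```

Because `database_provider` and `email_provider` were added as new positional parameters, subclasses and call sites written against the old two-argument signature would need updating; passing them by keyword keeps such calls unambiguous.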