
dev-minor #1509

Merged
merged 24 commits into from
Oct 30, 2024
24 commits
610c1db adding bin support and make it default (#1508) · emrgnt-cmplxty · Oct 28, 2024
2f674dd up (#1510) · emrgnt-cmplxty · Oct 28, 2024
080d8cb cleanup (#1512) · emrgnt-cmplxty · Oct 28, 2024
f46d4bf modify exp backoff implementation (#1513) · shreyaspimpalgaonkar · Oct 28, 2024
97bb290 Patch/touchups (#1515) · emrgnt-cmplxty · Oct 29, 2024
a58776a add extra parsers (#1516) · emrgnt-cmplxty · Oct 29, 2024
ebb4c6f add extra parsers (#1518) · emrgnt-cmplxty · Oct 29, 2024
0480d2e minor fixes (#1514) · shreyaspimpalgaonkar · Oct 29, 2024
939f7d4 Feature/add back ollama provider (#1522) · emrgnt-cmplxty · Oct 29, 2024
746bfe4 Feature/add prompt tests and cleanup (#1523) · emrgnt-cmplxty · Oct 29, 2024
1010c74 fix config · emrgnt-cmplxty · Oct 29, 2024
794c408 Update community model (#1524) · NolanTrem · Oct 30, 2024
96e2367 Patch/fix import bleed (#1526) · emrgnt-cmplxty · Oct 30, 2024
680c327 Patch/fix import bleed (#1527) · emrgnt-cmplxty · Oct 30, 2024
7774c46 Merge branch 'main' into dev-minor · emrgnt-cmplxty · Oct 30, 2024
f23d87e email auth false and js bump · NolanTrem · Oct 30, 2024
ed6b07d fix actions (#1528) · emrgnt-cmplxty · Oct 30, 2024
977edf0 Feature/add poppler check and fallback (#1529) · emrgnt-cmplxty · Oct 30, 2024
be8cf5e Patch/import shutil (#1530) · emrgnt-cmplxty · Oct 30, 2024
77c8979 Feature/include basic pdf parsing everywhere (#1531) · emrgnt-cmplxty · Oct 30, 2024
352ceb9 Remove non existent user login? · NolanTrem · Oct 30, 2024
f1f501d attempt login · NolanTrem · Oct 30, 2024
f32955d Change password back · NolanTrem · Oct 30, 2024
1d5d55a add explicit setting, trigger rebuild · emrgnt-cmplxty · Oct 30, 2024
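The commit list mentions a poppler check with a fallback (#1529) followed by a `shutil` import patch (#1530), but that code is not among the diffs shown here. The following is a minimal sketch of one way such a check could work. It assumes the check looks for poppler's `pdftoppm` executable (the binary pdf2image calls to rasterize pages) and that the fallback is a basic text-only PDF parser. The function names and the `"vlm"`/`"basic"` labels are hypothetical and are not R2R identifiers.

```python
import shutil


def poppler_available() -> bool:
    """Return True if poppler's pdftoppm binary is on PATH (hypothetical check)."""
    return shutil.which("pdftoppm") is not None


def choose_pdf_parser(preferred: str = "vlm") -> str:
    """Pick a PDF parser label, falling back when poppler is missing.

    "vlm" stands for a parser that needs page images; "basic" stands for a
    text-only parser. Both labels are illustrative assumptions.
    """
    if preferred == "vlm" and not poppler_available():
        return "basic"
    return preferred


print(choose_pdf_parser())
```

Checking for the binary up front lets ingestion degrade gracefully instead of failing partway through a document when the image-based path cannot run.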
14 changes: 0 additions & 14 deletions .github/actions/run-script-zerox-tests/action.yml

This file was deleted.

42 changes: 42 additions & 0 deletions .github/actions/run-sdk-prompt-management-tests/action.yml
@@ -0,0 +1,42 @@
+name: 'Run SDK Prompt Management Tests'
+description: 'Runs SDK prompt management tests for R2R'
+runs:
+  using: "composite"
+  steps:
+    # First run basic prompt operations
+    - name: Add prompt test (SDK)
+      working-directory: ./py
+      shell: bash
+      run: poetry run python tests/integration/runner_sdk.py test_add_prompt
+
+    - name: Get prompt test (SDK)
+      working-directory: ./py
+      shell: bash
+      run: poetry run python tests/integration/runner_sdk.py test_get_prompt
+
+    - name: Get all prompts test (SDK)
+      working-directory: ./py
+      shell: bash
+      run: poetry run python tests/integration/runner_sdk.py test_get_all_prompts
+
+    - name: Update prompt test (SDK)
+      working-directory: ./py
+      shell: bash
+      run: poetry run python tests/integration/runner_sdk.py test_update_prompt
+
+    # Then run error handling and access control tests
+    - name: Prompt error handling test (SDK)
+      working-directory: ./py
+      shell: bash
+      run: poetry run python tests/integration/runner_sdk.py test_prompt_error_handling
+
+    - name: Prompt access control test (SDK)
+      working-directory: ./py
+      shell: bash
+      run: poetry run python tests/integration/runner_sdk.py test_prompt_access_control
+
+    # Finally run deletion test
+    - name: Delete prompt test (SDK)
+      working-directory: ./py
+      shell: bash
+      run: poetry run python tests/integration/runner_sdk.py test_delete_prompt
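The composite action above runs each prompt test as a separate step by passing a test name to `tests/integration/runner_sdk.py`. That runner is not part of this diff. What follows is a hypothetical sketch of a name-to-function dispatcher that accepts the same command-line shape. The stub tests only stand in for real SDK calls.

```python
from typing import Callable


# Hypothetical stub tests; real ones would exercise the R2R SDK.
def test_add_prompt() -> None:
    print("test_add_prompt passed")


def test_get_prompt() -> None:
    print("test_get_prompt passed")


# Explicit registry, so only known test names can be invoked from the CLI.
TESTS: dict[str, Callable[[], None]] = {
    "test_add_prompt": test_add_prompt,
    "test_get_prompt": test_get_prompt,
}


def main(argv: list[str]) -> int:
    """Run the test named in argv[1] and return a process exit code."""
    if len(argv) != 2 or argv[1] not in TESTS:
        print(f"usage: {argv[0]} <test_name>; known tests: {', '.join(TESTS)}")
        return 2
    TESTS[argv[1]]()
    return 0


# A script entry point would end with: sys.exit(main(sys.argv))
```

Running one test per step means a CI failure names the exact operation that broke. The action's ordering also ensures the deletion test runs only after the add, get, update and access-control tests.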
3 changes: 0 additions & 3 deletions .github/workflows/r2r-full-integration-deep-dive-tests.yml
@@ -36,6 +36,3 @@ jobs:
 
       - name: Start R2R Full server
         uses: ./.github/actions/start-r2r-full
-
-      - name: Run Test Zerox
-        uses: ./.github/actions/run-script-zerox-tests
@@ -16,6 +16,7 @@ on:
 jobs:
   test:
     runs-on: ${{ matrix.os }}
+    continue-on-error: true
 
     strategy:
       matrix:
@@ -6,6 +6,7 @@ on:
 jobs:
   test:
     runs-on: ${{ matrix.os }}
+    continue-on-error: true
 
     strategy:
       matrix:
12 changes: 6 additions & 6 deletions .github/workflows/r2r-full-py-integration-tests.yml
@@ -16,6 +16,7 @@ on:
 jobs:
   test:
     runs-on: ${{ matrix.os }}
+    continue-on-error: true
 
     strategy:
       matrix:
@@ -27,6 +28,7 @@ jobs:
           - sdk-retrieval
           - sdk-auth
           - sdk-collections
+          - sdk-prompts
     env:
       OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
       TELEMETRY_ENABLED: 'false'
@@ -56,29 +58,27 @@ jobs:
       - name: Run CLI Ingestion Tests
         if: matrix.test_category == 'cli-ingestion'
         uses: ./.github/actions/run-cli-ingestion-tests
-        continue-on-error: true
 
       - name: Run CLI Retrieval Tests
         if: matrix.test_category == 'cli-retrieval'
         uses: ./.github/actions/run-cli-retrieval-tests
-        continue-on-error: true
 
       - name: Run SDK Ingestion Tests
         if: matrix.test_category == 'sdk-ingestion'
         uses: ./.github/actions/run-sdk-ingestion-tests
-        continue-on-error: true
 
       - name: Run SDK Retrieval Tests
         if: matrix.test_category == 'sdk-retrieval'
         uses: ./.github/actions/run-sdk-retrieval-tests
-        continue-on-error: true
 
       - name: Run SDK Auth Tests
         if: matrix.test_category == 'sdk-auth'
         uses: ./.github/actions/run-sdk-auth-tests
-        continue-on-error: true
 
       - name: Run SDK Collections Tests
         if: matrix.test_category == 'sdk-collections'
         uses: ./.github/actions/run-sdk-collections-tests
-        continue-on-error: true
+
+      - name: Run SDK Prompt Tests
+        if: matrix.test_category == 'sdk-prompts'
+        uses: ./.github/actions/run-sdk-prompt-management-tests
@@ -18,6 +18,7 @@ on:
 jobs:
   test:
     runs-on: ${{ matrix.os }}
+    continue-on-error: true
 
     strategy:
       matrix:
@@ -8,6 +8,7 @@ on:
 jobs:
   test:
     runs-on: ${{ matrix.os }}
+    continue-on-error: true
 
     strategy:
       matrix:
12 changes: 6 additions & 6 deletions .github/workflows/r2r-light-py-integration-tests.yml
@@ -18,6 +18,7 @@ on:
 jobs:
   test:
     runs-on: ${{ matrix.os }}
+    continue-on-error: true
 
     strategy:
       matrix:
@@ -29,6 +30,7 @@ jobs:
           - sdk-retrieval
           - sdk-auth
           - sdk-collections
+          - sdk-prompts
     env:
       OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
       TELEMETRY_ENABLED: 'false'
@@ -59,29 +61,27 @@ jobs:
       - name: Run CLI Ingestion Tests
         if: matrix.test_category == 'cli-ingestion'
         uses: ./.github/actions/run-cli-ingestion-tests
-        continue-on-error: true
 
       - name: Run CLI Retrieval Tests
         if: matrix.test_category == 'cli-retrieval'
         uses: ./.github/actions/run-cli-retrieval-tests
-        continue-on-error: true
 
       - name: Run SDK Ingestion Tests
         if: matrix.test_category == 'sdk-ingestion'
         uses: ./.github/actions/run-sdk-ingestion-tests
-        continue-on-error: true
 
       - name: Run SDK Retrieval Tests
         if: matrix.test_category == 'sdk-retrieval'
         uses: ./.github/actions/run-sdk-retrieval-tests
-        continue-on-error: true
 
       - name: Run SDK Auth Tests
         if: matrix.test_category == 'sdk-auth'
         uses: ./.github/actions/run-sdk-auth-tests
-        continue-on-error: true
 
       - name: Run SDK Collections Tests
         if: matrix.test_category == 'sdk-collections'
         uses: ./.github/actions/run-sdk-collections-tests
-        continue-on-error: true
+
+      - name: Run SDK Prompt Tests
+        if: matrix.test_category == 'sdk-prompts'
+        uses: ./.github/actions/run-sdk-prompt-management-tests
2 changes: 1 addition & 1 deletion docs/api-reference/openapi.json

Large diffs are not rendered by default.

6 changes: 0 additions & 6 deletions docs/cookbooks/graphrag.mdx
@@ -99,9 +99,6 @@ excluded_parsers = ["mp4"]
 semantic_similarity_threshold = 0.7
 generation_config = { model = "openai/gpt-4o-mini" }
 
-[ingestion.extra_parsers]
-pdf = "zerox"
-
 [database]
 provider = "postgres"
 batch_size = 256
@@ -204,9 +201,6 @@ max_characters = 1_024
 combine_under_n_chars = 128
 overlap = 256
 
-[ingestion.extra_parsers]
-pdf = "zerox"
-
 [orchestration]
 provider = "hatchet"
 kg_creation_concurrency_lipmit = 32
2 changes: 1 addition & 1 deletion js/sdk/package-lock.json

Some generated files are not rendered by default.

2 changes: 1 addition & 1 deletion js/sdk/package.json
@@ -1,6 +1,6 @@
 {
   "name": "r2r-js",
-  "version": "0.3.11",
+  "version": "0.3.12",
   "description": "",
   "main": "dist/index.js",
   "browser": "dist/index.browser.js",
2 changes: 1 addition & 1 deletion py/cli/command_group.py
@@ -2,7 +2,7 @@
 from asyncclick import pass_context
 from asyncclick.exceptions import Exit
 
-from r2r import R2RAsyncClient
+from sdk import R2RAsyncClient
 
 
 @click.group()
4 changes: 3 additions & 1 deletion py/cli/commands/ingestion.py
@@ -11,7 +11,7 @@
 from cli.command_group import cli
 from cli.utils.param_types import JSON
 from cli.utils.timer import timer
-from core.base.abstractions import IndexMeasure, IndexMethod, VectorTableName
+from shared.abstractions import IndexMeasure, IndexMethod, VectorTableName
 
 
 async def ingest_files_from_urls(client, urls):
@@ -243,6 +243,7 @@ async def create_vector_index(
     index_measure,
     index_arguments,
     index_name,
+    index_column,
     no_concurrent,
 ):
     """Create a vector index for similarity search."""
@@ -254,6 +255,7 @@ async def create_vector_index(
             index_measure=index_measure,
             index_arguments=index_arguments,
             index_name=index_name,
+            index_column=index_column,
             concurrently=not no_concurrent,
         )
     click.echo(json.dumps(response, indent=2))
8 changes: 6 additions & 2 deletions py/core/__init__.py
@@ -134,6 +134,9 @@
     # Crypto provider
     "CryptoConfig",
     "CryptoProvider",
+    # Email provider
+    "EmailConfig",
+    "EmailProvider",
     # Database providers
     "DatabaseConfig",
     "DatabaseProvider",
@@ -192,9 +195,9 @@
     "AudioParser",
     "DOCXParser",
     "ImageParser",
-    "PDFParser",
+    "VLMPDFParser",
+    "BasicPDFParser",
     "PDFParserUnstructured",
-    "PDFParserMarker",
     "PPTParser",
     # Structured parsers
     "CSVParser",
@@ -233,6 +236,7 @@
     # Embeddings
     "LiteLLMEmbeddingProvider",
     "OpenAIEmbeddingProvider",
+    "OllamaEmbeddingProvider",
     # LLM
     "OpenAICompletionProvider",
     "LiteLLMCompletionProvider",
3 changes: 3 additions & 0 deletions py/core/base/__init__.py
@@ -106,6 +106,9 @@
     # Crypto provider
     "CryptoConfig",
     "CryptoProvider",
+    # Email provider
+    "EmailConfig",
+    "EmailProvider",
     # Database providers
     "DatabaseConfig",
     "DatabaseProvider",
7 changes: 2 additions & 5 deletions py/core/base/parsers/base_parser.py
@@ -3,14 +3,11 @@
 from abc import ABC, abstractmethod
 from typing import AsyncGenerator, Generic, TypeVar
 
-from ..abstractions import DataType
-
 T = TypeVar("T")
 
 
 class AsyncParser(ABC, Generic[T]):
+
     @abstractmethod
-    async def ingest(
-        self, data: T, **kwargs
-    ) -> AsyncGenerator[DataType, None]:
+    async def ingest(self, data: T, **kwargs) -> AsyncGenerator[str, None]:
         pass
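This hunk drops the `DataType` import and narrows the return annotation of `AsyncParser.ingest` to `AsyncGenerator[str, None]`. Below is a minimal sketch of a concrete subclass written against that signature. `ParagraphParser` and its paragraph-splitting rule are hypothetical. The base class is re-declared as it appears in the diff so the snippet runs on its own.

```python
import asyncio
from abc import ABC, abstractmethod
from typing import AsyncGenerator, Generic, TypeVar

T = TypeVar("T")


class AsyncParser(ABC, Generic[T]):
    # Mirrors the updated signature: ingest yields plain strings.
    @abstractmethod
    async def ingest(self, data: T, **kwargs) -> AsyncGenerator[str, None]:
        pass


class ParagraphParser(AsyncParser[bytes]):
    """Hypothetical parser: decodes bytes and yields one string per paragraph."""

    async def ingest(self, data: bytes, **kwargs) -> AsyncGenerator[str, None]:
        text = data.decode(kwargs.get("encoding", "utf-8"))
        for paragraph in text.split("\n\n"):
            # Skip whitespace-only fragments left by trailing blank lines.
            if paragraph.strip():
                yield paragraph.strip()


async def collect(parser: AsyncParser[bytes], data: bytes) -> list[str]:
    """Drain the async generator into a list."""
    return [chunk async for chunk in parser.ingest(data)]


chunks = asyncio.run(collect(ParagraphParser(), b"first\n\nsecond\n\n\n"))
print(chunks)  # ['first', 'second']
```

Yielding plain strings keeps parsers decoupled from the abstractions module, which may be one reason the import was removed. Callers that need richer typing can wrap the strings downstream.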
4 changes: 4 additions & 0 deletions py/core/base/providers/__init__.py
@@ -16,6 +16,7 @@
     UserHandler,
     VectorHandler,
 )
+from .email import EmailConfig, EmailProvider
 from .embedding import EmbeddingConfig, EmbeddingProvider
 from .ingestion import ChunkingStrategy, IngestionConfig, IngestionProvider
 from .llm import CompletionConfig, CompletionProvider
@@ -36,6 +37,9 @@
     # Crypto provider
     "CryptoConfig",
     "CryptoProvider",
+    # Email provider
+    "EmailConfig",
+    "EmailProvider",
     # Database providers
     "DatabaseConnectionManager",
     "DocumentHandler",
17 changes: 15 additions & 2 deletions py/core/base/providers/auth.py
@@ -10,6 +10,8 @@
 from ..api.models import UserResponse
 from .base import Provider, ProviderConfig
 from .crypto import CryptoProvider
+from .database import DatabaseProvider
+from .email import EmailProvider
 
 logger = logging.getLogger()
 
@@ -33,8 +35,17 @@ def validate_config(self) -> None:
 
 class AuthProvider(Provider, ABC):
     security = HTTPBearer(auto_error=False)
-
-    def __init__(self, config: AuthConfig, crypto_provider: CryptoProvider):
+    crypto_provider: CryptoProvider
+    email_provider: EmailProvider
+    database_provider: DatabaseProvider
+
+    def __init__(
+        self,
+        config: AuthConfig,
+        crypto_provider: CryptoProvider,
+        database_provider: DatabaseProvider,
+        email_provider: EmailProvider,
+    ):
         if not isinstance(config, AuthConfig):
             raise ValueError(
                 "AuthProvider must be initialized with an AuthConfig"
@@ -43,6 +54,8 @@ def __init__(self, config: AuthConfig, crypto_provider: CryptoProvider):
         self.admin_email = config.default_admin_email
         self.admin_password = config.default_admin_password
         self.crypto_provider = crypto_provider
+        self.database_provider = database_provider
+        self.email_provider = email_provider
         super().__init__(config)
         self.config: AuthConfig = config # for type hinting
 
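After this change, `AuthProvider` receives its crypto, database and email providers through the constructor instead of only the crypto provider. The following minimal sketch shows that constructor-injection shape in isolation. `AuthConfig` and the three provider classes are hypothetical stand-ins for the R2R classes named in the diff. The R2R `Provider` base class and its `super().__init__(config)` call are omitted.

```python
from dataclasses import dataclass


# Hypothetical stand-ins for the R2R classes referenced in the diff.
@dataclass
class AuthConfig:
    default_admin_email: str
    default_admin_password: str


class CryptoProvider: ...


class DatabaseProvider: ...


class EmailProvider: ...


class AuthProvider:
    crypto_provider: CryptoProvider
    email_provider: EmailProvider
    database_provider: DatabaseProvider

    def __init__(
        self,
        config: AuthConfig,
        crypto_provider: CryptoProvider,
        database_provider: DatabaseProvider,
        email_provider: EmailProvider,
    ):
        if not isinstance(config, AuthConfig):
            raise ValueError("AuthProvider must be initialized with an AuthConfig")
        self.config = config
        self.admin_email = config.default_admin_email
        self.admin_password = config.default_admin_password
        # Collaborators are injected, so tests can pass fakes like these stubs.
        self.crypto_provider = crypto_provider
        self.database_provider = database_provider
        self.email_provider = email_provider


auth = AuthProvider(
    AuthConfig("admin@example.com", "change_me"),
    crypto_provider=CryptoProvider(),
    database_provider=DatabaseProvider(),
    email_provider=EmailProvider(),
)
```

Because the new parameters are positional, every subclass and factory that constructs an auth provider has to supply the database and email providers as well.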