Dev minor (#1126)
* Feature/improve r2r telemetry (#1122)

* improve telemetry

* finish telemetry tweaks

* Feature/improve cli infra (#1123)

* improve telemetry

* finish telemetry tweaks

* up

* Feature/add serve fallback to main (#1125)

* improve telemetry

* finish telemetry tweaks

* up

* fallback to main

* Merge fragments (#1127)

* troubleshooting docs (#1128)

* troubleshooting docs (#1129)

* add system diagram (#1130)

* add system diagram

* rm multi

* fix overview

* cleanup and fix

* fix syntax

* change to fast strategy by default (#1133)

* Update parameter passing in js sdk (#1132)

* Docs changes + add entity and relationship types (#1134)

* up

* up

* up

* up

* reduce verbosity

* Feature/dev minor cleanups (#1135)

* cleanups

* bump pkg

---------

Co-authored-by: Shreyas Pimpalgaonkar <shreyas.gp.7@gmail.com>
Co-authored-by: Nolan Tremelling <34580718+NolanTrem@users.noreply.github.com>
3 people authored Sep 12, 2024
1 parent 74a5e51 commit dd6ef23
Showing 74 changed files with 10,199 additions and 1,939 deletions.
84 changes: 0 additions & 84 deletions .github/workflows/build-main-old.yml

This file was deleted.

14 changes: 13 additions & 1 deletion .github/workflows/build-main.yml
@@ -18,13 +18,25 @@ jobs:
release_version: ${{ steps.version.outputs.RELEASE_VERSION }}
matrix: ${{ steps.set-matrix.outputs.matrix }}
steps:
+ - name: Checkout Repository
+ uses: actions/checkout@v4
+
+ - name: Set up Python
+ uses: actions/setup-python@v4
+ with:
+ python-version: '3.10'
+
+ - name: Install toml package
+ run: pip install toml
+
- name: Determine version to use
id: version
run: |
if [ -n "${{ github.event.inputs.version }}" ]; then
echo "RELEASE_VERSION=${{ github.event.inputs.version }}" >> $GITHUB_OUTPUT
else
- echo "RELEASE_VERSION=main" >> $GITHUB_OUTPUT
+ VERSION=$(python -c "import toml; print(toml.load('py/pyproject.toml')['tool']['poetry']['version'])")
+ echo "RELEASE_VERSION=$VERSION" >> $GITHUB_OUTPUT
fi
- name: Set matrix
2 changes: 2 additions & 0 deletions .gitignore
@@ -4,6 +4,7 @@
*.gguf
logs/
workspace/
+ py/workspace/
uploads/
env/
**/__pycache__
@@ -19,6 +20,7 @@ coverage.xml

node_modules/
dist/
+ **/.data/*

*.exe
*.exe~
19 changes: 8 additions & 11 deletions docs/cookbooks/graphrag.mdx
@@ -24,27 +24,21 @@ Note that graph construction may take long for local LLMs, we recommend using cl
<Tabs>
<Tab title="Cloud LLMs">
```bash
- r2r serve --config-name=neo4j_kg
+ r2r serve
```

- <Accordion icon="gear" title="Configuration: neo4j_kg">
+ <Accordion icon="gear" title="Configuration: r2r.toml">
``` toml
- [chunking]
- provider = "unstructured_local"
- strategy = "auto"
- chunking_strategy = "basic"
- new_after_n_chars = 2_048
- max_characters = 4_096 # use larger max_characters for KG construction
- combine_under_n_chars = 512
- overlap = 20
-
[kg]
provider = "neo4j"
batch_size = 256
kg_extraction_prompt = "graphrag_triplet_extraction_zero_shot"

[kg.kg_creation_settings]
+ entity_types = [] # if empty, all entities are extracted
+ relation_types = [] # if empty, all relations are extracted
max_knowledge_triples = 100
+ fragment_merge_count = 4 # number of fragments to merge into a single extraction
generation_config = { model = "gpt-4o-mini" } # and other params, model used for triplet extraction

[kg.kg_enrichment_settings]
@@ -104,7 +98,10 @@ provider = "neo4j"
kg_extraction_prompt = "graphrag_triplet_extraction_zero_shot"

[kg.kg_creation_settings]
+ entity_types = [] # if empty, all entities are extracted
+ relation_types = [] # if empty, all relations are extracted
max_knowledge_triples = 100
+ fragment_merge_count = 4 # number of fragments to merge into a single extraction
generation_config = { model = "ollama/llama3.1" } # and other params, model used for triplet extraction

[kg.kg_enrichment_settings]
Original file line number Diff line number Diff line change
@@ -5,7 +5,7 @@ description: 'Configuration for Restructuring data after ingestion using Knowled

It is often effective to restructure data after ingestion to improve retrieval performance and accuracy. R2R supports knowledge graphs for data restructuring. You can find out more about creating knowledge graphs in the [Knowledge Graphs Guide](/cookbooks/graphrag).

- You can configure knowledge graph enrichment in the R2R configuration file. To do this, just set the `kg.kg_enrichment_settings` section in the configuration file. Following is the sample format from the example configuration file `neo4j_kg.toml`.
+ You can configure knowledge graph enrichment in the R2R configuration file. To do this, just set the `kg.kg_enrichment_settings` section in the configuration file. Following is the sample format from the example configuration file `r2r.toml`.

```toml
[kg]
@@ -14,6 +14,9 @@ batch_size = 256
kg_extraction_prompt = "graphrag_triplet_extraction_zero_shot"

[kg.kg_creation_settings]
+ entity_types = [] # if empty, all entities are extracted
+ relation_types = [] # if empty, all relations are extracted
+ fragment_merge_count = 4 # number of fragments to merge into a single extraction
max_knowledge_triples = 100 # max number of triples to extract for each document chunk
generation_config = { model = "gpt-4o-mini" } # and other generation params

16 changes: 13 additions & 3 deletions docs/documentation/configuration/knowledge-graph/overview.mdx
@@ -4,10 +4,10 @@ description: 'Configure your R2R knowledge graph provider.'
---
## Knowledge Graph Provider

- R2R supports knowledge graph functionality to enhance document understanding and retrieval. By default, R2R uses [Neo4j](https://neo4j.com/) as the knowledge graph provider. We are actively working to integrate with [Memgraph](https://memgraph.com/docs). You can find out more about creating knowledge graphs in the [Knowledge Graphs Guide](/cookbooks/graphrag).
+ R2R supports knowledge graph functionality to enhance document understanding and retrieval. By default, R2R uses [Neo4j](https://neo4j.com/) as the knowledge graph provider. We are actively working to integrate with [Memgraph](https://memgraph.com/docs). You can find out more about creating knowledge graphs in the [GraphRAG Cookbook](/cookbooks/graphrag).


- To configure the knowledge graph settings:
+ To configure the knowledge graph settings for your project:

1. Edit the `kg` section in your `r2r.toml` file:

@@ -18,8 +18,11 @@ batch_size = 256
kg_extraction_prompt = "graphrag_triplet_extraction_zero_shot"

[kg.kg_creation_settings]
+ entity_types = [] # if empty, all entities are extracted
+ relation_types = [] # if empty, all relations are extracted
generation_config = { model = "gpt-4o-mini" }
max_knowledge_triples = 100 # max number of triples to extract for each document chunk
+ fragment_merge_count = 4 # number of fragments to merge into a single extraction

[kg.kg_enrichment_settings]
max_summary_input_length = 65536
@@ -38,6 +41,7 @@ Let's break down the knowledge graph configuration options:
- `kg_extraction_prompt`: Specifies the prompt template to use for extracting knowledge graph information from text.
- `kg_creation_settings`: Configuration for the model used in knowledge graph creation.
- `max_knowledge_triples`: The maximum number of knowledge triples to extract for each document chunk.
+ - `fragment_merge_count`: The number of fragments to merge into a single extraction.
- `generation_config`: Configuration for the model used in knowledge graph creation.
- `kg_enrichment_settings`: Similar configuration for the model used in knowledge graph enrichment.
- `generation_config`: Configuration for the model used in knowledge graph enrichment.
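
One way to picture `fragment_merge_count` from the option list above is as a grouping step that runs before extraction. The sketch below is only an illustration under stated assumptions: it treats fragments as plain strings that are merged in order and joined with a space, and `merge_fragments` is a hypothetical helper, not R2R's implementation.

```python
from typing import List


def merge_fragments(fragments: List[str], fragment_merge_count: int = 4) -> List[str]:
    """Group consecutive fragments into runs of `fragment_merge_count` and join each run."""
    if fragment_merge_count < 1:
        raise ValueError("fragment_merge_count must be at least 1")
    return [
        " ".join(fragments[i : i + fragment_merge_count])
        for i in range(0, len(fragments), fragment_merge_count)
    ]
```

With five fragments and the default of 4, this yields two extraction inputs: one holding the first four fragments and one holding the remaining fragment.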
@@ -46,7 +50,7 @@ Let's break down the knowledge graph configuration options:

### Neo4j Configuration

- When using Neo4j as the knowledge graph provider, you need to set up the following environment variables or provide them in the `r2r.toml` file:
+ When using Neo4j as the knowledge graph provider, you need to set up the following environment variables or provide them in the `r2r.toml` file. To set them as environment variables:

```bash
export NEO4J_USER=your_neo4j_username
@@ -55,6 +59,8 @@ export NEO4J_URL=bolt://your_neo4j_host:7687
export NEO4J_DATABASE=neo4j
```

+ And to set them directly in your config:
+
```toml r2r.toml
[kg]
provider = "neo4j"
@@ -64,6 +70,10 @@ url = "bolt://your_neo4j_host:7687"
database = "neo4j"
```

+ <Note>
+ Setting configuration values in the `r2r.toml` will override environment variables by default.
+ </Note>
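
The precedence described in the note can be sketched as a small lookup. This is a hedged illustration rather than R2R's actual resolution code: `resolve_setting` is a hypothetical helper, and the mapping from the `url` key to `NEO4J_URL` is assumed from the two snippets shown in this diff.

```python
import os
from typing import Mapping, Optional


def resolve_setting(
    key: str,
    toml_kg: Mapping[str, str],
    env_var: str,
    env: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Prefer the value from the [kg] table of r2r.toml, else fall back to the environment."""
    env = os.environ if env is None else env
    if key in toml_kg:
        return toml_kg[key]  # a config file value overrides the environment variable
    return env.get(env_var)  # None if neither source defines the setting
```

For example, when both `url` in the `[kg]` table and `NEO4J_URL` are set, the helper returns the TOML value; when only the environment variable is set, it returns that instead.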


### Knowledge Graph Operations

2 changes: 1 addition & 1 deletion docs/documentation/deep-dive/main/config.mdx
@@ -34,7 +34,7 @@ config = R2RConfig.from_toml("path/to/your/r2r.toml")
r2r = R2RBuilder(config).build()

# Or use a preset configuration
- r2r = R2RBuilder(config_name="neo4j_kg").build()
+ r2r = R2RBuilder(config_name="default").build()
```

## Configuration Sections