feat: async semantic splitter noder parser #17449

mjrowsky · 2025-01-07T17:02:38Z

Description

Please include a summary of the change and which issue is fixed. Please also include relevant motivation and context. List any dependencies that are required for this change.

Fixes #

Allows using the embed_model async methods when parsing nodes.

New Package?

Did I fill in the tool.llamahub section in the pyproject.toml and provide a detailed README.md for my new integration or package?

Yes
No

Version Bump?

Did I bump the version in the pyproject.toml file of the package I am updating? (Except for the llama-index-core package)

Yes
No

Type of Change

Please delete options that are not relevant.

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
This change requires a documentation update

How Has This Been Tested?

Your pull-request will likely not be merged unless it is covered by some form of impactful unit testing.

I added new unit tests to cover this change
I believe this change is already covered by existing unit tests

Suggested Checklist:

I have performed a self-review of my own code
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
I have added Google Colab support for the newly added notebooks.
My changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes
I ran make format; make lint to appease the lint gods

logan-markewich · 2025-01-08T21:15:16Z

llama-index-core/llama_index/core/node_parser/text/semantic_splitter.py

+        nodes_with_progress = get_tqdm_iterable(nodes, show_progress, "Parsing nodes")
+
+        for node in nodes_with_progress:
+            nodes = await self.abuild_semantic_nodes_from_documents([node], show_progress)


This is a great start! We could make this even faster too, but maybe in another PR.

For example, we can run multiple nodes at once behind a semaphore to limit concurrency

    from llama_index.core.async_utils import run_jobs

    ...

    jobs = []
    for node in nodes_with_progress:
        jobs.append(self.abuild_semantic_nodes_from_documents([node], False))
    results = await run_jobs(jobs, workers=4, show_progress=show_progress)
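To make the "behind a semaphore" idea concrete, here is a minimal stdlib-only sketch of bounded concurrency. It is an illustration under assumptions, not the implementation of run_jobs: `run_bounded` and `fake_split` are hypothetical names, and `fake_split` merely stands in for a call such as abuild_semantic_nodes_from_documents([node], False).

```python
import asyncio


async def run_bounded(coros, workers=4):
    """Await all coroutines with at most `workers` running at once.

    Results come back in the same order as the input coroutines.
    """
    semaphore = asyncio.Semaphore(workers)

    async def guarded(coro):
        # Each coroutine only starts executing once it holds a semaphore slot.
        async with semaphore:
            return await coro

    return await asyncio.gather(*(guarded(c) for c in coros))


async def fake_split(node, tracker):
    # Hypothetical stand-in for the per-node async splitting call.
    tracker["active"] += 1
    tracker["peak"] = max(tracker["peak"], tracker["active"])
    await asyncio.sleep(0.01)  # simulates waiting on an async embedding request
    tracker["active"] -= 1
    return [f"{node}-chunk"]


async def main():
    tracker = {"active": 0, "peak": 0}
    nodes = [f"node{i}" for i in range(10)]
    results = await run_bounded([fake_split(n, tracker) for n in nodes], workers=4)
    # Each job returns a list of chunks, so flatten them into one list.
    flat = [chunk for chunks in results for chunk in chunks]
    return flat, tracker["peak"]


flat, peak = asyncio.run(main())
```

Because every job returns its own list, the caller has to flatten the per-node results afterwards; the same would apply to whatever the run_jobs call above returns. The `tracker` dictionary exists only to show that no more than `workers` jobs are ever in flight at once.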

logan-markewich · 2025-01-09T21:43:54Z

the unit test failure seemed unrelated imo

dosubot bot added the size:M This PR changes 30-99 lines, ignoring generated files. label Jan 7, 2025

logan-markewich reviewed Jan 8, 2025

logan-markewich approved these changes Jan 8, 2025

dosubot bot added the lgtm This PR has been approved by a maintainer label Jan 8, 2025

logan-markewich enabled auto-merge (squash) January 8, 2025 21:15

auto-merge was automatically disabled January 9, 2025 09:13
Head branch was pushed to by a user without write access

Miras Ayed added 2 commits January 9, 2025 10:19

feat: async semantic splitter noder parser

ba6d5c6

fix lint

d08e9d9

mjrowsky force-pushed the async_semantic_splitter branch from 7e3429e to d08e9d9 Compare January 9, 2025 09:19

logan-markewich merged commit 8620869 into run-llama:main Jan 9, 2025
10 of 11 checks passed
