
(bedrock): s3-data-source - add hierarchical and semantic chunking support #648

Closed
aws-rafams opened this issue Aug 21, 2024 · 1 comment
Labels
duplicate This issue or pull request already exists

Comments

@aws-rafams
Contributor

aws-rafams commented Aug 21, 2024

Describe the feature

It would be useful to add support for hierarchical and semantic chunking in addition to the already supported fixed-size chunking. The current `ChunkingStrategy` enum only offers `FIXED_SIZE`, `DEFAULT` and `NONE`:

export enum ChunkingStrategy {
  /**
   * Amazon Bedrock splits your source data into chunks of the approximate size
   * that you set in the `fixedSizeChunkingConfiguration`.
   */
  FIXED_SIZE = 'FIXED_SIZE',
  /**
   * `FIXED_SIZE` with the default chunk size of 300 tokens and 20% overlap.
   * If default is selected, chunk size and overlap set by the user will be
   * ignored.
   */
  DEFAULT = 'DEFAULT',
  /**
   * Amazon Bedrock treats each file as one chunk. If you choose this option,
   * you may want to pre-process your documents by splitting them into separate
   * files.
   */
  NONE = 'NONE',
}

https://docs.aws.amazon.com/bedrock/latest/userguide/kb-chunking-parsing.html#kb-hiearchical-chunking
https://docs.aws.amazon.com/bedrock/latest/userguide/kb-chunking-parsing.html#kb-semantic-chunking
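A minimal sketch of how the enum might be extended, assuming the new members reuse the string values accepted by the L1 `chunkingStrategy` property (`HIERARCHICAL`, `SEMANTIC`). The `toCfnChunkingStrategy` helper is hypothetical; it only illustrates that `DEFAULT` is a construct-level alias for `FIXED_SIZE`, as the doc comment on `DEFAULT` describes, while every other member passes through unchanged.

```typescript
// Hypothetical extended enum: HIERARCHICAL and SEMANTIC are assumed to
// mirror the values accepted by the L1 CfnDataSource `chunkingStrategy`.
enum ChunkingStrategy {
  FIXED_SIZE = 'FIXED_SIZE',
  DEFAULT = 'DEFAULT',
  NONE = 'NONE',
  HIERARCHICAL = 'HIERARCHICAL',
  SEMANTIC = 'SEMANTIC',
}

// Hypothetical helper: DEFAULT is emitted as FIXED_SIZE (the construct then
// fills in 300 tokens / 20% overlap); all other members are passed through.
function toCfnChunkingStrategy(strategy: ChunkingStrategy): string {
  return strategy === ChunkingStrategy.DEFAULT
    ? ChunkingStrategy.FIXED_SIZE
    : strategy;
}
```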

Use Case

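Semantic chunking aims to improve retrieval accuracy by placing chunk boundaries according to the semantic content of the text rather than just its syntactic structure or a fixed token count. Hierarchical chunking organizes content into larger parent chunks and smaller child chunks: retrieval first matches child chunks and then replaces them with their broader parent chunks, giving the model more context.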
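To illustrate the general idea (this is a common semantic-chunking technique, not a description of Bedrock's internal implementation), one approach embeds each sentence, measures the cosine distance between consecutive sentences, and starts a new chunk wherever that distance exceeds a chosen percentile of all the distances. The sketch assumes the sentence embeddings are already computed; `semanticChunks` and its `breakpointPercentile` parameter are hypothetical names, loosely analogous to Bedrock's breakpoint percentile threshold.

```typescript
// Cosine distance between two equal-length vectors: 1 - cos(angle).
function cosineDistance(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 1; // treat zero vectors as unrelated
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Nearest-rank percentile (p in 0..100) of a non-empty list of numbers.
function percentile(values: number[], p: number): number {
  const sorted = [...values].sort((x, y) => x - y);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.min(Math.max(rank - 1, 0), sorted.length - 1)];
}

// Hypothetical semantic chunker: start a new chunk before sentence i + 1
// whenever the distance between sentences i and i + 1 exceeds the threshold.
function semanticChunks(
  sentences: string[],
  embeddings: number[][],
  breakpointPercentile: number,
): string[] {
  if (sentences.length !== embeddings.length) {
    throw new Error('exactly one embedding per sentence is required');
  }
  if (sentences.length === 0) return [];
  if (sentences.length === 1) return [sentences[0]];

  const distances: number[] = [];
  for (let i = 0; i + 1 < embeddings.length; i++) {
    distances.push(cosineDistance(embeddings[i], embeddings[i + 1]));
  }
  const threshold = percentile(distances, breakpointPercentile);

  const chunks: string[] = [];
  let current: string[] = [sentences[0]];
  distances.forEach((d, i) => {
    if (d > threshold) {
      chunks.push(current.join(' ')); // close the chunk at a semantic break
      current = [];
    }
    current.push(sentences[i + 1]);
  });
  chunks.push(current.join(' '));
  return chunks;
}

const example = semanticChunks(
  ['Cats purr.', 'Kittens meow.', 'Stocks fell.', 'Markets dipped.'],
  [[1, 0], [0.9, 0.1], [0, 1], [0.1, 0.9]],
  50,
);
```

With these toy embeddings and a 50th-percentile threshold, the only distance above the threshold is the one between "Kittens meow." and "Stocks fell.", so `example` holds two chunks. Bedrock's semantic chunking also exposes a buffer size and a maximum token count, which this sketch leaves out.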

Proposed Solution

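Modify the `s3-data-source.ts` module to add the new strategies to `ChunkingStrategy` and pass their parameters through to the L1 chunking configuration.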
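One possible shape for that change, sketched under assumptions: the construct could accept hypothetical `ChunkingProps` for the two new strategies and translate them into the camelCase shape of the L1 `CfnDataSource.ChunkingConfigurationProperty` (`hierarchicalChunkingConfiguration` with `levelConfigurations` and `overlapTokens`; `semanticChunkingConfiguration` with `maxTokens`, `bufferSize` and `breakpointPercentileThreshold`), which the L1 props nest under `vectorIngestionConfiguration.chunkingConfiguration`. The interfaces below are local stand-ins so the snippet runs without `aws-cdk-lib`, and the fixed-size branch is omitted; field names and the parent/child level order should be checked against the L1 reference.

```typescript
// Local stand-in assumed to mirror CfnDataSource.ChunkingConfigurationProperty.
interface ChunkingConfigurationProperty {
  chunkingStrategy: string;
  fixedSizeChunkingConfiguration?: { maxTokens: number; overlapPercentage: number };
  hierarchicalChunkingConfiguration?: {
    levelConfigurations: { maxTokens: number }[];
    overlapTokens: number;
  };
  semanticChunkingConfiguration?: {
    maxTokens: number;
    bufferSize: number;
    breakpointPercentileThreshold: number;
  };
}

// Hypothetical L2 props for the two new strategies.
type ChunkingProps =
  | { strategy: 'HIERARCHICAL'; parentMaxTokens: number; childMaxTokens: number; overlapTokens: number }
  | { strategy: 'SEMANTIC'; maxTokens: number; bufferSize: number; breakpointPercentileThreshold: number };

function renderChunkingConfiguration(props: ChunkingProps): ChunkingConfigurationProperty {
  switch (props.strategy) {
    case 'HIERARCHICAL':
      // Hypothetical sanity check: a parent chunk should not be smaller than its children.
      if (props.childMaxTokens > props.parentMaxTokens) {
        throw new Error('childMaxTokens must not exceed parentMaxTokens');
      }
      return {
        chunkingStrategy: 'HIERARCHICAL',
        hierarchicalChunkingConfiguration: {
          // Level order assumed to be [parent, child].
          levelConfigurations: [
            { maxTokens: props.parentMaxTokens },
            { maxTokens: props.childMaxTokens },
          ],
          overlapTokens: props.overlapTokens,
        },
      };
    case 'SEMANTIC':
      return {
        chunkingStrategy: 'SEMANTIC',
        semanticChunkingConfiguration: {
          maxTokens: props.maxTokens,
          bufferSize: props.bufferSize,
          breakpointPercentileThreshold: props.breakpointPercentileThreshold,
        },
      };
    default:
      throw new Error('unsupported chunking strategy');
  }
}
```

Using a discriminated union keeps each strategy's parameters separate, so a caller cannot pass semantic settings together with a hierarchical strategy.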

Other Information

Semantic and hierarchical chunking are already supported by the L1 construct (`CfnDataSource`):
https://docs.aws.amazon.com/cdk/api/v2/docs/aws-cdk-lib.aws_bedrock.CfnDataSource.ChunkingConfigurationProperty.html#chunkingstrategy

Acknowledgements

  • I may be able to implement this feature request
  • This feature might incur a breaking change
aws-rafams added the needs-triage label Aug 21, 2024
krokoko added the duplicate label and removed the needs-triage label Aug 21, 2024
@krokoko
Collaborator

krokoko commented Aug 21, 2024

Closing as a duplicate of #587.
The new chunking strategies and their associated parameters (see here) are available in CloudFormation but not yet in the CDK. They should be added to the CDK in an upcoming release, and we will implement this feature then.
