Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Multi Modal Bedrock Integration #17451

Merged
merged 5 commits into from
Jan 13, 2025

Conversation

dnandha
Copy link
Contributor

@dnandha dnandha commented Jan 7, 2025

Description

This PR adds support for AWS Bedrock multi-modal models in LlamaIndex. The integration allows users to interact with Bedrock's multi-modal models (currently Claude 3 family) for image and text analysis tasks.

Key features:

  • Support for all Claude 3 multi-modal models in Bedrock
  • Proper handling of image inputs through both file paths and base64 encoding
  • Token count tracking from Bedrock API responses
  • Comprehensive AWS authentication methods
  • Retry logic for API resilience
  • Both synchronous and asynchronous APIs

New Package?

Did I fill in the tool.llamahub section in the pyproject.toml and provide a detailed README.md for my new integration or package?

  • Yes
  • No

Version Bump?

Did I bump the version in the pyproject.toml file of the package I am updating?

  • Yes
  • No

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

  • I added new unit tests to cover this change
  • I believe this change is already covered by existing unit tests

Tests include:

  • Class inheritance and initialization
  • Model validation
  • Synchronous and asynchronous completion
  • Image input handling
  • Response parsing
  • Token count tracking

Checklist:

  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my feature works
  • New and existing unit tests pass locally with my changes
  • I ran make format; make lint to appease the lint gods

Implementation Details

The implementation includes:

  1. BedrockMultiModal class that inherits from MultiModalLLM
  2. Support for both file-based and base64-encoded image inputs
  3. Proper message formatting for Bedrock API
  4. AWS credential resolution with multiple authentication methods
  5. Retry logic for API resilience
  6. Token count tracking from API response headers
  7. Comprehensive documentation and examples

Testing

The package includes unit tests that cover:

  • Model initialization and validation
  • Image input handling
  • API request formatting
  • Response parsing
  • Error handling
  • Token count tracking

All tests are passing and the code follows LlamaIndex's coding standards.

Solves #13507.

@dosubot dosubot bot added the size:XL This PR changes 500-999 lines, ignoring generated files. label Jan 7, 2025
@logan-markewich
Copy link
Collaborator

I appreciate the PR @dnandha -- but actually, Im inclined to not merge this. We are in the middle of deprecating our multimodal llm classes, and building in multi modal capabilities directly into the existing LLM class

For example, anthropic:

OpenAI:

Usage:
image

@logan-markewich
Copy link
Collaborator

Issue we are tracking the migration with
#15949

@dnandha
Copy link
Contributor Author

dnandha commented Jan 8, 2025

Hey @logan-markewich . Thanks for looking into the PR so quickly and sharing your concern. I wasn't aware about the issue you mentioned, but see your point. However, after checking your standpoint, my suggestion would still be merging the PR:

  • This integration can serve as a basis for the migration to llms with ImageBlock
  • It gives users something to work with, currently it's not possible to work with Bedrock multi modal models with images
  • The tutorials all build upon multi-modal-llms: https://docs.llamaindex.ai/en/stable/examples/multi_modal/anthropic_multi_modal/: it would be easy to add a new page for bedrock. later, after deprecation of multi-modal-llms the tutorials could all redirect readers to the llm integration

Since the work has already been done, my suggestion would be the following:

  1. Review / Merge PR
  2. Add another checkbox for Bedrock in Consolidate MultiModal LLMs with Base LLMs #15949
  3. Work on migration into llms (I can assist)
  4. Add a "TO BE DEPRECATED" note with a link to Consolidate MultiModal LLMs with Base LLMs #15949 to all existing multi-modal-integrations folders, so that other developers know

Let me know your thoughts on this.

@logan-markewich
Copy link
Collaborator

@dnandha the unit tests are failing, can you take a look? If these are relying on actual api calls, you might have to mock them

@dosubot dosubot bot added the lgtm This PR has been approved by a maintainer label Jan 13, 2025
@logan-markewich logan-markewich merged commit 080faff into run-llama:main Jan 13, 2025
11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
lgtm This PR has been approved by a maintainer size:XL This PR changes 500-999 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants