
[Feat]: Structured Outputs / Function Calling for vertexAI Batch predictions #1655

Closed
davidfeiz opened this issue Jan 24, 2025 · 1 comment

@davidfeiz

Is your feature request related to a problem? Please describe.

I already posted a question about this on the Google Cloud forum (https://www.googlecloudcommunity.com/gc/AI-ML/Structured-Output-in-vertexAI-BatchPredictionJob/m-p/862525), but here is a brief description of the problem.
For the evaluation of an app, we want to process a large dataset using the Batch Predictions API. For comparability with the actual dev/prod pipeline, we need the output to be generated in the same way. We pass a Pydantic BaseModel to the real-time API as the response schema, so we do not use it merely to validate the output after the fact. Unfortunately, this does not seem to be possible with Batch Predictions, nor can I use function calling as a workaround.
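For reference, this is roughly how we use the schema with the real-time API today (a minimal sketch; the project, model name, and `Evaluation` schema are placeholders, and Pydantic's `model_json_schema()` output may need trimming to the OpenAPI subset Vertex AI accepts):

```python
import vertexai
from vertexai.generative_models import GenerativeModel, GenerationConfig
from pydantic import BaseModel

# Hypothetical schema standing in for our actual output model.
class Evaluation(BaseModel):
    score: int
    rationale: str

vertexai.init(project="my-project", location="us-central1")  # placeholders

model = GenerativeModel("gemini-1.5-pro")
response = model.generate_content(
    "Evaluate the following answer: ...",
    generation_config=GenerationConfig(
        response_mime_type="application/json",
        # Pydantic emits JSON Schema, which is close to but not identical to
        # the OpenAPI schema Vertex AI expects; fields like "title"/"$defs"
        # may need to be stripped.
        response_schema=Evaluation.model_json_schema(),
    ),
)
print(response.text)  # JSON conforming to Evaluation
```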

Describe the solution you'd like

Ideally, it would be possible to set additional parameters such as response_schema in GenerationConfig(), or to configure function calling, when initializing the model for a batch prediction job, just as with the real-time API.
Existing implementations in the Vertex AI GenerativeModel:
- response_schema: https://cloud.google.com/vertex-ai/generative-ai/docs/multimodal/control-generated-output
- function calling: https://cloud.google.com/vertex-ai/generative-ai/docs/multimodal/function-calling#python-from-function

Describe alternatives you've considered

Currently we use the real-time API and insert a delay between consecutive calls to avoid hitting the API's rate limits. This is unreliable, since we still hit the limits occasionally. Moreover, there is no easy way to run the evaluation pipeline asynchronously, which would let us send batches with different configurations for evaluation without waiting for each response first.
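Roughly what the current workaround looks like (a sketch; the delay is hand-tuned, calls remain strictly sequential, and quota errors still slip through):

```python
import time

import vertexai
from vertexai.generative_models import GenerativeModel, GenerationConfig

vertexai.init(project="my-project", location="us-central1")  # placeholders

model = GenerativeModel("gemini-1.5-pro")
config = GenerationConfig(response_mime_type="application/json")

prompts = ["Evaluate: ..."]  # stand-in for our evaluation dataset
results = []
for prompt in prompts:
    results.append(model.generate_content(prompt, generation_config=config))
    time.sleep(2)  # hand-tuned delay; still no hard guarantee against 429s
```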

Additional context

No response

Code of Conduct

  • I agree to follow this project's Code of Conduct
@holtskinner (Collaborator)

Hi, there is an example showing how to use Controlled Generation with Batch Prediction in this notebook: https://github.com/GoogleCloudPlatform/generative-ai/blob/main/gemini/use-cases/document-processing/patents_understanding.ipynb
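In short, the generation config (including the response schema) goes into each batch request. A minimal sketch of that approach, assuming a GCS JSONL input (the notebook may use a different input source); the bucket paths, model version, and schema below are placeholders:

```python
import json

import vertexai
from vertexai.batch_prediction import BatchPredictionJob

vertexai.init(project="my-project", location="us-central1")  # placeholders

# One JSONL line per request; generationConfig carries the response schema,
# so controlled generation applies to every item in the batch.
request_line = {
    "request": {
        "contents": [{"role": "user", "parts": [{"text": "Evaluate: ..."}]}],
        "generationConfig": {
            "responseMimeType": "application/json",
            "responseSchema": {
                "type": "object",
                "properties": {
                    "score": {"type": "integer"},
                    "rationale": {"type": "string"},
                },
            },
        },
    }
}
print(json.dumps(request_line))  # write lines like this to the input file

# After uploading the JSONL to Cloud Storage (paths are placeholders):
job = BatchPredictionJob.submit(
    source_model="gemini-1.5-pro-002",
    input_dataset="gs://my-bucket/requests.jsonl",
    output_uri_prefix="gs://my-bucket/batch-output/",
)
```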

holtskinner self-assigned this Jan 24, 2025