Is your feature request related to a problem? Please describe.
I already posted a question about this in the Google Cloud community forum (https://www.googlecloudcommunity.com/gc/AI-ML/Structured-Output-in-vertexAI-BatchPredictionJob/m-p/862525), but I'll give a brief description of the problem here.
To evaluate an app, we want to process a large dataset with the batch prediction API. For comparability with the actual dev/prod pipeline, we need the output to be generated in the same way. We define a Pydantic BaseModel and pass it to the real-time API as the response schema, rather than only using it to validate the output afterwards. Unfortunately, this does not seem to be possible with batch predictions, nor can I use function calling to work around it.
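For reference, this is roughly how we enforce the schema with the real-time API today. It is a minimal sketch: the project, location, model name, and the Verdict model are illustrative, and the SDK's response_schema expects an OpenAPI-style dict, so we derive one from the Pydantic model.

```python
import vertexai
from vertexai.generative_models import GenerativeModel, GenerationConfig
from pydantic import BaseModel


class Verdict(BaseModel):
    # Illustrative evaluation output; our real schema is larger.
    label: str
    score: float


vertexai.init(project="my-project", location="europe-west4")  # placeholder values

model = GenerativeModel("gemini-1.5-pro")  # illustrative model name
response = model.generate_content(
    "Evaluate the following answer ...",
    generation_config=GenerationConfig(
        response_mime_type="application/json",
        # The SDK takes an OpenAPI-style dict; deriving it from the Pydantic
        # model works for simple schemas but may need manual adjustment for
        # keywords the schema format does not support.
        response_schema=Verdict.model_json_schema(),
    ),
)
print(Verdict.model_validate_json(response.text))
```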
Describe the solution you'd like
Ideally, it would be possible to set additional parameters such as response_schema in GenerationConfig(), or to use a function-calling mechanism when initializing the model, just as the real-time API allows (see the hypothetical sketch after the links below).
Relevant implementations in the Vertex AI GenerativeModel:
-> response_schema
https://cloud.google.com/vertex-ai/generative-ai/docs/multimodal/control-generated-output
-> function-calling
https://cloud.google.com/vertex-ai/generative-ai/docs/multimodal/function-calling#python-from-function
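A hypothetical sketch of what this could look like for batch prediction. The generation_config argument on BatchPredictionJob.submit() does not exist today; it is exactly the addition this request asks for, and the bucket URIs and schema are placeholders.

```python
from vertexai.batch_prediction import BatchPredictionJob
from vertexai.generative_models import GenerationConfig

# HYPOTHETICAL: submit() does not currently accept generation_config;
# this is the requested feature, not existing SDK behaviour.
job = BatchPredictionJob.submit(
    source_model="gemini-1.5-pro",                       # illustrative model
    input_dataset="gs://my-bucket/eval-requests.jsonl",  # placeholder URI
    output_uri_prefix="gs://my-bucket/eval-results/",    # placeholder URI
    generation_config=GenerationConfig(
        response_mime_type="application/json",
        response_schema={  # OpenAPI-style schema, as accepted by the real-time API
            "type": "object",
            "properties": {
                "label": {"type": "string"},
                "score": {"type": "number"},
            },
            "required": ["label", "score"],
        },
    ),
)
```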
Describe alternatives you've considered
Currently we use the real-time API and add a delay between consecutive calls to avoid hitting its rate limits. This is unreliable, since we still reach the limits occasionally. Moreover, there is no easy way to run the evaluation pipeline asynchronously, which would let us send batches with different configurations for evaluation without waiting for each response first. A rough sketch of the current workaround follows.
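This is only the rough shape of the workaround; the delay and retry counts are illustrative.

```python
import time

from google.api_core.exceptions import ResourceExhausted


def evaluate_serially(model, prompts, generation_config, delay_s=2.0, max_retries=3):
    """Serial real-time calls with a fixed delay and a retry on quota errors."""
    results = []
    for prompt in prompts:
        for attempt in range(max_retries):
            try:
                results.append(
                    model.generate_content(prompt, generation_config=generation_config)
                )
                break
            except ResourceExhausted:
                # We still hit rate limits occasionally despite the delay,
                # so back off a little more on each retry.
                time.sleep(delay_s * (attempt + 1))
        time.sleep(delay_s)
    return results
```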
Additional context
No response
Code of Conduct
I agree to follow this project's Code of Conduct