LLMs have been able to stream data back chunk by chunk for months now. Guardrails needs to be able to ingest streamed data from LLMs and emit streamed data to clients. Here's a proposed plan for that support.
Guardrails Support for streaming
Nonexistent. Guardrails does not currently support the streaming APIs at all.
The plan
Guardrails generally does two types of validation: string and schema.
Step 1 (by mid-Nov)
Schema-based validation first checks that the resulting LLM output matches the schema provided by the `<output>` object in RAIL (or Pydantic). Then it checks each specified validator, field by field. As soon as a failure is hit, Guardrails enacts the on-fail action for that specific failure. For schema validation, the only on-fail options are reask and FAIL. For field-level validation, there are many more options.
To recap, validation currently ALWAYS does:
Full Skeleton validation
THEN
Each Field validation
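A minimal sketch of this two-phase flow (illustrative only, not actual Guardrails internals; `check_skeleton` and the schema shape here are assumptions):

```python
import json

def check_skeleton(data: dict, schema: dict) -> None:
    # Skeleton check: every field named in the schema must be present.
    missing = [field for field in schema if field not in data]
    if missing:
        raise ValueError(f"skeleton validation failed, missing fields: {missing}")

def validate(llm_output: str, schema: dict) -> dict:
    # Illustrative two-phase flow: full skeleton validation, THEN each field's validators.
    data = json.loads(llm_output)              # the FULL output must already be in hand
    check_skeleton(data, schema)                # 1. skeleton validation (on-fail: reask or fail)
    for field, validators in schema.items():    # 2. field-level validation
        for validator in validators:
            validator(data[field])              # each validator carries its own on-fail action
    return data
```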
Streaming fundamentally changes this story
Let's say we have this output format
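For concreteness, here is a hypothetical stand-in schema (the `Person`, `name`, and `age` names are illustrative assumptions, not from the original post), expressed as a Pydantic model:

```python
from pydantic import BaseModel

class Person(BaseModel):
    # Hypothetical output schema: `name` would carry a field-level validator
    # (say two-words, with on-fail: reask); `age` is only type-checked.
    name: str
    age: int
```

The matching LLM output would be a JSON object like `{"name": "John Doe", "age": 30}`, which might arrive over the stream as chunk 0 = `{"name": "Jo`, chunk 1 = `hn Doe"`, and chunk 2 = `, "age": 30}`.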
Because we know what the output looks like, we can dynamically run validation on the streamed input IF we have an alternative JSON parser (one that can handle partial, still-open JSON). We can also run field-level validation in lock-step.
After each of chunk 0, chunk 1, and chunk 2, we know that the JSON is valid so far and conforms to the schema specified in `<output>`.
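A minimal sketch of such a partial-JSON check, assuming we simply close whatever strings/brackets are still open before parsing (a real implementation would use a proper incremental parser):

```python
import json
from typing import Optional

def parse_partial_json(buffer: str) -> Optional[dict]:
    """Best-effort parse of an incomplete JSON object by closing whatever is still open."""
    stack, in_string, escaped = [], False, False
    for ch in buffer:
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch in "{[":
            stack.append("}" if ch == "{" else "]")
        elif ch in "}]" and stack:
            stack.pop()
    candidate = buffer + ('"' if in_string else "") + "".join(reversed(stack))
    try:
        return json.loads(candidate)
    except json.JSONDecodeError:
        return None  # this prefix isn't parseable yet; wait for more chunks

# After each chunk, parse the accumulated buffer, then check the result against the schema:
buffer = ""
for chunk in ['{"name": "Jo', 'hn Doe"', ', "age": 30}']:
    buffer += chunk
    print(parse_partial_json(buffer))
# {'name': 'Jo'} -> {'name': 'John Doe'} -> {'name': 'John Doe', 'age': 30}
```

As soon as a field closes in the parsed dict, its field-level validators can run, even though the rest of the object is still streaming.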
After Chunk 1, we KNOW that we have the name field completed, and we CAN run field-level validation on it. If the on-fail action is fix OR noop, we can run the validation in parallel. BUT if the on-fail action is reask, we can stop retrieving the streamed results and immediately reask.
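Example failed field-level validation
Suppose (hypothetically) `name` carries a two-words validator with on-fail: reask. If chunk 1 completes the field as `"name": "Johnny"`, the validator fails the moment the field closes; we can stop consuming the stream right there and issue the reask without waiting for the rest of the output.

Example failed skeletal validation
If the stream closes the object as `{"name": "John Doe"}` with no `age` key, skeleton validation fails at that point and the schema-level on-fail (reask or fail) kicks in.

Example successful skeletal validation
If the stream ends with `{"name": "John Doe", "age": 30}`, the skeleton matches the schema, the field-level validators have already run in lock-step, and the fully validated output can be returned.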
What do the inputs and outputs look like?
Inputs:
Wherever an api is passed to a Guard object right now, we should also be able to accept a stream parameter. This should work for OpenAI by default, and it should also work for custom implementations of streaming solutions.
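A hypothetical calling convention for Step 1 (the exact signature and return shape here are assumptions, not a settled API):

```python
import guardrails as gd
import openai

# "person.rail" is a hypothetical RAIL file describing the schema from the example above.
guard = gd.Guard.from_rail("person.rail")

# Proposed: accept a `stream` flag wherever an LLM API callable is passed today.
# Guardrails consumes the OpenAI stream internally; in Step 1 the final,
# validated output still comes back all at once.
result = guard(
    openai.ChatCompletion.create,
    prompt_params={"question": "Who is the main character?"},
    stream=True,
)
```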
Outputs:
Exactly the same as Guardrails as it currently exists. The way to describe this part of the project is as an internally-managed streaming solution: all of the final, validated output is returned at once, NOT chunk-by-chunk.
Step 2 (by end of Nov)
Return output chunk-by-chunk. This is significantly more complicated from a USER perspective than Step 1. A few key considerations:
When we have a 'reask' scenario, how do we retroactively change a chunk?
On total failures, what do we do?
When do we return a chunk?
In real time, how do we communicate that we're doing reasks, fixes, etc.?
Examples:
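One hypothetical shape for this, sketched as a generator of events (all of the names here are assumptions, not a settled design):

```python
from dataclasses import dataclass
from typing import Iterator, Union

@dataclass
class ValidChunk:
    # A piece of output that has passed every validator that applies to it so far.
    text: str

@dataclass
class Correction:
    # Emitted when a fix or reask retroactively changes text the client already received.
    replaces: str
    replacement: str
    reason: str  # e.g. "reask", "fix"

def stream_validated(chunks: Iterator[str]) -> Iterator[Union[ValidChunk, Correction]]:
    """Hypothetical Step 2 interface: emit chunks as they validate, plus
    correction events when a reask or fix rewrites earlier output."""
    for chunk in chunks:
        # ... run incremental validation on the accumulated buffer here ...
        yield ValidChunk(chunk)
```

A client would consume this stream, render `ValidChunk`s immediately, and patch its display when a `Correction` arrives; a total failure could surface as a terminal error event instead of a chunk.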
Step 3 (Dec?)
Live-streaming validation on strings and other datatypes
prompt: write me a story
llm_output =
chunk 1: 700 chars
chunk 2: 299 chars
chunk 3: 15 chars
chunk 4: 1000 chars
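Because chunk sizes are arbitrary (700 chars, then 299, then 15, then 1000), string validators can't assume chunk boundaries mean anything. One possible approach, sketched below, is to buffer chunks and run validation on complete sentences (the function names and the sentence heuristic are assumptions):

```python
from typing import Callable, Iterator

def validate_string_stream(
    chunks: Iterator[str],
    validate_sentence: Callable[[str], bool],
) -> Iterator[str]:
    """Buffer arbitrary-sized chunks and run a string validator per complete sentence."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        # Emit and validate complete sentences; keep the unfinished tail buffered.
        while "." in buffer:
            sentence, buffer = buffer.split(".", 1)
            sentence += "."
            if validate_sentence(sentence):
                yield sentence
            # If validation fails, Step 2's correction/reask machinery would apply.
    if buffer and validate_sentence(buffer):
        yield buffer
```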