LLMs have been able to stream data back chunk by chunk for months now. Guardrails needs to be able to ingest streamed data from LLMs and emit streamed data to clients. Here's a proposed plan for that support.
Guardrails Support for streaming
Nonexistent. Guardrails does not currently support the streaming APIs at all.
The plan
Guardrails generally does two types of validation: string and schema.
Step 1 (by mid-Nov)
Schema-based validation first checks that the resulting LLM output matches the schema provided by the `<output>` object in RAIL (or Pydantic). Then it checks each specified validator, field by field. As soon as a failure is hit, Guardrails enacts the on-fail action for that specific failure. For schema validation, the only on-fail options are reask and FAIL. For field-level validation, there are many more options.
To recap, validation currently ALWAYS does:
Full Skeleton validation
THEN
Each Field validation
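A minimal sketch of this two-phase flow (illustrative only, not actual Guardrails internals; `check_skeleton` and the schema shape here are assumptions):

```python
import json

def check_skeleton(data: dict, schema: dict) -> None:
    # Skeleton check: every field named in the schema must be present.
    missing = [field for field in schema if field not in data]
    if missing:
        raise ValueError(f"skeleton validation failed, missing fields: {missing}")

def validate(llm_output: str, schema: dict) -> dict:
    # Illustrative two-phase flow: full skeleton validation, THEN each field's validators.
    data = json.loads(llm_output)              # the FULL output must already be in hand
    check_skeleton(data, schema)                # 1. skeleton validation (on-fail: reask or fail)
    for field, validators in schema.items():    # 2. field-level validation
        for validator in validators:
            validator(data[field])              # each validator carries its own on-fail action
    return data
```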
Streaming fundamentally changes this story
Let's say we have this output format
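For concreteness, here is a hypothetical stand-in schema (the `Person`, `name`, and `age` names are illustrative assumptions, not from the original post), expressed as a Pydantic model:

```python
from pydantic import BaseModel

class Person(BaseModel):
    # Hypothetical output schema: `name` would carry a field-level validator
    # (say two-words, with on-fail: reask); `age` is only type-checked.
    name: str
    age: int
```

The matching LLM output would be a JSON object like `{"name": "John Doe", "age": 30}`, which might arrive over the stream as chunk 0 = `{"name": "Jo`, chunk 1 = `hn Doe"`, and chunk 2 = `, "age": 30}`.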
Because we know what the output looks like, we can dynamically run validation on the streamed input IF we have an alternative JSON parser (one that can handle partial, still-open JSON). We can also run field-level validation in lock-step.
After each of chunk 0, chunk 1, and chunk 2, we know that the JSON is valid so far and conforms to the schema specified in `<output>`.
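A minimal sketch of such a partial-JSON check, assuming we simply close whatever strings/brackets are still open before parsing (a real implementation would use a proper incremental parser):

```python
import json
from typing import Optional

def parse_partial_json(buffer: str) -> Optional[dict]:
    """Best-effort parse of an incomplete JSON object by closing whatever is still open."""
    stack, in_string, escaped = [], False, False
    for ch in buffer:
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch in "{[":
            stack.append("}" if ch == "{" else "]")
        elif ch in "}]" and stack:
            stack.pop()
    candidate = buffer + ('"' if in_string else "") + "".join(reversed(stack))
    try:
        return json.loads(candidate)
    except json.JSONDecodeError:
        return None  # this prefix isn't parseable yet; wait for more chunks

# After each chunk, parse the accumulated buffer, then check the result against the schema:
buffer = ""
for chunk in ['{"name": "Jo', 'hn Doe"', ', "age": 30}']:
    buffer += chunk
    print(parse_partial_json(buffer))
# {'name': 'Jo'} -> {'name': 'John Doe'} -> {'name': 'John Doe', 'age': 30}
```

As soon as a field closes in the parsed dict, its field-level validators can run, even though the rest of the object is still streaming.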
After Chunk 1, we KNOW that we have the name field completed, and we CAN run field-level validation on it. If the on-fail action is fix OR noop, we can run the validation in parallel. BUT if the on-fail action is reask, we can stop retrieving the streamed results and immediately reask.
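Example failed field-level validation
Suppose (hypothetically) `name` carries a two-words validator with on-fail: reask. If chunk 1 completes the field as `"name": "Johnny"`, the validator fails the moment the field closes; we can stop consuming the stream right there and issue the reask without waiting for the rest of the output.

Example failed skeletal validation
If the stream closes the object as `{"name": "John Doe"}` with no `age` key, skeleton validation fails at that point and the schema-level on-fail (reask or fail) kicks in.

Example successful skeletal validation
If the stream ends with `{"name": "John Doe", "age": 30}`, the skeleton matches the schema, the field-level validators have already run in lock-step, and the fully validated output can be returned.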
What do the inputs and outputs look like?
Inputs:
Wherever an api is passed to a Guard object right now, we should also be able to accept a stream parameter. This should work for OpenAI by default, and it should also work for custom implementations of streaming solutions.
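A hypothetical calling convention for Step 1 (the exact signature and return shape here are assumptions, not a settled API):

```python
import guardrails as gd
import openai

# "person.rail" is a hypothetical RAIL file describing the schema from the example above.
guard = gd.Guard.from_rail("person.rail")

# Proposed: accept a `stream` flag wherever an LLM API callable is passed today.
# Guardrails consumes the OpenAI stream internally; in Step 1 the final,
# validated output still comes back all at once.
result = guard(
    openai.ChatCompletion.create,
    prompt_params={"question": "Who is the main character?"},
    stream=True,
)
```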
Outputs:
Exactly the same as Guardrails as it currently exists. The way to describe this part of the project is as an internally-managed streaming solution: all of the final, validated output is returned at once, NOT chunk-by-chunk.
Step 2 (by end of Nov)
Return output chunk-by-chunk. This is significantly more complicated from a USER perspective than Step 1. A few key considerations:
When we have a 'reask' scenario, how do we retroactively change a chunk?
On total failures, what do we do?
When do we return a chunk?
In real time, how do we communicate that we're doing reasks, fixes, etc.?
Examples:
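One hypothetical shape for this, sketched as a generator of events (all of the names here are assumptions, not a settled design):

```python
from dataclasses import dataclass
from typing import Iterator, Union

@dataclass
class ValidChunk:
    # A piece of output that has passed every validator that applies to it so far.
    text: str

@dataclass
class Correction:
    # Emitted when a fix or reask retroactively changes text the client already received.
    replaces: str
    replacement: str
    reason: str  # e.g. "reask", "fix"

def stream_validated(chunks: Iterator[str]) -> Iterator[Union[ValidChunk, Correction]]:
    """Hypothetical Step 2 interface: emit chunks as they validate, plus
    correction events when a reask or fix rewrites earlier output."""
    for chunk in chunks:
        # ... run incremental validation on the accumulated buffer here ...
        yield ValidChunk(chunk)
```

A client would consume this stream, render `ValidChunk`s immediately, and patch its display when a `Correction` arrives; a total failure could surface as a terminal error event instead of a chunk.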
Step 3 (Dec?)
Live-streaming validation on strings and other datatypes
prompt: write me a story
llm_output =
chunk 1: 700 chars
chunk 2: 299 chars
chunk 3: 15 chars
chunk 4: 1000 chars
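Because chunk sizes are arbitrary (700 chars, then 299, then 15, then 1000), string validators can't assume chunk boundaries mean anything. One possible approach, sketched below, is to buffer chunks and run validation on complete sentences (the function names and the sentence heuristic are assumptions):

```python
from typing import Callable, Iterator

def validate_string_stream(
    chunks: Iterator[str],
    validate_sentence: Callable[[str], bool],
) -> Iterator[str]:
    """Buffer arbitrary-sized chunks and run a string validator per complete sentence."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        # Emit and validate complete sentences; keep the unfinished tail buffered.
        while "." in buffer:
            sentence, buffer = buffer.split(".", 1)
            sentence += "."
            if validate_sentence(sentence):
                yield sentence
            # If validation fails, Step 2's correction/reask machinery would apply.
    if buffer and validate_sentence(buffer):
        yield buffer
```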