Initialize KVCache page_tables to zero to prevent NaNs #833
Description
I was seeing outputs from `llama3.1_70b_tp8` that had token corruption. The output from unsharded 70b was fine, but the output from sharded 70b showed this corruption. After further investigation, it appears that this was also occurring for `llama3.1_8b_tp8` if the input prompt was sufficiently long.

The key difference I saw between the sharded and unsharded models is that the KVCache for sharded models was being initialized with NaN values.
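As a minimal numpy sketch (not the model's actual attention code) of why NaNs in the cache can corrupt downstream tokens: NaN poisons every float operation it touches, so a matmul that reads a NaN-filled KV page produces NaN even when the attention weights for that page are zero.

```python
import numpy as np

# Hypothetical NaN-filled cache page, standing in for an uninitialized
# device allocation whose bytes decoded to NaN.
kv_page = np.full((2, 4), np.nan, dtype=np.float32)

# Attention weights that nominally "ignore" the page entirely.
attn_weights = np.zeros((1, 2), dtype=np.float32)

# 0 * NaN is still NaN, so the output is NaN everywhere.
out = attn_weights @ kv_page
print(np.isnan(out).all())  # True
```

This is why the corruption shows up even before any real KV data is written: the NaNs propagate the moment the cache participates in arithmetic.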
For example, these were the debug stats of `kv_cache_shard_0`, prior to prefill even being executed:

Other tensors were initialized with even more NaN values. This was `kv_cache_shard_7`:

For `unsharded_7b` and `unsharded_8b`, none of the device arrays for the KVCache contained NaN values. This appears to be what eventually caused the sharded models to output all `0` tokens.

After explicitly initializing `page_tables` to zero, the issue went away and the output looked good across multiple requests of varying length:
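The fix amounts to zero-filling the page-table buffers at allocation time instead of leaving them uninitialized. A hedged numpy sketch (the real code allocates device arrays, not numpy arrays; `np.empty` stands in for an uninitialized allocation):

```python
import numpy as np

shape, dtype = (8, 256), np.float32  # illustrative page-table shape

# Uninitialized allocation: contents are whatever bytes were already in
# memory, which for float dtypes can decode to NaN.
uninitialized = np.empty(shape, dtype=dtype)

# Explicit zero-initialization guarantees the cache is NaN-free before
# prefill writes real KV data into it.
page_tables = np.zeros(shape, dtype=dtype)

assert not np.isnan(page_tables).any()
```

Zeroing is cheap relative to inference and makes the cache's initial state deterministic, which also keeps sharded and unsharded runs comparable when debugging.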