Most tokenizers define their max model length as 510 tokens or more, based on:
model max token length - the number of special tokens needed to delimit a sentence (start and end)
Example
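As an illustration, a minimal sketch of the arithmetic, assuming the Hugging Face transformers API and the bert-base-uncased checkpoint:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

print(tokenizer.model_max_length)             # 512 for BERT
print(tokenizer.num_special_tokens_to_add())  # 2: [CLS] and [SEP]

# Usable content length once the start/end tokens are accounted for:
print(tokenizer.model_max_length - tokenizer.num_special_tokens_to_add())  # 510
```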
Most tokenizers follow this convention, but some report an effectively unbounded length, with tokenizer.model_max_length=1000000000000000019884624838656
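For context, that sentinel looks like 1e30 converted to a Python int (the odd trailing digits come from float rounding), which a quick check illustrates:

```python
# 1e30 cast to int reproduces the sentinel value seen on these tokenizers
print(int(1e30))  # 1000000000000000019884624838656
```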
This means that when the tokenizer max length is converted in TensorFlow, most values are assumed to fit in a standard int, but an effectively unbounded model length needs a 64-bit integer (tf.int64) or wider representation for the conversion not to fail.
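A minimal sketch of the failure mode, assuming TensorFlow is installed (the overflowing call is left commented out because it raises):

```python
import tensorflow as tf

sentinel = 1000000000000000019884624838656  # the "unbounded" model_max_length

# A typical max length fits in a 32-bit integer tensor without issue:
tf.constant(510, dtype=tf.int32)

# The sentinel exceeds the int32 range, so this line would raise an error:
# tf.constant(sentinel, dtype=tf.int32)
```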
Initially, the tokenizer's model_max_length was set dynamically, but it is now hard-coded to 510 tokens. This should be changed to reflect each tokenizer's actual value.
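One way the dynamic behavior could look, as a hypothetical sketch (the usable_length helper and the cutoff for sentinel values are illustrative, not taken from the codebase):

```python
from transformers import AutoTokenizer

DEFAULT_MAX = 510  # fallback when a tokenizer reports no real limit

def usable_length(model_name: str) -> int:
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    max_len = tokenizer.model_max_length
    if max_len > 1_000_000:  # treat huge sentinel values as "no limit"
        return DEFAULT_MAX
    return max_len - tokenizer.num_special_tokens_to_add()

print(usable_length("bert-base-uncased"))  # 510
```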