Describe the bug
This is due to a regression on the transformers side, see huggingface/transformers#30643 for details.
Flair uses the tokenizer.model_max_length in the TransformerEmbeddings to truncate (if allow_long_sentences=False) or split (if allow_long_sentences=True) long sentences. Because of the regression, the tokenizer no longer reports the correct maximum length for this model, so overly long sentences reach the model unsplit and exceed its 512 position embeddings.
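As a quick illustration (not part of the original report), the following sketch prints the value Flair relies on for splitting/truncation; under the affected transformers version it is expected to be a placeholder far above DistilBERT's real limit of 512, which is why the splitting logic never kicks in:

# Sketch: inspect the maximum length the tokenizer reports.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("distilbert-base-cased")
# Expected: 512. Under the regression described in huggingface/transformers#30643,
# a very large placeholder value may be reported instead (assumption based on the linked issue).
print(tok.model_max_length)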
To Reproduce
from flair.data import Sentence
from flair.embeddings import TransformerWordEmbeddings

emb = TransformerWordEmbeddings("distilbert-base-cased", allow_long_sentences=True)
emb.embed(Sentence("Hallo World " * 1024))
Expected behavior
The code should run through without any issue.
Logs and Stack traces
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Users\Bened\anaconda3\envs\py312\Lib\site-packages\flair\embeddings\base.py", line 50, in embed
self._add_embeddings_internal(data_points)
File "C:\Users\Bened\anaconda3\envs\py312\Lib\site-packages\flair\embeddings\transformer.py", line 705, in _add_embeddings_internal
embeddings = self._forward_tensors(tensors)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Bened\anaconda3\envs\py312\Lib\site-packages\flair\embeddings\transformer.py", line 1424, in _forward_tensors
return self.forward(**tensors)
^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Bened\anaconda3\envs\py312\Lib\site-packages\flair\embeddings\transformer.py", line 1324, in forward
hidden_states = self.model(input_ids, **model_kwargs)[-1]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Bened\anaconda3\envs\py312\Lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Bened\anaconda3\envs\py312\Lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Bened\anaconda3\envs\py312\Lib\site-packages\transformers\models\distilbert\modeling_distilbert.py", line 806, in forward
embeddings = self.embeddings(input_ids, inputs_embeds) # (bs, seq_length, dim)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Bened\anaconda3\envs\py312\Lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Bened\anaconda3\envs\py312\Lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Bened\anaconda3\envs\py312\Lib\site-packages\transformers\models\distilbert\modeling_distilbert.py", line 144, in forward
embeddings = input_embeds + position_embeddings # (bs, max_seq_length, dim)
~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~
RuntimeError: The size of tensor a (3074) must match the size of tensor b (512) at non-singleton dimension 1
Screenshots
No response
Additional Context
This bug is on the side of huggingface/transformers#30643, therefore this issue is only for visibility.
If you run into this problem, you can hotfix it in two ways:
- pin transformers<4.40.0
- provide the model_max_length parameter yourself, e.g. emb = TransformerWordEmbeddings("distilbert-base-cased", allow_long_sentences=True, model_max_length=512) (see the sketch after this list)
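The following sketch (assembled from the reproduction above and the second hotfix, added here for illustration) shows the workaround in context:

# Workaround sketch: pass model_max_length explicitly so Flair splits long
# sentences correctly even when the tokenizer misreports its limit.
from flair.data import Sentence
from flair.embeddings import TransformerWordEmbeddings

emb = TransformerWordEmbeddings(
    "distilbert-base-cased",
    allow_long_sentences=True,
    model_max_length=512,  # DistilBERT's actual position-embedding limit
)
emb.embed(Sentence("Hallo World " * 1024))  # no longer raises the size-mismatch error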
Environment
Versions:
Flair: 0.13.1
Pytorch: 2.3.0+cpu
Transformers: 4.40.0
GPU: False