Using sdpa and flash_attention_2 error #168
Comments
I am encountering this error too. I would appreciate any help.
Your message has been received.
I am getting the same issue; I also followed the tutorial exactly.
@aixingxy I am getting the same issue. It was working before on an old Conda installation I have, so it seems some update caused it: I set up a new environment this week and hit the error. I advise you to use the default attention implementation; when I tested all of them on a 3090 and an L40S, I didn't see much difference in speed.
Bumping the transformers version to 4.48.0 solved the problem.
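A quick way to check this locally is a version guard before loading the model. This is only an illustrative sketch based on the comment above (the 4.48.0 floor is taken from that report; the exact minimum for your setup may differ):

```python
# Illustrative version guard, assuming the fix reported above (transformers >= 4.48.0).
from importlib.metadata import version
from packaging.version import Version

installed = Version(version("transformers"))
if installed < Version("4.48.0"):
    raise RuntimeError(
        f"transformers {installed} found; this issue was reportedly fixed by upgrading, "
        "e.g. pip install -U 'transformers>=4.48.0'"
    )
print(f"transformers {installed} should be recent enough for sdpa/flash_attention_2")
```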
Hello, thanks for this great work! I followed the INFERENCE instructions but encountered some difficulties.
When I set `attn_implementation="sdpa"`, I get an error, and when I set `attn_implementation="flash_attention_2"`, I also get an error. I am using an A100 GPU; my environment is:
Am I missing some important configuration?
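For reference, the snippet below is a minimal sketch of how an attention backend is usually requested through Hugging Face transformers, with a fallback if the requested one is unavailable. The checkpoint name and `AutoModel` class are placeholders, not this project's actual inference code:

```python
# Minimal sketch only: the checkpoint name and AutoModel class are placeholders,
# not this project's loading code. It shows the usual way to request an attention
# backend via transformers and to fall back if that backend is unavailable.
import torch
from transformers import AutoModel

def load_with_attn(checkpoint: str, attn: str = "sdpa"):
    try:
        return AutoModel.from_pretrained(
            checkpoint,
            torch_dtype=torch.float16,   # flash_attention_2 requires fp16 or bf16
            attn_implementation=attn,    # "eager", "sdpa", or "flash_attention_2"
        )
    except (ValueError, ImportError) as err:
        # ValueError: backend not supported by this model/transformers version
        # ImportError: flash-attn package not installed
        print(f"{attn} failed ({err}); retrying with the default implementation")
        return AutoModel.from_pretrained(checkpoint, torch_dtype=torch.float16)

# model = load_with_attn("your/checkpoint", attn="flash_attention_2")
```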