Using sdpa and flash_attention_2 error #168
Comments
I am encountering this error too. I would appreciate any help.
Your message has been received.
I am getting the same issue; I also followed the tutorial exactly.
@aixingxy I am getting the same issue. It was working before on an old Conda installation I have, so it seems some update caused it: I set up a new environment this week and hit the error. I advise you to use the default attention implementation; when I tested all of them on a 3090 and an L40S, I didn't see much difference in speed.
Bumping the transformers version to 4.48.0 solved the problem.
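A quick way to check this locally is a version guard before loading the model. This is only an illustrative sketch based on the comment above (the 4.48.0 floor is taken from that report; the exact minimum for your setup may differ):

```python
# Illustrative version guard, assuming the fix reported above (transformers >= 4.48.0).
from importlib.metadata import version
from packaging.version import Version

installed = Version(version("transformers"))
if installed < Version("4.48.0"):
    raise RuntimeError(
        f"transformers {installed} found; this issue was reportedly fixed by upgrading, "
        "e.g. pip install -U 'transformers>=4.48.0'"
    )
print(f"transformers {installed} should be recent enough for sdpa/flash_attention_2")
```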
Hello, thanks for this great work! I followed the INFERENCE instructions but encountered some difficulties.
When I set `attn_implementation="sdpa"`, I get an error, and when I set `attn_implementation="flash_attention_2"`, I also get an error. I am using an A100 GPU; my environment is:
Am I missing some important configuration?
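For reference, the snippet below is a minimal sketch of how an attention backend is usually requested through Hugging Face transformers, with a fallback if the requested one is unavailable. The checkpoint name and `AutoModel` class are placeholders, not this project's actual inference code:

```python
# Minimal sketch only: the checkpoint name and AutoModel class are placeholders,
# not this project's loading code. It shows the usual way to request an attention
# backend via transformers and to fall back if that backend is unavailable.
import torch
from transformers import AutoModel

def load_with_attn(checkpoint: str, attn: str = "sdpa"):
    try:
        return AutoModel.from_pretrained(
            checkpoint,
            torch_dtype=torch.float16,   # flash_attention_2 requires fp16 or bf16
            attn_implementation=attn,    # "eager", "sdpa", or "flash_attention_2"
        )
    except (ValueError, ImportError) as err:
        # ValueError: backend not supported by this model/transformers version
        # ImportError: flash-attn package not installed
        print(f"{attn} failed ({err}); retrying with the default implementation")
        return AutoModel.from_pretrained(checkpoint, torch_dtype=torch.float16)

# model = load_with_attn("your/checkpoint", attn="flash_attention_2")
```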