SAGE ATTENTION: 2.1 TIMES FASTER THAN FLASH ATTENTION 2 AND 2.7 TIMES FASTER THAN XFORMERS #1051
joseph777111 started this conversation in Ideas

https://arxiv.org/abs/2410.02367
SageAttention: Accurate 8-Bit Attention for Plug-and-play Inference Acceleration
ABSTRACT:

Replies: 1 comment
-
Looks interesting, thanks for sharing. Cc @barronalex
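For readers unfamiliar with the idea in the paper title, the sketch below shows what 8-bit attention means in general terms: Q and K are quantized to int8 before the Q·Kᵀ matmul, and the scores are rescaled back to floating point for the softmax and the product with V. This is a hypothetical NumPy illustration of the concept only, not SageAttention's actual kernels, quantization granularity, or API; all function names are made up for the example.

```python
# Illustrative sketch of INT8-quantized attention scores (not SageAttention itself).
import numpy as np

def quantize_int8(x):
    """Symmetric per-tensor quantization to int8; returns (q, scale)."""
    scale = np.abs(x).max() / 127.0 + 1e-12
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def attention_int8_scores(q, k, v):
    """Attention where Q·K^T is computed from int8-quantized inputs."""
    d = q.shape[-1]
    q8, sq = quantize_int8(q)
    k8, sk = quantize_int8(k)
    # Accumulate the int8 matmul in int32, then rescale back to float.
    scores = (q8.astype(np.int32) @ k8.astype(np.int32).T) * (sq * sk) / np.sqrt(d)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

def attention_fp32(q, k, v):
    """Reference float32 attention for comparison."""
    d = q.shape[-1]
    scores = (q @ k.T) / np.sqrt(d)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((64, 64), dtype=np.float32) for _ in range(3))
err = np.abs(attention_int8_scores(q, k, v) - attention_fp32(q, k, v)).max()
print(f"max abs error vs fp32 attention: {err:.4f}")
```

The toy script only checks numerical closeness to a float32 reference; the speedups claimed in the title (2.1x over FlashAttention2, 2.7x over xformers) come from the paper's dedicated low-precision kernels, which this sketch does not attempt to reproduce.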