[XLA:GPU] Use f32->bfloat conversion instructions on sm_80+
We tried this before with an intrinsic, but that broke vectorization. Relying on LLVM's native bfloat type does not, while delivering the same code improvements. The downside is that LLVM now knows the value is a bfloat rather than an i16 and will optimize based on that. While making this change I had to patch a number of holes in the NVPTX LLVM backend; there may be more.

Depends on llvm/llvm-project#74827

PiperOrigin-RevId: 590118269
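A minimal sketch of what the emitter-side change amounts to, assuming an llvm::IRBuilder-based emitter; the function name EmitF32ToBF16 and the overall shape are illustrative, not the literal diff. The idea is that a plain fptrunc to LLVM's native bfloat type lowers to the hardware conversion instruction on sm_80+ and, unlike an opaque intrinsic call, stays visible to the vectorizer.

```cpp
#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/Type.h"
#include "llvm/IR/Value.h"

// Illustrative helper: convert an f32 SSA value to bfloat using the
// native LLVM type. On sm_80+ the NVPTX backend can select
// cvt.rn.bf16.f32 for this fptrunc; no NVVM intrinsic is involved, so
// loop vectorization is not blocked.
llvm::Value* EmitF32ToBF16(llvm::Value* f32_value, llvm::IRBuilder<>* b) {
  llvm::Type* bf16_ty = llvm::Type::getBFloatTy(b->getContext());
  return b->CreateFPTrunc(f32_value, bf16_ty, "to_bf16");
}
```

The trade-off named in the message follows from this: because the result is typed as bfloat rather than i16, later LLVM passes may apply floating-point optimizations to it.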
1 parent d820ba9 · commit a28a99a
3 changed files with 33 additions and 0 deletions