Would you share the performance data? #2
Comments
Hi :D I'm assuming you have [matrix A of batch size b] and [matrix B of batch size b]. Just keep in mind that this is not the fastest possible kernel for batched tiled matrix multiplication.
Btw, it might be the tile size or the block size that is hindering fast multiplication.
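For what it's worth, below is a minimal host-side sketch of how the tile size and the block size are usually coupled in a tiled-MM launch. `TILE`, `make_launch_config`, and the shape parameters are illustrative assumptions, not code from this repository; the point is only that changing the tile size changes the thread-block size (and the grid) along with it.

```cuda
#include <cuda_runtime.h>

constexpr int TILE = 16;   // tile edge length; 8 / 16 / 32 are the usual candidates to tune

// Hypothetical helper: build the launch configuration a tiled kernel would
// typically use for a batch of M x N outputs. The block is TILE x TILE threads
// (one thread per element of an output tile), so a poor TILE choice is also a
// poor block-size choice.
inline void make_launch_config(int M, int N, int batchSize, dim3& grid, dim3& block)
{
    block = dim3(TILE, TILE);
    grid  = dim3((N + TILE - 1) / TILE,   // tile columns of C
                 (M + TILE - 1) / TILE,   // tile rows of C
                 batchSize);              // one grid z-slice per batch entry
}
```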
Hi, thanks for your reply. I'm trying to design a fast batched tiled-MM kernel. In my case, the batch means that multiple A matrices are each multiplied by a single, shared matrix B, not multiple As multiplied with their corresponding multiple Bs. I will take a deeper look into your idea :) Thanks.
Here is my idea: I tried to implement matrix multiplication of a batched 2D input matrix A with the same 2D matrix B. By simply expanding gridDim.z, which seems to be the same approach you took, I found the computation becomes quite slow. The performance is not that good, so I wondered whether there are any better ideas.
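To make that setup concrete, here is a minimal sketch of such a kernel: `blockIdx.z` picks which A matrix in the batch to use, while every z-slice reads the same single B. This is not the actual kernel from this repository or from the issue; the names, the `TILE` constant, and the row-major layout are assumptions for illustration.

```cuda
#include <cuda_runtime.h>

#define TILE 16

// C[b] = A[b] * B for b = 0 .. batch-1, with A[b]: M x K, B: K x N, C[b]: M x N,
// all stored row-major; A and C are packed batch-major.
__global__ void batched_tiled_mm_shared_B(const float* __restrict__ A,
                                          const float* __restrict__ B,
                                          float* __restrict__ C,
                                          int M, int K, int N)
{
    __shared__ float As[TILE][TILE];
    __shared__ float Bs[TILE][TILE];

    int batch = blockIdx.z;                       // one grid z-slice per A matrix
    int row   = blockIdx.y * TILE + threadIdx.y;  // row in C[batch]
    int col   = blockIdx.x * TILE + threadIdx.x;  // column in C[batch]

    const float* Ab = A + (size_t)batch * M * K;  // this batch's A
    float*       Cb = C + (size_t)batch * M * N;  // this batch's C

    float acc = 0.0f;
    for (int t = 0; t < (K + TILE - 1) / TILE; ++t) {
        int aCol = t * TILE + threadIdx.x;
        int bRow = t * TILE + threadIdx.y;

        // Stage one tile of A[batch] and one tile of the shared B, zero-padding
        // out-of-range elements.
        As[threadIdx.y][threadIdx.x] = (row < M && aCol < K) ? Ab[row * K + aCol] : 0.0f;
        Bs[threadIdx.y][threadIdx.x] = (bRow < K && col < N) ? B[bRow * N + col] : 0.0f;
        __syncthreads();

        for (int k = 0; k < TILE; ++k)
            acc += As[threadIdx.y][k] * Bs[k][threadIdx.x];
        __syncthreads();
    }

    if (row < M && col < N)
        Cb[row * N + col] = acc;
}
```

A launch under this sketch would look like `batched_tiled_mm_shared_B<<<dim3((N+TILE-1)/TILE, (M+TILE-1)/TILE, batchCount), dim3(TILE, TILE)>>>(dA, dB, dC, M, K, N);`. One thing the layout makes visible: every z-slice stages its own copies of B's tiles into shared memory again, so the reuse of B does not extend across batch entries; whether that, or simply launching many small blocks, is what makes the naive gridDim.z expansion slow is something profiling would have to confirm.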