Improve transpose basis performance for magma backend #1385

Merged
sebastiangrimberg merged 2 commits into sjg/hcurl-hdiv-basis-magma from sjg/magma-transpose-opt
Oct 27, 2023

sebastiangrimberg
Collaborator

Profiling shows that the `magma_dlaset` call used to zero out `V` before the basis transpose application is expensive, and that it is the reason the transpose kernels are 10-20% slower than the non-transpose ones. This PR removes these calls and modifies the relevant magma backend kernels to write their results directly to `V` instead of accumulating into a `V` that was first set to zero.
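To illustrate the idea, here is a minimal CPU-side sketch in C of the two strategies. It is not the actual magma backend code: the function names, the row-major `Q x P` layout of `B`, and the single-vector shapes are all assumptions made for the example, and the explicit zeroing loop only stands in for the separate `magma_dlaset` pass.

```c
#include <stddef.h>

/* Hypothetical analogue of the old path: V is zeroed in a separate pass,
 * then the kernel accumulates V += B^T U.
 * Assumed layout: B is Q x P row-major, U has Q entries, V has P entries. */
void transpose_apply_accumulate(size_t P, size_t Q, const double *B,
                                const double *U, double *V) {
  /* Extra pass over V; stands in for the magma_dlaset call. */
  for (size_t p = 0; p < P; p++) V[p] = 0.0;
  for (size_t q = 0; q < Q; q++) {
    for (size_t p = 0; p < P; p++) V[p] += B[q * P + p] * U[q];
  }
}

/* Hypothetical analogue of the new path: each output entry is summed in a
 * local variable and written to V exactly once, so V never needs to be
 * zeroed beforehand and its previous contents are irrelevant. */
void transpose_apply_write(size_t P, size_t Q, const double *B,
                           const double *U, double *V) {
  for (size_t p = 0; p < P; p++) {
    double sum = 0.0;
    for (size_t q = 0; q < Q; q++) sum += B[q * P + p] * U[q];
    V[p] = sum;
  }
}
```

Both functions produce the same `V`, but the second needs no zeroing pass and reads nothing from `V`. This is only safe when each entry of `V` has a single writer; if several kernels or thread blocks contribute to the same entry, accumulation into a zeroed buffer is still needed.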

NOTE: This PR currently targets #1301, and can be either merged in there or we can wait until #1301 is approved and merged to main before rebasing and merging this one.

@sebastiangrimberg
Collaborator Author

Here is an example performance comparison for the non-tensor basis application with P = 20, Q = 24 (p = 3 tet) on a V100. This PR should also improve performance for the magma tensor backends.

Before:

image

After:

image

@nbeams
Contributor

nbeams commented Oct 26, 2023

Hi @sebastiangrimberg, I'm trying to get caught up on the activity with these PRs. I'd vote for going ahead and merging this into #1301 as you are currently proposing, then we can merge #1301 into main, then rebase #1366 with the updated kernels so that I can create/add the updated MI250X and A100 data to that PR. Is that what you had in mind? Or do you prefer to do #1366 before #1301?

@sebastiangrimberg
Collaborator Author

sebastiangrimberg commented Oct 27, 2023

Sounds great. I will merge this into #1301 and wait for your final review there. Thanks @nbeams! Note that the failing Python CI test is unrelated and should be fixed by #1387.

@sebastiangrimberg sebastiangrimberg merged commit a0804ae into sjg/hcurl-hdiv-basis-magma Oct 27, 2023
@sebastiangrimberg sebastiangrimberg deleted the sjg/magma-transpose-opt branch October 27, 2023 00:22
sebastiangrimberg added a commit to sebastiangrimberg/libCEED that referenced this pull request Oct 27, 2023
Improve transpose basis performance for `magma` backend
sebastiangrimberg added a commit that referenced this pull request Nov 1, 2023
Improve transpose basis performance for `magma` backend