This repository has been archived by the owner on Jun 27, 2024. It is now read-only.

[RFC] Integration with cuDNN via IREE compiler/runtime plugins #12

Open
ezhulenev opened this issue Apr 18, 2023 · 1 comment

@ezhulenev
Contributor

One of the initial goals of the openxla-nvgpu plugin is to show how to integrate NVIDIA libraries with the IREE compiler/runtime. The work is already in progress, and a few PRs have been merged. The design of this integration is outlined in this document: https://docs.google.com/document/d/1WzSH7LdQdL1CQmlIOUyy6auDiX6d3cl5LAzZU_I4KCY/edit#

@ezhulenev ezhulenev added the enhancement New feature or request label Apr 18, 2023
@chsigg
Contributor

chsigg commented Apr 20, 2023

Thanks Eugene for sharing the doc, it looks like a solid plan.

Here I will try to cover some things that the doc does not discuss.

Input dialect

We initially started with lowering from MHLO, but have now switched to StableHLO. The two dialects are mostly the same, so this shouldn't be difficult, but we will likely need to update the ResNet-50 model with which we want to show the performance advantage of using libraries over the code that IREE is currently able to generate.

Do I understand correctly that the various IREE importers are targeting StableHLO and the standard IREE pipeline is able to consume this?

Layout assignment

This is currently being worked out on the IREE side. My very high-level thinking is that IREE will provide an external layout interface that can be injected into StableHLO ops to communicate preferred layouts. We could then inject the same interface into cuDNN ops to constrain the layouts to what cuDNN expects.
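
To make the idea concrete, here is a rough sketch in plain Python (not MLIR) of what "injecting" a layout interface from the outside could look like: a registry that attaches a layout-preference callback to an op name without touching the op definition. Everything here is an assumption for illustration: the names `attach_layout_interface` and `preferred_layout` are hypothetical, `cudnn.convolution` is a made-up op name, and NHWC is only an example preference, not what the IREE interface will actually report.

```python
from typing import Callable, Dict, Optional

# A layout model maps an operand rank to a preferred layout string,
# or None if it has no preference. (Hypothetical signature.)
LayoutModel = Callable[[int], Optional[str]]

# Registry of externally attached layout models, keyed by op name.
_layout_models: Dict[str, LayoutModel] = {}


def attach_layout_interface(op_name: str, model: LayoutModel) -> None:
    """Attach a layout model to an op without modifying the op itself."""
    _layout_models[op_name] = model


def preferred_layout(op_name: str, rank: int) -> Optional[str]:
    """Query the preferred layout; None means the op is unconstrained."""
    model = _layout_models.get(op_name)
    return model(rank) if model is not None else None


def conv_layout(rank: int) -> Optional[str]:
    # Example preference only: channels-last for 4-D operands.
    return "NHWC" if rank == 4 else None


# The same model is attached to the StableHLO op and to its
# (hypothetically named) cuDNN counterpart, so both report one preference.
attach_layout_interface("stablehlo.convolution", conv_layout)
attach_layout_interface("cudnn.convolution", conv_layout)
```

With this shape, a layout assignment pass would only ever call `preferred_layout` and would not need to know whether an op comes from StableHLO or from the cuDNN dialect.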

Cost model

We will implement a model that estimates the cost of StableHLO ops and of their cuDNN graph equivalents. This will determine which subgraphs are outlined to cuDNN ops. The model needs to take downstream fusion opportunities into account, because the performance profiles of a fused vs. unfused op are vastly different.
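
As a minimal sketch of the decision described above, assuming we already have per-subgraph time estimates (the field names and numbers below are hypothetical, not measurements): a subgraph is outlined only if the cuDNN cost plus the cost of any downstream fusion lost by outlining is lower than the cost of the fused IREE-generated code.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Candidate:
    """A StableHLO subgraph that could be outlined to a cuDNN graph."""
    name: str
    codegen_fused_us: float  # estimate: IREE codegen, fused with its consumers
    cudnn_us: float          # estimate: equivalent cuDNN graph
    lost_fusion_us: float    # estimate: consumers that must now run unfused


def should_outline(c: Candidate) -> bool:
    """Outline only if cuDNN wins after paying for lost fusion."""
    return c.cudnn_us + c.lost_fusion_us < c.codegen_fused_us


def select_for_outlining(candidates: List[Candidate]) -> List[str]:
    """Return the names of the subgraphs worth outlining."""
    return [c.name for c in candidates if should_outline(c)]


# Made-up numbers, purely to exercise the rule.
candidates = [
    Candidate("conv_bias_relu", codegen_fused_us=100.0, cudnn_us=40.0, lost_fusion_us=5.0),
    Candidate("small_matmul", codegen_fused_us=12.0, cudnn_us=10.0, lost_fusion_us=8.0),
]
selected = select_for_outlining(candidates)
```

In this example `small_matmul` stays with IREE codegen even though its cuDNN estimate is lower in isolation, because outlining it would break a fusion that costs more than it saves.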

Compilation pipeline

I need to leave this as a placeholder, because I haven't looked into it yet. But we should document our requirements for hooking into the IREE compilation pipeline. The main open issue here is when/how we perform the outlining of StableHLO ops and lowering to cuDNN ops. As far as I know, the downstream path from cuDNN ops is already working.
