Rationalize Frontier compiler entries between SCREAM and E3SM. #6773
Comments
Tagging @bartgol @jgfouca @ambrad @dqwu @brhillman @AaronDonahue
Cosmetic comment: I personally find the string
To drive the merging of Frontier compiler entries, I think it would be useful to choose one or two SCREAM test cases within the E3SM machine/compiler configurations and resolve any issues encountered during their build and execution on Frontier. If necessary, we may create a new SCREAM test case.
Use any of the SCREAM test cases that already exist, such as ne30_ne30.F2010-SCREAMv1, which is in e3sm_scream_v1_medres: https://github.com/E3SM-Project/E3SM/blob/49fdbe3661f2b8c95d8459081f500fae3a069ba0/cime_config/tests.py#L704C6-L704C28. Those all pass on Frontier when using --machine frontier-scream-gpu --compiler craygnuamdgpu. You could start by just copying the craygnuamdgpu compiler entry to the frontier machine file while we figure out how to name these.
FYI, we discussed this topic extensively during the Perf/Infra call today: https://acme-climate.atlassian.net/wiki/spaces/EP/pages/4825645058/2024-12-09+Performance+Infrastructure+Meeting+Notes. @jgfouca and @grnydawn will coordinate and work together to consolidate the config.
During the EAMxx dev call, we decided on using a "-" between the CPU and GPU parts of the name instead of putting "gpu" in the compiler name. If there is no dash, that means it's a CPU-only compile.
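A minimal sketch of how the dash convention described in this comment could be read programmatically. The function name `parse_compiler_name` and the `CompilerName` type are hypothetical and not part of CIME or E3SM; the sketch only assumes what the comment states, namely that text before the first "-" names the CPU (host) side, text after it names the GPU (device) side, and a name with no dash is CPU-only.

```python
from typing import NamedTuple, Optional


class CompilerName(NamedTuple):
    """Hypothetical container for the two halves of a compiler entry name."""
    host: str               # CPU-side part of the name
    device: Optional[str]   # GPU-side part, or None for a CPU-only compile


def parse_compiler_name(name: str) -> CompilerName:
    """Split a compiler entry name at the first dash (assumed convention)."""
    host, dash, device = name.partition("-")
    if not host or (dash and not device):
        raise ValueError(f"malformed compiler name: {name!r}")
    # No dash at all means a CPU-only compile.
    return CompilerName(host, device if dash else None)


if __name__ == "__main__":
    print(parse_compiler_name("gnu-hipcc"))
    print(parse_compiler_name("gnu"))
```

With this reading, a tool could decide whether a case targets the GPU simply by checking whether `device` is `None`, without any "gpu" substring in the name.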
I've tried to bring up these topics before, but wanted to again state for the record: I also think on frontier, we can simply use
@rljacob what if we used openacc? Should the compiler be called gnu-openacc? gnu-gnu? And what if we used openacc for f90, and nvcc for C++? Should we call it gnu-nvcc? gnu-openacc-nvcc?
We have 3 things: wrapper, host compiler, device compiler. That would add a fourth thing: GPU programming model. Why would that be necessary?
I'm just trying to understand what happens if we use nvcc for C++ and openacc for f90, both running on device. NVCC is a compiler, not a prog model. So would you do Re: openacc. It is a prog model, so what would you use, gnu-gnu for code that uses openacc (or openmp-target) for the accelerator?
I am working on renaming the
1. Compiler Wrapper
2. Programming Language
3. GPU Programming Framework
4. Model-Specific Compiler
Tentatively, I renamed
Thoughts on Renaming
Proposed Naming Convention
Considering these factors, I came up with the following naming convention:
<Fortran compiler name without compiler wrapper>-<C++(GPU) compiler name without compiler wrapper>-<Optional specific information such as model name or GPU programming framework>
For example:
The
This is good but I would like to do away with "model-specific" compiler configs. Why does EAMxx need that?
Again for the record, I don't think this is the best approach. We don't need those longer names.
Most compiler names might not include the third, model-specific part. The model name in "gnu-hipcc-eamxx" is added because it may not be able to compile the Omega code due to the "eamxx-specific" compiler configuration explained above. If we still want to remove the third part, I believe the "eamxx-specific" compiler configuration might need to be removed.
As noted above, most compiler names might not include the third, model-specific part. So, in most cases, the length of this naming scheme would be similar to other suggestions. I think we need to decide which information to retain in the compiler name and which to discard.
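For readers weighing the three-part proposal discussed above, here is a rough sketch of how such names could be assembled. The helper `build_compiler_name` and its parameter names are hypothetical and not part of CIME; the only grounded inputs are the proposed pattern (Fortran compiler, then C++/GPU compiler, then an optional third part) and the name "gnu-hipcc-eamxx" quoted in the thread. The rule that the first two parts may not contain a dash is an assumption made here so the name stays unambiguous.

```python
from typing import Optional


def build_compiler_name(fortran: str, gpu_cxx: str,
                        extra: Optional[str] = None) -> str:
    """Join the parts of the proposed naming scheme with dashes.

    fortran: Fortran compiler name without the compiler wrapper
    gpu_cxx: C++ (GPU) compiler name without the compiler wrapper
    extra:   optional third part (model name or GPU programming framework)
    """
    for part in (fortran, gpu_cxx):
        # Assumption: required parts are non-empty and contain no dash.
        if not part or "-" in part:
            raise ValueError(f"invalid name part: {part!r}")
    parts = [fortran, gpu_cxx]
    if extra:
        parts.append(extra)
    return "-".join(parts)


if __name__ == "__main__":
    print(build_compiler_name("gnu", "hipcc"))
    print(build_compiler_name("gnu", "hipcc", "eamxx"))
```

As the comments above note, most entries would omit the third part, so the common case stays as short as a plain two-part name.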
Opening this issue to discuss how to make sure all scream cases build with the "frontier" machine description and at least one compiler entry so we can remove "frontier-scream-gpu" from config_machines.xml
See https://acme-climate.atlassian.net/wiki/spaces/EIDMG/pages/3446079573/How+to+describe+heterogenous+node+machines+with+CIME for background.
In E3SM-Project/scream#2969 (comment) it was noted that SCREAM's "craygnuamdgpu" tells the user that it "Uses Cray wrappers with Gnu compilers for the host, and uses the AMD Hip compiler for the GPU". We should follow that convention.