Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AArch64] Failure to CSE add with commuted adds #122624

Open
Kmeakin opened this issue Jan 12, 2025 · 2 comments
Open

[AArch64] Failure to CSE add with commuted adds #122624

Kmeakin opened this issue Jan 12, 2025 · 2 comments

Comments

@Kmeakin
Copy link
Contributor

Kmeakin commented Jan 12, 2025

https://godbolt.org/z/937sM6E1e

The code for u32::checked_add() produces two adds on AArch64:

pub fn checked_add_u32(x: u32, y: u32) -> Option<u32> {
    x.checked_add(y)
}
example::checked_add_u32::hb25fbb68aa8411f0:
        cmn     w0, w1
        add     w1, w1, w0
        cset    w8, lo
        mov     w0, w8
        ret

cmn is an alias for adds wzr, w0, w1. And since add is commutative, it should be CSE'd with the next instruction, to give:

example::checked_add_u32::hb25fbb68aa8411f0:
        adds    w1, w1, w0
        cset    w8, lo
        mov     w0, w8
        ret

The same issue does not occur on x86_64 or RISC-V

@llvmbot
Copy link
Member

llvmbot commented Jan 12, 2025

@llvm/issue-subscribers-backend-aarch64

Author: Karl Meakin (Kmeakin)

https://godbolt.org/z/zxfEn17d8

The code for u32::checked_add() produces two adds on AArch64:

pub fn checked_add(x: u32, y: u32) -&gt; Option&lt;u32&gt; {
    x.checked_add(y)
}
example::checked_add::h2758207837cc4075:
        cmn     w0, w1
        add     w1, w1, w0
        cset    w8, lo
        mov     w0, w8
        ret

cmn is an alias for adds wzr, w0, w1. And since add is commutative, it should be CSE'd with the next instruction, to give:

example::checked_add::h2758207837cc4075:
        adds    w1, w1, w0
        cset    w8, lo
        mov     w0, w8
        ret

The same issue does not occur on x86_64 or RISC-V

@Kmeakin
Copy link
Contributor Author

Kmeakin commented Jan 12, 2025

Looking at the LLVM-IR produced, I think the issue is that the llvm.uadd.with.overflow is not CSE'd with the add nuw:

define { i32, i32 } @_ZN7example15checked_add_u3217hb25fbb68aa8411f0E(i32 noundef %x, i32 noundef %y) unnamed_addr #0 {
  %0 = tail call { i32, i1 } @llvm.uadd.with.overflow.i32(i32 %x, i32 %y)
  %_4.1 = extractvalue { i32, i1 } %0, 1
  %_5 = add nuw i32 %y, %x
  %_0.sroa.3.0 = select i1 %_4.1, i32 undef, i32 %_5, !prof !2
  %not._4.1 = xor i1 %_4.1, true
  %_0.sroa.0.0 = zext i1 %not._4.1 to i32
  %1 = insertvalue { i32, i32 } poison, i32 %_0.sroa.0.0, 0
  %2 = insertvalue { i32, i32 } %1, i32 %_0.sroa.3.0, 1
  ret { i32, i32 } %2
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants