MCA fails to parse Intel syntax movsx instructions #122616

Open
TiborGY opened this issue Jan 11, 2025 · 7 comments

Comments


TiborGY commented Jan 11, 2025

See: https://gcc.godbolt.org/z/z4nWWTj6E
Trying to analyze some assembly from gfortran results in errors when MCA sees some movsx instructions:

<source>:105:2: error: invalid operand for instruction
        movsx   rax, r10d
        ^
<source>:253:2: error: invalid operand for instruction
        movsx   rdi, esi
@TiborGY TiborGY changed the title MCA fails to parse movsx intructions MCA fails to parse movsx intsructions Jan 11, 2025
@TiborGY TiborGY changed the title MCA fails to parse movsx intsructions MCA fails to parse movsx instructions Jan 11, 2025
Collaborator

topperc commented Jan 11, 2025

llvm-mca probably expects AT&T syntax by default. I'm on my phone, so I can't really look at the godbolt link.

Author

TiborGY commented Jan 11, 2025

> llvm-mca probably expects AT&T syntax by default. I'm on my phone, so I can't really look at the godbolt link.

Confirmed working with AT&T. Still, this appears to be a movsx-specific parsing bug, since Intel syntax has generally been working fine with MCA so far.

@TiborGY TiborGY changed the title MCA fails to parse movsx instructions MCA fails to parse Intel syntax movsx instructions Jan 11, 2025
Collaborator

topperc commented Jan 11, 2025

> > llvm-mca probably expects AT&T syntax by default. I'm on my phone, so I can't really look at the godbolt link.
>
> Confirmed working with AT&T. Still, this appears to be a movsx-specific parsing bug, since Intel syntax has generally been working fine with MCA so far.

I wonder if Intel syntax only appears to work because the registers are usually the same size, so the instruction still parses with source and dest swapped. movsx is special because the source and dest are different sizes.

Contributor

boomanaiden154 commented Jan 12, 2025

I'm pretty sure Craig is right. llvm-mca defaults to AT&T syntax and only works with Intel syntax when you use a .intel_syntax directive at the top of the assembly file. Adding that doesn't exactly fix the problem though...

It looks like there might be a subtle incompatibility between how gfortran emits Intel-style assembly and how the LLVM MC layer parses it. Taking the following assembly snippet:

movsx   rax, r10d

And running through llvm-mc:

llvm-mc --assemble /test.s --show-encoding

Produces the following:

/test2.s:1:12: error: invalid operand for instruction
movsx rax, r10d
           ^~~~

Writing it in AT&T syntax and then assembling with llvm-mc produces the encoding 4963C2. Disassembling that with llvm-mc produces movsxd rax, r10d, and running the following snippet through llvm-mca:

.intel_syntax
movsxd	rax, r10d

With the following invocation:

llvm-mca -mcpu=znver3 /test.s

produces normal output.

I'm not exactly sure who is in the right here. Either LLVM should be accepting of movsx rax, r10d in Intel syntax, gfortran should be emitting movsxd when working with 64-bit registers, or maybe both.

Interestingly enough, it also seems like we only have APX test coverage for the MC layer for movsxd in Intel syntax. We should probably rectify that.
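
To make the 4963C2 encoding mentioned above concrete, here is a minimal Python sketch of how the register-to-register form of `movsxd r64, r32` (REX.W + 63 /r) is put together. The helper name `encode_movsxd` and the register tables are illustrative assumptions and are not part of LLVM; memory operands and every other form are deliberately not handled.

```python
# Register numbers follow the standard x86-64 encoding order.
REG64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi",
         "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15"]
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi",
         "r8d", "r9d", "r10d", "r11d", "r12d", "r13d", "r14d", "r15d"]


def encode_movsxd(dest64: str, src32: str) -> bytes:
    """Encode `movsxd dest64, src32` (hypothetical helper, reg-reg only)."""
    reg = REG64.index(dest64)  # goes into ModRM.reg (destination)
    rm = REG32.index(src32)    # goes into ModRM.rm (source)

    # REX prefix is 0100WRXB: W=1 selects the 64-bit destination,
    # R and B carry the high bit of ModRM.reg and ModRM.rm.
    rex = 0x48 | ((reg >> 3) << 2) | (rm >> 3)

    # ModRM with mod=11 (register-direct addressing).
    modrm = 0xC0 | ((reg & 7) << 3) | (rm & 7)

    return bytes([rex, 0x63, modrm])


print(encode_movsxd("rax", "r10d").hex().upper())  # 4963C2
```

The opcode byte is 63 regardless of whether the assembler spelled the mnemonic `movsx`, `movsxd`, or `movslq`; only the mnemonic accepted by the parser differs.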

Author

TiborGY commented Jan 12, 2025

Wait, so both movsx rax, r10d and movsxd rax, r10d produce the same machine code bytes? Is there some kind of non-bijectiveness in the x86-64 spec regarding this group of instructions that prevents llvm-mc from disassembling 4963C2 into movsx?

Author

TiborGY commented Jan 12, 2025

Scratch that, based on https://www.felixcloutier.com/x86/movsx:movsxd the 63 opcode definitely means movsxd. So to me it seems that llvm-mc is silently replacing movsx with movsxd when assembling movsx rax, r10d.
That feels strange.

Collaborator

topperc commented Jan 12, 2025

> Scratch that, based on https://www.felixcloutier.com/x86/movsx:movsxd the 63 opcode definitely means movsxd. So to me it seems that llvm-mc is silently replacing movsx with movsxd when assembling movsx rax, r10d. That feels strange.

llvm-mc doesn't accept movsx rax, r10d in Intel syntax, but binutils does. llvm-mc will accept movsxd rax, r10d in Intel syntax. llvm-mc will accept movsx %r10d, %rax or movslq %r10d, %rax in AT&T syntax.
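
Given the spellings listed above, one possible stopgap (not a fix for the parser itself) is to rewrite the assembly text before handing it to llvm-mca. The Python sketch below is a hypothetical helper, `movsx_to_movsxd`, that only touches `movsx` with a 64-bit register destination and a 32-bit register source; memory operands and all other forms are left untouched. The input would still need a `.intel_syntax` directive at the top, as noted earlier in the thread.

```python
import re

# 64-bit and 32-bit general-purpose register names (Intel syntax, no prefix).
_R64 = r"(?:r(?:[abcd]x|[sd]i|[sb]p)|r(?:[89]|1[0-5]))"
_R32 = r"(?:e(?:[abcd]x|[sd]i|[sb]p)|r(?:[89]|1[0-5])d)"

# Match the mnemonic only when followed by "<r64>, <r32>".
_MOVSX_R64_R32 = re.compile(
    rf"\bmovsx(?=\s+{_R64}\s*,\s*{_R32}\b)",
    re.IGNORECASE,
)


def movsx_to_movsxd(asm: str) -> str:
    """Rewrite `movsx r64, r32` to `movsxd r64, r32` (hypothetical helper)."""
    return _MOVSX_R64_R32.sub("movsxd", asm)


print(movsx_to_movsxd("movsx   rax, r10d"))  # movsxd   rax, r10d
```

Other sign extensions such as `movsx eax, al` or `movsx rax, ax` are left alone, because those really are `movsx` encodings rather than opcode 63.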
