Support ILP32 on RV64 in psABI #381

Open
wants to merge 3 commits into
base: master

Liaoshihua

This pull request adds a new e_flags bit, X32. It occupies the sixth bit of the e_flags layout.

We have an initial implementation of rv64 ilp32 in the GNU toolchain and the kernel.
Details are in this link.

@guoren83 @palmer-dabbelt @kito-cheng

@jrtc27
Collaborator

jrtc27 commented May 19, 2023

Regardless of whether we want it, x32 is completely the wrong name for it. x32 comes from the x86, x86_64 and (awful) x64 terminology specific to x86.

@jrtc27
Collaborator

jrtc27 commented May 19, 2023

I'll also note that, normally, ILP32-on-64 ABIs use ELFCLASS32, not ELFCLASS64. This is complicated on RISC-V by the fact that there is a single EM_RISCV, not separate EM_RISCV32 and EM_RISCV64.

@Liaoshihua
Author

I'll also note that, normally, ILP32-on-64 ABIs use ELFCLASS32, not ELFCLASS64. This is complicated on RISC-V by the fact that there is a single EM_RISCV, not separate EM_RISCV32 and EM_RISCV64.

In ILP32-on-RV64, we use ELFCLASS32 and EM_RISCV. X32 (let's call it that for now) is added to distinguish ILP32 on RV64 from ILP32 on RV32. I think this e_flags bit is necessary.

This is the ELF header generated for ILP32-on-RV64:
ELF Header:
Magic: 7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
Class: ELF32
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: REL (Relocatable file)
Machine: RISC-V
Version: 0x1
Entry point address: 0x0
Start of program headers: 0 (bytes into file)
Start of section headers: 24620 (bytes into file)
Flags: 0x21, RVC, X32, soft-float ABI
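
As a rough illustration of how the `Flags: 0x21` value above breaks down, here is a minimal Python sketch. The first four constants are the existing RISC-V e_flags values from the psABI; `EF_RISCV_X32 = 0x20` is an assumption inferred from the PR description ("the sixth bit") and from the readelf output, and `decode_e_flags` is a hypothetical helper, not a binutils API.

```python
# Existing RISC-V e_flags bits defined by the psABI.
EF_RISCV_RVC = 0x0001
EF_RISCV_FLOAT_ABI = 0x0006  # two-bit field selecting the float ABI
EF_RISCV_RVE = 0x0008
EF_RISCV_TSO = 0x0010
# Assumption: the flag proposed in this PR sits at bit 5 ("the sixth bit").
EF_RISCV_X32 = 0x0020

FLOAT_ABI_NAMES = {
    0x0: "soft-float ABI",
    0x2: "single-float ABI",
    0x4: "double-float ABI",
    0x6: "quad-float ABI",
}


def decode_e_flags(flags):
    """Return readelf-style names for a RISC-V e_flags word (illustrative only)."""
    names = []
    if flags & EF_RISCV_RVC:
        names.append("RVC")
    if flags & EF_RISCV_RVE:
        names.append("RVE")
    if flags & EF_RISCV_TSO:
        names.append("TSO")
    if flags & EF_RISCV_X32:
        names.append("X32")
    names.append(FLOAT_ABI_NAMES[flags & EF_RISCV_FLOAT_ABI])
    return names


print(decode_e_flags(0x21))  # ['RVC', 'X32', 'soft-float ABI']
```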

@kito-cheng
Collaborator

I would prefer registering a new e_machine value such as EM_RISCV64_X32 or EM_RISCV64_ILP32 rather than using an e_flags bit, for two reasons: the flag would only be used for ELFCLASS32, so it is wasted for ELFCLASS64; and a new e_machine value would prevent old tools from interpreting the file as a normal 32-bit RISC-V binary.

@Liaoshihua
Author

I would prefer registering a new e_machine value such as EM_RISCV64_X32 or EM_RISCV64_ILP32 rather than using an e_flags bit, for two reasons: the flag would only be used for ELFCLASS32, so it is wasted for ELFCLASS64; and a new e_machine value would prevent old tools from interpreting the file as a normal 32-bit RISC-V binary.

I can't agree with you.

  1. There is no EM_RISCV64 or EM_RISCV32 in RISC-V, so using a new e_machine value EM_RISCV64_X32 means we would have to add EM_RISCV64 and EM_RISCV32 too. That is perhaps a more serious waste than the flag being "wasted for ELFCLASS64".
  2. In similar cases such as mips-n32 and aarch64_ilp32, they use the same e_machine value as N64 or LP64/LLP64. Unless a major flaw in that approach can be shown, I tend to follow them.

@kito-cheng
Collaborator

kito-cheng commented May 19, 2023

e_machine is a 16-bit value used as a sequence number, so we still have 2^16 - 248 = 65288 values to use. e_flags, however, is used as a bit vector: 8 bits are reserved for non-standard use and we have already used 5 bits, so only 19 bits are left, and adding EF_RISCV_X32 would leave 18.

So compared to e_flags, e_machine has much larger room to use (waste :P).
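
The arithmetic behind this comparison can be checked with a short Python sketch. The figures (248 assigned e_machine values, 8 reserved and 5 used e_flags bits) are taken from the comment above and are not independently verified here; the variable names are only illustrative.

```python
# e_machine: a 16-bit sequence number.
E_MACHINE_BITS = 16
E_MACHINE_ASSIGNED = 248  # figure quoted in the comment above
e_machine_free = 2**E_MACHINE_BITS - E_MACHINE_ASSIGNED

# e_flags: a 32-bit vector in which each feature consumes one or more bits.
E_FLAGS_BITS = 32
E_FLAGS_RESERVED_NONSTD = 8  # reserved for non-standard use
E_FLAGS_USED = 5             # already allocated by the psABI
e_flags_free = E_FLAGS_BITS - E_FLAGS_RESERVED_NONSTD - E_FLAGS_USED
e_flags_free_after_x32 = e_flags_free - 1

print(e_machine_free)          # 65288
print(e_flags_free)            # 19
print(e_flags_free_after_x32)  # 18
```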

@Liaoshihua
Author

OK, you convinced me.
So do we need to set EM_RISCV32 and EM_RISCV64 as well? And what number should be chosen?

@kito-cheng
Collaborator

It seems that should happen here: we could send a request to registry@xinuos.com and https://groups.google.com/g/generic-abi , but I would like to pick a random value until we reach further consensus.

However, I know a random value might be a bit too vague, so maybe we could tentatively use 1243 (EM_RISCV + 1000) and update it in this PR once a value is registered.

@asb
Collaborator

asb commented May 19, 2023

OK, you convinced me. So do we need to set EM_RISCV32 and EM_RISCV64 as well? And what number should be chosen?

I don't have an overly strong view on EM_RISCV_X32 (though Kito makes a good point about eflags bits being scarce), but don't see why we'd need to introduce EM_RISCV32 and EM_RISCV64 as well at this point - it feels like it would cause confusion with no real gain.

@jrtc27
Collaborator

jrtc27 commented May 19, 2023

The world would have been better if EM_RISCV had been split in two like that, but it wasn't, so we need to live with that, and trying to retroactively do it is a bad idea. So I agree with Alex.

@pz9115

pz9115 commented Nov 14, 2024

This feature has already been adopted by upstream open-source RTOS projects, including NuttX and RT-Thread. During the 11.7 psABI meeting, it was agreed to include N32 in the psABI specification as an experimental feature.

@kito-cheng @guoren83

@jrtc27
Collaborator

jrtc27 commented Nov 14, 2024

I don’t think it was completely agreed, but it will receive less opposition than trying to make it non-experimental

@guoren83

Devboard:
BPI-CanMV-K230D-Zero: https://docs.banana-pi.org/en/BPI-CanMV-K230D/BananaPi_BPI-CanMV-K230D-Zero
Toolchains:
GCC: https://github.com/ruyisdk/riscv-gnu-toolchain-rv64ilp32
LLVM: https://github.com/ruyisdk/llvm-project/tree/rv64ilp32
Linux:
Linux Kernel: https://github.com/ruyisdk/linux-xuantie-kernel
RTOS:
EasyXem (AUTOSAR CP R19-11): https://atomgit.com/easyxmen/XMen/tree/rv64ilp32-dev
Nuttx: https://github.com/apache/nuttx
RT-Thread: RT-Thread/rt-thread#9194

Due to the widespread adoption of N32 ABI, I support including N32 as an experimental feature in the psABI specification.

@aswaterman
Contributor

Has the design of the RV64 ILP32 ABI been thoroughly vetted, and have the inevitable performance wrinkles been worked out?

I don't want to be an impediment to adopting an RV64 ILP32 ABI, but this seems like a surprisingly small PR to that end.

@jrtc27
Collaborator

jrtc27 commented Nov 16, 2024

I don’t see a change regarding ELFCLASS32/64, for one. I’m also concerned about the 2 GiB restriction making this of extremely limited use.

@jrtc27
Collaborator

jrtc27 commented Nov 16, 2024

My other big concern as a quite separate issue is with the toolchain side. Our toolchain conventions dictate that cc -mabi=ilp32[fde]? for a 64-bit-targeting compiler gets you RV32. How then would you enable this ABI instead?

@aswaterman
Contributor

aswaterman commented Nov 17, 2024

I’m also concerned about the 2 GiB restriction making this of extremely limited use.

Yeah. What is the exact nature of the restriction? We did design the virtual memory system so that addresses up to 4 GiB are usable. I would think that a 2 GiB relative limit in static addressing is a requirement, following the usual logic for RV64, but we should be able to provide the whole ~4 GiB heap.

@jrtc27
Collaborator

jrtc27 commented Nov 17, 2024

I’m also concerned about the 2 GiB restriction making this of extremely limited use.

Yeah. What is the exact nature of the restriction? We did design the virtual memory system so that addresses up to 4 GiB are usable. I would think that a 2 GiB relative limit in static addressing is a requirement, following the usual logic for RV64, but we should be able to provide the whole ~4 GiB heap.

It's there because the ISA and ABI like to sign-extend things and so sign-extending an address >= 2 GiB gives you something in the kernel's address space, so you'd have to make more changes to the ABI that may have a performance cost. Unlike using UXL=32 where you can actually use the full 32-bit address space, which seems like the wrong way round for the two configurations if anything. Combined that makes this quite a niche thing; you'd likely struggle to build a full distro with it, I know Debian struggled with 32-bit MIPS's (architectural) address space limitations for many years, so it would be for embedded use only, at which point why not just use RV32, that's what it's for.
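
A minimal Python sketch of the sign-extension effect described above. Python integers stand in for 64-bit registers, and the helper names `sext32` and `zext32` are illustrative only, not part of any toolchain.

```python
def sext32(value):
    """Sign-extend the low 32 bits of value into a 64-bit register image."""
    value &= 0xFFFFFFFF
    if value & 0x80000000:
        value |= 0xFFFFFFFF00000000
    return value


def zext32(value):
    """Zero-extend the low 32 bits of value into a 64-bit register image."""
    return value & 0xFFFFFFFF


# The last address below 2 GiB is unchanged by either extension.
print(hex(sext32(0x7FFFFFFF)))  # 0x7fffffff
# The first address at 2 GiB lands in the top of the 64-bit space when
# sign-extended, which is where kernels conventionally live.
print(hex(sext32(0x80000000)))  # 0xffffffff80000000
print(hex(zext32(0x80000000)))  # 0x80000000
```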

@aswaterman
Contributor

I was conflating the UXL=32 story with the RV64 ILP32 situation. Sorry for that noise.

@jrtc27
Collaborator

jrtc27 commented Nov 17, 2024

I was conflating the UXL=32 story with the RV64 ILP32 situation. Sorry for that noise.

Well, it is still somewhat relevant. As far as I understand, i386 and x32 can both use a 4 GiB address space (provided you're using a 64-bit kernel), and neither MIPS N32 nor MIPS O32 can use a 4 GiB address space as the various untranslated/kernel windows are architectural and the 32-bit regions are still present in the 64-bit address space (though it wouldn't surprise me if R6 or another revision allows turning this off given it ditched a bunch of other historical baggage, my knowledge is dated). RISC-V, as far as I know, is unusual in its 32-bit ISA+ABI having a bigger usable address space than this proposed 64-bit ISA + 32-bit ABI (or even different in any respect), so it's worth drawing attention to.

@guoren83

guoren83 commented Nov 18, 2024

I’m also concerned about the 2 GiB restriction making this of extremely limited use.

Yeah. What is the exact nature of the restriction? We did design the virtual memory system so that addresses up to 4 GiB are usable. I would think that a 2 GiB relative limit in static addressing is a requirement, following the usual logic for RV64, but we should be able to provide the whole ~4 GiB heap.

It's there because the ISA and ABI like to sign-extend things and so sign-extending an address >= 2 GiB gives you something in the kernel's address space, so you'd have to make more changes to the ABI that may have a performance cost. Unlike using UXL=32 where you can actually use the full 32-bit address space, which seems like the wrong way round for the two configurations if anything. Combined that makes this quite a niche thing; you'd likely struggle to build a full distro with it, I know Debian struggled with 32-bit MIPS's (architectural) address space limitations for many years, so it would be for embedded use only, at which point why not just use RV32, that's what it's for.

The facts are:

  1. Sophon-cv1800B 64MB (XuanTie C906)
  2. Kendryte-k230d 128MB (XuanTie C908)
  3. Renesas RZ/Five xxxMB (Andes AX45)
  4. Allwinner D1s/F133 64MB (XuanTie C906)
    ...

Small-memory RV64 chips have become widespread; they need the 64ILP32 ABI to solve the productization problem.

Your question then becomes: why did these chips come about? Let me guess a little:

  1. RV64 ISA is more scalable. For example, rv32 Linux only supports a maximum of 1GB of physical memory, but rv64 Linux supports 128+GB-sv39/64+TB-sv48 with almost no limitations.
  2. RV64 ISA supports ld/sd instructions, which satisfies the vendor more than lw/sw.
  3. RV64 ISA supports 64-bit atomic instructions, which satisfies the Linux & OS kernel design.
  4. RV64 ISA supports 64-bit ALU instructions and helps with "long long" variable-type processing.

RV64 ISA has a differentiated advantage over arm32, making it easier to attract customers. Customers will ask you to give the advantages of rv32 over arm32, which is very difficult. Sometimes, telling customers that your product can be upgraded to 64-bit ISA and evolve to a higher-end product form is more convincing.

@guoren83

for a 64-bit-targeting compiler gets you RV32. How then would you enable this ABI instead?

The rv64ilp32 ABI is enabled only when both -march=rv64* and -mabi=ilp32* are specified. This does not affect the current default -mabi=ilp32* usage, which stays on the rv32ilp32 ABI.
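
The selection rule can be sketched roughly as follows. `effective_abi` is a hypothetical helper, and the defaulting behaviour when -march is omitted is an assumption based on jrtc27's description of the existing convention; this is not how GCC or LLVM is actually structured.

```python
def effective_abi(mabi, march=None):
    """Return a label such as 'rv64ilp32' for the given options.

    Hypothetical helper for illustration only; not a real compiler API.
    """
    if march is None:
        # Assumed existing convention: the ABI name alone picks the width.
        march = "rv32" if mabi.startswith("ilp32") else "rv64"
    xlen = march[:4]
    if xlen not in ("rv32", "rv64"):
        raise ValueError(f"unsupported -march: {march}")
    if xlen == "rv32" and mabi.startswith("lp64"):
        raise ValueError("lp64* ABIs need an RV64 ISA")
    return xlen + mabi


print(effective_abi("ilp32"))             # rv32ilp32
print(effective_abi("ilp32d", "rv64gc"))  # rv64ilp32d
print(effective_abi("lp64d", "rv64gc"))   # rv64lp64d
```

Under this rule, only an explicit RV64 -march combined with an ilp32* ABI name reaches the new ABI; every existing option combination keeps its current meaning.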

@aswaterman
Contributor

aswaterman commented Nov 18, 2024

I also see value in RV64 ILP32, since the wider registers, memory accesses, etc. will offer higher performance than UXL=32.

But I'd like to take a beat and see if we can somehow work around the 2 GiB heap addressing limitation. As a strawman, we could consider the idea that pointers are unsigned and zero-extended. Pointers would be loaded from memory using LWU instead of LW, which would hurt code size by a few %. Some address computations would be performed using ADD instead of ADDW, which would improve code size a little bit. Some type-punning operations would result in explicit zero-/sign-extensions, increasing dynamic instruction count somewhat. It would be possible for buggy code to generate pointers greater than 2^32, since ADD[I] and load/store offset addressing don't wrap addresses mod 2^32, but this would correspond to UB anyway.

Is something like this workable?
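
To make the strawman concrete, here is a small Python model of the instructions it mentions. It is a sketch of the semantics only (LW sign-extends a loaded word, LWU zero-extends it, ADD is a full 64-bit add, ADDW wraps to 32 bits and sign-extends); the function names simply mirror the mnemonics and are not a real simulator API.

```python
MASK64 = (1 << 64) - 1


def sext32(value):
    """Sign-extend the low 32 bits into a 64-bit register image."""
    value &= 0xFFFFFFFF
    return value | 0xFFFFFFFF00000000 if value & 0x80000000 else value


def lw(word):
    """LW: load a 32-bit word and sign-extend it."""
    return sext32(word)


def lwu(word):
    """LWU: load a 32-bit word and zero-extend it."""
    return word & 0xFFFFFFFF


def add(a, b):
    """ADD: 64-bit addition, no 32-bit wrap-around."""
    return (a + b) & MASK64


def addw(a, b):
    """ADDW: 32-bit addition, result sign-extended to 64 bits."""
    return sext32(a + b)


stored_pointer = 0xFFFFFFF0  # a pointer near the top of a 4 GiB space

print(hex(lw(stored_pointer)))   # 0xfffffffffffffff0
print(hex(lwu(stored_pointer)))  # 0xfffffff0

# With zero-extended pointers, ADD can step past 2^32 ...
print(hex(add(lwu(stored_pointer), 0x20)))   # 0x100000010
# ... whereas ADDW wraps modulo 2^32.
print(hex(addw(lwu(stored_pointer), 0x20)))  # 0x10
```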

@guoren83

guoren83 commented Nov 19, 2024

I also see value in RV64 ILP32, since the wider registers, memory accesses, etc. will offer higher performance than UXL=32.

But I'd like to take a beat and see if we can somehow work around the 2 GiB heap addressing limitation. As a strawman, we could consider the idea that pointers are unsigned and zero-extended.
Pointers would be loaded from memory using LWU instead of LW, which would hurt code size by a few %.

Em... There is no C.LWU for LWU, so code size is greatly affected. I compared Linux kernel code sizes:
When ISA_C is on, rv32ilp32 is 8580K, and rv64ilp32 is 9041K. (rv64ilp32 expands by 5.4% over rv32)
When ISA_C is off, rv32ilp32 is 11838K, and rv64ilp32 is 12091K. (rv64ilp32 expands by 2.1% over rv32)
We found that losing C.LWU contributes a lot to code size.
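
The percentages quoted above follow directly from the four kernel sizes. A short Python check, using only the numbers given in this comment (the dictionary layout and names are illustrative):

```python
# Kernel sizes in K as quoted above: (rv32ilp32, rv64ilp32).
kernel_sizes = {
    "ISA_C on": (8580, 9041),
    "ISA_C off": (11838, 12091),
}


def growth_percent(rv32_size, rv64_size):
    """Relative size increase of rv64ilp32 over rv32ilp32, in percent."""
    return (rv64_size - rv32_size) / rv32_size * 100


for config, (rv32_size, rv64_size) in kernel_sizes.items():
    print(f"{config}: +{growth_percent(rv32_size, rv64_size):.1f}%")
# ISA_C on: +5.4%
# ISA_C off: +2.1%
```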

Some address computations would be performed using ADD instead of ADDW, which would improve code size a little bit.

Yes, and the compiler work is more complex than it sounds.

Some type-punning operations would result in explicit zero-/sign-extensions, increasing dynamic instruction count somewhat.

I agree.

It would be possible for buggy code to generate pointers greater than 2^32, since ADD[I] and load/store offset addressing don't wrap addresses mod 2^32, but this would correspond to UB anyway.
Is something like this workable?

Theoretically, this is possible. But we must pay the inevitable performance cost and face complex compiler issues. So, we choose the simplest and most effective way to deal with it: Limit the addressing to 2GiB in psabi-spec to minimize the modification of the spec and compiler. Next, please let me give more explanation from the perspective of productization demand:

From practice, 2GiB is enough for the embedded scenario. Most small memory devices only need no more than 1GB of physical address (e.g., rv32 Linux only supports a maximum of 1GiB physical memory), so 2GiB address space is enough. RISC-V Linux 64-bit compat mode (UXL=32) only supports 2GiB user-space address space, not 4GiB, because it's enough for the embedded scenarios in practice (e.g., k230/k230d productization). So 2GiB is enough for ILP32 in practice, as well as 64ILP32.

In the end, rv64ilp32 is the supplement for rv64lp64. If you need more address space, go to rv64lp64 with a simple replacement. (ISA is the same: RV64*)

@kito-cheng
Collaborator

Limiting the address space to just 2G may cause Asan to not work well. Asan
consumes a lot of virtual memory space, so having the full 4G available, if
possible, would be ideal. Asan is a very popular modern memory debugging
tool, and losing support for it would not be a good idea.


Back to zero-extension vs. sign-extension, I would prefer sign-extension
because both the ISA and psABI are sign-extension preferred. Here are some
concrete examples:

  • Arguments are sign-extended if their size is narrower than XLEN, so pointers
    passed as arguments would always require an extra zero-extension IF we don't
    change the calling convention.
  • The medlow memory model becomes useless, as address generation would require
    an extra zero-extension (e.g., lui, addi -> lui, addi, zext), and accessing
    global objects would also need an extra code sequence (e.g., lui, lw ->
    lui, addi, zext, lw).
  • No C.LWU, as @guoren83 mentioned.

And one Linux-specific question for rv64ilp32 with a 2G restriction: Where is
the address range for placing the vDSO? Does it also go in the lower 2G?

@aswaterman
Contributor

It sounds like we might go down a path of doing the easier, and more code-size-efficient, 2 GiB thing. We could always do the 4 GiB thing later as another ABI if we absolutely need to do so.

@guoren83

guoren83 commented Nov 20, 2024

Limiting the address space to just 2G may cause Asan to not work well. Asan consumes a lot of virtual memory space, so having the full 4G available, if possible, would be ideal. Asan is a very popular modern memory debugging tool, and losing support for it would not be a good idea.

Linux user-mode rv32ilp32 only supports 2GiB with rv64lp64 Linux kernel (compat mode) and about 2.4GiB with rv32ilp32 Linux kernel (native mode). So, how did Asan support rv32ilp32, which only has 2~2.4 GiB address space?

ps:
Note the current rv64lp64 compat mode only supports (UXL=32 rv32ilp32 ABI) 2GiB address space, not 4GiB address space. AddressSanitizer (ASan) is relevant to all ILP32 ABIs, not just the 64ILP32 ABI.

Back to zero-extension vs. sign-extension, I would prefer sign-extension because both the ISA and psABI are sign-extension preferred. Here are some concrete examples:

  • Arguments are sign-extended if their size is narrower than XLEN, so pointers
    passed as arguments would always require an extra zero-extension IF we don't
    change the calling convention.
  • The medlow memory model becomes useless, as address generation would require
    an extra zero-extension (e.g., lui, addi -> lui, addi, zext), and accessing
    global objects would also need an extra code sequence (e.g., lui, lw ->
    lui, addi, zext, lw).
  • No C.LWU, as @guoren83 mentioned.

And one Linux-specific question for rv64ilp32 with a 2G restriction: Where is the address range for placing the vDSO? Does it also go in the lower 2G?

Yes, it's the same as user-space rv32ilp32 ABI of rv64lp64 Linux kernel (compat mode with only 2GiB address space).

@kito-cheng
Collaborator

Limiting the address space to just 2G may cause Asan to not work well. Asan consumes a lot of virtual memory space, so having the full 4G available, if possible, would be ideal. Asan is a very popular modern memory debugging tool, and losing support for it would not be a good idea.

Linux user-mode rv32ilp32 only supports 2GiB with rv64lp64 Linux kernel (compat mode) and about 2.4GiB with rv32ilp32 Linux kernel (native mode). So, how did Asan support rv32ilp32, which only has 2~2.4 GiB address space?

It has never been supported upstream, so honestly I don't know. Some T-Head folks said they would upstream the support, but I haven't seen it yet.

[1] https://lf-rise.atlassian.net/wiki/spaces/HOME/pages/8585550/DP_05_001+-+Address+Sanitizer

ps: Note the current rv64lp64 compat mode only supports (UXL=32 rv32ilp32 ABI) 2GiB address space, not 4GiB address space. AddressSanitizer (ASan) is relevant to all ILP32 ABIs, not just the 64ILP32 ABI.

I am not familiar with the Linux kernel, so I am wondering: has compat mode been upstreamed?

Back to zero-extension vs. sign-extension, I would prefer sign-extension because both the ISA and psABI are sign-extension preferred. Here are some concrete examples:

  • Arguments are sign-extended if their size is narrower than XLEN, so pointers
    passed as arguments would always require an extra zero-extension IF we don't
    change the calling convention.
  • The medlow memory model becomes useless, as address generation would require
    an extra zero-extension (e.g., lui, addi -> lui, addi, zext), and accessing
    global objects would also need an extra code sequence (e.g., lui, lw ->
    lui, addi, zext, lw).
  • No C.LWU, as @guoren83 mentioned.

And one Linux-specific question for rv64ilp32 with a 2G restriction: Where is the address range for placing the vDSO? Does it also go in the lower 2G?

It's the same as user-space rv32ilp32 ABI of rv64lp64 Linux kernel (compat mode with only 2GiB address space).

Could you explain that explicitly? I believe not everyone here is familiar with the Linux kernel, including me.

Collaborator

@kito-cheng kito-cheng left a comment

Please add rv64ilp32, rv64ilp32f, and rv64ilp32d to the Named ABIs section, and also add a new section === RV64ILP32 Calling Convention right after === ILP32E Calling Convention.

riscv-elf.adoc Outdated
Comment on lines 29 to 33
For the ILP32 ABI on RV64* ISA, the medlow allows the code to address lower 2GiB
of the RV64 address space (`0x0` ~ `0x000000007FFFFFFF`).

NOTE: Limiting the address space to lower 2GiB does not pose any issues with sign
extending addresses into the upper 32 bits of a 64-bit register.
Collaborator

I would prefer to just drop this limitation from the psABI. medlow is not really restricted to the lower 2 GiB when sign-extension is used; it is more like a restriction that comes from the OS implementation, which is out of scope for the psABI.


Refer to the pseudo-code in the "Linker Relaxation" chapter. If we keep the 2GiB limitation, we needn't modify any of it.

Collaborator

Since we use sign-extension for pointers, in theory we have the full 4 GiB of memory from the 32-bit address space point of view, even though it looks like the lowest 2 GiB and the highest 2 GiB from the 64-bit address space point of view.

I still don't think we should add the limitation here; it is more of an implementation limitation and should not be written down in the psABI.

Let me say that in another way: it is fine for Linux user-space rv64ilp32 to be limited to the lower 2 GiB, but we should not add that limitation to the psABI spec unless ALL rv64ilp32 scenarios should have it, e.g. Linux kernel-space rv64ilp32, FreeBSD user/kernel-space rv64ilp32 (of course, this does not exist yet, but once we add this limitation, FreeBSD would need to accept it as well), and all RTOSes with rv64ilp32.

@guoren83 guoren83 Nov 22, 2024

Since we use signed-extension for pointer, so in theory we have full 4 GiB for the memory from the 32 bits address space of view, even it seems like lower 2 GiB and highest 2 GiB from the 64 bits address space view.

and I still don't think we should add the limitation here, that more like implementation limitation and should not written down in the psABI side.

I'm not sure how to define the 4GiB address space. Some guys say 0xffffffff80000000-0x7fffffff, but some people say 0x0-0xffffffff. But 0x0-0x7fffffff (lowest 2GiB) is determined.

Let me say that in another way: Linux user space rv64ilp32 has limitation that only limited to lower 2 GiB is fine, but we should not add that limitation into psABI spec, unless ALL rv64ilp32 scenario should have this limitation, e.g. Linux kernel space rv64ilp32, FreeBSD user/kernel space rv64ilp32 (of cause, this is not existing yet, but once we add this limitation, FreeBSD need take this limitation as well), all RTOS with rv64ilp32 should have this limitation.

The rv64ilp32 Linux kernel runs in the 2GiB address range, which doesn't need the 4GiB address space. I used the duplicated page table mapping method to make "2GiB-4GiB" equal "-2GiB-0" address space, and sign-extend & zero-extend have the same result. That means the rv64ilp32 Linux kernel follows the rule of the lowest 2GiB address space. Letting the compiler care sign-extend address would cause a performance gap and additional compiler work. As long as the address space is limited to 2GiB, both Linux kernel and other OS kernels have a way of running.

The sign/zero-extend addressing is the pain point of 64ilp32 compared to 32ilp32 & 64lp64; any solution would pay the cost. So, the simplest solution is to limit the address to the lowest 2GiB space. So, let's start rv64ilp32 from the lowest 2GiB address space and see how to solve the upper 2GiB address later. Can we support the upper 2GiB address space in the future?

Best Regards
Guo Ren

Collaborator

@kito-cheng kito-cheng Nov 29, 2024

I'm not sure how to define the 4GiB address space. Some guys say 0xffffffff80000000-0x7fffffff, but some people say 0x0-0xffffffff. But 0x0-0x7fffffff (lowest 2GiB) is determined.

You might say you don't know how to define how much space Sv32 has, right? If that's the case, then rv64ilp32 with sign-extension follows the same logic.

The rv64ilp32 Linux kernel runs in the 2GiB address range, which doesn't need the 4GiB address space. I used the duplicated page table mapping method to make "2GiB-4GiB" equal "-2GiB-0" address space, and sign-extend & zero-extend have the same result. That means the rv64ilp32 Linux kernel follows the rule of the lowest 2GiB address space. Letting the compiler care sign-extend address would cause a performance gap and additional compiler work. As long as the address space is limited to 2GiB, both Linux kernel and other OS kernels have a way of running.

The sign/zero-extend addressing is the pain point of 64ilp32 compared to 32ilp32 & 64lp64; any solution would pay the cost. So, the simplest solution is to limit the address to the lowest 2GiB space. So, let's start rv64ilp32 from the lowest 2GiB address space and see how to solve the upper 2GiB address later. Can we support the upper 2GiB address space in the future?

The Linux kernel is just one of the users of the psABI spec, so let me reiterate: a lower-2 GiB limitation for Linux user space is fine, but I don't think the spec should adopt it.

The psABI should define whether pointers use sign-extension or zero-extension; this should NOT be left vague. I also support this proposal because I think sign-extended pointers are a reasonable choice for RISC-V, and MIPS N32 goes that way as well, which means it is at least feasible.

Lastly, I am really unhappy that you said in the psABI meeting that it uses sign-extension but still want to leave the issue vague here; seriously, I feel that's kind of cheating.

@guoren83 guoren83 Nov 30, 2024

I'm not sure how to define the 4GiB address space. Some guys say 0xffffffff80000000-0x7fffffff, but some people say 0x0-0xffffffff. But 0x0-0x7fffffff (lowest 2GiB) is determined.

You might say you don't know how to define how much space Sv32 has, right? If that's the case, then rv64ilp32 with sign-extension follows the same logic.

The rv64ilp32 Linux kernel runs in the 2GiB address range, which doesn't need the 4GiB address space. I used the duplicated page table mapping method to make "2GiB-4GiB" equal "-2GiB-0" address space, and sign-extend & zero-extend have the same result. That means the rv64ilp32 Linux kernel follows the rule of the lowest 2GiB address space. Letting the compiler care sign-extend address would cause a performance gap and additional compiler work. As long as the address space is limited to 2GiB, both Linux kernel and other OS kernels have a way of running.
The sign/zero-extend addressing is the pain point of 64ilp32 compared to 32ilp32 & 64lp64; any solution would pay the cost. So, the simplest solution is to limit the address to the lowest 2GiB space. So, let's start rv64ilp32 from the lowest 2GiB address space and see how to solve the upper 2GiB address later. Can we support the upper 2GiB address space in the future?

Linux kernel is just one of the user of the psABI spec, so let me reiterate again: Linux user space with lower 2 GiB limitation is fine, but I don't think spec should take this.

And psABI should define the pointer is use sign-extension or zero-extension, this should NOT defined in vague way, also I support this proposal is because I think pointer with sign-extension is reasonable way to RISC-V, and MIPS N32 also go that way as well, which mean this is at least a feasible way.

And last, I am really unhappy about that you guys say it's using sign-extension in the psABI meeting but still want to put this issue in vague way here, seriously I feel that's kinda cheating.

There are some misunderstandings here.

There are three proposals about rv64ilp32 addressing:

1. zero-extension addressing
Address range: 0~4GiB
Because most riscv 32-bit ALU instructions are sign-extension by default, zero-extension addressing would cause more instructions and a performance gap.
(Not recommended)

2. sign-extension addressing
Address range: -2GiB~2GiB
We recommended it in the psABI meeting, but we found it caused a significant modification on psabi-spec:
For example, there are 15+ places of pseudo-code about address calculation:

  5.2 Medium any code model
  # Calculate address
  lui a0, %hi(symbol)
  addi a0, a0, %lo(symbol) -> addiw a0
  
  8.4.6. Program Linkage Table
  1: 
      auipc t2, %pcrel_hi(.got.plt)
      sub t1, t1, t3 # shifted .got.plt offset + hdr size + 12 -> subw
      l[w|d] t3, %pcrel_lo(1b)(t2) # _dl_runtime_resolve
      addi t1, t1, -(hdr size + 12) # shifted .got.plt offset -> addiw
      addi t0, t2, %pcrel_lo(1b) # &.got.plt -> addiw
      srli t1, t1, log2(16/PTRSIZE) # .got.plt offset
      l[w|d] t0, PTRSIZE(t0) # link map
      jr t3

3. 0~2GiB addressing limitation
Address range: 0~2GiB
Reasons:

  • The motivation for putting 0~2GiB addressing limitation proposal out in this pr is to minimize psabi-spec modification.
  • In actual use, 0~2GiB is enough.
    (Yes, this is a vague way of zero and sign extension addressing in the psabi-spec, and it permits compilers to choose to use sign or zero extension for their implementation.)

In the end:

We're okay with sign-extension addressing. (-2GiB~2GiB)

If you need a sign-extension solution, we will supplement the modification of the address calculation pseudo code in psabi-spec for rv64ilp32 (Making psabi-spec logically self-consistent).

@guoren83 guoren83 Dec 2, 2024

I would prefer to just drop this limitation from the psABI. medlow is not really restricted to the lower 2 GiB when sign-extension is used; it is more like a restriction that comes from the OS implementation, which is out of scope for the psABI.

For sign-extension, we still need to modify this part like this:

The medium low code model, or medlow, allows the code to address the whole RV32 address space or the lower 2 GiB and highest 2 GiB of the RV64 address space (64LP64 ABI: 0xFFFFFFFF7FFFF800 ~ 0xFFFFFFFFFFFFFFFF and 0x0 ~ 0x000000007FFFF7FF) (64ILP32 ABI: 0x0 ~ 0x000000007FFFFFFF and 0xFFFFFFFF80000000 ~ 0xFFFFFFFFFFFFFFFF). By using the lui and load / store instructions, when referring to an object, or addi (64LP64 ABI), or addiw (64ILP32 ABI), when calculating an address literal, for example, a 32-bit address literal can be produced.

The following instructions show how to load a value, store a value, or calculate an address in the
medlow code model.

# Load value from a symbol
lui a0, %hi(symbol)
lw a0, %lo(symbol)(a0)
# Store value to a symbol
lui a0, %hi(symbol)
sw a1, %lo(symbol)(a0)
# Calculate address
lui a0, %hi(symbol)
addi[w] a0, a0, %lo(symbol)

The ranges on RV64 with 64LP64 ABI are not 0x0 ~ 0x000000007FFFFFFF and 0xFFFFFFFF80000000 ~ 0xFFFFFFFFFFFFFFFF due to RISC-V’s sign-extension of immediates; the following code fragments show where the ranges come from:

# Largest positive number:
lui a0, 0x7ffff # a0 = 0x7ffff000
addi a0, a0, 0x7ff # a0 = a0 + 2047 = 0x000000007FFFF7FF
# Smallest negative number:
lui a0, 0x80000 # a0 = 0xffffffff80000000
addi a0, a0, -0x800 # a0 = a0 + -2048 = 0xFFFFFFFF7FFFF800

The ranges on RV64 with 64ILP32 ABI are 0x0 ~ 0x000000007FFFFFFF and 0xFFFFFFFF80000000 ~ 0xFFFFFFFFFFFFFFFF due to RISC-V’s sign-extension of immediates; the following code fragments show where the ranges come from:

# Largest positive number:
lui a0, 0x7ffff # a0 = 0x7ffff000
addiw a0, a0, 0x7ff # a0 = a0 + 2047 = 0x000000007FFFF7FF
# Smallest negative number:
lui a0, 0x80000 # a0 = 0xffffffff80000000
addiw a0, a0, -0x800 # a0 = a0 + -2048 = 0x000000007FFFF800
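
The two fragments above can be checked with a small model of the instructions involved. The following is only a sketch in Python, not part of the PR: the helper names (`sext`, `lui`, `addi`, `addiw`) are made up for illustration, and registers are modelled as unsigned 64-bit integers.

```python
MASK64 = (1 << 64) - 1


def sext(value, bits):
    """Sign-extend the low `bits` bits of value into a 64-bit register image."""
    value &= (1 << bits) - 1
    if value >> (bits - 1):
        value -= 1 << bits
    return value & MASK64


def lui(imm20):
    # On RV64, lui builds a 32-bit value in bits 31:12 and sign-extends it.
    return sext(imm20 << 12, 32)


def addi(reg, imm12):
    # Full 64-bit add of a signed 12-bit immediate.
    return (reg + imm12) & MASK64


def addiw(reg, imm12):
    # 32-bit add; the 32-bit result is sign-extended to 64 bits.
    return sext(reg + imm12, 32)


# 64LP64 sequence (lui + addi)
largest_addi = addi(lui(0x7FFFF), 0x7FF)
smallest_addi = addi(lui(0x80000), -0x800)

# 64ILP32 sequence (lui + addiw)
largest_addiw = addiw(lui(0x7FFFF), 0x7FF)
wrapped_addiw = addiw(lui(0x80000), -0x800)

for name, value in [
    ("lui+addi  largest", largest_addi),
    ("lui+addi  smallest", smallest_addi),
    ("lui+addiw largest", largest_addiw),
    ("lui+addiw wrapped", wrapped_addiw),
]:
    print(f"{name}: 0x{value:016X}")
```

In this model the addiw case wraps within 32 bits, which is why the second addiw fragment lands at 0x000000007FFFF800 rather than at a negative address.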

In the end:
If sign-extension is chosen, every address calculation for RV64 in the psABI spec must distinguish between the LP64 and ILP32 ABIs.

Collaborator

Even if you limit to 2 GiB you need to do that already, because 0x000000007FFFFFFF cannot be produced by the normal RV64 code sequences. The only restriction that would let you ignore the problem would be if you only allowed the negative/upper half of the address space to be used. So I don't buy this argument.
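
To make the point about 0x000000007FFFFFFF concrete, here is a rough Python sketch that splits a 32-bit address with the usual %hi/%lo rounding and then models lui followed by either addi or addiw. The function names are hypothetical and the model only covers these three instructions.

```python
MASK64 = (1 << 64) - 1


def sext32(value):
    """Sign-extend the low 32 bits of value into a 64-bit register image."""
    value &= 0xFFFFFFFF
    if value & 0x80000000:
        value -= 1 << 32
    return value & MASK64


def split_hi_lo(addr32):
    """Split like %hi/%lo: hi is a 20-bit field, lo a signed 12-bit immediate."""
    hi = ((addr32 + 0x800) >> 12) & 0xFFFFF
    lo = addr32 & 0xFFF
    if lo >= 0x800:
        lo -= 0x1000
    return hi, lo


def materialize(addr32, use_addiw):
    """Model `lui a0, %hi(x)` followed by addi or addiw with %lo(x)."""
    hi, lo = split_hi_lo(addr32)
    reg = sext32(hi << 12)          # lui sign-extends its 32-bit result
    total = reg + lo
    return sext32(total) if use_addiw else total & MASK64


target = 0x7FFFFFFF
with_addi = materialize(target, use_addiw=False)
with_addiw = materialize(target, use_addiw=True)
print(f"lui+addi : 0x{with_addi:016X}")
print(f"lui+addiw: 0x{with_addiw:016X}")
```

In this model the lui+addi sequence ends up in the upper half of the 64-bit space for this target, while lui+addiw yields the intended 32-bit value.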

@guoren83 guoren83 Dec 2, 2024

You pointed out the fallacy of the 0 - 2GiB limitation, and the correct name should be 0~0x7FFFF7FF (upper half of the address space). Thank you.

@kito-cheng is opposed to the "limitation of the upper half of the address space":
The psABI should define whether pointers use sign-extension or zero-extension; this should NOT be defined in a vague way.

So, we abandoned the "limitation of the upper half of the address space" proposal and returned to the sign-extension addressing proposal. These days, we will supplement the modification of the address calculation pseudo code in the psabi-spec for rv64ilp32.

@jrtc27 @kito-cheng

We've updated the PR to use sign-extended addressing, corrected the naming to RV64ILP32* ABIs, and used "addiw" for address calculation.

If there is any problem, please let me know. Thanks.

@guoren83

guoren83 commented Nov 20, 2024

Limiting the address space to just 2G may cause Asan to not work well. Asan consumes a lot of virtual memory space, so having the full 4G available, if possible, would be ideal. Asan is a very popular modern memory debugging tool, and losing support for it would not be a good idea.

Linux user-mode rv32ilp32 only supports 2GiB with rv64lp64 Linux kernel (compat mode) and about 2.4GiB with rv32ilp32 Linux kernel (native mode). So, how did Asan support rv32ilp32, which only has 2~2.4 GiB address space?

It has never been supported upstream, so honestly I don't know. Some T-Head folks said they would upstream the support, but I haven't seen it yet.

[1] https://lf-rise.atlassian.net/wiki/spaces/HOME/pages/8585550/DP_05_001+-+Address+Sanitizer

Thx for mentioning. @joshua-arch1 has contributed a 64-bit Asan riscv port, and he will continue with a 32-bit Asan riscv port. As far as I know, @joshua-arch1 would use a 2GiB address space layout, which leaves the target program with 1.7GiB of address space. This is also compatible with the 2.4GiB user address space for the rv32ilp32 Linux kernel.

ps: Note the current rv64lp64 compat mode only supports (UXL=32 rv32ilp32 ABI) 2GiB address space, not 4GiB address space. AddressSanitizer (ASan) is relevant to all ILP32 ABIs, not just the 64ILP32 ABI.

I am not familiar with the Linux kernel, so I am wondering: has compat mode been upstreamed?

Yes, compat mode has been upstreamed for two years.

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/riscv/include/asm/compat.h

Back to zero-extension vs. sign-extension, I would prefer sign-extension because both the ISA and psABI are sign-extension preferred. Here are some concrete examples:

  • Arguments are sign-extended if their size is narrower than XLEN, so pointers
    passed as arguments would always require an extra zero-extension IF we don't
    change the calling convention.
  • The medlow memory model becomes useless, as address generation would require
    an extra zero-extension (e.g., lui, addi -> lui, addi, zext), and accessing
    global objects would also need an extra code sequence (e.g., lui, lw ->
    lui, addi, zext, lw).
  • No C.LWU, as @guoren83 mentioned.
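
The difference between the two pointer representations can be sketched as follows. This is an illustrative Python model only: `sext32` and `zext32` are hypothetical helpers standing in for how a 32-bit pointer would sit in a 64-bit register under each convention.

```python
MASK64 = (1 << 64) - 1


def sext32(value):
    """Register image of a 32-bit value kept sign-extended."""
    value &= 0xFFFFFFFF
    if value & 0x80000000:
        value -= 1 << 32
    return value & MASK64


def zext32(value):
    """Register image of a 32-bit value kept zero-extended."""
    return value & 0xFFFFFFFF


def needs_extra_zext(ptr32):
    # Word-sized results (lui, addiw, lw, ...) arrive sign-extended, so a
    # zero-extended convention needs one more instruction whenever the two
    # register images differ.
    return sext32(ptr32) != zext32(ptr32)


low_ptr = 0x10001000    # bit 31 clear: both conventions give the same image
high_ptr = 0x80001000   # bit 31 set: the images differ in the upper 32 bits

for ptr in (low_ptr, high_ptr):
    print(
        f"ptr=0x{ptr:08X} sext=0x{sext32(ptr):016X} "
        f"zext=0x{zext32(ptr):016X} extra zext needed: {needs_extra_zext(ptr)}"
    )
```

Since a compiler generally cannot know at compile time whether bit 31 of a pointer is set, a zero-extension convention would have to emit the extra step unconditionally, which is the cost the bullets describe.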

And one Linux-specific question for rv64ilp32 with a 2G restriction: Where is the address range for placing the vDSO? Does it also go in the lower 2G?

It's the same as user-space rv32ilp32 ABI of rv64lp64 Linux kernel (compat mode with only 2GiB address space).

Could you explain that explicitly? I believe not everyone here is familiar with the Linux kernel, including me.

vDSO must stay within TASK_SIZE. Yes, it also goes in the lower 2GiB.

Here is the patch:
https://lore.kernel.org/linux-riscv/20231112061514.2306187-6-guoren@kernel.org/

Liaoshihua and others added 2 commits December 4, 2024 22:42
riscv-cc.adoc:
 - Add ABIs and ISAs mapping description about RV64ILP32* ABIs
   on RV64* ISAs.
 - Correct C/{Cpp} type sizes and alignments descriptions.

riscv-elf.adoc:
 - Add EF_RISCV_RV64ILP32 in e_flags field.

Signed-off-by: Liao Shihua <shihua@iscas.ac.cn>
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Add abi-rv64ilp32(f)(d)(q) calling convention sections.

Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Signed-off-by: Liao Shihua <shihua@iscas.ac.cn>
Signed-off-by: Jia-Wei Chen <jiawei@iscas.ac.cn>
Collaborator

@kito-cheng kito-cheng left a comment

I would suggest that RV64ILP32 continue using addi rather than addiw to minimize the impact on code generation. Otherwise, the code sequence for lui+load/store or auipc+load/store would need to be dropped since the immediate part is just sign-extended, like addi.

The trade-off is that the medlow range would become 0xFFFFFFFF7FFFF800 ~ 0xFFFFFFFFFFFFFFFF and 0x0 ~ 0x000000007FFFF7FF, which is the same as the normal RV64/medlow. We might lose a few addresses for medlow, but we can keep using lui+load/store.
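
A rough way to see why lui+load/store pairs naturally with addi and not with addiw: a load or store adds its sign-extended 12-bit offset to the base register at full register width, exactly like addi. The Python sketch below compares the effective address of a modelled `lui`/`lw` pair with the pointer produced by `lui`/`addi` and `lui`/`addiw` for a few sample addresses; all names and the sample list are made up for illustration.

```python
MASK64 = (1 << 64) - 1


def sext32(value):
    value &= 0xFFFFFFFF
    if value & 0x80000000:
        value -= 1 << 32
    return value & MASK64


def hi_lo(addr32):
    """%hi/%lo style split: 20-bit hi field and signed 12-bit lo."""
    hi = ((addr32 + 0x800) >> 12) & 0xFFFFF
    lo = addr32 & 0xFFF
    if lo >= 0x800:
        lo -= 0x1000
    return hi, lo


def load_address(addr32):
    """Effective address of `lui t0, %hi(a)` + `lw x, %lo(a)(t0)` on RV64."""
    hi, lo = hi_lo(addr32)
    return (sext32(hi << 12) + lo) & MASK64   # 64-bit add, like addi


def pointer_value(addr32, use_addiw):
    """Pointer produced by lui followed by addi or addiw."""
    hi, lo = hi_lo(addr32)
    total = sext32(hi << 12) + lo
    return sext32(total) if use_addiw else total & MASK64


def sequences_agree(addr32, use_addiw):
    return load_address(addr32) == pointer_value(addr32, use_addiw)


SAMPLES = [0x00001000, 0x7FFFF7FF, 0x7FFFF800, 0x7FFFFFFF, 0x80000000, 0xFFFFF000]

addi_ok = all(sequences_agree(a, use_addiw=False) for a in SAMPLES)
addiw_mismatches = [hex(a) for a in SAMPLES if not sequences_agree(a, use_addiw=True)]

print("lui+lw matches lui+addi for all samples:", addi_ok)
print("samples where lui+lw and lui+addiw disagree:", addiw_mismatches)
```

In this model the only disagreements with addiw fall in 0x7FFFF800 ~ 0x7FFFFFFF, the few addresses that medlow gives up when it keeps addi.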

For medany, we should continue using auipc + addi as well, but add a NOTE mentioning that the address space is NOT continuous in the middle. This property will remain even if we use auipc+addiw. The same issue as medlow applies here—we can't use auipc+load/store if we switch to addiw.

I believe this change would still work in most situations, such as under the lower 2G constraint in Linux user-space implementations. This approach is similar to the previous version but makes the 0xFFFFFFFF7FFFF800 ~ 0xFFFFFFFFFFFFFFFF address space usable.

Once again, I understand your concerns about performance and code size. However, I also want to avoid over-constraining RV64ILP32. I believe this approach addresses concerns from both your perspective and mine.

Lastly, my previous comment may have come across as a bit sharp, and I understand that might not have been your intent. When drafting the spec, it’s normal to leave some room for interpretation, but I’d encourage using consistent terminology and avoiding ambiguity during discussions. This can help prevent misunderstandings.

The address space of RV64ILP32* ABIs is not continuous in the
middle for medium any code model.

Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Signed-off-by: Liao Shihua <shihua@iscas.ac.cn>
Signed-off-by: Jia-Wei Chen <jiawei@iscas.ac.cn>
@kito-cheng
Collaborator

This is generally LGTM as an experimental proposal, but I would like to defer the final approval to the next psABI meeting (the time for the next psABI meeting has not been decided yet; I will update once it is confirmed).

@jrtc27
Collaborator

jrtc27 commented Dec 23, 2024

Although it’s experimental, it’s a big enough feature that it could be hard to change later down the line, so I’d like to comb through it myself after the holidays and wait for others to give more feedback.
