Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Disallow VTYPE=VILL #454

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

palmer-dabbelt
Copy link
Contributor

Implementations are all over the place on whole register moves trapping under VILL, so let's just forbid that case in the psABI.

Signed-off-by: Palmer Dabbelt palmer@rivosinc.com


We've started seeing a bunch of fallout from the "whole register moves depend on TYPE" ISA change, and there's discussion all over the place:

It seems like there's no general consensus on what we do here -- some discussions say we're going to change the psABI (and presumably then the uABI), some say we're not. I don't personally care a ton if we make the ABI change or not, we just need to decide so we can figure out where the bugs are -- there's going to be fallout either way, but we can't really get things fixed until we decide one way or the other. As far as I can tell both paths are valid:

  • If we make these ABI changes then most code that predates the ISA change continues to function correctly after the ISA change. We just need to track down anything that sets VILL and fix it, but we should be able to do that incrementally (maybe even just with a trap handler). Right now I think that's just the kernel, but I'm not 100% sure there. Looks like the first round of HW doesn't trap, though, so we should be safe for a bit.
  • If we don't make these ABI changes then we'll have to fix the compilers and go rebuild everything to match the ISA change. I think the GCC change should be pretty straight-forward, I don't know about the LLVM side of things. I'm not sure what we'd do with the kernel here: we could say the VILL traps are just latent userspace bugs, or we could say we're breaking userspace -- kind of a grey area, so probably more of an LKML question.

I don't think one option is clearly simpler than the other, it's just a question of where we push the bugs.

@jrtc27
Copy link
Collaborator

jrtc27 commented Nov 21, 2024

Philip had the following three options for changing the ABI:

  1. Require VTYPE to be non-vill on ABI boundaries.
  2. Require VTYPE to be equally vill on ABI boundaries; that is calls would have to preserve the single-bit state of whether vill was active.
  3. Require VTYPE to be no more vill on return than on entry to the function. That is, a non-vill VTYPE on entry must be non-vill on exit, but a vill VTYPE on entry can become non-vill on exit. This would allow callees to unconditionally set VTYPE to any non-vill value.

This picks option 1; is there a reason behind why that's your preferred one? I can see arguments for each of them.

Are we also sure that the Linux kernel is the only thing in the ecosystem setting VTYPE to be VILL on function calls? If so then that's an "easy" ABI break to deal with, but if there are other things out there then we need to know as that could change the story. I'm assuming at least that GCC, like LLVM, assumes VTYPE isn't VILL in various places and won't actively set it to that itself?

@jrtc27
Copy link
Collaborator

jrtc27 commented Nov 21, 2024

As for my own personal opinion, I imagine that, short of changing the specification to be more helpful for software, changing the psABI in one of these ways is the least bad option.

@palmer-dabbelt
Copy link
Contributor Author

Philip had the following three options for changing the ABI:

  1. Require VTYPE to be non-vill on ABI boundaries.
  2. Require VTYPE to be equally vill on ABI boundaries; that is calls would have to preserve the single-bit state of whether vill was active.
  3. Require VTYPE to be no more vill on return than on entry to the function. That is, a non-vill VTYPE on entry must be non-vill on exit, but a vill VTYPE on entry can become non-vill on exit. This would allow callees to unconditionally set VTYPE to any non-vill value.

This picks option 1; is there a reason behind why that's your preferred one? I can see arguments for each of them.

It was just the easiest to describe, and I didn't really see any downsides on the implementation side of things.

Are we also sure that the Linux kernel is the only thing in the ecosystem setting VTYPE to be VILL on function calls? If so then that's an "easy" ABI break to deal with, but if there are other things out there then we need to know as that could change the story. I'm assuming at least that GCC, like LLVM, assumes VTYPE isn't VILL in various places and won't actively set it to that itself?

I'm not sure it's only Linux setting this. Library code could of course be doing anything, but setting VILL doesn't seem useful so I'm hoping that doesn't happen too much. That said, there's some language in the V spec along the lines of

The `vill` bit is used to encode that a previous `vset{i}vl{i}`
instruction attempted to write an unsupported value to `vtype`.

which makes me worried there's unintended sets of VILL in normal-smelling code, but I'm not sure what's allowed to be unsupported. One could imagine code that tries to set some vector configuration and falls back to scalar code if the vector configuration doesn't succeed, thus leaving VILL around.

As far as I can tell we're not intentionally generating code that would result in VILL from GCC, but I'm not entirely sure of that. If we did generate that code I think we'd likely just have a functional bug, though, as most instructions aren't useful under VILL.

@jrtc27
Copy link
Collaborator

jrtc27 commented Nov 21, 2024

Philip had the following three options for changing the ABI:

  1. Require VTYPE to be non-vill on ABI boundaries.
  2. Require VTYPE to be equally vill on ABI boundaries; that is calls would have to preserve the single-bit state of whether vill was active.
  3. Require VTYPE to be no more vill on return than on entry to the function. That is, a non-vill VTYPE on entry must be non-vill on exit, but a vill VTYPE on entry can become non-vill on exit. This would allow callees to unconditionally set VTYPE to any non-vill value.

This picks option 1; is there a reason behind why that's your preferred one? I can see arguments for each of them.

It was just the easiest to describe, and I didn't really see any downsides on the implementation side of things.

The main downside I can see is that the kernel would have to set VTYPE to be non-VILL for signal handlers (unless you forbid VTYPE ever being VILL whilst at least one signal is unblocked, which doesn't sound like a good idea to me). Options 2 and 3 avoid that, the function entry point itself would do it, at the expense of any function wanting to mess with vectors having to set VTYPE even if it only needs whole-register operations.

Are we also sure that the Linux kernel is the only thing in the ecosystem setting VTYPE to be VILL on function calls? If so then that's an "easy" ABI break to deal with, but if there are other things out there then we need to know as that could change the story. I'm assuming at least that GCC, like LLVM, assumes VTYPE isn't VILL in various places and won't actively set it to that itself?

I'm not sure it's only Linux setting this. Library code could of course be doing anything, but setting VILL doesn't seem useful so I'm hoping that doesn't happen too much. That said, there's some language in the V spec along the lines of

The `vill` bit is used to encode that a previous `vset{i}vl{i}`
instruction attempted to write an unsupported value to `vtype`.

which makes me worried there's unintended sets of VILL in normal-smelling code, but I'm not sure what's allowed to be unsupported. One could imagine code that tries to set some vector configuration and falls back to scalar code if the vector configuration doesn't succeed, thus leaving VILL around.

As far as I can tell we're not intentionally generating code that would result in VILL from GCC, but I'm not entirely sure of that. If we did generate that code I think we'd likely just have a functional bug, though, as most instructions aren't useful under VILL.

Yeah that line does sound a little concerning. I don't know enough about the vector world to have an opinion on how likely it is that's a problem, so I'll defer to your and others' experience and/or future investigations. At the end of the day, we need to form consensus between toolchain, OS and, perhaps, accelerated library developers about whether this is a reasonable ABI restriction and break to introduce.

@preames
Copy link

preames commented Nov 21, 2024

@palmer-dabbelt Can I ask you to spell out a couple details you mentioned offline? Specifically, how do you see an ABI change interacting with the kernel's lazy save/restore of vector state? And what is your opinion w.r.t. the existence of a kernel version which doesn't follow the new ABI variant?

@preames
Copy link

preames commented Nov 21, 2024

On the question of code branching on VILL, I think that is moderately unlikely.

I can't think of a case where the compiler would emit such code with semantics being anything other than full UB. If the user compiled with e.g. zvl1024b and tries to run on a zvl128b machine, you'd get VILL, but also trap on the first actual vector instruction (ignore whole register move for the moment), and there would be no recovery mechanism.

In terms of dynamic dispatching, the recommended answer would be to use hwprobe. It's possible that someone used VILL to do feature sniffing, but given existing hardware doesn't trap on a whole register move, it's unlikely anyone has production code which relies on that behavior. I could see them relying on trapping of e.g. a vadd.vv for feature sniffing, but the whole register move case seems unlikely.

In order for the vadd.vv case (with vtype left vill) to be problematic, we'd need to have a dynamic path which did that type of feature sniffing, and then unconditionally ran a vector move anyways. The only plausible case I can think of here is code that was sniffing VLEN, and given the availability of VLENB as a register, there's far more natural ways to write that code.

riscv-cc.adoc Outdated
@@ -120,7 +120,8 @@ The `vxrm` and `vxsat` fields of `vcsr` are not preserved across calls and their
values are unspecified upon entry.

Procedures may assume that `vstart` is zero upon entry. Procedures may assume
that `vstart` is zero upon return from a procedure call.
that `vstart` is zero upon return from a procedure call. Procedures may assume
that `vtype` is never `VILL`
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're going to retroactively change a specification (which I think we should here), we should clearly call that out in the specification text itself. (That is, acknowledge that a change was made retroactively in the text itself.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a NOTE

@aswaterman
Copy link
Contributor

@palmer-dabbelt I agree with the sentiment of this proposal. How are we handling this property at syscall boundaries? I discussed this with a few folks at SiFive, and we are coming to the conclusion that, after the kernel zeroes out the state, it should set vtype to 0 (i.e. vill=0), achieving this property. It doesn't make sense to us to make userspace responsible for doing that after a syscall.

@palmer-dabbelt
Copy link
Contributor Author

@palmer-dabbelt Can I ask you to spell out a couple details you mentioned offline? Specifically, how do you see an ABI change interacting with the kernel's lazy save/restore of vector state?

Ya, sorry, I forgot about that -- and I think it also answers JRTC's comment about the signal handler stuff.

We use sstatus.vs to control the vector lazy save/restore in the kernel, not vtype. That same mechanism also allows for lazy initialization, which means there's really no cost to picking an arbitrary value that the vector state must be initialized to -- we're already taking a trap on that first use and doing a bunch of heavyweight stuff like allocating space for a vector context, so we can just initialize that context with anything.

The only thing that really needs to change here is how we initialize a vector context. That happens in two cases: when we trap on the first use of a vector, and when we destroy an active vector context on syscalls. Right now the uABI allows for vill in both of those cases, if we forbid vill then we just need to set vtype to some other value.

So I think just this would be enough to do it, assuming 0 is a legal vtype (I couldn't quite tell from the spec):

diff --git a/arch/riscv/include/asm/vector.h b/arch/riscv/include/asm/vector.h
index be7d309cca8a..2f323d5e9d07 100644
--- a/arch/riscv/include/asm/vector.h
+++ b/arch/riscv/include/asm/vector.h
@@ -58,6 +58,7 @@ static inline void riscv_v_vstate_off(struct pt_regs *regs)
 static inline void riscv_v_vstate_on(struct pt_regs *regs)
 {
 	regs->status = (regs->status & ~SR_VS) | SR_VS_INITIAL;
+	regs->vtype = 0;
 }
 
 static inline bool riscv_v_vstate_query(struct pt_regs *regs)
@@ -148,7 +149,7 @@ static inline void __riscv_v_vstate_restore(struct __riscv_v_ext_state *restore_
 
 static inline void __riscv_v_vstate_discard(void)
 {
-	unsigned long vl, vtype_inval = 1UL << (BITS_PER_LONG - 1);
+	unsigned long vl;
 
 	riscv_v_enable();
 	asm volatile (
@@ -159,9 +160,8 @@ static inline void __riscv_v_vstate_discard(void)
 		"vmv.v.i	v8, -1\n\t"
 		"vmv.v.i	v16, -1\n\t"
 		"vmv.v.i	v24, -1\n\t"
-		"vsetvl		%0, x0, %1\n\t"
 		".option pop\n\t"
-		: "=&r" (vl) : "r" (vtype_inval) : "memory");
+		: "=&r" (vl) :: "memory");
 	riscv_v_disable();
 }
 
@@ -172,6 +172,7 @@ static inline void riscv_v_vstate_discard(struct pt_regs *regs)
 
 	__riscv_v_vstate_discard();
 	__riscv_v_vstate_dirty(regs);
+	regs->vtype = 0;
 }
 
 static inline void riscv_v_vstate_save(struct __riscv_v_ext_state *vstate,

And what is your opinion w.r.t. the existence of a kernel version which doesn't follow the new ABI variant?

This would be a psABI change, strictly speaking the kernel syscall ABI is a different thing (part of the kernel uABI). That said, the rationale for the kernel syscall ABI discarding the entire vector state (including vtype) was that the psABI resulted in that state always being useless. I don't think it'd be too controversial to change the uABI to match a changed psABI here, as what we're doing would be backwards compatible with the old uABI and would make existing software work on new implementations -- the usual kernel rules aren't "don't change the uABI" but instead "don't break userspace", this is one of those rare cases where there's a difference.

We'd have to talk about it on LKML, though, to be sure.

Implementations are all over the place on whole register moves trapping
under VILL, so let's just forbid that case in the psABI.

Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>

---

We've started seeing a bunch of fallout from the "whole register moves
depend on TYPE" ISA change, and there's discussion all over the place:

* There's a GCC bug
  <https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117544>
* Also an LLVM bug <llvm/llvm-project#114518>
* QEMU changed behavior in 4eff52cd46 ("target/riscv: Add vill check for
  whole vector register move instructions")
* Philip has a writeup on some of the options in his notes
  <https://github.com/preames/public-notes/blob/master/riscv/whole-register-move-abi.rst>.
* This has also come up in most of the meetings I'v been in this week.

It seems like there's no general consensus on what we do here -- some
discussions say we're going to change the psABI (and presumably then the
uABI), some say we're not.  I don't personally care a ton if we make the
ABI change or not, we just need to decide so we can figure out where the
bugs are -- there's going to be fallout either way, but we can't really
get things fixed until we decide one way or the other.  As far as I can
tell both paths are valid:

* If we make these ABI changes then most code that predates the ISA
  change continues to function correctly after the ISA change.  We just
  need to track down anything that sets VILL and fix it, but we should
  be able to do that incrementally (maybe even just with a trap
  handler).  Right now I think that's just the kernel, but I'm not 100%
  sure there.  Looks like the first round of HW doesn't trap, though, so
  we should be safe for a bit.
* If we don't make these ABI changes then we'll have to fix the
  compilers and go rebuild everything to match the ISA change.  I think
  the GCC change should be pretty straight-forward, I don't know about
  the LLVM side of things.  I'm not sure what we'd do with the kernel
  here: we could say the VILL traps are just latent userspace bugs, or
  we could say we're breaking userspace -- kind of a grey area, so
  probably more of an LKML question.

I don't think one option is clearly simpler than the other, it's just a
question of where we push the bugs.
@aswaterman
Copy link
Contributor

@palmer-dabbelt Thanks, I think your most recent post answers my question about the kernel.

@jrtc27
Copy link
Collaborator

jrtc27 commented Nov 21, 2024

@palmer-dabbelt Can I ask you to spell out a couple details you mentioned offline? Specifically, how do you see an ABI change interacting with the kernel's lazy save/restore of vector state?

Ya, sorry, I forgot about that -- and I think it also answers JRTC's comment about the signal handler stuff.

I don't know the details of how the functions you mention fit into Linux's signal delivery flow. My concern is something like the following (excuse the pseudo-asm):

void myfunc(void) {
    ...
    __asm__ (
        "vgetvtype %[tmp]\n\t"
        "vsetvtype vill\n\t"
        ... /* <-- here */
        "vsetvtype %[tmp]"
        ...
    );
    ...
}

void mysighandler(int code) {
    /* Do a whole-register load/store (maybe looking at the signal content?) */
}

In isolation, myfunc has adhered to the ABI. However, if a signal is delivered somewhere in the region marked "here", mysighandler will, without kernel intervention, run with VTYPE=VILL. Does your proposed patch correctly deal with that case? Now this is obviously a stupid bit of code to write, but my point is that if anything ever sets VTYPE to VILL for any reason, explicitly or implicitly, even for one instruction, then the kernel must handle it. We can say that's invalid, but then that's making a stronger statement in the psABI that VTYPE can never be VILL even within a function, which generally isn't something psABIs like to do (TP, GP and sort of SP, thanks to sigaltstack, being notable exceptions).

@palmer-dabbelt
Copy link
Contributor Author

@palmer-dabbelt Can I ask you to spell out a couple details you mentioned offline? Specifically, how do you see an ABI change interacting with the kernel's lazy save/restore of vector state?

Ya, sorry, I forgot about that -- and I think it also answers JRTC's comment about the signal handler stuff.

I don't know the details of how the functions you mention fit into Linux's signal delivery flow. My concern is something like the following (excuse the pseudo-asm):

void myfunc(void) {
    ...
    __asm__ (
        "vgetvtype %[tmp]\n\t"
        "vsetvtype vill\n\t"
        ... /* <-- here */
        "vsetvtype %[tmp]"
        ...
    );
    ...
}

void mysighandler(int code) {
    /* Do a whole-register load/store (maybe looking at the signal content?) */
}

In isolation, myfunc has adhered to the ABI. However, if a signal is delivered somewhere in the region marked "here", mysighandler will, without kernel intervention, run with VTYPE=VILL. Does your proposed patch correctly deal with that case? Now this is obviously a stupid bit of code to write, but my point is that if anything ever sets VTYPE to VILL for any reason, explicitly or implicitly, even for one instruction, then the kernel must handle it. We can say that's invalid, but then that's making a stronger statement in the psABI that VTYPE can never be VILL even within a function, which generally isn't something psABIs like to do (TP, GP and sort of SP, thanks to sigaltstack, being notable exceptions).

I'm not entirely sure, someone else (maybe @bjoto?) tracked down that last vector signal context tracking bug. I thought we initialized a new vector context for signals, but I couldn't find that in the code (though it's just a pile of hooks, so it's kind of hard to follow).

So we might want something like

diff --git a/arch/riscv/kernel/signal.c b/arch/riscv/kernel/signal.c
index dcd282419456..389fc4d78e82 100644
--- a/arch/riscv/kernel/signal.c
+++ b/arch/riscv/kernel/signal.c
@@ -283,9 +283,15 @@ static long setup_sigcontext(struct rt_sigframe __user *frame,
 	/* Save the floating-point state. */
 	if (has_fpu())
 		err |= save_fp_state(regs, &sc->sc_fpregs);
-	/* Save the vector state. */
+	/*
+	 * Save the vector state into the signal context, and then destroy the
+	 * context in registers to avoid 
+	 * */
 	if (has_vector() && riscv_v_vstate_query(regs))
+	{
 		err |= save_v_state(regs, (void __user **)&sc_ext_ptr);
+		riscv_v_vstate_discard(regs);
+	}
 	/* Write zero to fp-reserved space and check it on restore_sigcontext */
 	err |= __put_user(0, &sc->sc_extdesc.reserved);
 	/* And put END __riscv_ctx_hdr at the end. */

Though if that's the case we'd likely want it as an optimization even without the uABI change, as that state in registers isn't useful by the time userspace gets to it.

pbo-linaro pushed a commit to pbo-linaro/qemu-ci that referenced this pull request Nov 22, 2024
This is still under discussion in the psABI, but it's looking like we're
going to forbid VILL in userspace in order to maintain compatibility
with binaries that don't expect implementations to trap whole register
moves under VILL (as in QEMU before 4eff52c ("target/riscv: Add vill
check for whole vector register move instructions"), for example).

Fixes: f8c1f36 ("target/riscv: Set vtype.vill on CPU reset")
Link: riscv-non-isa/riscv-elf-psabi-doc#454
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
pbo-linaro pushed a commit to pbo-linaro/qemu-ci that referenced this pull request Nov 22, 2024
https://lore.kernel.org/qemu-devel/20241122003247.8955-1-palmer@rivosinc.com

---

From: Palmer Dabbelt <palmer@rivosinc.com>
To: qemu-devel@nongnu.org,          qemu-riscv@nongnu.org
Received-SPF: pass client-ip=2607:f8b0:4864:20::431;
 envelope-from=palmer@rivosinc.com; helo=mail-pf1-x431.google.com
X-Spam_score_int: -18
X-Spam_score: -1.9
X-Spam_bar: -
X-Spam_report: (-1.9 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1,
 DKIM_VALID=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001,
 SPF_PASS=-0.001 autolearn=unavailable autolearn_force=no
X-Spam_action: no action
X-BeenThere: qemu-devel@nongnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
 <mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <https://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
 <mailto:qemu-devel-request@nongnu.org?subject=subscribe>
Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org
Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org

This is still under discussion in the psABI, but it's looking like we're
going to forbid VILL in userspace in order to maintain compatibility
with binaries that don't expect implementations to trap whole register
moves under VILL (as in QEMU before 4eff52c ("target/riscv: Add vill
check for whole vector register move instructions"), for example).

Fixes: f8c1f36 ("target/riscv: Set vtype.vill on CPU reset")
Link: riscv-non-isa/riscv-elf-psabi-doc#454
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
---
 target/riscv/cpu.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index f219f0c3b5..d19a44de99 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -1022,7 +1022,11 @@ static void riscv_cpu_reset_hold(Object *obj, ResetType type)
     cs->exception_index = RISCV_EXCP_NONE;
     env->load_res = -1;
     set_default_nan_mode(1, &env->fp_status);
+#ifdef CONFIG_USER_ONLY
+    env->vill = false;
+#else
     env->vill = true;
+#endif

 #ifndef CONFIG_USER_ONLY
     if (cpu->cfg.debug) {
--
2.45.2

Signed-off-by: GitHub Actions Bot <bot@github.com>
@kito-cheng
Copy link
Collaborator

After discussed with different people in the past weeks , I incline resolve that on the compiler side becasue:

  1. Reg move without VTYPE setting is rare, so the overhead is log
  2. vill generally will be 0, but it hard to guaranteed it to be 0 at arbitrary times, same concern as @jrtc27, not for linux signal, but also for interpreter handler in non-linux env like bare-metal.
  3. psABI incompatible change is not preferred unless there is no other solution.

Also GCC community already fix that on compiler side, LLVM community also has a proposed patch there, although LLVM community still having some concern on the performance impact, so I would keep this PR open until both compiler has addressed this issue.

@kito-cheng
Copy link
Collaborator

Both LLVM and GCC has landed the fix for whole register move:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants