-
-
Notifications
You must be signed in to change notification settings - Fork 14.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
nixos/hyperv-guest: drop fb_hyperv in favor of drm_hyperv #372743
Conversation
Can confirm this allows my Hyper-V VMs to run Wayland just fine. |
Testing now, although rebuilding the kernel means getting the new image is taking a while. Given it's my work machine that I'm using Hyper-V on, it might not be until Monday that I can confirm the fix works for me. |
You can just blacklist |
@@ -525,6 +525,8 @@ let | |||
DRM_I915_GVT_KVMGT = module; | |||
# Enable Hyper-V Synthetic DRM Driver | |||
DRM_HYPERV = whenAtLeast "5.14" module; | |||
# And disable the legacy framebuffer driver | |||
FB_HYPERV = no; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If i understand correctly FB_HYPERV
gets replaced by DRM_HYPERV
and is disabled before.
As we still have 5.4
and 5.10
kernels in nixpkgs, does it make sense to just disable it, if FB_HYPERV
is available (so past 5.14
)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it does. I'll update this after we confirm it works.
ada2811
to
5454388
Compare
@@ -525,6 +525,8 @@ let | |||
DRM_I915_GVT_KVMGT = module; | |||
# Enable Hyper-V Synthetic DRM Driver | |||
DRM_HYPERV = whenAtLeast "5.14" module; | |||
# And disable the legacy framebuffer driver when we have the new one | |||
FB_HYPERV = whenOlder "5.14" yes; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does setting it to yes sometimes disable it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh yeah, I guess we have to explicitly force it to no on newer ones...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK just forcing it to no for newer ones and doing nothing for older ones does what we want.
It is still for some reason preferred over DRM_HYPERV despite being objectively worse.
5454388
to
94e1e0a
Compare
Kernel config change looks good. |
Looks like the force-push to https://github.com/NixOS/nixpkgs/compare/54543884d9965bbf2e66c87fc075cf01d953877c..94e1e0a173482a65bb1a6ca86e95b4265c9e2bc4 has picked up a bunch of changes that weren't meant to be there? I've been testing with the fix per ada2811, which I think should cover my scenario. It's definitely working better now: the greeter loads and I can log in and control the VM screen resolution through the Plasma settings. However, after logging in then rebooting, on subsequent reboots, clicking the ">" login button or pressing Enter at the login prompt shows the black KDE loading screen briefly, before swapping to tty1; the desktop never loads. Pressing Alt+F7 loads a systemd log that ends at Very happy to repro and collect logs, but I'm not sure what logs would be useful here! |
Those are changes from master, which I rebased on.
Plasma is not designed to run as root, please test as a non-root user. |
Ah, of course! Apologies!
Initial test had the same behaviour: first login as a regular user seemed fine, subsequent attempts to log in instead dropped me to tty1. For the sake of reproducibility, I've created https://github.com/me-and/nixpkgs-372703-testing and I'm retesting with the images generated there now. |
So I've done some more testing, and it looks like it's just pretty inconsistent with when it decides to start. I think I see some changes that could be relevant in newer kernels, so I'm currently trying again with 6.12. |
OK, I'm pretty sure it's some kind of weird race condition, and I can't reproduce it on 6.12 despite logging in and out like 20 times in a row. |
I've just built |
Just to clarify, how are you getting to the GUI? Are you logging out? Soft rebooting the OS? Hard rebooting the VM? |
I had been doing a mix of hard reboots, soft reboots and logouts; I think when I was testing with the latest kernel, both tests I did were with hard reboots. I've been testing with my work computer, because that's the one I have most easily set up to run Hyper-V, but the way its locked down is a bit tedious. Today I did the same tests but running the NixOS VM nested inside a less locked down Windows VM. Everything seems to be working just fine in that environment using the image with the updated Linux kernel, with both soft and hard reboots and just by logging out, so at least part of the problem is clearly in my environment. The most obvious environmental difference is that yesterday, Hyper-V Manager was generating VMs using Hyper-V configuration version 11.0, but the working case is using configuration version 12.0. I'm going to do some more differential testing today to try to narrow down where the difference comes from and get some more confidence it's an issue with my environment (or possibly me testing things incompetently) rather than with the fix. |
It's also definitely not the first time a Windows update changes VM behavior... |
Problem reliably reproduces with the updated kernel image on my system running Windows 11 Enterprise 23H2 build 22631.4751 with experience pack 1000.22700.1055.0. This machine is fairly stringently locked down; outside of specific approved interfaces (such as for setting up and running Hyper-V VMs) I don't have admin access to it. Problem does not reproduce with the updated kernel image on my system running Windows 11 Enterprise 24H2 build 26120.2705 with experience pack 1000.26100.42.0. This machine is itself a VM, with the machine in the previous paragraph as a host with nested virtualisation enabled. I am a system administrator on this machine. Notably, generating a version 11.0 VM on the first machine, exporting it and copying it to the second machine – so it's still a version 11.0 VM rather than the version 12.0 that the second machine would create itself – has things working. That implies whatever the relevant difference is, it's not the VM configuration version but something else about the host system. As you say, it's not the first time VM behaviour changes between Windows versions, but I am surprised by the difference in behaviour! I'm currently rebuilding test images for the broken and fixed-but-without-the-updated-kernel cases to run on the second machine, just to check if I'm able to reproduce the original problem there. |
For reference, the machine I'm testing on has all the latest updates installed. winver identifies as "Version 24H2 (OS Build 26100.2894)". |
Okay, I've confirmed that on the 24H2 system the broken image doesn't load the GUI, so testing the fix on this host is meaningful. Even without the kernel update, I've not been able to reproduce the problem only the first login getting to the Plasma desktop, so the fix seems good. Clearly whatever's going wrong here is at least in part down to the Windows version. Which is annoying for me, but it looks like this fix works just fine on up-to-date Windows Hyper-V hosts. I'm not sure if that means #368084 ought to be backed out of 24.11 given it's not a backwards-compatible change for at least some Hyper-V users. I don't have any sense of (a) the number of people affected by the problems that was due to fix versus the number of people affected by the problem I hit, or (b) the NixOS community consensus on balancing regressions for some people versus fixes for others for stable releases. |
Backport failed for Please cherry-pick the changes locally and resolve any conflicts. git fetch origin release-24.11
git worktree add -d .worktree/backport-372743-to-release-24.11 origin/release-24.11
cd .worktree/backport-372743-to-release-24.11
git switch --create backport-372743-to-release-24.11
git cherry-pick -x c00e9ebd9d2a931518b2346bf25b7648c49c8344 b20e6abfaf51eec59a154672344d0184e48840da 94e1e0a173482a65bb1a6ca86e95b4265c9e2bc4 |
1 similar comment
Backport failed for Please cherry-pick the changes locally and resolve any conflicts. git fetch origin release-24.11
git worktree add -d .worktree/backport-372743-to-release-24.11 origin/release-24.11
cd .worktree/backport-372743-to-release-24.11
git switch --create backport-372743-to-release-24.11
git cherry-pick -x c00e9ebd9d2a931518b2346bf25b7648c49c8344 b20e6abfaf51eec59a154672344d0184e48840da 94e1e0a173482a65bb1a6ca86e95b4265c9e2bc4 |
This involves: - Removing my existing Hyper-V module, since it's now irrelevant per the discussion at NixOS/nixpkgs#372743 - Removing xserver and video mode configuration, ibid. - Updating system.stateVersion and hardware configuration thanks to achieving this by just creating a whole new VM. - Shifting the root filesystem to BTRFS for the sake of being able to perform online partition resizing.
Fixes #372703.
Things done
nix.conf
? (See Nix manual)sandbox = relaxed
sandbox = true
nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD"
. Note: all changes have to be committed, also see nixpkgs-review usage./result/bin/
)Add a 👍 reaction to pull requests you find important.