After running enable, then disable, then disable --hard, my system config hasn't reverted to how it was before #115

Closed
46cv8 opened this issue Nov 21, 2023 · 7 comments

@46cv8

46cv8 commented Nov 21, 2023

So the issue is: before installing egpu-switcher, I could boot my notebook PC with the egpu connected and use it for hardware-accelerated (compute-only) tasks.
Now, if I boot my PC with the egpu connected after having run disable and disable --hard, my PC never loads the desktop; it just sits at the terminal.
If I reboot with the egpu disconnected, the desktop launches fine.
Alternatively, if I connect the egpu and run egpu-switcher enable, it loads the desktop through the HDMI port on the egpu (which is great).
So for me the issue is that there doesn't appear to be any way to set things back to how they were originally, so that I can use the egpu for compute-only tasks while it is connected at startup.
Any help would be much appreciated. I apologize if this is covered somewhere else, such as an FAQ page, but I wasn't able to find it with a quick search.

System:

  • Did you install egpu-switcher via ppa or via git + make (copied binary as described in readme)
  • What Linux distribution (+ version) are you using (Ubuntu 20.04 fully updated)
  • What brand / model is your laptop (Dell XPS 15 9560, which has an internal GTX 1050 + Intel integrated graphics)
  • What brand / model is your GPU (+ enclosure) (Akitio Node with RTX 3060)
  • What drivers (+ version) are you using (NVIDIA-SMI 535.129.03, Driver Version: 535.129.03, CUDA Version: 12.2)
  • What Desktop-Environment do you use (+ Display-Manager) (GNOME, Ubuntu Desktop)
  • If you are not using a Desktop-Environment, what Window-Manager do you use?
@hertg
Owner

hertg commented Nov 21, 2023

Sorry for the inconvenience egpu-switcher might have caused for you. This doesn't sound like a report that I've seen before.
Could you check the following things and report back:

  1. Run egpu-switcher version --full
  2. Check whether the disable command missed cleaning up any files
    sudo systemctl status egpu # this unit should not exist
    sudo ls -al /etc/X11/xorg.conf.d/99-egpu-switcher.conf # this file should not exist
    sudo ls -al /usr/share/egpu-switcher/egpu.service # this file should not exist
    sudo ls -al /etc/egpu-switcher/config.yaml # this file should not exist

Have you made any other system configuration changes since you first enabled egpu-switcher (e.g. using the nvidia-xconfig tool)? If so, your problem might not be related to egpu-switcher but may be caused by another misconfiguration that happened in the meantime.


Edit: I initially got the systemctl command wrong; the unit is called egpu, not egpu-switcher.

@46cv8
Author

46cv8 commented Nov 21, 2023

Hey @hertg, no worries at all; your tool is really great and I will definitely use it. I'm just a bit curious about why things have ended up in an altered state. I believe I fully installed all the updated NVIDIA drivers and tested the aforementioned behavior before I installed and tested egpu-switcher. So I believe it is most probably related to something egpu-switcher changed and perhaps didn't switch back. But since I don't know how egpu-switcher works yet, there's a chance this is impossible and something else is going on. If that is the case, I apologize for the false alarm.

Here is the information you requested!

username@username-xps-15-9560:~$ egpu-switcher version --full
0.19.0_20230304.133052_gh
username@username-xps-15-9560:~$ sudo systemctl status egpu-switcher
[sudo] password for username: 
Unit egpu-switcher.service could not be found.
username@username-xps-15-9560:~$ sudo ls -al /etc/X11/xorg.conf.d/99-egpu-switcher.conf
ls: cannot access '/etc/X11/xorg.conf.d/99-egpu-switcher.conf': No such file or directory
username@username-xps-15-9560:~$ sudo ls -al /usr/share/egpu-switcher/egpu.service
ls: cannot access '/usr/share/egpu-switcher/egpu.service': No such file or directory
username@username-xps-15-9560:~$ sudo ls -al /etc/egpu-switcher/config.yaml
ls: cannot access '/etc/egpu-switcher/config.yaml': No such file or directory

@46cv8
Author

46cv8 commented Nov 22, 2023

When my computer starts with the egpu plugged in and powered, it gets stuck showing a console with all the services still starting (the ones that would normally start in the background after the desktop loads). I can press alt-shift-4 to open a terminal and run commands. Is there anything you'd recommend I check to see why it didn't attempt, or was unable, to load the desktop on my notebook GPU?

Sorry, I see now that I ran the systemctl command before you fixed it. I will run all these commands (including the corrected one) after rebooting my PC with egpu-switcher removed but the egpu connected at boot (the state where the desktop no longer launches). I'll report back once I have the results :) (this may take a day, as I can't reboot right now).
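One thing worth checking from that console is the Xorg log, since a failed desktop start usually leaves error lines there. Below is a minimal sketch, assuming bash and an Xorg (not Wayland) session, that prints those error lines. The helper name `xorg_errors` is hypothetical and not part of egpu-switcher, and the default log path is an assumption: root-run Xorg writes /var/log/Xorg.0.log, while rootless Xorg started by GDM usually logs to ~/.local/share/xorg/Xorg.0.log instead.

```shell
#!/usr/bin/env bash
# Sketch: print the error lines from an Xorg log to see why the desktop
# failed to start. Assumes the session uses Xorg rather than Wayland.
# xorg_errors is a hypothetical helper, not part of egpu-switcher.
xorg_errors() {
  # ASSUMPTION: default path for root-run Xorg; rootless Xorg under GDM
  # usually logs to ~/.local/share/xorg/Xorg.0.log instead.
  local log="${1:-/var/log/Xorg.0.log}"
  if [ ! -r "$log" ]; then
    echo "cannot read $log" >&2
    return 2
  fi
  # Xorg tags errors with "(EE)" and warnings with "(WW)".
  # grep exits with status 1 if no error lines are found.
  grep -F '(EE)' "$log"
}
```

For example, `xorg_errors ~/.local/share/xorg/Xorg.0.log` after a failed boot; the display manager's own log for the current boot can be read with `journalctl -b -u gdm`, assuming the unit is named gdm as it typically is on Ubuntu.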

@46cv8
Author

46cv8 commented Nov 23, 2023

OK, so I ran the commands and got the results.

0.19.0_20230304.133052_gh
===================
Unit egpu.service could not be found.
===================
ls: cannot access '/etc/X11/xorg.conf.d/99-egpu-switcher.conf': No such file or directory
===================
ls: cannot access '/usr/share/egpu-switcher/egpu.service': No such file or directory
===================
ls: cannot access '/etc/egpu-switcher/config.yaml': No such file or directory

But what's stranger is this: if I configure and enable egpu-switcher but don't plug the HDMI cable into the external GPU, it now behaves as it did originally. That is, if I have the egpu connected and turned on at boot but the HDMI cable plugged into my notebook, it launches a working desktop using the built-in HDMI port and video card.
I swear this is how it worked before I installed egpu-switcher at all. But now, if I run disable or disable --hard, it leaves me at a boot-style console without launching the desktop. Based on the output above, though, it seems to have fully removed everything it installed, so I have no idea why the default behavior would have changed, unless, as you suggested, I inadvertently changed something else that affected it. I can't really rule that out. As long as I keep egpu-switcher enabled, I can at least boot with the egpu connected but not as my primary device. I'm not sure why I ran into issues with this before; I suspect it may have been because on one of my numerous tests I ran "config" but not "enable", which made me think I had enabled it when in fact I had only configured it. Anyway, I'm not sure what else to suggest at this point, so feel free to close this issue.

@hertg
Owner

hertg commented Nov 25, 2023

> if I have the egpu connected and turned on at boot but the HDMI plugged into my notebook it will launch a working desktop manager using the built in HDMI port and video card.

This could be the case if you are not using the nvidia drivers. Do you see any useful information in the logs?

sudo journalctl -xe -u egpu

Unfortunately, I'm also out of ideas about what's happening here. As implied in my earlier response, egpu-switcher seems to have cleaned up everything it touched. Here is a short explanation of what egpu-switcher actually does, since it doesn't touch much:

  • If you run config
    • Writes to /etc/egpu-switcher/config.yaml
  • If you run enable
    • May write /etc/egpu-switcher/config.yaml (it internally triggers the config command if no config already exists)
    • Creates systemd unit /usr/share/egpu-switcher/egpu.service
    • Creates symlink /etc/systemd/system/egpu.service -> /usr/share/egpu-switcher/egpu.service to enable the systemd unit
  • If you run switch egpu or boot your computer with the egpu connected (unless you are using non-NVIDIA drivers and no monitors are connected)
    • Writes to /etc/X11/xorg.conf.d/99-egpu-switcher.conf
  • If you run switch internal or boot your computer without the egpu connected
    • Deletes /etc/X11/xorg.conf.d/99-egpu-switcher.conf
  • If you run disable
    • Deletes /etc/X11/xorg.conf.d/99-egpu-switcher.conf
    • Deletes symlink /etc/systemd/system/egpu.service
    • Deletes systemd unit /usr/share/egpu-switcher/egpu.service
    • Deletes config directory /etc/egpu-switcher/ (only when using --hard)
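Since the list above is short, the cleanup can be verified in one go. Below is a minimal sketch, assuming bash, that reports which of those paths still exist. The helper name `egpu_leftovers` and its optional root-prefix argument are hypothetical (not part of egpu-switcher); the prefix only exists so the check can be pointed at a mounted system or a test directory.

```shell
#!/usr/bin/env bash
# Sketch: report which files egpu-switcher touches still exist.
# egpu_leftovers is a hypothetical helper; the optional argument is a
# path prefix (default empty, i.e. the real filesystem).
egpu_leftovers() {
  local root="${1:-}"
  local rc=0
  local path
  for path in \
    /etc/egpu-switcher/config.yaml \
    /etc/systemd/system/egpu.service \
    /usr/share/egpu-switcher/egpu.service \
    /etc/X11/xorg.conf.d/99-egpu-switcher.conf; do
    # -L also catches a dangling symlink, which -e alone would miss.
    if [ -e "${root}${path}" ] || [ -L "${root}${path}" ]; then
      echo "present: ${path}"
      rc=1
    fi
  done
  # 0 = nothing left behind, 1 = at least one path still present.
  return "$rc"
}
```

After `egpu-switcher disable --hard`, running `egpu_leftovers` should print nothing and return 0; after a plain `disable`, /etc/egpu-switcher/config.yaml is expected to remain, per the list above.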

@hertg hertg moved this to Feedback in egpu-switcher Nov 25, 2023
@46cv8
Author

46cv8 commented Nov 26, 2023

Thanks so much for explaining how it all works. It does indeed seem like I was just misremembering the state of things before installing egpu-switcher. I'll close this out, but if I get any additional info from sudo journalctl -xe -u egpu the next time I can reboot, I will post it below.

@46cv8 46cv8 closed this as completed Nov 26, 2023
@github-project-automation github-project-automation bot moved this from Feedback to Done in egpu-switcher Nov 26, 2023
@46cv8
Author

46cv8 commented Feb 29, 2024

Just adding a note here: I think I tracked down what the issue may have been, though I'm not certain. The same issue happened again, and the problem was that the NVIDIA driver wasn't loading. The reason was unrelated to egpu-switcher: conflicting CUDA and NVIDIA driver Debian packages were trying to modify the same driver file. The issue and workaround posted here worked for me: https://askubuntu.com/questions/1488018/nvidia-cuda-installation-package-conflicts
I don't think this is exactly what happened last time, but I think it was probably related to the kernel driver not being updated to match the user-installed driver version before the reboot where it stopped working. If a user complains about booting to a blank screen with a message about an invalid Xorg config and no matching resolution, it may be that the NVIDIA drivers failed to load entirely, which is what I most recently ran into.
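To test the "driver failed to load / version mismatch" theory, a minimal sketch, assuming bash: /proc/driver/nvidia/version only exists while the NVIDIA kernel module is loaded, and its first line names the module version. The helpers `loaded_nvidia_version` and `check_nvidia_driver` are hypothetical, and the expected version is a parameter you supply (for example from `modinfo -F version nvidia`, which reads the module on disk). This only detects the symptom; it does not detect the package conflict described in the linked question.

```shell
#!/usr/bin/env bash
# Sketch: check whether the NVIDIA kernel module is loaded and whether
# the loaded version matches an expected one. Helper names are
# hypothetical; file paths are parameters so they can be overridden.

# Print the version from an NVRM version file, e.g. a first line like
# "NVRM version: NVIDIA UNIX x86_64 Kernel Module  535.129.03  ...".
loaded_nvidia_version() {
  local file="${1:-/proc/driver/nvidia/version}"
  [ -r "$file" ] || return 1   # absent when the module is not loaded
  grep -m1 '^NVRM version' "$file" \
    | grep -oE '[0-9]+\.[0-9]+(\.[0-9]+)?' | head -n1
}

# Exit codes: 0 = loaded and matching, 1 = mismatch, 2 = not loaded.
check_nvidia_driver() {
  local expected="$1" file="${2:-/proc/driver/nvidia/version}"
  local loaded
  if ! loaded="$(loaded_nvidia_version "$file")" || [ -z "$loaded" ]; then
    echo "nvidia kernel module not loaded"
    return 2
  fi
  if [ "$loaded" != "$expected" ]; then
    echo "mismatch: loaded $loaded, expected $expected"
    return 1
  fi
  echo "ok: $loaded"
}
```

For example, `check_nvidia_driver "$(modinfo -F version nvidia)"` after a boot where the desktop did not start; "not loaded" points at the driver rather than at any egpu-switcher leftovers.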

Labels
None yet
Projects
Status: Done
Development

No branches or pull requests

2 participants