[BUG]: Driver timeouts with RDNA3 GPUs on Windows + Vulkan #10720
Comments
I can't reproduce any GPU crash on my NV GPU, nor are any validation errors showing. So it's either an AMD driver issue, or something the validation layers aren't checking for. Can't really do much about it, or narrow it down to the exact draw/shader, as I don't have one of these GPUs. |
It'd be nice if someone with a non-RDNA3 AMD GPU could do a test. I saw a few people on the Discord server discussing RDNA3-specific crashes, so I assumed that other AMD GPUs aren't affected. With my old RTX 3060 I never experienced any crashes. I own 2 other games (Gran Turismo 3 and 4), and those don't seem to cause driver timeouts when running under Vulkan... But I should probably test them more to confirm that they're not affected. I am no expert and I really have no idea what could cause these timeouts, but if the Mesa driver works just fine on Linux, I think there's probably something weird going on with the AMD Windows driver. |
Today I had a bit of free time, so I did a few more tests...
Update 1: I played Gran Turismo 3 for around 50 minutes on Windows + Vulkan at 8x resolution, 16x AF and Basic Blending Accuracy. I had zero issues while playing; everything looked normal. I then decided to crank Blending Accuracy all the way up to Maximum, since the 7800 XT should have the horsepower to handle it. Result: driver timeouts after around 5-10 minutes of play. While I can't reproduce it consistently (as I managed to do in VCS), I changed the title of this bug report to make it more generic, as I'm sure there are other games that trigger driver timeouts under particular conditions. I tried messing around with Blending Accuracy in VCS, but even with Minimum the crash occurs at the exact same spot.
Update 2: I decided to try older PCSX2 builds with VCS. The first one I chose was v1.7.3722-64bit-AVX2-Qt from December 16th 2022, with Vulkan, 8x resolution, 16x AF and Blending Accuracy set to Maximum. Result: the crash didn't occur on the usual bridge, but after around 15 minutes in the Ferris wheel area. I then picked an even older build, v1.7.2264-64bit-AVX2 from January 23rd 2022 (one of the earliest Vulkan implementations), and ran VCS again with the previous settings... After around 30 minutes of gameplay I didn't have a single crash! Of course I can't be 100% sure that this build will never trigger driver timeouts, but it definitely seems more stable than current ones. Maybe some newer Vulkan extension doesn't play too well with the AMD Windows driver?
This weekend I should have time to test more stuff. |
Here I am with another report; this will be a long one with lots of numbers... I decided to test whether resolution and/or blending accuracy have an impact on driver timeouts (spoiler: yes). To do this I used the GS Dump that I posted here, and each time I took note of the frame where the driver crashed. I started by checking all resolutions at default settings (including Blending Accuracy, which is set to Basic), doing 3 passes for each resolution:
With Blending Accuracy set to its default setting (Basic), it's clear that a driver timeout is triggered later at higher internal resolutions. I then decided to take 3 resolutions to test the other Blending Accuracy settings: Native (which gave the worst results), 3x (~1080p, seems to be a good middle ground) and 8x (best results). Let's see the results for the other Blending Accuracy levels...
Minimum Blending Accuracy
Medium Blending Accuracy
High Blending Accuracy
Full Blending Accuracy
There's really not much to say here, the results seem to be in line with Basic. There's however one more level that I haven't mentioned yet, and it gave weird results...
Maximum Blending Accuracy completely changes everything: now Native resolution gives by far the best results by completing the whole GS Dump, while 8x either doesn't load at all or is able to show only 2 frames. |
Could you please re-test on the latest release? Apparently the feedback loop extension was causing GPU crashes on RDNA3, which I've now dropped, so with any luck this should be resolved, if it's the same issue. |
Oh well. Will have to wait until someone with a RDNA3 GPU looks into it then. I'm not going to rush out and buy one just for PCSX2 :P |
One thing you could try, is pick a GS Dump that crashes, then enable the GS dumping stuff (you might need to enable advanced from the tools menu, then you'll find the "Debug" section in the settings), set the start draw to 0, set the number of draws to like 10000 or something and set a folder to dump to, then tick "Dump GS Draws", then open the GS dump wait for it to crash (don't forget to disable this again when you next reload). Once that happens, go to the folder you told it to dump, find the first file near the bottom which doesn't start with Also tell us which dump you used.. if you can do this run a couple of times on the same GS Dump and make sure it's the same number every time, that would be very helpful! |
Recent versions now always pass with my old GS Dump that I posted here, and during gameplay crashes seem random, so I'm not able to trigger a driver timeout reliably. I then used it on build v1.7.5720 (20th April 2024), which is one of the latest builds that shows this behaviour with my old GS Dump in relation to resolution and blending accuracy. With your suggested settings, I performed 10 tests: 5 with default blending accuracy and 5 with maximum blending accuracy. The other settings were Vulkan + 8x Resolution Scale + 16x AF.
Default Blending Accuracy: the last non-vsync file across all 5 tests was 10001_vertex.txt which, without knowing anything, seems rather uninteresting given that the number of draws was set to 10000 (and those files start from 1, not 0).
Maximum Blending Accuracy: the last non-vsync file across all 5 tests was 04755_vertex.txt.
I'll upload here the vertex.txt and context.txt files from the last run: |
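For anyone repeating this test, picking out the last dumped draw by hand gets tedious over several runs. Below is a minimal Python sketch, not part of PCSX2, that does it automatically. It assumes the per-draw files are named like `04755_vertex.txt` and `04755_context.txt` (the names quoted above) and simply ignores everything else in the folder. The function names `last_draw_number` and `same_last_draw` are made up for illustration.

```python
import re
from pathlib import Path
from typing import Iterable, Optional

# Assumption: per-draw dump files look like "04755_vertex.txt" or
# "04755_context.txt". Any other file in the folder is ignored.
DRAW_FILE = re.compile(r"^(\d+)_(vertex|context)\.txt$")


def last_draw_number(dump_dir: str) -> Optional[int]:
    """Return the highest draw number in dump_dir, or None if no draw files exist."""
    numbers = []
    for entry in Path(dump_dir).iterdir():
        match = DRAW_FILE.match(entry.name)
        if entry.is_file() and match:
            numbers.append(int(match.group(1)))
    return max(numbers, default=None)


def same_last_draw(dump_dirs: Iterable[str]) -> bool:
    """True if every run folder contains draw files and ends on the same draw number."""
    results = [last_draw_number(d) for d in dump_dirs]
    return None not in results and len(set(results)) == 1
```

Calling `same_last_draw` with one folder per run reports whether every run stopped on the same draw, provided a fresh (or emptied) dump folder is used for each run.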
So Maximum died? confused why it's a lower number :D |
Maximum + 8x Res dies almost immediately (around frame 2), while Default + 8x Res dies around frame 490. |
okay, when you did each run, did you delete the old files after noting the last one? Just wanna make sure we're looking at the right draw |
Yep |
and just to clarify, it always died on the same one? |
Yes, all 5 tests (with max blending accuracy) ended with that exact file. |
perfect, thanks! :) |
I still have no idea what the actual issue here is, but given the feedback loop extension was apparently problematic, I wonder if using local reads, which formalize programmable blending, will help. Try #11179. You'll need the latest driver, as of about a week ago (AMD was late adding the extension; NV has had it for months). |
Updated the driver and tried your build, unfortunately it's a regression compared to recent nightly builds. |
Oh well. Like I said above, it won't be fixed until someone with the knowledge and hardware looks into it. I'm out of random ideas to try, and there isn't any spec violation going on the current validation layers can detect. |
Thank you for trying though, I can imagine the struggle when you don't have the hardware. Unfortunately my knowledge here is very limited; all I can do is help test stuff... And whenever anyone wants me to try things, feel free to ask :) |
So like an idiot I went out and bought a GPU to debug this (and for other uses too). It is completely random, and not a specific draw. But I think I may have found a workaround. Give #11223 a shot. I can't trigger a crash in any of the games that I could before (Ratchet, VCS). |
Oh wow, ok tomorrow I'll give it a shot! |
I think you fixed it! One unrelated and minor thing that I noticed while testing is the water rendering being off... I don't know if this is again a RDNA3 exclusive thing, but this is what I'm seeing on Windows: While on Linux everything looks normal: I initially noticed it while playing on #11223, but then I also tried to see if it's the same on the last nightly build and the answer is yes. Those screenshots were taken from build 1.7.5799, same settings used on both OSes. I decided to post it here for now as it could be another RDNA3 issue, but if needed I can open another bug report. |
Post a GS dump of it. Can't do much without one. |
Reopening as there are still other crashes remaining. Also see #12144 |
After a year of driver updates, my RDNA3 GPU works normally on the latest 24.12.1 driver and only rarely stops responding. |
Describe the Bug
I already reported this issue on Discord one or two weeks ago, but I decided to post it here as well, since I've seen other people having similar issues. At the time I didn't provide a GS Dump, but now I am doing so. The dump was recorded in VCS while playing in Software mode; when switching to Vulkan, the driver timeout occurs between frames 480-500, usually around 490. Here's the link: https://drive.usercontent.google.com/download?id=19zbfSnzsAGUvRRz2T5x2beABn7KF1mxA&export=download
(check comment below in case this link doesn't work anymore)
From what I understand, this seems to be happening only on RDNA3 cards. If you want me to test other stuff with these 2 games and/or post more dumps, feel free to ask.
Reproduction Steps
In VCS I can always replicate the issue when crossing the wooden bridge near the military base; in LCS I haven't found a spot where I can trigger it consistently. I haven't done extensive testing with other APIs, but after around 30 minutes of continuous playtime in VCS with DX12 I didn't notice a single crash.
I also did a few tests on Fedora Silverblue with one of the latest nightly AppImage builds. RADV never crashed, so I would mark this as a Windows-only issue.
Expected Behavior
No response
PCSX2 Revision
v1.7.5509
Operating System
Windows 11
If Linux - Specify Distro
No response
CPU
Intel Core i5-12600KF
GPU
AMD Radeon RX 7800 XT
GS Settings
(Happens even at default settings)
Emulation Settings
No response
GS Window Screenshots
No response
Logs & Dumps
No response