Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Android - Fix segfault crash when native surface is destroyed on vulkan #17921

Merged
merged 1 commit into from
Jan 8, 2025

Conversation

emmauss
Copy link
Contributor

@emmauss emmauss commented Jan 8, 2025

What does the pull request do?

This pr checks for valid native surface on android when creating a vulkan surface.

What is the current behavior?

When android suspends an activity, all surfaces owned by that activity is destroyed. We do not do any checks on the validity if the surfaces when we start drawing and we try to create a khr surface using the handle of a destroyed surface, causing a segfault.

What is the updated/expected behavior with this PR?

How was the solution implemented (if it's not obvious)?

Checklist

Breaking changes

Obsoletions / Deprecations

Fixed issues

@emmauss emmauss requested a review from kekekeks January 8, 2025 09:24
@avaloniaui-bot
Copy link

You can test this PR using the following package version. 11.3.999-cibuild0054141-alpha. (feed url: https://nuget-feed-all.avaloniaui.net/v3/index.json) [PRBUILDID]

@kekekeks
Copy link
Member

kekekeks commented Jan 8, 2025

What happens to vulkan surface associated with android surface when android one gets destroyed? is it no longer valid?
If it's somehow marked as invalid, do we properly mark the relevant render target as corrupted so it would follow the normal dispose-and-recreate sequence?

@kekekeks
Copy link
Member

kekekeks commented Jan 8, 2025

Also, when it's not possible to create a valid render target for the current state of the native surface, the platform backend is supposed to throw RenderTargetNotReadyException rather than ArgumentException

@kekekeks
Copy link
Member

kekekeks commented Jan 8, 2025

I guess we silently try to re-create the surface/swapchain here:

while (true)
{
var acquireResult = _context.DeviceApi.AcquireNextImageKHR(
_context.DeviceHandle,
_swapchain,
ulong.MaxValue,
_semaphorePair.ImageAvailableSemaphore.Handle,
default, out _nextImage);
if (acquireResult is VkResult.VK_ERROR_OUT_OF_DATE_KHR or VkResult.VK_SUBOPTIMAL_KHR)
RecreateSwapchain();
else if (acquireResult is VkResult.VK_ERROR_SURFACE_LOST_KHR)
RecreateSurface();
else
{
acquireResult.ThrowOnError("vkAcquireNextImageKHR");
break;
}
}
but aren't throwing any exceptions that are expected by the compositor.

readonly object _lock = new object();
private readonly Handler _handler;

internal event EventHandler? SurfaceWindowCreated;

IntPtr IPlatformHandle.Handle => Holder?.Surface?.Handle is { } handle ?
IntPtr IPlatformHandle.Handle => _isSurfaceValid && Holder?.Surface?.Handle is { } handle ?
AndroidFramebuffer.ANativeWindow_fromSurface(JNIEnv.Handle, handle) :
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This leaks memory, BTW. ANativeWindow_fromSurface increases the reference counter for returned ANativeWindow object.

It's out of the scope of the current PR but should be addressed at some point

Copy link
Member

@kekekeks kekekeks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM for the scope of the PR.

We need some general refactoring for our vulkan render target loss recovery code.

@emmauss
Copy link
Contributor Author

emmauss commented Jan 8, 2025

What happens to vulkan surface associated with android surface when android one gets destroyed? is it no longer valid? If it's somehow marked as invalid, do we properly mark the relevant render target as corrupted so it would follow the normal dispose-and-recreate sequence?

When the android native surface is destroying, accessing the khr surface returns a surface lost vulkan error. We handle that by trying to recreate the vulkan surface, but since there's no valid native surface, it crashes with segfault at 2 points. First point of failure is using ANativeWindow api to access the native window for the destroyed surface, which will cause a segfault. The second point of failure is creating an android khr surface using a null handle.

@emmauss
Copy link
Contributor Author

emmauss commented Jan 8, 2025

Also, when it's not possible to create a valid render target for the current state of the native surface, the platform backend is supposed to throw RenderTargetNotReadyException rather than ArgumentException

The Opengl backend throws its own custom exception, OpenGlException which there's no special catch for it anywhere in code.

@kekekeks kekekeks added this pull request to the merge queue Jan 8, 2025
@kekekeks kekekeks added bug backport-candidate-11.2.x Consider this PR for backporting to 11.2 branch area-rendering labels Jan 8, 2025
Merged via the queue into master with commit 53e8d99 Jan 8, 2025
12 checks passed
@kekekeks kekekeks deleted the vulkan_android_surface_destroy_crash_fix branch January 8, 2025 21:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-rendering backport-candidate-11.2.x Consider this PR for backporting to 11.2 branch bug os-android
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants