Add support for Native multithreaded execution #4201

Draft
wants to merge 52 commits into
base: series/3.x

durban
Contributor

@durban durban commented Dec 15, 2024

This is on top of @djspiewak's wip/multithreaded-wstp branch. I did nothing so far, except default to SleepSystem in 489fbb9. This makes it possible to run IOSpec on scala-native; it (sometimes) passes on my machine ;-). It's probably too early to have this as a PR, but I'm doing it anyway to avoid duplicating work, and maybe to have a discussion about EpollSystem (see below). (@djspiewak feel free to close this PR if you have other plans with your branch.)

@durban
Contributor Author

durban commented Dec 15, 2024

So, I'm running this on Linux, and without 489fbb9 it tries to use EpollSystem. But starting the tests (e.g., testsNative/testOnly cats.effect.IOSpec) only leads to a hanging process. It seems to me that the WSTP threads are waiting in epoll_wait, and a (the?) GC thread seems to be waiting for the mutator threads to reach a safepoint(?). (Details: the GC is in thread_yield, called by Synchronizer_acquire.) I'm speculating here, but maybe Thread.sleep() does "something" before blocking that EpollSystem doesn't do before calling epoll_wait?

EDIT: what I wrote above is probably completely wrong... 🤷‍♂️

@djspiewak
Member

djspiewak commented Dec 15, 2024

This is great! Thank you for pushing this forward to the next obstacle.

One of the things that occurred to me as I poked at my branch originally is SN's tooling for introspecting thread state is really really limited so far as I understand it. Maybe this is just my ignorance and LLVM has some magic we could turn on, but I strongly suspect we're going to need better introspection to run down some of these problems.

What I'm thinking is we're probably going to end up building that, or at least leaning in heavily to do so, and that's probably a large part of what we'll need to do to get this off the ground. We should chat with the SN folks.

@armanbilge
Member

armanbilge commented Dec 15, 2024

The reason it's hanging is because we haven't implemented interruption yet for the Native I/O-polling systems. This wasn't necessary when it was single-threaded, but now it's critical :)

def interrupt(targetThread: Thread, targetPoller: Poller): Unit = ()

Compare with:

def interrupt(targetThread: Thread, targetPoller: Poller): Unit = {
  targetPoller.selector.wakeup()
  ()
}
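
For anyone following along, here is a minimal JVM-only sketch of what that `wakeup()` call buys us. It is not the actual runtime code: `SelectorWakeupSketch` and everything inside it are hypothetical names made up for illustration. A stand-in worker thread parks in `select()` with no channels registered, and a second thread unblocks it the same way the JVM `interrupt` above does.

```scala
import java.nio.channels.Selector
import java.util.concurrent.{CountDownLatch, TimeUnit}

// Hypothetical demo object, not part of cats-effect.
object SelectorWakeupSketch {
  // Returns true if the parked thread came back from select() after wakeup().
  def run(): Boolean = {
    val selector = Selector.open()
    val parking = new CountDownLatch(1)
    val returned = new CountDownLatch(1)

    // Stand-in for a worker thread parked in its poller.
    val worker = new Thread(() => {
      parking.countDown()
      selector.select() // no channels registered: blocks until wakeup()
      returned.countDown()
    })
    worker.setDaemon(true)
    worker.start()

    parking.await()
    // Stand-in for interrupt(targetThread, targetPoller). If the worker has
    // not reached select() yet, the wakeup is remembered and the next
    // select() returns immediately, so there is no lost-wakeup race here.
    selector.wakeup()

    val woke = returned.await(5, TimeUnit.SECONDS)
    selector.close()
    woke
  }
}
```

With a no-op `interrupt`, the equivalent of the worker above would stay parked indefinitely, which matches the hang described earlier in the thread.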

@armanbilge
Member

Oh, the other reason it may be hanging is indeed related to GC. On Scala Native, blocking native calls need to be annotated explicitly with the @blocking annotation, so that it does the necessary book-keeping so it's possible to GC while a thread is stuck in that blocking call.

https://github.com/scala-native/scala-native/blob/c7b54a18e3ff11d8b2792f16fbb6e97780314014/nativelib/src/main/scala/scala/scalanative/unsafe/package.scala#L103-L106

For now it's fine to just mark it @blocking, but b/c this comes at a performance cost, we should actually make two separate epoll_wait bindings. One will use @blocking for when the timeout is > 0, and the other will not, for when the timeout == 0.
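
A rough sketch of what the two bindings could look like, assuming Scala Native 0.5 on Linux. The object name `epollSketch`, the method names `epoll_wait_immediate` and `epoll_wait_blocking`, the `EpollDispatch` helper, and the use of `Ptr[Byte]` in place of a proper `epoll_event` struct pointer are all illustrative assumptions, not what is on the branch. Both methods bind to the same C symbol via `@name`; only one carries `@blocking`.

```scala
import scala.scalanative.unsafe._

// Hypothetical binding object; only the C symbol name epoll_wait is real.
@extern
object epollSketch {
  // Plain binding: intended for timeout == 0 only, where the call returns
  // right away and the @blocking book-keeping would be wasted work.
  @name("epoll_wait")
  def epoll_wait_immediate(
      epfd: CInt,
      events: Ptr[Byte], // simplified; really a pointer to struct epoll_event
      maxevents: CInt,
      timeout: CInt): CInt = extern

  // Blocking binding: tells the runtime this thread may be stuck in native
  // code, so the GC does not have to wait for it to reach a safepoint.
  @blocking
  @name("epoll_wait")
  def epoll_wait_blocking(
      epfd: CInt,
      events: Ptr[Byte],
      maxevents: CInt,
      timeout: CInt): CInt = extern
}

// Hypothetical helper that picks a binding based on the timeout.
object EpollDispatch {
  def poll(epfd: CInt, events: Ptr[Byte], maxevents: CInt, timeout: CInt): CInt =
    if (timeout == 0)
      epollSketch.epoll_wait_immediate(epfd, events, maxevents, timeout)
    else
      epollSketch.epoll_wait_blocking(epfd, events, maxevents, timeout)
}
```

Note that the sketch routes every non-zero timeout to the blocking variant, since a timeout of -1 blocks indefinitely and needs the annotation just as much as a positive one.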

@djspiewak
Member

> The reason it's hanging is because we haven't implemented interruption yet for the Native I/O-polling systems. This wasn't necessary when it was single-threaded, but now it's critical :)

You know, I didn't even think about this. Makes loads of sense though. Pipes time!

@durban
Contributor Author

durban commented Dec 20, 2024

@armanbilge Thanks, I've tried to do the 2 things you mentioned. In 62b8141 I turned on the EpollSystem again, and tried implementing interrupt, and added the scala-native blocking annotation. This way testsNative/testOnly cats.effect.IOSpec passes on my machine. It obviously needs more work (e.g., I think interrupt is not threadsafe), but at least it's a step in the right direction.

Comment on lines 257 to 258
// TODO: this is not threadsafe, we're reading `interruptFd` without synchronization:
if (unistd.write(this.interruptFd, buf, 8.toCSize) == -1) {
Member

It doesn't need to be, it will be synchronized by workerThreadPublisher at its callsites.

workerThreadPublisher.get()
val worker = workerThreads(index)
system.interrupt(worker, pollers(index))
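
To spell out the reasoning, here is a small JVM sketch of the publication pattern, assuming the publisher is an atomic (volatile) variable. `PublisherSketch`, its `Poller` class, and the value 42 are hypothetical stand-ins, and the spin loop is only there to make the demo deterministic. The point is that a plain field written before a volatile write is visible to any thread that reads the field after observing that write.

```scala
import java.util.concurrent.atomic.AtomicBoolean

// Hypothetical demo object, not part of cats-effect.
object PublisherSketch {
  // Plain, non-volatile field, like the interruptFd being discussed.
  final class Poller { var interruptFd: Int = -1 }

  def run(): Int = {
    val poller = new Poller
    val publisher = new AtomicBoolean(false)

    val init = new Thread(() => {
      poller.interruptFd = 42 // plain write
      publisher.set(true)     // volatile write: publishes the write above
    })
    init.start()

    // Volatile read: once this observes true, the plain write to
    // interruptFd happens-before the read below.
    while (!publisher.get()) ()
    val fd = poller.interruptFd

    init.join()
    fd
  }
}
```

This relies on the Java memory model; whether Scala Native gives exactly the same guarantees for the equivalent atomics is something I am assuming rather than asserting.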

@djspiewak
Member

djspiewak commented Dec 26, 2024

Interesting. So I merged your branch with series/3.x and now I'm getting the following:

[error] Unknown DWARF abbrev code: 26
[error] 
[error] STACKTRACE
[error] 
[error] java.lang.RuntimeException: Unknown DWARF abbrev code: 26
[error] 
[error] 
[error] This looks like a specs2 exception...
[error] Please report it with the preceding stacktrace at http://github.com/etorreborre/specs2/issues
[error]  
[error] Error: Total 1, Failed 0, Errors 1, Passed 0
[error] Error during tests:
[error] 	cats.effect.IOSpec

Edit: Appears to be a macOS only thing. Compiles and runs on Linux. Lovely.

@djspiewak
Member

Okay got around the issue with Lorenzo's help. It's fixed in SN main, so I updated to a local snapshot (lol) on my branch and made progress. I'll dig into interruption for kqueue

@djspiewak
Member

Update:

  1. We no longer need the snapshot. I've pushed the magic compiler settings incantation on my branch
  2. Also pushed is an initial stab at using EVFILT_USER to handle kqueue interrupts. I implemented it by basically putting that event permanently into the front of the changes array and then triggering it on the kqfd on interrupt(). In principle, this makes sense, but it doesn't actually work. The threads wake up but things spin forever. I'm probably being dumb. Have fun.

Will get back to this later.

@djspiewak
Member

I pushed more. Kqueue is pretty close to working I think.

Member

@djspiewak djspiewak left a comment

We definitely need to figure out how to start sharing code across platforms more sanely. :P

@armanbilge
Member

Latest from CI ... this is reminiscent of the issues we are having in Cats, where test suites just error out 😕

[error] Error: Total 2141, Failed 0, Errors 32, Passed 2108, Skipped 1, Pending 1
[error] Error during tests:
[error] 	cats.effect.unsafe.IORuntimeConfigSpec
[error] 	cats.effect.std.internal.BinomialHeapSpec
[error] 	cats.effect.ExitCodeSpec
[error] 	cats.effect.IOPropSpec
[error] 	cats.effect.std.DroppingQueueSpec
[error] 	cats.effect.std.MapRefSpec
[error] 	cats.effect.std.BackpressureSpec
[error] 	cats.effect.std.UnboundedDequeueSpec
[error] 	cats.effect.OptionTIOSpec
[error] 	cats.effect.tracing.TracingSpec
[error] 	cats.effect.std.UnboundedPQueueSpec
[error] 	cats.effect.std.SystemPropertiesSpec
[error] 	cats.effect.KleisliIOSpec
[error] 	cats.effect.std.DeferredParallelism2Spec
[error] 	cats.effect.std.DeferredParallelism4Spec
[error] 	cats.effect.unsafe.IORuntimeBuilderSpec
[error] 	cats.effect.kernel.DeferredSpec
[error] 	cats.effect.IOSpec
[error] 	not.cats.effect.IOParImplicitSpec
[error] 	cats.effect.std.BoundedQueueSpec
[error] 	cats.effect.ContSpec
[error] 	cats.effect.std.internal.BankersQueueSpec
[error] 	cats.effect.IOMtlLocalSpec
[error] 	cats.effect.std.MutexSpec
[error] 	cats.effect.kernel.ParallelFSpec
[error] 	cats.effect.FileDescriptorPollerSpec
[error] 	cats.effect.std.EnvSpec
[error] 	cats.effect.ThunkSpec
[error] 	cats.effect.std.ConsoleSpec
[error] 	cats.effect.DefaultContSpec
[error] 	cats.effect.testkit.TestControlSpec

@armanbilge
Member

Locally is a bit more promising 🤔

[error] Error: Total 2860, Failed 0, Errors 2, Passed 2857, Skipped 1, Pending 2
[error] Error during tests:
[error]         cats.effect.ContSpec
[error]         cats.effect.DefaultContSpec

@djspiewak
Member

I saw that happen when any individual test segfaults: it seems to cause a whole cascade of ghost errors.

@djspiewak djspiewak changed the title from "WIP, doesn't even compile" to "Add support for Native multithreaded execution" Jan 3, 2025
@djspiewak
Member

I no longer see segfaults locally on mac. Haven't tried Linux yet. stackalloc is really fucky on SN 0.5 afaict, even for really simple cases. I don't think anyone should be using the no-args version for anything ever.

Oh as an aside, the problems I had pushing to this remote were being caused by git-lfs. In case anyone else sees these types of issues, the solution is to use the --no-verify flag with git push.

@djspiewak djspiewak force-pushed the wip/multithreaded-wstp branch from 89c862c to 22271cc Compare January 3, 2025 15:37
@armanbilge
Member

> I no longer see segfaults locally on mac

I still do 😕

sbt:cats-effect> testsNative/testOnly *.FileDescriptorPollerSpec
[info] Build skipped: No changes detected in build configuration and class path contents since last build.
[info] Starting process '/Users/arman/code/cats-effect/tests/native/target/scala-2.13/cats-effect-tests-test' on port '63437'.
[info] Starting process '/Users/arman/code/cats-effect/tests/native/target/scala-2.13/cats-effect-tests-test' on port '63439'.
[error] Test runner interrupted by fatal signal 11
[warn] Force close java.lang.RuntimeException: Process /Users/arman/code/cats-effect/tests/native/target/scala-2.13/cats-effect-tests-test finished with non-zero value 139 (0x8b)
[info] FileDescriptorPollerSpec
[info] FileDescriptorPoller should
[info]   + notify read-ready events
[error] Error: Total 2, Failed 0, Errors 1, Passed 1
[error] Error during tests:
[error] 	cats.effect.FileDescriptorPollerSpec
[error] (testsNative / Test / testOnly) sbt.TestsFailedException: Tests unsuccessful
[error] Total time: 5 s, completed Jan 3, 2025, 9:59:12 AM
sbt:cats-effect>

@djspiewak
Member

Nuts. I wonder if I just didn't run it enough.
