Implement GNU Make 4.4+ jobserver fifo / semaphore client support #2450

Open
wants to merge 2 commits into
base: master

hundeboll
Contributor

The principle of such a job server is rather simple: Before starting a new job (edge in ninja-speak), a token must be acquired from an external entity. On posix systems, that entity is simply a fifo filled with N characters. On win32 systems it is a semaphore initialized to N. Once a job is finished, the token must be returned to the external entity.
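
The acquire/return cycle on a POSIX fifo can be sketched as follows. This is only an illustration, not the code from this PR: the helper names TryAcquireToken and ReleaseToken are hypothetical, and the sketch assumes the caller has already opened the fifo and set O_NONBLOCK on the descriptor used for reading.

```cpp
#include <cerrno>
#include <fcntl.h>
#include <unistd.h>

// Hypothetical helper, not taken from the PR: tries to take one token from
// the jobserver without blocking. `fd` must be readable and have O_NONBLOCK
// set. On success the byte that was read is stored in *token.
bool TryAcquireToken(int fd, char* token) {
  for (;;) {
    ssize_t n = read(fd, token, 1);
    if (n == 1) {
      return true;  // got a token
    }
    if (n < 0 && errno == EINTR) {
      continue;  // interrupted by a signal, retry
    }
    return false;  // EAGAIN (no token available), EOF, or another error
  }
}

// Hypothetical helper: returns a token by writing back the byte that was
// read when the token was acquired.
bool ReleaseToken(int fd, char token) {
  for (;;) {
    ssize_t n = write(fd, &token, 1);
    if (n == 1) {
      return true;
    }
    if (n < 0 && errno == EINTR) {
      continue;
    }
    return false;
  }
}
```

A job would call TryAcquireToken() before starting and ReleaseToken() with the same byte once it has finished; when no token is available the caller simply waits and retries later.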

This functionality is desired when ninja is used as part of a bigger build, such as builds with Yocto/OpenEmbedded, Buildroot and Android. Here, multiple compile jobs are executed in parallel to maximize CPU utilization, but if each compile job uses all available cores, the system is overloaded.

Note: this is a re-implementation of the last part[1] of the previous attempt to implement jobserver functionality. I have left out the server[2] part, and the older "pipe"[3] methods from here, as I don't need those. Doing so allows for a much simpler implementation.

Note note: I don't have windows or mac systems available. I would greatly appreciate anyone who can test on those for me.

[1] #2263
[2] #2260
[3] #1140

Collaborator

@jhasse jhasse left a comment

Thanks for the PR!
My suggestions:

  1. Declare variables when you use them, not C89-style at the beginning of the scope.
  2. Move all function definitions to .cc files, not in a .h
  3. Use {} even for one line statements

@hundeboll
Contributor Author

Any idea why the windows build fails?

@jhasse
Collaborator

jhasse commented May 18, 2024

missing #include <cassert>

@hundeboll
Contributor Author

missing #include <cassert>

Aah, missed that.

I was referring to the other error:

D:\a\ninja\ninja\src\status_printer.cc(125,25): error C2589: '(': illegal token on right side of '::' [D:\a\ninja\ninja\build\libninja.vcxproj]
D:\a\ninja\ninja\src\status_printer.cc(125,20): error C2062: type 'unknown-type' unexpected [D:\a\ninja\ninja\build\libninja.vcxproj]
D:\a\ninja\ninja\src\status_printer.cc(125,25): error C2059: syntax error: ')' [D:\a\ninja\ninja\build\libninja.vcxproj]
D:\a\ninja\ninja\src\status_printer.cc(127,25): error C2589: '(': illegal token on right side of '::' [D:\a\ninja\ninja\build\libninja.vcxproj]
D:\a\ninja\ninja\src\status_printer.cc(127,25): error C2059: syntax error: ')' [D:\a\ninja\ninja\build\libninja.vcxproj]

@hundeboll hundeboll force-pushed the jobserver branch 2 times, most recently from 27a269b to cc1044e Compare May 18, 2024 07:56
@hundeboll
Contributor Author

Fixed now.

@hundeboll
Contributor Author

hundeboll commented May 19, 2024 via email

@hundeboll
Contributor Author

hundeboll commented May 19, 2024 via email

@hundeboll
Contributor Author

hundeboll commented May 19, 2024 via email

@digit-google
Contributor

At the moment, Ninja always passes its environment to sub-commands, so the MAKEFLAGS value will be passed to them as well.

When using a named FIFO mechanism, either Posix or Windows, this is enough for them to participate properly in token negotiation (Ninja taking implicit token for each sub-command before launching it, as expected).

The file descriptor-based scheme will fail though (because Ninja doesn't try to keep these open in the spawned processes), and it's probably not something worthy of supporting, though this should be documented.

I am trying to set up some tests on top of your commits to see how we can ensure everything works as expected, and that we never regress in the future.

OT: Your answer appears in the general conversation for the PR, and not in the specific comment's thread. This loses context and can make things hard to follow. On the other hand, GitHub doesn't preserve comments when new commits are force-pushed to upload fixes (unlike Gerrit, which tracks these very well), so neither option is ideal. Feel free to use whatever you prefer :)

@hundeboll hundeboll force-pushed the jobserver branch 3 times, most recently from 69bf358 to e92f95b Compare May 24, 2024 08:07
@hundeboll
Contributor Author

@jhasse @digit-google I have fixed most of the comments, and responded to the remaining ones. Should I mark the fixed ones as resolved, or do you want to do that?

Is there anything else I need to address?

@hundeboll
Contributor Author

Rebased on master and removed the #define NOMINMAX from jobserver.h

Collaborator

@jhasse jhasse left a comment

Thanks for the changes! The documentation is awesome :)

I've added several nitpick comments.

For the parsing, I think it might be a good idea to add a unit test which checks the error/warning cases, too (i.e. invalid MAKEFLAGS).

Should I mark the fixed ones as resolved, or do you want to do that?

Feel free to resolve the comments yourself.

@hundeboll hundeboll force-pushed the jobserver branch 2 times, most recently from 6d81a64 to d4f279a Compare May 29, 2024 13:19
@jhasse
Collaborator

jhasse commented Jun 8, 2024

Would be great if someone could test it and comment here.

@robUx4

robUx4 commented Jun 11, 2024

I ran some tests on VLC. We build 100+ contribs at once from autotools, CMake and meson projects. The build is started from a Makefile that calls ninja for the CMake and meson projects. We build ninja beforehand to have jobserver support. We use a prebuilt version of the Kitware version on our Docker.

This build uses the Kitware version of ninja with jobserver support. This other build, run on the same machine at the same time, uses this jobserver branch.

The first thing to notice is that this branch does build successfully. The other thing is that both take about the same time to build (10m51s vs 10m32s), suggesting the parallel usage (on this 48-core machine) is working as expected. It's even slightly faster, but I don't think we can really conclude that from a single run.

@hundeboll
Contributor Author

@robUx4 thanks for testing. I'm afraid you need to tweak the build system to use the fifo style jobserver instead of the old-style pipe-fd method:
https://code.videolan.org/robUx4/vlc/-/jobs/1800466#L1738

@hundeboll
Contributor Author

@robUx4 btw: it's probably just a matter of updating make to version 4.4 or later...

@robUx4

robUx4 commented Jun 11, 2024

We use whatever Debian is giving us. It seems Debian doesn't provide make 4.4 yet: https://packages.debian.org/bookworm/make, even in sid: https://packages.debian.org/sid/make

@digit-google
Contributor

Hello, I could experiment today with this patch applied to a local Ninja binary, used to build a small subset of Fuchsia targets.
This subset involves launching, in the end, about 8 sub-Ninja builds in parallel which fight for CPU resources concurrently to build around 19,000+ targets each (moderated by default with -j5 on a very powerful workstation). This includes many Rust and C++ compilation / link commands (whose toolchains support MAKEFLAGS natively).

Good news, I see a decent improvement in build times when all remote builds are disabled (which is not our default configuration): 13m47s -> 12m39s.

For fully remote builds, we go: 5m54s -> 4m56s which is even nicer.

So this PR looks really good to me.

NOTE: I wrote a Python script to set up and serve the tokens, then invoke Ninja, see d6c0c1a (probably not the final version).

@jdrouhard
Contributor

jdrouhard commented Jun 17, 2024

Tested this briefly with our build, and for whatever reason, more tokens are returned to the pool than were originally acquired. This causes more and more parallel jobs to start as the build proceeds which eventually brings the system to a crawl.

At the end of our build:

make: INTERNAL: Exiting with 176 jobserver tokens available; should be 36!

I have 36 CPUs on the build machine, and specify -j 36 to the make invocation. make is 4.4.1, and the command recipe for the build spawns ninja with a + prefix. ninja logs which fifo it's using for the jobserver, so I confirmed that make is passing the appropriate file down and that this PR's code is being used.

Is it guaranteed that there will be an equal number of FindWork() calls and EdgeFinished() calls? Briefly looking through the source, I'm seeing more EdgeFinished() calls nested within other aspects of the build process, such as NodeFinished(), EdgeMaybeReady(), etc. It seems possible to call EdgeFinished() more times than FindWork() was called.

EDIT:
This call - https://github.com/hundeboll/ninja/blob/be47d5de9312f486425c12e10b82b35d42a0d273/src/build.cc#L263 - is in EdgeMaybeReady() which is used when other nodes complete or dyndep discovery kicks in and the output (dependent edge) doesn't need to directly be built. This is an EdgeFinished() call that doesn't correlate to any FindWork().

This fixes the issue I'm seeing (applied to this PR):

diff --git a/src/build.cc b/src/build.cc
index f05e31e..a1e808e 100644
--- a/src/build.cc
+++ b/src/build.cc
@@ -170,6 +170,7 @@ Edge* Plan::FindWork() {
   }
 
   Edge* work = ready_.top();
+  work->acquired_job_server_token_ = jobserver_.Enabled();
   ready_.pop();
   return work;
 }
@@ -207,7 +208,7 @@ bool Plan::EdgeFinished(Edge* edge, EdgeResult result, string* err) {
   edge->pool()->RetrieveReadyEdges(&ready_);
 
   // Return the token acquired for this very edge to the jobserver
-  if (jobserver_.Enabled()) {
+  if (edge->acquired_job_server_token_) {
     jobserver_.Release();
   }
 
diff --git a/src/graph.h b/src/graph.h
index 314c442..f908d75 100644
--- a/src/graph.h
+++ b/src/graph.h
@@ -227,6 +227,7 @@ struct Edge {
   bool deps_loaded_ = false;
   bool deps_missing_ = false;
   bool generated_by_dep_loader_ = false;
+  bool acquired_job_server_token_ = false;
   TimeStamp command_start_time_ = 0;
 
   const Rule& rule() const { return *rule_; }
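
The effect of the per-edge flag can be illustrated with a tiny stand-alone model. This is a sketch under simplifying assumptions, not Ninja code: SketchEdge, SketchPool, StartEdge and FinishEdge are hypothetical names, the token is taken directly in StartEdge (in the PR the acquisition happens elsewhere), and the flag is cleared on release, which the diff above does not do.

```cpp
#include <cassert>

// Hypothetical stand-in for Ninja's Edge; only the token flag is modelled.
struct SketchEdge {
  bool acquired_job_server_token_ = false;
};

// Hypothetical stand-in for the jobserver: a plain counter of free tokens.
struct SketchPool {
  int available;
  explicit SketchPool(int n) : available(n) {}
};

// Loosely mirrors FindWork(): the edge is handed out to run, so it takes a
// token and remembers that it did.
void StartEdge(SketchPool* pool, SketchEdge* edge) {
  assert(pool->available > 0);
  --pool->available;
  edge->acquired_job_server_token_ = true;
}

// Loosely mirrors EdgeFinished(): only an edge that took a token gives one
// back, so edges finished without running leave the pool unchanged.
void FinishEdge(SketchPool* pool, SketchEdge* edge) {
  if (edge->acquired_job_server_token_) {
    edge->acquired_job_server_token_ = false;
    ++pool->available;
  }
}
```

With an unconditional release in FinishEdge(), an edge that is finished without ever being started would push the pool above its initial size, which matches the "176 tokens available; should be 36" symptom reported above.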

@hundeboll
Contributor Author

For fully remote builds, we go: 5m54s -> 4m56s which is even nicer.

So this PR looks really good to me.

Nice numbers.

NOTE: I wrote a Python script to setup and serve the tokens, then invoking Ninja, see d6c0c1a (probably not the final version).

Looks good. I've also written something similar, albeit less feature rich:
https://lore.kernel.org/openembedded-core/20240404111613.2574424-6-martin@geanix.com/

@hundeboll
Contributor Author

EDIT:
This call - https://github.com/hundeboll/ninja/blob/be47d5de9312f486425c12e10b82b35d42a0d273/src/build.cc#L263 - is in EdgeMaybeReady() which is used when other nodes complete or dyndep discovery kicks in and the output (dependent edge) doesn't need to directly be built. This is an EdgeFinished() call that doesn't correlate to any FindWork()

Uf, good catch. I'll have to look into how the tokens are released again. Suggestions are welcome...

@jdrouhard
Contributor

EDIT:
This call - https://github.com/hundeboll/ninja/blob/be47d5de9312f486425c12e10b82b35d42a0d273/src/build.cc#L263 - is in EdgeMaybeReady() which is used when other nodes complete or dyndep discovery kicks in and the output (dependent edge) doesn't need to directly be built. This is an EdgeFinished() call that doesn't correlate to any FindWork()

Uf, good catch. I'll have to look into how the tokens are released again. Suggestions are welcome...

See my edit, really small diff that fixes the problem!

Implement proper testing of the MAKEFLAGS parsing, and the token
acquire/release logic in the jobserver class.
@hundeboll
Contributor Author

See my edit, really small diff that fixes the problem!

Nice.

See my edit, really small diff that fixes the problem!

Thanks! Pushed the change with some comments added :)

@mcprat
Contributor

mcprat commented Jul 22, 2024

@hundeboll good job so far

I have left out ... the older "pipe"[3] methods from here, as I don't need those.

In the openwrt project, we have been using stefan's PR, but recently I have noticed issues with it. I want to test this one but we actually need the "UNIX pipe" method instead of a fifo. IMO this is not ready to merge until you implement both forms. Also I would not refer to the other method as the "old" one, which suggests it is deprecated or less effective when it isn't...

since it appears your system uses the fifo by default, you have to pass --jobserver-style=pipe to test it.
otherwise, testing jobserver client implementations for ninja is top of my todo list right now, so I'll be testing this intensely once you add support for the simple pipe method, if I don't figure it out myself first.

(it should be easy for you, since you have gotten this far 😃 )

Comment on lines +75 to +86
// Ignore the argument if the length or the value of the type value doesn't
// match the requested type (i.e. "fifo" on posix or "sem" on windows).
if (strlen(type) != static_cast<size_t>(str_colon - str_begin) ||
    strncmp(str_begin, type, str_colon - str_begin)) {
  Warning("invalid jobserver type: got %.*s; expected %s",
          str_colon - str_begin, str_begin, type);
  return false;
}

// Advance the string pointer to just after the : character
str_begin = str_colon + 1;

Contributor

I believe this is unnecessary. I think it should be the job of a different function, platform-specific, to make sure the value is sane and let this function simply collect the value and make sure it's not empty.

According to the docs for the Windows jobserver implementation, apparently there is no colon at all (or at least it's not guaranteed).

If you really want to detect whether the value is sane here, this function should be setting the type value instead of being told which one as an input, i.e.:

A. if a \ is found, type = sem
B. if a : is found, type = fifo
C. if a , is found, type = pipe
D. if none of these is found, assume sem

then you can match string values instead of lengths for the type in the next function to handle the jobserver string value, if you don't handle it all here
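
Rules A to D above could be sketched like this. The enum and the function name GuessJobserverType are hypothetical and only illustrate the suggestion; they are not part of the PR, and the rules are applied strictly in the order listed.

```cpp
#include <string>

// Hypothetical type tag for the three jobserver flavours mentioned above.
enum class JobserverType { kSem, kFifo, kPipe };

// Hypothetical helper: guesses the jobserver type from the value of
// --jobserver-auth= by applying rules A-D in order.
JobserverType GuessJobserverType(const std::string& value) {
  if (value.find('\\') != std::string::npos) {
    return JobserverType::kSem;  // A: a backslash suggests a semaphore name
  }
  if (value.find(':') != std::string::npos) {
    return JobserverType::kFifo;  // B: "fifo:PATH"
  }
  if (value.find(',') != std::string::npos) {
    return JobserverType::kPipe;  // C: "R,W" file descriptor pair
  }
  return JobserverType::kSem;  // D: nothing found, assume a semaphore
}
```

Because a semaphore name may itself contain a colon or a comma, a heuristic like this can misclassify some values, which is the concern raised in the follow-up comment.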

Contributor

but like I said, sanity checks should really not be done here. It is possible although unlikely for a colon or comma to be in the name of the semaphore object itself.

@jhasse
Collaborator

jhasse commented Jul 23, 2024

I want to test this one but we actually need the "UNIX pipe" method instead of a fifo.

Why?

@mcprat
Contributor

mcprat commented Jul 23, 2024

I want to test this one but we actually need the "UNIX pipe" method instead of a fifo.

Why?

My machine has Make 4.3. I suppose I can compile my own Make for myself, but at some point, if we want to incorporate this for the entire openwrt community (and other projects) to see how it goes it would have to have support for some older versions, and otherwise it would default to no jobserver support for many users and maybe even a buildbot, which means spawning way too many jobs that would slow things down.

For example, when ninja is built on my system the jobs number defaults to 6. My machine has only 4 cores. Now imagine ninja being executed by make -j 4 with 6 extra jobs alongside whatever else. The load average easily goes beyond 12 depending on what ninja is building.

We can figure out a patch after the fact, but why not handle it upstream while it is being worked on? I find it strange to add support only for the latest and greatest version of Make first and then add a small change later to make it work for everyone when it can all be done together.

@jhasse
Collaborator

jhasse commented Jul 23, 2024

So you're running an "old" make version which is why we refer to the pipe version as the "old" method ;)

[...] which suggests it is deprecated or less effective when it isn't...

I'm not an expert but from the comments I read on both PRs it is less effective.

We're very wary of changes that increase the complexity of Ninja, so a PR that implements both methods while one of them is technically superior and results in less code in Ninja (and to my understanding that's the case for fifo) is very unlikely to get merged.

@eli-schwartz

In the openwrt project, we have been using stefan's PR, but recently I have noticed issues with it. I want to test this one but we actually need the "UNIX pipe" method instead of a fifo. IMO this is not ready to merge until you implement both forms. Also I would not refer to the other method as the "old" one, which suggests it is deprecated or less effective when it isn't...

* New feature: The --jobserver-style command line option and named pipes
  A new jobserver method is used on systems where mkfifo(3) is supported.
  This solves a number of obscure issues related to using the jobserver
  and recursive invocations of GNU Make.  This change means that sub-makes
  will connect to the jobserver even if they are not marked as recursive.
  It also means that other tools that want to participate in the jobserver
  will need to be enhanced as described in the GNU Make manual.
  You can force GNU Make to use the simple pipe-based jobserver (perhaps if
  you are integrating with other tools or older versions of GNU Make) by
  adding the '--jobserver-style=pipe' option to the command line of the
  top-level invocation of GNU Make, or via MAKEFLAGS or GNUMAKEFLAGS.
  To detect this change search for 'jobserver-fifo' in the .FEATURES variable.

Sure sounds like the GNU Make documentation claims the anonymous pipe mode is less effective...

How about: https://www.gnu.org/software/make/manual/make.html#POSIX-Jobserver

On POSIX systems the jobserver is implemented in one of two ways: on systems that support it, GNU make will create a named pipe and use that for the jobserver. [...]

If the system doesn’t support named pipes, or if the user provided the --jobserver-style option and specified ‘pipe’, then the jobserver will be implemented as a simple UNIX pipe. [...]

So if possible, GNU Make prefers the fifo based jobserver. Sure sounds like the anonymous pipes are soft deprecated...

...

Overall, I do understand that you want support for anonymous pipes for backwards compatibility with older make. I just don't see why you are arguing that the need to support legacy systems is synonymous with being equally effective and equally un-deprecated. Anonymous pipes are clearly both deprecated and less effective, even if they are sometimes your only available option.

@mcprat
Contributor

mcprat commented Jul 24, 2024

I understand your points. of course, the difference between the methods is not arbitrary, there was a reason for it, but the keyword is "obscure", and every major project that builds with Makefiles has been writing their Makefiles, in some cases for decades, in a way that handles the noted obscure caveat in order to support every version of Make up until the most recent one 2 years ago. Unless the caveat is not being handled, the two methods ought to be considered functionally equivalent, especially since Make does not care which order the tokens are received and returned as long as it is the same token, and I can't imagine there being some kind of performance difference...

I don't mean to be argumentative on this point, but the way I see deprecated options is that it would be labeled as "deprecated" or otherwise planned to no longer be recognized after a certain point in time. The people at GNU often make this perfectly clear instead of just "suggesting" it.

Speaking of complexity, I think it would be a fairly small diff to accommodate support for the "old" way. If you want to prevent excess complexity from making its way into ninja's source, then put some focus on my review comment.

I understand it may not be urgent from your perspective, and that's fine as I do not make decisions here, but please be aware that from the perspective of any downstream project, without "full" support, the feature you're actually adding is instability in the form of code variability, where the code affects one group of users one way and another group of users another way. Otherwise, if you're certain about not handling it in this PR, I would be happy to open my own PR here afterward to add support for the older method if no one else wants to do so. However, it would be nice to hear whether you support the idea in general, at least, so that the next version would have "full" support, regardless of what we do in this PR.

Comment on lines +22 to +43
bool Jobserver::Acquire() {
  // The first token is implicitly handed to the ninja process, so don't
  // acquire it from the jobserver
  if (token_count_ == 0 || AcquireToken()) {
    token_count_++;
    return true;
  }

  return false;
}

void Jobserver::Release() {
  assert(token_count_ >= 1);
  token_count_--;

  // Don't return first token to the jobserver, as it is implicitly handed
  // to the ninja process
  if (token_count_ > 0) {
    ReleaseToken();
  }
}

Contributor

these functions are so small, the logic here might as well be combined with the implementation-specific function that they call, unless there is some reason to split the functions between private and public members of the class. If so please correct me.

If that happens, there's only one function left. Merging all the functions in jobserver.cc to the platform-specific files would also allow simplification of the parse function, making it smaller for each build-type and further reduce complexity.

Contributor Author

I put this logic in a common place as it is quite important (and a bit non-intuitive) to not acquire/release the first token. Having it duplicated in platform specific implementations increases the risk of getting it wrong.
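
One way to picture the "common place" argument is a base class that owns the implicit-token rule while each platform supplies only the raw acquire and release. This is a sketch, not the PR's layout: the class names TokenAccounting and FakeBackend and the use of virtual functions are assumptions made here for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical base class: the implicit-token rule lives here once, and a
// platform backend only provides AcquireToken() and ReleaseToken().
class TokenAccounting {
 public:
  virtual ~TokenAccounting() {}

  bool Acquire() {
    // The first token is implicit, so it is never requested from the server.
    if (token_count_ == 0 || AcquireToken()) {
      ++token_count_;
      return true;
    }
    return false;
  }

  void Release() {
    assert(token_count_ >= 1);
    --token_count_;
    // The implicit token is never handed back to the server either.
    if (token_count_ > 0) {
      ReleaseToken();
    }
  }

 protected:
  virtual bool AcquireToken() = 0;
  virtual void ReleaseToken() = 0;

 private:
  size_t token_count_ = 0;
};

// Hypothetical in-memory backend standing in for a fifo or a semaphore.
class FakeBackend : public TokenAccounting {
 public:
  explicit FakeBackend(int tokens) : pool(tokens) {}
  int pool;  // tokens left on the fake server

 protected:
  bool AcquireToken() override {
    if (pool == 0) {
      return false;
    }
    --pool;
    return true;
  }
  void ReleaseToken() override { ++pool; }
};
```

With this split, a new platform backend cannot get the first-token rule wrong because it never sees it; the cost is one extra layer of indirection, which is the trade-off being debated in this thread.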

Contributor

I'm not really following... I think comments are enough to point out the importance and lack of intuition of those lines. Get it right once and if it needs to be adjusted for a new platform it would start with a copy/paste of the previous one.

Comment on lines +64 to +65
void Jobserver::ReleaseToken() {
  char token = '+';
Contributor

from the Make manual:

It’s important that when you release the job slot, you write back the same character you read. Don’t assume that all tokens are the same character; different characters may have different meanings to GNU make.

I have not looked at Make's code relevant to this yet, so I don't know the significance. It might be that the character being anything other than a + is a rare occurrence... maybe someone else can clarify how important it is.

I will follow up with another comment with more information as I continue to play with the PR's changes and review Make's source.

Contributor Author

Yeah, I've noticed the same from the Make manual. It might make sense to store the acquired token in the (recently added) Edge::acquired_job_server_token_ member. I'll look into it.

Contributor

After looking at GNU Make source history, I changed my mind. I don't think this is important. There is no sign that they are going to introduce new characters that have special meaning, and if they somehow do that, it would be pretty easy to add support for it later. Let's save the idea for the future...

};

/// CommandRunner is an interface that wraps running the build
/// subcommands. This allows tests to abstract out running commands.
/// RealCommandRunner is an implementation that actually runs commands.
struct CommandRunner {
  virtual ~CommandRunner() {}
-  virtual size_t CanRunMore() const = 0;
+  virtual size_t CanRunMore(bool jobserver_enabled) const = 0;
Contributor

I suggest not changing this method's signature at all, to minimize changes.

Since the flag value is not going to change during a build, you can simply pass plan_.JobserverEnabled() to the RealCommandRunner constructor to record its value, and have RealCommandRunner::CanRunMore() return SIZE_MAX when the flag is enabled.

This should work identically, without having to change the DryCommandRunner and TestCommandRunner interfaces / implementations.
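
A minimal sketch of that suggestion, assuming a simplified runner: SketchCommandRunner is a hypothetical stand-in, and Ninja's actual RealCommandRunner has a different constructor and also takes other limits into account. Only the idea of recording the flag once and keeping the CanRunMore() signature unchanged is shown.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical, trimmed-down runner used only to illustrate the idea.
class SketchCommandRunner {
 public:
  SketchCommandRunner(size_t parallelism, bool jobserver_enabled)
      : parallelism_(parallelism), jobserver_enabled_(jobserver_enabled) {}

  // Called when a command is started.
  void Started() { ++running_; }

  // Signature unchanged: the jobserver flag was recorded at construction.
  size_t CanRunMore() const {
    if (jobserver_enabled_) {
      // The jobserver limits parallelism, so no local cap is applied.
      return SIZE_MAX;
    }
    if (running_ >= parallelism_) {
      return 0;
    }
    return parallelism_ - running_;
  }

 private:
  size_t parallelism_;
  bool jobserver_enabled_;
  size_t running_ = 0;
};
```

Callers keep calling CanRunMore() with no arguments, so runners that never see a jobserver would not need to change.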

Contributor Author

Good point.

}

const char* jobserver_auth = "--jobserver-auth=";
const char* str_begin = strstr(makeflags, jobserver_auth);
Contributor

According to https://www.gnu.org/software/make/manual/html_node/Job-Slots.html:

Be aware that the MAKEFLAGS variable may contain multiple instances of the --jobserver-auth= option. Only the last instance is relevant.

Hence the code should probably try to reflect that.

Also it looks like there is no specific prefix on Win32 for the semaphore name (https://www.gnu.org/software/make/manual/html_node/Windows-Jobserver.html) but that --jobserver-style=sem should be specified in MAKEFLAGS as well.

I am also wary of multiple --jobserver-style values in the input string. I would assume that only the last one should be followed, and that --jobserver-style=pipe should be a condition to ignore the jobserver master.

Given this, I suggest creating a standalone static function that parses a MAKEFLAGS string value as input, and returns a struct describing the wanted jobserver type + argument (fifo path or semaphore name) from it (it doesn't have to contain Win32/Posix specific code paths), so this can be unit-tested properly with lots of edge cases.
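
Such a standalone parser might look roughly like the sketch below. JobserverConfig and ParseMakeflags are hypothetical names, and the sketch assumes that options in MAKEFLAGS are separated by whitespace and that values contain no escaped spaces; it keeps only the last occurrence of each option, as the manual text quoted above requires for --jobserver-auth=.

```cpp
#include <sstream>
#include <string>

// Hypothetical result type; the field names are illustrative only.
struct JobserverConfig {
  std::string auth;   // value of the last --jobserver-auth=, empty if absent
  std::string style;  // value of the last --jobserver-style=, empty if absent
};

// Hypothetical parser: splits MAKEFLAGS on whitespace and remembers the last
// value seen for each of the two options. No platform-specific code is used.
JobserverConfig ParseMakeflags(const std::string& makeflags) {
  const std::string kAuth = "--jobserver-auth=";
  const std::string kStyle = "--jobserver-style=";
  JobserverConfig config;
  std::istringstream words(makeflags);
  std::string word;
  while (words >> word) {
    if (word.compare(0, kAuth.size(), kAuth) == 0) {
      config.auth = word.substr(kAuth.size());  // later instances override
    } else if (word.compare(0, kStyle.size(), kStyle) == 0) {
      config.style = word.substr(kStyle.size());
    }
  }
  return config;
}
```

Because the function only maps a string to a struct, edge cases such as repeated options, a missing value or an unexpected style can be covered by plain unit tests on any platform.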

Contributor Author

Good point.

@mcprat
Contributor

mcprat commented Aug 2, 2024

@jhasse @hundeboll I have been working on my own version of this PR for a while. It would take a long time to explain and demonstrate some of the changes I made. Would it be appropriate to create my own PR with your commit, adding myself as a signoff, and we can review each other's ideas directly?

This could expedite the process of actually getting the feature accepted and merged, in case that the ninja developers are looking to hurry up the process instead of putting it off until the release after the next upcoming release...

Also I'm wondering, is there a timeline for versions or is it just based on how many changes since last release?

@hundeboll
Contributor Author

My machine has Make 4.3.

I'm actually surprised that Debian / Ubuntu hasn't upgraded make in 1½ years by now. Maybe we should try to bug them a bit:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1029106

@hundeboll
Contributor Author

@jhasse @hundeboll I have been working on my own version of this PR for a while. It would take a long time to explain and demonstrate some of the changes I made. Would it be appropriate to create my own PR with your commit, adding myself as a signoff, and we can review each other's ideas directly?

Sure, go ahead. My goal is to add jobserver support in OpenEmbedded, so adding it to ninja is just a means to that.

7 participants