
Use os.sched_getaffinity instead of os.cpu_count where possible #2160

Open · wants to merge 4 commits into master

Conversation

@bryant1410 (Contributor)

When automatically choosing the number of parallel workers, os.sched_getaffinity is a better choice than the currently used os.cpu_count. The former returns the set of CPUs the process is actually allowed to run on (its affinity mask), rather than the total count on the machine. See this Stack Overflow answer for an explanation.

I changed this codebase to first try os.sched_getaffinity and fall back to os.cpu_count (and then to 1, since the latter can return None). As some form of validation, this is something PyTorch uses as well.

In the (rare) case that os.sched_getaffinity isn't defined (e.g. on non-Linux platforms), I fall back to os.cpu_count. PyTorch's code behaves differently here, defaulting to 0, which I don't think makes sense. In practice the affinity set itself can't be the problem: from what I've read, on Linux a process must always have at least one CPU in its mask.
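The fallback chain described above can be sketched like this (the helper name `default_worker_count` is illustrative, not taken from the PR):

```python
import os


def default_worker_count(fallback: int = 1) -> int:
    """Pick a default number of parallel workers.

    Prefer the CPUs this process may actually run on (this respects
    taskset/cgroup cpusets). os.sched_getaffinity is not available on
    every platform (e.g. macOS/Windows), so fall back to os.cpu_count(),
    which may itself return None.
    """
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))
    return os.cpu_count() or fallback
```

On an unrestricted machine both branches agree; inside a container with a CPU quota expressed as a cpuset, the affinity branch returns the smaller, correct number.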

@savingoyal (Collaborator)

@bryant1410 thanks for the PR! Is there any before/after analysis for this change?

@bryant1410 (Contributor, Author)

> @bryant1410 thanks for the PR! Is there any before/after analysis for this change?

What do you mean?

@romain-intel (Contributor)

AWESOME!! We should probably also change S3_WORKER_COUNT to something like this instead of using 64 all the time (that may be more debatable, but we have had issues with it bringing down the machine).

@savingoyal -- this change is very nice because cpu_count returns the number of CPUs on the entire box, not just the ones available to your container. I just tested this, and the affinity-based call returns the correct value, which is much more likely what you want.

I remember this had come up in the past but I never stopped to make the (clearly simple) change.

@npow
npow commented Dec 5, 2024

Thanks for fixing this! Looks like in 3.13+ we can use process_cpu_count(): https://docs.python.org/3/library/os.html#os.process_cpu_count
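A version-aware variant that prefers os.process_cpu_count() on 3.13+ could look like this (a sketch; the helper name `available_cpus` is illustrative):

```python
import os
import sys


def available_cpus() -> int:
    # os.process_cpu_count() (new in Python 3.13) already honors the
    # process's CPU affinity mask, so prefer it when present.
    if sys.version_info >= (3, 13):
        return os.process_cpu_count() or 1
    # Older interpreters: derive the same number from the affinity mask,
    # falling back to the machine-wide count where affinity is missing.
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))
    return os.cpu_count() or 1
```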

@savingoyal (Collaborator)

> > @bryant1410 thanks for the PR! Is there any before/after analysis for this change?
>
> What do you mean?

I am curious to see the observed change in behavior after this patch.

@bryant1410 (Contributor, Author)

> > > @bryant1410 thanks for the PR! Is there any before/after analysis for this change?
> >
> > What do you mean?
>
> I am curious to see the observed change in behavior after this patch.

Oh, I didn't test it in this repo, but in other projects where I made a similar change, the worker count started reflecting the cgroup/container-assigned CPUs instead of the system total, which is the desired behavior IMHO.
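The difference is easy to observe on Linux by shrinking the affinity mask directly (a demonstration sketch, not code from the PR; requires os.sched_setaffinity, so Linux only):

```python
import os

# Linux-only demonstration: restrict this process to a single CPU and
# compare the two calls. cpu_count() still reports the machine total,
# while sched_getaffinity() reflects the restriction.
original = os.sched_getaffinity(0)
try:
    os.sched_setaffinity(0, {min(original)})  # pin to one CPU
    print("cpu_count:        ", os.cpu_count())
    print("sched_getaffinity:", len(os.sched_getaffinity(0)))  # prints 1
finally:
    os.sched_setaffinity(0, original)  # restore the original mask
```

Running inside a container with a cpuset produces the same effect without any pinning: cpu_count() sees the host, sched_getaffinity() sees the container.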

@savingoyal (Collaborator)

Yeah, at the moment, exclusive CPU ownership doesn't happen by default on Kubernetes (I am not sure if it's even an option with AWS Batch—maybe @npow knows), so os.sched_getaffinity and os.cpu_count will return the same value and should be safe to roll out for now.
