Incorrect machine core detection on Windows platforms #1469

Open
anton-ubi opened this issue Aug 14, 2024 · 0 comments
Labels
bug Something isn't working

Comments

anton-ubi (Contributor) commented Aug 14, 2024

Describe the bug
On Windows platforms, hosts cannot pick up jobs because the number of available processors is detected incorrectly.

The following error is raised repeatedly:

`CoreReservationFailureException: Not launching, insufficient hyperthreading cores to reserve based on frameCores (0 < 36)`

Details:
The `reserveHT` method of the RQD `Machine` class relies on the `__procs_by_physid_and_coreid` attribute, which is not populated on Windows platforms: it is only filled on Linux.

See:

To Reproduce

  • Submit a job so it is dispatched to a machine running on Windows.
  • The job must request a whole (non-fractional) number of cores, which is the regular case; otherwise `reserveHT` is skipped.
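
To make the failure mode concrete, here is a minimal, simplified sketch of the check described above. It is not the RQD code: `reserve_ht_sketch`, the tuple-based `reserved` set, the `{physid: {coreid: [proc_id, ...]}}` mapping shape, and the assumption that `frameCores` arrives in hundredths of a core are all hypothetical. It shows why an empty `__procs_by_physid_and_coreid`-style mapping yields `0 < 36` for a 36-core request, and why a fractional request never reaches the check.

```python
# Hypothetical, simplified model of RQD's hyperthreading reservation check.
# Not the real rqmachine.py code; names and units are illustrative only.


class CoreReservationFailureException(Exception):
    """Stand-in for the RQD exception of the same name."""


def reserve_ht_sketch(frame_cores, procs_by_physid_and_coreid, reserved=None):
    """Reserve whole cores for a frame.

    Assumes frame_cores is an int in hundredths of a core (100 == 1 core) and
    that the mapping has the shape {physid: {coreid: [proc_id, ...]}}.
    Returns None for fractional requests, which skip reservation entirely.
    """
    reserved = reserved or set()
    if frame_cores % 100:
        return None  # fractional cores: no hyperthreading reservation
    query_cores = frame_cores // 100

    # Every (physid, coreid) pair that is not already reserved by another frame.
    available = [
        (physid, coreid)
        for physid, cores in sorted(procs_by_physid_and_coreid.items())
        for coreid in sorted(cores)
        if (physid, coreid) not in reserved
    ]

    # On a platform where the mapping was never filled, `available` is empty,
    # so any whole-core request fails here.
    if len(available) < query_cores:
        raise CoreReservationFailureException(
            "Not launching, insufficient hyperthreading cores to reserve "
            "based on frameCores (%d < %d)" % (len(available), query_cores)
        )
    return available[:query_cores]


if __name__ == "__main__":
    try:
        reserve_ht_sketch(3600, {})  # empty mapping, as on Windows
    except CoreReservationFailureException as exc:
        print(exc)
```

With an empty mapping, `reserve_ht_sketch(3600, {})` raises with `(0 < 36)` in its message, matching the shape of the reported error, while a fractional request such as `3650` returns `None` before any counting happens.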

Expected behavior
The job should be picked up without raising a CoreReservationFailureException error.

Version Number
Observed on 0.22.0. Judging by the current state of master, nothing relevant has changed since.

Additional context
Relates to #1171
A fix is already in progress: #1468
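
One possible direction, sketched here purely for illustration and not necessarily what #1468 implements, is to fill `__procs_by_physid_and_coreid` (or an equivalent structure) with a platform-independent fallback whenever no Linux topology was parsed. The helper names `build_fallback_core_map` and `ensure_core_map` and the `{physid: {coreid: [proc_id, ...]}}` shape are assumptions; only `os.cpu_count()` is standard-library API.

```python
import os


def build_fallback_core_map(logical_count=None):
    """Hypothetical fallback for hosts where no per-core topology was parsed.

    Treats each logical processor as its own core on a single physical
    package "0", giving the assumed shape {physid: {coreid: [proc_id, ...]}}.
    Real socket and hyperthread-sibling information is lost.
    """
    if logical_count is None:
        # os.cpu_count() may return None, so fall back to a single processor.
        logical_count = os.cpu_count() or 1
    return {"0": {str(i): [str(i)] for i in range(logical_count)}}


def ensure_core_map(existing, logical_count=None):
    """Keep a populated map (e.g. one parsed from /proc/cpuinfo on Linux);
    otherwise substitute the flat fallback."""
    return existing if existing else build_fallback_core_map(logical_count)


if __name__ == "__main__":
    print(ensure_core_map({}, logical_count=4))
```

The trade-off is that every logical processor is treated as a separate core, so reservations no longer keep hyperthread siblings together; `psutil.cpu_count(logical=False)` could help if the physical-core count matters.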

anton-ubi added the bug label Aug 14, 2024
DiegoTavares added a commit that referenced this issue Dec 4, 2024
**Link the Issue(s) this Pull Request is related to.**
#1469

**Summarize your change.**
Fix an error on Windows platforms where a submitted job could not be
picked up because available processors were detected incorrectly.

The `reserveHT` method of the RQD `Machine` class relies on a
`__procs_by_physid_and_coreid` attribute that is not correctly filled
when on a Windows platform.

See:
- `reserveHT` relying on `__procs_by_physid_and_coreid`:
https://github.com/AcademySoftwareFoundation/OpenCue/blob/master/rqd/rqd/rqmachine.py#L842
- `__procs_by_physid_and_coreid` being filled only if on a Linux
platform:
https://github.com/AcademySoftwareFoundation/OpenCue/blob/master/rqd/rqd/rqmachine.py#L613

---------

Signed-off-by: Diego Tavares <dtavares@imageworks.com>
Co-authored-by: Diego Tavares <dtavares@imageworks.com>
Co-authored-by: Kern Attila GERMAIN <5556461+KernAttila@users.noreply.github.com>