
Higher-level application #45

Open

pentschev opened this issue Nov 9, 2022 · 5 comments

@pentschev

The PyNVML bindings are great for doing all GPU information management from Python, but they are almost entirely an identical copy of the C API. This can be a barrier for Python users, who need to find out from the NVML API documentation what the API provides, what the appropriate types to pass are, and so on. We currently use PyNVML in both Distributed and Dask-CUDA, but there is also some overlap between the two that leads to code duplication.

I feel one way to reduce code duplication, make things easier for new users, and thus make things better overall, is to provide a "High-level PyNVML library" that takes care of users' basic needs. For example, I would imagine something like the following (but not limited to it) being available (implementation omitted for simplicity):

from typing import Optional, Union


class Handle:
    """A handle to a GPU device.

    Parameters
    ----------
    index: int, optional
        Integer representing the CUDA device index to get a handle to.
    uuid: bytes or str, optional
        UUID of a CUDA device to get a handle to.

    Raises
    ------
    ValueError
        If neither `index` nor `uuid` are specified or if both are specified.
    """
    def __init__(
        self, index: Optional[int] = None, uuid: Optional[Union[bytes, str]] = None
    ):
        ...

    @property
    def free_memory(self) -> int:
        """
        Free memory of the CUDA device.
        """

    @property
    def total_memory(self) -> int:
        """
        Total memory of the CUDA device.
        """

    @property
    def used_memory(self) -> int:
        """
        Used memory of the CUDA device.
        """

There would be more to cover than the above, such as getting the number of available GPUs in the system, whether a GPU currently has a context created, whether a handle refers to a MIG instance or a physical GPU, etc. Additionally, we could provide simple tools that are generally useful, for example a small tool I wrote long ago to measure NVLink bandwidth and peak memory, and whatever else fits in the scope of a "High-level PyNVML library" that can make our users' lives easier.
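For illustration, here is a minimal sketch, assuming we build on top of the existing pynvml/nvidia-ml-py bindings, of how the Handle above and a couple of the extras mentioned here could be implemented. The nvml* calls are the current binding functions; device_count, is_mig, has_context and the _handle attribute are hypothetical names, not an agreed API:

import pynvml
from typing import Optional, Union


def device_count() -> int:
    """Number of GPUs visible to NVML (hypothetical helper name)."""
    pynvml.nvmlInit()
    return pynvml.nvmlDeviceGetCount()


class Handle:
    """Sketch of the proposed handle, wrapping the raw NVML device handle."""

    def __init__(
        self, index: Optional[int] = None, uuid: Optional[Union[bytes, str]] = None
    ):
        if (index is None) == (uuid is None):
            raise ValueError("Specify exactly one of `index` or `uuid`")
        pynvml.nvmlInit()
        if index is not None:
            self._handle = pynvml.nvmlDeviceGetHandleByIndex(index)
        else:
            # Older binding versions may require `uuid` to be bytes.
            self._handle = pynvml.nvmlDeviceGetHandleByUUID(uuid)

    @property
    def free_memory(self) -> int:
        """Free memory of the CUDA device, in bytes."""
        return pynvml.nvmlDeviceGetMemoryInfo(self._handle).free

    # total_memory/used_memory would follow the same pattern, returning
    # .total and .used from nvmlDeviceGetMemoryInfo.

    @property
    def is_mig(self) -> bool:
        """Whether this handle refers to a MIG instance rather than a full GPU."""
        return bool(pynvml.nvmlDeviceIsMigDeviceHandle(self._handle))

    @property
    def has_context(self) -> bool:
        """Whether any process currently holds a compute context on this device."""
        return len(pynvml.nvmlDeviceGetComputeRunningProcesses(self._handle)) > 0

Usage would then be as simple as Handle(index=0).free_memory, with no need to touch the raw NVML calls.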

So to begin this discussion I would like to know how people like @rjzamora and @kenhester feel about this idea. Would this be something that would fit in the scope of this project? Are there any impediments to adding such a library within the scope of this project/repository?

Also cc @quasiben for visibility.

@rjzamora
Collaborator

I strongly agree that it would be valuable to have a "higher-level" (Pythonic) API for users to interact with, ideally one that users can install even on a system without a GPU or CUDA.

A few years ago, we resurrected pynvml to make it easier for RAPIDS/Python users to query basic system information. This project is no longer actively maintained, because the underlying NVML bindings are now directly copied from nvidia-ml-py (which was stale back in 2019, but is now regularly updated by the official NVML team). At this point, the only difference between pynvml and nvidia-ml-py is that pynvml still includes @kenhester’s smi module.


The fate of pynvml has been in limbo for a while now, and so it probably makes sense to figure out if the long-term plan is to officially archive the project in favor of nvidia-ml-py. If the plan is to archive this project, it probably makes more sense to attack the high-level API in a new project (perhaps one that can include an smi module and nvdashboard).

@walternat1ve

Is this project on ice, given it's not on par with nvidia-ml-py?
Why isn't everything I get from nvidia-smi available here in the smi module?

@pentschev
Author

nvidia-ml-py now provides the more up-to-date bindings and is maintained by the same team that maintains the NVML library, so it is the preferred way to access NVML from Python. The PyNVML project was created to fill the gap in Python support for NVML at a time when nvidia-ml-py was not being actively maintained, and it is still here to provide legacy compatibility.

Note that both PyNVML and nvidia-ml-py are wrappers for the NVML library, not for nvidia-smi, and although I think they expose all the NVML functionality that nvidia-smi uses, there are no guarantees.
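For example, here is a minimal sketch of querying NVML directly through the bindings. The nvml* calls below exist in both packages (nvidia-ml-py installs the same pynvml module name); which fields get printed is purely for illustration:

import pynvml  # module name provided by either the pynvml or nvidia-ml-py package

pynvml.nvmlInit()
print("Driver version:", pynvml.nvmlSystemGetDriverVersion())
for i in range(pynvml.nvmlDeviceGetCount()):
    handle = pynvml.nvmlDeviceGetHandleByIndex(i)
    util = pynvml.nvmlDeviceGetUtilizationRates(handle)
    print(f"GPU {i}: {util.gpu}% GPU utilization, {util.memory}% memory utilization")
pynvml.nvmlShutdown()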

@kenhester
Contributor

kenhester commented Feb 10, 2023 via email

@jakirkham
Collaborator

It might also be worth updating the nvidia-ml-py PyPI project description. It mentions Python 2.5, which (I don't think) is relevant any more. This is Python 3+ now, right?
