All notable changes to this project will be documented in this file. This project adheres to Semantic Versioning.
- Added compatibility with HIP
- Added `cu::Device::getArch()`
- Added `cu::DeviceMemory` constructor to create a non-owning slice of another `cu::DeviceMemory` object
- Added `cu::DeviceMemory::memset()`
- Added `cu::Stream::memsetAsync()`
- Added `nvml::Device::getPower()`
- Added `cu::Stream::memcpyHtoD2DAsync()`, `cu::Stream::memcpyDtoHD2Async()`, and `cu::Stream::memcpyDtoD2DAsync()`
- Added `cu::DeviceMemory::memset2D()` and `cu::Stream::memset2DAsync()`
- Added `cufft::FFT1DR2C` and `cufft::FFT1DC2R`
- Added `cu::Device::getOrdinal()`
- `cu::Context::{getCurrent, popCurrent, getDevice}` are no longer static
- `inline_local_includes` is now more robust: it properly handles commented includes and respects the location of an include in the original source file
- Upgrade C++ standard to C++14
- Upgrade Catch2 to version v3.6.0
- `target_embed_source` is now more robust: it properly tracks dependencies and runs again whenever any of them changes
- Expanded tests to cover the new 2D memory operations and FFT support
- Removed the `context` from `nvml::Device` constructors
- Added `cu::Function::occupancyMaxActiveBlocksPerMultiprocessor()`
- Added `cu::Device::getUUID()`
- Added initial cudawrappers::nvml target
- Added `nvrtc::findIncludePath()`
- Added `nvml::Device::getClock`
- `target_embed_source` will now automatically inline local header files
- Removed deprecated `cu::Context::setSharedMemConfig`
- Added `cu::Context::getDevice()`
- Added `cu::Module` constructor with `CUjit_option` map argument
- Added `cu::DeviceMemory::size`
- Added `cu::HostMemory::size`
- Added `cu::Function::name`
- Added `cu::Stream::getContext()`
- Added overloaded versions of `cu::Stream::memcpyDtoHAsync` and `cu::Stream::memcpyDtoHAsync` that take `CUdeviceptr` as an argument
- Added `cu::Function::setAttribute()`
- Fixed the `cu::Module(CUmodule&)` constructor
- `cu::Function::getAttribute` is now const
- The `cu::DeviceMemory` constructor now works with `size == 0`
- Fix compatibility with C++20 and C++23
- Fix `cu::HostMemory` constructor for registered memory
- Fix `cu::DeviceMemory` `operator T *()` for managed memory
- Fix `cu::Stream::memAllocAsync` so that it returns a `cu::DeviceMemory` with initialized size
- Made the library header only
- Improved CMake configuration
- Moved asynchronous `::zero` from `cu::Device` to `cu::Stream`
- Replaced `include_cuda_code` helper with `target_embed_source`
- Changed some arguments from native to wrapped type
- cufft wrappers for 1D and 2D complex-to-complex FFTs
- `cu::HostMemory` constructor for pre-allocated memory
- `cu::DeviceMemory` constructor for managed memory
- `cu::Stream::cuMemPrefetchAsync` for pre-fetching of managed memory
- `cu::Stream::memAllocAsync` and `cu::Stream::memFreeAsync`
- `cu::Context::getFreeMemory` and `cu::Context::getTotalMemory`
- The `vector_add` example has now become a test
- Added `lib` prefix to shared libraries
- `getDevice` function of `cu::Context`, use `cu::Device` constructor instead
- CTest for testing
- nvtx library
- bump2version
- Miscellaneous improvements to CMake and CI
- `cu::Source` class. Use `nvrtc::Program` instead.
- Commented out code that was not used (anymore)
- API documentation
- Fixed build issues of `vector_add` example in `tests`
- Improved linter rules
- Moved usage examples to separate repositories
- Several best practices were implemented, such as citation file, user and developer documentation, linters, formatters, pre-commit hooks, GitHub workflows, badges, and issue and pull request templates.
- The name of the repository and the library are now `cudawrappers`.
- The folder structure has changed to better separate header and source files.
- First release with existing code.