Cuda migration #122
base: master
Conversation
cava/samples/cudart/cudart.cpp (Outdated)
```diff
 }

 __host__ cudaError_t CUDARTAPI
 cudaMemcpy(void *dst, const void *src, size_t count, enum cudaMemcpyKind kind)
 {
     ava_argument(dst) {
         if (kind == cudaMemcpyHostToDevice) {
-            ava_opaque;
+            ava_handle;
```
As this change requires annotating devptrs as handles, which don't support the offsetting (+/-) operation, I think we should merge this after we redesign the handle.
Do you know of any test applications that this breaks?
There is another issue. Devptrs should be handles everywhere or nowhere. I don't think this spec makes the devptrs in arguments to kernels handles.
Do you know of any test applications that this breaks?
Not for the Rodinia CUDA benchmarks. But for the supported AI frameworks (like TF), copying memory at offsets is normal behavior.
There is another issue. Devptrs should be handles everywhere or nowhere. I don't think this spec makes the devptrs in arguments to kernels handles.
@arthurp The spec for cuLaunchKernel already checked if the argument was a handle and applied the annotation accordingly. Is this not doing the right thing?
```
CUresult CUDAAPI
cuLaunchKernel(CUfunction f,
               unsigned int gridDimX,
               unsigned int gridDimY,
               unsigned int gridDimZ,
               unsigned int blockDimX,
               unsigned int blockDimY,
               unsigned int blockDimZ,
               unsigned int sharedMemBytes,
               CUstream hStream,
               void **kernelParams,
               void **extra)
{
    ava_argument(hStream) ava_handle;
    ava_argument(kernelParams) {
        ava_in; ava_buffer(ava_metadata(f)->func->argc);
        ava_element {
            // FIXME: use the generated index name in the spec to
            // reference the outer loop's loop index at this moment.
            if (ava_metadata(f)->func->args[__kernelParams_index_0].is_handle) {
                ava_type_cast(void *);
                ava_buffer(ava_metadata(f)->func->args[__kernelParams_index_0].size);
                ava_element ava_handle;
            } else {
                ava_type_cast(void *);
                ava_buffer(ava_metadata(f)->func->args[__kernelParams_index_0].size);
            }
        }
    }
    ava_argument(extra) {
        ava_in; ava_buffer(__helper_launch_extra_size(extra));
#warning The buffer size below states that every kernelParams[i] is 1 byte long.
        ava_element ava_buffer(1);
    }
}
```
If you haven't touched that piece of spec, then it treats devPtr kernel args as non-handle.
What should it look like? It seems like it checks if each argument is a handle and if it is adds the ava_handle annotation. Is this doing something different?
cudaLaunchKernel spec:

```
ava_argument(args) {
    ava_in;
    ava_buffer(ava_metadata(func)->func->argc);
    ava_element {
        // FIXME: use the generated index name in the spec to
        // reference the outer loop's loop index at this moment.
        if (ava_metadata(func)->func->args[__args_index_0].is_handle) {
            ava_type_cast(void *);
            ava_buffer(ava_metadata(func)->func->args[__args_index_0].size);
            // ava_element ava_handle;
        } else {
            ava_type_cast(void *);
            ava_buffer(ava_metadata(func)->func->args[__args_index_0].size);
        }
    }
}
```
Oh, it's commented out now. Should it be uncommented?
Uncommented `ava_element ava_handle` and the behavior is unchanged.
If you haven't touched that piece of spec, then it treats devPtr kernel args as non-handle.
@yuhc can you elaborate on this?
Force-pushed from 3313d5e to b63b9cf
Force-pushed from b63b9cf to f5e3728
Force-pushed from f5e3728 to 6c955d7 ("Memory allocations must be handles")
This needs a range map to handle pointer arithmetic correctly. In what part of the source tree should the files for the range map go?
Authors: @ArnavMohan and @arthurp
Added annotations to cudaMalloc, cudaFree for migration support.
Memory allocations must be handles.