
Runtime kernel compilation in CUDA 7 #16

Closed · moon6pence opened this issue Feb 13, 2015 · 10 comments

moon6pence (Contributor) commented:

Reference: http://www.soa-world.de/echelon/2015/01/cuda-7-runtime-compilation.html

CUDA 7 is now in release-candidate state, and it has a very interesting feature: runtime kernel compilation.
It works just like OpenCL: we can pass the kernel source as a string and get a CuFunction object back.

I think this is very good news for the CUDArt package; we could write CUDA kernels much more easily, instead of using the external nvcc compiler to produce PTX files.

Another use is metaprogramming to generate kernels.
For example, @kk49 presented an interesting concept for writing GPU code for arithmetic operations: https://github.com/kk49/julia-delayed-matrix

julia-delayed-matrix generates PTX code directly, but we could do better by generating the .cu code as a string, as sketched below.
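As a hypothetical illustration of that idea (the function name and kernel layout are mine, not from julia-delayed-matrix), the source for an elementwise kernel can be assembled with plain string interpolation:

```julia
# Hypothetical sketch: build CUDA C source for an elementwise kernel as a string.
function elementwise_source(name, op)
    """
    extern "C" __global__
    void $(name)(const float *a, const float *b, float *c, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) c[i] = a[i] $(op) b[i];
    }
    """
end

src = elementwise_source("vadd", "+")  # ready to hand to the runtime compiler
```

Every specialization is just another string, so the delayed-expression machinery would reduce to ordinary source generation.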

I'm an NVIDIA registered developer and have the fresh CUDA 7.0 RC. The first task is to work out how the runtime-compilation API works, and we will need a corresponding gen-7.0 setup.

moon6pence (Contributor, Author) commented:

Runtime compilation is simple:

  • We can get PTX code by using a series of functions in the NVRTC library.
  • Create a CUmodule from the compiled PTX with the `cuModuleLoadDataEx` function.
  • The remaining process is the same as before (see the sketch below).
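That flow can be sketched directly with ccall, independent of any CUDArt wrappers. Everything below is illustrative and assumed, not CUDArt's actual API: library sonames, the vadd kernel, and the omission of error checks are mine (Julia 0.4-era syntax):

```julia
# Raw sketch of the CUDA 7 runtime-compilation flow via ccall.
# Assumes libnvrtc/libcuda are on the loader path; error checks omitted.
src = """
extern "C" __global__ void vadd(const float *a, const float *b, float *c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}
"""

# 1. Compile the source string to PTX with NVRTC.
prog = Ref{Ptr{Void}}(C_NULL)
ccall((:nvrtcCreateProgram, "libnvrtc"), Cint,
      (Ref{Ptr{Void}}, Cstring, Cstring, Cint, Ptr{Cstring}, Ptr{Cstring}),
      prog, src, "vadd.cu", 0, C_NULL, C_NULL)
ccall((:nvrtcCompileProgram, "libnvrtc"), Cint,
      (Ptr{Void}, Cint, Ptr{Cstring}), prog[], 0, C_NULL)
ptxsize = Ref{Csize_t}(0)
ccall((:nvrtcGetPTXSize, "libnvrtc"), Cint,
      (Ptr{Void}, Ref{Csize_t}), prog[], ptxsize)
ptx = Array(UInt8, ptxsize[])
ccall((:nvrtcGetPTX, "libnvrtc"), Cint,
      (Ptr{Void}, Ptr{UInt8}), prog[], ptx)
ccall((:nvrtcDestroyProgram, "libnvrtc"), Cint, (Ref{Ptr{Void}},), prog)

# 2. Load the PTX through the driver API, exactly as with nvcc-generated PTX
#    (assumes an active CUDA context, e.g. one set up during device init).
cumod = Ref{Ptr{Void}}(C_NULL)
ccall((:cuModuleLoadDataEx, "libcuda"), Cint,
      (Ref{Ptr{Void}}, Ptr{UInt8}, Cuint, Ptr{Void}, Ptr{Ptr{Void}}),
      cumod, ptx, 0, C_NULL, C_NULL)
kernel = Ref{Ptr{Void}}(C_NULL)
ccall((:cuModuleGetFunction, "libcuda"), Cint,
      (Ref{Ptr{Void}}, Ptr{Void}, Cstring), kernel, cumod[], "vadd")

# 3. kernel[] is now a CUfunction handle; launching works as before.
```

The only new pieces relative to the PTX-file workflow are the NVRTC calls; everything from `cuModuleLoadDataEx` onward is unchanged.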

However, I got an error while using wrap_cuda.jl to generate the new API bindings.

```
khkim@gpuserver:~/CUDArt/gen-6.5$ julia wrap_cuda.jl
WARNING: [a,b] concatenation is deprecated; use [a;b] instead
 in depwarn at ./deprecated.jl:40
 in oldstyle_vcat_warning at ./abstractarray.jl:26
 in sort_includes at /home/khkim/.julia/v0.4/Clang/src/wrap_c.jl:674
 in run at /home/khkim/.julia/v0.4/Clang/src/wrap_c.jl:749
 in include at ./boot.jl:249
 in include_from_node1 at loading.jl:128
 in process_options at ./client.jl:319
 in _start at ./client.jl:403
WRAPPING HEADER: /usr/local/cuda-6.5/include/driver_types.h
WARNING: Not wrapping Clang.cindex.InclusionDirective   host_defines.h
WARNING: Not wrapping Clang.cindex.MacroInstantiation   __GNUC__
WARNING: Not wrapping Clang.cindex.MacroInstantiation   __GNUC__
WARNING: Not wrapping Clang.cindex.MacroInstantiation   __device_builtin__
WARNING: Not wrapping Clang.cindex.MacroInstantiation   __device_builtin__

.. (warnings for macros)

INFO: Error thrown. Last cursor available in Clang.wrap_c.debug_cursors
ERROR: LoadError: No CLType translation available for: CLType (Clang.cindex.Int128)
 in repr_jl at /home/khkim/.julia/v0.4/Clang/src/wrap_c.jl:281
 in wrap at /home/khkim/.julia/v0.4/Clang/src/wrap_c.jl:486
 in wrap_header at /home/khkim/.julia/v0.4/Clang/src/wrap_c.jl:641
 in run at /home/khkim/.julia/v0.4/Clang/src/wrap_c.jl:760
 in include at ./boot.jl:249
 in include_from_node1 at loading.jl:128
 in process_options at ./client.jl:319
 in _start at ./client.jl:403
while loading /home/khkim/.julia/v0.4/CUDArt/gen-6.5/wrap_cuda.jl, in expression starting on line 70
```

Any ideas? It seems to have a problem with Int128.
I'm using a nightly build of Julia on Ubuntu 14.04.

timholy (Contributor) commented Feb 13, 2015:

While I was developing this package, I made quite a number of improvements to Clang.jl: https://github.com/ihnorton/Clang.jl/commits/master. Looks like it might need a few more.

moon6pence (Contributor, Author) commented:

JuliaInterop/Clang.jl@fab277d

A mapping for Int128 was added to the recent master of Clang.jl, and now it works like a charm.

malmaud (Contributor) commented Sep 24, 2015:

IMO the best way forward is to get support for user-selectable LLVM backends into base Julia and then use the normal Julia codegen to compile marked Julia functions using the nvptx backend.

Transpiling Julia code into NVVM or C would essentially be a reimplementation of the existing codegen - it's easy for basic arithmetic expressions, but it would quickly break down if you want your kernels to be able to use the full expressiveness of Julia (although the standard library would still be off-limits).

I am actually very excited to get this working - afaik, it would make Julia the first high-level language to support writing kernels directly as normal functions in the language without idiosyncratic restrictions on syntax. This could be a powerful selling point of Julia to the scientific communities that have essentially moved all their heavy computation to the GPU, such as the neural network community.

mattcbro commented:
I thought this only supported CUDA 6.5; am I wrong? Could we get 7.5 working, as an example?

timholy (Contributor) commented Oct 29, 2015:

#39

juliohm commented Nov 30, 2015:

@mattcbro I have the same question. I installed CUDA 7.5 from the repositories of my Linux distribution and CUDArt.jl is having trouble with it. The tests are failing for me.

timholy (Contributor) commented Nov 30, 2015:

Did you try @lucasb-eyer's branch in #39?

juliohm commented Nov 30, 2015:

I will check it out, @timholy, thanks.

vchuravy (Contributor) commented:
This is now possible with CUDAnative.jl (see the sketch below). NVRTC support might still be worthwhile, but it would need somebody interested in it to actively work on it. Closing for now.
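For reference, a minimal vector-add sketch in the style of CUDAnative.jl's early examples (launch syntax and constructors may differ in current releases):

```julia
using CUDAdrv, CUDAnative

# The kernel is a normal Julia function, compiled through LLVM's NVPTX backend.
function kernel_vadd(a, b, c)
    i = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    c[i] = a[i] + b[i]
    return nothing
end

dev = CuDevice(0)
ctx = CuContext(dev)

len = 512
a, b = rand(Float32, len), rand(Float32, len)
d_a, d_b = CuArray(a), CuArray(b)   # upload the inputs
d_c = similar(d_a)

@cuda (1, len) kernel_vadd(d_a, d_b, d_c)   # 1 block of `len` threads
c = Array(d_c)                              # download the result
```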
