
Runtime kernel compilation in CUDA 7 #16

Closed · moon6pence opened this issue Feb 13, 2015 · 10 comments

moon6pence (Contributor) commented:

Reference: http://www.soa-world.de/echelon/2015/01/cuda-7-runtime-compilation.html

CUDA 7 is now in release-candidate state, and it has a very interesting feature: runtime kernel compilation.
It works just like OpenCL: we can pass the kernel source as a string and get a CuFunction object back.

I think this is very good news for the CUDArt package; we could write CUDA kernels much more easily, instead of using the external nvcc compiler to produce PTX files.

Another use is metaprogramming to generate kernels.
For example, @kk49 presented an interesting concept for writing GPU code for arithmetic operations: https://github.com/kk49/julia-delayed-matrix

julia-delayed-matrix generates PTX code directly, but we could do better by generating the .cu code as a string, as sketched below.
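As a hypothetical illustration of that idea (the function name and kernel layout are mine, not from julia-delayed-matrix), the source for an elementwise kernel can be assembled with plain string interpolation:

```julia
# Hypothetical sketch: build CUDA C source for an elementwise kernel as a string.
function elementwise_source(name, op)
    """
    extern "C" __global__
    void $(name)(const float *a, const float *b, float *c, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) c[i] = a[i] $(op) b[i];
    }
    """
end

src = elementwise_source("vadd", "+")  # ready to hand to the runtime compiler
```

Every specialization is just another string, so the delayed-expression machinery would reduce to ordinary source generation.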

I'm an NVIDIA registered developer and have the fresh CUDA 7.0 RC. The first task is to work out how the runtime-compilation API works, and we will need a corresponding gen-7.0 setup.

moon6pence (Contributor, Author) commented:

Runtime compilation is simple:

  • We can get PTX code by using a series of functions in the NVRTC library.
  • Create a CUmodule from the compiled PTX with the `cuModuleLoadDataEx` function.
  • The remaining process is the same as before (see the sketch below).
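That flow can be sketched directly with ccall, independent of any CUDArt wrappers. Everything below is illustrative and assumed, not CUDArt's actual API: library sonames, the vadd kernel, and the omission of error checks are mine (Julia 0.4-era syntax):

```julia
# Raw sketch of the CUDA 7 runtime-compilation flow via ccall.
# Assumes libnvrtc/libcuda are on the loader path; error checks omitted.
src = """
extern "C" __global__ void vadd(const float *a, const float *b, float *c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}
"""

# 1. Compile the source string to PTX with NVRTC.
prog = Ref{Ptr{Void}}(C_NULL)
ccall((:nvrtcCreateProgram, "libnvrtc"), Cint,
      (Ref{Ptr{Void}}, Cstring, Cstring, Cint, Ptr{Cstring}, Ptr{Cstring}),
      prog, src, "vadd.cu", 0, C_NULL, C_NULL)
ccall((:nvrtcCompileProgram, "libnvrtc"), Cint,
      (Ptr{Void}, Cint, Ptr{Cstring}), prog[], 0, C_NULL)
ptxsize = Ref{Csize_t}(0)
ccall((:nvrtcGetPTXSize, "libnvrtc"), Cint,
      (Ptr{Void}, Ref{Csize_t}), prog[], ptxsize)
ptx = Array(UInt8, ptxsize[])
ccall((:nvrtcGetPTX, "libnvrtc"), Cint,
      (Ptr{Void}, Ptr{UInt8}), prog[], ptx)
ccall((:nvrtcDestroyProgram, "libnvrtc"), Cint, (Ref{Ptr{Void}},), prog)

# 2. Load the PTX through the driver API, exactly as with nvcc-generated PTX
#    (assumes an active CUDA context, e.g. one set up during device init).
cumod = Ref{Ptr{Void}}(C_NULL)
ccall((:cuModuleLoadDataEx, "libcuda"), Cint,
      (Ref{Ptr{Void}}, Ptr{UInt8}, Cuint, Ptr{Void}, Ptr{Ptr{Void}}),
      cumod, ptx, 0, C_NULL, C_NULL)
kernel = Ref{Ptr{Void}}(C_NULL)
ccall((:cuModuleGetFunction, "libcuda"), Cint,
      (Ref{Ptr{Void}}, Ptr{Void}, Cstring), kernel, cumod[], "vadd")

# 3. kernel[] is now a CUfunction handle; launching works as before.
```

The only new pieces relative to the PTX-file workflow are the NVRTC calls; everything from `cuModuleLoadDataEx` onward is unchanged.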

However, I got an error while using wrap_cuda.jl to generate the new API bindings.

```
khkim@gpuserver:~/CUDArt/gen-6.5$ julia wrap_cuda.jl
WARNING: [a,b] concatenation is deprecated; use [a;b] instead
 in depwarn at ./deprecated.jl:40
 in oldstyle_vcat_warning at ./abstractarray.jl:26
 in sort_includes at /home/khkim/.julia/v0.4/Clang/src/wrap_c.jl:674
 in run at /home/khkim/.julia/v0.4/Clang/src/wrap_c.jl:749
 in include at ./boot.jl:249
 in include_from_node1 at loading.jl:128
 in process_options at ./client.jl:319
 in _start at ./client.jl:403
WRAPPING HEADER: /usr/local/cuda-6.5/include/driver_types.h
WARNING: Not wrapping Clang.cindex.InclusionDirective   host_defines.h
WARNING: Not wrapping Clang.cindex.MacroInstantiation   __GNUC__
WARNING: Not wrapping Clang.cindex.MacroInstantiation   __GNUC__
WARNING: Not wrapping Clang.cindex.MacroInstantiation   __device_builtin__
WARNING: Not wrapping Clang.cindex.MacroInstantiation   __device_builtin__

.. (warnings for macros)

INFO: Error thrown. Last cursor available in Clang.wrap_c.debug_cursors
ERROR: LoadError: No CLType translation available for: CLType (Clang.cindex.Int128)
 in repr_jl at /home/khkim/.julia/v0.4/Clang/src/wrap_c.jl:281
 in wrap at /home/khkim/.julia/v0.4/Clang/src/wrap_c.jl:486
 in wrap_header at /home/khkim/.julia/v0.4/Clang/src/wrap_c.jl:641
 in run at /home/khkim/.julia/v0.4/Clang/src/wrap_c.jl:760
 in include at ./boot.jl:249
 in include_from_node1 at loading.jl:128
 in process_options at ./client.jl:319
 in _start at ./client.jl:403
while loading /home/khkim/.julia/v0.4/CUDArt/gen-6.5/wrap_cuda.jl, in expression starting on line 70
```

Any ideas? It seems to have a problem with Int128.
I'm using a nightly build of Julia on Ubuntu 14.04.

timholy (Contributor) commented Feb 13, 2015:

While I was developing this package, I made quite a number of improvements to Clang.jl: https://github.com/ihnorton/Clang.jl/commits/master. Looks like it might need a few more.

moon6pence (Contributor, Author) commented:

JuliaInterop/Clang.jl@fab277d

A mapping for Int128 was added to the recent master of Clang.jl, and now it works like a charm.

malmaud (Contributor) commented Sep 24, 2015:

IMO the best way forward is to get support for user-selectable LLVM backends into base Julia and then use the normal Julia codegen to compile marked Julia functions using the nvptx backend.

Transpiling Julia code into NVVM or C would essentially be a reimplementation of the existing codegen - it's easy for basic arithmetic expressions, but it would quickly break down if you want your kernels to be able to use the full expressiveness of Julia (although the standard library would still be off-limits).

I am actually very excited to get this working - afaik, it would make Julia the first high-level language to support writing kernels directly as normal functions in the language without idiosyncratic restrictions on syntax. This could be a powerful selling point of Julia to the scientific communities that have essentially moved all their heavy computation to the GPU, such as the neural network community.

mattcbro commented:
I thought this only supported CUDA 6.5; am I wrong? Could we get 7.5 working, as an example?

timholy (Contributor) commented Oct 29, 2015:

#39

juliohm commented Nov 30, 2015:

@mattcbro I have the same question. I installed CUDA 7.5 from the repositories of my Linux distribution and CUDArt.jl is having trouble with it. The tests are failing for me.

timholy (Contributor) commented Nov 30, 2015:

Did you try @lucasb-eyer's branch in #39?

juliohm commented Nov 30, 2015:

I will check it out, @timholy, thanks.

vchuravy (Contributor) commented:
This is now possible with CUDAnative.jl (see the sketch below). NVRTC support might still be worthwhile, but it would need somebody interested in it to actively work on it. Closing for now.
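For reference, a minimal vector-add sketch in the style of CUDAnative.jl's early examples (launch syntax and constructors may differ in current releases):

```julia
using CUDAdrv, CUDAnative

# The kernel is a normal Julia function, compiled through LLVM's NVPTX backend.
function kernel_vadd(a, b, c)
    i = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    c[i] = a[i] + b[i]
    return nothing
end

dev = CuDevice(0)
ctx = CuContext(dev)

len = 512
a, b = rand(Float32, len), rand(Float32, len)
d_a, d_b = CuArray(a), CuArray(b)   # upload the inputs
d_c = similar(d_a)

@cuda (1, len) kernel_vadd(d_a, d_b, d_c)   # 1 block of `len` threads
c = Array(d_c)                              # download the result
```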
