CUDA implementation of the Canny edge detector in C/C++.
The project builds with CMake; a CMakeLists file is provided for compilation.
A main file that runs the code is also included.
It takes the following parameters on the command line:
./main argv[1] argv[2] argv[3] argv[4] argv[5] argv[6]
where:

argv[1]
: input image path

argv[2]
: kernel size of the Sobel operator

argv[3]
: low threshold for the hysteresis step

argv[4]
: high threshold for the hysteresis step

argv[5]
: L2 norm: 0 = activated, 1 = deactivated (uses the approximation with absolute values)

argv[6]
: mode: [0] CPU, [1] GPU custom (my implementation), [2] all modes. [0] runs the OpenCV Canny on the CPU, [1] runs the GPU version, and [2] runs both.
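To make the `argv[5]` option concrete, here is a minimal sketch of the two gradient-magnitude formulas it switches between: the Euclidean (L2) norm and the cheaper sum of absolute values. The function names `gradient_magnitude` and `l2_enabled` are hypothetical and are not taken from this repository; the sketch only assumes that `gx` and `gy` are the horizontal and vertical Sobel responses at one pixel.

```cpp
#include <cassert>
#include <cmath>

// Gradient magnitude at one pixel from the Sobel responses gx and gy.
//   use_l2 == true  : exact L2 norm, sqrt(gx^2 + gy^2)
//   use_l2 == false : approximation |gx| + |gy| (no square root needed)
inline float gradient_magnitude(float gx, float gy, bool use_l2) {
    if (use_l2) {
        return std::hypot(gx, gy);
    }
    return std::fabs(gx) + std::fabs(gy);
}

// Hypothetical helper: maps the argv[5] value to a flag.
// Per the parameter list, 0 means the L2 norm is activated and 1 means it is not.
inline bool l2_enabled(int argv5_value) {
    assert(argv5_value == 0 || argv5_value == 1);
    return argv5_value == 0;
}
```

The approximation never underestimates the L2 norm, so the same hysteresis thresholds may mark more pixels as edges when the L2 norm is deactivated.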
The program also measures the execution times and reports them in ms.
Example outputs of my Canny GPU version:
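One possible way to obtain such timings on the host is sketched below with `std::chrono`. This is an assumption for illustration, not necessarily how this project measures time, and the helper name `average_ms` is hypothetical.

```cpp
#include <cassert>
#include <chrono>

// Runs `work` `repeats` times and returns the average wall-clock time
// of one run in milliseconds.
template <typename F>
double average_ms(F&& work, int repeats) {
    assert(repeats > 0);
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < repeats; ++i) {
        work();
    }
    const auto stop = std::chrono::steady_clock::now();
    const std::chrono::duration<double, std::milli> elapsed = stop - start;
    return elapsed.count() / repeats;
}
```

When the timed work launches CUDA kernels, keep in mind that launches are asynchronous with respect to the host, so the callable should synchronize with the device (for example with `cudaDeviceSynchronize()`) before it returns; otherwise the host timer stops too early.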
Original | Canny GPU Output |
---|---|
N.B.: the results vary with the threshold values chosen for the hysteresis step.
I tried several kernel configurations; the one that gave the best results used a thread block size of 16x16.
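The effect of the two thresholds can be seen in a minimal sketch of the double-threshold classification that precedes hysteresis in the standard Canny algorithm. The names `EdgeClass` and `classify` are hypothetical, and the use of `>=` at the boundaries is an assumption; the code in this repository may differ.

```cpp
#include <cassert>
#include <cstdint>

enum class EdgeClass : std::uint8_t { None = 0, Weak = 1, Strong = 2 };

// Classifies one gradient magnitude against the low and high thresholds.
//   Strong : kept as an edge
//   Weak   : kept by hysteresis only if connected to a strong pixel
//   None   : discarded
inline EdgeClass classify(float magnitude, float low, float high) {
    assert(low <= high);
    if (magnitude >= high) {
        return EdgeClass::Strong;
    }
    if (magnitude >= low) {
        return EdgeClass::Weak;
    }
    return EdgeClass::None;
}
```

For example, a pixel with magnitude 70 is only a weak candidate with thresholds (50, 100) but becomes a strong edge with thresholds (20, 60), which is why the output changes with the chosen values.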
Kernel Configuration |
---|
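As a rough illustration of what a 16x16 thread block implies for the launch configuration, the host-side sketch below computes how many blocks are needed to cover an image with one thread per pixel. `Dim2`, `div_up`, and `grid_for_image` are hypothetical names, and the one-thread-per-pixel mapping is an assumption rather than a confirmed detail of this project.

```cpp
#include <cassert>

// Plain stand-in for the x/y fields of CUDA's dim3, so the sketch compiles without nvcc.
struct Dim2 {
    unsigned x;
    unsigned y;
};

// Integer division rounded up.
inline unsigned div_up(unsigned n, unsigned d) {
    assert(d > 0);
    return (n + d - 1) / d;
}

// Number of blocks needed to cover a width x height image,
// assuming one thread per pixel and a 16x16 block by default.
inline Dim2 grid_for_image(unsigned width, unsigned height,
                           Dim2 block = Dim2{16, 16}) {
    return Dim2{div_up(width, block.x), div_up(height, block.y)};
}

// In CUDA the same values would be used roughly as:
//   dim3 block(16, 16);
//   dim3 grid(div_up(width, 16), div_up(height, 16));
//   kernel<<<grid, block>>>(...);
```

A 1280x720 image needs an 80x45 grid of 256-thread blocks. When a dimension is not a multiple of 16 (for instance a height of 1080), the last row of blocks extends past the image, so each thread has to check its coordinates against the image bounds.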
This pie chart shows the execution times of the individual device kernels and of the memcpy data transfers for a 720p image.
Kernel execution time |
---|
This is the comparison between the OpenCV CPU version and my parallel GPU version.
CPU vs. GPU |
---|
As the graph shows, the two versions perform similarly on low-resolution images. As the image resolution increases, the parallel version performs significantly better.