PDCON:Conference/Image Processing Algorithm Optimization with CUDA for Pure Data

Image processing algorithm optimization with CUDA for Pure Data

Author: Rudi Giot

Download full paper Media:Image Processing Algorithm Optimization with CUDA for Pd.pdf

Image Processing Production lines featuring industrial vision are becoming more and more widespread. That kind of automation needs systems able to capture pictures, analyze and learn from them in order to take appropriate action. These processes are often heavy and applied to high-definition images with important frame rate. Powerful calculators are thus needed to follow the ever growing production rate. NVIDIA is currently designing interfaces providing a CUDA^[1] allowing parallel data computation. This could increase the performance of every operating system using graphical processing units (GPU). A CUDA program is made up of two parts: one running on the host (CPU) and the other exploiting the device (GPU). The non-parallelizable stages of the program are run on the host, while the parallelizable ones are run on the device. Pure Data, thanks to its graphical modular development environment, allows fast prototype development. Those factors led us to start a research program dedicated to the realization of image processing modules for Pure Data written in CUDA. First, we will adapt the most often used algorithms (already existing within the GEM library). Our first results are encouraging. For instance, regarding RGB image conversion to grey scale image, tests demonstrate that GPU computing grants an average accelerating factor of 109 comparing to the “only-CPU” based computing. However a CPU + GPU architecture has a weakness regarding data transfers between the local memory and the graphics card. Most of the computation time (more than 90%) is spent on those transfers. There is indeed a double transfer between CPU and GPU for each CUDA function block in Pure Data. Considering this, performance is not optimal. We will thus spend some time in the future of the project to minimize those transfers. The idea is to have one first transfer, from CPU to GPU, at the start of the program and one second backward transfer at the end containing the result from the whole process. In conclusion, image processing algorithms by the graphics card is a really effective solution for complex processing. Integrating CUDA blocks inside Pure Data facilitates and accelerates the prototyping of applications. This would suit every field requiring a high frame rate, a high resolution, an important amount of operations or computation-greedy processes. It does include use for industry, medical or artistic purposes.

References

↑ Compute Unified Device Architecture

Sponsors and partners of the 4th internationals Pure Data Convention in Weimar 2011

4^th international Pure Data Convention 2011 Weimar ~ Berlin

[1] Compute Unified Device Architecture

[1]