GPU Optimization and Hardware Acceleration for Light-field Image Processing

Yuttakon Yuttakonkit (1461024)


Light-field image processing is widely employed in many areas, from mobile devices to manufacturing applications. The fundamental process of extracting usable information requires significant computation on high-resolution raw image data. A high-performance computation unit such as a GPU has therefore been essential, which in turn limits the deployment of such applications on embedded devices. To remove this limitation, we explore an architecture design that offers better power efficiency with adequate performance. We found that the sparse memory access pattern of these applications incurs high memory access latency. We therefore redesign the memory access pattern, which alleviates the memory bottleneck and achieves a performance improvement of up to 82% on desktop GPUs. On a mobile GPU, however, our optimization lowers occupancy and performance degrades. Because a GPU relies on many threads to fully utilize the memory bandwidth, it generates an enormous number of short-burst memory transfers and does not exploit the burst transfer mode of conventional memory. We therefore propose a Coarse Grain Reconfigurable Accelerator (CGRA), called EMAXV, which uses long-burst data transmission to improve memory bandwidth utilization. Unlike the complex multi-threading with on-demand loading of multiple data on a GPU, EMAXV has a single-thread design that avoids cache races and a pipeline stage that hides memory latency. Evaluated with identical host CPU frequency and main memory bandwidth, EMAXV, which has only 38% of the mobile GPU's computation capability, achieved 2.2x and 89% of the mobile GPU's performance for light-field depth extraction and light-field image rendering, respectively.