Fast and efficient lossless image compression based on CUDA Parallel Wavelet Tree encoding
Abstract
Lossless compression remains in high demand in medical imaging applications, despite recent improvements in computing capability and decreases in storage cost. With the development of General-Purpose Graphics Processing Unit (GPGPU) computing techniques, sequential lossless image compression algorithms can be modified to run faster and more efficiently. Backward Coding of Wavelet Trees (BCWT) is a fast and efficient algorithm built on the Maximum Quantization of Descendants (MQD), and its intrinsic parallelism and simplicity make it well suited to lossless parallel compression. However, the original implementation of BCWT is a sequential, CPU-based codec with multiple drawbacks that hinder a parallel extension of BCWT. Parallel Coding of Wavelet Trees (PCWT) modifies BCWT at every level, from the theoretical workflow down to implementation details. PCWT introduces multiple new parallel stages (a parallel wavelet transform stage, a parallel MQD calculation stage, a parallel Qmax search stage, a parallel element encoding stage, and a parallel group encoding stage) and changes the encoding sequence from backward to forward. All of these stages are designed to accelerate the compression process. The PCWT implementation is designed with Compute Unified Device Architecture (CUDA) hardware constraints and implementation scalability in mind. With its newly designed workflow and highly optimized parallel stages, PCWT runs faster than the lossless JPEG-XR algorithm, the current standard, while achieving comparable compression ratios. Multiple possible improvements to the speed and flexibility of PCWT are also proposed as future work.
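As an illustration of the kind of data parallelism the Qmax search stage can exploit, the following is a minimal sketch and not the PCWT implementation. It assumes integer wavelet coefficients, takes the quantization level of a coefficient to be the index of its highest set magnitude bit (with -1 for zero), and finds the maximum level with a pairwise tree reduction. The function names `quantization_level` and `qmax_tree_reduction` are hypothetical, and plain Python stands in for a CUDA kernel, in which each iteration of the inner loop would be handled by a separate thread.

```python
def quantization_level(coeff: int) -> int:
    """Index of the highest set bit of |coeff|, or -1 when coeff is zero.

    Assumption: this is the quantization-level definition used here for
    illustration; the abstract does not spell it out.
    """
    return abs(coeff).bit_length() - 1


def qmax_tree_reduction(coeffs):
    """Return the maximum quantization level via pairwise (tree) reduction.

    Each pass roughly halves the number of active values. The iterations of
    the inner loop are independent of one another, so on a GPU each one
    could run in its own thread, with a synchronization between passes.
    """
    levels = [quantization_level(c) for c in coeffs]
    if not levels:
        return -1  # convention chosen for this sketch: empty input has no level

    n = len(levels)
    while n > 1:
        half = (n + 1) // 2          # size of the surviving front half
        for i in range(n // 2):      # one "thread" per pair
            levels[i] = max(levels[i], levels[i + half])
        n = half
    return levels[0]


if __name__ == "__main__":
    sample = [3, -37, 0, 12, 1]
    print(qmax_tree_reduction(sample))
```

The reduction needs about log2(N) passes for N coefficients instead of N sequential comparisons, which is why a maximum search of this kind maps naturally onto GPU hardware. The actual PCWT stage may organize the search differently.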