AMD's 'rocJPEG' decodes JPEG images up to 50 times faster, demonstrating its power in accelerating AI training

As datasets grow, image capture techniques improve, more information is extracted from visual data, and large language models increasingly accept images as input, efficient image conversion and preparation have become essential to running workloads properly. AMD's rocJPEG uses
Accelerated JPEG decoding on AMD Instinct™ GPUs with rocJPEG — ROCm Blogs
https://rocm.blogs.amd.com/artificial-intelligence/rocjpeg-decoding-performance-blog/README.html
rocJPEG documentation — rocJPEG 0.8.0 Documentation
https://rocm.docs.amd.com/projects/rocJPEG/en/latest/
'ROCm' stands for 'Radeon Open Compute platform.' In 2023, AMD executives described its development as the company's top priority.
AMD executives state that AMD's top priority is the development of the 'ROCm (Radeon Open Compute platform),' and the company will fight back against NVIDIA by adopting the open source programming language 'Triton' - GIGAZINE

AMD's GPUs contain one or more Media Engines (VCNs), each of which contains one or more JPEG engines providing accelerated hardware-based JPEG decoding.
Hardware-based decoders consume less power than CPU-based decoding, and offloading the decoding task from the CPU increases overall decoding throughput. With proper power management, hardware decoding can improve decoding performance while also lowering overall system power consumption.
Using the rocJPEG API, you can decode the compressed JPEG stream and store the resulting YUV image in video memory, allowing you to perform post-processing of the image using
The rocJPEG API also allows you to create multiple instances of a JPEG decoder based on the number of VCNs and JPEG engines available on a GPU device. Once a decoder is configured for a device, it can seamlessly use all available VCNs to decode a batch of JPEG streams in parallel.
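The idea of spreading one batch over several JPEG engines can be sketched as follows. This is only an illustration in Python, not the rocJPEG API: `split_batch`, `fake_decode`, and `decode_batch` are hypothetical names, the decode step is a stand-in that does no real decoding, and the scheduling that rocJPEG performs internally may differ.

```python
from concurrent.futures import ThreadPoolExecutor


def split_batch(items, num_workers):
    """Split items into num_workers contiguous chunks of near-equal size."""
    base, extra = divmod(len(items), num_workers)
    chunks, start = [], 0
    for i in range(num_workers):
        # The first `extra` chunks take one additional item each.
        size = base + (1 if i < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks


def fake_decode(jpeg_bytes):
    # Stand-in for a hardware decode call (assumption): returns only the
    # length of the compressed stream instead of a decoded image.
    return len(jpeg_bytes)


def decode_batch(streams, num_engines):
    """Decode a batch by giving each (hypothetical) engine one chunk."""
    chunks = split_batch(streams, num_engines)
    with ThreadPoolExecutor(max_workers=num_engines) as pool:
        # map() returns results in the same order as the input chunks.
        per_chunk = pool.map(lambda c: [fake_decode(s) for s in c], chunks)
        return [result for chunk in per_chunk for result in chunk]
```

Contiguous chunks keep the output in the same order as the input batch, which makes it easy to match decoded images back to their source streams.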
Efficient parallelization of JPEG decoding is particularly useful in applications where decoding is often a bottleneck, such as AI/ML training and high-throughput image processing.
In April 2025, AMD measured the number of images decoded per second on a dataset of 1,000 images. Running the rocJPEG benchmark with a batch size of 128 and 16 threads on the AMD Instinct MI300X GPU was up to 50 times faster than CPU-based decoding with the TurboJPEG library.
The graph below shows the decoding speed for 1920 x 1080 JPEG images, with the left block using 1 thread and the right block using 16 threads. In both blocks, the four rocJPEG results on the right are faster than the CPU result on the left.
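For readers who want to see how an 'images per second' figure and a speedup ratio of this kind are derived, here is a minimal sketch. It is not AMD's benchmark code: the function names are hypothetical, the decoder passed in is whatever callable you supply, and the sketch does not model the 16 threads used in the test.

```python
import time


def batch_sizes(num_images, batch_size):
    """Sizes of the batches needed to cover num_images."""
    full, rest = divmod(num_images, batch_size)
    return [batch_size] * full + ([rest] if rest else [])


def throughput(num_images, elapsed_seconds):
    """Images decoded per second."""
    return num_images / elapsed_seconds


def speedup(gpu_ips, cpu_ips):
    """How many times faster the GPU path is than the CPU path."""
    return gpu_ips / cpu_ips


def measure(decode_fn, images, batch_size):
    """Time decode_fn over all images, one batch at a time."""
    start = time.perf_counter()
    for i in range(0, len(images), batch_size):
        decode_fn(images[i:i + batch_size])
    # Guard against a zero reading for extremely fast stand-in decoders.
    elapsed = max(time.perf_counter() - start, 1e-9)
    return throughput(len(images), elapsed)
```

With 1,000 images and a batch size of 128, `batch_sizes` yields seven full batches and one final batch of 104 images. The ratio returned by `speedup` is the kind of figure behind a statement such as 'up to 50 times faster'.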

A similar trend was observed for 4K images (3840 x 2160).

in Software, Posted by logc_nt