In 2013-2014 a lot of effort was put into moving image decoding to a background thread. However, it became obvious that parallel off-main-thread decoding was still the critical path for presenting image-heavy pages. The biggest problem we faced was that on B2G, keeping active images uncompressed in main memory was something we simply could not afford on a 128 MB device, even if it was just for visible images.
Enter image decoding on the GPU. The goal is to use the GPU to parallelize the decoding of each visible (and only the visible) -pixel- instead of just getting per-image parallelization and doing full image decodes. However, the biggest advantage comes from the reduced GPU upload bandwidth: we can upload a compressed texture instead of a large 32-bit RGB bitmap.
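To put rough numbers on the memory and upload cost, here is a back-of-the-envelope sketch. The image dimensions and the JPEG file size are made-up example values, not measurements from B2G; only the 4 bytes per pixel and the 128 MB device come from the text above.

```python
def uncompressed_bytes(width: int, height: int, bytes_per_pixel: int = 4) -> int:
    """Size of a fully decoded bitmap at 32 bits (4 bytes) per pixel."""
    return width * height * bytes_per_pixel


# Hypothetical example: a 2048 x 1536 photo stored as a 600 KiB JPEG (assumed values).
WIDTH, HEIGHT = 2048, 1536
JPEG_BYTES = 600 * 1024
DEVICE_BYTES = 128 * 1024 * 1024  # the 128 MB device mentioned above

decoded = uncompressed_bytes(WIDTH, HEIGHT)
ratio = decoded / JPEG_BYTES                # how much more data the bitmap upload moves
share_of_device = decoded / DEVICE_BYTES    # fraction of device memory one bitmap takes

print(decoded, ratio, share_of_device)
```

Under these assumed numbers a single decoded image takes 12 MiB, a little over 9% of the device's memory, and uploading it moves about 20 times more data than uploading the compressed stream.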
We first explored using S3TC compressed textures. However, this still required us to decode the image and then re-compress it to S3TC on the CPU, thus regressing page load times.
The trick we ended up using instead was to provide a texture containing the -raw- JPEG stream, encoded as a -much- smaller RGB texture plane. Using a clever shader, we sample from the compressed JPEG stream when compositing the texture to the frame buffer. This means that we never have to fit the uncompressed texture in main memory, so pages that would normally cause a memory usage spike leading to an OOM no longer have any memory spike at all.
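One way to picture storing a byte stream in an RGB texture plane is to put three consecutive bytes in each texel. The sketch below is only an illustration of that idea on the CPU; the function names, the row-major layout and the zero padding are assumptions, not the actual texture layout used.

```python
import math


def pack_stream_as_rgb(stream: bytes, tex_width: int) -> tuple[bytes, int]:
    """Pack raw bytes into rows of RGB texels (3 bytes per texel), zero-padded.

    Returns the texel data and the texture height in texels.
    Layout is an assumption made for illustration only.
    """
    texels_needed = math.ceil(len(stream) / 3)
    tex_height = max(1, math.ceil(texels_needed / tex_width))
    padding = tex_width * tex_height * 3 - len(stream)
    return stream + bytes(padding), tex_height


def fetch_byte(texture: bytes, tex_width: int, index: int) -> int:
    """Read stream byte `index` back via (texel x, texel y, channel) addressing."""
    texel, channel = divmod(index, 3)   # which texel, and which of R/G/B
    y, x = divmod(texel, tex_width)     # texel position inside the texture
    return texture[(y * tex_width + x) * 3 + channel]


# Stand-in for a compressed stream: 200 arbitrary bytes.
stream = bytes(range(200))
texture, height = pack_stream_as_rgb(stream, tex_width=16)
```

With a 16-texel-wide texture, the 200-byte stand-in stream fits in 5 rows, and every byte can be recovered from its texel coordinate and channel.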
The non-trivial bit was designing a shader that can sample from a JPEG texture and composite the decompressed results on the fly without any GPU driver modification. We bind a 3D LUT texture to the second texture unit to perform some approximations when doing the DCT lookup, which speeds up the shader units. This requires a single 64 KB 3D lookup texture that is shared by the whole system. The challenging part of this project, however, is taking the texture coordinates S and T and looking up the relevant DCT block in the JPEG stream. Since the JPEG stream uses Huffman encoding, it's not trivial to map an (x, y) coordinate in the decompressed image to a position in the stream. For the lookup, our technique uses the work of D. Charles et al.
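To see why the mapping is awkward, note that finding which 8x8 block a pixel belongs to is simple arithmetic, but finding where that block starts in the stream is not, because every block is coded with a different number of bits. The sketch below shows one generic workaround, a precomputed table of bit offsets. It is not the technique of D. Charles et al., and it assumes a single-component image with no chroma subsampling; the bit lengths are invented for the example.

```python
from itertools import accumulate


def block_index(x: int, y: int, image_width: int, block: int = 8) -> int:
    """Raster-order index of the block containing pixel (x, y)."""
    blocks_per_row = (image_width + block - 1) // block
    return (y // block) * blocks_per_row + (x // block)


def build_offset_table(block_bit_lengths: list[int]) -> list[int]:
    """Bit offset at which each block starts (exclusive prefix sum).

    Building this needs one sequential pass, since a block's start
    depends on the coded length of every block before it.
    """
    return list(accumulate(block_bit_lengths, initial=0))[:-1]


# Hypothetical 16 x 16 image: 4 blocks with made-up coded lengths in bits.
lengths = [37, 12, 90, 5]
offsets = build_offset_table(lengths)

# Pixel (9, 8) lies in block row 1, column 1, i.e. block 3.
idx = block_index(9, 8, image_width=16)
start_bit = offsets[idx]
```

Here `offsets` is `[0, 37, 49, 139]`, so the block containing pixel (9, 8) starts at bit 139. In a real baseline JPEG the DC coefficient of each block is also coded as a difference from the previous block, so a bit offset alone would not be enough to decode a block in isolation.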