I wanted to share my findings on optimizing CoreGraphics’ CGContextDrawImage, a performance-critical part of my upcoming patch for the Core Animation drawing model.
After implementing the readback from OpenGL, I expected the bottleneck of my code to be the glReadPixels() call, since we are copying from VRAM to RAM every frame. However, the Shark profiling tool showed that I was actually spending more time in CGContextDrawImage.
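To give a sense of why a per-frame VRAM-to-RAM copy looked like the obvious suspect, here is a quick sketch of the arithmetic in C. The helper readback_bytes_per_second is hypothetical, and the frame size and rate used below are assumptions, not measurements from the patch. It simply multiplies width, height, bytes per pixel (4 for RGBA) and frames per second.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes moved from VRAM to RAM per second by a full-frame readback.
   width, height, bytes_per_pixel and fps are caller-supplied assumptions. */
uint64_t readback_bytes_per_second(uint32_t width, uint32_t height,
                                   uint32_t bytes_per_pixel, uint32_t fps)
{
    assert(bytes_per_pixel > 0);
    /* Widen before multiplying so large frames cannot overflow 32 bits. */
    return (uint64_t)width * height * bytes_per_pixel * fps;
}
```

For a hypothetical 1024x768 RGBA layer, readback_bytes_per_second(1024, 768, 4, 1) is 3,145,728 bytes (3 MiB) per frame, and at 60 frames per second that becomes 188,743,680 bytes, roughly 180 MiB copied every second.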
Investigating further in Shark, I noticed that CGContextDrawImage was invoking argb32_sample_RGBA32, whereas cairo’s calls were not hitting this sampling routine. After a few hours of tweaking, I managed to eliminate the first cause of this sampling by using the monitor’s color profile instead of ‘kCGColorSpaceGenericRGB’.
The second problem was less obvious. I had previously checked the transformation matrix to make sure that no scaling was happening. The matrix translated by (tx, ty) and scaled the Y axis by -1. I had incorrectly assumed that a Y scale of -1 would be handled as a special ‘flip’ case rather than as a general scale; however, resetting the Y scale to 1 resulted in a drastic performance increase. Since my rendering was now upside down, I applied the flip in my OpenGL context instead, where scaling is very efficient.
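To make the distinction concrete, here is a small, portable C sketch. Affine is a hypothetical stand-in that mirrors the field layout of CGAffineTransform (x' = a*x + c*y + tx, y' = b*x + d*y + ty), and is_pure_translation, is_vertical_flip and flip_rows_in_place are illustrative helpers, not CoreGraphics functions. Nothing here claims to reproduce how CGContextDrawImage actually decides whether to call argb32_sample_RGBA32; it only shows that the matrix described above fails a pure-translation test even though, in pixel terms, a Y flip merely reverses row order. The CPU row flip is shown only to illustrate that point; the fix described above did the flip on the OpenGL side instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of CGAffineTransform's fields:
   x' = a*x + c*y + tx,  y' = b*x + d*y + ty */
typedef struct {
    double a, b, c, d, tx, ty;
} Affine;

/* True when the transform only translates. By assumption, a drawing
   library could route such draws to a plain pixel copy. */
bool is_pure_translation(Affine t)
{
    return t.a == 1.0 && t.b == 0.0 && t.c == 0.0 && t.d == 1.0;
}

/* True for a translation combined with a vertical flip, i.e. the matrix
   described above: translate by (tx, ty) and scale Y by -1. */
bool is_vertical_flip(Affine t)
{
    return t.a == 1.0 && t.b == 0.0 && t.c == 0.0 && t.d == -1.0;
}

/* A vertical flip needs no resampling: it just reverses the row order.
   Rows are swapped byte by byte, so no scratch buffer is required. */
void flip_rows_in_place(uint8_t *pixels, size_t row_bytes, size_t height)
{
    assert(pixels != NULL || height == 0);
    for (size_t top = 0, bottom = height ? height - 1 : 0; top < bottom;
         top++, bottom--) {
        uint8_t *r1 = pixels + top * row_bytes;
        uint8_t *r2 = pixels + bottom * row_bytes;
        for (size_t i = 0; i < row_bytes; i++) {
            uint8_t tmp = r1[i];
            r1[i] = r2[i];
            r2[i] = tmp;
        }
    }
}
```

Once the flip moves to the OpenGL side, the matrix handed to CoreGraphics becomes {1, 0, 0, 1, tx, ty}, which passes is_pure_translation.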
After these tweaks, the bottleneck was glReadPixels, as I had originally hypothesized. I did not take accurate measurements of the performance difference, but I would estimate that rendering is now at least twice as fast as it was when I was hitting the sampling routine. Once we do away with the glReadPixels step, this patch will be even faster!