Profiler: Frame performance and debugging!

When we designed the platform profiler our goal was to expose internal information to answer performance questions that other tools could never practically answer without. I’m very excited to show off these prototypes because I feel that we are without a doubt finally there.

NOTE: We’re still missing at least two un-landed platform patches before this is ready in Nightly; Bug 926922 and Bug 1004899. Feel free to apply these patches locally and give me feedback! They should work on any platform when running with OMTCompositing.

Paint Marker Improvements

I’ve redesigned how frame markers work to expose more details. Painting in Gecko is triggered off a refresh timer. When that timer fires we will execute scripts that are listening for the refresh like pending requestAnimationFrame() callbacks for the document. This is denoted by the Scripts phase. If any changes were made to the styles or flow of the document we will show a marker for each, if applicable, as Styles and Reflow markers. Next we will need to build a display list and a layer tree to prepare for painting denoted by DisplayList marker. The next phase is rasterizing, the process of turning the display list into pixels, any changed region of the page. This is denoted by the, you guessed it, Rasterize phase. And finally, on the compositor thread, we will take the results and composite them to the phase under the Composite phase.

This further break down was necessary to facilitate exposing more information about each of these phases. Note that I’m exposing internal gecko decisions. The primary use should be to debug and optimize the platform. It’s not meant for web developers to micro optimize their pages to implementation details of our rendering engine.

New Frame Markers

New Frame Markers

Styles and Reflow Causes

As platform engineers we’re often asked to look into performance problems of complex web applications. It’s difficult to pin point what’s causing excessive style and reflow flushes in an app we’re not familiar with. Careful breakpoints may get you this information for simple performance problems but in my experience that’s not enough to diagnose the hard problems. One frame tends to have a disproportionately longer style or reflow flush and finding which exactly can be very tricky. With flush causes in the profiler we can now find what code in the past triggered the flush that’s taking so long to execute now.

Here’s a real world example of the b2g homescreen as of the current master tip. See Style #0? We’re spending 35ms in there so understanding why we’re flushing is critical to fixing it. Now it’s a pain free process with a simple mouse-over the problematic Styles block:

B2G Homescreen Style Flush Cause

B2G Homescreen Style Flush Cause

Here the flush is triggered by handleEvent() in grid.js. A closer inspection will point out several style changes that are occurring from that function. After taking a closer look at grid.js, we can pinpoint the problem to a CSS transform.

Display List & Composite Visualization

Sometimes the display list and layer tree is the key to the performance problem. It can point to rendering bugs, severe platform performance problems, an overly complicated page to paint and many more problems. The visualization is still an early implementation. In the future I’m hoping to display a preview of the display items themselves but we need to extend how they are dumped from gecko. This information is logged for each frame so it’s useful to track the evolution and how it corresponds with other things like slow paths being hit.

NOTE: This feature is only enabled if you’ve flipped a preference to dump the display list and/or layer tree for each frame. This will slow down your build and skew profiling sample but will still serve as a great debugging tool.

DisplayList Preview

DisplayList Preview

Frame Uniformity

And to finish off, the profiler now has a simple feature to visualize how consistent our frame rate is. Each dot represents the time between frames. If everything is working as expected, during scrolling each dot should be near 16ms. Below is a graph of fairly smooth scrolling.

Profiler Composite Uniformity Graph

Profiler Composite Uniformity Graph

Profiler: Charging expensive operations to cheap async causes

We’ve been using the profiler to diagnose snappy problems for several months now with great success. The profiler can tell us which images are slow to decode, what scripts take longer then 15 ms to execute and what they’re doing. However the callstack doesn’t provide enough information to show what caused the execution. For example a small script snippet that changes the style causing cascading style changes that morphs the entire page. The script execution may takes less then 1 ms (i.e. likely not show up in a profile) and mark the styles as having changed. Then the next event recomputes the style causing the entire document to be repainted. The goal is to blame a quick running cause (a page change) to long running task (repainting).

Charging to the rescue! My goal is to be able to record a source for something like a page change and tag it/charge it to the work that is triggered. This is going to be a big project and I’m not ready to abandon my current tasks so I plan on working on a small piece every time I encounter the need for this. If you’re interested in helping out please let me know. Here are my current plan:

Part 1) Bug 809317: Adding infrastructure for printing the current location for debugging (Ready to land soon).

Part 2) Bug 777828: Allow unwinding results to be returned

Part 3) No bug: Support chaining unwinds as ’causes’

Part 4) No bug: When sampling collect ’causes’ attributed to the current work.

Part 5) No bug: Display ’causes’ in the profiler UI

Part 6) No bug: Gather ’causes’ for expensive async operations.
Note this is just a rough plan. I don’t know how well this will work out in practice.