Profiler: Frame performance and debugging!

When we designed the platform profiler our goal was to expose internal information to answer performance questions that other tools could never practically answer without. I’m very excited to show off these prototypes because I feel that we are without a doubt finally there.

NOTE: We’re still missing at least two un-landed platform patches before this is ready in Nightly; Bug 926922 and Bug 1004899. Feel free to apply these patches locally and give me feedback! They should work on any platform when running with OMTCompositing.

Paint Marker Improvements

I’ve redesigned how frame markers work to expose more details. Painting in Gecko is triggered off a refresh timer. When that timer fires we will execute scripts that are listening for the refresh like pending requestAnimationFrame() callbacks for the document. This is denoted by the Scripts phase. If any changes were made to the styles or flow of the document we will show a marker for each, if applicable, as Styles and Reflow markers. Next we will need to build a display list and a layer tree to prepare for painting denoted by DisplayList marker. The next phase is rasterizing, the process of turning the display list into pixels, any changed region of the page. This is denoted by the, you guessed it, Rasterize phase. And finally, on the compositor thread, we will take the results and composite them to the phase under the Composite phase.

This further break down was necessary to facilitate exposing more information about each of these phases. Note that I’m exposing internal gecko decisions. The primary use should be to debug and optimize the platform. It’s not meant for web developers to micro optimize their pages to implementation details of our rendering engine.

New Frame Markers

New Frame Markers

Styles and Reflow Causes

As platform engineers we’re often asked to look into performance problems of complex web applications. It’s difficult to pin point what’s causing excessive style and reflow flushes in an app we’re not familiar with. Careful breakpoints may get you this information for simple performance problems but in my experience that’s not enough to diagnose the hard problems. One frame tends to have a disproportionately longer style or reflow flush and finding which exactly can be very tricky. With flush causes in the profiler we can now find what code in the past triggered the flush that’s taking so long to execute now.

Here’s a real world example of the b2g homescreen as of the current master tip. See Style #0? We’re spending 35ms in there so understanding why we’re flushing is critical to fixing it. Now it’s a pain free process with a simple mouse-over the problematic Styles block:

B2G Homescreen Style Flush Cause

B2G Homescreen Style Flush Cause

Here the flush is triggered by handleEvent() in grid.js. A closer inspection will point out several style changes that are occurring from that function. After taking a closer look at grid.js, we can pinpoint the problem to a CSS transform.

Display List & Composite Visualization

Sometimes the display list and layer tree is the key to the performance problem. It can point to rendering bugs, severe platform performance problems, an overly complicated page to paint and many more problems. The visualization is still an early implementation. In the future I’m hoping to display a preview of the display items themselves but we need to extend how they are dumped from gecko. This information is logged for each frame so it’s useful to track the evolution and how it corresponds with other things like slow paths being hit.

NOTE: This feature is only enabled if you’ve flipped a preference to dump the display list and/or layer tree for each frame. This will slow down your build and skew profiling sample but will still serve as a great debugging tool.

DisplayList Preview

DisplayList Preview

Frame Uniformity

And to finish off, the profiler now has a simple feature to visualize how consistent our frame rate is. Each dot represents the time between frames. If everything is working as expected, during scrolling each dot should be near 16ms. Below is a graph of fairly smooth scrolling.

Profiler Composite Uniformity Graph

Profiler Composite Uniformity Graph

Graphics Meetup 2014Q01

I just arrived from the Graphics Meetup in early 2014. Before the week we wrapped up the port of tiling from Fennec OpenGL specific code to the abstract Compositor API. Here a summary of the projects we discussed (from my point of view, I’m missing things that I couldn’t attend):

GFX Taipei

  • Off main thread compositing on desktop (OMTCompositing): We discussed our plan for shipping OMTCompositing to desktop and unify our compositing code. Moving compositing off the main thread is a prerequisite for the many projects that build on it such as OMTAnimation, OMTVideo, tiling and Async Pan Zoom. Matt Woodrow managed to make some sizable progress at the end of the week. Our plan is to double down on our resources to get this shipped on desktop.
  • Tiling: Bringing tiling to desktop will be important to better support 4k displays and to support Async Pan Zoom. We decided to focus on OMTCompositing before shipping tiling on desktop.
  • Async Pan Zoom: We discussed upcoming improvements to Async Pan Zoom like hit testing, scroll snap requirements. We discussed our plan to have Async Pan Zoom on the desktop. Mstange has a working prototype of APZ on mac. For now we will first focus on shipping OMTCompositing separately. Changes to the input event queue and dealing with the plugins window on Windows will be a significant problem.
  • Graphics regression test on b2g: We discussed with mchang from the b2g performance team the best way to get b2g performance regressions tests. We decided to focus on some micro benchmarks to isolate platform regressions from gaia regressions by using the Gfx Test App. Kats convinced me that FrameMetrics could be use to accurately measure ‘checkerboarding’ so we will be rolling out some tests based on that as well.
  • VSync: Vincent has been leading the effort of getting Gecko to correctly VSync. This project is very important because no matter how fast we render our animations will never be fluid if we don’t follow vsync carefully. We had a long design review and I’m fairly happy with the result. TL;DR: We will be interpolating input events and driving the refresh driver off the vsync signal.
  • Eideticker: We discussed the challenges of supporting Eideticker using an external camera instead of MHL.
  • WebGL: We reaffirmed our plans to continue to support new WebGL extensions, focus on conformance issues, update the conformance testsuite and continue to work on WebGL 2.
  • Skia: We decided to try to rebase once every 6 weeks. We will be focusing on Skia content on android and SkiaGL canvas on mac.
  • RR with graphics: Roc presented RR (blog). It really blew me away that RR already supported Firefox on Linux. We had a discussion on some of the challenges with using RR with graphics (OpenGL, X) and how it could benefit us.
  • LayerScope: LayerScope will be extended to show frame tree dumps and which display items are associated with which layer.
  • Task Tracer: Shelly presented Task Tracer. We discussed how to integrate it with the profiler and Cleopatra.
  • Ownerships: We’re looking into different approaches to add ownership of sub-modules within graphics and how it can help with improving design and reviews.
  • Designs: We discussed on how to bring better design to the graphics module. We’re going to perform design reviews in bugzilla and keep the final design in a docs folder in the graphics components. This means that design changes will be peer reviewed and versioned.

Introducing Scroll-Graph

For awhile I’ve been warning people that the FPS counter is very misleading. There’s many case where the FPS might be high but the scroll may still not be smooth. To give a better indicator of the scrolling performance I’ve written Scroll-Graph which can be enabled via layers.scroll-graph. Currently it requires OMTC+OGL but if this is useful we can easily port this to OMTC+D3D. Currently it works best on Desktop, it will need a bit more work for mobile to be compatible with APZC.

To understand how Scroll-Graph works lets look at scrolling from the point of view of a physicist. Scrolling and pan should be smooth. The ideal behavior is for scrolling to behave like you have a page on a low friction surface. Imagine that the page is on a low friction ice arena and that every time you fling the page with your finger or the trackpad you’re applying force to the page. The friction of the ice is applying a small force in the opposite direction. If you plot the velocity graph you’d expect to see something like this roughly:

Expected Velocity Graph

Expected Velocity Graph

Now if we scroll perfectly then we expect the velocity of a layer to follow a similar curve. The important part is *not* the position of the velocity curve but it’s the smoothness of the velocity graph. Smooth scrolling means the page has a smooth velocity curve.

Now on to the fun part. Here’s some examples of Scroll-Graph in the browser. Here’s I pan a page and get smooth scrolling with 2 minor inperfections (Excuse the low FPS gif, the scrolling was smooth):

Smooth ScrollGraph velocity = Smooth Scrolling

Smooth ScrollGraph velocity = Smooth Scrolling

Now here’s an example of a jerky page scrolling. Note the Scroll-Graph is not smooth at all but very ‘spiky’ and it was noticeable to the eye:

Jerky scrolling = Jerky Scroll-Graph

Jerky scrolling = Jerky Scroll-Graph

WebGL Project: Galaxy Map

I made my first small WebGL app that fetches a star catalog and displays stars within 25 parsec (80 lightyear) using WebGL+Three.JS. I also created a second view that will only show named star and will try to fetch their Wikipedia page on mouse over. I didn’t trim the catalog so be prepared to wait for 8mb of text to load :).

Try it out:

View 1: Stars within 80 light years

View 1: Stars within 80 light years

View 2: Named stars only

View 2: Named stars only

This demo is missing a lot of optimization so it shouldn’t be used for benchmarking. I also unfortunately didn’t get around to calculating and displaying the relative size of each star.

GFX Changes in Firefox 13

Every release the GFX team makes major changes to Firefox. By the time the release ships we’re often focused on future releases. Last week we decided in the public Graphics meeting (Join us) that we wanted to go back and announce the changes as they move in the release cycle to bring more visibility to what users are getting when they updating to Firefox 13. We hope to continue these posts every release. Here is the list I quickly put together using a few bugzilla and hg log queries. I tried to cherry pick important changes without any hard set of criteria. This list is far from inclusive. The best approximation of the Graphics changes in Firefox 13 is this.

I would also like to point out that we received major patches from the community in Firefox 13 and that this trend continues in Firefox 14, 15. We are starting development on Firefox 16 so get in touch with us if you’d like to contribute to GFX for Firefox 16.

Changes in Gecko 13

This released we focused on implementing the foundation for Off-main-thread compositing (OMTC). See my blog post for what OMTC buys us. It is still alpha quality but we will be shipping for mobile shortly and are in the process of getting it ready for Desktop:

  • Bug 598873 – Implemented alpha quality OMTC for Mac and Android

Improved our support for macbooks with dual GPU. You’ll find that Firefox will make better decisions about using the integrated GPU for low power usage and switch to the discrete GPU to handle intensive WebGL and plug-ins:

  • Bug 713305, 713305 – Improve dual GPU support on Mac with WebGL

Improved startup performance by optimizating how we query for D3D support on start-up:

  • Bug 722225 – Improve Firefox startup speed by ~5% (-70ms) on Windows by optimizing D3D10CreateDevice1

Updated our Graphics libraries:

  • Bug 721068 – Update to latest graphite2 code from upstream
  • Bug 698519 – Update to libjpeg-turbo 1.2

WebGL Improvements. Anisotropic support has been added. Check out the demo of anisotropic filter and select the checkbox to see the difference. NOTE: Anisotropic is not yet supported on windows with ANGLE renderer but we are working on it!

  • Bug 728354 – WebGL anisotropic texture filtering (Contributor)
WebGL anisotropic support

WebGL anisotropic support (on the right). Click to expand.

  • Bug 676071 – WebGL long identifier mapping (security and conformance)
  • Bug 710594 – WebGL about:memory improvements (Contributor)
WebGL Memory reporting

WebGL Memory reporting (1)

WebGL Memory reporting

WebGL Memory reporting (2)

  • Various WebGL conformance improvements (Bug 730554, Bug 727547) (Contributor)

Added support for apitrace as a debugging tool for powerusers on Android:

  • Bug 674753 – Add support for loading apitrace explicitly on Android
APITrace support for Fennec

APITrace support for Fennec

Various improvements to hardware accelerated compositing for all platforms.
Various improvements to font support.
Various improvements to the Mac Azure Quartz Canvas implementation.

Off Main Thread Compositing (OMTC) and why it matters

Like most desktop applications, Firefox is driven by an event loop. Currently this event loop is servicing a lot of events including page layout, drawing, image decoding and don’t forget JS. We do our best to handle events quickly (in a few milliseconds) and break up the ones that we know will take longer (such as image decoding). Any event, such as poorly written JS, that takes too long to process will cause the application to feel sluggish and will cause updates, animations and videos to pause.

Every web page is broken into a set of layers (backgrounds, canvas, video, web contents, position fixed element, elements that are being animated) by our layers system. When these layers are updated they must be flattened to the screen to show a final frame. This process is called ‘composition’. Currently this composition is driven by the main event loop. While compositing does add more load to the event loop from my experience the load it adds is often negligible. After all with hardware accelerated layers backends the work is mostly just queuing a few simple graphics command to the GPU.

Simplified painting pipeline (Desktop Current)

So why move compositing to a second thread if it’s relatively cheap? Because we need the event loop to be responsive to service composite events and any slow events (>15ms), such as a long JS script, will cause us to drop frames. By compositing on a separate thread we can still service layers update and keep the browser partially responsive even if the event loop is momentarily blocked by a long running script.

Simplified OMTC painting pipeline (Fennec Beta)

There is one catch however. If the main event loop is being hogged by an event then that will block certain updates, like page reflows, from reaching the compositor. However other types of updates can still be made while the main event loop is bogged down such as video which we decode in yet another thread, css/gif animations and plugins to name a few. Another use case is to support smooth panning and zooming on mobile. The goal we’re aiming for is to make Firefox more responsive.

The feature just shipped into the latest Fennec beta and provides a smoother experience while consuming less resources.

Simplified OMTC painting pipeline (Desktop/Fennec Future)

In the future we plan on landing more changes that will leverage this new architecture. We are considering many proposal shown above. Some are uncertain at this point such as having canvas workers. Right now our current focus is passing decoded video frames directly to the compositor and handling simple CSS animation asynchronously. These features will allow video and animations to perform smoothly over short periods of blocking similar to async scrolling on Fennec. We’re hoping to have some of these features landed in Q2 for mobile and desktop where OMTC will likely still be behind a preference until later this year. Stay tuned to these bugs to try out these features when they land.