Analyzing Shutdown Performance

With the landing of bug 799640, the profiler add-on can now profile shutdown. I’ve created a dashboard that summarizes data from 10 of my shutdown profiles. Take a look at the dashboard here. You will find the shutdown profiles in the second section half way down the page. You’ll see 10 profiles under total or per function, clicking on any of them will open them in the preview and will filter by the function. Shutdown is at the end of the profile so I recommend focusing from the first pink shutdown marker and onward. You’ll see the following shutdown markers:

  • Shutdown start: A shutdown has been requested.
  • Shutdown early: Where we hope to call exit(0) eventually. Nearly everything after will not run with a few exceptions that will be moved before this point.
  • Shutdown xpcom: Where we hope to call exit(0) soon.

What are we doing during shutdown? Here’s some trivia about my shutdowns:

  • Takes from 1,000ms to 2,700ms making shutdown highly variable. As expected runs with higher uptime had slower shutdowns.
  • Most of the shutdown time is spent doing GC/CC. This is because we persist data on destructors forcing us to clean up. This is getting fixed by exit(0) in bug 662444.
  • Placing exit(0) at ‘Shutdown early’ would save between 1/3 to 2/3 of the shutdown time depending on the instance.
  • Our shutdown is blocked by plugins because they may also need to persist data. Sadly this has to happen synchronously on the main thread and takes a consistent 90ms for me. We’re discussing options in bug 818265.
  • Pages such as block shutdown by running scripts on ‘page hide’ which is called on shutdown. This is unbounded but in my case was about 100ms.
  • Startup and shutdown performance bugs are being tracked in bug 810156.
  • Shutting down with less the 10 seconds of uptime will cause the startup cache to be saved on shutdown bugĀ 816656.

After I’ve had time to study this data more I’m hoping gather performance data on other people’s profile to give a better sample of shutdown times since my Firefox profile is simple and I don’t stockpile 100s of tabs in my session like most do.

Quick bugzilla queries

Here’s a tip to speed up entering a bugzilla search query. Set a ‘bug’ keyword to bugzilla’s quicksearch field:

Now you can perform bugzilla queries from your URL bar. Here are some examples:

bug FIXED reporter:bgirard
bug component:Graphics
bug blocking-fennec:+ assigned:bgirard

Here is a complete list of flags: Note that status: does not work in 4.0 (The current BMO version), see bug 662444.

Profiler: Charging expensive operations to cheap async causes

We’ve been using the profiler to diagnose snappy problems for several months now with great success. The profiler can tell us which images are slow to decode, what scripts take longer then 15 ms to execute and what they’re doing. However the callstack doesn’t provide enough information to show what caused the execution. For example a small script snippet that changes the style causing cascading style changes that morphs the entire page. The script execution may takes less then 1 ms (i.e. likely not show up in a profile) and mark the styles as having changed. Then the next event recomputes the style causing the entire document to be repainted. The goal is to blame a quick running cause (a page change) to long running task (repainting).

Charging to the rescue! My goal is to be able to record a source for something like a page change and tag it/charge it to the work that is triggered. This is going to be a big project and I’m not ready to abandon my current tasks so I plan on working on a small piece every time I encounter the need for this. If you’re interested in helping out please let me know. Here are my current plan:

Part 1) Bug 809317: Adding infrastructure for printing the current location for debugging (Ready to land soon).

Part 2) Bug 777828: Allow unwinding results to be returned

Part 3) No bug: Support chaining unwinds as ’causes’

Part 4) No bug: When sampling collect ’causes’ attributed to the current work.

Part 5) No bug: Display ’causes’ in the profiler UI

Part 6) No bug: Gather ’causes’ for expensive async operations.
Note this is just a rough plan. I don’t know how well this will work out in practice.

Upcoming Gecko Profiler Changes

Just a quick update on the changes being worked on in the Gecko Profiler github repo:

  • Moving the profiler over to the toolbar.
  • Button to profile startup. This restarts the browser and give you a profile (example, bug 799638)
  • A target feature that will allow profiling either the current application or open a remote connection to profile over TCP or ADB+TCP. This will replace the confusing set of Mobile buttons and use the remote debugging protocol like the debugger/console.
  • Cleopatra links will now reflect the current selection (if the profile has been uploaded). This will let you share the exact location where the problem is. (example)

Gecko Profiler 1.9.4

Just a quick update in whats new in Gecko Profiler in the past month:
– Added support to profile Fennec nightly builds if you have an rooted phone. (Changes are still on inbound but should be ready soon)
– Added support for Windows XP.
– Show only Javascript frames (Ehsan said he would blog about this one soon)
– Improved the selection code to guess the optimal selection when changing views (top-down to bottom-up, focus on a subview, normal to js-only).
– Profiler now listens to the GC stats.

GC stats provide more details

Details can be expended
– View the JS Source
Javascript source listing

Javascript Profiling with The Gecko Profiler and JS Anti-pattern

The Gecko Profiler now supports profiling JS along side with C++ thanks to the amazing work of Alex Crichton. Note that this feature is available across all platforms including desktop, Android and B2G! We’ve already been testing it out over the past few weeks and it has already yield important insight into performance bottlenecks caused by js scripts (766333, 767070, 774811, 777266, 765930, and more) as well as it has identified many problematic add-ons we plan to follow up on.

This new tool will let us make sure the browser is always snappy. Unlike other profilers each sample is not treated equality but instead looks for areas of the code that block the responsiveness of the browser. As Taras calls it, this is an interactivity profiler. Below are some of the examples of problems with profiles and an explanation of the diagnostic we derived.

Example 1: document.write() and Adblock

Here’s an example of the data we can collect, you can see the profile yourself here. In this profile the forum has a script called writeLink() which invokes Document.write() repeatedly. This is something that should be avoided as we can see further down the stack it causes synchronous HTML5 parsing to happen for each call to document.write(). This in of itself is slow but not enough for a web author to notice on its own. However with adblock installed this causes the content policy to be checked for the whole document after each and every call to Document.write() slowing this page to a crawl.

Forum+Adblock jank caused by document.write interaction

Forum+Adblock jank caused by document.write interaction

Example 2: Thumbnail blocking the main thread

Here’s an example of a JS call that takes over 125ms to return leading to a short UI stutter. Follow along with the profile. Here we have a timer invoking ‘Thumbnails_capture() @ browser.js:5552′ (outside the screenshot, see the profile link). This causes the page to be rendered into an offscreen surface then compressed as a PNG. This particular page happens to have a WebGL context which makes the matters worse. To get a WebGL context into an offscreen image, as opposed to the screen in a hardware accelerated configuration, required this surface to be read back causing a synchronous GPU flush. Then the page must be painted and composite into the offscreen image. Once this is completed the image is encoded to a PNG. Running all these steps at once is enough to block the UI for 0.1 seconds. While this isn’t particularly long it is enough for a video, scrolling and animations to briefly feel jerky.
The solution in this case is to break up the image rendering and the encode into two separate events. Yielding two shorter pauses of 60ms letting the animations run smoother. In the future we will want to either optimize, break down or move off the main thread the rendering and the PNG encoding.

Thumbnailing causing Jank

Thumbnail code causing Jank

Example 3: Flushing layout from JS

Laying out the page is a difficult process that is heavily optimized in all major browsers. For the most part this step is invisible to most pages and Just Works. The real problem occurs when a script queries the position of an element after it has modified the page. When the page is modified the layout state is marked dirty and any query to it will cause your page to be laid out synchronously to calculate the up-to-date position of your element.

Here’s a real world profile of this we found in the nightly version of Fennec (this code wasn’t shipped). In this case we where doing something that resembles this:

var y1 = myElement1.offsetLeft; // <-- Causes a layout flush = y1 + 5; // <-- Dirty layout information
var y2 = myElement2.offsetLeft; // <-- Causes another layout flush = y2 + 5; // <-- Dirty layout information
// Repeat a few times

For a page of the complexity of a layout will take more then 50ms+ on mobile, now doing 4 layouts synchronously from JS just to layout a few elements near each other instead of using HTML/CSS will cost you over 200ms. You will have likely dirty the layout information before your script is finished needing a layout and finally a paint so you’ve just blocked the main thread for over 0.4 seconds on mobile. Be very careful to avoid triggering synchronous layout!

JS Flushing Layout

JS Flushing Layout synchronously to get position information

Find your own examples and blog them!

With the new JS support is now easier to diagnose issues so try it out, file bug and write about your commonly found anti-patterns!

Making Profiling Easier: Automated Profiler Diagnostics

The Gecko Profiler now supports automated diagnostics. I’ve added simple logic to look for common known signature of main thread IO, synchronous plugin operations and garbage collection to make profiler even easier to read. The goal here is to make the profiles easier to read and warn others where we already have bugs on file so instead of filing a dupe they can jump right into the discussion and help resolving the bug.

Automated Diagnostics

Automated Diagnostics – Click to expand!

GC Heavy Page - Click to expand!

GC Heavy Page – Click to expand!

Currently we only look for a few signatures but the plan is to expand the list. The signatures are for Mac but I’ll be accepting signatures for all platforms. If you have your own issue you would like to add either ping me, file a bug under ‘Core::Gecko Profiler’ or send me a pull request.

GFX Changes in Firefox 13

Every release the GFX team makes major changes to Firefox. By the time the release ships we’re often focused on future releases. Last week we decided in the public Graphics meeting (Join us) that we wanted to go back and announce the changes as they move in the release cycle to bring more visibility to what users are getting when they updating to Firefox 13. We hope to continue these posts every release. Here is the list I quickly put together using a few bugzilla and hg log queries. I tried to cherry pick important changes without any hard set of criteria. This list is far from inclusive. The best approximation of the Graphics changes in Firefox 13 is this.

I would also like to point out that we received major patches from the community in Firefox 13 and that this trend continues in Firefox 14, 15. We are starting development on Firefox 16 so get in touch with us if you’d like to contribute to GFX for Firefox 16.

Changes in Gecko 13

This released we focused on implementing the foundation for Off-main-thread compositing (OMTC). See my blog post for what OMTC buys us. It is still alpha quality but we will be shipping for mobile shortly and are in the process of getting it ready for Desktop:

  • BugĀ 598873 – Implemented alpha quality OMTC for Mac and Android

Improved our support for macbooks with dual GPU. You’ll find that Firefox will make better decisions about using the integrated GPU for low power usage and switch to the discrete GPU to handle intensive WebGL and plug-ins:

  • Bug 713305, 713305 – Improve dual GPU support on Mac with WebGL

Improved startup performance by optimizating how we query for D3D support on start-up:

  • Bug 722225 – Improve Firefox startup speed by ~5% (-70ms) on Windows by optimizing D3D10CreateDevice1

Updated our Graphics libraries:

  • Bug 721068 – Update to latest graphite2 code from upstream
  • Bug 698519 – Update to libjpeg-turbo 1.2

WebGL Improvements. Anisotropic support has been added. Check out the demo of anisotropic filter and select the checkbox to see the difference. NOTE: Anisotropic is not yet supported on windows with ANGLE renderer but we are working on it!

  • Bug 728354 – WebGL anisotropic texture filtering (Contributor)
WebGL anisotropic support

WebGL anisotropic support (on the right). Click to expand.

  • Bug 676071 – WebGL long identifier mapping (security and conformance)
  • Bug 710594 – WebGL about:memory improvements (Contributor)
WebGL Memory reporting

WebGL Memory reporting (1)

WebGL Memory reporting

WebGL Memory reporting (2)

  • Various WebGL conformance improvements (Bug 730554, Bug 727547) (Contributor)

Added support for apitrace as a debugging tool for powerusers on Android:

  • Bug 674753 – Add support for loading apitrace explicitly on Android
APITrace support for Fennec

APITrace support for Fennec

Various improvements to hardware accelerated compositing for all platforms.
Various improvements to font support.
Various improvements to the Mac Azure Quartz Canvas implementation.