Efficient Multi-Process profiling on B2G

Until a few days ago profiling on b2g was either off or on for the whole system. Worse profiling secondary threads would profile secondary threads of every process. These limitations caused profiling to overwhelm the system and skew performance numbers. Additionally it was difficult to follow how processes waited on each other.

With the landing Bug 914654 it is now possible to profile specific threads on specific processes and merge the results with little effort. Currently profiling secondary threads is disabled on b2g but just locally remove the gonk #ifdef from mozilla_sampler_register_thread and mozilla_sampler_unregister_thread and you’re good to go. Once that’s ready use the profile.sh to start profiling the important threads of your choice. For example if you’re looking into animation delay with the Homescreen you want to run ‘./profile.sh start b2g Compositor && ./profile.sh start Homescreen’ then run ./profile.sh pull which will prepare and merge the data into profile_captured.sym.

Here’s a sample profile collected for the Homescreen swipe animations. Here you can notice the b2g compositor waiting for the paint from the Homescreen then furiously compositing afterwards at nearly 60 FPS. Thus the delay in starting the Homescreen swipe is not caused by the compositor but rather caused by the Homescreen taking too long to rasterize the layers containing the app icons. In this case it’s taking 100ms to prepare which means we’ve missed the first 6 frames of the animation! Happy profiling!

Multi-Process b2g profile

Multi-Process b2g profile

Better Visual Studio Projects

I just wanted to add to Vlad’s post and show the current progress of generating the Visual Studio Project made this weekend by Vlad and I.

By using the approach mentioned in Vlad’s post we can target Visual Studio directly instead of a build system like Ninja. This gives Visual Studio the correct build flags for each file allowing features like IntelliSense and building from Visual Studio using MSBuild to work perfectly. I’m generating a project per shared library and a Visual Studio solution to group the results together. The results are quite neat because it shows at a glance which sources files are included in each library. Any files listed in the visual studio project are safe to edit which should cover the common developer workflow of editing cpp/headers and recompiling. Other rarer changes for platform developers like adding a new file and tweaking build settings should be done by closing visual studio, doing a local change and rebuilding with the mozilla build system and re-openning the updated Visual Studio Solution.

Visual Studio Build

Visual Studio IntelliSense + Build

This isn’t ready for general use yet because we still don’t generate the Link step properly and handle some corner cases in XUL yet but simpler libraries like gkmedias.dll is already working well. Supporting XUL should only require a few days of work. Once this is fixed we can move to supporting advanced Visual Studio features like “Edit and Continue” and distributed compiling.

Profiler @ Snappy Work Week

Last week I attended the snappy work week with goal of syncing up with Julian Seward and Mike Hommey regarding the new Breakpad unwinding feature, planning and prioritizing new feature work. I also got lots of valuable feedback. Here’s what you need to know if you’re a frequent user of the Gecko Profiler.

Breakpad Unwinding (Mobile)

For the last few months Julian has been spearheading integrating the breakpad unwinding feature to the Gecko Profiler. This turned out to be a very complicated task because we’re trying to use Breakpad for something it wasn’t meant to do. This work is now landed but it only works in a very specific build configuration (unwind tables, unpacking the libraries, no elfhack). We are currently working on addressing these problems to make it work in the default Nightly configuration.

Our current target is to use Breakpad unwinding in the Linux May 1st nightly. A week later have the Fennec nightly support breakpad unwinding. I’m hoping afterwards to support Breakpad unwinding on B2G central (not b2g18) but I expect this may be difficult on a memory constraint phone so I currently have no estimate for this at this time.

If you looked at the code for the profiler you may notice TableTicker1+TableTicker2. During the work week we refactored the code significantly to address this code duplication while resolving inconsistencies between the frame-pointer unwinder and the Breakpad unwinder.

Multi-Thread/Multi-Process Support

The profiler will soon support showing samples and timelines for multiple threads. This is already somewhat supported on B2G if you use a script written by Vlad. This will become easier as we begin to add multi-thread support to all the platform backends (already in progress for Mac and Linux) thanks to the help of James Willcox. This ties in heavily with the next two section (IPDL/Plugins).

Plugin Support

Plugin issues have long been identified as the single biggest contributor to responsiveness issues by Vladan‘s Chrome Hang work. This was the topic of the break out discussion headed by Benjamin Smedberg. We discussed a plan of attack by adding profiling to the plugin process which should land this week but require a manual step. This feature will let us know what’s going on in the plugin process when the browser’ main thread is waiting for an IPC response (and vice versa). It will improve as we add profiling support for profiling IPDL messages.

Flash Profiling

Flash initializing audio on plugin-container startup.

IPDL Message Support

During the week we decided to add profiler labels to each IPDL call automatically. That will help where the unwinder isn’t supported but doesn’t give us the data we need. In particular we’re looking to get better data about how IPDL messages are queued, how long they take to get dequeued and to processed. Combined with the Plugin support this should help us diagnose complex issues with our plugin-container or get better data on how particular plugins are hurting the responsiveness of the browser. Georg Fritzsche has an excellent plan in bug 853864 on how to implement this.

Shutdown Reporter Results

In a previous blog post I asked for volunteers to send in their shutdown profiles automatically. I’ve received hundreds of slow shutdown profiles, 30 profiles of which are over 10 seconds and one as high as 80 seconds! These profiles show us where Firefox is spending its time when closing down to the millisecond.

Before I started this experiment I asked around for what people thought was the cause of the slow shutdowns. The overwhelming response I got was that it was Garbage Collection and Cycle Collection. While those operations do make the shutdowns slower, they were rarely responsible for those 10 to 80 seconds shutdown times.

Here’s some interesting issues we found:

  • A few start-up optimizations moved operations off the startup path and into timers that would update the startup cache, or load fonts. These timers were never canceled when shutting down so they would lead to things such as font loading while shutting down. Since you have to open & close the browser within a minute to see these issues they are not critical for our users but these are certainly bugs we are cleaning up. Keep in mind if you are running timers you should stop them on shutdown before they fire if they are no longer useful.
  • On a related note we have timers going off during the shutdown. Particularly GC and CC timers. There is no guarantee that the shutdown will take long enough for these timers to run. If these are useless then we will waste time but if they are important they may not run.
  • Stopping plug-ins can take up to a second. I do not capture what is happening in the plug-in process so I can’t tell if it’s our container or the plug-in.
  • Releasing style sheets on shutdown will sometimes total over 1 second of execution on slow shutdown.
  • Time spent in sqlite3close in some profiles is as high as 10 seconds.
  • Bookmark backups are hurting us.
Shutdown metrics

Shutdown metrics

A big thumbs up to people who are submitting shutdown profiles. The data we are collecting is very valuable in helping us understand how Firefox behaves outside the labs.

GTest Has Landed. Start Writing Your Unit Tests.

This weekend GTest landed in the mozilla-central tree. See Bug 767231 for the changes and follow up bugs. We have some follow-up changes coming such as adding a mach target, replacing –enable-gtest by –enable-tests and adding gtests to tinderbox. Everything is now ready for developers to start adding their own unit tests to libxul.

All the unit tests will be linked within libxul. This means that you don’t have to export any symbols you intend on testing. See Bug 767231 for the pros and cons of this solution. The summary is that unit tests will run against a different libxul library (libxul+unittest) than the one we will ship (libxul only) at the benefit of having access to all the symbols. Unit tests will not be shipped in a normal release build.

To run GTest build with ‘–enable-gtest’ and simply add the -unittest option when running firefox. To add an unit test create a test .cpp file, declare it to the makefile ‘GTEST_CPPSRCS = TestFoo.cpp’ and you’re done.

For more details see the up-to-date documentation: https://developer.mozilla.org/en-US/docs/GTest


Is Your Shutdown Slow? Turn On The Performance Reporter!

Does your Firefox take a long time to shutdown? Do you get “Firefox is already running” messages when restarting? Well the Firefox Performance team would like to fix that, so please keep reading!

The Gecko Profiler extension now has a new feature called the (Automatic) ‘Performance Reporter’. If you opt into this feature, the profiler will watch your browser for any performance hiccups and will automatically send a detailed bug report for the Performance team to fix! Currently, this Performance Reporter feature monitors only your browser’s shutdown behavior and only sends us a report if shutdown takes longer then 1.5 seconds, but we would like to extend it to cover all aspects of Firefox performance. The reporter works in the background and does not interfere with your use of the browser.

The performance reports contain the URLs of the tabs you have open and are uploaded publicly for processing, so if you are not comfortable with communicating this information, do not enable this feature — we understand! We have taken precautions to never record anything coming from a private browsing window. Also, please note that the Performance Reporter is not related to Firefox’s Telemetry or Health Report features which never send any sensitive information such as URLs. The Performance Reporter is different — it collects very rich data so we can get a complete picture of any performance problem.

How do you turn this on? Make sure you’re running the latest version of the Gecko Profiler which should read as v1.11.17 or later. If I have not scared you away great, turning on the Performance Reporter is as simple as flipping ‘profiler.performanceReporter’ in about:config.

Analyzing page load performance frame-by-frame

We’ve previously added video profiles in mobile and run them daily with eideticker. Here’s a proof a concept for doing the same thing for tracking page load on desktop.

This works by tagging frames and CPU samples with a frame number. During the analysis we use this information to correlate the video stream with the CPU samples. This lets us take something like a page load, slow it down and watch what we’re executing at each frame.

Take a look at this profile and follow along. I’ll only describe 4 interesting frames but feel free to look at the rest. When seeking with the video the corresponding region will be highlighted in the profile timeline. You can then click on samples in that timeline and they will be selected and appear as green:

  • Seek to 4 seconds where the page is requested. This will continue to 5.5 seconds. During this we see a little network activity and a bit of work compositing to update the throbber. This is no surprise since my WiFi signal was poor.
  • Seek to 5.5 seconds where the page starts to appear on screen. Lets look closer at the first frame in order.
    1. 20 ms compiling JS
    2. 8 ms executing the JS from the page
    3. 5 ms is spent in Adblock
    4. 11 ms processing styles
    5. 37 ms reflowing the page, 16 ms of which is doing main thread IO loading fonts when calling ‘CGFontCreateWithFontName’.
    6. Negligible time building the display list (profiler sampling rate too high)
    7. 19 ms drawing and compositing
  • Seek to 5.9 seconds where we are waiting on the plugin to load. The plugin takes 200ms to load. This means we miss several frames.
  • Seek to about 7.3 seconds. During this frame the plugin stream is created. This adds another 100 ms delay.
Video Profile of Page Load

Video Profile of Page Load

My goal is to start making video capture features into the profiler add-on to make these profiles easier to capture.