I posted a change to TryChooser that will fetch the build queue and display it when selecting tests. Do your part: when possible, avoid platforms with large queues to get your results faster and help reduce contention.
With the landing of bug 799640, the profiler add-on can now profile shutdown. I’ve created a dashboard that summarizes data from 10 of my shutdown profiles. Take a look at the dashboard here. You will find the shutdown profiles in the second section, halfway down the page. You’ll see 10 profiles listed by total time or per function; clicking on any of them will open it in the preview and filter by that function. Shutdown is at the end of the profile, so I recommend focusing on everything from the first pink shutdown marker onward. You’ll see the following shutdown markers:
What are we doing during shutdown? Here’s some trivia about my shutdowns:
After I’ve had time to study this data more, I’m hoping to gather performance data on other people’s profiles to get a better sample of shutdown times, since my Firefox profile is simple and I don’t stockpile hundreds of tabs in my session like many do.
Now you can perform bugzilla queries from your URL bar. Here are some examples:
bug FIXED reporter:bgirard
bug blocking-fennec:+ assigned:bgirard
Here is a complete list of flags: https://bugzilla.mozilla.org/page.cgi?id=quicksearch.html#fields. Note that status: does not work in 4.0 (the current BMO version); see bug 662444.
We’ve been using the profiler to diagnose Snappy problems for several months now with great success. The profiler can tell us which images are slow to decode, which scripts take longer than 15 ms to execute, and what those scripts are doing. However, the call stack doesn’t provide enough information to show what caused the execution. For example, a small script snippet may change the style, causing cascading style changes that morph the entire page. The script execution may take less than 1 ms (i.e. it will likely not show up in a profile) and merely mark the styles as having changed. Then the next event recomputes the style, causing the entire document to be repainted. The goal is to blame a quick-running cause (a page change) for a long-running task (repainting).
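To make the cause/effect split concrete, here’s a toy sketch (my own illustration, not Gecko code): a stand-in “engine” in which a style write only sets a dirty flag, and the real cost is paid by whichever event flushes it later.

```javascript
// Toy model of the problem above: the script that changes style is cheap,
// so it barely shows up in a profile; the expensive restyle/repaint runs
// later, in a different event, far from the code that caused it.
var engine = { styleDirty: false, restyles: 0 };

function setStyle() {          // cheap: returns in well under 1 ms
  engine.styleDirty = true;    // just marks the styles as changed
}

function nextEventFlush() {    // expensive: recompute style, repaint
  if (engine.styleDirty) {
    engine.restyles++;         // in a profile, the cost lands here
    engine.styleDirty = false;
  }
}

setStyle();        // the "cause": quick, likely invisible in a profile
nextEventFlush();  // the "effect": the long-running task we want to blame
```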
Charging to the rescue! My goal is to be able to record a source for something like a page change and tag it/charge it to the work that it triggers. This is going to be a big project, and I’m not ready to abandon my current tasks, so I plan on working on a small piece every time I encounter the need for this. If you’re interested in helping out please let me know. Here is my current plan:
Part 1) Bug 809317: Adding infrastructure for printing the current location for debugging (Ready to land soon).
Part 2) Bug 777828: Allow unwinding results to be returned
Part 3) No bug: Support chaining unwinds as ‘causes’
Part 4) No bug: When sampling, collect ‘causes’ attributed to the current work.
Part 5) No bug: Display ‘causes’ in the profiler UI
Part 6) No bug: Gather ‘causes’ for expensive async operations.
Note this is just a rough plan. I don’t know how well this will work out in practice.
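As a rough illustration of Parts 3–5, here’s a hypothetical sketch (none of these names exist in the profiler) of how cheap operations could record ‘causes’ that later get attributed to the expensive work they trigger:

```javascript
// Hypothetical sketch of 'cause' charging, not the actual Gecko API.
var pendingCauses = [];

function chargeCause(label) {
  // A real implementation would capture an unwound stack here (Parts 1-2).
  pendingCauses.push({ label: label, time: Date.now() });
}

function drainCauses() {
  // When the expensive follow-up work runs, the sampler attributes the
  // pending causes to it and clears the queue.
  var causes = pendingCauses;
  pendingCauses = [];
  return causes;
}

// A cheap operation records itself as a cause...
chargeCause("style change from script");
// ...and the long-running task that it triggers gets charged for it.
var attributed = drainCauses();
```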
Just a quick update on the changes being worked on in the Gecko Profiler github repo:
Just a quick update on the Eideticker profiling support William and I have been working on. All the changes needed to sync a video recording with a profile have landed. Frames now show a binary counter in the top left corner; this counter is read back, and the samples collected for that frame are highlighted. It’s simple but effective, and very useful for optimizing how we draw.
You can try this yourself by checking out this real-life example recorded this morning. When stepping through the video, the selection will be updated to match the frame counter in the top left. You can then filter samples for the current frame. Note that on mobile, because of off-main-thread compositing, we typically present many intermediate frames before getting an update from the main thread.
Just a quick update on what’s new in the Gecko Profiler in the past month:
- Added support for profiling Fennec nightly builds if you have a rooted phone. (Changes are still on inbound but should be ready soon.)
- Added support for Windows XP.
- Improved the selection code to guess the optimal selection when changing views (top-down to bottom-up, focus on a subview, normal to js-only).
- Profiler now listens to the GC stats.
With the native release we refactored how we render, moving to a tiling approach. This is beneficial because it lets us minimize the work needed to paint as we pan and zoom. The goal is to be able to grow and shrink our view, and move its position within a logically unbounded page, without having to reallocate and copy our retained page buffer.
This refactoring was also a blocker for other optimizations that I am currently working on implementing. First, I landed a patch to add the ability to draw progressively and to interrupt drawing between chunks of tiles (bug 771219). This lets our content thread and compositor paint and upload in parallel instead of serially, and it opens up the possibility of showing the page progressively, tile by tile. Interruptible drawing will let us notice that the user has panned outside of where we are painting, abort the operation, and re-target our paint.
Next up, I’m currently working on drawing tiles at a low resolution to replace the ‘screenshot’ code. Currently we try to detect when the page has changed and paint it into a small offscreen ‘screenshot’ buffer. This ‘screenshot’ is drawn in areas of the page that we’re still working on painting, which is a huge improvement to the user experience. However, the current code isn’t integrated with our Layers system, which means that the page-change notifications are not reliable and updates are more expensive. The goal is to move this code inside the Layers system, where it can overcome these limitations, and to make it tile-based.
Once all of these tile improvements are ready, we can move beyond our current approach of predicting where the view is going to be, painting it start to finish, sending it to the GPU all at once, and hoping that what we painted is still inside the view. With these new changes we will be able to improve our heuristics: aborting a paint if it’s outside the view, drawing quickly at a reduced resolution first if we’re panning quickly, drawing the most important tiles first, presenting the paint progressively, and uploading it to the GPU in parallel, piece by piece.
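The abort-and-re-target heuristic can be sketched roughly like this (an illustration only; paintProgressively, isTileVisible and paintTile are made-up names, not Gecko’s actual tiling code):

```javascript
// Illustrative sketch of interruptible, progressive tile painting.
// Tiles are painted one at a time; between tiles we can notice that the
// viewport has moved and abort, letting the caller re-target the paint.
function paintProgressively(tiles, isTileVisible, paintTile) {
  var painted = [];
  for (var i = 0; i < tiles.length; i++) {
    if (!isTileVisible(tiles[i])) {
      // The user panned away: stop painting tiles that are off-screen.
      return { aborted: true, painted: painted };
    }
    paintTile(tiles[i]);
    painted.push(tiles[i]);
    // In the real system, the compositor could upload the tiles painted
    // so far here, in parallel with further painting.
  }
  return { aborted: false, painted: painted };
}
```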
Here’s a demonstration of my current set of patches. Note that the performance isn’t tweaked, and it’s tested on a deliberately slow page (a gradient) to better demonstrate the progressive and reduced-resolution painting.
This demonstrates progressive tile painting and a simple heuristic: new tiles are drawn first at low resolution, then again at full resolution. It does not yet abort or prioritize tile painting, which would be useful between the 2 and 3 second marks, where we’re still painting outside the screen.
The Gecko Profiler now supports profiling JS alongside C++, thanks to the amazing work of Alex Crichton. Note that this feature is available across all platforms, including desktop, Android and B2G! We’ve been testing it out over the past few weeks, and it has already yielded important insight into performance bottlenecks caused by JS scripts (766333, 767070, 774811, 777266, 765930, and more) as well as identifying many problematic add-ons we plan to follow up on.
This new tool will help us make sure the browser is always snappy. Unlike other profilers, it does not treat each sample equally; instead it looks for areas of the code that block the responsiveness of the browser. As Taras calls it, this is an interactivity profiler. Below are some examples of problems, with profiles and an explanation of the diagnosis we derived.
Here’s an example of the data we can collect; you can see the profile yourself here. In this profile the forum has a script called writeLink() which invokes Document.write() repeatedly. This should be avoided: as we can see further down the stack, it causes synchronous HTML5 parsing to happen for each call to document.write(). That is slow in and of itself, but not enough for a web author to notice on its own. However, with adblock installed this causes the content policy to be checked for the whole document after each and every call to Document.write(), slowing this page to a crawl.
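One way a page author could avoid this pattern, sketched here with a hypothetical buildLinks() helper (not the forum’s actual code), is to build the markup up front and call document.write() once, so there is a single synchronous parse (and a single content-policy pass) instead of one per link:

```javascript
// Illustrative fix for repeated document.write() calls: concatenate the
// markup first, then write it in one shot.
function buildLinks(links) {
  var parts = [];
  for (var i = 0; i < links.length; i++) {
    parts.push('<a href="' + links[i].href + '">' + links[i].text + '</a>');
  }
  // The caller then does a single document.write(buildLinks(links)),
  // triggering one parse instead of one per link.
  return parts.join("");
}
```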
Here’s an example of a JS call that takes over 125 ms to return, leading to a short UI stutter. Follow along with the profile. Here we have a timer invoking ‘Thumbnails_capture() @ browser.js:5552’ (outside the screenshot; see the profile link). This causes the page to be rendered into an offscreen surface and then compressed as a PNG. This particular page happens to have a WebGL context, which makes matters worse: to get WebGL content into an offscreen image, as opposed to the screen in a hardware-accelerated configuration, the surface must be read back, causing a synchronous GPU flush. Then the page must be painted and composited into the offscreen image. Once this is completed, the image is encoded to a PNG. Running all these steps at once is enough to block the UI for 0.1 seconds. While this isn’t particularly long, it is enough for video, scrolling and animations to briefly feel jerky.
The solution in this case is to break up the image rendering and the encoding into two separate events, yielding two shorter pauses of about 60 ms each and letting animations run more smoothly. In the future we will want to either optimize the rendering and PNG encoding, break them down further, or move them off the main thread.
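A rough sketch of that split (renderPage and encodePNG are stand-ins for the real operations, not actual browser.js functions): posting the encode step back to the event loop lets other events run between the two halves.

```javascript
// Illustrative split of one ~125 ms block of work into two shorter events.
// Other events (animations, scrolling) can run between the two halves.
function captureThumbnail(renderPage, encodePNG, done) {
  var pixels = renderPage(); // first event: paint into an offscreen surface
  setTimeout(function () {
    done(encodePNG(pixels)); // second event: compress the pixels to a PNG
  }, 0);
}
```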
Laying out the page is a difficult process that is heavily optimized in all major browsers. For the most part this step is invisible to most pages and Just Works. The real problem occurs when a script queries the position of an element after it has modified the page. When the page is modified the layout state is marked dirty and any query to it will cause your page to be laid out synchronously to calculate the up-to-date position of your element.
Here’s a real-world profile of this that we found in the nightly version of Fennec (this code wasn’t shipped). In this case we were doing something that resembles this:
```javascript
var y1 = myElement1.offsetLeft;        // <-- Causes a layout flush
myElement2.style.left = y1 + 5 + "px"; // <-- Dirties the layout information
var y2 = myElement2.offsetLeft;        // <-- Causes another layout flush
myElement2.style.left = y2 + 5 + "px"; // <-- Dirties the layout information
// Repeat a few times
```
For a page of the complexity of CNN.com, a layout will take more than 50 ms on mobile, so doing four layouts synchronously from JS just to position a few elements near each other, instead of using HTML/CSS, will cost you over 200 ms. You will likely have dirtied the layout information again before your script is finished, requiring another layout and finally a paint, so you’ve just blocked the main thread for over 0.4 seconds on mobile. Be very careful to avoid triggering synchronous layout!
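A hedged sketch of the fix: batch all the layout reads before all the writes, so at most one synchronous layout is triggered. The elements below are plain-object stand-ins so the snippet runs outside a browser; in a page they would be real DOM elements.

```javascript
// Stand-ins so this sketch runs outside a browser. In a real page, each
// offsetLeft read on a dirty layout is what forces a synchronous flush.
var myElement1 = { offsetLeft: 10, style: {} };
var myElement2 = { offsetLeft: 30, style: {} };

// Read phase: query every layout value up front (at most one flush).
var y1 = myElement1.offsetLeft;
var y2 = myElement2.offsetLeft;

// Write phase: apply every mutation afterwards. Layout is only dirtied
// after the reads are done, so it is recomputed once, at the next paint.
myElement1.style.left = (y1 + 5) + "px";
myElement2.style.left = (y2 + 5) + "px";
```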
With the new JS support it is now easier to diagnose issues, so try it out, file bugs, and write about the anti-patterns you commonly find!
I ran a quick experiment after someone pointed out to me that second-generation Intel Core CPUs provide features for querying the power-management status of the CPU, such as its current frequency and power usage in watts. I re-purposed the responsiveness correlation in the Gecko Profiler to instead use the average power consumption of the CPU over the last few milliseconds. This lets you see the current power usage of Firefox as it performs different tasks (JS, layout, gfx). The idea is similar to looking at throughput performance: find the area of the code that has the highest power and energy consumption and optimize it. Here’s a sample:
I should note that the data is expected to be noisy because I had other applications running, plus the typical set of background processes you’d find on a Mac. Nevertheless, every profile I ran showed the power usage drop significantly at every point where Firefox was waiting for events, so it is in fact working. I haven’t done much analysis on the data, but a quick look at the profiles suggests that our SSE2 code is particularly power hungry.
A neat idea would be to compute the energy consumption from the power consumption and break it down by Gecko module.
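For instance (a sketch of my own, not existing profiler code), integrating the sampled power readings over the sampling interval gives joules, which could then be bucketed by whichever module was on the stack at each sample:

```javascript
// Illustrative energy breakdown: energy (joules) = sum of power (watts)
// times the sampling interval (seconds), grouped by module.
function energyByModule(samples, intervalMs) {
  var dt = intervalMs / 1000; // seconds per sample
  var totals = {};
  for (var i = 0; i < samples.length; i++) {
    var s = samples[i];
    totals[s.module] = (totals[s.module] || 0) + s.watts * dt;
  }
  return totals; // joules attributed to each module
}
```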
The biggest roadblock to implementing this is that the power information isn’t available in user mode, and I don’t think such APIs are widely exposed by operating systems. Luckily, Intel provides a sample library and driver that let you access this information. Once I had this in place, it was simply a matter of querying it rather than the event-loop status the Profiler normally uses. Because this data requires a driver, you won’t see this feature hit the Profiler unless I see a big demand for it.