Posted by: benoitgirard | November 2, 2012

Upcoming Gecko Profiler Changes

Just a quick update on the changes being worked on in the Gecko Profiler github repo:

  • Moving the profiler over to the toolbar.
  • Button to profile startup. This restarts the browser and gives you a profile (example, bug 799638).
  • A target feature that will allow either profiling the current application or opening a remote connection to profile over TCP or ADB+TCP. This will replace the confusing set of Mobile buttons and use the remote debugging protocol, like the debugger and console.
  • Cleopatra links will now reflect the current selection (if the profile has been uploaded). This will let you share the exact location where the problem is. (example)

Posted by: benoitgirard | September 21, 2012

Video Synced Profiling

Just a quick update on the Eideticker profiling support William and I have been working on. All the changes needed to sync a video recording with a profile have landed. The sync information shows up as a binary counter in the top left of each frame. This counter is read back and the samples collected for that frame are highlighted. It’s simple but effective and very useful for optimizing how we draw.

Video correlation allows stepping samples frame by frame

You can try this yourself by checking out this real-life example recorded this morning. When stepping through the video, the selection is updated to match the current frame counter in the top left. You can then filter samples for the current frame. Note that on mobile, because of Off-Main-Thread Compositing, we typically present many intermediate frames before getting an update from the main thread.
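
As a rough illustration of the idea, the sketch below decodes a binary counter into a frame number and then picks out the samples tagged with that frame. The bit order (most significant bit first), the `frameNumber` field and both function names are assumptions made for this example; this is not the actual Eideticker or profiler code.

```javascript
// Hypothetical sketch: bits are assumed to be read left to right,
// most significant bit first, as booleans (true = lit square).
function decodeFrameCounter(bits) {
  var value = 0;
  for (var i = 0; i < bits.length; i++) {
    value = value * 2 + (bits[i] ? 1 : 0);
  }
  return value;
}

// Hypothetical sample shape: { frameNumber: <int>, stack: [...] }.
function samplesForFrame(samples, frameNumber) {
  return samples.filter(function (sample) {
    return sample.frameNumber === frameNumber;
  });
}

var samples = [
  { frameNumber: 4, stack: ["paint"] },
  { frameNumber: 5, stack: ["layout"] },
  { frameNumber: 5, stack: ["composite"] }
];
var currentFrame = decodeFrameCounter([true, false, true]); // binary 101
var highlighted = samplesForFrame(samples, currentFrame);
```

Because intermediate frames can be presented without a main thread update, several consecutive video frames may map to the same set of samples.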

Posted by: benoitgirard | September 4, 2012

Gecko Profiler 1.9.4

Just a quick update on what’s new in the Gecko Profiler in the past month:
- Added support for profiling Fennec nightly builds if you have a rooted phone. (Changes are still on inbound but should be ready soon)
- Added support for Windows XP.
- Show only JavaScript frames (Ehsan said he would blog about this one soon)
- Improved the selection code to guess the optimal selection when changing views (top-down to bottom-up, focus on a subview, normal to js-only).
- Profiler now listens to the GC stats.

GC stats provide more details


Details can be expanded
- View the JS Source
JavaScript source listing

Posted by: benoitgirard | August 29, 2012

Tiling Improvements in Fennec

With the native release we refactored how we render to use a tiling approach. This is beneficial because it lets us minimize the work needed to paint as we pan and zoom. The goal is to be able to increase and decrease the size of our view, and move its position in a logically unbounded page, without having to reallocate and copy our retained page buffer.

This refactoring was also a blocker for other optimizations that I am currently implementing. First I landed a patch to add the ability to draw progressively and interrupt drawing in chunks of tiles (bug 771219). This lets our content thread and compositor paint and upload in parallel instead of serially, which opens up the possibility of showing painting progressively, tile by tile. Interrupting drawing will let us notice that the user has panned outside of where we are painting, abort the operation and re-target our paint.

Next up, I’m working on drawing tiles at a low resolution to replace the ‘screenshot code’. Currently we try to detect when the page has changed and paint it into a small offscreen ‘screenshot’ buffer. This ‘screenshot’ is drawn in areas of the page that we’re still working on painting, which is a huge improvement to the user experience. However, the current code isn’t integrated into our Layers system, which means that the page change notifications are not reliable and updates are more expensive. The goal is to move this code inside the Layers system, where it can overcome these limitations, and to make it tile based.

Once all of these tile improvements are ready, they will let us move away from our current approach of predicting where the view is going to be, painting it start to finish, sending it to the GPU all at once and hoping that what we painted is still inside the view. With these changes we will be able to improve our heuristics by aborting painting if it’s outside the view, drawing quickly at a reduced resolution first if we’re panning quickly, drawing the most important tiles first, presenting the painting progressively and uploading it to the GPU in parallel, piece by piece.
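
To make two of these heuristics concrete, here is a minimal sketch of prioritizing tiles near the center of the view and skipping tiles that have left the view by the time we get to them. The rectangle shape, the function names and the `getViewport`/`paintTile` callbacks are all hypothetical; this is not the Gecko Layers code.

```javascript
// Hypothetical sketch. Tiles and the viewport are plain rectangles:
// { x, y, width, height }.
function intersects(a, b) {
  return a.x < b.x + b.width && b.x < a.x + a.width &&
         a.y < b.y + b.height && b.y < a.y + a.height;
}

// Squared distance between the center of a tile and the center of the view.
function distanceToCenter(tile, view) {
  var dx = (tile.x + tile.width / 2) - (view.x + view.width / 2);
  var dy = (tile.y + tile.height / 2) - (view.y + view.height / 2);
  return dx * dx + dy * dy;
}

// Paints the tiles nearest the viewport center first. getViewport is
// called again before every tile so that panning is noticed mid-paint,
// and tiles that are no longer visible are skipped instead of painted.
function paintProgressively(tiles, getViewport, paintTile) {
  var start = getViewport();
  var queue = tiles.slice().sort(function (a, b) {
    return distanceToCenter(a, start) - distanceToCenter(b, start);
  });
  var painted = [];
  for (var i = 0; i < queue.length; i++) {
    if (!intersects(queue[i], getViewport())) {
      continue; // the user panned away from this tile
    }
    paintTile(queue[i]);
    painted.push(queue[i]);
  }
  return painted;
}
```

A real implementation would also yield between chunks of tiles so the compositor can upload them in parallel; the loop above only shows the ordering and the abort check.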

Here’s a demonstration of my current set of patches. Note that the performance isn’t tweaked and it’s tested on a slow page (i.e. gradient) to better demonstrate the progressive and reduced resolution painting.

This demonstrates progressive tile painting and a simple heuristic that draws new tiles at low resolution first and then at full resolution. It does not abort or prioritize tile painting yet, which would be useful between the 2 second and 3 second marks where we’re still painting outside the screen.

The Gecko Profiler now supports profiling JS alongside C++ thanks to the amazing work of Alex Crichton. Note that this feature is available across all platforms including desktop, Android and B2G! We’ve been testing it over the past few weeks and it has already yielded important insight into performance bottlenecks caused by JS scripts (766333, 767070, 774811, 777266, 765930, and more). It has also identified many problematic add-ons that we plan to follow up on.

This new tool will let us make sure the browser is always snappy. Unlike other profilers, it does not treat every sample equally but instead looks for areas of the code that block the responsiveness of the browser. As Taras calls it, this is an interactivity profiler. Below are some examples of problem profiles, each with an explanation of the diagnosis we derived.

Example 1: document.write() and Adblock

Here’s an example of the data we can collect; you can see the profile yourself here. In this profile the forum has a script called writeLink() which invokes document.write() repeatedly. This should be avoided because, as we can see further down the stack, it causes synchronous HTML5 parsing for each call to document.write(). This in and of itself is slow, but not enough for a web author to notice on its own. However, with Adblock installed, the content policy is checked for the whole document after each and every call to document.write(), slowing this page to a crawl.

Forum+Adblock jank caused by document.write interaction

Example 2: Thumbnail blocking the main thread

Here’s an example of a JS call that takes over 125ms to return, leading to a short UI stutter. Follow along with the profile. Here we have a timer invoking ‘Thumbnails_capture() @ browser.js:5552′ (outside the screenshot, see the profile link). This causes the page to be rendered into an offscreen surface and then compressed as a PNG. This particular page happens to have a WebGL context, which makes matters worse. Getting a WebGL context into an offscreen image, as opposed to the screen in a hardware accelerated configuration, requires the surface to be read back, causing a synchronous GPU flush. Then the page must be painted and composited into the offscreen image. Once this is completed the image is encoded to a PNG. Running all these steps at once is enough to block the UI for 0.1 seconds. While this isn’t particularly long, it is enough for video, scrolling and animations to briefly feel jerky.
The solution in this case is to break up the image rendering and the encoding into two separate events, yielding two shorter pauses of 60ms and letting animations run more smoothly. In the future we will want to optimize these steps, break them down further or move the rendering and the PNG encoding off the main thread.
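
The shape of that fix can be sketched as below. This is only an illustration of splitting one long task into two events; `renderPage`, `encodePNG`, `done` and the optional `schedule` parameter are hypothetical names, not the actual thumbnail code in browser.js.

```javascript
// Hypothetical sketch: renderPage and encodePNG stand in for the real
// rendering and encoding steps. schedule defaults to a zero-delay timer
// so the event loop gets a chance to run between the two steps.
function captureThumbnail(renderPage, encodePNG, done, schedule) {
  schedule = schedule || function (task) { setTimeout(task, 0); };
  schedule(function () {
    var surface = renderPage();   // first, shorter pause
    schedule(function () {
      done(encodePNG(surface));   // second, shorter pause
    });
  });
}
```

Between the two events the browser can service input and run an animation frame, which is what turns one noticeable stutter into two much less visible ones.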

Thumbnail code causing Jank

Example 3: Flushing layout from JS

Laying out the page is a difficult process that is heavily optimized in all major browsers. For the most part this step is invisible to most pages and Just Works. The real problem occurs when a script queries the position of an element after it has modified the page. When the page is modified the layout state is marked dirty and any query to it will cause your page to be laid out synchronously to calculate the up-to-date position of your element.

Here’s a real world profile of this that we found in the nightly version of Fennec (this code wasn’t shipped). In this case we were doing something that resembles this:

var x1 = myElement1.offsetLeft;          // <-- Causes a layout flush if layout is dirty
myElement2.style.left = (x1 + 5) + "px"; // <-- Dirties layout information
var x2 = myElement2.offsetLeft;          // <-- Causes another layout flush
myElement2.style.left = (x2 + 5) + "px"; // <-- Dirties layout information again
// Repeat a few times

For a page as complex as CNN.com a layout takes more than 50ms on mobile, so doing 4 layouts synchronously from JS, just to position a few elements near each other instead of using HTML/CSS, will cost you over 200ms. You will likely have dirtied the layout information again before your script finishes, which requires one more layout and finally a paint, so you’ve just blocked the main thread for over 0.4 seconds on mobile. Be very careful to avoid triggering synchronous layout!
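
One common way to avoid this is to do the reads up front and the writes afterwards. The sketch below uses a small stand-in for DOM elements so the number of flushes can be counted outside a browser; `FakeLayout`, `setLeft` and the flush counter are inventions for this illustration and only model the "dirty, then flush on read" behaviour described above.

```javascript
// Stand-in for real DOM elements so flushes can be counted without a
// browser. Reading offsetLeft while layout is dirty counts as one flush.
function FakeLayout() {
  this.dirty = false;
  this.flushes = 0;
}
FakeLayout.prototype.createElement = function (left) {
  var layout = this;
  var value = left;
  return {
    get offsetLeft() {
      if (layout.dirty) { layout.flushes++; layout.dirty = false; }
      return value;
    },
    setLeft: function (px) { value = px; layout.dirty = true; }
  };
};

// Anti-pattern: every read that follows a write forces a flush.
function interleaved(elements) {
  for (var i = 1; i < elements.length; i++) {
    elements[i].setLeft(elements[i - 1].offsetLeft + 5);
  }
}

// Batched: read once, compute the positions in JS, then only write.
function batched(elements) {
  var left = elements[0].offsetLeft;
  for (var i = 1; i < elements.length; i++) {
    left += 5;
    elements[i].setLeft(left);
  }
}
```

Both functions leave the elements in the same positions, but the batched version never reads layout after dirtying it, so the only layout left is the one the browser does anyway before painting.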

JS Flushing Layout synchronously to get position information

Find your own examples and blog them!

With the new JS support it is now easier to diagnose issues, so try it out, file bugs and write about the anti-patterns you commonly find!

I ran a quick experiment after someone pointed out to me that second generation Intel GPUs provide features for querying the power management status of the CPU, such as its current frequency and its power usage in watts. I re-purposed the responsiveness correlation in the Gecko Profiler to instead use the average power consumption of the CPU over the last few milliseconds. This lets you see the current power usage of Firefox as it does different tasks (JS, layout, gfx). The idea is similar to looking at throughput performance: find the area of the code that has the highest power and energy consumption and optimize it. Here’s a sample:

Power Usage in the Gecko Profiler (View the profile yourself) Color=power usage, Height=Call depth, x=time

I should note that the data is expected to be noisy because I had applications running, along with the typical set of background processes you’d find on a Mac. Nevertheless, every profile I ran showed the power usage drop significantly at every point where Firefox was waiting for events, which indicates that it is in fact working. I haven’t done much analysis on the data but a quick look at the profiles suggests that our SSE2 code is particularly power hungry.

A neat idea would be to compute the energy consumption from the power consumption and break it down into Gecko Modules.
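
Here is a minimal sketch of that computation, assuming each sample carries a power reading and has already been attributed to a module. Energy in joules is power in watts multiplied by time in seconds, so with a fixed sampling interval it is just a weighted sum. The sample shape, the module names and the 500ms interval (picked only to keep the numbers round) are assumptions for this example.

```javascript
// Hypothetical sample shape: { module: "JS", watts: 12.5 }, taken at a
// fixed sampling interval. Energy (joules) = power (watts) x time (seconds).
function energyByModule(samples, intervalMs) {
  var seconds = intervalMs / 1000;
  var joules = {};
  samples.forEach(function (sample) {
    joules[sample.module] = (joules[sample.module] || 0) + sample.watts * seconds;
  });
  return joules;
}

var energy = energyByModule([
  { module: "JS", watts: 10 },
  { module: "gfx", watts: 20 },
  { module: "JS", watts: 30 }
], 500);
```

Since the readings cover the whole CPU, this attributes everything the machine draws during a sample to whichever module Firefox was running at the time, so background noise ends up in the totals too.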

Implementation

The biggest roadblock to implementing this is that the power information isn’t available in user mode, and I don’t think the APIs are widely exposed by operating systems. Luckily Intel provides a sample library and driver that let you access this information. Once I had this in place it was simply a matter of querying this information rather than the event loop status like the Profiler normally does. Because this data requires a driver you won’t see this feature hit the Profiler unless I see a big demand for it.

Posted by: benoitgirard | June 8, 2012

Making Profiling Easier: Automated Profiler Diagnostics

The Gecko Profiler now supports automated diagnostics. I’ve added simple logic to look for common known signatures of main thread IO, synchronous plugin operations and garbage collection. The goal is to make profiles easier to read and to point out issues where we already have bugs on file, so instead of filing a dupe people can jump right into the discussion and help resolve the bug.

Automated Diagnostics

GC Heavy Page

Currently we only look for a few signatures but the plan is to expand the list. The signatures are for Mac but I’ll be accepting signatures for all platforms. If you have your own issue you would like to add either ping me, file a bug under ‘Core::Gecko Profiler’ or send me a pull request.
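
To give a feel for what a signature can look like, here is a rough sketch that matches frame names in a sample’s stack against a list of patterns. The patterns, messages and function name are placeholders invented for this example; they are not the profiler’s actual signature list or format.

```javascript
// Hypothetical signature list: the frame name patterns and messages are
// placeholders for illustration only.
var SIGNATURES = [
  { message: "Main thread IO", pattern: /fsync/ },
  { message: "Garbage collection", pattern: /GarbageCollect/ },
  { message: "Synchronous plugin operation", pattern: /PluginSyncCall/ }
];

// Returns the message of every signature that matches at least one
// frame in the given stack (an array of frame name strings).
function diagnose(stack) {
  return SIGNATURES.filter(function (sig) {
    return stack.some(function (frame) { return sig.pattern.test(frame); });
  }).map(function (sig) { return sig.message; });
}

var warnings = diagnose(["main", "RunEventLoop", "js::GarbageCollect"]);
```

In this shape, a signature for another platform would just be one more entry with the frame names that platform reports.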

Posted by: benoitgirard | May 24, 2012

GFX Changes in Firefox 13

Every release the GFX team makes major changes to Firefox. By the time the release ships we’re often focused on future releases. Last week we decided in the public Graphics meeting (Join us) that we wanted to go back and announce the changes as they move through the release cycle, to bring more visibility to what users are getting when they update to Firefox 13. We hope to continue these posts every release. Here is the list I quickly put together using a few Bugzilla and hg log queries. I tried to cherry pick important changes without any hard set of criteria. This list is far from exhaustive. The best approximation of the Graphics changes in Firefox 13 is this.

I would also like to point out that we received major patches from the community in Firefox 13 and that this trend continues in Firefox 14 and 15. We are starting development on Firefox 16, so get in touch with us if you’d like to contribute to GFX for Firefox 16.

Changes in Gecko 13

This release we focused on implementing the foundation for Off-main-thread compositing (OMTC). See my blog post for what OMTC buys us. It is still alpha quality but we will be shipping it for mobile shortly and are in the process of getting it ready for desktop:

  • Bug 598873 – Implemented alpha quality OMTC for Mac and Android

Improved our support for MacBooks with dual GPUs. You’ll find that Firefox will make better decisions about using the integrated GPU for low power usage and switching to the discrete GPU to handle intensive WebGL and plug-ins:

  • Bug 713305, 713305 – Improve dual GPU support on Mac with WebGL

Improved startup performance by optimizing how we query for D3D support on start-up:

  • Bug 722225 – Improve Firefox startup speed by ~5% (-70ms) on Windows by optimizing D3D10CreateDevice1

Updated our Graphics libraries:

  • Bug 721068 – Update to latest graphite2 code from upstream
  • Bug 698519 – Update to libjpeg-turbo 1.2

WebGL improvements. Anisotropic filtering support has been added. Check out the demo of the anisotropic filter and select the checkbox to see the difference. NOTE: Anisotropic filtering is not yet supported on Windows with the ANGLE renderer but we are working on it!

  • Bug 728354 – WebGL anisotropic texture filtering (Contributor)
WebGL anisotropic support (on the right)

  • Bug 676071 – WebGL long identifier mapping (security and conformance)
  • Bug 710594 – WebGL about:memory improvements (Contributor)
WebGL Memory reporting (1)

WebGL Memory reporting (2)

  • Various WebGL conformance improvements (Bug 730554, Bug 727547) (Contributor)
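
For the anisotropic filtering support listed above, a page would typically opt in along these lines. This is a sketch, not code from the demo: the function name is made up, and trying vendor-prefixed extension names is an assumption about what a given browser version exposes. The WebGL calls themselves (`getExtension`, `getParameter`, `texParameterf`) and the extension constants are standard.

```javascript
// Returns the anisotropy level that was applied, or 0 if the extension
// is not available (for example on a renderer that lacks support).
function enableAnisotropicFiltering(gl) {
  // Assumption: older builds may only expose a vendor-prefixed name.
  var ext = gl.getExtension("EXT_texture_filter_anisotropic") ||
            gl.getExtension("MOZ_EXT_texture_filter_anisotropic") ||
            gl.getExtension("WEBKIT_EXT_texture_filter_anisotropic");
  if (!ext) {
    return 0;
  }
  var max = gl.getParameter(ext.MAX_TEXTURE_MAX_ANISOTROPY_EXT);
  // Applies to the texture currently bound to TEXTURE_2D.
  gl.texParameterf(gl.TEXTURE_2D, ext.TEXTURE_MAX_ANISOTROPY_EXT, max);
  return max;
}
```

Checking the return value lets a page fall back gracefully where the extension is missing.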

Added support for apitrace as a debugging tool for power users on Android:

  • Bug 674753 – Add support for loading apitrace explicitly on Android
APITrace support for Fennec

Various improvements to hardware accelerated compositing for all platforms.
Various improvements to font support.
Various improvements to the Mac Azure Quartz Canvas implementation.

Posted by: benoitgirard | May 17, 2012

Dev Tip: Clang-complete for Vim with mozilla

This blog post is a quick explanation of how to set up clang_complete with Mozilla. Note: You don’t need to build with clang to use it for completion, which means you can use it for Fennec.

Since these instructions will change from time to time, I put them on a wiki page so everyone can help correct errors:

https://wiki.mozilla.org/Clang_complete_in_mozilla

Sometimes you have an optimized build for whatever reason (say you’re doing a lot of profiling) but optimizations make non-trivial debugging impossible. You don’t have an up-to-date build without optimization, so you whine, start a non-optimized build and look at Bugzilla for 20 minutes.

Frankenstein optimized/non optimized build to the rescue! Simply add:

CXXFLAGS += -O0 -g

to the Makefile for the module(s) you’re interested in debugging, for me it was gfx/layers/Makefile.in.

How does this work? Well, optimizations are done at the object level and each object file is built to follow the ABI. As long as the ABI is followed, and it really really should be, you can expect this to work without any problems.

Disclaimer: This isn’t supported! If you have problems then do a clobber build.
