Posted by: benoitgirard | January 5, 2013

WebGL Project: Galaxy Map

I made my first small WebGL app. It fetches a star catalog and displays the stars within 25 parsecs (about 80 light years) using WebGL and Three.js. I also created a second view that shows only named stars and tries to fetch their Wikipedia page on mouse over. I didn’t trim the catalog, so be prepared to wait for 8 MB of text to load :).

Try it out:

View 1: Stars within 80 light years

View 2: Named stars only

This demo is missing a lot of optimization so it shouldn’t be used for benchmarking. I also unfortunately didn’t get around to calculating and displaying the relative size of each star.
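
As a rough illustration of the filtering behind the two views, here is a minimal sketch. It assumes each catalog row has already been parsed into an object with a name and Cartesian x, y, z coordinates in parsecs. That row shape, the function names and the sample rows are all hypothetical; this is not the demo’s actual code.

```javascript
// Conversion factor: 1 parsec is roughly 3.26 light years.
const LIGHT_YEARS_PER_PARSEC = 3.26156;

// Distance from the Sun (taken as the origin) in parsecs.
function distanceParsecs(star) {
  return Math.hypot(star.x, star.y, star.z);
}

// View 1: keep only the stars within maxParsecs of the Sun.
function starsWithin(stars, maxParsecs) {
  return stars.filter((star) => distanceParsecs(star) <= maxParsecs);
}

// View 2: keep only the stars that carry a name.
function namedStars(stars) {
  return stars.filter(
    (star) => typeof star.name === "string" && star.name.trim() !== ""
  );
}

// Made-up rows for illustration only (not real catalog values).
const sampleCatalog = [
  { name: "Example A", x: 3, y: 4, z: 0 },   // 5 pc away
  { name: "", x: 10, y: 0, z: 0 },           // 10 pc away, unnamed
  { name: "Example B", x: 30, y: 40, z: 0 }, // 50 pc away
];

const nearby = starsWithin(sampleCatalog, 25);
const nearbyNamed = namedStars(nearby);
console.log(
  nearby.length + " stars within " +
  (25 * LIGHT_YEARS_PER_PARSEC).toFixed(1) + " light years, " +
  nearbyNamed.length + " of them named"
);
```

Filtering once after the catalog loads, rather than on every frame, keeps the render loop free to do nothing but draw.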

Posted by: benoitgirard | January 2, 2013

TryChooser now helps you get results faster

I posted a change to TryChooser that will fetch the build queue and display it when selecting tests. Do your part by avoiding platforms with large queues if possible to get your results faster and help reduce contention.

TryChooser Load: Avoid platforms with high numbers.

Posted by: benoitgirard | December 5, 2012

Analyzing Shutdown Performance

With the landing of bug 799640, the profiler add-on can now profile shutdown. I’ve created a dashboard that summarizes data from 10 of my shutdown profiles. Take a look at the dashboard here. You will find the shutdown profiles in the second section, halfway down the page. You’ll see 10 profiles listed under total or per function; clicking on any of them opens it in the preview, filtered by that function. Shutdown is at the end of the profile, so I recommend focusing on the range from the first pink shutdown marker onward. You’ll see the following shutdown markers:

  • Shutdown start: A shutdown has been requested.
  • Shutdown early: Where we hope to call exit(0) eventually. Nearly everything after will not run with a few exceptions that will be moved before this point.
  • Shutdown xpcom: Where we hope to call exit(0) soon.

What are we doing during shutdown? Here’s some trivia about my shutdowns:

  • Shutdown takes from 1,000ms to 2,700ms, which makes it highly variable. As expected, runs with higher uptime had slower shutdowns.
  • Most of the shutdown time is spent doing GC/CC. This is because we persist data in destructors, which forces us to clean up. This is getting fixed by exit(0) in bug 662444.
  • Placing exit(0) at ‘Shutdown early’ would save between 1/3 and 2/3 of the shutdown time, depending on the instance.
  • Our shutdown is blocked by plugins because they may also need to persist data. Sadly, this has to happen synchronously on the main thread, and it takes a consistent 90ms for me. We’re discussing options in bug 818265.
  • Some pages block shutdown by running scripts on ‘page hide’, which is called on shutdown. This is unbounded, but in my case it was about 100ms.
  • Startup and shutdown performance bugs are being tracked in bug 810156.
  • Shutting down with less than 10 seconds of uptime will cause the startup cache to be saved on shutdown (bug 816656).

After I’ve had time to study this data more, I’m hoping to gather performance data from other people’s profiles to get a better sample of shutdown times, since my Firefox profile is simple and I don’t stockpile 100s of tabs in my session like most do.

Posted by: benoitgirard | November 22, 2012

Quick bugzilla queries

Here’s a tip to speed up entering a bugzilla search query. Set a ‘bug’ keyword to bugzilla’s quicksearch field:

Now you can perform bugzilla queries from your URL bar. Here are some examples:

bug FIXED reporter:bgirard
bug component:Graphics
bug blocking-fennec:+ assigned:bgirard

Here is a complete list of flags. Note that status: does not work in 4.0 (the current BMO version); see bug 662444.
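
For readers who have not used keyword bookmarks before, the sketch below shows roughly what the browser does with such a keyword: everything typed after the keyword is escaped and substituted for %s in the bookmark URL. The template URL is an assumption (the bookmark screenshot is not reproduced here), and expandKeyword is a hypothetical helper, not a browser API.

```javascript
// Assumed bookmark URL for the 'bug' keyword; %s marks where the query goes.
const QUICKSEARCH_TEMPLATE =
  "https://bugzilla.mozilla.org/buglist.cgi?quicksearch=%s";

// Hypothetical helper mimicking keyword expansion in the URL bar.
// Returns null when the input does not start with the keyword.
function expandKeyword(input, keyword, template) {
  const prefix = keyword + " ";
  if (!input.startsWith(prefix)) {
    return null;
  }
  const query = encodeURIComponent(input.slice(prefix.length));
  // A function replacement avoids any special handling of '$' patterns.
  return template.replace("%s", () => query);
}

const expanded = expandKeyword(
  "bug FIXED reporter:bgirard",
  "bug",
  QUICKSEARCH_TEMPLATE
);
console.log(expanded);
```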

Posted by: benoitgirard | November 7, 2012

Profiler: Charging expensive operations to cheap async causes

We’ve been using the profiler to diagnose snappy problems for several months now with great success. The profiler can tell us which images are slow to decode, which scripts take longer than 15 ms to execute and what they’re doing. However, the callstack doesn’t provide enough information to show what caused the execution. For example, a small script snippet changes a style, causing cascading style changes that morph the entire page. The script execution may take less than 1 ms (i.e. likely not show up in a profile) and only marks the styles as having changed. Then the next event recomputes the style, causing the entire document to be repainted. The goal is to charge a long-running task (repainting) to its quick-running cause (a page change).

Charging to the rescue! My goal is to be able to record a source for something like a page change and tag it/charge it to the work that it triggers. This is going to be a big project and I’m not ready to abandon my current tasks, so I plan on working on a small piece every time I encounter the need for this. If you’re interested in helping out, please let me know. Here is my current plan:

Part 1) Bug 809317: Adding infrastructure for printing the current location for debugging (Ready to land soon).

Part 2) Bug 777828: Allow unwinding results to be returned

Part 3) No bug: Support chaining unwinds as ‘causes’

Part 4) No bug: When sampling, collect ‘causes’ attributed to the current work.

Part 5) No bug: Display ‘causes’ in the profiler UI

Part 6) No bug: Gather ‘causes’ for expensive async operations.

Note this is just a rough plan. I don’t know how well this will work out in practice.
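
To make the idea of chained ‘causes’ a little more concrete, here is a toy sketch in JavaScript. It is not profiler code and every name in it is hypothetical: cheap operations record a cause when they schedule async work, the cause chain is carried over to the work when it runs, and a sample taken during that work reports the whole chain.

```javascript
// Toy model of charging: all names here are hypothetical.
class CauseTracker {
  constructor() {
    this.pending = [];      // queued async work plus its recorded cause chain
    this.activeCauses = []; // cause chain of the work currently running
    this.samples = [];      // what a sampler would have recorded
  }

  // Called by a cheap operation that triggers expensive work later.
  schedule(cause, work) {
    this.pending.push({ causes: this.activeCauses.concat(cause), work });
  }

  // Called by the sampler: tag the sample with the current cause chain.
  sample(label) {
    this.samples.push({ label, causes: this.activeCauses.slice() });
  }

  // Run queued work in order, restoring each task's cause chain first.
  runAll() {
    while (this.pending.length > 0) {
      const task = this.pending.shift();
      this.activeCauses = task.causes;
      try {
        task.work(this);
      } finally {
        this.activeCauses = [];
      }
    }
  }
}

const tracker = new CauseTracker();
// A quick script marks styles dirty; the style recompute then forces a repaint.
tracker.schedule("script changed a style", (t) => {
  t.sample("recompute style");
  t.schedule("style recompute invalidated the document", (t2) => {
    t2.sample("repaint");
  });
});
tracker.runAll();
console.log(JSON.stringify(tracker.samples, null, 2));
```

The repaint sample ends up carrying both causes, which is the kind of information the callstack alone cannot provide.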

Posted by: benoitgirard | November 2, 2012

Upcoming Gecko Profiler Changes

Just a quick update on the changes being worked on in the Gecko Profiler github repo:

  • Moving the profiler over to the toolbar.
  • Button to profile startup. This restarts the browser and gives you a profile (example, bug 799638).
  • A target feature that will allow profiling either the current application or open a remote connection to profile over TCP or ADB+TCP. This will replace the confusing set of Mobile buttons and use the remote debugging protocol like the debugger/console.
  • Cleopatra links will now reflect the current selection (if the profile has been uploaded). This will let you share the exact location where the problem is. (example)

Posted by: benoitgirard | September 21, 2012

Video Synced Profiling

Just a quick update on the Eideticker profiling support William and I have been working on. All the changes needed to sync a video recording with a profile have landed. The sync information shows up as a binary counter in the top left of the frame. This counter is read back from the video, and the samples collected for that frame are highlighted. It’s simple but effective, and very useful for optimizing how we draw.

Video correlation allows stepping samples frame by frame

You can try this yourself by checking out this real-life example recorded this morning. When stepping through the video, the selection will be updated to match the current frame in the top left. You can then filter samples for the current frame. Note that on mobile, because of Off-Main-Thread-Compositing, we typically present many intermediate frames before getting an update from the main thread.
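
The correlation step can be sketched as follows. This is only an illustration under assumptions: the counter is taken to be a row of dark/bright squares read most significant bit first, and each sample is assumed to carry a frameNumber field. Neither the bit layout nor the field name comes from the actual implementation.

```javascript
// Decode a binary counter from per-square brightness values (0-255).
// Assumption: bright square = 1, dark square = 0, most significant bit first.
function decodeFrameCounter(brightness, threshold = 128) {
  return brightness.reduce(
    (value, level) => value * 2 + (level >= threshold ? 1 : 0),
    0
  );
}

// Pick the samples that were collected while a given frame was drawn.
// Assumption: each sample records the frame number it belongs to.
function samplesForFrame(samples, frameNumber) {
  return samples.filter((sample) => sample.frameNumber === frameNumber);
}

// Made-up data for illustration.
const recordedSamples = [
  { frameNumber: 4, stack: "Paint" },
  { frameNumber: 5, stack: "Layout" },
  { frameNumber: 5, stack: "Composite" },
];

const frame = decodeFrameCounter([10, 240, 12, 250]); // squares read as 0101
const highlighted = samplesForFrame(recordedSamples, frame);
console.log("frame", frame, "has", highlighted.length, "samples");
```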

Posted by: benoitgirard | September 4, 2012

Gecko Profiler 1.9.4

Just a quick update on what’s new in the Gecko Profiler in the past month:
- Added support for profiling Fennec nightly builds if you have a rooted phone. (Changes are still on inbound but should be ready soon.)
- Added support for Windows XP.
- Show only JavaScript frames (Ehsan said he would blog about this one soon).
- Improved the selection code to guess the optimal selection when changing views (top-down to bottom-up, focus on a subview, normal to JS-only).
- Profiler now listens to the GC stats.

GC stats provide more details

Details can be expanded

- View the JS source.

JavaScript source listing
Posted by: benoitgirard | August 29, 2012

Tiling Improvements in Fennec

With the native release we refactored how we render to use a tiling approach. This is beneficial because it lets us minimize the work needed to paint as we pan and zoom. The goal is to be able to increase and decrease the size of our view and move its position in a logically unbounded page without having to reallocate and copy our retained page buffer.

This refactoring was also a blocker for other optimizations that I am currently working on. First, I landed a patch that adds the ability to draw progressively and to interrupt drawing in chunks of tiles (bug 771219). This lets our content thread and compositor paint and upload in parallel instead of serially, and it opens up the possibility of showing painting progressively, tile by tile. Interrupting drawing will let us decide that the user has panned outside of where we are painting, abort the operation and re-target our paint.

Next up, I’m working on drawing tiles at a low resolution to replace the ‘screenshot code’. Currently we try to detect when the page has changed and paint it into a small offscreen ‘screenshot’ buffer. This ‘screenshot’ is drawn in areas of the page that we’re still working on painting. This is a huge improvement to the user experience. However, the current code isn’t integrated into our Layers system, which means that the page change notifications are not reliable and updates are more expensive. The goal is to move this code inside the Layers system, where it can overcome these limitations, and to make it tile based so we can keep improving it.

Once all of these tile improvements are ready, they will let us move away from our current approach: predicting where the view is going to be, painting it from start to finish, sending it to the GPU all at once and hoping that what we painted is still inside the view. With these changes we will be able to improve our heuristics by aborting painting if it’s outside the view, drawing quickly at a reduced resolution first if we’re panning quickly, drawing the most important tiles first, presenting the painting progressively and uploading it to the GPU in parallel, piece by piece.

Here’s a demonstration of my current set of patches. Note that the performance isn’t tweaked and it’s tested on a slow page (i.e. gradient) to better demonstrate the progressive and reduced resolution painting.

This demonstrates progressive tile painting and a simple heuristic that draws new tiles first at low resolution and then at full resolution. It does not abort or prioritize tile painting yet, which would be useful between the 2 second and 3 second marks, where we’re still painting outside the screen.
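
For readers unfamiliar with tiling, here is a small sketch of the two ideas discussed in this post: mapping a view rectangle to the tiles it covers, and painting in chunks so that tiles the user has already panned away from can be skipped. It is written in JavaScript purely for illustration; the tile size, chunk size and all function names are assumptions, and none of this is the actual Gecko code.

```javascript
const TILE_SIZE = 256; // assumed tile edge length in pixels

function tileKey(tile) {
  return tile.tx + "," + tile.ty;
}

// List the tiles (by column/row index) that a view rectangle touches.
function tilesForRect(rect, tileSize = TILE_SIZE) {
  const tiles = [];
  const firstX = Math.floor(rect.x / tileSize);
  const firstY = Math.floor(rect.y / tileSize);
  const endX = Math.ceil((rect.x + rect.width) / tileSize);
  const endY = Math.ceil((rect.y + rect.height) / tileSize);
  for (let ty = firstY; ty < endY; ty++) {
    for (let tx = firstX; tx < endX; tx++) {
      tiles.push({ tx, ty });
    }
  }
  return tiles;
}

// Paint tiles chunk by chunk. Before each chunk, re-read the viewport and
// skip tiles that are no longer visible instead of finishing a stale paint.
function paintProgressively(tiles, getViewport, paintTile, chunkSize = 2) {
  const painted = [];
  for (let i = 0; i < tiles.length; i += chunkSize) {
    const visible = new Set(tilesForRect(getViewport()).map(tileKey));
    for (const tile of tiles.slice(i, i + chunkSize)) {
      if (!visible.has(tileKey(tile))) {
        continue; // the user panned away from this tile
      }
      paintTile(tile);
      painted.push(tileKey(tile));
    }
  }
  return painted;
}

// Simulate a pan that happens after the first chunk has been painted.
const viewports = [
  { x: 0, y: 0, width: 512, height: 512 },
  { x: 256, y: 256, width: 256, height: 256 },
];
let viewportReads = 0;
const paintedKeys = paintProgressively(
  tilesForRect(viewports[0]),
  () => viewports[Math.min(viewportReads++, viewports.length - 1)],
  () => {} // a real paintTile would rasterize and upload the tile
);
console.log(paintedKeys);
```

In this simulation the tile at column 0, row 1 is never painted, because the view had already moved by the time its chunk came up.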

The Gecko Profiler now supports profiling JS alongside C++ thanks to the amazing work of Alex Crichton. Note that this feature is available across all platforms including desktop, Android and B2G! We’ve been testing it out over the past few weeks and it has already yielded important insight into performance bottlenecks caused by JS scripts (766333, 767070, 774811, 777266, 765930, and more). It has also identified many problematic add-ons that we plan to follow up on.

This new tool will let us make sure the browser is always snappy. Unlike other profilers, it does not treat every sample equally; instead it looks for areas of the code that block the responsiveness of the browser. As Taras calls it, this is an interactivity profiler. Below are some examples of problems, with profiles and an explanation of the diagnosis we derived.

Example 1: document.write() and Adblock

Here’s an example of the data we can collect; you can see the profile yourself here. In this profile the forum has a script called writeLink() which invokes document.write() repeatedly. This is something that should be avoided: as we can see further down the stack, it causes synchronous HTML5 parsing to happen for each call to document.write(). This in and of itself is slow, but not enough for a web author to notice on its own. However, with Adblock installed, this causes the content policy to be checked for the whole document after each and every call to document.write(), slowing this page to a crawl.

Forum+Adblock jank caused by document.write interaction
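
One way a page author could reduce this cost is to build the markup first and write it once. The sketch below uses a stand-in object instead of the real document so the number of write() calls can be counted; the forum’s actual writeLink() is not shown in the profile, so linkHtml and both writeLinks functions are hypothetical.

```javascript
// Stand-in for the real document: counts how often write() is called.
// Every real document.write() call would trigger a synchronous parse.
function makeFakeDocument() {
  return {
    writes: 0,
    html: "",
    write(markup) {
      this.writes += 1;
      this.html += markup;
    },
  };
}

// Illustration only: a real page must escape href and text.
function linkHtml(link) {
  return '<a href="' + link.href + '">' + link.text + "</a>";
}

// Anti-pattern: one write (and one parse) per link.
function writeLinksOneByOne(doc, links) {
  for (const link of links) {
    doc.write(linkHtml(link));
  }
}

// Batched: build the whole string, then write once.
function writeLinksBatched(doc, links) {
  doc.write(links.map(linkHtml).join(""));
}

const links = [
  { href: "/thread/1", text: "Thread 1" },
  { href: "/thread/2", text: "Thread 2" },
  { href: "/thread/3", text: "Thread 3" },
];

const slowDoc = makeFakeDocument();
writeLinksOneByOne(slowDoc, links);
const fastDoc = makeFakeDocument();
writeLinksBatched(fastDoc, links);
console.log("one by one:", slowDoc.writes, "writes; batched:", fastDoc.writes);
```

Both variants produce the same markup; only the number of synchronous parses (and, with Adblock, content policy checks) differs.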

Example 2: Thumbnail blocking the main thread

Here’s an example of a JS call that takes over 125ms to return, leading to a short UI stutter. Follow along with the profile. Here we have a timer invoking ‘Thumbnails_capture() @ browser.js:5552’ (outside the screenshot, see the profile link). This causes the page to be rendered into an offscreen surface and then compressed as a PNG. This particular page happens to have a WebGL context, which makes matters worse. Getting a WebGL context into an offscreen image, as opposed to the screen in a hardware accelerated configuration, requires the surface to be read back, causing a synchronous GPU flush. Then the page must be painted and composited into the offscreen image. Once this is completed, the image is encoded to a PNG. Running all these steps at once is enough to block the UI for more than 0.1 seconds. While this isn’t particularly long, it is enough for video, scrolling and animations to briefly feel jerky.
The solution in this case is to break up the image rendering and the encoding into two separate events, yielding two shorter pauses of 60ms and letting the animations run smoother. In the future we will want to either optimize, break down or move off the main thread the rendering and the PNG encoding.

Thumbnail code causing Jank
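
The ‘two separate events’ idea can be sketched like this. It is a simplified illustration, not the browser.js code: render, encode and the defer parameter are hypothetical, with defer defaulting to a zero-delay timer so that other events (such as animation frames) can run between the two steps.

```javascript
// Split a thumbnail capture into two events instead of one long one.
// render() and encode() are hypothetical stand-ins for the expensive steps.
function captureThumbnail(
  render,
  encode,
  done,
  defer = (fn) => setTimeout(fn, 0)
) {
  defer(() => {
    const surface = render(); // first pause: paint into the offscreen surface
    defer(() => {
      done(encode(surface)); // second pause: encode the surface as a PNG
    });
  });
}

// Drive the two steps by hand with a fake event queue to show the split.
const eventQueue = [];
let thumbnail = null;
captureThumbnail(
  () => "surface",
  (surface) => surface + ".png",
  (result) => { thumbnail = result; },
  (fn) => eventQueue.push(fn)
);

eventQueue.shift()(); // event 1: render only
const encodedAfterFirstEvent = thumbnail !== null;
eventQueue.shift()(); // event 2: encode
console.log(encodedAfterFirstEvent, thumbnail);
```

The total work is unchanged; the gain is that the main thread gets a chance to service other events between the two halves.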

Example 3: Flushing layout from JS

Laying out the page is a difficult process that is heavily optimized in all major browsers. For the most part this step is invisible to most pages and Just Works. The real problem occurs when a script queries the position of an element after it has modified the page. When the page is modified the layout state is marked dirty and any query to it will cause your page to be laid out synchronously to calculate the up-to-date position of your element.

Here’s a real world profile of this we found in the nightly version of Fennec (this code wasn’t shipped). In this case we were doing something that resembles this:

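var y1 = myElement1.offsetLeft; // <-- Causes a layout flush
// The assignment targets were lost from the original listing; style.left is an assumption.
myElement2.style.left = (y1 + 5) + "px"; // <-- Dirty layout information
var y2 = myElement2.offsetLeft; // <-- Causes another layout flush
myElement3.style.left = (y2 + 5) + "px"; // <-- Dirty layout information
// Repeat a few times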

For a complex page, a layout will take more than 50ms on mobile, so doing 4 layouts synchronously from JS just to lay out a few elements near each other, instead of using HTML/CSS, will cost you over 200ms. You will likely have dirtied the layout information again before your script is finished, requiring one more layout and finally a paint, so you’ve just blocked the main thread for over 0.4 seconds on mobile. Be very careful to avoid triggering synchronous layout!

JS Flushing Layout synchronously to get position information
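
When the positions really do have to be computed from script, a common mitigation is to do all the reads first and all the writes afterwards. The sketch below demonstrates the difference with a fake layout object that merely counts flushes; makeFakeLayout, makeFakeElement and both shift functions are hypothetical and only model the dirty/flush behaviour described above.

```javascript
// Fake layout state: starts dirty, as if the script already modified the page.
function makeFakeLayout() {
  return { dirty: true, flushes: 0 };
}

// Fake element: reading offsetLeft flushes a dirty layout,
// writing style.left marks the layout dirty again.
function makeFakeElement(layout, left) {
  let currentLeft = left;
  return {
    get offsetLeft() {
      if (layout.dirty) {
        layout.flushes += 1;
        layout.dirty = false;
      }
      return currentLeft;
    },
    style: {
      set left(value) {
        currentLeft = parseInt(value, 10);
        layout.dirty = true;
      },
    },
  };
}

// Anti-pattern: read, write, read, write... one flush per element.
function shiftInterleaved(elements) {
  for (const element of elements) {
    element.style.left = (element.offsetLeft + 5) + "px";
  }
}

// Batched: all reads first (at most one flush), then all writes.
function shiftBatched(elements) {
  const lefts = elements.map((element) => element.offsetLeft);
  elements.forEach((element, i) => {
    element.style.left = (lefts[i] + 5) + "px";
  });
}

const interleavedLayout = makeFakeLayout();
shiftInterleaved([0, 10, 20, 30].map((x) => makeFakeElement(interleavedLayout, x)));

const batchedLayout = makeFakeLayout();
shiftBatched([0, 10, 20, 30].map((x) => makeFakeElement(batchedLayout, x)));

console.log(
  "interleaved flushes:", interleavedLayout.flushes,
  "batched flushes:", batchedLayout.flushes
);
```

With four elements the interleaved version flushes four times and the batched version once, which matches the arithmetic in the paragraph above.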

Find your own examples and blog them!

With the new JS support it is now easier to diagnose issues, so try it out, file bugs and write about your commonly found anti-patterns!
