Testing a JS WebApp

Test Requirements

I’ve been putting off testing my cleopatra project (https://github.com/bgirard/cleopatra) for a while now because I wanted to take the time to find a solution that would satisfy the following:

  1. The tests can be executed by loading a particular URL.
  2. The tests can be executed headless using a script.
  3. No server side component or proxy is required.
  4. Stretch goal: Continuous integration tests.

After a bit of research I came up with a solution that addressed my requirements. I’m sharing it here in case it helps others.

First, I found that the easiest way to achieve this is to pick a test framework to get 1), and a way to drive a headless browser for 2).

Picking a test framework

For the Test Framework I picked QUnit. I didn’t have any strong requirements there so you may want to review your options if you do. With QUnit I load my page in an iframe and inspect the resulting document after performing operations. Here’s an example:

QUnit.test("Select Filter", function(assert) {
  loadCleopatra({
    query: "?report=4c013822c9b91ffdebfbe6b9ef300adec6d5a99f&select=200,400",
    assert: assert,
    testFunc: function(cleopatraObj) {
    },
    profileLoadFunc: function(cleopatraObj) {
    },
    updatedFiltersFunc: function(cleopatraObj) {
      var samples = shownSamples(cleopatraObj);

      // Each of the two threads in the profile has 150 samples
      assert.ok(samples === 150, "Loaded profile");
    }
  });
});

Here I just load a profile, and once the document fires an updatedFilters event I check that the right number of samples is selected.

You can run the latest cleopatra test here: http://people.mozilla.org/~bgirard/cleopatra/test.html

Picking a browser (test) driver

Now that we have a page that can run our test suite, we just need a way to automate the execution. It turns out that PhantomJS, for WebKit, and SlimerJS, for Gecko, provide exactly this. With a small driver script we can load our test.html page and set the process return code based on the result of our test framework, QUnit in this case.
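
The driver’s job boils down to reading QUnit’s final results out of the page and turning them into a process exit code that CI can consume. Here is a minimal sketch of that decision logic — a hypothetical illustration, not the actual run_qunit.js, which also has to load the page (via page.open) and poll for completion (via page.evaluate):

```javascript
// Sketch: decide a process exit code from QUnit's final results object.
// In a real PhantomJS/SlimerJS driver, this object would be read out of the
// page with page.evaluate() once QUnit's done() callback has fired.
function exitCodeFromQUnitResults(results) {
  // No results at all means the suite never finished: treat as an error.
  if (!results || typeof results.failed !== "number") return 2;
  // QUnit's done() callback reports { passed, failed, total, runtime }.
  return results.failed > 0 ? 1 : 0;
}

// Example: a passing run and a failing run.
console.log(exitCodeFromQUnitResults({ passed: 12, failed: 0, total: 12 })); // 0
console.log(exitCodeFromQUnitResults({ passed: 11, failed: 1, total: 12 })); // 1
```

A non-zero exit code is what lets `phantomjs run_qunit.js test.html && …` chain correctly in a shell script.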

Stretch goal: Continuous integration

If you hooked up the browser driver to run via a simple test.sh script, adding continuous integration should be simple. Thanks to Travis-CI and GitHub, it’s easy to set up your test script to run per check-in and to send notifications.

All you need is to configure Travis-CI to look at your repo and to check in an appropriate .travis.yml config file. Your .travis.yml should configure the environment. PhantomJS is pre-installed and should just work. SlimerJS requires a Firefox binary and a virtual display, so it just requires a few more configuration lines. Here’s the final configuration:

env:
  - SLIMERJSLAUNCHER=$(which firefox) DISPLAY=:99.0 PATH=$TRAVIS_BUILD_DIR/slimerjs:$PATH
addons:
  firefox: "33.1"
before_script:
  - "sh -e /etc/init.d/xvfb start"
  - "echo 'Installing Slimer'"
  - "wget http://download.slimerjs.org/releases/0.9.4/slimerjs-0.9.4.zip"
  - "unzip slimerjs-0.9.4.zip"
  - "mv slimerjs-0.9.4 ./slimerjs"

notifications:
  irc:
    channels:
      - "irc.mozilla.org#perf"
    template:
      - "BenWa: %{repository} (%{commit}) : %{message} %{build_url}"
    on_success: change
    on_failure: change

script: phantomjs js/tests/run_qunit.js test.html && ./slimerjs/slimerjs js/tests/run_qunit.js $PWD/test.html

Happy testing!

Gecko Bootcamp Talks

Last summer we held a short bootcamp crash course for Gecko. The talks have been posted to air.mozilla.org and collected under the TorontoBootcamp tag. The talks are about an hour each and are aimed at people wanting a deeper understanding of Gecko.

View the talks here: https://air.mozilla.org/search/?q=tag%3A+TorontoBootcamp

Gecko Pipeline

In the talks you’ll find my first talk covering an overall discussion of the pipeline: what stages run when, and how to skip stages for better performance. Kannan’s talk discusses Baseline, our first-tier JIT. Boris’ talk discusses restyle and reflow. Benoit Jacob’s talk discusses the graphics stack (rasterization + compositing + the IPC layer), but sadly the camera is off-center for the first half. Jeff’s talk goes in depth into rasterization, particularly path drawing. My second talk discusses performance analysis in Gecko using the Gecko Profiler, where we look at real profiles of real performance problems.

I’m trying to locate two more videos that were given at another session: one elaborating more on the DisplayList/Layer Tree/Invalidation phase and another on Compositing.

CallGraph Added to the Gecko Profiler

In the profiler you’ll now find a new tab called ‘CallGraph’. This will construct a call graph from the sample data. It’s the same information that you can extract from the tree view and the timeline, just formatted so that it can be scanned more easily. Keep in mind that this is only a call graph of what was executing at each sample point, not a fully instrumented call graph dump: this has a lower collection overhead, but misses anything that occurs between sample points. You’ll still want to use the Tree view to get aggregate costs. You can interact with the view using your mouse or with the W/A/S/D-equivalent keys of your keyboard layout.
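
Conceptually, constructing a call graph from sample data just means merging each sampled stack into a tree and counting how often each frame was the one executing. A minimal sketch of the idea (not the profiler’s actual implementation):

```javascript
// Sketch: merge sampled stacks into a call tree. Each sample is an array of
// frame names from the root outward (e.g. ["main", "paint", "draw"]). Stacks
// that share a prefix share nodes; selfCount approximates time spent in the
// leaf frame, since we only know what was on the stack at each sample point.
function buildCallGraph(samples) {
  var root = { name: "(root)", selfCount: 0, children: {} };
  samples.forEach(function (stack) {
    var node = root;
    stack.forEach(function (frame) {
      node.children[frame] = node.children[frame] ||
        { name: frame, selfCount: 0, children: {} };
      node = node.children[frame];
    });
    node.selfCount++; // the leaf frame was executing at this sample point
  });
  return root;
}

var graph = buildCallGraph([
  ["main", "paint", "draw"],
  ["main", "paint", "draw"],
  ["main", "layout"],
]);
console.log(graph.children.main.children.paint.children.draw.selfCount); // 2
```

This also shows why aggregate costs belong in the Tree view: the graph only tells you what the stack looked like at each sample, never how long anything between samples took.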

Profiler CallGraph

Big thanks to Victor Porof for writing the initial widget. This visualization will be coming to the devtools profiler shortly.

Improving Layer Dump Visualization

I’ve blogged before about adding a feature to visualize platform log dumps, including the layer tree. This week, while working on bug 1097941, I had no idea which module the bug was coming from. I used this opportunity to improve the layer visualization features, hoping that it would help me identify the bug. Here are the results (working for both desktop and mobile):

Layer Tree Visualization Demo – Maximize me

This tool works by parsing the output of layers.dump and layers.dump-texture (not yet landed). I reconstruct the data as DOM nodes, which can quite trivially support the features of a layer tree because layer trees are designed to map from CSS. From there, some JavaScript or the browser devtools can be used to inspect the tree. In my case all I had to do was locate which layer my bad texture data was coming from: 0xAC5F2C00.
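
The parsing step can be sketched as building a tree from indentation. This is an illustration only: it assumes two spaces of indentation per level of nesting, which is an assumption for the sketch, not the actual layers.dump format.

```javascript
// Sketch: turn an indented layer dump into a tree that could then be rendered
// as nested DOM nodes. Assumes (hypothetically) two spaces of indentation per
// level; the real layers.dump output carries much more per-layer detail.
function parseLayerDump(text) {
  var root = { name: "(root)", depth: -1, children: [] };
  var stack = [root]; // path from the root to the most recent node
  text.split("\n").forEach(function (line) {
    if (!line.trim()) return;
    var depth = (line.length - line.replace(/^ */, "").length) / 2;
    var node = { name: line.trim(), depth: depth, children: [] };
    // Pop back up until we find this node's parent.
    while (stack[stack.length - 1].depth >= depth) stack.pop();
    stack[stack.length - 1].children.push(node);
    stack.push(node);
  });
  return root;
}

var tree = parseLayerDump(
  "ContainerLayer (0xAC5F2000)\n" +
  "  PaintedLayer (0xAC5F2C00)\n" +
  "  ColorLayer (0xAC5F3400)\n"
);
console.log(tree.children[0].children.length); // 2
```

Once the dump is a tree like this, mirroring it as nested DOM nodes (and styling each with the layer’s transform and bounds) is the easy part.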

If you want to give it a spin just copy this pastebin and paste it here and hit ‘Parse’. Note: I don’t intend to keep backwards compatibility with this format so this pastebin may break after I go through the review for the new layers.dump-texture format.

GPU Profiling has landed

A quick reminder that one of the biggest benefits of having our own built-in profiler is that individual teams and projects can add their own performance-reporting features. The graphics team just landed a feature to measure how much GPU time is consumed when compositing.

I already started using this in bug 1087530 where I used it to measure the improvement from recycling our temporary intermediate surfaces.

Here we can see that the frame had two rendering phases (group opacity test case) totaling 7.42ms of GPU time. After applying the patch from the bug and measuring again I get:

Now, with the surface retained, the rendering GPU time drops to 5.7ms. Measuring the GPU time is important because timing GPU work from the CPU is not accurate.

Currently we still haven’t completed the D3D implementation or hooked this up to WebGL; we will do that as the need arises. To implement this, when profiling, we insert a query object into the GPU pipeline for each rendering phase (each framebuffer switch).

B2G Performance Polish: Unlock Animation (Part 2)

Our destination

Our starting point is a 200ms unlock delay and a uniform ~25 FPS animation. Our aim should be a 0ms unlock delay and a uniform 60 FPS (or whatever vsync is). The former we will minimize as much as we can; the latter is entirely achievable.

Let’s talk about how we would design a lock screen animation in the optimal case. When we go to apply it in practice we often hit requirements and constraints that make it impossible to behave the way we want, but let’s ignore that for a second and discuss where we want to get to.

In the ideal case we would have the lockscreen rendered offscreen to a set of GPU textures. We would have the background app ready in another set of GPU textures. These are ‘Layers’. We place the background app behind the lockscreen. When the transition begins we notify the compositor to start fading out the lockscreen. Keeping these layers around costs memory, but if we keep the right things around we can reduce or eliminate repaints entirely.

Our memory requirement is what’s required by the background app plus about one fullscreen layer for the lockscreen. This should be fine for even low-end B2G phones. Our overdraw should be about 200%-300%, low enough for mobile GPUs to keep up at 60 FPS/vsync.

Ideal Lockscreen Layer Tree

Now let’s look at what we hope our timeline for our Main Thread and our Compositor Thread to look like:

Ideal Unlock Timeline

We want to use Off-Main-Thread Animation to perform the fade entirely on the compositor. This will be initiated on the main thread and will require a style flush to set a CSS transform transition. If done right, we don’t expect to have to reflow or repaint any part of the page, provided we built the layer tree as shown in the first figure. Note that the style flush will contribute to the unlock delay (and so will the first composite time, as incorrectly shown in the diagram). If we can keep that style flush plus the first composite under, say, 50ms, and each composite at 16ms or less, then we should have a smooth unlock animation.

Next up let’s look at what’s actually happening in the unlock animation in practice…

B2G Performance Polish: Unlock Animation (Part 1)

I’ve decided to start a blog series documenting my workflow for performance investigation. Let me know if you find this useful and I’ll try to make it a regular thing. I’ll update this blog to track the progress made by myself and anyone who wants to jump in and help.

I wanted to start with the B2G unlock animation. The animation is OK but not great, and it’s core to the phone experience. I can notice a delay from the touch-up to the start of the animation. I can notice that the animation isn’t smooth. Where do we go from here? First we need to quantify how things stand.

Measuring The Starting Point

The first thing is to grab a profile. From the profile we can extract a lot of information (we will look at it again in future parts). I run the following command:

./profile.sh start -p b2g -t GeckoMain,Compositor
*unlock the screen, wait until the end of the animation*
./profile.sh capture
*open profile_captured.sym in http://people.mozilla.org/~bgirard/cleopatra/*

This results in the following profile. I recommend that you open it and follow along. The lock animation starts at t=7673 and runs until t=8656. That’s about 1 second. We can also note that we don’t see any CPU idle time, so we’re working really hard during the unlock animation. Things aren’t looking great at first glance.

I said that there was a long delay at the start of the animation. We can see a large transaction at the start, near t=7673. The first composition completes at t=7873. That means our unlock delay is about 200ms.
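
The arithmetic behind these figures is straightforward; here is a small sketch using the timestamps read off the profile (the frame count is a hypothetical value, chosen only to be consistent with the ~25 FPS reading):

```javascript
// Sketch: derive the unlock delay and average frame rate from profile
// timestamps. The timestamps are the ones read off the profile above; the
// frame count is hypothetical, for illustration.
var animationStart = 7673; // ms: the large transaction begins
var firstComposite = 7873; // ms: the first composition completes
var animationEnd = 8656;   // ms: the animation finishes
var frameCount = 25;       // hypothetical number of composited frames

var unlockDelay = firstComposite - animationStart;
var avgFps = frameCount / ((animationEnd - animationStart) / 1000);

console.log(unlockDelay); // 200 (ms)
console.log(avgFps.toFixed(1) + " FPS average");
```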

Now let’s look at how the frame rate is for this animation. In the profile open the ‘Frames’ tab. You should see this (minus my overlay):

Lockscreen Frame Uniformity

Alright so our starting point is:

Unlock delay: 200ms

Frame Uniformity: ~25 FPS, poor uniformity

Next step

In part 2 we’re going to discuss the ideal implementation for a lock screen. This is useful because we established a starting point in part 1; part 2 will establish a destination.