Using RecordReplay to investigate intermittent oranges, bug #2 part 2

As promised here’s the follow up to part 1.

Getting Lost in a Replay

In part 1 I made a mistake. I accidentally got lost in the replay and started debugging from the wrong point. I believe I might have started with the wrong event number.

The best way to make sure you’re not getting lost in a replay is to use the ‘when’ command to print the current timeline event id that you’re at. With this event ID you can compare it against the stdout event marks and make sure you are at the point in the replay that you expect. Because of issue #1653 it wasn’t possible to use ‘when’ without risking a crash so I wasn’t able to see that I was lost. However this is now fixed on trunk! This was fixed today and I was able to use it from time to time and make sure I was where I wanted to be.

On this note I think this RR needs a feature to make it easier to stay within a certain bounds. It’s easy when doing continue/reverse continue to jump past the reftest that you’re interested in and suddenly you’re debugging in something unrelated.

Restarting from the Display List

This time the display list was reporting that both the good and the bad frame had the correct Image Rotation. However the frame size was wrong.

I had originally assumed that the image wasn’t rotated because of the proportion it had. However on a closer look the image was in fact rotated properly but it was stretched out in the opposite direction giving the impression that the image was not rotated.

With this information it was now clear that the rotation was correct but the image frame size was not.

Building a Timeline and Exploiting Checkpoints

It turns out that there’s a few interesting events for how the nsImageFrame is sized. There’s ‘nsLayoutUtils::OrientImage’, ‘nsImageFrame::Reflow’, ‘nsImageFrame::GetIntrinsicSize’ and ‘nsImageFrame::UpdateIntrinsicSize’.

Now since we’re suspected a race conditions it’s likely that these things are called in a bad sequence in the ‘bad’ trace. Turns out that guess was correct.

You can build a timeline manually using ‘when’ and ‘when-ticks’ and keeping notes however instead we exploited the checkpoint feature which also records the ‘when’. This keeps track of the events that you care about, ‘when’ they occurs and also makes it easy to jump to back to these key moments. Once we’ve created checkpoints at the interesting location we could can call ‘info checkpoint’ and read off the call sequence by manually sorting on the ‘when’. This keeps better notes and makes it easier to jump to relevant points. We ended up doing a lot of back and forth with these checkpoints trying to understand the timeline differences. Here’s the good trace:

(rr) info checkpoint
ID    When    Where
1    646393      PresShell::RenderDocument (this=0x3c831e67f000, aRect=..., aFlags=12, aBackgroundColor=4294967295, aThebesContext=0x2f701d8a5280) at /home/bgirard/mozilla-central/tree/layout/base/nsPresShell.cpp:4497
2    646484      nsDisplayList::PaintRoot (this=0x7ffede3eaf50, aBuilder=0x7ffede3eb180, aCtx=0x7ffede3eb970, aFlags=1) at /home/bgirard/mozilla-central/tree/layout/base/nsDisplayList.cpp:1565
3    644554      nsLayoutUtils::OrientImage (aContainer=0xf3568bed200, aOrientation=...) at /home/bgirard/mozilla-central/tree/layout/base/nsLayoutUtils.cpp:6702
4    644598      0x00003c831654ff9e in nsImageFrame::Reflow (this=0x20940978ad38, aPresContext=0x3c831e512000, aMetrics=..., aReflowState=..., aStatus=@0x7ffede3f10b4: 0) at /home/bgirard/mozilla-central/tree/layout/generic/nsImageFrame.cpp:959
6    644566      nsImageFrame::GetIntrinsicSize (this=0x20940978ad38) at /home/bgirard/mozilla-central/tree/layout/generic/nsImageFrame.cpp:919
7    644554      nsImageFrame::UpdateIntrinsicSize (this=0x20940978ad38, aImage=0xf3568b17070) at /home/bgirard/mozilla-central/tree/layout/generic/nsImageFrame.cpp:289

He we can tell that the ‘good’ call sequence is UpdateIntrinsicSize, GetIntrinsicSize, Reflow. Doing the same for the ‘bad’ replay gives us: GetIntrinsicSize, UpdateIntrinsicSize, Reflow. Note how we call UpdateIntrinsicSize with the right value but never query them. That’s bad!

At some point I think RR could use some UI improvements to make it easier to build timelines (and deal with ‘when’ ties) but not today.

Testing Timeline Theories

Now at this point we had several theories about the timeline. Theories were tested mostly using reverse-continue to get a better idea of the timeline differences and a few reverse-continue with a memory watchpoint (watch -l).

The Bug at Last

Turns out that the bug is pretty complicated. There’s a race condition in which we fire the ‘load’ event on the ImageDocument. At this point we’ve actually fully decoded the image and have the correct size and rotation for the image. However in the ‘bad’ trace we’re missing something very important. We haven’t built the proper StyleVisibility for the image frame which is missing an important ‘image-orientation’ property. Without this CSS property we will ignore the image rotation when computing the size ImageDocument even though it’s set correctly and on time. We also have no hooks to invalidate the size of the ImageDocument when the CSS propertie finally comes in so in this race condition the bad value sticks forever. We will get the correct image rotation value when we paint and thus the image will be rotated correctly but since the ImageDocument has the wrong size it will be stretched out vertically.

The Fix

Turns out that we can avoid sizing the image frame entirely and just use CSS to implement the shrinkToFit while preserving the aspect ratio. The fix was to use ‘object-fit: contain;’ instead of trying to sync up the width/height of ImageDocument, nsImageFrame and nsIDOMHTMLImageElement as the CSS styles are changing. Using CSS to implement the shrinkToFit behavior is much more robust.

The good news is fixing this wasn’t just fixing an infrastructure race. The bad behavior can actually be seen manually by toggling ‘image-orientation’ in an image document with exif rotation. This intermittent was warning us about a real problem.

Using RecordReplay to investigate intermittent oranges, bug #2

This is a follow up from my previous blog entry. I wanted to see if I could debug faster with the previous lessons learned. I managed to reproduced the #2 intermittent reftest (#1 which was recently fixed).

This bug was nicer because I had a single replay that had a passing and a failling run in the same trace. This means that I could jump around my replay and compare an expected execution vs. a failing a execution simply by running ‘run 343663’ and ‘run 344535’ (from the reftest TEST-START event log).

Step 1. Compare the frame tree

$ (rr) run 343651

$ (rr) goto-snapshot

$ (rr) next-display-list

Image p=0x20187201a1a0 f=0x7f1261051d38(ImageFrame(img)(0)

(rr) print *((nsIFrame*)0x7f1261051d38)->mImage.mRawPtr

$26 = (mozilla::image::OrientedImage) {
rotation = mozilla::image::Angle::D90


$ (rr) run 344535

$ (rr) goto-snapshot

$ (rr) next-display-list
Image p=0x210a130a89a0 f=0x30f01d0acd38(ImageFrame(img)(0) class:shrinkToFit)

(rr) print *((nsIFrame*)0x30f01d0acd38)->mImage.mRawPtr
$24 = (mozilla::image::OrientedImage) {

rotation = mozilla::image::Angle::D0,

step 2. Trace the mismatched value

Alright so this means that when the intermittent failure occurs the rotation field is incorrectly set incorrectly in the ImageFrame. It’s hard to find out why a field isn’t being set so let’s focus on finding out what is setting D90 in the correct case.

$ (rr) watch -l ((nsIFrame*)0x7f1261051d38)->mImage.mRawPtr->mOrientation.rotation

$ (rr) reverse-continue

(…) We see the rotation is provided from StyleVisibility()->mImageOrientation

mImage = nsLayoutUtils::OrientImage(image, StyleVisibility()->mImageOrientation);

Another reverse continue step and we see that it’s being from mozilla::image::RasterImage::SetMetadata. This is called in both the good and bad test with D90 and D0 respectively.

Another step backwards to ‘mozilla::image::Decoder::PostSize(int, int, mozilla::image::Orientation)’. Again the mismatch is seen.

Digging further brought me to the code in ‘mozilla::image::EXIFParser::ParseOrientation’. Both are trying to read at position 24 of the Exif buffer of the same length.

(rr) x/30x mStart
0x1b807e4201e0:    0x66697845    0x4d4d0000    0x00002a00    0x07000800
0x1b807e4201f0:    0x03001201    0x01000000    0x00000600


0x49394230e1e0:    0x66697845    0x4d4d0000    0x00002a00    0x07000800
0x49394230e1f0:    0x03001201    0x01000000    0x00000100

My original theory was a race in getting the data from the decoder and displaying the frame but at this point it looks like something entirely different. We can see here that the Exif data is the same except for the single rotation byte that we want to read.

Data is still wrong in save_marker at tree/media/libjpeg/jdmarker.c:811. reverse-continue from here leads to me:

#0  __memcpy_avx_unaligned () at ../sysdeps/x86_64/multiarch/memcpy-avx-unaligned.S:238
#1  0x00007b605468563a in mozilla::image::SourceBuffer::Append (this=0x344b09866e20, aData=0x21b203e9e000 “\377\330\377”, <incomplete sequence \340>, aLength=32768) at /home/bgirard/mozilla-central/tree/image/SourceBuffer.cpp:311
#2  0x00007b60546859d3 in mozilla::image::AppendToSourceBuffer (aClosure=0x344b09866e20, aFromRawSegment=0x21b203e9e000 “\377\330\377”, <incomplete sequence \340>, aCount=32768, aWriteCount=0x7ffcab77bba8) at /home/bgirard/mozilla-central/tree/image/SourceBuffer.cpp:368
#3  0x00007b6052fe40d1 in nsPipeInputStream::ReadSegments (this=0x21b203e8d980, aWriter=0x7b6054685996 <mozilla::image::AppendToSourceBuffer(nsIInputStream*, void*, char const*, uint32_t, uint32_t, uint32_t*)>, aClosure=0x344b09866e20, aCount=90700, aReadCount=0x7ffcab77bc20) at /home/bgirard/mozilla-central/tree/xpcom/io/nsPipe3.cpp:1283
#4  0x00007b6054685a3f in mozilla::image::SourceBuffer::AppendFromInputStream (this=0x344b09866e20, aInputStream=0x21b203e8d980, aCount=90700) at /home/bgirard/mozilla-central/tree/image/SourceBuffer.cpp:385
#5  0x00007b605463d893 in mozilla::image::RasterImage::OnImageDataAvailable (this=0x21b203e9abc0, aInputStream=0x21b203e8d980, aCount=90700) at /home/bgirard/mozilla-central/tree/image/RasterImage.cpp:1072
#6  0x00007b605465fc72 in imgRequest::OnDataAvailable (this=0x57a03547860, aRequest=0x4f7d247b7950, aContext=0x0, aInStr=0x21b203e8d980, aOffset=0, aCount=90700) at /home/bgirard/mozilla-central/tree/image/imgRequest.cpp:1106

and for the base case:

#0  __memcpy_avx_unaligned () at ../sysdeps/x86_64/multiarch/memcpy-avx-unaligned.S:238
#1  0x00007b6054684c66 in mozilla::image::SourceBuffer::Compact (this=0xe95558cee70) at /home/bgirard/mozilla-central/tree/image/SourceBuffer.cpp:153
#2  0x00007b6054685f7d in mozilla::image::SourceBuffer::OnIteratorRelease (this=0xe95558cee70) at /home/bgirard/mozilla-central/tree/image/SourceBuffer.cpp:478
#3  0x00007b6054684414 in mozilla::image::SourceBufferIterator::~SourceBufferIterator (this=0x6c1e7a56a0b8, __in_chrg=<optimized out>) at /home/bgirard/mozilla-central/tree/image/SourceBuffer.cpp:29
#4  0x00007b60546803b4 in mozilla::Maybe<mozilla::image::SourceBufferIterator>::reset (this=0x6c1e7a56a0b0) at /home/bgirard/mozilla-central/tree/obj-x86_64-unknown-linux-gnu/dist/include/mozilla/Maybe.h:373
#5  0x00007b605467e8e4 in mozilla::Maybe<mozilla::image::SourceBufferIterator>::~Maybe (this=0x6c1e7a56a0b0, __in_chrg=<optimized out>) at /home/bgirard/mozilla-central/tree/obj-x86_64-unknown-linux-gnu/dist/include/mozilla/Maybe.h:92
#6  0x00007b605466bc74 in mozilla::image::Decoder::~Decoder (this=0x6c1e7a56a000, __in_chrg=<optimized out>) at /home/bgirard/mozilla-central/tree/image/Decoder.cpp:47
#7  0x00007b60546ad304 in mozilla::image::nsJPEGDecoder::~nsJPEGDecoder (this=0x6c1e7a56a000, __in_chrg=<optimized out>) at /home/bgirard/mozilla-central/tree/image/decoders/nsJPEGDecoder.cpp:98
#8  0x00007b60546ad33c in mozilla::image::nsJPEGDecoder::~nsJPEGDecoder (this=0x6c1e7a56a000, __in_chrg=<optimized out>) at /home/bgirard/mozilla-central/tree/image/decoders/nsJPEGDecoder.cpp:115
#9  0x00007b605467b357 in mozilla::image::Decoder::Release (this=0x6c1e7a56a000) at /home/bgirard/mozilla-central/tree/image/Decoder.h:91
#10 0x00007b6054644e55 in RefPtr<mozilla::image::Decoder>::AddRefTraitsReleaseHelper (aPtr=0x6c1e7a56a000) at /home/bgirard/mozilla-central/tree/obj-x86_64-unknown-linux-gnu/dist/include/mozilla/RefPtr.h:362
#11 0x00007b605464467c in RefPtr<mozilla::image::Decoder>::AddRefTraits<mozilla::image::Decoder>::Release (aPtr=0x6c1e7a56a000) at /home/bgirard/mozilla-central/tree/obj-x86_64-unknown-linux-gnu/dist/include/mozilla/RefPtr.h:372
#12 0x00007b605464322f in RefPtr<mozilla::image::Decoder>::~RefPtr (this=0x344b09865598, __in_chrg=<optimized out>) at /home/bgirard/mozilla-central/tree/obj-x86_64-unknown-linux-gnu/dist/include/mozilla/RefPtr.h:56
#13 0x00007b60546816e2 in mozilla::image::NotifyDecodeCompleteWorker::~NotifyDecodeCompleteWorker (this=0x344b09865580, __in_chrg=<optimized out>) at /home/bgirard/mozilla-central/tree/image/DecodePool.cpp:82
#14 0x00007b6054681726 in mozilla::image::NotifyDecodeCompleteWorker::~NotifyDecodeCompleteWorker (this=0x344b09865580, __in_chrg=<optimized out>) at /home/bgirard/mozilla-central/tree/image/DecodePool.cpp:82

Assuming there’s no mistake above it means were getting the wrong data from the input stream. However it would be odd that the single byte for orientation would get corrupted. Something is fishy here! I’ll continue this debugging session and blog post next week!


Testing a JS WebApp

Пост доступен на сайте Тестирование приложения JS WebApp.

Test Requirements

I’ve been putting off testing my cleopatra project for a while now because I wanted to take the time to find a solution that would satisfy the following:

  1. The tests can be executed by running a particularly URL.
  2. The tests can be executed headless using a script.
  3. No server side component or proxy is required.
  4. Stretch goal: Continuous integration tests.

After a bit of research I came up with a solution that addressed my requirements. I’m sharing here in case this helps others.

First I found that the easiest way to achieve this is to find a Test Framework to get 1) and find a solution to run a headless browser for 3.

Picking a test framework

For the Test Framework I picked QUnit. I didn’t have any strong requirements there so you may want to review your options if you do. With QUnit I load my page in an iframe and inspect the resulting document after performing operations. Here’s an example:

QUnit.test("Select Filter", function(assert) {
    query: "?report=4c013822c9b91ffdebfbe6b9ef300adec6d5a99f&select=200,400",
    assert: assert,
    testFunc: function(cleopatraObj) {
    profileLoadFunc: function(cleopatraObj) {
    updatedFiltersFunc: function(cleopatraObj) {
      var samples = shownSamples(cleopatraObj);

      // Sample count for one of the two threads in the profile are both 150
      assert.ok(samples === 150, "Loaded profile");

Here I just load a profile, and once the document fires an updateFilters event I check that the right number of samples are selected.

You can run the latest cleopatra test here:

Picking a browser (test) driver

Now that we have a page that can run our test suite we just need a way to automate the execution. Turns out that PhantomJS, for webkit, and SlimerJS, for Gecko, provides exactly this. With a small driver script we can load our test.html page and set the process return code based on the result of our test framework, QUnit in this case.

Stretch goal: Continuous integration

If you hooked up the browser driver to run via a simple script adding continuous integration should be simple. Thanks to Travis-CI and Github it’s easy to setup your test script to run per check-in and set notifications.

All you need is to configure Travis-CI to looks at your repo and to check-in an appropriate .travis.cml config file. Your travis.yml should configure the environment. PhantomJS is pre-installed and should just work. SlimerJS requires a Firefox binary and a virtual display so it just requires a few more configuration lines. Here’s the final configuration:

  firefox: "33.1"
  - "sh -e /etc/init.d/xvfb start"
  - "echo 'Installing Slimer'"
  - "wget"
  - "unzip"
  - "mv slimerjs-0.9.4 ./slimerjs"

      - ""
     - "BenWa: %{repository} (%{commit}) : %{message} %{build_url}"
    on_success: change
    on_failure: change

script: phantomjs js/tests/run_qunit.js test.html && ./slimerjs/slimerjs js/tests/run_qunit.js $PWD/test.html

Happy testing!

Gecko Bootcamp Talks

Last summer we held a short bootcamp crash course for Gecko. The talks have been posted to and collected under the TorontoBootcamp tag. The talks are about an hour each but will be very informative to some. They are aimed at people wanting a deeper understanding of Gecko.

View the talks here:

Gecko Pipeline

Gecko Pipeline

In the talks you’ll find my first talk covering an overall discussion of the pipeline, what stages run when and how to skip stages for better performance. Kannan’s talk discusses Baseline, our first tier JIT. Boris’ talk discusses Restyle and Reflow. Benoit Jacob’s talk discusses the graphics stack (Rasterization + Compositing + IPC layer) but sadly the camera is off center for the first half. Jeff’s talk goes into depth into Rasterization, particularly path drawing. My second talk discusses performance analysis in Gecko using the gecko profiler where we look at real profiles of real performance problems.

I’m trying to locate two more videos about layout and graphics that were given at another session but would elaborate more the DisplayList/Layer Tree/Invalidation phase and another on Compositing.

CallGraph Added to the Gecko Profiler

In the profiler you’ll now find a new tab called ‘CallGraph’. This will construct a call graph from the sample data. This is the same information that you can extract from the tree view and the timeline but just formatted so that it can be scanned better. Keep in mind that this is only a call graph of what occurred between sample points and not a fully instrumented Call Graph dump. This has a lower collection overhead but missing anything that occurs between sample points. You’ll still want to use the Tree view to get aggregate costs. You can interact with the view using your mouse or with the W/A/S/D equivalent keys of your keyboard layout.

Profiler CallGraph

Profiler CallGraph

Big thanks to Victor Porof for writing the initial widget. This visualization will be coming to the devtools profiler shortly.

Improving Layer Dump Visualization

I’ve blogged before about adding a feature to visualize platforms log dumps including the layer tree. This week while working on bug 1097941 I had no idea which module the bug was coming from. I used this opportunity to improve the layer visualization features hoping that it would help me identify the bug. Here are the results (working for both desktop and mobile):

Layer Tree Visualization Demo
Layer Tree Visualization Demo – Maximize me

This tools works by parsing the output of layers.dump and layers.dump-texture (not yet landed). I reconstruct the data as DOM nodes which can quite trivially support the features of a layers tree because layers tree are designed to be mapped from CSS. From there some javascript or the browser devtools can be used to inspect the tree. In my case all I had to do was locat from which layer my bad texture data was coming from: 0xAC5F2C00.

If you want to give it a spin just copy this pastebin and paste it here and hit ‘Parse’. Note: I don’t intend to keep backwards compatibility with this format so this pastebin may break after I go through the review for the new layers.dump-texture format.

GPU Profiling has landed

A quick remainder that one of the biggest benefit to having our own built-in profiler is that individual teams and project can add their own performance reporting features. The graphics team just landed a feature to measure how much GPU time is consumed when compositing.

I already started using this in bug 1087530 where I used it to measure the improvement from recycling our temporary intermediate surfaces.

Screenshot 2014-10-23 14.35.29Here we can see that the frame had two rendering phases (group opacity test case) totaling 7.42ms of GPU time. After applying the patch from the bug and measuring again I get:

Screenshot 2014-10-23 14.38.15Now with retaining the surface the rendering GPU time drops to 5.7ms of GPU time. Measuring the GPU time is important because timing things on the CPU time is not accurate.

Currently we still haven’t completed the D3D implementation or hooked it up to WebGL, we will do that as the need arises. To implement this, when profiling, we insert a query object into the GPU pipeline for each rendering phase (framebuffer switches).