Using Mercurial’s bisect extension to find bugs

One week ago Openbravo ERP code was moved from Subversion to Mercurial. I am completely new to Mercurial and to distributed SCMs since I always worked with Subversion but, apart from the new concepts it incorporates, the transition was very smooth, at least till the moment. I’ve spent some time during the last days looking the extensions Mercurial has and, for me, one of the nicest ones is bisect. Bisect can be very useful to find the changeset when a bug was introduced in the code.

A real example

Recently I was assigned this bug. I discovered that it was not present in  r2.40 but it was in development branch, furthermore, I find out the bug was caused because a line had been removed from the code. At this point I wanted to know which commit removed that line, just to know if it was a mistake or it was done on purpose trying to fix another bug, so the tedious work started: given two revisions one that has the bug (head of development branch) and another one that has not it (r2.40 tag) try different revisions in between to find which one removed the line that causes the bug. This is not only tedious but also very time consuming.

A good solution: bisect

Bisect is an extension for Mercurial that makes this kind of work much faster. Bisect’s behavior is pretty simple, you tell it the good and bad changesets and it updates your working copy to another one between them, then you can test if in that one the bug is present and you mark it as good or bad, then the process is run again until it finds the changeset that introduces the bug.  In fact bisect just decides for you which is the next changeset to test in. It sounds not to be a very high improvement, but if you combine it with some simple (at least in some occasions) automated test the results can be awesome.

Let me explain it through a simple example:

  • First prepare the environment: a file with a lot of lines, and a commit somewhere there removing a line which we’ll look for afterwards.
$ hg init testBisect
$ cd testFile
$ cat testFile
This is
a file
with some feature
and this
line here
is needed to work fine
$ hg ci -A -m "init file"
adding testFile
$   for (( i = 0; i < 335; i++ )); do echo "line"$i >> testFile; hg commit -m "change here"; done
$ sed -i 's/line here//' testFile
$ hg ci -m "this commit is buggy"
$ for (( i = 0; i < 872; i++ )); do      echo "line"$i >> testFile;      hg commit -m "change here";      done
$ hg parents
changeset:   1208:7568a581b554
tag:         tip
user:        Asier Lostalé
date:        Mon Mar 02 11:13:39 2009 +0100
summary:     change here
  • Now we have 1208 changests! Let’s make the script to decide whether a revision is buggy or not and to continue looking in case it is not:
$  cat test1.sh
#!/bin/sh
MIN_ARGS=2
if [ $# -lt $MIN_ARGS ]; then
  echo "Usage: $(basename $0) FILE TEXT_TO_FIND" >&2
  exit 1
fi
FILE=$1
shift
TEXT_TO_FIND=$*
check() {
   grep -q "$TEXT_TO_FIND" $FILE && RESULT=good || RESULT=bad
   echo $RESULT

   hg bisect --$RESULT
}
while :
do
  if check | grep -q 'Testing changeset'
then
  echo
  hg bisect
else
  hg bisect
  exit 0
fi
done
  • Now we are ready to start testing, first of all reset bisect and tell it which is the bad and good known revisions. Bad is current one and good is the first one.
$ hg bisect --reset
$ hg bisect --bad
$ hg bisect --good 1
Testing changeset 604:9d6a42635e81 (1207 changesets remaining, ~10 tests)
1 files updated, 0 files merged, 0 files removed, 0 files unresolved
  • Finally just execute the test to find out who removed the line.
$ time ./test1.sh testFile 'line here'
Testing changeset 302:132a5339324e (603 changesets remaining, ~9 tests)
0 files updated, 0 files merged, 0 files removed, 0 files unresolved
Testing changeset 453:aa92eb899545 (302 changesets remaining, ~8 tests)
0 files updated, 0 files merged, 0 files removed, 0 files unresolved
Testing changeset 377:5c8e69bdb1ce (151 changesets remaining, ~7 tests)
0 files updated, 0 files merged, 0 files removed, 0 files unresolved
Testing changeset 339:05f7bb18e505 (75 changesets remaining, ~6 tests)
0 files updated, 0 files merged, 0 files removed, 0 files unresolved
Testing changeset 320:3107aee2dbd2 (37 changesets remaining, ~5 tests)
0 files updated, 0 files merged, 0 files removed, 0 files unresolved
Testing changeset 329:0d907ee53cdb (19 changesets remaining, ~4 tests)
0 files updated, 0 files merged, 0 files removed, 0 files unresolved
Testing changeset 334:8a0d38375333 (10 changesets remaining, ~3 tests)
0 files updated, 0 files merged, 0 files removed, 0 files unresolved

Testing changeset 336:dc2037e24dfc (5 changesets remaining, ~2 tests)
0 files updated, 0 files merged, 0 files removed, 0 files unresolved
Testing changeset 335:939ca611ae0f (2 changesets remaining, ~1 tests)
0 files updated, 0 files merged, 0 files removed, 0 files unresolved
The first bad revision is:
changeset:   336:dc2037e24dfc
user:        Asier Lostalé
date:        Mon Mar 02 11:11:52 2009 +0100
summary:     this commit is buggy
real    0m1.374s
user    0m1.128s
sys    0m0.204s

And we are done: in less than 1.5s we know which is the commit that removed the line!

Though this example is quite theoretical I think bisect is a very good solution for this kind of searches. Do you have experience with this extension? Any comment about it is welcome.

5 thoughts on “Using Mercurial’s bisect extension to find bugs

  1. Hi Henryk,

    afaic that’s true in case the line already exists in the file, but, how would you use annotate to know when a line was removed from a file? as this line is not longer there it will not be displayed in the annotate.

    Like

  2. That’s why i’ve mentioned
    hg log [-r start:stop] -p filename
    which lists all patches [between start and stop] applied to filename. The removed line will appear with
    – removed line
    scrolling a bit up gives you the changeset-id of this patch.
    The idea of bisect is to identify which patch caused a bug (your example is still instructive, though – thanks).
    So if your function add(2,2) returns 4 all the time and then at some point later 5, you want to find the changeset (*) that caused this bug. If you just correct the bug directly, you might cause a bug somewhere else since (*) changed the behavior possibly for some reason.

    Like

  3. Hy Henryk,

    You are right, with hg log -p would be enough for this example and it is not required to do the script for bisect 🙂

    Thanks for your comments!

    Like

  4. Tom A says:

    Another, even more precise, way to do this is with hg grep. hg grep, as the name suggests, searches for patterns in files or one particular file in the repository; unlike normal grep, it can search throughout history. Also unlike normal grep, this multidimensional view lets it detect where matches appear and disappear, rather than simply where they are – you can use the –all flag to get that information.

    In your example, the command and its output would be:

    $ hg grep –all “line here” testFile
    testFile:336:-:line here
    testFile:0:+:line here

    Telling you that the pattern in question appeared in testFile with changeset 0, and disappeared again with changeset 336

    Like

Leave a comment