Robot: One! Uglies: Zerooo!

The OPW session is behind us, and it's time to gather stones together.

Stone #1: Estimation vs implementation

Everything planned was implemented. Nothing unplanned was implemented. Full congruence, not bad, not bad…

Stone #2: Bugs found

Ten. Good work, fuzzer!

Stone #3: Community

I've lost my heart to people who are fond of what they're doing and have fun doing it. It's extremely infectious, and I hope to become a “chronic” case.

Stone #4: Incremental programming

It helped me to implement a working and rather effective application when its architecture could not be entirely designed from the beginning. But the resulting system came out too monolithic, which slightly complicated the testing process and made my inner perfectionist sniff scornfully.

Stone #5: Plans

  1. Continue to support the fuzzer
  2. Join another Open Source community to improve my skills and make the Universe better (I think that OpenHatch will be a good start for this)
  3. Get used to write posts about interesting technical stuff
  4. Participate in another internship-like project

Stone #6: Thanks

Well, what are closing credits without thanks! First of all I want to thank my mentors: Stefan Hajnoczi and Fam Zheng. Special thanks also to Kevin Wolf and the QEMU community for technical consultations. Thanks to Sumana Harihareswara for showing the full spectrum of opportunities around. And hats off to all the coordinators and organisers of OPW. You did a great job.


Fuzzing of self-referencing structures

In most sources devoted to file format fuzzing, a file structure is described as a set of fields, where a field is a triplet of an offset, a format and a value. Fields can be logically organized into larger structure elements. Fields or subsets of fields can refer to each other, e.g. a field value can contain an offset or a more sophisticated characteristic of another structure element. Another common variant is when the presence of a field or a set of fields depends on the existence or absence of another structure element.

To my pleasant surprise, the qcow2 format has self-referencing structures. There is a two-level structure that contains information about all allocated clusters in the image, including the ones occupied by the structure itself.

A small informational digression about this structure.

The low level — a refcount block — contains information about some number of adjacent clusters. This number depends on the cluster size and the size of one refcount entry, but it's at least several hundred clusters. Let's call it Nblock. Refcount blocks are ordered, i.e. the first refcount block describes the first Nblock clusters, the second one — the next bunch of clusters, and so on. If none of the Nblock clusters for a refcount block is allocated, the block is not created. The size of each refcount block is exactly one cluster.

The high level — the refcount table — contains references to all existing refcount blocks. These references are ordered, i.e. the first refcount table entry relates to the first refcount block, the second one — to the second refcount block, etc. If a refcount block is not created, the corresponding refcount table entry is left empty. So the size of the refcount table is defined by the maximum index of the allocated refcount blocks and can take several clusters.
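As a back-of-the-envelope check of the two levels described above, here is the arithmetic in Python, assuming the qcow2 defaults of 16-bit refcount entries and 8-byte refcount table entries (the variable names are mine):

```python
cluster_size = 65536                    # bytes per cluster (64 KiB default)

# Each refcount block is one cluster of 16-bit entries,
# so it describes cluster_size / 2 clusters.
n_block = cluster_size // 2

image_clusters = 10 ** 6                # allocated clusters in the image

# Refcount blocks needed to cover every allocated cluster
# (-(-a // b) is ceiling division).
blocks = -(-image_clusters // n_block)

# The refcount table stores one 8-byte reference per block.
table_clusters = -(-(blocks * 8) // cluster_size)

print(n_block, blocks, table_clusters)  # 32768 31 1
```

So even a million-cluster image needs only a handful of refcount blocks and a one-cluster table, but both numbers move as soon as the cluster size changes.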

Let's get back to the fuzzer.

The goal is to represent the refcount table and blocks as a set of fields. If the image and cluster sizes vary from generation to generation, it's impossible to predefine the number and positions of refcount blocks, as well as the size and position of the refcount table. So the generation process has to be iterative.

The main problem here is that the addition of a refcount block may cause a growth of the refcount table to the next cluster, which in its turn may require an additional refcount block, and so on.

The simplest solution here is the creation of a self-describing refcount block at the end of the image. In other words, Nblock clusters are appended to the image, and all refcount blocks, including the refcount block describing these last clusters, and the refcount table are located in the last Nblock clusters.

But this solution has several drawbacks. It doesn't work in some extreme cases, e.g. a small cluster size and a large image size may require more than Nblock refcount blocks. Also, test variability is a very important property, and it's not good to fix the location of the refcount metadata for all generated images.

Let's improve the solution.

Allocation of refcount blocks:

  1. Calculate the number of refcount blocks required for all non-empty clusters.
  2. Allocate clusters for refcount blocks in any empty space of the image, e.g. take a random sample from the indices of empty clusters. If there is no empty space left, append the necessary clusters to the end of the image.
  3. Check whether these newly allocated clusters require additional refcount blocks. If they do, go to step #2, otherwise exit.

Since each step requires fewer allocations than the previous one, the algorithm converges; at worst, the empty space between allocated clusters runs out and all new clusters start being appended.
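A minimal sketch of the loop above (the function and variable names are mine, not the fuzzer's):

```python
import random

def allocate_refcount_blocks(allocated, image_clusters, n_block):
    """Iteratively allocate clusters for refcount blocks until the
    newly allocated clusters need no further blocks."""
    placed = 0  # refcount blocks that already have a cluster
    while True:
        # Step 1: blocks required to describe every non-empty cluster
        needed = len(set(c // n_block for c in allocated))
        if needed <= placed:
            return allocated, image_clusters
        # Step 2: put each missing block into random empty space,
        # appending clusters when the image has no empty space left
        for _ in range(needed - placed):
            empty = [c for c in range(image_clusters) if c not in allocated]
            if empty:
                allocated.add(random.choice(empty))
            else:
                allocated.add(image_clusters)
                image_clusters += 1
        placed = needed
        # Step 3: loop, since the new clusters may need more blocks
```

Each pass allocates fewer clusters than the previous one, so the loop terminates even in the degenerate case where every new cluster is appended to the end of the image.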

Allocation of the refcount table:

  1. Calculate the refcount table size (Stable) after allocation of all refcount blocks.
  2. Find a sequence of adjacent empty clusters with a length equal to Stable plus a tolerance for the refcount table growth. In my case, one cluster was an optimal tolerance.
  3. Allocate the first Stable clusters of the sequence found in step #2.
  4. If the newly allocated clusters require new refcount blocks, then use the algorithm for refcount block allocation described above, but on every iteration check whether the refcount table should be extended to the next cluster. If it should, add this cluster to the list of newly allocated ones.
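Step #2, finding a sequence of adjacent empty clusters, can be sketched like this (again, the names are made up for illustration):

```python
def find_empty_run(allocated, image_clusters, length):
    """Return the index of the first run of `length` adjacent empty
    clusters, or the image end if there is no such run."""
    run_start, run_len = 0, 0
    for c in range(image_clusters):
        if c in allocated:
            # An allocated cluster breaks the current run
            run_start, run_len = c + 1, 0
        else:
            run_len += 1
            if run_len == length:
                return run_start
    return image_clusters  # no run inside the image: append at the end

print(find_empty_run(set([2, 3]), 10, 3))  # 4: clusters 4-6 are free
```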

Now that all clusters are allocated, all refcount structures can be easily generated.



Old snakes. Part II

I've scraped up another sack of sweets, tanked up my keyboard with a fresh Oxygen font, and I'm ready to tell stories about writing code compatible with Python versions 2.4-2.7 using only the standard library.

Module imports

Sometimes modules have to be imported dynamically. Yes, yes, letting into your cozy little kawaii code something that you don't even know by name.

Python 2.4+ has the imp library, which takes in your guests, cleans their boots and shows them their places. But Python 3.1+ trained another guard, importlib, and sent imp into retirement as of Python 3.4.

No one wants to use deprecated libraries. So no guards! Let's be good hosts ourselves.

The sys library has the sys.path attribute, which contains all module search paths, both the default ones and those specified via the PYTHONPATH environment variable.

import sys
print sys.path

And it's possible to add your own path to sys.path in your code:

import sys
import os

module_path = '../module_dir/module'
sys.path.append(os.path.realpath(os.path.dirname(module_path)))

The os.path.realpath method is used to convert symlinks and relative paths to absolute ones.

Note that os.path.realpath cuts the trailing slash. It doesn't matter for the example above, but it can be painful when processing paths that may end with a directory. In these situations os.path.dirname should be called before os.path.realpath; otherwise one level of the directory tree will be lost.
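A tiny illustration of the call order (the path is hypothetical):

```python
import os.path

p = 'module_dir/module/'   # a path ending with a directory

# realpath drops the trailing slash, so dirname applied afterwards
# climbs one level too high; dirname applied first keeps the level.
print(os.path.dirname(os.path.realpath(p)))    # ends with 'module_dir'
print(os.path.realpath(os.path.dirname(p)))    # ends with 'module'
```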

And now it's time to say “Hello, dear guest!”

# Obtain the module name and remove its extension if any
module_name = os.path.splitext(os.path.basename(module_path))[0]

try:
    module_var = __import__(module_name)
except ImportError:
    print "Error: The module cannot be imported."

Short circuit conditions

Sometimes I have a great notion… to feel fashionable, contemporary, hipster, and write something long and artistic using list comprehensions:

[light + ' is black' if light in pierre_soulages
 else light + ' is a wave' for light in art_and_science]

But Python 2.4 is a utilitarian language for writing good programs, not good poems, and it doesn't support conditional expressions. So if you want to write a poem, just wrap the condition up in a small auxiliary function:

def write_poem(word):
    if rhyme(word):
        return word
    return 'shm' + word[1:]

new_poem = [write_poem(w) for w in text]

Of course, if you're an adept of Conceptual art, logical operators are always at your service.
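For instance, the classic pre-2.5 `cond and a or b` idiom does the same job, with the famous caveat that it silently misbehaves when `a` is falsy (`rhyme` below is a toy stand-in, not a real rhyme checker):

```python
def rhyme(word):
    # Toy stand-in for a real rhyme check
    return word.endswith('at')

text = ['cat', 'dog']

# `cond and a or b` mimics `a if cond else b` while `a` is truthy
new_poem = [rhyme(w) and w or 'shm' + w[1:] for w in text]
print(new_poem)  # ['cat', 'shmog']
```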


Sometimes it's necessary to send some nontrivial configuration information to a script. The conventional way to do it is to use JSON or YAML. Python has no standard tools for YAML support, so JSON is the only choice; or it would be, if the standard json library had appeared in a Python version earlier than 2.6. There is the third-party simplejson library, which is widespread enough to rely on the assumption that it's installed on most machines with old Python versions. But simplejson stopped supporting Python 2.4 several versions ago. It's still possible to install simplejson under Python 2.4, but with some pain as a bonus.

I found no panacea for this. I just cowardly dropped users of Python 2.4 with no simplejson installed from a piece of absolutely-not-so-important functionality.

try:
    import json
except ImportError:
    try:
        import simplejson as json
    except ImportError:
        print "Sorry. No milk for you."

But if the configuration object is not too sophisticated, alternative methods can be tried, e.g. good old INI files.
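A sketch of the INI route: ConfigParser has been in the standard library since well before Python 2.4 (in Python 3 it is spelled configparser, and read_string below is the 3.x spelling; on 2.x you would use readfp with a StringIO object). The section and option names are invented:

```python
try:
    import configparser                  # Python 3
except ImportError:
    import ConfigParser as configparser  # Python 2

config = configparser.ConfigParser()
config.read_string(u'[runner]\nseed = 42\nimage = test.qcow2\n')
print(config.get('runner', 'seed'))   # 42
```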

Well, my sack looks empty. Oh, I'm lying! There are useful links at its bottom:


Equator! Hooray!

The project midterm has been crossed. It's time to sum things up.

Things done:

The fuzzer is able to generate qcow2 headers! Not impressive, huh? Okay, let's look closer.

The following problems were solved:

  1. Variability of the generated images in compliance with the qcow2 standard
  2. Intelligent randomization of different types of data, such as integers, bitmasks and strings
  3. Execution of tests without human interaction

And voilà! We have a fuzzer that can generate various valid images, fuzz their fields depending on their types, and do it again and again until you get bored.

Things to be done:

Well, an awful amount of work, to be honest.

  1. Generation of remaining image elements
  2. Refinement of fuzzing values based on test coverage
  3. Restore the test runner to a simple and light-weight state from its current 'ZOMG-I-am-Jack-of-all-trades!!!11' one.

Things learnt:

More than I expected, really.

Achievements unlocked:

  1. 'Zoo keeper' - more than 100 lines of code compatible with Python versions 2.4-2.7

  2. 'Snake-charmer' - know at least three types of module imports, decorators and magic methods, and implement all of them without googling

  3. 'Master of willpower' - work at home 40 hours a week at the table, with a downy bed warmed by your cats at arm's length

  4. 'Dr. Web' - successfully communicate with a distributed team

  5. 'Soft paws' - don't bother your mentors after midnight because of different time zones (unlocks with 'Dr. Web')

  6. 'POST' - send a patch to the production branch (prerequisite for '201' - have a patch merged into the production branch)


The project gives me a feeling of incomparable satisfaction. I like all the challenges and the fun I have doing this work.


Fuzzy introduction

A long time ago in a galaxy not so far away, K.V. Hanford wrote an article about the usage of random inputs for testing purposes. Several years later, Professor Barton Miller of the University of Wisconsin gave the technique its name. In the words of the Professor himself, the term “fuzz” was inspired by the noise on a line connected to the modem being tested. But I'm sure he was thinking about kittens at that moment. Everyone who has one knows their incredible ability to make a mess in places that look airtight and impregnable.

Okay, we have the name. Time to get the meaning.

Fuzzing is a method of automated negative testing that provides invalid random data to a system under test. It's a kind of “monkey testing”, but it can be more controllable, reproducible and automatic. Like this.

What's cool about fuzz testing? It's negative and automated. Whereas positive testing exploits a rather limited set of inputs, negative testing covers the almost infinite number of inputs that are not used by its positive fellow. Manual testers will wipe off cold sweat with relief if the task of negative testing and generation of appropriate test inputs is executed automatically. Also, issues discovered by fuzz testing are critical, since they manifest themselves as crashes or assertion failures.

What is not so cool? All issues discovered by fuzz testing manifest themselves as crashes or assertion failures; other bugs, more modest in their behavior, cannot be caught by a fuzzer. Another sad point is that fuzzing effectiveness is a double-edged sword. The simpler the fuzzer, the lower the implementation and deployment costs it requires and the more unexpected the inputs it produces, whereas a more complex fuzzer can discover more sophisticated issues but takes much more resources to implement.

But I will continue with an assumption that fuzz testing is cool.

To test something we need this something, input data, a way to feed the something the input data, and an indicator that the something is safe and sound, or dead and silent, after the feeding process.

Mapping this onto the fuzzing technique, we get a system under test (SUT), a fuzz generator, a delivery mechanism and a monitoring system.

The delivery mechanism sets up the test environment needed to make the SUT able to consume the test data, and sends the data to the SUT.

The monitoring system keeps a hand on the pulse of the SUT and decides whether a test has failed or passed. There can be different criteria for test failures, depending on the SUT and test architecture, but the most common one is a SUT crash. Besides the test status, the monitoring system can provide additional information about the SUT state during a test, e.g. which lines of code were executed. This feedback is valuable for effective data generation.

Oh, yes, data generation… So, last but not least, the fuzz generator!

The fuzz generator is the generator of invalid input data. Surprise! To be more specific, I have to refer to the classification. Fuzz generators are classified on two bases: the source of test data and “intelligence”.

The first basis in its turn produces two types of fuzz generators — mutative and generative. Mutative generators take valid test data pregenerated by some external helper and corrupt them by injecting invalid parts. Generative ones, on the contrary, produce all the test data from scratch.

So the first ones are much easier to implement, as it's not necessary to generate the entire structure of a SUT input. But at the same time, the variability of test data and, as a result, code coverage are limited by the structure of the pregenerated valid inputs. Generators of the second type don't have this problem, but they are harder to implement and more sensitive to issues and incompleteness of the SUT model used to generate fuzzed data.
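A toy mutative generator makes the difference concrete: it needs a valid sample to start from and only knows how to corrupt it (this is an illustration, not the project's code):

```python
import random

def mutate(sample, n_flips=4):
    """Corrupt a pregenerated valid sample by flipping random bits."""
    data = bytearray(sample)
    for _ in range(n_flips):
        pos = random.randrange(len(data))
        data[pos] ^= 1 << random.randrange(8)
    return bytes(data)

random.seed(0)
sample = b'QFI\xfb' + b'\x00' * 12   # fake qcow2 magic plus padding
print(mutate(sample))
```

A generative fuzzer, in contrast, would have to build the whole header from a model of the format instead of taking `sample` as given.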

While the first classification basis provides clear type discrimination and terminology, the second has given birth to plenty of variations with their own names and grouping rules. Examples can be found among the links at the end of the post.

The goal of these classifications is to group fuzz techniques and sort them by ascending effectiveness. Test effectiveness can be measured by code coverage and by the number of tests verifying the same code routine: the first should be high, the second low.

Stop! What about this equivocal intelligence in no less equivocal quotes in the paragraph above?

Don't panic! It's a shortcut for awareness of the structure of SUT inputs and of the SUT itself, and for the ability to use this knowledge for test data generation. “Intelligence” is the primary way to increase fuzzing effectiveness.

So at one end of our abstract scale of fuzzing effectiveness we have a “monkey” fuzzer producing entirely random data and having no idea about the target. At the other end there is a fuzzer that has the full SUT specification at hand and knows how to use it to generate valid data and the optimal set of most penetrating test cases. I think the second one will lead the Rise of the Machines in its time.

Somewhere in between, grey-box fuzzers of different shades are placed. They can inject invalid data into known places of inputs, use common heuristics suitable for the current kind of SUT, or generate their own heuristics based on the feedback provided by the monitoring system.

Ascending fuzzing effectiveness increases the complexity of implementation, and not only that. Since effectiveness depends on a SUT specification, an effective fuzzer will be bound to the architecture of its SUT.

And for dessert, some links:

  1. Great introductory article

  2. The article with references to the classic work “Fuzzing: Brute Force Vulnerability Discovery”

  3. Introduction to fuzz testing with an empirical flavor


Old snakes

The first obstacle in my project steeplechase was the set of Python versions supported by QEMU. I, a kid spoiled by Python 2.7-3.x sweets, was rather discouraged by the limitation to versions 2.4-2.7. Well, for pessimists obstacles are difficulties; for optimists they are opportunities.

In this post I'll collect all the ambiguities I meet during the project and the ways I resolve them. There are also some notes about the state of things in Python 3.x.

Context managers

I have a test runner that will execute several tests in a row. Tests are instances of the Test class. The goal is to clean up the work environment after each test (close files, change the current directory and so on). An ideal case for context managers!

class Test(object):

    def execute(self):
        # Run test
        pass

    def __enter__(self):
        return self

    def __exit__(self, *args):
        # Do cleanup
        pass

with Test() as test:
    test.execute()

But their birth dates back to Python 2.5. Well, let's go back to the roots: unwrap the “with … as” construction into explicit setup, execution and finalization, forming the last two into a “try … finally” statement. Here we are!

class Test(object):

    def execute(self):
        # Run test
        pass

    def finish(self):
        # Do cleanup
        pass

test = Test()
try:
    test.execute()
finally:
    test.finish()

But there is one little note… What if you want to catch exceptions?


A piece of cake! Even a three-month-old child can tap it out with her foot:

try:
    raise Exception
except Exception:
    print 'Internal error'
finally:
    print 'Bye!'

But her mom, who remembers the times of Python 2.4, would wish to quietly correct it to

try:
    try:
        raise Exception
    except Exception:
        print 'Internal error'
finally:
    print 'Bye!'

Because Python 2.4 doesn't support “except” and “finally” clauses in the same “try” statement.

But she is a good mom and lets her child live in a world with fewer lines of code.

Another thing worth keeping in mind is the exception root class. Python 2.4 has the Exception class as the root one. From Python 2.5 the root class is BaseException, and the SystemExit, KeyboardInterrupt and GeneratorExit classes don't inherit from the Exception class anymore. So what an “except Exception” statement catches depends on the Python version.

The problem can be resolved by special handling of the unwanted exceptions. In my case, every exception except those related to program exit should be logged.

import sys

try:
    raise Exception
except (KeyboardInterrupt, SystemExit):
    # Silent exit on user break
    raise
except Exception:
    e = sys.exc_info()[1]
    print e

Hey, adepts of Universality and Portability bordering on Nerdiness: if you're going to write something Python 2.4-3.x compatible, remember the power of sys.exc_info(), because reasons.

Parsing of command line arguments

This task wouldn't have been an issue if the optparse library hadn't been deprecated since Python 2.7. Of course, there is some stability in the world, and the name of this stability is getopt. The one point worth noting here is that the presence of mandatory positional parameters should be checked by the script itself; the library doesn't provide this option.
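A sketch of that check with getopt (the option and argument names are invented):

```python
import getopt

def parse_args(argv):
    # getopt handles the options...
    opts, args = getopt.gnu_getopt(argv, 'd:', ['duration='])
    # ...but checking the mandatory positional argument is on us
    if not args:
        raise SystemExit('Usage: runner [-d SEC] TEST_IMAGE')
    return dict(opts), args

opts, args = parse_args(['-d', '60', 'image.qcow2'])
print(opts)   # {'-d': '60'}
print(args)   # ['image.qcow2']
```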

Oops, it looks like my keyboard is running out of ink. I will update this post as soon as I refill it and find something exciting on this topic.