A long time ago in a galaxy not so far away K.V. Hanford wrote the article about the usage of random inputs for testing purpose. Several years after Professor Barton Miller, University of Wisconsin, gave the technique its name. By the words of Professor himself term “fuzz” was inspired by noise on a line connected to the tested modem. But I'm sure he was thinking about kittens in that moment. Everyone who has one knows their incredible ability to make a mess in the places that look airtight and impregnable.
Okay, we have the name. Time to get the meaning.
Fuzzing is a method of automated negative testing by providing to a system under test invalid random data. It's a kind of “monkey testing” but can be more controllable, reproducible and automatic. Like this.
What's cool about fuzz testing? It's negative and automated. Whereas positive testing exploits a rather limited set of inputs, negative testing covers almost infinite amount of inputs that are not used by its positive fellow. Manual testers will whisk cold sweat with relief, if the task of negative testing and generation of appropriate test inputs is executed automatically. Also issues discovered by fuzz testing are critical since they manifest themselves as crashes or assertions.
What is not so cool? All issues discovered by fuzz testing manifest themselves as crashes or assertions. Other bugs more modest in their behavior cannot be caught by a fuzzer. Another sad point is that fuzzing effectiveness is a double-edged sword.The simpler fuzzer is used the lower implementation and deployment costs it requires and more unexpected inputs it produces. Whereas more complex fuzzer can discover more sophisticated issues but takes much more resources for implementation.
But I will continue with an assumption that fuzz testing is cool.
To test something we need this something, input data, a way to feed the something input data and an indicator that the something is safe and sound or dead and silent after the feeding process.
After mapping this to fuzzing technique we got a system under test (SUT), a fuzz generator, a delivery mechanism and a monitoring system.
The delivery mechanism sets a test environment necessary to make the SUT able to consume the test data and sends the data to the SUT.
The monitoring system holds a hand on pulse of the SUT and decides if the test is failed or passed. There can be different criteria for test failures depending of SUT and tests architecture, but the most common one is a SUT crash. Besides the test status the monitoring system can provide additional information about a SUT state during a test, e.g. lines of code were executed during a test. This feedback is valuable for effective data generation.
Oh, yes, data generation… So, the last but not least the fuzz generator!
The fuzz generator is the generator of invalid input data. Surprise! To be more specific I have to refer to the classification. Fuzz generators are classified on two bases: the source of test data and “intelligence”.
The first basis in its turn produces two types of fuzz generators — mutative and generative. Mutative generators take valid test data pregenerated by some external helper and corrupt it by injecting invalid parts. Generative ones on the contrary produce all the test data from scratch.
So first ones are much easier to implement as it's not necessary to generate an entire structure on a SUT input. But in the same time variability of test data and, as result, code coverage are limited but the structure of pregenerated valid inputs. Generators of the second type don't have this problem, but they are harder to implement and more sensitive to issues and incompleteness of a SUT model used to generate fuzzed data.
While the first classification basis provides the clear type discrimination and terminology, the second one gave birth to plenty of variations with their own names and grouping rules. Examples can be found among the links in the end of the post.
The goal of that classifications is to group fuzz techniques and sort them by ascending their effectiveness. Test effectiveness can be measured by code coverage and number of tests verifying the same code routine. First one should be high, the second one low.
Stop! What's about this equivocal intelligence in no less equivocal quotes in the paragraph above?
Don't panic! It's a shortcut for awareness of structure of SUT inputs and the SUT itself and ability to use this knowledge for test data generation. “Intelligence” is the primary way to increase the fuzzing effectiveness.
So at one end of our abstract scale of fuzzing effectiveness we have a “monkey” fuzzer producing entirely random data and having no idea about the target. On the other end it's a fuzzer having the full SUT specification under its hand and knowing how to use it to generate valid data and the optimal set of most penetrating test cases. I think the second one will lead the Rise of the Machines in its time.
Somewhere in between grey-box fuzzers of different shades are placed. They can inject invalid data in known places of inputs or use common heuristics suitable for the current kind of SUTs or can generate their own heuristics based on the feedback provided by the monitoring system.
An ascending fuzzing effectiveness increases a complexity of implementation and not only it. Since an effectiveness depends on a SUT specification an effective fuzzer will be bound to an architecture of its SUT.
And for the dessert some links:
Great introductory article
The article with references to the classic work “Fuzzing: Brute Force Vulnerability Discovery”
Introduction to fuzz testing with an empirical flavor