Fuzzing in Go
Fuzzing is a testing technique with randomized inputs that is used to find problematic edge cases or security problems in code that accepts user input. Go package developers can use Dmitry Vyukov's popular go-fuzz tool for fuzz testing their code; it has found hundreds of obscure bugs in the Go standard library as well as in third-party packages. However, this tool is not built in, and is not as simple to use as it could be; to address this, Go team member Katie Hockman recently published a draft design that proposes adding fuzz testing as a first-class feature of the standard go test command.
Using random test inputs to find bugs has a history that goes back to the days of punch cards. Author and long-time programmer Gerald Weinberg recollects:
We didn't call it fuzzing back in the 1950s, but it was our standard practice to test programs by inputting decks of punch cards taken from the trash. We also used decks of random number punch cards. We weren't networked in those days, so we weren't much worried about security, but our random/trash decks often turned up undesirable behavior.
More recently, fuzz testing has been used to find countless bugs, and some notable security issues, in software from Bash and libjpeg to the Linux kernel, using tools such as american fuzzy lop (AFL) and Vyukov's Go-based syzkaller tool.
The basic idea of fuzz testing is to generate random inputs for a function to see if it crashes or raises an exception that is not part of the function's API. However, using a naive method to generate random inputs is extremely time-consuming, and doesn't find edge cases efficiently. That is why most modern fuzzing tools use "coverage-guided fuzzing" to drive the testing and determine whether newly-generated inputs are executing new code paths. Vyukov co-authored a proposal which has a succinct description of how this technique works:
    start with some (potentially empty) corpus of inputs
    for {
        choose a random input from the corpus
        mutate the input
        execute the mutated input and collect code coverage
        if the input gives new coverage, add it to the corpus
    }
Collecting code coverage data and detecting when an input "gives new coverage" is not trivial; it requires a tool to instrument code with special calls to a coverage recorder. When the instrumented code runs, the fuzzing framework compares code coverage from previous test inputs with coverage from a new input, and if different code blocks have been executed, it adds that new input to the corpus. Obviously this glosses over a lot of details, such as how the input is mutated, how exactly the coverage instrumentation works, and so on. But the basic technique is effective: AFL has used it on many C and C++ programs, and has a section on its web page listing the huge number of bugs found and fixed.
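To make the loop described above concrete, here is a minimal sketch in Go. Everything in it is an assumption made for illustration: the target function, the hand-written coverage map (a real fuzzer gets coverage from compiler or source instrumentation instead), and the very simple mutate function. It is not how AFL or go-fuzz are implemented; it only shows the corpus growing when an input reaches a block that no earlier input reached.

```go
package main

import (
	"fmt"
	"math/rand"
)

// target is a hypothetical function under test. It records which of its
// blocks ran by setting entries in cov; this hand-written bookkeeping
// stands in for real coverage instrumentation.
func target(data []byte, cov map[int]bool) {
	cov[0] = true
	if len(data) > 0 && data[0] == 'F' {
		cov[1] = true
		if len(data) > 1 && data[1] == 'U' {
			cov[2] = true
			if len(data) > 2 && data[2] == 'Z' {
				cov[3] = true
			}
		}
	}
}

// mutate returns a copy of in with one random byte replaced or appended.
func mutate(r *rand.Rand, in []byte) []byte {
	out := append([]byte(nil), in...)
	if len(out) == 0 || r.Intn(4) == 0 {
		return append(out, byte(r.Intn(256)))
	}
	out[r.Intn(len(out))] = byte(r.Intn(256))
	return out
}

// fuzz runs the coverage-guided loop for a fixed number of iterations
// and returns the final corpus.
func fuzz(r *rand.Rand, iterations int) [][]byte {
	corpus := [][]byte{{}} // start with a single empty input
	seen := map[int]bool{} // blocks covered by any input so far
	target(corpus[0], seen)

	for i := 0; i < iterations; i++ {
		input := mutate(r, corpus[r.Intn(len(corpus))])
		cov := map[int]bool{}
		target(input, cov)

		// Keep the input only if it reached a block not seen before.
		isNew := false
		for block := range cov {
			if !seen[block] {
				seen[block] = true
				isNew = true
			}
		}
		if isNew {
			corpus = append(corpus, input)
		}
	}
	return corpus
}

// runDemo uses a fixed seed so that repeated runs behave the same way.
func runDemo() [][]byte {
	return fuzz(rand.New(rand.NewSource(1)), 200000)
}

func main() {
	for _, in := range runDemo() {
		fmt.Printf("%q\n", in)
	}
}
```

Because each kept input becomes a starting point for later mutations, the loop can reach the innermost branch one byte at a time, rather than having to guess all three bytes at once as purely random generation would.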
The go-fuzz tool
AFL is an excellent tool, but it only works for programs written in C, C++, or Objective C, which need to be compiled with GCC or Clang. Vyukov's go-fuzz tool operates in a similar way to AFL, but is written specifically for Go. In order to add coverage recording to a Go program, a developer first runs the go-fuzz-build command (instead of go build), which uses the built-in ast package to add instrumentation to each block in the source code, and sends the result through the regular Go compiler. Once the instrumented binary has been built, the go-fuzz command runs it over and over on multiple CPU cores with randomly mutating inputs, recording any crashes (along with their stack traces and the inputs that caused them) as it goes.
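As a rough illustration of what a developer writes for go-fuzz, the sketch below shows a fuzz function with the Fuzz(data []byte) int signature that go-fuzz looks for. The choice of strconv.Atoi as the code under test and the round-trip check are assumptions made for this example. It is written as package main so that it can be run directly; in a real setup the Fuzz function would live in the package being tested and be built with go-fuzz-build.

```go
package main

import (
	"fmt"
	"strconv"
)

// Fuzz has the signature go-fuzz expects. The return value is a hint to
// the fuzzer: 1 asks it to give the input higher priority, -1 says the
// input must not be added to the corpus, and 0 is neutral.
func Fuzz(data []byte) int {
	n, err := strconv.Atoi(string(data))
	if err != nil {
		return 0 // not a valid integer; nothing more to check
	}

	// Round-trip check: formatting and re-parsing must give the same value.
	// A panic here is what go-fuzz would record as a crasher.
	m, err := strconv.Atoi(strconv.Itoa(n))
	if err != nil || m != n {
		panic("round trip failed")
	}
	return 1
}

func main() {
	fmt.Println(Fuzz([]byte("42")), Fuzz([]byte("not a number"))) // prints "1 0"
}
```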
Damian Gryski has written a tutorial showing how to use the go-fuzz tool in more detail. The go-fuzz README lists the many bugs it has found; however, there are almost certainly many more in third-party packages that have not been listed there. I personally used go-fuzz on GoAWK and it found several "crashers".
Journey to first class
Go has a built-in command, go test, that automatically finds and runs a project's tests (and, optionally, benchmarks). Fuzzing is a type of testing, but without built-in tool support it is somewhat cumbersome to set up. Back in February 2017, an issue was filed on the Go GitHub repository on behalf of Vyukov and Konstantin Serebryany, proposing that the go tool "support fuzzing natively, just like it does tests and benchmarks and race detection today". The issue notes that "go-fuzz exists but it's not as easy as writing tests and benchmarks and running go test -race". This issue has garnered a huge amount of support and many comments.
At some point Vyukov and others added a motivation document as well as the API and tooling proposal for what such an integration would look like. Go tech lead Russ Cox pressed for a prototype version of "exactly what you want the new go test fuzz mode to be". In January 2019 "thepudds" shared just that: fzgo, a separate tool that implements most of the original proposal. This was well-received at the time, but does not seem to have turned into anything official.
More recently, however, the Go team has picked this idea back up, with Hockman writing the recent draft design for first-class fuzzing. The goal is similar: to make it easy to run fuzz tests with the standard go test tool. The proposed API is slightly more complex, though, to allow seeding the initial corpus programmatically and to support input types other than byte strings ("slice of byte" or []byte in Go).
Currently, developers can write test functions with the signature TestFoo(t *testing.T) in a *_test.go source file, and go test will automatically run those functions as unit tests. The existing testing.T type is passed to test functions to control the test and record failures. The new draft design adds the ability to write FuzzFoo(f *testing.F) fuzz tests in a similar way and then run them using a simple command like go test -fuzz. The proposed testing.F type is used to add inputs to the seed corpus and implement the fuzz test itself (using a nested anonymous function). Here is an example that might be part of calc_test.go for a calculator library:
    func FuzzEval(f *testing.F) {
        // Seed the initial corpus
        f.Add("1+2")
        f.Add("1+2*3")
        f.Add("(1+2)*3")

        // Run the fuzz test
        f.Fuzz(func(t *testing.T, expr string) {
            t.Parallel()      // allow parallel execution
            _, _ = Eval(expr) // function under test (discard result and error)
        })
    }
Just these few lines of code form a basic fuzz test that will run the calculator library's Eval() function with randomized inputs and record any crashes ("panics" in Go terminology). Some examples of panics are out-of-bounds array access, dereferencing a nil pointer, or division by zero. A more involved fuzz test might compare the result against another library (called calclib in this example):
        ...
        // Run the fuzz test
        f.Fuzz(func(t *testing.T, expr string) {
            t.Parallel()
            r1, err := Eval(expr)
            if err != nil {
                t.Skip() // got parse error, skip rest of test
            }

            // Compare result against calclib
            r2, err := calclib.Eval(expr)
            if err != nil {
                t.Errorf("Eval succeeded but calclib had error: %v", err)
            }
            if r1 != r2 {
                t.Errorf("Eval got %d, calclib got %d", r1, r2)
            }
        })
    }
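The kind of crash that such a fuzz test is meant to catch can be shown with a small standalone sketch. The average function and its deliberate bug are hypothetical, as is the panics helper; the helper only mimics, in a very simplified way, how a test harness can notice that the function under test panicked. It does not reflect how the proposed engine is implemented.

```go
package main

import "fmt"

// average is a hypothetical function under test with a deliberate bug:
// an empty slice leads to an integer division by zero.
func average(nums []int) int {
	sum := 0
	for _, n := range nums {
		sum += n
	}
	return sum / len(nums) // panics when nums is empty
}

// panics runs fn and reports whether it panicked, using recover to
// catch the panic instead of letting it end the program.
func panics(fn func()) (crashed bool) {
	defer func() {
		if r := recover(); r != nil {
			crashed = true
		}
	}()
	fn()
	return false
}

func main() {
	fmt.Println(panics(func() { average([]int{2, 4}) })) // prints "false"
	fmt.Println(panics(func() { average(nil) }))         // prints "true"
}
```

A hand-written unit test could easily miss the empty-slice case; a fuzzer that mutates inputs has a good chance of trying it early.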
In addition to describing fuzzing functions and the new testing.F type, Hockman's draft design proposes that a new coverage-guided fuzzing engine be built that "will be responsible for using compiler instrumentation to understand coverage information, generating test arguments with a mutator, and maintaining the corpus". Hockman makes it clear that this would be a new implementation, but would draw heavily from existing work (go-fuzz and fzgo). The mutator would generate new randomized inputs (the "generated corpus") from existing inputs, and would work automatically for built-in types or structs composed of built-in types. Other types would also be supported if they implemented the existing BinaryUnmarshaler or TextUnmarshaler interfaces.
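For readers unfamiliar with those interfaces, here is a minimal sketch of a type that implements encoding.TextUnmarshaler from the standard library. The Point type and its "x,y" text format are assumptions made for this example. How the proposed engine would actually feed mutated bytes to such a type is not specified here; the sketch only shows the interface that a custom type would need to satisfy.

```go
package main

import (
	"encoding"
	"fmt"
	"strconv"
	"strings"
)

// Point is a hypothetical type that is not composed of a single
// built-in value, so it needs its own decoding logic.
type Point struct{ X, Y int }

// Compile-time check that *Point implements encoding.TextUnmarshaler.
var _ encoding.TextUnmarshaler = (*Point)(nil)

// UnmarshalText parses text of the form "x,y" into p.
func (p *Point) UnmarshalText(text []byte) error {
	parts := strings.Split(string(text), ",")
	if len(parts) != 2 {
		return fmt.Errorf("point: want \"x,y\", got %q", text)
	}
	x, err := strconv.Atoi(parts[0])
	if err != nil {
		return err
	}
	y, err := strconv.Atoi(parts[1])
	if err != nil {
		return err
	}
	p.X, p.Y = x, y
	return nil
}

func main() {
	var p Point
	err := p.UnmarshalText([]byte("3,4"))
	fmt.Println(p, err)
}
```

Returning an error for malformed text, rather than panicking, matters in this setting: a mutator will produce plenty of invalid encodings, and those should be rejected cleanly rather than reported as crashes.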
By default, the engine would run fuzz tests indefinitely, stopping a particular test run when the first crash is found. Users will be able to tell it to run for a certain duration with the -fuzztime command line flag (for use in continuous integration scripts), and tell it to keep running after crashes with the -keepfuzzing flag. Crash reports will be written to files in a testdata directory, and will contain the inputs that caused the crash as well as the error message or stack trace.
Discussion and what's next
As with the recent draft design on filesystems and file embedding, official discussion of this design took place in a Reddit thread; overall, the feedback was positive.
There was some discussion about the testing.F interface. David Crawshaw suggested that it should implement the existing testing.TB interface for consistency with testing.T and testing.B (used for benchmarking); Hockman agreed, updating the design to reflect that. Based on a suggestion by "etherealflaim", Hockman also updated the design to avoid reusing testing.F in both the top level and the fuzz function. There was also some bikeshedding over whether the command should be spelled go test -fuzz or go fuzz; etherealflaim suggested that reusing go test would be a bad idea because it "has history and lots of folks have configured timeouts for it and such".
Jeremy Bowers recommended that the mutation engine should be pluggable:
I think the fuzz engine needs to be pluggable. Certainly a default one can be shipped, and pluggability can even be pushed to a "version 2", but I think it ought to be in the plan. Fuzzing can be one-size-fits-most but there's always going to be the need for more specialized stuff.
Hockman, however, responded that pluggability is not required in order to add the feature, but might be "considered later in the design phase".
The draft design states up front that "the goal of circulating this draft design is to collect feedback to shape an intended eventual proposal", so it's hard to say exactly what the next steps will be and when they will happen. However, it is good to see some official energy being put behind this from the Go team. Based on Cox's feedback on Vyukov's original proposal, my guess is that we'll see a prototype of the updated proposal being developed on a branch, or in a separate tool that developers can run, similar to fzgo.
Discussion on the Reddit thread is ongoing, so it seems unlikely that a formal proposal and an implementation for a feature this large would be ready when the Go 1.16 release freeze hits in November 2020. Inclusion in Go 1.17, due out in August 2021, would be more likely.
Index entries for this article
GuestArticles: Hoyt, Ben
Posted Sep 4, 2020 10:28 UTC (Fri)
by HelloWorld (guest, #56129)
[Link] (34 responses)
Posted Sep 4, 2020 14:56 UTC (Fri)
by NAR (subscriber, #1313)
[Link] (4 responses)
Posted Sep 4, 2020 23:43 UTC (Fri)
by HelloWorld (guest, #56129)
[Link] (3 responses)
Posted Sep 5, 2020 10:43 UTC (Sat)
by NAR (subscriber, #1313)
[Link] (2 responses)
Posted Sep 5, 2020 10:52 UTC (Sat)
by HelloWorld (guest, #56129)
[Link] (1 responses)
> The basic idea of fuzz testing is to generate random inputs for a function to see if it crashes or raises an exception that is not part of the function's API.
Posted Sep 11, 2020 15:07 UTC (Fri)
by nix (subscriber, #2304)
[Link]
Posted Sep 4, 2020 23:31 UTC (Fri)
by mpr22 (subscriber, #60784)
[Link] (1 responses)
It may well have been practically impossible for him, of course, but that seems like it could say more about his ability to teach than about such students' ability to be taught.
Posted Sep 5, 2020 10:40 UTC (Sat)
by HelloWorld (guest, #56129)
[Link]
Posted Sep 5, 2020 0:03 UTC (Sat)
by Cyberax (✭ supporter ✭, #52523)
[Link] (17 responses)
And it shows in his work. In particular, countless students were mutilated by his "structured programming" orthodoxy.
Posted Sep 5, 2020 10:49 UTC (Sat)
by HelloWorld (guest, #56129)
[Link] (16 responses)
Dijkstra was right about one thing: we can't ever hope to get programs right unless we employ some kind of formal method for proving the absence of bugs, or at least certain kinds of bugs. Type systems are crucial here due to the Curry-Howard isomorphism.
Posted Sep 5, 2020 18:01 UTC (Sat)
by Cyberax (✭ supporter ✭, #52523)
[Link] (15 responses)
Here's an analogy, I studied the circuit theory and quantum electrodynamics in university. That doesn't mean that I'm qualified giving advice to an electrician on how to use their tools.
> You also completely fail to explain how structured programming mutilated anybody, so you're basically just trolling rather than making an actual point.
Posted Sep 5, 2020 23:09 UTC (Sat)
by HelloWorld (guest, #56129)
[Link] (14 responses)
I also disagree that single exit and banning break/continue leads to bad code. A loop with a break statement looks something like this:
Posted Sep 6, 2020 0:19 UTC (Sun)
by Cyberax (✭ supporter ✭, #52523)
[Link] (13 responses)
See this thread: https://lkml.org/lkml/2003/1/12/156
Posted Sep 6, 2020 0:38 UTC (Sun)
by HelloWorld (guest, #56129)
[Link] (11 responses)
Posted Sep 6, 2020 0:47 UTC (Sun)
by Cyberax (✭ supporter ✭, #52523)
[Link] (7 responses)
Posted Sep 6, 2020 1:16 UTC (Sun)
by HelloWorld (guest, #56129)
[Link] (6 responses)
Posted Sep 6, 2020 1:40 UTC (Sun)
by Cyberax (✭ supporter ✭, #52523)
[Link] (5 responses)
Of course if you remove ALL structure, the code becomes bad. 'for' loops are there for a reason.
However, there's an easy test to tell that early return/break/continue is well-placed. It's if it reduces (or keeps the same) the overall number of lines of code.
Posted Sep 6, 2020 4:03 UTC (Sun)
by HelloWorld (guest, #56129)
[Link] (4 responses)
I would, but it's just too hard to read…
Anyway, I'll point out that
Frankly these discussions with you are just tiresome, because when it comes to programming, you're completely stuck in a 1980s-style imperative mindset. There's nothing interesting to learn from that.
Posted Sep 6, 2020 4:15 UTC (Sun)
by Cyberax (✭ supporter ✭, #52523)
[Link] (3 responses)
Again, you seem to not understand that the theory says that every loop can be rewritten as a loop with invariant in its condition.
In practice a lot of invariants are too unwieldy to write in one condition and benefit from being split into multiple statements (with break/continue to help). Often because you need to introduce additional variables. All "structured" alternatives result in additional levels of indentation in this case.
> 4. you keep harping on this structured vs. non-structured programming thing instead of addressing the much more important point that in order to avoid bugs we need to employ formal methods (which is another thing that Dijkstra was right about)
Posted Sep 6, 2020 19:23 UTC (Sun)
by HelloWorld (guest, #56129)
[Link] (2 responses)
Posted Sep 6, 2020 19:26 UTC (Sun)
by Cyberax (✭ supporter ✭, #52523)
[Link] (1 responses)
Posted Sep 6, 2020 23:06 UTC (Sun)
by HelloWorld (guest, #56129)
[Link]
This is actually kinda funny, because it shows that what Dijkstra said about BASIC also applies to C: you've been mentally mutilated by C enough to not be able to tell the condition from the body of the loop any more. I'm sorry that happened to you (-:
Posted Sep 6, 2020 1:04 UTC (Sun)
by HelloWorld (guest, #56129)
[Link] (2 responses)
Posted Sep 6, 2020 1:11 UTC (Sun)
by mpr22 (subscriber, #60784)
[Link] (1 responses)
I have, however, encountered code with deeply nested ifs, and I concur with the paper Cyberax cited that reading it is a horrible experience.
(It's still more pleasant than trying to read JSP, though.)
Posted Sep 6, 2020 4:41 UTC (Sun)
by flussence (guest, #85566)
[Link]
Posted Sep 6, 2020 0:41 UTC (Sun)
by HelloWorld (guest, #56129)
[Link]
Posted Sep 5, 2020 16:53 UTC (Sat)
by bellminator (subscriber, #103702)
[Link] (8 responses)
Posted Sep 5, 2020 22:43 UTC (Sat)
by HelloWorld (guest, #56129)
[Link] (7 responses)
> I'm not sure what you are trying to accomplish by telling people that using certain programming languages mentally mutilates them, other than scaring them away from computer science/programming all together.
That's like saying that I mustn't say that I hate celery because that might scare somebody away from cooking.
Posted Sep 5, 2020 23:27 UTC (Sat)
by mpr22 (subscriber, #60784)
[Link] (6 responses)
Describing it in terms of "mental mutilation" and "impossibility" is neither civilized nor accurate, and thus fails the "rude or wrong, pick at most one and ideally neither" test.
Saying you hate celery is not an equivalent case, because (a) hating celery is a de gustibus matter anyway and (b) just saying you hate celery isn't going to scare people off cooking unless you go off on an unprompted rant about how celery is the most disgusting thing on earth and anyone who cooks with it is clearly so deranged that they cannot possibly learn to prepare tasty food.
Posted Sep 6, 2020 0:13 UTC (Sun)
by HelloWorld (guest, #56129)
[Link] (2 responses)
Posted Sep 6, 2020 0:49 UTC (Sun)
by mpr22 (subscriber, #60784)
[Link] (1 responses)
I've eaten their cooking.
It was delicious.
Posted Sep 6, 2020 2:07 UTC (Sun)
by HelloWorld (guest, #56129)
[Link]
Posted Sep 6, 2020 0:30 UTC (Sun)
by HelloWorld (guest, #56129)
[Link] (2 responses)
Posted Sep 6, 2020 4:50 UTC (Sun)
by flussence (guest, #85566)
[Link] (1 responses)
It is my opinion that you should be at least smart enough to read the room before posting here, and smart enough to show yourself out when people start laughing at your bad takes instead of trashing the place in a slur-slinging tantrum.
Do you have an alternate venue already prepared to continue spewing your feelings for when you inevitably lose access to this one? Work on that if you're too fragile to work on yourself.
Posted Sep 7, 2020 0:47 UTC (Mon)
by corbet (editor, #1)
[Link]
It's not a coincidence that Fuzzing is added to Go
His theoretical achievements are great, nobody argues about it. It's his practical skills that were clearly lacking.
Ask Linus Torvalds about it, he's way more eloquent than me. But in practice enforcing the single exit from functions and banning break/continue in loops (since they break loop invariants) lead to bad code.
I won't ask Torvalds, because you're the one who made that point, so the burden of proof is on you.
while (foo)
{
    # several statements
    if (bar)
        break;
    # more statements
}
But this is necessary only because of the useless distinction between statements and expressions that many languages still have. Blocks are expressions in e. g. Ruby, so you can write this:
while foo and
  begin
    # several statements
    ! bar
  end
do
  # more statements
end
“break” etc. are completely unnecessary clutter, and the fact that Go nevertheless includes them only shows what a poorly designed language it is.
There actually is: https://www.cs.umd.edu/~ben/papers/Miara1983Program.pdf - deep indentation (be it purely visual or a result of deep nesting) hampers the comprehension.
I've seen other studies, but I'm too lazy to find them again. Deep nesting is definitely bad, and "structured programming" often requires it, while break/continue allow to un-nest some of the code.
I'm personally fond of this style:
for (i : someCollection) {
    if (i.frobBarBaz == SomeConstant) {
        // The flag frobBarBaz disqualifies the object
        continue;
    }
    if (!somethingElse(i)) {
        // This is not the object you're looking for.
        continue;
    }
    ...
}
Sure, it can be rewritten as a one long condition, split into predicates for map/filter, but often at the expense of readability.
Deep nesting is definitely bad
Let's take some code that's indented twice:
while (foo)
    while (bar)
        baz();
And rewrite it with one level of indentation:
l1:
    if (! foo)
    goto l4;
l2:
    if (! bar)
    goto l3;
    baz();
    goto l2;
l3:
    goto l1;
l4:
Yeah, you're right. That is so much more readable, I wonder why I never noticed…
1. the paper you've presented is meaningless. It deals with the question how deep a single level of indentation should be (2 to 6 spaces). That says nothing about the benefits (or lack thereof) of structured programming
2. you haven't addressed my point that with structured programming it's easier to understand which conditions must apply for a certain piece of code to execute, because every condition corresponds to one level of indentation
3. you haven't addressed my point that it's easier to get resource cleanup right with structured programming (no need for any "goto fail" nonsense)
4. you keep harping on this structured vs. non-structured programming thing instead of addressing the much more important point that in order to avoid bugs we need to employ formal methods (which is another thing that Dijkstra was right about)
It doesn't matter if you use break/continue for automated formal methods, they can just reconstruct the formal invariant anyway. And manually applied formal methods basically failed for anything non-trivial.
All "structured" alternatives result in additional levels of indentation in this case.
This is purely a matter of how you choose to indent your code. It works just fine with only one level of indentation:
while foo and begin
  # several statements
  not bar
end do
  # more statements
end
But anyway, you've clearly made up your mind about this, and fortunately I don't need to convince you.
In fact, it's the other way around: it's the *lack* of indentation caused by early returns that obscures the code.
Somebody came up with the idea that one should return early from functions when encountering errors:
do_stuff();
if (foo)
    return bar;
do_more_stuff();
But this is downright retarded. The whole idea about indenting the contents of blocks guarded by if and while is that it allows you to tell at a glance that a piece of code is not always executed but only when some condition is true. Early returns break this: you need to read the code and notice the return statement in the if (foo) block to know that do_more_stuff() will not be executed unconditionally but only when foo is false. And even worse, it's very easy to mess up resource cleanup when coding in this style. People then came up with even more retarded ideas like “goto fail” to “fix” that, and when the Go developers noticed that people were messing that up too, they added *yet more* crap to the language, i. e. the defer statement. And then they give talks about how “Less is exponentially more” 🤦
Hi,
Please take into consideration that this is just hurtful to anyone who programs/programmed in Go or BASIC. I'm not sure what you are trying to accomplish by telling people that using certain programming languages mentally mutilates them, other than scaring them away from computer science/programming all together.
So what? When Darwin discovered that humans are descended from apes, that was hurtful to plenty of people, and in fact it still is. That doesn't mean he shouldn't have published his discovery. This idea that one mustn't utter opinions that might conceivably hurt someone's feelings is a disease that needs to go away sooner rather than later. Or, to put it more succinctly: https://youtu.be/PAqxWa9Rbe0
(The fact that that movie was banned from a major streaming service because some snowflake felt offended adds a nice ironic touch).
This seems like a good place for this thread to stop, thanks.
Enough