[fuzzing] MoKB take?
Disco Jonny
discojonny at gmail.com
Thu Nov 9 08:57:29 CST 2006
Hi Charlie,
See inline :)
On 08/11/06, Charlie Miller <cmiller at securityevaluators.com> wrote:
> Comments to comments below
>
> Disco Jonny wrote:
> > Hi,
> >
> > sorry for replying to two posts (charlie and gadi) in one post. and
> > for diverting from the original question.
> >
> >
> > if you mean statistical then yes. if you mean static as in look at
> > the binary, then no, i doubt it would be feasible, well it would be
> > but there are much easier ways to find this bug.
>
> By statistical, I assume you mean looking at code coverage metrics. If
> this is what you are saying, then I think it is harder than you let on.
> This is really more of a two-factor bug, it needs a comma at beginning
> and a long string. It is certainly possible you could have sent in a
> comma at the beginning at some point and gotten code coverage of the
> comma special case (statement and branch coverage at least) but not
> found the bug.
Not really; "code coverage metrics" is a very vague term. What
do your metrics include? :) I am not really doing code coverage in
the classical sense, at least going by the code coverage
tools that are used here, which are all pretty graphs and percentages.
I use 3D visualisation to do this [I won't go into the technical
details]. The eyes are much better at assimilating vast amounts of
data. You can then use 3D collision maps to extend this further, and
automatically calculate some of your equivalences.
Think 3D Venn diagrams [with colours, shapes and light], and you have
just started a journey from which you might not return :)
> Thanks for sharing this. But, again, if I'm understanding, this is a
> heuristic you developed in response to this particular bug. What
> happens when the next odd thing comes along for which there is no
> heuristic yet? (dj will say use reactive techniques - and in a sense I
> agree with him, but see my next comment)
This is exactly my point. The software leverages the knowledge of the
user, not the creator. It is kinda like a word processor vs. a C
compiler.
>
> dj, it is clear your method is better than just black box fuzzing, and I
> used to believe fully in it. Recently, my faith has been tested,
> though. Consider the example of the sendmail crackaddr bug from a few
> years back
> (http://www.iss.net/issEn/delivery/xforce/alertdetail.jsp?oid=21950 and
> http://www.securityfocus.com/archive/1/313757/2003-03-01/2003-03-07/0)
> Basically, this bug boils down to the following: In the "from" string,
> every occurrence of both a "<" and a ">", in that order, allows for the
> possible overflow of a static buffer by one additional byte.
> Assuming
> you are only detecting SIGSEGV's, it will probably take quite a few of
> these to actually detect something has happened.
[I have not looked at this bug, and I am making some wild assumptions]
Just looking for SIGSEGVs is not all that helpful. There are many
exploitable bugs whose probability of causing a crash is so close to 0
that you will never find them if you only look for crashes.
As I hope most people here would know, it is very rare for a bug to
manifest itself and only alter the very thing it breaks... Normally
you will feel the vibrations before you can see the train; the thing
is to look out for these vibrations. (Yeah, filtering them out from the
noise is not the easiest thing in the world, but once it has been done,
there is no reason to redo it.)
So we create a 'glass house' around the functions and code we are
testing. (using BVA, EP, Run time analysis, visualisation, dependency
mapping, etc.)
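
As a rough illustration of the 'glass house' idea (this is not the
tooling described above, and it is not sendmail's code), the Python
sketch below puts guard bytes behind a toy buffer and checks them after
each call. The names `GUARD`, `toy_copy` and `guard_intact` are
hypothetical, and the "one extra byte per <> pair" behaviour is a
deliberately simplified stand-in for this class of bug. Nothing crashes
here, yet the broken guard still shows that something went wrong.

```python
GUARD = b"\xAA" * 8  # hypothetical canary pattern placed after the buffer


def toy_copy(src: bytes, size: int) -> bytearray:
    """Deliberately buggy toy copy: every '<' ... '>' pair raises the
    write limit by one byte, so enough pairs spill into the guard."""
    arena = bytearray(size) + bytearray(GUARD)
    limit = size
    pos = 0
    angle_open = False
    for byte in src:
        if byte == ord("<"):
            angle_open = True
        elif byte == ord(">") and angle_open:
            angle_open = False
            limit += 1  # the bug: the limit creeps past the real buffer size
        # The second condition only keeps Python itself from raising;
        # a C program would simply keep writing.
        if pos < limit and pos < len(arena):
            arena[pos] = byte
            pos += 1
    return arena


def guard_intact(arena: bytearray, size: int) -> bool:
    """The 'glass wall': True while the canary behind the buffer is untouched."""
    return bytes(arena[size:size + len(GUARD)]) == GUARD


if __name__ == "__main__":
    print(guard_intact(toy_copy(b"a" * 100, 16), 16))   # long but plain input
    print(guard_intact(toy_copy(b"<>" * 20, 16), 16))   # many <> pairs
```

The point of the sketch is that the check fires on the first stray byte,
rather than waiting for enough damage to cause a SIGSEGV.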
> It seems hard to
> believe this would have been found randomly or with heuristics (of that
> day, it is now probably common to add a bunch of <><><><>'s)
and most probably, those heuristics only exist because of this bug.
> The
> function this vulnerability takes place in is very complicated, with
> something like 2^100 execution paths through it - not including loops.
Of these 1,267,650,600,228,229,401,496,703,205,376 paths through the
code, how many are actually genuine paths? And how many are only
exposed if a previous branch has been taken? Dependency mapping is
needed; from this we can get the boundary values.
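
To make the arithmetic concrete: 2^100 is the count if all 100 branches
were independent, and dependencies between branches cut that down
sharply. The Python sketch below is only a toy model, assuming a
hypothetical `requires` map in which a branch can only be taken if
another branch was taken first; it brute-forces a small case rather
than analysing real code.

```python
from itertools import product


def feasible_paths(n_branches: int, requires: dict) -> int:
    """Count branch-outcome combinations that respect the dependencies.

    requires maps a branch index to the index of the branch that must
    have been taken for it to be reachable (a hypothetical, simplified
    dependency map).
    """
    count = 0
    for outcome in product((False, True), repeat=n_branches):
        if all(outcome[dep] for b, dep in requires.items() if outcome[b]):
            count += 1
    return count


if __name__ == "__main__":
    # The figure quoted above really is 2^100.
    print(2 ** 100 == 1267650600228229401496703205376)
    # 10 independent branches versus 10 branches in a dependency chain.
    print(feasible_paths(10, {}))
    print(feasible_paths(10, {i: i - 1 for i in range(1, 10)}))
```

In the chained case only a prefix of the branches can be taken, so the
1024 nominal combinations shrink to 11 genuine ones.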
Without dependency mapping (I have never known a coder to even bother
trying) it is even more unlikely that code coverage alone will find
this, and path/conditional-only coverage will probably spiral into
itself before it gives you any meaningful data. However, without
getting and taking this bug apart (rather hard for me, seeing as none
of my stuff works on *nix) it would be impossible to say how I would
have found it. I am pretty confident that I _could_ have done, though,
because on the surface it looks like it would break the 'glass house'
every time an email was sent... (because of the <> not being in the
correct space, and hitting a collision wall.)
Was this bug only caused by erroneous <> in the *from* address? It is
hinted that it would work in any address box, but securiteam only used
the from address to check to see if the software crashed. What about
reply-to?
If it is any address then a testcase that said something like "ensure
that sendmail will work if an email is addressed to 2500 users" or an
email with "2500 users in the return address" would have brought this
to light?
> Using only branch or statement coverage won't help find this bug since
> many typical "from" addresses will contain a "<" followed by a ">".
> There are too many paths to do path coverage.
[Just on a side note - testing is meant to test all the non-typical
from addresses too. A simple comparison of behaviour without <> and
behaviour with <> should then have flagged something somewhere, meaning
that there should have been tests for it...]
Path coverage alone doesn't tell you much, and if you are going to go
that far then you might as well just create a CPU emulator and run the
whole program through that. (Yeah, I have been toying with that idea -
although I am not sure that it will be quicker.)
> This is a complicated bug
> to trigger in a complicated function. I don't see how partition
> analysis helps in this case, please try to convince me otherwise.
Not meaning to be pedantic, but it is Equivalence *Partitioning* and
Boundary Value *Analysis*. I am not sure I can convince you, but I
can show you ... Also, if you have any bugs on windoze (pref 2k) then I
would be more than happy to spend a few days on it showing how you
would generate BVs and EPs that will find it. Moreover, they
will not be specific, they will be generic - so I would show how I
would go about doing it, then run it and see if it trips the alarms.
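
For readers who have not met the two terms, here is a minimal textbook
sketch in Python of Equivalence Partitioning and Boundary Value
Analysis for a single length-limited field. It assumes a hypothetical
field that accepts between `lo` and `hi` bytes; the function names are
made up for illustration and this is not the generic procedure promised
above.

```python
def equivalence_partitions(lo: int, hi: int) -> dict:
    """Split input lengths into classes expected to behave alike.
    None marks an open-ended side of a range."""
    return {
        "below_valid": (None, lo - 1),
        "valid": (lo, hi),
        "above_valid": (hi + 1, None),
    }


def boundary_values(lo: int, hi: int) -> list:
    """Lengths sitting on, just inside and just outside each edge."""
    return sorted({lo - 1, lo, lo + 1, hi - 1, hi, hi + 1})


def make_inputs(lo: int, hi: int, fill: bytes = b"A") -> list:
    """Turn the boundary lengths into concrete test inputs."""
    return [fill * n for n in boundary_values(lo, hi) if n >= 0]


if __name__ == "__main__":
    print(equivalence_partitions(1, 256))
    print(boundary_values(1, 256))
    print([len(data) for data in make_inputs(1, 256)])
```

Six inputs cover both edges of the valid range, which is why the values
stay generic: they come from the field's limits, not from any one bug.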
I don't know the bug well enough (I don't know it at all), so I will
address the value of this in a separate email. I will try to find what
I believe to be similar bugs, yet ones I know about. (Would the IIS
unicode one be a fair example?)
> The
> function this vulnerability takes place in is very complicated, with
> something like 2^100 execution paths through it - not including loops.
Of these 1,267,650,600,228,229,401,496,703,205,376 paths through the
code, how many are actually genuine paths? How many are only
exposed if a previous branch has been taken? How many are reliant on
non-user-supplied input? How many use user-supplied inputs? Once we
know these values we can set the boundaries.
Critical path analysis (which is what most people seem to try to
implement in their fuzzers) is a technique that is used for
quality/budget/time restrictions. It is a cheap way of bypassing the
halting problem by creating an entire suite of soft halts. (It is not
a part of testing, although it is a part of QA.)
Have you heard of the terms 'soft halts' and 'hard halts'?
If you are only looking for segfaults then you are using a combination
of hard halts, one boundary value, and only two output equivalents
(crash/not crash).
If you can develop decent soft halt values (again from BVA and EP
combined with runtime analysis/code coverage), you can mitigate the
halting problem for your software (and only your software, because the
soft halts are not valid for any other software).
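
One possible reading of the hard halt / soft halt distinction, sketched
in Python. It assumes a soft halt is a per-target threshold on some
observed runtime metric, while a hard halt is an outright crash. The
`SoftHalt` class, the metric names and the limits are all hypothetical;
real values would have to come from BVA, EP and runtime analysis of the
specific software under test.

```python
from dataclasses import dataclass


@dataclass
class SoftHalt:
    name: str     # metric being watched, e.g. "steps" (hypothetical)
    limit: float  # threshold derived for this one target only


def classify(observed: dict, crashed: bool, soft_halts: list) -> str:
    """Go beyond crash/not-crash: report which soft halts were tripped."""
    if crashed:
        return "hard halt: crash"
    tripped = [h.name for h in soft_halts if observed.get(h.name, 0) > h.limit]
    if tripped:
        return "soft halt: " + ", ".join(tripped)
    return "ok"


if __name__ == "__main__":
    halts = [SoftHalt("steps", 1000), SoftHalt("heap_bytes", 4096)]
    print(classify({"steps": 120, "heap_bytes": 512}, False, halts))
    print(classify({"steps": 5000, "heap_bytes": 512}, False, halts))
    print(classify({"steps": 10, "heap_bytes": 10}, True, halts))
```

A crash-only oracle would report the second run as a pass; the soft
halts give it a third outcome to flag.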
I best get on with writing that other email.
cheers,
dj.