[fuzzing] Do Pictures Help?
David Dagon
dagon at cc.gatech.edu
Fri Nov 2 12:20:09 CDT 2007
On Fri, Nov 02, 2007 at 09:58:56AM -0800, J.M. Seitz wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> I threw a post up on OpenRCE regarding the usage of a "visual" to
> determine how to best approach fuzzing a file format.
>
> http://www.openrce.org/blog/view/922/Visual_Patterns_for_File_Format_Fuzzing
>
> Am I way off or could this help target certain spots in a file that are
> best suited for being fuzzed? Would it be useful to have the image
> actually use a legend to show a byte offset range?
>
> Anyways, really looking for some feedback more than anything.
Your approach is the first step of a path that could lead to:
http://www1.cs.columbia.edu/ids/publications/raid-camerav.pdf
which Columbia extended to find files-hidden-in-files (e.g. exes in MS
Office). This could be extended to fuzzing problems.
An idea: Instead of looking at whether each byte is
printable/not-printable, perhaps:
-- get lots of example files;
-- note the distribution of byte values at each offset (see the PAYL
paper, and related works, for ideas on how to compare
densities of byte distributions);
-- perhaps plot w/ PIL as you have, or look around Conti's book
for more ideas.
Bytes that appear to be 'fixed' or show limited variation might be
candidates for fuzzing/altering.
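
A rough sketch of the per-offset idea above, in Python since the original post already uses PIL. It assumes the sample files are already loaded as bytes objects, and it uses Shannon entropy as a stand-in variation measure; that is my choice for illustration, not what the PAYL paper does. The names offset_entropy and low_variation_offsets, and the 0.5-bit threshold, are made up for this example.

```python
import math
from collections import Counter


def offset_entropy(samples):
    """Shannon entropy (in bits) of the byte value at each offset.

    samples is a list of bytes objects, one per example file. Only
    offsets present in every sample are scored, so the result is as
    long as the shortest sample.
    """
    if not samples:
        return []
    limit = min(len(s) for s in samples)
    total = len(samples)
    entropies = []
    for off in range(limit):
        # Count how often each byte value occurs at this offset.
        counts = Counter(s[off] for s in samples)
        h = sum(-(c / total) * math.log2(c / total) for c in counts.values())
        entropies.append(h)
    return entropies


def low_variation_offsets(samples, threshold=0.5):
    """Offsets whose byte value barely changes across the samples.

    The threshold (in bits) is an arbitrary assumption; tune it
    against your own corpus.
    """
    return [off for off, h in enumerate(offset_entropy(samples)) if h <= threshold]


if __name__ == "__main__":
    # Toy corpus: a fixed 5-byte magic followed by two varying bytes.
    corpus = [b"MAGIC\x00a", b"MAGIC\x01b", b"MAGIC\x02c", b"MAGIC\x03d"]
    print(low_variation_offsets(corpus))
```

On a real corpus you would read each file with open(path, "rb").read() and then plot the entropy list however you like; the fixed or near-fixed offsets are the ones to look at first.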
Since you got started on this from some books, maybe have a look at
the 'Fuzzing' book, reviewed at:
http://seclists.org/dailydave/2007/q4/0024.html
and perhaps combine ideas from Conti and Sutton, etc.
A simple improvement from Sutton et al.'s book is to analyze/plot/target the
file using a semantic-aware parsing of the protocol. I.e., instead of
plotting each byte, plot each _field_ (sets of bytes, based on the
proto/header, or variable ranges of bytes, in some cases). Or, if the
protocol/header of the file is cumbersome, unknown, or not a
well-formed language, do this for successively smaller sub-ranges
(e.g., like DCC does for partial ranges of spam messages) of your
file, until a convergence appears, or until you reach a minimum
sub-range width. I.e., do this for every N byte block, and you'll
likely discover the .ctors, .dtors, .data segments of an executable,
or get a general picture of the document's structure, which in turn
might suggest subsections of the file to fuzz against.
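
One way the N-byte block idea might look, again only as a sketch. It scores each block of a single file by Shannon entropy and repeats with halved block sizes down to a minimum width. Entropy is my assumption for the per-block statistic (this is not how DCC computes anything), and the function names, the default sizes, and the 2.0-bit "jump" threshold are all invented for illustration. A simple stop-at-minimum-width rule stands in for the convergence test.

```python
import math
from collections import Counter


def block_entropy(block):
    """Shannon entropy (bits per byte) of one block of bytes."""
    if not block:
        return 0.0
    total = len(block)
    return sum(-(c / total) * math.log2(c / total) for c in Counter(block).values())


def entropy_profile(data, block_size):
    """List of (offset, entropy) pairs, one per block_size-byte block."""
    return [
        (off, block_entropy(data[off:off + block_size]))
        for off in range(0, len(data), block_size)
    ]


def multiscale_profiles(data, start_size=4096, min_size=64):
    """Profiles for successively halved block sizes, down to min_size."""
    profiles = {}
    size = start_size
    while size >= min_size:
        profiles[size] = entropy_profile(data, size)
        size //= 2
    return profiles


def boundaries(profile, jump=2.0):
    """Offsets where entropy changes sharply between adjacent blocks.

    These are only hints at where one region of the file ends and
    another begins; the jump threshold is an arbitrary assumption.
    """
    return [
        off
        for (_, prev), (off, cur) in zip(profile, profile[1:])
        if abs(cur - prev) >= jump
    ]


if __name__ == "__main__":
    # Toy input: 256 zero bytes followed by 256 distinct byte values.
    data = bytes(256) + bytes(range(256))
    print(boundaries(entropy_profile(data, 256)))
```

Offsets that keep showing up as boundaries across several block sizes are plausible section edges, and the regions between them are candidate sub-ranges to fuzz separately.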
Thanks for sharing your RCE post; very fun stuff over a cup of coffee.
--
David Dagon /"\ "When cryptography
dagon at cc.gatech.edu \ / ASCII RIBBON CAMPAIGN is outlawed, bayl
Ph.D. Student X AGAINST HTML MAIL bhgynjf jvyy unir
Georgia Inst. of Tech. / \ cevinpl."