[fuzzing] Do Pictures Help?

David Dagon dagon at cc.gatech.edu
Fri Nov 2 12:20:09 CDT 2007


On Fri, Nov 02, 2007 at 09:58:56AM -0800, J.M. Seitz wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> I threw a post up on OpenRCE regarding the usage of a "visual" to
> determine how to best approach fuzzing a file format.
> 
> http://www.openrce.org/blog/view/922/Visual_Patterns_for_File_Format_Fuzzing
> 
> Am I way off or could this help target certain spots in a file that are
> best suited for being fuzzed? Would it be useful to have the image
> actually use a legend to show a byte offset range?
> 
> Anyways, really looking for some feedback more than anything.

Your approach is the first step of a path that could lead to:

  http://www1.cs.columbia.edu/ids/publications/raid-camerav.pdf
  
which Columbia extended to find files-hidden-in-files (e.g. exes in MS
Office).  This could be extended to fuzzing problems.

An idea: Instead of looking at whether each byte is
printable/not-printable, perhaps:

  -- get lots of example files;

  -- note the distribution of byte values at each offset (see the
     PAYL paper, and related works, for ideas on how to compare
     densities of byte distributions);

  -- perhaps plot w/ PIL as you have, or look around Conti's book
     for more ideas.

Bytes that appear to be 'fixed' or show limited variation might be
candidates for fuzzing/altering.
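
A rough sketch of the per-offset idea, in Python since the plotting is
already done with PIL. This is only an illustration under assumptions:
the function names (offset_entropy, low_variation_offsets) and the
1.0-bit threshold are made up here, and Shannon entropy stands in as a
simple measure of variation; it is not the distance measure used in
the PAYL paper.

```python
import math
from collections import Counter


def offset_entropy(samples):
    """Shannon entropy (in bits) of the byte values seen at each offset.

    `samples` is a list of bytes objects, one per example file.  Only
    offsets present in every sample are examined.
    """
    if not samples:
        return []
    limit = min(len(s) for s in samples)
    total = len(samples)
    entropies = []
    for off in range(limit):
        counts = Counter(s[off] for s in samples)
        # Sum of p * log2(1/p) over the byte values observed at this offset.
        h = sum((c / total) * math.log2(total / c) for c in counts.values())
        entropies.append(h)
    return entropies


def low_variation_offsets(samples, threshold=1.0):
    """Offsets whose byte values are fixed or nearly fixed across samples.

    The threshold (in bits) is an arbitrary assumption; tune it per format.
    """
    return [off for off, h in enumerate(offset_entropy(samples))
            if h <= threshold]
```

Feeding it a few hundred real files of one format (each read with
open(path, "rb").read()) should pick out magic numbers, version
fields, and the like. The entropy list could also drive the pixel
colors in the plot instead of the printable/not-printable test.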

Since you got started on this from some books, maybe have a look at
the 'Fuzzing' book, reviewed at:

  http://seclists.org/dailydave/2007/q4/0024.html

and perhaps combine ideas from Conti and Sutton, etc.

A simple improvement from Sutton+'s book is to analyze/plot/target the
file using a semantic-aware parsing of the protocol.  I.e., instead of
plotting each byte, plot each _field_ (sets of bytes, based on the
proto/header, or variable ranges of bytes, in some cases).  Or, if the
protocol/header of the file is cumbersome, unknown, or not a
well-formed language, do this for successively smaller sub-ranges
(e.g., like DCC does for partial ranges of spam messages) of your
file, until a convergence appears, or until you reach a minimum
sub-range width.  I.e., do this for every N-byte block, and you'll
likely discover the .ctors, .dtors, .data segments of an executable,
or get a general picture of the document's structure, which in turn
might suggest subsections of the file to fuzz against.
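
The block-wise variant might look something like the following. Again
a sketch with invented names and parameters (multiscale_profile,
boundaries, the starting and minimum widths, the 2.0-bit jump), using
per-block entropy on a single file as the statistic; the "convergence"
test is left to the eye or to whatever comparison you prefer.

```python
import math
from collections import Counter


def shannon_entropy(block):
    """Shannon entropy (bits per byte) of one block of bytes."""
    if not block:
        return 0.0
    total = len(block)
    return sum((c / total) * math.log2(total / c)
               for c in Counter(block).values())


def block_entropies(data, width):
    """Entropy of each consecutive `width`-byte block of `data`."""
    return [shannon_entropy(data[i:i + width])
            for i in range(0, len(data), width)]


def multiscale_profile(data, start_width=4096, min_width=64):
    """Entropy profiles for successively halved block widths.

    Returns {width: [entropy per block]}; halving stops once the width
    would drop below min_width.  Both widths are assumptions.
    """
    profile = {}
    width = start_width
    while width >= min_width:
        profile[width] = block_entropies(data, width)
        width //= 2
    return profile


def boundaries(entropies, width, jump=2.0):
    """Byte offsets where entropy changes sharply between adjacent blocks."""
    return [i * width for i in range(1, len(entropies))
            if abs(entropies[i] - entropies[i - 1]) >= jump]
```

Offsets that keep showing up in boundaries() as the width shrinks are
reasonable guesses at section or structure edges, and the regions
between them are the subsections one might fuzz separately.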

Thanks for sharing your RCE post; very fun stuff over a cup of coffee.

-- 
David Dagon              /"\                          "When cryptography
dagon at cc.gatech.edu      \ /  ASCII RIBBON CAMPAIGN    is outlawed, bayl
Ph.D. Student             X     AGAINST HTML MAIL      bhgynjf jvyy unir
Georgia Inst. of Tech.   / \                           cevinpl."

