[comp.lang.c] Help on disassembler/decompilers

wwho@ucdavis.edu (W. Wilson Ho) (09/06/90)

	I am looking for any information related to disassembling
object code into assembly language or even a higher-level language such
as C.  Would someone please give me pointers to program sources,
documentation or papers related to this?

	Thanks in advance!

  W. Wilson Ho		        |  INTERNET:  how@ivy.ucdavis.edu
  Division of Computer Science	|  UUCP:      ...!ucbvax!ucdavis!ivy!how
  EECS Department		|  BITNET:    wwho@ucdavis.bitnet
  University of California	|
  Davis, CA 95616		|

[Turning object code back into assembler is pretty straightforward, and
every debugger does it.  Someone else asked about disassembling into higher
level languages a little while ago, but I didn't see any responses. -John]
-- 
Send compilers articles to compilers@esegue.segue.boston.ma.us
{ima | spdcc | world}!esegue.  Meta-mail to compilers-request@esegue.

hankd@dynamo.ecn.purdue.edu (Hank Dietz) (09/10/90)

In article <HOW.90Sep5173755@sundrops.ucdavis.edu> you write:
>	I am looking for any information related to disassembling
>object code into assembly language or even higher-level language such
>as C.  Would someone please give me pointers to program sources,
>documentation or papers related to this?

Basic disassembly is trivial, particularly if you have an object
module with a name list (symbol table).  The interesting problems are:

[1]	Determining which portions of a raw memory image are
	code and which are data.  Typically, this is done by
	providing a set of code entry points and having the
	disassembler trace program flow, marking each word with
	type information as each flow path is followed.

[2]	Dealing with self-modifying code.  The technique of
	[1] can at least detect when this might happen (e.g.,
	a store into a word already marked as code), but I
	don't know of any reasonable way to deal with it.
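
As a rough illustration of the flow tracing in [1] and the detection
mentioned in [2], here is a minimal sketch in Python.  The instruction
set is an invented toy (one-byte addresses, seven opcodes), not the 8080
or any real processor, and the names OPCODES, trace and IMAGE are
hypothetical.  It marks bytes reached from the entry points as code,
leaves everything else as data, and reports computed jumps it cannot
follow and stores that land on bytes marked as code.

```python
# Hypothetical toy instruction set: opcode -> (mnemonic, length, kind).
# Addresses are one byte wide; none of this models a real processor.
OPCODES = {
    0x00: ("NOP",   1, "fall"),
    0x01: ("JMP",   2, "jump"),      # unconditional jump to operand
    0x02: ("JZ",    2, "branch"),    # conditional: target and fall-through
    0x03: ("CALL",  2, "call"),      # callee, then the return address
    0x04: ("RET",   1, "stop"),
    0x05: ("STORE", 2, "store"),     # writes to the operand address
    0x06: ("JMPI",  1, "indirect"),  # computed jump, target unknown
}

def trace(image, entry_points):
    """Mark each byte of image as 'code', 'operand' or 'data'."""
    marks = ["data"] * len(image)    # data until a flow path reaches it
    stores, unresolved = [], []
    work = list(entry_points)        # addresses still to be followed
    while work:
        pc = work.pop()
        while 0 <= pc < len(image) and image[pc] in OPCODES:
            name, length, kind = OPCODES[image[pc]]
            span = range(pc, pc + length)
            if span.stop > len(image) or any(marks[i] != "data" for i in span):
                break                # runs off the image, or already marked
            marks[pc] = "code"
            operand = None
            if length == 2:
                marks[pc + 1] = "operand"
                operand = image[pc + 1]
            if kind == "store":
                stores.append((pc, operand))
            elif kind == "indirect":
                unresolved.append(pc)    # cannot know where this goes
            elif kind in ("branch", "call"):
                work.append(operand)     # follow the target later
            if kind == "jump":
                pc = operand
            elif kind in ("stop", "indirect"):
                break
            else:
                pc += length
    # A store whose target ended up marked as code may be self-modification.
    suspicious = [(pc, t) for pc, t in stores
                  if t < len(marks) and marks[t] != "data"]
    return marks, unresolved, suspicious

IMAGE = [0x03, 0x05, 0x05, 0x08, 0x04, 0x02, 0x09, 0x04,
         0x2A, 0x05, 0x05, 0x06, 0x00]
marks, unresolved, suspicious = trace(IMAGE, [0])
for addr, (byte, mark) in enumerate(zip(IMAGE, marks)):
    print(f"{addr:2d}  {byte:02X}  {mark}")
print("indirect jumps:", unresolved)
print("stores into code:", suspicious)
```

In this sample image the last byte is a valid NOP opcode but stays
marked as data because no flow path reaches it, and the store at
address 9 is flagged because its target, address 5, was marked as code.
The sketch only reports such cases; it makes no attempt to resolve them.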

Indirect jump tables are particularly difficult to flow trace (see
[1]), as are routines that are entered with a Call instruction which
is followed by the argument values (raw data), and which tweak the
return address to skip over them (as in some threaded interpreters).
Knowing that the code image came from a particular compiler can make
these problems much easier to deal with, since you can simply
recognize that compiler's code generation idioms.
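
One way to picture how idiom knowledge helps is as hints handed to the
tracer.  The sketch below is Python under stated assumptions: one-byte
addresses, a table (inline_arg_bytes) saying how many raw argument bytes
follow each call to a given routine, and a jump table whose address and
entry count are already known from the compiler's idiom.  The function
names and both hint formats are hypothetical.

```python
def after_call(call_addr, call_len, target, inline_arg_bytes):
    """Return (data_range, resume_addr) for a call instruction.

    inline_arg_bytes maps a routine's address to the number of raw
    argument bytes that follow every call to it (0 if not listed).
    """
    ret = call_addr + call_len           # nominal return address
    n = inline_arg_bytes.get(target, 0)  # bytes the callee skips over
    return range(ret, ret + n), ret + n

def jump_table_targets(image, table_addr, entries):
    """Read one-byte targets from a jump table at a known address."""
    return [image[table_addr + i] for i in range(entries)]

# A call at 0x10 (2 bytes long) to a routine at 0x40 that is known
# to consume 3 inline bytes: mark 0x12..0x14 as data, resume at 0x15.
data, resume = after_call(0x10, 2, 0x40, {0x40: 3})
print([hex(a) for a in data], hex(resume))
```

The tracer would mark the returned range as data, continue at the
resume address instead of the nominal return address, and add each
jump table target to its list of entry points.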

						-hankd@ecn.purdue.edu

PS: Back around 1981-82 I did a flow-analyzing disassembler for several
    then-popular microprocessors (e.g., 8080).  I still have it, but
    it really isn't very impressive... especially when it hits some of
    those problem cases noted above (e.g., PCHL, the 8080's indirect
    jump through HL).