[news.admin] Faking out a disassembler

pete@octopus.UUCP (Pete Holzmann) (07/15/88)
In article <2274@sugar.UUCP> karl@sugar.UUCP (Karl Lehenbauer) writes:
>[I wrote]
>> ...There are plenty of
>> PC-based tools for binary analysis that can be quickly run over reasonable-
>> sized programs (and slowly run over big programs)...
>> automatic disassemblers that produce comments for anything that touches
>> the external environment (system memory, I/O ports, interrupts, system
>> calls), etc...

>Well, I assume automatic disassemblers blow off stuff they don't understand
>and just .DATA it or whatever as binary data.  It would be no problem for a 
>Trojan Horse to decrypt the portion of itself that actually trashes your 
>system when it has decided that the time is right.  That way, string searches
>and code that looks for anything that touches system memory, I/O ports,
>interrupts, system calls, etc., will fail to locate the Trojan.  More clever 
>variations can be envisioned in which the encrypted part, or code that 
>generates the code to do the trashing, etc., etc., appears to be something 
>useful, or at least seems too complicated to bother to decipher.
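As a rough illustration of why a plain string search misses an encoded payload, here is a minimal Python sketch. It assumes the simplest possible "encryption", a single-byte XOR with a made-up key, and uses the harmless string b"DEL *.*" as a stand-in; real programs may hide things very differently.

```python
KEY = 0x5A  # hypothetical single-byte key, chosen only for illustration

def xor_bytes(data: bytes, key: int) -> bytes:
    """XOR every byte with `key`; applying it twice restores the input."""
    return bytes(b ^ key for b in data)

def naive_scan(image: bytes, needles) -> list:
    """Return the needles that appear verbatim in `image` (a plain string search)."""
    return [n for n in needles if n in image]

payload = b"DEL *.*"               # harmless stand-in for a suspicious string
stored = xor_bytes(payload, KEY)   # what would actually sit in the file on disk

print(naive_scan(payload, [b"DEL"]))  # found in the clear copy
print(naive_scan(stored, [b"DEL"]))   # not found in the encoded copy
print(xor_bytes(stored, KEY) == payload)
```

The decoding step itself is ordinary code, which is why a disassembler that traces it can still notice that the result is later executed or acted on.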

Let me first say that I am sure it *is* possible to confuse the heck out
of a great disassembler, even if it is human :-)! The same goes for source
code. And my response to either is the same: If I can't understand what the
code is doing, or if it looks 'funny', I don't trust it.

As far as your examples go, a good disassembler (Sourcer for the PC is a
pretty good one, for example) keeps working at the program until it
understands everything that is at all normal, and *marks* everything that
isn't. Thus:
	- all code is traced by simulation [a simple simulation, but enough
		to handle most requirements] so that every path through the
		existing code is followed. [Encrypted code is not properly
		disassembled, but...]
	- all destinations of flow-control transfers are marked. If a
		destination lies in a data area, that's a heavy-duty flag for
		weird code [either we're dealing with encrypted code, self-
		modifying code, or something similar].
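The two steps above can be sketched as a generic recursive-traversal disassembler. This is a minimal Python sketch over a hypothetical toy instruction set (not x86, and not how Sourcer actually works): it follows every reachable path, records which addresses are read as data, and flags any jump or call whose destination is a data address or falls in the middle of an already-decoded instruction.

```python
# Hypothetical toy instruction set (NOT x86): opcode -> (mnemonic, length in bytes).
# Two-byte instructions carry a one-byte absolute address operand.
ISA = {
    0x01: ("NOP", 1),
    0x02: ("JMP", 2),   # unconditional jump, no fall-through
    0x03: ("JZ", 2),    # conditional jump: target and fall-through both reachable
    0x04: ("CALL", 2),  # assumed to return to the next instruction
    0x05: ("RET", 1),
    0x06: ("HALT", 1),
    0x07: ("LDM", 2),   # load from memory: the operand is a data reference
}

def trace(image: bytes, entry: int = 0):
    """Recursive-traversal disassembly: follow every reachable path from `entry`."""
    code = {}          # address -> (mnemonic, operand or None, length)
    data_refs = set()  # addresses read as data
    targets = set()    # destinations of JMP/JZ/CALL
    work = [entry]
    while work:
        pc = work.pop()
        while 0 <= pc < len(image) and pc not in code:
            op = image[pc]
            if op not in ISA:
                break  # undecodable byte: abandon this path
            name, length = ISA[op]
            if pc + length > len(image):
                break  # truncated instruction
            arg = image[pc + 1] if length == 2 else None
            code[pc] = (name, arg, length)
            if name == "LDM":
                data_refs.add(arg)
            elif name in ("JMP", "JZ", "CALL"):
                targets.add(arg)
                work.append(arg)
            if name in ("JMP", "RET", "HALT"):
                break  # control does not fall through
            pc += length
    return code, data_refs, targets

def suspicious_targets(code, data_refs, targets):
    """Flag transfers into data areas or into the middle of a decoded instruction."""
    interior = set()
    for addr, (_name, _arg, length) in code.items():
        interior.update(range(addr + 1, addr + length))
    return sorted(t for t in targets if t in data_refs or t in interior)

# LDM 6 / JZ 6 / HALT / (unreached byte) / NOP / HALT: the jump lands on a
# byte the program also reads as data, so it gets flagged.
image = bytes([0x07, 0x06, 0x03, 0x06, 0x06, 0x00, 0x01, 0x06])
code, data_refs, targets = trace(image)
print(suspicious_targets(code, data_refs, targets))
```

Note that the unreached byte at address 5 never enters `code`; a real tool would mark such leftovers as data rather than guess at them.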

The job is much harder on an Intel-architecture CPU than on a good CPU :-).
The addressing is rather strange: the current instruction address is built
from a segment register and an offset whose address bits overlap (segment
times 16 plus offset), so many different segment:offset pairs name the same
byte.
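To make the overlap concrete, here is a minimal Python sketch of 8086 real-mode address formation. It assumes the original 8086 behavior of wrapping at 1 MB (20 address bits); the function names are illustrative only.

```python
def linear_address(segment: int, offset: int) -> int:
    """8086 real-mode address: segment * 16 + offset, wrapped to 20 bits."""
    return ((segment << 4) + offset) & 0xFFFFF

def aliases(linear: int) -> list:
    """Every segment:offset pair that reaches `linear` without wrap-around."""
    pairs = []
    for segment in range(0x10000):
        offset = linear - (segment << 4)
        if 0 <= offset <= 0xFFFF:
            pairs.append((segment, offset))
    return pairs

print(hex(linear_address(0x1234, 0x0010)))  # 0x12350
print(hex(linear_address(0x1235, 0x0000)))  # 0x12350
print(len(aliases(0x12350)))                 # 4096
```

A disassembler therefore has to normalize far pointers before it can tell whether two jump targets written differently are really the same instruction.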

From experience, the examples you gave *would* be found by a good disassembler.
What actually gives it the worst fits is inline data following a subroutine
call (the subroutine reads the bytes after the CALL and returns past them).
It marks everything, but to get a usable disassembly I have to go through and
fix up the markers by hand. Fortunately, compilers for normal languages don't
generate this.
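Here is a minimal Python sketch of why inline data after a CALL is so troublesome, using a hypothetical toy instruction set (not x86). A decoder that assumes every CALL returns to the next byte misreads the inline data as instructions; the hypothetical `inline_bytes` table stands in for the hand fix-up, telling it how many data bytes follow each call to a given subroutine.

```python
# Hypothetical toy instruction set (NOT x86): opcode -> (mnemonic, length in bytes).
ISA = {0x04: ("CALL", 2), 0x05: ("RET", 1), 0x06: ("HALT", 1), 0x07: ("LDM", 2)}

def decode_main(image: bytes, inline_bytes=None):
    """Decode straight-line code from address 0, assuming each CALL returns.

    `inline_bytes` is a hypothetical manual annotation: subroutine address ->
    number of data bytes that follow every call to it.  Without it, the
    decoder assumes each CALL returns to the byte right after the instruction.
    """
    inline_bytes = inline_bytes or {}
    out, pc = [], 0
    while pc < len(image):
        op = image[pc]
        if op not in ISA or pc + ISA[op][1] > len(image):
            out.append((pc, "DB", op))  # cannot decode: emit as a data byte
            pc += 1
            continue
        name, length = ISA[op]
        arg = image[pc + 1] if length == 2 else None
        out.append((pc, name, arg))
        pc += length
        if name == "CALL":
            # Skip (and label as data) the bytes the callee is known to consume.
            for _ in range(inline_bytes.get(arg, 0)):
                if pc >= len(image):
                    break
                out.append((pc, "DB", image[pc]))
                pc += 1
        elif name in ("RET", "HALT"):
            break
    return out

# CALL 7 / two inline data bytes / LDM 0x20 / HALT / address 7: the toy
# subroutine (a RET standing in for one that skips its inline data)
image = bytes([0x04, 0x07, 0x06, 0x07, 0x07, 0x20, 0x06, 0x05])
print(decode_main(image))          # inline byte 0x06 misread as HALT
print(decode_main(image, {7: 2}))  # hand-fixed: data marked, real code decoded
```

Without the annotation, the real LDM and HALT are never seen at all, which is exactly the kind of damage that has to be repaired by hand.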

MY NEWS.ADMIN POINT:
Sure, somebody could trick me. But the resources I have available
for verifying binary programs [not the least of which is trusting the members
of the net] make me as confident about using binaries as about using source
code.

Followups about disassembler technology should probably be redirected to
comp.arch or something.

-- 
  OOO   __| ___      Peter Holzmann, Octopus Enterprises
 OOOOOOO___/ _______ USPS: 19611 La Mar Court, Cupertino, CA 95014
  OOOOO \___/        UUCP: {hpda,pyramid}!octopus!pete
___| \_____          Phone: 408/996-7746