[comp.lang.c] Universal Disassemblers vs. Universal MIILs

eric@snark.UUCP (Eric S. Raymond) (10/17/88)

In article <7226@ihlpl.att.com>, knudsen@ihlpl.ATT.COM (Knudsen) writes:
>In article <e6m10#2eDFfC=eric@snark.UUCP>, I write:
>>Excuse me, but I thought the security problem in for-sale software was to
>>guard it from unauthorized *copying* and *use*, not unauthorized
>>*understanding*.
> 
> Well, some vendors are afraid of people (competitors?) understanding
> their code. 

And distributing a uMIIL isn't going to make automatic disassembly *easier*?

Long ago, in my pre-UNIX days, I once started writing a smart disassembler for
8086 code, one that would recognize illegal instructions, do flow-of-control
analysis on jumps and assign symbolic labels (then allow you to change the
names to meaningful ones). It would recognize and interpret OS service calls,
so you'd be able to spot I/O subroutines at a glance. It would keep its
deductions and your comments on them in a database so you could analyze code
interactively in stages. The Cracker, I called it. 

You'd sic this thing on a binary, watch the listings it generated, and add
comments through it. The end product: a text database which, when merged
against the binary through the cracker, would produce a neat commented
listing.
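
The label-assignment pass described above can be sketched roughly as
follows. This is only an illustration under stated assumptions: the
two-byte toy instruction set, the opcode values, and the names
mark_targets and print_listing are all hypothetical, and none of it is
8086 encoding or the original Cracker code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical toy machine: every instruction is two bytes, an opcode
   followed by a one-byte operand.  This is NOT 8086 encoding. */
enum { OP_NOP = 0, OP_JMP = 1, OP_CALL = 2, OP_RET = 3 };
enum { INSN_SIZE = 2 };

/* Pass 1: mark every address that is the target of a jump or call.
   is_target must have room for len entries.  Returns the number of
   distinct in-range targets found. */
size_t mark_targets(const unsigned char *code, size_t len,
                    unsigned char *is_target)
{
    size_t pc, count = 0;

    assert(len % INSN_SIZE == 0);
    for (pc = 0; pc < len; pc++)
        is_target[pc] = 0;

    for (pc = 0; pc + INSN_SIZE <= len; pc += INSN_SIZE) {
        unsigned char op = code[pc], arg = code[pc + 1];
        if ((op == OP_JMP || op == OP_CALL) && arg < len && !is_target[arg]) {
            is_target[arg] = 1;
            count++;
        }
    }
    return count;
}

/* Pass 2: print a listing with a default label on each marked address.
   A real tool would let the user rename these labels later. */
void print_listing(const unsigned char *code, size_t len,
                   const unsigned char *is_target)
{
    static const char *names[] = { "nop", "jmp", "call", "ret" };
    size_t pc;

    for (pc = 0; pc + INSN_SIZE <= len; pc += INSN_SIZE) {
        unsigned char op = code[pc], arg = code[pc + 1];
        if (is_target[pc])
            printf("L%04X:\n", (unsigned)pc);
        if (op == OP_JMP || op == OP_CALL)
            printf("\t%s\tL%04X\n", names[op], (unsigned)arg);
        else if (op <= OP_RET)
            printf("\t%s\n", names[op]);
        else
            printf("\t.byte\t0x%02X, 0x%02X\t; illegal opcode\n", op, arg);
    }
}
```

A caller would run mark_targets over the binary first and then hand the
same is_target array to print_listing. Targets that fall in the middle
of an instruction are marked but never printed in this sketch; a real
tool would have to flag them.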

What stopped me? Well, I got this 68010 UNIX box (which is now dying and being
replaced by an 80386). Suddenly cracking 8086 machine code didn't look very
interesting anymore...but if the code for both machines had been distributed
in a uMIIL, I would have had lots of incentive to *finish* the cracker.

And then I'd have given it to the world. BBSs would start swapping
comment/label databases for popular programs.  And the uMIIL-using
manufacturers' code would suddenly be naked, stripped of whatever dubious
protection the uMIIL was supposed to get them.

Now, even if (in this uMIIL-using alternate reality) *I'd* never finished
a uMIIL Cracker *someone would have*! Machine-language distribution doesn't
concentrate the incentive to produce such a program the way a uMIIL would,
because in a uMIIL world the program would only have to be done *once*.

What price 'knowledge security' then? No, manufacturers are better off without
a uMIIL and *with* multiple barriers to code-cracking.

P.S. on the Cracker concept:

Does anyone know of something like this having been actually implemented?

Notice that all the code except the single-instruction disassembler
would have been machine-independent; plug in a new such routine, and you
support a new instruction set.

Since the only code the Cracker would need to be smart about was control
transfers and service-request traps, I even thought about trying to make it
table-driven from some kind of instruction-set description language.
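
One way to picture that split is a machine-independent driver that sees
each instruction only through a small record, with the decoder plugged
in as a function pointer and driven from a table. The sketch below is
an assumption-laden illustration: the struct layout, the enum flow
categories, the toy opcode table, and every name in it are hypothetical
and describe no real CPU.

```c
#include <assert.h>
#include <stddef.h>

/* What the machine-independent core needs to know about one instruction. */
enum flow { FLOW_NEXT, FLOW_JUMP, FLOW_CALL, FLOW_RETURN, FLOW_TRAP, FLOW_ILLEGAL };

struct insn {
    size_t    length;   /* bytes consumed */
    enum flow flow;     /* how control leaves this instruction */
    size_t    operand;  /* jump/call target or trap number, if any */
};

/* The only machine-dependent piece: decode one instruction at code[pc]. */
typedef struct insn (*decode_fn)(const unsigned char *code, size_t len, size_t pc);

/* Instruction-set description for a hypothetical toy machine. */
struct opdesc { unsigned char opcode; size_t length; enum flow flow; };

static const struct opdesc toy_table[] = {
    { 0x00, 1, FLOW_NEXT   },
    { 0x10, 2, FLOW_JUMP   },   /* operand byte is an absolute target */
    { 0x11, 2, FLOW_CALL   },
    { 0x20, 1, FLOW_RETURN },
    { 0x30, 2, FLOW_TRAP   },   /* operand byte is a service number */
};

/* Table-driven decoder; unknown or truncated opcodes come back as
   one-byte FLOW_ILLEGAL records. */
struct insn toy_decode(const unsigned char *code, size_t len, size_t pc)
{
    struct insn in = { 1, FLOW_ILLEGAL, 0 };
    size_t i;

    assert(pc < len);
    for (i = 0; i < sizeof toy_table / sizeof toy_table[0]; i++) {
        const struct opdesc *d = &toy_table[i];
        if (code[pc] == d->opcode && pc + d->length <= len) {
            in.length = d->length;
            in.flow = d->flow;
            if (d->length == 2)
                in.operand = code[pc + 1];
            break;
        }
    }
    return in;
}

/* Machine-independent driver: count jumps and calls in a code image. */
size_t count_transfers(decode_fn decode, const unsigned char *code, size_t len)
{
    size_t pc = 0, n = 0;

    while (pc < len) {
        struct insn in = decode(code, len, pc);
        if (in.flow == FLOW_JUMP || in.flow == FLOW_CALL)
            n++;
        pc += in.length;
    }
    return n;
}
```

Supporting another instruction set would then mean writing another
decode_fn (or another table) and passing it to the same driver.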

You know, maybe I *should* go back and finish it, just as an interesting
research problem of course...
-- 
      Eric S. Raymond                     (the mad mastermind of TMN-Netnews)
      UUCP: ...!{uunet,att,rutgers}!snark!eric = eric@snark.UUCP
      Post: 22 S. Warren Avenue, Malvern, PA 19355      Phone: (215)-296-5718

usenet@cps3xx.UUCP (Usenet file owner) (10/18/88)

In article <e8amX#27Cbjc=eric@snark.UUCP>, Eric S. Raymond
(eric@snark.uucp) writes:
> [...]
>Long ago, in my pre-UNIX days, I once started writing a smart disassembler for
>8086 code, one that would recognize illegal instructions, do flow-of-control
>analysis on jumps and assign symbolic labels (then allow you to change the
>names to meaningful ones). It would recognize and interpret OS service calls,
>so you'd be able to spot I/O subroutines at a glance. It would keep its
>deductions and your comments on them in a database so you could analyze code
>interactively in stages. The Cracker, I called it. 
>
> [... stuff about implications deleted ...]
>
>Does anyone know of something like this having been actually implemented?
>

  There's a program called "Sourceror" on the Apple ][ series, by Glen
Bredon I think.  It came with the Big Mac/Merlin assemblers.  It knew
about all Apple's ROM calls (and many DOS calls), most global
variables, the 6502 instruction set, and the "Sweet-16" pseudo-code
instruction set.
  It didn't add comments to the code, but it did take care of things
like assigning symbolic names to labels, etc. which helped a lot if
you wanted to understand programs (I used it on Applesoft BASIC, for
example).  It's a good start, anyway....

			Anton Rang
			rang@cpswh.cps.msu.edu

+---------------------------+------------------------+----------------------+
| Anton Rang (grad student) | "UNIX: Just Say No!"   | "Do worry...be SAD!" |
| Michigan State University | rang@cpswh.cps.msu.edu |   -- Jill Belscamper |
+---------------------------+------------------------+----------------------+

bcase@cup.portal.com (Brian bcase Case) (10/20/88)

In article <7226@ihlpl.att.com>, knudsen@ihlpl.ATT.COM (Knudsen) writes:
>And distributing a uMIIL isn't going to make automatic disassembly *easier*?

This, I think, is the one real hurdle in getting the MIIL concept accepted.

>I once started writing a smart disassembler for 8086 code, ... recognize
>illegal instructions, flow-of-control analysis on jumps and assign symbolic
>labels. It would recognize and interpret OS service calls, so you'd be able
>to spot I/O subroutines at a glance. It would keep its deductions and your
>comments on them in a database so you could analyze code interactively in
>stages. The Cracker, I called it. 

You have described "MacNosey" for the Mac by Jasik Designs!  Check it out.

>...but if the code for both machines had been distributed
>in a uMIIL, I would have had lots of incentive to *finish* the cracker.
>And then I'd have given it to the world.  And the uMIIL-using
>manufacturers' code would suddenly be naked, stripped of protection...

Yes, this is the problem.  But the point of a MIIL is to prevent obsolete
software, not prevent reverse engineering.  However, the prevention of
reverse engineering will probably be required to gain the kind of acceptance
it needs to make an appreciable impact.

pardo@june.cs.washington.edu (David Keppel) (10/21/88)

bcase@cup.portal.com (Brian bcase Case) writes:
>knudsen@ihlpl.ATT.COM (Knudsen) writes:
>>And distributing a uMIIL isn't going to make automatic disassembly *easier*?
>This, I think, is the one real hurdle in getting the MIIL concept accepted.

I think the nub of the matter is that it makes disassembly more
*useful*, not any easier.

I claim that I can distribute C code to my programs and it is
completely useless.  I gave an example of this quite a while back.
I need to do things such as:

* Rename all variables.
* Hoist (inline) functions.
* Do loop transformations (e.g. for() loop to a goto loop).
* Strip out all comments.
* Run the preprocessor to remove #ifdefs  (Is this the same
  value "4" that appeared in the line before, or are they
  unrelated?)
* Avoid standard libraries.
* Do code motion.
* Declare wasted variables, dead code, unoptimize code that
  an optimizer can put back together again later, ...

Essentially, perform all the optimizations that I can on the C source,
and steal liberally from the Obfuscated C Code Contest.  Consider the
following (well-formatted) program.  What does it do?

extern	struct	_a7F9a1Xs3 {
	int	_a7F6a1Xs3;
	char	*_a7G9a1xs3;
	char	*_a7G6a1xs3;
	int	_a7G6a1xs7;
	short	_a7F9a1xs7;
	char	_a7F9a1xf7;
} _iob[3];

main(_a7F9a1xf3, _a7F61axf3)
    int _a7F9a1xf3;
    char *_a7F61axf3[];
{
    int _a7G61asf3, _a7G61faf3;

    goto _a7G61afx3;
  _a7G61afs3:
    exit(0), _a7G61asf3&=(0x10)+1;
  _a7G61afx3:
    ((_a7G61asf3=(--((&_iob[0]))->_a7F6a1Xs3>=0
	? *((&_iob[0]))->_a7G9a1xs3++&0377
	:_filbuf((&_iob[0]))))
    !=(-1));
    if (_a7G61asf3*(3-1)==(0-2))  goto _a7G61afs3;
    (--((&_iob[1]))->_a7F6a1Xs3>=0
	? ((int)(*((&_iob[1]))->_a7G9a1xs3++=(unsigned)(_a7G61asf3)))
	:_flsbuf((unsigned)(_a7G61asf3),(&_iob[1])));
    goto _a7G61afx3;
  _a7G71afs3:
    (--((&_iob[1]))->_a7F6a1Xs3>=0
	? ((int)(*((&_iob[1]))->_a7G9a1xs3++=(unsigned)(_a7G61asf3)))
	:_flsbuf((unsigned)(_a7G61asf3),(&_iob[1])));
    exit(1);
}

Did you guess:

#include <stdio.h>

main(argc, argv)
    int argc;
    char *argv[];
{
    int c;

    while ((c=getchar())!=EOF)
	putchar(c);
}

Enough.

	;-D on  ( Throw a monkey in the wrench )  Pardo
-- 
		    pardo@cs.washington.edu
    {rutgers,cornell,ucsd,ubc-cs,tektronix}!uw-beaver!june!pardo

lyndon@nexus.ca (Lyndon Nerenberg) (10/21/88)

In article <e8amX#27Cbjc=eric@snark.UUCP>, eric@snark (Eric S. Raymond) writes:
>P.S. on the Cracker concept:
>
>Does anyone know of something like this having been actually implemented?

Yes indeed! Some friends had this running on an Amdahl under the
Michigan Terminal System back around 1980. The disassembler was
part of a program called Glass that ran on 3270 terminals. Glass
was a "window" onto your virtual memory space. By using the PF
keys, you could "page" forward/backward in memory, or jump to
an arbitrary address. You were also able to toggle between three
display modes: EBCDIC, hex, and disassembly. Glass was aware of
the loader's symbol table lookup conventions, therefore it was
capable of inserting symbolic names for system subroutines and
entry points into user loaded code. There was talk of adding
knowledge of symbolic debugger load records, but this never got
implemented.

The program was (apparently) inspired by a similar utility found
floating around SHARE someplace.

Oh yes, you could also use Glass to modify memory contents by setting
the display to hex mode, making changes to the screen, and hitting
ENTER to write the changes. There was also an undocumented PF key
combination that would invoke spells to nuke the hardware protection
bits. Given that the entire OS resided in shared virtual memory,
this feature contributed to some rather interesting evenings ... :-)

*** INLOOP PROTECTION TROUBLE SNARK
*** HELP! SNARK IN MTS!

--lyndon
-- 

eric@snark.UUCP (Eric S. Raymond) (10/21/88)

In <10191@cup.portal.com>, bcase@cup.portal.com (Brian bcase Case) writes:
> In article <7226@ihlpl.att.com>, knudsen@ihlpl.ATT.COM (Knudsen) writes:
                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> >And distributing a uMIIL isn't going to make automatic disassembly *easier*?

Eh? I wrote what he's quoting, and I am *not* "knudsen@ihlpl.ATT.COM".

> You have described "MacNosey" for the Mac by Jasik Designs!  Check it out.

Interesting...does anybody know of a similar product for Intel-family machines?

> Yes, this is the problem.  But the point of a MIIL is to prevent obsolete
> software, not prevent reverse engineering. 

My article was in reply to a claim that a uMIIL would somehow offer better
security against what we politely call "reverse engineering" than current
machine code.

>                                            However, the prevention of
> reverse engineering will probably be required to gain the kind of acceptance
> it needs to make an appreciable impact.

True. And this raises an insuperable problem for uMIIL proponents, because
"preventing reverse engineering" and "easy portability" are diametrically
opposing goals. The uMIIL concept seems to me to be a particularly
ill-thought-out stab at serving both masters -- but machine-coded proprietary
software already well meets the requirements of the former and HLL source 
is pretty good for the latter (modulo OS-standardization problems that any
uMIIL will *also* have).

In case it isn't obvious, I think this is yet another reason to class the
whole notion of uMIIL as a distracting red herring and forget it.
-- 
      Eric S. Raymond                     (the mad mastermind of TMN-Netnews)
      UUCP: ...!{uunet,att,rutgers}!snark!eric = eric@snark.UUCP
      Post: 22 S. Warren Avenue, Malvern, PA 19355      Phone: (215)-296-5718

paul@unisoft.UUCP (n) (10/21/88)

Subject: Re: Universal Disassemblers vs. Universal MIILs

Isn't a 'Universal Disassembler' what the nanotechnology people use 
for reverse engineering a competitor's products?

	:-) Paul

-- 
Paul Campbell, UniSoft Corp. 6121 Hollis, Emeryville, Ca ..ucbvax!unisoft!paul  
Nothing here represents the opinions of UniSoft or its employees (except me)
"Gorbachev is returning to the heritage of the great Lenin" - Ronald Reagan 1988
  (then the Washington Post attacked RR [from the right] for being a Leninist)

fox@marlow.uucp (Paul Fox) (10/25/88)

In article <e8amX#27Cbjc=eric@snark.UUCP> eric@snark.UUCP (Eric S. Raymond) writes:
>In article <7226@ihlpl.att.com>, knudsen@ihlpl.ATT.COM (Knudsen) writes:
>>In article <e6m10#2eDFfC=eric@snark.UUCP>, I write:
>
>P.S. on the Cracker concept:
>
>Does anyone know of something like this having been actually implemented?
>
Yes - I did this once. It was for Z-80 machine code, and I did it for
a Z-80 ICE for which I needed to extend its functionality. It
was pretty easy, and it was command line driven. (You would create
shell scripts containing the long command lines).

It allowed you to do things like specify what the RST instructions
were for, and allowed things like having some of the RST instructions
being followed by a byte of sub-opcode;

It allowed you to add labels (although not comments for particular lines).
Thus as you understood what parts of the code were doing you could tell
it the labels to use, and any references to that address would come out
symbolically.

Also, since it's difficult to make the machine decide whether something
is code or data, it allowed you to mark selected areas as being 
tables and thus avoid disassembling it. 
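
The two kinds of user annotation described above (marking table regions
and declaring which RST instructions carry a trailing sub-opcode byte)
might be represented along these lines. This is a sketch, not the
original tool: the struct layouts and function names are hypothetical.
The one hard fact it relies on is that Z-80 RST opcodes have the bit
pattern 11ttt111.

```c
#include <assert.h>
#include <stddef.h>

/* User-supplied annotations; all names here are hypothetical. */
struct region { unsigned start, end; };   /* half-open range [start, end) */

struct annotations {
    const struct region *tables;          /* regions marked as table data */
    size_t               ntables;
    unsigned char        rst_has_subop[8]; /* nonzero: RST n takes a sub-opcode byte */
};

/* True if addr lies in a region the user marked as a table, so the
   disassembler should dump bytes there instead of decoding them. */
int in_table(const struct annotations *a, unsigned addr)
{
    size_t i;

    for (i = 0; i < a->ntables; i++)
        if (addr >= a->tables[i].start && addr < a->tables[i].end)
            return 1;
    return 0;
}

/* Z-80 RST opcodes have the bit pattern 11ttt111; ttt selects the vector. */
int is_rst(unsigned char op)
{
    return (op & 0xC7) == 0xC7;
}

/* Bytes occupied by an RST under the user's conventions: 1 for a plain
   RST, 2 if this particular RST is followed by a sub-opcode byte. */
size_t rst_size(const struct annotations *a, unsigned char op)
{
    assert(is_rst(op));
    return a->rst_has_subop[(op >> 3) & 7] ? 2 : 1;
}
```

A disassembly loop would consult in_table before decoding each address
and use rst_size to decide how far to advance past an RST.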


=====================
     //        o      All opinions are my own.
   (O)        ( )     The powers that be ...
  /    \_____( )
 o  \         |
    /\____\__/        Tel: +44 628 891313 x. 212
  _/_/   _/_/         UUCP:     fox@marlow.uucp

bpendlet@esunix.UUCP (Bob Pendleton) (10/27/88)

From article <6152@june.cs.washington.edu>, by pardo@june.cs.washington.edu (David Keppel):
> I claim that I can distribute C code to my programs and it is
> completely useless.  I gave an example of this quite a while back.
> I need to do things such as:
> 
> * Rename all variables.
> * Hoist (inline) functions.
> * Do loop transformations (e.g. for() loop to a goto loop).
> * Strip out all comments.
> * Run the preprocessor to remove #ifdefs  (Is this the same
>   value "4" that appeared in the line before, or are they
>   unrelated?)
> * Avoid standard libraries.

Why? 

> * Do code motion.
> * Declare wasted variables, dead code, unoptimize code that
>   an optimizer can put back together again later, ...

Again why?

> Essentially, perform all the optimizations that I can on the C source,
> and steal liberally from the Obfuscated C Code Contest.

Ignoring the deliberate obfuscation this gives you source code that a
fairly dumb compiler can convert to reasonably good object code.  One
trouble with it is that it is portable, but not machine independent.
It can only become machine independent by establishing a standard for
the sizes of all data types and the semantics of C's "defined to be
undefined" operator/operand pairs. The MIIL cannot be C because C is
not machine independent.
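
A few of the properties that differ between implementations can be
probed directly. The sketch below is illustrative only, and the three
function names are hypothetical; each answer depends on the compiler
and machine it is built for, which is exactly the problem for a
distribution format.

```c
#include <limits.h>

/* Storage size of int in bits.  The language guarantees only that int
   can hold at least the range of a 16-bit value. */
int int_bits(void)
{
    return (int)(sizeof(int) * CHAR_BIT);
}

/* Nonzero if plain char is signed on this implementation. */
int char_is_signed(void)
{
    return CHAR_MIN < 0;
}

/* Nonzero if >> on a negative int shifts in copies of the sign bit.
   The result of such a shift is implementation-defined. */
int shift_is_arithmetic(void)
{
    return (-1 >> 1) < 0;
}
```

Source that silently relies on any one of these answers is portable
only to machines that happen to agree with the one it was written on.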

Another problem with using C as an MIIL is that the only subroutine
calling conventions and scoping rules that can be efficiently
represented in C are those of C.  The scoping rules and subroutine
linking mechanisms of languages like MODULA-2 and LISP do not map well
onto C.

Maybe this discussion will get a little farther if we drop the
"Intermediate Language" part of MIIL and try looking at it as a MISDL
(Machine Independent Software Distribution Language). Is it safe to
even try to talk about a machine independent dialect of C? With
extensions that provide low level mechanisms to allow several
different subroutine linking and scoping rules to be implemented
efficiently?

BTW, a quick pass with an editor to convert all your hard to read
names into short names like i1 for ints and c3 for chars makes your
example a lot easier to read. It's the macro expansions that make it
hard to follow.

			Bob P.
-- 
              Bob Pendleton, speaking only for myself.
An average hammer is better for driving nails than a superior wrench.
When your only tool is a hammer, everything starts looking like a nail.
UUCP Address:  decwrl!esunix!bpendlet or utah-cs!esunix!bpendlet

wcs@alice.UUCP (Bill Stewart, usually) (11/04/88)

In article <6152@june.cs.washington.edu> pardo@cs.washington.edu (David Keppel) writes:
:I think the nub of the matter is that it makes disassembly more
:*useful*, not any easier.
:I claim that I can distribute C code to my programs and it is
:completely useless.  I gave an example of this quite a while back.
:I need to do things such as:
:* Rename all variables.
:* Strip out all comments.
:* Avoid standard libraries.
:* Do code motion.
:[.......]
:Essentially, perform all the optimizations that I can on the C source,
:and steal liberally from the Obfuscated C Code Contest.  Consider the

There was once a consultant at a large telecommunications company who
was required to provide source for the products he developed.  In
addition to preprocessing the source, he ran it through the C
equivalent of a "jive" filter: all the variable names were combinations
of capital O, lower-case L, and 0 and 1.  Useless!
-- 
#				Thanks;
# Bill Stewart, att!ho95c!wcs, AT&T Bell Labs Holmdel NJ 1-201-949-0705
# and/or
# Shelley Rosenbaum, att!ho95c!slr, 1-201-949-3615   ho95c.att.com