[sci.lang] Using C to describe itself

steven@mcvax.UUCP (04/22/87)

In article <1403@uwmacc.UUCP> edwards@unix.macc.wisc.edu.UUCP (mark edwards) writes:
>   Yes indeed C can describe itself.

Actually, I'm not completely convinced of this. The question of whether C
can be used to describe itself or not seems to be the same as whether the
sentence "This sentence is true" is true or false: it depends on your
assumptions.

Evidence of this is in Dennis Ritchie's Turing award lecture: he had built a
trojan horse into the C compiler. The compiler looked identical to its
previous self, but actually compiled a (slightly) different language. This
suggests to me that you can't look at the compiler source to determine what
a certain construct in C means, even if you know the meaning of the language
it compiles to.

Steven Pemberton, CWI, Amsterdam; steven@cwi.nl or try steven@mcvax.uucp

sjc@arthur.cs.purdue.edu (Steve Chapin) (04/24/87)

In article <7360@boring.mcvax.cwi.nl}, steven@mcvax.cwi.nl (Steven Pemberton) writes:
} In article <1403@uwmacc.UUCP> edwards@unix.macc.wisc.edu.UUCP (mark edwards) writes:
} >   Yes indeed C can describe itself.
} 
} Actually, I'm not completely convinced of this. The question of whether C
} can be used to describe itself or not seems to be the same as whether the
} sentence "This sentence is true" is true or false: it depends on your
} assumptions.
} 
} Evidence of this is in Dennis Ritchie's Turing award lecture: he had built a
} trojan horse into the C compiler. The compiler looked identical to its
} previous self, but actually compiled a (slightly) different language. This
} suggests to me that you can't look at the compiler source to determine what
} a certain construct in C means, even if you know the meaning of the language
} it compiles to.
} 
} Steven Pemberton, CWI, Amsterdam; steven@cwi.nl or try steven@mcvax.uucp

If I recall correctly, he first wrote a compiler that on a special
string input, output a binary that did NOT correctly match the
source code.  Then he fed it a "clean" C compiler, and it output
that C compiler with the additional property that it also contained
the trojan horse.  Thus, as long as you kept compiling the C
compiler with the C compiler, you would get a trojan horse C,
which would do a couple of interesting things when fed the right
input file.

But, the point is, that the original compiler and all its
descendants were NOT (!!!!!) actual C compilers, since the
did NOT correctly transform all proper inputs into equivalent
outputs.  So, this example is not relevant.

---------------------------------------------------------------------------
Steve Chapin                    |      Chapin's Law of Motion:  
ARPA:  sjc@gwen.cs.purdue.edu   |      You can get anywhere in 10 minutes
UUCP:  ...!purdue!sjc           |      if you drive fast enough.
---------------------------------------------------------------------------

chips@usfvax2.UUCP (Chip Salzenberg) (04/24/87)

In article <7360@boring.mcvax.cwi.nl>, Steven Pemberton writes:
> In article <1403@uwmacc.UUCP> Mark Edwards writes:
> >   Yes indeed C can describe itself.
> 
> Actually, I'm not completely convinced of this.
> [...]
> Evidence of this is in Dennis Ritchie's Turing award lecture: he had built a
> trojan horse into the C compiler. The compiler looked identical to its
> previous self, but actually compiled a (slightly) different language.

I've never heard of Ritchie's "Turing award lecture".  What was that lecture,
and what exactly was the Trojan horse you mention?
-- 
Chip Salzenberg		    Address: "{gatech,cbatt}!usfvax2!ateng!chip"
AT Engineering, Tampa, FL   Redress: "chips@usfvax2"
"Use the Source, Luke!"	    My opinions do not necessarily agree with anything.

john@frog.UUCP (John Woods, Software) (05/01/87)

>In article <7360@boring.mcvax.cwi.nl>, Steven Pemberton writes:
>> In article <1403@uwmacc.UUCP> Mark Edwards writes:
>>>   Yes indeed C can describe itself.
>>Actually, I'm not completely convinced of this...
>>Evidence of this is in Dennis Ritchie's Turing award lecture: he had built a
>>trojan horse into the C compiler. The compiler looked identical to its
>>previous self, but actually compiled a (slightly) different language.
> 
>I've never heard of Ritchie's "Turing award lecture".  What was that lecture,
>and what exactly was the Trojan horse you mention?

See Communications of the ACM, August 1984 (Vol 27 #8), page 761:
"Reflections on Trusting Trust", Ken Thompson (NOTE:  'twas KT who
"enbugged" :-) the compiler, not DMR) (They both won the 1983 Turing Award).
His Trojan Horse had two parts:

(1) A section of code inserted into the C compiler which would detect that it
was compiling login, and add code to login to enable a "magic password".

(2) A second section of code which detected that it was compiling the C
compiler, and inserted "features" (1) and (2).  This is a special case of the
famous "self-reproducing program" concept.

Once compiler (2) was compiled, the source was thrown away, and in effect, the
source to compiler (0) WAS the source to compiler (2)!.

C can't "describe" itself, at least not in the person of the text of the C
compiler, unless you ALREADY have a higher authority to appeal to to decide
what the C compiler itself means.  This is, in fact, quite a deep observation,
which extends to far more than the C language, or computer languages in
general.

The text of this message is actually instructions on how to construct an
atomic bomb, if you interpret it correctly :-).


--
John Woods, Charles River Data Systems, Framingham MA, (617) 626-1101
...!decvax!frog!john, ...!mit-eddie!jfw, jfw%mit-ccc@MIT-XX.ARPA

"Happiness is the planet Earth
	in your rear-view mirror."
		- Sam Hurt, in "Eyebeam, Therefore I Am"

bob#@andrew.cmu.edu (Bob Sidebotham) (05/06/87)

X-Andrew-Authenticated-as: 9

X-Trace: MS Version 3.24 on ibm032 host carnot, by bob (9).

To: outnews#ext.nn.sci.lang@andrew.cmu.edu,
	outnews#ext.nn.comp.lang.misc@andrew.cmu.edu

In-Reply-To: <1336@frog.UUCP>


>>>>   Yes indeed C can describe itself.
>>>Actually, I'm not completely convinced of this...
>>>Evidence of this is in Dennis Ritchie's Turing award lecture: he had built
a
>>>trojan horse into the C compiler.

If anyone is unconvinced by this argument, here are two real examples that I
ran into years ago in a Pascal compiler and another compiler derived from
Pascal.  Both of these compilers compiled themselves, and both had hidden
aspects to them which were completely invisible in the source code:

Pascal had a concept of an "alfa", which was a string object holding a fixed
maximum number  characters, typically corresponding to the number of bytes
available in a computer word.  On different machines, this was defined
differently.  The predefined constant ALFALENG defined this maximum length.
The compiler created a symbol table entry for this constant by something
equivalent to:

	AddNumericConstant("ALFALENG", ALFALENG);

where the first parameter is the name of the constant, and the second is the
value.  The only way to determine the actual value of ALFALENG was to examine
the object code or to compile a program which printed it out.  On different
machines you got different values.

Another, considerably more interesting example was a bug that we managed to
introduce into the floating point constant recognition.  Originally, there
were no floating point constants in the compiler, the routine did a divide by
10 every time it came across another digit to the right of the decimal.
Since the divide on our machine was relatively slow, someone replaced the
divide by 10 with a multiply by 0.1.  This appeared to work fine, until we
discovered, much later, that the compiler's representation for 0.1 changed,
by 1 bit, for each generation of the compiler.  That is, you compiled the
compiler, used the results to compile a new compiler, then did a byte for
byte comparison of the two generated programs.  They differed by one bit!  It
turned out that every second generation was identical.

There's also the well known C-preprocessor hack:  every C preprocessor
defines a symbol which declares which machine type is being compiled for.  On
a vax, for example, you can test the symbol "vax" and conditionally compile
code for it.  It turns out that the preprocessor itself has identical source
code for the various machine types, as:

	#ifdef vax
		Define the vax symbol
	#endif
	#ifdef ibm032
		Define the ibm032 symbol	
	#endif	
	etc.

Again, it's impossible to determine, purely from the source, which symbol
will actually be defined.

mouse@mcgill-vision.UUCP (05/07/87)

In article <1286@arthur.cs.purdue.edu>, sjc@arthur.cs.purdue.edu (Steve Chapin) writes:
> But, the point is, that the original compiler and all its descendants
> were NOT (!!!!!) actual C compilers, since the did NOT correctly
> transform all proper inputs into equivalent outputs.  So, this
> example is not relevant.

By this definition there are no C compilers in existence.

					der Mouse

				(mouse@mcgill-vision.uucp)

adam@gec-mi-at.co.uk (Adam Quantrill) (05/12/87)

In article <7360@boring.mcvax.cwi.nl> steven@cwi.nl (Steven Pemberton, or try steven@mcvax.uucp) writes:
>In article <1403@uwmacc.UUCP> edwards@unix.macc.wisc.edu.UUCP (mark edwards) writes:
>>   Yes indeed C can describe itself.
>
>Actually, I'm not completely convinced of this. The question of whether C
>can be used to describe itself [] Ritchie built a
>trojan horse into the C compiler. The compiler looked identical to its
>previous self, but actually compiled a (slightly) different language. This
>
>[] you can't look at the compiler source to determine what
>a certain construct in C means, even if you know the meaning of the language
>it compiles to.
>Steven Pemberton, CWI, Amsterdam; steven@cwi.nl or try steven@mcvax.uucp

Yes, but I hold that any language that can parse itself, can describe itself.
A non-rigorous argument follows:

1. If the language L can parse itself, it can break down any program into
L primitives.

2. L can implement L primitives.

3. So L can implement any program in L, thus it can describe L given a suitable
parser program.

This is defining L in terms of implementation, not a theoretical description.

As C can parse itself (evidence - cc) I reckon there's a good case that it
can describe itself. The 'trojan horse' compiler mentioned above may be
able to implement 'trojan horse' C, but not C.
       -Adam.

/* If at first it don't compile, kludge, kludge again.*/

biep@cs.vu.nl (J. A. "Biep" Durieux) (05/13/87)

In article <1286@arthur.cs.purdue.edu>, sjc@arthur.cs.purdue.edu (Steve Chapin) writes:
> ... C compilers ...

In article <764@mcgill-vision.UUCP> mouse@mcgill-vision.UUCP (der Mouse) writes:
> ... C compilers ...

Shouldn't this whole discussion become a lot more useful if we started to
talk about (metacircular) interpreters instead of compilers to any non-C
goal language?

<The other possibility would be a C-to-C compiler, of course :-) >


-- 
						Biep.  (biep@cs.vu.nl via mcvax)
When a doctor doctors a doctor, does the doctoring doctor doctor the doc-
tored doctor with the doctoring doctor's doctrine,  or does the doctoring
doctor doctor  the doctored doctor  with the doctored doctor's  doctrine?