[comp.arch] Software Distribution

chaim@taux01.UUCP (Chaim Bendelac) (08/18/88)

Previous articles (under "Standard Un*x HW"/"Open Systems"/"ABI", etc) have
expressed the wish for portability standards. Many organizations are spending
tremendous resources to promote such standards. Nothing new there.

I wondered whether there is room for another standard layer, specially 
designed for software DISTRIBUTION.  Imagine an Intermediate Program 
Representation Standard (IPRS) along the lines of an intermediate language
within a compiler.  Language independent, architecture independent.
The distributor compiles and optimizes his program with his favorite 
language compiler into its IPR, copies the IPR onto a tape and sells. 
The buyer uses a variation on 'tar' to unload the tape and to post-process 
the IPR program with the system-supplied architecture-optimized IPRS-to-binary 
compiler backend.

No need for cumbersome source-distributions, no more different binary copies
of the software. Utopia! You introduce a weirdly new, non-compatible 
architecture? Just supply a Standard Unix (ala X/Open or OSF or whatever), 
an IPRS-to-binary backend, and you are in business. The Software Crisis 
is over! 

:-)   ?   :-(

No free lunch, of course. The programmer still has to write "portable"
software, which is a difficult problem. A truly language- and architecture-
independent interface is almost as difficult to design as the old "universal 
assembler" idea. But with enough incentives, perhaps?  Questions:

	1. How desperate is the need for such a standard? (I know: GNU
	   does not need IPRSs nor ABIs...)
	2. Assuming LOTS of need, how practical might this be?
	3. What are the main obstacles? Economical? Political? Technical?
	4. What are the other advantages or disadvantages?

davidsen@steinmetz.ge.com (William E. Davidsen Jr) (08/18/88)

In article <891@taux01.UUCP> chaim@taux01.UUCP (Chaim Bendelac) writes:
| Previous articles (under "Standard Un*x HW"/"Open Systems"/"ABI", etc) have
| expressed the wish for portability standards. Many organizations are spending
| tremendous resources to promote such standards. Nothing new there.
| 
| I wondered whether there is room for another standard layer, specially 
| designed for software DISTRIBUTION.  Imagine an Intermediate Program 
| Representation Standard (IPRS) along the lines of an intermediate language
| within a compiler.  Language independent, architecture independent.
| The distributor compiles and optimizes his program with his favorite 
| language compiler into its IPR, copies the IPR onto a tape and sells. 
| The buyer uses a variation on 'tar' to unload the tape and to post-process 
| the IPR program with the system-supplied architecture-optimized IPRS-to-binary 
| compiler backend.

  This has been done before. The "UCSD Pascal" system was done this way,
and Fortran (and I think Ada) compilers were created to generate the
P-code (pseudo code).

  The original version of B I saw worked this way, and you could either
interpret or compile to binary. The compile cycle was (a) 2 pass compile
to P-code, (b) global machine independent optimize of the P-code, (c) 2
pass compile to assembler, (d) peephole optimize the assembler source,
and (e) two pass assembler.

  This was slow, but it produced some very good code, and the
interpreter actually ran fairly well after (b). The interpreter
translated the text tokens into two byte strings before execution, so it
was useful if not blindingly fast.

  As we got it, B didn't have the optimizers, but I added them when I
was creating a derivative language, IMP, which used the same P-codes. A
version of this for CP/M was floating around the BBSs, called IL/1.
-- 
	bill davidsen		(wedu@ge-crd.arpa)
  {uunet | philabs | seismo}!steinmetz!crdos1!davidsen
"Stupidity, like virtue, is its own reward" -me

henry@utzoo.uucp (Henry Spencer) (08/20/88)

In article <891@taux01.UUCP> chaim@taux01.UUCP (Chaim Bendelac) writes:
>...  Imagine an Intermediate Program 
>Representation Standard (IPRS) along the lines of an intermediate language
>within a compiler.  Language independent, architecture independent.
>The distributor compiles and optimizes his program with his favorite 
>language compiler into its IPR, copies the IPR onto a tape and sells. 
>The buyer uses a variation on 'tar' to unload the tape and to post-process 
>the IPR program with the system-supplied architecture-optimized IPRS-to-binary 
>compiler backend.

The one problem I can think of is that it's tricky to build such a
representation in which the front end doesn't need to know *anything* about
the machine.  Things like data-type sizes often have to be decided before
the intermediate representation is generated, even if the details of the
code generation get deferred.  Perhaps not impossible, but tricky.
-- 
Intel CPUs are not defective,  |     Henry Spencer at U of Toronto Zoology
they just act that way.        | uunet!attcan!utzoo!henry henry@zoo.toronto.edu

cik@l.cc.purdue.edu (Herman Rubin) (08/20/88)

In article <891@taux01.UUCP>, chaim@taux01.UUCP (Chaim Bendelac) writes:
> Previous articles (under "Standard Un*x HW"/"Open Systems"/"ABI", etc) have
> expressed the wish for portability standards. Many organizations are spending
> tremendous resources to promote such standards. Nothing new there.
> 
> I wondered whether there is room for another standard layer, specially 
> designed for software DISTRIBUTION.  Imagine an Intermediate Program 
> Representation Standard (IPRS) along the lines of an intermediate language
> within a compiler.  Language independent, architecture independent.
> The distributor compiles and optimizes his program with his favorite 
> language compiler into its IPR, copies the IPR onto a tape and sells. 
> The buyer uses a variation on 'tar' to unload the tape and to post-process 
> the IPR program with the system-supplied architecture-optimized IPRS-to-binary 
> compiler backend.

This would be very useful, _but_ consider the following problems.  I could 
probably list over 1000 hardware-type operations (I am not including such
things as the elementary transcendental functions; only those things for
which I can come up with a nanocode-type bit-handling algorithm, such as
multiplication and division) which I would find useful.  My decision as to
which algorithm to use for a particular problem would be highly dependent on
the timing of these operations.  To give a simple example, one would be well-
advised to avoid too many divisions on a CRAY.  Integer division on a CRAY or
a CYBER 205 is more expensive than floating point, and on the CRAY it is even
necessary to work to ensure the correct quotient.  Packing or unpacking a
floating point number is trivial on some machines but much more difficult
on others.

Thus one cannot optimize a program without knowing the explicit properties of
operations on the target machine.  We used to have a CDC6500 and a 6600 at
Purdue.  These machines had exactly the same instruction set, and unless there
was a fancy speedup attempt using parallel IO and computing in an unsafe
manner, exactly the same results would occur.  However, optimization was 
totally different.

I suggest instead that we have a highly flexible intermediate language, with
relatively easy but flexible syntax, and a versatile macro processor.  This
would be enough by itself in many situations, but I know of no existing one.  An example
of a macro is
		x = y - z

which I would like to treat as the (= -) macro.  Then we could have various
algorithms which an optimizing macro assembler could assemble and estimate
the timing.

Another advantage of something like this, and this is particularly relevant
to this group, is that one can point out the multitudinous situations where
simple hardware instructions not now available can greatly speed up operations.
I personally consider the present "CISC" machines as RISCy.

> 
> No need for cumbersome source-distributions, no more different binary copies
> of the software. Utopia! You introduce a weirdly new, non-compatible 
> architecture? Just supply a Standard Unix (ala X/Open or OSF or whatever), 
> an IPRS-to-binary backend, and you are in business. The Software Crisis 
> is over! 
> 
See above.  I think it will be simpler, but not what was proposed.
> :-)   ?   :-(
> 
> No free lunch, of course. The programmer still has to write "portable"
> software, which is a difficult problem. A truly language- and architecture-
> independent interface is almost as difficult to design as the old "universal 
> assembler" idea. But with enough incentives, perhaps?  Questions:
> 
The "universal assembler" is more practical, if it is written more like CAL
with overloaded operators.  The interface would require a macro processor,
but could be done with little more.  However, as I have pointed out, the
portable software mentioned above cannot exist.  The examples above are not
for truly parallel machines.  On a parallel machine, how would one break a
vector into the positive and negative elements, use a separate algorithm to
compute a function for these cases, and put the results back together in the
order of the original arguments?  Something can be done, but I suggest one
would be better served by kludging in additional hardware.  Now if one is 
stuck with this situation, and does not have the additional hardware, an
algorithm somewhat slower may be in order.

We must face the fact that there cannot be efficient portable software.  We
may be able to produce reasonably efficient semi-portable software, and we
should try for that.  I believe that the tools for that can be developed.

We also should try to improve the hardware to be able to use the "crazy"
instructions implementable in nanocode or hardwired.  There are useful
instructions, not present in the HLLs, which may be so slow as to be
impractical if not implemented in hardware.

> 	1. How desperate is the need for such a standard? (I know: GNU
> 	   does not need IPRSs nor ABIs...)
> 	2. Assuming LOTS of need, how practical might this be?
> 	3. What are the main obstacles? Economical? Political? Technical?
> 	4. What are the other advantages or disadvantages?


-- 
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907
Phone: (317)494-6054
hrubin@l.cc.purdue.edu (Internet, bitnet, UUCP)

mrspock@hubcap.UUCP (Steve Benz) (08/20/88)

>[Chaim Bendalac wrote:]
>>[Use intermediate language instead of a common binary code for
>> distribution purposes, and make a utility common to all machines
>> for translating this intermediate code to executable form.]

To which Henry Spencer replies:
> The one problem I can think of is that it's tricky to build such a
> representation in which the front end doesn't need to know *anything* about
> the machine.  Things like data-type sizes often have to be decided before
> the intermediate representation is generated, even if the details of the
> code generation get deferred.  Perhaps not impossible, but tricky.

I'm not sure what Henry talks about is really that big a problem.
If the *source* is portable, then the intermediate code could be
made portable.  The worst case would be if one had to abstract
a data type all the way back to the level of the source language.
That sort of worst-case scenario would result in a compilational
problem, but not a representational problem.  (i.e. the onus would
be on the vendor to fix the problem, not on the standards committee.)

I think the real bugaboo here will be system calls and the like.
Granted that they are theoretically semantically identical, but
in reality, they're not so.  In order for such a scheme to work,
vendors would still have to break down and come to an agreement
on what the standard semantics of all system calls will be.

And don't talk to me about graphical interfaces...

				- Steve Benz

bzs@encore.UUCP (Barry Shein) (08/21/88)

Re: intermediate p-code for distribution...

Look into the Portable Standard Lisp effort at University of Utah.
This was one of their areas of effort, a LAP (Lisp Assembly Program,
pseudo-asm generated by the compiler) which would be highly portable,
allowing porting of the compiler and bootstrapping of the entire
system.

	-Barry Shein, ||Encore||

henry@utzoo.uucp (Henry Spencer) (08/24/88)

In article <2793@hubcap.UUCP> mrspock@hubcap.UUCP (Steve Benz) writes:
>> Things like data-type sizes often have to be decided before
>> the intermediate representation is generated, even if the details of the
>> code generation get deferred...
>
>I'm not sure what Henry talks about is really that big a problem,

Is the layout of structs in memory decided before or after the intermediate
representation is generated?  What about the results of "sizeof"?  How is
"varargs" handled?  And so forth.  If you try to build a completely machine-
independent "intermediate" form, I think you will end up with something that
looks very much like a tokenized version of the source.  This might or might
not be satisfactory for the original purposes, but an intermediate represen-
tation (in the usual sense of the word) it's not.
-- 
Intel CPUs are not defective,  |     Henry Spencer at U of Toronto Zoology
they just act that way.        | uunet!attcan!utzoo!henry henry@zoo.toronto.edu

mitch@Stride.COM (Thomas Mitchell) (08/25/88)

In article <3503@encore.UUCP> bzs@encore.UUCP (Barry Shein) writes:
>
>Re: intermediate p-code for distribution...
                  ^^^^^^

Caution: p-Code and p-System have some prior use, and perhaps a
trademark, associated with them.  Now called the "Power System"
by Pecan of Brooklyn, it is an interpreted OS.

It had its origins at UCSD and is the origin of UCSD Pascal (aka
Apple Pascal).

Their p-Code (pseudo code) is portable from one machine to
another.  Their CODE file format has a field which lets the
interpreter determine the byte sex of the p-code.

-- 
Thomas P. Mitchell (mitch@stride1.Stride.COM)
Phone: (702)322-6868	TWX: 910-395-6073	FAX: (702)322-7975
MicroSage Computer Systems Inc.
Opinions expressed are probably mine. 

dick@ccb.ucsf.edu (Dick Karpinski) (08/26/88)

In article <1988Aug23.180420.28483@utzoo.uucp> henry@utzoo.uucp (Henry Spencer) writes:
>In article <2793@hubcap.UUCP> mrspock@hubcap.UUCP (Steve Benz) writes:
>>> Things like data-type sizes often have to be decided before
>>> the intermediate representation is generated....
>... layout of structs in memory ... the results of "sizeof" ... "varargs" 
>...
>looks very much like a tokenized version of the source.  

Indeed, it can be argued that the Gnu C Compiler's Register Transfer
Language (gcc's RTL) does look like a tokenized version of the source.
I don't mind that a bit.  But would it work??  Would vendors fear to
have their products cannibalized and reused in pieces?  Probably not.

Do I really understand correctly that this Stallman product accomplishes
the hitherto unrealistic UNiversal Computer Oriented Language (UNCOL)??
Of course, in the presence of array/vector processors and the like, the
universal part is a bit diminished, but still, does he have it right for
the ordinary 32-bit workstation of today?  I'm not at all sure, but it
looks awfully good to me.

I would foresee a sort of validation suite to test both the gcc backend
(with the machine description) and the specific properties of system
calls on the target system.  Since such a suite would tell it like it
is, I presume that users would like it more than hardware vendors and
the sales support staff.

Dick

Dick Karpinski  Manager of Minicomputer Services, UCSF Computer Center
UUCP:  ...!ucbvax!ucsfcgl!cca.ucsf!dick        (415) 476-4529 (11-7)
BITNET:  dick@ucsfcca or dick@ucsfvm           Compuserve: 70215,1277  
USPS:  U-76 UCSF, San Francisco, CA 94143-0704   Telemail: RKarpinski   
Domain: dick@cca.ucsf.edu  Home (415) 658-6803  Ans 658-3797

tainter@ihlpb.ATT.COM (Tainter) (08/26/88)

In article <847@stride.Stride.COM> mitch@stride.stride.com.UUCP (Thomas Mitchell) writes:
>Caution: p-Code, p-System have some prior use and perhaps
>trademark  associated with them.  Now called the "Power System"
>by Pecan of Brooklyn it is an interpreted OS.  

>It had its origins at UCSD and is the origin of UCSD Pascal (aka
>Apple Pascal).

There is a trademarked thing called "UCSD p-System".
But p-code is not UCSD's.  UCSD just did an extension of Wirth's
original p-code system.
P-code is how Wirth did his original implementation of Pascal.
M-code is how he did his original implementation of Modula-2,
as is what the Lilith runs.

I don't know about the original Modula.

>Their [UCSD's] p-Code (pseudo code) is portable from one machine to
>another.  Their CODE file format has a field which lets the
>interpreter determine the byte sex of the p-code.

Yup.  And it cripples all programs down to 16 bit integers, 16 bit addresses
(although text addresses can be fudged through segments).
This is a nasty thing to do to a 680x0.  The Macintosh version
had a non-portable 32 bit integer extension and Pecan has recently
released a 32 bit version of the Power System.  I doubt though, that
code is portable between the 16 bit and 32 bit versions.

What p-code is really good for is shoehorning onto small machines.
p-code is very compact, and the segmentation allows some virtual memory,
albeit at a significant performance hit.  I also rather like units as
a form of modularity.

>Thomas P. Mitchell (mitch@stride1.Stride.COM)
>Phone: (702)322-6868	TWX: 910-395-6073	FAX: (702)322-7975
>MicroSage Computer Systems Inc.

First it was Sage.  Then it was Stride Micro.  Now it's MicroSage ?

--j.a.tainter

bpendlet@esunix.UUCP (Bob Pendleton) (08/26/88)

From article <1988Aug23.180420.28483@utzoo.uucp>, by henry@utzoo.uucp (Henry Spencer):
> In article <2793@hubcap.UUCP> mrspock@hubcap.UUCP (Steve Benz) writes:
>>> Things like data-type sizes often have to be decided before

Usually you can get away with specifying the radix of the data and
the minimum number of digits required. Sometimes you need to specify
the maximum number of digits as well. For example "short int x;" (a
somewhat ambiguous declaration) can be translated into the portable
form "x: static allocated signed binary min 16", or "char *name" can
be represented as "name:stack allocated machine_pointer ASCII", or
more loosely as "name:machine_pointer signed binary min 7 max 9". 

I think you can get the feel from these examples. The translator would
translate declarations into constraints on the valid representations
of the declared items. 

>>> the intermediate representation is generated, even if the details of the
>>> code generation get deferred...
>>
>>I'm not sure what Henry talks about is really that big a problem,
> 
> Is the layout of structs in memory decided before or after the intermediate
> representation is generated?  What about the results of "sizeof"? How is
                                                           ^^^^^^

The layout of structs must be done by the machine specific code
generator. NOT by the translator. "sizeof" becomes a symbolic
expression that can be evaluated by the code generator, but not by the
translator. In one system I wrote, all data size computations were done
in the linker. Worked out very well. 

> "varargs" handled?  And so forth.  
  ^^^^^^^^^
Now that looks hard, for a minute. The general rule is that hardware
dependent problems must be pushed through to the hardware dependent
code generator. The machine independent code for a varargs call could
look something like this:

vararg_block 
	code for arg 1
	code for arg 2
		.
		.
		.
	code for arg n
vararg_end N call what_ever

How did the translator find out it was a varargs call? By looking at the
declaration of the procedure and/or the way it was used.

It's important to remember that this intermediate language must be
usable by ALL programming languages, not just C. 


> If you try to build a completely machine-
> independent "intermediate" form, I think you will end up with something that
> looks very much like a tokenized version of the source.  This might or might
> not be satisfactory for the original purposes, but an intermediate represen-
> tation (in the usual sense of the word) it's not.

Off the top of my head I can think of two different intermediate forms
that could be used for this. Each includes a symbol table; I hope you
include a symbol table as part of your usual sense of the phrase
"intermediate form."

One is a simple reverse polish form of the source program. The
operations can be generic like "+", and the operands can be indexes
into the symbol table. This form can be converted directly to code or
into a more "normal" intermediate form by symbolic execution of the
RPN. The intermediate values generated during symbolic execution
can be constant values, registers, all sorts of things. By using fairly
complex patterns to decide how to "execute" an operator this approach
can give you a surprisingly good quick and dirty code generator or a
very machine specific intermediate form suitable for machine specific
optimizations.

Another possible intermediate form is good old quads. A quad specifies
2 operands (well... sometimes just 1), an operation and a destination.
The operands and destinations can be other quads or variables. That
is, quads are a way of representing a parse tree in a nice flat file.
Actually, both of these are simple ways to represent parse trees in
flat files.

Both forms make it possible to recover the original parse tree. You can
do machine independent optimization on both forms (though I think it's
easier with quads). You can also do machine independent linking of
these forms.

The problems are just not that big. I used to spend a lot of time
thinking about this kind of thing. My senior project, oh these many
years gone by, was the design of a language for writing machine
independent LISP interpreters. I've looked very carefully at the PSL
work at the University of Utah since I was there when a lot of it was
being done.

> -- 
> Intel CPUs are not defective,  |     Henry Spencer at U of Toronto Zoology
> they just act that way.        | uunet!attcan!utzoo!henry henry@zoo.toronto.edu

I just got back from Xhibition; someone from OSF said they are planning
to establish a standard for a portable intermediate language. Nice to
see that the market is finally growing up enough to need something
like this.

Imagine being able to buy a program, take it home, pop it into the
drive, wait a few minutes while a machine-specific version is created
from the machine-independent version on the disk, and then just use it.
The only thing you have to worry about is whether or not your machine
has enough horse power to run the program well. Will it ever happen? I
doubt it, but it sure would be nice.

			Bob P.

-- 
Bob Pendleton @ Evans & Sutherland
UUCP Address:  {decvax,ucbvax,allegra}!decwrl!esunix!bpendlet
Alternate:     utah-cs!esunix!bpendlet
        I am solely responsible for what I say.

chase@Ozona.orc.olivetti.com (David Chase) (08/27/88)

In article <1347@ucsfcca.ucsf.edu> dick@ucsfccb.UUCP (Dick Karpinski) writes:
>Do I really understand correctly that this Stallman product accomplishes
>the hitherto unrealistic UNiversal Computer Oriented Language (UNCOL)??

RTL isn't a UNCOL, no.  RTL (as realized in the Gnu C compiler)
contains all sorts of hard-coded register assignments (R15 is my SP)
and calling conventions (my stack grows thataway).  I'm afraid these
make it rather non-universal.

If you are interested in this sort of thing, you might check out
papers by Fraser and Davidson in the compiler construction conferences
of 1988, 1986, and 1984.

David

aglew@urbsdc.Urbana.Gould.COM (08/27/88)

..> Pseudo-code as an exchange format between different architectures

In a recent UNIX World Omri Serlin (I think) mentioned that
OSF is considering something with a name like "Architecture Independent
Exchange Format" as a challenge to the plethora of ABIs in the
AT&T/SUN world.

henry@utzoo.uucp (Henry Spencer) (08/30/88)

In article <958@esunix.UUCP> bpendlet@esunix.UUCP (Bob Pendleton) writes:
>... The general rule is that hardware
>dependent problems must be pushed through to the hardware dependent
>code generator...

The trouble is that, in the real world, the part of the compiler that
does not make hardware-dependent decisions is the easy and small part.
The idea that parsing, type checking, etc. is a big deal is basically
an academic prejudice in favor of things that are easy to formalize.
These things aren't trivial, mind you, but they're not the hard part
of a production-quality compiler.  This is what brought on my comment
about the "intermediate" form being little more than tokenized source.

What this would amount to, almost, is a sort of encrypted source.
That's pretty much what's wanted for portable distributions that don't
give away the farm or permit users to meddle.  (Of course, this is
anathema to the Stallmanites...  Don't expect GNU to support such a
portable distribution form.)

>I just got back from Xhibition, someone from OSF said they are planning
>to establish a standard for a portable intermediate langauge. Nice to
>see that the market is finally growing up enough to need something
>like this.

One can read this two ways, however:  are they talking about standardizing
the form, or the content?  Standardizing the form makes it easy to build
multiple compiler front ends feeding into the same code generation, but
doesn't remove machine-dependencies from the front ends or their output.
(The PCC intermediate format is a de-facto standard form, but its contents
are machine-specific.)  Standardizing the content is what we've been talking
about.  I can see OSF doing either.

>Imagine being able to buy a program, take it home, pop it into the
>drive, wait a few minutes while a machine-specific version is created
>from the machine-independent version on the disk, and then just use it.
>The only thing you have to worry about is whether or not your machine
>has enough horse power to run the program well...

And about whether the programmer was competent enough to make the code
really portable.  Don't forget that condition.  Since you haven't really
got source, you can't (readily) go in and fix it if there's a problem.
This new flexibility also opens the door to a whole new range of bugs,
since the code can now be run on machines which the author never even
compiled it on.
-- 
Intel CPUs are not defective,  |     Henry Spencer at U of Toronto Zoology
they just act that way.        | uunet!attcan!utzoo!henry henry@zoo.toronto.edu

rpw3@amdcad.AMD.COM (Rob Warnock) (08/30/88)

There are a number of companies out there these days who have made major
advances in the art of emulation of one CPU on another, particularly when
the emulated CPU is the IBM PC (of some flavor). These products do a variety
of things (some products do one or more), including: (1) pre-processing of
the to-be-emulated program; (2) straight emulation, but with caching of the
"instruction decode" step; (3) detection of basic blocks and optimization and
caching of the whole basic block; (4) [other things I don't know about...?].
When running the emulator on, say, a Mac or Sun workstation the emulated
speed can exceed the native speed of a PC/AT.

The thought occurs that if one designed a virtual "machine" that was
specifically easy to emulate -- given these modern techniques -- that
this *might* be a suitable form for "portable" object programs (as
contrasted with some "universal intermediate form"). At least it bears
some thought.

(Hmmm... how hard is it for each of the current RISC CPUs to emulate each
of the others?)


Rob Warnock
Systems Architecture Consultant

UUCP:	  {amdcad,fortune,sun}!redwood!rpw3
ATTmail:  !rpw3
DDD:	  (415)572-2607
USPS:	  627 26th Ave, San Mateo, CA  94403

mash@mips.COM (John Mashey) (08/31/88)

In article <22778@amdcad.AMD.COM> rpw3@amdcad.UUCP (Rob Warnock) writes:
.....
>(Hmmm... how hard is it for each of the current RISC CPUs to emulate each
>of the others?)

The R3000 is pretty easy to convert; actually, we used to
convert it to VAX code all of the time, and we've thought about the
conversions to some of the others.  The hardest machines to convert
FROM are those with condition codes (whether in condition code register,
or when computed into another GP register).  They've always been a pain
for emulation, especially if there's any irregularity, and if the
emulating machine doesn't have an almost identical set of conditions.

In fact, just today at lunch, this discussion came up, and I proposed
the R3000 as the obvious architecture to use as the standard binary form
......but there were 2 Sun folks and only one of me, so it got voted down :-)
-- 
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: 	{ames,decwrl,prls,pyramid}!mips!mash  OR  mash@mips.com
DDD:  	408-991-0253 or 408-720-1700, x253
USPS: 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086

bpendlet@esunix.UUCP (Bob Pendleton) (08/31/88)

From article <1988Aug29.202603.13897@utzoo.uucp>, by henry@utzoo.uucp (Henry Spencer):
> In article <958@esunix.UUCP> bpendlet@esunix.UUCP (Bob Pendleton) writes:
>>... The general rule is that hardware
>>dependent problems must be pushed through to the hardware dependent
>>code generator...
> 
> The trouble is that, in the real world, the part of the compiler that
> does not make hardware-dependent decisions is the easy and small part.
> The idea that parsing, type checking, etc. is a big deal is basically
> an academic prejudice in favor of things that are easy to formalize.

Somehow I think I've just been called an academic. Perhaps because I
mentioned some of my academic background in compiler writing. Oh well,
if I didn't generally agree with your opinion of academic compiler
writers I wouldn't feel so offended.

To make another academic reference, go way back into the literature
and take a look at the Hearn, Griss LISP compiler. You might have to
look for references to REDUCE to find the stuff. Their work indicates
that ~90% of all optimization is architecture specific. They looked
at register machines and stack machines and found that optimization
was very dependent on whether you were targeting a stack machine or a
register machine, but not on which specific register or stack machine.

If you want examples of real world products that formalize code
generation for complex machines, take a look at the Ada compiler from
JRS, or the C compiler from QTC. Code generation can be formalized,
and has been formalized. But, I think a lot of academics haven't
noticed. The January 21, 1988 issue of "Electronics" has several
articles on QTC's product. I don't know if JRS is publishing anything.

> These things aren't trivial, mind you, but they're not the hard part
> of a production-quality compiler.  This is what brought on my comment
> about the "intermediate" form being little more than tokenized source.

Ok, I see what you're saying. Have you looked at the Ada Intermediate
Language? I can't claim I've taken a close look, but it looks like the
sort of thing we can expect to see.

But, I still don't agree with you. There are a lot of machine
independent optimizations that can be done on the intermediate form.
There may also be a number of application dependent optimizations that
can be done.

> (Of course, this is
> anathema to the Stallmanites...  Don't expect GNU to support such a
> portable distribution form.)

Why not? If I have a generic machine running GNUix (I know it's not
called that), why should I be barred from buying a commercial software
package? The Stallmanites may think it's wrong to sell software, but
why would they try to stop me from buying software if I want to? I
just plain don't understand Stallman. From reading the GNU manifesto
it looks like he's trying to tell me that hardware engineers and
hardware companies have a right to sell what they produce, but
software engineers and software companies don't. This seems just plain
insane to me. If anyone wants to set me right on this, do it by
private mail. Please don't clutter the net with it.

> Standardizing the content is what we've been talking
> about.  I can see OSF doing either.

Yes, I don't trust OSF, or AT&T/SUN, as far as I can throw a UNIVAC
1108 core memory cabinet (had to do something to tie this into
architectures), but I don't think the OSF member companies would be
willing to give up their proprietary architectures.

> And about whether the programmer was competent enough to make the code
> really portable.  Don't forget that condition.  Since you haven't really
> got source, you can't (readily) go in and fix it if there's a problem.
> This new flexibility also opens the door to a whole new range of bugs,
> since the code can now be run on machines which the author never even
> compiled it on.

Yep, look at the Ada experience for perspective on this problem. That's
why I described declarations in the intermediate form as constraints.
The intermediate form is going to have to be full of constraints. If
the program has a bunch of declarations that require 36-bit one's
complement integers, you can be sure that it will run very slowly on a
68020. But, you can make it run without error if those constraints are
available to the target machine code generator. I'll bet that a
standard-conforming implementation on a 68020 will be allowed to
refuse to translate a program that is a gross mismatch for the target
machine.

			Bob P.
-- 
Bob Pendleton @ Evans & Sutherland
UUCP Address:  {decvax,ucbvax,allegra}!decwrl!esunix!bpendlet
Alternate:     utah-cs!esunix!bpendlet
        I am solely responsible for what I say.

bpendlet@esunix.UUCP (Bob Pendleton) (09/01/88)

From article <22778@amdcad.AMD.COM>, by rpw3@amdcad.AMD.COM (Rob Warnock):

> The thought occurs that if one designed a virtual "machine" that was
> specifically easy to emulate -- given these modern techniques -- that
> this *might* be a suitable form for "portable" object programs (as
> contrasted with some "universal intermediate form"). At least it bears
> some thought.

There really is no difference between an easy-to-emulate virtual
machine and a universal intermediate form. Quads map directly to a
3-address machine, reverse polish maps to a stack machine, and trees
can also be directly executed. (EVAL in microcode anyone? Yes, I know
it's been done.) If the intention is to translate it for the target
machine anyway, why not use a form with as much information left in
as possible?

The problems come from incompatible data formats, addressing modes,
and high-level operations. Most modern computers use 8-bit bytes and
support 16- and 32-bit two's complement integers, so there isn't much
of a problem there. Floating point, packed decimal, and fixed point
might cause some problems.

Not all modern computers are byte addressable, so virtual machine
code that assumes byte addressability isn't going to run well on a
word-addressed machine. Also, if the virtual machine code lays out
data in structures and arrays, you can take a serious performance hit
if the virtual machine doesn't align data to match the addressing
granularity of your machine.

If the virtual machine is specified at too low a level it will be very
difficult to take advantage of any special instructions the target
processor may have. Even something as simple as a block copy
instruction may not be usable if the virtual machine code has loops to
do copies.

So, if you have a universal intermediate language, you can write an
emulator for it and execute it with a minimal amount of preprocessing,
or you can convert it to native machine code, or you can build a
machine that directly executes it.


> Rob Warnock
> Systems Architecture Consultant

-- 
Bob Pendleton @ Evans & Sutherland
UUCP Address:  {decvax,ucbvax,allegra}!decwrl!esunix!bpendlet
Alternate:     utah-cs!esunix!bpendlet
        I am solely responsible for what I say.

chaim@taux02.UUCP (Chaim Bendelac) (09/02/88)

In article <891@taux01.UUCP>, chaim@taux01.UUCP (I) wondered:

>     ...if there is no room for another standard layer, specially 
>     designed for software DISTRIBUTION.  Imagine an Intermediate Program 
>     Representation ... language independent, architecture independent...

I asked:
> 	3. What are the main obstacles? Economical? Political? Technical?
> 	4. What are the other advantages or disadvantages?

Below is a [strongly edited] summary of the discussion so far. It seems to me
that the problem-solvers outvoice the problem-raisers. I somehow have the
feeling that we have not covered many of the problems. Let me try to be more
specific: Let the goal be an IR for distribution purposes, that covers 
"traditional" architectures (you know what I mean) WITHOUT giving one a clear
advantage over others (a particular object-format is out), that covers 
"popular" languages (C, Cobol, Modula-2, Pascal, Fortran, perhaps Ada and 
Lisp), that lives under a REAL unix-standard (AT&T's, or OSF's or who-ever),
and that does not attempt to solve "system-code" - ONLY "application programs"
(rule-of-thumb: if you cannot write the program in any language but C the 
program probably disqualifies). Squeezing the last 1% of performance out of 
system is NOT a goal, but the IR should be optimizable, both before and after 
distribution. "Tokenizing the source" is fine, if that allows me to write ONE
single code generator for all these language translators out there, with a
100%-proof semantic definition of the IR. I want a  R E A L  separation. I
want a "IR code-generator validation suite" to test my code generator, so
that applications can be assured their stuff runs on my machine/architecture.

What are the constraints?


-- Chaim Bendelac (National Semiconductor Corporation)
----------------------------------------------------------------------------

Summary of current status of discussion:


> From: henry@utzoo.uucp (Henry Spencer) it's tricky to build such a
> representation in which the front end doesn't need to know *anything* about
> the machine.  Things like data-type sizes

= From: henry@utzoo.uucp (Henry Spencer) Is the layout of structs in memory 
= decided before or after the IR is generated? What about "sizeof"?  "varargs"?
= you will end up with a tokenized version of the source.  

> From: bpendlet@esunix.UUCP (Bob Pendleton) you can get away with specifying 
> the data radix and the minimum number of digits required. "short int x;"
> can be translated into "x: static allocated signed binary min 16". The 
> translator would translate declarations into constraints on the valid 
> representations of the declared items. 
> The layout of structs must be done by the machine specific code
> generator. "sizeof" becomes a symbolic expression evaluated by the
> code generator. In one system I wrote, all data size computations were done
> in the linker. Worked out very well. The machine independent code for a 
> varargs call could look something like this:	vararg_block 
> 							code for arg 1
> 							code for arg 2
> 							    :
> It's important to remember that this intermediate language must be
> usable by ALL programming languages, not just C. 
> The problems are just not that big. 

= From: cik@l.cc.purdue.edu (Herman Rubin) I could probably list over 1000 
= hardware-type operations which I would find useful.  which algorithm to use 
= would be dependent on the timing of these operations.  one cannot optimize 
= a program without knowing the explicit properties of the target machine.
= We must face the fact that there cannot be efficient portable software.  

> From: mrspock@hubcap.UUCP (Steve Benz) I think the real bugaboo here will 
> be system calls and the like. Granted that they are theoretically identical, 
> but in reality, they're not so.  

= From: aglew@urbsdc.Urbana.Gould.COM In a recent UNIX World Omri Serlin 
= (I think) mentioned that OSF is  considering something with a name like 
= "Architecture Independent Exchange Format" as a challenge to the plethora 
= of ABIs in the AT&T/SUN world.

> From: henry@utzoo.uucp (Henry Spencer) in the real world, the part of the 
> compiler that does not make hardware-dependent decisions is the easy and 
> small part. What this would amount to, almost, is a sort of encrypted source.
> And about whether the programmer was competent enough to make the code
> really portable.  Don't forget that condition.  
> This new flexibility also opens the door to a whole new range of bugs,
> since the code can now be run on machines which the author never even
> compiled it on.

= From: dick@ccb.ucsf.edu (Dick Karpinski) the Gnu C's Register Transfer
= Language (gcc's RTL) does look like a tokenized version of the source.
= I would foresee a sort of validation suite to test both the gcc backend (with 
= the machine description) and system calls on the target system.  

> From: chase@Ozona.orc.olivetti.com (David Chase) RTL isn't a UNCOL, no.  I'm 
> afraid these make it rather non-universal.

= From: rpw3@amdcad.AMD.COM (Rob Warnock) Some companies have made major
= advances in the art of emulation of one CPU on another, particularly when
= the emulated CPU is the IBM PC. If one designed a virtual "machine" that was
= specifically easy to emulate this *might* be a suitable form for "portable" 
= object programs (as contrasted with some "universal intermediate form"). 

> From: mash@mips.COM (John Mashey) The R3000 is pretty easy to convert; the 
> hardest machines to convert FROM are those with condition codes.

-----------------------------------------------------------------------------
-- chaim@nsc

rminnich@super.ORG (Ronald G Minnich) (09/03/88)

In article <127@taux02.UUCP> chaim@taux02.UUCP (Chaim Bendelac) writes:
>In article <891@taux01.UUCP>, chaim@taux01.UUCP (I) wondered:
>>     ...if there is no room for another standard layer, specially 
>>     designed for software DISTRIBUTION.  Imagine an Intermediate Program 
>>     Representation ... language independent, architecture independent...
OK, everybody out there who remembers the IEEE attempt
(what, circa 1978-9?) to promulgate a standard assembly language,
raise your middle hand. :-)
  I don't know about this, given that after 31 years I can't even
get Fortran programs to move easily. 
  Maybe we should see how far the ISO-oids get with their structured
data streams??
ron

bcase@cup.portal.com (09/04/88)

Rob Warnock asks the soon-to-be $64 question:

|(Hmmm... how hard is it for each of the current RISC CPUs to emulate each
|of the others?)

Ah!  Something to which I can speak informedly.  I assume that Rob is really
asking about binary recompilation, but the problems are similar for straight
(not g... oh never mind) emulation.  By far, the biggest problems in
architectural emulation are:

1) Dealing with byte-sex differences (big vs. little endian, that is),
2) Dealing with alignment restrictions (everything aligned to natural
   boundaries?),
3) Dealing with indirect branches.

Number 2 is seldom a problem when you are talking about emulating a RISC
on a RISC since RISC architectures better reflect the realities of memory
system design and call for each data type to be aligned on its natural
boundary.  However, when emulating a machine that allows 32-bit words to
be aligned on byte boundaries on a machine that doesn't, say a 68020 on an
88000, a significant performance hit is taken unless exhaustive analysis
is done.  Even with exhaustive analysis, it might still be necessary to
allow for the worst case (hmmm, this pointer is being passed and then
dereferenced and I have no information about its origin, sh*t, I'll have
to assume that it could be aligned any ol' which way.).

Then there are indirect branches.  What are the targets?  If you can't
tell, say by locating and decoding an associated switch branch address
table or something, then the only 100% safe, bullet-proof thing to do is
to assume that any instruction is a potential target of this branch.  With
that assumption forced upon your binary recompiler, many inter-instruction
optimizations are prohibited.  Another way of stating this problem:  you
can't tell where the basic blocks are.

The indirect branch problem is not an architectural one, but one of loss
of information:  the basic block boundaries that were marked by labels in
the intermediate or assembly form of the program are no longer explicitly
marked.  Note that the C language allows a case within a switch to fall
through to its lexical successor!!!  Thus, for an intermediate language
or virtual machine architecture to be usefully portable, this information
must not be compiled or linked away.

As tough as this problem is, things like alignment restrictions and single-
sized instructions, RISC characteristics for the most part, make this problem
easier to handle.  There are a couple of tricks, not necessarily without
tradeoffs, that can be played.  But I'm not telling....

Other things like lots of registers help, especially if the machine being
emulated has a condition code definition different from the host machine
(virtually guaranteed; at least the SPARC has a bit that determines when
the condition codes are modified, and, when they are, they are modified
in the same way by all instructions).

Even under the best circumstances, you don't get something for nothing;
emulation or recompilation isn't a panacea.  If it were, we'd all just
design our favorite machine and then buy Zycad simulators to "run" them!

bcase@cup.portal.com (09/04/88)

Re:  architectural emulation

|The problems come from incompatable data formats, addressing modes,
|and high level operations. Most modern computers use 8 bit bytes and
|support 16 and 32 bit twos complement integers so there isn't much of
|a problem there. Floating point, packed decimal, and fixed point might
|cause some problems.

Ha ha!  I wish the problem were as easy as "most modern computers use 8
bit bytes ... so there isn't much of a problem there."  Alignment and
byte-sex incompatibilities can make it not worth the trouble.  Floating-
point can indeed be a problem:  try making extendeds run fast on a
machine that doesn't support them....  (SPARC does, does anyone else?)

pardo@june.cs.washington.edu (David Keppel) (09/04/88)

Yes, what we need is a high-level machine language that we can
translate our programs into and then is sufficiently general that we
can compile *that* efficiently into native machine code.

How about C, the portable assembler?  :-) :-) :-)

	;-D on  ( Looking for a Potable assembler )  Pardo
-- 
		    pardo@cs.washington.edu
    {rutgers,cornell,ucsd,ubc-cs,tektronix}!uw-beaver!june!pardo

aglew@urbsdc.Urbana.Gould.COM (09/05/88)

..> A standard intermediate language for software distribution:
>There really is no difference between an easy-to-emulate virtual
>machine and a universal intermediate form. Quads map directly to a
>3-address machine, reverse polish maps to a stack machine, and trees
>can also be directly executed. (EVAL in microcode anyone? Yes, I know
>it's been done.) If the intention is to translate it for the target
>machine anyway, why not use a form with as much information left in
>as possible?

The form with as much information left in as possible is the source code.

The purpose of an intermediate language for software distribution 
is to REMOVE information, namely that information that would let
the customer reproduce the original program easily.

aglew@urbsdc.Urbana.Gould.COM (09/05/88)

>Yes, what we need is a high-level machine language that we can
>translate our programs into and then is sufficiently general that we
>can compile *that* efficiently into native machine code.
>
>How about C, the portable assembler?  :-) :-) :-)
>
>		    pardo@cs.washington.edu

Treating this seriously - C isn't acceptable because the customer
can take the C code and modify it relatively easily.

Why do we want intermediate code distributions? So that software
vendors can sell code from which it is difficult or impossible to
reproduce the original HLL code, therefore making it difficult for
their customers to "steal" software products.
    The rest of us, who aren't software vendors, can and should 
continue to distribute source code.

chase@Ozona.orc.olivetti.com (David Chase) (09/07/88)

In article <28200195@urbsdc> aglew@urbsdc.Urbana.Gould.COM writes:
>
>>Yes, what we need is a high-level machine language that we can
>>translate our programs into and then is sufficiently general that we
>>can compile *that* efficiently into native machine code.
>>
>>How about C, the portable assembler?  :-) :-) :-)
>>
>>		    pardo@cs.washington.edu
>
>Treating this seriously - C isn't acceptable because the customer
>can take the C code and modify it relatively easily.
>

C loses technically, also.  It does not allow a code generator to
take the address of a label.  It does not allow a code generator to
reference the PC.  It provides no guarantees about register assignment
(not allocation; assignment).  These three non-features preclude the
use of some delightful tricks in the compilation of a language with
exceptions and exception handling (yes, I know about setjmp; it allows
a solution to our problem, but it is an inefficient solution).

C also loses as an intermediate language when the source language uses
nested procedures.  Again, you CAN do it, but it isn't very pretty.

C compilers also make pessimistic assumptions about aliasing, and
there is no way for the front end to communicate what it knows
to the C compiler.

The C "volatile" keyword is also overkill.  Efficient compilation of
exceptions is helped by more detailed descriptions of volatile change
and reference.  (For example, "volatile out" meaning that writes
cannot be optimized away, but reads can.)  This can be achieved,
painfully, by use of non-volatile temporaries to simulate caching of
values in registers.

Even if the front end does get very clever and performs register
allocation in C, it cannot know how many registers there are and it
cannot know how they are organized (do floats and ints share
registers?  How about floats and doubles?).

See, I've been figuring out how to use C as an intermediate language
in the last few months, and it really doesn't measure up.

David

pardo@june.cs.washington.edu (David Keppel) (09/07/88)

>>>[ portable intermediate representation ]

>pardo@cs.washington.edu writes:
>>How about C, the portable assembler?  :-) :-) :-)

aglew@urbsdc.Urbana.Gould.COM writes:
>[ High-level IR (intermediate representation) needed for distribution ]
>[ The customer can modify C too easily ]

I'll claim that I can pretty easily write a program that takes
ordinary C programs and makes them gosh-almighty hard to understand.
Here are some things I can do:

* Change names so that none of them are meaningful.
* Perform function inlining so that content is replicated (e.g., hard
  to find).
* Use random rewrite rules to change the structure of native
  constructs so that the code has no apparent style.
* Intentionally include dead code that is removed by the optimizer.
* Perform non-local data motion.
* Study the Obfuscated C Code competition very closely :-)

As a simple example:

    void
foo(n)
    int		n;
{
    int		i;

    for (i=0; i<n; ++i)
	howdy( "Hello, world\n" );
}


    void
x(p)
    int		p;
{
    const char	*m = "Hello, world\n";
    int		r;
    int		z;

    r = 0;
loop:
    if (r>=p)
	return;
    else {
	a (m), z=r++;
	goto loop;
    }
    a (z);
    exit (0);
}

A fair optimizer should make the same code for both.  Now C is
probably sub-optimal for some (language) distributions, but you can
perform the same "tricks" with nearly any language.

BTW, I have a tool that takes unoptimized (.o) output from pcc on a
VAX and turns it back into something close to the original code.
Having a high-level IR may not gain the vendors anything over an
existing "standard" such as C.

	    ;-D on  ( Clear as a bell )  Pardo
-- 
		    pardo@cs.washington.edu
    {rutgers,cornell,ucsd,ubc-cs,tektronix}!uw-beaver!june!pardo

yba@sword.bellcore.com (Mark Levine) (09/07/88)

In article <5642@june.cs.washington.edu> pardo@cs.washington.edu (David Keppel) writes:
>Yes, what we need is a high-level machine language that we can
>translate our programs into and then is sufficiently general that we
>can compile *that* efficiently into native machine code.
>

In days gone by I was quite fond of MARY, a language in use (still, I
think) at the University of Trondheim and on various North Sea oil
drilling platforms.  It was one of several topics of ye olde Machine
Oriented Languages Bulletin from IFIPS, and even showed up at IFIPS
w.g. 2.4 while I watched.  Since those days, I have seen few attempts
at high level machine oriented languages.

The folks at Tartan seemed to have the best shots at an intermediate
representation for these purposes, based on Wulf et al.'s work at
CMU.

MARY would require you to have an implementation prelude for each machine
on which you compiled, to give machine-specific definitions of basic
constructs (allowing for assembler-level optimization, since the language
guarantees access to all machine ops).  You could also supply preludes
which defined the semantics of operations from other architectures and
what to do with them on a new machine.  The language is fully typed,
has precedence-less left-to-right evaluation, and has the notion of
a current value -- this makes it easy to program, if somewhat unnatural
for those trained to think an assignment statement has the variable on
the left.

There is an awful lot of good stuff on the shelf, but I think it stays
there as long as proprietary interests remain larger than portability
concerns.  Given the recent movement toward UNIX OS standards, window
system standards, and ignoring Ada, perhaps it is time for some archaeology;
but, how serious is anyone's interest in a really portable machine oriented
high level language?

sher@sunybcs.uucp (David Sher) (09/07/88)

Out of curiosity, since I've not seen this question asked: if the problem
is that source code is too easy to play with, why not distribute encoded
source?  Then one needs only create compilers with decoders built in for
each machine.  Since compilers are purchasable this seems possible 
(building in a decoder is trivial).  This would be far more secure than 
any intermediate form, which can be messed with and redistributed (good 
luck in proving that code x really = code y in court).  I can imagine
all sorts of semantics-preserving program transformations on 3-address
object code, say, that would render the program unrecognizable.  I realize
that you can always play this game with the compiled code (decoders in
hardware anyone?).


-David Sher
ARPA: sher@cs.buffalo.edu	BITNET: sher@sunybcs
UUCP: {rutgers,ames,boulder,decvax}!sunybcs!sher

mash@mips.COM (John Mashey) (09/07/88)

In article <848@sword.bellcore.com> yba@sabre.bellcore.com (Mark Levine) writes:
>In article <5642@june.cs.washington.edu> pardo@cs.washington.edu (David Keppel) writes:
>>Yes, what we need is a high-level machine language that we can
>>translate our programs into and then is sufficiently general that we
>>can compile *that* efficiently into native machine code.
.....
>There is an awful lot of good stuff on the shelf, but I think it stays
>there as long as proprietary interests remain larger than portability
>concerns.  Given the recent movement toward UNIX OS standards, window
>system standards, and ignoring ADA, perhaps it is time for some archaeology;
>but, how serious is anyone's interest in a really portable machine oriented
>high level language?

1) high-level machine language definitely != C (suggested sometime
earlier in this sequence).  Not enough semantics.

2) one interesting possibility is Stanford's U-code,
which Fred Chow's dissertation used to show a machine-independent
optimizer for several machines (Stanford MIPS, 68K, others).

3) MIPSco took this, and extended as necessary as C was beefed up
(volatile needs to get thru, etc), and PL/1, COBOL, and Ada were added.
A few things in the optimizer got slightly machine-dependent in the process,
although I don't think inextricably so.

4) As I understand it [correct me if wrong], the HP Precision compilers
started from Stanford U-code also, and I assume were extended, too.

It is NOT that unreasonable to have an intermediate code that
is language-independent (covering at least some languages),
and reasonably target-independent.  This is a great boon to vendors
who like to support highly-integrated compiler systems,
and their customers who like them also.  Whether or not it solves
the other problems that started this discussion is yet to be known.
-- 
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: 	{ames,decwrl,prls,pyramid}!mips!mash  OR  mash@mips.com
DDD:  	408-991-0253 or 408-720-1700, x253
USPS: 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086

henry@utzoo.uucp (Henry Spencer) (09/08/88)

In article <963@esunix.UUCP> bpendlet@esunix.UUCP (Bob Pendleton) writes:
>... Have you looked at the Ada Intermediate
>Language? I can't claim I've taken a close look, but it looks like the
>sort of thing we can expect to see.

I saw some of the earlier stuff leading into it, but haven't looked at
AIL itself.

>> (Of course, this is
>> anathema to the Stallmanites...  Don't expect GNU to support such a
>> portable distribution form.)
>
>Why not?

Distribution with sources is Good.  Distribution without sources is Evil.
It's as simple as that.  GNUnix might include something to handle such a
form if it were truly universal, but this would be in the spirit of
grudging adaptation to an unpleasant reality.  Making it easier to send
out software without source is precisely what Stallman does *NOT* want.

>> This new flexibility also opens the door to a whole new range of bugs,
>> since the code can now be run on machines which the author never even
>> compiled it on.
>
>Yep, look at the Ada experience for perspective on this problem. That's
>why I described declarations in the intermediate form as constraints.
>The intermediate form is going to have to be full of constraints. If
>the program has a bunch of declarations that require 36-bit one's
>complement integers, you can be sure that it will run very slowly on a
>68020. But, you can make it run without error if those constraints are
>available to the target machine code generator...

You miss my point.  36-bit ints can be dealt with in the software fairly
well.  What I'm thinking of is much more subtle things that the compiler
can't easily discover and put in the intermediate form, e.g. "this program
depends on being able to dereference NULL pointers".  Or, for that matter,
"the details of the arithmetic in this program assume that integers are
at least 36 bits".  This is the sort of thing that will not be known to
the compiler unless the programmer is explicit about it -- and lack of
programmer attention to such details is exactly the problem.
-- 
Intel CPUs are not defective,  |     Henry Spencer at U of Toronto Zoology
they just act that way.        | uunet!attcan!utzoo!henry henry@zoo.toronto.edu

ok@quintus.uucp (Richard A. O'Keefe) (09/08/88)

In article <1076@cs.Buffalo.EDU> sher@wolf.UUCP (David Sher) writes:
>Out of curiosity since I've not seen this question asked, if the problem
>is that source code is too easy to play with, why not distribute encoded
>source.

If you have a product which you want to make available on several systems,
and the language you are using is the same on those systems, this is quite
an effective method.  We have an add-on product for our Prolog system which
is distributed as encrypted source.  This means that we have to maintain a
single kit of files, not a separate kit for each supported machine type.

There is, however, one reason why people might want to have a common
intermediate form, and that is that the customers for one's product might
not have the compiler you want.  If you have Fortran/Pascal/C/Modula/..
compilers sharing a common intermediate representation and back end, then
the customer only needs to have the back end to install a CIR distribution,
but with encrypted source he needs each compiler.

henry@utzoo.uucp (Henry Spencer) (09/09/88)

In article <1076@cs.Buffalo.EDU> sher@wolf.UUCP (David Sher) writes:
>Out of curiosity since I've not seen this question asked, if the problem
>is that source code is too easy to play with, why not distribute encoded
>source...

Could be done, and I think it is being done by some groups.  It is a
weaker form of protection, though, because you can recover full source
by decrypting, and the compilers have to know how to do that.  Anything
that the compiler knows how to do can, in principle, be analyzed and
duplicated by a sufficiently determined programmer with a disassembler.
Especially if he can then sell his decrypter as a commercial product.
Look at how copy-protection schemes on PCs have fared.

This is especially a problem if we're talking about a vendor-independent
scheme, which of necessity has to be known to many people.

The advantage of the "intermediate"-form approach is that quite a bit of
information is lost during the transformation from source (e.g. one would
presumably lose the spellings of most identifiers), and this information
cannot be recovered just by being clever.
-- 
Intel CPUs are not defective,  |     Henry Spencer at U of Toronto Zoology
they just act that way.        | uunet!attcan!utzoo!henry henry@zoo.toronto.edu

bpendlet@esunix.UUCP (Bob Pendleton) (09/15/88)

From article <1988Sep7.210317.5781@utzoo.uucp>, by henry@utzoo.uucp (Henry Spencer):
> In article <963@esunix.UUCP> bpendlet@esunix.UUCP (Bob Pendleton) writes:
> 
> Distribution with sources is Good.  Distribution without sources is Evil.

Oh, great... GOOD versus EVIL! Send me your definition of good and
evil and how you feel about absolute moral values and I'll send you
mine. Then we can debate the whole issue. But PLEASE do it off line.
I've found that very few people are much interested in these kinds of
debates. 

>>68020. But, you can make it run without error if those contraints are
>>available to the target machine code generator...
> 
> You miss my point.  36-bit ints can be dealt with in the software fairly
> well.  What I'm thinking of is much more subtle things that the compiler
> can't easily discover and put in the intermediate form, e.g. "this program
> depends on being able to dereference NULL pointers".  Or, for that matter,
> "the details of the arithmetic in this program assume that integers are
> at least 36 bits".  This is the sort of thing that will not be known to
> the compiler unless the programmer is explicit about it -- and lack of
> programmer attention to such details is exactly the problem.

I've addressed this problem in another posting, but what the hey, I'll
do it again. To be truly portable the intermediate form MUST address
the issues you mention. Even if the source language doesn't define the
semantics of dereferencing NULL pointers, the intermediate form must
define the semantics of dereferencing NULL pointers. Otherwise, just
as C code that counts on being able to dereference NULL pointers is
not fully portable, an intermediate form that doesn't define the
semantics of dereferencing NULL pointers will not be fully portable.

No matter what the source language, a compiler that generates a
portable intermediate form will have to explicitly state the
assumptions it is making about things like word size, arithmetic, and
the semantics of dereferencing NULL pointers, or conform to the
semantics defined for these things in the portable intermediate form.
Otherwise it just won't work.

Yes, that means that C compilers will have to put information into the
intermediate form that does not derive from any programmer provided
declarations. That indicates a flaw in C, not a problem with the idea
of a portable intermediate language. 

			Bob P.
-- 
Bob Pendleton @ Evans & Sutherland
UUCP Address:  {decvax,ucbvax,allegra}!decwrl!esunix!bpendlet
Alternate:     utah-cs!esunix!bpendlet
        I am solely responsible for what I say.

henry@utzoo.uucp (Henry Spencer) (09/18/88)

In article <970@esunix.UUCP> bpendlet@esunix.UUCP (Bob Pendleton) writes:
>> Distribution with sources is Good.  Distribution without sources is Evil.
>
>Oh, great... GOOD versus EVIL! Send me your definition of good and
>evil and how you feel about absolute moral values and I'll send you
>mine. Then we can debate the whole issue. But PLEASE do it off line.

You miss the point; these are not my beliefs, but Richard Stallman's, and
by extension, those of the Gnu project.  This is why I'd expect them to
be very unenthusiastic about anything that would facilitate sourceless
distribution.  We are indeed talking about absolute moral values, not mere
considerations of tactics.

>> ... What I'm thinking of is much more subtle things that the compiler
>> can't easily discover and put in the intermediate form, e.g. "this program
>> depends on being able to dereference NULL pointers".  Or, for that matter,
>> "the details of the arithmetic in this program assume that integers are
>> at least 36 bits"...
>
>... To be truly portable the intermediate form MUST address
>the issues you mention. Even if the source language doesn't define the
>semantics of dereferencing NULL pointers, the intermediate form must
>define the semantics of dereferencing NULL pointers.

Unfortunately, it *can't*, without being machine-specific.  Some machines
allow it; some do not.  If the intermediate form allows dereferencing NULL,
then the intermediate form's pointer-dereference operation is inherently
expensive on machines which do not permit dereferencing NULL, making it
impossible to generate good code from the intermediate form.  If the
intermediate form forbids it, then the compilers must guarantee that
no program will try to do so... which for normal compilers will boil down
to inserting run-time checks, again making efficient code impossible.
This is an inherently unportable issue, which an intermediate form MUST
NOT try to resolve if it is to be both efficient and portable.

>Yes, that means that C compilers will have to put information into the
>intermediate form that does not derive from any programmer provided
>declarations. That indicates a flaw in C, not a problem with the idea
>of a portable intermediate language. 

This is like saying that the impossibility of reaching the Moon with a
balloon indicates a flaw in the position of the Moon, not a problem with
the idea of using balloons for space travel!  All of a sudden, our
universal intermediate form is useless for most of today's programming
languages, unless the compilers are far more sophisticated than current
ones.  (NULL pointers are a C-ism, but deducing the size of integers that
the program's arithmetic needs is a problem for most languages.)

I assumed that we were talking about *practical* portable intermediate
forms, ones that could be used with current languages and current compiler
technology.
-- 
NASA is into artificial        |     Henry Spencer at U of Toronto Zoology
stupidity.  - Jerry Pournelle  | uunet!attcan!utzoo!henry henry@zoo.toronto.edu

itcp@ist.CO.UK (News reading a/c for itcp) (09/21/88)

I have felt in my bones that an efficient Intermediate Language for
conventional processors (MC680xx, iAPX386, VAXen, NS32xxx and all RISC
architectures) is a realistic proposition. This discussion has encouraged me
to think that I am not alone (and thus less likely to be wrong?).

As people have noted, it has to have something like the functionality of
C, only with extensions to allow (where the source language required
it) the specific semantics of a data type (storage size and address
alignment) and operation (precision of operation).  Use of these
specifications may reduce performance on some architectures so the IL
includes unspecified versions where the semantics are `as loose as
possible' to allow local optimisations (block move, particular integer
length, etc.).  The code generator for this IL is larger and more
complex than for C.  It may not be possible to support an architecture
not in the list above efficiently - to try (in some standards committee
composed of any processor manufacturer who wanted in) would doom the
project.

But, I would like to see this not for Software Distribution but for the
development of Programming Languages.  The definition of such an IL and
the wide availability of code generators would do for Programming
Languages what UNIX and C did for Processor Architectures.  It provides
a portability route that drastically reduces the time and costs of
getting to market.  By concentrating code generation in a single place
it should also allow advances in code optimisation techniques and
processor architecture design (unless the IL is not general enough and
ends up constraining it :-( ).

Good though this would be for the advancement of Computer Science I
cannot see it being commercial.  That is, I could not imagine a
Company that produces the IL definition and sufficient code generators
and compiler front ends to establish a momentum making a profit. :-(.
Maybe Stallman and the FSF could do it, how they pay the rent beats me.

For software distribution I think Doug Keppel has the most pragmatic
and cost effective solution - the use of obfuscated C source as the IL:

From article <5655@june.cs.washington.edu>, by pardo@june.cs.washington.edu (David Keppel):
>>[ High-level IR (intermediate representation) needed for distribution ]
>>[ The customer can modify C too easily ]
> 
> I'll claim that I can pretty easily write a program that takes
> ordinary C programs and makes them gosh-almighty hard to understand.
> [example follows]
> 

[Usual disclaimer: this represents only my hastily assembled opinion and
		   spelling, and not necessarily anyone else's]

	Tom (itcp@uk.co.ist)

shankar@hpclscu.HP.COM (Shankar Unni) (09/24/88)

> I have felt in my bones that an efficient Intermediate Language for
> conventional processors [(examples)] is realistic proposition....
> 
> [Description of IL..]
> 

Such a piece of research was done years ago at Stanford ("Ucode";
exact reference not available at the moment).

> Good though this would be for the advancement of Computer Science I
> cannot see it being commercial.  That is, I could not imagine a
> Company that produces the IL definition and sufficient code generators
> and compiler front ends to establish a momentum making a profit. :-(.

Well, surprise, surprise. Both HP and MIPS use such an intermediate language
for their RISC processors. And at least one of them is making a profit :-).
(Disclaimer: I have no information on the other. No flames..) So there!
--
Shankar.

bpendlet@esunix.UUCP (Bob Pendleton) (09/24/88)

>In article <970@esunix.UUCP> bpendlet@esunix.UUCP (Bob Pendleton) writes:
>>> Distribution with sources is Good.  Distribution without sources is Evil.
>>
>>> ... What I'm thinking of is much more subtle things that the compiler
>>> can't easily discover and put in the intermediate form, e.g. "this program
>>> depends on being able to dereference NULL pointers".  Or, for that matter,
>>> "the details of the arithmetic in this program assume that integers are
>>> at least 36 bits"...
>>
>>... To be truly portable the intermediate form MUST address
>>the issues you mention. Even if the source language doesn't define the
>>semantics of dereferencing NULL pointers, the intermediate form must
>>define the semantics of dereferencing NULL pointers.
>
>Unfortunately, it *can't*, without being machine-specific.

Try this scenario:

There are two kinds of computers in the world, brand X and brand Y.
Brand X computers define the value pointed to by a NULL pointer to be
a NULL value. That is, the load indirect instruction given the value
that C uses for NULL is guaranteed to return NULL.  On the other hand
brand Y computers core dump if you try to load a value from the
address that is equivalent to NULL.

In all other respects X and Y computers are similar enough in word
size, data formats, and so on, that software that doesn't dereference
NULL ports easily from one brand of machine to the other.

Let's assume that on both brands of computers people want code to run
as fast as possible. So, the native code generators for the machines
will generate the shortest possible code sequence for dereferencing a
pointer.  Of course they don't want to do run time checking to see if
a NULL pointer is being dereferenced if they don't have to.

A programmer uses brand X computers. He writes a pointer chasing
program that assumes that *NULL == NULL. He's using a compiler suite
that generates code in UIF (Universal Intermediate Form). Now he
distributes the UIF to people with both brand X and brand Y computers.
They run it through their UIF to machine code translators and run the
code. What happens?

Well that depends on the definition of UIF. If UIF ignores the *NULL
problem then the code will run on brand X computers and bomb on brand
Y computers. But, if UIF allows a compiler to put a flag in the UIF
that says that *NULL == NULL, or if UIF defines *NULL == NULL, then
the code will run on brand Y machines, but with a speed penalty caused
by the run-time checks that the code generator had to insert to comply
with the brand X compiler's request that *NULL == NULL.

So, the compiler that runs on brand X machines must, at least, put a
flag in the UIF stating that dereferencing NULL is allowed. The
compiler on brand Y machines should state that dereferencing NULL is
not allowed. That way the code can be made to run on any machine,
though with a performance hit when the original compiler's assumptions
don't match the reality of a specific machine. Obviously the compilers
and code generators for brand X machines are going to be set up to
produce good code for brand X computers and the same is true for brand
Y computers.  But, it is still possible for UIF code generated for one
machine to be translated to be run on the other machine.

So, to restate what I've said so many times (am I getting boring yet?):

UIF must, at the very least, require that machine dependent
assumptions be stated in the UIF form of a program. If the assumptions
made by the original compiler and the target machine are a close match
then the program will run efficiently on the target machine. If the
assumptions don't match then the program will still run, it just won't
run as fast as it might have.

This means that the UIF is not machine specific, but programs that
make machine specific assumptions will pay a penalty when they are run
on machines that don't support their assumptions.

>If the intermediate form allows dereferencing NULL, then the
>intermediate form's pointer-dereference operation is inherently
>expensive on machines which do not permit dereferencing NULL, making
>it impossible to generate good code from the intermediate form.

It would seem that our definitions of "good code" are very different.
My definition requires that the code do what I told it to do. As I've
tried to point out, not everything I say in a program is explicit in
the source code.  Several critical declarations are made by default
based on the computer I'm using, the compiler I'm using, and the
operating system I'm using. A complete set of declarations for a
program includes all these things. For a compiler to generate code
that matches the complete declaration of a program on a machine other
than the one it was designed for may require that code sequences be
generated that slow the program down. That's engineering folks, but it
isn't impossible. By my definition, it's even good. I would prefer
that programmers not write code that does things like dereferencing
NULL. But if the language allows it, I want to support it and make it
portable.

>>Yes, that means that C compilers will have to put information into the
>>intermediate form that does not derive from any programmer provided
>>declarations. That indicates a flaw in C, not a problem with the idea
>>of a portable intermediate language. 
>
>This is like saying that the impossibility of reaching the Moon with a
>balloon indicates a flaw in the position of the Moon, not a problem with
>the idea of using balloons for space travel!

This is a very good example of the use of a false analogy to build a
strawman argument.

>All of a sudden, our
>universal intermediate form is useless for most of today's programming
>languages, unless the compilers are far more sophisticated than current
>ones.  (NULL pointers are a C-ism, but deducing the size of integers that
>the program's arithmetic needs is a problem for most languages.)

This is a good example of justifying a false conclusion with a false
premise.

I can't find anything about requiring compilers to deduce number
ranges anywhere in my author_copy file. What I keep saying is that
the compiler must explicitly state its ASSUMPTIONS in the UIF form of
a program. If the compiler can deduce number ranges, then it would be
nice if it passed that information along in the UIF. If the compiler
assumes that NULL can be dereferenced, as it would on a computer with
hardware that allows it, then the compiler must state that fact in the
UIF it generates.

>I assumed that we were talking about *practical* portable intermediate
>forms, ones that could be used with current languages and current compiler
>technology.

An ad hominem attack on my credibility? Incredible! But I'll address
it anyway.

No, I've been talking about old languages like C, COBOL, BASIC, LISP,
FORTRAN, Pascal, and MODULA-2. I've worked on compilers or
interpreters for, or in, all of these languages. These languages
comprise a small subset of the off the wall languages I've used and/or
implemented over the last 17 years. So I'm convinced I know a little
something about them.

Anyway, it's very hard to keep up with all the current languages being
developed; there are so many of them. :-)

As for practical, I've already cited examples of commercial products
that aren't far from using a UIF.

One of the problems I think we've had with this entire exchange is
that it has centered around C. C is not yet standardized, and because
it was intended to be a systems programming language C has always
tolerated machine dependent variations in the semantics of some of its
operators. I believe the variation has been tolerated because it was
believed to be justified by the resulting increase in speed. I believe
Henry published a paper that showed that using better algorithms is
much better than using nonportable hardware features.

If this discussion had centered around COBOL or BASIC there would have
been little to discuss because the standards for these languages already
require source level declarations that solve most of the problems we have
been discussing. 

In the long run I think that the kind of discipline that could result
from the use of a UIF would be a very good thing.

			Bob P.
-- 
Bob Pendleton @ Evans & Sutherland
UUCP Address:  {decvax,ucbvax,allegra}!decwrl!esunix!bpendlet
Alternate:     utah-cs!esunix!bpendlet
        I am solely responsible for what I say.

lamaster@ames.arc.nasa.gov (Hugh LaMaster) (09/25/88)

In article <340@istop.ist.CO.UK> itcp@ist.CO.UK (News reading a/c for itcp) writes:
:
>I have felt in my bones that an efficient Intermediate Language for
>conventional processors (MC680xx, iAPX386, VAXen, NS32xxx and all RISC
>architectures) is realistic proposition. This discussion has encouraged me
:
>As people have noted it has to have something like the functionality of
>C, only with extensions to allow (where the source language required
>it) the specific semantics of a data type (storage size and address
>alignment) and operation (precision of operation).  Use of these
>specifications may reduce performance on some architectures so the IL
:
>Good though this would be for the advancement of Computer Science I
>cannot see it being commercial.  That is, I could not imagine a
I know many people will argue with this, so, feel free to argue - 
but here goes anyway (Hugh LaMaster's $.02):

Prediction: In 4-6 years vector microprocessors will be "conventional"-
they will not have replaced current architectures, but they will be out there,
and will be fairly cheap.

Request: If anyone is actually contemplating creating such a fairly
portable IL, please include linear arrays (of whatever your basic data 
types are) in your IL, so people can write conforming
vectorizing front ends and vector generating code generators.

The cost of including it in the IL is practically nil.  It is trivial to
generate code for a non-vector machine from vectorized IL, and, in fact,
it makes certain optimizations much easier, so it will usually result in
faster code on pipelined machines.  Non-vectorizing front-ends will still
work just fine on vector machines (the generated code will not be as fast
as possible, of course).

Also, please do not make assumptions about the size of addresses, or the
interchangeability of integers, addresses, chars, or floating point, 
in your IL.  It should not be necessary, and, a smart code generator 
will be able to optimize as necessary for specific architectures.

Please note that CDC has presumably come up with an IL which is portable
in the above sense: 
In order to solve the MxN problem for their machines, they decided to
build a set of compilers around common, portable front ends (written
in a common C-like (but not C) language), with vector constructs, and
with the ability to target multiple back end machines with different
integer and address sizes (I believe all the target machines have
64-bit floating point, but I don't know if any assumptions are made there).
Anyway, I don't know if the IL has been published.  I am sure the compilers
haven't been! (How many person-years of development?)  Anyway, I think
that this, and other examples, may be an existence proof.  But there are
many subtleties to defining a portable IL, and I don't think it is a trivial
job. 
 


-- 
  Hugh LaMaster, m/s 233-9,  UUCP ames!lamaster
  NASA Ames Research Center  ARPA lamaster@ames.arc.nasa.gov
  Moffett Field, CA 94035     
  Phone:  (415)694-6117       

lamaster@ames.arc.nasa.gov (Hugh LaMaster) (09/25/88)

In article <650004@hpclscu.HP.COM> shankar@hpclscu.HP.COM (Shankar Unni) writes:

>Well, surprise, surprise. Both HP and MIPS use such an intermediate language
>for their RISC processors. And at least one of them is making a profit :-).
>(Disclaimer: I have no information on the other. No flames..) So there!

I am not sure what Kuck and Associates uses in their vectorizer, or what
Pacific Sierra Research uses, but there are other commercial products out
there from third parties.  

Have the IL's for any of these ever been published?  Would the IL itself
be considered to be proprietary, or just the code which uses it?


-- 
  Hugh LaMaster, m/s 233-9,  UUCP ames!lamaster
  NASA Ames Research Center  ARPA lamaster@ames.arc.nasa.gov
  Moffett Field, CA 94035     
  Phone:  (415)694-6117       

cik@l.cc.purdue.edu (Herman Rubin) (09/25/88)

In article <15440@ames.arc.nasa.gov>, lamaster@ames.arc.nasa.gov (Hugh LaMaster) writes:
> In article <340@istop.ist.CO.UK> itcp@ist.CO.UK (News reading a/c for itcp) writes:
> :
> >I have felt in my bones that an efficient Intermediate Language for
> >conventional processors (MC680xx, iAPX386, VAXen, NS32xxx and all RISC
> >architectures) is realistic proposition. This discussion has encouraged me

An intermediate language should exist, which should include everything that
these machines and others can do.  But we should realize that many, if not
most, machine operations do not exist on many machines.

> >As people have noted it has to have something like the functionality of
> >C, only with extensions to allow (where the source language required
> >it) the specific semantics of a data type (storage size and address
> >alignment) and operation (precision of operation).

The differences are even greater.  There are operations which are hardware
on some machines, and so clumsy, difficult, or expensive on others
that any decision as to whether or not to use them should be highly machine
specific.  I know of no machine for which I would attempt to restrict a
programmer to HLLs.

			..............

> I know many people will argue with this, so, feel free to argue - 
> but here goes anyway (Hugh LaMaster's $.02):
> 
> Prediction: In 4-6 years vector microprocessors will be "conventional"-
> they will not have replaced current architectures, but they will be out there,
> and will be fairly cheap.

I agree that vector microprocessors will be fairly cheap.  But which type of
architecture?  I am familiar with several of them.  I have used the CYBER 205,
and it has useful instructions which are not vectorizable at all, or
vectorizable only with difficulty and at considerable cost, on vector register
machines.  Or will we be using massive parallelism?  Try procedures which
are necessarily branched on vector or (even worse) parallel processors.
Some of them can be reasonably done on stream machines, but they are likely
to be difficult on vector register machines, and almost unworkable on SIMD
machines.  An IL should be highly expressive, with an easy-to-use (from
the human standpoint) syntax.  But if it is good, many of its features will
be directly usable on only a few machines.  There seem to be more useful
constructs, Hugh, than are in your philosophy.
-- 
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907
Phone: (317)494-6054
hrubin@l.cc.purdue.edu (Internet, bitnet, UUCP)

bcase@cup.portal.com (09/26/88)

>> I have felt in my bones that an efficient Intermediate Language for 
>> conventional processors [(examples)] is realistic proposition....       
>Such a piece of research was done years ago at Stanford ("Ucode".             
>Exact reference not available at the moment).                                 
                                                                               
Well, Ucode doesn't really fit the needs here, and certainly "machine-         
independent representation for the distribution of application                 
programs" wasn't the point behind Ucode.  For something closer, but            
not quite there yet, see the work done at DECWRL:  "The Mahler                 
Experience:  Using an Intermediate Language as the Machine Description,"
by David Wall & Michael Powell, WRL Research Report 87/1; this is just
one piece of the great research done for the Titan/MultiTitan project.

Note that one of the major benefits of a MIIL is that a manufacturer
can release a new version of a machine and the poor users of the old
machine won't have to throw away all their software if they want the
features of the new machine.  Notice that this is the promise and, by
and large, the delivery of the current IBM PC and Mac lines, but the
level of compatibility, at the processor instruction set, is too low.  
At least the Mac II lets you install display systems with impunity
(and type in any combination up to six at once!) *without* having to
manually install different drivers in every application, etc. etc.
                                           
But note further that a MIIL can't be limited to just a standard for
expressing application algorithms, it must also specify a great deal
about the operating system (geeze, call it BIOS or TOOLBOX if you're
a little insecure at this point).  For the Mac line, this should not
be a very hard thing to do; the IBM PC world is a little more cloudy.

As an example of this, I was always pleased that I could take a binary
program from 4.2 BSD and run it on DEC's Ultrix, most of the time.  I
know that doesn't exhibit a MIIL, but it does show what kind of
operating system specification is needed.

As someone said earlier in a posting, what we need for processor 
instruction sets is what UNIX provides for computer operating systems
(please no flames about how *good* UNIX is; I am just trying to say
that the idea of a standard *interface* is there).

And, we don't have to have just *one* MIIL; why not have many?  Then,
if you want access to a certain application, you must have the compiler
for the MIIL in which it is written.  This allows more money to be charged
for the more sophisticated MIILs, thus satisfying the marketing types
among us.  And it also reflects the *fact* that no one MIIL will be
sufficient for all time.  Instead of embellishing *one* MIIL forever,
until it becomes a CISC, we can have one MIIL for simple, procedural
languages, C, PASCAL, etc., one for Object oriented languages, one for
ADA (for which we can charge MEGA BUCKS because the military will
want it!!!), etc. etc.  Also, if MIIL specifications are made public,
we can all compete for the MIIL market by writing better (faster,
smaller, etc.) MIIL compilers.  A new market is waiting to be tapped!
The existence of MIILs doesn't cut revenue, it increases it!

Even with a MIIL for every area, there will still
only be a few, and having a few compilers on your system is not a big
deal (or won't be soon), and they can even be kept off-line if necessary
(unless the compilation is done on-the-fly).  Note that with the right
metaphors (such as that on the Mac, "double clicking"), the operating
system can discover that the application hasn't been compiled from MIIL
to native code and do that automatically.  "Please wait:  installing
application.  XX seconds 'till installation complete."  *THIS* is the
way to do it.  *This* is the way computers should work.  Whenever I tell
a layperson (but computer user) that I have been working on a way to let
new computers run old software, they ask why it hasn't always been that
way....  Think of software for computers as gasoline for automobiles and
you understand why the layperson is mad that IBM PCs can't run Apple
software!  What would you think if Arco gasoline only worked in economy
cars?

itcp@ist.CO.UK (News reading a/c for itcp) (09/26/88)

From article <650004@hpclscu.HP.COM>, by shankar@hpclscu.HP.COM (Shankar Unni):
>>[I write ..]
>> cannot see it being commercial.  That is, I could not imagine a
>> Company that produces the IL definition and sufficient code generators
>> and compiler front ends to establish a momentum making a profit. :-(.
> 
> Well, surprise, surprise. Both HP and MIPS use such an intermediate language
> for their RISC processors. And at least one of them is making a profit :-).
> (Disclaimer: I have no information on the other. No flames..) So there!
> --
> Shankar.

1. What I am interested in is a many (languages) to many (processors)
   Intermediate Language, and one made public for use by processor or language
   designers.

2. Both MIPS and HP are basically hardware vendors, sure they may make a
   profit, but on their compiler operation?

	Tom

pb@nascom.UUCP (Peter Bergh) (09/26/88)

To contribute further to the existence proof, Sperry (now part of Unisys) has
developed a set of compilers that (when I last was involved) comprised C, Cobol,
Fortran (for two architectures), Pascal, and Plus (a Sperry systems-programming
language) for the Sperry Univac 1100 series and that used a reasonably portable
intermediate language.  The main design goal for the intermediate language, though,
was not to make it portable between machine architectures but to make it handle
a large subset of the currently existing languages (it handles PL/I but not all
of Ada).

lamaster@ames.arc.nasa.gov (Hugh LaMaster) (09/26/88)

In article <944@l.cc.purdue.edu> cik@l.cc.purdue.edu (Herman Rubin) writes:

>An intermediate language should exist, which should include everything that
>these machines and others can do.  But we should realize that many, if not
>most, machine operations do not exist on many machines.

>
>I agree that vector microprocessors will be fairly cheap.  But which type of
>architecture?  I am familiar with several of them.  I have used the CYBER 205,
>and it has useful instructions which are not vectorizable at all, or vectoriz-
>able only with difficulty and at considerable cost, on vector register
>machines.  Or will we be using massive parallelism?  Try procedures which
:
>machines.  An IL should be highly expressible and with an easy-to-use (from
>the human standpoint) syntax.  But if it is good, many of its features will
>be directly usable only on few machines.  There seems to be more useful
>constructs, Hugh, than are in your philosophy.


I fear that I may have been misunderstood.  I do not think that a portable IL
(PIL) can be developed which can efficiently use all the features of a given
architecture, especially new, poorly understood architectures that
involve massive parallelism with limited communication between processors.

My point is that portable IL's are already in use, both explicitly and
implicitly, and that "vectors" could be simply included in a new IL, and
that it would be worth doing.  No current IL can optimally mediate between
the source language and a particular architecture, and yet, they are useful
because they do a good enough job in many circumstances, and they make it
easier to port compilers, especially lesser-used compilers that might never
become available at all, to new architectures.  Many people are using gcc,
not because it produces optimal code for the VAX, but because the code
it produces is good enough, and some compilers have become available to
people through it, which would not be otherwise available.

To carry the question about vectors further, it should not be necessary to
know whether the machine has vector registers or a memory to memory
architecture.  The IL would simply represent vector operations as memory to
memory operations, leaving register assignments to the code generator.  It is
true that some architectures would not be well used by such a scheme, but my
guess is that you could get 30% of the performance of a machine specific
compiler this way, and that would be good enough in many cases, and a 
significant improvement over the current situation, where portable compilers
get a 0% improvement over scalar code.  

This is not an idealistic pursuit of the ideal IL, but a practical approach
to solving the time/time tradeoff (how much programmer time can I afford
to spend to get how much speedup of my program?) in the near term vector
capable microprocessor world.

BTW, the old CDC/ETA compiler did not detect the case of finding the maximal
element of a vector and returning its index as one operation (a common
operation - the instruction is there to support it) and instead used
two vector operations, taking twice as long.  It is not trivial to make
optimal use of an architecture even if you don't have a PIL to worry about;
this is, of course, why the "RISC" word appeared somewhere in this discussion.
Since one of the tenets of the RISC philosophy is to usually exclude
instructions which can't be easily generated by a compiler, RISC architectures
tend to make PILs more practicable.



-- 
  Hugh LaMaster, m/s 233-9,  UUCP ames!lamaster
  NASA Ames Research Center  ARPA lamaster@ames.arc.nasa.gov
  Moffett Field, CA 94035     
  Phone:  (415)694-6117       

brooks@maddog.llnl.gov (Eugene Brooks) (09/27/88)

In article <944@l.cc.purdue.edu> cik@l.cc.purdue.edu (Herman Rubin) writes:
>I agree that vector microprocessors will be fairly cheap.  But which type of
>architecture?  I am familiar with several of them.  I have used the CYBER 205,
>and it has useful instructions which are not vectorizable at all, or vectoriz-
>able only with difficulty and at considerable cost, on vector register
>machines.  Or will we be using massive parallelism?  Try procedures which
Just what instructions are we talking about here?  Let's pick an alternative
to compare to, say the CRAY X-MP/48 instruction set.

rogerk@mips.COM (Roger B.A. Klorese) (09/28/88)

In article <345@istop.ist.CO.UK> itcp@ist.CO.UK (News reading a/c for itcp) writes:
>1. What I am interested in is a many (languages) to many (processors)
>   Intermediate Language, and one made public for use by processor or language
>   designers.

Our UCODE predates our processor, is based on a theoretical machine which
is architecturally unlike our processors, and, while currently processor
specific in implementation, need not continue to be.
 
>2. Both MIPS and HP are basically hardware vendors, sure they may make a
>   profit, but on their compiler operation?

MIPS is a systems and technology company.  We license our compilers to
several vendors who use our chips to build their own systems.  (In fact,
since we now sell our chips through technology partners, royalties and
compiler licenses constitute our revenue in some of these deals.)  In this
case, yes, we do make money on our compilers.
-- 
Roger B.A. Klorese                           MIPS Computer Systems, Inc.
{ames,decwrl,prls,pyramid}!mips!rogerk  25 Burlington Mall Rd, Suite 300
rogerk@mips.COM (rogerk%mips.COM@ames.arc.nasa.gov) Burlington, MA 01803
I don't think we're in toto any more, Kansas...          +1 617 270-0613

prl@iis.UUCP (Peter Lamb) (09/29/88)

In article <978@esunix.UUCP> bpendlet@esunix.UUCP (Bob Pendleton) writes:
>
>Try this scenario:
>
>There are two kinds of computers in the world, brand X and brand Y.
>Brand X computers define the value pointed to by a NULL pointer to be
>a NULL value. That is, the load indirect instruction given the value
>that C uses for NULL is guaranteed to return NULL.  On the other hand
>brand Y computers core dump if you try to load a value from the
>address that is equivalent to NULL.

Actually there are 4 types; add to the above two types a machine
which returns constant garbage when you dereference NULL, and one which
returns whatever you last wrote into location NULL. Unix also runs
on machines like these...

>
>In all other respects X and Y computers are similar enough in word
>size, data formats, and so on, that software that doesn't dereference
>NULL ports easily from one brand of machine to the other.

That is to say, correct software ports correctly.

> ....

>A programmer uses brand X computers. He writes a pointer chasing
>program that assumes that *NULL == NULL.

He's written incorrect code (in C or similar languages).

>So, the compiler that runs on brand X machines must, at least, put a
>flag in the UIF stating that dereferencing NULL is allowed. The
>compiler on brand Y machines should state that dereferencing NULL is
>not allowed. That way the code can be made to run on any machine,
>though with a preformance hit when the original compilers assumptions
>don't match the reality of a specific machine. Obviously the compilers
>and code generators for brand X machines are going to be set up to
>produce good code for brand X computers and the same is true for brand
>Y computers.  But, it is still possible for UIF code generated for one
>machine to be translated to be run on the other machine.
>

*HOW* are you going to manage this? Even on the VAX, the classical
trap machine for *0 programmers, *0==0 is only true for a few special
cases:
	*(char*)0 == 0
	*(short*)0 == 0
*BUT*	*(int*)0 == 1041305344  !!!!	(try it...)
*AND*	*(float*)0 == 1.5807e-30

and all bets are off for the case
	((my_struct*)0)->element_in_my_struct

So any pointer chasing code which depends on *0==0 is going to be
highly non-portable at best, and will probably break even on
machines like the VAX.

It is typically code like

	strcmp("something", (char*)0);

which will work on a VAX, but crash on a Sun (or any other machine
which doesn't map in page 0).

There is, as far as I can see, no general solution to this problem.
I seem to remember that K&R say (roughly; I don't have it to hand)
that 0 does not correspond to *ANY* valid data.
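A portable caller has to screen the NULL out itself rather than count on
page 0 being mapped; a minimal sketch in C (the wrapper name is invented
for illustration, not from any real library):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A NULL argument is ordered before any real string instead of being
 * handed to strcmp(), which would fault on machines that do not map
 * page 0 (a Sun), and "work by accident" on ones that do (a VAX). */
int null_safe_strcmp(const char *a, const char *b)
{
    if (a == NULL || b == NULL)
        return (a == b) ? 0 : (a == NULL ? -1 : 1);
    return strcmp(a, b);
}
```

The point is that the test happens before the dereference, so the result
is the same on every machine regardless of what lives at address 0.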

>So, to restate what I've said so many times (am I getting boring yet?):

Well, quite frankly, this dereferencing NULL business comes up
far too often on the net.

>It would seem that our definitions of "good code" are very different.
>My definition requires that the code do what I said to do. As I've

Just *what* are you saying when you dereference NULL?
Are you saying give me 0 (and if so, how much 0), are you
saying give me whatever random constant garbage happens to
be at address 0, are you saying give me whatever I wrote at
address 0 (and yes, such systems exist, running Unix), or
are you saying `I really feel like a core dump now'???

>One of the problems I think we've had with this entire exchange is
>that it has centered around C. C is not yet standardized, and because
>it was intended to be a systems programming language C has always
>tolerated machine dependent variations in the semantics of some of its
>operators. I believe the variation has been tolerated because it was
>believed to be justified by the resulting increase in speed. I believe

This is exactly what I mean. What's at zero is *UNDEFINED* in C,
and explicitly illegal in many other languages.

>Bob Pendleton @ Evans & Sutherland


-- 
Peter Lamb
uucp:  seismo!mcvax!ethz!prl	eunet: prl@ethz.uucp	Tel:   +411 256 5241
Institute for Integrated Systems
ETH-Zentrum, 8092 Zurich

henry@utzoo.uucp (Henry Spencer) (09/30/88)

In article <978@esunix.UUCP> bpendlet@esunix.UUCP (Bob Pendleton) writes:
>>>... To be truly portable the intermediate form MUST address
>>>the issues you mention. Even if the source language doesn't define the
>>>semantics of dereferencing NULL pointers, the intermediate form must
>>>define the semantics of dereferencing NULL pointers.
>>
>>Unfortunately, it *can't*, without being machine-specific.
>
>... if UIF allows a compiler to put a flag in the UIF
>that says that *NULL == NULL, or if UIF defines *NULL == NULL, then
>the code will run on brand Y machines, but with a speed penalty caused
>by the run time checks that the code generator had to insert to comply
>with the the brand X compilers request that *NULL == NULL.

Right.  In other words, what we have done is to redefine the semantics
of C to allow *NULL.  Thus guaranteeing that all programs with this flag
in the UIF will be at a serious performance disadvantage on machines
that don't allow *NULL.  The semantics of *NULL are inherently and
incurably machine-dependent, and any "universal" intermediate format
file which specifies them is machine-dependent.

>I can't find any thing about requiring compilers to deduce number
>ranges in anything in my author_copy file. What I keep saying is that
>the compiler must explicitly state its ASSUMPTIONS in the UIF form of
>a program...

How does the target machine's translator know whether it can do the
arithmetic that the program wants?  This cannot be stated in the UIF
unless the compiler can figure it out.  It can't simply be based on the
compiler's host, because then a program which *doesn't* require the full
range of the host's arithmetic (think of a 32-bit host and a 16-bit
target, and a program which is careful not to depend on 32-bit numbers) 
again takes a massive efficiency hit for no good reason.

It is a property of the *program*, not the host it is compiled on, whether
it requires 32-bit arithmetic, the ability to dereference NULL pointers,
etc.  It is difficult to deduce these things from the program, unfortunately.
Modifying the program is not the answer, because there is a massive payoff
for being able to use this technology on existing programs.  Accepting the
efficiency hits is not the answer, because there is another massive payoff
for not losing efficiency.

>... I believe
>Henry published a paper that showed that using better algorithms is
>much better than using nonportable hardware features.

Geoff Collyer and I did indeed publish such a paper.  However, all the
effort on better algorithms is for naught if you cannot get efficient
code out of the compiler.  The *programmer* should not have to worry
about the details of how that is done, but it is important that it be
done.  That is, just because I compiled something on a *NULL machine to
be run on a non-*NULL machine should not mean that I take an efficiency
hit every time I use a pointer -- because I'm careful to avoid needing
*NULL, even though it is difficult for the compiler to know this.

Note, I am not saying that it is inherently evil to accept some efficiency
loss for the sake of correct functioning.  What I am saying is that people
who guarantee correct functioning by their own efforts don't want to take
that efficiency hit for no reason.  And if we are talking about something
that is supposed to sell, we cannot ignore the efficiency issue.  One can
make a fairly good argument that we would all be better off with a small
efficiency loss for the sake of correctness, but that is not the way the
market thinks, and trying to re-educate the market is a really good way
to go broke (if you are trying to do it for profit) or to be laughed at
and ignored (even if you aren't).  When I talk about the idea not being
"practical", I don't mean it is technically ridiculous, I mean that it
WON'T SELL -- people will not adopt it, so proposing it is pointless.
-- 
The meek can have the Earth;    |    Henry Spencer at U of Toronto Zoology
the rest of us have other plans.|uunet!attcan!utzoo!henry henry@zoo.toronto.edu

bpendlet@esunix.UUCP (Bob Pendleton) (10/04/88)

From article <634@eiger.iis.UUCP>, by prl@iis.UUCP (Peter Lamb):
> In article <978@esunix.UUCP> bpendlet@esunix.UUCP (Bob Pendleton) writes:
>>
>>Try this scenario:
>>
>>So, the compiler that runs on brand X machines must, at least, put a
>>flag in the UIF stating that dereferencing NULL is allowed. ...
> 
> *HOW* are you going to manage this?

Run-time checks. How else do you check for illegal operations at run
time? They can be implemented in hardware or software; I don't care
which.

> There is, as far as I can see no general solution to this problem.
> I seem to remember that K&R say (roughly, I don't have it to hand)
> that 0 does not correspond to *ANY* valid data.

Run-time checks aren't a general solution? They aren't even very
expensive, at least not when compared to the alternative of buggy,
nonportable code.
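A software version of such a check might look like the following sketch;
CHECKED_DEREF and checked_deref_fail are invented names, standing in for
whatever a code generator would actually emit around each pointer load:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Report and abort instead of silently reading location 0.  A code
 * generator targeting a "brand Y" machine could wrap every pointer
 * load this way when the intermediate form says NULL dereferences
 * must be caught. */
static void checked_deref_fail(const char *file, int line)
{
    fprintf(stderr, "%s:%d: NULL pointer dereferenced\n", file, line);
    abort();
}

/* Works for a pointer of any type: the comma expression keeps the
 * type of (p) while routing the NULL case through the failure stub. */
#define CHECKED_DEREF(p) \
    (*((p) != NULL ? (p) : (checked_deref_fail(__FILE__, __LINE__), (p))))
```

The shape is one compare and branch per load, with the failure path
reporting where the bad dereference happened instead of core dumping
silently.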

> Just *what* are you saying when you dereference NULL?

Don't ask me, ask the language definition. If it isn't defined, then
you've found a flaw in the language definition. That applies to any
language. If the language defines a feature, you have to implement it
so that it conforms to the language specification. If the language
leaves it undefined, then you have to deal with the fact that it will
be used, and misused, in every possible way.

I've used dialects of LISP in which (car NIL) was eq NIL and (cdr NIL)
was eq NIL, and NIL, as a bit pattern, was not 0.

Is it possible that you think I'm in favor of defining *NULL to be
equal to NULL and are responding to that? I'm in favor of defining the
behavior of every operator in a language on all of its operand set.
Since NULL can be stored in a pointer, the actions of all pointer
operators when applied to NULL should, in my opinion, be defined.

> 
>>Bob Pendleton @ Evans & Sutherland
> 
> 
> -- 
> Peter Lamb
> uucp:  seismo!mcvax!ethz!prl	eunet: prl@ethz.uucp	Tel:   +411 256 5241
> Institute for Integrated Systems
> ETH-Zentrum, 8092 Zurich

djs@actnyc.UUCP (Dave Seward) (10/07/88)

In article <1988Sep29.192410.246@utzoo.uucp> henry@utzoo.uucp (Henry Spencer) writes:
>>ranges in anything in my author_copy file. What I keep saying is that
>>the compiler must explicitly state its ASSUMPTIONS in the UIF form of
>>a program...
>
>How does the target machine's translator know whether it can do the
>arithmetic that the program wants?  This cannot be stated in the UIF
>unless the compiler can figure it out.

Or unless the programmer has a way of informing the compiler. My take is
that it is not worth supporting the likes of *NULL == NULL if it can't
be done effectively on all target machines, but the concept of the programmer
stating his assumptions about each of a class of variably implemented
features (arithmetic bit sizes, value of *NULL, et al) is a valuable one,
and puts the onus on the programmer, who in this case knows explicitly 
that he is writing supposedly portable code. Safe or reasonable values
can be assumed for these variable features, and the code generator for
each target can warn about cases where it can't implement the desired
option, or can't do it efficiently. It may even be reasonable for the format
to contain different code (provided by the programmer) for specific critical
sections, one for each variant of a variably defined feature, allowing the
most efficient use of each kind of machine (for that feature). The code
generator would then select the appropriate one for the target machine.

An additional thought about correctness: such a portable program should be
delivered with a set of verification tests so that one doesn't have to find
out with one's own data and effort that the program has a machine dependency
in it that prevents it from working (for some feature) on your machine. This
would quickly be enforced by market dynamics after 1) several people get
burned by broken programs, and 2) some vendors start to deliver test suites.

Dave Seward
uunet!actnyc!djs

bpendlet@esunix.UUCP (Bob Pendleton) (10/07/88)

From article <1988Sep29.192410.246@utzoo.uucp>, by henry@utzoo.uucp (Henry Spencer):
> 
> Right.  In other words, what we have done is to redefine the semantics
> of C to allow *NULL. Thus guaranteeing that all programs with this flag
> in the UIF will be at a serious performance disadvantage on machines
> that don't allow *NULL. The semantics of *NULL are inherently and
> incurably machine-dependent, and any "universal" intermediate format
> file which specifies them is machine-dependent.

I thought the semantics of *NULL were implementation dependent. Note,
I did not say machine specific. There is a difference. I was trying to
show how implementation specific decisions can be passed on in a
portable way and be made to work, even on machines where they don't
make sense. The idea is, after all, to provide portability. 

Efficiency is, of course, a critical issue. If no one cared how long
something took, computer development would never have started. Take a
look at "A Portable Optimizing Compiler for Modula-2" by Michael L.
Powell, who was at the time at DECWRL, on page 310 of the "Proceedings
of the SIGPLAN '84 Symposium on Compiler Construction".  I've included
some relevant quotations from the paper.

"3.3 Optimizing Checks

"Runtime checks are often disabled in production programs because they
"cost so much. For example, the P-code translator, written in Berkely
"Pascal, runs 3 times slower when runtime checks are enabled. By
"optimizing runtime checking, its benefits can be obtained at a
"fraction of the usual cost.

"The runtime checks performed by the compiler include checking variant
"tags, subranges, subscripts, and pointers. The pointer check catches
"not only bad addresses, but also pointers to objects that have been
"disposed. Checks are entered into the expression tree like any other
"expressions, appearing to be unary operators. These expressions are
"often common subexpressions or loop invariants. Such expressions are
"also eligible for loop induction, which could replace a subscript
"check in a loop by checks of the lower and upper bounds of the loop
"index.

The following table is made from information contained in two tables
in the paper. The compilers being used are Berkeley Pascal (pc), the
Berkeley Unix C compiler, the DEC VMS C compiler, and the Powell
Modula-2 compiler.  All times are in VAX-11/780 CPU seconds.

				opt,	opt +
				no chk	checks
Program	Berkeley UNIX	DEC	DEC	DEC
name	Pascal	   C	 C	Mod-2	Mod-2

perm	 2.7	 2.6	2.5	2.0	2.4
Towers	 2.8	 2.6	2.7	1.9	2.6
Queens	 1.6	 1.0	0.7	0.9	1.3
Intmm	 2.2	 1.7	0.8	0.8	1.1
Mm	 2.7	 2.2	1.3	0.9	1.2
Puzzle	12.9	12.4	4.9	4.1	6.5
Quick	 1.7	 1.2	0.8	0.8	1.2
Bubble	 3.0	 1.7	1.0	1.0	1.9
Tree	 6.4	 6.2	3.4	1.9	2.2
FFT	 4.8	 4.1	2.6	1.6	2.0

The first 3 columns give execution times with all available
optimization turned on. The 4th column gives execution times for code
generated with all optimizations turned on and all checking turned
off. The 5th column gives execution times with all optimizations
turned on and all runtime checking turned on.

Comparing columns 4 and 5, we see that runtime checking can increase
runtime by as much as 50% for this Modula-2 compiler. But with
checking turned on it still generates code that is as much as twice as
fast as code compiled with the Berkeley C compiler. Notice that the
compiler is doing a lot more than just checking pointers for equality
to NULL.

This table certainly shows the cost of full runtime checking. I hope
the Berkeley C compiler has been improved during the last 4 years. I'd
hate to think we were worrying about the cost of runtime checks when
just using a good compiler can get you back 4 or 5 times what you
lose to runtime checks.

Going on to the topic of a Machine Independent Intermediate Language, the
paper has this to say:

"Our P-code is a dialect of the P-code originally developed for Pascal
"compilers [Nori et al. 73]. P-code looks like machine language for a
"hypothetical stack machine and has been used successfully for
"portable compilers. For example, the Model programming language
"[Johnson and "Morris 76], which generates P-code, runs on the Cray-1,
"DEC VAX, Zilog Z8000, and Motorola MX68000 computers. The principle
"features that distinguish this version of P-code from others are
"support for multiple classes of memory and specification of types and
"bit sizes on all operations.

...

"The P-code translator is a one-pass compiler of P-code into VAX code.
"It performs the compilation by doing a static interpretation of the
"P-code. ...

I've used this technique myself. It works very nicely. I first saw it
described in the BCPL porting guide, which I read in about '75; I don't
know when it was written. It is an old and well-understood technique.
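For readers who haven't seen it, the flavor of the technique can be
suggested with a toy stack machine in C.  The opcodes here are invented
for illustration; real P-code additionally carries types, bit sizes, and
memory classes on every operation:

```c
#include <assert.h>
#include <stddef.h>

/* Invented opcodes for a toy P-code-flavored stack machine. */
enum op { PUSH, ADD, MUL, HALT };

struct insn { enum op op; int arg; };

/* Direct interpretation of the instruction stream.  A translator
 * doing "static interpretation" walks the same stream once at compile
 * time, tracking the simulated stack and emitting a target-machine
 * instruction or two per opcode instead of executing it. */
int run(const struct insn *code)
{
    int stack[64];
    size_t sp = 0;

    for (;; code++) {
        switch (code->op) {
        case PUSH: stack[sp++] = code->arg; break;
        case ADD:  sp--; stack[sp - 1] += stack[sp]; break;
        case MUL:  sp--; stack[sp - 1] *= stack[sp]; break;
        case HALT: return stack[sp - 1];
        }
    }
}
```

Because the stack discipline is fully determined at compile time, the
"interpretation" can be done once by the translator rather than on
every execution.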

"Although P-code is machine independent, the P-code translator is
"inherently machine dependent. Decisions of what registers and
"instructions to use to implement a particular P-code operation are
"left entirely to it. However, many of the strategies span a wide class
"of computers, in particular, register-oriented ones. thus the global
"structure of the P-code translator and many of its strategies are
"common to all the implemenations, adding a degree of machine
"independence.

I hope that the existence proofs that others have posted, plus this
information, convince you that the concepts behind MIILs have been
known and in use for many years, that MIILs can be used as part of an
optimizing compiler system, and that there need not be any performance
loss as a result of using one.

> How does the target machine's translator know whether it can do the
> arithmetic that the program wants?  This cannot be stated in the UIF
> unless the compiler can figure it out.  It can't simply be based on the
> compiler's host, because then a program which *doesn't* require the full
> range of the host's arithmetic (think of a 32-bit host and a 16-bit
> target, and a program which is careful not to depend on 32-bit numbers) 
> again takes a massive efficiency hit for no good reason.
> 
> It is a property of the *program*, not the host it is compiled on, whether
> it requires 32-bit arithmetic, the ability to dereference NULL pointers,
> etc.  It is difficult to deduce these things from the program, unfortunately.
> Modifying the program is not the answer, because there is a massive payoff
> for being able to use this technology on existing programs.  Accepting the
> efficiency hits is not the answer, because there is another massive payoff
> for not losing efficiency.

Yes, it is a property of the program. But, if the language doesn't
allow you to declare the actual size of the data you are doing
arithmetic on, if the language doesn't define the semantics of pointer
operations, then where am I going to get the information needed to
make these decisions?  We could have compiler options that let you
tell the compiler things you can't say in the language. We could be
forced to compile by saying something like:

cc -short=16 -long=32 -catch_null

to define the size of the arithmetic and how to handle dereferencing
null pointers. Or, the compiler can get it from the host the program
is compiled on. Or, we can modify the definition of the language so
that the size of an int and what it means to dereference NULL are
explicitly stated. Or, we could use pragmas to supply the information.
No matter what mechanism is used, the information must be provided if
a program is to have any chance of being automagically portable from
one machine to another.

You mention the case of a program that has been designed so that it
can be run efficiently on a machine with small ints (say 16 bits). If
the program is developed on a machine with large ints (say 32 bits),
how does the programmer really know if it will work using small ints
without testing it on a machine with small ints? The only practical
way that I can think of is to use a compiler that allows you to tell
it how big an int is and that generates runtime checks to make sure
that small int semantics are enforced. Testing capabilities of this
sort would allow you to safely put the constraint that ints must be
>= 16 bits long into the MIIL for the program. The same thing goes for
the *NULL problem. Test with runtime checks for references to *NULL.
Then you can put the assertion that NULL is not dereferenced into the
MIIL. 
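Such a small-int check might be sketched like this (check16 is an
invented name; a real compiler would emit the test inline after
arithmetic operations when asked to enforce 16-bit semantics on a
32-bit host):

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical check a compiler on a 32-bit host could emit to verify
 * that a value still fits the 16-bit target's int.  If testing never
 * trips it, the "ints >= 16 bits suffice" assertion can be recorded
 * in the distributed intermediate form. */
static long check16(long v)
{
    if (v < -32768L || v > 32767L) {
        fprintf(stderr, "value %ld overflows a 16-bit int\n", v);
        abort();
    }
    return v;
}
```

During testing every int-valued result would pass through such a check;
in the shipped MIIL only the recorded assumption remains.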

Personally, I'd keep the runtime checks. Midnight calls from customers
are bad enough, but having the program die without even printing a
message that will help me get the customer running again is awful.

> ...

Much good stuff deleted

> efficiency loss for the sake of correctness, but that is not the way the
> market thinks, and trying to re-educate the market is a really good way
> to go broke (if you are trying to do it for profit) or to be laughed at

I can't resist. Look at who I work for and tell me I don't already
know. :-) If that doesn't make any sense to you, look up Evans &
Sutherland in "Fundamentals of Interactive Computer Graphics" by Foley
and Van Dam. Especially the stuff about the PS300. 

Of course I could also tell you about the commercial success (I think
20 copies were sold) of my FORTH compiler that "fixed" everything I
didn't like about FORTH.

> When I talk about the idea not being
> "practical", I don't mean it is technically ridiculous, I mean that it
> WON'T SELL -- people will not adopt it, so proposing it is pointless.

It will sell if you push it hard enough. If the alternative is having
to include IBM-PC/MS-DOS compatibility as part of every machine you
make, then I think the computer manufacturers will work very hard to
make something  like this sell.

Consider; a new machine won't sell without a large existing base of
applications. And, software developers can't afford to develop for
machines that don't have a large installed base. A standard MIIL
allows hardware vendors to compete on a price/performance basis and
provides software vendors with a huge installed base of possible
customers. So a software distribution standard looks like a win for
hardware vendors, software vendors, and end users. A win-win-win
situation isn't going to be passed up.

Consider how hard it was to get people to stop laughing at the idea of
an operating system written in a high level language only 10 years
ago. The technology development that made that practical didn't stop.
-- 
              Bob Pendleton, speaking only for myself.
An average hammer is better for driving nails than a superior wrench.
When your only tool is a hammer, everything starts looking like a nail.
UUCP Address:  decwrl!esunix!bpendlet or utah-cs!esunix!bpendlet

greyham@ausonics.OZ (Greyham Stoney) (10/11/88)

in article <993@esunix.UUCP>, bpendlet@esunix.UUCP (Bob Pendleton) says:
> 
> Since NULL can be stored in a pointer, the actions of all pointer
> operators when applied to NULL should, in my opinion, be defined.

Hey.... this NULL pointer business is crazy; obviously (*NULL) is undefined -
how could anyone use it? (No, I'm not saying you support it....). But if ALL
actions when applied to the null pointer are to be defined, how about:
(*(NULL+1))? or (*(NULL+any_old_number)). No way; it's totally machine
dependent.

	Greyham

Vote *NO* to NULL pointer references!
-- 
# Greyham Stoney:      (disclaimer not necessary: I'm obviously irresponsible)
# greyham@ausonics.oz - Ausonics Pty Ltd, Lane Cove. (* Official Sponsor *)

henry@utzoo.uucp (Henry Spencer) (10/11/88)

In article <997@esunix.UUCP> bpendlet@esunix.UUCP (Bob Pendleton) writes:
>You mention the case of a program that has been designed so that it
>can be run efficiently on a machine with small ints (say 16 bits). If
>the program is developed on a machine with large ints (say 32 bits),
>how does the programmer really know if it will work using small ints
>without testing it on a machine with small ints? ...

Competent programming by people who understand portability.  We know
this works, we do it.
-- 
The meek can have the Earth;    |    Henry Spencer at U of Toronto Zoology
the rest of us have other plans.|uunet!attcan!utzoo!henry henry@zoo.toronto.edu

chip@ateng.ateng.com (Chip Salzenberg) (10/14/88)

According to henry@utzoo.uucp (Henry Spencer):
>In article <997@esunix.UUCP> bpendlet@esunix.UUCP (Bob Pendleton) writes:
>>If the program is developed on a machine with large ints (say 32 bits),
>>how does the programmer really know if it will work using small ints
>>without testing it on a machine with small ints? ...
>
>Competent programming by people who understand portability.  We know
>this works, we do it.

Just a confirmation and a testimonial here.  C News Alpha runs just fine
on a '286, thanks much to Messrs. Spencer and Collyer.
-- 
Chip Salzenberg             <chip@ateng.com> or <uunet!ateng!chip>
A T Engineering             Me?  Speak for my company?  Surely you jest!
	   Beware of programmers carrying screwdrivers.

henry@utzoo.uucp (Henry Spencer) (10/16/88)

In article <1988Oct13.202604.22464@ateng.ateng.com> chip@ateng.ateng.com (Chip Salzenberg) writes:
>Just a confirmation and a testimonial here.  C News Alpha runs just fine
>on a '286, thanks much to Messrs. Spencer and Collyer.

Dept of Minor Nits:  Collyer and Spencer.  Geoff did all the hard stuff.
-- 
The meek can have the Earth;    |    Henry Spencer at U of Toronto Zoology
the rest of us have other plans.|uunet!attcan!utzoo!henry henry@zoo.toronto.edu

bpendlet@esunix.UUCP (Bob Pendleton) (10/19/88)

From article <44@ausonics.OZ>, by greyham@ausonics.OZ (Greyham Stoney):
- in article <993@esunix.UUCP>, bpendlet@esunix.UUCP (Bob Pendleton) says:
-- 
-- Since NULL can be stored in a pointer, the actions of all pointer
-- operators when applied to NULL should, in my opinion, be defined.
- 
- Hey.... this NULL pointer business is crazy; obviously (*NULL) is undefined -
- how could anyone use it? (No, I'm not saying you support it....). But if ALL
- actions when applied to the null pointer are to be defined, how about:
- (*(NULL+1))? or (*(NULL+any_old_number)). No way; it's totally machine
- dependant.

The idea was to define all of these to be runtime exceptions. Not to
make them meaningful. *NULL is about as meaningful as x/0, and both
should, in my opinion, cause an exception.

- 
- 	Greyham
- 
- Vote *NO* to NULL pointer references!

Absolutely!

- -- 
- # Greyham Stoney:      (disclaimer not necessary: I'm obviously irresponsible)
- # greyham@ausonics.oz - Ausonics Pty Ltd, Lane Cove. (* Official Sponsor *)

-- 
              Bob Pendleton, speaking only for myself.
An average hammer is better for driving nails than a superior wrench.
When your only tool is a hammer, everything starts looking like a nail.
UUCP Address:  decwrl!esunix!bpendlet or utah-cs!esunix!bpendlet

hermit@shockeye.UUCP (Mark Buda) (10/22/88)

In article <1988Oct13.202604.22464@ateng.ateng.com> chip@ateng.ateng.com (Chip Salzenberg) writes:
|According to henry@utzoo.uucp (Henry Spencer):
|>In article <997@esunix.UUCP> bpendlet@esunix.UUCP (Bob Pendleton) writes:
|>>If the program is developed on a machine with large ints (say 32 bits),
|>>how does the programmer really know if it will work using small ints
|>>without testing it on a machine with small ints? ...
|>
|>Competent programming by people who understand portability.  We know
|>this works, we do it.
|
|Just a confirmation and a testimonial here.  C News Alpha runs just fine
|on a '286, thanks much to Messrs. Spencer and Collyer.

GNU CC, however, doesn't. I expected more from them... sniff...
-- 
Mark Buda / Smart UUCP: hermit@shockeye.uucp / Phone(work):(717)299-5189
Dumb UUCP: ...{rutgers,ihnp4,cbosgd}!bpa!vu-vlsi!devon!shockeye!hermit
Entropy will get you in the end.
"A little suction does wonders." - Gary Collins

ken@gatech.edu (Ken Seefried III) (10/24/88)

In article <222@shockeye.UUCP> hermit@shockeye.UUCP (Mark Buda) writes:
>|Just a confirmation and a testimonial here.  C News Alpha runs just fine
>|on a '286, thanks much to Messrs. Spencer and Collyer.
>
>GNU CC, however, doesn't. I expected more from them... sniff...
>-- 

I'll be kind and simply call this kind of talk silly.  The 80286 is an amazingly
stupid design.  The GNU group made some assumptions (most of them pretty reasonable)
when they built gcc and its ilk.  One of the biggies was 32 bits implemented in a
semi-reasonable way.  The 80286 is neither 32 bits nor reasonably implemented.
Since the target audience for 'gcc' was 680x0, 32x32, etc. based, and the rest
of the world is moving in that direction, and they wanted to produce a high
quality compiler, these requirements make a whole lot of sense.  I cannot believe
the unmitigated gall of some people ('I expected more...').

'gcc' will not run on the PDP-11/2 in my closet, nor will it run on the old Z80-CP/M
machine that I use for a terminal, but then it was never meant to, so I tend not to
bitch and moan.

Moral: if you want to run real software, get real hardware...

Oh, and please don't whine that it's all you can afford.  I know that story
inside and out (being a student, and having saved a whole bunch of pennies for
my computer).

>Mark Buda / Smart UUCP: hermit@shockeye.uucp / Phone(work):(717)299-5189

   ...ken

dtynan@sultra.UUCP (Der Tynan) (10/25/88)

In article <222@shockeye.UUCP>, hermit@shockeye.UUCP (Mark Buda) writes:
> In article <1988Oct13.202604.22464@ateng.ateng.com> chip@ateng.ateng.com (Chip Salzenberg) writes:
> |
> |Just a confirmation and a testimonial here.  C News Alpha runs just fine
> |on a '286, thanks much to Messrs. Spencer and Collyer.
> 
> GNU CC, however, doesn't. I expected more from them... sniff...
> -- 
> Mark Buda / Smart UUCP: hermit@shockeye.uucp / Phone(work):(717)299-5189

Check out the GNU software philosophy.  RMS clearly states that, when writing
code for the FSF, you should assume ints are 32 bits and memory space >= 1 MByte.
Your expectations aside, they did what they said they'd do.
						- Der
-- 
Reply:	dtynan@sultra.UUCP		(Der Tynan @ Tynan Computers)
	{mips,pyramid}!sultra!dtynan
	Cast a cold eye on life, on death.  Horseman, pass by...    [WBY]

greyham@ausonics.OZ (Greyham Stoney) (10/26/88)

in article <1019@esunix.UUCP>, bpendlet@esunix.UUCP (Bob Pendleton) says:
> 
[stuff about what I said earlier]
> The idea was to define all of these to be runtime exceptions. Not to
> make them meaningful. *NULL is about as meaningful as x/0, and both
> should, in my opinion, cause an exception.

Well, that's just not possible in many cases. Looks like it'll have to be
DEFINED as being UNDEFINED.
-- 
# Greyham Stoney:      (disclaimer not necessary: I'm obviously irresponsible)
# greyham@ausonics.oz - Ausonics Pty Ltd, Lane Cove.  /* Official Sponsor */
# greyham@utscsd.oz - Uni of Technology, Sydney.

hermit@shockeye.UUCP (Mark Buda) (10/27/88)

In article <17536@gatech.edu> ken@gatech.UUCP (Ken Seefried iii) writes:
#In article <222@shockeye.UUCP> hermit@shockeye.UUCP (Mark Buda) writes:
#>|Just a confirmation and a testimonial here.  C News Alpha runs just fine
#>|on a '286, thanks much to Messrs. Spencer and Collyer.
#>
#>GNU CC, however, doesn't. I expected more from them... sniff...
#>-- 
#
#I'll be kind and simply call this kind of talk silly.  The 80286 is an
#amazingly stupid design.

I agree wholeheartedly. The only semi-reasonable processor in the family is
the 80386, and that's pretty bad too.

#the GNU group made some assumptions (most of them pretty reasonable)
#when they built gcc and its ilk.  One of the biggies was 32-bits implemented
#in a semi-reasonable way.  The 80286 is neither 32-bits nor reasonably
#implemented. Since the target audience for 'gcc' was 680x0, 32x32, etc.
#based, and the rest of the world is moving that direction, and they wanted
#to produce a high quality compiler, these requirements make a whole lot of
#sense.

I think the problem is that I didn't make something clear in my original
posting. I don't want to compile *for* the 286. I want to compile for a
386, on a 386, but the compilers I have only understand 8086/286, and I'm
damned if I'm going to spend hundreds of dollars for a compiler I'll only
use once.

#I cannot believe the unmitigated gall of some people ( 'I expected
#more...' ).

The only thing I object to in GNU CC is the attitude that you can put a
pointer in an int or pass '0' for a null pointer where the portable thing
is '(char *)0' or NULL.
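The bare-0 complaint is concrete on exactly the hardware under discussion. In
a variadic call there is no prototype to convert 0 to a pointer, so it is
passed as an int; where int is 16 bits and pointers are 32 (286 large model),
the callee's va_arg reads 32 bits but the caller pushed only 16. A sketch
(the function is illustrative, not from gcc):

```c
#include <stdarg.h>
#include <stddef.h>

/* Count strings up to a null-pointer terminator.  The terminator
   must be a real null pointer: count_args("a", "b", (char *)0) is
   portable, while count_args("a", "b", 0) passes a 16-bit int where
   a 32-bit pointer is read on a 286-style memory model. */
int count_args(const char *first, ...)
{
    va_list ap;
    int n = 0;
    const char *s = first;

    va_start(ap, first);
    while (s != NULL) {
        n++;
        s = va_arg(ap, const char *);
    }
    va_end(ap);
    return n;
}
```

The same reasoning applies to execl() and friends, which is where the
'(char *)0' idiom usually shows up.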

#Moral: if you want to run real software, get real hardware...
#
#Oh, and please don't whine that its all that you can afford.

It's not mine.

I'll keep my mouth shut from now on.

tron1@xanadu.UUCP (Kenneth Jamieson) (11/01/88)

It seems to me that there was something in that article about those slots being
independently something-or-othered. Powered, that was it. The machine looks
clean, yet I wonder how well the proprietary monitor and stuff idea will catch
on?

Also, about its design. OK, a 200+ meg optical main drive is nice, but won't
a low-storage (or at least cheap) device be needed? I mean, I don't know
of any software houses that will wanna publish on $50 disks with a word processor.
-- 
******************************************************************************
* All rumors about my death are true.          {...}galaxy!dsoft  \          *
* Responsibility is management's word for blame.            --- xanadu!tron1 *
* "The world is GOD's source level debugger"   {...}s4mjs!   /               *