[net.micro.68k] FLAME!!! Re: EA orthogonality

doug@terak.UUCP (Doug Pardee) (05/16/85)

**** WARNING ****   The following comments are not as nice as etiquette
recommends.
 
> I think total orthogonality would be *very* useful.
> ...
> A 68K compiler has to think about modifying the branch condition, etc.
> A 32K compiler just generates code in the way it sees the statement.
> 
> Of course, an optimizer might throw everything around again
> to save registers or whatever, but the initial code generation is
> much simpler in the 32K case.

What in heck do you think we users are paying you compiler writers to
DO?

The purpose of a CPU is to solve the *user's* application as quickly as
possible.

The purpose of a CPU is *NOT* to be as easy to write a compiler for as
possible.

Why on earth should the design of a CPU be based on how easy it will
make the jobs of the five people who will write the compilers for it?
-- 
Doug Pardee -- Terak Corp. -- !{ihnp4,seismo,decvax}!noao!terak!doug
               ^^^^^--- soon to be CalComp

henry@utzoo.UUCP (Henry Spencer) (05/19/85)

> What in heck do you think we users are paying you compiler writers to
> DO?

How much do you want to pay us?  A lot, or not so much?  You could not
possibly pay me enough money to get me to implement a compiler for some
of the scummier machines around, unless your wallet is a lot bigger than
one would expect.

> The purpose of a CPU is to solve the *user's* application as quickly as
> possible.
> 
> The purpose of a CPU is *NOT* to be as easy to write a compiler for as
> possible.

Have you considered that the two may be related?  Difficult compilation
generally means poorer compilers, i.e. poorer performance for the user.

> Why on earth should the design of a CPU be based on how easy it will
> make the jobs of the five people who will write the compilers for it?

Because it will result in faster and more reliable compilers that produce
better code and better error messages.  If you have tried to hire good
compiler people lately, you know that compiler-writer time is neither
cheap nor in infinite supply.
-- 
				Henry Spencer @ U of Toronto Zoology
				{allegra,ihnp4,linus,decvax}!utzoo!henry

cdshaw@watmum.UUCP (Chris Shaw) (05/20/85)

| What in heck do you think we users are paying you compiler writers to
| DO?
| 
| The purpose of a CPU is *NOT* to be as easy to write a compiler for as
| possible.
| 
| Why on earth should the design of a CPU be based on how easy it will
| make the jobs of the five people who will write the compilers for it?
| -- 
| Doug Pardee -- Terak Corp. -- !{ihnp4,seismo,decvax}!noao!terak!doug
|                ^^^^^--- soon to be CalComp

All right... then who should the opcodes be designed FOR, in your opinion...
COBOL programmers? FORTRAN programmers?

I'm sure you've done assembler on an awful machine in your time.. I know I
have assembled for the worst commercially available processors in existence.
The ultimate lesson to be gained from such machines as the 1802, 6502 and Z80
is that orthogonality gains you a very noticeable productivity improvement
when coding in assembler. 

When one is writing a program that codes in assembler (i.e., a compiler), 
orthogonality is a humungous win, because you don't have to code register-use
weirdnesses into your compiler. You don't have to worry about what kind of 
expression you're evaluating when you produce code to do it, etc, etc.

The end result is that a compiler for an ortho machine is more likely to be
right and gets to market faster, all other things being equal.

Chris Shaw    watmath!watmum!cdshaw
University of Waterloo

jack@boring.UUCP (05/20/85)

[ Note that I added net.arch to the newsgroup, since this is probably
where this discussion belongs]

In article <557@terak.UUCP> doug@terak.UUCP (Doug Pardee) writes:
>**** WARNING ****   The following comments are not as nice as etiquette
>recommends.
Agreed. Also, I think they're not true.
> 
>> I think total orthogonality would be *very* useful.
>> ...
>> A 68K compiler has to think about modifying the branch condition, etc.
>> A 32K compiler just generates code in the way it sees the statement.
>> 
>> Of course, an optimizer might throw everything around again
>> to save registers or whatever, but the initial code generation is
>> much simpler in the 32K case.
>
>What in heck do you think we users are paying you compiler writers to
>DO?
>
>The purpose of a CPU is to solve the *user's* application as quickly as
>possible.
Agreed. In my opinion, this means that the CPU should be optimized for
what most users do most of the time: running high-level language
programs.

>
>The purpose of a CPU is *NOT* to be as easy to write a compiler for as
>possible.
Not agreed. If a machine is simple, the compiler is simpler, and thus it
is available sooner, doesn't have as many bugs, etc.
>
>Why on earth should the design of a CPU be based on how easy it will
>make the jobs of the five people who will write the compilers for it?
Because *EVERYONE* will use the product of those five people.
If, for instance, a compiler for a certain machine generates lousy
code for a for-loop because the compiler writers didn't have time to
optimize it (they were too busy getting the compiler to *work* at all),
that will eventually waste *HOURS* of CPU time for everyone using it.

This is also the whole point behind RISC architecture, one of the
rising stars at the moment.
-- 
	Jack Jansen, jack@mcvax.UUCP
	The shell is my oyster.

doug@terak.UUCP (Doug Pardee) (05/22/85)

me>The purpose of a CPU is *NOT* to be as easy to write a compiler for as
me>possible.

> Not agreed. If a machine is simple, the compiler is simpler, and thus it
> is available sooner, doesn't have as many bugs, etc.

Did I miss something here?  Since when is it any concern of mine, as a
user, whether the compiler is simple???

And I have seen no evidence that compilers for "simple" machines are
available any sooner, or are any more reliable, than compilers for
warpo machines.

me>Why on earth should the design of a CPU be based on how easy it will
me>make the jobs of the five people who will write the compilers for it?

One response:

> Because *EVERYONE* will use the product of those five people.

But that doesn't address the question as to why the comfort and
convenience of those five people is of any concern to "*EVERYONE*".

Another response:

> If you have tried to hire good
> compiler people lately, you know that compiler-writer time is neither
> cheap nor in infinite supply.

Ah, here we finally get to the nitty-gritty.  What we're saying is that
we want to have CPUs that are easy to write compilers for so that we can
hire less-capable (aka *cheaper*) programmers to write the compilers!!!

Given how few micro-processor instruction sets there are, and how few
languages of interest, you don't *need* an "infinite supply" of compiler
programmers.  In fact, about a dozen could do the job for the entire
microcomputer world.  There are certainly a dozen top-notch compiler
programmers available for this task.  And given the importance of having
good compilers, they're worth whatever they get paid.

But CPUs and compilers are put out by IC manufacturers, and they
understand chips better than software.  So they tend to put their money
into design work on the chip, and hire cheap programming labor to
produce less-than-thrilling compilers.  Since the manufacturers'
compilers are often poor, third-party operations spring up all over the
place to try to cash in.  Typically underfinanced, these operations
*also* hire cheap programming labor and produce less-than-thrilling
compilers.  And the vacuum remains, so even more third-party start-ups
appear.

For heaven's sake, how many C compilers do we have to develop for the
68000 before we get one that's good???  Wouldn't it have been a whole
lot easier if Motorola or Microsoft or *someone* had put up the bucks
necessary to hire real compiler writers in the first place?

I think it makes more sense to take compiler-writing seriously, rather
than try to kludge the CPU so that every basement hacker can write what
he calls a "compiler".
-- 
Doug Pardee -- Terak Corp. -- !{ihnp4,seismo,decvax}!noao!terak!doug
               ^^^^^--- soon to be CalComp

jbn@wdl1.UUCP (05/22/85)

       The idea is to make programs go fast.  This requires a machine
for which a compiler can generate fast code.  This is quite different
from a machine for which it is easy to generate code.  One of the easiest
architectures for which to generate code is the true stack machine, where
all operands are pushed on the stack and all operators take data from the
stack and return it to the stack.  The code for such machines is reverse
Polish notation, such as HP calculators use.  USCD Pascal P-code is the
best known modern ``machine'' that works this way, but many hardware
machines, starting with the English Electric Leo Marconi KDF9 in 1959,
and many Burroughs machines from 1960 on, worked this way.  The compilers
are trivial.
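       (A minimal sketch of why such compilers are trivial: a postorder walk
of the expression tree emits the reverse-Polish instruction stream directly.
The node layout and the PUSH/MUL/ADD/STORE mnemonics below are made up purely
for illustration.)

	#include <stdio.h>

	struct node {
	    char op;                   /* '+', '*', or 0 for a leaf        */
	    const char *name;          /* variable name when op == 0       */
	    struct node *left, *right;
	};

	static void gen(const struct node *n)
	{
	    if (n->op == 0) {          /* leaf: just push the operand      */
	        printf("\tPUSH\t%s\n", n->name);
	        return;
	    }
	    gen(n->left);              /* code for the left subtree        */
	    gen(n->right);             /* code for the right subtree       */
	    printf("\t%s\n", n->op == '+' ? "ADD" : "MUL");
	}

	int main(void)
	{
	    /* a = b * c + d  ==>  PUSH b; PUSH c; MUL; PUSH d; ADD; STORE a */
	    struct node b = {0, "b"}, c = {0, "c"}, d = {0, "d"};
	    struct node mul = {'*', 0, &b, &c};
	    struct node add = {'+', 0, &mul, &d};
	    gen(&add);
	    printf("\tSTORE\ta\n");
	    return 0;
	}
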
       But you can't optimize effectively for a true stack machine.  Nor
can the machine overlap or pipeline operations effectively.  Because all
operations implicitly refer to the top of the stack, the independence of
operations needed for pipelining is very difficult if not impossible to
achieve.  Pipelined machines typically have many registers; the instruction
fetch/decode unit can then keep grabbing instructions and shipping them off
to the functional units for execution until blocked by a reference to a
register tied up by an operation in progress.  The CDC6600 and IBM 7030
(STRETCH), in the early 1960s, were the first machines that worked this way, and
the newer microprocessors are starting to use this technology.  The
Stanford MIPS machine does work this way, but lacks the hardware interlocks
(called the ``scoreboard'' in the CDC6600) to cause instruction fetch/decode
to block when a register conflict is detected; the compiler for the MIPS
machine has to stick in no-op instructions if look-ahead would cause a
register conflict.
      What the CPU designer concerned with speed really needs is a good
background in optimizing compiler technology and some knowledge of the
history of CPU architecture.

					John Nagle

kds@intelca.UUCP (Ken Shoemaker) (05/23/85)

> 
> [ Note that I added net.arch to the newsgroup, since this is probably
> where this discussion belongs]
> 
> >
> >The purpose of a CPU is *NOT* to be as easy to write a compiler for as
> >possible.
> Not agreed. If a machine is simple, the compiler is simpler, and thus it
> is available sooner, doesn't have as many bugs, etc.
> 
> This is also the whole point behind RISC architecture, one of the
> rising stars at the moment.
> -- 
> 	Jack Jansen, jack@mcvax.UUCP
> 	The shell is my oyster.

Not entirely true.  RISC machines hand a fair amount of the work back to
the compiler:

- the only instructions that can access memory are move (load/store) operations

- jumps jump only after the instruction after the jump has been executed

- some don't have hardware interlocks to prevent a register being read
  before a previous register write has completed, so you have to remember
  to do enough in between so you don't have problems.

- they don't allow arbitrary byte boundaries for code/data

You can argue that this is merely code reorganization, but these features are
there so that you can eliminate both hardware pipeline stages and the delays
within each remaining stage.  Just my impressions...
-- 
It looks so easy, but looks sometimes deceive...

Ken Shoemaker, Intel, Santa Clara, Ca.
{pur-ee,hplabs,amd,scgvaxd,dual,omovax}!intelca!kds
	
---the above views are personal.  They may not represent those of Intel.

jack@boring.UUCP (05/25/85)

I'm not sure whether Doug Pardee is serious, or just trying to keep
the discussion going. I'll assume he *is* serious, and answer
him anyway.

Doug>The purpose of a CPU is *NOT* to be as easy to write a compiler for as
Doug>possible.

me> Not agreed. If a machine is simple, the compiler is simpler, and thus it
me> is available sooner, doesn't have as many bugs, etc.

Doug>Did I miss something here?  Since when is it any concern of mine, as a
Doug>user, whether the compiler is simple???

Exactly for the reasons stated above. You don't want to argue that a
4000 line compiler is harder to maintain, debug, etc. than one of
8000 lines, I hope?


Doug>And I have seen no evidence that compilers for "simple" machines are
Doug>available any sooner, or are any more reliable, than compilers for
Doug>warpo machines.

No? Get yourself a PR1ME, and try the pascal compiler :-(

Now, I won't comment on the rest point-by-point, since it would be
too long-winded that way. Let me just explain the following point:

When you are designing a machine, you are facing two size problems:
1. How do I fit all those transistors on this little square?
2. How do I fit all those opcodes in those 16 bits?

An orthogonal design is clearly good for (1), since it allows you to
reuse the hardware (or firmware) for calculating "x'100(a6:B)[d0]"
many times.

Now, to satisfy (2), you can do two things:
- Make the operand fields small, so you can have many opcodes.
- Make the opcode fields small, so you can have complicated operands.
(I won't go into RISC here, which makes both of them small).

If you take the first choice, you can have lots of nifty instructions
like 'search for a one bit, and return the position in a register'
or 'copy a string and translate' and those kind of things, which will
*never* be used by *any* compiler (except for COBOL, maybe) since
most high-level languages don't have a construct for that.
Can you imagine a compiler that would recognize
	for(p=src, q=dst; *p; p++, q++)
		*q = table[*p];
and translate it into the above-mentioned instruction?

If you take the second choice, you will *not* have a string translate
instruction. You will, however, have the ability to make your
design orthogonal. Wirth (I think, I'm not sure) long ago
measured that the average expression has 1.5 operands. This means
that half of the statements you write will be expressible in *one*
instruction, provided that the machine lets you address something on
the stack as an operand.

For example:
	a += b;
orthogonal:
	add	b(r5),a(r5)
non-orthogonal:
	mov	a(r5),r3		<-- AND MAKE SURE IT'S FREE!!
	add	b(r5),r3
	mov	r3,a(r5)
Now, in cycles, the first one would result in 4 memory cycles and
3 additions, and the second in 6 memory cycles and 4 additions (PLUS
an additional 2 instruction decodes).

Well, this has got long-winded after all, sorry for that.
You may do what you want, but I'll stick to hardware that was designed
by software people. 
-- 
	Jack Jansen, jack@mcvax.UUCP
	The shell is my oyster.

johnl@ima.UUCP (05/27/85)

While we're all stomping on doug@terak, here's another little reason why
ugly architectures are of concern to other than compiler writers:  You
don't implement a language in a vacuum, and when you're dealing with a
really ugly chip like the 80?86 series, the language being compiled often
gets bent so that the compiler writer can finish his job in a finite amount
of time.  Every C compiler ever written for the '86 series has ended up
having several code "models" which do their data and addressing in various
ways that trade off size of usable address space vs. compactness and speed
of object code.  Some compilers have even added "near" and "far" pointer
declarations so that the user can give advice to the compiler about how
to handle dereferencing each pointer.  This means that every compiler user
who wants to compile usefully large programs and still have them run fast
has to learn more than he ever wanted about the strange warts of the '86's
segmented addressing.  I deal with exactly this problem every day (and I
promise, I'm not writing compilers) and it's getting awfully irritating.
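
For those who have been spared them, the declarations look roughly like
this.  This is only a sketch: the exact keywords and defaults vary from one
'86 compiler to the next, and the #defines are there merely so the fragment
also compiles on a machine with a flat address space.

	#define near	/* 16-bit offset into the default data segment */
	#define far	/* full segment:offset pointer, may cross 64K   */

	char near *prompt;	/* cheap to dereference                 */
	long far  *big_table;	/* can point anywhere, but costs more   */

	long far *next_slot(long far *p)
	{
	    return p + 1;	/* even the pointer arithmetic is dearer */
	}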

Some may find this morally indefensible, but if you put all of the compiler
experts in the world into a room, they still couldn't find a way to generate
decent code for an '86 that appeared to have a linear address space like C
code most naturally wants.

So as has been said before, the 8086 and 286 are fine for high-performance
vending machines, but for real computing, please, give us anything else.
Clever compilers can't paper over this yawning chasm.

John Levine, ima!johnl

cdshaw@watmum.UUCP (Chris Shaw) (05/27/85)

>The purpose of a CPU is *NOT* to be as easy to write a compiler for as
>possible.
>

All right then, what IS the purpose of a CPU?? It would seem to me
that the purpose of a CPU is to run programs. The purpose of a well-designed
instruction set is to make it as easy to program as possible without
sacrificing performance.

Now, it also seems to me that an intelligent CPU design takes into account
the types of programs that will run on it. Thus, it's obvious that the 8035
was never designed to be anything more than a controller. When designing
the 32032 then, the kind of programs the designers of the chip had in mind 
were those that would be created by high-level languages. Thus, they made the
instruction set as easy as possible to write compilers for. Obviously,
orthogonality doesn't matter quite so much on a controller, where the programmer
is a human, not a program. On a general-purpose CPU, however, most programs will
be created by programs (compilers), so it makes sense to tailor the instruction
set to its intended programmers.

Anybody who has written a compiler will tell you that ortho machines are easier
to write compilers for. It's a simple fact that has been true since day 1.
The benefits of programs that are easy to write vs hard to write are as follows:

	1) Productivity of the programmer is much higher. Despite Mr. Trissel's
	   comments, compiler writers are harder to come by than (say) COBOL
	   programmers, and are therefore more expensive. Simply asking for
	   better programmers doesn't solve this problem. Therefore, the more
	   productive your programmers, the better. Of course, if the market
	   for 8035 C compilers is twice that for 68000 C compilers, then
	   maybe start writing 8035 stuff, but that's another matter entirely.

	2) Program correctness (lack of compiler bugs). All things being equal
	   (which they aren't), a compiler for a weird machine produced from
	   N man-months of labour will generally be less correct than one for
	   an ortho machine. This point is really an outgrowth of productivity.
	   Almost as importantly, an ortho compiler will be easier to maintain
	   and fix bugs in than one for a non-ortho machine, since there is no
	   complicated register-assignment algorithm, etc...

	3) Object code speed. Given that CPUs x and y have the same hardware,
	   but different instruction sets (2 microcode sets, say), compiler code
	   produced for the ortho version is most likely going to be faster,
	   since special-purpose register decisions are not reflected in the
	   code. In other words, non-orthogonality generates superfluous moves
	   that would probably not be necessary in an ortho machine. This point
	   is true whether the code is compiler or human produced. The lack
	   of a general reg-to-reg add on the Z80 causes many wasted
	   reg-to-reg MOVs or (worse) reg-to-memory MOVs, for example.

>I think it makes more sense to take compiler-writing seriously, rather
>than try to kludge the CPU so that every basement hacker can write what
>he calls a "compiler".
>-- 
>Doug Pardee -- Terak Corp. -- !{ihnp4,seismo,decvax}!noao!terak!doug

I think this point is ripe nonsense. The bit which grabs me worst, of course,
is the twisted use of the word "kludge". And as for this garbage about basement
hackers, well.... (I guess it's time to go upstairs for a beer & mellow out :-)

Chris Shaw    watmath!watmum!cdshaw  or  cdshaw@watmath
University of Waterloo
In doubt?  Eat hot high-speed death -- the experts' choice in gastric vileness !

g-frank@gumby.UUCP (05/28/85)

> Every C compiler ever written for the '86 series has ended up
> having several code "models" which do their data and addressing in various
> ways that trade off size of usable address space vs. compactness and speed
> of object code.
> 
> . . . if you put all of the compiler
> experts in the world into a room, they still couldn't find a way to generate
> decent code for an '86 that appeared to have a linear address space like C
> code most naturally wants.
> 
> So as has been said before, the 8086 and 286 are fine for high-performance
> vending machines, but for real computing, please, give us anything else.
> Clever compilers can't paper over this yawning chasm.
> 
> John Levine, ima!johnl

   Clever compilers for almost any language but C can paper over most sorts
of yawning chasms.  The 8086 series is not the first processor without a large
linear address space, and it won't be the last.  The problem is that C is a
programming language written with a particular machine storage model in mind,
and it ports poorly to other architectures.  Modula-2, Pascal, Ada, all are
languages that port quite well to the 8086 family, and produce efficient,
readable code without any sort of trickery required of the programmer.

   The problem is C, not Intel.  If you have programs that require enormous
data arrays, you picked the wrong processor, didn't you?  Otherwise, you just
picked the wrong language.

   Do try to desist from characterizing particular processors as being "suitable
for vending machines," by the way.  I have a stupid 68000 system in my basement
that I can't use and can't sell because there's no software for it, and one of
those vending machines sitting on my desk.


-- 
      Dan Frank

	  Q: What's the difference between an Apple MacIntosh
	     and an Etch-A-Sketch?

	  A: You don't have to shake the Mac to clear the screen.

johnl@ima.UUCP (05/29/85)

Let's continue this argument in net.arch, which is more appropriate.

John Levine, ima!johnl

wjafyfe@watmath.UUCP (Andy Fyfe) (05/29/85)

In article <387@gumby.UUCP> g-frank@gumby.UUCP writes:
>   The problem is C, not Intel.  If you have programs that require enormous
>data arrays, you picked the wrong processor, didn't you?  Otherwise, you just
>picked the wrong language.

The problem isn't just `C'.  Write a Fortran subroutine whose array arguments
have variable bounds, and the Intel Fortran compiler, not being able to bound
the array, will assume the worst (generating very scary code, particularly if
it's a multidimensional array).  For numerical work this is a very likely
situation - a pity, given that
the 8086 family has had floating point hardware for so long.

--Andy Fyfe		...!{decvax, allegra, ihnp4, et. al}!watmath!wjafyfe
			wjafyfe@waterloo.csnet

thomson@uthub.UUCP (Brian Thomson) (05/29/85)

Chris Shaw writes about orthogonality:
>When designing
>the 32032 then, the kind of programs the designers of the chip had in mind 
>were those that would be created by high-level languages. Thus, they made the
>instruction set as easy as possible to write compilers for. 
>On a general-purpose CPU, ... most programs will
>be created by programs (compilers), so it makes sense to tailor the instruction
>set to its intended programmers.

In my experience, the difficulty of (decent) compiler construction is affected
less by orthogonality than by the number of code sequences that must be
considered when implementing a given source language construct.
The C statement
		a = b * c + d + e;
might, in different contexts, be implemented on your 32032 as:

		movd	_b,r0
		muld	_c,r0
		addd	_d,r0
		addd	_e,r0
		movd	r0,_a

or, if c is the constant 2, d a stack local, and e the constant 4,

		movd	_b,r0
		addr	4(-4(fp))[r0:w],_a

or even, if b, c, and d are all unsigned shorts, and e == b,

		movzwd	_b,r0
		indexw	r0,_c,_d	; b * (c+1) + d
		movd	r0,_a

Does that last one look ridiculous?  That's exactly my point: it's the
best code sequence under the given set of assumptions, and no compiler
would ever find it.  If these fancy addressing modes and high-level
language oriented instructions could be added without penalizing the
performance of bread-and-butter instructions, I'd be all for it, but
such is never the case.

If a machine forces me to put something in a data register before I
can add to it, and has no exceptions to this rule, it will be easy to
generate code.  It only gets tough when there are options.
-- 
		    Brian Thomson,	    CSRI Univ. of Toronto
		    {linus,ihnp4,uw-beaver,floyd,utzoo}!utcsrgv!uthub!thomson

seth@megad.UUCP (Seth H Zirin) (05/30/85)

>       Dan Frank writes:
>
>    Clever compilers for almost any language but C can paper over most sorts
> of yawning chasms.
> The problem is that C is a programming language written with a particular
> machine storage model in mind, and it ports poorly to other architectures.

C ports well to any large linear address space, stack-oriented processor.  In
addition, it ports well to small linear address space processors like the 
6809.  I've used it on IBMs, Univac 1100s, VAXen, and all of the
680x0 processors.

>    The problem is C, not Intel.  If you have programs that require enormous
> data arrays, you picked the wrong processor, didn't you?  Otherwise, you just
> picked the wrong language.

WRONG! C lets you exploit a machine's strengths at the expense of not hiding
the weaknesses.  If you've picked Intel for C, you've picked the wrong
processor; the 68000 doesn't need to sweep its weaknesses under a high-level
language rug.  C does, however, work nicely with Intel PROMs.

> I have a stupid 68000 system in my basement that I can't use and can't sell
> because there's no software for it, and one of those vending machines
> sitting on my desk.

If you can't use a 68000 based machine, you're probably in the wrong field.
On the topic of selling it, send me EMAIL with your asking price.

-- 
-------------------------------------------------------------------------------
Name:	Seth H Zirin
UUCP:	{decvax, ihnp4}!philabs!sbcs!megad!seth

Keeper of the News for megad

guy@sun.uucp (Guy Harris) (06/01/85)

> The problem is that C is a programming language written with a particular
> machine storage model in mind, and it ports poorly to other architectures.
> Modula-2, Pascal, Ada, all are languages that port quite well to the 8086
> family, and produce efficient, readable code without any sort of trickery
> required of the programmer.

Umm... Pascal has pointers, "new", and "dispose", just like C has pointers
and its library has "malloc" and "free".  How would you write a program
that, say, manipulated large trees requiring >64KB worth of node storage in
Pascal?  Probably similarly to how you'd write it in C.  Now imagine that
program dealing with two nodes at the same time.  Well, you load one pointer
into the DS register and one of the general-purpose registers, and the other
one into the ES register and one of the general purpose registers.  You have
to kludge a bit with the "use the ES register" prefix, but the advance
information data sheet I have in front of me doesn't list any times for the
segment-selection prefix so I assume it takes no time.

Now imagine that program dealing with three nodes at the same time - say
it's an expression tree, and it's evaluating an addition by adding the LHS
to the RHS and storing the result in the parent node.  Well, you load one
pointer into the DS register and one of the general purpose registers, the
second one into the ES register and one of the general purpose registers,
and the third one into the EES register and one of the general purpose
registers... Oops.  There *is* no EES register.  Oh well, shuffle shuffle
shuffle....  (If you can get good code for this one, try something where you
have to use each pointer more than once.)
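
The source that forces all this shuffling is nothing exotic.  Here is a
sketch (the struct and the names are made up, assuming a large-model compile
where every node pointer is a far pointer):

	struct enode {
	    long value;
	    struct enode *lhs, *rhs;	/* far pointers in a large-model compile */
	};

	/* three far pointers live at once: parent, lhs, and rhs */
	void eval_add(struct enode *parent)
	{
	    parent->value = parent->lhs->value + parent->rhs->value;
	}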

The same data sheet says a direct intersegment call takes 19 more clocks
than a direct within segment call in protected mode, and even in real
address mode it takes 6 more clocks.  Load pointer to DS/ES instructions
take 21 clocks in protected mode and 7 in real address mode.  Using those
segment registers dynamically rather than statically is not cheap.

Do you have evidence that compilers for the other languages you mention can
solve problems like this much better than compilers of equivalent quality
for C?  If so, can you show that the difference is due to some
characteristic of the languages in question?

	Guy Harris

doug@terak.UUCP (Doug Pardee) (06/03/85)

Wait a second!  It looks like I should have used one of my "patented"
200-line postings, because an awful lot of people have misinterpreted
my comments.

The original posting to which I had responded did *not* say that EA
orthogonality would result in better compiled code.  It said that EA
orthogonality would allow the compiler writer to save himself the
trouble of swapping the operands of a compare instruction and reversing
the branch condition accordingly.
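
For reference, the transformation being talked about is this (a sketch; the
enum and function names are invented): when the operands of a compare are
exchanged, the branch must test the reversed relation (a < b is the same
test as b > a); EQ and NE are unaffected.

	enum rel { LT, LE, GT, GE, EQ, NE };

	/* relation to branch on after the compare operands are swapped */
	static enum rel swap_relation(enum rel r)
	{
	    switch (r) {
	    case LT: return GT;
	    case LE: return GE;
	    case GT: return LT;
	    case GE: return LE;
	    default: return r;		/* EQ and NE are symmetric */
	    }
	}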

This does *NOT* improve the performance of the compiled code.  In fact,
on the NS320xx CPUs (the only ones around with a 2-address architecture),
a "backwards" compare instruction takes an extra 2 clock cycles of
execution time.

I have no objection to compiler writers who wish to make a case that EA
orthogonality will result in better compiled object code.  But I object
strenuously to the notion that, regardless of whether it would benefit
or hurt the users, the CPU architecture should be changed to please lazy
compiler writers.

EA orthogonality should be argued on the basis of the efficiency of the
resulting object code, not on the ease with which the handful of
compiler writers can do their job.

Some of the notes have indicated that these concerns are one and the
same.  Sometimes, but not always.  Here's a choice counter-example:
Some RISC machines have a "branch *after* next instruction" operation.
This allows the pipeline to be used more efficiently.  It results in
more efficient object code than conventional branch instructions, but
it is a booger-bear to write an effective compiler for.

A lot of folks have also suggested that compilers which were easily
written (I call them "hastily knocked out" :-) are more bug-free than
ones that took some time to implement.  I maintain that the quantity of
bugs is related to the quantity and quality of design and debugging.
Now how much design and debugging do you expect to get from a compiler
writer who thinks that putting the operands of a "compare" instruction
in the proper order is "too much work"?

It is also said that good compilers take longer to produce than crummy
ones.  True.  Are we all so impatient that we'd rather have a crummy
compiler now than to wait six months for a good one?

And it has been said that good compilers cost more than crummy ones.
I'm not exactly surprised.  Isn't there an old saw about "only getting
what you pay for"?

I suggest that part of the problem here is that a lot of folks who are
reading this hope to write The Great American Compiler.  They weren't
planning on spending the time and money to write a good compiler.  And
they don't much care for hearing suggestions that users don't want to
buy crummy compilers.  (Have at it, my mailbox is asbestos-lined now).
-- 
Doug Pardee -- Terak Corp. -- !{ihnp4,seismo,decvax}!noao!terak!doug
               ^^^^^--- soon to be CalComp

rap@oliveb.UUCP (Robert A. Pease) (06/05/85)

> 
> You may do what you want, but I'll stick to hardware that was designed
> by software people. 
> -- 
> 	Jack Jansen, jack@mcvax.UUCP
> 	The shell is my oyster.

The thing that I keep thinking about is that every paper, article,
text, or whatever I have seen on the subject says that the best way to
design a system is to first decide what the application will be and
then design the hardware to support the design goals.  Seems to me,
then, that an orthogonal architecture would support high-level
languages much better than one that is not orthogonal, or do I just
see things more clearly than others :-).
-- 
					Robert A. Pease
    {hplabs|zehntel|fortune|ios|tolerant|allegra|tymix}!oliveb!oliven!rap

paul@greipa.UUCP (Paul A. Vixie) (06/05/85)

In article <210@uthub.UUCP> thomson@uthub.UUCP (Brian Thomson) writes:
>The C statement
>		a = b * c + d + e;
>
>might, in different contexts be implemented on your 32032 as:
>		movd	_b,r0
>		muld	_c,r0
>		addd	_d,r0
>		addd	_e,r0
>		movd	r0,_a
>
>or, if c is the constant 2, d a stack local, and e the constant 4,
>		movd	_b,r0
>		addr	4(-4(fp))[r0:w],_a
>
>or even, if b, c, and d are all unsigned shorts, and e == b,
>		movzwd	_b,r0
>		indexw	r0,_c,_d	; b * (c+1) + d
>		movd	r0,_a

Or, how about:

;		extern long int a, b, c, d, e;
;		a = b * c + d + e;
		movd	ext(_b), tos
		muld	ext(_c), tos
		addd	ext(_d), tos
		addd	ext(_e), tos
		movd	tos, ext(_a)

;		extern long int a, b;
;		#define c 2
;		auto long int d;
;		#define e 4
;		a = b * c + d + e;
		movd	ext(_b), tos
		muld	2, tos
		addd	4(fp), tos
		addd	4, tos
		movd	tos, ext(_a)

;		extern long int a;
;		extern unsigned short int b, c, d;
;		a = b * c + d + b;
		movzwd	ext(_b), tos
		movzwd	ext(_c), tos
		muld	tos, tos
		movzwd	ext(_d), tos
		addd	tos, tos
		movzwd	ext(_b), tos
		addd	tos, tos
		movd	tos, ext(_a)

----------------
The above code is neither very pretty nor efficient.  In each case I have done
the same five essential operations:  move, multiply, add, add, move.  The only
real difference is in the addressing modes;  this seems common in
compiler-generated code.

I am no longer (thank <insert deity here>) an expert on the 68xxx, but I
don't remember an external or frame-relative addressing mode;  one assumes
that the many otherwise useless address registers will be used to hold the
current global and frame pointers, and the loader has a lot of fixing up to
do on those globals - every reference needs modification, not just an extern
table (unless you plan to have your compiler generate enough low-level stuff
to do what the 32xxx external addressing mode does automagically).

Not being a compiler writer (yet :-), anyway) I don't see many other things
a compiler could optimize for (except the "muld 2, tos" which could have been
"ashd 1, tos" but only vax-11 C from DEC does this).  I do know that the
68xxx's addressing modes and strange restrictions on address and data registers
are more characteristic of RISC than a machine with all those instructions.
Can the 68xxx even do an "addd -(sp), (sp)" without doing the pop at the wrong
time?  The one I worked with didn't have any memory-to-memory instructions;
you could do register to memory, memory to register, or register to register,
but they were all different instructions (in fact, different instructions for
address and data registers, and that's when they felt like providing them -
often you had to move into an (address or data) register from a (data or
address) register to do a simple operation).

Gosh, what a ramble.  Sorry about that everybody.  My point in all this is
that a compiler can generate *clean* code *easily* for the 32xxx because
of all the neato addressing modes;  generating code for the 68xxx is either
(easy, ugly, inefficient) or (hard, functional, efficient) but that's like
a choice between the electric chair and the gas chamber.

	Paul Vixie
	{pyramid,dual,decwrl}!greipa!paul

mark@rtech.UUCP (Mark Wittenberg) (06/05/85)

> For example:
> 	a += b;
> orthogonal:
> 	add	b(r5),a(r5)
> non-orthogonal:
> 	mov	a(r5),r3		<-- AND MAKE SURE IT'S FREE!!
> 	add	b(r5),r3
> 	mov	r3,a(r5)
> Now, in cycles, the first one would result in 4 memory cycles and
> 3 additions, and the second in 6 memory cycles and 4 additions (PLUS
> an additional 2 instruction decodes).
> 
> -- 
> 	Jack Jansen, jack@mcvax.UUCP
> 	The shell is my oyster.

And furthermore, the orthogonal sequence is normally atomic;
in an OS kernel the non-orthogonal sequence might easily have to
be protected by a "disable/enable interrupt" sequence around it,
or "test-and-set" or some such in a multi-processor system 
(e.g., "a" and "b" might be global vars).
Multi-process user-programs would need "enter/exit monitor" or
"block-on-semaphore" sequences.  Besides being a pain (sometimes
a royal pain) this has the potential for eating a lot of CPU time.
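
In C terms, the non-orthogonal sequence amounts to the following (a sketch,
names made up); this is exactly the window in which an interrupt handler
that also updates "a" gets its update silently overwritten:

	volatile long a, b;

	void bump(void)
	{
	    long t = a;		/* mov  a(r5),r3   (an interrupt may hit here) */
	    t += b;		/* add  b(r5),r3                               */
	    a = t;		/* mov  r3,a(r5)   (clobbers any update made   */
	}			/*                  to "a" in the meantime)    */
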
-- 

Mark Wittenberg
Relational Technology
zehntel!rtech!mark
ucbvax!mtxinu!rtech!mark