[net.arch] M680*0 "small model"

hanko@mot.UUCP (Jim Hanko) (06/12/85)

     In the endless debate over the relative performance of various micros,
much discussion centers around whether "small model" 80*86 code can outperform
M680*0 code developed for 24+ bit addresses and 32 bit integers (a
"huge model").  In fact, Intel has recently been running a glossy ad which
is implicitly based on this type of comparison.  Overlooked in the debate,
however, is the fact that the M680*0 has a logically consistent subset of
addressing modes and operations which constitute a "small model". 

The features of the M680*0 small model are:

	32k program 	- logical addresses 0 to 7fff 
			- absolute short & PC-relative addressing modes
	32k stack&data	- logical addresses f...f8000 to f...fffff
			- absolute short addressing mode
	16 bit integers	- full set of word operations
	16 bit pointers	- the CPU sign-extends word addresses to 32 bits
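
The address ranges listed above follow directly from sign extension of a 16-bit
word. A minimal sketch, assuming 32-bit logical addresses (the function names
are illustrative, not part of any Motorola tool):

```python
def sign_extend16(word):
    """Sign-extend a 16-bit value to 32 bits, as the CPU does for word addresses."""
    word &= 0xFFFF
    return (word | 0xFFFF0000) if word & 0x8000 else word


def region(word):
    """Name the small-model region a 16-bit pointer lands in (assumed layout)."""
    if sign_extend16(word) <= 0x7FFF:
        return "program (low 32k)"
    return "stack & data (top 32k)"


for w in (0x0000, 0x7FFF, 0x8000, 0xFFFF):
    print(f"{w:04x} -> {sign_extend16(w):08x}  {region(w)}")
```

Pointers with the top bit clear stay in the low 32k; pointers with the top bit
set land in the top 32k of the address space.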

     This M680*0 small model is fully self-consistent (including sizeof(int)
being equal to sizeof(int *)), is easy to implement, and is capable of
supporting virtually all published benchmarks that work with the 80*86 small
model.  M680*0 small model code is significantly smaller and faster than
M680*0 huge model code (e.g. absolute addressing takes two fewer bytes and
eight fewer cycles on the M68000).

     The M680*0 small model is not generally supported in M680*0 systems
and compilers for the following reasons:

	1. Small models are not as generally applicable as huge models.
	2. The M680*0 huge model is also relatively easy to implement.
	3. The M680*0 huge model takes a relatively small performance
	   'hit' (compared to other common architectures :-)
	4. Supporting multiple models requires extra compiler code and 
	   redundant libraries.

     Perhaps, if the small model is as good an idea as Intel claims, more
M680*0 systems should support it. 

     The bottom line in performance measurement is, therefore, not whether
a small model 80*86 can or cannot beat a huge model M680*0, but which
small model and which huge model outperforms the other. 

     From now on, let's compare apples to apples. 


================================
Jim Hanko, UNIX group, Motorola Microsystems, Tempe, AZ U.S.A
{seismo | ihnp4 } ! ut-sally ! oakhill ! mot ! hanko
================================
Disclaimer: the opinions expressed here are my own, but anyone may adopt them.

joel@peora.UUCP (Joel Upchurch) (06/15/85)

>      The M680*0 small model is not generally supported in M680*0 systems
> and compilers for the following reasons:
> 
> 	1. Small models are not as generally applicable as huge models.
> 	2. The M680*0 huge model is also relatively easy to implement.
> 	3. The M680*0 huge model takes a relatively small performance
> 	   'hit' (compared to other common architectures :-)
> 	4. Supporting multiple models requires extra compiler code and 
> 	   redundant libraries.
> 

It sounds to me like what the 68000 needs is an optimizing assembler.
On Perkin-Elmer 3200 series computers there are several addressing
modes, using from 2 to 6 bytes per instruction. The assembler
automatically figures the best addressing mode and generates the
appropriate object code. This is an iterative process, since one
squeeze pass can cause instructions on the next pass to be further
squeezed, since the instruction may be closer to the label it is
referencing, allowing a more compact addressing format to be used.
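
A minimal sketch of such an iterative squeeze, not the Perkin-Elmer
assembler's actual algorithm. The instruction sizes (2 and 6 bytes), the
reach of the short form, and measuring displacements from the start of the
instruction are all assumptions made for illustration:

```python
SHORT, LONG = 2, 6                # hypothetical instruction sizes in bytes
SHORT_RANGE = range(-128, 128)    # hypothetical reach of the short form


def layout(program, sizes):
    """Return (label addresses, start address of each item) for current sizes."""
    addr, labels, starts = 0, {}, []
    for i, (kind, arg) in enumerate(program):
        starts.append(addr)
        if kind == "label":
            labels[arg] = addr
        elif kind == "data":
            addr += arg
        else:                     # "jmp"
            addr += sizes[i]
    return labels, starts


def squeeze(program):
    """Start with every jump long; shrink until a pass changes nothing."""
    sizes = {i: LONG for i, (kind, _) in enumerate(program) if kind == "jmp"}
    passes = 0
    while True:
        passes += 1
        labels, starts = layout(program, sizes)
        shrunk = False
        for i, size in sizes.items():
            disp = labels[program[i][1]] - starts[i]
            if size == LONG and disp in SHORT_RANGE:
                sizes[i] = SHORT  # shrinking only ever brings labels closer
                shrunk = True
        if not shrunk:
            return sizes, passes


program = [("jmp", "end"), ("data", 115), ("jmp", "end"),
           ("data", 4), ("label", "end")]
print(squeeze(program))
```

In this example the second jump shrinks on the first pass, which pulls the
label close enough for the first jump to shrink on the second pass; a third
pass confirms that nothing else changes.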

It seems to me that such an optimizer could be used as a post-processor
for the compiler allowing all languages to use this technique. You
should also have the option of running without squeeze to save compile
time during debug.

jer@peora.UUCP (J. Eric Roskos) (06/16/85)

> It sounds to me like what the 68000 needs is an optimizing assembler.
> On Perkin-Elmer 3200 series computers there are several addressing
> modes, using from 2 to 6 bytes per instruction. The assembler
> automatically figures the best addressing mode and generates the
> appropriate object code.

Actually the better 8086 assemblers do this.  And in the days back when I
used to write 8086 assemblers I used an algorithm very similar to the one
you're describing to do it.  It is fairly complicated to do, because the
iterative algorithm can loop forever in pathological cases if you don't
keep track of what you're doing.

Well, actually the Intel assemblers do even more than that... some mnemonics,
such as MOV, can generate 8 or so different opcodes, depending on the
operands, before you even get to all the small details like the addressing
modes, selecting the destination operand and size of immediates, etc.
-- 
Full-Name:  J. Eric Roskos
UUCP:       ..!{decvax,ucbvax,ihnp4}!vax135!petsd!peora!jer
US Mail:    MS 795; Perkin-Elmer SDC;
	    2486 Sand Lake Road, Orlando, FL 32809-7642

	   "Gnyx gb gur fhayvtug, pnyyre..."

guy@sun.uucp (Guy Harris) (06/17/85)

> > It sounds to me like what the 68000 needs is an optimizing assembler.
> > On Perkin-Elmer 3200 series computers there are several addressing
> > modes, using from 2 to 6 bytes per instruction. The assembler
> > automatically figures the best addressing mode and generates the
> > appropriate object code.

> Actually the better 8086 assemblers do this.

Heck, guys, the UNIX PDP-11 assembler does this for branch instructions.  So
does the UNIX VAX-11 assembler and the MIT and AT&T 68000 assemblers.

> > It seems to me that such an optimizer could be used as a post-processor
> > for the compiler allowing all languages to use this technique. You
> > should also have the option of running without squeeze to save compile
> > time during debug.

Since PCC-derived C compilers and "f77"-derived FORTRAN 77 compilers
generate assembly code instead of object code, they all use this technique.

	Guy Harris

hammond@petrus.UUCP (06/17/85)

> Joel Upchurch writes:
> It sounds to me like what the 68000 needs is an optimizing assembler.
> On Perkin-Elmer 3200 series computers there are several addressing
> modes, using from 2 to 6 bytes per instruction. The assembler
> automatically figures the best addressing mode and generates the
> appropriate object code. ...

to which J. Eric Roskos responds:
> Actually the better 8086 assemblers do this.  And in the days back when I
> used to write 8086 assemblers I used an algorithm very similar to the one
> you're describing to do it.  It is fairly complicated to do, because the
> iterative algorithm can loop forever in pathological cases if you don't
> keep track of what you're doing. ...

a) See Communications of the ACM April 1978, (Vol. 21, No.4) which has
   "Assembling Code for Machines with Span-Dependent Instructions" by
   Thomas G. Szymanski.  It provides a nice terminating algorithm for
   the job.  I used it in an assembler for the Series/1 and it worked.
   Of course, unless all your code is in 1 assembler file, you'll miss
   some optimizations and be forced to make worst-case assumptions about
   some other cases.

b) As far as the need for optimizations, I think an appropriate comment
   was made by Steve Johnson (PCC author) during a class at BTL on the
   pcc.  The comment was in reference to the optimizations done by typical
   IBM compilers versus those done by typical C compilers and he said
   (I paraphrase here)  "When you've committed yourself to an architecture
   like IBM's, you'd better become the world's expert in optimizing
   compiler technology."  The same comment is applicable to Intel in the
   micro world.

Rich Hammond, [ucbvax,decvax,allegra]!bellcore!hammond

darrell@sdcsvax.UUCP (Darrell Long) (06/17/85)

Can you give a bound on the number of passes that your Perkin-Elmer
compiler makes?
-- 
Darrell Long
Department of Electrical Engineering and Computer Science
University of California, San Diego

USENET: sdcsvax!darrell
ARPA:   darrell@sdcsvax

jnw@mcnc.UUCP (John White) (06/18/85)

> >      The M680*0 small model is not generally supported in M680*0 systems
...
> It sounds to me like what the 68000 needs is an optimizing assembler.
...
> It seems to me that such an optimizer could be used as a post-processor
> for the compiler allowing all languages to use this technique.

    There is a problem in doing this. Since several files can be compiled
separately and then linked together, there is no way the compiler can
know what data can be fit into the area of memory accessed by "short"
instructions. Only the linker can know where each piece of data will be
put. I have seen a compiler (on a time-sharing system) that could be
given a "shortstatic=number" switch. All variables shorter than "number" bytes
would be put in a separate partition that the linker would put within
the first 32k of memory. Now the compiler would be able to generate short
absolute addresses instead of long absolute addresses. Everything except
the biggest arrays could usually be fit in this short partition. In
fact, the savings from the shortstatic switch were about as large as from
the -O switch, only the shortstatic switch didn't slow the compiler down.
The -O and shortstatic switches were independent, so they could be used
together for even greater savings.
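
A minimal sketch of the size-threshold split described above. The function
name, the dictionary representation, and the overflow check are assumptions;
the post does not say how that compiler handled a short partition that
outgrew 32k:

```python
SHORT_LIMIT = 0x8000   # bytes reachable through a 16-bit absolute address


def partition(variables, shortstatic):
    """Split {name: size_in_bytes} into (short, long) partitions.

    Variables smaller than `shortstatic` go to the short partition, which
    the linker is assumed to place within the first 32k of memory.
    """
    short = {n: s for n, s in variables.items() if s < shortstatic}
    long_ = {n: s for n, s in variables.items() if s >= shortstatic}
    if sum(short.values()) > SHORT_LIMIT:
        # Hypothetical policy: refuse rather than silently spill over.
        raise ValueError("short partition exceeds 32k; lower the threshold")
    return short, long_


variables = {"count": 2, "buf": 512, "table": 40000}
short, long_ = partition(variables, shortstatic=1024)
print(sorted(short), sorted(long_))
```

Only the 40000-byte table needs long absolute addresses here; the compiler
can decide this per variable without knowing the final link layout.
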
- John N. White   {duke, mcnc}!jnw

carter@masscomp.UUCP (Jeff Carter) (06/18/85)

In article <1069@peora.UUCP> joel@peora.UUCP (Joel Upchurch) writes:
>>      The M680*0 small model is not generally supported in M680*0 systems
>> and compilers for the following reasons:
>>   .........etc.......etc.....etc......
>
>It sounds to me like what the 68000 needs is an optimizing assembler.
>On Perkin-Elmer 3200 series computers there are several addressing
>modes, using from 2 to 6 bytes per instruction. The assembler
>.......etc....etc
>referencing, allowing a more compact addressing format to be used.
>
>It seems to me that such an optimizer could be used as a post-processor
>for the compiler allowing all languages to use this technique.  (etc.)

Or perhaps I could suggest a different method. Rather than add another
postprocessor to the 6 stages of processing already involved in
such *Wonderful* Unix utilities as pcc, why not modify the assembler
to choose the shortest possible address mode for jumps and branches, 
etc. so that the compiler doesn't have to know such things?
The compiler says "branch to location _xxA3" and if this label
is "close enough" for the small model to apply, then the assembler
generates the proper short branch. This will work for all references
inside of a given program unit. Now, global references from external
program units are another problem....... (Post-process the a.out file? :-))

[The views expressed herein have no relationship to reality, as neither
 does the author]

Jeff Carter
MASSCOMP 
1 Technology Park,
Westford, MA 01886
UUCP: .....!{ihnp4 | decvax | allegra |harpo}!masscomp!carter

hammond@petrus.UUCP (06/18/85)

As someone has already pointed out, the UNIX assemblers since the time
of the PDP 11 have done the span-dependent optimization for branch
instructions (i.e. select branch or branch of opposite flavor over jump).
They haven't selected the shortest addressing mode since you have to allow
for separate compilation, which means that things like pointers passed
between functions in different files must be in the long format, unless
you restrict programs to running in just 64k.  Further, global variables
generally end up with 32 bit absolute addresses.  To get rid of these
you need to integrate the assembler, loader, and library archiver to
be able to handle multiple size addresses.  This is a non-trivial task.
It gets worse for something like the NSC 32032 where you can have 7, 14,
or 30 bit offsets instead of just 16 and 32 bits.
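
To illustrate why more offset sizes mean more cases for the tools to track,
here is a minimal sketch that picks the smallest of the three widths for a
given offset. It treats each width as a plain two's-complement field and
assumes encoded sizes of 1, 2 and 4 bytes; the real encoding may differ in
detail:

```python
# (field width in bits, encoded size in bytes); the byte counts are an assumption
FIELDS = [(7, 1), (14, 2), (30, 4)]


def offset_bytes(offset):
    """Return the size of the smallest field that can hold a signed offset."""
    for bits, nbytes in FIELDS:
        if -(1 << (bits - 1)) <= offset < (1 << (bits - 1)):
            return nbytes
    raise ValueError("offset does not fit in any field")


for off in (63, 64, -8192, 8192):
    print(off, offset_bytes(off))
```

Every symbol whose final address is unknown at assembly time forces a choice
among three sizes instead of two, which is what the assembler, loader, and
archiver would all have to agree on.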

Another problem, at least on UNIX, is that the code is mapped in the lowest
portion of the address space, which is where small absolute addresses
point.  To be most effective, the code, which can easily reference other
code locations with PC relative offsets ought not to use the region where
small absolute addresses work, so that the data can be placed there.
For most programs, global data would easily fit in the first 32k of address
space with a reduction in code size.

Rich Hammond BCR	[allegra, decvax, ucbvax]!bellcore!hammond

steve@anasazi.UUCP (Steve Villee) (06/18/85)

> 
>      In the endless debate over the relative performance of various micros,
> much discussion centers around whether "small model" 80*86 code can outperform
> M680*0 code developed for 24+ bit addresses and 32 bit integers (a
> "huge model").  In fact, Intel has recently been running a glossy ad which
> is implicitly based on this type of comparison.  Overlooked in the debate,
> however, is the fact that the M680*0 has a logically consistent subset of
> addressing modes and operations which constitute a "small model". 
> 
>      From now on, let's compare apples to apples. 

I don't think it's quite fair to compare this "small model" with
the small model provided by the Intel 8086 or 8088.  If your 68000 system
has an MMU, it might be almost reasonable, but even there, many OS's
are not set up to let user programs address the range ffff8000 to ffffffff.
So it might well be that all code and data would have to fit in the
first 32K.  In any case, this 68000 "small model" does not readily
upgrade to something analogous to the Intel medium model (separate
code and data, with 64K for each).

The real problem is if your 68000 system does not have an MMU.  This
is the case with the Macintosh, Jackintosh, Amiga, etc. that are in
the same price league with the IBM PC and clones.  In this case, you
are limited to one "small model" process running at once, and it is
very unlikely that the range ffff8000 to ffffffff is available.  You
also have to watch out for the interrupt vectors.

Don't get me wrong.  I'm no fan of the 8086 architecture.  But at least
things like PC/IX are possible on an 8086 system without an MMU,
with lots of small model processes running that the OS can switch
between easily.  While there is no rigid memory protection, it's
pretty tough to step on other processes accidentally.  You have
to go into assembly language and muck with the ES or something.

A fairer "small model" for the 68000 would use (Am,Dn.W) addressing
modes, with the 16-bit pointer in Dn and some kind of base address
in Am.  This would provide closer to the same capability that Intel
small model provides.  But of course this would be slower than the
large model!

--- Steve Villee (asuvax!anasazi!steve)
    International Anasazi, Inc.
    2219 East University Drive
    Phoenix, Arizona 85034
    (602) 275-0302

joel@peora.UUCP (Joel Upchurch) (06/19/85)

> Can you give a bound on the number of passes that your Perkin-Elmer
> compiler makes?
> -- 
> Darrell Long
> Department of Electrical Engineering and Computer Science
> University of California, San Diego

On the largest module I could lay my hands on (~29,000 lines),
it was 10 passes. Assembling smaller modules seems to take about
5 to 6 passes. Of course you have the option of assembling without
squeeze during the debug cycle. There are also debug and optimize
options with our FORTRAN compiler.

edler@cmcl2.UUCP (Jan Edler) (06/20/85)

In article <587@mcnc.mcnc.UUCP> jnw@mcnc.UUCP (John White) writes:
>Since several files can be compiled
>separately and then linked together, there is no way the compiler can
>know what data can be fit into the area of memory accessed by "short"
>instructions. Only the linker can know where each piece of data will be put.

I think the article on page 591 of the proceedings of the June 1985
USENIX conference describes an attempt to solve this problem.

	Jan Edler
	New York University
	cmcl2!edler
	edler@nyu

jss@sjuvax.UUCP (J. Shapiro) (06/22/85)

"This" referring to address shrinking of span dependent code:

> Heck, guys, the UNIX PDP-11 assembler does this for branch instructions.  So
> does the UNIX VAX-11 assembler and the MIT and AT&T 68000 assemblers.
> 
> 	Guy Harris

Not quite right, Guy. As I understood it, the Perkin Elmer assembler was
described as doing an essentially arbitrary number of passes. With UNIX
it was decided that this approach was too slow, and that the break point
of cost effectiveness was at two passes, which catches about 80 percent
of such things. The PDP-11 compilers and the VAX compilers both make only
two passes.

You are correct that they will replace jump/branch as appropriate. This
is essentially an extension of the concept of span dependent optimization
of addressing modes.

I recently went off and looked into all of this, which is how I learned.

Jon Shapiro

chris@umcp-cs.UUCP (Chris Torek) (06/23/85)

Actually, the 4BSD Vax assembler doesn't shrink long branches, it
expands short ones.  This generally (always? haven't thought about it
enough) produces shorter code.
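
A minimal sketch of the start-short-and-expand strategy, not the 4BSD
assembler's actual code. The branch sizes (2 and 6 bytes), the short reach,
and measuring displacements from the start of the branch are assumptions:

```python
SHORT, LONG = 2, 6    # hypothetical branch sizes in bytes


def fits_short(disp):
    return -128 <= disp <= 127   # hypothetical reach of the short form


def assemble_growing(program):
    """Start with every branch short; promote only those that cannot reach.

    program: list of ("label", name), ("data", nbytes) or ("br", target).
    Returns (indices of branches made long, total size in bytes).
    """
    long_form = set()
    while True:
        addr, labels, starts = 0, {}, {}
        for i, (kind, arg) in enumerate(program):
            if kind == "label":
                labels[arg] = addr
            elif kind == "data":
                addr += arg
            else:
                starts[i] = addr
                addr += LONG if i in long_form else SHORT
        too_far = {i for i, a in starts.items()
                   if i not in long_form
                   and not fits_short(labels[program[i][1]] - a)}
        if not too_far:
            return long_form, addr
        long_form |= too_far      # the set only grows, so the loop terminates


program = [("label", "top"), ("br", "bottom"), ("data", 123),
           ("br", "top"), ("label", "bottom")]
print(assemble_growing(program))
```

With both branches short this example assembles to 127 bytes. Under the same
assumed sizes, a scheme that starts with both branches long sees displacements
of 135 and -129, so neither branch can be shrunk on its own and the code stays
at 135 bytes.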

The assembler also does branch tunneling (branch to a branch that gets
you to your destination if it's shorter & faster to do so).
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 4251)
UUCP:	seismo!umcp-cs!chris
CSNet:	chris@umcp-cs		ARPA:	chris@maryland

jer@peora.UUCP (J. Eric Roskos) (06/24/85)

> If your 68000 system has an MMU, it might be almost reasonable, but even
> there, many OS's are not set up to let user programs address the range
> ffff8000 to ffffffff.

I get the feeling this statement is based on ONE operating system, the OS
for the Apple Macintosh.  One of its major flaws is that certain types of
program segments are limited to 32K.  I believe, if I'm not mistaken, that
this is because they used entirely PC-relative addressing in order to aid
in code relocation.

> Don't get me wrong.  I'm no fan of the 8086 architecture.  But at least
> things like PC/IX are possible on an 8086 system without an MMU,
> with lots of small model processes running that the OS can switch
> between easily.

Is it true that PC/IX uses this approach?  It had been my understanding that
PC/IX worked more or less like AT&T's "Mini-Unix*" for the PDP-11/03, where
the old process was swapped out and the new one swapped in each time a new
process was scheduled to run.

Actually, that is one of the few places where the 8086's segmentation
registers would seem particularly useful to me.  Unfortunately, the unflagging
demand for "large model" compilers on the part of IBM PC users more or less
did away with the chance for that sort of thing.

> A fairer "small model" for the 68000 would use (Am,Dn.W) addressing
> modes, with the 16-bit pointer in Dn and some kind of base address
> in Am.

I think the Macintosh's OS does this for the data segment; i.e., data is
all addressed off one base register.

----------

NOTE: followups to this article will go to net.arch, where they more properly
belong.

*Unix is a trademark of AT&T.
-- 
Shyy-Anzr:  J. Eric Roskos
UUCP:       ..!{decvax,ucbvax,ihnp4}!vax135!petsd!peora!jer
US Mail:    MS 795; Perkin-Elmer SDC;
	    2486 Sand Lake Road, Orlando, FL 32809-7642

	    "Erny vfgf qba'g hfr Xbqnpuebzr."

mike@peregrine.UUCP (Mike Wexler) (06/25/85)

> Don't get me wrong.  I'm no fan of the 8086 architecture.  But at least
> things like PC/IX are possible on an 8086 system without an MMU,
> with lots of small model processes running that the OS can switch
> between easily.  While there is no rigid memory protection, it's
> pretty tough to step on other processes accidentally.  You have
> to go into assembly language and muck with the ES or something.
How about this.  On an Altos 586 (an 8086-based micro), if you pass the
compiler an unterminated string, it will cause the whole system to
crash.  I have done application development work on the 586 in C (not
recommended) and it crashes quite often.  I did not write one
line of assembler.
-- 
--------------------------------------------------------------------------------
Mike Wexler(trwrb!pertec!peregrine!mike) | Send all flames to:
15530 Rockfield, Building C              |	trwrb!pertec!peregrine!nobody
Irvine, Ca 92718                         | They will then be given the 
(714)855-3923                            | consideration they are due.

hanko@mot.UUCP (Jim Hanko) (06/26/85)

> I don't think it's quite fair to compare this "small model" with
> the small model provided by the Intel 8086 or 8088.  ...

> The real problem is if your 68000 system does not have an MMU.  ...
> In this case, you
> are limited to one "small model" process running at once, and it is
> very unlikely that the range ffff8000 to ffffffff is available.  You
> also have to watch out for the interrupt vectors.

> A fairer "small model" for the 68000 would use (Am,Dn.W) addressing
> modes, with the 16-bit pointer in Dn and some kind of base address
> in Am.  This would provide closer to the same capability that Intel
> small model provides.  But of course this would be slower than the
> large model!
> 
> --- Steve Villee (asuvax!anasazi!steve)
>     International Anasazi, Inc.


     Actually, one non-MMU M680*0 "small model" would involve using two
A-registers as the base of the code and data segments.  Then, the d(An) modes
could be substituted for the absolute short modes.  (Note: other formulations
are possible, including pointing to the middle of the segments, allowing
64K program and 64K data).
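
The 64K figure follows from the signed 16-bit displacement of the d(An) mode.
A minimal sketch, with hypothetical segment addresses chosen only for
illustration:

```python
def reachable(base, segment_start, segment_size):
    """Count the bytes of a segment addressable as d16(An) with An = base.

    A signed 16-bit displacement reaches base-0x8000 through base+0x7FFF.
    """
    lo = max(segment_start, base - 0x8000)
    hi = min(segment_start + segment_size, base + 0x8000)   # exclusive bound
    return max(0, hi - lo)


SEG = 0x20000                                  # hypothetical segment start
print(reachable(SEG, SEG, 0x10000))            # base at start of segment
print(reachable(SEG + 0x8000, SEG, 0x10000))   # base in middle of segment
```

With the base register at the start of the segment only the positive half of
the displacement range is useful; moving it to the middle uses both halves.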

     Note that this would not "be slower than the large model!", but would
in fact cause the SAME SIZE AND SPEED code to be generated as my original
M680*0 "small model", which required an MMU.  The only loss it suffers is
that two of the A registers would no longer be available for register
variables (are they segment registers? :-). 

     The (Am,Dn.W) modes you mentioned would be used, as in the original
"small model", to index into arrays after their base is loaded into an A
register. 

		e.g. a[i + 1] = c;  /* i int; c char; a char array */

	MMU small model:		non-MMU small model:
	(absolute short modes)		(assume A5 has base address of data)

	LEA.W	a,A0			LEA.W	a(A5),A0
	MOVE.W	i,D0			MOVE.W	i(A5),D0
	MOVE.B	c,1(A0,D0.W)		MOVE.B	c(A5),1(A0,D0.W)

     Both implementations require 14 bytes and 43 cycles on an M68000. 


================================
Jim Hanko, UNIX group, Motorola Microsystems, Tempe, AZ U.S.A
{seismo | ihnp4 } ! ut-sally ! oakhill ! mot ! hanko
================================
Disclaimer: the opinions expressed here are my own, but anyone may adopt them.