[comp.arch] MicroVAX emulation

scc@cl.cam.ac.uk (Stephen Crawley) (03/11/89)

In article <12000@haddock.ima.isc.com> Stephen Uitti writes:
>uVAX IIs implement all sorts of VAX instructions that just aren't in the
>hardware.  Both VMS & flavors of UNIX do this (sometimes even correctly) ...
>Almost no one uses these instructions, so who cares?

We do!  The MIT CLU compiler generates LOCC, CMPC and MATCHC instructions 
for various operations on the (builtin) string type. 

OK ... so you take a performance hit on an Ultrix uVAX II.  Fine you say.

But on a UVaxII running the Amoeba OS, your program keels over!!  Why?

Well ... the Amoeba kernel does not include the code for emulating the
instructions!  And why is that?  Because the emulation code in Ultrix
and VMS is all copyrighted by DEC ... and the CWI folks never got
around to reimplementing it.  (Why should they?  They write their 
code in C, and C compilers don't generate the string instructions ...)

So we hacked a compiler switch into the VAX Clu optimiser so that
we could disable generation of the troublesome instructions.

Contentious statement:
  Until DEC is prepared to supply free and unencumbered source code
  for the instruction emulation routines, the uVAX II cannot be said
  to implement the VAX architecture!
  
-- Steve

#include "disclaimer.equ"

suitti@haddock.ima.isc.com (Stephen Uitti) (03/14/89)

In article <679@scaup.cl.cam.ac.uk> scc@cl.cam.ac.uk (Stephen Crawley) writes:
>In article <12000@haddock.ima.isc.com> Stephen Uitti writes:
>>uVAX IIs implement all sorts of VAX instructions that just aren't in the
>>hardware.  Both VMS & flavors of UNIX do this (sometimes even correctly) ...
>>Almost no one uses these instructions, so who cares?
>
>We do!  The MIT CLU compiler generates LOCC, CMPC and MATCHC instructions 
>for various operations on the (builtin) string type. 
>
>OK ... so you take a performance hit on an Ultrix uVAX II.  Fine you say.
	Wait a minute.  I never said that taking a performance hit
	would be OK.  It was more like, "If an instruction is never
	used anyway...".

	Similar things happen with floating point.  For example,
	current compilers for the Mac (Lightspeed C) support
	68881 code generation.  Of course, a Mac Plus doesn't
	have one of these.  Code produced with the option will
	either work fast or not at all.  One could have a VAX
	compiler option to generate weird instructions or not.
	The code with the weird instructions would run faster
	(one hopes) on, say, an 8600, but wouldn't run at all
	when brought to a uVAX II.  I prefer the way Turbo C
	handles 8087 (floating) support on PC:  look to see if it
	is there - if so, use it, if not, emulate it in software.
	That way it at least always runs.
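
A minimal C sketch of that "use it if present, otherwise emulate" pattern.
Everything in it is hypothetical: fp_select, the coprocessor_present flag
and the two stand-in multiply routines are illustration only, not Turbo C's
actual mechanism.  The routine is chosen once and called through a pointer
afterwards.

```c
#include <assert.h>
#include <stddef.h>

/* One arithmetic entry point; which body runs is decided at startup. */
typedef double (*fp_mul_fn)(double a, double b);

/* Stand-in for code that would use coprocessor instructions. */
static double mul_with_coprocessor(double a, double b)
{
    return a * b;
}

/* Stand-in for a software emulation routine (same result, slower). */
static double mul_emulated(double a, double b)
{
    return a * b;
}

/*
 * Hypothetical selector: coprocessor_present would come from a
 * machine-specific probe that is not shown here.
 */
fp_mul_fn fp_select(int coprocessor_present)
{
    fp_mul_fn chosen = coprocessor_present ? mul_with_coprocessor
                                           : mul_emulated;
    assert(chosen != NULL);
    return chosen;
}
```

A program would call fp_select() once at startup and then do all its
multiplies through the returned pointer, so the same binary runs on both
kinds of machine.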

>But on a UVaxII running the Amoeba OS, your program keels over!!  Why?
>Well ... the Amoeba kernel does not include the code for emulating the
>instructions!  And why is that?  Because the emulation code in Ultrix
>and VMS is all copyrighted by DEC

	4.3 BSD (Mt Xinu?) implemented their code from the
	architecture manuals.  The code differed from the Ultrix
	code, in that it didn't work.  Something about using the
	wrong manuals.  I assumed that since the OS seemed to
	work that these instructions were not used.  Silly me.
	Amoeba doesn't "really support" the uVAX II.  Maybe the
	Amoeba people can get support code from Mt Xinu.  [For
	a small fee, I'd write it. :-]

>Contentious statement:
>  Until DEC is prepared to supply free and unencumbered source code
>  for the instruction emulation routines, the uVAX II cannot be said
>  to implement the VAX architecture!

	Pretty soon we'll all be working for the Free Software Foundation.

>-- Steve
	Stephen.
>
>#include "disclaimer.equ"
	yeah, sure.

scc@cl.cam.ac.uk (Stephen Crawley) (03/20/89)

> > OK ... so you take a performance hit on an Ultrix uVAX II.  Fine you say.
> Wait a minute.  I never said that taking a performance hit would be OK.

Excuse me, but I didn't mean to imply that you (Stephen Uitti) would say
that.  I thought I was expressing myself clearly, but perhaps I should 
have said "Fine I hear you say".

> Similar things happen with floating point [example about 68000's & 68881's 
> & Lightspeed C deleted for brevity]

The example is not applicable for 2 reasons.

1)	The "68000 architecture" does not include FP instructions.  The
	"VAX architecture" DOES include the string instructions.

2)	You are talking about 68000 compilers that have a compile time
	option for generating instructions for a 68881.  The MIT CLU
	compiler has NO SUCH OPTION.
	
> Amoeba doesn't "really support" the uVAX II.

Strictly speaking that's true.  However, none of the software CWI supplies
minds the missing emulation code, and the C compilers in question don't
generate the string instructions. 

> Maybe the Amoeba people can get support code from Mt Xinu.

Or DEC, or Berkeley.  Last time I talked to them about this, the CWI folk 
didn't want to get into all the legal hassle.  I quite understand their
position on this.

> > Contentious statement:
> >  Until DEC is prepared to supply free and unencumbered source code
> >  for the instruction emulation routines, the uVAX II cannot be said
> >  to implement the VAX architecture!

> Pretty soon we'll all be working for the Free Software Foundation.

Huh?  Now you're putting words into MY mouth!

I've no problems with people paying money for software that they need.  
I'm objecting to people being **obliged** to fork out a few thousand quid 
for a software license that is going to be 99.8% useless.

[The following is for your benefit Stephen Uitti since you do not 
 seem to understand what #include "disclaimer.equ" means.]

The views expressed above are my own, and do not represent the
official position of the Cambridge University Computer Laboratory.

slackey@bbn.com (Stan Lackey) (03/21/89)

In various articles various writers write:
>> > OK ... so you take a performance hit on an Ultrix uVAX II.  Fine you say.
>> Wait a minute.  I never said that taking a performance hit would be OK.
>> Similar things happen with floating point [example about 68000's & 68881's 
>> & Lightspeed C deleted for brevity]
>
>The example is not applicable for 2 reasons.
>
>1)	The "68000 architecture" does not include FP instructions.  The
>	"VAX architecture" DOES include the string instructions.
True statements.  HOWEVER, the MICROvax architecture does NOT include
string instructions, other than MOVC3 (and MOVC5?).  Some DEC-supplied
software includes emulation to ease the transition.  Generating the
in-line emulation using a reduced instruction set is NO PROBLEM, right?
And gets better performance too?  :-)

>> >  Until DEC is prepared to supply free and unencumbered source code
>> >  for the instruction emulation routines, the uVAX II cannot be said
>> >  to implement the VAX architecture!
1.  The uVAX is NOT said to implement the VAX architecture.
2.  MIPS, AMD, Intel, Motorola, Sun, etc. don't do your work for you either.
    I don't understand why it is OK for the RISC suppliers to supply
    reduced instruction sets, but if DEC does it it's evil.
-Stan

rogerk@mips.COM (Roger B.A. Klorese) (03/22/89)

In article <37515@bbn.COM> slackey@BBN.COM (Stan Lackey) writes:
>The MICROvax architecture does NOT include
>string instructions, other than MOVC3 (and MOVC5?).  Some DEC-supplied
>software includes emulation to ease the transition.  Generating the
>in-line emulation using a reduced instruction set is NO PROBLEM, right?

There *IS NO* MicroVAX architecture.  There is a MicroVAX *implementation*
of the VAX architecture, which implements some instructions in software.
This is very different.

>1.  The uVAX is NOT said to implement the VAX architecture.

I've never seen this claimed.

>2.  MIPS, AMD, Intel, Motorola, Sun, etc. don't do your work for you either.
>    I don't understand why it is OK for the RISC suppliers to supply
>    reduced instruction sets, but if DEC does it it's evil.

Ours is not "reduced" as in an incompatible subset of our other products.
Our *architecture* is designed around instruction leanness.  If we were to
release a system on which some of our instructions would not run in either
hardware or provided software, that would not be reduced, that would be
STUPID.
-- 
Roger B.A. Klorese                                  MIPS Computer Systems, Inc.
{ames,decwrl,pyramid}!mips!rogerk      928 E. Arques Ave.  Sunnyvale, CA  94086
rogerk@servitude.mips.COM (rogerk%mips.COM@ames.arc.nasa.gov)   +1 408 991-7802
"Committing a gross indecency shows you do things in a big way." - J. Broughton

scc@cl.cam.ac.uk (Stephen Crawley) (03/23/89)

> HOWEVER, the MICROvax architecture does NOT include string instructions, 
> other than MOVC3 (and MOVC5?).  

There is no such thing as the MICROvax architecture.

According to the VAX architecture manual, there is a VAX architecture
with 4 defined subsets:
  Full VAX ... i.e. everything
  Kernel subset
  MicroVAX I subset
  MicroVAX chip subset
  
In the same chapter they also write:
  Also, the combination of hardware and instruction emulation routines
  in the operating systems must (as required) give the appearance of
  a complete architecture on all processors.

All members of the VAX family including microVAXes are sold as implementing
the VAX instruction set ... either in hardware or in software.

> Some DEC-supplied software includes emulation to ease the transition.

No.  There is clearly no concept of a transition involved ... if you
go by what DEC say in the VAX architecture manual.

> Generating the in-line emulation using a reduced instruction set is NO 
> PROBLEM, right? 

Provided you have source to the compiler and skilled effort available,
it boils down to "time == money".  But I could name a few places that 
won't sell source code for any price that WE can afford.  [I'm not talking 
about MIT ... who give CLU compilers etc away free to academic institutions]

> And gets better performance too?  :-)

Obviously.  Of course you emulate the functionality you require rather
than emulating the MATCHC, SKPC, SPANC etc. instructions.
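
A sketch of what "emulating the functionality" can look like in C: plain
loops over a counted string that do the job of a locate-character or
skip-character operation without reproducing any instruction's register
conventions.  The function names and the -1 "not found" convention are
assumptions made for this example, not anything the CLU compiler is known
to emit.

```c
#include <assert.h>
#include <stddef.h>

/* Index of the first byte equal to c in s[0..len), or -1 if none. */
ptrdiff_t locate_char(const unsigned char *s, size_t len, unsigned char c)
{
    size_t i;

    assert(s != NULL || len == 0);
    for (i = 0; i < len; i++) {
        if (s[i] == c)
            return (ptrdiff_t)i;
    }
    return -1;
}

/* Index of the first byte NOT equal to c in s[0..len), or -1 if none. */
ptrdiff_t skip_char(const unsigned char *s, size_t len, unsigned char c)
{
    size_t i;

    assert(s != NULL || len == 0);
    for (i = 0; i < len; i++) {
        if (s[i] != c)
            return (ptrdiff_t)i;
    }
    return -1;
}
```

Because the loops only need what the caller actually wants (an index), a
compiler can inline them and skip the bookkeeping a faithful instruction
emulator would have to do.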

> MIPS, AMD, Intel, Motorola, Sun, etc. don't do your work for you either.
> I don't understand why it is OK for the RISC suppliers to supply
> reduced instruction sets, but if DEC does it it's evil.

Now you are talking about compilers here.  [I hope] MIPS, AMD, etc. would 
not object to you using one of their compilers to cross-compile and 
run code on a bare processor.  I'm happy to pay for the s/w licenses 
for the cross development system. 

The difference between RISC manufacturers and DEC is that with the former
I only need a s/w license for the machine that I run the compiler 
on, while with the latter I need a s/w license for ALL machines.

Another difference is that MIPS, AMD, etc always made a big thing of you
buying their RISC compilers, whereas in the case of the microVAX subsets,
it is in the fine print.

I'd also be happy if DEC unbundled the emulation source code and sold it 
separately for its true value.  I just cannot accept that the true value 
of the emulation code is identical to that of (say) a full Ultrix license.

=====

OK ... so maybe I'm being unrealistic here.  In the real world, anything 
goes so long as it is not illegal.  Screw the client for as much as he
can pay.  That's the advantage of a free market.

Perhaps if we academics sold the results of our efforts back to industry 
at their true market values we could afford to carry a few rip-offs.

Disclaimer:  These are my own views, not those of my employers.

rascal@verdix.com (Stephen Scalpone) (03/23/89)

From the MicroVAX Handbook, Copyright 1984 by DEC, page 1-1:

    The VAX architecture is designed by Digital for its family
	of 32-bit, virtual memory minicomputers.  Digital has also
	defined a proper subset of the VAX architecture called the
	MicroVAX architecture.

The section continues to state what is and is not included in the
MicroVAX architecture.

hascall@atanasoff.cs.iastate.edu (John Hascall) (03/24/89)

In article <15665@winchester.mips.COM> rogerk@mips.COM (Roger B.A. Klorese) writes:
>In article <37515@bbn.COM> slackey@BBN.COM (Stan Lackey) writes:
>>The MICROvax architecture does NOT include
>>string instructions, other than MOVC3 (and MOVC5?).  Some DEC-supplied
>>software includes emulation to ease the transition.  Generating the
>>in-line emulation using a reduced instruction set is NO PROBLEM, right?

>There *IS NO* MicroVAX architecture.  There is a MicroVAX *implementation*
>of the VAX architecture, which implements some instructions in software.
>This is very different.

>>1.  The uVAX is NOT said to implement the VAX architecture.

>I've never seen this claimed.

   To quote from "VAX MACRO and Instruction Set Reference Manual",
   page 9-126, "Character String Instructions--LOCC":

      This instruction is not part of the MicroVAX 
      architecture definition.

   This notice appears at the bottom of each page which describes an
   instruction not in the MicroVAX architecture.


   Since we are on this topic, here is a little experiment I ran.

   I wrote a version of the C function strlen (which basically looks for
   a byte of 0x00) that does not use the LOCC (locate character) instruction,
   but rather uses the trick:

	    (((x - 0x01010101) & ~x) & 0x80808080) != 0

   to scan 4 characters at a time for the null byte.  It basically looks
   like:

	    MOVL    4(AP),R0                ; address of string (arg 1) in R0
      5$:   SUBL3   #^x01010101,(R0),R1     ; R1 = *string - 0x01010101
	    BICL2   (R0)+,R1                ; R1 = R1 & ~*string++
	    BICL2   #^x7F7F7F7F,R1          ; R1 = R1 & 0x80808080
	    BEQL    5$                      ; all 4 bytes are non-zero
	    BBS     #7,R1,10$               ; low order byte was 0
	    BBS     #15,R1,11$              ; second byte was 0
	    BBS     #23,R1,12$              ; third byte was 0
	    SUBL2   #1,R0                   ; high byte was 0, adjust
	    SUBL2   4(AP),R0                ; length = end-start-adjust
	    RET
      10$:  SUBL2   #4,R0                   ; low byte was 0: back up 4, etc....
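
The same trick in portable C, as a minimal sketch.  The names
strlen_by_words and has_zero_byte are made up here.  Like the assembly
version, it reads whole 4-byte groups, so it can touch up to 3 bytes past
the terminator; the caller has to guarantee those bytes are readable.  The
final byte-wise scan avoids any dependence on byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Nonzero iff at least one of the four bytes of x is 0x00. */
static uint32_t has_zero_byte(uint32_t x)
{
    return ((x - 0x01010101u) & ~x) & 0x80808080u;
}

/*
 * Hypothetical word-at-a-time strlen.  Precondition: the memory up to
 * the end of the 4-byte group holding the terminator is readable.
 */
size_t strlen_by_words(const char *s)
{
    const char *p = s;
    uint32_t w;

    assert(s != NULL);
    for (;;) {
        memcpy(&w, p, sizeof w);    /* alignment-safe 32-bit load */
        if (has_zero_byte(w))
            break;                  /* terminator is in this group */
        p += 4;
    }
    while (*p != '\0')              /* pin down which byte it was */
        p++;
    return (size_t)(p - s);
}
```

The memcpy is there only to keep the load legal for unaligned pointers;
compilers normally turn it into a single load instruction.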


   Anyway here are some results compared with the "VAXC" strlen:

   I had a for loop which did 10 strlens/mystrlens, which was repeated
   x times based on the length of the string l:

	   x = (l <    10) ? 10000 :
	       (l <   100) ?  1000 :
	       (l <  1000) ?   100 :
	       (l < 10000) ?    10 : 1;
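
Worked through, that schedule makes every column from length 9 upward scan
about the same number of bytes.  A small C sketch of the arithmetic (the
function names are invented for this example):

```c
#include <assert.h>
#include <stddef.h>

/* Outer repeat count x for a string of length l (the schedule above). */
unsigned long reps_for_length(size_t l)
{
    return (l <    10) ? 10000ul :
           (l <   100) ?  1000ul :
           (l <  1000) ?   100ul :
           (l < 10000) ?    10ul : 1ul;
}

/* Each repeat runs a loop of 10 calls. */
unsigned long calls_for_length(size_t l)
{
    return 10ul * reps_for_length(l);
}

/* Non-terminator bytes examined across the whole measurement. */
unsigned long bytes_scanned(size_t l)
{
    unsigned long total = calls_for_length(l) * (unsigned long)l;

    assert(l == 0 || total / l == calls_for_length(l)); /* no overflow */
    return total;
}
```

For l = 9 this gives 100000 calls and 900000 bytes; for l = 99999 it gives
10 calls and 999990 bytes.  So the short-string columns are presumably
dominated by per-call overhead and the long-string columns by the scan
itself.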

               
                 ----------------- string lengths --------------
  machine  vers      0      1      9     99    999   9999  99999
  -------  ----  -----  -----  -----  -----  -----  -----  -----
  VS2000   VAXC   9.16   9.51  12.04   4.03   3.28   3.14   3.17
           MINE   2.42   2.52   3.41   1.31   1.13   1.12   1.13

  6220     VAXC    .89    .98   1.26    .53    .42    .44    .43
           MINE    .92    .96   1.29    .51    .40    .42    .45

  11/780   VAXC   3.02   2.98   3.23    .70    .39    .51    .58
           MINE   2.54   2.68   3.50   1.32   1.08   1.16   1.24

  11/785   VAXC   2.27   2.30   2.40    .33    .24    .23    .39
           MINE   1.97   2.11   2.44    .72    .69    .70    .90

    (all were VMS5.0-2 except my vaxstation which is VMS4.7)

    Since the VS2000 is just a uVAXII, it obviously emulates the LOCC
    which is in strlen, but what about the 6220?  Does it?  In four
    cases the 11/785 (and once the 11/780) toasted the 6220!

    Perhaps I should try a simple:

	5$:  TSTB  (R1)+
	     BNEQ  5$

    as well.
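
In C, that two-instruction loop corresponds to the classic byte-at-a-time
scan.  A sketch, with a made-up function name:

```c
#include <assert.h>
#include <stddef.h>

/* Byte-at-a-time strlen: test each byte, advance, stop at the 0x00. */
size_t strlen_by_bytes(const char *s)
{
    const char *p = s;

    assert(s != NULL);
    while (*p++ != '\0')            /* TSTB (R1)+ ; BNEQ 5$ */
        ;
    return (size_t)(p - s - 1);     /* p stopped one past the terminator */
}
```

Unlike the 4-bytes-at-a-time version, this never reads past the
terminator, at the cost of one test and branch per character.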


    For what it's worth,
    John Hascall

scc@cl.cam.ac.uk (Stephen Crawley) (03/25/89)

> From the MicroVAX Handbook, Copyright 1984 by DEC, page 1-1:
> 
>    The VAX architecture is designed by Digital for its family
>	of 32-bit, virtual memory minicomputers.  Digital has also
>	defined a proper subset of the VAX architecture called the
>	MicroVAX architecture.
>
> The section continues to state what is and is not included in the
> MicroVAX architecture.

My previous quote was taken from the "VAX Architecture Reference Manual",
Copyright 1987 by DEC. Page 359.  I cannot find any reference to any
"MicroVAX architecture" in that edition of the RM.  Instead they talk 
about subsets of the VAX architecture.  

There is clearly some inconsistency in DEC's terminology here.  It would
be interesting to see if later editions of the MicroVAX Handbook say
the same thing as the 1984 edition.

ruiu@dragos.UUCP (dragos) (04/01/89)

In article <37515@bbn.COM>, slackey@bbn.com (Stan Lackey) writes:
>     I don't understand why it is OK for the RISC suppliers to supply
>     reduced instruction sets, but if DEC does it it's evil.


     Because Ken Olsen has repeatedly stated he dislikes RISC, and 
     has on occasion declared that DEC will never sell a processor
     they will call a RISC.   :-) :-) :-)

-- 
Dragos Ruiu  ruiu@dragos.UUCP  "Yes, Dragos is my first name."
   ...alberta!edm!dragos!ruiu  "Why? Someone said it sounded like a nodename!"

bauer@loligo.cc.fsu.edu (Jeff Bauer) (04/10/89)

In article <514@dragos.UUCP> ruiu@dragos.UUCP (dragos) writes:
>In article <37515@bbn.COM>, slackey@bbn.com (Stan Lackey) writes:
>>     I don't understand why it is OK for the RISC suppliers to supply
>>     reduced instruction sets, but if DEC does it it's evil.
>
>
>     Because Ken Olsen has repeatedly stated he dislikes RISC, and 
>     has on occasion declared that DEC will never sell a processor
>     they will call a RISC.   :-) :-) :-)
>
>-- 
>Dragos Ruiu  ruiu@dragos.UUCP  "Yes, Dragos is my first name."
>   ...alberta!edm!dragos!ruiu  "Why? Someone said it sounded like a nodename!"

Boy, all things do come around again...and again.

I have a copy of a paper from grad school days by Clark and Strecker of DEC
from Sept. '80; one of the early examples of RISC-bashing (and VAX crowing) 
right before the Berkeley RISC I.  In this case the authors blast points 
made earlier by Patterson & Ditzel in _The Case for the Reduced Instruction 
Set Computer_ (CAN, Oct. 1980) by making such nifty claims as [contents
quoted without permission and probably horribly out of context :) ] -

	o Ease of compiler-writing..."code generation in VAX compilers
	  is simplified by having them all [different address mode
	  instructions] (this is attested to by VAX compiler-writers)"

	o Regarding the probability of increased design errors ..
	  "How would Patterson and Ditzel compare the complexity of
	  the VAX-11/780 microcode to that of, say, an optimizing compiler?"

	o Considering CISC instructions executing in separate functional
	  units..."to speed up the multiply function on the RISC would
	  require a speed-up of the whole processor while speeding up the
	  multiply instruction on the CISC could be accomplished by
	  adding specialized data paths and control."

	o Challenging the "risc-takers"..."Casual evaluation of cost and
	  performance will not be sufficient unless the differences between
	  a RISC and a CISC are extreme, which is unlikely.  Paper designs
	  will not be enough."

	o And a block/parry..."Patterson and Ditzel suggest that marketing
	  strategy can increase the size or complexity of an instruction set.
	  We can state from first-hand knowledge that this is not true for
	  the VAX architecture."

Seems to me that some of the points only helped the early RISC-takers by
driving development to counter the CISC arguments.  Of course DEC is still
riding the VAX instruction set in many hardware guises, but they sure have
widened up their tunnel vision since 1980.
-- 
Jeff Bauer					bauer@loligo.cc.fsu.edu
Control Data Corporation			(904) 644-2591 ext. 113

mark@hubcap.clemson.edu (Mark Smotherman) (04/11/89)

In article <573@loligo.cc.fsu.edu>, bauer@loligo.cc.fsu.edu (Jeff Bauer) writes:
> Boy, all things do come around again...and again.
> I have a copy of a paper from grad school days by Clark and Strecker of DEC

   Douglas Clark and William Strecker, "Comments on 'The Case for the
   Reduced Instruction Set Computer,' by Patterson and Ditzel," Computer
   Architecture News, vol. 8, no. 6, October 15, 1980, pp. 34-38.

I've always wondered why they seem to take a swipe at their own designers when,
in discussing why the INDEX function was faster on the 780 if implemented as a
sequence of simple instructions, they say:

  "Anecdotal accounts of irrational implementations are certainly
                         ^^^^^^^^^^ (my emphasis)
   interesting.  Is it *typical*, however, that composite instructions
   run more slowly than equivalent sequences of simple instructions?
   The paper reports that a sequence of several simple instructions
   can replace the VAX INDEX instruction with a 45% speed gain on
   the 780.  This is a problem of implementation, not architecture.
   Fundamentally, after all, the implementation of the INDEX
   *function* with more than one instruction simply cannot take less
   time than the one-instruction version, assuming equal hardware in
   both cases.  The explanation of this anomaly is that the 780's
   Floating Point Accelerator speeds up the multiply in the
   multi-instruction implementation, but doesn't see the INDEX at all."
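
For readers without the manual to hand: assuming the usual description of
INDEX (range-check a subscript against a low and a high bound, then compute
(indexin + subscript) * size), the "sequence of several simple
instructions" amounts to a compare, an add and an integer multiply.  A C
sketch under that assumption, with a hypothetical function name and an
error return standing in for the subscript-range trap:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * One array-indexing step.  Returns 0 and stores the result, or returns
 * -1 (leaving *index_out alone) when the subscript is out of range.
 * Arithmetic overflow is not handled in this sketch.
 */
int index_step(int32_t subscript, int32_t low, int32_t high,
               int32_t size, int32_t index_in, int32_t *index_out)
{
    assert(index_out != NULL);
    if (subscript < low || subscript > high)
        return -1;                              /* stands in for the trap */
    *index_out = (index_in + subscript) * size; /* add, then multiply */
    return 0;
}
```

For an array a[i][j] with 10 columns and 4-byte elements, two chained steps
build the byte offset: first (0 + i) * 10, then (that + j) * 4.  The
multiply in each step is the part the quoted passage says the Floating
Point Accelerator speeds up.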

This is interesting to reread after the series of email articles discussing
how hard it is to pipeline the VAX architecture.  I've heard that the
real win on VAX implementations is to put in a heavy-duty microcode pipe.

Also, does anyone know if DEC is working on an HPS (a.k.a. micro-dataflow,
restricted dataflow, decoupled VLIW) version of the VAX?  Yale
Patt reported work on this in the 1986 Microprogramming conference.

   Yale Patt, *et al.*, "Run-Time Generation of HPS Microinstructions
   from a VAX Instruction Stream," in Proc. MICRO 19, New York, Oct. 1986,
   pp. 75-81.

   (and I think a paper in MICRO-20 also)

Has DEC followed up this work?
-- 
Mark Smotherman, Comp. Sci. Dept., Clemson University, Clemson, SC 29634
INTERNET: mark@hubcap.clemson.edu    UUCP: gatech!hubcap!mark

slackey@bbn.com (Stan Lackey) (04/13/89)

In article <5064@hubcap.clemson.edu> mark@hubcap.clemson.edu (Mark Smotherman) writes:
>In article <573@loligo.cc.fsu.edu>, bauer@loligo.cc.fsu.edu (Jeff Bauer) writes:
>   Douglas Clark and William Strecker, "Comments on 'The Case for the
>   Reduced Instruction Set Computer,' by Patterson and Ditzel," Computer
>I've always wondered why they seem to take a swipe at their own designers when,
>in discussing why the INDEX function was faster on the 780 if implemented as a
>sequence of simple instructions, they say:
>   the 780.  This is a problem of implementation, not architecture.
>   Fundamentally, after all, the implementation of the INDEX
>   *function* with more than one instruction simply cannot take less
>   time than the one-instruction version, assuming equal hardware in
>   both cases.  The explanation of this anomaly is that the 780's
>   Floating Point Accelerator speeds up the multiply in the...

OK, I admit it, I'm the one responsible for this mess.  There was a
really good reason at the time.  I was doing the FPA.  There was no
INDEX instruction at the time, and I decided that I would add 32-bit
integer mul (MULL) to those instructions the FPA would optimize, for
exactly that reason: so the sequence to do array address calcs would
go faster.  (There was opposition to this, but I made it stick
anyway!)  So I added MULL.  Late in the program, the architecture
committee added the INDEX instruction.  But my boards were in the
board shop, and it was too late for me to add it to the FPA.  So the
microcoded version was all you got.  So it was a case of bad timing,
not proof that RISC's or CISC's are good, bad, or indifferent after
all.

I don't think the later VAXes had this problem.
-Stan