[comp.arch] 80-20

lindsay@MATHOM.GANDALF.CS.CMU.EDU (Donald Lindsay) (11/11/89)

From the press:

"...Digital's engineers isolated the core VAX instruction set,
including 80% of the most used opcodes and optimized it to the VAX
9000 gate structure.  The conversion didn't involve reducing - or
RISCing - the instruction set but more accurately hardwiring it into
a single-cycle instruction set.  ... The other 20% of complex
instructions execute with microcode as always..."


If DEC would document exactly what's in that 80%, then VAX compiler
writers could FINALLY settle the subject of choosing between
different instruction sequences.
-- 
Don		D.C.Lindsay 	Carnegie Mellon Computer Science

hascall@atanasoff.cs.iastate.edu (John Hascall) (11/12/89)

In article <???> lindsay@MATHOM.GANDALF.CS.CMU.EDU (Donald Lindsay) writes:
 
}From the press:
 
}"...Digital's engineers isolated the core VAX instruction set,
}including 80% of the most used opcodes and optimized it to the VAX
}9000 gate structure.  The conversion didn't involve reducing - or
}RISCing - the instruction set but more accurately hardwiring it into
}a single-cycle instruction set.  ... The other 20% of complex
}instructions execute with microcode as always..."
 
}If DEC would document exactly what's in that 80%, then VAX compiler
}writers could FINALLY settle the subject of choosing between
}different instruction sequences.

    How hard can it be to figure out?  Write a simple loop:

               MOVL    #BIGNUM,R0
       10$:
               <i-u-t>             ; repeat instruction under test,
                  .                ; say, 100 times
                  :
               <i-u-t>
               SOBGTR  R0,10$
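
    Turning the two stopwatch readings from such a loop into a
    per-instruction figure might look like the sketch below.  It assumes
    you also time the same loop with an empty body, so that the MOVL and
    SOBGTR overhead can be subtracted out.  The function name, its
    parameters, and the sample numbers are all made up for illustration;
    none of them are measurements.

```python
def cycles_per_instruction(full_run_cycles, empty_run_cycles, iterations, copies):
    """Estimate average cycles per instruction under test.

    full_run_cycles  -- cycles for the loop with `copies` instructions in the body
    empty_run_cycles -- cycles for the same loop with an empty body (loop overhead)
    iterations       -- loop count (BIGNUM in the loop above)
    copies           -- how many times the instruction is repeated in the body
    """
    if iterations <= 0 or copies <= 0:
        raise ValueError("iterations and copies must be positive")
    # Subtract the loop overhead, then spread what is left over every
    # executed copy of the instruction under test.
    return (full_run_cycles - empty_run_cycles) / (iterations * copies)


# Hypothetical numbers only: 100,000 iterations of 100 copies.
estimate = cycles_per_instruction(
    full_run_cycles=10_200_000,
    empty_run_cycles=200_000,
    iterations=100_000,
    copies=100,
)
print(estimate)
```

    The result is an average for back-to-back copies of one instruction;
    it says nothing about how that instruction behaves when mixed with
    others.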

    Anyway, looking at the VAX Arch. Handbook (Chap 10) we find:

        304 instructions (unless they've added some)
        304 * 0.80 = 243 (approx.)
        
        We can probably assume that most of the "Kernel Instruction Set"
   (those instructions required in any implementation--may not be emulated)
   are hardcoded.  That's 175 instructions (less MOVC3, MOVC5,
   LDPCTX, PROBER, PROBEW, REI, SVPCTX, INDEX, POPR, PUSHR, XFC, CALLS, CALLG,
   RET, 6 queue instructions and 7 bitfield instructions) giving 148.  Then
   there are 102 FP instructions (less ACBx (4), POLYx (4) and EMODx (4))
   giving 90.  148 Kernel + 90 FP = 238 instructions (or 78%).  I would
   suspect I'm not off by more than a handful of instructions in either
   direction.
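
   The tally above can be rechecked mechanically.  The sketch below only
   redoes the arithmetic from the counts quoted in this post; the variable
   names are invented, and the guess about which instructions are hardwired
   is still just a guess.

```python
TOTAL_INSTRUCTIONS = 304   # count quoted from the handbook
KERNEL_SET = 175           # "Kernel Instruction Set" size quoted above
FP_SET = 102               # floating-point instruction count quoted above

# Kernel instructions assumed to stay in microcode (named individually above).
kernel_named = [
    "MOVC3", "MOVC5", "LDPCTX", "PROBER", "PROBEW", "REI", "SVPCTX",
    "INDEX", "POPR", "PUSHR", "XFC", "CALLS", "CALLG", "RET",
]
kernel_excluded = len(kernel_named) + 6 + 7   # plus 6 queue and 7 bitfield
fp_excluded = 4 + 4 + 4                       # ACBx, POLYx, EMODx

kernel_hardwired = KERNEL_SET - kernel_excluded
fp_hardwired = FP_SET - fp_excluded
hardwired = kernel_hardwired + fp_hardwired
share = hardwired / TOTAL_INSTRUCTIONS

print(kernel_hardwired, fp_hardwired, hardwired)
print(f"{share:.1%} of {TOTAL_INSTRUCTIONS}, against a target of "
      f"{round(TOTAL_INSTRUCTIONS * 0.80)}")
```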


John Hascall

mash@mips.COM (John Mashey) (11/12/89)

In article <1925@atanasoff.cs.iastate.edu> hascall@atanasoff.UUCP (John Hascall) writes:
...
>}If DEC would document exactly what's in that 80%, then VAX compiler
>}writers could FINALLY settle the subject of choosing between
>}different instruction sequences.

>    How hard can it be to figure out?  Write a simple loop:
>
>               MOVL    #BIGNUM,R0
>       10$:
>               <i-u-t>             ; repeat instruction under test,
>                  .                ; say, 100 times
>                  :
>               <i-u-t>
>               SOBGTR  R0,10$
.......

1) Note that one must be careful with such a thing, because many of the
more aggressive machines have all kinds of stalls or other pipeline
effects that will NOT be revealed by such a test.  It may be sufficient
to reveal whether or not it's single-cycle-issue in the normal case,
or it might not.

2) It is hard for compiler writers to EVER figure out the optimal
code sequences, in any evolving family of computers. This is nothing
new.  About 20 years ago, I was torturing myself to write Really Good S/360
BAL code, having carefully studied the timings for 360/50, 360/67,
360/75, and then 370/1xxs.  In any broad computer line, there is seldom
code that is optimal for everything.  Note that the optimal code for the
286, 386, and 486 is different in each case, as it was for the 68000,
68010, and 68020, at least, and the 2-cycle bus interface of the 68030
changed some of the tradeoffs.

3) It certainly is the case that a plausible strategy in a product line
is to worry about the machines with:
	the longest pipelines, and usually longest latencies
	the most parallel units
in that optimizations which make little difference to the simpler
machines won't usually hurt them much, if at all, while they
noticeably help the more complex ones.
Along this line, I've heard of compiler speedups on S/360 machines
(like code scheduling, to spread loads and usage of the loaded data apart),
which don't bother the old simple machines, but help ones with more
aggressive pipelines, because some of the stalls are then eliminated.
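
The load-scheduling effect in point 3 can be seen in a toy model.  The
sketch below is not a model of any real S/360 or VAX; it assumes a
single-issue, in-order machine where a load's result becomes usable
`load_latency` cycles after issue and every other result is usable on the
next cycle.  The register names and both instruction sequences are
invented for illustration.

```python
def run_cycles(program, load_latency):
    """Count cycles to issue `program` on a toy in-order, single-issue machine.

    program      -- list of (op, dest, sources) tuples
    load_latency -- cycles from a load's issue until its result can be read
    """
    ready = {}   # register -> first cycle at which its value can be read
    cycle = 0    # earliest cycle at which the next instruction may issue
    for op, dest, sources in program:
        # Stall until every source operand is available.
        issue = max([cycle] + [ready.get(reg, 0) for reg in sources])
        latency = load_latency if op == "load" else 1
        ready[dest] = issue + latency
        cycle = issue + 1
    return cycle


# Each load is used immediately by the following add.
naive = [
    ("load", "r1", ["r10"]),
    ("add",  "r2", ["r1", "r2"]),
    ("load", "r3", ["r11"]),
    ("add",  "r4", ["r3", "r4"]),
]

# Same work, with the loads moved away from their uses.
scheduled = [
    ("load", "r1", ["r10"]),
    ("load", "r3", ["r11"]),
    ("add",  "r2", ["r1", "r2"]),
    ("add",  "r4", ["r3", "r4"]),
]

for latency in (1, 3):
    print(latency, run_cycles(naive, latency), run_cycles(scheduled, latency))
```

With a latency of 1 (the "simple" machine) both orderings take the same
number of cycles; with a latency of 3 the scheduled ordering avoids most
of the stalls.
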
-- 
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: 	{ames,decwrl,prls,pyramid}!mips!mash  OR  mash@mips.com
DDD:  	408-991-0253 or 408-720-1700, x253
USPS: 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086

henry@utzoo.uucp (Henry Spencer) (11/12/89)

In article <6927@pt.cs.cmu.edu> lindsay@MATHOM.GANDALF.CS.CMU.EDU (Donald Lindsay) writes:
>If DEC would document exactly what's in that 80%, then VAX compiler
>writers could FINALLY settle the subject of choosing between
>different instruction sequences.

Odds are good that you wouldn't go far wrong if you treated the VAX as
a RISC:  use the simple instructions and addressing modes and ignore the
messy ones.  Actually, I'm told that many CISCs perform better with code
generated that way.
-- 
A bit of tolerance is worth a  |     Henry Spencer at U of Toronto Zoology
megabyte of flaming.           | uunet!attcan!utzoo!henry henry@zoo.toronto.edu

henry@utzoo.uucp (Henry Spencer) (11/12/89)

In article <1925@atanasoff.cs.iastate.edu> hascall@atanasoff.UUCP (John Hascall) writes:
>    Anyway, looking at the VAX Arch. Handbook (Chap 10) we find:
>
>        304 instructions (unless they've added some)
>        304 * 0.80 = 243 (approx.)

I would assume that the 80% is dynamic instruction frequency, not static
percentage of opcode space.
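
The gap between the two readings of "80%" is easy to illustrate.  The
sketch below assumes a made-up ten-opcode machine and a made-up execution
trace; the mnemonics are real VAX ones, but the counts are invented and
say nothing about actual VAX workloads or about what DEC hardwired.

```python
from collections import Counter


def static_coverage(subset, all_opcodes):
    """Fraction of distinct opcodes that fall in `subset`."""
    opcodes = set(all_opcodes)
    return len(set(subset) & opcodes) / len(opcodes)


def dynamic_coverage(subset, trace):
    """Fraction of executed instructions in `trace` that fall in `subset`."""
    counts = Counter(trace)
    covered = sum(n for opcode, n in counts.items() if opcode in subset)
    return covered / len(trace)


# Hypothetical opcode space and "fast" subset.
all_opcodes = ["MOVL", "ADDL2", "CMPL", "BNEQ", "POLYF",
               "EMODF", "MOVC3", "INSQUE", "CALLS", "RET"]
fast = {"MOVL", "ADDL2", "CMPL", "BNEQ"}

# Hypothetical trace of 100 executed instructions.
trace = (["MOVL"] * 40 + ["ADDL2"] * 25 + ["CMPL"] * 15 + ["BNEQ"] * 15
         + ["CALLS"] * 2 + ["RET"] * 2 + ["MOVC3"] * 1)

print(static_coverage(fast, all_opcodes))   # share of the opcode space
print(dynamic_coverage(fast, trace))        # share of instructions executed
```

Here a subset covering 40% of the opcodes accounts for 95% of the
instructions executed, so the two figures need not be anywhere near each
other.
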
-- 
A bit of tolerance is worth a  |     Henry Spencer at U of Toronto Zoology
megabyte of flaming.           | uunet!attcan!utzoo!henry henry@zoo.toronto.edu

bzs@world.std.com (Barry Shein) (11/13/89)

In article <6927@pt.cs.cmu.edu> lindsay@MATHOM.GANDALF.CS.CMU.EDU (Donald Lindsay) writes:
>If DEC would document exactly what's in that 80%, then VAX compiler
>writers could FINALLY settle the subject of choosing between
>different instruction sequences.

If you want a good feel for the 80% look at the code generated by
VAX/VMS Fortran. It uses far less than 80% of the instructions as far
as I can tell (perhaps more in rare cases, hmmm, what %age of the
instructions for what %age of generated code...)

Someone had given me some Fortran code a while back and asked me to
compare it under VMS/Fortran and Unix. The Unix Fortran we had was so
abysmal that we finally agreed that rewriting it into C was fair game
(since the interest was in precisely this algorithm being run, and
other smallish algorithms which could be rewritten if that was the
best way to do it.) Unix/C (4.2bsd) and VMS/Fortran (both Vax/780's)
were compared.

There was a difference of a few percent between the two, UNIX/C being
slower (but not by much, still curious.) I finally compared generated
code to see why. My conclusion was that the only difference is
VMS/Fortran's avoidance of the SOBGTR instruction on simple loops,
preferring instead a sequence like DEC, TST, BNE which apparently ran
faster. In fact, I concluded that the only thing worth optimizing in
this rather turgid code was the loop overhead, it had a lot of fancy
if-then-else's but actually didn't do much other than swap elements
around in a matrix (it was useful, but the looping overhead dominated
the 20 minutes it took to run on these systems, you could remove the
code in the loops and it made very little difference to the total run
time, I wonder how common that is in physics code?)

ANYHOW...sorry...it was an interesting exercise, go take a look at
generated VMS/Fortran code (it's very good) and you'll see immediately
the kind of things which are fast on a Vax.
-- 
        -Barry Shein

Software Tool & Die, Purveyors to the Trade         | bzs@world.std.com
1330 Beacon St, Brookline, MA 02146, (617) 739-0202 | {xylogics,uunet}world!bzs

bzs@world.std.com (Barry Shein) (11/13/89)

>If you want a good feel for the 80% look at the code generated by
>VAX/VMS Fortran. It uses far less than 80% of the instructions as far
>as I can tell (perhaps more in rare cases, hmmm, what %age of the
>instructions for what %age of generated code...)

Gak, did I write that?

Replace "a good feel for the 80%" with "a good feel for what is
probably in the 80%". I have no specific information, just assuming
that their Fortran code generator knows the fastest parts of the
instruction set currently so I'd guess that's what people are looking
for.

I shouldn't have started that at all, apologies.
-- 
        -Barry Shein

Software Tool & Die, Purveyors to the Trade         | bzs@world.std.com
1330 Beacon St, Brookline, MA 02146, (617) 739-0202 | {xylogics,uunet}world!bzs

dricejb@drilex.UUCP (Craig Jackson drilex1) (11/15/89)

In article <1989Nov12.183132.3120@world.std.com> bzs@world.std.com (Barry Shein) writes:
>
>In article <6927@pt.cs.cmu.edu> lindsay@MATHOM.GANDALF.CS.CMU.EDU (Donald Lindsay) writes:
>>If DEC would document exactly what's in that 80%, then VAX compiler
>>writers could FINALLY settle the subject of choosing between
>>different instruction sequences.

   As an aside, I suspect that all of the compiler writers that DEC cares
   about (those that work for DEC) already have access to all the
   instruction timing information that they want ...

>If you want a good feel for the 80% look at the code generated by
>VAX/VMS Fortran. It uses far less than 80% of the instructions as far
>as I can tell (perhaps more in rare cases, hmmm, what %age of the
>instructions for what %age of generated code...)
>ANYHOW...sorry...it was an interesting exercise, go take a look at
>generated VMS/Fortran code (it's very good) and you'll see immediately
>the kind of things which are fast on a Vax.
>        -Barry Shein
>
>Software Tool & Die, Purveyors to the Trade         | bzs@world.std.com

From what I saw of the announcement, the 9000's are targeted at 'business',
not 'scientific', applications.  If you really want to know that 80%
set, I'd look at the output of VAX/VMS COBOL, plus any other languages
they have now for transaction processing.

I went to a user's group meeting last week where a person was astounded
that I didn't use COBOL.  (This was for users of Unisys (formerly Burroughs)
computers.)  The average application on these machines does transaction
processing, with 100s, if not 1000s of 'terminals' doing order-entry,
ATM processing, or some such.

There's a whole different world out there, which most Usenetters would
have trouble even conceiving of.

And that world is where the 9000s are targeted.
-- 
Craig Jackson
dricejb@drilex.dri.mgh.com
{bbn,ll-xn,axiom,redsox,atexnet,ka3ovk}!drilex!{dricej,dricejb}

ktl@wag240.caltech.edu (Kian-Tat Lim) (11/16/89)

In article <6150@drilex.UUCP>, dricejb@drilex (Craig Jackson drilex1) writes:
>From what I saw of the announcement, the 9000's are targeted at 'business',
>not 'scientific', applications.

And that's why they have vector processors :-).  Digital's marketroids
seem to have borrowed ideas from IBM's 3090 people: they're saying
"it's a mainframe" AND "it's a supercomputer."

I think I'll stick with my *KILLER MICROS*, thank you...

--
Kian-Tat Lim (ktl@wagvax.caltech.edu, KTL @ CITCHEM.BITNET, GEnie: K.LIM1)

tihor@acf4.NYU.EDU (Stephen Tihor) (11/17/89)

They footnoted the "it's a supercomputer" line at DECUS by stating
"Using IBM's Definitions..."

rod@venera.UUCP (Rodney Doyle Van Meter III) (11/17/89)

In article <6150@drilex.UUCP> dricejb@drilex.UUCP (Craig Jackson drilex1) writes:
>
>From what I saw of the announcement, the 9000's are targeted at 'business',
>not 'scientific', applications.  If you really want to know that 80%
>set, I'd look at the output of VAX/VMS COBOL, plus any other languages
>they have now for transaction processing.
>
>There's a whole different world out there, which most Usenetters would
>have trouble even conceiving of.
>
>And that world is where the 9000s are targeted.

Perhaps. Perhaps not. I'm sure that's where their high-reliability
and transaction-processing marketing tacks are headed.

However, they're implementing vector instructions as part of the
VAX architecture. At some point, all CPUs without vector instructions
will be required to emulate them. Fortunately, that's one area
where DEC seems to do okay.

Do business applications use vector instructions? Doubt it. They're
pushing vector Fortran, anyway, just like everybody else, not vector
COBOL.

With four CPUs with vector processors, 512MB memory, and a few gig
of reasonably fast disk, it is supposed to peak at around 1 Gflops.
That's enough to keep a lot of supercomputer users happy, and it
has the "advantages" of coming with VMS, which actually is a
low-maintenance, relatively stable, multi-user OS, when compared
to some supercomputer OSes.

The kicker? The price tag, of course, just like always with a VAX.
Order of five million, list, for the decked-out box, plus HSCs
and disks. For that amount of money, you can get higher performance boxes.

Has anybody seen the vector instructions? They are memory-to-memory,
I assume? Do they include scatter-gather instructions?

I think this is a good move for the VAX guys. It may give them some
life for a while yet. 

			--Rod