[comp.arch] VAX 11 years after

dmr@alice.UUCP (06/17/88)

I happened to come upon an old file, and thought it might be
of historical interest.  It is the notes I wrote near the end
of 1976 or the beginning of 1977 of a presentation that DEC
gave to us at Bell Labs of their unannounced new machine.
I cut out uninteresting material and left the commentary.
I added a few brief notations in [].  At the end are some
remarks from today's perspective.

			Dennis Ritchie
			research!dmr
			dmr@research.att.com

-----------------------
               Summary of DEC 32-bit machine.

            DMR, from notes by JFO [Joe Ossanna]

  (DEC confidential--subject to non-disclosure agreement)
       [presumably the agreement has expired by now!]

The project is called `VAX'-- Virtual Address Extension.
The first hardware is called `STAR' (unoriginal name!) and
the operating system STARLET.  Its speed, in native mode, is
"between 1 and 2 times the 11/70."  [It wasn't.  On programs
that didn't need much memory an 11/70 was noticeably
faster.]  The speed emulating an 11/70 in user mode is about
that of an 11/70.  The cost is intended to be comparable to
an 11/70.  [It was considerably more, actually.]  We could
get a machine on a field test basis toward the end of 1977.
[We didn't, in our group.]  I don't know when regular
deliveries are scheduled.  They are now "past the breadboard
stage," which seems to mean at least that they have one
machine electrically, but not mechanically, the same as the
final version.  I gather that a "field test" machine is
free, but of course it is likely to be used for training
field engineers and would not be our own.

Instruction set architecture.

The machine is byte-addressed, with a 32-bit virtual
address.  It handles the following data formats:

     [Here I delete a long section describing data formats,
      address modes, and instructions.  It is astonishingly
      correct and complete; Joe must have taken excellent notes,
      and the presentation was much more informative than the
      fluff one usually gets on such occasions.]


Calls.  The machine has a built-in calling sequence.  I'll
try to reproduce it exactly.  Briefly, though, it appears to
be possible to do just what C wants.  I'll try to make clear
just what the hardware does do, so that it can be checked.

      [Here there is a long, essentially correct,
       description of `calls' and `callg' and how to use
       them--access arguments, allocate and refer to locals,
       and so forth.]
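The hardware description is elided above, but the property that matters most to C is easy to model. Later VAX documentation describes `calls` this way: it pushes the argument count on top of the pushed arguments and points the argument pointer (AP) at that count. The callee therefore finds argument i at 4*i(ap) and also knows how many arguments it received. The C sketch below models only that layout. The names `arglist`, `arg_count`, `arg`, and `sum_args` are hypothetical.

```c
#include <stdint.h>

/* Model of the argument list built by `calls`: longword 0 holds the
   argument count in its low byte; longwords 1..n hold the arguments.
   A callee reaches argument i at byte offset 4*i from AP. */
typedef const uint32_t *arglist;   /* hypothetical stand-in for AP */

static unsigned arg_count(arglist ap)
{
    return ap[0] & 0xFF;           /* count lives in the low byte */
}

static uint32_t arg(arglist ap, unsigned i)
{
    return ap[i];                  /* the operand 4*i(ap); i starts at 1 */
}

/* A variadic callee needs no separate count parameter, because the
   count pushed by the caller's `calls` says how many arguments follow. */
static uint32_t sum_args(arglist ap)
{
    uint32_t total = 0;
    for (unsigned i = 1; i <= arg_count(ap); i++)
        total += arg(ap, i);
    return total;
}
```

For example, if a caller pushes 10, 20, and 30, the longwords {3, 10, 20, 30} sit at AP, and `sum_args` over that array gives 60.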

As a side note, SCJ [Steve Johnson], with some advice from me, has just
written a description of what C wants from a calling
sequence and what it is forced to take on some machines.  So
far as I can determine, this organization embodies every
desirable feature that was imagined by us and several more
besides.  I am astonished at how well it is designed,
particularly considering that this is the same company that
gave us the `mark' instruction.

There are a lot more miscellaneous instructions.  [things like
insque and find-first-set; I omit.]
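Of these, the queue instructions matter most to a system. INSQUE and REMQUE operate on doubly linked queues whose entries begin with a forward link followed by a backward link. The code below is a rough C equivalent that assumes that layout and ignores the condition codes the real instructions set. The names `vax_insque`, `vax_remque`, and `q_init` are hypothetical and were chosen to avoid clashing with the POSIX `insque` in <search.h>. The value of having these as instructions is that each update is done by one instruction rather than four separate stores.

```c
/* Queue entry as the queue instructions see it: forward link first,
   backward link second (pointers here rather than VAX addresses). */
struct qentry {
    struct qentry *flink;
    struct qentry *blink;
};

/* An empty queue is a header whose links point to itself. */
static void q_init(struct qentry *head)
{
    head->flink = head->blink = head;
}

/* Rough equivalent of INSQUE entry,pred: link `entry` in after `pred`. */
static void vax_insque(struct qentry *entry, struct qentry *pred)
{
    entry->flink = pred->flink;
    entry->blink = pred;
    pred->flink->blink = entry;
    pred->flink = entry;
}

/* Rough equivalent of REMQUE entry: unlink `entry` from its queue. */
static void vax_remque(struct qentry *entry)
{
    entry->blink->flink = entry->flink;
    entry->flink->blink = entry->blink;
}
```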

Memory mapping and system features.

This area is rather complicated and somewhat less nice.  The
virtual address is 32 bits (maybe it was really 31, but it
hardly matters).  The high order bit selects "system" or
"program" space; this has no protection implications, but
does help determine the style of mapping.  The next bit
selects "program 0" or "program 1" if the "system" bit is
off.  "System" plus the "program 1" bit is undefined and
reserved.  The machine is paged, but not segmented, except
that the three legal states of the program bit with the
system bit select one of three page tables.  The page size
is XX bytes. [Note that either we missed hearing this or
they didn't say!]
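The closing remarks give the page size as 512 bytes; the notes themselves leave it as XX. With that size, an address splits into a two-bit region selector, a 21-bit virtual page number, and a 9-bit byte offset. The following C sketch shows that split, assuming those field widths. The type and function names are hypothetical.

```c
#include <stdint.h>

/* Region selected by the top two bits: system bit, then program-1 bit. */
enum region { P0 = 0, P1 = 1, SYS = 2, RESERVED = 3 };

#define PAGE_SHIFT 9u                   /* assumed 512-byte pages */
#define VPN_MASK   ((1u << 21) - 1)     /* 30 - 9 = 21 bits of page number */

struct vaddr {
    enum region region;
    uint32_t vpn;       /* virtual page number within the region */
    uint32_t offset;    /* byte within the page */
};

static struct vaddr split_vaddr(uint32_t va)
{
    struct vaddr d;
    d.region = (enum region)(va >> 30);
    d.vpn    = (va >> PAGE_SHIFT) & VPN_MASK;
    d.offset = va & ((1u << PAGE_SHIFT) - 1);
    return d;
}
```

For example, 0x80000A05 decodes as system space, page 5, byte 5, and 0x7FFFFFFF is the last byte of the last page of P1.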

Suppose an address lies in system space.  Then the YY bits
below the S and P bits are used to look up in a system page
table; its base is stored in a hardware register and there
is a limit.  The page table word (discussed more below)
gives the physical address.  The system page table lies on a
physical page boundary.
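The system-space lookup described above reduces to a bounds check against the limit register followed by one indexed read. Below is a C sketch. Here `struct sysmap` is a hypothetical stand-in for the base and limit registers. The positions of the valid bit and the frame number are assumptions borrowed from later VAX documentation, not from these notes, and the page size is the 512 bytes given at the end.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT 9u                   /* assumed 512-byte pages */
#define PTE_VALID  0x80000000u          /* assumed: valid bit is bit 31 */
#define PFN_MASK   ((1u << 21) - 1)     /* assumed: frame number in bits 20:0 */

/* Hypothetical stand-in for the system base and length registers. */
struct sysmap {
    const uint32_t *sbr;   /* system page table, physically contiguous */
    uint32_t slr;          /* number of entries in it */
};

/* Translate a system-space address. Returns false on a length
   violation or a not-present page. */
static bool sys_translate(const struct sysmap *m, uint32_t va, uint32_t *pa)
{
    uint32_t vpn = (va >> PAGE_SHIFT) & ((1u << 21) - 1);
    if (vpn >= m->slr)
        return false;                   /* beyond the limit register */
    uint32_t pte = m->sbr[vpn];
    if (!(pte & PTE_VALID))
        return false;                   /* page not present */
    *pa = ((pte & PFN_MASK) << PAGE_SHIFT) | (va & ((1u << PAGE_SHIFT) - 1));
    return true;
}
```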

If the address is in program space, the page number is
looked up in either the P0 or the P1 page table.  The base
and limit of each are in hardware registers; however, the
base is not a physical address but is mapped according to
the system address space.  Incidentally, the P1 page
table goes backwards in memory.  One thinks of a P1 address
as a moderately small 31-bit negative number.
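The length checks are one place where "backwards" shows up. Later documentation gives the VAX these semantics, which we assume here: the P0 limit is an upper bound on the page number, while the P1 limit is a lower bound, because P1 is meant for a stack that grows down from 0x80000000. The sketch below shows both checks and the "small negative number" view of a P1 address. Its names are hypothetical, and it assumes 512-byte pages.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT 9u                   /* assumed 512-byte pages */
#define VPN_MASK   ((1u << 21) - 1)

/* P0 grows up from address 0: pages below the limit are legal. */
static bool p0_in_bounds(uint32_t va, uint32_t p0lr)
{
    return ((va >> PAGE_SHIFT) & VPN_MASK) < p0lr;
}

/* P1 grows down from 0x80000000: pages at or above the limit are
   legal, so a larger limit means a smaller region. */
static bool p1_in_bounds(uint32_t va, uint32_t p1lr)
{
    return ((va >> PAGE_SHIFT) & VPN_MASK) >= p1lr;
}

/* A P1 address viewed as a distance below 0x80000000. */
static int32_t p1_as_negative(uint32_t va)
{
    return -(int32_t)(0x80000000u - va);
}
```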

The page table word ultimately accessed has a present bit, 4
bits (15 states) of protection information, and a physical
address.  I don't know the size of the bit field, but it is
generous compared to the 2MB of memory that can be attached
to the machine at the moment.  There is a "modified" bit but
no "accessed" bit.
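To make this concrete, the sketch below uses the page table word layout from later VAX documentation. As far as these notes go, that layout is an assumption. Under it, a 21-bit frame number with 512-byte pages reaches 2^30 bytes, which bears out "generous" next to 2MB. The helper names are hypothetical.

```c
#include <stdint.h>

/* Assumed layout: V in bit 31, protection in bits 30:27,
   M in bit 26, page frame number in bits 20:0. */
static unsigned pte_valid(uint32_t pte)    { return pte >> 31; }
static unsigned pte_prot(uint32_t pte)     { return (pte >> 27) & 0xFu; }
static unsigned pte_modified(uint32_t pte) { return (pte >> 26) & 1u; }
static uint32_t pte_pfn(uint32_t pte)      { return pte & 0x1FFFFFu; }

/* Physical reach of a 21-bit frame number with 512-byte pages. */
static uint64_t max_physical_bytes(void)
{
    return (uint64_t)1 << (21 + 9);
}
```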

The machine is designed for virtual memory.  Any instruction
can be restarted.  They don't promise that if you look at
the detailed state of things when a page-fail interrupt
occurs you will see anything interesting; just that you get
the virtual address of the failing reference, and that the
instruction can be restarted from the beginning with the
right results.  The implication is that things work right,
but that all pages referenced by an instruction must be in
core for the whole instruction.  You can't step through a
piece at a time.  Thus there is theoretically a minimum set
of pages that have to be present and it is not entirely
trivial (perhaps as big as 20) for some of the odder
instructions.
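The "minimum set of pages" can be estimated by counting the distinct pages that an instruction's own bytes and its operands touch. All of those pages must be resident at once for a restart to make progress. The following sketch assumes 512-byte pages. `struct ref` and `pages_needed` are hypothetical names, and the bookkeeping is capped at 64 pages for simplicity.

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 9u   /* assumed 512-byte pages */

/* One memory reference made by an instruction (len > 0, no wraparound). */
struct ref {
    uint32_t addr;
    uint32_t len;
};

/* Count the distinct pages touched by all references. Counting stops
   at 64 pages, which is plenty for a sketch. */
static size_t pages_needed(const struct ref *refs, size_t n)
{
    uint32_t seen[64];
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t first = refs[i].addr >> PAGE_SHIFT;
        uint32_t last = (refs[i].addr + refs[i].len - 1) >> PAGE_SHIFT;
        for (uint32_t p = first; p <= last; p++) {
            size_t j = 0;
            while (j < count && seen[j] != p)
                j++;
            if (j == count && count < 64)
                seen[count++] = p;      /* a page not seen before */
        }
    }
    return count;
}
```

As an example, consider a hypothetical three-operand instruction whose own bytes, and each of whose three longword operands, straddle a page boundary. The references {0x1FE, 7}, {0x5FE, 4}, {0x9FE, 4}, {0xDFE, 4} need 8 pages resident.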

There are four protection domains, something like kernel,
executive, supervisor, user.  The latter three cannot
execute privileged instructions and in general they claim
attempts have been made to prevent a less privileged domain
from interfering with a more privileged.  The 15 states in a
page table word somehow encode a nested set of access rights
to the page.  This must be some subset of the cross product
(read, write [,execute?])X(k,e,s,u).  I don't know the
details.  One hopes it is sensible.
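A little counting supports the guess above. Suppose rights are nested, so that anything a less privileged mode may do, a more privileged one may also do, and suppose write access implies read access. Then a code comes down to two choices: the least privileged mode that may read, and the least privileged mode that may write, where "nobody" is allowed for either and the writer is no less privileged than the reader. That gives 1 + 2 + 3 + 4 + 5 = 15 codes, exactly the number quoted. The sketch below models that hypothesis. Its names are hypothetical, and it makes no claim about DEC's actual encoding.

```c
#include <stdbool.h>

/* Access modes, most privileged first. */
enum mode { KERNEL, EXEC, SUPER, USER, NMODES };

/* A nested right: every mode at least as privileged as `read_upto`
   may read, and likewise for `write_upto`; -1 means nobody. */
struct prot {
    int read_upto;
    int write_upto;
};

static bool may_read(struct prot p, enum mode m)  { return (int)m <= p.read_upto; }
static bool may_write(struct prot p, enum mode m) { return (int)m <= p.write_upto; }

/* Count the (read, write) combinations in which write implies read. */
static int count_nested_codes(void)
{
    int n = 0;
    for (int r = -1; r < NMODES; r++)
        for (int w = -1; w <= r; w++)
            n++;
    return n;
}
```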

Critique

The design of the user-available instruction set is one
of the most attractive I have ever seen.  We could not
investigate all the nooks and crannies, but it appears to be
extremely regular in its treatment of both operators and
operands; this tends to make a compiler's code generator
simple (and thus more nearly able to approach optimality).
DEC claims that despite the doubled number of bits in the
virtual address space, the size in bytes of programs should
approach that of the 11.  I intend to investigate this with
C outputs, but I am inclined to accept the claim.  The
architecture loses bits in most address modes (which occupy
at least one byte, and sometimes several more), but gains in
being able to express small displacements from registers and
small literals.  For example, to load a small constant, or a
value at a small displacement from a register, takes three
bytes on VAX and four on the 11.
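The three-byte figure is easy to check for cases such as MOVL #5,R0, which uses a one-byte short literal, and PUSHL 8(R1), which uses a byte displacement. On the 11, MOV #5,R0 and MOV 8(R1),-(SP) each take an instruction word plus an extra 16-bit word, so four bytes. Below is a small C encoder for those two VAX forms. The opcodes MOVL = 0xD0 and PUSHL = 0xDD and the mode nibbles follow the VAX architecture as later documented. The function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

enum { MOVL = 0xD0, PUSHL = 0xDD };

/* Operand specifiers for three addressing modes. */
static size_t short_literal(uint8_t *out, unsigned value)    /* #0..#63 */
{
    out[0] = (uint8_t)(value & 0x3F);       /* mode bits 00, 6-bit literal */
    return 1;
}

static size_t reg(uint8_t *out, unsigned r)                  /* Rn */
{
    out[0] = (uint8_t)(0x50 | r);           /* mode 5 */
    return 1;
}

static size_t byte_disp(uint8_t *out, int8_t d, unsigned r)  /* d(Rn) */
{
    out[0] = (uint8_t)(0xA0 | r);           /* mode A, then one displacement byte */
    out[1] = (uint8_t)d;
    return 2;
}

/* MOVL #lit,Rn */
static size_t movl_lit_reg(uint8_t *out, unsigned lit, unsigned r)
{
    size_t n = 0;
    out[n++] = MOVL;
    n += short_literal(out + n, lit);
    n += reg(out + n, r);
    return n;
}

/* PUSHL d(Rn) */
static size_t pushl_disp(uint8_t *out, int8_t d, unsigned r)
{
    size_t n = 0;
    out[n++] = PUSHL;
    n += byte_disp(out + n, d, r);
    return n;
}
```

With these, `movl_lit_reg(buf, 5, 0)` produces D0 05 50 and `pushl_disp(buf, 8, 1)` produces DD A1 08.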

Some care will be needed to produce programs in which all
the addresses have minimal length.  Fortunately, the same
techniques which we use on the Interdata remain applicable.
[This technique is post-loader optimization; either after
or inside of ld, squash as many as possible of the long addresses to
shorter ones.  So far as I know, this hasn't been done
yet for the VAX, though I could be wrong.  It certainly
has been done for other machines.]
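The post-loader step is an instance of sizing span-dependent instructions. One common way to do it works from the other direction; it is sketched here as an assumption, not as what any particular ld does. Every branch starts in its short form, and only those whose displacement will not fit in a signed byte are lengthened, repeating until nothing changes. Because sizes only grow, distances only grow too, so the loop terminates, and every branch left long in the final layout genuinely needs to be. The 2- and 3-byte sizes loosely follow BRB and BRW; everything else here is hypothetical.

```c
#include <stdbool.h>

/* One piece of assembled code: plain bytes, or a branch to another item. */
struct item {
    int size;       /* current size in bytes */
    int target;     /* index of the target item, or -1 if not a branch */
    bool is_long;   /* branch currently uses the long form */
};

enum { SHORT_BR = 2, LONG_BR = 3 };   /* hypothetical, like BRB and BRW */

/* Size every branch as small as its displacement allows. `addr` must
   have room for n entries and receives each item's final address.
   Returns the total code size. */
static int relax_branches(struct item *it, int n, int *addr)
{
    for (int i = 0; i < n; i++)
        if (it[i].target >= 0) {
            it[i].size = SHORT_BR;
            it[i].is_long = false;
        }

    bool changed = true;
    while (changed) {
        changed = false;
        int pc = 0;
        for (int i = 0; i < n; i++) {
            addr[i] = pc;
            pc += it[i].size;
        }
        for (int i = 0; i < n; i++) {
            if (it[i].target < 0 || it[i].is_long)
                continue;
            /* displacement is measured from the end of the branch */
            int disp = addr[it[i].target] - (addr[i] + it[i].size);
            if (disp < -128 || disp > 127) {
                it[i].size = LONG_BR;
                it[i].is_long = true;
                changed = true;
            }
        }
    }
    int total = 0;
    for (int i = 0; i < n; i++)
        total += it[i].size;
    return total;
}
```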

The memory mapping is not so good, mainly because it does
not seem easy to use the very large virtual address space.
If information is placed at random the page tables become
huge (2^21 words!).  However, the user page tables can
themselves be paged, and this may provide an out.  I asked
Steve Rothman why they did not go to a segmented scheme, and
the reply was that the overhead (presumably on address-cache
misses) seemed too large.  I should have investigated this
further, because I don't believe it.  He may have had in
mind segmentation combined with full mapping of the user
addressing tables.  This might indeed be pretty messy.
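The 2^21 figure works out as follows. The calculation takes a 30-bit program region (32 bits less the two region bits), the 512-byte page from the closing remarks, and an assumed four bytes per page table entry. A fully populated table is four times the 2MB of memory mentioned earlier. Paged, it costs up to 2^14 system page table entries instead, and only the parts of it actually in use need to be resident.

```latex
\begin{aligned}
\text{entries per region} &= \frac{2^{30}\ \text{bytes}}{2^{9}\ \text{bytes/page}} = 2^{21} \\
\text{table size} &= 2^{21} \times 4\ \text{bytes} = 2^{23}\ \text{bytes} = 8\ \text{MB} \\
\text{pages holding the table} &= \frac{2^{23}}{2^{9}} = 2^{14}
\end{aligned}
```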

They talked some about software.  It was rather depressing.
Most of it will be emulated.  (Presumably in a 2MB machine
you will still have to tell the assembler how big a symbol
table to use.)  The system itself will be new, but
unimaginative.  They did not seem to understand, for
example, why or even how the command interpreter should be a
separate process and not in the system, and why commands
themselves should be processes.  They are also still stuck
mostly in assembly language.  There are companies that are
learning about how to write software, but DEC is evidently
not one of them.
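In Unix terms, the arrangement DEC did not seem to see is small. The command interpreter is an ordinary user process, and each command runs as a child process of its own. The following minimal sketch uses today's POSIX calls, not the system of 1977. `run_command` is a hypothetical name, and a real shell wraps parsing, redirection, and job control around it.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run one command as its own process: fork a child, have the child
   replace itself with the command, and wait for it. Returns the
   command's exit status, or -1 if it could not be run to completion. */
static int run_command(char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;                  /* could not create a process */
    if (pid == 0) {
        execvp(argv[0], argv);      /* returns only on error */
        _exit(127);
    }
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```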

My general impression is that this is a remarkably good
machine.  DEC talked about lots of other features, such as
the physical design, self-checking, and subset isolation; at
least they were soothing to hear.  It sounded pretty good,
but it's hard to know how it will work out in practice.

------------------------

Latter-day thoughts:

Although the instruction set didn't work out as nicely as it seemed it
would, even from the CISC viewpoint, it holds up pretty well.
From the 1988 viewpoint, VAX has been as important to DEC's
success as the 360/370 architecture has been to IBM's.
I certainly don't gush as strongly today, partly because the ISA
isn't as orthogonal and neat as it seemed it would be.  There are
still plenty of special cases lurking in the instructions and address
modes C wants to compile, for example.  The call instructions do
indeed do just what you want, but are too slow; one might better
make do with less.

Of course the real critique arises not from the details, but from the RISC
insight.  I take this to be, in essence, that those address syllables
are expensive to decode, and that the complicated instructions
are bad, not because of complication alone, but because they
practically insist on a microcode implementation that RISC is able
to dispense with altogether.
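The decoding cost is easy to see in code. Each operand specifier reveals its length only in its first byte, so the third operand cannot be found until the first two have been decoded in sequence. The sketch below assumes one-byte opcodes and ignores reserved combinations. Its names are hypothetical; the mode assignments follow the VAX architecture as later documented.

```c
#include <stddef.h>
#include <stdint.h>

/* Length in bytes of the operand specifier at p. `size` is the
   operand's data size, needed only for immediate mode. The mode is
   the high nibble; the register is the low nibble (15 is the PC). */
static size_t spec_len(const uint8_t *p, size_t size)
{
    unsigned mode = p[0] >> 4, r = p[0] & 0xF;
    switch (mode) {
    case 0: case 1: case 2: case 3: return 1;        /* short literal */
    case 4:  return 1 + spec_len(p + 1, size);       /* index prefix + base */
    case 5: case 6: case 7:         return 1;        /* Rn, (Rn), -(Rn) */
    case 8:  return r == 15 ? 1 + size : 1;          /* (Rn)+, or #imm via PC */
    case 9:  return r == 15 ? 5 : 1;                 /* @(Rn)+, or @#abs via PC */
    case 10: case 11: return 2;                      /* byte displacement */
    case 12: case 13: return 3;                      /* word displacement */
    default: return 5;                               /* longword displacement */
    }
}

/* Operand i can only be located after decoding operands 0..i-1. */
static size_t instr_len(const uint8_t *p, size_t nops, const size_t *sizes)
{
    size_t len = 1;                                  /* one-byte opcode */
    for (size_t i = 0; i < nops; i++)
        len += spec_len(p + len, sizes[i]);
    return len;
}
```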

The analysis of the mapping is correct, and picks out the architectural
weakness of the scheme (no attractive way to do segments) but misses
the small page size.  As I noted in the text, we didn't hear or
weren't told what it was.  Today, 512 bytes is obviously much too small.
Then, I probably wouldn't have thought so.  It's interesting
that the small size persists, though.  DEC must feel that their software
can't cope with a change.

My impressions of the software situation were entirely correct.

The main fact about the machine that we missed out on altogether
was the IO architecture.  Even the VAX-11/780 had a messy
collection of peculiar bus adaptors glued on it, and of course the
situation has never really become any better.