[comp.arch] Words & Bytes

firth@sei.cmu.edu (Robert Firth) (05/04/87)

Herewith some thoughts about word versus byte addressing.
There are several issues here, and it might be best to
separate them.

Transfer Unit
-------------

First, what should be the unit of transfer between memory 
and the CPU?  A byte is clearly too small; judging by the
way most data caches are built, 64 or 128 bits at a time
looks reasonable, depending on bus width and bandwidth.

For writes into memory, we need a sensible indivisible
update primitive for multiprocessors, and it is certainly
helpful if there is a "natural" data type for which stores
are always indivisible, but I see no reason to make every
store indivisible.  If two parallel tasks update bytes of
the same natural unit, ignoring the warnings in the manual,
tough.
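
To make the hazard concrete, here is a rough C sketch of what a byte
store turns into when the hardware can only store whole words; the
names and the 32-bit word are my own assumptions, not any particular
machine.

        /* Illustrative only: storing one byte on a machine whose only
           store is a full-word store.  The read-modify-write sequence
           is not indivisible, so two tasks updating different bytes of
           the same word can overwrite each other's stores. */
        typedef unsigned long word_t;                /* one machine word */

        void store_byte(word_t *mem, unsigned long byteaddr, unsigned char val)
        {
            word_t *wp    = &mem[byteaddr / 4];      /* containing word  */
            int     shift = (int)(byteaddr % 4) * 8; /* byte position    */
            word_t  w     = *wp;                     /* read ...         */
            w &= ~((word_t)0xff << shift);           /* ... modify ...   */
            w |=  ((word_t)val  << shift);
            *wp = w;                                 /* ... write: races here */
        }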

Alignment
---------

Secondly, what about alignment?  It seems reasonable to me
to prohibit accesses to some unaligned units (words on a
byte boundary, for instance).  However, I found the PE3200
very tedious in requiring alignment of 64-bit quantities,
since it meant I had to keep the stack always double-word
aligned, in case some subroutine further down declared a
local DOUBLE variable.  But if there is a clear performance
gain, I'd live with the tedium.
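
A minimal sketch of the tedium, assuming an 8-byte double-word: every
stack frame gets rounded up, whether or not anything in it needs the
alignment.

        /* Round a frame size up to a multiple of 8 so the stack stays
           double-word aligned for any DOUBLE a callee may declare.
           Purely illustrative; the constant 8 is the PE3200-style
           double-word assumed above. */
        unsigned long frame_size(unsigned long locals_bytes)
        {
            return (locals_bytes + 7UL) & ~7UL;
        }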

Addressing Unit
---------------

Thirdly, the question of addressing unit.  I think this
should be clearly distinguished from transfer unit, since
we are talking about two quite different design spaces.
I see these advantages to word addressing rather than
byte addressing:

a. you get more address space for the same number of bits

b. you don't have to multiply array indices by 4, or step
   pointers by 4

c. you can't accidentally address an unaligned word

With high-level languages, (c) shouldn't happen.  And I don't
attach much importance to (b), since in the first place the
cost is small after optimisation, and in the second place
arrays come in all sizes, and the compiler had better be able
to handle all sizes well.  Incidentally, that is also the
reason I think most "scaled-index" address modes are useless -
making a special case out of arrays with component size 1, 2, 4,
or 8 is pointless, since you still have to generate good code
for arrays of component size 6, 7, 12 or whatever, and the
general solution probably works just as well as the special
modes.
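
To put it another way, the address arithmetic the compiler must emit
has the same shape whatever the component size; a sketch in C, with
sizes in bytes and my own naming:

        /* General element addressing.  This works for component sizes
           1, 2, 4 and 8 just as it does for 6, 7 and 12; a scaled-index
           mode only covers the first group, and the compiler still has
           to emit this (or its strength-reduced loop form) for the rest. */
        char *element_addr(char *base, unsigned long index, unsigned long size)
        {
            return base + index * size;   /* one multiply, or shift-and-add */
        }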

That leaves (a), and given that we are always running out of
address space, I think it is a persuasive reason.

Byte Arrays
-----------

However, word addressing creates some problems that the machine
designer must solve, or see his machine execrated.  As a simple
example, how do you implement arrays of bytes (or characters)?
Nobody will tolerate loose packing, so it must be possible to
access the individual bytes of an array, and the byte index
is a computed quantity.

The DG Nova, for instance, has separate instructions to access
the left-hand and right-hand bytes.  This is pathetic - you
have to compute the index and switch on the bottom bit.  The
CA NM4 has a special address mode, where the value in a certain
register is treated as a byte index, and the hardware does the
rest - this is a good solution.
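
For the general case, the Nova-style scheme amounts to something like
the following C sketch (16-bit words, with byte 0 taken to be the
left-hand byte; both are my assumptions here):

        /* Byte fetch with only "left byte" and "right byte" operations:
           compute the byte index, then switch on its bottom bit. */
        typedef unsigned short word_t;                /* 16-bit word */

        unsigned char fetch_byte(const word_t *a, unsigned long i)
        {
            word_t w = a[i >> 1];                     /* containing word */
            if (i & 1)
                return (unsigned char)(w & 0xff);     /* right-hand byte */
            else
                return (unsigned char)(w >> 8);       /* left-hand byte  */
        }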

Of course, you cannot declare a byte array that spans the whole
address space, since that's WHY the machine has word addressing.
A more serious problem is that the address mode assumes that
the array starts on a natural boundary.  Given things like Ada's
slices and Fortran EQUIVALENCE, you cannot in general assume that
a parametric byte array starts on a word boundary.  So you have to
pass the array as a descriptor

	(word address, byte offset)

that designates the true first element.  There is then an extra
addition in each index calculation, which can usually be optimised
out of loops.  The main cost is that the caller has to pass an extra
word.
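
In C, such a descriptor and its index calculation might look roughly
like this (the struct and function names are mine, and I assume
4-byte words with byte 0 at the low-order end):

        /* Descriptor for a byte array that need not start on a word
           boundary: the word containing the first element, plus that
           element's byte offset within the word. */
        typedef unsigned long word_t;

        struct byte_desc {
            word_t       *base;     /* word address of the first word  */
            unsigned int  offset;   /* 0..3: byte offset of element 0  */
        };

        /* Fetch element i: one extra addition per index calculation,
           which can usually be hoisted out of a loop. */
        unsigned char fetch(struct byte_desc d, unsigned long i)
        {
            unsigned long b = d.offset + i;               /* true byte index */
            word_t        w = d.base[b / 4];              /* containing word */
            return (unsigned char)(w >> ((b % 4) * 8));   /* low byte = 0    */
        }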

I recall a conference at which a designer of the ICL2900 defended
the hardware array operations, which indeed assumed word alignment
of the array start.  She said "But EQUIVALENCEd char arrays in
Fortran hardly ever happen."  That is the kind of reasoning that
drives compiler writers berserk.  It doesn't matter if only 0.001%
of declared arrays are unaligned.  What matters is that over 60%
of array references are to formal parameters, where you must 
always generate the worst-case code.

Byte Pointers
-------------

And finally the big issue: what happens to all that C code with
"char *" littered all over it?  There must be a clean way to create
a byte pointer, pass it as a parameter, and use it.  At worst, one
can treat the variable as an array of size 1 and pass a descriptor
of the kind shown above.  This is a high overhead, though, since

a. every byte-pointer variable is two words

b. a subroutine passed an array usually does a lot of work; one
   passed a simple pointer can be very small, and the extra word
   of parameter overhead is a relatively larger cost.

One's attitude to this depends, I suppose, on how one programs.
I could live with it, since I rarely use byte pointers even in
text-processing code.  Most C programmers, I expect, will scream,
since all that hand-optimised code is now hand-pessimised.
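
For what it's worth, the two-word replacement for "char *" might be
sketched in C like this (again my own names, 4-byte words, byte 0 at
the low-order end):

        /* A byte pointer on a word-addressed machine: two words where a
           native char * needs one, plus a little extra arithmetic on
           every step and every dereference. */
        typedef unsigned long word_t;

        struct byte_ptr {
            word_t       *word;     /* word address                */
            unsigned int  byte;     /* 0..3: byte within that word */
        };

        struct byte_ptr bp_next(struct byte_ptr p)       /* p + 1 */
        {
            if (++p.byte == 4) { p.byte = 0; p.word++; }
            return p;
        }

        unsigned char bp_load(struct byte_ptr p)         /* *p */
        {
            return (unsigned char)(*p.word >> (p.byte * 8));
        }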

Postscript
----------

Nearly all machines have trouble with arrays of things smaller than
the addressing unit.  And many of them have trouble with arrays that
aren't some special multiple of the addressing unit.  In high-level
languages, such arrays are pretty common, and the idea that byte
and word arrays are special cases is obsolescent.

It seems to me about time the hardware designers built the right
indexing primitive, which takes

	(array root, component index, component size)

and where the component size can be anything from 1 bit to HUGE, and 
the hardware does the scaling as appropriate, either multiplying or
dividing the index.
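
Modelled in C, with the component size in bits and 32-bit words (both
my assumptions), the primitive would compute something like:

        /* The proposed indexing primitive: from an array root, a
           component index and a component size in bits, produce the
           word address and bit offset of the component.  Large
           components scale the index up by multiplication; sub-word
           components are in effect scaled down by the division that
           finds the containing word. */
        #define WORD_BITS 32UL

        struct bit_addr {
            unsigned long word;     /* word address       */
            unsigned int  bit;      /* bit offset, 0..31  */
        };

        struct bit_addr index_component(unsigned long root_word,
                                        unsigned long index,
                                        unsigned long size_bits)
        {
            unsigned long bits = index * size_bits;   /* bit displacement */
            struct bit_addr a;
            a.word = root_word + bits / WORD_BITS;
            a.bit  = (unsigned int)(bits % WORD_BITS);
            return a;
        }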

Robert Firth

rw@beatnix.UUCP (05/05/87)

In article <1195@aw.sei.cmu.edu> firth@sei.cmu.edu (Robert Firth) writes:
>
>Postscript
>----------
>
>Nearly all machines have trouble with arrays of things smaller than
>the addressing unit.  And many of them have trouble with arrays that
>aren't some special multiple of the addressing unit.  In high-level
>languages, such arrays are pretty common, and the idea that byte
>and word arrays are special cases is obsolescent.

One of the most fundamental computer engineering principles is efficient
implementation of the most common cases as special cases when there's a
sufficient performance win to be had, and this certainly applies here.

>
>It seems to me about time the hardware designers built the right
>indexing primitive, which takes
>
>	(array root, component index, component size)
>
>and where the component size can be anything from 1 bit to HUGE, and 
>the hardware does the scaling as appropriate, either multiplying or
>dividing the index.

Are we in a time warp back to the mid-70s?  This flies in the face of all the
research that led us to RISC.  The standard argument against this kind of
thing is:
1. For a given amount of hardware, it'll be slower than simpler indexing.
2. For a given speed, it'll need more hardware.
3. If you build the fastest machine you can with this indexing scheme, it will
   be slower than the fastest machine you can build with simple indexing.
4. If you special-case the first couple of sigmas via normal byte or word
   addressing, you're left with so few cases that it no longer makes sense to
   waste the hardware on it instead of generating special code (see the
   sketch below).
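
Point 4 in compiler terms, as a rough sketch (my own code, sizes in
bytes): the common sizes become shifts, and only the leftovers pay for
the general multiply.

        /* Address computation as a compiler might emit it: special-case
           the common power-of-two component sizes with shifts, and fall
           back to a general multiply for everything else.  No special
           indexing hardware required. */
        char *elem_addr(char *base, unsigned long i, unsigned long size)
        {
            switch (size) {
            case 1:  return base + i;
            case 2:  return base + (i << 1);
            case 4:  return base + (i << 2);
            case 8:  return base + (i << 3);
            default: return base + i * size;    /* sizes 6, 7, 12, ... */
            }
        }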

Russell Williams
..{ucbvax!sun,lll-lcc!styx,altos86,bridge2}!elxsi!rw