firth@sei.cmu.edu (Robert Firth) (05/04/87)
Herewith some thoughts about word versus byte addressing. There are several issues here, and it might be best to separate them.

Transfer Unit
-------------

First, what should be the unit of transfer between memory and the CPU? A byte is clearly too small; judging by the way most data caches are built, 64 or 128 bits at a time looks reasonable, depending on bus width and bandwidth. For writes into memory, we need a sensible indivisible update primitive for multiprocessors, and it is certainly helpful if there is a "natural" data type for which stores are always indivisible, but I see no reason to make every store indivisible. If two parallel tasks update bytes of the same natural unit, ignoring the warnings in the manual, tough.

Alignment
---------

Secondly, what about alignment? It seems reasonable to me to prohibit accesses to some unaligned units (words on a byte boundary, for instance). However, I found the PE3200 very tedious in requiring alignment of 64-bit quantities, since it meant I had to keep the stack always double-word aligned, in case some subroutine further down declared a local DOUBLE variable. But if there is a clear performance gain, I'd live with the tedium.

Addressing Unit
---------------

Thirdly, the question of addressing unit. I think this should be clearly distinguished from the transfer unit, since we are talking about two quite different design spaces. I see these advantages to word addressing rather than byte addressing:

a. you get more address space for the same number of bits
b. you don't have to multiply array indices by 4, or step pointers by 4
c. you can't accidentally address an unaligned word

With high-level languages, (c) shouldn't happen. And I don't attach much importance to (b), since in the first place the cost is small after optimisation, and in the second place arrays come in all sizes, and the compiler had better be able to handle all sizes well.
Incidentally, that is also the reason I think most "scaled-index" address modes are useless - making a special case out of arrays with component size 1, 2, 4 or 8 is pointless, since you still have to generate good code for arrays of component size 6, 7, 12 or whatever, and the general solution probably works just as well as the special modes.

That leaves (a), and given that we are always running out of address space, I think it is a persuasive reason.

Byte Arrays
-----------

However, word addressing creates some problems that the machine designer must solve, or see his machine execrated. As a simple example, how do you implement arrays of bytes (or characters)? Nobody will tolerate loose packing, so it must be possible to access the individual bytes of an array, and the byte index is a computed quantity. The DG Nova, for instance, has separate instructions to access the left-hand and right-hand bytes. This is pathetic - you have to compute the index and switch on the bottom bit. The CA NM4 has a special address mode, where the value in a certain register is treated as a byte index, and the hardware does the rest - this is a good solution. Of course, you cannot declare a byte array that spans the whole address space, since that's WHY the machine has word addressing.

A more serious problem is that the address mode assumes that the array starts on a natural boundary. Given things like Ada's slices and Fortran EQUIVALENCE, you cannot in general assume that a parametric byte array starts on a word boundary. So you have to pass the array as a descriptor (word address, byte offset) that designates the true first element. There is then an extra addition in each index calculation, which can usually be optimised out of loops. The main cost is that the caller has to pass an extra word.

I recall a conference at which a designer of the ICL2900 defended the hardware array operations, which indeed assumed word alignment of the array start.
She said "But EQUIVALENCEd char arrays in Fortran hardly ever happen." That is the kind of reasoning that drives compiler writers berserk. It doesn't matter if only 0.001% of declared arrays are unaligned. What matters is that over 60% of array references are to formal parameters, where you must always generate the worst-case code.

Byte Pointers
-------------

And finally the big issue: what happens to all that C code with "char *" littered all over it? There must be a clean way to create, pass as a parameter, and use a byte pointer. At worst, one can treat the variable as an array of size 1, and pass a descriptor of the kind shown above. This is a high overhead, though, since

a. every byte-pointer variable is two words
b. a subroutine passed an array usually does a lot of work; one passed a simple pointer can be very small, and the extra one word of parameter overhead is relatively a larger cost.

One's attitude to this depends, I suppose, on how one programs. I could live with it, since I rarely use byte pointers even in text-processing code. Most C programmers, I expect, will scream, since all that hand-optimised code is now hand-pessimised.

Postscript
----------

Nearly all machines have trouble with arrays of things smaller than the addressing unit. And many of them have trouble with arrays that aren't some special multiple of the addressing unit. In high-level languages, such arrays are pretty common, and the idea that byte and word arrays are special cases is obsolescent.

It seems to me about time the hardware designers built the right indexing primitive, which takes

    (array root, component index, component size)

and where the component size can be anything from 1 bit to HUGE, and the hardware does the scaling as appropriate, either multiplying or dividing the index.

Robert Firth
rw@beatnix.UUCP (05/05/87)
In article <1195@aw.sei.cmu.edu> firth@sei.cmu.edu (Robert Firth) writes:
>
>Postscript
>----------
>
>Nearly all machines have trouble with arrays of things smaller than
>the addressing unit. And many of them have trouble with arrays that
>aren't some special multiple of the addressing unit. In high-level
>languages, such arrays are pretty common, and the idea that byte
>and word arrays are special cases is obsolescent.

One of the most fundamental computer engineering principles is efficient implementation of the most common cases as special cases when there's a sufficient performance win to be had, and this certainly applies here.

>It seems to me about time the hardware designers built the right
>indexing primitive, which takes
>
> (array root, component index, component size)
>
>and where the component size can be anything from 1 bit to HUGE, and
>the hardware does the scaling as appropriate, either multiplying or
>dividing the index.

Are we in a time warp back to the mid-70s? This flies in the face of all the research that led us to RISC. The standard argument against this kind of thing is:

1. For a given amount of hardware, it'll be slower than simpler indexing.
2. For a given speed, it'll need more hardware.
3. If you build the fastest machine you can with this indexing scheme, it will be slower than the fastest machine you can build with simple indexing.
4. If you special-case the first couple of sigmas via normal byte or word addressing, you're left with so few cases it no longer makes sense to waste the hardware on it instead of generating special code.

Russell Williams
..{ucbvax!sun,lll-lcc!styx,altos86,bridge2}!elxsi!rw