throopw@sheol.UUCP (Wayne Throop) (08/29/90)
> From: meissner@osf.org (Michael Meissner) > IMHO, the 64 bit machine should represent all addresses in bits [...] > Before people lynch me, let me explain, that I think that the > addresses that are not appropriately aligned should trap. Why should people lynch someone for that, when they don't lynch people for advocating byte addresses that may or may not trap (or silently fail) on unaligned operations? (Or IS that a lynching offense? ...) Surely bit-granular addressing is the only sensible way to go... what reasons are against it in a 64-bit world? -- Wayne Throop <backbone>!mcnc!rti!sheol!throopw or sheol!throopw@rti.rti.org
mash@mips.COM (John Mashey) (08/29/90)
In article <0887@sheol.UUCP> throopw@sheol.UUCP (Wayne Throop) writes: >Surely bit-granular addressing is the only sensible way to go... what >reasons are against it in a 64-bit world? I doubt this is the only sensible way to go, but it does lead to an interesting exercise. First, let us observe that with 64-bit addressing, one could afford to sacrifice 3 bits for this, which is probably not true in the 32-bit world. However, for concreteness, and ease of looking at it, let's think about the 32-bit version. So here's the exercise: 1) Start with any of the popular RISC architectures (or a synthetic, like P&H's DLX). 2) Change the addressing so it uses bit-addressing. 3) (Architecture): what must you change? what instructions would be added or deleted? are there common code sequences that would change? for better, for worse. 4) Implementation: are there any interesting critical-path or area layout issues that show up? 5) Software: which existing languages can make use of this feature, and how? Which languages can PORTABLY make use of this feature, i.e., least-common-denominator theory of software causes most people to avoid unusual features unless there's a big benefit. Are there language extensions one would propose that can portably take advantage of such features? ------ Note: it would probably good to make the assumption that if an architecture traps on unaligned accesses that it would continue to do so. -- -john mashey DISCLAIMER: <generic disclaimer, I speak for me only, etc> UUCP: mash@mips.com OR {ames,decwrl,prls,pyramid}!mips!mash DDD: 408-524-7015, 524-8253 or (main number) 408-720-1700 USPS: MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086
daveg@near.cs.caltech.edu (Dave Gillespie) (08/30/90)
>>>>> On 29 Aug 90 06:00:33 GMT, mash@mips.COM (John Mashey) said: > In article <0887@sheol.UUCP> throopw@sheol.UUCP (Wayne Throop) writes: >>Surely bit-granular addressing is the only sensible way to go... what >>reasons are against it in a 64-bit world? > I doubt this is the only sensible way to go, but it does lead to an > interesting exercise. Okay... You could provide two variants of the load/store instructions, one set that trap on unaligned accesses and one set that don't (but are possibly much slower). This latter instruction replaces the specialized "bit-field" instructions that some machines have now. Compilers could have an option to generate only the slow-but-safe instructions for the benefit of fast-but-reckless programmers. I've heard of at least one architecture (SPARC?) that uses this approach. It seems to me that a bit-addressed, aligned-accessing machine can be thought of as just like a byte-addressed machine with three zeros on the right that would have been on the left. (Once we are confident that bit-addressing doesn't cost too much from this point of view, we can go on to see what bit-addressing gains us beyond what we can do now.) The only problem I can think of is that you lose bits in your instruction word. I cringe every time I look at the 68000's short-branch instruction that uses an 8-bit displacement measured in bytes; instructions must fall on even addresses, so one precious bit of the displacement is completely wasted. Now imagine a bit-addressed 68000: four bits are wasted! Most newer machines (like the 88000) don't have this problem; they shift the displacement enough bits to the left so that all bits are useful. You would probably want to adopt this approach for all displacements. Accessing an aligned double-precision float with a 16-bit offset from the frame pointer wastes three bits now, but with bit-addressing it would waste six bits. On a machine like the 88000, there is a huge extra cost if the target doesn't lie within a 16-bit displacement of the index register, and this cost would become much more significant if you didn't auto-shift your displacements to avoid wasting those bits. Certain programming tricks (putting an odd address in a register and then using an odd displacement to access a word on an even address) would be much harder to do with auto-shifted displacements, but I don't think this would be a serious problem. It's much easier to be little-endian than big-endian if you use bit addresses. In bit-addressing, little-endian translates to "LSB is bit zero", and big-endian translates to "LSB is bit 31 (or 15, or 7, or 63, or whatever)". Just look at the 68020's bit-field instructions: Because the rest of the architecture is big-endian, they had to number bits from the left in the bit-field instructions, even though the rest of the architecture numbers bits from the right. If anything, switching to bit-addressing might put an end to one brand of flame war on comp.arch. :-) The hardware would probably be about the same. I hear really fast machines are limited by the carry chain in the incrementer for the program counter; this gets to be three bits shorter if the PC is a bit address. Those auto-shifted displacements might make your addressing unit a bit more complicated, but most new machines already have a shifter there to handle scaled indexing. I doubt software could tell the difference unless it casts pointers to ints and back. If you are writing a compatibility-oriented compiler, probably all your pointers will be multiples of eight. So you could cure even this by doing a three-bit shift when converting between pointers and ints. (Not that I'd advocate such a kludge for any purpose other than to provide an all-the-world's-a-VAX mode for those who absolutely must get software package X to compile effortlessly on your box.) Let's see... Operations on C character arrays now require shifts where previously they didn't. Many of these shifts can be factored into a scaled-indexing addressing mode, but some will not be so amenable. (For example, "p = p + i" or "i = p - q", with "int i; char *p, *q;".) Operations on BCD digit strings would be slightly cleaner. Some Lisp implementations stuff tag bits into the low two bits of a longword's byte address. Now you could fit five tag bits down there. Bit arrays are nicer to work with. It may now be practical to pack a Pascal "array of 0..3" into two-bit chunks, when formerly it might have been inefficient to do so. Features like subranges would suddenly be much more popular among programmers. Oh, no! This would give Ada an advantage over C. :-) Enough! It's someone else's turn now. -- Dave -- Dave Gillespie 256-80 Caltech Pasadena CA USA 91125 daveg@csvax.cs.caltech.edu, ...!cit-vax!daveg
cik@l.cc.purdue.edu (Herman Rubin) (08/30/90)
In article <DAVEG.90Aug29225127@near.cs.caltech.edu>, daveg@near.cs.caltech.edu (Dave Gillespie) writes: | >>>>> On 29 Aug 90 06:00:33 GMT, mash@mips.COM (John Mashey) said: | > In article <0887@sheol.UUCP> throopw@sheol.UUCP (Wayne Throop) writes: > | >>Surely bit-granular addressing is the only sensible way to go... what | >>reasons are against it in a 64-bit world? > | > I doubt this is the only sensible way to go, but it does lead to an | > interesting exercise. > > Okay... > > You could provide two variants of the load/store instructions, one set > that trap on unaligned accesses and one set that don't (but are possibly > much slower). Why much slower? At most two items would have to be loaded, and a shift made. .................... > The only problem I can think of is that you lose bits in your > instruction word. I cringe every time I look at the 68000's > short-branch instruction that uses an 8-bit displacement measured in > bytes; instructions must fall on even addresses, so one precious bit > of the displacement is completely wasted. Now imagine a bit-addressed > 68000: four bits are wasted! We are discussing machines where the memory is sufficiently large that 32-bit addressing is inadequate. Four bits is 1/16 of an address. Most newer machines (like the 88000) > don't have this problem; they shift the displacement enough bits to > the left so that all bits are useful. You would probably want to > adopt this approach for all displacements. Accessing an aligned > double-precision float with a 16-bit offset from the frame pointer > wastes three bits now, but with bit-addressing it would waste six > bits. There are quite a few machines now which have both addresses and indices. For an index, the appropriate shifting is done. This is not new hardware. Again, we are discussing 64-bit address machines. "Wasting" even 8 bits in addressing is only 1/8 of the length of an address. -- Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907 Phone: (317)494-6054 hrubin@l.cc.purdue.edu (Internet, bitnet) {purdue,pur-ee}!l.cc!cik(UUCP)
aglew@dwarfs.crhc.uiuc.edu (Andy Glew) (08/31/90)
At the cost of an extra bit, and decoding logic, you can encode aligned data field size in the address as follows: abcdefghijkl0 => 1-bit address abcdefghijkl abcdefghijk01 => 2-bit address abcdefghijk0 to abcdefghijk1 abcdefghij011 => 4-bit address abcdefghij00 to abcdefghij11 abcdefghi0111 => 8-bit address abcdefghi000 to abcdefghi111 abcdefgh01111 => 16-bit address abcdefgh0000 to abcdefgh1111 abcdefg011111 => 32-bit address abcdefg00000 to abcdefg11111 abcdef0111111 => 64-bit address abcdef000000 to abcdef111111 abcde01111111 => 128-bit address abcde0000000 to abcde1111111 ... You could, for example, use such an encoding in the literal index or offset field of your instruction. You lose the ability, eg., to take an odd address and add an odd offset to it, in an indexed addressing mode, to get a properly aligned even address, but that's not too much of a loss. Gould used this approach in the PN and NP computers. The same approach can be used on a bus to get sub-field selection. However, byte lane select flags cost only 7 bits (upto 128 bits), and let you handle data that is misaligned wrt bus boundaries in fewer operations. (Eg. MIPS' load-left and load-right conceivably could be done across a bus in this way). -- Andy Glew, a-glew@uiuc.edu [get ph nameserver from uxc.cso.uiuc.edu:net/qi]
aglew@dwarfs.crhc.uiuc.edu (Andy Glew) (08/31/90)
>> You could provide two variants of the load/store instructions, one set >> that trap on unaligned accesses and one set that don't (but are possibly >> much slower). > >Why much slower? At most two items would have to be loaded, and a shift >made. You don't think 2X or 3X is much slower? Then let me sell you a CISC system - it's only 2X slower than a RISC. -- Andy Glew, a-glew@uiuc.edu [get ph nameserver from uxc.cso.uiuc.edu:net/qi]
tif@doorstop.austin.ibm.com (Paul Chamberlain/32767) (08/31/90)
In article <DAVEG.90Aug29225127@near.cs.caltech.edu> daveg@near.cs.caltech.edu (Dave Gillespie) writes: >It seems to me that a bit-addressed, aligned-accessing machine can be >thought of as just like a byte-addressed machine with three zeros on >the right that would have been on the left. Now there's an interesting thought. Why not order these 64 bits so that the 3 on the left are the bit offset. Mere mortals would use it just like a byte addressed machine but wizards could use those 3 bits anyway they like. Hmm, I guess that could complicate adressing the n-th bit in the system. You'd have to make an instruction which rearranges the bits in a pointer. Paul Chamberlain | I do NOT represent IBM tif@doorstop, sc30661@ausvm6 512/838-7008 | ...!cs.utexas.edu!ibmaus!auschs!doorstop.austin.ibm.com!tif
petolino@joe.Eng.Sun.COM (Joe Petolino) (08/31/90)
>>You could provide two variants of the load/store instructions, one set >>that trap on unaligned accesses and one set that don't (but are possibly >>much slower). Although you don't say so, I assume that the slower variants would actually give the right results. Remember that there are architectures out there that neither trap nor work correctly when presented with unaligned addresses. >> This latter instruction replaces the specialized >>"bit-field" instructions that some machines have now. Compilers could >>have an option to generate only the slow-but-safe instructions for the >>benefit of fast-but-reckless programmers. I've heard of at least one >>architecture (SPARC?) that uses this approach. No, SPARC doesn't do this, and I'd be surprised if any other architecture did. See answer to next question. However, I think there is a SPARC compiler option to do unaligned loads and stores using software emulation. This is of course very slow. >Why much slower? At most two items would have to be loaded, and a shift >made. The main reason not to allow unaligned accesses is that supporting unaligned accesses greatly increases the *complexity* of the memory system in a machine with virtual addressing and caches. Speed is probably not an issue - in the implementations I've seen (various IBM 370 implementations), you only pay a speed penalty if you actually use an unaliged address. If you did implement instructions that work with unaligned addresses, there would be little reason to also implement the trap-on-unaligned variants - the complexity has to be there anyway. As an example of the kind of complexity that arises with support for unaligned accesses, imagine a store that spans a virtual page boundary, and only one of the pages is write-protected. -Joe
harpermh@mentor.cc.purdue.edu (Matthew Harper) (08/31/90)
In article <3318@awdprime.UUCP> tif@reed.UUCP (Paul Chamberlain/32767) writes: >In article <DAVEG.90Aug29225127@near.cs.caltech.edu> daveg@near.cs.caltech.edu (Dave Gillespie) writes: >>It seems to me that a bit-addressed, aligned-accessing machine can be >>thought of as just like a byte-addressed machine with three zeros on >>the right that would have been on the left. > >Now there's an interesting thought. Why not order these 64 bits so that >the 3 on the left are the bit offset. Mere mortals would use it just like >a byte addressed machine but wizards could use those 3 bits anyway they like. > This isn't a new idea - Texas Instuments line of graphics coprocessor chips TMS34010 & TMS34020 can address bits with nearly every instruction... But, as noted previously, performance lags when access is unaligned --- even when running out of cache...
daveg@near.cs.caltech.edu (Dave Gillespie) (08/31/90)
>> The only problem I can think of is that you lose bits in your >> instruction word. I cringe every time I look at the 68000's >> short-branch instruction that uses an 8-bit displacement measured in >> bytes; instructions must fall on even addresses, so one precious bit >> of the displacement is completely wasted. Now imagine a bit-addressed >> 68000: four bits are wasted! Herman Rubin responds in <2491@l.cc.purdue.edu>: > We are discussing machines where the memory is sufficiently large that > 32-bit addressing is inadequate. Four bits is 1/16 of an address. That's not the point. The 68000 puts a short branch in a 16-bit instruction. Eight bits of that instruction encode the displacement. Using a naive encoding, you are wasting four of eight bits, *NOT* four of 64. The problem here is wasting bits in the instruction word, which reflects directly on performance unless your instruction cache is 100% perfect. >> Most newer machines (like the 88000) >> don't have this problem; they shift the displacement enough bits to >> the left so that all bits are useful. You would probably want to >> adopt this approach for all displacements. Accessing an aligned >> double-precision float with a 16-bit offset from the frame pointer >> wastes three bits now, but with bit-addressing it would waste six >> bits. > There are quite a few machines now which have both addresses and indices. > For an index, the appropriate shifting is done. This is not new hardware. I don't think you are using the terminology in quite the way I do, but, yes, the hardware is already there, which is why I mentioned the idea shortly after this paragraph. >> Again, we are discussing 64-bit address machines. "Wasting" even 8 bits >> in addressing is only 1/8 of the length of an address. Once again: Many compilers for 68000-style machines require functions to have no more than 32K bytes worth of local variables. This is so that they can use a fast 16-bit displacement from the frame pointer to refer to those variables. Without auto-scaling, those same 16-bit displacements now allow only 32K *bits* of local variables. So to do the same job a 68000 does now would require a compiler to generate longer displacements, i.e., larger instructions, which take more time to fetch from memory, which in turn slows everything down. Many new machines support auto-scaling of index registers, but none that I have seen support auto-scaling of fixed displacements in the instruction word. I was trying to show why bit addressing would create a much more compelling need for this feature. Of course, few functions have more than 32K bits worth of locals, so perhaps it's not that great a sacrifice. But compilers that previously could get away with imposing a hard limit to make their code generators simpler would now pretty much have to be willing to generate longer displacements when the need arises. Many architectures, like the 68000 and the 88000, make full 32-bit displacements significantly slower or more awkward to use. -- Dave -- Dave Gillespie 256-80 Caltech Pasadena CA USA 91125 daveg@csvax.cs.caltech.edu, ...!cit-vax!daveg
daveg@near.cs.caltech.edu (Dave Gillespie) (08/31/90)
> = Joe Petolino, >> = Herman Rubin, >>> = Me >>>You could provide two variants of the load/store instructions, one set >>>that trap on unaligned accesses and one set that don't (but are possibly >>>much slower). > Although you don't say so, I assume that the slower variants would actually > give the right results. Remember that there are architectures out there > that neither trap nor work correctly when presented with unaligned addresses. Yes, I meant that one variety would trap, the other would do the extra work to do an unaligned access correctly. >>> This latter instruction replaces the specialized >>>"bit-field" instructions that some machines have now. Compilers could >>>have an option to generate only the slow-but-safe instructions for the >>>benefit of fast-but-reckless programmers. >>Why much slower? At most two items would have to be loaded, and a shift >>made. > The main reason not to allow unaligned accesses is that supporting > unaligned accesses greatly increases the *complexity* of the memory system > in a machine with virtual addressing and caches. Speed is probably not an > issue - in the implementations I've seen (various IBM 370 implementations), > you only pay a speed penalty if you actually use an unaliged address. > If you did implement instructions that work with unaligned addresses, there > would be little reason to also implement the trap-on-unaligned variants - > the complexity has to be there anyway. As an example of the kind of > complexity that arises with support for unaligned accesses, imagine a > store that spans a virtual page boundary, and only one of the pages is > write-protected. I thought perhaps the extra hardware to handle unaligned bit addresses might be unpleasant; you would have to shift, e.g., 64 possible ways, so you would probably want to use the ALU's shifter for this rather than a special shifter in the bus interface. So I envisioned one set of instructions that are slow because they need the ALU to be available, and one that can use a maximally-simple bus interface directly and can run simultaneously with ALU instructions. Of course, a clever processor could have a single load instruction that computed the address, tested if it was aligned, and, if not, only then went on the requisition the ALU. Then you would need only just type of instruction, but it would make the hardware more complicated. It would now be data-dependent whether a load instruction and an ALU instruction could be allowed to run simultaneously. Maybe barrel shifters are fast and easy enough these days that this isn't a big deal. But my guess is that it still is. -- Dave -- Dave Gillespie 256-80 Caltech Pasadena CA USA 91125 daveg@csvax.cs.caltech.edu, ...!cit-vax!daveg
chip@tct.uucp (Chip Salzenberg) (08/31/90)
According to cik@l.cc.purdue.edu (Herman Rubin): >We are discussing machines where the memory is sufficiently large that >32-bit addressing is inadequate. Not quite. We are discussing machines where the *address space* is too large for 32 bits. For widely-used machines, physical memory will usually be less than four gigabytes, and thus 32-bit addressable. >Four bits is 1/16 of an address. I would not so blithely throw away 15/16 of my address space. >Again, we are discussing 64-bit address machines. "Wasting" even 8 bits >in addressing is only 1/8 of the length of an address. I most certainly would not throw away 255/256 of my address space. -- Chip Salzenberg at Teltronics/TCT <chip@tct.uucp>, <uunet!pdn!tct!chip>
colin@array.UUCP (Colin Plumb) (08/31/90)
In article <3318@awdprime.UUCP> tif@reed.UUCP (Paul Chamberlain/32767) writes: > Now there's an interesting thought. Why not order these 64 bits so that > the 3 on the left are the bit offset. Mere mortals would use it just like > a byte addressed machine but wizards could use those 3 bits anyway they like. Because it's unnecessary. Where is some code that depends on the fact that incrementing a character pointer adds 1 to the value of its bit pattern, if read as an integer? I'm sure there is some, buried somewhere, but for the most part, the only code that does such pointer diddling in in C and C hides the representation of pointers sufficiently, unless you really try. If you're extra paranoid, define the cast from a pointer to a long and back to do a shift by 3 bits; then you only have to worry about unions. -- -Colin
dave@fps.com (Dave Smith) (09/01/90)
In article <3318@awdprime.UUCP> tif@reed.UUCP (Paul Chamberlain/32767) writes: >Now there's an interesting thought. Why not order these 64 bits so that >the 3 on the left are the bit offset. Mere mortals would use it just like >a byte addressed machine but wizards could use those 3 bits anyway they like. That's a sick idea, right up there with the way that Intel arranged the bits in the segment descriptors for the '286/'386. You can't increment your bit address and walk right through memory, you've got to do all kinds of funky addressing. This bit addressing stuff is all fine and well, but it seems like a lot of extra baggage to carry around for a little bit-twiddling every now and then. Why not just have operations that will return the nth bit? -- David L. Smith FPS Computing, San Diego ucsd!celerity!dave or dave@fps.com
throopw@sheol.UUCP (Wayne Throop) (09/02/90)
> From: mash@mips.COM (John Mashey) > However, it is still clear that bit-addressing would break more programs > than byte-addressing, so all I wanted was for someone to work through > a complete example to show why it would be A Good Thing on balance. Looking at it from another perspective, I think it certain that many, many MORE programs (and mostly the same sloppy programs, BTW, IMHO) will be broken by making pointers 64 bits long, whether or not integers are also made 64 bits long. Further, programs that count on pointer formats being byte granular are already not maximally portable, since many machines are still word addressed (though, granted, this is becomeing quite rare). This in mind, the backwards compatibility issue is a small one, IMHO. Of course, the complete example showing the net effects is a good idea. > From: daveg@near.cs.caltech.edu (Dave Gillespie) > It's much easier to be little-endian than big-endian if you use bit > addresses. In bit-addressing, little-endian translates to "LSB is bit > zero", and big-endian translates to "LSB is bit 31 (or 15, or 7, or > 63, or whatever)". Well, OK, but why go after the least significant bit anyways? Isn't it more likely to be needing the MSB (at least, for many operations)? Or, to put it another way, I think the situation is still balanced endian-wise on a bit-addressed machine. Some operations are easier on little-endian machines, others on big-endian. Certainly as a practical matter, a bit-addressed machine can quite easily be big-endian: the bit-addressed machine I'm most familiar with was big-endian for example, and this didn't interfere with the addressing granularity one whit. > From: chip@tct.uucp (Chip Salzenberg) > I would not so blithely throw away 15/16 of my address space. Why not? You are (or at least the industry at large is) willing to throw away 3/4 of it so that addresses are byte-granular despite the fact that so many loads/stores are 32-bit aligned. What's that I hear from the cheap seats? It's important to be able to reference characters, since so many things are text-oriented? Really? So why isn't it important to indicate pixels, bitmaps, positions in huffman (and other bit granular) code strings, page table property bits, boolean arrays, and any of the other zillions and zillions of naturally-bit-granular things that abound in the software written every day? I still claim that trading address space for finer granularity is a Good Thing (even a Very Good Thing), IF we are talking about at-least-64 bit addresses. (I'd even argue it for 32 bit addresses, but even I have to admit it's dodgier there.) > From: daveg@near.cs.caltech.edu (Dave Gillespie) > The 68000 puts a short branch in a 16-bit > instruction. Eight bits of that instruction encode the displacement. > Using a naive encoding, you are wasting four of eight bits, *NOT* > four of 64. From the rest of Dave's posting, it's not clear to me that he's really objecting to bit-granular addresses on the above grounds. (In fact, it seems not to be the case.) But let me respond as to a hypothetical person who *does* think this is an objection. I hope Dave takes no offense. The point is, why use a naive encoding? Presumably a RISC would insist on word-of-some-size aligned istreams, so word-granular offsets in instructions would remain, just as today. Similarly for offsets and partial addresses in instructions that manipulate other aligned things. Sure, let the people who want to operate on entities of bit granularity pay for the privilege. But there's no reason to require them to tie their hands behind their backs and stand on their heads and spin hula-hoops on their feet while they type with their noses either. -- Wayne Throop <backbone>!mcnc!rti!sheol!throopw or sheol!throopw@rti.rti.org
daveg@near.cs.caltech.edu (Dave Gillespie) (09/02/90)
>>>>> On 2 Sep 90 03:27:13 GMT, throopw@sheol.UUCP (Wayne Throop) said: > Looking at it from another perspective, I think it certain that many, > many MORE programs (and mostly the same sloppy programs, BTW, IMHO) will > be broken by making pointers 64 bits long, whether or not integers are > also made 64 bits long. Good point! > From: daveg@near.cs.caltech.edu (Dave Gillespie) >> The 68000 puts a short branch in a 16-bit >> instruction. Eight bits of that instruction encode the displacement. >> Using a naive encoding, you are wasting four of eight bits, *NOT* >> four of 64. > From the rest of Dave's posting, it's not clear to me that he's really > objecting to bit-granular addresses on the above grounds. (In fact, it > seems not to be the case.) But let me respond as to a hypothetical > person who *does* think this is an objection. I hope Dave takes no > offense. No offense taken---you got my point exactly. Many existing architectures put byte offsets into the instruction word even though alignment constraints mean some of the bits of the offset will always be zero. Why not omit the zeros and put those bits on the left where they can do some good! If it's only one or two bits it's not such a big deal, but a bit-addressed machine would have to take the wastage of, say, 4-6 bits much more seriously. Some machines do this now for branch offsets, but I don't know of any that do it for all types of offsets. A couple other people have implied that 64-bit machines would have 64-bit instruction words, with plenty of bits to spare. But I really doubt this will happen: The temptation to use that wide bus to fetch two 32-bit instructions in one gulp is probably too great. Even 32-bit instruction sizes are controversial. > Sure, let the people who want to operate on entities of bit granularity > pay for the privilege. But there's no reason to require them to tie > their hands behind their backs and stand on their heads and spin > hula-hoops on their feet while they type with their noses either. Um, er, my sentiments exactly... -- Dave -- Dave Gillespie 256-80 Caltech Pasadena CA USA 91125 daveg@csvax.cs.caltech.edu, ...!cit-vax!daveg
cik@l.cc.purdue.edu (Herman Rubin) (09/02/90)
In article <11192@celit.fps.com>, dave@fps.com (Dave Smith) writes: > In article <3318@awdprime.UUCP> tif@reed.UUCP (Paul Chamberlain/32767) writes: ..................... > This bit addressing stuff is all fine and well, but it seems like a lot of > extra baggage to carry around for a little bit-twiddling every now and then. > Why not just have operations that will return the nth bit? But what if k bits, starting from the n-th bit, are wanted? This is not all that uncommon, and may be needed frequently, including supercomputers. -- Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907 Phone: (317)494-6054 hrubin@l.cc.purdue.edu (Internet, bitnet) {purdue,pur-ee}!l.cc!cik(UUCP)
dave@fps.com (Dave Smith) (09/05/90)
In article <2504@l.cc.purdue.edu> cik@l.cc.purdue.edu (Herman Rubin) writes: >But what if k bits, starting from the n-th bit, are wanted? This is not >all that uncommon, and may be needed frequently, including supercomputers. Shift to the left, shift to the right do some or's. What you're asking for makes the hardware more complex and therefore SLOWER! I'm not a hardware designer but I listen to them whine and moan all the time about these kinds of things, so let me put forth what I think would be needed to do that (y'all can flame me if you like) Ok, here's some ways to give k bits starting at the n-th bit: a) A bit-addressable memory, n (call it 32) bits wide. So, it has a lot of 1 bit wide memory cells. Any one of these bits can be bit 0 thru n on the output bus. How do you hook the memories to the bus? Each cell to each line of the bus? Ick! Lots of circuitry and lots of (overlapping!) traces on your board. A cross-bar circuit would work but those are expensive. We could load the memories into a barrel shifter and then shift and then do another load. b) Do the work on the processor end. This pretty much means you've gotta do shifts since your memory will be giving you word aligned data. No crossbars here. You'll have to do two memory fetches on non-word aligned fetches. c) 1 bit data path. This is silly, but it does supply the wanted functionality. Suppose you go ahead and do the circuitry for this. This means this circuitry is in the critical path for data fetches, ALWAYS. Even when it's not used, it's another layer of circuitry. When you're working with faster machines, nano-seconds count. Most of the time you'll be working with 32-bit word, or even double-word aligned data. So why pay the extra penalty for this circuitry on _every_ data fetch? All you're doing is hiding the bus from the programmer. Let the compiler do that for you (sorry, Herman, I know how you feel about HLL's). Byte addressing is bad enough and a lot of people probably wouldn't support if it weren't for the fact that a lot of I/O devices require you to be able to write only one byte to a certain address and nothing more. Communications protocols that are byte-aligned make processing them go slower and don't even start me off on bit-stuffed formats (thank YOU, ISO for ASN.1!). -- David L. Smith FPS Computing, San Diego ucsd!celerity!dave or dave@fps.com
pauls@apple.com (Paul Sweazey) (09/06/90)
Whole issue of bit/byte/etc. address pointers reminds me alot of the big-endian vs. little-endian wars. Most (but not all) people argue for one or the other based on what they think is the "natural" structure. The idea of bit-address pointers was brought up as an architectural issue because the bit is the only non-arbitrary basic quantum of data in binary logic. This is a great academic excercise. I would also add data types of any bit length. For such a machine I want to know what kinds of hardware and software systems we might end up with. It is obvious to most that todays computing systems would not get faster by having the hardware support bit-level addressing, much less an infinite number of data types. But there are entire classes of systems that don't get explored (or even conceived of) because they don't map well into the architectural constraints of computing systems today . I am mildly surprised that brainstorms about revolutionary architectures are rare in comp.arch. We could be just a brainstorm away from the next killer-micro-wave. Paul Sweazey pauls@apple.com 408-974-0253
crowl@cs.rochester.edu (Lawrence Crowl) (09/07/90)
In article <26DE7EE3.58FC@tct.uucp> chip@tct.uucp (Chip Salzenberg) writes: >We are discussing machines where the *address space* is too large for 32 >bits. For widely-used machines, physical memory will usually be less than >four gigabytes, and thus 32-bit addressable. A bit granularity in virtual addresses does not imply a bit granularity in physical addresses. For instance, we could have a bit-grain 64-bit virtual address and a word-grain 32-bit physical address. Before anyone jumps on me for such a bizzare notion, note that some microprocessors do not provide the low bit (or two) of the physical address on the pinout. (Although some have byte select pins, from which you can infer the address.) -- Lawrence Crowl 716-275-9499 University of Rochester crowl@cs.rochester.edu Computer Science Department ...!{ames,rutgers}!rochester!crowl Rochester, New York, 14627
mshute@cs.man.ac.uk (Malcolm Shute) (09/07/90)
I've come in on the middle of this discussion, so forgive me for any misconceptions. Can I first check that I understand some of the background to this correctly. Is the following correct: 1) People want to move on from 32-bit addressing (either for genuine reasons of running out of address space or 'fashion' ones of seeing 32-bit as passe'). 2) A single leap to 64-bit seems rather drastic, but there are good engineering and fashion reasons for not going to an intermediate non-power-of-two. 3) Bit addressing would be one method of disguising the introduction of a 58-bit machine as a power-of-two one whilst at the same time moving some useful bit-twiddling hardware into the data-fetch logic instead of in the occasional ALU instruction logic. > In article <3318@awdprime.UUCP> tif@reed.UUCP (Paul Chamberlain/32767) writes: > > Now there's an interesting thought. Why not order these 64 bits so that > > the 3 on the left are the bit offset. [...] In article <643@array.UUCP> colin@array.UUCP (Colin Plumb) writes: > [It need not be axiomatic that > incrementing a character pointer adds 1 to the value of its bit pattern, if > read as an integer] Moreover, can I comment on the obvious: integer data are just special cases of fixed-point. It is simple enough to think of this 64-bit machine as being a fixed-point, word-addressed one (with 6 binary places of fraction to indicate fractions of a word). Let's use octal to represent an example of such a quantity: 12345671234567123456.71 (all digits represent 3-bits, except the most sig one). To 'increment' this number correctly requires some knowledge of what we think 'increment' means. Thus we would add 1.00 to it in order to increment to the same position within the next word, or add 0.10 to it in order to increment to the same position within the next byte, or add 0.01 to it to point to the next bit. It is also necessary to allow for different data types being used throughout the memory. A table of string pointers might be 'compacted' to a fixed-point format with only 3 binary places of fraction. A jump table, or a block of integer variables might be compacted to 0-binary-place fixed-point (i.e. integer); the offset on a branch instruction would make sensible use of this compaction too. Lastly, it is necessary to determine how to represent the result of the increment. An increment to a 6-bp (binary-place) fixed-point pointer would presumably need to deliver a 6-bp result to the address hardware. An increment to a 0-bp number (an integer) would need to deliver a 0-bp result. And an increment to a 0 or 3-bp compacted pointer in a table would need to return a 0, 3 or 6-bp result, depending on context. So, our 'increment' operation needs to ask three questions: 1) By how much should I increment? 2) What is the data-type of the object I am incrementing? 3) What is the data-type of the result I must deliver? I don't think that I have said anything in this article which has not already been stated or implied. However, I wanted to spell it out on paper, just to check in my mind that I understand thus far. -- Malcolm SHUTE. (The AM Mollusc: v_@_ ) Disclaimer: all
throopw@sheol.UUCP (Wayne Throop) (09/10/90)
> From: pauls@apple.com (Paul Sweazey) > It is obvious to most that todays computing systems would not get faster > by having the hardware support bit-level addressing, much less an infinite > number of data types. I slightly disagree. I think there are a largeish class of things that would get faster just by having the architectural address format be bit granular, thus avoiding hauling around and paying for loading and storing pointer-plus-bit-descriptor instead of just pointers. Such as packed subrange integer types or boolean types, as might be used in compression schemes, in symbol tables, in graphics applications, in hardware descriptors, and so on and on. Note well, I'm not necessarily supposing that the memory system per se needs to support bit granular access. Just that the standard pointer format ought to be able to indicate any bit in the virtual address space. Because otherwise, bit entities must live in a restricted part of the address space, or it becomes much more costly than necessary to represent bit granular or bit aligned entities. I claim that such entities are artificially less common than they "ought" to be, simply because putting them into byte granular and byte aligned containers, while wasting memory (sometimes a significant amount of memory), is currently more convenient. I therefore think there is a significant class of algorithms which would indeed get faster... perhaps a couple of tens of percent faster, perhaps even a couple of times faster. And along with that is a significant class of algorighms which would consume significantly less memory. -- Wayne Throop <backbone>!mcnc!rti!sheol!throopw or sheol!throopw@rti.rti.org
ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe) (09/10/90)
> From: pauls@apple.com (Paul Sweazey) > It is obvious to most that today's computing systems would not get faster > by having the hardware support bit-level addressing, ... I'll agree with that, but not for Sweazey's reason. Computing systems wouldn't get faster if bit-level addressing were supported because some of them (such as the NS32?32) *already* support bit-level addressing. The NS32?32 series has CBIT[I] offset, base (base).[offset:1] := 0 IBIT offset, base (base).[offset:1] ^= 1 SBIT[I] offset, base (base).[offset:1] := 1 TBIT offset, base (base).[offset:1] ^= 0 all of which first do F := (base).[offset:1] Here the base can be a register (in which case offset is 0..31) or a memory address (in which case offset can be anything). There is also CVTP offset, base, dest dest := (&base)*8 + offset which converts a byte address and a bit offset to a bit address. Thus: struct { int a:2, b:1; } x; int (*p):1 = &(x.b); *p = 1; would turn into ; x at -4(FP), p at -8(FP) CVTP 2, -4(FP), -8(FP) ; p := &(x.b) SBITD -8(FP), @0 ; *p := 1 I haven't an M68020 manual handy, but I believe the bit-field instructions there could be used similarly. (The 32?32 also has bit-field instructions.) This might suggest a compromise position: a machine with 64-bit data paths and ALU, 32-bit *word* addresses, but load/store 1/8/16/32 instructions which combined a word address with a 32-bit bit offset. A 32-bit word address would let you address 32Gb. -- Psychiatry is not about cure. Psychiatry is about power.