ksg@houxl.UUCP (K.GRANT) (10/03/84)
I'm interested in the byte alignment problem which exists in 32 bit machines. If a processor is given an address for a 32 bit operand (hereafter called a word) which is not aligned on a word boundary it can either: 1) fault 2) fetch the word in two accesses. (Are there memories which support arbitrarily aligned accesses to bytes, two byte quantities, and four byte quantities? If not, why not?) Do most processors support the first or second option? If a processor chooses the second option, how does it handle memory faults between accesses? How does it support restartability of that instruction? What other problems have I omitted? Does the logic design become unbearable? Thanks, houxl!ksg
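A minimal sketch of option 2 from the question above: a misaligned 32-bit load carried out as two aligned accesses and a splice. The names `load_aligned_word` and `load_word`, the big-endian byte order, and the flat `bytes` memory are assumptions made for illustration, not a description of any particular processor.

```python
WORD_BYTES = 4

def load_aligned_word(memory: bytes, word_index: int) -> bytes:
    """Model one aligned memory access: return the 4 bytes of one word."""
    start = word_index * WORD_BYTES
    return memory[start:start + WORD_BYTES]

def load_word(memory: bytes, address: int):
    """Load a big-endian 32-bit value from any byte address.

    Returns (value, number_of_aligned_accesses).
    """
    word_index, offset = divmod(address, WORD_BYTES)
    first = load_aligned_word(memory, word_index)
    if offset == 0:
        return int.from_bytes(first, "big"), 1
    # Misaligned: a second access is needed, and this is the point where a
    # real machine could take a memory fault with the instruction half done.
    second = load_aligned_word(memory, word_index + 1)
    spliced = first[offset:] + second[:offset]
    return int.from_bytes(spliced, "big"), 2

memory = bytes(range(16))
print(load_word(memory, 4))  # (67438087, 1), i.e. 0x04050607 in one access
print(load_word(memory, 5))  # (84281096, 2), i.e. 0x05060708 in two accesses
```

In this model nothing is written until both halves have arrived, so a fault on the second access could simply restart the whole load. Whether real hardware can be that simple depends on the instruction's other side effects.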
jackson@uiucdcsb.UUCP (10/05/84)
If I remember correctly, most IBM machines use the 1st option.
bcase@uiucdcs.UUCP (10/05/84)
The main reason most memories don't support all kinds of arbitrary alignments is that the critical path to memory is lengthened. Thus, even memory references which do not require the alignment hardware pay the price and are slowed down. And sometimes the hardware itself can be more expensive than the memory chips (in small memories). bcase
cem@intelca.UUCP (Chuck McManis) (10/06/84)
One of the nicer things the DEC-10s and 20s had was something called a byte pointer. In this context a byte was any arbitrary grouping of bits, up to 16 I believe, though it may have gone up to 36. A string of "bytes" had to start on a 36-bit word boundary, but you could have very long strings (i.e. all of a text file could be considered a string of 7-bit bytes). There were several instructions for using bytes, notably LDB and DPB, for Load from byte pointer and Deposit to byte pointer. Both opcodes took an accumulator and the effective address of the byte pointer, and transferred the byte between the lower (rightmost) bits of the accumulator and memory. They also took care of odd bits (i.e. five 7-bit ASCII bytes would fit into a 36-bit word with one bit, the lsb, left over). There were also autoincrement and autodecrement modes (this is DEC, right? :-)), which were quite convenient for manipulating things smaller than a word. Memory was always accessed as a 36-bit word, and the extraction was done in microcode, I am pretty sure.

I sure wish some of today's processors were so talented and didn't need such archaic things as byte and word alignment with bytes fixed at 8 bits.

--Chuck
--
-- Chuck                      - - - D I S C L A I M E R - - -
{ihnp4,fortune}!dual\         All opinions expressed herein are my
{proper,idi}-> !intelca!cem   own and not those of my employer, my
{ucbvax,hao}!hplabs/          friends, or my avocado plant. :-}
ARPAnet : "hplabs!intelca!cem"@Berkeley
guy@rlgvax.UUCP (Guy Harris) (10/07/84)
> One of the nicer things the DEC-10s and 20s had was something called a
> byte pointer.... Memory was always accessed as a 36 bit word
> and the extraction was done in microcode I am pretty sure. I sure
> wish some of today's processors were so talented and didn't need
> such archaic things such as byte, and word alignment with bytes
> fixed at 8 bits.

On the original KA10, a quick look at the timing would indicate that an LDB or DPB instruction did its work via the old trick of "shift and mask"; the timings depended on how many bits you had to shift the word right or left. (The KA10 wasn't microcoded, and I don't think the KI10 was, either; I don't remember whether the KL10 timings implied it was done by shifting or not.) You need a barrel shifter or somesuch to make it doable in a fixed number of cycles.

Guy Harris {seismo,ihnp4,allegra}!rlgvax!guy
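The "shift and mask" trick can be sketched as follows. This is an illustration only: the function names `ldb`, `dpb`, and `pack_ascii` are made up for the example, the word is modeled as a Python integer, and `position` is taken to mean the number of bits to the right of the byte within the 36-bit word.

```python
WORD_BITS = 36

def ldb(word: int, position: int, size: int) -> int:
    """Load byte: shift the field down to the right end, then mask it."""
    return (word >> position) & ((1 << size) - 1)

def dpb(word: int, position: int, size: int, value: int) -> int:
    """Deposit byte: clear the field in the word, then OR in the new value."""
    mask = ((1 << size) - 1) << position
    return (word & ~mask) | ((value << position) & mask)

def pack_ascii(text: str) -> int:
    """Pack up to five 7-bit characters into one 36-bit word, left to right."""
    word = 0
    position = WORD_BITS          # start just past the left edge of the word
    for ch in text[:5]:
        position -= 7             # step to the next byte before depositing
        word = dpb(word, position, 7, ord(ch))
    return word

word = pack_ascii("HELLO")
chars = [chr(ldb(word, p, 7)) for p in (29, 22, 15, 8, 1)]
print("".join(chars))   # HELLO
print(word & 1)         # 0: the low-order bit is the one left over
```

Python's `>>` shifts by any amount in one step, which corresponds to having the barrel shifter; a machine without one would loop a bit at a time, which is what makes the timing depend on the shift distance.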
johnl@godot.UUCP (10/09/84)
There seem to have been four stages of byte addressing philosophy.

1. Prehistoric: Machines like the 1620 and Z80, which were addressed a digit at a time and built that way. No alignment constraints, since there were no performance implications thereof.

2. Early, such as IBM System 360 and the PDP-11: Byte addressed but word-implemented. Objects must be aligned on "natural" boundaries, i.e. multiples of their own size, and you get a program fault if they're not. Sometimes software caught the faults and made it appear that arbitrary alignment was possible, although very slowly.

3. Decadent, such as IBM 370 and Vax: Assembler programmers complained about having to align stuff, so the misalignment was handled in microcode. There's still a penalty for misalignment, but it's not so bad.

4. Post-modern, such as Pyramid 90X, Berkeley RISC, and Stanford MIPS: Hardware and software designers start to talk to each other, and find that a) teaching compilers to deal with alignment isn't that hard, and b) if you do so, you buy back a lot of performance.

There have also been strange intermediate stages, such as at least one post-modern machine that enforces alignment by ignoring the low-order bits of the address.

I suppose it would be possible to have fiendishly clever memory designs where adjacent words were always in different memory banks so you could cycle both at the same time. Sounds pretty awful, though, since you have to determine for each memory reference how many memories to cycle and how to splice the parts together. As far as I can tell, it's never been seriously proposed for implementation, except perhaps incidentally in very large cached architectures such as the IBM 308X.

John Levine, ima!johnl
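To make "teaching compilers to deal with alignment" concrete, here is a rough sketch of natural-alignment layout, where every object lands on a multiple of its own size. The helper names `align_up` and `layout` and the example record are hypothetical, and sizes are assumed to be powers of two.

```python
def align_up(offset: int, alignment: int) -> int:
    """Round offset up to the next multiple of alignment (a power of two)."""
    return (offset + alignment - 1) & ~(alignment - 1)

def layout(fields):
    """Give each (name, size) field an offset that is a multiple of its size."""
    offsets = {}
    offset = 0
    largest = 1
    for name, size in fields:
        offset = align_up(offset, size)   # insert padding if needed
        offsets[name] = offset
        offset += size
        largest = max(largest, size)
    # Pad the total so that an array of these records stays aligned too.
    return offsets, align_up(offset, largest)

offsets, total = layout([("flag", 1), ("count", 4), ("code", 2), ("tag", 1)])
print(offsets)  # {'flag': 0, 'count': 4, 'code': 8, 'tag': 10}
print(total)    # 12
```

The cost is the padding (4 of the 12 bytes here); the gain is that every access is a single aligned one.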
crandell@ut-sally.UUCP (Jim Crandell) (10/09/84)
There is in fact a rather obvious solution to the alignment/performance issue (I cannot bring myself to call it a problem) which satisfies everyone except the members of a group I shall refer to as X. Assuming eight-bit bytes and 32-bit words, the method involves four independent byte-wide* memories (two LSBs for bank select), a +0:+1:+2:+3 circuit, and eight 4-by-4 crossbar switches. The exact details of implementation, the reasons for its unpopularity, and the identity of X are left as rather trivial exercises for the reader. * Mostek legal department please note hyphen and lack of capitals. -- Jim Crandell, C. S. Dept., The University of Texas at Austin {ihnp4,seismo,ctvax}!ut-sally!crandell
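One possible reading of this scheme, sketched in software. The class `BankedMemory` and its methods are invented for the example. The "+0:+1:+2:+3 circuit" is interpreted here as generating the four byte addresses, and the crossbar as a rotation of the bank outputs into byte lanes; both interpretations are assumptions rather than a worked-out hardware design.

```python
NUM_BANKS = 4

class BankedMemory:
    def __init__(self, rows: int):
        # banks[b][r] holds the byte at address r * 4 + b
        self.banks = [bytearray(rows) for _ in range(NUM_BANKS)]

    def store_byte(self, address: int, value: int) -> None:
        self.banks[address % NUM_BANKS][address // NUM_BANKS] = value

    def load_word(self, address: int) -> bytes:
        """Read 4 bytes at any byte address with one access per bank."""
        row, first_bank = divmod(address, NUM_BANKS)
        # Address generation: banks below first_bank hold bytes that have
        # wrapped into the next row, so they are given row + 1.
        rows = [row + 1 if bank < first_bank else row
                for bank in range(NUM_BANKS)]
        # All four banks cycle at once, each with its own row address.
        outputs = [self.banks[bank][rows[bank]] for bank in range(NUM_BANKS)]
        # Crossbar: byte lane i takes the output of bank (first_bank + i) mod 4.
        return bytes(outputs[(first_bank + i) % NUM_BANKS]
                     for i in range(NUM_BANKS))

mem = BankedMemory(rows=4)
for a in range(16):
    mem.store_byte(a, a)
print(mem.load_word(6).hex())  # 06070809
```

Every load, aligned or not, goes through the row selection and the rotation, which is the lengthened critical path mentioned earlier in the thread.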
bob@anwar.UUCP (Bob Erickson) (10/10/84)
A note about the Pyramid 90x. After further discussion between software and hardware engineers, it was determined that it wouldn't be very difficult or expensive (speedwise) to implement almost arbitrary byte alignment (e.g. longwords accessed on any even address) in the microcode. Pyramid will be offering this microcode change to its customers real soon now.

I think this implies that arbitrary byte alignment does not necessarily imply a performance penalty in the global throughput of a machine.

I know compilers can be taught to do alignment, but many programmers using C's simple address arithmetic mechanisms can't be. :-)

-- ========================================================== Be Company: HHB-Softron 1000 Wyckoff Ave. Mahwah NJ 07430 201-848-8000 UUCP address: {ihnp4,decvax,allegra}!philabs!hhb!bob
bprice@bmcg.UUCP (10/10/84)
In article <ima.426> John Levine, ima!johnl writes:
>I suppose it would be possible to have fiendishly clever memory designs
>where adjacent words were always in different memory banks so you could
>cycle both at the same time. Sounds pretty awful, though, since you have
>to determine for each memory reference how many memories to cycle and how
>to splice the parts together. As far as I can tell, it's never been
>seriously proposed for implementation, except perhaps incidentally in very
>large cached architectures such as the IBM 308X.

Indeed, it has been done. The Burroughs B1700-B1800-B1900 series had a memory of two banks, interleaved by word. If the word containing the first address (say, word n) was in bank A, it was fetched. In the same cycle, bank B was accessed with address n or n+1, as appropriate. As I recall, the words were 32 bits. The processor's data path was 24 bits, so many possibilities arose: all bits in the same word; all bits in two words, but in one processor word; the operand in two memory words, and two processor words;...

Several features of the B1700 architecture were noteworthy. Instead of the bytes that we all know and love, and that we are discussing here, the B1700 addressed bits, and operand sizes were given at several levels: address granularity in the code, "byte" size for string operations, and string length. The processor was microcoded, and several interpreters were provided: COBOL-RPG, Fortran, and MCP (the operating system) were the most popular. Interpreter selection was dynamic; quite often the operating system used a different one than the programs did. The microcode had a dedicated memory, and the size of the microstore was the subject of a hardware option: it ranged from zero to nearly enough. Overflow microcode resided in main memory.

Enough for now. Maybe somebody who knows more about the B1700 could carry on.

--Bill Price
-- --Bill Price uucp: {decvax!ucbvax philabs}!sdcsvax!bmcg!bprice arpa:? sdcsvax!bmcg!bprice@nosc
tom@hcrvx1.UUCP (Tom Kelly) (10/11/84)
> I suppose it would be possible to have fiendishly clever memory designs
> where adjacent words were always in different memory banks so you could
> cycle both at the same time. Sounds pretty awful, though, since you have
> to determine for each memory reference how many memories to cycle and how
> to splice the parts together. As far as I can tell, it's never been
> seriously proposed for implementation, except perhaps incidentally in very
> large cached architectures such as the IBM 308X.

If my memory serves me correctly, the CDC 6600 series had something like this. It was called "Phased Memory"; the configuration we had was described as "40 banks phased". My understanding was that adjacent addresses were in different memory banks, so that memory access could be overlapped.

Of course, the 6600 had no byte addressing. If you wanted a byte, you fetched a word and then used shifts and masks.

Tom Kelly (416) 922-1937 {utzoo, ihnp4, decvax}!hcr!hcrvx1!tom
wayne@bambi.UUCP (Wayne Wilner) (10/13/84)
Thanks to Bill Price for explaining the B1700...B1900 bit-addressability. The one feature that always seems to escape everyone's attention is that variable-length strings were addressed by a triple:

    starting address
    length
    direction

not just the usual address-length pair. By having the "direction" parameter, one could fetch toward higher or toward lower addresses from the starting address. So add to Bill's description that if bank A were given address N, then bank B might be given N, N+1, or N-1.

--mhuxl!thumper!wayne Wilner Bell Communications Research
mjl@ritcv.UUCP (Mike Lutz) (10/14/84)
Ah, the B1700! I've waxed loquacious on this fascinating machine before, but what the hell! It's time for another lecture.

Bill Price discussed its primary distinctive feature already, namely bit addressable memory. Just to fill in a bit ;-): a single micro instruction could read from 1 to 24 bits beginning at any bit address in main memory (and scanning in either direction). The number of bits could be a constant, or determined by the current ALU precision (1-24).

Programmer controlled variable sized operands were supported by another B1700 innovation: if the microinstruction register (MIR) was the destination of a micro-instruction, then the value it was sent was logically or'ed with the next micro-instruction fetched; the resulting bit pattern was the actual microinstruction executed. Thus if one had a bit count in, say, register Y, and wanted to read that number of bits from main memory into register X, the code looked like the following:

    MOVE Y TO MIR
    READ 0 BITS TO X

This technique was used throughout the micro architecture to support jump tables, status bit testing, and variable length shifts & rotates.

The Burroughs designers put a lot of effort into creating a micro architecture for which efficient emulators and interpreters could be rapidly developed. In my case, I was a graduate student on a project investigating microprogramming and emulation, and the B1700 was a godsend to us, as it made it relatively easy to develop and investigate new architectural ideas. It's hard to explain what made the micro architecture so good, but my experience and that of my colleagues was that the pieces fit together well. There always seemed to be a natural, even elegant, solution to microprogramming problems which fit hand in glove with what the hardware provided.

On top of this base, Burroughs constructed emulators for specialized FORTRAN, COBOL, and SDL (a systems programming language) machines. SDL was used for the operating system, compilers, and most of the utilities. As Bill Price mentioned, processes using different architectures could be multiprogrammed, with the appropriate emulator being invoked as part of a context switch. The O.S. was ahead of its time, supporting multiprogramming, virtual memory, and multiple emulators on systems with as little as 48Kbytes of memory and using RK05-like discs.

Enough nostalgia, though I do believe any serious computer architect could learn a lot from the B1700's design.

Mike Lutz

P.S. Wayne Wilner, one of the B1700 designers, used to be on the net. If he still is, maybe he could fill in more of the details that I forgot.

-- Mike Lutz Rochester Institute of Technology, Rochester NY UUCP: {allegra,seismo}!rochester!ritcv!mjl ARPA: ritcv!mjl@Rochester.ARPA
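The MIR trick from this post can be mimicked in a toy model. Everything about the encoding below is hypothetical: the `READ_TO_X` opcode bit, the 5-bit count field, and the `MicroMachine` class are invented for illustration and are not the real B1700 microinstruction format. The sketch only shows the idea of OR-ing a register value into the next fetched microinstruction.

```python
READ_TO_X = 0x100    # hypothetical opcode bit for "READ n BITS TO X"
COUNT_MASK = 0x1F    # hypothetical 5-bit count field in the low bits

class MicroMachine:
    def __init__(self, memory_bits: str):
        self.memory_bits = memory_bits  # main memory as a string of '0'/'1'
        self.bit_address = 0
        self.x = 0
        self.y = 0
        self.pending_or = 0             # value most recently sent to the MIR

    def move_y_to_mir(self) -> None:
        """MOVE Y TO MIR: hold Y to be OR'ed into the next microinstruction."""
        self.pending_or = self.y

    def execute(self, fetched: int) -> None:
        """OR any pending MIR value into the fetched word, then execute it."""
        instruction = fetched | self.pending_or
        self.pending_or = 0
        if instruction & READ_TO_X:
            count = instruction & COUNT_MASK
            start = self.bit_address
            field = self.memory_bits[start:start + count]
            self.x = int(field, 2) if field else 0

m = MicroMachine("1011001110001111")
m.bit_address = 2
m.y = 6
m.move_y_to_mir()
m.execute(READ_TO_X | 0)   # "READ 0 BITS TO X" executes as "READ 6 BITS TO X"
print(bin(m.x))            # 0b110011
```

The same mechanism would cover the jump tables and variable shifts mentioned above, since any field of the next microinstruction can be filled in this way.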
guy@rlgvax.UUCP (Guy Harris) (10/15/84)
> Ah, the B1700! I've waxed loquacious on this fascinating machine
> before, but what the hell! it's time for another lecture.
>
> Bill Price discussed its primary distinctive feature already, namely
> bit addressable memory. Just to fill in a bit ;-): a single micro
> instruction could read from 1 to 24 bits beginning at any bit address
> in main memory (and scanning in either direction). The number of bits
> could be a constant, or determined by the current ALU precision
> (1-24).

Well, on the IBM 7030 (a/k/a STRETCH), a single *macro* instruction (in the sense of non-micro, not in the sense of a macro assembler) could read from 1 to 32 (or possibly even 64) bits beginning at any bit address in memory, although it only scanned in the "forward" direction. The field length was part of the instruction.

> Programmer controlled variable sized operands were supported by another
> B1700 innovation: if the microinstruction register (MIR) was the
> destination of a micro-instruction, then the value it was sent was
> logically or'ed with the next micro-instruction fetched; the resulting
> bit pattern was the actual microinstruction executed. Thus if one had
> a bit count in, say, register Y, and wanted to read that number of bits
> from main memory into register X, the code looked like the following:
>
> MOVE Y TO MIR
> READ 0 BITS TO X

Sounds like the microinstruction equivalent of the IBM 360's EX (execute) instruction, which is used for much the same purpose. The MVC (move character) instruction has the character count in the instruction itself, so if you have the number of characters to be moved in a register you just do an EX and tell it to stuff the count into the instruction being executed. (The 370 introduced an MVCL instruction (move characters long, or was it MVL for "move long"?) which had the addresses and count in registers; it allowed you to move more than 256 characters at a time, and was interruptible so you couldn't lose interrupts by trying to move all of main memory around.)

Guy Harris {seismo,ihnp4,allegra}!rlgvax!guy
ken@turtlevax.UUCP (Ken Turkowski) (10/15/84)
> There is in fact a rather obvious solution to the alignment/performance
> issue (I cannot bring myself to call it a problem) which satisfies
> everyone except the members of a group I shall refer to as X.
> Assuming eight-bit bytes and 32-bit words, the method involves four
> independent byte-wide* memories (two LSBs for bank select), a
> +0:+1:+2:+3 circuit, and eight 4-by-4 crossbar switches. The exact
> details of implementation, the reasons for its unpopularity, and the
> identity of X are left as rather trivial exercises for the reader.

An extension of this technique is used in high-performance raster graphics systems, where it allows access to several horizontally-, vertically-, or block-contiguous pixels. It is called a tessellated frame buffer, and is described in two recent papers, one about a year ago in IEEE Computer Graphics and Applications, the other in this year's SIGGRAPH tutorial on State-of-the-Art in Image Synthesis. If there is any interest, I can dig up the exact references and post them to the net.

-- Ken Turkowski @ CADLINC, Palo Alto, CA UUCP: {amd,decwrl,flairvax,nsc}!turtlevax!ken ARPA: turtlevax!ken@DECWRL.ARPA
arndt@ttds.UUCP (Arndt Jonasson) (10/15/84)
>One of the nicer things the DEC-10s and 20s had was something called a
>byte pointer. In this context a byte was any arbitrary grouping of bits

Had? They are still very much alive, although DEC no longer does any development on them (or so I have heard).

A PDP-10 byte pointer can access any contiguous string of bits that lies within a 36-bit word. It doesn't have to start on a word boundary. The usual application of byte pointers is handling 7-bit bytes, i.e. characters, but others are used as well. In the source code for ITS TECO (the base system for the *real* Emacs), I recall that 36-bit pointers are used for some purpose.

The autoincrementing versions of LDB and DPB (ILDB and IDPB) increment the pointer before accessing the byte. There is also an IBP (increment byte pointer). (Actually ILDB and IDPB are just IBP followed by LDB and DPB.) There are no autodecrementing modes, but later versions of the PDP-10 have an ADJBP (adjust byte pointer) which adjusts a byte pointer by an arbitrary amount (forwards or backwards). ADJBP doesn't fit in very well with the others, though. It seems like a piece of happy hacking in the microcode.

{decvax,philabs}!mcvax!enea!ttds!arndt
ken@turtlevax.UUCP (Ken Turkowski) (10/15/84)
High performance raster graphics hardware uses frame buffer tessellation to gain access to multiple pixels on arbitrary two-dimensional boundaries. These pixels may be horizontally-, vertically-, or block-contiguous. Some references are:

Rodney Stock, "Graphics Animation Hardware", notes for the SIGGRAPH 1983 State-of-the-Art in Image Synthesis tutorial.

Thomas Porter & Rodney Stock, "Image Composition", notes for the SIGGRAPH 1984 State-of-the-Art in Image Synthesis tutorial.

Mary Whitton, "Memory Design for Raster Graphics Displays", IEEE Computer Graphics and Applications, March 1984, vol. 4, no. 3, pp. 48-64.

The R. Stock papers may be difficult to get ahold of, because they have only been distributed as tutorial notes, although you may be able to write Tom Porter for any papers he has on frame buffer tessellation at:

    Thomas Porter
    Lucasfilm, Ltd.
    Computer Graphics Division
    P.O. Box 2009
    San Rafael, CA 94912

The M. Whitton paper is more comprehensive, so much so that it gives away the secrets of frame buffer design so that anyone can design a good one.

-- Ken Turkowski @ CADLINC, Palo Alto, CA UUCP: {amd,decwrl,flairvax,nsc}!turtlevax!ken ARPA: turtlevax!ken@DECWRL.ARPA
gnu@sun.uucp (John Gilmore) (10/16/84)
> I sure
> wish some of today's processors were so talented and didn't need
> such archaic things such as byte, and word alignment with bytes
> fixed at 8 bits.
> --Chuck

I thought people (even at Intel :-) ) would know by now that the 68020 has even better bit-string-manipulation facilities than the DEC 10/20. The 10's required the bit string to fit in a single memory word; the 68020 allows totally arbitrary alignment. The 68020 also takes a byte address as the base, and adds a 32-bit signed bit number; if you don't need the byte address as e.g. an array base, you can still address up to 2Gbits or 256Mbytes with simple sequential bit numbers (easy for hardware since the word size is a power of 2). There are 8 instructions that work with such bitfields: load (signed & unsigned), store, set (to ones), clear, invert, test, and find-first-one-bit.

Note that such instructions have more overhead (both on the 10 and the 68020) than their simpler relations (e.g. load a whole aligned word). There's still room for fixed-size bytes in an architecture -- the 432 proved that.
crandell@ut-sally.UUCP (Jim Crandell) (10/16/84)
> Actually, the 1620 addressed its digits in even/odd pairs and, although
> an address had no restriction to be even or odd, there was a performance
> gain by aligning on a pair boundary. I guess that makes it a decadent
> machine (I know several people who would agree with that estimation).

Almost. Instructions had to start at even addresses, and the performance advantage (which was all of 10 microseconds -- 1 cycle -- on, for example, TF and TR) applied only to the Model II.

-- Jim Crandell, C. S. Dept., The University of Texas at Austin {ihnp4,seismo,ctvax}!ut-sally!crandell
phil@unisoft.UUCP (Phil Ronzone) (10/19/84)
Now that the 7030 has been mentioned ..... My second most favorite machine that-I've-never-programmed is the 1700. My most favorite machine that-I've-never-programmed is the IBM 7030. The 7030 realized as a single-chipper today would be most impressive.

I'm still trying to verify whether (as IBM claims) the 7030 first introduced the word ``byte''. Any comments?
sysad@tikal.UUCP (sysad) (10/20/84)
>>A note about the Pyramid 90x. After further discussion between
>>software and hardware engineers, it was determined that it wouldn't be
>>very difficult or expensive (speedwise) to implement almost arbitrary
>>byte alignment (e.g. longwords accessed on any even address) in the
>>microcode. Pyramid will be offering this microcode change to its
>>customers real soon now.
>>
>>I think this implies that arbitrary byte alignment does not necessarily
>>imply a performance penalty in the global throughput of a machine.
>>
>>I know compilers can be taught to do alignment, but many programmers
>>using C's simple address arithmetic mechanisms, can't be. :-)

I can tell you from recent bitter experience that Pyramid will NOT be offering this microcode change to ANYONE, anytime. It seems that when HHB Softron decided to re-port its code, Pyramid decided byte-alignment was too hard. Apparently they also decided not to tell anyone. Pyramid "software and hardware engineers" also seem to believe that compilers cannot easily be taught to do alignment.

If you are expecting arbitrary byte alignment out of Pyramid, you'd better ask them again.

Duane Hesser ...uw-beaver!tikal!sysad
wls@astrovax.UUCP (William L. Sebok) (10/20/84)
> > I sure
> > wish some of today's processors were so talented and didn't need
> > such archaic things such as byte, and word alignment with bytes
> > fixed at 8 bits.
> > --Chuck
>
> I thought people (even at Intel :-) ) would know by now that the 68020
> has even better bit-string-manipulation facilities than the DEC 10/20.
> The 10's required the bit string to fit in a single memory word; the
> 68020 allows totally arbitrary alignment. The 68020 also takes a byte
> address as the base, and adds a 32-bit signed bit number; if you don't
> need the byte address as e.g. an array base, you can still address up to
> 2Gbits or 256Mbytes with simple sequential bit numbers (easy for
> hardware since the word size is a power of 2). There are 8
> instructions that work with such bitfields: load (signed & unsigned),
> store, set (to ones), clear, invert, test, and find-first-one-bit.

Since no one has mentioned it yet, I think that I should say that the Vax also has such bit string instructions, which let one address a bit string of 1 to 32 bits with arbitrary alignment with respect to word boundaries. A bit address consists of a byte address base and a signed 32-bit offset from the base. Instructions provided are FFS (find first bit set), FFC (find first bit clear), EXTV (extract bit field sign extended), EXTZV (extract bit field zero extended), CMPV (compare sign extended bit field to integer), CMPZV (compare zero extended bit field to integer), and INSV (move integer to bit field).

-- Bill Sebok Princeton University, Astrophysics {allegra,akgua,burl,cbosgd,decvax,ihnp4,noao,princeton,vax135}!astrovax!wls
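A rough software model of the "byte base plus signed bit offset" addressing described above. The function names borrow the Vax mnemonics, but this is only a sketch: it assumes bits are numbered upward from the low-order bit of the base byte with little-endian byte order, and it ignores the register-operand cases and operand checks that a real implementation would have.

```python
def extzv(memory: bytes, base: int, pos: int, size: int) -> int:
    """Extract a zero-extended field of `size` bits (1 to 32).

    Bit `pos` is counted from the low-order bit of the byte at `base`;
    `pos` is signed, so negative offsets reach bytes below the base.
    """
    first_byte = base + (pos >> 3)     # >> floors, so this works for pos < 0
    shift = pos & 7
    nbytes = (shift + size + 7) // 8   # bytes the field touches (at most 5)
    chunk = int.from_bytes(memory[first_byte:first_byte + nbytes], "little")
    return (chunk >> shift) & ((1 << size) - 1)

def extv(memory: bytes, base: int, pos: int, size: int) -> int:
    """Extract the same field, sign-extended from its top bit."""
    value = extzv(memory, base, pos, size)
    sign = 1 << (size - 1)
    return (value ^ sign) - sign

def insv(memory: bytearray, base: int, pos: int, size: int, value: int) -> None:
    """Insert the low `size` bits of value into the field, leaving the rest."""
    first_byte = base + (pos >> 3)
    shift = pos & 7
    nbytes = (shift + size + 7) // 8
    mask = ((1 << size) - 1) << shift
    chunk = int.from_bytes(memory[first_byte:first_byte + nbytes], "little")
    chunk = (chunk & ~mask) | ((value << shift) & mask)
    memory[first_byte:first_byte + nbytes] = chunk.to_bytes(nbytes, "little")

mem = bytearray(8)
insv(mem, 4, -5, 11, 0x5A5)        # 11-bit field starting 5 bits below byte 4
print(hex(extzv(mem, 4, -5, 11)))  # 0x5a5
print(extv(mem, 4, -5, 11))        # -603
```

An unaligned field of up to 32 bits can straddle five bytes, which hints at why these instructions cost more than a plain aligned load.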
greg@sdcsvax.UUCP (Greg Noel) (10/22/84)
In article <393@ism780.UUCP> darryl@ism780.UUCP writes:
>Actually, the 1620 addressed its digits in even/odd pairs and, although
>an address had no restriction to be even or odd, there was a performance
>gain by aligning on a pair boundary. I guess that makes it a decadent
>machine (I know several people who would agree with that estimation).
> --Darryl Richman

But I'm one who would disagree...... It was an interesting machine, and it became my first love. (Now you know what's wrong with me!)

Actually, even though Darryl is correct that the 1620 fetched in even/odd pairs, only instruction fetches were optimized to take advantage of this. (Instructions were twelve digits long and had to be aligned on an even address.) Data fetches still fetched each digit individually. Later in its evolution, the 1620 Mod II was better at optimizing the references; it essentially had a (four-digit?) cache, and it could get the second digit from the same pair in about one-quarter the "normal" access time.

-- -- Greg Noel, NCR Torrey Pines Greg@sdcsvax.UUCP or Greg@nosc.ARPA
jans@mako.UUCP (Jan Steinman) (10/29/84)
In article <astrovax.475> wls@astrovax.UUCP (William L. Sebok) quotes, writes:
>> ...68020 allows totally arbitrary alignment. The 68020 also takes a byte
>> address as the base, and adds a 32-bit signed bit number; if you don't
>> need the byte address as e.g. an array base, you can still address up to
>> 2Gbits or 256Mbytes with simple sequential bit numbers (easy for
>> hardware since the word size is a power of 2).
>
>Since no one has mentioned it yet I think that I should say that the Vax
>also has such bit string instructions that let one address a bit string of 1
>to 32 bits with arbitrary alignment with respect to word boundaries. A bit
>address consists of a byte address base and a signed 32 bit offset from the
>base.

Please add to the list the NS32000 chips. National >almost< did it right... The general form is a base and a signed, 30-bit, bit offset from that base. (Offsets also come in signed 14- and 7-bit lengths for memory conservation.) A useful instruction (CVTP) generates the absolute address for such an item. (Kinda like an LEA for bits.)

Why do I say "almost" did it right? The instruction set is not quite orthogonal when it comes to >bit fields<. While still capable of arbitrary alignment, the field may not span more than four bytes and may not have an immediate operand for offset! One use of bit fields is to break up imposed data structures, such as hardware registers or comm strings. Such data structures seldom have dynamic alignment, and a static, immediate operand offset to the beginning of the bit field would be useful.

FLAME PROOF YOUR ARTICLE TODAY! Before you protest, note that the Extract Field Short (EXTSi) instruction is limited to a 3-bit bit offset, which means you must have previously obtained the byte address of the target bit field. Usable in a pinch, but it sure smells like rubber cement! Incomplete orthogonality: you must use one of two instructions depending on the size of the target.
-- :::::: Jan Steinman Box 1000, MS 61-161 (w)503/685-2843 :::::: :::::: tektronix!tekecs!jans Wilsonville, OR 97070 (h)503/657-7703 ::::::