jhallen@wpi.wpi.edu (Joseph H Allen) (02/10/89)
Reduction of instruction set size/complexity is the main area of design which enhances speed in RISC processors. Another area which I'm wondering about is data size handling. Modern RISC processors handle 8, 16, 32 and 64 bit words. Some even handle data which crosses "word" boundaries (and on some (well, one) the byte order can be changed). The logic that must be dedicated to this must be incredible, plus this logic is in the memory data path and therefore might be a speed constraint (especially if the data goes through the ALU before being presented to the registers). Would it be a terrible hardship to only have two data sizes (perhaps character and word) and not allow words to cross word boundaries? Certainly it would require that people don't use "bad" programming techniques, similar to what has to be done on 68000 or IBM 360. But would not the improvement in speed (by freeing up chip space to allow for more registers or to simply reduce data path delay time) be worth it?
cik@l.cc.purdue.edu (Herman Rubin) (02/10/89)
In article <732@wpi.WPI.EDU>, jhallen@wpi.wpi.edu (Joseph H Allen) writes:
............................
> Would it be a terrible hardship to only have two
> data sizes (perhaps character and word) and not allow words to cross word
> boundaries? Certainly it would require that people don't use "bad"
> programming techniques similar to what has to be done on 68000 or IBM 360.
> But would not the improvement in speed (by freeing up chip space to allow for
> more registers or to simply reduce data path delay time) be worth it?

It would be a real nuisance. For numerical problems, it would be a good idea to have _at least_ 32, 64, and 128, for both fixed point and floating point. There are very good reasons for editing, etc., to have 8 and 16 also. I can also think of good uses for individual bits.

As for crossing word boundaries, this can be very convenient, but is not as important. BTW, what is a word? Is it 16, 32, or 64 bits? And if a word is 32 bits, does a 64-bit quantity have to start on an address divisible by 64?

We can get more registers by using more chips. Data path delay is not likely to be reduced by having fewer types.
--
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907
Phone: (317)494-6054
hrubin@l.cc.purdue.edu (Internet, bitnet, UUCP)
mash@mips.COM (John Mashey) (02/15/89)
In article <732@wpi.WPI.EDU> jhallen@wpi.wpi.edu (Joseph H Allen) writes:
>
>Reduction of instruction set size/complexity is the main area of design which
>enhances speed in RISC processors. Another area which I'm wondering about is
>data size handling. Modern RISC processors handle 8, 16, 32 and 64 bit words.
>Some even handle data which crosses "word" boundaries (and on some (well one)
>the byte order can be changed). The logic that must be dedicated to this must
>be incredible, plus this logic is in the memory data path and therefore might be a
>speed constraint (especially if the data goes through the ALU before being
>presented to the registers). Would it be a terrible hardship to only have two
>data sizes (perhaps character and word) and not allow words to cross word
>boundaries? Certainly it would require that people don't use "bad"
>programming techniques similar to what has to be done on 68000 or IBM 360.
>But would not the improvement in speed (by freeing up chip space to allow for
>more registers or to simply reduce data path delay time) be worth it?

1) Automatic handling of unaligned data is indeed expensive, which is why RISC machines generally omit it.

2) You certainly need word & character operations [to match the statistics of user programs.] If you have to materialize halfword ops, UNIX kernel code will suffer, for three reasons:
   a) There are many densely-encoded structures. Some of those might convert shorts to ints, but that doesn't do anything about:
   b) Networking code has 16-bit things all over the place, and you have NO CHOICE about the sizes, and
   c) When dealing with arbitrary devices, across things like VME buses, you'd better be able to generate indivisible 16-bit loads/stores, or your choice of peripheral controllers will be impacted. Some must be exactly 16-bits to match the semantics of the devices.
Although MOST user programs don't use 16-bit quantities a lot, some do, a lot.
3) Once you have load word, load byte [signed|unsigned], and load half [signed|unsigned], all of which you really want to have, it doesn't take much more logic to do the unaligned operations (as separate instructions, NOT as an automatic thing that happens for unaligned operations).

4) Once you have all of that, it actually takes very little logic to do the byte-ordering swapping: in fact, what really happened was that the alignment network that shuffles bytes around anyway just got more complete. Oddly enough, I don't think it ended up taking any more silicon space, as the width was the same (32 bits), and the height was already forced by other constraints.

5) As usual, most of this has to be determined scientifically, by simulation of the impact of omitting the partial-word instructions. It is interesting that at least {HP, MIPS, Sun, Motorola} all came to the same conclusions on this (include the partial-word load/stores). In our case, we had some heritage of word+byte only (Stanford MIPS); I wouldn't put UNIX on a machine that didn't have 16-bit operations, even though many user-level statistics wouldn't justify their presence.

6) The unaligned load/store operations have proved absolutely invaluable. People may be able to clean up their act on new code, but sometimes they have huge databases that have alignment problems. The unaligned operations turn out to be useful for C strings, COBOL+PL/1, and for porting large FORTRAN programs that have COMMON+EQUIVALENCE combinations that effectively prohibit "correct" alignment, especially if these came from the IBM or DEC worlds...which a few programs do. If you own a 2-million line CAD program, which you didn't write, and which contains code thru which the armies have marched thru the years, you do NOT want to be told that you must rework the program before you can get it to work the very first time.
It's a lot easier to turn on a compiler switch that uses the unaligned instructions, typically losing 10-15% of performance, and either tune it later, or not bother at all, but at least get the application working....

Anyway, it's a good question: it's always good to question why features are included. In this case, there are good reasons.
--
-john mashey DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: {ames,decwrl,prls,pyramid}!mips!mash OR mash@mips.com
DDD: 408-991-0253 or 408-720-1700, x253
USPS: MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086
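To make the post's point about partial-word and unaligned loads concrete, here is a minimal sketch in C of what an alignment network does in software terms: it picks a halfword out of an aligned word, and it assembles an unaligned word from two aligned ones. This is an illustration only, not the MIPS implementation. The memory model (an array of aligned 32-bit words, with byte k of a word in bits 8k..8k+7) and all function names are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical memory model: an array of aligned 32-bit words.
   Assumption: byte k of a word sits in bits 8k..8k+7 (little-endian numbering). */

/* Aligned word load: the byte address must be a multiple of 4. */
static uint32_t load_word_aligned(const uint32_t *mem, size_t byte_addr)
{
    assert(byte_addr % 4 == 0);
    return mem[byte_addr / 4];
}

/* Unsigned halfword load: fetch the containing word, then shift and mask. */
static uint32_t load_half_unsigned(const uint32_t *mem, size_t byte_addr)
{
    assert(byte_addr % 2 == 0);
    uint32_t w = load_word_aligned(mem, byte_addr & ~(size_t)3);
    return (w >> ((byte_addr & 2) * 8)) & 0xFFFFu;
}

/* Signed halfword load: same selection, followed by sign extension. */
static int32_t load_half_signed(const uint32_t *mem, size_t byte_addr)
{
    uint32_t h = load_half_unsigned(mem, byte_addr);
    return (int32_t)(h ^ 0x8000u) - 0x8000;
}

/* Unaligned word load as an explicit, separate operation:
   two aligned loads merged by shifting. */
static uint32_t load_word_unaligned(const uint32_t *mem, size_t byte_addr)
{
    size_t base = byte_addr & ~(size_t)3;
    unsigned shift = (unsigned)(byte_addr & 3) * 8;
    uint32_t lo = load_word_aligned(mem, base);
    if (shift == 0)
        return lo;                       /* already aligned, one load is enough */
    uint32_t hi = load_word_aligned(mem, base + 4);
    return (lo >> shift) | (hi << (32 - shift));
}
```

The unaligned path costs two memory accesses and a merge, which is consistent with keeping it as a separate operation that a compiler switch selects, so that aligned loads do not pay for it.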
rcbaps@eutrc3.UUCP (Pieter Schoenmakers) (02/16/89)
In article <13259@winchester.mips.COM> mash@mips.COM (John Mashey) writes:
>[...] I wouldn't put UNIX on a machine
>that didn't have 16-bit operations, even though many user-level statistics
>wouldn't justify their presence. [...]

Just for your information: it has been done. The Acorn Archimedes R140, which is to be released officially this month, runs Unix BSD on the ARM, a load/store RISC processor supporting only 32-bit operations on registers and having word (32-bit) and signed char (8-bit) load/store operations.

I don't have any benchmarks on the Unix version, but the C compiler I have on my Archimedes warns about the use of shorts (ansi! :), but is _very_ fast for a desktop computer running at a mixture of 4 and 8 MHz (Dhrystone results put it just below an IBM PS2/80).

---Tiggr
mash@mips.COM (John Mashey) (02/20/89)
In article <483@eutrc3.UUCP> rcbaps@eutrc3.UUCP (Pieter Schoenmakers) writes:
>In article <13259@winchester.mips.COM> mash@mips.COM (John Mashey) writes:
>>[...] I wouldn't put UNIX on a machine
>>that didn't have 16-bit operations, even though many user-level statistics
>>wouldn't justify their presence. [...]
>
>Just for your information: it has been done: the Acorn Archimedes R140,
>which is to be released officially this month, runs Unix BSD on the ARM,
>a load/store RISC processor, supporting only 32 bit operations on registers
>and having word (32bit) and signed char (8bit) load/store operations.
> I don't have any benchmarks on the Unix version, but the C compiler I have
>on my Archimedes warns about the use of shorts (ansi! :), but is _very_ fast
>for a desktop computer running at a mixture of 4 and 8 Mhz (Dhrystone results
>put it just below an IBM PS2/80).

Oops, I should have been more specific. Not having anything but 32-bit arithmetic doesn't bother me (or most of the other RISC types), but I care about 16-bit loads and stores both for performance reasons, and for the structural reason of dealing cleanly with 16-bit device registers from arbitrarily-chosen peripheral boards.

UNIX certainly can be put on a machine without 16-bit load/stores, and has been put on far uglier machines, and for some implementations it might well be the least of evils to leave out halfword operations. [Note, of course, that Dhrystone doesn't use halfwords in any significantly noticeable numbers...]

Anyway, I didn't mean to imply that it was impossible to put UNIX on such a machine, merely that *I* wouldn't do it!
--
-john mashey DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: {ames,decwrl,prls,pyramid}!mips!mash OR mash@mips.com
DDD: 408-991-0253 or 408-720-1700, x253
USPS: MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086
andrew@frip.gwd.tek.com (Andrew Klossner) (02/21/89)
[]

	"Would it be a terrible hardship to only have two data sizes
	(perhaps character and word) and not allow words to cross word
	boundaries?"

Data sizes: you need to do atomic 8-bit, 16-bit, and 32-bit loads and stores in order to deal with all the sorts of device registers you might meet.

No crossing boundaries: absolutely. My favorite machines prohibit this. I find it useful for detecting garbaged pointers while debugging. Also, it's quite convenient for the kernel if an instruction can't reference more than one page, which wouldn't be so if a load could refer to a word that starts in one page and ends in another.

On the down side, this breaks a lot of existing code. A pathological case is the Fortran program that uses EQUIVALENCE statements to do its own memory allocation (there being no such facility in the language), and which "knows" that a double can be equivalenced to any integer. This gives rise to double references at addresses that are not multiples of eight bytes. One workaround is to have the Fortran compiler fall back to using two single-word operations to fetch or store a double.

-=- Andrew Klossner (uunet!tektronix!orca!frip!andrew) [UUCP]
    (andrew%frip.wv.tek.com@relay.cs.net) [ARPA]
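The workaround in the last paragraph can be sketched roughly as follows in C. This is a minimal illustration, not what any particular Fortran compiler emits: the storage is modelled as an array of 32-bit words, and the function names are hypothetical. A double that begins at an odd word index sits at an address that is a multiple of four but not of eight, so it is moved with two single-word operations.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Alignment test of the kind a trap handler or debugger might apply. */
static int is_aligned(uintptr_t addr, size_t size)
{
    return addr % size == 0;
}

/* Fetch a double starting at word index i using two single-word loads.
   Works even when i is odd, i.e. the address is not a multiple of 8. */
static double load_double_two_words(const uint32_t *words, size_t i)
{
    uint32_t pair[2];
    double d;
    assert(sizeof d == sizeof pair);   /* assumption: 64-bit double */
    pair[0] = words[i];                /* first single-word load  */
    pair[1] = words[i + 1];            /* second single-word load */
    memcpy(&d, pair, sizeof d);        /* reassemble the 64-bit value */
    return d;
}

/* Store a double at word index i using two single-word stores. */
static void store_double_two_words(uint32_t *words, size_t i, double d)
{
    uint32_t pair[2];
    assert(sizeof d == sizeof pair);
    memcpy(pair, &d, sizeof d);
    words[i] = pair[0];
    words[i + 1] = pair[1];
}
```

The price is that the double access is no longer a single indivisible operation, which is harmless for ordinary Fortran data but would not do for device registers.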
aglew@mcdurb.Urbana.Gould.COM (02/21/89)
Alignment and selection of bytes are two different things. <Deliberately obscure comment. I can't give away *all* my research topics>
rcbaps@eutrc3.UUCP (Pieter Schoenmakers) (02/22/89)
In article <11040@tekecs.TEK.COM> andrew@frip.gwd.tek.com (Andrew Klossner) writes:
>[]
>
>	"Would it be a terrible hardship to only have two data sizes
>	(perhaps character and word) and not allow words to cross word
>	boundaries?"
>
>Data sizes: you need do atomic 8-bit, 16-bit, and 32-bit loads and
>stores in order to deal with all the sorts of device registers you
>might meet.

On the Archimedes (not only the Unix machine), which has only 32-bit and 8-bit load/store operations (both aligned), the I/O bus is 16 bits wide. Reading from I/O space puts the data in the low 16 bits of the data bus; writing into I/O space puts the top 16 bits of the data bus onto the I/O bus. Both 16-bit and 8-bit devices are no problem; all are accessed using 32-bit operations.

---Tiggr
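The bus arrangement described in this post can be modelled in a few lines of C. This is a sketch under assumptions, not Acorn's hardware or driver code: the register array, the register count, and the function names are hypothetical, and since the post does not say what appears in the upper half of the data bus on a read, the model fills it with an arbitrary pattern so that the mask visibly matters.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_REGS 4u                  /* hypothetical device with four registers */
static uint16_t io_regs[NUM_REGS];   /* stands in for the 16-bit I/O bus side   */

/* 32-bit store into I/O space: only the TOP 16 bits of the data bus
   reach the 16-bit I/O bus. */
static void io_store32(unsigned reg, uint32_t bus_value)
{
    assert(reg < NUM_REGS);
    io_regs[reg] = (uint16_t)(bus_value >> 16);
}

/* 32-bit load from I/O space: device data arrives in the LOW 16 bits.
   Assumption: the upper half is junk (modelled here as 0xDEAD). */
static uint32_t io_load32(unsigned reg)
{
    assert(reg < NUM_REGS);
    return 0xDEAD0000u | io_regs[reg];
}

/* 16-bit device write expressed as a 32-bit operation. */
static void dev_write16(unsigned reg, uint16_t value)
{
    io_store32(reg, (uint32_t)value << 16);
}

/* 16-bit device read expressed as a 32-bit operation plus a mask. */
static uint16_t dev_read16(unsigned reg)
{
    return (uint16_t)(io_load32(reg) & 0xFFFFu);
}
```

In this model each 16-bit device access is one bus operation, so the device still sees an indivisible access even though the processor has no halfword load or store.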
mo@prisma (02/24/89)
The Acorn Risc Machine (ARM) is a very interesting beast. All of its instructions are conditional in that they look at the condition codes. Further, since the machine does NOT do delayed branches and it has a simple pipe, branches are a bit more expensive than might be expected. Hence, it makes sense in many cases to do an if-then-else as

	condition set
	true:  instr
	true:  instr
	true:  instr
	true:  instr
	true:  instr
	true:  instr
	false: instr
	false: instr
	false: instr
	false: instr
	false: instr
	false: instr

where the processor just falls through the arm whose condition fails at no-op speed, which is one cycle per no-op. I don't remember exactly when the tradeoff occurs, but it is surprisingly effective.

Further, the ARM folks took a slightly different view of RISC. To paraphrase the conversation I had with them:

	The usual RISC folks ask the question: what's the best way to use 200k transistors in building a VLSI cpu. We (Acorn) asked: How do we build the simplest, dirt-cheapest cpu possible in 20K transistors that still gets good performance?

Well, the ARM is pretty spiffy. It is already being used by at least one peripheral controller company because "where else can you get 6 mips for $35 with a decent instruction set and large address space that already has a good C compiler and a decent (Unix-based) development environment?"

All in all, a very tasty piece of work.

-Mike
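The tradeoff between conditional execution and branching can be explored with a toy cycle model in C. Everything in it is an assumption for illustration: one cycle per instruction (including skipped ones, as the post says), a taken branch costing one cycle plus a penalty passed in as a parameter, and a compare that costs the same in both schemes and is therefore left out. The post does not give the ARM's actual branch cost, so none is claimed here.

```c
#include <assert.h>

/* Conditional execution: every instruction of both arms is issued;
   those whose condition fails pass through as one-cycle no-ops. */
static unsigned cycles_predicated(unsigned n_then, unsigned n_else)
{
    return n_then + n_else;
}

/* Conventional branching layout:
     bcc else ; then-arm ; b end ; else: else-arm ; end:
   A taken branch costs 1 + taken_penalty cycles (hypothetical parameter),
   an untaken conditional branch costs 1 cycle. */
static unsigned cycles_branching(int cond, unsigned n_then, unsigned n_else,
                                 unsigned taken_penalty)
{
    assert(cond == 0 || cond == 1);
    if (cond)   /* fall into then-arm, then jump over the else-arm */
        return 1u + n_then + (1u + taken_penalty);
    return (1u + taken_penalty) + n_else;   /* jump straight to else-arm */
}
```

With a hypothetical penalty of 2 and three instructions per arm, the model gives 6 cycles for conditional execution against 7 (condition true) or 6 (condition false) for branching; at six instructions per arm it gives 12 against 10 or 9. Short arms favour falling through, long arms favour branching, and the crossover moves with the penalty.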