wkk@cbnewsl.att.com (wesley.k.kaplow) (06/18/91)
Well, although I do not have data from one of the more popular CISC
machines, I do have data on the AT&T CRISP processor compared to the
MIPS R2000/R3000 processors.

Introduction: What I say here is about CODE SIZE only and says nothing
about whether the CRISP processor itself is a CISC or RISC processor.
CRISP does, however, distinguish itself from most 'RISC' processors in
that CRISP has three instruction sizes: 2, 6, and 10 bytes. The MIPS
processor has only one size, 4 bytes. Also, like a CISC processor,
CRISP is not a load/store machine, in that it has a basically
orthogonal instruction set with respect to addressing modes.

Benchmarks: Several benchmarks were used to determine the static code
size data. These fall into the following groups:

 1) Dhrystone
 2) Real Programs (RP) - 9 Unix programs: cat, comm, diff, echo, mv,
    nroff, pr, rm, wc
 3) BSC - 7 benchmarks: Ackermann's, Word Count, Quick Sort, TTY
    driver, Symbol Table Insert, Buffer Manipulation, Statistics

Method: The compiler for the MIPS processors was the UMIPS-BSD 1.0
system. This did contain an optimizing compiler, but not an optimizing
linker. The compiler for CRISP was an internal research compiler.

Results:

	Benchmark	MIPS Code Size / CRISP Code Size
	------------------------------------------------
	BSC		1.83
	Dhrystone	1.84
	RP		1.67

In more detail, it was found that 70% of CRISP instructions were of the
2-byte form and the remaining 30% of the 6-byte form. Therefore, the
average CRISP instruction size was 3-3.4 bytes/instruction (a 70/30 mix
of 2- and 6-byte instructions averages 3.2 bytes), while MIPS is 4
bytes/instruction. Besides the bytes/instruction average, we compared
the static instruction counts as well. In our study we found that MIPS
needed about 44% more instructions. Combining this static instruction
increase with the larger bytes/instruction number, we arrive at
approximately the same ~1.7x larger code size for MIPS.

Conclusion: This work was done 4 years ago, and the MIPS compiler may
have improved since. Processors that resemble CRISP in instruction
encoding (i.e., with many instruction encodings shorter than 4 bytes)
may show similar results, since many 'CISC' processors have 2- or even
3-register instructions in short encodings, while MIPS instructions are
always 4 bytes regardless of the number of operands. Also, like a CISC
processor, CRISP has a basically 'orthogonal' instruction set, unlike
the general RISC load/store type of architecture. I think these results
may be representative for architectural reasons.

Once again I remind everybody in NetLand that I am not presenting any
claims as to the performance characteristics of the processors above.

with disclaimer; use disclaimer;

Wesley Kaplow
AT&T Bell Laboratories
Whippany, NJ
201-386-4634
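To make the arithmetic above concrete, here is a minimal C sketch (an
illustration using the figures from the post, not part of the original):
the 70/30 mix of 2- and 6-byte instructions plus the 44% static-count
difference reproduce the ~1.7-1.8x code size ratio.

	/* Back-of-the-envelope check of the post's figures. */
	#include <stdio.h>

	int main(void)
	{
	    /* 70% 2-byte and 30% 6-byte CRISP instructions */
	    double crisp_bytes = 0.70 * 2.0 + 0.30 * 6.0;  /* 3.2 bytes/insn */

	    /* MIPS: 44% more instructions, at 4 bytes each */
	    double ratio = (1.44 * 4.0) / crisp_bytes;     /* ~1.8 */

	    printf("CRISP average: %.1f bytes/insn\n", crisp_bytes);
	    printf("MIPS/CRISP code size ratio: %.2f\n", ratio);
	    return 0;
	}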
preston@asta.rice.edu (Preston Briggs) (06/18/91)
wkk@cbnewsl.att.com (wesley.k.kaplow) writes:

[yet another code size comparison]

>Results:
>
>	Benchmark	MIPS Code Size / CRISP Code Size
>	------------------------------------------------
>	BSC		1.83
>	Dhrystone	1.84
>	RP		1.67
 ...
>Besides the bytes/instruction average, we compared the static
>instruction counts as well. In our study we found that MIPS needed
>about 44% more instructions.

Again, we'd _really_ like to see instruction counts.
I know the MIPS compiler unrolls loops, at least sometimes.
This alone makes me doubt the validity of any static instruction counts.

An earlier poster noted that sometimes code size dominates all other
considerations. In this case, we should consider an interpreter. Forth
is one example. To an extent, CISC machines (with microcoded
implementations) are another. Then there are table-driven scanners,
parsers, code generators, and so forth. Choosing the right instruction
set can lead to tremendous space compression (orders of magnitude).

Preston Briggs
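To make the table-driven idea concrete, here is a minimal sketch (a
made-up example, not from the thread): a DFA scanner for C-style
identifiers where the recognition logic lives in a small transition
table plus one generic driver loop, rather than in open-coded tests.

	/* Table-driven scanner sketch: the DFA is data, the driver is
	 * generic. Illustrative only. */
	#include <ctype.h>
	#include <stdio.h>

	enum { START, IN_ID, NSTATES };
	enum { ALPHA, DIGIT, OTHER, NCLASSES };

	static const int next_state[NSTATES][NCLASSES] = {
	    /* ALPHA  DIGIT  OTHER */
	    {  IN_ID, -1,    -1 },   /* START */
	    {  IN_ID, IN_ID, -1 },   /* IN_ID: -1 means "token done" */
	};

	static int class_of(int c)
	{
	    if (isalpha(c) || c == '_') return ALPHA;
	    if (isdigit(c))             return DIGIT;
	    return OTHER;
	}

	/* Returns length of identifier at s, or 0 if none. */
	static int scan_ident(const char *s)
	{
	    int state = START, n = 0;
	    while (s[n] != '\0') {
	        int t = next_state[state][class_of((unsigned char)s[n])];
	        if (t < 0) break;
	        state = t;
	        n++;
	    }
	    return state == IN_ID ? n : 0;
	}

	int main(void)
	{
	    printf("%d\n", scan_ident("foo42+bar"));  /* prints 5 */
	    return 0;
	}

The same driver loop serves any token class you add rows for; only the
table grows, and tables compress far better than code.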
davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr) (06/19/91)
In article <1991Jun18.152303.1889@rice.edu> preston@asta.rice.edu (Preston Briggs) writes:

| Again, we'd _really_ like to see instruction counts.
| I know the MIPS compiler unrolls loops, at least sometimes.
| This alone makes me doubt the validity of any static instruction counts.

  Speaking just for me, the instruction counts are of minimal interest,
since my real concern is not how many instructions a compiler
generates, or how high an instruction rate a machine achieves, but how
many CPU seconds it takes to run the program. I have no doubt that my
IPC has a higher instruction rate than my 486, but when I start a
program I don't care about anything but how long it takes and how much
memory it uses.

  And while the code size of one program is of little concern within
the limits we've been discussing, a 20% increase in stored program size
makes a difference on the fileserver. Say I have a total binary size
(including X) of 2.5GB; 20% of that is about a reasonable news
partition...

| An earlier poster noted that sometimes code size dominates all other
| considerations. In this case, we should consider an interpreter. Forth
| is one example. To an extent, CISC machines (with microcoded
| implementations) are another. Then there are table-driven scanners,
| parsers, code generators, and so forth. Choosing the right instruction
| set can lead to tremendous space compression (orders of magnitude).

  I'd like to see how you got that. Even if you had a CPU which used
LZW or Huffman codes as opcodes, I don't think you'd see even one order
of magnitude. Note that this is hard to measure, since object files
almost always contain data. Still, no compression method operating on
either RISC or CISC code will give anything like even one order of
magnitude, so I don't see that using an interpreted language will save
that much. If it will, maybe you've hit on another data compression
scheme.
-- 
bill davidsen (davidsen@crdos1.crd.GE.COM -or- uunet!crdgw1!crdos1!davidsen)
    GE Corp R&D Center, Information Systems Operation, tech support group
Moderator comp.binaries.ibm.pc and 386-users digest.
preston@asta.rice.edu (Preston Briggs) (06/19/91)
I wrote:
>| An earlier poster noted that sometimes code size dominates all other
>| considerations. In this case, we should consider an interpreter. Forth
>| is one example. To an extent, CISC machines (with microcoded
>| implementations) are another. Then there are table-driven scanners,
>| parsers, code generators, and so forth. Choosing the right instruction
>| set can lead to tremendous space compression (orders of magnitude).

and davidsen@crdos1.crd.ge.com (bill davidsen) writes:
> I'd like to see how you got that. Even if you had a CPU which used
>LZW or Huffman codes as opcodes, I don't think you'd see even one order
>of magnitude. Note that this is hard to measure, since object files
>almost always contain data. Still, no compression method operating on
>either RISC or CISC code will give anything like even one order of
>magnitude, so I don't see that using an interpreted language will save
>that much. If it will, maybe you've hit on another data compression
>scheme.

I'm perhaps thinking a little more indirectly than you. It's nothing
too radical...

A little APL 1-liner is a very compact representation of what might
take several K if compiled into lots of (inlined) assembly.

We can build a table-driven LR parser, using various table compression
ideas (eliminating duplicate rows, ...), that's much smaller
(table+interpreter) than directly writing the parser in assembly. Also
much slower! In the proceedings of the '86 Symposium on Compiler
Construction, Pennello gives an example where they built a very fast LR
parser by trading space for time, effectively hard-coding much of the
parse table. The space cost was estimated at a factor of 4 if error
recovery was included.

The work on "super-combinators" is another example.

Then again, there's Forth. They trade a factor of 10 (?) in time to get
a flexible threaded-code interpretation scheme that's very compact.

One of the micro-software companies is supposed to deliver applications
with 2 kinds of code: machine code, for the time-critical sections, and
some sort of interpreted code (like P-code?) for the bulky,
non-critical parts. With such a scheme, they were able to fit a lot
more functionality into 640K (or whatever constraint they faced).

So, people who've got very tight space constraints would do well to
consider using an interpreter of some sort.

Preston Briggs
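To put a concrete shape on the Forth-style trade-off, a minimal sketch
(illustrative, not from the thread): a tiny stack-machine interpreter
where each one-byte opcode stands in for what would be several native
instructions.

	/* Tiny bytecode interpreter sketch: each 1-byte opcode replaces
	 * several native instructions, trading decode time for density. */
	#include <stdio.h>

	enum { OP_PUSH, OP_ADD, OP_MUL, OP_PRINT, OP_HALT };

	static void run(const unsigned char *code)
	{
	    int stack[64], sp = 0, pc = 0;

	    for (;;) {
	        switch (code[pc++]) {
	        case OP_PUSH:  stack[sp++] = code[pc++];        break;
	        case OP_ADD:   sp--; stack[sp-1] += stack[sp];  break;
	        case OP_MUL:   sp--; stack[sp-1] *= stack[sp];  break;
	        case OP_PRINT: printf("%d\n", stack[sp-1]);     break;
	        case OP_HALT:  return;
	        }
	    }
	}

	int main(void)
	{
	    /* (2 + 3) * 4, printed and halted, in 10 bytes of "program" */
	    static const unsigned char prog[] = {
	        OP_PUSH, 2, OP_PUSH, 3, OP_ADD, OP_PUSH, 4, OP_MUL,
	        OP_PRINT, OP_HALT
	    };
	    run(prog);   /* prints 20 */
	    return 0;
	}

The dispatch loop is where the factor-of-10 time cost comes from; the
density comes from each byte encoding a multi-instruction operation.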
martin@adpplz.UUCP (Martin Golding) (06/21/91)
In <3436@crdos1.crd.ge.COM> davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr) writes:

>In article <1991Jun18.152303.1889@rice.edu> preston@asta.rice.edu (Preston Briggs) writes:
>| Choosing the right instruction set
>| can lead to tremendous space compression (orders of magnitude).

> I'd like to see how you got that. Even if you had a CPU which used
>LZW or Huffman codes as opcodes, I don't think you'd see even one order
>of magnitude.

The Pick operating system, in its original machine code, requires
slightly less than .5 Meg of memory. It expands 4 to 6 times on CISC
machines, and 8 to 12 times on RISC machines; i.e., one order of
magnitude. The instruction set was designed simultaneously with the
system, and is the most C of all the ISs I've ever seen on a C.
Details, of course, are proprietary.

>Note that this is hard to measure, since object files
>almost always contain data.

But not more than a percentage, so the one order of magnitude is plus
some insignificant amount (or even a significant amount, but certainly
not two orders of magnitude).

>Still, no compression method operating on
>either RISC or CISC code will give anything like even one order of
>magnitude, so I don't see that using an interpreted language will save
>that much. If it will, maybe you've hit on another data compression
>scheme.

This isn't a process of data compression, it's a process of function
compression; i.e., each instruction encodes a series of machine
operations. For a clear, RISC-flavored example, take a look at your
generated assembly code and estimate the resulting size increase if you
removed address calculation from all of the load/store instructions.

To make use of such an instruction set, you need either a high-level
source language or a 'vectorizing' compiler, i.e., one that can infer a
single high-level operation from a series of low-level ones. Or you can
write everything in assembler, which is sufficiently productive if your
machine language is of a much higher order than the available
high-level language.

I'll always wonder whether the world would be substantially different
if that original PDP-11 had had the commercial instruction set.

Martin Golding    | sync, sync, sync, sank ... sunk:
DoD #0236         | He who steals my code steals trash.

A poor old decrepit Pick programmer. Sympathize at:
{mcspdx,pdxgate}!adpplz!martin or martin@adpplz.uucp
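A concrete, non-proprietary instance of the function compression Martin
describes (my example, not his): the x86 string-move instruction
encodes an entire byte-copy loop in two bytes, where a scheduled
MIPS-style load/store loop needs about 24.

	; x86: the whole byte-copy loop is one 2-byte instruction
	; (CX = count, SI = source, DI = destination are implied)
	    rep movsb              ; 2 bytes: F3 A4

	; MIPS-style equivalent, 6 instructions = 24 bytes
	copy:
	    lb    $t0, 0($a1)      ; load a byte
	    addiu $a1, $a1, 1      ; bump source (fills the load delay slot)
	    sb    $t0, 0($a0)      ; store the byte
	    addiu $a2, $a2, -1     ; decrement count
	    bnez  $a2, copy        ; loop while count != 0
	    addiu $a0, $a0, 1      ; bump destination (branch delay slot)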
davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr) (06/21/91)
In article <825@adpplz.UUCP> martin@adpplz.UUCP (Martin Golding) writes:

| This isn't a process of data compression, it's a process of function
| compression; i.e., each instruction encodes a series of machine
| operations. For a clear, RISC-flavored example, take a look at your
| generated assembly code and estimate the resulting size increase if
| you removed address calculation from all of the load/store
| instructions.

  No doubt the code would be tighter if you didn't calculate the
addresses, but it wouldn't *work* very well. One reason Intel
instruction-set code is small is dedicated registers. By making certain
instructions always operate on certain registers, the opcode can be
smaller. There's no doubt that a symmetrical instruction set makes
compiler writing a lot easier, but that's not the point here; in terms
of code size, the occasional mov added to get something in the right
place usually costs less than the code saved by lots of one-byte
instructions.

  Three CISC machines typify approaches to this: Intel has lots of
dedicated registers, Motorola has two classes of register (arithmetic
and address), and the NS32k has totally general-purpose registers.
Since each has a different number of registers I won't even try to draw
any conclusion from their code size; these are just nice examples for
discussion.
-- 
bill davidsen (davidsen@crdos1.crd.GE.COM -or- uunet!crdgw1!crdos1!davidsen)
    GE Corp R&D Center, Information Systems Operation, tech support group
Moderator comp.binaries.ibm.pc and 386-users digest.
  "I admit I am prejudiced, I can't stand bigots." -me
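To put rough numbers on the dedicated-register point (my illustration;
the encodings are standard 8086, not from the thread): implying a
register in the opcode is exactly what buys the one-byte forms.

	; 8086 short forms that exist because the register is implied:
	    inc  ax            ; 1 byte  (40h, register encoded in the opcode)
	    push ax            ; 1 byte  (50h)
	    lodsb              ; 1 byte  (ACh: loads AL from [SI], bumps SI)
	    mov  ax, 1234h     ; 3 bytes (B8h + 16-bit immediate)

	; a fixed-width RISC encoding pays 4 bytes even to bump a register:
	    addiu $4, $4, 1    ; 4 bytes, always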
przemek@rrdstrad.nist.gov (Przemek Klosowski) (06/21/91)
>>>>> On 21 Jun 91 davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr) said:

Bill> In article <825@adpplz.UUCP> martin@adpplz.UUCP (Martin Golding) writes:

Martin> This isn't a process of data compression, it's a process of function
Martin> compression; i.e., each instruction encodes a series of machine
Martin> operations. For a [..] RISC type example, take a look at your
Martin> generated assembly code and estimate the resulting size increase
Martin> if you removed address calculation from all of the load/store
Martin> instructions.

Bill> No doubt the code would be tighter if you didn't calculate the
Bill> addresses, but it wouldn't *work* very well.

But the point is that a (hypothetical) CISC instruction

	mov @(r0[r1])+, #somevalue

supposed to do

	int r1;
	char *r0[N];
	*(r0[r1])++ = SOMEVALUE;

is equivalent to the RISC sequence

	add   r2, r1, r0        ; r2 = &r0[r1] (assuming r1 is pre-scaled)
	load  r3, (r2)          ; fetch the pointer r0[r1]
	store #somevalue, (r3)  ; store through the old pointer value
	add   r3, r3, 1         ; post-increment the char pointer
	store r3, (r2)          ; write the incremented pointer back

to give a contrived example. All the operations that are 'encoded' by
subsequent RISC instructions are encoded compactly within the single
CISC instruction word; the compactness comes from the fact that only a
limited subset of all combinations of adds/loads/stores (defined by the
available addressing modes) is allowed. If your code used such
addressing modes a lot, you would lose terribly in the code density
department. The nice thing is that both RISC proponents and CISC
manufacturers agree that the very complicated addressing modes do not
get used much (vide the fate of the fancy addressing modes in the
68040).

The orthogonal set of registers would make programming RISC in assembly
nice, if it weren't for load delay slots and branch delays. Then of
course, smart assemblers do that scheduling for you. Then again,
machines get faster almost more rapidly than you can tune your code, so
by sitting back and waiting for the next model you might be as
productive as the guy next door hand-assembling his critical
routines :^).
--
			przemek klosowski (przemek@ndcvx.cc.nd.edu)
			Physics Department
			University of Notre Dame IN 46556