shs@aldebaran.Berkeley.EDU (Steve Schoettler) (10/31/88)
In article <7352@wright.mips.COM> earl@mips.COM (Earl Killian) writes:
>I think it would be interesting to benchmark various different
>machines using gcc as the compiler. This partially removes one
>variable: how much performance is due to the compiler and how much to
>the hardware.

That might get you somewhere, but I don't think it will tell you what you want to find out about the architectures, and it won't completely remove the compiler as a variable.

Consider that gcc compiles down into an intermediate RTL description of the original C code. The RTL describes an idealized abstract machine, and is designed to be abstract enough that it can be mapped onto a variety of processors: 68K, 386, 370, etc. How efficiently this RTL description is compiled into the target machine code reflects how close the abstract machine is to the actual target machine.

So, I think what you'll find from such a study is which machine most closely resembles the abstract machine Richard Stallman et al. had in mind when the RTL was designed.

I am currently working on some silicon compiler software that is designed to give instruction set designers rapid feedback about design choices. One thing I have been considering is automatically generating a gcc machine description file, which can be used to generate a C compiler, which can be used to compile and run benchmarks. In other words, with the right software, you should be able to input an instruction set and get out Dhrystones (etc.).

If you used only results from these C benchmarks to design your processor, I believe that you would, after a few iterations, converge on an instruction set that very closely matched the intermediate abstract machine used by gcc.

Maybe that's not such a bad idea. Perhaps the best instruction set for the GNU C compiler is one that uses as instructions the same basic blocks that are described by the RTL. Of course, you'd have to do something about the infinite number of registers, etc., but then you'd have a fantastic C compiler for it!

But what about the tradeoffs and assumptions made in the design of the abstract machine? Certainly there were intended source languages, and intended target machines, so how ideal is it? Let me know if you have any answers.

I believe that using gcc for benchmarks across machines is a good idea, as it is sort of a "constant" and removes some of the variation we do see across machines. I just don't know how meaningful the results will be.

For a demonstration of how much compiler technology affects "performance", I'd ask the MIPS folks what the difference is between typical Unix code compiled with no optimizations and with all optimizations turned on.

Steve
shs@ji.Berkeley.EDU
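One way to see how much the compiler contributes, in the spirit of the no-optimization versus full-optimization comparison suggested above, is to build the same small kernel at two optimization levels and time each build. The sketch below is an illustration only: the names `mix_kernel` and `time_kernel` are hypothetical, the kernel is an arbitrary loop rather than a real benchmark, and `-O0` and `-O2` are assumed as stand-ins for "no optimizations" and "all optimizations" on a current gcc.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical toy kernel: a dependent multiply-add chain. Unsigned
   arithmetic wraps on overflow, so the result is well defined. */
unsigned long mix_kernel(unsigned long n)
{
    unsigned long acc = 0;
    for (unsigned long i = 0; i < n; i++)
        acc = acc * 31UL + i;
    return acc;
}

/* Returns the CPU seconds spent in one call of mix_kernel(n). If
   result is non-NULL, the kernel's value is stored through it so the
   caller can check (and use) what was computed. */
double time_kernel(unsigned long n, unsigned long *result)
{
    clock_t start = clock();
    unsigned long r = mix_kernel(n);
    clock_t stop = clock();
    if (result != NULL)
        *result = r;
    return (double)(stop - start) / CLOCKS_PER_SEC;
}
```

Calling `time_kernel` from a driver built once with `gcc -O0` and once with `gcc -O2`, on the same machine, gives two timings whose ratio mostly reflects the compiler rather than the hardware. A real comparison would use representative programs instead of a toy loop.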
csimmons@hqpyr1.oracle.UUCP (Charles Simmons) (11/03/88)
In article <26627@ucbvax.BERKELEY.EDU> shs@ji.Berkeley.EDU (Steve Schoettler) writes:
>In article <7352@wright.mips.COM> earl@mips.COM (Earl Killian) writes:
>>I think it would be interesting to benchmark various different
>>machines using gcc as the compiler. This partially removes one
>>variable: how much performance is due to the compiler and how much to
>>the hardware.
>
>Consider that gcc compiles down into an intermediate RTL description
>of the original C code. The RTL describes an idealized abstract machine,
>and is designed to be abstract enough so that it can be mapped into
>a variety of processors: 68K,386,370, etc. How efficiently this RTL
>description is compiled into the target machine code reflects how
>close the abstract machine is to the actual target machine.

Um... Seeing how I've worked on the GCC 370 compiler, I'd argue with this point of view. One of the really neat aspects of GCC is that you can, in some sense, generate machine-dependent RTL code. For example, the original RTL code doesn't have any support at all for double word integers (or 'long long's). But, in an early pass of the compiler, the compiler calls a machine dependent routine to generate intermediate RTL code. By performing this pass, the generated intermediate RTL code tends to provide an abstract machine that very closely mimics the target machine.

>So, I think what you'll find from such a study is which machine most
>closely resembles the abstract machine Richard Stallman et al
>had in mind when the RTL was designed.

There are, of course, other issues. The 370 has various aspects that make writing a compiler for it difficult. In particular, allowing subroutines that require more than 4K bytes of instructions is somewhat tricky; and the fact that the 370 doesn't have negative offsets makes it difficult to implement a stack on the 370 that is compatible with the type of stack that GCC would like to implement. (I guess what I'm saying here is that GCC does contain an abstract machine that doesn't map real well onto the 370, and the abstract machine is partially described by portions of the compiler that have nothing to do with the RTL code.)

Of course, even using GCC to run all benchmarks still leaves you open to differences in the skill of a compiler writer. An untuned implementation of GCC might not contain as many peephole optimizations as would be desirable, or one of the instructions in the instruction set may not have been described. On the other hand, if you give me a benchmark, it becomes relatively easy to tune GCC to run that particular benchmark quickly.

>Steve

-- Chuck
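Both 370 difficulties mentioned above trace back to how the machine forms addresses: an operand is reached as a base register plus a 12-bit unsigned displacement, so an instruction can only name offsets 0 through 4095 from its base. The sketch below illustrates that constraint. The function and macro names are hypothetical, and this is not code from GCC.

```c
#include <stdbool.h>
#include <stddef.h>

/* A 12-bit unsigned displacement covers offsets 0..4095. */
#define S370_DISP_LIMIT 4096L

/* True if `offset` from a base register can be encoded directly.
   Negative offsets never fit, which is why a stack frame addressed
   below its frame pointer is awkward on this machine. */
bool s370_disp_fits(long offset)
{
    return offset >= 0 && offset < S370_DISP_LIMIT;
}

/* Number of distinct base values needed so that every byte of a
   routine of `code_bytes` bytes lies within reach of some base. */
size_t s370_bases_needed(size_t code_bytes)
{
    const size_t limit = (size_t)S370_DISP_LIMIT;
    return (code_bytes + limit - 1) / limit;
}
```

A local variable at offset -8 from a frame pointer, natural on a downward-growing stack, fails `s370_disp_fits`, and a routine of 5000 bytes needs two base values to cover its own code.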
anand@amax.npac.syr.edu (Anand Rangachari) (11/08/88)
In article <474@oracle.UUCP> csimmons@oracle.UUCP (Charles Simmons) writes:
[...]
>Of course, even using GCC to run all benchmarks still leaves you
>open to differences in the skill of a compiler writer. An untuned
>implementation of GCC might not contain as many peephole optimizations
>as would be desirable, or one of the instructions in the instruction
>set may not have been described. On the other hand, if you give me
>a benchmark, it becomes relatively easy to tune GCC to run that particular
>benchmark quickly.

I was just wondering if that was such a bad thing after all. A benchmark is supposed to be representative of the typical programs a user may want to run. Thus, in improving the speed of a benchmark, you may actually improve the speed of a sizeable number of programs.

An excellent argument against this is, of course, that we don't have such benchmarks available (so I have gathered from the discussions in this group).

R. Anand
Internet: anand@amax.npac.syr.edu
Bitnet: ranand@sunrise
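For readers unfamiliar with the peephole optimizations mentioned in the quoted text, the idea is to scan a short window of generated instructions and delete or replace patterns that do no useful work. The sketch below is a minimal illustration on a made-up instruction set. Every name in it is hypothetical, it assumes a toy machine with no condition codes, and it does not reflect how GCC represents or matches instructions.

```c
#include <stddef.h>

/* Hypothetical toy instruction set. */
enum toy_op { TOY_MOV, TOY_ADD_IMM, TOY_OTHER };

struct toy_insn {
    enum toy_op op;
    int dst;         /* destination register number */
    int src_or_imm;  /* source register for TOY_MOV, constant for TOY_ADD_IMM */
};

/* One-instruction peephole pass. Removes `MOV r, r` and `ADD r, #0`,
   compacting the array in place, and returns the new instruction
   count. Dropping `ADD r, #0` is only safe because this toy machine
   is assumed to have no flags for the add to set. */
size_t toy_peephole(struct toy_insn *code, size_t n)
{
    size_t out = 0;
    for (size_t i = 0; i < n; i++) {
        const struct toy_insn in = code[i];
        if (in.op == TOY_MOV && in.dst == in.src_or_imm)
            continue;  /* register copied onto itself */
        if (in.op == TOY_ADD_IMM && in.src_or_imm == 0)
            continue;  /* adding zero changes nothing */
        code[out++] = in;
    }
    return out;
}
```

Each pattern like these is cheap to add once a particular benchmark shows it is hot, which is the sense in which tuning a compiler for one benchmark is easy. Whether the same patterns matter in a user's own programs depends on how representative the benchmark is.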