bcase@amdcad.UUCP (04/01/87)
In article <6042@mimsy.UUCP> chris@mimsy.UUCP (Chris Torek) writes:
>In article <5@wb1.cs.cmu.edu> avie@wb1.cs.cmu.edu (Avadis Tevanian) writes:
>>... the 4.3 libc ... has been carefully optimized to use the fancy
>>VAX instructions for the string routines. Unfortunately, some of
>>these instructions are not implemented by the MicroVAX-II hardware.
>>As it turns out, what is happening is that your tests (including
>>Dhrystone) are causing kernel traps to emulate those instructions!
>
>Exactly. Strcpy, strcat, and strlen were all modified to use the
>Vax `locc' instruction to find the ends of strings. This instruction
>is not implemented in hardware in the uVax II. The obvious solution
>is to arrange the libraries so that on a uVax, programs use a
>straightforward test-byte-and-branch loop (see sample code below).

This brings up one of my major beefs about complex architectures: an optimizing compiler might have to do different things depending upon the *version* of a CPU it is compiling for! An optimizing compiler that is considered "a great compiler" for one version of a CPU might be "a mediocre" compiler for the next version of the machine. The compiler writer finds out that some obvious sequences of code are not the best for the current version of the machine, but then the implementors of the next version "fake him out" by changing the relative timings of the instructions. (Take note that determining instruction timings for some machines, e.g. VAXes, is nearly impossible, since DEC just won't tell you. This makes superior code generation a nightmare.)

One of the reasons that simple architectures are better for compilers is that (nearly) all instructions take the same amount of time and space. Thus, code generation and optimization are *much* easier. Also, this relationship of one time unit/one space unit per instruction is unlikely to change as a function of CPU version.

bcase
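The sample code referred to in the quoted article is not reproduced in this thread. As a rough illustration only, a test-byte-and-branch loop of the kind described might look like the C below. This is a sketch, not the code Chris Torek posted, and the names loop_strlen and loop_strcpy are invented here.

```c
#include <stddef.h>

/* Hypothetical name: byte-at-a-time length.
 * Test each byte and branch back until the terminating NUL is seen. */
size_t loop_strlen(const char *s)
{
    const char *p = s;

    while (*p != '\0')
        p++;
    return (size_t)(p - s);
}

/* Hypothetical name: byte-at-a-time copy, including the NUL.
 * Returns dst, as the library strcpy does. */
char *loop_strcpy(char *dst, const char *src)
{
    char *d = dst;

    while ((*d++ = *src++) != '\0')
        ;   /* all the work happens in the loop condition */
    return dst;
}
```

Neither routine relies on any string instruction, so nothing here would need to be emulated by a kernel trap; whether it beats a hardware `locc' on a machine that has one is a separate question.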
zben@umd5.UUCP (04/04/87)
In article <15341@amdcad.UUCP> bcase@amdcad.UUCP (Brian Case) writes:
> This brings up one of my major beefs about complex architectures: an
> optimizing compiler might have to do different things depending upon
> the *version* of a CPU it is compiling for! An optimizing compiler
> that is considered "a great compiler" for one version of a CPU might
> be "a mediocre" compiler for the next version of the machine.

Gosh, I seem to remember a Cobol compiler that generated different code for programs with the following two directives:

    Object-Computer is Univac-1108.
    Object-Computer is Univac-1108 with four memory boxes.

Forgive me if the dashes are in the wrong places. It's been a LONG time. (Not long enough though...)

I don't buy the complexity argument. You're arguing that bicycles are better than cars because they are easier to fix and easier to learn to drive, while completely forgetting the performance differences.

Case in point: I just came up with a fast integer square-root routine for a local project (written in C, available on request). It has one multiply within the main loop. I also have a Unisys 1100 assembly version with NO multiplies in the loop, but I can't translate it to C because C doesn't have the double register operations or double precision shifts, and there is no easy way to code for the LSC (load shift and count) instruction other than yet another C loop.

I guess the point here is that it is possible for a dedicated assembly language programmer to effectively utilize these complex architectures to fly rings around anything written in a higher-level language. It is also possible for a really brilliantly written code generator to approach this kind of performance. Any attempt to simplify these architectures had better deliver blinding increases in hardware speed, or I'm still going to think it's a plot by the programmers and compiler writers to shirk their responsibilities...
-- umd5.UUCP <= {seismo!mimsy,ihnp4!rlgvax}!cvl!umd5!zben Ben Cranston zben @ umd2.UMD.EDU Kingdom of Merryland UniSys 1100/92 umd2.BITNET "via HASP with RSCS"
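The square-root routines mentioned in the post are not shown, so the following is only a sketch of the general idea in C, assuming 32-bit unsigned inputs. isqrt_mul does one multiply per loop iteration; isqrt_shift is the textbook shift-and-subtract method, which uses no multiply at all. Neither is claimed to match the C or 1100 assembly versions described above, and both function names are invented for this example.

```c
#include <stdint.h>

/* Hypothetical name. One multiply per iteration: try each result bit
 * from the top down and keep it if the square still fits under n. */
uint32_t isqrt_mul(uint32_t n)
{
    uint32_t root = 0;
    uint32_t bit;

    for (bit = 1u << 15; bit != 0; bit >>= 1) {
        uint32_t trial = root | bit;

        if (trial * trial <= n)     /* trial <= 65535, so no overflow */
            root = trial;
    }
    return root;
}

/* Hypothetical name. No multiplies: classic shift-and-subtract
 * square root, working two bits of n per step. */
uint32_t isqrt_shift(uint32_t n)
{
    uint32_t root = 0;
    uint32_t bit = 1u << 30;        /* highest power of four in 32 bits */

    while (bit > n)                 /* skip leading zero bit pairs */
        bit >>= 2;

    while (bit != 0) {
        if (n >= root + bit) {
            n -= root + bit;
            root = (root >> 1) + bit;
        } else {
            root >>= 1;
        }
        bit >>= 2;
    }
    return root;
}
```

Both return the floor of the square root. The `while (bit > n)` loop is the sort of normalize-by-looping step that a single shift-and-count instruction could presumably replace on hardware that has one.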
rbj@icst-cmr.arpa (04/09/87)
> Case in point: I just came up with a fast integer square-root routine for
> a local project (written in C, available on request). It has one multiply
> within the main loop. I also have a Unisys 1100 assembly version with NO
> multiplies in the loop, but I can't translate it to C because C doesn't
> have the double register operations, double precision shifts, and there
> is no easy way to code for the LSC (load shift and count) instruction
> other than yet another C loop.

What, no `asm' directive? How about `sed'-ing the assembly output?

> I guess the point here is that it is possible for a dedicated assembly
> language programmer to effectively utilize these complex architectures
> to fly rings around anything written in a higher-level language. It is
> also possible for a really brilliantly written code generator to approach
> this kind of performance. Any attempt to simplify these architectures had
> better deliver blinding increases in hardware speed, or I'm still going to
> think it's a plot by the programmers and compiler writers to shirk their
> responsibilities...

I'm not so sure. Remember the day Mike McAmis (heard from him lately?) was so proud he had occasion to use `Add Negative Thirds'? Remember the `convert Fieldata to (or from) Binary' using `Masked Load Uppers' with strings like `B0B0B0' and `888888'? Pretty arcane stuff. They just don't make them like that anymore.

The machines were designed by engineers (remember, we dropped out of engineering and into computer science) who thought, `yeah, it's easy to do ANT, I'll just gate some of these carries end around (note: U1108 is one's complement) instead of to the next bit', rather than finishing the *useful* instruction set. Therefore, we have `Test Greater' but not `Test Less'.

Yeah, it was fun to look thru code, trying to slice off an instruction here or there, looking for faster instructions, etc. But those days are gone now, and it's conceptual clarity that counts.
I'm sure you know all the arguments about simplified decoding that RISC is supposed to deliver. If the speed isn't delivered, at least the machine should be cheaper.

It takes a Real Programmer to find occasion to use those Macho instructions. What makes you think some Wimpy compiler can do it? :-)

	(Root Boy) Jim "Just Say Yes" Cottrell	<rbj@icst-cmr.arpa>
	I'm mentally OVERDRAWN! What's that SIGNPOST up ahead?
	Where's ROD STERLING when you really need him?
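For readers who have not met one's complement arithmetic, the "end around" carry mentioned above can be sketched in C as follows. This is a generic illustration on a 12-bit field (one third of a 36-bit word), not a model of ANT or of any other 1100-series instruction; the names THIRD_MASK, ones_add12 and ones_neg12 are made up for the example.

```c
#include <stdint.h>

#define THIRD_MASK 0xFFFu   /* hypothetical: a 12-bit field */

/* One's complement add on a 12-bit field. A carry out of the top bit
 * is not passed to the next field; it is added back in at bit 0. */
uint32_t ones_add12(uint32_t a, uint32_t b)
{
    uint32_t sum = (a & THIRD_MASK) + (b & THIRD_MASK);

    sum = (sum & THIRD_MASK) + (sum >> 12);   /* end-around carry */
    return sum & THIRD_MASK;
}

/* One's complement negation: flip every bit of the field. */
uint32_t ones_neg12(uint32_t a)
{
    return ~a & THIRD_MASK;
}
```

With these, ones_add12(5, ones_neg12(3)) yields 2, and adding a value to its own negation yields the all-ones pattern, which is one's complement negative zero.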