gooley@uicsl.UUCP (07/05/86)
My guess is that the polynomial-evaluation instructions have some hardware support on the bigger VAXen (my unreliable memory tells me that the VAX FPA provides some in addition to doing the FP arithmetic), and so might be faster than the unrolled-loop user-programmed versions. On the 750, such instructions are done in microcode -- but that doesn't explain why they are so very slow. DEC would have to provide them for compatibility with machines on which they make more sense... !uiucdcs!uicsl!gooley
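For readers who have not met the instructions under discussion: as far as I know, the VAX POLY instructions evaluate a polynomial from a table of coefficients using Horner's rule. The sketch below shows the two software alternatives being compared, a loop and a hand-unrolled version. It is in Python only to show the shape of the computation and says nothing about timing on any VAX. The function names and the example coefficients are my own inventions, not anything from the thread.

```python
def poly_loop(x, coeffs):
    """Horner's rule in a loop; coeffs[0] is the highest-order coefficient."""
    result = coeffs[0]
    for c in coeffs[1:]:
        result = result * x + c
    return result


def poly_unrolled3(x, c):
    """The same Horner evaluation written out by hand for degree 3."""
    return ((c[0] * x + c[1]) * x + c[2]) * x + c[3]


# Made-up example polynomial: 2x^3 - 3x^2 + 0.5x + 1
coeffs = [2.0, -3.0, 0.5, 1.0]
print(poly_loop(2.0, coeffs), poly_unrolled3(2.0, coeffs))
```

Both functions do the same multiplies and adds; the unrolled form simply drops the loop bookkeeping at the cost of more code, which is the trade-off the thread is arguing about.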
ddb@starfire.UUCP (David Dyer-Bennet) (07/24/86)
> In article <526@mips.UUCP>, mash@mips.UUCP (John Mashey) writes:
> .................................................... Now the shock; the
> unrolled version was *faster*. I forget how much, but it was enough to
> offset the extra code space required by far.
>
> Explanation anyone? Especially someone who knows VAX/750 microcode
> details?
> --
> der Mouse

Can't claim knowledge of 750 microcode details, but I was rather interested to discover when I looked into some things conveniently to hand that in MOST cases a complex instruction runs slower than doing the same work on the same processor with simple instructions. It's also true of the INDEX instruction on VAX, and of the LDB and string copy and edit instructions on the PDP-10. (instructions to simply move adjacent storage usually run fast, as in the VAX MOVC3 or the Intel rep/movsw).

In the one case I could analyze at the microcode level, which may not be at all typical, I finally decided that the problem was that all the fancy provisions to make loops run fast, overlap things, etc., only worked above the microcode level, so when you did complex stuff in microcode it didn't get the assists. Don't know if this is really a general rule, I haven't looked at this in all that many architectures.

-- David Dyer-Bennet
Usenet: ...ihnp4!umn-cs!starfire!ddb
Fido: sysop of fido 14/341, (612) 721-8967
Telephone: (612) 721-8800
USmail: 4242 Minnehaha Ave S Mpls, MN 55406
ronc@fai.UUCP (Ronald O. Christian) (07/29/86)
In article <258@starfire.UUCP> ddb@starfire.UUCP (David Dyer-Bennet) writes:
>> In article <526@mips.UUCP>, mash@mips.UUCP (John Mashey) writes:
>> .................................................... Now the shock; the
>> unrolled version was *faster*. [...]
>Can't claim knowledge of 750 microcode details, but I was rather interested
>to discover when I looked into some things conveniently to hand that in
>MOST cases a complex instruction runs slower than doing the same work on the
>same processor with simple instructions. It's also true of the INDEX
>instruction on VAX, and of the LDB and string copy and edit instructions
>on the PDP-10.

I'm usually just a "listener" on this newsgroup, but if I understand what you're saying above, I'm a little surprised that this is surprising. (Or something like that.)

The first group I ever worked with was doing a lot of Z80 hacking for a control application, and we had to work under *very* stiff memory constraints. Typically we wrote the code first on a development system with lots of memory, then started to pare away at it to get it to fit in memory. But a funny thing happened: The programs started running faster. We were typically getting rid of things like use of the index (IX, IY) registers, add and subtract functions, and so forth in favor of indirect addressing and boolean operations because the instructions used fewer bytes. We also used loops whenever possible and this of course increased overhead, but in general "thin" code ran faster. It ran so much faster that we started writing thin code even if there was not a byte advantage. You could say we were using a RISC sub-set of the Z80 code set.

It occurs to me that even though a complex instruction on a Vax does more per byte of opcode than a simple instruction, the Vax is "hiding" from you the fact that the complex instruction is really sort of a macro, telling the computer to execute lots and lots of microcode.
When the cost of the microcode exceeds the cost of fetching the original instruction, the instruction becomes more costly than its counterpart coded out of simple instructions.

Hmmm. It just occurred to me that the speed of main memory is a big factor in the opcode/microcode trade-off. Could Vaxen have been designed with much slower main memory in mind? (Or faster cache?)

Ron
--
--
Ronald O. Christian (Fujitsu America Inc., San Jose, Calif.)
seismo!amdahl!fai!ronc -or- ihnp4!pesnta!fai!ronc

Oliver's law of assumed responsibility:
"If you are seen fixing it, you will be blamed for breaking it."
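One way to read the opcode/microcode trade-off raised in the post above is as a toy cost model: a complex instruction pays for one fetch plus its microcode, while the equivalent simple sequence pays a fetch for every instruction. The sketch below is only that reading, in Python. The function names are hypothetical and every number is invented for illustration; none is a measurement of a VAX or any other machine.

```python
def complex_cost(fetch, microcode_cycles):
    """One instruction fetch plus the microcode cycles it triggers."""
    return fetch + microcode_cycles


def simple_cost(fetch, n_instructions, cycles_each):
    """n simple instructions, each fetched from memory and then executed."""
    return n_instructions * (fetch + cycles_each)


# All numbers below are invented for illustration; none are measured.
for fetch in (1, 4, 16):
    c = complex_cost(fetch, microcode_cycles=40)
    s = simple_cost(fetch, n_instructions=8, cycles_each=2)
    print(fetch, c, s, "complex wins" if c < s else "simple wins")
```

With these made-up figures the simple sequence wins only at the cheapest fetch cost. As fetches get slower, the single complex instruction pulls ahead, which is the sense in which slow main memory (or the lack of a fast cache) would favour a design with complex instructions.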