greg@isrnix.UUCP (Gregory R. Travis) (03/09/84)
I was doing some playing this evening and guess what I found out:

1) On the PDP 11/44, a floating point (double precision) clear (8 bytes) is almost exactly twice as fast as 4 clr (integer clear, 2 bytes each) instructions. I replaced the code in clrbuf (in bio.c) with floating point clears for a code speedup.

2) A floating point load (double prec. again) followed by a floating point store is just a weeeee bit faster than the appropriate number of 'mov' instructions (assuming the cache is disabled). I'll bet on the 11/70 you could use floating point load/stores for twice the speed over conventional mov's.

What the h*ll does this mean? That for some applications involving manipulation of blocks of data, it may be keen-o to use the floating point processor for the manipulations. Super-cool 11 floating point processors (like the FP-11C in the 11/70 and FP-11E in the 11/60) that operate in parallel with the CPU may give you quite a performance boost if you play your cards right.

Can anyone see problems with this scheme? Has anyone thought of it before?

Does anyone run a 44 or 24 with the commercial instruction set option? If you do, do you use the block character move instructions? Here at isrnix I wrote some code that copies kernel buffers to/from the user's address space with 'mov' instructions (the scheme plays with the segmentation registers) instead of the slow m[t,f]p[d,i] instructions. It would be a thrill to see if I could pop a CIS board in our CPU, use the block move instruction, and see what kind of a performance increase I get. Even with the current situation I get better than twice the performance in copying buffers compared with the previous copyin/copyout scheme.

Any comments?

--
Gregory R. Travis
Institute for Social Research - Indiana University - Bloomington, In
ihnp4!inuxc!isrnix!greg
{pur-ee,allegra,qusavx}!isrnix!greg
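The idea in point 1 can be sketched in portable C. This is not the clrbuf code from bio.c, only an analogue: one routine clears a buffer one 16-bit word per store (like a run of clr instructions), the other clears eight bytes per store (like a double precision floating clear). The names clear_by_words and clear_by_doubles and the 512-byte BUF_BYTES are made up for illustration, and the buffer is assumed to be aligned for 8-byte access with a length that is a multiple of 8.

```c
#include <stdint.h>
#include <stddef.h>

#define BUF_BYTES 512  /* assumed buffer size, for illustration only */

/* Clear one 16-bit word per store, like a run of clr instructions. */
static void clear_by_words(void *buf, size_t nbytes)
{
    uint16_t *p = buf;
    size_t n = nbytes / sizeof *p;

    while (n--)
        *p++ = 0;
}

/* Clear eight bytes per store, like a double precision floating clear.
 * Assumes buf is 8-byte aligned and nbytes is a multiple of 8. */
static void clear_by_doubles(void *buf, size_t nbytes)
{
    uint64_t *p = buf;
    size_t n = nbytes / sizeof *p;

    while (n--)
        *p++ = 0;
}
```

Both routines leave the buffer in the same state; the second simply issues a quarter as many stores. Whether that is faster depends on the machine, which is exactly what the timings above were measuring.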
guy@rlgvax.UUCP (Guy Harris) (03/11/84)
2) A floating point load (double prec. again) followed by a floating point store is just a weeeee bit faster than the appropriate number of 'mov' instructions (assuming the cache is disabled). I'll bet on the 11/70 you could use floating point load/stores for twice the speed over conventional mov's.

You could use that as long as the Floating Interrupt on Undefined Variable (FIUV) trap is disabled; otherwise, the bit pattern for -0.0 will cause a trap. (I assume that the floating load and store don't normalize, as the "Min" and "Max" times for LDF are the same, which wouldn't be the case if they normalized - unless there's a barrel shifter in there).

Guy Harris
{seismo,ihnp4,allegra}!rlgvax!guy
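To make the -0.0 hazard concrete, here is a minimal C sketch that scans a buffer for the bit pattern in question. It assumes the PDP-11 floating format, where the sign is bit 15 and the 8-bit exponent is bits 14..7 of the leading word, so "-0" means sign set with a zero exponent. It also assumes that word is the first (lowest-addressed) of each 8-byte group. The function name first_minus_zero and the two macros are hypothetical.

```c
#include <stdint.h>
#include <stddef.h>

/* Leading word of a PDP-11 floating value:
 * bit 15 = sign, bits 14..7 = exponent, bits 6..0 = high fraction bits. */
#define SIGN_EXP_MASK 0xFF80u
#define MINUS_ZERO    0x8000u  /* sign set, exponent zero */

/* Return the index of the first 8-byte (4-word) group whose leading word
 * holds the "-0" pattern, or -1 if no group does.
 * Assumes the sign/exponent word comes first in each group. */
static long first_minus_zero(const uint16_t *words, size_t nwords)
{
    size_t i;

    for (i = 0; i + 4 <= nwords; i += 4)
        if ((words[i] & SIGN_EXP_MASK) == MINUS_ZERO)
            return (long)(i / 4);
    return -1;
}
```

Arbitrary buffer contents can match this pattern, since only 9 of the 64 bits in a group are checked. That is why a copy routine built on floating loads would need the trap disabled rather than hoping the data is well behaved.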
ka@hou3c.UUCP (Kenneth Almquist) (03/13/84)
Another way to clear memory fast on PDP-11 machines is to use the mov instruction instead of the clear instruction. The clr instruction performs a read, modify, write sequence on all the PDP-11 machines that I have looked at. Reading from a register is faster than reading from memory, so if you have a spare register you can zero it, and then replace the clr instructions in the loop with moves from the register. Kenneth Almquist
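In C terms, the trick amounts to storing a zero held in a variable instead of a literal 0. The sketch below uses a hypothetical routine, clear_with_mov. Whether a particular compiler turns the store into a mov from a register rather than a clr is an assumption, not something the C language promises, and an optimizing compiler may well emit the same code for both forms.

```c
#include <stddef.h>

/* Clear n words by storing a zero kept in a variable, in the hope that
 * the compiler keeps it in a register and emits a register-to-memory
 * move rather than a clear instruction. */
static void clear_with_mov(short *p, size_t n)
{
    register short zero = 0;   /* the "spare register", zeroed once */

    while (n--)
        *p++ = zero;
}
```

The only way to confirm which instruction is generated is to read the compiler's assembly output for the loop.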
chris@basser.SUN (Chris Maltby) (03/15/84)
Just a small note to point out that ALL floating point numbers on PDP11 and VAX computers are normalized. Remember the 'hidden' bit? So change your bcopy routine if you will, but just hope your FPU doesn't pack it in.
chris@basser.SUN (Chris Maltby) (03/19/84)
The other advantage of the 'mov' instruction over 'clr' is that it works in the presence of parity errors in the memory to be cleared. A big hand for the pdp11 microcode designers!