[net.unix-wizards] Some hacks I'll share!

greg@isrnix.UUCP (Gregory R. Travis) (03/09/84)

   I was doing some playing this evening and guess what I found out:

		1) On the PDP-11/44 a double-precision floating point
		   clear (8 bytes) is almost exactly twice as fast as
		   four integer clr instructions (2 bytes each).
		   I replaced the code in clrbuf (in bio.c) with
		   floating point clears for a code speedup.
		2) A floating point load (double prec. again) 
		   followed by a floating point store is just a weeeee
		   bit faster than the appropriate number of 'mov'
		   instructions (assuming the cache is disabled).
		   I'll bet on the 11/70 you could use floating point
		   load/stores for twice the speed over conventional
		   mov's.

  What the h*ll does this mean?  That for some applications involving
  manipulation of blocks of data, it may be keen-o to use the floating
  point processor for the manipulations.  Super-cool 11 floating point
  processors (like the FP-11C in the 11/70 and FP-11E in the 11/60)
  that operate in parallel with the CPU may give you quite a performance
  boost if you play your cards right.
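
  To make the comparison in item 1 concrete, here is a minimal C sketch of
  the two clearing strategies: one 8-byte store doing the work of four 2-byte
  stores. The function names and BUF_BYTES are hypothetical, not taken from
  bio.c, and the sketch says nothing about timing on any particular machine;
  it only shows that both loops leave the buffer in the same state.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical buffer size, chosen for illustration only. */
#define BUF_BYTES 512

/* Clear with 2-byte stores, analogous to a run of integer clr instructions. */
void clear_by_words(unsigned char *buf, size_t nbytes)
{
    const uint16_t zero = 0;
    size_t i;

    assert(nbytes % sizeof zero == 0);
    for (i = 0; i < nbytes; i += sizeof zero)
        memcpy(buf + i, &zero, sizeof zero);
}

/*
 * Clear with double-width stores, analogous to double-precision floating
 * point clears. Assumes an 8-byte double whose 0.0 is all-zero bits, which
 * holds for both IEEE 754 and the PDP-11 D format.
 */
void clear_by_doubles(unsigned char *buf, size_t nbytes)
{
    const double zero = 0.0;
    size_t i;

    assert(nbytes % sizeof zero == 0);
    for (i = 0; i < nbytes; i += sizeof zero)
        memcpy(buf + i, &zero, sizeof zero);
}
```

  The second loop runs a quarter as many iterations as the first, which is
  the whole of the claimed advantage; whether it pays off depends on how the
  hardware handles the wider store.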

  Can anyone see problems with this scheme?  Has anyone thought of it
  before?  

  Does anyone run a 44 or 24 with the commercial instruction set 
  option?  If you do, do you use the block character move instructions?
  Here at isrnix I wrote some code that copies kernel buffers to/from the
  user's address space with 'mov' instructions (the scheme plays with the
  segmentation registers) instead of the slow m[t,f]p[d,i] instructions.
  It would be a thrill to see if I could pop a CIS board in our CPU and
  use the block move instruction and see what kind of a performance
  increase I get.  Even as things stand, buffer copies run more than
  twice as fast as they did with the previous copyin/copyout
  scheme.

  Any comments?

-- 
    Gregory R. Travis
    Institute for Social Research - Indiana University - Bloomington, In
    ihnp4!inuxc!isrnix!greg
    {pur-ee,allegra,qusavx}!isrnix!greg

guy@rlgvax.UUCP (Guy Harris) (03/11/84)

		2) A floating point load (double prec. again) 
		   followed by a floating point store is just a weeeee
		   bit faster than the appropriate number of 'mov'
		   instructions (assuming the cache is disabled).
		   I'll bet on the 11/70 you could use floating point
		   load/stores for twice the speed over conventional
		   mov's.

You could use that as long as the Floating Interrupt on Undefined
Variable (FIUV) trap is disabled; otherwise, the bit pattern for -0.0 will cause
a trap.  (I assume that the floating load and store don't normalize, as
the "Min" and "Max" times for LDF are the same, which wouldn't be the case
if they normalized - unless there's a barrel shifter in there).
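
As a rough illustration of which data would be at risk, the C sketch below
tests for the -0.0 pattern in PDP-11 floating format: sign bit set and the
8-bit exponent field (bits 14 through 7 of the lowest-addressed word) all
zero. The function names are hypothetical, and the scan assumes the block is
copied in 8-byte units, so only the first word of each unit matters.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PDP11_SIGN_BIT 0x8000u  /* bit 15 of the first word */
#define PDP11_EXP_MASK 0x7F80u  /* bits 14..7 of the first word */

/* Returns 1 if this first word has the sign set and a zero exponent. */
int pdp11_is_minus_zero(uint16_t first_word)
{
    return (first_word & PDP11_SIGN_BIT) != 0 &&
           (first_word & PDP11_EXP_MASK) == 0;
}

/*
 * Scans a block in 4-word (8-byte) units and returns the index of the
 * first unit that a double-precision load would see as -0.0, or -1 if
 * the block contains none.
 */
long first_trapping_unit(const uint16_t *words, size_t nwords)
{
    size_t i;

    assert(nwords % 4 == 0);
    for (i = 0; i < nwords; i += 4) {
        if (pdp11_is_minus_zero(words[i]))
            return (long)(i / 4);
    }
    return -1;
}
```

Any first word from 0x8000 through 0x807F matches, so arbitrary buffer
contents can easily contain the pattern; a copy routine built on floating
loads cannot rely on the data being clean.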

	Guy Harris
	{seismo,ihnp4,allegra}!rlgvax!guy

ka@hou3c.UUCP (Kenneth Almquist) (03/13/84)

Another way to clear memory fast on PDP-11 machines is to use the mov
instruction instead of the clear instruction.  The clr instruction
performs a read-modify-write sequence on all the PDP-11 machines that
I have looked at.  Reading from a register is faster than reading from
memory, so if you have a spare register you can zero it, and then replace
the clr instructions in the loop with moves from the register.
					Kenneth Almquist

chris@basser.SUN (Chris Maltby) (03/15/84)

Just a small note to point out that ALL floating point numbers
on PDP11 and VAX computers are normalized. Remember the 'hidden' bit?

So change your bcopy routine if you will, but just hope your FPU
doesn't pack it in.

chris@basser.SUN (Chris Maltby) (03/19/84)

The other advantage of the 'mov' instruction over 'clr' is
that it works in the presence of parity errors in the memory
to be cleared. A big hand for the pdp11 microcode designers!