[comp.arch] special instructions for checksums

kym@bingvaxu.cc.binghamton.edu (R. Kym Horsell) (09/15/90)

In article <69436@sgi.sgi.com> vjs@rhyolite.wpd.sgi.com (Vernon Schryver) writes:
\\\
>Current, fast workstations move about 1MByte/sec TCP/IP user-process-to-
>user-process over ethernet.  Ignoring important details, one saved
>instruction/byte is a million saved instructions/sec.  Simplistically, if
>you could "fetch, add word to accumulator, add-carry to accumulator", you
>could save 0.5 instructions/byte on a MIPS CPU.  Of course an ADDC
>instruction would cost in many other places, dragging in all of the
>disadvantages of status bits.
\\\

Well, considering the poor ol' cpu has to wait & wait & wait for that byte to 
come along, even at 1 MByte/sec, I think context switching comes into the 
picture (it does anyway, right)?

If you take this into account, don't you lose your 1/2 instruction per byte?

I'm not qualified to talk about MIPS architecture, and I'm not sure how 
familiar you are with modern pipeline design, but, in _general_, obtaining 
PSW info like carry is perhaps not as straightforward as it appears.

Status bits may be effectively unavailable to immediately-following alu 
operations, for example. Just as the carry is coming _out_ of the alu the next 
set of operands is going _in_ -- it _seems_ easy enough to have it 
recirculate, doesn't it? There are various timing considerations and design 
niceties (e.g. to latch all the status bits into a psw register at the same 
time we must wait for the last one, such as zero detect, to stabilize) that 
sometimes make this difficult, especially when trying to make overall savings 
in the design.

In the checksum situation it seems to me you don't have any choice but to fill 
the resulting delay slot with a nop. There goes your 1/2 instruction!
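
For concreteness, here is a rough C sketch of the checksum inner loop done 
with no carry bit at all: 16-bit words are summed into a 32-bit accumulator 
and the carries that pile up in the top half are folded back in once, after 
the loop. The name ones_sum16, the big-endian word order and the 65535-byte 
length limit are assumptions made for illustration; none of this is taken 
from any real MIPS or SGI code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: 16-bit ones' complement sum of a buffer,
   computed without reading a carry flag. */
uint16_t ones_sum16(const uint8_t *data, size_t len)
{
    uint32_t acc = 0;
    size_t i;

    assert(data != NULL || len == 0);
    /* Assumed limit: keeps the 32-bit accumulator from overflowing. */
    assert(len <= 65535u);

    /* Inner loop: one load pair and one add per 16-bit word;
       carries simply accumulate in the upper 16 bits of acc. */
    for (i = 0; i + 1 < len; i += 2)
        acc += ((uint32_t)data[i] << 8) | data[i + 1];

    /* An odd trailing byte is treated as the high half of a word. */
    if (i < len)
        acc += (uint32_t)data[i] << 8;

    /* End-around carry, done once instead of once per add. */
    while (acc >> 16)
        acc = (acc & 0xFFFFu) + (acc >> 16);

    return (uint16_t)acc;
}
```

Whether this beats an ADDC loop on a given machine depends on the load and 
pipeline details, which is rather the point.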

On another tack (probably a tangent) -- how much would it cost to
``add in that extra carry wire'' to an existing design? (Or would you
rather MIPS support two slightly different machines?)

\begin{story}

First off -- we have to ask the guys (to be understood as a genderless
term when _I_ use it) to ``drop in that extra carry wire''. After
a bit of grumbling they do it -- maybe these guys are _good_ and
it takes a couple of weeks (whew!) and maybe it takes longer. (You
better start to run a tab on this now).

Now you can _bet_ this isn't going to be transparent to _all_
existing software so the software guys are going to be racking
their brains trying to figure out what needs changing here & there.
Of course the _compilers_ will need a ``few patches'' now that
(possibly) delay slot logic has changed a bit. Of course, now
that the _compiler_ has changed we may need to mod a few other
things too. This is getting a bit expensive....

Time passes.

This month's sales figures show our extra instruction didn't
cause sales to skyrocket; people apparently don't want to
buy our boxes to implement that faster checksum for their
ethernets! Why not? The advertising blurb made it perfectly
clear how much faster it would run... Talking about that,
the bill just came in... Doesn't look good...

Maybe we'll have to jack the price of the box up to start
_paying_ for all this (even though we already did so in 
anticipation of this, it wasn't enough). Might affect sales anyway 
so we'd better jack it _way_ up...

``Whose idea _was_ this anyway?''

Meanwhile, on the other side of town, someone has put
their checksum code into a $10 chip -- and it handles
10 Mb ethernet too! (:-)

\end{story}

>What do the extend precision experts say about carry bits?

I'm only a novice but I don't need carry. 

Although carry was a popular way to perform some stuff for residue arithmetic 
(i.e. calculating remainders modulo 2^n-1, where n is the word size), I'm not 
sure the time saved by using it is significant, considering all the _other_ 
stuff that typically goes on in these XP algorithms (i.e. we have to do lots 
of integer multiplies anyway).  Would someone like to check?
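
To make the residue trick concrete, here is a small C sketch of reducing a 
32-bit value modulo 2^16-1 with shifts, masks and adds only. It relies on 
2^16 being congruent to 1 modulo 2^16-1, so the high half can just be added 
to the low half. The function name mod65535 and the choice n = 16 are 
illustrative assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: x mod (2^16 - 1) without a divide or carry flag. */
uint32_t mod65535(uint32_t x)
{
    /* hi * 2^16 + lo is congruent to hi + lo, since 2^16 == 1 (mod 65535). */
    x = (x & 0xFFFFu) + (x >> 16);   /* now at most 0x1FFFE */
    x = (x & 0xFFFFu) + (x >> 16);   /* now at most 0xFFFF  */

    /* 0xFFFF is the second representation of zero. */
    if (x == 0xFFFFu)
        x = 0;

    assert(x < 65535u);
    return x;
}
```

The second fold is the software stand-in for the end-around carry that a 
carry bit would otherwise hand you for free.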

In any case, since you can't even take unsigned arithmetic for granted,
depending on carry would not be portable. For another thing, I don't know of 
any (portable) way to test it in common HLLs. [And I do _not_ repeat _not_ 
like to have to learn X-1000 different assembly languages in order to support 
some trashy (if _I_ wrote it) piece of software that is going to die within a 
matter of years anyway].
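
That said, in a language that does define unsigned arithmetic as wrapping 
modulo 2^n (C is one), the carry out of an add can be recovered with a 
comparison: the sum wrapped if and only if it ends up smaller than an 
operand. A minimal sketch follows; the name add_with_carry and the 32-bit 
limb size are illustrative assumptions, and it does nothing for languages 
with no unsigned types.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: *sum = a + b + carry_in (mod 2^32),
   returns the carry out (0 or 1). */
unsigned add_with_carry(uint32_t a, uint32_t b, unsigned carry_in,
                        uint32_t *sum)
{
    uint32_t s;
    unsigned carry_out;

    assert(sum != NULL);
    assert(carry_in <= 1u);

    s = a + b;                       /* wraps modulo 2^32 */
    carry_out = (s < a);             /* wrapped iff result < operand */

    s += carry_in;
    carry_out |= (s < carry_in);     /* only 0xFFFFFFFF + 1 wraps here */

    *sum = s;
    return carry_out;
}
```

Chaining this limb by limb gives a multi-word add, at the price of a couple 
of extra compares per word.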

-Kym Horsell