[comp.lang.c++] Add-with-carry operator

zweig@cs.uiuc.edu (Johnny Zweig) (02/04/91)

In my work with the TCP/IP protocol suite, I have had to implement 16-bit
1's-complement arithmetic in a protable, efficient way for protocol processing
modules.  In case you're rusty, ones-complement addition is exactly ordinary
twos-complement addition with carry (i.e. 101 + 111 = 101 in 3-bit 1's
complement since the 1 carried out of the left goes "end around" to the low
order bits -- 111 thus is "-0").

Every microprocessor I've ever seen the assembly language instruction set for
has, in addition to ordinary twos-complement addition instructions, some kind
of "add with carry" instruction which, while not actually implementing ones
complement arithmetic (to add A and B in ones complement, you execute A + B
add-with-carry 0), is just what the doctor ordered for doing IP checksum
computations, and many other iterative ones-complement arithmetic
computations.

The problem: there is not an add-with-carry operator in C++, so there isn't
any way of writing portable code to use this. I know, I know, I know, I can be
totally gross and icky and hard and use asm's and come up with a different one
for each processor/compiler combination I want to run on. But if there were a
standard way of letting the compiler know that I want to do add-with-carry
rather than vanilla add I could write portable, efficient code that people
like Van Jacobson would think was swell.  I suppose that a properly inlined
function could do the trick, but it would be nice if there were an operator.

Comments?

-Johnny Ones

dhoyt@vx.acs.umn.edu (DAVID HOYT) (02/05/91)

In article <1991Feb3.202530.14874@julius.cs.uiuc.edu>, zweig@cs.uiuc.edu (Johnny Zweig) writes...
>The problem: there is not an add-with-carry operator in C++, so there isn't
>any way of writing portable code to use this.
  You can write add with carry code that will work portably with all
2's complement machines.   Unless you're writing code for some special
embedded processor you can hit 100% of the machines out there with one
source program and no ifdefs, and if you are writing for an embedded
processor you aren't interested in portable code anyway.  I vote no for this
one.  The algorithm is, of course, left up to the reader (hint it was solved
a looong time ago).

david | dhoyt@vx.acs.umn.edu | dhoyt@vx.acs.umn.edu

jimad@microsoft.UUCP (Jim ADCOCK) (02/05/91)

In article <1991Feb3.202530.14874@julius.cs.uiuc.edu> zweig@cs.uiuc.edu writes:
|
|The problem: there is not an add-with-carry operator in C++, so there isn't
|any way of writing portable code to use this. I know, I know, I know, I can be
|totally gross and icky and hard and use asm's and come up with a different one
|for each processor/compiler combination I want to run on. But if there were a
|standard way of letting the compiler know that I want to do add-with-carry
|rather than vanilla add I could write portable, efficient code that people
|like Van Jacobson would think was swell.  I suppose that a properly inlined
|function could do the trick, but it would be nice if there were an operator.
|
|Comments?

On my machine/compiler, the following example generates code
close to what one would generate by hand in assembler.  Thus, one
has an example showing it's not necessary for the language to add
a add-plus-carry command -- good optimizers can generate such
commands where implied by code.  I believe compilers ought to 
offer built-in support for word sizes to 2X the natural bit-size of
the machine.  Thus, for example, 32-bit machines ought to support
code generation for 64-bit integer arithmetic.  Whether 64-bit arithmetic
on a 32-bit machine ought to be called "long long" or just "long"
remains debatable.


/* 
assumes: 
8 bits per byte
two's complement
sizeof(short) < sizeof(long)
*/

#define bitsperbyte 8

class longlong
{
private:
    unsigned short s[4];
public:
    longlong(long);
    const longlong& operator+=(const longlong& p);
    void print();
};

longlong::longlong(long l) 
{
    s[0] = (unsigned short)l;
    s[1] = (unsigned short)(l >> (bitsperbyte * sizeof(unsigned short)));
    s[2] = s[3] = (l >= 0) ? (unsigned short)0 : (unsigned short)-1;;
}

const longlong& longlong::operator+=(const longlong& p)
{
    long accum;
    accum = (long) s[0] + p.s[0];
    // unwrapped loop: 
    s[0] = (unsigned short)accum;
    accum = ((unsigned short)(accum >> (bitsperbyte * sizeof(unsigned short))))
        + (long) s[1] + p.s[1];
    s[1] = (unsigned short)accum;
    accum = ((unsigned short)(accum >> (bitsperbyte * sizeof(unsigned short))))
        + (long) s[2] + p.s[2];
    s[2] = (unsigned short)accum;
    accum = ((unsigned short)(accum >> (bitsperbyte * sizeof(unsigned short))))
        + (long) s[3] + p.s[3];
    s[3] = (unsigned short)accum;
    return *this;
}

extern "C" void printf(const char*, ...);
void longlong::print()
{
    printf("%04X%04X%04X%04X\n", s[3], s[2], s[1], s[0]);
}

main()
{
    longlong a(0x12345678L); a.print();
    longlong b(0xEDCBA988L); b.print();
    a += b; a.print();

    return 0;
}

zweig@cs.uiuc.edu (Johnny Zweig) (02/07/91)

Peter Deutsch sent me the following piece of email:

From deutsch@parcplace.com Mon Feb  4 12:05:07 1991
Date: Mon, 4 Feb 91 09:59:50 PST
From: deutsch@parcplace.com (Peter Deutsch)
To: zweig@cs.uiuc.edu
Subject: Add-with-carry
Status: OR

Please post this to comp.lang.c++ for me: our mail posting software is
permanently broken.

I too have wanted to do add-with-carry in C, so I found the next best
thing: an algorithmic hack to avoid needing it.  Here is the code for
computing z=x+y, with cin and cout being carry-in and carry-out
respectively (0 or 1).  x, y, and z must all be declared as >>unsigned<<.

	if ( cin )
	  cout = (x >= ~y ? 1 : 0),
	  z = x + y + 1;
	else
	  cout = (x > ~y ? 1 : 0),
	  z = x + y;

If you are doing accumulate-with-carry over an array, the best thing to do
is probably a split loop, something like this:

	/* Assume cin=0 to start */
	unsigned accum = 0;
	unsigned i = 0;
	unsigned elt;
	for ( ; ; )
	 {
carry0:	   if ( i >= count ) break;
	   elt = data[i]; i++;
	   if ( elt > ~accum )
	    { accum += elt; goto carry1; }
	   else
	    { accum += elt; goto carry0; }
carry1:	   if ( i >= count ) break;
	   elt = data[i]; i++;
	   if ( elt >= ~accum )
	    { accum += elt + 1; goto carry1; }
	   else
	    { accum += elt + 1; goto carry0; }
	 }
----------------------------------------------------------------------------
While I appreciate that it's doable (and I appreciate David's comment that
this problem was solved "a looooonnnnnngggg time ago") I _still_ don't think
this addresses my basic point:

	WHY SHOULD I HAVE TO WRITE CODE THAT PERFORMS OPERATIONS THAT ARE
	UNNECESSARY ON EVERY MICROPROCESSOR PRODUCED WITHIN THE LAST 10
	YEARS?

I know the trick of holding the carry bits somewhere else (my code does 16-bit
2's complement adds in 32-bit words and uses the high order bits to hang on to
the carry bits).  I see Peter's solution, though it involves comparisons that
will also add to my instruction-count.  My officemate reminded me that G++ can
do an inline asm() in a library which expands to add-with-carry (though since
the language makes no guarantees about keeping the carry bit in your context,
it isn't clear that even this will always produce correct code).

Maybe I have gotten spoiled by the fact that, in general, I can write C/C++
code that I can think of as portable assembly language. I think the fact that
the language doesn't have a carry bit or a way of setting/clearing/using it
for calculations is an omission (God help me when Bjarne hears that!!).  If
someone has a straightforward way (not left as an exercise to the reader) that
I can produce:

	movl	a0@+, d2	| fetch 32-bit word
	addxl	d2,d0		| add it along with previous carry
	movl	a0@+, d2	| fetch 32-bit word
	addxl	d2,d0		| add it along with previous carry
	movl	a0@+, d2	| fetch 32-bit word
	addxl	d2,d0		| add it along with previous carry

on my 68020 from something along the lines of:

	for(i = 0; i < length; i++) {
	    sum = add_with_carry( sum, p[i] );
	}

and the moral equivalent of the given assembly code on my Sparc, I would love
to hear it.  Maybe it isn't worth it -- I am willing to concede that. But
given the closeness of the virtual machine model C++ has with that of most
major microprocessor vendors (in terms of 8-,16-,32-bit words, indirect
addressing, and that sort of thing) it has always bugged me (pun intended)
that there is not a machine-independent way of talking about carry bits (and,
for that matter, sign and overflow bits) within the language.

-Johnny Carry

egr@contact.uucp (Gordan Palameta) (02/07/91)

In <1991Feb3.202530.14874@julius.cs.uiuc.edu> zweig@cs.uiuc.edu (Johnny Zweig) writes:

>In my work with the TCP/IP protocol suite, I have had to implement 16-bit
>1's-complement arithmetic in a protable, efficient way for protocol processing
>[rest of article omitted]

To add-with-carry two 16-bit numbers in a portable way:

       - convert them to unsigned 32-bit (high word zero-filled)
       - add them to get a 32-bit result (normal addition, ie 2's complement)
       - take the upper 16 bits and lower 16 bits and add them together,
         getting a 32-bit result
       - do previous step again (important!)

The lower 16 bits will be your answer (the upper 16 bits will be zero-filled).

Try the tcp-ip newsgroup for any further info...

henry@zoo.toronto.edu (Henry Spencer) (02/08/91)

In article <1991Feb6.210332.12826@julius.cs.uiuc.edu> zweig@cs.uiuc.edu writes:
>	WHY SHOULD I HAVE TO WRITE CODE THAT PERFORMS OPERATIONS THAT ARE
>	UNNECESSARY ON EVERY MICROPROCESSOR PRODUCED WITHIN THE LAST 10
>	YEARS?

The fast answer is "because there are machines on which they are necessary".
Standards committees hesitate to require features that are flat-out
unimplementable on some machines, like a carry bit.  Admittedly, such
machines are not common, but they do exist; my MSc thesis involved work
on one.

The right answer to such a requirement is probably a subroutine library
written in assembler.
-- 
"Maybe we should tell the truth?"      | Henry Spencer at U of Toronto Zoology
"Surely we aren't that desperate yet." |  henry@zoo.toronto.edu   utzoo!henry

dhoyt@vx.acs.umn.edu (DAVID HOYT) (02/08/91)

In article <1991Feb6.210332.12826@julius.cs.uiuc.edu>, zweig@cs.uiuc.edu (Johnny Zweig) writes...
>While I appreciate that it's doable (and I appreciate David's comment that
>this problem was solved "a looooonnnnnngggg time ago") I _still_ don't think
>this addresses my basic point:

  Sorry,  that came off a bit more snotty than I really meant.  Really,
what I should have said was that you wouldn't find in in any recently
published book.

> 
>	WHY SHOULD I HAVE TO WRITE CODE THAT PERFORMS OPERATIONS THAT ARE
>	UNNECESSARY ON EVERY MICROPROCESSOR PRODUCED WITHIN THE LAST 10
>	YEARS?
  My response was a bit of a knee jerk, I must admit.  My feeling, which
is mine, is that this is really an optimization problem for a specific
implementation.  (Nice to have an optimiser that would cover that one, aye?)
But the problem is even worse than a simple optimization problem.  To put
features into a language that maps to an efficient use of some particular
machine feature is very dangerous.  How many bugs have been found, and are
still left to be found, in code written by programmers assumed that

    *pp++ = *qp++;

was an atomic operation? 

>                                                                      But
>given the closeness of the virtual machine model C++ has with that of most
>major microprocessor vendors (in terms of 8-,16-,32-bit words, indirect
>addressing, and that sort of thing) it has always bugged me (pun intended)
>that there is not a machine-independent way of talking about carry bits (and,
>for that matter, sign and overflow bits) within the language.

Soapbox time.

  Shreek!  As someone who works with Crays (64 bit words, 24, 32, 48, or 64
bits which are significant in integer operations) if you ever give me
code that assumes that you have a 32 bit processor, indirect addressing,
sizeof( struct { char a[4]; } ) == 4 or anything else like that, I'll really
scream.  32 bit processors are reaching advanced middle age.  In ten years
the sparc 2, mips 3000, et al. will be considered as quaint as the 8088
is now.

  I understand what you are saying, but in real terms I must say that
addition-with-carry bignum math isn't very important.  In any optimization
effort, one should always design the code first using algorithms of the
proper complexity (O(n log n) rather than O(n^2)), then measure the actual
application, if the performance is not acceptable, then identify the areas
that consume the greatest amount of resources, then an only then look for
ways to tweak the code to run faster.  Hand coding of assembler is one of
the tools at your disposal, but it is seldom needed.  In real world terms,
the original design is far more important than how many instruction cycles
that it will take to execute any small segment of code.

  All of this is the realm of software engineering, not in the design of a
new language.  Take a gander at Bliss some time.  It has all of these great
optimization features of the language, pretty much the same kind of features
that you are asking for in C++.  Now look at the $10-$15 billion dollar
company called Digital which is in danger of missing the boat completely
because it is taking them five or more years to convert VMS from Bliss to C
(or whatever their converting it into).

   In the long run, I am much more interested in adding things like private
and public as modifiers to extern and static than adding what seems to be
strictly an optimization issue for particular implementations of C++.

david | dhoyt@vx.acs.umn.edu            An opinion?  I work for a bureaucracy!

jimad@microsoft.UUCP (Jim ADCOCK) (02/15/91)

In article <1991Feb6.210332.12826@julius.cs.uiuc.edu> zweig@cs.uiuc.edu writes:

|	WHY SHOULD I HAVE TO WRITE CODE THAT PERFORMS OPERATIONS THAT ARE
|	UNNECESSARY ON EVERY MICROPROCESSOR PRODUCED WITHIN THE LAST 10
|	YEARS?
|
|I know the trick of holding the carry bits somewhere else (my code does 16-bit
|2's complement adds in 32-bit words and uses the high order bits to hang on to
|the carry bits).  I see Peter's solution, though it involves comparisons that
|will also add to my instruction-count.  My officemate reminded me that G++ can
|do an inline asm() in a library which expands to add-with-carry (though since
|the language makes no guarantees about keeping the carry bit in your context,
|it isn't clear that even this will always produce correct code).

Please read my example again.  I wrote my example on a machine with
only 16 bit arithmetic.  Therefore, to support 32-bit longs, my
compiler has to be smart enough to generate code that uses the carry
identically the same as if one were writing in assembly.  Further,
since people often perform mixed size arithmetic, the compiler is
smart enough to optimize-out instructions where it knows it would
be adding zero.  So code gets generated to make the best use of
the carry bit possible, and does so without adding more low level 
hacks to the language.

Now you'll probably say: "But, I have a 32-bit machine, not a 16-bit
machine."  Fine.  Many compiler developers are beginning to recognize
that ideas that work well on 16-bit machines also have application
to 32-bit machines.  Therefore these compiler vendors are beginning
to offer built-in compiler support for "software" "long long" 64-bit types.
Once you have a compiler that supports 64-bit "long longs" you
will get code generation using the carry bits the same as if you 
did it by hand.

In summary, given that one has an optimizing compiler with built-in "software"
arithmetic support for words 2X the native word size of the underlying
machine, one can easily write code that automatically makes correct/optimal
use of the carry bit.  It is not necessary to add another hack to the
language.  Rather, compiler vendors should be providing optimized code
generation support to 2X the word size of their underlying machine.
Given such support, programmers can write classes that extrapolate to
NX the word size of the underlying machine, and proper use of the
carry bit will be used though-out.

Instead of adding more hacks, vendors should be worrying about what is
necessary for better automatic code optimization.