[comp.arch] COBOL Decimal Arithmetic

otto@rd1632.Dayton.NCR.COM (Jerome A. Otto) (09/23/89)

A comment on COBOL decimal support in hardware...

I haven't looked at this area in many years but I doubt if
that much has changed.

There are many ways to implement COBOL decimal arithmetic 
(both ASCII and packed decimal).  Common ways are:

(1)   Implement all cases in one or more runtime libraries.

(2)   Convert to binary (using hardware instructions if
      available), add binary, convert from binary (using 
      hardware if available)

(3)   Use hardware decimal instructions.

(4)   Generate inline code that uses the "trick" of adding
      decimal operands with a binary adder.  The "trick" is 
      well known, but perhaps not obvious to those reading 
      this newsgroup.

Real compilers might use more than one of the above methods,
depending on whether "special, high-occurring" cases are 
handled and on the performance of decimal instructions on the 
target hardware.

(1) is very slow but might be reasonable, since decimal 
arithmetic might not contribute much to overall application
performance.  If decimal arithmetic is only 10% of a COBOL
program, that might not seem like much, but if poorly 
implemented it might require 60% of the runtime cycles.

Very seldom, if ever, is (2) fastest due to the problem of 
converting from binary to decimal.  This conversion requires 
N divides or N+1 multiplies (N = number of digits).  In an 
optimizing compiler, a case might arise where a converted 
result is used many times before a conversion back to 
decimal is required.  In that case, (2) might be the 
fastest method.

Decimal instructions for COBOL have been implemented in many
machine architectures.  However, most implementations are 
done in microcode and often are not very fast.  Part of the 
problem is that these implementations try to provide general 
decimal arithmetic instructions that can handle 1-18 digits, 
scaling, different signs, etc.  The most common cases in 
COBOL are arithmetic on like items -- and these are known at 
compile time.  Decisions that could be made at compile time 
are instead pushed off to runtime -- at the expense of 
performance.  The spirit of RISC would seem to indicate that 
operations should be broken down at compile time into their 
simplest components and only that code be generated.  
Fortunately for COBOL, the most frequent cases can be done 
with inline code using a binary adder, as in (4).

An example of (4) from a 68K analysis I did in 1984 is:

Unsigned ASCII characters can be added using a 32-bit binary adder 
by the following algorithm:

1. Add the constant 96969696 (base16) to one of the operands.  This 
"biases" the operand so that carries will propagate.  If one of the 
operands is a constant, then the constant will be pre-biased at 
compile time and this step can be omitted.

2. Add the other operand and the biased operand in binary.  Carries 
will propagate but the resulting digits require correction.  If a 
carry out did occur from a digit, the result requires the addition 
of 30 (base16).  If a carry out did not occur, then the carry 
propagate bits must be removed before the add of 30 (base16).

3. Correct the result by masking out the carry propagate bits and 
ORing in 30303030 (base16).

The operands are loaded, scaled, and padded as required by the 
result.  For example, the code for a 68K to add two ASCII digits to 
two ASCII digits, giving a four digit result, would be:

     MOVE.L         D6,D0          . ASCII zeros to D0
     MOVE.W         op1(A5),D0     . Load first operand
     MOVE.L         D6,D1          . ASCII zeros to D1
     MOVE.W         op2(A5),D1     . Load second operand
     ADD.L          #$96969696,D0  . Bias op1
     ADD.L          D1,D0          . Binary add
     MOVE.L         D0,D1          .
     AND.L          #$F0F0F0F0,D1  . Isolate carry propagate bits
     SUB.L          D1,D0          .
     LSR.L          #3,D1          .
     AND.L          #$06060606,D1  .
     SUB.L          D1,D0          . Remove propagate bits
     OR.L           D6,D0          . Form ASCII digits
     MOVE.L         D0,op3(A5)     . Store result
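For those without a 68K handy, the same sequence can be sketched in 
portable C.  This is my sketch, not code from the posting; it assumes 
four unsigned ASCII digits packed into a 32-bit word, most significant 
byte first, with any carry out of the top digit lost:

```c
#include <stdint.h>

/* Add two 4-digit unsigned ASCII numbers held in 32-bit words
 * (e.g. "0012" as 0x30303132).  Carry out of the top digit is lost. */
uint32_t ascii_add4(uint32_t a, uint32_t b)
{
    uint32_t s  = a + 0x96969696u;   /* step 1: bias each byte by 0x96 */
    s += b;                          /* step 2: binary add; decimal carries propagate */
    uint32_t hi = s & 0xF0F0F0F0u;   /* 0xF0 in each byte that did NOT carry out */
    s -= hi;                         /* strip the carry propagate bits */
    s -= (hi >> 3) & 0x06060606u;    /* subtract the leftover 6 from no-carry bytes */
    return s | 0x30303030u;          /* step 3: restore the ASCII zone bits */
}
```

Per byte: a no-carry digit ends up as 0xF6+d+e, so the 0xF0 is masked
off and the extra 6 subtracted; a digit that carried is already d+e-10
with a zero high nibble, so both corrections are no-ops for it.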

The algorithm for packed decimal operands is similar except that 
computing the correction for the inner carries is more complicated.  
Some architectures (NCR at least) implement a special instruction 
to correct the inner carries by using the saved nibble carries from 
a previous arithmetic instruction.  This one simple instruction 
makes the packed decimal algorithm simpler and faster.  
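The posting doesn't show the packed-decimal version, but the
well-known binary-adder sequence for it recovers the nibble carries
with XORs.  A sketch, assuming 8-digit unsigned operands; this is not
necessarily what the NCR instruction does internally:

```c
#include <stdint.h>

/* Add two 8-digit packed-decimal (BCD) operands with a binary adder.
 * A decimal carry out of digit 8 appears as bit 32 of the result. */
uint64_t bcd_add8(uint64_t a, uint64_t b)
{
    uint64_t t1 = a + 0x66666666u;        /* bias every nibble by 6 */
    uint64_t t2 = t1 + b;                 /* binary add; decimal carries propagate */
    uint64_t c  = t2 ^ t1 ^ b;            /* bit i = carry INTO bit i of that add */
    uint64_t nc = ~c & 0x111111110ULL;    /* nibble boundaries with NO decimal carry */
    return t2 - ((nc >> 2) | (nc >> 3));  /* subtract the leftover 6 from those nibbles */
}
```

The XOR identity (sum ^ addend1 ^ addend2 gives the carry-in at every
bit) is what a "saved nibble carries" instruction would hand back
directly, which is why that one instruction simplifies things so much.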

In general, using the compiler to determine special cases at 
compile time and generating the best code results in faster code 
than supporting general decimal arithmetic instructions in 
hardware.  Not having decimal instructions is not a 7 to 1 
performance penalty at all!  Using the decimal instructions as 
implemented on many machines is a lose.

Aren't these just RISC ideas?

ok@cs.mu.oz.au (Richard O'Keefe) (09/23/89)

In article <943@rd1632.Dayton.NCR.COM>, otto@rd1632.Dayton.NCR.COM (Jerome A. Otto) writes:
> (2)   Convert to binary (using hardware instructions if
>       available), add binary, convert from binary (using 
>       hardware if available)

> Very seldom, if ever, is (2) fastest due to the problem of 
> converting from binary to decimal.  This conversion requires 
> N divides or N+1 multiplies (N = number of digits).

I'll say it again:  conversion between N digit decimal and binary,
when N is known at compile time, doesn't require **ANY** multiplications
or divisions.  I'll spell out how to do decimal->binary one digit at a
time.  Let Dn, ..., D1, D0 be decimal digits 0..9.  (They might have
been ASCII or EBCDIC originally, with the high bits masked off.  Or
they might have been nibbles.)  Calculate
	X = D0 + pot[1][D1] + ... + pot[n][Dn]
where pot[i][j] is *PRECOMPUTED* as j*10**i. Overflow is only possible
in the last step.  To support COBOL's 18-digit requirement, 18 * 10 * 64
bits of tables will suffice.  At the price of larger tables, it is
possible to use fewer memory references.  There are other optimisations
possible.  Binary->decimal conversion is a bit harder, but it is
possible to do it with ~N memory references and ~N binary adds or
subtracts, and again, it can be optimised.
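The table scheme above might look like this in C -- the name `pot` is
from the posting, the rest is my sketch.  Note the multiplies happen
only once, when the table is filled, never on the conversion path:

```c
#include <stdint.h>

static uint64_t pot[18][10];    /* pot[i][j] == j * 10**i, precomputed */

void init_pot(void)
{
    for (int i = 0; i < 18; i++)
        for (int j = 0; j < 10; j++) {
            uint64_t p = (uint64_t)j;
            for (int k = 0; k < i; k++)
                p *= 10;        /* multiplies happen here, once, at startup */
            pot[i][j] = p;
        }
}

/* Convert n ASCII digits (most significant first) to binary
 * using only table lookups and adds. */
uint64_t dec_to_bin(const char *digits, int n)
{
    uint64_t x = 0;
    for (int i = 0; i < n; i++)
        x += pot[n - 1 - i][digits[i] - '0'];
    return x;
}
```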

henry@utzoo.uucp (Henry Spencer) (09/24/89)

In article <943@rd1632.Dayton.NCR.COM> otto@rd1632.UUCP (Jerome A. Otto) writes:
>(2)   Convert to binary (using hardware instructions if
>      available), add binary, convert from binary (using 
>      hardware if available)
>
>Very seldom, if ever, is (2) fastest due to the problem of 
>converting from binary to decimal.  This conversion requires 
>N divides or N+1 multiplies (N = number of digits)...

Nonsense.  The multiplies and divides are done at compile time; the
conversion is done with table lookup or the decision-tree equivalent.
Even the simplest approach requires roughly 4N comparisons, a bit of data
movement, and no multiplies or divides at all.  Byte-at-a-time table
lookup reduces this nearly to zero.
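To make the no-multiply, no-divide claim concrete, here is a sketch of
the simplest form -- per-digit repeated subtraction against
precomputed powers of ten, a handful of compares per digit.  (The
faster byte-at-a-time table lookup is left out.)

```c
#include <stdint.h>

/* Convert x to n ASCII digits (most significant first) using only
 * compares and subtracts against precomputed powers of ten. */
void bin_to_dec(uint64_t x, char *out, int n)
{
    static const uint64_t p10[18] = {
        1ULL, 10ULL, 100ULL, 1000ULL, 10000ULL, 100000ULL,
        1000000ULL, 10000000ULL, 100000000ULL, 1000000000ULL,
        10000000000ULL, 100000000000ULL, 1000000000000ULL,
        10000000000000ULL, 100000000000000ULL, 1000000000000000ULL,
        10000000000000000ULL, 100000000000000000ULL
    };
    for (int i = 0; i < n; i++) {
        uint64_t p = p10[n - 1 - i];
        int d = 0;
        while (x >= p) {    /* at most 9 iterations per digit */
            x -= p;
            d++;
        }
        out[i] = (char)('0' + d);
    }
    out[n] = '\0';
}
```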

Agreed, however, that playing tricks with the binary adder is a viable
alternative if very little arithmetic is being done.
-- 
"Where is D.D. Harriman now,   |     Henry Spencer at U of Toronto Zoology
when we really *need* him?"    | uunet!attcan!utzoo!henry henry@zoo.toronto.edu

mash@mips.COM (John Mashey) (09/25/89)

In article <943@rd1632.Dayton.NCR.COM> otto@rd1632.UUCP (Jerome A. Otto) writes:
>A comment on COBOL decimal support in hardware...
.....Lots of good stuff, from somebody who really knows a lot about
this topic.
....
>Unsigned ASCII characters can be added using a 32-bit binary adder 
>by the following algorithm:
>
>1. Add the constant 96969696 (base16) to one of the operands.  This 
....etc.
This is what we (and I suspect many others) do.
......
>In general, using the compiler to determine special cases at 
>compile time and generating the best code results in faster code 
>than supporting general decimal arithmetic instructions in 
>hardware.  Not having decimal instructions is not a 7 to 1 
>performance penalty at all!  Using the decimal instructions as 
>implemented on many machines is a lose.

>Aren't these just RISC ideas?

Yep, although it would appear that different RISCs have placed
differing amounts of emphasis on such support, which would include:
	a) Decimal arithmetic assists.
	b) Unaligned operations, which are far more important in COBOL
	(and PL/1) than in C/PASCAL/FORTRAN.

OPINION: in order of support for COBOL, I think the following is
	a reasonable ordering, but more comments from people who know
	more would help:

	1. HP PA: has Decimal Correct and Intermediate Decimal Correct,
	Store unaligned (which stores 1-4 bytes, anywhere in a word)
	(maybe somebody from HP could explain a little more?)
	See COMPUTER, January 1989 article by Ruby Lee.
	2. MIPS R3000: has Load Word (left & Right), Store Word (Left &
	Right), to help all of the unaligned operations and moves.
	3. Motorola 88K - some of the bitfield operations might be helpful.
	4. SPARC

COBOL was clearly a major consideration for HP; at least somewhat for R3000.
-- 
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: 	{ames,decwrl,prls,pyramid}!mips!mash  OR  mash@mips.com
DDD:  	408-991-0253 or 408-720-1700, x253
USPS: 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086