otto@rd1632.Dayton.NCR.COM (Jerome A. Otto) (09/23/89)
A comment on COBOL decimal support in hardware...
I haven't looked at this area in many years but I doubt if
that much has changed.
There are many ways to implement COBOL decimal arithmetic
(both ASCII and packed decimal). Common ways are:
(1) Implement all cases in one or more runtime libraries.
(2) Convert to binary (using hardware instructions if
available), add binary, convert from binary (using
hardware if available)
(3) Use hardware decimal instructions.
(4) Generate inline code to use the "trick" to add decimal
operands with a binary adder. The "trick" is well
known but maybe not known or obvious to those reading
this newsgroup.
Real compilers might use more than one of the above methods
depending on if "special, high-occurring" cases are handled
and the performance of decimal instructions on the target
hardware.
(1) is very slow but might be reasonable since decimal
arithmetic might not contribute much to overall application
performance. The 10% decimal arithmetic in COBOL might not
seem like much but if poorly implemented, decimal arithmetic
might require 60% of the runtime cycles.
Very seldom, if ever, is (2) fastest due to the problem of
converting from binary to decimal. This conversion requires
N divides or N+1 multiplies (N = number of digits). In an
optimizing compiler, a case might occur that a converted
result could be used many times before a conversion back to
decimal would be required. In this case, (2) might be the
fastest method.
Decimal instructions for COBOL have been implemented in many
machine architectures. However, most implementations are
done in microcode and often are not very fast. Part of the
problem is that the implementations try to implement general
decimal arithmetic instructions that can handle 1-18 digits,
scaling, different signs, etc. The most common cases in
COBOL are arithmetic with like items.-- these are known at
compile time. Things that are known at compile time are
pushed off to runtime -- at the expense of performance. The
spirit of RISC would seem to indicate that operations should
be broken down at compile time to its simplest components and
only that code be generated. Fortunately for COBOL, the most
frequent cases that occur can be done with inline code using
a binary adder using (4).
An example of (4) from a 68K analysis I did in 1984 is:
Unsigned ASCII characters can be added using a 32-bit binary adder
by the following algorithm:
1. Add the constant 96969696 (base16) to one of the operands. This
"biases" the operand so that carries will propagate. If one of the
operands is a constant, then the constant will be pre-biased at
compile time and this step can be omitted.
2. Add the other operand and the biased operand in binary. Carries
will propagate but the resulting digits require correction. If a
carry out did occur from a digit, the result requires the addition
of 30 (base16). If a carry out did not occur, then the carry
propagate bits must be removed before the add of 30 (base16).
3.Correct the result by masking out the carry propagate bits and
ORing 3030303030 (base16)
The operands are loaded, scaled, and padded depending on the
result. For example, the code for a 68K to add two ASCII digits to
two ASCII digits giving a four digit result would be:
MOVE.L D6,D0 . ASCII zero to D0
MOVE.W op1(A5),D0 . Load first operand
MOVE.L D6,D0 . ASCII zero to D1
MOVE.W op2(A5),D1 . Load second operand
ADD.L #$96969696,D0 . Bias op1
ADD.L D1,D0 . Binary add
MOVE.L D0,D1 .
AND.L #$F0F0F0F0,D1 .
SUB.L D1,D0 .
LSR.L #3,D1 .
AND.L #$06060606,D1 .
SUB.L D1,D0 . Remove propagate bits
OR,L D6,D0 . Form ASCII digits
MOVE.L D0,op3(A5) . Store results.
The algorithm for packed decimal operands is similar except that
computing the correction for the inner carries is more complicated.
Some architectures (NCR at least) implement a special instruction
to correct the inner carries by using the saved nibble carries from
a previous arithmetic instruction. This one simple instruction
makes the packed decimal algorithm simpler and faster.
In general, using the compiler to determine special cases at
compile time and generating the best code results in faster code
than supporting a general decimal arithmetic instructions in
hardware. Not having decimal instructions is not a 7 to 1
performance penalty at all! Using the decimal instructions as
implemented on many machines is a lose.
Aren't these just RISC ideas?ok@cs.mu.oz.au (Richard O'Keefe) (09/23/89)
In article <943@rd1632.Dayton.NCR.COM>, otto@rd1632.Dayton.NCR.COM (Jerome A. Otto) writes: > (2) Convert to binary (using hardware instructions if > available), add binary, convert from binary (using > hardware if available) > Very seldom, if ever, is (2) fastest due to the problem of > converting from binary to decimal. This conversion requires > N divides or N+1 multiplies (N = number of digits). I'll say it again: conversion between N digit decimal and binary, when N is known at compile time, doesn't require **ANY** multiplications or divisions. I'll spell out how to do decimal->binary one digit at a time. Let Dn, ..., D1, D0 be decimal digits 0..9. (They might have been ASCII or EBCDIC originally, with the high bits masked off. Or they might have been nibbles.) Calculate X = D0 + pot[1][D1] + ... + pot[n][Dn] where pot[i][j] is *PRECOMPUTED* as j*10**i. Overflow is only possible in the last step. To support COBOL's 18-digit requirement, 18 * 10 * 64 bits of tables will suffice. At the price of larger tables, it is possible to use fewer memory references. There are other optimisations possible. Binary->decimal conversion is a bit harder, but it is possible to do it with ~N memory references and ~N binary adds or subtracts, and again, it can be optimised.
henry@utzoo.uucp (Henry Spencer) (09/24/89)
In article <943@rd1632.Dayton.NCR.COM> otto@rd1632.UUCP (Jerome A. Otto) writes: >(2) Convert to binary (using hardware instructions if > available), add binary, convert from binary (using > hardware if available) > >Very seldom, if ever, is (2) fastest due to the problem of >converting from binary to decimal. This conversion requires >N divides or N+1 multiplies (N = number of digits)... Nonsense. The multiplies and divides are done at compile time; the conversion is done with table lookup or the decision-tree equivalent. Even the simplest approach requires roughly 4N comparisons, a bit of data movement, and no multiplies or divides at all. Byte-at-a-time table lookup reduces this nearly to zero. Agreed, however, that playing tricks with the binary adder is a viable alternative if very little arithmetic is being done. -- "Where is D.D. Harriman now, | Henry Spencer at U of Toronto Zoology when we really *need* him?" | uunet!attcan!utzoo!henry henry@zoo.toronto.edu
mash@mips.COM (John Mashey) (09/25/89)
In article <943@rd1632.Dayton.NCR.COM> otto@rd1632.UUCP (Jerome A. Otto) writes: >A comment on COBOL decimal support in hardware... .....Lots of good stuff, from somebody who really knows a lot about this topic. .... >Unsigned ASCII characters can be added using a 32-bit binary adder >by the following algorithm: > >1. Add the constant 96969696 (base16) to one of the operands. This ....etc. This is what we (and I suspect many others) do. ...... >In general, using the compiler to determine special cases at >compile time and generating the best code results in faster code >than supporting a general decimal arithmetic instructions in >hardware. Not having decimal instructions is not a 7 to 1 >performance penalty at all! Using the decimal instructions as >implemented on many machines is a lose. >Aren't these just RISC ideas? Yep, Although it would appear that differing RISCs appear to have made differing amounts of emphasis on such support, which would include: a) Decimal arithmetic assists. b) Unaligned operations, which are far more important in COBOL (and PL/1) than in C/PASCAL/FORTRAN. OPINION: in order of support for COBOL, I think the following is a reasonable ordering, but more comments from people who know more would help: 1. HP PA: has Decimal Correct and Intermediate Decimal Correct, Store unaligned (which stores 1-4 bytes, anywhere in word) (maybe somebody from HO could explain a little more?) See COMPUTER, January 1989 article by Ruby Lee. 2. MIPS R3000: has Load Word (left & Right), Store Word (Left & Right), to help all of the unaligned operations and moves. 3. Motorola 88K - some of the bitfield operations might be helpful. 4. SPARC COBOL was clearly a major consideration for HP; at least somewhat for R3000. -- -john mashey DISCLAIMER: <generic disclaimer, I speak for me only, etc> UUCP: {ames,decwrl,prls,pyramid}!mips!mash OR mash@mips.com DDD: 408-991-0253 or 408-720-1700, x253 USPS: MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086