otto@rd1632.Dayton.NCR.COM (Jerome A. Otto) (09/23/89)
A comment on COBOL decimal support in hardware... I haven't looked at this area in many years but I doubt if that much has changed. There are many ways to implement COBOL decimal arithmetic (both ASCII and packed decimal). Common ways are: (1) Implement all cases in one or more runtime libraries. (2) Convert to binary (using hardware instructions if available), add binary, convert from binary (using hardware if available) (3) Use hardware decimal instructions. (4) Generate inline code to use the "trick" to add decimal operands with a binary adder. The "trick" is well known but maybe not known or obvious to those reading this newsgroup. Real compilers might use more than one of the above methods depending on if "special, high-occurring" cases are handled and the performance of decimal instructions on the target hardware. (1) is very slow but might be reasonable since decimal arithmetic might not contribute much to overall application performance. The 10% decimal arithmetic in COBOL might not seem like much but if poorly implemented, decimal arithmetic might require 60% of the runtime cycles. Very seldom, if ever, is (2) fastest due to the problem of converting from binary to decimal. This conversion requires N divides or N+1 multiplies (N = number of digits). In an optimizing compiler, a case might occur that a converted result could be used many times before a conversion back to decimal would be required. In this case, (2) might be the fastest method. Decimal instructions for COBOL have been implemented in many machine architectures. However, most implementations are done in microcode and often are not very fast. Part of the problem is that the implementations try to implement general decimal arithmetic instructions that can handle 1-18 digits, scaling, different signs, etc. The most common cases in COBOL are arithmetic with like items.-- these are known at compile time. Things that are known at compile time are pushed off to runtime -- at the expense of performance. The spirit of RISC would seem to indicate that operations should be broken down at compile time to its simplest components and only that code be generated. Fortunately for COBOL, the most frequent cases that occur can be done with inline code using a binary adder using (4). An example of (4) from a 68K analysis I did in 1984 is: Unsigned ASCII characters can be added using a 32-bit binary adder by the following algorithm: 1. Add the constant 96969696 (base16) to one of the operands. This "biases" the operand so that carries will propagate. If one of the operands is a constant, then the constant will be pre-biased at compile time and this step can be omitted. 2. Add the other operand and the biased operand in binary. Carries will propagate but the resulting digits require correction. If a carry out did occur from a digit, the result requires the addition of 30 (base16). If a carry out did not occur, then the carry propagate bits must be removed before the add of 30 (base16). 3.Correct the result by masking out the carry propagate bits and ORing 3030303030 (base16) The operands are loaded, scaled, and padded depending on the result. For example, the code for a 68K to add two ASCII digits to two ASCII digits giving a four digit result would be: MOVE.L D6,D0 . ASCII zero to D0 MOVE.W op1(A5),D0 . Load first operand MOVE.L D6,D0 . ASCII zero to D1 MOVE.W op2(A5),D1 . Load second operand ADD.L #$96969696,D0 . Bias op1 ADD.L D1,D0 . Binary add MOVE.L D0,D1 . AND.L #$F0F0F0F0,D1 . SUB.L D1,D0 . LSR.L #3,D1 . AND.L #$06060606,D1 . SUB.L D1,D0 . Remove propagate bits OR,L D6,D0 . Form ASCII digits MOVE.L D0,op3(A5) . Store results. The algorithm for packed decimal operands is similar except that computing the correction for the inner carries is more complicated. Some architectures (NCR at least) implement a special instruction to correct the inner carries by using the saved nibble carries from a previous arithmetic instruction. This one simple instruction makes the packed decimal algorithm simpler and faster. In general, using the compiler to determine special cases at compile time and generating the best code results in faster code than supporting a general decimal arithmetic instructions in hardware. Not having decimal instructions is not a 7 to 1 performance penalty at all! Using the decimal instructions as implemented on many machines is a lose. Aren't these just RISC ideas?
ok@cs.mu.oz.au (Richard O'Keefe) (09/23/89)
In article <943@rd1632.Dayton.NCR.COM>, otto@rd1632.Dayton.NCR.COM (Jerome A. Otto) writes: > (2) Convert to binary (using hardware instructions if > available), add binary, convert from binary (using > hardware if available) > Very seldom, if ever, is (2) fastest due to the problem of > converting from binary to decimal. This conversion requires > N divides or N+1 multiplies (N = number of digits). I'll say it again: conversion between N digit decimal and binary, when N is known at compile time, doesn't require **ANY** multiplications or divisions. I'll spell out how to do decimal->binary one digit at a time. Let Dn, ..., D1, D0 be decimal digits 0..9. (They might have been ASCII or EBCDIC originally, with the high bits masked off. Or they might have been nibbles.) Calculate X = D0 + pot[1][D1] + ... + pot[n][Dn] where pot[i][j] is *PRECOMPUTED* as j*10**i. Overflow is only possible in the last step. To support COBOL's 18-digit requirement, 18 * 10 * 64 bits of tables will suffice. At the price of larger tables, it is possible to use fewer memory references. There are other optimisations possible. Binary->decimal conversion is a bit harder, but it is possible to do it with ~N memory references and ~N binary adds or subtracts, and again, it can be optimised.
henry@utzoo.uucp (Henry Spencer) (09/24/89)
In article <943@rd1632.Dayton.NCR.COM> otto@rd1632.UUCP (Jerome A. Otto) writes: >(2) Convert to binary (using hardware instructions if > available), add binary, convert from binary (using > hardware if available) > >Very seldom, if ever, is (2) fastest due to the problem of >converting from binary to decimal. This conversion requires >N divides or N+1 multiplies (N = number of digits)... Nonsense. The multiplies and divides are done at compile time; the conversion is done with table lookup or the decision-tree equivalent. Even the simplest approach requires roughly 4N comparisons, a bit of data movement, and no multiplies or divides at all. Byte-at-a-time table lookup reduces this nearly to zero. Agreed, however, that playing tricks with the binary adder is a viable alternative if very little arithmetic is being done. -- "Where is D.D. Harriman now, | Henry Spencer at U of Toronto Zoology when we really *need* him?" | uunet!attcan!utzoo!henry henry@zoo.toronto.edu
mash@mips.COM (John Mashey) (09/25/89)
In article <943@rd1632.Dayton.NCR.COM> otto@rd1632.UUCP (Jerome A. Otto) writes: >A comment on COBOL decimal support in hardware... .....Lots of good stuff, from somebody who really knows a lot about this topic. .... >Unsigned ASCII characters can be added using a 32-bit binary adder >by the following algorithm: > >1. Add the constant 96969696 (base16) to one of the operands. This ....etc. This is what we (and I suspect many others) do. ...... >In general, using the compiler to determine special cases at >compile time and generating the best code results in faster code >than supporting a general decimal arithmetic instructions in >hardware. Not having decimal instructions is not a 7 to 1 >performance penalty at all! Using the decimal instructions as >implemented on many machines is a lose. >Aren't these just RISC ideas? Yep, Although it would appear that differing RISCs appear to have made differing amounts of emphasis on such support, which would include: a) Decimal arithmetic assists. b) Unaligned operations, which are far more important in COBOL (and PL/1) than in C/PASCAL/FORTRAN. OPINION: in order of support for COBOL, I think the following is a reasonable ordering, but more comments from people who know more would help: 1. HP PA: has Decimal Correct and Intermediate Decimal Correct, Store unaligned (which stores 1-4 bytes, anywhere in word) (maybe somebody from HO could explain a little more?) See COMPUTER, January 1989 article by Ruby Lee. 2. MIPS R3000: has Load Word (left & Right), Store Word (Left & Right), to help all of the unaligned operations and moves. 3. Motorola 88K - some of the bitfield operations might be helpful. 4. SPARC COBOL was clearly a major consideration for HP; at least somewhat for R3000. -- -john mashey DISCLAIMER: <generic disclaimer, I speak for me only, etc> UUCP: {ames,decwrl,prls,pyramid}!mips!mash OR mash@mips.com DDD: 408-991-0253 or 408-720-1700, x253 USPS: MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086