[comp.unix.wizards] function "prratio"

jik@athena.mit.edu (Jonathan I. Kamens) (06/07/90)

  (Note the Followup-To to comp.unix.wizards -- which one to go to is
pretty much a toss-up, so I chose the one I read. :-)

  In the BSD sources for the program compress (ucb/compress/compress.c),
there exists the following function:

    prratio(stream, num, den)
    FILE *stream;
    long int num, den;
    {
	    register int q;			/* Doesn't need to be long */

	    if(num > 214748L) {		/* 2147483647/10000 */
		    q = num / (den / 10000L);
	    } else {
		    q = 10000L * num / den;	/* Long calculations, though */
	    }
	    if (q < 0) {
		    putc('-', stream);
		    q = -q;
	    }
	    fprintf(stream, "%d.%02d%%", q / 100, q % 100);
    }

This function accomplishes using only integer arithmetic what the
following function accomplishes much more clearly, using floating point:

    prratio(stream, num, den)
    FILE *stream;
    long int num, den;
    {
	    double ratio;
	
	    ratio = (double) num / (double) den;

	    fprintf(stream, "%0.2f%%", ratio * 100.0);
    }

  Not only is the former function much less clear than the latter, but
the former also screws up in some cases, while the latter doesn't.  On
a VAXstation 3100, for example, plugging in 905762 and 908964 for num
and den in both functions gives you 100.07% from the former function
and 99.65% from the latter (the latter is correct).

  Keen observers will also notice that the constant in the first function
is quite architecture-specific: 214748 is 2147483647/10000, i.e., it
assumes a 32-bit long int.  Sigh.

  My question is, WHY were things done as shown in the first example?  I
can think of several reasons, but I'm not able to convince myself 100%
that any of them were applicable when compress was written, and I'm even
less able to convince myself that any of them are applicable now :-). 
Some of the reasons I've thought of:

1. Integer arithmetic is faster.  Irrelevant when the function only gets
   called once or twice per compression, and that is the case here.

2. Some architectures don't have floating point arithmetic.  Yeah, right.

3. The floating point arithmetic on some architectures is so inaccurate that
   the code above produces better results.

Well, what do y'all think?  WHY was the function originally written as
shown in the first example above, and does it need to stay that way now,
especially since it produces incorrect results?

Jonathan Kamens			              USnail:
MIT Project Athena				11 Ashford Terrace
jik@Athena.MIT.EDU				Allston, MA  02134
Office: 617-253-8495			      Home: 617-782-0710

jaw@riacs.edu (James A. Woods) (06/08/90)

good eye.
the original was floating point, like your suggestion, and
i wrote it knowing the floats were not a cpu drain.

it was changed by ken turkowski (still of apple?) to accommodate
a pdp 11 without floating point hardware.

i agree that it should revert to floating point.

better yet would be to speedup dan bernstein's 'squeeze',
a miller/wegman/ziv/lempel coder which has better c-rates. 

ames!jaw

brnstnd@kramden.acf.nyu.edu (06/13/90)

In article <1990Jun7.184525.962@riacs.edu> jaw@riacs.edu (James A. Woods) writes:
  [ calculation of ratios in compress ]
> i agree that it should revert to floating point.

I disagree. Say you want to print the floating-point ratio of integers x
and y, rounded down and accurate to 4 digits. 10000 * (x / y) is fast;
(10000 * (x % y)) / y is easy with appropriate shifts, with no chance of
overflow; and that's it. On anything other than a Cray this is much
faster than any floating-point calculation, not to mention more
reliable. (Of course, you can just take the squeeze 1.7 approach of
assuming that 10000 * x won't overflow. :-) )
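
One way to do the "appropriate shifts" (a sketch of the idea only, with
made-up names, not the code squeeze actually uses): run the multiply by
10000 one bit at a time, reducing mod y after every doubling, so that no
intermediate value ever exceeds 2*y:

    #include <stdio.h>

    /* floor(m * r / y) for 0 <= r < y, without ever forming m * r.
     * (Sketch only; mulratio and prratio2 are names made up here.)
     * The bits of m are scanned from the top down; the partial
     * quotient/remainder pair is doubled ("shifted") at each step
     * and reduced mod y, and r is added in whenever the current
     * bit of m is set.  Nothing ever exceeds 2*y, so there is no
     * overflow as long as y < LONG_MAX/2.
     */
    long mulratio(unsigned long m, long r, long y)
    {
            long q = 0, rem = 0;
            unsigned long bit;

            for (bit = 1; bit <= m / 2; bit <<= 1)  /* highest bit of m */
                    ;
            for (; bit != 0; bit >>= 1) {
                    q += q;                         /* shift left ...   */
                    rem += rem;
                    if (rem >= y) { rem -= y; q++; }
                    if (m & bit) {                  /* ... add r on a 1 */
                            rem += r;
                            if (rem >= y) { rem -= y; q++; }
                    }
            }
            return q;
    }

    /* print x/y as a percentage, two digits after the point, rounded
     * down; x and y are assumed nonnegative, y > 0
     */
    void prratio2(FILE *stream, long x, long y)
    {
            long q = 10000L * (x / y) + mulratio(10000UL, x % y, y);

            fprintf(stream, "%ld.%02ld%%", q / 100, q % 100);
    }

With the 905762/908964 example from the original article this prints
99.64% rather than 99.65%, since everything is rounded down.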

The coming version of squeeze still prints to only 1% accuracy, since I
don't find further digits relevant; to calculate (100 * (x % y)) / y, it
does a binary search (with power-of-two deltas). No kidding. It's pretty
damn fast, too.
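
For the curious, the power-of-two binary search is roughly this (again
just a sketch with made-up names, not the actual squeeze source): the
answer q = (100 * r) / y, with r = x % y, lies in [0,100), so build it
up by trying the deltas 64, 32, ..., 1 and keeping each delta for which
(q + delta)*y <= 100*r still holds; the test is rearranged so that
nothing close to 100*r or 100*y is ever computed.

    #include <stdio.h>

    /* does c * y <= 100 * r hold, for 0 <= r < y?  Computed as
     * ceil(c*y/100) <= r with y split into 100*a + b, so the largest
     * intermediate value stays under 2*y (safe while y < LONG_MAX/2).
     */
    static int fits(long c, long r, long y)
    {
            long a = y / 100, b = y % 100;

            return c * a + (c * b + 99) / 100 <= r;
    }

    /* largest q with q*y <= 100*r, i.e. (100*r)/y rounded down, found
     * by adding power-of-two deltas whenever they still fit
     */
    long pct(long r, long y)
    {
            long q = 0, d;

            for (d = 64; d >= 1; d >>= 1)
                    if (fits(q + d, r, y))
                            q += d;
            return q;
    }

    int main(void)
    {
            /* x % y with the numbers from the original article */
            printf("%ld%%\n", pct(905762L, 908964L));   /* prints 99% */
            return 0;
    }

Seven probes (64 down to 1) suffice because q is always below 128.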

> better yet would be to speedup dan bernstein's 'squeeze',
> a miller/wegman/ziv/lempel coder which has better c-rates. 

Eliminating unsqueeze's I/O bottlenecks and making it faster than
uncompress, adding compress compatibility, improving portability, and
using a new extensible hash scheme have all come first. I suppose I'll
get around to speeding up squeeze itself at some point...

---Dan