[comp.sys.m68k] ML FP

krooglik@gondwana10.ecr.mu.oz (Alex KROOGLIK) (07/05/90)

  Can anyone help me. I have been programming in 68K on the Amiga for
around 2 years now and I have finally taken the plunge and tried to write
some FP software. However, I am finding this extremely diffcult. I have
considered using BCD but I find it too slow. What I would like to know
is if anyone has written FP software that is SUPER accurate. 128 bits
doesn't particularly concern me. What I basically want to know is:

               1. How do I turn a decimal number (eg 1.23123484) into Binary.
	       2. How do I then convert the binary into FP (ie Normalise it)

 I know all of the necessary rules for performing arithmetic, so I would
really appreciate anyones help on the above.


						Alex Krooglik
						University of Melbourne
						Melbourne, AUSTRALIA
						[128.250.64.1]

mcmahan@netcom.UUCP (Dave Mc Mahan) (07/10/90)

 In a previous article, krooglik@gondwana10.ecr.mu.oz (Alex KROOGLIK) writes:
>
>  Can anyone help me. I have been programming in 68K on the Amiga for
>around 2 years now and I have finally taken the plunge and tried to write
>some FP software. However, I am finding this extremely diffcult. I have
>considered using BCD but I find it too slow. What I would like to know
>is if anyone has written FP software that is SUPER accurate. 128 bits
>doesn't particularly concern me. What I basically want to know is:
>
>               1. How do I turn a decimal number (eg 1.23123484) into Binary.

I'm not sure if this is what you want, but it works for me.  This routine
was written off the top of my head and may have a few syntax errors.  The
concept is quite simple, though.  I assume that when you say "decimal number"
you mean a string that contains a decimal number, which you then wish to
turn into an internal floating point form.  Integer decimal numbers are a
trivial subset of the following routine.

float base10_to_float(input)
char	*input;
{
	int	decimal_pt_seen;
	float	output_value,
			scale_factor;

	decimal_pt_seen = FALSE;
	output_value    = 0.0;
	scale_factor    = 0.1;

	while (*input)
	{
			switch (*input)
			{
			case '.' :
				if (!decimal_pt_seen)
				{
					decimal_pt_seen = TRUE;
				}
				else
				{
					puts("Too many decimal points in your value\n");
					return -999999.99999;  /*  Or whatever signifies an error  */
				}
				break;

			case '0' :
			case '1' :
			case '2' :
			case '3' :
			case '4' :
			case '5' :
			case '6' :
			case '7' :
			case '8' :
			case '9' :
				if (!decimal_pt_seen)
				{
					return_value = return_value * 10.0;
					return_value = return_value + (*input - '0');
				}
				else
				{
					return_value = return_value + scale_factor * (*input - '0');
					scale_factor = scale_factor / 10.0;
				}
				break;

			default :
				printf("Unrecognized character value in the string.\n");
				return -999999.99999;  /*  Or whatever signifies an error  */
		}
		input++;
	}
	return return_value;
}

>	       2. How do I then convert the binary into FP (ie Normalise it)

To convert to the internal format of the floating point chip, you need to know
the specific bit patterns and length of the chip in use.  There are bits for
sign and representation of infinities, bits for the mantissa and bits for the
exponent.  Each chip family lays these out in a different form that they feel
gives the best results for what they have optimized to.  I assume you mean the
format used by the 68881/2 chip family.  I can't seem to find my book, so I
will not be able to help you out on this one.  Obviously, the best way is if
the compiler you are using (or are you using an assembler?) supports the
format of the chip you are using.  Otherwise, you need to do some major
writing of routines that convert between ASCII, FP format, and integer formats.

For super accurate stuff (many digits of precision) there are several
approaches.  From you post, it seems that speed is essential.  Without knowing
more about your specific task, I'm afraid I can't help you.

>						Alex Krooglik

   -dave

tim@proton.amd.com (Tim Olson) (07/13/90)

|  In a previous article, krooglik@gondwana10.ecr.mu.oz (Alex KROOGLIK) writes:
| >
| >  Can anyone help me. I have been programming in 68K on the Amiga for
| >around 2 years now and I have finally taken the plunge and tried to write
| >some FP software. However, I am finding this extremely diffcult. I have
| >considered using BCD but I find it too slow. What I would like to know
| >is if anyone has written FP software that is SUPER accurate. 128 bits
						^^^^^^^^^^^^^^^

| >               1. How do I turn a decimal number (eg 1.23123484) into Binary.


In article <11865@netcom.UUCP> mcmahan@netcom.UUCP (Dave Mc Mahan) writes:
| float base10_to_float(input)
| char	*input;
| {
| 	int	decimal_pt_seen;
| 	float	output_value,
| 			scale_factor;
| 
| 	decimal_pt_seen = FALSE;
| 	output_value    = 0.0;
| 	scale_factor    = 0.1;

	.
	.
	.

The problem with this is that it uses floating point, and values that
are not exactly representable in binary floating-point implementations
(i.e. 0.1).  Here is a more accurate way to do this conversion (taken
from an assembler I wrote):

-----------------------------------------------------------------------------
/* atoieee -- convert ascii to IEEE representation */
atoieee(type)

int type;
{
	int	i,
		ch,		/* character for conversion */
		m_sign,		/* mantissa sign */
		e_sign,		/* exponent sign */
		decpt,		/* decimal point */
		d_exp,		/* decimal exponent */
		c_exp,		/* calculated decimal exponent */
		exp;		/* binary exponent */
	
	WORD	mant1,		/* mantissa word 1 */
		mant2,		/* mantissa word 2 */
		tmp1,		/* temporary mantissa */
		tmp2;
	m_sign = e_sign = decpt = d_exp = c_exp = exp = 0;
	mant1 = mant2 = 0;

	/* skip optional white space */
	if (!skipwhite()) {
		error(XPFLOAT);
		return (FALSE);
	}
	/* get optional mantissa sign */
	ch = sgetc();
	if (ch == '+')
		ch = sgetc();	/* skip over plus sign */
	else if (ch == '-') {
		ch = sgetc();
		++m_sign;
	}
	/* convert mantissa */
	if ((ch != '.') && (ch < '0' || ch > '9')) {
		error(XPFLOAT);
		return (FALSE);
	}
	while ((ch == '.') || (ch >= '0' && ch <= '9')) {
		if (ch == '.') {
			if (decpt) {
				error(XPFLOAT);
				return (FALSE);
			}
			else
				++decpt;
		}
		else {
			if (mant2 < 214748364) {	/* check for overflow */
				
				/* multiply mant1, mant2 by 10 */
				dplshift(&mant1,&mant2,1);
				tmp1 = mant1;
				tmp2 = mant2;
				dplshift(&mant1,&mant2,2);
				dpadd(&mant1,&mant2,tmp1,tmp2);

				/* add in the digit */
				dpadd(&mant1,&mant2,(WORD)ch-'0',(WORD)0);
			}
			else	/* ignore digit, bump exponent */
				++c_exp;
			
			if (decpt)
				--c_exp;
		}
	ch = sgetc();
	}
	/* check for an exponent */
	if (ch == 'e' || ch == 'E') {
		ch = sgetc();

		/* get optional exponent sign */
		if (ch == '+')
			ch = sgetc();
		else if (ch == '-') {
			++e_sign;
			ch = sgetc();
		}
		if (ch < '0' || ch > '9') {
			error(XPFLOAT);
			return (FALSE);
		}
		while (ch >= '0' && ch <= '9') {
			if (d_exp < 214748364)
				d_exp = d_exp*10 + (ch - '0');
			ch = sgetc();
		}
	}
	/* put back last character so gettok() can use it */
	sungetc();

	if (e_sign)
		d_exp = -d_exp;
	d_exp += c_exp;		/* add calculated exponent from decimal point */

	/* calculate binary exponent */
	if (mant1 || mant2) {	/* no calculation if mantissa is zero */
		if (d_exp >= 0) {	/* scale up */
			while (d_exp != 0) {
				--d_exp;
				while (mant2 > 429496729) {	/* shift down */
					dprshift(&mant1,&mant2,1);
					++exp;
				}
				tmp1 = mant1;
				tmp2 = mant2;
				dplshift(&mant1,&mant2,2);
				dpadd(&mant1,&mant2,tmp1,tmp2);
				++exp;
			}
		}
		else {	/* scale down */
			while (d_exp != 0) {
				++d_exp;
				while ((mant2 & 0x80000000) == 0) {
					dplshift(&mant1,&mant2,1);
					--exp;
				}
				tmp1 = 0;	/* set remainder to 0 */
				tmp2 = 0;
				for (i=0; i<64; ++i) {	/* do divide */
					qplshift(&tmp1,&tmp2,&mant1,&mant2,1);
					dpsub(&tmp1,&tmp2,(WORD)5,(WORD)0);
					if (tmp2 & 0x80000000)
						dpadd(&tmp1,&tmp2,(WORD)5,
						    (WORD)0);
					else
						mant1 |= 1;
				}
				--exp;
			}
		}
		while ((mant2 & 0x80000000) == 0) { /* now normalize value */
			dplshift(&mant1,&mant2,1);
			--exp;
		}
		dplshift(&mant1,&mant2,1); /* shift one more time to get 1.f */
		--exp;
		exp += 64;
		if (type == FLOAT) {
			Floatval1 = m_sign << 31;
			if (exp > 127 || exp < -128)
				error(EOOR);
			exp += 127;	/* bias by 127 for float */
			Floatval1 |= (exp & 0xff) << 23;
			Floatval1 |= (mant2 & 0xfffffe00) >> 9;
		}
		else {	/* convert a double */
			Floatval1 = m_sign << 31;
			if (exp > 1023 || exp < -1024)
				error(EOOR);
			exp += 1023;	/* bias by 1023 for double */
			Floatval1 |= (exp & 0xfff) << 20;
			Floatval1 |= mant2 >> 12;
			Floatval2 = (mant2 & 0xfff) << 20;
			Floatval2 |= mant1 >> 12;
		}
	}
	return (TRUE);
}




/* dplshift -- double precision left shift */
dplshift(value1,value2,amount)

WORD *value1, *value2;
int amount;
{
	int carry;

	while (amount--) {
		carry = ((*value1 & 0x80000000) != 0);
		*value1 <<= 1;
		*value2 <<= 1;
		*value2 |= carry;
	}
}



/* dprshift -- double precision right shift */
dprshift(value1,value2,amount)

WORD *value1, *value2;
int amount;
{
	int carry;

	while (amount--) {
		carry = ((*value2 & 0x1) != 0);
		*value2 = (*value2 >> 1) & 0x7fffffff;
		*value1 = (*value1 >> 1) & 0x7fffffff | (carry << 31);
	}
}



/* qplshift -- quad precision left shift */
qplshift(value1,value2,value3,value4,amount)

WORD *value1, *value2, *value3, *value4;
int amount;
{
	int carry1, carry2, carry3;

	while (amount--) {
		carry1 = ((*value3 & 0x80000000) != 0);
		carry2 = ((*value4 & 0x80000000) != 0);
		carry3 = ((*value1 & 0x80000000) != 0);
		*value3 <<= 1;
		*value4 <<= 1;
		*value4 |= carry1;
		*value1 <<= 1;
		*value1 |= carry2;
		*value2 <<= 1;
		*value2 |= carry3;
	}
}



/* dpadd -- double precision add */
dpadd (dest1,dest2,src1,src2)

WORD *dest1, *dest2, src1, src2;
{
	int carry, cin31, a31, b31;

	cin31 = ((((*dest1&0x7fffffff) + (src1&0x7fffffff)) & 0x80000000) != 0);
	a31 = ((*dest1 & 0x80000000) != 0);
	b31 = ((src1 & 0x80000000) != 0);
	carry = ((a31 + b31 + cin31) > 1);

	*dest1 += src1;
	*dest2 += src2 + carry;
}



/* dpsub -- double precision subtraction */
dpsub (dest1,dest2,src1,src2)

WORD *dest1, *dest2, src1, src2;
{
	int borrow, bin31, a31, b31;

	bin31 = ((((*dest1&0x7fffffff) - (src1&0x7fffffff)) & 0x80000000) != 0);
	a31 = ((*dest1 & 0x80000000) != 0);
	b31 = ((src1 & 0x80000000) != 0);
	borrow = ((a31 - b31 - bin31) < 0);
	*dest1 -= src1;
	*dest2 -= src2 + borrow;
}


	-- Tim Olson
	Advanced Micro Devices
	(tim@amd.com)