[net.unix-wizards] 11/24 and 11/44 printf problems

rick (10/20/82)

  A friend of mine is running System III, but is not a member of USENET.
As such, he asked me to relay the following problem for him, so take my
interpretation with a grain of salt.

  He is trying to port System III to both 11/24's and 11/44's and in doing
so he was able to simplify an example to the following code in 'main.c'
of the kernel:

	printf("does 340224 = %d\n", (long) 340224);

  On the 11/24 this results in:

	does 340224 = 327684

and on the 11/44 the result is:

	340224 = 34012E

Notice also that the word "does" does not appear in the output.  Apparently
the problem can be traced to "lrem" in "math.s".  Any thoughts on this one
besides hardware finger pointing?

				Richard L. Maus, Jr. (Rick)
				(BTL) HO 1G501 x4021 (201-949-4021)
				...!houca!rick

jhh (10/20/82)

I ran into this problem several years ago, and found the problem,
but never found the solution.  The problem, as I remember it,
is that lrem.s or ldiv.s uses the result of an operation that is
listed as undefined in the processor handbook.  The 11/70
executes the instruction `correctly', i.e. the way the code
expects it to, while an 11/34 does not.  I believe the
offending instruction is a divide where an overflow occurs.
The two processor handbooks I have do not list anything as
undefined, but they do say the operation is aborted.

				John Haller

lepreau (10/21/82)

I expect this is the solution.  It gave me a hell of a time one night a
year or so ago when some date conversion routines suddenly started
breaking.  This one is worth saving... it is wrong in 2.8 bsd too. Kudos
to Henry and the folks in Canada.
-Jay Lepreau

>From harpo!mhtsa!ihnss!ucbvax!decvax!utzoo!henry Tue Dec  1 06:46:23 1981
Subject: ldiv bug fix
Newsgroups: net.v7bugs
The ldiv bug reported by watmath!egisin seems to be yet another bit of
bad behavior in ldiv that is fixed by the div-abort bugfix I put out on
net.v7bugs in May.  At least, his example works fine here.

>From randvax!decvax!utzoo!henry  Fri Dec 11 04:37:29 1981
Subject: ldiv bug

Following are the two articles I sent describing the bug and the fix.
It also seems to fix yet another bug:  (unsigned)32768/1 supposedly
used to screw up.

						Henry Spencer

Autzoo.643
NET.v7bugs
utzoo!henry
Thu May 21 20:22:51 1981
ldiv/lrem on 44,23
The V7 long-int divide and remainder routines, as distributed by Bell,
make an invalid assumption about the DIV instruction on the 11.  DIV
aborts if the quotient is too big for a signed 16-bit number;  the
routines assume that the dividend register pair is untouched afterwards.
This was generally true on early 11's, but Dec has never guaranteed it
and it is NOT TRUE on the 11/44.  111111111 [that's nine 1's] % 10 yields 11.

The fix is fairly easy;  it's the same fix for all six occurrences: the 
libc ldiv, lrem, aldiv, and alrem, and the kernel ldiv and lrem in mch.s .
Look for a DIV followed by a BVC.  If the BVC falls through, r0 and r1
must be put back as they were before the DIV.  Specifically:

	1. Before DIV, add "mov r0,-(sp)".
	2. After BVC, add "mov r2,r1" and "mov (sp),r0".
	3. After label "1", about 6 lines down, add "tst (sp)+".

The fixed-up routines function properly whether the particular cpu manifests
the problem or not, so this fix can be universally applied.

This also may have cured the largest-negative-dividend bug that the
V7-addenda-tape README alludes to;  at least, I can't reproduce said bug.

Another local installation has discovered that similar divide anomalies
occur on the 11/23 and can be cured with the same fixes.  I do not have
a test case for the 23;  111111111%10 works fine on it.

Autzoo.681
NET.v7bugs
utzoo!henry
Tue Jun  2 16:08:27 1981
ldiv/lrem on 23, ch.2
Addendum to my previous NET.v7bugs message about the ldiv/lrem bug on
the 44 and 23:  the people who discovered that the problem also occurs
on the 23 have supplied test cases.  If you want to see the problem on
a 23, try  11335500/100  or  11335577%100  .  Thanks to Chris Sturgess
and Ron Gomes of Human Computing Resources for this one.

dennis (10/22/82)

There is an obscure bug in the 11/44 float box which causes
the stexp instruction to misbehave when used with auto-increment or -decrement
addressing modes.  The register is changed by the size of the current
float word instead of by 2.  We reported this error to DEC over
a year ago, and heard that there was a fix out, but never heard anything more.
This might cause at least some of the problem you are seeing.

larry (10/25/82)

UNIX System III is already ported to the PDP-11/44.  I did it about a year
ago and it has been made available to the world at large through AT&T in
Greensboro, NC.  You may want to contact them for the appropriate tapes
and documentation.  I also added an RK06/7 driver, fixed a bug (I think I
did) in the RL01/2 driver, changed the TE16 driver to work as a UNIBUS device,
and updated all relevant documentation (AT&T doesn't print these changes,
but they are in the machine readable form of the tape).

In relation to your question, I came across the same problem, but that phase
of the project was changed such that the VAX-11/730 became the target machine,
so a solution was not pursued.  Since then, I've left Western Electric and
no longer have access to a PDP-11/44.

Internally, AT&T also has the Version 7 system I moved to the 44 and UNIX 4.0.
After that, I lost interest in PDP-11s, as has the Bell System in general.
I should also state that the system did work well on the 44 (outside of the
problem indicated), but Berkeley 2.? for PDP-11s is a much more usable system
and I would prefer it wherever possible.

						Larry Rogers
						purdue!ncrday!larry

tj@Okc-Unix@sri-unix (10/27/82)

From: Cal Thixton <tj@Okc-Unix>
Date: 26 Oct 1982  1:06:20 EST (Tuesday)

Try using %D or %ld instead of %d.  %d is for 16-bit quantities,
not the 32-bit quantity you are passing.

			Cal Thixton