rick (10/20/82)
A friend of mine is running System III but is not a member of USENET, so he asked me to relay the following problem for him; take my interpretation with a grain of salt. He is trying to port System III to both 11/24's and 11/44's, and in doing so he was able to reduce the problem to the following code in 'main.c' of the kernel:

    printf("does 340224 = %d\n", (long) 340224);

On the 11/24 this results in:

    does 340224 = 327684

and on the 11/44 the result is:

    340224 = 34012E

Notice also that the word "does" does not appear in the 11/44 output. Apparently the problem can be traced to "lrem" in "math.s". Any thoughts on this one besides hardware finger pointing?

Richard L. Maus, Jr. (Rick)
(BTL) HO 1G501 x4021 (201-949-4021)
...!houca!rick
jhh (10/20/82)
I ran into this problem several years ago and found the cause, but never found a solution. The problem, as I remember it, is that lrem.s or ldiv.s uses the result of an operation that is listed as undefined in the processor handbook. The 11/70 executes the instruction `correctly', i.e. the way the code expects it to, while an 11/34 does not. I believe the offending instruction is a divide where an overflow occurs. The two processor handbooks I have do not list anything as undefined, but they do say the operation is aborted.

John Haller
lepreau (10/21/82)
I expect this is the solution. It gave me a hell of a time one night a year or so ago when some date conversion routines suddenly started breaking. This one is worth saving... it is wrong in 2.8 bsd too. Kudos to Henry and the folks in Canada.

-Jay Lepreau

>From harpo!mhtsa!ihnss!ucbvax!decvax!utzoo!henry Tue Dec 1 06:46:23 1981
Subject: ldiv bug fix
Newsgroups: net.v7bugs

The ldiv bug reported by watmath!egisin seems to be yet another bit of bad behavior in ldiv that is fixed by the div-abort bugfix I put out on net.v7bugs in May. At least, his example works fine here.

>From randvax!decvax!utzoo!henry Fri Dec 11 04:37:29 1981
Subject: ldiv bug

Following are the two articles I sent describing the bug and the fix. It also seems to fix yet another bug: (unsigned)32768/1 supposedly used to screw up.

Henry Spencer

Autzoo.643
NET.v7bugs
utzoo!henry
Thu May 21 20:22:51 1981
ldiv/lrem on 44,23

The V7 long-int divide and remainder routines, as distributed by Bell, make an invalid assumption about the DIV instruction on the 11. DIV aborts if the quotient is too big for a signed 16-bit number; the routines assume that the dividend register pair is untouched afterwards. This was generally true on early 11's, but Dec has never guaranteed it and it is NOT TRUE on the 11/44. 111111111 [that's nine 1's] % 10 yields 11.

The fix is fairly easy; it's the same fix for all six occurrences: the libc ldiv, lrem, aldiv, and alrem, and the kernel ldiv and lrem in mch.s. Look for a DIV followed by a BVC. If the BVC falls through, r0 and r1 must be put back as they were before the DIV. Specifically:

1. Before DIV, add "mov r0,-(sp)".
2. After BVC, add "mov r2,r1" and "mov (sp),r0".
3. After label "1", about 6 lines down, add "tst (sp)+".

The fixed-up routines function properly whether the particular cpu manifests the problem or not, so this fix can be universally applied.

This also may have cured the largest-negative-dividend bug that the V7-addenda-tape README alludes to; at least, I can't reproduce said bug. Another local installation has discovered that similar divide anomalies occur on the 11/23 and can be cured with the same fixes. I do not have a test case for the 23; 111111111%10 works fine on it.

Autzoo.681
NET.v7bugs
utzoo!henry
Tue Jun 2 16:08:27 1981
ldiv/lrem on 23, ch.2

Addendum to my previous NET.v7bugs message about the ldiv/lrem bug on the 44 and 23: the people who discovered that the problem also occurs on the 23 have supplied test cases. If you want to see the problem on a 23, try 11335500/100 or 11335577%100. Thanks to Chris Sturgess and Ron Gomes of Human Computing Resources for this one.
dennis (10/22/82)
There is an obscure bug in the 11/44 float box which causes the stexp instruction to misbehave when used with auto-increment or -decrement addressing modes. The register is changed by the size of the current float word instead of by 2. We reported this error to DEC over a year ago, and heard that there was a fix out, but never heard anything more. This might cause at least some of the problem you are seeing.
larry (10/25/82)
UNIX System III is already ported to the PDP-11/44. I did it about a year ago and it has been made available to the world at large through AT&T in Greensboro, NC. You may want to contact them for the appropriate tapes and documentation. I also added an RK06/7 driver, fixed a bug (I think I did) in the RL01/2 driver, changed the TE16 driver to work as a UNIBUS device, and updated all relevant documentation (AT&T doesn't print these changes, but they are in the machine readable form of the tape).

In relation to your question, I came across the same problem, but that phase of the project was changed such that the VAX-11/730 became the target machine, so a solution was not pursued. Since then, I've left Western Electric and no longer have access to a PDP-11/44.

Internally, AT&T also has the Version 7 system I moved to the 44 and UNIX 4.0. After that, I lost interest in PDP-11s, as has the Bell System in general. I should also state that the system did work well on the 44 (outside of the problem indicated), but Berkeley 2.? for PDP-11s is a much more usable system and I would prefer it wherever possible.

Larry Rogers
purdue!ncrday!larry
tj@Okc-Unix@sri-unix (10/27/82)
From: Cal Thixton <tj@Okc-Unix>
Date: 26 Oct 1982 1:06:20 EST (Tuesday)

Try using %D or %ld instead of the %d. %d is for 16-bit quantities, not 32-bit ones, which is what you are passing.

Cal Thixton