[comp.sys.sun] Sun-Spots Digest, v6n28

Sun-Spots-Request@RICE.EDU (William LeFebvre) (03/15/88)
SUN-SPOTS DIGEST          Monday, 14 March 1988        Volume 6 : Issue 28

Today's Topics:
              Re: Multi-process debugging using Dbxtool (2)
                          Re: Ethernet problems
                            Re: nfsd problems
                Re: rcp and rlogin hang in 3.4EXPORT, fix
                  TCP packet size bug in 3.4 AND 3.5 (2)
                    rasterfile(5) to Postscript filter
                     Strange Ethernet error messages
                   Reasonable Ethernet collision rates?
                  Question about Sun 3/160 VME bus speed
               Connecting an HP LaserJet II to a SUN 4/280?
              Anyone using a Sun 4 for central time sharing?
                    Any problems in upgrading to 3.5?

Send contributions to:  sun-spots@rice.edu
Send subscription add/delete requests to:  sun-spots-request@rice.edu
Bitnet readers can subscribe directly with the CMS command:
    TELL LISTSERV AT RICE SUBSCRIBE SUNSPOTS My Full Name
Recent backissues are stored on "titan.rice.edu".  For volume X, issue Y,
"get sun-spots/vXnY".  They are also accessible through the archive
server:  mail the word "help" to "archive-server@rice.edu".

----------------------------------------------------------------------

Date:    Thu, 3 Mar 88 10:03:16 +0100
From:    Danny Backx <mcvax!prlb2!kulcs!dannyb@uunet.uu.net>
Subject: Re: Multi-process debugging using Dbxtool (1)

>From:    ha@purdue.edu
>...That is, at any time, there could be as many instances of Dbxtool
> as there are active processes belonging to the program.

This is NOT what is meant in the paper.
What you _can_ do is this:
- when some process is running, it is possible to attach a dbx process to it,
  which clearly doesn't have to be the parent...
- when you are debugging a process, you can allow it to continue on its own.

The commands for this are (inside dbx, of course) 'debug' and 'detach'.
NOT 'attach' as both the manual and the paper say.

What you also can NOT do, and which is VERY annoying, is to single-step or
debug a process while it is fork()-ing.  (More generally: no file can be
debugged if more than one process is running it.)

Now this is something the people at SUN should fix. Multi-process
debugging is almost impossible due to this. Moreover, it shouldn't be that
hard to fix.  All they have to do is really duplicate the process in
memory when it forks, and if it is being debugged. (And also when you
start debugging it, and multiple processes are running the same core...)

	Danny Backx

 Danny Backx                           | mail: Katholieke Universiteit Leuven 
 Tel: +32 16 200656 x 3537                     Dept. Computer Science
 E-mail: dannyb@kulcs.UUCP             |        Celestijnenlaan 200 A
         ... mcvax!prlb2!kulcs!dannyb  |        B-3030 Leuven
         dannyb@kulcs.BITNET           |        Belgium     

------------------------------

Date:    Wed Mar  2 17:14:14 1988
From:    David_T_Lawlor@cup.portal.com
Subject: Re: Multi-process debugging using Dbxtool (2)

To debug processes that are not the children of dbxtool I do the
following:

Run the process to debug.  While it's running -- or waiting for input --
get into a dbxtool window (or dbx).

Use the command "debug progname <pid>".  You can get the pid from the "sh
ps ax" command inside dbx.

dbx then attaches itself to the process.

You can then hit ^C to stop the process.

Use dbx as usual.

To finish use the command "detach" to let go of the process. If you don't,
and use the quit command, the process you were debugging dies.

davel@cup.portal.com

------------------------------

Date:    Mon, 29 Feb 88 08:09:28 EST
From:    steve@cs.umd.edu (Steven D. Miller)
Subject: Re: Ethernet problems
Reference: v6n20

The behavior you're seeing, and the frequency with which it occurs, makes
me think that you're having what is called an Ethernet meltdown.

Meltdown behavior is generally caused by using different broadcast
addresses on your network.  The change from all-zeroes (4.2BSD convention)
to all-ones (4.3BSD and DARPA standard convention) is enough to cause the
problem, as is the change from unsubnetted broadcasts to subnetted
broadcasts.

Given the 4.2BSD/SunOS TCP/IP implementation (at least as of SunOS 3.2;
later revisions may be fixed, I don't know for sure), any machine that
sees a broadcast packet it doesn't recognize will try to forward that
packet.  An example scenario might be:

	Host sees broadcast to 128.8.128.255
	Host thinks broadcast address is 128.8.0.0, and says,
		"gee, that's not a broadcast packet, so I should
		 forward it!"
	Host does ARP for 128.8.128.255.  Of course, no one responds.  At
		least, no one should respond...
	Host keeps ARPing potentially a number of times until it finally
		gives up.

Multiply this by fifty hosts, all pounding on the Ethernet roughly
simultaneously (since they all see the different broadcast at the same
time), and you get a period during which the Ethernet is unusable.
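
The scenario above can be modelled in a few lines.  This is only a toy
sketch: the function names, the retry count, and the host table are
hypothetical values chosen for illustration and are not taken from the
4.2BSD or SunOS sources.

```python
# Toy model of the scenario above; not the real 4.2BSD code path.
# ARP_RETRIES and the host table are hypothetical, for illustration only.

ARP_RETRIES = 5  # assumed; the real retry count depends on the implementation


def reaction(dest, configured_broadcast):
    """Return what a 4.2BSD-style host does with a packet sent to dest."""
    if dest == configured_broadcast:
        return "accept"   # recognized as a broadcast
    return "forward"      # not recognized, so the host ARPs for dest


def arp_burst(dest, hosts):
    """Count ARP requests triggered by one packet sent to dest."""
    confused = [name for name, bcast in hosts.items()
                if reaction(dest, bcast) == "forward"]
    return len(confused) * ARP_RETRIES


# Fifty hosts on the all-zeroes convention, one using subnetted all-ones.
hosts = {"host%d" % i: "128.8.0.0" for i in range(50)}
hosts["oddball"] = "128.8.128.255"

print(arp_burst("128.8.128.255", hosts))  # 50 hosts * 5 retries = 250
```

Under these assumptions a single stray broadcast turns into a few hundred
ARP requests sent at nearly the same moment.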

Hosts simply should not forward packets of any sort, and they certainly
should not *under any circumstances* forward a broadcast packet.  I don't
think that gateways should ever forward broadcasts, either; my reasoning
is identical to that in RFC 1009.  (This document is a gem when it comes
to telling TCP/IP implementors and administrators what to do and not to
do, and what strange things can happen as a result.)

Of course, there is this nice kernel variable "ipforwarding" which can be
used to disable forwarding and which you might think can be used to stop
this antisocial behavior.  Guess again.  In a 4.2BSD system, if you turn
off ipforwarding, all that will happen is that you'll swap ICMP Network
Unreachable messages for ARPs (at a possible packet savings, as you'll
definitely only get one ICMP message per broadcast, while you may get more
than one ARP).  In fact, these ICMP messages will erroneously have the
differing broadcast address as their source address, due to a quirk in the
4.2BSD implementation.  This makes it rather difficult to find the
perpetrator unless you know which machines have which Ethernet
addresses...

You might be able to verify whether a meltdown is occurring by:

	jello# etherfind -proto icmp -o -arp

If you see big bursts of ARPs or big bursts of ICMP Network Unreachable
messages, you've got a meltdown.  I'd suspect that it's one of the
"oddball" machines on your net that is causing the problem.  The correct
broadcast address to use is your network number with all-ones in the host
field, with modification if you're using subnets.  (I.e., we at Maryland
would broadcast on 128.8.255.255 if we weren't using an additional eight
bits of subnet mask; we do broadcast on, for example, 128.8.128.255 on
subnet 128.)  You should use the same thing everywhere, though, even if
this means using an incorrect broadcast address.
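
As a worked example of the addresses mentioned above, here is a short
sketch using Python's standard "ipaddress" module.  The prefix lengths (/16
for the unsubnetted class B network, /24 for eight extra bits of subnet
mask) are inferred from the description; the helper name is made up.

```python
import ipaddress


def broadcast(cidr):
    """Return the all-ones broadcast address of a network as a string."""
    return str(ipaddress.ip_network(cidr).broadcast_address)


# Unsubnetted class B network: all-ones in the 16-bit host field.
print(broadcast("128.8.0.0/16"))      # 128.8.255.255

# With eight extra bits of subnet mask, subnet 128 becomes a /24.
print(broadcast("128.8.128.0/24"))    # 128.8.128.255

# The older 4.2BSD all-zeroes convention is simply the network address.
print(ipaddress.ip_network("128.8.0.0/16").network_address)  # 128.8.0.0
```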

I hope this helps.

	-Steve

Spoken: Steve Miller    Domain: steve@mimsy.umd.edu    UUCP: uunet!mimsy!steve
Phone: +1-301-454-1808  USPS: UMIACS, Univ. of Maryland, College Park, MD 20742

[[ Well, this was not the problem that Benson was experiencing, but the
problem described here is sufficiently catastrophic (we've had it happen
here at Rice) that I decided to include this message anyway.  --wnl ]]

------------------------------

Date:    Mon, 29 Feb 88 09:02:37 EST
From:    Steve D. Miller <steve@brillig.umd.edu>
Subject: Re: nfsd problems
Reference: v6n20

We've seen similar behavior (strange hangings of nfsd instances in disk
wait) here.  With Chris Torek to do the thinking and me to do the grunt
work of tracing kernel data structures, we finally tracked the problem
down to a very subtly corrupted inode on the disk.  Fsck didn't see
anything wrong with it, but there was something about it that caused nfsd
to loop endlessly whenever it touched that inode.  (I don't remember exactly
what the problem was.  I think it had to do with a corrupted directory.
Chris, do you remember?)

You might be able to reproduce the problem at will by NFS mounting all
your server's partitions on a client (say at /tmp/foo), and then running a
find(1) to look at each directory.  ("find /tmp/foo -name X-X-X" will
probably do it.  It was a find run on an NFS partition that first brought
the problem to our attention.)  If your NFS daemons hang, you've probably
got the same problem.
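
If find(1) is not convenient, roughly the same traversal can be sketched
in Python.  This is an illustration only, not part of the original
procedure; the function name is made up, and /tmp/foo is the example
mount point from above.

```python
import os


def touch_every_entry(root):
    """Walk root and lstat every entry, much as find(1) would."""
    count = 0
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            # Ask for the attributes of each entry without following symlinks.
            os.lstat(os.path.join(dirpath, name))
            count += 1
    return count
```

Calling touch_every_entry("/tmp/foo") returns the number of entries
visited.  If the call never returns and the server's nfsds sit in disk
wait, that is the symptom described here.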

The cure was not for the faint of heart.  We looked at the output of ps
axl to find what address the nfsds were waiting on, and what priority they
were waiting at.  We made a guess as to what it was exactly that they were
waiting on data-structure-wise (sources come in handy here, as you can see
what in the NFS and filesystem code waits on particular things at a given
priority, and guess from that what data structure the wait channel might
represent), then turned that structure back into an inode number.  We then
used ncheck to find the files (actually, I think, directories)
corresponding to those inodes, and carefully did a ls -li on the
appropriate directories (the normal filesystem still worked) to figure out
what inodes in those directories corresponded to which files.  We then ran
clri on the appropriate inodes, ran fsck to put the new orphans into
lost+found, and used the ls -li to put the files in lost+found back into
their proper places.

The problem has not reoccurred in more than a year of operation.  I think
it was surmised at the time that it was a problem with the statelessness
of NFS.  (Races can occur when doing some directory operations, or so I've
heard.)

I realize that my description is pretty vague.  Like I said, it's been a
long time.  I hope this is enough to lead you (or someone else) down the
right path, though...

	-Steve

Spoken: Steve Miller    Domain: steve@mimsy.umd.edu    UUCP: uunet!mimsy!steve
Phone: +1-301-454-1808  USPS: UMIACS, Univ. of Maryland, College Park, MD 20742

------------------------------

Date:    29 Feb 88 10:22 -0800
From:    Dan Razzell <razzell%vision.ubc.cdn@ean.ubc.ca>
Subject: Re: rcp and rlogin hang in 3.4EXPORT, fix
Reference: v6n17, v6n25

We experienced this too. It certainly seems to be fixed in the 3.5 kernel,
but I'm not sure exactly what did it.  There is a note in the Release 3.5
Manual in section 5.4, "TCP/IP File Transfer Hangs" which may pertain.

You don't have to install the entire 3.5 distribution to get the fix.
It's enough to extract the root, system, and network tar files, and from
them copy the various TCP/IP daemons and build a new kernel.

------------------------------

Date:    Tue, 1 Mar 88 14:31:40 PST
From:    S.C.Blair <ascway!scb@spar-20.spar.slb.com>
Subject: TCP packet size bug in 3.4 AND 3.5 (1)

The offensive bug that caused Suns to talk in 512-byte packets is back
again.  A confirmation call & subsequent e-mail from Ray Jang there
confirmed it's STILL (groan) there. The fix is as follows:

adb -w /vmunix                                   
        tcp_mss+0xac?w 400                               
        tcp_mss+0xac: 0x200 = 0x400                      
        tcp_mss+0xbc?w 400                               
        tcp_mss+0xbc: 0x200 = 0x400
^D

* For those feeling brave in the kernel (WARNING), you can use 'adb' in
this manner:

adb -w -k vmunix /dev/kmem

Patching the running kernel this way eliminates the need to reboot the
machine after doing the patch.  The mail from Ray also indicated that some
of SUN'S OWN machines had the patch installed, and some didn't.
[[ Serves 'em right!  --wnl ]]  Being an Ex-Sunner, I am surprised that
this SIMPLE fix doesn't appear in the 3.5 kernel.

Steve Blair
Schlumberger Technology Corp-Austin, tx
uucp: {backbone}!sun!decwrl!spar!ascway!blair

------------------------------

Date:    Thu, 3 Mar 88 11:54 EST
From:    SYSRUTH@UTORPHYS.BITNET
Subject: TCP packet size bug in 3.4 AND 3.5 (2)

This bug originally cropped up in 3.4 of SUN UNIX, and a patch for it was
broadcast on this list by SUN.  For those of you patiently awaiting the
arrival of 3.5 so you won't have to worry about it any more:

*** It is NOT fixed in 3.5 ***

We installed several diskless 3/50's, served off a SUN 4, which requires
3.5 to be running on the 3's. We did the software installation straight
off tape, and lo and behold, the generic kernel, the kernel we made for
the 3/50's (and the 3/110 we are also serving) all show:

# adb /pub/vmunix
tcp_mss+0xac?x
_tcp_mss+0xac:  200
tcp_mss+0xbc?x
_tcp_mss+0xbc:  200

The number should be 400 (hex), i.e. 1024 bytes, not 200 (hex), i.e. 512.
The same patch as for 3.4 will still work (i.e. tcp_mss+0xac?w 400 and the
same for +0xbc).
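
To check several kernels at once, the adb output above can be parsed with
a short script.  This is a sketch that assumes adb's replies look exactly
like the sample shown (symbol, colon, hex value); the function name is
made up for illustration.

```python
def tcp_mss_values(adb_output):
    """Map each 'symbol+offset' in adb ?x output to its value as an integer."""
    values = {}
    for line in adb_output.splitlines():
        if ":" not in line:
            continue  # skip the commands we typed, keep only adb's replies
        location, _, word = line.partition(":")
        # adb prints the word in hex, so 200 means 512 and 400 means 1024.
        values[location.strip().lstrip("_")] = int(word.strip(), 16)
    return values


sample = """tcp_mss+0xac?x
_tcp_mss+0xac:  200
tcp_mss+0xbc?x
_tcp_mss+0xbc:  200
"""

for location, size in tcp_mss_values(sample).items():
    status = "patched" if size == 0x400 else "needs patch"
    print(location, size, status)
```

For the unpatched sample both locations come out as 512 and are reported
as needing the patch.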

We got these tapes in mid-January. 3.5 was released at the beginning of
December.

Does anybody know what module the error is in, and how to patch it so that
new makes of the kernel will already have the patch applied? I am
gradually learning my way around UNIX, but adb is still mostly out of my
ken.

Thanks.

Ruth Milner                        (preferred) BITNET: sysruth@utorphys
Systems Manager                     InterNet: sysruth@helios.toronto.edu
University of Toronto Physics

------------------------------

Date:    Sat, 27 Feb 88 20:31:10 +0200
From:    leonid@TAURUS.BITNET
Subject: rasterfile(5) to Postscript filter

Here is a program to convert rasterfile(5) images into PostScript for a
LaserWriter.  This is a version of psraster I wrote.  There should be no
trouble compiling and installing it, except for the tabs.  Since this will
be posted through BITNET, tabs will disappear, so please make sure to
reconstruct the Makefile before attempting to compile.

If you write any fixes for this program, or have any suggestions for
improvement, please email them to me so I can incorporate them into the
next release.

Leonid
E-Mail: leonid@taurus.BITNET, @cunyvm.cuny.edu:leonid@Math.Tau.Ac.IL

[[ The shar file has been placed in the archives as
"sun-source/psraster.shar" and is 13495 bytes long.  It can be retrieved
via anonymous FTP from the host "titan.rice.edu" or via the archive server
with the request "send sun-source psraster.shar".  For more information
about the archive server, send a mail message containing the word "help"
to the address "archive-server@rice.edu".  --wnl ]]

------------------------------

Date:    Mon, 29 Feb 88 12:54:35 PST
From:    Jonathan Eisenhamer <jon@mira.astro.ucla.edu>
Subject: Strange Ethernet error messages

Recently, the following appeared on the console of our Sun 3/50:

le0: Received packet with ENP bit in rmd cleared
le0: Received packet with STP bit in rmd cleared

There were about a dozen pairs of these messages.  They have not happened
before or since.  Just wondering what they mean.

Thanks,
Jonathan Eisenhamer
UCLA Astronomy
jon@mira.astro.ucla.edu
jon@uclastro.bitnet

------------------------------

Date:    Mon, 29 Feb 88 14:47:23 EST
From:    smb@research.att.com
Subject: Reasonable Ethernet collision rates?

What are reasonable Ethernet collision rates?  What about input and output
errors?  On one net, where we have a single Sun 3/280, and a bunch of
4.3bsd VAXen, the Sun is running about .02%, and the VAXen are at .3% --
an order of magnitude difference.  On the other hand, most (but not all)
of the VAXen show no input or output errors (on DEUNAs or DELUAs), while
the Sun shows a modest number of each.  (Output errors are running at
.0025%; input errors are .001%).  Virtually all of this traffic would be
rcp/rlogin stuff (with some admixture of broadcasts from rwhod and
routed).

On a second cable, the situation is rather different.  We have two file
servers and a plethora of clients, wired up using Thinwire cable and DEC's
DECConnect scheme (i.e., one station per thinwire segment, all
interconnected via DEMPRs and DELNIs).  The two file servers, a LANBridge,
and two DELNIs are directly on a piece of thick coax.  On that net, we're
seeing collision rates of .6% or thereabouts.  Output errors run about 0%
on the clients and .05% on the servers; input errors range from .01% to
.06% on the clients and .01% on the server.  Most of the clients are not
truly diskless; they have small local disks for root, /tmp, swap, and some
local bin directories; thus, there's much less NFS traffic than normal.

So -- are the rates I'm seeing too high?  What constitutes an 'output
error'?  That, I assume, could differ for different Ethernet controllers;
we seem to have ie and le controllers.
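
For anyone comparing their own numbers, the percentages above are
presumably events divided by packets, as one would compute from the
counters that "netstat -i" reports.  That is an assumption; the message
does not say how the rates were obtained.  A sketch of the arithmetic,
with made-up counter values chosen only to reproduce the figures quoted
for the first net:

```python
def rate_percent(events, packets):
    """Return events as a percentage of packets (0.0 if there are no packets)."""
    return 100.0 * events / packets if packets else 0.0


# Hypothetical counters, chosen only to match the quoted collision rates.
sun = rate_percent(events=20, packets=100_000)    # about 0.02 %
vax = rate_percent(events=300, packets=100_000)   # about 0.3 %

# The ratio is roughly 15, i.e. about an order of magnitude.
print(sun, vax, vax / sun)
```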

--Steve Bellovin
{ihnp4,ucbvax,allegra}!ulysses!smb
smb@ulysses.att.com

------------------------------

Date:    Mon, 29 Feb 88 10:45:08 EST
From:    Ned Danieley <ndd@sunbar.mc.duke.edu>
Subject: Question about Sun 3/160 VME bus speed

We are using an IKON 10089 DRV11 emulator to do asynchronous 16 bit word
DMA transfers to a Sun 3/160. The IKON specs suggest that it will run at
up to 2 Meg transfers/second if the bus arbitrator can keep up at that
rate. We appear to get only about 650 K though. This is on a very lightly
loaded system: basically nothing is happening except for the data
acquisition. Should we be seeing a higher rate, or is this a reasonable
speed for the Sun bus?
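
In bytes per second the gap works out as follows.  This sketch assumes
that "2 Meg" and "650 K" both count 16-bit transfers per second and that
Meg and K are decimal multipliers; neither is stated explicitly above.

```python
BYTES_PER_TRANSFER = 2      # one 16-bit word per DMA transfer

rated = 2_000_000           # transfers/second from the IKON spec (assumed 10**6)
observed = 650_000          # transfers/second actually seen (assumed 10**3)


def bytes_per_second(transfers_per_second):
    """Convert a 16-bit transfer rate into bytes per second."""
    return transfers_per_second * BYTES_PER_TRANSFER


print(bytes_per_second(rated))      # 4000000
print(bytes_per_second(observed))   # 1300000
print(observed / rated)             # 0.325
```

So the observed rate is about a third of the rated maximum, or roughly
1.3 Mbytes/second against 4 Mbytes/second.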

Ned Danieley (ndd@sunbar.mc.duke.edu)
Basic Arrhythmia Laboratory
Duke University Medical Center
Durham, NC  27710
(919) 684-6807 or 684-6942

------------------------------

Date:    29 February 1988 10:14:59 CST
From:    Steven G. Krantz <C31801SK@WUVMD.BITNET>
Subject: Connecting an HP LaserJet II to a SUN 4/280?

Does anyone know anything about hooking a Hewlett-Packard Laser Jet II to
a SUN Model 4/280?  The serial port on the 280 seems to be flawed (is this
right?) and we can't figure out how to configure the cable (certainly the
manual that comes with the Laserjet doesn't give a clue).  Thanks for any
help.

Steven G. Krantz
Washington University in St. Louis

------------------------------

Date:    29 Feb 88 16:11:09 GMT
From:    arnold@emory.UUCP (Arnold D. Robbins {EUCC})
Subject: Anyone using a Sun 4 for central time sharing?

I am the Unix Systems Programmer for the Emory University Computing
Center.  We provide central computing resources to the entire Emory
University campus.  Currently, we provide Unix on two (rapidly aging) Vax
780s, running Mt. Xinu's educational 4.3 + NFS (which I highly recommend,
by the way). We mostly support Computer Science instruction, but also have
a number of other departments using the Unix machines for research and/or
word processing. At peak load we tend to see about 28 users logged on at
once, doing student type things, editing, compiling, running, debugging.

We are planning on replacing the two 780's with a single Sun 4.  Is there
anyone out there who has already done something like this?  In other
words, if you have a Sun 4 supporting 48 or more simultaneous logins on
real terminals (not rlogins), I would like to hear from you.  Is it
working well or poorly? How much memory and disk must I have? Any and all
information about what you've done would be appreciated.

(We are not planning on using the Sun 4 to serve any workstations, just as
a central timesharing machine.)

Thanks in advance,

Arnold Robbins
ARPA, CSNET:	arnold@emory.ARPA	BITNET: arnold@emory
UUCP: { decvax, gatech, }!emory!arnold	DOMAIN: arnold@emory.edu (soon)

------------------------------

Date:    Tue, 1 Mar 88 10:27:17 PST
From:    mordor!tolerant!vsi1!lmb@ut-sally.UUCP (Larry Blair)
Subject: Any problems in upgrading to 3.5?

We are currently running SunOS 3.4.  As an OEM, distributing 3.4 with our
new shipments is a nightmare (7 tapes).  Sun has been pushing us to use
3.5.  I have several questions aimed at those sites that have upgraded:

We have heard thru the grape-vine that there are problems with 3.5.  The
only one that I've seen published in Sun-Spots is the atrun problem.  What
has been your experience with bugs fixed vs. bugs created?  In short, is
this upgrade worth doing?

According to Sun, 3.5 is released as a full release, rather than an
upgrade.  What problems did you encounter when trying to upgrade existing
systems?  How did you prevent the loss of locally modified system files?
Sun always seems to assume that they can clobber any files that they want
to.

*   *   O     Larry Blair
  *   *   O   VICOM Systems Inc.     sun!pyramid----\
    *   *   O 2520 Junction Ave.     uunet!ubvax-----!vsi1!lmb
  *   *   O   San Jose, CA  95134    ucbvax!tolerant/
*   *   O     +1-408-432-8660

------------------------------

End of SUN-Spots Digest
***********************