[comp.sys.sun] Sun-Spots Digest, v6n238

Sun-Spots-Request@Rice.edu (William LeFebvre) (09/28/88)
SUN-SPOTS DIGEST        Monday, 26 September 1988     Volume 6 : Issue 238

Today's Topics:
                      Re: Shared Memory vs. Malloc()
                         Re: Problem with Sun TCP
              Serial line H/W flow control under 3.5 and 4.0
              stdio.h and dbm.h have different views of NULL
                   Problem: SLOW Boot on diskless Suns
              Device driver for Motorola MVME300 GPIB board?

Send contributions to:  sun-spots@rice.edu
Send subscription add/delete requests to:  sun-spots-request@rice.edu
Bitnet readers can subscribe directly with the CMS command:
    TELL LISTSERV AT RICE SUBSCRIBE SUNSPOTS My Full Name
Recent backissues are available via anonymous FTP from "titan.rice.edu".
For volume X, issue Y, "get sun-spots/vXnY".  They are also accessible
through the archive server:  mail the request "send sun-spots vXnY" to
"archive-server@rice.edu" or mail the word "help" to the same address
for more information.

----------------------------------------------------------------------

Date:    Thu, 22 Sep 88 09:45:48 EDT
From:    clapper@nadc.arpa (Brian M. Clapper)
Subject: Re: Shared Memory vs. Malloc()

I ran into similar problems once, on a pure System V machine (not on a Sun).
Perhaps my experiences will help.  Marc Rochkind's _Advanced_UNIX_Programming_ 
(Prentice-Hall, 1985) provides good background to the problem.  Here's an
excerpt (p. 194):

	"You map (attach) a segment to your address space with shmat.  You
	can request a particular address by setting the addr argument
	appropriately.   This is important if you are also allocating memory
	dynamically with the brk or sbrk system calls ... because they won't
	go beyond an attached shared segment.  Unfortunately, in many
	applications, this constraint requires you to know exactly how memory
	is to be used, and to carefully assign regions to shared segments and
	dynamic allocation subroutines (such as malloc).  This makes programs
	buggy, hard to maintain, and nonportable.  If you don't care where the
	memory is mapped (lucky you!), you can use zero for addr and let the
	kernel pick an address.  Don't expect to be able to use brk or sbrk
	again, however.  If you've already called them, the kernel will
	respect their work and allocate the shared segment out of harm's way."

I ran into this problem while working on a System V-specific application,
so the issue of portability did not arise.  I considered two
solutions, both of which required knowledge of how memory was to be used.
The first was Rochkind's:  specify an address at which to attach shared
memory.  This solution requires that you calculate the end of the data
segment, and know how much room is available to expand the data segment.
You can locate the end of the data segment with the sbrk() system call,
passing it a '0' argument.  On p. 229 of the aforementioned book, Rochkind
provides a small subroutine to calculate the available space.  (Notice that it,
too, uses sbrk).

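	long avail ()  /* return number of available bytes */
	{
		char *sbrk ();	/* sbrk(0) returns the current break */
		long ulimit ();	/* ulimit(3, ...) returns the maximum break */

		/* room left = highest possible break - current break */
		return (ulimit (3, 0L) - (long) sbrk (0));
	}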

With these two pieces of information, along with the size of the shared
memory segment you wish, you can allocate your shared memory as far away
from the current end of your data segment as possible.  Note that shmat *may*
attach the shared memory a little closer to the end of your data segment
than you ask for, since it may try to align the segment on a segment
boundary.  If you really care about this, you can set some arbitrary lower
address limit and compare the return result from shmat with that limit.
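The placement rule above comes down to a little address arithmetic.  What follows is only a sketch.  pick_shm_addr and shm_addr_ok are hypothetical helpers and are not part of any library.  The sketch assumes the caller passes sbrk(0) as cur_break, the maximum break (for instance from Rochkind's ulimit(3, 0L)) as limit, and SHMLBA as align, and then hands the result to shmat.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: pick an attach address for a segment of `size`
 * bytes, as far above `cur_break` (e.g. sbrk(0)) as possible while
 * still ending at or below `limit` (e.g. the maximum break), rounded
 * down to a multiple of `align` (e.g. SHMLBA; must be nonzero).
 * Returns 0 if no suitable address exists. */
uintptr_t pick_shm_addr(uintptr_t cur_break, uintptr_t limit,
                        size_t size, uintptr_t align)
{
    uintptr_t addr;

    if (limit < cur_break || limit - cur_break < size)
        return 0;                   /* the segment cannot fit at all */
    addr = limit - size;            /* highest start that still fits */
    addr -= addr % align;           /* round down to the boundary */
    if (addr < cur_break)
        return 0;                   /* rounding moved it into the data segment */
    return addr;
}

/* Hypothetical check of the address shmat() actually returned against
 * a lower limit chosen by the program. */
int shm_addr_ok(uintptr_t attached, uintptr_t lower_limit)
{
    return attached >= lower_limit;
}
```

The sketch rounds down rather than up so that the segment stays below limit.  The cost is a little heap room, which matches the warning above that the segment may land slightly closer to the data segment than requested.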

A simpler, but slightly kludgier, way of doing it is to pre-allocate a
large area of memory with malloc, before you attach your shared memory
segment.  Estimate the maximum amount of memory your program needs to
malloc (and add a safety factor).  At the beginning of your program,
malloc that much memory.  Then free it.  Typically, malloc and free try to
minimize the number of system calls that are made.  Consequently, free
doesn't actually reduce the size of the data segment; it simply marks the
freed memory as available and adds it to a free list.  When you call
malloc, it examines the free list to determine whether it can satisfy your
request from there.  If it can't, then it attempts to enlarge the data
segment with an sbrk call.  So, by doing one super-large malloc at the
beginning, you force malloc to call sbrk to enlarge the data segment.
When you free it, you don't reduce the data segment, so you can attach
your shared memory segment without specifying an address and be reasonably
sure it is out of harm's way.  As in the previous solution, you might want
to calculate whether you have enough room to do all this first.  This
solution is simpler (in a sense), but it requires you to couple your code
to a presumed malloc/free behavior.  This is probably safe, but only you
can make that decision.
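The pre-allocate-then-free approach might look like the sketch below, which assumes a System V style shared memory interface.  HEAP_RESERVE, reserve_heap and attach_anywhere are hypothetical names, and the 4 MB figure is an arbitrary placeholder for your own estimate.  The trick depends entirely on the allocator.  For example, glibc's malloc serves requests above its mmap threshold (128 KiB by default) with mmap(), and free() gives those blocks straight back to the system, so the data segment is never enlarged.

```c
#define _XOPEN_SOURCE 700               /* System V IPC declarations */
#include <stdlib.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/* Hypothetical estimate of the most the program will ever malloc,
 * safety factor included. */
#define HEAP_RESERVE (4L * 1024 * 1024)

/* Grow the heap once with one large malloc, then free it.  This assumes
 * free() keeps the block on malloc's free list instead of shrinking the
 * data segment; check that assumption for your allocator. */
int reserve_heap(size_t bytes)
{
    void *p = malloc(bytes);

    if (p == NULL)
        return -1;                  /* the data segment could not grow */
    free(p);
    return 0;
}

/* Create a private segment and let the kernel pick the address
 * (addr argument 0).  Returns NULL on failure; *idp receives the id. */
void *attach_anywhere(size_t size, int *idp)
{
    int id = shmget(IPC_PRIVATE, size, IPC_CREAT | 0600);
    void *addr;

    if (id < 0)
        return NULL;
    addr = shmat(id, NULL, 0);
    if (addr == (void *) -1) {
        shmctl(id, IPC_RMID, NULL); /* don't leak the segment */
        return NULL;
    }
    *idp = id;
    return addr;
}
```

A program would call reserve_heap(HEAP_RESERVE) at startup, before attach_anywhere().  It would later release the segment with shmdt() and shmctl(id, IPC_RMID, NULL).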

Finally, I did not get the impression that malloc can overlap the shared
memory segment.  If the shared memory segment is 'in the way' of malloc's
attempt to enlarge the data segment, the malloc will fail (because its
call to sbrk fails).  Your data area should not become corrupt (assuming
you are programming carefully :-)); you just won't be able to malloc.
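The failure described here shows up as sbrk() returning (void *) -1.  The sketch below shows the check.  grow_break is a hypothetical helper.  In the allocator the author describes, malloc() passes such a failure on by returning NULL.  Modern allocators may instead fall back to mmap(), so the sketch is illustrative only.

```c
#define _DEFAULT_SOURCE                 /* declares sbrk() on glibc */
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Try to move the break by `incr` bytes.  sbrk() returns (void *) -1
 * when the break cannot move, for example because an attached segment
 * or a resource limit is in the way; the data segment is then left
 * unchanged. */
void *grow_break(intptr_t incr)
{
    void *old = sbrk(incr);

    if (old == (void *) -1)
        return NULL;                /* nothing was changed */
    return old;                     /* start of the newly added space */
}
```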

Caveat: The above assumptions were verified on a VAX 11/785 running AT&T's
System V Release 2.0.  They should apply to SunOS, but I haven't experimented
with Sun's implementation of the System V IPC primitives.

By the way, I enthusiastically recommend Rochkind's book to anyone doing
any serious UNIX programming.  It is excellent.

Hope this helps.

Brian M. Clapper					clapper@nadc.arpa
Naval Air Development Center				(215) 441-2118
Warminster, PA 18974-5000

------------------------------

Date:    Thu, 22 Sep 88 09:00:12 PDT
From:    jrich@devnet4.hac.com (john richardson)
Subject: Re: Problem with Sun TCP

We have some users wired to a terminal server so they may connect to any
one of our Suns.  Their .login disables interrupts (initially we were
using 'onintr -' and now 'stty susp undef intr undef') to prevent them as
best as we can from getting to the operating system.  When their
connection is broken, either due to some failure or their hitting the
break key (the terminal server is configured to disconnect them when this
happens), the login csh runs away (soaks up as much cpu time as it can
get) until we kill it with SIGHUP.  We have noted, however, that the in.telnetd
process that spawned the csh did go away on its own when the connection
was broken.

My uneducated (no source code) guess is that the in.telnetd process is
trying to kill its spawned child process with a SIGINT, although it would
seem SIGHUP would be the proper signal to use in this situation.  The cpu
activity generated by the csh is a result of it repeatedly trying to read
from the closed socket and ignoring the errors (for robustness!).
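The guess above is that csh keeps retrying reads on a dead connection.  For contrast, here is a sketch of a read loop that gives up on end-of-file instead of spinning.  read_or_hangup is hypothetical and is not taken from csh's source.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Read up to n bytes from fd, retrying only when a signal interrupted
 * the call.  Returns the byte count, or -1 when the peer has gone away
 * (read() returned 0) or a hard error occurred; the caller should then
 * exit rather than loop. */
ssize_t read_or_hangup(int fd, char *buf, size_t n)
{
    for (;;) {
        ssize_t r = read(fd, buf, n);

        if (r > 0)
            return r;
        if (r == 0)
            return -1;              /* EOF: connection closed */
        if (errno != EINTR)
            return -1;              /* hard error such as EIO */
    }
}
```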

I tried adding 'unset ignoreeof' to the .login.

We are running 3.2 on both Sun 2's and Sun 3's.  We have tried importing a
3.5 version of in.telnetd but this didn't seem to help.

Several items on the net recently seem to be complaining about problems
with various parts of Sun's TCP.  One question about whether Sun uses
keepalives may be the same problem.  One piece of information on the net
suggested that init does not reset some signals to SIG_DFL and so all its
children including logins may inherit a screwed up signal environment.  I
do not think this is a factor.  I renamed in.telnetd to in.telnetd- and
substituted a small program that did reset all signals to SIG_DFL prior to
execing in.telnetd- with no effect.
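A wrapper like the one described might reset signals as sketched below.  Ignored signals and the blocked-signal mask are inherited across exec, and that is why such a wrapper could matter.  exec_clean is a hypothetical name.  The daemon's path, e.g. "/usr/etc/in.telnetd-", is an assumption, since the posting gives only the file name.

```c
#define _DEFAULT_SOURCE                 /* NSIG, sigprocmask on glibc */
#include <signal.h>
#include <stddef.h>
#include <unistd.h>

/* Put every signal back to its default action and unblock them all, so
 * the program exec'd next starts with a clean signal environment.
 * Numbers the C library reserves for itself make signal() fail; those
 * failures are ignored. */
void reset_signals(void)
{
    sigset_t none;
    int sig;

    for (sig = 1; sig < NSIG; sig++)
        if (sig != SIGKILL && sig != SIGSTOP)
            (void) signal(sig, SIG_DFL);
    sigemptyset(&none);
    sigprocmask(SIG_SETMASK, &none, NULL);
}

/* Hypothetical wrapper body: reset signals, then exec the renamed
 * daemon.  Returns -1 only if execv() fails. */
int exec_clean(const char *path, char *const argv[])
{
    reset_signals();
    return execv(path, argv);
}
```

The wrapper's main() would simply call exec_clean() with the renamed daemon's path and its own argv.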

Here is a minimalist description of how to cause this problem.
1.  Save this file somewhere, say /tmp/xxx.
    It is a trimmed down version of the offending .login.

---------------cut here---------------------------
#	Login file for wire planners
#	@(#)planner.login	1.16	8/23/88
#
# Leave suspend keys undefined so user can't
#
if ( `stty speed` !~ [0-9]* ) stty 9600
term -m ":?hp"
reset
stty susp undef dsusp undef intr undef quit undef
# 
# Ask the user if they want to change their password 
#
echo "Do you wish to change your password?"
set answer = "$<"
if ("$answer" == "y") then        
    passwd
endif
#
#	Present menu of planning programs.
#
set choice="1"
while( "$choice" != "0" )
	clear
	echo "           Wire Planner Menu"
	echo ""
	echo "	0 = logout"
	echo "	1 = help"
	echo "	2 = Editor"
	echo "	3 = More"
	echo "	4 = Csh"
	echo ""
	echo -n "?"
	set choice="$<"
	switch ( "$choice" )
		case "0":
			breaksw
		case "1":
#
#			help
#
			echo "This is a modified version of the planner's menu"
			echo "used for testing the runaway csh problem"
			echo "and possible solutions."
			breaksw
		case "2":
			vi /etc/termcap
			breaksw
		case "3":
			more /etc/termcap
			breaksw
		case "4":
			/bin/csh
			breaksw
		default:
			echo "Unknown option $choice"
			breaksw
	endsw
	echo "Hit return to continue"
	set junk="$<"
end
logout
----------------------end of file------------------

2.  telnet to localhost and after logging in,
    source this file, /tmp/xxx.
3.  Answer the terminal type prompt appropriately,
    presumably answer no to changing your password,
    and then select any of options 2, 3, or 4.
4.  When the specified option is started,
    type control-] to interrupt telnet.
    Enter 'close' in response to the telnet prompt.
5.  If you have access to top, use it to watch the csh
    consume all the available cpu on your machine.

Naturally, other applications running on the same machine are adversely
affected, so this can be a debilitating nuisance.  Currently, we are
dealing with the problem by running a program cloned from wnl's top that
looks for csh processes with excessive cpu usage and sends SIGHUPs to
them.  It is effective, but not ideal.
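The selection logic of such a watchdog can be sketched apart from the platform-specific work of reading the process table.  struct proc_sample, is_runaway_csh and hangup_runaways are hypothetical.  The caller would have to fill in the samples and choose the CPU threshold.

```c
#define _DEFAULT_SOURCE                 /* kill() on glibc */
#include <signal.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical process snapshot; gathering it is platform-specific. */
struct proc_sample {
    pid_t pid;
    const char *comm;                   /* command name, e.g. "csh" or "-csh" */
    double cpu_pct;                     /* recent CPU usage in percent */
};

/* Decide whether a process looks like a runaway csh. */
int is_runaway_csh(const struct proc_sample *p, double threshold)
{
    const char *name = p->comm;

    if (name[0] == '-')                 /* login shells are listed as "-csh" */
        name++;
    return strcmp(name, "csh") == 0 && p->cpu_pct >= threshold;
}

/* Send SIGHUP to every runaway csh in the sample; returns how many
 * were signalled successfully. */
int hangup_runaways(const struct proc_sample *procs, int n, double threshold)
{
    int i, count = 0;

    for (i = 0; i < n; i++)
        if (is_runaway_csh(&procs[i], threshold) &&
            kill(procs[i].pid, SIGHUP) == 0)
            count++;
    return count;
}
```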

Any input would be welcome.  In particular, I would like to know whether
this is observed in 3.4 or 3.5 (or, I suppose, 4.0, although that is probably
8 or 9 months away for us).  Thank you.

John Richardson 
Hughes Aircraft Company
jrich@devnet4.hac.com
(714) 732-5588

------------------------------

Date:    Thu, 22 Sep 88 09:21:26 EDT
From:    paul@morganucodon.cis.ohio-state.edu (Paul Placeway)
Subject: Serial line H/W flow control under 3.5 and 4.0

I am interested in getting hardware flow control (RTS/CTS handshaking) to
work on Sun 3/180s (and probably their 3/50 clients) running 3.5.1 or 4.0,
and on a Sun 4/280 running 4.0.  Primarily, I want to make the Sun respond
to handshakes supplied by a printer (that is, making the serial driver
obey the MDMBUF bit), but we would also like to know how to make the Sun
supply handshaking for its input (like TANDEM but in hardware).
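On a system whose termios offers the CRTSCTS flag, RTS/CTS handshaking is usually switched on as sketched below.  CRTSCTS is a widespread extension rather than part of POSIX.  Whether SunOS 3.5.1 or 4.0 honours it on Sun serial ports is an assumption to check; 3.5 programs would normally use the older sgtty-style ioctls instead.  On Linux, for example, the one flag covers both directions.  set_hw_flow and enable_hw_flow are hypothetical names.

```c
#define _DEFAULT_SOURCE                 /* CRTSCTS, IXANY on glibc */
#include <termios.h>

/* Turn on RTS/CTS hardware handshaking in a termios structure and turn
 * off XON/XOFF software flow control so the two do not conflict. */
void set_hw_flow(struct termios *t)
{
    t->c_cflag |= CRTSCTS;
    t->c_iflag &= ~(IXON | IXOFF | IXANY);
}

/* Apply it to an open serial line; returns 0 on success, -1 on error. */
int enable_hw_flow(int fd)
{
    struct termios t;

    if (tcgetattr(fd, &t) < 0)
        return -1;
    set_hw_flow(&t);
    return tcsetattr(fd, TCSANOW, &t);
}
```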

Any pointers?
		Thanks in advance,
		Paul Placeway
		paul@cis.ohio-state.edu

------------------------------

Date:    Thu, 22 Sep 88 10:34:42 EDT
From:    ehrlich@shire.cs.psu.edu (Dan Ehrlich)
Subject: stdio.h and dbm.h have different views of NULL

Machine Type:	Sun 4/260S
O/S Version:	SunOS 4.0
Organization:	Computer Science Department
		The Pennsylvania State University
		333 Whitmore Laboratory
		University Park, PA   16802
Phone Number:	+1 814 865 9723

Description:

	If a program includes both stdio.h and dbm.h a compiler
	diagnostic is generated complaining that NULL is redefined.
	From looking at stdio.h we find that NULL is:

		#define	NULL	0

	Now if we look at dbm.h we find that NULL is defined as:

		#define	NULL	((char *) 0)

	Probably either will work.  Which one is correct?  Is dbm not
	implemented with ndbm as in 4.3BSD?

Repeat-By:

	Compile the following C program:

		#include <stdio.h>
		#include <dbm.h>
		main(){}

Fix:

	Try doing it the 4.3BSD way by surrounding the definition of
	NULL in dbm.h with #ifndef NULL  ...  #endif
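The sketch below shows the suggested guard in place.  It reproduces the dbm.h fragment inline rather than editing a system header, and null_works is a hypothetical test function.  If the dbm.h definition won instead, `int *ip = NULL;` would assign a `char *` to an `int *`, which an ANSI compiler diagnoses.  Plain `0` converts to any pointer type.

```c
#include <stdio.h>                      /* defines NULL first */

/* Stand-in for the guarded definition proposed for dbm.h.  Because
 * <stdio.h> has already defined NULL, this block is skipped and the
 * compiler sees no redefinition. */
#ifndef NULL
#define NULL ((char *) 0)
#endif

/* Use NULL with two different pointer types. */
int null_works(void)
{
    char *cp = NULL;
    int *ip = NULL;

    return cp == 0 && ip == 0;
}
```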

------------------------------

Date:    22 Sep 88 13:08:57 GMT
From:    emory!km@gatech.edu (Ken Mandelberg)
Subject: Problem: SLOW Boot on diskless Suns

We frequently find that some of our diskless Sun 3's take a very long time
to boot. The problem is early in the game during the tftp, presumably of
the boot program from the server. From the console of the diskless Sun all
one sees is a very slowly rotating cursor. Eventually the cursor takes off
and the tftp completes rapidly and all is well.  The delay can be 5
minutes or more.

If I monitor the network from another node with etherfind I can see that
the diskless Sun has successfully rarped with the server and the delay
period is actually the server repeatedly sending a large packet to the
client followed by the client sending a small packet back. It looks like
the server is trying to send a tftp data packet that the client keeps
rejecting. After several minutes of this amounting to 1000s of these
transactions, something times out and it all starts again. I have only
monitored this a few times. One time the next attempt after the timeout
was successful, another time it took until the third attempt.

Right now we are seeing this consistently between a Sun 3/60 client and a
Sun 4/280 server, both running 4.0. We have also seen it to a lesser
extent between other Sun 3 clients and servers on 3.X.

It appears that what is happening is that the tftp is going wrong right at
the beginning and the protocol is too naive to recover without several
minutes of timeout.

In case it's relevant, we have a class B network using a netmask of
ffff0000 on all Sun nodes. There are other nodes on the network that are
running older TCP software (AT&T and DEC VMS) that should not be talking
to the Suns, but I suppose could be. During my etherfind I monitored all
traffic to or from the client's IP address and saw no packets from anyone
but the server.

Any ideas?

[[ This second part was mailed in two days later:  --wnl ]]

I recently posted a note about our problem with the slow booting of our
diskless sun 3's. Since the time I posted the note, we have monitored the
network a bit more carefully and I find that my original description was
misleading.

The external symptoms from the console of the booting diskless Suns
remain the same. A certain percentage of the boots are uneventful and
quite quick. However, other boots go very slowly and take 10 minutes to do
the tftp of the boot program from the host.  I can take one machine and
repeatedly boot it, with maybe 3 fast boots and 2 slow ones out of 5.

Previously I was convinced that on the slow boots the tftp was
getting hung, timing out and restarting from the beginning until it
finally succeeded. More careful observations with etherfind actually show a
different pattern. On the slow tftp's the tftp is completing successfully
with no bad packets, just running very slowly. The server sends a packet
and the client acknowledges each with no retransmits.  The problem is that
the client frequently waits several seconds before sending its
acknowledgement. In fact it looks like this delay occurs approximately on
every other acknowledgement. Since the boot program is several hundred
packets long, the accumulated delay for the entire tftp is several hundred
seconds. Once the boot program takes control the rest of the boot goes
quickly.
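The "several hundred seconds" figure follows from TFTP being lock-step.  Under RFC 1350 the sender transmits one 512-byte DATA block and waits for its ACK before sending the next, so every delayed ACK adds straight to the total.  The small model below is a sketch.  tftp_blocks and tftp_seconds are hypothetical, and the model ignores retransmissions.

```c
#define TFTP_BLOCK 512                  /* RFC 1350 data block size */

/* Number of DATA blocks for a file of `bytes` bytes.  The transfer ends
 * with a short block, so a file that is an exact multiple of 512 bytes
 * needs an extra zero-length block. */
long tftp_blocks(long bytes)
{
    return bytes / TFTP_BLOCK + 1;
}

/* Total seconds: every block costs `rtt` seconds, and every `every`-th
 * ACK is held back an extra `delay` seconds (every <= 0: no delays). */
double tftp_seconds(long blocks, double rtt, long every, double delay)
{
    long delayed = every > 0 ? blocks / every : 0;

    return blocks * rtt + delayed * delay;
}
```

For example, 300 blocks with a 2-second stall on every other ACK adds 150 * 2 = 300 seconds.  These are illustrative numbers, not measurements from the posting.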

It appears that PROM-based tftp is fragile, and some condition on our net
can induce it to subsequently introduce a long delay between receiving a
tftp packet and acknowledging it. We think that it may be getting confused
by broadcast packets. If the tftp starts at a clear moment it completes
quickly before the next broadcast packet appears.  This is just a guess.

From an operational point of view the workaround is clear. If the boot is
a slow one, you just interrupt it and try again.  After a small number of
tries you get a fast tftp.

Does this seem familiar to anyone? We are running Sun 3/50 and 3/60
diskless clients on a Sun 4 server all running 4.0. The same symptoms
showed up on 3.X with a Sun 3 server too. Since the problem seems to be in
the PROM tftp, it would seem to be OS invariant.

-- 
Ken Mandelberg      | km@mathcs.emory.edu          PREFERRED
Emory University    | {decvax,gatech}!emory!km     UUCP 
Dept of Math and CS | km@emory                     NON-DOMAIN BITNET  
Atlanta, GA 30322   | Phone: (404) 727-7963

------------------------------

Date:    Thu, 22 Sep 88 10:52:30 BST
From:    Alan McIvor <mcivor%robots.oxford.ac.uk@nss.cs.ucl.ac.uk>
Subject: Device driver for Motorola MVME300 GPIB board?

Hi,

Does anybody know where I can get a device driver for the Motorola MVME300
GPIB (HP-IB, IEEE-488) board? The only GPIB software in Catalyst is based
on a National Instruments board.

Thanks,
	Alan M. McIvor

JANET: mcivor@uk.ac.oxford.robots (or robots.oxford.ac.uk)
ARPA: mcivor%uk.ac.oxford.robots@nss.cs.ucl.ac.uk
UUCP: ...!mcvax!ukc!ox-rob!mcivor

Post: Alan M. McIvor
      Robotics Research Group
      Department of Engineering Science
      Oxford University
      Parks Road
      Oxford OX1 3PJ
      UK

------------------------------

End of SUN-Spots Digest
***********************