Sun-Spots-Request@Rice.edu (William LeFebvre) (09/28/88)
SUN-SPOTS DIGEST        Monday, 26 September 1988       Volume 6 : Issue 238

Today's Topics:
    Re: Shared Memory vs. Malloc()
    Re: Problem with Sun TCP
    Serial line H/W flow control under 3.5 and 4.0
    stdio.h and dbm.h have different views of NULL
    Problem: SLOW Boot on diskless Suns
    Device driver for Motorola MVME300 GPIB board?

Send contributions to:  sun-spots@rice.edu
Send subscription add/delete requests to:  sun-spots-request@rice.edu
Bitnet readers can subscribe directly with the CMS command:
    TELL LISTSERV AT RICE SUBSCRIBE SUNSPOTS My Full Name
Recent backissues are available via anonymous FTP from "titan.rice.edu".
For volume X, issue Y, "get sun-spots/vXnY".  They are also accessible
through the archive server:  mail the request "send sun-spots vXnY" to
"archive-server@rice.edu" or mail the word "help" to the same address
for more information.

----------------------------------------------------------------------

Date:    Thu, 22 Sep 88 09:45:48 EDT
From:    clapper@nadc.arpa (Brian M. Clapper)
Subject: Re: Shared Memory vs. Malloc()

I ran into similar problems once, on a pure System V machine (not on a Sun). Perhaps my experiences will help.

Marc Rochkind's _Advanced_UNIX_Programming_ (Prentice-Hall, 1985) provides good background to the problem. Here's an excerpt (p. 194):

"You map (attach) a segment to your address space with shmat. You can request a particular address by setting the addr argument appropriately. This is important if you are also allocating memory dynamically with the brk or sbrk system calls ... because they won't go beyond an attached shared segment. Unfortunately, in many applications, this constraint requires you to know exactly how memory is to be used, and to carefully assign regions to shared segments and dynamic allocation subroutines (such as malloc). This makes programs buggy, hard to maintain, and nonportable. If you don't care where the memory is mapped (lucky you!), you can use zero for addr and let the kernel pick an address.
Don't expect to be able to use brk or sbrk again, however. If you've already called them, the kernel will respect their work and allocate the shared segment out of harm's way."

I ran into this problem while working on a System V-specific application, so the issue of portability did not arise. I considered two solutions, both of which required knowledge of how memory was to be used.

The first was Rochkind's: specify an address at which to attach shared memory. This solution requires that you calculate the end of the data segment, and know how much room is available to expand the data segment. You can locate the end of the data segment with the sbrk() system call, passing it a '0' argument. On p. 229 of the aforementioned book, Rochkind provides a small subroutine to calculate available space. (Notice that it, too, uses sbrk.)

    long avail ()    /* return number of available bytes */
    {
        char *sbrk ();
        long ulimit ();

        return (ulimit (3, 0L) - (long) sbrk (0));
    }

With these two pieces of information, along with the size of the shared memory segment you wish, you can allocate your shared memory as far away from the current end of your data segment as possible. Note that shmat *may* attach the shared memory a little closer to the end of your data segment than you ask for, since it may try to align the segment on a segment boundary. If you really care about this, you can set some arbitrary lower address limit and compare the return result from shmat with that limit.

A simpler, but slightly kludgier, way of doing it is to pre-allocate a large area of memory with malloc, before you attach your shared memory segment. Estimate the maximum amount of memory your program needs to malloc (and add a safety factor). At the beginning of your program, malloc that much memory. Then free it. Typically, malloc and free try to minimize the number of system calls that are made.
Consequently, free doesn't actually reduce the size of the data segment; it simply marks the freed memory as available and adds it to a free list. When you call malloc, it examines the free list to determine whether it can satisfy your request from there. If it can't, then it attempts to enlarge the data segment with an sbrk call.

So, by doing one super-large malloc at the beginning, you force malloc to call sbrk to enlarge the data segment. When you free it, you don't reduce the data segment, so you can attach your shared memory segment without specifying an address and be reasonably sure it is out of harm's way. As in the previous solution, you might want to calculate whether you have enough room to do all this first.

This solution is simpler (in a sense), but it requires you to couple your code to a presumed malloc/free behavior. This is probably safe, but only you can make that decision.

Finally, I did not get the impression that malloc can overlap the shared memory segment. If the shared memory segment is 'in the way' of malloc's attempt to enlarge the data segment, the malloc will fail (because its call to sbrk fails). Your data area should not become corrupt (assuming you are programming carefully :-)); you just won't be able to malloc.

Caveat: The above assumptions were verified on a VAX 11/785 running AT&T's System V Release 2.0. They should apply to SunOS, but I haven't experimented with Sun's implementation of the System V IPC primitives.

By the way, I enthusiastically recommend Rochkind's book to anyone doing any serious UNIX programming. It is excellent.

Hope this helps.

Brian M. Clapper                 clapper@nadc.arpa
Naval Air Development Center     (215) 441-2118
Warminster, PA 18974-5000

------------------------------

Date:    Thu, 22 Sep 88 09:00:12 PDT
From:    jrich@devnet4.hac.com (john richardson)
Subject: Re: Problem with Sun TCP

We have some users wired to a terminal server so they may connect to any one of our Suns.
Their .login disables interrupts (initially we were using 'onintr -' and now 'stty susp undef intr undef') to prevent them as best we can from getting to the operating system. When their connection is broken, either due to some failure or their hitting the break key (the terminal server is configured to disconnect them when this happens), the login csh runs away (soaks up as much cpu time as it can get) until we kill it with SIGHUP.

We have noted, however, that the in.telnetd process that spawned the csh did go away on its own when the connection was broken. My uneducated (no source code) guess is that the in.telnetd process is trying to kill its spawned child process with a SIGINT, although it would seem SIGHUP would be the proper signal to use in this situation. The cpu activity generated by the csh is a result of it repeatedly trying to read from the closed socket and ignoring the errors (for robustness!). I tried adding 'unset ignoreeof' to the .login.

We are running 3.2 on both Sun 2's and Sun 3's. We have tried importing a 3.5 version of in.telnetd but this didn't seem to help.

Several items on the net recently seem to be complaining about problems with various parts of Sun's TCP. One question about whether Sun uses keepalives may be the same problem.

One piece of information on the net suggested that init does not reset some signals to SIG_DFL and so all its children including logins may inherit a screwed up signal environment. I do not think this is a factor. I renamed in.telnetd to in.telnetd- and substituted a small program that did reset all signals to SIG_DFL prior to execing in.telnetd- with no effect.

Here is a minimalist description of how to cause this problem.

1. Save this file somewhere, say /tmp/xxx. It is a trimmed down version of the offending .login.
---------------cut here---------------------------
# Login file for wire planners
# @(#)planner.login 1.16 8/23/88
#
# Leave suspend keys undefined so user can't
#
if ( `stty speed` !~ [0-9]* ) stty 9600
term -m ":?hp"
reset
stty susp undef dsusp undef intr undef quit undef
#
# Ask the user if they want to change their password
#
echo "Do you wish to change your password?"
set answer = "$<"
if ("$answer" == "y") then
    passwd
endif
#
# Present menu of planning programs.
#
set choice="1"
while( "$choice" != "0" )
    clear
    echo " Wire Planner Menu"
    echo ""
    echo " 0 = logout"
    echo " 1 = help"
    echo " 2 = Editor"
    echo " 3 = More"
    echo " 4 = Csh"
    echo ""
    echo -n "?"
    set choice="$<"
    switch ( "$choice" )
    case "0":
        breaksw
    case "1":
        #
        # help
        #
        echo "This is a modified version of the planner's menu"
        echo "used for testing the runaway csh problem"
        echo "and possible solutions."
        breaksw
    case "2":
        vi /etc/termcap
        breaksw
    case "3":
        more /etc/termcap
        breaksw
    case "4":
        /bin/csh
        breaksw
    default:
        echo "Unknown option $choice"
        breaksw
    endsw
    echo "Hit return to continue"
    set junk="$<"
end
logout
----------------------end of file------------------

2. telnet to localhost and after logging in, source this file, /tmp/xxx.

3. Answer the terminal type prompt appropriately, presumably answer no to changing your password, and then select any of options 2, 3, or 4.

4. When the specified option is started, type control-] to interrupt telnet. Enter 'close' in response to the telnet prompt.

5. If you have access to top, use it to watch the csh consume all the available cpu on your machine.

Naturally, other applications running on the affected machine are adversely affected, so this can be a debilitating nuisance. Currently, we are dealing with the problem by running a program cloned from wnl's top that looks for csh processes with excessive cpu usage and sends SIGHUPs to them. It is effective, but not ideal.

Any input would be welcome.
In particular, I would like to hear whether this is observed in 3.4 or 3.5 (or, I suppose, 4.0, although that is probably 8 or 9 months away for us).

Thank you.

John Richardson
Hughes Aircraft Company
jrich@devnet4.hac.com
(714) 732-5588

------------------------------

Date:    Thu, 22 Sep 88 09:21:26 EDT
From:    paul@morganucodon.cis.ohio-state.edu (Paul Placeway)
Subject: Serial line H/W flow control under 3.5 and 4.0

I am interested in getting hardware flow control (RTS/CTS handshaking) to work on Sun 3/180s (and probably their 3/50 clients) running 3.5.1 or 4.0, and on a Sun 4/280 running 4.0.

Primarily, I want to make the Sun respond to handshakes supplied by a printer (that is, making the serial driver obey the MDMBUF bit), but we would also like to know how to make the Sun supply handshaking for its input (like TANDEM but in hardware).

Any pointers?

Thanks in advance,
Paul Placeway
paul@cis.ohio-state.edu

------------------------------

Date:    Thu, 22 Sep 88 10:34:42 EDT
From:    ehrlich@shire.cs.psu.edu (Dan Ehrlich)
Subject: stdio.h and dbm.h have different views of NULL

Machine Type:   Sun 4/260S
O/S Version:    SunOS 4.0
Organization:   Computer Science Department
                The Pennsylvania State University
                333 Whitmore Laboratory
                University Park, PA 16802
Phone Number:   +1 814 865 9723

Description:

If a program includes both stdio.h and dbm.h, a compiler diagnostic is generated complaining that NULL is redefined. From looking at stdio.h we find that NULL is:

#define NULL 0

Now if we look at dbm.h we find that NULL is also:

#define NULL ((char *) 0)

Probably either will work. Which one is correct? Is dbm not implemented with ndbm as in 4.3BSD?

Repeat-By:

Compile the following C program:

#include <stdio.h>
#include <dbm.h>
main(){}

Fix:

Try doing it the 4.3BSD way by surrounding the definition of NULL in dbm.h with

#ifndef NULL
...
#endif

------------------------------

Date:    22 Sep 88 13:08:57 GMT
From:    emory!km@gatech.edu (Ken Mandelberg)
Subject: Problem: SLOW Boot on diskless Suns

We frequently find that some of our diskless Sun 3's take a very long time to boot. The problem is early in the game during the tftp, presumably of the boot program from the server. From the console of the diskless Sun all one sees is a very slowly rotating cursor. Eventually the cursor takes off and the tftp completes rapidly and all is well. The delay can be 5 minutes or more.

If I monitor the network from another node with etherfind I can see that the diskless Sun has successfully rarped with the server and the delay period is actually the server repeatedly sending a large packet to the client followed by the client sending a small packet back. It looks like the server is trying to send a tftp data packet that the client keeps rejecting. After several minutes of this, amounting to 1000s of these transactions, something times out and it all starts again. I have only monitored this a few times. One time the next attempt after the timeout was successful; another time it took until the third attempt.

Right now we are seeing this consistently between a Sun 3/60 client and a Sun 4/280 server, both running 4.0. We have also seen it to a lesser extent between other Sun 3 clients and servers on 3.X.

It appears that what is happening is that the tftp is going wrong right at the beginning and the protocol is too naive to recover without several minutes of timeout.

In case it's relevant, we have a class B network using a netmask of ffff0000 on all Sun nodes. There are other nodes on the network that are running older TCP software (AT&T and DEC VMS) that should not be talking to the Suns, but I suppose could be. During my etherfind I monitored all traffic to or from the client's IP address and saw no packets from anyone but the server.

Any ideas?
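
For anyone repeating the etherfind observation, a minimal sketch of telling the server's data packets from the client's replies may help. It assumes the traffic is standard TFTP as described in RFC 783: every packet starts with a 2-byte opcode in network byte order, a DATA packet (opcode 3) carries a 2-byte block number followed by up to 512 bytes of data, and an ACK (opcode 4) carries only the block number. The type and function names below are hypothetical, and the code expects the UDP payload to be handed to it; it does not capture anything from the wire itself.

```c
#include <assert.h>
#include <stddef.h>

/* TFTP opcodes as defined in RFC 783. */
enum tftp_opcode {
    TFTP_RRQ = 1, TFTP_WRQ = 2, TFTP_DATA = 3, TFTP_ACK = 4, TFTP_ERROR = 5
};

/* Hypothetical summary of one TFTP packet (not a SunOS structure). */
struct tftp_summary {
    int      opcode;    /* one of enum tftp_opcode, or 0 if unrecognized */
    unsigned block;     /* block number; meaningful for DATA and ACK only */
    size_t   data_len;  /* bytes of file data; meaningful for DATA only */
};

/* Decode the fixed header of a TFTP packet held in payload[0..len-1].
 * Returns 0 on success, -1 if the packet is too short or unrecognized. */
int tftp_summarize(const unsigned char *payload, size_t len,
                   struct tftp_summary *out)
{
    unsigned op;

    assert(payload != NULL && out != NULL);
    out->opcode = 0;
    out->block = 0;
    out->data_len = 0;

    if (len < 2)
        return -1;
    op = ((unsigned)payload[0] << 8) | payload[1];
    if (op < TFTP_RRQ || op > TFTP_ERROR)
        return -1;

    if (op == TFTP_DATA || op == TFTP_ACK) {
        if (len < 4)
            return -1;
        out->block = ((unsigned)payload[2] << 8) | payload[3];
        if (op == TFTP_DATA)
            out->data_len = len - 4;
    }
    out->opcode = (int)op;
    return 0;
}

/* Returns 1 when cur repeats the opcode and block number of prev,
 * which is what a retransmitted DATA or ACK packet looks like. */
int tftp_is_repeat(const struct tftp_summary *prev,
                   const struct tftp_summary *cur)
{
    assert(prev != NULL && cur != NULL);
    return prev->opcode == cur->opcode && prev->block == cur->block;
}
```

Running consecutive packets from the same sender through tftp_is_repeat() would distinguish a block that is being resent over and over from a transfer that is advancing one block at a time, only slowly.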

[[ This second part was mailed in two days later: --wnl ]]

I recently posted a note about our problem with the slow booting of our diskless Sun 3's. Since the time I posted the note, we have monitored the network a bit more carefully and I find that my original description was misleading.

The external symptoms from the console of the booting diskless Suns remain the same. A certain percentage of the boots are uneventful and quite quick. However, other boots go very slowly and take 10 minutes to do the tftp of the boot program from the host. I can take one machine and repeatedly boot it, with maybe 3 fast boots and 2 slow ones out of 5.

Previously I was convinced that on the slow boots the tftp was getting hung, timing out and restarting from the beginning until it finally succeeded. More careful observations with etherfind actually show a different pattern.

On the slow tftp's the tftp is completing successfully with no bad packets, just running very slowly. The server sends a packet and the client acknowledges each with no retransmits. The problem is that the client frequently waits several seconds before sending its acknowledgement. In fact it looks like this delay occurs approximately on every other acknowledgement. Since the boot program is several hundred packets long, the accumulated delay for the entire tftp is several hundred seconds. Once the boot program takes control the rest of the boot goes quickly.

It appears that PROM based tftp is fragile, and some condition on our net can induce it to subsequently introduce a long delay between receiving a tftp packet and acknowledging it. We think that it may be getting confused by broadcast packets. If the tftp starts at a clear moment it completes quickly before the next broadcast packet appears. This is just a guess.

From an operational point of view the workaround is clear. If the boot is a slow one, you just interrupt it and try again. After a small number of tries you get a fast tftp.
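
The arithmetic behind the "several hundred seconds" figure above can be sketched as follows. This is only an illustration: the boot program size, the length of each stall, and the function name are assumptions rather than measurements from our net. The 512-byte block size is the standard TFTP data block, and a transfer always ends with a block shorter than 512 bytes (possibly empty), hence the extra block in the count.

```c
#include <assert.h>

#define TFTP_BLOCK_SIZE 512L    /* standard TFTP data block size in bytes */

/*
 * Estimate the total time, in seconds, that one transfer spends stalled.
 *   file_bytes    : size of the boot program (assumed, not measured)
 *   stall_every   : one acknowledgement in this many is delayed
 *                   (2 means every other acknowledgement)
 *   stall_seconds : length of each delay in seconds (assumed)
 */
long tftp_stall_estimate(long file_bytes, long stall_every, long stall_seconds)
{
    long blocks;

    assert(file_bytes >= 0 && stall_every > 0 && stall_seconds >= 0);

    /* Full blocks plus the final short (possibly empty) block. */
    blocks = file_bytes / TFTP_BLOCK_SIZE + 1;
    return (blocks / stall_every) * stall_seconds;
}
```

For a hypothetical 153600-byte boot program (301 data blocks) with a 2-second stall on every other acknowledgement, tftp_stall_estimate(153600L, 2, 2) gives 150 stalls of 2 seconds each, or 300 seconds, which is in line with the slow boots described above.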

Does this seem familiar to anyone? We are running Sun 3/50 and 3/60 diskless clients on a Sun 4 server all running 4.0. The same symptoms showed up on 3.X with a Sun 3 server too. Since the problem seems to be in the PROM tftp, it would seem to be OS invariant.

--
Ken Mandelberg      | km@mathcs.emory.edu          PREFERRED
Emory University    | {decvax,gatech}!emory!km     UUCP
Dept of Math and CS | km@emory                     NON-DOMAIN BITNET
Atlanta, GA 30322   | Phone: (404) 727-7963

------------------------------

Date:    Thu, 22 Sep 88 10:52:30 BST
From:    Alan McIvor <mcivor%robots.oxford.ac.uk@nss.cs.ucl.ac.uk>
Subject: Device driver for Motorola MVME300 GPIB board?

Hi,

Does anybody know where I can get a device driver for the Motorola MVME300 GPIB (HP-IB, IEEE-488) board? The only GPIB software in Catalyst is based on a National Instruments board.

Thanks,

Alan M. McIvor

JANET: mcivor@uk.ac.oxford.robots (or robots.oxford.ac.uk)
ARPA:  mcivor%uk.ac.oxford.robots@nss.cs.ucl.ac.uk
UUCP:  ...!mcvax!ukc!ox-rob!mcivor
Post:  Alan M. McIvor
       Robotics Research Group
       Department of Engineering Science
       Oxford University
       Parks Road
       Oxford OX1 3PJ
       UK

------------------------------

End of SUN-Spots Digest
***********************