devil@diablery.10A.com (Gil Tene) (06/01/91)
Hello UUCPeople,

I am running a Smail 3.1.19 configuration (straight off the
uunet archives), and I have lately been getting core dumps
from Smail during heavy uucp mail transfers.  The core dumps
are not consistent, and the data of the mail message is NOT
lost; it is sent successfully 20 min. later during the next
smail daemon pass.  I have not been able to generate these
core dumps on purpose, since it does not seem to be the
actual mail message data that causes them, but a combination
of data and load, probably causing some locking problems.

When smail core dumps, it sends a mail message to the
originator of the message that failed.  Below is a sample
message of this kind:

	remote execution
	[uucp job uunetCZbF3 (6/1-2:06:02)]
	rmail gad.fibronics!pablo
	exited with status 138
	===== stderr was =====
	sh: 10123 Bus error - core dumped

The remote node (gad.fibronics) is known, and as I said
before, this exact same execution WILL succeed if there is
no "heavy" load of mail messages being delivered at the
same time.

Anyone out there have any ideas on how to fix this?  The
really bad part is that the originator is getting a failure
message while the message is actually getting through later.

AdvThanks,
-- Gil.
-- 
--------------------------------------------------------------------
-- Gil Tene                 "Some days it just doesn't pay         -
-- devil@imp.HellNet.org     to go to sleep in the morning."       -
-- devil@diablery.10A.com                                          -
--------------------------------------------------------------------
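[The "exited with status 138" in the failure notice can be decoded
mechanically: by sh convention, a status of 128+k means the child was
killed by signal k, so 138 is signal 10 -- which is SIGBUS ("Bus error")
on SunOS and other BSD-derived systems of this era.  A minimal POSIX sh
sketch, using the status from the notice above:]

```shell
# Decode a uuxqt-style "exited with status N" report.
# Statuses of 128+k conventionally mean "killed by signal k".
status=138
if [ "$status" -ge 128 ]; then
    echo "killed by signal $((status - 128))"   # 10 = SIGBUS on SunOS/BSD
else
    echo "normal exit, code $status"
fi
```

[Signal numbering varies by system; on some platforms signal 10 is
SIGUSR1 instead, so check signal(3) or /usr/include/sys/signal.h
locally before trusting the name.]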
tron@Veritas.COM (Ronald S. Karr) (06/02/91)
In article <345@imp.UUCP> devil@diablery.10A.com (Gil Tene) writes:
>The remote node (gad.fibronics) is known, and like I said
>before, this exact same execution WILL succeed if there is
>no "heavy" load of mail messages being delivered at the
>same time.
>
>Anyone out there have any ideas on how to fix this? The real
>bad part is that the originator is getting a failure message
>while the message is actually getting through later.

Smail operates on each message individually, so I don't quite
know how a bug in the program itself can be load-related.
However, smail does have serious problems on systems with a
heavy load.  It does not limit its usage of machine resources
in any way, so it is capable of exhausting resources that it
needs to operate (such as memory and paging space).

There are known bugs in 3.1.19 that can cause core dumps in
fairly random situations, and there are likely more.  Since
these bugs are most often related to malloc/free bugs or
data-clobbering bugs, they tend to be very machine- and
situation-dependent.  Also, the reported core dump stack
traces are often unrelated to the real problem (which is
often true of malloc/free bugs).

The best thing to do is to get a stack trace from the core
file, hopefully with -g-style debugging information, and to
mail it to me.  That is, unless you can track down the bug
and mail a fix to me, which is even better (I think I have
been reasonably responsive lately, though I have not always
been responsive in the past).  Since I only have a few types
of machines directly available to me, my ability to track
down vague problems is very limited.
-- 
tron |-<=>-|  ARPAnet:  veritas!tron@apple.com  tron@veritas.com
              UUCPnet:  {amdahl,apple,pyramid}!veritas!tron
fkk@stasys.sta.sub.org (Frank Kaefer) (06/03/91)
devil@diablery.10A.com (Gil Tene) writes:
|Hello UUCPeople,
|I am running an Smail 3.1.19 configuration (straight off the
|uunet archives), and I have lately been getting core dumps
|from Smail during heavy uucp mail transfers. The core dumps

I have the same problem with smail "/\==/\ Smail3.1.21.1 #21.1"
(I had the same problem in 3.1.20 and 3.1.19).  The debugger
says: core dumped from rmail (link to smail), functions:
main -> perform_deliver_mail -> unlock_message -> SIGSEGV (11).

These core dumps appear regardless of the system load; I even
had them just mailing one message with no load on the system.

Any help/pointers etc. greatly appreciated.

[My system is a Sun 4/40 running SunOS 4.1.1 B]

Cheers,
Frank
-- 
| Frank Kaefer | fkk@stasys.sta.sub.org | Starnberg, Germany |
| Compuserve: 72427,2101 | Internet: fkk@Germany.Sun.COM     |
| unido!sunde!fkaefer    | postmaster@Germany.Sun.COM        |
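[For anyone wanting to send Karr the stack trace he asked for, a
session along these lines will produce one.  The binary path below is
a guess -- adjust it to your installation; on SunOS 4.x the bundled
debuggers were dbx and adb, but gdb behaves the same way if installed:]

```shell
# Paths are illustrative; substitute your smail binary and core file.
gdb /usr/lib/smail core       # or: dbx /usr/lib/smail core
# At the prompt:
#   (gdb) bt          # print the stack trace (dbx: "where")
#   (gdb) info reg    # registers at the moment of the fault
# Rebuilding smail with -g in CFLAGS makes the trace symbolic,
# with file names and line numbers, which is far more useful.
```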
les@chinet.chi.il.us (Leslie Mikesell) (06/04/91)
In article <345@imp.UUCP> devil@diablery.10A.com (Gil Tene) writes:
>Hello UUCPeople,
>
>I am running an Smail 3.1.19 configuration (straight off the
>uunet archives), and I have lately been getting core dumps
>from Smail during heavy uucp mail transfers. The core dumps
>are not consistent, and the data of the mail message is NOT
>lost, it is sent succesfully 20 min. later during the next
>smail daemon pass. I have not been able to generate these
>core dumps on purpose, since it does not seem to be the
>actual mail message data that causes them, but a combination
>of data and load, probably causing some locking problems.

It may be memory problems instead, especially if you have set
delivery_mode = background in the config file.  If you do
this, uuxqt won't wait for smail to complete delivery before
starting a new one.  After writing the copy to the queue
file, smail3 will attempt to malloc() message_buf_size
(default = 100k) plus the work space for handling the alias,
paths and forwarding files, etc.  Several of these at once
can swamp a small machine.

>Anyone out there have any ideas on how to fix this? The real
>bad part is that the originator is getting a failure message
>while the message is actually getting through later.

If you have background delivery set (smail forks after
queuing to continue delivery in a different process), change
to foreground or daemon mode.  Try setting message_buf_size
lower (or increase the available resources if you are
hitting a swap space or per-user memory limit).  If you have
HDB uucp, be sure your Maxuuxqts is set to a low number.

Les Mikesell
  les@chinet.chi.il.us
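[Mikesell's suggestions amount to two small edits.  The file locations
below are the common defaults, not universal -- an assumption, so check
your own build.  The attribute names delivery_mode and message_buf_size
are the ones his article uses:]

```
# smail config file (often /usr/lib/smail/config -- an assumption):
delivery_mode=foreground    # or "daemon"; avoids a forked delivery per message
message_buf_size=51200      # 50k instead of the 100k default

# HDB uucp: /usr/lib/uucp/Maxuuxqts caps concurrent uuxqt processes.
# The file holds just a number, e.g.:
2
```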