[comp.mail.uucp] Smail 3.1 core dumping question...

devil@diablery.10A.com (Gil Tene) (06/01/91)

Hello UUCPeople,

I am running a Smail 3.1.19 configuration (straight off the
uunet archives), and I have lately been getting core dumps
from Smail during heavy uucp mail transfers. The core dumps
are not consistent, and the data of the mail message is NOT
lost; it is sent successfully 20 min. later during the next
smail daemon pass. I have not been able to generate these
core dumps on purpose, since it does not seem to be the
actual mail message data that causes them, but a combination
of data and load, probably causing some locking problems.

When smail core dumps, it sends a mail message to the
originator of the message that failed. Below is a sample
message of this kind:


remote execution        [uucp job uunetCZbF3 (6/1-2:06:02)]
        rmail gad.fibronics!pablo
exited with status 138


        ===== stderr was =====
sh: 10123 Bus error - core dumped

The remote node (gad.fibronics) is known, and like I said
before, this exact same execution WILL succeed if there is
no "heavy" load of mail messages being delivered at the
same time. (Exit status 138 is 128 + 10, and signal 10 is
SIGBUS, which matches the "Bus error" line above.)

Anyone out there have any ideas on how to fix this? The real
bad part is that the originator is getting a failure message
while the message is actually getting through later.

AdvThanks,

-- Gil.
-- 
--------------------------------------------------------------------
-- Gil Tene			"Some days it just doesn't pay     -
-- devil@imp.HellNet.org	   to go to sleep in the morning." -
-- devil@diablery.10A.com 					   -
--------------------------------------------------------------------

tron@Veritas.COM (Ronald S. Karr) (06/02/91)

In article <345@imp.UUCP> devil@diablery.10A.com (Gil Tene) writes:
>The remote node (gad.fibronics) is known, and like I said 
>before, this exact same execution WILL succeed if there is 
>no "heavy" load of mail messages being delivered at the
>same time.
>
>Anyone out there have any ideas on how to fix this? The real
>bad part is that the originator is getting a failure message
>while the message is actually getting through later.

Smail operates on each message individually, so I don't quite know
how a bug in the program itself can be load-related.  However, smail
does have serious problems on systems with a heavy load.  It does
not limit its usage of machine resources in any way, so it is
capable of exhausting the resources that it needs to operate (such
as memory and paging space).
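
To illustrate (a generic sketch, NOT smail source): a program
that allocates big buffers without checking the result dies
with a signal instead of a polite error message the moment a
loaded machine runs out of memory or swap:

	/* Generic illustration -- not smail code.  An unchecked
	 * allocation turns "out of memory" into a core dump. */
	#include <stdio.h>
	#include <stdlib.h>
	#include <string.h>

	int main(void)
	{
		char *buf = malloc(100 * 1024);	/* can fail under load */

		/* no test for buf == NULL ... */
		strcpy(buf, "queued message");	/* NULL deref: core dump */
		printf("%s\n", buf);
		free(buf);
		return 0;
	}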

There are known bugs in 3.1.19 that can cause core dumps in fairly
random situations, and there are likely more.  Since these are most
often malloc/free bugs or data-clobbering bugs, they tend to be very
machine- and situation-dependent.  Also, the reported core dump
stack traces are often unrelated to the real problem (which is
often true of malloc/free bugs).
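
Here is a sketch of why the trace lies (again generic code, not
smail's actual bug): a one-byte overrun corrupts malloc's hidden
bookkeeping, and the dump happens later in perfectly innocent code:

	/* Generic illustration -- not smail's actual bug.  The
	 * overrun in f() clobbers malloc's bookkeeping; the crash
	 * usually surfaces later, inside free() or malloc() called
	 * from unrelated code, so the trace points the wrong way. */
	#include <stdlib.h>
	#include <string.h>

	static void f(void)
	{
		char *p = malloc(8);

		if (p == NULL)
			return;
		strcpy(p, "12345678");	/* 9 bytes with the NUL: overrun */
		free(p);		/* may still look fine here... */
	}

	int main(void)
	{
		f();
		free(malloc(16));	/* ...but dump core here instead */
		return 0;
	}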

The best thing to do is to get a stack trace from the core file,
hopefully with -g-style debugging information, and to mail it to me.
That is, unless you can track down the bug and mail a fix to me,
which is even better (I think I have been reasonably responsive lately,
though I have not always been responsive in the past).  Since I only
have a few types of machines that are directly available to me, my
ability to track down vague problems is very limited.
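
For reference, the usual recipe (gdb shown; dbx on SunOS works
much the same way) is roughly:

	# rebuild smail with -g added to CFLAGS, then after a crash:
	gdb smail core		# or: dbx smail core
	(gdb) where		# prints the stack trace to mail off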
-- 
	tron |-<=>-|		ARPAnet:  veritas!tron@apple.com
      tron@veritas.com		UUCPnet:  {amdahl,apple,pyramid}!veritas!tron

fkk@stasys.sta.sub.org (Frank Kaefer) (06/03/91)

devil@diablery.10A.com (Gil Tene) writes:

|Hello UUCPeople,

|I am running a Smail 3.1.19 configuration (straight off the
|uunet archives), and I have lately been getting core dumps 
|from Smail during heavy uucp mail transfers. The core dumps

I have the same problem with smail "/\==/\ Smail3.1.21.1 #21.1"
(I had the same problem in 3.1.20 and 3.1.19).
The debugger says: core dumped from rmail (link to smail),
functions: main -> perform_deliver_mail -> unlock_message -> SIGSEGV (11).
These core dumps appear regardless of the system load; I even had them
just mailing one message with no load on the system.
Any help/pointers etc. greatly appreciated.

[My system is a Sun 4/40 running SunOS 4.1.1 B]

Cheers,
Frank
-- 
| Frank Kaefer | fkk@stasys.sta.sub.org | Starnberg, Germany |
| Compuserve: 72427,2101   | Internet: fkk@Germany.Sun.COM   |
| unido!sunde!fkaefer      |    postmaster@Germany.Sun.COM   |

les@chinet.chi.il.us (Leslie Mikesell) (06/04/91)

In article <345@imp.UUCP> devil@diablery.10A.com (Gil Tene) writes:
>Hello UUCPeople,
>
>I am running a Smail 3.1.19 configuration (straight off the
>uunet archives), and I have lately been getting core dumps 
>from Smail during heavy uucp mail transfers. The core dumps
>are not consistent, and the data of the mail message is NOT
>lost; it is sent successfully 20 min. later during the next
>smail daemon pass. I have not been able to generate these
>core dumps on purpose, since it does not seem to be the 
>actual mail message data that causes them, but a combination 
>of data and load, probably causing some locking problems.

It may be memory problems instead, especially if you have set
delivery_mode = background in the config file.  If you do this,
uuxqt won't wait for smail to complete delivery before starting
a new one.  After writing the copy to the queue file, smail3
will attempt to malloc() message_buf_size (default = 100k) plus
the work space for handling the alias, paths and forwarding
files, etc.  Several of these at once can swamp a small machine.
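
Rough numbers (the per-process work space is my guess, not a
measured figure):

	10 simultaneous rmail/smail processes (uuxqt keeps spawning them)
	x ~200k each (100k message_buf_size + about as much work space)
	= ~2 megabytes of memory/swap demanded at once

which hurts on a machine with only a few megabytes of swap.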

>Anyone out there have any ideas on how to fix this? The real
>bad part is that the originator is getting a failure message
>while the message is actually getting through later.

If you have background delivery set (smail forks after queuing
to continue delivery in a different process), change to foreground
or daemon mode.  Try setting the message_buf_size lower (or
increase the available resources if you are hitting a swap space
or per-user memory limit).  If you have HDB uucp, be sure your
Maxuuxqts is set to a low number.
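
Concretely, something like this (the attribute names are from the
config file, the values are just examples to start from):

	delivery_mode = daemon		(or foreground; not background)
	message_buf_size = 51200	(50k instead of the default 100k)

and for HDB uucp, put a small number (say 2) in the Maxuuxqts
file, commonly /usr/lib/uucp/Maxuuxqts.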

Les Mikesell
  les@chinet.chi.il.us