[comp.sys.apollo] Problems with email

mike@tuvie (Inst.f.Techn.Informatik) (07/06/90)

I'm not sure whether this is a bug or a feature (on Apollos you never
quite know :-(, but here comes my problem:

Our mail works OK as long as the registry is available, but
when the registry is down (we do not have slave registries), then 
/bsd4.3/bin/mail will not deliver mail to the recipients. Now the 
problem seems to be that the mailer cannot acquire the gid of mail, 
but about this I'm not too sure. The mailer does not seem to return 
an error code (or does /usr/lib/sendmail ignore it ?), whenever 
this happens. The log file contains the status report Stat=Sent, but
the mail is nowhere to be found (except on /dev/null).
Has anybody had a similar problem and if so, how was it solved 
(without resorting to slave registries)?
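
A small sketch for narrowing down which messages were affected: it lists the recipients that the sendmail log claims were delivered, so they can be compared with what is actually in the mailboxes. The function name sent_recipients and the assumed log line shape ("to=user, ... stat=Sent") are illustrative assumptions; real log formats vary between sendmail versions.

```shell
#!/bin/sh
# Sketch: list recipients that a sendmail log claims were delivered.
# Assumed line shape (varies by version):
#   "...: to=user, delay=00:00:01, stat=Sent"

sent_recipients() {
	# Reads log lines on stdin; prints one recipient per "stat=Sent" line.
	awk '
		tolower($0) ~ /stat=sent/ {
			for (i = 1; i <= NF; i++)
				if ($i ~ /^to=/) {
					r = $i
					sub(/^to=/, "", r)	# drop the key
					sub(/,$/, "", r)	# drop the trailing comma
					print r
				}
		}'
}
```

Feeding the log through "sent_recipients | sort -u" gives a list of users whose mailboxes are worth checking by hand.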

BTW, the prototype sendmail configuration files supplied in the sys5 version of
/usr/lib are buggy: for the local mailer, /bin/mail is invoked with the 
bsd options, which are incompatible with the sys5 options. Beware!

					bye,
						mike
       ____  ____
      /   / / / /   Michael K. Gschwind             mike@vlsivie.at
     /   / / / /    Technical University, Vienna    mike@vlsivie.uucp
     ---/           Voice: (++43).1.58801 8144      e182202@awituw01.bitnet
       /            Fax:   (++43).1.569697
   ___/

rogden@uceng.UC.EDU (rob ogden) (07/07/90)

mike@tuvie (Inst.f.Techn.Informatik) writes:

>I'm not sure whether this is a bug or a feature (on Apollos you never
>quite know :-(, but here comes my problem:

>Our mail works OK as long as the registry is available, but
>when the registry is down (we do not have slave registries), then 
>/bsd4.3/bin/mail will not deliver mail to the recipients. Now the 

Our dn10000 had a similar problem: it would receive the mail but
not deliver it.  I am constantly befuddled by the Apollo potpourri of
aegis, bsd, and sys5, so I decided to try /sys5.3/usr/lib/sendmail.
To my amazement, the mail went through.

Go figure.


Rob Ogden
rogden@uceng.UC.EDU
Aerospace Engineering and Engineering Mechanics, ML70
University of Cincinnati, Cincinnati, OH 45221   513/556-3549

nazgul@alphalpha.com (Kee Hinckley) (07/08/90)

In article <1664@tuvie> mike@tuvie (Inst.f.Techn.Informatik) writes:
>/bsd4.3/bin/mail will not deliver mail to the recipients. Now the 
>problem seems to be that the mailer cannot acquire the gid of mail, 
Correct, although I'm not sure why it was modified to need the gid.

>but about this I'm not too sure. The mailer does not seem to return 
>an error code (or does /usr/lib/sendmail ignore it ?), whenever 
It doesn't return an error code.  This isn't an Apollo problem but
generic to mail.  There are a number of cases where it totally punts;
its error handling is grotesque or non-existent.

I've reported the bug to Apollo, but I suspect your best bet is to
find a PD version of mail (I believe there is one on uunet, it also
goes by the name of rmail often) and use that.
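
Since the mailer reports success even when nothing is delivered, one possible stopgap is a wrapper that checks whether the mailbox actually grew. This is only a sketch: deliver_checked and mbox_size are hypothetical names, and the mailer command is whatever you pass in. Exit status 75 is EX_TEMPFAIL from sysexits.h, which sendmail treats as a temporary failure and requeues.

```shell
#!/bin/sh
# Hypothetical wrapper: run a local mailer and verify the mailbox grew.
# Exit 75 (EX_TEMPFAIL) asks sendmail to requeue instead of dropping.

mbox_size() {
	# Print the size of $1 in bytes, or 0 if it does not exist
	if [ -f "$1" ]; then
		wc -c < "$1" | tr -d ' '
	else
		echo 0
	fi
}

deliver_checked() {
	# usage: deliver_checked MAILBOX MAILER [ARGS...]   (message on stdin)
	mbox=$1; shift
	before=$(mbox_size "$mbox")
	"$@"
	status=$?
	after=$(mbox_size "$mbox")
	if [ "$status" -ne 0 ]; then
		return "$status"	# pass real mailer errors through unchanged
	fi
	if [ "$after" -le "$before" ]; then
		return 75		# mailer said OK but nothing arrived
	fi
	return 0
}
```

The size comparison is crude: a user emptying the mailbox during delivery would cause a false alarm, so at worst the message is queued and tried again. Wiring such a wrapper into sendmail.cf as the local mailer is not shown here and would need testing.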

						-kee

-- 
Alphalpha Software, Inc.	|	motif-request@alphalpha.com
nazgul@alphalpha.com		|-----------------------------------
617/646-7703 (voice/fax)	|	Proline BBS: 617/641-3722

I'm not sure which upsets me more; that people are so unwilling to accept
responsibility for their own actions, or that they are so eager to regulate
everyone else's.

jimr@metro (Jim Richardson) (07/09/90)

In article <1664@tuvie>, mike@tuvie (Inst.f.Techn.Informatik) writes:
> Our mail works OK as long as the registry is available, but
> when the registry is down (we do not have slave registries), then 
> /bsd4.3/bin/mail will not deliver mail to the recipients. Now the 
> problem seems to be that the mailer cannot acquire the gid of mail, 
> but about this I'm not too sure. The mailer does not seem to return 
> an error code (or does /usr/lib/sendmail ignore it ?), whenever 
> this happens. The log file contains the status report Stat=Sent, but
> the mail is nowhere to be found (except on /dev/null).
> Has anybody had a similar problem and if so, how was it solved 
> (without resorting to slave registries)?

This does happen to us when the registry dies completely or gets that kind of
registry disease where the password file appears to be empty.  Another cause
is when /usr/spool/mail is unavailable: for historical reasons involving backup,
/usr/spool/mail on our mail gateway machine is a soft link to a directory on
another node (avoid this if you can!).

We run a script like the following continuously on the gateway node.  It detects
either of these problems and kills the sendmail daemon if they occur.  If you
call it "netcheck" you can start it as root via "/etc/server -p netcheck &".  It
chews up some CPU time but it's worth it for the peace of mind!

#! /bin/ksh
#
#  Check Apollo system is fit to receive incoming mail messages

exec >> /sys/node_data/system_logs/netcheck.log 2>&1

print "netcheck starting at $( /bin/date )"

SLEEP_TIME=60
REPORT_INTERVAL=60
STOPPED_FLAG="/usr/spool/mail/STOPPED_FLAG"

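shutsm() {
	# Log the process table, kill any sendmail daemon, then set the stop flag
	/bin/ps aux
	print "$( /bin/date ) stopping sendmail daemon: $*"
	pids="$( /bin/ps ax | /bin/awk '/\/usr\/lib\/sendmail -bd -q[0-9]*m$/ {print $1}' )"
	if [ -n "$pids" ]
	then
		print "Killing $pids"
		/bin/kill $pids
		/usr/ucb/logger -t netcheck "sendmail daemon(s) $pids stopped: $*"
	else
		# avoid calling kill with no arguments
		print "No sendmail daemon found"
	fi
	/usr/bin/touch ${STOPPED_FLAG}
	/bin/ps aux
}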

typeset -i count=$REPORT_INTERVAL

while :
do
	if [ ! -f ${STOPPED_FLAG} ]
	then
		if [ ! -d /usr/spool/mail ]
		then
			shutsm "/usr/spool/mail unavailable"
		fi
		
		if [ ! -s /etc/passwd ]
		then
			shutsm "/etc/passwd missing or empty"
			/bin/ls -l /etc/passwd
		fi
	fi

	count=count-1		# integer arithmetic; count was declared with typeset -i

	if [ "$count" -le 0 ]
	then
		# log the date periodically
		/bin/date
		# force new test next time even if stop flag exists
		/bin/rm -f ${STOPPED_FLAG}
		count=$REPORT_INTERVAL
	fi

	sleep $SLEEP_TIME
done

This will only work for you if /etc/passwd appears to have size zero whenever
the registry is down.  Furthermore, your sendmail daemon needs to look like
"/usr/lib/sendmail -bd -q[0-9]*m" to a "ps ax" command: if you use something
else like "-q1h -bd" adjust the script accordingly.  
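
As a sketch of one way to make the match less sensitive to flag order, the awk pattern can accept "-bd" and a "-q" interval in either order. The function name sendmail_pids is an assumption, and the pattern is deliberately loose (it would also accept "-bd" alone); it reads "ps ax" style output on stdin so it can be tried on sample lines first.

```shell
#!/bin/sh
# Sketch: pick sendmail daemon PIDs out of "ps ax"-style output on stdin,
# accepting "-bd -q30m" as well as "-q1h -bd".

sendmail_pids() {
	# The line must end with /usr/lib/sendmail followed only by
	# -bd and/or -q<number><unit> flags; field 1 is taken as the PID.
	awk '/\/usr\/lib\/sendmail( -(bd|q[0-9]+[smhdw]))+$/ { print $1 }'
}
```

In the script above this would be used as pids="$( /bin/ps ax | sendmail_pids )", provided the local awk supports grouping and alternation in patterns.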

Finally, the actual script we run does some other local stuff and I've hacked
it down to give the above, which may therefore have a flaw or two.  But you
should get the idea.
--
Jim Richardson
Department of Pure Mathematics, University of Sydney, NSW 2006, Australia
Internet: jimr@maths.su.oz.au  ACSNET: jimr@maths.su.oz  FAX: +61 2 692 4534

chen@digital.sps.mot.com (Jinfu Chen) (07/09/90)

In article <1664@tuvie> mike@tuvie (Inst.f.Techn.Informatik) writes:
> I'm not sure whether this is a bug or a feature (on Apollos you never
> quite know :-(, but here comes my problem:
>
> Our mail works OK as long as the registry is available, but
> when the registry is down (we do not have slave registries), then 
> /bsd4.3/bin/mail will not deliver mail to the recipients. Now the 
> problem seems to be that the mailer cannot acquire the gid of mail, 
> but about this I'm not too sure.

I believe calls in <pwd.h> are eventually translated to registry calls, and
/etc/passwd, /etc/group, etc. aren't just plain unstruct files either. The
only solution is to have a slave registry running somewhere else.

--
Jinfu Chen                  (602)898-5338      |
Motorola, Inc.  SPS  Mesa, AZ                  |
 ...uunet!motsps!digital!chen                  |
chen@digital.sps.mot.com                       |

jonathan@jarthur.Claremont.EDU (Jonathan Ball) (07/10/90)

I've also had problems with mail.  I have two questions about it:

1) /bsd4.3/usr/ucb/mail seems to work ok to get messages between users on
the Apollos EXCEPT for root:  The root user can send messages fine; but when
mail is sent to root, "mail" reports "No mail for root." -- the message
never arrives.  Is there something I am doing wrong?

2) As a fledgling sys admin, I am having lots of difficulties getting a mail
handler set up to send and receive mail with the outside world (i.e.
anything but the Apollos, either on the local network or any other network
we are connected to).  I've tried to read the manuals, but am not sure where
to start...and my local Apollo service guy, though he is extremely friendly and
helpful, doesn't know anything about mail and can't give me help.  Does
anyone know what manuals I should read or what I should do to get started?

We are running SR10.1 Aegis and BSD (all users use BSD) on eleven 3500s and
4500s.

Thank you very much!
Jon
-- 
jonathan@jarthur.claremont.edu (134.173.4.42)

thompson%pan@UMIX.CC.UMICH.EDU (John Thompson) (07/11/90)

Jinfu Chen writes:
> In article <1664@tuvie> mike@tuvie (Inst.f.Techn.Informatik) writes:
> > I'm not sure whether this is a bug or a feature (on Apollos you never
> > quite know :-(, but here comes my problem:
> >
> > Our mail works OK as long as the registry is available, but
> > when the registry is down (we do not have slave registries), then 
> > /bsd4.3/bin/mail will not deliver mail to the recipients. Now the 
> > problem seems to be that the mailer cannot acquire the gid of mail, 
> > but about this I'm not too sure.
> 
> I believe calls in <pwd.h> are eventually translated to registry calls, and
> /etc/passwd, /etc/group, etc. aren't just plain unstruct files either. The
> only solution is to have a slave registry running somewhere else.

True.  The man page for getpwuid says:

     Under Domain/OS BSD, /etc/passwd is a read-only object of the type
     "passwd," maintained by the registry server.  See rgyd(8).  The presence
     of the registry server affects the implementation of these interfaces in
     the following way.

     If there was no call to setpwfile, these interfaces call the registry
     server.  If this call fails, they search the local registry.

     If there was a call to setpwfile, these interfaces search name.  They
     access name by way of its type manager.  If name is of type "passwd" (as
     in the case of /etc/passwd), its manager will cause the interface to call
     the registry server.  If, in this case, the call to the registry server
     fails, the local registry will not be searched.  name remains in effect
     until the next call to setpwfile or the process fails.

Notice that in every case except the one where you define your own password
file, the lookup goes to the registry server (rgyd).  In my opinion, you're
asking for trouble when you only have one rgyd running.  We have 6, for
sixty nodes.  That's a little TOO redundant, except that several of them are
acting as safety nets for when we split the ring (not an infrequent occurrence).

John Thompson
Honeywell, SSEC
Plymouth MN  55441

thompson@pan.ssec.honeywell.com

My opinions are my own;  my beliefs are my own;  my soul belongs to Honeywell.

johnr@dhump.lakesys.COM (John W. Raffensperger Jr.) (07/12/90)

>
>Jinfu Chen writes:
>> In article <1664@tuvie> mike@tuvie (Inst.f.Techn.Informatik) writes:
>> > I'm not sure whether this is a bug or a feature (on Apollos you never
>> > quite know :-(, but here comes my problem:
>> >
>> > Our mail works OK as long as the registry is available, but
>> > when the registry is down (we do not have slave registries), then 
>> > /bsd4.3/bin/mail will not deliver mail to the recipients. Now the 
>> > problem seems to be that the mailer cannot acquire the gid of mail, 
>> > but about this I'm not too sure.
>> 
>> I believe calls in <pwd.h> are eventually translated to registry calls, and
>> /etc/passwd, /etc/group, etc. aren't just plain unstruct files either. The
>> only solution is to have a slave registry running somewhere else.
>

There is another alternative; 

The above problems stem from the fact that the system can not find a
group of mail, either from the registry, or in the /etc/group (how
UNIX like) file.  
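
A quick way to see whether the group can currently be resolved is to look it up by name in the group file. The sketch below is an illustration only: group_gid is a hypothetical helper, and it assumes the file reads back in the usual name:passwd:gid:members layout.

```shell
#!/bin/sh
# Sketch: report the gid of a named group from a group(5)-format file.

group_gid() {
	# usage: group_gid NAME [FILE]
	# Prints the gid and returns 0; returns non-zero if NAME is absent.
	name=$1
	file=${2:-/etc/group}
	awk -F: -v n="$name" '
		$1 == n { print $3; found = 1; exit }
		END     { exit !found }' "$file"
}
```

For example, "group_gid mail || echo 'group mail cannot be resolved'" gives a one-line check to run before blaming the mailer.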

If no action is taken to create an /etc/group file, the system will
have an empty file.  There is an official and unofficial way to create
the file.

Officially, either run a slave rgyd on the node or run llbd on the
node.  In a casual conversation with our system support engineer, it
was recommended that ALL nodes run llbd (not mandatory, but
recommended).  Once we started llbd on our nodes, all was well,
assuming that the mail spool directory is available.

Unofficially, you could copy the /etc/group file from the master rgyd
node.

Hope this helps;

John W. Raffensperger, Jr.
Milwaukee Cylinder, Beaver Dam, Wisconsin
(414) 887-0317
johnr@dhump.lakesys.com
-- 
John W. Raffensperger, Jr.
Milwaukee Cylinder, Beaver Dam, Wisconsin, USA
johnr@dhump.lakesys.COM
{uunet!marque,uwvax!uwm}!lakesys!dhump!johnr

nazgul@alphalpha.com (Kee Hinckley) (07/12/90)

In article <4b7d5e6e.12c9a@digital.sps.mot.com> chen@digital.sps.mot.com (Jinfu Chen) writes:
>/etc/passwd, /etc/group, etc. aren't just plain unstruct files either. The
>only solution is to have a slave registry running somewhere else.

A difficult proposition on a one-machine network.  :-)
-- 
Alphalpha Software, Inc.	|	motif-request@alphalpha.com
nazgul@alphalpha.com		|-----------------------------------
617/646-7703 (voice/fax)	|	Proline BBS: 617/641-3722


thompson%pan@UMIX.CC.UMICH.EDU (John Thompson) (07/12/90)

> >
> >Jinfu Chen writes:
> >> In article <1664@tuvie> mike@tuvie (Inst.f.Techn.Informatik) writes:
> >> > I'm not sure whether this is a bug or a feature (on Apollos you never
> >> > quite know :-(, but here comes my problem:
> >> >
> >> > Our mail works OK as long as the registry is available, but
> >> > when the registry is down (we do not have slave registries), then 
> >> > /bsd4.3/bin/mail will not deliver mail to the recipients. Now the 
> >> > problem seems to be that the mailer cannot acquire the gid of mail, 
> >> > but about this I'm not too sure.
> >> 
> >> I believe calls in <pwd.h> are eventually translated to registry calls, and
> >> /etc/passwd, /etc/group, etc. aren't just plain unstruct files either. The
> >> only solution is to have a slave registry running somewhere else.
> >
> 
> There is another alternative; 
> 
> The above problems stem from the fact that the system can not find a
> group of mail, either from the registry, or in the /etc/group (how
> UNIX like) file.  
Presumably, he's using sendmail, which is _very_ Unix-like.   :-)

> If no action is taken to create an /etc/group file, the system will
> have an empty file.  There is an official and unofficial way to create
> the file.
The 'empty' file is not really empty.  It's a file of type group, and
the type manager for that filetype knows to contact the registry
server for the information.

> Officially, either run a slave rgyd on the node or run llbd on the
> node.  In a casual conversation with our system support engineer, it
> was recommended that ALL nodes run llbd (not mandatory, but
> recommended).  Once we started llbd on our nodes, all was well,
> assuming that the mail spool directory is available.
Running llbd merely allows NCS servers to register themselves with
the global location broker (glbd).  It has nothing to do with clients
trying to _get_ services.  If you aren't running an llbd on the
node that has rgyd and/or glbd running, you _do_ have a problem 
(see the "Managing The NCS Broker" manual).

> Unofficially, you could copy the /etc/group file from the master rgyd
> node.
Well, yeah... you _could_ do that.  Note though that ALL the group files
on our system (master rgy node's, slave rgy nodes', and everyone else)
are 0-length files of type 'group'.  Copying the file (at least in Aegis)
would not do it for you.  You'd need to cat[f] the file, and redirect
output to the desired location.  If you do, first off, your copy of the
registry data will eventually get out of date.  The whole point of registry services
being provided by NCS is to _avoid_ having things get out of date, and
becoming a management headache.  Additionally, you might need to change
some source code and recompile too.  Note the man page on getpwXXX --
     ...
     If there was no call to setpwfile, these interfaces call the registry
     server.  If this call fails, they search the local registry.
     If there was a call to setpwfile, these interfaces search name.  They
     access name by way of its type manager.  If name is of type "passwd" (as
     in the case of /etc/passwd), its manager will cause the interface to call
     the registry server.  If, in this case, the call to the registry server
     fails, the local registry will not be searched.  name remains in effect
     until the next call to setpwfile or the process fails.
I don't know how the /etc/group file is being accessed, but the Unix-y
routines for the passwd file, and therefore presumably any that operate on
the group file, operate BY DEFAULT (no call to setpwfile) by accessing the
NCS registry.  If this is true for the group file as well, you'd need to
insert a call to some "setgroupfile" routine to force it to check out 
your manually copied-over file's type, and only then would it NOT use the
registry services.

Hope this helps --

John Thompson (jt)
Honeywell, SSEC
Plymouth, MN  55441
(612) 541-2604
thompson@pan.ssec.honeywell.com

My opinions are my own;  my facts may not be correct;  
my heart's in the right place.

szabo_p@maths.su.oz.au (Paul Szabo) (07/16/90)

In article <1664@tuvie>, mike@tuvie (Inst.f.Techn.Informatik) writes:
> Our mail works OK as long as the registry is available [...]
> [...] cannot acquire the gid of mail [...]

In article <547@dhump.lakesys.com>, johnr@dhump.lakesys.com writes:
> [...] can not find a group of mail, either from the registry,
> or in the /etc/group file. [...] 
> If no action is taken to create an /etc/group file, the system will
> have an empty file. [...] 
> [...] recommended that ALL nodes run llbd [...]

In article <9007121504.AA24057@pan.ssec.honeywell.com>,
thompson@pan.ssec.honeywell.com writes:
> The 'empty' file is not really empty.  It's a file of type group, and
> the type manager for that filetype knows to contact the registry
> server for the information.
> Running llbd merely allows NCS servers to register themselves with
> the global location broker (glbd).  It has nothing to do with clients
> trying to _get_ services.

The files /etc/passwd, /etc/group, /etc/org are typed objects, of types passwd,
group and org respectively (all of which are in fact handled by the same type
manager, /sys/mgrs/passwd).

I am not really sure how these type managers work. But I suspect your problems
are related to ACLs on the `node_data/systmp directory.

Whenever one of the /etc/passwd-like files is accessed, the network registry is
consulted for the information, which is then stored in `node_data/systmp/.cache.
When the information is complete, the .cache file is renamed .passwd (or .group
etc), and this then appears as the contents of /etc/passwd. What this means is
that the ACLs on `node_data/systmp must make it possible for anybody to create a
file, write into it, rename it, and remove files already there. If there are
any problems with this, then all sorts of undesirable things may happen.
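
A rough way to test those permissions is to try the same four operations as an ordinary user. This is a sketch under assumptions: probe_dir is a hypothetical helper, the directory is passed as an argument, and it exercises only the create/write/rename/remove sequence described above, not the ACL entries themselves.

```shell
#!/bin/sh
# Sketch: check that the current user can create, write, rename and
# remove a file in a directory. Run as an ordinary user, not as root.

probe_dir() {
	# usage: probe_dir DIRECTORY; returns 0 if all four operations succeed
	dir=$1
	a="$dir/.probe.$$"
	b="$dir/.probe.$$.renamed"

	{ echo probe > "$a"; } 2>/dev/null ||
		{ echo "$dir: cannot create or write"; return 1; }
	mv "$a" "$b" 2>/dev/null ||
		{ rm -f "$a"; echo "$dir: cannot rename"; return 2; }
	rm -f "$b" 2>/dev/null ||
		{ echo "$dir: cannot remove"; return 3; }
	echo "$dir: ok"
}
```

Pointing it at the node's systmp directory from a non-privileged account should print "ok"; any other message suggests which right is missing.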

Probably you will need something like:
/com/edacl -dir -p root prwx -g wheel rwxk -w rwxk /sys/node_data?*/systmp
/com/edacl      -p root prwx -g wheel rwx  -w rwx  /sys/node_data?*/systmp/?*
/com/edacl -id  -p root ik   -g wheel pk   -w k    /sys/node_data?*/systmp
/com/edacl -if  -p root irwx -g wheel prwx -w rwx  /sys/node_data?*/systmp

Paul Szabo    szabo_p@maths.su.oz.au