mike@tuvie (Inst.f.Techn.Informatik) (07/06/90)
I'm not sure whether this is a bug or a feature (on Apollos you never quite know :-( ), but here is my problem:

Our mail works OK as long as the registry is available, but when the registry is down (we do not have slave registries), /bsd4.3/bin/mail does not deliver mail to the recipients. The problem seems to be that the mailer cannot acquire the gid of mail, but I'm not too sure about this. The mailer does not seem to return an error code when this happens (or does /usr/lib/sendmail ignore it?). The log file contains the status report Stat=Sent, but the mail is nowhere to be found (except in /dev/null).

Has anybody had a similar problem, and if so, how was it solved (without resorting to slave registries)?

BTW, the prototype sendmail configuration files supplied in the sys5 version of /usr/lib are buggy: for the local mailer, /bin/mail is invoked with the bsd options, which are not compatible with the sys5 options. Beware!

bye,
mike

Michael K. Gschwind              mike@vlsivie.at
Technical University, Vienna     mike@vlsivie.uucp
Voice: (++43).1.58801 8144       e182202@awituw01.bitnet
Fax:   (++43).1.569697
rogden@uceng.UC.EDU (rob ogden) (07/07/90)
mike@tuvie (Inst.f.Techn.Informatik) writes:
>I'm not sure whether this is a bug or a feature (on Apollos you never
>quite know :-(, but here comes my problem:
>Our mail works OK as long as the registry is available, but
>when the registry is down (we do not have slave registries), then
>/bsd4.3/bin/mail will not deliver mail to the recipients. Now the

Our dn10000 had a similar problem: it would receive the mail and not deliver it. I am constantly befuddled by the Apollo potpourri of aegis, bsd, and sys5, so I decided to try /sys5.3/usr/lib/sendmail. To my amazement, the mail went through. Go figure.

Rob Ogden
rogden@uceng.UC.EDU
Aerospace Engineering and Engineering Mechanics, ML70
University of Cincinnati, Cincinnati, OH 45221
513/556-3549
nazgul@alphalpha.com (Kee Hinckley) (07/08/90)
In article <1664@tuvie> mike@tuvie (Inst.f.Techn.Informatik) writes:
>/bsd4.3/bin/mail will not deliver mail to the recipients. Now the
>problem seems to be that the mailer cannot acquire the gid of mail,

Correct, although I'm not sure why it was modified to need the gid.

>but about this I'm not too sure. The mailer does not seem to return
>an error code (or does /usr/lib/sendmail ignore it ?), whenever

It doesn't return an error code. This isn't an Apollo problem; it is generic to mail. There are a number of cases where it totally punts: its error handling is grotesque or nonexistent. I've reported the bug to Apollo, but I suspect your best bet is to find a PD version of mail (I believe there is one on uunet; it often goes by the name rmail) and use that.

-kee
--
Alphalpha Software, Inc. | motif-request@alphalpha.com
nazgul@alphalpha.com     |-----------------------------------
617/646-7703 (voice/fax) | Proline BBS: 617/641-3722

I'm not sure which upsets me more; that people are so unwilling to accept responsibility for their own actions, or that they are so eager to regulate everyone else's.
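Because the local mailer reports success even when it drops a message, one possible stopgap is to have sendmail call a small wrapper instead of /bin/mail directly. The sketch below is an illustration only and comes from neither Apollo nor the posters: the function name delivery_precheck, the wrapper idea, and the paths in the usage comment are all assumptions. It relies on sendmail's convention of treating mailer exit status 75 (EX_TEMPFAIL in sysexits.h) as a temporary failure, so the message stays queued; whether the Domain/OS sendmail of this vintage honours that has not been checked here.

```shell
#!/bin/sh
# Hypothetical pre-delivery check for a local-mailer wrapper.
# EX_TEMPFAIL (75) is the sysexits.h code sendmail reads as "try again later".
EX_TEMPFAIL=75

# delivery_precheck GROUP_FILE SPOOL_DIR
# Succeeds only if the spool directory exists and the group file
# yields an entry for group "mail"; otherwise returns EX_TEMPFAIL.
delivery_precheck() {
    group_file=$1
    spool_dir=$2
    [ -d "$spool_dir" ] || return "$EX_TEMPFAIL"
    grep '^mail:' "$group_file" >/dev/null 2>&1 || return "$EX_TEMPFAIL"
    return 0
}

# Intended use inside the wrapper (assumed paths, left commented out):
#   delivery_precheck /etc/group /usr/spool/mail || exit $?
#   exec /bin/mail "$@"
```

The check reads the group file with ordinary file I/O, so it only helps if that file really does appear empty or unreadable while the registry is down.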
jimr@metro (Jim Richardson) (07/09/90)
In article <1664@tuvie>, mike@tuvie (Inst.f.Techn.Informatik) writes:
> Our mail works OK as long as the registry is available, but
> when the registry is down (we do not have slave registries), then
> /bsd4.3/bin/mail will not deliver mail to the recipients. Now the
> problem seems to be that the mailer cannot acquire the gid of mail,
> but about this I'm not too sure. The mailer does not seem to return
> an error code (or does /usr/lib/sendmail ignore it ?), whenever
> this happens. The log file contains the status report Stat=Sent, but
> the mail is nowhere to be found (except on /dev/null).
> Has anybody had a similar problem and if so, how was it solved
> (without resorting to slave registries)?

This does happen to us when the registry dies completely or gets that kind of registry disease where the password file appears to be empty. Another cause is when /usr/spool/mail is unavailable: for historical reasons involving backups, /usr/spool/mail on our mail gateway machine is a soft link to a directory on another node (avoid this if you can!).

We run a script like the following continuously on the gateway node. It detects either of these problems and kills the sendmail daemon if they occur. If you call it "netcheck" you can start it as root via "/etc/server -p netcheck &". It chews up some CPU time but it's worth it for the peace of mind!

    #! /bin/ksh
    #
    # Check Apollo system is fit to receive incoming mail messages

    exec >> /sys/node_data/system_logs/netcheck.log 2>&1
    print "netcheck starting at $( /bin/date )"

    SLEEP_TIME=60
    REPORT_INTERVAL=60
    STOPPED_FLAG="/usr/spool/mail/STOPPED_FLAG"

    # Kill the sendmail daemon(s), log the reason, and leave a flag file
    shutsm() {
        /bin/ps aux
        print "$( /bin/date ) stopping sendmail daemon: $*"
        pids="$( /bin/ps ax |
            /bin/awk '/\/usr\/lib\/sendmail -bd -q[0-9]*m$/ {print $1}' )"
        print "Killing $pids"
        # skip the kill when no daemon was found
        [ -n "$pids" ] && /bin/kill $pids
        /usr/ucb/logger -t netcheck "sendmail daemon(s) $pids stopped: $*"
        /usr/bin/touch ${STOPPED_FLAG}
        /bin/ps aux
    }

    typeset -i count=$REPORT_INTERVAL

    while :
    do
        if [ ! -f ${STOPPED_FLAG} ]
        then
            if [ ! -d /usr/spool/mail ]
            then
                shutsm "/usr/spool/mail unavailable"
            fi
            if [ ! -s /etc/passwd ]
            then
                shutsm "/etc/passwd missing or empty"
                /bin/ls -l /etc/passwd
            fi
        fi
        count=count-1
        if [ $count -le 0 ]
        then
            # log the date periodically
            /bin/date
            # force new test next time even if stop flag exists
            /bin/rm -f ${STOPPED_FLAG}
            count=$REPORT_INTERVAL
        fi
        sleep $SLEEP_TIME
    done

This will only work for you if /etc/passwd appears to have size zero whenever the registry is down. Furthermore, your sendmail daemon needs to look like "/usr/lib/sendmail -bd -q[0-9]*m" to a "ps ax" command: if you use something else like "-q1h -bd", adjust the script accordingly. Finally, the actual script we run does some other local stuff and I've hacked it down to give the above, which may therefore have a flaw or two. But you should get the idea.

--
Jim Richardson
Department of Pure Mathematics, University of Sydney, NSW 2006, Australia
Internet: jimr@maths.su.oz.au  ACSNET: jimr@maths.su.oz  FAX: +61 2 692 4534
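The note about adjusting the "ps ax" pattern can be sketched as follows. This is a hedged illustration, not part of Jim Richardson's script: the function name sendmail_pids is made up, the set of queue-interval unit letters is an assumption, and it needs an awk that supports grouping and alternation in regular expressions, which the awk on a given Domain/OS release may or may not do.

```shell
#!/bin/sh
# sendmail_pids: read "ps ax"-style lines on stdin and print the PID of
# every line that ends in a sendmail daemon invocation, with the -bd and
# -q<interval> flags in either order. The unit letters are an assumption.
sendmail_pids() {
    awk '/\/usr\/lib\/sendmail (-bd -q[0-9]+[smhdw]|-q[0-9]+[smhdw] -bd)$/ { print $1 }'
}

# In the script above this would replace the awk stage:
#   pids="$( /bin/ps ax | sendmail_pids )"
```

The pattern stays anchored at the end of the line, as in the original, so a daemon started with extra trailing flags would still need its own alternative.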
chen@digital.sps.mot.com (Jinfu Chen) (07/09/90)
In article <1664@tuvie> mike@tuvie (Inst.f.Techn.Informatik) writes:
> I'm not sure whether this is a bug or a feature (on Apollos you never
> quite know :-(, but here comes my problem:
>
> Our mail works OK as long as the registry is available, but
> when the registry is down (we do not have slave registries), then
> /bsd4.3/bin/mail will not deliver mail to the recipients. Now the
> problem seems to be that the mailer cannot acquire the gid of mail,
> but about this I'm not too sure.

I believe calls in <pwd.h> are eventually translated to registry calls, and /etc/passwd, /etc/group, etc. aren't just plain unstructured files either. The only solution is to have a slave registry running somewhere else.

--
Jinfu Chen (602)898-5338      |
Motorola, Inc. SPS Mesa, AZ   |
...uunet!motsps!digital!chen  |
chen@digital.sps.mot.com      |
jonathan@jarthur.Claremont.EDU (Jonathan Ball) (07/10/90)
I've also had problems with mail. I have two questions about it:

1) /bsd4.3/usr/ucb/mail seems to work OK for messages between users on the Apollos EXCEPT for root: the root user can send messages fine, but when mail is sent to root, "mail" reports "No mail for root." -- the message never arrives. Is there something I am doing wrong?

2) As a fledgling sys admin, I am having lots of difficulty getting a mail handler set up to send and receive mail with the outside world (i.e. anything but the Apollos, whether on the local network or on any other network we are connected to). I've tried to read the manuals, but am not sure where to start... and my local Apollo service guy, though he is extremely friendly and helpful, doesn't know anything about mail and can't help. Does anyone know what manuals I should read or what I should do to get started?

We are running SR10.1 Aegis and BSD (all users use BSD) on 11 3500s and 4500s.

Thank you very much!
Jon
--
jonathan@jarthur.claremont.edu (134.173.4.42)
thompson%pan@UMIX.CC.UMICH.EDU (John Thompson) (07/11/90)
Jinfu Chen writes:
> In article <1664@tuvie> mike@tuvie (Inst.f.Techn.Informatik) writes:
> > I'm not sure whether this is a bug or a feature (on Apollos you never
> > quite know :-(, but here comes my problem:
> >
> > Our mail works OK as long as the registry is available, but
> > when the registry is down (we do not have slave registries), then
> > /bsd4.3/bin/mail will not deliver mail to the recipients. Now the
> > problem seems to be that the mailer cannot acquire the gid of mail,
> > but about this I'm not too sure.
>
> I believe calls in <pwd.h> are eventually translated to registry calls, and
> /etc/passwd, /etc/group, etc aren't just plain unstructured files either. The
> only solution is to have slave registry running somewhere else.

True. The man page for getpwuid says:

    Under Domain/OS BSD, /etc/passwd is a read-only object of the type
    "passwd," maintained by the registry server. See rgyd(8). The
    presence of the registry server affects the implementation of these
    interfaces in the following way.

    If there was no call to setpwfile, these interfaces call the
    registry server. If this call fails, they search the local registry.

    If there was a call to setpwfile, these interfaces search name. They
    access name by way of its type manager. If name is of type "passwd"
    (as in the case of /etc/passwd), its manager will cause the
    interface to call the registry server. If, in this case, the call to
    the registry server fails, the local registry will not be searched.
    name remains in effect until the next call to setpwfile or the
    process fails.

Notice that in every case except the one where you name your own password file, the lookup goes to the registry services (rgyd). In my opinion, you're asking for trouble when you have only one rgyd running. We have 6, for sixty nodes. That's a little TOO redundant, except that several of them act as safety nets for when we split the ring (not an infrequent occurrence).
John Thompson Honeywell, SSEC Plymouth MN 55441 thompson@pan.ssec.honeywell.com My opinions are my own; my beliefs are my own; my soul belongs to Honeywell.
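The lookup order in the quoted man page excerpt can be written out as a small decision function. This is a toy model of the excerpt's wording only, not Domain/OS code: the function name, its three arguments, and the result labels are all invented for illustration, and the "file" branch reflects the reading that a plain file named via setpwfile is searched directly.

```shell
#!/bin/sh
# pw_lookup_source SETPWFILE_CALLED FILE_TYPE REGISTRY_STATE
#   SETPWFILE_CALLED  yes | no
#   FILE_TYPE         passwd | plain   (type of the file named to setpwfile)
#   REGISTRY_STATE    up | down
# Prints where the getpw* routines would end up looking, per the excerpt.
pw_lookup_source() {
    called=$1
    ftype=$2
    rgy=$3
    if [ "$called" = no ]; then
        # Default path: registry server first, local registry as fallback.
        if [ "$rgy" = up ]; then echo registry; else echo local-registry; fi
    elif [ "$ftype" = passwd ]; then
        # Typed object: the manager calls the registry, with no fallback.
        if [ "$rgy" = up ]; then echo registry; else echo fail; fi
    else
        # A file of some other type is searched as named.
        echo file
    fi
}

pw_lookup_source no  passwd down   # prints "local-registry"
pw_lookup_source yes passwd down   # prints "fail"
```

The second call is the case with no fallback at all: naming a "passwd"-typed file while the registry is down.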
johnr@dhump.lakesys.COM (John W. Raffensperger Jr.) (07/12/90)
>
>Jinfu Chen writes:
>> In article <1664@tuvie> mike@tuvie (Inst.f.Techn.Informatik) writes:
>> > I'm not sure whether this is a bug or a feature (on Apollos you never
>> > quite know :-(, but here comes my problem:
>> >
>> > Our mail works OK as long as the registry is available, but
>> > when the registry is down (we do not have slave registries), then
>> > /bsd4.3/bin/mail will not deliver mail to the recipients. Now the
>> > problem seems to be that the mailer cannot acquire the gid of mail,
>> > but about this I'm not too sure.
>>
>> I believe calls in <pwd.h> are eventually translated to registry calls, and
>> /etc/passwd, /etc/group, etc aren't just plain unstructured files either. The
>> only solution is to have slave registry running somewhere else.
>

There is another alternative.

The above problems stem from the fact that the system cannot find a group named mail, either from the registry or in the /etc/group (how UNIX-like) file.

If no action is taken to create an /etc/group file, the system will have an empty file. There is an official and an unofficial way to create the file.

Officially, either run a slave rgyd on the node or run llbd on the node. In a casual conversation with our system support engineer, it was recommended that ALL nodes run llbd (not mandatory, but recommended). Once we started llbd on our nodes, all was well, assuming that the mail spool directory is available.

Unofficially, you could copy the /etc/group file from the master rgyd node.

Hope this helps;

John W. Raffensperger, Jr.
Milwaukee Cylinder, Beaver Dam, Wisconsin
(414) 887-0317
johnr@dhump.lakesys.com
--
John W. Raffensperger, Jr.
Milwaukee Cylinder, Beaver Dam, Wisconsin, USA
johnr@dhump.lakesys.COM {uunet!marque,uwvax!uwm}!lakesys!dhump!johnr
nazgul@alphalpha.com (Kee Hinckley) (07/12/90)
In article <4b7d5e6e.12c9a@digital.sps.mot.com> chen@digital.sps.mot.com (Jinfu Chen) writes:
>/etc/passwd, /etc/group, etc aren't just plain unstructured files either. The
>only solution is to have slave registry running somewhere else.

A difficult proposition on a one-machine network. :-)
thompson%pan@UMIX.CC.UMICH.EDU (John Thompson) (07/12/90)
> >
> >Jinfu Chen writes:
> >> In article <1664@tuvie> mike@tuvie (Inst.f.Techn.Informatik) writes:
> >> > I'm not sure whether this is a bug or a feature (on Apollos you never
> >> > quite know :-(, but here comes my problem:
> >> >
> >> > Our mail works OK as long as the registry is available, but
> >> > when the registry is down (we do not have slave registries), then
> >> > /bsd4.3/bin/mail will not deliver mail to the recipients. Now the
> >> > problem seems to be that the mailer cannot acquire the gid of mail,
> >> > but about this I'm not too sure.
> >>
> >> I believe calls in <pwd.h> are eventually translated to registry calls, and
> >> /etc/passwd, /etc/group, etc aren't just plain unstructured files either. The
> >> only solution is to have slave registry running somewhere else.
> >
>
> There is another alternative;
>
> The above problems stem from the fact that the system can not find a
> group of mail, either from the registry, or in the /etc/group (how
> UNIX like) file.

Presumably, he's using sendmail, which is _very_ Unix-like. :-)

> If no action is taken to create an /etc/group file, the system will
> have an empty file. There is an official and unofficial way to create
> the file.

The 'empty' file is not really empty. It's a file of type group, and the type manager for that filetype knows to contact the registry server for the information.

> Officially, either run a slave rgyd on the node or run llbd on the
> node. In a casual conversation with our system support engineer, it
> was recommended that ALL nodes run llbd (not mandatory, but
> recommended). Once we started llbd on our nodes, all was well,
> assuming that the mail spool directory is available.

Running llbd merely allows NCS servers to register themselves with the global location broker (glbd). It has nothing to do with clients trying to _get_ services. If you aren't running an llbd on the node that has rgyd and/or glbd running, you _do_ have a problem (see the "Managing The NCS Broker" manual).

> Unofficially, you could copy the /etc/group file from the master rgyd
> node.

Well, yeah... you _could_ do that. Note, though, that ALL the group files on our system (the master rgy node's, the slave rgy nodes', and everyone else's) are 0-length files of type 'group'. Copying the file (at least in Aegis) would not do it for you. You'd need to cat[f] the file and redirect the output to the desired location. If you do, first off, your copy of the registry data will eventually get out of date. The whole point of registry services being provided by NCS is to _avoid_ having things get out of date and become a management headache. Additionally, you might need to change some source code and recompile too. Note the man page on getpwXXX --

    ...
    If there was no call to setpwfile, these interfaces call the
    registry server. If this call fails, they search the local registry.

    If there was a call to setpwfile, these interfaces search name. They
    access name by way of its type manager. If name is of type "passwd"
    (as in the case of /etc/passwd), its manager will cause the
    interface to call the registry server. If, in this case, the call to
    the registry server fails, the local registry will not be searched.
    name remains in effect until the next call to setpwfile or the
    process fails.

I don't know how the /etc/group file is being accessed, but the Unix-y routines for the passwd file, and therefore presumably any that operate on the group file, operate BY DEFAULT (no call to setpwfile) by accessing the NCS registry. If this is true for the group file as well, you'd need to insert a call to some "setgroupfile" routine to force it to check your manually copied-over file's type, and only then would it NOT use the registry services.
Hope this helps -- John Thompson (jt) Honeywell, SSEC Plymouth, MN 55441 (612) 541-2604 thompson@pan.ssec.honeywell.com My opinions are my own; my facts may not be correct; my heart's in the right place.
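The "cat the file and redirect the output" step can be sketched as below. It is a minimal illustration with assumed names (snapshot_file and the destination path in the usage comment are hypothetical), and it inherits both caveats from the post: the snapshot goes stale as soon as the registry changes, and the library routines may never consult a plain copy unless told to.

```shell
#!/bin/sh
# snapshot_file SRC DEST
# Read SRC through normal file I/O (on Domain/OS that means through its
# type manager) and store the result as a plain file at DEST. An empty
# or failed read is treated as "registry unavailable" and leaves any
# existing DEST untouched.
snapshot_file() {
    src=$1
    dest=$2
    tmp="$dest.tmp.$$"
    cat "$src" 2>/dev/null > "$tmp" || { rm -f "$tmp"; return 1; }
    if [ -s "$tmp" ]; then
        mv "$tmp" "$dest"
    else
        rm -f "$tmp"
        return 1
    fi
}

# Possible use while the registry is healthy (destination is an assumption):
#   snapshot_file /etc/group /usr/local/etc/group.snapshot
```

Writing to a temporary file first means a read that happens during a registry outage cannot overwrite the last good snapshot.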
szabo_p@maths.su.oz.au (Paul Szabo) (07/16/90)
In article <1664@tuvie>, mike@tuvie (Inst.f.Techn.Informatik) writes:
> Our mail works OK as long as the registry is available [...]
> [...] cannot acquire the gid of mail [...]

In article <547@dhump.lakesys.com>, johnr@dhump.lakesys.com writes:
> [...] can not find a group of mail, either from the registry,
> or in the /etc/group file. [...]
> If no action is taken to create an /etc/group file, the system will
> have an empty file. [...]
> [...] recommended that ALL nodes run llbd [...]

In article <9007121504.AA24057@pan.ssec.honeywell.com>, thompson@pan.ssec.honeywell.com writes:
> The 'empty' file is not really empty. It's a file of type group, and
> the type manager for that filetype knows to contact the registry
> server for the information.
> Running llbd merely allows NCS servers to register themselves with
> the global location broker (glbd). It has nothing to do with clients
> trying to _get_ services.

The files /etc/passwd, /etc/group, /etc/org are typed objects, of types passwd, group and org respectively (which in fact all use the same type manager, /sys/mgrs/passwd). I am not really sure how these type managers work, but I suspect your problems are related to ACLs on the `node_data/systmp directory.

Whenever one of the /etc/passwd-like files is accessed, the network registry is consulted for the information, which is then stored in `node_data/systmp/.cache. When the information is complete, the .cache file is renamed .passwd (or .group etc), and this then appears as the contents of /etc/passwd.

What this means is that the ACLs on `node_data/systmp must make it possible for anybody to create a file, write into it, rename it, and remove files already there. If there are any problems with this, all sorts of undesirable things may happen.
Probably you will need something like

    /com/edacl -dir -p root prwx -g wheel rwxk -w rwxk /sys/node_data?*/systmp
    /com/edacl -p root prwx -g wheel rwx -w rwx /sys/node_data?*/systmp/?*
    /com/edacl -id -p root ik -g wheel pk -w k /sys/node_data?*/systmp
    /com/edacl -if -p root irwx -g wheel prwx -w rwx /sys/node_data?*/systmp

Paul Szabo
szabo_p@maths.su.oz.au
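Whether an ordinary user can actually do what the cache mechanism needs (create, write, rename, remove) can be probed directly. The function below is a hedged sketch rather than an Apollo tool: can_cycle_file and the scratch file name are made up, and it tests only the generic file operations, not the ACL entries themselves.

```shell
#!/bin/sh
# can_cycle_file DIR
# Try the same sequence the post describes for the cache file: create a
# file in DIR, write to it, rename it, and remove it. Prints the first
# step that fails and returns 1; returns 0 if every step works.
can_cycle_file() {
    dir=$1
    probe="$dir/.probe.$$"       # hypothetical scratch name
    touch "$probe" 2>/dev/null || { echo "create failed in $dir"; return 1; }
    { echo test >> "$probe"; } 2>/dev/null ||
        { echo "write failed in $dir"; rm -f "$probe"; return 1; }
    mv "$probe" "$probe.new" 2>/dev/null ||
        { echo "rename failed in $dir"; rm -f "$probe"; return 1; }
    rm "$probe.new" 2>/dev/null || { echo "remove failed in $dir"; return 1; }
    return 0
}

# Run as a non-root user against the node's systmp directory, e.g.:
#   can_cycle_file /sys/node_data/systmp
```

Running it as root would prove little, since the point of the post is that anybody must be able to perform these steps.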