[comp.sys.apollo] Georg's SR10.2 tcp problem

smhall@peg.UUCP (07/13/90)

Georg,

I'm not sure what's happening, but the error messages about
unknown service and protocol make me think there is a problem
with /etc/services and /etc/protocols (do they exist, and are the
permissions right?).
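
As a rough illustration of that check, here is a minimal Python sketch that reports whether the two files exist and are readable by the current user. The function name `check_readable` is made up for this example, and it only tests presence and read permission, not whether the contents are valid.

```python
import os

def check_readable(paths=("/etc/services", "/etc/protocols")):
    """Map each path to 'ok', 'missing', or 'not readable' for the current user."""
    report = {}
    for path in paths:
        if not os.path.isfile(path):
            # The file is absent (or is not a regular file).
            report[path] = "missing"
        elif not os.access(path, os.R_OK):
            # The file exists, but the calling user may not read it.
            report[path] = "not readable"
        else:
            report[path] = "ok"
    return report
```

Calling `check_readable()` with no arguments looks at the two default paths. Since the daemons are started as root at boot, it is worth running the check as root too.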

If this helps, get on the phone to Werner Beck at HP/Apollo
Boeblingen and tell him to hire me! (I tried to come out
a while back.)

You can also pass a debug mask to tcpd to perhaps get more information.

Steve Hall, HP/Apollo Brisbane, Australia

smhall@peg.UUCP (07/13/90)

Georg,

Forgot to add: if you want to rule out a different h/w config as the
problem, boot the DN10K off the other one (which is already running 10.2.p).

Steve Hall, HP/Apollo Brisbane Australia

schmid@asterix.luftfahrt.uni-stuttgart.de (Georg Schmid) (07/16/90)

As there are still mails and articles concerning my boot problems
with SR10.2.p, I'd like to post some additional things:

 1. I'm still using my workaround of recreating systmp at every boot, and
    it has been working so far, although it causes some minor problems
    when doing an 'ex' from the dm and 'go' from phase II.
 2. Several people sent me mail because they had problems with
    'etc/hostid' called in 'rc.local'.
    For example, system@alchemy.chem.utoronto.ca (Mike Peterson, Univ. of
    Toronto) writes:

|> If you are using a nameserver for address resolution, and the
|> /etc/nmconfig -h hostent_bind
|> leads me to think that you are, there are severe problems

     (Yes, we do use a nameserver)

|> when the system tries to run 'hostid' or start tcpd with the
|> /etc/rc.local file supplied with SR10.2.p.
|> We had exactly the same problems, where each daemon related to TCP/IP
|> would take 10-15 minutes to start.
|>
|> The solution is:
|>
|> 1) Reorder your /etc/rc.local file so that the 'nmconfig' is done
|> after the TCP daemons have started, and don't run the 'hostid' command
|> until all networks are fully configured and up. If you are using
|> a name server, the name must be resolved to an Internet number, but
|> if this is attempted before tcpd has started, the interface is
|> configured, and the name server is running,
|> the system cannot get a valid packet out onto the network. This
|> eventually times out, but takes 10-15 minutes per daemon, hence the
|> long boot delays. A sample /etc/rc.local file is given later.
|>
|> 2) Do NOT use names in the /etc/ifconfig commands - these will cause
|> a name lookup, but you can't look things up until your ifconfig
|> commands have completed (Catch-22). Use only internet numbers.

     (We never used names in ifconfig)
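
To illustrate point 2, here is a small Python sketch that scans the text of an rc.local file and flags ifconfig address arguments that are names rather than Internet numbers. It assumes each ifconfig line has the form `<path>/ifconfig <interface> <address> ...`, which may not match every system. The function names and the interface name `eth0` in the usage note are made up for this example.

```python
import ipaddress

def is_numeric_address(token):
    """Return True if token is a dotted-quad IPv4 address, False otherwise."""
    try:
        ipaddress.IPv4Address(token)
        return True
    except ValueError:
        return False

def names_in_ifconfig(rc_text):
    """Return (line_number, token) pairs where an ifconfig address is a name."""
    problems = []
    for number, line in enumerate(rc_text.splitlines(), start=1):
        # Drop trailing comments, then split into whitespace-separated fields.
        fields = line.split("#", 1)[0].split()
        # Assumed layout: <path>/ifconfig <interface> <address> [options...]
        if len(fields) >= 3 and fields[0].endswith("ifconfig"):
            if not is_numeric_address(fields[2]):
                problems.append((number, fields[2]))
    return problems
```

For instance, a line such as `/etc/ifconfig eth0 asterix` would be reported, while `/etc/ifconfig eth0 129.69.110.2` would pass. A line whose third field is an option rather than an address would also be reported, so the output is a list of lines to look at and not a verdict.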

|> 3) Make SURE your /etc/hosts file contains at least enough information
|> for the local system (and possibly the name server system), so that
|> both the '//' level name of the system, AND the fully qualified
|> name of the system are defined (we had more problems until this was
|> fixed - when using a hosts file, no domain names are appended, but when
|> using a name server, the default domain name is appended before lookup).
|> Doing this lets your system boot if the name server is not
|> accessible.
|>
|> The Apollo HOTLINE should have been well aware of this problem and
|> the solution, since our site worked through this with them
|> back in April 1990. The problem does not manifest itself on any of
|> our m68k nodes (reason unknown). The DN10000 is supposed to save its
|> Internet address on disk during the shutdown process, and will use that
|> when rebooted if TCP is not working properly. Of course this has 2
|> major problems:
|> 1) How do you get booted the first time you want to use a name
|> server?
|> 2) If the node ever crashes, you can't reboot easily.
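
Point 3 of the quoted advice (both the short name and the fully qualified name must appear in /etc/hosts) can be checked mechanically. The following Python sketch assumes the usual hosts layout of an address followed by the official name and any aliases, with `#` starting a comment. The function names are invented for this example, and the sample values in the usage note are taken from the signature below.

```python
def hosts_names(hosts_text):
    """Collect every host name and alias found in hosts-file text (lowercased)."""
    names = set()
    for line in hosts_text.splitlines():
        # Strip comments; the first field is the address, the rest are names.
        fields = line.split("#", 1)[0].split()
        names.update(name.lower() for name in fields[1:])
    return names

def missing_names(hosts_text, short_name, domain):
    """Return which of the short and fully qualified names are absent."""
    wanted = [short_name.lower(), f"{short_name}.{domain}".lower()]
    present = hosts_names(hosts_text)
    return [name for name in wanted if name not in present]
```

With a hosts line like `129.69.110.2 asterix.luftfahrt.uni-stuttgart.de asterix`, calling `missing_names(text, "asterix", "luftfahrt.uni-stuttgart.de")` returns an empty list. If only the short name is present, the fully qualified name is returned as missing.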
    
    I'm not sure whether the problems Mike describes and my boot problems
    are really the same or at least related, though they seem to be.
    The strange thing is that our second DN10000 has no
    problems with booting or with the Internet stuff (it uses the original
    rc.local).
    Mike sent me a copy of his rc.local, but it didn't solve the boot problem
    (I'm sorry, Mike). He also offered to let me post his rc.local, but since
    it's quite long, I'll put it in our anonymous ftp at
    'obelix.luftfahrt.uni-stuttgart.de' (129.69.57.14). You can get it from
    'pub/misc' if you're interested. (The Germany-USA line is very busy, so
    choose European night time if you can.)
 
 3. I got a phone call from the German support center of HP/Apollo, in which
    they offered their help!! (It seems they do read news.)
    Let's see what they find out; I'll probably post the results (briefly, if
    possible).
 
 4. If you're fed up with my annoying postings, use 'Author kill'.


____________________________________________________________________________
Georg Schmid, ISD University of Stuttgart, W.-Germany     
email: schmid@asterix.luftfahrt.uni-stuttgart.de (129.69.110.2)
voice: 0(049-)711-685-2053
fax:   0(049-)711-685-3706
____________________________________________________________________________