obrennan@CC3.CC.UMR.EDU (obrennan) (12/27/90)
>>> Sounds like the glbd is not 1) getting started before rgyd,
>>> or 2) accepting server requests
>>
>>  Any suggestions of how to change "/etc/rc" to handle "1)"? Or
>>how to determine if the reason is "2)" and if so, any suggestions
>>on how to find the problem (which logs to look etc?).
>>
>>> I would try rebuilding the GLBD with -create -first options ...
>>
>>  Hmm.. I have tried to do this few times. When I do it under root,
>>like
>>      /etc/server /etc/ncs/glbd -create -first
>>  or
>>      /etc/ncs/glbd -create -first
>>
>>  Nothing seems to happen, the man page leaves me with the impression
>>that even with "-create", glbd would be left running with these
>>commands, but it doesn't. Either way I try, I don't get any feedback
>>from the commands and no running glbd results.
>>
>>  In general the apollo behaves now rather randomly. When logging
>>in from the net with telnet, I sometimes get "bind: address in use",
>>trying to FTP outwards just hangs on Username prompt etc.. and when
>>trying again, all goes well. Running lb_admin, I get similar kinds
>>of inconsistencies, sometimes it complains about comm failures, and
>>yet next command may succeed. First I would like to get this into
>>some simple known initial state.
>>
>>  Continuing...
>>
>>  I did try to start with "/etc/server -p ...", but it doesn't seem
>>to help.
>>
>>replicas... Like, I just run into section about ns_helper? When do I
>>need to run that?
>>
>>  Apologies for troubling you again, but it seems that there is
>>something very *basic* wrong in the system. Most of the management
>>tools (lb_admin, drm_admin, edns ..) complain mostly about
>>communications problems.
>>
>>  From glb_log I can see that "glbd -create -first" fails because glb
>>object already exists. How do I delete? Should I? Have already
>>tried with drm_admin and delrep, but that attempt just terminated
>>with error "communication failure .. NCS/RPC runtime)", as are
>>terminated most other attempts with this and other tools.
>>(llbd, glbd and rgyd are all running).
>>

It sounds like all of your servers think they have replicas on other
Apollo nodes. Here is what I would do to essentially clean up and
restart:

1) Edit the rc.user file and comment out the ns_helper startup.
   NS_HELPER is only useful when you have a network of Apollos. It is
   used to keep the node cataloging (//name and nodeid correspondence)
   in a database so that nodes can resolve conflicts with their local
   caches (which contain nodenames and nodeids).

2) Delete these GLBD files:
        /sys/node_data/glb.p
        /sys/node_data/glb.e

3) Run:  /etc/server /etc/ncs/llbd &

4) Run:  /etc/server /etc/ncs/glbd -create -first &

5) Run:  /etc/server -p /etc/rgyd &

6) Use CRF or TOUCH to create the following zero-length files so that
   at bootup the RC files will start your servers:
        /sys/node_data/etc/daemons/llbd
        /sys/node_data/etc/daemons/glbd
        /sys/node_data/etc/daemons/rgyd

If the above does not work out, I would check the
/sys/node_data/system_logs for problem determination.

Hopefully, the above will clear your problem.

Gerry O'Brennan
Programmer/Analyst II
Computing Services
University of Missouri - Rolla
------------------------------
obrennan@apollo.cc.umr.edu
c0022@umrvmb.umr.edu
------------------------------
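
For anyone who wants steps 2 through 6 above in one place, here is a
rough Bourne shell sketch. The paths and daemon options are copied from
the post. The names DRY_RUN, run, run_bg and restart_ncs are made up for
illustration, and it is only an assumption that rm and touch are
available in your shell environment (the post mentions CRF as an
alternative). Step 1, editing rc.user, is still done by hand. By default
the script only prints what it would do.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, paths/flags are from the post.
DRY_RUN=${DRY_RUN:-1}        # 1 = just print the commands, 0 = execute them
NODE_DATA=/sys/node_data

run() {
    # Print a command; run it in the foreground only when DRY_RUN=0.
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

run_bg() {
    # Same as run, but starts the command in the background (the "&" above).
    echo "+ $* &"
    if [ "$DRY_RUN" = "0" ]; then
        "$@" &
    fi
}

restart_ncs() {
    # Step 2: delete the old GLBD files.
    run rm -f "$NODE_DATA/glb.p" "$NODE_DATA/glb.e"

    # Steps 3-5: start llbd, then glbd, then rgyd, in that order.
    run_bg /etc/server /etc/ncs/llbd
    run_bg /etc/server /etc/ncs/glbd -create -first
    run_bg /etc/server -p /etc/rgyd

    # Step 6: zero-length marker files so the RC files start the servers at boot.
    for d in llbd glbd rgyd; do
        run touch "$NODE_DATA/etc/daemons/$d"
    done
}
```

Calling restart_ncs with the default settings lists the commands without
touching anything. Once the list looks right, running it as root with
DRY_RUN=0 would carry out the steps. If glbd still does not stay up,
/sys/node_data/system_logs is the place to look, as noted above.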