[comp.sys.apollo] Problem with registry date

azman@rangkom.MY (Azman Bahrom) (08/02/90)

Hi netters !!

Recently I've been having the following problems :

    1. the registry date is incorrect. It seems to have
       a "future" date. I realized this when I invoked rgy_admin :=

        $ /etc/rgy_admin
	        Default object: rgy  default host: dds://idea_4
        	State: in service  slave
        rgy_admin: lrep -st
        (master)  dds://dsp10    	state: in service        1991/03/20.11:26:17	
                  dds://idea_4   	state: in service        1991/03/20.11:25:33	

    As a result (I think), rgy_admin connects to a replica host
    instead of the master node every time I invoke
    /etc/rgy_admin.

    2. Consequently, ordinary users cannot run chpass. Only root can
       edit the registry, using /etc/edrgy -s .

I would appreciate it if someone in netland could help me out on these
problems. BTW, our system is running SR10.1 and SR10.2 in the same ring
of 25 nodes. Please respond by email.

Thanks in advance.

azman baharom.
email address : azman@rangkom.MY
 

rees@pisa.ifs.umich.edu (Jim Rees) (08/07/90)

In article <247@rangkom.MY>, azman@rangkom.MY (Azman Bahrom) writes:

      1. the registry date is incorrect. It seems to have
         a "future" date. I realized this when I invoked rgy_admin :=

I keep my clocks in sync with ntpdate, available for anonymous ftp from
gw.ccie.utoronto.ca.  I run it from /etc/rc and a couple times a day from
cron.

You'll want to apply this patch (thanks to Bill Sommerfeld of Apollo).
There are others if you want to run xntpd (a more general and flexible way
of keeping your clocks synced).

*** ntpdate.c_o	Wed Apr 11 12:32:17 1990
--- ntpdate.c	Wed Apr 11 12:31:31 1990
***************
*** 1158,1163 ****
--- 1158,1165 ----
  	full_recvbufs = 0;
  	free_recvbufs = sys_numservers + 2;
  
+ 	setpgrp(0, getpid());
+ 
  	/*
  	 * Point SIGIO at service routine
  	 */

pphillip@cs.ubc.ca (Peter Phillips) (08/07/90)

In article <1990Aug6.193047.25822@terminator.cc.umich.edu> rees@citi.umich.edu (Jim Rees) writes:

>I keep my clocks in sync with ntpdate, available for anonymous ftp from
>gw.ccie.utoronto.ca.  I run it from /etc/rc and a couple times a day from
>cron.
>
>You'll want to apply this patch (thanks to Bill Sommerfeld of Apollo).
>There are others if you want to run xntpd (a more general and flexible way
>of keeping your clocks synced).

[ patch omitted ]

This sounds great.  However, is it really safe to be adjusting the system
clock time while the machine is running?  The clock is used as a seed for
assigning identifiers (UIDs in Domain/OS parlance, not UNIX UIDs) to
objects.  If the clock is set back, two identifiers can coincide causing
disaster.  I know ntp implementations use adjtime(2) which shouldn't move
the clock backward at all but the manual page for adjtime(2) in Domain/OS
is not clear if the use of adjtime is safe.  So, does anyone out there
with "kernel" knowledge know if running NTP-related time synchronization
software is safe/reliable on Apollos?

--
Peter Phillips, UBC Computer Science| Listen:
<pphillip@cs.ubc.ca>                | Billy Pilgrim has come unstuck in time.

rees@pisa.ifs.umich.edu (Jim Rees) (08/08/90)

In article <9032@ubc-cs.UUCP>, pphillip@cs.ubc.ca (Peter Phillips) writes:
  This sounds great.  However, is it really safe to be adjusting the system
  clock time while the machine is running?  The clock is used as a seed for
  assigning identifiers (UIDs in Domain/OS parlance, not UNIX UIDs) to
  objects.  If the clock is set back, two identifiers can coincide causing
  disaster.  I know ntp implementations use adjtime(2) which shouldn't move
  the clock backward at all but the manual page for adjtime(2) in Domain/OS
  is not clear if the use of adjtime is safe.  So, does anyone out there
  with "kernel" knowledge know if running NTP-related time synchronization
  software is safe/reliable on Apollos?

Actually, ntpdate uses adjtime() only if the clock is within some reasonable
threshold of being right (a few seconds?).  If the clock is out of
tolerance, it uses settimeofday() instead.
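
The choice described above can be sketched in C roughly as follows.  This is
an illustration only, not ntpdate's actual code: the names choose_clock_fix
and enum clock_fix are hypothetical, and the threshold is passed in as a
parameter because its real value is only guessed at here.

```c
/* Hypothetical sketch: decide how to correct a clock offset.
 * CLOCK_FIX_SLEW stands for a gradual correction (adjtime()),
 * CLOCK_FIX_STEP for setting the time outright (settimeofday()). */
enum clock_fix { CLOCK_FIX_SLEW, CLOCK_FIX_STEP };

/* offset_sec:    measured offset between the local clock and the server
 * threshold_sec: assumed tolerance; not taken from the ntpdate source */
enum clock_fix choose_clock_fix(double offset_sec, double threshold_sec)
{
    /* Only the magnitude matters; a fast clock and a slow clock
     * are treated the same way. */
    double magnitude = offset_sec < 0.0 ? -offset_sec : offset_sec;

    return magnitude <= threshold_sec ? CLOCK_FIX_SLEW : CLOCK_FIX_STEP;
}
```

With an assumed threshold of half a second, an offset of 0.2 seconds would be
slewed, while an offset of an hour would be stepped.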

Turns out that both of these are relatively safe.  You can never get
duplicate uids within a single boot of Domain/OS, because the kernel ensures
that uids are always monotonically increasing regardless of any gyrations in
the time-of-day clock.  The only danger is if the clock is set back so that
it overlaps a previous boot of Domain/OS; for example:

1. set time to 1:00
2. boot Domain/OS
3. shut down
4. set time to 1:00
5. boot Domain/OS

So setting the clock back by less time than you will be shut down is safe.
This is why 'calendar' doesn't complain unless you are setting back by more
than 5 minutes.  In normal operation, xntp won't be adjusting your clock by
more than a few seconds, so no problem.
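
To make the two rules concrete, here is a small C sketch.  It is not the
Domain/OS kernel's code, just one way a generator could keep identifiers
increasing within a boot; uid_gen, uid_gen_next and setback_is_safe are
hypothetical names, and times are plain second counts.

```c
/* Hypothetical sketch of a time-seeded identifier generator that never
 * goes backwards within one boot, whatever the clock does. */
typedef struct {
    long last_issued;   /* time component of the most recent identifier */
} uid_gen;

long uid_gen_next(uid_gen *g, long clock_now)
{
    if (clock_now > g->last_issued)
        g->last_issued = clock_now;   /* clock is ahead: follow it */
    else
        g->last_issued += 1;          /* clock stalled or set back: bump */
    return g->last_issued;
}

/* Across boots the generator's state is lost, so the rule of thumb is:
 * a set-back is safe if it is smaller than the time spent shut down. */
int setback_is_safe(long setback_sec, long downtime_sec)
{
    return setback_sec < downtime_sec;
}
```

Under these assumptions, setting the clock back by one minute around a
five-minute shutdown is safe, while setting it back an hour is not.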

There may be other problems associated with a vacillating clock.  This
discussion applies to duplicate uids only.

By the way, don't try this at home but I have set node clocks back by an
hour, followed by immediate reboot, without noticing any ill effects.

pato@apollo.HP.COM (Joe Pato) (08/08/90)

In article <247@rangkom.MY>, azman@rangkom.MY (Azman Bahrom) writes:
|> 
|> Hi netters !!
|> 
|> Recently I've been having the following problems :
|> 
|>     1. the registry date is incorrect. It seems to have
|>        a "future" date. I realized this when I invoked rgy_admin :=
|> 
|>         $ /etc/rgy_admin
|> 	        Default object: rgy  default host: dds://idea_4
|>         	State: in service  slave
|>         rgy_admin: lrep -st
|>         (master)  dds://dsp10    	state: in service        1991/03/20.11:26:17
|>                   dds://idea_4   	state: in service        1991/03/20.11:25:33
|> 
|>     As a result (I think), rgy_admin connects to a replica host
|>     instead of the master node every time I invoke
|>     /etc/rgy_admin.
|> 
|>     2. Consequently, ordinary users cannot run chpass. Only root can
|>        edit the registry, using /etc/edrgy -s .
|> 
|> I would appreciate it if someone in netland could help me out on these
|> problems. BTW, our system is running SR10.1 and SR10.2 in the same ring
|> of 25 nodes. Please respond by email.
|> 
|> Thanks in advance.
|> 
|> azman baharom.
|> email address : azman@rangkom.MY
|>  

When rgy_admin prints a timestamp for each replica ("lrep -st"), it is printing
the timestamp of the last update seen by that replica.  Update timestamps are
generated by the master server and are monotonically increasing.  If you
correct the master's clock, the timestamps generated for new updates will still
appear to be in the future until real time catches up with the update
timestamp.

Skewed update timestamps have nothing (directly) to do with why you can't find
the master server.  Rgy_admin picks an arbitrary registry server when it is
first started, and so do the other tools that manipulate the registry (like
chpass).  To find the registry servers (and to find the master site), the
registry library code queries the global location broker.

If the tools cannot find the master registry, it is likely because they
cannot find a current registration for that server in the glbd.  I suspect that
this is happening because the glbd replicas also do not have their clocks
synchronized.  (If your site is like many others, you are running a glbd on
the registry machines too, and we already know that at least one of these
machines is way out of synch.)  Unlike the registry servers, if the glbd sites'
clocks drift out of synch by more than 10 minutes, they will not exchange
information.  (More precisely, any two replicas whose clocks are more than
10 minutes out of synch will not exchange updates; other replicas that are
within the synchronization tolerance will continue to exchange updates.)
Use drm_admin to determine if glbs are out of synch.
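
The pairwise rule can be illustrated with a short C sketch.  It is not glbd
or drm_admin code: replicas_can_exchange and count_blocked_pairs are
hypothetical names, and each replica's clock is represented as a plain count
of seconds.  Only the 10-minute tolerance comes from the description above.

```c
/* Tolerance from the description above: 10 minutes, in seconds. */
#define MAX_SKEW_SEC (10L * 60L)

/* Two replicas exchange updates only if their clocks differ by no
 * more than the tolerance. */
int replicas_can_exchange(long clock_a, long clock_b)
{
    long skew = clock_a > clock_b ? clock_a - clock_b : clock_b - clock_a;
    return skew <= MAX_SKEW_SEC;
}

/* Count the replica pairs that are too far apart to exchange updates. */
int count_blocked_pairs(const long *clocks, int n)
{
    int blocked = 0;
    for (int i = 0; i < n; i++)
        for (int j = i + 1; j < n; j++)
            if (!replicas_can_exchange(clocks[i], clocks[j]))
                blocked++;
    return blocked;
}
```

For three replicas whose clocks read 0, 300 and 900 seconds, only the first
and last are blocked from each other; the middle one can still exchange
updates with both.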

If you have glbs that are way out of synch (like the registry server that
thought it was March of '91), it will probably be best to just delete those
glb sites, fix the clocks, and then re-create the glb sites.

                    -- Joe Pato
                       Cooperative Object Computing Operation
                       Hewlett-Packard Company
                       pato@apollo.hp.com