[comp.sys.pyramid] CRON runs things twice

car@trux.UUCP (Chris Rende) (02/22/90)

(Nixdorf Targon M35/50 TOS 3.2 --> Pyramid 9810 OSx 4.0)

On rare occasions ATTCRON is running things twice apparently because it
does things early. (System V Release 2 CRON)

Here is a section from my cron's log:

>  CMD: /etc/dmesg - >>/usr/adm/messages
>  root 15896 c Sat Feb 10 06:49:59 1990
<  root 15896 c Sat Feb 10 06:50:00 1990
>  CMD: /etc/dmesg - >>/usr/adm/messages
>  root 15898 c Sat Feb 10 06:50:00 1990
<  root 15898 c Sat Feb 10 06:50:00 1990

Here is the associated crontab entry:
00,10,20,30,40,50 * * * * /etc/dmesg - >>/usr/adm/messages

What seems to be happening is that CRON is not calculating a long enough
delay time before running an entry.

Maybe it's an oddball rounding error that causes the calculation to
come up short... ?

If CRON would add another .5 seconds to each delay time then there
probably wouldn't be a problem.
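
To make the suspected failure mode concrete, here is a minimal sketch in C of
a scheduler loop that trusts its computed delay and never re-checks the clock
after waking. It is a toy model under that assumption, not the actual System V
cron source; the names next_due and simulate are hypothetical.

```c
/* Toy model of a scheduler loop that trusts its computed delay.
 * Not the real cron source; all names here are hypothetical. */

/* Smallest multiple of `period` strictly greater than `now` (seconds). */
long next_due(long now, long period)
{
    return (now / period + 1) * period;
}

/* Run `wakeups` iterations of the loop, starting at time `now`.
 * The first sleep returns `short_by` seconds early; later sleeps are exact.
 * The start time of each job run is stored in `starts`. */
void simulate(long now, long period, long short_by, int wakeups, long *starts)
{
    int i;

    for (i = 0; i < wakeups; i++) {
        long delay = next_due(now, period) - now;

        now += delay - (i == 0 ? short_by : 0);  /* the "sleep" */
        starts[i] = now;  /* job starts without re-checking the clock */
    }
}
```

Starting at second 100 with a 600-second period and a first sleep that comes
up one second short, the recorded start times are 599, 600 and 1200: one run a
second early and a second run on the boundary, the same pattern as in the log
above.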

Has anyone else observed this behaviour?
Is it a known bug?
Is there a fix?

car.
-- 
Christopher A. Rende           Central Cartage (Nixdorf/Pyramid/SysVR2/BSD4.3)
uunet!edsews!rphroy!trux!car   Multics,DTSS,Unix,Shortwave,Scanners,StarTrek
 trux!car@uunet.uu.net         Minix 1.2,PC/XT,Mac+,TRS-80 Model I,1802 ELF
       "I don't ever remember forgetting anything." - Chris Rende

hedrick@athos.rutgers.edu (Charles Hedrick) (02/26/90)

Are you using a program that does adjtime, e.g. nntp or timed?  This
is a Berkeley call to adjust the time slowly.  It changes the clock
speed.  System V cron gets confused when this is done.  We had to fix
it on the Suns.  Presumably the same fix will work.  If this is
your problem, send me mail and I'll try to find out who has the
diffs.  If you don't have source to cron, there's not much I can
do for you.

keith@bain3.oz (Keith Brinck) (03/01/90)

in article <362@trux.UUCP>, car@trux.UUCP (Chris Rende) says:
> Xref: bain3 comp.sys.pyramid:582 comp.bugs.sys5:855
> 
> (Nixdorf Targon M35/50 TOS 3.2 --> Pyramid 9810 OSx 4.0)
> 
> On rare occasions ATTCRON is running things twice apparently because it
> does things early. (System V Release 2 CRON)
> 
> [Stuff deleted ....]
> 
> Has anyone else observed this behaviour?

We sure have - on the same configuration with the same version
of OSx.  It's not so rare either: about once every 1-2 months in
our case.

> Is it a known bug?

I have spoken to Pyramid Australia about it and they say it's a
known bug in the AT&T system.  They don't know when it will be
fixed (although it's rumoured to be fixed in version 5).

> Is there a fix?

We use a lock file system which catches the bug most of the
time.  Fairly recently we had an occurrence which was not
caught by the lock file and as a result screwed up a
significant Sybase database.  It took us some time to rebuild
said base.
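
For readers who have not used one, a minimal sketch of a lock-file guard in C
follows. It assumes the job creates the lock with open(2) and O_EXCL and
removes it on exit; the function names are hypothetical and this is not the
actual scheme in use at our site.

```c
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

/* Try to create the lock file atomically.
 * Returns 0 if we now hold the lock, -1 if it already exists
 * (or cannot be created for some other reason). */
int acquire_lock(const char *path)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0644);

    if (fd < 0)
        return -1;
    close(fd);
    return 0;
}

/* Remove the lock file when the job is finished.  Returns 0 on success. */
int release_lock(const char *path)
{
    return unlink(path);
}
```

A guard like this only stops the duplicate while the first copy of the job
still holds the lock. If the first copy finishes and releases it before the
duplicate starts, the second acquire_lock() succeeds and the job runs again.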

What has always amazed me about this bug is the fact that
no-one else appeared to be particularly worried about it,
including the people at Pyramid in Sydney.  Given the rate of
the incidence of the bug at our site I would guess that there
is a cron job firing off twice somewhere in the world every
hour of the day (or more) and that someone would have been
hurt by it !!

Pyramid's lack of enthusiasm in pursuing a fix for this bug
shows up one of the disadvantages of using unix - one is not
dealing directly with the originator of the os and it's
difficult to get things done as a result.

---------------

I've posted this for someone else - please direct any email
replies to barry@bain3.bain.oz (Barry Allebone)

ejp@bohra.cpg.oz (Esmond Pitt) (03/01/90)

In article <362@trux.UUCP>, car@trux.UUCP (Chris Rende) says:
> Xref: bain3 comp.sys.pyramid:582 comp.bugs.sys5:855
> 
> (Nixdorf Targon M35/50 TOS 3.2 --> Pyramid 9810 OSx 4.0)
> 
> On rare occasions ATTCRON is running things twice apparently because it
> does things early. (System V Release 2 CRON)
> 
> [Stuff deleted ....]
> 
> Has anyone else observed this behaviour?

Yes.

Does running ucb cron solve the problem?


-- 
Esmond Pitt, Computer Power Group
ejp@bohra.cpg.oz

car@trux.UUCP (Chris Rende) (03/08/90)

Thanks to all those who either posted or Emailed responses regarding
the problem with CRON running things twice.

The bottom line is that there is a bug in the AT&T System V CRON. It may
have been fixed in more recent releases.

The bug manifests itself by running something 1 second early and then AGAIN
at the proper time.

The following are NOT the cause of this particular problem:

- Change of date/time either with date(1) or with BSD's adjtime(2).
  (nada.kth.se!paf)
- Two CRON's running at the same time. (mcorrigan@ucsd.edu)
- NNTP or TIMED (hedrick@athos.rutgers.edu)

A few other notes which people sent to me:

- Even the System V Release 3 CRON is reported to get messed up by date/time
  changes while CRON is running. (motcsd!brian)
- This same bug also exists under SunOS 4.0.3 (bugids 1022379 and 1027075).
  (ata!eggert)
- It's a known bug in the AT&T system. (keith@bain3.oz)
- Observed frequency is once every 1-2 months. (keith@bain3.oz)
- It is rumored to be fixed in OSx5. (keith@bain3.oz)
- It is estimated that a CRON job fires twice somewhere in the world every
  hour (or more). (keith@bain3.oz)

Suggested solutions:

- Run the UCB CRON instead of the ATT CRON. (ejp@bohra.cpg.oz)
- Use lock files in your jobs. (ejp@bohra.cpg.oz)
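
Because the duplicate run starts within a second of the first, a job can also
guard itself by remembering which schedule slot it last ran for. The sketch
below is one possible variant of the lock-file idea, not something proposed by
any of the respondents; slot_of and should_run are hypothetical names, and a
real job would keep last_slot in a file between runs.

```c
/* Round a start time (seconds) to the nearest multiple of `period`,
 * so that 06:49:59 and 06:50:00 map to the same slot. */
long slot_of(long t, long period)
{
    return (t + period / 2) / period * period;
}

/* Decide whether the job should run at time `now`.
 * `last_slot` holds the slot of the previous run (use -1 for "never").
 * Returns 1 and updates *last_slot if this is a new slot, else 0. */
int should_run(long now, long period, long *last_slot)
{
    long slot = slot_of(now, period);

    if (slot == *last_slot)
        return 0;   /* already ran for this slot: duplicate */
    *last_slot = slot;
    return 1;
}
```

With a 600-second period, a run at second 599 is accepted and recorded as slot
600; a second run at second 600 falls in the same slot and is refused, even if
the first run has already finished.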

Here is a good summary and a fix from vogon.cetia.fr!philip:

Most SV Rel. 2 systems share your problem.
It seems that the (twisted) logic of cron reads the time
several times during execution, and it is very lax about which one of
the values obtained it actually believes.

Rather than try to correct the logic, I have used a fix, which cures
the problem, but has a side effect that *some* commands may be
run one second late. I find this acceptable, since one second is within
the normal scheduling tolerances of UNIX.

I hope you have access to the sources, because here is a context
diff showing my modification:

*** cron.c.orig	Tue Mar  6 10:33:59 1990
--- cron.c	Thu Jan  4 12:26:40 1990
***************
*** 239,245
  #endif
  		seconds = (ne_time < (long) 0) ? (long) 0 : ne_time;
  		if(ne_time > (long) 0)
! 			idle(seconds);
  		if(notexpired) {
  			notexpired = 0;
  			last_time = INFINITY;

--- 239,245 -----
  #endif
  		seconds = (ne_time < (long) 0) ? (long) 0 : ne_time;
  		if(ne_time > (long) 0)
! 			idle(seconds == 1L ? 2L : seconds);
  		if(notexpired) {
  			notexpired = 0;
  			last_time = INFINITY;

I suppose that on a really slow system, you may need to change the 2L
into 3L - but that would be a *slow* machine.
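
The delay adjustment in the diff can be isolated as a small function, which
makes the 2L versus 3L choice a parameter. This is only a sketch of that one
expression; idle_seconds and pad are hypothetical names, and the rest of
cron's loop is not modelled.

```c
/* Number of seconds the patched code would pass to idle(), given the
 * computed time to the next event.  Returns 0 when idle() would not be
 * called at all (ne_time <= 0).  `pad` is the value substituted for a
 * one-second delay: 2 in the diff, 3 on a really slow system. */
long idle_seconds(long ne_time, long pad)
{
    long seconds = (ne_time < 0L) ? 0L : ne_time;

    if (ne_time <= 0L)
        return 0L;
    return (seconds == 1L) ? pad : seconds;
}
```

Only a computed delay of exactly one second is stretched; every other delay
is passed through unchanged, which is why just *some* commands run late.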

---------------

car.
-- 
Christopher A. Rende           Central Cartage (Nixdorf/Pyramid/SysVR2/BSD4.3)
uunet!edsews!rphroy!trux!car   Multics,DTSS,Unix,Shortwave,Scanners,StarTrek
 trux!car@uunet.uu.net         Minix 1.2,PC/XT,Mac+,TRS-80 Model I,1802 ELF
       "I don't ever remember forgetting anything." - Chris Rende