[comp.bugs.sys5] CRON runs things twice

car@trux.UUCP (Chris Rende) (02/22/90)

(Nixdorf Targon M35/50 TOS 3.2 --> Pyramid 9810 OSx 4.0)

On rare occasions the ATT CRON runs things twice, apparently because it
does things early. (System V Release 2 CRON)

Here is a section from my cron's log:

>  CMD: /etc/dmesg - >>/usr/adm/messages
>  root 15896 c Sat Feb 10 06:49:59 1990
<  root 15896 c Sat Feb 10 06:50:00 1990
>  CMD: /etc/dmesg - >>/usr/adm/messages
>  root 15898 c Sat Feb 10 06:50:00 1990
<  root 15898 c Sat Feb 10 06:50:00 1990

Here is the associated crontab entry:
00,10,20,30,40,50 * * * * /etc/dmesg - >>/usr/adm/messages

What seems to be happening is that CRON is not calculating a long enough
delay time before running an entry.

Maybe it's an oddball rounding error that causes the calculation to
come up short... ?

If CRON would add another .5 seconds to each delay time then there
probably wouldn't be a problem.

Has anyone else observed this behaviour?
Is it a known bug?
Is there a fix?

car.
-- 
Christopher A. Rende           Central Cartage (Nixdorf/Pyramid/SysVR2/BSD4.3)
uunet!edsews!rphroy!trux!car   Multics,DTSS,Unix,Shortwave,Scanners,StarTrek
 trux!car@uunet.uu.net         Minix 1.2,PC/XT,Mac+,TRS-80 Model I,1802 ELF
       "I don't ever remember forgetting anything." - Chris Rende

car@trux.UUCP (Chris Rende) (03/08/90)

Thanks to all those who either posted or Emailed responses regarding
the problem with CRON running things twice.

The bottom line is that there is a bug in the AT&T System V CRON. It may
have been fixed in more recent releases.

The bug manifests itself by running something 1 second early and then AGAIN
at the proper time.

The following are NOT the cause of this particular problem:

- Change of date/time either with date(1) or with BSD's adjtime(2).
  (nada.kth.se!paf)
- Two CRONs running at the same time. (mcorrigan@ucsd.edu)
- NNTP or TIMED (hedrick@athos.rutgers.edu)

A few other notes which people sent to me:

- Even the System V Release 3 CRON is reported to get messed up by date/time
  changes while CRON is running. (motcsd!brian)
- This same bug also exists under SunOS 4.0.3 (bugids 1022379 and 1027075).
  (ata!eggert)
- It's a known bug in the AT&T system. (keith@bain3.oz)
- Observed frequency is once every 1-2 months. (keith@bain3.oz)
- It is rumored to be fixed in OSx5. (keith@bain3.oz)
- It is estimated that twice per hour CRON goofs up somewhere in the world.
  (keith@bain3.oz)

Suggested solutions:

- Run the UCB CRON instead of the ATT CRON. (ejp@bohra.cpg.oz)
- Use lock files in your jobs. (ejp@bohra.cpg.oz)
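
A plain lock file may not catch this particular bug: a short job can finish
and remove its lock inside the one second before the duplicate starts. One
possible variant is a stamp file whose modification time records the last
accepted run. The following is only a sketch under that assumption; the
function name should_run, the min_gap parameter, and any stamp path are
hypothetical and come from neither cron nor the posts above.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>
#include <utime.h>

/*
 * Hypothetical helper, not part of any cron: returns 1 if the job should
 * run and 0 if a run was already recorded less than min_gap seconds ago.
 * The stamp file's modification time records the last accepted run.
 */
int should_run(const char *stamp_path, time_t now, long min_gap)
{
    struct stat st;
    struct utimbuf times;
    int fd;

    if (stat(stamp_path, &st) == 0) {
        long gap = (long)(now - st.st_mtime);

        if (gap >= 0L && gap < min_gap)
            return 0;           /* duplicate start: skip this run */
    }

    /* Create the stamp file if it does not exist yet. */
    fd = open(stamp_path, O_WRONLY | O_CREAT, 0644);
    if (fd < 0)
        return 1;               /* cannot record the run; run anyway */
    close(fd);

    /* Record this run as the latest accepted one. */
    times.actime = now;
    times.modtime = now;
    utime(stamp_path, &times);
    return 1;
}
```

A job wrapper would call should_run() with time(NULL) and a gap of a few
seconds, and exit quietly when it returns 0. The check and the update are
not atomic, which should be tolerable for two starts a full second apart
but is not a general-purpose lock.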

Here is a good summary and a fix from vogon.cetia.fr!philip:

Most SV Rel. 2 systems share your problem.
It seems that the (twisted) logic of cron takes the time
several times during execution, and it is very lax in which one of
the values obtained it actually believes.

Rather than try to correct the logic, I have used a fix, which cures
the problem, but has a side effect that *some* commands may be
run one second late. I find this acceptable, since one second is within
the normal scheduling tolerances of UNIX.

I hope you have access to the sources, because here is a context
diff showing my modification:

*** cron.c	Thu Jan  4 12:26:40 1990
--- cron.c.orig	Tue Mar  6 10:33:59 1990
***************
*** 239,245
  #endif
  		seconds = (ne_time < (long) 0) ? (long) 0 : ne_time;
  		if(ne_time > (long) 0)
! 			idle(seconds == 1L ? 2L : seconds);
  		if(notexpired) {
  			notexpired = 0;
  			last_time = INFINITY;

--- 239,245 -----
  #endif
  		seconds = (ne_time < (long) 0) ? (long) 0 : ne_time;
  		if(ne_time > (long) 0)
! 			idle(seconds);
  		if(notexpired) {
  			notexpired = 0;
  			last_time = INFINITY;

I suppose that on a really slow system, you may need to change the 2L
into 3L - but that would be a *slow* machine.
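
For readers without the sources, the effect of the changed line can be
restated as a small standalone function. This is only an illustration of
the expression in the diff; the function name padded_delay is hypothetical
and the real cron.c calls idle() directly.

```c
/*
 * Illustrative restatement of the patched line, outside of cron.c.
 * ne_time is the number of seconds until the next event, as in the diff.
 * A result of 0 corresponds to the case where idle() is not called.
 */
long padded_delay(long ne_time)
{
    long seconds = (ne_time < 0L) ? 0L : ne_time;

    /* A one-second wait is the suspicious case: stretch it to two. */
    return (seconds == 1L) ? 2L : seconds;
}
```

Only a computed delay of exactly one second is altered; every other delay
passes through unchanged, which is why only some commands run late.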

---------------

car.

gerry@hcx1.ssd.csd.harris.com (Gerry Baumgartner) (03/15/90)

Newsgroups: comp.bugs.sys5
Subject: Re: CRON runs things twice (SUMMARY)
References: <366@trux.UUCP>
Organization: Harris Computer Systems, Fort Lauderdale, FL
Keywords: cron

I'm getting into this discussion a little late, but I don't normally read this
group.  I was alerted to this discussion by someone who knew I worked on this
problem.

In article <366@trux.UUCP> car@trux.UUCP (Chris Rende) writes:
>Thanks to all those who either posted or Emailed responses regarding
>the problem with CRON running things twice.
>
>The bottom line is that there is a bug in the AT&T System V CRON. It may
>have been fixed in more recent releases.
>
>The bug manifests itself by running something 1 second early and then AGAIN
>at the proper time.

I guess you could call it a bug in cron.  However, I kind of think of it as a
bug in the way the system keeps track of processes that have called alarm, and
how it keeps time.   I believe that this problem would NOT occur on a system
that ran at 50HZ instead of 60HZ.

I worked on this problem a couple years ago, so the details may be a bit fuzzy.
Cron works one job at a time.  After he fires off one, he finds the next job to
be started, calculates the time difference between now and then, then calls
alarm.  This time is in seconds.

Every clock tick the system updates those processes that are in "alarm" mode by
reducing their time-to-go by 1 tick.   This tick is an integer.   It is
calculated by taking 1,000,000 microseconds (1 sec) and dividing by HZ, 60 in
most cases.   This comes out to 16666.  The real answer is 16666.66666.....
This works out to the process alarm time being about 40usecs "faster" than the
system time for every 60 ticks.  
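
The arithmetic can be checked with a few lines of C. This is a sketch of
the calculation as described here, not kernel code; both function names are
made up for the illustration. With HZ = 60 the truncated tick is 16666
microseconds, which leaves 60 ticks 40 microseconds short of a second,
while HZ = 50 divides evenly and leaves no shortfall.

```c
/* Length of one clock tick in whole microseconds (integer division). */
long tick_usec(long hz)
{
    return 1000000L / hz;
}

/* Microseconds by which hz truncated ticks fall short of one second. */
long shortfall_usec_per_sec(long hz)
{
    return 1000000L - hz * tick_usec(hz);
}
```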

This causes the process to wake up before it is "really" scheduled to, according
to the system time.   After cron starts it up, he checks his queue to see what
the next job he has to schedule is.  He looks at its time, looks at the system
time and says, "hey, this job starts in 1 sec" so it does an alarm(1) and
starts the job again 1 second later.

Working out the numbers, if you had one job on the schedule to run once every 7
hours or more, this problem would occur every time the job ran.  I don't recall
exactly if having other jobs on the schedule would affect the outcome, but I
believe it did.
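
The 7 hour figure follows from the same truncation, assuming the model
described here is accurate: a shortfall of 40 microseconds per second adds
up to a full second after 25000 seconds, a little under 7 hours. A rough
sketch of that calculation, with a hypothetical function name:

```c
/*
 * Seconds of alarm time needed before the truncation shortfall adds up
 * to one full second. Returns -1 if the tick divides evenly (no drift).
 */
long seconds_until_one_second_early(long hz)
{
    long tick = 1000000L / hz;              /* truncated tick, in usec */
    long shortfall = 1000000L - hz * tick;  /* usec lost per second */

    if (shortfall == 0L)
        return -1L;
    return 1000000L / shortfall;
}
```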
>
>Here is a good summary and a fix from vogon.cetia.fr!philip:
>
>Most SV Rel. 2 systems share your problem.
>It seems that the (twisted) logic of cron takes the time
>several times during execution, and it is very lax in which one of
>the values obtained it actually believes.
>
>Rather than try to correct the logic, I have used a fix, which cures
>the problem, but has a side effect that *some* commands may be
>run one second late. I find this acceptable, since one second is within
>the normal scheduling tolerances of UNIX.
>
>I hope you have access to the sources, because here is a context
>diff showing my modification:
>
>*** cron.c	Thu Jan  4 12:26:40 1990
>--- cron.c.orig	Tue Mar  6 10:33:59 1990
>***************
>*** 239,245
>  #endif
>  		seconds = (ne_time < (long) 0) ? (long) 0 : ne_time;
>  		if(ne_time > (long) 0)
>! 			idle(seconds == 1L ? 2L : seconds);
>  		if(notexpired) {
>  			notexpired = 0;
>  			last_time = INFINITY;
>
>--- 239,245 -----
>  #endif
>  		seconds = (ne_time < (long) 0) ? (long) 0 : ne_time;
>  		if(ne_time > (long) 0)
>! 			idle(seconds);
>  		if(notexpired) {
>  			notexpired = 0;
>  			last_time = INFINITY;
>
>I suppose that on a really slow system, you may need to change the 2L
>into 3L - but that would be a *slow* machine.

My solution was a little different.  When cron got the alarm it would compare
the current system time with the time he was supposed to be awakened.  If that
time was more than 0 but less than 60 seconds away, he would sleep for that
amount of time, and start the job then.   
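
In outline, the check might look something like the function below. This
is a reconstruction from the description only, not the actual Harris code;
the function name is hypothetical, and returning 0 when the job is 60 or
more seconds away is an assumption, since that case is not covered above.

```c
/*
 * Hypothetical helper: given the current time and the time the job was
 * scheduled for (both in seconds), return how many more seconds to sleep
 * before starting the job. 0 means start it now.
 */
long extra_sleep(long now, long scheduled)
{
    long remaining = scheduled - now;

    if (remaining > 0L && remaining < 60L)
        return remaining;   /* woke early: wait out the difference */

    /* On time or late. For 60 seconds or more this is an assumption. */
    return 0L;
}
```

Unlike padding a one-second delay to two, this waits exactly as long as
the clock says is left, so the job starts at its scheduled second.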

-------------------------------------------------------------------------------
Gerry Baumgartner                |    gerry@ssd.csd.harris.com 
System Software Development      | or gerry%ssd.csd.harris.com@eddie.mit.edu
Harris Computer Systems Division | or ...!{mit-eddie,uunet,novavax}!hcx1!gerry
Fort Lauderdale FL 33309         |
-------------------------------------------------------------------------------