car@trux.UUCP (Chris Rende) (02/22/90)
(Nixdorf Targon M35/50 TOS 3.2 --> Pyramid 9810 OSx 4.0)

On rare occasions ATTCRON runs things twice, apparently because it does
things early. (System V Release 2 CRON)

Here is a section from my cron's log:

    > CMD: /etc/dmesg - >>/usr/adm/messages
    > root 15896 c Sat Feb 10 06:49:59 1990
    < root 15896 c Sat Feb 10 06:50:00 1990
    > CMD: /etc/dmesg - >>/usr/adm/messages
    > root 15898 c Sat Feb 10 06:50:00 1990
    < root 15898 c Sat Feb 10 06:50:00 1990

Here is the associated crontab entry:

    00,10,20,30,40,50 * * * * /etc/dmesg - >>/usr/adm/messages

What seems to be happening is that CRON is not calculating a long enough
delay time before running an entry. Maybe it's an oddball rounding error
that causes the calculation to come up short...?

If CRON would add another .5 seconds to each delay time then there probably
wouldn't be a problem.

Has anyone else observed this behaviour? Is it a known bug? Is there a fix?

car.
-- 
Christopher A. Rende           Central Cartage (Nixdorf/Pyramid/SysVR2/BSD4.3)
uunet!edsews!rphroy!trux!car   Multics,DTSS,Unix,Shortwave,Scanners,StarTrek
trux!car@uunet.uu.net          Minix 1.2,PC/XT,Mac+,TRS-80 Model I,1802 ELF
"I don't ever remember forgetting anything." - Chris Rende
car@trux.UUCP (Chris Rende) (03/08/90)
Thanks to all those who either posted or Emailed responses regarding
the problem with CRON running things twice.

The bottom line is that there is a bug in the AT&T System V CRON. It may
have been fixed in more recent releases.

The bug manifests itself by running something 1 second early and then AGAIN
at the proper time.

The following are NOT the cause of this particular problem:

- Change of date/time either with date(1) or with BSD's adjtime(2).
  (nada.kth.se!paf)
- Two CRONs running at the same time. (mcorrigan@ucsd.edu)
- NNTP or TIMED (hedrick@athos.rutgers.edu)

A few other notes which people sent to me:

- Even the System V Release 3 CRON is reported to get messed up by
  date/time changes while CRON is running. (motcsd!brian)
- This same bug also exists under SunOS 4.0.3 (bugids 1022379 and 1027075).
  (ata!eggert)
- It's a known bug in the AT&T system. (keith@bain3.oz)
- Observed frequency is once every 1-2 months. (keith@bain3.oz)
- It is rumored to be fixed in OSx5. (keith@bain3.oz)
- It is estimated that twice per hour CRON goofs up somewhere in the
  world. (keith@bain3.oz)

Suggested solutions:

- Run the UCB CRON instead of the ATT CRON. (ejp@bohra.cpg.oz)
- Use lock files in your jobs. (ejp@bohra.cpg.oz)

Here is a good summary and a fix from vogon.cetia.fr!philip:

Most SV Rel. 2 systems share your problem.
It seems to be that the (twisted) logic of cron takes the time
several times during execution, and it is very lax in which one of
the values obtained it actually believes.

Rather than try to correct the logic, I have used a fix, which cures
the problem, but has a side effect that *some* commands may be
run one second late. I find this acceptable, since one second is within
the normal scheduling tolerances of UNIX.

I hope you have access to the sources, because here is a context
diff showing my modification:

*** cron.c Thu Jan 4 12:26:40 1990
--- cron.c.orig Tue Mar 6 10:33:59 1990
***************
*** 239,245
  #endif
          seconds = (ne_time < (long) 0) ? (long) 0 : ne_time;
          if(ne_time > (long) 0)
!                 idle(seconds == 1L ? 2L : seconds);
          if(notexpired) {
                  notexpired = 0;
                  last_time = INFINITY;
--- 239,245 -----
  #endif
          seconds = (ne_time < (long) 0) ? (long) 0 : ne_time;
          if(ne_time > (long) 0)
!                 idle(seconds);
          if(notexpired) {
                  notexpired = 0;
                  last_time = INFINITY;

I suppose that on a really slow system, you may need to change the 2L
into 3L - but that would be a *slow* machine.

---------------

car.
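The lock-file suggestion listed in the summary above could look something like the following when the job itself is a C program. This is only a sketch under assumptions: it relies on open() with O_CREAT | O_EXCL being available and atomic on the local filesystem, and the names acquire_lock and release_lock are hypothetical, not part of cron or of any poster's code.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: try to take the lock by creating lock_path
 * exclusively. Returns 1 if this process now holds the lock, 0 if the
 * file could not be created (typically because another run of the job
 * already created it). */
int acquire_lock(const char *lock_path)
{
    int fd = open(lock_path, O_WRONLY | O_CREAT | O_EXCL, 0644);
    if (fd < 0)
        return 0;
    close(fd);
    return 1;
}

/* Hypothetical helper: remove the lock file so the next scheduled run
 * can proceed. */
void release_lock(const char *lock_path)
{
    unlink(lock_path);
}
```

A job would call acquire_lock() first, exit quietly if it returns 0, and call release_lock() when done. Note that this only suppresses the duplicate while the first run still holds the lock; a job that finishes in well under a second would have released it before the second copy starts.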
gerry@hcx1.ssd.csd.harris.com (Gerry Baumgartner) (03/15/90)
Newsgroups: comp.bugs.sys5
Subject: Re: CRON runs things twice (SUMMARY)
References: <366@trux.UUCP>
Organization: Harris Computer Systems, Fort Lauderdale, FL
Keywords: cron

I'm getting into this discussion a little late, but I don't normally read
this group. I was alerted to this discussion by someone who knew I worked
on this problem.

In article <366@trux.UUCP> car@trux.UUCP (Chris Rende) writes:
>Thanks to all those who either posted or Emailed responses regarding
>the problem with CRON running things twice.
>
>The bottom line is that there is a bug in the AT&T System V CRON. It may
>have been fixed in more recent releases.
>
>The bug manifests itself by running something 1 second early and then AGAIN
>at the proper time.

I guess you could call it a bug in cron. However, I kind of think of it as
a bug in the way the system keeps track of processes that have called alarm,
and how it keeps time. I believe that this problem would NOT occur on a
system that ran at 50HZ instead of 60HZ.

I worked on this problem a couple years ago, so the details may be a bit
fuzzy.

Cron works one job at a time. After he fires off one, he finds the next job
to be started, calculates the time difference between now and then, then
calls alarm. This time is in seconds. Every clock tick the system updates
those processes who are in "alarm" mode by updating their time-to-go by
1 tick. This tick is an integer. It is calculated by taking 1,000,000
microseconds (1 sec) and dividing by HZ, 60 in most cases. This comes out
to 16666. The real answer is 16666.66666..... This works out to the process
alarm time being about 40usecs "faster" than the system time for every
60 ticks. This causes the process to wake up before it is "really" scheduled
to, according to the system time. After cron starts the job, he checks his
queue to see what the next job he has to schedule is.
He looks at its time, looks at the system time and says, "hey, this job
starts in 1 sec", so he does an alarm(1) and starts the job again 1 second
later.

Working out the numbers, if you had one job on the schedule to run once
every 7 hours or more, this problem would occur every time the job ran.
I don't recall exactly if having other jobs on the schedule would affect
the outcome, but I believe it did.

>
>Here is a good summary and a fix from vogon.cetia.fr!philip:
>
>Most SV Rel. 2 systems share your problem.
>It seems to be that the (twisted) logic of cron takes the time
>several times during execution, and it is very lax in which one of
>the values obtained it actually believes.
>
>Rather than try to correct the logic, I have used a fix, which cures
>the problem, but has a side effect that *some* commands may be
>run one second late. I find this acceptable, since one second is within
>the normal scheduling tolerances of UNIX.
>
>I hope you have access to the sources, because here is a context
>diff showing my modification:
>
>*** cron.c Thu Jan 4 12:26:40 1990
>--- cron.c.orig Tue Mar 6 10:33:59 1990
>***************
>*** 239,245
>  #endif
>          seconds = (ne_time < (long) 0) ? (long) 0 : ne_time;
>          if(ne_time > (long) 0)
>!                 idle(seconds == 1L ? 2L : seconds);
>          if(notexpired) {
>                  notexpired = 0;
>                  last_time = INFINITY;
>
>--- 239,245 -----
>  #endif
>          seconds = (ne_time < (long) 0) ? (long) 0 : ne_time;
>          if(ne_time > (long) 0)
>!                 idle(seconds);
>          if(notexpired) {
>                  notexpired = 0;
>                  last_time = INFINITY;
>
>I suppose that on a really slow system, you may need to change the 2L
>into 3L - but that would be a *slow* machine.

My solution was a little different. When cron got the alarm he would compare
the current system time with the time he was supposed to be awakened. If
that time was more than 0 but less than 60 seconds away, he would sleep for
that amount of time, and start the job then.
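The wake-up check described above might be sketched roughly as follows. This is a minimal illustration, not the actual Harris change: extra_sleep and wait_until_due are hypothetical names, the surrounding cron code is not shown, and the sketch assumes time_t counts whole seconds.

```c
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: given the current time and the time a job is due,
 * return how many more seconds to sleep before starting it. Only a wake-up
 * that is more than 0 but less than 60 seconds early is corrected; in every
 * other case the job is started immediately (return value 0). */
unsigned int extra_sleep(time_t now, time_t due)
{
    long remaining = (long)(due - now);

    if (remaining > 0 && remaining < 60)
        return (unsigned int)remaining;
    return 0;
}

/* Hypothetical wrapper: call this when the alarm fires, before starting
 * the job. If the alarm came early, sleep off the difference first. */
void wait_until_due(time_t due)
{
    unsigned int extra = extra_sleep(time(NULL), due);

    if (extra > 0)
        (void)sleep(extra);  /* may return early if a signal arrives */
}
```

Unlike padding the delay, this approach does not run anything late: a job whose alarm fires on time passes straight through.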
-------------------------------------------------------------------------------
Gerry Baumgartner                | gerry@ssd.csd.harris.com
System Software Development      | or gerry%ssd.csd.harris.com@eddie.mit.edu
Harris Computer Systems Division | or ...!{mit-eddie,uunet,novavax}!hcx1!gerry
Fort Lauderdale FL 33309         |
-------------------------------------------------------------------------------
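The tick arithmetic in Gerry Baumgartner's post can be checked with a few lines of C. This is a sketch under the assumptions stated in that post (each tick is credited with 1,000,000 / HZ microseconds using integer division); the function names are illustrative and are not taken from any kernel source.

```c
#include <assert.h>

/* Microseconds credited per clock tick when 1,000,000 is divided by hz
 * with integer arithmetic, so the fractional part is discarded. */
long usec_per_tick(long hz)
{
    assert(hz > 0);
    return 1000000L / hz;
}

/* Microseconds lost per real second: hz ticks are each credited with the
 * truncated value, so one second of ticks can sum to less than 1,000,000. */
long usec_lost_per_second(long hz)
{
    return 1000000L - hz * usec_per_tick(hz);
}

/* Real seconds until the accumulated shortfall reaches one full second,
 * or -1 if hz divides 1,000,000 evenly and nothing is lost. */
long seconds_until_one_second_drift(long hz)
{
    long lost = usec_lost_per_second(hz);

    return lost == 0 ? -1L : 1000000L / lost;
}
```

With hz = 60 these give 16666 microseconds per tick, 40 microseconds lost per second, and 25000 seconds (just under 7 hours) to fall a full second behind. With hz = 50 the division is exact and nothing is lost, which matches the remark that a 50HZ system would not show the problem.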