ajudge@maths.tcd.ie (02/09/90)
Here is a summary of the replies I have received about a cron bug which
causes some cron jobs to be run twice.

The bug is acknowledged by Sun and a patch is available, but even after
the patch the problem still recurs.

>>> cron:1
X-From: bengts@Sweden.Sun.com

It's a bug in the cron program, bugid #1022379. You can get a new cron
program from your local answercentre. This new cron also works on the
4/390 but not on the 4/330. On the 4/330 you patch a value in the kernel.

>>> cron:3
X-From: jay@silence.princeton.nj.us

Yes, this is a known problem. It affects all Suns (bug in the SysV
version of cron in SunOS) but it bites the 4/60 and the 386i more than
others because of some kernel workaround for a hardware problem (the
details I've forgotten). On the 4/60, I believe that the problem is
partly that the clock is frequently reset. Sun has supplied fixes for
some architectures but not, to my knowledge, for the 4/60. If you run
ntp the problem will become even more severe.

The only current workaround (necessary even with Sun's patch if you run
ntp) is to wrap each cron job with a shell script or program which
creates a lockfile to prevent duplicate invocations.

Here's an example locker. Fancier than anything you really need, but you
can weed out the cruft:

*** cut ***
#! /bin/csh -f
# Prevent cron from executing jobs twice

unset MAILTO
set JOBSHELL = "/bin/sh -c"
goto start

usage:
echo Usage: `basename $0` '[options] lockname command ...\
Options:\
  -m mailto   mail output to "mailto"\
  -s shell    execute command with "shell"\
  -c          execute command with "csh -c"\
  -C          execute command with "csh -cf"'
exit

start:
set CMD = "$0 $*"
set parsing = 1
while ( $parsing )
    if ( $#argv < 2 ) then
        goto usage
    endif
    switch ( x$1 )
    case x-m:
        set MAILTO = "$2"
        shift; shift
        breaksw
    case x-s:
        set JOBSHELL = "$2"
        shift; shift
        breaksw
    case x-c:
        set JOBSHELL = "/bin/csh -c"
        shift
        breaksw
    case x-C:
        set JOBSHELL = "/bin/csh -cf"
        shift
        breaksw
    case x-*:
        goto usage
        breaksw
    default:
        set parsing = 0
        breaksw
    endsw
end

set LOCK = /tmp/$1.cronlock.$LOGNAME
echo $$ > $LOCK
sleep 60
set OUT = /tmp/$1.$$.$LOGNAME
touch $OUT
chmod 600 $OUT
shift
if ( -e $LOCK ) then
    if ( x$$ == x`cat $LOCK` ) then
        $JOBSHELL "$*" >& $OUT
        rm -f $LOCK
        goto wrapup
    endif
endif
echo "Passing the buck." > $OUT

wrapup:
if ( ! -z $OUT ) then
    if ( $?MAILTO ) then
        /usr/ucb/Mail -s "Cron job (`hostname`): $CMD" "$MAILTO" < $OUT
    else
        echo "Cron job (`hostname`)"
        cat $OUT
    endif
endif
rm -f $OUT

>>> cron:6
X-From: alex <alexl%daemon.cna.tek.com@RELAY.CS.net>

> From: Ed Anselmo <anselmo-ed@yale.edu>
> Subject: Re: cron running twice
>
> Sun is offering a patched version of cron. Part of the README file follows:
>
> Bugs Fixed:
> ------------
> 1. cron.c:
>    1019719: print at(1) job number in syslog messages
>    1023418: cron queue handling and scheduling is broken
>    1012011: Initialize USER as well as LOGNAME environment variable
>    1017698: cron sends erroneous error message when job can't be executed
>    1014181: add pid and queue name to the CMD syslog message
>    1012398: "cron"/"at"/"batch" runs more jobs than queue limit
>    1022379: cron executes crontab entries twice (duplicate of 1027075)
>
> 2. funcs.c:
>    1011113: invalid sys_errlist message number is >= sys_nerr, not > sys_nerr
>
> (We received this through the standard support channels, i.e. hotline@sun.com )
> --
> Ed Anselmo   anselmo-ed@cs.yale.edu   {harvard,decvax}!yale!anselmo-ed
>
>
> From: Dan Lorenzini <uunet.uu.net!gcm!amadeus!dal@tektronix.TEK.COM>
> To: uunet!eecs.nwu.edu!sun-managers@uunet.uu.net
> Subject: Re: Double Cron
> In-Reply-To: Your message of Tue, 07 Nov 89 14:45:07 -0800.
> Date: Wed, 08 Nov 89 11:42:51 -0500
> Status: OR
>
>
> Re: cron doing things twice:
>
> The way I heard it, it is a hardware problem (the Sparcstation is too
> fast) :-)
>
> Apparently, there was a problem with 4.0 cron executing jobs twice
> (there was (still is?) a problem with calendar also). Sun patched it,
> but it still has the problem on the Sparcstation-1's that we have
> here.
>
> Sun sent me a workaround -- I haven't used it yet, but here it is in
> case anybody needs it:
>
> ------------------------------------------------------------------------
> #!/bin/sh
> LOCK=/tmp/.mumble-lock
> echo $$ > ${LOCK}
> sleep 60
> if [ $$ = `cat ${LOCK}` ]; then
>     # I get to do it
>     rm ${LOCK}
> else
>     # The other process gets to do it
>     exit
> fi
> # Actually do whatever you wanted to do...
> ------------------------------------------------------------------------
>
> Dan Lorenzini
> uunet!gcm!dal
>

I put in the fixed cron and still get the problem on occasion.

>>> cron:9
X-From: dmc%cam.sri.com@Warbucks.AI.SRI.com

You probably know this by now, but this is a Sun bug (Sparcstations are
too fast for the software or something). There is a workaround; replace
"mycommand blah blah" by "safe_cron mycommand blah blah" in your
crontab, where safe_cron is the following script:

#!/bin/sh
# Workaround for the bug where cron jobs sometimes get run twice, a
# minute apart, on Sparcstations.
if [ `arch` = sun4 ]
then
    LOCK=/tmp/.`basename $1`.lock
    echo $$ > ${LOCK}
    sleep 60
    if [ "$$" = "`cat ${LOCK}`" ]; then
        # I get to do it
        rm ${LOCK}
    else
        # The other process gets to do it
        exit
    fi
    # Actually do it
    $*
else
    # not a Sun4, just do it
    $*
fi

=======================================================================
Alan Judge, SysAdmin, Dept. of Maths, Trinity College, Dublin, Ireland.
ajudge@maths.tcd.ie a.k.a. amjudge@cs.tcd.ie
also, Distributed System Group, Dept. of Computer Science, TCD.
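
The lockfile wrappers quoted in this summary all write a pid to a shared
file, sleep 60 seconds, and then let whichever process wrote last run the
job. A possible variant, not taken from any of the replies, is to use
mkdir as the lock: mkdir either creates the directory or fails, so only
one of two near-simultaneous invocations gets through, and the job starts
without the 60-second delay. The sketch below is only an illustration.
The names run_once, LOCKDIR_BASE and HOLDOFF are hypothetical, and the
90-second default hold time is an assumption chosen to outlast a
duplicate that arrives up to a minute later.

```shell
#!/bin/sh
# Hypothetical helper (not from the original posts): run a command at
# most once per lock name, using mkdir as the lock primitive.
#   LOCKDIR_BASE  directory for lock dirs (assumed default: /tmp)
#   HOLDOFF       seconds to keep the lock after the job ends
#                 (assumed default: 90, to outlast a late duplicate)
run_once() {
    name=$1
    shift
    lockdir="${LOCKDIR_BASE:-/tmp}/.${name}.cronlock"
    if mkdir "$lockdir" 2>/dev/null; then
        # We created the lock directory, so we get to do it.
        "$@"
        status=$?
        # Keep the lock for a while so a duplicate started shortly
        # afterwards still finds it and gives up.
        sleep "${HOLDOFF:-90}"
        rmdir "$lockdir"
        return "$status"
    fi
    # Lock already present: another invocation is handling the job.
    return 0
}
```

One way to use it would be to save the function in a script that ends
with the line run_once "$@" and call that script from the crontab with a
lock name followed by the real command. A wrapper that is killed before
the rmdir leaves a stale lock behind, so some periodic cleanup would be
needed in practice.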
dworkin@solbourne.com (Dieter Muller) (02/11/90)
In article <4887@brazos.Rice.edu> ajudge@maths.tcd.ie writes:
>X-Sun-Spots-Digest: Volume 9, Issue 36, message 12
>
>Here is a summary of the replies I have received about a cron bug which
>causes some cron jobs to be run twice.
>
>The bug is acknowledged by Sun and a patch is available, but even after
>the patch the problem still recurs.

The `bug' appears to be a kernel problem, technically. What happens is
that cron does a sleep for N seconds, but wakes up after N-1 seconds. It
starts the next job (the one we're a second early for), and then
performs a reschedule. Well, since the time for the `next' job (the one
we just started) hasn't arrived yet, put it back at the front of the
list, sleep 1 second, and poof! You just ran the job twice....

This is a side-effect of the mechanism for user crontabs. Specifically,
while cron is `sleeping', it's really waiting in a select for messages
on a named pipe. If a message came in (user X's crontab changed, etc),
it handles that and goes back into the select. If the select timed out,
cron assumes the timer expired, and that no other external event
occurred to fake out the timer. A simple way to demonstrate the problem
is to send a SIGALRM to the cron process.

And, as mentioned above, the official Sun fixes don't.

The correct fix is for cron to check the time after a select time out.
If the desired time hasn't yet occurred, reset the timer and go back to
the select (basically, act like a null message came in on the pipe). I
put this fix into Solbourne's version of cron, and we haven't heard of
the problem recurring since then.

Dworkin
boulder!stan!dworkin dworkin%stan@boulder.colorado.edu dworkin@solbourne.com
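
The idea behind the fix described above (never trust that a timed wait
ran for its full length; look at the clock and wait again if it is still
early) can be shown outside cron as well. What follows is a rough shell
analogy, not Solbourne's code and not cron's select loop: sleep_until is
a hypothetical name, and it assumes a date command that accepts +%s
(seconds since the epoch), which is common but not guaranteed everywhere.

```shell
#!/bin/sh
# Hypothetical illustration: wait until an absolute time, re-checking
# the clock after every wakeup instead of assuming the sleep ran its
# full length. Assumes `date +%s` prints seconds since the epoch.
sleep_until() {
    target=$1
    while :; do
        now=$(date +%s)
        if [ "$now" -ge "$target" ]; then
            # The desired time has really arrived.
            return 0
        fi
        # Woke up early (or have not slept yet): wait out the remainder.
        sleep $((target - now))
    done
}
```

With this shape an early wakeup only costs one more pass through the
loop, so the caller never starts work before the target time, which is
the property the select-based fix restores in cron.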