smitty@essnj1.ESSNJAY.COM (Hibbard T. Smith JR) (07/25/90)
Within the past 2 weeks, we've upgraded several systems from 2.0.2 to
2.2.  On one of those systems, on Sunday morning at 05:17 or
thereabouts, most of the files on the system were deleted.  The problem
was caused by a root-crontab-driven execution of /etc/cleanup.  This
system's /lost+found directory was inadvertently lost during the
upgrade installation, and we were planning on recreating it on Monday
morning.

The last two lines of the distributed /etc/cleanup are as follows:

	cd /lost+found
	find . -mtime +14 -exec rm -rf {} \;

If there's no lost+found directory in the root file system, this
deletes everything on the system that's older than 14 days.  Two
possible fixes exist:

	cd /lost+found && find . -mtime +14 -exec rm -rf {} \;
	find /lost+found -mtime +14 -exec rm {} \;

Either of these is much safer than the distributed code.  This bad
code is different from 2.0.2, so beware!  I hope this saves someone
the grief of starting over, or worse yet, losing a whole system when
you're not prepared to rebuild it.

-- Smitty
-------------------------------------------
Hibbard T. Smith JR            smitty@essnj1.ESSNJAY.COM
ESSNJAY Systems Inc.           uunet!hsi!essnj1!smitty
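[The failure mode above is easy to reproduce safely in a scratch
directory.  A minimal sketch -- every path here is a hypothetical
stand-in created under mktemp; nothing touches the real /lost+found or
/etc/cleanup:]

```shell
#!/bin/sh
# Reproduce the failure mode in a sandbox.  The mktemp directory stands
# in for the root filesystem; the missing subdirectory stands in for
# the lost /lost+found.
sandbox=`mktemp -d` || exit 1
cd "$sandbox" || exit 1
mkdir keepdir
touch keepdir/precious

# Distributed (unsafe) form: if the shell does not abort on a failed
# cd, we are still sitting in $sandbox when find would run.
cd "$sandbox/no-such-dir" 2>/dev/null
echo "after failed cd, cwd is still: `pwd`"

# Fixed form: find is reached only when the cd succeeds, so nothing is
# ever removed from the wrong directory.
cd "$sandbox/no-such-dir" 2>/dev/null && find . -mtime +14 -exec rm -rf {} \;

ls keepdir
```

[With the `&&` form the find never runs, and keepdir/precious
survives; substituting `;` for `&&` in a shell that continues past the
failed cd would let the find loose on the sandbox itself.]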
cpcahil@virtech.uucp (Conor P. Cahill) (07/26/90)
In article <772@essnj1.ESSNJAY.COM> smitty@essnj1.ESSNJAY.COM (Hibbard T. Smith JR) writes:
>The last two lines of the distributed /etc/cleanup are as follows:
>	cd /lost+found
>	find . -mtime +14 -exec rm -rf {} \;
>If there's no lost+found directory in the root file system, this deletes
>everything on the system that's older than 14 days.  Two possible fixes exist:

This is not a problem if the shell that executes these lines is the
Bourne shell, since the Bourne shell exits a shell procedure
(non-interactive) when a cd fails.  So, to get the problem you must
have changed root's login shell -- something that is not recommended,
because it can cause exactly this kind of problem.

But beside that point, you are right: that is bad coding, and it
should be fixed with something similar to your suggestions.

-- 
Conor P. Cahill            (703)430-9247        Virtual Technologies, Inc.,
uunet!virtech!cpcahil                           46030 Manekin Plaza, Suite 160
                                                Sterling, VA 22170
jrh@mustang.dell.com (James Howard) (07/26/90)
In article <772@essnj1.ESSNJAY.COM>, smitty@essnj1.ESSNJAY.COM (Hibbard T. Smith JR) writes:
> Within the past 2 weeks, we've upgraded several systems from 2.0.2 to 2.2.
> On one of those systems, on Sunday morning at 05:17 or thereabouts most of
> the files on the system were deleted.  The problem was caused by a root
> crontab driven execution of /etc/cleanup.  This system's /lost+found
> directory was inadvertently lost during the upgrade installation, and we
> were planning on recreating it on Monday morning.
>
> The last two lines of the distributed /etc/cleanup are as follows:
>	cd /lost+found
>	find . -mtime +14 -exec rm -rf {} \;
> If there's no lost+found directory in the root file system, this deletes
> everything on the system that's older than 14 days.  Two possible fixes exist:
>	cd /lost+found && find . -mtime +14 -exec rm -rf {} \;
>	find /lost+found -mtime +14 -exec rm {} \;
> Either of these is much safer than the distributed code.  This bad code is
> different from 2.0.2, so beware!

Well, it looks like ISC tried to fix a bug that was in 2.0.2, and
created an even bigger bug.  The second fix you list above has the
subtle bug that was present in 2.0.2.  The fix that we put in our
release looks like this:

	touch /lost+found
	find /lost+found -mtime +14 -exec rm -rf {} \; >/dev/null 2>&1

Without the first line, /lost+found itself will get deleted if it
hasn't been modified in 14 days.

James Howard        Dell Computer Corp.     !'s:uunet!dell!mustang!jrh
(512) 343-3480      9505 Arboretum Blvd     @'s:jrh@mustang.dell.com
                    Austin, TX 78759-7299
shwake@raysnec.UUCP (Ray Shwake) (07/27/90)
jrh@mustang.dell.com (James Howard) writes:

>touch /lost+found
>find /lost+found -mtime +14 -exec rm -rf {} \; >/dev/null 2>&1

... except that touch will create a FILE if the entity does not
already exist.  Better to do something like:

	if [ -d /lost+found ]; then
		find ....
	fi
drector@orion.oac.uci.edu (David Rector) (07/30/90)
In <11@raysnec.UUCP> shwake@raysnec.UUCP (Ray Shwake) writes:

>jrh@mustang.dell.com (James Howard) writes:
>>touch /lost+found
>>find /lost+found -mtime +14 -exec rm -rf {} \; >/dev/null 2>&1
>	... except that touch will create a FILE if the entity does not
>	already exist.  Better to do something like:
>	if [ -d /lost+found ]; then
>		find ....
>	fi

Sorry, this doesn't work either.  It has the same bug as 2.0.2: it
will delete /lost+found itself.  Howard's fix will work if /lost+found
exists.  If you want to be absolutely sure, use something like

	if [ -d /lost+found ]; then
		touch /lost+found
		find ...
	else
		mkdir /lost+found
	fi

This, of course, would also fail if /lost+found exists as a file.  The
pedantic may decorate the above accordingly.

David Rector					drector@orion.uci.edu
Dept. of Math.  U. C. Irvine, Irvine CA 92717
-- 
David L. Rector
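[Putting the pieces from this subthread together -- the directory
test, the touch, and the mkdir fallback -- a combined version might
look like the sketch below.  The relative stand-in path is an
illustrative assumption so it can be tried outside /; the real script
would hard-code /lost+found:]

```shell
#!/bin/sh
# Combined sketch of the fixes discussed above.  LF is a stand-in for
# /lost+found (an assumption for safe experimentation, not the real path).
LF=./lost+found

if [ -d "$LF" ]; then
	touch "$LF"	# keep the directory itself "young" so find spares it
	find "$LF" -mtime +14 -exec rm -rf {} \; >/dev/null 2>&1
else
	mkdir "$LF"	# recreate the directory if it has gone missing
fi
```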
dhesi%cirrusl@oliveb.ATC.olivetti.com (Rahul Dhesi) (08/02/90)
>	cd /lost+found
>	find . -mtime +14 -exec rm -rf {} \;
>If there's no lost+found directory in the root file system, this deletes
>everything on the system that's older than 14 days.

The last time I looked, it was an undocumented feature in sh and csh
(and probably in ksh, though I didn't check) that a cd that failed
would abort the rest of the script.  In fact, sh and csh (but not ksh)
went a bit too far: in the statement

	cd dir || exit 1

the "exit 1" would never execute.  It looks like the sh you are using
has had this undocumented feature removed, resulting in disaster.

Standard practice in cleanup scripts is to do a cd followed by
something else on the same line:

	cd /lost+found; find . -mtime +14 -exec rm -rf {} \;

If the cd fails, no damage is done, because the rest of the line is
not executed.  Any sensible shell ought to let at least this work,
even if it doesn't abort the entire script.
-- 
Rahul Dhesi <dhesi%cirrusl@oliveb.ATC.olivetti.com>
UUCP:  oliveb!cirrusl!dhesi
mpl@pegasus.ATT.COM (Michael P. Lindner) (08/02/90)
In article <2108@cirrusl.UUCP> dhesi%cirrusl@oliveb.ATC.olivetti.com (Rahul Dhesi) writes:
	[deleted]
>The last time I looked, it was an undocumented feature in sh and csh
>(and probably in ksh though I didn't check) that a cd that failed would
>abort the rest of the script.  In fact, sh and csh (but not ksh) went a
	[deleted]
>-- 
>Rahul Dhesi <dhesi%cirrusl@oliveb.ATC.olivetti.com>
>UUCP:  oliveb!cirrusl!dhesi

I don't know of any undocumented feature wrt. "cd", but for safety's
sake, all my shell scripts start with the line

	set -e

which says "exit on error".  Anyplace where I expect a command to fail
but it's OK to go on, I put either

	# do something special if the command fails
	if command
	then
		:
	else
		echo >&2 "command failed -- exit code $?"
	fi

or

	# ignore the exit code - useful for those commands which
	# don't return a meaningful exit code
	command || :

or

	# ignore the failure - useful for things like
	mkdir -p $dir 2> /dev/null || :

or

	mv -f $files 2> /dev/null || :

Mike Lindner
AT&T Bell Labs
attmail!mplindner
walter@mecky.UUCP (Walter Mecky) (08/03/90)
In article <2108@cirrusl.UUCP> dhesi%cirrusl@oliveb.ATC.olivetti.com (Rahul Dhesi) writes:
< >-- cd /lost+found
< >-- find . -mtime +14 -exec rm -rf {} \;
< >If there's no lost and found directory in the root file system, this deletes
< >everything in the system that's older than 14 days.
Guys, you've talked about very many aspects of the problem and missed
the most important one.  It was discussed here in November last year:
if fsck links a file into /lost+found, its mtime is left unchanged.
The same is true for all the files in a directory tree when fsck links
in a directory.  So you MUST NOT use the mtime to decide whether to
delete files in /lost+found, because find will then delete files in
your filesystem that you simply haven't changed in the last 14 days.
The idea behind the "find ..." seemed to be: delete the files and
directory trees which have been in /lost+found for longer than 14 days.
Some solutions were posted in the November discussion; I don't
remember them and don't trust any of them.  My /etc/cleanup only
produces mail for user root and deletes no files:
	for i in `/etc/mount | cut -d' ' -f1`
	do
		[ "`echo $i/lost+found/*`" = "$i/lost+found/*" ] ||
			echo "There is something in $i/lost+found.\nLook at it!" |
				mail -s 'File(s) in /lost+found' root
	done
--
Walter Mecky [ walter@mecky.uucp or ...uunet!unido!mecky!walter ]
Dan_Jacobson@ATT.COM (08/03/90)
dhesi%cirrusl@oliveb.ATC.olivetti.com (Rahul Dhesi) writes:

>	cd /lost+found; find . -mtime +14 -exec rm -rf {} \;
>If the cd fails, no damage is done, because the rest of the line is not
>executed.  Any sensible shell ought to let at least this work, even if
>it doesn't abort the entire script.

Saying that there should be a special case just for the cd command,
and just for the rest of this line, is ripping up the whole uniformity
and generality of the shell [/bin/sh family of shells assumed].  If
you want a failed cd to kill the script, then do "set -e" or "cd dir
|| exit 1".  For just skipping the rest of the line: "cd dir && bla
bla bla".

[I'm speaking from a general UNIX view, and don't even read the i386
newsgroup; Followup-To: comp.unix.wizards]
-- 
Dan_Jacobson@ATT.COM  +1-708-979-6364
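[The three idioms can be checked against a directory that is assumed
not to exist; the path below is hypothetical:]

```shell
#!/bin/sh
# Exercise the idioms above against a (hypothetically) missing directory.
bad=/no/such/dir

# "cd dir || exit 1": the subshell script stops before the echo.
r1=`sh -c "cd $bad 2>/dev/null || exit 1; echo reached"`

# "set -e": any command failure, including the cd, kills the script.
r2=`sh -c "set -e; cd $bad 2>/dev/null; echo reached"`

# "cd dir && ...": only the rest of that line is skipped.
r3=`sh -c "cd $bad 2>/dev/null && echo skipped; echo next"`

echo "r1=[$r1] r2=[$r2] r3=[$r3]"	# prints: r1=[] r2=[] r3=[next]
```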
daveh@marob.masa.com (Dave Hammond) (08/04/90)
In article <2108@cirrusl.UUCP> dhesi%cirrusl@oliveb.ATC.olivetti.com writes:
>>	cd /lost+found
>>	find . -mtime +14 -exec rm -rf {} \;
>>If there's no lost+found directory in the root file system, this deletes
>>everything on the system that's older than 14 days.
>
>The last time I looked, it was an undocumented feature in sh and csh
>(and probably in ksh though I didn't check) that a cd that failed would
>abort the rest of the script.

The /bin/sh in both Xenix 386 and Altos Unix V/386 only aborts the
script on a failed cd if invoked as `sh script'.  If the script has
been made executable and is invoked as simply `script', then sh does
not abort on a failed cd:

	Script started [typescript] at Fri Aug  3 17:27:24 1990
	daveh$ cat >foo
	cd /fred/ethel/wilma ; who
	daveh$ sh foo
	foo: /fred/ethel/wilma: bad directory
	daveh$ chmod +x foo
	daveh$ ./foo
	./foo: /fred/ethel/wilma: not found
	daveh    tty5E        Aug  3 17:27
	clifford tty02        Aug  2 00:21
	daveh$
	Script ended [typescript] at Fri Aug  3 17:28:04 1990

BTW, I just checked the action taken when /bin/sh sources the script
(as in `. ./foo') -- there also, the script is not aborted on cd
failure.

--
Dave Hammond
daveh@marob.masa.com
uunet!masa.com!marob!daveh
guy@auspex.auspex.com (Guy Harris) (08/05/90)
>If you want a failed cd to kill the script, then do...
If you want a failed "cd" to kill the script, don't bother doing
anything. The SunOS 4.0.3 Bourne shell, based on the S5R3.1 one, will
kill the script if a "cd" fails; I checked the source code to the 4.3BSD
Bourne shell, based on the V7 one, and it appears as if it'll do the
same.
Given that, and given that, as far as I know, neither Sun nor Berkeley
introduced this feature, it's probably in most if not all UNIX Bourne
shells, going back at least as far as V7 (it existed, at least within
Bell Labs, before V7 came out; I can't speak for those versions).