[comp.unix.i386] Interactive 2.2 File zapper

smitty@essnj1.ESSNJAY.COM (Hibbard T. Smith JR) (07/25/90)

Within the past 2 weeks, we've upgraded several systems from 2.0.2 to 2.2.
On one of those systems, on Sunday morning at 05:17 or thereabouts most of
the files on the system were deleted.  The problem was caused by a root
crontab driven execution of /etc/cleanup.  This system's /lost+found 
directory was inadvertently lost during the upgrade installation, and we
were planning to recreate it on Monday morning.


The last two lines of the distributed /etc/cleanup are as follows:
--	cd /lost+found
--	find . -mtime +14 -exec rm -rf {} \;
If there's no lost and found directory in the root file system, this deletes
everything in the system that's older than 14 days. Two possible fixes exist:
-- cd /lost+found && find . -mtime +14 -exec rm -rf {} \;
-- find /lost+found -mtime +14 -exec rm -rf {} \;
Either of these is much safer than the distributed code.  This bad code is 
different from 2.0.2, so beware!
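
The difference between the distributed code and the guarded version is easy to
demonstrate without risking a real filesystem.  A sketch, using a scratch tree
and capturing the find output instead of running rm (not the distributed
script):

```shell
#!/bin/sh
# Sketch: why "cd dir" on one line followed by "find" on the next is
# dangerous, and why "cd dir && find" is safe.
scratch=`mktemp -d`
mkdir -p "$scratch/important"
touch "$scratch/important/data"
cd "$scratch" || exit 1

# Unsafe pattern: when the cd fails, find runs in the CURRENT
# directory -- here it sees a file it was never meant to touch.
unsafe=`cd no_such_dir 2>/dev/null; find . -name data -print`

# Safe pattern: && short-circuits, so find never runs at all.
safe=`cd no_such_dir 2>/dev/null && find . -name data -print`

echo "unsafe saw: $unsafe"
echo "safe saw:   ${safe:-nothing}"
rm -rf "$scratch"
```

With `rm -rf` in place of `-print`, the unsafe variant is exactly the 05:17
disaster in miniature.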

I hope this saves someone the grief of starting over, or worse yet, losing
a whole system when you're not prepared to rebuild it.

-- 
		Smitty
-------------------------------------------
Hibbard T. Smith JR                 smitty@essnj1.ESSNJAY.COM	
ESSNJAY Systems Inc.                uunet!hsi!essnj1!smitty

cpcahil@virtech.uucp (Conor P. Cahill) (07/26/90)

In article <772@essnj1.ESSNJAY.COM> smitty@essnj1.ESSNJAY.COM (Hibbard T. Smith JR) writes:
>The last two lines of the distributed /etc/cleanup are as follows:
>--	cd /lost+found
>--	find . -mtime +14 -exec rm -rf {} \;
>If there's no lost and found directory in the root file system, this deletes
>everything in the system that's older than 14 days. Two possible fixes exist:

This is not a problem if the shell that executes these lines is the Bourne
shell, since the Bourne shell exits a shell procedure (non-interactive) when
a cd fails.

So, to get the problem you must have changed root's login shell - something
that is not recommended because it can cause this kind of problem.

Now, beside that point, you are right: that is bad coding and should be fixed
with something similar to your suggestions.

-- 
Conor P. Cahill            (703)430-9247        Virtual Technologies, Inc.,
uunet!virtech!cpcahil                           46030 Manekin Plaza, Suite 160
                                                Sterling, VA 22170 

jrh@mustang.dell.com (James Howard) (07/26/90)

In article <772@essnj1.ESSNJAY.COM>, smitty@essnj1.ESSNJAY.COM (Hibbard
T. Smith JR) writes:
> Within the past 2 weeks, we've upgraded several systems from 2.0.2 to 2.2.
> On one of those systems, on Sunday morning at 05:17 or thereabouts most of
> the files on the system were deleted.  The problem was caused by a root
> crontab driven execution of /etc/cleanup.  This system's /lost+found 
> directory was inadvertently lost during the upgrade installation, and we
> were planning to recreate it on Monday morning.
> 
> 
> The last two lines of the distributed /etc/cleanup are as follows:
> --	cd /lost+found
> --	find . -mtime +14 -exec rm -rf {} \;
> If there's no lost and found directory in the root file system, this deletes
> everything in the system that's older than 14 days. Two possible fixes exist:
> -- cd /lost+found && find . -mtime +14 -exec rm -rf {} \;
> -- find /lost+found -mtime +14 -exec rm -rf {} \;
> Either of these is much safer than the distributed code.  This bad code is 
> different than 2.0.2, so beware!

Well, it looks like ISC tried to fix a bug that was in 2.0.2, and created an
even bigger bug.  The second fix you list above has the subtle bug that was
present in 2.0.2.  The fix that we put in our release looks like this:

touch /lost+found
find /lost+found -mtime +14 -exec rm -rf {} \; >/dev/null 2>&1

without the first, /lost+found will get deleted if it hasn't been modified
in 14 days.  
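
The effect of the touch can be seen in miniature with a backdated scratch
directory standing in for a long-unmodified /lost+found (a sketch, not the
release script):

```shell
#!/bin/sh
# Sketch: how "touch" shields /lost+found itself from "find -mtime +14".
tmp=`mktemp -d`
lf="$tmp/lost+found"
mkdir "$lf"
touch -t 199001010000 "$lf"        # pretend it was last modified in 1990

before=`find "$lf" -mtime +14 -print`   # matches: rm -rf would kill it
touch "$lf"                             # reset mtime to "now"
after=`find "$lf" -mtime +14 -print`    # no longer matches

echo "before touch: ${before:-nothing}"
echo "after touch:  ${after:-nothing}"
rm -rf "$tmp"
```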


James Howard        Dell Computer Corp.        !'s:uunet!dell!mustang!jrh
(512) 343-3480      9505 Arboretum Blvd        @'s:jrh@mustang.dell.com
                    Austin, TX 78759-7299   

shwake@raysnec.UUCP (Ray Shwake) (07/27/90)

jrh@mustang.dell.com (James Howard) writes:

>touch /lost+found
>find /lost+found -mtime +14 -exec rm -rf {} \; >/dev/null 2>&1

	... except that touch will create a FILE if the entity does not
	already exist. Better to do something like:

	if [ -d /lost+found ]; then
		find ....
	fi

drector@orion.oac.uci.edu (David Rector) (07/30/90)

In <11@raysnec.UUCP> shwake@raysnec.UUCP (Ray Shwake) writes:

>jrh@mustang.dell.com (James Howard) writes:

>>touch /lost+found
>>find /lost+found -mtime +14 -exec rm -rf {} \; >/dev/null 2>&1

>	... except that touch will create a FILE if the entity does not
>	already exist. Better to do something like:

>	if [ -d /lost+found ]; then
>		find ....
>	fi

Sorry, this doesn't work either.  It has the same bug as 2.0.2; it will
delete lost+found.  Howard's fix will work if lost+found exists.  If
you want to be absolutely sure use something like

        if [ -d /lost+found ]; then
                touch /lost+found
                find ...
        else
                mkdir /lost+found
        fi

This, of course, would also fail if /lost+found exists as a file.
The pedantic may decorate the above accordingly.
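
One such decoration, made runnable against a scratch path (an illustrative
sketch; the real script would use /lost+found itself and run from cron):

```shell
#!/bin/sh
# Sketch of the test-or-create logic above, pointed at a scratch path.
# Handles the directory, missing, and plain-file cases.
root=`mktemp -d`                  # stand-in for /
lf="$root/lost+found"

if [ -d "$lf" ]; then
        touch "$lf"               # shield the directory itself
        find "$lf" -mtime +14 -exec rm -rf {} \;
elif [ -f "$lf" ]; then
        echo "$lf exists but is a plain file; not touching it" >&2
else
        mkdir "$lf"
fi

test -d "$lf" && result="a directory" || result="missing"
echo "lost+found is now: $result"
rm -rf "$root"
```

On a fresh scratch tree the else branch runs and the directory is created;
on later runs the first branch does the guarded cleanup.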

David Rector                         drector@orion.uci.edu
Dept. of Math.                       U. C. Irvine, Irvine CA 92717

-- 
David L. Rector

dhesi%cirrusl@oliveb.ATC.olivetti.com (Rahul Dhesi) (08/02/90)

>--	cd /lost+found
>--	find . -mtime +14 -exec rm -rf {} \;
>If there's no lost and found directory in the root file system, this deletes
>everything in the system that's older than 14 days.

The last time I looked, it was an undocumented feature in sh and csh
(and probably in ksh though I didn't check) that a cd that failed would
abort the rest of the script.  In fact, sh and csh (but not ksh) went a
bit too far, and the statement

     cd dir || exit 1

would never execute the exit 1.

It looks like the sh you are using has had this undocumented feature
removed, resulting in disaster.

Standard practice in cleanup scripts is to do a cd followed by
something else on the same line:

     cd /lost+found; find . -mtime +14 -exec rm -rf {} \;

If the cd fails, no damage is done, because the rest of the line is not
executed.  Any sensible shell ought to let at least this work, even if
it doesn't abort the entire script.
--
Rahul Dhesi <dhesi%cirrusl@oliveb.ATC.olivetti.com>
UUCP:  oliveb!cirrusl!dhesi

mpl@pegasus.ATT.COM (Michael P. Lindner) (08/02/90)

In article <2108@cirrusl.UUCP> dhesi%cirrusl@oliveb.ATC.olivetti.com (Rahul Dhesi) writes:
	deleted
>The last time I looked, it was an undocumented feature in sh and csh
>(and probably in ksh though I didn't check) that a cd that failed would
>abort the rest of the script.  In fact, sh and csh (but not ksh) went a
	deleted
>--
>Rahul Dhesi <dhesi%cirrusl@oliveb.ATC.olivetti.com>
>UUCP:  oliveb!cirrusl!dhesi

I don't know of any undocumented feature wrt. "cd", but for safety's sake,
all my shell scripts start with the line

	set -e

which says "exit on error".  Anyplace where I expect a command to fail
but it's OK to go on, I put either

	# do something special if the command fails
	if command
	then
		:
	else
		echo >&2 "command failed -- exit code $?"
	fi

	# or

	# ignore the code - useful for those commands which
	# don't return a meaningful exit code
	command || :

	# or

	# ignore the failure - useful for things like
	mkdir -p $dir 2> /dev/null || :
	# or
	mv -f $files 2> /dev/null || :

Mike Lindner
AT&T Bell Labs
attmail!mplindner

walter@mecky.UUCP (Walter Mecky) (08/03/90)

In article <2108@cirrusl.UUCP> dhesi%cirrusl@oliveb.ATC.olivetti.com (Rahul Dhesi) writes:
< >--	cd /lost+found
< >--	find . -mtime +14 -exec rm -rf {} \;
< >If there's no lost and found directory in the root file system, this deletes
< >everything in the system that's older than 14 days.

Guys, you have talked about many aspects of the problem and missed the
most important one. It was discussed here in November last year:

If fsck links a file into /lost+found, its mtime is left unchanged.
The same is true for all the files in a directory tree when fsck links
in a directory. So you MUST NOT use the mtime to decide whether to
delete files in /lost+found, because find will then delete recovered
files you simply had not changed in the 14 days before they were lost,
even if fsck put them in /lost+found only yesterday. The idea behind
the "find ..." seemed to be: delete the files and directory trees that
have been in /lost+found for longer than 14 days.

Some solutions were posted in the November discussion; I don't remember
them and don't trust any of them. My /etc/cleanup only produces mail
for user root and deletes no files:

   for i in `/etc/mount | cut -d' ' -f1`
   do
	 [ "`echo $i/lost+found/*`" = "$i/lost+found/*" ] || 
		   echo "There is something in $i/lost+found.\nLook at it!" | 
		   mail -s 'File(s) in /lost+found' root
   done
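
The glob comparison in that loop is the interesting part: the shell leaves an
unmatched glob unexpanded, so comparing the expansion against the literal
pattern detects an empty directory. A standalone sketch of the trick, using a
scratch directory instead of the mounted filesystems and `set --` in place of
the backquoted echo, to the same effect:

```shell
#!/bin/sh
# Sketch of the "anything in lost+found?" glob trick: an unmatched
# glob stays literal, so the expansion equals the pattern string
# exactly when the directory is empty.
lf=`mktemp -d`

set -- $lf/*                        # expand the glob into $1
if [ "$1" = "$lf/*" ]; then empty_before=yes; else empty_before=no; fi

touch "$lf/recovered.0042"          # arbitrary name; fsck would link files here
set -- $lf/*
if [ "$1" = "$lf/*" ]; then empty_after=yes; else empty_after=no; fi

echo "empty before: $empty_before, after: $empty_after"
rm -rf "$lf"
```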
-- 
Walter Mecky	[ walter@mecky.uucp	or  ...uunet!unido!mecky!walter ]

Dan_Jacobson@ATT.COM (08/03/90)

dhesi%cirrusl@oliveb.ATC.olivetti.com (Rahul Dhesi) writes:
>     cd /lost+found; find . -mtime +14 -exec rm -rf {} \;
>If the cd fails, no damage is done, because the rest of the line is not
>executed.  Any sensible shell ought to let at least this work, even if
>it doesn't abort the entire script.

Saying that there should be a special case just for the cd command, and
just for the rest of this line is ripping up the whole uniformity and
generality of the shell [/bin/sh family of shells assumed].  If you want
a failed cd to kill the script, then do "set -e" or "cd dir || exit 1".
For just missing the rest of the line: "cd dir && bla bla bla".
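
Both idioms are easy to check in any /bin/sh-family shell; a sketch using
subshells so the failures stay contained:

```shell
#!/bin/sh
# Sketch: the "cd dir || exit 1" and "set -e" idioms, exercised in
# subshells against a directory that does not exist.
tmp=`mktemp -d`

( cd "$tmp/missing" 2>/dev/null || exit 1; echo "never reached" )
guarded=$?

( set -e; cd "$tmp/missing" 2>/dev/null; echo "never reached" )
strict=$?

echo "guarded exit: $guarded, set -e exit: $strict"
rm -rf "$tmp"
```

In both cases the "never reached" line is skipped and the subshell exits
nonzero instead of blundering on.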

[I'm speaking from a general UNIX view, and don't even read the i386
newsgroup, Followup-To: comp.unix.wizards]
-- 
Dan_Jacobson@ATT.COM +1-708-979-6364

daveh@marob.masa.com (Dave Hammond) (08/04/90)

In article <2108@cirrusl.UUCP> dhesi%cirrusl@oliveb.ATC.olivetti.com writes:
>>--	cd /lost+found
>>--	find . -mtime +14 -exec rm -rf {} \;
>>If there's no lost and found directory in the root file system, this deletes
>>everything in the system that's older than 14 days.
>
>The last time I looked, it was an undocumented feature in sh and csh
>(and probably in ksh though I didn't check) that a cd that failed would
>abort the rest of the script.

The /bin/sh in both Xenix 386 and Altos Unix V/386 only aborts the
script on a failed cd if it is invoked as `sh script'.  If the script
has been made executable, and is invoked as simply `script', then sh
does not abort on a failed cd:

Script started [typescript] at Fri Aug  3 17:27:24 1990
daveh$ cat >foo
cd /fred/ethel/wilma ; who
daveh$ sh foo
foo: /fred/ethel/wilma: bad directory
daveh$ chmod +x foo
daveh$ ./foo
./foo: /fred/ethel/wilma:  not found
daveh      tty5E        Aug  3 17:27
clifford   tty02        Aug  2 00:21
daveh$ 
Script ended [typescript] at Fri Aug  3 17:28:04 1990

BTW, I just checked the action taken when /bin/sh sources (as in
`. ./foo') the script -- there also, the script is not aborted on cd
failure.

--
Dave Hammond
daveh@marob.masa.com
uunet!masa.com!marob!daveh

guy@auspex.auspex.com (Guy Harris) (08/05/90)

>If you want a failed cd to kill the script, then do...

If you want a failed "cd" to kill the script, don't bother doing
anything.  The SunOS 4.0.3 Bourne shell, based on the S5R3.1 one, will
kill the script if a "cd" fails; I checked the source code to the 4.3BSD
Bourne shell, based on the V7 one, and it appears as if it'll do the
same.

Given that, and given that, as far as I know, neither Sun nor Berkeley
introduced this feature, it's probably in most if not all UNIX Bourne
shells, going back at least as far as V7 (it existed, at least within
Bell Labs, before V7 came out; I can't speak for those versions).