[fa.info-vax] Filesystem brain-damage in VMS V4...

info-vax@ucbvax.ARPA (02/16/85)

From: engvax!KVC@cit-vax

Here's a horror story I think ought to be related.  If anyone can
explain exactly what's going on here, I'd sure appreciate it...
(Andy Goldstein, are you still out there????)

I've been running VMS V4 for a while, but last weekend we finally decided
to put up V4 on our production system (you know, the one with all the
whiners on it).

Anyway, we have this here relational database system called ORACLE
which likes to have very large files in which it keeps all that
data.  There are several ORACLE databases out there which are of
varying lengths, between 30,000 and 80,000 blocks each.   Once we
got VMS V4 up, we released the WRITE protect on our user disks and
went on with the rest of the installation...  The user disks are
a bound volume set of 4 SI Eagles.  (Did I hear you say "foreign
disk device driver"?  I really don't think this is the device driver.)

The next day when we were prepared to bring up the new ORACLE for
VMS V4, we noticed that 2 of the ORACLE database files were now
of size 0/0.  These files used to be 40,000 and 60,000 blocks. Panic!
In addition, the file creation date said "<none specified>".

When I did a DUMP/HEADER on the files, I noticed two things of interest.
First of all, the file had an enormous number of retrieval pointers to
little 5 and 10 block chunks (cluster size of the disk is 5) and
it had lots of extents (on the order of 10 or more).  The second thing
of interest was that since all the pointers were there, I could probably
recover the data...  (by the way, these files are simple, sequential,
fixed length (512 byte record), no-carriage-return files)
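
To illustrate why intact retrieval pointers suggest the data is still
recoverable, here is a minimal Python sketch of how a list of pointers
maps a file's virtual block numbers onto disk blocks.  The extent values
below are hypothetical, made up for illustration (they are not taken from
the damaged files), and the function names are my own.  It assumes each
pointer is a (block count, starting logical block) pair and that virtual
blocks are numbered from 1.

```python
# Hypothetical retrieval pointers as (block_count, starting_lbn) pairs.
# These values are invented for illustration only.
EXTENTS = [(5, 1000), (10, 2500), (5, 7340)]
CLUSTER_SIZE = 5  # cluster size of the disk, as in the post


def allocated_blocks(extents):
    """Total blocks described by the retrieval pointers."""
    return sum(count for count, _ in extents)


def all_cluster_multiples(extents, cluster_size=CLUSTER_SIZE):
    """True if every extent length is a whole number of clusters."""
    return all(count % cluster_size == 0 for count, _ in extents)


def vbn_to_lbn(extents, vbn):
    """Map a 1-based virtual block number to a logical block number."""
    if vbn < 1:
        raise ValueError("virtual block numbers start at 1")
    offset = vbn - 1
    for count, start_lbn in extents:
        if offset < count:
            return start_lbn + offset
        offset -= count
    raise ValueError("virtual block is beyond the allocated space")


if __name__ == "__main__":
    print(allocated_blocks(EXTENTS))
    print(vbn_to_lbn(EXTENTS, 6))
```

The point of the sketch is only that the mapping depends on the pointers,
not on the end-of-file mark, so a header with a bogus EOF can still
describe where every block of the file lives.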

Apparently, something in the V4 filesystem has a little problem with
fragmented files with lots of extents.  (I agree that the file should
not have been so fragmented, but that's not the issue here...).
The end of file block for both files seems to have been moved from
40,000 (or 60,000) to 0.

I then tried using SET FILE/END_OF_FILE to move the end of file of the
file to the end of the allocated space for the file.  This looked like
it worked, but immediately after I touched the file, it went BACK to
0!  How annoying...  In any case, every time I tried to copy the file,
I'd get a zero block file as the copy.  I have a program which lets me
perform most of the ACP QIO functions on files, and I tried poking the
end of file to where I thought it should be, and got the same results I
had with SET FILE/END_OF_FILE.

Eventually, someone noticed that if I poked the end of file to where it
should be, and then opened the file and did a directory WHILE THE FILE
WAS OPEN, it showed up with the correct file size and the creation date
and owner fields intact.  This was really weird!  The work-around to
copy the file to another disk (where it would not be so fragmented) goes
as follows.  Assume you have a file called OOPS.DAT with this problem:

	$ fileset/attr=(eof_block:40001, eof_byte:0) OOPS.DAT
	$ open zeep OOPS.DAT
	$ copy OOPS.DAT empty_disk:OOPS.DAT
	$ close zeep

FILESET is my own toy, but you SHOULD get the same results with
SET FILE/END_OF_FILE OOPS.DAT.
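
For anyone wondering where eof_block:40001 and eof_byte:0 come from, the
arithmetic can be sketched in a few lines of Python.  This assumes the
usual convention that the EOF block is the 1-based number of the block
holding the first free byte, so a file whose 40,000 blocks are all full
has its EOF at block 40001, byte 0.  The function name is hypothetical.

```python
BLOCK_SIZE = 512  # bytes per disk block


def eof_position(data_bytes):
    """Return (eof_block, eof_byte) for a file holding data_bytes of data.

    eof_block is the 1-based block containing the first free byte;
    eof_byte is the offset of that byte within the block.
    """
    if data_bytes < 0:
        raise ValueError("data_bytes must be non-negative")
    return data_bytes // BLOCK_SIZE + 1, data_bytes % BLOCK_SIZE


if __name__ == "__main__":
    # A file of 40,000 completely full blocks, as in the example above.
    print(eof_position(40000 * BLOCK_SIZE))
```

For the 60,000 block file, the same calculation gives block 60001,
byte 0.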

Now, it appears that once the EOF is restored, the file header stays
intact for as long as the file is open.  After the file is closed (with
the CLOSE DCL command), it is munged again.

I did try creating a huge file on the same fragmented disk to make sure
the same problem occurs with any random fragmented file and not just
ORACLE databases.   Sure enough, it is reproducible.

Now, I'd like to SPR this to DEC, but there isn't any way I can get them
a copy of the file while it's damaged.  I suppose I could send
them an image backup of my user disks, but I think that's a last resort.
I really don't want to just mail a copy of all our data to DEC.

Can anyone shed some light on the situation?

	/Kevin Carosso       engvax!kvc @ CIT-VAX.ARPA
	 Hughes Aircraft Co.
ps.
  By the way, I did see the flash news item on the Software Information
  Network that described the new filesystem causing system crashes with
  large files with multiple extents (especially on bound volume sets).  I
  haven't seen that problem, but this one may be related.