info-vax@ucbvax.ARPA (02/16/85)
From: engvax!KVC@cit-vax

Here's a horror story I think ought to be related. If anyone can explain exactly what's going on here, I'd sure appreciate it... (Andy Goldstein, are you still out there????)

I've been running VMS V4 for a while, but last weekend we finally decided to put up V4 on our production system (you know, the one with all the whiners on it). Anyway, we have this here relational database system called ORACLE which likes to have very large files in which it keeps all that data. There are several ORACLE databases out there of varying lengths, between 30,000 and 80,000 blocks each.

Once we got VMS V4 up, we released the WRITE protect on our user disks and went on with the rest of the installation... The user disks are a bound volume set of 4 SI Eagles. (Did I hear you say "foreign disk device driver?" I really don't think this is the device driver.)

The next day, when we were prepared to bring up the new ORACLE for VMS V4, we noticed that 2 of the ORACLE database files were now of size 0/0. These files used to be 40,000 and 60,000 blocks. Panic! In addition, the file creation date said "<none specified>".

When I did a DUMP/HEADER on the files, I noticed two things of interest. First of all, each file had an enormous number of retrieval pointers to little 5 and 10 block chunks (cluster size of the disk is 5) and lots of extents (on the order of 10 or more). The second thing of interest was that since all the pointers were there, I could probably recover the data... (By the way, these files are simple, sequential, fixed length (512 byte record), no-carriage-return files.)

Apparently, something in the V4 filesystem has a little problem with fragmented files with lots of extents. (I agree that the file should not have been so fragmented, but that's not the issue here...) The end of file block for both files seems to have been moved from 40,000 (or 60,000) to 0.
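To put a rough number on the fragmentation described above, here is a minimal sketch of the arithmetic. It assumes, purely for illustration, that every retrieval pointer maps one contiguous chunk of a single fixed size; the real headers mixed 5 and 10 block chunks, so the true counts would fall somewhere between the two estimates. The function name `pointer_count` is hypothetical and not part of any VMS tool.

```python
def pointer_count(file_blocks: int, chunk_blocks: int) -> int:
    """Estimate how many retrieval pointers are needed to map a file.

    Assumption (not from the post): each pointer maps exactly one
    contiguous chunk of chunk_blocks blocks.
    """
    if file_blocks < 0 or chunk_blocks <= 0:
        raise ValueError("file_blocks must be >= 0 and chunk_blocks > 0")
    # Ceiling division: a partial final chunk still needs its own pointer.
    return -(-file_blocks // chunk_blocks)


if __name__ == "__main__":
    # File sizes and chunk sizes taken from the post.
    for size in (40_000, 60_000):
        for chunk in (5, 10):
            print(f"{size} blocks in {chunk}-block chunks: "
                  f"about {pointer_count(size, chunk)} pointers")
```

Under that assumption, each damaged file would carry several thousand retrieval pointers, which gives a sense of how far from contiguous these files were.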
I then tried using SET FILE/END_OF_FILE to move the end of file to the end of the allocated space for the file. This looked like it worked, but immediately after I touched the file, it went BACK to 0! How annoying... In any case, every time I tried to copy the file, I'd get a zero block file as the copy.

I have a program which lets me perform most of the ACP QIO functions on files, and I tried poking the end of file to where I thought it should be, and got the same results I had with SET FILE/END_OF_FILE. Eventually, someone noticed that if I poked the end of file to where it should be, and then opened the file and did a directory WHILE THE FILE WAS OPEN, it showed up with the correct file size and the creation date and owner fields intact. This was really weird!

The work-around to copy the file to another disk (where it would not be so fragmented) goes as follows. Assume you have a file called OOPS.DAT with this problem:

    $ fileset/attr=(eof_block:40001, eof_byte:0) OOPS.DAT
    $ open zeep OOPS.DAT
    $ copy OOPS.DAT empty_disk:OOPS.DAT
    $ close zeep

FILESET is my own toy, but you SHOULD get the same results with SET FILE/END_OF_FILE OOPS.DAT.

Now, it appears that after the EOF is restored, and for as long as the file is open, the file header is intact. After the file is closed (with the CLOSE DCL command) it is munged again.

I did try creating a huge file on the same fragmented disk to make sure the same problem occurs with any random fragmented file and not just ORACLE databases. Sure enough, it is reproducible.

Now, I'd like to SPR this to DEC, but there isn't any way I can get them a copy of the file while it's damaged. I suppose I could send them an image backup of my user disks, but I think that's a last resort. I really don't want to just mail a copy of all our data to DEC.

Can anyone shed some light on the situation?

/Kevin Carosso            engvax!kvc @ CIT-VAX.ARPA
Hughes Aircraft Co.

ps.
By the way, I did see the flash news item on the Software Information Network that described the new filesystem causing system crashes with large files with multiple extents (especially on bound volume sets). I haven't seen that problem, but this one may be related.
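
For anyone repeating the work-around, the eof_block and eof_byte values have to be worked out per file. The following is a minimal sketch of that arithmetic, assuming the end-of-file marker names the 1-based block holding the first unused byte, plus the offset of that byte within the block. That assumption is consistent with the values in the work-around (eof_block:40001, eof_byte:0 for a completely full 40,000 block file), but check it against your own DUMP/HEADER output. The function name `eof_position` is hypothetical.

```python
BLOCK_SIZE = 512  # bytes per disk block, matching the 512 byte records in the post


def eof_position(data_bytes: int) -> tuple[int, int]:
    """Return (eof_block, eof_byte) for a file holding data_bytes of data.

    Assumption: eof_block is the 1-based block that contains the first
    free byte, and eof_byte is the offset of that byte within the block.
    """
    if data_bytes < 0:
        raise ValueError("data_bytes must be non-negative")
    eof_block = data_bytes // BLOCK_SIZE + 1
    eof_byte = data_bytes % BLOCK_SIZE
    return eof_block, eof_byte


if __name__ == "__main__":
    # The two damaged files: every allocated block is full of data.
    for blocks in (40_000, 60_000):
        print(blocks, eof_position(blocks * BLOCK_SIZE))
```

For a file that fills its allocation exactly, this yields one block past the last data block with a byte offset of 0, which is the shape of the values passed to FILESET above.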