[comp.unix.aix] AIX 3.1 File system mystery

wjones@nwnexus.WA.COM (Warren Jones) (05/18/91)

I've observed something very mysterious in our RS/6000 file system:
"ls -l" shows a file of ~24 Mbytes, but "du" shows the directory
using only ~17 Mbytes.  Can anyone out there offer an explanation?
The following script file tells the whole story:

|   Script command is started on Fri May 17 14:37:51 1991
| 
|   [58] ls -l /usr/tmp
|   total 17372
|   -rw-r--r--   1 sharp    macro     222048 May 17 14:34 CpGUmYQAAA
|   -rw-r--r--   1 sharp    macro     300024 May 17 14:34 CpGUmYQAAB
|   -rw-r--r--   1 sharp    macro      19052 May 17 14:35 CpGUmYQAAC
|   -rw-r--r--   1 sharp    macro          0 May 17 14:34 CpGUmYQAAD
|   -rw-r--r--   1 sharp    macro    24480468 May 17 14:36 CpGUmYQAAE !!!
|   drwxr-xr-x   2 root     system       512 Mar 09 14:06 X11
|   -rw-r--r--   1 jones    support        0 May 17 14:37 typescript
|   [59] du -s /usr/tmp
|   17376   /usr/tmp      !!!

The files "Cp*" are scratch files created by a Fortran application,
which was still running when this typescript was made.  The files
are presumably still open.

To compound the mystery:  Before the 24 Mbyte file was created
in /usr/tmp, "df" showed only ~19 Mbytes available on the /usr
partition.  Where did the extra ~7 Mbytes go?  Has IBM invented
a hyperspace extension to the JFS?

Oh, by the way, we're running AIX 3.1 (3003 update).
Thanks in advance for any enlightening comments.

Warren Jones
wjones@nwnexus.wa.com

sfreed@ariel.unm.edu (Steven Freed CIRT) (05/19/91)

In article <509@nwnexus.WA.COM>, wjones@nwnexus.WA.COM (Warren Jones) writes:

> I've observed something very mysterious in our RS/6000 file system:
> "ls -l" shows a file of ~24 Mbytes, but "du" shows the directory
> using only ~17 Mbytes.

Probably a hole in the file. When I was in school we used to drive the
sys admins crazy with this (some weren't too bright). We had something
like a 1 meg quota (yeah, that quota topic again ;-) and we would write
a program that would write 8k, do an lseek forward about 500 megs, and
write another 8k. They would come after us trying to figure out how we
broke the quota system, never stopping to think that the partition we
were on was only 200 megs.
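For anyone who wants to see the trick for themselves, here is a minimal
sketch in modern Python (the path is made up, and how many blocks the hole
actually consumes depends on the filesystem):

```python
import os

# Write 8k, seek far ahead without writing, write another 8k.
# Only the two written regions need disk blocks; the hole does not.
path = "/tmp/hole_demo"  # hypothetical scratch file
with open(path, "wb") as f:
    f.write(b"\x01" * 8192)                # first 8k of real data
    f.seek(500 * 1024 * 1024, os.SEEK_CUR) # lseek forward ~500 megs
    f.write(b"\x01" * 8192)                # another 8k of real data

st = os.stat(path)
print("apparent size:", st.st_size)          # what ls -l reports
print("blocks used:  ", st.st_blocks * 512)  # what du counts; far
                                             # smaller on a sparse-
                                             # capable filesystem
```

On a filesystem with holes, the "apparent size" will dwarf the allocated
blocks, which is exactly how a file can blow past a quota (or a partition)
on paper only.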

Database files are usually the most common kind of file with holes.

-- 

Steve.                    sfreed@ariel.unm.edu

scott@prism.gatech.EDU (Scott Holt) (05/20/91)

In article <509@nwnexus.WA.COM> wjones@nwnexus.WA.COM (Warren Jones) writes:
>I've observed something very mysterious in our RS/6000 file system:
>"ls -l" shows a file of ~24 Mbytes, but "du" shows the directory
>using only ~17 Mbytes.  Can anyone out there offer an explanation?
>The following script file tells the whole story:
> ....

The file may be "sparse" - on some UNIX file system implementations, if an
entire block of the file contains only zeros, the corresponding block pointer
in the inode may be set to zero rather than to the location of a disk block
containing data. The idea is: why allocate disk space to something you know
contains only zeros? This is typical of database files (esp. those that use
mdbm) and other applications that write randomly to a file. It is also not
a property unique to AIX.
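The ls/du discrepancy in the original post comes straight from stat(2):
ls -l reports st_size (the apparent size), while du counts allocated
blocks. A short Python illustration (path and sizes are hypothetical):

```python
import os

def apparent_vs_allocated(path):
    """Return (apparent size, bytes actually allocated) for a file.

    ls -l shows st_size; du counts blocks, so for a sparse file
    the du figure comes out smaller, as in the original post.
    """
    st = os.stat(path)
    return st.st_size, st.st_blocks * 512  # st_blocks is in 512-byte units

# Make a small sparse file to demonstrate.
demo = "/tmp/sparse_demo"  # hypothetical path
with open(demo, "wb") as f:
    f.seek(1024 * 1024)    # leave a 1 Mbyte hole
    f.write(b"x")          # one byte of real data at the end

size, allocated = apparent_vs_allocated(demo)
print("apparent:", size, "allocated:", allocated)
```

On a sparse-capable filesystem the allocated figure will be a handful of
blocks despite the ~1 Mbyte apparent size.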

A word of warning about such files - backup programs love them - and I do
mean that sarcastically. When you back the file up, a typical backup
program will read it sequentially, and when that is done it doesn't matter
that a block contains all zeros. It is entirely possible for a file to
"appear" larger than the total amount of space on your disk. When such a
file is backed up, it will take up its apparent size on the backup media.
Worse yet, when it is restored, the restore program may not "sparsify" the
file - that is, it will try to restore it to its full apparent size, and
then you have real problems.

I don't know how AIX backup and restore deal with this (any comments from
IBM?), but most other backup schemes (such as tar and cpio) deal with it
in a naive manner. This too is not unique to IBM.
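A restore program that wanted to "sparsify" could check each block for
zeros and seek instead of writing. A minimal sketch of that idea in Python
(this is just an illustration of the technique, not how AIX backup or any
real tar actually behaves):

```python
import os

BLOCK = 8192

def copy_sparsely(src, dst):
    """Copy src to dst, turning all-zero blocks back into holes.

    A zero-filled block is skipped with seek() rather than written,
    so the destination ends up sparse where the source data allows.
    """
    zero = b"\x00" * BLOCK
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            chunk = fin.read(BLOCK)
            if not chunk:
                break
            if chunk == zero:
                fout.seek(len(chunk), os.SEEK_CUR)  # leave a hole
            else:
                fout.write(chunk)
        fout.truncate()  # give dst its full apparent size, even if
                         # the file ends in a hole
```

The final truncate() matters: without it, a file ending in zeros would be
restored short, since seeking past the last write doesn't extend the file
by itself.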

- Scott
-- 
This is my signature. There are many like it, but this one is mine.
Scott Holt                 		Internet: scott@prism.gatech.edu
Georgia Tech 				UUCP: ..!gatech!prism!scott
Office of Information Technology, Technical Services

hbergh@nlicl1.oracle.com (Herbert van den Bergh) (05/27/91)

In article <1991May18.184923.28785@ariel.unm.edu>, sfreed@ariel.unm.edu (Steven Freed CIRT) writes:
|> Data base files are usually the most common type of file with holes.

	I know at least one RDBMS (guess which) that doesn't do that, and
	for a number of reasons: when updating your database you don't want
	the overhead of the filesystem finding free blocks, and more
	importantly, it may lead to file fragmentation, slowing down access
	to the file. So *REAL* databases ;-) won't use files with holes,
	but are more likely to use raw devices (even less overhead).

|> Steve.                    sfreed@ariel.unm.edu