mark@ria-emh2.army.mil (Mark D. McKamey IM SA) (04/27/89)
Hello, I have just finished compiling fsanalyze, version 4.2, written by Mr. Michael Young. One of the output report fields says:

Sparse files = 0 (0.00%)

What is the definition of a "Sparse file" in the UNIX world?
--
Mark D. McKamey - mark@RIA-EMH2.ARMY.MIL
madd@bu-cs.BU.EDU (Jim Frost) (05/01/89)
In article <19342@adm.BRL.MIL> mark@ria-emh2.army.mil (Mark D. McKamey IM SA) writes:
| What is the definition of a "Sparse file" in the UNIX world?

UNIX stores data in files by maintaining pointers to data blocks. By allocating only those blocks which have actually been written to, you can create files which appear to be larger than they actually are. These are usually created by lseek()ing and write()ing.

When you create an empty file, the system allocates a file information block (called an inode) which contains a small list of block pointers. This list is initially blank. When we write into the file, the system gets data blocks and sets the appropriate block pointer to point to the block. When we just create a file we get something like this:

    ptr1 -> null
    ptr2 -> null
    ptr3 -> null

When we write to that file we get something like this:

    ptr1 -> block1
    ptr2 -> null
    ptr3 -> null

We deposit data into block1 until block1 is filled, then get another block and set ptr2 to point to it. If instead of just opening and writing the file you open, seek into the file somewhere, and then write, you can get something like:

    ptr1 -> null
    ptr2 -> block1
    ptr3 -> null

To the user it looks like he has a two-block file whose first block is all zeros (the system returns zeros for reads into null blocks), but to the system he has only a one-block file. This difference can add up to considerable savings in some cases. For the normal case, this behavior affects nothing.

jim frost
madd@bu-it.bu.edu
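A minimal C sketch of the lseek()/write() recipe described above, assuming a POSIX system. The function names (make_sparse_file, file_sizes, read_byte_at) are hypothetical. It also compares st_size (what the user sees) with st_blocks (what the system allocated); treating st_blocks as 512-byte units is the common convention but is an assumption, and whether the skipped region really stays unallocated depends on the file system.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Create PATH, seek OFFSET bytes into it without writing anything there,
   then write LEN bytes of DATA. The skipped region is the hole.
   Returns 0 on success, -1 on error. */
int make_sparse_file(const char *path, off_t offset, const char *data, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    if (lseek(fd, offset, SEEK_SET) == (off_t)-1 ||
        write(fd, data, len) != (ssize_t)len) {
        close(fd);
        return -1;
    }
    return close(fd);
}

/* Report the apparent size (st_size) and the space actually allocated.
   ASSUMPTION: st_blocks counts 512-byte units, which is common but not
   guaranteed by POSIX. Returns 0 on success, -1 on error. */
int file_sizes(const char *path, off_t *apparent, long long *allocated)
{
    struct stat st;
    if (stat(path, &st) != 0)
        return -1;
    *apparent = st.st_size;
    *allocated = (long long)st.st_blocks * 512;
    return 0;
}

/* Read the byte at OFF. Inside a hole the system returns 0.
   Returns the byte value (0-255), or -1 on error. */
int read_byte_at(const char *path, off_t off)
{
    unsigned char c;
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    ssize_t n = pread(fd, &c, 1, off);
    close(fd);
    return n == 1 ? c : -1;
}
```

Calling make_sparse_file("/tmp/sparse.dat", 1024 * 1024, "x", 1) gives a file whose apparent size is 1048577 bytes and whose first megabyte reads back as zeros. On a file system that supports holes, file_sizes() should then report only a block or so allocated; on one that does not, the allocated figure will be at least the apparent size. A file whose allocated space is smaller than its apparent size almost certainly contains a hole, which is plausibly the kind of test a tool like fsanalyze applies, though that is a guess about its method.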
scott@rdahp.UUCP (Scott Hammond) (05/02/89)
In article <30481@bu-cs.BU.EDU> madd@bu-it.bu.edu (Jim Frost) writes:
>In article <19342@adm.BRL.MIL> mark@ria-emh2.army.mil (Mark D. McKamey IM SA) writes:
>| What is the definition of a "Sparse file" in the UNIX world?
>
>[discussion on how it works]

I'm interested in knowing how much UNIX _application_ software (besides news, mailers, or pathalias) uses sparse files. In particular, given an underlying file system implementation which doesn't permit holes, are there many situations where a lot of space is going to be wasted by traditional attempts at creating sparse files?
--
Scott Hammond, R & D Associates, Marina del Rey, CA (213) 822-1715 : {ksuvax1,zardoz,randvax}!rdahp!scott : scott@harris.cis.ksu.edu
dg@lakart.UUCP (David Goodenough) (05/02/89)
madd@bu-cs.BU.EDU (Jim Frost) sez:
] mark@ria-emh2.army.mil (Mark D. McKamey IM SA) writes:
] | What is the definition of a "Sparse file" in the UNIX world?
]
] UNIX stores data in files by maintaining pointers to data blocks. By
] allocating only those blocks which have actually been written to, you
] can create files which appear to be larger than they actually are.
] These are usually created by lseek()ing and write()ing.
]
] When you create an empty file, the system allocates a file information
] block (called an inode) which contains a small list of block pointers.
] This list is initially blank. When we write into the file, the system
] gets data blocks and sets the appropriate block pointer to point to
] the block.
Hummm - this sounds a bit like CP/M - obviously UNIX stuffs a lot more info
into the inode than CP/M does into its directory slot, but the method of
a list of block numbers is exactly the same. Now for the $64000 question:
what does UNIX do when it runs out of block number slots in the inode. I
doubt it's the same as CP/M (which just allocates a second directory entry
for the file, and sets a flag to show this is an extension). So how does
UNIX handle very big files?
--
dg@lakart.UUCP - David Goodenough +---+
IHS | +-+-+
....... !harvard!xait!lakart!dg +-+-+ |
AKA: dg%lakart.uucp@xait.xerox.com +---+
guy@auspex.auspex.com (Guy Harris) (05/04/89)
>Hummm - this sounds a bit like CP/M - obviously UNIX stuffs a lot more info
>into the inode than CP/M does into its directory slot, but the method of
>a list of block numbers is exactly the same. Now for the $64000 question:
>what does UNIX do when it runs out of block number slots in the inode. I
>doubt it's the same as CP/M (which just allocates a second directory entry
>for the file, and sets a flag to show this is an extension). So how does
>UNIX handle very big files?

In the most common UNIX file systems, namely the V7 one (used by 4.1BSD, System III, and System V, as well as V7 itself, and derivatives of the aforementioned systems) and the 4.2BSD one, the first N (N == 10 for V7, N == 12 for 4.2BSD) slots are "direct" pointers; they contain the block number of a block in the file (the zeroth one contains the block number of the zeroth block, etc.).

The N+1st one is an "indirect" pointer; it contains the block number of a block full of block numbers in the file, starting with the N+1st block of the file. The N+2nd one is a "doubly-indirect" pointer; it contains the block number of a block full of block numbers of indirect blocks, and the N+3rd one is a "triply-indirect" pointer, containing just what you think it would.

A block in the V7 file system is usually either 512 or 1024 bytes, and a block number is 4 bytes; a block in the 4.2BSD file system is typically 4K or 8K. Some versions of both file systems can have even bigger blocks.

Thus, if a file runs out of block number slots in the inode, given that the last slot maps a block full of blocks that map blocks full of..., your file has gotten pretty big; the V7 and 4.2BSD file systems just say "enough is enough" and don't let your file get any bigger.
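The arithmetic behind "pretty big" can be sketched in C. This is a simplified model of the layout described above: N direct slots plus one single-, one double-, and one triple-indirect slot, with each indirect block holding block_size / ptr_size block numbers. The function names are hypothetical, 4-byte block numbers are assumed for 4.2BSD as well, and details such as 4.2BSD fragments are ignored.

```c
#include <stdint.h>

/* Largest number of data blocks the pointer tree can address:
   NDIRECT direct blocks + P + P^2 + P^3, where P is the number of
   block numbers that fit in one indirect block. */
uint64_t max_file_blocks(unsigned ndirect, unsigned block_size, unsigned ptr_size)
{
    uint64_t p = block_size / ptr_size;   /* pointers per indirect block */
    return ndirect + p + p * p + p * p * p;
}

/* Which inode slot maps logical block LBN:
   0 = direct, 1 = single-, 2 = double-, 3 = triple-indirect,
   -1 = beyond what the tree can map ("enough is enough"). */
int indirection_level(uint64_t lbn, unsigned ndirect, unsigned block_size,
                      unsigned ptr_size)
{
    uint64_t p = block_size / ptr_size;
    uint64_t span = 1;   /* data blocks covered by the slot at this level */

    if (lbn < ndirect)
        return 0;
    lbn -= ndirect;
    for (int level = 1; level <= 3; level++) {
        span *= p;       /* P, then P^2, then P^3 blocks */
        if (lbn < span)
            return level;
        lbn -= span;
    }
    return -1;
}
```

With 1024-byte blocks, max_file_blocks(10, 1024, 4) is 16,843,018 blocks, about 16 GB; with 8K blocks, max_file_blocks(12, 8192, 4) is 8,594,130,956 blocks, about 64 TB. These are limits of the pointer tree alone; on systems of this era, where file offsets were typically 32-bit signed longs, the 2 GB offset range would be reached long before either figure.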
madd@bu-cs.BU.EDU (Jim Frost) (05/08/89)
In article <228@rdahp.UUCP> scott@rdahp.UUCP (Scott Hammond) writes:
|I'm interested in knowing how much UNIX _application_ software (besides
|news, mailers, or pathalias) uses sparse files. In particular, given an
|underlying file system implementation which doesn't permit holes, are
|there many situations where a lot of space is going to be wasted by
|traditional attempts at creating sparse files?

Not generally, although some common database techniques can create sparse files (the dbm and ndbm packages use techniques which commonly make sparse files, and dbm/ndbm are often used by applications because they're already there). Unless you're dealing with a large database application, it's unlikely to be a problem, and many databases avoid sparse files.

In answer to your question, there is not very much application software which uses sparse files. Be careful, though -- it only takes one such application to make your life difficult.

jim frost
madd@bu-it.bu.edu
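To put rough numbers on Scott's question, here is a hypothetical C helper (block_usage is an illustrative name, not part of dbm, ndbm, or any library). It takes a set of fixed-length writes at scattered offsets, the sort of pattern a hashed database might produce, and counts the data blocks allocated with hole support (only blocks actually touched) versus without it (every block up to end of file). It ignores indirect blocks, fragments, and any real dbm page layout.

```c
#include <stdlib.h>

/* Hypothetical helper: N writes of LEN bytes each at OFFSETS.
   *with_holes    = distinct data blocks touched by the writes
   *without_holes = every data block from 0 up to end of file
   Returns 0 on success, -1 on bad input or allocation failure. */
int block_usage(const unsigned long long *offsets, size_t n,
                unsigned long long len, unsigned long long block_size,
                unsigned long long *with_holes,
                unsigned long long *without_holes)
{
    unsigned long long eof = 0, nblocks, count = 0;
    unsigned char *touched;

    if (n == 0 || len == 0 || block_size == 0)
        return -1;
    for (size_t i = 0; i < n; i++)          /* end of file = furthest write */
        if (offsets[i] + len > eof)
            eof = offsets[i] + len;

    nblocks = (eof + block_size - 1) / block_size;
    touched = calloc(nblocks, 1);           /* one flag per block up to EOF */
    if (touched == NULL)
        return -1;

    for (size_t i = 0; i < n; i++) {        /* mark blocks each write spans */
        unsigned long long first = offsets[i] / block_size;
        unsigned long long last = (offsets[i] + len - 1) / block_size;
        for (unsigned long long b = first; b <= last; b++)
            touched[b] = 1;
    }
    for (unsigned long long b = 0; b < nblocks; b++)
        count += touched[b];

    free(touched);
    *with_holes = count;
    *without_holes = nblocks;
    return 0;
}
```

For example, two 100-byte writes at offsets 0 and 1048576 with 1024-byte blocks touch 2 blocks, while a file system without holes must allocate 1025. The waste grows with how widely the writes are scattered, which matches Jim's point that it takes an application with that access pattern to make it matter.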