johnl@iecc.cambridge.ma.us (John R. Levine) (12/05/90)
In article <10960:Dec507:07:4190@kramden.acf.nyu.edu> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
>In article <1990Dec5.052124.28435@erg.sri.com> zwicky@erg.sri.com (Elizabeth Zwicky) writes:
>> Unfortunately, you have to get pretty intimate with the disk to tell that
>> the 20 meg of nulls aren't there
>
>Hardly. You just look at the file size. Other than the file size, there
>is no way a portable program can tell the difference between a hole and
>an allocated block of zeros.

On every modern version of Unix that I know of, there is no way for an
application to tell the difference between a block of zeros and a hole
other than poking at the raw disk.  The file size is the logical file
size including the holes, e.g. if you seek out to byte 1000000 and write
something, the file size will be 1000000 even though the file is mostly
holes.

For that reason, an entirely reasonable strategy is always to leave a
hole when writing a full block of zeros.  There may even be some
versions of Unix that do that automatically in the write() call.
-- 
John R. Levine, IECC, POB 349, Cambridge MA 02238, +1 617 864 9650
johnl@iecc.cambridge.ma.us, {ima|spdcc|world}!iecc!johnl
"Typically supercomputers use a single microprocessor." -Boston Globe
brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (12/07/90)
In article <1990Dec05.155248.8929@iecc.cambridge.ma.us> johnl@iecc.cambridge.ma.us (John R. Levine) writes:
> In article <10960:Dec507:07:4190@kramden.acf.nyu.edu> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
> >In article <1990Dec5.052124.28435@erg.sri.com> zwicky@erg.sri.com (Elizabeth Zwicky) writes:
> >> Unfortunately, you have to get pretty intimate with the disk to tell that
> >> the 20 meg of nulls aren't there
> >Hardly. You just look at the file size. Other than the file size, there
> >is no way a portable program can tell the difference between a hole and
> >an allocated block of zeros.
> On every modern version of Unix that I know of, there is no way for an
> application to tell the difference between a block of zeros and a hole other
> than poking at the raw disk.

#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>

main()
{
    struct stat st;

    fstat(0,&st);
    printf("size on disk (not including holes) %ld\n",st.st_blocks);
}

> For that reason, an entirely reasonable strategy is always to leave a hole
> when writing a full block of zeros.

This is poor advice. An application may depend on the sizes of files it
creates.

> There may even be some versions of
> Unix that do that automatically in the write() call.

It's conceivable. So what?

---Dan
jfh@rpp386.cactus.org (John F Haugh II) (12/07/90)
In article <6193:Dec618:43:4390@kramden.acf.nyu.edu> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
>#include <sys/types.h>
>#include <sys/stat.h>
>
>main()
>{
>    struct stat st;
>
>    fstat(0,&st);
>    printf("size on disk (not including holes) %ld\n",st.st_blocks);
>}

% grep st_blocks /usr/include/sys/stat.h
%

hmmm.  BSD feature?  how about "no PORTABLE way" to determine the size
of a file on disk?
-- 
John F. Haugh II        UUCP: ...!cs.utexas.edu!rpp386!jfh
Ma Bell: (512) 832-8832 Domain: jfh@rpp386.cactus.org
rob@b15.INGR.COM (Rob Lemley) (12/13/90)
In <6193:Dec618:43:4390@kramden.acf.nyu.edu> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
>In article <1990Dec05.155248.8929@iecc.cambridge.ma.us> johnl@iecc.cambridge.ma.us (John R. Levine) writes:
>> For that reason, an entirely reasonable strategy is always to leave a hole
>> when writing a full block of zeros.
>This is poor advice. An application may depend on the sizes of files it
>creates.

Examples please.

As stated before, when READING a file (i.e., via open/read), there is NO
WAY to determine if a block of zeros constituted an actual hole in the
file or a disk block full of zeros.

Rob
---
Rob Lemley                                    205-730-1546
System Consultant, Scanning Software
INTERGRAPH Corp                               ...!uunet!ingr!b15!rob
Huntsville, AL                            OR  b15!rob@ingr.com
jfh@rpp386.cactus.org (John F Haugh II) (12/15/90)
In article <1820@b15.INGR.COM> rob@b15.INGR.COM (Rob Lemley) writes:
>Examples please. As stated before, when READING a file (ie: via open/read),
>there is NO WAY to determine if a block of zeros constituted an actual hole
>in the file or a disk block full of zeros.

The example of checksumming the results of stat() works pretty well.
A few weeks back there was some discussion about date roll back and
so on.  Keeping track of the exact contents of the entire inode and
file might be one form of copy protection or file validation.  You
could ignore the parts that change (like st_atime), but keep track
of the rest (like st_mtime, st_blocks, etc) along with file data.
-- 
John F. Haugh II        UUCP: ...!cs.utexas.edu!rpp386!jfh
Ma Bell: (512) 832-8832 Domain: jfh@rpp386.cactus.org
"While you are here, your wives and girlfriends are dating handsome American
movie and TV stars. Stars like Tom Selleck, Bruce Willis, and Bart Simpson."
dhesi%cirrusl@oliveb.ATC.olivetti.com (Rahul Dhesi) (12/16/90)
In <1820@b15.INGR.COM> rob@b15.INGR.COM (Rob Lemley) writes:
>As stated before, when READING a file (ie: via open/read),
>there is NO WAY to determine if a block of zeros constituted an actual hole
>in the file or a disk block full of zeros.

I will make an even stronger statement than that:  There is no
difference between an actual hole and a disk block full of zeroes.
There *is* no difference, even if you can detect a difference.  There is
no difference because both are ways of storing zeros.  An operating
system is perfectly free to store zeroes in some blocks as 0xff bytes
and store 0xff bytes as zeros, so long as it correctly translates during
reads and writes.

You and I have no business asking what's on disk.  All that we dare ask
is whether we read back what we wrote.  We also have no business asking
whether each disk block is really stored with some overhead such as CRC,
preamble, postamble, etc., for the benefit of the read/write hardware.
We have no business asking whether the block even exists on disk (it
might just be in the buffer cache and not yet written on disk).  Our
concern ought to be with data and how fast we can access it, and how
secure it is; not the raw form it's written in.

If we are picky we can even ask whether our data fits in the space
available on disk, and this is why we might (vaguely) want to be aware
that some data storage schemes (e.g. holes in files) are more efficient
than others (e.g. zero bytes in files).  But for any specific file, at
any specific offset in the file, we should not be asking this question.

Unless we are writing device drivers, of course.  I don't think we are
in this discussion.
-- 
Rahul Dhesi <dhesi%cirrusl@oliveb.ATC.olivetti.com>
UUCP:  oliveb!cirrusl!dhesi
bzs@world.std.com (Barry Shein) (12/16/90)
Although this doesn't work in practice, it seems that getrusage()'s
ru_inblock ought to tell you the actual number of blocks read, in which
case that number wouldn't increment when you read a hole.  So you could
watch your rusage structure to detect holes.

Right now ru_inblock doesn't increment when the block was in the cache,
which I suppose is honest in some sense, but I don't think it would be
too odd if the only case it didn't increment on were when a hole was
read.

Perhaps that would take another structure element: one to count the
blocks read by the process, and another to count how many blocks
actually had to be retrieved from a device rather than found in the
cache.
-- 
        -Barry Shein

Software Tool & Die    | {xylogics,uunet}!world!bzs | bzs@world.std.com
Purveyors to the Trade | Voice: 617-739-0202        | Login: 617-739-WRLD
brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (12/17/90)
In article <18823@rpp386.cactus.org> jfh@rpp386.cactus.org (John F Haugh II) writes:
> The example of checksumming the results of stat() works pretty well.
> A few weeks back there was some discussion about date roll back and
> so on.  Keeping track of the exact contents of the entire inode and
> file might be one form of copy protection or file validation.  You
> could ignore the parts that change (like st_atime), but keep track
> of the rest (like st_mtime, st_blocks, etc) along with file data.

Right. Another realistic example is that many sites run a program to
search for all files of a particular type---setuid, for instance. Some
such programs work by generating an ls-type output, then ``diff''ing the
list from the previous list and sending a message to the admin about any
changes. If the block count changes sporadically, the admin will get
false alarms.

---Dan
brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (12/17/90)
In article <2806@cirrusl.UUCP> dhesi%cirrusl@oliveb.ATC.olivetti.com (Rahul Dhesi) writes:
> I will make an even stronger statement than that:  There is no
> difference between an actual hole and a disk block full of zeroes.

If every hole on this system were allocated as a 0-filled block, we'd
need twice as many disks. Another system has a huge page size and
loosely padded executables; it would need three times as many disks.

If every 0-filled block on this system were made into a hole, several
well-written programs would crash miserably as soon as the disk is full.

> If we are picky we
> can even ask whether our data fits in the space available on disk, and
> this is why we might (vaguely) want to be aware that some data storage
> schemes (e.g. holes in files) are more efficient than others (e.g. zero
> bytes in files).

Yes, this is what st_blocks in stat is for.

> But for any specific file, at any specific offset in
> the file, we should not be asking this question.

Oh? I have a file open. I want to make sure that blocks 17 through 22
(expressed in byte sizes) will be guaranteed not to run out of space
when I write to them. You're saying that I should have no way to make
this guarantee.

---Dan
adeboer@gjetor.geac.COM (Anthony DeBoer) (12/18/90)
In article <8432:Dec1622:40:0790@kramden.acf.nyu.edu> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
>If every hole on this system were allocated as a 0-filled block, we'd
>need twice as many disks. Another system has a huge page size and
>loosely padded executables; it would need three times as many disks.
>
>If every 0-filled block on this system were made into a hole, several
>well-written programs would crash miserably as soon as the disk is full.

[for example,]

>I have a file open. I want to make sure that blocks 17 through 22
>(expressed in byte sizes) will be guaranteed not to run out of space
>when I write to them. You're saying that I should have no way to make
>this guarantee.

We have another angle on the problem here:  The application software we
run will define a file with an index starting at location 0 and the
actual data starting a ways into the file, just past the point where the
index will eventually end when full (yes, this means you get an error
when you try to write the 1001st record into a file you defined for 1000
records, even with lots of disk free, and you have to expand it by
copying to a larger-defined file).  It never writes to the tail part of
the index space until it needs it, so we wind up with a hole there.

Getting to the point, it's happened that a client has had to restore a
backup with lots of these indexed files and ran out of disk space
because cpio or tar was writing all the zeros and allocating the holes!
This is essentially your case #1, but I'm just bringing up the backup
angle; if you back up and restore (or compress and uncompress) a swiss
cheese file you may lose a lot of disk space to the phantom holes.

The previous poster [<2806@cirrusl.UUCP> dhesi%cirrusl@oliveb.ATC.olivetti.com
(Rahul Dhesi)] was arguing about how the underlying operating system
should treat this, i.e. what happens if you create a file, seek 1 meg
forward, and start writing there?  Or what if you write a block of
zeros; should it detect that and deallocate the block?  We could argue
the point, but we're stuck with the way that real live Unix out in the
field does it today.

Case #2 is valid as well; if you've explicitly written zeros it's quite
reasonable for you to be able to rely on their being there.  The
system's existing behaviour of allowing a seek past EOF and not
allocating space never written, reading it back as zeros, is reasonable;
it's just that any copying operation can get caught by it and write
more than it read.

In article <BZS.90Dec10190615@world.std.com> bzs@world.std.com (Barry Shein) writes:
> Actually, under BSD, you can write a fairly portable program to
> identify holes without getting intimate with the disk, tho I'm not
> entirely certain if there are any, um, holes in it, probably.
>
> The basic idea goes like this:
>
> 1. Holes always read back as a block of zeros, so only
> blocks that appear to be filled with zeros are interesting.
>
> 2. If you rewrite a real hole with all zeros (still
> with me?) the number of blocks in the file will change,
> a stat() will indicate this.
>
> Here's a basic program (which could be improved in various ways, but
> illustrates the idea) which prints out which blocks in the file are
> holes, have fun picking holes in it (at least grant me that I said it
                        ^^^^^
aaurgh, a punster in our midst!
> was BSD-only)!

(followed by program code; deleted)

Granted, that will take care of identifying if a file has holes, and
will as a side effect act as a "hole-filling" program.  Somewhere in my
"to-do" queue is writing a quick-and-dirty C program to "dig out" these
holes, copying a file and fseeking past any large block of zeros,
replacing the original file with the copy when done so as to free up the
empty space.
-- 
Anthony DeBoer - NAUI #Z8800          adeboer@gjetor.geac.com
Programmer, GEAC J&E Systems Ltd.     uunet!jtsv16!geac!gjetor!adeboer
Toronto, Ontario, Canada              #include <std.random.opinions.disclaimer>
dhesi%cirrusl@oliveb.ATC.olivetti.com (Rahul Dhesi) (12/18/90)
In <8432:Dec1622:40:0790@kramden.acf.nyu.edu>
brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
I want to make sure that blocks 17 through 22 (expressed in byte
sizes) will be guaranteed not to run out of space when I write to
them. You're saying that I should have no way to make this
guarantee.
Well, "df" works nicely.
--
Rahul Dhesi <dhesi%cirrusl@oliveb.ATC.olivetti.com>
UUCP: oliveb!cirrusl!dhesi
brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (12/18/90)
In article <2809@cirrusl.UUCP> dhesi%cirrusl@oliveb.ATC.olivetti.com (Rahul Dhesi) writes:
> In <8432:Dec1622:40:0790@kramden.acf.nyu.edu>
> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
> I want to make sure that blocks 17 through 22 (expressed in byte
> sizes) will be guaranteed not to run out of space when I write to
> them. You're saying that I should have no way to make this
> guarantee.
> Well, "df" works nicely.

Is ``reliability'' a dead concept? What do you propose I do if the disk
has just a bit more memory than I need? df works nicely only if the
system does not, in fact, run out of space, and then I might as well not
bother checking. Do you write software this way?

---Dan
dhesi%cirrusl@oliveb.ATC.olivetti.com (Rahul Dhesi) (12/20/90)
In <18119:Dec1809:38:3990@kramden.acf.nyu.edu>
brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
What do you propose I do if the disk has just a bit more memory
than I need? df works nicely only if the system does not, in fact,
run out of space, and then I might as well not bother checking.
Hmmm...I suppose you could buy a new disk, or delete some old files.
That's what *I* do when I am about to run out of disk space.
I often use tar or cpio to move directory hierarchies around, so it
wouldn't do to expect that holes in files will be preserved. The
occasional ndbm database...will just have to be rebuilt. The
occasional core dump with holes I just get rid of.
Let's take this to email.
--
Rahul Dhesi <dhesi%cirrusl@oliveb.ATC.olivetti.com>
UUCP: oliveb!cirrusl!dhesi
andrew@alice.att.com (Andrew Hume) (12/26/90)
In article <2809@cirrusl.UUCP>, dhesi%cirrusl@oliveb.ATC.olivetti.com (Rahul Dhesi) writes:
~ In <8432:Dec1622:40:0790@kramden.acf.nyu.edu>
~ brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
~
~ I want to make sure that blocks 17 through 22 (expressed in byte
~ sizes) will be guaranteed not to run out of space when I write to
~ them. You're saying that I should have no way to make this
~ guarantee.
~
~ Well, "df" works nicely.

	I rate this as a completely fatuous answer, devoid of
use and common sense. i had a similar problem. a network server
gets a request from a client to store a file of given length.
it is not permissible to say yes and then say no halfway through
the file. i do it by writing zeros to the given length first
and then saying yes/no and then read/write the actual data.
when do i do the df, exactly?

	actually, from what this thread has uncovered, it might be
safer to write non-zero data to avoid smart filesystems. what scares
me more are hyperintelligent disk drives that have built in data
compression and might be able to take 20 blocks of some values but
not be able to overwrite them because of different compression rates.

andrew hume
andrew@research.att.com
jim@segue.segue.com (Jim Balter) (12/27/90)
In article <11749@alice.att.com> andrew@alice.att.com (Andrew Hume) writes:
>actually, from what this thread has
>uncovered, it might be safer to write non-zero data to avoid
>smart filesystems. what scares me more are hyperintelligent
>disk drives that have built in data compression and might be able
>to take 20 blocks of some values but not be able to overwrite them
>because of different compression rates.

Obviously, the worst thing you can do is write zeros.  Write random
data.  Better than using a random number generator on the fly is to
precompute a block of data that looks like noise (there are various
statistical measures for randomness (lack of signal)).  While this isn't
guaranteed to defeat all compression schemes, it greatly reduces the
likelihood of too few blocks being allocated.  When the odds of that
happening are on a par with the odds that a plane will crash through the
roof and destroy the disk drive, you can sleep better at night.

Also, if you are writing critical real time applications, your hardware
and OS are significant parts of the system and should be carefully
specified so that they do not violate your requirements.  Some people
seem to think, though, that it is better to have inefficient disk drives
or archivers to prevent breaking such programs as yours.

st_blocks is so that programs (e.g., du, ls) can determine actual disk
usage.  It isn't for any other purpose, it is silly to try to imagine
such purposes, and it is foolish, if you can think of such a purpose, to
implement it.  I'm sure that, if the designer had thought of it and had
thought it necessary, s/he would have added something like "st_blocks is
not a permanent attribute of a file; it may, for instance, change if a
file is archived or is treated by a disk compacter." to the
documentation.  Pretend that this was said.

Programs that read the disk directly are bypassing file logical
structure and have no right to make any kind of assumption about the
persistence of file attributes.

As a general principle, people and programs care about files with holes
turning into out-of-space conditions, but conversely they have no reason
to object if out-of-space conditions turn into files with holes.
Archivers that restore with holes are doing it right.  They acknowledge
that disk space matters (welcome to the real world) and that the
presence of holes is invisible within UNIX file semantics except for
st_blocks, which is a report value and not a permanent or persistent
attribute of a file (welcome to conceptual clarity).
barmar@think.com (Barry Margolin) (12/27/90)
In article <11749@alice.att.com> andrew@alice.att.com (Andrew Hume) writes:
>	I rate this as a completely fatuous answer, devoid of
>use and common sense. i had a similar problem. a network server
>gets a request from a client to store a file of given length.
>it is not permissible to say yes and then say no halfway through
>the file.

Sounds like a pretty unrobust protocol.  A simple protocol can't be
expected to deal with complicated situations.

> i do it by writing zeros to the given length first
>and then saying yes/no and then read/write the actual data.
>when do i do the df, exactly? actually, from what this thread has
>uncovered, it might be safer to write non-zero data to avoid
>smart filesystems. what scares me more are hyperintelligent
>disk drives that have built in data compression and might be able
>to take 20 blocks of some values but not be able to overwrite them
>because of different compression rates.

A more immediate fear should be WORM read-write file systems.  I don't
know about the current generation of WORM FS standards, but a proposal I
saw several years ago at MIT specified that any time a block is modified
such that a 1 bit is replaced with a 0 bit, a new block is allocated for
the new contents of the block, the old block is marked obsolete, and the
inode is changed to point to the new block (WORM media is like paper
tape and punched cards -- you can add 1 bits, but you can't take any
away).  Thus, overwriting a non-zero block could result in a file system
full error!
-- 
Barry Margolin, Thinking Machines Corp.

barmar@think.com
{uunet,harvard}!think!barmar
tif@doorstop.austin.ibm.com (Paul Chamberlain) (12/27/90)
In article <11749@alice.att.com> andrew@alice.att.com (Andrew Hume) writes:
>what scares me more are hyperintelligent
>disk drives that have built in data compression and might be able
>to take 20 blocks of some values but not be able to overwrite them
>because of different compression rates.

You also may eventually run into a problem with a WORM drive that
records new versions of blocks/files in a different place.

Then again, you may also have to cope with a filesystem that looks
almost full but, when you fill it up, it gets bigger.  This will work
until you run out of space to grow the filesystem.  This is just a small
step from what AIX Version 3 currently does.

Paul Chamberlain | I do NOT represent IBM.     IBM VNET: sc30661 at ausvm6
512/838-9662     | This is rumored to work now --> tif@doorstop.austin.ibm.com
andrew@alice.att.com (Andrew Hume) (12/28/90)
i am on the ansi committee working on worm fs standards and the problem
of reserving space on worms is understood and provided for.  It is a
little moot how you can do it with vanilla unix but at the file system
driver level, you can allocate arbitrary (well, limited by 2^64 bytes
and the world's production of media) extents for future use.  presumably
clever vendors will add ioctl's or fcntl's to do such.

as for being clever about adding 1 bits etc, forget it.  it is true at
the innermost hardware level but user-level access requires turning off
ECC and VERY few drives provide even plausible performance without ECC.
It is unlikely you will ever find a rewrite of a block such that the new
data + new ECC == old data + old ECC plus some ones, particularly on
disks like SONY (12in) which has 512 bytes of ECC for each 1024 byte
sector.

returning to the original point, i don't care what the file system does
behind my back.  i do care about being able to reserve space on the file
system that no other user can eat.

andrew hume  (908) 582-6262
andrew@research.att.com
paul@actrix.gen.nz (Paul Gillingwater) (12/29/90)
In article <11753@alice.att.com> andrew@alice.att.com (Andrew Hume) writes:
> i am on the ansi committee working on worm fs standards and
> the problem of reserving space on worms is understood and provided for.
> It is a little moot how you can do it with vanilla unix but at the file
> system driver level, you can allocate arbitrary (well, limited by 2^64 bytes
> and the world's production of media) extents for future use. presumably
> clever vendors will add ioctl's or fcntl's to do such.

For a very good discussion on these matters, go along to your local HP
office, and ask for a copy of the HP Journal.  The latest edition goes
into great detail about their new Optical R/W auto-changer (juke box)
and has an excellent discussion on how they implemented things under a
UNIX (HP-UX McKusick) file system.
-- 
Paul Gillingwater, paul@actrix.gen.nz
david@bacchus.esa.oz.au (David Burren) (01/30/91)
Where can I find mention of the implementation of "holes" in files under
the BSD ffs or other filesystems?

Do I guess and assume that the relevant block pointer(s) have some
sentinel value (e.g. 0) to flag the fact that there is no data?  It
seems the logical explanation, but I'm surprised I haven't found it
mentioned (or am I blind?).

I've read McKusick et al., "A Fast File System for UNIX", and Leffler et
al., "The Design and Implementation of the 4.3BSD UNIX Operating
System", but have found nary a mention of holes.  One of the exercises
at the end of the filesystem chapter in Leffler et al. mentions them,
but that's the only lead.  I'd appreciate any pointers towards a
definite answer on this.

BTW, under some OSes I've seen processes with sparse address maps
produce core dumps with holes in them.  I once had a user ask me "how
can I make my file a core file?"  He was a student trying to get around
quotas....
_____________________________________________________________________________
David Burren [Athos]                       Email: david@bacchus.esa.oz.au
Software Development Engineer              Phone: +61 3 819 4554
Expert Solutions Australia, Hawthorn, VIC  Fax:   +61 3 819 5580
david@bacchus.esa.oz.au (David Burren) (01/31/91)
In article <1111@bacchus.esa.oz.au>, david@bacchus.esa.oz.au (David Burren) writes:
> Where can I find mention of the implementation of "holes" in files under
> the BSD ffs or other filesystems?
> Do I guess and assume that the relevant block pointer(s) have some
> sentinel value (eg. 0) to flag the fact that there is no data?

Thank you all who have responded.  Apparently that is how it's done.

One respondent noted that block 0 on the disks he'd looked at was zeroed
out and that that's why reading a hole returned zeros, but those must
have been non-boot disks that for some reason had block 0 cleared.  I
was under the impression that it wasn't used on non-boot volumes.
Anyway, I'm told that the fs code interprets the 0 pointer as being to
an unallocated block no matter what the values in block 0 are.

Thanks folks, I don't think I need any more email on the subject....

> BTW, under some OSes I've seen processes with sparse address maps produce
> core dumps with holes in them. I once had a user ask me "how can I make my
> file a core file?" He was a student trying to get around quotas....

Yes folks, that time it was a SunOS 4.x machine.  Is anyone aware of
other flavours of Unix/etc that regularly exhibit the same behaviour?

- David B.