hastings@coherent.com (Reed Hastings) (02/22/90)
Two weeks ago I wrote:

>I have heard that there was an old BSD bug such that if you let your
>disk get more than 90% full you were likely to "lose files" or
>"create garbage inodes" or similar ugly things.

Thanks to all who responded.  Enclosed are the two most informative
answers.

	-Reed.  (hastings@coherent.com)

Date: Tue, 13 Feb 90 16:57:23 EST
From: ames!harvard!genrad.com!jpn (John P. Nelson)
Subject: Re: SunOS file corruption
Organization: GenRad, Inc., Concord, Mass.
UUCP: {decvax,mit-eddie}!genrad!jpn
smail: jpn@genrad.com

Boy.  These things really get corrupted after they are retold five or
six times by people who don't really understand them.  None of the
above things can ever happen.

The Berkeley "Fast Filesystem", first introduced in BSD 4.1c, I
believe, has certain characteristics: when the filesystem becomes 90%
full, performance begins to drop (both for writing, and for reading
files written after the filesystem got that full).  This is because it
becomes harder to find an "optimal" block for the file being written.

When the filesystem gets VERY full (98%-99%), there is the
possibility that the filesystem will not be able to allocate a full
disk block at all.  This is because the tail ends of files are stored
in "block fragments" (a fraction of a full block, 1/8th in the common
configuration).  It is possible that no full disk blocks are available
even though there appears to be free space, because all of that space
is taken up by fragments.  I'm not sure what happens in this
situation, but I believe you get a kernel "panic".  You certainly
don't lose files or get garbage inodes.

Starting with BSD 4.2, the kernel ENFORCED the 90% limit (actually a
tunable parameter, "minfree") for anyone but root.  In other words, if
you attempt to write to a "fast filesystem" disk as an ordinary user
and the disk is already 90% full, you will get an error: ENOSPC.  The
disk APPEARS to be full.
In addition, utilities like "df" report "kbytes free" and "% used"
only AFTER first subtracting that 10%.  So you can use up to 100% of
the space that df reports; after that, the disk appears to be full and
all writes fail.

>Can anyone confirm & clarify this, and most importantly, comment
>on whether it still exists in SunOS 4.x.

This behavior is still there.  You will never see it unless you
attempt to cram your disk past 100% full (as df reports it).  We have
occasionally run disks that are read-only to ordinary users at 108%
full with no problems.

Date: Wed, 14 Feb 90 08:35:53 PST
From: sun!bit!jayl (Jay Lessert)
Subject: Re: SunOS file corruption
Organization: BIT, Portland, OR

Jay Lessert   {ogccse,sun,decwrl}!bit!jayl
Bipolar Integrated Technology, Inc.
503-629-5490   (fax) 503-690-1498

We've never seen anything like this in 3+ years of heavy Sun3/Sun4
usage, and we've filled filesystems many a time.  HOWEVER, there are
at least two extremely serious, NFS-related file corruption problems
in SunOS 4.0.x, and Sun *won't* volunteer the information...  :-)

1) The first we call "NFS read corruption".  An NFS client starts
substituting random chunks of its NFS buffer cache for NFS file
reads.  The actual file contents on the server(s) are OK.  Once it
starts happening (it is sort of a "mode"), it keeps happening randomly
until the client is rebooted.  Amusingly enough, a fastboot(8) won't
do the job; fasthalt(8) followed by a boot is the quickest fix.  This
happens to us about once every two weeks.

2) The second we call "UFS fragment write corruption".  A UFS write
on an NFS server, followed by an NFS read of that file that happens
*before* the UFS write buffer is flushed to disk, can cause the UFS
fragment of the file to be replaced with random chunks of UFS buffer
cache.  In this case the physical file is truly corrupted.  It often
shows up when mail or news spool directories are NFS-mounted, as you
might imagine.

These bugs are present in all 4.0.x versions, through 4.0.3.
There are no patches.  I can dig out the Sun bug IDs if you're
interested.  Sun claims that they will not release 4.1 to production
until these are fixed; we'll see...
--