dan@rna.UUCP (12/03/83)
In bringing up 4.2BSD on an 11/750 with SC750 and 2 Eagles, I encountered a mystery. After reading in /usr/sys and /usr, I did a fsck on the /usr pseudo disk, which came out fine, but then I did a fsck on /usr using the block device entry, which gave me an unreadable disk block. A dcheck on /dev/rhp0h also gave me an error. I thought it might be a bad or flaky sector on the disk, except that I tried the whole procedure again on the same disk after reformatting it more carefully, and again on the second disk drive. Both trials yielded the same result, down to the same unreadable block number. What am I missing? When will using the raw device versus the block device with fsck give you a different result? Is dcheck supposed to be functional? I don't think I believe its output. Thanks, ...cmcl2!rna!dan
thomas@utah-gr.UUCP (Spencer W. Thomas) (12/04/83)
I may be wrong about this, but I think your problem is a partial block at the end of the file system. Check the file system size to make sure it is divisible by the block size. This is supposed to work (I think), but it doesn't in either the block device or the raw device. =Spencer
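A minimal sketch of the divisibility check suggested above. It assumes the partition size is counted in 512-byte sectors, which this post does not state; `trailing_sectors` and `SECTOR_BYTES` are hypothetical names used only for illustration.

```c
#include <assert.h>

#define SECTOR_BYTES 512L	/* assumed sector size, not given in the post */

/*
 * Number of sectors left over after the last whole file system block.
 * A result of 0 means the partition ends on a block boundary; anything
 * else is a partial block at the end of the file system.
 */
long trailing_sectors(long partition_sectors, long block_bytes)
{
	long sectors_per_block;

	assert(partition_sectors >= 0);
	assert(block_bytes > 0 && block_bytes % SECTOR_BYTES == 0);

	sectors_per_block = block_bytes / SECTOR_BYTES;
	return partition_sectors % sectors_per_block;
}
```

For example, a 19-sector partition with 4096-byte blocks (8 sectors each) holds two whole blocks and leaves 3 sectors over, so the function returns 3.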
smb@ulysses.UUCP (Steven Bellovin) (12/05/83)
I've seen a similar problem, and I believe it to be a bug in the kernel and/or fsck. In my case, the block number that fsck couldn't read was 291344. By a curious non-coincidence, the slice in question (the 'h' slice on an RM05) has 291346 blocks. Thus, there are only two 512-byte blocks available before the end of the file system, and an attempt to read a full file system block will fail. Worse yet, it will behave differently with the cooked and raw devices. Try this program -- on the cooked device, *all three* reads fail, even though the first two should work! On the raw device, the first two work (512 and 1024 bytes), but the third fails. My own opinion is that the third read (of 4096 bytes) should return a short block.
--------------------
#include <stdio.h>

extern int errno;

/* Read 512, 1024, and 4096 bytes starting at block 291344 of a device. */
main(argc, argv)
int argc;
char *argv[];
{
	extern long lseek();
	char buf[4096];
	int i, fd;

	if (argc < 2) {
		fprintf(stderr, "usage: %s device\n", argv[0]);
		exit(1);
	}
	chdir("/dev");
	fd = open(argv[1], 0);		/* 0 = open for reading */
	if (fd < 0) {
		perror("open");
		exit(1);
	}

	if (lseek(fd, 291344L*512L, 0) < 0) {
		perror("lseek 1");
		exit(1);
	}
	errno = 0;			/* don't report a stale errno */
	i = read(fd, buf, 512);
	printf("512 %d %d\n", i, errno);

	if (lseek(fd, 291344L*512L, 0) < 0) {
		perror("lseek 2");
		exit(1);
	}
	errno = 0;
	i = read(fd, buf, 1024);
	printf("1024 %d %d\n", i, errno);

	if (lseek(fd, 291344L*512L, 0) < 0) {
		perror("lseek 3");
		exit(1);
	}
	errno = 0;
	i = read(fd, buf, 4096);
	printf("4096 %d %d\n", i, errno);
}
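The arithmetic behind the three reads can be sketched as follows. This assumes the behaviour the post argues for, namely that a read running past the end of the slice is truncated rather than failed; it is not what either device actually did. `clamped_read_len` is a hypothetical helper, not kernel code.

```c
#include <assert.h>

/*
 * Bytes that a read of `want` bytes at byte offset `off` would return
 * on a device of `dev_bytes` bytes, if reads were cut short at the end
 * of the device instead of failing outright.
 */
long clamped_read_len(long dev_bytes, long off, long want)
{
	long left;

	assert(dev_bytes >= 0 && off >= 0 && want >= 0);

	if (off >= dev_bytes)
		return 0;		/* at or past the end: nothing to read */
	left = dev_bytes - off;
	return want < left ? want : left;
}
```

With a device of 291346 * 512 bytes and an offset of 291344 * 512, only 1024 bytes remain: requests for 512 and 1024 bytes fit entirely, and a request for 4096 bytes would come back as a short read of 1024.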
stevesu@azure.UUCP (Steve Summit) (12/06/83)
I occasionally had bizarre problems like this when working on a 2.8 system. I would fsck the raw interface, and the block interface would still be broken, and vice versa. df's on the two interfaces could show different free block counts, etc. Restoring to one interface and fsck-ing the other could wipe out the restore. My hunch (never verified, I just learned to use one interface consistently) was that the kernel was getting confused on block caching (particularly the superblock), not realizing that a block from /dev/??? was the same as a block from /dev/r???. Suppose a block from /dev/??? was modified, but not written out. A subsequent read of the same block from /dev/r??? might go directly to disk, not realizing that the modified block was in core, and get an obsolete version. The only flaw in this reasoning is that I have this memory that doing a sync between the write to one device and the read from the other didn't necessarily help... Can anyone confirm or deny this supposition of mine? If it's true, there should be some warning about it in the documentation. Steve Summit tektronix!tekmdp!stevesu
thomas@utah-gr.UUCP (12/08/83)
The kernel (pre 4.2, anyway) keeps a copy of the superblock for every mounted filesystem in-core, and only writes it out to the disk; i.e., the disk copy is never reconsulted. So, if you fsck the raw copy of a mounted filesystem, and it modifies the superblock, then you are in big trouble, because what's on the disk doesn't agree with what the kernel thinks is there. In 4.1 (and maybe others), you CAN fsck the block device, as long as the filesystem is "quiescent" (sp?). I think in earlier versions, the situation was even worse, but I don't remember the details now. =Spencer
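A toy model of the in-core superblock behaviour described above. It is a sketch only: the struct and function names are hypothetical, a single free-block count stands in for the whole superblock, and none of it is taken from kernel source.

```c
#include <assert.h>

/* One superblock field on disk plus the kernel's in-core copy of it. */
struct toy_fs {
	long disk_free;		/* free-block count in the on-disk superblock */
	long core_free;		/* in-core copy, loaded once at mount time */
	int mounted;
};

/* Mounting reads the disk copy into core; it is not read again afterwards. */
void toy_mount(struct toy_fs *fs)
{
	assert(!fs->mounted);
	fs->core_free = fs->disk_free;
	fs->mounted = 1;
}

/* fsck on the raw device rewrites only the disk copy. */
void toy_raw_fsck(struct toy_fs *fs, long corrected_free)
{
	fs->disk_free = corrected_free;
}

/* Writing the superblock out copies core to disk, never the reverse. */
void toy_sb_sync(struct toy_fs *fs)
{
	if (fs->mounted)
		fs->disk_free = fs->core_free;
}
```

In this model, a repair made by `toy_raw_fsck` on a mounted file system lasts only until the next `toy_sb_sync`, which overwrites it with the kernel's unrepaired in-core value.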
olson@fortune.UUCP (12/08/83)
#R:rna:-18400:fortune:2000004:000:298 fortune!olson Dec 7 23:10:00 1983 I can't confirm or deny Steve's question as to the problem with raw vs. block devices and fsck, but I do know that the 4.1 (and I suspect the V7) versions of fsck do a sync() system call before they start doing any checking. Dave Olson, Fortune Systems {ihnp4,harpo,ucbvax!amd70}!fortune!olson
dmmartindale@watrose.UUCP (Dave Martindale) (12/09/83)
In ALL versions of UNIX that I know of, the kernel has no idea of any sort of correspondence between disk blocks accessed via the "block" and "character"/"raw" devices. They are completely different interfaces as far as the system is concerned, and there is no locking of updates between them. References to the "raw" device ALWAYS read/write data directly from disk to memory or vice versa; references to the "block" device always go through the buffer cache. If you do a write on the block device, doing a sync will ensure that it has been written out and so it will be there for any subsequent reads of either device. But if a particular block is already in the buffer cache, there is no way of convincing the kernel that it is an invalid copy of the disk data after you've written the block using the raw device. Thus, reads of that block using the block interface will give you the old data as long as the block remains in the cache. Yet another effect is that most versions of UNIX keep the superblock of a mounted filesystem in a secret buffer in memory that is not part of the buffer cache. If you update the superblock in the cache with the block device interface, you still haven't updated the superblock that the kernel deals with. The only way you can safely fsck a filesystem is with it NOT MOUNTED - then the kernel doesn't have the secret copy. 4.1BSD did not have this problem, but it seems to be back in 4.2BSD. Dave Martindale
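The block/raw interaction described above can be illustrated with a toy model. This is a sketch under simplifying assumptions (one integer per block, a cache entry per block, no eviction); all names are hypothetical and it is not a rendering of any real kernel's buffer cache.

```c
#include <assert.h>

#define NBLOCKS 4

struct toy_disk {
	int media[NBLOCKS];	/* what is actually on the disk */
	int cache[NBLOCKS];	/* buffer cache contents */
	int cached[NBLOCKS];	/* nonzero if cache[b] holds block b */
	int dirty[NBLOCKS];	/* nonzero if cache[b] is not yet on disk */
};

/* Raw device: straight to and from the media, cache never consulted. */
int raw_read(struct toy_disk *d, int b)
{
	assert(b >= 0 && b < NBLOCKS);
	return d->media[b];
}

void raw_write(struct toy_disk *d, int b, int v)
{
	assert(b >= 0 && b < NBLOCKS);
	d->media[b] = v;	/* any cached copy is left as it was */
}

/* Block device: everything goes through the cache. */
int blk_read(struct toy_disk *d, int b)
{
	assert(b >= 0 && b < NBLOCKS);
	if (!d->cached[b]) {
		d->cache[b] = d->media[b];
		d->cached[b] = 1;
	}
	return d->cache[b];
}

void blk_write(struct toy_disk *d, int b, int v)
{
	assert(b >= 0 && b < NBLOCKS);
	d->cache[b] = v;	/* delayed write: media unchanged for now */
	d->cached[b] = 1;
	d->dirty[b] = 1;
}

/* sync: flush dirty cache entries to the media. */
void cache_sync(struct toy_disk *d)
{
	int b;

	for (b = 0; b < NBLOCKS; b++) {
		if (d->dirty[b]) {
			d->media[b] = d->cache[b];
			d->dirty[b] = 0;
		}
	}
}
```

In this model, a `blk_write` is invisible to `raw_read` until `cache_sync` runs, and a `raw_write` to a block that is already cached is invisible to `blk_read` for as long as the cached copy stays in place, which matches the two cases described in the post.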