ronnie@mit-eddie.UUCP (Ronnie Schnell) (01/13/86)
I'm having trouble with my VAX. There is a corrupted directory, and fsck asks "SALVAGE?". No matter what I answer, it just exits back to the C shell, and it doesn't seem to fix anything.

#Ron (ronnie%sutcase.bitnet@wiscvm.wisc.edu) (ronnie@mit-eddie.uucp)
bzs@bu-cs.UUCP (Barry Shein) (01/16/86)
Re: fsck reports a corrupted directory, asks to SALVAGE, and then exits on any answer.

I believe I have seen this behavior. Rather than attempt to fix fsck for you, I'll suggest the following; it *might* work, but the risk is yours...

Safety first: do you have a reasonable backup of the partition? If not, you had better figure out how to make one now. I might try going single-user, mounting the file system (maybe read-only), and seeing what appears to be there. Note that this could cause a crash, which could create more problems, so you might want to back up root first (though if you don't touch any files on root while you do this, and you sync root *before* mounting, everything should be OK even if you do crash.)

Dump often does not work on an addled file system (I wouldn't even try, as the dump tape it creates might be unusable even if dump thinks it's OK.) Tar is a possibility, perhaps avoiding the addled directory (i.e., tar the directories around it.) Another possibility is to grab another, healthier disk area and try moving things over there. Yet another is to say, 'Hey, we did a backup last night; if I lose it, I lose it...'

A last resort is to just dd the entire raw device to tape; at least then you could start all over again (though hours could be wasted.) Also, if desperation sets in, you could try to recover files from the dd tape later. I don't envy this, but some straightforward wizardry might make it doable: a friend once wrote a tiny shell-like thing which did 'ls', 'cd', 'cp' and 'cat' using a raw device as input. That was V6, though [why he wrote it I leave to your imagination.]

Now, having planned the backout, I would try to locate the addled directory, move out of it what I can, delete the directory in question, dismount, and re-run fsck. In desperation you could clri the directory. Chances are good fsck will then figure things out, EXCEPT if you are having hardware trouble (a bad block in the directory area), though I think fsck would report that (maybe.)
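The dd last resort is commonly run with dd's conv=noerror,sync options, so that an unreadable block neither aborts the copy nor shifts everything after it. Keeping every block at its original offset is what makes later recovery from the image plausible. Below is a minimal C sketch of that idea, assuming POSIX read()/lseek()/write(); copy_raw() and BUFSZ are hypothetical names, and in practice you would simply use dd.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define BUFSZ 65536   /* hypothetical: largest block size this sketch supports */

/* Hypothetical helper: copy in_fd to out_fd in bs-byte blocks, roughly like
 * `dd bs=... conv=noerror,sync`.  Returns the number of unreadable blocks,
 * or -1 on a bad bs or a failed write. */
long copy_raw(int in_fd, int out_fd, size_t bs)
{
    static char buf[BUFSZ];
    long bad = 0;

    if (bs == 0 || bs > BUFSZ)
        return -1;
    for (;;) {
        ssize_t got = read(in_fd, buf, bs);
        if (got == 0)
            break;                      /* end of device */
        if (got < 0) {
            if (errno == EINTR)
                continue;               /* interrupted: just retry */
            bad++;                      /* unreadable block */
            /* Step over it, assuming the failed read left the offset
             * unchanged, so the next read starts on the next block. */
            if (lseek(in_fd, (off_t)bs, SEEK_CUR) < 0)
                return bad;             /* cannot skip it: give up */
            got = 0;                    /* emit a block of zeros instead */
        }
        /* Pad a short or failed block with zeros so the output is a whole
         * number of blocks and a bad block still occupies its slot. */
        if ((size_t)got < bs)
            memset(buf + got, 0, bs - (size_t)got);
        if (write(out_fd, buf, bs) != (ssize_t)bs)
            return -1;
    }
    return bad;
}
```

Zero-filling rather than skipping is the design choice that matters here: a file system image with a hole of zeros where a bad block was is still navigable at the right offsets, while an image with the block simply missing is not.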
Something that has happened to me, and is very unnerving, is a situation where the whole structure seemed empty!! After a little playing around and many brain games with fsck, we hadn't lost a thing other than my sanity (well, a few files landed in lost+found, which of course didn't exist until I kind of forced my way with that disk.)

I would also let any suggestions like this congeal a little, to see if someone finds a flaw or a much better way. On the other hand, your users may not be terribly patient. The problem is that with the hardened file system in 4.2, you probably have to seriously consider a hardware problem; that is often what it comes down to, though it could also have been an unfortunately timed power hit.

Good luck. It also might be a good day to announce that you are paying for next year's xmas party... :-)

-Barry Shein, Boston University

(P.S. I posted this, I guess, to see if others find a major flaw. Also, since you are right across the river, feel free to call me if things get too hairy; the price is your first-born.)

(P.P.S. Glancing at that area of fsck.c, there seem to be a few conditions which could cause this to happen, such as a calculation in fsck_readdir yielding a zero (NULL) to dirscan, with the file size at that point appearing to be zero, though it's hard to check.)
wje@daisy.UUCP (William J. Earl) (01/17/86)
>>Re: fsck reports corrupted directory, asks to SALVAGE and then exits
>>on any answer.
>
>I believe I have seen this behavior, rather than attempt to fix fsck
>for you the following *might* work, but the risk is yours...
>
> ...
>
>(P.P.S. Glancing at that area of fsck.c there seem to be a few
>conditions which could cause this to happen, like a calculation in
>fsck_readdir yielding a zero (NULL) to dirscan and the filesize at
>that point appearing to be zero tho it's hard to check.)

At our site, we fixed a bug in dirscan(). Just inside the for loop which calls fsck_readdir(), we added:

    if (dp->d_reclen > DIRBLKSIZ)   /* force it to be <= DIRBLKSIZ */
        dp->d_reclen = DIRBLKSIZ;

(This is added just before the statement "dsize = dp->d_reclen;".) Without the check, fsck failed when it encountered a corrupted directory block: even though it recognized the block as corrupted, it failed while trying to salvage the directory entries. We did not have time to work on it further, but there are probably other cases where certain kinds of garbage cause fsck to fail because it is not suspicious enough.

-- 
William J. Earl
Daisy Systems Corporation, Mountain View, CA
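To illustrate why a d_reclen check belongs before the value is used, here is a minimal, self-contained C sketch of walking one directory block. It is not the actual fsck.c code: the record header struct dirhdr (modeled loosely on the 4.2BSD struct direct), the 512-byte DIRBLKSIZ, and the functions count_entries() and put_entry() are all assumptions for illustration. Unlike the dirscan() patch, which clamps d_reclen to DIRBLKSIZ, this sketch clamps it to the bytes left in the block, and it also treats a record length too small to hold a header as "the rest of the block is garbage", since a zero d_reclen would otherwise never advance.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DIRBLKSIZ 512   /* assumed directory block size, as in 4.2BSD */

/* Hypothetical fixed header of one directory record; the name bytes that
 * follow it are ignored in this sketch. */
struct dirhdr {
    uint32_t d_ino;      /* inode number, 0 = unused slot */
    uint16_t d_reclen;   /* bytes from this record to the next */
    uint16_t d_namlen;   /* length of the name that follows */
};

/* Count the in-use entries in one directory block.  d_reclen comes straight
 * off the disk, so it is bounds-checked before it is used to advance. */
int count_entries(const unsigned char *blk)
{
    size_t loc = 0;
    int n = 0;

    while (loc + sizeof(struct dirhdr) <= DIRBLKSIZ) {
        struct dirhdr h;
        size_t reclen;

        memcpy(&h, blk + loc, sizeof h);    /* avoids alignment assumptions */
        reclen = h.d_reclen;
        if (reclen > DIRBLKSIZ - loc)       /* would run off the block: clamp */
            reclen = DIRBLKSIZ - loc;
        if (reclen < sizeof h)              /* 0 would never advance */
            reclen = DIRBLKSIZ - loc;
        if (h.d_ino != 0)
            n++;
        loc += reclen;
    }
    return n;
}

/* Hypothetical test helper: write a record header at offset loc. */
void put_entry(unsigned char *blk, size_t loc, uint32_t ino, uint16_t reclen)
{
    struct dirhdr h = { ino, reclen, 0 };
    memcpy(blk + loc, &h, sizeof h);
}
```

Because loc only ever advances by a clamped, nonzero amount, the loop always terminates and never reads a header past the end of blk, however much garbage the block contains.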