rnelson@eecs.wsu.edu (Roger Nelson - Grad Student) (01/14/91)
------
Considering the number of people that have been complaining about the header corruption/validation error, it would appear that there is a real problem with the File System.

I usually get the error whenever two applications access the HD at the same time. For example, I always get this error whenever I unarchive two files simultaneously with lharc to the HD. I now do everything in RAM and copy to the HD; this is unacceptable. I got the Amiga for its multitasking abilities, which I need for my research. My research project requires several processes writing to the hard drive.

Has anyone else noticed a pattern of validation errors other than those caused by programs terminating or crashing without closing files? Did the old file system have this problem, and is it fixed in 2.0?

Without anything else to go by, I would venture to guess that under certain circumstances two different files are allocated the same block, as if the task that allocates the file block switches out before it can finish marking the block as allocated. I would be especially suspicious if blocks are allocated sequentially. That is, if two tasks are writing to two files simultaneously, would the allocated blocks be interleaved something like:

-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+--
file1|file1|file2|file1|file1|file2|file1|file2|file1|file2|file2|file2|
-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+--

If this is the case, wouldn't it make more sense to start block allocation in a sector where there currently isn't any disk activity when the file is opened? This would probably also help alleviate fragmentation.

_____________________________________________________________________
Roger Nelson                        rnelson@yoda.eecs.wsu.edu
Agricultural Engineering Department
Computer Science Department
Washington State University
Pullman, WA 99164
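The race guessed at above can be sketched in a few lines. This is only an illustration of the hypothesis, written in Python for brevity; nothing in this thread confirms that the file system allocates blocks this way. The names `allocate_steps`, `run_interleaved` and `run_atomic` are made up for the example, and the task switch is simulated with a generator so that the outcome is deterministic.

```python
def allocate_steps(bitmap, owner, log):
    """Hypothetical non-atomic allocator: find a free block, get switched out, then mark it."""
    block = bitmap.index(False)   # step 1: find the first free block
    yield                         # a task switch may happen here
    bitmap[block] = True          # step 2: mark the block as allocated
    log.append((owner, block))


def run_interleaved():
    """Two tasks both finish step 1 before either one reaches step 2."""
    bitmap = [False] * 8
    log = []
    task1 = allocate_steps(bitmap, "file1", log)
    task2 = allocate_steps(bitmap, "file2", log)
    next(task1)                   # file1 finds block 0, then is switched out
    next(task2)                   # file2 finds block 0 as well
    for task in (task1, task2):
        try:
            next(task)            # each task now marks "its" block
        except StopIteration:
            pass
    return log


def run_atomic():
    """Find and mark happen in one uninterrupted step, so the blocks differ."""
    bitmap = [False] * 8
    log = []
    for owner in ("file1", "file2"):
        block = bitmap.index(False)
        bitmap[block] = True
        log.append((owner, block))
    return log


if __name__ == "__main__":
    print(run_interleaved())      # both files end up owning block 0
    print(run_atomic())           # file1 gets block 0, file2 gets block 1
```

If allocation were done as in `run_atomic`, two simultaneous writers would simply receive interleaved but distinct blocks, as in the diagram above.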
jesup@cbmvax.commodore.com (Randell Jesup) (01/15/91)
In article <1991Jan13.213610.16726@eecs.wsu.edu> rnelson@yoda.UUCP (Roger Nelson - Grad Student) writes:
>I usually get the error whenever two applications access the HD at the
>same time. For example, I always get this error whenever I unarchive two
>files simultaneously with lharc to the HD.

Are you certain lharc doesn't conflict with itself? A number of ported utilities are known to assume that only one copy of their program is running (or running in a given directory). Someone here has indicated to me that lharc probably has this problem.

The FS is quite reliable against large numbers of simultaneous operations. Unlike OFS, even re-use of the blocks of deleted files in a directory while it's currently exnexting over them will not hurt FFS (the classic design problem with ExNext).

--
Randell Jesup, Keeper of AmigaDos, Commodore Engineering.
{uunet|rutgers}!cbmvax!jesup, jesup@cbmvax.commodore.com  BIX: rjesup
The compiler runs
Like a swift-flowing river
I wait in silence.  (From "The Zen of Programming")  ;-)
rnelson@eecs.wsu.edu (Roger Nelson - Grad Student) (01/16/91)
------
> Are you certain lharc doesn't conflict with itself?

I ask then, at what level does lharc "conflict with itself"? In what way could lharc "conflict with itself"? Apparently there is some conflict over some resource. Since lharc has never crashed while running multiple lharc processes to the RAM disk, I would have to conclude that the resource in question is the HD. Is it not the responsibility of the File System to manage the HD resource?

If the File System is reliable against large numbers of simultaneous operations, then there must be some flaw in the HD, the controller, or the HD driver software. Since it appears that others have had problems with multitasking lharc and other utilities on HDs, we can probably (although not necessarily) rule out the HD itself and probably the controller as well. This would leave the device driver or bring us back to the file system.

> A number of ported utilities are known to assume that only one copy of their
> program is running (or running in a given directory).

Why shouldn't a program assume it is the only copy of itself running? Why should this be any different from running several different programs? Why should this cause problems, and why only with the HD? What could lharc be doing to cause the machine to crash in this manner? By "running in a given directory", does this mean writing to a given directory?

I assume that lharc is crashing during some sort of write operation, since it would seem unlikely that there is a problem with reads (I usually unarc from the HD to MEM). Given the nature of what lharc does, I don't see that it would be doing anything more complex than opening a file and writing to it a character at a time (or, most likely, a buffer at a time). Surely the Amiga can handle multiple processes doing that kind of operation. It apparently can, since I haven't experienced this problem with zoo.

I suppose I can resign myself to the idea that the problem rests totally with lharc, but this leaves the question: what is lharc doing that should be avoided? The answer should be nothing, if lharc is making standard system calls correctly.

Has anyone experienced lharc crashing the machine whilst multiple lharc processes are unarchiving to devices other than the HD or to other file systems?
limonce@pilot.njin.net (Tom Limoncelli) (01/17/91)
In article <1991Jan16.040022.22135@eecs.wsu.edu> rnelson@eecs.wsu.edu (Roger Nelson - Grad Student) writes:
> Is it not the responsibility of the File System to manage the HD resource?
> If the File System is reliable against large numbers of simultaneous
> operations, then there must be some flaw in the HD, the controller, or the
> HD driver software?

I don't know if this is what happens with LHARC, but it is a simple counterexample to what you suggest:

The FS can only protect you to a certain (excuse the pun) extent. What if a utility needs to create a temporary file, and it creates it as T:lharc.tmp every time? Running the utility multiple times will cause problems.

This is the kind of thing I would expect from a program ported from the PC. I didn't like the one I found in Manx 3.6 so I wrote my own. I believe AmigaDOS 2.0 has a "create unique file name" system call.

--
tlimonce@drew.edu     Tom Limoncelli          "Flash! Flash! I love you!
tlimonce@drew.bitnet  +1 201 408 5389         ...but we only have fourteen
tlimonce@drew.uucp    limonce@pilot.njin.net  hours to save the earth!"
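The temporary-file collision described above is easy to reproduce. The following is a rough sketch in Python, not lharc's actual code: whether lharc really uses a fixed name such as `lharc.tmp` is not confirmed anywhere in this thread, and Python's standard `tempfile.mkstemp` merely stands in for whatever "create unique file name" call the target system offers. The helper names `fixed_temp_path`, `unique_temp_path` and `demo` are invented for the example.

```python
import os
import tempfile


def fixed_temp_path(directory):
    """Hypothetical ported utility: every instance uses the same temp file name."""
    return os.path.join(directory, "lharc.tmp")


def unique_temp_path(directory):
    """Ask the system for a name that no other instance can be handed."""
    fd, path = tempfile.mkstemp(prefix="lharc-", suffix=".tmp", dir=directory)
    os.close(fd)
    return path


def demo():
    with tempfile.TemporaryDirectory() as directory:
        # Two "instances" that both believe they own the fixed temp file.
        first = fixed_temp_path(directory)
        second = fixed_temp_path(directory)
        with open(first, "w") as f:
            f.write("archive one")
        with open(second, "w") as f:
            f.write("archive two")      # silently clobbers the first instance's data
        with open(first) as f:
            seen_by_first = f.read()

        # With unique names the two instances never touch each other's file.
        names_differ = unique_temp_path(directory) != unique_temp_path(directory)
        return seen_by_first, names_differ


if __name__ == "__main__":
    print(demo())
```

The file system performs every one of these writes correctly; the first instance simply reads back data it never wrote, which is the sense in which the FS cannot protect a program from itself.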
jesup@cbmvax.commodore.com (Randell Jesup) (01/17/91)
In article <1991Jan17.032930.18196@eecs.wsu.edu> rnelson@yoda.UUCP (Roger Nelson - Grad Student) writes:
>I suppose I have said enough based on a lot of speculation.
>I should try some experiments to see if multiple processes attempting to
>write to a single file cause the read/write error and ultimately the
>validation error. (or is this a known problem?)

Please do try some things (make sure you have a good backup or two first!). If you hit problems, send a report with as much info as possible (ks/wb version, hardware config, lharc version, what you were doing, etc.).

>However, suppose it is the case that this situation does indeed cause
>the read write error. I contend that the file system should recover
>more gracefully than crashing the machine and requiring one to reformat
>the HD partition.

Merely colliding writes to a file won't cause problems, but if lharc didn't consider the possibility of its temp file being corrupted, it might (note: might) die, trashing the filesystem's bitmap or other important things. I don't know that this is the case; it could also be some other sort of bug in lharc. I will probably do some tests here as well.

--
Randell Jesup, Keeper of AmigaDos, Commodore Engineering.
jesup@cbmvax.commodore.com (Randell Jesup) (01/17/91)
In article <1991Jan17.033827.18355@eecs.wsu.edu> rnelson@yoda.UUCP (Roger Nelson - Grad Student) writes:
>Granted a program should use a unique names generator, but is it
>unreasonable not to do so even in a multitasking environment?

Yes, it's unreasonable, since your temp file might get corrupted or deleted out from under your program. It's one of the things one must do to work properly in a shared environment (even single-tasking if you have a network filesystem).

>If opening multiple same name files on the HD causes such serious
>problems, and is not an uncommon faux pas, shouldn't the file system simply
>return a read/write error to the offending program and not allow the HD to
>be corrupted? Also, what is 'exnexting' and 'using memory after free'?
>Could someone provide me with a description?

ExNext is the basic call used to examine a directory. The full explanation would get rather complex if you're not used to programming Amigas.

>> It may be lharc, or it may be something else (a 3rd-party utility
>> you have running, though this is quite unlikely).
>
>I do not preclude the possibility that the problem is only manifest by
>my system configuration (I may very well have old release of 1.3 since
>I bought my machine used and it is a rev. 4.2, but I also find that unlikely).

If you submit a bug report, include the version numbers (use the version menu entry from WB, or "version" from a shell).

>This brings me back to my original posting: because the machine crashed,
>I decided to repartition the HD. Since I bought the machine used, I didn't
>get the documentation on the 2090a controller, and the Amiga Owner's
>manual (Introducing....) only mentions the 2090 controller; thus, I have
>no information on making an autobooting HD partition. How do you go about
>making such a HD partition bootable?

Simply prep it as you would a 2090. Then reboot from a floppy WB, and format the dh<whatever>: partition. Then copy Workbench to it (note that it will be the old filesystem, and slower than FFS). Then reboot without a floppy and it should boot off the HD. Your machine would originally have come with an install disk to re-setup the disk.

--
Randell Jesup, Keeper of AmigaDos, Commodore Engineering.
gregg@cbnewsc.att.com (gregg.g.wonderly) (01/18/91)
From article <1991Jan13.213610.16726@eecs.wsu.edu>, by rnelson@eecs.wsu.edu (Roger Nelson - Grad Student):
> ------
>
> Considering the number of people that have been complaining about the
> header corruption/validation error, it would appear that there is a real
> problem with the File System.

I used to have continual problems and frustration with having to back up and restore my development partition. Now, I too have moved to a smaller partition which only requires a minimum of effort to back up and restore.

There was a comment from one of the CA people that the "Read/Write error" requester is only generated on a media error. When I diskdoctor such a partition, I get "multiple files reference block", but after restoring (I backed up before diskdoctoring with a file-by-file backup program), not one single file is corrupted or missing. The data checks out flawlessly when I just read off of that partition using my media checking program.

My personal opinion is that the problem is in the FFS handler!

--
-----
gregg.g.wonderly@att.com (AT&T Bell Laboratories)
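For readers unfamiliar with the "multiple files reference block" complaint, the check behind such a message can be pictured roughly as follows. This is a toy sketch in Python and makes no claim about how diskdoctor actually works; the function name `find_cross_linked` and the file-to-blocks mapping are assumptions made for the example.

```python
from collections import defaultdict


def find_cross_linked(files):
    """Return {block: [file names]} for every block claimed by more than one file.

    `files` maps a file name to the list of block numbers it claims to own.
    """
    owners = defaultdict(list)
    for name, blocks in files.items():
        for block in blocks:
            owners[block].append(name)
    return {block: names for block, names in owners.items() if len(names) > 1}


if __name__ == "__main__":
    claimed = {"a": [1, 2, 3], "b": [3, 4], "c": [5]}
    print(find_cross_linked(claimed))   # block 3 is claimed by both "a" and "b"
```

A check like this only looks at which files claim which blocks; it says nothing about whether the data in those blocks is still readable, which is consistent with a partition that reads back cleanly and yet fails validation.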