[net.micro.cbm] save-replace bug on 1541

dpa@snow.UUCP (David Angier) (01/06/85)

I think the problem with save-replace on the 1541 only occurs
when a disk is changed, i.e. when you are transferring
a program from one disk to another.
The DOS in the 1541 fails to read the new BAM for the second disk
even if its ID is different!!!
Therefore, when a save is attempted, the DOS allocates blocks that were
free on the first disk, overwriting data wherever those blocks are
actually in use on the second disk.

		Dave (Maths @ Warwick University, UK)

mab@druxp.UUCP (BlandMA) (01/10/85)

There's a book out from Datamost (I think it's called "Inside Commodore DOS")
that does a very good job of describing how the 1541 ROM works and how
to play tricks with it.  It also includes a well-annotated memory map of
the ROM.  Highly recommended for anyone who wants to hack at the 1541, or
who just wants to know how it works.

Anyway, they claim that there is really no bug in the save-replace command,
but the symptoms that everybody reports can be caused by the following
two situations:

	1) If you scratch an unclosed file, you have just poisoned the BAM.
	   An unclosed file shows up in a directory list with a * next to the
	   file type.  An unclosed file does not have the last block written
	   to disk.  When a file is scratched, the DOS follows the chain of
	   block pointers, deallocating as it goes.  The last block of an
	   unclosed file contains garbage, so it's possible that the block pointer
	   points at some other block on the disk.  If that block happens to be
	   in a good file, the blocks in the rest of that file get deallocated!
	   The correct way to get rid of unclosed files is to do a validate on
	   the disk.  The validate command clears the BAM, then goes through
	   all files on the disk, following the pointers and allocating all
	   blocks it finds.

	2) If the disk fills up while you're writing a file, the BAM also
	   gets wasted.  I don't recall the details, but they basically said
	   that you had better make sure there's enough space on the disk
	   before you write.  They included a sample program that showed how
	   to check the free space before each write.  Pretty messy.
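
To make situation 1 concrete, here is a minimal sketch in Python of a toy
disk. It is not the 1541's real on-disk layout: the block numbers, the
`links` table, and the `scratch`/`validate` helpers are all made up for
illustration, and the model assumes validate simply ignores unclosed files
when it rebuilds the free list.

```python
# Toy model only: each block stores the number of the next block in its
# file, or None at end-of-file. This is NOT the real 1541 sector format.

def follow_chain(links, start):
    """Return the blocks reached from `start` by following link pointers."""
    seen = []
    block = start
    while block is not None and block not in seen:
        seen.append(block)
        block = links.get(block)
    return seen

def scratch(links, free, start):
    """Free every block on the chain, trusting each pointer blindly."""
    for block in follow_chain(links, start):
        free.add(block)

def validate(links, directory, all_blocks):
    """Rebuild the free set from nothing, walking only closed files."""
    used = set()
    for name, (start, closed) in directory.items():
        if closed:
            used.update(follow_chain(links, start))
    return set(all_blocks) - used

# GOOD is a closed file on blocks 3 -> 4 -> 5.
# BAD is unclosed on blocks 0 -> 1 -> 2; block 2 was never written, and
# its garbage pointer happens to point at block 4, inside GOOD.
links = {0: 1, 1: 2, 2: 4, 3: 4, 4: 5, 5: None}
directory = {"GOOD": (3, True), "BAD": (0, False)}

free = {6, 7, 8, 9}
scratch(links, free, 0)
print("free after scratching BAD:", sorted(free))

rebuilt = validate(links, directory, range(10))
print("free after validate instead:", sorted(rebuilt))
```

After the scratch, blocks 4 and 5 are marked free even though GOOD still
uses them, so the next save can land on top of GOOD. The validate pass never
follows the garbage pointer, so GOOD's blocks stay allocated.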

>	I think the problem with save-replace on the 1541 only occurs
>	when a disk is changed, i.e. when you are transferring
>	a program from one disk to another.
>	The DOS in the 1541 fails to read the new BAM for the second disk
>	even if its ID is different!!!
>	Therefore, when a save is attempted, the DOS allocates blocks that were
>	free on the first disk, overwriting data wherever those blocks are
>	actually in use on the second disk.

This also sounds plausible to me.  The Datamost book mentioned that
there were times when the DOS failed to read the BAM for the new disk,
although I can't recall if they mentioned any specific circumstances
when this happened.  For this reason, every sample program in the book
begins with an Initialize command just in case.  They say it's a good
habit to get into if you like to hold onto your data.

Based on the discussions in this book, I have decided that save-replace
is safe to use as long as you're careful.  The times I have lost data
because of the save-replace "bug" were after I had scratched an unclosed
file (I didn't know what that asterisk was, but a scratch seemed to
get rid of it ok!!).  After a year of abstention, I began using
save-replace again several weeks ago, with no ill effects so far.

If anyone finds other situations where you can corrupt a disk,
please post them.
-- 
Alan Bland
{ihnp4, allegra}!druxp!mab
AT&T Information Systems Labs, Denver

miller@uiucdcsb.UUCP (01/12/85)

If that were true, then the problems would occur in the data sectors on the
disk as they were overwritten using the old BAM.  Instead, the problem arises
in the directory as the start-of-file pointer gets smashed on an innocent
(and in my cases, directory-contiguous) file.  Furthermore, your hypothesis
would lead to many files all getting smashed at random since another disk's
BAM is being used to allocate sectors.  Instead, only one file (at a time) gets
lost.  I think the jury is still out on the cause of the 1541 @replace bug.
If anyone "really" comes up with the cause, then you should be able to repeat
it by duplicating the configuration.

A. Ray Miller
Univ Illinois

miller@uiucdcsb.UUCP (01/19/85)

Alan,
Although the problems you mentioned are indeed very real, and users should
follow your suggestions, I have lost files in situations different from those
you described.  For my money, I avoid save & replace like the plague.

A. Ray Miller
Univ Illinois

jdr@cmu-cs-speech2.ARPA (Jeff Rosenfeld) (01/23/85)

One might also be careful of the fact that when the drive does a
save-replace, it writes the new file and THEN scratches the old version. If
filling up the disk during a write can foul up the BAM, then a likely explanation
for the bug is that you don't have sufficient space on the disk for TWO
copies of your file. I have not had the problem using the save-replace
command (I use it constantly), but I have lost backup copies of some rather
large files that way.

It may be useful to note that the VALIDATE command not only cleans up
"poisoned" blocks, but also reallocates the disk space so that blocks that
are allocated but unused can be freed for re-use. These unused blocks become
allocated through continued use of the save-replace command. It is therefore
a pretty good idea to VALIDATE your disks every so often.
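
The "room for TWO copies" point can be turned into a quick check. What
follows is a rough sketch in Python, not anything from the drive's ROM or
from the Datamost book. It assumes you already have the 256 bytes of the BAM
sector in hand (from a disk image, say), and it relies on the commonly
documented 1541 layout rather than anything stated in this thread: 254 data
bytes per block, and a four-byte BAM entry per track starting at offset 4,
whose first byte is that track's free-sector count. The function names are
made up for the example.

```python
import math

DATA_BYTES_PER_BLOCK = 254  # assumed: 256-byte sector minus 2-byte link
DIRECTORY_TRACK = 18        # assumed: not counted in "blocks free"
BAM_OFFSET = 4              # assumed: first BAM entry starts at byte 4
TRACKS = 35

def blocks_needed(file_bytes):
    """Blocks a file of this size occupies (assumed to be at least one)."""
    return max(1, math.ceil(file_bytes / DATA_BYTES_PER_BLOCK))

def blocks_free(bam_sector):
    """Sum the per-track free counts, skipping the directory track."""
    total = 0
    for track in range(1, TRACKS + 1):
        if track != DIRECTORY_TRACK:
            total += bam_sector[BAM_OFFSET + 4 * (track - 1)]
    return total

def room_for_replace(bam_sector, new_file_bytes):
    """True if the new copy fits while the old copy is still on disk."""
    return blocks_needed(new_file_bytes) <= blocks_free(bam_sector)

def blank_bam():
    """Build the free counts of a freshly formatted disk, for testing."""
    bam = bytearray(256)
    for track in range(1, TRACKS + 1):
        if track <= 17:
            sectors = 21
        elif track <= 24:
            sectors = 19
        elif track <= 30:
            sectors = 18
        else:
            sectors = 17
        # Track 18 holds the BAM and first directory sector, so 2 are used.
        count = sectors - 2 if track == DIRECTORY_TRACK else sectors
        bam[BAM_OFFSET + 4 * (track - 1)] = count
    return bytes(bam)

bam = blank_bam()
print("blocks free:", blocks_free(bam))
print("20000-byte file needs:", blocks_needed(20000), "blocks")
print("room to save-replace it:", room_for_replace(bam, 20000))
```

The check only looks at the free count, since the old copy's blocks are not
released until the new one is written. It also trusts the BAM it is given,
so it says nothing about a BAM that is stale or already corrupted.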

                                           - Jeff Rosenfeld.
                                             jdr@cmu-cs-speech2.ARPA