[comp.dcom.lans] Lingering floppy drive directories problem solved!!!

ugogan@ecsvax.UUCP (Jim Gogan) (02/15/88)

Last week I posted a request for help on the following problem:
>We (University of North Carolina at Chapel Hill) maintain a number of
>networked public microcomputer labs on campus. 
>For the past week, we have been seeing random occurrences of floppy disk
>directories being overwritten with the directory of the previous user at
>that workstation.

After I posted that item, we soon saw the problem spreading to
stand-alone PCs, with similar file names showing up at the beginning of
the directories on these "scrambled" diskettes.  After close examination of
several of the diskettes, we managed to identify the problem and put a
"fix" in place.  

Before I identify the problem, however, let me first thank all those who
shared possible solutions with me.  (A summary of the replies follows our
solution.)  Seems this type of problem has been encountered elsewhere,
although the cause of our problem did not seem to be related to these
other cases.  On the one hand, it's great to see the level of support
that we can all provide to each other; however, on the other hand, now
I get to worry about when we'll start seeing all these other causes!!!

And now the cause of our problem:
The first byte in the File Allocation Table (FAT) on a DOS-formatted
diskette describes the type of diskette.  Normally, for double-sided 5.25
inch diskettes, the value of this byte is "FD" (for nine-sector
diskettes), while for hard disks, this ID byte is "F8".  Apparently,
someone decided they'd be able to take better advantage of disk caching
capabilities if they "tricked" DOS into thinking their floppy was a hard
disk and patched that FAT ID byte with an "F8" value.  (Inasmuch as BOTH
copies of the FAT on these diskettes had an "F8", I don't think it was
caused by a random disk error.)
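
To make the byte in question concrete, here is a minimal sketch (not the tool we used) that pulls the media descriptor out of the BIOS parameter block and the ID byte out of each FAT copy of a raw disk image.  It assumes the image starts with a boot sector carrying a valid DOS 2.0-style BPB; the function names and the synthetic 360K image are made up for illustration.

```python
import struct

def blank_360k_image(fat_id=0xFD):
    """Build a synthetic 360K image (illustrative fixture, not a real disk)."""
    image = bytearray(720 * 512)
    # BPB fields from offset 11: bytes/sector, sectors/cluster, reserved
    # sectors, FAT copies, root entries, total sectors, media byte,
    # sectors per FAT.
    struct.pack_into("<HBHBHHBH", image, 11, 512, 2, 1, 2, 112, 720, 0xFD, 2)
    for fat_start in (1 * 512, 3 * 512):   # FAT copies begin at sectors 1 and 3
        image[fat_start:fat_start + 3] = bytes([fat_id, 0xFF, 0xFF])
    return image

def read_media_ids(image):
    """Return (BPB media descriptor, [first byte of each FAT copy])."""
    bytes_per_sector, = struct.unpack_from("<H", image, 11)
    reserved_sectors, = struct.unpack_from("<H", image, 14)
    fat_count = image[16]
    bpb_media = image[21]
    sectors_per_fat, = struct.unpack_from("<H", image, 22)
    fat_ids = []
    for i in range(fat_count):
        offset = (reserved_sectors + i * sectors_per_fat) * bytes_per_sector
        fat_ids.append(image[offset])
    return bpb_media, fat_ids

if __name__ == "__main__":
    media, ids = read_media_ids(blank_360k_image(fat_id=0xF8))
    print("BPB media: %02X, FAT IDs: %s" % (media, ["%02X" % b for b in ids]))
```

A patched diskette of the kind described above would show "F8" in both FAT copies while the BPB still reports a floppy type.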

As you can probably guess by now, that "someone" brought their disk into one
of our labs to work on and the directory and FAT on that diskette wound
up getting buffered in memory (with DOS thinking it was a hard disk -- a
single sided hard disk!).  They would quit an application, get back to
the network menu, take their disk, and the next person to use that machine
would get that directory and FAT written on to their diskette as the disk
buffers were refreshed.  Now their FAT has an "F8" value in the first
byte and so it spread!!

The quick fix turned out to be upgrading to DOS 3.2 or 3.3 in those labs
and in our stand-alone PC clusters.  We had found that this "scrambling
disease" seemed to only occur in the labs that were running under DOS
3.1.  Turns out starting with DOS 3.2, DOS routines used the info in the
BIOS parameter block to determine media types instead of this FAT ID
byte.  

We also found that if we went in with DEBUG or Norton Utilities and
changed the "F8" back to "FD" on these "nasty disks", their directories
were no longer being buffered, thus losing their contagiousness.  (Although
CHKDSK indicated that these disks were still pretty screwed up, our User
Service folks were able to recover the majority of data on most of the 
disks that were brought to them for repair.)  

In order to "decontaminate" the unknown number of "scrambled hard
floppies" that were now out there, we also wrote a program that checks
the FAT ID; if it contains an "F8" value for a floppy disk, the program
replaces it with the value from the BIOS parameter block, locks up the PC
(forcing a reboot), and tells the user their disk has been modified,
directing them to User Services for any necessary data recovery.
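
The check-and-replace step might look roughly like the following sketch.  This is not our actual program: it works on a raw disk image held in memory rather than on a drive, it assumes the BPB on the diskette is intact, and it leaves out the lock-up and the message to the user.  The names decontaminate and fat_offsets are hypothetical.

```python
import struct

HARD_DISK_ID = 0xF8   # FAT ID byte that marks a fixed disk

def fat_offsets(image):
    """Byte offsets of each FAT copy, computed from the BPB fields."""
    bytes_per_sector, = struct.unpack_from("<H", image, 11)
    reserved_sectors, = struct.unpack_from("<H", image, 14)
    fat_count = image[16]
    sectors_per_fat, = struct.unpack_from("<H", image, 22)
    return [(reserved_sectors + i * sectors_per_fat) * bytes_per_sector
            for i in range(fat_count)]

def decontaminate(image):
    """Replace an F8 FAT ID with the BPB media byte; return True if patched.

    `image` must be a mutable bytearray holding a floppy image.
    """
    bpb_media = image[21]
    if bpb_media == HARD_DISK_ID:
        # The BPB itself claims a fixed disk, so there is no trustworthy
        # value to restore from; leave the image alone.
        return False
    patched = False
    for offset in fat_offsets(image):
        if image[offset] == HARD_DISK_ID:
            image[offset] = bpb_media
            patched = True
    return patched
```

Note that this only resets the ID byte.  Any directory or FAT damage already done is still there, which is why the affected users were sent on for data recovery.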

If you are using any type of "disk speedup" program that
uses this method for "improving" floppy drive access, please be sure
you're aware of the consequences of your actions.  When people share
microcomputers, everyone can be affected by your "personal computer"
activities!

Thanks again to everyone who suggested other possibilities.  A summary of
these replies follows.

-- Jim Gogan (ugogan@ecsvax)
   University of North Carolina at Chapel Hill
   Microcomputing Support Center

===================================
Responses of other possible causes:

"I have just seen something very similar to what you describe,
including being able to get a directory with the door open.  This was
a high-density drive on an AT clone.  I know that MS-DOS times out
(clock-time) its own disk cache on low-density drives and never times
out its cache on fixed disks.  My current theory is that on disks that
have a disk-change indicator line (e.g., high-density disks), MS-DOS
depends completely on getting the disk-change error from the BIOS rather
than a timeout.  Should the disk-change line fail or should the machine
miss the interrupt (unlikely), then MS-DOS may well assume that the
disk hasn't changed.  I would suggest, then, that you have a problem in
the drive or controller hardware."
======
"I have also run into the PC/Net problems you have mentioned.
We are running mostly PC-1's with the upgrade bios installed
and memory expansion to a full 640K.  Also, most of our systems have
the original version of the PC Net card in them.  We have found our
problems with several pieces of software including Modula2.

To date I have not been able to get a firm duplication on a
problem, even after running the suspect unit under looped IBM rev 2.20
diagnostics.  I have reached the point where I finally changed out both
drives and controller card in one machine.  However, I still doubt
that the problem was "fixed".  The machine in question was taken off
the net before the problem had a chance to re-occur.

My thought, as our problems seemed to be occurring with a class
in languages, is that someone may have inadvertently done something in
a program, but I have no way to confirm this."
======
"What you describe is the standard MS-DOS behavior when a floppy disk is changed
while a file is open. DOS caches directory sectors and flushes the cache
whenever there aren't any open files on the device.

The problem you describe could result like this...

First user comes in, runs an application, saves a file to a floppy disk.
He takes the disk out, but leaves the application running. The next user 
comes along, puts disk in drive, then notices application running and shuts 
it down.  At that point the cached directory is written out.

Be careful of Word Perfect. It uses a swap file and a control file that could
cause exactly this problem if these files are on the floppy."
======
"I suspect the network is not to blame; it is most likely
a flaky clock or DMA chip in that particular PC (or less likely, the floppy
disk controller).  I have been able to produce such symptoms on a PC when
installing a speedup kit (to 8MHz) on a machine that was using a device driver
for disks (QuadDrive, to be exact).  All is fine at 5MHz or with the driver
removed, but otherwise disks get trashed the way you describe (including the
symptoms with reading directories).

The only reliable DMA/floppy controller test I know of comes with FastBack,
when you run its installation program.  Why don't you try that as a
start?"
======
"I have had a similar problem on a standalone system, but I can't say
what produced it, nor whether DIR didn't cause floppy drive activity.

However, one circumstance that might make DOS forget to re-read the
floppy is if there is an open file left on the floppy drive.
Perhaps some program isn't closing its files properly?

In any case, executing the RESET FILESYSTEM system call whenever a new
user sits down at the PC should clear all caches, and would probably fix 
the problem.
A debug script to create a program which does this is
        a 100
        mov ah, 0d
        int 21
        mov ax, 4c00
        int 21

        rcx
        10
        w
        q
Put this in "script.txt" (don't forget the blank line), then type
        debug resetdos.com < script.txt
This should create "resetdos.com"."
======
"We have seen the same problem here.  We are using 3Com 3+ software
running on Ethernet, and have seen it across two labs running on IBM and
Zenith hardware.  The culprit, as far as we can tell, is Logitech Modula-2.
Are you using this package?  In every case that had the problem, the user
was using the Logitech Modula-2 package."
======
"Your problem with directories being overwritten in combination with
a reference to MS Word rang a bell.

You may want to check if people are using Word's Library run command.
I have experienced trashed files when using this command.  It seems
that Word does some odd things with file handles and doesn't bother
to close files when you execute another program from within Word.

I don't recall the exact situation, but the results were that the
program that was run from within Word had its data redirected to
unexpected places.  It took quite a search to find where the data
went.  Data was written to files that had been previously opened
for some unrelated purpose and then closed (or at least would normally
have been closed).

I would suggest that you make a test floppy with Word and whatever
typical contents you may normally have.  Clear the archive bits on every
file before using the test disk.  Make a copy of the test disk to use as
a reference disk.  Run Word and execute a few programs using the Library
Run command.

Check after each run to see if any files have been changed, i.e., check
for set archive bits.  After testing is complete you may also want to do
a file by file compare or a disk compare (against the unused reference
disk) to verify that nothing was changed that shouldn't have been."
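
The file-by-file compare suggested in this reply could be sketched as below, assuming the test disk and the reference disk have each been copied into a directory.  The function names are invented for illustration, and the sketch compares content hashes rather than archive bits.

```python
import hashlib
from pathlib import Path

def file_digests(root):
    """Map each file's path (relative to root) to a SHA-256 of its contents."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }

def changed_files(reference, test):
    """List files that were added, removed, or altered relative to the reference."""
    ref = file_digests(reference)
    tst = file_digests(test)
    return sorted(name for name in ref.keys() | tst.keys()
                  if ref.get(name) != tst.get(name))
```

An empty list after a Library Run session would suggest nothing on the test disk was touched; any name that does appear is a file worth inspecting.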
======
"I had a similar experience with a fast serial network for
the IBM PC.  It was PC Easy Net, which runs at about 120,000 bps.
The problem turned out to be a conflict of memory resident
programs.  The programs involved were PC Easy Net, CED, a print
spooler from Everex (we were using the Everex EMS board and its
software), Sidekick, and some other program I can't remember.

But the key thing is that when I would be on drive C, for
instance, and enter DIR B:, I would get a listing of the files
on drive A, or sometimes drive C!  Copying files had the same
bizarre results -- files ended up in the wrong place, or got
mangled.  Once, when I was running VTERM, a communications
package that can stay resident, and I tried to download a file to
drive C, it ended up on drive B.

The problems did not happen if I only had the Easy Net resident,
and no other resident programs.  Even VTERM caused problems.
I realize our two problems are not the same, but the similarity
of symptoms is interesting.  I too had overwritten directories,
but only on drives A and B, never C.  I suspect that your
problem is some new memory resident program that interferes with
the PC network program."
======
"The behavior that you are seeing will occur nearly any time that the previous
user removed the floppy while at least one file was open for write.  This 
assumes that you are using XT class machines, as opposed to ATs.  XTs and
their ilk cannot detect a floppy change.  MS-DOS has heuristics for detecting
floppy change, but they are designed for speed, not safety."

-- 
     Jim Gogan                             mail:ugogan@ecsvax (UUCP/BITNET)
     Microcomputing Support Center
     University of North Carolina at Chapel Hill
     Chapel Hill, NC  27514