charles@c3pe.UUCP (07/31/86)
In article <826@PUCC.BITNET> D0430@PUCC.BITNET writes:
>In article <147@itcatl.UUCP>, robin@itcatl.UUCP (Robin Cutshaw) writes:
>>I have been getting numerous soft errors on the G partition
>>and a few hard errors there.
>
>We had the same problem with an rd53, lots of soft errors, turning into
>hard errors, running rabads every day to replace it, etc. We finally
>just reformatted the disk and all the problems vanished.

That raises an interesting question: I've noticed a similar phenomenon with
certain 5.25" Winchesters. In our case (Xenix), reformatting destroys special
"bad sector" flags in the address fields of bad blocks. These are used to
construct a map later used by the kernel to make all partitions appear "clean";
if one reformats the disk without noting previously-bad sectors, they may come
back to bite later. Thus, I'm reluctant to tell someone to reformat their disk
unless they're getting "Address Not Found" errors.

But I'm beginning to wonder: after the address marks are written on a disk
during formatting, as the years go by, do they gradually "entrophy"
(atrophy via entropy!), or melt into the noise?
--
_____-__---__-_----_-__-_-_----__-_--_--___-_---__-__-_-_-_--__-_-__--___-_____
-Charles Green at C3 Inc. {{styx!seismo,cvl}!decuac,dolqci}!c3pe!charles
You hear the howling of the Winchester. The voltage spike hits! You crash.
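
The bad-sector map described above can be pictured with a small sketch. This
is not Xenix's actual mechanism (the post gives no details); it assumes a
simple scheme in which flagged sectors are skipped when logical blocks are
assigned, and the function name and the 10-sector disk are hypothetical.

```python
def build_logical_map(total_sectors, bad_sectors):
    """Map logical block number -> physical sector, skipping flagged sectors.

    Illustrative "skip the bad ones" scheme only; real drivers differ.
    """
    bad = set(bad_sectors)
    return [s for s in range(total_sectors) if s not in bad]


# Hypothetical disk: the original format flagged sectors 3 and 7 as bad.
logical = build_logical_map(10, [3, 7])

# A reformat that wipes the flags rebuilds the map with no bad sectors,
# so the previously bad sectors are handed out as ordinary blocks again.
after_reformat = build_logical_map(10, [])
reexposed = sorted(set(after_reformat) - set(logical))

print(logical)    # [0, 1, 2, 4, 5, 6, 8, 9]
print(reexposed)  # [3, 7]
```

Under that assumption, the sectors in `reexposed` are the ones that "may come
back to bite later" unless they were noted before the reformat.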
grr@cbmvax.cbm.UUCP (George Robbins) (08/08/86)
In article <217@c3pe.UUCP> charles@c3pe.UUCP (Charles Green) writes:
>
>But I'm beginning to wonder: after the address marks are written on a disk
>during formatting, as the years go by, do they gradually "entrophy"
>(atrophy via entropy!), or melt into the noise?

Yes, there is such a thing as 'bitrot', but I'm sure you could get quite a bit
of argument going on the subject... The following notions should be considered:

Magnetic storage devices work when a magnetic field from a read/write head
magnetizes the vast majority of magnetic particles in an area on the media
in the same orientation. Not every particle cooperates, and some are likely
to change, as a function of time and temperature.

Also, the head assembly may retain a slight magnetic field, and/or be subject
to leakage currents from the circuitry. This can also encourage particles to
change orientation.

As the number of appropriately oriented particles diminishes, the
signal-to-noise ratio seen by the read circuitry will decrease and you may
eventually see problems.

Note that there are other problems that can cause similar symptoms, such as
media wear, gradual change of drive speed, thermal effects and shifts in
positioner repeatability. Drives that do not move the heads to a parking
position may also suffer occasional glitches on power down.

The problem is not limited to 5.25" drives; I've had problems with 100MB disk
pack drives, where 'read-only' system packs had to be reformatted maybe twice
a year when random errors started to occur. (Note that ECC and/or track
substitution was not involved here.)

Thoughtful comment appreciated...
--
George Robbins - now working with,     uucp: {ihnp4|seismo|caip}!cbmvax!grr
but no way officially representing     arpa: cbmvax!grr@seismo.css.GOV
Commodore, Engineering Department      fone: 215-431-9255 (only by moonlite)
cprice@vianet.UUCP (Charlie Price) (08/08/86)
> -Charles Green at C3 Inc. {{styx!seismo,cvl}!decuac,dolqci}!c3pe!charles
> But I'm beginning to wonder: after the address marks are written on a disk
> during formatting, as the years go by, do they gradually "entrophy"
> (atrophy via entropy!), or melt into the noise?

The answer is -- YES.
Data (and "format info" is just data) *can* degrade on a magnetic disk,
though not just from some random evaporation into the air.

I used to work for Storage Technology Corp (in the very recent past)
and I'm familiar with at least one mechanism for gradually degrading
recorded data on the disk. Of course, Storage Tek makes fairly big
disks (2.5 Gbyte Head Disk Assembly using 14" disks), but the physics
is the same.

In a winchester technology disk you have read/write heads flying
REALLY CLOSE to a disk. What happens if there are any little particles
of gruk in the drive? If it is the right kind of gruk and the right
sized particles, the particle can either provide a material to rub
"under" the head or it can just bang the head around and cause it to
"bounce" and momentarily touch down on the surface. If this is really
bad, you have a crash in the making. If it isn't too bad, the contact
(in the disk business this is called head-disk-interface) maybe knocks
some more particles loose from the disk surface and generates a whole
bunch of short-lived localized heat.

If you heat up a magnetized material above some particular temperature
for the material, called the Curie point, the magnetic domains can move.
Since the media isn't in a strong field here, it will probably
demagnetize. If this happens repeatedly in the same area, the stored
data can actually degrade to the point it can't be read.

Though they believe it had always been going on, Storage Tech only
noticed this behavior with the most recent generation of drive
technology (very low-mass thin-film heads flying REALLY close to the
media surface). [A cleaner clean room eliminated the problem].

A typical cheap winchester is using technology that isn't as prone to
this sort of problem (head flight fairly far away from the disk). If it
weren't build-it-and-ship-it technology the drives would be too
expensive.

Gradually degrading behavior on a disk drive can indicate that it is
gradually getting dirtier (start-stop can kick loose particles).
Reformatting and/or rewriting all data CAN help but clearly doesn't make
the problem go away.
--
Charlie Price   {hao stcvax nbires}!vianet!cprice   (303) 440-0700
ViaNetix, Inc. / 2900 Center Green Ct. South / Boulder, CO 80301
jc@sdcsvax.UUCP (John Cornelius) (08/14/86)
Bits normally do not rot on magnetic media, at least not in the lifetime
of a winchester disk drive. Bits have been known to spread and/or migrate
on reels of magnetic tape that have been stored for long periods of time
(7-10 years) but one would not expect this behaviour on a disk drive.

The most probable causes of 'bit rot' on a disk drive are:

1) Worn or defective erase heads, might be caused by head crashes
   or chronic power off on the disk. When you power down a winchester
   the heads crash and are subject to wear.

2) Worn or defective write heads, they can spread the bits out
   resulting in lower signal/noise ratio.

3) Defective media. The retentivity of the media may not be good
   enough. The tendency of magnetic media is toward a uniform
   polarization. The media is designed to have a half life in the
   tens of years and in some cases hundreds of years. The drive itself
   should disintegrate before the half life is reached. On the other
   hand, nobody's perfect and on occasion less than ideal media winds
   up in disk drives.

4) Impurities in the HDA environment can make the heads less sensitive
   and less precise. This is the infamous problem with the RA-81 where
   glue vaporized inside the HDA and began coating the heads and disks.
   The usually untimely result is a catastrophic head crash but read
   errors often precede the crash so you have some warning.

5) Defective head selection matrix resulting in small write currents
   on unselected heads during writing. This will often be followed by
   a catastrophic failure and hard 'write-fault' or 'head select'
   errors.

The first two of these causes can be avoided by never turning your
winchester off. The last three are harder to avoid but judicious selection
of disk vendors can be a help. Lowest purchase price does not usually have
anything to do with lowest cost to own.

The wisdom of leaving your winchester running, even if the system it is
connected to is not running, cannot be too heavily stressed. Winchesters are
designed for a continuous operating environment, not a sporadic one. There is
a school of thought that being nice to your disk drive involves turning it off
when it is not in use. I recognize that this thinking has some intuitive
basis but it is, alas, quite incorrect.

John Cornelius
aka jc@sdcsvax
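
To put the half-life figures from item 3 in perspective, here is a rough
sketch. It assumes the recorded signal decays as a simple exponential with the
stated half life, which is a simplifying assumption of this example and not
something the post claims; the function name and the 5-year service life are
hypothetical.

```python
def fraction_remaining(years, half_life_years):
    """Fraction of the original signal left after `years`,
    assuming plain exponential decay (an assumption, not a media spec)."""
    return 0.5 ** (years / half_life_years)


# Compare a "tens of years" and a "hundreds of years" half life
# over a hypothetical 5-year drive service life.
for half_life in (20, 100):
    left = fraction_remaining(5, half_life)
    print(f"half life {half_life:3d} y -> {left:.1%} left after 5 y")
```

On this model a well-made medium loses only a modest share of its signal over
a drive's working life, which is consistent with the post's point that the
other four causes are the likelier culprits.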
geoff@desint.UUCP (Geoff Kuenning) (08/16/86)
In article <1978@sdcsvax.UUCP> jc@sdcsvax.UUCP (John Cornelius) writes:
> The wisdom of leaving your winchester running, even if the system it is
> connected to is not running, cannot be too heavily stressed. Winchesters are
> designed for a continuous operating environment, not a sporadic one. There is
> a school of thought that being nice to your disk drive involves turning it off
> when it is not in use. I recognize that this thinking has some intuitive
> basis but it is, alas, quite incorrect.

I wonder if John could give us some references to support this contention.
In particular, one of the failure modes I have seen in Winchesters is
bearing failure. Bearing wear is directly related to on-time, not to
the number of startup/shutdown cycles.

Let's remember that a lot of Winchesters are spec'ed with MTBF's of
10,000 hours or less. There are 8760 hours in a year, so if you leave
your Winchesters on 24 hours a day, you can expect the average one to
fail after about 14 months.
--
Geoff Kuenning
{hplabs,ihnp4}!trwrb!desint!geoff
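
The arithmetic behind the 14-month figure can be checked with a short sketch.
It follows the post's own reading of MTBF as the expected powered-on time to
failure; the function name and the 8-hours-a-day case are hypothetical
additions for comparison.

```python
HOURS_PER_YEAR = 24 * 365  # 8760, the figure used in the post


def months_to_reach_mtbf(mtbf_hours, hours_on_per_day=24.0):
    """Calendar months needed to accumulate `mtbf_hours` of on-time."""
    on_hours_per_month = hours_on_per_day * 365 / 12
    return mtbf_hours / on_hours_per_month


print(HOURS_PER_YEAR)                             # 8760
print(round(months_to_reach_mtbf(10_000), 1))     # 13.7, i.e. about 14 months
print(round(months_to_reach_mtbf(10_000, 8), 1))  # 41.1 at 8 hours a day
```

Counting only on-time, as the post does, running the drive 8 hours a day
stretches the same 10,000 hours over more than three years. The sketch says
nothing about the cost of the extra start/stop cycles.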
bass@dmsd.UUCP (John Bass) (08/18/86)
In article <247@desint.UUCP>, geoff@desint.UUCP (Geoff Kuenning) writes:
>In article <1978@sdcsvax.UUCP> jc@sdcsvax.UUCP (John Cornelius) writes:
>
>>The wisdom of leaving your winchester running, even if the system it is
>>connected to is not running, cannot be too heavily stressed. Winchesters are
>>designed for a continuous operating environment, not a sporadic one. There is
>>a school of thought that being nice to your disk drive involves turning it off
>>when it is not in use. I recognize that this thinking has some intuitive
>>basis but it is, alas, quite incorrect.
>
> I wonder if John could give us some references to support this contention.
> In particular, one of the failure modes I have seen in Winchesters is
> bearing failure. Bearing wear is directly related to on-time, not to
> the number of startup/shutdown cycles.

Sorry, but bearing wear is also a function of the number of cold starts,
running temp, and thermal cycling. On-time is just one component in the
life factor. Furthermore the media/head life is also a function of
start/stops, as is the life of the spindle motor control circuit in most
smaller drives (startup current rush).

>
> Let's remember that a lot of Winchesters are spec'ed with MTBF's of
> 10,000 hours or less. There are 8760 hours in a year, so if you leave
> your Winchesters on 24 hours a day, you can expect the average one to
> fail after about 14 months.
> --

Most vendors don't spec the number of cold start cycles, the number of
host start cycles, or the effects of thermal cycling on life. I don't think
very many drives will run over 1,000 hours of a power/thermal cycling
combination.

I think that a survey of 10 drives under continuous service compared to
10 drives under cycling of 1 hour on/off will result in a VERY skewed
comparison favoring continuous duty when plotted against operating time.
This cycling rate is not out of line, given most desk top micro usage is
for a very short interval.
--
John Bass (DBA:DMS Design)
DMS Design (System Design, Performance and Arch Consultants)
{dual,fortune,polyslo,hpda}!dmsd!bass     (805) 541-1575
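
To see what the proposed 1 hour on/off comparison implies, here is a small
sketch that counts cold starts for a given amount of operating time. It is
bookkeeping only and makes no claim about actual failure rates; the function
names are hypothetical.

```python
import math


def cold_starts(operating_hours, hours_on_per_cycle):
    """Cold starts needed to accumulate `operating_hours` of on-time."""
    return math.ceil(operating_hours / hours_on_per_cycle)


def wall_clock_hours(operating_hours, hours_on, hours_off):
    """Elapsed time needed to reach `operating_hours` under on/off cycling."""
    return cold_starts(operating_hours, hours_on) * (hours_on + hours_off)


# 1,000 operating hours: continuous duty versus 1 hour on / 1 hour off.
print(cold_starts(1000, 1000))         # 1 cold start when run continuously
print(cold_starts(1000, 1))            # 1000 cold starts when cycled
print(wall_clock_hours(1000, 1, 1))    # 2000 elapsed hours when cycled
```

At equal operating time the cycled drives see a thousand times as many cold
starts, which is the variable the post argues the MTBF spec leaves out.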