andy@cheviot.UUCP (Andy Linton) (03/19/85)
There has been a lot of traffic about UDA-50 devices on the net, and I was confused about bad blocks etc. on them. I asked our DEC Service Engineer for more info and he produced the following:

**************************

ALL YOU WANT TO KNOW ABOUT BAD BLOCK REPLACEMENT AND MORE

Introduction

The purpose of this transmission is to inform the readership of the differences between DEC Standard 144 and DEC Standard 166 disk media.

DEC Standard 144 media: RX01/2, RL01/2, RK05J/F, RK06/7, RP04/5/6/7, R80, RM02/3/5/80.

Above are some of the media that fall into the DEC Standard 144 classification. A general rule of thumb is that any MASSBUS disk media conforms to this standard. The rule may change, but in general it holds true. The above also includes serial and parallel drive subsystems, i.e. the RL01/2, RK06/7, RK05 and RX01/2.

DEC Standard 166 media: RA60/1, RA80/1/2.

Above are some of the media that fall into the DEC Standard 166 classification. A general rule here is: if it plugs into a UDA-50 or a UDA-50 emulator, it's 166 media.

Differences with respect to bad blocking:

Bad blocking, by definition, is the generation, by a software utility, of a file containing information about pattern-sensitive or unreadable areas of the media under test. With the exception of RX and RK05 media, the manufacturer tests the media and creates a manufacturer's bad block area.

One of the major differences between these media is that on 166 media, additions to the manufacturer's area are not allowed, whereas on 144 media they are. Another major difference between these standards is the number of bad blocks allowed: 61 entries (RP06) on 144 vs 17 thousand (RA81) on 166.

The manufacturer's areas themselves differ greatly between these two media. On the RA series this table is known as the Factory Control Table, or FCT. It could loosely be compared with the RP series manufacturer's bad block table, in that both contain bad blocks found during manufacture, but the comparison is misleading.
During the initialisation process on an RPxx pack, the manufacturer's bad block table is read and, depending on the operating system, two separate files are normally generated. During the initialisation process on an RA pack, the FCT is not readable, so we create two files with null entries, assuming our initialisation process doesn't know about 166 media. If we compare what happens on the major operating systems during bad block detection, I hope the reader can make sense of this statement.

Suppose a bad block develops on an initialised RPxx on a running system. The drive subsystem reports a hard ECC error to the operating system, and the operating system normally responds with some number of retries with offsets. If, at the end of the day, the subsystem still reports an ECH (hard, uncorrectable ECC error), an entry is added to a file, let's say badblock.log. The result is twofold: first, the data in that block is lost; second, that block is never used again for write operations. How this is accomplished again depends on the operating system, but in general the bad block files are read at mount time or on detection and stored in memory. In other words, they become a system overhead.

The RAx series differs. On mount the same action occurs as on the RPxx subsystem, but since there are (let's say) no entries, no harm is done. As we write information onto this structure, the RAx microprocessor notes from the target block's header that the block is bad and consults the Re-Vector Control Table (RCT) to find where the data should be written, i.e. where the block has been revectored to. The write is thus revectored without system intervention. The RCT is a direct copy of the FCT; at present neither table is directly accessible to any operating system, only to the applicable engineering diagnostic.
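To make the contrast concrete, here is a minimal Python sketch under stated assumptions: `HostManagedDisk` and `RevectoringDisk` are hypothetical names, blocks are modelled as dictionary entries, and replacement blocks are assumed to live outside the LBN range the host sees. It is not DEC code; it only illustrates where the bad block knowledge lives in each scheme: in host memory for RPxx packs, inside the drive for RA packs.

```python
# Hypothetical sketch: class and method names are invented for illustration.


class HostManagedDisk:
    """RPxx-style (DEC Standard 144): the OS keeps the bad block list."""

    def __init__(self, nblocks, bad_block_file):
        self.nblocks = nblocks
        self.in_use = {}                   # LBN -> data
        # Read at mount time and held in memory: the system overhead.
        self.bad = set(bad_block_file)

    def record_hard_error(self, lbn):
        # Retries still end in ECH: the data is lost and the block is
        # added to the bad block file (e.g. badblock.log).
        self.in_use.pop(lbn, None)
        self.bad.add(lbn)

    def write_new(self, data):
        # The allocator never hands out a block listed as bad.
        for lbn in range(self.nblocks):
            if lbn not in self.bad and lbn not in self.in_use:
                self.in_use[lbn] = data
                return lbn
        raise OSError("no free good blocks")


class RevectoringDisk:
    """RAx-style (DEC Standard 166): the drive hides bad blocks from the OS."""

    def __init__(self, fct):
        # fct maps each factory-found bad LBN to a replacement block outside
        # the LBN range the host sees (an assumption); the RCT copies it.
        self.rct = dict(fct)
        self.media = {}                    # physical address -> data

    def _physical(self, lbn):
        # Membership in the RCT stands in for the "bad" flag in the header;
        # the drive redirects the access without telling the host.
        return self.rct.get(lbn, ("lbn", lbn))

    def write(self, lbn, data):
        self.media[self._physical(lbn)] = data

    def read(self, lbn):
        return self.media[self._physical(lbn)]
```

In the first class every mount pays for loading `bad`, and every allocation consults it. In the second, the host never learns that LBN 5 (say) was moved to a spare block, which is the "without system intervention" behaviour described above.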
If during a read operation the subsystem reports an ECC error and the operating system supports Bad Block Replacement (BBR), the system can use the severity of the reported error (1 to 8 symbol ECC errors) to decide when to revector the block, before the data degrades to unreadable. If the data is found to be unreadable (worst case, ECH), a four-phase process is started:

Phase one: the error, hard or recoverable, is reported via the UDA-50 to the operating system. If the error is hard, or the error limit has been reached, the system starts phase two.

Phase two: recover the data, test the block, and report findings on the suspect block. The data is read and written into a scratch area of the RCT, and a test pattern similar to the read data is written to the block. After a re-read, the error information is passed back to the system: "yes, bad block".

Phase three: the system says go, and the subsystem finds and tests a primary or secondary replacement block, marks the header of the bad block as bad, adds the block to the RCT, and reports any errors or that it has finished.

Phase four: the data is written to the revectored block. If an ECH occurred during the read, good ECC is written but the EDC bit is inverted to notify the system that a forced replacement occurred, i.e. the data had an ECH and must be rewritten or restored from whatever backup media was last used.

Summary

I hope the readership can tell from the above that BBR, if implemented, will protect data to a level not previously thought possible. The drive microcode determines at what error limit BBR kicks off. Note that only the RCT is updated with additional bad blocks, not the FCT. If a reformat is done on the device, the RCT data is zeroed and the FCT information is put back in its place. This implies that any bad blocks added since manufacture will be handed back to your users, unless the formatter itself finds these pattern-sensitive areas. I would recommend a reformat only after gross numbers of revectors due to read/write problems.
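The four phases can be walked through in a toy model. This Python sketch is hypothetical throughout: `ToyDrive`, `bad_block_replacement`, the `BBR_LIMIT` of 6 symbols, the test pattern, and the `FORCED_ERROR` status (standing in for the inverted EDC bit) are all invented for illustration, and the model does not reproduce how the work is actually split between the host, the UDA-50 and the drive. It also assumes, beyond the text above, that a suspect block which passes its test is simply kept in service.

```python
from dataclasses import dataclass, field

ECH = "ECH"                  # hard, uncorrectable ECC error
FORCED = "FORCED_ERROR"      # stands in for the inverted-EDC marker
BBR_LIMIT = 6                # hypothetical limit; really set by drive microcode
TEST_PATTERN = b"\xa5"       # hypothetical pattern used when no data was readable


@dataclass
class ToyDrive:
    """Hypothetical RA-style drive model; not DEC firmware or MSCP."""
    media: dict                # physical address -> data
    errors: dict               # LBN -> error reported on read: symbols or ECH
    defective: set             # physical addresses that fail a write/re-read test
    spares: list               # free replacement blocks, primary first
    rct: dict = field(default_factory=dict)    # bad LBN -> replacement block
    scratch: object = None                      # RCT scratch area
    forced: set = field(default_factory=set)   # LBNs written with inverted EDC

    def phys(self, lbn):
        # The drive, not the host, follows a revector recorded in the RCT.
        return self.rct.get(lbn, ("lbn", lbn))

    def read(self, lbn):
        """Return (data, status); data is None when the read hit an ECH."""
        if lbn in self.forced:
            return self.media.get(self.phys(lbn)), FORCED
        if lbn in self.rct:
            return self.media.get(self.rct[lbn]), None
        err = self.errors.get(lbn)
        return (None if err == ECH else self.media.get(("lbn", lbn))), err

    def write_and_verify(self, addr, data):
        """Write data, re-read it, and report whether the block is good."""
        self.media[addr] = data
        return addr not in self.defective


def bad_block_replacement(drive, lbn, log):
    data, err = drive.read(lbn)
    if err is None or err == FORCED or (err != ECH and err < BBR_LIMIT):
        return data, err                       # below the limit: no BBR
    # Phase 1: the hard or over-limit error is reported to the host.
    log.append(("phase 1", lbn, err))
    # Phase 2: save the data in the RCT scratch area, then test the block.
    drive.scratch = data
    pattern = data if data is not None else TEST_PATTERN
    suspect = ("lbn", lbn)
    if drive.write_and_verify(suspect, pattern):
        log.append(("phase 2", lbn, "block tested good"))
        drive.errors.pop(lbn, None)            # assumption: keep it in service
        target = suspect
    else:
        log.append(("phase 2", lbn, "yes, bad block"))
        # Phase 3: find and test a primary, then secondary, replacement;
        # the RCT entry stands in for marking the bad block's header.
        while True:
            if not drive.spares:
                raise RuntimeError("no replacement blocks left")
            target = drive.spares.pop(0)
            if drive.write_and_verify(target, pattern):
                break
        drive.rct[lbn] = target
        log.append(("phase 3", lbn, target))
    # Phase 4: write the saved data; after an ECH, flag a forced error.
    drive.media[target] = drive.scratch if drive.scratch is not None else b""
    if err == ECH:
        drive.forced.add(lbn)
    log.append(("phase 4", lbn, target))
    return drive.read(lbn)
```

For instance, an ECH on a block whose primary replacement also fails its test ends with the block revectored to the secondary spare, and every later read in the model reports `FORCED_ERROR`, the cue to rewrite the data or restore it from backup.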
If inverted EDCs are a problem, have Engineering write to the customer area using /sec:manual. Your data will be lost, but this action will re-invert the EDCs, and it will not lose the good information held in the RCT.

Regards
Ed Merrill
Country Support Engineer
Internal Consultancy Group
Basingstoke, England
44-256-56101 ext 3778

******************

I hope this is of some interest to those of you who have problems with RA81s (as I do).

Andy

Aindrias Mac Giolla Fhionntain - Computing Lab., U of Newcastle upon Tyne
ARPA : andy%cheviot%newcastle.mailnet@MIT-MULTICS.ARPA
UUCP : UK!ukc!cheviot!andy
*** Ni fui moran beagan d'aon rud, ach is fui moran beagan ceille. ***
(A little of anything is not worth much, but a little sense is worth a great deal.)