[unix-pc.general] perl-based UNIXPC disk error message browser

mkp@taqwa.UUCP (Michael K. Peterson) (06/25/89)

I suppose others have written awk scripts or whatever to glean clues from
the unix.log file, but I haven't seen anything posted, so here goes.

I put some of the high points from John Milton's "Hardware Notes #13" 
into a perl script, and it made deciphering the hard disk error messages
in my unix.log file much easier. John's discussion of the WD1010 error
register is included in the comment block at the beginning of the script.
Also, there are three variables, $HEADS, $BADLIST, and $SWAPSIZE, that you'll 
want to change for your particular disk.  (No provision as yet for
more than one drive.)
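
For readers without perl, the arithmetic can be sketched in C. This is
only an illustration, not part of the posted script: the function names
chs_to_block and ef_flags are made up here, the constants simply mirror
the script's defaults, and the comments on what BADLIST and SWAPSIZE
represent are an assumption (the script does not spell it out).

```c
#include <string.h>

/* Assumed drive parameters, copied from the script's defaults. */
#define HEADS             8     /* heads on the drive */
#define BADLIST           64    /* presumably blocks ahead of swap */
#define SWAPSIZE          5000  /* presumably swap size in blocks */
#define SECTORS_PER_TRACK 16    /* 512-byte sectors per track */
#define SECTORS_PER_BLOCK 2     /* 1024-byte blocks */

/* Hypothetical helper: cylinder/head/sector -> block number relative
 * to the start of the filesystem partition. */
long chs_to_block(int cyl, int head, int sector)
{
    long abs_sector = ((long)cyl * HEADS + head) * SECTORS_PER_TRACK + sector;

    return abs_sector / SECTORS_PER_BLOCK - (BADLIST + SWAPSIZE);
}

/* Hypothetical helper: append one tag per WD1010 error-register bit
 * that is set.  buf must have room for at least 32 bytes. */
char *ef_flags(int ef, char *buf)
{
    buf[0] = '\0';
    if (ef & 0x40) strcat(buf, "<CRC>");   /* bit 6: CRC in data field */
    if (ef & 0x10) strcat(buf, "<ID>");    /* bit 4: ID not found */
    if (ef & 0x04) strcat(buf, "<ACMD>");  /* bit 2: aborted command */
    if (ef & 0x02) strcat(buf, "<TZE>");   /* bit 1: track zero error */
    if (ef & 0x01) strcat(buf, "<DAM>");   /* bit 0: address mark not found */
    return buf;
}
```

With the example log line from the script's comments (EF:10, cylinder
0x01D4 = 468, head 3, sector 0), chs_to_block(468, 3, 0) gives 24912 and
ef_flags(0x10, buf) gives "<ID>".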

If you have perl, you might find this useful; otherwise, just hit
the 'n' key...


Mike Peterson                         Domain: mkp@mti.com
Micro Technology, Inc.                  UUCP: uunet!mti!mkp
5065 E. Hunter Ave., Anaheim, CA 92807  home: ...!{mti,hacgate}!taqwa!mkp


#! /bin/sh
# This is a shell archive.  Remove anything before this line, then unpack
# it by saving it into a file and typing "sh file".  To overwrite existing
# files, type "sh file -c".  You can also feed this as standard input via
# unshar, or by typing "sh <file", e.g..  If this archive is complete, you
# will see the following message at the end:
#		"End of shell archive."
# Contents:  badblocks
# Wrapped by mkp@taqwa on Sat Jun 24 11:09:08 1989
PATH=/bin:/usr/bin:/usr/ucb ; export PATH
if test -f 'badblocks' -a "${1}" != "-c" ; then 
  echo shar: Will not clobber existing file \"'badblocks'\"
else
echo shar: Extracting \"'badblocks'\" \(3457 characters\)
sed "s/^X//" >'badblocks' <<'END_OF_FILE'
Xeval "exec /usr/bin/perl -S $0 $*"
X	if $running_under_some_shell;
X
X#
X# perl hack for interpreting UNIXPC hard disk error messages
X#
X# example log line:
X#
X# pid 0: #HDERR ST:51 EF:10 CL:42D4 CH:4201 SN:4200 SC:4202 \
X# SDH:4223 DMACNT:FFFF DCRREG:93 MCRREG:9100 Fri May 19 19:25:14 1989
X#
X# Calculations courtesy of John B. Milton IV, "Hardware Notes # 13"
X# jbm@uncle.uucp 
X#
X# sez Milton:
X#
X# EF: The "error register" from the WD1010
X#   Bit 7 Bad Block Detect. From what I can tell about how things are done on
X#     our systems, this feature is not used. We use a direct mapping method where
X#     the position of bad blocks is determined by the bad block table. If this
X#     gets turned on, it is some kind of glitch on the disk.
X#   Bit 6 CRC Data Field. This one deserves a direct quote:
X#       "This bit is set when a CRC error occures in the data
X#        field. With Retry enabled, ten more attempts are made
X#        to read the sector correctly. If none of these attempts
X#        are successful, the Error Status is set also (bit 0 in
X#        the Status Register). If one of the attempts is suc-
X#        cessful, this bit remains set to inform the Host that
X#        a marginal condition exists. However, the Error Status
X#        bit is not set. Even if errors exist, the data can be read."
X#     On our machines, if bits 7, 5, 1 or 0 are set or if the error register is
X#     not zero!, or if there was DMA trouble, an HDERR message will be printed.
X#     This is extremely good. It means every time there is the slightest flicker
X#     in the data, you will get an error message. If you get only one, the error
X#     is probably transient and does not mean anything. You should NOT try to
X#     lock out the block! If you get a bunch of CRC errors, but a good read,
X#     this is probably a weak spot and should be locked out.
X#   Bit 5 Reserved. Always zero.
X#   Bit 4 ID not found. Like CRC, this bit is set when the ID field for the
X#     requested sector can not be found, or has a bad CRC.
X#   Bit 3 Reserved. Always zero.
X#   Bit 2 Aborted Command. Should never happen on our system. If you get it, it
X#     probably means BAD power line trouble.
X#   Bit 1 Track Zero Error. This is very bad, and usually indicates a very bad
X#     hardware failure in the drive, so you'll never see it until you get a
X#     second hard drive on your system :)
X#   Bit 0 Data Address Mark Not Found. Yet another thing not found.
X#
X
X# The following are drive specific; set for your drive.
X# You can get these values by running:  iv -tv /dev/rfp000
X# Don't screw up.
X$HEADS = 8;
X$BADLIST = 64;
X$SWAPSIZE = 5000;
X
Xopen(unixlog, 'grep HDERR /usr/adm/unix.log|') || 
X	die('cannot open & sort unix.log');
X
Xwhile(<unixlog>)
X{
X	if(/EF:(..) CL:..(..) CH:..(..) SN:..(..) SC:..(..) SDH:...(.)/)
X	{
X		$err = hex($1);
X		$locyl = hex($2);
X		$hicyl = hex($3);
X		$secnum = hex($4);
X		$count = hex($5);
X		$head = hex($6)&0x7;
X		$cyl = $hicyl*256 + $locyl;
X
X		$sector = (((($cyl)*$HEADS)+($head))*16) + $secnum;
X		$absblock = $sector/2;
X		$block = $absblock - ($BADLIST + $SWAPSIZE);
X		printf("    %4d/%d/%d\t%6d\t", $cyl, $head, $secnum, $block);
X		if($err&0x040)
X		{
X			printf("<CRC>");
X		}
X		if($err&0x10)
X		{
X			printf("<ID>");
X		}
X		if($err&0x4)
X		{
X			printf("<ACMD>");
X		}
X		if($err&0x2)
X		{
X			printf("<TZE>");
X		}
X		if($err&0x1)
X		{
X			printf("<DAM>");
X		}
X		if(/MCRREG:.... (........................)/)
X		{
X			printf("\t%s", $1);
X		}
X		printf("\n");
X
X	}
X}
END_OF_FILE
if test 3457 -ne `wc -c <'badblocks'`; then
    echo shar: \"'badblocks'\" unpacked with wrong size!
fi
chmod +x 'badblocks'
# end of 'badblocks'
fi
echo shar: End of shell archive.
exit 0
-- 
Mike Peterson                       Internet: mkp@mti.com
Micro Technology, Inc.                  UUCP: uunet!mti!mkp
5065 E. Hunter Ave., Anaheim, CA 92807  home: ...!hacgate!taqwa!mkp
              		

jbm@uncle.UUCP (John B. Milton) (06/27/89)

In article <19@taqwa.UUCP> mkp@taqwa.UUCP (Michael K. Peterson) writes:
>
>I suppose others have written awk scripts or whatever to glean clues from
>the unix.log file, but I haven't seen anything posted, so here goes.
>
>I put some of the high points from John Milton's "Hardware Notes #13" 
>into a perl script, and it made deciphering the hard disk error messages
>in my unix.log file much easier. John's discussion of the WD1010 error
>register is included in the comment block at the beginning of the script.
>Also, there are three variables, $HEADS, $BADLIST, and $SWAPSIZE, that you'll 
>want to change for your particular disk.  (No provision as yet for
>more than one drive.)

Ooopss. I guess I should have posted my program when I got it working :)
This program does the same thing. It is written in C, and checks the VHB to
get the sizes of the partitions and the number of heads. It prints the offset
of each bad block into each partition. Use awk post-processing if you want to
automate its use, or just hack the code. I use the output of this with bf and
ncheck to track down the file in which a bad spot appears. I have not yet
integrated this into my nightly unix.log backup routine.
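
As a rough sketch of the two steps described above (pulling drive and
head out of the SDH value, then turning an absolute disk block into an
offset within a partition), something like the following would do. The
names decode_sdh and partition_offset are invented for illustration, and
the start track has to come from the VHB partition table; the value 9
used in the note below is hypothetical.

```c
#define BLKPERTRK 8  /* 16 512-byte sectors per track = 8 1024-byte blocks */

struct sdh_fields {
    int drive;
    int head;
};

/* The low byte of SDH is laid out as ESSDDHHH: drive select in bits
 * 4-3, head in bits 2-0.  Higher bits in the logged value are ignored. */
struct sdh_fields decode_sdh(int sdh)
{
    struct sdh_fields f;

    f.drive = (sdh & 0x18) >> 3;
    f.head = sdh & 0x07;
    return f;
}

/* Offset of an absolute disk block within a partition that begins at
 * start_track (a track count, as stored in the partition table). */
long partition_offset(long disk_block, long start_track)
{
    return disk_block - start_track * BLKPERTRK;
}
```

For SDH:4223 this yields drive 0, head 3. If a partition started at
track 9, disk block 29976 would sit at offset 29904 within it.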

John
---
#! /bin/sh
# This is a shell archive.  Remove anything before this line, then unpack
# it by saving it into a file and typing "sh file".  To overwrite existing
# files, type "sh file -c".  You can also feed this as standard input via
# unshar, or by typing "sh <file", e.g..  If this archive is complete, you
# will see the following message at the end:
#		"End of shell archive."
# Contents:  chs2blk.c
# Wrapped by jbm@uncle on Mon Jun 26 21:39:35 1989
PATH=/bin:/usr/bin:/usr/ucb ; export PATH
if test -f 'chs2blk.c' -a "${1}" != "-c" ; then 
  echo shar: Will not clobber existing file \"'chs2blk.c'\"
else
echo shar: Extracting \"'chs2blk.c'\" \(2355 characters\)
sed "s/^X//" >'chs2blk.c' <<'END_OF_FILE'
X/* vi:set ts=2 sw=2: */
X
X#include <fcntl.h>
X#include <stdio.h>
X#include <string.h>
X#include <sys/gdisk.h>
X
Xextern int errno;
Xextern int optind;
Xextern char *optarg;
X
X#define LINESIZE 200
X#define SCTSPERTRK 4 /* shift factor */
X#define SCTPERBLK 2
X#define BLKPERTRK 8 /* 16 512 bytes sectors is 8 1024 bytes blocks */
X#define PART0 72
X#define PART1 5000
X
Xchar *me,Debug=0;
X
Xstatic void show(dname,f)
Xchar *dname;
XFILE *f;
X{
X	int ST,EF,CL,CH,SN,SC,SDH;
X	int block,cyl,drive,head,i,sector,v;
X	char line[LINESIZE];
X	struct vhbd vhb;
X	struct gdswprt dsk;
X
X	if ((v=open(dname,0))==-1)
X		qperrorf("%s: open device \"%s\"",me,dname);
X	if (read(v,&vhb,sizeof(struct vhbd)) == -1)
X		qperrorf("%s: read VHB from device \"%s\"",me,dname);
X	if (vhb.magic!=VHBMAGIC) {
X		fprintf(stderr,"%s: bad magic in VHB (slice zero!,%x!=%x)\n",
X			me,vhb.magic,VHBMAGIC);
X		exit(1);
X	}
X	while (fgets(line,LINESIZE,f)!=NULL) {
X		if (strncmp(line,"HDERR",5)==0) {
X			sscanf(line,"HDERR ST:%x EF:%x CL:%x CH:%x SN:%x SC:%x SDH:%x",
X				&ST,&EF,&CL,&CH,&SN,&SC,&SDH);
X			/* SDH: ESSDDHHH */
X			drive=(SDH&0x18)>>3;
X			cyl=((CH&0xff)<<8)+(CL&0xff);
X			head=SDH&0x07;
X			sector=SN&0x1f;
X			if (Debug) {
X				*strchr(line,'\n')= '\0';
X				fprintf(stderr,"Found: \"%s\"\n",line);
X				fprintf(stderr,"%x %x %x %x %x %x %x\n",ST,EF,CL,CH,SN,SC,SDH);
X				fprintf(stderr,"Drive: %d, Cylinder: %d, Head: %d, Sector: %d\n",
X					drive,cyl,head,sector);
X			}
X			block=((((cyl*vhb.dsk.heads)+head)<<SCTSPERTRK)+sector)/SCTPERBLK;
X			printf("disk block:%d",block);
X			for (i=1; i<MAXSLICE && vhb.partab[i].sz.strk; i++)
X				printf("; part:%d, block:%d",
X					i,block-vhb.partab[i].sz.strk*BLKPERTRK);
X			putchar('\n');
X			if (Debug)
X				fflush(stdout);
X		}
X	}
X}
X
Xstatic char *usage="Usage: %s drive [errfile]\n";
X
Xint main(argc,argv)
Xint argc;
Xchar *argv[];
X{
X	int opt;
X	FILE *f;
X
X	(me=strrchr(argv[0],'/'))==NULL?me=argv[0]:me++;
X	while ((opt=getopt(argc,argv,"d?"))!=EOF)
X		switch (opt) { /* optarg, optind */
X			case 'd':
X				Debug=1;
X				break;
X			case '?':
X			default:
X				fprintf(stderr,usage,me);
X				exit(1);
X		}
X	if (argc-optind<1 || argc-optind>2) {
X		fprintf(stderr,usage,me);
X		exit(1);
X	}
X	if (argc-optind==1)
X		show(argv[optind],stdin);
X	else
X		if ((f=fopen(argv[optind+1],"r"))==NULL)
X			perrorf("%s: open %s",me,argv[optind+1]);
X		else {
X			show(argv[optind],f);
X			fclose(f);
X		}
X}
END_OF_FILE
if test 2355 -ne `wc -c <'chs2blk.c'`; then
    echo shar: \"'chs2blk.c'\" unpacked with wrong size!
fi
# end of 'chs2blk.c'
fi
echo shar: End of shell archive.
exit 0
-- 
John Bly Milton IV, jbm@uncle.UUCP, n8emr!uncle!jbm@osu-cis.cis.ohio-state.edu
(614) h:294-4823, w:466-9324; N8KSN, AMPR: 44.70.0.52; Don't FLAME, inform!

mkp@taqwa.UUCP (Michael K. Peterson) (06/28/89)

In article <565@uncle.UUCP> jbm@uncle.UUCP (John B. Milton) writes:
>In article <19@taqwa.UUCP> mkp@taqwa.UUCP (Michael K. Peterson) writes:
>>
>>I suppose others have written awk scripts or whatever to glean clues from
>>the unix.log file, but I haven't seen anything posted, so here goes.
>>
>
>Ooopss. I guess I should have posted my program when I got it working :)
>This program does the same thing. 

Well, hey, all I was trying to do was prod Milton into posting his program --
I figured he must have one by now... ;-)
-- 
Mike Peterson                       Internet: mkp@mti.com
Micro Technology, Inc.                  UUCP: uunet!mti!mkp
5065 E. Hunter Ave., Anaheim, CA 92807  home: ...!hacgate!taqwa!mkp