[comp.unix.wizards] Location of seek pointer after read error?

roy@alanine.phri.nyu.edu (Roy Smith) (07/24/90)

The SunOS-3.5.2 man page for read(2) says:

	On objects capable of seeking, the read starts at a position
	given by the pointer associated with d (see lseek(2)).  Upon
	return from read, the pointer is incremented by  the  number
	of bytes actually read.

Now, if you are reading from a raw disk partition (say /dev/rxy0a) and get
a read error (because, for example, there is a bad block on the disk),
where should the pointer be after the read(2) call returns?  It turns out
that, at least for SunOS-3.5.2, the pointer is incremented, as if the bytes
in the bad block had actually been read.  I would consider this incorrect
behavior.  Do you agree?
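One way to see where the pointer ends up is to ask for it with lseek(fd, 0, SEEK_CUR) on either side of the read. What follows is a minimal sketch in C, assuming a modern POSIX system. read_and_report and offset_demo are hypothetical names made up for illustration. On a regular file it shows the documented rule, including a short read at end of file. Pointed at a raw partition with a bad block, the same helper would show whether the offset moved even though read() returned -1.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read up to len bytes from fd and record the file
   offset just before and just after the call, as reported by
   lseek(fd, 0, SEEK_CUR).  errno from read() is preserved. */
ssize_t read_and_report(int fd, void *buf, size_t len,
                        off_t *before, off_t *after)
{
    assert(before != NULL && after != NULL);
    *before = lseek(fd, 0, SEEK_CUR);
    ssize_t n = read(fd, buf, len);
    int saved_errno = errno;           /* the next lseek may change errno */
    *after = lseek(fd, 0, SEEK_CUR);
    errno = saved_errno;
    return n;
}

/* Hypothetical demo on a 10-byte scratch file: one full 8-byte read,
   then a short read that hits end of file.  Returns 0 on success. */
int offset_demo(void)
{
    char path[] = "/tmp/offsetXXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                      /* file goes away when fd is closed */
    if (write(fd, "0123456789", 10) != 10 || lseek(fd, 0, SEEK_SET) != 0) {
        close(fd);
        return -1;
    }

    char buf[8];
    off_t before, after;
    for (int i = 0; i < 2; i++) {
        ssize_t n = read_and_report(fd, buf, sizeof buf, &before, &after);
        printf("read returned %zd: offset %lld -> %lld\n",
               n, (long long)before, (long long)after);
    }
    close(fd);
    return 0;
}
```

Calling offset_demo() should report the first read returning 8 with the offset moving from 0 to 8. The second, short read should return 2, with the offset moving from 8 to 10: in both cases the offset advances by the bytes actually read.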

This came up when I was trying to recover a disk that had started to go
sour.  I was getting lots of read errors (turned out to be a controller
problem, not a drive problem) and wanted to recover all the data on the
disk.  I played around and discovered that it looked like the errors were
all soft, and that if I just retried them enough times, I would be able to
read everything on the disk.  Just dd'ing the partition to another disk
didn't work, because dd's idea of "conv=noerror" is to just skip the block
and keep going, not to retry it.  What I put together was something to read
each block in turn, retrying every read that failed as many times as needed
to get an error-free read, and then writing the block to another disk.
After each read that failed, I had to do a seek to back up a block;
otherwise I got the next block, not a retry of the one that failed.
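The recovery loop described above can be sketched roughly as follows. This is not Roy's program. It is a minimal C sketch, assuming the caller chooses the block size and the retry limit, and copy_with_retry is a hypothetical name. Instead of backing up one block after a failure, it seeks to the absolute offset of the block before every attempt. That way it works whether or not the kernel advanced the pointer on the failed read. Treating only EIO and EINTR as worth retrying is also an assumption.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical sketch: copy in_fd to out_fd in blocks of bs bytes,
 * retrying each failing block up to max_tries times.  Block i is always
 * read from the absolute offset i * bs, so the code never relies on where
 * the system left the file offset after a failed read().
 *
 * Returns the number of blocks copied, or -1 if a block still failed
 * after max_tries attempts, or if a seek or write failed.
 */
long copy_with_retry(int in_fd, int out_fd, size_t bs, int max_tries)
{
    assert(bs > 0 && max_tries > 0);
    char *buf = malloc(bs);
    if (buf == NULL)
        return -1;

    long blocks = 0;
    for (;;) {
        off_t where = (off_t)blocks * (off_t)bs;
        ssize_t n = -1;

        for (int tries = 0; tries < max_tries; tries++) {
            /* Position explicitly before every attempt, first or retry. */
            if (lseek(in_fd, where, SEEK_SET) == (off_t)-1)
                goto fail;
            n = read(in_fd, buf, bs);
            if (n >= 0)
                break;                  /* clean read, or EOF if n == 0 */
            if (errno != EIO && errno != EINTR)
                goto fail;              /* assumed not worth retrying */
        }
        if (n < 0)
            goto fail;                  /* gave up on this block */
        if (n == 0)
            break;                      /* end of input */

        if (write(out_fd, buf, (size_t)n) != n)
            goto fail;
        blocks++;
        if ((size_t)n < bs)
            break;                      /* short final block: stop here */
    }
    free(buf);
    return blocks;

fail:
    free(buf);
    return -1;
}
```

On a real disk the descriptors would be the raw source partition and the destination. bs would normally be a multiple of the device's sector size, since raw devices typically require sector-sized transfers, though nothing here checks that.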
--
Roy Smith, Public Health Research Institute
455 First Avenue, New York, NY 10016
roy@alanine.phri.nyu.edu -OR- {att,cmcl2,rutgers,hombre}!phri!roy
"Arcane?  Did you say arcane?  It wouldn't be Unix if it wasn't arcane!"

domo@tsa.co.uk (Dominic Dunlop) (07/25/90)

[Moderator: please cross-post to comp.unix.wizards -- or let me know that
you won't cross-post to unmoderated groups]

[I prefer not to cross-post, but I sometimes do so if the number of
newsgroups is small, the subject matter is appropriate, and especially
if there's a Followup-To.  -mod]

In article <1990Jul23.171022.17798@phri.nyu.edu> roy@alanine.phri.nyu.edu
(Roy Smith) writes:
>The SunOS-3.5.2 man page for read(2) says:
>
>	On objects capable of seeking, the read starts at a position
>	given by the pointer associated with d (see lseek(2)).  Upon
>	return from read, the pointer is incremented by  the  number
>	of bytes actually read.
>
Ah.  Isn't this interesting?  Here's what POSIX.1 (ANSI/IEEE Std.
1003.1:1988) has to say:

	On a regular file or other file capable of seeking, read() shall
	start at a position in the file given by the file offset associated
	with fildes.  Before successful return from read(), the file
	offset shall be incremented by the number of bytes actually read.

>Now, if you are reading from a raw disk partition (say /dev/rxy0a) and get
>a read error (because, for example, there is a bad block on the disk),
>where should the pointer be after the read(2) call returns?  It turns out
>that, at least for SunOS-3.5.2, the pointer is incremented, as if the bytes
>in the bad block had actually been read.  I would consider this incorrect
>behavior.  Do you agree?
>
Looking at the tighter and arguably sneakier wording of the standard,
which only promises to advance the offset "before successful return",
it appears that all bets are off as to the value of the file offset
after an error.  Sure enough, the rationale says:

	The standard does not specify the value of the file offset after an
	error is returned; there are too many cases.  For programming
	errors, such as [EBADF], the concept is meaningless since no file
	is involved.  For errors that are detected immediately, such as
	[EAGAIN], clearly the pointer should not change.  After an
	interrupt or hardware error, however, an updated value would be
	very useful, and this is the behavior of many implementations.

	References to actions taken on an ``unrecoverable error'' have been
	removed [from the standard].  It is considered beyond the scope of
	this standard to describe what happens in the case of hardware
	errors.

So, you'll be nonplussed to learn that SunOS' behaviour, which I agree is
less useful than it could be, is POSIX-conformant.
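Since the standard leaves the offset unspecified after an error, a program that wants to retry portably can note the offset before each read and put it back itself if the read fails. What follows is a minimal C sketch, assuming nothing beyond read() and lseek(). read_restore_on_error is a hypothetical name. On descriptors that cannot seek, such as pipes, it falls back to a plain read().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical wrapper: behaves like read(2), but if the read fails on a
 * seekable descriptor it restores the file offset to where it was before
 * the call, so a retry rereads the same data regardless of what the
 * system did to the offset.  errno from the failed read() is preserved.
 */
ssize_t read_restore_on_error(int fd, void *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    off_t start = lseek(fd, 0, SEEK_CUR);   /* -1 if fd is not seekable */
    ssize_t n = read(fd, buf, len);
    if (n < 0 && start != (off_t)-1) {
        int saved_errno = errno;            /* lseek may clobber errno */
        (void)lseek(fd, start, SEEK_SET);
        errno = saved_errno;
    }
    return n;
}
```

Later revisions of POSIX also added pread(), which takes the offset as an argument and leaves the file offset untouched. That sidesteps the question entirely.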

>[Description of a program which repeatedly seeked back to the start of
>each failing block, and so eventually recovered the data despite soft
>errors, deleted.]

Should Sun wish to modify their drivers so that the file pointer points to
the start of a failing block after an error, that behaviour too would be
POSIX-conformant.  You can't legislate for everything...

>"Arcane?  Did you say arcane?  It wouldn't be Unix if it wasn't arcane!"

Wouldn't be POSIX either...
-- 
Dominic Dunlop

Volume-Number: Volume 20, Number 144