[net.unix-wizards] read

stevens@hsi.UUCP (11/08/83)

On a VAX running 4.1bsd I found some code in a tape reading program
that went

	int length;

	length = read(tapefd, buf, 62000);
	if (length < 0)
		/* error */ ...
	if (length != -1)
		length &= 0xffff;

I removed the mask and sure enough, a read of a 3120 byte tape record
returned a length of 65536+3120, hence the mask was required.
I reduced the buffer size and the length to read to 30,000 bytes and
the returned value became 3120, so the mask was not needed.

Anyone know what's going on?  (The tape driver is tm.c.)

	Richard Stevens
	Health Systems International, New Haven, CT
	{ decvax | hao | seismo | sdcsvax } ! kpno ! hsi ! stevens
                                             ihnp4 ! hsi ! stevens

chris@umcp-cs.UUCP (11/13/83)

[Re:  read () from a tape gives ridiculous return value.  Bug is on
4.1BSD and possibly other versions of Unix].

Welcome to the Wonderful World of Weirdness -- tape drives.  The
4.1BSD tm.c has a sign extension bug when filling in b->b_resid.
CVL has a fixed version of tm.c; here's a diff listing, with my
comments, and some irrelevant stuff (Digi-Data stuff, and some
error logging) removed.

*** /usr/src/sys/dev/tm.c	Fri Oct  9 17:24:17 1981 [4.1 version]
--- tm.c	Sat Nov 12 19:07:53 1983		[CVL's fixed version]
***************
This one looks like someone wasn't thinking...
*** 100,106
  	u_short	sc_dens;	/* prototype command with density info */
  	daddr_t	sc_timo;	/* time until timeout expires */
  	short	sc_tact;	/* timeout is active */
! } te_softc[NTM];
  #ifdef unneeded
  int	tmgapsdcnt;		/* DEBUG */
  #endif

--- 100,117 -----
  	u_short	sc_dens;	/* prototype command with density info */
  	daddr_t	sc_timo;	/* time until timeout expires */
  	short	sc_tact;	/* timeout is active */
! } te_softc[NTE];	/* was NTM - JIP, CVL, 12/30/82 */
  #ifdef unneeded
  int	tmgapsdcnt;		/* DEBUG */
  #endif
***************
Not all tape drives are the same speed...
*** 422,427
  		 * Set next state; give 5 minutes to complete
  		 * rewind, or 10 seconds per iteration (minimum 60
  		 * seconds and max 5 minutes) to complete other ops.
  		 */
  		if (bp->b_command == TM_REW) {
  			um->um_tab.b_active = SREW;

--- 433,440 -----
  		 * Set next state; give 5 minutes to complete
  		 * rewind, or 10 seconds per iteration (minimum 60
  		 * seconds and max 5 minutes) to complete other ops.
+ 		 * Changed to allow 30 seconds per iteration, 10 min max,
+ 		 *  with 10 min rewind JIP
  		 */
  		if (bp->b_command == TM_REW) {
  			um->um_tab.b_active = SREW;
***************
*** 425,431
  		 */
  		if (bp->b_command == TM_REW) {
  			um->um_tab.b_active = SREW;
! 			sc->sc_timo = 5 * 60;
  		} else {
  			um->um_tab.b_active = SCOM;
  			sc->sc_timo =

--- 438,444 -----
  		 */
  		if (bp->b_command == TM_REW) {
  			um->um_tab.b_active = SREW;
! 			sc->sc_timo = 10 * 60;
  		} else {
  			um->um_tab.b_active = SCOM;
  			sc->sc_timo =
***************
*** 429,435
  		} else {
  			um->um_tab.b_active = SCOM;
  			sc->sc_timo =
! 				imin(imax(10*(int)-bp->b_repcnt,60),5*60);
  		}
  		if (bp->b_command == TM_SFORW || bp->b_command == TM_SREV)
  			addr->tmbc = bp->b_repcnt;

--- 442,448 -----
  		} else {
  			um->um_tab.b_active = SCOM;
  			sc->sc_timo =
! 				imin(imax(30*(int)-bp->b_repcnt,60),10*60);
  		}
  		if (bp->b_command == TM_SFORW || bp->b_command == TM_SREV)
  			addr->tmbc = bp->b_repcnt;
***************
I'm not sure why this change... maybe it has something to do with the
error logging.  [note: I collapsed two diff entries]
*** 616,622
  		 * If we were reading raw tape and the only error was that the
  		 * record was too long, then we don't consider this an error.
  		 */
! 		if (bp == &rtmbuf[TMUNIT(bp->b_dev)] && (bp->b_flags&B_READ) &&
  		    (addr->tmer&(TMER_HARD|TMER_SOFT)) == TMER_RLE)
!  			goto ignoreerr;
  		/*

--- 635,641 -----
  		 * If we were reading raw tape and the only error was that the
  		 * record was too long, then we don't consider this an error.
  		 */
! /*		if (bp == &rtmbuf[TMUNIT(bp->b_dev)] && (bp->b_flags&B_READ) &&
  		    (addr->tmer&(TMER_HARD|TMER_SOFT)) == TMER_RLE)
!  			goto ignoreerr;		JIP CVL */
  		/*
***************
*** 629,635
  				ubadone(um);
  				goto opcont;
  			}
! 		} else
  			/*
  			 * Hard or non-i/o errors on non-raw tape
  			 * cause it to close.

--- 656,662 -----
  				ubadone(um);
  				goto opcont;
  			}
! 		} else {
  			/*
  			 * Hard or non-i/o errors on non-raw tape
  			 * cause it to close.
***************
*** 634,639
  			 * Hard or non-i/o errors on non-raw tape
  			 * cause it to close.
  			 */
  			if (sc->sc_openf>0 && bp != &rtmbuf[TMUNIT(bp->b_dev)])
  				sc->sc_openf = -1;
  		/*

--- 661,668 -----
  			 * Hard or non-i/o errors on non-raw tape
  			 * cause it to close.
  			 */
+ 	/* JIP CVL */	if ((addr->tmer&TMER_HARD)==0 &&
+ 				um->um_tab.b_errcnt) goto ignoreerr;
  			if (sc->sc_openf>0 && bp != &rtmbuf[TMUNIT(bp->b_dev)])
  				sc->sc_openf = -1;
  		}
***************
*** 636,641
  			 */
  			if (sc->sc_openf>0 && bp != &rtmbuf[TMUNIT(bp->b_dev)])
  				sc->sc_openf = -1;
  		/*
  		 * Couldn't recover error
  		 */

--- 665,671 -----
  				um->um_tab.b_errcnt) goto ignoreerr;
  			if (sc->sc_openf>0 && bp != &rtmbuf[TMUNIT(bp->b_dev)])
  				sc->sc_openf = -1;
+ 		}
  		/*
  		 * Couldn't recover error
  		 */
***************
[This here is your length error.]
*** 688,694
  	 */
  	um->um_tab.b_errcnt = 0;
  	dp->b_actf = bp->av_forw;
! 	bp->b_resid = -addr->tmbc;
  	ubadone(um);
  	iodone(bp);
  	/*

--- 729,739 -----
  	}
  #endif ERRORLOG
  	dp->b_actf = bp->av_forw;
! 	/* allow for long reads JIP */
! 	/* compiler bug!! casting as (short unsigned) before assigning to
! 	 * long doesn't do anything.
! 	 */
! 	bp->b_resid = (-addr->tmbc) & 0xffff;
  	ubadone(um);
  	iodone(bp);
  	/*
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci
UUCP:	{seismo,allegra,brl-bmd}!umcp-cs!chris
CSNet:	chris@umcp-cs		ARPA:	chris.umcp-cs@CSNet-Relay
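
The arithmetic behind the bogus length can be reproduced in a small
user-level model.  This is only a sketch, and it rests on assumptions the
thread does not state: that the driver loads the controller's byte-count
register with the two's complement of the request, that the controller
counts it up once per byte transferred, that the register reads back as a
signed 16-bit quantity, and that read() reports the requested count minus
b_resid.  The function names are made up for illustration.

```c
#include <assert.h>

/* Value of the (hypothetical) 16-bit byte-count register after the
 * transfer, read back as a signed 16-bit quantity and widened to long. */
static long tmbc_after(long requested, long record)
{
    long bits;

    assert(requested > 0 && requested < 65536L);
    assert(record >= 0 && record <= requested);
    bits = (65536L - requested + record) & 0xffffL; /* -requested + record, mod 2^16 */
    return bits >= 32768L ? bits - 65536L : bits;   /* sign extension */
}

/* Length reported if b_resid = -tmbc, as in the 4.1BSD line. */
static long length_unmasked(long requested, long record)
{
    long resid = -tmbc_after(requested, record);

    return requested - resid;
}

/* Length reported if b_resid = (-tmbc) & 0xffff, as in the fixed line. */
static long length_masked(long requested, long record)
{
    long resid = (-tmbc_after(requested, record)) & 0xffffL;

    return requested - resid;
}
```

Under this model length_unmasked(62000, 3120) comes out as 68656, that is
65536+3120, while length_unmasked(30000, 3120) comes out as 3120, which
matches both observations in the original report.  length_masked gives
3120 in both cases.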

buck%nrl-css@sri-unix.UUCP (03/02/84)

From:  Joe Buck <buck@nrl-css>


There are two meanings for portability here; in the more realistic (but
weaker) sense, this is a portable construct. Any C compiler can recover
objects (array, structure, int, etc) of type y written with
write(fd,&y,sizeof(y)) by using read(fd,&y,sizeof(y)).  Of course sizeof(y)
is different on different machines; that's the reason sizeof is included in
the language, to take care of machine dependencies in an elegant way.
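
A minimal sketch of this weaker kind of portability, assuming a POSIX
system: one structure is written to a pipe and read back by the same
program on the same machine.  The struct and the function name are
hypothetical.  The prototypes in <unistd.h> declare the buffer as a
void pointer, so the address is converted at the call.

```c
#include <string.h>
#include <unistd.h>

struct sample {              /* hypothetical record type */
    int  id;
    long count;
    char tag[8];
};

/* Write one struct to a pipe and read it back on the same machine.
 * Returns 1 if the bytes read back match the bytes written, else 0. */
static int roundtrip_same_machine(void)
{
    struct sample out, in;
    int fds[2];
    int ok = 0;

    memset(&out, 0, sizeof out);   /* no uninitialized bytes are written */
    memset(&in, 0, sizeof in);
    out.id = 42;
    out.count = 3120L;
    strcpy(out.tag, "tape");

    if (pipe(fds) != 0)
        return 0;
    /* sizeof supplies the machine-dependent size on both sides. */
    if (write(fds[1], &out, sizeof out) == (ssize_t) sizeof out &&
        read(fds[0], &in, sizeof in) == (ssize_t) sizeof in)
        ok = memcmp(&out, &in, sizeof out) == 0;
    close(fds[0]);
    close(fds[1]);
    return ok;
}
```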

There's a second, tougher standard of portability. This is, what if machine
A does the write and machine B does the read? For this case, even ints of
the same size may be nonportable because the VAX and PDP-11 have one way of
ordering bytes and everyone else (almost) has another. You have to encode
everything as char values to have any hope at all of this type of
portability; even then there are problems in that some bytes aren't eight
bits.
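
One way to "encode everything as char values" is sketched below.  The
choice of most-significant-byte-first order and the helper names are
assumptions made for the example, and it only uses the low eight bits of
each char, so it still does not cover the machines with odd byte sizes
mentioned above.

```c
#include <stdint.h>

/* Store a 32-bit value as four 8-bit quantities, most significant first,
 * regardless of how the host orders bytes within an int. */
static void put_u32(unsigned char out[4], uint32_t v)
{
    out[0] = (unsigned char) ((v >> 24) & 0xff);
    out[1] = (unsigned char) ((v >> 16) & 0xff);
    out[2] = (unsigned char) ((v >> 8) & 0xff);
    out[3] = (unsigned char) (v & 0xff);
}

/* Rebuild the value from the same four quantities. */
static uint32_t get_u32(const unsigned char in[4])
{
    return ((uint32_t) in[0] << 24) |
           ((uint32_t) in[1] << 16) |
           ((uint32_t) in[2] << 8) |
            (uint32_t) in[3];
}
```

Because the shifts operate on values rather than on storage, the same
four bytes come out on a VAX-order machine and on a 68000-order machine.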

In summary, the use of sizeof with read and write is the proper thing to do
and should be encouraged.

ARPA: buck@nrl-css
UUCP: ...!decvax!nrl-css!buck

-Joe

buck%nrl-css@sri-unix.UUCP (03/04/84)

From:  Joe Buck <buck@nrl-css>

Well, almost. On machines with character pointers of different length and
structure from other pointers (and in all cases, just to please lint)
you should say

	read(fd, (char *) &y, sizeof y)

Ok Doug?

By the way, does anyone know of such a Unix implementation (one in which
the statement above, without the cast, won't work)?

-Joe

gwyn%brl-vld@sri-unix.UUCP (03/04/84)

From:      Doug Gwyn (VLD/VMB) <gwyn@brl-vld>

The (char *) is not "just to please lint".  Different pointer types
in general have different sizes so one MUST coerce the pointer to
the type expected by the function.

I know of C (not UNIX) implementations where the cast is definitely
necessary.  This usually occurs on word-addressable machines where a
(char *) cannot be fully contained in a single word.

You seem to have a funny idea about "lint"'s purpose.

ark@rabbit.UUCP (Andrew Koenig) (03/05/84)

You are better off writing:

	read (fd, (char *) &y, sizeof (y))

It makes a difference on some machines.

hartwell%shasta@sri-unix.UUCP (03/14/84)

From:  Steve Hartwell <hartwell@shasta>

I don't think this is non-portable.  For a machine which has pointers of more
than one width, the compiler can be expected to widen the shorter ones to
the width of the largest as it is pushed onto the argument list, just as
chars are promoted to ints when pushed.  The called function will know it's
stored that way and shorten it if it needs to before it's used.

So it doesn't matter what the type of "y" in the read call is or what
the actual width of &y is.  It seems simple to me that there should be only
one width of a pointer on the argument stack [Not necessarily the width of
an int, either].

Steve Hartwell, Stanford University

p.s. this also speaks to the NULL vs. 0 vs. ((char *) 0) issue.

pb%camjenny@ucl-cs.arpa (03/15/84)

From:  Piete Brooks <pb%camjenny@ucl-cs.arpa>

Consider the case of a WORD addressed machine, such as the PERQ.
It encodes BYTE pointers specially, shifting them left.
Thus the WORD pointer for an object and its BYTE pointer are NOT the same.
As the compiler does not know that read expects a BYTE pointer, when given a
WORD pointer, it will not CAST it for you, unless you EXPLICITLY tell it to.
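
The effect can be imitated on any machine with a toy model.  Everything
here is hypothetical (it is not the PERQ's actual encoding): memory is an
array of 16-bit words, a word pointer is a word index, and a byte pointer
is the word index shifted left one bit with the byte-within-word in the
low bit.

```c
#include <string.h>

#define NWORDS 16

static unsigned short memory[NWORDS];   /* toy word-addressed store */

typedef unsigned word_ptr;   /* index of a word */
typedef unsigned byte_ptr;   /* (word index << 1) | byte within word */

/* The conversion an explicit cast would perform on such a machine. */
static byte_ptr to_byte_ptr(word_ptr w)
{
    return w << 1;
}

/* A read()-like routine: it can only treat its argument as a byte pointer. */
static void fill_bytes(byte_ptr p, unsigned char value, unsigned n)
{
    while (n-- > 0) {
        unsigned w = p >> 1;

        if (p & 1)
            memory[w] = (unsigned short) ((memory[w] & 0x00ff) | (value << 8));
        else
            memory[w] = (unsigned short) ((memory[w] & 0xff00) | value);
        p++;
    }
}

/* Clears memory, fills two bytes through p, and returns the index of the
 * first word that was modified (or -1 if none was). */
static int first_touched_word(byte_ptr p)
{
    int i;

    memset(memory, 0, sizeof memory);
    fill_bytes(p, 0xAA, 2);
    for (i = 0; i < NWORDS; i++)
        if (memory[i] != 0)
            return i;
    return -1;
}
```

With an object at word 6, first_touched_word(to_byte_ptr(6)) reports
word 6, but handing over the raw word pointer, first_touched_word(6),
lands the data in word 3.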
 
It does keep one on one's toes ..........

guy@rlgvax.UUCP (Guy Harris) (03/18/84)

> I don't think this is non-portable.  For a machine which has pointers of more
> than one width, the compiler can be expected to widen the shorter ones to
> the width of the largest as it is pushed onto the argument list, just as
> chars are promoted to ints when pushed.  The called function will know it's
> stored that way and shorten it if it needs to before it's used.

Anyone who expects a C compiler to do this is going to be sorely disappointed.
There is NOTHING in K&R which says that this must be done, and there is no
reason for a compiler to do so.  It is explicitly stated in K&R that integer
and floating point values are coerced to "int" and "double", respectively, so
one would expect this to happen.
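
The promotions that K&R does promise can be observed with a variadic
function, used here as a stand-in for a callee whose parameter types the
compiler does not know.  This sketch uses <stdarg.h>, which comes from
the later ANSI standard rather than K&R, and the function name is made
up.

```c
#include <stdarg.h>

/* The arguments after n have no declared types, so they arrive already
 * promoted: a char as int, a float as double.  Nothing comparable is
 * done to pointers. */
static double sum_small_and_float(int n, ...)
{
    va_list ap;
    double total = 0.0;
    int i;

    va_start(ap, n);
    for (i = 0; i < n; i++) {
        total += va_arg(ap, int);      /* passed by the caller as a char */
        total += va_arg(ap, double);   /* passed by the caller as a float */
    }
    va_end(ap);
    return total;
}
```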

> So it doesn't matter what the type of "y" in the read call is or what
> the actual width of &y is.  It seems simple to me that there should be only
> one width of a pointer on the argument stack [Not necessarily the width of
> an int, either].

WHY?  Why should there be only one width of a pointer on the argument stack?
Just to make life easier for lazy programmers who refuse to write type-correct
code no matter how often they've been told to?  The analogy between different
widths of "int" and different width of pointer breaks down because there is
a semantic difference between "char", "short", "int", and "long" in the C
language that pertains directly to the width of the object; the semantic
difference between "char *" and "int *" has nothing to do with the *width*
of those pointers - any difference or lack of same between their widths is
purely a consequence of the C implementation and of the architecture of the
underlying machine.

> p.s. this also speaks to the NULL vs. 0 vs. ((char *) 0) issue.

No, it doesn't.  Even if a C compiler took the ill-advised step of "widening"
pointers when passed as arguments, this would have no effect on a program
which illegally passed an "int" of 0 to a routine expecting a pointer.

We've repeatedly heard ideas for changes to the C language or the C compiler
to "solve" the "problem" caused by the facts that 1) C has several different
pointer types which may not have identical implementations and 2) that null
pointers in C are represented by coercing the "int" value 0 to a pointer, and
that C has no way of telling the compiler what kinds of arguments a function
takes, so the 0 value must be coerced explicitly with a cast when passed
to a function.  THIS ISN'T A "PROBLEM", FOLKS, AND IT DOESN'T REQUIRE A
"SOLUTION".  The way to "solve" the "problem" is to write type-correct code
and explicitly cast all NULLs or 0s passed as values to routines expecting
pointers.  This requires NO changes to C, or to any correct C compilers; it
merely requires changes to incorrect C code and to the incorrect models of
the C language held by certain programmers.  And if you have trouble finding
all the places you forgot to cast pointers, well, there's a very nice tool -
at least on UNIX - to fix this.  It's called "lint".  USE IT.
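
A short sketch of the type-correct habit being argued for, using a
made-up routine that takes a list of strings ended by a null pointer in
the style of execl().  It is written with <stdarg.h> from the later ANSI
standard; the point is only that the callee fetches a char pointer, so
the caller must pass one.

```c
#include <stdarg.h>

/* Counts string arguments up to the terminating null char pointer. */
static int count_strings(const char *first, ...)
{
    va_list ap;
    const char *s;
    int n = 0;

    va_start(ap, first);
    /* Each later argument is fetched as a char *, so that is what the
     * caller has to supply, including for the terminator. */
    for (s = first; s != (const char *) 0; s = va_arg(ap, char *))
        n++;
    va_end(ap);
    return n;
}
```

A correct call is count_strings("tm", "ts", (char *) 0).  Writing a bare
0 as the terminator passes an int where a pointer is fetched, which is
exactly the case that fails on machines where the two differ.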

This non-problem requires no further debate; there is only one correct way to
deal with it.

	Tired of explaining pointers, and tired of pointing people back to
		K&R,
	Guy Harris
	{seismo,ihnp4,allegra}!rlgvax!guy

geoff@callan.UUCP (Geoff Kuenning) (03/19/84)

>	You are better off writing:
>
>		read (fd, (char *) &y, sizeof (y))
>
>	It makes a difference on some machines.

Actually, you are still non-portable if there is a possibility that the data
will be read on a machine different from the one it was written on.  Any of
the following problems might crop up:

	Character sizes differ (yes, there are still 6-, 7-, and 9-bit bytes
		out there--GCOS, for example, uses 9 bits)
	Long sizes differ (less likely but conceivable)
	Byte orderings differ

I ran into the last one trying to read the Bell distribution tapes on a
68000.  "cpio" writes the tape header with the type of construct suggested
above, but writes the tape contents in character form.  If I byte-swap the
contents appropriately, the header gets screwed up because of 68000/vax
byte ordering differences.  "cpio" has a switch ('-c') to solve this problem
by never writing binary data, but Bell (in their infinite wisdom) did not
use this option when writing their distribution tapes.
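
The idea of never writing binary data can be sketched as follows.  The
header layout, field widths, and function names are invented for the
example and are not cpio's actual format; each field is simply written
as fixed-width octal digits.

```c
#include <stdio.h>

struct hdr {                 /* hypothetical header, not cpio's layout */
    unsigned long mode;
    unsigned long size;
};

#define HDR_TEXT_LEN 18      /* 6 + 11 octal digits plus the NUL */

/* Encode the header as characters.  Returns 1 on success, 0 if a field
 * does not fit its fixed width. */
static int encode_hdr(char out[HDR_TEXT_LEN], const struct hdr *h)
{
    if (h->mode > 0777777UL || h->size > 037777777777UL)
        return 0;
    return snprintf(out, HDR_TEXT_LEN, "%06lo%011lo",
                    h->mode, h->size) == HDR_TEXT_LEN - 1;
}

/* Decode the character form.  Returns 1 if both fields were read. */
static int decode_hdr(const char *in, struct hdr *h)
{
    return sscanf(in, "%6lo%11lo", &h->mode, &h->size) == 2;
}
```

Since only digit characters reach the tape, byte-swapping the whole
stream on the reading side treats header and contents alike.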

			 _ _ _ _ _ _ _
(Isn't every computer a |d|i|g|i|t|a|l| computer?)
			 - - - - - - -

			Geoff Kuenning
			Callan Data Systems
			...!ihnp4!sdcrdcf!trwrb!wlbr!callan!geoff

hartwell@Su-Shasta.ARPA (03/22/84)

From:  Steve Hartwell <hartwell@Su-Shasta.ARPA>

I think you should curb your dogma, Guy.

I see a degree of cleanness in the basic datatype promotions from small
ints to "generic" ints, and small floats to "generic" floats (that is,
doubles) when passed as parameters, and I believe that that specification
was made with the intention to aid in simplifying compiler implementation
[ and not just because the architecture of the pdp11 demanded it, as
  suggested to me in a previous letter ].
And I see no reason why this concept should not be generalized to
pointers as well.  Passing a character as an argument to a function
without an explicit cast is hardly reckless abandon; K&R say that
chars are members of the int family and are treated that way.  Why should
passing a pointer be any more stringent?  It seems so much more conceptually
clean to me to say that a (foo *) is a member of the pointer family and
give the compiler implementors (and program writers) a break.

That is why I think that null pointers should be represented as NULL,
whose definition is 0 *cast to a pointer to anything you like*; I don't
believe that it should make a difference whether it is a (char *) 0 or
a (struct _iob *)0.

My view is that program control and data management is complex enough
as it is, and if by clustering basic operative groups {ints, floats, pointers}
can free me from cast slavery then I would argue for it.  Fervent
bible-thumping serves no purpose in a discussion which considers the
merits and limitations of the standard(s); we /know/ what it SAYS;
but you certainly hold no patent on its interpretation, or on what
is worth cross-examining.

Steve Hartwell, Stanford

guy@rlgvax.UUCP (Guy Harris) (03/25/84)

> I think you should curb your dogma, Guy.

Sorry, my dogma is trained as a watchdog and has to bark at intruders.

> I see a degree of cleanness in the basic datatype promotions from small
> ints to "generic" ints, and small floats to "generic" floats (that is,
> doubles) when passed as parameters, and I believe that that specification
> was made with the intention to aid in simplifying compiler implementation
> [ and not just because the architecture of the pdp11 demanded it, as
>   suggested to me in a previous letter ].

It's nice that you believe that, but do you have any evidence to back up
that belief?  Have you asked Dennis Ritchie about this?  By the way,
the architecture of the PDP-11 *doesn't* demand it.  Any instruction which
pushes a byte onto the stack decrements the stack pointer by two, so if
you do a

	movb	frobozz,-(sp)

it'll push a word onto the stack.  However, *nothing* in the PDP-11 architecture
requires that the program referencing frobozz at some offset "frob(sp)"
reference it with word instructions.  And I don't see why it's any cleaner
than the alternative.  As for floats and doubles, my uninformed guess (which
is worth neither more nor less than your uninformed guess) is that one
reason they promote floats to doubles in general is that they didn't want
to have to write a compiler which generated "setd" and "setf" instructions.
Fine, but it doesn't elevate the notion of "design your language around the
person writing the compiler" to a general design principle.  In fact, taken
as such a principle, it's flatly *wrong*.

> And I see no reason why this concept should not be generalized to
> pointers as well.  Passing a character as an argument to a function
> without an explicit cast is hardly reckless abandon; K&R say that
> chars are members of the int family and are treated that way.  Why should
> passing a pointer be any more stringent?

I see no reason why it *should* be generalized to pointers.  Again, your
comparison of "char" <-> "int" and "xxx *" <-> "char *" missed the point -
K&R says "char"s are members of the "int" family but does NOT say ANYTHING
remotely similar about a "pointer family".  As such, passing characters
without explicit casts isn't reckless abandon because K&R says explicitly
that there is such a cast, but passing pointers without explicit casts is
dangerous and wrong because K&R does not promise that any such cast will
be done.  If you don't like the C language's rules for dealing with
pointers as parameters, fine; say that the language should be changed.  Just
don't add your own rules on top of K&R and claim that it's part of C.

To quote from your original article:

> I don't think this is non-portable.  For a machine which has pointers of more
> than one width, the compiler can be expected to widen the shorter ones to
> the width of the largest as it is pushed onto the argument list, just as
> chars are promoted to ints when pushed.  The called function will know it's
> stored that way and shorten it if it needs to before it's used.

This statement is flatly false.  No exceptions, no appeals.  There exist
implementations of C in which the code

	read(fd,&y,sizeof(y));

will not properly execute.  (Proof by counterexample.)  Your claim that
"the compiler can be expected to widen the shorter ones... just as chars
are promoted to ints" is equally incorrect - see the same counterexample,
and see K&R and notice the lack of any such claim.  C is not what people
want it to be; pending either an ANSI C language standard, or public release
of any of AT&T's internal C language standards, C is what K&R says it is.
No more, no less.  If you deny that, you're denying that there is any
authoritative reference manual to C, which would render it useless as a
language for writing portable code.

(By the way, I also note the use of the word "pushed" in the paragraph
quoted.  The term "passed as an argument" should be used, because there's
no guarantee that parameters will be passed on a simple stack.  It indicates
that a lot of the thinking on this question is based on the low-level details
of how C is implemented.  If C is to be used as a portable implementation
language, however, people will just have to forget what they know about the
C implementation most of the time and target their code for an abstract C
implementation; otherwise, when their code is ported to a C implementation
that doesn't reproduce the characteristics of the implementation they wrote
the code for, it may not work.)

> It seems so much more conceptually clean to me to say that a (foo *) is
> a member of the pointer family and give the compiler implementors (and
> program writers) a break.

Again, why is this more conceptually clean?  I haven't thought of pointers
as a generic data type since I stopped programming in PL/I, lo these many
years ago.  ALGOL 68, PASCAL, Modula-2, Mesa, and many other languages have
"pointer" as an adverb, so that you have "pointer to int" and "pointer to
char" and "pointer to frobozz" - C is another of these languages.  Several
of these languages have a generic null pointer, but so does C, in a sense.

> That is why I think that null pointers should be represented as NULL,
> whose definition is 0 *cast to a pointer to anything you like*; I don't
> believe that it should make a difference whether it is a (char *) 0 or
> a (struct _iob *)0.

In all of these languages, except C, you declare the types of the arguments
to a procedure when you want to use the procedure, and the compiler can
automatically generate code to pass a null pointer of the appropriate type.
C currently lacks this facility.  Why not ask for that facility, instead
of changing the language in ways that:

	1) make current reasonable implementations non-conforming; and
	2) cause extra code to be generated (to cast the pointers to
	   this "generic" type when passing them as parameters, and to
	   cast them back to the appropriate type when the pointers
	   are used)?
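
For what it is worth, a facility of this kind did later enter the
language as ANSI C function prototypes.  The sketch below uses that later
syntax, with made-up names, to show a bare 0 being converted to the
declared pointer type at the call.  It does not help for the variable
part of a variadic argument list, where the cast is still needed.

```c
#include <stddef.h>

struct node {                /* hypothetical type for the example */
    int value;
};

/* Prototypes: the parameter types are part of the declaration, so the
 * compiler knows which kind of null pointer each call needs. */
static int is_null_char(const char *p)
{
    return p == NULL;
}

static int is_null_node(const struct node *p)
{
    return p == NULL;
}
```

With these declarations in scope, is_null_char(0) and is_null_node(0)
each receive a null pointer of the right type without a cast.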

> My view is that program control and data management is complex enough
> as it is, and if by clustering basic operative groups {ints, floats, pointers}
> can free me from cast slavery then I would argue for it.

"program control and data management is complex enough as it is"?  Sorry, son,
if that's a plea for sympathy it fails miserably.  Running your code through
"lint" and throwing in a few casts doesn't cost much on top of the rest of
the work I hope you put into the code you write.  Referring to it as "cast
slavery" is cute but wrong.

> Fervent bible-thumping serves no purpose in a discussion which considers the
> merits and limitations of the standard(s); we /know/ what it SAYS;
> but you certainly hold no patent on its interpretation, or on what
> is worth cross-examining.

1) Show me where you can interpret K&R as *requiring* the treatment of pointers
you desire, not just *permitting* it (which it certainly does).  Otherwise,
my interpretation that it doesn't require your treatment of pointers *is*
the only correct one.  And, if that is the case, code which requires
that treatment of pointers is incorrect code, and is not guaranteed to
work on all implementations of C.

2) Throwing around terms like "Fervent bible-thumping" serves no purpose
in this discussion at all.

Here's what I said in the article you're responding to:

> Anyone who expects a C compiler to do this is going to be sorely disappointed.
> There is NOTHING in K&R which says that this must be done, and there is no
> reason for a compiler to do so.  It is explicitly stated in K&R that integer
> and floating point values are coerced to "int" and "double", respectively, so
> one would expect this to happen.

The second and third sentences are true, as anyone with a copy of K&R can
verify.  The first is also true, if what another poster says about the
Perq C compiler is true, namely that you *do* get bitten if you aren't
type-correct in your handling of pointers.

The rest of my article wasn't bible-thumping, it was expressing frustration
at dealing with code written by people who assume that 0 and NULL can freely
be passed to routines expecting pointers without casting them.  I have to pick
up the "core" files when such a program dies on our 68K-based machines, and
I have to fix them.  I'm justifiably tired of doing so.  (I'm also tired
of dealing with programs that either flatly assume that there's a null
string at whatever location NULL points to, or just assume that trying to
dereference a NULL pointer is harmless; unfortunately, I suspect people are
going to continue to write that kind of code.  As such, I suspect that even
if C were changed to permit you to explicitly declare the arguments that
a function takes, people would still not bother using it and cause the same
old problems all over again.)

At this point, I say debate about the subject doesn't help much.  The facts
about how the language *is* (not how it *should be*) have been laid out more
times than I can count by several people; if people still aren't convinced that
until the language changes they'll just have to start casting their pointers,
they're not ever going to be convinced.  I'll just hope that most people start
casting their pointers properly, or that some way of declaring function argument
types enters the language and people start using it, and that I rarely have to
deal with code that doesn't properly coerce pointers.

	Guy Harris
	{seismo,ihnp4,allegra}!rlgvax!guy

jbf@ccieng5.UUCP (Jens Bernhard Fiederer) (03/28/84)

Speak for yourself.  The need to cast every null pointer argument to
the specific pointer required IS A PROBLEM WITH THE C LANGUAGE.  It is
a bloody pain, not to mention a waste of time.  I do it, but I would rather
not.  Three cheers for the set of languages that support the "lazy" programmer!
Three cheers for the empty set!

Azhrarn
-- 
Reachable as
	....allegra![rayssd,rlgvax]!ccieng5!jbf
Or just address to 'native of the night' and trust in the forces of evil.