[comp.unix.wizards] Weird Problem with cat

mcneill@eplrx7.UUCP (mcneill) (11/16/88)

Here's a weird one folks:

As superuser on one of our machines I have this problem sometimes:

eplrx7 {root} % cat ~mcneill/.login
cat: write error bad file number

eplrx7 {root} % cat /u3/mcneill/.login
---- works fine ----

eplrx7 {root} % exit

(eplrx7:20) cat ~mcneill/.login
---- works fine ----

su again....

eplrx7 {root} % cat ~mcneill/.login
---- works fine ----

The problem is not confined to ~mcneill/.login.  It happens with different
~users trying to cat different files.  The problem seems to have something
to do with expansion of ~.  The operating system is SunOS 3.5 running yellow
pages.

-- 
    Keith D. McNeill              |    E.I. du Pont de Nemours & Co.
    uunet!eplrx7!mcneill          |    Experimental Station
    (302) 695-7395                |    P.O. Box 80357
                                  |    Wilmington, Delaware 19880-0357

jc@minya.UUCP (John Chambers) (11/21/88)

In article <41@eplrx7.UUCP>, mcneill@eplrx7.UUCP (mcneill) writes:
> 
> As superuser on one of our machines I have this problem sometimes:
> 
> eplrx7 {root} % cat ~mcneill/.login
> cat: write error bad file number
> 
Hey, an excuse to post one of my favorite flames:  Once again, some
turkey programmer wrote an application that produced an error message,
and didn't say what caused the error.  The program could have given
more details, and perhaps the info would have helped track down the
problem.  For instance, suppose the error had been:
>	cat: write error bad file number 65537="/tmp/fubar"
telling us that the program had thought it was accessing /tmp/fubar,
and the variable containing the file number got garbaged.  When passed
on to the vendor's support people, it would give some hints as to where 
the problem lies.  Just saying "bad file number" isn't very helpful.

It's especially annoying to be told that a program failed because of
"Permission denied", and not be told what the problem is.  Knowing the
name of a file it was trying to open (or exec) will usually, after a 
quick "ls -l" or "ls -ld", lead to an explanation.  Without the file
name, it's often hopeless.

How can we get programmers to do this right?  It isn't difficult.
Or perhaps we should be hitting on the QA people, and let them know
how shoddy we think their product is if it won't even tell us why
it is failing.  Any ideas?

-- 
John Chambers <{adelie,ima,maynard,mit-eddie}!minya!{jc,root}> (617/484-6393)

[Any errors in the above are due to failures in the logic of the keyboard,
not in the fingers that did the typing.]

gandalf@csli.STANFORD.EDU (Juergen Wagner) (11/22/88)

In article <134@minya.UUCP> jc@minya.UUCP (John Chambers) writes:
>...
>Hey, an excuse to post one of my favorite flames:  Once again, some
>turkey programmer wrote an application that produced an error message,
>and didn't say what caused the error.
>...

One of the worst examples of this is probably the disk formatter on XEROX 1108
LispMachines. When you are formatting a disk, and some error occurs, you get
the message
	Something is wrong. Cannot proceed.
and you are back to the executive prompt. Very helpful, indeed. Something must
be wrong... :-)

-- 
Juergen Wagner		   			gandalf@csli.stanford.edu
						 wagner@arisia.xerox.com

ps72234@naakka.tut.fi (Pertti Suomela) (11/22/88)

In article <41@eplrx7.UUCP> mcneill@eplrx7.UUCP (mcneill) writes:

   eplrx7 {root} % cat ~mcneill/.login
   cat: write error bad file number

Tcsh (I don't know if you were running it) contains a bug. If I try to
complete a '~user/file' -style filename by hitting <tab> in the middle
of typing 'file', similar action results. I do not know any way to
cure the bug; you just have to avoid completing filenames containing
'~username'. Tcsh gets into a strange state after the described error
condition. To get it straight again, type 'echo ~legalusername' twice.
The first time, echo won't print anything (strange?), but the
second time it works normally.

--
Pertti Suomela, studying (?) at		! Internet:  ps72234@tut.fi
Tampere University of Technology,	! UUCP:      ps72234@tut.uucp
Finland					! Bitnet:    ps72234@fintut.bitnet

nagel@blanche.ics.uci.edu (Mark Nagel) (11/25/88)

In article <PS72234.88Nov22020044@naakka.tut.fi>, ps72234@naakka (Pertti Suomela) writes:
|In article <41@eplrx7.UUCP> mcneill@eplrx7.UUCP (mcneill) writes:
|
|   eplrx7 {root} % cat ~mcneill/.login
|   cat: write error bad file number
|
|Tcsh (I don't know if you were running it) contains a bug.

It isn't just tcsh -- csh has the same bug.  So the problem is in
the original csh source, not in anything extra tcsh brings along.
What the actual problem is?  I'm as clueless as the next person!  If
anyone knows what's going on here, please, feel free to give us a
hint...

Mark D. Nagel
  UC Irvine - Dept of Info and Comp Sci | radiation n. 1. the act or process
  nagel@ics.uci.edu             (ARPA)  | of radiating.  2. smog with an
  {sdcsvax|ucbvax}!ucivax!nagel (UUCP)  | attitude.

jay@phoenix.Princeton.EDU (Jay Plett) (11/25/88)

In article <PS72234.88Nov22020044@naakka.tut.fi>, ps72234@naakka.tut.fi (Pertti Suomela) writes:
> In article <41@eplrx7.UUCP> mcneill@eplrx7.UUCP (mcneill) writes:
>>    eplrx7 {root} % cat ~mcneill/.login
>>    cat: write error bad file number
> 
> Tcsh (I don't know if you were running it) contains a bug.
> .....................................type 'echo ~legalusername' twice.
> The first time, echo won't print anything (strange?), but the
> second time it works normally.

I get the same thing on a Sun 386i running a different hacked csh.
Haven't bothered to track it down yet, but here's a hint that might
help someone who can be bothered:  the shell fires up the first
command with file descriptor 1 (stdout) closed.  Here's the relevant
trace output from running ls twice in succession:
	% trace ls ~pro
	 . . .
	open ("/home/pro", 0, 037367540134) = 1
	 . . .
	close (1) = 0
	write (1, "admin\nbugs\ndistrib\nledger\nmail\nm".., 61) = 0
	 . . .
	% trace ls ~pro
	open ("/home/pro", 0, 037367540134) = 3
	 . . .
	close (3) = 0
	 . . .
	write (1, "admin        bugs         distri".., 108) = 108
	 . . .
	close (1) = 0

---
	...jay@princeton.edu

peter@stca77.stc.oz (Peter Jeremy) (11/28/88)

In article <134@minya.UUCP> jc@minya.UUCP (John Chambers) writes:
>It's especially annoying to be told that a program failed because of
>"Permission denied", and not be told what the problem is.  Knowing the
>name of a file it was trying to open (or exec) will usually, after a 
>quick "ls -l" or "ls -ld", lead to an explanation.  Without the file
>name, it's often hopeless.

This leads one into the area of whether you want a secure system or
a friendly/usable one.  If you want a really secure system, you don't
want to tell the users what went wrong, because if they were permitted
to do it, they wouldn't have gotten the message.  If they are violating
security, any information you give them might help them to get around
the security system.

Anecdote time:  I once worked on an OS (not Unix or a flavour thereof)
with a hole in this area - If you tried to create a file, it returned
the error "file already exists" if that file existed, whether or not
you had permission to access the file or directory.  In some cases, just
_knowing_ that a file exists (or doesn't exist) can be useful information.

>How can we get programmers to do this right?
From the security point of view, it is right.

Having said all that, I agree that messages like "Permission Denied" are
a severe pain when one is trying to debug a system.  I tend towards the
view that you always provide additional information - just not necessarily
in a form useful to the end user (like giving the source file/line and
internal error numbers when an error occurs) when the end-user is just
a user.

What it comes down to is, do we want Unix to be friendly and helpful, or
secure?  I prefer the friendly approach personally.
-- 
Peter Jeremy (VK2PJ)         peter@stca77.stc.oz
Alcatel-STC Australia        ...!uunet!stca77.stc.oz!peter
41 Mandible St               peter%stca77.stc.oz@uunet.UU.NET
ALEXANDRIA  NSW  2015

dupuy@douglass.columbia.edu (Alexander Dupuy) (11/29/88)

In article <PS72234.88Nov22020044@naakka.tut.fi>, ps72234@naakka (Pertti Suomela) writes:
|In article <41@eplrx7.UUCP> mcneill@eplrx7.UUCP (mcneill) writes:
|
|   eplrx7 {root} % cat ~mcneill/.login
|   cat: write error bad file number
|
|Tcsh (I don't know if you were running it) contains a bug.

The problem is not with cat; in fact, cat is the most helpful program you could
run in the circumstance, because it tells you what the problem is.  Most
commands just execute with no output whatsoever.  This is much worse than a
weird error message.  However, the cat error message is still not helpful
enough - I only figured out the bug using ofiles(1), available from a sun-spots
archive near you.

This bug exists in all versions of the csh which do filename completion on Suns
(or other NFS systems) running yellow pages.

What happens is that when getpwnam(3) is called to find the home directory for
some user (~j-user) in the tilde() function, the yellow pages opens up a UDP
socket or two to talk to the various yellow pages daemons (ypbind, ypserv).  It
closes the first of these, but leaves the second one open.  Since the csh keeps
file descriptors 0-3 unused (it stashes stdin, etc. up around 20), the file
descriptor for this socket is usually 1 (aka stdout).  When the csh fork/execs
a program, it moves stdin, etc. back down to the 0-2 range.  But because the yp
socket is already down there, the stdout (1) file descriptor gets closed when
all is said and done (this may be due to close-on-exec being set for 1, or yp
closing it explicitly, or something else, I never bothered to find out).

The result of this is that your cat was invoked with no file descriptor 1.  As
a result it got a write error (EBADF bad file number).  Most programs which
never checked the result of writes to stdout got errors too, but didn't tell
you about them.  You can duplicate this with 'sh -c "program >&-"'.

The fix, in tilde() is to make a call to the undocumented, but extremely useful
yp_unbind() function after you've gotten the answer back from getpwnam(3).  You
should also check that the closem() function invokes yp_unbind(), although that
may not be necessary if you fix tilde() (at any rate, it won't hurt).

@alex
-- 
inet: dupuy@columbia.edu
uucp: ...!rutgers!columbia!dupuy

gregg@ihlpb.ATT.COM (Wonderly) (12/06/88)

>>How can we get programmers to do this right?
> From the security point of view, it is right.
> 
> Having said all that, I agree that messages like "Permission Denied" are
> a severe pain when one is trying to debug a system.  I tend towards the
> view that you always provide additional information - just not necessarily
> in a form useful to the end user (like giving the source file/line and
> internal error numbers when an error occurs) when the end-user is just
> a user.

The biggest problem is getting people to use the OS error messages and
capabilities instead of inventing their own.  Time after time I have
changed

	if ((fd = creat (file, 0600)) == -1) {
		printf ("Can't create some file\n");
		handle_the_error_exit();
	}

to

	if ((fd = creat (file, 0600)) == -1) {
		perror (file);
		handle_the_error_exit();
	}

in code I have ported from the net.  Perror(3) (and the associated
sys_errlist array) is one of the MOST useful parts of the C-library
under UN*X (please don't start another 'errno should not be global'
war though).

-- 
It isn't the DREAM that NASA's missing...  DOMAIN: gregg@ihlpb.att.com
It's a direction!                          UUCP:   att!ihlpb!gregg

ditto@cbmvax.UUCP (Michael "Ford" Ditto) (12/06/88)

In article <360@stca77.stc.oz> peter@stca77.stc.oz (Peter Jeremy) writes:
>In article <134@minya.UUCP> jc@minya.UUCP (John Chambers) writes:
>>It's especially annoying to be told that a program failed because of
>>"Permission denied", and not be told what the problem is.
>
>This leads one into the area of whether you want a secure system or
>a friendly/usable one.  If you want a really secure system, you don't
>want to tell the users what went wrong, because if they were permitted
>to do it, they wouldn't have gotten the message.  If they are violating
>security, any information you give them might help them to get around
>the security system.

Although you are right in the particular example of "Permission denied",
I think the original complaint was about error reporting in general,
not reporting of security violations.

This is a particular pet peeve of mine, and I always make it a point
to call perror() with the name of the program, and a description of
the operation that failed.  A *minimal* error message should be
something like:

	$ cat foo
	cat: can't open "foo": Permission denied

yet, on many systems, the above command would print out

	foo: Permission denied
or
	cat: can't open input.
or
	cat: Permission denied,

none of which is very useful, and some of them can be quite misleading.
The first message, for example, seems to be from a program called "foo",
and the last one makes it appear that the user doesn't have permission
to run the cat program.
-- 
					-=] Ford [=-

"The number of Unix installations	(In Real Life:  Mike Ditto)
has grown to 10, with more expected."	ford@kenobi.cts.com
- The Unix Programmer's Manual,		...!sdcsvax!crash!elgar!ford
  2nd Edition, June, 1972.		ditto@cbmvax.commodore.com