[comp.unix.programmer] filters

tchrist@convex.com (Tom Christiansen) (12/02/90)

There seem to be a lot of filters out there that don't check that their
output succeeded and blindly exit with success irrespective of the
success of their writes.

Cat and col are two utilities I've just caught doing this.  Cat does
make a stab at detecting this, but it does it wrong.  It just checks
ferror(stdout), but it's never done an fflush() or an fclose() first.
Plus it only says "output write error", not what the error is.

My first question is what could possibly break if cat were to do
an fclose(stdout) at the end (and check the status and set the exit
status accordingly).

(We know it really needs the fclose and not just fflush because of the
recent EDQUOTA debate, right?)
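Tom's complaint about checking only ferror(stdout) can be made concrete. Below is a minimal sketch, assuming a hypothetical helper named close_checked (it is not taken from any real cat source). It closes the stream with fclose(), so buffered data actually reaches write() before the verdict, and it reports the errno text instead of a bare "output write error". As Dan points out further down, even this cannot catch errors that surface later because another process still has the file open.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * close_checked (hypothetical helper): flush and close an output stream,
 * reporting *why* it failed.  Returns 0 on success, 1 on failure, so the
 * result can be used directly as an exit status.
 */
static int close_checked(FILE *fp, const char *name)
{
    /* ferror() alone misses data still sitting in the stdio buffer;
       fclose() flushes that buffer and then closes the descriptor,
       which is where some filesystems finally report quota errors. */
    int had_error = ferror(fp);

    if (fclose(fp) == EOF) {
        fprintf(stderr, "%s: %s\n", name, strerror(errno));
        return 1;
    }
    if (had_error) {                /* an earlier write failed */
        fprintf(stderr, "%s: write error\n", name);
        return 1;
    }
    return 0;
}
```

A filter would end main() with something like `return close_checked(stdout, "<stdout>");`, so a failed final flush becomes a nonzero exit status.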

My second question is whether this is just a symptom of sloppy programming
or whether there's some reason why filters (mis)behave this way.

thanks,

--tom

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (12/03/90)

In article <109685@convex.convex.com> tchrist@convex.com (Tom Christiansen) writes:
  [ checking write errors ]
> Cat and col are two utilities I've just caught doing this.  Cat does
> make a stab at detecting this, but it does it wrong.  It just checks
> ferror(stdout), but it's never done an fflush() or an fclose() first.

That's true. However (old discussion resurfacing again), what happens if
another process has the file open, so things don't get written to disk
until that process closes the file? There is simply no way to recover
the data. fsync()ing all the data just to test for things like EDQUOT,
ENOSPC, etc. would be a huge overhead.

You have two strategies for solving this:

1. Replace NFS with a sane remote filesystem, so that you get quota and
space errors on the write()s that trigger them. Unfortunately, it's a
bit difficult to simply throw away NFS at some sites, so go on to #2.

2. Write a program that does nothing but create a file on disk, read
input, and write output to the file, checking all errors. Give the
program options, so that it can do things like fsync if you want, or
write the file to another disk and put in a symbolic link if you run out
of space, or send you a mail message and buffer output in memory as long
as it can hold out. Or whatever.
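Strategy #2 might look roughly like the following. This is a sketch under stated assumptions: copy_fd_to_file and its want_fsync flag are hypothetical names, and the fancier options (symbolic link to another disk, mail, in-memory buffering) are left out. fsync() is optional because of the overhead mentioned above; close() is always checked, since that is where a remote filesystem may finally report quota or space errors.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * copy_fd_to_file (hypothetical): create path, copy everything from the
 * descriptor in into it, and check every step.  Returns 0 on success,
 * 1 on any failure.  want_fsync forces the data to stable storage first.
 */
static int copy_fd_to_file(int in, const char *path, int want_fsync)
{
    char buf[BUFSIZ];
    ssize_t n;
    int out = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0666);

    if (out < 0) {
        fprintf(stderr, "%s: %s\n", path, strerror(errno));
        return 1;
    }
    while ((n = read(in, buf, sizeof buf)) != 0) {
        if (n < 0) {
            if (errno == EINTR)
                continue;                   /* interrupted: retry read */
            fprintf(stderr, "read: %s\n", strerror(errno));
            close(out);
            return 1;
        }
        /* write() may accept fewer bytes than asked for, so loop */
        for (ssize_t done = 0; done < n; ) {
            ssize_t w = write(out, buf + done, (size_t)(n - done));
            if (w < 0) {
                if (errno == EINTR)
                    continue;               /* interrupted: retry write */
                fprintf(stderr, "%s: %s\n", path, strerror(errno));
                close(out);
                return 1;
            }
            done += w;
        }
    }
    if (want_fsync && fsync(out) < 0) {
        fprintf(stderr, "%s: fsync: %s\n", path, strerror(errno));
        close(out);
        return 1;
    }
    if (close(out) < 0) {                   /* NFS may report EDQUOT/ENOSPC only here */
        fprintf(stderr, "%s: close: %s\n", path, strerror(errno));
        return 1;
    }
    return 0;
}
```

A program built around it would call copy_fd_to_file(0, argv[1], fsync_requested) and exit with the result, so the programs feeding it only ever write to standard output.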

After #2, you'll notice that a lot of programs can be drastically
simplified, because they don't have to open files for writing. Meanwhile
errors like the one you're complaining about will disappear. Meanwhile
you'll be able to get programs to work with new filesystems, or
networks, or whatever, just by using a different program. Guess what?
This works for reading files as well. Guess what? It also works for
networks (see my auth and authutil packages). Guess what? There really
is an advantage to modularity.

> Plus it only says "output write error", not what the error is.

stdio's fault. cat should use read() and write() for efficiency and
error checking, but that means a rewrite that nobody's bothered doing. 

> My first question is what could possibly break if cat were to do
> an fclose(stdout) at the end (and check the status and set the exit
> status accordingly).

There's nothing wrong with the change, but it doesn't solve the problem
unless you're guaranteed to be the only writer.

> My second question is whether this is just a symptom of sloppy programming
> or whether there's some reason why filters (mis)behave this way.

Although management at Sun might wish otherwise, UNIX was not rewritten
from scratch when NFS was released upon the world.

---Dan

les@chinet.chi.il.us (Leslie Mikesell) (12/04/90)

In article <109685@convex.convex.com> tchrist@convex.com (Tom Christiansen) writes:

>There seem to be a lot of filters out there that don't check that their
>output succeeded and blindly exit with success irrespective of the
>success of their writes.

>Cat and col are two utilities I've just caught doing this.  Cat does
>make a stab at detecting this, but it does it wrong.  It just checks
>ferror(stdout), but it's never done an fflush() or an fclose() first.
>Plus it only says "output write error", not what the error is.

>My first question is what could possibly break if cat were to do
>an fclose(stdout) at the end (and check the status and set the exit
>status accordingly).

I can't imagine anything breaking, but the situations that would be helped
are pretty rare.  The filters in question are normally used in a pipeline
and thus typically exit before the final program is finished.

>(We know it really needs the fclose and not just fflush because of the
>recent EDQUOTA debate, right?)
>My second question is whether this is just a symptom of sloppy programming
>or whether there's some reason why filters (mis)behave this way.

Didn't most of these programs exist long before EDQUOTA?  The only likely
thing to catch would be a completely full disk, or in the case of pipes,
PIPEDEV being full.    There is seldom a good response to this and if
it happens in a shell script it's pretty likely you are already doing
something wrong.

Les Mikesell
  les@chinet.chi.il.us

ch@dce.ie (Charles Bryant) (12/04/90)

In article <10763:Dec221:21:1590@kramden.acf.nyu.edu> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
>...  cat should use read() and write() for efficiency and
>error checking, but that means a rewrite that nobody's bothered doing. 

Well I can't resist a challenge like that. See <1990Dec4.103255.14195@dce.ie>
in alt.sources. Not only does it use read() and write(), and checks for
errors on close(), but it also has the nice feature of having no features.
Many cat programs have several options, none of which can really be justified:

	-u (unbuffered). My cat is always unbuffered. dd(1) is for buffering.

	-s (silent).	Just use "cat 2>/dev/null".

	-v (show control characters). That's a job for a separate program.
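The claim that -v belongs in a separate program can be made concrete. A minimal sketch of such a filter's core, assuming BSD cat -v conventions (^X for control characters, ^? for DEL, an M- prefix for bytes with the high bit set, tab and newline passed through unchanged); vis_ctl and show_ctl_stream are hypothetical names.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * vis_ctl (hypothetical): render n input bytes into out, roughly the way
 * BSD "cat -v" does.  out needs room for 4 * n bytes ("M-^?" is the
 * worst case).  Returns the number of bytes written to out.
 */
static size_t vis_ctl(const unsigned char *in, size_t n, char *out)
{
    size_t o = 0;
    for (size_t i = 0; i < n; i++) {
        unsigned int c = in[i];
        if (c == '\n' || c == '\t') {   /* layout characters pass through */
            out[o++] = (char)c;
            continue;
        }
        if (c & 0x80) {                 /* high bit set: M- prefix */
            out[o++] = 'M';
            out[o++] = '-';
            c &= 0x7f;
        }
        if (c < 0x20 || c == 0x7f) {    /* control character: ^X, DEL as ^? */
            out[o++] = '^';
            out[o++] = (c == 0x7f) ? '?' : (char)(c | 0x40);
        } else {
            out[o++] = (char)c;
        }
    }
    return o;
}

/* Copy in to out through vis_ctl(); returns 0 on success, 1 on error. */
static int show_ctl_stream(FILE *in, FILE *out)
{
    unsigned char ibuf[BUFSIZ];
    char obuf[4 * BUFSIZ];
    size_t n;
    while ((n = fread(ibuf, 1, sizeof ibuf, in)) > 0) {
        size_t m = vis_ctl(ibuf, n, obuf);
        if (fwrite(obuf, 1, m, out) != m)
            return 1;
    }
    return ferror(in) ? 1 : 0;
}
```

Used as a standalone filter, main() would call show_ctl_stream(stdin, stdout) and then check the close of stdout, as discussed at the top of the thread.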

Anyway, my cat.c is available in alt.sources. It is public domain - you
can sell it or give it away or ignore it as you like.

Btw, I agree that it's silly having to check for errors on close().
-- 
Charles Bryant (ch@dce.ie)
--
/usr/ch/.signature: Block device required

ch@dce.ie (Charles Bryant) (12/04/90)

In article <1990Dec4.105612.14422@dce.ie> I write:
> See <1990Dec4.103255.14195@dce.ie> in alt.sources.

Well I have just found out that our news feed has a problem with
alt.sources so rather than disappoint all you eager readers who can't
live without a new cat, here's the said feline:


To use this you will need:
	read(2), write(2), open(2), close(2), exit(2)
----------------------------------------------------------------
/* cat.c - the infamous cat(1) without bells and whistles */
#include <fcntl.h>
	/* for O_RDONLY and open() */
#include <stdio.h>
	/* for BUFSIZ and perror() */
#include <stdlib.h>
	/* for exit() */
#include <unistd.h>
	/* for read(), write(), close() */

/* error(s) only works on string literals: sizeof gives the literal's length */
#define error(s) write(2, (s), sizeof(s)-1)
#define fileerr(msg, fil) (error(msg), perror(fil), status=1)

int status=0;	/* exit status */

static void cat(int fd, const char *nam)
{
 for (;;)
	{
	 char iobuff[BUFSIZ];
	 ssize_t numin = read(fd, iobuff, sizeof(iobuff));
	 if (numin==0) return;	/* end of file */
	 if (numin < 0)
		{
		 fileerr("Read error: ", nam);
		 return;
		}
	 {
	  ssize_t written;
	  /* write() may take fewer bytes than asked for, so loop until done */
	  for (written=0; written<numin;)
		{
		 ssize_t i = write(1, iobuff+written, numin-written);
		 if (i<0)
			{
			 fileerr("Write error writing ", nam);
			 exit(1);
			}
		 written += i;
		}
	 }
	}
}

int main(int argc, char **argv)
{
 int i;
 if (argc==1)
	cat(0, "<stdin>");	/* no arguments: copy standard input */
 for (i=1; i<argc; i++)
	{
	 int fd=open(argv[i], O_RDONLY);
	 if (fd<0)
		{
		 fileerr("Can't open ", argv[i]);
		 continue;
		}
	 cat(fd, argv[i]);
	 if (close(fd)) fileerr("Error closing input file: ", argv[i]);
	}
 /* check close(1) on every path, including the stdin-only case,
    so write errors reported late (at close time) are not lost */
 if (close(1)) fileerr("Error closing output: ","<stdout>");
 exit(status);
}
-- 
Charles Bryant (ch@dce.ie)
--
/usr/ch/.signature: Block device required

dylan@ibmpcug.co.uk (Matthew Farwell) (12/05/90)

In article <1990Dec4.105612.14422@dce.ie> ch@dce.ie (Charles Bryant) writes:
>Many cat programs have several options, none of which can really be justified:
>
>	-u (unbuffered). My cat is always unbuffered. dd(1) is for buffering

Sorry, but I find typing 'cat foobar' quite a bit easier than typing
'dd if=foobar of=/dev/tty bs=512'. Strange, I know, but it's this little
idiosyncrasy of mine.

>	-s (silent).	Just use "cat 2>/dev/null".

Assuming, of course, that you're in a shell of some sort.

Dylan.
-- 
Matthew J Farwell                 | Email: dylan@ibmpcug.co.uk
The IBM PC User Group, PO Box 360,|        ...!uunet!ukc!ibmpcug!dylan
Harrow HA1 4LQ England            | CONNECT - Usenet Access in the UK!!
Phone: +44 81-863-1191            | Sun? Don't they make coffee machines?

ch@dce.ie (Charles Bryant) (12/07/90)

In article <1990Dec5.014313.28592@ibmpcug.co.uk> dylan@ibmpcug.CO.UK (Matthew Farwell) writes:
>In article <1990Dec4.105612.14422@dce.ie> ch@dce.ie (Charles Bryant) writes:
>>Many cat programs have several options, none of which can really be justified:
>>
>>	-u (unbuffered). My cat is always unbuffered. dd(1) is for buffering
>
>Sorry, but I find typing 'cat foobar' quite a bit easier than typing
>'dd if=foobar of=/dev/tty bs=512'. Strange, I know, but it's this little
>idiosyncrasy of mine.

Firstly, if foobar is a regular file there is no difference, and also
on my system 'dd if=foobar of=/dev/tty bs=512' can be abbreviated to
'dd <foobar'. Of course dd has an annoying 'feature' of printing the number
of blocks processed, but I have written a replacement dd to fix this as
well as the lack of an option to pad output to a multiple of the block
size (as is required with some tape drives). (I can give anyone a copy
of this who wants it).

>>	-s (silent).	Just use "cat 2>/dev/null".
>
>Assuming, of course, that you're in a shell of some sort.

How else would it be invoked? The only other way I can think of is via
exec() when it is trivial to either close(2) or (better) redirect it
to /dev/null.
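For the exec() case, redirecting is better than closing because open() always returns the lowest free descriptor: with fd 2 closed, the child's next open() would become "standard error", and stray messages could land in a data file. A minimal sketch, with run_silenced as a hypothetical name:

```c
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * run_silenced (hypothetical): run argv with standard error sent to
 * /dev/null, wait for it, and return its exit status (-1 if it could
 * not be started or was killed by a signal).
 */
static int run_silenced(char *const argv[])
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {                         /* child */
        int devnull = open("/dev/null", O_WRONLY);
        if (devnull >= 0) {
            dup2(devnull, 2);               /* fd 2 now refers to /dev/null */
            if (devnull != 2)
                close(devnull);
        } else {
            close(2);                       /* fallback: the close(2) option */
        }
        execvp(argv[0], argv);
        _exit(127);                         /* only reached if exec failed */
    }
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

run_silenced(argv) is then the programmatic equivalent of `cmd 2>/dev/null`.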
-- 
Charles Bryant (ch@dce.ie)
--
/usr/ch/.signature: Block device required

paul@tetrauk.UUCP (Paul Ashton) (12/08/90)

In article <1990Dec5.014313.28592@ibmpcug.co.uk> dylan@ibmpcug.CO.UK (Matthew Farwell) writes:
>Sorry, but I find typing 'cat foobar' quite a bit easier than typing
>'dd if=foobar of=/dev/tty bs=512'.

In that case, try "dd<foobar"; that's even easier than "cat foobar" :-)
-- 
Paul

lws@comm.wang.com (Lyle Seaman) (12/16/90)

dylan@ibmpcug.co.uk (Matthew Farwell) writes:
>Sorry, but I find typing 'cat foobar' quite a bit easier than typing
>'dd if=foobar of=/dev/tty bs=512'. Strange, I know, but it's this little
>idiosyncrasy of mine.
> [ ... etc ... ]

So write an alias or a shell script if you want that special functionality.
There's no point in burdening the world with {code, documentation, user
interface} bloat for the sake of features that are adequately provided
elsewhere.

-- 
Lyle                  Wang           lws@capybara.comm.wang.com
508 967 2322     Lowell, MA, USA     Source code: the _ultimate_ documentation.

allbery@NCoast.ORG (Brandon S. Allbery KB8JRR) (12/19/90)

As quoted from <1990Dec15.175917.27136@comm.wang.com> by lws@comm.wang.com (Lyle Seaman):
+---------------
| dylan@ibmpcug.co.uk (Matthew Farwell) writes:
| >Sorry, but I find typing 'cat foobar' quite a bit easier than typing
| >'dd if=foobar of=/dev/tty bs=512'. Strange, I know, but it's this little
| >idiosyncrasy of mine.
| > [ ... etc ... ]
| 
| So write an alias or a shell script if you want that special functionality.
+---------------

Hold it.  *cat* is user bloat?!  I won't argue about cat -v, but:

... pipeline ... | cat header - trailer | ... rest of pipeline ...

So what "more basic" utility does this for me?  I thought cat *was* the basic
utility for this.
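That pipeline depends on cat treating the operand "-" as standard input, as POSIX cat does; the cat.c posted earlier in the thread accepts only file names. A minimal sketch of the operand handling, with open_operand and close_operand as hypothetical helpers:

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * open_operand (hypothetical): "-" means standard input, so
 * "cat header - trailer" splices the pipeline's data between two
 * files.  Returns a descriptor, or -1 if the file cannot be opened.
 */
static int open_operand(const char *name)
{
    if (strcmp(name, "-") == 0)
        return 0;                       /* standard input, already open */
    return open(name, O_RDONLY);
}

/* Close only descriptors we opened ourselves; never close stdin. */
static int close_operand(int fd)
{
    return fd == 0 ? 0 : close(fd);
}
```

In a loop over argv, each descriptor from open_operand() would be copied to standard output and then handed to close_operand(), so standard input stays open if "-" appears more than once.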

User interface bloat is one thing; eliminating a basic utility in favor of a
different filter is something completely different.  (I would argue that "dd"
should not have "if" and "of" because shell redirection or "cat" will do that.
But it must be remembered that dd was designed for IBM-heads.)

++Brandon
-- 
Me: Brandon S. Allbery			    VHF/UHF: KB8JRR on 220, 2m, 440
Internet: allbery@NCoast.ORG		    Packet: KB8JRR @ WA8BXN
America OnLine: KB8JRR			    AMPR: KB8JRR.AmPR.ORG [44.70.4.88]
uunet!usenet.ins.cwru.edu!ncoast!allbery    Delphi: ALLBERY

ggs@ulysses.att.com (Griff Smith) (12/20/90)

In article <1990Dec19.032258.7823@NCoast.ORG>, allbery@NCoast.ORG (Brandon S. Allbery KB8JRR) writes:
> (I would argue that "dd"
> should not have "if" and "of" because shell redirection or "cat" will do that.

Which means I'm stuck with the brain-damaged error messages that some
shells supply when the `open' fails.  No thanks, I'll do it myself
and make sure it's done right.

> But it must be remembered that dd was designed for IBM-heads.)

No sense of humor.  Can't you recognize a treacherous parody when you
see one?  Besides, it's an indispensable system administration tool
when you REALLY need to work with blocks instead of character streams.
There are also some of us who use UNIX systems to get at data written
by IBM systems.  Do you really want to sentence us to use //sysin dd *
on the real thing?  dd may be a bit strange, but it reads blocked
EBCDIC a lot better than awk does, and I don't see a lot of people
offering to write a politically correct replacement.

> He: Brandon S. Allbery			    VHF/UHF: KB8JRR on 220, 2m, 440
> Internet: allbery@NCoast.ORG		    Packet: KB8JRR @ WA8BXN
> America OnLine: KB8JRR			    AMPR: KB8JRR.AmPR.ORG [44.70.4.88]
> uunet!usenet.ins.cwru.edu!ncoast!allbery    Delphi: ALLBERY

-- 
Griff Smith	AT&T (Bell Laboratories), Murray Hill
Phone:		1-201-582-7736
UUCP:		{most AT&T sites}!ulysses!ggs
Internet:	ggs@ulysses.att.com

dylan@ibmpcug.co.uk (Matthew Farwell) (12/20/90)

In article <1990Dec15.175917.27136@comm.wang.com> lws@comm.wang.com (Lyle Seaman) writes:
>dylan@ibmpcug.co.uk (Matthew Farwell) writes:
>>Sorry, but I find typing 'cat foobar' quite a bit easier than typing
>>'dd if=foobar of=/dev/tty bs=512'. Strange, I know, but it's this little
>>idiosyncrasy of mine.
>> [ ... etc ... ]
>So write an alias or a shell script if you want that special functionality.
>There's no point in burdening the world with {code, documentation, user
>interface} bloat for the sake of features that are adequately provided
>elsewhere.

Yes, I will, straight away. Now let's see.....

-rwx--x--x   1 bin      bin         9525 Apr 13  1987 /bin/cat*

Now if I write a shell script called 'cat' such as:

#!/bin/sh
if [ $# = 0 ]
then
	dd bs=512 2> /dev/null
else
	for i in "$@"
	do
		dd if="$i" bs=512 2> /dev/null
	done
fi

(apologies for my shell programming if it's wrong)

-rwx--x--x   2 bin      bin        47228 Jul 13  1988 /bin/sh*

Ok, so my super-duper shell script should work brilliantly for cat.
Ok, it'll take longer to load + execute, and on some machines I can't
exec() it, but I can rewrite my code to get round that can't I?

All for 9k's worth of disk space. Sigh, I don't know, some people.....

Dylan.

p.s. Sorry, I forgot the documentation too. On my machine, that comes to
an extra 1.5k. Even more savings!!!!!
-- 
Matthew J Farwell                 | Email: dylan@ibmpcug.co.uk
The IBM PC User Group, PO Box 360,|        ...!uunet!ukc!ibmpcug!dylan
Harrow HA1 4LQ England            | CONNECT - Usenet Access in the UK!!
Phone: +44 81-863-1191            | Sun? Don't they make coffee machines?

lwall@jpl-devvax.JPL.NASA.GOV (Larry Wall) (01/03/91)

Oh, come on, you guys.  Everybody knows that cat should just be an alias for

    perl -pe ''

:-)  :-)  :-)

I will admit, however, that it does have a few more options than the
standard cat program...

Larry Wall
lwall@jpl-devvax.jpl.nasa.gov

allbery@NCoast.ORG (Brandon S. Allbery KB8JRR) (01/05/91)

As quoted from <10895@jpl-devvax.JPL.NASA.GOV> by lwall@jpl-devvax.JPL.NASA.GOV (Larry Wall):
+---------------
| Oh, come on, you guys.  Everybody knows that cat should just be an alias for
|     perl -pe ''
| :-)  :-)  :-)
+---------------

Obviously the unexpected discovery of "s///ee" has driven him off the deep
end....  ;-)  

(For the folks in .programmer:  someone discovered an unexpected feature in
Perl which startled quite a few people, including its author....)

++Brandon
-- 
Me: Brandon S. Allbery			    VHF/UHF: KB8JRR on 220, 2m, 440
Internet: allbery@NCoast.ORG		    Packet: KB8JRR @ WA8BXN
America OnLine: KB8JRR			    AMPR: KB8JRR.AmPR.ORG [44.70.4.88]
uunet!usenet.ins.cwru.edu!ncoast!allbery    Delphi: ALLBERY