[comp.unix.questions] Replies to 'Using pipes within awk programs'

gwc@root.co.uk (Geoff Clare) (01/20/88)

This is a summary of the responses I got to my question about
pipes in awk.  The original question was:

> 
> I have discovered that it is possible to send data down a pipe from
> within an awk program.  For example
> 
> 	who | awk '{ print $1 | "sort" }'
> 
> gives the same output as
> 
> 	who | awk '{ print $1 }' | sort
> 
> I have tried this on a whole variety of flavours of UNIX, and it
> works on all of them.
> 
> My question is this:  I can't find any reference to this feature
> in any of our awk manuals, so is it safe to use it? and will it
> work on ALL systems or just some of them?  (I have tried it on
> Uniplus+ V.2, BSD4.2, HP-UX, Sequent DYNIX, Tolerant TX).

First, I must apologise for some incorrect information in the above
article.  I said I had tried it on BSD4.2, but in fact the VAX I tried
it on (which I use very rarely, I hasten to add) had been upgraded
to 4.3 without me realising it!  Who says nobody takes any notice of
login messages? :-)

My thanks to everyone who replied.  I have included all the replies
in this article for those who are interested, but for those who don't
want to wade through them, here is a brief summary:

Yes, the feature is there on *almost* all implementations (a few people gave
examples of specific systems on which it doesn't work).  However, there is
a bug on many versions (mostly sysV & derivatives): because awk never
closes the pipe, it may return before the child has finished.  So if
you are running a script and expecting to process the output of the
child straight after the awk, you may get incorrect results.  I didn't
have this problem because I was piping the output of the awk into another
process (in fact it was another awk!), so the receiving process was waiting
for the output from the child to finish.  This means that a workaround for
the problem is to use

	awk '/.../ { print .... | "command" }' | cat > outfile
	process outfile

in a script instead of just

	awk '/.../ { print .... | "command" }' > outfile
	process outfile

to force the script to wait for "command" before executing "process".

So the answer seems to be: if you've got it, it's safe to use provided
you pipe the output to another process.
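
On awks that provide close() (the new awk described in the book, and later
POSIX awks), another way to get the same guarantee is to close the pipe
explicitly, which makes awk wait for the command to finish.  A minimal
sketch, assuming such an awk; the sample input and the name "outfile" are
made up for illustration, and the older awks discussed in this thread may
not have close() at all:

```shell
# Stand-in for "who": three fake login lines (illustrative data only).
printf 'carol tty1\nalice tty2\nbob tty3\n' |
awk '{ print $1 | "sort" }
     END { close("sort") }   # close() waits for sort to exit
' > outfile
cat outfile
```

The string passed to close() must match the command string used in the
print statement exactly.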

Geoff.

-------------------------------------
From: Peter Hamer <pgh@stl.stc.co.uk>

A long time ago, on some UN*X, I tried to use the pipe feature of awk - which
I found documented somewhere.

Unfortunately, another bug in UN*X meant that the output file was still being
sorted *after* awk finished.  This produced interesting and wildly
unpredictable results as the next command in my shell script tried to
operate on a changing file.

The UN*X bug was to do with waiting for grand-child processes to finish before
letting a process finish; I think that it's still there in most UN*Xs.

Hence I would not try to use this otherwise highly desirable option. If you 
find out that it is now safe to use, please let me know.

------------------------------------
From: Jim Reid <jim@cs.strath.ac.uk>

This feature is documented in a new book "The Awk Programming Language"
by Aho, Weinberger and Kernighan. Their book covers the old (V7) awk and
the new one with Sys 5.3 that has function calls and other goodies. The
first example you give DOES NOT work with 4.2 BSD awk or with the awk
supplied in version 3.0 SunOS (they both silently do nothing), though awk
in 3.2 SunOS does work the way you expect. This leads me to conclude
(without checking our sources) that the 4.2 BSD and 3.0 SunOS awk is
essentially V7 awk which does not support pipes.

Since awk's authors are using the new awk, this must be considered the
definitive version (ie the one true UNIX is whatever runs on Dennis
Ritchie's machine) and vendors should be expected to supply it.

Get the book if you can. It is well worth it no matter what flavour of
awk you have.

------------------------------------
From: John Pavel <jrp@psg.npl.co.uk>

As well as looking in "The AWK Programming Language", Aho, Kernighan and
Weinberger, 1988, you might find the appended paper to be of interest.

[ The 'appended paper' was

	A Supplemental Document For AWK
	- or -
	Things Al, Pete, And Brian Didn't Mention Much

	John W. Pierce
	
	Department of Chemistry
	University of California, San Diego
	La Jolla, California  92093
	jwp%chem@sdcsvax.ucsd.edu

and it made very interesting reading.  If you want a copy I suggest you
contact John Pierce if you are in the US, or John Pavel or myself if you
are in the UK or Europe. - Geoff]

---------------------------------------
From: Tim Wortley <wortley@ee.hw.ac.uk>

It is in our manuals (not the 'man' manuals, though) for our Perkin-Elmer
running Xelos (SysV).  I can't remember all the quirks, but basically
you can safely pipe to any command from within an awk script.
>Uniplus+ V.2, BSD4.2, HP-UX, Sequent DYNIX, Tolerant TX).
Also on peXELOS ( SysV ) and Gould UTX/32 ( 4.3BSD )

------------------------------------------------------------------
From: John Trinterud
	{ames,pyramid,inhp4,sun}!pacbell!pt06a!john

My understanding is that any legal UNIX command may be specified in
this precise manner; the double quotes surrounding it are mandatory!
This construct is described in my AT&T "UNIX Support Tools Guide",
and I've used it many times in the past.

Actually, I've had more nasty problems importing environment variables
into awk scripts, but that's another story entirely.....

--------------------------------------
From: Jim Campbell <jimc@haddock.uucp>
	{ihnp4,harvard,decvax}!ima!haddock!jimc

Yes, it is safe to use -- it is a deliberate feature implemented
in the early stages.   References to it exist in the System III
and System V reference manuals, though peculiarly nothing about
it appears in the manual pages.  You will find it where output redirection
is described.  Often, awk programmers will use this as a means
of running a command from an awk script, whether or not the command
reads from standard input.  Though this can appear awkward, it at
least permits emulation of a "system()" routine, which awk, for
some crazy reason, doesn't have.
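
A minimal sketch of the trick described above, for illustration only.  The
command string is made up; the leading "cat >/dev/null" is an addition here
that soaks up the line awk writes, so that a command which ignores its
standard input does not leave awk writing to a closed pipe.  Newer awks
that have system() don't need any of this.

```shell
# Hypothetical command string: cat drains the empty line awk sends,
# then the real command (here just echo) runs.
out=$(echo trigger | awk '{ print "" | "cat >/dev/null; echo ran from awk" }')
echo "$out"
```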

*HOWEVER* -- you should be warned about something.  In some Bell
sources, a bug exists in the pipe handling.  The pipe is opened through
the standard I/O call "popen()", but it is never closed with
"pclose()".  This is because piping is handled in the same section of
code that handles output redirection, and the original awk programmer,
whoever s/he was, remembered only to resolve the handling of output
redirection.  Thus, there is a call to "fflush()", which is fine in
output redirection, but that does nothing for the case of the pipe.
The result:  the process invoked by the pipe may not complete by the
time the next awk command is run, or, even worse, the process can be
left running even after awk exits.

Berkeley seems to have fixed the problem.  To test if this problem
exists on your system, try this:

	echo hi there | awk '{print | "sleep 10"; }'

If awk exits immediately, you have the bug. 
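
The same check can be scripted rather than judged by eye.  A rough sketch,
assuming a date(1) that accepts +%s (not every system has it); the sleep is
shortened to 2 seconds and the variable names are arbitrary:

```shell
# Timed variant of the test above; sleep is shortened to 2 seconds.
# Assumes a date(1) that supports +%s (seconds since the epoch).
start=$(date +%s)
echo hi there | awk '{ print | "sleep 2" }'
end=$(date +%s)
elapsed=$((end - start))
if [ "$elapsed" -lt 2 ]; then
    verdict="bug: awk returned before the child finished"
else
    verdict="ok: awk waited for the child"
fi
echo "$verdict (${elapsed}s)"
```

Which verdict you get depends on the awk installed.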

---------------------------------
From: Wolf Paul <wnp@killer.uucp>
	ihnp4!killer!wnp

I have just tried the two commandlines both on an AT&T 3B2 running SVR3,
where they seem to work identically, and on my iAPX286 machine running
Microport System V/AT, where the "internal pipe" version seems to run 
partly in the background: I got my shell prompt back before I got the 
output from awk. I should say that I have a very slow hard disk on my
AT clone, which may have something to do with it.

-----------------------------------
From: Neil Dixon <neil@yc.estec.nl>

I can't speak for all flavours of UNIX, but I can give you an example
where pipes are not implemented (or at least where they don't work).
They don't work in HP-UX v5.21 for HP9000/520 machines. Previous releases
worked ok. I don't know whether this is a bug or a feature.

------------------------------------
From: Dave Lennert <davel@hpda.uucp>
	ihnp4!hplabs!hpda!davel
	uunet!hpda!davel

This feature is mentioned in the article on awk in the Unix Programmer's
Manual, Volume 2 (called HP-UX Selected Articles in HP-UX).

An interesting thing to know about this feature is that, in most versions
of awk, awk doesn't wait for the child process (sort in your example)
to complete before exiting when input is exhausted.  So when you get
your prompt (or worse, when you proceed to the next command in your
script!) the sort may not actually be finished!

-------------------------------------------
From: David P Huelsbeck <dph@lanl.gov.arpa>
	{ihnp4,cmcl2}!lanl!dph

Yes, pipes are a standard feature of all awks I know of.
(bsd4.[23] ultrix unicos sysV) 

In fact all of these awks, with the possible exception of
the new awk by  A W & K, are almost identical in source code.
If you have source note the comment "/* witchcraft */" in
the yacc source. Pretty unlikely that this was an independent
creation of multiple authors.

Also note that ls -l | awk '{ print $2 >> "output" }' will do the
same thing as ls -l | awk '{ print $2 }' >> output .

Caveat: Most awks unless they've been hacked keep all pipes and
files open for the duration.  So if you want to sort on some key
into various files (like to do a real fast sort of the 
 awk -f split.awk unsorted.file ; cat *.x > sorted.file
variety) awk will puke after 10-20 (your mileage may vary) files 
have been written to.  Oh well.  Can't have everything.  
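
Where the awk has close() (the new awk and POSIX awks do; the older ones
discussed here may not), the limit can be sidestepped by closing each file
after writing to it.  A sketch under that assumption; the sample data, the
temporary directory and the ".x" naming are invented for illustration, and
-v is also a newer-awk feature:

```shell
# Split lines into one file per key (field 1), closing each file
# after every write so awk never holds more than one open at a time.
dir=$(mktemp -d)
printf 'b 2\na 1\nc 3\na 4\n' |
awk -v dir="$dir" '{
    f = dir "/" $1 ".x"
    print >> f      # append, since the file is reopened for each line
    close(f)
}'
# The shell expands *.x in sorted order, giving output grouped by key.
cat "$dir"/*.x > "$dir/sorted.file"
cat "$dir/sorted.file"
```

Opening and closing a file for every line is slow; it trades speed for
staying under the open-file limit.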

----------------------------------------------------
From: Hugh Dempsey (USAFAS | Howard) <hugh@brl.arpa>

	I don't know if it is allowed in all versions of unix, but it does
work on Xenix 3.4 running on an Intel 310.

	It is described on pages 170 and 171 of the ATT Unix System V Programmers
Guide.
-- 

Geoff Clare              gwc@root.co.uk            seismo!mcvax!ukc!root44!gwc

guy@gorodish.Sun.COM (Guy Harris) (01/23/88)

> Yes, the feature is there on *almost* all implementations (a few people gave
> examples of specific systems on which it doesn't work).  However, there is
> a bug on many versions (mostly sysV & derivatives):

And 4.2BSD as well.

> This feature is documented in a new book "The Awk Programming Language"
> by Aho, Weinberger and Kernighan. Their book covers the old (V7) awk and
> the new one with Sys 5.3 that has function calls and other goodies. The
> first example you give DOES NOT work with 4.2 BSD awk or with the awk
> supplied in version 3.0 SunOS (they both silently do nothing), though awk
> in 3.2 SunOS does work the way you expect. This leads me to conclude
> (without checking our sources) that the 4.2 BSD and 3.0 SunOS awk is
> essentially V7 awk which does not support pipes.

The 3.0 SunOS "awk" is essentially the 4.2BSD "awk" with a bug that caused some
problems with fields fixed.  This bug was introduced in the process of fixing
NULL pointer dereferencing bugs, so it's probably not in versions not derived
from the 4.2 one.  The bug is fixed in 4.3BSD.

The 3.2 "awk" is the S5R2 "awk", with the 4.3BSD fix to the pipe problem added.
It is not the new S5R3 "awk".  There were actually two V7 "awk"s: the one that
came on the original V7 tape and one that came on a "V7 addendum" tape.  The
4.2BSD (and 4.1BSD) "awk" is derived from the "V7 addendum" tape version; it
included code to handle pipes, but did not include the fix from 4.3BSD.

The main difference between the "V7 addendum" "awk" and the S5R2 "awk" is that
the latter is considerably faster; a number of things were changed that
probably account for this (for one thing, the structure-valued functions and
function arguments were nuked).
	Guy Harris
	{ihnp4, decvax, seismo, decwrl, ...}!sun!guy
	guy@sun.com