gwc@root.co.uk (Geoff Clare) (01/20/88)
This is a summary of the responses I got to my question about pipes in awk.
The original question was:

> I have discovered that it is possible to send data down a pipe from
> within an awk program.  For example
>
>	who | awk '{ print $1 | "sort" }'
>
> gives the same output as
>
>	who | awk '{ print $1 }' | sort
>
> I have tried this on a whole variety of flavours of UNIX, and it
> works on all of them.
>
> My question is this:  I can't find any reference to this feature
> in any of our awk manuals, so is it safe to use it? and will it
> work on ALL systems or just some of them?  (I have tried it on
> Uniplus+ V.2, BSD4.2, HP-UX, Sequent DYNIX, Tolerant TX).

First, I must apologise for some incorrect information in the above article.
I said I had tried it on BSD4.2, but in fact the VAX I tried it on (which I
use very rarely, I hasten to add) had been upgraded to 4.3 without me
realising it!  Who says nobody takes any notice of login messages? :-)

My thanks to everyone who replied.  I have included all the replies in this
article for those who are interested, but for those who don't want to wade
through them, here is a brief summary:

Yes, the feature is there on *almost* all implementations (a few people gave
examples of specific systems on which it doesn't work).  However, there is
a bug on many versions (mostly sysV & derivatives): because awk never closes
the pipe, it may return before the child has finished.  So if you are
running a script and expecting to process the output of the child straight
after the awk, you may get incorrect results.

I didn't have this problem because I was piping the output of the awk into
another process (in fact it was another awk!), so the receiving process was
waiting for the output from the child to finish.  This means that a
workaround for the problem is to use

	awk '/.../ { print .... | "command" }' | cat > outfile
	process outfile

in a script instead of just

	awk '/.../ { print .... | "command" }' > outfile
	process outfile

to force the script to wait for "command" before executing "process".

So the answer seems to be: if you've got it, it's safe to use provided you
pipe the output to another process.

Geoff.

-------------------------------------
From: Peter Hamer <pgh@stl.stc.co.uk>

A long time ago, on some UN*X, I tried to use the pipe feature of awk -
which I found documented somewhere.

Unfortunately, another bug in UN*X meant that the output file was still
being sorted *after* awk finished.  This produced interesting and wildly
unpredictable results as the next command in my shell script tried to
operate on a changing file.

The UN*X bug was to do with waiting for grand-child processes to finish
before letting a process finish; I think that it's still there in most
UN*Xs.  Hence I would not try to use this otherwise highly desirable option.

If you find out that it is now safe to use, please let me know.

------------------------------------
From: Jim Reid <jim@cs.strath.ac.uk>

This feature is documented in a new book "The Awk Programming Language"
by Aho, Weinberger and Kernighan.  Their book covers the old (V7) awk and
the new one with Sys 5.3 that has function calls and other goodies.  The
first example you give DOES NOT work with 4.2 BSD awk or with the awk
supplied in version 3.0 SunOS (they both silently do nothing), though awk
in 3.2 SunOS does work the way you expect.  This leads me to conclude
(without checking our sources) that the 4.2 BSD and 3.0 SunOS awk is
essentially V7 awk which does not support pipes.

Since awk's authors are using the new awk, this must be considered the
definitive version (ie the one true UNIX is whatever runs on Dennis
Ritchie's machine) and vendors should be expected to supply it.

Get the book if you can.  It is well worth it no matter what flavour of awk
you have.
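
The workaround from the summary at the top of this article can be sketched
as a small script.  This is only an illustration, assuming a POSIX shell and
an awk that supports piping to a command; the sample input and the output
file name are made up for the example.

```shell
#!/bin/sh
# Sketch of the "| cat" workaround (hypothetical input and file name).
outfile=${TMPDIR:-/tmp}/awkcat.$$

# sort inherits awk's standard output, which here is the pipe to cat.
# cat only sees end-of-file once every writer, sort included, has closed
# that pipe, so the shell cannot move on until sort's output is complete.
printf 'carol tty1\nalice tty2\nbob tty3\n' |
    awk '{ print $1 | "sort" }' | cat > "$outfile"

# Safe to process the file now: it is no longer changing.
first=$(sed 1q "$outfile")
count=$(wc -l < "$outfile" | tr -d ' ')
echo "first=$first lines=$count"

rm -f "$outfile"
```

The extra cat costs one process, but it does not depend on whether a
particular awk waits for its child before exiting.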
------------------------------------
From: John Pavel <jrp@psg.npl.co.uk>

As well as looking in "The AWK Programming Language", Aho, Kernighan and
Weinberger, 1988, you might find the appended paper to be of interest.

[ The 'appended paper' was

		A Supplemental Document For AWK
			     - or -
	  Things Al, Pete, And Brian Didn't Mention Much

			 John W. Pierce
		     Department of Chemistry
	       University of California, San Diego
		  La Jolla, California  92093
		   jwp%chem@sdcsvax.ucsd.edu

  and it made very interesting reading.  If you want a copy I suggest you
  contact John Pierce if you are in the US, or John Pavel or myself if you
  are in the UK or Europe. - Geoff]

---------------------------------------
From: Tim Wortley <wortley@ee.hw.ac.uk>

It is in our manuals (not 'man' manuals though) for our Perkin-Elmer
running Xelos (SysV).  Though I can't remember all the quirks, basically
you can safely pipe to any command from within an awk script.

>Uniplus+ V.2, BSD4.2, HP-UX, Sequent DYNIX, Tolerant TX).

Also on peXELOS (SysV) and Gould UTX/32 (4.3BSD)

------------------------------------------------------------------
From: John Trinterud {ames,pyramid,inhp4,sun}!pacbell!pt06a!john

My understanding is that any legal UNIX command may be specified in this
precise manner; the double quotes surrounding it are mandatory!

This construct is described in my AT&T "UNIX Support Tools Guide", and I've
used it many times in the past.

Actually, I've had more nasty problems importing environment variables into
awk scripts, but that's another story entirely.....

--------------------------------------
From: Jim Campbell <jimc@haddock.uucp>  {ihnp4,harvard,decvax}!ima!haddock!jimc

Yes, it is safe to use -- it is a deliberate feature implemented in the
early stages.  References to it exist in the System III and System V
reference manuals, though peculiarly nothing about it appears in the manual
pages.  You will find it where output redirection is described.
Often, awk programmers will use this as a means of running a command from
an awk script, whether or not the command reads from standard input.
Though this can appear awkward, it at least permits emulation of a
"system()" routine, which awk, for some crazy reason, doesn't have.

*HOWEVER* -- you should be warned about something.  In some Bell sources, a
bug exists in the pipe handling.  The pipe is opened through the standard
I/O call "popen()", but it is never closed with "pclose()".  This is because
piping is handled in the same section of code that handles output
redirection, and the original awk programmer, whoever s/he was, remembered
only to resolve the handling of output redirection.  Thus, there is a call
to "fflush()", which is fine in output redirection, but that does nothing
for the case of the pipe.  The result: the process invoked by the pipe may
not complete by the time the next awk command is run, or, even worse, the
process can be left running even after awk exits.  Berkeley seems to have
fixed the problem.  To test if this problem exists on your system, try this:

	echo hi there | awk '{print | "sleep 10"; }'

If awk exits immediately, you have the bug.

---------------------------------
From: Wolf Paul <wnp@killer.uucp>  ihnp4!killer!wnp

I have just tried the two command lines both on an AT&T 3B2 running SVR3,
where they seem to work identically, and on my iAPX286 machine running
Microport System V/AT, where the "internal pipe" version seems to run
partly in the background: I got my shell prompt back before I got the
output from awk.

I should say that I have a very slow hard disk on my AT clone, which may
have something to do with it.

-----------------------------------
From: Neil Dixon <neil@yc.estec.nl>

I can't speak for all flavours of UNIX, but I can give you an example
where pipes are not implemented (or at least where they don't work).  They
don't work in HP-UX v5.21 for HP9000/520 machines.  Previous releases
worked ok.
I don't know whether this is a bug or a feature.

------------------------------------
From: Dave Lennert <davel@hpda.uucp>  ihnp4!hplabs!hpda!davel  uunet!hpda!davel

This feature is mentioned in the article on awk in the Unix Programmer's
Manual, Volume 2 (called HP-UX Selected Articles in HP-UX).

An interesting thing to know about this feature is that, in most versions
of awk, awk doesn't wait for the child process (sort in your example) to
complete before exiting when input is exhausted.  So when you get your
prompt (or worse, when you proceed to the next command in your script!) the
sort may not actually be finished!

-------------------------------------------
From: David P Huelsbeck <dph@lanl.gov.arpa>  {ihnp4,cmcl2}!lanl!dph

Yes, pipes are a standard feature of all awks I know of (bsd4.[23] ultrix
unicos sysV).  In fact all of these awks, with the possible exception of the
new awk by A W & K, are almost identical in source code.  If you have source,
note the comment "/* witchcraft */" in the yacc source.  Pretty unlikely
that this was an independent creation of multiple authors.

Also note that

	ls -l | awk '{ print $2 >> "output" }'

will do the same thing as

	ls -l | awk '{ print $2 }' >> output

Caveat:  Most awks, unless they've been hacked, keep all pipes and files open
for the duration.  So if you want to sort on some key into various files
(like to do a real fast sort of the

	awk -f split.awk unsorted.file ; cat *.x > sorted.file

variety) awk will puke after 10-20 (your mileage may vary) files have been
written to.  Oh well.  Can't have everything.

----------------------------------------------------
From: Hugh Dempsey (USAFAS | Howard) <hugh@brl.arpa>

I don't know if it is allowed in all versions of unix, but it does work on
Xenix 3.4 running on an Intel 310.  It is described on pages 170 and 171 of
the ATT Unix System V Programmers Guide.

-- 
Geoff Clare    gwc@root.co.uk    seismo!mcvax!ukc!root44!gwc
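
The behaviour Jim Campbell and Dave Lennert describe above can also be
checked without watching the clock.  The following is a rough sketch rather
than anything posted in the thread: it assumes a POSIX shell and an awk that
accepts -v assignments, and the marker file name is invented for the test.

```shell
#!/bin/sh
# Hypothetical marker file; the child creates it only after a delay.
marker=${TMPDIR:-/tmp}/awkwait.$$
rm -f "$marker"

# The child drains its input, sleeps for a second, then creates the marker.
echo hi there |
awk -v m="$marker" '{ print | ("cat > /dev/null; sleep 1; : > " m) }'

# If the marker already exists at this point, awk did not return until
# the child had finished; if not, awk came back early.
if [ -f "$marker" ]; then
    verdict="awk waited for the child"
else
    verdict="awk returned before the child finished"
fi
echo "$verdict"

sleep 2              # let a straggling child finish before cleaning up
rm -f "$marker"
```

A one-second sleep keeps the check quick; a longer delay makes the result
easier to see on a heavily loaded machine.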
guy@gorodish.Sun.COM (Guy Harris) (01/23/88)
> Yes, the feature is there on *almost* all implementations (a few people gave
> examples of specific systems on which it doesn't work).  However, there is
> a bug on many versions (mostly sysV & derivatives):

And 4.2BSD as well.

> This feature is documented in a new book "The Awk Programming Language"
> by Aho, Weinberger and Kernighan.  Their book covers the old (V7) awk and
> the new one with Sys 5.3 that has function calls and other goodies.  The
> first example you give DOES NOT work with 4.2 BSD awk or with the awk
> supplied in version 3.0 SunOS (they both silently do nothing), though awk
> in 3.2 SunOS does work the way you expect.  This leads me to conclude
> (without checking our sources) that the 4.2 BSD and 3.0 SunOS awk is
> essentially V7 awk which does not support pipes.

The 3.0 SunOS "awk" is essentially the 4.2BSD "awk" with a bug that caused
some problems with fields fixed.  This bug was introduced in the process of
fixing NULL pointer dereferencing bugs, so it's probably not in versions not
derived from the 4.2 one.  The bug is fixed in 4.3BSD.

The 3.2 "awk" is the S5R2 "awk", with the 4.3BSD fix to the pipe problem
added.  It is not the new S5R3 "awk".

There were actually two V7 "awk"s: the one that came on the original V7 tape
and one that came on a "V7 addendum" tape.  The 4.2BSD (and 4.1BSD) "awk" is
derived from the "V7 addendum" tape version; it included code to handle
pipes, but did not include the fix from 4.3BSD.  The main difference between
the "V7 addendum" "awk" and the S5R2 "awk" is that the latter is
considerably faster; a number of things were changed that probably account
for this (for one thing, the structure-valued functions and function
arguments were nuked).

	Guy Harris
	{ihnp4, decvax, seismo, decwrl, ...}!sun!guy
	guy@sun.com
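
Where the awk in use provides close() (the new awk covered in the book
mentioned above does, as does POSIX awk; nobody in this thread discusses it),
a script can finish the pipe explicitly rather than depend on what awk does
at exit.  A minimal sketch, assuming such an awk, with made-up input and a
made-up temporary file name:

```shell
#!/bin/sh
# Hypothetical temporary file that receives the sorted output.
sorted=${TMPDIR:-/tmp}/awkclose.$$

printf 'carol\nalice\nbob\n' |
awk -v out="$sorted" '
    BEGIN { cmd = "sort > " out }   # one string names the pipe everywhere
          { print $1 | cmd }        # first use starts sort, later uses reuse it
    END   { close(cmd) }            # close the pipe and wait for sort to exit
'

# sort has finished by the time awk returns, so the file is complete.
result=$(cat "$sorted")
rm -f "$sorted"
printf '%s\n' "$result"
```

Keeping the command in a single variable matters: close() only finds the
pipe if it is given exactly the same string that print was given.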