warner@unc.cs.unc.edu (Byron Warner) (11/15/89)
My question is how do you import csh variables into an awk script.
For example, if I have a file called foo, which contains:
{
print import,$0
}
and I issue the command
awk -F: -f foo /etc/passwd import='hello
Why do I get just a list of logins?
Thanx in Advance
jik@athena.mit.edu (Jonathan I. Kamens) (11/15/89)
In article <10531@thorin.cs.unc.edu> warner@unc.cs.unc.edu (Byron Warner) writes:
>My question is how do you import csh variables into an awk script.
>For example, if I have a file called foo, which contains:
>{
>    print import,$0
>}
>
>and I issue the command
>awk -F: -f foo /etc/passwd import='hello
>Why do I get just a list of logins?
>Thanx in Advance

  First of all, I have never known the C-shell to allow the syntax
"foo=bar" on a command-line to import a variable into a program.  C
shell doesn't have anything like that.

  Second, the only way to do what you want is to actually make the
creation of this variable part of the awk script.  Like this:

    % set import = 'hello'
    % awk 'BEGIN { import = "'"$import"'" } { print import, $0}' /etc/passwd

The $import is evaluated before awk is actually called, and replaced by
'hello' (sans quotes).

Jonathan Kamens                               USnail:
MIT Project Athena                            11 Ashford Terrace
jik@Athena.MIT.EDU                            Allston, MA  02134
Office: 617-253-8495                          Home: 617-782-0710
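For readers using sh rather than csh, the splicing trick translates directly; the sketch below sets it beside the -v option that comes up later in this thread. It assumes a POSIX sh and an awk that supports -v; the function names are hypothetical, and a single made-up passwd-style line stands in for /etc/passwd. Because the splice pastes the value into the program text, a value containing a double quote would break the awk program, while -v keeps the value out of the program text (backslash escapes in a -v value are still interpreted).

```shell
#!/bin/sh
# Kamens's quote-splicing trick in sh syntax: the shell pastes the value
# into the awk program text before awk ever sees it.
splice_import() {
    import=$1
    printf 'root:x:0:0\n' |
        awk -F: 'BEGIN { import = "'"$import"'" } { print import, $1 }'
}

# The same thing with -v: the value is passed as data, not as program text.
v_import() {
    printf 'root:x:0:0\n' | awk -F: -v import="$1" '{ print import, $1 }'
}
```

For example, splice_import hello and v_import hello should both print hello root, and v_import 'say "hi"' should print say "hi" root.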
chris@mimsy.umd.edu (Chris Torek) (11/15/89)
>In article <10531@thorin.cs.unc.edu> warner@unc.cs.unc.edu (Byron Warner)
>writes:
[file foo]
>>{ print import,$0 }
[command]
>>awk -F: -f foo /etc/passwd import='hello
>>Why do I get just a list of logins?

In article <15919@bloom-beacon.MIT.EDU> jik@athena.mit.edu
(Jonathan I. Kamens) writes:
>  First of all, I have never known the C-shell to allow the syntax
>"foo=bar" on a command-line to import a variable into a program.

It does not.  However, awk does.  That is, you are looking at the
wrong program.

>  Second, the only way to do what you want is to actually make the
>creation of this variable part of the awk script.  Like this:

Not so: Within some limits, you can set awk variables from its
invocation.  For instance:

        % cat t
        BEGIN { print "BEGIN: " this; }
        { print "INPUT: " this " " $0; }
        END { print "END: " this; }
        % cat u
        first line
        second line
        % awk -f t u this=that
        BEGIN:
        INPUT:  first line
        INPUT:  second line
        END: that
        % awk -f t this=that u
        BEGIN:
        INPUT: that first line
        INPUT: that second line
        END: that
        % rm t u

The `BEGIN' statement is done before any `files' are opened; the `END'
statement is done after all `files' have been read.  Any `files' of
the form `a=b' set variable `a' to value `b'.

All of the above is with respect to the 4.3BSD flavour of `awk'.  The
new awk (as described in the awk book) appears to open the first `file'
before executing the BEGIN statement, so that any assignments that
appear before the first real file happen before the BEGIN.  What GNU
awk does, I do not know (but the above technique will tell you).
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:  chris@cs.umd.edu  Path:  uunet!mimsy!chris
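Torek's experiment can be repeated with a present-day awk. The sketch below assumes a POSIX sh and a POSIX-conforming awk, under which an assignment operand takes effect just before the next file is read, and one placed after the last file takes effect just before END; it should therefore print the same lines as the 4.3BSD session above. The program text is inlined instead of living in a file named t, u is a temporary file, and the function name is hypothetical.

```shell
#!/bin/sh
# Rerun of the t/u experiment with the program inlined and u in a temp file.
torek_demo() {
    u=$(mktemp) || return 1
    printf 'first line\nsecond line\n' > "$u"
    prog='BEGIN { print "BEGIN: " this }
          { print "INPUT: " this " " $0 }
          END { print "END: " this }'
    echo "assignment after the file:"
    awk "$prog" "$u" this=that      # this is set only before END
    echo "assignment before the file:"
    awk "$prog" this=that "$u"      # this is set after BEGIN, before input
    rm -f "$u"
}
```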
steinbac@hpl-opus.HP.COM (Gunter Steinbach) (11/16/89)
> / hpl-opus:comp.unix.questions / warner@unc.cs.unc.edu (Byron Warner)
> / 1:15 pm  Nov 14, 1989 /
> My question is how do you import csh variables into an awk script.
> [ deleted ]
> awk -F: -f foo /etc/passwd import='hello

The variable assignment has to come before the input file name.

    Guenter Steinbach       | hplabs!gunter_steinbach
                            | gunter_steinbach@hplabs.hp.com
jik@athena.mit.edu (Jonathan I. Kamens) (11/16/89)
In article <20774@mimsy.umd.edu> chris@mimsy.umd.edu (Chris Torek) writes:
>...
>The `BEGIN' statement is done before any `files' are opened; the `END'
>statement is done after all `files' have been read.  Any `files' of
>the form `a=b' set variable `a' to value `b'.

  Nifty!  Two questions:

1. Why isn't this mentioned in the BSD man page awk(1), or in the
/usr/doc documentation about awk?

2. What happens if you actually want to read in a file that has = in
the filename?  How am I supposed to know what happens if the
feature isn't mentioned in documentation? :-)

Jonathan Kamens                               USnail:
MIT Project Athena                            11 Ashford Terrace
jik@Athena.MIT.EDU                            Allston, MA  02134
Office: 617-253-8495                          Home: 617-782-0710
tale@pawl.rpi.edu (David C Lawrence) (11/16/89)
In <10531@thorin.cs.unc.edu> warner@unc.cs.unc.edu (Byron Warner) writes:
Byron> [file foo]
Byron> { print import,$0 }
Byron> [command]
Byron> awk -F: -f foo /etc/passwd import='hello
Byron> Why do I get just a list of logins?

Because the variable assignment has to come before the file name.  I'm
also assuming here that the ' is a typo, or the absence of a matching '
is; either way variable assignment comes before the file list.  If you
change it to "awk -F: -f foo import=hello /etc/passwd" it will work.
This applies to V7 awk, nawk and gawk.

In <20774@mimsy.umd.edu> chris@mimsy.umd.edu (Chris Torek) writes:
Chris> All of the above is with respect to the 4.3BSD flavour of `awk'.  The
Chris> new awk (as described in the awk book) appears to open the first `file'
Chris> before executing the BEGIN statement, so that any assignments that
Chris> appear before the first real file happen before the BEGIN.  What GNU
Chris> awk does, I do not know (but the above technique will tell you).

Variables set as above are not available in the BEGIN block with gawk,
but a special option, -v, is provided to do this.  -v VAR=VAL will
assign VAL to VAR before script execution begins; another -v must be
specified for each variable you want to declare this way.

Dave
-- 
(setq mail '("tale@pawl.rpi.edu" "tale@ai.mit.edu" "tale@rpitsmts.bitnet"))
tale@pawl.rpi.edu (David C Lawrence) (11/16/89)
In <15924@bloom-beacon.MIT.EDU> jik@athena.mit.edu (Jonathan I. Kamens) writes:
Jon> 1. Why isn't this mentioned in the BSD man page awk(1), or in the
Jon> /usr/doc documentation about awk?
Oversight, I suppose. SunOS manual page has it.
Jon> 2. What happens if you actually want to read in a file that has = in
Jon> the filename? How am I supposed to know what happens if the
Jon> feature isn't mentioned in documentation? :-)
Good question. I just tried a few different things which I thought
might work and none of them did. It appears as though .*=.* patterns
which appear after a file name (/dev/null in my test case) are simply
ignored; they are neither interpreted as variable assignments nor as
file names. I also tried passing it an arg of foo\=bar (my test case)
and it still did nothing. In fact, it didn't even read stdin. Hmm ...
Dave
--
(setq mail '("tale@pawl.rpi.edu" "tale@ai.mit.edu" "tale@rpitsmts.bitnet"))
merlyn@iwarp.intel.com (Randal Schwartz) (11/16/89)
In article <10531@thorin.cs.unc.edu>, warner@unc (Byron Warner) writes:
| My question is how do you import csh variables into an awk script.
| For example, if I have a file called foo, which contains:
| {
|     print import,$0
| }
|
| and I issue the command
| awk -F: -f foo /etc/passwd import='hello
                                          ^ missing quote, perhaps?
| Why do I get just a list of logins?

The order of command-line options is significant:

    % awk -F: -f foo import='hello' /etc/passwd

yields the result you want.  Also note that these variables are not
available in the "BEGIN" action (unless something happened after the
V7 version of awk).

Just another UNIX old-timer,
-- 
/== Randal L. Schwartz, Stonehenge Consulting Services (503)777-0095 ====\
| on contract to Intel's iWarp project, Hillsboro, Oregon, USA, Sol III  |
| merlyn@iwarp.intel.com ...!uunet!iwarp.intel.com!merlyn                |
\== Cute Quote: "Welcome to Oregon... Home of the California Raisins!" ==/
richsc@ism780c.isc.com (Rich Scott) (11/17/89)
In article <15919@bloom-beacon.MIT.EDU> jik@athena.mit.edu (Jonathan I. Kamens) writes:
>In article <10531@thorin.cs.unc.edu> warner@unc.cs.unc.edu (Byron Warner)
>writes:
>>My question is how do you import csh variables into an awk script.
>>For example, if I have a file called foo, which contains:
>>{
>>    print import,$0
>>}
>>
>>and I issue the command
>>awk -F: -f foo /etc/passwd import='hello
>>Why do I get just a list of logins?

Well, apparently awk wants its 'imported' variables specified on the
command line *before* the datafile(s), but this isn't obvious from the
manual page.  Someone here told me that the argument parsing may not be
done correctly.  Anyway, on my system, which runs SunOS 3.5, I get the
desired effect (using csh) by doing:

    awk -F: -f foo import='hello' /etc/passwd

(This is running the 4.2 or 4.3 BSD 'awk'; I can't speak for the "new"
awk.)

>
>  First of all, I have never known the C-shell to allow the syntax
>"foo=bar" on a command-line to import a variable into a program.  C
>shell doesn't have anything like that.

Umm, I don't think it's up to the shell in this case to do anything
with it; it's simply an argument to the program.  Perhaps Byron, if he
really wants to import a C-shell variable into awk, should do:

    hostname% setenv VAR hello
    hostname% awk -F: -f foo import=$VAR /etc/passwd

The first example doesn't set any C-shell variables.

----------------
rich scott                          rls@i88.isc.com
interactive systems corporation     voice: (800) LAI-UNIX x255
(formerly lachman associates)       naperville, il, usa
lang@PRC.Unisys.COM (Francois-Michel Lang) (11/17/89)
It's time once again to post to this group a document that I have
which explains some important things about (vanilla) AWK
that are not elsewhere documented....
****************************************************************
\" to print this document, do ditroff -ms -Pip2 awk.supp
.RP
.TL
.B
A Supplemental Document For AWK
.sp
.R
- or -
.sp
.I
Things Al, Pete, And Brian Didn't Mention Much
.R
.AU
John W. Pierce
.AI
Department of Chemistry
University of California, San Diego
La Jolla, California 92093
jwp%chem@sdcsvax.ucsd.edu
.AB
As
.B awk
and its documentation are distributed with
.I
4.2 BSD UNIX*
.R
there are a number of bugs, undocumented features,
and features that are touched on so briefly in the
documentation that the casual user may
not realize their full significance. While this document
applies primarily to the \fI4.2 BSD\fR version of \fIUNIX\fR,
it is known that the \fI4.3 BSD\fR version does not have
all of the bugs fixed, and that it does not have updated
documentation. The situation with respect to the versions
of \fBawk\fR distributed with other versions of \fIUNIX\fR and
similar systems is unknown to the author.
.FS
*UNIX is a trademark of AT&T
.FE
.AE
.LP
In this document references to "the user manual" mean
.I
Awk - A Pattern Scanning and Processing Language (Second Edition)
.R
by Aho, Kernighan, and Weinberger. References to "awk(1)" mean
the entry for
.B awk
in the
.I
UNIX Programmer's Manual, 4th Berkeley Distribution.
.R
References to "the documentation" mean both of those.
.LP
In most examples, the outermost set of braces ('{ }') has been
omitted. They would, of course, be necessary in real scripts.
.NH
Known Bugs
.LP
There are three main bugs known to me. They involve:
.IP
Assignment to input fields.
.IP
Piping output to a program from within an \fBawk\fR script.
.IP
Using '*' in \fIprintf\fR field width and precision specifications
does not work, nor do '\\f' and '\\b' print formfeed and backspace
respectively.
.NH 2
Assignment to Input Fields
.LP
[This problem is partially fixed in \fI4.3BSD\fR;
see the last paragraph of this section regarding the unfixed portion.]
.LP
The user manual states that input fields may be objects of assignment
statements. Given the input line
.DS
field_one field_two field_three
.DE
the script
.DS
$2 = "new_field_2"
print $0
.DE
should print
.DS
field_one new_field_2 field_three
.DE
.LP
This does not work; it will print
.DS
field_one field_two field_three
.DE
That is, the script will behave as if the
assignment to $2 had not been made. However,
explicitly referencing an "assigned to" field
.I does
recognize that the assignment has been made.
If the script
.DS
$2 = "new_field_2"
print $1, $2, $3
.DE
is given the same input it will [properly] print
.DS
field_one new_field_2 field_three
.DE
Therefore, you can
get around this bug with, e.g.,
.DS
$2 = "new_field_2"
output = $1 # Concatenate output fields
for(i = 2; i <= NF; ++i) # into a single output line
output = output OFS $i # with OFS between fields
print output
.DE
.LP
In \fI4.3BSD\fR, this bug has been fixed to the extent that
the failing example above works correctly. However, a script like
.DS
$2 = "new_field_2"
var = $0
print var
.DE
still gives incorrect output. This problem can be bypassed by using
.DS
\fIvar\fR = sprintf("%s", $0)
.DE
instead of "\fIvar\fR = $0"; \fIvar\fR will have the correct value.
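.LP
The sketch below runs the failing example and the loop workaround side by side. It assumes a present-day POSIX \fBawk\fR, in which assigning to a field rebuilds \fI$0\fR with \fBOFS\fR between the fields, so both output lines should be identical; with the 4.2BSD \fBawk\fR described above, the first line would still show \fIfield_two\fR. The shell function name is hypothetical.
.DS
```shell
#!/bin/sh
# Compare printing $0 after a field assignment with the field-joining loop.
field_demo() {
    echo "field_one field_two field_three" | awk '{
        $2 = "new_field_2"
        print $0                    # rebuilt from the fields in POSIX awk
        output = $1                 # workaround: join the fields by hand
        for (i = 2; i <= NF; ++i)
            output = output OFS $i
        print output
    }'
}
```
.DE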
.NH 2
Piping Output to a Program
.LP
[This problem appears to have been fixed in \fI4.3BSD\fR,
but that has not been exhaustively tested.]
.LP
The user manual states that
.I print
and
.I printf
statements may write to a program using, e.g.,
.DS
print | "\fIcommand\fR"
.DE
This would pipe the output into \fIcommand\fR, and it
does work. However, you should be aware that this causes
.B awk
to spawn a child process (\fIcommand\fR), and that it
.I
does not
.R
wait for the child to exit before it exits itself. In the case of a
"slow" command like
.B sort,
.B awk
may exit before
.I command
has finished.
.LP
This can cause problems in, for example, a shell script that
depends on everything done by
.B awk
being finished before the next shell command is executed.
Consider the shell script
.DS
awk -f awk_script input_file
mv sorted_output somewhere_else
.DE
and the
.B awk
script
.DS
print output_line | "sort -o sorted_output"
.DE
If
.I input_file
is large
.B awk
will exit long before
.B sort
is finished. That means that the
.B mv
command will be executed before
.B sort
is finished, and the result is unlikely to be what you wanted.
Other than fixing the source, there is no way to avoid this
problem except to handle such pipes outside of the awk script, e.g.
.DS
awk -f awk_file input_file | sort -o sorted_output
mv sorted_output somewhere_else
.DE
which is not wholly satisfactory.
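.LP
Awks that provide a \fBclose\fR function (nawk, gawk, and POSIX \fBawk\fR, for example) offer another way out: closing the pipe in the \fBEND\fR block should make \fBawk\fR wait for the command to exit, so the \fBmv\fR runs only after \fBsort\fR has finished. A sketch under that assumption, with a scratch directory standing in for the real file names (the function name is hypothetical):
.DS
```shell
#!/bin/sh
# Sort through a pipe opened inside awk, close() it in END, then mv the result.
sort_then_mv() {
    dir=$(mktemp -d) || return 1
    { echo pear; echo apple; echo fig; } |
        awk -v out="$dir/sorted_output" '
            BEGIN { cmd = "sort -o " out }   # one command string, used twice
                  { print | cmd }            # every line goes to the same sort
            END   { close(cmd) }             # wait for sort before awk exits
        '
    mv "$dir/sorted_output" "$dir/somewhere_else"
    cat "$dir/somewhere_else"
    rm -rf "$dir"
}
```
.DE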
.LP
See
.I
Sketchily Documented Features
.R
below for other considerations in redirecting
output from within an
.B awk
script.
.NH 2
Printf and '*', '\\f', and '\\b'
.LP
The documentation says that the \fIprintf\fR function provided is
identical to the \fIprintf\fR provided by the \fIC\fR language
\fBstdio\fR package. This is incorrect: '*' cannot be used to
specify a field width or precision, and '\\f' and '\\b' cannot
be used to print formfeeds and backspaces.
.LP
The command
.DS
printf("%*.s", len, string)
.DE
will cause a core dump. Given \fBawk\fR's age, it is likely
that its \fIprintf\fR was written well before the use of '*'
for specifying field width and precision appeared in the \fBstdio\fR
library's \fIprintf\fR. Another possibility is that it wasn't
implemented because it isn't really needed to achieve the same effect.
.LP
To accomplish this effect, you can utilize the fact that \fBawk\fR
concatenates variables before it does any other processing on them.
For example, assume a script has two variables \fIwid\fR and
\fIprec\fR which control the width and precision used for printing
another variable \fIval\fR:
.DS
[code to set "wid", "prec", and "val"]
printf("%" wid "." prec "d\en", val)
.DE
If, for example, \fIwid\fR is 8 and \fIprec\fR is 3, then \fBawk\fR
will concatenate everything to the left of the comma in
the \fIprintf\fR statement, and the statement will really be
.DS
printf("%8.3d\en", val)
.DE
These could, of course, have been assigned to some variable \fIfmt\fR before
being used:
.DS
fmt = "%" wid "." prec "d"
printf(fmt "\en", val)
.DE
Note, however, that the newline ("\en") in the second form \fIcannot\fR
be included in the assignment to \fIfmt\fR.
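.LP
Packaged as a shell function, the concatenation technique might look like the sketch below. It assumes an \fBawk\fR with the \fB-v\fR option; the function name \fIfmt_num\fR is hypothetical, and \fBprint\fR with \fBsprintf\fR is used so that no newline escape is needed in the format at all.
.DS
```shell
#!/bin/sh
# Build a printf format from run-time width and precision by concatenation.
fmt_num() {
    awk -v wid="$1" -v prec="$2" -v val="$3" 'BEGIN {
        fmt = "%" wid "." prec "d"      # e.g. "%8.3d" for wid=8, prec=3
        print sprintf(fmt, val)
    }'
}
```
.DE
With a width of 8 and a precision of 3, fmt_num 8 3 42 should print 042 right-justified in eight columns.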
.LP
To allow use of '\\f' and '\\b', \fBawk\fR's \fIlex\fR script must
be changed. This is trivial to do (it is done at the point
where '\\n' and '\\t' are processed), but requires having source
code. [I have fixed this and have not seen any unwanted effects.]
# .bp
.NH
Undocumented Features
.LP
There are several undocumented features:
.IP
Variable values may be established on the command line.
.IP
A
.B getline
function exists that reads the next input line and starts processing it
immediately.
.IP
Regular expressions accept octal representations of characters.
.IP
A
.B -d
flag argument produces debugging output if
.B awk
was compiled with "DEBUG" defined.
.IP
Scripts may be "compiled" and run later (providing the installer
did what is necessary to make this work).
.NH 2
Defining Variables On The Command Line
.LP
To pass variable values into a script at run time, you may use
.IP
.I variable=value
.LP
(as many as you like) between any "\fB-f \fIscriptname\fR" or
.I program
and the names of any files to be processed. For example,
.DS
awk -f awkscript today=\e"`date`\e" infile
.DE
would establish for
.I awkscript
a variable named
.B today
that had as its value the output of the
.B date
command.
.LP
There are a number of caveats:
.IP
Such assignments may appear only between
.B -f
.I awkscript
(or \fIprogram\fR or [see below] \fB-R\fIawk.out\fR)
and the name of any
input file (or '-').
.IP
Each
.I variable=value
combination must be a single argument (i.e. there must not be spaces
around the '=' sign);
.I value
may be either a numeric value or a string. If it is a string,
it must be enclosed in
double quotes at the time \fBawk\fR reads the argument. That means
that the double quotes enclosing \fIvalue\fR on the command line
must be protected from the shell as in the example above or it will
remove them.
.IP
.I Variable
is not available for use within the script until after the first record
has been read and parsed, but it is available as soon as
that has occurred so that it may be used before any other
processing begins. It does not exist at the time the
.B BEGIN
block is executed, and if there was no input it will not exist in the
.B END
block (if any).
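.LP
Present-day POSIX \fBawk\fRs take the value of such an assignment literally (apart from backslash escapes), so the extra double quotes are not needed even when the value contains spaces; quotes protected as in the \fIdate\fR example would become part of the value. The assignment is still not visible in \fBBEGIN\fR. A sketch under that assumption (the function name is hypothetical):
.DS
```shell
#!/bin/sh
# A value with spaces needs only ordinary shell quoting in POSIX awk.
greet() {
    echo record | awk '
        BEGIN { print "BEGIN sees [" greeting "]" }
              { print "[" greeting "] " $0 }
    ' greeting="$1"
}
```
.DE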
.NH 2
Getline Function
.LP
.B Getline
immediately reads the next input line (which is parsed into \fI$1\fR,
\fI$2\fR, etc) and starts processing it at the location of the call
(as opposed to
.B next
which immediately reads the next input line but starts processing
from the start of the script).
.LP
.B Getline
facilitates performing some types of tasks such as
processing files with multiline records and merging
information from several files. To use the latter as an example,
consider a case where two files, whose lines do not share
a common format, must be processed together. Shell and \fBawk\fR
scripts to do this might look something like
.sp
In the shell script
.DS
( echo DATA1; cat datafile1; echo ENDdata1; \e
echo DATA2; cat datafile2; echo ENDdata2 \e
) | \e
awk -f awkscript - > awk_output_file
.DE
In the
.B awk
script
.DS
/^DATA1/ { # Next input line starts datafile1
while (getline && $1 !~ /^ENDdata1$/)
{
[processing for \fIdata1\fR lines]
}
}
.sp 1
/^DATA2/ { # Next input line starts datafile2
while (getline && $1 !~ /^ENDdata2$/)
{
[processing for \fIdata2\fR lines]
}
}
.DE
There are, of course, other ways of accomplishing this particular task
(primarily using \fBsed\fR to preprocess the information),
but they are generally more difficult to write and more
subject to logic errors. Many cases arising in practice
are significantly more difficult, if not impossible, to handle
without \fBgetline\fR.
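.LP
The marker technique can be tried without any data files by generating the marked-up stream inline, as in the sketch below. It assumes a later \fBawk\fR in which \fBgetline\fR returns 1 for a record, 0 at end of input, and -1 on error, so the loops test for a positive result instead of using the bare value; the sample lines and the function name are made up.
.DS
```shell
#!/bin/sh
# Markers split one stream into two kinds of records; getline consumes
# each section inside the rule that recognised its start marker.
merge_demo() {
    {
        echo DATA1; echo a1; echo a2; echo ENDdata1
        echo DATA2; echo b1; echo ENDdata2
    } | awk '
        /^DATA1/ { while ((getline) > 0 && $1 !~ /^ENDdata1$/) print "data1:", $0 }
        /^DATA2/ { while ((getline) > 0 && $1 !~ /^ENDdata2$/) print "data2:", $0 }
    '
}
```
.DE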
.NH 2
Regular Expressions
.LP
The sequence "\fI\eddd\fR" (where 'd' is a digit)
may be used to include explicit octal
values in regular expressions. This is often useful if "nonprinting"
characters have been used as "markers" in a file. It has not been
tested for ASCII values outside the range 01 through 0177 (octal).
.NH 2
Debugging output
.LP
[This is unlikely to be of interest to the casual user.]
.sp
If \fBawk\fR was compiled with "DEBUG" defined, then giving it a
.B -d
flag argument will cause it to produce debugging output when it is run.
This is sometimes useful in finding obscure problems in scripts, though
it is primarily intended for tracking down problems with \fBawk\fR itself.
.NH 2
Script "Compilation"
.LP
[It is likely that this does not work at most sites. If it does not, the
following will probably not be of interest to the casual user.]
.sp
The command
.DS
awk -S -f script.awk
.DE
produces a file named
.B awk.out.
This is a core image of
.B awk
after parsing the file
.I script.awk.
The command
.DS
awk -Rawk.out datafile
.DE
causes
.B awk.out
to be applied to \fIdatafile\fR (or the standard input if no
input file is given). This avoids having to reparse large
scripts each time they are used. Unfortunately, the way this
is implemented requires some special action on the part of the
person installing \fBawk\fR.
.LP
As \fBawk\fR is delivered with \fI4.2 BSD\fR (and \fI4.3 BSD\fR),
.I awk.out
is created by the \fBawk -S ...\fR process by calling
.B sbrk()
with '0', writing out the returned value, then
writing out the core image from location 0 to
the returned address. The \fBawk -R...\fR process
reads the first word of
.I awk.out
to get the length of the image, calls
.B brk()
with that length, and
then reads the image into itself starting at location 0.
For this to work, \fBawk\fR must have been loaded with its
text segment writeable. Unfortunately,
the \fIBSD\fR default for \fBld\fR is to load with the text
read-only and shareable. Thus, the installer must remember to take
special action (e.g. "cc -N ..."
[equivalently "ld -N ..."] for \fI4BSD\fR) if these
flags are to work.
.LP
[Personally, I don't think it is
a very good idea to give \fBawk\fR the opportunity
to write on its text segment; I changed it so that
only the data segment is overwritten.]
.LP
Also, due to what appears to be a lapse in logic, the first
non-flag argument following \fB-R\fIawk.out\fR is discarded.
[Disliking that behavior, I changed it so that the \fB-R\fR flag
is treated like the \fB-f\fR flag: no flag arguments may follow it.]
# .bp
.NH
Sketchily Documented Features
.LP
.NH 2
Exit
.LP
The user manual says that using the
.B exit
function causes the script to behave as if end-of-input has been reached.
Not mentioned explicitly is the fact that this will cause the
.B END
block to be executed if it exists.
Also, two things are omitted:
.IP
\fBexit(\fIexpr\fB)\fR causes the script's exit status to be
set to the value of \fIexpr\fR.
.IP
If
.B exit
is called within the
.B END
block, the script exits immediately.
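.LP
Both points can be checked with a short sketch, assuming a POSIX \fBawk\fR (the function name is hypothetical): the first command calls \fBexit(3)\fR in a main rule, so \fBEND\fR still runs and the status is 3; the second calls \fBexit(4)\fR inside \fBEND\fR, so the following \fBprint\fR is never reached.
.DS
```shell
#!/bin/sh
# exit(expr) in a main rule still runs END; exit inside END stops at once.
exit_demo() {
    echo one | awk '{ exit(3) } END { print "END ran" }'
    echo "status=$?"
    awk 'END { exit(4); print "not reached" }' < /dev/null
    echo "status=$?"
}
```
.DE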
.NH 2
Mathematical Functions
.LP
The following builtin functions exist and are mentioned in
.I awk(1)
but not in the user manual.
.IP \fBint(\fIx\fB)\fR 10
\fIx\fR truncated to an integer.
.IP \fBsqrt(\fIx\fB)\fR 10
the square root of \fIx\fR for \fIx\fR >= 0, otherwise zero.
.IP \fBexp(\fIx\fB)\fR 10
\fBe\fR-to-the-\fIx\fR for -88 <= \fIx\fR <= 88, zero
for \fIx\fR < -88, and dumps core for \fIx\fR > 88.
.IP \fBlog(\fIx\fB)\fR 10
the natural log of \fIx\fR.
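.LP
A quick check of the ordinary cases, assuming a POSIX \fBawk\fR (the function name is hypothetical). The out-of-range behavior listed above (zero from \fBsqrt\fR of a negative number, a core dump for large \fBexp\fR arguments) belongs to the \fBawk\fR described here and is deliberately not exercised.
.DS
```shell
#!/bin/sh
# int() truncates toward zero; sqrt, exp and log are the usual functions.
math_demo() {
    awk 'BEGIN { print int(3.9), int(-3.9), sqrt(16), log(exp(1)) }'
}
```
.DE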
.NH 2
OFMT Variable
.LP
The variable
.B OFMT
may be set to, e.g. "%.2f", and purely numerical output will be
bound by that restriction in
.B print
statements. The default value is "%.6g". Again, this is mentioned in
.I awk(1)
but not in the user manual.
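.LP
For example, assuming a POSIX \fBawk\fR (in which values that are integers are printed as integers whatever \fBOFMT\fR says), the sketch below should print 3.14 for the first value and 17 for the second; the function name is hypothetical.
.DS
```shell
#!/bin/sh
# OFMT controls how print shows non-integer numbers; integers are unaffected.
ofmt_demo() {
    awk 'BEGIN { OFMT = "%.2f"; x = 3.14159; y = 17; print x, y }'
}
```
.DE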
.NH 2
Array Elements
.LP
The user manual states that "Array elements ... spring into existence by
being mentioned." This is literally true;
.I any
reference to an array element causes it to exist.
("I was thought about, therefore I am.")
Take, for example,
.DS
if(array[$1] == "blah")
{
[process blah lines]
}
.DE
If there is not an existing element of
.B array
whose subscript is the same as the contents of the
current line's first field,
.I
one is created
.R
and its value (null, of course) is then compared
with "blah". This can be a bit
disconcerting, particularly when later processing is using
.DS
for (i in \fBarray\fR)
{
[do something with result of processing
"blah" lines]
}
.DE
to walk the array and expects all the elements to be non-null.
Succinct practical examples are difficult to construct, but
when this happens in a 500 line
script it can be difficult to determine what has gone wrong.
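.LP
Later \fBawk\fRs (nawk, gawk, POSIX \fBawk\fR) have a \fI(key in array)\fR test that checks for an element without creating it, so the comparison above can be written as \fIif(($1 in array) && array[$1] == "blah")\fR. The sketch below, assuming such an \fBawk\fR, counts the elements after each kind of test; the function and array names are hypothetical.
.DS
```shell
#!/bin/sh
# A plain reference creates an element; the (key in arr) test does not.
array_demo() {
    awk 'BEGIN {
        arr["x"] = "blah"
        t = (arr["y"] == "blah")      # this comparison creates the y element
        n = 0; for (k in arr) n++
        t = ("z" in arr)              # membership test creates nothing
        m = 0; for (k in arr) m++
        print n, m
    }'
}
```
.DE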
.NH 2
FS and Input Fields
.LP
By default any number of spaces or tabs can separate fields (i.e.
there are no null input fields) and trailing spaces and tabs
are ignored. However, if
.B FS
is explicitly set to any character other than a space
(e.g., a tab: \fBFS = "\et"\fR), then a field is defined
by each such character and trailing field separator characters are
not ignored. For example, if '>' represents a tab then
.DS
one>>three>>five>
.DE
defines six fields, with fields two, four, and six being empty.
.LP
If
.B FS
is explicitly set to a space (\fBFS\fR = "\ "), then
the default behavior obtains (this may be a bug); that
is, both spaces
and tabs are taken as field separators, there can be no
null input fields, and trailing spaces and tabs are ignored.
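.LP
The same rules are easy to see with a visible separator. The sketch below assumes a POSIX \fBawk\fR and uses ':' in place of the tab in the example above (the function name is hypothetical).
.DS
```shell
#!/bin/sh
# An explicit one-character FS counts every separator, even a trailing one.
fs_demo() {
    echo 'one::three::five:' | awk -F: '{ print NF }'    # 6 fields
    echo '  one   three  ' | awk '{ print NF }'          # default FS: 2 fields
}
```
.DE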
.NH 2
RS and Input Records
.LP
If
.B RS
is explicitly set to the null string (\fBRS\fR = ""), then the input
record separator becomes a blank line, and the newline at the end
of each input line is also a field separator. This facilitates
handling multiline records.
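.LP
A sketch of multiline records, assuming a POSIX \fBawk\fR; the names, cities, and function name are made up. Each blank-line-separated group becomes one record, and the newline inside it separates fields just as the spaces do.
.DS
```shell
#!/bin/sh
# RS = "" reads blank-line-separated records; newlines also split fields.
rs_demo() {
    {
        echo "name Alice"; echo "city Paris"; echo
        echo "name Bob"; echo "city Rome"
    } | awk 'BEGIN { RS = "" } { print NF, $2, $4 }'
}
```
.DE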
.NH 2
"Fall Through"
.LP
This is mentioned in the user manual, but it is important
enough that it is worth pointing out here, also.
.LP
In the script
.DS
/\fIpattern_1\fR/ {
[do something]
}
.sp
/\fIpattern_2\fR/ {
[do something]
}
.DE
all input lines will be compared with both
.I pattern_1
and
.I pattern_2
unless the
.B next
function is used before the closing '}' in the
.I pattern_1
portion.
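.LP
The difference is easy to see on one input line that matches both patterns; this sketch assumes a POSIX \fBawk\fR and a hypothetical function name. Without \fBnext\fR both actions run; with it only the first does.
.DS
```shell
#!/bin/sh
# A line matching both patterns runs both actions unless next intervenes.
fall_demo() {
    echo "apple pie" | awk '/apple/ { print "one" } /pie/ { print "two" }'
    echo "apple pie" | awk '/apple/ { print "one"; next } /pie/ { print "two" }'
}
```
.DE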
.NH 2
Output Redirection
.LP
Once a file (or pipe) is opened by
.B awk
it is not closed until
.B awk
exits. This can occasionally cause problems. For example,
it means that a script that sorts its input lines into
output files named by the contents of their first fields
(similar to an example in the user manual)
.DS
{ print $0 > $1 }
.DE
is going to fail if the number of different first fields exceeds
about 10.
This problem
.I cannot
be avoided by using something like
.DS
{
command = "cat >> " $1
print $0 | command
}
.DE
as the value of the variable
.B command
is different for each different value of
.I $1
and is therefore treated as a different output "file".
.LP
[I have not been able to create a truly satisfactory
fix for this that doesn't involve having \fBawk\fR treat output
redirection to pipes differently from output to files; I
would greatly appreciate hearing of one.]
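.LP
Where \fBclose\fR is available (as in nawk, gawk, and POSIX \fBawk\fR), one partial remedy is to close each file right after writing to it and reopen it in append mode, at the cost of an open and a close for every line. Append mode matters: reopening with '>' would truncate the file each time. A sketch, assuming such an \fBawk\fR and a scratch directory (the function name is hypothetical):
.DS
```shell
#!/bin/sh
# Append each line to a file named by $1, closing it so only one is open.
split_demo() {
    dir=$(mktemp -d) || return 1
    (
        cd "$dir" || exit 1
        { echo "a 1"; echo "b 2"; echo "a 3"; } |
            awk '{ print $0 >> $1; close($1) }'
        cat a b
    )
    rm -rf "$dir"
}
```
.DE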
.NH 2
Field and Variable Types, Values, and Comparisons
.LP
The following is a synopsis of notes included with \fBawk\fR's
source code.
.NH 3
Types
.LP
Variables and fields can be strings or numbers or both.
.NH 4
Variable Types
.LP
When a variable is set by the assignment
.DS
\fIvar\fR = \fIexpr\fR
.DE
its type is set to the type of
.I expr
(this includes +=, ++, etc). An arithmetic
expression is of type
.I number,
a concatenation is of type
.I string,
etc.
If the assignment is a simple copy, e.g.
.DS
\fIvar1\fR = \fIvar2\fR
.DE
then the type of
.I var1
becomes that of
.I var2.
.LP
Type is determined by context; rarely, but always very inconveniently,
this context-determined type is incorrect. As mentioned in
.I awk(1)
the type of an expression can be coerced to that desired. E.g.
.DS
{
\fIexpr1\fR + 0
.sp 1
\fIexpr2\fR "" # Concatenate with a null string
}
.DE
coerces
.I expr1
to numeric type and
.I expr2
to string type.
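.LP
A sketch of both coercions, assuming a POSIX \fBawk\fR (the function name is hypothetical). The two input fields look numeric, so a plain comparison is numeric; appending "" forces a string comparison, and adding 0 forces a numeric one.
.DS
```shell
#!/bin/sh
# Forcing string or numeric comparison with "" and + 0.
coerce_demo() {
    echo "10 9" | awk '{
        a = ($1 < $2)                  # both look numeric: 10 < 9 is false
        b = (($1 "") < ($2 ""))        # forced strings: "10" < "9" is true
        c = (($1 + 0) < ($2 + 0))      # forced numbers: false again
        print a, b, c
    }'
}
```
.DE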
.NH 4
Field Types
.LP
As with variables, the type of a field is determined by
context when possible, e.g.
.RS
.IP $1++ 8
clearly implies that \fI$1\fR is to be numeric, and
.IP $1\ =\ $1\ ","\ $2 16
implies that $1 and $2 are both to be strings.
.RE
.LP
Coercion is done as needed.
In contexts where types cannot be reliably determined, e.g.,
.DS
if($1 == $2) ...
.DE
the type of each field is determined on input by inspection. All fields are
strings; in addition, each field that contains only a number
is also considered numeric. Thus, the test
.DS
if($1 == $2) ...
.DE
will succeed on the inputs
.DS
0 0.0
100 1e2
+100 100
1e-3 1e-3
.DE
and fail on the inputs
.DS
(null) 0
(null) 0.0
2E-518 6E-427
.DE
"only a number" in this case means matching the regular expression
.DS
^[+-]?[0-9]*\e.?[0-9]+(e[+-]?[0-9]+)?$
.DE
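.LP
Present-day POSIX \fBawk\fRs apply a similar "numeric string" rule to fields. The sketch below, assuming such an \fBawk\fR and a hypothetical function name, prints 1 for pairs that compare equal and 0 otherwise.
.DS
```shell
#!/bin/sh
# Fields that look like numbers compare numerically with each other.
cmp_demo() {
    for pair in "0 0.0" "100 1e2" "abc abc0"; do
        echo "$pair" | awk '{ print ($1 == $2) }'
    done
}
```
.DE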
.NH 3
Values
.LP
Uninitialized variables have the numeric value 0 and the string value "".
Therefore, if \fIx\fR is uninitialized,
.DS
if(x) ...
if (x == "0") ...
.DE
are false, and
.DS
if(!x) ...
if(x == 0) ...
if(x == "") ...
.DE
are true.
.LP
Fields which are explicitly null have the string value "", and are not numeric.
Non-existent fields (i.e., fields past \fBNF\fR) are also treated this way.
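.LP
Assuming a POSIX \fBawk\fR, the sketch below evaluates the comparisons above for an uninitialized \fIx\fR and should print 1 1 0 1 (true, true, false, true); the function name is hypothetical.
.DS
```shell
#!/bin/sh
# An uninitialized variable compares equal to both 0 and "" (but not "0").
uninit_demo() {
    awk 'BEGIN {
        a = (x == 0); b = (x == ""); c = (x == "0"); d = !x
        print a, b, c, d
    }'
}
```
.DE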
.NH 3
Types of Comparisons
.LP
If both operands are numeric, the comparison is made
numerically. Otherwise, operands are coerced to type
string if necessary, and the comparison is made on strings.
.NH 3
Array Elements
.LP
Array elements created by
.B split
are treated in the same way as fields.
----------------------------------------------------------------------------
Francois-Michel Lang
Paoli Research Center, Unisys lang@prc.unisys.com (215) 648-7256
Dept of Comp & Info Science, U of PA lang@linc.cis.upenn.edu (215) 898-9511
arnold@mathcs.emory.edu (Arnold D. Robbins {EUCC}) (11/17/89)
OK.  Hopefully this is the definitive word on how things work.

V7 awk (old awk, /usr/bin/awk on Suns and other 4.3 based machines)

    awk '....' a=1 b=2 file c=3 file

a is set to 1, b to 2, then the files are read and no more assignments
are done.  This feature was undocumented.  On my Sun, the values of a
and b are NOT available in the BEGIN block.  After the first file is
read c gets set to 3.  Then the next one is read.

S5R3.n, n >= 1 nawk (new awk)

    awk '....' a=1 b=2 file c=3 file

a is set to 1, b to 2, and those values ARE available in the BEGIN
block.  Then the first file is read, then c is set to 3, then the
second file is read.  The value of c is NOT set in the BEGIN block.

There are inconsistencies here, since conceptually the assignments are
done when it goes to do a file open, and it "notices" that it's really
a variable assignment.  But a and b are assigned before any program
execution begins, while files aren't opened until after the BEGIN
block has been run.  Note that the assignment of c is done correctly,
after the BEGIN block.

GNU Awk 2.11 and S5R4 nawk

    awk -v z=26 '....' a=1 b=2 file c=3 file

z is set to 26 before the BEGIN block is executed.  Then the BEGIN
block is run.  a is set to 1, b to 2, the first file is opened and
processed, then c is set to 3, and then the second file is processed.

Unfortunately, people had come to rely on the way nawk did assignments
before the BEGIN block was run.  But yet the behavior was inconsistent.
So, to have our cake and eat it too, ALL assignments that are where
file names are supposed to be are done after the BEGIN block.  But, to
make a variable be available in the BEGIN block, the new -v option was
added.  You must supply a -v option for each variable to be assigned.

It is important to note that normal assignments are done AT THE TIME
they would have been opened as a file; don't expect c to be set while
the first file is being processed.
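The GNU Awk 2.11 / S5R4 rules above are essentially what POSIX later standardized, so they can be checked with a present-day awk. The sketch below assumes a POSIX sh and a POSIX-conforming awk; two scratch files stand in for the two file operands, and the function name is hypothetical.

```shell
#!/bin/sh
# -v assignments happen before BEGIN; operand assignments happen just
# before the following file is opened.
timing_demo() {
    f1=$(mktemp) || return 1
    f2=$(mktemp) || { rm -f "$f1"; return 1; }
    echo one > "$f1"
    echo two > "$f2"
    awk -v z=26 '
        BEGIN { print "BEGIN z=" z " a=" a " b=" b " c=" c }
              { print $0 " z=" z " a=" a " b=" b " c=" c }
        END   { print "END z=" z " a=" a " b=" b " c=" c }
    ' a=1 b=2 "$f1" c=3 "$f2"
    rm -f "$f1" "$f2"
}
```

Expected shape of the output: z is set in BEGIN, a and b appear from the first record on, and c appears only while the second file is read and in END.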
This is something that took some discussion and hammering out between
the GNU people (me and David Trueman), Brian Kernighan at Bell Labs
(and Al Aho through him), and Randall Howard at MKS.  In fact, when
Brian first changed his awk to be consistent he got the loudest
complaints about needing variable assignments to happen before the
BEGIN block was run (Hi Tom!).  Adding a command line option was the
best compromise we could come up with -- the text of the awk program
does not change, just the command line to invoke it, and everyone felt
that while it wasn't particularly pretty, we could all live with it.

(I mentioned the S5R4 awk above; I can't promise this, but I do know
that Brian has made his version of awk, which works as described above,
available to them for inclusion in S5R4.  Perhaps someone doing S5R4 at
AT&T can let us know if it made it in.  He also should have gotten his
version to the toolchest, but I don't know about that for sure either.)

GNU Awk 2.11.1 (version 2.11 at patchlevel 1) has been sent to
comp.sources.unix and should be appearing there shortly.  Some version
of gnu awk will be in 4.4 BSD, when that comes out.

***

There is the separate question, "what if I have a filename with an `='
in it?"  The short answer is "don't do that".  It should perhaps be
possible to come up with a simple and consistent rule.  I don't know
what that rule is right now though, since we haven't given it a lot of
thought yet.  But I suspect you can look for a change in gawk 2.12 to
address this.

Any more questions, class? :-)
-- 
Arnold Robbins -- guest account at Emory Math/CS   | Laundry increases
DOMAIN: arnold@emory.mathcs.emory.edu              | exponentially in the
UUCP: gatech!emory!arnold  PHONE: +1 404 636-7221  | number of children.
BITNET: arnold@emory                               |    -- Miriam Hartholz
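On the `=' question: the POSIX awk specification, which came later, treats an operand as an assignment only when the text before the `=' is a valid variable name, so prefixing a path (for example ./a=b) makes awk read it as a file. A sketch under that assumption, using a scratch directory (the function name is hypothetical):

```shell
#!/bin/sh
# "a=b" alone would be taken as an assignment; "./a=b" is read as a file.
eq_file_demo() {
    dir=$(mktemp -d) || return 1
    (
        cd "$dir" || exit 1
        echo contents > 'a=b'
        awk '{ print FILENAME ": " $0 }' ./a=b
    )
    rm -rf "$dir"
}
```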
merlyn@iwarp.intel.com (Randal Schwartz) (11/19/89)
In article <15924@bloom-beacon.MIT.EDU>, jik@athena (Jonathan I. Kamens) writes:
| 1. Why isn't this mentioned in the BSD man page awk(1), or in the
| /usr/doc documentation about awk?

I got it by looking through the source.  That's the One True Way to
know things about UNIX.  Too bad the commercial world has seen fit to
lock up the sources now that they "support" (ha!) things.

Just another person who's read *every* line of the V7 source,
-- 
/== Randal L. Schwartz, Stonehenge Consulting Services (503)777-0095 ====\
| on contract to Intel's iWarp project, Hillsboro, Oregon, USA, Sol III  |
| merlyn@iwarp.intel.com ...!uunet!iwarp.intel.com!merlyn                |
\== Cute Quote: "Welcome to Oregon... Home of the California Raisins!" ==/