warner@unc.cs.unc.edu (Byron Warner) (11/15/89)
My questions is how do you import csh variables into an awk script.
for example if I have a file called foo, which contains:
{
print import,$0
}
and I issue the command
awk -F: -f foo /etc/passwd import='hello
why do I get just a list of logins?
Thanx in Advance
jik@athena.mit.edu (Jonathan I. Kamens) (11/15/89)
In article <10531@thorin.cs.unc.edu> warner@unc.cs.unc.edu (Byron Warner) writes: >My questions is how do you import csh variables into an awk script. >for example if I have a file called foo, which contains: >{ > print import,$0 >} > >and I issue the command >awk -F: -f foo /etc/passwd import='hello >why do I get just a list of logins? >Thanx in Advance First of all, I have never known the C-shell to allow the syntax "foo=bar" on a command-line to import a variable into a program. C shell doesn't have anything like that. Second, the only way to do what you want is to actually make the creation of this variable part of the awk script. Like this: % set import = 'hello' % awk 'BEGIN { import = "'"$import"'" } { print import, $0}' /etc/passwd The $import is evaluated before awk is actually called, and replaced by 'hello' (sans quotes). Jonathan Kamens USnail: MIT Project Athena 11 Ashford Terrace jik@Athena.MIT.EDU Allston, MA 02134 Office: 617-253-8495 Home: 617-782-0710
chris@mimsy.umd.edu (Chris Torek) (11/15/89)
>In article <10531@thorin.cs.unc.edu> warner@unc.cs.unc.edu (Byron Warner) >writes: [file foo] >>{ print import,$0 } [command] >>awk -F: -f foo /etc/passwd import='hello >>why do I get just a list of logins? In article <15919@bloom-beacon.MIT.EDU> jik@athena.mit.edu (Jonathan I. Kamens) writes: > First of all, I have never known the C-shell to allow the syntax >"foo=bar" on a command-line to import a variable into a program. It does not. However, awk does. That is, you are looking at the wrong program. > Second, the only way to do what you want is to actually make the >creation of this variable part of the awk script. Like this: Not so: Within some limits, you can set awk variables from its invocation. For instance: % cat t BEGIN { print "BEGIN: " this; } { print "INPUT: " this " " $0; } END { print "END: " this; } % cat u first line second line % awk -f t u this=that BEGIN: INPUT: first line INPUT: second line END: that % awk -f t this=that u BEGIN: INPUT: that first line INPUT: that second line END: that % rm t u The `BEGIN' statement is done before any `files' are opened; the `END' statement is done after all `files' have been read. Any `files' of the form `a=b' set variable `a' to value `b'. All of the above is with respect to the 4.3BSD flavour of `awk'. The new awk (as described in the awk book) appears to open the first `file' before executing the BEGIN statement, so that any assignments that appear before the first real file happen before the BEGIN. What GNU awk does, I do not know (but the above technique will tell you). -- In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163) Domain: chris@cs.umd.edu Path: uunet!mimsy!chris
steinbac@hpl-opus.HP.COM (Gunter Steinbach) (11/16/89)
> / hpl-opus:comp.unix.questions / warner@unc.cs.unc.edu (Byron Warner) > / 1:15 pm Nov 14, 1989 / > My questions is how do you import csh variables into an awk script. > [ deleted ] > awk -F: -f foo /etc/passwd import='hello The variable assignment has to come before the input file name. Guenter Steinbach | hplabs!gunter_steinbach | gunter_steinbach@hplabs.hp.com
jik@athena.mit.edu (Jonathan I. Kamens) (11/16/89)
In article <20774@mimsy.umd.edu> chris@mimsy.umd.edu (Chris Torek) writes: >... >The `BEGIN' statement is done before any `files' are opened; the `END' >statement is done after all `files' have been read. Any `files' of >the form `a=b' set variable `a' to value `b'. Nifty! Two questions: 1. Why isn't this mentioned in the BSD man page awk(1), or in the /usr/doc documentation about awk? 2. What happens if you actually want to read in a file that has = in the filename? How am I supposed to know what happens if the feature isn't mentioned in documentation? :-) Jonathan Kamens USnail: MIT Project Athena 11 Ashford Terrace jik@Athena.MIT.EDU Allston, MA 02134 Office: 617-253-8495 Home: 617-782-0710
tale@pawl.rpi.edu (David C Lawrence) (11/16/89)
In <10531@thorin.cs.unc.edu> warner@unc.cs.unc.edu (Byron Warner) writes: Byron> [file foo] Byron> { print import,$0 } Byron> [command] Byron> awk -F: -f foo /etc/passwd import='hello Byron> why do I get just a list of logins? Because the variable assignment has to come before file name. I'm also assuming here that the ' is a typo, or the absence of a match is; either way variable assignment comes before the file list. If you change it to "awk -F: -f foo import=hello /etc/passwd" it will work. This applies to V7 awk, nawk and gawk. In <20774@mimsy.umd.edu> chris@mimsy.umd.edu (Chris Torek) writes: Chris> All of the above is with respect to the 4.3BSD flavour of `awk'. The Chris> new awk (as described in the awk book) appears to open the first `file' Chris> before executing the BEGIN statement, so that any assignments that Chris> appear before the first real file happen before the BEGIN. What GNU Chris> awk does, I do not know (but the above technique will tell you). Variables set as above are not available in the BEGIN block with gawk, but a special option, -v, is provided to do this. -v VAR=VAL will assign VAL to VAR before script execution begins; another -v must be specified for each variable you want to declare this way. Dave -- (setq mail '("tale@pawl.rpi.edu" "tale@ai.mit.edu" "tale@rpitsmts.bitnet"))
tale@pawl.rpi.edu (David C Lawrence) (11/16/89)
In <15924@bloom-beacon.MIT.EDU> jik@athena.mit.edu (Jonathan I. Kamens) writes:
Jon> 1. Why isn't this mentioned in the BSD man page awk(1), or in the
Jon> /usr/doc documentation about awk?
Oversight, I suppose. SunOS manual page has it.
Jon> 2. What happens if you actually want to read in a file that has = in
Jon> the filename? How am I supposed to know what happens if the
Jon> feature isn't mentioned in documentation? :-)
Good question. I just tried a few different things which I thought
might work and none of them did. It appears as though .*=.* patterns
which appear after a file name (/dev/null in my test case) are simply
ignored; they are neither interpreted as variable assigments nor as
file names. I also tried passing it an arg of foo\=bar (my test case)
and it still did nothing. In fact, it didn't even read stdin. Hmm ...
Dave
--
(setq mail '("tale@pawl.rpi.edu" "tale@ai.mit.edu" "tale@rpitsmts.bitnet"))
merlyn@iwarp.intel.com (Randal Schwartz) (11/16/89)
In article <10531@thorin.cs.unc.edu>, warner@unc (Byron Warner) writes: | My questions is how do you import csh variables into an awk script. | for example if I have a file called foo, which contains: | { | print import,$0 | } | | and I issue the command | awk -F: -f foo /etc/passwd import='hello ^ missing quote, perhaps? | why do I get just a list of logins? The order of command-line options is significant: % awk -F: -f foo import='hello' /etc/passwd yields the result you want. Also note that these variables are not available in the "BEGIN" action (unless something happened after the V7 version of awk). Just another UNIX old-timer, -- /== Randal L. Schwartz, Stonehenge Consulting Services (503)777-0095 ====\ | on contract to Intel's iWarp project, Hillsboro, Oregon, USA, Sol III | | merlyn@iwarp.intel.com ...!uunet!iwarp.intel.com!merlyn | \== Cute Quote: "Welcome to Oregon... Home of the California Raisins!" ==/
richsc@ism780c.isc.com (Rich Scott) (11/17/89)
In article <15919@bloom-beacon.MIT.EDU> jik@athena.mit.edu (Jonathan I. Kamens) writes: >In article <10531@thorin.cs.unc.edu> warner@unc.cs.unc.edu (Byron Warner) >writes: >>My questions is how do you import csh variables into an awk script. >>for example if I have a file called foo, which contains: >>{ >> print import,$0 >>} >> >>and I issue the command >>awk -F: -f foo /etc/passwd import='hello >>why do I get just a list of logins? Well, apparently awk wants its 'imported' variables specified on the command line *before* the datafile(s), but this isn't obvious from the manual page. Someone here told me that the argument parsing may not be done correctly. Anyway, on my system, which runs SunOS3.5, I get the desired effect (using csh) by doing: awk -F: -f foo import='hello' /etc/passwd (This is running the 4.2 or 4.3 BSD 'awk'; I can't speak for the "new" awk.) > > First of all, I have never known the C-shell to allow the syntax >"foo=bar" on a command-line to import a variable into a program. C >shell doesn't have anything like that. Umm, I don't think it's up to the shell in this case to do anything with it; it's simply an argument to the program. Perhaps Byron, if he really wants to import a C-shell variable into awk, should do: hostname% setenv VAR='hello' hostname% awk -F: -f foo.awk import=$VAR /etc/passwd The first example doesn't set any C-shell variables. ---------------- rich scott rls@i88.isc.com interactive systems corporation voice: (800) LAI-UNIX x255 (formerly lachman associates) naperville, il, usa
lang@PRC.Unisys.COM (Francois-Michel Lang) (11/17/89)
It's time once again to post to this group a document that I have which explains some important things about (vanilla) AWK that are not elsewhere documented.... **************************************************************** \" to print this document, do ditroff -ms -Pip2 awk.supp .RP .TL .B A Supplemental Document For AWK .sp .R - or - .sp .I Things Al, Pete, And Brian Didn't Mention Much .R .AU John W. Pierce .AI Department of Chemistry University of California, San Diego La Jolla, California 92093 jwp%chem@sdcsvax.ucsd.edu .AB As .B awk and its documentation are distributed with .I 4.2 BSD UNIX* .R there are a number of bugs, undocumented features, and features that are touched on so briefly in the documentation that the casual user may not realize their full significance. While this document applies primarily to the \fI4.2 BSD\fR version of \fIUNIX\fR, it is known that the \fI4.3 BSD\fR version does not have all of the bugs fixed, and that it does not have updated documentation. The situation with respect to the versions of \fBawk\fR distributed with other versions \fIUNIX\fR and similar systems is unknown to the author. .FS *UNIX is a trademark of AT&T .FE .AE .LP In this document references to "the user manual" mean .I Awk - A Pattern Scanning and Processing Language (Second Edition) .R by Aho, Kernighan, and Weinberger. References to "awk(1)" mean the entry for .B awk in the .I UNIX Programmer's Manual, 4th Berkeley Distribution. .R References to "the documentation" mean both of those. .LP In most examples, the outermost set of braces ('{ }') have been ommitted. They would, of course, be necessary in real scripts. .NH Known Bugs .LP There are three main bugs known to me. They involve: .IP Assignment to input fields. .IP Piping output to a program from within an \fBawk\fR script. .IP Using '*' in \fIprintf\fR field width and precision specifications does not work, nor do '\\f' and '\\b' print formfeed and backspace respectively. .NH 2 Assignment to Input Fields .LP [This problem is partially fixed in \fI4.3BSD\fR; see the last paragraph of this section regarding the unfixed portion.] .LP The user manual states that input fields may be objects of assignment statements. Given the input line .DS field_one field_two field_three .DE the script .DS $2 = "new_field_2" print $0 .DE should print .DS field_one new_field_2 field_three .DE .LP This does not work; it will print .DS field_one field_two field_three .DE That is, the script will behave as if the assignment to $2 had not been made. However, explicitly referencing an "assigned to" field .I does recognize that the assignment has been made. If the script .DS $2 = "new_field_2" print $1, $2, $3 .DE is given the same input it will [properly] print .DS field_one new_field_2 field_three .DE Therefore, you can get around this bug with, e.g., .DS $2 = "new_field_2" output = $1 # Concatenate output fields for(i = 2; i <= NF; ++i) # into a single output line output = output OFS $i # with OFS between fields print output .DE .LP In \fI4.3BSD\fR, this bug has been fixed to the extent that the failing example above works correctly. However, a script like .DS $2 = "new_field_2" var = $0 print var .DE still gives incorrect output. This problem can be bypassed by using .DS \fIvar\fR = sprintf("%s", $0) .DE instead of "\fIvar\fR = $0"; \fIvar\fR will have the correct value. .NH 2 Piping Output to a Program .LP [This problem appears to have been fixed in \fI4.3BSD\fR, but that has not been exhaustively tested.] .LP The user manual states that .I print and .I printf statements may write to a program using, e.g., .DS print | "\fIcommand\fR" .DE This would pipe the output into \fIcommand\fR, and it does work. However, you should be aware that this causes .B awk to spawn a child process (\fIcommand\fR), and that it .I does not .R wait for the child to exit before it exits itself. In the case of a "slow" command like .B sort, .B awk may exit before .I command has finished. .LP This can cause problems in, for example, a shell script that depends on everything done by .B awk being finished before the next shell command is executed. Consider the shell script .DS awk -f awk_script input_file mv sorted_output somewhere_else .DE and the .B awk script .DS print output_line | "sort -o sorted_output" .DE If .I input_file is large .B awk will exit long before .B sort is finished. That means that the .B mv command will be executed before .B sort is finished, and the result is unlikely to be what you wanted. Other than fixing the source, there is no way to avoid this problem except to handle such pipes outside of the awk script, e.g. .DS awk -f awk_file input_file | sort -o sorted_output mv sorted_output somewhere_else .DE which is not wholly satisfactory. .LP See .I Sketchily Documented Features .R below for other considerations in redirecting output from within an .B awk script. .NH 2 Printf and '*', '\\f', and '\\b' .LP The document says that the \fIprintf\fR function provided is identical to the \fIprintf\fR provided by the \fIC\fR language \fBstdio\fR package. This is incorrect: '*' cannot be used to specify a field width or precision, and '\\f' and '\\b' cannot be used to print formfeeds and backspaces. .LP The command .DS printf("%*.s", len, string) .DE will cause a core dump. Given \fBawk\fR's age, it is likely that its \fIprintf\fR was written well before the use of '*' for specifying field width and precision appeared in the \fBstdio\fR library's \fIprintf\fR. Another possibility is that it wasn't implemented because it isn't really needed to achieve the same effect. .LP To accomplish this effect, you can utilize the fact that \fBawk\fR concatenates variables before it does any other processing on them. For example, assume a script has two variables \fIwid\fR and \fIprec\fR which control the width and precision used for printing another variable \fIval\fI: .DS [code to set "wid", "prec", and "val"] printf("%" wid "." prec "d\en", val) .DE If, for example, \fIwid\fR is 8 and \fIprec\fR is 3, then /fBawk\fR will concatenate everything to the left of the comma in the \fIprintf\fR statement, and the statement will really be .DS printf(%8.3d\en, val) .DE These could, of course, been assigned to some variable \fIfmt\fR before being used: .DS fmt = "%" wid "." prec "d" printf(fmt "\en", val) .DE Note, however, that the newline ("\en") in the second form \fIcannot\fR be included in the assignment to \fIfmt\fR. .LP To allow use of '\\f' and '\\b', \fBawk\fR's \fIlex\fR script must be changed. This is trivial to do (it is done at the point where '\\n' and '\\t' are processed), but requires having source code. [I have fixed this and have not seen any unwanted effects.] # .bp .NH Undocumented Features .LP There are several undocumented features: .IP Variable values may be established on the command line. .IP A .B getline function exists that reads the next input line and starts processing it immediately. .IP Regular expressions accept octal representations of characters. .IP A .B -d flag argument produces debugging output if .B awk was compiled with "DEBUG" defined. .IP Scripts may be "compiled" and run later (providing the installer did what is necessary to make this work). .NH 2 Defining Variables On The Command Line .LP To pass variable values into a script at run time, you may use .IP .I variable=value .LP (as many as you like) between any "\fB-f \fIscriptname\fR" or .I program and the names of any files to be processed. For example, .DS awk -f awkscript today=\e"`date`\e" infile .DE would establish for .I awkscript a variable named .B today that had as its value the output of the .B date command. .LP There are a number of caveats: .IP Such assignments may appear only between .B -f .I awkscript (or \fIprogram\fR or [see below] \fB-R\fIawk.out\fR) and the name of any input file (or '-'). .IP Each .I variable=value combination must be a single argument (i.e. there must not be spaces around the '=' sign); .I value may be either a numeric value or a string. If it is a string, it must be enclosed in double quotes at the time \fBawk\fR reads the argument. That means that the double quotes enclosing \fIvalue\fR on the command line must be protected from the shell as in the example above or it will remove them. .IP .I Variable is not available for use within the script until after the first record has been read and parsed, but it is available as soon as that has occurred so that it may be used before any other processing begins. It does not exist at the time the .B BEGIN block is executed, and if there was no input it will not exist in the .B END block (if any). .NH 2 Getline Function .LP .B Getline immediately reads the next input line (which is parsed into \fI$1\fR, \fI$2\fR, etc) and starts processing it at the location of the call (as opposed to .B next which immediately reads the next input line but starts processing from the start of the script). .LP .B Getline facilitates performing some types of tasks such as processing files with multiline records and merging information from several files. To use the latter as an example, consider a case where two files, whose lines do not share a common format, must be processed together. Shell and \fBawk\fR scripts to do this might look something like .sp In the shell script .DS ( echo DATA1; cat datafile1; echo ENDdata1 \e echo DATA2; cat datafile2; echo ENDdata2 \e ) | \e awk -f awkscript - > awk_output_file .DE In the .B awk script .DS /^DATA1/ { # Next input line starts datafile1 while (getline && $1 !~ /^ENDdata1$/) { [processing for \fIdata1\fR lines] } } .sp 1 /^DATA2/ { # Next input line starts datafile2 while (getline && $1 !~ /^ENDdata2$/) { [processing for \fIdata2\fR lines] } } .DE There are, of course, other ways of accomplishing this particular task (primarily using \fBsed\fR to preprocess the information), but they are generally more difficult to write and more subject to logic errors. Many cases arising in practice are significantly more difficult, if not impossible, to handle without \fBgetline\fR. .NH 2 Regular Expressions .LP The sequence "\fI\eddd\fR" (where 'd' is a digit) may be used to include explicit octal values in regular expressions. This is often useful if "nonprinting" characters have been used as "markers" in a file. It has not been tested for ASCII values outside the range 01 through 0127. .NH 2 Debugging output .LP [This is unlikely to be of interest to the casual user.] .sp If \fBawk\fR was compiled with "DEBUG" defined, then giving it a .B -d flag argument will cause it to produce debugging output when it is run. This is sometimes useful in finding obscure problems in scripts, though it is primarily intended for tracking down problems with \fBawk\fR itself. .NH 2 Script "Compilation" .LP [It is likely that this does not work at most sites. If it does not, the following will probably not be of interest to the casual user.] .sp The command .DS awk -S -f script.awk .DE produces a file named .B awk.out. This is a core image of .B awk after parsing the file .I script.awk. The command .DS awk -Rawk.out datafile .DE causes .B awk.out to be applied to \fIdatafile\fR (or the standard input if no input file is given). This avoids having to reparse large scripts each time they are used. Unfortunately, the way this is implemented requires some special action on the part of the person installing \fBawk\fR. .LP As \fBawk\fR is delivered with \fI4.2 BSD\fR (and \fI4.3 BSD\fR), .I awk.out is created by the \fBawk -S ...\fR process by calling .B sbrk() with '0', writing out the returned value, then writing out the core image from location 0 to the returned address. The \fBawk -R...\fR process reads the first word of .I awk.out to get the length of the image, calls .B brk() with that length, and then reads the image into itself starting at location 0. For this to work, \fBawk\fR must have been loaded with its text segment writeable. Unfortunately, the \fIBSD\fR default for \fBld\fR is to load with the text read-only and shareable. Thus, the installer must remember to take special action (e.g. "cc -N ..." [equivalently "ld -N ..."] for \fI4BSD\fR) if these flags are to work. .LP [Personally, I don't think it is a very good idea to give \fBawk\fR the opportunity to write on its text segment; I changed it so that only the data segment is overwritten.] .LP Also, due to what appears to be a lapse in logic, the first non-flag argument following \fB-R\fIawk.out\fR is discarded. [Disliking that behavior, the I changed it so that the \fB-R\fR flag is treated like the \fB-f\fR flag: no flag arguments may follow it.] # .bp .NH Sketchily Documented Features .LP .NH 2 Exit .LP The user manual says that using the .B exit function causes the script to behave as if end-of-input has been reached. Not menitoned explicitly is the fact that this will cause the .B END block to be executed if it exists. Also, two things are ommitted: .IP \fBexit(\fIexpr\fB)\fR causes the script's exit status to be set to the value of \fIexpr\fR. .IP If .B exit is called within the .B END block, the script exits immediately. .NH 2 Mathematical Functions .LP The following builtin functions exist and are mentioned in .I awk(1) but not in the user manual. .IP \fBint(\fIx\fB)\fR 10 \fIx\fR trunctated to an integer. .IP \fBsqrt(\fIx\fB)\fR 10 the square root of \fIx\fR for \fIx\fR >= 0, otherwise zero. .IP \fBexp(\fIx\fB)\fR 10 \fBe\fR-to-the-\fIx\fR for -88 <= \fIx\fR <= 88, zero for \fIx\fR < -88, and dumps core for \fIx\fR > 88. .IP \fBlog(\fIx\fB)\fR 10 the natural log of \fIx\fR. .NH 2 OFMT Variable .LP The variable .B OFMT may be set to, e.g. "%.2f", and purely numerical output will be bound by that restriction in .B print statements. The default value is "%.6g". Again, this is mentioned in .I awk(1) but not in the user manual. .NH 2 Array Elements .LP The user manual states that "Array elements ... spring into existence by being mentioned." This is literally true; .I any reference to an array element causes it to exist. ("I was thought about, therefore I am.") Take, for example, .DS if(array[$1] == "blah") { [process blah lines] } .DE If there is not an existing element of .B array whose subscript is the same as the contents of the current line's first field, .I one is created .R and its value (null, of course) is then compared with "blah". This can be a bit disconcerting, particularly when later processing is using .DS for (i in \fBarray\fR) { [do something with result of processing "blah" lines] } .DE to walk the array and expects all the elements to be non-null. Succinct practical examples are difficult to construct, but when this happens in a 500 line script it can be difficult to determine what has gone wrong. .NH 2 FS and Input Fields .LP By default any number of spaces or tabs can separate fields (i.e. there are no null input fields) and trailing spaces and tabs are ignored. However, if .B FS is explicitly set to any character other than a space (e.g., a tab: \fBFS = "\et"\fR), then a field is defined by each such character and trailing field separator characters are not ignored. For example, if '>' represents a tab then .DS one>>three>>five> .DE defines six fields, with fields two, four, and six being empty. .LP If .B FS is explicitly set to a space (\fBFS\fR = "\ "), then the default behavior obtains (this may be a bug); that is, both spaces and tabs are taken as field separators, there can be no null input fields, and trailing spaces and tabs are ignored. .NH 2 RS and Input Records .LP If .B RS is explicitly set to the null string (\fBRS\fR = ""), then the input record separator becomes a blank line, and the newlines at the end of input lines is a field separator. This facilitates handling multiline records. .NH 2 "Fall Through" .LP This is mentioned in the user manual, but it is important enough that it is worth pointing out here, also. .LP In the script .DS /\fIpattern_1\fR/ { [do something] } .sp /\fIpattern_2\fR/ { [do something] } .DE all input lines will be compared with both .I pattern_1 and .I pattern_2 unless the .B next function is used before the closing '}' in the .I pattern_1 portion. .NH 2 Output Redirection .LP Once a file (or pipe) is opened by .B awk it is not closed until .B awk exits. This can occassionally cause problems. For example, it means that a script that sorts its input lines into output files named by the contents of their first fields (similar to an example in the user manual) .DS { print $0 > $1 } .DE is going to fail if the number of different first fields exceeds about 10. This problem .I cannot be avoided by using something like .DS { command = "cat >> " $1 print $0 | command } .DE as the value of the variable .B command is different for each different value of .I $1 and is therefore treated as a different output "file". .LP [I have not been able to create a truly satisfactory fix for this that doesn't involve having \fBawk\fR treat output redirection to pipes differently from output to files; I would greatly appreciate hearing of one.] .NH 2 Field and Variable Types, Values, and Comparisons .LP The following is a synopsis of notes included with \fBawk\fR's source code. .NH 3 Types .LP Variables and fields can be strings or numbers or both. .NH 4 Variable Types .LP When a variable is set by the assignment .DS \fIvar\fR = \fIexpr\fR .DE its type is set to the type of .I expr (this includes +=, ++, etc). An arithmetic expression is of type .I number, a concatenation is of type .I string, etc. If the assignment is a simple copy, e.g. .DS \fIvar1\fR = \fIvar2\fR .DE then the type of .I var1 becomes that of .I var2. .LP Type is determined by context; rarely, but always very inconveniently, this context-determined type is incorrect. As mentioned in .I awk(1) the type of an expression can be coerced to that desired. E.g. .DS { \fIexpr1\fR + 0 .sp 1 \fIexpr2\fR "" # Concatenate with a null string } .DE coerces .I expr1 to numeric type and .I expr2 to string type. .NH 4 Field Types .LP As with variables, the type of a field is determined by context when possible, e.g. .RS .IP $1++ 8 clearly implies that \fI$1\fR is to be numeric, and .IP $1\ =\ $1\ ","\ $2 16 implies that $1 and $2 are both to be strings. .RE .LP Coercion is done as needed. In contexts where types cannot be reliably determined, e.g., .DS if($1 == $2) ... .DE the type of each field is determined on input by inspection. All fields are strings; in addition, each field that contains only a number is also considered numeric. Thus, the test .DS if($1 == $2) ... .DE will succeed on the inputs .DS 0 0.0 100 1e2 +100 100 1e-3 1e-3 .DE and fail on the inputs .DS (null) 0 (null) 0.0 2E-518 6E-427 .DE "only a number" in this case means matching the regular expression .DS ^[+-]?[0-9]*\e.?[0-9]+(e[+-]?[0-9]+)?$ .DE .NH 3 Values .LP Uninitialized variables have the numeric value 0 and the string value "". Therefore, if \fIx\fR is uninitialized, .DS if(x) ... if (x == "0") ... .DE are false, and .DS if(!x) ... if(x == 0) ... if(x == "") ... .DE are true. .LP Fields which are explicitly null have the string value "", and are not numeric. Non-existent fields (i.e., fields past \fBNF\fR) are also treated this way. .NH 3 Types of Comparisons .LP If both operands are numeric, the comparison is made numerically. Otherwise, operands are coerced to type string if necessary, and the comparison is made on strings. .NH 3 Array Elements .LP Array elements created by .B split are treated in the same way as fields. ---------------------------------------------------------------------------- Francois-Michel Lang Paoli Research Center, Unisys lang@prc.unisys.com (215) 648-7256 Dept of Comp & Info Science, U of PA lang@linc.cis.upenn.edu (215) 898-9511
arnold@mathcs.emory.edu (Arnold D. Robbins {EUCC}) (11/17/89)
OK. Hopefully this is the definitive word on how things work. V7 awk (old awk, /usr/bin/awk on Suns and other 4.3 based machines) awk '....' a=1 b=2 file c=3 file a is set to 1, b to 2, then the files are read and no more assignments are done. This feature was undocumented On my Sun, the value of a and b are NOT available in the BEGIN block. After the first file is read c gets set to 3. Then the next one is read. S5R3.n, n >= 1 nawk (new awk) awk '....' a=1 b=2 file c=3 file a is set to 1, b to 2, and those values ARE available in the BEGIN block. Then the first file is read, then c is set to 3, then the second file is read. The value of c is NOT set in the BEGIN block. There are inconsistencies here, since conceptually the assignments are done when it goes to do a file open, and it "notices" that it's really a variable assignment. But a and b are assigned before any program execution begins, while files aren't opened until after the BEGIN block has been run. Note that the assignment of c is done correctly, after the BEGIN block. GNU Awk 2.11 and S5R4 nawk awk -v z=26 '....' a=1 b=2 file c=3 file z is set to 26 before the BEGIN block is executed. Then the BEGIN block is run. a is set to 1, b to 2, the first file is opened and processed, then c is set to 3, and then the second file is processed. Unfortunately, people had come to rely on the way nawk did assignments before the BEGIN block was run. But yet the behavior was inconsistent. So, to have our cake and eat it too, ALL assignments that are where file names are supposed to be are done after the BEGIN block. But, to make a variable be available in the BEGIN block, the new -v option was added. You must supply a -v option for each variable to be assigned. It is important to note that normal assignments are done AT THE TIME they would have been opened as a file; don't expect c to be set while the first file is being processed. This is something that took some discussion and hammering out between the GNU people (me and David Trueman), Brian Kernighan at Bell Labs (and Al Aho through him), and Randall Howard at MKS. In fact, when Brian first changed his awk to be consistent he got the loudest complaints about needing variable assignments to happen before the BEGIN block was run (Hi Tom!). Adding a command line option was the best compromise we could come up with -- the text of the awk program does not change, just the command line to invoke it, and everyone felt that while it wasn't particularly pretty, we could all live with it. (I mentioned the S5R4 awk above; I can't promise this, but I do know that Brian has made his version of awk, which works as described above, available to them for inclusion is S5R4. Perhaps someone doing S5R4 at AT&T can let us know if it made it in. He also should have gotten his version to the toolchest, but I don't know about that for sure either.) GNU Awk 2.11.1 (version 2.11 at patchlevel 1) has been sent to comp.sources.unix and should be appearing there shortly. Some version of gnu awk will be in 4.4 BSD, when that comes out. *** There is the separate question, "what if I have a filename with an `=' in it?" The short answer is "don't do that". It should perhaps be possible to come up with a simple and consistent rule. I don't know what that rule is right now though, since we haven't given it a lot of thought yet. But I suspect you can look for a change in gawk 2.12 to address this. Any more questions, class? :-) -- Arnold Robbins -- guest account at Emory Math/CS | Laundry increases DOMAIN: arnold@emory.mathcs.emory.edu | exponentially in the UUCP: gatech!emory!arnold PHONE: +1 404 636-7221 | number of children. BITNET: arnold@emory | -- Miriam Hartholz
merlyn@iwarp.intel.com (Randal Schwartz) (11/19/89)
In article <15924@bloom-beacon.MIT.EDU>, jik@athena (Jonathan I. Kamens) writes: | 1. Why isn't this mentioned in the BSD man page awk(1), or in the | /usr/doc documentation about awk? I got it by looking through the source. That's the One True Way to know things about UNIX. Too bad the commercial world has seen fit to lock up the sources now that they "support" (ha!) things. Just another person who's read *every* line of the V7 source, -- /== Randal L. Schwartz, Stonehenge Consulting Services (503)777-0095 ====\ | on contract to Intel's iWarp project, Hillsboro, Oregon, USA, Sol III | | merlyn@iwarp.intel.com ...!uunet!iwarp.intel.com!merlyn | \== Cute Quote: "Welcome to Oregon... Home of the California Raisins!" ==/