[comp.unix.questions] awk Question

lesh@BRL.ARPA (01/26/87)

	I would like to use 'awk' to segregate data from a master file
into specific files with the filename based on the contents of a
specific field in one of the input records.  'awk' permits directing printed
output to filenames specified in quoted variables.

	"Without quotes, the file names are treated as uninitialized
variables and all output then goes to the same file."*1  Unquoted variables
thus provide the vehicle for writing to files named from a value obtained
from a field of a previous record.

	THE PROBLEM:
	"Users should also note that there is an upper limit to the
number of files that are written in this way.  At present it is ten."*1

	I can't find any way to close a file opened by 'awk' and very
soon get the "too many opened files" error message.

	Any suggestions?

	
1. Support Tools Guide, p. 6-31.
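
Where a newer awk with a close() builtin is available (POSIX awk specifies
one), the limit can be sidestepped by closing each output file as soon as a
record has been written to it. A minimal sketch, assuming such an awk and
records whose first field names the output file; the work directory, the
master file, and the ".out" suffix are made up for illustration. ">>" is used
so that a file reopened later is appended to rather than truncated:

```shell
# Hypothetical work area and master file; field 1 names the output file.
dir=${TMPDIR:-/tmp}/awkclose.$$
mkdir -p "$dir"
printf 'red 1\nblue 2\nred 3\n' > "$dir/master"

# Append each record to <field 1>.out, then close that file at once,
# so awk never holds more than one output file open.
(
    cd "$dir" || exit 1
    awk '{
        out = $1 ".out"
        print $0 >> out
        close(out)
    }' master
)
```

Opening and closing a file for every record is slow on large inputs; closing
only when the file name changes would be the obvious refinement.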

dce@mips.UUCP (01/27/87)

In article <3746@brl-adm.ARPA> lesh@BRL.ARPA (ISC | howard) writes:
>	THE PROBLEM:
>	"Users should also note that there is an upper limit to the
>number of files that are written in this way.  At present it is ten."*1
>
>	I can't find any way to close a file opened by 'awk' and very
>soon get the "too many opened files" error message.
>
>	Any suggestions?

Don't try to close the file (even though some newer versions of awk
may have a close builtin). I have used the following methods quite
successfully:

1. Iteration - Keep track of the number of files you have used. Once you
   reach the limit, put all data that goes to a new file onto the standard
   output. By redirecting standard output to a temporary file, you can check
   after the awk script has finished to see if this file is empty. If not,
   run the script again with the temporary file as the input. Otherwise,
   you are finished.

   This method doesn't work when there is a lot of state involved.

2. Pass-thru - Instead of writing the records directly to the file, write
   records of the form

	filename data...

   and use the construct:

		awk ... | while read file data
		do
			echo "$data" >> "$file"
		done

   (functions can make this look a lot cleaner).

   The only problem with this is that the shell read command eats backslashes
   (bug?). If this is unacceptable, you could get away with generating

	filename
	data...

   and use calls to the line command (head -1 may not work correctly here)
   to get the data.
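
The pass-thru method can be sketched as follows. This assumes a POSIX shell
whose read accepts -r (which stops it from eating backslashes); shells of
this vintage may not have it. The work directory, the sample master file,
and the ".out" suffix are made up for illustration:

```shell
# Hypothetical work area and master file; field 1 selects the output file.
dir=${TMPDIR:-/tmp}/awkpass.$$
mkdir -p "$dir"
printf 'red one\\two\nblue three\nred four\n' > "$dir/master"

# awk only prints "filename data..."; the shell loop does the file writes,
# so awk itself never opens more than its standard output.
(
    cd "$dir" || exit 1
    awk '{ print $1 ".out", $0 }' master |
    while read -r file data
    do
        printf '%s\n' "$data" >> "$file"
    done
)
```

Note that read still trims leading and trailing blanks from the data part,
so records where that matters need the two-line form described above.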
-- 
			David Elliott

UUCP: 	{decvax,ucbvax,ihnp4}!decwrl!mips!dce, DDD:  	408-720-1700

kraml@trwrb.UUCP (Robert P. Kraml) (06/30/87)

I have a question concerning the awk utility.  Does anyone know if
there is a way to pass awk variables out as shell variables on a 
line-by-line basis.  What I mean is the following:

1.  awk reads in a line of NF fields
2.  some of those fields get passed out as shell variables
3.  These variables are operated on (i.e. put into certain files).
4.  Awk reads another line and so on...
 .
 .
 .
I know how to get shell variables into awk but not vice versa.  Any
help would be greatly appreciated.

-- 
Phone: (213) 536-1871     {allegra,uscvax,decvax,randvax,ihnp4,sdcrdcf}
Address: One Space Park                      |
       	 82/2024                              ------>!trwrb!trwcsed!kraml
         Redondo Beach CA 90278

pdg@ihdev.ATT.COM (Joe Isuzu) (07/02/87)

In article <718@trwcsed.trwrb.UUCP> kraml@trwcsed.UUCP (Robert P. Kraml) writes:
>I have a question concerning the awk utility.  Does anyone know if
>there is a way to pass awk variables out as shell variables on a 
>line-by-line basis.  What I mean is the following:
>1.  awk reads in a line of NF fields
>2.  some of those fields get passed out as shell variables
>3.  These variables are operated on (i.e. put into certain files).
>4.  Awk reads another line and so on...


Easy.  Do something like this.....

$ eval `awk -f awks`

where awks is:
{
	print $1 "=" $2;
}
or something like that.  With input of
abc def
xyzzy plugh
^D
you will find that $abc is def and $xyzzy is plugh, when you
are back at the shell.
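
The same idea as one self-contained sketch for sh, with the awk program
inline and the sample input supplied by printf. It assumes a shell with
$(...) command substitution and trusted input: a value containing blanks or
shell metacharacters would need quoting before being handed to eval.

```shell
# awk turns each "name value" line into a "name=value" assignment;
# eval then runs those assignments in the current shell.
eval "$(printf 'abc def\nxyzzy plugh\n' | awk '{ print $1 "=" $2 }')"

echo "$abc $xyzzy"
```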

This format requires that you are using ksh or sh.  For csh, the line
that formats the setting arguments should be
print "set " $1 "=" $2 ";"

Hope this helped.

-- 

Paul Guthrie				"Another day, another Jaguar"
ihnp4!ihdev!pdg				    -- Pat Sajak

fyl@ssc.UUCP (Phil Hughes) (07/04/87)

In article <718@trwcsed.trwrb.UUCP>, kraml@trwrb.UUCP (Robert P. Kraml) writes:
> I have a question concerning the awk utility.  Does anyone know if
> there is a way to pass awk variables out as shell variables on a 
> line-by-line basis.  

I am not proud of this and hope someone comes up with a clean way, but
I had a similar problem (passing stuff back to a calling shell script).
The child wrote a file consisting of setenv commands with the appropriate
data.  When control was returned to the parent, it did a source of the
file.  (OK, I'm embarrassed, but it did what I needed.)

You could use the same method writing the file with awk.
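
A rough sketch of that method for sh rather than csh, so it writes plain
assignments and reads them back with "." instead of setenv and source. The
temporary file name and the sample names and values are made up, and the
input is assumed to be trusted:

```shell
# Hypothetical scratch file that carries the variables back to the parent.
vars=${TMPDIR:-/tmp}/awkvars.$$

# The child (here awk) writes one quoted assignment per input line ...
printf 'colour red\nsize 42\n' |
awk '{ print $1 "=\"" $2 "\"" }' > "$vars"

# ... and the parent reads them into its own environment.
. "$vars"
rm -f "$vars"

echo "$colour $size"
```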

-- 
Phil Hughes, SSC, Inc. P.O. Box 55549, Seattle, WA 98155  (206)FOR-UNIX
	...!uw-beaver!tikal!ssc!fyl

howard@COS.COM (Howard C. Berkowitz) (07/17/87)

I am attempting to write an awk program which reorganizes
text which has a repeating pattern of n lines of text 
followed by a heading line:
-----------------------INPUT TEXT EXAMPLE----------------
The purpose is to test if the implementation        
accepts an ACCEPT request correctly.                                      
.IP ISVB101       
The purpose is to test if the implementation        
detects the error when ACCEPT SPDU is sent with parameters in incorrect
order.            
.IP ISIB102 
---------------------------------------------------------
The awk program should store lines until the ".IP ..."
line is detected, then output (to file foo) the IP
line followed by all text lines:
------------------- DESIRED OUTPUT EXAMPLE  ------------
.IP ISVB101       
The purpose is to test if the implementation        
accepts an ACCEPT request correctly.                                      
.IP ISVB102       
The purpose is to test if the implementation        
detects the error when ACCEPT SPDU is sent with parameters in incorrect
order.            
---------------------------------------------------------
The awk program I have written for this, which includes
debugging code, is:

BEGIN { i = 1
        nip = 0
        ntx = 0
        print "INIT foo" >"foo" }

$1 !~ /.IP/
 {
  # add this text line to the s array.
  # do not yet output it.

  s[i++] = $0
  ++ntx
 }

$1 ~ /.IP/
 {
  # capture this line in the array's first
  # position, then print the array in order.

  s[0] = $0
  for (j=0; j<= i; j++) print j "-" s[j] > "foo"
  i = 1;
  ++nip
 }
END {print "nip=" nip " ntx=" > "foo"}
----------------------------------------------------------------
BEGIN gets control; END never does.  The output begins:

INIT foo
0-The purpose is to test if the implementation        
1-The purpose is to test if the implementation        
2-
0-accepts an ACCEPT request correctly.                                      
1-accepts an ACCEPT request correctly.                                      
2-
0-.IP ISVB101       
1-.IP ISVB101       
2-
------------------------------------------------------------
This type of duplicate output continues; the final END print
never executes.  Help!
-- 
-- howard(Howard C. Berkowitz) @cos.com
 {seismo!sundc, hadron, hqda-ai}!cos!howard
(703) 883-2812 [ofc] (703) 998-5017 [home]
DISCLAIMER:  I explicitly identify COS official positions.

bazavan@hpcesea.HP.COM (Valentin Bazavan) (07/20/87)

Try this one. It produces the output you want.

Valentin Bazavan
...!hplabs!hpcea!bazavan


awk '/^\.IP/  {print $0; for (i=0;i<count;i++) print line[i]; count=0}
     !/^\.IP/ {line[count++]=$0}
    ' infile

dph@beta.UUCP (David P Huelsbeck) (07/20/87)

You're obviously not a beginning awk programmer as you were not
far off on this one. I don't remember running into this before but
our 4.3 awk seems to get hosed up when I try to use array[0]. I
assume this was half of your problem. Also if you store into an
array using post-increment the final value of your index is 1 greater
than the index of the last valid element. I'm sure this would have 
been easy to spot if the array[0] problem hadn't been getting you.
The following does what you wanted with just a few changes to your
script. Hope this helps.

Sorry for posting this but my UUCP connection seems a bit flaky lately.
    [mitch, did you ever get my mail from last week ? ]


	David Huelsbeck
	dph@lanl.gov
	{cmcl2,ihnp4}!lanl!dph


---cut here--------cut here-------cut here--------cut here---------cut here---

BEGIN	{
	i = 0   # not really needed but looks good to pascal types ;-)  
	nip = 0; ntxt = 0
	print "INIT foo" > "foo"
	}


$1 !~ /\.IP/	{
	++ntxt
	line[++i] = $0 # if pre-increment is used "i" is always a valid
	}              # array element; just skip line[0] as it is not a
		       # valid array location; THIS WAS YOUR PROBLEM

$1 ~/\.IP/	{
	++nip 
	print > "foo"  # send $0 -the .IP line- to foo
	for (j=1; j<=i; j++) {
		print line[j] > "foo"
	}
	i = 0   # reset i
	}

END {
	printf "nip=%d\tntxt=%d\n", nip, ntxt > "foo"
	}
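
One more difference between the two scripts is worth pointing out. In the
original, each pattern sits on a line by itself with its opening brace on
the next line. Awk treats a pattern with no action on the same line as
"print the record", and the braced block that follows as a separate action
with no pattern, run for every record. That matches the reported output,
where every input line shows up as entries 0 and 1. A minimal sketch of the
difference, using made-up two-line input:

```shell
# Made-up input: one text line and one heading line.
input='text line
.IP HEAD'

# Pattern and brace on separate lines: the pattern alone prints matching
# records, and the block runs for every record.
split_form=$(printf '%s\n' "$input" | awk '
$1 ~ /^\.IP/
{ print "block: " $0 }')

# Brace on the pattern line: the block runs only for matching records.
joined_form=$(printf '%s\n' "$input" | awk '
$1 ~ /^\.IP/ { print "block: " $0 }')

printf 'separate lines:\n%s\n\nsame line:\n%s\n' "$split_form" "$joined_form"
```

With the brace on its own line the block fires for both input lines; with
the brace on the pattern line it fires for the .IP line only.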

seb022@tijc02.UUCP (Scott Bemis ) (12/19/87)

12/18/87

The awk program below works OK on an MS-DOS version of awk that I have
on my PC from Mortice Kern Systems Inc. (MKS awk).
The book "The AWK Programming Language" by the authors of awk (Aho,
Weinberger, and Kernighan) describes user-defined awk functions and
the ** operator for exponentiation. MKS awk supports both user-defined
functions and the ** operator. Unfortunately, neither feature
appears to be supported in the version of awk that I have on my VAX 8600.
This VAX 8600 is running a port of the AT&T UNIX V Release 2.0 Version 2
operating system.

Does anyone sell or provide an awk for AT&T UNIX V Release 2.0 Version 2
for VAXes that supports the ** operator for exponentiation and user-defined
functions?

Since I do not know how to get the version number of my awk on the VAX 
8600, here is a listing from the /usr/src/cmd/awk directory:
total 157
-rw-rw----   1 bin      bin         2662 Jul  4  1983 EXPLAIN
-rw-rw----   1 bin      bin         2974 Jul  4  1983 README
-rw-rw----   1 bin      bin         3169 Jul  4  1983 awk.def
-rw-rw----   1 bin      bin         6385 Jul  4  1983 awk.g.y
-rw-rw----   1 bin      bin         4936 Jul  4  1983 awk.lx.l
-rw-rw----   1 bin      bin         2226 Nov  7  1983 awk.mk
-rw-rw----   1 bin      bin        10877 Jul  4  1983 b.c
-rw-rw----   1 bin      bin          528 Jul  4  1983 freeze.c
-rw-rw----   1 bin      bin         6537 Jul  4  1983 lib.c
-rw-rw----   1 bin      bin         2167 Jul  4  1983 main.c
-rw-rw----   1 bin      bin         2379 Jul  4  1983 makeprctab.c
-rw-rw----   1 bin      bin         2386 Jul  4  1983 parse.c
-rw-rw----   1 bin      bin         2377 Jul  4  1983 proc.c
-rw-rw----   1 bin      bin        15629 Jul  4  1983 run.c
-rw-rw----   1 bin      bin         1520 Jul  3  1984 token.c
-rw-rw----   1 bin      bin          118 Jul  4  1983 tokenscript
-rw-rw----   1 bin      bin         6301 Jul  4  1983 tran.c

Below is the awk program that works with awk from Mortice Kern Systems Inc.
(MKS awk) with ms-dos.  It DOES NOT work with the awk on the VAX 8600.

BEGIN {
    # new record separator
    RS = "sec"

    # build look-up arrays
    # tt_types 0..15 (0x00..0x0f)
    split("L V K X Y CR X-PAC Y-PAC CR-PAC WX WY ** ** ** TCP TCC",tt1)

    # tt_types 16..23  (0x10..0x17)
    split("DSP DSC DCP ** ** ** ** **",tt2)

    # tt_types 45..53  (0x2d..0x35)
    split("LSTATUS ** ** ** ** ** ** ** Lmode",tt3)

    # tt_type  69  (0x45)
    split("AVF",tt4)

    # tt_types 96..108 (0x60..0x6c)
    split("LKC LTI LTD LHA LLA LPV LPVH LPVL LODA LYDA LTS LSP LMN",tt5)

    # tt_types 112..119 (0x70..0x77)
    split("LERR LMX LHHA LLLA LRCA ** RSS RDS",tt6)

    # tt_types 120..127 (0x78..0x7f)
    split("RRC ST SD RSESB AHA ALA APV APVH",tt7)

    # tt_types 128..138 (0x80..0x8a)
    split("APVL AODA AYDA ATS ASP ** ** AERR AHHA ALLA ARC",tt8)

    # tt_types 240,241  (0xf0, 0xf1)
    split("V. K.",tt9)
    }

    # conv decimal tt_type to element name string
    function tt_name(n)
        {
        if (n <= 15)      { return tt1[n+1]   }
        else if(n <= 23)  { return tt2[n-15]  }
        else if(n <= 53)  { return tt3[n-44]  }
        else if(n <= 69)  { return tt4[n-68]  }
        else if(n <= 108) { return tt5[n-95]  }
        else if(n <= 119) { return tt6[n-111] }
        else if(n <= 127) { return tt7[n-119] }
        else if(n <= 138) { return tt8[n-127] }
        else if(n <= 241) { return tt9[n-239] }
        else              { return "UNKNOWN"  }
        }




    # convert hex string to decimal
    function frhex(str)
        {
        tot = 0
        l = length(str)
            {
            for(i=0; i<l; i++)
                {
                v = substr(str, l-i, 1)
                x = index("0123456789abcdef",v)
                if(x == 0)
                    {
                    print "format error"
                    exit
                    }
                tot += (16 **i)*(x-1)
                }
            }
            return tot
         }


    # main prog
         {
         if(NF == 0)           # ignore blank lines
            next
         if($11 != "55")       # ignore all but primitive 55
            next
         print "\nprimitive: "$11
         print "recno: 0x"$12
         tblks = frhex($13)
         print "total blocks: " tblks "\n"
         for(j=0; j<tblks; j++)
            {
            el_name = tt_name(frhex($(14+j*5)))
            num_loc = frhex($(15+j*5) $(16+j*5))
            start_addr = frhex($(17+j*5) $(18+j*5))
            printf("%2s %2s %2s %2s %2s   tt_type: %7s     num_loc: %2d   " \
                   "  start_addr: %4d \n", $(14+j*5),$(15+j*5),$(16+j*5), \
                   $(17+j*5),$(18+j*5),el_name,num_loc,start_addr)
            }

         }
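
For an awk with neither user functions nor **, the hex conversion can be
written inline without exponentiation by accumulating left to right
(tot = tot*16 + digit). A minimal sketch, assuming lowercase hex digits as
in the program above; the shell wrapper name hex2dec is made up for
illustration and is not part of the original program:

```shell
# hex2dec: hypothetical wrapper; prints the decimal value of a lowercase
# hex string given as $1, using only awk features that avoid ** and
# user-defined functions.
hex2dec() {
    printf '%s\n' "$1" | awk '{
        tot = 0
        n = length($0)
        for (i = 1; i <= n; i++) {
            # index() is 1-based, so the digit value is position minus one
            x = index("0123456789abcdef", substr($0, i, 1))
            if (x == 0) { print "format error"; exit 1 }
            tot = tot * 16 + (x - 1)
        }
        print tot
    }'
}

hex2dec 2d
```

The same loop body could be pasted into the main action wherever frhex() is
called, at the cost of repeating it for each field.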


Scott Bemis
Texas Instruments
P. O. Drawer 1255  M/S 3517
Johnson City, Tennessee 37601 U.S.A.
telephone: (615) 461-2959
e-mail:  mcnc!rti!tijc02!root

frank@hpuxa.ircc.ohio-state.edu (Frank G. Fiamingo) (10/05/89)

My problem is that I want to check a particular character
position in a record to see whether or not it is a blank.
If so, I want to change it to a zero.  I thought I could
do this with an awk script similar to the one below, but have
had no success.

BEGIN {FS=""}
{split($0,array);
if (array[28] == " ") {array[28] = 0};
for(i in array) print array[$1]}

However, it appears that split is only recognizing 2 fields
in the record, rather than the 35 characters that it contains.
Can anyone tell me where I've gone wrong, or a better way
to do this?

	Thanks,
	Frank

ok@cs.mu.oz.au (Richard O'Keefe) (10/05/89)

In article <281@nisca.ircc.ohio-state.edu>, frank@hpuxa.ircc.ohio-state.edu (Frank G. Fiamingo) writes:
> BEGIN {FS=""}
> {split($0,array);
> if (array[28] == " ") {array[28] = 0};
> for(i in array) print array[$1]}

> However, it appears that split is only recognizing 2 fields
> in the record, rather than the 35 characters that it contains.

New versions of awk may be different, but the 4.3BSD manual says
"The variable FS ... may be changed at any time to any single character."
By experiment, the awk that comes with SunOS 4.0 takes the first character
of FS as the only separator.

The following script does the trick:
BEGIN	{ n = 28 }
	{   if (substr($0, n, 1) == " ") {
		print substr($0, 1, n-1) "0" substr($0, n+1, length-n);
	    } else {
		print;
	    }
	}

Another approach would be to use sed.
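
A sketch of the sed approach, assuming a sed that supports \{n\} interval
expressions in basic regular expressions (POSIX sed does); the function
name fix_col28 is made up for illustration:

```shell
# fix_col28: hypothetical filter; replaces a blank in column 28 with "0".
# The group matches exactly 27 characters, then the blank; "\10" is
# back-reference 1 followed by a literal 0. Other lines pass unchanged.
fix_col28() {
    sed 's/^\(.\{27\}\) /\10/'
}

# Sample record: 27 characters, a blank in column 28, then more text.
printf '%027d rest\n' 0 | fix_col28
```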

davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr) (10/05/89)

In article <281@nisca.ircc.ohio-state.edu>, frank@hpuxa.ircc.ohio-state.edu (Frank G. Fiamingo) writes:
|  My problem is that I want to check a particular character
|  position in a record to see whether or not it is a blank.
|  [ ... ]

|  BEGIN {FS=""}
|  {split($0,array);
|  if (array[28] == " ") {array[28] = 0};
|  for(i in array) print array[$1]}
|  
|  [ ... ]
|  Can anyone tell me where I've gone wrong, or a better way
|  to do this?

One liner:
	{ if (substr($0,28,1) == " ") print substr($0,1,27) "0" substr($0,29); else print }
-- 
bill davidsen	(davidsen@crdos1.crd.GE.COM -or- uunet!crdgw1!crdos1!davidsen)
"The world is filled with fools. They blindly follow their so-called
'reason' in the face of the church and common sense. Any fool can see
that the world is flat!" - anon

dph@crystal.lanl (David Huelsbeck) (10/06/89)

It's been a little while since I did much awking but I did A LOT of it
at one time.  I'm not sure if it's the same in the new awk but in the
old awk setting FS to null gave you the default field separator of
whitespace.  That is, if you set it to space or tab, single spaces and
tabs would separate fields so a series of either would give you a bunch
of null fields, but setting it to null gave you the default behavior back.

So your:
> BEGIN {FS=""}
doesn't do anything at all.

The correct technique is to use substr().  This has already been addressed
by others so I'll spare you all the grief of seeing it again.


-dph (still waiting for comp.lang.awk)

larry@macom1.UUCP (Larry Taborek) (10/09/89)

From article <281@nisca.ircc.ohio-state.edu>, by frank@hpuxa.ircc.ohio-state.edu (Frank G. Fiamingo):
> My problem is that I want to check a particular character
> position in a record to see whether or not it is a blank.
> If so, I want to change it to a zero.  I thought I could
> do this with an awk script similar to the one below, but have
> had no success.
> 
> BEGIN {FS=""}
> {split($0,array);
> if (array[28] == " ") {array[28] = 0};
> for(i in array) print array[$1]}
> 
> However, it appears that split is only recognizing 2 fields
> in the record, rather than the 35 characters that it contains.
> Can anyone tell me where I've gone wrong, or a better way
> to do this?

Frank,

Why use split at all?  If $0 contains the entire record, and if
you have fixed position data (as array[28] suggests) then why not
just check character 28 of $0?  Awk has no string subscripts, so
substr() does the indexing:

{
rec = $0
if (substr(rec, 28, 1) == " ") rec = substr(rec, 1, 27) "0" substr(rec, 29)
print rec
}

I copy $0 into a variable before changing it.  The FS="" statement
is not needed: we are working off $0, so any fields that awk derives
are of no consequence to this code.

Hope this helps...

-- 
Larry Taborek	..!uunet!grebyn!macom1!larry	Centel Federal Systems
		larry@macom1.UUCP		11400 Commerce Park Drive
						Reston, VA 22091-1506
						703-758-7000

oneill@getafix.slcs.slb.com (Dennis O'Neill) (01/08/90)

I'm trying to use awk to convert a LaTeX file of mailing addresses into
something more acceptable to Oracle's bulk data loading facility. Each entry
in the file is a macro for a particular location, in the form

\def\slny{Schlumberger Limited\\
277 Park Avenue\\
New York, NY  10172-0266}

Successive addresses are separated by blank lines; so I'm trying something like

BEGIN{
        FS = "\\"
        RS = ""
        ORS = "\n\n"
        }
$2 ~ /def/ {gsub(/\\\\/, ""); print $3 $4 $5 $6 $7 $8 $9 $10}

And I get 

awk: syntax error near line 8
awk: illegal statement near line 8

It seems it doesn't like the gsub call. So I tried a test with just

{gsub(/\\\\/, "");  print $0}

and get essentially the same message:

awk: syntax error near line 1
awk: illegal statement near line 1

In fact, regardless of what I use for the first argument to gsub, I get the same
error. What am I doing wrong?

Thanks in advance,

Dennis O'Neill

norm@oglvee.UUCP (Norman Joseph) (01/10/90)

In <3384@linus.SLCS.SLB.COM>, by oneill@getafix.slcs.slb.com (Dennis O'Neill):
> 
> [writing about parsing multi-line records with awk, and getting a
>  syntax error on a line using the gsub() function call:]
>
> {gsub(/\\\\/, "");  print $0}
> 
> [generates:]
> 
> awk: syntax error near line 1
> awk: illegal statement near line 1


On my system (Altos running Unix 5.3.1) there are
two versions of awk, namely "awk" and "nawk" (new awk).
Apparently nawk is the latest version of awk as described
in the book _The_AWK_Programming_Language_ by Aho,
Kernighan, & Weinberger, and includes gsub() as a builtin
function, while plain old awk does not.  My suspicion is
that you are using the old awk.
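
If only the old awk is at hand, one possible workaround (a sketch, not
something from the original exchange) is to let sed strip the doubled
backslashes before awk sees the data, so that awk needs no gsub(). It
assumes the awk in question handles RS = "" as the question's own script
expects. The file name addr.tex, the work directory, and the "|" used to
join the address lines are made up for illustration:

```shell
# Hypothetical sample file holding one record in the format from the question.
dir=${TMPDIR:-/tmp}/awkaddr.$$
mkdir -p "$dir"
cat > "$dir/addr.tex" <<'EOF'
\def\slny{Schlumberger Limited\\
277 Park Avenue\\
New York, NY  10172-0266}
EOF

# sed deletes every "\\" pair; awk then reads blank-line-separated records
# (RS = "") with one field per line and joins the lines with "|".
flat=$(sed 's/\\\\//g' "$dir/addr.tex" |
awk 'BEGIN { RS = ""; FS = "\n" }
     /def/ { line = $1
             for (i = 2; i <= NF; i++) line = line "|" $i
             print line }')
printf '%s\n' "$flat"
rm -rf "$dir"
```

The \def\slny{ prefix and the closing brace are left in place here; they
could be removed by further sed expressions in the same way.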


-- 
Norm Joseph - Oglevee Computer System, Inc.
  UUCP: ...!{pitt,cgh}!amanue!oglvee!norm
    /* you are not expected to understand this */