[comp.unix.questions] Narly Nawk Script

bm@bike2work.Eng.Sun.COM (Bill Michel) (12/11/90)

I'm working on a shell script that makes extensive use of (n)awk.

I'm *really* confused as to the general workings of the script, and
would appreciate some help.

Assume my script is called "nawkfile".  It is invoked as follows:

nawkfile inputfile

where inputfile is the file containing the input to be processed.

My main questions are:

1) Where does the data put into "string" go after the first call
   to nawk?

2) Does $* mean a recursive call to the script?  If so, how can
   this be used as input to the second nawk call?

Thanks in advance


--------------------------------

The general layout is as follows

nawk '
{
process a bunch of text, and append it to the variable "string"
}
END {
  print string
}
' $* |
nawk '
{
do some more processing
}' |

nawk '
{
do some more processing and send the output to a file
}'
--
Bill Michel			
bm@eng.sun.com		These views are my own, not Sun's.

tchrist@convex.COM (Tom Christiansen) (12/11/90)

In article <4255@exodus.Eng.Sun.COM> bm@bike2work.Eng.Sun.COM 
(Bill Michel) writes:

:I'm working on a shell script that makes extensive use of (n)awk.
:
:The general layout is as follows
:
:nawk ' {
:    process a bunch of text, and append it to the variable "string"
:}
:END {
:  print string
:}
:' $* |
:nawk '
:{
:do some more processing
:}' |
:
:nawk '
:{
:do some more processing and send the output to a file
:}'

The data is going down the pipe to the next awk in the pipeline.  If I
were you, I'd try really hard to make this work in a single awk script
if you possibly can.  Otherwise, every stage is another process that has
to re-read and re-split the previous stage's output, and that will slow
you down a great deal.
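
A rough sketch of what folding the stages together can look like.  The
actual processing wasn't posted, so the three stages here (join the
lines, upper-case them, number the result) are made-up stand-ins, and
the function names pipeline_version and single_version are hypothetical.
Plain awk is used; nawk should accept the same programs.

```shell
# Hypothetical stand-in stages: the real processing was not posted.

# Three-process version: join the lines, upper-case them, number the result.
pipeline_version() {
    awk '{ string = string sep $0; sep = " " } END { print string }' "$@" |
    awk '{ print toupper($0) }' |
    awk '{ print NR ": " $0 }'
}

# Single-process version: the same work done by one awk program.
single_version() {
    awk '
    { string = string sep toupper($0); sep = " " }  # stages 1 and 2, per input line
    END { print "1: " string }                      # stage 3, on the one joined line
    ' "$@"
}

# Both calls print: 1: ALPHA BETA
printf 'alpha\nbeta\n' | pipeline_version
printf 'alpha\nbeta\n' | single_version
```

Whether the real stages merge this cleanly depends on what they do; a
stage that needs the complete output of the one before it usually has
to move into the END block.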

If you can't do that, you might consider using perl instead, which is
something like nawk with a built-in fork (plus considerably more).  In
particular, it supports spawning children and communicating with them
through shared file descriptors without having to know how pipe()
calls work.  For example:

    $pid = open(HANDLE, "|-");      # implicit fork
    die "cannot fork: $!" unless defined $pid;
    if ($pid) {
        # parent code writes to HANDLE
    } else {
        # child code just reads from STDIN per usual
    }

or else:

    $pid = open(HANDLE, "-|");      # implicit fork
    die "cannot fork: $!" unless defined $pid;
    if ($pid) {
        # parent code reads from HANDLE
    } else {
        # child code just writes to STDOUT per usual
    }

You can also play more elaborate games using explicit pipe() calls.

This does sound to me like an application where perl might be more
applicable than awk.  You can even use the a2p awk-to-perl translator to
get started.  I can't really say though without seeing the script.

--tom
--
Tom Christiansen		tchrist@convex.com	convex!tchrist
"With a kernel dive, all things are possible, but it sure makes it hard
 to look at yourself in the mirror the next morning."  -me

hunt@dg-rtp.rtp.dg.com (Greg Hunt) (12/12/90)

In article <4255@exodus.Eng.Sun.COM>, bm@bike2work.Eng.Sun.COM (Bill Michel) writes:
> 
> I'm working on a shell script that makes extensive use of (n)awk.
> I'm *really* confused as to the general workings of the script, and
> would appreciate some help.
> Assume my script is called "nawkfile".  It is invoked as follows:
> 
> nawkfile inputfile
> 
> where inputfile is the file containing the input to be processed.
> My main questions are:
> 
> 1) Where does the data put into "string" go after the first call
>    to nawk?
> 2) Does $* mean a recursive call to the script?  If so, how can
>    this be used as input to the second nawk call?
> 
> The general layout is as follows
> 
> nawk '
> {
> process a bunch of text, and append it to the variable "string"
> }
> END {
>   print string
> }
> ' $* |
> nawk '
> {
> do some more processing
> }' |
> 

  [ rest of script deleted ]

1.  The data put into "string" is written (by the print command in
    the END block) to the file descriptor called stdout (standard
    output).  By using the pipe symbol "|", you have told the shell to
    connect the stdout of the first nawk to the stdin (standard input)
    of the second nawk.  So the data is written through the pipe to
    the second nawk, which uses it as input.  Connecting commands this
    way is one form of "redirecting" input and output.

2.  The "$*" doesn't indicate a recursive call.  It is the way you
    get access to the command line arguments specified to the script.
    Specifically, $* means "get me all of the command line arguments",
    which in this case is only the name of the "inputfile".  The
    shell substitutes the arguments in place of the $*, so the first
    nawk ends up being called with "inputfile" as its argument.

    You can reference $* as many times as you care to in the script.
    You can also get specific arguments by using $1, $2, etc.  You
    don't need to use $* for the second nawk to get its input;
    you've already done that by using the pipe "|" from the first
    nawk.
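
A small sketch of the argument substitution described in point 2,
assuming a POSIX-style sh.  The helpers count_args and show are
hypothetical names, with show standing in for a script like "nawkfile".
It also prints the count for the quoted form "$@", which is not used in
the original script but keeps an argument containing spaces in one
piece.

```shell
# count_args (hypothetical helper) reports how many arguments it received.
count_args() { echo "$#"; }

# show (hypothetical) stands in for a script such as "nawkfile".
show() {
    # Unquoted $* is replaced by the arguments, then split on whitespace.
    printf 'unquoted $*: %s\n' "$(count_args $*)"
    # "$@" passes each argument through exactly as it was given.
    printf 'quoted "$@": %s\n' "$(count_args "$@")"
}

show inputfile          # both forms see 1 argument
show "my input file"    # unquoted $* sees 3 words, "$@" still sees 1
```

For a single file name without spaces, as in "nawkfile inputfile", the
two forms behave the same.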

For more details on both of these points, you might want to look
at the man page for the shell (use "man sh | more").  It will tell
you about the various ways you can redirect input and output, and also
the ways you can manipulate shell script arguments.
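
To see point 1 for yourself, something like the following sketch
should do.  The function name two_stage and both awk programs are
made-up stand-ins, since the real processing wasn't posted, and plain
awk is used in place of nawk.

```shell
# two_stage is a hypothetical name; the awk programs are stand-ins.
# Stage 1 collects every input line in "string" and prints it once, at END.
# Stage 2 is a separate process: it receives that text on its stdin, but
# its own "string" variable starts out empty.
two_stage() {
    awk '{ string = string sep $0; sep = "," } END { print string }' "$@" |
    awk '{ print "read from pipe: " $0; print "my string: [" string "]" }'
}

printf 'one\ntwo\nthree\n' | two_stage
# prints:
#   read from pipe: one,two,three
#   my string: []
```

The variable itself never leaves the first nawk; only what print writes
to stdout travels down the pipe.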
    
Enjoy!

--
Greg Hunt                        Internet: hunt@dg-rtp.rtp.dg.com
DG/UX Kernel Development         UUCP:     {world}!mcnc!rti!dg-rtp!hunt
Data General Corporation
Research Triangle Park, NC, USA  These opinions are mine, not DG's.