[comp.unix.questions] Awk Field Separators

cubbage@se-sd.SanDiego.NCR.COM (Sharon Cubbage) (08/22/90)

Does anybody know how to specify more than one field separator in Awk?
I would like to specify to an Awk program to treat single spaces as well
as bars as field separators so that a string such as :

12 12 12 34|34|34

will be said to have 6 fields.  I've tried to create a regular expression
to handle both cases but it hasn't been working.

Any hints?

Thanks!
Sharon

merlyn@iwarp.intel.com (Randal Schwartz) (08/22/90)

In article <3729@se-sd.SanDiego.NCR.COM>, cubbage@se-sd (Sharon Cubbage) writes:
| 
| Does anybody know how to specify more than one field separator in Awk?
| I would like to specify to an Awk program to treat single spaces as well
| as bars as field separators so that a string such as :
| 
| 12 12 12 34|34|34
| 
| will be said to have 6 fields.  I've tried to create a regular expression
| to handle both cases but it hasn't been working.

There's no easy way to handle it in plain Awk.  (Alright everyone, what's
the next phrase... come along now...) Get Perl.

...
while (<>) {
	@arr = split(/[ |]/);
	# $arr[0] .. $arr[5] now has the six fields you asked for
	...
}
...

Just another Perl [book] hacker,
-- 
/=Randal L. Schwartz, Stonehenge Consulting Services (503)777-0095 ==========\
| on contract to Intel's iWarp project, Beaverton, Oregon, USA, Sol III      |
| merlyn@iwarp.intel.com ...!any-MX-mailer-like-uunet!iwarp.intel.com!merlyn |
\=Cute Quote: "Welcome to Portland, Oregon, home of the California Raisins!"=/

felps@convex.com (Robert Felps) (08/22/90)

cubbage@se-sd.SanDiego.NCR.COM (Sharon Cubbage) writes:
>Does anybody know how to specify more than one field separator in Awk?
>I would like to specify to an Awk program to treat single spaces as well
>as bars as field separators so that a string such as :

>12 12 12 34|34|34

>will be said to have 6 fields.  I've tried to create a regular expression
>to handle both cases but it hasn't been working.

Try nawk(SV3.2 or later) or gawk from GNU!

nawk 'BEGIN { FS = "[ |]" }
{
# awk program ...
}'

------------------------------------------------------------------------------
| Robert Felps           |-The more you own,          | Tech. Assistant Ctr  |
| Convex Computer Corp   |  The more you have to fix! | OS System Specialist |
| 3000 Waterview Parkway |                            | felps@convex.com     |
| Richardson, Tx.  75083 |                            | 1(800) 952-0379      |
------------------------------------------------------------------------------

omerzu@quando.quantum.de (Thomas Omerzu) (08/22/90)

In article <1990Aug22.054330.24911@iwarp.intel.com>
merlyn@iwarp.intel.com (Randal Schwartz) writes:

|| Does anybody know how to specify more than one field separator in Awk?
|| I would like to specify to an Awk program to treat single spaces as well
|| as bars as field separators so that a string such as :
|| 
|| 12 12 12 34|34|34
[...]
|
|There's no easy way to handle it in plain Awk.  (Alright everyone, what's
|the next phrase... come along now...) Get Perl.
[...]

And, if you aren't lucky (?) enough to have
Perl, you might use
	sed 's/|/ /g' | awk ...

So you will only have to deal with one field separator in Awk.
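
For illustration, a minimal sketch of that pipeline run on the sample line
from the original question.  The { print NF } action is only a stand-in for
the real awk program, and the second input (a||b) is a made-up case that
shows one caveat: awk's default separator treats a run of blanks as a single
break, so an empty field between two adjacent bars is lost.

```shell
# Turn every bar into a blank, then let awk split on its default separator.
sample='12 12 12 34|34|34'
nf=$(printf '%s\n' "$sample" | sed 's/|/ /g' | awk '{ print NF }')
echo "$nf"          # 6

# Caveat: adjacent bars become adjacent blanks, which count as one break,
# so the empty middle field of "a||b" disappears.
nf_empty=$(printf '%s\n' 'a||b' | sed 's/|/ /g' | awk '{ print NF }')
echo "$nf_empty"    # 2
```

If empty fields matter, a regular-expression FS in a newer awk is probably
the safer route.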




-- 
*-----------------------------------------------------------------------------*
Thomas Omerzu      UUCP:     ...!unido!quando!omerzu / omerzu@quando.uucp
  Quantum GmbH,    Bitnet:   UNIDO!quando!omerzu / omerzu%quando@UNIDO(.bitnet)
Dortmund, Germany  Internet: omerzu@quando.quantum.de

john@basho.uucp (John Lacey) (08/23/90)

In article <1990Aug22.054330.24911@iwarp.intel.com> of comp.unix.questions
    merlyn@iwarp.intel.com (Randal Schwartz) writes:
} In article <3729@se-sd.SanDiego.NCR.COM>, cubbage@se-sd (Sharon Cubbage) writes:
} | 
} | Does anybody know how to specify more than one field separator in Awk?
} | I would like to specify to an Awk program to treat single spaces as well
} | as bars as field separators so that a string such as :
} | 
} | 12 12 12 34|34|34
} | 
} | will be said to have 6 fields.  I've tried to create a regular expression
} | to handle both cases but it hasn't been working.
} 
} There's no easy way to handle it in plain Awk.  (Alright everyone, what's
} the next phrase... come along now...) Get Perl.
} 
} [ some gross Perl stuff deleted ...]

"There's no easy way to handle it in plain Awk"?  Okay, maybe you like Perl
better, but that's no reason to needlessly hose Awk.  If by plain Awk you 
mean 1977 Awk, fine.  But don't get Perl, get New Awk, or GNU Awk.

Then you can type

	awk 'BEGIN { FS="[ |]" } ...' ...

or better, with GNU Awk,

	gawk -v FS="[ |]" ...

Just another Awk hacker ... :-)

-- 
John Lacey, 
   E-mail:  ...!osu-cis!n8emr!uncle!basho!john  (coming soon: john@basho.uucp)
   Voice:   (614) 436--3773, or 487--8570
"What was the name of the dog on Rin-tin-tin?"  --Mickey Rivers, ex-Yankee CF

rdavis@connie.UUCP (Ray Davis) (08/23/90)

Just run the input through

    tr \| \  <---- there is a space after that last backslash

and pipe that to your awk.

Ok then,

    tr '|' ' '

if you prefer readability.

shun@cbnewsh.att.com (shun.cheung) (08/23/90)

In article <3729@se-sd.SanDiego.NCR.COM> cubbage@se-sd.SanDiego.NCR.COM (Sharon Cubbage) writes:
>
>Does anybody know how to specify more than one field separator in Awk?
>I would like to specify to an Awk program to treat single spaces as well
>as bars as field separators so that a string such as :
>
>12 12 12 34|34|34
>
>will be said to have 6 fields.  I've tried to create a regular expression

This can be achieved by re-defining the field separator to be
a blank or a "|":

   FS = "[\ \|]"

-- 
-- Shun Cheung, AT&T Bell Laboratories, Middletown, New Jersey
     electronic: shun@hou2d.att.com,  att!hou2d!shun,  or shun@cbnewsh.att.com
       voice: (201) 615-5135

norm@oglvee.UUCP (Norman Joseph) (08/23/90)

In <1990Aug22.054330.24911@iwarp.intel.com> merlyn@iwarp.intel.com (Randal Schwartz) writes:

>In article <3729@se-sd.SanDiego.NCR.COM>, cubbage@se-sd (Sharon Cubbage) writes:
>| Does anybody know how to specify more than one field separator in Awk?
>| [...] so that a string such as :
>| 
>| 12 12 12 34|34|34
>| 
>| will be said to have 6 fields. [...]

>There's no easy way to handle it in plain Awk.  (Alright everyone, what's
>the next phrase... come along now...) Get Perl.
>[...]

What's so hard about:

        BEGIN { FS = "[ ]|[|]" }

other than the fact that it requires the newer version of awk, as
distributed with 5.3 (on this Altos, running 5.3, it's called ``nawk'').
This makes single blanks and single bars field separators.  If you
want to allow multiple spaces or bars, use "[ ]+|[|]+".  If you want
to collapse multiple occurrences of bars or spaces into a single field
separator, use "[ |]+".
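
One way to see how the three patterns differ is to count fields on a line
that mixes doubled blanks with a blank followed by a bar.  This is only a
sketch: count_fields and the test line are made up for illustration, and it
assumes the awk on your PATH is a newer awk (nawk, gawk, or another POSIX
awk) that accepts a regular expression for FS.

```shell
# count_fields REGEX LINE: print NF for LINE when FS is REGEX.
# (Hypothetical helper, not part of any awk distribution.)
count_fields() {
    printf '%s\n' "$2" | awk -v FS="$1" '{ print NF }'
}

line='a  b |c'    # two blanks after "a", then a blank and a bar after "b"

single=$(count_fields '[ ]|[|]' "$line")      # each blank or bar separates
runs=$(count_fields '[ ]+|[|]+' "$line")      # a run of blanks OR a run of bars
mixed=$(count_fields '[ |]+' "$line")         # any mixed run of blanks and bars
echo "$single $runs $mixed"                   # 5 4 3
```

The blank followed by a bar is what separates the last two patterns: the
second sees two separators with an empty field between them, the third sees
one.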

>Just another Perl [book] hacker,

Just another Perl wannabe (Mebbe someday when I have the time...)  :-)
-- 
Norm Joseph                                      cgh!amanue!oglvee!norm@dsi.com
  Oglevee Computer Systems, Inc.                {pitt,cgh}!amanue!oglvee!norm
      "Shucking Usenet oysters in pursuit of a pearl."  --  Bill Kennedy

guy@auspex.auspex.com (Guy Harris) (08/24/90)

>Try nawk(SV3.2 or later)

SVR3.*1* or later!

jeffr@bcs800.UUCP (Jeff Riegel) (08/24/90)

In <3729@se-sd.SanDiego.NCR.COM> cubbage@se-sd.SanDiego.NCR.COM (Sharon Cubbage) writes:


>Does anybody know how to specify more than one field separator in Awk?
>I would like to specify to an Awk program to treat single spaces as well
>as bars as field separators so that a string such as :

>12 12 12 34|34|34

>will be said to have 6 fields.  I've tried to create a regular expression
>to handle both cases but it hasn't been working.

>Any hints?

>Thanks!
>Sharon

How about using tr to convert "|" to " ", and if you need the "|"s, use
awk and printf to recreate the new file properly delimited...
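
A rough sketch of that idea, assuming every record has the same shape as
the sample line (three blank-separated fields followed by three
bar-separated ones).  The printf format is made up to match that shape and
would need changing for any other layout.  It relies only on the default
separator and printf, so it should not need the newer awk.

```shell
# Flatten bars to blanks so awk sees six ordinary fields, then put the
# bars back on output with printf.
sample='12 12 12 34|34|34'
out=$(printf '%s\n' "$sample" | tr '|' ' ' |
    awk '{ printf "%s %s %s %s|%s|%s\n", $1, $2, $3, $4, $5, $6 }')
echo "$out"    # 12 12 12 34|34|34
```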

wrightgr@mwk.uucp (Greg, Ext. 3414) (08/24/90)

In article <3729@se-sd.SanDiego.NCR.COM>, cubbage@se-sd.SanDiego.NCR.COM (Sharon Cubbage) writes:
> Does anybody know how to specify more than one field separator in Awk?
> I would like to specify to an Awk program to treat single spaces as well
> as bars as field separators so that a string such as :
> 
> 12 12 12 34|34|34
> 
> will be said to have 6 fields.  I've tried to create a regular expression
> to handle both cases but it hasn't been working.
> 
> Any hints?

I haven't found any way to do it either.  The manual says FS accepts only a
single character.  You might try getting a copy of GAWK (GNU Awk).  GAWK allows
regular expressions for the FS variable.  For example:
   gawk 'BEGIN {FS="[ |]"} {print NF}' input.dat
gives a result of 6 for the data above.  The old Awk, however, gives 1 as a
result.

It's worth checking out.

Greg Wright <wrightgr@mwk>
            uucp: uhnix1!mwk!wrightgr  So they tell me.  We just got a new feed
                                       so I don't know for sure.

adb@cs.bu.edu (Adam Bryant) (08/26/90)

In article <3353@mwk.uucp> wrightgr@mwk.uucp (Greg, Ext. 3414) writes:
+  Does anybody know how to specify more than one field separator in Awk?
+  I would like to specify to an Awk program to treat single spaces as well
+  as bars as field separators so that a string such as :
+  
+  12 12 12 34|34|34
+  
+  will be said to have 6 fields.  I've tried to create a regular expression
+  to handle both cases but it hasn't been working.

I use the split() command when handling multiple field separators.

Code to print a list of fields separated by a ' ' or a '|' is:

{
     for (i = 1; i <= NF; i++) {
          num = split($i, nstr, "|");
          for (j = 1; j <= num; j++) {
               print( nstr[j] )
          }
     }
}
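
The same loop can count the fields instead of printing them.  A sketch, run
on the sample line from the question; the variable names are only
illustrative.

```shell
# awk splits on blanks as usual; split() then breaks each blank-separated
# field on "|" and returns how many pieces it found.
sample='12 12 12 34|34|34'
total=$(printf '%s\n' "$sample" | awk '{
    n = 0
    for (i = 1; i <= NF; i++)
        n += split($i, parts, "|")
    print n
}')
echo "$total"    # 6
```

Only the default FS and split() with a single-character separator are used
here, so this should not depend on regular-expression support in FS.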

Hope this helps.

adam bryant

harrison@necssd.NEC.COM (Mark Harrison) (08/29/90)

In article <3729@se-sd.SanDiego.NCR.COM>, cubbage@se-sd.SanDiego.NCR.COM
(Sharon Cubbage) writes:

> Does anybody know how to specify more than one field separator in Awk?
> I would like to specify to an Awk program to treat single spaces as well
> as bars as field separators so that a string such as :
> 
> 12 12 12 34|34|34
> 
> will be said to have 6 fields.  I've tried to create a regular expression
> to handle both cases but it hasn't been working.

The new version of Awk (nawk) can do this, but the old version can't.

$ echo "a b,c" | nawk -F",| " '{print NF}'
3
$ echo "a b,c" | awk -F",| " '{print NF}'
2
-- 
Mark Harrison             harrison@necssd.NEC.COM
(214)518-5050             {necntc, cs.utexas.edu}!necssd!harrison
standard disclaimers apply...

rice@dg-rtp.dg.com (Brian Rice) (08/30/90)

In article <427@necssd.NEC.COM>, harrison@necssd.NEC.COM (Mark Harrison) writes:
|> In article <3729@se-sd.SanDiego.NCR.COM>, cubbage@se-sd.SanDiego.NCR.COM
|> (Sharon Cubbage) writes:
|> 
|> > Does anybody know how to specify more than one field separator in Awk?
|> > I would like to specify to an Awk program to treat single spaces as well
|> > as bars as field separators so that a string such as :
|> 
|> The new version of Awk (nawk) can do this, but the old version can't.

If you don't have nawk (you're missing out if you don't!), you can achieve
the same effect by writing a filter program, like so (this is off the top of
my head):

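#include <stdio.h>
#include <string.h>
#define MY_DELIMITER_CHARS " |\n"
#define MY_OUTPUT_DELIMITER ' '
#define MY_BUFSIZE 256

/* fgets() stands in for the getline() on page 67 of K&R1, whose name
   clashes with the POSIX getline() declared in <stdio.h>. */

int main(void)
{
     char buff[MY_BUFSIZE];
     char *token;

     /* Read a line at a time; a longer line is handled in several pieces. */
     while (fgets(buff, MY_BUFSIZE, stdin) != NULL) {
          /* strtok() treats any run of delimiter characters as one break. */
          token = strtok(buff, MY_DELIMITER_CHARS);
          while (token != NULL) {
               fputs(token, stdout);
               putchar(MY_OUTPUT_DELIMITER);
               token = strtok(NULL, MY_DELIMITER_CHARS);
          }
          putchar('\n');
     }
     return 0;
}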

Compile and link, then use it like this:

$ echo "1 2 3 4|5|6" | the_above_program | awk '{print $5}'
5

A nice enhancement would be to allow the user to specify a list of input
delimiter characters and the output delimiter character on the command 
line (but make sure that newline is an input delimiter, unless you want
to change the code).

--
Brian Rice   rice@dg-rtp.dg.com   +1 919 248-6328
DG/UX Product Assurance Engineering
Data General Corp., Research Triangle Park, N.C.