[comp.sys.ibm.pc.misc] Is AWK up to this application?

mario@wjvax.UUCP (Mario Dona) (10/05/90)

Does anyone know of a way of extracting information from a text file
which contains variable length fields, and outputting it in a different
format?  For example, I have a text file which contains names and
addresses as shown below.

John Doe           1563 Meadow Lane         San Jose, CA 94325 more stuff --->
John Smith         345 N. First Street      Oakland, CA 94356  more stuff --->
Henry & Martha Coe 4567 Poplar Ave.         Newark, NJ 78389   more stuff--->

^                  ^                        ^
|                  |                        |
|                  |                        |_____  column 45
|                  |______________________________  column 20
|_________________________________________________  column 1

I want to extract the names and addresses and output it in the following
format:

John Doe
1563 Meadow Lane
San Jose, CA 94325

John Smith
345 N. First Street
Oakland, CA 94356

Henry & Martha Coe
4567 Poplar Ave.
Newark, NJ 78389

Can this be done using AWK, and if so how? Or is there some other way?


  Mario Dona
  ...!{ !decwrl!qubix, ames!oliveb!tymix, pyramid}!wjvax!mario         
  The above opinions are mine alone and not, in any way, those of WJ.

jlp@cbnewsh.att.com (jon.peticolas) (10/05/90)

From article <1661@wjvax.UUCP>, by mario@wjvax.UUCP (Mario Dona):
  
> Does anyone know of a way of extracting information from a text file
> which contains variable length fields, and outputting it in a different
> format?  For example, I have a text file which contains names and
> addresses as shown below.
  
> John Doe           1563 Meadow Lane         San Jose, CA 94325 more stuff --->
> John Smith         345 N. First Street      Oakland, CA 94356  more stuff --->
> Henry & Martha Coe 4567 Poplar Ave.         Newark, NJ 78389   more stuff--->
  
> ^                  ^                        ^
> |                  |                        |
> |                  |                        |_____  column 45
> |                  |______________________________  column 20
> |_________________________________________________  column 1
  
> I want to extract the names and addresses and output it in the following
> format:
  
> John Doe
> 1563 Meadow Lane
> San Jose, CA 94325
  
> John Smith
> 345 N. First Street
> Oakland, CA 94356
  
> Henry & Martha Coe
> 4567 Poplar Ave.
> Newark, NJ 78389
  
> Can this be done using AWK, and if so how? Or is there some other way?

Yes

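{
print substr($0,1,19);     # name: columns 1-19
print substr($0,20,25);    # street: columns 20-44
print substr($0,45,19);    # city: columns 45-63, if "more stuff" starts at 64
print "";                  # blank line between records
}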

Provided that all of the whitespace is made up of space characters (no tabs).
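
If the file does contain tabs, one way around it is to expand them to
spaces before awk sees the line.  A minimal sketch, assuming the POSIX
expand utility with its default 8-column tab stops and a 19-column third
field (inferred from where "more stuff" starts in the sample); the single
record is made up for illustration:

```shell
# One record in which a tab, not spaces, pads the name field.  With 8-column
# tab stops the tab fills columns 9-16, so three more spaces are needed for
# the street to start in column 20.
result=$(printf 'John Doe\t   %-25s%-19s%s\n' \
    '1563 Meadow Lane' 'San Jose, CA 94325' 'more stuff --->' |
  expand |
  awk '{
    print substr($0, 1, 19)
    print substr($0, 20, 25)
    print substr($0, 45, 19)
  }')
printf '%s\n' "$result"
```

Without the expand step the tab would count as one character and every
later column would be off.  The printed fields still carry their padding
blanks.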

-Jon

There's a time and place for spontaneity.

bob@wyse.wyse.com (Bob McGowen x4312 dept208) (10/06/90)

In article <1661@wjvax.UUCP> mario@wjvax.UUCP (Mario Dona) writes:
>
>Does anyone know of a way of extracting information from a text file
>which contains variable length fields, and outputting it in a different
>format?  For example, I have a text file which contains names and
>addresses as shown below.
>
>John Doe           1563 Meadow Lane         San Jose, CA 94325 more stuff --->
...
>^                  ^                        ^
>|                  |                        |_____  column 45
>|                  |______________________________  column 20
>|_________________________________________________  column 1
>
>I want to extract the names and addresses and output it in the following
>format:
>
>John Doe
>1563 Meadow Lane
>San Jose, CA 94325
.....
>Can this be done using AWK, and if so how? Or is there some other way?
>

The answer re awk is that it depends.  If the fields are separated only
by varying numbers of spaces, awk's default field splitting will not
help, because the fields themselves contain spaces.  If the file has (or
can be recreated with) tabs (or some other separator character like a
colon or |) between each pair of fields then it is relatively easy.
(I.e., if each run of white space marked below with carets were a
single tab.)

John Doe           1563 Meadow Lane         San Jose, CA 94325 more stuff --->
        ^^^^^^^^^^^                ^^^^^^^^^                  ^
You could then use the following awk code to process your file:

  awk -F'->' '{printf "%s\n%s\n%s\n\n",$1,$2,$3}' file_to_process

Note that the -> is used to represent a literal tab.  If you use some other
character to separate the fields, then substitute it.
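
For the tab case, here is a small sketch that spells the separator as
'\t' instead of typing a literal tab.  It assumes an awk that processes
escape sequences in the -F argument, as POSIX specifies; very old awks
may not.  The two records are taken from the sample data, rebuilt with
real tabs between the fields.

```shell
# printf reuses the format for each group of four arguments, so this
# produces two tab-separated records.
result=$(printf '%s\t%s\t%s\t%s\n' \
    'John Doe' '1563 Meadow Lane' 'San Jose, CA 94325' 'more stuff --->' \
    'John Smith' '345 N. First Street' 'Oakland, CA 94356' 'more stuff --->' |
  awk -F'\t' '{ printf "%s\n%s\n%s\n\n", $1, $2, $3 }')
printf '%s\n' "$result"
```

Because the separator is a single tab, blanks inside a field (as in
"San Jose") stay part of that field.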

If the file has only spaces between all the visible printing characters,
your primary problem will be fields of variable size, which contain
differing numbers of words.  For instance, San Jose and New Orleans vs
Denver and Oakland, or Meadow Lane vs Meadow Lane Court.  E-mail me if
you would like to discuss this in more detail.
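
To make the problem concrete, this sketch runs awk's default splitting
over two of the sample records (rebuilt here with printf, without the
trailing "more stuff") and prints the field count and $3 for each:

```shell
# Default splitting breaks on every run of blanks, so a name such as
# "John Doe" becomes two fields and the numbering shifts per record.
result=$(printf '%-19s%-25s%-19s\n' \
    'John Doe' '1563 Meadow Lane' 'San Jose, CA 94325' \
    'Henry & Martha Coe' '4567 Poplar Ave.' 'Newark, NJ 78389' |
  awk '{ print NF ": " $3 }')
printf '%s\n' "$result"
# prints:
# 9: 1563
# 10: Martha
```

The same $3 is a house number in one record and part of a name in the
other, so no fixed field number picks out the street address.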

Bob McGowan  (standard disclaimer, these are my own ...)
Product Support, Wyse Technology, San Jose, CA
..!uunet!wyse!bob
bob@wyse.com

2113av@gmuvax2.gmu.edu (John Porter) (10/06/90)

Mario:
  Couldn't be easier, using awk.  Use the substr() function:

  { print substr($0,1,19)
    print substr($0,20,25)
    print substr($0,45,length($0)-44)
    print ""
  }

or some such (whatever numbers).
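
One thing substr() leaves behind is the padding blanks at the end of
each field.  A rough sketch of stripping them, assuming an awk with
sub() and user-defined functions (nawk or any POSIX awk, not the oldest
awk); trim() is a made-up helper name and the 19-column third field is
inferred from the sample:

```shell
result=$(printf '%-19s%-25s%-19s%s\n' \
    'John Doe' '1563 Meadow Lane' 'San Jose, CA 94325' 'more stuff --->' |
  awk '
    # trim() is a hypothetical helper, not an awk built-in.
    # It removes trailing blanks from its argument.
    function trim(s) { sub(/ +$/, "", s); return s }
    {
        print trim(substr($0, 1, 19))
        print trim(substr($0, 20, 25))
        print trim(substr($0, 45, 19))
        print ""
    }')
printf '%s\n' "$result"
```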

Good luck!   john.porter

glee@tigris.uucp (Godfrey Lee) (10/08/90)

In article <1661@wjvax.UUCP> mario@wjvax.UUCP (Mario Dona) writes:
>Does anyone know of a way of extracting information from a text file
>which contains variable length fields, and outputting it in a different
>format?

Looks more like fixed length fields below.

>For example, I have a text file which contains names and
>addresses as shown below.
>
>John Doe           1563 Meadow Lane         San Jose, CA 94325
>John Smith         345 N. First Street      Oakland, CA 94356
>Henry & Martha Coe 4567 Poplar Ave.         Newark, NJ 78389
>
>^                  ^                        ^
>|                  |                        |_____  column 45
>|                  |______________________________  column 20
>|_________________________________________________  column 1
>
>I want to extract the names and addresses and output it in the following
>format:
>
>John Doe
>1563 Meadow Lane
>San Jose, CA 94325
>
>John Smith
>345 N. First Street
>Oakland, CA 94356
>
>Henry & Martha Coe
>4567 Poplar Ave.
>Newark, NJ 78389
>
>Can this be done using AWK, and if so how? Or is there some other way?


Simple, for example:

	awk -f address.awk <infile >outfile

-------------- address.awk ----------------------------
{
	name = substr ( $0,  1, 19 )
	add1 = substr ( $0, 20, 25 )
	add2 = substr ( $0, 45 )
	printf ( "%s\n%s\n%s\n\n", name, add1, add2 )
}
---------end of address.awk ----------------------------
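
If the layout is likely to change, the column widths can be passed in
rather than hard-coded.  This is only a sketch: the variable name
"widths" is made up, the values 19 25 19 are read off the sample data,
and it assumes an awk that supports -v and sub() (nawk or POSIX awk).

```shell
result=$(printf '%-19s%-25s%-19s%s\n' \
    'John Smith' '345 N. First Street' 'Oakland, CA 94356' 'more stuff --->' |
  awk -v widths='19 25 19' '
    BEGIN { n = split(widths, w, " ") }   # w[1..n] holds the column widths
    {
        start = 1
        for (i = 1; i <= n; i++) {
            field = substr($0, start, w[i])
            sub(/ +$/, "", field)         # drop the padding blanks
            print field
            start += w[i]
        }
        print ""                          # blank line between records
    }')
printf '%s\n' "$result"
```

Anything past the listed widths (here the "more stuff" part) is simply
not printed.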


Of course, it would be simpler if you just separate the fields with a single
character such as ':', e.g.

John Doe:1563 Meadow Lane:San Jose, CA 94325:more stuff --->
John Smith:345 N. First Street:Oakland, CA 94356:more stuff --->
Henry & Martha Coe:4567 Poplar Ave.:Newark, NJ 78389:more stuff--->

then all you need is:

	awk -F: '{ printf ("%s\n%s\n%s\n\n", $1, $2, $3 ) }' <infile >outfile
	    ^^^
	      |
	      +---- defines field separator to be ':'
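
If the file only exists in the fixed-column form, it could be converted
to the ':' form once with something like the sketch below.  The function
name to_colons is made up, the column positions come from the sample,
and it assumes no field contains a ':' itself.

```shell
# to_colons is a hypothetical helper: it reads fixed-column records on
# standard input and writes colon-separated records on standard output.
to_colons() {
    awk '{
        name = substr($0, 1, 19);  sub(/ +$/, "", name)
        add1 = substr($0, 20, 25); sub(/ +$/, "", add1)
        add2 = substr($0, 45, 19); sub(/ +$/, "", add2)
        rest = substr($0, 64)      # everything after the address columns
        print name ":" add1 ":" add2 ":" rest
    }'
}

result=$(printf '%-19s%-25s%-19s%s\n' \
    'Henry & Martha Coe' '4567 Poplar Ave.' 'Newark, NJ 78389' 'more stuff--->' |
  to_colons)
printf '%s\n' "$result"
```

The converted lines can then be fed to the -F: command.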

Good luck.
-- 
Godfrey Lee
cognos!alzabo!tigris!glee