[comp.unix.questions] Attn: AWK GURUS

vwa0201@marst2 (Larry Baca) (02/28/91)

I want to do a quick generic search of a rather large data base and I want
to do the search based on certain record cols. If I have records that look
like this:

abc   defghi j klmnop  qrst uvw x y z
a bc defg hij   lmnop qrst  uvw x yz
ab c defg hi j klmn op qrst u vwxyz

And say I want to find only the records with (a) in col1, (nop) in col19-21,
(v) in col29 and (y) in col34.

I want to do this in a script where the record cols and params are left to
the user's choice.

I tried doing this with:

--
--
while true
do
  a=`line <file` || break
  do cuts of $a and compare to given params.....

--
--

But this was slower than Saddam's Scuds. Maybe 'C' is a better way to go, but
if it can be done with AWK and is still reasonably fast, I would like to know
about it. Thank you for any ideas you may have.
-- 
/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\
LARRY BACA,                                                       marst2!lbaca
DAASO-VWA AIS, DEFENSE AUTOMATIC ADDRESSING OFFICE, WESTERN DIVISION
DDTC TRACY, TRACY CA. 95376-5057  AUTOVON 462-9391  COMMERCIAL 832-9391

jik@athena.mit.edu (Jonathan I. Kamens) (02/28/91)

  It seems to me that if your database contains simply lines with certain
characters in each column, and you want to search for lines matching a
specified pattern of characters, the simplest thing to use is grep.

  You gave an example of finding "only the records with (a) in col1, (nop) in
col19-21, (v) in col29 and (y) in col34."  How about:

    grep "^a.................nop.......v....y" database-filename

Read the man page for grep if you don't understand the periods in the regular
expression above.
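
The same kind of pattern can be generated rather than counted out by hand,
which is one way to leave the columns and strings to the user. What follows is
only a sketch: colpat is a made-up helper name, the column=text argument format
is an assumption, the pairs must be given in ascending column order without
overlap, and the text is dropped into the pattern unescaped, so any regular
expression metacharacters in it would need quoting first.

```shell
#!/bin/sh
# colpat (hypothetical helper): build a grep pattern from "column=text"
# pairs, listed in ascending column order.
colpat() {
    pat='^'
    pos=1                          # next column not yet covered by the pattern
    for spec in "$@"; do
        col=${spec%%=*}            # text before the first "="
        val=${spec#*=}             # text after the first "="
        while [ "$pos" -lt "$col" ]; do
            pat="$pat."            # one "." per column we do not care about
            pos=$((pos + 1))
        done
        pat="$pat$val"             # the literal text wanted at this column
        pos=$((pos + ${#val}))
    done
    printf '%s\n' "$pat"
}

# Print the pattern for the example in the question.
colpat 1=a 19=nop 29=v 34=y
```

The output can be handed straight to grep, for example
grep "$(colpat 1=a 19=nop 29=v 34=y)" database-filename.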

-- 
Jonathan Kamens			              USnail:
MIT Project Athena				11 Ashford Terrace
jik@Athena.MIT.EDU				Allston, MA  02134
Office: 617-253-8085			      Home: 617-782-0710

tchrist@convex.COM (Tom Christiansen) (02/28/91)

Sounds like a job for perl.  If you don't like this, enjoy your C program.

--tom
--
"UNIX was not designed to stop you from doing stupid things, because
 that would also stop you from doing clever things." -- Doug Gwyn

 Tom Christiansen                tchrist@convex.com      convex!tchrist

haroldt@paralandra.yorku.ca (Harold Tomlinson) (02/28/91)

  Sorry to post this to the net, but I could not reach the above address
via email.

In article <354@marst2> vwa0201@marst2 (Larry Baca) writes:

:>   I want to do a quick generic search of a rather large data base and I want
:>   to do the search based on certain record cols. If I have records that look
:>   like this:
:>
:>   abc   defghi j klmnop  qrst uvw x y z
:>   a bc defg hij   lmnop qrst  uvw x yz
:>   ab c defg hi j klmn op qrst u vwxyz
:>
:>   And say I want to find only the records with (a) in col1, (nop) in col19-21,
:>   (v) in col29 and (y) in col34.
:>
:>   I want to do this in a script where the record cols and params are left to
:>   to the users choice.
:>

   ----- Transcript of session follows -----
550 <vwa0201@marst2>... Host unknown

   ----- Unsent message follows -----
Received: by paralandra.yorku.ca (5.57/Ultrix3.0-C)
	id AA17172; Thu, 28 Feb 91 08:55:21 EST
To: vwa0201@marst2
Subject: Awk db search question.
Date: Thu, 28 Feb 91 08:55:18 -0500
From: haroldt@paralandra.yorku.ca



  I don't think I fully understood what you were asking.  Did you want
column input (as in SAS column input) or variable columns?

  You wrote:
	abc   defghi j klmnop  qrst uvw x y z
	a bc defg hij   lmnop qrst  uvw x yz
	ab c defg hi j klmn op qrst u vwxyz


  Let's say there is a col1 (a string).  What did you want in col1 for
each of these rows?  abc, a, ab?  or a,a,a?

  May I suggest that you look into AWK's substring function, substr().
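
  If fixed character positions are what is wanted (an assumption; the
question above is still open), a substr()-based filter might look like the
sketch below.  colmatch and its column=text,column=text argument are invented
for this illustration, and a POSIX awk (nawk or later, for -v) is assumed.
The comparison is a plain string test, so no regular expression quoting is
needed, but the text cannot contain a comma, and awk will interpret backslash
escapes in it.

```shell
#!/bin/sh
# colmatch (hypothetical helper): the first argument is a comma-separated
# list of column=text tests; any remaining arguments are input files
# (standard input is read when none are given).
colmatch() {
    spec=$1
    shift
    awk -v spec="$spec" '
    BEGIN {
        # Split "1=a,19=nop" into parallel arrays of columns and strings.
        n = split(spec, parts, ",")
        for (i = 1; i <= n; i++) {
            eq = index(parts[i], "=")
            col[i] = substr(parts[i], 1, eq - 1) + 0
            val[i] = substr(parts[i], eq + 1)
        }
    }
    {
        # Skip the record as soon as one column test fails.
        for (i = 1; i <= n; i++)
            if (substr($0, col[i], length(val[i])) != val[i])
                next
        print
    }' "$@"
}

# Example call (database-filename is a placeholder):
#   colmatch '1=a,19=nop,29=v,34=y' database-filename
```

  Because the file is read once by a single awk process, this avoids starting
new commands for every record, which is usually what makes a shell loop slow.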
======================================================================
===                      Harold Tomlinson                          ===
==             Computing and Communications Services                ==
=                        YORK UNIVERSITY                             =
=                  haroldt@paralandra.yorku.ca                       =
=                      416- 736-5257-33802                           =
======================================================================

campbell@lotus.com (Jim Campbell) (03/09/91)

Hi,

I am posting this response to the net because "marst2" is unknown
any way I try it ......

In article <354@marst2> you write:
>I want to do a quick generic search of a rather large data base and I want
>to do the search based on certain record cols. If I have records that look
>like this:
>
>abc   defghi j klmnop  qrst uvw x y z
>a bc defg hij   lmnop qrst  uvw x yz
>ab c defg hi j klmn op qrst u vwxyz
>
>And say I want to find only the records with (a) in col1, (nop) in col19-21,
>(v) in col29 and (y) in col34.
>
>I want to do this in a script where the record cols and params are left to
>to the users choice.
>
>I tried doing this with:
>
>--
>--
>while true
>do
>  a=`line <file` || break
>  do cuts of $a and compare to given params.....
>

You don't need awk, or any Scuds, to do this....
Try this:
	grep '^a.\{17\}nop.\{7\}v.\{4\}y' file

Note that none of the lines you gave fits the column specifications
you gave.  I modified the second line by removing one of the spaces
preceding the "u", which made the second line conform to your
specification, and this grep command matched that line.
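
One way to check this is to print what each record actually holds at the
columns in question.  A small sketch, assuming a POSIX awk; showcols is just
an illustrative name:

```shell
#!/bin/sh
# showcols (hypothetical name): print columns 1, 19-21, 29 and 34 of each
# input line, bracketed so that blanks are visible.
showcols() {
    awk '{ printf "[%s] [%s] [%s] [%s]\n",
                  substr($0, 1, 1), substr($0, 19, 3),
                  substr($0, 29, 1), substr($0, 34, 1) }'
}

# The three records as originally posted.
showcols <<'EOF'
abc   defghi j klmnop  qrst uvw x y z
a bc defg hij   lmnop qrst  uvw x yz
ab c defg hi j klmn op qrst u vwxyz
EOF
# prints:
# [a] [nop] [u] [ ]
# [a] [nop] [u] [ ]
# [a] [n o] [u] [y]
```

In every record column 29 holds "u" rather than "v", so none of the three
can match as posted.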

Here is my input file:

1234567890123456789012345678901234567
abc   defghi j klmnop  qrst uvw x y z
a bc defg hij   lmnop qrst uvw x yz
ab c defg hi j klmn op qrst u vwxyz
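
For anyone who wants to rerun the check, here is a self-contained sketch with
the input above fed in through a here-document (the pattern variable is only
a convenience, not part of the original command):

```shell
#!/bin/sh
# a in col 1, any 17 characters, nop in cols 19-21, any 7 characters,
# v in col 29, any 4 characters, y in col 34.
pattern='^a.\{17\}nop.\{7\}v.\{4\}y'

grep "$pattern" <<'EOF'
1234567890123456789012345678901234567
abc   defghi j klmnop  qrst uvw x y z
a bc defg hij   lmnop qrst uvw x yz
ab c defg hi j klmn op qrst u vwxyz
EOF
# prints only the modified second record:
# a bc defg hij   lmnop qrst uvw x yz
```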

Bonne chance!
--
Jim Campbell, Lotus Development Corporation | harvard!ima   \ 
1 Rogers St., Cambridge, MA 02142           |                >!lotus!campbell
617/693-5652                                |   uunet       /