[comp.unix.shell] searching for data

cs342a37@cs.iastate.edu (Class login) (04/13/91)

I am a newcomer to writing shell scripts. I have the following problem:

I have a data file that I use as a key for searching my Master file. Both files are text files. Each line in the Master file is a record. Both files are sorted by the key. I would like to read a line from the data file to get the key, then scan the Master file for the line that contains that key, and append that line to a file.

I have the following script written:

cat datafile | ( while read line; do fgrep "$line" masterfile >> outputfile ; done )

This, however, is very slow, as I have about 2000 keys in my data file and about 10000 records in my master file, and for each key I have to scan all 10000 lines.

Can I write a shell script to do the following:
read a line from masterfile
while more keys to read do
  read a line from data file
  while (key from masterfile < line from data file)
    read line from masterfile
  (end while)
  if line from masterfile contains key
    append to output file
  else
    append empty line to output file
  (endif)
(end while)
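Written out as an actual shell function, that merge loop might look something like the sketch below. It is only a sketch: it assumes the key is the first space-separated field of each record, that both files are sorted on it, and that keys are plain words (the `expr` string comparison is not numeric-safe); `merge_sorted` and the file names are made up.

```shell
# merge_sorted keyfile masterfile
# For each key in keyfile, print the matching masterfile record,
# or an empty line if there is none.  Both files must be sorted
# on the key, which is taken to be the first field of each record.
merge_sorted() {
    exec 3< "$2"                  # masterfile on fd 3
    read mline <&3 || mline=
    while read key; do
        mkey=${mline%% *}
        # Advance masterfile while its key sorts before the wanted key
        while [ -n "$mline" ] && [ "$(expr "$mkey" "<" "$key")" = 1 ]; do
            if read mline <&3; then mkey=${mline%% *}; else mline=; fi
        done
        if [ "$mkey" = "$key" ]; then
            printf '%s\n' "$mline"    # matching record
        else
            printf '\n'               # no match: empty line
        fi
    done < "$1"
    exec 3<&-
}
```

Because each file is read once, this makes a single pass over both files instead of one scan of the master file per key.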

krs@uts.amdahl.com (Kris Stephens [Hail Eris!]) (04/17/91)

In article <cs342a37.671481248@zaphod> cs342a37@cs.iastate.edu (Class login) writes:
>I am a newcomer to writing shell scripts. I have the following problem:
>
>I have a data file that I use as a key for searching my Master file. Both files are text files. Each line in the Master file is a record. Both files are sorted by the key. I would like to read a line from the data file to get the key, then scan the Master file for the line that contains that key, and append that line to a file.
>
>I have the following script written:
>
>cat datafile | ( while read line; do fgrep "$line" masterfile >> outputfile ; done )
>
>This however, is very slow as I have about 2000 lines of key in my data file and about 10000 lines of records in my master file, and for each key I have to scan about 10000 lines.
>
>Can I write a shell script to do the following:
>read a line from masterfile
>while more keys to read do
>  read a line from data file
>  while (key from masterfile < line from data file)
>    read line from masterfile
>  (end while)
>  if line from masterfile contains key
>    append to output file
>  else
>    append empty line to output file
>  (endif)
>(end while)

Here's an awk script that handles it, assuming that your awk has enough
room to store all the keys (if not, send some mail to me including this
article and I'll offer an alternative).

### begin merger.awk ###
#
# call as     awk -f merger.awk key=datafile datafile masterfile
#

# Read in keys
FILENAME == key {
	keydata[$1] = $0
	next
}

# Print key info for each line from the masterfile
{
	print keydata[$1]	# Note: blank line if undefined
}
### end merger.awk ###

If the key data should be merged as lines following the possibly-keyed
data in the masterfile, add a

	print			# masterfile record

line right before the

	print keydata[$1]	# Note: blank line if undefined

line in the script.
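A toy run makes the output shape concrete (file contents here are made up): one line is printed per masterfile record, blank when the record's key never appeared in the datafile.

```shell
cd "$(mktemp -d)"
cat > merger.awk <<'EOF'
# Read in keys
FILENAME == key {
	keydata[$1] = $0
	next
}
# Print key info for each line from the masterfile
{
	print keydata[$1]	# Note: blank line if undefined
}
EOF
printf 'apple A\ndate D\n' > datafile
printf 'apple 1\nbanana 2\ndate 4\n' > masterfile
out=$(awk -f merger.awk key=datafile datafile masterfile)
printf '%s\n' "$out"
```

The `key=datafile` assignment is processed before awk opens datafile, so the `FILENAME == key` rule fires only while the keys are being loaded.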

...Kris
-- 
Kristopher Stephens, | (408-746-6047) | krs@uts.amdahl.com | KC6DFS
Amdahl Corporation   |                |                    |
     [The opinions expressed above are mine, solely, and do not    ]
     [necessarily reflect the opinions or policies of Amdahl Corp. ]

bengtl@maths.lth.se (Bengt Larsson) (04/18/91)

In article <18dQ01aK5bwe00@amdahl.uts.amdahl.com> krs@amdahl.uts.amdahl.com 
(Kris Stephens [Hail Eris!]) writes:
>In article <cs342a37.671481248@zaphod> cs342a37@cs.iastate.edu 
>(Class login) writes:
>>I am a new comer in writing shell scripts. I have the following problem:
>>I have a data file that I use as a key for searching my Master file.
[trimmed]
>>
>>I have the following script written:
>>
>>cat datafile | ( while read line; do fgrep "$line" masterfile >>
>>outputfile ; done )
>>
>>This however, is very slow as I have about 2000 lines of key in my
>>data file and about 10000 lines of records in my master file, and for
>>each key I have to scan about 10000 lines. 

There was a suggestion using awk. My mail to the original author bounced,
and no one else has said this, so how about:

  fgrep -f datafile masterfile >> outputfile

_That_ should be faster: fgrep is optimized for exactly this, searching
for a set of fixed strings in a single pass. Of course, if I got the
problem wrong, someone will point it out :-)
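One caveat worth noting: fgrep matches each key anywhere in a record, so the key "apple" would also hit a record for "crabapple". If keys should match only at the start of a record, one workaround is to anchor them and feed them to plain grep; the sketch below assumes grep can read patterns from standard input via `-f -` (GNU grep does) and that the keys contain no regexp metacharacters.

```shell
cd "$(mktemp -d)"
printf 'apple\ndate\n' > datafile
printf 'apple 1\ncrabapple 9\ndate 4\n' > masterfile
# Substring matching: also hits "crabapple 9"
fgrep -f datafile masterfile
# Anchored matching: prefix each key with ^ and use it as a regexp
sed 's/^/^/' datafile | grep -f - masterfile
```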

Bengt Larsson.
-- 
Bengt Larsson - Dep. of Math. Statistics, Lund University, Sweden
Internet: bengtl@maths.lth.se             SUNET:    TYCHE::BENGT_L

wrp@PRC.Unisys.COM (William R. Pringle) (04/18/91)

In article <18dQ01aK5bwe00@amdahl.uts.amdahl.com> krs@amdahl.uts.amdahl.com (Kris Stephens [Hail Eris!]) writes:
>In article <cs342a37.671481248@zaphod> cs342a37@cs.iastate.edu (Class login) writes:
>>I am a newcomer to writing shell scripts. I have the following problem:
>>
>>I have a data file that I use as a key for searching my Master file.
>>Both files are text files. Each line in the Master file is a record.
>>Both files are sorted by the key.
[trimmed]
>
>Here's an awk script that handles it, assuming that your awk has enough
>room to store all the keys (if not, send some mail to me including this
>article and I'll offer an alternative).

You might also want to look at the join command.  If the keys are sorted,
and you don't have repeating keys, then you could use join to merge each
data file line with its matching master file record.
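On toy files (contents made up), assuming the key is the first field of both sorted files, that would look like:

```shell
cd "$(mktemp -d)"
printf 'apple\ncherry\ndate\n' > datafile
printf 'apple 1\nbanana 2\ndate 4\n' > masterfile
# Matching records only
join datafile masterfile
# -a 1 also keeps datafile keys with no match, so every key
# produces a line, as in the original pseudocode
join -a 1 datafile masterfile
```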

Bill Pringle
wrp@prc.unisys.com