[comp.unix.questions] shell script to...

neil@ms.uky.edu (Neil Greene) (04/10/91)

Any sed gurus who would like to explain how to accomplish the following?  I
have not mastered the art of sed or awk.

I have a file that contains drug names and next to the drug name is the drug
group.

> Dipyrone		Analgesic
> Nefopam		Analgesic
> Thiosalicylic Acid	Analgesic
> Xylazine		Analgesic
> Chloramphenicol	Antibiotic 

I need a shell script that will read from another (ascii) data file, find an
occurrence of a DRUG_NAME, write the line to another (ascii) file and append
the appropriate DRUG_TYPE to the new line.

# line with drug name in it
xxxx 01/02/90 xxxxxx xxxxx xxx x xxxxxxx Dipyrone .... xxxx xxxxx 

# rewrite new line to new ascii file
xxxx 01/02/90 xxxxxx xxxxx xxx x xxxxxxx Dipyrone .... xxxx xxxxx Analgesic

-- 
Neil Greene ---	University of Kentucky Mathematics and Sciences
		University of Kentucky Computing Center 

neil@graphlab.cc.uky.edu [NeXT Attachments]

jik@athena.mit.edu (Jonathan I. Kamens) (04/10/91)

In article <neil.671228747@s.ms.uky.edu>, neil@ms.uky.edu (Neil Greene) writes:
|> I need a shell script that will read from another (ascii) data file, find an
|> occurrence of a DRUG_NAME, write the line to another (ascii) file and append
|> the appropriate DRUG_TYPE to the new line.
|> 
|> # line with drug name in it
|> xxxx 01/02/90 xxxxxx xxxxx xxx x xxxxxxx Dipyrone .... xxxx xxxxx 
|> 
|> # rewrite new line to new ascii file
|> xxxx 01/02/90 xxxxxx xxxxx xxx x xxxxxxx Dipyrone .... xxxx xxxxx Analgesic

  You could do this in awk by reading in the first data file and putting its
contents into an associative array -- for each line in the drug type data
file, use $1 as the index in the array, and $2 as the value to associate with
that index.  Then read in the second data file and look up the drugs and
append the type to the end of the line.  Something like this:

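(cat drug-list; echo "END_OF_DRUG_LIST"; cat other-data-file) | awk '
BEGIN {drug_list = 1}
# the marker line separates the drug list from the data that follows it
/^END_OF_DRUG_LIST$/ {drug_list = 0; next}
# drug list: $1 is the name, $2 the type (so single-word names only)
drug_list != 0 {drugtypes[$1] = $2; next}
# data: the drug name is assumed to be the 8th field, as in the sample line
{print $0 " " drugtypes[$8]}'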

  Personally, I would do this in perl, and write one function to read in the
data file and build the array, and another function to read in the other file
and do the output once the array has been built.
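
A variant of the same associative-array idea, as a minimal sketch only.  It
assumes the drug list is tab-separated (so that a multi-word name such as
"Thiosalicylic Acid" stays in one piece) and it looks for the name anywhere in
the line instead of relying on field 8.  The sample data, the temporary files
and the variable names are made up for illustration.

```shell
#!/bin/sh
# Illustrative sample data; file names and contents are assumptions.
types=$(mktemp)
data=$(mktemp)
printf 'Dipyrone\t\tAnalgesic\nThiosalicylic Acid\tAnalgesic\nChloramphenicol\tAntibiotic\n' > "$types"
printf '%s\n' \
    'xxxx 01/02/90 xxxxxx xxxxx xxx x xxxxxxx Dipyrone .... xxxx xxxxx' \
    'xxxx 01/03/90 xxxxxx xxxxx xxx x xxxxxxx Aspirin .... xxxx xxxxx' \
    'xxxx 01/04/90 xxxxxx xxxxx xxx x Thiosalicylic Acid .... xxxx xxxxx' > "$data"

result=$(awk '
# Assumption: the columns of the drug list are separated by one or more tabs.
BEGIN { FS = "\t+" }
# First file (NR == FNR): remember the type of each drug name.
NR == FNR { if (NF >= 2) type[$1] = $2; next }
# Second file: print the line with the type of a known name appended.
# Lines that mention no known drug are not printed at all.
{
    for (name in type)
        if (index($0, name) > 0) { print $0 " " type[name]; next }
}' "$types" "$data")

printf '%s\n' "$result"
rm -f "$types" "$data"
```

Because index() does a plain substring search, a short name can also match
inside a longer word, and if a line mentions several drugs it is unspecified
which one is picked; both may or may not matter for the real data.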

-- 
Jonathan Kamens			              USnail:
MIT Project Athena				11 Ashford Terrace
jik@Athena.MIT.EDU				Allston, MA  02134
Office: 617-253-8085			      Home: 617-782-0710

colemanm@cheops.qld.tne.oz.au (Mark Coleman) (04/11/91)

neil@ms.uky.edu (Neil Greene) writes:

>Any sed gurus who would like to explain how to accomplish the following?  I
>have not mastered the art of sed or awk.

># line with drug name in it
>xxxx 01/02/90 xxxxxx xxxxx xxx x xxxxxxx Dipyrone .... xxxx xxxxx 

># rewrite new line to new ascii file
>xxxx 01/02/90 xxxxxx xxxxx xxx x xxxxxxx Dipyrone .... xxxx xxxxx Analgesic

>-- 
>Neil Greene ---	University of Kentucky Mathematics and Sciences
>		University of Kentucky Computing Center 

>neil@graphlab.cc.uky.edu [NeXT Attachments]


I've been wanting to suss out how to do this as well.

Looks like just the right thing to have in your 'GENERIC SCRIPT' directory,
since this can apply to all sorts of applications for table lookup.

TaRuntTaRa.....Markc.....( colemanm@cheops.qld.tne.oz.au )
" I wouldn't have the foggiest what my employers think I only work here ?!? "

lewis@tramp.Colorado.EDU (LEWIS WILLIAM M JR) (04/11/91)

Most AT&T versions come with 'join', a tool designed for just this purpose.

tchrist@convex.COM (Tom Christiansen) (04/12/91)

From the keyboard of neil@ms.uky.edu (Neil Greene):
:Any sed gurus who would like to explain how to accomplish the following?  I
:have not mastered the art of sed or awk.
:
:I have a file that contains drug names and next to the drug name is the drug
:group.
:
:> Dipyrone		Analgesic
:> Nefopam		Analgesic
:> Thiosalicylic Acid	Analgesic
:> Xylazine		Analgesic
:> Chloramphenicol	Antibiotic 
:
:I need a shell script that will read from another (ascii) data file, find an
:occurrence of a DRUG_NAME, write the line to another (ascii) file and append
:the appropriate DRUG_TYPE to the new line.
:
:# line with drug name in it
:xxxx 01/02/90 xxxxxx xxxxx xxx x xxxxxxx Dipyrone .... xxxx xxxxx 
:
:# rewrite new line to new ascii file
:xxxx 01/02/90 xxxxxx xxxxx xxx x xxxxxxx Dipyrone .... xxxx xxxxx Analgesic

Here's a simple-minded perl script to do this.  It reads from
"drugs.types" to load the table, then reads stdin and writes stdout
according to your spec:

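    open (TYPES, "drugs.types") || die "can't open drugs.types: $!";
    while (<TYPES>) {
	chomp;
	# assumes tab-separated columns, so "Thiosalicylic Acid" stays one name;
	# an explicit list is used because a bare "split;" no longer fills @_
	($name, $type) = split(/\t+/);
	$types{$name} = $type;
    }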

    while (<>) {
	chop;
	print;
	study; # compile pattern space for speed
	foreach $name (keys %types) {
	    if (/\b$name\b/) {
		print ' ', $types{$name};
		last;
	    }
	}
	print "\n";
    }

No checking is done on the input validity in the TYPES file.  This would
also be a bit slow if you had a big table because of all the re_comp()s that
get called.  A faster, albeit less obvious way to do this would be to use
an eval.  This makes it look like a bunch of constant strings, which when
combined with the "study" statement, does Boyer-Moore one better, and really
blazes.  Another possible speed optimization would be to make the if's
into a cascading if/elsif block, which would get internalized into one big
switch statement, and perl would jump directly to the right case.


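    open (TYPES, "drugs.types") || die "can't open drugs.types: $!";
    while (<TYPES>) {
	chomp;
	# assumes tab-separated columns, so "Thiosalicylic Acid" stays one name;
	# an explicit list is used because a bare "split;" no longer fills @_
	($name, $type) = split(/\t+/);
	$types{$name} = $type;
    }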

    $code = <<EO_CODE;
	while (<>) {
	    chop;
	    print;
    EO_CODE
	for $name (keys %types) {
	    $code .= <<EO_CODE;
	    if (/\\b$name\\b/) {
		print ' ', \$types{"$name"}, "\n";
		next;
	    } 
    EO_CODE
	} 
	$code .= <<EO_CODE;
	    print "\n";
	}
    EO_CODE

    print STDERR $code;	# debugging aid: show the generated code on stderr, not stdout

    eval $code;
    die $@ if $@;


--tom

jimr@hplsdv7.COS.HP.COM (Jim Rogers) (04/12/91)

You can do the whole job very simply using the "join" command.

This command is designed to merge records of two files on the value of
"common" fields (such as drug name).

See the "join" man page for details.
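
As a rough sketch of what that could look like, assuming the drug name is
always the 8th blank-separated field of the data file (as in the sample line)
and that every drug name is a single word.  The sample data, the temporary
files and the variable names are invented for illustration.

```shell
#!/bin/sh
LC_ALL=C; export LC_ALL    # sort and join must agree on the collation order

# Illustrative sample data; file names and contents are assumptions.
types=$(mktemp)
data=$(mktemp)
printf 'Dipyrone\tAnalgesic\nChloramphenicol\tAntibiotic\nXylazine\tAnalgesic\n' > "$types"
printf '%s\n' \
    'xxxx 01/02/90 xxxxxx xxxxx xxx x xxxxxxx Dipyrone .... xxxx xxxxx' \
    'xxxx 01/03/90 xxxxxx xxxxx xxx x xxxxxxx Chloramphenicol .... xxxx xxxxx' > "$data"

# join expects both inputs to be sorted on the field being joined.
sort -k 1b,1 "$types" > "$types.sorted"
sort -k 8b,8 "$data"  > "$data.sorted"

# Match field 8 of the data file against field 1 of the drug list.
result=$(join -1 8 -2 1 "$data.sorted" "$types.sorted")

printf '%s\n' "$result"
rm -f "$types" "$data" "$types.sorted" "$data.sorted"
```

By default join prints the common field first, then the remaining fields of
each file, so the drug name moves to the front of the line and the output
comes out in sorted order; the -o option can rearrange the fields if the
original layout matters.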

Jim Rogers
Hewlett-Packard Company
Colorado Springs, Colorado

jde@uwbln.uucp (Jutta Degener) (04/14/91)

Neil Greene writes:
> I have a file that contains drug names and next to the drug name is the drug
> group.
> 
> > Dipyrone		Analgesic
> > [...]
> 
> I need a shell script that will read from another (ascii) data file,
> find an occurrence of a DRUG_NAME, write the line to another (ascii) file
> and append the appropriate DRUG_TYPE to the new line.

`write all occurrences of <name> into a file, followed by <group>' is
something sed can do well.  A statement like

		"/name/s/$/ <group>/w file"
	or 	"/name/s/$/ <group>/p"		for stdout

works fine.
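
For a single hard-coded pair, the two statements can be tried out like this.
It is only a sketch; the sample lines, the temporary files and the variable
names are invented for illustration.

```shell
#!/bin/sh
# Illustrative sample data; file names and contents are assumptions.
data=$(mktemp)
out=$(mktemp)
printf '%s\n' \
    'xxxx 01/02/90 xxxxxx xxxxx xxx x xxxxxxx Dipyrone .... xxxx xxxxx' \
    'xxxx 01/03/90 xxxxxx xxxxx xxx x xxxxxxx Aspirin .... xxxx xxxxx' > "$data"

# p form: matching lines go to stdout with the group appended.
result=$(sed -n -e '/Dipyrone/s/$/ Analgesic/p' "$data")

# w form: the same lines are written to the file named after the w.
sed -n -e "/Dipyrone/s/\$/ Analgesic/w $out" "$data"
written=$(cat "$out")

printf '%s\n' "$result"
rm -f "$data" "$out"
```

The -n keeps sed from printing lines that mention no drug at all.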

Your input file with the drug pairs is almost a `program' you could feed
to sed.  One would want to add a few slashes, but that, again, could be
accomplished using sed, for example with

	s/\([-a-zA-Z0-9]*\)[ 	]*\([-a-zA-Z0-9]*\)/-e \/\1\/s\/$\/\ \2\/p/p

which turns 
	 foo  bar
into
	-e /foo/s/$/ bar/p

if I'm not mistaken (quote until it works).
Now you have three possibilities left: 

(a) forget it, get perl
(b) figure out how to get those darn spaces across and end up using
    something like

	eval sed -n `sed -n -e "s/\([-a-zA-Z0-9]*\)[ 	]*\([-a-zA-Z0-9]*\)/
		-e \'\/\1\/s\/$\/\ \2\/p'/p" < $1`

    which actually seems to work (joined into one line) from sh,
    given the drug file as first argument
(c) get bitten by the command line length limit, use a tempfile for
    the sed program or go back to step (a)
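
The tempfile route in (c) might look roughly like the sketch below.  It
assumes a tab-separated drug file (so names may contain spaces) and names free
of characters that are special to sed, such as "/"; the sample data and all
file and variable names are invented for illustration.

```shell
#!/bin/sh
# Illustrative sample data; file names and contents are assumptions.
types=$(mktemp)
data=$(mktemp)
prog=$(mktemp)
printf 'Dipyrone\t\tAnalgesic\nThiosalicylic Acid\tAnalgesic\nChloramphenicol\tAntibiotic\n' > "$types"
printf '%s\n' \
    'xxxx 01/02/90 xxxxxx xxxxx xxx x xxxxxxx Dipyrone .... xxxx xxxxx' \
    'xxxx 01/03/90 xxxxxx xxxxx xxx x xxxxxxx Aspirin .... xxxx xxxxx' \
    'xxxx 01/04/90 xxxxxx xxxxx xxx x Thiosalicylic Acid .... xxxx xxxxx' > "$data"

tab=$(printf '\t')

# Turn each "name<TAB>group" line into the sed command "/name/s/$/ group/p".
# Using | as the delimiter of the outer s command avoids escaping the slashes.
sed -e "s|^\([^$tab]*\)$tab$tab*\(.*\)\$|/\1/s/\$/ \2/p|" "$types" > "$prog"

# Run the generated program; -n prints only the lines that matched a drug.
result=$(sed -n -f "$prog" "$data")

printf '%s\n' "$result"
rm -f "$types" "$data" "$prog"
```

Since the program lives in a file, the command line length limit no longer
applies.  A line that mentions two drugs is printed once per matching rule,
which may or may not be what is wanted.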

Still waiting for the awk solution,
				Jutta
--
#include <std/disclaimer.h> Jutta Degener jutta@tub.cs.tu-berlin.de (owl:hugs.)