neil@ms.uky.edu (Neil Greene) (04/10/91)
Any sed gurus who would like to explain how to accomplish the following? I have not mastered the art of sed or awk.

I have a file that contains drug names, and next to each drug name is the drug group.

> Dipyrone Analgesic
> Nefopam Analgesic
> Thiosalicylic Acid Analgesic
> Xylazine Analgesic
> Chloramphenicol Antibiotic

I need a shell script that will read from another (ASCII) data file, find an occurrence of a DRUG_NAME, write the line to another (ASCII) file, and append the appropriate DRUG_TYPE to the new line.

# line with drug name in it
xxxx 01/02/90 xxxxxx xxxxx xxx x xxxxxxx Dipyrone .... xxxx xxxxx

# rewrite new line to new ASCII file
xxxx 01/02/90 xxxxxx xxxxx xxx x xxxxxxx Dipyrone .... xxxx xxxxx Analgesic

--
Neil Greene --- University of Kentucky Mathematics and Sciences
University of Kentucky Computing Center
neil@graphlab.cc.uky.edu [NeXT Attachments]
jik@athena.mit.edu (Jonathan I. Kamens) (04/10/91)
In article <neil.671228747@s.ms.uky.edu>, neil@ms.uky.edu (Neil Greene) writes:
|> I need a shell script that will read from another (ASCII) data file, find an
|> occurrence of a DRUG_NAME, write the line to another (ASCII) file and append
|> the appropriate DRUG_TYPE to the new line.
|>
|> # line with drug name in it
|> xxxx 01/02/90 xxxxxx xxxxx xxx x xxxxxxx Dipyrone .... xxxx xxxxx
|>
|> # rewrite new line to new ASCII file
|> xxxx 01/02/90 xxxxxx xxxxx xxx x xxxxxxx Dipyrone .... xxxx xxxxx Analgesic

You could do this in awk by reading in the first data file and putting its contents into an associative array: for each line in the drug type data file, use $1 as the index in the array and $2 as the value to associate with that index. Then read in the second data file, look up the drugs, and append the type to the end of the line. Something like this:

    (cat drug-list; echo "END_OF_DRUG_LIST"; cat other-data-file) | awk '
    BEGIN {drug_list = 1}                         # start out reading the drug list
    /END_OF_DRUG_LIST/ {drug_list = 0; next}      # sentinel: the data file follows
    drug_list != 0 {drugtypes[$1] = $2; next}     # name -> group
    {print $0 " " drugtypes[$8]}'                 # drug name is expected in field 8

Personally, I would do this in perl, and write one function to read in the data file and build the array, and another function to read in the other file and do the output once the array has been built.

--
Jonathan Kamens                  USnail:
MIT Project Athena               11 Ashford Terrace
jik@Athena.MIT.EDU               Allston, MA 02134
Office: 617-253-8085             Home: 617-782-0710
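A variant of the awk approach above can be sketched without the sentinel line and without the fixed $8. This is only an illustration under stated assumptions: a POSIX awk, a drug list in which the group is the last word on each line (so a two-word name such as "Thiosalicylic Acid" survives), single spaces between words, and made-up file names and sample records (drugs.types, records.dat, records.out).

```shell
#!/bin/sh
# Sketch only: file names and sample records are invented for illustration.

cat > drugs.types <<'EOF'
Dipyrone Analgesic
Nefopam Analgesic
Thiosalicylic Acid Analgesic
Xylazine Analgesic
Chloramphenicol Antibiotic
EOF

cat > records.dat <<'EOF'
r001 01/02/90 aaa bbb ccc d eeeeeee Dipyrone ffff ggggg
r002 01/03/90 aaa bbb ccc d eeeeeee Thiosalicylic Acid ffff ggggg
r003 01/04/90 aaa bbb ccc d eeeeeee Aspirin ffff ggggg
r004 01/05/90 aaa bbb ccc d eeeeeee Chloramphenicol ffff ggggg
EOF

awk '
    # First file (NR == FNR): last field is the group, the rest is the name.
    NR == FNR {
        if (NF < 2) next                        # skip blank or malformed lines
        group = $NF
        name = $0
        sub(/[ \t]+[^ \t]+[ \t]*$/, "", name)   # strip the trailing group word
        sub(/^[ \t]+/, "", name)                # strip leading blanks
        types[name] = group
        next
    }
    # Second file: print the line plus the group of the first name found in it.
    {
        for (name in types)
            if (index(" " $0 " ", " " name " ")) {
                print $0 " " types[name]
                next
            }
    }
' drugs.types records.dat > records.out

cat records.out
```

Records that mention no known drug (the Aspirin line here) are not copied, which matches the "find an occurrence, write the line" wording of the request. The NR == FNR test assumes the drug list is not empty; with an empty list awk would treat the data file as the list.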
colemanm@cheops.qld.tne.oz.au (Mark Coleman) (04/11/91)
neil@ms.uky.edu (Neil Greene) writes:

>Any sed gurus who would like to explain how to accomplish the following? I
>have not mastered the art of sed or awk.

># line with drug name in it
>xxxx 01/02/90 xxxxxx xxxxx xxx x xxxxxxx Dipyrone .... xxxx xxxxx

># rewrite new line to new ASCII file
>xxxx 01/02/90 xxxxxx xxxxx xxx x xxxxxxx Dipyrone .... xxxx xxxxx Analgesic

>--
>Neil Greene --- University of Kentucky Mathematics and Sciences
> University of Kentucky Computing Center
>neil@graphlab.cc.uky.edu [NeXT Attachments]

I've been wanting to suss out how to do this as well. Looks like just the right thing to have in your 'GENERIC SCRIPT' directory, since this can apply to all sorts of applications for table lookup.

TaRuntTaRa.....Markc.....( colemanm@cheops.qld.tne.oz.au )
" I wouldn't have the foggiest what my employers think I only work here ?!? "
lewis@tramp.Colorado.EDU (LEWIS WILLIAM M JR) (04/11/91)
Most AT&T versions come with 'join', a tool designed for just this purpose.
tchrist@convex.COM (Tom Christiansen) (04/12/91)
From the keyboard of neil@ms.uky.edu (Neil Greene):
:Any sed gurus who would like to explain how to accomplish the following? I
:have not mastered the art of sed or awk.
:
:I have a file that contains drug names and next to the drug name is the drug
:group.
:
:> Dipyrone Analgesic
:> Nefopam Analgesic
:> Thiosalicylic Acid Analgesic
:> Xylazine Analgesic
:> Chloramphenicol Antibiotic
:
:I need a shell script that will read from another (ASCII) data file, find an
:occurrence of a DRUG_NAME, write the line to another (ASCII) file and append
:the appropriate DRUG_TYPE to the new line.
:
:# line with drug name in it
:xxxx 01/02/90 xxxxxx xxxxx xxx x xxxxxxx Dipyrone .... xxxx xxxxx
:
:# rewrite new line to new ASCII file
:xxxx 01/02/90 xxxxxx xxxxx xxx x xxxxxxx Dipyrone .... xxxx xxxxx Analgesic

Here's a simple-minded perl script to do this. It reads from "drugs.types" to load the table, then reads stdin and writes stdout according to your spec:

    open (TYPES, "drugs.types") || die "can't open drugs.types: $!";
    while (<TYPES>) {
        ($drug, $type) = split;     # first word is the name, second the group
        $types{$drug} = $type;
    }

    while (<>) {
        chop;
        print;
        study;  # compile pattern space for speed
        foreach $name (keys %types) {
            if (/\b$name\b/) {
                print ' ', $types{$name};
                last;
            }
        }
        print "\n";
    }

No checking is done on the validity of the input in the TYPES file. This would also be a bit slow if you had a big table, because of all the re_comp()s that get called.

A faster, albeit less obvious, way to do this would be to use an eval. This makes the patterns look like a bunch of constant strings, which, when combined with the "study" statement, does Boyer-Moore (B-M) one better and really blazes. Another possible speed optimization would be to make the if's into a cascading if/elsif block, which would get internalized into one big switch statement, and perl would jump directly to the right case.

    open (TYPES, "drugs.types") || die "can't open drugs.types: $!";
    while (<TYPES>) {
        ($drug, $type) = split;     # first word is the name, second the group
        $types{$drug} = $type;
    }

    # Build the text of the main loop, one literal pattern per drug name.
    $code = <<EO_CODE;
    while (<>) {
        chop;
        print;
    EO_CODE

    for $name (keys %types) {
        $code .= <<EO_CODE;
        if (/\\b$name\\b/) {
            print ' ', \$types{"$name"}, "\n";
            next;
        }
    EO_CODE
    }

    $code .= <<EO_CODE;
        print "\n";
    }
    EO_CODE

    print $code;    # show the generated program
    eval $code;
    die $@ if $@;

--tom
jimr@hplsdv7.COS.HP.COM (Jim Rogers) (04/12/91)
You can do the whole job very simply using the "join" command. This command is designed to merge records of two files on the value of "common" fields (such as drug name). See the "join" man page for details.

Jim Rogers
Hewlett-Packard Company
Colorado Springs, Colorado
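For readers who want to try the join suggestion, here is a minimal sketch. It rests on several assumptions that are not in the original posts: the drug name is always the eighth whitespace-separated field of a ten-field record, every drug name is a single word (a name like "Thiosalicylic Acid" would split into two fields and not match), and the file names and sample records are invented. join also needs both inputs sorted on the join field, so the output comes back in drug-name order rather than in the original record order.

```shell
#!/bin/sh
# Sketch only: file names and sample records are invented for illustration.
LC_ALL=C; export LC_ALL     # sort and join must agree on collation order

cat > drugs.types <<'EOF'
Dipyrone Analgesic
Nefopam Analgesic
Xylazine Analgesic
Chloramphenicol Antibiotic
EOF

cat > records.dat <<'EOF'
r001 01/02/90 aaa bbb ccc d eeeeeee Dipyrone ffff ggggg
r002 01/03/90 aaa bbb ccc d eeeeeee Xylazine ffff ggggg
r003 01/04/90 aaa bbb ccc d eeeeeee Chloramphenicol ffff ggggg
EOF

# Sort each file on the field that will be joined.
sort -k 1,1 drugs.types > drugs.sorted
sort -k 8,8 records.dat > records.sorted

# Match field 8 of the records against field 1 of the drug list.
# -o keeps the record's ten fields in their original order and appends
# the group (field 2 of the drug list) at the end.
join -1 8 -2 1 \
     -o 1.1,1.2,1.3,1.4,1.5,1.6,1.7,1.8,1.9,1.10,2.2 \
     records.sorted drugs.sorted > joined.out

cat joined.out
```

Without the -o list, join prints the join field first and then the remaining fields of each file, which would move the drug name to the front of every line.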
jde@uwbln.uucp (Jutta Degener) (04/14/91)
Neil Greene writes:
> I have a file that contains drug names and next to the drug name is the drug
> group.
>
> > Dipyrone Analgesic
> > [...]
>
> I need a shell script that will read from another (ASCII) data file,
> find an occurrence of a DRUG_NAME, write the line to another (ASCII) file
> and append the appropriate DRUG_TYPE to the new line.

`write all occurrences of <name> into a file, followed by <group>' is something sed can do well. A statement like

    "/name/s/$/ <group>/w file"

or

    "/name/s/$/ <group>/p"

for stdout works fine.

Your input file with the drug pairs is almost a `program' you could feed to sed. One would want to add a few slashes, but that, again, could be accomplished using sed, for example with

    s/\([-a-zA-Z0-9]*\)[ ]*\([-a-zA-Z0-9]*\)/-e \/\1\/s\/$\/\ \2\/p/p

which turns

    foo bar

into

    -e /foo/s/$/ bar/p

if I'm not mistaken (quote until it works).

Now you have three possibilities left:

(a) forget it, get perl

(b) figure out how to get those darn spaces across and end up using something like

    eval sed -n `sed -n -e "s/\([-a-zA-Z0-9]*\)[ ]*\([-a-zA-Z0-9]*\)/ -e \'\/\1\/s\/$\/\ \2\/p'/p" < $1`

which actually seems to work (joined into one line) from sh, given the drug file as first argument

(c) get bitten by the command line length limit, use a tempfile for the sed program or go back to step (a)

Still waiting for the awk solution,
Jutta
--
#include <std/disclaimer.h>
Jutta Degener
jutta@tub.cs.tu-berlin.de (owl:hugs.)
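Option (c), keeping the generated sed program in a file, can be sketched as below. This is an illustration rather than the poster's exact commands: it takes the group to be the last word on each drug-list line, assumes words are separated by spaces only, assumes the names contain no slashes or regular-expression metacharacters, and uses invented file names (drugs.types, records.dat, lookup.sed, records.out).

```shell
#!/bin/sh
# Sketch only: file names and sample records are invented for illustration.

cat > drugs.types <<'EOF'
Dipyrone Analgesic
Thiosalicylic Acid Analgesic
Chloramphenicol Antibiotic
EOF

cat > records.dat <<'EOF'
r001 01/02/90 aaa bbb ccc d eeeeeee Dipyrone ffff ggggg
r002 01/03/90 aaa bbb ccc d eeeeeee Thiosalicylic Acid ffff ggggg
r003 01/04/90 aaa bbb ccc d eeeeeee Aspirin ffff ggggg
EOF

# Step 1: turn every "name group" line into the sed command
#     /name/s/$/ group/p
# The outer s command uses | as its delimiter so the slashes of the
# generated command need no escaping.
sed -n 's|^\(.*[^ ]\)  *\([^ ][^ ]*\) *$|/\1/s/$/ \2/p|p' drugs.types > lookup.sed

# Step 2: run the generated program; -n means only matched lines are printed.
sed -n -f lookup.sed records.dat > records.out

cat lookup.sed
cat records.out
```

Because the program lives in lookup.sed, the command line length limit no longer applies. A record that mentions two listed drugs would be printed twice, since every matching command appends to the line and prints it.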