rohit@dmdev.UUCP (Rohit Mehrotra) (02/09/91)
Hi,

I want to convert a list of keywords (about 150) into upper-case wherever they occur in a text file. For example, wherever "update" or "Update" occurs as a full word, change it to "UPDATE".

Is there a Perl, sed, or awk script out there that would do this for me?

Please EMAIL me your responses, as my news software is screwed up a little bit these days, and I will post a summary.

thanks
rohit

EMAIL: rohit%dmdev@uunet.uu.net or uunet!dmdev!rohit
--
Rohit Mehrotra
Fleet Credit Corporation
8325 NW 53rd St, Miami, Fl 33166.
E-MAIL Address uunet!dmdev!rohit
VOICE 1-(305)-477-0390 Ext 469
tchrist@convex.COM (Tom Christiansen) (02/10/91)
> I want to convert a list of keywords (about 150) into upper-case
> wherever they occur in a text file, for example: wherever, say, "update"
> or "Update" occurs as a full word, change it to "UPDATE".
> Is there a Perl, sed, or awk script out there that would do this for me?

Here's my solution:

    #!/usr/bin/perl
    # usage: capwords wordlist [files ...]
    $WORDS = shift || die "usage: $0 wordlist [files ...]\n";
    # one-argument open takes the filename from $WORDS; the parentheses
    # keep || from binding to the bareword instead of to open's result
    open(WORDS) || die "can't open $WORDS: $!";

    # build the text of a loop holding one substitution per keyword
    $code = "while (<>) {\n    study;\n";
    while (<WORDS>) {
        chomp;
        s/(\W)/\\$1/g;                  # quote regex metacharacters
        ($lhs = $_) =~ tr/A-Z/a-z/;
        ($rhs = $_) =~ tr/a-z/A-Z/;
        $code .= "    s/\\b$lhs\\b/$rhs/gi;\n";
    }
    $code .= "    print;\n}\n";
    #print STDERR $code;

    eval $code;                         # compile and run the generated loop
    die $@ if $@;

Whether the study helps you or not depends on the word list. I ran mine on perl's reserved words (plus fuzz) on its man page:

    sed -ne 's/.*strEQ(d,"\([^"]*\).*/\1/p' perl/src/toke.c > words   # ~200
    time perl capwords words /usr/man/man1/perl.1 > capperl

It took me 20 user seconds to run this on a C-220. It takes ~10 more without the study. I doubt you'll get a sed/awk solution that approaches this speed. But I did try...

I attempted to construct an equivalent sh+sed program, but didn't know how to express s/\bfoo\b/FOO/g in sed -- the \b escaped me. So I decided to make do with s/foo/FOO/g, but had problems with built-in limits on the total number of sed commands. When I reduced this to ~190 substitutions instead of 200, it ran, but it took more than twice as long as the perl version (just to run the dynamic sed script, not to build it with sh and paste and tr and awk).

Then I remembered that the perl version was doing s/foo/FOO/gi, so I changed the sed to things like s/[Ff][Oo][Oo]/FOO/g and found I'd now exceeded sed's limit on the total amount of command text. When I cut the number of words in half (down to <100) and ran it, it took 4x the perl time, to do less than half the work. As we're now approaching an order of magnitude difference, I gave up on sed.
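
The case-insensitive sed commands described above can be generated mechanically. The following is only a rough sketch of that expansion, not the sh/paste/tr/awk pipeline actually used (which was not posted). The sub name sed_command_for and the two-word list are made up for illustration, and keywords are assumed to contain only letters, digits, and underscores. Each letter turns into four characters of pattern, which hints at how a 200-word list could outgrow a limit on command text.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Turn one keyword into a sed substitution that matches any mix of case:
# "foo" becomes s/[Ff][Oo][Oo]/FOO/g. Assumes the keyword holds only
# letters, digits, and underscores, so nothing needs escaping for sed.
sub sed_command_for {
    my ($word) = @_;
    die "unsupported keyword: $word\n" unless $word =~ /\A[A-Za-z0-9_]+\z/;
    my $pattern = '';
    for my $ch (split //, $word) {
        # Letters become a two-character bracket expression; digits and
        # underscores are copied through unchanged.
        $pattern .= $ch =~ /[A-Za-z]/ ? '[' . uc($ch) . lc($ch) . ']' : $ch;
    }
    return "s/$pattern/" . uc($word) . "/g";
}

# Hypothetical two-word list, for illustration only.
print sed_command_for($_), "\n" for qw(foo update);
# prints:
#   s/[Ff][Oo][Oo]/FOO/g
#   s/[Uu][Pp][Dd][Aa][Tt][Ee]/UPDATE/g
```

Like the sed attempt described above, these commands have no word-boundary check, so they would also upper-case "update" inside "updates".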
One could probably construct an awk script to do it, but it would likely run much longer still. In fact, I'll even bet that you'd need a highly tuned C program to get this fast. This might be one of the cases where a C program wouldn't be any faster.

--tom
--
"All things are possible, but not all expedient." (in life, UNIX, and perl)
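
To make the generated-code idea in the Perl solution above more concrete, here is a small sketch of the same technique on a fixed word list. It differs from the posted script in a few ways, all of them assumptions made for illustration: the generated text is an anonymous sub rather than a whole while (<>) loop, so it can be tried on a sample string; quotemeta stands in for the hand-written s/(\W)/\\$1/g; and study is left out, since perlfunc documents it as a no-op from Perl 5.16 on. The names build_code, @keywords, and $filter are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build the source text of an anonymous sub holding one case-insensitive,
# whole-word substitution per keyword.
sub build_code {
    my @words = @_;
    my $code = "sub {\n    my (\$line) = \@_;\n";
    for my $word (@words) {
        my $lhs = quotemeta lc $word;    # pattern side, lower-cased
        my $rhs = quotemeta uc $word;    # replacement side, upper-cased
        $code .= "    \$line =~ s/\\b$lhs\\b/$rhs/gi;\n";
    }
    $code .= "    return \$line;\n}\n";
    return $code;
}

my @keywords = qw(update select);        # hypothetical word list
my $code     = build_code(@keywords);
my $filter   = eval $code or die $@;     # compile the generated text once

print $code;                             # show what was generated
print $filter->("Update the row, then select it; updates stay put.\n");
# last line printed: UPDATE the row, then SELECT it; updates stay put.
```

Because every pattern is spelled out as a literal in the generated text, perl compiles each one once when the eval runs instead of re-interpolating a variable pattern for every input line.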