[comp.unix.questions] merging 2 files

jharkins@sagpd1.UUCP (Jim Harkins) (05/04/90)

I need to change a lot of words with mixed upper and lower case letters into
words of all lower case letters, from there I'm going to make a sed script
to actually modify the words in our source.  So how do I make my list?
For example, I want to convert the list

FooBar			
blaTZ
GRMblE
WhEe

into

FooBar	foobar			
blaTZ	blatz
GRMblE	grmble
WhEe	whee

(from this second list it's trivial to create lines of sed commands like
'/FooBar/foobar/g', and there are around 800 words in my list)
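Note that sed's substitute command needs a leading s, so the command would be s/FooBar/foobar/g. As a rough illustration of that "trivial" step, here is a minimal C sketch (not from anyone in the thread) of a hypothetical helper, pair_to_sed, that turns one "word<TAB>lowercase" line into such a command. It assumes the words are plain identifiers: it does no escaping of sed metacharacters such as / or &, and it does no word-boundary matching.

```c
#include <stdio.h>
#include <string.h>

/*
 * pair_to_sed: hypothetical helper (not from the thread).
 * Turns "FooBar\tfoobar" into "s/FooBar/foobar/g".
 * The old word is everything before the first tab; the new word runs
 * from after that tab up to the next tab or newline (so trailing tabs
 * are ignored).  No escaping of sed metacharacters is done.
 * Returns 0 on success, -1 if the line has no tab or out is too small.
 */
int pair_to_sed(const char *line, char *out, size_t outsz)
{
	const char *tab = strchr(line, '\t');
	size_t oldlen, newlen;
	int n;

	if (tab == NULL)
		return -1;
	oldlen = (size_t)(tab - line);
	newlen = strcspn(tab + 1, "\t\n");
	n = snprintf(out, outsz, "s/%.*s/%.*s/g",
	             (int)oldlen, line, (int)newlen, tab + 1);
	if (n < 0 || (size_t)n >= outsz)
		return -1;
	return 0;
}
```

Running each line of the two-column list through it and writing the results to a file gives a script suitable for sed -f.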

Right now I have 2 files, one with the upper/lower case names, the second
with just lower case names.  I think I've been going down the wrong path
here though.  I'm now thinking of using sed directly but I'm no sed wizard.
Join may also do the job but I'm having trouble deciphering the man page
and none of my experiments to date have remotely resembled what I'm after.

So, has anyone got any advice (outside of having a flunky use vi :-)?
Thanks in advance.


-- 
jim		jharkins@sagpd1

My new investment plan GUARANTEES a 50% rate of return!  Just mail me $10,000
and I promise you'll get $5,000 back.

merlyn@iwarp.intel.com (Randal Schwartz) (05/04/90)

In article <757@sagpd1.UUCP>, jharkins@sagpd1 (Jim Harkins) writes:
| I need to change a lot of words with mixed upper and lower case letters into
| words of all lower case letters, from there I'm going to make a sed script
| to actually modify the words in our source.  So how do I make my list?
| For example, I want to convert the list
| 
| FooBar			
| blaTZ
| GRMblE
| WhEe
| 
| into
| 
| FooBar	foobar			
| blaTZ	blatz
| GRMblE	grmble
| WhEe	whee
| 
| (from this second list it's trivial to create lines of sed commands like
| '/FooBar/foobar/g', and there are around 800 words in my list)
| 
| Right now I have 2 files, one with the upper/lower case names, the second
| with just lower case names.  I think I've been going down the wrong path
| here though.  I'm now thinking of using sed directly but I'm no sed wizard.
| Join may also do the job but I'm having trouble deciphering the man page
| and none of my experiments to date have remotely resembled what I'm after.
| 
| So, has anyone got any advice (outside of having a flunky use vi :-)?
| Thanks in advance.

An all-in-one Perl solution...

################################################## snip snip
#!/usr/local/bin/perl

@names = split(/\n/, <<END_OF_NAMES);
FooBar
blaTZ
GRMblE
WhEe
END_OF_NAMES

$cmd = "study;\n";

for $name (@names) {
	($lcname = $name) =~ y/A-Z/a-z/;
	$cmd .= "s/\\b$name\\b/$lcname/g;\n";
}

while (<>) {
	eval $cmd;
	print;
}
################################################## snip snip
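The \b in the commands Randal's script builds is Perl's word-boundary assertion, which is what keeps FooBar from also matching inside, say, FooBarBaz. A rough C sketch of one such substitution, assuming the names are identifiers and that a word character is a letter, digit or underscore (Perl's \w in the C locale); replace_word_lower and is_word_char are hypothetical names, not anything from the post.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* A "word character" in the sense of Perl's \w (C locale). */
static int is_word_char(int c)
{
	return isalnum((unsigned char)c) || c == '_';
}

/*
 * replace_word_lower: hypothetical helper.
 * Copies line into out, replacing every whole-word occurrence of name
 * with its lowercase form, roughly like s/\bname\b/lcname/g.
 * Returns 0 on success, -1 if out is too small.
 */
int replace_word_lower(const char *line, const char *name,
                       char *out, size_t outsz)
{
	size_t len = strlen(name);
	size_t o = 0;
	const char *p = line;

	if (outsz == 0)
		return -1;
	while (*p != '\0') {
		/* a match must not be preceded or followed by a word char */
		int at_start = (p == line) || !is_word_char(p[-1]);

		if (len > 0 && at_start && strncmp(p, name, len) == 0
		    && !is_word_char(p[len])) {
			size_t i;

			if (o + len >= outsz)
				return -1;
			for (i = 0; i < len; i++)
				out[o++] = (char)tolower((unsigned char)p[i]);
			p += len;
		} else {
			if (o + 1 >= outsz)
				return -1;
			out[o++] = *p++;
		}
	}
	out[o] = '\0';
	return 0;
}
```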

printf "%s", "Just another Perl hacker,"
-- 
/=Randal L. Schwartz, Stonehenge Consulting Services (503)777-0095 ==========\
| on contract to Intel's iWarp project, Beaverton, Oregon, USA, Sol III      |
| merlyn@iwarp.intel.com ...!any-MX-mailer-like-uunet!iwarp.intel.com!merlyn |
\=Cute Quote: "Welcome to Portland, Oregon, home of the California Raisins!"=/

tchrist@convex.COM (Tom Christiansen) (05/04/90)

In article <757@sagpd1.UUCP> jharkins@sagpd1.UUCP (Jim Harkins) writes:
|I need to change a lot of words with mixed upper and lower case letters into
|words of all lower case letters, from there I'm going to make a sed script
|to actually modify the words in our source.  So how do I make my list?
|For example, I want to convert the list
|
|FooBar			
|blaTZ
|GRMblE
|WhEe
|
|into
|
|FooBar	foobar			
|blaTZ	blatz
|GRMblE	grmble
|WhEe	whee
|
|(from this second list it's trivial to create lines of sed commands like
|'/FooBar/foobar/g', and there are around 800 words in my list)
|
|Right now I have 2 files, one with the upper/lower case names, the second
|with just lower case names.  I think I've been going down the wrong path
|here though.  I'm now thinking of using sed directly but I'm no sed wizard.
|Join may also do the job but I'm having trouble deciphering the man page
|and none of my experiments to date have remotely resembled what I'm after.

Just for the record, I really did play around with sed's y operator a bit
and swapping pattern and hold spaces before throwing up my hands and doing
it the easy way.  Use

    % perl -n foo.pl

where foo.pl consists of:

    chop;
    print $_, " ";
    y/A-Z/a-z/;
    print $_, "\n";

I'm not sure how you plan to merge two files, but I think it'll end up
being easier in perl than in sed.
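Tom's script, run with perl -n, strips the newline with chop, prints the line and a space, translates A-Z to a-z with y, and prints it again. The same loop as a minimal C sketch; lower_pair and print_pairs are hypothetical names, and lines longer than the fixed 1024-byte buffer would come out split into pieces. tolower follows the current locale, which in the default "C" locale matches y/A-Z/a-z/.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/*
 * lower_pair: hypothetical helper.
 * Writes "<line> <line in lowercase>" into out, dropping a trailing
 * newline first (Tom's chop).  Returns 0, or -1 if out is too small.
 */
int lower_pair(const char *line, char *out, size_t outsz)
{
	size_t len = strcspn(line, "\n");
	size_t i;

	if (outsz < 2 * len + 2)	/* word, space, word, NUL */
		return -1;
	memcpy(out, line, len);		/* print $_, " "; */
	out[len] = ' ';
	for (i = 0; i < len; i++)	/* y/A-Z/a-z/; print $_ */
		out[len + 1 + i] = (char)tolower((unsigned char)line[i]);
	out[2 * len + 1] = '\0';
	return 0;
}

/* The perl -n loop: one output line per input line. */
void print_pairs(FILE *in, FILE *out)
{
	char line[1024], pair[2 * sizeof line + 2];

	while (fgets(line, sizeof line, in) != NULL)
		if (lower_pair(line, pair, sizeof pair) == 0)
			fprintf(out, "%s\n", pair);
}
```

Calling print_pairs(stdin, stdout) gives the two-column list from the question, with a space rather than a tab between the columns.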

--tom
--

    Tom Christiansen                       {uunet,uiucdcs,sun}!convex!tchrist 
    Convex Computer Corporation                            tchrist@convex.COM
		 "EMACS belongs in <sys/errno.h>: Editor too big!"

c60c-3cf@e260-3f.berkeley.edu (Dan Kogai) (05/04/90)

In article <757@sagpd1.UUCP> jharkins@sagpd1.UUCP (Jim Harkins) writes:

>I need to change a lot of words with mixed upper and lower case letters into
>words of all lower case letters, from there I'm going to make a sed script
>to actually modify the words in our source.  So how do I make my list?
>For example, I want to convert the list
>
>FooBar			
>blaTZ
>GRMblE
>WhEe
>
>into
>
>FooBar	foobar			
>blaTZ	blatz
>GRMblE	grmble
>WhEe	whee
>
>(from this second list it's trivial to create lines of sed commands like
>'/FooBar/foobar/g', and there are around 800 words in my list)
>
>Right now I have 2 files, one with the upper/lower case names, the second
>with just lower case names.  I think I've been going down the wrong path
>here though.  I'm now thinking of using sed directly but I'm no sed wizard.
>Join may also do the job but I'm having trouble deciphering the man page
>and none of my experiments to date have remotely resembled what I'm after.
>
>So, has anyone got any advice (outside of having a flunky use vi :-)?
>Thanks in advance.

	Just write a little C program like the following:
/*
 * mergeline.c
 */

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char **argv){
	FILE *file1, *file2;
	char buf1[1024], buf2[1024];

	if (argc != 3){	/* error */
		fprintf(stderr, "%s: usage: %s file1 file2\n", argv[0], argv[0]);
		exit(1);
	}
	file1 = fopen(argv[1], "r"), file2 = fopen(argv[2], "r");
	if (!file1){
		fprintf(stderr, "%s: couldn't open file: %s\n", argv[0], argv[1]);
		exit(1);
	}
	if (!file2){
		fprintf(stderr, "%s: couldn't open file: %s\n", argv[0], argv[2]);
		exit(1);
	}
	/* read a line from each file; stop when either one runs out */
	while (fgets(buf1, sizeof buf1, file1) && fgets(buf2, sizeof buf2, file2)){
		/* chop off the NL fgets gives ya (strtok would return NULL on an empty line) */
		buf1[strcspn(buf1, "\n")] = '\0';
		buf2[strcspn(buf2, "\n")] = '\0';
		printf("%s\t%s\n", buf1, buf2);
	}
	fclose(file1);
	fclose(file2);
	return 0;
}
/* That's it! */

	Compile the program, give it a name such as "merge", then run
"merge file1 file2".  I chose a tab to separate the two entries, but you
can modify my program, or the same thing could be done via sed|awk|etc.
Maybe I should've suggested using perl, but I'm no perl expert and C
looked much easier.  Good luck.

>My new investment plan GUARENTEES a 50% rate of return!  Just mail me $10,000
>and I promise you'll get $5,000 back.

	Well, I won't take your offer.  You can be Craig Shergold's life 
insurance contractor ;)

---
##################  Dan The "I grok therefore I am God" Man
+ ____  __  __   +  (Aka Dan Kogai)
+     ||__||__|  +  E-mail:     dankg@ocf.berkeley.edu
+ ____| ______   +  Voice:      415-549-6111
+ |     |__|__|  +  USnail:     1730 Laloma Berkeley, CA 94709
+ |___  |__|__|  +              U.S.A
+     |____|____ +  Disclaimer: I'd rather be blackmailed for my long .sig
+   \_|    |     +              than give up my cool name in Kanji. And my
+ <- THE MAN  -> +              citizenship of People's Republic o' Berkeley
##################              has nothing 2 do w/ whatever I post, ThanQ.

-- 

maart@cs.vu.nl (Maarten Litmaath) (05/05/90)

In article <757@sagpd1.UUCP>,
	jharkins@sagpd1.UUCP (Jim Harkins) writes:
)I need to change a lot of words with mixed upper and lower case letters into
)words of all lower case letters, from there I'm going to make a sed script
)to actually modify the words in our source.  So how do I make my list?

Assume the file `word-list' contains the list of words to be changed,
and the files to be processed are `file_1 file_2 ... file_N'.  Then the
following command will do what you want:

	lowercase word-list file_1 file_2 ... file_N

...where `lowercase' is the following shell script.
Can you figure out why I `complicated' the `ex'-script?
--------------------cut here--------------------
#!/bin/sh
# @(#)lowercase 1.0 90/05/04 Maarten Litmaath
# Usage: lowercase word-list [files]
# use `-' for `word-list' if the list is supplied on stdin

PATH=/bin:/usr/bin:/usr/ucb

script=/tmp/foobar.$$
tmp=/tmp/glork.$$
umask 077
cleanup='rm -f "$script" "$tmp"'
trap 'eval "$cleanup"; exit 2' 1 2 3 15
trap 'eval "$cleanup"; exit' 0

test $# = 0 && {
	set x -
	shift
}
cat "$1" > "$script"
shift

ex - "$script" << \EOF
a

.
%s-.*-s/&/\L&/g-
g-s///g-d
x
EOF

test $# = 0 && {
	sed -f "$script"
	exit
}

for i
do
	sed -f "$script" "$i" > "$tmp" && cp "$tmp" "$i"
done
--
 Antique fairy tale: Little Red Riding Hood. |Maarten Litmaath @ VU Amsterdam:
 Modern fairy tale: Oswald shot Kennedy. |maart@cs.vu.nl, uunet!cs.vu.nl!maart

jharkins@sagpd1.UUCP (Jim Harkins) (05/05/90)

In article <1990May4.004643.1994@iwarp.intel.com> merlyn@iwarp.intel.com (Randal Schwartz) writes:
>In article <757@sagpd1.UUCP>, jharkins@sagpd1 (Jim Harkins)  (that's me) writes:
>| I need to change a lot of words with mixed upper and lower case letters into
>| words of all lower case letters, from there I'm going to make a sed script
>| to actually modify the words in our source.  So how do I make my list?

I got my answer to this, thanks to all who responded.  The pattern of responses
was interesting.  I posted the request just before leaving last night, and
this morning I had 5-6 replies with variants of sed scripts.  Then during
lunch I got another batch of replies and these were all suggesting a tool
called patch, with nary a soul mentioning sed.  Interesting.

Anyway, thanks for the help.


les@chinet.chi.il.us (Leslie Mikesell) (05/06/90)

In article <757@sagpd1.UUCP> jharkins@sagpd1.UUCP (Jim Harkins) writes:
>For example, I want to convert the list

>FooBar			
>blaTZ
>into
>FooBar	foobar			
>blaTZ	blatz

>(from this second list it's trivial to create lines of sed commands like
>'/FooBar/foobar/g', and there are around 800 words in my list)

>So, has anyone got any advice (outside of having a flunky use vi :-)?

How about having someone who isn't a flunky use vi, since the single
command:
   :%s/.*/& \L&/
will do exactly what you ask, and a trivial variation would give the final
output you want.
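In that command, & is the whole match and \L turns everything after it in the replacement into lowercase, so each line becomes the word, a space, and the word in lowercase. A minimal C sketch of just that piece of replacement syntax, as a hypothetical function expand_repl; it assumes \L and \U stay in effect until \E or the end of the replacement, treats any other backslashed character as a literal, and leaves out \l, \u and \1-style groups.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * expand_repl: hypothetical helper mimicking part of ex/vi replacement syntax.
 *   &   -> the matched text
 *   \L  -> lowercase everything after this point (until \E)
 *   \U  -> uppercase everything after this point (until \E)
 *   \E  -> stop case conversion
 *   \x  -> a literal x (so \& gives a literal ampersand)
 * Returns 0 on success, -1 if out is too small.
 */
int expand_repl(const char *tmpl, const char *match, char *out, size_t outsz)
{
	enum { AS_IS, LOWER, UPPER } mode = AS_IS;
	size_t o = 0;

	if (outsz == 0)
		return -1;
	for (; *tmpl != '\0'; tmpl++) {
		const char *src;
		char lit[2] = { 0, 0 };

		if (*tmpl == '\\' && tmpl[1] != '\0') {
			tmpl++;
			if (*tmpl == 'L') { mode = LOWER; continue; }
			if (*tmpl == 'U') { mode = UPPER; continue; }
			if (*tmpl == 'E') { mode = AS_IS; continue; }
			lit[0] = *tmpl;		/* escaped literal character */
			src = lit;
		} else if (*tmpl == '&') {
			src = match;		/* the whole match */
		} else {
			lit[0] = *tmpl;		/* ordinary character */
			src = lit;
		}
		/* copy src, applying the current case mode */
		for (; *src != '\0'; src++) {
			int c = (unsigned char)*src;

			if (mode == LOWER)
				c = tolower(c);
			else if (mode == UPPER)
				c = toupper(c);
			if (o + 1 >= outsz)
				return -1;
			out[o++] = (char)c;
		}
	}
	out[o] = '\0';
	return 0;
}
```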


Les Mikesell
  les@chinet.chi.il.us

iann@cmsfl@labtam.oz (Ian Nicholls) (05/07/90)

In article <757@sagpd1.UUCP> jharkins@sagpd1.UUCP (Jim Harkins) writes:
>I need to change a lot of words with mixed upper and lower case letters into
>words of all lower case letters, from there I'm going to make a sed script
>to actually modify the words in our source.  So how do I make my list?

I often use the following with vi under minix and SysV.2, and I've had none
of the problems with vi taking commands from a file that some have mentioned.
Maybe they were BSD sites, or maybe only NCR SysV works.

	 echo 's/.*/& \\L&/' | vi file

The \L& translates the match into lower case (and \U& to upper case).  It's
documented somewhere in the vi manual.  I've just checked, and sed doesn't do
it.  The drawbacks are that some vi's may not support it, and unless you
suppress stdout (or stderr?), the screen gets messy.
-- 
"He who laughs, lasts"
Ian Nicholls         Phone : +61 3 829 6088   Fax: +61 3 829 6860
Coles/Myer Ltd.      UUCP: labtam!cmsfl!iann  Email: iann%cmsfl@labtam.oz.au
L1 M11, PO Box 2000, Tooronga 3146, Australia

peter@ficc.uu.net (Peter da Silva) (05/08/90)

In article <815@cmsfl> iann@cmsfl.UUCP (Ian Nicholls) writes:
> 	 echo 's/.*/& \\L&/' | vi file

how about "| ex file", to avoid this...

> unless you supress stdout (or stderr?), the screen gets messy.
-- 
`-_-' Peter da Silva. +1 713 274 5180.      <peter@ficc.uu.net>
 'U`  Have you hugged your wolf today?  <peter@sugar.hackercorp.com>
@FIN  Commercial solicitation *is* accepted by email to this address.

jpr@dasys1.uucp (Jean-Pierre Radley) (05/08/90)

>In article <757@sagpd1.UUCP> jharkins@sagpd1.UUCP (Jim Harkins) writes:
>I need to change a lot of words with mixed upper and lower case letters into
>words of all lower case letters, from there I'm going to make a sed script
>to actually modify the words in our source.  So how do I make my list?
>For example, I want to convert the list
>
>FooBar			
>blaTZ
>GRMblE
>WhEe
>
>into
>
>FooBar	foobar			
>blaTZ	blatz
>GRMblE	grmble
>WhEe	whee

No sweat with 'tr':

	tr '[A-Z]' '[a-z]' <infile >outfile
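Note that this produces only the lowercase column; it still has to be pasted next to the original list (for instance with the merge program or lam suggested elsewhere in this thread) to get the two-column form. tr works byte by byte through a translation table, much like Perl's y operator. The hypothetical C helpers below sketch that for a single range such as A-Z to a-z, assuming ASCII, where the letters are contiguous; they do not parse tr's general set syntax.

```c
#include <stddef.h>
#include <string.h>

/*
 * build_tr_table: hypothetical helper.  Fills map so that every byte maps
 * to itself except those in [from_lo, from_hi], which map in order onto
 * to_lo, to_lo+1, ... (like "tr A-Z a-z" or perl's y/A-Z/a-z/).
 */
void build_tr_table(unsigned char map[256], unsigned char from_lo,
                    unsigned char from_hi, unsigned char to_lo)
{
	int c;

	for (c = 0; c < 256; c++)
		map[c] = (unsigned char)c;
	for (c = from_lo; c <= from_hi; c++)
		map[c] = (unsigned char)(to_lo + (c - from_lo));
}

/* Translate s in place through the table, one byte at a time. */
void tr_apply(const unsigned char map[256], char *s)
{
	size_t i, n = strlen(s);

	for (i = 0; i < n; i++)
		s[i] = (char)map[(unsigned char)s[i]];
}
```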

-- 
Jean-Pierre Radley					      jpr@jpradley.uucp
New York, NY					      72160.1341@compuserve.com

cudcv@warwick.ac.uk (Rob McMahon) (05/08/90)

In article <102007@convex.convex.com> tchrist@convex.COM (Tom Christiansen) writes:
<In article <757@sagpd1.UUCP> jharkins@sagpd1.UUCP (Jim Harkins) writes:
<|For example, I want to convert the list
<|
<|FooBar			
<|blaTZ
<|GRMblE
<|WhEe
<|
<|into
<|
<|FooBar	foobar			
<|blaTZ	blatz
<|GRMblE	grmble
<|WhEe	whee
<|
<Use
<
<    % perl -n foo.pl
<
<where foo.pl consists of:
<
<    chop;
<    print $_, " ";
<    y/A-Z/a-z/;
<    print $_, "\n";

I know perl is wonderful, and I do use it a lot, but I can't help feeling that

	tr '[A-Z]' '[a-z]' < file | lam file -s " " -

is easier (if you have `lam' I suppose, but there's an "if you have `perl'"
too ...).

BTW does anyone know what the state of rs, jot, and lam is?  I find them all
very useful.

Rob
-- 
UUCP:   ...!mcsun!ukc!warwick!cudcv	PHONE:  +44 203 523037
JANET:  cudcv@uk.ac.warwick             INET:   cudcv@warwick.ac.uk
Rob McMahon, Computing Services, Warwick University, Coventry CV4 7AL, England

goer@sophist.uucp (Richard Goerwitz) (05/09/90)

On converting

><|FooBar			
><|blaTZ
><|GRMblE
><|WhEe

into

><|FooBar	foobar			
><|blaTZ	blatz
><|GRMblE	grmble
><|WhEe	whee

using the Perl script -

><    chop;
><    print $_, " ";
><    y/A-Z/a-z/;
><    print $_, "\n";

Rob McMahon (cudcv@warwick.ac.uk) writes:

>I know perl is wonderful, and I do use it a lot, but I can't help feeling that
>
>	tr '[A-Z]' '[a-z]' < file | lam file -s " " -
>
>is easier (if you have `lam' I suppose, but there's an "if you have `perl'"
>too ...).

If we are into the right-tool-for-the-job game, let me just point out
that Icon will do the same job as well:

procedure main()
  every line := !&input
  do write(line," ",map(line))
end

This'll run about the same speed as the Perl script.

The point is that there is no reason to perversely superimpose string-processing
tasks on a diverse set of utilities, when there exist internally consistent,
fast, elegant (and free) methods for solving these same problems.

   -Richard L. Goerwitz              goer%sophist@uchicago.bitnet
   goer@sophist.uchicago.edu         rutgers!oddjob!gide!sophist!goer

arnold@audiofax.com (Arnold Robbins) (05/09/90)

The original job:

Convert

FooBar
blaTZ
GRMblE
WhEe

into

FooBar	foobar			
blaTZ	blatz
GRMblE	grmble
WhEe	whee

A perl script was suggested:
   chop;
   print $_, " ";
   y/A-Z/a-z/;
   print $_, "\n";

Rob McMahon (cudcv@warwick.ac.uk) writes:
I know perl is wonderful, and I do use it a lot, but I can't help feeling that
	tr '[A-Z]' '[a-z]' < file | lam file -s " " -
is easier (if you have `lam' I suppose, but there's an "if you have `perl'"
too ...).

goer@sophist.UUCP (Richard Goerwitz) presents an Icon solution:
procedure main()
  every line := !&input
  do write(line," ",map(line))
end

And, just for completeness, the following works in gawk and V.4 awk:

	{ print $1, tolower($1) }
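One difference from the perl, Icon and tr/lam versions: awk's $1 is the first field under default splitting on runs of blanks and tabs, so trailing blanks or tabs, and any extra words on a line, are dropped, while the other versions copy the whole line through. A rough C sketch of that behaviour, as a hypothetical first_field_pair; it treats only space, tab and newline as separators, and, as awk would, a blank line yields just the separating space.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * first_field_pair: hypothetical helper approximating
 *     { print $1, tolower($1) }
 * with awk's default field splitting (fields separated by runs of
 * blanks, tabs or newlines; leading ones ignored).
 * Returns 0 on success, -1 if out is too small.
 */
int first_field_pair(const char *line, char *out, size_t outsz)
{
	const char *ws = " \t\n";
	const char *start = line + strspn(line, ws);	/* skip leading blanks */
	size_t len = strcspn(start, ws);		/* length of $1 */
	size_t i;

	if (outsz < 2 * len + 2)
		return -1;
	memcpy(out, start, len);
	out[len] = ' ';					/* awk's default OFS */
	for (i = 0; i < len; i++)
		out[len + 1 + i] = (char)tolower((unsigned char)start[i]);
	out[2 * len + 1] = '\0';
	return 0;
}
```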
-- 
Arnold Robbins -- Senior Research Scientist - AudioFAX | Laundry increases
2000 Powers Ferry Road, #220 / Marietta, GA. 30067     | exponentially in the
INTERNET: arnold@audiofax.com	Phone: +1 404 933 7600 | number of children.
UUCP:	  emory!audfax!arnold	Fax:   +1 404 933 7606 |   -- Miriam Robbins