[comp.unix.questions] Isolating alphanumeric words with regular expressions

campbell@lotus.com (Jim Campbell) (09/04/90)

I have an editing script that should append ".o" to every word in an input
line.  If, however, a word already contains a ".", it should be left alone,
with no ".o" appended.

I have been a UNIX enthusiast for several years now, but for the life of me,
I can't figure out how to solve what seems to be a simple problem.

Here is what I have tried:

	s/\([^.A-Za-z0-9_]*\)\([^. ][^. ]*\)/\1\2.o/g

This doesn't do it, since if the input line looks like this:

	abc bar foo.obj fooie baby

the regular expression will fail to match the entire word "foo.obj", but
will match "foo" and "obj" separately, yielding this:

	abc.o bar.o foo.o.obj.o fooie.o baby.o
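
A minimal way to reproduce this, assuming a POSIX sed and shell (the function
name attempt1 is made up for illustration):

```shell
# First attempt: group 2 stops at every ".", so "foo" and "obj"
# are matched as two separate words and each gets its own ".o".
attempt1() {
    sed 's/\([^.A-Za-z0-9_]*\)\([^. ][^. ]*\)/\1\2.o/g'
}

printf '%s\n' 'abc bar foo.obj fooie baby' | attempt1
```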

If I add a third group to pick up the rest of the word, like this:

	s/\([^.A-Za-z0-9_]*\)\([^. ][^. ]*\)\([^.]*\)/\1\2.o\3/g

the third expression grouped in the "\(...\)" operators, "[^.]*", swallows
the following spaces and words up to the next ".", so those words never get
a ".o" of their own, like this:

	abc.o bar foo.obj.o fooie baby
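
To see what each group actually captures, here is a rough sketch that keeps
the same pattern but prints the three groups between brackets instead of
appending ".o" (show_groups is a made-up name, and a POSIX sed is assumed):

```shell
# Same pattern as the second attempt, but the replacement shows
# the three captured groups as [group1|group2|group3].
show_groups() {
    sed 's/\([^.A-Za-z0-9_]*\)\([^. ][^. ]*\)\([^.]*\)/[\1|\2|\3]/g'
}

printf '%s\n' 'abc bar foo.obj fooie baby' | show_groups
# prints: [|abc| bar foo].[|obj| fooie baby]
```

Group 3 takes " bar foo" in the first match and " fooie baby" in the second,
which is why only "abc" and "obj" end up with a ".o".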

I have spent a lot of time on this one little problem, and I am wondering if
anyone out there knows of a solution.

(Yes -- I know it can be solved with two substitution operations, but I 
am looking for a way to do it with one.)
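
For reference, one possible two-substitution version looks roughly like the
sketch below.  It assumes words are separated by spaces and a POSIX sed;
the function name add_o is only for illustration.

```shell
# Two substitutions (not the single-substitution answer being asked for):
#   1. append ".o" to every run of non-space characters;
#   2. strip that ".o" again from any word that has an earlier "."
#      (the greedy [^ ]* makes the removed ".o" the last one in the word).
add_o() {
    sed -e 's/[^ ][^ ]*/&.o/g' \
        -e 's/\(\.[^ ]*\)\.o/\1/g'
}

printf '%s\n' 'abc bar foo.obj fooie baby' | add_o
# prints: abc.o bar.o foo.obj fooie.o baby.o
```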
--
Jim Campbell, Lotus Development Corporation | harvard!ima   \ 
1 Rogers St., Cambridge, MA 02142           |   ihnp4        >!lotus!campbell
617/693-5652                                |   uunet       /