[comp.unix.questions] How do I find a word?

RSS%CALSTATE.bitnet@vm.usc.edu (Richard S. Smith) (05/01/91)

I get the feeling there's no good answer to this question, but I
am asking it anyway...

Is there a SIMPLE, NON-PAINFUL way to set up a regular expression so
that it will match a given string only when it occurs as a word, i.e.,
delimited by non-alphanumeric characters or by line boundaries?

In other words, I am looking for a simple, generalized way to find
"foo" when it occurs as "foo bar" or "foo-bar" or "foo: bar" but NOT
as "foobar".  I am hoping there is a simpler answer than:

"[^A-Za-z0-9]foo[^A-Za-z0-9]"

Thanks to anyone who can help.
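
A minimal sketch of why the bracketed pattern above falls short, using
Python's "re" module purely for illustration (the post names no tool).
The pattern needs an actual character on each side of "foo", so it misses
"foo" at the start or end of a line. The anchored alternation is one
common workaround, not something proposed in the post, and the names
NAIVE, ANCHORED, SAMPLES, and matching are made up for this example.

```python
import re

# The pattern from the post: it requires a real non-alphanumeric
# character on BOTH sides of "foo".
NAIVE = re.compile(r"[^A-Za-z0-9]foo[^A-Za-z0-9]")

# Hypothetical workaround: also accept a line boundary on either side.
ANCHORED = re.compile(r"(^|[^A-Za-z0-9])foo([^A-Za-z0-9]|$)")

SAMPLES = ["foo bar", "a foo-bar", "say foo: bar", "foobar", "bar foo"]


def matching(pattern, lines):
    """Return the lines in which the pattern matches somewhere."""
    return [line for line in lines if pattern.search(line)]


naive_hits = matching(NAIVE, SAMPLES)        # misses "foo bar" and "bar foo"
anchored_hits = matching(ANCHORED, SAMPLES)  # rejects only "foobar"
```

The anchored form works but is even longer than the original, which is
the motivation for a dedicated word-boundary notation.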

Richard Smith - RSS@CALSTATE.BITNET

jik@athena.mit.edu (Jonathan I. Kamens) (05/01/91)

In article <26716@adm.brl.mil>, RSS%CALSTATE.bitnet@vm.usc.edu (Richard S. Smith) writes:
|> Is there a SIMPLE, NON-PAINFUL way to set up a regular expression so
|> that it will match a given string only when it occurs as a word, i.e.,
|> delimited by non-alphanumeric characters or by line boundaries?

  It's difficult to answer this question unless you say what utility you
intend to use the regular expression with.

  For example, with emacs (and possibly with ex and vi, I'm not sure), you can
use "\<" and "\>" to delimit a word in a regular expression.

  With "grep", you can use the "-w" argument to tell it to look for words only.

  With "perl", you can use "\b" to signify a word boundary.
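
  Python's "re" module uses the same "\b" notation, so a short sketch in
Python (chosen here only for illustration) shows the idea. One caveat worth
flagging: "\b" counts the underscore as a word character, which differs
slightly from the "non-alphanumeric" rule in the original question. The
names WORD_FOO and has_word_foo are hypothetical.

```python
import re

# \b matches the empty string at a boundary between a word character
# (letter, digit, or underscore) and a non-word character or line edge.
WORD_FOO = re.compile(r"\bfoo\b")


def has_word_foo(line):
    """Return True if "foo" occurs in the line as a whole word."""
    return WORD_FOO.search(line) is not None


results = {s: has_word_foo(s)
           for s in ["foo bar", "foo-bar", "foo: bar", "foobar", "foo_bar"]}
```

  Here "foo bar", "foo-bar", and "foo: bar" match, while "foobar" and
"foo_bar" do not.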

-- 
Jonathan Kamens			              USnail:
MIT Project Athena				11 Ashford Terrace
jik@Athena.MIT.EDU				Allston, MA  02134
Office: 617-253-8085			      Home: 617-782-0710

guy@auspex.auspex.com (Guy Harris) (05/02/91)

>  With "grep", you can use the "-w" argument to tell it to look for words only.

Well, with some flavors of "grep", anyway.  I think Berkeley introduced
the "-w" flag; "-w" is shorthand for "stick a \< and a \> around the
pattern", and the BSD "grep" supports the "\<" and "\>" items as well.
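
The "shorthand" idea can be sketched as a small helper that wraps any
pattern in word delimiters. Python's "re" module has no "\<" or "\>", so
this sketch substitutes the lookarounds "(?<!\w)" and "(?!\w)" as rough
stand-ins; it approximates the idea and is not how any particular "grep"
implements "-w". The names as_word_pattern and grep_w are hypothetical.

```python
import re


def as_word_pattern(pattern):
    """Wrap a pattern so it only matches when delimited as a word.

    (?<!\\w) stands in for \\< (no word character just before the match);
    (?!\\w) stands in for \\> (no word character just after it).
    """
    return re.compile(r"(?<!\w)(?:" + pattern + r")(?!\w)", re.ASCII)


def grep_w(pattern, lines):
    """Return the lines containing a word-delimited match of pattern."""
    word_re = as_word_pattern(pattern)
    return [line for line in lines if word_re.search(line)]


hits = grep_w("foo", ["foo bar", "foobar", "xfoo", "a foo."])
```

The non-capturing group keeps an alternation such as "foo|bar" from
escaping the delimiters.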
 
System V's (S5's) standard regular expression package doesn't support
"\<" and "\>" prior to S5R4, although we added them in SunOS 4.0, so
"ed", "grep", and various other programs there support them.  S5R4's
standard regular expression package does, I think, support them.