[comp.unix.questions] Regular expression question.

poage@sunny.UUCP (Tom Poage) (02/24/89)

Is there a reason why I can't find regular expression routines
that support both alternation and an explicit number-of-occurrences
(interval) declaration?  Here's what I mean ...

In some public-domain regexp routines I can use

	(string1|string2)

In other routines I can use

	(something){3,4}

However, I have never seen routines with the ability to use 
these two constructs together, such as

	(x|y|(z){4,5})

For example, I want to find strings of 9 digits occurring
in a certain pattern, similar to:

875000000-876000000,786992210,>789922119

The current (GNU) regexp routine I have requires the following to
match the above line.  The pattern is really a single line; I have
split it here for readability.

^((([<>](=)?)?[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9])|
([0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]-
[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]))
(,((([<>](=)?)?[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9])|
([0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]-
[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9])))*$

The first problem is that this regexp overflows grep/egrep of
SunOS 3.5 (however, GNU's e?grep handles it just fine).  The
second is that it is unwieldy.  The third is that I don't
necessarily want to parse the line into fragments and perform
submatches.

Why can't I do something like this (still split)?

^((([<>](=)?)?[0-9]{9})|([0-9]{9}-[0-9]{9}))(,((([<>](=)?)?[0-9]{9})|
([0-9]{9}-[0-9]{9})))*$
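The interval form can be tried out in an engine that supports it. Below is a minimal sketch in Python's `re`. This is an assumption for illustration only: neither SunOS 3.5 egrep nor the GNU routine is involved. It writes `(=)?` as the equivalent `=?`, builds the item sub-pattern once so it is not repeated, and groups the comma around the whole alternation so a comma may precede either kind of item. The names `ITEM`, `LINE_RE` and `valid_line` are hypothetical.

```python
import re

# One item: an optionally prefixed (<, >, <=, >=) nine-digit number,
# or a range of two nine-digit numbers joined by "-".
ITEM = r"(([<>]=?)?[0-9]{9}|[0-9]{9}-[0-9]{9})"

# A line: one item, then zero or more items each preceded by a comma.
LINE_RE = re.compile(rf"^{ITEM}(,{ITEM})*$")


def valid_line(line: str) -> bool:
    """Return True if the whole line matches the item-list syntax."""
    return LINE_RE.match(line) is not None


print(valid_line("875000000-876000000,786992210,>789922119"))  # True
print(valid_line("786992210,875000000-876000000"))             # True
print(valid_line("12345678,786992210"))                        # False: 8 digits
```

Because `ITEM` is an ordinary string, the same piece can be reused wherever an item may appear, which keeps the full pattern short.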

Don't you agree this is easier ":-):-):-)" to read?

Is this only a difference between System V and BSD variants?
Is there a public-domain version of regexp(3) with these 
features merged?  I await with bated breath.  Tom.
-- 
Tom Poage, UCDMC Clinical Engineering, Sacto., CA
poage@sunny.ucdavis.edu
...!ucbvax!ucdavis!sunny!poage

john@frog.UUCP (John Woods) (03/02/89)

In article <364@sunny.UUCP>, poage@sunny.UUCP (Tom Poage) writes:
> Is there a reason why I don't find regular expressions
> with both alternation and explicit number of occurrence
> declaration?  Here's what I mean ...
> In some public-domain regexp routines I can use
> 	(string1|string2)
> In other routines I can use
> 	(something){3,4}
> However, I have never seen routines with the ability to use 
> these two constructs together, such as
> 	(x|y|(z){4,5})
> 
If no one else admits to one, I once modified Henry Spencer's regexp package
(which has the first feature) to include the second feature (like System V).
I should probably clean it up and test it more before handing it out (Henry's
code was of excellent quality, and I wouldn't want to embarrass myself :-),
but it is available (though perhaps I should send it to him and let him
worry about it (as if he isn't busy enough already, what with being months
behind summarizing Aviation Leak :-)).
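The post does not say how the modification works. One common way to give intervals to a matcher that already has concatenation, grouping and `?` is to rewrite `x{m,n}` before compiling it. Below is a minimal sketch in Python. `expand_interval` is a hypothetical helper, not taken from Henry Spencer's package or from the changes described above. It assumes `atom` is a single atom (one character, a bracket expression, or a parenthesized group), and it does not handle the open-ended form `x{m,}`.

```python
import re


def expand_interval(atom: str, m: int, n: int) -> str:
    """Rewrite atom{m,n} as m required copies plus n - m optional ones.

    Hypothetical helper: `atom` must already be a single atom, and the
    open-ended form atom{m,} is not handled.
    """
    if not 0 <= m <= n:
        raise ValueError("interval needs 0 <= m <= n")
    return atom * m + (atom + "?") * (n - m)


# [0-9]{9} becomes nine copies of [0-9], as in the long GNU pattern.
print(expand_interval("[0-9]", 9, 9))

# (z){4,5} becomes (z)(z)(z)(z)(z)?
print(expand_interval("(z)", 4, 5))

# Cross-check the rewrite against an engine that has intervals natively.
for s in ["zzz", "zzzz", "zzzzz", "zzzzzz"]:
    rewritten = re.fullmatch(expand_interval("z", 4, 5), s) is not None
    native = re.fullmatch(r"z{4,5}", s) is not None
    assert rewritten == native, s
```

The rewrite only adds constructs the original package already understands, so the existing matcher needs no changes. The cost is a longer compiled program for large counts.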



-- 
John Woods, Charles River Data Systems, Framingham MA, (508) 626-1101
...!decvax!frog!john, john@frog.UUCP, ...!mit-eddie!jfw, jfw@eddie.mit.edu

"He should be put in stocks in Lafayette Square across from the White House
 and pelted with dead cats."	- George F. Will