poage@sunny.UUCP (Tom Poage) (02/24/89)
Is there a reason why I don't find regular expressions with both alternation and explicit number of occurrence declaration? Here's what I mean ...

In some public-domain regexp routines I can use

    (string1|string2)

In other routines I can use

    (something){3,4}

However, I have never seen routines with the ability to use these two constructs together, such as

    (x|y|(z){4,5})

For example, I want to find strings of 9 digits occurring in a certain pattern, similar to:

    875000000-876000000,786992210,>789922119

The current (gnu) regexp routine I have requires the following to match the above line. The actual line has been split for demonstration purposes.

    ^((([<>](=)?)?[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9])|
    ([0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]-
    [0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]))
    (,(([<>](=)?)?[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9])|
    ([0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]-
    [0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]))*$

The first problem is that this regexp overflows grep/egrep of SunOS 3.5 (However Gnu's e?grep handles it just fine). The second is that this is unwieldy. The third is that I don't necessarily want to parse the line into fragments and perform sub matches.

Why can't I do something like this (still split)?

    ^((([<>](=)?)?[0-9]{9})|([0-9]{9}-[0-9]{9}))(,(([<>](=)?)?[0-9]{9})|
    ([0-9]{9}-[0-9]{9}))*$

Don't you agree this is easier ":-):-):-)" to read?

Is this only a difference between System V and BSD variants? Is there a public-domain version of regexp(3) with these features merged?

I await with bated breath. Tom.
--
Tom Poage, UCDMC Clinical Engineering, Sacto., CA
poage@sunny.ucdavis.edu ...!ucbvax!ucdavis!sunny!poage
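For readers who want to try the combined form, here is a minimal sketch in Python, whose `re` module accepts both alternation and `{m,n}` intervals in one pattern. It is not the poster's code: the names `ITEM`, `LINE`, and `is_valid` are made up for illustration, the `(?:...)` non-capturing groups are Python syntax rather than anything in the 1989 routines, and the comma is placed outside the alternation on the assumption that a range should be allowed anywhere in the list.

```python
import re

# One list item: either a range of two 9-digit numbers, or a single
# 9-digit number with an optional comparison prefix (<, >, <=, >=).
# The range alternative is listed first; either order works because
# the engine backtracks.
ITEM = r"(?:[0-9]{9}-[0-9]{9}|(?:[<>]=?)?[0-9]{9})"

# A line is one item followed by zero or more comma-separated items.
# Assumption: the comma sits outside the alternation, so ranges may
# appear in any position, not only first.
LINE = re.compile(ITEM + r"(?:," + ITEM + r")*")


def is_valid(line: str) -> bool:
    """Return True if the whole line is a comma-separated list of items."""
    return LINE.fullmatch(line) is not None


if __name__ == "__main__":
    print(is_valid("875000000-876000000,786992210,>789922119"))
```

Using `fullmatch` plays the role of the `^...$` anchors in the post. Building the pattern from a named `ITEM` fragment also avoids writing the item sub-expression twice.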
john@frog.UUCP (John Woods) (03/02/89)
In article <364@sunny.UUCP>, poage@sunny.UUCP (Tom Poage) writes:
> Is there a reason why I don't find regular expressions
> with both alternation and explicit number of occurrence
> declaration? Here's what I mean ...
> In some public-domain regexp routines I can use
> (string1|string2)
> In other routines I can use
> (something){3,4}
> However, I have never seen routines with the ability to use
> these two constructs together, such as
> (x|y|(z){4,5})
>

If no one else admits to one, I once modified Henry Spencer's regexp package (which has the first feature) to include the second feature (like System V). I should probably clean it up and test it more before handing it out (Henry's code was of excellent quality, and I wouldn't want to embarrass myself :-), but it is available (though perhaps I should send it to him and let him worry about it (as if he isn't busy enough already, what with being months behind summarizing Aviation Leak :-)).
--
John Woods, Charles River Data Systems, Framingham MA, (508) 626-1101
...!decvax!frog!john, john@frog.UUCP, ...!mit-eddie!jfw, jfw@eddie.mit.edu
"He should be put in stocks in Lafayette Square across from the White House and pelted with dead cats." - George F. Will