megabyte@chinet.UUCP (Dr. Megabyte) (08/25/86)
I've pored over my manual and looked at regcmp(1), regcmp(3), and
regexp(3), and I'm still not sure how to use these functions. Could someone
send me some clear info on how to use them, along with some examples?

For the record: I am running Zeus 3.21, which is a SYS III port, for those of
you who are fortunate enough never to have heard of it.

--
_________________________________________________________________________
UUCP: (1) seismo!why_not!scsnet!sunder          Mark E. Sunderlin
      (2) ihnp4!chinet!megabyte                 aka Dr. Megabyte
CIS:  74026,3235                                (202) 634-2529
Quote: "When The Going Gets Tough, The Tough Go Shopping"   (9-4 EDT)
Mail: IRS PM:PFR:D:NO 1111 Constitution Ave. NW Washington,DC 20224
latham@bsdpkh.UUCP (Ken Latham) (08/27/86)
Dr. Megabyte (megabyte@chinet.UUCP) writes:
>I've pored over my manual and looked at regcmp(1), regcmp(3), and
>regexp(3), and I'm still not sure how to use these functions. Could someone
>send me some clear info on how to use these functions along with some examples?
>
>For the record: I am running Zeus 3.21 which is a SYS III port to those of you
>who are fortunate to have never heard of it.

I am not familiar with Zeus and am only quasi-familiar with sys3; the
following is a sys5 explanation which, if memory serves me, should cover it.

1. regcmp(3) - a function which translates regular expressions ( a variant
   of ed(1) style ) to an internal form. The char pointer returned is the
   address of a ( non-null-terminated ) string that represents the regular
   expression. This 'compiled' regular expression can be interpreted by
   regex(3). If the returned pointer is NULL then you will have to 'walk'
   through the regular expression by hand and determine where the syntax
   error is.

2. regcmp(1) - a user level command that will compile files of regular
   expressions into either data files containing the compiled expressions
   or into C files declaring data structures containing same.

3. regex(3) - the compiled regular expression interpreter, which scans the
   subject string to determine whether it is in fact a member of the
   language described by the compiled regular expression. If no match is
   found it returns NULL. Otherwise it returns a pointer to the first
   character in the subject string that stopped the match, i.e. the next
   unmatched character. Usually this is the '\0' which terminates the
   subject string, but there are many cases where it is not; this is
   program dependent.

   A global variable 'loc1' ( according to the manual ) points to the
   position at which the match started in the subject string. This is
   usually the start of the subject string, but may vary with the
   application. The ACTUAL NAME of 'loc1' may be different than
   advertised!! On sys5 it is '__loc1'. You can do a 'nm' on libPW.a to
   determine the name for your version.

EX.

    extern char *__loc1;    /* 'loc1' in the manual, see above */
    char *compex, *badchar, *regcmp(), *regex();
    char *subject = "A_long_identifier_name";
        .
        .
    /* the argument list of regcmp() ends with a null pointer */
    compex = regcmp( "[a-zA-Z][_a-zA-Z0-9]*", (char *)0 );
    if ( compex == NULL )
        .. some error routine to say that the RE is BAD !
        .
        .
    badchar = regex( compex, subject );
    if ( badchar != NULL && *badchar == '\0' && __loc1 == subject ) {
        ...then HOORAH, it was a COMPLETE match!!!
    } else {
        ... BOO HISSS, only a partial or no match was made.
        You may want to accept some partial matches, in which case
        you can look at what stopped the match before the string
        terminator ('\0'): if badchar is not NULL, look at *badchar.
    }
        .
        .

NOTE: both "[a-zA-Z][_a-zA-Z0-9]*" and "A_long_identifier_name" could just
as easily be variables that are pointers to strings !!! It is much more
useful when used on variables :-).

Some side notes:

If it is the regular expressions and not the actual calls that give you
problems, then you need to buy a text book on the subject and get familiar
with them.

If you are familiar with REs, then note that the (...)$n notation utilized
in regex(3) is an added extension to normal REs. The other arguments ret0,
ret1, ..., ret9 in regex(3) are there simply to provide pointers to regions
where the (...)$n extractions should be copied. A subexpression surrounded
by (...)$1 will extract the substring of the subject string which matches
the portion of the regular expression enclosed in (...)$1. Each retN
pointer must hold the address of a preallocated area large enough to hold
the longest possible substring.

That should just about do it! Hope that helps. Sorry if you found this long
winded, but I wanted to be complete.

Ken Latham, AT&T-IS (via AGS Inc.), Orlando , FL
uucp: ihnp4!codas!bsdpkh!latham
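For readers whose system lacks regcmp(3)/regex(3), the complete-match test
described in this post can be sketched with the POSIX regcomp()/regexec()
interface instead. This is a substitute technique, not the System V calls
themselves, and POSIX regcomp() is a different function from regcmp(3).
Roughly, rm_so plays the part of '__loc1' and rm_eo the part of regex()'s
return value. The helper names match_length() and is_complete_match() are
made up for illustration.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>
#include <string.h>

/*
 * ASSUMPTION: POSIX <regex.h> is used here in place of the System V
 * regcmp(3)/regex(3) pair. match_length() is a hypothetical helper.
 *
 * Returns the number of leading characters of `subject` matched by
 * `pattern`, or -1 if the pattern does not compile or the match does
 * not begin at the first character of the subject.
 */
long match_length(const char *pattern, const char *subject)
{
    regex_t re;
    regmatch_t m[1];
    long len = -1;

    assert(pattern != NULL && subject != NULL);

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;                      /* the RE is bad */

    /* rm_so is where the match started, rm_eo is one past its end. */
    if (regexec(&re, subject, 1, m, 0) == 0 && m[0].rm_so == 0)
        len = (long)m[0].rm_eo;

    regfree(&re);
    return len;
}

/* Complete match: the match runs from subject[0] up to the '\0'. */
int is_complete_match(const char *pattern, const char *subject)
{
    long len = match_length(pattern, subject);

    return len >= 0 && (size_t)len == strlen(subject);
}
```

Calling is_complete_match("[a-zA-Z][_a-zA-Z0-9]*", "A_long_identifier_name")
is the analogue of the HOORAH branch; a subject such as "abc def" gives a
partial match, and subject[match_length(...)] is then the character that
stopped it.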
root@ozdaltx.UUCP (root) (08/29/86)
In article <516@chinet.UUCP>, megabyte@chinet.UUCP (Dr. Megabyte) writes:
> I've pored over my manual and looked at regcmp(1), regcmp(3), and
> regexp(3), and I'm still not sure how to use these functions. Could someone

I'll do the best I can. Hope this helps.

My manuals are a little different in layout (no section 1,2,3......). The
command regcmp compiles a regular expression (shell style) into C source
code, with the output going to file.i or file.c. Each entry in the input
file has the form: VARIABLE "expression". The resulting file.[ic] may be
included as part of a C program (#include file.[ic]). regex(abc, line)
applies the regular expression named abc to line.

EXAMPLE:

Variable Name (space) Expression

    teleno "\({0,1}([2-9][01][1-9])$0\){0,1} *"
           "([2-9][0-9]{2})$1[ -]{0,1}"
           "([0-9]{4})$2"

Basically this says: in field 0 (area code), accept an optional ( followed
by digits in the specified ranges, followed by an optional ). In field 1
(exchange), accept a digit from 2 through 9 plus any two further digits in
the range 0-9, followed by an optional space or dash (-). Finally, field 2
accepts 4 digits in the range 0-9.

The above would be typed into a file, then regcmp run on the file. The
resultant file should look like:

    /* "({0,1}([2-9][01][1-9])$0){0,1} *([2-9][0-9]{2})$1[ -]{0,1}([0-9]{4})$2" */
    char teleno[] {
    060,027,00,01,074,00,030,04,020,062,071,030,
    03,060,061,030,04,020,061,071,014,00,00,057,
    00,00,01,025,040,074,01,030,04,020,062,071,
    033,04,020,060,071,02,02,014,01,01,033,03,
    040,055,00,01,074,02,033,04,020,060,071,04,
    04,014,02,02,064,
    0};

In the C program that uses the regcmp output, the following line will apply
the expression named teleno to line, copying the three fields into area,
exch and rest:

    regex(teleno, line, area, exch, rest);

The program regcmp is a lot easier to use than the function. Have fun!

Scotty
...ihnp4!killer!ozdaltx!root

"Oh, my friend, it's not what they take away from you that counts-
It's what you do with what you have left." - Hubert Humphrey
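The field extraction in the teleno example can be tried on systems without
regcmp by using POSIX regcomp()/regexec(), where plain parenthesised
groups take the place of the (...)$n notation and a regmatch_t array takes
the place of the ret0..ret9 buffers. This is a sketch under assumptions: it
swaps in the POSIX interface, writes {0,1} as ?, and adds a leading ^ so the
number must start the line. struct phone, copy_group() and parse_phone() are
hypothetical names.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* ASSUMPTION: hypothetical holder for the three fields of teleno. */
struct phone {
    char area[4];   /* field 0: 3 digits + '\0' */
    char exch[4];   /* field 1: 3 digits + '\0' */
    char rest[5];   /* field 2: 4 digits + '\0' */
};

/* Copy one matched group of `src` into `dst`, truncating to `cap`. */
static void copy_group(char *dst, size_t cap, const char *src,
                       const regmatch_t *m)
{
    size_t n = (size_t)(m->rm_eo - m->rm_so);

    if (n >= cap)
        n = cap - 1;
    memcpy(dst, src + m->rm_so, n);
    dst[n] = '\0';
}

/*
 * ASSUMPTION: POSIX <regex.h> stands in for regcmp(1)/regex(3).
 * Returns 1 and fills `out` if `line` begins with a telephone number
 * in the teleno format, 0 otherwise.
 */
int parse_phone(const char *line, struct phone *out)
{
    static const char pattern[] =
        "^\\(?([2-9][01][1-9])\\)? *([2-9][0-9]{2})[ -]?([0-9]{4})";
    regex_t re;
    regmatch_t m[4];    /* m[0] is the whole match, m[1..3] the fields */
    int ok = 0;

    assert(line != NULL && out != NULL);

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return 0;

    if (regexec(&re, line, 4, m, 0) == 0) {
        copy_group(out->area, sizeof out->area, line, &m[1]);
        copy_group(out->exch, sizeof out->exch, line, &m[2]);
        copy_group(out->rest, sizeof out->rest, line, &m[3]);
        ok = 1;
    }

    regfree(&re);
    return ok;
}
```

As with the ret0..ret9 arguments of regex(3), the caller supplies the
storage, and each buffer has to be large enough for the longest substring
its field can match.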
guy@sun.uucp (Guy Harris) (09/01/86)
> > I've pored over my manual and looked at regcmp(1), regcmp(3), and
> > regexp(3), and I'm still not sure how to use these functions. ...

Note, BTW, that this form of regular expression parser is NOT in the System V
Interface Definition, at least in Issue 2 (Issue 1 describes it, but that was
an error). The package described in REGEXP(5) is the one in the SVID, and is
the one you should be using. It is, for example, the package used by "ed" and
"grep"; the only System V software using REGCMP(3) is REGCMP(1). Not all
SVID-compatible systems will have REGCMP(1), REGCMP(3), or REGEXP(3); they
all will have REGEXP(5).

If you only have System III, it will be found in REGEXP(7) rather than
REGEXP(5). Other systems may place it elsewhere (we don't supply the old
"regexp" package, so we put it in REGEXP(3), along with all the other library
packages).

--
Guy Harris
{ihnp4, decvax, seismo, decwrl, ...}!sun!guy
guy@sun.com (or guy@sun.arpa)