[comp.unix.wizards] Bug in sed regexps ?

tml@santra.UUCP (Tor Lillqvist) (12/23/87)

I have noticed some strange behaviour in sed (similar problems
probably also exist in other programs that use regexp(3)).

I have the following sed script:

sed -e 's/^\([^.]*\)[^:]*:\([^	]*\)	\1/\2	\1/' \
    -e 's/^\([^.]*\)[^:]*:\([^	]*\)	/\2	\1:/' \
    -e 's/\\-/-/g' -e 's/\\\*-/-/g' \
    -e 's/^\.TH [^ ]* \([^ 	]*\).*	\([^-]*\)/\2(\1)	/'

and this input file:

ssignal.3c:.TH SSIGNAL 3C "" "" HP-UX	ssignal, gsignal \- software signals
stdio.3s:.TH STDIO 3S "" "" HP-UX	stdio \- standard buffered input/output stream file package
stdipc.3c:.TH STDIPC 3C "" "" HP-UX	ftok \- standard interprocess communication package
string.3c:.TH STRING 3C "" "" HP-UX 	strcat, strncat, strcmp, strncmp, strcpy, strncpy, strlen, strchr, strrchr, strpbrk, strspn, strcspn, strtok \- character string operations
strtod.3c:.TH STRTOD 3C "" "" HP-UX 	strtod, atof, nl_strtod, nl_atof \- convert string to double-precision number

I get the output:

ssignal, gsignal (3C)	- software signals
stdio (3S)	- standard buffered input/output stream file package
ftok (3C)	- standard interprocess communication package
strcat, strncat, strcmp, strncmp, strcpy, strncpy, strlen, strchr, strrchr, strpbrk, strspn, strcspn, strtok (3C)	- character string operations
strtod, atof, nl_strtod, nl_atof (3C)	- convert string to double-precision number

which isn't what I want: the ftok and strcat entries have lost the
title of their manual page (stdipc and string).

However, if I change the sed script to:

sed -e 's/^\([^.]*\)\.[^:]*:\([^	]*\)	\1/\2	\1/' \
    -e 's/^\([^.]*\)\.[^:]*:\([^	]*\)	/\2	\1:/' \
    -e 's/\\-/-/g' -e 's/\\\*-/-/g' \
    -e 's/^\.TH [^ ]* \([^ 	]*\).*	\([^-]*\)/\2(\1)	/'

I get:

ssignal, gsignal (3C)	- software signals
stdio (3S)	- standard buffered input/output stream file package
stdipc:ftok (3C)	- standard interprocess communication package
string:strcat, strncat, strcmp, strncmp, strcpy, strncpy, strlen, strchr, strrchr, strpbrk, strspn, strcspn, strtok (3C)	- character string operations
strtod, atof, nl_strtod, nl_atof (3C)	- convert string to double-precision number

which is what I want. The only change is a \. added after the
\([^.]*\) in the first two expressions.
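
A reduced test case may help narrow this down. The sketch below is
only an illustration: the one-line input stdipc.3c:ftok and the [\1]
replacement are made up for the purpose (they are not part of the
mkwhatis script), and the tab-separated middle field is left out. It
prints what \1 ends up holding in the first expression, with and
without the \. in place.

```shell
# Hypothetical reduced input: the same "name.section:" prefix, followed by
# a first entry (ftok) that differs from the page name (stdipc).
line='stdipc.3c:ftok'

# Original form: nothing forces \1 to stop at the dot. If the matcher
# backtracks, \1 can shrink to the empty string, [^:]* can take
# "stdipc.3c", and the trailing \1 then matches nothing at all.
without_dot=$(printf '%s\n' "$line" | sed -e 's/^\([^.]*\)[^:]*:\1/[\1]/')

# Changed form: the \. pins \1 to everything before the first dot, so the
# trailing \1 has to be "stdipc" and the expression cannot match here.
with_dot=$(printf '%s\n' "$line" | sed -e 's/^\([^.]*\)\.[^:]*:\1/[\1]/')

echo "without: $without_dot"
echo "with:    $with_dot"
```

If the first line shows empty brackets, the first expression of the
original script is matching with an empty \1 on lines where the page
name and the first entry differ. That would be consistent with the
first output above and would point to backtracking in the match
rather than to a bug.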

(As you have probably noticed, I am trying to enhance the
/usr/lib/mkwhatis script so that the whatis database includes the
title of the manual page when it isn't the same as the (first) entry.)

Is this a bug in sed or regexp(3), or what?  The same behaviour occurs
under both HP-UX on a 9000/840 and BSD4.3 on a VAX.
-- 
Tor Lillqvist, Technical Research Centre of Finland
tml@fingate.bitnet == tml@santra.uucp == mcvax!santra!tml