tml@santra.UUCP (Tor Lillqvist) (12/23/87)
I have noticed some strange behaviour in sed (similar problems
probably also exist in other users of regexp(3)).
I have the following sed script:
sed -e 's/^\([^.]*\)[^:]*:\([^ ]*\) \1/\2 \1/' \
-e 's/^\([^.]*\)[^:]*:\([^ ]*\) /\2 \1:/' \
-e 's/\\-/-/g' -e 's/\\\*-/-/g' \
-e 's/^\.TH [^ ]* \([^ ]*\).* \([^-]*\)/\2(\1) /'
and this input file:
ssignal.3c:.TH SSIGNAL 3C "" "" HP-UX ssignal, gsignal \- software signals
stdio.3s:.TH STDIO 3S "" "" HP-UX stdio \- standard buffered input/output stream file package
stdipc.3c:.TH STDIPC 3C "" "" HP-UX ftok \- standard interprocess communication package
string.3c:.TH STRING 3C "" "" HP-UX strcat, strncat, strcmp, strncmp, strcpy, strncpy, strlen, strchr, strrchr, strpbrk, strspn, strcspn, strtok \- character string operations
strtod.3c:.TH STRTOD 3C "" "" HP-UX strtod, atof, nl_strtod, nl_atof \- convert string to double-precision number
I get the output:
ssignal, gsignal (3C) - software signals
stdio (3S) - standard buffered input/output stream file package
ftok (3C) - standard interprocess communication package
strcat, strncat, strcmp, strncmp, strcpy, strncpy, strlen, strchr, strrchr, strpbrk, strspn, strcspn, strtok (3C) - character string operations
strtod, atof, nl_strtod, nl_atof (3C) - convert string to double-precision number
which isn't what I want: the manual page titles (stdipc, string) are missing.
However, if I change the sed script to:
sed -e 's/^\([^.]*\)\.[^:]*:\([^ ]*\) \1/\2 \1/' \
-e 's/^\([^.]*\)\.[^:]*:\([^ ]*\) /\2 \1:/' \
-e 's/\\-/-/g' -e 's/\\\*-/-/g' \
-e 's/^\.TH [^ ]* \([^ ]*\).* \([^-]*\)/\2(\1) /'
I get:
ssignal, gsignal (3C) - software signals
stdio (3S) - standard buffered input/output stream file package
stdipc:ftok (3C) - standard interprocess communication package
string:strcat, strncat, strcmp, strncmp, strcpy, strncpy, strlen, strchr, strrchr, strpbrk, strspn, strcspn, strtok (3C) - character string operations
strtod, atof, nl_strtod, nl_atof (3C) - convert string to double-precision number
which is what I want. The only change is an added \. after the
\([^.]*\) in the first two expressions.
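The difference can be reduced to a much smaller case. What follows is only a sketch: the input line 'foo.3c:bar baz' is made up for illustration and is not from the real manual page data, and only the first expression of each script is used. As far as I can tell, the pattern without the \. can match only by letting \1 be the empty string, so that [^:]* takes the whole file name, while the pattern with the \. forces \1 to be everything before the first dot.

```shell
# Hypothetical one-line input, not taken from the real whatis data.
line='foo.3c:bar baz'

# Pattern without \. : nothing forces \1 to end at the first dot,
# so \1 may shrink (even to nothing) and [^:]* takes up the slack.
loose=$(printf '%s\n' "$line" |
    sed -e 's/^\([^.]*\)[^:]*:\([^ ]*\) \1/\2 \1/')

# Pattern with \. : \1 must be followed by a literal dot, so it can
# only be the text before the first dot ("foo" here).
anchored=$(printf '%s\n' "$line" |
    sed -e 's/^\([^.]*\)\.[^:]*:\([^ ]*\) \1/\2 \1/')

printf 'loose:    %s\n' "$loose"
printf 'anchored: %s\n' "$anchored"
```

With the first pattern the file name prefix is eaten and the line comes out as 'bar baz'; with the second there is no match (the word after 'bar' is not 'foo'), so the line is left alone.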
(As you have probably noticed, I am trying to enhance the /usr/lib/mkwhatis
script so that the whatis database includes the title of the
manual page in case it isn't the same as the (first) entry.)
Is this a bug in sed or regexp(3), or what? The same behaviour occurs
both in HP-UX on the 9000/840 and BSD4.3 on a VAX.
--
Tor Lillqvist, Technical Research Centre of Finland
tml@fingate.bitnet == tml@santra.uucp == mcvax!santra!tml